Thanks for this project!
Some ideas (note that I have merely read the blog post and didn't dig further):
- It may be a good idea to fully replicate git's CLI, at least as an option. This will help the project spread.
- Migrate away from SHA1. It is broken, and relying on it is a very unfortunate git design mistake. Also, you should change hashes regularly anyway: https://valerieaurora.org/hash.html. (Well, actually migrating away from SHA1 will likely break GitHub compatibility, so of course it makes sense to support SHA1 for now. But please support other hashes, too. Don't repeat git's mistake: git originally simply hardcoded SHA1 everywhere.)
- In the past I spent a lot of time researching CDC and deduplication. My findings are here: "casync decompresses x1.5 faster than borg on same config (and other benchmarks)" borgbackup/borg#7674. A short overview of FOSS solutions is here: https://lobste.rs/s/0itosu/look_at_rapidcdc_quickcdc#c_ygqxsl. In short, existing solutions are under-optimized, and there is a lot of low-hanging fruit here. I was very easily able to create a very small program in Rust that beats existing deduplication solutions by a wide margin (but my program doesn't use CDC). So I suggest reading my ideas and comparing the speed of your solution with other solutions.
- Patch-based merging seems to be a killer feature (assuming it works well). So I suggest making it your main advertising strategy. Linux devs often maintain their patchsets as series of patch files rather than as git branches, precisely because git merging doesn't work well. So, reach out to Linux devs and tell them about your tool. In particular, the number two person in Linux, Greg KH, maintainer of the stable Linux trees, stores his stable trees as series of patch files in git (aaaah!). He describes his workflow here: http://www.kroah.com/log/blog/2019/08/14/patch-workflow-with-mutt-2019/. The key parts are these: "The stable kernel tree, while under development, is kept as a series of patches that need to be applied to the previous release. This series of patches is maintained by using a tool called (quilt)... Anyway, the stable patches are kept in a quilt series in a repository that is kept under version control in git (complex, yeah, sorry.) That queue can always be found (here)". The same applies to a lot of Debian packages. For example, gcc (like lots of other Debian packages) is, again, maintained as patches stored in git. See https://salsa.debian.org/toolchain-team/gcc/-/tree/gcc-14-debian/debian/patches. I think this is, again, because of `git merge` and `git rebase` problems. So, promote xit as a tool that solves all these problems. Of course, it helps if you are CLI-compatible with git.
- "If the first byte is 0, it is uncompressed; if it is 1, it is zlib-compressed". I suggest moving to zstd; it is better in every way (faster and smaller). zstd may also be good at compressing binary files (at least I hope zstd doesn't make them significantly larger). "While xit has compression support, it currently disables it even for text files". Try `zstd -0`: it is fast enough while giving substantial compression for text files. If it is too slow, try lz4, which is even faster.
- "Want to find the descendent(s) of a commit? Uhhh...well, you can't". As pointed out on lobste.rs, you can see descendants: https://lobste.rs/s/mltpfg/xit_is_coming#c_cnwsps. (But I understand your point, i.e. you argue that we need a separate data structure for this.)
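On the SHA1 point: supporting several hashes mostly means never assuming one digest size or algorithm anywhere. Below is a minimal Python sketch, assuming object ids carry their algorithm name as a textual prefix (`sha1:` / `sha256:`). The names `ObjectId`, `SUPPORTED_ALGORITHMS` and `hash_object` are hypothetical and are not xit's or git's actual API.

```python
import hashlib
from dataclasses import dataclass

# Hypothetical registry: algorithm name -> digest size in bytes.
# SHA1 stays available for GitHub compatibility; new repos could default to SHA-256.
SUPPORTED_ALGORITHMS = {"sha1": 20, "sha256": 32}


@dataclass(frozen=True)
class ObjectId:
    algorithm: str  # which hash produced this id
    digest: bytes

    def __post_init__(self) -> None:
        expected_len = SUPPORTED_ALGORITHMS.get(self.algorithm)
        if expected_len is None:
            raise ValueError(f"unsupported hash algorithm: {self.algorithm}")
        if len(self.digest) != expected_len:
            raise ValueError(f"{self.algorithm} digest must be {expected_len} bytes")

    def __str__(self) -> str:
        # The prefix keeps ids from different algorithms unambiguous.
        return f"{self.algorithm}:{self.digest.hex()}"

    @classmethod
    def parse(cls, text: str) -> "ObjectId":
        algorithm, sep, hex_digest = text.partition(":")
        if not sep:
            raise ValueError("object id is missing its algorithm prefix")
        return cls(algorithm, bytes.fromhex(hex_digest))


def hash_object(data: bytes, algorithm: str = "sha256") -> ObjectId:
    """Hash object contents with the repository's configured algorithm."""
    if algorithm not in SUPPORTED_ALGORITHMS:
        raise ValueError(f"unsupported hash algorithm: {algorithm}")
    return ObjectId(algorithm, hashlib.new(algorithm, data).digest())
```

Adding a future hash then means one new registry entry rather than a repository-wide migration of hardcoded 20-byte assumptions.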
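On the CDC point: for readers unfamiliar with content-defined chunking, here is a simplified Python sketch of the idea, using a gear rolling hash in the style of FastCDC but without its normalized chunking. It is not xit's implementation. The parameters (`min_size`, `avg_bits`, `max_size`), the seeded gear table, and the names `cdc_chunks` and `store` are all illustrative assumptions. Pure Python is far too slow for the kind of speed comparison suggested above; this shows only the mechanism.

```python
import hashlib
import random

# Gear table: 256 pseudo-random 64-bit values, seeded so chunk boundaries
# are reproducible (a real tool would ship a fixed table).
_rng = random.Random(0)
GEAR = [_rng.getrandbits(64) for _ in range(256)]
MASK64 = (1 << 64) - 1


def cdc_chunks(data: bytes, min_size: int = 2048, avg_bits: int = 13,
               max_size: int = 65536) -> list[bytes]:
    """Split data wherever a gear rolling hash matches a boundary pattern."""
    # Boundary when the top avg_bits bits of the hash are zero: probability
    # 2**-avg_bits per byte, so chunks average roughly min_size + 8 KiB here.
    boundary_mask = ((1 << avg_bits) - 1) << (64 - avg_bits)
    chunks: list[bytes] = []
    start = 0
    h = 0
    for i, byte in enumerate(data):
        # Shift-and-add: each byte's contribution is shifted out after 64
        # steps, so h depends only on the most recent 64 bytes.
        h = ((h << 1) + GEAR[byte]) & MASK64
        size = i - start + 1
        if size >= max_size or (size >= min_size and (h & boundary_mask) == 0):
            chunks.append(data[start:i + 1])
            start = i + 1
    if start < len(data):
        chunks.append(data[start:])
    return chunks


def store(data: bytes, chunk_store: dict[str, bytes]) -> tuple[list[str], int]:
    """Deduplicate data into chunk_store; return (chunk digests, new chunks added)."""
    recipe: list[str] = []
    new_chunks = 0
    for chunk in cdc_chunks(data):
        digest = hashlib.sha256(chunk).hexdigest()
        if digest not in chunk_store:
            chunk_store[digest] = chunk
            new_chunks += 1
        recipe.append(digest)
    return recipe, new_chunks
```

Because boundaries depend on local content rather than on fixed offsets, inserting a few bytes at the start of a file typically changes only the first chunk or so, and the rest still deduplicate.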
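On the compression point: the one-byte tag quoted from the blog post already makes adding a codec backward-compatible, because old objects keep their old tag. The following Python sketch uses only the two tags from the quote (0 = uncompressed, 1 = zlib) and stores data raw whenever compression doesn't shrink it. That fallback bounds the cost on incompressible binary files to the single tag byte. The function names and the compress-then-compare heuristic are assumptions, and a zstd codec would need its own new tag value, which is not shown here.

```python
import zlib

# Tag values 0 and 1 come from the blog post quoted above; the rest is a sketch.
RAW = 0
ZLIB = 1


def encode(data: bytes, level: int = 1) -> bytes:
    """Prefix a codec tag; fall back to raw storage when compression doesn't help."""
    compressed = zlib.compress(data, level)
    if len(compressed) < len(data):
        return bytes([ZLIB]) + compressed
    # Incompressible input (e.g. already-compressed binaries) costs one extra byte.
    return bytes([RAW]) + data


def decode(blob: bytes) -> bytes:
    """Dispatch on the leading tag byte."""
    if not blob:
        raise ValueError("empty object")
    tag, payload = blob[0], blob[1:]
    if tag == RAW:
        return payload
    if tag == ZLIB:
        return zlib.decompress(payload)
    # A new codec (e.g. zstd) would be handled here under a new tag value,
    # leaving every existing object readable.
    raise ValueError(f"unknown compression tag: {tag}")
```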
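On the descendants point: commits only point to their parents, so finding descendants means walking the graph from the other end. A separate reverse index, built once or updated on every commit, turns that into a lookup. The Python sketch below assumes commit ids are plain strings and that a hypothetical `parents` mapping (commit id to parent ids) has already been loaded from the object store.

```python
from collections import defaultdict, deque


def build_children_index(parents: dict[str, list[str]]) -> dict[str, list[str]]:
    """Invert commit -> parents into commit -> children."""
    children: defaultdict[str, list[str]] = defaultdict(list)
    for commit, commit_parents in parents.items():
        for parent in commit_parents:
            children[parent].append(commit)
    return dict(children)


def descendants(commit: str, children: dict[str, list[str]]) -> set[str]:
    """Breadth-first walk over the reverse index; merge commits are visited once."""
    seen: set[str] = set()
    queue = deque([commit])
    while queue:
        current = queue.popleft()
        for child in children.get(current, []):
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return seen
```

For example, with history `a <- b`, `b <- c`, `b <- d`, and a merge `e` of `c` and `d`, `descendants("b", ...)` returns `{"c", "d", "e"}`.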
Feel free to ask any questions.
Also: even if you implement all of these, I still do not plan to use xit. (I'm not trying to insult you; I'm just trying to be honest here about my motivations.)
Also, there is a discussion of your project here: https://lobste.rs/s/mltpfg/xit_is_coming. If you want, I can give you an invite.