Idea: hardlink support #2

Open
PhotonQuantum opened this issue Feb 15, 2023 · 0 comments


Currently, all hard links are resolved as regular files, so more bytes than necessary are sometimes read from the remote server. This can be problematic for repositories that make heavy use of hard links, e.g., Fedora.

Rsync has hard link support and can accurately transfer hard links between servers. Device and inode IDs are transmitted over the wire during the file list transfer stage. The client can recognize duplicate (dev, ino) pairs and initiate file content transfer only for the first instance.
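To make the (dev, ino) idea concrete, here is a minimal sketch that groups files on a local filesystem by that pair. It only illustrates how duplicate pairs identify hard links; it is not rsync's wire protocol, and the function name `group_hard_links` is made up for this example.

```python
import os
import stat
from collections import defaultdict


def group_hard_links(root):
    """Group regular files under `root` that share a (st_dev, st_ino) pair.

    Hypothetical helper for illustration only. Returns a dict mapping
    (dev, ino) to the sorted list of paths that are names of that inode.
    """
    clusters = defaultdict(list)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            st = os.lstat(path)  # lstat: inspect the entry itself, not a symlink target
            # Only regular files with more than one link can be hard-linked peers.
            if stat.S_ISREG(st.st_mode) and st.st_nlink > 1:
                clusters[(st.st_dev, st.st_ino)].append(path)
    # Keep only inodes that have at least two names inside `root`.
    return {key: sorted(paths) for key, paths in clusters.items() if len(paths) > 1}
```

Every path in one group is an equal name for the same content, so content would only need to be transferred once per group.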

Hard links are non-directional, so it is better to view them as a cluster of names for the same inode than as a link from one file to another. One naive approach is to pick a source using some heuristic and treat the rest as symlinks. However, if that "virtual" source is later removed, a new source must be chosen, and every other file that originally shared the inode must be rewritten to point to the new source. Furthermore, finding those files without changing the metadata format to keep track of hard links is expensive, because we must reverse-track every entry that points to the old source. Reusing the existing symlink handling is therefore not a good choice.

Another possible implementation is to use hard link information purely as an optimization. When the generator is about to request a file, it first checks whether another file with the same dev and ino has already been requested. If so, it does not request this file and reuses the hash of the earlier one (remember, we use the content hash to address files). The only extra cost, other than receiving and storing the dev and ino fields in each FileEntry, is a hash table from (dev, ino) to file index.
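A rough sketch of that optimization follows, under stated assumptions: the `FileEntry` shape here is hypothetical (the real one has more fields), `plan_requests` is an invented name, and every entry is assumed to need fetching, which ignores any up-to-date checks the generator performs.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class FileEntry:
    # Hypothetical shape for illustration; not the project's actual struct.
    name: str
    dev: Optional[int] = None  # None when the sender did not provide hard link info
    ino: Optional[int] = None


def plan_requests(entries: List[FileEntry]) -> Tuple[List[int], Dict[int, int]]:
    """Decide which file indices to request and which can reuse another file's hash.

    Returns (to_request, reuse_from), where reuse_from maps a skipped index
    to the index of the first entry seen with the same (dev, ino).
    """
    first_seen: Dict[Tuple[int, int], int] = {}  # (dev, ino) -> file index
    to_request: List[int] = []
    reuse_from: Dict[int, int] = {}

    for idx, entry in enumerate(entries):
        if entry.dev is None or entry.ino is None:
            # No hard link info: fall back to treating it as a regular file.
            to_request.append(idx)
            continue
        key = (entry.dev, entry.ino)
        source = first_seen.get(key)
        if source is None:
            first_seen[key] = idx
            to_request.append(idx)
        else:
            # Same inode already requested: skip the transfer, reuse its hash.
            reuse_from[idx] = source
    return to_request, reuse_from
```

Because files are addressed by content hash, the skipped entries need no link bookkeeping of their own: they simply end up with the same hash as the entry recorded in `reuse_from`, so removing any one name later does not affect the others.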
