mirror-clone roadmap #14

Open
skyzh opened this issue Nov 3, 2020 · 0 comments

skyzh commented Nov 3, 2020

The ultimate goal of mirror-clone is to provide an easy-to-use abstraction layer for developers who want to clone a software repo to their own local registry.

Developers will need to implement two interfaces, SourceFS and TargetFS, in order to clone a registry.

SourceFS

SourceFS generally refers to the source software registry, for example crates.io, opam, or conda. It provides the following functionality:

  • snapshot provides a list of the files currently in the software registry.
    • For OPAM, taking a snapshot involves downloading repo and index.tar.gz and parsing their contents.
    • For conda, this involves downloading repodata.json and generating the file list from it.
    • For crates.io, this involves scanning the crates.io-index repo and generating the file list.
  • entry provides a way to download a file from the source filesystem.
    • For most mirroring tasks, this means finding the URL and checksum that correspond to a file.
    • Index files, such as index.tar.gz, should also be included.
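As a rough illustration of the two SourceFS operations, here is a minimal Python sketch. The names Entry and StaticSource, the method signatures, and the example URL are all assumptions made for illustration; the issue does not fix a language or an API.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Entry:
    """Where to fetch a file and how to verify it (hypothetical shape)."""
    path: str
    url: str
    checksum: Optional[str] = None


class SourceFS(ABC):
    @abstractmethod
    def snapshot(self) -> List[str]:
        """Return the paths of all files currently in the registry."""

    @abstractmethod
    def entry(self, path: str) -> Entry:
        """Return download information for one path from the snapshot."""


class StaticSource(SourceFS):
    """Toy source backed by an in-memory index instead of a real registry."""

    def __init__(self, base_url: str, index: Dict[str, str]):
        self.base_url = base_url.rstrip("/")
        self.index = index  # maps path -> checksum

    def snapshot(self) -> List[str]:
        # The index file itself is listed alongside the package files.
        return sorted(self.index) + ["index.tar.gz"]

    def entry(self, path: str) -> Entry:
        # Files without a known checksum (here, the index) get None.
        return Entry(path, f"{self.base_url}/{path}", self.index.get(path))
```

A real implementation would build the index by parsing the registry's own metadata (for example repodata.json) rather than receiving it as a dictionary.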

TargetFS

TargetFS generally refers to a local filesystem. It could also be an object storage, or a key-value database.

TargetFS should be able to:

  • list files
  • read file
  • write file
  • get metadata of a file
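The four operations above could be captured in an interface like the following Python sketch. The method names, the Metadata type, and the in-memory MemoryTarget are hypothetical; a local-disk or object-storage backend would implement the same four methods.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Metadata:
    """Minimal file metadata; a real backend may track more (assumption)."""
    size: int


class TargetFS(ABC):
    @abstractmethod
    def list_files(self) -> List[str]: ...

    @abstractmethod
    def read(self, path: str) -> bytes: ...

    @abstractmethod
    def write(self, path: str, data: bytes) -> None: ...

    @abstractmethod
    def metadata(self, path: str) -> Metadata: ...


class MemoryTarget(TargetFS):
    """Key-value style target that keeps everything in a dictionary."""

    def __init__(self) -> None:
        self._files: Dict[str, bytes] = {}

    def list_files(self) -> List[str]:
        return sorted(self._files)

    def read(self, path: str) -> bytes:
        return self._files[path]

    def write(self, path: str, data: bytes) -> None:
        self._files[path] = data

    def metadata(self, path: str) -> Metadata:
        return Metadata(size=len(self._files[path]))
```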

Mirror-Clone

mirror-clone provides utilities for mirroring a repo.

tmpfs

tmpfs stores files temporarily. When taking a snapshot, the source filesystem may download some index files. They can be saved to tmpfs and served directly when entry is called.
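One way to picture this is a small wrapper around a temporary directory, sketched below in Python. The class name TmpFS and its put/get/close methods are assumptions for illustration only.

```python
import tempfile
from pathlib import Path
from typing import Optional


class TmpFS:
    """Scratch storage that disappears when close() is called."""

    def __init__(self) -> None:
        self._dir = tempfile.TemporaryDirectory()
        self.root = Path(self._dir.name)

    def put(self, name: str, data: bytes) -> Path:
        # Called during snapshot, e.g. to keep a downloaded index file.
        path = self.root / name
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_bytes(data)
        return path

    def get(self, name: str) -> Optional[bytes]:
        # Called from entry; returns None if the file was never stored.
        path = self.root / name
        return path.read_bytes() if path.is_file() else None

    def close(self) -> None:
        self._dir.cleanup()
```

With this shape, entry can check tmpfs first and fall back to a network download only when get returns None.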

downloader

downloader helps download a file from a given URL.

transferrer

Transferrer transfers a file from the source filesystem to the target filesystem. It automatically retries failed requests.
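The retry behaviour might look like the Python sketch below. The function name transfer, the fetch/store callbacks, the default of three attempts, and the choice to retry only on OSError are all assumptions; the issue does not specify a retry policy.

```python
from typing import Callable


def transfer(fetch: Callable[[], bytes],
             store: Callable[[bytes], None],
             max_attempts: int = 3) -> int:
    """Fetch from the source, then store in the target.

    Retries the fetch on OSError and returns the number of attempts used.
    The last failure is re-raised once max_attempts is exhausted.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            data = fetch()
        except OSError:
            if attempt == max_attempts:
                raise
            continue
        store(data)
        return attempt
    raise ValueError("max_attempts must be at least 1")
```

A production version would probably add a delay between attempts; it is left out here to keep the sketch short.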

comparator

Given an entry on the source filesystem and the corresponding entry on the target filesystem, a comparator decides whether the file needs to be transferred again.
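A comparator could be as simple as the following Python sketch, which prefers checksums and falls back to sizes. The FileInfo type and the needs_transfer function are hypothetical, and the comparison rules are one plausible choice rather than the project's stated policy.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class FileInfo:
    size: Optional[int] = None
    checksum: Optional[str] = None


def needs_transfer(source: FileInfo, target: Optional[FileInfo]) -> bool:
    """Return True if the file should be transferred (again)."""
    if target is None:
        return True  # nothing on the target yet
    if source.checksum is not None and target.checksum is not None:
        return source.checksum != target.checksum
    if source.size is not None and target.size is not None:
        return source.size != target.size
    return True  # nothing comparable, so transfer to be safe
```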

buffer layer

The buffer layer sits between the transferrer and the target filesystem.

Transaction Buffer provides a transaction-commit interface. Downloads routinely fail because of network issues, so the buffer layer commits a file to the target filesystem only after it has been downloaded successfully (or waits until all files have been downloaded).
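The "wait until all files have been downloaded" variant can be sketched in Python as below. The class layout and the stage/rollback/commit method names are assumptions; the write callback stands in for whatever the target filesystem exposes.

```python
from typing import Callable, Dict


class TransactionBuffer:
    """Holds downloaded files until commit() pushes them to the target."""

    def __init__(self, write: Callable[[str, bytes], None]) -> None:
        self._write = write
        self._pending: Dict[str, bytes] = {}

    def stage(self, path: str, data: bytes) -> None:
        # Record a successful download without touching the target.
        self._pending[path] = data

    def rollback(self, path: str) -> None:
        # Drop a staged file, e.g. after a checksum mismatch.
        self._pending.pop(path, None)

    def commit(self) -> int:
        # Write every staged file to the target; return how many were written.
        count = 0
        for path, data in list(self._pending.items()):
            self._write(path, data)
            del self._pending[path]
            count += 1
        return count
```

Until commit is called the target sees none of the staged files, so a run that is interrupted halfway leaves the existing mirror untouched.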

Fuse Buffer ensures that a file is never downloaded twice by fusing it. It also records file metadata in a single cache file to speed up listing all files in the target filesystem.
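One possible reading of this design, sketched in Python: once a file is fused, its path and size are recorded in a JSON cache file, later runs skip it, and listing reads the cache instead of walking the target. The JSON format, the class layout, and the method names are assumptions.

```python
import json
from pathlib import Path
from typing import Dict, List


class FuseBuffer:
    """Remembers transferred files in a single cache file."""

    def __init__(self, cache_file: Path) -> None:
        self.cache_file = cache_file
        self._meta: Dict[str, int] = {}  # maps path -> size in bytes
        if cache_file.is_file():
            self._meta = json.loads(cache_file.read_text())

    def is_fused(self, path: str) -> bool:
        # A fused file should not be downloaded again.
        return path in self._meta

    def fuse(self, path: str, size: int) -> None:
        # Mark the file as done and persist the whole cache.
        self._meta[path] = size
        self.cache_file.write_text(json.dumps(self._meta))

    def list_files(self) -> List[str]:
        # Served from the cache, without listing the target filesystem.
        return sorted(self._meta)
```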
