
Introduction to mirror intel

Alex Chi edited this page Mar 2, 2021 · 1 revision

mirror-intel is a middleware for the SJTUG mirror. It is a smart cache backed by S3 object storage.

Generally, our infrastructure configures the upstream for each repo in mirror-intel/Rocket.toml. mirror-intel handles each request as follows.

  • mirror-intel first checks whether the object exists in the S3 backend.
  • If it does, the user is redirected to S3 object storage.
  • If it does not, the user is redirected to the original site, and a background download task is submitted.
  • The task downloads the file from the original site and uploads it to the S3 backend.

Therefore, mirror-intel basically serves as a cache for our users.
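
The request flow above can be sketched in a few lines of Rust. This is only an illustration, not the actual mirror-intel code: the `Intel` struct, the `Response` enum, and the example URLs are hypothetical, and the S3 backend and task queue are replaced by in-memory collections.

```rust
use std::collections::HashSet;

/// What the middleware answers for one request (hypothetical type).
#[derive(Debug, PartialEq)]
enum Response {
    /// The object is cached: redirect to S3 object storage.
    RedirectToS3(String),
    /// The object is not cached: redirect to the original site.
    RedirectToUpstream(String),
}

/// In-memory stand-in for the S3 backend and the background task queue.
struct Intel {
    s3_base: String,
    upstream_base: String,
    cached: HashSet<String>,
    pending: Vec<String>,
}

impl Intel {
    fn new(s3_base: &str, upstream_base: &str) -> Self {
        Intel {
            s3_base: s3_base.to_string(),
            upstream_base: upstream_base.to_string(),
            cached: HashSet::new(),
            pending: Vec::new(),
        }
    }

    /// Decides where to send the user for `path`.
    fn handle(&mut self, path: &str) -> Response {
        if self.cached.contains(path) {
            Response::RedirectToS3(format!("{}/{}", self.s3_base, path))
        } else {
            // Queue a background download before redirecting to upstream.
            self.pending.push(path.to_string());
            Response::RedirectToUpstream(format!("{}/{}", self.upstream_base, path))
        }
    }

    /// Simulates the background tasks: "download" each queued file and
    /// "upload" it to the cache.
    fn run_pending(&mut self) {
        for path in self.pending.drain(..) {
            self.cached.insert(path);
        }
    }
}
```

In this sketch, the first request for a path is redirected to upstream and queued. Once `run_pending` has run, later requests for the same path are redirected to S3.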

As we add more repos, mirror-intel has gained more functionality.

Rewrite

Some repos require us to rewrite URLs inside the request. An example is PyPI-index.
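
As a rough illustration of what such a rewrite could look like, the sketch below replaces an upstream URL prefix with a mirror prefix in a piece of text. It assumes the rewrite is a plain prefix substitution. The function name and both URLs are hypothetical, and the real rewrite rules in mirror-intel may differ.

```rust
/// Replaces every occurrence of `upstream_prefix` in `body` with
/// `mirror_prefix`. Hypothetical helper, not part of mirror-intel.
fn rewrite_urls(body: &str, upstream_prefix: &str, mirror_prefix: &str) -> String {
    body.replace(upstream_prefix, mirror_prefix)
}
```

For example, a link to `https://files.example.org/packages/foo-1.0.tar.gz` would become `https://mirror.example.com/pypi-packages/packages/foo-1.0.tar.gz` when the prefix `https://files.example.org` is mapped to `https://mirror.example.com/pypi-packages`.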

Proxy

Some package managers don't support 301/302 redirects, so we have to proxy all of their requests instead of issuing redirects.

mirror-clone

mirror-clone uses mirror-intel as its default backend. Once a repo is added to mirror-clone, the two collaborate to make a full clone of that repo. mirror-clone first generates a file snapshot of the remote repo. For example, when cloning the crates.io repo, mirror-clone downloads the latest crates.io-index and generates a file list of all crates. Then mirror-clone issues HEAD requests to mirror-intel, which trigger mirror-intel to download new crates.
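
The last step can be pictured as turning the file list into one HEAD request per file. The sketch below only builds the request text and sends nothing. The function names, host name, and path layout are assumptions for illustration and are not taken from mirror-clone.

```rust
/// Builds the text of an HTTP/1.1 HEAD request for one path.
fn head_request(host: &str, path: &str) -> String {
    format!("HEAD {} HTTP/1.1\r\nHost: {}\r\n\r\n", path, host)
}

/// Turns a snapshot (a list of relative file paths) into HEAD requests
/// under the given repo prefix. Hypothetical helper.
fn requests_for_snapshot(host: &str, repo: &str, files: &[&str]) -> Vec<String> {
    files
        .iter()
        .map(|file| head_request(host, &format!("/{}/{}", repo, file)))
        .collect()
}
```

Because a HEAD request carries no response body, this lets mirror-clone trigger caching of every file in the snapshot without downloading the files itself.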

Issues

  • mirror-intel may conflict with the caddy gzip config, so we have to disable gzip in all mirror-intel repos.
  • Connections are not reused. This is obvious in repos like fedora-ostree. Generally, ostree or libcurl establishes 8 connections to a server to download objects. But when we issue redirects, the HTTP connection is not kept alive, and our server may be flooded with TCP connection establishment.

Adding new repo in mirror-intel

mirror-intel repos are configured declaratively in Rust. If the repo can be served as-is, you can just add a single line to src/repos.rs.

// identifier, name, path allowed to cache, path needs proxy
simple_intel! { crates_io, "crates.io", allow_all, disallow_all }

You should disable caching for files that may change, for example a package index. Once an object is cached by mirror-intel, it will never be updated.
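
A cache rule along these lines might look like the sketch below, which allows caching only for file types assumed to be immutable. The predicate name, its `fn(&str) -> bool` shape, and the suffix list are all hypothetical. The actual predicates in src/repos.rs (such as allow_all and disallow_all) may have a different signature.

```rust
/// File suffixes assumed never to change once published (illustrative list).
const IMMUTABLE_SUFFIXES: [&str; 3] = [".crate", ".tar.gz", ".whl"];

/// Returns true only for paths that look like immutable release files.
/// Everything else, such as an index page, is left uncached.
fn allow_immutable(path: &str) -> bool {
    IMMUTABLE_SUFFIXES.iter().any(|suffix| path.ends_with(*suffix))
}
```

Using an allowlist rather than a denylist errs on the side of not caching, which matters here because a cached object is never refreshed.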

Then add a configuration entry in common.rs and add the endpoint to main.rs. Release a new version of mirror-intel by sending a PR and tagging. Finally, the repo can be used in SJTUG mirrors.

If you need rewriting or more complex operations, refer to other repos in mirror-intel as examples.