Script to repartition index files #5

@zooba

Index files (hosted on python.org) can be split up to minimise the initial download when clients access them. The files form a simple chain: the next element of an index contains a URL (relative to the location of the containing file, so typically just a filename), and if no matching version is available in the current index, the client loads the next one.
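As a rough sketch of how that chain could be followed: the code below assumes each index is a JSON object with an optional "next" key holding the relative URL, and a "versions" list (the "versions" name is an assumption, as are the helper names fetch_json and iter_chain and the example.invalid URLs).

```python
import json
from urllib.parse import urljoin
from urllib.request import urlopen


def fetch_json(url):
    """Download and parse one index file."""
    with urlopen(url) as resp:
        return json.load(resp)


def iter_chain(url, fetch=fetch_json):
    """Yield (url, index) for every index in the chain, starting at url.

    The "next" value is resolved relative to the file that contains it,
    so a bare filename points at a sibling of the current index.
    """
    seen = set()
    while url and url not in seen:  # guard against accidental loops
        seen.add(url)
        index = fetch(url)
        yield url, index
        nxt = index.get("next")
        url = urljoin(url, nxt) if nxt else None


# Offline example: a dict stands in for the server.
pages = {
    "https://example.invalid/a/index.json": {"versions": [], "next": "index-legacy.json"},
    "https://example.invalid/a/index-legacy.json": {"versions": []},
}
chain_urls = [u for u, _ in iter_chain("https://example.invalid/a/index.json", fetch=pages.__getitem__)]
```

Passing the fetch function in makes the walk easy to exercise without network access.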

The client sorts each index file (by descending 'sort-version') before checking for matches, so py install 3 will prefer 3.13 over 3.12 regardless of which appears first. It does not sort across index files, though, so if 3.13 appears only in the next index, 3.12 would be selected.
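To illustrate the difference, here is a simplified model of that lookup. It is not the real client logic: it assumes 'sort-version' is a dotted numeric string, ignores pre-release suffixes, and treats a request as a version prefix. The names version_key and find_install are made up for the example.

```python
import re


def version_key(sort_version):
    """Leading integer of each dotted component, e.g. '3.13.1' -> (3, 13, 1)."""
    key = []
    for part in sort_version.split("."):
        m = re.match(r"\d+", part)
        if not m:
            break
        key.append(int(m.group()))
    return tuple(key)


def find_install(indexes, request):
    """Return the first match, sorting within each index but not across them."""
    wanted = version_key(request)
    for index in indexes:
        ordered = sorted(
            index["versions"],
            key=lambda i: version_key(i["sort-version"]),
            reverse=True,
        )
        for install in ordered:
            if version_key(install["sort-version"])[: len(wanted)] == wanted:
                return install
    return None


# Both versions in one file: file order does not matter, 3.13 wins.
together = [{"versions": [{"sort-version": "3.12.5"}, {"sort-version": "3.13.1"}]}]
# 3.13 only in the second file: the search stops at 3.12 in the first.
split = [
    {"versions": [{"sort-version": "3.12.5"}]},
    {"versions": [{"sort-version": "3.13.1"}]},
]
```

With this data, a request for "3" resolves to 3.13.1 against together but to 3.12.5 against split, while a request for "3.13" against split falls through to the second file.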

So the ideal is for the first file to hold the current versions matching each likely specifier, and for later files to hold installs that are requested less often, or that are only matched by specifiers specific enough to exclude every option in the earlier files. Certainly py install 3 and py install 3.x (for non-EOL x) should always find the correct match in the first index.

Rather than making a complex database for handling this[1], we chain static files. Periodically, though, we should merge and re-split these files for optimal performance. We should have a script in this repo to do that. Ideally, it:

  • takes a URL/path as input
  • downloads the full chain of indexes and combines them all
  • re-sorts all available installs by sort-version and then some other sensible key[2]
  • divides all installs into three new files based on their sort-version
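The combine and re-sort steps might look something like the sketch below. It assumes the indexes have already been downloaded into a list of dicts with a "versions" list (an assumed key name), uses 'company' then 'tag' as the secondary key as the footnote suggests, and parses 'sort-version' as plain dotted integers. merge_indexes and version_key are hypothetical names.

```python
import re


def version_key(sort_version):
    """Leading integer of each dotted component, e.g. '3.13.1' -> (3, 13, 1)."""
    key = []
    for part in sort_version.split("."):
        m = re.match(r"\d+", part)
        if not m:
            break
        key.append(int(m.group()))
    return tuple(key)


def merge_indexes(indexes):
    """Flatten every index in the chain into one sorted list of installs."""
    installs = [i for index in indexes for i in index.get("versions", [])]
    # Two stable sorts: secondary key first (ascending), then the
    # primary key (descending), so ties keep the company/tag order.
    installs.sort(key=lambda i: (i.get("company", ""), i.get("tag", "")))
    installs.sort(key=lambda i: version_key(i["sort-version"]), reverse=True)
    return installs


chain = [
    {"versions": [{"sort-version": "3.9.1", "company": "PythonCore", "tag": "3.9-64"},
                  {"sort-version": "3.13.0", "company": "PythonCore", "tag": "3.13-64"}]},
    {"versions": [{"sort-version": "3.13.0", "company": "PythonCore", "tag": "3.13-32"}]},
]
merged_tags = [i["tag"] for i in merge_indexes(chain)]
```

This sketch does no de-duplication; whether the same install can appear in more than one index is left open here.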

A reasonable division would be:

  • all latest x.y.z versions
  • all non-latest x.y versions since a reasonable point (currently 3.10)
  • all the rest ("legacy")
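One possible reading of that division, as a sketch: "latest" is taken to mean the highest x.y.z for each x.y (for every x.y, including old ones), and pre-release suffixes are ignored, so 3.14.0a1 and 3.14.0 would compare equal. The function names and output filenames are placeholders, and the "versions" key is an assumption.

```python
import re


def version_key(sort_version):
    """Leading integer of each dotted component, e.g. '3.13.1' -> (3, 13, 1)."""
    key = []
    for part in sort_version.split("."):
        m = re.match(r"\d+", part)
        if not m:
            break
        key.append(int(m.group()))
    return tuple(key)


def partition(installs, since=(3, 10)):
    """Split installs into (latest per x.y, older releases >= since, legacy)."""
    latest = {}
    for i in installs:
        key = version_key(i["sort-version"])
        latest[key[:2]] = max(latest.get(key[:2], key), key)
    current, recent, legacy = [], [], []
    for i in installs:
        key = version_key(i["sort-version"])
        if key == latest[key[:2]]:
            current.append(i)
        elif key[:2] >= since:
            recent.append(i)
        else:
            legacy.append(i)
    return current, recent, legacy


def build_chain(parts, names):
    """Wrap each part in an index dict whose "next" points at the following file."""
    files = {}
    for n, (name, part) in enumerate(zip(names, parts)):
        data = {"versions": part}
        if n + 1 < len(names):
            data["next"] = names[n + 1]
        files[name] = data
    return files


installs = [{"sort-version": v} for v in
            ["3.13.1", "3.13.0", "3.10.4", "3.10.2", "3.9.7", "3.9.1", "2.7.18"]]
files = build_chain(partition(installs),
                    ["index.json", "index-recent.json", "index-legacy.json"])
```

Because each file only names its successor, the first filename can stay fixed while later files are added, renamed, or re-split.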

(Worth noting that this isn't the breakdown we have right now - we only have the "since 3.10" and "all the rest" indexes. Provided the URL of the first index doesn't change, we can always insert more later.)

Footnotes

  1. In other words, rather than standing up expensive infrastructure...

  2. Probably 'company' and 'tag', or 'id'.

Labels: enhancement

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions