
[Discussion] Prune old packages from the repos #2174

Closed
lazka opened this issue Oct 4, 2020 · 15 comments

@lazka
Member

lazka commented Oct 4, 2020

The repo size is growing, and many of the old packages just take up space and are likely never used. We could remove all packages and source packages that are older than, say, 2 or 3 years and are not currently in the pacman repo.

Technically, this could be a Python script that parses the repos, builds a list of files that are actively used, collects all files whose mtime is too old, and suggests removing those that are too old and not in the active list.
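A minimal stdlib-only sketch of that approach, under several assumptions: the sync database (e.g. `msys.db`) is a tar archive that `tarfile` can open (gzip or xz; a zstd-compressed database may need an extra decompressor depending on the Python version), each package entry has a `desc` file with a `%FILENAME%` section, and only files whose names contain `.pkg.tar.` are considered. The function names, that filter and the 365-day year are illustrative choices, and source packages (which the database does not list) are not handled.

```python
import os
import tarfile
import time


def active_filenames(db_path):
    """Return package file names (plus their .sig files) listed in a pacman sync DB.

    Assumption: the DB is a tar archive readable by tarfile, with one
    '<name>-<version>/desc' file per package containing a %FILENAME% section.
    """
    active = set()
    with tarfile.open(db_path) as db:  # "r:*" mode: compression auto-detected
        for member in db.getmembers():
            if not member.isfile() or not member.name.endswith("/desc"):
                continue
            lines = db.extractfile(member).read().decode("utf-8").splitlines()
            for i, line in enumerate(lines[:-1]):
                if line == "%FILENAME%":
                    filename = lines[i + 1]
                    active.add(filename)
                    active.add(filename + ".sig")  # keep detached signatures too
    return active


def prune_candidates(repo_dir, db_path, max_age_years, now=None):
    """List package files in repo_dir older than max_age_years and not in the DB."""
    now = time.time() if now is None else now
    cutoff = now - max_age_years * 365 * 24 * 3600
    active = active_filenames(db_path)
    candidates = []
    for entry in os.scandir(repo_dir):
        # Only look at package files and their signatures, never the DB itself.
        if not entry.is_file() or ".pkg.tar." not in entry.name:
            continue
        if entry.name not in active and entry.stat().st_mtime < cutoff:
            candidates.append(entry.name)
    return sorted(candidates)
```

Called as `prune_candidates("<repo dir>", "<repo dir>/msys.db", 2)`, it only returns names, so the actual deletion can stay a separate, reviewed step, in line with "suggests removing" above.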

@elieux Does this sound useful to you? Any other ideas how to achieve something similar?

@elieux
Member

elieux commented Oct 4, 2020

Yes, useful. I can probably make a shell script for that.

I like the idea of having complete history for reproducibility's sake, but we definitely shouldn't put that weight on all the mirrors.

@lazka
Member Author

lazka commented Oct 4, 2020

(I'd happily look into it, but only in Python; the stdlib should be enough.)

@lazka
Member Author

lazka commented Oct 5, 2020

Some stats for the msys (not mingw) repos: current size 46.2 GB.

Savings in % when removing everything not used and older than:

  • 0 years: 91.70 %
  • 1 year: 74.50 %
  • 2 years: 50.82 %
  • 3 years: 31.82 %
  • 4 years: 20.62 %
  • 5 years: 5.04 %
  • 6 years: 0.05 %
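A table like this could be derived roughly as follows. This is a hedged sketch, not the script that produced the numbers: it assumes a hypothetical list of `(name, size_bytes, mtime)` tuples for every file in the repo plus the set of active file names, a 365-day year, and thresholds from 0 to 6 years.

```python
import time

SECONDS_PER_YEAR = 365 * 24 * 3600  # assumption: leap days ignored


def savings_by_age(files, active, max_years=6, now=None):
    """Return {years: percent of total size freed} for thresholds 0..max_years.

    files:  iterable of (name, size_bytes, mtime) for every file in the repo
    active: set of file names still referenced by the current repo database
    A file counts as removable for threshold N if it is not active and its
    mtime is more than N years in the past.
    """
    now = time.time() if now is None else now
    files = list(files)
    total = sum(size for _, size, _ in files)
    result = {}
    for years in range(max_years + 1):
        cutoff = now - years * SECONDS_PER_YEAR
        freed = sum(size for name, size, mtime in files
                    if name not in active and mtime < cutoff)
        result[years] = 100.0 * freed / total if total else 0.0
    return result
```

Printing `f"{years} years: {pct:.2f} %"` for each item of the result would give a list in the same shape as the one above.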

@lazka
Member Author

lazka commented Oct 7, 2020

The mingw repo stats are similar (size 475.6 GB):

  • 0 years: 93.84 %
  • 1 year: 73.92 %
  • 2 years: 49.12 %
  • 3 years: 29.18 %
  • 4 years: 18.20 %
  • 5 years: 6.25 %
  • 6 years: 0.00 %

So with a two-year cutoff, that would leave 265 GB.
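The 265 GB figure can be checked against the percentages above; it matches the msys and mingw repos taken together (the mingw repo alone would come to about 242 GB):

```latex
\begin{aligned}
\text{msys:}\quad  & 46.2\,\text{GB} \times (1 - 0.5082) \approx 22.7\,\text{GB} \\
\text{mingw:}\quad & 475.6\,\text{GB} \times (1 - 0.4912) \approx 242.0\,\text{GB} \\
\text{total:}\quad & 22.7\,\text{GB} + 242.0\,\text{GB} \approx 264.7\,\text{GB} \approx 265\,\text{GB}
\end{aligned}
```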

@lazka
Member Author

lazka commented Oct 7, 2020

Somewhat related: I've looked at the download stats on SourceForge, since everything goes there now, and we have about 750 GB of traffic per day. Note that this is only one day, and due to pacman timeouts fewer downloads than normal might have happened.

@mati865
Collaborator

mati865 commented Oct 7, 2020

> Somewhat related: I've looked at the download stats on SourceForge, since everything goes there now, and we have about 750 GB of traffic per day. Note that this is only one day, and due to pacman timeouts fewer downloads than normal might have happened.

They probably come mostly from CI. Maybe after the cleanup we should ask CI vendors to cache/mirror the repo like they do for Linux packages?

@elieux
Member

elieux commented Oct 7, 2020

So due to server complications, the repos have been reduced to around 130 gigs. I plan to sync that with SF.net soon.

@sjohannes

Perhaps it's worth checking with the Arch Linux developers to see if they have insights on this, and if they have tools that can be used. IIRC they're very quick to take down outdated packages from the primary mirror, but they also have an archive repo for those old packages.

@Biswa96
Member

Biswa96 commented Oct 8, 2020

Size ∝ 💰

@cbrt64
Contributor

cbrt64 commented Oct 8, 2020

Sorry, I came here a little late.

> I like the idea of having complete history for reproducibility's sake, [...]

Dumb questions from a hobbyist (who used to build stuff on XP, even after Cygwin/MSYS2 deprecated it):

  1. Will all the older packages be archived anywhere?
  2. Would it make sense to keep at least the packages pointed to by the old msys2 installers, either in repo or archived?
  3. Or just delete/archive the old installers as well?

@lazka
Member Author

lazka commented Oct 8, 2020

> Perhaps it's worth checking with the Arch Linux developers to see if they have insights on this, and if they have tools that can be used

I've asked on IRC and got these suggestions:

they don't really allow keeping old packages and don't deal with source packages

@elieux
Member

elieux commented Oct 10, 2020

> Will all the older packages be archived anywhere?

Not officially. I assume some mirrors are gonna keep the older packages, but that's their discretion.

> Would it make sense to keep at least the packages pointed to by the old msys2 installers, either in repo or archived?

I don't think so. The package databases included in the installers(*) aren't particularly vetted, tested or anything like that. It makes sense to keep a frozen (or nearly frozen) set of packages that honors some compatibility promise we have since broken (such as running on XP or even earlier Windows versions), but that would take extra work and resources which I don't think we can spare.

*) Honestly, I don't even see a reason to ship the databases with the installers. People are supposed to sync them right away anyway.

> Or just delete/archive the old installers as well?

That would help make the rolling-ness of our releases more obvious, but as opposed to packages, I know old installers are used regularly, e.g. in vcpkg.

@flaviojs

In case you didn't know: GitHub repositories have size limits and therefore should avoid binary files, but GitHub releases only limit the individual file size, so they are a very good solution for binary packages (probably intended):

> We don't limit the total size of the binary files in the release or the bandwidth used to deliver them. However, each individual file must be smaller than 2 GB.

I propose creating releases in a new repository (or several, maybe MSYS2-repo and MINGW-repo?) and putting the binary packages there.
The repository could contain whatever pacman needs to know for end users (a map of package names to URLs?).
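As a rough illustration of such a map, here is a sketch that points each package file at a same-named asset using GitHub's release download URL pattern (`https://github.com/<owner>/<repo>/releases/download/<tag>/<asset>`). The owner, repository and tag values are hypothetical, and it assumes assets are uploaded under their original file names; characters such as `+` or `:` are percent-encoded here, but how GitHub itself names such assets on upload is not covered.

```python
from urllib.parse import quote

# Hypothetical layout: one GitHub release (identified by a tag) per package repo.
RELEASE_URL = "https://github.com/{owner}/{repo}/releases/download/{tag}/{asset}"


def package_url_map(filenames, owner, repo, tag):
    """Map each package file name to the URL of a same-named release asset.

    Assumes every package file was uploaded as a release asset under its
    original name; characters outside the URL-safe set are percent-encoded.
    """
    return {
        name: RELEASE_URL.format(owner=owner, repo=repo, tag=tag, asset=quote(name))
        for name in filenames
    }
```

Such a map could be regenerated and committed whenever packages are added to or pruned from a release.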


Now for a more personal request... please create an archive repo and put your old packages there (in releases), including whatever you decide to prune from now on. There is no cost and the process can be automated (after you iron out the details).

Not supporting XP didn't affect me much since 32-bit stuff was still there.
Qt4 is no longer there and that is already affecting me (can't try out stuff that needs it).
Now 32-bit stuff and python 2 are "walking out the door" and that will affect me big time.
It's totally fine that you no longer support what I need, but if the old packages are gone then I can't scrape together an old "msys2 version".

Sometimes I look at old stuff and consider contributing to the project if it is still alive or adapting the project for something else.
That means dealing with code that only works in XP or needs qt4, 32-bit, python2, or whatever else you stop supporting.
Without trying out the original version I can't judge if the effort is worth my time, so I need a way to build/run the old code (usually in msys2/mingw32).

You have https://github.com/msys2/MINGW-packages-dev which can be used as a base for recipes of unsupported packages, so someone could adapt Portage or Homebrew or similar (does Arch Linux have something?) and allow users to build what they need (which you would not support, only provide the infrastructure for the recipes). After that, the next logical step is to allow users to contribute their own recipes.
With this in place, you only need to support packages that the base system needs and the rest is optional.

Personally, I only need the old packages available and a way to know which "old version" of msys2 they belonged to.

@1480c1
Contributor

1480c1 commented Oct 20, 2020

> You have https://github.com/msys2/MINGW-packages-dev which can be used as a base for recipes of unsupported packages, so someone could adapt Portage or Homebrew or similar (does Arch linux have something?) and allow users to build what they need (which you would not support, only provide the infrastructure for the recipes). After that the next logical step is to allow users to contribute their own recipes.
> With this in place, you only need to support packages that the base system needs and the rest is optional.

The closest I can think of is the AUR; some people use helper tools like yaourt and yay to work with it.

@lazka
Member Author

lazka commented Nov 13, 2022

lazka closed this as completed Nov 13, 2022

8 participants