Provide metrics for top N package storage "hogs" #4288

Closed
dstufft opened this issue Jul 12, 2018 · 14 comments

Comments

@dstufft
Member

dstufft commented Jul 12, 2018

What's the problem this feature will solve?

Mirroring PyPI currently takes more than 2 TB of storage, and that is continuing to grow. Some mirroring tools can blacklist projects from being mirrored, but it's difficult to know which projects to target without insight into which packages take up the most space.

Additionally, as operators it can be useful to see if particular packages are consuming more or less of the total space used by PyPI.

Describe the solution you'd like

Add metrics that indicate the top N packages by total space used.
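
A minimal sketch of what such a metric could compute, assuming file sizes are available as (project, bytes) pairs. The function name and the sample records are hypothetical and are not taken from Warehouse:

```python
import heapq
from collections import defaultdict


def top_n_by_size(file_records, n):
    """Return the n projects with the largest summed file size, largest first.

    file_records is an iterable of (project_name, size_in_bytes) pairs,
    one pair per uploaded file (a hypothetical input shape).
    """
    totals = defaultdict(int)
    for project, size_bytes in file_records:
        totals[project] += size_bytes
    # nlargest avoids sorting every project when only the top n are needed.
    return heapq.nlargest(n, totals.items(), key=lambda item: item[1])


# Made-up sample data, for illustration only.
records = [
    ("alpha", 500),
    ("beta", 200),
    ("alpha", 700),
    ("gamma", 900),
    ("beta", 50),
]
print(top_n_by_size(records, 2))  # [('alpha', 1200), ('gamma', 900)]
```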

@cooperlees
Contributor

Maybe we should go with a generic https://pypi.org/stats/ page and add more over time, starting with this one.

Let's start with:

@dstufft
Member Author

dstufft commented Jul 12, 2018

👍

@wayneworkman

I would be highly appreciative of a blacklist that has the biggest 100 projects. This would probably save a ton of space.

@cooperlees
Contributor

This is live - Just tweaking some cache config:
https://pypi.org/stats/

@brainwane
Contributor

Is there anything left to do for this issue or shall we announce it on distutils-sig and close the issue?

@di
Member

di commented Aug 10, 2018

@brainwane I think this is done!

@cooperlees
Contributor

Yeah the initial stuff is all done here. I may add more stats one day.

@ewdurbin
Member

https://pypi.org/stats/

@ewdurbin
Member

Hmmm, I guess there is an open question: do we want to commit to keeping this around by documenting it, both where to find it and the alternate JSON representation available when sending Accept: application/json?
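
For mirror operators, a rough sketch of requesting the JSON representation with the Accept header mentioned above. The payload shape used here (a "top_packages" mapping of project name to {"size": bytes}) is an assumption and should be checked against the live response. The helper names are hypothetical:

```python
import json
import urllib.request

STATS_URL = "https://pypi.org/stats/"


def build_stats_request(url=STATS_URL):
    """Build a request that asks for JSON instead of the HTML page."""
    return urllib.request.Request(url, headers={"Accept": "application/json"})


def fetch_stats(url=STATS_URL, timeout=30):
    """Fetch and decode the stats payload (performs a network call)."""
    with urllib.request.urlopen(build_stats_request(url), timeout=timeout) as resp:
        return json.load(resp)


def largest_projects(payload, n=10):
    """Return up to n (name, size) pairs, largest first.

    Assumes payload["top_packages"] maps project name -> {"size": bytes};
    that shape is an assumption, not something confirmed in this thread.
    """
    top = payload.get("top_packages", {})
    ranked = sorted(top.items(), key=lambda kv: kv[1]["size"], reverse=True)
    return [(name, info["size"]) for name, info in ranked[:n]]
```

Calling `largest_projects(fetch_stats(), 100)` would then give candidate projects to consider excluding from a mirror, if the assumed payload shape holds.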

@ewdurbin ewdurbin reopened this Aug 10, 2018
@ewdurbin
Member

The question basically boils down to whether we want this to be an interim/internal solution for bandersnatch users... or "own it" until we create a better replacement endpoint.

@cooperlees
Contributor

I'm happy to document it.

  • Where should it go?
  • Which API documentation should I model this off?

What does "own it" mean? I don't have a preference for where the API endpoint is. This was @dstufft's suggestion as to where to put it. What alternatives are you thinking of?

@ewdurbin
Member

The endpoint is useful primarily for bandersnatch users and other mirror clients. If we document it and "publicize it", we'll want to ensure that it continues working until we begin and complete the process of deprecating it. Additionally, changes to this endpoint will have to remain backward compatible.

@ewdurbin
Member

@cooperlees adding a page similar to https://github.com/pypa/warehouse/blob/master/docs/api-reference/json.rst is probably good!

@di
Member

di commented Nov 27, 2018

Resolved by #5072.
