API endpoint usage #573

Closed
chris48s opened this issue Aug 20, 2023 · 11 comments


@chris48s

Hello.
There's a thread over on the shields.io repo about adding a PyPI Total Downloads badge using pepy as the source:

badges/shields#4319

Before taking that conversation further, I wanted to open an issue to discuss because:

  • Shields.io handles a lot of traffic
  • The Python community is large and our existing PyPI badges are some of the most popular on the service
  • There is the potential for this to become a big source of traffic
  • We're aware of issue #477 ("[Document] pepy api endpoint's"), which indicates pepy may not be happy to receive a lot of traffic on this endpoint

Are you able to give us an initial indication of whether you'd be happy with us adding this?
Cheers

@psincraian
Owner

Hey @chris48s,

I'm open to collaborating on the integration and am here to help ☺️

For context, we currently handle 5k to 7k requests per hour. I noticed from your issue that you're redirecting 8k requests hourly to pypistats, which would more than double our current volume.

Before proceeding, I need to assess our server's capacity. Though it's feasible to expand pepy's capacity, I'd prefer not to due to potential cost increases. I'll assess this once I'm back from vacation.

Could you please clarify a few things:

  1. Do you experience any peak traffic times we should be aware of?
  2. Is it possible for me to introduce an API key for shields.io?
  3. Are you primarily interested in summary stats (total, monthly, weekly)? If so, I could set up a dedicated endpoint to reduce the database load.
  4. Keep in mind, Pepy is provided as a best-effort service. Would any downtime be a significant issue for you?

Thanks for your cooperation ☺️

@chris48s
Author

Hi. Just acknowledging I've seen your post but I haven't had a chance to reply yet. I'm aiming to reply with answers in the next couple of days. Cheers

@chris48s
Author

> For context, we currently handle 5k to 7k requests per hour. I noticed from your issue that you're redirecting 8k requests hourly to pypistats, which would more than double our current volume.
>
> Before proceeding, I need to assess our server's capacity. Though it's feasible to expand pepy's capacity, I'd prefer not to due to potential cost increases. I'll assess this once I'm back from vacation.

I wouldn't expect us to send that kind of traffic your way immediately. We reached that level of usage with pypistats gradually, over many years of carrying the daily/weekly/monthly badges. A new total downloads badge won't have that many users straight away: on day one, the traffic will be close to zero. PyPI badges are some of the most popular services on shields.io, though.

In terms of keeping usage down, the main thing we can do is cache the badges downstream at the CDN. This means that a badge embedded in the README of a popular project is only requested from upstream periodically; most views are served from the cache. Our default max-age for a downloads badge is 20 minutes. Given you only update the data once per day, I'd suggest we set a much longer max-age for pepy, which should keep the traffic lower.

Side note: they've never complained about it, but thinking this through and writing it up has made me realise pypistats also only update daily and we haven't customised the default 😬, so I'm going to submit a PR that will also reduce the amount of traffic we send their way.
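To illustrate why a longer max-age matters, here is a rough Python simulation of a CDN-style cache sitting in front of the upstream API. It is not shields.io's actual caching code: `TTLCache` and `upstream_calls_per_day` are hypothetical names, and the simulation assumes a single badge URL viewed steadily once per minute. Under those assumptions, a 20-minute max-age sends 72 requests per day upstream for that badge, while a 24-hour max-age sends one.

```python
import time
from typing import Callable, Dict, Tuple


class TTLCache:
    """CDN-style cache: a key is refetched from upstream only once its entry is max_age seconds old."""

    def __init__(self, fetch: Callable[[str], int], max_age: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.fetch = fetch
        self.max_age = max_age
        self.clock = clock
        self._entries: Dict[str, Tuple[float, int]] = {}
        self.upstream_calls = 0

    def get(self, key: str) -> int:
        now = self.clock()
        entry = self._entries.get(key)
        if entry is not None and now - entry[0] < self.max_age:
            return entry[1]  # still fresh: served from cache, upstream untouched
        self.upstream_calls += 1  # stale or missing: go to the upstream API
        value = self.fetch(key)
        self._entries[key] = (now, value)
        return value


def upstream_calls_per_day(max_age: float, requests_per_minute: int = 1) -> int:
    """Simulate one day of steady views for a single badge URL and count upstream fetches."""
    current = [0.0]  # fake clock so the simulation runs instantly
    cache = TTLCache(fetch=lambda package: 0, max_age=max_age, clock=lambda: current[0])
    total_requests = 24 * 60 * requests_per_minute
    interval = 86400 / total_requests
    for i in range(total_requests):
        current[0] = i * interval
        cache.get("some-package")
    return cache.upstream_calls


print(upstream_calls_per_day(max_age=20 * 60))       # 72 (default 20-minute max-age)
print(upstream_calls_per_day(max_age=24 * 60 * 60))  # 1 (max-age matching a daily refresh)
```

Real CDN traffic is burstier and spread over many badge URLs, so treat these numbers as an order-of-magnitude comparison rather than a forecast.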

> Do you experience any peak traffic times we should be aware of?

Our demand curve is pretty predictable. We serve the most traffic during working hours in Europe and North America, and the least when it is daylight over the Pacific Ocean. We also see a dip at weekends. We scale our own infra on a schedule rather than in response to traffic.

> Is it possible for me to introduce an API key for shields.io?

Short answer: Yes.
Slightly longer answer:

  • What would be the purpose of the key? Would it be to manage a rate limit, or just to identify traffic from shields? If it is just for identifying us, we already make ourselves known by sending a User-Agent header.
  • We're happy to use an API key in production, but it makes things easier for us if we can also call the API without authentication from integration tests in CI.
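A minimal sketch of the client side of this, assuming the key travels in an HTTP header: it always identifies the caller with a User-Agent and only attaches a key when one is configured, so CI can run without credentials. The URL, the `X-API-Key` header name, the `PEPY_API_KEY` environment variable and the User-Agent string are assumptions for illustration, not details confirmed in this thread.

```python
import os
import urllib.request
from typing import Optional

# Assumptions: endpoint URL, header name and environment variable are illustrative only.
API_URL = "https://api.pepy.tech/api/v2/projects/{package}"
USER_AGENT = "Shields.io/example"  # hypothetical value identifying the caller


def build_request(package: str, api_key: Optional[str] = None) -> urllib.request.Request:
    """Always send a User-Agent; attach an API key only when one is configured."""
    headers = {"User-Agent": USER_AGENT}
    if api_key:
        headers["X-API-Key"] = api_key
    return urllib.request.Request(API_URL.format(package=package), headers=headers)


# Production would read the key from the environment; in CI it can simply be unset,
# in which case the request is built without any key header.
request = build_request("requests", api_key=os.environ.get("PEPY_API_KEY"))
```

Note that `urllib.request.Request` normalises header names (for example to `X-api-key`); HTTP header names are case-insensitive, so a conforming server treats them the same.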

> Are you primarily interested in summary stats (total, monthly, weekly)? If so, I could set up a dedicated endpoint to reduce the database load.

I think the only number we would want from pepy is total_downloads. If you wanted to set up a more efficient endpoint that only returns that, that would be cool.
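Whichever endpoint is used, the badge only needs to pull out that one field. A sketch assuming the response is a JSON object with a top-level integer `total_downloads` field; the exact response shape is not specified in this thread, and `parse_total_downloads` is a hypothetical helper name.

```python
import json


def parse_total_downloads(body: str) -> int:
    """Extract only total_downloads from a JSON body; every other field is ignored.

    Assumes a JSON object with a top-level integer "total_downloads" field.
    """
    data = json.loads(body)
    value = data.get("total_downloads") if isinstance(data, dict) else None
    # bool is a subclass of int in Python, so rule it out explicitly.
    if not isinstance(value, int) or isinstance(value, bool) or value < 0:
        raise ValueError(f"unexpected total_downloads value: {value!r}")
    return value
```

Rejecting anything that isn't a non-negative integer means a changed or broken response surfaces as an error instead of a misleading number on the badge.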

> Keep in mind, Pepy is provided as a best-effort service. Would any downtime be a significant issue for you?

In general we try to avoid adding badges for services we know to be unreliable: it gives users a poor experience and generates support requests for us. That said, there isn't a formal minimum uptime threshold. If you were regularly experiencing a lot of downtime, I'd be hesitant to add this, but if you just do your best without providing an SLA, that's fine. Shields is also a volunteer-run service.

@hugovk
Contributor

hugovk commented Aug 29, 2023

One important thing to note: is it still the case that PePy includes downloads from all sources, that is, from PyPI and from all mirrors (such as bandersnatch, z3c.pypimirror, Artifactory, and devpi)?

For example, see #164, where people noticed that PePy's numbers are much higher than those from pypistats, whose endpoints mostly exclude mirrors (one endpoint reports both with and without). See the pypistats FAQ.

  • If mirrors are included by PePy, can an endpoint be provided for Shields.io that only gives PyPI numbers?

  • If not, can Shields.io be careful not to misleadingly label the badge as a PyPI one, and name it some other way?

PS Thank you both for all your work on PePy and Shields.io, they're both excellent tools! 👏
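The difference hugovk describes comes down to which download records are counted. A toy sketch: given per-installer download counts (a hypothetical `DownloadRecord` shape with made-up numbers), it computes one total with mirrors and one without, treating the installers listed in the comment as mirrors. It only illustrates the distinction; it is not how pepy or pypistats actually compute their figures.

```python
from dataclasses import dataclass
from typing import Iterable, Tuple

# Mirror clients named in the comment; compared case-insensitively.
MIRROR_INSTALLERS = frozenset(
    name.lower() for name in ("bandersnatch", "z3c.pypimirror", "Artifactory", "devpi")
)


@dataclass(frozen=True)
class DownloadRecord:
    installer: str  # hypothetical field: name of the client that downloaded the file
    count: int


def download_totals(records: Iterable[DownloadRecord]) -> Tuple[int, int]:
    """Return (with_mirrors, without_mirrors) totals over the same records."""
    with_mirrors = 0
    without_mirrors = 0
    for record in records:
        with_mirrors += record.count
        if record.installer.lower() not in MIRROR_INSTALLERS:
            without_mirrors += record.count
    return with_mirrors, without_mirrors


# Made-up sample data, for illustration only.
records = [
    DownloadRecord("pip", 100),
    DownloadRecord("bandersnatch", 50),
    DownloadRecord("Artifactory", 25),
]
print(download_totals(records))  # (175, 100)
```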

@chris48s
Author

This point about including/excluding mirrors is noted in badges/shields#4319 (comment)

@chris48s
Author

I've added an additional note on it to badges/shields#4319 (comment)

@psincraian
Owner

Let me try to answer from my phone:

  1. OK, so this will only apply to new badges. I think that will make it much easier for me to predict the traffic and see whether the service is struggling.

  2. Perfect 👍 That's similar to what I've observed.

  3. Mainly, the idea is rate limiting: I'd rather not have unknown traffic overloading the service. I can set a higher limit for shields, e.g. 10x the current traffic.

     I can also keep the endpoint public, but with a much lower threshold, e.g. 1 request per second. Would that make things easier for your CI?

  4. Perfect. Given what you said, I think you can rely on the current endpoint, and if I add a new one I can raise a pull request on your project ☺️

  5. Understood 👍 I think we're aligned in terms of SLA. Our uptime over the last year has been >99%.
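The two-tier scheme in point 3 (a higher limit for a known API key, a much lower one such as 1 request per second for unauthenticated callers) could be sketched with token buckets. This is a hypothetical illustration, not pepy's implementation: the class names, the `shields-key` key and the rates are made up, and in this sketch an unrecognised key simply falls back to the shared anonymous limit.

```python
import time
from dataclasses import dataclass
from typing import Callable, Dict, Optional


@dataclass
class TokenBucket:
    rate: float      # tokens refilled per second
    capacity: float  # largest burst allowed
    tokens: float
    last: float

    def allow(self, now: float) -> bool:
        # Refill in proportion to elapsed time, never beyond capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


class TieredRateLimiter:
    """Known API keys get their own bucket and limit; all other traffic shares one low-limit bucket."""

    def __init__(self, key_rates: Dict[str, float], anonymous_rate: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.key_rates = key_rates
        self.anonymous_rate = anonymous_rate
        self.clock = clock
        self.buckets: Dict[str, TokenBucket] = {}

    def allow(self, api_key: Optional[str]) -> bool:
        if api_key is not None and api_key in self.key_rates:
            bucket_id, rate = api_key, self.key_rates[api_key]
        else:
            bucket_id, rate = "<anonymous>", self.anonymous_rate
        now = self.clock()
        bucket = self.buckets.get(bucket_id)
        if bucket is None:
            burst = max(rate, 1.0)  # allow at least one request per bucket
            bucket = TokenBucket(rate=rate, capacity=burst, tokens=burst, last=now)
            self.buckets[bucket_id] = bucket
        return bucket.allow(now)


fake_now = [0.0]  # fixed fake clock for a deterministic example
limiter = TieredRateLimiter({"shields-key": 10.0}, anonymous_rate=1.0, clock=lambda: fake_now[0])
print([limiter.allow(None) for _ in range(2)])                # [True, False]
print(sum(limiter.allow("shields-key") for _ in range(12)))  # 10
```

Keeping a separate bucket per key means heavy anonymous traffic cannot eat into the allowance reserved for a known integration, and vice versa.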

@psincraian
Owner

@hugovk I know that people are interested in downloads by installer, but in the survey I ran in June it wasn't the top priority for the people who answered it.

I'll focus first on what most people are interested in (more historical data), and then probably implement this ☺️

@chris48s
Author

Hello.

We implemented these badges a few months back.

At the moment our usage is still very low.

We recently started getting 401 Unauthorized responses ({"message":"Invalid API Key"}, reported in badges/shields#9730), so I guess you've implemented API keys. How can we get one?
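On the consuming side, it can help to treat a 401 like this one separately from ordinary upstream errors, so a missing or revoked key shows up as an auth problem rather than as apparent downtime. A hypothetical sketch: `InvalidApiKeyError` and `check_response` are illustrative names, and the body shape is taken from the `{"message":"Invalid API Key"}` response quoted in the comment.

```python
import json


class InvalidApiKeyError(Exception):
    """Hypothetical error type: the upstream rejected our credentials."""


def check_response(status: int, body: str) -> str:
    """Raise a distinct error on 401 so an auth problem isn't reported as generic downtime."""
    if status == 401:
        try:
            message = json.loads(body).get("message", "unauthorized")
        except (ValueError, AttributeError):
            # Body was not JSON, or was JSON but not an object.
            message = "unauthorized"
        raise InvalidApiKeyError(message)
    if status != 200:
        raise RuntimeError(f"upstream returned HTTP {status}")
    return body
```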

@psincraian
Owner

Hey @chris48s

Sorry, I thought this wasn't done yet. You only need to

Let me know if you have any questions or problems ☺️

If not, I'll close this issue.

@chris48s
Author

Thanks. Sorry. I should have closed this after we merged badges/shields#9564

I've created an account, made some keys, and checked that they work. I won't have a chance to make the code updates until the weekend, but it should be straightforward.

I'll close this now. Cheers.


3 participants