Stale package indexes served by pypi.org #8568

Closed
cboylan opened this issue Sep 15, 2020 · 7 comments

cboylan commented Sep 15, 2020

Describe the bug

We've been noticing that pip installs semi-regularly fail to find the latest version of packages.

OpenStack projects tend to lock package versions with exact constraints from our global requirements repository, so missing packages cause the CI jobs to fail. We suspect others may be seeing similar stale-index issues, but with the more common <= type constraints they probably fall back to older versions listed in the stale index without a hard break.

Digging into it further, it appears that the cause is Fastly and/or PyPI itself serving out-of-date package indexes that don't include the latest releases. Our local caches then cache these responses, which is bad for our CI jobs but has been helpful in debugging. What we see in the cached responses is that the Fastly LCY cache is party to these requests. The responses we get don't include serials in the headers or in the index.html body. The index.html responses also appear to be sorted by file type then version instead of version then file type. They also lack the Python interpreter version requirements metadata that we've come to expect. Finally, we see that the Last-Modified headers are very old (some as old as April when the latest version is from August).
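
Two of the symptoms listed above (no serial in the headers, a very old Last-Modified) can be checked mechanically against saved response headers. The following is only a rough sketch of such a check, not the tooling used to gather the attached data. The header name `X-PyPI-Last-Serial`, the function name `stale_index_signals`, and the 30-day threshold are assumptions made for illustration.

```python
from datetime import datetime, timedelta, timezone
from email.utils import parsedate_to_datetime


def stale_index_signals(headers, now=None, max_age=timedelta(days=30)):
    """Return a list of reasons a simple-index response looks stale.

    `headers` maps response header names to values. The serial header
    name and the age threshold are assumptions, not confirmed behavior.
    """
    now = now or datetime.now(timezone.utc)
    lowered = {name.lower(): value for name, value in headers.items()}
    signals = []

    # A fresh response is assumed to carry the index serial.
    if "x-pypi-last-serial" not in lowered:
        signals.append("missing serial header")

    # A very old Last-Modified suggests an out-of-date copy.
    last_modified = lowered.get("last-modified")
    if last_modified:
        modified_at = parsedate_to_datetime(last_modified)
        if modified_at.tzinfo is None:
            # Treat naive timestamps as UTC so the subtraction is valid.
            modified_at = modified_at.replace(tzinfo=timezone.utc)
        if now - modified_at > max_age:
            signals.append("Last-Modified older than %d days" % max_age.days)

    return signals
```

An empty list means neither signal fired. It does not prove the index is current, since a stale response could still carry both headers.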

We have attached headers and data content for:

Expected behavior

pip should be able to install the latest version of these packages. It needs to get up to date indexes for that to happen.

To Reproduce

Reproducing this is difficult. Basically, it's pip install taskflow===4.5.0 before a new taskflow release happens, repeated many, many times. Thankfully we've got our own local data, which I've attached to hopefully narrow down the cause.

My Platform

The attached header files include user agent info. In general it's pip 9.0.3, pip 20.2.2, and pip 20.2.3 across a variety of Linux platforms.

Additional context

This is a small sampling. We have seen this occur with a number of packages around the world in the last day or so.

Member

dstufft commented Sep 15, 2020

Our internal PyPI mirror had run out of disk space and had gotten stuck back on 8/2/2020. When Fastly fell back to that mirror, it would effectively serve a very stale copy of the simple index.

I've disabled our internal mirror in Fastly, and I'm currently refreshing the mirror. Once that's done I'll re-enable the mirror backend.

Member

dstufft commented Sep 25, 2020

This should be fixed now.

dstufft closed this as completed Sep 25, 2020

@artificial-intelligence

It seems this is a recurring error, as we are currently having problems reaching pypi.org mirrors; see e.g. here:

https://zuul.opendev.org/t/openstack/build/2433c9e910a648098e0a32e1b3c41c39/log/kolla/build/000_FAILED_kolla-toolbox.log#847

As far as I understand, there is an underlying issue: the pypi.org CDN fetches packages from a backend "a", and if backend "a" is not available it fetches them from a backend "b", which tends to be out of sync.

Could someone confirm this to be true?

If this is true, could this maybe get fixed?

If I can assist in fixing this, please tell me how.

Kind regards.

Member

di commented Jun 28, 2023

In your example, it looks like you're using the http://mirror-int.ord.rax.opendev.org mirror/repository and it's missing the pbr===5.11.1 dependency. This is likely not a problem with PyPI: you can verify that PyPI includes this version in the Simple API:

$ curl -s https://pypi.org/simple/pbr/ | grep 5.11.1
    <a href="https://files.pythonhosted.org/packages/01/06/4ab11bf70db5a60689fc521b636849c8593eb67a2c6bdf73a16c72d16a12/pbr-5.11.1-py2.py3-none-any.whl#sha256=567f09558bae2b3ab53cb3c1e2e33e726ff3338e7bae3db5dc954b3a44eef12b" data-requires-python="&gt;=2.6" >pbr-5.11.1-py2.py3-none-any.whl</a><br />
    <a href="https://files.pythonhosted.org/packages/02/d8/acee75603f31e27c51134a858e0dea28d321770c5eedb9d1d673eb7d3817/pbr-5.11.1.tar.gz#sha256=aefc51675b0b533d56bb5fd1c8c6c0522fe31896679882e1c4c63d5e4a0fccb3" data-requires-python="&gt;=2.6" >pbr-5.11.1.tar.gz</a><br />
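
The same check can be run against a saved copy of an index page, for example one taken from a caching proxy. The sketch below is a minimal illustration using only the standard library. The function name `files_for_version` is hypothetical, and the filename matching is deliberately naive (it assumes plain `.tar.gz` or `.zip` sdists and does no project-name normalization).

```python
from html.parser import HTMLParser


class _LinkTextCollector(HTMLParser):
    """Collect the text of every <a> element in a simple-index page."""

    def __init__(self):
        super().__init__()
        self._in_anchor = False
        self.filenames = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._in_anchor = True

    def handle_endtag(self, tag):
        if tag == "a":
            self._in_anchor = False

    def handle_data(self, data):
        if self._in_anchor and data.strip():
            self.filenames.append(data.strip())


def files_for_version(index_html, project, version):
    """Return the filenames listed for `project` at exactly `version`."""
    parser = _LinkTextCollector()
    parser.feed(index_html)
    parser.close()
    prefix = "%s-%s" % (project, version)
    sdists = {prefix + ".tar.gz", prefix + ".zip"}
    return [
        name for name in parser.filenames
        # Wheels continue with "-<tags>.whl"; sdists end right after the version.
        if name.startswith(prefix + "-") or name in sdists
    ]
```

An empty result for a version that is known to be released would suggest the saved page is a stale copy.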

@artificial-intelligence

This is from our cache, as far as I understand it (@cboylan knows more about our infrastructure than I do), and that cache is only serving this stale version because that's what it got from upstream pypi.org at the time.

That's what this issue is about, afaik; see also the first comment in this issue.

Author

cboylan commented Jun 28, 2023

Yes, that mirror isn't a true mirror. Instead it is a caching proxy for pypi.org. In this case pip is saying that pbr==5.11.1 isn't satisfiable, and I'm fairly certain that is due to the fallback behavior that was addressed here a few years ago (the behavior is basically identical: we get a stale index back and then constraints can't be fulfilled). Back then we also ran around in circles, with pypi being adamant the problem couldn't be on their end. Would it be possible to at least do a quick check of the secondary backend (assuming that is still a thing) to make sure there isn't an obvious fault, so that we can avoid running around in circles as we did before?

> This is likely not a problem with PyPI: you can verify that PyPI includes this version in the Simple API:

Yes, this is why debugging this before was so painful. It works the vast majority of the time because the CDN frontend isn't falling back to the secondary backend except in very rare cases. Reproducing this is almost impossible as an end user because we cannot induce failures between the CDN and the primary backend for pypi.

Side note: If this issue is recurring, I believe it may represent a security concern, because most people installing packages from pypi with pip are not using constraints or lockfiles. This means that pip will happily install older versions of software that otherwise meet the resolver requirements. This could potentially install old, insecure versions of software without users being aware.
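
The difference between the two failure modes can be shown with a toy version picker. This is a simplified illustration only, not pip's resolver: `pick_version` is a hypothetical helper, versions are assumed to be plain dotted integers, and the version lists are made up apart from 4.5.0.

```python
def parse_version(text):
    """Turn '4.5.0' into (4, 5, 0); only handles plain dotted integers."""
    return tuple(int(part) for part in text.split("."))


def pick_version(available, minimum=None, exact=None):
    """Return the newest acceptable version string, or None if none fits."""
    candidates = []
    for text in available:
        version = parse_version(text)
        if exact is not None and version != parse_version(exact):
            continue
        if minimum is not None and version < parse_version(minimum):
            continue
        candidates.append(text)
    return max(candidates, key=parse_version, default=None)


fresh_index = ["4.3.0", "4.4.0", "4.5.0"]
stale_index = ["4.3.0", "4.4.0"]  # hypothetical stale copy missing 4.5.0

# An exact pin fails loudly against the stale index.
pinned = pick_version(stale_index, exact="4.5.0")

# A range constraint silently settles for the older release.
ranged = pick_version(stale_index, minimum="4.0.0")
```

With the stale list, the pinned request finds nothing, while the ranged request quietly returns an older version. That quiet downgrade is the case the side note is worried about.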

Member

dstufft commented Jun 28, 2023

pypi/infra#143
