Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Possible bug with cache TTL expiration resulting in redownloading files #1084

Closed
jriddy opened this issue Oct 20, 2020 · 13 comments
Closed

Possible bug with cache TTL expiration resulting in redownloading files #1084

jriddy opened this issue Oct 20, 2020 · 13 comments
Assignees

Comments

@jriddy
Copy link

jriddy commented Oct 20, 2020

Using pex version 2.1.19

The current default cache-ttl of 1 hr (3600 s) is far too short for pants use cases where you end up changing dependencies a lot over a short period of time. This results in frequent unnecessary re-downloading and rebuilding of library wheels/sdists when you're changing dependencies frequently (something that can happen very easily and almost inadvertently with pants' dependency inference). I spent a day or two trying to figure out why it was taking 5 min to build constraints files all the time

This issue is compounded by pants not having a way to control this flag (gonna make an issue over there for that), nor is there any documentation on how to set this value to never expire that I can find.

That said, I don't fully understand why TTL caching is even at play here. When you download python packages, you have hashes in the filenames that you can check against what you have in your cache. If the hashes match...there doesn't need to be a download regardless of TTL or cache expiry issues. Perhaps there are other issues at play when it comes to determining which file to download, but once that determination is made, it seems like we could safely let checksums rule the day.

I can provide logs of this and the trouble that it causes if needed.

@jsirois
Copy link
Member

jsirois commented Oct 20, 2020

It probably makes sense to decouple Pants from the Pex defaults here with Pants work to plumb these options.

Moving to your hashes / why is TTL in play: unless a requirement is exact, once a TTL expires, the ranged requirement sort of has to be looked up again. I'd be surprised if the new PIp resolver does anything different here compared to the old Pip resolver (which Pex currently uses).

@jriddy
Copy link
Author

jriddy commented Oct 20, 2020

Thanks for the quick feedback

why is TTL in play

I can see why it is in play for to check for new requirements, but if that check results in a found link to a file we've already downloaded, why does it have to download it again? But come to think of it, I think pip shares this behavior, with perhaps a more reasonable default.

the ranged requirement sort of has to be looked up again
I'm using pinned requirements in a constraints file. I don't really want a resolution, I want the dependencies i've specified and already resolved previously (although I know pex and internally pip still have to locate files with matching tags and find wheels/sdists etc).

I think what we're really missing here for these use cases is a lock file.

@jsirois
Copy link
Member

jsirois commented Oct 20, 2020

I think what we're really missing here for these use cases is a lock file.

Pants has that: https://www.pantsbuild.org/docs/python-third-party-dependencies#using-a-lockfile-strongly-recommended

@jsirois
Copy link
Member

jsirois commented Oct 20, 2020

... as does Pex which Pants passes through to using pex --constraints=...

@jriddy
Copy link
Author

jriddy commented Oct 20, 2020

Pants has that: https://www.pantsbuild.org/docs/python-third-party-dependencies#using-a-lockfile-strongly-recommended

... as does Pex which Pants passes through to using pex --constraints=...

And following this advice is how I discovered this issue. I keep hitting huge slowdowns in one of the pex commands pants runs to resolve the constraints file whenever it changes. And if I don't change it when my requirements change, then I run into slowdowns as it tries to resolve my requirements each time.

The constraints file is not a lock file, not if pex still has to do a resolution. A lock file is the output of a resolution, not an input to it. For real lock file support I opened a feature request: #1086

@jriddy
Copy link
Author

jriddy commented Oct 20, 2020

For reference, here's a pex command that pants runs that results in sometimes extremely slow builds:

pex \
-vvvvvv \
--python-path \
/home/jrloc/.pyenv/versions/3.6.9/bin:/home/jrloc/.pyenv/versions/3.7.7/bin:/home/jrloc/.pyenv/versions/3.8.5/bin:/home/jrloc/.pyenv/versions/miniconda3-4.7.12/bin:/home/jrloc/.pyenv/shims:/home/jrloc/.pyenv/bin:/home/jrloc/.local/bin:/home/linuxbrew/.linuxbrew/bin:/home/jrloc/.cargo/bin:/home/linuxbrew/.linuxbrew/opt/fzf/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/snap/bin \
--pex-root \
/home/jrloc/.cache/pants/named_caches/pex_root \
--tmpdir \
/tmp/tmppex \
--output-file \
requirements.pex \
--no-pypi \
--index=https://pypi.org/simple/ \
--not-zip-safe \
--no-emit-warnings \
--jobs \
2 \
--manylinux \
manylinux2014 \
--constraints \
constraints.txt \
--sources-directory=source_files \
Flask-BasicAuth==0.2.0 \
Flask==1.0.2 \
Jinja2==2.11.2 \
MarkupSafe==1.1.1 \
Pillow==8.0.0 \
PyYAML==5.3.1 \
Pygments==2.7.1 \
Werkzeug==0.16.1 \
aniso8601==8.0.0 \
apispec-webframeworks==0.5.2 \
apispec==4.0.0 \
attrs==20.2.0 \
avro-python3==1.9.0 \
backcall==0.2.0 \
boto3==1.9.100 \
botocore==1.12.253 \
certifi==2020.6.20 \
chardet==3.0.4 \
click==7.1.2 \
consulate==0.6.0 \
cytoolz==0.11.0 \
'dataclasses==0.7; python_version < "3.7"' \
decorator==4.4.2 \
dnspython3==1.15.0 \
dnspython==1.15.0 \
docutils==0.15.2 \
filelock==3.0.12 \
flask-apispec==0.10.0 \
flask-consulate==0.2.0 \
flask-restplus==0.12.1 \
flatten-dict==0.0.3.post1 \
idna==2.10 \
importlib-metadata==2.0.0 \
ipython-genutils==0.2.0 \
ipython==7.16.1 \
itsdangerous==1.1.0 \
jedi==0.17.2 \
jmespath==0.10.0 \
joblib==0.17.0 \
jsonschema==3.2.0 \
marshmallow-dataclass==8.1.0 \
marshmallow-enum==1.5.1 \
marshmallow==3.8.0 \
mypy-extensions==0.4.3 \
numpy==1.19.2 \
opencv-python-headless==4.4.0.44 \
pants==1.0.1 \
parso==0.7.1 \
pathlib2==2.3.5 \
pexpect==4.8.0 \
pickleshare==0.7.5 \
pip==20.2.4 \
prompt-toolkit==3.0.8 \
ptyprocess==0.6.0 \
pyrsistent==0.17.3 \
python-dateutil==2.8.1 \
pytz==2020.1 \
redis==3.2.1 \
requests==2.24.0 \
s3transfer==0.2.1 \
scikit-learn==0.23.2 \
scipy==1.5.3 \
setuptools==46.1.3 \
six==1.15.0 \
statsd-telegraf==3.2.1.post1 \
threadpoolctl==2.1.0 \
toolz==0.11.1 \
traitlets==4.3.3 \
typeguard==2.9.1 \
typing-extensions==3.7.4.3 \
typing-inspect==0.6.0 \
urllib3==1.25.10 \
wcwidth==0.2.5 \
webargs==6.1.1 \
wheel==0.34.2 \
zipp==3.3.1

This command has been reformatted and edited slightly to i could run it outside of pants to debug this problem

@Eric-Arellano
Copy link
Contributor

unless a requirement is exact, once a TTL expires, the ranged requirement sort of has to be looked up again.

Ah, so, if I understand correctly, turning off cache TTL would mean a loose req like Django will never pull in new versions once cached. Makes sense.

--

Do you know why it might be taking >3 minutes to resolve Josh's constraints file when the TTL is expired, even though few dependencies changed; but only ~30 seconds if the TTL is not expired? That is, how much overhead do you expect when it expires? Do you know if we actually redownload files, for example, or solely ping to see if the version has changed?

(I'm being lazy not looking; I can dig if you're not already familiar.)

@jsirois
Copy link
Member

jsirois commented Oct 20, 2020

Do you know why it might be taking >3 minutes to resolve Josh's constraints file when the TTL is expired, even though few dependencies changed; but only ~30 seconds if the TTL is not expired?

A quick search of the code over in Pants shows we do not pass --intransitive to Pex when emulating lockfile resolves. In order to emulate a lockfile you need two things:

  1. A fully pinned, fully transitive explicit set of requirements.
  2. --intransitive

Number 1 does not require passing --constraints, it just requires using a psuedo-lockfile fully pinned, fully transitive constraints file contents to be passed as requirements.

Number 2 is required since a fully pinned set of requirements, even with a fully pinned set of constraints does not guaranty closure. For example, say I have 1 requirement, requests==1.0.0 and constraints requests==1.0.0. Pex (Pip) can't know if requests 1.0.0 has ranged dependencies without reading requests distribution metadata and fully recursing. That recursion could reveal a non-pinned, non-constrained transitive dep. As such, you have to have pre-arranged the fully pinned, fully transitive property and then tell Pex (and Pip) "trust me" aka --intransitive.

@jriddy
Copy link
Author

jriddy commented Oct 20, 2020

I see the--intransitive issue, that makes sense, and I'll update #1086 to reflect that. But I think there's still a bug in the caching logic here...

I've attached a log of the session I did with --cache-ttl=1 to test this. From what I can see, this cache-ttl factor is applied to both the resolution stage (which is fine), and to the download stage (which I think is not).

At the start of the attached excerpt, I can see it scanning the project for links and looking to select one that matches. Past all the "skipping links" parts, it finds the link, and performs a cache lookup for that exact file, finds it and decides to redownload it again anyways:

Oct 19 16:15:36 Given no hashes to check 2 links for project 'MarkupSafe': discarding no candidates
Oct 19 16:15:36 Using version 1.1.1 (newest of versions: 1.1.1)
Oct 19 16:15:36 Collecting MarkupSafe==1.1.1
Oct 19 16:15:36   Created temporary directory: /tmp/tmppex/pip-unpack-ztij8yvb
Oct 19 16:15:36   Looking up "https://files.pythonhosted.org/packages/4b/20/f6d7648c81cb84815d0be935d5c74cd1cc0239e43eadb1a61062d34b6543/MarkupSafe-1.1.1-cp38-cp38-manylinux1_x86_64.whl" in the cache
Oct 19 16:15:36   Current age based on date: 1203
Oct 19 16:15:36   Ignoring unknown cache-control directive: immutable
Oct 19 16:15:36   Freshness lifetime from max-age: 365000000
Oct 19 16:15:36   Freshness lifetime from request max-age: 1
Oct 19 16:15:36   https://files.pythonhosted.org:443 "GET /packages/4b/20/f6d7648c81cb84815d0be935d5c74cd1cc0239e43eadb1a61062d34b6543/MarkupSafe-1.1.1-cp38-cp38-manylinux1_x86_64.whl HTTP/1.1" 200 32690
Oct 19 16:15:36   Downloading MarkupSafe-1.1.1-cp38-cp38-manylinux1_x86_64.whl (32 kB)
Oct 19 16:15:36   Ignoring unknown cache-control directive: immutable

This is the behavior I find kinda baffling. I'm not sure what "given no hashes to check links means here", since the hashes for these files is in the URL itself. It doesn't need to download this again.

pex-logexerpt.txt

@jriddy
Copy link
Author

jriddy commented Oct 20, 2020

I've added --intransivite to my test command and I'm getting the exact same behavior. It's redownloading everything. Despite the files being present in the pex cache. Is this perhaps related to the line Ignoring unknown cache-control directive: immutable in the above excerpt?

@benjyw benjyw self-assigned this Oct 20, 2020
Eric-Arellano added a commit to pantsbuild/pants that referenced this issue Oct 21, 2020
Closes #10994. Even with the PEX bug fix proposed in pex-tool/pex#1084, users may want to configure this advanced option.

[ci skip-rust]
[ci skip-build-wheels]
Eric-Arellano added a commit to Eric-Arellano/pants that referenced this issue Oct 21, 2020
Closes pantsbuild#10994. Even with the PEX bug fix proposed in pex-tool/pex#1084, users may want to configure this advanced option.

[ci skip-rust]
[ci skip-build-wheels]
@benjyw benjyw assigned Eric-Arellano and unassigned benjyw Oct 21, 2020
@Eric-Arellano Eric-Arellano changed the title Cache TTL mechanism has too low a default, and could be better served by hash checks Possible bug with cache TTL expiration resulting in redownloading files Oct 21, 2020
@jsirois
Copy link
Member

jsirois commented Dec 14, 2020

@jriddy it appears to be the case in command line tests that Pip, with a Cache-Control: max-age=N where N > 0 (this is how --cache-ttl is plumbed) re-fetches and re-builds once the TTL expires. With Cache-Control: max-age=0 - the Pip default - a conditional GET will always be requested, but when that returns 304, the cached wheel / tgz is re-used. As such, the default setting of Pex here of --cache-ttl=3600 is not generally useful and certainly not for Pants. In the upgrade to Pip 20.3.1 in #1133 the support for setting this goes away; so Pex will return to a default of always using a conditional GET which proves to provide better latency on re-resolves. The fix to not re-resolve at all when a lockfile is used tracked by #1086 will be the proper remedy.

@jriddy
Copy link
Author

jriddy commented Dec 14, 2020

Thanks for the update @jsirois.

I've mostly worked around these issues on my end by setting --cache-ttl to very high numbers, but I'm looking forward to this release, as I do change my constraints.txt files pretty often.

Thanks for the great work as always!

@Eric-Arellano
Copy link
Contributor

Sounds like this was fixed in Pex.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants