Remote packages always downloaded #9113

Closed
tfausak opened this issue Jul 11, 2023 · 5 comments · Fixed by #9116

Comments

@tfausak
Collaborator

tfausak commented Jul 11, 2023

Describe the bug
When specifying a remote package in the packages field of the cabal.project file, that package is always downloaded when building. There does not appear to be any way to cache it.

To Reproduce
Steps to reproduce the behavior:

Create a cabal.project file like this:

packages:
  .
  https://hackage.haskell.org/package/flow-2.0.0.3/flow-2.0.0.3.tar.gz

Then set up a local package just to give Cabal something to build. Then run cabal build and you should see this output:

$ cabal build 
Downloading
https://hackage.haskell.org/package/flow-2.0.0.3/flow-2.0.0.3.tar.gz
Resolving dependencies...
# ...

That's all expected. However, if you run cabal build again, it will download the remote package again even though it hasn't changed:

$ cabal build
Downloading
https://hackage.haskell.org/package/flow-2.0.0.3/flow-2.0.0.3.tar.gz
Up to date

Expected behavior
The remote package should only be downloaded once. Obviously remote resources can change, so perhaps this should be based on some header like Content-MD5 or ETag or Last-Modified.

System information

$ uname -a
Linux acf2eb0d828d 6.3.12-orbstack-00209-ga2cd8129a099 #1 SMP Mon Jul 10 05:35:25 UTC 2023 aarch64 GNU/Linux
$ ghc --version
The Glorious Glasgow Haskell Compilation System, version 9.6.2
$ cabal --version
cabal-install version 3.10.1.0
compiled using version 3.10.1.0 of the Cabal library 

Additional context
I'm using a package tarball from Hackage, but I see the same behavior from other sources such as GitHub.

@andreabedini
Collaborator

🤔 just having a quick look at what is going on. It might be the case that cabal is not actually redownloading the entire file but using curl to check that it has not changed on the server

❯ cabal build -v --dry-run flow
Project settings changed, reconfiguring...
creating /home/andrea/tmp-tmp/dist-newstyle
creating /home/andrea/tmp-tmp/dist-newstyle/cache
Downloading
https://hackage.haskell.org/package/flow-2.0.0.3/flow-2.0.0.3.tar.gz
creating /home/andrea/tmp-tmp/dist-newstyle/src
Running: /usr/bin/curl 'https://hackage.haskell.org/package/flow-2.0.0.3/flow-2.0.0.3.tar.gz' --output /home/andrea/tmp-tmp/dist-newstyle/src/flow-2.0._-9bfb21efca6f401d.tar27893-0.gz --location --write-out '%{http_code}' --user-agent 'cabal-install/3.10.1.0 (linux; x86_64)' --silent --show-error --dump-header /home/andrea/tmp-tmp/dist-newstyle/src/curl-headers27893-1.txt --header 'If-None-Match: /home/andrea/tmp-tmp/dist-newstyle/src/flow-2.0._-9bfb21efca6f401d.tar.gz.etag'
Downloaded to
/home/andrea/tmp-tmp/dist-newstyle/src/flow-2.0._-9bfb21efca6f401d.tar.gz
this build was affected by the following (project) config files:
- /home/andrea/tmp-tmp/cabal.project
Component graph for flow-2.0.0.3: component lib
component flow-2.0.0.3-ae6bb30974af1e44d5bbd05aaa8208efeac241af1e3a8eefc01812222ec36fb3
    include base-4.17.1.0
unit flow-2.0.0.3-ae6bb30974af1e44d5bbd05aaa8208efeac241af1e3a8eefc01812222ec36fb3
    include base-4.17.1.0
    Flow=flow-2.0.0.3-ae6bb30974af1e44d5bbd05aaa8208efeac241af1e3a8eefc01812222ec36fb3:Flow
Build profile: -w ghc-9.4.5 -O1
In order, the following would be built:
 - flow-2.0.0.3 (lib) (requires build)

If my curl-fu doesn't fail me, this makes curl ask the server to send the file only if the etag has changed.
In any case this is a network request, and the log message is misleading.
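For readers following along, here is a minimal sketch of the conditional-request idea described above. It is not Cabal's actual code: the names curlArgs, DownloadResult and interpretStatus are hypothetical, and the only curl flags used are the ones visible in the log. It assumes the saved ETag value is sent back in an If-None-Match header, and that the server answers 304 when the file has not changed.

```haskell
-- Hypothetical helper, not taken from cabal-install.
-- Builds curl arguments for a download that is conditional on a saved ETag.
curlArgs :: String -> FilePath -> FilePath -> Maybe String -> [String]
curlArgs url outFile headerFile mbETag =
  [ url
  , "--output", outFile
  , "--location"
  , "--write-out", "%{http_code}"
  , "--silent", "--show-error"
  , "--dump-header", headerFile
  ]
  -- Only add the conditional header when an ETag was saved earlier.
  ++ maybe [] (\etag -> ["--header", "If-None-Match: " ++ etag]) mbETag

-- Hypothetical result type for the status code printed by --write-out.
data DownloadResult = NotModified | Downloaded | Failed Int
  deriving (Eq, Show)

-- 304 means the cached copy is still valid; 200 means a fresh body was sent.
interpretStatus :: Int -> DownloadResult
interpretStatus 304 = NotModified
interpretStatus 200 = Downloaded
interpretStatus code = Failed code
```

With no saved ETag the header is simply omitted, so the request is unconditional and the full file is transferred every time.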

@tfausak what would be your expectation? that the file is downloaded once and never checked again?

@tfausak
Collaborator Author

tfausak commented Jul 11, 2023

Oops, I was tricked by the "Downloading" output! This is indeed doing the right thing for package tarballs from Hackage.

However it does consistently (and actually) re-download package tarballs from GitHub. For example with the following cabal.project:

packages:
  .
  https://github.com/tfausak/flow/releases/download/2.0.0.3/flow-2.0.0.3.tar.gz

I see this output:

$ cabal build --verbose
Project settings changed, reconfiguring...
creating /.../dist-newstyle
creating /.../dist-newstyle/cache
Downloading
https://github.com/tfausak/flow/releases/download/2.0.0.3/flow-2.0.0.3.tar.gz
creating /.../dist-newstyle/src
Running: /usr/bin/curl 'https://github.com/tfausak/flow/releases/download/2.0.0.3/flow-2.0.0.3.tar.gz' --output /.../dist-newstyle/src/flow-2.0._-234aa17389853b97.tar30445-0.gz --location --write-out '%{http_code}' --user-agent 'cabal-install/3.10.1.0 (linux; aarch64)' --silent --show-error --dump-header /.../dist-newstyle/src/curl-headers30445-1.txt
Downloaded to
/.../dist-newstyle/src/flow-2.0._-234aa17389853b97.tar.gz
this build was affected by the following (project) config files:
- /.../cabal.project
Compiler settings changed, reconfiguring...
# ...

$ cabal build --verbose
Project settings changed, reconfiguring...
creating /.../dist-newstyle
creating /.../dist-newstyle/cache
Downloading
https://github.com/tfausak/flow/releases/download/2.0.0.3/flow-2.0.0.3.tar.gz
creating /.../dist-newstyle/src
Running: /usr/bin/curl 'https://github.com/tfausak/flow/releases/download/2.0.0.3/flow-2.0.0.3.tar.gz' --output /.../dist-newstyle/src/flow-2.0._-234aa17389853b97.tar30855-0.gz --location --write-out '%{http_code}' --user-agent 'cabal-install/3.10.1.0 (linux; aarch64)' --silent --show-error --dump-header /.../dist-newstyle/src/curl-headers30855-1.txt
Downloaded to
/.../dist-newstyle/src/flow-2.0._-234aa17389853b97.tar.gz
this build was affected by the following (project) config files:
- /.../cabal.project
Component graph for flow-2.0.0.3: component lib
# ...

I think that's happening because GitHub redirects to the actual file:

$ http head https://github.com/tfausak/flow/releases/download/2.0.0.3/flow-2.0.0.3.tar.gz
HTTP/1.1 302 Found
Cache-Control: no-cache
Content-Length: 0
Content-Security-Policy: default-src 'none'; ...
Content-Type: text/html; charset=utf-8
Date: Tue, 11 Jul 2023 14:08:56 GMT
Location: https://objects.githubusercontent.com/github-production-release-asset-2e65be/33275764/2328f060-978d-4623-a92c-9793ab808385?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=AKIAIWNJYAX4CSVEH53A%2F20230711%2Fus-east-1%2Fs3%2Faws4_request&X-Amz-Date=20230711T140856Z&X-Amz-Expires=300&X-Amz-Signature=c1a6a94e32fd0bdb6e507e95b885e8c75db5bd2a3d150bef35e373dde6c3a559&X-Amz-SignedHeaders=host&actor_id=0&key_id=0&repo_id=33275764&response-content-disposition=attachment%3B%20filename%3Dflow-2.0.0.3.tar.gz&response-content-type=application%2Foctet-stream
Referrer-Policy: no-referrer-when-downgrade
Server: GitHub.com
Strict-Transport-Security: max-age=31536000; includeSubdomains; preload
Vary: X-PJAX, X-PJAX-Container, Turbo-Visit, Turbo-Frame, Accept-Encoding, Accept, X-Requested-With
X-Content-Type-Options: nosniff
X-Frame-Options: deny
X-GitHub-Request-Id: D263:8068:1C95CF1:2972DC7:64AD6277
X-XSS-Protection: 0

$ http head 'https://objects.githubusercontent.com/github-production-release-asset-2e65be/33275764/2328f060-978d-4623-a92c-9793ab808385?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=AKIAIWNJYAX4CSVEH53A%2F20230711%2Fus-east-1%2Fs3%2Faws4_request&X-Amz-Date=20230711T140856Z&X-Amz-Expires=300&X-Amz-Signature=c1a6a94e32fd0bdb6e507e95b885e8c75db5bd2a3d150bef35e373dde6c3a559&X-Amz-SignedHeaders=host&actor_id=0&key_id=0&repo_id=33275764&response-content-disposition=attachment%3B%20filename%3Dflow-2.0.0.3.tar.gz&response-content-type=application%2Foctet-stream'
HTTP/1.1 200 OK
Accept-Ranges: bytes
Age: 260
Connection: keep-alive
Content-Disposition: attachment; filename=flow-2.0.0.3.tar.gz
Content-Length: 5076
Content-MD5: +9NIk9fDZh2EvwWFilGqqg==
Content-Type: application/octet-stream
Date: Tue, 11 Jul 2023 14:13:40 GMT
ETag: "0x8DB27EC05D466FD"
Fastly-Restarts: 1
Last-Modified: Sat, 18 Mar 2023 20:04:46 GMT
Server: Windows-Azure-Blob/1.0 Microsoft-HTTPAPI/2.0
Via: 1.1 varnish, 1.1 varnish
X-Cache: MISS, HIT
X-Cache-Hits: 0, 1
X-Served-By: cache-iad-kcgs7200142-IAD, cache-fty21356-FTY
X-Timer: S1689084821.736045,VS0,VE32
x-ms-blob-type: BlockBlob
x-ms-creation-time: Sat, 18 Mar 2023 20:04:46 GMT
x-ms-lease-state: available
x-ms-lease-status: unlocked
x-ms-request-id: b63478ba-101e-0044-1e01-b46b6a000000
x-ms-server-encrypted: true
x-ms-version: 2020-04-08

@andreabedini
Collaborator

I am pretty confident that cabal uses the same logic for remote packages: entries, independently of where they are hosted. 🤔

Notice that in your case the argument --header 'If-None-Match: ...' is missing. Maybe we don't parse the output headers correctly and we don't see the etag? Also, when following a redirect, which headers do we get? So many questions :D

@tfausak
Copy link
Collaborator Author

tfausak commented Jul 12, 2023

I think Cabal is failing to save the ETag from GitHub because GitHub uses a lowercase header name (etag: ...) but Cabal looks for an uppercase one (ETag: ...):

| ["ETag:", etag] <- map words (lines headers)

$ curl \
    --dump-header github-headers.txt \
    --location \
    --output /dev/null \
    https://github.com/tfausak/flow/releases/download/2.0.0.3/flow-2.0.0.3.tar.gz

$ grep -i etag github-headers.txt
etag: "0x8DB27EC05D466FD"
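A case-insensitive version of that match could look roughly like the sketch below. This is an illustration only, not the patch that was merged: findETag is a hypothetical name, and taking the last match (so that the final response after a redirect wins) is an assumption.

```haskell
import Data.Char (toLower)
import Data.Maybe (listToMaybe)

-- Hypothetical helper, not taken from cabal-install.
-- Finds an ETag value in raw response headers (for example the file written
-- by curl's --dump-header), ignoring the case of the header name.
findETag :: String -> Maybe String
findETag headers =
  -- Assumption: when several responses were dumped (redirects), the last
  -- ETag belongs to the final response, so prefer that one.
  listToMaybe . reverse $
    [ etag
    | [name, etag] <- map words (lines headers)
    , map toLower name == "etag:"
    ]
```

Because words splits on any whitespace, the trailing carriage return on each header line is dropped, so both "ETag: ..." and "etag: ..." lines yield the quoted value.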

@andreabedini
Collaborator

Far out! Good find @tfausak!

ulysses4ever pushed a commit that referenced this issue Jul 16, 2023
Mikolaj pushed a commit that referenced this issue Jul 16, 2023
@mergify mergify bot closed this as completed in #9116 Jul 16, 2023
mergify bot added a commit that referenced this issue Jul 16, 2023
Use case insensitive match on ETag headers
mergify bot pushed a commit that referenced this issue Jul 16, 2023
Fixes #9113.

(cherry picked from commit d4d17d0)

# Conflicts:
#	cabal-install/src/Distribution/Client/HttpUtils.hs
tfausak added a commit that referenced this issue Jul 27, 2023
mergify bot added a commit that referenced this issue Jul 29, 2023

3 participants