Specific URL fails to be downloaded from different galaxy instances #17995

Open
pcm32 opened this issue Apr 16, 2024 · 3 comments · May be fixed by #18003

Comments

@pcm32
Member

pcm32 commented Apr 16, 2024

Describe the bug

The GTF URL from 10x given below fails to download on 3 different Galaxy instances (Org, EU, a private 22.05 instance, and another private one on a different version). The download works from browsers or with wget on the same servers. The file is fairly big (~9 GB), but the error appears within seconds, so it doesn't seem to be size-related, and no error message mentions size. I suspect that something in the client identification Galaxy sends in its HTTP requests might be upsetting the 10x hosting somehow, or at least upsetting it in a way that wget doesn't.
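One way to see the "client identification" in question is to inspect the default headers that Python's `urllib` attaches to every request. This is a minimal sketch; it assumes the relevant Galaxy code path uses `urllib` with its default opener, which the later discussion of "the default urllib user agent" suggests but this snippet does not verify.

```python
import urllib.request

# build_opener() returns an OpenerDirector; its .addheaders list holds the
# headers that urlopen()/urlretrieve() add unless a request overrides them.
opener = urllib.request.build_opener()
default_headers = dict(opener.addheaders)

# Prints something like "Python-urllib/3.12": the suffix is the running
# interpreter's major.minor version, so the header identifies a script.
print(default_headers["User-agent"])
```

Wget and browsers send their own, different agent strings, so a host that filters on this header would treat them differently from an unmodified `urllib` client.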

Galaxy Version and/or server at which you observed the bug
Galaxy Version: multiple versions above 21.x
Commit:

Browser and Operating System
Operating System: macOS and Windows
Browser: Chrome

To Reproduce
Steps to reproduce the behavior:

  1. Go to 'upload data'
  2. Paste the URL https://cf.10xgenomics.com/supp/cell-exp/refdata-gex-GRCm39-2024-A.tar.gz into the Paste/Fetch data section

Expected behavior
The file should download correctly; instead, different errors are shown (HTTP 403, 'fetch_url_allowlist').

Screenshots

[two screenshots]


@pcm32
Member Author

pcm32 commented Apr 16, 2024

A colleague here ran a small test from the command line with Python and urllib and got an HTTP 403 Forbidden response when trying to fetch
https://cf.10xgenomics.com/supp/cell-exp/refdata-gex-GRCm39-2024-A.tar.gz
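The 403 can be reproduced offline against a local stand-in server. This is a hypothetical sketch: it assumes the host rejects requests whose User-Agent starts with `Python-urllib/`, which the Wget test in the next comment hints at but does not confirm; 10x's actual filtering rule is unknown. `UserAgentFilterHandler` and `fetch_status` are illustrative names, not part of Galaxy.

```python
import threading
import urllib.error
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


class UserAgentFilterHandler(BaseHTTPRequestHandler):
    """Hypothetical host that refuses urllib's default User-Agent."""

    def do_GET(self):
        agent = self.headers.get("User-Agent", "")
        if agent.startswith("Python-urllib/"):
            self.send_response(403)
            body = b"Forbidden\n"
        else:
            self.send_response(200)
            body = b"payload\n"
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the demo output quiet


# An empty ProxyHandler keeps any http_proxy env vars from hijacking localhost.
_OPENER = urllib.request.build_opener(urllib.request.ProxyHandler({}))


def fetch_status(url, user_agent=None):
    """Return the HTTP status for url, optionally overriding the User-Agent."""
    headers = {"User-Agent": user_agent} if user_agent else {}
    request = urllib.request.Request(url, headers=headers)
    try:
        with _OPENER.open(request) as response:
            return response.status
    except urllib.error.HTTPError as err:
        return err.code  # 4xx/5xx responses are raised as HTTPError


server = HTTPServer(("127.0.0.1", 0), UserAgentFilterHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
url = f"http://127.0.0.1:{server.server_port}/refdata.tar.gz"

default_status = fetch_status(url)            # urllib's default agent
wget_status = fetch_status(url, "Wget/1.14")  # agent from the workaround
server.shutdown()
server.server_close()
print(default_status, wget_status)
```

With the assumed filter in place, the default agent gets 403 while the Wget agent gets 200, matching the pattern reported in this thread.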

@pcm32
Member Author

pcm32 commented Apr 16, 2024

He then tried setting the User-Agent header to something more common:

>>> import urllib.request
>>> # Replace urllib's default "Python-urllib/3.x" User-Agent header
>>> opener = urllib.request.build_opener()
>>> opener.addheaders = [('User-Agent', 'Wget/1.14')]
>>> urllib.request.install_opener(opener)  # urlretrieve() now uses this opener
>>> urllib.request.urlretrieve(
...   "https://cf.10xgenomics.com/supp/cell-exp/refdata-gex-GRCm39-2024-A.tar.gz",
...   "test.tar.gz")

That worked. Is there any way to influence this without code changes?
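The REPL workaround calls `install_opener`, which swaps the global opener for the whole process. A narrower variant attaches the header to a single `Request` and streams the body to disk, which matters for a ~9 GB file. This is a sketch under the assumption that any non-default agent string is accepted; `make_request`, `download`, and the `USER_AGENT` value are illustrative, not Galaxy code.

```python
import shutil
import urllib.request

URL = "https://cf.10xgenomics.com/supp/cell-exp/refdata-gex-GRCm39-2024-A.tar.gz"
USER_AGENT = "Wget/1.14"  # hypothetical choice, copied from the test above


def make_request(url: str, user_agent: str) -> urllib.request.Request:
    """Build a Request that carries its own User-Agent header."""
    # A header set on the Request takes precedence over the opener's default.
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


def download(url: str, dest: str, user_agent: str, chunk_size: int = 1 << 20) -> None:
    """Stream url to dest in 1 MiB chunks so the file never sits in memory."""
    request = make_request(url, user_agent)
    with urllib.request.urlopen(request) as response, open(dest, "wb") as out:
        shutil.copyfileobj(response, out, chunk_size)


# Usage (starts the full multi-GB download, so it is left commented out):
# download(URL, "test.tar.gz", USER_AGENT)
```

Keeping the header per request avoids changing the behavior of unrelated code in the same process that also goes through `urllib`.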

mvdbeek self-assigned this Apr 16, 2024
mvdbeek linked a pull request Apr 16, 2024 that will close this issue
@mvdbeek
Member

mvdbeek commented Apr 16, 2024

Probably not. However, blocking the default urllib user agent seems like a bold move from 10x; maybe they're willing to go with something less drastic? I assume lots of people writing scripts would be affected by this.

#18003 will set the user agent to galaxy/{VERSION}
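Setting a `galaxy/{VERSION}` agent could look roughly like the following with `urllib`. This is a hypothetical sketch, not the code from #18003: the helper names and the version string are placeholders, and the actual PR may use a different HTTP library or mechanism.

```python
import urllib.request

GALAXY_VERSION = "24.0"  # hypothetical; a real server would use its own version


def galaxy_user_agent(version: str) -> str:
    """Format an agent string in the galaxy/{VERSION} shape."""
    return f"galaxy/{version}"


def galaxy_opener(version: str) -> urllib.request.OpenerDirector:
    """Return an opener whose requests identify as Galaxy, not Python-urllib."""
    opener = urllib.request.build_opener()
    # Replaces the default ("User-agent", "Python-urllib/x.y") entry.
    opener.addheaders = [("User-Agent", galaxy_user_agent(version))]
    return opener


opener = galaxy_opener(GALAXY_VERSION)
print(opener.addheaders)
```

An application-specific agent also gives hosts like 10x something concrete to allowlist or contact about, rather than a generic library default.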
