HEAD requests should be HEAD requests upon redirect #99730

Closed
haampie opened this issue Nov 23, 2022 · 6 comments
Labels
stdlib Python modules in the Lib dir type-bug An unexpected behavior, bug, or error

Comments

@haampie
Contributor

haampie commented Nov 23, 2022

Bug report

Currently the following is False

from urllib.request import Request, urlopen

len(urlopen(Request("http://google.com", method="HEAD")).read()) == 0  # False

But this is True

len(urlopen(Request("http://www.google.com", method="HEAD")).read()) == 0 # True

This is because http://google.com redirects with 302 to http://www.google.com.

This means that checking for the existence of a file by URL will actually download the file when the URL responds with a redirect. This makes no sense. Also, the HTTP spec says nothing about changing HEAD requests into GET requests; it just says that everything but GET and HEAD requests should require user interaction on redirect. Python violates that, but there's a comment explaining that it's an active choice to violate the spec there.

To me it seems like this is an oversight. Note that curl -LI http://google.com also sticks to HEAD requests.
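The behaviour can be checked without depending on google.com. The following is a minimal sketch, not part of the original report: it starts a throwaway local server that redirects `/old` to `/new` with a 302 and records which HTTP method each request used. The names `RedirectingHandler`, `seen_methods` and `methods_seen_for_head` are made up for illustration.

```python
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.request import Request, urlopen

seen_methods = []  # HTTP methods the local server received, in order


class RedirectingHandler(BaseHTTPRequestHandler):
    """Hypothetical test server: /old redirects (302) to /new."""

    def _respond(self):
        seen_methods.append(self.command)
        if self.path == "/old":
            self.send_response(302)
            self.send_header("Location", "/new")
            self.send_header("Content-Length", "0")
            self.end_headers()
        else:
            body = b"payload"
            self.send_response(200)
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            if self.command != "HEAD":  # a HEAD response carries no body
                self.wfile.write(body)

    do_GET = do_HEAD = _respond

    def log_message(self, *args):
        pass  # keep the example quiet


def methods_seen_for_head():
    """Send one HEAD request to /old; return (methods seen, body length)."""
    seen_methods.clear()
    server = HTTPServer(("127.0.0.1", 0), RedirectingHandler)  # port 0: any free port
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        url = f"http://127.0.0.1:{server.server_port}/old"
        with urlopen(Request(url, method="HEAD")) as resp:
            body = resp.read()
    finally:
        server.shutdown()
        server.server_close()
    return list(seen_methods), len(body)


if __name__ == "__main__":
    print(methods_seen_for_head())
```

If the report is accurate, an affected Python version should show the second request arriving as GET with a non-empty body, while a version that keeps HEAD across the redirect should show two HEAD requests and an empty body.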

@haampie haampie added the type-bug An unexpected behavior, bug, or error label Nov 23, 2022
@bcail

bcail commented Oct 11, 2023

@haampie is there anything I can do to help with this issue/PR?

@haampie
Contributor Author

haampie commented Oct 12, 2023

I guess it should be compared to other tools. A while after reporting this issue and implementing my own redirect handler, I hit some issue where sending a HEAD request to a server that redirects to an AWS bucket with temporary credentials would only work if the redirect was using a GET request, and error with Unauthorized if it was HEAD. Dunno how common this is.
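For readers on a Python version without the fix, a custom redirect handler along these lines is one way to keep HEAD across redirects. This is only a sketch and not the handler haampie actually wrote: `HeadPreservingRedirectHandler` and `head_following_redirects` are hypothetical names, and it assumes the stock `HTTPRedirectHandler.redirect_request` has already decided whether the redirect should be followed.

```python
from urllib.request import HTTPRedirectHandler, Request, build_opener


class HeadPreservingRedirectHandler(HTTPRedirectHandler):
    """Hypothetical handler: reuse the stock logic, but keep HEAD as HEAD."""

    def redirect_request(self, req, fp, code, msg, headers, newurl):
        # Let the standard handler build (or refuse) the follow-up request.
        new_req = super().redirect_request(req, fp, code, msg, headers, newurl)
        if new_req is not None and req.get_method() == "HEAD":
            # Request.method overrides the default GET/POST choice.
            new_req.method = "HEAD"
        return new_req


def head_following_redirects(url, timeout=10):
    """Send a HEAD request, following redirects; return (status, final URL)."""
    opener = build_opener(HeadPreservingRedirectHandler)
    with opener.open(Request(url, method="HEAD"), timeout=timeout) as resp:
        return resp.status, resp.geturl()
```

As the comment above notes, some redirect targets may reject HEAD where they would accept GET, so a caller might still want to catch `urllib.error.HTTPError` and decide whether retrying with GET is acceptable.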

@bcail

bcail commented Oct 12, 2023

OK, here's confirmation that the requests package stays with HEAD:

>>> import requests
>>> r = requests.head('http://google.com', allow_redirects=True)
>>> r.url
'http://www.google.com/'
>>> len(r.content)
0

And I'm seeing curl stay with the HEAD request as well.

Here's an MDN doc that doesn't come down strongly on one side or the other.

@bcail

bcail commented Oct 24, 2023

@orsenthil any thoughts on this issue, whether the request handling can be updated to not change a HEAD to a GET on a redirect?

@iritkatriel iritkatriel added the stdlib Python modules in the Lib dir label Nov 28, 2023
@orsenthil
Member

> @orsenthil any thoughts on this issue, whether the request handling can be updated to not change a HEAD to a GET on a redirect?

PR #99731 is approved. We will merge it and make the behavior change.

@encukou
Member

encukou commented May 1, 2024

Merged; it'll be in 3.13.
Since this is a behaviour change, I don't think backporting to maintenance branches is appropriate.
