Unexpected exception: InvalidURL: No host specified. #4866

Closed
stratosgear opened this issue Nov 9, 2018 · 7 comments

Comments

@stratosgear

I have some legacy code that calls requests like this:

response = requests.get(file_resource['url'], stream=True)

Through debug statements right before that statement I see that the passed url is like:

http://xxxxxx.xxxx.xxx.int/xxx-sl-int/data-action?ProductType=MAP&MAP.MAP_OID=12655

That url, btw, correctly resolves into a 200MB file, in the browser or through a curl/wget download.

Expected Result

I'd expect my file to be downloaded when I later iterate over the response:

for block in response.iter_content(100 * 1024):
    handle.write(block)

Actual Result

I get the following stack trace:

Failed with: Traceback (most recent call last):

  File "/usr/local/lib/python2.7/dist-packages/rq/worker.py", line 710, in perform_job
    rv = job.perform()

  File "/usr/local/lib/python2.7/dist-packages/rq/job.py", line 560, in perform
    self._result = self._execute()

  File "/usr/local/lib/python2.7/dist-packages/rq/job.py", line 566, in _execute
    return self.func(*self.args, **self.kwargs)

  File "/plaavi/components/file_cache/worker.py", line 1131, in download
    return worker.download(fileResource)

  File "/plaavi/components/file_cache/worker.py", line 112, in download
    response = requests.get(file_resource['url'], stream=True)

  File "/usr/local/lib/python2.7/dist-packages/requests/api.py", line 75, in get
    return request('get', url, params=params, **kwargs)

  File "/usr/local/lib/python2.7/dist-packages/requests/api.py", line 60, in request
    return session.request(method=method, url=url, **kwargs)

  File "/usr/local/lib/python2.7/dist-packages/requests/sessions.py", line 533, in request
    resp = self.send(prep, **send_kwargs)

  File "/usr/local/lib/python2.7/dist-packages/requests/sessions.py", line 668, in send
    history = [resp for resp in gen] if allow_redirects else []

  File "/usr/local/lib/python2.7/dist-packages/requests/sessions.py", line 247, in resolve_redirects
    **adapter_kwargs

  File "/usr/local/lib/python2.7/dist-packages/requests/sessions.py", line 646, in send
    r = adapter.send(request, **kwargs)

  File "/usr/local/lib/python2.7/dist-packages/requests/adapters.py", line 414, in send
    raise InvalidURL(e, request=request)

InvalidURL: No host specified.

Reproduction Steps

>>> import requests
>>> response = requests.get("http://xxxxxx.xxxx.xxx.int/xxx-sl-int/data-action?ProductType=MAP&MAP.MAP_OID=12655", stream=True)
>>> response
<Response [200]>

Apparently, the same request works when I try it from (roughly) the same machine that raises the exception.

What is different in the failing case is that the download request happens inside an RQ (http://python-rq.org/) worker running inside a docker container, which I do not know how to debug further. Mind you, this download infrastructure has been working correctly for the last two years, and only now has it started to have these download issues.

System Information

The docker container is setup like this:

$ python -m requests.help
root@9dd4bc57a320:/tmp# python -m requests.help
{
  "chardet": {
    "version": "3.0.4"
  },
  "cryptography": {
    "version": "2.2.1"
  },
  "idna": {
    "version": ""
  },
  "implementation": {
    "name": "CPython",
    "version": "2.7.6"
  },
  "platform": {
    "release": "3.10.0-862.11.6.el7.x86_64",
    "system": "Linux"
  },
  "pyOpenSSL": {
    "openssl_version": "1010007f",
    "version": "18.0.0"
  },
  "requests": {
    "version": "2.20.1"
  },
  "system_ssl": {
    "version": "1000106f"
  },
  "urllib3": {
    "version": "1.24.1"
  },
  "using_pyopenssl": true
}

This command is only available on Requests v2.16.4 and greater. Otherwise,
please provide some basic information about your system (Python version,
operating system, &c).

Thanks!

@stratosgear
Author

I double-checked all the Python requirements, and the only outdated package among the ones mentioned above is cryptography (2.2.1 installed, 2.3.1 available).

I will try to upgrade that one, and report if it makes any difference.

@stratosgear
Author

OK, that didn't help, although upgrading cryptography now leads to:

root@4d17f93cfd7a:/tmp# python -m requests.help
/usr/local/lib/python2.7/dist-packages/cryptography/hazmat/primitives/constant_time.py:26: CryptographyDeprecationWarning: Support for your Python version is deprecated. The next version of cryptography will remove support. Please upgrade to a 2.7.x release that supports hmac.compare_digest as soon as possible.
  utils.DeprecatedIn23,
{
  "chardet": {
    "version": "3.0.4"
  },
  "cryptography": {
    "version": "2.3.1"
  },
  "idna": {
    "version": ""
  },
  "implementation": {
    "name": "CPython",
    "version": "2.7.6"
  },
  "platform": {
    "release": "3.10.0-862.11.6.el7.x86_64",
    "system": "Linux"
  },
  "pyOpenSSL": {
    "openssl_version": "1010009f",
    "version": "18.0.0"
  },
  "requests": {
    "version": "2.20.1"
  },
  "system_ssl": {
    "version": "1000106f"
  },
  "urllib3": {
    "version": "1.24.1"
  },
  "using_pyopenssl": true
}

But I really think that this is unrelated!

Thanks, once again.

@nateprewitt
Member

@stratosgear, this looks like an issue with the URI you’re providing. If you use the code below, what is the value returned? If it’s None or empty, then this is a problem with the URI or a bug in the urlparse function in the Python standard library.

from requests.compat import urlparse

URL = "http://yourURI.here"

parsed = urlparse(URL)
print(parsed.hostname)

@stratosgear
Author

stratosgear commented Nov 10, 2018

Hmmm, I do not think so. I used the sample you suggested (taken from here I guess) and the url is parsed correctly.

I do get: xxxxxx.xxxx.xxx.int just as I should (sorry for obfuscating it)

I'm afraid I'll have to bite the bullet and fork requests, add additional debug statements, point to my fork, rebuild the docker containers and try again, unless other suggestions pop up... :(

@stratosgear
Author

stratosgear commented Nov 12, 2018

Further debugging reveals that resolve_redirects at sessions.py:665 follows a nonexistent redirect (as my url does not redirect anywhere) and somehow drops the host part of the url, as the debug output below shows (it just prints the value of the url in various places):

11/12/2018 11:26:07 AM  api.py:75 url: http://xxxxxx.xxxx.xxx.int/pla-sl-int/data-action?ProductType=MAP&MAP.MAP_OID=12655
11/12/2018 11:26:07 AM  adapters.py:412 url: http://xxxxxx.xxxx.xxx.int/pla-sl-int/data-action?ProductType=MAP&MAP.MAP_OID=12655
11/12/2018 11:26:07 AM  adapters.py:412 url: https:///pla-sl-int/data-action?ProductType=MAP&MAP.MAP_OID=12655

As I do not understand what goes on in 'resolve_redirects' and it is difficult for me to debug it in the place this occurs due to my setup, is there anyone else that can understand what goes on in that method and why it drops the hostname....?

Thanks!

Note: I'm debugging from the v2.20.1 release branch.
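The third log line already accounts for the exception message: a URL of the form https:///path has an empty host component. A minimal sketch of that check, assuming Python 3's urllib.parse (on the Python 2.7 setup in this thread the same function is available as requests.compat.urlparse); the helper name host_of is hypothetical:

```python
from urllib.parse import urlparse

# URLs copied from the debug output above (host obfuscated by the reporter).
original = "http://xxxxxx.xxxx.xxx.int/pla-sl-int/data-action?ProductType=MAP&MAP.MAP_OID=12655"
redirected = "https:///pla-sl-int/data-action?ProductType=MAP&MAP.MAP_OID=12655"


def host_of(url):
    """Return the hostname component of a URL, or None when it is missing."""
    return urlparse(url).hostname


print(host_of(original))    # hostname is present
print(host_of(redirected))  # None: nothing sits between "//" and the path
```

A host-less URL like the second one is what an HTTP client has to reject, so the failure happens on the redirect target, not on the URL originally passed to requests.get.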

@sigmavirus24
Contributor

follows a non existent redirect, (as my url does not redirect anywhere),

We do exactly what the server tells us meaning that your URL must redirect somewhere.

This is not a bug in Requests and this is not a forum for helping you debug your infrastructure. It's only for actual defects in Requests, which this does not appear to be.
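One way to see what the server actually sends is to disable redirect following and read the Location header directly. The helper below is a hypothetical sketch, not part of Requests; it only assumes an object with status_code and headers attributes, which is what requests.get(url, allow_redirects=False) returns. FakeResponse is a stand-in so the sketch runs without network access.

```python
REDIRECT_CODES = (301, 302, 303, 307, 308)


def redirect_target(response):
    """Return the raw Location header of a redirect response, else None."""
    if response.status_code in REDIRECT_CODES:
        return response.headers.get("Location")
    return None


class FakeResponse:
    """Hypothetical stand-in for a response object, used for illustration only."""

    def __init__(self, status_code, headers):
        self.status_code = status_code
        self.headers = headers


# A redirect whose Location lacks a host, like the one in the debug output.
broken = FakeResponse(302, {"Location": "https:///pla-sl-int/data-action"})
ok = FakeResponse(200, {})

print(redirect_target(broken))
print(redirect_target(ok))
```

Inside the failing worker, passing the result of requests.get(file_resource['url'], allow_redirects=False) to this helper would show whether the server answers with a redirect and where it points.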

@stratosgear
Author

For completeness' sake: this was an issue in our code base. There was a rogue:

httplib.HTTPConnection._http_vsn = 10
httplib.HTTPConnection._http_vsn_str = 'HTTP/1.0'

I have no idea what it was trying to achieve. Removing it lets the url resolve correctly.
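The thread does not say why this override breaks the redirect. One plausible explanation, offered as an assumption: with _http_vsn set to 10, the standard library client no longer adds a Host header, and a server that builds its redirect Location from that header would then produce a URL with an empty host. The sketch below shows only the client side, in Python 3 (where httplib is named http.client). CapturingConnection and request_head are hypothetical names, and the override is applied to a single instance instead of the class.

```python
import http.client  # named httplib on Python 2, as in the thread


class CapturingConnection(http.client.HTTPConnection):
    """HTTPConnection that records outgoing bytes instead of using a socket."""

    def __init__(self, host, http10=False):
        super().__init__(host)
        self.sent = b""
        if http10:
            # Same private attributes as in the thread, set on this instance only.
            self._http_vsn = 10
            self._http_vsn_str = "HTTP/1.0"

    def send(self, data):
        # Collect the bytes rather than writing them to the network.
        self.sent += data


def request_head(http10):
    """Build a GET request and return its request line and headers as text."""
    conn = CapturingConnection("example.invalid", http10=http10)
    conn.putrequest("GET", "/data-action?ProductType=MAP")
    conn.endheaders()
    return conn.sent.decode("ascii")


print(request_head(http10=False))  # includes a "Host: example.invalid" header
print(request_head(http10=True))   # HTTP/1.0 request line, no Host header
```

Whether the server in this issue really derives the redirect from the Host header is not confirmed anywhere in the thread; the sketch only demonstrates that the override changes what the client sends.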

And yes, I know this is not a forum, but I did not ask anyone to help me debug my infrastructure. I honestly thought that I had come across a requests bug, and I did whatever was humanly possible to debug the issue as best I could, even patching requests to help me find where the issue was. I provided a step-by-step progress report on what I was doing along the way. OK, in the end it was an issue in our infrastructure, but that last comment was uncalled for, and it really leaves me walking away from here with a bad taste in my mouth...

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Sep 6, 2021

3 participants