Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

IndexError when handle malformed http response #1286

Closed
beruhan opened this issue Nov 29, 2017 · 8 comments
Closed

IndexError when handle malformed http response #1286

beruhan opened this issue Nov 29, 2017 · 8 comments

Comments

@beruhan
Copy link

beruhan commented Nov 29, 2017

I use urllib3 receive a http response like this,

HTTP/1.1 200 OK
	Content-Type: application/octet-stream
	Content-Length: 89606
	Content-Disposition: attachment; filename="MB-500Ap_2009-01-12.cfg"
	Connection: close

Brickcom-50xA
OperationSetting.locale=auto
HostName.name=cam
ModuleInfo.DIDO_module=1
ModuleInfo.PIR_module=0
ModuleInfo.WLED=0
SensorFPSSetting.fps=0
ModuleInfo.AUTOIRIS_module=0
ModuleInfo.IRCUT_module=0
ModuleInfo.IRLED_module=0
ModuleInfo.lightsensor=0
ModuleInfo.EXPOSURE_module=0
ModuleInfo.MDNS_module=0
ModuleInfo.PTZ_module=1
ModuleInfo.MSN_module=0
ModuleInfo.WIFI_module=0
ModuleInfo.watchDog_module=0
ModuleInfo.sdcard_module=1
ModuleInfo.usbstorage_module=0
ModuleInfo.sambamount_module=0
ModuleInfo.QoS=0
ModuleInfo.shutter_speed=0
ModuleInfo.discovery_internet=1
ModuleInfo.POE_module=
ModuleInfo.audio_record=1

it throws a IndexError,I print the traceback,

 req = http_get(url, auth=("admin", "admin"), timeout=timeout, verify=False)
  File "C:\Python27\lib\site-packages\requests\api.py", line 72, in get
    return request('get', url, params=params, **kwargs)
  File "C:\Python27\lib\site-packages\requests\api.py", line 58, in request
    return session.request(method=method, url=url, **kwargs)
  File "C:\Python27\lib\site-packages\requests\sessions.py", line 508, in request
    resp = self.send(prep, **send_kwargs)
  File "C:\Python27\lib\site-packages\requests\sessions.py", line 618, in send
    r = adapter.send(request, **kwargs)
  File "C:\Python27\lib\site-packages\requests\adapters.py", line 440, in send
    timeout=timeout
  File "C:\Python27\lib\site-packages\urllib3\connectionpool.py", line 617, in urlopen
    **response_kw)
  File "C:\Python27\lib\site-packages\urllib3\response.py", line 456, in from_httplib
    headers = HTTPHeaderDict.from_httplib(headers)
  File "C:\Python27\lib\site-packages\urllib3\_collections.py", line 312, in from_httplib
    key, value = headers[-1]
IndexError: list index out of range

how can I deal with this issue?

@haikuginger
Copy link
Contributor

I'll see if I can replicate soon and figure out the right strategy. In the original, are those header fields spaced out with \t, or individual space characters?

@beruhan
Copy link
Author

beruhan commented Nov 29, 2017

the source response is

'HTTP/1.1 200 OK\r\n\tContent-Type: application/octet-stream\r\n\tContent-Length: 49059\r\n\tContent-Disposition: attachment; filename="configfile.txt"\r\n\tConnection: close\r\n\r\n

@haikuginger
Copy link
Contributor

Okay, so we're correctly dropping those "headers" because they're not actually headers (tab character is disallowed in header field names). This results in a response that has no headers, but we have a piece of logic that expects at least one. I'll need to dig in and see if there are any other places that would be unhappy with a headerless response and if so, what kind of effort would be involved with cleaning that up.

@sigmavirus24, thoughts? I don't think we want to try to figure out what the server means when it sends an invalid header; it's too ambiguous.

@sigmavirus24
Copy link
Contributor

@haikuginger we're not correctly dropping them necessarily. We're expecting valid headers. The server is wrong and I think we should raise an InvalidHeader exception of sorts in this case, especially when we can't just try to add it onto the previous header.

@Lukasa
Copy link
Sponsor Contributor

Lukasa commented Nov 30, 2017

In this instance I think the error is coming from httplib, so we may not be able to raise that error easily.

@haikuginger
Copy link
Contributor

haikuginger commented Dec 14, 2017

Okay, I'm just getting back to this. This is our code, not httplib, and what's happening is that we're interpreting a line that starts with '\t' or ' ' (space) as a line continuation for the last header - and then getting the last header with the -1 index. It looks like this is obsoleted with RFC-7230, but user agents may still accept it by replacing the whole thing with a single space (not the current '\r\n' we currently implement).

I propose that we adjust the implementation such that...

  1. Multiline header values are passed per RFC-7230 with a space inserted between lines in the value
  2. Items that appear to be multiline headers (preceded by '\r\n\t' or '\r\n ') raise a specific exception unless preceded by at least one header.

@sigmavirus24, does that seem reasonable to you?

Note to @beruhan: this will not prevent an exception from happening for the response you've listed above, but will make that exception more explicit and helpful, so that you can catch it and take action accordingly.

@sigmavirus24
Copy link
Contributor

@haikuginger yeah, 2 is what we're missing and definitely seems reasonable. 👍

@haikuginger
Copy link
Contributor

@beruhan, we've pushed a change to master that will result in a more comprehensible error for a response that's malformed in this way. Thanks for putting this in front of us!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants