Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix wsgiref handling of absolute HTTP Request-URI #65671

Open
mouad mannequin opened this issue May 11, 2014 · 5 comments
Open

Fix wsgiref handling of absolute HTTP Request-URI #65671

mouad mannequin opened this issue May 11, 2014 · 5 comments
Labels
stdlib Python modules in the Lib dir type-bug An unexpected behavior, bug, or error

Comments

@mouad
Copy link
Mannequin

mouad mannequin commented May 11, 2014

BPO 21472
Nosy @gvanrossum, @pjeby, @rbtcollins
Files
  • issue21472.patch
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields:

    assignee = None
    closed_at = None
    created_at = <Date 2014-05-11.11:31:53.860>
    labels = ['type-bug', 'library']
    title = 'Fix wsgiref handling of absolute HTTP Request-URI'
    updated_at = <Date 2014-09-25.20:37:19.779>
    user = 'https://bugs.python.org/mouad'

    bugs.python.org fields:

    activity = <Date 2014-09-25.20:37:19.779>
    actor = 'rbcollins'
    assignee = 'none'
    closed = False
    closed_date = None
    closer = None
    components = ['Library (Lib)']
    creation = <Date 2014-05-11.11:31:53.860>
    creator = 'mouad'
    dependencies = []
    files = ['35214']
    hgrepos = []
    issue_num = 21472
    keywords = ['patch']
    message_count = 5.0
    messages = ['218258', '226926', '226958', '227564', '227574']
    nosy_count = 4.0
    nosy_names = ['gvanrossum', 'pje', 'rbcollins', 'mouad']
    pr_nums = []
    priority = 'normal'
    resolution = None
    stage = 'patch review'
    status = 'open'
    superseder = None
    type = 'behavior'
    url = 'https://bugs.python.org/issue21472'
    versions = ['Python 2.7', 'Python 3.4', 'Python 3.5']

    @mouad
    Copy link
    Mannequin Author

    mouad mannequin commented May 11, 2014

    Hi,

    As most of you know, working behind a HTTP proxy raise all this corner cases that no one think about until he has to, one of them i had to deal with some months ago is absolute request URI in HTTP request, that some client will send when they detect that they are behind an HTTP proxy, to be more precise i am talking about this section of rfc2616 (https://gist.github.com/mouadino/7930974), i.e.

    GET http://domain.com/path HTTP/1.1

    As my surprise i found that most WSGI server that are out there doesn't support very well this feature including wsgiref, to test that i have created this simple python script https://gist.github.com/mouadino/7930974 that include also status of this bug in different other projects.

    In term of bug impact, if a WSGI server doesn't support this, the WSGI environment variable PATH_INFO will end up containing an absolute URI instead of relative as the standard require (http://legacy.python.org/dev/peps/pep-0333/), which mean that most uri routing that web framework do end up failing (Returning 404) b/c most of them will try to much PATH_INFO with there different routes (which are relative paths) to find which controller action should they executed e.g. https://pypi.python.org/pypi/Routes .

    FWIW i found this bug while trying to debug boto library (that use absolute uri if proxy is detected https://github.com/mouadino/boto/blob/v2.13.2/boto/connection.py#L955) that was communicating with an OpenStack API that use eventlet wsgi server (That doesn't handle absolute URI https://github.com/eventlet/eventlet/blob/v0.14/eventlet/wsgi.py#L471) and routes (that match PATH_INFO with action path https://github.com/bbangert/routes/blob/master/routes/mapper.py#L678).

    @mouad mouad mannequin added stdlib Python modules in the Lib dir type-bug An unexpected behavior, bug, or error labels May 11, 2014
    @gvanrossum
    Copy link
    Member

    Wow. This is interesting. I thought that absolute URL support was only for proxies, but the spec you quote says clearly it should be supported as a transition towards always specifying the full URL. I guess they want to get rid of the Host: header?

    In any case I would worry (a bit) that this might cause security issues if implemented as naively as shown in your patch -- the other components of the URL should probably be validated against the configuration of the server. Also I am wondering whether specifying a different port or protocol (e.g. HTTPS) should be allowed or not, and what to do with extraneous parts of the path (in particular the "#fragment" identifier). You should probably also be careful with path-less domain names -- IIRC some URL parsers produce "" as the path for e.g. "http://python.org".

    Finally, AFAIK the HTTP 1.1 standard is full of ideas that few server implementations support, for various reasons, and it's possible that a future standard actually rescinds (or discourages) some features. Have you asked for the status of this particular feature?

    @mouad
    Copy link
    Mannequin Author

    mouad mannequin commented Sep 16, 2014

    Hi Guido,

    If I understand this correctly, the HOST header was added only in HTTP1.1 and setting the absolute URI was the right behavior client should follow if they are behind a proxy for HTTP1.0, but the same behavior was kept in HTTP1.1 for backward compatibility.

    In any case I would worry (a bit) that this might cause security issues if
    implemented as naively as shown in your patch,
    the other components of the URL should probably be validated against the
    configuration of the server.
    Also I am wondering whether specifying a different port or protocol (e.g.
    HTTPS) should be allowed or not.

    If there should be a validation, I think it should be done in BaseHTTPRequestHandler, FWIW this later doesn't validate HOST header neither, just tested sending a request to "python -mhttp.server" which succeeded.

    $ telnet 127.0.0.1 8000
    GET /dummy HTTP/1.1
    HOST google.com

    One thing to note here for future work for the validation part, is that according to http://www.w3.org/Protocols/rfc2616/rfc2616-sec5.html#sec5.2, first point:

    If Request-URI is an absoluteURI, the host is part of the Request-URI.
    Any Host header field value in the request MUST be ignored.
    

    and what to do with extraneous parts of the path (in particular the
    "#fragment" identifier)

    AFAIK clients are not supposed to send fragments to servers, and I didn't find in the HTTP spec what should happen if they do, CherryPy for example (link of the code is in the footer) will raise 400 if request URI include the #fragment part.

    An other thing that CherryPy guys also did, is that HTTPRequest(...).path will always return a relativ path, which is IMHO the right behavior but for backward compatibility I hesitate fixing this problem directly in BaseHTTPRequestHandler or should we ?

    You should probably also be careful with path-less domain names -- IIRC some
    URL parsers produce "" as the path for e.g. "http://python.org".

    According to the PEP-0333 the PATH_INFO can be empty:

    PATH_INFO
     The remainder of the request URL's "path", designating the virtual   
     "location" of the request's target within the application. This may be an
     empty string, if the request URL targets the application root and does      
     not have a trailing slash.
    

    Have you asked for the status of this particular feature?

    I sent an email to python WEB-SIG mailing list trying to gather information about this behavior, but not luck yet :(

    At this point I would like to link to both CherryPy and Werkzeug that already handle this behavior.

    Werkzeug: https://github.com/mitsuhiko/werkzeug/blob/0.9.6/werkzeug/serving.py#L75-81
    CherryPy: https://github.com/cherrypy/cherrypy/blob/cherrypy-3.2.2rc1/cherrypy/wsgiserver/wsgiserver3.py#L633-638.

    As a side note, I have already said in my email to the WEB-sig kudos to cherrypy guys, the funny part is that I am starting to think that they succeeded in both my tests because they didn't rely on core python implementation of BaseHTTPRequestHandler (I guess reinventing the wheel sometime works :)), at the opposite to other frameworks that showed this problems :).

    Cheers,

    @rbtcollins
    Copy link
    Member

    FWIW we probably need to capture the original unaltered URL somewhere, but also ensure that PATH_INFO is always a relative path.

    One should be able to implement a proxy in WSGI (because thats just another specialised app), and doing that today requires special handling depending on the WSGI container, which isn't great for consistency.

    On security; Host header <-> url host mismatches occur when the host to which a request is sent != the url; this is expected only in the case of forward proxies: any other time it would indeed be a smuggling attack, trying to find mismatches between acls and access in servers - this is another reason to consolidate things so that wsgi apps can rely on urls looking consistent.

    @rbtcollins
    Copy link
    Member

    Oh, also - while its tempting to say that it doesn't matter whether we take the urls host portion, or the host header or the server name - it does. Deployments that look like:

    LB/Firewall <-> backend container -> WSGI app

    are likely to have assumptions within the WSGI app about what sort of urls they expect to reach them. We can mitigate this by being very explicit about whatever solution we have.

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Labels
    stdlib Python modules in the Lib dir type-bug An unexpected behavior, bug, or error
    Projects
    None yet
    Development

    No branches or pull requests

    2 participants