
Proxied requests confuse the URI parser #248

Closed
SGrondin opened this issue Feb 3, 2015 · 21 comments

Comments

@SGrondin
Contributor

SGrondin commented Feb 3, 2015

When I set the system-wide HTTP proxy on Debian to forward all traffic to a Cohttp application, the calls come in as GET //simongrondin.name/files/fifty.txt instead of GET http://simongrondin.name/files/fifty.txt.

The URI parser gets confused and interprets them like this:

Uri.to_string (Request.uri req)
> http://simongrondin.namehttp//simongrondin.name/files/fifty.txt
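A minimal sketch of how such a string can arise. It assumes, as the later discussion of string concatenation suggests, that the server builds the URI by prefixing "http://" and the Host header onto the request-target without checking whether the target is already absolute. `naive_uri` and `authority_host` are hypothetical helpers for illustration, not cohttp code, and the authority parsing is deliberately simplified.

```ocaml
(* Hypothetical reconstruction of the bug: prefix "http://" and the Host
   header onto the request-target without checking whether the target is
   already an absolute URI. *)
let naive_uri ~host ~target = "http://" ^ host ^ target

(* Simplified authority parser (not RFC 3986 complete): the host is
   everything after the first "//" up to the next ':' or '/'.
   Assumes [s] contains "//". *)
let authority_host s =
  let start = String.index s '/' + 2 in
  let rec stop i =
    if i >= String.length s || s.[i] = ':' || s.[i] = '/' then i
    else stop (i + 1)
  in
  String.sub s start (stop start - start)

let () =
  let s =
    naive_uri ~host:"simongrondin.name"
      ~target:"http://simongrondin.name/files/fifty.txt"
  in
  print_endline s;
  (* http://simongrondin.namehttp://simongrondin.name/files/fifty.txt *)
  print_endline (authority_host s)
  (* simongrondin.namehttp *)
```

The ':' after "namehttp" then reads as an empty port, and "//simongrondin.name/files/fifty.txt" becomes the path; serializing that URI without the empty port gives the string reported above.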
@rgrinberg
Member

Are there any headers that come with this GET //simongrondin.name/files/fifty.txt request? I'm looking at a host header specifically.

@SGrondin
Contributor Author

SGrondin commented Feb 3, 2015

Host is, as expected, Host: simongrondin.name

@rgrinberg
Member

@SGrondin
Contributor Author

SGrondin commented Feb 3, 2015

By the way it's easier to reproduce by setting the Firefox HTTP proxy setting to a local cohttp server.

@dsheets
Member

dsheets commented Feb 3, 2015

@rgrinberg are you on this? It looks like d04701f#diff-daa4e580cf195143c70372b6183a5a40R140 should really be using proper Uri functional updates rather than string concatenation (which would handle relative paths correctly). Also, I'd recommend switching to scheme-relative identifiers.
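The underlying rule is RFC 7230 section 5.5 (effective request URI): a proxy receives the request-target in absolute-form and can use it as-is, while an origin server normally receives origin-form ("/path") and combines it with the Host header. A minimal string-based sketch of that decision, assuming plain http; `effective_uri` is a hypothetical helper, and in cohttp the same branch would be written with Uri functional updates as suggested above.

```ocaml
(* Sketch of the RFC 7230 section 5.5 rule on plain strings, http only.
   [effective_uri] is hypothetical, not cohttp's API; scheme matching is
   case-sensitive here for brevity. *)
let starts_with ~prefix s =
  String.length s >= String.length prefix
  && String.sub s 0 (String.length prefix) = prefix

let effective_uri ~host target =
  if starts_with ~prefix:"http://" target || starts_with ~prefix:"https://" target
  then target (* absolute-form, as sent to a proxy: already complete *)
  else "http://" ^ host ^ target (* origin-form "/path": combine with Host *)

let () =
  print_endline
    (effective_uri ~host:"simongrondin.name"
       "http://simongrondin.name/files/fifty.txt");
  (* http://simongrondin.name/files/fifty.txt *)
  print_endline (effective_uri ~host:"en.wikipedia.org" "/favicon.ico")
  (* http://en.wikipedia.org/favicon.ico *)
```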

@rgrinberg
Member

@dsheets I'd like to fix this by next week. A fix from you would be much appreciated since I'm not sure of proper Uri usage.

@SGrondin
Contributor Author

This is the absolutely horrible hack I'm using for now, but it works until a fix is (hopefully) released in 0.16. I put it here in case anyone googled and found this issue.

I would like to apologize to the gods of ocaml for this horror.

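(* Workaround for cohttp before 0.16 (uses Core: String.slice,
   String.is_suffix, Option.value). A proxied "GET http://host/path"
   comes out with host "host" ^ "http" and path "//host/path"; undo that. *)
let fix_uri uri =
    let ungarble uri host protocol_len =
        (* drop the trailing "http"/"https" from the host *)
        let real_host = String.slice host 0 (-protocol_len) in
        let uri = Uri.with_host uri (Some real_host) in
        (* drop the leading "//" ^ real_host from the path *)
        let len = String.length real_host in
        Uri.with_path uri (String.slice (Uri.path uri) (len + 2) 0)
    in
    let half host = String.length host / 2 in
    match (uri |> Uri.host |> Option.value ~default:"") with
    | host when String.is_suffix ~suffix:"http" host -> ungarble uri host 4
    | host when String.is_suffix ~suffix:"https" host -> ungarble uri host 5
    (* host duplicated, e.g. "a.orga.org": keep the first half *)
    | host when host <> "" && String.length host mod 2 = 0
             && String.slice host (half host) 0 = String.slice host 0 (half host) ->
            Uri.with_host uri (Some (String.slice host 0 (half host)))
    | _ -> uri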
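For anyone without Core, the same string surgery can be sketched with the standard library alone. `ungarble` here is a hypothetical helper that works on the host and path strings directly instead of a Uri.t, and it assumes the garbling pattern shown above (host suffixed with the scheme name, path prefixed with "//" and the real host).

```ocaml
(* Hypothetical stdlib-only variant of the workaround above.
   Input: garbled (host, path), e.g.
   ("simongrondin.namehttp", "//simongrondin.name/files/fifty.txt").
   Output: repaired (host, path), or the input unchanged. *)
let ends_with ~suffix s =
  let n = String.length s and k = String.length suffix in
  n >= k && String.sub s (n - k) k = suffix

let ungarble (host, path) =
  let strip scheme =
    let real = String.sub host 0 (String.length host - String.length scheme) in
    let prefix = "//" ^ real in
    let p = String.length prefix in
    (* only rewrite when the path really starts with "//" ^ real host *)
    if String.length path >= p && String.sub path 0 p = prefix then
      (real, String.sub path p (String.length path - p))
    else (host, path)
  in
  if ends_with ~suffix:"https" host then strip "https"
  else if ends_with ~suffix:"http" host then strip "http"
  else (host, path)

let () =
  let host, path =
    ungarble ("simongrondin.namehttp", "//simongrondin.name/files/fifty.txt")
  in
  Printf.printf "%s %s\n" host path
  (* simongrondin.name /files/fifty.txt *)
```

Unlike the Core version, this checks the path prefix before changing anything, so a legitimate host that happens to end in "http" is left alone.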

avsm added a commit to avsm/ocaml-cohttp that referenced this issue Feb 16, 2015
@avsm
Member

avsm commented Feb 16, 2015

I stuck a functional update fix into avsm/ocaml-cohttp@0260455, but uri is doing some odd things:

#require "uri.top";;
Uri.with_port (Uri.of_string "/") None ;;
- : Uri.t = ///
Uri.of_string "/" ;;
- : Uri.t = /

Those should really be equivalent.
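RFC 3986 section 5.3 (component recomposition) appends "//" and the authority only when an authority is defined. The `///` output looks as if `with_port` turned an absent host into an empty one, so an empty authority ("//") gets printed in front of the path "/". The sketch below uses a hypothetical four-field record, not ocaml-uri's actual Uri.t, to show the intended behaviour.

```ocaml
(* Hypothetical simplified URI model, NOT ocaml-uri's Uri.t. *)
type uri = {
  scheme : string option;
  host : string option; (* None = no authority component at all *)
  port : int option;
  path : string;
}

(* RFC 3986 section 5.3: "//" is emitted only when an authority exists;
   the port is printed only as part of an authority. *)
let to_string u =
  let b = Buffer.create 64 in
  Option.iter (fun s -> Buffer.add_string b (s ^ ":")) u.scheme;
  (match u.host with
   | Some h ->
       Buffer.add_string b ("//" ^ h);
       Option.iter (fun p -> Buffer.add_string b (":" ^ string_of_int p)) u.port
   | None -> ());
  Buffer.add_string b u.path;
  Buffer.contents b

(* Functional update that leaves the host untouched. *)
let with_port u port = { u with port }

let () =
  let root = { scheme = None; host = None; port = None; path = "/" } in
  print_endline (to_string (with_port root None));
  (* / *)
  print_endline (to_string { root with host = Some "" })
  (* /// *)
```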

@avsm
Member

avsm commented Feb 16, 2015

See mirage/ocaml-uri#63

@avsm avsm closed this as completed in 37bcbd8 Feb 18, 2015
@SGrondin
Contributor Author

This is still happening even with Uri 1.8.0.

http://en.wikipedia.orghttp//en.wikipedia.org/wiki/GNU_C_Library

@rgrinberg
Copy link
Member

@SGrondin I assume using cohttp master as well?

@SGrondin
Contributor Author

0.15.2

Oh, I didn't see it was closed in 37bcbd8. For some reason I thought the fix was in Uri 1.8.0.

@SGrondin
Contributor Author

I just tested it against master and it's still happening.

Uri.to_string target -> "http://en.wikipedia.orghttp//en.wikipedia.org/favicon.ico"

@rgrinberg rgrinberg reopened this Mar 13, 2015
@rgrinberg
Member

@SGrondin So I assume the request looks something like this, then:

GET http://en.wikipedia.org/favicon.ico
Host: en.wikipedia.org

FYI, I just tested the request above and am getting the correct URI:

uri: http://en.wikipedia.org/favicon.ico

@SGrondin
Contributor Author

Try it by setting your Firefox (or system) proxy to a cohttp server and then try to browse wikipedia.org in a browser. http://i.imgur.com/81vdpWS.png

@avsm avsm mentioned this issue Mar 22, 2015
@avsm
Member

avsm commented Mar 23, 2015

I can't reproduce this, but I'm seeing some suspicious activity in cohttp-proxy-lwt that I'm investigating.

When browsing Wikipedia, 400 Bad Request responses show up, which really shouldn't happen. It's probably some hop-by-hop header that isn't being handled correctly.

{65171} Cohttp debugging output is activeListening for HTTP request on: 0.0.0.0 8080
{65171} <<< GET http://en.wikipedia.org/wiki/Main_Page HTTP/1.1
{65171} <<< Host: en.wikipedia.org
{65171} <<< User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.10; rv:34.0) Gecko/20100101 Firefox/34.0
{65171} <<< Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
{65171} <<< Accept-Language: en-US,en;q=0.5
{65171} <<< Accept-Encoding: gzip, deflate
{65171} <<< Referer: http://www.wikipedia.org/
{65171} <<< Connection: keep-alive
{65171} <<< If-Modified-Since: Mon, 23 Mar 2015 11:31:56 GMT
{65171} <<< Cache-Control: max-age=0
{65171} <<< 
{65171} >>> GET /wiki/Main_Page HTTP/1.1
{65171} >>> accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
{65171} >>> accept-encoding: identity
{65171} >>> accept-language: en-US,en;q=0.5
{65171} >>> cache-control: max-age=0
{65171} >>> host: en.wikipedia.org
{65171} >>> if-modified-since: Mon, 23 Mar 2015 11:31:56 GMT
{65171} >>> referer: http://www.wikipedia.org/
{65171} >>> user-agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.10; rv:34.0) Gecko/20100101 Firefox/34.0
{65171} >>> 
{65171} >>> 0

{65171} <<< HTTP/1.1 304 Not Modified
{65171} <<< Server: Apache
{65171} <<< X-Content-Type-Options: nosniff
{65171} <<< X-Analytics: page_id=15580374;ns=0
{65171} <<< Content-language: en
{65171} <<< X-UA-Compatible: IE=Edge
{65171} <<< Vary: Accept-Encoding,Cookie
{65171} <<< X-Powered-By: HHVM/3.3.1
{65171} <<< Last-Modified: Mon, 23 Mar 2015 11:31:56 GMT
{65171} <<< Content-Type: text/html; charset=UTF-8
{65171} <<< X-Varnish: 2305557189 2305557059, 2102571268 2102571170, 758309883 757593122
{65171} <<< Via: 1.1 varnish, 1.1 varnish, 1.1 varnish
{65171} <<< Date: Mon, 23 Mar 2015 11:47:20 GMT
{65171} <<< Age: 924
{65171} <<< Connection: keep-alive
{65171} <<< X-Cache: cp1055 hit (7), amssq31 hit (22), amssq37 frontend hit (4789)
{65171} <<< Cache-Control: private, s-maxage=0, max-age=0, must-revalidate
{65171} <<< 
{65171} >>> HTTP/1.1 304 Not Modified
{65171} >>> age: 924
{65171} >>> cache-control: private, s-maxage=0, max-age=0, must-revalidate
{65171} >>> content-language: en
{65171} >>> content-type: text/html; charset=UTF-8
{65171} >>> date: Mon, 23 Mar 2015 11:47:20 GMT
{65171} >>> last-modified: Mon, 23 Mar 2015 11:31:56 GMT
{65171} >>> server: Apache
{65171} >>> transfer-encoding: chunked
{65171} >>> vary: Accept-Encoding,Cookie
{65171} >>> via: 1.1 varnish, 1.1 varnish, 1.1 varnish
{65171} >>> x-analytics: page_id=15580374;ns=0
{65171} >>> x-cache: cp1055 hit (7), amssq31 hit (22), amssq37 frontend hit (4789)
{65171} >>> x-content-type-options: nosniff
{65171} >>> x-powered-by: HHVM/3.3.1
{65171} >>> x-ua-compatible: IE=Edge
{65171} >>> x-varnish: 2305557189 2305557059, 2102571268 2102571170, 758309883 757593122
{65171} >>> 
{65171} <<<[4096] HTTP/1.1 400 Bad Request

{65171} >>> 1c
{65171} >>> HTTP/1.1 400 Bad Request

{65171} >>> 
{65171} <<<[4096] {65171} >>> 0
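For reference on the hop-by-hop suspicion: RFC 2616 section 13.5.1 lists the hop-by-hop headers a proxy must not forward, and RFC 7230 section 6.1 additionally requires dropping any header named in Connection. The output above suggests Connection was already not forwarded upstream. A minimal sketch of the filtering, with headers modelled as (name, value) pairs; `strip_hop_by_hop` is a hypothetical helper, not part of cohttp's Header API.

```ocaml
(* Hop-by-hop headers from RFC 2616 section 13.5.1. *)
let hop_by_hop =
  [ "connection"; "keep-alive"; "proxy-authenticate"; "proxy-authorization";
    "te"; "trailers"; "transfer-encoding"; "upgrade" ]

(* Drop the fixed list plus every token listed in Connection
   (RFC 7230 section 6.1). Names are compared case-insensitively. *)
let strip_hop_by_hop headers =
  let lc = String.lowercase_ascii in
  let listed_in_connection =
    headers
    |> List.filter (fun (k, _) -> lc k = "connection")
    |> List.concat_map (fun (_, v) ->
           String.split_on_char ',' v |> List.map (fun t -> lc (String.trim t)))
  in
  List.filter
    (fun (k, _) ->
      let k = lc k in
      not (List.mem k hop_by_hop || List.mem k listed_in_connection))
    headers

let () =
  strip_hop_by_hop
    [ ("Host", "en.wikipedia.org"); ("Connection", "keep-alive");
      ("Accept", "text/html") ]
  |> List.iter (fun (k, v) -> Printf.printf "%s: %s\n" k v)
  (* Host: en.wikipedia.org
     Accept: text/html *)
```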

@avsm
Member

avsm commented Mar 23, 2015

That debug output shows the browser sending a request and the proxy correctly rewriting it for the upstream server (turning the absolute URL into a relative one).

The problem appears immediately afterwards, when an HTTP 400 Bad Request shows up. This is possibly related to pipelining behaviour.
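One detail in the log may be relevant, though nobody in the thread confirms it: the `>>> 0` line after the forwarded GET looks like a chunked-encoding terminator, yet that GET carries neither Content-Length nor Transfer-Encoding, and the forwarded 304 is sent with `transfer-encoding: chunked` although a 304 never has a body. If such stray bytes reach the upstream, they would be parsed as the start of the next pipelined request, which would be consistent with the 400. A sketch of the simplified body-presence rules from RFC 7230 section 3.3; the helper names are hypothetical and CONNECT is ignored.

```ocaml
(* Simplified RFC 7230 section 3.3 rules. Header names are assumed to be
   lower-case already; CONNECT responses are not handled. *)
let request_has_body headers =
  List.mem_assoc "content-length" headers
  || List.mem_assoc "transfer-encoding" headers

let response_has_body ~request_meth ~status =
  not (request_meth = "HEAD"
       || (status >= 100 && status < 200)
       || status = 204
       || status = 304)

let () =
  (* The forwarded GET has neither header, so nothing (not even a chunk
     terminator) should follow its header block. *)
  assert (not (request_has_body [ ("host", "en.wikipedia.org") ]));
  (* A 304 has no body whatever its headers say. *)
  assert (not (response_has_body ~request_meth:"GET" ~status:304))
```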

@SGrondin
Contributor Author

As requested, here's a small test proxy that demonstrates the bug.

Set your system proxy to 127.0.0.1:15000 or set your Firefox HTTP proxy to the same.

As you try to browse HTTP sites, you'll see in the console that the hostname for the incoming requests is broken and nothing can be fetched. Then edit line 25 to uncomment the call to fix_uri, recompile and restart the server and you'll see that browsing is fine.

Compile with corebuild -tag debug -pkg lwt -pkg cohttp.lwt main.native or ocamlfind ocamlc -c main.ml -thread -package core,cohttp.lwt.

@objmagic
Contributor

objmagic commented Apr 5, 2015

With the call to fix_uri still commented out, I cannot reproduce it. Has this bug already been fixed?

@SGrondin
Contributor Author

SGrondin commented Apr 6, 2015

Indeed, some unrelated change in 0.16.0 fixed it. Nice.

@SGrondin SGrondin closed this as completed Apr 6, 2015
@dsheets
Member

dsheets commented Apr 7, 2015

This was a pretty severe issue. It would be nice to know which commit fixed the issue so we can understand the cause of the original defect.

5 participants