can't handle www.host #150

Closed
jmonster opened this issue Nov 8, 2011 · 26 comments

Comments

@jmonster

jmonster commented Nov 8, 2011

Suppose I wanted to proxy a site like cb2.com -- visiting this site, you'll see that it immediately redirects you to www.cb2.com. This is a problem, as I'm unable to proxy the content of the site as my browser begins accessing www.cb2.com directly instead of continuing to proxy via localhost.

I've tried simply setting the host to be www.cb2.com instead of cb2.com, but it fails. No errors, the browser just waits.

    proxy.proxyRequest(req, res, {
        host: 'www.cb2.com',
        port: 80
    });
@jmonster
Author

jmonster commented Nov 8, 2011

Correction -- it doesn't hang, but for localhost:3000/stores it returns:

Invalid URL

The requested URL "/stores", is invalid.
Reference #9.85d17941.1320781106.11d7834e

@dominictarr
Contributor

hmm, it looks like you may need to edit the response so that it does not redirect.

can you provide more information? I'm not really sure what you are trying to do.
a script that I could run to reproduce your problem would help.

@jmonster
Author

jmonster commented Nov 8, 2011

I'm using it in an Express app. My intention is that 95% of my webapp will do something interesting, but for routes it doesn't handle, it'll transparently return the original webpage. In this example, that webpage is cb2.com. Imagine the following is require'd at the start of an Express app so that it starts responding to http://localhost/stores:

    var httpProxy = require('http-proxy'),
        proxy = new httpProxy.RoutingProxy();

    module.exports = function(app) {
        app.get('/stores', function(req, res){
            proxy.proxyRequest(req, res, {
                host: 'www.cb2.com',
                port: 80
            });
        });
    };

with the www there I get errors. Without the www, the reply from cb2.com is a redirect to www.cb2.com. Testing in a browser, they only redirect when the url is missing the 'www.' prefix. So if http-proxy were to respect the www. prefix properly, the redirect wouldn't be an issue.

Alternatively http-proxy could just follow the redirects, but that would still be a less-than-optimal solution since it would add latency to every request. (although it would be great for one-off pages)

@dominictarr
Contributor

ah, I think that the problem is that http-proxy doesn't know that you are proxying from a url.
what if you do req.url = req.url.replace('/stores','') and then call http-proxy?

@jmonster
Author

jmonster commented Nov 8, 2011

doesn't make any difference, but I don't understand why it would have?
just to reiterate, this works (but the result is an instruction to redirect to www.cb2.com):

app.get('/stores', function(req, res){
    proxy.proxyRequest(req,res,{
        host:'cb2.com',
        port:80
    });
});

while this gives an Invalid URL error:

    app.get('/stores', function(req, res){
        proxy.proxyRequest(req,res,{
            host:'www.cb2.com',
            port:80
        });
    });

@dominictarr
Contributor

okay, I've reproduced this.
strange. this is not an error from http-proxy.

www.cb2.com is actually responding with this error:

<HTML><HEAD>
<TITLE>Invalid URL</TITLE>
</HEAD><BODY>
<H1>Invalid URL</H1>
The requested URL "&#47;", is invalid.<p>
Reference&#32;&#35;9&#46;56dd54b8&#46;1320795184&#46;17828ee6
</BODY></HTML>

@jmonster
Author

Any idea why? You're right, I tried www.amazon.com and do not have such issues ... but if I curl www.cb2.com it works, so ... huh. Thanks for looking into it so quickly, much appreciated!

@jmonster
Author

barnesandnoble.com does cause the same issue with including www (or redirects without it)

@jmonster
Author

(sorry for so many posts so close together)

It looks like this is the issue : #59

Not using www. triggers the redirect and so avoids the problem. Otherwise the request goes through as HTTP/1.1 even though these servers only support HTTP/1.0, which causes the error reply. Sucks.

@gotwarlost

I'm facing the same issue and I think I know what is going on.

It looks like some sites (www.cb2.com, m.yahoo.com, sites hosted on dreamhost etc.) use the "Host" header for internal routing to the correct host and http-proxy does not change the Host header for the outgoing request.

In other words, if I send a request to the proxy running on localhost, the target of the proxy also sees the Host header as localhost (instead of target.server.com).

There seems to be a changeOrigin option in the code that can be set to true to fix the Host header but:

  1. It only seems to be implemented for websockets
  2. It is also not passed down from the higher layers (RoutingProxy etc.) to the HttpProxy object.

A one line patch (as proof of concept):

outgoing.headers.host = this.target.host + ':' + this.target.port; //around line 198 of http-proxy.js

fixes this issue.

Would it be possible to have the changeOrigin flag take effect for all proxy requests?

Awesome library, BTW.
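
To make the Host header point concrete, here is a minimal sketch of what a changeOrigin-style rewrite does to the outgoing headers. `buildOutgoingHeaders` is a hypothetical helper written for illustration, not part of http-proxy. Unlike the one-line patch above, it assumes the port can be left out of the Host value when the target uses port 80.

```javascript
// Hypothetical helper, not part of http-proxy: builds the headers for the
// outgoing request, optionally rewriting Host to match the proxy target.
function buildOutgoingHeaders(incomingHeaders, target, changeOrigin) {
  // Copy so the caller's header object is left untouched.
  var outgoing = Object.assign({}, incomingHeaders);
  if (changeOrigin) {
    // Assumption: omit the port when it is the HTTP default (80).
    outgoing.host = target.port === 80
      ? target.host
      : target.host + ':' + target.port;
  }
  return outgoing;
}

var incoming = { host: 'localhost:3000', accept: 'text/html' };
var target = { host: 'www.cb2.com', port: 80 };

console.log(buildOutgoingHeaders(incoming, target, false).host); // localhost:3000
console.log(buildOutgoingHeaders(incoming, target, true).host);  // www.cb2.com
```

Without the rewrite, the target sees Host: localhost:3000, which is what a site that routes on the Host header would reject.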

@dominictarr
Contributor

good catch @gotwarlost I think that should be the default behavior. I can't think of a reasonable situation where it should be otherwise.

@dominictarr
Contributor

this is fixed in b4d41c3
(npm install http-proxy@0.7.5)

./bin/node-http-proxy --target www.cb2.com --port 8080

then

curl localhost:8080

...the real page...

can you check this works for you @jmonster ?

@jmonster
Author

now www.cb2.com and cb2.com are causing a redirect :( so my request to localhost:3000/stores ends up redirecting me to www.cb2.com/stores instead of just returning the HTML and keeping the user at localhost:3000. Sigh. I guess cb2 sees that I'm using localhost and that triggers the redirect to be sent back?

I'll probably just go with a different approach of grabbing the body and then sending it to the user instead of trying to use a proxying package, shouldn't be that hard ... i hope.

@dominictarr
Contributor

are you on the latest version?

I'm able to proxy www.cb2.com. I'm getting a redirect on www.cb2.com/stores . however www.cb2.com/stores/ (with the trailing slash) works correctly.
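
If the redirect really is only about the missing slash, one possible workaround is to normalize the path before handing the request to the proxy. The sketch below uses a hypothetical helper, `withTrailingSlash`, and assumes the upstream site wants a trailing slash on the routes you apply it to.

```javascript
// Hypothetical helper: appends a trailing slash to the path part of a
// request URL, leaving any query string untouched.
function withTrailingSlash(url) {
  var queryStart = url.indexOf('?');
  var path = queryStart === -1 ? url : url.slice(0, queryStart);
  var query = queryStart === -1 ? '' : url.slice(queryStart);
  if (path.charAt(path.length - 1) !== '/') {
    path += '/';
  }
  return path + query;
}

console.log(withTrailingSlash('/stores'));         // /stores/
console.log(withTrailingSlash('/stores/'));        // /stores/
console.log(withTrailingSlash('/stores?zip=123')); // /stores/?zip=123
```

In the Express route this would mean setting req.url = withTrailingSlash(req.url) before calling proxy.proxyRequest. It should only be applied to directory-style routes, since it would also turn a file path like /home/index.jsp into /home/index.jsp/.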

@jmonster
Author

THE TRAILING SLASH! Hot damn! You rock, thank you so much :) This is -wonderful-.

@gotwarlost

@dominictarr, you rock! Works perfectly now

@dominictarr
Contributor

sorry guys, I introduced a bug in another edge case while fixing this one, so in 0.7.6 you will need to pass in an option, new HttpProxy({changeOrigin: true}), in order to get this behavior.

@gotwarlost

No worries - I was going to ask for this enhancement anyway ;)

I noticed that the routing proxy still does not pass this option down to the http-proxy

//line 86 in routing-proxy.js

    ['https', 'enable', 'forward'].forEach(function (key) {
      if (options[key] !== false && self[key]) {
        options[key] = self[key];
      }
    });

Could you please include the changeOrigin attribute in this list as well? Thanks.
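
Roughly what I have in mind, as a standalone sketch rather than an actual patch: `inheritOptions` is a hypothetical function that mirrors the shape of the loop quoted above, with `defaults` standing in for `self` and 'changeOrigin' added to the key list.

```javascript
// Sketch only: mirrors the shape of the loop quoted above, with
// 'changeOrigin' added to the key list. Not the actual library source.
function inheritOptions(options, defaults) {
  ['https', 'enable', 'forward', 'changeOrigin'].forEach(function (key) {
    // Copy the proxy-level setting unless the caller explicitly passed false.
    if (options[key] !== false && defaults[key]) {
      options[key] = defaults[key];
    }
  });
  return options;
}

var inherited = inheritOptions({ host: 'www.cb2.com', port: 80 }, { changeOrigin: true });
console.log(inherited.changeOrigin); // true

var optedOut = inheritOptions({ changeOrigin: false }, { changeOrigin: true });
console.log(optedOut.changeOrigin); // false
```

With this shape, a per-request changeOrigin: false still wins over the proxy-level setting, the same way it does for the other keys.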

@jmonster
Author

Thanks for the update -- any other secret parameters we can pass here? It would help me immensely to be able to drop part of the client's headers. This seems like the opposite behavior you'd want from a proxy, but the problem I'm running into is that the remote url is detecting the request is coming from a mobile device and redirecting rather than just returning the expected content --- some URLs are handled explicitly by the server while anything unspecified just falls through to the proxy. Unfortunately I'm not getting back the desired content because of the mobile-client-detection.

If you can suggest an alternative middleware way that'd be fine too. I've been having a hard time using middleware due to GZIP data being returned from the remote server (again, something that I could strip out of the client's proxied-request)
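
For reference, this is the kind of header filtering I mean, done before the request ever reaches the proxy. `stripHeaders` is a hypothetical helper and the header values are made up. It assumes the proxy reads headers from req.headers at the time it is called, and that the remote server sends an uncompressed body when no Accept-Encoding header is present.

```javascript
// Hypothetical helper: returns a copy of the request headers without the
// named entries. Node lowercases incoming header names, so names are
// compared in lowercase.
function stripHeaders(headers, namesToDrop) {
  var drop = namesToDrop.map(function (name) { return name.toLowerCase(); });
  var result = {};
  Object.keys(headers).forEach(function (name) {
    if (drop.indexOf(name.toLowerCase()) === -1) {
      result[name] = headers[name];
    }
  });
  return result;
}

// Made-up client headers for illustration.
var clientHeaders = {
  'host': 'localhost:3000',
  'user-agent': 'ExampleMobileBrowser/1.0',
  'accept-encoding': 'gzip, deflate'
};

console.log(Object.keys(stripHeaders(clientHeaders, ['User-Agent', 'Accept-Encoding'])));
// [ 'host' ]
```

In an Express route that would be req.headers = stripHeaders(req.headers, ['user-agent', 'accept-encoding']) right before proxy.proxyRequest(req, res, ...).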

@jmonster
Author

this is broken again on master -- I receive the following error when accessing servers as described above:

Error 324 (net::ERR_EMPTY_RESPONSE): The server closed the connection without sending any data.

It does, however, have a 200 status code.

To recreate: use the master branch to proxy www.gnc.com and then try, for example, /home/index.jsp

I traced it back and it's due to commit 2061c71 by Cloud9:
Revert "update outgoing.headers.host incase the destination does proxying"

@jmonster jmonster reopened this Dec 17, 2011
@AvianFlu
Contributor

AvianFlu commented Jan 2, 2012

@jmonster can you test this against http-proxy@0.8.0? I'm fairly certain that we fixed this while doing the 0.6.x migration stuff.

@nrabinowitz

I'm seeing this issue in version 0.8.0, and looking through the code there's no mention of changeOrigin outside the Web Socket section. This is a serious problem if you're trying to proxy an external site - many sites use the host header to direct your request on the server, and hitting them with localhost just gives you a 404.

@johnsheehan

Also seeing this in 0.8.0. Is there an update on {changeOrigin: true}?

@jmonster
Author

I added the functionality in my own fork since the main repo doesn't like it

@nrabinowitz

Note that in addition to reverting the lines in 2061c71, you also need to handle the changeOrigin option in the RoutingProxy if you're using it - see nrabinowitz/gapvis@6e84313#diff-3 (sorry, I've done this by changing the module I added to my project, which I suspect is bad git practice, and makes it harder to isolate my changes).

@coderarity
Contributor

Yeah, so I'm not sure what happened, but the master branch currently doesn't use the changeOrigin option for proxyRequest. I guess I'll add it again!

This was referenced Apr 13, 2012
typeoneerror added a commit to typeoneerror/ember-cli that referenced this issue Dec 2, 2014
After some poking around in the ProxyServerAddon and adding back the `host` param as the third parameter in here:

    app.use(function(req, res) {
      return proxy.web(req, res);
    });

I started poking around in node-http-proxy and found some earlier issues related to the "Host" header. http-proxy does not seem to change the Host header for outgoing requests, so you have to flag `changeOrigin` to make this happen.

I added changeOrigin and this immediately fixes my bugs. So, I'm not sure if this is the correct implementation (perhaps this should be an option sent to the proxy via cmd line or a ./server proxy), but I figured I'd push it so you can see where I am at. Hope it helps.

References:

* [Use changeOrigin for proxyRequest](http-party/node-http-proxy@cee3e2f)
* [Original host not being passed through](http-party/node-http-proxy#621)
* [Host header comments](http-party/node-http-proxy#150 (comment))