Wrong Location when doing a redirect behind a Proxy. #540

Closed
vaab opened this issue Jun 5, 2014 · 6 comments

@vaab
commented Jun 5, 2014

Hi,

When I'm proxying an application served by werkzeug and that application issues a redirect to "/", werkzeug fills in the URL incorrectly: it uses X-Forwarded-Host as the real host. By doing this, it takes over the reverse proxy's job of rewriting the URL, and since that is not its job, werkzeug doesn't do it right: it forgets any subdirectory.

The result is that I can't use ProxyPassReverse with any werkzeug application that should live in a subdirectory of my main frontend.

More information

I have a frontend Apache host, www.frontend.com, which receives the HTTP requests. It's configured to proxy (pass) the requests to a werkzeug application on host internal-application.intranet when you hit the subdirectory "abc".

So when hitting http://www.frontend.com/abc, the requests should be proxied internally to http://internal-application.intranet, which is running werkzeug.

To make the redirect scenario explicit:

Client - asks for https://www.frontend.com/abc/file
    --> www.frontend.com (Proxy)
        | www.frontend.com - asks for http://internal-application.intranet/file
        |    --> internal-application.intranet
        |        |  wants to redirect, so sends back a redirect with header:
        |        |      Location: http://internal-application.intranet/other-file
        |    <--
        | www.frontend.com - receives the redirect, parses the 'Location' header,
        |     and replaces instances of 'http://internal-application.intranet'
        |     with 'https://www.frontend.com/abc', which takes care of:
        |       1 - protocol: http -> https
        |       2 - host: 'internal-application.intranet' -> 'www.frontend.com'
        |       3 - subdir: '/' -> '/abc/'
    <--
Client - receives a correct redirect that will work

So the proxy usually rewrites the 'Location' header on the fly before sending it to the client. For instance, when it detects the pattern "http://internal-application.intranet" in the Location header, it changes it to https://www.frontend.com/abc.

But this doesn't work, because werkzeug uses some information provided by the proxy to perform its own replacement in the Location header, and I don't think it should. As a result, when it issues a redirect, it doesn't send Location: http://internal-application.intranet as it should, but Location: http://www.frontend.com. It replaces the host, but it misses the subdirectory (and there are also issues with the protocol).

I don't think werkzeug should make these changes; it should rely on the proxy mechanism to do its job.
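To make the proxy side concrete, here is a minimal model of a ProxyPassReverse-style rule as a plain prefix replacement. It is a simplification, not Apache's implementation; `reverse_rewrite`, `BACKEND` and `PUBLIC` are hypothetical names chosen for this illustration.

```python
def reverse_rewrite(location, backend_prefix, public_prefix):
    """Simplified model of one reverse-proxy Location rule: plain prefix replacement."""
    if location.startswith(backend_prefix):
        # Swap the backend prefix for the public one, keeping the rest of the path.
        return public_prefix + location[len(backend_prefix):]
    # Anything that does not start with the backend prefix passes through untouched.
    return location


BACKEND = "http://internal-application.intranet/"
PUBLIC = "https://www.frontend.com/abc/"

# What the application should send: the rule rewrites scheme, host and subdirectory.
print(reverse_rewrite("http://internal-application.intranet/other-file", BACKEND, PUBLIC))
# What werkzeug actually sends: no match, so /abc and https are never restored.
print(reverse_rewrite("http://www.frontend.com/other-file", BACKEND, PUBLIC))
```

In this model the first call yields `https://www.frontend.com/abc/other-file`, while the second is returned unchanged, which is the broken redirect described below.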

Diagnosis

In werkzeug/wrappers.py, at line 1110, werkzeug replaces a partial location (such as "/") with the full one (such as "http://myhost.com/"). Here are the lines:

                ...
                current_url = get_current_url(environ, root_only=True)
                if isinstance(current_url, text_type):
                    current_url = iri_to_uri(current_url)
                location = url_join(current_url, location)
                ...

So it calls get_current_url, which in turn asks for the host via get_host.

And get_host uses HTTP_X_FORWARDED_HOST instead of the Host header the application actually received:


def get_host(environ, trusted_hosts=None):
    """Return the real host for the given WSGI environment.  This takes care
    of the `X-Forwarded-Host` header.  Optionally it verifies that the host
    is in a list of trusted hosts.  If the host is not in there it will raise
    a :exc:`~werkzeug.exceptions.SecurityError`.

    :param environ: the WSGI environment to get the host of.
    :param trusted_hosts: a list of trusted hosts, see :func:`host_is_trusted`
                          for more information.
    """
    if 'HTTP_X_FORWARDED_HOST' in environ:
        rv = environ['HTTP_X_FORWARDED_HOST'].split(',')[0].strip()
    elif 'HTTP_HOST' in environ:
        rv = environ['HTTP_HOST']
    ...

I commented out these two lines as a test, used tcpdump to check the content of the packets between the proxy and the application, and everything looks right: the redirect finally works.
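The effect of that precedence can be reproduced without werkzeug. The sketch below is a simplified model of the excerpts above, assuming the standard library's `urllib.parse.urljoin` behaves like werkzeug's `url_join` for these inputs and ignoring `SCRIPT_NAME`, trusted-host checks and IRI handling; `current_host` and `absolute_location` are hypothetical names.

```python
from urllib.parse import urljoin


def current_host(environ):
    # Same precedence as the get_host excerpt: X-Forwarded-Host wins over Host.
    if "HTTP_X_FORWARDED_HOST" in environ:
        return environ["HTTP_X_FORWARDED_HOST"].split(",")[0].strip()
    return environ["HTTP_HOST"]


def absolute_location(environ, location):
    # Build the root URL (scheme + host) and join the relative target onto it.
    root = "%s://%s/" % (environ["wsgi.url_scheme"], current_host(environ))
    return urljoin(root, location)


proxied = {
    "wsgi.url_scheme": "http",
    "HTTP_HOST": "internal-application.intranet",
    "HTTP_X_FORWARDED_HOST": "www.frontend.com",
}
# Behind the proxy: the frontend host leaks in, without /abc and without https.
print(absolute_location(proxied, "/other-file"))

# Dropping the header mimics commenting out the two X-Forwarded-Host lines.
direct = {k: v for k, v in proxied.items() if k != "HTTP_X_FORWARDED_HOST"}
print(absolute_location(direct, "/other-file"))
```

The first call gives `http://www.frontend.com/other-file`; the second gives `http://internal-application.intranet/other-file`, which a prefix-based reverse rule can rewrite.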

If anything is wrong in my understanding of this, please tell me. If NOT using HTTP_X_FORWARDED_HOST in get_host is the way to go, tell me and I'll fix it and send you a PR. I'd like to gather some review of my analysis before sending a PR, as there may be good reasons to keep this mechanism working.

Thanks for your comments.

@untitaker

Member

commented Aug 21, 2014

I'm also not sure whether it's get_host's job to infer the host from X-Forwarded-Host; that should be the task of ProxyFix.

@untitaker

Member

commented Aug 21, 2014

But the proxy mechanism isn't going to replace the host variables. I am not aware of any reverse proxy doing that, so I don't see what problems you have with the current behavior.

@vaab

Author

commented Aug 22, 2014

The problem I have: I can't use ProxyPassReverse with any werkzeug application that issues redirects from behind a subdirectory of my main frontend.

That's a fairly big issue with werkzeug.

Basically, if werkzeug didn't do anything, it would work. It tries to be clever, but it doesn't have enough information to do the right thing. The proxy is ready to do this job, if only werkzeug didn't mess up the URL.

I think that get_host should be split in two:

  • a version that gives the host without taking X-Forwarded-Host into account, which is what we want in most cases;
  • a version for the rare cases where the application wants to react to the original domain used to reach it.

The default would be the first one. A simple boolean keyword argument could do the trick.

Of course, if there is no X-Forwarded-Host header (werkzeug is not being proxied), both versions give the same result.
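A rough sketch of that proposal, with a single opt-in keyword. The name `get_host_proposed` and the `use_forwarded_host` parameter are hypothetical, not part of werkzeug, and the `SERVER_NAME`/`SERVER_PORT` fallback is an assumed simplification of what happens when no Host header is present.

```python
def get_host_proposed(environ, use_forwarded_host=False):
    """Hypothetical get_host variant: X-Forwarded-Host is only used on request."""
    if use_forwarded_host and "HTTP_X_FORWARDED_HOST" in environ:
        # Opt-in: the host the client originally used, as reported by the proxy.
        return environ["HTTP_X_FORWARDED_HOST"].split(",")[0].strip()
    if "HTTP_HOST" in environ:
        # Default: the host this application was actually contacted on.
        return environ["HTTP_HOST"]
    # Fallback: SERVER_NAME, plus the port when it is not the scheme's default.
    host = environ["SERVER_NAME"]
    port = environ.get("SERVER_PORT")
    default_port = "443" if environ.get("wsgi.url_scheme") == "https" else "80"
    if port and port != default_port:
        host += ":" + port
    return host


proxied = {
    "wsgi.url_scheme": "http",
    "HTTP_HOST": "internal-application.intranet",
    "HTTP_X_FORWARDED_HOST": "www.frontend.com",
}
print(get_host_proposed(proxied))
print(get_host_proposed(proxied, use_forwarded_host=True))
```

Without the keyword this returns `internal-application.intranet`; with `use_forwarded_host=True` it returns `www.frontend.com`. For a request that never passed through a proxy, both calls agree.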

> But the proxy mechanism isn't going to replace the host variables. I am not aware of any reverse proxy doing that, so I don't see what problems you have with the current behavior.

The proxy is able to replace the "Location:" header, and it doesn't forget the subdirectory of the original URL:

http://httpd.apache.org/docs/2.2/en/mod/mod_proxy.html#proxypassreverse
http://wiki.nginx.org/HttpProxyModule#proxy_redirect

But it's a simple string replacement. If the Location value doesn't begin with the URL prefix of the subdirectory being proxied, the proxy won't touch it.

@mjw-pp


commented Oct 22, 2014

For anyone else having trouble with this bug, you should be able to work around it with an extra ProxyPassReverse directive in the frontend apache config, something like this:

ProxyPass /abc/ http://internal-application.intranet/
ProxyPassReverse /abc/ http://internal-application.intranet/
# Workaround for https://github.com/mitsuhiko/werkzeug/issues/540
ProxyPassReverse /abc/ http://www.frontend.com/
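Modelling the two ProxyPassReverse lines as an ordered list of prefix rules shows why the workaround helps. This assumes the first matching rule wins and that Apache fills in the frontend's own scheme and host; `RULES` and `apply_reverse_rules` are hypothetical names, not Apache internals.

```python
# Each pair models one ProxyPassReverse line: (prefix seen in Location, public prefix).
RULES = [
    ("http://internal-application.intranet/", "http://www.frontend.com/abc/"),
    # Workaround rule: catch Locations werkzeug already rewrote to the frontend host.
    ("http://www.frontend.com/", "http://www.frontend.com/abc/"),
]


def apply_reverse_rules(location, rules=RULES):
    # Assumption: rules are tried in order and the first match is applied.
    for backend_prefix, public_prefix in rules:
        if location.startswith(backend_prefix):
            return public_prefix + location[len(backend_prefix):]
    return location


print(apply_reverse_rules("http://internal-application.intranet/other-file"))
print(apply_reverse_rules("http://www.frontend.com/other-file"))
```

Both calls yield `http://www.frontend.com/abc/other-file` here. Note that in this model the workaround rule would also turn an already-correct `http://www.frontend.com/abc/x` into `http://www.frontend.com/abc/abc/x`, so it only suits applications that never emit the public URL themselves.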

But there's an added complication: sometimes werkzeug's unwanted host rewriting doesn't seem to happen (e.g. I don't see it with Flask's redirects that add a missing trailing slash).

@spyoungtech


commented Sep 12, 2016

Are you certain this is a werkzeug problem and not a problem with whatever server you're using? I'm thinking this may be a server issue that is not exactly the result of anything on the werkzeug end.

I am experiencing a similar issue with my application sitting behind a reverse proxy, and I believe our issues are of the same kind. The frontend server is running Microsoft IIS, and for whatever reason, I believe it is IIS that is rewriting the location in the response header.

For example, I have in my Flask application the following:

from flask import Flask, redirect
app = Flask(__name__)

@app.route("/test")
def redirect_test():
    return redirect("http://google.com")

This application sits behind a reverse proxy such that http://frontend.com/myapp/test reverse proxies to http://my-internal-server:8080/test and hits the above route.

If I access the application through the frontend server, I can't manage to redirect outside of frontend.com no matter what I do. When I try the above test case I get the following response headers:

Location:http://frontend.com
Server:Microsoft-IIS/7.5

If I access the route directly at http://my-internal-server:8080/test, I'm correctly redirected to Google.com and the headers read as follows:

Location:http://google.com
Server:Werkzeug/0.11.10 Python/3.5.2

Regardless of what is interpreted as the host, a url_join onto the location http://google.com should always yield http://google.com, and I believe werkzeug handles that correctly. However, IIS seems to be mucking things up as it rewrites the response for the client.

The conclusion I've drawn is that the problem lies in the server handling the reverse proxy. In my specific case, it's the default rewrite behavior of the ARR/URL Rewrite modules in IIS. Thus, I don't believe any change in the werkzeug code will address this problem.
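The url_join point is easy to check. This sketch uses the standard library's `urllib.parse.urljoin` as a stand-in for werkzeug's `url_join` (an assumption that they agree on absolute targets): whichever base host the application computes, an absolute target replaces it entirely.

```python
from urllib.parse import urljoin

# Possible roots the application might compute, with or without the proxy's headers.
BASES = ["http://my-internal-server:8080/", "http://frontend.com/"]

# An absolute target carries its own scheme and host, so the base is discarded.
results = [urljoin(base, "http://google.com") for base in BASES]
print(results)
```

Both entries are `http://google.com`, so a `Location: http://frontend.com` seen by the client would have to come from a rewrite further down the line.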

@davidism

Member

commented Sep 12, 2016

This is more than likely a config issue (either with the HTTP or WSGI headers), not an issue with Werkzeug. Configure the proper proxy headers and apply the ProxyFix middleware.

@davidism davidism closed this Sep 12, 2016
