Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Regex rewriting of upstream/downstream headers in proxy #2144

Merged
merged 4 commits into from
Mar 6, 2019

Conversation

comp500
Copy link

@comp500 comp500 commented Apr 28, 2018

1. What does this change do, exactly?

Adds an extra parameter to header_upstream and header_downstream that allows regex-based replacement of headers. They are in the format header_upstream [header] [regex] [replacement]

This allows unwanted values from the server and client (e.g. redirects, cookies) to be modified by Caddy. This therefore allows search (and mobile) to be fixed on wikipedia.matt.life, and wikimirror.

2. Please link to the relevant issues.

#442 - An existing (but not merged) implementation of Location rewriting
#606 - Nginx proxy_redirect feature request
This change may be obsoleted by #1639, the proxy middleware rewrite.

3. Which documentation changes (if any) need to be made because of this PR?

Add the extra parameter to header_upstream and header_downstream, and describe it. Give examples of it being used to change the Location header, possibly referencing proxy_redirect.

4. Checklist

  • I have written tests and verified that they fail without my change
  • I have squashed any insignificant commits
  • This change has comments for package types, values, functions, and non-obvious lines of code
  • I am willing to help maintain this change if there are issues with it later

@CLAassistant
Copy link

CLAassistant commented Apr 28, 2018

CLA assistant check
All committers have signed the CLA.

@tobya
Copy link
Collaborator

tobya commented Apr 30, 2018

Hi @comp500
Thank you for the PR. Can you give a specific example of how you would write the header_upstream directive with your new functionality and what it would be used for?
Thanks.

@comp500
Copy link
Author

comp500 commented Apr 30, 2018

It could be used for modifying redirects for a mirrored site. For example, Wikipedia uses absolute redirects to en.wikipedia.org for the homepage and search.

The directive could be used like this:

wikipedia.yoursite.com

proxy / https://en.wikipedia.org {
    header_downstream Location en.wikipedia.org wikipedia.yoursite.com
}

This would ensure that all redirects are for the correct host, rather than just the main page as in Matt's tweet.
It works for any header sent to or from the proxied server, in the format header_upstream [header] [regex] [replacement] or header_downstream [header] [regex] [replacement].

@mholt
Copy link
Member

mholt commented Jun 17, 2018

This is cool. Hopefully I or someone else gets a chance to review this soon!

@lukenowak
Copy link

Are there any news regarding this pull request? Can I help somehow (eg. giving it a try in my environments and sharing the results)?

{"proxy / localhost:8080 {\n transparent \nheader_upstream X-Test Tester \nheader_upstream X-Test Test Host \n}"},

// Test #3: transparent preset with multiple params
{"proxy / localhost:8080 {\n transparent \nheader_upstream X-Test Tester \nheader_upstream X-Test Test Host \nheader_upstream X-Test er ing \n}"},
Copy link

@dyrkin dyrkin Aug 16, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@comp500
As I understand it, the first arg is a name of a header, the second is the regular expression of what we want to replace, and the last argument is the value for the replacement.
Will it work if the last argument contains white spaces?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are correct. It will work if any argument contains white spaces, because you can surround each argument in quotes, so the white spaces within are part of the argument.
e.g. header_downstream Location https://en.wikipedia.org "http://local host"

@tobya
Copy link
Collaborator

tobya commented Aug 16, 2018

@lukenowak it would be great if you could test in your environment.,

@dyrkin
Copy link

dyrkin commented Aug 16, 2018

UPD

I'm trying to run it locally, but I get an error:
2018/08/16 13:28:52 /etc/caddy/Caddyfile:39 - Error during parsing: unknown property '443'

This is what I did:

dyrkin@nas:~/go/src/github.com/comp500/caddy/caddy$ git branch
  master
* rewriteheader

dyrkin@nas:~/go/src/github.com/comp500/caddy/caddy$ go run build.go

dyrkin@nas:~/go/src/github.com/comp500/caddy/caddy$ ./caddy -log stdout -agree=true -conf=/etc/caddy/Caddyfile -root=/var/tmp
    2018/08/16 13:28:52 /etc/caddy/Caddyfile:39 - Error during parsing: unknown property '443'

My Caddyfile

webmin.example.com {
    import wildcard_cert

    proxy / http://localhost:10000/ {
        transparent
        header_downstream Location "10000" "443"
    }
}

Did I miss something?

Update
I figured out the root cause of the issue. To fix it I removed original github.com/mholt and renamed github.com/comp500 folder to github.com/mholt, after that I ran the build again.
Sorry, I'm new in go.

I verified this functionality and it works excellent.
Directive header_downstream Location http://webmin.example.com:10000/ https://webmin.example.com/ works for me.

@comp500
Copy link
Author

comp500 commented Aug 16, 2018

Your caddyfile seems to work fine for me. Try checking out the branch in the ~/go/src/github.com/mholt/caddy folder instead, as the build script might be using the wrong package.

@lukenowak
Copy link

@tobya I was able to remove specific cookies from the browser sent to the upstream by using:

http://luke.example.com:11080 {
  bind 127.0.10.1
  proxy / http://127.0.10.1:8888 {
    header_upstream Cookie "(.*)(^Chocolate=[^;]*; |; Chocolate=[^;]*|^Chocolate=[^;]*$)(.*)" "$1 $3"
    header_upstream Cookie "(.*)(^Cacao=[^;]*; |; Cacao=[^;]*|^Cacao=[^;]*$)(.*)" "$1 $3"
  }
}

It is super cool, that one header can be rewritten many times.

So for me this feature is super useful and flexible.

NexediGitlab pushed a commit to SlapOS/slapos that referenced this pull request Sep 3, 2018
Not released yet functionality for regular expression cookie rewriting
is available: caddyserver/caddy#2144
Copy link
Member

@mholt mholt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, this is a useful change and definitely useful. I've only made a brief pass, but it's looking mostly good so far.

Add the extra parameter to header_upstream and header_downstream, and describe it. Give examples of it being used to change the Location header, possibly referencing proxy_redirect.

Can you write up the specific docs for this? This is a complex feature and it needs to be documented correctly. For example, are placeholders (captures) supported? To what extent can people use Go's regular expressions here?

caddyhttp/proxy/upstream.go Outdated Show resolved Hide resolved
caddyhttp/proxy/upstream.go Outdated Show resolved Hide resolved
@comp500
Copy link
Author

comp500 commented Sep 8, 2018

I've written some docs in this gist, to be added to https://caddyserver.com/docs/proxy. If anything is unclear in it, please comment on the gist.

NexediGitlab pushed a commit to SlapOS/slapos that referenced this pull request Sep 11, 2018
Not released yet functionality for regular expression cookie rewriting
is available: caddyserver/caddy#2144
@LaurensBosscher
Copy link

@mholt , this PR would be very useful for us and seems almost complete.

Is there any way I can contribute to getting this merged?

Thanks!

@francislavoie
Copy link
Member

@LaurensBosscher You could help by testing the changes. Check out the latest master and merge this PR locally, compiling from source. You can also review the code and point out any potentially questionable behaviour if any.

NexediGitlab pushed a commit to SlapOS/slapos that referenced this pull request Dec 5, 2018
Not released yet functionality for regular expression cookie rewriting
is available: caddyserver/caddy#2144

Caddy v0.11.1 fixes QUIC issues.

Zenexer fixes for regerssion in v0.11.1

Note: Since Caddy release v0.11.0 certificates has to match sites
      served by Caddy. This will result in lack of response for sites served
      on HTTPS for which certificate names (subject COMMON_NAME,
      subjectAltName DNS or IP) does not match site name.
NexediGitlab pushed a commit to SlapOS/slapos that referenced this pull request Dec 6, 2018
Caddy v0.11.1 fixes QUIC issues.

Not released yet functionality for regular expression cookie rewriting
is available: caddyserver/caddy#2144

Not released yet functionality for ca_certifices in proxy:
caddyserver/caddy#2380

Zenexer fixes for regerssion in v0.11.1

Note: Since Caddy release v0.11.0 certificates has to match sites
      served by Caddy. This will result in lack of response for sites served
      on HTTPS for which certificate names (subject COMMON_NAME,
      subjectAltName DNS or IP) does not match site name.
@Ramshackles
Copy link

Very much interested in this PR 👍

@tobya
Copy link
Collaborator

tobya commented Mar 4, 2019

@Ramshackles Any chance you would be able to compile locally and test this change?

@lukenowak
Copy link

@comp500 are you willing to continue working on this? Rebase on top of master (trivial change)?

@tobya I tried to run the test suite after rebasing this work on master, but my go environment seems "damaged" (caddytls/config.go:410:31: undefined: tls.VersionTLS13 and FAIL github.com/mholt/caddy/caddytls [build failed] during go test ./...), for now I have no time to investigate it further.

@mholt
Copy link
Member

mholt commented Mar 6, 2019

@lukenowak Based on that I bet you just need to update to Go 1.12 to fix the error.

@Ramshackles
Copy link

@Ramshackles Any chance you would be able to compile locally and test this change?

Not sure how I would go about doing it, but guess I could try if needed.

@comp500
Copy link
Author

comp500 commented Mar 6, 2019

@lukenowak I am certainly willing to continue working on this. I've rebased my PR on top of master, although the AppVeyor build seems to still be using Go 1.11 for some reason.

@francislavoie
Copy link
Member

Yeah, just ignore the AppVeyor CI failures for now. AppVeyor still doesn't support Go 1.12 appveyor/ci#2875

Copy link
Member

@mholt mholt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the work. :) Let's give this a shot.

@ghost
Copy link

ghost commented Apr 2, 2019

Any chance to ship this one into a release soon? I found out that many back-ends behave badly and don't honor the X-Scheme or X-Forwarded-Proto headers and send 302 redirects to an http scheme even when the original request is https, and this is often blocked by the browser...
The only workaround is downstream Location header rewriting, so I'm really hoping to get this PR released...

@mholt
Copy link
Member

mholt commented Apr 2, 2019

It'll go out in our next release, which should be 1.0beta1.

NexediGitlab pushed a commit to SlapOS/slapos that referenced this pull request Apr 26, 2019
This reverts commit 6d2019b, as new caddy
has issues with tls certificate configuration:
caddyserver/caddy#2588

About nxd-v0.11.5-4-g9d3151db:

 * not released yet functionality for regular expression cookie rewriting
   is available: caddyserver/caddy#2144

 * not released yet functionality for ca_certifices in proxy:
   caddyserver/caddy#2380

 * support for builtin log rotation disabling
@CristianCantoro
Copy link

@comp500 said:

This allows unwanted values from the server and client (e.g. redirects, cookies) to be modified by Caddy. This therefore allows search (and mobile) to be fixed on wikipedia.matt.life, and wikimirror.

Oh my goodness, more than one year after I discover this that in fact fixes a problem that I reported just recently on Wikimedia Phabricator (T232213) and here (#1011).

This is so awesome! It's hearthwarming knowing that there is somebody out there fixing my bugs even without me knowing it!

I have already pushed it on vikiansiklopedi.org and wikiproxy (formerly wikimirror, because - well - it's a proxy ont a mirror)

@mholt
Copy link
Member

mholt commented Sep 14, 2019

FWIW, I just got this implemented into Caddy 2: https://github.com/caddyserver/caddy/wiki/v2:-Documentation#httphandlersreverse_proxy

It reuses the existing http.handlers.headers module, which also has the same ability to mutate headers by adding, deleting, setting, or replacing substrings in header values. It's pretty powerful. Try it out and let me know what you think!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

10 participants