Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Unify Replacement UX #3948

Closed
mhils opened this issue Apr 17, 2020 · 17 comments
Closed

Unify Replacement UX #3948

mhils opened this issue Apr 17, 2020 · 17 comments
Labels
area/core kind/feature New features / enhancements RFC

Comments

@mhils
Copy link
Member

mhils commented Apr 17, 2020

Problem Description

The replacement addon currently calls flow.replace(), which performs replacements in URL, HTTP Headers, and request/response body. This is messy and terrible because there is no way to control in which part of the flow replacements should be done - it does many things, and none of them well. It's often luck that replacements work. For example, in many cases, replacements like /~s/response-body "work" because URL/header replacements are not performed if they don't match syntactically:

try:
name, value = line.split(b": ", 1)
except ValueError:
# We get a ValueError if the replacement removed the ": "
# There's not much we can do about this, so we just keep the header as-is.
pass

I think there is some opportunity for improvement here. 😄

Proposed Changes

With the advent of a "map local" editor in #3940, I think it is time to clean up things here and implement the following changes:

  1. The replacements addon is for HTTP body replacements (and later on TCP/WebSocket messages). We can possibly rename it to make this more clear (ReplaceContent? ReplaceData?). This also means we can implement it in the addon and get rid of HTTPFlow.replace and its descendants.
  2. SetHeaders is for header modifications.
  3. Map remote/map local addons are for request line modification/redirection to localhost.

This also means we can unify the expression syntax for these four addons:

old syntax new syntax
Map Local [/filter]/url/file-or-directory
Map Remote [/filter]/url-regex/replacement
SetHeader /filter/name/value [/filter]/header-name/[@]header-value
Replacements /filter[/regex]/[@]replacement [/filter]/body-regex/[@]replacement

@mitmproxy/devs, @Prinzhorn, @naivekun, any thoughts? 😃

@mhils mhils added kind/feature New features / enhancements area/core RFC labels Apr 17, 2020
@Prinzhorn
Copy link
Member

Prinzhorn commented Apr 17, 2020

I'm all for making stuff more consistent and well-defined. I didn't know about the interesting replace semantics as I've only worked with addons so far to modify content etc.

I'm not saying what Burp does is perfect or even good, but it's worth taking a look at

Selection_611
Selection_612

Let's ignore params for a moment, these things could be added later and it's hard to draw a line where to stop. E.g. if you start parsing bodies why not also allow a syntax to modify JSON bodies etc. These things can be done in a script and the command line should focus on simple use-cases.

I like the semantics of adding/removing instead of modifying based on leaving stuff out. So I'd say we make SetHeader more powerful and call it ModifyHeaders. Because if you limit Replacements to the body you remove the possibility to modify headers. So [/[@]header-value] becomes optional as well.

Can you give examples for how Map Local and Map Remote would look like for the user? Map Remote is kind of what Burp calls "Request first line" combined with Host header? But like more intelligent?

Edit: For naming, how about ModifyHeaders and ModifyBody? I guess you can also use it to add a body to a body-less request so the name should reflect that.
Considering TCP/WebSockets I might prefer ModifyContent.

@naivekun
Copy link
Contributor

naivekun commented Apr 17, 2020

for Map Remote

  • My initial thought is like: if URL match foo.com/bar.js, rewrite it to bar.com/whatever.js. Thats it. We should make it simple to users and more complex replacement/rewrite should be a job of custom script.

for Map Local

  • [/filter]/url/file-or-directory seems not enough. if an URL have something like foo.com/bar.js?v=0.12345, parameter v is dynamic each time for anti-cache or something. It will fail. So, maybe the url-regex is necessary.

@naivekun
Copy link
Contributor

naivekun commented Apr 17, 2020

Another problem. If the filter part of [/filter]/url/file-or-dir specified, will it just follow the filter and ignore the url? Or filter it with both? We could make more progress on the syntax here.

Based on @Prinzhorn ,my suggestion is that:

  • ModifyHeaders for HTTP header modify
  • ModifyContent for HTTP body and WebSocket and Raw TCP(optional)
  • MapLocalContent for simply map an URL to localfile
  • MapRemoteContent for simply map an URL to another URL

@Prinzhorn
Copy link
Member

I like the semantics of adding/removing instead of modifying based on leaving stuff out.

If anyone wonders if this could be ambiguous, I propose that if a part is optional, it should be left empty by the user instead of being left out completely. This makes the intention more explicit for all parties and less magical.

So if you don't need a regex in the current /patt/regex/replacement case you'd do /patt//replacement and not /patt/replacement. The double separator makes it explicit that the portion is empty and you don't need to document multiple different syntaxes.

@Prinzhorn
Copy link
Member

Another weekend thought: shouldn't we use some sort of glob syntax instead of regex? Maybe I'm opening pandora's box, but I think globs would be useful in a lot of places in mitmproxy. I always die a little when I have to use a regex in Burp and have to escape all the dots in a hostname or file extensions 🤦

Things that globs are amazing at:

  1. Domain matching e.g. *.example.com or static.*
  2. Path matching e.g. /*.js or /blog/*

Both things that are pretty useful in the context of a proxy 😄

@mhils
Copy link
Member Author

mhils commented Apr 29, 2020

I always die a little when I have to use a regex in Burp and have to escape all the dots in a hostname or file extensions 🤦

Just don't escape them, it will match the right things anyways 😉🙊

More seriously: I see your point, but globs are very much limited when it comes to replacements and not just matching. Let's keep this in mind, but we likely need regexes somewhere anyways, and then it may pay off to be consistent.

@mhils
Copy link
Member Author

mhils commented Apr 29, 2020

Thank you two for the very valueable input. Maybe let's take a step back and let us first collect the "main stream" use cases which we think are important. In step two we can then think about what would be best for each feature individually, and finally think about how we can combine things.
Again, we don't need to handle every corner case here, doing less is good.

Headers

  • I want to set (or overwrite) a specific header for all requests with a fixed value. The new header content could come from a file (for example, hide the true user agent).
  • I want to delete a specific header for all responses (for example CSP headers).
  • I want to set a specific header for all requests matching a pattern (for example, to add a bearer auth header).

Is there a "main stream" use cases for dynamic replacements, i.e. s/foo/bar/ in headers? I don't have one.

Bodies

  • I want to replace every occurence of "foo" with "bar". The replacement could come from a file.
  • I want to replace every occurence of "foo" with "bar", but only for responses matching a pattern.

Local Redirection

  • I want to return the contents of ~/.foo.js when a client requests example.com/foo.js.
  • I want to return the contents of ~/.foo.js when a client requests example.com/foo.js?t=1588156347.
  • Bonus: I want to return the contents of ~/.foo.js when a client requests example.com/view.php?file=foo.js.
  • Bonus: When a client requests example.com/static/*, I want to return the corresponding file in ~/.static.
  • I want this not to be horribly insecure. local redirection should always be limited to a single directory or file.

Remote Redirection

  • I want to redirect all requests to example.com to example.org.
  • I want to redirect all requests from example.com/v1/* to example.com/v2/*

Any other use cases you think we should cater for?

@naivekun
Copy link
Contributor

naivekun commented May 3, 2020

I'm not sure it is a commonly use case: Randomly add a line to a header from a file.
Example:
Replace request header with random User-Agent or X-Forwarded-For from a file. Each line of that file contains a entry. Just like --random-agent in sqlmap.

@naivekun
Copy link
Contributor

naivekun commented May 3, 2020

My suggestion to the syntax is:
/<filter-flow>/<work-mode>/<data-source>

  • <filter-flow> is for filter expression to select the flow
  • <work-mode> is to specify what to do

hrv/User-Agent for replace a header value (Header Replace Value)
hav/Test for simply add a header even key exist (Header Add Value)
hav+/Text for add a header, If key already exists, overwrite it (Header Add Value Plus)
havr/X-Forwarded-For for header random add (Header Add Value Randomly)

hdk/regexp for delete header which key matches regexp (Header Delete which Key match)
hdv/regexp for delete header which value matches regexp (Header Delete which Value match)
hd/regexp for delete header key or value matches regexp (Header Delete which k|v match)
...

  • <data-source> could be a string or path to file. if the separator typed twice, it is for file
    /foo-value for a string foo-value
    //file for a file from ./file
    ///etc/passwd for a file from /etc/passwd

@naivekun
Copy link
Contributor

naivekun commented May 3, 2020

what syntax parser need is cut the separator and extract <filter-flow> and <data-source>
Then pass the <work-mode> part and the separator to "work mode switcher".

@naivekun
Copy link
Contributor

naivekun commented May 3, 2020

I got some heavy work recently. Will start coding in next week. XP

@mplattner
Copy link
Member

I am working on this issue and plan to tackle SetHeaders first. I think ModifyHeaders is a good name, as suggested by @naivekun and @Prinzhorn.
Does it make sense to have the following cli options?

  • --add-header /<header-name>/[@]<header-value>[/flow-filter] to add a new (but not replace an existing) header
  • --replace-headers /<header-name-regex>/[@]<header-value>[/flow-filter] to replace the value of an existing header with the given value
    • Alternative (no regex): --replace-header /<header-name>/[@]<header-value>[/flow-filter] to replace the value of an existing header with the given value
  • --set-header /<header-name>/[@]<header-value>[/flow-filter] to combine add and replace
  • --remove-headers /<header-name-regex>[/flow-filter] to remove headers

Note that only --replace-headers and --remove-headers use regex to search existing headers.
I'm not sure if --replace-headers should use regex or also just a plain header-name. Is there a use case to replace multiple existing headers with the same value? (I can't think of one.)
I'm also not sure if we need separate add, replace, and set options at all.
A single --set-headers alone might be enough (but then the user can't control whether existing headers are overwritten.)

Btw, did I miss something or does the current implementation indeed only support to set header replacements via a cli argument and not within the interactive GUI?

What do you think of having the flow-filter expression last, i.e., /<header-name>/[@]<header-value>[/flow-filter] instead of [/flow-filter]/<header-name>/[@]<header-value>?
The flow-filter is optional and should come after mandatory options.
This might also be more convenient if the common use cases do not need a flow-filter. (I haven't used this feature enough to judge.)

@mhils, regarding

Is there a "main stream" use cases for dynamic replacements, i.e. s/foo/bar/ in headers? I don't have one.

You mean to search and replace within all header values without filtering for header name first?
If so, I'm not sure. Maybe replacing a domain, e.g., production URL with testing URL, but I guess this is not main-stream and it should probably be done with one of the methods above, as replacing in all headers can likely result in replacements that the user did not intend/anticipate.

@mhils
Copy link
Member Author

mhils commented Jun 24, 2020

I am working on this issue and plan to tackle SetHeaders first. I think ModifyHeaders is a good name, as suggested by @naivekun and @Prinzhorn.

Looking a this again I feel like we could go for xxxheaders and xxxbody options, where xxx is one of replace/modify/set/... . This has the added benefit that it signals that "replacements" work on the body only.

Btw, did I miss something or does the current implementation indeed only support to set header replacements via a cli argument and not within the interactive GUI?

You can modify the setheaders option from within mitmproxy, and I believe also from within mitmweb. There is an argument to be had that a more specialized editor would be nice, but let's get the basics right first.

I'm also not sure if we need separate add, replace, and set options at all.

Right - you can pretty much emulate add/replace with a flow filter such as ~h Host: example or !~h Cookie. Similarly, setting an empty header can be interpreted as deletion. I personally feel taht just having a single option is really good enough, and consistent with body replacement. If you all think that having more options make more sense, we can go with that though.

What do you think of having the flow-filter expression last.

Yes, excellent suggestion. 👍

@mhils
Copy link
Member Author

mhils commented Jul 2, 2020

For the map addon we plan to have two separate addons, MapRemote and MapLocal, right?

Yes! I think these are very distinct use-cases.

Regarding MapRemote, I thought it'd be nice to intercept the request before it is sent to the server, such that the "original" resource isn't (wastefully) requested, before it is replaced with the response of the replaced URL. I looked at eventsequence.Events and there does not seem to be something like "before-request". Am I following the wrong route here?

request triggers once we have received the request, before it is sent upstream. We essentially want to do this:

def request(flow: http.HTTPFlow) -> None:
# pretty_host takes the "Host" header of the request into account,
# which is useful in transparent mode where we usually only have the IP
# otherwise.
if flow.request.pretty_host == "example.org":
flow.request.host = "mitmproxy.org"

The request is not transmitted to the original host in this example.

For MapLocal, it's a bit different:

if flow.request.pretty_url == "http://example.com/path":
flow.response = http.HTTPResponse.make(
200, # (optional) status code
b"Hello World", # (optional) content
{"Content-Type": "text/html"} # (optional) headers
)

By setting .response we can avoid the entire upstream side.

Regarding this, another question is whether we want to replace only the body only or the entire response (including headers)?

MapRemote: return whatever upstream sent us, MapLocal: We probably need to make up some mime-type headers? Let's tackle MapRemote first! :)

@mplattner
Copy link
Member

That makes a lot of sense, thanks. After posting my comment I found that the request event is basically the before-request I was looking for. :)

At the moment I apply the replacement on request.pretty_url and assign the result to request.url. I think that gives way more flexibility, e.g., something like mitmproxy --map-remote "^.*.jpg.*^https://example.org/image.jpg" to replace all images of a page. Your use-case is possible as well using the rule /example.org/mitmproxy.org/.

@Kriechi
Copy link
Member

Kriechi commented Dec 13, 2020

@mplattner would you consider this complete? if no, please open a new issue with remaining tasks.

@Kriechi Kriechi closed this as completed Dec 13, 2020
@mhils
Copy link
Member Author

mhils commented Dec 13, 2020

Yes, complete. 😃 👍

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/core kind/feature New features / enhancements RFC
Projects
None yet
Development

No branches or pull requests

5 participants