Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Config-based request and response header transformation #21

Closed
davidni opened this issue Mar 19, 2020 · 25 comments · Fixed by #225
Closed

Config-based request and response header transformation #21

davidni opened this issue Mar 19, 2020 · 25 comments · Fixed by #225
Assignees
Labels
Priority:0 Used for divisional .NET planning Type: Enhancement New feature or request
Projects

Comments

@davidni
Copy link
Contributor

davidni commented Mar 19, 2020

In Island Gateway we implemented the notion of Transformations, which takes inspiration from Traefik's middlewares (though we're not calling them middlewares for obvious reasons). See e.g. Traefik's StripPrefix.

A user can specify e.g. Transformations=StripPrefix('/api'), AddPrefix('/v2'), which results in the listed processing steps being applied sequentially at runtime.

The ability to specify simple transformations like these through config is a requirement for us.

Example of a route using transformations:

Rule            = Host('example.com') && Path('/myservice/{**catchall}')
Transformations = StripPrefix('/myservice')

(this is a feature we already implemented in Island Gateway, filing here as requested to track)

@analogrelay analogrelay added this to Backlog in YARP Planning Apr 8, 2020
@analogrelay analogrelay moved this from Unscheduled to 1.0.0-preview1 in YARP Planning Apr 8, 2020
@analogrelay
Copy link
Contributor

I'd like us to start expanding on this feature a bit (ping @Tratcher @halter73).

I'm wary of setting up a "language", but we could have a list of transformations:

{
	"ReverseProxy": {
		"Routes": [
			{
				"RouteId": "blah",
				"Match": {
					"Path": "/api/v1/myservice/{**catchAll}"
				},
				"Transform": [
					{ "Transformation": "StripPrefix", "Prefix": "/api/v1/myservice" },
					{ "Transformation": "AddPrefix", "Prefix": "/apis" }
					{ "Transformation": "AddHeader", "Headers": { "X-MyHeader": "myVal" } }
					{ "Transformation": "StripHeader", "Headers": [ "X-SomeUnhelpfulIncomingHeader" ] }
				]
			}
		]
	}
}

That would transform an incoming request like this:

GET /api/v1/myservice/foo/bar/baz HTTP/1.1
X-SomeUnhelpfulIncomingHeader: Foo

Into an outgoing request:

GET /apis/foo/bar/baz HTTP/1.1
X-MyHeader: myVal

(with other headers elided)

Basically the idea is that the "Proxy" component copies everything from the incoming request to the outgoing response (all the headers, path, etc.) and then run the transformations over the outgoing response.

There are some problems here:

  1. The config system is not really great at handling lists, because it flattens things out to simple key-value pairs. That's a problem for other places too though (Routes)
  2. It's more verbose. To some degree I think this is a limitation of JSON (YAML config provider anyone?). Maybe the language syntax is better here?

@analogrelay analogrelay added this to the 1.0.0-preview1 milestone Apr 8, 2020
@analogrelay analogrelay added the Type: Enhancement New feature or request label Apr 9, 2020
@Tratcher
Copy link
Member

The verbosity could be reduced:

"Transform": [
  { "StripPrefix": "/api/v1/myservice" },
  { "AddPrefix": "/apis" }
  { "AddHeader": { "X-MyHeader": "myVal" } }
  { "StripHeaders": [ "X-SomeUnhelpfulIncomingHeader" ] }
]

@Tratcher
Copy link
Member

This is a space with a lot of prior art, we should do some comparisons.

@samsp-msft
Copy link
Contributor

@davidni I suspect I know the reason, but why is this driven by config rather than code? How often are routes created in your world that need this kind of capability?
The reason I am asking is to try and form a philosophy for what we need to support in config vs writing app code. I think one of the distinguishing features of YARP is that you can create your own project using its libraries as a starting point, and can then add custom logic in code. That's not typically possible in other proxies without re-building the entire proxy from scratch. With YARP you aren't rebuilding the proxy library, just your instantiation of the existing modules, plus your custom glue.

@Tratcher
Copy link
Member

Tratcher commented Apr 13, 2020

How often are routes created in your world that need this kind of capability?

Altering the path is very common, adding or stripping a prefix, and this would be specific per route. Adding and removing headers is also common, usually headers directly related to the proxy activity like x-forwarded-. I'd expect us to provide built in support for things like x-forwarded- so manually adding them per route is less of a concern.

Again, this is a case where simple common operations should be possible via config, and advanced ones via code.

@analogrelay analogrelay changed the title Feature: URL transformation Request and Response transformation Apr 13, 2020
@analogrelay analogrelay changed the title Request and Response transformation Config-based request and response transformation Apr 13, 2020
@analogrelay analogrelay added this to To Do in Active Work Apr 14, 2020
@Kahbazi
Copy link
Collaborator

Kahbazi commented Apr 23, 2020

I will work on this issue if it's ok.

@Tratcher
Copy link
Member

@Kahbazi it needs some design first. You're welcome to make some proposals on this issue.

@Kahbazi
Copy link
Collaborator

Kahbazi commented Apr 23, 2020

I have implemented most of these. I could create a draft PR to discuss there if needed.

These classes are for configs

public abstract class Transform {}
public class AddHeaderTransform : Transform {string Key; string Value;}
public class AddPrefixTransform : Transform {string Prefix;}
public class StripPrefixTransform : Transform {string Prefix;}
public class StripHeaderTransform : Transform {string Key;}

Add a list of Transform to ProxyRoute

public sealed class ProxyRoute
{
    public IReadOnlyList<Transform> Transform { get; set; }
}

This could be in config and and since it can't be directly bind to Transform instances ,we could use services.Configure<ProxyConfigOptions>(action) to create IReadOnlyList<Transform> from IConfiguration

"Transform": [
  { "StripPrefix": "/api/v1/myservice" },
  { "AddPrefix": "/apis" }
  { "AddHeader": { "X-MyHeader": "myVal" } }
  { "StripHeaders": [ "X-SomeUnhelpfulIncomingHeader" ] }
]

Introduce a new UpStreamRequest class which holds the needed property from current request

public class UpStreamRequest
{
    public string Method { get; set; }
    public Stream Body { get; set; }
    public Uri Uri { get; set; }
    public string Protocol { get; set; }
    public IHeaderDictionary Headers { get; set; }
}

These classes are for runtime models

public abstract class Transformer
{
    public abstract ValueTask Transform(UpStreamRequest upStreamRequest);
}
public class StripPrefixTransformer : Transformer {}
public class StripHeaderTransformer : Transformer {}
public class AddPrefixTransformer : Transformer {}
public class AddHeaderTransformer : Transformer {}

Add a list of Transformer to RouteConfig

internal sealed class RouteConfig
{
       public IReadOnlyList<Transformer> Transformers { get; }
}

Use ITransformerFactory in RuntimeRouteBuilder to create Transformer from Transform in Build(...) method

public class TransformerFactory : ITransformerFactory
{
    public virtual Transformer Create(Transform transform)
    {
        return transform switch
        {
            StripHeaderTransform t => new StripHeaderTransformer(t.Key),
            AddHeaderTransform t => new AddHeaderTransformer(t.Key, t.Value),
            StripPrefixTransform t => new StripPrefixTransformer(t.Prefix),
            AddPrefixTransform t => new AddPrefixTransformer(t.Prefix),
            _ => throw new NotImplementedException()
        };
    }
}

Create a UpStreamRequest instace from HttpContext.Request and apply transformers to it in ProxyInvokerMiddleware and pass UpStreamRequest to HttpProxy.

@Tratcher
Copy link
Member

Introduce a new UpStreamRequest class which holds the needed property from current request

This I expect is the hardest part of the design: When to do the transformations and on what object model.

  • We want code and config based transformations to use a similar system for consistency.
  • It would be nice to write code based transformations with normal middleware, but directly modifying the HttpRequest has repercussions for other features in the pipeline (retry, mirroring, etc.).
  • A custom type like UpStreamRequest limits what can be transformed.
  • Modifying the HttpRequestMessage before it's sent gives you a good object model to work with and avoids the side effects of modifying the HttpContext, except modifying the url there is harder.

Create a UpStreamRequest instace from HttpContext.Request and apply transformers to it in ProxyInvokerMiddleware and pass UpStreamRequest to HttpProxy.

This results in copying the information multiple times.

How about this? Do the transformations in HttpProxy.ProxyAsync in the process of building the HttpRequestMessage.

  • First you do url transformations while you still have the constituent parts (PathBase, HostString, PathString, QueryString, etc.). This makes the individual parts easier to work with as well as helps ensure the correct encodings.
  • Do header transformations as you copy the headers to the HttpRequestMessage/HttpContent. That lets HttpContext avoid modification and prevents extra operations on either collection.
  • We'll likely need a way for AspNetCore middleware to queue up transformations to run at this stage.
  • Response transformations will need something similar. Response headers are locked as soon as we start copying the body so it can be hard for middleware to modify them without intercepting the full response body.

@Kahbazi
Copy link
Collaborator

Kahbazi commented Apr 28, 2020

We want code and config based transformations to use a similar system for consistency.

Do you mean that you prefer we use configuration binding directly to get transform instances?

First you do url transformations while you still have the constituent parts (PathBase, HostString, PathString, QueryString, etc.). This makes the individual parts easier to work with as well as helps ensure the correct encodings.

So do we need a class which holds theses parts and pass it to IUriTransformer and then create the Uri?

First you do url transformations ...
Do header transformations as you copy the headers ...

Does it mean we need to have multiple abstraction for transformers like IUriTransformer and IHeaderTransformer to know when to execute each? Should we allow a custom transofmer? One that could change Version, Method or even add Properties?

@Tratcher
Copy link
Member

We want code and config based transformations to use a similar system for consistency.

Do you mean that you prefer we use configuration binding directly to get transform instances?

I wasn't trying to imply a specific design with this point, just outlining principals.

First you do url transformations while you still have the constituent parts (PathBase, HostString, PathString, QueryString, etc.). This makes the individual parts easier to work with as well as helps ensure the correct encodings.

So do we need a class which holds theses parts and pass it to IUriTransformer and then create the Uri?

Probably? If the transformation were a single step we could avoid the intermediate object and just pass in the HttpContext and proxy data, but having multiple granular transformations means we need intermediate state of some kind.

First you do url transformations ...
Do header transformations as you copy the headers ...

Does it mean we need to have multiple abstraction for transformers like IUriTransformer and IHeaderTransformer to know when to execute each? Should we allow a custom transofmer? One that could change Version, Method or even add Properties?

It seems like there's a hierarchy of transformation types (transform, uri transform, path transform, header transform, header add transform, version, method, etc.). And yes, every field of the HttpRequestMessage/HttpContent should be customizable in some way. That doesn't mean all of those need to be exposed as config based transformations, only the common ones (headers, url). Really advanced customizations could require plugging in a DelegatingHandler. Intermediate ones could be defined in code.

@Tratcher
Copy link
Member

Tratcher commented May 1, 2020

Clarification: When adding a header it's important to specify if this is an append or replace operation. You end up with two transformations:
AppendHeader - Adds to an existing comma separated list
SetHeader - Replaces any existing value.

@Kahbazi
Copy link
Collaborator

Kahbazi commented May 1, 2020

I can think of another header transformation that gets the value from an incoming header and send it with via another header. We could use this in out current system.

@halter73
Copy link
Member

halter73 commented May 1, 2020

When would you want to append?

@Tratcher
Copy link
Member

Tratcher commented May 1, 2020

#133 "x-forwarded-*" headers append.

@halter73
Copy link
Member

halter73 commented May 1, 2020

Those aren't headers you'd add statically in config though. You might switch it on and off.

@Tratcher
Copy link
Member

Tratcher commented May 1, 2020

That's an area of high customization, so even if we had x-forwarded-proto built in people would still probably add ones we didn't support natively.

@guibirow
Copy link

guibirow commented May 2, 2020

Would be nice to have transformations for:

full path rewrite ignoring the structure of original path, given we might have the tokens already parsed(no need to parse it again).
e.g:
/foo/{id}/bar --> xyz/{id}
In the example above we would just create a new path using parsed parameters.

Drop all headers:
if we want to rebuild request headers structure, or response headers returned. optionally ignore dropping header previously set by the proxy or specific headers.

Move parameters from path to headers:
Let's say we have the following route /apis/{version}/myservice and path /apis/v1/myservice and I want to replace apis/v1 by /myservice and set header api-version to version.
Something like:
DropPrefix("/apis/{version}")
SetHeader("api-version", routeData["version"])

Alternatively we should be able to add middleware executing before a request is submitted to backend server, so we could write even more complex transformations.

One final question I have is how the ordering of transformations will affect the output? Is the output of one transformation passed to next one or they will just apply transformations to a request object.
I ask that because we could for example remove a custom header if another already exists or if a condition is meet, let's say switching authorization methods. It is a rough example, but this would not be possible if they just modify an object. Optionally we could add conditions before applying these transformations.

@karelz karelz modified the milestones: 1.0.0-preview1, 1.0.0 May 7, 2020
@karelz karelz moved this from 1.0.0-preview1 to 1.0.0-preview2 in YARP Planning May 7, 2020
@Tratcher Tratcher self-assigned this May 11, 2020
@Tratcher
Copy link
Member

Nginx comparisons

  • NGinx has a path regex based rewrite option: https://docs.nginx.com/nginx/admin-guide/web-server/web-server/#rewrite. This isn't specific to proxy request transformation, it's generic middleware. They don't have many proxy specific url transformation mechanics except prefix tweaking.
  • Most config entries support variables
  • proxy_pass url/backend - if the backend defines a path base then it replaces the route matched path. Includes support for variaibles, which replace the original path.
  • proxy_http_version - defaults to 1.0
  • proxy_method
  • proxy_pass_request_headers - enables/disables forwarding all request headers
  • proxy_set_header - set request header. Remove by setting empty. append/prepend using variables. sets Host to the destination host by default.
  • proxy_hide_header - strip response header value. Has several defined by default. proxy_pass_header allows defaults to be sent
  • add_header/trailer - add response header (not proxy specific) - Only for specific status codes unless marked as 'always'
  • proxy_cookie_domain/path - response Set-Cookie domain/path transformations
  • proxy_redirect - response Location header prefix substitution

@Tratcher
Copy link
Member

Tratcher commented May 12, 2020

I've done some sketching. Note this is focused on in-code transformations, but I'm putting it in this thread as this is where all the interesting discussion is happening and the config will need to build on top of this.

Request

I see two distinct types of transformations on the request: headers and everything else. I'm excluding Body transformations, those require a whole different level of infrastructure (e.g. custom middleware).

Request Parameters Transform - Applied before building the HttpRequestMessage

// The mutable state for applying a series of transformations
RequestParamtersTransformContext
- HttpContext
- Method
- Version // Or should that be controled by the destination?
- PathBase 
- Path 
- Query
// Excludes scheme and host/authority, those are set by the backend destination

RequestParametersTransform.Run(RequestParamtersTransformContext context) - abstract

PathStringTransformation : RequestParametersTransform
- Value PathString
- Prepend/Append/Set/RemoveStart/RemoveEnd Enum
- TransformPathBase/Path bool

PathTemplateTransformation : RequestParametersTransform
- Value string route template (supports route values?)
- Replaces whole path

// Might need to explore query scenarios more, they haven't come up in the discussion yet.
QueryStringTransformation : RequestParametersTransform
- Value string
- Prepend/Append/Set Enum

Headers

  • We don't want to change HttpRequest.Headers in case the request gets retried, nor do we want to deal with the split issue of HttpRequestMessage/Content.Headers.
  • There are asks (and NGinx supports) an option to suppress copying the request headers by default. That seems like a route level setting rather than a transformation. We'd still run transformations afterwards so you could add headers.
  • Ideally we'd transform headers as we copied from one object model to the other to avoid extra operations on the source or destination collections.
    • foreach header in HttpRequest.Headers
      • Look up any transformations for this header, run them.
      • Set result on HttpRequestMessage/Content
      • Record header as transformed (hashset)
    • foreach transformation in transformations
      • If Key was not already transformed, run transformations
HeaderTransform
- Name - header name string
- Transform(HttpContext, existingValues, out newValues)

HeaderValueTransform : HeaderTransform
- Values StringValues
- Enum Append, AppendCSV, Set, etc...

Trailers - Same as headers, different collection

Response

Responses consist of status, headers, and body, and only headers are in scope for transformation.

HeadersTransform - Same as request, applied during the copy loop

  • Conditionally applied like Nginx?

Trailers - Same as headers, different collection

  • Conditionally applied like Nginx?

App structure

  • Transformations would be defined on routes (in code or config)
  • When the route is selected then a IProxyTransformationsFeature will be initialized with any available transformations (much like how the list of available destinations flows). (EDIT: Maybe we can skip this part for now and have HttpProxy pull the transformations directly from the route. Mutating collections like this per request seems really in-efficient.)
  • Transformations are grouped in the following collections:
    • Request Parameter transforms
    • Request header transforms
    • Request trailers (not currently supported by some servers or HttpClient) transforms
    • Response header transforms
    • Response trailers transforms
  • Middleware could add additional transformations
  • HttpProxy will run the transformations as it builds the HttpRequestMessage or copies the response headers/trailers.

@espray
Copy link

espray commented May 13, 2020

You could draw some inspiration from Azure Function proxy and Azure Api Management on how they handle transformations.

@nemec
Copy link

nemec commented May 23, 2020

Maybe this is already the intended design, but I'd like to request that any SetHeader transformations skip the filters defined by _headersToSkipGoingUpstream (applied here).

At the moment, it's defined so that the Host header of your reverse proxy is never passed through to the Backend service, which is an understandable default, however it's important to have some way of manually setting the Host value that the Backend sees. Web pages that define sub-resources (js/css/etc.) that are also served by the same Backend need URLs that link to the reverse proxy's Host.

Some applications, such as grocy, simply reflect the Host header for all in-app URLs generated for its HTML output so in my current nginx config I manually set the header to the external domain name.

I see that the proxy passes an X-Forwarded-Host header by default, but unfortunately some applications don't seem to support it.

@Tratcher
Copy link
Member

@nemec absolutely. We've had similar feedback elsewhere that people need to flow or be able to set the Host header.

@pksorensen
Copy link

The design behind ProxyKit was really nice for extensibility and building upstream HttpRequestMessage - just copy most of that and performance optimize it. No reason to invent the wheel again?

I wrote a reverse proxy configuration format ontop of proxykit that was able to read : https://gist.github.com/pksorensen/5a0be2b57839ad8a89fe9509a9317aa9

Very similar to your cluster/routes concept.

I was very inspired by Nginx and implemented there routing format. So since this project kinda killed proxy kit, i hope that i will be able to swap proxykit out with yarp.

@Tratcher
Copy link
Member

Comments on closed issues are not tracked, please open a new issue with the details for your scenario.

@samsp-msft samsp-msft changed the title Config-based request and response transformation Config-based request and response header transformation Oct 22, 2020
@samsp-msft samsp-msft added the Priority:0 Used for divisional .NET planning label Jan 20, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Priority:0 Used for divisional .NET planning Type: Enhancement New feature or request
Projects
No open projects
Active Work
  
In Review
Development

Successfully merging a pull request may close this issue.