Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make Cowboy CORS friendly #947

Open
manifest opened this issue Feb 11, 2016 · 24 comments
Open

Make Cowboy CORS friendly #947

manifest opened this issue Feb 11, 2016 · 24 comments

Comments

@manifest
Copy link

Maybe we could make Cowboy a bit more CORS friendly?
By now, to enable CORS we should work with raw headers, copy-pasting code like this from one header to another:

options(Req, State) ->
  Req1 = cowboy_req:set_resp_header(<<"access-control-max-age">>, <<"1728000">>, Req0),
  Req2 = cowboy_req:set_resp_header(<<"access-control-allow-methods">>, <<"HEAD, GET, POST">>, Req1),
  Req3 = cowboy_req:set_resp_header(<<"access-control-allow-headers">>, <<"content-type, authorization">>, Req2),
  Req4 = cowboy_req:set_resp_header(<<"access-control-allow-origin">>, <<$*>>, Req3),
  {ok, Req, State}.

In this way we do not actually care about allowed header's value types. For instance, "access-control-allow-methods" and "access-control-allow-headers" should only accept a list of binaries, while only a numeric value makes sense for the "access-control-max-age" header, "*" origin cannot be used for a resource that supports credentials. Ignoring this information leads to misspelling errors, breaking CORS responses and isn't flexible for working with.

My proposal is to add two new functions (set_cors_headers/2 and set_cors_preflight_headers/2) to the cowboy_req module extending functionality of set_resp_header/3.

-type cors_header()
  :: {origin, binary() | list(binary())}.

-type cors_preflight_header()
  :: {age, max | non_neg_integer()}
   | {methods, list(binary())}
   | {headers, list(binary())}
   | cors_header().

-spec set_cors_headers(list(cors_header()), Req) -> Req when Req :: cowboy_req:req().
-spec set_cors_preflight_headers(list(cors_preflight_header()), Req) -> Req when Req :: cowboy_req:req().

In my projects, I'm using this simple syntax sugar in REST handlers:

allowed_methods(Req, State) ->
  Req2 = cowboy_req:set_cors_headers([{origin, <<$*>>}], Req),
  {[<<"HEAD">>, <<"GET">>, <<"POST">>, <<"OPTIONS">>], Req2, State}.

options(Req, State) ->
  Headers =
    [ {age, max},
      {origin, <<$*>>},
      {methods, [<<"HEAD">>, <<"GET">>, <<"POST">>]},
      {headers, [<<"content-type">>, <<"authorization">>]} ],
  {ok, cowboy_req:set_cors_preflight_headers(Headers, Req), State}.

It's also possible to use those functions in a middleware:

execute(Req, Env) ->
  case cowboy_req:method(Req) of
    <<"OPTIONS">> ->
      Headers =
        [ {age, max},
          {origin, <<$*>>},
          {methods, [<<"HEAD">>, <<"GET">>, <<"POST">>]},
          {headers, [<<"content-type">>, <<"authorization">>]} ],
      {ok, cowboy_req:set_cors_preflight_headers(Headers, Req), Env};
    _ ->
      {ok, cowboy_req:set_cors_headers([{origin, <<$*>>}], Req), Env}
  end.

I believe those functions could make Cowboy yet more friendly and useful tool and would love to make a pull request, if it's ok with Cowboy's philosophy.

@manifest
Copy link
Author

I've just made a pull request for this feature. I've also changed type descriptions a bit:

-type cors_allowed_origins() :: [binary()] | binary().
-type cors_allowed_methods() :: [binary()].
-type cors_allowed_headers() :: [binary()].
-type cors_max_age() :: non_neg_integer() | max.
-type cors_header() :: {origins, cors_allowed_origins()}
  | {exposed_headers, cors_allowed_headers()}
  | {credentials, boolean()}.
-type cors_preflight_header() :: {origins, cors_allowed_origins()}
  | {methods, cors_allowed_methods()}
  | {headers, cors_allowed_headers()}
  | {credentials, boolean()}
  | {max_age, max | non_neg_integer()}.

@manifest
Copy link
Author

manifest commented Mar 1, 2016

From the server-side perspective, we need to parse just "Origin", "Access-Control-Request-Method" and "Access-Control-Expose-Headers" headers. Other CORS headers are produced by the server. So, it looks like we should create functions:

access_control_allow_credentials/1
access_control_allow_headers/1
access_control_allow_methods/1
access_control_allow_origin/1
access_control_expose_headers/1
access_control_max_age/1

in the module cow_http_hd along with (or instead of) these ones:

% @todo -export([parse_access_control_allow_credentials/1]). CORS
% @todo -export([parse_access_control_allow_headers/1]). CORS
% @todo -export([parse_access_control_allow_methods/1]). CORS
% @todo -export([parse_access_control_allow_origin/1]). CORS
% @todo -export([parse_access_control_expose_headers/1]). CORS
% @todo -export([parse_access_control_max_age/1]). CORS

In this way, we would have all the parsing/producing headers code within the cow_http_hd module. Is it a good idea?

@essen
Copy link
Member

essen commented Mar 1, 2016

Cowlib is both for clients and servers. I understand your concerns are only for the server, but if you'd like to also contribute client parsing code, be my guest. :-) Though AFAIK this wouldn't usually be useful to plain HTTP clients so no worries there.

Otherwise, the names sound correct, if they take Erlang representation and return an iodata() for the header value.

@essen
Copy link
Member

essen commented Mar 31, 2016

My notes on CORS so far:

  • access-control-allow-origin: either "*" or match the configured list against the "origin" header and send back the "origin" header if it matches, disabling CORS otherwise. REST should have a callback for providing the list of allowed origins; default to [] is fine (disallow CORS entirely)
  • access-control-allow-credentials: for REST should be "true" only if is_authorized/2 is defined
  • access-control-expose-headers: REST should have a callback, it's fairly resource specific; default to []
  • access-control-max-age: REST should have a callback; default to 30 minutes is fine
  • access-control-allow-methods: same as allowed_methods/2, but might make sense to have a callback (wait for a use case)
  • access-control-allow-headers: REST should have a callback; default to same as access-control-request-header

I will add more notes as I go through your PRs this week-end.

@essen
Copy link
Member

essen commented Mar 31, 2016

I added a note in the Cowboy PR. I'm not convinced by the interface though. The functions are a little too magical to my taste.

The way I see it, only the Origin matching is tricky, so we definitely need a function for that, otherwise cowboy_req:header/3 and cowboy_req:set_resp_header/3 are more than enough. I don't see why the match methods/headers are doing for example, this concern is strictly client-side, as far as the server is concerned it should just send what it allows and be done with it.

And the rest of the code is just that, setting headers. So considering cowlib will be fully documented in 2.0, I don't think we need much more than Origin matching and REST support.

@manifest
Copy link
Author

match_cors_method/2 and match_cors_headers/2 functions reduce the requested values to those which are allowed by the server. Don't know actually is it would be better then just returning a whole list of allowed values which is can be unbounded according to the spec. Can't remember why I decided throwing nomatch_method and nomatch_header instead of just dropping values, don't think we need it there.

As for the whole thing. Yes, those functions are a little magical to my taste too, and I like the idea of the deeper integration of CORS to the REST handler, but we also have the middleware layer, plain and loop handlers. And there we can reuse some build-in functions or copy-pasting some template code each time. I prefer the first. In any case we need something build-in. Having just set_cors_origin_header/2 don't look less magical for me than set_cors_preflight_headers/2 and set_cors_headers/2.

@essen
Copy link
Member

essen commented Mar 31, 2016

I don't mean set_cors_origin_header, I mean none of it should be in cowboy_req. All we need is a way to go from Origin header + list of allowed origins to CORS/not CORS and with what origin. The rest can be done with set_resp_header. I think good documentation > magic functions. CORS enables many different cases, we can't think of them all, so the interface should keep it simple.

We do have more than REST, but REST is the only place where we actually care about semantics, the other handlers/middlewares are more of a "write your own" so set_resp_header is fine.

The spec says the list is without bounds but in practice people are going to whitelist a handful of headers at most (github has about 10 there for example) so it makes little sense to spend processing power on that, especially considering the browser can then cache for all headers in one go.

@manifest
Copy link
Author

We also need to validate that case with allowed credentials and Origin. So, the way is a little bit further: Origin header + list of allowed origins, validation of Credentials and Origin headers and the rest.

@essen
Copy link
Member

essen commented Mar 31, 2016

We don't really need to validate that case. We only need to if we return "" from that function. I think we shouldn't. Either the service uses "" and doesn't need to call that function, or it doesn't use "*" and will call the function and get a specific Origin (or disable CORS).

@manifest
Copy link
Author

manifest commented Apr 1, 2016

What shall we return to a user when we use "*" on the server and have got a value of origin header equals to "null" or it includes unknown to the server URI scheme?

@essen
Copy link
Member

essen commented Apr 1, 2016

Origin | false should work well, where false disables CORS (if you got false you don't even need to bother sending CORS headers), and Origin is the one selected.

@manifest
Copy link
Author

manifest commented Apr 1, 2016

The globally unique identifier (an erlang reference in our case, result of parsing "null" or unknown origin representation) is a valid origin. Receiving it, we shouldn't terminate the cross-origin resource processing. Instead, we should match it with our list of allowed origins or "". And as for the "" we must grant access to the resource.

  1. Comparing Origins

Two origins are "the same" if, and only if, they are identical. In
particular:

o If the two origins are scheme/host/port triples, the two origins
are the same if, and only if, they have identical schemes, hosts,
and ports.

o An origin that is a globally unique identifier cannot be the same
as an origin that is a scheme/host/port triple.

So, Origin | false is not enough. We need Origin | reference() for the internal representation of origin. Even more than that, the Origin we got by parsing a request's header is not a single value, but a list. So, actually it would be [Origin | reference()].

The value which a user will get back on the response is a different story. We could get it from request's header for the "*". I think, It's expected behaviour for most of user agents. The question is which one from the request's origin list we should use? The last one?

@essen
Copy link
Member

essen commented Apr 3, 2016

I am talking about what we send in response. Not sure what most of your comment refers to.

Basically we get a request in, and want to match the Origin header against the origins we allow. Two things can happen: either there is a match, or there isn't. The list of origins we allow is either a detailed list of origins [<<"ninenines.eu">>, <<"erlang.org">>] or '*'.

The case of references is special and I'm not sure we really care about it? From what I understand it's basically for file:// or data:// or similar. We can either accept those or we don't. It's a special case though, so I have no problem leaving it aside and adding a separate match function with arity+1 that would accept those should it ever be needed. But until we have a use case, it's not worth spending too much time on it.

@manifest
Copy link
Author

manifest commented Apr 4, 2016

My thoughts according that matching function.

  1. The user agent may send an origin header with the "null" value.

http://tools.ietf.org/html/rfc6454#section-7.3
7.3. User Agent Requirements
Whenever a user agent issues an HTTP request from a "privacy-
sensitive" context, the user agent MUST send the value "null" in the
Origin header field.

It seems ok if we send "null" back to the user agent (for Chrome it works actually).
So, as long as our function returns something like {ok, binary()} | error it shouldn't be a problem.

  1. Multiple Origins

http://tools.ietf.org/html/rfc6454#section-7.2
7.2. Semantics
In some cases, a number of origins contribute to causing the user
agents to issue an HTTP request. In those cases, the user agent MAY
list all the origins in the Origin header field. For example, if the
HTTP request was initially issued by one origin but then later
redirected by another origin, the user agent MAY inform the server
that two origins were involved in causing the user agent to issue the
request.

https://www.w3.org/TR/cors/#resource-preflight-requests
6.2 Preflight Request
The Origin header can only contain a single origin as the user agent will not follow redirects.

In this case we have an origins list from the request's origin header, depending on request type (preflight or simple), we can perform our matching against an allowed origins list differently.

  • For preflight requests, we can just take a first value or even better terminate CORS processing steps if there more than one value in the request's origin header.
  • For simple requests, it seems we have to match all origins we got. We also should decide which one of them we return to the user agent.

So, in our matching function, we should match request's origin header and request's method against an allowed origins lists. It could be a good idea to pass a request object to it match_origin([binary()], cowboy:req()) -> {ok, binary()} | error.

  1. Corner cases

I encounter of some strange behaviour of Firefox in a case when a server returns a redirect to another origin on the cross-origin request (I can reproduce it). Resource is accepted by the user agent only when the server responds with "*" as a value of "access-control-allow-origin" headers. Seems like a bug in Firefox. So, probably, we shouldn't care about it.

  1. I still think this functionality should be within cowboy_req module.

@essen
Copy link
Member

essen commented Apr 4, 2016

1- OK let me ask you a different question: can you trigger the sending of "null" by a browser in a reasonably common manner? If not, we should just drop "null" entirely. Otherwise it should be configurable.

2- Same question, can you get a browser to send two origins? I can't even find an example of this happening, and even the Mozilla doc says it's a single URI: https://developer.mozilla.org/en-US/docs/Web/HTTP/Access_control_CORS#Origin

1- 2- Specs are nice and all but all that matters is implementations. :-)

3- We should care and document it. But it shouldn't impact the matching algorithm.

4- First we need a match function, then see what users do with it and take it from there. One step at a time.

@manifest
Copy link
Author

manifest commented Apr 4, 2016

2 - I've never seen it

1 - Yes, I made it yesterday

For that I set up the following:

Next, I send an ajax request from the web page on "http://localhost:8000" to the server handler "http://localhost:8002/redirect". Chrome replaced the origin header to "null" before performing the redirect. I logged requests on those servers, so you can see what happened:

fetch("http://localhost:8082/redirect", {method: 'GET', mode: 'cors'})
  .then(resp => resp.text())
  .then(data => console.log(data))
  .catch(error => console.log(error));
%% localhost:8082
=INFO REPORT==== 3-Apr-2016::20:05:24 ===
    #{bindings => [],
      body_length => 0,
      has_body => false,
      headers => #{<<"accept">> => <<"*/*">>,
        <<"accept-encoding">> => <<"gzip, deflate, sdch">>,
        <<"accept-language">> => <<"en-US,en;q=0.8">>,
        <<"connection">> => <<"keep-alive">>,
        <<"host">> => <<"localhost:8082">>,
        <<"origin">> => <<"http://localhost:8000">>,
        <<"referer">> => <<"http://localhost:8000/">>,
        <<"user-agent">> => <<"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/51.0.2698.0 Safari/537.36">>},
      host => <<"localhost">>,
      host_info => undefined,
      method => <<"GET">>,
      path => <<"/redirect">>,
      path_info => undefined,
      pid => <0.156.0>,
      port => 8082,
      qs => <<>>,
      ref => http_listner,
      resp_headers => #{<<"access-control-allow-origin">> => <<"http://localhost:8000">>},
      scheme => <<"http">>,
      streamid => 1,
      version => 'HTTP/1.1'}

%% localhost:8083
=INFO REPORT==== 3-Apr-2016::20:05:24 ===
    #{bindings => [],
      body_length => 0,
      has_body => false,
      headers => #{<<"accept">> => <<"*/*">>,
        <<"accept-encoding">> => <<"gzip, deflate, sdch">>,
        <<"accept-language">> => <<"en-US,en;q=0.8">>,
        <<"access-control-request-headers">> => <<"x-devtools-emulate-network-conditions-client-id">>,
        <<"access-control-request-method">> => <<"GET">>,
        <<"connection">> => <<"keep-alive">>,
        <<"host">> => <<"localhost:8083">>,
        <<"origin">> => <<"null">>,
        <<"referer">> => <<"http://localhost:8000/">>,
        <<"user-agent">> => <<"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/51.0.2698.0 Safari/537.36">>},
      host => <<"localhost">>,
      host_info => undefined,
      method => <<"OPTIONS">>,
      path => <<"/ping">>,
      path_info => undefined,
      pid => <0.156.0>,
      port => 8083,
      qs => <<>>,
      ref => http_listner,
      resp_headers => #{<<"access-control-allow-headers">> => <<"X-DevTools-Emulate-Network-Conditions-Client-Id">>,
        <<"access-control-allow-methods">> => <<"GET">>,
        <<"access-control-allow-origin">> => <<"null">>,
        <<"access-control-max-age">> => <<"0">>},
      scheme => <<"http">>,
      streamid => 1,
      version => 'HTTP/1.1'}

=INFO REPORT==== 3-Apr-2016::20:05:24 ===
    #{bindings => [],
      body_length => 0,
      has_body => false,
      headers => #{<<"accept">> => <<"*/*">>,
        <<"accept-encoding">> => <<"gzip, deflate, sdch">>,
        <<"accept-language">> => <<"en-US,en;q=0.8">>,
        <<"connection">> => <<"keep-alive">>,
        <<"host">> => <<"localhost:8083">>,
        <<"origin">> => <<"null">>,
        <<"referer">> => <<"http://localhost:8000/">>,
        <<"user-agent">> => <<"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/51.0.2698.0 Safari/537.36">>},
      host => <<"localhost">>,
      host_info => undefined,
      method => <<"GET">>,
      path => <<"/ping">>,
      path_info => undefined,
      pid => <0.156.0>,
      port => 8083,
      qs => <<>>,
      ref => http_listner,
      resp_headers => #{<<"access-control-allow-origin">> => <<"null">>},
      scheme => <<"http">>,
      streamid => 2,
      version => 'HTTP/1.1'}

@essen
Copy link
Member

essen commented Apr 4, 2016

2- Then let's drop it and come back to it only when it's observed in the wild.

1- Interesting. One thing for sure: it is entirely dependent on the service. Basically do you accept other services to redirect requests to yours. And I think the default should be to reject that behavior. But it should be configurable.

Maybe we can just use "null" for this indeed, but that means the matching must be a little special ("null" must match references), whereas the other matching is exact matches. I'm OK with something like this.

@essen
Copy link
Member

essen commented Apr 4, 2016

An insightful tidbit of information:

If the server specifies an origin host rather than "*", then it must also include Origin in the Vary response header to indicate to clients that server responses will differ based on the value of the Origin request header.

https://developer.mozilla.org/en-US/docs/Web/HTTP/Access_control_CORS#Access-Control-Allow-Origin

@manifest
Copy link
Author

manifest commented Apr 30, 2016

Could be our function:

-spec match_origin([{binary(), binary(), 0..65535} | reference()] | '*', cowboy_req:req()) -> {ok, binary()} | error.
match_origin(AllowedOrigins, #{headers := #{<<"origin">> := HeaderVal}}) ->
  %% NOTE: we assume we always deal with single origin
  [Val] = cow_http_hd:parse_origin(HeaderVal),
  case {Val, AllowedOrigins} of
    {Val, '*'} when is_reference(Val) -> {ok, HeaderVal};
    {Val, '*'} -> {ok, HeaderVal};
    {Val, Val} -> {ok, HeaderVal};
    {Val, L} when is_list(L) -> case lists:member(Val, L) of true -> {ok, HeaderVal}; _ -> error end;
    _ -> error
  end;
match_origin(_, _) ->
  error.

How we can use it in a middleware:

-spec execute(Req, Env) -> {ok | stop, Req, Env} when Req :: cowboy_req:req(), Env :: any().
execute(Req, Env) ->
  %%AllowedOrigins = '*',
  AllowedOrigins =
    [ {<<"http">>, <<"localhost">>, 9000},
      {<<"http">>, <<"example.org">>, 80} ],

  case match_origin(AllowedOrigins, Req) of
    {ok, Origin} -> {ok, handle_cors_request(Origin, Req), Env};
    error -> {ok, Req, Env}
  end.

-spec handle_cors_request(binary(), cowboy_req:req()) -> cowboy_req:req().
handle_cors_request(Origin, #{method := Method} = Req) ->
  Req2 = cowboy_req:set_resp_header(<<"access-control-allow-origin">>, Origin, Req),
  Req3 = cowboy_req:set_resp_header(<<"vary">>, <<"Origin">>, Req2),
  case Method of
    <<"OPTIONS">> ->
      Req4 = cowboy_req:set_resp_header(<<"access-control-allow-methods">>, <<"GET,PUT">>, Req3),
      Req5 = cowboy_req:set_resp_header(<<"access-control-allow-headers">>, <<"X-DevTools-Emulate-Network-Conditions-Client-Id">>, Req4),
      Req6 = cowboy_req:set_resp_header(<<"access-control-max-age">>, <<"0">>, Req5),
      Req6;
    _ ->
      Req3
  end.

@manifest
Copy link
Author

manifest commented Apr 30, 2016

Or we could move the matching logic to the cowlib.

-spec match_origin(Origin, [Origin] | '*') -> boolean() when Origin :: {binary(), binary(), 0..65535} | reference().
match_origin(Val, '*') when is_reference(Val) -> true;
match_origin(_, '*') -> true;
match_origin(Val, Val) -> true;
match_origin(Val, L) when is_list(L) -> lists:member(Val, L);
match_origin(_, _) -> false.

Middleware:

-spec execute(Req, Env) -> {ok | stop, Req, Env} when Req :: cowboy_req:req(), Env :: any().
execute(#{headers := #{<<"origin">> := HeaderVal}} = Req, Env) ->
  %%AllowedOrigins = '*',
  AllowedOrigins = 
    [ {<<"http">>, <<"localhost">>, 9000},
      {<<"http">>, <<"example.org">>, 80} ],

  %% NOTE: we assume we always deal with single origin
  [Val] = cow_http_hd:parse_origin(HeaderVal),
  case cow_http_hd:match_origin(Val, AllowedOrigins) of
    true -> {ok, handle_cors_request(HeaderVal, Req), Env};
    _ -> {ok, Req, Env}
  end;
execute(Req, Env) ->
  {ok, Req, Env}.

@manifest
Copy link
Author

manifest commented Jun 4, 2016

@essen, what do you think about merging CORS related parsing/producing functions for cowlib?

@essen
Copy link
Member

essen commented Jun 5, 2016

Yep will do. Almost done fixing a breaking bug, so probably after that.

@manifest
Copy link
Author

@essen, what do you also think about putting the match_origin(Origin, [Origin] | '*') -> boolean() function into cowlib (perhaps, check_origin is a better name)? I could make a PR.

@essen
Copy link
Member

essen commented Jun 17, 2016

In theory I have nothing against it but right now I've been away from CORS too long so I don't want to make too hasty decisions. I think I will want to have an rfc6454 test suite in Cowboy before deciding on where things should go. And that is largely dependent on adding CORS support to cowboy_rest.

@essen essen added this to the 2.0 New Features milestone Feb 4, 2017
@essen essen removed this from the 2.0 New Features milestone Oct 2, 2017
@essen essen added this to the 2.4 "CORS" milestone Dec 5, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants