-
Notifications
You must be signed in to change notification settings - Fork 14.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Router bug with full request URL #3088
Comments
Yes, this is as it was designed to behave. |
The routing is only path name routing. If you are looking to better handle those types of URLs, you'd want to implement a top-level router that looks at the full URL and passes it to the appropriate router for the path part. I hope this answers your... statement? |
Oh, haha,I just noticed the issue title says "bug" :) I wouldn't classify this as a bug, more of a misunderstanding of how the routing works. We actually have tests that ensure we handle those full URLs like we are doing. There is a lot of discussion in the connect repo on the history, as that code originated from connect. The routing syntax is basically just method name + path name. Any other type of routing you need to layer on top of express as needed, for example the suggestion above if you are looking for that type of routing. Anyone can even make a new router implementation and plug it into express (just a function that takes req, res, next) which can do different things :) I'm not sure if this is well documented on our site, dispute it existing for multiple major versions of express. If not, we can open an issue to get improved docs around this :) ! |
It seems like something this unexpected (IMHO) should at least be noted in documentation somewhere. I could not find any such notes. This could be a real issue since there are bots (and/or malicious users looking for open proxies) that send full URLs and those requests inadvertently are responded to with a status code of 200 for Express servers with matching paths. |
I agree. I'm not sure if it is, but if not, please feel free to open an issue there or even a pull request :) ! We try hard on the docs and it's hard to remember everything, so we honestly missing things, I apologize.
If you are suggesting a security issue, please follow our security disclosure guidelines to discuss; this is not the appropriate forum, I'm sure you understand. |
Not suggesting a security issue per se, mostly just the unnecessary transfer of data. Any Express applications should already be protecting access to resources as usual. |
@mscdex, gotcha, sorry for the misunderstanding on my part. So I did just jot down some thoughts on the documentation end (because at the very least, Express 4.x and previous versions will have this behavior that needs to be documented) at the issue expressjs/expressjs.com#716 for tracking. I can see the unnecessary transfer of data part, but I mean, since the router itself is only a method + pathname router, without completely changing how the path syntax is, I'm not sure how it can take into account the scheme + host parts. I suppose we could just never route the request at all if those parts are present, but then it would be very difficult for people to even use the router at all to route those types of requests if that was their use-case. Express itself is "unopinionated" in this case, just routing based on the information Express was given: the method + pathname and simply ignoring all other parts (the scheme, host, query, etc.) and leaving that up to further user-land code currently. I was taking a look to determine if there is any traceable conversation from when this was added, but only ended up on the commit itself that added it: senchalabs/connect@7560a5a As far as I can tell, the purpose is as I have been living with so far: that the router literally only routes on what you have provided it (method + path) and ignores all the rest (scheme, host, query, etc.), leaving it up to the user-land to handle. Let me know if I'm totally off-base here or missing something :) I know this issue started off reporting a bug, which turned out to be "not a bug; functions as designed", but if you want to move to a conversation of questioning the design itself, that's defiantly OK, and I can re-open the issue (and probably change the title and initial post to make sense for the discussion starter). |
Maybe I'm in the minority here since the issue hadn't been raised previously, but IMHO I would expect a route like |
I also never understood the design decision for why FQDNs were supported in the request path either. Manually crafting these requests with a couple of domains (Cloudflare, Heroku, etc) results in a 400. In others (Google), it's a 404. Neither rewrite it to match by the path. I also would think of this as a bug, since there's no way to catch these requests when using Express and therefore no way to reject these types of requests. Does anyone have a reference to why we'd want to support it in a way that we ignore the original URL? It seems that the ideal recommendation would be that we push people to support this themselves by rewriting the path (if they want to support an FQDN path). I'm even having trouble finding what/why requests like this occur - does anyone have a reference to that either? |
Yes there is: app.use(function (req, res, next) {
if (url.parse(req.url).host) return res.sendStatus(400);
next();
});
I'm not sure. I looked through the Express and connect issues and found back in 2013 when it wasn't working right a bunch of issues and PRs were made to fix the support, but nothing since. Connect itself supports it since... forever.
I don't agree. That goes against the values of Express, namely being "unopinionated".
I would like to understand that as well. So from what I can tell so far, I see the following here: (1) Express/Connect has always acted like this (2) over time, there has not been a single objection to this feature until now (3) there have been issue reports and PRs to fix this feature, indicating people actually use it. @blakeembrey perhaps this should be a topic for the next Express.js TC meeting on 10/5, if there is no good resolution before then? So far I'm a big fat 👎 on changing this behavior, and see this issue as more of a lack of documentation issue. Express's router only routes on method + pathname; it does not route on anything else. That is just how it is designed. I'm sure if we started sending back 400 Bad Request if the Telling people they need to rewrite the URL if they want to support FQDN is non-sense, because they will have to re-implement the entire way that Perhaps what everyone is surprised on is the fact that Node.js HTTP server even lets these types of URLs in in the first place? Perhaps the Node.js HTTP server should make that an opt-in feature? Express simply routes whatever it gets; Express is not the HTTP server, only a slim routing + sugar layer on top of the Node.js core server. |
FWIW, I sent the following to www.google.com and got a 200 OK:
I got a 200 OK when sending the following to www.microsoft.com:
Even sending the following to www.microsoft.com results in a 200 OK with the Microsoft home page as the response:
|
That is one solution, but unfortunately it will result in a significant performance hit since it would be executed for every request. Perhaps a better workaround might be something like: app.use(function(req, res, next) {
if (req.originalUrl.charCodeAt(0) !== 47/*'/'*/)
return res.sendStatus(400);
next();
});
I'm not surprised, node's http module is a low-level thing (like many other built-in modules) and there are situations that require FQDNs, especially http proxies with the I think we can all agree that this is at least a documentation issue, but IMHO the behavior is still unexpected. What is wrong with changing the behavior in the next major version of Express? Does Express not follow semver or ? |
It does, but I outlined the reasons above, which boil down to a history of issue of people reporting bug to get this working better vs this being the very first issue asking to not do it. I'm not going to just put those people out in the cold without fully understanding why we should do that, providing a proposal for what Express's behavior should be, and providing documentation & migration paths for all the people currently using this feature. I even provided examples of major websites that are showing the same behavior of the default Express. |
Yea, sorry, that was just me being quick :) If you actually use the
I would like to understand why. So far it is starting to seem like perhaps we are talking past each other instead of discussing with each other? I know you are expressing that opinion, and I spent a lot of my own time trying to write down all my reasons above, but it doesn't seem like you have any responses to them, rather just re-stating your initial statement... unless I'm missing something? |
Even the following request to
It simply acts as if the request was for |
There are plenty of counter-examples though, including the ones that @blakeembrey pointed out. For example: Sending a request to
results in a 404 response. But sending this request:
results in a 200 response (because they are obviously checking FQDNs but IMHO that is not really necessary).
Because it's obviously/literally not what the client is requesting (especially if the FQDN does not match the vhost)? |
I know there are plenty of counter examples. But if the argument is there are servers that don't allow it, then the other side can argue that there are servers that do do it. |
@mscdex you sill haven't responded to most of all my statements and concerns. It's really hard for me to continue this conversation like this... it's not a conversation, just you saying the same thing over and over. |
I know this is locked, but I have been spending my Sunday afternoon researching this issue, and the best I can find an answer is this is from the HTTP/1.1 RFC itself, RFC 7230, section 5.3.2:
This quote describes what Express is currently doing as a MUST behavior for origin servers, and rejecting those types of URLs at the Express layer for all cases is a violation of that MUST, imo. |
@dougwilson It might be worth discussing briefly. Although it's a violation of the must, I'm not sure the interpretation is that we should be ignoring all hosts/scheme/port/etc in the request? I guess it's also up to interpretation of "unopinionated", since this does sound like an opinion. Perhaps it's best as not a feature of the router, but as a feature of Express.js that can be configured (much like the |
Correct, the RFC does not way to ignore it, but also doesn't way what to do with it, only that the server MUST respond to it. I can see two solutions here (1) just ignore it as we do today or (2) provide some kind of mechanism to somehow understand what the allowed hosts are in order to only route on them. We could require everyone who ever writes an Express app to explicitly declare their expected hosts to even use Express I guess... or something else? What are your thoughts or proposal? Should we put this item on the next TC meeting as a proposed to do above?
I mean, when I say that, I mean that Express doesn't care what you want to do with those types of URLs, because it's so easy to add a middleware at the top of your app if you wanted to reject them. If we always rejected them, how can people not reject them? Simply rewriting the The biggest issue, and I've brought this up on multiple TC meetings so far, is that we don't have a written explanation of what things like "unopinionated" mean. We talked about it and agreeded on what we think it should be, and there was plenty of time for all TC members to attend or even watch the recordings. I am simply expressing what we agreed upon here. If you're interested in attending the next TC meeting, we can always bring up this topic as well for discussion again, if that helps.
I mean, I'm not sure if I'm saying the right words here or some kind of disconnect. The way the router works is that is only routes on the information given to it: the method + pathname. Everything else in the url is ignored during routing, for example, the query string is ignored, just as all other non-specified aspects of the request. If you have a thought on how we should add that into the routing, please, absolutely open an issue or pull request detailing the specifics on the proposal so we can understand and evaluate it :)
I assume this is a typo? I'm not actually sure what you meant here, if you can clarify :)
If you know of some way to reach out to the users of Express (especially since this is an automatic behavior that they may not even know they are relying on), I would love to hear it :) There are many times I would like to get this kind of feedback, but it seems like an impossible audience to reach for the most part. Technically every user of Express is "using it", since it is how Express behaves by default, but probably understanding what happens if we wanted to turn this off is going to be a large project. Perhaps what we can o is build some kind of telemetry service into Express itself, and ask people to opt-in to collect telemetry on their Express apps which would give us some way to answer these types of questions using analysis back on our end. That would help a lot in these types of things when we think about how to go about changing fundamental Express behavior. I would love to hear ideas around this. Currently I try to use issues as an indicator, and based on issues, we have multiple issues through out the Express and Connect repos reporting and fixing bugs in this behavior to make it work better vs 1 issue (this one) proposing that maybe it should not work that way (and since the issue was filed as a bug rather than a proposal that it should work differently, it's hard to say it was a true proposal vs a surprise reaction against an honest mistake in our documentation). |
It appears that Express ignores the other properties of the parsed request URL when matching defined routes against an incoming request.
This is problematic because "full" request URLs for example will match any defined route handler that contains the path part of the "full" URL.
Example:
will cause Express to match a handler like this:
The text was updated successfully, but these errors were encountered: