Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Povide a setting to control req.fresh evaluation #4753

Closed
gh-andre opened this issue Nov 14, 2021 · 10 comments
Closed

Povide a setting to control req.fresh evaluation #4753

gh-andre opened this issue Nov 14, 2021 · 10 comments
Labels

Comments

@gh-andre
Copy link

It is possible to disable the behind-the-scenes logic around the Etag header via app.set("etag", false), but the same cannot be done against its counterpart, If-Modified-Since and Express silently discards a fully rendered response based on just evaluating Last-Modified vs. If-Modified-Since. RFC 7232 says that the 304 response SHOULD be sent in this case, but doesn't say it MUST be sent, which means that some configuration, similar to etag is needed to control this behavior.

In practical terms, I may have a listing of items and Last-Modified in this list would be the time stamp of the last item. Some of the items may be edited, which doesn't change the last modified date of the whole list, but may benefit from intermediate items displayed in their most up-to-date form. This is just an example, though, the req.fresh logic shouldn't enforce this behavior as if it's mandated by the RFC.

Here's an example:

const express = require("express");
const app = express();
const port = 3000;

function index_get(req, res)
{
  // ... evaluates If-Modified-Since, decides to render a new response ...

  // keep the modified date (e.g. listing item was updated, but no new items added)
  res.set("Last-Modified", new Date(2021, 10, 14, 0, 0, 0).toUTCString());

  // runs database queries and generates a slightly different response, which is thrown away because of req.fresh
  res.send("Hello World!");
}

app.set("etag", false);
app.get("/", index_get);
app.listen(port);

console.log("Listening on port %d", port);

Hit F5 a few times and after the first 200 response, all subsequent responses will be 304, even though nothing in the script generates those.

In response.js, in res.send = function send(body) this code calls req.fresh, which evaluates If-Modified-Since vs. Last-Updated and if the latter is the same as the former, it drops the response and a few headers:

  // freshness
  if (req.fresh) this.statusCode = 304;

  // strip irrelevant headers
  if (204 === this.statusCode || 304 === this.statusCode) {
    this.removeHeader('Content-Type');
    this.removeHeader('Content-Length');
    this.removeHeader('Transfer-Encoding');
    chunk = '';
...
    this.end(chunk, encoding);
  }

It would be nice if discarding responses based on req.fresh could be disabled via application settings, similar to how etag generation and evaluation may be disabled.

If you choose not to consider this change, it would be useful if the current behavior is documented. I spent some time trying to figure out why Express silently drops a fully rendered response.

@dougwilson
Copy link
Contributor

Hi @gh-andre sorry you are having trouble. We can certainly consider adding a setting to control yhe freshness support. There are various ways to disable it today, of course, just not though a setting.

As for the documentation, the res.send docs do currently state that res.send does "HTTP cache freshness support". If you do not feel like that is jot adequate or have any other feedback, please open an issue or pull request against the website repository so the documentation team can addrss it.

@dougwilson
Copy link
Contributor

As for your last-modified date, you may want to take a lool at https://datatracker.ietf.org/doc/html/rfc7232#section-2.2.1 , namely the following statement which seems to be where your problem lies according to your example:

A representation is typically the sum of many parts behind the
resource interface. The last-modified time would usually be the most
recent time that any of those parts were changed.

@dougwilson
Copy link
Contributor

dougwilson commented Nov 15, 2021

As for the default behavior, I understand your fustration, but SHOULD has aa specific meaning in RFCs:

SHOULD This word, or the adjective "RECOMMENDED", mean that there
may exist valid reasons in particular circumstances to ignore a
particular item, but the full implications must be understood and
carefully weighed before choosing a different course.

It means that not following something that is a SHOULD is an exception that needs to be weighed carefully. By default, Express is following the proper procol, but performing things that are SHOULD by default. But as you stated, we can certainly add a setting to disable this, so users of the framework who carefully concider they they want to deviate from the standard behavior can do so as they like. You can currently do this in Express today, but just not though a single setting; you can override the fresh behavior with your own (for example just always return stale).

As far as I know, this is the first time a request to disable this functionality has ever been requested, so it is not surprising that it has not been added yet.

@dougwilson dougwilson changed the title Please provide a way to disable discarding of response body based on req.fresh evaluation Povide a setting to control req.fresh evaluation Nov 15, 2021
@gh-andre
Copy link
Author

It might be difficult to figure out which of those multiple changed parts are significant and which are not. For example, item description excerpts may not contain visible changes from updates in the list, so their last updated date isn't relevant, but to figure it out is quite a technical challenge. Sending back the 200 response makes it easier and doesn't go against the grain of the RFC because SHOULD is a synonym for RECOMMENDED and is described as a best practice to follow, not as a standard behavior and in cases like this example it makes code more manageable.

Anyway, thank you for looking into this. Much appreciated. I will leave it in your hands, whichever way you decide to go. Hopefully this post will provide enough description for somebody who may come across of the same issue.

@gh-andre
Copy link
Author

Sorry, there's some overlap in my response. I didn't notice your last edit.

@dougwilson
Copy link
Contributor

dougwilson commented Nov 15, 2021

because SHOULD is a synonym for RECOMMENDED and is described as a best practice to follow, not as a standard behavior

Hm, interesting. Can you link to where that definition of SHOULD is written? The one I quoted (link is https://datatracker.ietf.org/doc/html/rfc2119) seems to indicate otherwise. You seem to me to be describing the MAY keyword, is that right?

It might be difficult to figure out which of those multiple changed parts are significant and which are not. For example, item description excerpts may not contain visible changes from updates in the list, so their last updated date isn't relevant, but to figure it out is quite a technical challenge.

I can certainly appreciate that and we can definately disable it. But note that both intermediate proxies and even the web browsers themselves will still evaluate the last-modified header and they can decide to never even make the request to the origin server (Express) at all. In those cases, even though you disabled the check in Express, your clients will still see the outdated response.

@dougwilson

This comment has been minimized.

@gh-andre
Copy link
Author

Can you link to where that definition of SHOULD is written?

It's in the excerpt you mentioned - I read it as if SHOULD and RECOMMENDED describe best practices that should be followed for most cases. As long as it's understood and documented, I read it as it may be used. I do try to follow SHOULD whenever possible, but in the case I described it's quite challenging.

As for browsers sending the request, they are supposed to send a conditional request whenever max-age or any equivalent (Expires, s-maxage, etc) suggests that the response must be revalidated. They could render the old response, indeed - Firefox seems to race cached and rendered responses, but seems to render updated ones in my tests. Chrome seems to do the same.

For no-cache, it's not quite the same. A cached response is controlled by either ETag or Last-Modified and no-cache means just that, so the response always rendered. So, if one wanted response revalidated the last modified time is the most optimal. The ETag requires a full response hashed or a cache of responses, which is harder to enforce with multiple servers.

Having said that, Last-Modified is far from the silver bullet and FireFox seems to ignore max-age in favor of heuristics a lot, so maybe I will resort to no-cache eventually, but for now I'm still trying to figure out how to send conditional requests reliably.

Thanks again for your insights. Very helpful.

@dougwilson
Copy link
Contributor

It's in the excerpt you mentioned - I read it as if SHOULD and RECOMMENDED describe best practices that should be followed for most cases. As long as it's understood and documented, I read it as it may be used. I do try to follow SHOULD whenever possible, but in the case I described it's quite challenging.

I think you are mostly understanding it, but if your source is the same quote, you can see that SHOULD means that any implementer (like Express) needs to follow it by default. To deviate from SHOULD "the full implications must be understood and carefully weighed before choosing a different course.". I'm confused on why you would expect a SHOULD to not be the default behavior. Can you ellaborate on what, specifically, I am missing from the RFC quote?

For no-cache, it's not quite the same. A cached response is controlled by either ETag or Last-Modified and no-cache means just that, so the response always rendered. So, if one wanted response revalidated the last modified time is the most optimal. The ETag requires a full response hashed or a cache of responses, which is harder to enforce with multiple servers.

I'm not really following here. Can you explain a different way or perhaps, can you link to where you are looking in https://datatracker.ietf.org/doc/html/rfc7234 ?

@gh-andre
Copy link
Author

I'm honestly surprised that I need to elaborate on the difference between no-cache and caching behaviors with ETag and Last-Modified. One disables the cache and the other one makes caching conditional. Using no-cache just to work around a caching implementation that examines fully rendered responses is not quite the same as using it for responses that are intended to be not cacheable by design.

I think you are mostly understanding it

Perhaps it's time to just close this issue. Thanks for giving it a consideration.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

2 participants