Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

proposal: net/http: handle Seek error in range request by sending entire file #42173

Open
rsc opened this issue Oct 23, 2020 · 4 comments
Open

proposal: net/http: handle Seek error in range request by sending entire file #42173

rsc opened this issue Oct 23, 2020 · 4 comments
Labels
Projects
Milestone

Comments

@rsc
Copy link
Contributor

@rsc rsc commented Oct 23, 2020

In the io/fs proposal, I made the comment that if an fs.File did not support Seek then net/http couldn't serve it when presented by a range request. Many people pointed out that strictly speaking that statement is incorrect: it is always allowed for an HTTP server to respond to a request with a Range header by sending the entire content (the response headers clearly indicate whether ranges or the entire content are being sent).

That said, the net/http server behavior on Seek failure has been here for a long time, long before io/fs. While we could handle a Seek failure by responding with full content, it is less clear to me whether we should. Hence this issue, to make sure the question is properly evaluated.

There are two cases where a Seek failure can happen:

  1. A single range has been requested. The Seek to that range fails.
    In this case, if we assume the Seek left the file offset at the start of the file, then we can send the whole thing.

  2. Multiple ranges have been requested. The first Seek succeeds but a later Seek fails.
    In this case, we cannot take back the partial chunks that have already been sent.
    It seems clear that we must report an error.
    A complicating factor here is that a File with a “best-effort” Seek might succeed only for “seek to where the file already is,” and if the first range is 0-N, then that first Seek might succeed even though the next one will fail.
    If we want to try to detect Seek failures before sending any ranges, so that full content can be sent instead, we could try Seeking to the second range, then to the first range, before sending the first range. One of those two should be a non-no-op Seek.

Clearly in both these cases (with the complication in the second), we could detect a Seek failure before sending any ranges and respond by sending the whole file.

On the other hand, I have vague memories of a browser-based PDF reader being exceedingly slow against Go servers because it was making range requests and getting the full content back and discarding all but the ranges it had requested. (Maybe this was before we supported Range at all, or maybe our Range implementation was sending full content; I don't remember.) So the browser was downloading the same, entire large file over and over and over. Yes, the bug was on the client side, but even a correct implementation in that case (notice the full response and save it) would still just appear to be very slow (but not exceedingly slow) when it downloads an entire (say) 100 MB PDF to display just the first page. It may be that any reasonable server should support range on files being served, and that sending the entire file as a pedantically correct response, instead of giving a clear error that would cause the server author to add Seek support, causes more problems than it avoids.

That is, if a client asks for 1 kB of a 1 GB file, it may technically comply with the spec to send the whole GB, but at the same time it may be a worse response than an error.

(The spec is written this way because clients have to deal with the possibility of an old server that has no idea what a range request is. But that doesn't mean Go pretending to be an old server is a good idea.)

I honestly don't know what's right here. What we can do is different from what we should do.
What does nginx or Apache do if a seek fails while serving a range request? What do other servers do?

/cc @bradfitz

@rsc rsc added this to the Proposal milestone Oct 23, 2020
@gopherbot gopherbot added the Proposal label Oct 23, 2020
@ianlancetaylor ianlancetaylor added this to Incoming in Proposals Oct 28, 2020
@rsc rsc moved this from Incoming to Active in Proposals Oct 28, 2020
@rob05c
Copy link

@rob05c rob05c commented Oct 29, 2020

I vote error. If the client thinks the resource is large enough to request a Range, we should assume it is, and as you say, returning 4GB for a 4k request is bad. In-practice, IMO error is more often the better solution. And it is RFC-compliant to do so.

If this were common, I'd suggest a config option. But all modern filesystems can seek, so this should be rare. Configs for every rare error are untenable.

  1. Multiple ranges have been requested. The first Seek succeeds but a later Seek fails. In this case, we cannot take back the partial chunks that have already been sent. It seems clear that we must report an error.

Agreed. If the server has already started sending data, there's no correct answer except to terminate the connection. In theory, Go could do a full read until it gets to the point of the seek; in practice, that would rarely be fast enough to be useful. In practice, this error case seems even rarer.

What does nginx or Apache do if a seek fails while serving a range request? What do other servers do?

Apache Traffic Server (not Apache HTTP Server aka httpd) will error if any mid-read error occurs (as required), and if any initial read error occurs, will fall back to the parent/origin. But as a Caching Proxy, file reads mostly don't apply.

Apache Traffic Control's own Grove Caching Proxy will likewise treat read errors as not-in-cache. But it's written in Go, so the decisions here could potentially change that. 😄

Also, the Go authors probably know this, but for anyone reading who doesn't: I verified, net/http.ServeFile serves the ideal 416 Requested Range Not Satisfiable for ranges after the file size, i.e. when Seek returns 0 EOF. So none of the above applies in that case.

@rsc
Copy link
Contributor Author

@rsc rsc commented Nov 11, 2020

@rob05c, your response mostly mentions what happens on "read error". What about seek errors? How do the servers respond to that?

And can someone check Apache or nginx? Thanks.

@rob05c
Copy link

@rob05c rob05c commented Nov 11, 2020

I said "read" but I meant "read or seek." ATS and Grove both behave the same on seek errors. Also to be clear, ATS behavior can be a file or a raw block device, and almost all large production deployments use block devices. In which case there's no difference.

@rsc
Copy link
Contributor Author

@rsc rsc commented Nov 18, 2020

I'm not really seeing any arguments in this issue for sending the entire file.
In the absence of those, it seems like we should leave things as is.
Does anyone disagree?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
Linked pull requests

Successfully merging a pull request may close this issue.

None yet
3 participants
You can’t perform that action at this time.