net/http: Clarify Content-Type behavior of ResponseWriter.Write #41259
The current documentation for
What I think this means is that up to 512 bytes of the first call to Write are used to perform Content-Type detection, with no buffering involved. I think the comment could be reworded to make this explicit, since "the initial 512 bytes of written data" could be misread to mean "across multiple calls to Write".
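To make the two readings concrete, here is a minimal sketch that models them with http.DetectContentType alone. The helpers sniffFirstWrite and sniffAcrossWrites are hypothetical names introduced only for illustration. They are not part of net/http and do not claim to reproduce what any particular ResponseWriter implementation does.

```go
package main

import (
	"fmt"
	"net/http"
)

// sniffFirstWrite is a hypothetical helper modelling the "first Write only"
// reading: only the first chunk is passed to DetectContentType, which itself
// considers at most the first 512 bytes.
func sniffFirstWrite(chunks ...[]byte) string {
	if len(chunks) == 0 {
		return ""
	}
	return http.DetectContentType(chunks[0])
}

// sniffAcrossWrites is a hypothetical helper modelling the "buffered"
// reading: chunks are accumulated until 512 bytes are available (or the
// chunks run out) and only then sniffed.
func sniffAcrossWrites(chunks ...[]byte) string {
	var buf []byte
	for _, c := range chunks {
		buf = append(buf, c...)
		if len(buf) >= 512 {
			buf = buf[:512]
			break
		}
	}
	return http.DetectContentType(buf)
}

func main() {
	body := []byte("<html><body>hello</body></html>")

	// Same response body, split into a 3-byte write followed by the rest.
	fmt.Println(sniffFirstWrite(body[:3], body[3:]))   // text/plain; charset=utf-8
	fmt.Println(sniffAcrossWrites(body[:3], body[3:])) // text/html; charset=utf-8
}
```

The same bytes yield two different Content-Type values depending only on which reading an implementation follows, which is the ambiguity the doc comment leaves open.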
Issue with the documentation
I, for one, misread it when I tried to implement this interface, and I added a buffer. @rsc suggested this could be done without buffering, which I think makes sense after reading the docs or skimming some of the implementations. It was also not immediately clear to @katiehockman and @FiloSottile, who were involved in discussing a fix for #40928, so I am CCing them here so they can voice their opinions.
Issue with the API
Arguably this is not even a very user-friendly interface as short writes can cause Content-Type to be wrongly set even when the same response is being generated, and this may only happen in very peculiar situations, making it hard to debug (e.g. some short buffers passed to
Issues with the implementation
Current state (summarized in a table below)
We currently have several implementations of this, and it looks like they all behave in slightly different ways (the following list refers to Go 1.15.1):
Inconsistent behavior between tests and cgi
Inconsistent behavior between cgi and main
Inconsistent behavior between cgi, tests and main
Server-Side Content-Type sniffing is generally a bad idea: any server that accepts any user data might fall victim to XSS unless proper sanitization and proper checks are performed during the upload phase, and those checks would be much easier to write if there was a consistent behavior wrt C-T sniffing.
A server allows users to upload a profile picture, and checks with http.DetectContentType that the uploaded file is an image before accepting it, then serves the file with http.ServeFile. If the image was named *.html it will render as html.
A server allows users to upload a profile picture, and checks that mime.TypeByExtension returns "image/*" for the uploaded file, then serves the file with io.Copy. If the image content was html, it will render as html.
A server allows users to upload videos and forces uploaded files to be (among other types) valid mp4. The implementation relies on C-T sniffing when serving the file. Valid mp4 files can start with
To be extra-sure, a server implementation has tests everywhere that all responses are served with "text/plain" content-type. The actual server might still sniff "text/html" just because it buffers more data before making a decision.
This has already caused some issues in the past (#31753 and #40928 come to mind) but there is probably more there to justify this behavior (it feels like most differences are due to incremental patches that only addressed the issues on the affected implementations that were reported).
I would kindly propose to sync the behaviors so that they all match the main one (which seems the most coherent) and to clarify in the documentation that this is the intended one.
We could even have an "internal" folder in http that will implement C-T sniffing once for all implementations, so we can be sure that they will not go out of sync again.
The main issue I can see is that this might break existing CGI and FCGI applications, and might render incorrect any implementations of http.ResponseWriter that we don't know about (the interface is fully exported).
@rsc also pointed out that (in addition to what is listed above) the sniffing will always set a charset of