Add Content-Disposition http header#945
Add Content-Disposition http header#945nizsheanez wants to merge 14 commits intoethersphere:masterfrom
Conversation
Add Content-Disposition http header
…sphere#527, ethersphere#944) Changed swarm/api.Upload: - when using WaitGroup no reason to use done counter. - f.Close() must be called in Defer - otherwise panic or future added early return will cause leak of file descriptors - use common DetectContentType method - one error was suppressed
…n fallback to extension (ethersphere#944)
acud
left a comment
There was a problem hiding this comment.
Thanks for the contribution 👍 🍺
Please see my comments in the code.
| awg := &sync.WaitGroup{} | ||
|
|
||
| for i, entry := range list { | ||
| if i >= dcnt+maxParallelFiles { |
There was a problem hiding this comment.
you have removed important I/O throttling code here. either convert to a worker pattern or preserve as is
| defer f.Close() | ||
| if mimeType == "" { | ||
| mimeType = detectMimeType(file) | ||
| mimeType = api.DetectContentType(file) |
There was a problem hiding this comment.
i wonder if we really need to open a new file handle inside of api.DetectContentType. at this point we already have f as an open file handle. maybe we could use that instead. WDYT?
There was a problem hiding this comment.
I found problem:
- If DetectContentType func will accept *File, then it must current preserve position in file
- Here we working with
*swarm.Filenot*os.Filewhich has no Seek() method to save and restore position
It doesn't look possible to get content probe from *swarm.File
Maybe you have an idea?
There was a problem hiding this comment.
Ah, *swarm.File it's local file, then i can cast it to Seeker
There was a problem hiding this comment.
you can also try to change the swarm.File spec to be
Reader,
Closer,
Seeker
instead of just ReadCloser. that might make your life easier. but you also might need to move the type around a bit because this will probably generate a circular dependency when you'll try to reference client from api :)
| "github.com/ethereum/go-ethereum/log" | ||
| ) | ||
|
|
||
| // detect content type by file content |
There was a problem hiding this comment.
comment should follow godoc style:
// DetectContentType by file content with fallback
// to file extension and "application/octet-stream"
| // otherwise by file extension | ||
| // returns "application/octet-stream" in worst case | ||
| func DetectContentType(file string) string { | ||
| var contentType = "application/octet-stream" // default value |
There was a problem hiding this comment.
extract to const MIME_OCTET_STREAM = "application/octet-stream", then reuse across the code (DRY)
|
|
||
| f, err := os.Open(file) | ||
| if err != nil { | ||
| log.Warn("detectMimeType: can't open file", "file", file, "err", err) |
There was a problem hiding this comment.
log.Error - err != nil
also - you're not returning on error (the rest of the code below will continue executing - resulting in an error). if you choose to have another handle opened here - please return an empty string in this code block - the function has to return here.
as mentioned above - i would remove this part and already use the open handle that is created in the enclosing call
| } | ||
|
|
||
| func testDetectCTCreateFile(t *testing.T, fileName, content, expectedContentType string) { | ||
| path := os.TempDir() + fileName |
| testDetectCTCreateFile(t, "file.css", "Lorem Ipsum", "text/css; charset=utf-8") | ||
| } | ||
|
|
||
| func testDetectCTCreateFile(t *testing.T, fileName, content, expectedContentType string) { |
There was a problem hiding this comment.
content param not used - can be removed
| defer reader.Close() | ||
|
|
||
| w.Header().Set("Content-Type", "application/x-tar") | ||
| w.Header().Set("Content-Disposition", fmt.Sprintf("attachment; filename=\"%s.tar\"", path.Base(r.URL.Path))) |
There was a problem hiding this comment.
two issues with this:
- do not use r.URL as it will contain the entire requested URI e.g.
/bzz:/somename.eth/somethingetc. please use theurivariable above which comes from the middleware that parses the bzz uri. it also has aPathmember which you can use to get to the path inside the bzz uri. - there could be a situation where you would be downloading a tarball but the filename might be empty. in that case, you should fallback to the
Addrfromuri(oruntitled)
There was a problem hiding this comment.
How about case:
- when i upload 1 file, then download it by hash only (without file name in url)
Is there known way to get file name? (As i understand - only way is manifest retrieve)
There was a problem hiding this comment.
even if its one file - if there's an enclosing manifest, i think there should be a filename there too (need to double check. in any case - there should be - let's assume that)
| } | ||
|
|
||
| if contentType == "" { | ||
| contentType = "application/octet-stream" |
| } | ||
|
|
||
| w.Header().Set("Content-Type", contentType) | ||
| w.Header().Set("Content-Disposition", fmt.Sprintf("attachment; filename=\"%s\"", path.Base(r.URL.Path))) |
There was a problem hiding this comment.
please amend this - use uri from context instead of request URI. also - provide the filename fallback
…eReview (ethersphere#527, ethersphere#944) Fixes for ethersphere#945 - revert back channel which handle I/O throttling - add tests on real files - add tests when file extension doesn't match content - move code to existing files, no real reason to have dedicated file for such functions - move http headers validation to swarm/api/http package - swarm.Open now checking content of file to detect Content Type
|
Fixed all comments, but I need carefully revisit code about " I/O throttling" tomorrow. Found edge case: |
|
hmmm this is starting to suck. we should probably provide an override for all web types because browsers need to have them correctly served, or rather, just have it the way you originally implemented (first file extension, then content based, then fallback to |
|
I will look around to existing webservers tonight. For example Nginx has mime configuration file, default is: https://github.com/nginx/nginx/blob/master/conf/mime.types For deeper content check https://github.com/h2non/filetype may help, but it looks overkill for me. |
|
Here are my thoughts. About content sniff:
About extension-based checks:
|
We don't use it. When fetching content from Swarm - we usually use
Ignore? this is a response header.
Definitely, I also think that adding tests for utf8 content (filenames included) is a must. I'll open an issue for that :)
I completely agree. The list from nginx looks pretty exhaustive for this point in time. |
…phere#944) - add hardcoded mime types list - follow stdlib way of ContentType checking
|
Ok. Added. Not solved parts yet:
|
|
This is due to another problem and is outside the scope of this issue. I'm already have this issue at mind and it will be taken care of as part of ongoing efforts to refactor and cleanup swarm manifests. |
| // MimeOctetStream default value of http Content-Type header | ||
| const MimeOctetStream = "application/octet-stream" | ||
|
|
||
| var once sync.Once // guards initMime |
There was a problem hiding this comment.
not needed - use init() instead
| ".avi": "video/x-msvideo", | ||
| } | ||
|
|
||
| func initMime() { |
|
|
||
| // DetectContentType by file content, or fallback to file extension | ||
| func DetectContentType(f *os.File) (string, error) { | ||
| once.Do(initMime) |
| testDetectContentType(t, "file-no-extension", "Lorem Ipsum", "text/plain; charset=utf-8") | ||
| testDetectContentType(t, "file-no-extension-no-content", "", "text/plain; charset=utf-8") | ||
|
|
||
| // TODO: research, define behavior when content doesn't mach file extension. Now just rely on extension. |
| return | ||
| } | ||
|
|
||
| if err := wait(ctx); err != nil { |
There was a problem hiding this comment.
I'd rather have this wait() before the mime detection happens, as in the original code
| } | ||
|
|
||
| w.Header().Set("Content-Type", contentType) | ||
| w.Header().Set("Content-Disposition", fmt.Sprintf("attachment; filename=\"%s\"", path.Base(r.URL.Path))) |
There was a problem hiding this comment.
please amend this - use uri from context instead of request URI. also - provide the filename fallback
| return strings.Contains(err.Error(), api.ErrDecrypt.Error()) | ||
| } | ||
|
|
||
| func ValidateContentTypeHeader(r *http.Request) (string, error) { |
There was a problem hiding this comment.
I'm not really sure what the purpose of this code is?
There was a problem hiding this comment.
validating incoming mime type on upload from header, to make sure that the input we're getting is correct before storing it in a manifest - just to make sure we are on par with IETF RFCs
| @@ -0,0 +1 @@ | |||
| package api | |||
…where possible (ethersphere#527, ethersphere#944) - simplified I/O throttling by semaphore primitive - don't use URL from request, use parsed Swarm uri for file name detection
|
Fixed all comments. Question: do i need to delete force of "application/octet-stream" in |
|
Great work on this! :) |
|
Thank you for on-boarding. |
| // MimeOctetStream default value of http Content-Type header | ||
| const MimeOctetStream = "application/octet-stream" | ||
|
|
||
| // builtinTypesLower stores copy of https://github.com/nginx/nginx/blob/master/conf/mime.types |
There was a problem hiding this comment.
I personally feel this mime.types file should be vendored (or just copied into a Go file) to make it easy to keep in-sync with the upstream file.
I also think it should be clear why this is here (if I saw this I'd probably just think it was unnecessary because the Go mime package does the right thing in pretty much all cases).
There was a problem hiding this comment.
By vendoring of mime.types file - you mean write some go generate command to parse it and gen code? Same approach as here https://github.com/bradrydzewski/go-mimetype (it based on mailcap instead of nginx, but there is same mime file) ?
There was a problem hiding this comment.
Something like that, yes, which would make it much easier to keep in sync, and more obvious where the types come from.
|
|
||
| func init() { | ||
| for ext, t := range builtinTypesLower { | ||
| mime.AddExtensionType(ext, t) |
There was a problem hiding this comment.
This should panic if there is an error (in case some one adds an extension or mime type that doesn't parse)
| } | ||
| } | ||
|
|
||
| // DetectContentType by file content, or fallback to file extension |
There was a problem hiding this comment.
This comment is not consistent with what the method actually does.
| if err != nil { | ||
| t.Fatal(err) | ||
| } | ||
| defer os.Remove(f.Name()) |
There was a problem hiding this comment.
You could avoid the need for a temp file by modifying DetectContentType to take (filename string, data io.ReadSeeker) and just use a bytes.Reader in the tests.
There was a problem hiding this comment.
indeed, it will simplify everything
|
|
||
| func TestDetectContentType(t *testing.T) { | ||
| // internally use http.DetectContentType, so here are test cases only about fallback to file extension check | ||
| testDetectContentType(t, "file.css", "body {background-color: orange}", "text/css; charset=utf-8") |
There was a problem hiding this comment.
This should use t.Run to isolate each test.
| if found := path.Base(uri.Path); found != "" && found != "." && found != "/" { | ||
| fileName = found | ||
| } | ||
| w.Header().Set("Content-Disposition", fmt.Sprintf("attachment; filename=\"%s\"", fileName)) |
There was a problem hiding this comment.
Are you sure we want to make all files attachments? If I go to bzz://mysite.eth/index.html in my browser, I expect to see the HTML in the browser, not download it to a local file.
There was a problem hiding this comment.
oh, how much you right. Right now i will change it to:
Content-Disposition: inline; filename="example.html"
But i will rise question in Gitter - maybe exist some cases when attachment make sense, for example in S3 owner can add any headers to it's file/folder by UI, and Content-Disposition: attachment too.
| return strings.Contains(err.Error(), api.ErrDecrypt.Error()) | ||
| } | ||
|
|
||
| func ValidateContentTypeHeader(r *http.Request) (string, error) { |
There was a problem hiding this comment.
I'm not really sure what the purpose of this code is?
7c3c4c1 to
b632a2b
Compare
…on only since go1.11, a bit of copy-paste (ethersphere#527, ethersphere#944)
|
2 more changes:
|
…on only since go1.11, a bit of copy-paste (ethersphere#527, ethersphere#944) - added text/markdown content type
d3e2886 to
afb750a
Compare
|
@lmars about |
…ion from attachment to inline (ethersphere#527, ethersphere#944) - If I go to bzz://mysite.eth/index.html in my browser, I expect to see the HTML in the browser, not download it to a local file.
08bc172 to
1e87e3d
Compare
|
@nizsheanez @justelad status? shall we move this to main repo? |
|
@zelig let's first resolve all issues here then squash+reopen on upstream |
|
Status: fixed all CR comments except one:
|
| t.Fatalf("Content-Type header expected: application/x-tar, got: %s", h) | ||
| } | ||
|
|
||
| // See header to suggest file name for Browsers |
| if found := path.Base(uri.Path); found != "" && found != "." && found != "/" { | ||
| fileName = found | ||
| } | ||
| w.Header().Set("Content-Disposition", fmt.Sprintf("inline; filename=\"%s.tar\"", fileName)) |
There was a problem hiding this comment.
i am wondering - maybe in this case we should actually send an attachment header. @lmars?
| if found := path.Base(uri.Path); found != "" && found != "." && found != "/" { | ||
| fileName = found | ||
| } | ||
| w.Header().Set("Content-Disposition", fmt.Sprintf("inline; filename=\"%s\"", fileName)) |
There was a problem hiding this comment.
@lmars I think this should be:
if contentType == 'application/*' - return attachment
else return inline.
WDYT?
There was a problem hiding this comment.
Worry about "application/pdf" and "application/rss+xml", I'm personally fine if .pdf opening in browser (re-checked, browsers do download .pdf file with Content-Disposition: attachment; file="name.pdf" header, and only Safari on my Mac auto-open it in Preview).
There was a problem hiding this comment.
i've seen the rss issue too but i think its a non-issue as its a machine readable format and is not supposed to be read by the user as-is (AFAIK)
There was a problem hiding this comment.
On my MacBook:
With inline:
- Firefox: has Built-in RSS reader. it displays RSS in readable way and has "subscribe" button.
- Safari: forward link to installed RSS Reader app.
With attachment:
- Firefox: download file
- Safari: forward link to installed RSS Reader app
Looks like we breaking some UX here.
There was a problem hiding this comment.
ok - keep it as it is for now (inline it)
|
Added generator based on mime.types file as lmars recommended. |
| // mime.types file provided by mailcap, which follow https://www.iana.org/assignments/media-types/media-types.xhtml | ||
| // | ||
| // Get last version of mime.types file by: | ||
| // docker run --rm -v $(pwd):/tmp alpine:edge /bin/sh -c "apk add -U mailcap; mv /etc/mime.types /tmp" |
There was a problem hiding this comment.
This now seems to do exactly what the package you linked to does (i.e. https://github.com/bradrydzewski/go-mimetype), should we not just use that package?
There was a problem hiding this comment.
That package looks unmaintained, but let me try to rise PR there, here is list of differences:
go-mimetypedoesn't fail ifmime.AddExtensionTypereturns error- outdated mime.types file
There was a problem hiding this comment.
Ok, I don't mean for you to go back and forth on this. I assumed you didn't want to use that package because you wanted to use nginx mime types, but if we want to use the same method as that package then we should try and PR upstream (the maintainer looks active on other projects)
There was a problem hiding this comment.
Ah, just that mime.types file easier to parse. File from nginx has nginx.config format you can see there are some bracers: https://github.com/nginx/nginx/blob/master/conf/mime.types
No problem, already created PR: bradrydzewski/go-mimetype#1
Anyway we still discussing about Content-Disposition
…ator added, based on mailcap mime.types file (ethersphere#527, ethersphere#944)
e89f0ef to
5c00804
Compare
|
@nizsheanez @justelad it seems this PR is covering two things:
It seems we've got to a good solution for improving What is driving the decision to add this |
|
@lmars this solves the problem that the browser does not always know under which filename to save swarm content and this is rather annoying. |
|
I agree that Adding @justelad , now I need to do squash and open new PR with good description? |
|
@nizsheanez many thanks for this comprehensive PR! Please squash all commits and reopen this PR against |
For #527