New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

net/http: update DetectContentType sniff signature for MP3 without ID3 #21124

Open
odeke-em opened this Issue Jul 22, 2017 · 9 comments

Comments

Projects
None yet
7 participants
@odeke-em
Member

odeke-em commented Jul 22, 2017

This is a placeholder bug to remind me to look into updating the http.DetectContentType sniff mechanisms to accomodate an addition to whatwg/mimesniff in which MP3s can be sniffed without an ID3 tag https://mimesniff.spec.whatwg.org/#signature-for-mp3-without-id3.

This shall require updating the max buffer size from 512 to 1445 and reviewing that algorithm described in https://mimesniff.spec.whatwg.org/#signature-for-mp3-without-id3. I noticed this after seeing this tweet https://twitter.com/mimesniff/status/888665270025420800 which pointed me to PR whatwg/mimesniff#4.

This can be discussed when the tree opens up for Go1.10.

@odeke-em odeke-em self-assigned this Jul 22, 2017

@odeke-em odeke-em added this to the Go1.10 milestone Jul 22, 2017

@carlmjohnson

This comment has been minimized.

Contributor

carlmjohnson commented Jul 23, 2017

Can there be an exported constant for the buffer size that DetectContentType expects if its going to change? Right now I just hardcode sending 512 bytes to it when I use it.

@bradleyfalzon

This comment has been minimized.

bradleyfalzon commented Jul 23, 2017

Further to this, I have a local change to implement whatwg/mimesniff#27 and whatwg/mimesniff#28 but I was waiting until upstream clarified their positions. Not sure if there's value implementing what's currently clarified already though (eg implement application/vnd.ms-fontobject and application/font-woff).

@agnivade

This comment has been minimized.

Member

agnivade commented Mar 13, 2018

@odeke-em - Just checking if this is in your radar ? Or I can spend some time with this and whip up a CL.

@odeke-em

This comment has been minimized.

Member

odeke-em commented Mar 14, 2018

@agnivade thanks for the ping, please go ahead, it is all yours :) Just make sure
that the content is from mimesniff.spec.whatwg.org which is our resource for content type sniffing.

@agnivade

This comment has been minimized.

Member

agnivade commented Mar 14, 2018

Great, thanks.

@gopherbot

This comment has been minimized.

gopherbot commented Mar 19, 2018

Change https://golang.org/cl/101375 mentions this issue: net/http: sniffing mp3 content without ID3

@bradfitz bradfitz modified the milestones: Go1.11, Go1.12 May 4, 2018

@agnivade

This comment has been minimized.

Member

agnivade commented May 19, 2018

Just wanted to mention that the algorithm in mimesniff.spec.whatwg.org has slight mistakes which is why I was unable to make it work in my CL. As confirmed by this issue - whatwg/mimesniff#70.

@odeke-em

This comment has been minimized.

Member

odeke-em commented Nov 29, 2018

Thank you @agnivade for tackling this! The whatwg/mimesniff spec hasn't yet been fixed and I don't think that will happen during this cycle, so perhaps punting this to Unplanned and I'll also assign it to you.

@odeke-em odeke-em assigned agnivade and unassigned odeke-em Nov 29, 2018

@odeke-em odeke-em modified the milestones: Go1.12, Unplanned Nov 29, 2018

@c12h

This comment has been minimized.

c12h commented Dec 2, 2018

There is a deeper problem here. WHATWG's MIME sniffing spec is designed for use in browsers, and only in browsers. This is actually explained in the Introduction; I guess I'm not the only person who failed to notice this the first few times I read the spec. The key sentence is in the last paragraph:

This document describes a content sniffing algorithm that carefully balances the compatibility needs of user agent[s] with the security constraints imposed by existing web content.

So, for example, when DetectContentType classifies a text file containing “GIF87a is over 30 years old” as "image/gif", it is implementing the MIME spec correctly. The problem is that the WHATWG's algorithm is not appropriate outside the narrow context of a web browser handling downloaded files securely. Sigh.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment