proposal: mime: expand on what is covered by builtinTypes #69530

Open
AidanWelch opened this issue Sep 19, 2024 · 14 comments · May be fixed by #69533

@AidanWelch

Proposal Details

Right now, mime/type.go includes what seems to be a somewhat arbitrary list of built-in types:

var builtinTypesLower = map[string]string{
	".avif": "image/avif",
	".css":  "text/css; charset=utf-8",
	".gif":  "image/gif",
	".htm":  "text/html; charset=utf-8",
	".html": "text/html; charset=utf-8",
	".jpeg": "image/jpeg",
	".jpg":  "image/jpeg",
	".js":   "text/javascript; charset=utf-8",
	".json": "application/json",
	".mjs":  "text/javascript; charset=utf-8",
	".pdf":  "application/pdf",
	".png":  "image/png",
	".svg":  "image/svg+xml",
	".wasm": "application/wasm",
	".webp": "image/webp",
	".xml":  "text/xml; charset=utf-8",
}

I think some guidance on what should be included here would be good, so that consumers of the package don't stumble on arbitrary gaps without realizing it. In the meantime I will submit a PR that incorporates all of the MDN-defined "Common Types" (which, I have to admit, is also arbitrary, but at least covers more common use cases).
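Until the built-in table is extended, a program can fill a gap itself with mime.AddExtensionType. The following is a minimal sketch, not part of the proposal: the helper names registerExtras and typeFor and the two entries in extraTypes are assumptions chosen for illustration.

```go
package main

import (
	"fmt"
	"mime"
	"path/filepath"
)

// extraTypes is an illustrative list, not the set proposed in this issue.
var extraTypes = map[string]string{
	".csv": "text/csv",
	".ico": "image/vnd.microsoft.icon",
}

// registerExtras (hypothetical helper) registers each mapping with the mime
// package. AddExtensionType replaces any existing mapping for the extension.
func registerExtras() error {
	for ext, typ := range extraTypes {
		if err := mime.AddExtensionType(ext, typ); err != nil {
			return err
		}
	}
	return nil
}

// typeFor (hypothetical helper) looks up the MIME type for a file name
// using only its extension.
func typeFor(name string) string {
	return mime.TypeByExtension(filepath.Ext(name))
}

func main() {
	if err := registerExtras(); err != nil {
		fmt.Println("register:", err)
		return
	}
	// The mime package adds a charset parameter to text/* types on its own.
	fmt.Println(typeFor("report.csv"))
	fmt.Println(typeFor("favicon.ico"))
}
```

Registering explicitly makes the result independent of whichever system MIME database happens to be installed.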

@gopherbot gopherbot added this to the Proposal milestone Sep 19, 2024
@seankhliao
Member

what's included is based on WHATWG mime sniffing
https://mimesniff.spec.whatwg.org/
this gives us a clear spec to adhere to, rather than an arbitrary list.

@seankhliao seankhliao changed the title proposal: mime: Expand on what is covered by builtinTypes proposal: mime: expand on what is covered by builtinTypes Sep 19, 2024
@AidanWelch
Author

AidanWelch commented Sep 19, 2024

@seankhliao Wow, thanks for the quick response, but I'm confused about where that spec specifies just the MIME types listed in builtinTypes. From my understanding, it would be more relevant to net/http's DetectContentType, which actually sniffs. For mime's ExtensionsByType and TypeByExtension, don't we assume that the file extension/type is truthful and try to determine the most likely counterpart from it, whereas sniffing wouldn't even care about the given type or extension? (And so sniffing would give most, or all, plaintext types the same extension/type, for example.)
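The difference between the two approaches can be sketched as follows. The helper names byExtension and bySniffing are assumptions made for this example; the calls they wrap (mime.TypeByExtension and http.DetectContentType) are the standard library functions discussed here.

```go
package main

import (
	"fmt"
	"mime"
	"net/http"
	"path/filepath"
)

// byExtension trusts the file name and never looks at the content.
func byExtension(name string) string {
	return mime.TypeByExtension(filepath.Ext(name))
}

// bySniffing ignores the file name and inspects the leading bytes only.
func bySniffing(content []byte) string {
	return http.DetectContentType(content)
}

func main() {
	css := []byte("body { color: red; }")

	// The extension lookup may be influenced by the system MIME database.
	fmt.Println("by extension:", byExtension("style.css"))

	// A stylesheet has no byte signature, so sniffing can only report
	// generic plain text.
	fmt.Println("by sniffing: ", bySniffing(css))
}
```

CSS, JSON and most other text formats all sniff to the same generic text type, which is why a sniffing spec cannot by itself define an extension table.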

AidanWelch added a commit to AidanWelch/go that referenced this issue Sep 19, 2024
Simply implements the first recommended type for each file extension listed in MDN https://developer.mozilla.org/en-US/docs/Web/HTTP/Basics_of_HTTP/MIME_types/Common_types

However, this excludes ".3gp" and ".3gp2" because, as far as I can tell, it is not possible to know whether the content is video or audio solely from the file extension.

As far as I can tell there are two previous PRs that each implemented a type simply because they were in common use.

Updates golang#69530
@gopherbot
Contributor

Change https://go.dev/cl/614376 mentions this issue: mime: extend "builtinTypes" to include a more complete list of common types

@ianlancetaylor ianlancetaylor moved this to Incoming in Proposals Sep 19, 2024
AidanWelch added a commit to AidanWelch/go that referenced this issue Sep 20, 2024
Comment with the source of the builtin types

Updates golang#69530
@neild
Contributor

neild commented Sep 25, 2024

> what's included is based on WHATWG mime sniffing https://mimesniff.spec.whatwg.org/ this gives us a clear spec to adhere to, rather than an arbitrary list.

net/http.DetectContentType is based on WHATWG's spec; this proposal is for the type/extension mapping used by mime.TypeByExtension and other functions in the mime package when the system MIME database (/etc/mime.types or similar) isn't present.

@milhoan

milhoan commented Oct 17, 2024

Per conversation here whatwg/mimesniff#51 (comment), the intent of the Mimesniff spec is

"Based on the recent trajectory of changes to this spec, it seems to me that the scope of the spec is client-side sniffing for cross-browser compatibility and protection for the user against malicious files"

The Mimesniff spec is not an appropriate spec for an HTTP server use case. It would be better to adopt a different spec for this.

Alternatively, a new function that is server side appropriate that implements a different spec is needed. (EDIT: This comment was regarding DetectContentType, not TypeByExtension)

@AidanWelch
Author

@milhoan But as of now, this doesn't mimesniff. It just maps file extensions to mime types

@milhoan

milhoan commented Oct 17, 2024

> @milhoan But as of now, this doesn't mimesniff. It just maps file extensions to mime types

Sorry, I saw the discussion above about DetectContentType being based on that spec (imo it should not be). Disregard my comment, as this is not about that function. I'm 100% in favor of more MIME type coverage for TypeByExtension.

@seankhliao
Member

Looking at what the browsers do for matching file extensions to mime type:

Chromium https://chromium.googlesource.com/chromium/src/+/master/net/base/mime_util.cc#129
Maintains a primary and secondary mapping, with the preference order being: primary, platform, secondary.

Firefox https://searchfox.org/mozilla-central/source/uriloader/exthandler/nsExternalHelperAppService.cpp#2968
list at https://searchfox.org/mozilla-central/source/uriloader/exthandler/nsExternalHelperAppService.cpp#455
const defs https://searchfox.org/mozilla-central/source/netwerk/mime/nsMimeTypes.h
Maintains a default and extra mapping, with the preference order being: default, platform, extras.
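The preference order both browsers use can be sketched as a lookup over an ordered list of tables. This is only an illustration of the ordering, not either browser's code: the function name lookup and the sample platform entry are assumptions.

```go
package main

import "fmt"

// lookup consults the tables in the order given and returns the first match.
func lookup(ext string, tables ...map[string]string) (string, bool) {
	for _, table := range tables {
		if typ, ok := table[ext]; ok {
			return typ, true
		}
	}
	return "", false
}

func main() {
	// Small excerpts for illustration only.
	primary := map[string]string{"css": "text/css"}
	platform := map[string]string{"js": "text/plain"} // hypothetical OS setting
	secondary := map[string]string{"js": "application/javascript", "zip": "application/zip"}

	for _, ext := range []string{"css", "js", "zip", "unknown"} {
		typ, ok := lookup(ext, primary, platform, secondary)
		fmt.Printf("%-8s %q found=%v\n", ext, typ, ok)
	}
}
```

With this ordering, an entry in the secondary table only applies when the platform has no opinion, while a primary entry always wins.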

Below is a table mapping file extensions to Go MIME types and to Chromium / Firefox inclusion in their primary (1) or secondary (2) lists, along with the browser's MIME type if it differs from what Go has.

extension go mime type chrome firefox
3g2 2 (video/3gpp2)
3gp 2 (video/3gpp)
3gpp 2 (video/3gpp)
aac 2 (audio/aac)
ai 2 (application/postscript) 2 (application/postscript)
apk 2 (application/vnd.android.package-archive) 2 (application/vnd.android.package-archive)
apng 1 (image/apng) 2 (image/apng)
appcache 2 (text/cache-manifest)
arj 2 (application/x-arj)
art 2 (image/x-jg)
avif image/avif 1 2
bin 2 (application/octet-stream) 2 (application/octet-stream)
bmp 2 (image/bmp) 2 (image/bmp)
cer 2 (application/x-x509-ca-cert)
com 2 (application/octet-stream) 2 (application/octet-stream)
crt 2 (application/x-x509-ca-cert)
crx 1 (application/x-chrome-extension)
css text/css 1 2
csv 1 (text/csv) 2 (text/csv)
cur 2 (image/x-icon)
doc 2 (application/msword) 2 (application/msword)
docx 2 (application/vnd.openxmlformats-officedocument.wordprocessingml.document) 2 (application/vnd.openxmlformats-officedocument.wordprocessingml.document)
dot 2 (application/msword)
ehtml 2 (text/html) 2 (text/html)
eml 2 (message/rfc822) 2 (message/rfc822)
eps 2 (application/postscript) 2 (application/postscript)
epub 2 (application/epub+zip)
exe 2 (application/octet-stream) 2 (application/octet-stream)
flac 1 (audio/flac) 2 (audio/flac)
ftl 1 (text/plain)
gif image/gif 1 2
gz 2 (application/x-gzip) 2 (application/gzip)
htm text/html 1 2
html text/html 1 2
ical 2 (text/calendar)
icalendar 2 (text/calendar)
ico 2 (image/vnd.microsoft.icon) 2 (image/x-icon)
ics 2 (text/calendar) 2 (text/calendar)
ifb 2 (text/calendar)
jfif 2 (image/jpeg) 2 (image/jpeg)
jpeg image/jpeg 1 2
jpg image/jpeg 1 2
js text/javascript 2 (application/javascript) 2 (application/x-javascript)
jsm 2 (application/x-javascript)
json application/json 2 2
jxl 2 (image/jxl)
locale 1 (text/plain)
m3u8 2 (application/x-mpegurl)
m4a 1 (audio/x-m4a) 2 (audio/mp4)
m4b 2 (audio/mp4)
m4v 1 (video/mp4)
mht 1 (multipart/related)
mhtml 1 (multipart/related)
mid 2 (audio/x-midi)
mjs text/javascript 1 2 (application/x-javascript)
mml 2 (application/mathml+xml)
mp2 2 (audio/mpeg)
mp3 1 (audio/mp3) 2 (audio/mpeg)
mp4 1 (video/mp4) 2 (video/mp4)
mpeg 2 (video/mpeg)
mpega 2 (audio/mpeg)
mpg 2 (video/mpeg)
odg 2 (application/vnd.oasis.opendocument.graphics)
odp 2 (application/vnd.oasis.opendocument.presentation)
ods 2 (application/vnd.oasis.opendocument.spreadsheet)
odt 2 (application/vnd.oasis.opendocument.text)
oga 1 (audio/ogg) 2 (audio/ogg)
ogg 1 (audio/ogg) 2 (application/ogg)
ogm 1 (video/ogg)
ogv 1 (video/ogg) 2 (video/ogg)
opus 1 (audio/ogg) 2 (audio/ogg)
p7c 2 (application/pkcs7-mime)
p7m 2 (application/pkcs7-mime)
p7s 2 (application/pkcs7-signature)
p7z 2 (application/pkcs7-mime)
pdf application/pdf 2 2
pjp 2 (image/jpeg) 2 (image/jpeg)
pjpeg 2 (image/jpeg) 2 (image/jpeg)
png image/png 2 (image/x-png) 2
ppt 2 (application/vnd.ms-powerpoint) 2 (application/vnd.ms-powerpoint)
pptx 2 (application/vnd.openxmlformats-officedocument.presentationml.presentation) 2 (application/vnd.openxmlformats-officedocument.presentationml.presentation)
properties 1 (text/plain)
ps 2 (application/postscript) 2 (application/postscript)
rdf 2 (application/rdf+xml) 2 (application/rdf+xml)
rss 2 (application/rss+xml)
rtf 2 (application/rtf) 2 (application/rtf)
sh 2 (text/x-sh)
shtm 1 (text/html)
shtml 1 (text/html) 2 (text/html)
svg image/svg+xml 1 2
svgz 1 (image/svg+xml)
swf 2 (application/x-shockwave-flash)
swl 2 (application/x-shockwave-flash)
tar 2 (application/x-tar)
text 2 (text/plain) 2 (text/plain)
tgz 2 (application/x-gzip)
tif 2 (image/tiff) 2 (image/tiff)
tiff 2 (image/tiff) 2 (image/tiff)
txt 2 (text/plain) 2 (text/plain)
vcard 2 (text/vcard)
vcf 2 (text/vcard)
vtt 2 (text/vtt) 2 (text/vtt)
wasm application/wasm 1 2
wav 1 (audio/wav) 2 (audio/x-wav)
weba 2 (audio/webm)
webm 1 (audio/webm) 2 (audio/webm)
webp image/webp 1 2
woff 2 (application/font-woff)
xbl 2 (text/xml) 2 (text/xml)
xbm 2 (image/x-xbitmap) 2 (image/x-xbitmap)
xht 1 (application/xhtml+xml) 2 (application/xhtml+xml)
xhtm 1 (application/xhtml+xml)
xhtml 1 (application/xhtml+xml) 2 (application/xhtml+xml)
xls 2 (application/vnd.ms-excel) 2 (application/vnd.ms-excel)
xlsx 2 (application/vnd.openxmlformats-officedocument.spreadsheetml.sheet) 2 (application/vnd.openxmlformats-officedocument.spreadsheetml.sheet)
xml text/xml 1 2
xpi 2 (application/x-xpinstall)
xsl 2 (text/xml) 2 (text/xml)
xslt 2 (text/xml)
xul 2 (application/vnd.mozilla.xul+xml)
yuv 2 (video/x-raw-yuv)
zip 2 (application/zip) 2 (application/zip)

@seankhliao
Member

If we are to add more, I propose we limit it to what both browsers have decided to include in their built-in lists.
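Selecting the extensions that both browsers list amounts to a set intersection. A minimal sketch, assuming each browser's table has been extracted into a map; the function name inBoth and the sample entries are illustrative.

```go
package main

import (
	"fmt"
	"sort"
)

// inBoth returns, in sorted order, the extensions that are keys of both tables.
func inBoth(a, b map[string]string) []string {
	var shared []string
	for ext := range a {
		if _, ok := b[ext]; ok {
			shared = append(shared, ext)
		}
	}
	sort.Strings(shared)
	return shared
}

func main() {
	// Small excerpts for illustration only.
	chrome := map[string]string{
		"csv":  "text/csv",
		"flac": "audio/flac",
		"crx":  "application/x-chrome-extension",
	}
	firefox := map[string]string{
		"csv":  "text/csv",
		"flac": "audio/flac",
		"xul":  "application/vnd.mozilla.xul+xml",
	}
	fmt.Println(inBoth(chrome, firefox))
}
```

The intersection is taken on extensions only; the two browsers may still disagree on the type for a shared extension, which has to be settled separately.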

@AidanWelch
Author

That sounds good to me. I can update the PR if that is what's decided on.

@neild
Contributor

neild commented Nov 18, 2024

Interestingly, the one case where we override the platform value (on Windows, we ignore a registry entry mapping .js to text/plain) is one where Chrome and Firefox apparently prefer the platform setting.
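The override described above can be pictured as a filter applied to each platform-supplied mapping before it is accepted. This is a sketch under assumptions, not the mime package's actual code: the function name usePlatformType is hypothetical, and comparing the parsed base type is a choice made for this example.

```go
package main

import (
	"fmt"
	"mime"
	"strings"
)

// usePlatformType (hypothetical) reports whether a mapping read from the
// platform should be accepted. It rejects a ".js" entry whose media type is
// text/plain, so that the built-in JavaScript type stays in effect.
func usePlatformType(ext, typ string) bool {
	base, _, err := mime.ParseMediaType(typ)
	if err != nil {
		return false // unparsable platform value: keep the built-in mapping
	}
	return !(strings.EqualFold(ext, ".js") && base == "text/plain")
}

func main() {
	fmt.Println(usePlatformType(".js", "text/plain"))
	fmt.Println(usePlatformType(".js", "text/javascript"))
	fmt.Println(usePlatformType(".txt", "text/plain"))
}
```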

Limiting our list of builtin mappings to what both Chrome and Firefox include seems reasonably principled. I'd support that.

AidanWelch added a commit to AidanWelch/go that referenced this issue Dec 27, 2024
… agree

Implement all agreed upon types, using IANA's listed media types to decide
when there is a disagreement in type.  Except in the case of `.wav` where
`audio/wav` is used.

Fixes golang#69530
@AidanWelch
Author

Okay, my new commit implements the types supported by both. When there was disagreement I went with what IANA lists.

In the case of .wav, IANA lists nothing. However, RFC 2361 describes a standard for a MIME type using audio/vnd.wave. Chrome uses audio/wav, which is supported by the long-expired draft-ema-vpim-wav-00, which states:

RFC 2361, "WAVE and AVI Codec Registries," is an informational
draft describing IANA namespaces for codecs registered in
Microsoft's WAVE and AVI registries. Such codecs may be described
in the following format: audio/vnd.wave; codec = [codec ID].
This format is not suited to the description of a wave file as
defined in this document, as it does not indicate the format
standard that audio/wav must adhere to for interoperability
between messaging systems. On desktop-oriented messaging systems,
audio/wav (rather than audio/vnd.wave) is the defacto standard.

Firefox uses audio/x-wav, which (like audio/wav) was used as an example in a few RFCs but never actually described as a standard. So I decided to go with audio/wav, even though it does not appear to be standardized.
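The decision rule described in this comment can be sketched as below. The function name resolve and the contents of the preferred map are assumptions for illustration; the real choices were made by hand against IANA's registry, not by code like this.

```go
package main

import "fmt"

// resolve picks a single type for an extension that both browsers list.
// If they agree, that type is used. Otherwise an explicit entry in preferred
// (for example, the IANA-registered type) decides. It returns false when
// there is disagreement and no explicit preference.
func resolve(ext, chrome, firefox string, preferred map[string]string) (string, bool) {
	if chrome == firefox {
		return chrome, true
	}
	typ, ok := preferred[ext]
	return typ, ok
}

func main() {
	// Illustrative preferences: "gz" follows the registered type, while
	// "wav" is the manual exception described above.
	preferred := map[string]string{
		"gz":  "application/gzip",
		"wav": "audio/wav",
	}
	fmt.Println(resolve("csv", "text/csv", "text/csv", preferred))
	fmt.Println(resolve("gz", "application/x-gzip", "application/gzip", preferred))
	fmt.Println(resolve("wav", "audio/wav", "audio/x-wav", preferred))
}
```

Returning false instead of guessing keeps unresolved disagreements visible, so each one gets an explicit decision like the .wav case.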

AidanWelch added a commit to AidanWelch/go that referenced this issue Dec 27, 2024
Add the newly included file extensions to the expected TypeByExtension.

Fixes golang#69530
@neild
Contributor

neild commented Jan 3, 2025

To recap the proposal:

The mime package contains a built-in table mapping file extensions to MIME types. For example: ".png" maps to "image/png". This table is only used when the system MIME database is not present. The table currently contains 16 entries: https://cs.opensource.google/go/go/+/refs/tags/go1.23.4:src/mime/type.go;l=53

Chrome and Firefox both include similar built-in tables. The proposal is to add all entries present in both the Chrome and Firefox tables to the mime package. This expands the table to 64 entries. CL: https://go.dev/cl/614376

In the future, if Chrome and Firefox add new entries, we would follow suit.
