Add mime-support as upstream source for MIME types. #205
…es to sources. Regenerate database. Update README.
Awesome, thank you for putting this together! I left a couple of comments, and I also have some questions that your PR text does not answer:
How does the mime-support project source and gather its type data? I see lots of new data in here, so I would like to understand the history of the incoming data and how they vet the new data they add over time.
If they are sourcing from "shared-mime-info", then why don't we just source from there instead of mime-support? What does the intermediate dependency add apart from complexity and indirection? In addition, we need to get in contact with the maintainers to find out how they add new entries instead of just making assumptions :)
/**
 * URL for the mime.types file in the Debian mime-support project source.
 */
var URL = 'https://salsa.debian.org/debian/mime-support/-/raw/master/mime.types?inline=false'
Please make sure to follow up here with the site operator's permission for us to begin polling / scraping their endpoint with an automated process. I couldn't find any public TOS on a quick look; if there is one that says it's OK, then that's fine and we don't need their explicit permission.
Yes, agreed, that is required for merge. Will do.
@@ -1066,7 +1117,8 @@
     "extensions": ["pgp"]
   },
   "application/pgp-keys": {
-    "source": "iana"
+    "source": "iana",
+    "extensions": ["key"]
Where does this come from? The official record for this type (https://tools.ietf.org/html/rfc3156#section-9.3) states the file extension is ".asc". It jumped out at me since ".key" is the extension for Apple Keynote files.
GnuPG uses this for keys.
Gotcha. Can you link to the source of that information?
@@ -768,7 +809,7 @@
   },
   "application/mathematica": {
     "source": "iana",
-    "extensions": ["ma","nb","mb"]
+    "extensions": ["ma","nb","mb","nbp"]
Where does this new extension come from? I see there is a registration from Wolfram Research (the owners of the spec) at https://www.iana.org/assignments/media-types/application/mathematica but this extension is not listed there.
   },
   "application/font-tdpfr": {
     "source": "iana",
     "extensions": ["pfr"]
   },
   "application/font-woff": {
     "source": "iana",
-    "compressible": false
+    "compressible": false,
+    "extensions": ["woff"]
This mapping is obsolete; many of these font types (like .woff) moved under the font/ tree (as font/woff in this case). More information can be found here: http://tools.ietf.org/rfc/rfc8081.txt
Obsolete, yes. Is not being obsolete a prerequisite for including the extension? Many files are going to have that extension regardless of whether someone declares it obsolete.
The MIME type is what is obsolete, not the extension. The files would simply use the non-obsolete file extension. My comment is about mapping the extension here to the obsolete type instead of the current type.
@@ -502,20 +523,26 @@
     "source": "iana"
   },
   "application/font-sfnt": {
-    "source": "iana"
+    "source": "iana",
+    "extensions": ["otf","ttf"]
This mapping is obsolete; many of these font types moved under the font/ tree. More information can be found here: http://tools.ietf.org/rfc/rfc8081.txt
Obsolete, yes. Is not being obsolete a prerequisite for including the extension? Many files are going to have that extension regardless of whether someone declares it obsolete.
The MIME type is what is obsolete, not the extension. The files would simply use the non-obsolete file extension. My comment is about mapping the extension here to the obsolete type instead of the current type.
@@ -2351,7 +2419,8 @@
     "source": "iana"
   },
   "application/vnd.debian.binary-package": {
-    "source": "iana"
+    "source": "iana",
+    "extensions": ["deb","ddeb","udeb"]
I see deb and udeb listed in the type registration https://www.iana.org/assignments/media-types/application/vnd.debian.binary-package , but where does ddeb come from?
-    "compressible": false
+    "source": "mime-support",
+    "compressible": false,
+    "extensions": ["m3u8"]
It was my understanding this extension is for the type "application/vnd.apple.mpegurl", right? Looking at the spec https://tools.ietf.org/html/rfc8216#section-4 the section says:
Each Playlist file MUST be identifiable either by the path component
of its URI or by HTTP Content-Type. In the first case, the path MUST
end with either .m3u8 or .m3u. In the second, the HTTP Content-Type
MUST be "application/vnd.apple.mpegurl" or "audio/mpegurl". Clients
SHOULD refuse to parse Playlists that are not so identified.
The mapping is not reversible. They both have the same extension?
I'm not sure what your comment means, I'm sorry. Can you state it a different way?
Files with MIME type application/x-mpegurl and application/vnd.apple.mpegurl can both have the extension m3u8.
Yep, I get it. My question is can you cite where the MIME type "application/x-mpegurl" comes from? I don't see it in the specification anywhere. Without a source, one could always argue that "foo/bar" MIME type is also m3u8 :)
Not sure. To answer all of your questions, which are legitimate, I will have to get more info from the maintainer on how this DB was pulled together. I don't expect you to merge without that. The fact remains, though, that this is the source of the Debian Linux /etc/mime.types file, so in my mind that gives it as much legitimacy as the Apache or Nginx versions.
Sure, but that doesn't mean they do not have outdated or invalid entries. When we pulled in NGINX and Apache back in the day, we did this same process and fixed a lot of bad data in their files. That is what we'd need to do in this same case.
Ah, OK. That clears it up. Not sure if I have the time right now for all that!
… Update README.md
@@ -7418,7 +8024,7 @@
   },
   "text/calendar": {
     "source": "iana",
-    "extensions": ["ics","ifb"]
+    "extensions": ["ics","ifb","icz"]
The section about the type in the text/calendar spec (https://tools.ietf.org/html/rfc5545#section-8.1) lists the ics and ifb file extensions, but it does not mention the icz extension.
   },
   "font/ttf": {
     "source": "iana",
     "compressible": true,
-    "extensions": ["ttf"]
+    "extensions": ["ttf","otf"]
Why add .otf to this? My understanding is that .otf = font/otf and .ttf = font/ttf (which is what the database was prior to adding this source).
@sgpinkus I'm looking into what is coming in because several well-known projects use this module to provide file extension -> MIME type mappings; the Node.js package https://www.npmjs.com/package/mime and GitHub Pages https://docs.github.com/en/enterprise/2.13/user/articles/mime-types-on-github-pages are two very notable ones. So we have to be very careful about pulling in conflicting entries, and we need to understand how conflicting mappings would affect the behavior of these projects.
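To make the concern about conflicting mappings concrete, here is a minimal sketch that lists every extension claimed by more than one type. The sample object is hypothetical data shaped like db.json entries (type name -> `source` / `extensions`), and the helper names `typesByExtension` and `findConflicts` are illustrative only; they are not part of this repository.

```javascript
// Hypothetical sample shaped like db.json entries; not real database contents.
var db = {
  'application/vnd.apple.mpegurl': { source: 'iana', extensions: ['m3u8'] },
  'application/x-mpegurl': { source: 'mime-support', extensions: ['m3u8'] },
  'font/ttf': { source: 'iana', extensions: ['ttf', 'otf'] },
  'font/otf': { source: 'iana', extensions: ['otf'] },
  'text/calendar': { source: 'iana', extensions: ['ics', 'ifb'] }
}

// Group MIME types by each extension they claim.
function typesByExtension (db) {
  var map = Object.create(null)
  Object.keys(db).forEach(function (type) {
    var exts = db[type].extensions || []
    exts.forEach(function (ext) {
      if (!map[ext]) map[ext] = []
      map[ext].push(type)
    })
  })
  return map
}

// Keep only the extensions that are claimed by more than one type.
function findConflicts (db) {
  var map = typesByExtension(db)
  var conflicts = {}
  Object.keys(map).sort().forEach(function (ext) {
    if (map[ext].length > 1) conflicts[ext] = map[ext]
  })
  return conflicts
}

var conflicts = findConflicts(db)
console.log(conflicts)
```

With this sample, `m3u8` and `otf` are reported because each appears under two types. A consumer that resolves an extension to a single type would have to pick one of them, which is why each new conflict needs a cited source.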
OK, understood. Still waiting on more info from the maintainer. I will get back to you with any new info. If you can't merge, that is totally fine.
@dougwilson I got a reply from the maintainer of mime-support, Charles Plessy. It seems many of the MIME types in the mime-support DB were added by hand over the years. He indicated that pulling directly from IANA would be preferable now that IANA has become more receptive to adding new MIME types. I pointed out that you already have a script that pulls from IANA, and directed him to this repository and this PR. Still, it would be useful to get some of the types in mime-support added here. I guess, in retrospect, the proper process for doing this is: 1. IANA; 2. for types not yet in IANA, the ad hoc "custom" type registration process you have set up here. For reference, here are the 192 MIME types in mime-support that are not in mime-db, in CSV and JSON. The script used to generate them is included.
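The attached script is not reproduced here. As a rough sketch of the comparison described above, the following lists type names present in one database but absent from another. Both sample objects and the helper name `missingTypes` are hypothetical, and the `x-example-*` type names are placeholders rather than real entries.

```javascript
// Hypothetical stand-ins for the two databases, keyed by MIME type.
var mimeDb = {
  'text/calendar': { extensions: ['ics', 'ifb'] },
  'font/ttf': { extensions: ['ttf'] }
}
var mimeSupport = {
  'text/calendar': { extensions: ['ics', 'ifb', 'icz'] },
  'application/x-example-two': { extensions: ['ex2'] },
  'application/x-example-one': { extensions: ['ex1'] }
}

// Return the type names found in `candidate` but not in `base`.
// Names are compared case-insensitively, since MIME type names are
// case-insensitive.
function missingTypes (base, candidate) {
  var known = Object.create(null)
  Object.keys(base).forEach(function (type) {
    known[type.toLowerCase()] = true
  })
  return Object.keys(candidate)
    .filter(function (type) { return !known[type.toLowerCase()] })
    .sort()
}

var missing = missingTypes(mimeDb, mimeSupport)
console.log(missing.length + ' types missing')
console.log(missing.join('\n'))
```

This only compares type names. Extensions that differ for a shared type, such as `icz` under `text/calendar` in the sample, would need a separate per-type comparison.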
671c853 to e35c46e
https://github.com/younggun23/mime-db/
Add mime-support as upstream source for MIME types
Adds the mime-support mime.types file as a fourth non-custom upstream source.
The file structure is identical to Apache's, as the upstream file is a drop-in replacement for the Apache mime.types file (actually it's the other way around ..).
In this PR I've added a basic script to print out how many MIME types come from each source. I've also labelled custom types explicitly to help with tracing the source in db.json:
Before:
After:
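The original before/after counts are not preserved in this page. As an illustration only, a per-source count of the kind described in the PR text could be sketched as follows. The sample data, the `custom` label value, the `(none)` bucket for unlabelled entries, and the helper name `countBySource` are all assumptions, not the script from this PR.

```javascript
// Hypothetical sample shaped like db.json entries; not real database contents.
var db = {
  'application/json': { source: 'iana', extensions: ['json'] },
  'application/x-mpegurl': { source: 'mime-support', extensions: ['m3u8'] },
  'text/calendar': { source: 'iana', extensions: ['ics', 'ifb'] },
  'text/x-example': { source: 'custom' }, // assumed label for custom types
  'text/x-unlabelled': {} // entry with no source at all
}

// Tally how many types each source contributes.
function countBySource (db) {
  var counts = Object.create(null)
  Object.keys(db).forEach(function (type) {
    var source = db[type].source || '(none)'
    counts[source] = (counts[source] || 0) + 1
  })
  return counts
}

var counts = countBySource(db)
Object.keys(counts).sort().forEach(function (source) {
  console.log(source + ': ' + counts[source])
})
```

Running the same tally before and after regenerating the database shows how many entries a new upstream source contributes.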