application/mp4 incorrectly maps to file extension mp4s. #207

Closed
Bdthomson opened this issue Aug 11, 2020 · 18 comments

Comments

@Bdthomson

Source IANA - https://www.iana.org/assignments/media-types/application/mp4 - "mp4 and mpg4"

https://github.com/jshttp/mime-db/blob/master/db.json#L908 - "mp4s"

@dougwilson
Contributor

Thank you for the report. It looks like the current mapping is coming from Apache.

Can you clarify what the outcome you are looking for is? (a) add those two extensions to the list (b) remove the apache one from the list or (c) both?

@Bdthomson
Author

Ah, that's a bit odd, I would have expected the source to say "apache" as some other mime types in that list do.

I've never heard of mp4s and can't find any record of this file extension even existing. It doesn't show up when you search any of these sites:

https://www.fileinfo.com
https://www.filext.com
https://www.file-extensions.org

As for the outcome, I would expect (c): both of those added to the list and mp4s removed.

@dougwilson
Contributor

Thanks! So the source is defined in the README as "where the mime type is defined", and in this case the MIME type is indeed defined in IANA. We do not include where the extensions are sourced from in the database.

So there is good news and bad news on that:

Good news -- I will look into what is keeping the extensions that are in IANA from showing up in our database
Bad news -- We cannot remove mp4s because it is coming from an upstream source; if you feel strongly about removing it, you would need to get it removed from Apache.

@Martii

Martii commented Aug 12, 2020

@dougwilson

Ref: Apache MIME types

Snippet of that page:

...
# The table below contains both registered and (common) unregistered types.
...
application/mp4					mp4s
...
audio/mp4					m4a mp4a
...
video/mp4					mp4 mp4v mpg4
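
For readers who want to see how a table like the one above turns into a type-to-extensions mapping, here is a minimal sketch. It assumes only the layout visible in the snippet (a media type, whitespace, then one or more extensions, with `#` starting a comment). The function name `parse_mime_types` is hypothetical and is not taken from this project's build scripts.

```python
def parse_mime_types(text: str) -> dict[str, list[str]]:
    """Parse Apache-style mime.types text into {media type: [extensions]}."""
    table: dict[str, list[str]] = {}
    for raw_line in text.splitlines():
        # Drop everything after '#' (comments) and surrounding whitespace.
        line = raw_line.split("#", 1)[0].strip()
        if not line:
            continue
        media_type, *extensions = line.split()
        if extensions:  # lines that carry no extensions are skipped
            table.setdefault(media_type, []).extend(extensions)
    return table


# The three entries quoted in the comment above, tabs included.
SNIPPET = (
    "# The table below contains both registered and (common) unregistered types.\n"
    "application/mp4\t\t\t\t\tmp4s\n"
    "audio/mp4\t\t\t\t\tm4a mp4a\n"
    "video/mp4\t\t\t\t\tmp4 mp4v mpg4\n"
)

table = parse_mime_types(SNIPPET)
print(table["application/mp4"])  # ['mp4s']
```

Run against the quoted lines, only `mp4s` comes back for application/mp4, which matches what the issue reports.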

"We cannot remove mp4s because it is coming from an upstream source"

Interesting text comment in the spec snippet.

There is no reference to mp4s, or to m4p either, in this package, in this Apache file, or in the Ubuntu file associations.

"I will look into what is keeping the extensions that are in IANA from showing up in our database"

Ref: https://tools.ietf.org/html/rfc4337

"a) if the file contains neither visual nor audio presentations, but
only, for example, MPEG-J or MPEG-7, use application/mp4;

b) for all other files, including those that have MPEG-J, etc., in
addition to video or audio streams, video/mp4 should be used;
however:

c) for files with audio but no visual aspect, including those that
have MPEG-J, etc., in addition to audio streams, audio/mp4 may be
used.

In any case, these indicate files conforming to the "MP4"
specification, ISO/IEC 14496-1:2000, systems file format."
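
The three rules quoted above can be read as a small decision function. The sketch below is one possible reading, not an official algorithm: the function name and its two boolean parameters are hypothetical, and because rule (c) only says audio/mp4 "may be used", returning it for audio-only files is a choice made here for illustration.

```python
def mp4_media_type(has_visual: bool, has_audio: bool) -> str:
    """Pick a media type for an MP4 file following the quoted rfc4337 rules."""
    if not has_visual and not has_audio:
        # Rule (a): neither visual nor audio presentations (e.g. only MPEG-J or MPEG-7).
        return "application/mp4"
    if has_audio and not has_visual:
        # Rule (c): audio but no visual aspect; audio/mp4 "may be used".
        return "audio/mp4"
    # Rule (b): all other files.
    return "video/mp4"


print(mp4_media_type(has_visual=False, has_audio=False))  # application/mp4
print(mp4_media_type(has_visual=False, has_audio=True))   # audio/mp4
print(mp4_media_type(has_visual=True, has_audio=True))    # video/mp4
```

Note that the choice depends on the file's contents, not its extension, which is why a single extension such as mp4 can plausibly belong to more than one of these types.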

@dougwilson
Contributor

That's correct on Apache; of course registered types are in IANA. If we only cared about those we would never need to consume Apache :) We consume the Apache file in order to get the "unregistered" ones they provide, which include a lot of very useful ones not in the IANA database.

@Martii

Martii commented Aug 12, 2020

Here we go for mp4s:

Ref: https://tools.ietf.org/html/rfc6381

"When the first element of a value is 'mp4a' (indicating some kind of
MPEG-4 audio), or 'mp4v' (indicating some kind of MPEG-4 part-2
video), or 'mp4s' (indicating some kind of MPEG-4 Systems streams
such as MPEG-4 BInary Format for Scenes (BIFS)), the second element
is the hexadecimal representation of the MP4 Registration Authority
ObjectTypeIndication (OTI), as specified in [MP4RA] and [MP41]
(including amendments). Note that [MP4RA] uses a leading "0x" with
these values, which is omitted here and hence implied."
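
To make the quoted passage concrete: in this context 'mp4s' is the first element of a codecs value, not a file extension. The sketch below splits such a value and reads the second element as hexadecimal, as the passage describes. The function name is hypothetical, and the sample value `mp4a.40.2` is an illustrative input that does not appear in this thread.

```python
def parse_mp4_codec(element: str) -> tuple[str, int]:
    """Split a codecs element such as 'mp4a.40.2' into (prefix, OTI)."""
    parts = element.split(".")
    if len(parts) < 2 or parts[0] not in ("mp4a", "mp4v", "mp4s"):
        raise ValueError(f"not an mp4a/mp4v/mp4s codecs element: {element!r}")
    # The second element is the ObjectTypeIndication in hexadecimal,
    # written without the leading "0x".
    return parts[0], int(parts[1], 16)


kind, oti = parse_mp4_codec("mp4a.40.2")
print(kind, hex(oti))  # mp4a 0x40
```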

Still no m4p found though other than here... this is puzzling. EDIT: Ref: https://www.loc.gov/preservation/digital/formats/fdd/fdd000052.shtml

"For sound files. The m4p extension is for QuickTime files containing AAC bitstreams purchased from iTunes and protected by a digital rights management scheme. Bookmarkable AAC files may carry the extension m4b. [The mp3 extension is for QuickTime sound files containing MP3 bitstreams; extent of protection unknown at this writing.]"

@dougwilson
Contributor

So taking a look at our database, we do actually have a MIME type with mpg4 and mp4 already: video/mp4 (https://www.iana.org/assignments/media-types/video/mp4) which is coming from Apache.

So you want both video/mp4 and application/mp4 to list out those extensions, is that correct?

@dougwilson
Contributor

P.S. if it helps at all, I just noticed the IANA registry points to an RFC that was obsoleted by another RFC. That RFC notes that all these MPEG-related types are registered in their own alliance registry: http://mp4ra.org/

@Martii

Martii commented Aug 12, 2020

"is that correct?"

Depends on whether this project has a precedence hierarchy among its sources or treats them as peers. MPEG anything catches my eye since I've worked on it in the past. Apple hasn't always registered their MIME types but it is in the Library of Congress link... so it's "kind of" there, in history.

The rfc4337 conditionals make it a little more difficult.

@dougwilson
Contributor

Well, as far as this project is concerned, we are not in the business of digging through documents to figure out what is "right" and what is not -- that is an entire project in and of itself. The goal of this project is to simply aggregate the three sources listed at the top of the README into a nice little JSON file for folks to consume. So whatever those sources say is associated with what is what this module is going to say. The reason I'm asking those questions is to identify which of those three upstreams to correct if it's wrong or needs changing.

@Martii

Martii commented Aug 12, 2020

Well, it's the chicken-and-egg syndrome in my book. One needs to get at the MIME type to determine if it's a binary audio (conditional a in rfc4337) but this can be a "list" of MIME types usually included to determine the MIME type of a binary file in server side projects and flip it for client side.

Personally I'd probably leave it as is. This is probably one of those "(common) unregistered types" in Apache... as long as mp4 relates to video/mp4 and audio/mp4 I think it's okay as is... but the source should indicate Apache and anything else needed (for the other extensions)... which is why I looked it up for ya. :)

"simply aggregate"

Yes but who has higher priority if any? IANA over Apache, etc. or do you just peer merge it discarding duplicates?

@dougwilson
Contributor

It is all just merged together. The MIME types are the keys of an object, so it's not possible to have duplicate types. The extensions from every source are all joined together in the extensions array. The goal is that if it exists in at least one of the sources, it exists in this database.
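
As a rough illustration of the merge described above, the sketch below unions several per-source tables into one object keyed by MIME type, joining the extension lists and skipping repeats. It is a simplified stand-in, not this project's actual build code: the function name `merge_sources` is hypothetical, and the ordering (earlier sources first) is an assumption made for the example.

```python
def merge_sources(*sources: dict[str, list[str]]) -> dict[str, list[str]]:
    """Union per-source {media type: [extensions]} tables into one table."""
    merged: dict[str, list[str]] = {}
    for source in sources:
        for media_type, extensions in source.items():
            # Keys are unique per type, so a type can never appear twice.
            bucket = merged.setdefault(media_type, [])
            for ext in extensions:
                if ext not in bucket:  # keep first occurrence, drop repeats
                    bucket.append(ext)
    return merged


# Extension lists as reported in this thread for each upstream.
iana = {"application/mp4": ["mp4", "mpg4"]}
apache = {
    "application/mp4": ["mp4s"],
    "video/mp4": ["mp4", "mp4v", "mpg4"],
}

merged = merge_sources(iana, apache)
print(merged["application/mp4"])  # ['mp4', 'mpg4', 'mp4s']
```

Nothing is ever removed in a merge like this, which is why an extension that exists in only one upstream (here mp4s) stays in the result.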

@Martii

Martii commented Aug 12, 2020

Is it too big of a leap to show multiple sources on merge to help alleviate confusion? They could be comma-separated (or pipe-separated), or there could even be a new field to show the precedence. Or would that be too breaking?

@dougwilson
Contributor

We cannot make that type of change without a major version change and even changing the name of the db file, as this project really took off more than we ever expected, and there are so many folks just pulling the file direct from master all over the Internet... That field is in use by various folks as well.

@Martii

Martii commented Aug 12, 2020

Okay... well I guess this is one of those paradoxes where one has to check existing issues to see if it's been explained from what source. Apache has one of the application types but the other is the Library of Congress, in history,... sooo... on the merge of IANA and Apache... it picks IANA, probably as the "governing" factor on pulls from those sites. *shrugs*

@dougwilson
Contributor

So this module does provide them split out in the src/ directory if you wanted to see the individual split out contents to see what is coming from where. I'm not sure if that answers what you're looking for or not on that front.
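
One way to use split-out per-source data to answer "which upstream lists this extension?" is sketched below. The layout of the files in src/ is not shown in this thread, so the example uses in-memory dictionaries as a stand-in. The function name and the `per_source` structure are hypothetical.

```python
def sources_listing(
    per_source: dict[str, dict[str, list[str]]],
    media_type: str,
    extension: str,
) -> list[str]:
    """Return the names of the sources that map media_type to extension."""
    return [
        name
        for name, table in per_source.items()
        if extension in table.get(media_type, [])
    ]


# Stand-in data using the extension lists reported in this thread.
per_source = {
    "iana": {"application/mp4": ["mp4", "mpg4"]},
    "apache": {"application/mp4": ["mp4s"]},
}

print(sources_listing(per_source, "application/mp4", "mp4s"))  # ['apache']
print(sources_listing(per_source, "application/mp4", "mp4"))   # ['iana']
```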

@Martii

Martii commented Aug 12, 2020

Re: @Bdthomson

"Ah, that's a bit odd, I would have expected the source to say "apache" as some other mime types in that list do."

...

"I'm not sure if that answers what you're looking for..."

Just providing a possible explanation for the author of this issue. I'm sure you've said the same thing over and over through the years with this project but my focus is not always on GH searching. ;) Sometimes it helps to have another voice asking the questions to get it in different wording. :) Plus I get a little better understanding of this project and its use cases.

"So this module does provide them split out in the src/ directory if you wanted to see the individual split out contents to see what is coming from where."

Good to know, although programmatically it may not be useful, since the data is usually accessed from the merged file... which is great, but that's why I asked about priority in the final list.

@dougwilson
Contributor

The application/mp4 mime in the database now includes the two extensions from the IANA entry as the first ones.
