Mimetypes as codes #4

Fstub42 · 2016-05-22T20:22:03Z

Naming every possible type is already done with mime types.
I think it would make sense to use them as codes, instead of defining your own.

"image/jpg" or "text/plain" make nice paths btw ;)

What are your thoughts on it, did I miss something?

jbenet · 2016-07-31T20:11:50Z

Yeah, we could have a mime type based multicodec, something like:

/mime/<mime-type-here>

> Naming every possible type is already done with mime types.

Not everything is registered under MIME. Also, MIME is generally not specific enough: in most circumstances, application/json says very little about the encoded data. It would be nice to enable users to upgrade type annotations to signal more precise types. I'm not sure multicodec will help much here, but it might.
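To make the `/mime/<mime-type-here>` form concrete, here is a minimal sketch of a helper that builds such a name. The function name `mimeToCodecPath` is hypothetical and not part of any multicodec library; only the `/mime/` prefix comes from the comment above.

```javascript
// Hypothetical helper: builds the '/mime/<mime-type>' name suggested above.
// Nothing here is an existing multicodec API; it only illustrates the naming.
function mimeToCodecPath(mimeType) {
  const [type, subtype, ...rest] = mimeType.trim().split('/');
  if (!type || !subtype || rest.length > 0) {
    throw new Error(`expected "type/subtype", got "${mimeType}"`);
  }
  return `/mime/${type}/${subtype}`;
}

console.log(mimeToCodecPath('application/json')); // /mime/application/json
```

Note that the resulting name is only as precise as the MIME type itself, which is the limitation raised above for application/json.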

bochaco · 2018-09-05T20:44:57Z

Hi @jbenet @Fstub42, have there been any further discussions or agreements on getting this supported?

I'm experimenting with cids and would like to potentially add support for mime-types.

Please let me know if you have any more info; I'd be up for sending a PR we could work on towards such support.

bochaco · 2018-09-07T02:10:51Z

Based on the information found at https://www.iana.org/assignments/media-types/media-types.xhtml, plus the suggestions in previous posts here, I'm thinking the following ranges could be reserved for the different MIME types/subtypes (I've included some examples I'm playing with in the cids code):

// 0x1000 - 0x17ff (11 bits) reserved for application/* (there currently are ~1,300 subtypes)
multicodec.addCodec('mime/application/json', Buffer.from('1000', 'hex'));
multicodec.addCodec('mime/application/octet-stream', Buffer.from('1001', 'hex'));
multicodec.addCodec('mime/application/ld+json', Buffer.from('1002', 'hex'));
multicodec.addCodec('mime/application/rdf+xml', Buffer.from('1003', 'hex'));

// 0x1800 - 0x18ff (8 bits) reserved for audio/* (there currently are ~150 subtypes)
multicodec.addCodec('mime/audio/mp4', Buffer.from('1800', 'hex'));

// 0x1900 - 0x190f (4 bits) reserved for font/* (there currently are ~8 subtypes)
multicodec.addCodec('mime/font/ttf', Buffer.from('1900', 'hex'));

// 0x1910 - 0x197f (7 bits) reserved for image/* (there currently are ~60 subtypes)
multicodec.addCodec('mime/image/png', Buffer.from('1910', 'hex'));

// 0x1980 - 0x19cf (5 bits) reserved for message/* (there currently are ~18 subtypes)
multicodec.addCodec('mime/message/sip', Buffer.from('1980', 'hex'));

// 0x19d0 - 0x1a3f (6 bits) reserved for model/* (there currently are ~24 subtypes)
multicodec.addCodec('mime/model/3mf', Buffer.from('19d0', 'hex'));

// 0x1a40 - 0x1a8f (5 bits) reserved for multipart/* (there currently are ~13 subtypes)
multicodec.addCodec('mime/multipart/byteranges', Buffer.from('1a40', 'hex'));

// 0x1a90 - 0x1aff (7 bits) reserved for text/* (there currently are ~71 subtypes)
multicodec.addCodec('mime/text/html', Buffer.from('1a90', 'hex'));
multicodec.addCodec('mime/text/csv', Buffer.from('1a91', 'hex'));
multicodec.addCodec('mime/text/turtle', Buffer.from('1a92', 'hex'));
multicodec.addCodec('mime/text/xml', Buffer.from('1a93', 'hex'));

// 0x1b00 - 0x1b6f (7 bits) reserved for video/* (there currently are ~78 subtypes)
multicodec.addCodec('mime/video/JPEG', Buffer.from('1b00', 'hex'));
multicodec.addCodec('mime/video/mp4', Buffer.from('1b01', 'hex'));
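As a quick sanity check on the proposal above, the sketch below computes how many codes each range actually contains and compares that with the bit count given in its comment. The range bounds and bit counts are copied from the comments; the helper names `rangeReport` and `isContiguous` are made up for illustration. By this arithmetic, only the first three ranges are exact powers of two, while the others are 80 or 112 codes wide, so the stated bit counts look approximate for those.

```javascript
// [label, first code, last code, bit count stated in the comment above]
const proposedRanges = [
  ['application', 0x1000, 0x17ff, 11],
  ['audio',       0x1800, 0x18ff, 8],
  ['font',        0x1900, 0x190f, 4],
  ['image',       0x1910, 0x197f, 7],
  ['message',     0x1980, 0x19cf, 5],
  ['model',       0x19d0, 0x1a3f, 6],
  ['multipart',   0x1a40, 0x1a8f, 5],
  ['text',        0x1a90, 0x1aff, 7],
  ['video',       0x1b00, 0x1b6f, 7],
];

// Number of codes in each range versus 2^bits from the comment.
function rangeReport(ranges) {
  return ranges.map(([label, first, last, claimedBits]) => {
    const size = last - first + 1;
    const claimedSize = 2 ** claimedBits;
    return { label, size, claimedSize, matches: size === claimedSize };
  });
}

// True when every range starts right after the previous one ends.
function isContiguous(ranges) {
  return ranges.every((r, i) => i === 0 || r[1] === ranges[i - 1][2] + 1);
}

for (const row of rangeReport(proposedRanges)) {
  console.log(row.label, row.size, row.matches ? 'matches' : `comment implies ${row.claimedSize}`);
}
console.log('contiguous:', isContiguous(proposedRanges));
```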

ghost · 2018-09-12T22:47:16Z

This is cool, thanks for giving it a push :)

It might be nicer to start with just one bucket of numbers. Most will inevitably fill up anyway, so there's little sense in deferring the problem by a few years.

An approach that feels more accommodating of simple future changes is to start with a single bucket that includes a snapshot of the whole mediatypes table, and then regularly add a new bucket with the mediatypes added in the meantime.

Mimetypes seem like a category of multicodecs that would be fine with fragmented numbers, i.e. they don't seem to benefit from being strictly consecutively numbered. (While something like the various variable-length multihash functions clearly do.)
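A rough sketch of the snapshot-plus-later-buckets idea follows. The function `assignBucket`, the start codes 0x1000 and 0x2000, and the tiny lists of media types are all made up for illustration; a real snapshot would come from the IANA table.

```javascript
// Hypothetical sketch: number a snapshot of media types consecutively,
// then number later additions from a separate, non-adjacent bucket.
function assignBucket(startCode, mediaTypes, existing = new Map()) {
  const table = new Map(existing);
  let next = startCode;
  for (const name of mediaTypes) {
    const key = name.toLowerCase();
    if (table.has(key)) continue; // already numbered in an earlier bucket
    table.set(key, next++);
  }
  return { table, nextFree: next };
}

// First bucket: a (tiny, made-up) snapshot of the registry.
const firstBucket = assignBucket(0x1000, ['application/json', 'text/html', 'image/png']);

// Later bucket: only types that appeared since the snapshot get new codes.
const secondBucket = assignBucket(0x2000, ['text/html', 'image/avif'], firstBucket.table);

console.log(secondBucket.table);
```

Codes assigned in the first bucket never change when a later bucket is added, which is what makes fragmented numbering acceptable here.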

> multicodec.addCodec('mime/video/JPEG', Buffer.from('1b00', 'hex'));

I was always under the impression that mimetypes were case-insensitive -- is that the case? Important question for decoding/encoding.

> It would be nice to enable users to upgrade type annotations to signal more precise types

@jbenet There's a longstanding convention for this, e.g. application/ld+json and application/epub+zip, and "more than 1000 occurances" of + in the assignment table :)

ghost · 2018-09-12T22:48:04Z

It would also be interesting to look at the complete mediatype syntax (category/foo.example+foo or more) and make sure it's fine to use these in filesystem-ish paths.
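One way to probe the path question is to check names against the `restricted-name` grammar from RFC 6838, section 4.2, which allows letters, digits, and the characters `! # $ & - ^ _ . +` (first character must be a letter or digit, at most 127 characters). The sketch below assumes that grammar and ignores parameters such as `; charset=utf-8`; the function names are hypothetical.

```javascript
// restricted-name per RFC 6838 section 4.2 (as understood here; verify against the RFC).
const RESTRICTED_NAME = /^[A-Za-z0-9][A-Za-z0-9!#$&\-^_.+]{0,126}$/;

// Returns { type, subtype } for a syntactically valid "type/subtype", else null.
function parseMediaType(mediaType) {
  const parts = mediaType.split('/');
  if (parts.length !== 2 || !parts.every((p) => RESTRICTED_NAME.test(p))) {
    return null;
  }
  const [type, subtype] = parts;
  return { type, subtype };
}

// Characters that are legal in a media type but may need escaping
// in URLs or shells when the name is used as a path segment.
function charsNeedingCare(mediaType) {
  return [...new Set(mediaType.match(/[!#$&^]/g) || [])];
}

console.log(parseMediaType('application/ld+json'));
console.log(parseMediaType('text/a/b')); // null
```

Under this grammar the only `/` in a valid name is the one separating type and subtype, so a name maps to exactly two path segments; the remaining concern is the handful of punctuation characters flagged by `charsNeedingCare`.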

bochaco · 2018-09-13T14:42:28Z

Thanks @lgierth.
As per the RFC, it seems you are right and they are case-insensitive; https://tools.ietf.org/html/rfc2045#section-5.1 says:

> The type, subtype, and parameter names are not case sensitive. For
> example, TEXT, Text, and TeXt are all equivalent top-level media
> types.
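Given that rule, a lookup would presumably normalize case before consulting the table. A minimal sketch, assuming the table is keyed by lower-cased names; the two codes are the video examples from the proposal earlier in this thread, and `lookupMimeCode` is a hypothetical helper.

```javascript
// Hypothetical table keyed by lower-cased names.
const codeByName = new Map([
  ['mime/video/jpeg', 0x1b00],
  ['mime/video/mp4', 0x1b01],
]);

// Case-insensitive lookup: normalize before consulting the table.
function lookupMimeCode(mediaType) {
  return codeByName.get(`mime/${mediaType.trim().toLowerCase()}`);
}

console.log(lookupMimeCode('video/JPEG') === lookupMimeCode('VIDEO/jpeg')); // true
```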

As for the ranges, I think you have a valid point. I was proposing to reserve enough bits to cope with quite a large addition to each of the types, but I wouldn't object to having them fragmented, as long as we at least get the initial bucket of the most common/popular ones together now.

ConsciousCode · 2020-01-29T02:26:36Z

Since multicodec is represented with a varint and MIME is a hierarchical classification system anyway, wouldn't it make sense to define one big range for MIME with a "prefix" (the most significant septet of a 3-byte varint) followed by 7 bits each for type and subtype? That gives a range of 128 types and 128 subtypes for each type; types with more subtypes can use multiple type septets. Squabbling over bit real estate is unnecessarily complex and less valuable than simpler decoding logic.
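The septet layout described above could be sketched as follows, assuming the usual unsigned varint encoding (7 bits per byte, least significant group first, high bit set on every byte except the last). The prefix value 1 and the type and subtype numbers are placeholders, and the function names are hypothetical.

```javascript
// Pack three 7-bit fields into one 21-bit code: prefix | type | subtype.
function packMime(prefix, type, subtype) {
  for (const v of [prefix, type, subtype]) {
    if (!Number.isInteger(v) || v < 0 || v > 0x7f) {
      throw new RangeError('each field must fit in 7 bits');
    }
  }
  if (prefix === 0) throw new RangeError('prefix 0 would not occupy a third varint byte');
  return (prefix << 14) | (type << 7) | subtype;
}

// Inverse of packMime.
function unpackMime(code) {
  return { prefix: (code >>> 14) & 0x7f, type: (code >>> 7) & 0x7f, subtype: code & 0x7f };
}

// Unsigned varint: low 7 bits first, continuation bit 0x80 on all but the last byte.
function encodeVarint(value) {
  const bytes = [];
  while (value >= 0x80) {
    bytes.push((value & 0x7f) | 0x80);
    value >>>= 7;
  }
  bytes.push(value);
  return Uint8Array.from(bytes);
}

const exampleCode = packMime(1, 2, 3);          // 0x4103
const exampleBytes = encodeVarint(exampleCode); // three bytes: 0x83, 0x82, 0x01
console.log(exampleCode.toString(16), exampleBytes);
```

One consequence worth noting: because the low septet is written first, the subtype arrives first on the wire and the prefix last, so a decoder still has to read the whole varint before it can tell that the code is in the MIME range.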

Speaking of all this, isn't multicodec essentially a broader (though non-hierarchical) version of mimetypes with a binary encoding?

Stebalien · 2020-01-29T03:05:41Z

#84 (comment)

Stebalien · 2020-01-29T03:06:27Z

The missing piece is actually making the mapping.

ConsciousCode · 2020-01-30T04:39:15Z

Is it outside of the scope of this topic to suggest bijective improvements of MIME in deciding the mapping? For instance, in #84 @Stebalien mentions that application/* has 1500+ entries, which strongly suggests it's an overloaded category that needs to be split up. Also, the meaning of text/* is unclear since it includes media which isn't displayed as textual (eg text/html) and many textual formats can be found under application/* (eg application/json).

Some ideas:

A bit to denote whether a format is human-readable (not used for mapping)
data/* as a catch-all instead of application/*
maybe app/* to replace vnd.*?
code/* category for source code
exe/* for executable formats (.class, .exe, elf, .o)
archive/* for file containers (.zip, .gz, .iso)
file/* for filesystem data (ext* inodes, potentially useful for IPFS)
block/* for data intended to be used in merkle-DAG structures (cryptocurrency blocks, IPLD objects)

Some of these may belong in their own multicodecs or be outright unnecessary. Also, none of the suggestions so far offers a future-proof way of representing higher-level interpretations of a lower-level data format, e.g. application/*+xml. Naively they could just be treated as separate entries, but this discards potentially useful information. If we give the mime "namespace" 3-4 septets, I guess they could be category/format/type, with "type" being 0 for unspecified semantics. Or a second mime namespace could be added with a longer varuint representation. Heck, just adding another "prefix" for xml would get rid of half of application/* (mime/application/svg+xml vs mime-xml/svg or mime-xml/image/svg).
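To illustrate the suffix idea, here is a small sketch that splits a structured syntax suffix (the part after the last `+`) off a subtype and builds a name in the `mime-xml/image/svg` style floated above. Both function names and the naming scheme are hypothetical; nothing like this exists in multicodec today.

```javascript
// Split "type/base+suffix" into its parts; suffix is null when there is none.
function splitStructuredSuffix(mediaType) {
  const [type, subtype = ''] = mediaType.toLowerCase().split('/');
  const plus = subtype.lastIndexOf('+');
  if (plus <= 0 || plus === subtype.length - 1) {
    return { type, base: subtype, suffix: null };
  }
  return { type, base: subtype.slice(0, plus), suffix: subtype.slice(plus + 1) };
}

// Hypothetical naming: the suffix moves into the namespace prefix.
function toSuffixName(mediaType) {
  const { type, base, suffix } = splitStructuredSuffix(mediaType);
  return suffix ? `mime-${suffix}/${type}/${base}` : `mime/${type}/${base}`;
}

console.log(toSuffixName('image/svg+xml')); // mime-xml/image/svg
console.log(toSuffixName('text/plain'));    // mime/text/plain
```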

Stebalien · 2020-01-30T05:37:22Z

Redefining MIME types is likely way outside of the scope of this project. This project is primarily concerned with defining short "codes" for arbitrary things.

jmgurney · 2021-03-09T00:04:56Z

Not sure where to put this comment, whether in a new issue or elsewhere, but I'd like to see a mime-type column in the table. It should be unique, as codec parameters can be specified for things that are not. This would help ensure that duplicate entries are not added, and would also provide a way to automatically map from mime-types to multicodecs w/o people building their own [and possibly incorrect] tables of such.
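A sketch of how such a column could be consumed: build a MIME-to-codec index and fail loudly on duplicates. The `mime` field is the hypothetical new column, `buildMimeIndex` is a made-up helper, and the sample rows are illustrative only and should be checked against the real table before being relied on.

```javascript
// Illustrative rows; 'mime' is the hypothetical extra column suggested above.
const sampleRows = [
  { name: 'json', code: 0x0200, mime: 'application/json' },
  { name: 'cbor', code: 0x51, mime: 'application/cbor' },
  { name: 'raw', code: 0x55, mime: '' }, // no MIME type assigned
];

// Build a case-insensitive mime -> { name, code } index, rejecting duplicates.
function buildMimeIndex(rows) {
  const index = new Map();
  for (const { name, code, mime } of rows) {
    if (!mime) continue;
    const key = mime.toLowerCase();
    if (index.has(key)) {
      throw new Error(`duplicate mime type ${key}: ${index.get(key).name} and ${name}`);
    }
    index.set(key, { name, code });
  }
  return index;
}

const mimeIndex = buildMimeIndex(sampleRows);
console.log(mimeIndex.get('application/json'));
```

Running a check like this in the table's CI would enforce the uniqueness requirement automatically.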

lewisl9029 · 2021-09-16T23:47:55Z

Made this comment in the PR, but reposting here for visibility:

Looks like this effort has been stalled for a while, mostly due to concerns around the drastic increase in table size?

The readme of the project describes a first-come, first-serve policy when it comes to adding new codecs, and I wonder if we could maybe apply that here as well with mime types. I.e. maybe we can start with a small handful of the most commonly used MIME types on the internet today (say, this list), and then add more over time based on demand, instead of dumping in all known mime-types at once?

Is there some particular need for all the mime types to be in a contiguous block that I'm not aware of?

#159 (comment)

bugeats · 2022-09-09T18:11:41Z

Just my two cents: it looks like concern over achieving the perfect encoding of mime types is what has stalled this (long overdue) work. I would suggest being much less precious about it, and just treating mime types like a legacy format.

The existing codec name plaintextv2 (0x706c61) is a conceptually different thing from the mime type text/plain. They signal different intents.

Content that has been encoded with the intent of running on a HTTP server (a legacy protocol in the context of multiformats) can and should use multicodec encoded mime type mappings because that is the nature of the content.

If in the future a more idiomatic multicodec schema is designed for stuff like images, then those codecs can be added in addition to the existing legacy mime-type mappings.

There's plenty of byte real-estate. No need to be precious.

bochaco mentioned this issue Sep 10, 2018

feat: adding MIME types as codecs #84

Closed

Stebalien linked a pull request Feb 1, 2020 that will close this issue

feat: assign codes for MIME types #159

Open

2 tasks

Stebalien mentioned this issue Apr 14, 2020

Dealing with multimedia #168

Closed

aschmahmann mentioned this issue Nov 1, 2021

multiformat code for CARs #239

Closed

Winterhuman mentioned this issue Sep 27, 2022

IPIP-305: CIDv2 - Tagged Pointers ipfs/specs#305

Closed

sshmatrix mentioned this issue Nov 6, 2023

[Proposal] ENSIP-17: DataURI Format in Contenthash ensdomains/docs#165

Closed
