RFE: Add MIME classification of all files to packages #1096

Open
pmatilai opened this issue Mar 4, 2020 · 2 comments
Labels
RFE v6 Related to rpm v6 (readiness)
Milestone

Comments

@pmatilai
Member

pmatilai commented Mar 4, 2020

Since v4.2 or thereabouts, rpm has added a libmagic file classification string to headers for a bunch of hardcoded file types. This data has the potential to be useful for all sorts of purposes, but libmagic strings being such volatile beasts means using them is difficult at best, and there's no way to translate those strings to MIME, which is what most users would expect and prefer. Also, the data is not stored for all files, which reduces its usability greatly.

We should add a MIME type for all files into the headers. Unlike the magic strings, this is standard data and also compresses well using a simple dictionary approach.
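
To illustrate the dictionary approach, here is a minimal Python sketch, not rpm's actual code. It stores each distinct MIME type once and keeps one small integer index per file, loosely in the spirit of a dictionary tag plus a per-file index tag. The function names and the sample MIME types are assumptions made up for the example.

```python
def dict_encode(values):
    """Return (dictionary, indexes): unique values in first-seen order,
    plus one dictionary index per input value."""
    dictionary = []
    position = {}
    indexes = []
    for value in values:
        if value not in position:
            position[value] = len(dictionary)
            dictionary.append(value)
        indexes.append(position[value])
    return dictionary, indexes


def dict_decode(dictionary, indexes):
    """Rebuild the original per-file list from the dictionary and indexes."""
    return [dictionary[i] for i in indexes]


# Hypothetical per-file MIME types for a small package.
file_mimes = [
    "text/plain",
    "application/x-sharedlib",
    "application/x-sharedlib",
    "text/plain",
    "image/png",
    "text/plain",
]

mime_dict, mime_index = dict_encode(file_mimes)
```

With six files and three distinct types, only three strings need to be stored; the rest is a list of small integers.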

The biggest open question to me is whether we can reuse the FILECLASS tag set for this purpose or not. That data is increasingly bloated because of the increasing uniqueness of libmagic strings (buildid hashes, image sizes etc) that don't compress into a dictionary, and if there are no users that actually care about this data, or at least couldn't just as (or more) easily use MIME instead...
AFAIK few things look at FILECLASS, simply because it's so erratic. IIRC rpmlint (or something similar) does, but would be better served by MIME type.
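
A rough Python sketch of why unique details defeat a dictionary: once each string carries a build ID or an image size, nearly every file needs its own dictionary entry, whereas the MIME types collapse to a handful. The strings below are made-up samples in the style of libmagic output, not captured from a real package, and `dictionary_size` is a hypothetical helper.

```python
# Made-up samples in the style of libmagic output; real output
# differs between libmagic versions.
magic_strings = [
    "ELF 64-bit LSB shared object, x86-64, BuildID[sha1]=aaaa1111, stripped",
    "ELF 64-bit LSB shared object, x86-64, BuildID[sha1]=bbbb2222, stripped",
    "PNG image data, 16 x 16, 8-bit/color RGBA, non-interlaced",
    "PNG image data, 48 x 48, 8-bit/color RGBA, non-interlaced",
]

# Assumed MIME types for the same four files.
mime_types = [
    "application/x-sharedlib",
    "application/x-sharedlib",
    "image/png",
    "image/png",
]


def dictionary_size(values):
    """Number of distinct entries a dictionary encoding would have to store."""
    return len(set(values))


magic_entries = dictionary_size(magic_strings)
mime_entries = dictionary_size(mime_types)
```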

Thoughts?

@pmatilai
Member Author

pmatilai commented Mar 4, 2020

Oh and just FWIW, the more I look at the rpmfcTokens table, the more tempted I am to axe all that and replace it with nice and simple MIME data...

pmatilai added a commit to pmatilai/rpm that referenced this issue Mar 5, 2020
File magic strings are unreliable and largely unusable for anything
but human consumption, MIME types are far more meaningful for
classifying file types. Populate RPMTAG_FILECLASS (or rather, CLASSDICT)
with MIME type instead, and add types for all files and not just our
strange hardcoded list. Remove now redundant cruft.

Fixes: rpm-software-management#1096
@pmatilai pmatilai moved this from To do to In progress in Use MIME types in favor of "magic" strings Mar 5, 2020
pmatilai added a commit to pmatilai/rpm that referenced this issue Mar 11, 2020
Add new tags, rpmfiles APIs and other infra to support storing and
querying file MIME types. Store MIME type for all files, stop adding
rather arbitrarily filtered file "class" data as this is bloated and
relatively useless data, remove related cruft.

Fixes: rpm-software-management#1096
pmatilai added a commit to pmatilai/rpm that referenced this issue Mar 20, 2020
Add new tags, rpmfiles APIs and other infra to support storing and
querying file MIME types. Store MIME type for all files, stop adding
rather arbitrarily filtered file "class" data as this is bloated and
relatively useless data, remove related cruft.

Fixes: rpm-software-management#1096
@pmatilai pmatilai moved this from In progress to To do in Use MIME types in favor of "magic" strings Mar 7, 2023
@pmatilai pmatilai added the v6 Related to rpm v6 (readiness) label Mar 12, 2024
@pmatilai pmatilai added this to the 6.0.0 milestone Mar 12, 2024
@pmatilai
Member Author

Actually, v6 is the place where we can and should flick this particular switch. The libmagic strings in headers make no sense, but for v4 dropping them is a compat break.

Projects
Status: Todo
1 participant