New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Use MIME types instead of magic strings for RPMTAG_FILECLASS #1099
Conversation
libmagic strings are notoriously unreliable as the details from version to version. We link to libelf anyway so we might as well as get the info straight from the horse's mouth. Besides being more reliable, this detaches the coloring business from the hardcoded rpmfcTokens struct and informative-only FILECLASS contents, opening the door for other changes in that area.
In addition to "magic" strings, support classifying files by matches on their MIME types which are far more predictable than the notoriously volatile magic strings which are intended for human consumption. This adds an optional %__foo_mime and %__foo_exclude_mime patterns to file attributes. If the mime-variant is present, magic is ignored if present (with a warning). The testcase is a fine example of how the grass not necessarily being any greener on the other side: the actual output is far more predictable, but the actual classification is not. Our "script" with an arbitrary unknown interpreter is considered text/plain by libmagic. Fixes: rpm-software-management#1097
This is a stupid and simple test, but then with file strings that's just the kind of thing you want as we don't want testsuite failures from file string changes.
File magic strings are unreliable and largely unusable for anything but human consumption, MIME types are far more meaningful for classifying file types. Populate RPMTAG_FILECLASS (or rather, CLASSDICT) with MIME type instead, and add types for all files and not just our strange hardcoded list. Remove now redundant cruft. Fixes: rpm-software-management#1096
I'm not sure it's a good idea to replace file magic entirely with MIME, as MIME tends to get identifiers a lot less frequently than file magic identifiers. And there's a large world of dependency generators that rely on file magic filters, and that's something I'm not sure I want to break unless we're doing a new major version. |
This doesn't affect dependency generators at all. This only replaces the gunk that goes into RPMTAG_CLASSDICT in packages. |
@pmatilai Hmm, I guess I misread this. It seemed like you wanted to transition everything away from file magic to MIME. If this is just scoped to the internal classification of content in RPM, I don't have as much of an objection. It would wind up being a file format change, though, since we're changing what kind of values to expect there. |
This isn't "internal" classification either, because its very public in the headers. Hence the request for discussion. The thing is, the existing RPMTAG_CLASSDICT content is intolerable goo. So in my view, we have two options:
The problem with 1) is that we need to carry a lot of baggage for a little-used feature, the problem with 2) is that there are users (rpmlint at least) and if we just hijack it, you don't really know what data to expect there. |
While looking at this, I rediscovered (oh the things you write and forget) makeClass() in lib/tagexts.c. |
"Plan 1" now implemented for comparison in #1114. |
Okay I suppose its become clear this is not the way to go, closing. |
File magic strings are unreliable and largely unusable for anything but human consumption, MIME types are far more meaningful for classifying file types. Populate RPMTAG_FILECLASS (or rather, CLASSDICT) with MIME type instead, and add types for all files and not just our strange hardcoded list. Remove now redundant cruft.
Fixes: #1096
Note that this is only an RFC as a basis of discussion. I know we want to include MIME data, but I'm not sure we can just replace the cruft that is currently going into RPMTAG_CLASSDICT. OTOH I'd really rather not duplicate all that to another tag set just to preserve cruft that almost nobody uses (AFAIK), such a patch would have very different line-count.
Oh and this on top of #1098 and #1095 which are pre-requisite for doing this.
Thoughts?