Move magic patterns to JSON file #39
Comments
Why?
I guess this would just be reinventing libmagic, but it would be helpful to have one large JSON file that can be used/interpreted.
OK, sure, if you do a PR. Make sure to preserve the order, as a few of the checks are order-dependent.
I like it! Note that the mime-type module has its definitions in a single JSON file, and the db is maintained in its own repo (https://www.npmjs.com/package/mime-db). How would you use multiple "patterns"? Would you OR them? See lines 128 to 138 in 14a48a3.
How would you define something like avi, where you have to check buf 0-3 AND buf 8-10? See line 176 in 14a48a3.
The way signatures are currently tested has an inherent logic to it (no pun intended), so anything that moves the complexity away from the 'schema' of how the tests are structured seems like goldplating the architecture. However, @thisconnect @matthewbauer, something that keeps to the simplicity of actual JS would be possible. Maybe a reference JSON and a method to compare the stored signatures would be ideal, something like (ES5-ish pseudocode 😅):
At a glance, most of the complexity would be stored in the JSON and in how it's formatted. Move past that and it's almost like defining a new configuration language, i.e. building an interpreter. Thoughts?
Yes, that is exactly the point. reference.json should contain the patterns and the logic; otherwise there is no point in doing this at all, IMO. A signature should contain multiple patterns that are OR'ed, and each of these has multiple rules that are AND'ed. The script should iterate through the signatures (making sure to preserve the order) and test whether every rule is true for at least one of the patterns.
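The OR-of-patterns, AND-of-rules scheme described above could be sketched roughly as follows. This is only an illustration under assumptions: the field names (`ext`, `patterns`, `offset`, `bytes`) and the `detect` function are hypothetical and are not part of file-type or of any proposed reference.json format.

```javascript
// Hypothetical schema: each signature has a list of patterns (OR'ed),
// and each pattern is a list of rules (AND'ed). Array order is match order.
const signatures = [
  { ext: 'png', patterns: [[{ offset: 0, bytes: [0x89, 0x50, 0x4e, 0x47] }]] },
  {
    ext: 'avi',
    // One pattern whose two rules must both hold: bytes 0-3 AND bytes 8-10.
    patterns: [[
      { offset: 0, bytes: [0x52, 0x49, 0x46, 0x46] }, // "RIFF"
      { offset: 8, bytes: [0x41, 0x56, 0x49] },       // "AVI"
    ]],
  },
  {
    ext: 'tif',
    // Two alternative patterns: little-endian or big-endian header.
    patterns: [
      [{ offset: 0, bytes: [0x49, 0x49, 0x2a, 0x00] }],
      [{ offset: 0, bytes: [0x4d, 0x4d, 0x00, 0x2a] }],
    ],
  },
];

// A rule holds when every expected byte is found at offset + i.
// Reading past the end of buf yields undefined, which never matches.
function ruleMatches(buf, rule) {
  return rule.bytes.every((byte, i) => buf[rule.offset + i] === byte);
}

// Walk the signatures in order and return the first one for which
// some pattern has all of its rules satisfied.
function detect(buf, sigs) {
  for (const sig of sigs) {
    const hit = sig.patterns.some((rules) =>
      rules.every((rule) => ruleMatches(buf, rule)));
    if (hit) return sig.ext;
  }
  return null;
}
```

Because `detect` returns on the first hit, the order of the array doubles as the check order, which is how a data file could preserve the order-dependent checks mentioned earlier in the thread.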
I also wonder whether it could be done in more of a hash-map kind of arrangement, rather than an array of possible matches which needs to be looped over. Also, it's worth noting that the current method of having static code is likely to be quite a bit faster than running against an array, due to code optimizations. You might want to generate code from the array ahead of time to mitigate this.
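Generating static code from the array ahead of time might look something like the sketch below. It assumes the same hypothetical signature shape (OR'ed patterns of AND'ed rules); `table`, `generateSource`, and `detectGenerated` are illustrative names, and no performance claim is made for the result.

```javascript
// Hypothetical, minimal signature table used only for this illustration.
const table = [
  { ext: 'png', patterns: [[{ offset: 0, bytes: [0x89, 0x50, 0x4e, 0x47] }]] },
  {
    ext: 'tif',
    patterns: [
      [{ offset: 0, bytes: [0x49, 0x49, 0x2a, 0x00] }],
      [{ offset: 0, bytes: [0x4d, 0x4d, 0x00, 0x2a] }],
    ],
  },
];

// Turn the table into the body of a function made of plain `if` statements,
// one per signature, emitted in table order.
function generateSource(sigs) {
  const lines = sigs.map((sig) => {
    const condition = sig.patterns
      // Each pattern becomes a chain of byte comparisons joined by &&.
      .map((rules) => rules
        .flatMap((rule) =>
          rule.bytes.map((b, i) => `buf[${rule.offset + i}] === ${b}`))
        .join(' && '))
      // Alternative patterns are parenthesised and joined by ||.
      .map((clause) => `(${clause})`)
      .join(' || ');
    return `  if (${condition}) return ${JSON.stringify(sig.ext)};`;
  });
  return `${lines.join('\n')}\n  return null;`;
}

// For the demo the source is compiled in-process; a build step could
// instead write the generated text to a file that is checked in.
const detectGenerated = new Function('buf', generateSource(table));
```

The generated body is the same kind of hand-written `if` chain the project already uses, so the JSON would act as the source of truth while the shipped code stays static.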
Totally agreed. It's easier to argue that "breaking out the file data into a JSON file" is better as a separate project.
I agree 100% with what @alexanderlperez said. I don't see any reason to separate the checks. The only reason would be to brag about how everything is nicely separated and so clean (aka "goldplating"). But separation creates complexity. Or in other words: I came to this repo, opened the index.js and immediately understood what's going on. Its simplicity is beautiful. It's self-documenting. It's trivial to port to other languages. Please don't change it just for the sake of change. drops mic
I could see this being more of a tree structure, starting at byte 0 and matching on that (and maybe using a hex string and/or Wouldn't be very human readable, but would be relatively fast.
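A tree keyed on successive bytes from byte 0 could be sketched as a simple trie. This is an assumption-laden illustration: `buildTrie`, `lookup`, and the entry shape are hypothetical, and the sketch only handles signatures that are plain prefixes, so it cannot express a check like avi's second range at bytes 8-10.

```javascript
// Hypothetical prefix-only signatures: bytes are matched from offset 0.
const entries = [
  { ext: 'jpg', bytes: [0xff, 0xd8, 0xff] },
  { ext: 'png', bytes: [0x89, 0x50, 0x4e, 0x47] },
  { ext: 'gif', bytes: [0x47, 0x49, 0x46] },
];

// Build a trie in which each edge is one byte value.
function buildTrie(list) {
  const root = { children: new Map(), ext: null };
  for (const { ext, bytes } of list) {
    let node = root;
    for (const byte of bytes) {
      if (!node.children.has(byte)) {
        node.children.set(byte, { children: new Map(), ext: null });
      }
      node = node.children.get(byte);
    }
    // If two entries share the same byte sequence, the first one wins.
    if (node.ext === null) node.ext = ext;
  }
  return root;
}

// Follow the buffer byte by byte and remember the deepest node that
// carries an extension, i.e. the longest matching prefix.
function lookup(trie, buf) {
  let node = trie;
  let best = null;
  for (let i = 0; i < buf.length; i++) {
    node = node.children.get(buf[i]);
    if (!node) break;
    if (node.ext !== null) best = node.ext;
  }
  return best;
}

const trie = buildTrie(entries);
```

A lookup costs at most one map access per signature byte regardless of how many signatures exist, at the price of the readability concern raised above.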
Hello, guys. If anyone is still interested: https://github.com/dimapaloskin/detect-file-type
The problem with this simple declarative mechanism is that not all common files can be detected this way. Some have more dynamic detection mechanisms. The best thing would be to implement support for libmagic's magic files, so we can reuse all their declaration files. See #68
Closing as this would not be feasible for many file formats.
I'm thinking something like this:
Would that be worthwhile? Just trying to get some input.