Be Less Confident About "Guesses" #54

Open
ghalfacree opened this issue Feb 16, 2024 · 0 comments


I'm aware that, in its present incarnation, Magika is trained on a very small subset of all possible file types, and that as a result incorrect responses are to be expected when it is fed types on which it has not been trained. At present, though, it's throwing out some very bizarre guesses.

As a worst-case experiment, Magika was run across the root directory of a CU Amiga cover CD from 1995 - full of files definitely not in the training dataset, and most likely not in the test corpus either.

[image: screenshot of Magika's output for the CD's root directory]

Here we see several "unknown" results, which are to be expected, but also several completely incorrect guesses: Amiga INFO icon files are misidentified as BMP images, TIFF images, and ISO 9660 ROM images(!), despite all of them being the same format.
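To illustrate the "same format" point, here is a minimal sketch of a header check. It assumes the icons follow the standard Workbench `DiskObject` layout, which begins with the big-endian magic word `0xE310` followed by version `1`. The function name is hypothetical, and the check has not been run against the files on this particular CD.

```python
import struct

# Magic and version words from the Workbench DiskObject header.
WB_DISKMAGIC = 0xE310
WB_DISKVERSION = 1


def looks_like_amiga_icon(header: bytes) -> bool:
    """Return True if the first four bytes match the DiskObject magic and version."""
    if len(header) < 4:
        return False
    # Amiga data is big-endian: two unsigned 16-bit words.
    magic, version = struct.unpack(">HH", header[:4])
    return magic == WB_DISKMAGIC and version == WB_DISKVERSION
```

Files that pass a check like this share one format, so a classifier that assigns them three different types is guessing rather than recognising them.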

Since feeding an Amiga icon to a tool that expects an ISO 9660 image is likely to end poorly, I'd suggest Magika needs to be less confident when it encounters something outwith its training data.
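One way to express "less confident" is a score threshold below which a guess is reported as unknown. The sketch below is not Magika's implementation or API: `Guess`, `gate`, the `0.95` default, and the example scores are all hypothetical, invented only to show the idea.

```python
from typing import NamedTuple

UNKNOWN_LABEL = "unknown"


class Guess(NamedTuple):
    """A classifier's best label and its confidence score in [0, 1]."""
    label: str
    score: float


def gate(guess: Guess, threshold: float = 0.95) -> str:
    """Return the guessed label only if its score clears the threshold."""
    if guess.score >= threshold:
        return guess.label
    return UNKNOWN_LABEL


if __name__ == "__main__":
    # Invented scores for illustration; not taken from the screenshot.
    guesses = [Guess("bmp", 0.41), Guess("iso", 0.55), Guess("tiff", 0.97)]
    print([gate(g) for g in guesses])
```

A threshold only helps when the wrong guesses come with low scores. A confidently wrong guess, like the hypothetical `tiff` entry, would still pass, so this is a mitigation rather than a fix.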
