Be Less Confident About "Guesses" #54

ghalfacree · 2024-02-16T16:14:03Z

I'm aware that, in its present incarnation, Magika is trained on a very small subset of all possible file types and that, as a result, incorrect responses are to be expected when it is fed types on which it has not been trained. At present, though, it's throwing out some very bizarre guesses.

As a worst-case experiment, Magika was run across the root directory of a CU Amiga cover CD from 1995 - full of files definitely not in the training dataset, and most likely not in the test corpus either.

The results include several "unknown" responses, which are to be expected, but also several completely incorrect guesses: Amiga INFO Icon files are misidentified as BMP images, TIFF images, and ISO 9660 ROM images(!), even though they are all the same format.

As feeding a tool that's expecting an ISO 9660 image an Amiga icon is likely to end poorly, I'd suggest the tool needs to be less confident when encountering something outwith its training data.
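
To illustrate what "less confident" could mean in practice, here is a minimal sketch of confidence gating. It does not use Magika's actual API or reflect how Magika makes its decision: `pick_label`, the `threshold` and `margin` values, and the example scores are all hypothetical, chosen only to show the idea of falling back to "unknown" when the top guess is weak or barely ahead of the runner-up.

```python
from typing import Mapping

UNKNOWN = "unknown"


def pick_label(
    scores: Mapping[str, float],
    threshold: float = 0.9,  # hypothetical minimum score for the top guess
    margin: float = 0.2,     # hypothetical minimum lead over the runner-up
) -> str:
    """Return the top-scoring label, or "unknown" if the guess is not convincing."""
    if not scores:
        return UNKNOWN

    # Rank candidate labels from most to least likely.
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    top_label, top_score = ranked[0]
    runner_up_score = ranked[1][1] if len(ranked) > 1 else 0.0

    # Reject guesses that are weak in absolute terms or too close to call.
    if top_score < threshold or (top_score - runner_up_score) < margin:
        return UNKNOWN
    return top_label


# Made-up scores for an out-of-distribution file: no label clearly wins.
print(pick_label({"iso": 0.55, "bmp": 0.30, "tiff": 0.15}))
# Made-up scores for a file the model recognises well.
print(pick_label({"png": 0.98, "bmp": 0.02}))
```

With gating like this, a downstream tool that expects an ISO 9660 image would receive "unknown" for the first example instead of a low-confidence "iso".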
