Match produces 3 MB of basis information #111
I'm running siegfried across a collection, and it slows down badly on a group of MP3s (fmt/134): https://www.nationalarchives.gov.uk/pronom/fmt/134
The basis field for these matches can be up to 3,000,000 characters long, with repetitions of similar data such as [795 105] repeated 9,650 times and [807 105] repeated 13,068 times. I'm not sure what's causing this. If useful, I can probably provide a copy of the files causing this bug in the new year.
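For anyone wanting to check whether their own output shows the same pattern, here is a minimal sketch that counts how often each bracketed pair occurs in a basis string. It assumes only that the pairs look like the `[795 105]` examples quoted above; `summarise_basis` and the `sample` string are hypothetical and not part of siegfried.

```python
import re
from collections import Counter

# Matches bracketed integer pairs such as "[795 105]" (format assumed from the report).
PAIR = re.compile(r"\[(\d+) (\d+)\]")

def summarise_basis(basis: str) -> list[tuple[tuple[int, int], int]]:
    """Count occurrences of each bracketed pair, most frequent first."""
    counts = Counter((int(a), int(b)) for a, b in PAIR.findall(basis))
    return counts.most_common()

# Hypothetical, shortened basis string for illustration only.
sample = "byte match at " + "[795 105] " * 3 + "[807 105] " * 5

for (first, second), n in summarise_basis(sample):
    print(f"[{first} {second}] * {n}")
```

Feeding the real basis field through something like this should show quickly whether a handful of pairs account for most of the 3 MB.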
Thanks Nick. This is something that has cropped up before (#94). My previous fixes have been stop-gaps, but I've been tinkering with a more fundamental fix, and it is good to have this as a prompt to work on it.
Basically, this issue occurs for "noisy" signatures with multiple segments that generate lots of partial matches; fmt/134 is probably the worst offender. I'm currently too exhaustive in following up these matches. I'd like to make this bit of the code "lazier", but it has been hard to do so without breaking everything.
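To illustrate the difference between exhaustive and lazy follow-up, here is a toy sketch. It is not siegfried's matcher: it assumes a simplified model in which each segment of a signature has a list of candidate hit offsets, and a full match is any chain of one hit per segment in strictly increasing offset order. All names (`chains`, `exhaustive`, `lazy`, `hits`) are hypothetical.

```python
from itertools import product
from typing import Iterator, Optional

def chains(hits: list[list[int]]) -> Iterator[tuple[int, ...]]:
    """Lazily yield chains with one hit per segment, offsets strictly increasing."""
    for combo in product(*hits):
        if all(a < b for a, b in zip(combo, combo[1:])):
            yield combo

def exhaustive(hits: list[list[int]]) -> list[tuple[int, ...]]:
    """Follow up every partial match and record every valid chain."""
    return list(chains(hits))

def lazy(hits: list[list[int]]) -> Optional[tuple[int, ...]]:
    """Stop at the first valid chain; the remaining combinations are never built."""
    return next(chains(hits), None)

# Toy partial matches: three segments, each with a few candidate offsets.
hits = [[10, 20, 30], [15, 25, 35], [40, 50]]

print(len(exhaustive(hits)))  # number of chains the exhaustive approach records
print(lazy(hits))             # the single chain the lazy approach needs
```

In this model the number of recorded chains grows multiplicatively with the number of hits per segment, which would be consistent with a noisy signature producing a very large basis, while the lazy version reports one chain and stops. The hard part mentioned above, staying lazy without missing valid matches, is not addressed by this sketch.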