Match produces 3 MB of basis information #111

Closed
nkrabben opened this issue Dec 27, 2017 · 4 comments

@nkrabben

While running siegfried across a collection, I found it slowing down badly on a group of MP3 files: https://www.nationalarchives.gov.uk/pronom/fmt/134

The basis field for these matches can be up to 3,000,000 characters long, with repeated runs of similar data such as [795 105] * 9,650 and [807 105] * 13,068. I'm not sure what's causing this. If it's useful, I can probably provide copies of the files causing this bug in the new year.
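To illustrate why the basis string gets so large, here is a minimal sketch that run-length encodes consecutive identical entries, so 9,650 copies of `[795 105]` print as a single token. It assumes each bracketed pair is an `[offset length]` hit; the `hit` type and `compressBasis` function are hypothetical and are not siegfried's code or how v1.7.9 addressed the verbose basis. It only collapses *consecutive identical* pairs, not merely similar ones.

```go
package main

import (
	"fmt"
	"strings"
)

// hit is a hypothetical [offset length] pair as it might appear in a basis string.
type hit struct {
	Offset, Length int
}

// compressBasis run-length encodes consecutive identical hits, so that
// 9,650 copies of [795 105] become "[795 105]x9650" instead of 9,650 entries.
func compressBasis(hits []hit) string {
	var parts []string
	for i := 0; i < len(hits); {
		// Advance j past every hit equal to hits[i].
		j := i
		for j < len(hits) && hits[j] == hits[i] {
			j++
		}
		n := j - i
		if n == 1 {
			parts = append(parts, fmt.Sprintf("[%d %d]", hits[i].Offset, hits[i].Length))
		} else {
			parts = append(parts, fmt.Sprintf("[%d %d]x%d", hits[i].Offset, hits[i].Length, n))
		}
		i = j
	}
	return strings.Join(parts, " ")
}

func main() {
	// Rebuild the two runs quoted in the report.
	var hits []hit
	for k := 0; k < 9650; k++ {
		hits = append(hits, hit{795, 105})
	}
	for k := 0; k < 13068; k++ {
		hits = append(hits, hit{807, 105})
	}
	fmt.Println(compressBasis(hits))
}
```

Summarizing output this way keeps the report readable, but it does nothing about the cost of *finding* all those hits in the first place.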

@richardlehane
Owner

Thanks Nick, this is something that has cropped up before (#94). My previous fixes have been stop-gaps, but I've been tinkering with a more fundamental fix, and it is good to have this as a prompt to work on it.

Basically, this issue occurs for "noisy" signatures with multiple segments that generate lots of partial matches; fmt/134 is probably the worst offender. I'm currently too exhaustive in following up these matches. I'd like to make this bit of the code "lazier", but it has been hard to do so without breaking everything.
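The difference between exhaustive and lazy follow-up can be sketched with a toy two-segment signature: segment A must be followed by segment B within `maxGap` bytes. This is a simplified illustration under stated assumptions, not siegfried's matcher; the function names, the gap rule, and the synthetic "every 4th byte" hit pattern are all hypothetical. The exhaustive version records every valid (A, B) pairing, so its output grows with the product of the hit counts; the lazy version returns as soon as one pairing proves the signature matches.

```go
package main

import "fmt"

// Hypothetical two-segment signature: segment A (aLen bytes) must be followed
// by segment B within maxGap bytes. aHits and bHits are offsets where each
// segment matched on its own (the "partial matches").

// exhaustiveMatches follows up every combination of A and B hits and
// returns all valid pairings; work and output grow with len(aHits)*len(bHits).
func exhaustiveMatches(aHits, bHits []int, aLen, maxGap int) [][2]int {
	var out [][2]int
	for _, a := range aHits {
		for _, b := range bHits {
			if gap := b - (a + aLen); gap >= 0 && gap <= maxGap {
				out = append(out, [2]int{a, b})
			}
		}
	}
	return out
}

// lazyMatch stops at the first valid pairing, which is enough to report
// that the signature matched.
func lazyMatch(aHits, bHits []int, aLen, maxGap int) ([2]int, bool) {
	for _, a := range aHits {
		for _, b := range bHits {
			if gap := b - (a + aLen); gap >= 0 && gap <= maxGap {
				return [2]int{a, b}, true
			}
		}
	}
	return [2]int{}, false
}

func main() {
	// A "noisy" pair of segments that each match every 4th byte.
	var aHits, bHits []int
	for off := 0; off < 4000; off += 4 {
		aHits = append(aHits, off)
		bHits = append(bHits, off+2)
	}
	all := exhaustiveMatches(aHits, bHits, 2, 1000)
	first, ok := lazyMatch(aHits, bHits, 2, 1000)
	fmt.Println(len(all), first, ok)
}
```

With 1,000 hits per segment, the exhaustive version collects 219,625 pairings while the lazy one stops after the first. Note that the lazy loop only saves work when a match exists; when nothing matches, it still scans every combination, which is part of why making this "lazier" is harder than it looks.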

@richardlehane
Owner

Hi @nkrabben, I'm working on this issue now. If you have a sample you can share (either here or via email to keep it private), that would be a great help.

@richardlehane
Owner

This is partly fixed (the verbose basis part) in v1.7.9, but I have more work to do on the speed side of this issue, so I'm reopening it.

@richardlehane richardlehane reopened this Aug 30, 2018
@richardlehane
Owner

The second part of this bug (the slowdown for MP3s with lots of matches) is now fixed on the develop branch and will be in the next release; see #128.


2 participants