Match produces 3 MB of basis information #111

Open
nkrabben opened this Issue Dec 27, 2017 · 3 comments


nkrabben commented Dec 27, 2017

While running siegfried across a collection, I found it slows down badly on a group of MP3 files (https://www.nationalarchives.gov.uk/pronom/fmt/134).

The basis field for these matches can be up to 3,000,000 characters long, with repetitions of similar data such as [795 105] repeated 9,650 times and [807 105] repeated 13,068 times. I'm not sure what's causing this. If it's useful, I can probably provide copies of the files that trigger this bug in the new year.

richardlehane (owner) commented Dec 27, 2017

Thanks Nick. This is something that's cropped up before (#94). My previous fixes have been stop-gaps, but I've been tinkering with a more fundamental fix, and it's good to have this as a prompt to work on it.

Basically, this issue occurs for "noisy" signatures with multiple segments that generate lots of partial matches; fmt/134 is probably the worst offender. I'm currently too exhaustive in following up these matches. I'd like to make this part of the code "lazier", but it has been hard to do so without breaking everything.
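To illustrate why exhaustive follow-up of partial matches blows up on noisy multi-segment signatures, here is a minimal Go sketch (siegfried is written in Go). It is a hypothetical model, not siegfried's actual matcher: the `Hit` type, the `exhaustive` and `lazy` functions, and the simplified rule that each segment must start at or after the end of the previous one are all assumptions made for illustration. Enumerating every chain of hits costs the product of the hit counts per segment, while stopping at the first valid chain costs only their sum.

```go
package main

import "fmt"

// Hit is a hypothetical record of where one signature segment matched:
// its byte offset and the length of the matched bytes.
type Hit struct {
	Offset, Length int
}

// exhaustive enumerates every ordered chain of hits (one hit per segment, each
// starting at or after the end of the previous one). With noisy segments the
// number of chains grows multiplicatively with the hits per segment.
func exhaustive(segments [][]Hit) [][]Hit {
	var results [][]Hit
	var walk func(i, minOffset int, chain []Hit)
	walk = func(i, minOffset int, chain []Hit) {
		if i == len(segments) {
			// copy the chain, since its backing array is reused by siblings
			results = append(results, append([]Hit(nil), chain...))
			return
		}
		for _, h := range segments[i] {
			if h.Offset >= minOffset {
				walk(i+1, h.Offset+h.Length, append(chain, h))
			}
		}
	}
	walk(0, 0, nil)
	return results
}

// lazy stops at the first valid chain. For each segment it takes the usable
// hit that ends earliest: a smaller end offset never rules out a later
// segment, so this finds a chain whenever one exists, in linear time.
func lazy(segments [][]Hit) ([]Hit, bool) {
	chain := make([]Hit, 0, len(segments))
	minOffset := 0
	for _, hits := range segments {
		best := -1
		for j, h := range hits {
			end := h.Offset + h.Length
			if h.Offset >= minOffset && (best < 0 || end < hits[best].Offset+hits[best].Length) {
				best = j
			}
		}
		if best < 0 {
			return nil, false // no hit for this segment can follow the previous one
		}
		chain = append(chain, hits[best])
		minOffset = hits[best].Offset + hits[best].Length
	}
	return chain, true
}

func main() {
	// Three noisy segments, each matching at three places (hypothetical data).
	segments := [][]Hit{
		{{0, 4}, {10, 4}, {20, 4}},
		{{30, 4}, {40, 4}, {50, 4}},
		{{60, 4}, {70, 4}, {80, 4}},
	}
	fmt.Println("exhaustive chains:", len(exhaustive(segments))) // prints "exhaustive chains: 27"
	chain, ok := lazy(segments)
	fmt.Println("lazy:", ok, chain) // prints "lazy: true [{0 4} {30 4} {60 4}]"
}
```

With three hits per segment the exhaustive walk already yields 27 chains; with thousands of partial matches per segment, recording every chain is the kind of behaviour that could produce a basis field of the size reported above.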

richardlehane (owner) commented Jan 10, 2018

Hi @nkrabben, I'm working on this issue now. If you have a sample you can share (either here, or via email to keep it private), that would be a great help.

richardlehane (owner) commented Aug 30, 2018

This is partly fixed (the verbose basis part) in v1.7.9, but I have more work to do on the speed side of this issue, so I'm re-opening it.

@richardlehane richardlehane reopened this Aug 30, 2018
