Unlimited backtrace is impractical #9

Open
bertsky opened this issue Oct 9, 2018 · 1 comment

Comments

bertsky commented Oct 9, 2018

When globally aligning sequences that differ substantially, combinatorial explosion can quickly lead to excessive memory consumption at runtime in the current implementation. Such cases are not always easy to detect with score heuristics in a prior backtrace=False pass.
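To illustrate how quickly the number of equally good alignments can grow, here is a minimal sketch that counts co-optimal global alignments without enumerating them. It is not taken from the library: the function name `count_optimal_alignments` and the scores (match 1, mismatch -1, gap -1) are assumptions made for illustration only.

```python
from typing import Tuple


def count_optimal_alignments(a: str, b: str, match: int = 1,
                             mismatch: int = -1, gap: int = -1) -> Tuple[int, int]:
    """Return (best global score, number of alignments reaching that score).

    Hypothetical helper with assumed scoring; plain Needleman-Wunsch
    with an extra table that counts optimal paths into each cell.
    """
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    count = [[0] * cols for _ in range(rows)]
    count[0][0] = 1
    # Borders: only one way to align a prefix against nothing (all gaps).
    for i in range(1, rows):
        score[i][0] = i * gap
        count[i][0] = 1
    for j in range(1, cols):
        score[0][j] = j * gap
        count[0][j] = 1
    for i in range(1, rows):
        for j in range(1, cols):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            candidates = (
                (score[i - 1][j - 1] + sub, count[i - 1][j - 1]),  # (mis)match
                (score[i - 1][j] + gap, count[i - 1][j]),          # gap in b
                (score[i][j - 1] + gap, count[i][j - 1]),          # gap in a
            )
            best = max(s for s, _ in candidates)
            score[i][j] = best
            # Every predecessor that attains the best score contributes its paths.
            count[i][j] = sum(c for s, c in candidates if s == best)
    return score[-1][-1], count[-1][-1]
```

Under these assumed scores, ten identical characters aligned against five of the same character already give 252 co-optimal alignments, and the count grows binomially with length. Because the count costs only one extra table in the scoring pass, it might serve as a cheap check before any backtrace is attempted.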

I believe these should be added:

  1. a package-exposed variable with a default limit (perhaps relative to the sequences' length)
  2. an optional parameter with an override limit to be able to control the quality-performance trade-off.

(The limit could be based on stack depth or number of alternatives, for example.)
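A rough sketch of how the two proposed knobs could fit together. None of these names exist in the package: `DEFAULT_BACKTRACE_LIMIT_FACTOR` and `resolve_backtrace_limit` are hypothetical, and the factor of 100 is simply borrowed from the workaround posted later in this thread.

```python
from typing import Optional

# Hypothetical package-level default: allowed alignments per unit of
# sequence length (assumed value, not part of the current package).
DEFAULT_BACKTRACE_LIMIT_FACTOR = 100


def resolve_backtrace_limit(len_first: int, len_second: int,
                            limit: Optional[int] = None) -> int:
    """Return the maximum number of alignments a backtrace may collect.

    An explicit `limit` overrides the default; otherwise the default
    scales with the longer of the two sequences.
    """
    if limit is not None:
        if limit < 1:
            raise ValueError("limit must be at least 1")
        return limit
    # max(..., 1) keeps the limit positive even for two empty sequences.
    return DEFAULT_BACKTRACE_LIMIT_FACTOR * max(len_first, len_second, 1)
```

With this shape, callers who do nothing get a length-relative cap, while callers who care about quality versus performance pass their own `limit`.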

Example: I am trying to align OCR output for images of German Fraktur script with the corresponding ground truth text. Sometimes the OCR fails miserably, like so:
Mitreden andrer 274. Günſtiger Eindruck der Staatsrathsſitzungen 274. (original line)
*0obe-ondrer '? '-änſiger Eindrue der Torerotheflgg,, (OCR result)
In this case, StrictGlobalSequenceAligner grows to more than 20 GB RSS (at which point I killed the process).

bertsky commented Oct 10, 2018

A workaround is to cap the current number of alignments by adding a second (non-terminal) case in backtraceFrom(), for example:

        elif len(alignments) > 100*max(f.shape):
            return # enough is enough

But surely there must be a cleaner way…
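One possibly cleaner direction, sketched here independently of the library's code, is to make the backtrace lazy and let the caller decide how many alignments to pull. Everything below is an assumption for illustration: the scores, the names `score_matrix`, `iter_alignments` and `align_capped`, and the use of plain strings with `-` as the gap symbol.

```python
from itertools import islice
from typing import Iterator, List, Tuple

MATCH, MISMATCH, GAP = 1, -1, -1  # assumed scores, not the library's defaults


def _sub(x: str, y: str) -> int:
    return MATCH if x == y else MISMATCH


def score_matrix(a: str, b: str) -> List[List[int]]:
    """Fill a standard Needleman-Wunsch score matrix."""
    f = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(1, len(a) + 1):
        f[i][0] = i * GAP
    for j in range(1, len(b) + 1):
        f[0][j] = j * GAP
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            f[i][j] = max(f[i - 1][j - 1] + _sub(a[i - 1], b[j - 1]),
                          f[i - 1][j] + GAP,
                          f[i][j - 1] + GAP)
    return f


def iter_alignments(a: str, b: str,
                    f: List[List[int]]) -> Iterator[Tuple[str, str]]:
    """Lazily yield co-optimal alignments using an explicit depth-first stack."""
    stack = [(len(a), len(b), "", "")]
    while stack:
        i, j, top, bottom = stack.pop()
        if i == 0 and j == 0:
            yield top, bottom
            continue
        # Push every predecessor that explains the score of cell (i, j).
        if i > 0 and j > 0 and f[i][j] == f[i - 1][j - 1] + _sub(a[i - 1], b[j - 1]):
            stack.append((i - 1, j - 1, a[i - 1] + top, b[j - 1] + bottom))
        if i > 0 and f[i][j] == f[i - 1][j] + GAP:
            stack.append((i - 1, j, a[i - 1] + top, "-" + bottom))
        if j > 0 and f[i][j] == f[i][j - 1] + GAP:
            stack.append((i, j - 1, "-" + top, b[j - 1] + bottom))


def align_capped(a: str, b: str, limit: int) -> List[Tuple[str, str]]:
    """Return at most `limit` co-optimal alignments."""
    return list(islice(iter_alignments(a, b, score_matrix(a, b)), limit))
```

Since the generator only produces alignments on demand and the depth-first stack stays proportional to the combined sequence length, memory is bounded by the cap rather than by the total number of alternatives. Whether this maps cleanly onto backtraceFrom() is something the maintainers would have to judge.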
