Takes hours to finish a 300kb diff file #67

Closed
rtfpessoa opened this issue Apr 13, 2016 · 7 comments

@rtfpessoa
Owner

MOVED FROM diff2html-cli#17

Hi,
I really love the tool and am currently running it under Windows.
However, my git diff file is around 300KB, and the tool takes 3 hours to finish without producing any output file (I am using the -F option). Memory usage is around 800MB.
Just wondering if you have encountered the same issue before?

I tried diffy.org and had no issues at all.
https://diffy.org/diff/4wng00ndqz7iudi

Thanks.
Travis
diffReport.txt

@rtfpessoa
Owner Author

@escitalopram I did some debugging and the Rematch.distance(amod, bmod) algorithm is taking too long; it may even be stuck in an infinite loop.
Do you have any idea what is going on?

@escitalopram
Contributor

I'll have a look

@escitalopram
Contributor

The problem seems to be triggered by large blocks of changes, like OASIS.csproj having 2.2k lines added and removed in one block. The algorithm is O(nm) time with n lines added and m lines removed in a single block, so this block starts almost 5 million Levenshtein distance calculations, which are in turn O(op) time with o and p being the lengths of the two lines compared. I'd suggest we just disable the line matching on blocks larger than, say, n*m=2500 (and maybe make that limit configurable).
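To make the numbers concrete, here is a minimal Python sketch of the proposed guard. It is not the diff2html code: `levenshtein`, `best_match`, and `MAX_COMPARISONS` are hypothetical names, and the exhaustive best-pair search is a simplified stand-in for the real line matching. It only illustrates why n*m pairwise distance calls blow up and how a size check sidesteps them.

```python
# Hypothetical limit taken from the n*m=2500 suggestion above.
MAX_COMPARISONS = 2500


def levenshtein(a: str, b: str) -> int:
    """Edit distance by dynamic programming, O(len(a) * len(b)) time."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def best_match(removed, added, max_comparisons=MAX_COMPARISONS):
    """Return (removed_index, added_index, distance) for the closest pair.

    Returns None when the block is too large to compare (or when either
    side is empty), so the caller can skip line matching for that block.
    """
    if len(removed) * len(added) > max_comparisons:
        return None  # guard: avoid n*m distance calculations
    best = None
    for i, old in enumerate(removed):
        for j, new in enumerate(added):
            d = levenshtein(old, new)
            if best is None or d < best[2]:
                best = (i, j, d)
    return best


# A block with 2,200 removed and 2,200 added lines needs this many comparisons:
comparisons = 2200 * 2200  # 4,840,000
```

With the guard in place, a 2,200 by 2,200 block returns immediately instead of running 4,840,000 distance calculations, while small blocks are still matched as before.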

The memory hunger will probably go away with that, too, because the distance function results are cached and skipping large blocks means far fewer cache entries. If that isn't enough, maybe I could also introduce some hash function for the cache keys.
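As a rough illustration of the hashed-cache-key idea, the Python sketch below stores each result under a fixed-size digest of the two lines instead of the lines themselves. Everything here is an assumption for illustration: the `DistanceCache` class, the choice of SHA-1, and the length prefix are not taken from diff2html, and a digest-keyed cache accepts a very small collision risk in exchange for smaller keys.

```python
import hashlib


class DistanceCache:
    """Memoize a distance function, keyed by a digest of both inputs."""

    def __init__(self, distance):
        self._distance = distance  # any callable taking two strings
        self._results = {}
        self.misses = 0            # number of real distance computations

    @staticmethod
    def _key(a: str, b: str) -> bytes:
        a_bytes = a.encode("utf-8")
        digest = hashlib.sha1()
        # Length prefix keeps ("ab", "c") and ("a", "bc") distinct.
        digest.update(len(a_bytes).to_bytes(8, "big"))
        digest.update(a_bytes)
        digest.update(b.encode("utf-8"))
        return digest.digest()     # 20 bytes regardless of line length

    def __call__(self, a: str, b: str):
        key = self._key(a, b)
        if key not in self._results:
            self.misses += 1
            self._results[key] = self._distance(a, b)
        return self._results[key]


# Usage with a placeholder distance (difference in length only).
cached = DistanceCache(lambda a, b: abs(len(a) - len(b)))
first = cached("abc", "a")
second = cached("abc", "a")  # served from the cache, no recomputation
```

Each cache entry then costs a 20-byte key rather than two full line strings, which matters most when lines are long.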

@rtfpessoa
Owner Author

I think that is a great idea. Can you make a PR?

@escitalopram
Contributor

Which branch should I base it on?

@rtfpessoa
Owner Author

master

@rtfpessoa
Owner Author

rtfpessoa commented Apr 14, 2016

Fixed by #68 in release 2.0.0-beta10
