-
Notifications
You must be signed in to change notification settings - Fork 159
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Diffs very slow #358
Comments
Is this always the case, or just for specific notebooks? The diffing algorithm in use has not been optimized a lot. If this is for specific notebooks, it would be very helpful if you could supply an example set of such notebooks. We could add those to the set of test notebooks we use, and include them in an optimization effort! |
PS: If the notebooks in question are not publicly sharable, you could possibly send them to me privately, and I would try to profile the differ. Then, I could engineer a new set of notebooks that trigger the same snags, but which does not include the original content. |
Actually let's close this for now, since I can't recreate it on the notebooks I'm working on today. I'll re-open w/ example notebooks if it happens again. |
Whilst this is a closed issue, I figured adding some additional context might be useful for future discussions. I'm working on private (work) Notebooks that diff and merge very slowly. I'm using nbdime 1.0.0 (thanks - this is a great tool!). My primary Notebook has circa 200 cells, lots of embedded I do diffs using This Notebook is, pragmatically speaking, far too large and ought to be refactored. It is a part of a research pipeline and is still evolving. Right now it won't be refactored and two of us occasionally work on it, hence the use of Hopefully the above could provide enough ideas for someone to create a dummy long Notebook to test speed issues? |
Is this with the The current diffing algorithm is known to not scale very well with increasing number of items to compare. There is an issue for re-implementing this part using the Meyers algorithm here: #3 This should allow the diffing performance to scale much better. That being said, it would be nice to profile such huge notebooks to verify where it's spending the time. |
Yes using In addition - using Maybe a useful "beginner bug" that could be filed would be for someone to make some "realistic and large" Notebooks using different tools (e.g. pure Python in 1, some Pandas DataFrames in another, some Matplotlib output in another) to see which ones are diff'd poorly. That might help getting others to prioritise how the diffing is done? |
It took a really long time for this: https://github.com/cloudxlab/ml/blob/master/deep_learning/tensorflow_keras_regression.ipynb I had made some changes to this notebook and had enabled nbdime in git using 'nbdime config-git --enable'. I was trying to do 'git diff tensorflow_keras_regression.ipynb'. It took more than 10mins and then I gave up. After disabling nbdime, "git diff" worked pretty well. Hope that helps. Also, when I was killing git diff with CTRL+C, I noticed that it is breaking at some method of nbdime which is computing lowest common subsequence. So, the chances are that implementation is slow. So, probably moving the logic from python to cython might improve. Also, I would suggest that instead of implementing entire diff logic again, use existing libraries (probably that might hurt compatibility.) |
@girisandeep Thanks for the report! This seems like a bug rather than an inherent problem with the algorithm (it is not optimized, but >10 min is still completely out of bounds). Which versions (git tags) were you comparing? In order to reproduce locally, I would need both versions of the notebook. Also for reference: Which version of nbdime/python where you using? |
@girisandeep Actually, I was able to reproduce based on the most recent commit on that file. I'll open a separate issue for it once I've profiled it a little closer. Thanks again! |
I know this is closed, but this is happening to me, so just posting my environment for others in case they encounter this issue too. |
While the diffs are parseable, and can be used to create a Merge Review, diffing these 2 small files took a good minute. It seems to be a known issue: jupyter/nbdime#358
When I use nbdiff it takes several minutes for the diff to appear. This happens when I do it from the terminal, use the web diff command, or use the jupyter extension.
The text was updated successfully, but these errors were encountered: