Memory issue with 0.4.0+ #77
Comments
If I am correct, 0.4.0 is the update that removed Apache Commons. The way @vishwakarma implemented the current algorithm uses an array for collecting the diffs and then produces the result. If I am not wrong, the LCS algorithm from Apache uses a kind of cursor called "Snake" that accumulates the result just by traversing the two lists without creating extra objects, just incrementing a counter. I will take a look at 0.3.10 and the current version using a profiler. Do you have some dummy data so I can replicate it? |
I've made an MWE in a gist. |
@ov7a Agreed, it can be because of the nature of the LCS algorithm (used internally in the case of arrays) and its time and space complexity. I will work on it, using your example as a benchmark, to improve it. |
@vishwakarma I have not had time to profile it yet, but I checked the Apache implementation (more importantly, this) and it uses two arrays as "buffers" instead of a matrix (which is what our implementation uses). Maybe that's the cause? I will investigate it further. I only had a quick look, so this is more a guess than anything else. I will try to take a serious look at this on the weekend. |
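To illustrate the "two buffers instead of a matrix" guess above, here is a minimal sketch of computing the LCS length both ways. It is not the code of either library: the class and method names are hypothetical, and the Apache implementation discussed in this thread is described as snake-based, which is a different technique. The sketch only shows why keeping two rows needs far less memory than keeping the whole table.

```java
import java.util.List;

// Hypothetical illustration; not taken from zjsonpatch or Apache Commons.
final class LcsSketch {

    // Full-table variant: keeps (n + 1) * (m + 1) ints alive at once.
    static <T> int lcsLengthFullTable(List<T> a, List<T> b) {
        int[][] table = new int[a.size() + 1][b.size() + 1];
        for (int i = 1; i <= a.size(); i++) {
            for (int j = 1; j <= b.size(); j++) {
                if (a.get(i - 1).equals(b.get(j - 1))) {
                    table[i][j] = table[i - 1][j - 1] + 1;
                } else {
                    table[i][j] = Math.max(table[i - 1][j], table[i][j - 1]);
                }
            }
        }
        return table[a.size()][b.size()];
    }

    // Two-buffer variant: keeps only the previous and the current row.
    static <T> int lcsLengthTwoRows(List<T> a, List<T> b) {
        int[] prev = new int[b.size() + 1];
        int[] curr = new int[b.size() + 1];
        for (int i = 1; i <= a.size(); i++) {
            for (int j = 1; j <= b.size(); j++) {
                if (a.get(i - 1).equals(b.get(j - 1))) {
                    curr[j] = prev[j - 1] + 1;
                } else {
                    curr[j] = Math.max(prev[j], curr[j - 1]);
                }
            }
            // Swap buffers; index 0 stays 0 in both because it is never written.
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.size()];
    }
}
```

Note that the two-row variant yields only the length. Recovering the actual subsequence (or an edit script) in linear space needs a more involved approach, which is presumably part of what the Apache code handles.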
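The problem is more severe than I thought. But first, a brief disclaimer: I rarely use profilers. In this case I used VisualVM, so I beg your pardon if the results are a bit off.
0.3.10 (no VM args): very linear heap allocation, with a small spike when the final patch object is created.
0.4.4: `OutOfMemoryError` with no VM args and with -Xms2048m -Xmx2048m. With -Xms4096m -Xmx4096m it runs; the heap allocation is also linear, without spikes, but the heap used is almost 20x that of 0.3.10.
0.4.4 with the current LCS implementation replaced by the 0.3.10 (Apache) implementation: the memory allocations revert to the 0.3.10 level.
@vishwakarma What are your thoughts? Should we try to reintroduce the Apache Commons dependency or solve it with our own code? In my opinion, we should reintroduce Apache Collections for now. I was digging into Apache Collections, and the way they produce their EditScript is really fascinating; we could learn a lot from it and apply the same techniques in the rest of our library in the future. Edit: Fixed Markdown for images. Edit2: I tested some more, and the 10 MB difference between 0.3.10 and 0.4.4 (with Apache) is due to the better string concatenation and the precompiled regexes that were introduced. |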
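For anyone who wants a rough number without attaching a profiler, the JVM's own management beans can report peak heap usage around a piece of work. The sketch below is an assumption-laden helper, not part of this project: `HeapPeakProbe` and `peakHeapBytesDuring` are hypothetical names, and summing per-pool peaks can overstate the true simultaneous peak because the pools may peak at different moments.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryType;
import java.lang.management.MemoryUsage;

// Hypothetical helper for a quick, approximate heap measurement.
final class HeapPeakProbe {

    // Resets the peak counters of all heap pools, runs the task,
    // then returns the sum of the per-pool peak "used" values in bytes.
    static long peakHeapBytesDuring(Runnable task) {
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            if (pool.getType() == MemoryType.HEAP) {
                pool.resetPeakUsage();
            }
        }
        task.run();
        long total = 0;
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            if (pool.getType() == MemoryType.HEAP) {
                MemoryUsage peak = pool.getPeakUsage();
                if (peak != null) { // null when the pool is no longer valid
                    total += peak.getUsed();
                }
            }
        }
        return total;
    }

    public static void main(String[] args) {
        // Replace the lambda body with the diff call under test.
        long peak = peakHeapBytesDuring(() -> {
            int[] placeholder = new int[1_000_000];
            placeholder[0] = 1;
        });
        System.out.println("Approximate peak heap: " + (peak / (1024 * 1024)) + " MB");
        System.out.println("Max heap (-Xmx): " + (Runtime.getRuntime().maxMemory() / (1024 * 1024)) + " MB");
    }
}
```

Running the same task against both library versions with identical `-Xms`/`-Xmx` settings would give comparable figures, though a profiler remains the better tool for seeing the allocation pattern over time.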
I agree, let's add Apache library for now to unblock the users and fix the
reported issue.
That will buy us some time to look into Apache implementation and fix with
our own code.
Please add the apache library back, I will review and test it.
Thanks
…On Mon, Aug 20, 2018, 9:07 AM Luiz Felipe da Cruz Oliveira < ***@***.***> wrote:
The problem is more severe than I thought.
But first, a brief disclaimer: I rarely use profilers. In this case I used
VisualVM. So if the results are a bit off I beg your pardon.
0.3.10
- No VM args:
[image: 0.3.10]
<https://camo.githubusercontent.com/1e72a411fde36c48f0d75471da0c620ee27d0986/68747470733a2f2f692e696d6775722e636f6d2f717573634634792e706e67>
We have very linear heap allocation and a small spike at the
creation of the final patch object.
0.4.4
- No VM args: `OutOfMemoryError`.
- -Xms2048m -Xmx2048m: `OutOfMemoryError`.
- -Xms4096m -Xmx4096m:
[image: 0.4.4]
<https://camo.githubusercontent.com/dc1fdb02e87695887267ef2602923e7b3c8ebd94/68747470733a2f2f692e696d6775722e636f6d2f447a74525a64662e706e67>
The program doesn't even run without VM args. The heap allocation is
also linear, without any spikes, but the heap memory used is almost 20x
more than the 0.3.10 version.
- Replaced current LCS implementation with 0.3.10 implementation:
[image: Apache 0.4.4]
<https://i.imgur.com/pAlK2jR.png>
The memory allocations revert back to 0.3.10.
@vishwakarma <https://github.com/vishwakarma> What are your thoughts?
Should we try to reintroduce the Apache Commons dependency or solve it with
our own code?
In my opinion, we should reintroduce Apache Collections for now. I was
digging into Apache Collections and the way they produce their EditScript
is really fascinating, we could learn a lot from there and apply the same
techniques in the rest of our library in the future.
|
Fixed by PR #78 |
It seems that in newer versions (0.4.0+) JsonDiff consumes significantly more memory. Most probably this is related to #60 and the getLCS method.
For version 0.4.4: I'm comparing two quite large JSONs, around 700 KB each, with around 300 MB of memory.
It is caused by comparing two arrays with around 40,000 strings each.
Version 0.3.10 works just fine.
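For a sense of scale, here is a back-of-envelope estimate of what a quadratic LCS table would cost for arrays of this size. It assumes a full `(n + 1) x (m + 1)` table of `int` values, which is an assumption for illustration: the actual layout of `getLCS` is not shown in this issue, and the estimate ignores array headers and everything else the library allocates. The class and method names are hypothetical.

```java
// Hypothetical estimate; assumes one int per table cell.
final class LcsMemoryEstimate {

    // Bytes for a full (n + 1) x (m + 1) int table.
    static long fullTableBytes(long n, long m) {
        return (n + 1) * (m + 1) * Integer.BYTES;
    }

    // Bytes for two rows of (m + 1) ints each.
    static long twoRowBytes(long m) {
        return 2 * (m + 1) * Integer.BYTES;
    }

    public static void main(String[] args) {
        long n = 40_000;
        long m = 40_000;
        System.out.println("Full table: " + fullTableBytes(n, m) + " bytes"); // 6400320004
        System.out.println("Two rows:   " + twoRowBytes(m) + " bytes");       // 320008
    }
}
```

Under these assumptions a full table for two 40,000-element arrays is about 6.4 GB, against roughly 320 KB for two rows, so a quadratic table would not fit in a heap of a few hundred megabytes.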