
Test on data folder 2 not passing #7

Closed
igor84 opened this issue Feb 17, 2020 · 13 comments
@igor84

igor84 commented Feb 17, 2020

It seems that commit 264ea17 ("performance tweak") broke the test on the input data in folder "2". The delta that is created no longer matches the delta generated by the original tool. On the other hand, if that new delta is applied to the original data, the target data is generated correctly.

I can make a pull request with an updated delta file in folder 2 if you agree that is OK to do. This would mean the generate-deltas.sh script can no longer be used to regenerate the delta data.

@endel
Owner

endel commented Feb 17, 2020

Hi @igor84, thanks for diving deep on this. I wasn't aware that #4 had introduced this breaking change. Maybe it makes more sense to revert it, since the deltas should be compatible with the encoding/decoding of the other implementations (JavaScript, Lua, etc.).

@igor84
Author

igor84 commented Feb 19, 2020

Hi @endel, there is one more option. We could add a bool parameter to Delta.Create, called useNonStandardOptimizations or something like that, and only apply the new optimization when it is true. It is up to you, so let me know if you would like that, or if you still prefer the revert and I'll make the pull request.

@jasonsparc

Hi @endel and @igor84

I'm not really an expert on the algorithm, nor have I run benchmarks, but I think lines 83 to 87 in 264ea17#diff-993db0c7e13460f7683a59bed31a7144R83-R87
(of PR #4) will not actually provide a performance gain, since they add an extra check (the if statement at line 83) inside the supposedly hot loop. If that condition never triggers, it's just wasted CPU cycles. But that's just my naive observation, and I might be completely wrong; if so, please forgive my naivety and feel free to criticise as you wish (but please don't murder me 😅).

However, if you're thinking along the same lines, or think the optimization might not be worth it at all, then I would strongly vote for reverting lines 83 to 87 of PR #4.

@endel
Owner

endel commented Feb 21, 2020

@israellot thoughts? 👀

@israellot
Contributor

israellot commented Feb 21, 2020

Thanks for the feedback @jasonsparc !
You made a fair observation; my understanding is that it all depends on the data.
When dealing with large data and small changes, that check skips a lot of unnecessary hashing. That's because once I know I am inside a region already evaluated as the best one, I can simply skip all the landmarks inside it, since they would only lead to the exact same region being found again.
Reversing the landmarks leads to bigger matches being found first, which opened the possibility for this second optimization. The real hot path for this code isn't the loop per se but the hashing, so less hashing means less CPU spent.
Now, that may only pay off if the diff between your original and target data is small enough to produce large matching regions. But this algorithm serves the purpose of creating deltas, which inherently means small changes over previous data, so I believe that's a fair compromise.
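A minimal sketch of the idea described above (illustrative Python, not the library's actual C# code; all names are made up): candidate positions that fall inside the best match found so far are jumped over, so their rolling-hash evaluations are saved, at the cost of one extra range check per position.

```python
def count_hashes(data_len, best_start, best_end, skip_inside_best):
    """Count how many rolling-hash evaluations a linear scan would do.

    best_start/best_end delimit the region already evaluated as the
    best match; skip_inside_best toggles the optimization in question.
    """
    hashes = 0
    pos = 0
    while pos < data_len:
        if skip_inside_best and best_start <= pos < best_end:
            pos = best_end   # jump past the region already matched
            continue
        hashes += 1          # stands in for one rolling-hash evaluation
        pos += 1
    return hashes

# With a 100-byte input where positions 10..89 lie inside the best match,
# the skip cuts the hash count from 100 down to 20.
print(count_hashes(100, 10, 90, True))   # 20
print(count_hashes(100, 10, 90, False))  # 100
```

When there is no large matching region, the scan degrades to the plain version plus one comparison per position, which is exactly the trade-off being debated in this thread.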

Regarding the unit test: the deltas are compatible, as the format hasn't changed (or shouldn't have). I believe @igor84 was pointing out that the baked data used for comparison differs from the delta generated after the PR. It would be worth checking why that is and what exactly differs between the two outputs. I remember doing these cross-checks, though, applying deltas from the previous version on the new one and vice versa. Maybe something slipped.

@israellot
Contributor

Just to add: I believe the deltas generated by the previous version can differ from those generated by the new version, but that doesn't mean they are incompatible. It only means one has a different set of commands than the other, or commands in a different order, since the search order differs between the two versions.

@igor84
Author

igor84 commented Feb 21, 2020

@israellot you're right, the deltas are different but they still work.

I tested with and without these changes using files from 60 KB to 21 MB (with generated deltas ranging from 2 KB to 20 MB), and the difference in speed is positive in some cases and negative in others, but either way it is negligible (below 1%), so I will submit a pull request reverting all the changes. That way the deltas will remain the same as in the other tools.

@israellot
Contributor

That would need a proper benchmark.
On most of my data, performance stays about the same, as you noticed, but on the use cases that affected me it sped up delta generation quite a lot.
But hey, revert if you will, it's a good implementation either way. Cheers

@igor84
Author

igor84 commented Feb 21, 2020

Sorry, I messed up the pull request. I wanted to submit another one, and it got merged into the existing pull request. How can I fix that?

@jasonsparc

jasonsparc commented Feb 21, 2020

@igor84 You could run git rebase -i <to-prev-commit-hash>, then force-push, and GitHub should recognize the change. I know because I used to do something like that. But whether force pushes are allowed depends on the repo settings.

I believe force push is disabled by default only for the master branch on GitHub. So you could perhaps disable that protection for the master branch of your fork first, then force-push.
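The suggested workflow looks roughly like this (the commit hash and remote name are placeholders for this example, and `--force-with-lease` is a safer variant of a plain force push):

```shell
# Rewrite history starting from the commit *before* the ones to fix
# (abc1234 is a placeholder hash).
git rebase -i abc1234

# In the editor that opens, drop or reorder the unwanted commits,
# then save and exit.

# Update the pull-request branch. --force-with-lease refuses to
# overwrite work someone else pushed in the meantime.
git push --force-with-lease origin master
```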

@igor84
Author

igor84 commented Feb 21, 2020

Thanks @jasonsparc. It seems that fixed it.

@jasonsparc

@igor84 btw, I'm excited about the Stream support from the commit I just saw (before the rebase removed it), and I can't wait for it to be implemented.

I hope to see it implemented some time soon. 😄

@igor84
Author

igor84 commented Feb 22, 2020

Good to hear @jasonsparc :). It is done just for the Delta.Apply function, not for Delta.Create, but we found it useful because we apply patches in the app, while we only create them at build time, so hopefully others will find it useful as well. It is implemented, and I'll submit a pull request as soon as this one is merged.

@endel endel closed this as completed in dc0b289 Feb 22, 2020
endel added a commit that referenced this issue Feb 22, 2020
fixed #7 test on data folder 2 not passing by reverting performance tweak