
Performance improvements on Trie handling #719

Merged
lsebrie merged 5 commits into master from perf_fixes on Mar 13, 2019

Conversation

@diega (Contributor) commented Dec 7, 2018

A refactor for splitting the Trie and its underlying store is planned but not tackled by this PR. Having that in mind, I don't think integrating these changes should make that task harder.

@diega requested review from tinchou and SergioDemianLerner Dec 7, 2018

@diega force-pushed the perf_fixes branch 10 times, most recently from 15aa5e2 to f510c5d Dec 7, 2018

@diega requested a review from lsebrie Dec 10, 2018

@tinchou (Contributor) left a comment

Review first pass

@diega force-pushed the perf_fixes branch from f510c5d to 402923f Dec 10, 2018

@diega force-pushed the perf_fixes branch 14 times, most recently from cb194e3 to d668abd Dec 11, 2018

@aeidelman requested a review from SergioDemianLerner Dec 21, 2018

@lsebrie (Contributor) left a comment

LGTM

@diega force-pushed the perf_fixes branch 2 times, most recently from 10d80a8 to 3a52104 Dec 27, 2018


// don't move unchanged keys unnecessarily
Map<ByteArrayWrapper, byte[]> rowsToUpdate = new HashMap<>(rows);
rowsToUpdate.entrySet().removeIf(e -> Arrays.equals(e.getValue(), committedCache.get(e.getKey())));

@SergioDemianLerner (Contributor) commented Dec 27, 2018

Minor: missing optimization: the committedCache has null values for those entries that were removed, therefore if a large batch of keys is removed several times, updateBatch won't detect it, and every time uncommittedKeysToRemove will be loaded with the same keys. And every time the removals will be applied to the base data source.
The benefit of using null for removals (in both the committed and uncommitted sets) is that you get every optimization for free.
Now you'll have to first remove from keysToRemove all keys that have already been committed to be removed. Tongue-twister.
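
For illustration, a minimal sketch of that idea (not the PR's code; the helper name filterNoOps and its shape are hypothetical, and the java.util types plus the project's ByteArrayWrapper are assumed):

// Hypothetical sketch: when removals are represented as null values, the same
// removeIf filter that skips unchanged updates also skips repeated removals,
// because Arrays.equals(null, null) returns true.
static void filterNoOps(Map<ByteArrayWrapper, byte[]> rows,
                        Set<ByteArrayWrapper> keysToRemove,
                        Map<ByteArrayWrapper, byte[]> committedCache) {
    Map<ByteArrayWrapper, byte[]> combined = new HashMap<>(rows);
    keysToRemove.forEach(key -> combined.put(key, null)); // null marks a removal
    combined.entrySet().removeIf(
            e -> Arrays.equals(e.getValue(), committedCache.get(e.getKey())));
    // whatever survives in "combined" is the only work that must reach the base data source
}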

@diega (Author, Contributor) commented Jan 2, 2019

"if a large batch of keys is removed several times, updateBatch won't detect it, and every time uncommittedKeysToRemove will be loaded with the same keys"

I cannot see how it's less optimal to implement a mechanism to detect recurrent removal attempts than to just add that collection to the uncommittedKeysToRemove set. It may be more readable to make the uncommittedCache support null semantics, though. I'll implement this as a separate commit so you can review it, and if it looks better I'll squash that solution before merging.

@SergioDemianLerner (Contributor) commented Jan 2, 2019

Because the "base" is assumed to be slower (for deletes, even if the element does not exist); that's why the cache is needed.

Objects.requireNonNull(key);
ByteArrayWrapper wrappedKey = ByteUtil.wrap(key);
// fully delete this element
committedCache.remove(wrappedKey);

@SergioDemianLerner (Contributor) commented Dec 27, 2018

Minor minor: prevention of deleted deletes

@diega (Author, Contributor) commented Jan 2, 2019

Do you mean to add a committedCache.containsKey(wrappedKey) check before invoking the removal itself? I don't see what the benefit would be.

@SergioDemianLerner (Contributor) commented Jan 2, 2019

It prevents the uncommittedCache.put(wrappedKey, null) from triggering a later base.delete(), because we assume the base DB is slower than a committedCache.containsKey() check.
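
For illustration, one possible shape of that guard (a hedged sketch of the suggestion, not the PR's code; it assumes the committedCache and uncommittedCache fields quoted in this thread and the convention that a null value in committedCache marks a key already deleted in the base store):

public void delete(byte[] key) {
    Objects.requireNonNull(key);
    ByteArrayWrapper wrappedKey = ByteUtil.wrap(key);
    // cheap in-memory check first: if the key is already recorded as deleted,
    // skip queueing a removal that would later cost a (slower) base.delete()
    if (committedCache.containsKey(wrappedKey) && committedCache.get(wrappedKey) == null) {
        return;
    }
    uncommittedCache.put(wrappedKey, null); // flushed later as base.delete(key)
}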

Map<ByteArrayWrapper, byte[]> rowsToUpdate = new HashMap<>(rows);
rowsToUpdate.entrySet().removeIf(e -> Arrays.equals(e.getValue(), committedCache.get(e.getKey())));

committedCache.keySet().removeAll(rowsToUpdate.keySet());

@SergioDemianLerner (Contributor) commented Jan 2, 2019

The "delete delete" optimization has not yet been addressed.
All this key filtering would not be needed if updateBatch() received a single list, with null values for deletions, as in the original.
Which leads me to realize that the best implementation would be to merge rows with keysToRemove first, and then proceed without extra filtering (suggested code in a later comment).

@SergioDemianLerner (Contributor) commented Jan 2, 2019

Philosophical side note: there should be a balance between the simplicity of an interface and the simplicity of an implementation. If a simple interface leads to a complex implementation, then perhaps it's better to have a not-so-simple interface with a straightforward implementation.

@SergioDemianLerner (Contributor) commented Jan 2, 2019

Final suggested code:

public void updateBatch(Map<ByteArrayWrapper, byte[]> rows, Set<ByteArrayWrapper> keysToRemove) {
    if (rows.containsKey(null) || rows.containsValue(null)) {
        throw new IllegalArgumentException("Cannot update null values");
    }

    // merge updates and removals into a single map; a null value marks a removal
    Map<ByteArrayWrapper, byte[]> combinedRows = new HashMap<>(rows);
    keysToRemove.forEach(keyToRemove -> combinedRows.put(keyToRemove, null));

    // drop entries whose value (or removal) is already pending or already committed
    combinedRows.entrySet().removeIf(e -> Arrays.equals(e.getValue(), uncommittedCache.get(e.getKey())));
    combinedRows.entrySet().removeIf(e -> Arrays.equals(e.getValue(), committedCache.get(e.getKey())));

    committedCache.keySet().removeAll(combinedRows.keySet());
    uncommittedCache.putAll(combinedRows);
}

@tinchou force-pushed the master branch from 1f7d068 to 9dcb945 Jan 4, 2019

@diega force-pushed the perf_fixes branch 3 times, most recently from 7dcb785 to ce58573 Jan 4, 2019

@diega added the unitrie label Jan 10, 2019

@diega force-pushed the master branch from 36bbce3 to 90acd7d Mar 1, 2019

@diega force-pushed the perf_fixes branch from ce58573 to a32c98d Mar 1, 2019

diega and others added some commits Dec 6, 2018

Don't go to store if we want to retrieve the same base trie
Co-authored-by: Sergio Demian Lerner <sergio@rsk.co>
Change KeyValueDataSource#updateBatch signature
Co-authored-by: Sergio Demian Lerner <sergio@rsk.co>
Allign KeyValueDataSource implementations
Co-authored-by: Sergio Demian Lerner <sergio@rsk.co>
Add Datasource cache for account states
Co-authored-by: Sergio Demian Lerner <sergio@rsk.co>

@lsebrie force-pushed the perf_fixes branch from a32c98d to 6646d4e Mar 13, 2019

@rskops commented Mar 13, 2019

SonarQube analysis reported 4 issues

  1. MAJOR TrieImpl.java#L814: Method co.rsk.trie.TrieImpl.split(TrieKeySlice) appears to call the same method on the same object redundantly
  2. MINOR TrieImpl.java#L519: Return a copy of "value".
  3. INFO LevelDbDataSource.java#L281: Constrained method org.ethereum.datasource.LevelDbDataSource.updateBatch(Map, Set) converts checked exception to unchecked
  4. INFO LevelDbDataSource.java#L291: Constrained method org.ethereum.datasource.LevelDbDataSource.updateBatch(Map, Set) converts checked exception to unchecked

@lsebrie merged commit e853b49 into master Mar 13, 2019

3 checks passed

ci/circleci: Your tests passed on CircleCI!
default: Build finished.
sonarqube: SonarQube reported 4 issues, no criticals or blockers

@lsebrie deleted the perf_fixes branch Mar 13, 2019

@aeidelman added this to the Orchid v0.6.2 milestone Apr 19, 2019
