Lazily build hashtable for MapBlock #11791

Merged
yingsu00 merged 1 commit into prestodb:master from yingsu00:lazyMapHT on Nov 8, 2018

@yingsu00
Contributor

yingsu00 commented Oct 26, 2018

Fix for #11808
Presto builds the hashtable for a MapBlock eagerly when constructing the
MapBlock, even when the query does not need it. Building a hashtable can
take up to 30% of the CPU cost of scanning a map column. This commit defers
the hashtable build until it is needed in seekKey(). Note that we
only do this for MapBlock, not MapBlockBuilder, to avoid complex
synchronization problems. The MapBlockBuilder will always build the
hashtable. As a result, MergingPageOutput and PartitionOutputOperator
will still rebuild the hashtables when needed. Measurements show
that MergingPageOutput has to build the hashtables for fewer than 10%
of pages. We will have a separate PR to improve PartitionOutput
and avoid rebuilding the pages, and with them the hashtables.

Simple select checksum queries show over 40% CPU gain:

Test                          | After  | Before | Improvement
Select 2 map columns checksum | 11.69d | 20.06d | 42%
Select 1 map column checksum  |  9.67d | 17.73d | 45%
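
For readers unfamiliar with the pattern, the deferred build described above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: the class LazyKeyIndex, its int keys, the linear-probing table, and the buildCount field are all assumptions made to keep the example self-contained. Only HASH_MULTIPLIER, the -1 fill value, ensureHashTableLoaded() and seekKey() echo names that appear in the diff.

```java
// Minimal sketch of a lazily built key index; not Presto's actual MapBlock.
class LazyKeyIndex
{
    private static final int HASH_MULTIPLIER = 2; // table slots per key (assumed value)

    private final int[] keys;
    // Written only while holding the "this" monitor; volatile so that
    // readers outside the monitor only ever see a fully built table.
    private volatile int[] hashTable;
    private int buildCount; // illustration only: how many times the table was built

    LazyKeyIndex(int[] keys)
    {
        // No hash table is built here, so a scan that never seeks pays nothing.
        this.keys = keys.clone();
    }

    private void ensureHashTableLoaded()
    {
        if (hashTable != null) {
            return;
        }
        synchronized (this) {
            if (hashTable != null) {
                return; // another thread built it while we waited
            }
            int[] table = new int[Math.max(1, keys.length * HASH_MULTIPLIER)];
            java.util.Arrays.fill(table, -1);
            for (int position = 0; position < keys.length; position++) {
                int slot = Math.floorMod(Integer.hashCode(keys[position]), table.length);
                while (table[slot] != -1) {
                    slot = (slot + 1) % table.length; // linear probing
                }
                table[slot] = position;
            }
            buildCount++;
            hashTable = table; // publish only after the table is complete
        }
    }

    // Returns the position of the key, or -1 if it is absent.
    int seekKey(int key)
    {
        ensureHashTableLoaded();
        int[] table = hashTable;
        int slot = Math.floorMod(Integer.hashCode(key), table.length);
        while (table[slot] != -1) {
            if (keys[table[slot]] == key) {
                return table[slot];
            }
            slot = (slot + 1) % table.length;
        }
        return -1;
    }

    boolean isHashTableBuilt()
    {
        return hashTable != null;
    }

    synchronized int getBuildCount()
    {
        return buildCount;
    }
}
```

In this sketch a caller that only scans the keys never triggers the build, while the first seekKey() call pays for it once and later calls reuse the table.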

@yingsu00 yingsu00 requested review from dain and wenleix Oct 26, 2018

@yingsu00 yingsu00 requested a review from haozhun Oct 26, 2018

@findepi

I just skimmed. I didn't intend to review this.

{
this.keyType = requireNonNull(keyType, "keyType is null");
// keyNativeHashCode can only be null due to map block kill switch. deprecated.new-map-block
this.keyNativeHashCode = keyNativeHashCode;
// keyBlockNativeEquals can only be null due to map block kill switch. deprecated.new-map-block
this.keyBlockNativeEquals = keyBlockNativeEquals;
this.keyBlockHashCode = keyBlockHashCode;

@findepi

findepi Oct 26, 2018

Member

add requireNonNull (or explanatory comment)

+ valueBlock.getRetainedSizeInBytes()
+ sizeOf(offsets)
+ sizeOf(mapIsNull)
+ sizeOfIntArray(offsets[positionCount] * HASH_MULTIPLIER); // We will add the hashtable size even if it's not built yet

@findepi

findepi Oct 26, 2018

Member

please include rationale

@Override
protected void ensureHashTableLoaded()
{
if (getHashTables() == null) {

@findepi

findepi Oct 26, 2018

Member

Why not access this.hashTables directly? You do this when you complete the computation anyway.

int mapCount = getPositionCount();
boolean[] mapIsNull = getMapIsNull();
if (getHashTables() == null) {

@findepi

findepi Oct 26, 2018

Member

this.hashTables

@yingsu00 yingsu00 force-pushed the yingsu00:lazyMapHT branch from daf60d3 to 95386f0 Oct 26, 2018

@yingsu00 yingsu00 requested a review from electrum Oct 26, 2018

@electrum

There are lots of changes here, many of which seem to be refactorings. Can you pull those into separate commits so that it’s easier to review and see the real change?

@yingsu00 yingsu00 force-pushed the yingsu00:lazyMapHT branch from 95386f0 to d369568 Nov 1, 2018

@yingsu00

Contributor

yingsu00 commented Nov 1, 2018

@electrum Hi David, I have removed the formatting changes (breaking long lines). Now the changes should be all related to the logic change. Please let me know if this is what you want, thanks!

@dain dain self-assigned this Nov 1, 2018

@dain

Some minor comments/suggestions from my first read.

@Override
protected void ensureHashTableLoaded()
{
if (this.hashTables == null) {

@dain

dain Nov 2, 2018

Contributor

Invert this if condition to remove a level of nesting. Something like this:

if (this.hashTables != null) {
    return;
}
int mapCount = getPositionCount();
boolean[] mapIsNull = getMapIsNull();
if (this.hashTables == null) {

@dain

dain Nov 2, 2018

Contributor

Also, invert this one, and move to right after the start of the synchronized block.

@@ -410,6 +432,10 @@ public BlockBuilder newBlockBuilderLike(BlockBuilderStatus blockBuilderStatus)
newNegativeOneFilledArray(newSize * HASH_MULTIPLIER));
}
@Override
protected void ensureHashTableLoaded()
{}

@dain

dain Nov 2, 2018

Contributor

the {} should go on the previous line

if (keyBlock.getPositionCount() != valueBlock.getPositionCount() || keyBlock.getPositionCount() * HASH_MULTIPLIER != hashTable.length) {
if (keyBlock.getPositionCount() != valueBlock.getPositionCount()

@dain

dain Nov 2, 2018

Contributor

please add clarifying parentheses

@dain

dain Nov 2, 2018

Contributor

Actually, I'd prefer to see two separate checks: one to ensure the keys and values have the same position count, and one that verifies the hash table size.
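
A rough sketch of what two separate checks could look like is shown here. The class name HashTableChecks, the validate method, its int parameters, the error messages, and the HASH_MULTIPLIER value are hypothetical and chosen only to keep the example runnable; the real constructor works on Block and int[] arguments, and the final form of the check in this PR is not visible in the thread.

```java
// Sketch only: HashTableChecks and validate are hypothetical names, not Presto code.
class HashTableChecks
{
    static final int HASH_MULTIPLIER = 2; // assumed value for illustration

    static void validate(int keyPositionCount, int valuePositionCount, int hashTableLength)
    {
        // Check 1: keys and values must describe the same number of entries.
        if (keyPositionCount != valuePositionCount) {
            throw new IllegalArgumentException(String.format(
                    "keyBlock and valueBlock position counts differ: %s vs %s",
                    keyPositionCount,
                    valuePositionCount));
        }
        // Check 2: the hash table must have HASH_MULTIPLIER slots per key.
        if (keyPositionCount * HASH_MULTIPLIER != hashTableLength) {
            throw new IllegalArgumentException(String.format(
                    "hashTable length %s does not match keyBlock position count %s",
                    hashTableLength,
                    keyPositionCount));
        }
    }
}
```

Splitting the condition this way lets each failure report which invariant was violated, which a single combined condition cannot do.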

private final int positionCount; // The number of keys in this single map * 2
private final AbstractMapBlock mapBlock;
SingleMapBlock(int offset, int positionCount, AbstractMapBlock mapBlock)

@dain

dain Nov 2, 2018

Contributor

If we were to introduce an interface for this class, what would the API look like for that interface? If it is only a small number of methods, we might want to add something like that to keep the abstractions between these classes simpler. I'm not saying we should actually add an interface here, yet; I'm just curious what it would look like if we did.

@yingsu00

yingsu00 Nov 2, 2018

Contributor

@dain which classes are you considering to implement this interface? Did you just mean SingleMapBlock and AbstractSingleMapBlock?

@dain

dain Nov 2, 2018

Contributor

I mean the AbstractSingleMapBlock argument here. If that were an interface built specifically for this class, what methods would it have? If it is small, we might want to add one to clean up the code and simplify testing... just a thought.

@yingsu00

yingsu00 Nov 6, 2018

Contributor

@dain Did you mean the mapBlock we passed in? It's AbstractMapBlock. The referenced methods in SingleMapBlock include getRawKeyBlock(), getRawValueBlock(), getHashTables(), and it also accesses keyNativeHashCode and keyBlockNativeEquals members. Do you think it's worth making a new interface? If yes I'll add getKeyNativeHashCode() and getKeyBlockNativeEquals() and make a 5 method interface in a separate commit.

@dain

dain Nov 6, 2018

Contributor

If it is only 5 methods, I would consider adding the interface. This is just my opinion... I generally get a bad feeling anytime I see a method taking a parameter with a type named Abstract*, as to me it screams, we should have an interface here. In this case I would name the interface MapBlockData. Again, just my opinion. You can ask others how they feel about this.

@yingsu00

yingsu00 Nov 6, 2018

Contributor

@dain Makes sense. Shall I send a new PR for this interface, add a separate commit to this PR, or just include it in this commit?

@dain

dain Nov 7, 2018

Contributor

All of those options are fine with me. Maybe ask @haozhun what he prefers.

@haozhun

haozhun Nov 8, 2018

Contributor

The problem here is that it cannot be an interface because those fields are not public.
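
To make the thread above concrete, here is one possible shape for the five-method interface that was discussed. It is purely a hypothetical sketch and was not part of the merged change. RawBlock is a stand-in for Presto's Block type, SingleMapView is a stand-in for SingleMapBlock, and the MethodHandle return types are an assumption about the two non-public fields. Because those fields are not public, an interface like this would need the two extra getters proposed earlier in the thread.

```java
// Hypothetical sketch of the interface discussed above; not part of the merged change.
// RawBlock stands in for Presto's Block type so the example is self-contained.
interface RawBlock
{
    int getPositionCount();
}

interface MapBlockData
{
    RawBlock getRawKeyBlock();

    RawBlock getRawValueBlock();

    int[] getHashTables();

    // The two getters below would replace direct field access (assumed types).
    java.lang.invoke.MethodHandle getKeyNativeHashCode();

    java.lang.invoke.MethodHandle getKeyBlockNativeEquals();
}

// Stand-in for SingleMapBlock: it depends on the interface, not on an Abstract* class.
class SingleMapView
{
    private final int offset;
    private final int positionCount; // the number of keys in this single map * 2
    private final MapBlockData mapBlock;

    SingleMapView(int offset, int positionCount, MapBlockData mapBlock)
    {
        this.offset = offset;
        this.positionCount = positionCount;
        this.mapBlock = java.util.Objects.requireNonNull(mapBlock, "mapBlock is null");
    }

    int getOffset()
    {
        return offset;
    }

    int getKeyCount()
    {
        return positionCount / 2;
    }

    MapBlockData getMapBlock()
    {
        return mapBlock;
    }
}
```

A narrow interface like this would also let tests pass in a small fake instead of constructing a full map block, which is the testing benefit mentioned in the thread.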

@yingsu00 yingsu00 force-pushed the yingsu00:lazyMapHT branch from d369568 to 8f86094 Nov 2, 2018

@dain

This seems good to me. @haozhun, did you want to review this?

@dain dain removed their assignment Nov 7, 2018

@haozhun

haozhun approved these changes Nov 8, 2018

Looks good

@@ -42,7 +43,7 @@
private final int[] offsets;
private final Block keyBlock;
private final Block valueBlock;
-private final int[] hashTables; // hash to location in map;
+private volatile int[] hashTables; // hash to location in map;

@haozhun

haozhun Nov 8, 2018

Contributor

Add comment: write to the field is protected by "this" monitor.

@haozhun haozhun assigned rongrong and unassigned haozhun Nov 8, 2018

@yingsu00 yingsu00 force-pushed the yingsu00:lazyMapHT branch from 8f86094 to 1992ce5 Nov 8, 2018

@yingsu00 yingsu00 force-pushed the yingsu00:lazyMapHT branch from 1992ce5 to 009165f Nov 8, 2018

Lazily build hashtable for MapBlock
Presto builds the hashtable for a MapBlock eagerly when constructing the
MapBlock, even when the query does not need it. Building a hashtable can
take up to 40% of the CPU cost of scanning a map column. This commit defers
the hashtable build until it is needed in seekKey(). Note that we
only do this for MapBlock, not MapBlockBuilder, to avoid complex
synchronization problems. The MapBlockBuilder will always build the
hashtable. As a result, MergingPageOutput and PartitionOutputOperator
will still rebuild the hashtables when needed. Measurements show
that MergingPageOutput has to build the hashtables for fewer than 10%
of pages. We will have a separate PR to improve PartitionOutput
and avoid rebuilding the pages, and with them the hashtables.

Simple select checksum queries show over 40% CPU gain:

Test                          | After  | Before | Improvement
Select 2 map columns checksum | 11.69d | 20.06d | 42%
Select 1 map column checksum  |  9.67d | 17.73d | 45%

@yingsu00 yingsu00 force-pushed the yingsu00:lazyMapHT branch from 009165f to 62dc3a5 Nov 8, 2018

@yingsu00 yingsu00 merged commit 23de11f into prestodb:master Nov 8, 2018

1 check passed

continuous-integration/travis-ci/pr The Travis CI build passed