Skip to content

Conversation

@aweisberg
Copy link
Contributor

No description provided.

@aweisberg
Copy link
Contributor Author

Masha can you let me know if the formatting of the SQL and table/column names from TPC-H look correct to you? Some of the time they have prefixes and suffixes and refer to our internal stuff like hive.tpch.lineitem_s. Not sure what is optimal for a blog post, but if it can all execute against Presto I think it's good enough.

The original post is on medium.

@oerling
Copy link

oerling commented Jun 29, 2019 via email

@oerling
Copy link

oerling commented Jun 29, 2019 via email

@aweisberg
Copy link
Contributor Author

I spent some time last week reproducing the benchmark. I was able to reproduce the results for every query with some performance differences that we ascribing to differences in hardware. See BENCHMARK.md. We also ran it again on his workstation to make sure he stills gets the same results and he did.

There was one query where Presto was getting lucky and putting the right filter first so Orri added [this|https://github.com/aweisberg/presto/blob/aria-scan-prototype/BENCHMARK.md}. @mbasmanova WDYT of this?

I am hoping to post this on Monday.

@wenleix
Copy link
Contributor

wenleix commented Jul 16, 2019

Any way to preview the rendered result? :)

@aweisberg
Copy link
Contributor Author

Yes if you go through the instructions in website/README.md and get yarn installed you can "yarn start" and it will host the site live locally

Copy link
Contributor

@mbasmanova mbasmanova left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Look great % minor comments.


The initial impulse motivating this work is the observation that table scan is by far the #1 operator in Presto workloads I have seen. This is a little over half of all Presto CPU, with repartitioning a distant second, at around 1/10 of the total. The other half of the motivation is ready opportunity: Presto in its pre-Aria state does almost none of the things that are common in table scan.

<!--truncate-->
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What does this do?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The blog has an index as well as individual posts. In the index it posts a snippet of each blog. This tag determines where the snippet ends.


## Mechanics of a scan

Baseline Presto does this as follows: The scan `OrcPageSource` produces consecutive `Page` instances that contain a `LazyBlock` for each column. This operation as such takes no time since the `LazyBlock` instances are just promises. The actual work takes place when evaluating the generated code for the comparison. This sees that the column is not loaded, loads all the values in the range of the `LazyBlock`, typically 1024 values and then does the operation and produces a set of passing row numbers. This set is empty for all but 1/100k of the cases. If this is empty, the `LazyBlock` for `extendedprice` is not touched. If there are hits, the `extendedprice` `LazyBlock` is loaded and the values for the selected rows are copied out. When this happens, 1024 values are decoded from the column and most often one of them is accessed. Loading a `LazyBlock` allocates memory for each value. In the present case this becomes garbage immediately after first use. The same applies to the values in extended price, of which only one is copied to a `Block` of output. This is handled by a special buffering stage that accumulates rows from multiple loaded `LazyBlock` instances until there is a minimum batch worth of rows to pass to the next operator.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

typo: The scan OrcPageSource -> The OrcPageSource``

| Aria | 4 | 44.2 | 1.0 |
| Baseline | 21 | 271 | 6.13 |

The filtered columns are of low cardinality and are encoded as dictionaries. This is an example of evaluating an expensive predicate on only distinct values. Baseline Presto misses the opportunity because all filters are generated into a monolithic code block. Aria generates filter expressions for each distinct set of required columns. In this case the filters are independent and reorderable.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@oerling I didn't realize that complex filters also run on dictionaries. This is super cool. Do you have a pointer for me to check out how this is done?

The ideas presented here are currently being integrated into mainline Presto.

# Conclusions and Next Steps
We have so far had a look at the low-hanging fruits for scanning flat tables. These techniques are widely known and once one considers the fundamentals these become just matters of common sense.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

once one considers the fundamentals these become just matters of common sense

Is there a way to soften this sentence?

@aweisberg aweisberg merged commit bdcec58 into prestodb:source Jul 23, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants