Add "Complete Table Scan" blog post by Orri Erling #31

aweisberg · 2019-06-28T20:36:54Z

No description provided.

aweisberg · 2019-06-28T20:40:51Z

Masha, can you let me know if the formatting of the SQL and the table/column names from TPC-H looks correct to you? Some of the time they have prefixes and suffixes and refer to our internal stuff like hive.tpch.lineitem_s. I'm not sure what is optimal for a blog post, but if it can all execute against Presto I think it's good enough.

The original post is on Medium.

website/blog/2019-07-15-complete-table-scan.md

oerling · 2019-06-29T04:59:23Z

The scale is 100G. The point is that this is small enough to run from memory and large enough not to be dominated by query setup costs. The point of 12345 is that it is not at either end of the 1 – 1M range of suppkey values. I think the text says a 1/1M selection, implying a 100G scale. Here you want a value that is not in the top or bottom 1/10K of the values, where there would be a good chance of row group summaries skipping whole row groups. You can try this with a value of 1 and you'll see that the proportions of the cost factors are off.

In reply to @mbasmanova's review comment on website/blog/2019-07-15-complete-table-scan.md:

+In the previous article we looked at the abstract problem statement and possibilities inherent in scanning tables. In this piece we look at the quantitative upside with Presto. We look at a number of queries and explain the findings.
+
+The initial impulse motivating this work is the observation that table scan is by far the #1 operator in Presto workloads I have seen. This is a little over half of all Presto CPU, with repartitioning a distant second, at around 1/10 of the total. The other half of the motivation is ready opportunity: Presto in its pre-Aria state does almost none of the things that are common in table scan.
+
+For easy reproducibility and staying away from proprietary material, we use a TPC-H 100G dataset running on a desktop machine with 2x4 Skylake cores at 3.5GHz. The data is compressed with Snappy and we are running with warm OS cache. The Presto is a modified 0.221 where the Aria functionality can be switched on and off. Not to worry, we will talk about disaggregated storage and IO in due time but the basics will come first.
+
+# Simple scan
+
+The base case for scan optimization is the simplest possible query:
+
+```sql
+SELECT SUM(l_extendedprice)
+FROM lineitem
+WHERE suppkey = 12345;
+```
+| Version | Wall time (seconds) | CPU time (seconds) |

@aweisberg I'd pivot this table and add a column for the ratio between Aria and baseline. I think this will be clearer.
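The reasoning behind choosing 12345 can be illustrated with a small sketch of min/max skipping. This is a toy Python model under stated assumptions, not Presto or ORC code: `RowGroupStats`, `can_skip`, and `groups_to_read` are hypothetical names, and the summary values are made up to resemble row groups of roughly uniform keys in the 1 – 1M range.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class RowGroupStats:
    """Hypothetical per-row-group min/max summary for one column."""
    min_value: int
    max_value: int


def can_skip(stats: RowGroupStats, wanted: int) -> bool:
    # An equality filter cannot match if the value lies outside [min, max].
    return wanted < stats.min_value or wanted > stats.max_value


def groups_to_read(groups: List[RowGroupStats], wanted: int) -> int:
    # Count the row groups that still have to be decoded for this filter value.
    return sum(1 for g in groups if not can_skip(g, wanted))


# Made-up summaries: each group's min and max sit close to, but not at,
# the ends of the key range.
groups = [
    RowGroupStats(87, 999_912),
    RowGroupStats(140, 999_950),
    RowGroupStats(35, 999_800),
]

read_mid = groups_to_read(groups, 12345)  # a value away from both ends
read_low = groups_to_read(groups, 1)      # a value at the very bottom of the range
```

With these made-up summaries, a mid-range key forces every row group to be read, while a key of 1 lets all of them be skipped, which is why an extreme value would distort the measured cost breakdown.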

oerling · 2019-06-29T05:18:53Z

You can use my LinkedIn public profile link.

Thanks
Orri

In reply to @JoelMarcey's review comment on website/blog/2019-07-15-complete-table-scan.md:

@@ -0,0 +1,169 @@

+---
+title: Complete Table Scan: A Quantitative Assessment
+author: Orri Erling
+authorURL: http://code.fb.com/

Don't you want a different URL than this?

aweisberg · 2019-07-14T22:19:55Z

I spent some time last week reproducing the benchmark. I was able to reproduce the results for every query, with some performance differences that we are ascribing to differences in hardware. See BENCHMARK.md. We also ran it again on Orri's workstation to make sure he still gets the same results, and he did.

There was one query where Presto was getting lucky and putting the right filter first, so Orri added [this](https://github.com/aweisberg/presto/blob/aria-scan-prototype/BENCHMARK.md). @mbasmanova WDYT of this?

I am hoping to post this on Monday.

wenleix · 2019-07-16T23:51:29Z

Any way to preview the rendered result? :)

aweisberg · 2019-07-16T23:58:38Z

Yes, if you go through the instructions in website/README.md and get yarn installed, you can run "yarn start" and it will host the site live locally.

mbasmanova

Looks great modulo minor comments.

mbasmanova · 2019-07-23T15:05:07Z

website/blog/2019-07-15-complete-table-scan.md

+
+The initial impulse motivating this work is the observation that table scan is by far the #1 operator in Presto workloads I have seen. This is a little over half of all Presto CPU, with repartitioning a distant second, at around 1/10 of the total. The other half of the motivation is ready opportunity: Presto in its pre-Aria state does almost none of the things that are common in table scan.
+
+<!--truncate-->


What does this do?

The blog has an index as well as individual posts. The index shows a snippet of each post, and this tag determines where the snippet ends.

mbasmanova · 2019-07-23T15:06:59Z

website/blog/2019-07-15-complete-table-scan.md

+
+## Mechanics of a scan
+
+Baseline Presto does this as follows: The scan `OrcPageSource` produces consecutive `Page` instances that contain a `LazyBlock` for each column. This operation as such takes no time since the `LazyBlock` instances are just promises. The actual work takes place when evaluating the generated code for the comparison. This sees that the column is not loaded, loads all the values in the range of the `LazyBlock`, typically 1024 values and then does the operation and produces a set of passing row numbers. This set is empty for all but 1/100k of the cases. If this is empty, the `LazyBlock` for `extendedprice` is not touched. If there are hits, the `extendedprice` `LazyBlock` is loaded and the values for the selected rows are copied out. When this happens, 1024 values are decoded from the column and most often one of them is accessed. Loading a `LazyBlock` allocates memory for each value. In the present case this becomes garbage immediately after first use. The same applies to the values in extended price, of which only one is copied to a `Block` of output. This is handled by a special buffering stage that accumulates rows from multiple loaded `LazyBlock` instances until there is a minimum batch worth of rows to pass to the next operator.


typo: The scan `OrcPageSource` -> The `OrcPageSource`
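For readers following the quoted paragraph, the lazy-loading flow it describes can be sketched roughly as follows. This is a toy Python model, not Presto code: the `LazyBlock` class here only borrows its name from the text, and `scan_page` and `make_page` are hypothetical helpers with synthetic data.

```python
from typing import Callable, List, Optional


class LazyBlock:
    """Toy stand-in for a lazily decoded column range (not Presto's class)."""

    def __init__(self, loader: Callable[[], List[int]]):
        self._loader = loader
        self._values: Optional[List[int]] = None
        self.loaded = False

    def load(self) -> List[int]:
        # Decoding happens on first access and materializes every value in the range.
        if self._values is None:
            self._values = self._loader()
            self.loaded = True
        return self._values


def scan_page(suppkey: LazyBlock, extendedprice: LazyBlock, wanted: int) -> List[int]:
    # Evaluating the filter forces the whole filter column to be decoded.
    keys = suppkey.load()
    passing = [row for row, key in enumerate(keys) if key == wanted]
    if not passing:
        # No hits: the second column is never touched.
        return []
    # At least one hit: the full range is decoded even if only one value is used.
    prices = extendedprice.load()
    return [prices[row] for row in passing]


def make_page(start: int, size: int = 1024):
    # Synthetic page: keys are consecutive integers, prices are key * 10.
    keys = LazyBlock(lambda: list(range(start, start + size)))
    prices = LazyBlock(lambda: [k * 10 for k in range(start, start + size)])
    return keys, prices


miss_keys, miss_prices = make_page(0)
hit_keys, hit_prices = make_page(12000)
miss_result = scan_page(miss_keys, miss_prices, 12345)
hit_result = scan_page(hit_keys, hit_prices, 12345)
```

In the miss case only the filter column is decoded. In the hit case both columns are fully decoded to return a single value, which is the overhead the paragraph points at.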

mbasmanova · 2019-07-23T15:14:34Z

website/blog/2019-07-15-complete-table-scan.md

+| Aria     | 4                   | 44.2               | 1.0                     |
+| Baseline | 21                  | 271                | 6.13                    |
+
+The filtered columns are of low cardinality and are encoded as dictionaries. This is an example of evaluating an expensive predicate on only distinct values. Baseline Presto misses the opportunity because all filters are generated into a monolithic code block. Aria generates filter expressions for each distinct set of required columns. In this case the filters are independent and reorderable.


@oerling I didn't realize that complex filters also run on dictionaries. This is super cool. Do you have a pointer for me to check out how this is done?
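The general idea in the quoted paragraph, evaluating a predicate once per distinct dictionary entry rather than once per row, can be sketched as below. This is an illustrative Python toy, not the Aria implementation: `filter_dictionary_column` and its arguments are hypothetical, and the sample data is made up.

```python
from typing import Callable, List, Optional, Sequence


def filter_dictionary_column(
    dictionary: Sequence[str],
    ids: Sequence[int],
    predicate: Callable[[str], bool],
) -> List[int]:
    """Return the row numbers whose decoded value passes the predicate."""
    # Per-entry cache of predicate results; None means "not evaluated yet".
    verdicts: List[Optional[bool]] = [None] * len(dictionary)
    passing = []
    for row, entry in enumerate(ids):
        if verdicts[entry] is None:
            # The expensive predicate runs at most once per distinct value.
            verdicts[entry] = predicate(dictionary[entry])
        if verdicts[entry]:
            passing.append(row)
    return passing


evaluated = []  # records each value the predicate was actually run on


def expensive(value: str) -> bool:
    evaluated.append(value)
    return value.startswith("DELIVER")


# Sample low-cardinality column: four distinct strings, eight rows of ids.
dictionary = ["DELIVER IN PERSON", "COLLECT COD", "NONE", "TAKE BACK RETURN"]
ids = [0, 1, 1, 0, 2, 3, 0, 1]
rows = filter_dictionary_column(dictionary, ids, expensive)
```

Here the predicate runs four times for eight rows. On a real column with millions of rows and a handful of distinct values, the saving grows accordingly.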

mbasmanova · 2019-07-23T15:16:07Z

website/blog/2019-07-15-complete-table-scan.md

+The ideas presented here are currently being integrated into mainline Presto.
+
+# Conclusions and Next Steps
+We have so far had a look at the low-hanging fruits for scanning flat tables. These techniques are widely known and once one considers the fundamentals these become just matters of common sense.


> once one considers the fundamentals these become just matters of common sense

Is there a way to soften this sentence?

aweisberg requested review from JoelMarcey and mbasmanova June 28, 2019 20:36

mbasmanova reviewed Jun 29, 2019

mbasmanova reviewed Jun 29, 2019

mbasmanova reviewed Jun 29, 2019

mbasmanova reviewed Jun 29, 2019

JoelMarcey reviewed Jun 29, 2019

aweisberg force-pushed the complete_ts_post branch from b6e9f96 to 061e42c Compare July 14, 2019 22:12

mbasmanova approved these changes Jul 23, 2019

Add "Complete Table Scan" blog post by Orri Erling

5ec218c

aweisberg force-pushed the complete_ts_post branch from aad68fb to 5ec218c Compare July 23, 2019 17:16

aweisberg merged commit bdcec58 into prestodb:source Jul 23, 2019


