
gzip pre-decompress w/IAA #6176

Closed · wants to merge 2 commits

Conversation

@yaqi-zhao (Contributor) commented:

The Intel® In-Memory Analytics Accelerator (Intel® IAA) is a hardware accelerator that provides very high-throughput compression and decompression combined with primitive analytic functions. It is available in the newest generation of Intel® Xeon® Scalable processors ("Sapphire Rapids"). We can offload GZip decompression (for streams with a 4KB window size) to the IAA hardware and save CPU cycles. A description of how the GZip decompression is offloaded to IAA is in:

#5718
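For readers unfamiliar with the offload flow, the sketch below shows the general shape of an IAA decompression job using Intel QPL, the userspace library that drives IAA. This is an illustrative sketch, not the PR's actual QplJobPool code; it requires IAA-capable hardware and libqpl, the helper name `decompressOnIaa` is hypothetical, and error handling is elided.

```cpp
// Sketch only: requires the Intel QPL library and IAA-capable hardware.
// Function name is hypothetical; API names are from QPL's public C API.
#include <qpl/qpl.h>
#include <cstdlib>

bool decompressOnIaa(const uint8_t* src, uint32_t srcLen,
                     uint8_t* dst, uint32_t dstLen) {
  uint32_t jobSize = 0;
  qpl_get_job_size(qpl_path_hardware, &jobSize);
  qpl_job* job = reinterpret_cast<qpl_job*>(std::malloc(jobSize));
  qpl_init_job(qpl_path_hardware, job);

  job->op = qpl_op_decompress;
  job->next_in_ptr = const_cast<uint8_t*>(src);
  job->available_in = srcLen;
  job->next_out_ptr = dst;
  job->available_out = dstLen;
  // Whole gzip stream in one shot. The IAA history buffer limits the
  // deflate window to 4KB, hence the PR's 4KB-window requirement.
  job->flags = QPL_FLAG_FIRST | QPL_FLAG_LAST | QPL_FLAG_GZIP_MODE;

  qpl_submit_job(job);                    // asynchronous: returns immediately
  qpl_status status = qpl_wait_job(job);  // collect the result later

  qpl_fini_job(job);
  std::free(job);
  return status == QPL_STS_OK;
}
```

The submit/wait split is what makes "pre-decompress" possible: the job is submitted while the CPU continues with other page-reader work and the result is collected when the page is actually needed.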

netlify bot commented Aug 21, 2023: Deploy Preview for meta-velox canceled.
Latest commit: 4ab13b3 · Deploy log: https://app.netlify.com/sites/meta-velox/deploys/65c334d733aadb00085fceb0

@yaqi-zhao force-pushed the merge_branch_1 branch 2 times, most recently from 325e9c1 to 1f15edb (Aug 21, 2023)
Inline review threads: velox/dwio/parquet/reader/IAAPageReader.cpp, velox/dwio/parquet/reader/ParquetData.cpp (outdated), velox/dwio/common/QplJobPool.h (outdated), velox/dwio/common/QplJobPool.cpp (outdated)
@facebook-github-bot added the "CLA Signed" label (Aug 21, 2023). This label is managed by the Facebook bot; authors need to sign the CLA before a PR can be reviewed.
@yaqi-zhao force-pushed the merge_branch_1 branch 2 times, most recently from 74eab9d to 40c9a62 (Aug 24, 2023)
@yaqi-zhao force-pushed the merge_branch_1 branch 3 times, most recently from 6ff1026 to 842de1b (Aug 24, 2023)
@Yuhta self-requested a review (Aug 24, 2023)
@yaqi-zhao (Contributor, Author) commented:

Hi, @Yuhta. Thanks for your comments. I rebased my work on #5914: IAAPageReader is deleted, and the new logic is implemented inside PageReader according to your suggestion. Could you continue the PR review?

@yaqi-zhao force-pushed the merge_branch_1 branch 3 times, most recently from 5b84c3f to ca13be9 (Sep 27, 2023)
@yaqi-zhao force-pushed the merge_branch_1 branch 7 times, most recently from 6e20474 to c042e46 (Oct 26, 2023)
@george-gu-2021 commented:

@Yuhta, we updated the PR again; it shows an observable gain in the Spark/Gluten/Velox environment on a couple of TPC-H queries, e.g. Q15 and Q14. Could you help review it? Thanks very much!

@pedroerp (Contributor) commented Nov 3, 2023:

> @Yuhta , we updated the PR again which proves observable gain in Spark/Gluten/Velox environment w/ a couple of TPC-H queries, eg, Q15, Q14. May you help review it? Thanks very much!

Hi @george-gu-2021, can you add more context on the performance benefits you're seeing in your benchmarks? If there is a place where I can read more, please send us the link.

@george-gu-2021 commented:

> @Yuhta , we updated the PR again which proves observable gain in Spark/Gluten/Velox environment w/ a couple of TPC-H queries, eg, Q15, Q14. May you help review it? Thanks very much!
>
> HI @george-gu-2021 can you add more context on the performance benefits you're seeing in your benchmarks? If there is a place where I can read more, please send us the link

Hi @pedroerp, this PR implements "Velox parquet scan acceleration w/ Intel IAA (in-memory acceleration)", which leverages the on-die accelerator to offload gzip decompression in the Velox parquet scan. Its overall design and context are described in issue #5718. In short, the performance of some queries can improve by up to 40% compared with a parquet scan based on zstd software decompression.

Regarding the latest update for Q15 and Q14: it fixes a skip-page call flow that was missing in the initial version of the PR.

Hi @yaqi-zhao, feel free to correct me or add more info if anything is missing or incorrect. Thanks!

@pedroerp (Contributor) commented Nov 4, 2023:

@george-gu-2021 thank you for the context. From Kelly's presentation at the monthly OSS meeting, my understanding was that IAA only supported compression (hence why we were evaluating and considering it for table writes). Does it actually support decompression as well, or is that a different technology?

Cc: @mbasmanova

Inline review threads: CMakeLists.txt (outdated), third_party/CMakeLists.txt (outdated), velox/dwio/common/CMakeLists.txt (outdated), velox/dwio/common/QplJobPool.cpp (some outdated), velox/dwio/common/QplJobPool.h (some outdated)
@yaqi-zhao force-pushed the merge_branch_1 branch 3 times, most recently from 9172107 to ef56608 (Nov 7, 2023)
@yaqi-zhao (Contributor, Author) commented:

> This would be my preference. Can we start a discussion with some thoughts on how this API could be built to hide the accelerator complexity from the library (parquet reader/writer, shuffle operator, etc)? I'm concerned about polluting the codebase with too many accelerator details, making it more complex and error prone. Happy to help iterate on that design.
>
> That's why I'd commented to remove all accelerator-related logic from the s/w path. @yaqi-zhao, can you draft a design and start a discussion?

@FelixYBW @pedroerp @george-gu-2021 I have created a discussion (#7445). I added a solution introduction and a duplicated-code analysis based on the current PR. Please add your insights there. Thanks a lot!

@FelixYBW (Contributor) commented:

@yaqi-zhao you may put this PR on hold. Rong is creating the unified compression codec API, including sync and async. Let's finish that PR first.

@Yuhta (Contributor) commented Nov 20, 2023:

> Generally both QAT and IAA can be a clean codec library to be added to de-/compression. It can be reused in parquet reader/scan, shuffle and spill. Currently we already enabled them in Gluten's shuffle.

@FelixYBW I can see that a 4KB zlib window is not a standard setup for parquet files. So even if we add it to table scan, there are no files in the wild that would benefit. Would it make more sense to add it to shuffle first to see some real-world benefit?

@george-gu-2021 commented:

> Generally both QAT and IAA can be a clean codec library to be added to de-/compression. It can be reused in parquet reader/scan, shuffle and spill. Currently we already enabled them in Gluten's shuffle.
>
> @FelixYBW I can see that zlib window size 4KB is not a standard setup for parquet files. So even we add it to table scan, there is no file we can read to benefit from. Would it make more sense to add to shuffle first to see some real world benefits?

Hi @Yuhta, the window size (4KB) is a parameter that Arrow exposes in its interfaces; users can set it to their preference when generating parquet files, typically in ETL processing. Some partners are open to configuring it. In our current validation stage, we generated some 4KB-window zlib parquet streams with Velox (including the Arrow module), and we are happy to share the generation process and sample parquet streams if anyone needs them. Thanks!
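To illustrate the 4KB-window knob discussed here: in plain zlib (which gzip codecs wrap), the writer chooses the deflate window via the `windowBits` argument of `deflateInit2` — `windowBits = 12` gives a 2^12 = 4KB history window, and adding 16 selects the gzip wrapper. The sketch below uses raw zlib, not the actual Arrow/Velox writer API (the exact Arrow option name depends on the Arrow version), and the function name is hypothetical; error handling is elided.

```cpp
// Sketch only (requires linking zlib with -lz).
// windowBits = 12 + 16: 4KB deflate window with gzip framing -- the
// stream layout the current IAA generation is able to decompress.
#include <zlib.h>
#include <cstring>

int compressGzip4k(const unsigned char* src, size_t srcLen,
                   unsigned char* dst, size_t dstCap, size_t* dstLen) {
  z_stream strm;
  std::memset(&strm, 0, sizeof(strm));
  int rc = deflateInit2(&strm, Z_DEFAULT_COMPRESSION, Z_DEFLATED,
                        /*windowBits=*/12 + 16, /*memLevel=*/8,
                        Z_DEFAULT_STRATEGY);
  if (rc != Z_OK) return rc;
  strm.next_in = const_cast<unsigned char*>(src);
  strm.avail_in = static_cast<uInt>(srcLen);
  strm.next_out = dst;
  strm.avail_out = static_cast<uInt>(dstCap);
  rc = deflate(&strm, Z_FINISH);  // single-shot compression
  *dstLen = dstCap - strm.avail_out;
  deflateEnd(&strm);
  return rc == Z_STREAM_END ? Z_OK : rc;
}
```

A decoder only needs a history buffer as large as the window the encoder used, which is why a 4KB-window stream fits IAA's history buffer while a default 32KB-window stream does not.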

@Yuhta (Contributor) commented Nov 21, 2023:

> Hi @Yuhta , window size 4KB is a parameter that Arrow exposes in its interfaces and users can set that per their preference while generating parquet files typically in ETL processing. Some partners are open to config that. In our current validation stage, we generate some 4KB zlib parquet stream with Velox (including Arrow module) and we are happy to share the generation process and sample parquet streams if anyone needs those. Thanks!

But that information is not written to the parquet file, so if the file was not written by ourselves there is no way to know it. Most other writers just write with the default window size of 32KB, which forces us to assume a 32KB window most of the time. It would be nice if IAA/QAT could support a 32KB window size. Is it a hardware restriction?

// 'rowOfPage_' is the row number of the first row of the next page.
this->rowOfPage_ += this->numRowsInPage_;

if (seekToPreDecompPage(row)) {
Reviewer:

Maybe create some virtual hooks in the base class for these calls; then you don't need to duplicate the other parts in subclasses.

Author (@yaqi-zhao):

Yes, I have thought over this solution, but it would mean a lot of code changes in the current PageReader. Do you think that is reasonable?

@Yuhta (Dec 1, 2023):

Why would there be a lot more change? You just need to add a few more virtual functions with empty implementations and invoke them in the places needed. No existing logic will be touched.

Inline review thread: velox/dwio/parquet/reader/IAAPageReader.h (outdated)
}
this->updateRowInfoAfterPageSkipped();
}
if (isWinSizeFit) {
Reviewer: Hook.


return;
}
if (job_success) {
Reviewer: Hook.

BufferPtr uncompressedData;
};

class IAAPageReader : public PageReader {
Reviewer:

Seems no need for PageReaderBase if you are inheriting from the concrete PageReader.

Author (@yaqi-zhao) commented Nov 22, 2023:

The reason for creating PageReaderBase is to enable polymorphism, so that the same PageReaderBase function called by ParquetData behaves differently in different scenarios.

Reviewer:

Reading the code, I think we can do it a bit differently. What you really need in IAAPageReader is a set of extension points (hooks) where you run extra code in addition to the basic PageReader. So you can add a few virtual methods in PageReader, defaulting to no-ops, and call them in the expected places. Then in IAAPageReader you add the implementations for these hooks. Does that sound good to you?
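A minimal sketch of the hook pattern suggested here (class and method names are illustrative, not the actual Velox ones): the base reader calls a no-op virtual method at the extension point, and the IAA-aware subclass overrides only that method, so the shared read path is never duplicated.

```cpp
#include <string>

// Illustrative sketch: the base class owns the shared page-reading logic
// and exposes a no-op virtual hook at the extension point.
class PageReader {
 public:
  virtual ~PageReader() = default;

  // Shared logic lives here once; subclasses never copy it. The returned
  // trace string just makes the call order visible for this sketch.
  std::string readPage() {
    std::string trace = "readHeader;";
    trace += onPageDecompress();  // extension point, no-op by default
    trace += "decodeValues;";
    return trace;
  }

 protected:
  virtual std::string onPageDecompress() { return ""; }
};

// The accelerator-aware subclass implements only the extra step.
class IAAPageReader : public PageReader {
 protected:
  std::string onPageDecompress() override { return "waitIaaJob;"; }
};
```

With this shape, removing IAA support (or adding another accelerator) touches only the subclass; the existing PageReader logic is unchanged, which is the point of the suggestion above.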

dictionaryEncoding_ == Encoding::PLAIN);

if (codec_ != thrift::CompressionCodec::UNCOMPRESSED) {
if (job_success) {
Reviewer: Hook.

@george-gu-2021 commented:

> Hi @Yuhta , window size 4KB is a parameter that Arrow exposes in its interfaces and users can set that per their preference while generating parquet files typically in ETL processing. Some partners are open to config that. In our current validation stage, we generate some 4KB zlib parquet stream with Velox (including Arrow module) and we are happy to share the generation process and sample parquet streams if anyone needs those. Thanks!
>
> But that information is not written to the parquet file so if the file is not written by ourselves there is no way to get it. Most of the other writers just write with the default window size of 32KB. Would be nice if IAA/QAT can support 32 KB window size. Is it an hardware restriction?

Recently most SW stacks use the default setting (32KB), except in scenarios that purposely set a 4KB history buffer to cope with memory-capacity constraints and avoid OOM or spill operations. Regarding HW capability, the current generation of IAA does have a 4KB history-buffer limitation as well. That is why we propose adding this logic as an option, so users can leverage the IAA HW where it is applicable. Thanks! @Yuhta

@FelixYBW (Contributor) commented:

> Generally both QAT and IAA can be a clean codec library to be added to de-/compression. It can be reused in parquet reader/scan, shuffle and spill. Currently we already enabled them in Gluten's shuffle.
>
> @FelixYBW I can see that zlib window size 4KB is not a standard setup for parquet files. So even we add it to table scan, there is no file we can read to benefit from. Would it make more sense to add to shuffle first to see some real world benefits?

We already added it to Gluten's shuffle and are working on spill support through the unified compression codec API in this PR: #7589

@george-gu-2021 commented:

> Generally both QAT and IAA can be a clean codec library to be added to de-/compression. It can be reused in parquet reader/scan, shuffle and spill. Currently we already enabled them in Gluten's shuffle.
>
> @FelixYBW I can see that zlib window size 4KB is not a standard setup for parquet files. So even we add it to table scan, there is no file we can read to benefit from. Would it make more sense to add to shuffle first to see some real world benefits?
>
> We already added it to Gluten's shuffle and are working on Spill through the unified Compression API codec in this PR: #7589

Hi @FelixYBW, partners will like this feature! Per our communication with them, they are highly interested in leveraging Gluten/Velox to conduct ETL and generate parquet data once the feature is ready.

@yingsu00 (Collaborator) left a comment:

There should not be any major change in the Parquet folder, but let's wait for #7471 to be merged first.

@@ -116,6 +116,7 @@ class ReaderBase {
std::shared_ptr<const dwio::common::TypeWithId> schemaWithId_;

const bool binaryAsString = false;
bool needPreDecomp = true;
Reviewer (Collaborator):

These should be in dwio::common::compression.

@yaqi-zhao force-pushed the merge_branch_1 branch 2 times, most recently from b3ca532 to 4ab13b3 (Feb 7, 2024)
stale bot commented May 8, 2024:

This pull request has been automatically marked as stale because it has not had recent activity. If you'd still like this PR merged, please comment on the PR, make sure you've addressed reviewer comments, and rebase on the latest main. Thank you for your contributions!

@stale stale bot added the stale label (May 8, 2024)
@stale stale bot closed this (May 24, 2024)