MCOL-987 LZ4 compression support. #1842

Merged
drrtuy merged 1 commit into mariadb-corporation:develop on Jul 7, 2021


denis0x0D
Contributor

@denis0x0D denis0x0D commented Apr 2, 2021

LZ4 compression support.

@denis0x0D denis0x0D marked this pull request as draft April 2, 2021 16:43
@denis0x0D denis0x0D force-pushed the MCOL-987_LZ branch 6 times, most recently from c24f6a9 to 5d3d766 Compare April 6, 2021 17:06
@denis0x0D denis0x0D marked this pull request as ready for review April 6, 2021 18:46
@denis0x0D denis0x0D changed the title [WIP] MCOL-987 LZ4 compression support. MCOL-987 LZ4 compression support. Apr 6, 2021
@denis0x0D
Contributor Author

Note: I made LZ4 the default in a separate commit on top of this pull request, to run the test suite under LZ4 compression. That commit should be removed before merging.

@denis0x0D denis0x0D force-pushed the MCOL-987_LZ branch 3 times, most recently from c2b07c4 to 9c886a5 Compare April 7, 2021 16:12
utils/compress/idbcompress.cpp (outdated)
utils/compress/idbcompress.cpp (outdated)
utils/joiner/joinpartition.h
utils/messageqcpp/compressed_iss.h
oam/etc/Columnstore.xml (outdated)
utils/compress/idbcompress.h (outdated)
@denis0x0D
Contributor Author

denis0x0D commented Apr 9, 2021

@mariadb-AlexeyAntipovsky thanks for the review! I'll address your comments.

@denis0x0D denis0x0D force-pushed the MCOL-987_LZ branch 2 times, most recently from 8b0a506 to 394a356 Compare April 9, 2021 18:12
@denis0x0D denis0x0D force-pushed the MCOL-987_LZ branch 2 times, most recently from 98a6b22 to 863bd36 Compare June 29, 2021 14:33
dbcon/mysql/ha_mcs_ddl.cpp
@@ -696,13 +696,25 @@ void loadBlock (
i = fp->pread( &cmpHdrBuf[0], 0, 4096 * 3);
Contributor

Is 4096 here and below an arbitrary number, or is it HDR_BUF_LEN?

Contributor Author

@denis0x0D denis0x0D Jul 6, 2021

It seems to be HDR_BUF_LEN. I'm not sure why the original author wrote 4096 instead of HDR_BUF_LEN.
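
For illustration, the magic number discussed here could be expressed through named constants. This is only a sketch: the value of `HDR_BUF_LEN` is assumed to be 4096 (the thread only says it "seems like" it), and `HDR_READ_BLOCKS`, `headerReadSize`, and `makeHeaderBuffer` are hypothetical names that do not come from the patch.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Assumed value: the thread only says that 4096 "seems like" HDR_BUF_LEN.
constexpr std::size_t HDR_BUF_LEN = 4096;
// Hypothetical name for the `* 3` multiplier in the pread call.
constexpr std::size_t HDR_READ_BLOCKS = 3;

// Number of bytes the header read requests, expressed via the named constants.
constexpr std::size_t headerReadSize()
{
    return HDR_BUF_LEN * HDR_READ_BLOCKS;
}

// Allocates a zero-filled buffer sized for that read.
inline std::vector<char> makeHeaderBuffer()
{
    std::vector<char> buf(headerReadSize());
    assert(buf.size() == 4096u * 3u);  // same byte count as the literal in the diff
    return buf;
}
```

With constants like these, the call in the diff would read `fp->pread(&cmpHdrBuf[0], 0, headerReadSize())`, assuming the buffer is sized the same way.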

utils/compress/idbcompress.cpp (outdated)
utils/messageqcpp/compressed_iss.cpp (outdated)
writeengine/bulk/we_colbufcompressed.cpp (outdated)
@@ -95,7 +95,11 @@ DROP PROCEDURE IF EXISTS `compression_ratio` //

CREATE PROCEDURE compression_ratio() SQL SECURITY INVOKER
BEGIN
SELECT CONCAT((SELECT SUM(data_size) FROM information_schema.columnstore_extents ce left join information_schema.columnstore_columns cc on ce.object_id = cc.object_id where compression_type='Snappy') / (SELECT SUM(compressed_data_size) FROM information_schema.columnstore_files WHERE compressed_data_size IS NOT NULL), ':1') COMPRESSION_RATIO;

SELECT CONCAT((SELECT SUM(data_size) FROM information_schema.columnstore_extents ce left join information_schema.columnstore_columns cc on ce.object_id = cc.object_id where compression_type='Snappy') / (SELECT SUM(compressed_data_size) FROM information_schema.columnstore_files co left join information_schema.columnstore_columns cc on (co.object_id = cc.object_id) left join information_schema.columnstore_extents ce on (ce.object_id = co.object_id) where compression_type='Snappy' and compressed_data_size IS NOT NULL /* could be a situation when compressed_data_size != NULL but data_size == 0, in this case we will get wrong ratio */ and data_size > 0), ':1') COMPRESSION_RATIO_SNAPPY;
Collaborator

I think we need to add a literal column that says which compression type each row is for, and combine the two SELECTs into a single UNION, if UNION doesn't crash with queries from I_S :)

Contributor Author

I don't know SQL well enough. Could you please show an example? Thanks.

@denis0x0D
Contributor Author

@mariadb-AlexeyAntipovsky @drrtuy thanks for the review, updated. I've added questions on some of the comments. Thanks.

* Adds `CompressInterfaceLZ4`, which uses the LZ4 API for compress/uncompress.
* Adds CMake machinery to find LZ4 on the host.
* All methods that use only static data and do not modify any internal data become `static`,
  so they can be called without creating a specific object. This is possible because
  the header specification has not been modified: the header still has 2 sections, the first
  with file metadata and the second with pointers to the compressed chunks.
* The methods `compress`, `uncompress`, `maxCompressedSize`, and `getUncompressedSize` become
  pure virtual, so they can be overridden for other compression algorithms.
* Adds the method `getChunkMagicNumber`, so the chunk magic number can be verified
  for each compression algorithm.
* Renames `IDBCompressInterface` to `CompressInterface` (`s/IDBCompressInterface/CompressInterface/g`), as required.
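
The interface shape described in the list above can be sketched roughly as follows. This is not the code from the patch: the method names come from the description, but every signature is an assumption, `chunkMagicOf` and `roundTrip` are hypothetical helpers, and `StoredCodec` is a toy codec that copies bytes behind a made-up `[magic][length][payload]` layout so the example builds without the LZ4 library.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string>
#include <vector>

// Made-up chunk layout for this sketch: [magic:1][length:4][payload].
constexpr std::uint8_t STORED_MAGIC = 0x7f;
constexpr std::size_t STORED_PREFIX = 1 + sizeof(std::uint32_t);

class CompressInterface
{
public:
    virtual ~CompressInterface() = default;
    // Algorithm-specific operations are pure virtual. On entry *outLen is the
    // capacity of `out`; on success it holds the number of bytes written.
    virtual bool compress(const char* in, std::size_t inLen, char* out, std::size_t* outLen) const = 0;
    virtual bool uncompress(const char* in, std::size_t inLen, char* out, std::size_t* outLen) const = 0;
    virtual std::size_t maxCompressedSize(std::size_t uncompressedLen) const = 0;
    virtual bool getUncompressedSize(const char* in, std::size_t inLen, std::size_t* outLen) const = 0;
    virtual std::uint8_t getChunkMagicNumber() const = 0;
    // Reads only the buffer it is given, so it can be static (hypothetical helper).
    static std::uint8_t chunkMagicOf(const char* chunk) { return static_cast<std::uint8_t>(chunk[0]); }
};

// Toy stand-in for an LZ4-backed implementation: stores the payload uncompressed.
class StoredCodec final : public CompressInterface
{
public:
    std::size_t maxCompressedSize(std::size_t n) const override { return n + STORED_PREFIX; }

    bool compress(const char* in, std::size_t inLen, char* out, std::size_t* outLen) const override
    {
        if (inLen > 0xffffffffu || *outLen < maxCompressedSize(inLen)) return false;
        const std::uint32_t len = static_cast<std::uint32_t>(inLen);
        out[0] = static_cast<char>(STORED_MAGIC);
        std::memcpy(out + 1, &len, sizeof(len));
        if (inLen > 0) std::memcpy(out + STORED_PREFIX, in, inLen);
        *outLen = STORED_PREFIX + inLen;
        return true;
    }

    bool getUncompressedSize(const char* in, std::size_t inLen, std::size_t* outLen) const override
    {
        if (inLen < STORED_PREFIX || chunkMagicOf(in) != STORED_MAGIC) return false;
        std::uint32_t len = 0;
        std::memcpy(&len, in + 1, sizeof(len));
        if (len != inLen - STORED_PREFIX) return false;  // length field must match the chunk
        *outLen = len;
        return true;
    }

    bool uncompress(const char* in, std::size_t inLen, char* out, std::size_t* outLen) const override
    {
        std::size_t len = 0;
        if (!getUncompressedSize(in, inLen, &len) || *outLen < len) return false;
        if (len > 0) std::memcpy(out, in + STORED_PREFIX, len);
        *outLen = len;
        return true;
    }

    std::uint8_t getChunkMagicNumber() const override { return STORED_MAGIC; }
};

// Compresses and decompresses `input` through the interface only, checking the
// chunk magic number in between. Returns false if any step fails.
inline bool roundTrip(const CompressInterface& codec, const std::string& input, std::string* output)
{
    assert(output != nullptr);
    std::vector<char> packed(codec.maxCompressedSize(input.size()));
    std::size_t packedLen = packed.size();
    if (!codec.compress(input.data(), input.size(), packed.data(), &packedLen)) return false;
    if (CompressInterface::chunkMagicOf(packed.data()) != codec.getChunkMagicNumber()) return false;
    std::size_t rawLen = 0;
    if (!codec.getUncompressedSize(packed.data(), packedLen, &rawLen)) return false;
    std::vector<char> raw(rawLen);
    if (!codec.uncompress(packed.data(), packedLen, raw.data(), &rawLen)) return false;
    output->assign(raw.begin(), raw.end());
    return true;
}
```

Because `roundTrip` depends only on the abstract interface, a second implementation (for example one that wraps LZ4) could be passed in without changing the caller, which is the point of making these methods pure virtual.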
@drrtuy drrtuy self-requested a review July 7, 2021 10:12
@drrtuy drrtuy merged commit 866dc25 into mariadb-corporation:develop Jul 7, 2021
4 participants