MCOL-4566: Add rebuildEM tool support to work with compressed files. #1808


denis0x0D
Contributor

  • This patch adds support to the rebuildEM tool for working with compressed files.
  • This patch adds a test; currently the test works only for users with write
    access to systemcat.

Note: The default version of the rebuildEM tool used a very old API whose
functions no longer exist. As a result, rebuildEM will not work with files
created without compression, because we cannot deduce some of the information
needed to create a column extent.

@drrtuy drrtuy self-requested a review March 11, 2021 10:49
@denis0x0D denis0x0D changed the title MCOL-4566: Add rebuildEM tool support to work with compressed files. [WIP] MCOL-4566: Add rebuildEM tool support to work with compressed files. Mar 11, 2021
@denis0x0D denis0x0D force-pushed the MCOL-4566/rebuild_em_compressed branch from 47d979f to 009bf0a Compare March 11, 2021 18:02
@denis0x0D
Contributor Author

Fixes:

  1. Fixed handling of the old header version.
  2. Added support for dictionary segment files.

@denis0x0D denis0x0D force-pushed the MCOL-4566/rebuild_em_compressed branch from 009bf0a to 70571ff Compare March 11, 2021 18:05
@denis0x0D denis0x0D changed the title [WIP] MCOL-4566: Add rebuildEM tool support to work with compressed files. MCOL-4566: Add rebuildEM tool support to work with compressed files. Mar 11, 2021
@denis0x0D denis0x0D changed the title MCOL-4566: Add rebuildEM tool support to work with compressed files. [WIP] MCOL-4566: Add rebuildEM tool support to work with compressed files. Mar 15, 2021
@denis0x0D denis0x0D force-pushed the MCOL-4566/rebuild_em_compressed branch from 70571ff to 3ba79b4 Compare March 16, 2021 18:47
@denis0x0D
Contributor Author

denis0x0D commented Mar 17, 2021

Fixes:

  1. Added support for initializing system tables from a binary blob.
  2. Set localHWM and mark the extent as available.
  3. Check the status of an extent before creating it.
  4. Refactored the logic:
     a) First we walk the dbroot and collect all the needed information (oid, partition, segment, width, column data type, isDict). The collected data is stored in sorted order via set<(extent info), oidComparator>.
     b) Then we create extents from the collected data.
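
The collect-then-create flow from item 4 could be sketched roughly as below. This is only an illustration, not the code from the patch: `FileInfo`, `OidComparator`, `collectFiles` and `creationOrder` are hypothetical names, the column data type field is left out for brevity, and ordering by (oid, partition, segment) is an assumption about what the comparator does.

```cpp
#include <cstdint>
#include <set>
#include <tuple>
#include <vector>

// Hypothetical record describing one segment file found under a dbroot.
struct FileInfo
{
    uint32_t oid;
    uint32_t partition;
    uint32_t segment;
    uint32_t colWidth;
    bool isDict;
};

// Assumed ordering: by oid, then partition, then segment, so that every
// segment file gets its own entry in the set.
struct OidComparator
{
    bool operator()(const FileInfo& a, const FileInfo& b) const
    {
        return std::tie(a.oid, a.partition, a.segment) <
               std::tie(b.oid, b.partition, b.segment);
    }
};

using FileInfoSet = std::set<FileInfo, OidComparator>;

// Phase 1: collect everything discovered in the dbroot into a sorted set.
FileInfoSet collectFiles(const std::vector<FileInfo>& discovered)
{
    return FileInfoSet(discovered.begin(), discovered.end());
}

// Phase 2: walk the set in sorted order. A real tool would create an extent
// here; this sketch only records the oid of each visited file.
std::vector<uint32_t> creationOrder(const FileInfoSet& files)
{
    std::vector<uint32_t> order;
    for (const auto& file : files)
        order.push_back(file.oid);
    return order;
}
```

Separating the two phases means the order in which files are discovered on disk no longer affects the order in which extents are created.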

@denis0x0D denis0x0D force-pushed the MCOL-4566/rebuild_em_compressed branch from 3ba79b4 to 88e7920 Compare March 18, 2021 20:31
@denis0x0D
Contributor Author

Fixes:

  1. Fixed the order of extent creation. Currently we keep the extents sorted by lbid and create them in a greedy way.

@denis0x0D
Contributor Author

denis0x0D commented Mar 19, 2021

This tool is not compatible with the cpimport tool, because cpimport uses BulkLoad. I am investigating how large a change is needed to support it. This means that if we insert data with $ cpimport temp t1 temp.tbl, we cannot restore the extent map.

@denis0x0D denis0x0D marked this pull request as draft March 19, 2021 11:23
@denis0x0D denis0x0D force-pushed the MCOL-4566/rebuild_em_compressed branch 2 times, most recently from e4dad91 to 4f726cc Compare March 23, 2021 20:23
@denis0x0D
Contributor Author

Added fixes:

  1. Proper HWM recovery from segment file.
  2. Added support for bulk insertion via cpimport.

Current limitation: it does not work with multiple extents per segment file.
We can easily detect such files when the recovered hwm >= (columnWidth * numExtentRows) / blockSizeInBytes, but there is no straightforward way to create the extents in the same order as they were created originally, because the starting lbid is not known for each extent.
Tested with different table schemas, for example:
create table t1 (a int, b varchar(255), c int, d varchar(255), e int, f varchar(255)) engine=columnstore;
and inserting 20M rows into the table via cpimport.
In this case bulk load will create 3 segment files for each int column and 6 for each varchar column (1 segment file with tokens and 1 dictionary file).
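
To make the detection condition above concrete, here is a minimal sketch of the check. The constants are assumptions for illustration (8192-byte blocks and 8M rows per extent), not values taken from this patch, and `blocksPerExtent` and `spansMultipleExtents` are hypothetical helper names.

```cpp
#include <cstdint>

// Assumed values for illustration; the real ones come from the build/config.
constexpr uint64_t kBlockSizeInBytes = 8192;
constexpr uint64_t kNumExtentRows = 8ULL * 1024 * 1024;

// Number of blocks one full extent occupies for a column of the given width.
constexpr uint64_t blocksPerExtent(uint64_t columnWidth)
{
    return (columnWidth * kNumExtentRows) / kBlockSizeInBytes;
}

// True when the recovered HWM reaches past the first extent, which means the
// segment file holds more than one extent.
constexpr bool spansMultipleExtents(uint64_t hwm, uint64_t columnWidth)
{
    return hwm >= blocksPerExtent(columnWidth);
}
```

Under these assumed constants a 4-byte int column fills 4096 blocks per extent, so a recovered HWM of 4096 or more would flag the file.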

@denis0x0D denis0x0D marked this pull request as ready for review March 23, 2021 20:36
@denis0x0D denis0x0D changed the title [WIP] MCOL-4566: Add rebuildEM tool support to work with compressed files. MCOL-4566: Add rebuildEM tool support to work with compressed files. Mar 23, 2021
@denis0x0D denis0x0D force-pushed the MCOL-4566/rebuild_em_compressed branch from 4f726cc to 398c07e Compare March 24, 2021 10:39
@denis0x0D
Contributor Author

Fixes related to CI build.

@denis0x0D denis0x0D force-pushed the MCOL-4566/rebuild_em_compressed branch 2 times, most recently from 28dee75 to 8c97b9b Compare March 25, 2021 16:26
@denis0x0D
Contributor Author

Final fixes:

  1. Proper HWM calculation for dictionary files.
  2. Added support for 2 extents per segment file; this could be increased if needed. In practice bulk load can create more than 2, even if the config sets it to 2.

@denis0x0D denis0x0D force-pushed the MCOL-4566/rebuild_em_compressed branch 3 times, most recently from c426315 to ee71e95 Compare March 29, 2021 16:32
@denis0x0D
Contributor Author

Rebase on develop and add review fixes.

Collaborator

@drrtuy drrtuy left a comment

I've shared in Slack some aspects that should be addressed.

@denis0x0D
Contributor Author

Initialize chunkManager with ColumnOpCompress1 and DctnryOpCompress1 instead of the defaults. This does not affect the current version of chunkManager, but once we have more than one compression algorithm and obtain the compression interface by compression type, a compression type of 0 could cause an error.
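
A rough sketch of why a compression type of 0 becomes a problem once the interface is looked up by type. Everything here is hypothetical: `compressorNameFor` and the returned names are made up for illustration and are not part of chunkManager or any existing API.

```cpp
#include <cstdint>
#include <stdexcept>
#include <string>

// Hypothetical lookup: each supported compression type maps to an
// implementation, represented here by a plain name.
std::string compressorNameFor(uint32_t compressionType)
{
    switch (compressionType)
    {
        case 1:
            return "compress-type-1";
        case 2:
            return "compress-type-2";
        default:
            // Type 0 means "no compression", so there is nothing to return.
            throw std::invalid_argument("no compressor for compression type " +
                                        std::to_string(compressionType));
    }
}
```

With a lookup of this shape, a caller that was initialized with the default (type 0) would fail as soon as it asks for a compressor, which is what initializing with the compressed variants avoids.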

* This patch adds support to the rebuildEM tool for working with compressed files.
* This patch increases the version of the file header.

Note: The default version of the `rebuildEM` tool used a very old API whose
functions no longer exist. As a result, `rebuildEM` will not work with files
created without compression, because we cannot deduce some of the information
needed to create a column extent.
@denis0x0D denis0x0D force-pushed the MCOL-4566/rebuild_em_compressed branch from eace69a to 5d497e8 Compare April 2, 2021 07:57
@drrtuy drrtuy self-requested a review April 2, 2021 14:52
@drrtuy drrtuy merged commit 05863a3 into mariadb-corporation:develop Apr 2, 2021
3 participants