Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Mcol 4673 regression cal show partition returns na #1891

Conversation

mariadb-SergeyZefirov
Copy link
Contributor

The tricky part here is to 1) keep dropped ranged dropped and 2) keep set ranges set. The first part should be compatible with cpimport which utilizes dropping ranges to CP_UPDATING state and then to CP_VALID/CP_INVALID.

This is why we keep CP_UPDATING state notwithstanding that CP_INVALID is pretty sufficient for all purposes.

@abarkov
Copy link
Contributor

abarkov commented Apr 27, 2021

Sergey, please squish the changes into a single commit.
Enabling udf_calshowpartitions.test for mtr --extern looks Ok for me.

I can't say anything for the code change as I'm not strong with that part of the code.

@mariadb-SergeyZefirov mariadb-SergeyZefirov force-pushed the MCOL-4673-Regression-calShowPartition-returns-NA branch from f63be64 to 04284e8 Compare April 27, 2021 11:49
@mariadb-SergeyZefirov
Copy link
Contributor Author

Sergey, please squish the changes into a single commit.

Done!

Enabling udf_calshowpartitions.test for mtr --extern looks Ok for me.

I can't say anything for the code change as I'm not strong with that part of the code.

Let me explain a bit - for you or for others who may be interested.

cpimport and write engine as a whole marks extents as UPDATING during update operation. Old version of code, prior to MCOL-2044, was marking these extents as UPDATING and then marked them as INVALID during final stage - this is what invalidateuncommittedExtents does.

Prior to MCOL-2044, it was done unconditionally, for both COMMIT and ROLLBACK operations. MCOL-2044 introduced keeping of ranges and blindly dropping ranges to INVALID stand in the way of the ultimate goal of I/O reduction.

So I put invalidateUncommittedExtents under condition of "NOT COMMIT" operation and that change left some of the extents marked as UPDATING as, well, UPDATING.

The scanning process in SELECT part of the test did not change UPDATING state to VALID state.

The calshowpartitions function does not distinguish between INVALID and UPDATING state and shows information for extents in these states as N/A. That is wrong.

So I changed the logic of invalidateUncommittedExtents to this: if extent was in UPDATING state, mark it INVALID. if extent in the VALID state (write engine has kept ranges up to date), do not mark it as INVALID.

In this case we have ranges kept and extents scans working properly.

@tntnatbry
Copy link
Contributor

@mariadb-SergeyZefirov At a high level, the change looks OK to me. I added a couple comments however.

I do have a question: As far as I understand, the goal of MCOL-2044 is to keep the extents in a VALID state during DML operations. The queries from MCOL-4673 issue description are the following:

DROP TABLE IF EXISTS t1;
CREATE TABLE t1 (a DECIMAL(17,1)) ENGINE=ColumnStore;
INSERT INTO t1 VALUES (-8999999999999999.9);
SELECT * FROM t1 WHERE a=0;
SELECT calShowPartitions('t1','a');

Here, we create a single extent during the insert operation and set it to a VALID state of range (-8999999999999999.9,-8999999999999999.9)

So why would calShowPartitions return result as N/A, as in MCOL-4673?

@mariadb-SergeyZefirov mariadb-SergeyZefirov force-pushed the MCOL-4673-Regression-calShowPartition-returns-NA branch from 04284e8 to 3db001b Compare April 28, 2021 14:01
Copy link
Contributor Author

@mariadb-SergeyZefirov mariadb-SergeyZefirov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mariadb-SergeyZefirov At a high level, the change looks OK to me. I added a couple comments however.

Please, approve then.

I do have a question: As far as I understand, the goal of MCOL-2044 is to keep the extents in a VALID state during DML operations. The queries from MCOL-4673 issue description are the following:

The correct wording would be "is try to keep extents in a VALID state" - it is not always possible, at least, not for all types of extents, thus we can only try.

DROP TABLE IF EXISTS t1;
CREATE TABLE t1 (a DECIMAL(17,1)) ENGINE=ColumnStore;
INSERT INTO t1 VALUES (-8999999999999999.9);
SELECT * FROM t1 WHERE a=0;
SELECT calShowPartitions('t1','a');

Here, we create a single extent during the insert operation and set it to a VALID state of range (-8999999999999999.9,-8999999999999999.9)

So why would calShowPartitions return result as N/A, as in MCOL-4673?

The insert operation for single value does not do the extent keeping job - I overlooked that implementing MCOL-2044. It was a part of much needed refactoring but I was too tired to do that.

The extent(s) in single value insertion are marked as UPDATING (with a call to an ironically named function _markInvalid in extentmap.cpp) and should be brought to INVALID at the end of transaction, during COMMIT or ROLLBACK operations. The call is invalidateUncommittedExtents, with implementation in dbrm.cpp, I think.

The change in MCOL-2044 was to invalidate (set to INVALID state) extents conditionally, only not on COMMIT. This change left UPDATING extents marked as UPDATING after COMMIT.

Extent scan operations do not change extent state to VALID when extent is in UPDATING state. Thus, the SELECT query did not change extent state to VALID (from assumed INVALID).

As calShowPartitions does not discern between UPDATING and INVALID (it looks only for VALID value), the end result was N/A as for INVALID extents.

What I did in this change is to remove condition to call invalidateUncommittedExtents and to change behavior there: to mark INVALID only extents that were in UPDATING state.

This made subsequent scan operations to set ranges properly and test to pass.

versioning/BRM/extentmap.cpp Show resolved Hide resolved
@tntnatbry
Copy link
Contributor

@mariadb-SergeyZefirov Thanks for the explanation. Can you please fix the original commit message?

@mariadb-SergeyZefirov mariadb-SergeyZefirov force-pushed the MCOL-4673-Regression-calShowPartition-returns-NA branch from 3db001b to f63be64 Compare May 17, 2021 11:22
@mariadb-SergeyZefirov mariadb-SergeyZefirov force-pushed the MCOL-4673-Regression-calShowPartition-returns-NA branch from f63be64 to 47b1430 Compare May 17, 2021 11:25
@drrtuy drrtuy self-requested a review May 24, 2021 08:55
@drrtuy drrtuy merged commit 8699d1d into mariadb-corporation:develop May 24, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
4 participants