New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Mcol 4673 regression cal show partition returns na #1891
Mcol 4673 regression cal show partition returns na #1891
Conversation
|
Sergey, please squish the changes into a single commit. I can't say anything for the code change as I'm not strong with that part of the code. |
f63be64
to
04284e8
Compare
Done!
Let me explain a bit - for you or for others who may be interested. cpimport and write engine as a whole marks extents as UPDATING during update operation. Old version of code, prior to MCOL-2044, was marking these extents as UPDATING and then marked them as INVALID during final stage - this is what invalidateuncommittedExtents does. Prior to MCOL-2044, it was done unconditionally, for both COMMIT and ROLLBACK operations. MCOL-2044 introduced keeping of ranges and blindly dropping ranges to INVALID stand in the way of the ultimate goal of I/O reduction. So I put invalidateUncommittedExtents under condition of "NOT COMMIT" operation and that change left some of the extents marked as UPDATING as, well, UPDATING. The scanning process in SELECT part of the test did not change UPDATING state to VALID state. The calshowpartitions function does not distinguish between INVALID and UPDATING state and shows information for extents in these states as N/A. That is wrong. So I changed the logic of invalidateUncommittedExtents to this: if extent was in UPDATING state, mark it INVALID. if extent in the VALID state (write engine has kept ranges up to date), do not mark it as INVALID. In this case we have ranges kept and extents scans working properly. |
|
@mariadb-SergeyZefirov At a high level, the change looks OK to me. I added a couple comments however. I do have a question: As far as I understand, the goal of MCOL-2044 is to keep the extents in a VALID state during DML operations. The queries from MCOL-4673 issue description are the following: Here, we create a single extent during the insert operation and set it to a VALID state of range (-8999999999999999.9,-8999999999999999.9) So why would |
04284e8
to
3db001b
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@mariadb-SergeyZefirov At a high level, the change looks OK to me. I added a couple comments however.
Please, approve then.
I do have a question: As far as I understand, the goal of MCOL-2044 is to keep the extents in a VALID state during DML operations. The queries from MCOL-4673 issue description are the following:
The correct wording would be "is try to keep extents in a VALID state" - it is not always possible, at least, not for all types of extents, thus we can only try.
DROP TABLE IF EXISTS t1; CREATE TABLE t1 (a DECIMAL(17,1)) ENGINE=ColumnStore; INSERT INTO t1 VALUES (-8999999999999999.9); SELECT * FROM t1 WHERE a=0; SELECT calShowPartitions('t1','a');Here, we create a single extent during the insert operation and set it to a VALID state of range (-8999999999999999.9,-8999999999999999.9)
So why would
calShowPartitionsreturn result as N/A, as in MCOL-4673?
The insert operation for single value does not do the extent keeping job - I overlooked that implementing MCOL-2044. It was a part of much needed refactoring but I was too tired to do that.
The extent(s) in single value insertion are marked as UPDATING (with a call to an ironically named function _markInvalid in extentmap.cpp) and should be brought to INVALID at the end of transaction, during COMMIT or ROLLBACK operations. The call is invalidateUncommittedExtents, with implementation in dbrm.cpp, I think.
The change in MCOL-2044 was to invalidate (set to INVALID state) extents conditionally, only not on COMMIT. This change left UPDATING extents marked as UPDATING after COMMIT.
Extent scan operations do not change extent state to VALID when extent is in UPDATING state. Thus, the SELECT query did not change extent state to VALID (from assumed INVALID).
As calShowPartitions does not discern between UPDATING and INVALID (it looks only for VALID value), the end result was N/A as for INVALID extents.
What I did in this change is to remove condition to call invalidateUncommittedExtents and to change behavior there: to mark INVALID only extents that were in UPDATING state.
This made subsequent scan operations to set ranges properly and test to pass.
|
@mariadb-SergeyZefirov Thanks for the explanation. Can you please fix the original commit message? |
3db001b
to
f63be64
Compare
f63be64
to
47b1430
Compare
The tricky part here is to 1) keep dropped ranged dropped and 2) keep set ranges set. The first part should be compatible with cpimport which utilizes dropping ranges to CP_UPDATING state and then to CP_VALID/CP_INVALID.
This is why we keep CP_UPDATING state notwithstanding that CP_INVALID is pretty sufficient for all purposes.