Disabling split by varchar #1414

davidducos · 2024-02-22T21:10:22Z

I decided that we are going to disable chunking by varchar columns.
It was not implemented correctly.
It was a good exercise, but it is not mature enough.

We need to develop a better understanding of the use cases where it will be useful (for instance, when binary or utf8mb4 is used) and of the best approach to split the chunks.

midenok · 2024-03-05T04:41:13Z

Any evidence that it doesn't work correctly?

davidducos · 2024-03-06T12:44:14Z

@midenok,
when you mix digits with upper- and lowercase characters, you might end up exporting the same chunks at different moments. This was not detected before due to checksum bugs.
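
One way this kind of duplication can arise is sketched below. It is an illustration only, not mydumper's code: it assumes the chunk boundaries are chosen in byte order (digits < uppercase < lowercase) while the server evaluates the range predicate with a case-insensitive collation. The boundaries, the sample values and the helper names (`in_range_bytes`, `in_range_ci`, `count_matches`) are all hypothetical, and `str.casefold()` is only a rough stand-in for a `_ci` collation on plain ASCII.

```python
# Hypothetical chunk boundaries, disjoint when compared in byte order:
# digits, then A-M, then N-Z (plus a few symbols), then a-m, then n-z.
RANGES = [("0", "A"), ("A", "N"), ("N", "a"), ("a", "n"), ("n", "{")]


def in_range_bytes(value, low, high):
    """Half-open test [low, high) using code point order (what the splitter assumed)."""
    return low <= value < high


def in_range_ci(value, low, high):
    """Half-open test [low, high) after case folding (rough stand-in for a _ci collation)."""
    return low.casefold() <= value.casefold() < high.casefold()


def count_matches(values, ranges, predicate):
    """For each value, count how many chunk ranges would select it."""
    return {v: sum(predicate(v, lo, hi) for lo, hi in ranges) for v in values}


values = ["1abc", "Apple", "apple", "Mango", "Zed"]

# In byte order every value lands in exactly one chunk.
by_bytes = count_matches(values, RANGES, in_range_bytes)

# With a case-insensitive comparison, ("A", "N") and ("a", "n") become the
# same range, so anything starting with a-m / A-M is selected by two chunks.
by_ci = count_matches(values, RANGES, in_range_ci)

print(by_bytes)
print(by_ci)
```

Under these assumptions, "Apple", "apple" and "Mango" are each selected by two chunks, which would show up as the same rows being exported more than once.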

midenok · 2024-03-06T23:51:17Z

What is the consequence of this? Duplicate PK?

davidducos · 2024-03-07T11:57:19Z

Hi @midenok, yes, data was being exported twice (or more), and in some scenarios it was not being exported at all due to invalid characters.
I checked the mysqlsh strategy and I don't like it, as it does a SELECT MIN,MAX LIMIT N, which might take about as long as executing the SELECT that extracts the data. I think that we can do better, but we need to take the character set and collation into consideration so that we don't leave gaps, and we need to make it performant.
The problem to solve is how we determine the min and max when we create a new chunk, taking into consideration the estimated number of rows to export.
Taking into account that we have utf8mb4 now, we need to plan a good strategy... my first implementation was good for understanding this and I learned a lot, but it was not going to scale and it didn't work in all use cases.

Disabling split by varchar

3fd023c

davidducos added the Refactoring label Feb 22, 2024

davidducos added this to the Release 0.16.1-1 milestone Feb 22, 2024

davidducos closed this Feb 22, 2024

davidducos reopened this Feb 22, 2024

davidducos merged commit 530c689 into master Feb 22, 2024
31 of 35 checks passed

davidducos deleted the disabling_split_by_varchar branch March 12, 2024 15:46
