-
Notifications
You must be signed in to change notification settings - Fork 0
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Wip materialize my sql support enum data type #3
Wip materialize my sql support enum data type #3
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good, just one nitpick.
src/Core/MySQL/MySQLReplication.cpp
Outdated
case MYSQL_TYPE_ENUM: | ||
{ | ||
UInt8 val = 0; | ||
payload.readStrict(reinterpret_cast<char *>(&val), 1); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The payload may be one or two bytes here, depending on the enum. meta & 0xFF
should give you the number of bytes.
/// For example ENUM('a', 'b', 'c') -> ENUM('a'=1, 'b'=2, 'c'=3) | ||
if (type_name_upper.find("ENUM") != String::npos) | ||
{ | ||
Int64 i = 0; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
MySQL Supports values from 1 to 65535 here, while ClickHouse uses a signed 16-bit integer. This means we need to map MySQL values from 32768 to 65535 to negative numbers, -32768 to -1. This also requires changes in the replication code of course.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But if we go from Enum8 to Enum16 with this mapping, we will mix up our values. What would be good approach to prevent it?
Also, maybe, we can shift everything backwards on -32768 (like, 0-> -32768, 1-> -32767...)? Then the integer values will remain in the same order as they do in MySQL, and that might be useful for some queries. Also will make easier to convert Enum8 to Enum16 (just substract some number, so we start from -32768 and not -128), as well as allowing us to easily change ClickHouse values to MySQL values in queries (just add 32768 to them)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Or perhaps if we don't want to change an entire column of values when going from Enum8 to Enum16, as well as we don't want to have negative numbers representing our Enum - we can use only half of Enum8 values [1,127], if we reach 128 - we switch to Enum16 (preserving all the values in columns), and if we reach 32768 - we go to negative numbers, as you suggested. That way ClickHouse enum would almost never be different to MySQL enum (I imagine that enums for more then 32767 values are not really used).
# clickhouse_node.query("CREATE DATABASE materialize_with_enum_test ENGINE = MaterializeMySQL('{}:3306', 'materialize_with_enum_test', 'root', 'clickhouse')".format(service_name)) | ||
# check_query(clickhouse_node, "DESCRIBE TABLE materialize_with_enum_test.test", "id\tInt32\t\t\t\t\t\nvalue\tNullable(String)\t\t\ttest comment\t\t\n_sign\tInt8\tMATERIALIZED\t1\t\t\t\n_version\tUInt64\tMATERIALIZED\t1\t\t\t\n") | ||
# clickhouse_node.query("DROP DATABASE materialize_with_enum_test") | ||
# mysql_node.query("DROP DATABASE materialize_with_enum_test") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please add something to this test with an ALTER TABLE in MySQL that changes the ENUM column to >= 256 values, and insert a row with a 2-byte value.
I hereby agree to the terms of the CLA available at: https://yandex.ru/legal/cla/?lang=en
Changelog category (leave one):