Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Wip materialize my sql support enum data type #3

Conversation

sand6255
Copy link
Owner

I hereby agree to the terms of the CLA available at: https://yandex.ru/legal/cla/?lang=en

Changelog category (leave one):

  • New Feature

Copy link

@havardk havardk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, just one nitpick.

src/Interpreters/MySQL/InterpretersMySQLDDLQuery.cpp Outdated Show resolved Hide resolved
case MYSQL_TYPE_ENUM:
{
UInt8 val = 0;
payload.readStrict(reinterpret_cast<char *>(&val), 1);
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The payload may be one or two bytes here, depending on the enum. meta & 0xFF should give you the number of bytes.

src/DataTypes/DataTypeEnum.cpp Show resolved Hide resolved
/// For example ENUM('a', 'b', 'c') -> ENUM('a'=1, 'b'=2, 'c'=3)
if (type_name_upper.find("ENUM") != String::npos)
{
Int64 i = 0;
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

MySQL Supports values from 1 to 65535 here, while ClickHouse uses a signed 16-bit integer. This means we need to map MySQL values from 32768 to 65535 to negative numbers, -32768 to -1. This also requires changes in the replication code of course.

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But if we go from Enum8 to Enum16 with this mapping, we will mix up our values. What would be good approach to prevent it?
Also, maybe, we can shift everything backwards on -32768 (like, 0-> -32768, 1-> -32767...)? Then the integer values will remain in the same order as they do in MySQL, and that might be useful for some queries. Also will make easier to convert Enum8 to Enum16 (just substract some number, so we start from -32768 and not -128), as well as allowing us to easily change ClickHouse values to MySQL values in queries (just add 32768 to them)

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or perhaps if we don't want to change an entire column of values when going from Enum8 to Enum16, as well as we don't want to have negative numbers representing our Enum - we can use only half of Enum8 values [1,127], if we reach 128 - we switch to Enum16 (preserving all the values in columns), and if we reach 32768 - we go to negative numbers, as you suggested. That way ClickHouse enum would almost never be different to MySQL enum (I imagine that enums for more then 32767 values are not really used).

# clickhouse_node.query("CREATE DATABASE materialize_with_enum_test ENGINE = MaterializeMySQL('{}:3306', 'materialize_with_enum_test', 'root', 'clickhouse')".format(service_name))
# check_query(clickhouse_node, "DESCRIBE TABLE materialize_with_enum_test.test", "id\tInt32\t\t\t\t\t\nvalue\tNullable(String)\t\t\ttest comment\t\t\n_sign\tInt8\tMATERIALIZED\t1\t\t\t\n_version\tUInt64\tMATERIALIZED\t1\t\t\t\n")
# clickhouse_node.query("DROP DATABASE materialize_with_enum_test")
# mysql_node.query("DROP DATABASE materialize_with_enum_test")
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add something to this test with an ALTER TABLE in MySQL that changes the ENUM column to >= 256 values, and insert a row with a 2-byte value.

@sand6255 sand6255 merged this pull request into MaterializeMySQL-Support-Enum-Data-Type Jun 24, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
3 participants