Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

utf8mb4 character set instead of utf8 #1340

Closed
Pierric82 opened this issue Feb 23, 2022 · 11 comments
Closed

utf8mb4 character set instead of utf8 #1340

Pierric82 opened this issue Feb 23, 2022 · 11 comments
Assignees

Comments

@Pierric82
Copy link

Force utf8mb4 character set in MariaDB tables

Need for this feature
A typical nocoDB + mariaDB installation will see insertions fail (and the API return an error) when trying to insert some subset of Unicode characters (4-byte UTF-8 characters). Although the root issue might be considered to be on MariaDB side (since 2013?!), this proposal would solve it.

Is your feature request related to a problem? Please describe.
By default MariaDB uses utf8 as character encoding. In MariaDB, this set supports unicode characters encoded in 3 bytes such as ⏪, but not those encoded on 4 bytes such as 🎪. When inserting such characters (at least through the API bulk creation), an error is returned, because MariaDB complains about Incorrect string value: '\xF0\x9F\x8E\xAA \xE2...' for column d1.nc_2fmg__testemails.subject at row 1".

Describe the solution you'd like
Although there might be options to set up MariaDB with the correct encoding utf8mb4, I think a lot of people would stumble upon this issue when trying out nocodb + mariadb. I'm proposing to include functionality in nocodb to force the creation of new tables with that character set. I see 2 ways:

  • option 1: with an environment variable at the Docker container level
  • option 2: within the GUI, as a checkbox when creating a table
    In either case, I would strongly recommend making the option selected by default (i.e., use the utf8mb4 character set).

Describe alternatives you've considered
One could argue action should be taken on the MariaDB front directly. I haven't explored that direction, but the article that I found that got me to understand what was happening goes back to... 2013. So I wouldn't count on a quick fix there.

Alternatively, manually setting up MariaDB, either by adding

[mysqld]
character-set-server    = utf8mb4

to the MariaDB configuration (untested), or by converting the database to the character set through

alter database your_db_name character set = 'utf8mb4';

in a mysql client can be done, but would need to be documented somewhere for future NocoDB onboarders. When faced with the issue for the first time without any idea of what is going on, it's hard to link it to character encoding, so I think that a default option as proposed will be better for many people (also, people installing nocodb might not want to install a mysql client).

Additional context
For reference, to fix the issue on existing tables, one can convert tables with

alter table nc_2fmg__tablename convert to character set 'utf8mb4';

whereby 2fmg is the suffix in the project id. One should also convert the database as mentioned in the previous section, to ensure future tables don't have the issue.

Additional source that helped me: article on character sets in MariaDB.

@wingkwong
Copy link
Member

If you're using MariaDB docker, you can change the character set and collation by passing --character-set-server and --collation-server.

Example:

docker run --name mariadb \
-e MYSQL_ROOT_PASSWORD=password \
-p 3306:3306 \
-d docker.io/library/mariadb:10.3 \
--character-set-server=utf8mb4 \
--collation-server=utf8mb4_unicode_ci

Result:

image

@wingkwong wingkwong added the 📞 Status: Need Internal Discussion Something requires to be discussed internally. label Feb 24, 2022
@Pierric82
Copy link
Author

If you're using MariaDB docker, you can change the character set and collation by passing --character-set-server and --collation-server.

I did not know that, that is very useful! If that works indeed, then maybe future users can be avoided the pain I went through by adding some note on https://docs.nocodb.com/getting-started/installation/ and then maybe my proposal is a bit redundant (the core issue lies with MariaDB, and my point was really to help NocoDB onboarders with no MariaDB experience. The Docker approach, if made clear, might therefore be enough).

Cheers,
Pierric.

@wingkwong
Copy link
Member

Previously I've added the same command in MySQL docker-compose.yml.

# command: --character-set-server=utf8mb4 --collation-server=utf8mb4_unicode_ci

I agree that It'd be better to include it in documentation as well. I'll do it later. The checkbox option within GUI is kinda not necessary as I think this configuration should be set by DB owners when creating the database rather than by end users.

@wingkwong wingkwong removed the 📞 Status: Need Internal Discussion Something requires to be discussed internally. label Feb 24, 2022
@wingkwong wingkwong self-assigned this Feb 24, 2022
@wingkwong
Copy link
Member

added in 2f847dc

@github-actions
Copy link
Contributor

Fixed in v0.84.13.

@PierricSE
Copy link

Thanks @wingkwong !

For the sake of being exhaustive, the suggested approach does not seem to work with the arm7 (32bit) mariadb container from linxserver.io (it seems the official mariadb container does not support this architecture). When trying I got errors as if docker was trying to run the options themselves as the command; adding mysqld before the options didn't work either.

This is of course a very specific edge case now, so not expecting further action from nocodb side, but anyone impacted by this in the future might want to fall back to setting up the mariadb config file and/or database settings as suggested in the initial post.

Cheers,
Pierric.

@netvolcano
Copy link

I had a problem to input the Chinese characters in the Noco with MySQL 5.7 (==> Cannot input the Chinese characters in the table #1294). When I read the closed issue, I thought it might been solved in the Nocodb 0.84.14 because the root of my issue was very simlar to this. But the issue is still here. I cannot use Chinese characters in the Nocodb. I think it is best to support unicode as default for Nocodb.

@o1lab o1lab reopened this Mar 4, 2022
@o1lab o1lab changed the title MariaDB table creation with utf8mb4 character set instead of utf8 utf8mb4 character set instead of utf8 Mar 4, 2022
@o1lab
Copy link
Member

o1lab commented Mar 4, 2022

Let's relook into this without MariaDB in context.

@wingkwong
Copy link
Member

Thanks @wingkwong !

For the sake of being exhaustive, the suggested approach does not seem to work with the arm7 (32bit) mariadb container from linxserver.io (it seems the official mariadb container does not support this architecture). When trying I got errors as if docker was trying to run the options themselves as the command; adding mysqld before the options didn't work either.

This is of course a very specific edge case now, so not expecting further action from nocodb side, but anyone impacted by this in the future might want to fall back to setting up the mariadb config file and/or database settings as suggested in the initial post.

Cheers, Pierric.

Maybe it was due to the order of the options. You may try put them at the end. Otherwise, an alternative way is to update it using SQL.

@wingkwong
Copy link
Member

I had a problem to input the Chinese characters in the Noco with MySQL 5.7 (==> Cannot input the Chinese characters in the table #1294). When I read the closed issue, I thought it might been solved in the Nocodb 0.84.14 because the root of my issue was very simlar to this. But the issue is still here. I cannot use Chinese characters in the Nocodb. I think it is best to support unicode as default for Nocodb.

@netvolcano The issue you mentioned is actually not related to this issue here. Will discuss there.

@wingkwong
Copy link
Member

Conclusion:

You can update char set by the either way

  • passing args to docker run (if you are using docker). Example: See here.
  • updating command (if you are using docker-compose.yml). Example: See here.
  • updating directly in DB. Example: See here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants