Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Unexpected result of using in DSN string collation and charset parameters at the same time. #745

Closed
soslanco opened this issue Jan 31, 2018 · 12 comments · Fixed by #1437
Closed
Milestone

Comments

@soslanco
Copy link

soslanco commented Jan 31, 2018

Issue description

Result of using in DSN string charset and non default collation parameters together not as expected.

In file github.com/go-sql-driver/mysql/connection.go executed SET NAMES sql command without COLLATE parameter. Therefore connection collation = default collation for charset.

Example code

sql.Open("mysql", "user:password@/db?charset=utf8mb4&collation=utf8mb4_bin")

Result

collation_connection: utf8mb4_general_ci

Configuration

Driver version (or git SHA): bc14601

Go version: go1.9.3 linux/amd64

Server version: MariaDB 10.2.12

Server OS: RHEL7

@dolmen
Copy link
Contributor

dolmen commented Apr 17, 2018

Please add the Go code that shows how you query the current charset and the current collation.

@dolmen
Copy link
Contributor

dolmen commented Apr 17, 2018

var collation string
db.QueryRow(`SELECT @@collation_connection`).Scan(&collation)
fmt.Println("Collation:", collation)

@dolmen
Copy link
Contributor

dolmen commented Apr 17, 2018

Note that the documentation for charset says:

Usage of the charset parameter is discouraged because it issues additional queries to the server. Unless you need the fallback behavior, please use collation instead.

The workaround is to set only collation and not charset.

@soslanco
Copy link
Author

soslanco commented Apr 17, 2018

No problems while using charset and collation parameters separately.
Issue is when using both parameters together.
Possible solution: SET NAMES 'charset_name' [COLLATE 'collation_name']

@soslanco
Copy link
Author

soslanco commented Apr 17, 2018

Go code:
db, _ = sql.Open("mysql", "user:password@/db?charset=utf8mb4&collation=utf8mb4_bin")
rows, _ = db.Query("SHOW VARIABLES LIKE 'collation%'")

Output:
collation_connection: utf8mb4_general_ci
collation_database: utf8mb4_bin
collation_server: utf8mb4_bin

@methane
Copy link
Member

methane commented Apr 17, 2018 via email

@soslanco
Copy link
Author

soslanco commented Apr 17, 2018

It's not my code: https://github.com/go-sql-driver/mysql/blob/master/connection.go#L62

Using both of them not deprecated: https://dev.mysql.com/doc/refman/5.7/en/charset-connection.html

Setting character_set_connection to charset_name also implicitly sets collation_connection to the default collation for charset_name. It is unnecessary to set that collation explicitly. To specify a particular collation, use the optional COLLATE clause:

SET NAMES 'charset_name' COLLATE 'collation_name'

@methane
Copy link
Member

methane commented Apr 18, 2018

@soslanco I meant about parameter of DSN. collation=utf8mb4_bin doesn't use SET NAMES.

See here. https://github.com/go-sql-driver/mysql#charset

Usage of the charset parameter is discouraged because it issues additional queries to the server. Unless you need the fallback behavior, please use collation instead.

@dolmen
Copy link
Contributor

dolmen commented Jun 15, 2018

@methane Users can be wrong, despites a good intent (specifying both charset and collation in a connection string looks good until you read the documentation). Worse: the fact that charset overrides collation is not documented.

So the driver should catch this user error because it can: it should ignore charset if collation is set.

@methane
Copy link
Member

methane commented Jun 15, 2018

I think there are two long-term solution:

  • Remove charset entirely
  • Return error when both are set.
    • When only charset is specified, use default collation for the charset. Don't use SET NAMES.

@methane methane removed the bug label Oct 24, 2018
@kaihendry
Copy link

I had issues with my collation_connection being utf8mb4_general_ci with charset=utf8mb4&collation=utf8mb4_unicode_ci. Removing charset & keeping just &collation=utf8mb4_unicode_ci in the DSN correctly results in collation_connection being utf8mb4_unicode_ci.

That's quite a gotcha.

@methane
Copy link
Member

methane commented May 25, 2023

I decided to fix this based on my experience of maintaining Python MySQL drivers.

  • When both of charset and collation are set, send SET NAMES charset COLLATE collation query on every new connection.
    • This fixes this issue.
  • When only charset is set, send SET NAMES charset.
    • This fixes wrong default collation problem (e.g. MySQL 8 uses utf8mb4_0900_ai_ci as default collation for utf8mb4, but MySQL 5.7 and MariaDB uses utf8mb4_general_ci).
    • This was discouraged, but it's the best way to chose server default collation.
  • When only collation is set, don't send SET NAMES query.
    • This saves one additional roundtrip.
    • When collation is specified, the risk of collation table mismatch is low.
    • If problem happen, we can support collation=<collation_id number> form.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants