Changes default encoding from UTF8_GENERAL_CI to UTF8MB4_UNICODE_CI #1408

dresende · 2016-05-04T15:01:37Z

Does anyone have another opinion on this?

http://stackoverflow.com/a/766996/977422

dresende · 2016-05-04T15:14:10Z

Travis configuration should be updated:

remove v0.6 and v0.8
add v6
fix/remove mariadb?

dougwilson · 2016-05-04T17:07:00Z

As for this change, I definitely think it needs to wait for 3.0, probably going along with actually fixing the way encoding works in this module, which is still fundamentally broken. The reason I think this is a major version change is because it is breaking (it will break against old server versions that do not understand this encoding) and also I noticed it will actually act differently against people using the current encoding and writing to mb4 columns with chars outside the BMP, thus changing their data storage without a major.

As for your trouble testing, not sure why you are having trouble. I literally just ran all these tests at the beginning of this week and everything worked. I am not going to remove 0.6/0.8 testing as long as we support those versions, and removing support is also a major version bump. I agree that we can definitely remove on 3.0, but this should not be because it's not technically possible to test, just less headaches.

dougwilson · 2016-05-04T17:21:21Z

Ah, @dresende, I see why the tests are failing: you committed failing commits directly to master. Please revert commits on master that fail CI. Please us branches to test commits before making them to master, or at least revert them on master as soon as you see them fail. Otherwise everyone coming to this project for the past 2 hours will think this is a joke project that cannot even pass it's own tests.

dougwilson · 2016-05-04T17:48:37Z

@dresende for now I just moved those commits over to this branch so we can take whatever time is needed figuring out what the test failures are and can get the build badges back to "green" in the meantime (I'm surprised I figured out how to do this from my phone :O while I'm away from the computer).

dougwilson · 2016-05-04T17:58:18Z

Apparently even after reverting those changes from master, AppVeyor is not going to pass. The changes in the May 2nd update (http://www.appveyor.com/updates) must have changed something, because mysqladmin no longer seems to be in the PATH. Someone will need to take a look into that and get AppVeyor back working again, which is now an urgent ask since master was touched and we have a failing badge :( I should be on a computer in about 6 hours or so, so if no one has fixed it before then, I can take a look.

dresende · 2016-05-04T21:59:02Z

About the encoding, I would be happy with change from utf8_general_ci to utf8_unicode_ci. The general one is just wrong.

dresende · 2016-05-04T22:00:25Z

About committing to master, it's supposed to be unstable. And AppVeyor is to blame on this too. I prefer to see a project failing on master with a commit from 1 week ago than a project succeeding with a commit 2 months old :)

dougwilson · 2016-05-04T22:06:24Z

About the encoding, I would be happy with change from utf8_general_ci to utf8_unicode_ci . The general one is just wrong.

It's still a major version bump in my mind, unless you can explain how it would not change the behavior of people's apps when upgrading? It would seem to me if it doesn't change anything then there is no point and if it does change something... well now we're breaking people and it would need to be a major version bump.

About committing to master, it's supposed to be unstable.

No, not on this repo, which is run with a stable master. If we think it should be unstable, then let me know what needs to be changed so that the badges on our published npm version wont be red and failing, driving users away.

And AppVeyor is to blame on this too.

Sure, but if something is pushed to master and breaks the build, the committer should not be doing anything else but fixing the issue. Yes, CIs can break, but I don't think that after all these hours any effort has been put into trying to fix it?

dougwilson · 2016-05-04T22:07:56Z

than a project succeeding with a commit 2 months old :)

The commit may have been old, but I literally just pushed it up this week, it it has been succeeding for just a few days now. Anyone can click on the build and see when it was last run.

dougwilson · 2016-05-04T22:09:53Z

I had pushed up this weekend because I wanted to cut a patch with the pending changes. Now that the build is failing, I will not release any patches until it is resolved and passing so we know we are releasing something that functions on Windows.

dougwilson · 2016-05-04T22:15:08Z

Actually that commit was just pushed up two days ago, not months ago. You can see on the build page https://travis-ci.org/felixge/node-mysql/builds that "Update bignumber.js to 2.3.0" was only just built 2 days ago, not months ago, so this means the project was building fine just recently.

And the same with AppVeyor: https://ci.appveyor.com/project/dougwilson/node-mysql/history. This project was building just fine only 2 days ago.

dougwilson · 2016-05-04T22:30:17Z

So since I got to a computer, I was able to read the AppVeyor upgrade notes and just update the path, it was pretty straight forward.

Back on topic to this pull request, I think changing the charset makes sense, just my point is that it would be a major version bump unless there is some reason it's not that I'm not seeing (http://semver.org/ has the general guidelines for how to determine these things).

If this is agreeable, we can mark this as milestone 3.0. Currently everything in https://github.com/felixge/node-mysql/milestones/3.0 is slated for 3.0, with some ready to go, just waiting for a 3.0 to land on, to other not started, but are really BIG problems, for example the really non-sense charset handling in this module (#804). Since this seems so closely related to #804 to me, I think it would make the most sense to land them together. I know #804 does not have much detail, but it boils down to this module just completely ignores all the charset information coming in the fields packets, creating a bunch of garbage in various cases. In fact, this module literally assumes that the charset sent in the handshake was accepted the server, which is not always the case, and changing this to a mb4 one could make this case more common, and case a lot of issues if #804 is not fixed first. For example, the client starts asking for mb4, the server doesn't link it and just falls back to latin1. Now this module is decoding as UTF-8, not latin1.

dresende · 2016-05-04T22:38:22Z

Check the response on stack overflow that I mentioned, I added on purpose. utf8_general_ci is bad for non roman languages, specially for sorting. It's not the default for some years now on mysql. There's no rush on pulling this, it's just something I believe is important to set as good defaults. It's there at least since 5.5.3 and no one should use that version anymore.. (I know, legacy, but I prefer security updates)

dougwilson · 2016-05-04T22:43:34Z

Right, I read the SO post :) and I agree, this change makes sense, even as-is being MB4-based. I guess my only opinion is just that I think it makes sense to target 3.0, not that the change itself is bad, if that makes sense.

dresende · 2016-05-04T22:46:40Z

About supporting v0.6 and v0.8, I was just considering tests. We shouldn't worry about versions that are no longer maintained (v0.10 will be off soon). And to properly update the module to handle encodings I think those versions will be an obstacle.

dresende · 2016-05-04T22:47:44Z

Indirectly, it can force people to upgrade to more stable and maintained versions (in terms of security).

dougwilson · 2016-05-04T22:51:00Z

And to properly update the module to handle encodings I think those versions will be an obstacle.

Since doing that I couldn't figure out how to make supporting it working in a backwards-compatible way, we can drop any versions we want during that change since it's major anyway. I figure no point on even trying to go below 0.10 these days.

kai-koch · 2016-05-05T10:18:14Z

I think, changing the default encoding is a good thing.
The question is when.
I usually change the default encoding on my projects anyway.
But changing this before 3.0 might break things for other people.
IMO 3.0 is the right time to make this change.

…ODE_CI http://stackoverflow.com/a/766996/977422

sidorares · 2016-06-09T04:47:00Z

for some reason the code below works ok with utf8 encoding and not with utf8mb4 (I'd expect the opposite)

c.query('select "test 💩" ', (err, rows, fields) => {
  console.log(rows, fields);
});

When UTF8MB4_UNICODE_CI is used as charset, all chars outside of BMP are replaced with '?' ( single 0x3f byte ) - in field name only, but not in the value, so the output using this branch is

[ RowDataPacket { 'test ?': 'test 💩' } ] [ FieldPacket {
    catalog: 'def',
    db: '',
    table: '',
    orgTable: '',
    name: 'test ?',
    orgName: '',
    charsetNr: 224,
    length: 24,
    type: 253,
    flags: 1,
    decimals: 31,
    default: undefined,
    zeroFill: false,
    protocol41: true } ]

and with master it's

[ RowDataPacket { 'test 💩': 'test 💩' } ] [ FieldPacket {
    catalog: 'def',
    db: '',
    table: '',
    orgTable: '',
    name: 'test 💩',
    orgName: '',
    charsetNr: 33,
    length: 27,
    type: 253,
    flags: 1,
    decimals: 31,
    default: undefined,
    zeroFill: false,
    protocol41: true } ]

@dougwilson , do you have explanation for this mysql behaviour?

dougwilson force-pushed the master branch from bd413f3 to 880c1ed Compare May 4, 2016 17:46

dresende added this to the 3.0 milestone May 4, 2016

sidorares mentioned this pull request May 5, 2016

fix character encoding handling and use UTF8MB4_UNICODE_CI by default sidorares/node-mysql2#302

Closed

dougwilson force-pushed the master branch 2 times, most recently from 48ff604 to e9dcd8a Compare May 7, 2016 02:46

server: changes default encoding from UTF8_GENERAL_CI to UTF8MB4_UNIC…

b6b7f06

…ODE_CI http://stackoverflow.com/a/766996/977422

dougwilson force-pushed the new-default-encoding branch from 9355039 to b6b7f06 Compare May 7, 2016 02:59

dougwilson force-pushed the master branch 2 times, most recently from 5847ec8 to be37e88 Compare May 8, 2016 05:09

dougwilson force-pushed the master branch 3 times, most recently from 65c4c0c to fa96a75 Compare June 8, 2016 02:00

sidorares mentioned this pull request Aug 15, 2016

Fixes #302 sidorares/node-mysql2#374

Merged

8 tasks

dougwilson force-pushed the master branch 2 times, most recently from 638d79d to 88bade0 Compare June 29, 2017 18:13

prust mentioned this pull request Apr 29, 2018

Feature request: support unicode characters. rockdai/sql-bricks#102

Closed

dougwilson force-pushed the master branch 2 times, most recently from 946727b to 37fbbdd Compare November 18, 2018 02:26

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Changes default encoding from UTF8_GENERAL_CI to UTF8MB4_UNICODE_CI #1408

Changes default encoding from UTF8_GENERAL_CI to UTF8MB4_UNICODE_CI #1408

dresende commented May 4, 2016

dresende commented May 4, 2016

dougwilson commented May 4, 2016

dougwilson commented May 4, 2016

dougwilson commented May 4, 2016

dougwilson commented May 4, 2016

dresende commented May 4, 2016

dresende commented May 4, 2016 •

edited

Loading

dougwilson commented May 4, 2016

dougwilson commented May 4, 2016

dougwilson commented May 4, 2016

dougwilson commented May 4, 2016

dougwilson commented May 4, 2016

dresende commented May 4, 2016

dougwilson commented May 4, 2016 •

edited

Loading

dresende commented May 4, 2016

dresende commented May 4, 2016

dougwilson commented May 4, 2016

kai-koch commented May 5, 2016

sidorares commented Jun 9, 2016 •

edited

Loading

Changes default encoding from UTF8_GENERAL_CI to UTF8MB4_UNICODE_CI #1408

Are you sure you want to change the base?

Changes default encoding from UTF8_GENERAL_CI to UTF8MB4_UNICODE_CI #1408

Conversation

dresende commented May 4, 2016

dresende commented May 4, 2016

dougwilson commented May 4, 2016

dougwilson commented May 4, 2016

dougwilson commented May 4, 2016

dougwilson commented May 4, 2016

dresende commented May 4, 2016

dresende commented May 4, 2016 • edited Loading

dougwilson commented May 4, 2016

dougwilson commented May 4, 2016

dougwilson commented May 4, 2016

dougwilson commented May 4, 2016

dougwilson commented May 4, 2016

dresende commented May 4, 2016

dougwilson commented May 4, 2016 • edited Loading

dresende commented May 4, 2016

dresende commented May 4, 2016

dougwilson commented May 4, 2016

kai-koch commented May 5, 2016

sidorares commented Jun 9, 2016 • edited Loading

dresende commented May 4, 2016 •

edited

Loading

dougwilson commented May 4, 2016 •

edited

Loading

sidorares commented Jun 9, 2016 •

edited

Loading