Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Changes default encoding from UTF8_GENERAL_CI to UTF8MB4_UNICODE_CI #1408

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

dresende
Copy link
Collaborator

@dresende dresende commented May 4, 2016

Does anyone have another opinion on this?

http://stackoverflow.com/a/766996/977422

@dresende
Copy link
Collaborator Author

dresende commented May 4, 2016

Travis configuration should be updated:

  • remove v0.6 and v0.8
  • add v6
  • fix/remove mariadb?

@dougwilson
Copy link
Member

As for this change, I definitely think it needs to wait for 3.0, probably going along with actually fixing the way encoding works in this module, which is still fundamentally broken. The reason I think this is a major version change is because it is breaking (it will break against old server versions that do not understand this encoding) and also I noticed it will actually act differently against people using the current encoding and writing to mb4 columns with chars outside the BMP, thus changing their data storage without a major.

As for your trouble testing, not sure why you are having trouble. I literally just ran all these tests at the beginning of this week and everything worked. I am not going to remove 0.6/0.8 testing as long as we support those versions, and removing support is also a major version bump. I agree that we can definitely remove on 3.0, but this should not be because it's not technically possible to test, just less headaches.

@dougwilson
Copy link
Member

Ah, @dresende, I see why the tests are failing: you committed failing commits directly to master. Please revert commits on master that fail CI. Please us branches to test commits before making them to master, or at least revert them on master as soon as you see them fail. Otherwise everyone coming to this project for the past 2 hours will think this is a joke project that cannot even pass it's own tests.

@dougwilson
Copy link
Member

@dresende for now I just moved those commits over to this branch so we can take whatever time is needed figuring out what the test failures are and can get the build badges back to "green" in the meantime (I'm surprised I figured out how to do this from my phone :O while I'm away from the computer).

@dougwilson
Copy link
Member

Apparently even after reverting those changes from master, AppVeyor is not going to pass. The changes in the May 2nd update (http://www.appveyor.com/updates) must have changed something, because mysqladmin no longer seems to be in the PATH. Someone will need to take a look into that and get AppVeyor back working again, which is now an urgent ask since master was touched and we have a failing badge :( I should be on a computer in about 6 hours or so, so if no one has fixed it before then, I can take a look.

@dresende
Copy link
Collaborator Author

dresende commented May 4, 2016

About the encoding, I would be happy with change from utf8_general_ci to utf8_unicode_ci. The general one is just wrong.

@dresende
Copy link
Collaborator Author

dresende commented May 4, 2016

About committing to master, it's supposed to be unstable. And AppVeyor is to blame on this too. I prefer to see a project failing on master with a commit from 1 week ago than a project succeeding with a commit 2 months old :)

@dougwilson
Copy link
Member

About the encoding, I would be happy with change from utf8_general_ci to utf8_unicode_ci . The general one is just wrong.

It's still a major version bump in my mind, unless you can explain how it would not change the behavior of people's apps when upgrading? It would seem to me if it doesn't change anything then there is no point and if it does change something... well now we're breaking people and it would need to be a major version bump.

About committing to master, it's supposed to be unstable.

No, not on this repo, which is run with a stable master. If we think it should be unstable, then let me know what needs to be changed so that the badges on our published npm version wont be red and failing, driving users away.

And AppVeyor is to blame on this too.

Sure, but if something is pushed to master and breaks the build, the committer should not be doing anything else but fixing the issue. Yes, CIs can break, but I don't think that after all these hours any effort has been put into trying to fix it?

@dougwilson
Copy link
Member

than a project succeeding with a commit 2 months old :)

The commit may have been old, but I literally just pushed it up this week, it it has been succeeding for just a few days now. Anyone can click on the build and see when it was last run.

@dougwilson
Copy link
Member

I had pushed up this weekend because I wanted to cut a patch with the pending changes. Now that the build is failing, I will not release any patches until it is resolved and passing so we know we are releasing something that functions on Windows.

@dougwilson
Copy link
Member

Actually that commit was just pushed up two days ago, not months ago. You can see on the build page https://travis-ci.org/felixge/node-mysql/builds that "Update bignumber.js to 2.3.0" was only just built 2 days ago, not months ago, so this means the project was building fine just recently.

And the same with AppVeyor: https://ci.appveyor.com/project/dougwilson/node-mysql/history. This project was building just fine only 2 days ago.

@dougwilson
Copy link
Member

So since I got to a computer, I was able to read the AppVeyor upgrade notes and just update the path, it was pretty straight forward.

Back on topic to this pull request, I think changing the charset makes sense, just my point is that it would be a major version bump unless there is some reason it's not that I'm not seeing (http://semver.org/ has the general guidelines for how to determine these things).

If this is agreeable, we can mark this as milestone 3.0. Currently everything in https://github.com/felixge/node-mysql/milestones/3.0 is slated for 3.0, with some ready to go, just waiting for a 3.0 to land on, to other not started, but are really BIG problems, for example the really non-sense charset handling in this module (#804). Since this seems so closely related to #804 to me, I think it would make the most sense to land them together. I know #804 does not have much detail, but it boils down to this module just completely ignores all the charset information coming in the fields packets, creating a bunch of garbage in various cases. In fact, this module literally assumes that the charset sent in the handshake was accepted the server, which is not always the case, and changing this to a mb4 one could make this case more common, and case a lot of issues if #804 is not fixed first. For example, the client starts asking for mb4, the server doesn't link it and just falls back to latin1. Now this module is decoding as UTF-8, not latin1.

@dresende
Copy link
Collaborator Author

dresende commented May 4, 2016

Check the response on stack overflow that I mentioned, I added on purpose. utf8_general_ci is bad for non roman languages, specially for sorting. It's not the default for some years now on mysql. There's no rush on pulling this, it's just something I believe is important to set as good defaults. It's there at least since 5.5.3 and no one should use that version anymore.. (I know, legacy, but I prefer security updates)

@dresende dresende added this to the 3.0 milestone May 4, 2016
@dougwilson
Copy link
Member

dougwilson commented May 4, 2016

Right, I read the SO post :) and I agree, this change makes sense, even as-is being MB4-based. I guess my only opinion is just that I think it makes sense to target 3.0, not that the change itself is bad, if that makes sense.

@dresende
Copy link
Collaborator Author

dresende commented May 4, 2016

About supporting v0.6 and v0.8, I was just considering tests. We shouldn't worry about versions that are no longer maintained (v0.10 will be off soon). And to properly update the module to handle encodings I think those versions will be an obstacle.

@dresende
Copy link
Collaborator Author

dresende commented May 4, 2016

Indirectly, it can force people to upgrade to more stable and maintained versions (in terms of security).

@dougwilson
Copy link
Member

And to properly update the module to handle encodings I think those versions will be an obstacle.

Since doing that I couldn't figure out how to make supporting it working in a backwards-compatible way, we can drop any versions we want during that change since it's major anyway. I figure no point on even trying to go below 0.10 these days.

@kai-koch
Copy link
Collaborator

kai-koch commented May 5, 2016

I think, changing the default encoding is a good thing.
The question is when.
I usually change the default encoding on my projects anyway.
But changing this before 3.0 might break things for other people.
IMO 3.0 is the right time to make this change.

@dougwilson dougwilson force-pushed the master branch 2 times, most recently from 48ff604 to e9dcd8a Compare May 7, 2016 02:46
@sidorares
Copy link
Member

sidorares commented Jun 9, 2016

for some reason the code below works ok with utf8 encoding and not with utf8mb4 (I'd expect the opposite)

c.query('select "test 💩" ', (err, rows, fields) => {
  console.log(rows, fields);
});

When UTF8MB4_UNICODE_CI is used as charset, all chars outside of BMP are replaced with '?' ( single 0x3f byte ) - in field name only, but not in the value, so the output using this branch is

[ RowDataPacket { 'test ?': 'test 💩' } ] [ FieldPacket {
    catalog: 'def',
    db: '',
    table: '',
    orgTable: '',
    name: 'test ?',
    orgName: '',
    charsetNr: 224,
    length: 24,
    type: 253,
    flags: 1,
    decimals: 31,
    default: undefined,
    zeroFill: false,
    protocol41: true } ]

and with master it's

[ RowDataPacket { 'test 💩': 'test 💩' } ] [ FieldPacket {
    catalog: 'def',
    db: '',
    table: '',
    orgTable: '',
    name: 'test 💩',
    orgName: '',
    charsetNr: 33,
    length: 27,
    type: 253,
    flags: 1,
    decimals: 31,
    default: undefined,
    zeroFill: false,
    protocol41: true } ]

@dougwilson , do you have explanation for this mysql behaviour?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

4 participants