MySQL commands in install instructions (& elsewhere?) don't set/warn re: utf8, collation, etc #60

Closed
rowanthorpe opened this issue Sep 13, 2013 · 15 comments

Comments

@rowanthorpe
Contributor

I don't know how you'd prefer the docs updated for this, so I will just raise it as an issue, and perhaps it will turn out to be more than a docs issue. MySQL clients and servers (except perhaps the most recent versions) still default to latin1, so in the instructions at https://github.com/inex/IXP-Manager/wiki/Installation-03-Database-Creation I recommend adding --default-character-set=utf8 to the mysql invocation, or better yet utf8mb4 if you consider this guy's warning important enough. This obviously goes hand-in-hand with my PR #57, which ensured PHP connects in utf8 too... and perhaps the Perl scripts need to be checked similarly?

NB: there is also the issue of default collation (e.g. utf8_general_ci or utf8_unicode_ci, etc).
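
To make the suggestion above concrete, here is a minimal sketch (in Python, purely for illustration) of what "explicit" could look like: a CREATE DATABASE statement that names its character set and collation, and a mysql client invocation carrying the --default-character-set flag. The database name `ixp`, the user `root`, and both helper functions are hypothetical and are not taken from the IXP Manager install instructions.

```python
def create_database_sql(db_name, charset="utf8mb4", collation="utf8mb4_unicode_ci"):
    """Build a CREATE DATABASE statement that does not rely on server defaults."""
    return (
        f"CREATE DATABASE `{db_name}` "
        f"CHARACTER SET {charset} COLLATE {collation};"
    )


def mysql_client_argv(user, charset="utf8mb4"):
    """Build a mysql client command line that sets the connection charset."""
    return ["mysql", f"--default-character-set={charset}", "-u", user, "-p"]


if __name__ == "__main__":
    # 'ixp' and 'root' are placeholder names (assumptions for this sketch).
    print(create_database_sql("ixp"))
    print(" ".join(mysql_client_argv("root")))
```

Swapping the defaults to `utf8` and `utf8_unicode_ci` would give the more conservative variant discussed in this thread.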

@rowanthorpe
Contributor Author

Here is a blog post about Forcing Utf-8 Compliance for All Connections just in case it helps (and just in case it mentions something not already known)...

@nickhilliard
Member

probably best on an open source project to handle it explicitly in the code rather than requiring some extra server magic to do it for you. otherwise yes, we need to use wtf-8 exclusively

@pdxmaverick

Agreed, I am new to the project and looking to deploy it in the US.

cheers
Brian


@barryo
Member

barryo commented Sep 13, 2013

@nickhilliard said:

yes, we need to use wtf-8 exclusively

I fully endorse a switch to what-the-fuck-8 encoding for IXP Manager ;)

@rowanthorpe
Contributor Author

@barryo :-D I am so going to call it that from now on.

@nickhilliard It is not so much a case of "extra server magic" as it is a case of saying

"MySQL, please believe me when I say I want unicode, and don't default to latin1
anyway, thinking I don't know what I'm doing"

Everything in IXP-M seems tuned to utf8 anyway, so it is not so much a "switch to utf8" as it is

"Explicitly telling the db to treat text as utf8 in order to align with existing
code, rather than carrying on blindly and *usually* getting away with it
because most ISPs' names use Latin letters anyway"

In a sane world MySQL would be defaulting to UTF-8 by now, but until it does, in my opinion at least the initial command-line "create database" should ensure it. It is also about consistency: I found out the hard way that all the tools default to latin1 except for mysqldump, which defaults to utf8. By not using any command-line flags, I once ended up with double-encoded data which looked like spaghetti. I blogged a fixup tool for it a while ago here, which I hope never to need again...
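
The double-encoding described above can be reproduced without a database. The following is a rough sketch, assuming Python's `latin-1` codec (ISO-8859-1) as a stand-in for MySQL's latin1; MySQL's latin1 is closer to Windows cp1252, so a real repair may need different handling. The function names are hypothetical and this is not the fixup tool mentioned in the comment.

```python
def double_encode(text):
    """Simulate UTF-8 bytes being misread as latin1, one character per byte."""
    return text.encode("utf-8").decode("latin-1")


def repair(garbled):
    """Reverse the misreading: recover the raw bytes, then decode them as UTF-8."""
    return garbled.encode("latin-1").decode("utf-8")


if __name__ == "__main__":
    original = "Ελλάδα"  # non-Latin text, two UTF-8 bytes per character
    garbled = double_encode(original)
    print(len(original), len(garbled))   # the garbled string is twice as long
    print(repair(garbled) == original)
```

Plain ASCII survives this round trip unchanged, which is why the problem usually goes unnoticed until a name contains non-Latin letters.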

@nickhilliard
Member

U NO LIKE latin1_swedish_ci??

@rowanthorpe
Contributor Author

...extra note: when adding flags like --default-character-set=utf8 or --default-character-set=utf8mb4 they would now also need to be added to the database creation instructions in the newly created "continuous integration" pages...

@barryo
Member

barryo commented Feb 3, 2014

@rowanthorpe - your PR #57 addressed the main issue - that IXP Manager connects with the database using utf8mb4. I've updated the wiki to reflect this as part of the set-up instructions as well as the CI set-up script.

As all scripts do or will eventually only interact with IXP Manager via the CLI / API / web interface, this means all connections will be utf8mb4.

@barryo barryo closed this as completed Feb 3, 2014
@nickhilliard
Member

is there a specific reason we're using utf8mb4 instead of utf8?

@barryo
Member

barryo commented Feb 3, 2014

Read back through the thread. This link in particular:

http://mathiasbynens.be/notes/mysql-utf8mb4

@nickhilliard
Member

No description provided.

@rowanthorpe
Contributor Author

Um, even if all of IXP-Manager's code uses unicode, the problem (which I actually opened this issue for) is that the instructions for manual setup don't specify unicode explicitly, and this means that in all but the latest versions of MySQL the database will default to latin1 when created...

Also, for the patch I sent for the config file I specified 'utf8' (not 'utf8mb4') in order to err on the side of caution, and because I wasn't sure what kind of performance hit the mb4 version might cause. Perhaps you will want to profile a few things before deciding whether to set both that config file and the manual install instructions (which this issue discusses) to 'utf8' or 'utf8mb4'. Either way, people manually setting their databases to latin1 (by omission) and then the script talking unicode to it will cause a world of hurt with anything other than ASCII.

@rowanthorpe
Contributor Author

Oops, just noticed you said you updated the wiki @barryo - am looking at it now. Anyway, my comment about mb4-or-not still holds...

@barryo
Member

barryo commented Feb 3, 2014

Um, even if all of IXP-Manager's code uses unicode, the problem (which I actually opened this PR
for) is that the instructions for manual setup don't specify unicode explicitly

I've already updated the docs. If I missed something, please show me.

Also, for the patch I sent for the config file I specified 'utf8' (not 'utf8mb4') in order to err on the side
of caution, and because I wasn't sure what kind of performance hit the mb4 version might cause.

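MySQL's utf8 is wrong (it stores at most three bytes per character, so it cannot hold all of Unicode); utf8mb4 is correct. UTF-8 is variable-length and only uses the number of bytes each character requires. I'm happy with this choice.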
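
As a small illustration of the point about byte counts, this sketch measures how many bytes individual characters take in UTF-8 and checks whether a string stays within the three-byte-per-character limit of MySQL's utf8. The helper names are hypothetical, and the check only approximates what the server would accept.

```python
def utf8_len(ch):
    """Number of bytes a single character occupies in UTF-8."""
    return len(ch.encode("utf-8"))


def fits_in_three_bytes(text):
    """True if every character needs at most three UTF-8 bytes."""
    return all(utf8_len(ch) <= 3 for ch in text)


if __name__ == "__main__":
    for ch in ["a", "é", "€", "😀"]:
        print(repr(ch), utf8_len(ch))
    print(fits_in_three_bytes("Ελλάδα €"))
    print(fits_in_three_bytes("IXP 😀"))
```

Text that already fits in three bytes per character is encoded to the same bytes under either character set; only characters outside the Basic Multilingual Plane need the fourth byte.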

@rowanthorpe
Contributor Author

Ah, I just looked at the file previously changed by PR #57, and noticed that it has since been further updated to utf8mb4 (my change cautiously only set it to utf8, and I only mentioned utf8mb4 here, so I presumed it was still at that state). All good now then (I presume the mb4 hasn't caused too much of a performance hit, or it would have been evident by now).


4 participants