Cyrillic is stored strangely #20

Open
Ser5 opened this issue Jul 15, 2015 · 3 comments

Comments

@Ser5
Contributor

Ser5 commented Jul 15, 2015

Here is the code:

Gatekeeper::createGroup(array(
    'name'        => 'spam_receivers',
    'description' => 'Получатели рассылки',
));
$group = Gatekeeper::findGroupByName('spam_receivers');
var_dump($group->name, $group->description);

As you can see, there are two words in the description.
The PHP script is written in UTF-8.

When I look into the "groups" table, I see this: "Получатели"
Only one word is stored, and in a form I don't recognize.
var_dump() gives me this:

string(14) "spam_receivers"
string(20) "Получатели"

It manages to decode the garbage back to Cyrillic, but the second word is lost anyway.

Users' Cyrillic UTF-8 data is stored the same way, as garbage.

I think UTF-8 should be stored as UTF-8 without any magic.
BTW, the groups table has the right collation, "utf8_general_ci".

@enygma
Member

enygma commented Jul 15, 2015

Hmm, weird...I don't manually set the encoding on those tables in the migrations. Maybe I should, I think Phinx can do that. I'd have to check into it though. I hadn't really tested much yet with other character sets or languages.

@Ser5
Contributor Author

Ser5 commented Jul 15, 2015

Please look at my pull request #25. I'm not sure I created the pull request correctly, since I haven't done it before, hehe; I just clicked some buttons here and there and something happened. I hope the right things happened.

@Ser5
Contributor Author

Ser5 commented Jul 15, 2015

Regarding the description being truncated, please also see #23 about varchar(20). The word "Получатели" consists of exactly 10 chars, which, without charset=utf-8, converts to "Получатели", consisting of 20 chars. The second word doesn't fit into the description field and is lost. With charset=utf-8 both words fit well.
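
The arithmetic can be reproduced in plain PHP without a database. This is only a sketch of the byte counting, not Gatekeeper's actual storage code. It assumes the mbstring extension is available, and it uses ISO-8859-1 as a stand-in for whatever single-byte charset the connection really uses (the visible garbage differs between single-byte charsets, but the counts do not). The variable names are made up for illustration.

```php
<?php
// Illustration only: mimics the double-encoding arithmetic, not Gatekeeper itself.
$description = 'Получатели рассылки'; // UTF-8 source, as in the issue

// Each Cyrillic letter takes 2 bytes in UTF-8; the space takes 1.
$chars = mb_strlen($description, 'UTF-8'); // 19 characters
$bytes = strlen($description);             // 37 bytes

// Assumption: the connection treats every byte as one single-byte character,
// so the server receives 37 "characters" instead of 19.
$misread = mb_convert_encoding($description, 'UTF-8', 'ISO-8859-1');
$seen    = mb_strlen($misread, 'UTF-8');   // 37

// A varchar(20) column keeps only the first 20 of them.
$stored = mb_substr($misread, 0, 20, 'UTF-8');

// Reading back through the same connection reverses the mapping,
// which yields the 20 bytes of the first word only.
$readBack = mb_convert_encoding($stored, 'ISO-8859-1', 'UTF-8');

var_dump($chars, $bytes, $seen, $readBack);
```

The last value is a 20-byte string holding only the first word, which matches the string(20) in the var_dump() output above: the first word occupies exactly the 20 slots of the column, so the space and the second word are cut off.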
