New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adding tests for 4 byte unicode characters #17978

Merged
merged 15 commits into from Feb 7, 2017

Conversation

Projects
None yet
10 participants
@MorrisJobke
Member

MorrisJobke commented Jul 30, 2015

  • success on SQLite and Postgres
  • failure on MySQL due to the limited charset that only supports up to 3 bytes

Use case:

2015-07-30_10-55-52

People will name their folders/files with emojis. This then will cause errors like this on mysql:

An exception occurred while executing 'UPDATE `oc_filecache` SET `storage` = ?, `path` = ?, `path_hash` = ?, `name` = ?, `parent` =? WHERE `fileid` = ?' with params ["1", "files\/\ud83d\udca9.txt", "743923d586454a40e08523f558ae2955", "\ud83d\udca9.txt", "2", 3]: SQLSTATE[22007]: Invalid datetime format: 1366 Incorrect string value: '\xF0\x9F\x92\xA9.t...' for column 'path' at row 1

cc @Raydiation @PVince81 @DeepDiver1975

ref #4513

@DeepDiver1975 DeepDiver1975 added this to the backlog milestone Jul 30, 2015

@DeepDiver1975

This comment has been minimized.

Show comment
Hide comment
@DeepDiver1975

DeepDiver1975 Jul 30, 2015

Member

failure on MySQL due to the limited charset that only supports up to 3 bytes

as discussed on IRC: this is by design - nothing we can do.

Solution for apps: encode the data
Solution for filenames: 4byte chars are not allowed

Member

DeepDiver1975 commented Jul 30, 2015

failure on MySQL due to the limited charset that only supports up to 3 bytes

as discussed on IRC: this is by design - nothing we can do.

Solution for apps: encode the data
Solution for filenames: 4byte chars are not allowed

@MorrisJobke

This comment has been minimized.

Show comment
Hide comment
@MorrisJobke

MorrisJobke Jul 30, 2015

Member

Solution for filenames: 4byte chars are not allowed

I don't think that this is a solution. I also don't like the "solution" to simply not allow : in filenames just because Windows can't handle it. It's just stupid to get driven by the easiest way. 🙈 🙊 🙉

If there is technically no way to do this on MySQL/MariaDB we should add a big warning on installation/ in the docs to not use MySQL. Better sooner than later.

But first of all I try to play around with char sets within MySQL.

Member

MorrisJobke commented Jul 30, 2015

Solution for filenames: 4byte chars are not allowed

I don't think that this is a solution. I also don't like the "solution" to simply not allow : in filenames just because Windows can't handle it. It's just stupid to get driven by the easiest way. 🙈 🙊 🙉

If there is technically no way to do this on MySQL/MariaDB we should add a big warning on installation/ in the docs to not use MySQL. Better sooner than later.

But first of all I try to play around with char sets within MySQL.

@LukasReschke

This comment has been minimized.

Show comment
Hide comment
@LukasReschke

LukasReschke Jul 30, 2015

Member

That this works at all is another bug: #17981

Member

LukasReschke commented Jul 30, 2015

That this works at all is another bug: #17981

@DeepDiver1975

This comment has been minimized.

Show comment
Hide comment
@DeepDiver1975

DeepDiver1975 Jul 30, 2015

Member

I also don't like the "solution" to simply not allow : in filenames just because Windows can't handle it. It's just stupid to get driven by the easiest way. 🙈 🙊 🙉

This is not longer the case since 8.1 - #14346

Member

DeepDiver1975 commented Jul 30, 2015

I also don't like the "solution" to simply not allow : in filenames just because Windows can't handle it. It's just stupid to get driven by the easiest way. 🙈 🙊 🙉

This is not longer the case since 8.1 - #14346

@scrutinizer-notifier

This comment has been minimized.

Show comment
Hide comment
@DeepDiver1975

This comment has been minimized.

Show comment
Hide comment
@DeepDiver1975

DeepDiver1975 Jul 30, 2015

Member

HAHA ... you failed! @MorrisJobke You did not listen to me on IRC!
👎 👎 👎 👎 👎 👎 👎

Member

DeepDiver1975 commented Jul 30, 2015

HAHA ... you failed! @MorrisJobke You did not listen to me on IRC!
👎 👎 👎 👎 👎 👎 👎

@MorrisJobke

This comment has been minimized.

Show comment
Hide comment
@MorrisJobke

MorrisJobke Jul 30, 2015

Member

HAHA ... you failed! @MorrisJobke You did not listen to me on IRC!

Please be a bit more precise... I tried to address all your concerns. 😫

Member

MorrisJobke commented Jul 30, 2015

HAHA ... you failed! @MorrisJobke You did not listen to me on IRC!

Please be a bit more precise... I tried to address all your concerns. 😫

@DeepDiver1975

This comment has been minimized.

Show comment
Hide comment
@DeepDiver1975

DeepDiver1975 Jul 30, 2015

Member

HAHA ... you failed! @MorrisJobke You did not listen to me on IRC!

Okay - I'm taking this one back .....

Member

DeepDiver1975 commented Jul 30, 2015

HAHA ... you failed! @MorrisJobke You did not listen to me on IRC!

Okay - I'm taking this one back .....

@MorrisJobke

This comment has been minimized.

Show comment
Hide comment
@MorrisJobke

MorrisJobke Jul 30, 2015

Member

We discussed this a lot and that was the result:

Pro

  • we can handle properly 4 byte characters in the DB
  • we can drop the limitation to not accept 4 byte characters in file names
  • apps doesn't need to take care of 4 byte character on their own, because the core handles it
  • we have a plan to migrate at some point in time to a proper DB encoding/collation in MySQL by default
  • MySQL doesn't have any disadvantage compared to Sqlite/Postgres anymore (both support already 4 byte charavters)

Cons

  • with the config option this is nearly untested/unused code paths that tend to be really critical once they get activated
  • we don't want to have yet another experimental feature switch
  • even more complex testing (more resources are needed)
  • switching this on by default is not an option - some web hoster instances could be get to a "not supported anymore" state because they cannot tweak their MySQL this much
  • we don't support 4 bytes character in every column (news, music app already have bug reports for this)

Current situation

  • we need to document this somewhere very visible - I can imagine that the usage of 4 byte characters (especially emojis) will increase (in filenames, titles, ...) and then having an instance where this is simply not possible by design is really bad
Member

MorrisJobke commented Jul 30, 2015

We discussed this a lot and that was the result:

Pro

  • we can handle properly 4 byte characters in the DB
  • we can drop the limitation to not accept 4 byte characters in file names
  • apps doesn't need to take care of 4 byte character on their own, because the core handles it
  • we have a plan to migrate at some point in time to a proper DB encoding/collation in MySQL by default
  • MySQL doesn't have any disadvantage compared to Sqlite/Postgres anymore (both support already 4 byte charavters)

Cons

  • with the config option this is nearly untested/unused code paths that tend to be really critical once they get activated
  • we don't want to have yet another experimental feature switch
  • even more complex testing (more resources are needed)
  • switching this on by default is not an option - some web hoster instances could be get to a "not supported anymore" state because they cannot tweak their MySQL this much
  • we don't support 4 bytes character in every column (news, music app already have bug reports for this)

Current situation

  • we need to document this somewhere very visible - I can imagine that the usage of 4 byte characters (especially emojis) will increase (in filenames, titles, ...) and then having an instance where this is simply not possible by design is really bad
@butonic

This comment has been minimized.

Show comment
Hide comment
@butonic

butonic Jul 30, 2015

Member

Hmmm, what happens when my instance supports 4byte chars (because I may be using postgres) and I share a folder to someone who is running an oc instance that does not have this activated?

Member

butonic commented Jul 30, 2015

Hmmm, what happens when my instance supports 4byte chars (because I may be using postgres) and I share a folder to someone who is running an oc instance that does not have this activated?

@BernhardPosselt

This comment has been minimized.

Show comment
Hide comment
@BernhardPosselt

BernhardPosselt Jul 30, 2015

Contributor

What does this mean? Will this fix handle the issue in news and music?

we don't support 4 bytes character in every column (news, music app already have bug reports for this)

Contributor

BernhardPosselt commented Jul 30, 2015

What does this mean? Will this fix handle the issue in news and music?

we don't support 4 bytes character in every column (news, music app already have bug reports for this)

@MorrisJobke

This comment has been minimized.

Show comment
Hide comment
@MorrisJobke

MorrisJobke Jul 30, 2015

Member

What does this mean? Will this fix handle the issue in news and music?

It would if the admin sets up mysql correctly (see sample.config.php) and enables mysql.utf8mb4 in the config.php.

Member

MorrisJobke commented Jul 30, 2015

What does this mean? Will this fix handle the issue in news and music?

It would if the admin sets up mysql correctly (see sample.config.php) and enables mysql.utf8mb4 in the config.php.

@DeepDiver1975

This comment has been minimized.

Show comment
Hide comment
@DeepDiver1975

DeepDiver1975 Nov 12, 2015

Member

rebased

@MorrisJobke I'm willing to accept this in case we can come up with an autotest.sh firing up a proper docker

Member

DeepDiver1975 commented Nov 12, 2015

rebased

@MorrisJobke I'm willing to accept this in case we can come up with an autotest.sh firing up a proper docker

@MorrisJobke MorrisJobke self-assigned this Nov 12, 2015

@DeepDiver1975 DeepDiver1975 referenced this pull request Nov 16, 2015

Closed

Meta-Issue for Unified WebDAV #20047

16 of 28 tasks complete
@karlitschek

This comment has been minimized.

Show comment
Hide comment
@karlitschek

karlitschek Nov 19, 2015

Member

Do we have a gut feeling how most mysql servers out there are configured by default?

Member

karlitschek commented Nov 19, 2015

Do we have a gut feeling how most mysql servers out there are configured by default?

@DeepDiver1975 DeepDiver1975 modified the milestones: 9.0-current, backlog Nov 19, 2015

@DeepDiver1975

This comment has been minimized.

Show comment
Hide comment
@DeepDiver1975

DeepDiver1975 Nov 19, 2015

Member

@MorrisJobke any time available on your side to take care about the docker image and autotest.sh ??

just let me know - THX

Member

DeepDiver1975 commented Nov 19, 2015

@MorrisJobke any time available on your side to take care about the docker image and autotest.sh ??

just let me know - THX

@PVince81 PVince81 added the comp:core label Nov 20, 2015

@ghost ghost referenced this pull request Dec 30, 2015

Closed

Multiple conflicts with emojis #21397

@tschloeter

This comment has been minimized.

Show comment
Hide comment
@tschloeter

tschloeter Jan 24, 2016

I followed these multibyte UTF-8 issues for a while now and I am still interested in the solution. As I see, some people have worked on that. Some time ago, someone suggested a new config parameter that told OC that the MySQL server is configured to use utf8mb4 which I would appreciate.

Anyway, at the moment I see so many open and closed issues on the topic that I don't understand what the status is.

Could someone summarize whether the issue is fixed in the most recent version or if it is planned to fix in near future? I am still stuck on my dirty-hack patched 8.0 installation and would like to update to 8.2.x.

If there is no fix available: Is it easily possible to migrate from MySQL to Postgres? In the manual I only found how to migrate from SQLite.

Thanks in advance.

Thomas

I followed these multibyte UTF-8 issues for a while now and I am still interested in the solution. As I see, some people have worked on that. Some time ago, someone suggested a new config parameter that told OC that the MySQL server is configured to use utf8mb4 which I would appreciate.

Anyway, at the moment I see so many open and closed issues on the topic that I don't understand what the status is.

Could someone summarize whether the issue is fixed in the most recent version or if it is planned to fix in near future? I am still stuck on my dirty-hack patched 8.0 installation and would like to update to 8.2.x.

If there is no fix available: Is it easily possible to migrate from MySQL to Postgres? In the manual I only found how to migrate from SQLite.

Thanks in advance.

Thomas

@ghost

This comment has been minimized.

Show comment
Hide comment
@ghost

ghost Jan 26, 2016

I would also be interested in a summary of this issue, I get the impression that this will be resolved in version 9?

Would moving to PostgreSQL be a good interim solution?

ghost commented Jan 26, 2016

I would also be interested in a summary of this issue, I get the impression that this will be resolved in version 9?

Would moving to PostgreSQL be a good interim solution?

Show outdated Hide outdated lib/private/db/connectionfactory.php
@@ -63,6 +64,12 @@ class ConnectionFactory {
),
);
public function __construct(SystemConfig $systemConfig) {

This comment has been minimized.

@DeepDiver1975

DeepDiver1975 Feb 9, 2016

Member

this will cause setup to fall apart ....

@DeepDiver1975

DeepDiver1975 Feb 9, 2016

Member

this will cause setup to fall apart ....

@DeepDiver1975

This comment has been minimized.

Show comment
Hide comment
@DeepDiver1975

DeepDiver1975 Feb 9, 2016

Member

I'm locally having issues with the activity app ....

An exception occurred while executing 'CREATE TABLE `oc_activity` (`activity_id` INT AUTO_INCREMENT NOT NULL, `timestamp` INT DEFAULT 0 NOT NULL, `priority` INT DEFAULT 0 NOT NULL, `type` VARCHAR(255) DEFAULT NULL, `user` VARCHAR(64) DEFAULT NULL, `affecteduser` VARCHAR(64) NOT NULL, `app` VARCHAR(255) NOT NULL, `subject` VARCHAR(255) NOT NULL, `subjectparams` VARCHAR(4000) NOT NULL, `message` VARCHAR(255) DEFAULT NULL, `messageparams` VARCHAR(4000) DEFAULT NULL, `file` VARCHAR(4000) DEFAULT NULL, `link` VARCHAR(4000) DEFAULT NULL, `object_type` VARCHAR(255) DEFAULT NULL, `object_id` INT DEFAULT 0 NOT NULL, INDEX activity_time (`timestamp`), INDEX activity_user_time (`affecteduser`, `timestamp`), INDEX activity_filter_by (`affecteduser`, `user`, `timestamp`), INDEX activity_filter_app (`affecteduser`, `app`, `timestamp`), PRIMARY KEY(`activity_id`)) DEFAULT CHARACTER SET utf8mb4 COLLATE utf8mb4_bin ENGINE = InnoDB ROW_FORMAT = compressed':

SQLSTATE[42000]: Syntax error or access violation: 1118 Row size too large. The maximum row size for the used table type, not counting BLOBs, is 65535. This includes storage overhead, check the manual. You have to change some columns to TEXT or BLOBs
Member

DeepDiver1975 commented Feb 9, 2016

I'm locally having issues with the activity app ....

An exception occurred while executing 'CREATE TABLE `oc_activity` (`activity_id` INT AUTO_INCREMENT NOT NULL, `timestamp` INT DEFAULT 0 NOT NULL, `priority` INT DEFAULT 0 NOT NULL, `type` VARCHAR(255) DEFAULT NULL, `user` VARCHAR(64) DEFAULT NULL, `affecteduser` VARCHAR(64) NOT NULL, `app` VARCHAR(255) NOT NULL, `subject` VARCHAR(255) NOT NULL, `subjectparams` VARCHAR(4000) NOT NULL, `message` VARCHAR(255) DEFAULT NULL, `messageparams` VARCHAR(4000) DEFAULT NULL, `file` VARCHAR(4000) DEFAULT NULL, `link` VARCHAR(4000) DEFAULT NULL, `object_type` VARCHAR(255) DEFAULT NULL, `object_id` INT DEFAULT 0 NOT NULL, INDEX activity_time (`timestamp`), INDEX activity_user_time (`affecteduser`, `timestamp`), INDEX activity_filter_by (`affecteduser`, `user`, `timestamp`), INDEX activity_filter_app (`affecteduser`, `app`, `timestamp`), PRIMARY KEY(`activity_id`)) DEFAULT CHARACTER SET utf8mb4 COLLATE utf8mb4_bin ENGINE = InnoDB ROW_FORMAT = compressed':

SQLSTATE[42000]: Syntax error or access violation: 1118 Row size too large. The maximum row size for the used table type, not counting BLOBs, is 65535. This includes storage overhead, check the manual. You have to change some columns to TEXT or BLOBs
@DeepDiver1975

This comment has been minimized.

Show comment
Hide comment
@DeepDiver1975

DeepDiver1975 Feb 10, 2016

Member

Okay - so we have yet another limitation to fight with this - http://dev.mysql.com/doc/refman/5.1/en/column-count-limit.html

This table sums up to 69628 bytes \o/

Member

DeepDiver1975 commented Feb 10, 2016

Okay - so we have yet another limitation to fight with this - http://dev.mysql.com/doc/refman/5.1/en/column-count-limit.html

This table sums up to 69628 bytes \o/

@MorrisJobke

This comment has been minimized.

Show comment
Hide comment
@MorrisJobke

MorrisJobke Feb 10, 2016

Member

This table sums up to 69628 bytes \o/

All of those constraints are only on mysql? They really want to be not used, right? :D

Member

MorrisJobke commented Feb 10, 2016

This table sums up to 69628 bytes \o/

All of those constraints are only on mysql? They really want to be not used, right? :D

@DeepDiver1975

This comment has been minimized.

Show comment
Hide comment
@DeepDiver1975

DeepDiver1975 Feb 10, 2016

Member

They really want to be not used, right? :D

no

Member

DeepDiver1975 commented Feb 10, 2016

They really want to be not used, right? :D

no

@DeepDiver1975 DeepDiver1975 modified the milestones: 9.1-next, 9.0-current Feb 15, 2016

@DeepDiver1975

This comment has been minimized.

Show comment
Hide comment
@DeepDiver1975

DeepDiver1975 Feb 3, 2017

Member

So we need to adjust the web installer as well .... ideas on how to implement this are welcome.

maybe a checkbox to enable mb4?

Member

DeepDiver1975 commented Feb 3, 2017

So we need to adjust the web installer as well .... ideas on how to implement this are welcome.

maybe a checkbox to enable mb4?

@PVince81

This comment has been minimized.

Show comment
Hide comment
@PVince81

PVince81 Feb 3, 2017

Member

The web installer is mostly used on shared hosters. Then the question is whether they do support mb4...

Does the web installer have a list of supported databases ? Maybe just add "mysqlmb4" in the list.

And now thinking of it, we'd need the same for the setup page.
That will likely clutter the UI even more :-/

Adding a checkbox is another idea but requires special UI handling for that specific database type.

Member

PVince81 commented Feb 3, 2017

The web installer is mostly used on shared hosters. Then the question is whether they do support mb4...

Does the web installer have a list of supported databases ? Maybe just add "mysqlmb4" in the list.

And now thinking of it, we'd need the same for the setup page.
That will likely clutter the UI even more :-/

Adding a checkbox is another idea but requires special UI handling for that specific database type.

@DeepDiver1975

This comment has been minimized.

Show comment
Hide comment
@DeepDiver1975

DeepDiver1975 Feb 3, 2017

Member

The web installer is mostly used on shared hosters. Then the question is whether they do support mb4...

Does the web installer have a list of supported databases ? Maybe just add "mysqlmb4" in the list.

And now thinking of it, we'd need the same for the setup page.
That will likely clutter the UI even more :-/

Adding a checkbox is another idea but requires special UI handling for that specific database type.

I'll add it to the command line first - lets see ....

Member

DeepDiver1975 commented Feb 3, 2017

The web installer is mostly used on shared hosters. Then the question is whether they do support mb4...

Does the web installer have a list of supported databases ? Maybe just add "mysqlmb4" in the list.

And now thinking of it, we'd need the same for the setup page.
That will likely clutter the UI even more :-/

Adding a checkbox is another idea but requires special UI handling for that specific database type.

I'll add it to the command line first - lets see ....

@PVince81

This comment has been minimized.

Show comment
Hide comment
@PVince81

PVince81 Feb 3, 2017

Member

In theory the Mysql setup routine (for db type "mysql") could try and detect if the database supports mb4. If it does, it automatically switches to "mysqlmb4" type.

Member

PVince81 commented Feb 3, 2017

In theory the Mysql setup routine (for db type "mysql") could try and detect if the database supports mb4. If it does, it automatically switches to "mysqlmb4" type.

@PVince81

This comment has been minimized.

Show comment
Hide comment
@PVince81

PVince81 Feb 3, 2017

Member

and with "setup" I mean the code that runs when running the setup page

Member

PVince81 commented Feb 3, 2017

and with "setup" I mean the code that runs when running the setup page

@DeepDiver1975

This comment has been minimized.

Show comment
Hide comment
@DeepDiver1975

DeepDiver1975 Feb 3, 2017

Member

There is one issue with introducing mysqlmb4 as database type:
Many code paths expect mysql - as soon as we add mysqlmb4 I'm expecting many errors

Member

DeepDiver1975 commented Feb 3, 2017

There is one issue with introducing mysqlmb4 as database type:
Many code paths expect mysql - as soon as we add mysqlmb4 I'm expecting many errors

@PVince81

This comment has been minimized.

Show comment
Hide comment
@PVince81

PVince81 Feb 3, 2017

Member

@DeepDiver1975 ok, then let's go back to the config.php switch then ? That switch could be set by default on install if the mysql install supports it.

Member

PVince81 commented Feb 3, 2017

@DeepDiver1975 ok, then let's go back to the config.php switch then ? That switch could be set by default on install if the mysql install supports it.

@DeepDiver1975

This comment has been minimized.

Show comment
Hide comment
@DeepDiver1975

DeepDiver1975 Feb 3, 2017

Member

That switch could be set by default on install if the mysql install supports it.

agreed

Member

DeepDiver1975 commented Feb 3, 2017

That switch could be set by default on install if the mysql install supports it.

agreed

Show outdated Hide outdated lib/private/Setup/MySQL.php
*/
private function supports4ByteCharset(Connection $connection) {
foreach (['innodb_file_format' => 'Barracuda', 'innodb_large_prefix' => 'ON', 'innodb_file_per_table' => 'ON'] as $var => $val) {
$result = $connection->executeQuery("SHOW VARIABLES LIKE '$var'");

This comment has been minimized.

@PVince81

PVince81 Feb 3, 2017

Member

use ILIKE because on my default setup it was barracuda and innodb_large_prefix was set to true instead of ON... crazy world

@PVince81

PVince81 Feb 3, 2017

Member

use ILIKE because on my default setup it was barracuda and innodb_large_prefix was set to true instead of ON... crazy world

This comment has been minimized.

@PVince81

PVince81 Feb 3, 2017

Member

Never mind. My local setup has the correct values as expected.

But this piece of doc https://github.com/owncloud/core/pull/17978/files#diff-2f769bc01ffefdf14eb04404218e7582R1083 needs to be adjusted accordingly then. That's where I saw the discrepancies.

@PVince81

PVince81 Feb 3, 2017

Member

Never mind. My local setup has the correct values as expected.

But this piece of doc https://github.com/owncloud/core/pull/17978/files#diff-2f769bc01ffefdf14eb04404218e7582R1083 needs to be adjusted accordingly then. That's where I saw the discrepancies.

This comment has been minimized.

@DeepDiver1975

DeepDiver1975 Feb 3, 2017

Member

FYI: ILIKE is not allowed with SHOW VARIABLES

@DeepDiver1975

DeepDiver1975 Feb 3, 2017

Member

FYI: ILIKE is not allowed with SHOW VARIABLES

@PVince81

This comment has been minimized.

Show comment
Hide comment
@PVince81

PVince81 Feb 3, 2017

Member

Something must be missing:

  1. Setup Mysql with the following:
#innodb_large_prefix=ON #yes, commented out
innodb_file_format=Barracuda
innodb_file_per_table=ON
  1. Setup OC with occ maintenance:install
  2. Change Mysql config and uncomment the large_prefix row
  3. Restart Mysql service.
  4. Run occ command:
± % occ db:convert-mysql-charset
Change collation for oc_addressbookchanges ...
Change collation for oc_addressbooks ...

                                                                                                                                
  [Doctrine\DBAL\Exception\DriverException]                                                                                     
  An exception occurred while executing 'ALTER TABLE `oc_addressbooks` CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_bin;':  
  SQLSTATE[HY000]: General error: 1709 Index column size too large. The maximum column size is 767 bytes.                       
                                                                                                                                

                                                                                                           
  [Doctrine\DBAL\Driver\PDOException]                                                                      
  SQLSTATE[HY000]: General error: 1709 Index column size too large. The maximum column size is 767 bytes.  
                                                                                                           

                                                                                                           
  [PDOException]                                                                                           
  SQLSTATE[HY000]: General error: 1709 Index column size too large. The maximum column size is 767 bytes.  

Maybe more work needed on some tables ?

Note that if I run "occ maintenance:install" directly with the large prefix enabled, the setup works fine.

Member

PVince81 commented Feb 3, 2017

Something must be missing:

  1. Setup Mysql with the following:
#innodb_large_prefix=ON #yes, commented out
innodb_file_format=Barracuda
innodb_file_per_table=ON
  1. Setup OC with occ maintenance:install
  2. Change Mysql config and uncomment the large_prefix row
  3. Restart Mysql service.
  4. Run occ command:
± % occ db:convert-mysql-charset
Change collation for oc_addressbookchanges ...
Change collation for oc_addressbooks ...

                                                                                                                                
  [Doctrine\DBAL\Exception\DriverException]                                                                                     
  An exception occurred while executing 'ALTER TABLE `oc_addressbooks` CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_bin;':  
  SQLSTATE[HY000]: General error: 1709 Index column size too large. The maximum column size is 767 bytes.                       
                                                                                                                                

                                                                                                           
  [Doctrine\DBAL\Driver\PDOException]                                                                      
  SQLSTATE[HY000]: General error: 1709 Index column size too large. The maximum column size is 767 bytes.  
                                                                                                           

                                                                                                           
  [PDOException]                                                                                           
  SQLSTATE[HY000]: General error: 1709 Index column size too large. The maximum column size is 767 bytes.  

Maybe more work needed on some tables ?

Note that if I run "occ maintenance:install" directly with the large prefix enabled, the setup works fine.

@PVince81

This comment has been minimized.

Show comment
Hide comment
@PVince81

PVince81 Feb 3, 2017

Member
  • TEST: once it works: setup mb4 DB from scratch, make a dump "mb4-direct.sqlp". Then test convert script, make a dump "mb4-converted". Then diff the dumps
Member

PVince81 commented Feb 3, 2017

  • TEST: once it works: setup mb4 DB from scratch, make a dump "mb4-direct.sqlp". Then test convert script, make a dump "mb4-converted". Then diff the dumps
@PVince81

This comment has been minimized.

Show comment
Hide comment
@PVince81

PVince81 Feb 3, 2017

Member

@DeepDiver1975 I diffed the schemas from "non-mb4" and "mb4 created at setup": there is an additional difference: ROW_FORMAT=compressed.

When trying the conversion manually with Mysql client, it works if I set the row format to compressed first:

MariaDB [owncloud]> ALTER TABLE `oc_addressbooks` CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_bin;
ERROR 1709 (HY000): Index column size too large. The maximum column size is 767 bytes.
MariaDB [owncloud]> ALTER TABLE `oc_addressbooks` CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_bin ROW_FORMAT=compressed;
ERROR 1064 (42000): You have an error in your SQL syntax; check the manual that corresponds to your MariaDB server version for the right syntax to use near 'ROW_FORMAT=compressed' at line 1
MariaDB [owncloud]> alter table `oc_addressbooks` ROW_FORMAT=COMPRESSED;
Query OK, 0 rows affected (0.06 sec)
Records: 0  Duplicates: 0  Warnings: 0

MariaDB [owncloud]> ALTER TABLE `oc_addressbooks` CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_bin;
Query OK, 1 row affected (0.08 sec)                
Records: 1  Duplicates: 0  Warnings: 0

  • TODO: add "ROW_FORMAT".
Member

PVince81 commented Feb 3, 2017

@DeepDiver1975 I diffed the schemas from "non-mb4" and "mb4 created at setup": there is an additional difference: ROW_FORMAT=compressed.

When trying the conversion manually with Mysql client, it works if I set the row format to compressed first:

MariaDB [owncloud]> ALTER TABLE `oc_addressbooks` CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_bin;
ERROR 1709 (HY000): Index column size too large. The maximum column size is 767 bytes.
MariaDB [owncloud]> ALTER TABLE `oc_addressbooks` CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_bin ROW_FORMAT=compressed;
ERROR 1064 (42000): You have an error in your SQL syntax; check the manual that corresponds to your MariaDB server version for the right syntax to use near 'ROW_FORMAT=compressed' at line 1
MariaDB [owncloud]> alter table `oc_addressbooks` ROW_FORMAT=COMPRESSED;
Query OK, 0 rows affected (0.06 sec)
Records: 0  Duplicates: 0  Warnings: 0

MariaDB [owncloud]> ALTER TABLE `oc_addressbooks` CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_bin;
Query OK, 1 row affected (0.08 sec)                
Records: 1  Duplicates: 0  Warnings: 0

  • TODO: add "ROW_FORMAT".
@PVince81

This comment has been minimized.

Show comment
Hide comment
@PVince81

PVince81 Feb 3, 2017

Member

I have added ROW_FORMAT and the command runs now:

Change collation for oc_addressbookchanges ...
Change collation for oc_addressbooks ...
Change collation for oc_appconfig ...
Change collation for oc_authtoken ...
Change collation for oc_calendarchanges ...
Change collation for oc_calendarobjects ...
Change collation for oc_calendars ...
Change collation for oc_calendarsubscriptions ...
Change collation for oc_cards ...
Change collation for oc_cards_properties ...
Change collation for oc_comments ...
Change collation for oc_comments_read_markers ...
Change collation for oc_credentials ...
Change collation for oc_dav_shares ...
Change collation for oc_external_applicable ...
Change collation for oc_external_config ...
Change collation for oc_external_mounts ...
Change collation for oc_external_options ...
Change collation for oc_file_locks ...
Change collation for oc_filecache ...
Change collation for oc_files_trash ...
Change collation for oc_group_admin ...
Change collation for oc_group_user ...
Change collation for oc_groups ...
Change collation for oc_jobs ...
Change collation for oc_migrations ...
Change collation for oc_mimetypes ...
Change collation for oc_mounts ...
Change collation for oc_preferences ...
Change collation for oc_privatedata ...
Change collation for oc_properties ...
Change collation for oc_schedulingobjects ...
Change collation for oc_share ...
Change collation for oc_share_external ...
Change collation for oc_storages ...
Change collation for oc_systemtag ...
Change collation for oc_systemtag_group ...
Change collation for oc_systemtag_object_mapping ...
Change collation for oc_trusted_servers ...
Change collation for oc_users ...
Change collation for oc_vcategory ...
Change collation for oc_vcategory_to_object ...

The result is that the schema is not 100% the same.

  • oc_federated_reshares still has ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_bin;, seems the table was omitted from the list
  • oc_migrations still has collation ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;

All other tables look fine

I suspect that the select * from information_schema is somehow omitting oc_federated_shares because that table has no string columns, only int. But for the sake of DB schema consistency, we should also convert that one.

Member

PVince81 commented Feb 3, 2017

I have added ROW_FORMAT and the command runs now:

Change collation for oc_addressbookchanges ...
Change collation for oc_addressbooks ...
Change collation for oc_appconfig ...
Change collation for oc_authtoken ...
Change collation for oc_calendarchanges ...
Change collation for oc_calendarobjects ...
Change collation for oc_calendars ...
Change collation for oc_calendarsubscriptions ...
Change collation for oc_cards ...
Change collation for oc_cards_properties ...
Change collation for oc_comments ...
Change collation for oc_comments_read_markers ...
Change collation for oc_credentials ...
Change collation for oc_dav_shares ...
Change collation for oc_external_applicable ...
Change collation for oc_external_config ...
Change collation for oc_external_mounts ...
Change collation for oc_external_options ...
Change collation for oc_file_locks ...
Change collation for oc_filecache ...
Change collation for oc_files_trash ...
Change collation for oc_group_admin ...
Change collation for oc_group_user ...
Change collation for oc_groups ...
Change collation for oc_jobs ...
Change collation for oc_migrations ...
Change collation for oc_mimetypes ...
Change collation for oc_mounts ...
Change collation for oc_preferences ...
Change collation for oc_privatedata ...
Change collation for oc_properties ...
Change collation for oc_schedulingobjects ...
Change collation for oc_share ...
Change collation for oc_share_external ...
Change collation for oc_storages ...
Change collation for oc_systemtag ...
Change collation for oc_systemtag_group ...
Change collation for oc_systemtag_object_mapping ...
Change collation for oc_trusted_servers ...
Change collation for oc_users ...
Change collation for oc_vcategory ...
Change collation for oc_vcategory_to_object ...

The result is that the schema is not 100% the same.

  • oc_federated_reshares still has ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_bin;, seems the table was omitted from the list
  • oc_migrations still has collation ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;

All other tables look fine

I suspect that the select * from information_schema is somehow omitting oc_federated_shares because that table has no string columns, only int. But for the sake of DB schema consistency, we should also convert that one.

@PVince81

This comment has been minimized.

Show comment
Hide comment
@PVince81

PVince81 Feb 3, 2017

Member

So the oc_federated_reshares column have no collation info since they are no strings:

MariaDB [owncloud]> select * from information_schema.columns where table_name='oc_federated_reshares' and table_schema='owncloud';
+---------------+--------------+-----------------------+-------------+------------------+----------------+-------------+-----------+--------------------------+------------------------+-------------------+---------------+--------------------+--------------------+----------------+-------------+------------+-------+---------------------------------+-------------------------------+
| TABLE_CATALOG | TABLE_SCHEMA | TABLE_NAME            | COLUMN_NAME | ORDINAL_POSITION | COLUMN_DEFAULT | IS_NULLABLE | DATA_TYPE | CHARACTER_MAXIMUM_LENGTH | CHARACTER_OCTET_LENGTH | NUMERIC_PRECISION | NUMERIC_SCALE | DATETIME_PRECISION | CHARACTER_SET_NAME | COLLATION_NAME | COLUMN_TYPE | COLUMN_KEY | EXTRA | PRIVILEGES                      | COLUMN_COMMENT                |
+---------------+--------------+-----------------------+-------------+------------------+----------------+-------------+-----------+--------------------------+------------------------+-------------------+---------------+--------------------+--------------------+----------------+-------------+------------+-------+---------------------------------+-------------------------------+
| def           | owncloud     | oc_federated_reshares | share_id    |                1 | NULL           | NO          | int       |                     NULL |                   NULL |                10 |             0 |               NULL | NULL               | NULL           | int(11)     | PRI        |       | select,insert,update,references |                               |
| def           | owncloud     | oc_federated_reshares | remote_id   |                2 | NULL           | NO          | int       |                     NULL |                   NULL |                10 |             0 |               NULL | NULL               | NULL           | int(11)     |            |       | select,insert,update,references | share ID at the remote server |
+---------------+--------------+-----------------------+-------------+------------------+----------------+-------------+-----------+--------------------------+------------------------+-------------------+---------------+--------------------+--------------------+----------------+-------------+------------+-------+---------------------------------+-------------------------------+

We should improve the query to also select from the information_schema.tables:

MariaDB [owncloud]> select * from information_schema.tables where table_name='oc_federated_reshares' and table_schema='owncloud';
+---------------+--------------+-----------------------+------------+--------+---------+------------+------------+----------------+-------------+-----------------+--------------+-----------+----------------+---------------------+-------------+------------+-----------------+----------+----------------+---------------+
| TABLE_CATALOG | TABLE_SCHEMA | TABLE_NAME            | TABLE_TYPE | ENGINE | VERSION | ROW_FORMAT | TABLE_ROWS | AVG_ROW_LENGTH | DATA_LENGTH | MAX_DATA_LENGTH | INDEX_LENGTH | DATA_FREE | AUTO_INCREMENT | CREATE_TIME         | UPDATE_TIME | CHECK_TIME | TABLE_COLLATION | CHECKSUM | CREATE_OPTIONS | TABLE_COMMENT |
+---------------+--------------+-----------------------+------------+--------+---------+------------+------------+----------------+-------------+-----------------+--------------+-----------+----------------+---------------------+-------------+------------+-----------------+----------+----------------+---------------+
| def           | owncloud     | oc_federated_reshares | BASE TABLE | InnoDB |      10 | Compact    |          0 |              0 |       16384 |               0 |            0 |         0 |           NULL | 2017-02-03 18:44:34 | NULL        | NULL       | utf8_bin        |     NULL |                |               |
+---------------+--------------+-----------------------+------------+--------+---------+------------+------------+----------------+-------------+-----------------+--------------+-----------+----------------+---------------------+-------------+------------+-----------------+----------+----------------+---------------+

Member

PVince81 commented Feb 3, 2017

So the oc_federated_reshares column have no collation info since they are no strings:

MariaDB [owncloud]> select * from information_schema.columns where table_name='oc_federated_reshares' and table_schema='owncloud';
+---------------+--------------+-----------------------+-------------+------------------+----------------+-------------+-----------+--------------------------+------------------------+-------------------+---------------+--------------------+--------------------+----------------+-------------+------------+-------+---------------------------------+-------------------------------+
| TABLE_CATALOG | TABLE_SCHEMA | TABLE_NAME            | COLUMN_NAME | ORDINAL_POSITION | COLUMN_DEFAULT | IS_NULLABLE | DATA_TYPE | CHARACTER_MAXIMUM_LENGTH | CHARACTER_OCTET_LENGTH | NUMERIC_PRECISION | NUMERIC_SCALE | DATETIME_PRECISION | CHARACTER_SET_NAME | COLLATION_NAME | COLUMN_TYPE | COLUMN_KEY | EXTRA | PRIVILEGES                      | COLUMN_COMMENT                |
+---------------+--------------+-----------------------+-------------+------------------+----------------+-------------+-----------+--------------------------+------------------------+-------------------+---------------+--------------------+--------------------+----------------+-------------+------------+-------+---------------------------------+-------------------------------+
| def           | owncloud     | oc_federated_reshares | share_id    |                1 | NULL           | NO          | int       |                     NULL |                   NULL |                10 |             0 |               NULL | NULL               | NULL           | int(11)     | PRI        |       | select,insert,update,references |                               |
| def           | owncloud     | oc_federated_reshares | remote_id   |                2 | NULL           | NO          | int       |                     NULL |                   NULL |                10 |             0 |               NULL | NULL               | NULL           | int(11)     |            |       | select,insert,update,references | share ID at the remote server |
+---------------+--------------+-----------------------+-------------+------------------+----------------+-------------+-----------+--------------------------+------------------------+-------------------+---------------+--------------------+--------------------+----------------+-------------+------------+-------+---------------------------------+-------------------------------+

We should improve the query to also select from the information_schema.tables:

MariaDB [owncloud]> select * from information_schema.tables where table_name='oc_federated_reshares' and table_schema='owncloud';
+---------------+--------------+-----------------------+------------+--------+---------+------------+------------+----------------+-------------+-----------------+--------------+-----------+----------------+---------------------+-------------+------------+-----------------+----------+----------------+---------------+
| TABLE_CATALOG | TABLE_SCHEMA | TABLE_NAME            | TABLE_TYPE | ENGINE | VERSION | ROW_FORMAT | TABLE_ROWS | AVG_ROW_LENGTH | DATA_LENGTH | MAX_DATA_LENGTH | INDEX_LENGTH | DATA_FREE | AUTO_INCREMENT | CREATE_TIME         | UPDATE_TIME | CHECK_TIME | TABLE_COLLATION | CHECKSUM | CREATE_OPTIONS | TABLE_COMMENT |
+---------------+--------------+-----------------------+------------+--------+---------+------------+------------+----------------+-------------+-----------------+--------------+-----------+----------------+---------------------+-------------+------------+-----------------+----------+----------------+---------------+
| def           | owncloud     | oc_federated_reshares | BASE TABLE | InnoDB |      10 | Compact    |          0 |              0 |       16384 |               0 |            0 |         0 |           NULL | 2017-02-03 18:44:34 | NULL        | NULL       | utf8_bin        |     NULL |                |               |
+---------------+--------------+-----------------------+------------+--------+---------+------------+------------+----------------+-------------+-----------------+--------------+-----------+----------------+---------------------+-------------+------------+-----------------+----------+----------------+---------------+

@PVince81

This comment has been minimized.

Show comment
Hide comment
@PVince81

PVince81 Feb 3, 2017

Member

Hmm, this doesn't do anything:

alter table oc_federated_reshares convert to character set utf8mb4 collate utf8mb4_bin;

The table keeps the wrong collation.
While it might be fine short term, if someone tries to add new string columns to it there is a slight risk that they will have the wrong collation.

Member

PVince81 commented Feb 3, 2017

Hmm, this doesn't do anything:

alter table oc_federated_reshares convert to character set utf8mb4 collate utf8mb4_bin;

The table keeps the wrong collation.
While it might be fine short term, if someone tries to add new string columns to it there is a slight risk that they will have the wrong collation.

@PVince81

This comment has been minimized.

Show comment
Hide comment
@PVince81

PVince81 Feb 3, 2017

Member

In general I'm fine merging this first, but we need to address the conversion issue directly afterwards.

Member

PVince81 commented Feb 3, 2017

In general I'm fine merging this first, but we need to address the conversion issue directly afterwards.

@DeepDiver1975

This comment has been minimized.

Show comment
Hide comment
@DeepDiver1975

DeepDiver1975 Feb 7, 2017

Member

The table keeps the wrong collation.
While it might be fine short term, if someone tries to add new string columns to it there is a slight risk that they will have the wrong collation.

my assumption is that as soon as a new column is added the collation will be adopted - but we better test this

Member

DeepDiver1975 commented Feb 7, 2017

The table keeps the wrong collation.
While it might be fine short term, if someone tries to add new string columns to it there is a slight risk that they will have the wrong collation.

my assumption is that as soon as a new column is added the collation will be adopted - but we better test this

MorrisJobke and others added some commits Jul 30, 2015

Adding tests for 4 byte unicode characters
* success on SQLite and Postgres
* failure on MySQL due to the limited charset that only supports up to 3 bytes
Add config option to update charset of mysql to utf8mb4
* fully optional
* requires additional options set in the database

@DeepDiver1975 DeepDiver1975 merged commit a7d1121 into master Feb 7, 2017

3 of 4 checks passed

continuous-integration/jenkins/pr-head This commit is being built
Details
Scrutinizer 21 updated code elements
Details
continuous-integration/travis-ci/pr The Travis CI build passed
Details
licence/cla Contributor License Agreement is signed.
Details

@DeepDiver1975 DeepDiver1975 deleted the testing-characters-in-db branch Feb 7, 2017

@PVince81

This comment has been minimized.

Show comment
Hide comment
@PVince81

PVince81 Feb 7, 2017

Member

my assumption is that as soon as a new column is added the collation will be adopted - but we better test this

@DeepDiver1975 did you test it and confirmed that it works like this ?

Member

PVince81 commented Feb 7, 2017

my assumption is that as soon as a new column is added the collation will be adopted - but we better test this

@DeepDiver1975 did you test it and confirmed that it works like this ?

@DeepDiver1975

This comment has been minimized.

Show comment
Hide comment
@DeepDiver1975

DeepDiver1975 Feb 7, 2017

Member

did you test it and confirmed that it works like this ?

not yet

Member

DeepDiver1975 commented Feb 7, 2017

did you test it and confirmed that it works like this ?

not yet

@davitol

This comment has been minimized.

Show comment
Hide comment
@davitol

davitol Mar 7, 2017

Contributor

1. Set a emoji as comment
2. Refresh the screen and look up the comment

Actual Behaviour:
The comment is not shown

oC Version: 10.0 Master Branch
BBDD: mariadb

Note: I knew later that it was needed to add some parameters to my.cnf file in order to let emojis

Contributor

davitol commented Mar 7, 2017

1. Set a emoji as comment
2. Refresh the screen and look up the comment

Actual Behaviour:
The comment is not shown

oC Version: 10.0 Master Branch
BBDD: mariadb

Note: I knew later that it was needed to add some parameters to my.cnf file in order to let emojis

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment