Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[4.0] Change collations of com_finder tables from general to unicode on MySQL databases. #28425

Merged

Conversation

richard67
Copy link
Member

@richard67 richard67 commented Mar 21, 2020

Pull Request for Issue #9361 .

Redo of Pull Request (PR) #16617 but for 4.0 and in a different way.

Summary of Changes

This Pull Request (PR) changes collations of database tables for com_finder from general to unicode collations for MySQL databases.

We had left them with general collations when we did the utf8mb4 conversion because otherwise things broke when tables from 3rd party extensions where joined to com_finder tables.

Now with J4 com_finder database tables have been restructured anyway, so most of the tables are either dropped and created again in the update sql script changed with this PR, or they are at least truncated, so it is a good chance to change the table collations now without having to do what was before tried with PR #16617 for that purpose, which was a kind of 2nd stage of the utf8mb4 conversion procedure. Especially the #__finder_terms table required a special conversion in PR #16617 because of possible duplicate records after conversion, but now in J4 this is not a problem anymore, because the #__finder_terms table is truncated anyway in the update sql, also before this PR here. That's why it now can be done just with the update sql script.

For PostgreSQL databases, it adds truncation of tables #__finder_taxonomy_map, #__finder_tokens and #__finder_tokens_aggregate in the same way as it adds it for MySQL databases.

Testing Instructions

All tests related to collations have to be done using a MySQL (or MariaDB) database.

For PostgreSQL nothing has to be done for new installation test. For the update test, just check as described if everything works as well as before.

There are 2 tests to be done:

  1. New installation
  2. Update

Instructions for new installation test

  1. On a current 4.0dev branch, apply the patch of this PR and make a new installation. If you are not woorking on a git clone you can use following installation package, which is equal to last nightly build plus the patch of this PR applied: https://test5.richard-fath.de/Joomla_4.0.0-beta1-dev-Development-Full_Package_2020-03-23_pr-28425.zip.
  2. When the update has finished, log in to backend and check that everything works.
  3. Check in your database using e.g. PhpMyAdmin that all database tables have collation utf8mb4_unicode_ci except of table #__finder_terms_common, which has collation utf8mb4_bin.

Instructions for update test

  1. Install a fresh copy of 3.10 nightly or a git clone of the current 3.10-dev branch and install some sample data, or have a 3.9 installation with much content and update that one to 3.10.
  2. Enable the Smart Search plugin, if not enabled yet, enable statistics in Smart Search options and use smart search in frontend. Make sure you have indexed content and statistics.
  3. Go to the Joomla Update Component component options, set the update channel to "Custom ULR", set the URL to https://test5.richard-fath.de/next_major_list_pr-28425.xml and the status to "Development".
  4. Update to Joomla 4.0.
  5. When the update has finished, log in to backend and check that everything works. There is an error message shown about not existing template style, this is known and happens also without this PR. Beside this there should not be any error, and no errors or warnings or notices should be shown in PHP error log.
  6. Check in your database using e.g. PhpMyAdmin that all database tables have collation utf8mb4_unicode_ci except of table #__finder_terms_common, which has collation utf8mb4_bin.

Expected result

All database tables of the Joomla CMS core have collation utf8mb4_unicode_ci except of table #__finder_terms_common, which has collation utf8mb4_bin.

Actual result

All database tables of the Joomla CMS core have collation utf8mb4_unicode_ci except of tables with names starting with #__finder. Those have either collation utf8mb4_general_ci or utf8mb4_bin.

Documentation Changes Required

None.

@wilsonge
Copy link
Contributor

Adding beta blocker to this. It's actually pretty important and this needs to be in before beta if it's going.

@richard67
Copy link
Member Author

richard67 commented Mar 21, 2020

@wilsonge Question is if it will work like it is now, or if we will run in some timout on large amounts of data when converting character set of some tables when updating. But most of the tables are deleted and recreated anyway with the update sql script.

@Hackwar What do you think? Could you have a look on the diff in changed files here and report back if you see any problems?

So or so it should be tested with an update package which contains the changes from this PR on a J 3.10 with some data for smart search, indexed stuff, search terms and so on. I'll provide such an update package as soon as I am ready for test here and will undraft. Update package done, PR undraftet.

@richard67 richard67 changed the title [4.0] [WiP] Change collations of com_finder tables from general to unicode on MySQL databases. [4.0] Change collations of com_finder tables from general to unicode on MySQL databases. Mar 21, 2020
@richard67 richard67 marked this pull request as ready for review March 21, 2020 20:06
@richard67
Copy link
Member Author

@wilsonge @Hackwar Regarding the duplicate records after conversion, which were a problem in the old PR #16617 and issue #9361 , I think we are safe here. Those were a problem in the finder terms table, which is cleared anyway in the update sql script.

The only question which remains is if the conversion of the remaining tables which are not recreated or cleared or new may lead to timeout errors. These tables are #__finder_taxonomy_map, #__finder_tokens, #__finder_tokens_aggregate and #__finder_types.

@wilsonge
Copy link
Contributor

#__finder_types should be safe. the rest potentially are problematic as they are finder data. @Hackwar or @chrisdavenport would know more though

@alikon
Copy link
Contributor

alikon commented Mar 23, 2020

maybe we can circumvent the problem truncating the data tables and with a postinstall message inform to reindex

@wilsonge
Copy link
Contributor

maybe we can circumvent the problem truncating the data tables and with a postinstall message inform to reindex

That was originally the idea anyhow. But I'm not sure how the changes made by @Hackwar in finder more generally affect that

@richard67
Copy link
Member Author

@alikon @wilsonge Since the changes by @Hackwar most of the tables are already either truncated or dropped and created again or they are new, so for those no problem. Only those I've listed above remain to be checked (minus the #__finder_types which seems to be no problem): #__finder_taxonomy_map, #__finder_tokens, #__finder_tokens_aggregate. @Hackwar @chrisdavenport Couldn't we just truncate those, too, if all the rest is already truncated?

@Hackwar
Copy link
Member

Hackwar commented Mar 23, 2020

For me this is fine. Those tables are all truncated as well or are being truncated in another place, so don't worry about those.

@richard67
Copy link
Member Author

@Hackwar Thanks for feedback. Am happy to read that.

@richard67
Copy link
Member Author

I have updated the patched full install package for the installation test and the update package behind the custom update URL for the update test, so people can test now without having to care for the other, meanwhile solved issues with updating.

@wilsonge
Copy link
Contributor

@richard67 so truncate these 3 please then this is ready for testing #__finder_taxonomy_map, #__finder_tokens, #__finder_tokens_aggregate

@richard67
Copy link
Member Author

@wilsonge As far as I understood Hannes they are truncated at other places, but to be on the safer side I'll add that to the PR here. Stay tuned.

@Hackwar
Copy link
Member

Hackwar commented Mar 23, 2020

the tokens and tokens_aggregate tables are memory tables anyway. Those are just helpers, which are cleared upon every indexing run. No need to truncate those.

@wilsonge
Copy link
Contributor

If there's data in them potentially they can cause SQL errors on changing collation?

@richard67
Copy link
Member Author

@Hackwar But it would do no harm if we truncated them in the update sql before converting character set? Just to be on the safer side.

@Hackwar
Copy link
Member

Hackwar commented Mar 23, 2020

sure, truncate them if you want.

Truncate tables `#__finder_taxonomy_map`, `#__finder_tokens` and `#__finder_tokens_aggregate` like other tables, too, to avoid SQL errors or timeouts when converting character set on MySQL or doing other schema changes on PostgreSQL.
@richard67
Copy link
Member Author

Test packages updated.

@richard67
Copy link
Member Author

Testing instructions updated to reflect the additional table truncation for PostgreSQL.

Ready for test now.

@richard67
Copy link
Member Author

Can it be that smart search is broken on 3.10-dev with PostgreSQL (PDO)?

@alikon Can you confirm that?

@alikon
Copy link
Contributor

alikon commented Mar 23, 2020

added to my to-do list

@wilsonge
Copy link
Contributor

I have tested this item ✅ successfully on 746c75b

Works for me. Data all seems to be truncated and a reindex works successfully


This comment was created with the J!Tracker Application at issues.joomla.org/tracker/joomla-cms/28425.

@wilsonge wilsonge merged commit 00aad39 into joomla:4.0-dev Mar 23, 2020
@wilsonge
Copy link
Contributor

Thanks!

@richard67 richard67 deleted the 4.0-dev-com-finder-unicode-collations branch March 23, 2020 21:58
@richard67
Copy link
Member Author

Thanks.

@zero-24 zero-24 added this to the Joomla 4.0 milestone Mar 23, 2020
@richard67
Copy link
Member Author

richard67 commented Mar 25, 2020

@wilsonge While checking another PR, I noticed that in this PR here I have forgotten file administrator/components/com_finder/sql/install.mysql.sql. Is that file really still needed? If so, I have to make a PR for that with he changes from here.

@wilsonge
Copy link
Contributor

It’s not really used but I think let’s keep it updated for now

@richard67
Copy link
Member Author

Will make PR tonight then

@richard67
Copy link
Member Author

Ah, and I meant the mysql file of course, not the postgresql ;-)

@richard67
Copy link
Member Author

@wilsonge PR for the install.XXX.sql files is #28455 . Can be tested or even merged by review.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants