[4.0] Change collations of com_finder tables from general to unicode on MySQL databases. #28425
Adding beta blocker to this. It's actually pretty important and this needs to be in before beta if it's going in.
@wilsonge The question is whether it will work as it is now, or whether we will run into timeouts on large amounts of data when converting the character set of some tables during the update. But most of the tables are deleted and recreated anyway by the update SQL script. @Hackwar What do you think? Could you have a look at the diff in the changed files here and report back if you see any problems? Either way, it should be tested with an update package which contains the changes from this PR, on a J 3.10 site with some Smart Search data (indexed items, search terms and so on).
@wilsonge @Hackwar Regarding the duplicate records after conversion, which were a problem in the old PR #16617 and issue #9361, I think we are safe here. Those were a problem in the finder terms table, which is cleared anyway in the update SQL script. The only remaining question is whether converting the remaining tables, which are neither recreated, cleared nor new, may lead to timeout errors. These tables are
Maybe we can circumvent the problem by truncating the data tables and informing users via a post-installation message that they need to reindex.
That was originally the idea anyhow. But I'm not sure how the changes made by @Hackwar in finder more generally affect that.
@alikon @wilsonge Since the changes by @Hackwar, most of the tables are already either truncated, dropped and created again, or new, so there is no problem for those. Only those I've listed above remain to be checked (minus the
For me this is fine. Those tables are all truncated as well or are being truncated in another place, so don't worry about those.
@Hackwar Thanks for the feedback. I'm happy to read that.
I have updated the patched full installation package for the installation test and the update package behind the custom update URL for the update test, so people can test now without having to care about the other issues with updating, which have meanwhile been solved.
@richard67 So please truncate these 3, then this is ready for testing: `#__finder_taxonomy_map`, `#__finder_tokens`, `#__finder_tokens_aggregate`
@wilsonge As far as I understood Hannes, they are truncated at other places, but to be on the safe side I'll add that to the PR here. Stay tuned.
The tokens and tokens_aggregate tables are memory tables anyway. Those are just helpers, which are cleared upon every indexing run. No need to truncate those.
If there's data in them, couldn't they potentially cause SQL errors when changing the collation?
@Hackwar But it would do no harm if we truncated them in the update SQL before converting the character set? Just to be on the safe side.
Sure, truncate them if you want.
Truncate tables `#__finder_taxonomy_map`, `#__finder_tokens` and `#__finder_tokens_aggregate` like other tables, too, to avoid SQL errors or timeouts when converting character set on MySQL or doing other schema changes on PostgreSQL.
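A minimal sketch of the resulting statement order on MySQL, assuming the update script first truncates these tables and then converts them with `ALTER TABLE ... CONVERT TO CHARACTER SET`. The helper `mysql_update_statements` is hypothetical: the PR itself edits static SQL update files. `#__` is Joomla's table prefix placeholder, which the database driver replaces with the configured prefix when the query runs.

```python
# Hypothetical helper, for illustration only: the PR edits static .sql files.
# It shows the intended order: empty the tables first, then convert them,
# so the conversion has no rows to rewrite and no duplicates to trip over.
TRUNCATE_FIRST = [
    "#__finder_taxonomy_map",
    "#__finder_tokens",
    "#__finder_tokens_aggregate",
]


def mysql_update_statements(tables, collation="utf8mb4_unicode_ci"):
    """Return TRUNCATE statements followed by character set conversions."""
    # Step 1: remove all rows from every table.
    statements = [f"TRUNCATE TABLE `{t}`;" for t in tables]
    # Step 2: convert the (now empty) tables to utf8mb4 with the new collation.
    statements += [
        f"ALTER TABLE `{t}` CONVERT TO CHARACTER SET utf8mb4 COLLATE {collation};"
        for t in tables
    ]
    return statements


if __name__ == "__main__":
    for stmt in mysql_update_statements(TRUNCATE_FIRST):
        print(stmt)
```

Generating all truncations before any conversion keeps the sketch simple; emitting each table's pair together would work equally well, since the tables are independent.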
Test packages updated.
Testing instructions updated to reflect the additional table truncation for PostgreSQL. Ready for test now.
Can it be that Smart Search is broken on 3.10-dev with PostgreSQL (PDO)? @alikon Can you confirm that?
Added to my to-do list.
I have tested this item ✅ successfully on 746c75b.
This comment was created with the J!Tracker Application at issues.joomla.org/tracker/joomla-cms/28425.
Thanks!
Thanks.
@wilsonge While checking another PR, I noticed that in this PR I have forgotten file
It's not really used, but I think let's keep it updated for now.
Will make a PR tonight then.
Ah, and I meant the MySQL file of course, not the PostgreSQL one ;-)
Pull Request for Issue #9361.
Redo of Pull Request (PR) #16617 but for 4.0 and in a different way.
Summary of Changes
This Pull Request (PR) changes collations of database tables for com_finder from general to unicode collations for MySQL databases.
We had left them with general collations when we did the utf8mb4 conversion, because otherwise things broke when tables from 3rd party extensions were joined to com_finder tables.
Now with J4, the com_finder database tables have been restructured anyway: most of them are either dropped and created again in the update SQL script changed by this PR, or at least truncated. This makes it a good opportunity to change the table collations now without having to do what PR #16617 previously attempted for that purpose, which was a kind of second stage of the utf8mb4 conversion procedure. In particular, the `#__finder_terms` table required a special conversion in PR #16617 because of possible duplicate records after conversion. In J4 this is no longer a problem, because the `#__finder_terms` table is truncated anyway in the update SQL, already before this PR. That's why the change can now be done with the update SQL script alone.
For PostgreSQL databases, this PR adds truncation of tables `#__finder_taxonomy_map`, `#__finder_tokens` and `#__finder_tokens_aggregate` in the same way as it does for MySQL databases.
Testing Instructions
All tests related to collations have to be done using a MySQL (or MariaDB) database.
For PostgreSQL, nothing has to be done for the new installation test. For the update test, just check as described that everything works as well as before.
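To check the collations on MySQL or MariaDB, one option is to read `TABLE_COLLATION` from `information_schema.TABLES` and compare it with the expected result of this PR. The sketch below assumes a table prefix such as the hypothetical `jos_` and only does the comparison; running the query is left to whichever client you use. It cannot tell core tables apart from third-party tables that share the same prefix.

```python
# Query to run in a MySQL/MariaDB client against the Joomla database.
# It lists every table of the current schema with its default collation.
COLLATION_QUERY = """
SELECT TABLE_NAME, TABLE_COLLATION
FROM information_schema.TABLES
WHERE TABLE_SCHEMA = DATABASE()
"""

# Expected result of this PR: everything unicode, except one binary table.
EXPECTED_DEFAULT = "utf8mb4_unicode_ci"
EXPECTED_EXCEPTIONS = {"finder_terms_common": "utf8mb4_bin"}


def unexpected_collations(rows, prefix):
    """Return (table, collation) pairs that differ from the expected result.

    rows: iterable of (TABLE_NAME, TABLE_COLLATION) tuples from the query.
    prefix: the site's table prefix, e.g. the hypothetical "jos_".
    """
    problems = []
    for table, collation in rows:
        if not table.startswith(prefix):
            continue  # not a table of this Joomla installation
        name = table[len(prefix):]
        expected = EXPECTED_EXCEPTIONS.get(name, EXPECTED_DEFAULT)
        if collation != expected:
            problems.append((table, collation))
    return problems


if __name__ == "__main__":
    # Made-up sample rows, not real output.
    sample = [
        ("jos_content", "utf8mb4_unicode_ci"),
        ("jos_finder_terms_common", "utf8mb4_bin"),
        ("jos_finder_links", "utf8mb4_general_ci"),
    ]
    print(unexpected_collations(sample, "jos_"))
```

With the patch applied, the expected result corresponds to an empty list; without it, any `#__finder` table that still has a different collation is reported.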
There are 2 tests to be done:
Instructions for new installation test
… `utf8mb4_unicode_ci` except for table `#__finder_terms_common`, which has collation `utf8mb4_bin`.
Instructions for update test
… `utf8mb4_unicode_ci` except for table `#__finder_terms_common`, which has collation `utf8mb4_bin`.
Expected result
.Expected result
All database tables of the Joomla CMS core have collation
utf8mb4_unicode_ci
except of table#__finder_terms_common
, which has collationutf8mb4_bin
.Actual result
All database tables of the Joomla CMS core have collation `utf8mb4_unicode_ci` except for tables with names starting with `#__finder`. Those have either collation `utf8mb4_general_ci` or `utf8mb4_bin`.
Documentation Changes Required
None.