Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add utf8mb4 unicode conversion for com_finder #16617

Closed
wants to merge 4 commits into from

Conversation

csthomas
Copy link
Contributor

@csthomas csthomas commented Jun 10, 2017

Pull Request for Issue #9361 .

Summary of Changes

Convert #__finder_xxx tables to utf8mb4_unicode_ci collation

Testing Instructions

See #9361

Upgrade joomla from version 3.4.8, 3.5.x or 3.7.2 still works.
Joomla should contain contents with words like:

  • ausschließlich
  • ausschliesslich

Details:

  1. Install joomla 3.7.2 or staging
  2. Enable plg_content_finder
  3. Create article with problematic words mentioned above.
  4. Install patch, for example by com_patchtester
  5. Go to Extensions->Manage->Database
    You should see message "The Joomla! Smart Search database tables (in #__finder...) have not been converted yet to UTF-8 Multibyte (utf8mb4)."
  6. Click on fix button.
  7. Database is OK.

Expected result

#__finder_xxx tables has unicode collation.
Upgrade OK

Actual result

Missing unicode collation in #__finder_xxx tables.

Documentation Changes Required

I do not know.

@ghost
Copy link

ghost commented Jun 11, 2017

Update from 3.5.1 > 3.7.2, no Database-Fix necessary, "ausschließlich", "ausschliesslich" and "hätte" is found as before Update.

But in phpMyAdmin Tabel looks same as before Update:
bildschirmfoto 2017-06-11 um 12 21 09

@richard67
Copy link
Member

We had excluded the change from general to unicode collation for the com_finder tables because in some languages it has caused "duplicate index" problem after the conversion. To fix that it would have required to force a complete rebuild of the index. We have had discussions with the com_finder guru, @chrisdavenport , and we concluded to leave com_finder tables with general collations in order not to provoke problems in certain languages (I think it was Greece) where things have been OK before.


This comment was created with the J!Tracker Application at issues.joomla.org/tracker/joomla-cms/16617.

@wilsonge
Copy link
Contributor

I know - the example with

ausschließlich
ausschliesslich

was the example we struggled with. The code in script.php here is designed to fix that problem we encountered.

@ghost
Copy link

ghost commented Jun 11, 2017

should i mark the Test successfully?

@richard67
Copy link
Member

To be honest I am not sure yet if that will work, and I am not sure yet if I will find to review those many changes.

@richard67
Copy link
Member

Well a first look on the code looks OK to me. But I have not checked yet how it will be shown in the "Extensions -> Manage -> Database" view. It could be confusing if people are told they have not been converted to utf8mb4 yet if they were, so maybe there should be a separate check with a separate message text telling that it is conversion to unicode collation of the com_finder tables which has to be done.


This comment was created with the J!Tracker Application at issues.joomla.org/tracker/joomla-cms/16617.

@richard67
Copy link
Member

And I still would like to have @chrisdavenport 's opinion on if that could cause problems e.g. for Greece.


This comment was created with the J!Tracker Application at issues.joomla.org/tracker/joomla-cms/16617.

@richard67
Copy link
Member

@franz-wohlkoenig From my point of view your test is unsuccessful because it seems that @csthomas 's scripts did not run.

@ghost
Copy link

ghost commented Jun 11, 2017

I have tested this item 🔴 unsuccessfully on a3feec1


This comment was created with the J!Tracker Application at issues.joomla.org/tracker/joomla-cms/16617.

@chrisdavenport
Copy link
Contributor

Looks like it should work in theory (haven't tested it). My only concern would be on sites with substantial numbers of entries in the terms table. Copying all that data from one table to another and back again and then building a new index could take a lot of time. What are the consequences if the request times out?

@joomla-cms-bot joomla-cms-bot added the Language Change This is for Translators label Jun 19, 2017
@csthomas
Copy link
Contributor Author

Based on this https://stackoverflow.com/questions/4685173/delete-all-duplicate-rows-except-for-one-in-mysql I think this is not a bad solution.

Maybe creating a copy of structure and then copy data only once would be better. I will check it later.

@csthomas
Copy link
Contributor Author

At above commits first I want to use mainly UPDATE and DELETE queries but this queries does not use indexed column term and require some joins which can slow down all conversion.

Then I back to simpler INSERT INTO ... SELECT ... ON DUPLICATE KEY UPDATE.
It takes more memory (another table for a moment) but should be faster.

Disadvantage:
This PR does not update #__finder_links_terms[0-9a-f] tables.

@csthomas
Copy link
Contributor Author

csthomas commented Nov 4, 2017

As nobody need it I close it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Language Change This is for Translators
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants