Unreliable search results in 7.12 with basic and Lucene search #9399

Open
SupersonicWaffle opened this issue Dec 21, 2021 · 13 comments
Labels
Area: Module Issues & PRs related to modules that do not have specific label Type: Bug Bugs within the core SuiteCRM codebase

Comments

@SupersonicWaffle

Search results in 7.12 seem to be unreliable, but I can't quite make out a pattern.
Over at the community forums, @Mac-Rae advised me to create an issue, as he can confirm that problems with search seem to exist in 7.12.

Issue

Some examples of unreliable results include:

  • Searching for part of a name will not match the contact record
  • Using an umlaut (German letter ä, ö or ü) will not return any results with basic search
  • Using an umlaut with Lucene will return all sorts of records but not actual matches. Searching for Müller will return all sorts of records that contain Müller, but not contacts with the last name Müller.

Expected Behavior

Search results prior to 7.12 have been more reliable

Actual Behavior

Search results don't include actual matches

Steps to Reproduce

Use basic or advanced search

Context

Users are reporting that having an unreliable search is making work very difficult

Your Environment

  • SuiteCRM Version used: 7.12.1
  • Browser name and version (e.g. Chrome Version 51.0.2704.63 (64-bit)): Microsoft Edge Version 96.0.1054.53
  • Environment name and version: MySQL, PHP 7.4.3
  • Operating System and version (e.g Ubuntu 16.04): Ubuntu 20.04
@Mac-Rae Mac-Rae added Area: Module Issues & PRs related to modules that do not have specific label Type: Bug Bugs within the core SuiteCRM codebase labels Dec 21, 2021
@JanSiero
Contributor

Seems to be a duplicate of #9191

@Mac-Rae
Contributor

Mac-Rae commented Dec 22, 2021

Looks to be. I will leave this open for now to ensure specific cases are tested.

@SuiteBot

SuiteBot commented Jan 4, 2022

@johnM2401
Contributor

Hey @SupersonicWaffle

I'm having trouble replicating some of your issues.

I have been able to partly replicate this search issue:
"Using an umlaut (German letter ä, ö or ü) will not return any results with basic search"
I could reproduce it with the special character ž (but, oddly enough, not with ü).

I was only able to do this on data that existed Pre-upgrade, while using PHP 7.4.3


So I have some questions that might help clarify the steps needed to replicate this issue:

  • Do you have these issues searching on data that existed Pre-Upgrade?

  • Do you have these issues searching on data that existed Post-Upgrade?
    (i.e., if you create a record now that contains "Müller", do you see it in search results as expected?)

    Or, does this issue appear on data from both pre and post 7.11.18->7.12.x upgrade?


  • Do you know what DB Collation format you are using?
    (i.e., the default of "utf8_general_ci" or something more specific to your region?)

  • Do you know if this issue appears when done through another PHP Version?

  • Do you see any search results if you search using Non-UTF8 versions of the characters?
    (i.e., searching "Muller" rather than "Müller")

  • If you backspace on a UTF8 character, does it transform to the non-UTF8 version of the character?
    (i.e., backspacing 'ü' will transform it into 'u')


Any of the above questions that you can answer would be very greatly appreciated, and should help us get closer to narrowing this down

(Also, any further pertinent information such as Screenshots/Steps to replicate would also help greatly)

Thanks!
John

@SupersonicWaffle
Author

SupersonicWaffle commented Jan 11, 2022

Hi @johnM2401

thanks for your response and sorry for the late reply.

> Do you have these issues searching on data that existed Pre-Upgrade?

I just checked: basic search returns no results at all, while Lucene seems to return only post-upgrade data.

> Do you know what DB Collation format you are using?

DEFAULT_COLLATION_NAME shows as utf8_general_ci

> Do you know if this issue appears when done through another PHP Version?

I don't know. The server is currently running PHP 7.4.3 on Ubuntu. If necessary, I can clone the VM and upgrade.

> Do you see any search results if you search using Non-UTF8 versions of the characters?

Interestingly, I do with basic search. Not with Lucene.

> If you backspace on a UTF8 character, does it transform to the non-UTF8 version of the character?

I'm not sure I understand your question.
Do you mean, if I type 'Mü' into the search bar and hit backspace, does it transform into 'Mu'?
If that's what you mean, then no, it doesn't do that.

If you have any further questions, I'm glad to help.
I apologize in advance for possibly being unresponsive as I have a baby on the way that is due soon.

Best Regards

@pgorod
Contributor

pgorod commented Jan 11, 2022

The first thing I would check, for this Issue, is if the search string is reaching the Lucene search code unchanged by our over-zealous string clean-ups.
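One way to run that check is to compare the raw bytes of the search string at the point where it is typed with the bytes at the point where the search code receives it. The snippet below is a minimal sketch, not SuiteCRM code: `describeBytes` and `wasAltered` are hypothetical helper names, and `$received` is a hand-made stand-in for whatever the search code actually gets.

```php
<?php
// Hypothetical debugging helpers; neither function exists in SuiteCRM.

// Render a string as its byte count plus a hex dump, so that visually
// identical strings with different encodings can be told apart.
function describeBytes(string $s): string
{
    // strlen() counts bytes, not characters.
    return sprintf('%d bytes: %s', strlen($s), bin2hex($s));
}

// Byte-wise comparison of the value before and after any clean-up step.
function wasAltered(string $before, string $after): bool
{
    return $before !== $after;
}

$typed    = "M\u{00FC}ller";   // "Müller" with a precomposed ü (U+00FC)
$received = "Mu\u{0308}ller";  // stand-in: u followed by combining diaeresis (U+0308)

echo describeBytes($typed), PHP_EOL;
echo describeBytes($received), PHP_EOL;
echo wasAltered($typed, $received) ? 'altered' : 'unchanged', PHP_EOL;
```

In a real installation the two values would be logged at the form input and just before the query is built, rather than hard-coded as they are here.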

@attrib

attrib commented Jan 27, 2022

For the basic search, I think I can give some pointers on what happens, after a couple of hours of debugging.

With 7.11.19 the AntiXSS library was added.

Internally, this performs a Unicode decomposition (https://en.wikipedia.org/wiki/Unicode_equivalence).
So the search query is "transformed" from Mü to Mu" (u" is used in this issue just to show the decomposed form of ü).

Now, when doing a normal SQL query with lastname like "%Mü%" and one with lastname like "%Mu"%", I get back different results!
That's why the search behaves unreliably.

In our case we converted all our tables to utf8mb4 a while back. In utf8mb4, Mu" can be saved into the DB, while I believe (untested!) that with utf8 it's written as Mü in the database. (Can somebody test this?)

When doing the queries lastname like "%Mü%" and lastname like "%Mu"%", I get back different results, depending on whether the bean was saved before or after our utf8mb4 conversion and upgrade to 7.11.19. This is because saving a bean with >= 7.11.19 transforms all input data with AntiXSS, resulting in decomposed UTF-8, and utf8mb4 can save this in the DB.
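The difference between the two forms can be seen directly in PHP. This is a small illustrative sketch, assuming the intl extension is installed (it provides the `Normalizer` class); it does not reproduce what AntiXSS does internally, only the effect of decomposition on a string like "Müller".

```php
<?php
// Assumes the PHP intl extension, which provides the Normalizer class.

$composed   = "M\u{00FC}ller";                                        // NFC: ü is one code point (U+00FC)
$decomposed = Normalizer::normalize($composed, Normalizer::FORM_D);   // NFD: u followed by U+0308

// The two strings look the same on screen but are different byte sequences.
var_dump($composed === $decomposed);                 // bool(false)
var_dump(strlen($composed), strlen($decomposed));    // int(7) int(8)

// A byte-wise substring search for the composed "Mü" misses the decomposed value.
var_dump(strpos($decomposed, "M\u{00FC}"));          // bool(false)

// Normalizing back to NFC makes the values comparable again.
var_dump(Normalizer::normalize($decomposed, Normalizer::FORM_C) === $composed);  // bool(true)
```

How a database LIKE comparison treats the two forms depends on the column's character set and collation, so the byte-level mismatch shown here is only part of the picture.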

My quick fix at the moment was to edit SearchForm::generateSearchWhere (SearchForm2.php, around line 1353):

// field is not last name or this is not from global unified search, so do normal where clause
$where .= $db_field . " like " . $this->seed->db->quoted(sql_like_string($field_value, $like_char));

// quick fix: if the value is not already in NFC (composed) form, also match its NFC form
$normalized_value = Normalizer::normalize($field_value, Normalizer::NFC);
if ($normalized_value !== false && $normalized_value !== $field_value) {
    $where .= ' OR ' . $db_field . " like " . $this->seed->db->quoted(sql_like_string($normalized_value, $like_char));
}

I'm aware that this only fixes searches where the like operator is used, and not the other cases handled in generateSearchWhere, but at least it works for now.
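The same idea can be sketched as a standalone helper that matches both the composed and the decomposed form and wraps the alternatives in parentheses, so the OR cannot interfere with surrounding AND conditions. This is only a sketch under assumptions: `buildLikeClause` is a hypothetical name, `$quote` stands in for `$this->seed->db->quoted()`, and a trailing `%` is appended directly as a simplification instead of calling `sql_like_string`. It also assumes the intl extension.

```php
<?php
// Hypothetical helper, not part of SuiteCRM. Requires the intl extension.
function buildLikeClause(string $dbField, string $value, callable $quote): string
{
    // Collect the value as typed plus its NFC and NFD forms, without duplicates.
    $variants = [$value];
    foreach ([Normalizer::FORM_C, Normalizer::FORM_D] as $form) {
        $normalized = Normalizer::normalize($value, $form);
        // normalize() returns false for invalid input; skip that case.
        if ($normalized !== false && !in_array($normalized, $variants, true)) {
            $variants[] = $normalized;
        }
    }

    $parts = [];
    foreach ($variants as $variant) {
        // Simplification: append a trailing wildcard directly.
        $parts[] = $dbField . ' LIKE ' . $quote($variant . '%');
    }

    // Parenthesize when there is more than one alternative.
    return count($parts) > 1 ? '(' . implode(' OR ', $parts) . ')' : $parts[0];
}

// Illustrative quoting only; real code must use the database driver's quoting.
$quote = function (string $s): string {
    return "'" . str_replace("'", "''", $s) . "'";
};

echo buildLikeClause('contacts.last_name', "M\u{00FC}", $quote), PHP_EOL;
echo buildLikeClause('contacts.last_name', 'Muller', $quote), PHP_EOL;
```

For plain ASCII input all three forms are identical, so the helper emits a single condition and the query is unchanged.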

@pgorod
Contributor

pgorod commented Jan 28, 2022

Thanks for this. Where exactly is the AntiXSS library intervening in this request?

Why are we doing an XSS clean-up of strings that are meant for the database, not for HTML output?

@attrib

attrib commented Jan 28, 2022

For me the issue is fixed with 7.12.3, as #9191 landed yesterday. @SupersonicWaffle should test it as well, to verify that he had the same issue as I did.

@pgorod It happens during "autoload"; see #9191 (comment)

@SupersonicWaffle
Author

That's great news.
I will test this and get back to you

@mattlorimer
Member

@SupersonicWaffle you should note there is a known critical issue with the utf8 fix script, #9482, that we are currently addressing.

@schewiola

> @SupersonicWaffle you should note there is a known critical issue with the utf8 fix script, #9482, that we are currently addressing.

Hey @mattlorimer, it seems that issue is resolved? Can you confirm it should be safe to update now?

@mattlorimer
Member

Yes, it should be safe to upgrade and use the utf8 fix now.

9 participants