This repository has been archived by the owner on Apr 26, 2024. It is now read-only.

User directory search fails for certain letters #2931

Closed
cuibonobo opened this issue Mar 1, 2018 · 6 comments

@cuibonobo

Description

I have enabled the search_all_users option for the user directory search (and applied the #2831 patch to fix the internal server error). I've found that the user directory search works very well except when searching for the single letters a, s, and t. Searching for other single letters of the alphabet produces the correct results.

Steps to reproduce

POST to https://matrix.floydcounty.tv/_matrix/client/r0/user_directory/search with the proper authentication and the following JSON body:

{
	"search_term": "t"
}

I expect to receive a list of users whose display name or username starts with the letter t (there would be at least 10 results on my server). Instead, I get an empty result:

{
    "limited": false,
    "results": []
}

POSTing with some other letter, for example j, will return a list of users as expected.
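For anyone wanting to reproduce this from a script, here is a minimal sketch using only the Python standard library. It assumes the usual Matrix `Authorization: Bearer` header for "proper authentication"; `ACCESS_TOKEN` is a placeholder and `build_search_request` is a hypothetical helper name, not something from Synapse.

```python
import json
import urllib.request


def build_search_request(base_url, access_token, search_term):
    """Build (but do not send) a user directory search request."""
    body = json.dumps({"search_term": search_term}).encode("utf-8")
    return urllib.request.Request(
        url=base_url + "/_matrix/client/r0/user_directory/search",
        data=body,
        headers={
            # Assumption: standard Matrix bearer-token authentication.
            "Authorization": "Bearer " + access_token,
            "Content-Type": "application/json",
        },
        method="POST",
    )


# ACCESS_TOKEN is a placeholder; substitute a real token before sending.
request = build_search_request("https://matrix.floydcounty.tv", "ACCESS_TOKEN", "t")

# Sending the request needs a valid token and network access:
# with urllib.request.urlopen(request) as response:
#     print(json.load(response))
```

Swapping `"t"` for `"j"` in the call above gives the comparison request described in the report.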

The homeserver log doesn't show anything unusual for this particular request.

Version information

  • Version: Synapse/0.26.0
  • Install method: pip
  • Platform: Ubuntu 16.04.3 LTS running on a Xen VM
@turt2live
Member

This sounds eerily similar to element-hq/element-web#4950

@neilisfragile neilisfragile added the z-p2 (Deprecated Label) label Mar 2, 2018
@cuibonobo
Author

cuibonobo commented Mar 9, 2018

@turt2live: I think you're onto something!

Here are my results when I search for wil:

[Screenshot: 2018-03-09 15:57:46]

...and here's what happens when I search for will:

[Screenshot: 2018-03-09 15:58:05]

Seems like the database is filtering search terms for 'common' words.

EDIT: I should also mention that I'm using PostgreSQL as my DB. My guess is that Postgres's full-text search feature is being 'helpful' and removing common words from the search. Is there a way to still take advantage of the weighted results of the full-text search without common words?

@cuibonobo
Author

cuibonobo commented Mar 9, 2018

The relevant portion seems to be here:

sql = """
    SELECT d.user_id, display_name, avatar_url
    FROM user_directory_search
    INNER JOIN user_directory AS d USING (user_id)
    %s
    WHERE
        %s
        AND vector @@ to_tsquery('english', ?)
    ORDER BY
        (CASE WHEN s.user_id IS NOT NULL THEN 4.0 ELSE 1.0 END)
        * (CASE WHEN display_name IS NOT NULL THEN 1.2 ELSE 1.0 END)
        * (CASE WHEN avatar_url IS NOT NULL THEN 1.2 ELSE 1.0 END)
        * (
            3 * ts_rank_cd(
                '{0.1, 0.1, 0.9, 1.0}',
                vector,
                to_tsquery('english', ?),
                8
            )
            + ts_rank_cd(
                '{0.1, 0.1, 0.9, 1.0}',
                vector,
                to_tsquery('english', ?),
                8
            )
        )
        DESC,
        display_name IS NULL,
        avatar_url IS NULL
    LIMIT ?
""" % (join_clause, where_clause)
args = (user_id, full_query, exact_query, prefix_query, limit + 1,)

Indeed, if I check the English stopwords list (which lives at /usr/share/postgresql/9.5/tsearch_data/english.stop on Ubuntu 16.04.3 LTS), the letters a, s, and t are listed, as well as the word will.
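To make the effect concrete, here is a rough simulation in Python of what stop-word removal does to these search terms. It is only a sketch: the stop-word set holds just the four entries named in this thread (not the full `english.stop` file), and it ignores stemming and everything else PostgreSQL does when parsing a query. `surviving_terms` is a hypothetical name.

```python
# Excerpt only: the entries from english.stop that are named in this thread.
STOP_WORDS_EXCERPT = {"a", "s", "t", "will"}


def surviving_terms(search_term, stop_words=STOP_WORDS_EXCERPT):
    """Return the words of search_term that are not stop words."""
    return [word for word in search_term.lower().split() if word not in stop_words]


for term in ("t", "j", "wil", "will"):
    # An empty list means nothing is left to match against, hence no results.
    print(term, "->", surviving_terms(term))
```

Under these assumptions, `t` and `will` leave nothing to search for, while `j` and `wil` pass through untouched, which matches the behaviour seen above.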

I did a bit of research and it's possible to do full-text search queries without using stopwords, but it involves creating a new dictionary, creating a configuration that uses that dictionary, and possibly creating an index for the new configuration. Once all of that is done, the first parameter of to_tsquery in the above-referenced lines would change from the default english configuration to the name of the configuration with the stopwords removed.
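As a sketch of those steps, the helper below just assembles the PostgreSQL statements as strings; it does not connect to a database, and I have not run the statements against a Synapse install. The names `english_nostop` and `english_stem_nostop` are hypothetical. The approach assumed here is a Snowball dictionary declared without a `StopWords` parameter, swapped into a copy of the built-in `english` configuration.

```python
def nostop_config_statements(config_name="english_nostop",
                             dict_name="english_stem_nostop"):
    """Return SQL that defines an English configuration without stop words.

    Both default names are hypothetical; pick whatever suits your schema.
    """
    return [
        # Snowball stemmer with no StopWords parameter, so nothing is dropped.
        f"CREATE TEXT SEARCH DICTIONARY {dict_name} "
        f"(TEMPLATE = snowball, LANGUAGE = english);",
        # Start from the built-in configuration to keep its parser and mappings.
        f"CREATE TEXT SEARCH CONFIGURATION {config_name} "
        f"(COPY = pg_catalog.english);",
        # Swap the stock stemming dictionary for the stop-word-free one.
        f"ALTER TEXT SEARCH CONFIGURATION {config_name} "
        f"ALTER MAPPING REPLACE english_stem WITH {dict_name};",
    ]


for statement in nostop_config_statements():
    print(statement)
```

The query would then call `to_tsquery('english_nostop', ?)`. Any stored vectors built with the old configuration would likely need rebuilding as well, since a vector and a query produced by different configurations may not line up.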

Given all the work that fixing this would involve, I wonder if full-text search is the right solution for user directory searches.

@awesome-manuel
Contributor

I agree. Full-text search features such as matching derived words are not really useful when searching for user names. This should be changed to simple pattern matching on the relevant columns.
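A minimal sketch of what such pattern matching could look like, using an in-memory SQLite database so it runs anywhere. The two-column `user_directory` table and the sample users are made up for illustration and are not Synapse's real schema; `escape_like` and `search_users` are hypothetical helpers. Note that this gives up the weighted ranking of the full-text query.

```python
import sqlite3


def escape_like(term):
    """Escape LIKE wildcards so user input is matched literally."""
    return term.replace("\\", "\\\\").replace("%", "\\%").replace("_", "\\_")


def search_users(conn, term, limit=10):
    """Return users whose display name or user ID localpart starts with term."""
    pattern = escape_like(term.lower()) + "%"
    return conn.execute(
        """
        SELECT user_id, display_name
        FROM user_directory
        WHERE lower(display_name) LIKE ? ESCAPE '\\'
           OR lower(user_id) LIKE ? ESCAPE '\\'
        ORDER BY user_id
        LIMIT ?
        """,
        # User IDs start with "@", so prefix the second pattern accordingly.
        (pattern, "@" + pattern, limit),
    ).fetchall()


# Made-up sample data for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE user_directory (user_id TEXT PRIMARY KEY, display_name TEXT)")
conn.executemany(
    "INSERT INTO user_directory VALUES (?, ?)",
    [
        ("@tina:example.org", "Tina"),
        ("@will:example.org", "Will"),
        ("@jane:example.org", None),
        ("@bob:example.org", "The Bob"),
    ],
)

print(search_users(conn, "t"))
print(search_users(conn, "will"))
```

With plain prefix matching there is no stop-word list involved, so `t` and `will` behave like any other search term.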
@ara4n

@chagai95
Contributor

This is pretty interesting. Is this going to be fixed sometime soon? Is it a lot of work? It seems a different method should be used instead of to_tsquery, but I'm not sure that's an easy task...

@babolivier
Contributor

babolivier commented Dec 17, 2020

This bug has been fixed in #8959
