Fulltextsearch Fails with Usernames and Group Names Containing Special Characters #768

Open
wickeddoc87 opened this issue Aug 28, 2023 · 2 comments

wickeddoc87 commented Aug 28, 2023

Issue Title: Fulltextsearch Fails with Usernames and Group Names Containing Special Characters

Description:

When using the Fulltextsearch feature in Nextcloud, I've consistently observed that the search doesn't operate correctly for usernames and group names containing special characters, spaces, or specific combinations of letters and numbers. Specifically:

Usernames and groups like "test", "test3", "testgroup", and "test3group" operate as expected.
Usernames and group names such as "test 3", "Test_3", "Test@3", "test group" do not function properly.

This issue has been consistently reproducible over the past 2 weeks.

Steps to Reproduce:

Set up a new Nextcloud instance.
Install and configure the Fulltextsearch plugin.
Create users and groups with varying name patterns:
    test, test3, testgroup, test3group: search works as expected
    test 3, Test_3, Test@3, "test group": search doesn't work

Execute a full-text search for each user and group.

Expected Result:

The Fulltextsearch should function consistently, irrespective of the username or group name's structure or content.

Actual Result:

Fulltextsearch operates only for usernames and group names that are simple strings made up of letters and numbers. It fails when the name includes special characters or spaces.

Environment:

Nextcloud version: 27.0.2.1
elasticsearch version: 8.8.1
Full text search version 27.0.1
Full text search - Elasticsearch Platform version: 27.0.2
Full text search - Files version: 27.0.1
Full text search - Files - Tesseract OCR version: 27.0.0
Database: Postgres 15

Additional Notes:

The user experience of Fulltextsearch could see significant improvement if this issue is addressed. Many users and groups tend to have names with spaces and special characters. As a result, there's a prevalent misconception that Fulltextsearch operates exclusively for admin users or admin groups, which is not the case.

wickeddoc87 changed the title from "Fulltextsearch Fails with Usernames Containing Special Characters" to "Fulltextsearch Fails with Usernames and Group Names Containing Special Characters" on Aug 28, 2023

vbier commented Sep 12, 2023

I have tried to understand the source code. As far as I can tell, the function generateSearchQueryAccess in https://github.com/nextcloud/fulltextsearch_elasticsearch/blob/master/lib/Service/SearchMappingService.php#L307 restricts the documents found by permissions: a document has to be owned by the current user, shared with them, public, or shared with a group or circle they are a member of.

When I look at my indexed document, I can see that the owner matches the current user ID, but the document is still not found. If I change the query from a term query to a match query, all documents seem to be found as expected. Reading the Elasticsearch documentation, I cannot see why this should be the case for a keyword field, so this does not seem to be the proper fix.

But I do not have the knowledge to investigate this further. Maybe somebody else can pick up here. IMHO the problem is not in the fulltextsearch code, but rather in the fulltextsearch_elasticsearch code.

vbier commented Sep 13, 2023

After checking the index mapping, I realized that the users and owner fields are of type text, which completely explains why a term query cannot find user IDs containing blanks or special characters:

        "users" : {
          "type" : "text",
          "fields" : {
            "keyword" : {
              "type" : "keyword",
              "ignore_above" : 256
            }
          }
        },
        "owner" : {
          "type" : "text",
          "fields" : {
            "keyword" : {
              "type" : "keyword",
              "ignore_above" : 256
            }
          }
        }
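
To illustrate why the text type matters here, the following is a minimal Python sketch, not Elasticsearch or plugin code. The `analyze` helper is a hypothetical, rough stand-in for the standard analyzer (lowercase, split on non-word characters), and the two `*_query_hits` functions are assumptions that only model how a term query and a match query compare against the indexed tokens of a text field.

```python
import re


def analyze(value: str) -> list[str]:
    """Rough stand-in for the standard analyzer: lowercase, split on non-word characters."""
    return re.findall(r"\w+", value.lower())


def term_query_hits(indexed_value: str, query_value: str) -> bool:
    """A term query is not analyzed: the raw query value must equal an indexed token."""
    return query_value in analyze(indexed_value)


def match_query_hits(indexed_value: str, query_value: str) -> bool:
    """A match query is analyzed; with the default OR operator, any shared token is a hit."""
    return bool(set(analyze(query_value)) & set(analyze(indexed_value)))


# Each user searches for documents they own themselves.
for user_id in ["test", "test3", "test 3", "Test_3", "Test@3"]:
    print(
        f"{user_id!r}: tokens={analyze(user_id)} "
        f"term={term_query_hits(user_id, user_id)} "
        f"match={match_query_hits(user_id, user_id)}"
    )
```

Under these assumptions, the term lookup only succeeds for IDs that are a single lowercase token, which lines up with the names reported as working. The sketch also hints at why switching to a match query is not a safe fix: `match_query_hits("test 3", "test")` is true, so user "test" would match documents owned by "test 3".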

The fields either need to be of type keyword, or the code in SearchManager.php needs to be changed to work on the subfields owner.keyword and users.keyword. Then the term query works as expected.
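
The two options could look roughly like the following Python sketch, which only builds the request bodies as dictionaries. The `access_filter` function and the `KEYWORD_MAPPING` name are hypothetical, and the filter is deliberately reduced to the owner and users checks; it is not the query the plugin actually generates.

```python
import json


def access_filter(user_id: str) -> dict:
    """Exact-match filter on the keyword subfields instead of the analyzed text fields."""
    return {
        "bool": {
            "should": [
                {"term": {"owner.keyword": user_id}},
                {"term": {"users.keyword": user_id}},
            ],
            "minimum_should_match": 1,
        }
    }


# Alternative: map both fields as keyword when the index is created,
# so a term query on "owner" or "users" compares the unmodified value.
KEYWORD_MAPPING = {
    "properties": {
        "owner": {"type": "keyword"},
        "users": {"type": "keyword"},
    }
}

print(json.dumps(access_filter("test 3"), indent=2))
```

Note that the type of an existing field cannot be changed in place, so the mapping variant would require creating a new index and reindexing, whereas querying the subfields works with the mapping shown above.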

I have opened an issue in the correct GitHub project: nextcloud/fulltextsearch_elasticsearch#300
