Search: Confusing behaviors #6366

Closed
hurradieweltgehtunter opened this issue May 23, 2023 · 6 comments

@hurradieweltgehtunter

While playing around with the search (based on #6343), we came across some difficulties. Maybe it's not all bugs, but it seems confusing at some points.

Difference between test and Name:test

As far as I understood, no prefix will search for file and folder names.
Prefixing Name: will also do a search for file and folder names.
Where is the difference for the user?

Compare searching for test and Name:test

test finds “test.txt”, “test123.pdf” and folder “test”
Name:test finds only folder “test”

While this is surely technically correct, it is rather confusing that one appends an asterisk by default while the other does an exact match.
I was expecting Name:test to also show “test.txt”, “test123.pdf” and folder “test”.

Placeholders

Name:Deutsch finds folder “Deutsch”
[screenshot]

Name:Deutsch* finds nothing
[screenshot]

This might be related to the upper/lowercase scenario below: changing the search term to lowercase Name:deutsch shows correct results. I am still confused about the two different outcomes.

Upper-/Lowercase

Lowercase shows case-insensitive results. Works as expected:
[screenshot]

vs. uppercase, which shows nothing?!
[screenshot]

Highlighting of results

Search results are not always highlighted.
Highlighted:
[screenshot]

Not highlighted:
[screenshot]

[two screenshots]

Inconsistent usage of upper/lowercase in combination with placeholder

Imagine an existing tag “intern”, and at least one file is marked with that tag.

Tags:intern works
Tags:inter* works -> cool, I can use placeholders within tags
Tags:InTeRn works --> cool, search is case-insensitive
Tags:InTe* does not work - ?!
[screenshot]

! Simple mode search works as expected and does not distinguish between test* and TEst*

Unknown files appear

Note: Logged in as Alice

#6343 tells me:

Complex search: 'Tags:bar file' (all tagged bar OR that match 'file')
Result bar.txt

Result:
[screenshot]

  • I have no json files and no files with names resembling "export" in my personal space.
  • Clicking on the filename in the search results downloads the file (path: ./export.json). Clicking on the folder name (Personal) below leads me to a broken "Shared with me" page:
[screenshot]
  • They are surely not tagged with tag "internal", compare searching for this tag:
[screenshot]
  • Searching for deutsch also does not show them:
[screenshot]
  • Changing the search term to capitalized Tags:intern Deutsch removes the json files but also loses correct folders in a space (folder "Deutsch" in space "Sommerfest 2024" is missing):
[screenshot]
  • Logging in as another user (Bob) makes the confusion perfect. Doing the same search yields the same results:
[screenshot]

EXCEPT the json files are not located in Personal anymore, but in a space called Sommerfest 2024. And there they really are:
[screenshot]

Logging back in as Alice, the space does not appear in her space overview. Therefore she shouldn't have access to those files.
Logging back in as Bob and checking the members of the space, I can see that Alice is a member of this space via a group.

Summary:

  • Alice is a member of group "Staff"
  • Group "Staff" is a member of space "Sommerfest 2024", permission editor
  • ❗ Alice does not see space "Sommerfest 2024"
  • Space "Sommerfest 2024" contains some json files called "export", "realm-export", etc.
  • Files are not tagged with any tag
  • ❗ Alice can see files of this space when typing "Tags:internal deutsch" in the search bar
@fschade
Contributor

fschade commented May 23, 2023

Difference between test and Name:test

As far as I understood, no prefix will search for file and folder names. Prefixing Name: will also do a search for file and folder names. Where is the difference for the user?

searching for foo and Name:foo is similar but not the same, let me try to explain the difference:

  • Search for foo gets translated into Name:*foo* in the backend and could resolve into: afoo, foob, afoob, foo
  • Search for Name:foo gets translated into Name:foo in the backend and resolves only into foo (not afoo, foob or afoob)

the explicit field search exists because we need to support searches on different fields, for example: Tags:foo
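
The translation described above can be sketched roughly as follows. This is only an illustration, not the oCIS implementation: `to_backend_query` and `matches` are hypothetical helpers, and Python's `fnmatch` stands in for the search engine's wildcard matching.

```python
from fnmatch import fnmatchcase


def to_backend_query(user_input: str) -> str:
    """Hypothetical helper: turn raw user input into a field query."""
    if ":" in user_input:
        # An explicit field search such as Name:foo is passed through unchanged.
        return user_input
    # A bare term is wrapped in wildcards and searched on the Name field.
    return f"Name:*{user_input}*"


def matches(query: str, name: str) -> bool:
    """Check a Name: query against a single file or folder name."""
    field, _, pattern = query.partition(":")
    if field != "Name":
        raise ValueError("only Name queries are handled in this sketch")
    return fnmatchcase(name, pattern)


names = ["afoo", "foob", "afoob", "foo"]
simple_hits = [n for n in names if matches(to_backend_query("foo"), n)]
field_hits = [n for n in names if matches(to_backend_query("Name:foo"), n)]

print(simple_hits)  # ['afoo', 'foob', 'afoob', 'foo']
print(field_hits)   # ['foo']
```

With this model, a user who wants the simple-search behavior on an explicit field would have to type the wildcards themselves, e.g. Name:*foo*.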

@fschade
Contributor

fschade commented May 23, 2023

Compare searching for test and Name:test

test finds “test.txt”, “test123.pdf” and folder “test”
Name:test finds only folder “test”

While this is surely technically correct, it is rather confusing, why one appends an asterisk per default and the other one does an exact match. I was expecting Name:test would show also “test.txt”, “test123.pdf” and folder “test”

i agree, it is (or could be) confusing, but as explained above, 99% of all users will only use the search as they do in google, type and expect the result.

same for the google engine (even if it is slightly more powerful ;) ), they have the default search but you're able to search for dedicated fields (advanced search): related:XY, inurl:XY, ...

this is why we've decided to use a wildcard (prefix, suffix) search here. The downside is that those wildcard searches cost more time than an explicit search. long term i hope we adopt the iOS query syntax: owncloud/ios-app#933

aduffeck added a commit to aduffeck/ocis that referenced this issue May 23, 2023
@aduffeck
Contributor

aduffeck commented May 24, 2023

With #6371 we are changing the field queries in complex searches to also be case insensitive like simple searches, fixing much of the confusing behavior you saw:

Placeholders

Name:Deutsch and Name:Deutsch* both find the folder.

Upper-/Lowercase

Both Name:*deutsch* and Name:*Deutsch* show the results.

Inconsistent usage of upper/lowercase in combination with placeholder

Tags:InTe* works as well.
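
As a rough illustration of what case-insensitive field queries mean for the examples above, here is a minimal sketch. It assumes that both the query pattern and the indexed value are lowercased before comparison; how #6371 actually achieves this is not shown in this thread, and `field_matches` is a hypothetical helper.

```python
from fnmatch import fnmatchcase


def field_matches(pattern: str, value: str) -> bool:
    """Case-insensitive wildcard match: lowercase both sides, then compare.

    Assumption for illustration only; the real engine may normalize differently.
    """
    return fnmatchcase(value.lower(), pattern.lower())


# The cases reported as confusing in this issue:
print(field_matches("Deutsch", "Deutsch"))    # Name:Deutsch
print(field_matches("Deutsch*", "Deutsch"))   # Name:Deutsch*
print(field_matches("*Deutsch*", "Deutsch"))  # Name:*Deutsch*
print(field_matches("InTe*", "intern"))       # Tags:InTe*
```

Under this assumption all four calls return True, which matches the behavior described for the fix.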


The issues that are still left are

Highlighting of results

I think we're currently not using the search engine's highlight capabilities but web does the highlighting. So this would either need to be fixed by web or we'd have to switch over to rely on the search engine to mark the highlighted parts in the returned results.

Unknown files appear

That one's not quite clear yet. I think it might actually be two problems:

The reason that Alice gets those json files (which are GDPR reports) as results for the Tags:intern deutsch query likely is that the full-text index was also added to the composite field which is being used for simple terms in complex queries (i.e. deutsch in this case). This has been changed with #6356 so those files shouldn't show up with new indexes anymore.

That also explains why you don't see those export files with either the Tags:intern or the Deutsch query.

The export files Bob sees are likely (hopefully!) different files. Maybe you could verify that by downloading and comparing the content of the files or by comparing their URLs.
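
One way to do the content comparison is to hash both downloads. The snippet below is a small sketch using only the Python standard library; the file paths in the usage note are placeholders, not paths taken from this issue.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def same_content(a: Path, b: Path) -> bool:
    """True if both files have byte-identical content."""
    return sha256_of(a) == sha256_of(b)
```

For example, `same_content(Path("alice/export.json"), Path("bob/export.json"))` would tell whether the file Alice downloaded is identical to the one Bob sees (both paths are hypothetical).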

What's not clear to me is why Alice's export results take her to the Shared with me page. That doesn't really make sense to me. Could you maybe paste the result of the corresponding REPORT call? It might also be a little bug in web.

aduffeck added a commit to aduffeck/ocis that referenced this issue May 24, 2023
@hurradieweltgehtunter
Author

Topic "Unknown files appear" seems to be partly resolved. The shares of the space that did not appear had to be renewed because of a reverse index problem on rc2/3.
What this does not explain is why the files appeared as search results at all, since neither the file names nor the tags corresponded to the search term.

@aduffeck
Contributor

I think that one makes sense to me. Your search term Tags:intern deutsch forced the search into complex mode and as a result the deutsch term was applied to the composite field in the index. That field also contained the full-text index until #6356, so - assuming the export files contain the word deutsch (or a variant the stemmer reduces to deutsch) - it makes sense that they showed up in the result list.
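
To make that explanation concrete, here is a toy model of the behavior. It is a sketch under stated assumptions, not the real index: `Doc`, `composite` and `complex_search` are hypothetical names, the file contents are invented, stemming is ignored, and the OR semantics follow the 'Tags:bar file' example quoted from #6343.

```python
from dataclasses import dataclass, field


@dataclass
class Doc:
    name: str
    tags: set[str] = field(default_factory=set)
    content: str = ""


def composite(doc: Doc, include_content: bool) -> str:
    """Text that bare terms in a complex query are matched against (toy model)."""
    parts = [doc.name]
    if include_content:
        # Assumed pre-#6356 behavior: full-text content is part of the composite field.
        parts.append(doc.content)
    return " ".join(parts).lower()


def complex_search(docs: list[Doc], tag: str, term: str, include_content: bool) -> list[str]:
    """Toy 'Tags:<tag> <term>': tagged with <tag> OR composite field contains <term>."""
    return [
        d.name
        for d in docs
        if tag in d.tags or term.lower() in composite(d, include_content)
    ]


docs = [
    Doc("Deutsch"),
    Doc("notes.txt", tags={"intern"}),
    Doc("export.json", content='{"language": "deutsch"}'),  # invented content
]

print(complex_search(docs, "intern", "deutsch", include_content=True))
# ['Deutsch', 'notes.txt', 'export.json']
print(complex_search(docs, "intern", "deutsch", include_content=False))
# ['Deutsch', 'notes.txt']
```

In this model the untagged export.json only shows up while its content is part of the composite field, which mirrors why the files should no longer appear with new indexes.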

@micbar
Contributor

micbar commented May 25, 2023

Ok. We consider it as fixed.

@micbar micbar closed this as completed May 25, 2023
fschade pushed a commit that referenced this issue Jul 10, 2023