& symbol in search box still not redirecting to article #763

Closed
kelson42 opened this issue Feb 23, 2023 · 13 comments · Fixed by #765

Comments

@kelson42
Contributor

From kiwix-tools created by nijazm: kiwix/kiwix-tools#588

Okay, it looks like you were trying to fix something, but unsuccessfully. I just tested yesterday's nightly version of Kiwix Desktop and kiwix-tools on Windows 11. Now it just shows the full-text search autocomplete result for the & symbol, and when I click on it, it says No results were found for "&". In the search box it shows containing '&'. The same happens in kiwix-serve (web browsers) and in the Kiwix Desktop app. Tested with English Wikipedia 2021-12. The only difference is that titles containing & now redirect properly (previously they did not), e.g. Me, Myself & Irene

@kelson42
Contributor Author

Do you know exactly which part of the code removes this?

@kelson42 No, I don't.

@kelson42
Contributor Author

As hypothesized in kiwix/kiwix-tools#587 (comment), the problem is that the ampersand symbol is treated as punctuation and is simply discarded during the creation of the title index as well as when running suggestion search on it.
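
A toy illustration of the effect described above. This is not libzim's indexing code: `tokenize` and `build_title_index` are hypothetical helpers, and the assumption is only that the term generator keeps word characters and drops punctuation. Under that assumption a title made purely of punctuation contributes no terms, so no suggestion query can ever match it.

```python
import re


def tokenize(text: str) -> list[str]:
    """Hypothetical tokenizer: lowercase runs of word characters, punctuation dropped."""
    return re.findall(r"\w+", text.lower())


def build_title_index(titles: list[str]) -> dict[str, set[str]]:
    """Map each term to the set of titles that contain it."""
    index: dict[str, set[str]] = {}
    for title in titles:
        for term in tokenize(title):
            index.setdefault(term, set()).add(title)
    return index


index = build_title_index(["&", "Ampersand", "Me, Myself & Irene"])

# Titles reachable through at least one indexed term.
indexed_titles = set().union(*index.values())
```

Here `tokenize("&")` is empty, so the title `&` never enters `indexed_titles`, while `Me, Myself & Irene` is still reachable through its word terms.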

Ideally, while building the title index we should handle article names consisting of a single symbol or word in a special way, letting those terms go into the title index as is despite any rules that drop punctuation and stopwords. Also we will have to enhance the suggestion search so that it accounts for such an addition to the title index.

@kelson42
Contributor Author

I would say that we try to clean the query (or the title to index). And if the clean query(/title) is empty then we use the original string instead of the cleaned one.
We don't care about what the original string is composed of.
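
The rule proposed in this comment can be sketched as follows. This is a minimal illustration, not libzim code: `clean` is a hypothetical stand-in for whatever normalization the indexer and the suggestion search actually apply, and `normalize_with_fallback` and `suggest` are invented names.

```python
import re


def clean(text: str) -> str:
    """Hypothetical cleaner: keep lowercase word-character runs, joined by single spaces."""
    return " ".join(re.findall(r"\w+", text.lower()))


def normalize_with_fallback(text: str) -> str:
    """Return the cleaned string, or the original string if cleaning leaves nothing."""
    cleaned = clean(text)
    return cleaned if cleaned else text


def suggest(query: str, titles: list[str]) -> list[str]:
    """Prefix-match the normalized query against the normalized titles."""
    key = normalize_with_fallback(query)
    return [t for t in titles if normalize_with_fallback(t).startswith(key)]
```

The fallback has to be applied on both sides, when indexing titles and when processing the query; otherwise the stored key and the query key would not match.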

@kelson42
Contributor Author

This ticket is a follow-up to #587: fixing one bug in kiwix/libkiwix#859 exposed another, unrelated problem.

The essence of the problem is as follows.

English Wikipedia contains an article with the title & (which redirects to Ampersand).

A user exploring the wikipedia_en_all ZIM file via kiwix-serve expects that entering the & symbol in the ZIM viewer search box will suggest a link leading to that article. Instead, they are offered only a suggestion to perform a full-text search for the text &, which still doesn't produce any results.

@kelson42
Contributor Author

@mgautierfr Should we move this ticket to openzim/libzim?

@kelson42
Contributor Author

@veloman-yunkan Thank you for the explanation and analysis. Do you know exactly which part of the code removes this? Is that related to the stop words? Your proposal seems worth considering IMO. I believe this special handling might be pretty independent of any particular special character and would instead affect any really short title.

@kelson42
Contributor Author

I don't understand this bug report. Can someone please rephrase it? See https://github.com/kiwix/overview/blob/master/REPORT_BUG.md

@kelson42
Contributor Author

yes

@kelson42
Contributor Author

@mgautierfr If a title contains only stop words or punctuation, we should keep them IMO. Does that make sense?

@kelson42 kelson42 added this to the 8.2.0 milestone Feb 23, 2023
@kelson42
Contributor Author

kelson42 commented Mar 8, 2023

@veloman-yunkan Considering that our priorities currently diverge a bit, would that be something you could fix?

@mgautierfr
Collaborator

Here is a comment from kiwix/kiwix-tools#588 (comment) that was lost during the transfer of the issue to the libzim repository:

I would say that we try to clean the query (or the title to index). And if the clean query(/title) is empty then we use the original string instead of the cleaned one.
We don't care about what the original string is composed of.

@veloman-yunkan
Collaborator

@mgautierfr The comment was not lost - it's here. But the authorship of the idea was definitely impacted.

@veloman-yunkan
Collaborator

veloman-yunkan commented Mar 9, 2023

It turns out that the current version of libzim doesn't use stopwords when building the title index, and doesn't use them in suggestion search. Thus the issue is restricted to titles consisting entirely of punctuation.

This issue was closed.