Long titles in suggestion dropdown choices are no longer readable (regression) #513

Open
holta opened this issue Dec 17, 2021 · 13 comments

Comments

@holta

holta commented Dec 17, 2021

  1. Try a Title Search using kiwix-tools_linux-x86_64-2021-12-17.tar.gz on http://iiab.me/kiwix/wikipedia_en_all_maxi_2021-03/ using the word "apple" — and then click on the topmost of the 10 choices in the search dropdown.

    [screenshot of the search dropdown for "apple"]

    It sends all browsers to this 403 Forbidden page:

    http://iiab.me/kiwix/wikipedia_en_all_maxi_2021-03/A/.apple

    Does anybody know why a dot (period) gets added to the left of this word (apple), within the result URL above?

    Does anybody know why this affects the word "apple" in particular, but does not affect many other Title Searches? How common is this problem among other Title Searches?

  2. Is there any way to improve the ability to choose among the 10 dropdown choices in the screenshot above?

    When the same single English word is shown in all 10 choices above (with almost no context except for ellipsis etc!) it's suddenly now a lot harder for users to make an intelligent choice.

    Whereas in the past, the exact same Title Search (on the word "apple", when using kiwix-serve 3.1.2-5 from 2021-06-09 / 2021-06-10) offered 10 much more readable options — as seen in the search dropdown below:

    [screenshot of the search dropdown for "apple" in kiwix-serve 3.1.2-5]

@holta holta changed the title (1) kiwix-serve Title Search on the word "apple" leads to 403 Forbidden (2) dropdown choices regression (long titles are no longer readable)) (1) kiwix-serve Title Search on the word "apple" leads to 403 Forbidden (2) dropdown choices regression (long titles are no longer readable) Dec 17, 2021
@kelson42 kelson42 self-assigned this Dec 17, 2021
@kelson42 kelson42 added this to the 3.2.0 milestone Dec 17, 2021
@holta
Author

holta commented Dec 17, 2021

I've clarified the explanation above — thanks to anybody who might be able to explain / understand what's happening!

@maneeshpm
Collaborator

Testing with the latest git master branches of libzim/libkiwix/kiwix-serve on wikipedia_en_all_mini_2021-01, a similar search leads to the correct page http://localhost:8080/wikipedia_en_all_mini_2021-01/A/.apple without any error.
@kelson42 Are you able to recreate the issue with the above mentioned version?

@holta We try to follow a suggestion system that is very close to the actual Wikipedia search. If you search for Apple on Wikipedia, the top 10 suggestions have at most 1 or 2 words, the first word being "apple". If a user searches for "apple" and our index contains a close match for "apple", that should be the best result, rather than "Apples to Apples" as in previous versions. I agree that our system is not "intelligent": it does not take into account factors like page visits or popularity, as more sophisticated search engines do, but we intend to give users sensible matches for what they actually search for.

@holta
Author

holta commented Dec 18, 2021

@maneeshpm do you know why the search dropdown repeatedly shows "Apple..." leaving 80% of the horizontal real estate completely unused?

(No matter what 10 suggestions are offered — there really ought to be a way to visually distinguish between the offered choices — before clicking on any one of them!)

@maneeshpm
Collaborator

That's a valid concern, for some reason the entire name is not being shown. I'll dig into the issue.

@kelson42
Contributor

kelson42 commented Dec 18, 2021

  1. Try a Title Search using kiwix-tools_linux-x86_64-2021-12-17.tar.gz on http://iiab.me/kiwix/wikipedia_en_all_maxi_2021-03/ using the word "apple" — and then click on the topmost of the 10 choices in the search dropdown.
    It will sends all browsers to this 403 Forbidden page:
    http://iiab.me/kiwix/wikipedia_en_all_maxi_2021-03/A/.apple

I cannot confirm this behaviour. Clicking on any of the suggestions leads to the right article. In the future, please test with a standalone kiwix-serve that is not running in a special environment or behind a reverse proxy.

Does anybody know why a dot (period) gets added to the left of this word (apple), within the result URL above?

@maneeshpm I confirm this behaviour and it does not seem normal to me.

Here is the json:

  {
    "value" : "Apple //",
    "label" : "<b>Apple</b>...",
    "kind" : "path"
      , "path" : "A/Apple_//"
  },
  {
    "value" : "Apple ///",
    "label" : "<b>Apple</b>...",
    "kind" : "path"
      , "path" : "A/Apple_///"
  },
  {
    "value" : "Apple®",
    "label" : "<b>Apple</b>...",
    "kind" : "path"
      , "path" : "A/Apple®"
  },

The value JSON property seems correct, so I wonder why the label JSON property shows an ellipsis in place of characters such as / or ®. Do you know more?
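Until the cause is understood, one possible client-side workaround is to fall back to the `value` property whenever the snippet `label` no longer shows the full title. Below is a minimal Python sketch, assuming the suggestion objects have exactly the shape shown in the JSON above; `display_label` and `strip_tags` are hypothetical helpers, not part of kiwix-serve or its front end, and it is assumed that the label is rendered as HTML (it contains `<b>` tags).

```python
import html
import re


def strip_tags(label: str) -> str:
    """Remove HTML tags such as <b>...</b> and decode entities like &amp;."""
    return html.unescape(re.sub(r"<[^>]+>", "", label))


def display_label(suggestion: dict) -> str:
    """Return the snippet label if it shows the whole title, else the escaped title.

    Hypothetical workaround: compares the visible text of `label` (minus a
    trailing "...") with `value` and falls back to `value` when they differ.
    """
    label = suggestion.get("label", "")
    value = suggestion.get("value", "")
    visible = strip_tags(label)
    if visible.endswith("..."):
        visible = visible[: -len("...")]
    if visible.strip() == value.strip():
        return label  # complete title: keep the <b> highlighting
    # Escape the title because the label is assumed to be inserted as HTML.
    return html.escape(value)


# The three suggestions from the JSON above.
suggestions = [
    {"value": "Apple //", "label": "<b>Apple</b>...", "kind": "path", "path": "A/Apple_//"},
    {"value": "Apple ///", "label": "<b>Apple</b>...", "kind": "path", "path": "A/Apple_///"},
    {"value": "Apple®", "label": "<b>Apple</b>...", "kind": "path", "path": "A/Apple®"},
]
for s in suggestions:
    print(display_label(s))  # prints "Apple //", then "Apple ///", then "Apple®"
```

This trades the bold highlighting for a readable title whenever characters were dropped; it does not explain why the snippet omits them.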

2. Is there any way to improve the ability to choose among the 10 dropdown choices in the screenshot above?
   When the same single English word is shown in all 10 choices above (with almost no context except for ellipsis etc!) it's suddenly now a lot harder for users to make an intelligent choice.

The suggestions are not pointing to the same article, you have this feeling just because the labels with the ellipsis are identical and because of the HTTP 403 error in IIAB. AFAIK, besides this strange ellipsis behaviour, everything works fine.

   Whereas in the past, the exact same Title Search (on the word "apple", when using kiwix-serve 3.1.2-5 from 2021-06-09 / 2021-06-10) offered 10 much more readable options — as seen in the search dropdown below:

As Maneesh pointed out, the current results are more pertinent than before. To me, we just need to find out why we get an ellipsis in place of the "real title".

@holta
Author

holta commented Dec 18, 2021

The suggestions are not pointing to the same article, you have this feeling

No I do not have this feeling.

I'm not sure why people are claiming this (incorrectly).

@kelson42
Contributor

kelson42 commented Dec 18, 2021

@maneeshpm After researching a bit, it seems:

  • If there is a bug, it is in libzim
  • It looks like an upstream Xapian bug in mp_mset->snippet(). I guess a bad treatment of special characters... or maybe this is on purpose!

But I see no ticket of that kind open upstream: https://trac.xapian.org/search?q=snippet&noquickjump=1&ticket=on

Or maybe we should just wait to see if things are still wrong once we generate Wikipedia ZIM files with libzim7, considering that you have massively improved the ZIM creator?

@maneeshpm
Collaborator

Or maybe we should just wait to see if things are still wrong once we generate Wikipedia ZIM files with libzim7, considering that you have massively improved the ZIM creator?

@kelson42 our snippets are generated entirely by Xapian::MSet::snippet(), with very little control on our side. I guess waiting and checking whether this issue persists even with new ZIM files is the way forward.

PS. I would like to mention that in my limited testing with wikipedia_en_all_mini_2021-01, I was not able to find any case where useful info was omitted (replaced with ...). Only trailing parentheses or non-word characters were omitted. This is yet to be confirmed.

@holta
Author

holta commented Dec 18, 2021

That's a valid concern, for some reason the entire name is not being shown. I'll dig into the issue.

Thanks @maneeshpm.

Another valid concern is why the most insignificant article (even Wikipedia has since removed the article on Apple Inc's .apple vanity domain name: https://en.wikipedia.org/w/index.php?title=.apple&redirect=no) is placed at the top of the search dropdown list.

A child wanting to learn about real-world apples should probably be able to do that without too many clicks (-:

(Of course the dropdown not showing the dot on the left-side of .apple further confuses this difficult user experience.)

In Any Case: while it's likely not possible to fix this in 2021 (e.g. kiwix-tools 3.2.0 is needed by many schools in coming weeks if possible!) this extremely odd ordering[*] has room for improvement in future years ;)

[*] Presumably it's alphabetically ordered among a long list, at the moment?

@kelson42
Copy link
Contributor

kelson42 commented Dec 19, 2021

@holta The content index does not really have a way to know what article is important or not. It can only see whether there is a word match between the article and the search pattern. For the moment we cannot expect it to know that. But we have a project to improve that, see for example openzim/libzim#653

@holta
Author

holta commented Dec 19, 2021

@kelson42 thanks for explaining & thanks for opening openzim/libzim#653

The content index does not really have a way to know what article is important or not.

A Short-Term Suggestion for "2022" :

If the child searches for "apple", how about showing them the article they actually searched for?

https://en.wikipedia.org/wiki/apple

Or...the (identical after redirect) article:

https://en.wikipedia.org/wiki/Apple

Instead of accidentally/prominently advertising ~10 different Apple(TM) products to the young child!

RECAP: Consider using the search string itself — to help populate the search dropdown — when an article exists with that very same title?
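That suggestion could look like the following re-ranking step, sketched in Python under stated assumptions: `suggestions` is the title list already returned by the suggestion index, and `title_exists` is a hypothetical exact-title lookup (for example against the ZIM file's titles). This is not how libkiwix currently ranks suggestions.

```python
from typing import Callable, List


def rank_suggestions(
    query: str,
    suggestions: List[str],
    title_exists: Callable[[str], bool],
    limit: int = 10,
) -> List[str]:
    """Put an article whose title equals the query at the top of the list.

    Hypothetical sketch: `title_exists` stands in for a direct title lookup;
    the order of the remaining suggestions is left unchanged.
    """
    q = query.strip()
    # Try the query as typed, then with its first letter capitalized
    # (assumption: titles are capitalized as on Wikipedia, e.g. "Apple").
    candidates = [q, q[:1].upper() + q[1:]]
    exact = next((t for t in candidates if t and title_exists(t)), None)
    if exact is None:
        return suggestions[:limit]
    # Drop any duplicate of the exact match before prepending it.
    rest = [s for s in suggestions if s.casefold() != exact.casefold()]
    return ([exact] + rest)[:limit]


# Usage with made-up data modelled on this issue:
existing = {"Apple", "Apple //", "Apple ///", "Apple®"}
print(rank_suggestions("apple", [".apple", "Apple //", "Apple ///", "Apple®"], existing.__contains__))
# → ['Apple', '.apple', 'Apple //', 'Apple ///', 'Apple®']
```

The index's own order is kept for everything else, so this only addresses the "child searching for apple" case, not general popularity ranking.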

@kelson42
Contributor

@holta We are drifting from the original bug report. I would wait until the newest WPEN ZIM files are made with libzim7 and then see how things behave. If there is still a problem then, please open a new ticket.

@maneeshpm maneeshpm changed the title (1) kiwix-serve Title Search on the word "apple" leads to 403 Forbidden (2) dropdown choices regression (long titles are no longer readable) Long titles in suggestion dropdown choices are no longer readable(regression) Dec 19, 2021
@kelson42
Contributor

Depends on openzim/mwoffliner#1606

@kelson42 kelson42 added this to the 3.4.0 milestone Jun 2, 2022
@kelson42 kelson42 modified the milestones: 3.3.1, 3.4.0 Sep 24, 2022
@kelson42 kelson42 modified the milestones: 3.5.0, 3.6.0 Feb 6, 2023
@kelson42 kelson42 modified the milestones: 3.6.0, 3.7.0 Sep 30, 2023
@kelson42 kelson42 changed the title Long titles in suggestion dropdown choices are no longer readable(regression) Long titles in suggestion dropdown choices are no longer readable (regression) Oct 8, 2023
@kelson42 kelson42 modified the milestones: 3.7.0, 3.8.0 Feb 26, 2024
Projects
None yet
Development

No branches or pull requests

3 participants