[fix] searxng_extra/update/update_engine_descriptions.py #2316

Merged
merged 4 commits into from
Apr 15, 2023


return42
Member

@return42 return42 commented Apr 5, 2023

Follow up of #2269

The script to update the descriptions of the engines no longer works since #2269 was merged.

Related: #2314 (comment)


searx/engines/wikipedia.py

The update_engine_descriptions.py script needs a list of all Wikipedias, but
the implementation from #2269 provided only a reduced list.

searxng_extra/update/update_engine_descriptions.py

Before PR #2269, a match_language() function approximated the language using
various methods. Since PR #2269, only the language types in the data model
that babel can recognize are available. The approximation methods, which are
needed only here to determine the descriptions, must be replaced by other
methods.
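
To illustrate what such an approximation can look like, here is a minimal sketch of a tag fallback that tries progressively less specific language tags. It is not the removed match_language() function and not the code in this PR; `candidate_tags` and `approximate` are hypothetical names used for illustration only.

```python
from __future__ import annotations


def candidate_tags(locale_tag: str) -> list[str]:
    """Return the tag and its less specific prefixes, most specific first.

    Hypothetical helper: 'zh_Hant_TW' -> ['zh-Hant-TW', 'zh-Hant', 'zh'].
    """
    parts = locale_tag.replace("_", "-").split("-")
    return ["-".join(parts[:n]) for n in range(len(parts), 0, -1)]


def approximate(locale_tag: str, available: list[str]) -> str | None:
    """Pick the first available tag that matches a candidate, ignoring case."""
    lookup = {tag.lower(): tag for tag in available}
    for candidate in candidate_tags(locale_tag):
        match = lookup.get(candidate.lower())
        if match is not None:
            return match
    return None  # no approximation found
```

For example, `approximate("zh_Hant_TW", ["en", "zh"])` falls back to `"zh"`, while an unknown language yields `None`.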


DRAFT: the modified script in this PR works, but the yield of descriptions is still significantly lower than it was at commit 64fea2f (before #2269 was merged). It still needs some rework.

@dalf
Member

dalf commented Apr 8, 2023

The wikipedia engine seems to have different issues.

I've pushed a commit where update_engine_descriptions.py doesn't use the wikipedia engine. In a way, that makes sense, since the Wikipedia article names are returned by a SPARQL query on Wikidata. With a minor update, the query can return the Wikipedia article URL, so the wikipedia engine can be bypassed.

With this change, searx/data/engine_descriptions.json is updated without issues, except for the language zh_Hant. See #2330
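
A rough sketch of the idea, assuming the public Wikidata query service and its predefined `schema:`, `wd:` and `wikibase:` prefixes. The query text and the helper names `build_query` and `split_article_url` are illustrative assumptions, not the query or code committed in this PR.

```python
import re
from urllib.parse import unquote, urlparse

# Illustrative query: ask Wikidata for the Wikipedia article URLs (sitelinks)
# of one item, together with the language of each article.
SPARQL_ARTICLE_URL = """
SELECT ?article ?lang
WHERE {
  ?article schema:about wd:%ITEM% ;
           schema:inLanguage ?lang ;
           schema:isPartOf ?site .
  ?site wikibase:wikiGroup "wikipedia" .
}
"""


def build_query(item_id: str) -> str:
    """Fill the item ID (e.g. 'Q42') into the query template."""
    if not re.fullmatch(r"Q\d+", item_id):
        raise ValueError(f"not a Wikidata item ID: {item_id!r}")
    return SPARQL_ARTICLE_URL.replace("%ITEM%", item_id)


def split_article_url(url: str) -> tuple:
    """Split an article URL into (wiki subdomain, decoded article title)."""
    parsed = urlparse(url)
    prefix = "/wiki/"
    if not parsed.path.startswith(prefix):
        raise ValueError(f"not a Wikipedia article URL: {url!r}")
    wiki = parsed.netloc.split(".")[0]
    title = unquote(parsed.path[len(prefix):])
    return wiki, title
```

With the article URL in hand, the script knows both the wiki and the article title without asking the wikipedia engine to resolve them.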

@return42 return42 force-pushed the fix-2314-upd-desc branch 3 times, most recently from 45ae08b to 70d488d Compare April 12, 2023 17:49
@return42
Member Author

FYI: my work is not finished yet. I will come back to this PR in the next few days.

return42 and others added 4 commits April 15, 2023 16:03
Follow up of searxng#2269

The script to update the descriptions of the engines no longer works since
PR searxng#2269 was merged.

searx/engines/wikipedia.py
==========================

1. zh-classical.wikipedia.org was misused:

   - `zh-classical` is dedicated to classical Chinese [1], which is not
     traditional Chinese [2].

   - zh.wikipedia.org has LanguageConverter enabled [3] and dynamically
     shows simplified or traditional Chinese according to the HTTP
     Accept-Language header.

2. The update_engine_descriptions.py needs a list of all wikipedias.  The
   implementation from searxng#2269 included only a reduced list:

   - https://meta.wikimedia.org/wiki/Wikipedia_article_depth
   - https://meta.wikimedia.org/wiki/List_of_Wikipedias
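
Point 1 above could be sketched roughly as follows. This is a minimal illustration, not the engine's code: `wikipedia_request` is a hypothetical helper, it assumes the wiki subdomain equals the language code (which does not hold for every Wikipedia), and which variant tags the server honours is not stated in this PR.

```python
def wikipedia_request(locale_tag: str) -> tuple:
    """Return (netloc, headers) for a locale tag such as 'zh_Hant' or 'de_CH'.

    Assumption: the subdomain is the bare language code.  For Chinese, the
    script/region variant is not encoded in the host name; it is passed in
    the Accept-Language header and left to LanguageConverter on the server.
    """
    tag = locale_tag.replace("_", "-")
    language = tag.split("-")[0].lower()
    headers = {}
    if language == "zh":
        # All Chinese variants go to zh.wikipedia.org, never zh-classical.
        headers["Accept-Language"] = tag
    return f"{language}.wikipedia.org", headers
```

With this sketch, `wikipedia_request("zh_Hant")` targets zh.wikipedia.org and sends `Accept-Language: zh-Hant`, while other languages get no extra header.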

searxng_extra/update/update_engine_descriptions.py
==================================================

Before PR searxng#2269, a match_language() function approximated the language
using various methods.  Since PR searxng#2269, only the language types in the
data model that babel can recognize are available.  The approximation methods,
which are needed only here to determine the descriptions, must be replaced by
other methods.

[1] https://en.wikipedia.org/wiki/Classical_Chinese
[2] https://en.wikipedia.org/wiki/Traditional_Chinese_characters
[3] https://www.mediawiki.org/wiki/Writing_systems#LanguageConverter

Closes: searxng#2330
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
Wikipedia descriptions are fetched without the help of the wikipedia engine:

* the SPARQL query returns the Wikipedia URL of the article
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
@return42 return42 marked this pull request as ready for review April 15, 2023 14:10
@return42 return42 merged commit 5c8d56e into searxng:master Apr 15, 2023
9 checks passed
@return42 return42 deleted the fix-2314-upd-desc branch April 15, 2023 14:10