[fix] searxng_extra/update/update_engine_descriptions.py #2316
Force-pushed from 9d20396 to 8b46a51
The wikipedia engine seems to have different issues. I've pushed a commit where update_engine_descriptions.py no longer uses the wikipedia engine. In a way that makes sense, since the Wikipedia article names are returned by a SPARQL query on wikidata. With a minor update, the query can return the Wikipedia article URL instead, and from that the wikipedia engine can be bypassed. With this change, …
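To illustrate the bypass: once the SPARQL query returns the article URL, the description can be fetched from the wiki's REST summary endpoint directly, without going through the wikipedia engine. A minimal sketch (the helper name `summary_endpoint` is hypothetical, not the function used in the script):

```python
from urllib.parse import urlsplit, unquote

def summary_endpoint(article_url: str) -> str:
    """Map a Wikipedia article URL (as returned by the SPARQL query on
    wikidata) to the REST summary endpoint of the same wiki.
    Hypothetical helper for illustration only."""
    parts = urlsplit(article_url)
    # the article title is the last path segment, e.g. "/wiki/Paris" -> "Paris"
    title = unquote(parts.path.rsplit("/", 1)[-1])
    return f"{parts.scheme}://{parts.netloc}/api/rest_v1/page/summary/{title}"

print(summary_endpoint("https://de.wikipedia.org/wiki/Paris"))
# https://de.wikipedia.org/api/rest_v1/page/summary/Paris
```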
Force-pushed from 45ae08b to 70d488d
FYI: my work is not yet finished .. I will come back to this PR in the next days ...
Follow up of searxng#2269

The script to update the descriptions of the engines no longer works since PR searxng#2269 has been merged.

searx/engines/wikipedia.py
==========================

1. There was a misuse of zh-classical.wikipedia.org:

   - `zh-classical` is dedicated to classical Chinese [1], which is not traditional Chinese [2].
   - zh.wikipedia.org has LanguageConverter enabled [3] and dynamically shows simplified or traditional Chinese according to the HTTP Accept-Language header.

2. update_engine_descriptions.py needs a list of all wikipedias. The implementation from searxng#2269 included only a reduced list:

   - https://meta.wikimedia.org/wiki/Wikipedia_article_depth
   - https://meta.wikimedia.org/wiki/List_of_Wikipedias

searxng_extra/update/update_engine_descriptions.py
==================================================

Before PR searxng#2269 there was a match_language() function that did an approximation using various methods. With PR searxng#2269 there are only the language types in the data model, which can be recognized by babel. The approximation methods, which are needed (only here) to determine the descriptions, must be replaced by other methods.

[1] https://en.wikipedia.org/wiki/Classical_Chinese
[2] https://en.wikipedia.org/wiki/Traditional_Chinese_characters
[3] https://www.mediawiki.org/wiki/Writing_systems#LanguageConverter

Closes: searxng#2330
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
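The LanguageConverter point can be sketched as follows: instead of querying zh-classical.wikipedia.org for traditional Chinese, the script can request zh.wikipedia.org and let the Accept-Language header select the script variant. A minimal sketch with the stdlib (the helper name `wiki_request` is hypothetical):

```python
import urllib.request

def wiki_request(title: str, variant: str = "zh-TW") -> urllib.request.Request:
    """Build a request against zh.wikipedia.org whose Accept-Language
    header asks LanguageConverter for a script variant (e.g. zh-TW for
    traditional, zh-CN for simplified). Hypothetical helper for illustration."""
    url = f"https://zh.wikipedia.org/wiki/{title}"
    return urllib.request.Request(url, headers={"Accept-Language": variant})

req = wiki_request("Paris", "zh-TW")  # traditional Chinese variant
```

The same mechanism means the script never needs a dedicated `zh-classical` entry for this purpose.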
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
Wikipedia descriptions are fetched without the help of the wikipedia engine: * the SPARQL query returns the Wikipedia URL of the article
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
Force-pushed from 70d488d to 09295a3
Follow up of #2269
The script to update the descriptions of the engines no longer works since #2269 has been merged.
Related: #2314 (comment)
searx/engines/wikipedia.py
update_engine_descriptions.py needs a list of all wikipedias. The
implementation from #2269 included only a reduced list:
searxng_extra/update/update_engine_descriptions.py
Before PR #2269 there was a match_language() function that did an
approximation using various methods. With PR #2269 there are only the
language types in the data model, which can be recognized by babel. The
approximation methods, which are needed (only here) to determine the
descriptions, must be replaced by other methods.
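The kind of approximation the script needs can be sketched like this: try an exact tag match first, then fall back to the bare language code, then to any tag sharing the language part. This is a hypothetical sketch of the idea, not the replacement methods actually used in the PR:

```python
def match_locale(searxng_tag, wiki_tags):
    """Approximate the old match_language() behaviour: pick the wiki
    language tag that best fits a SearXNG locale tag.
    Hypothetical helper for illustration only."""
    if searxng_tag in wiki_tags:          # exact match, e.g. "zh-yue"
        return searxng_tag
    lang = searxng_tag.split("-")[0]
    if lang in wiki_tags:                 # bare language code, e.g. "de"
        return lang
    for tag in wiki_tags:                 # any tag with the same language part
        if tag.split("-")[0] == lang:
            return tag
    return None

match_locale("de-CH", ["de", "en", "fr"])   # -> "de"
match_locale("zh-TW", ["zh", "zh-yue"])     # -> "zh"
```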
DRAFT: the modified script in this PR does work, but the yield of descriptions is still significantly lower than it was at commit 64fea2f (before #2269 was merged). It still needs some rework.