Add searx.locale.get_engine_locale in addition to searx.utils.match_language; improve qwant engine #1652
Merged
Commits on Aug 14, 2022
[mod] decouple qwant's categories from SearXNG's categories
By using the new property `qwant_categ:`, qwant's category is no longer bound to SearXNG's category.

Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
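A minimal sketch of what the decoupling means in practice, under stated assumptions: the engine entries, the `request_url` helper and the `API_BASE` endpoint layout are hypothetical and only illustrate that the API path is chosen from `qwant_categ` while SearXNG's `categories` only decide which tab an engine appears in.

```python
from urllib.parse import urlencode

# Assumed endpoint layout (hypothetical): one API path segment per qwant category.
API_BASE = "https://api.qwant.com/v3/search/"

# Hypothetical engine entries: `categories` is SearXNG's grouping,
# `qwant_categ` is the category that is sent to qwant.
ENGINES = [
    {"name": "qwant", "categories": ["general", "web"], "qwant_categ": "web"},
    {"name": "qwant news", "categories": ["news"], "qwant_categ": "news"},
    {"name": "qwant images", "categories": ["images", "web"], "qwant_categ": "images"},
]


def request_url(engine: dict, query: str) -> str:
    # Only qwant_categ selects the API path; SearXNG's categories are not consulted.
    return API_BASE + engine["qwant_categ"] + "?" + urlencode({"q": query})


for entry in ENGINES:
    print(entry["name"], "->", request_url(entry, "open source"))
```

Moving an engine into a different SearXNG category (tab) leaves its request URL unchanged, which is the point of the separate property.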
Commit 75bb8c4
[mod] add locale.get_engine_locale to get predictable results
The match_language function sometimes returns incorrect results, which is why a new function get_engine_locale is required. A bugfix of match_language is not easily possible, because there is almost no documentation for it and even its call parameters are undefined. E.g. the function processes values like the ones from yahoo::

    "yahoo": [
        "ar",
        ...
        "zh_chs",
        "zh_cht"
    ]

The get_engine_locale function has been documented in detail; there is a clear description of the assumptions as well as the requirements and approximation rules (read the doc-string for more details)::

    Argument ``engine_locales`` is a python dict that maps *SearXNG locales* to
    corresponding *engine locales*:

        <engine>: {
            # SearXNG string : engine-string
            'ca-ES' : 'ca_ES',
            'fr-BE' : 'fr_BE',
            'fr-CA' : 'fr_CA',
            'fr-CH' : 'fr_CH',
            'fr' : 'fr_FR',
            ...
            'pl-PL' : 'pl_PL',
            'pt-PT' : 'pt_PT'
        }

    .. hint::

        The *SearXNG locale* string has to be known by babel!

In the following you will find a comparison:

    >>> import babel.languages
    >>> from searx.utils import match_language
    >>> from searx.locales import get_engine_locale

Assume we have an engine that supports the following locales:

    >>> lang_list = {
    ...     "zh-CN": "zh_CN",
    ...     "zh-HK": "zh_HK",
    ...     "nl-BE": "nl_BE",
    ...     "fr-CA": "fr_CA",
    ... }

Assumptions:

A. When a user selects a language, the results should be optimized according to the selected language.
B. When a user selects a language and a territory, the results should be optimized with first priority on the territory and second on the language.

----

Example (Assumption A.): a user selects region 'zh-TW', which should end in zh_HK.
hint: CN is 'Hans', and HK ('Hant') fits TW ('Hant') better.

    >>> get_engine_locale('zh-TW', lang_list)
    'zh_HK'
    >>> lang_list[match_language('zh-TW', lang_list)]
    'zh_CN'

----

Example (Assumption A.): a user selects only the language 'zh', which should end in zh_CN.

    >>> get_engine_locale('zh', lang_list)
    'zh_CN'
    >>> lang_list[match_language('zh', lang_list)]
    'zh_CN'

----

Example (Assumption B.): a user selects region 'fr-BE', which should end in nl_BE.
hint: the priority should be on the territory the user selected. If the user prefers 'fr', he will select 'fr' without a region tag.

    >>> get_engine_locale('fr-BE', lang_list, default='unknown')
    'nl_BE'
    >>> match_language('fr-BE', lang_list, fallback='unknown')
    'fr-CA'

----

Example (Assumption A.): a user selects only the language 'fr', which should end in fr_CA.

    >>> get_engine_locale('fr', lang_list)
    'fr_CA'
    >>> lang_list[match_language('fr', lang_list)]
    'fr_CA'

----

The difference in priority on the territory is best shown with an engine that supports the following locales:

    >>> lang_list = {
    ...     "fr-FR": "fr_FR",
    ...     "fr-CA": "fr_CA",
    ...     "en-GB": "en_GB",
    ...     "nl-BE": "nl_BE",
    ... }

----

Example (Assumption A.): a user selects only a language.

    >>> get_engine_locale('en', lang_list)
    'en_GB'
    >>> match_language('en', lang_list)
    'en-GB'

hint: the engine supports fr_FR and fr_CA; since no territory is given, fr_FR takes priority:

    >>> get_engine_locale('fr', lang_list)
    'fr_FR'
    >>> lang_list[match_language('fr', lang_list)]
    'fr_FR'

----

Example (Assumption B.): a user selects region 'fr-BE', which should end in nl_BE.

    >>> get_engine_locale('fr-BE', lang_list)
    'nl_BE'
    >>> lang_list[match_language('fr-BE', lang_list)]
    'fr_FR'

----

If the user selects a language and there are two locales like the following:

    >>> lang_list = {
    ...     "fr-BE": "fr_BE",
    ...     "fr-CH": "fr_CH",
    ... }
    >>> get_engine_locale('fr', lang_list)
    'fr_BE'
    >>> lang_list[match_language('fr', lang_list)]
    'fr_BE'

It looks like both functions return the same value, but match_language depends on the order of the dictionary (which is not predictable):

    >>> lang_list = {
    ...     "fr-CH": "fr_CH",
    ...     "fr-BE": "fr_BE",
    ... }
    >>> get_engine_locale('fr', lang_list)
    'fr_BE'
    >>> lang_list[match_language('fr', lang_list)]
    'fr_CH'

get_engine_locale selects the locale by looking at the "population percent" of the language in each territory, and this percentage is higher in BE (68%) than in CH (21%).

Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
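The two assumptions can be sketched in a few lines of plain Python. This is not SearXNG's get_engine_locale: `pick_engine_locale`, `_split` and the `POPULATION_PERCENT` table are hypothetical, the percentages are made-up placeholders rather than babel/CLDR data, and scripts such as Hans/Hant are ignored, so the zh-TW example above is not reproduced. It only shows the order of the rules (exact match, then territory, then language) and how sorting before `max()` keeps the result independent of dictionary order.

```python
from typing import Dict, Mapping, Optional, Tuple

# Hypothetical stand-in for babel's "population percent" data:
# territory -> {language: percent}. The numbers are illustrative only.
POPULATION_PERCENT: Dict[str, Dict[str, float]] = {
    "BE": {"nl": 60.0, "fr": 40.0},
    "CH": {"de": 60.0, "fr": 20.0},
    "FR": {"fr": 100.0},
    "CA": {"en": 60.0, "fr": 30.0},
    "GB": {"en": 100.0},
}


def _split(tag: str) -> Tuple[str, str]:
    """Split a SearXNG tag like 'fr-BE' into ('fr', 'BE'); 'fr' -> ('fr', '')."""
    lang, _, territory = tag.partition("-")
    return lang, territory


def pick_engine_locale(
    searxng_locale: str,
    engine_locales: Mapping[str, str],
    population: Mapping[str, Mapping[str, float]] = POPULATION_PERCENT,
    default: Optional[str] = None,
) -> Optional[str]:
    # 1. An exact match always wins.
    if searxng_locale in engine_locales:
        return engine_locales[searxng_locale]

    lang, territory = _split(searxng_locale)

    # 2. Assumption B: a territory was given, so prefer an engine locale in that
    #    territory, choosing the language with the highest share there.
    if territory:
        same_territory = [t for t in engine_locales if _split(t)[1] == territory]
        if same_territory:
            best = max(
                sorted(same_territory),  # sorted: ties never depend on dict order
                key=lambda t: population.get(territory, {}).get(_split(t)[0], 0.0),
            )
            return engine_locales[best]

    # 3. Assumption A: fall back to the language, choosing the territory where
    #    this language has the highest population share.
    same_language = [t for t in engine_locales if _split(t)[0] == lang]
    if same_language:
        best = max(
            sorted(same_language),
            key=lambda t: population.get(_split(t)[1], {}).get(lang, 0.0),
        )
        return engine_locales[best]

    return default
```

With these placeholder numbers, `pick_engine_locale('fr', {"fr-CH": "fr_CH", "fr-BE": "fr_BE"})` returns `'fr_BE'` regardless of the order in which the two entries are listed.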
Commit 9ae409a
[fix] qwant - API error::locale must be one ..
The request function should not request a language (aka locale) that is not supported by qwant. Selecting a locale like zh-TW results in qwant's API error::

    ERROR searx.engines.qwant news: exception : \
      API error::locale must be one of the following values: \
      en_gb, en_ie, en_us, en_ca, en_my, en_au, en_nz, de_de, de_ch, de_at, fr_fr, \
      fr_be, fr_ch, fr_ca, fr_ad, fc_ca, co_fr, es_es, es_ar, es_cl, es_co, es_mx, \
      es_pe, es_ad, ca_es, ca_ad, ca_fr, eu_es, eu_fr, it_it, it_ch, pt_pt, pt_ad, \
      nl_be, nl_nl

The existing searx.utils.match_language function is unsuitable for this purpose; it is replaced by the function searx.locales.get_engine_locale, which is based on methods from the babel package. qwant's _fetch_supported_languages function has been revised to filter out languages (aka locales) not supported by qwant.

Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
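A hypothetical sketch of the filtering idea, not the engine's actual _fetch_supported_languages: it keeps only candidate locales that appear in the list quoted in the API error message and maps them to SearXNG-style tags, so a locale such as zh-TW can never reach the request. `QWANT_ERROR_LIST`, `SUPPORTED` and `filter_supported` are illustrative names; how the real engine obtains its list of locales is not shown here.

```python
from typing import Dict, Iterable

# Locale list copied verbatim from qwant's API error message quoted above.
QWANT_ERROR_LIST = (
    "en_gb, en_ie, en_us, en_ca, en_my, en_au, en_nz, de_de, de_ch, de_at, fr_fr, "
    "fr_be, fr_ch, fr_ca, fr_ad, fc_ca, co_fr, es_es, es_ar, es_cl, es_co, es_mx, "
    "es_pe, es_ad, ca_es, ca_ad, ca_fr, eu_es, eu_fr, it_it, it_ch, pt_pt, pt_ad, "
    "nl_be, nl_nl"
)
SUPPORTED = frozenset(s.strip() for s in QWANT_ERROR_LIST.split(","))


def filter_supported(candidates: Iterable[str]) -> Dict[str, str]:
    """Map tags like 'fr-BE' or 'fr_BE' to qwant locales ('fr_be'), dropping unsupported ones."""
    result: Dict[str, str] = {}
    for cand in candidates:
        qwant_locale = cand.replace("-", "_").lower()
        if qwant_locale not in SUPPORTED:
            # e.g. zh_tw would trigger "locale must be one of the following values"
            continue
        lang, _, territory = qwant_locale.partition("_")
        result[f"{lang}-{territory.upper()}"] = qwant_locale
    return result


print(filter_supported(["zh_TW", "fr_BE", "en-US", "de_DE"]))
```

The resulting dict has the same shape as the `engine_locales` argument described for get_engine_locale (SearXNG tag to engine string), so only supported locales can be selected.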
Commit 6579d6d
[mod] qwant - add safesearch option
Closes: searxng#1640

Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
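A minimal sketch of passing SearXNG's safe-search level to the API, assuming that SearXNG's levels 0 (off), 1 (moderate) and 2 (strict) map one-to-one onto a qwant query parameter named `safesearch`. The mapping, the parameter names and the `build_query` helper are assumptions for illustration, not taken from the commit.

```python
from urllib.parse import urlencode

# Assumption: SearXNG's safesearch levels are passed through 1:1 to qwant.
SAFESEARCH_LEVELS = {0: 0, 1: 1, 2: 2}


def build_query(query: str, locale: str, safesearch: int) -> str:
    # Reject levels SearXNG does not define instead of sending them to the API.
    if safesearch not in SAFESEARCH_LEVELS:
        raise ValueError(f"unknown safesearch level: {safesearch}")
    params = {"q": query, "locale": locale, "safesearch": SAFESEARCH_LEVELS[safesearch]}
    return urlencode(params)


print(build_query("test", "fr_fr", 2))  # q=test&locale=fr_fr&safesearch=2
```

Keeping the mapping in a dict makes it easy to adjust if the API's levels turn out to differ from SearXNG's.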
Commit 27385e7