Add searx.locale.get_engine_locale in addition to searx.utils.match_language; improve qwant engine #1652

Merged 4 commits on Aug 14, 2022

Commits on Aug 14, 2022

  1. [mod] decouple qwant's categories from SearXNG's categories

    By using new property `qwant_categ:` the category of qwant is no longer bound to
    the category of SearXNG.
    
    Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
    return42 committed Aug 14, 2022
    75bb8c4
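
A minimal sketch of what this decoupling could look like, assuming engine entries expressed as plain Python dicts. The entries, the category values (`web`, `news`) and the helper `qwant_category` are illustrative assumptions, not the engine's actual code:

```python
# Hypothetical engine entries: `categories` is what SearXNG uses for its own
# grouping, `qwant_categ` is the value that would be sent to qwant.
ENGINES = [
    {"name": "qwant", "categories": ["general", "web"], "qwant_categ": "web"},
    {"name": "qwant news", "categories": ["news"], "qwant_categ": "news"},
]


def qwant_category(engine: dict) -> str:
    """Return the qwant-side category, independent of SearXNG's categories."""
    categ = engine.get("qwant_categ")
    if not categ:
        raise ValueError(f"engine {engine['name']!r} needs a qwant_categ value")
    return categ


for engine in ENGINES:
    print(engine["name"], "->", qwant_category(engine))
```

With this split, an entry could be moved to a different SearXNG category without changing what is requested from qwant.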
  2. [mod] add locale.get_engine_locale to get predictable results

    The match_language function sometimes returns incorrect results which is why a
    new function get_engine_locale is required.
    
    Fixing match_language is not easily possible, because there is almost no
    documentation for it and even its call parameters are undefined.  E.g. the
    function processes values like the ones from yahoo::
    
        "yahoo": [
            "ar",
            ...
            "zh_chs",
            "zh_cht"
         ]
    
    The get_engine_locale function is documented in detail: there is a clear
    description of the assumptions as well as the requirements and approximation
    rules (read the doc-string for more details)::
    
        Argument ``engine_locales`` is a python dict that maps *SearXNG locales* to
        corresponding *engine locales*:
    
          <engine>: {
              # SearXNG string : engine-string
              'ca-ES'          : 'ca_ES',
              'fr-BE'          : 'fr_BE',
              'fr-CA'          : 'fr_CA',
              'fr-CH'          : 'fr_CH',
              'fr'             : 'fr_FR',
              ...
              'pl-PL'          : 'pl_PL',
              'pt-PT'          : 'pt_PT'
          }
    
        .. hint::
    
           The *SearXNG locale* string has to be known by babel!
    
    In the following you will find a comparison:
    
    >>> import babel.languages
    >>> from searx.utils import match_language
    >>> from searx.locales import get_engine_locale
    
    Assume we have an engine that supports the following locales:
    
    >>> lang_list = {
    ...     "zh-CN": "zh_CN",
    ...     "zh-HK": "zh_HK",
    ...     "nl-BE": "nl_BE",
    ...     "fr-CA": "fr_CA",
    ... }
    
    Assumptions:
    
      A. When a user selects a language the results should be optimized according to
         the selected language.
    
      B. When a user selects a language and a territory the results should be
         optimized with first priority on territory and second on language.
    
    ----
    
    Example: (Assumption A.)
    
      A user selects region 'zh-TW' which should end in zh_HK
    
    hint:
      CN is 'Hans' and HK ('Hant') fits better to TW ('Hant')
    
    >>> get_engine_locale('zh-TW', lang_list)
    'zh_HK'
    >>> lang_list[match_language('zh-TW', lang_list)]
    'zh_CN'
    
    ----
    
    Example: (Assumption A.)
    
      A user selects only the language 'zh' which should end in CN
    
    >>> get_engine_locale('zh', lang_list)
    'zh_CN'
    >>> lang_list[match_language('zh', lang_list)]
    'zh_CN'
    
    ----
    
    Example: (Assumption B.)
    
      A user selects region 'fr-BE' which should end in nl-BE
    
    hint:
      priority should be on the territory the user selected.  If the user
      prefers 'fr' he will select 'fr' without a region tag.
    
    >>> get_engine_locale('fr-BE', lang_list, default='unknown')
    'nl_BE'
    >>> match_language('fr-BE', lang_list, fallback='unknown')
    'fr-CA'
    
    ----
    
    Example: (Assumption A.)
    
      A user selects only the language 'fr' which should end in fr_CA
    
    >>> get_engine_locale('fr', lang_list)
    'fr_CA'
    >>> lang_list[match_language('fr', lang_list)]
    'fr_CA'
    
    ----
    
    The difference in priority on the territory is best shown with an engine that
    supports the following locales:
    
    >>> lang_list = {
    ...     "fr-FR": "fr_FR",
    ...     "fr-CA": "fr_CA",
    ...     "en-GB": "en_GB",
    ...     "nl-BE": "nl_BE",
    ... }
    
    ----
    
    Example: (Assumption A.)
    
       A user selects only a language
    
    >>> get_engine_locale('en', lang_list)
    'en_GB'
    >>> match_language('en', lang_list)
    'en-GB'
    
    hint: the engine supports fr_FR and fr_CA; since no territory is given, fr_FR
    takes priority.
    
    >>> get_engine_locale('fr', lang_list)
    'fr_FR'
    >>> lang_list[match_language('fr', lang_list)]
    'fr_FR'
    
    ----
    
    Example: (Assumption B.)
    
      A user selects region 'fr-BE' which should end in nl-BE
    
    >>> get_engine_locale('fr-BE', lang_list)
    'nl_BE'
    >>> lang_list[match_language('fr-BE', lang_list)]
    'fr_FR'
    
    ----
    
    If the user selects a language and there are two locales like the following:
    
    >>> lang_list = {
    ...      "fr-BE": "fr_BE",
    ...      "fr-CH": "fr_CH",
    ...  }
    >>>
    
    >>> get_engine_locale('fr', lang_list)
    'fr_BE'
    >>> lang_list[match_language('fr', lang_list)]
    'fr_BE'
    
    Looks like both functions return the same value, but match_language depends on the
    order of the dictionary (which is not predictable):
    
    >>> lang_list = {
    ...      "fr-CH": "fr_CH",
    ...      "fr-BE": "fr_BE",
    ...  }
    >>> get_engine_locale('fr', lang_list)
    'fr_BE'
    >>> lang_list[match_language('fr', lang_list)]
    'fr_CH'
    >>>
    
    The get_engine_locale function selects the locale by looking at the "population
    percent", and this percentage is higher in BE (68%) than in CH (21%).
    
    Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
    return42 committed Aug 14, 2022
    9ae409a
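
The approximation rules described in this commit message (exact match first, then priority on the territory, then the language with the highest "population percent") can be sketched in plain Python. This is a simplified illustration under stated assumptions, not the code of `searx.locales.get_engine_locale`: the function name `approx_engine_locale` and the `POPULATION_PERCENT` table are hypothetical, the table holds only the two figures quoted above instead of babel's data, and script matching (the zh-TW case) is not modelled:

```python
# Hypothetical stand-in for babel's territory data; only the two figures
# quoted in the commit message are included.
POPULATION_PERCENT = {"fr-BE": 68.0, "fr-CH": 21.0}


def approx_engine_locale(searxng_locale, engine_locales, population, default=None):
    """Approximate an engine locale for a SearXNG locale tag like 'fr-BE'."""
    # 1. An exact match always wins.
    if searxng_locale in engine_locales:
        return engine_locales[searxng_locale]

    lang, _, territory = searxng_locale.partition("-")

    def best(tags):
        # Highest population percent wins; the tag itself breaks ties, so
        # the result never depends on the dict's insertion order.
        return max(tags, key=lambda tag: (population.get(tag, 0.0), tag))

    # 2. Territory has priority over language (assumption B).
    if territory:
        same_territory = [t for t in engine_locales if t.partition("-")[2] == territory]
        if same_territory:
            return engine_locales[best(same_territory)]

    # 3. Fall back to any locale of the same language (assumption A).
    same_language = [t for t in engine_locales if t.partition("-")[0] == lang]
    if same_language:
        return engine_locales[best(same_language)]

    return default


print(approx_engine_locale("fr", {"fr-CH": "fr_CH", "fr-BE": "fr_BE"}, POPULATION_PERCENT))
```

Because candidates are ranked by an explicit key rather than taken in iteration order, swapping the two entries of the dictionary does not change the result, which is the property the commit message contrasts with match_language.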
  3. [fix] qwant - API error::locale must be one ..

    The request function should not request a language (aka locale) that is not
    supported by qwant. Selecting a locale like zh-TW results in qwant's API error:
    
      ERROR searx.engines.qwant news: exception : \
      API error::locale must be one of the following values: \
        en_gb, en_ie, en_us, en_ca, en_my, en_au, en_nz, de_de, de_ch, de_at, fr_fr, \
        fr_be, fr_ch, fr_ca, fr_ad, fc_ca, co_fr, es_es, es_ar, es_cl, es_co, es_mx, \
        es_pe, es_ad, ca_es, ca_ad, ca_fr, eu_es, eu_fr, it_it, it_ch, pt_pt, pt_ad, \
        nl_be, nl_nl
    
    The existing searx.utils.match_language function is unsuitable for this purpose;
    it is replaced by the function searx.locales.get_engine_locale, which is based
    on the methods from the babel package.
    
    qwant's _fetch_supported_languages function has been revised to filter out
    languages (aka locales) not supported by qwant.
    
    Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
    return42 committed Aug 14, 2022
    6579d6d
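
The filtering idea from this commit can be illustrated with a short sketch. The set of values is copied from the API error message quoted above; the helper `filter_supported` and the conversion from a tag like `fr-BE` to `fr_be` (underscore, lower case) are assumptions made for illustration and are not taken from the engine's `_fetch_supported_languages`:

```python
# Locale values accepted by qwant, as listed in the quoted API error message.
QWANT_LOCALES = frozenset(
    """
    en_gb en_ie en_us en_ca en_my en_au en_nz de_de de_ch de_at fr_fr
    fr_be fr_ch fr_ca fr_ad fc_ca co_fr es_es es_ar es_cl es_co es_mx
    es_pe es_ad ca_es ca_ad ca_fr eu_es eu_fr it_it it_ch pt_pt pt_ad
    nl_be nl_nl
    """.split()
)


def filter_supported(candidate_tags):
    """Map SearXNG-style tags to qwant values, dropping unsupported ones."""
    supported = {}
    for tag in candidate_tags:
        # Assumed conversion: 'fr-BE' -> 'fr_be'.
        qwant_value = tag.replace("-", "_").lower()
        if qwant_value in QWANT_LOCALES:
            supported[tag] = qwant_value
    return supported


print(filter_supported(["fr-BE", "zh-TW", "nl-NL"]))
```

A tag such as zh-TW is dropped here, so it never reaches the request and cannot trigger the API error.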
  4. [mod] qwant - add safesearch option

    Closes: searxng#1640
    Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
    return42 committed Aug 14, 2022
    27385e7