Bug: słownik języka polskiego engine #1544

Closed
return42 opened this issue Jul 24, 2022 · 0 comments · Fixed by #1549
Labels
bug Something isn't working

Comments

@return42
Member

return42 commented Jul 24, 2022

Version of SearXNG, commit number if you are using on master branch and stipulate if you forked SearXNG
Repository: https://github.com/searxng/searxng
Branch: master
Version: current master

How did you install SearXNG?

make run

What happened?

ERROR   werkzeug                      : Error on request:
Traceback (most recent call last):
  File "local/py3/lib/python3.8/site-packages/werkzeug/serving.py", line 335, in run_wsgi
    execute(self.server.app)
  File "local/py3/lib/python3.8/site-packages/werkzeug/serving.py", line 325, in execute
    write(data)
  File "local/py3/lib/python3.8/site-packages/werkzeug/serving.py", line 265, in write
    self.send_header(key, value)
  File "/usr/lib/python3.8/http/server.py", line 517, in send_header
    ("%s: %s\r\n" % (keyword, value)).encode('latin-1', 'strict'))
UnicodeEncodeError: 'latin-1' codec can't encode character '\u0142' in position 61: ordinal not in range(256)

How To Reproduce

query !słownik_języka_polskiego foo

Screenshots & Logs

I debugged this by wrapping the send_header() call in a try/except and dropping into pdb:

(Pdb) l
259  	                except ValueError:
260  	                    code_str, msg = status_sent, ""
261  	                code = int(code_str)
262  	                self.send_response(code, msg)
263  	                header_keys = set()
264  ->	                for key, value in headers_sent:
265  	                    try:
266  	                        self.send_header(key, value)
267  	                        header_keys.add(key.lower())
268  	                    except Exception as exc:
269  	                        import pdb
(Pdb) key
'Server-Timing'
(Pdb) value
'total;dur=289.618, render;dur=105.459, total_0_słownik języka polskiego;dur=174.954, load_0_słownik języka polskiego;dur=165.832'

The engine name słownik języka polskiego contains characters (ł, ę) that can't be encoded in latin-1.

(Pdb) where
  /usr/lib/python3.8/threading.py(890)_bootstrap()
-> self._bootstrap_inner()
  /usr/lib/python3.8/threading.py(932)_bootstrap_inner()
-> self.run()
  /usr/lib/python3.8/threading.py(870)run()
-> self._target(*self._args, **self._kwargs)
  /usr/lib/python3.8/socketserver.py(683)process_request_thread()
-> self.finish_request(request, client_address)
  /usr/lib/python3.8/socketserver.py(360)finish_request()
-> self.RequestHandlerClass(request, client_address, self)
  /usr/lib/python3.8/socketserver.py(747)__init__()
-> self.handle()
  local/py3/lib/python3.8/site-packages/werkzeug/serving.py(367)handle()
-> super().handle()
  /usr/lib/python3.8/http/server.py(427)handle()
-> self.handle_one_request()
  /usr/lib/python3.8/http/server.py(415)handle_one_request()
-> method()
  local/py3/lib/python3.8/site-packages/werkzeug/serving.py(339)run_wsgi()
-> execute(self.server.app)
  local/py3/lib/python3.8/site-packages/werkzeug/serving.py(329)execute()
-> write(data)
> local/py3/lib/python3.8/site-packages/werkzeug/serving.py(264)write()
-> for key, value in headers_sent:
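The failing step can be reproduced without werkzeug. This sketch copies the expression from send_header() in http/server.py, as shown in the traceback, and reports the first character of a Server-Timing value that latin-1 cannot encode. The helper names encode_header() and first_unencodable() are illustrative only.

```python
def encode_header(keyword: str, value: str) -> bytes:
    # Same expression as send_header() in http/server.py (see traceback):
    # header lines are encoded strictly as latin-1, so any code point above U+00FF fails.
    return ("%s: %s\r\n" % (keyword, value)).encode("latin-1", "strict")


def first_unencodable(value: str):
    """Return the first character of `value` that latin-1 cannot encode, or None."""
    try:
        encode_header("Server-Timing", value)
    except UnicodeEncodeError as exc:
        # exc.object is the whole header line, exc.start the offending index
        return exc.object[exc.start]
    return None


print(first_unencodable("total_0_słownik języka polskiego;dur=174.954"))  # ł
```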

Additional context

Non-ASCII characters in an engine name always cause problems, e.g. in logging; #166 is another example.
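The commits that fix this convert the engine name to a latin-1 compliant name. A load-time check could catch such names before they ever reach a header. The sketch below is hypothetical (is_latin1_safe() and unsafe_engine_names() are not SearXNG functions): it lists every engine name that strict latin-1 encoding rejects. If ASCII-only names are preferred, for example for logging, str.isascii() is the stricter alternative.

```python
def is_latin1_safe(name: str) -> bool:
    """True if `name` survives the strict latin-1 encoding used for HTTP/1.x header lines."""
    try:
        name.encode("latin-1", "strict")
    except UnicodeEncodeError:
        return False
    return True


def unsafe_engine_names(names):
    # Hypothetical load-time check: collect every engine name that would break header encoding
    return [name for name in names if not is_latin1_safe(name)]


print(unsafe_engine_names(["google", "słownik języka polskiego", "wikipédia"]))
# ['słownik języka polskiego']
```

Note that "wikipédia" passes the latin-1 check (é is U+00E9) but would fail an ASCII-only check.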

Technical report

It's a severe issue: no technical report is generated; the entire process fails.

return42 added the bug (Something isn't working) label Jul 24, 2022
return42 added a commit to return42/searxng that referenced this issue Jul 24, 2022
The engine name is not only a *name*, it's also an identifier that is used in
logs, HTTP headers and more.  Unicode characters in the name of an engine could
cause various issues.

Closes: searxng#1544
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
return42 added a commit to return42/searxng that referenced this issue Jul 24, 2022
The engine name is not only a *name*, it's also an identifier that is used in
logs, HTTP headers and more.  Unicode characters in the name of an engine could
cause various issues.

Closes: searxng#1544
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
kvch pushed a commit to kvch/searx that referenced this issue Jul 30, 2022
The engine name is not only a *name*, it's also an identifier that is used in
logs, HTTP headers and more.  Unicode characters in the name of an engine could
cause various issues.

Closes: searxng/searxng#1544
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
kvch added a commit to searx/searx that referenced this issue Jul 30, 2022
* [fix] google engine: results XPath

* [fix] google & youtube - set EU consent cookie

This changes the previous bypass method for Google consent using
``ucbcb=1`` (6face21) to accept the consent using ``CONSENT=YES+``.

The youtube_noapi and google engines have a similar API, at least regarding consent [1].

Get the CONSENT cookie from a Google request::

    curl -i "https://www.google.com/search?q=time&tbm=isch" \
         -A "Mozilla/5.0 (X11; Linux i686; rv:102.0) Gecko/20100101 Firefox/102.0" \
         | grep -i consent
    ...
    location: https://consent.google.com/m?continue=https://www.google.com/search?q%3Dtime%26tbm%3Disch&gl=DE&m=0&pc=irp&uxe=eomtm&hl=en-US&src=1
    set-cookie: CONSENT=PENDING+936; expires=Wed, 24-Jul-2024 11:26:20 GMT; path=/; domain=.google.com; Secure
    ...
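The Set-Cookie line above can be inspected with the standard library's http.cookies.SimpleCookie to tell a pending consent apart from an accepted one. This is an illustrative sketch: consent_state() is a hypothetical helper, and treating the text before "+" as the state is an assumption based on the PENDING+936 and YES+ values quoted in this commit.

```python
from http.cookies import SimpleCookie

# Set-Cookie value copied from the curl output above
RAW = "CONSENT=PENDING+936; expires=Wed, 24-Jul-2024 11:26:20 GMT; path=/; domain=.google.com; Secure"


def consent_state(set_cookie: str):
    """Return the consent state ("PENDING", "YES", ...) or None if no CONSENT cookie is set."""
    cookie = SimpleCookie()
    cookie.load(set_cookie)
    morsel = cookie.get("CONSENT")
    if morsel is None:
        return None
    # Assumed format: "<STATE>+<suffix>", e.g. "PENDING+936" or "YES+"
    return morsel.value.split("+", 1)[0]


print(consent_state(RAW))  # PENDING
```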

PENDING & YES [2]:

  Google change the way for consent about YouTube cookies agreement in EU
  countries. Instead of showing a popup in the website, YouTube redirects the
  user to a new webpage at consent.youtube.com domain ...  Fix for this is to
  put a cookie CONSENT with YES+ value for every YouTube request

[1] iv-org/invidious#2207
[2] TeamNewPipe/NewPipeExtractor#592

Closes: searxng/searxng#1432
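Per this commit message, the fix is to send a CONSENT cookie with the value YES+ on every request. A minimal sketch, assuming an engine hook shaped like searx's request(query, params) that carries cookies in a params["cookies"] dict; the exact parameter layout is an assumption, not copied from the engine source.

```python
from urllib.parse import urlencode


def request(query: str, params: dict) -> dict:
    # Hypothetical engine hook; the params layout is an assumption for illustration
    params["url"] = "https://www.google.com/search?" + urlencode({"q": query})
    # Pre-accept the EU consent instead of being redirected to consent.google.com
    params.setdefault("cookies", {})["CONSENT"] = "YES+"
    return params


print(request("time", {}))
# {'url': 'https://www.google.com/search?q=time', 'cookies': {'CONSENT': 'YES+'}}
```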

* [fix] sjp engine - convert engine name to a latin-1 compliant name

The engine name is not only a *name*, it's also an identifier that is used in
logs, HTTP headers and more.  Unicode characters in the name of an engine could
cause various issues.

Closes: searxng/searxng#1544
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>

* [fix] engine tineye: handle 422 response of not supported img format

Closes: searxng/searxng#1449
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>

* bypass google consent with ucbcb=1

* [mod] Adds Lingva translate engine

Add the lingva engine (which grabs data from Google Translate).  Results from
Lingva are added to the infobox results.

* openstreetmap engine: return the localized name.

For example: display "Tokyo" instead of "東京都" when the language is English.
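OpenStreetMap stores translated names in name:<language> tags next to the plain name tag. A hypothetical helper that prefers the translation and falls back to the local name could look like this; it illustrates the idea and is not the engine's actual code.

```python
def localized_name(tags: dict, language: str):
    """Prefer the OSM `name:<lang>` tag, falling back to the plain `name` tag."""
    # Reduce a locale such as "en-US" to its primary language subtag "en"
    primary = language.split("-", 1)[0].lower()
    return tags.get(f"name:{primary}") or tags.get("name")


tokyo = {"name": "東京都", "name:en": "Tokyo"}
print(localized_name(tokyo, "en-US"))  # Tokyo
print(localized_name(tokyo, "fr"))     # 東京都
```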

* [fix] engines/openstreetmap.py typo: user_langage --> user_language

Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>

* Wikidata engine: ignore dummy entities

* Wikidata engine: minor change of the SPARQL request

The engine can be slow, especially when the query returns no answer.
See https://www.mediawiki.org/wiki/Wikidata_Query_Service/User_Manual/MWAPI#Find_articles_in_Wikipedia_speaking_about_cheese_and_see_which_Wikibase_items_they_correspond_to

Co-authored-by: Léon Tiekötter <leon@tiekoetter.com>
Co-authored-by: Emilien Devos <contact@emiliendevos.be>
Co-authored-by: Markus Heiser <markus.heiser@darmarit.de>
Co-authored-by: Emilien Devos <github@emiliendevos.be>
Co-authored-by: ta <alt3753.7@gmail.com>
Co-authored-by: Alexandre Flament <alex@al-f.net>