Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[fix] engine woxikon.de - don't raise exception on empty result list #1681

Merged
merged 2 commits into from
Sep 4, 2022

Conversation

return42
Copy link
Member

What does this PR do?

[fix] engine woxikon.de - don't raise exception on empty result list

Suggested-by: @allendema [1]

[1] #1543 (comment)

Why is this change important?

Woxikon expects a word in German, so with query "foo" the site finds nothing and respons a 404:

httpx.HTTPStatusError: Client error '404 Not Found' for url 'https://synonyme.woxikon.de/synonyme/foo.php'

How to test this PR locally?

 !woxikon.de_synonyme foo

There should not be an 404 error any longer.

Related issues

Closes: #1543

@unixfox
Copy link
Member

unixfox commented Aug 19, 2022

Will it raise an exception if the status code is higher than 500 though? Because that's useful for knowing if the server is down for example.

Same goes for 429 if the server rate limit SearXNG.

@return42
Copy link
Member Author

Will it raise an exception if the status code is higher than 500 though?

No ..

def patch_response(response, do_raise_for_httperror):
if isinstance(response, httpx.Response):
# requests compatibility (response is not streamed)
# see also https://www.python-httpx.org/compatibility/#checking-for-4xx5xx-responses
response.ok = not response.is_error
# raise an exception
if do_raise_for_httperror:
raise_for_httperror(response)

I think the intention of raise_for_httperror was to implement a handle in the response function / by example we have one in the wikipedia.py:

if resp.status_code == 404:
return []
if resp.status_code == 400:
try:
api_result = loads(resp.text)
except:
pass
else:
if (
api_result['type'] == 'https://mediawiki.org/wiki/HyperSwitch/errors/bad_request'
and api_result['detail'] == 'title-invalid-characters'
):
return []

The drawback of a XPath engine (woxikon) is, that we can't implement a response handle.

@unixfox
Copy link
Member

unixfox commented Aug 20, 2022

@return42
Copy link
Member Author

Not sure I understand your question right ..

XPath engines do not have a dedicated python module (to implement a engine specific HTTP error handle) and the raise_for_httperror: false is a bolean (in the settings.yml).

Do you suggest a different implementation for XPath engines, to fetch only configured HTTP status code .. e.g.:

  - name: woxikon.de synonyme
    engine: xpath
    ignore_http_status: 
      - 404
      - xxx

@dalf what do you think about a ignore list for XPath engines?

@unixfox
Copy link
Member

unixfox commented Aug 20, 2022

Oops I didn't see that it wasn't a dedicated python file for the engine.

Your solution with ignore_http_status seems a good IMO.

@dalf
Copy link
Member

dalf commented Aug 20, 2022

I agree it should be implemented in xpath.py.

By ignore_http_status you mean the error is ignored, and the result is parsed?
Yes, go.

Otherwise if xpath.py returns an empty result for these HTTP status code, some alternative names : http_status_as_no_result, no_result_for_http_status without strong conviction.

@return42
Copy link
Member Author

and the result is parsed?

not parsing; the first intention was not to raise an exception on empty result lists. Some engines (like woxikon here in this PR) return a "404 Not Found" when they do not have a match for the search term. I think it is good to have an option in the configured xpath engine to fetch some HTTP status codes and return an empty result list.

http_status_as_no_result, no_result_for_http_status without strong conviction.

True, the name could be improved, the ones you suggested I still do not like so much, but my suggestions are also not better: no_match_for_http status

@dalf
Copy link
Member

dalf commented Sep 2, 2022

I've added a commit so unexpected HTTP statuses are not hidden.

return42 and others added 2 commits September 4, 2022 09:07
Woxikon expects a word in German, so with query "foo" the site finds nothing and
respons a 404:

    httpx.HTTPStatusError: Client error '404 Not Found' \
      for url 'https://synonyme.woxikon.de/synonyme/foo.php'

[1] searxng#1543 (comment)

Closes: searxng#1543
Suggested-by: @allendema [1]
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
no_result_for_http_status contains a list of HTTP status.
These HTTP status are seen an empty result list.
In other cases an exception is thrown as usual.

Previously raise_for_httperror were ignoring all HTTP error,
which make defective engines invisible in the stats.
@return42
Copy link
Member Author

return42 commented Sep 4, 2022

I've added a commit so unexpected HTTP statuses are not hidden.

Thanks a lot .. I rebased and going to merge after checks from CI have been passed.

@return42 return42 merged commit 9ab8438 into searxng:master Sep 4, 2022
@return42 return42 deleted the fix-woxikon branch September 4, 2022 07:14
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Bug: woxikon.de synonyme engine
3 participants