Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

bing.py: remove redirection links #1219

Merged
merged 1 commit into from
Jul 10, 2022
Merged

Conversation

dalf
Copy link
Member

@dalf dalf commented May 21, 2022

What does this PR do?

The purpose of this PR is to validate a possible solution to get the real URL from bing.

This PR make a quick & dirty change in searx.network to allow to send multiple URL in parallel (also not all exception are catched). So all redirect URL from bing are resolved at the same time. The total response time of the bing engine is similar to the google engine with this change on my laptop.

[EDIT] based on #1220 , the engine uses the <cite> tag when possible for short URL.

If it works on a public instance, an update of this PR should provide a more clean change of searx.network.

Why is this change important?

Fix the bing engine

How to test this PR locally?

can @return42 @mrpaulblack @tiekoetter @unixfox or anyone try this PR ?

Author's checklist

Most probably baidu can be integrated in the same way.

Related issues

See #1209

@Linerly
Copy link
Contributor

Linerly commented May 22, 2022

Tried patching this PR to my instance, it works
Screenshot
...another screenshot with another result
Another screenshot

Comment on lines 120 to 122
# follow HTTP Bing URL and update the result URLs
for i, redirect_response in enumerate(http_get(url_to_resolve, allow_redirects=False)):
if not isinstance(redirect_response, Exception):
results[url_to_resolve_index[i]]['url'] = redirect_response.headers['location']
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can @return42 @mrpaulblack @tiekoetter @unixfox or anyone try this PR ?

The patch itself works but I expect a lot of IO by this loop: when I query about a resource with long URLs e.g. !bing heise.de bug :de I see a lot of HTTP requests fired to bing.com .. IDK an other solution but I think fire one HTTP request for one link is to much .. IDK how long it takes bing blocks such requests and bans the server IP (CAPTCHA).

Since I see also other issues [1], personally I tend to drop the bing-web engine.

[1] On my server I need to repeat each query twice to get results, otherwise I get bad results similar to #941

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since I see also other issues [1], personally I tend to drop the bing-web engine.

May be the best solution would be disable bing by default and enable duckduckgo back, because, AFAIK, DDG uses bing?

dalf added a commit to dalf/searxng that referenced this pull request May 24, 2022
dalf added a commit to dalf/searxng that referenced this pull request May 24, 2022
dalf added a commit to dalf/searxng that referenced this pull request May 24, 2022
dalf added a commit to dalf/searxng that referenced this pull request May 24, 2022
allendema pushed a commit to allendema/searxng that referenced this pull request May 26, 2022
allendema pushed a commit to allendema/searxng that referenced this pull request May 26, 2022
@dalf
Copy link
Member Author

dalf commented May 29, 2022

For what it's worth, I've deployed this PR on my instance, and the bing engine works fine since then (it is enabled in all my browsers).

Comment on lines 93 to 151
elif isinstance(url, list):
future_list = [asyncio.run_coroutine_threadsafe(network.request(method, u, **kwargs), get_loop()) for u in url]
responses = []
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For what it's worth, I've deployed this PR on my instance, and the bing engine works fine since then (it is enabled in all my browsers).

OK, but I think this part of the network.request where the same **kwargs are send to a list of URLs is not a normal use case .. as you said: "quick & dirty" :) .. I think we should not merge this patch of network.request.

@dalf
Copy link
Member Author

dalf commented Jun 6, 2022

For ref: searx/searx@a3ad9f9

@unixfox
Copy link
Member

unixfox commented Jun 6, 2022

For ref: searx/searx@a3ad9f9

I bet it's just a temporary fix and this will break in a few months. Just like when for months we chased for the user agent that would restore the functionality of Google when it broke.
See https://github.com/searx/searx/pulls?q=is%3Apr+sort%3Aupdated-desc+google+old+ui+is%3Aclosed+author%3Aunixfox

@dalf dalf changed the title bing.py: quick & dirty way to get the real URL bing.py: remove redirection links Jul 6, 2022
mrpaulblack added a commit to paulgoio/searxng that referenced this pull request Jul 8, 2022
* test if bing redirect works on a public instance
* use fork from dalf as base
* update upstream to 21307a2c50058af24c409bcd10e76c6fb359f80b
add a new function searx.network.multi_requests to send multiple HTTP requests at once
Copy link
Member

@mrpaulblack mrpaulblack left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM 👍

I tested this for 2 days on my public instance. I am not blocked by bing, URL still resolve to the correct ones and CPU and RAM usage also did not increase 👍

@dalf dalf marked this pull request as ready for review July 10, 2022 16:06
@dalf dalf merged commit 44f2eb5 into searxng:master Jul 10, 2022
@dalf dalf deleted the follow_bing_redirect branch July 10, 2022 16:06
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants