-
Notifications
You must be signed in to change notification settings - Fork 1.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
bing.py: remove redirection links #1219
Conversation
searx/engines/bing.py
Outdated
# follow HTTP Bing URL and update the result URLs | ||
for i, redirect_response in enumerate(http_get(url_to_resolve, allow_redirects=False)): | ||
if not isinstance(redirect_response, Exception): | ||
results[url_to_resolve_index[i]]['url'] = redirect_response.headers['location'] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can @return42 @mrpaulblack @tiekoetter @unixfox or anyone try this PR ?
The patch itself works but I expect a lot of IO by this loop: when I query about a resource with long URLs e.g. !bing heise.de bug :de
I see a lot of HTTP requests fired to bing.com .. IDK an other solution but I think fire one HTTP request for one link is to much .. IDK how long it takes bing blocks such requests and bans the server IP (CAPTCHA).
Since I see also other issues [1], personally I tend to drop the bing-web engine.
[1] On my server I need to repeat each query twice to get results, otherwise I get bad results similar to #941
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since I see also other issues [1], personally I tend to drop the bing-web engine.
May be the best solution would be disable bing by default and enable duckduckgo back, because, AFAIK, DDG uses bing?
For what it's worth, I've deployed this PR on my instance, and the bing engine works fine since then (it is enabled in all my browsers). |
searx/network/__init__.py
Outdated
elif isinstance(url, list): | ||
future_list = [asyncio.run_coroutine_threadsafe(network.request(method, u, **kwargs), get_loop()) for u in url] | ||
responses = [] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For what it's worth, I've deployed this PR on my instance, and the bing engine works fine since then (it is enabled in all my browsers).
OK, but I think this part of the network.request
where the same **kwargs
are send to a list of URLs is not a normal use case .. as you said: "quick & dirty" :) .. I think we should not merge this patch of network.request
.
For ref: searx/searx@a3ad9f9 |
I bet it's just a temporary fix and this will break in a few months. Just like when for months we chased for the user agent that would restore the functionality of Google when it broke. |
cb54c4c
to
5973b21
Compare
5973b21
to
21307a2
Compare
* test if bing redirect works on a public instance * use fork from dalf as base * update upstream to 21307a2c50058af24c409bcd10e76c6fb359f80b
add a new function searx.network.multi_requests to send multiple HTTP requests at once
21307a2
to
a1e8af0
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM 👍
I tested this for 2 days on my public instance. I am not blocked by bing, URL still resolve to the correct ones and CPU and RAM usage also did not increase 👍
What does this PR do?
The purpose of this PR is to validate a possible solution to get the real URL from bing.
This PR make a quick & dirty change in
searx.network
to allow to send multiple URL in parallel (also not all exception are catched). So all redirect URL from bing are resolved at the same time. The total response time of the bing engine is similar to the google engine with this change on my laptop.[EDIT] based on #1220 , the engine uses the
<cite>
tag when possible for short URL.If it works on a public instance, an update of this PR should provide a more clean change of
searx.network
.Why is this change important?
Fix the bing engine
How to test this PR
locally?can @return42 @mrpaulblack @tiekoetter @unixfox or anyone try this PR ?
Author's checklist
Most probably baidu can be integrated in the same way.
Related issues
See #1209