[fix] google images engine: Fix 'scrap_img_by_id' function #910
While I am still reviewing this PR, I'll leave some comments ...
This implementation is mine :-) ... an ugly hack that parses JS code with a regexp to pick out the thumbs .. as far as I remember, my intention was to fetch thumbnails instead of full-size images to save bandwidth.
In #878 we found a solution that avoids redirects by calculating the direct URL .. if we can't avoid redirects here in google, we can allow redirects in the image_proxy implementation. Here is a possible solution @dalf posted on Matrix a few days ago ..
diff --git a/searx/webapp.py b/searx/webapp.py
index 5e05f978..eb08d63d 100755
--- a/searx/webapp.py
+++ b/searx/webapp.py
@@ -1132,7 +1132,7 @@ def image_proxy():
             'DNT': '1',
         }
         set_context_network_name('image_proxy')
-        resp, stream = http_stream(method='GET', url=url, headers=request_headers)
+        resp, stream = http_stream(method='GET', url=url, headers=request_headers, allow_redirects=True)
         content_length = resp.headers.get('Content-Length')
         if content_length and content_length.isdigit() and int(content_length) > maximum_size:
             return 'Max size', 400
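To illustrate what allowing redirects changes for a proxy like this, here is a minimal sketch in plain Python. It is not SearXNG's http_stream: `resolve_redirects`, `fake_fetch`, `MAX_HOPS` and the example URLs are hypothetical, and the fake fetcher only stands in for a real HTTP client.

```python
from typing import Callable, Dict, Tuple
from urllib.parse import urljoin

# Hypothetical types: a fetcher returns (status_code, headers) for a URL.
Response = Tuple[int, Dict[str, str]]
Fetch = Callable[[str], Response]

REDIRECT_CODES = {301, 302, 303, 307, 308}
MAX_HOPS = 5  # assumed limit for this sketch, not taken from SearXNG


def resolve_redirects(fetch: Fetch, url: str, allow_redirects: bool = False) -> Tuple[str, Response]:
    """Return the final URL and response, following Location headers if allowed."""
    status, headers = fetch(url)
    hops = 0
    while allow_redirects and status in REDIRECT_CODES and 'Location' in headers:
        if hops >= MAX_HOPS:
            raise RuntimeError('too many redirects')
        # Location may be relative, so resolve it against the current URL.
        url = urljoin(url, headers['Location'])
        status, headers = fetch(url)
        hops += 1
    return url, (status, headers)


def fake_fetch(url: str) -> Response:
    """Stand-in for an HTTP client: one image URL redirects to a CDN."""
    routes = {
        'https://example.org/img/1': (302, {'Location': 'https://cdn.example.org/1.jpg'}),
        'https://cdn.example.org/1.jpg': (200, {'Content-Length': '2048'}),
    }
    return routes.get(url, (404, {}))
```

Without `allow_redirects` the caller only sees the 302 and has no image body to stream; with it, the caller gets the final 200 response, whose Content-Length can then be checked against the size limit.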
I don't think we can avoid this here because not every image comes from one source. I don't think google caches the full image of every source.
While trying to understand
Without redirects the load of various images will fail when image_proxy is enabled [1].
[1] searxng#910 (comment)
Suggested-by: @dalf [1]
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
The 'scrap_img_by_id' function no longer returned anything useful. This fix allows the google images engine to present the full source image instead of only the thumbnail. The function scrap_img_by_id() is replaced by a full rewrite that parses image URLs with a regular expression. The new function parse_urls_img_from_js(dom) returns a mapping of data-id to image URL.
Closes: searxng#909
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
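As a rough sketch of the idea described in this commit message, a regular expression can collect a data-id to image URL mapping from embedded JS. Everything specific here is hypothetical: the JS layout in `SAMPLE`, the pattern, and the name `parse_urls_from_js_text` do not reproduce Google's actual response or the PR's parse_urls_img_from_js(dom), which takes a DOM rather than a string.

```python
import re
from typing import Dict

# Hypothetical layout of one entry in the embedded JS:
#   ["<data-id>",["<thumbnail url>",h,w],["<full image url>",h,w]
_RE_ENTRY = re.compile(
    r'\["(?P<data_id>[^"]+)",'                    # data-id of the result
    r'\["(?P<thumb>https?://[^"]+)",\d+,\d+\],'   # thumbnail entry (not used)
    r'\["(?P<url>https?://[^"]+)",\d+,\d+\]'      # full-size image entry
)


def parse_urls_from_js_text(js_text: str) -> Dict[str, str]:
    """Map each data-id found in js_text to its full-size image URL."""
    mapping = {}
    for match in _RE_ENTRY.finditer(js_text):
        # JS string literals may escape '=' and '&' as \u003d and \u0026.
        url = match.group('url').replace('\\u003d', '=').replace('\\u0026', '&')
        mapping[match.group('data_id')] = url
    return mapping


# Made-up sample input, only for demonstrating the function above.
SAMPLE = (
    'callback({data:['
    '["id-1",["https://t.example/1.jpg",100,150],["https://example.org/a.jpg?x\\u003d1",800,1200]],'
    '["id-2",["https://t.example/2.jpg",100,150],["https://example.org/b.png",600,900]]'
    ']});'
)
```

Calling `parse_urls_from_js_text(SAMPLE)` yields one entry per data-id, so a result element can look up its full image URL by its data-id and fall back to the thumbnail when no entry exists.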
@tiekoetter, @dalf: the response from google has changed, see my comment above.
I implemented an alternative solution in my branch fix-909-mhe .. would you please have a look / see commit
[fix] image_proxy: allow HTTP redirects
In this fix-909-mhe branch I also implemented the redirect when image_proxy is enabled, see commit:
[fix] image_proxy: allow HTTP redirects
I had also been down this rabbit hole when I implemented it the first time :-) .. see my comment: return42@4a28b59#diff-7f888885ad7eb66947d95abd3ffa0dabe8a1faa6b1a4b5d4ea0b40908bf0afa6R100-R105
Awesome, thanks for the PR. Getting only small images annoyed me too.
c3d28e5 to c2d9c93
Thanks!
What does this PR do?
The 'scrap_img_by_id' function didn't return anything useful. This fix allows the google images engine to present the full source image instead of only the thumbnail.
How to test this PR locally?
make run
!goi land
Author's checklist
Note: If an image is redirected, the image proxy fails.
Related issues
Closes #909