Fix wikidata info box images #878

Merged 5 commits on Feb 7, 2022

Conversation

tiekoetter
Member

What does this PR do?

This PR fixes the wikidata info box images when the image proxy is enabled.

Why is this change important?

Without the fix, info box images from Wikidata are not loaded correctly when the image proxy is enabled.

How to test this PR locally?

  • make run
  • Search !wd test

Related issues

Closes #875

@return42
Member

return42 commented Feb 6, 2022

If it is OK with you, I will push a commit with some improvements on top of your branch.

@tiekoetter
Member Author

Ok but let me rebase my commit. If I don't do this my commit is not signed and it will show an "Unverified" sign on GitHub.

@return42
Member

return42 commented Feb 6, 2022

> Ok but let me rebase my commit. If I don't do this my commit is not signed and it will show an "Unverified" sign on GitHub.

True / inform me when you are ready ..

@tiekoetter
Member Author

No, I mean: if you want to squash everything into one commit, let me do it so that the commit is signed.

@return42
Member

return42 commented Feb 6, 2022

Ah, sorry .. I will push a WIP commit on top; you can squash it or even drop it.

Wikidata info box images are now loaded from upload.wikimedia.org instead of commons.wikimedia.org to prevent redirects

Co-authored-by: Markus Heiser <markus.heiser@darmarit.de>
@return42
Member

return42 commented Feb 6, 2022

@tiekoetter I added a WIP commit on top which polishes the implementation a little bit and adds some debug messages.

DEBUG   searx.engines.wikidata        : get_thumbnail(): https://commons.wikimedia.org/wiki/Special:FilePath/Peterborough%20Cathedral%20March%202010.jpg?width=500&height=400
DEBUG   searx.engines.wikidata        : get_thumbnail() redirected: https://upload.wikimedia.org/wikipedia/commons/0/0d/Peterborough_Cathedral_March_2010.jpg

With these debug messages I noticed that the calculated redirect URL is not a thumbnail; it is the full image. In the example above I searched for !wd peterborough.

The original URL from Wikidata (which is redirected) is the thumbnail (59 kB):

https://commons.wikimedia.org/wiki/Special:FilePath/Peterborough%20Cathedral%20March%202010.jpg?width=500&height=400

and the redirected URL we calculate is the full-size image (10.3 MB):

https://upload.wikimedia.org/wikipedia/commons/0/0d/Peterborough_Cathedral_March_2010.jpg

Now I doubt that calculating the redirect URL is the right solution. Maybe it is better if the image_proxy follows the redirects.
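
To illustrate the calculation being discussed, here is a minimal sketch of how the full-size URL on upload.wikimedia.org can be derived from a Commons file name. It assumes the usual Wikimedia layout, in which the two directory levels come from the MD5 hash of the file name with underscores instead of spaces. The function and constant names are hypothetical and are not taken from the searx.engines.wikidata code.

```python
from hashlib import md5

# Assumed prefix for full-size files (hypothetical constant name).
UPLOAD_PREFIX = "https://upload.wikimedia.org/wikipedia/commons/"


def commons_full_image_url(file_name: str) -> str:
    """Return the upload.wikimedia.org URL of the full-size file."""
    # Commons stores files under the name with underscores instead of spaces.
    name = file_name.replace(" ", "_")
    digest = md5(name.encode("utf-8")).hexdigest()
    # The directory levels are the first hex digit and the first two hex digits.
    return f"{UPLOAD_PREFIX}{digest[0]}/{digest[:2]}/{name}"


print(commons_full_image_url("Peterborough Cathedral March 2010.jpg"))
```

Because this URL carries no size information, it always points at the original upload, which is why the 59 kB thumbnail turned into a 10.3 MB download.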

@tiekoetter
Member Author

@return42
Member

return42 commented Feb 6, 2022

> We can specify the max size of an image.

You surprise me a second time, your investigative skill is really good :-)

Do you have time to implement? .. Otherwise tomorrow I can give it a try.
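
A minimal sketch of what "specifying the max size" could look like: the width is read from the `width` query parameter of the Special:FilePath URL and placed into the `thumb/` path as a `<width>px-` prefix. The function name, the constant names and the default width are assumptions made for illustration; this is not the code that was merged.

```python
from hashlib import md5
from urllib.parse import parse_qs, urlsplit

# Assumed constants (hypothetical names).
THUMB_PREFIX = "https://upload.wikimedia.org/wikipedia/commons/thumb/"
FILEPATH_MARKER = "/wiki/Special:FilePath/"


def commons_thumb_url(file_path_url: str, default_width: int = 500) -> str:
    """Build a thumbnail URL from a commons.wikimedia.org Special:FilePath URL."""
    parts = urlsplit(file_path_url)
    # The file name follows the marker; spaces arrive encoded as %20.
    # Assumes the marker is present in the URL path.
    name = parts.path.split(FILEPATH_MARKER, 1)[1].replace("%20", "_")
    # Use the requested width, falling back to the default if it is missing.
    width = parse_qs(parts.query).get("width", [str(default_width)])[0]
    digest = md5(name.encode("utf-8")).hexdigest()
    return f"{THUMB_PREFIX}{digest[0]}/{digest[:2]}/{name}/{width}px-{name}"


print(commons_thumb_url(
    "https://commons.wikimedia.org/wiki/Special:FilePath/"
    "Donald%20Trump%20official%20portrait.jpg?width=500&height=400"
))
```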

@tiekoetter
Member Author

@return42 I am almost done. Just waiting for make test to finish.

@return42
Member

return42 commented Feb 6, 2022

@tiekoetter your solution works perfectly except for cases where there is more than one image, since the image with the lower prio is selected. I tried !wd trump, where there are two images ...

DEBUG   searx.engines.wikidata        : get_thumbnail(): https://commons.wikimedia.org/wiki/Special:FilePath/Trump%20Text%20Logo.svg?width=500&height=400
DEBUG   searx.engines.wikidata        : get_thumbnail() redirected: https://upload.wikimedia.org/wikipedia/commons/thumb/0/07/Trump_Text_Logo.svg/500px-Trump_Text_Logo.svg
DEBUG   searx.engines.wikidata        : get_thumbnail(): https://commons.wikimedia.org/wiki/Special:FilePath/Donald%20Trump%20official%20portrait.jpg?width=500&height=400
DEBUG   searx.engines.wikidata        : get_thumbnail() redirected: https://upload.wikimedia.org/wikipedia/commons/thumb/5/56/Donald_Trump_official_portrait.jpg/500px-Donald_Trump_official_portrait.jpg

https://upload.wikimedia.org/wikipedia/commons/thumb/0/07/Trump_Text_Logo.svg/500px-Trump_Text_Logo.svg

does not exist, but the second image with the higher prio, the portrait, exists:

https://upload.wikimedia.org/wikipedia/commons/thumb/5/56/Donald_Trump_official_portrait.jpg/500px-Donald_Trump_official_portrait.jpg

I pushed a commit on top that selects the image with the higher prio .. what do you think?

@return42 return42 marked this pull request as ready for review February 6, 2022 22:43
@tiekoetter
Member Author

tiekoetter commented Feb 6, 2022

@return42 I read about this problem on Stack Overflow. They said that if the image is an .svg you need to append .png to the thumbnail file name to get a working MD5 hash URL.

I will add the fix when the test is done.
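
The .svg rule can be sketched as follows, assuming that thumbnails of SVG files are served as rendered PNGs, so that only the last path component gets the extra `.png` while the directory component (and the hash) still uses the plain file name. The helper name is hypothetical.

```python
def thumb_file_name(name: str, width: int) -> str:
    """Return the last path component of a Commons thumbnail URL."""
    thumb = f"{width}px-{name}"
    # Assumption: SVG thumbnails are rendered PNGs, so '.png' is appended.
    if name.lower().endswith(".svg"):
        thumb += ".png"
    return thumb


print(thumb_file_name("Trump_Text_Logo.svg", 500))
print(thumb_file_name("Donald_Trump_official_portrait.jpg", 500))
```

Applied to the logo from the log above, the last component becomes `500px-Trump_Text_Logo.svg.png` instead of `500px-Trump_Text_Logo.svg`.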

@tiekoetter
Member Author

@return42 Search !wd ttest; the md5 hash is broken.

Add '.png' to the second img_src_name if it has the extension '.svg'.
Use urllib.parse.unquote for URL decoding.
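
A minimal sketch of why URL decoding matters here, assuming the broken hash for !wd ttest came from percent-encoded characters in the file name: the MD5 hash has to be computed from the decoded name, otherwise the directory part of the URL does not match. The helper name is hypothetical.

```python
from hashlib import md5
from urllib.parse import unquote


def commons_hash_dirs(encoded_name: str) -> str:
    """Return the '<x>/<xy>' directory part for a possibly URL-encoded name."""
    # Decode percent escapes first (%20, %27, ...), then normalise spaces.
    name = unquote(encoded_name).replace(" ", "_")
    digest = md5(name.encode("utf-8")).hexdigest()
    return f"{digest[0]}/{digest[:2]}"


print(commons_hash_dirs("Donald%20Trump%20official%20portrait.jpg"))
```

Replacing only `%20` handles spaces but leaves escapes such as `%27` in the name, so the hash would be computed from a string that is not the real file name.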
@tiekoetter tiekoetter changed the title [WIP] Fix wikidata info box images Fix wikidata info box images Feb 6, 2022
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
@return42
Member

return42 commented Feb 7, 2022

> Search !wd ttest; the md5 hash is broken.

@tiekoetter: it seems to be fixed by your last patch .. do you see any more issues? I ran some tests and for me it now works without any issue.

I added a commit on top with some pylint hints, no functional change.

If you do not see any more issues, I would like to merge this PR.

@tiekoetter
Member Author

Which commits should be squashed and what should be the commit message(s)?

@return42
Member

return42 commented Feb 7, 2022

> Which commits should be squashed and what should be the commit message(s)?

From my side there is no need to squash; as far as I can see, each commit has a commit message that fits the patch .. but it is up to you, if you want to squash feel free to do so.

@return42 return42 self-requested a review February 7, 2022 09:31
@tiekoetter
Member Author

No, if you think it is OK then there is indeed no need. I also think this PR is ready.

Member

@return42 return42 left a comment

Works like a charm :-)

@tiekoetter: I really enjoyed the collaboration we had, and I am very much looking forward to your next contribution :-) .. thanks!

@return42 return42 merged commit ae8e3f3 into searxng:master Feb 7, 2022
return42 added a commit to return42/searxng that referenced this pull request Feb 7, 2022
OpenStreetMap images are now loaded from upload.wikimedia.org instead of
commons.wikimedia.org to prevent redirects.

With `image_proxy` enabled, images from commons.wikimedia.org can't be loaded
since they are redirected.  We already discussed this issue [875] and
@tiekoetter fixed this issue in PR [878].

Related-to:
- [875] searxng#875
- [878] searxng#878
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
Development

Successfully merging this pull request may close these issues.

Images in instant answer are not loaded if "image proxy" is enables
2 participants