Google News engine gives bad URLs #1959

Closed
AlyoshaVasilieva opened this issue Nov 16, 2022 · 5 comments · Fixed by #2306
@AlyoshaVasilieva

Version of SearXNG, commit number if you are using on master branch and stipulate if you forked SearXNG
2022.11.11-3a765113
How did you install SearXNG?
Script, also occurs in public instances
What happened?
Google News encodes its result URLs, and SearXNG does not decode them.

How To Reproduce

  1. Search news "Here's how ranked choice voting will decide Alaska's Senate race"
  2. Look at result URLs for the ABC News article

Bing, Qwant give URL:
https://abcnews.go.com/Politics/ranked-choice-voting-decide-alaskas-senate-race/story?id=93063277
Google News gives URL:
https://abcnews.go.com/Politics/ranked-choice-voting-decide-alaskas-senate-race/story?id\\u003d93063277

The Google URL does not work.
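
For illustration, the `\u003d` in the Google News URL above is the escaped form of `=`. A minimal sketch of undoing such escapes follows; the helper name `unescape_unicode` is hypothetical and this is not necessarily how SearXNG handles it:

```python
import re

# Matches a literal "\uXXXX" escape; one or more backslashes are tolerated
# because the escape may arrive double-escaped.
_UNICODE_ESCAPE = re.compile(r"\\+u([0-9a-fA-F]{4})")


def unescape_unicode(url: str) -> str:
    """Replace literal \\uXXXX sequences in a URL with the character they encode."""
    return _UNICODE_ESCAPE.sub(lambda m: chr(int(m.group(1), 16)), url)


broken = "https://abcnews.go.com/Politics/ranked-choice-voting-decide-alaskas-senate-race/story?id\\u003d93063277"
fixed = unescape_unicode(broken)
```

With the URL from this report, `fixed` ends in `story?id=93063277`, which matches what Bing and Qwant return.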

Repro 2:
Search "McConnell dismisses Scott's GOP leadership challenge"

Qwant gives URL:
https://abcnews.go.com/Politics/mcconnell-dismisses-scotts-gop-leadership-challenge-votes/story?id=93354279
Google gives URL:
https://abcnews.go.com/Politics/mcconnell-dismisses-scotts-gop-leadership-challenge-votes/story/x1aY/x17/x1dL/x0c/x0c/xd9/x0eL/xcc/xcdM/x0c/x8d/xceH/x97'

Expected behavior
URL is decoded properly

@AlyoshaVasilieva AlyoshaVasilieva added the bug Something isn't working label Nov 16, 2022
@return42
Member

return42 commented Jan 8, 2023

For reference / I addressed this issue in my rework of locales & languages: [mod] Google: reversed engineered & upgrade to data_type: traits_v1

@return42 return42 self-assigned this Jan 8, 2023
@return42
Member

return42 commented Mar 29, 2023

The rework has been merged, but this problem still needs to be investigated. The issue still exists, e.g. with the query `:en !gon abcnews`.

return42 added a commit to return42/searxng that referenced this issue Apr 1, 2023
Google-News returns internal links where the origin URL is encoded in a
URL-safe base64 string (RFC 4648, section 5).

Closes: searxng#1959
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
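
A rough sketch of the decoding idea described in the commit message above, not the code from #2306. It assumes the last path segment of the internal link is an unpadded URL-safe base64 token whose decoded bytes contain the original URL surrounded by a few non-printable bytes; that payload layout and the name `decode_embedded_url` are assumptions made for illustration:

```python
import base64


def decode_embedded_url(token: str) -> str:
    """Extract the first http(s) URL from a URL-safe base64 token."""
    # Restore the padding that is commonly stripped from URL-safe tokens.
    padded = token + "=" * (-len(token) % 4)
    raw = base64.urlsafe_b64decode(padded)

    start = raw.find(b"http")
    if start < 0:
        raise ValueError("no URL found in decoded payload")

    # Take bytes up to the first one that is not printable ASCII.
    end = start
    while end < len(raw) and 0x21 <= raw[end] <= 0x7E:
        end += 1
    return raw[start:end].decode("ascii")


# Build a made-up token to exercise the function (the framing bytes are invented).
url = "https://abcnews.go.com/Politics/story?id=93354279"
payload = b"\x08\x13\x22" + url.encode("ascii") + b"\xd2\x01\x00"
token = base64.urlsafe_b64encode(payload).decode("ascii").rstrip("=")
decoded = decode_embedded_url(token)
```

Cutting at the first non-printable byte is a simplification; a real payload may need a more specific delimiter.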
@return42
Member

return42 commented Apr 1, 2023

@AlyoshaVasilieva sorry for the late response on your issue report .. now I have implemented #2306 that fixes the issue you reported / would you like to test #2306 / thanks 👍

@AlyoshaVasilieva
Author

The PR fixed every news source I can find (thanks!), except one: Des Moines Register.

  1. Search Des Moines Register in news section
  2. href URLs look like: http:///�https://www.desmoinesregister.com/story/weather/2023/04/01/tornado-sightings-damage-reports-from-severe-storms-in-iowa-friday/70071149007/

The URL after � (U+0001?) seems valid.
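
One possible way to recover the valid part of such a result, sketched under the assumption that the real URL is the last `http://` or `https://` occurrence in the string. The function name `recover_embedded_url` is hypothetical, and the heuristic would misfire on URLs that legitimately carry another URL in their query string:

```python
from urllib.parse import urlsplit


def recover_embedded_url(url: str) -> str:
    """Return url unchanged if it has a host, else the last embedded http(s) URL."""
    if urlsplit(url).hostname:
        return url  # already looks usable

    # Find where the last scheme prefix starts and keep everything from there.
    start = max(url.rfind("http://"), url.rfind("https://"))
    if start > 0:
        candidate = url[start:]
        if urlsplit(candidate).hostname:
            return candidate
    raise ValueError("no usable URL found")


# U+0001 is used here because the report guesses that is the stray character.
bad = "http:///\x01https://www.desmoinesregister.com/story/weather/2023/04/01/tornado-sightings-damage-reports-from-severe-storms-in-iowa-friday/70071149007/"
recovered = recover_embedded_url(bad)
```

The check on `hostname` is what flags the reported form: `http:///...` parses with an empty host, so it is rejected before any link is shown.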

return42 added a commit to return42/searxng that referenced this issue Apr 2, 2023
Follow up of 8de8070 to fix the issue reported by AlyoshaVasilieva [1].

[1] searxng#1959 (comment)

Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
@return42
Member

return42 commented Apr 2, 2023

@AlyoshaVasilieva thanks for your elaborate tests 👍 ... helps me to make the URL decoding more robust .. I fixed the issue:

If you see other issues, please let me know / thanks.
