This repository has been archived by the owner on Sep 7, 2023. It is now read-only.

Google Captcha #729

Open
sut12 opened this issue Oct 12, 2016 · 138 comments

Comments

@sut12

sut12 commented Oct 12, 2016

Hey,

I have the problem that Google gives me captchas, so I can't see Google results. My server is not blacklisted anywhere, so I'm wondering what I can do to get rid of the captchas. My instance is not hosted under HTTPS. Could that be a reason for the Google captchas?

Thanks!

@dalf
Contributor

dalf commented Oct 12, 2016

If you can, use a browser from the same IP as your server: Google will throw some captchas at first, and will usually stop after a few requests.
One way to do it: install Firefox on the server, then use SSH with X forwarding, or VNC.

I'm not sure if you need to check whether your IP is blacklisted (one way to check: http://www.dnsbl.info/dnsbl-database-check.php though I'm not sure either that this is the right service to use).
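As background on the DNSBL check suggested above: blocklists are queried over DNS by reversing the IP's octets and appending the blocklist zone. A minimal sketch; the zone name used here is just a common example, not an endorsement of any particular list:

```python
# Build the DNS name used for a DNSBL lookup: the IPv4 octets are
# reversed and prepended to the blocklist zone. A real check would
# then resolve this name; an answer in 127.0.0.0/8 means "listed".
def dnsbl_query_name(ip: str, zone: str = "zen.spamhaus.org") -> str:
    octets = ip.split(".")
    if len(octets) != 4:
        raise ValueError("expected a dotted-quad IPv4 address")
    return ".".join(reversed(octets)) + "." + zone

print(dnsbl_query_name("203.0.113.7"))  # 7.113.0.203.zen.spamhaus.org
```

Services like the one linked above simply run this lookup against many zones at once.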

@sut12
Author

sut12 commented Oct 13, 2016

Hello,
I used elinks, which can save images, so I saved the captcha and downloaded it afterwards. For now it is working, but I already had to enter the captcha more than once.

I am not listed on any blacklist except spamcannibal:
generic/anonymous/un-named IP

I need to check how I can get rid of this listing, but I'm not sure whether it has anything to do with the Google problem.

@sut12
Author

sut12 commented Oct 13, 2016

It is so weird. Yesterday evening it worked after I entered the captchas. This morning it didn't work.

Now it works, but I haven't entered any captchas since yesterday...

@cy8aer
Contributor

cy8aer commented Oct 24, 2016

Google is dead for me (as startpage).

@sut12
Author

sut12 commented Oct 24, 2016

I need Google, but it's working sometimes and sometimes not. I even tried to analyze it with Google Webmaster Tools...

Right now it's working; tomorrow maybe not.

@cy8aer
Contributor

cy8aer commented Oct 24, 2016

It seems that Google sets a GOOGLE_ABUSE_EXEMPTION cookie after the captcha is solved, and this cookie (sometimes) has an expiry date. Some proxying mechanism for the captcha, to re-set the cookie, would be nice.
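For context on the expiry behaviour described here: an exemption cookie with a timeout behaves like any ordinary expiring HTTP cookie. A small standard-library sketch; the cookie attributes and the one-hour lifetime are assumptions, not values captured from Google:

```python
import http.cookiejar
import time

# Model an exemption cookie with a one-hour lifetime (the actual
# lifetime Google uses is unknown; this value is made up).
now = int(time.time())
cookie = http.cookiejar.Cookie(
    version=0, name="GOOGLE_ABUSE_EXEMPTION", value="ID=example",
    port=None, port_specified=False,
    domain=".google.com", domain_specified=True, domain_initial_dot=True,
    path="/", path_specified=True,
    secure=True, expires=now + 3600,
    discard=False, comment=None, comment_url=None, rest={},
)

jar = http.cookiejar.CookieJar()
jar.set_cookie(cookie)

# Once the expiry passes, the jar treats the cookie as dead and a
# backend reusing it would have to re-solve the captcha.
print(cookie.is_expired(now))         # False
print(cookie.is_expired(now + 7200))  # True
```

This is why solving the captcha once in a browser only helps temporarily: the exemption dies with the cookie.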

@cy8aer
Contributor

cy8aer commented Nov 2, 2016

What about forwarding the captcha to the frontend and saving the cookie in the backend, then?

@sut12
Author

sut12 commented Nov 3, 2016

I think that would be the best solution.

@ghost

ghost commented Nov 3, 2016

I wonder how searx.me (and other searx admins) handle the Google captcha.
Surely they send 1K+ requests/day to Google without getting "scroogled".

@gszathmari

I am having the same issue. It doesn't help if I solve the CAPTCHAs from a browser, as the GOOGLE_ABUSE_EXEMPTION cookie would have to be sent with my searx queries.

@kvch
Member

kvch commented Mar 26, 2017

@gszathmari What do you mean? What is GOOGLE_ABUSE_EXEMPTION?

@gszathmari

gszathmari commented Mar 26, 2017 via email

@sut12
Author

sut12 commented Mar 26, 2017

That's exactly the point. cy8aer also pointed in the right direction:

> what about forwarding the captcha to the frontend and save the cookie in the backend then?

I also tried to register my domain with Google Webmaster Tools, so that Google doesn't ask for a captcha anymore. It didn't help.

@cy8aer
Contributor

cy8aer commented Mar 26, 2017

Yeah, but I did not know the name of the cookie, GOOGLE_ABUSE_EXEMPTION. Thanks to gszathmari.

@prolibre

prolibre commented Dec 8, 2017

Engines cannot retrieve results:
google (unexpected crash: CAPTCHA required)

Every day, or almost, Google blocks me... it's getting unmanageable for me.

Anthony

@zwnk

zwnk commented Jan 2, 2018

I recently updated to 0.13.1 and get the same error. I went back to 0.12 and get Google results.

@sut12
Author

sut12 commented Jan 2, 2018

Hmmm. I am on 0.12.0 and don't get Google results. Maybe I should try to update xD.
Oh wait, right now I'm getting results. But I am sure tomorrow it won't work again.

@kvch
Member

kvch commented Jan 2, 2018

@zwnk have you tried the latest master? There have been a few changes since the last release which fixed known Google issues.

@zwnk

zwnk commented Jan 2, 2018

I pulled today and had the Google captcha error.

@Pofilo
Collaborator

Pofilo commented Jan 3, 2018

This is not the first person asking about the problem with Google.
As it is solved by a commit since the last release, maybe we can do a new release?

@asciimoo
Member

asciimoo commented Jan 3, 2018

@Pofilo we can release a new version (0.13.2), but as far as I can see, these fixes don't solve the problem permanently for everybody.

@sut12
Author

sut12 commented Jan 5, 2018

I was on 0.12.0 and it worked for the last 3 days. Then this morning it didn't work again: no results from Google. So I updated to 0.13.1, and now it works again. I changed nothing else. Let's see how long it lasts.

@Dominion0815

same problem here with 0.13.1:

ERROR:searx.search:engine google : exception : CAPTCHA required
Traceback (most recent call last):
  File "/usr/local/searx/searx/search.py", line 104, in search_one_request_safe
    search_results = search_one_request(engine, query, request_params)
  File "/usr/local/searx/searx/search.py", line 87, in search_one_request
    return engine.response(response)
  File "/usr/local/searx/searx/engines/google.py", line 217, in response
    raise RuntimeWarning(gettext('CAPTCHA required'))
RuntimeWarning: CAPTCHA required
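The traceback above ends in the engine's response parser. A simplified sketch of the kind of check involved, assuming Google's behaviour of redirecting blocked clients to a /sorry URL (the real logic lives in searx/engines/google.py and is more involved):

```python
from urllib.parse import urlparse

# Simplified version of the check behind "CAPTCHA required": Google
# redirects blocked clients to https://www.google.com/sorry/..., so
# inspecting the path of the final response URL detects the block.
def raise_on_captcha(final_url: str) -> None:
    if urlparse(final_url).path.startswith("/sorry"):
        raise RuntimeWarning("CAPTCHA required")

raise_on_captcha("https://www.google.com/search?q=searx")  # no exception
try:
    raise_on_captcha("https://www.google.com/sorry/index?continue=x")
except RuntimeWarning as exc:
    print(exc)  # CAPTCHA required
```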

@sachaz

sachaz commented Feb 13, 2018

Hi,

I just updated https://searx.aquilenet.fr to 0.13.1 and I still get the same issue as Dominion0815.

@sachaz

sachaz commented Feb 18, 2018

Issue corrected: my instance was being abused by bots. Fixed with filtron:
https://asciimoo.github.io/searx/admin/filtron.html
https://github.com/asciimoo/filtron
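For anyone following this fix: filtron matches incoming requests against JSON rules with a time window and a limit. A sketch of one rate-limiting rule, with the schema assumed from the filtron README and every value purely illustrative:

```python
import json

# Sketch of a single filtron rate-limit rule. The schema (name,
# filters, interval, limit, actions) is assumed from the filtron
# README; the values below are illustrative, not a recommendation.
rule = {
    "name": "search request",
    "filters": ["Param:q", "Path=^(/|/search)$"],
    "interval": 30,   # window length, in seconds
    "limit": 60,      # max matching requests per window
    "actions": [
        {"name": "block",
         "params": {"message": "Rate limit exceeded, try again later."}},
    ],
}

# filtron expects a JSON array of such rules in its rules.json
print(json.dumps([rule], indent=2))
```

Tightening rules like this keeps scrapers from burning through the instance's request budget at Google.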

@Pofilo
Collaborator

Pofilo commented Feb 18, 2018

Interesting, I will install filtron and give feedback!

@steckerhalter

I'm using filtron but still get the captcha. Any tips maybe on configuring filtron or something?

@return42
Contributor

return42 commented Jan 7, 2021

@unixfox: well explained! I think your response is the definitive answer to this issue. Can we now close it?

@unixfox
Member

unixfox commented Jan 7, 2021

> @unixfox: well explained! I think your response is the definitive answer to this issue. Can we now close it?

No, we won't close this issue, because I have a feeling that closing it will create duplicates: people won't find this issue and will think of creating a new one.

Technically, we still have some work on our side to combat this issue; an example is #2439.

@searx searx deleted a comment from ltguillaume Jan 7, 2021
@mj162

mj162 commented Jan 11, 2021

I agree with the logic in keeping this ticket open. It's how I found it, instead of otherwise creating a new ticket.

I have switched to unixfox's instance as first preference (though oddly, language has to be set to English UK rather than US for searches to find anything; I've seen that on other instances too).

@deepend-tildeclub

Not speaking from experience, because I have none, but is there a way to proxy the Google captcha and show it to the client to solve? Or is that impossible?

I don't think making searx look more like a real browser would solve anything, because it's the number of requests from one IP that causes Google to ask for a captcha. But having more IPs, and having searx spread the requests over multiple IPs, as well as blocking bot requests, could lessen the issue a bit.
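The multiple-IP idea above amounts to round-robin proxy rotation. A minimal sketch; the proxy URLs are placeholders:

```python
from itertools import cycle

# Rotate outgoing requests across several egress proxies so that no
# single IP absorbs the whole query volume. URLs are placeholders.
proxies = cycle([
    "socks5://10.0.0.1:1080",
    "socks5://10.0.0.2:1080",
    "socks5://10.0.0.3:1080",
])

def next_proxy() -> str:
    """Return the proxy to use for the next outgoing request."""
    return next(proxies)

# Four consecutive requests: the fourth wraps back to the first proxy.
print([next_proxy() for _ in range(4)])
```

Each proxy then only sees a fraction of the request rate, which is exactly what the comment above suggests.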

@unixfox
Member

unixfox commented Jan 11, 2021

> Not speaking from experience, because I have none, but is there a way to proxy the Google captcha and show it to the client to solve? Or is that impossible?

This has been brought up many times, and no, it's impossible, because a reCAPTCHA can only be loaded from the domain it was issued for, in this case www.google.com.

> I have switched to unixfox's instance as first preference (though oddly, language has to be set to English UK rather than US for searches to find anything; I've seen that on other instances too).

Please open an issue if you can replicate the bug on other instances, and even in a local searx.

@unixfox unixfox removed this from Existing engines to fix in Engines Jan 11, 2021
@mj162

mj162 commented Jan 11, 2021

I've used this text proxy for years. The gentleman moved to a different hosting provider, and since then Google has balked with its graphical captcha request... Agreed, too many requests from one IP is a reasonable assumption, but how many is too many, and how far back in time do they go: the last 5 minutes or the last 5 years?

[will do]

@paradoxsupreme

paradoxsupreme commented Feb 16, 2021

I am trying to self-host a searx instance on a VPS with nginx. I have used these filtron rules.json, and I have also set up morty. The instance is working correctly, but when I try to use the Google engine it gives me google (too many requests) on the first request, and then Sorry! we didn't find any results.. Is there anything I can do to fix the Google engine?

@unixfox
Member

unixfox commented Feb 16, 2021

> I am trying to self-host a searx instance on a VPS with nginx. I have used these filtron rules.json, and I have also set up morty. The instance is working correctly, but when I try to use the Google engine it gives me google (too many requests) on the first request, and then Sorry! we didn't find any results.. Is there anything I can do to fix the Google engine?

You could use some proxies, but unfortunately, apart from that, searx can't do much about this issue.

@paradoxsupreme

> I am trying to self-host a searx instance on a VPS with nginx. I have used these filtron rules.json, and I have also set up morty. The instance is working correctly, but when I try to use the Google engine it gives me google (too many requests) on the first request, and then Sorry! we didn't find any results.. Is there anything I can do to fix the Google engine?

> You could use some proxies, but unfortunately, apart from that, searx can't do much about this issue.

Could you give me some resources on how to configure these proxies?

@unixfox
Member

unixfox commented Feb 16, 2021

> I am trying to self-host a searx instance on a VPS with nginx. I have used these filtron rules.json, and I have also set up morty. The instance is working correctly, but when I try to use the Google engine it gives me google (too many requests) on the first request, and then Sorry! we didn't find any results.. Is there anything I can do to fix the Google engine?

> You could use some proxies, but unfortunately, apart from that, searx can't do much about this issue.

> Could you give me some resources on how to configure these proxies?

Add them to the settings.yml here: https://github.com/searx/searx/blob/master/searx/settings.yml#L78
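That settings section takes a requests-style mapping of URL scheme to proxy URL. A sketch of the structure searx would end up with after loading such a settings.yml; the SOCKS address is a placeholder, not a working proxy:

```python
# The shape searx's outgoing proxy configuration takes once
# settings.yml is loaded: a requests-style mapping of URL scheme to
# proxy URL. The address below is a placeholder.
outgoing = {
    "request_timeout": 2.0,
    "proxies": {
        "http": "socks5h://127.0.0.1:9050",
        "https": "socks5h://127.0.0.1:9050",
    },
}

# The per-scheme mapping is what would be handed to the HTTP client,
# e.g. requests.get(url, proxies=outgoing["proxies"]).
print(sorted(outgoing["proxies"]))  # ['http', 'https']
```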


@rob4226
Contributor

rob4226 commented Apr 23, 2021

Hi, I've been using searx for a long time and love it. I just updated to v1.0. I don't think this is unique to v1.0, but I keep getting this error for Google that many others get:

Engines cannot retrieve results: google (too many requests)

I understand that means too many requests from one IP, but what I don't understand is that both my self-hosted searx server and my computer are on the same network in my house, and hence use the same external IP address... yet when I go to Google in my browser it searches fine, with no CAPTCHA or too many requests error.

So why would I get a too many requests error from Google when using searx, but not my browser, when coming from the exact same IP? It just doesn't make sense to me.

Thank you for everyone's work on searx!

@unixfox
Member

unixfox commented Apr 23, 2021

> Hi, I've been using searx for a long time and love it. I just updated to v1.0. I don't think this is unique to v1.0, but I keep getting this error for Google that many others get:
>
> Engines cannot retrieve results: google (too many requests)
>
> I understand that means too many requests from one IP, but what I don't understand is that both my self-hosted searx server and my computer are on the same network in my house, and hence use the same external IP address... yet when I go to Google in my browser it searches fine, with no CAPTCHA or too many requests error.
>
> So why would I get a too many requests error from Google when using searx, but not my browser, when coming from the exact same IP? It just doesn't make sense to me.
>
> Thank you for everyone's work on searx!

Please see #729 (comment) and #729 (comment).

@RennisDitchie

@unixfox
Hey there, thanks for the post above, which explains that the overarching issue doesn't have anything to do with searx directly, but with Google treating searx as a bot (go figure, a bot-driven search engine doesn't like other bots...).

In this case, how would you go about DISABLING Google in a searx instance, and just using DuckDuckGo instead?

I ask because I am running into the exact same "Engines cannot retrieve results: google (too many requests)" issue as everyone else, since I host it on a Debian 10 VPS droplet on DigitalOcean.

I was willing to bet that maybe I made too many search requests, but it will often give this error on literally the first search attempt if I include too many words in the query itself.

Anyway, I would appreciate it if you could link me to the specific section of the guide on how to do this, or to another issue, as I just want to nuke Google from my searx config and get rid of the issue entirely, as I can't even use the stupid search engine as a result of the error.

Thanks again.

@BBaoVanC
Contributor

BBaoVanC commented Apr 24, 2021 via email

@Pofilo
Collaborator

Pofilo commented Apr 24, 2021

@RennisDitchie

> I can't even use the stupid search engine

Well, you could be more respectful towards something you can use and modify freely, right?

However, this Google issue is something we can't really resolve.
Another metasearch engine, Whoogle, is designed only for Google results and suffers from the same problems (I tried it last week).

@RennisDitchie

@Pofilo

I am thankful, but when I first deployed searx a year ago, just months after Luke Smith's video on the same topic, it just didn't work, period, and the installation guide was a mess.

Yes, the deployment process has improved, and yes, I can solve my issues by "using the docker image".

Don't confuse my frustration with the fact that I think your project is awesome.

I'm actually more mad at Google itself for blocking it in the first place. Bear with me on that one, since I'm just debating how to fix the issue itself. I thought it was my newbness in not following the instructions to a T, but I am kind of glad it's really just an issue that will most likely never be truly solved, since Google will always try to stop users from accessing their website indirectly.

Anyway, just wanted to say thanks on that regard.

The only other issue I'm running into is that on basic queries like "text outline css", I get this error as well:
Sorry! we didn't find any results. Please use another query or search in more categories.

I'm assuming I have to check the error log to see why this is occurring?

@RennisDitchie

> Go to your settings.yml and set disabled to True under the google engine. Ex:
>
> ```yml
> - name : google
>   engine : google
>   shortcut : go
>   disabled : True
> ```
>
> You can also completely delete it (so it doesn't show up at all) by just removing this code block from your settings.yml.

Thanks a ton for this!

Will write this down in my org doc notes, thanks!

@RennisDitchie

RennisDitchie commented Apr 24, 2021

After disabling Google, I'm now running into the "Sorry! we didn't find any results. Please use another query or search in more categories." error more often.

However, I will note this within the relevant GitHub issue instead:
#2053

Thank you again to @BBaoVanC, @Pofilo, and @unixfox for providing great insight within this GitHub issue.

I know I might have seemed a bit harsh in my responses before, and trust me, I'm not big-brained enough to create an entire Python-based project like searx on my own, so I just wanted to say thanks for the help on this one.
