Google Captcha #729
Comments
If you can, use a browser from the same IP as your server: Google will throw some CAPTCHAs at first, and will usually stop after a few requests. I'm not sure whether you need to check if your IP is blacklisted (how to check: http://www.dnsbl.info/dnsbl-database-check.php; I'm not sure this is the right service to use, either). |
Hello, I am not listed on any blacklist except on spamcannibal: I need to check how I can get rid of this message. But I'm not sure whether it has anything to do with the Google problem. |
It is so weird. Yesterday evening it worked after I entered the CAPTCHAs. This morning it didn't work. Now it works, but I haven't entered any CAPTCHAs since yesterday... |
Google is dead for me (as startpage). |
I need Google, but it works sometimes and sometimes not. I even tried to analyze it with Google Webmaster Tools... Right now it's working; tomorrow maybe not. |
It seems that Google sets a GOOGLE_ABUSE_EXEMPTION cookie after the CAPTCHA is solved, and this cookie (sometimes) has an expiry date. Some proxying mechanism for the CAPTCHA that re-sets the cookie would be nice. |
What about forwarding the CAPTCHA to the frontend and saving the cookie in the backend, then? |
I think that would be the best solution. |
I wonder how searx.me (and other searx admins) handle the Google CAPTCHA. |
I am having the same issue. It doesn't help if I solve the CAPTCHAs from a browser as the |
@gszathmari What do you mean? What is |
This is the cookie that Google sets if you solve the CAPTCHA.
So even though I solve the CAPTCHA from the same IP address as my searx instance, searx still fails to retrieve the results from Google. This is because searx is not sending the `GOOGLE_ABUSE_EXEMPTION` cookie with its queries, so it is still presented with the CAPTCHA. My browser, however, is not asked to solve the CAPTCHA again, because it sends the `GOOGLE_ABUSE_EXEMPTION` cookie with its subsequent queries.
Hope this clears it up a bit.
|
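The difference described above can be sketched in a few lines of Python. This is only an illustration, not how searx builds its requests: the cookie value is a placeholder assumed to have been copied by hand from a browser session, and the helper name `build_search_request` is hypothetical. The requests are built but never sent.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Hypothetical placeholder; a real value would come from a browser
# session in which the CAPTCHA was solved.
EXEMPTION_COOKIE = "GOOGLE_ABUSE_EXEMPTION=PLACEHOLDER"


def build_search_request(query, cookie=None):
    """Build (but do not send) a search request, optionally carrying a cookie."""
    url = "https://www.google.com/search?" + urlencode({"q": query})
    headers = {"Cookie": cookie} if cookie else {}
    return Request(url, headers=headers)


# Roughly what the browser sends after the CAPTCHA was solved.
with_cookie = build_search_request("searx", EXEMPTION_COOKIE)
# What a client that never stored the cookie sends: same IP, no exemption.
without_cookie = build_search_request("searx")

print(with_cookie.get_header("Cookie"))
print(without_cookie.get_header("Cookie"))
```

Both requests would leave from the same IP address; the only difference is the `Cookie` header, which matches the behaviour reported in this thread.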
That's exactly the point. cy8aer also pointed in the right direction:
I also tried to administer my domain with Google Webmaster Tools so that Google doesn't ask for a CAPTCHA anymore. It didn't help. |
Yeah, but I did not know the name of the cookie, 'GOOGLE_ABUSE_EXEMPTION'. Thanks to gszathmari. |
Engines cannot retrieve results: every day, or almost every day, Google blocks me... it's getting unmanageable for me. Anthony |
I recently updated to 0.13.1 and get the same error. I went back to 0.12 and get Google results. |
Hmm. I am on 0.12.0 and don't get Google results. Maybe I should try to update xD. |
@zwnk Have you tried the latest master? There were a few changes since the last release which fixed known Google issues. |
I pulled today and had the Google CAPTCHA error. |
This is not the first person asking about the problem with Google. |
@Pofilo We can release a new version (0.13.2), but as I see it, these fixes don't solve the problem permanently for everybody. |
I was on 12.0 and it just worked for 3 days. This morning it didn't work again: no results from Google. So I updated to 13.1. Now it works again. I changed nothing else. Let's see how long it works. |
same problem here with 0.13.1:
|
Hi, I just updated https://searx.aquilenet.fr to 13.1 and I still got the same issue as Dominion0815 |
Issue corrected: my instance was being bugged by bots. Corrected with filtron: |
Interesting, I will install filtron and give feedback! |
I'm using filtron but still get the captcha. Any tips maybe on configuring filtron or something? |
@unixfox: well explained! I think your response is the ultimate answer to this issue. Can we now close this issue with it? |
No, we won't close this issue, because I have a feeling that closing it will lead to duplicates: people won't find this issue and will therefore create a new one. Technically we still have some work to do on our side to combat this issue; an example is #2439. |
I agree with the logic of keeping this ticket open. It's how I found it; otherwise I would have created a new ticket. I have switched to Unixfox's instance as first preference (though oddly, language has to be set to English UK rather than US for search results to find anything -- seen that on other instances too). |
Not speaking from experience, because I have none, but is there a way to proxy the Google CAPTCHA and show it to the client to respond to? Or is that impossible? I don't think making Searx look more like a real browser would solve anything, because it's the number of requests from one IP that causes Google to ask for a CAPTCHA. But maybe having more IPs and having Searx spread the requests over multiple IPs, as well as blocking bot requests, could lessen the issue a bit. |
This has been brought up many times, and no, it's impossible, because a reCAPTCHA can only be loaded from the same domain, in this case www.google.com.
Please open an issue if you can replicate the bug on other instances and even in a local Searx. |
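The idea raised above of spreading requests over several IPs can be sketched as a simple round-robin over outgoing proxies. This is a minimal illustration under assumptions, not searx's own implementation: the proxy URLs are placeholders, and `ProxyRotator` is a hypothetical helper name.

```python
from itertools import cycle

# Hypothetical proxy endpoints, each assumed to exit through a different IP.
PROXIES = [
    "http://proxy1.example:8080",
    "http://proxy2.example:8080",
    "http://proxy3.example:8080",
]


class ProxyRotator:
    """Hand out proxies in round-robin order so no single IP carries every request."""

    def __init__(self, proxies):
        if not proxies:
            raise ValueError("at least one proxy is required")
        self._pool = cycle(proxies)

    def next_proxy(self):
        return next(self._pool)


rotator = ProxyRotator(PROXIES)
# Six outgoing requests are spread evenly: each proxy is picked twice.
assignments = [rotator.next_proxy() for _ in range(6)]
print(assignments)
```

Round-robin only lowers the request count per IP; it does not address bot traffic hitting the instance, which is why the thread also mentions filtering.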
I've used this: text proxy for years. The gentleman moved to a different hosting provider and since then Google's baulked with its graphical Captcha request... Agreed, too many requests from one IP is a reasonable assumption but how many is too many and how far back in time are they going; the last 5 minutes or the last 5 years? [will do] |
I am trying to self host a searx instance on a VPS with nginx. I have used these filtron rules.json. I have also set up morty. The instance is working correctly but when I try to use the google search engine it gives me |
You could use some proxies, but unfortunately, apart from that, Searx can't do much about this issue. |
Could you give me some resources on how I can configure these proxies? |
Add them to settings.yml here: https://github.com/searx/searx/blob/master/searx/settings.yml#L78 |
Hi, I've been using searx for a long time and love it. I just updated to v1.0. I don't think this is unique to v1.0, but I keep getting the same error for Google that many others get: "Engines cannot retrieve results: google (too many requests)". I understand that this means too many requests from one IP. What I don't understand is that my self-hosted searx server and my computer are on the same network in my house and hence use the same external IP address, yet when I go to Google in my browser it searches fine, with no CAPTCHA and no "too many requests" error. So why would I get a "too many requests" error from Google when using searx, but not from my browser, when both come from the exact same IP? It just doesn't make sense to me. Thank you for everyone's work on searx! |
Please see #729 (comment) and #729 (comment). |
@unixfox In this case, how would you go about DISABLING Google from a Searx instance, and just use DuckDuckGo instead? I ask because I am running into the same exact "Engines cannot retrieve results: google (too many requests)" issue as everyone else since I host it on a Debian 10 VPS droplet on Digital Ocean. I was willing to bet that maybe I did too many search requests, but it'll often give this error on literally the first search attempt if I include too many words in the query itself. Anyway, would appreciate if you could link me to a specific section of the guide on how to do this, or another issue as I just want to nuke Google from my Searx config to just get rid of the issue entirely as I can't even use the stupid search engine as a result of the error. Thanks again. |
On 04/23/21 10:19PM, RennisDitchie wrote:
@unixfox
Hey there, thanks for the post above which explains that the overarching issue isn't anything to deal with Searx directly, but in that Google treats Searx as a bot (go figure, botnet search engine doesn't like other bots...)
Go to your settings.yml and set `disabled` to `True` under the `google`
engine.
Ex.
```yml
  - name : google
    engine : google
    shortcut : go
    disabled : True
```
You can also completely delete it (so it doesn't show up at all) by just removing this code block from your settings.yml.
|
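The effect of the `disabled` flag can be sketched in Python on an in-memory stand-in for the parsed `engines` list. This is an illustration only: the `duckduckgo` entry and its fields are assumptions, `disable_engine` is a hypothetical helper, and searx itself reads the flag from settings.yml rather than through code like this.

```python
# A minimal stand-in for the parsed `engines` list from settings.yml.
# The duckduckgo entry is an assumed example, not copied from a real config.
engines = [
    {"name": "google", "engine": "google", "shortcut": "go"},
    {"name": "duckduckgo", "engine": "duckduckgo", "shortcut": "ddg"},
]


def disable_engine(engines, name):
    """Return a copy of the engine list with `disabled: True` set on the named engine."""
    return [
        {**entry, "disabled": True} if entry["name"] == name else dict(entry)
        for entry in engines
    ]


updated = disable_engine(engines, "google")
# Engines without the flag count as enabled in this sketch.
active = [e["name"] for e in updated if not e.get("disabled", False)]
print(active)
```

Setting the flag keeps the entry in the file so it can be re-enabled later, whereas deleting the block removes the engine entirely.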
Well, you could be more respectful toward something you can use and modify freely, right? However, this Google issue is something we can't really resolve. |
I am thankful, but when I first deployed Searx a year ago, just months after Luke Smith's video on the same topic, it just didn't work, period, and the installation guide was a mess. Yes, the deployment process has improved, and yes, I can solve my issues by "using the docker image". Don't mistake my frustration for anything else: I think your project is awesome. I'm actually more mad at Google itself for blocking it in the first place. Bear with me on that one, since I'm just debating how to fix the issue itself. I thought it was my inexperience in not following the instructions to a T, but I am kind of glad it's really just an issue that will most likely never be truly solved, since Google will always try to keep users from bypassing direct access to their website. Anyway, I just wanted to say thanks in that regard. The only other issue I'm running into is that on basic queries like "text outline css", I get this error as well: I'm assuming I have to check the error log to see why this is occurring? |
Thanks a ton for this! Will write this down in my org doc notes, thanks! |
After disabling Google, I'm now running into the "Sorry! we didn't find any results. Please use another query or search in more categories." error more often. However, I will note this within the relevant GitHub issue instead: Thank you again to @BBaoVanC, @Pofilo and @unixfox for providing great insight on this issue. I know I might have seemed a bit harsh in my responses before, and trust me, I'm just not big brained enough to create an entire Python-based project like Searx on my own, so I just wanted to say thanks for the help on this one. |
Hey,
I have the problem that Google gives me CAPTCHAs, so I can't see Google results. My server is not blacklisted anywhere, so I'm wondering what I can do to get rid of the CAPTCHAs. My instance is not hosted under HTTPS. Could that be a reason for the Google CAPTCHAs?
Thanks!