It's possible to get Google results even when blocked by Google reCAPTCHA. #159

Closed
unixfox opened this issue Jun 20, 2021 · 5 comments

Comments

@unixfox
Member

unixfox commented Jun 20, 2021

On the mobile UI of Google Search, the "More results" button is not affected by Google's rate limiting: I can still make requests through it while I'm actively blocked on the regular Google search. You can even fetch the results of the first page by modifying the request.

The results come back in an unusual raw format, but you can extract the HTML from it. That HTML appears to be the same as what Google search returns when you spoof a desktop browser user agent.

RAW response data: https://gist.github.com/unixfox/0cb7eebd3add42bfcbf42ea29a063b89#file-raw-txt

HTML manually parsed from the RAW response data: https://gist.github.com/unixfox/0cb7eebd3add42bfcbf42ea29a063b89#file-beautifier-html

Here is how to make the requests:

  • URL: https://www.google.com/search?vet=12ahUKEwjE4O6xoajxAhWL_KQKHVCLBKoQxK8CegQIAhAG..i&ved=2ahUKEwjE4O6xoajxAhWL_KQKHVCLBKoQqq4CegQIAhAI&yv=3&q=test&prmd=vmin&ei=c0fQYITbBIv5kwXQlpLQCg&start=0&sa=N&asearch=arc&async=arc_id:srp_510,ffilt:all,ve_name:MoreResultsContainer,next_id:srp_5,use_ac:true,_id:arc-srp_510,_pms:qs,_fmt:pc
  • Headers:
    • user-agent: Mozilla/5.0 (X11; Linux x86_64; rv:91.0) Gecko/20100101 Firefox/91.0
    • accept: */*
    • sec-fetch-site: same-origin
    • sec-fetch-mode: cors
    • sec-fetch-dest: empty
    • referer: https://www.google.com/
    • accept-encoding: gzip, deflate
    • accept-language: en-US,en;q=0.9

The query parameters are very similar to those of the regular Google search. Not all of them are required and some can be omitted; the same is true of the headers.
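As a rough illustration, the request above can be assembled in Python. This is only a sketch: it builds the URL and headers but sends nothing, the helper name `build_arc_url` is hypothetical, and the reduced parameter set (`q`, `start`, `asearch`, `async`) is an assumption based on the patch further down, which adds only `asearch` and `async` to the parameters Searx already sends.

```python
from urllib.parse import urlencode

BASE_URL = "https://www.google.com/search"

# "async" value copied verbatim from the captured request.
ARC_ASYNC = (
    "arc_id:srp_510,ffilt:all,ve_name:MoreResultsContainer,next_id:srp_5,"
    "use_ac:true,_id:arc-srp_510,_pms:qs,_fmt:pc"
)

# Headers copied from the captured request (desktop Firefox user agent).
HEADERS = {
    "user-agent": "Mozilla/5.0 (X11; Linux x86_64; rv:91.0) Gecko/20100101 Firefox/91.0",
    "accept": "*/*",
    "sec-fetch-site": "same-origin",
    "sec-fetch-mode": "cors",
    "sec-fetch-dest": "empty",
    "referer": "https://www.google.com/",
    "accept-encoding": "gzip, deflate",
    "accept-language": "en-US,en;q=0.9",
}


def build_arc_url(query: str, start: int = 0) -> str:
    """Build a "More results" (arc) search URL.

    Hypothetical helper. Only q, start, asearch and async are sent; the
    tokens from the captured URL (vet, ved, ei) and yv, prmd, sa are
    assumed to be optional.
    """
    params = {
        "q": query,
        "start": start,  # 0 fetches the first page of results
        "asearch": "arc",
        "async": ARC_ASYNC,
    }
    # urlencode percent-encodes ":" and "," inside the async value.
    return BASE_URL + "?" + urlencode(params)


if __name__ == "__main__":
    # Prints the URL and headers only; nothing is sent over the network.
    print(build_arc_url("test"))
    for name, value in HEADERS.items():
        print(f"{name}: {value}")
```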

@return42 Any idea how we could implement that in Searx?

@unixfox
Member Author

unixfox commented Jun 21, 2021

@return42 I just updated my comment with a simpler way of extracting the data. Please have a look at it again.

@return42
Member

Thanks!! I will have a look, but give me some time :-)

@unixfox
Copy link
Member Author

unixfox commented Jun 21, 2021

Well, I tried it myself: this just needs a couple of modified lines, and it works! Here is the patch:

diff --git a/searx/engines/google.py b/searx/engines/google.py
index 841212e0..ae8e6ab5 100644
--- a/searx/engines/google.py
+++ b/searx/engines/google.py
@@ -273,6 +273,8 @@ def request(query, params):
         'ie': "utf8",
         'oe': "utf8",
         'start': offset,
+        'asearch': "arc",
+        'async': "arc_id:srp_510,ffilt:all,ve_name:MoreResultsContainer,next_id:srp_5,use_ac:true,_id:arc-srp_510,_pms:qs,_fmt:pc"
     })
 
     if params['time_range'] in time_range_dict:
@@ -282,9 +284,7 @@ def request(query, params):
     params['url'] = query_url
 
     params['headers'].update(lang_info['headers'])
-    params['headers']['Accept'] = (
-        'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8'
-    )
+    params['headers']['Accept'] = ('*/*')
 
     return params

The remaining issue is that Searx can't find the number of results, i.e. this part of the page: [screenshot]
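One way to keep an engine working when the count is missing is to treat it as optional rather than as an error. The sketch below is hypothetical: `parse_number_of_results` is not Searx's actual function, and the English-only pattern assumes Google's usual "About N results" wording.

```python
import re
from typing import Optional

# Assumed English wording, e.g. "About 1,230,000 results (0.42 seconds)".
# Other locales use different words and digit separators.
_RESULT_COUNT_RE = re.compile(r"(\d[\d,]*)\s+results?\b")


def parse_number_of_results(stats_text: Optional[str]) -> Optional[int]:
    """Return the result count, or None when it cannot be found.

    The count appears to be missing from the "More results" (arc)
    response, so callers should read None as "unknown".
    """
    if not stats_text:
        return None
    match = _RESULT_COUNT_RE.search(stats_text)
    if match is None:
        return None
    # Drop thousands separators before converting.
    return int(match.group(1).replace(",", ""))
```

Searching for the number directly in front of "results" means text such as "Page 2 of about 1,230,000 results" still yields the total rather than the page number.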

dalf added a commit that referenced this issue Jun 21, 2021
disable by default, it has to be enabled in settings.yml

related to #159
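The "disable by default" approach from the commit message can be sketched as an opt-in switch around the changes in the patch above. The function names and the `use_arc` flag are hypothetical; the thread does not show what the actual commit calls its option in settings.yml.

```python
from typing import Dict, Union

# Extra parameters added by the patch above.
ARC_PARAMS: Dict[str, str] = {
    "asearch": "arc",
    "async": (
        "arc_id:srp_510,ffilt:all,ve_name:MoreResultsContainer,next_id:srp_5,"
        "use_ac:true,_id:arc-srp_510,_pms:qs,_fmt:pc"
    ),
}

# The browser-style Accept header that the patch replaces with "*/*".
DEFAULT_ACCEPT = "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8"


def build_query_args(query: str, offset: int, use_arc: bool = False) -> Dict[str, Union[str, int]]:
    """Query arguments for a Google request; arc parameters only when opted in.

    `use_arc` is a hypothetical stand-in for a settings.yml option and
    defaults to False, matching "disable by default".
    """
    args: Dict[str, Union[str, int]] = {"q": query, "ie": "utf8", "oe": "utf8", "start": offset}
    if use_arc:
        args.update(ARC_PARAMS)
    return args


def accept_header(use_arc: bool = False) -> str:
    # The patch sends "*/*" instead of the browser-style value.
    return "*/*" if use_arc else DEFAULT_ACCEPT
```

Keeping the switch off by default means an instance sends the regular request unless an administrator explicitly opts in.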
unixfox closed this as completed Jun 21, 2021
dalf referenced this issue in dalf/searxng Jun 22, 2021
disable by default, it has to be enabled in settings.yml

related to #159
MarcAbonce pushed a commit to MarcAbonce/searxng that referenced this issue Sep 23, 2021
disable by default, it has to be enabled in settings.yml

related to searxng#159
@SecureCPU

How can I implement this in whoogle? I just got rate limited.

@unixfox
Member Author

unixfox commented Jun 29, 2022

> How can I implement this in whoogle? I just got rate limited.

You are on the SearXNG repository, not Whoogle's; please open an issue on the correct project.
