[Google Trends API] Data for searches from 2022 is missing #561

Closed
marm123 opened this issue Jan 10, 2023 · 20 comments

@marm123

marm123 commented Jan 10, 2023

It appears that some searches are missing all data for 2022. I noticed that Google notes on its Trends search page that it changed its data collection. I'm not sure if this is relevant, as some searches do return data from 2022 while others do not.

Search with missing 2022 data
image

Information from Google Trends search
image

Playground | Inspect

@marm123 marm123 added the status: queued Ready to work on label Jan 10, 2023
@marm123
Author

marm123 commented Jan 10, 2023

Might be related to #300

@marm123
Author

marm123 commented Jan 11, 2023

It appears that Google changed the requests used by its Trends page, making some Python libraries, like pytrends, unreliable. I'm unsure if this is relevant or related to this issue, but it's worth mentioning. Links to the Intercom thread and the issue reported on the pytrends GitHub repository are below:

Intercom
pytrends GitHub issue

@aliayar

aliayar commented Jan 11, 2023

This post from the pytrends issue suggests that we may have been affected as well:

I opened an issue already for this a few weeks ago. After doing some digging, it seems Google has changed their API and is now creating "holes" in the data for scraped info.

It is also happening on large keyword tools such as Keywords Everywhere

There is now a new user in the headers, one called 'USER_TYPE_LEGIT_USER' and the other 'USER_TYPE_SCRAPER'
The scraper user has the "holes" while the legit user doesn't.
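
One way to check a result for the kind of "hole" described above (an entire year with no data) is sketched below. This is only an illustration: the `missing_years` helper and the sample timeline are hypothetical, and the real responses may represent gaps differently (for example as zeros rather than absent points).

```python
from datetime import date


def missing_years(points, start_year, end_year):
    """Return the years in [start_year, end_year] with no non-null data point.

    `points` is an iterable of (date, value) pairs; a value of None is
    treated as "no data" for that date.
    """
    seen = {d.year for d, value in points if value is not None}
    return [year for year in range(start_year, end_year + 1) if year not in seen]


# Hypothetical timeline: monthly points for 2021 and early 2023, nothing for 2022.
timeline = [(date(2021, month, 1), 50) for month in range(1, 13)]
timeline += [(date(2023, month, 1), 60) for month in range(1, 4)]

print(missing_years(timeline, 2021, 2023))  # [2022]
```

A check like this could be run on each result before trusting it, so that affected searches are flagged instead of silently returned.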

@nicktba

nicktba commented Jan 11, 2023

I'm the poster in the pytrends issue ^

Please let me know if you're able to find a resolution to this.

@aciddjus

aciddjus commented Jan 11, 2023

We are marked with USER_TYPE_SCRAPER:

image
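
For anyone who wants to check which user type a response was served under, here is a minimal sketch. It assumes the response body is JSON that may start with a `)]}'` anti-hijacking prefix, and that `userType` appears somewhere inside the payload. The sample body is hypothetical; only the `userType` and `userConfig` names and the two values come from this thread.

```python
import json

# Assumption: the body may start with this prefix before the JSON document.
XSSI_PREFIX = ")]}'"


def find_user_types(raw_body):
    """Return the set of all string values stored under a "userType" key."""
    text = raw_body.lstrip()
    if text.startswith(XSSI_PREFIX):
        text = text[len(XSSI_PREFIX):]
    payload = json.loads(text)

    found = set()
    stack = [payload]
    while stack:  # iterative walk over nested dicts and lists
        node = stack.pop()
        if isinstance(node, dict):
            for key, value in node.items():
                if key == "userType" and isinstance(value, str):
                    found.add(value)
                else:
                    stack.append(value)
        elif isinstance(node, list):
            stack.extend(node)
    return found


# Hypothetical response body, shaped only for illustration.
sample = ')]}\'\n{"widgets": [{"request": {"userConfig": {"userType": "USER_TYPE_SCRAPER"}}}]}'
print(find_user_types(sample))  # {'USER_TYPE_SCRAPER'}
```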

@nicktba

nicktba commented Jan 11, 2023

HTTP Error 401 Unauthorized indicates that the request lacks valid authentication credentials for the target resource.

You have to get the 'USER_TYPE_LEGIT_USER' token. It's not just a matter of replacing the userConfig.

I'm not sure how to do that without borrowing it from the browser.

@aliayar

aliayar commented Jan 12, 2023

I am curious if Google employs this technique with their other services.

@ilyazub

ilyazub commented Jan 17, 2023

Related question on StackOverflow: https://stackoverflow.com/q/73988220/1291371

Related issue in the g-trends repository: x-fran/g-trends#54

@ilyazub

ilyazub commented Jan 17, 2023

Google Trends in the browser submits a request with a reCAPTCHA token. This may be the reason we get USER_TYPE_SCRAPER instead of USER_TYPE_LEGIT_USER.

image

curl 'https://trends.google.com/trends/api/explore?hl=en-US&tz=-120&req=%7B%22comparisonItem%22:%5B%7B%22keyword%22:%22filter%22,%22geo%22:%22%22,%22time%22:%22now+4-H%22%7D%5D,%22category%22:0,%22property%22:%22%22%7D&tz=-120' \
  -H 'authority: trends.google.com' \
  -H 'accept: application/json, text/plain, */*' \
  -H 'accept-language: en-US,en;q=0.6' \
  -H 'content-type: application/json;charset=UTF-8' \
  -H 'cookie: NID=511=XFlLBiX63uT_Z3MtZZcDi_qaxDIpYgCnUfralfFn4HMFnavFOeuwOejdfB_eQb8-awg_jwYANnr7dGFtK4830aAEsP6Z-cl0YxRY6L1-_Z6V2nw90m4i1VN-FpCQEWtusomgE0WfPOilk7k95hSxJbwnsdMCxVqguJLBIiIMuok' \
  -H 'origin: https://trends.google.com' \
  -H 'referer: https://trends.google.com/trends/explore?date=now%204-H&q=filter' \
  -H 'sec-fetch-dest: empty' \
  -H 'sec-fetch-mode: cors' \
  -H 'sec-fetch-site: same-origin' \
  -H 'sec-gpc: 1' \
  -H 'user-agent: Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/108.0.0.0 Safari/537.36' \
  --data-raw '' \
  --compressed

What if we just hardcode the cookie for USER_TYPE_LEGIT_USER?
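
For reference, the `req` query parameter in the command above is URL-encoded JSON. A rough sketch of building the same kind of URL in Python follows; `build_explore_url` is a hypothetical helper, and the default `hl` and `tz` values are simply copied from the curl command. The standard library escapes a few more characters than the browser does (for example `:` and `,`), but the value decodes to the same JSON.

```python
import json
from urllib.parse import parse_qs, urlencode, urlsplit


def build_explore_url(keyword, time_range, geo="", hl="en-US", tz=-120):
    """Build an explore URL whose `req` parameter is compact, URL-encoded JSON."""
    req = {
        "comparisonItem": [{"keyword": keyword, "geo": geo, "time": time_range}],
        "category": 0,
        "property": "",
    }
    query = urlencode(
        {"hl": hl, "tz": tz, "req": json.dumps(req, separators=(",", ":"))}
    )
    return "https://trends.google.com/trends/api/explore?" + query


url = build_explore_url("filter", "now 4-H")

# Round-trip check: decode the `req` parameter back into a dict.
decoded = json.loads(parse_qs(urlsplit(url).query)["req"][0])
print(decoded["comparisonItem"][0]["keyword"])  # filter
```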

@aciddjus

Quoting @ilyazub: "What if we just hardcode the cookie for USER_TYPE_LEGIT_USER?"

I tried this, but it still returns USER_TYPE_SCRAPER after a few requests.

image

@ilyazub

ilyazub commented Jan 17, 2023

The request for the cookie is now expected to be a POST request.

image

With regular cURL, it still returns USER_TYPE_SCRAPER. But with curl-impersonate, Google Trends responds with USER_TYPE_LEGIT_USER.

image

Command:

curl_ff98 'https://trends.google.com/trends/api/explore?hl=en-US&tz=-120&req=%7B%22comparisonItem%22:%5B%7B%22keyword%22:%22snowboard%22,%22geo%22:%22%22,%22time%22:%22today+12-m%22%7D%5D,%22category%22:0,%22property%22:%22%22%7D&tz=-120' \
  -H 'authority: trends.google.com' \
  -H 'accept: application/json, text/plain, */*' \
  -H 'content-type: application/json;charset=UTF-8' \
  -H 'cookie: NID=511=FJ4YkcBxxIqov2FykB9Bk59PRArkpNvtsUNt9YnMMQMjZ8_IVOILVqRP0CTaQbHav5UZ0XTeCbDpK8PA9niYtdiPlP8eNcB5pej0fp9gJq99jfFvzlB_dV74utZN-V2X_riUioiBhfwPdz16HbtA2Soxiu10lHPGdNlE__BYgoI' \
  -H 'origin: https://trends.google.com' \
  -H 'referer: https://trends.google.com/trends/explore?q=snowboard' \
  --data-raw '' \
  --compressed

@jbnitorum

@ilyazub is curl-impersonate going to be integrated into the product to make this work? I just tried in the API playground and the same issue persists.

image

@ritu1337

ritu1337 commented Jan 31, 2023

Quoting @ilyazub: "With regular cURL, it still returns USER_TYPE_SCRAPER. But with curl-impersonate, Google Trends responds with USER_TYPE_LEGIT_USER."

@ilyazub I get "userType":"USER_TYPE_LEGIT_USER" with regular curl:

curl --version
curl 7.84.0 (x86_64-apple-darwin22.0) libcurl/7.84.0 (SecureTransport) LibreSSL/3.3.6 zlib/1.2.11 nghttp2/1.47.0
Release-Date: 2022-06-27

I think we can reuse the data from the POST request to get USER_TYPE_LEGIT_USER, but an immediate follow-up request with the same data returns USER_TYPE_SCRAPER. If you wait a bit between requests with the same data, you get USER_TYPE_LEGIT_USER again.

The data value FEWyJzZ... looks like a reCAPTCHA token.
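
The "wait a bit between requests" observation above could be turned into a simple retry loop. The sketch below is only an illustration: `fetch_until_legit` is a hypothetical helper, the `fetch` callable stands in for whatever actually performs the request and extracts the user type, and the delay values are arbitrary guesses, not measured thresholds.

```python
import time


def fetch_until_legit(fetch, max_attempts=4, base_delay=2.0, sleep=time.sleep):
    """Call `fetch` until it reports USER_TYPE_LEGIT_USER or attempts run out.

    `fetch` is any zero-argument callable returning the user type string.
    The wait doubles after each failed attempt. Returns a tuple of
    (last user type seen, number of attempts made).
    """
    user_type = None
    for attempt in range(max_attempts):
        user_type = fetch()
        if user_type == "USER_TYPE_LEGIT_USER":
            return user_type, attempt + 1
        if attempt < max_attempts - 1:
            sleep(base_delay * (2 ** attempt))
    return user_type, max_attempts


# Demo with a fake fetch and a recorded sleep, so nothing actually waits.
responses = iter(["USER_TYPE_SCRAPER", "USER_TYPE_SCRAPER", "USER_TYPE_LEGIT_USER"])
delays = []
result = fetch_until_legit(lambda: next(responses), sleep=delays.append)
print(result)  # ('USER_TYPE_LEGIT_USER', 3)
print(delays)  # [2.0, 4.0]
```

Passing `sleep` as a parameter keeps the loop testable; in real use the default `time.sleep` applies.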

@ilyazub

ilyazub commented Feb 21, 2023

Quoting @ritu1337: "data FEWyJzZ... looks like a Recaptcha token."

Yes, it's the Invisible reCAPTCHA token.

Quoting @jbnitorum: "@ilyazub is curl-impersonate going to be integrated into the product to make this work? I just tried in the API playground and I see the same issue is still persisting."

@jbnitorum We will fix it but don't have a timeline for the fix. Thank you for your patience and understanding.

@emptymalei

emptymalei commented Aug 2, 2023

Hi @ilyazub , thanks a lot for looking into this. I am wondering if there is any update on this topic. Thanks.


Update: Please ignore. I realized that there is a more recent update in #887.

@ilyazub

ilyazub commented Sep 13, 2023

The data from Google Trends and SerpApi seem to match.

Google Trends: https://trends.google.com/trends/explore?date=2017-01-01%202023-09-13&geo=DK&q=vink%C3%B8leskab&hl=en&tz=420
SerpApi: https://serpapi.com/playground?engine=google_trends&q=vink%C3%B8leskab&geo=DK&tz=420&date=2017-01-01+2023-09-13

@Helldez

Helldez commented Nov 28, 2023

Any suggestions on how to fix the problem in the code?

@ilyazub

ilyazub commented Nov 29, 2023

@Helldez I'm sorry to hear that you faced an issue with our Google Trends API. Could you please share a search ID or the parameters you used that caused the problem?

@Helldez

Helldez commented Dec 3, 2023

I'm not a SerpApi user; I'm helping rewrite pytrends. So I wanted to ask how you solved the reCAPTCHA problem, if it's possible to share. Otherwise, thanks anyway.

@tanys123

tanys123 commented Feb 28, 2024

This should be fixed after #1143 as we are getting USER_TYPE_LEGIT_USER for our requests.

Screenshot 2024-02-28 at 3 44 25 PM
