# Google Trends API Data
## - Introduction:

#### Primary Objective:
The main goal of collecting Google Trends data for Saudi Arabia is to analyze the interest in social media and mental health across the country over specific time periods between 2020 and 2024. This analysis will help to understand how interest in social media evolves and its potential impact on mental health in Saudi Arabia over time.

#### Secondary Objectives:

##### - Trend Analysis Over Time:
We aim to determine whether there is an increase or decrease in interest in social media and mental health over the years in Saudi Arabia. This will help identify any emerging patterns or changes in public concern related to these topics.

##### - Drawing Conclusions About the Potential Impact of Social Media:
Based on the data, we will be able to form preliminary conclusions about the relationship between search interest in social media and the level of interest in mental health in the country. Search interest is only a proxy for usage, so these conclusions are exploratory. They may offer insights into how digital engagement correlates with mental health awareness or concern within the Saudi cultural context.

## - Source of Dataset:
The dataset was sourced from Google Trends, accessible via the following link: [Google Trends](https://trends.google.com/trends/).

The data was retrieved programmatically using the `pytrends` library, an unofficial Python interface to Google Trends.

## - Attributes’ description table:

| Column Name                      | Description                                                                 | Data Type   | Possible Values                         |
|:----------------------------------|:--------------------------------------------------------------------------|:-----------:|:----------------------------------------:|
| `date`                            | Year label for the data entry (a copy of `Year` after yearly aggregation). | Numeric     | Integer values (e.g., 2020, 2021)       |
| `Year`                            | Indicates the year of the data entry.                                      | Numeric     | Integer values (e.g., 2020, 2021)       |
| `Country`                         | Represents the country for which the data is collected.                    | Categorical | Country codes (e.g., 'SA')              |
| `mental health`                   | Represents the overall interest in mental health for the year and country. | Numeric     | Continuous numeric values (e.g., 2014)  |
| `social media`                    | Represents the overall interest in social media for the year and country.  | Numeric     | Continuous numeric values (e.g., 4123)  |
| `Facebook`                        | Represents the interest in Facebook for the year and country.              | Numeric     | Continuous numeric values (e.g., 3947)  |
| `Instagram`                       | Represents the interest in Instagram for the year and country.             | Numeric     | Continuous numeric values (e.g., 4307)  |
| `LinkedIn`                        | Represents the interest in LinkedIn for the year and country.              | Numeric     | Continuous numeric values (e.g., 4010)  |
| `Snapchat`                        | Represents the interest in Snapchat for the year and country.              | Numeric     | Continuous numeric values (e.g., 4104)  |
| `TikTok`                          | Represents the interest in TikTok for the year and country.                | Numeric     | Continuous numeric values (e.g., 3096)  |
| `WhatsApp`                        | Represents the interest in WhatsApp for the year and country.              | Numeric     | Continuous numeric values (e.g., 4299)  |
| `YouTube`                         | Represents the interest in YouTube for the year and country.               | Numeric     | Continuous numeric values (e.g., 4312)  |
| `Twitter(X)`                      | Combined interest in "Twitter" and "Platform X" (the rebranded platform).  | Numeric     | Continuous numeric values (e.g., 3927)  |
| `addiction and social media`      | Represents the interest in addiction-related queries for social media.     | Numeric     | Continuous numeric values     |
| `depression and social media`     | Represents the interest in depression-related queries for social media.    | Numeric     | Continuous numeric values      |
| `insomnia and social media`       | Represents the interest in insomnia-related queries for social media.      | Numeric     | Continuous numeric values     |
| `isPartial`                       | Sum of the partial-data flags returned by Google Trends for the year.      | Numeric     | 0 = complete, non-zero = partial (e.g., 11) |


## - Steps:

#### 1. Installation
Before running the script to collect data from Google Trends, the pytrends library had to be installed. This was done from the command-line interface (CMD) with the following command:

In [None]:
pip install pytrends

This command uses pip, Python's package manager, to download and install pytrends into the active environment so that the script can send requests to Google Trends.

Once the installation completed, the script could run without missing-dependency errors.

#### 2. Importing Necessary Libraries

In [22]:
from pytrends.request import TrendReq
import pandas as pd
import time
import random
from pytrends.exceptions import ResponseError

In this code, we import the libraries needed to query Google Trends and handle the results:

1. `from pytrends.request import TrendReq`:
TrendReq is the main object used to send search queries to Google Trends.
2. `import pandas as pd`:
Pandas is used to organize and manipulate the data in DataFrames.
3. `import time`:
The time module is used to pause execution between requests so that Google is not sent too many queries at once.
4. `import random`:
The random module is used to vary the delay between requests, which helps avoid triggering rate limits.
5. `from pytrends.exceptions import ResponseError`:
ResponseError lets us catch error responses from Google Trends so that requests can be retried or failures handled gracefully.

#### 3. Setting up Google Trends Request

In [7]:
pytrends = TrendReq(hl='en-US', tz=360, timeout=(10, 25))

We initialize a TrendReq object from the pytrends library with the following parameters:

- hl='en-US':

This sets the language for the Google Trends results to English (United States). The hl parameter stands for "host language."
- tz=360:

This sets the time zone offset in minutes. The pytrends documentation gives 360 as the value for US Central Standard Time (UTC-6), so the offset is counted as minutes behind UTC and 360 does not mean UTC+6. Under the same convention, 0 is UTC and Saudi Arabia (UTC+3) would be -180.
- timeout=(10, 25):

This sets the request timeouts as a pair of values:
  - 10 seconds for the connection to be established.
  - 25 seconds for reading the data once connected.

These timeouts help ensure that the code doesn't hang indefinitely if there's a slow response from Google Trends.


With this pytrends object created, we are ready to send search queries to Google Trends with the specified language, time zone, and connection settings.

#### 4. Defining Keyword Groups

In [29]:
keywords_group1 = [
    "social media",
    "mental health",
    "depression and social media",
    "anxiety and social media"
]

keywords_group2 = [
    "insomnia and social media",
    "stress and social media",
    "addiction and social media",
]

keywords_group3 = [
    "Instagram",
    "Twitter",
    "Platform X",
    "Facebook",
]

keywords_group4 = [
    "Snapchat",
    "TikTok",
    "LinkedIn",
    "YouTube",
    "WhatsApp"
]


This section of the code organizes the keywords into four distinct groups. These keywords will be used for Google Trends data analysis to track public interest in social media platforms and their potential impact on mental health.
<br>
The keywords were divided into four groups for several practical reasons:
<br>
- **Reduce API request load**: 
Splitting the keywords into smaller groups lets us collect the data in several shorter runs, which makes it easier to stay within Google's rate limits and to rerun a single group if it fails.

- **Data Accuracy**: 
Google Trends scores terms that are queried together relative to the most popular term in the request, which can flatten low-volume terms. Keeping requests small avoids this; in this notebook the fetch function goes one step further and sends each keyword in its own request, so every series is scaled against its own peak.

This separation makes it easier to conduct a detailed analysis of trends for both social media platforms and mental health topics without overwhelming the system.
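
As a minimal sketch of the grouping idea, the helper below splits a flat keyword list into batches. It assumes the commonly documented Google Trends limit of five search terms per comparison request. The function name `chunk_keywords` and the `max_terms` parameter are hypothetical and are not part of pytrends.

```python
def chunk_keywords(keywords, max_terms=5):
    """Split a keyword list into consecutive batches of at most max_terms items."""
    if max_terms < 1:
        raise ValueError("max_terms must be at least 1")
    return [keywords[i:i + max_terms] for i in range(0, len(keywords), max_terms)]


# Platform names taken from the keyword groups above.
platforms = [
    "Instagram", "Twitter", "Platform X", "Facebook",
    "Snapchat", "TikTok", "LinkedIn", "YouTube", "WhatsApp",
]

batches = chunk_keywords(platforms)
for batch in batches:
    print(len(batch), batch)
```

Hand-written groups, as used in this notebook, work equally well. A helper like this mainly pays off when the keyword list grows.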







#### 5. Specifying Time Ranges for Each Year

In [24]:
years = {
    '2020': '2020-01-01 2020-12-31',
    '2021': '2021-01-01 2021-12-31',
    '2022': '2022-01-01 2022-12-31',
    '2023': '2023-01-01 2023-12-31',
    '2024': '2024-01-01 2024-12-31'
}

The `years` dictionary maps each year from 2020 to 2024 to an explicit date range, because pytrends takes the query period as a timeframe string rather than as a year number. For a custom period, the string consists of a start date and an end date in the form 'YYYY-MM-DD YYYY-MM-DD'.

Defining each year from January 1st to December 31st therefore gives one request period per year, with no gaps or overlaps between years.
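
The same dictionary can also be generated instead of typed by hand. The following is a small sketch, not the notebook's own code; the function name `build_year_timeframes` is hypothetical.

```python
def build_year_timeframes(first_year, last_year):
    """Map each year (as a string) to a 'YYYY-01-01 YYYY-12-31' timeframe string."""
    return {
        str(year): f"{year}-01-01 {year}-12-31"
        for year in range(first_year, last_year + 1)
    }


generated_years = build_year_timeframes(2020, 2024)
print(generated_years["2020"])  # 2020-01-01 2020-12-31
```

Generating the ranges avoids copy-paste slips such as a mismatched year in one of the end dates.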

#### 7. Function to Fetch Google Trends Data 

In [30]:
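countries = ['SA']

def fetch_data_with_retry(country, keywords, year, timeframe, retries=3):
    """Fetch interest-over-time data, retrying with a growing delay on HTTP 429.

    `year` is passed by the caller but not used here; the query period
    comes from `timeframe`.
    """
    for attempt in range(retries):
        try:
            # Build the query for this keyword list, region, and period.
            pytrends.build_payload(keywords, cat=0, timeframe=timeframe, geo=country, gprop='')
            data = pytrends.interest_over_time()
            return data
        except ResponseError as e:
            if "429" in str(e):
                # Rate limited: wait 60 s, then 120 s, then 180 s.
                print(f"Rate limit exceeded for {country}, retrying... ({attempt + 1}/{retries})")
                time.sleep(60 * (attempt + 1))
            else:
                print(f"Other error occurred: {e}")
                break
        except Exception as e:
            print(f"Error: {e}")
            break
    # All attempts failed, or a non-retryable error occurred.
    return pd.DataFrame()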

In this function, `fetch_data_with_retry`, we're implementing a method to retrieve Google Trends data with a built-in retry mechanism in case errors occur, particularly when the rate limit is exceeded. Here's a summary of the function:

- **Parameters**:
  - `country`: The country code for which we want to retrieve data.
  - `keywords`: A list of keywords to search for (e.g., ['social media', 'mental health']).
  - `year`: The year of interest. It is passed in by the caller but is not used inside the function; the query period comes from `timeframe`.
  - `timeframe`: The period for the query (e.g., '2020-01-01 2020-12-31').
  - `retries`: The number of attempts to retry fetching the data if the request fails (default is 3 retries).

- **Function Workflow**:
  - **for attempt in range(retries)**:
    This loop allows the function to try retrieving the data up to the specified number of retries if an error occurs.
  - **pytrends.build_payload**:
    This is where the search query is sent to Google Trends, using the given keywords, country, and timeframe.
  - **pytrends.interest_over_time()**:
    Once the payload is built, this function retrieves the "interest over time" data from Google Trends. If successful, the data is returned.
  - **except ResponseError**:
    This block specifically checks if the error is related to exceeding the rate limit (error code 429). If such an error occurs, a message is printed, and the function waits before retrying. The delay increases with each retry attempt.
  - **except Exception**:
    Any other errors (not related to rate limits) are caught here, printed, and the process stops.
  - **return pd.DataFrame()**:
    If all retry attempts fail, or another error occurs, the function returns an empty DataFrame.

- **Purpose**:
  The main goal of this function is to ensure that data retrieval continues even when Google Trends imposes rate limits (error 429), by retrying the request with increasing wait times. It also handles other errors gracefully, making the overall process of fetching data more reliable, especially when working with frequent or large queries.
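
The retry schedule can be tried out without contacting Google Trends. The sketch below is a simplified stand-in, not the notebook's function: `RateLimitError`, `call_with_backoff`, and `flaky_fetch` are hypothetical names, and the `sleep` argument is injectable so that the delays can be recorded instead of actually waited.

```python
import time


class RateLimitError(Exception):
    """Hypothetical stand-in for a 429 response from the server."""


def call_with_backoff(fetch, retries=3, base_delay=60, sleep=time.sleep):
    """Call fetch() up to `retries` times, waiting base_delay * attempt number after each rate-limit error."""
    for attempt in range(retries):
        try:
            return fetch()
        except RateLimitError:
            sleep(base_delay * (attempt + 1))
    return None


# Simulated fetch that is rate limited twice and then succeeds.
calls = {"count": 0}


def flaky_fetch():
    calls["count"] += 1
    if calls["count"] < 3:
        raise RateLimitError("429")
    return "data"


waits = []
result = call_with_backoff(flaky_fetch, sleep=waits.append)  # record delays instead of sleeping
print(result, waits)
```

With the default settings the recorded delays grow linearly (60 s, then 120 s), which mirrors the `60 * (attempt + 1)` expression in the function above.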

#### 8. Function to fetch data for a country

In [31]:

def fetch_data_for_country(country, keywords_group):
    country_data = pd.DataFrame()
    for year, timeframe in years.items():
        for keywords in keywords_group:
            print(f"Fetching data for {year} with keywords '{keywords}' in {country}...")
            data = fetch_data_with_retry(country, [keywords], year, timeframe)
            if not data.empty:
                data['Year'] = year
                data['Country'] = country
                country_data = pd.concat([country_data, data], axis=0)
            else:
                print(f"No data available for {year} with keywords '{keywords}' in {country}")
            time.sleep(random.uniform(5, 15))  
    return country_data

The `fetch_data_for_country` function is designed to retrieve Google Trends data for a specified country and a group of keywords over a series of years. It combines the results into a single DataFrame for easy analysis.

##### Parameters
- **country**: A string representing the country code (e.g., 'SA' for Saudi Arabia) for which to fetch data.
- **keywords_group**: A list of keywords (one of the keyword groups defined earlier). Each keyword in the list is sent to Google Trends in its own request.

##### Function Workflow
1. **Initialization**:
   - `country_data`: An empty DataFrame to hold the collected data for the specified country.

2. **Iterating Over Years**:
   - The function iterates through the `years` dictionary.
   - For each year and its corresponding timeframe, the function processes each keyword in the group in turn.

3. **Data Fetching**:
   - Inside the nested loop, the function prints a message indicating the year, keywords, and country for which data is being fetched.
   - It calls the `fetch_data_with_retry` function to retrieve the Google Trends data. If the data is successfully fetched and not empty:
     - It adds the current year and country as new columns to the data.
     - The new data is concatenated to the `country_data` DataFrame.
   - If no data is available for the specified year and keywords, it prints a message to inform the user.

4. **Random Sleep**:
   - After each data retrieval attempt, the function waits for a random amount of time (between 5 and 15 seconds) to avoid hitting the Google Trends rate limit.

5. **Return Value**:
   - Finally, the function returns the `country_data` DataFrame containing all the fetched data for the specified country.
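
Because every keyword is requested separately for every year, the number of requests and the time spent in random delays add up quickly. The following back-of-the-envelope sketch uses the group sizes and year count from this notebook; the function name `estimate_wait_seconds` is hypothetical, and the estimate ignores network time and any extra waiting caused by rate-limit retries.

```python
def estimate_wait_seconds(n_years, n_keywords, min_delay=5, max_delay=15):
    """Return (number of requests, expected total delay in seconds).

    The expected value of a uniform delay on [min_delay, max_delay]
    is the midpoint of the interval.
    """
    n_requests = n_years * n_keywords
    mean_delay = (min_delay + max_delay) / 2
    return n_requests, n_requests * mean_delay


# Keyword counts of the four groups defined earlier, and the five years 2020-2024.
group_sizes = {"group1": 4, "group2": 3, "group3": 4, "group4": 5}
n_years = 5

total_keywords = sum(group_sizes.values())
n_requests, expected_wait = estimate_wait_seconds(n_years, total_keywords)
print(f"{n_requests} requests, about {expected_wait / 60:.1f} minutes of delay")
```

Under these assumptions a full run spends roughly a quarter of an hour in deliberate pauses alone.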

#### 9. Fetch data from all keyword groups

In [32]:

all_data = pd.DataFrame()
all_data = pd.concat([
    fetch_data_for_country(countries[0], keywords_group1),
    fetch_data_for_country(countries[0], keywords_group2),
    fetch_data_for_country(countries[0], keywords_group3),
    fetch_data_for_country(countries[0], keywords_group4)
], ignore_index=True)


Fetching data for 2020 with keywords 'social media' in SA...
Fetching data for 2020 with keywords 'mental health' in SA...
Fetching data for 2020 with keywords 'depression and social media' in SA...
No data available for 2020 with keywords 'depression and social media' in SA
Fetching data for 2020 with keywords 'anxiety and social media' in SA...
No data available for 2020 with keywords 'anxiety and social media' in SA
Fetching data for 2021 with keywords 'social media' in SA...
Fetching data for 2021 with keywords 'mental health' in SA...
Fetching data for 2021 with keywords 'depression and social media' in SA...
Fetching data for 2021 with keywords 'anxiety and social media' in SA...
No data available for 2021 with keywords 'anxiety and social media' in SA
Fetching data for 2022 with keywords 'social media' in SA...
Fetching data for 2022 with keywords 'mental health' in SA...
Fetching data for 2022 with keywords 'depression and social media' in SA...
Fetching data for 2022 with keyw

Fetching data for 2024 with keywords 'mental health' in SA...


Fetching data for 2024 with keywords 'depression and social media' in SA...
No data available for 2024 with keywords 'depression and social media' in SA
Fetching data for 2024 with keywords 'anxiety and social media' in SA...
No data available for 2024 with keywords 'anxiety and social media' in SA
Fetching data for 2020 with keywords 'insomnia and social media' in SA...
No data available for 2020 with keywords 'insomnia and social media' in SA
Fetching data for 2020 with keywords 'stress and social media' in SA...
No data available for 2020 with keywords 'stress and social media' in SA
Fetching data for 2020 with keywords 'addiction and social media' in SA...
No data available for 2020 with keywords 'addiction and social media' in SA
Fetching data for 2021 with keywords 'insomnia and social media' in SA...
Fetching data for 2021 with keywords 'stress and social media' in SA...
No data available for 2021 with keywords 'stress and social media' in SA
Fetching data for 2021 with keywords

Fetching data for 2024 with keywords 'Twitter' in SA...


Fetching data for 2024 with keywords 'Platform X' in SA...


Fetching data for 2024 with keywords 'Facebook' in SA...


Fetching data for 2020 with keywords 'Snapchat' in SA...
Fetching data for 2020 with keywords 'TikTok' in SA...
Fetching data for 2020 with keywords 'LinkedIn' in SA...
Fetching data for 2020 with keywords 'YouTube' in SA...
Fetching data for 2020 with keywords 'WhatsApp' in SA...
Fetching data for 2021 with keywords 'Snapchat' in SA...
Fetching data for 2021 with keywords 'TikTok' in SA...
Fetching data for 2021 with keywords 'LinkedIn' in SA...
Fetching data for 2021 with keywords 'YouTube' in SA...
Fetching data for 2021 with keywords 'WhatsApp' in SA...
Fetching data for 2022 with keywords 'Snapchat' in SA...
Fetching data for 2022 with keywords 'TikTok' in SA...
Fetching data for 2022 with keywords 'LinkedIn' in SA...
Fetching data for 2022 with keywords 'YouTube' in SA...
Fetching data for 2022 with keywords 'WhatsApp' in SA...
Fetching data for 2023 with keywords 'Snapchat' in SA...
Fetching data for 2023 with keywords 'TikTok' in SA...
Fetching data for 2023 with keywords 'Link

Fetching data for 2024 with keywords 'TikTok' in SA...


Fetching data for 2024 with keywords 'LinkedIn' in SA...


Fetching data for 2024 with keywords 'YouTube' in SA...


Fetching data for 2024 with keywords 'WhatsApp' in SA...



- `all_data` is first initialized as an empty DataFrame; the assignment on the next line then replaces it with the combined result.
- The `pd.concat` function combines the results from four separate calls to the `fetch_data_for_country` function. Each call fetches data for one of the keyword groups defined earlier.
- The `countries[0]` expression accesses the first country in the `countries` list (which contains only 'SA' in this context).
- The `ignore_index=True` parameter discards the date index returned by Google Trends and gives the combined DataFrame a continuous integer index. This is acceptable here because the data is aggregated by `Year` in the next step.

#### 10. Exporting to CSV

In [33]:

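if not all_data.empty:
    # Collapse the data to one row per year and country by summing each column.
    yearly_data = all_data.groupby(['Year', 'Country']).sum().reset_index()
    # The date index was dropped during concatenation, so 'date' mirrors 'Year'.
    yearly_data['date'] = yearly_data['Year']
    # Identifier columns first, then the remaining columns in sorted order.
    selected_columns = ['date', 'Year', 'Country'] + all_data.columns.difference(['Year', 'Country']).tolist()
    yearly_data = yearly_data[selected_columns]
    print(yearly_data.head())

    yearly_data.to_csv("APIGoogleTrends_data_SA.csv", index=False)
else:
    print("No data was collected.")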

   date  Year Country  Facebook  Instagram  LinkedIn  Platform X  Snapchat  \
0  2020  2020      SA    3947.0     4307.0    4010.0         0.0    4104.0   
1  2021  2021      SA    3976.0     4502.0    4026.0         0.0    2059.0   
2  2022  2022      SA    4629.0     4031.0    4022.0         0.0    3129.0   
3  2023  2023      SA    4574.0     4192.0    4189.0       187.0    3441.0   
4  2024  2024      SA    2692.0     3271.0    3233.0       143.0    3483.0   

   TikTok  Twitter  WhatsApp  YouTube  addiction and social media  \
0  3096.0   3927.0    4299.0   4312.0                         0.0   
1  3187.0   4016.0    3637.0   4616.0                         0.0   
2  3709.0   3996.0    4123.0   4571.0                         0.0   
3  4181.0   3912.0    4540.0   4634.0                       100.0   
4  3062.0   3253.0    3069.0   3729.0                         0.0   

   depression and social media  insomnia and social media  isPartial  \
0                          0.0              

## - Preprocessing:

#### Merge and reorder columns

In [43]:

df = pd.read_csv('APIGoogleTrends_data_SA.csv')

df['Twitter(X)'] = df['Twitter'].fillna(0) + df['Platform X'].fillna(0)

df.drop(columns=['Twitter', 'Platform X'], inplace=True)

new_order = [
    'date', 
    'Year', 
    'Country',
    'mental health', 
    'social media',  
    'Facebook', 
    'Instagram', 
    'LinkedIn', 
    'Snapchat', 
    'TikTok', 
    'WhatsApp', 
    'YouTube',
    'Twitter(X)',
    'addiction and social media', 
    'depression and social media', 
    'insomnia and social media', 
    'isPartial'
]

df = df[new_order]

df.to_csv('APIGoogleTrends_data_SA(1).csv', index=False)

print(df)


   date  Year Country  mental health  social media  Facebook  Instagram  \
0  2020  2020      SA         2014.0        4123.0    3947.0     4307.0   
1  2021  2021      SA         2337.0        3059.0    3976.0     4502.0   
2  2022  2022      SA         2075.0        4187.0    4629.0     4031.0   
3  2023  2023      SA         2126.0        4234.0    4574.0     4192.0   
4  2024  2024      SA         2506.0        3338.0    2692.0     3271.0   

   LinkedIn  Snapchat  TikTok  WhatsApp  YouTube  Twitter(X)  \
0    4010.0    4104.0  3096.0    4299.0   4312.0      3927.0   
1    4026.0    2059.0  3187.0    3637.0   4616.0      4016.0   
2    4022.0    3129.0  3709.0    4123.0   4571.0      3996.0   
3    4189.0    3441.0  4181.0    4540.0   4634.0      4099.0   
4    3233.0    3483.0  3062.0    3069.0   3729.0      3396.0   

   addiction and social media  depression and social media  \
0                         0.0                          0.0   
1                         0.0           
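
The merge of the two Twitter-related columns can be checked by hand. The sketch below repeats the same addition on plain dictionaries, using the 2023 and 2024 values printed in the export step. The function name `merge_twitter_x` is hypothetical; treating a missing value as 0 mirrors the `fillna(0)` calls in the cell above.

```python
# Values for 2023 and 2024 as printed in the export step.
rows = [
    {"Year": 2023, "Twitter": 3912.0, "Platform X": 187.0},
    {"Year": 2024, "Twitter": 3253.0, "Platform X": 143.0},
]


def merge_twitter_x(row):
    """Return a copy of row with 'Twitter' and 'Platform X' summed into 'Twitter(X)'."""
    merged = {k: v for k, v in row.items() if k not in ("Twitter", "Platform X")}
    # A missing (None or absent) value counts as 0.
    merged["Twitter(X)"] = (row.get("Twitter") or 0) + (row.get("Platform X") or 0)
    return merged


merged_rows = [merge_twitter_x(r) for r in rows]
for r in merged_rows:
    print(r)
```

The sums, 4099.0 for 2023 and 3396.0 for 2024, agree with the `Twitter(X)` column in the output above.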

## - Data Cleaning:

#### Checking for Missing Values

In [42]:

file_path = "APIGoogleTrends_data_SA(1).csv"
data = pd.read_csv(file_path)

missing_values = data.isnull().sum()

print("Number of missing values in each column:")
print(missing_values)



Number of missing values in each column:
mental health                  0
social media                   0
date                           0
Year                           0
Country                        0
Facebook                       0
Instagram                      0
LinkedIn                       0
Snapchat                       0
TikTok                         0
WhatsApp                       0
YouTube                        0
Twitter(X)                     0
addiction and social media     0
depression and social media    0
insomnia and social media      0
isPartial                      0
dtype: int64


The results show no missing values in any column, including 'social media', 'mental health', and the individual platforms (Instagram, Facebook, and so on). This should be read with care. The yearly file was produced with `groupby(...).sum()`, which treats missing entries as 0, so a keyword that returned no data for a given year (see the "No data available" messages in the fetch log) appears as 0 rather than as a missing value. The check therefore confirms that the exported file has no empty cells, but it does not by itself confirm that every keyword returned data for every year. The retry and error-handling logic reduces the chance of losing data to rate limits, while zeros in the keyword columns still have to be interpreted as possible gaps.
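
One point worth keeping in mind when reading a missing-value check on aggregated data is how pandas sums groups that contain only missing entries. The sketch below uses small made-up frames (the numbers are hypothetical, not values from the dataset) to contrast the default behavior with `min_count=1`, which keeps such groups as NaN.

```python
import pandas as pd

# Hypothetical per-keyword results: the second keyword returned data only for 2021.
social = pd.DataFrame({"social media": [50, 70, 60], "Year": ["2020", "2020", "2021"]})
mental = pd.DataFrame({"mental health": [30], "Year": ["2021"]})

# Stacking frames with different columns leaves NaN where a keyword has no rows.
combined = pd.concat([social, mental], ignore_index=True)

filled = combined.groupby("Year").sum()             # all-missing groups become 0
strict = combined.groupby("Year").sum(min_count=1)  # all-missing groups stay NaN

print(filled)
print(strict)
print(strict.isnull().sum())
```

With the default sum, the 2020 entry for "mental health" is 0 and a later `isnull()` check reports nothing. With `min_count=1`, the same entry stays missing and shows up in the check.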

#### Checking for Outliers

In [39]:
outliers = {}

for column in data.select_dtypes(include=['float64', 'int64']).columns:
    Q1 = data[column].quantile(0.25)
    Q3 = data[column].quantile(0.75)
    IQR = Q3 - Q1
    lower_bound = Q1 - 1.5 * IQR
    upper_bound = Q3 + 1.5 * IQR

    outlier_values = data[(data[column] < lower_bound) | (data[column] > upper_bound)]
    outliers[column] = outlier_values

print("\nOutliers in each column:")
for column, outlier_values in outliers.items():
    outlier_count = outlier_values.shape[0]  
    print(f"Column '{column}' has {outlier_count} outlier(s).")

    if outlier_count > 0:
        print(f"Outlier values in column '{column}':")
        print(outlier_values[column].values)  




Outliers in each column:
Column 'date' has 0 outlier(s).
Column 'Year' has 0 outlier(s).
Column 'Facebook' has 1 outlier(s).
Outlier values in column 'Facebook':
[2692.]
Column 'Instagram' has 1 outlier(s).
Outlier values in column 'Instagram':
[3271.]
Column 'LinkedIn' has 2 outlier(s).
Outlier values in column 'LinkedIn':
[4189. 3233.]
Column 'Snapchat' has 2 outlier(s).
Outlier values in column 'Snapchat':
[4104. 2059.]
Column 'TikTok' has 0 outlier(s).
Column 'WhatsApp' has 0 outlier(s).
Column 'YouTube' has 1 outlier(s).
Outlier values in column 'YouTube':
[3729.]
Column 'addiction and social media' has 1 outlier(s).
Outlier values in column 'addiction and social media':
[100.]
Column 'depression and social media' has 0 outlier(s).
Column 'insomnia and social media' has 1 outlier(s).
Outlier values in column 'insomnia and social media':
[100.]
Column 'isPartial' has 1 outlier(s).
Outlier values in column 'isPartial':
[11]
Column 'mental health' has 0 outlier(s).
Column 'social me

- **Statement on Outlier Analysis and Recommendations**:

In the analysis of the columns related to mental health and social media, we observed that long phrases such as "insomnia and social media," "stress and social media," "addiction and social media," "depression and social media," and "anxiety and social media" did not yield usable results during the data requests.
Google Trends returned no data for these keywords in most years, which shows up as zero search interest in the aggregated file. Zero interest across most years more likely reflects too little search volume for these exact phrases than a genuine absence of interest, so the values are not reliable. Therefore, we recommend deleting these columns from our dataset.

On the other hand, the outliers in the remaining columns reflect real variations in search interest, whether increasing or decreasing. Because each column holds only five yearly values, the IQR rule flags points readily, so these outliers are best read as notable years rather than as errors. We suggest retaining these columns as they provide valuable insights into the fluctuations of public concern regarding social media and mental health.
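
The IQR rule from the outlier check can be reproduced by hand for a single column. The sketch below applies it to the five yearly Facebook values from the dataset, using only the standard library. The function name `iqr_bounds` is hypothetical, and the "inclusive" quantile method is assumed to correspond to the linear interpolation that pandas uses by default.

```python
from statistics import quantiles

# Yearly Facebook values for 2020-2024, taken from the dataset above.
facebook = [3947.0, 3976.0, 4629.0, 4574.0, 2692.0]


def iqr_bounds(values):
    """Return the (lower, upper) fences of the 1.5 * IQR rule."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr


lower, upper = iqr_bounds(facebook)
flagged = [v for v in facebook if v < lower or v > upper]
print(lower, upper, flagged)
```

For this column the quartiles are 3947 and 4574, giving fences of 3006.5 and 5514.5, so only the 2024 value of 2692 is flagged. This agrees with the outlier output above.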



## - Operations and Decisions:

### Collection Methods:
We employed the **pytrends** library to systematically retrieve data from Google Trends. This method enabled us to capture public interest trends related to social media and mental health across Saudi Arabia. Specifically, we used keyword searches within predefined timeframes, allowing us to gather region-specific data effectively.

### Decisions Made:

- **Keyword Grouping:**  
  To optimize data retrieval, we divided the keywords into **four distinct groups**. This was necessary to ensure the API could handle the requests efficiently and avoid errors.

- **Yearly Timeframes:**  
  Since Google Trends takes the query period as an explicit date range, we manually defined the yearly ranges (e.g., '2020-01-01 2020-12-31') to capture trends over entire years. This allowed us to compare year-on-year trends for each keyword.

- **Data Merging and Cleanup:**
  - **Combining Columns:**  
    We merged the Twitter and Platform X columns into a single column named Twitter(X) since they represent the same platform. This step was crucial to avoid redundancy and streamline our analysis.

  - **Handling Outliers:**  
    During data cleaning, we identified outliers in several columns, which indicated significant deviations in search interest. We decided to retain these outliers in our dataset as they provided insights into unusual spikes or drops in interest related to social media and mental health. Furthermore, we decided to initially remove the keywords with long phrases. This decision was made because Google Trends struggled to process these keywords accurately, leading to zero search interest for most years. This action aimed to improve the overall quality and reliability of our data analysis.

  - **Language Restriction:**  
    Because Google Trends did not handle our Arabic keywords effectively, we limited our analysis to English keywords only. This decision was intended to give more accurate and reliable data on public interest in social media and mental health topics, at the cost of missing searches made in Arabic.

- **Column Reordering:**  
  For better organization and clarity, we reordered the columns in our final dataset to prioritize key variables.

**Conclusion:**  
The methodology and decisions outlined here provided a structured approach to analyzing Google Trends data, supporting both accuracy and relevance in capturing public interest in social media and mental health in Saudi Arabia. This process allowed us to derive meaningful insights while navigating the limitations of the Google Trends API.

## - Challenges:
- **Accessing Google Responses**:<br>
A significant challenge we encountered was managing multiple requests to the Google Trends API without exceeding the rate limits. To avoid server overload and ensure consistent data retrieval, we implemented a time.sleep(random.uniform(5, 15)) command to introduce a delay between each request. This step was crucial for maintaining stable API communication and avoiding request failures due to rate limits.

- **Timeframe Specification**:<br>
The Google Trends API does not accept a year as a query period and instead expects explicit date ranges. To overcome this, we manually defined exact date ranges for each year (January 1st to December 31st). This allowed us to gather data for entire years, ensuring that the data was structured according to our yearly analysis needs. The process required aggregating the returned time series into yearly totals to align with our research objectives.

- **Keyword Grouping Limitations**:<br>
The API enforces a limit on the number of keywords that can be queried at once. This limitation led us to divide the keywords into four distinct groups, with each group containing related terms. Although this increased the complexity of the data retrieval process, we combined the results during post-processing to ensure all keywords were included in the analysis. This required careful data management and merging.

- **Processing Long Phrase Keywords**:<br>
**Search Difficulties**: Google Trends could not handle some long-phrase keywords accurately, as most years returned zero results. This issue led to an initial decision to remove those keywords from the analysis and focus instead on shorter, more relevant phrases.

- **Difficulty with Arabic Keywords**:<br>
**Language Recognition Issues**: Google Trends struggled to recognize and process Arabic keywords effectively. As a result, the analysis had to rely solely on English keywords.

**Conclusion:**<br>
In conclusion, our analysis of Google Trends data encountered several challenges, including managing API request limits, specifying timeframes, grouping keywords, and dealing with language recognition issues. By adjusting request intervals, defining precise date ranges, and focusing on English keywords, we maintained an effective data collection process. These challenges shaped our approach and enabled us to derive valuable insights into public interest in social media and mental health within Saudi Arabia.