## Lexy Feldmann<br>Using Weather API EDA

### Set up the API data using documentation from the Weather API website

In [1]:
# Import the necessary libraries (everything but pandas taken from the website)
import openmeteo_requests
import requests_cache
import pandas as pd
from retry_requests import retry

In [2]:
# Set up the Open-Meteo API client with cache and retry on error (all taken from the website)
cache_session = requests_cache.CachedSession('.cache', expire_after = -1)
retry_session = retry(cache_session, retries = 5, backoff_factor = 0.2)
openmeteo = openmeteo_requests.Client(session = retry_session)

# Store the API URL in a variable to use later
url = "https://archive-api.open-meteo.com/v1/archive"

### Grab dataframe from the last project milestone that was formed through HTML data

In [3]:
# Grab the countries_df dataframe from the Project - 3 Milestone file
%store -r countries_df

# Print the first ten rows of the dataframe to make sure it is there for use
countries_df.head(10)

Unnamed: 0,Index,Country,Latitude,Longitude
0,1,Andorra,42.546245,1.601554
1,2,United Arab Emirates,23.424076,53.847818
2,3,Afghanistan,33.93911,67.709953
3,4,Antigua and Barbuda,17.060816,-61.796428
4,5,Anguilla,18.220554,-63.068615
5,6,Albania,41.153332,20.168331
6,7,Armenia,40.069099,45.038189
7,8,Netherlands Antilles,12.226079,-69.060087
8,9,Angola,-11.202692,17.873887
9,10,Antarctica,-75.250973,-0.071389


### Function provided by the Weather API website that shows how to handle daily data. I modified it to return only one row per country (the average of each weather variable across all days of the year), so the result is yearly for each country.

In [4]:
import pandas as pd

def process_daily(response, country, latitude, longitude):
    # Create a Daily object from the API response
    daily = response.Daily()

    # Grab the temperature, precipitation, and sunshine duration and store them in their respective variables
    daily_temperature_2m_mean = daily.Variables(0).ValuesAsNumpy()
    daily_precipitation_sum = daily.Variables(1).ValuesAsNumpy()
    daily_sunshine_duration = daily.Variables(2).ValuesAsNumpy()

    # Create a dictionary for the date and data to grab (taken from the API website)
    daily_data = {
        "date": pd.date_range(
            start=pd.to_datetime(daily.Time(), unit="s", utc=True),
            end=pd.to_datetime(daily.TimeEnd(), unit="s", utc=True),
            freq=pd.Timedelta(seconds=daily.Interval()),
            inclusive="left"
        ),
        "Average Temperature (°F)": daily_temperature_2m_mean,
        "Average Precipitation (in.)": daily_precipitation_sum,
        "Average Sunshine Duration (hrs)": daily_sunshine_duration / 3600  # Convert seconds to hours
    }

    # Create a dataframe using the daily_data dictionary
    daily_dataframe = pd.DataFrame(data=daily_data)

    # Calculate the average of all temperatures, precipitations, and sunshine durations
    average_temperature = daily_dataframe["Average Temperature (°F)"].mean()
    average_precipitation = daily_dataframe["Average Precipitation (in.)"].mean()
    average_sunshine = daily_dataframe["Average Sunshine Duration (hrs)"].mean()

    # Create a dictionary for the yearly average of temperature, precipitation, and sunshine duration for each country
    average_data = {
        "Country": [country],
        "Latitude": [latitude],
        "Longitude": [longitude],
        "Average Temperature (°F)": [average_temperature],
        "Average Precipitation (in.)": [average_precipitation],
        "Average Sunshine Duration (hrs)": [average_sunshine]
    }

    # Create a DataFrame with the yearly data
    average_dataframe = pd.DataFrame(data=average_data)

    # Return the yearly dataframe
    return average_dataframe
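To make the daily-to-yearly step concrete, here is a minimal sketch of the same aggregation on synthetic data. It does not call the API: the daily values are random numbers, and the column names and the country name "Exampleland" are made up for illustration.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for one location's daily values (not real API output)
dates = pd.date_range(start="2015-01-01", end="2015-12-31", freq="D", tz="UTC")
rng = np.random.default_rng(seed=0)
daily = pd.DataFrame({
    "date": dates,
    "temperature_f": rng.uniform(20, 90, size=len(dates)),
    "precipitation_in": rng.uniform(0, 0.5, size=len(dates)),
    "sunshine_s": rng.uniform(0, 14 * 3600, size=len(dates)),
})

# Convert sunshine from seconds to hours, then average each column over the year
daily["sunshine_hrs"] = daily["sunshine_s"] / 3600
yearly = (
    daily[["temperature_f", "precipitation_in", "sunshine_hrs"]]
    .mean()
    .to_frame()
    .T
)
yearly.insert(0, "Country", "Exampleland")  # hypothetical country name
print(yearly)
```

Note that averaging the daily precipitation sums gives the mean precipitation per day, not the total for the year.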

### 1. Create a dataframe, using a function, that takes a dictionary of parameters and calls the process_daily() function to grab a full year's worth of daily data

In [5]:
# Takes a dictionary of API parameters and returns the specified weather data for every country in countries_df
def make_df(input_params):

    # Initialize an empty list to store DataFrames for concatenation
    dfs_to_concat = []
    
    # Loop over countries_df to gather latitude and longitude for each location
    for index, row in countries_df.iterrows():
        country = row['Country']
        latitude = row['Latitude']
        longitude = row['Longitude']
    
        # Build a per-location copy of the parameters so the caller's dictionary is not modified
        params = {**input_params, 'latitude': latitude, 'longitude': longitude}

        # Retrieve weather data from the Open-Meteo API for the current location
        responses = openmeteo.weather_api(url, params=params)
    
        # Process each response in the list
        for response in responses:
            
            # Process the response into a one-row DataFrame for this country
            # (process_daily already fills in Country, Latitude, and Longitude)
            processed_df = process_daily(response, country, latitude, longitude)

            # Append the processed DataFrame to the list
            dfs_to_concat.append(processed_df)
        
    # Concatenate all DataFrames in the list to one large dataframe for all countries (for that year)
    final_df = pd.concat(dfs_to_concat, ignore_index=True)
    
    # Return all countries for the current year
    return final_df.head(244)

### 2. Add a year column to each dataset to be able to distinguish yearly weather changes

In [6]:
# Specify the parameters and dates for the 2015 data in a dictionary format to pass into the make_df function
params_2015 = {
    "start_date": "2015-01-01",
    "end_date": "2015-12-31",
    "daily": "temperature_2m_mean,precipitation_sum,sunshine_duration",
    "temperature_unit": "fahrenheit",
    "precipitation_unit": "inch"
}

# Pass the parameters into the make_df function and store the result in a variable
weather_2015 = make_df(params_2015)

# Add a column for the year 2015
weather_2015['Year'] = 2015

# Print the first ten rows to check that it worked
weather_2015.head(10)

OpenMeteoRequestsError: {'error': True, 'reason': 'Minutely API request limit exceeded. Please try again in one minute.'}
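The error above means too many requests were sent within one minute. One common way to stay under a per-minute limit is to pause between calls. The sketch below is only an illustration under assumptions: `fetch_with_pause` and `fake_fetch` are hypothetical helpers, the pause length is a placeholder rather than a documented value, and no real API call is made.

```python
import time

def fetch_with_pause(locations, fetch, pause_seconds=1.0, sleep=time.sleep):
    """Call fetch(location) for each location, sleeping between calls."""
    results = []
    for i, location in enumerate(locations):
        if i > 0:
            sleep(pause_seconds)  # wait before every call after the first
        results.append(fetch(location))
    return results

# Stand-in for the real API call; it only echoes the coordinates back
def fake_fetch(location):
    return {"latitude": location[0], "longitude": location[1]}

rows = fetch_with_pause([(42.5, 1.6), (23.4, 53.8)], fake_fetch, pause_seconds=0.01)
print(rows)
```

In this notebook the same idea would amount to a `time.sleep(...)` call inside the loop in `make_df`, with a pause chosen from the API's documented limits. Because the cached session is created with `expire_after = -1`, responses that were already fetched should be served from the cache on a rerun rather than requested again.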

In [None]:
# Specify the parameters and dates for the 2016 data in a dictionary format to pass into the make_df function
params_2016 = {
    "start_date": "2016-01-01",
    "end_date": "2016-12-31",
    "daily": "temperature_2m_mean,precipitation_sum,sunshine_duration",
    "temperature_unit": "fahrenheit",
    "precipitation_unit": "inch"
}

# Pass the parameters into the make_df function and store the result in a variable
weather_2016 = make_df(params_2016)

# Add a column for the year 2016
weather_2016['Year'] = 2016

# Print the first ten rows to check that it worked
weather_2016.head(10)

In [None]:
# Specify the parameters and dates for the 2017 data in a dictionary format to pass into the make_df function
params_2017 = {
    "start_date": "2017-01-01",
    "end_date": "2017-12-31",
    "daily": "temperature_2m_mean,precipitation_sum,sunshine_duration",
    "temperature_unit": "fahrenheit",
    "precipitation_unit": "inch"
}

# Pass the parameters into the make_df function and store the result in a variable
weather_2017 = make_df(params_2017)

# Add a column for the year 2017
weather_2017['Year'] = 2017

# Print the first ten rows to check that it worked
weather_2017.head(10)

In [None]:
# Specify the parameters and dates for the 2018 data in a dictionary format to pass into the make_df function
params_2018 = {
    "start_date": "2018-01-01",
    "end_date": "2018-12-31",
    "daily": "temperature_2m_mean,precipitation_sum,sunshine_duration",
    "temperature_unit": "fahrenheit",
    "precipitation_unit": "inch"
}

# Pass the parameters into the make_df function and store the result in a variable
weather_2018 = make_df(params_2018)

# Add a column for the year 2018
weather_2018['Year'] = 2018

# Print the first ten rows to check that it worked
weather_2018.head(10)

In [None]:
# Specify the parameters and dates for the 2019 data in a dictionary format to pass into the make_df function
params_2019 = {
    "start_date": "2019-01-01",
    "end_date": "2019-12-31",
    "daily": "temperature_2m_mean,precipitation_sum,sunshine_duration",
    "temperature_unit": "fahrenheit",
    "precipitation_unit": "inch"
}

# Pass the parameters into the make_df function and store the result in a variable
weather_2019 = make_df(params_2019)

# Add a column for the year 2019
weather_2019['Year'] = 2019

# Print the first ten rows to check that it worked
weather_2019.head(10)
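The five yearly cells differ only in their dates, so the parameter dictionaries could also be generated from the year. This is a sketch of that idea, not the approach used above; `build_params` is a hypothetical helper name.

```python
def build_params(year):
    """Return the request parameters for one calendar year."""
    return {
        "start_date": f"{year}-01-01",
        "end_date": f"{year}-12-31",
        "daily": "temperature_2m_mean,precipitation_sum,sunshine_duration",
        "temperature_unit": "fahrenheit",
        "precipitation_unit": "inch",
    }

# One parameter dictionary per year, 2015 through 2019
params_by_year = {year: build_params(year) for year in range(2015, 2020)}
print(params_by_year[2017]["start_date"])
```

Each dictionary could then be passed to `make_df`, with the `Year` column set from the dictionary key, in place of the five near-identical cells.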

### 3. Merge yearly dataframes into one large dataset (2015 - 2019)

In [None]:
# Concatenate the DataFrames (merge them all together)
all_weather = pd.concat([weather_2015, weather_2016, weather_2017, weather_2018, weather_2019], ignore_index=True)

# Verify the merge worked!
all_weather.head(500)
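Beyond looking at the first rows, the merge can be checked programmatically. The sketch below uses two toy frames in place of the real yearly dataframes and tests two things: that the row count matches the inputs, and that no (Country, Year) pair appears twice. The values are made up.

```python
import pandas as pd

# Toy yearly frames standing in for weather_2015 ... weather_2019
frames = []
for year in (2015, 2016):
    frame = pd.DataFrame({"Country": ["Andorra", "Albania"],
                          "Average Temperature (°F)": [45.0, 55.0]})
    frame["Year"] = year
    frames.append(frame)

merged = pd.concat(frames, ignore_index=True)

# Every (Country, Year) pair should appear exactly once
duplicates = merged.duplicated(subset=["Country", "Year"]).sum()
# Row count should equal the sum of the input frames' row counts
expected_rows = sum(len(f) for f in frames)

print(duplicates, len(merged), expected_rows)
```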

### 4. Remove unnecessary columns (no longer need latitude and longitude)

In [None]:
# Drop the latitude and longitude; they were only needed to map each Country to its weather data
yearly_country_weather = all_weather.drop(columns=['Latitude', 'Longitude'])

# Print to verify that the columns were dropped
yearly_country_weather.head()

### 5. Sort the dataframe by country and year (ascending) to better see yearly weather changes

In [None]:
# Set display rows to show ALL rows when printed
pd.set_option('display.max_rows', None)

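# Sort the weather data by 'Country' and 'Year' in ascending order and keep the result
# (sort_values returns a new dataframe, so it must be assigned back to persist)
yearly_country_weather = yearly_country_weather.sort_values(by=['Country', 'Year'], ascending=[True, True])

# Store the sorted dataframe for later use
%store yearly_country_weather

# Display the sorted dataframe
yearly_country_weather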
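One detail worth knowing for the sorting step: `DataFrame.sort_values` returns a new DataFrame and leaves the original untouched unless the result is assigned back. A minimal sketch with toy data (the rows are made up):

```python
import pandas as pd

toy = pd.DataFrame({
    "Country": ["Angola", "Albania", "Angola", "Albania"],
    "Year": [2016, 2016, 2015, 2015],
})

# Result discarded: toy keeps its original row order
toy.sort_values(by=["Country", "Year"])
unchanged_order = toy["Country"].tolist()

# Result assigned: sorted_toy holds the sorted rows with a fresh index
sorted_toy = toy.sort_values(by=["Country", "Year"]).reset_index(drop=True)
print(sorted_toy)
```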

### Ethical Implications of My Data Wrangling Steps

My data wrangling steps were: extracting daily data and aggregating it into yearly averages, adding a Year column to each yearly dataset, merging the yearly datasets (giving weather data for each country in countries_df from 2015 to 2019), removing columns that were no longer needed, and sorting the dataset into a more readable order. There are no legal or regulatory guidelines for my data or project topic. All of the data is public and none of it is sensitive.

When transforming API data, there is a risk of requesting or reading the wrong piece of data. To check that I had handled this properly, I manually entered some of the latitudes and longitudes on the website for the specified dates and confirmed that it returned the same daily data I had received before averaging all days into yearly values. I made no assumptions when cleaning the data. The only choice that MAY be considered an assumption is which daily variables from the Weather API I judged could increase or reduce happiness (average temperature, precipitation sum, sunshine duration).

My data source is https://open-meteo.com/en/docs/historical-weather-api#hourly=&daily=temperature_2m_mean&temperature_unit=fahrenheit&wind_speed_unit=mph&precipitation_unit=inch, which I consider a trusted source. The data was acquired in an ethical way, as the site allows open access to the API and provides example code for doing so. I would mitigate the ethical implications by making sure every transformation is accounted for and that none was applied incorrectly or in the wrong order.