# **NB01 - Data Collection**

**OBJECTIVE:**
Collect historical weather data from the Open-Meteo archive API and save it to JSON files. The data covers ten cities, including London, and records the following for each city:
- The total precipitation in mm in 2003 and 2023 (summed from the daily `rain_sum` values)
- The number of days of rainfall in 2003 and 2023

**AUTHOR:** 
@nadiabegic on GitHub

**LAST EDITED:**
7-Nov-2024

-----------------------
**Imports**:

In [132]:
import json

import pandas as pd
import requests

# 1. Preparation and reusable function definitions

1.1 Read the world_cities CSV file to access the country codes, city names, and coordinates

In [133]:
world_cities = pd.read_csv('../data/world_cities.csv')

In [134]:
# View the dataframe

world_cities

Unnamed: 0,country,name,lat,lng
0,AD,El Tarter,42.57952,1.65362
1,AD,Sant Julià de Lòria,42.46372,1.49129
2,AD,Pas de la Casa,42.54277,1.73361
3,AD,Ordino,42.55623,1.53319
4,AD,les Escaldes,42.50729,1.53414
...,...,...,...,...
149832,ZW,Beitbridge,-22.21667,30.00000
149833,ZW,Beatrice,-18.25283,30.84730
149834,ZW,Banket,-17.38333,30.40000
149835,ZW,Epworth,-17.89000,31.14750


In [135]:
# Example check (left commented out): look up a single city by country code and exact name
# valid_rows = (world_cities['country']=='US') & (world_cities['name']=='New York City')
# world_cities[valid_rows]


1.2 Define a reusable function to obtain the latitude and longitude of a city

In [136]:
def get_lat_long(country_code, city_name, world_cities):
    """
    Retrieves the latitude and longitude of a given city in a specific country.

    Parameters:
        country_code (str): The two-letter country code of the city (e.g. 'GB').
        city_name (str): The name of the city, exactly as it appears in the data.
        world_cities (pd.DataFrame): City data with 'country', 'name', 'lat' and 'lng' columns.

    Returns:
        tuple: two floats, the latitude and longitude of the first matching city.
    """

    valid_rows = (world_cities['country']==country_code) & (world_cities['name']==city_name)
    city_data = world_cities[valid_rows]
    # Fail with a clear message instead of an IndexError when nothing matches
    if city_data.empty:
        raise ValueError(f"No city named {city_name!r} found for country code {country_code!r}")
    return city_data['lat'].iloc[0], city_data['lng'].iloc[0]
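
The lookup relies on an exact, case-sensitive match on both the country code and the city name. The sketch below illustrates this on a two-row dataframe copied from the preview above. `find_lat_long` and `sample_cities` are hypothetical names used only for this illustration, and returning `None` for a missing city is one possible design choice rather than the notebook's behaviour.

```python
import pandas as pd

# Two rows copied from the world_cities preview above
sample_cities = pd.DataFrame(
    {
        "country": ["AD", "AD"],
        "name": ["El Tarter", "Ordino"],
        "lat": [42.57952, 42.55623],
        "lng": [1.65362, 1.53319],
    }
)


def find_lat_long(country_code, city_name, cities_df):
    """Hypothetical helper: return (lat, lng) of the first match, or None."""
    mask = (cities_df["country"] == country_code) & (cities_df["name"] == city_name)
    matches = cities_df[mask]
    if matches.empty:
        return None
    first = matches.iloc[0]
    return float(first["lat"]), float(first["lng"])


found = find_lat_long("AD", "Ordino", sample_cities)
missing = find_lat_long("AD", "ordino", sample_cities)  # lower-case name does not match
print(found)    # (42.55623, 1.53319)
print(missing)  # None
```

A check like this helps explain why a city has to be spelled exactly as it is in the CSV, for example "New York City" in the cities list used later.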

1.3 Define a reusable function to obtain the daily rainfall over a given time period

In [151]:
def get_rain_sum(country_code, city_name, start_date, end_date, world_cities):
    """
    Retrieves the daily rain sum for a given country code and city name.

    Parameters:
        country_code (str): The country code of the location.
        city_name (str): The name of the city.
        start_date (str): The start date of the historical data in the format 'YYYY-MM-DD'.
        end_date (str): The end date of the historical data in the format 'YYYY-MM-DD'.
        world_cities (pd.DataFrame): City data with 'country', 'name', 'lat' and 'lng' columns.

    Returns:
        list: the daily rain_sum values (in mm) for the given date range.
    """

    latitude, longitude = get_lat_long(country_code, city_name, world_cities)

    base_historical_url = "https://archive-api.open-meteo.com/v1/archive"
    params = {
        "latitude": latitude,
        "longitude": longitude,
        "start_date": start_date,
        "end_date": end_date,
        "daily": "rain_sum",
    }

    # Let requests build the query string, and stop on HTTP errors
    response = requests.get(base_historical_url, params=params, timeout=30)
    response.raise_for_status()
    rain_data = response.json()
    rain_sum = rain_data['daily']['rain_sum']

    return rain_sum
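
To inspect the request without calling the API, the URL can be assembled on its own. This is a minimal sketch using the standard library's `urlencode`. `build_archive_url` and `ARCHIVE_URL` are hypothetical names introduced here, and the coordinates are those of El Tarter from the dataframe preview.

```python
from urllib.parse import urlencode

ARCHIVE_URL = "https://archive-api.open-meteo.com/v1/archive"


def build_archive_url(latitude, longitude, start_date, end_date):
    """Hypothetical helper: assemble the request URL without sending it."""
    query = urlencode(
        {
            "latitude": latitude,
            "longitude": longitude,
            "start_date": start_date,
            "end_date": end_date,
            "daily": "rain_sum",
        }
    )
    return f"{ARCHIVE_URL}?{query}"


url = build_archive_url(42.57952, 1.65362, "2023-01-01", "2023-12-31")
print(url)
```

Printing the URL this way makes it easy to paste into a browser and compare the raw response with what the function returns.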

1.4 Define the function to obtain the number of days of rainfall

In [138]:
def num_days_rain(country_code, city_name, start_date, end_date, world_cities):
    """
    Retrieves the number of days it rained for a given country code and city name.

    Parameters:
        country_code (str): The country code of the location.
        city_name (str): The name of the city.
        start_date (str): The start date of the historical data in the format 'YYYY-MM-DD'.
        end_date (str): The end date of the historical data in the format 'YYYY-MM-DD'.
        world_cities (pd.DataFrame): City data with 'country', 'name', 'lat' and 'lng' columns.

    Returns:
        int: the number of days it rained in the given city in the given time period.
    """

    days_of_rain = 0
    for rain_sum in get_rain_sum(country_code, city_name, start_date, end_date, world_cities):
        # Skip missing values (None) so the comparison does not raise a TypeError
        if rain_sum is not None and rain_sum > 0:
            days_of_rain += 1

    return days_of_rain
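
The counting rule (a day counts as rainy when its rain sum is above zero) can be checked offline on a small list. The sketch below assumes the API may report `None` for days without data and skips those entries. `count_rainy_days` is a hypothetical name and the sample values are made up.

```python
def count_rainy_days(daily_rain_mm):
    """Count entries greater than zero, skipping missing (None) values."""
    return sum(1 for value in daily_rain_mm if value is not None and value > 0)


sample = [0.0, 1.2, None, 0.0, 3.4]  # made-up daily values in mm
rainy_days = count_rainy_days(sample)
print(rainy_days)  # 2
```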

1.5 Define the function to calculate the total precipitation for the year in mm

In [139]:
def total_precipitation(country_code, city_name, start_date, end_date, world_cities):
    """
    Retrieves the total precipitation for a given country code and city name.

    Parameters:
        country_code (str): The country code of the location.
        city_name (str): The name of the city.
        start_date (str): The start date of the historical data in the format 'YYYY-MM-DD'.
        end_date (str): The end date of the historical data in the format 'YYYY-MM-DD'.
        world_cities (pd.DataFrame): City data with 'country', 'name', 'lat' and 'lng' columns.

    Returns:
        float: the sum of the daily rain_sum values (in mm) for the given city and time period.
    """

    precipitation = 0.0
    for rain_sum in get_rain_sum(country_code, city_name, start_date, end_date, world_cities):
        # Skip missing values (None) so the sum does not raise a TypeError
        if rain_sum is not None:
            precipitation += rain_sum

    return precipitation
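
Both metrics come from the same daily list, so one option is to fetch the list once per city and derive the count and the total together. The sketch below shows that idea offline. `summarise_daily_rain` and `fake_fetch` are hypothetical names, `fake_fetch` stands in for `get_rain_sum`, and all values are made up.

```python
def summarise_daily_rain(daily_rain_mm):
    """Hypothetical helper: return (rainy_days, total_mm) from one daily list."""
    valid = [value for value in daily_rain_mm if value is not None]
    rainy_days = sum(1 for value in valid if value > 0)
    return rainy_days, sum(valid)


def fake_fetch(city_name):
    """Stand-in for get_rain_sum so the sketch runs offline; values are made up."""
    made_up = {"City A": [0.0, 2.5, None, 1.0], "City B": [0.0, 0.0, 0.5]}
    return made_up[city_name]


days_rain, total_mm = {}, {}
for city_name in ["City A", "City B"]:
    daily = fake_fetch(city_name)  # one fetch serves both metrics
    days_rain[city_name], total_mm[city_name] = summarise_daily_rain(daily)

print(days_rain)  # {'City A': 2, 'City B': 1}
print(total_mm)   # {'City A': 3.5, 'City B': 0.5}
```

Structured this way, each city and year would need one request instead of two, at the cost of departing from the one-function-per-metric layout used in this notebook.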

# 2. Collect the number of days of rainfall in 2023

2.1 Obtain the number of days of rainfall in 2023 for the ten cities and store the data in a dictionary

In [140]:
cities = [
    ("GB", "London"),
    ("GB", "Edinburgh"),
    ("BA", "Sarajevo"),
    ("NL", "Amsterdam"),
    ("FR", "Paris"),
    ("ES", "Madrid"),
    ("SY", "Damascus"),
    ("US", "New York City"),
    ("US", "Los Angeles"),
    ("AE", "Dubai")
]

In [141]:
days_rain_2023 = {}

start_date = "2023-01-01"
end_date = "2023-12-31"

for country_code, city_name in cities: 
    days_rain = num_days_rain(country_code, city_name, start_date, end_date, world_cities)
    days_rain_2023[city_name] = days_rain

2.2 Save the data to a JSON file

In [142]:
with open('../data/days_rain_2023.json', 'w') as file:
    json.dump(days_rain_2023, file)
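
A quick way to confirm a file was written correctly is to read it back and compare it with the dictionary in memory. The sketch below does this in a temporary directory so it does not touch `../data`. The file name and the counts are placeholders, not results from this notebook.

```python
import json
import tempfile
from pathlib import Path

# Placeholder counts, for illustration only
days_rain_example = {"City A": 12, "City B": 3}

with tempfile.TemporaryDirectory() as tmp_dir:
    path = Path(tmp_dir) / "days_rain_example.json"

    # Write the dictionary to disk
    with open(path, "w", encoding="utf-8") as file:
        json.dump(days_rain_example, file, indent=2)

    # Read it back to confirm the round trip preserves the data
    with open(path, encoding="utf-8") as file:
        loaded = json.load(file)

print(loaded == days_rain_example)  # True
```

Here `indent=2` only makes the saved file easier to read by eye; it does not change the data.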

# 3. Obtain the total precipitation in 2023

3.1 Obtain the total precipitation in 2023 for the ten cities and store the data in a dictionary

In [143]:
total_precipitation_2023 = {}

start_date = "2023-01-01"
end_date = "2023-12-31"

for country_code, city_name in cities: 
    precipitation = total_precipitation(country_code, city_name, start_date, end_date, world_cities)
    total_precipitation_2023[city_name] = precipitation

3.2 Save the data to a JSON file

In [144]:
with open('../data/total_precipitation_2023.json', 'w') as file:
    json.dump(total_precipitation_2023, file)

# 4. Collect the number of days of rainfall in 2003

4.1 Obtain the number of days of rainfall in 2003 for the ten cities and store the data in a dictionary

In [145]:
days_rain_2003 = {}

start_date = "2003-01-01"
end_date = "2003-12-31"

for country_code, city_name in cities: 
    days_rain = num_days_rain(country_code, city_name, start_date, end_date, world_cities)
    days_rain_2003[city_name] = days_rain

4.2 Save the data to a JSON file

In [146]:
with open('../data/days_rain_2003.json', 'w') as file:
    json.dump(days_rain_2003, file)

# 5. Obtain the total precipitation in 2003

5.1 Obtain the total precipitation in 2003 for the ten cities and store the data in a dictionary

In [161]:
total_precipitation_2003 = {}

start_date = "2003-01-01"
end_date = "2003-12-31"

for country_code, city_name in cities: 
    precipitation = total_precipitation(country_code, city_name, start_date, end_date, world_cities)
    total_precipitation_2003[city_name] = precipitation

5.2 Save the data to a JSON file

In [162]:
with open('../data/total_precipitation_2003.json', 'w') as file:
    json.dump(total_precipitation_2003, file)