# **Notebook 01 | Rain Data Collection**

LSE DS105A – Data for Data Science (2024/25 Autumn Term)

**AUTHOR:** SABAA PASHA (Candidate Number: **41408**)

**OBJECTIVE:** Collect historical weather data using the [Open-Meteo API](https://open-meteo.com/en/docs) to answer the following question:

> *"Is London really as rainy as the movies make it out to be?"*

| Endpoint         | URL starts with                      |
|------------------|--------------------------------------|
| [Historical Weather Data](https://open-meteo.com/en/docs/historical-weather-api) | `https://archive-api.open-meteo.com/v1/archive`   |
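
As a rough illustration of how a request to this endpoint is put together, the sketch below builds an archive URL with the standard library only; no request is sent. The query parameter names (`latitude`, `longitude`, `start_date`, `end_date`, `daily`, `timezone`) follow the Open-Meteo documentation. The helper name `build_archive_url` and the approximate London coordinates are assumptions made for this example, and the helpers in `scripts/functions.py` may work differently.

```python
from urllib.parse import urlencode

ARCHIVE_URL = "https://archive-api.open-meteo.com/v1/archive"


def build_archive_url(latitude, longitude, start_date, end_date, daily_variables):
    """Return a full archive-API URL (hypothetical helper, for illustration only)."""
    params = {
        "latitude": latitude,
        "longitude": longitude,
        "start_date": start_date,
        "end_date": end_date,
        # Daily variables are passed as one comma-separated string.
        "daily": ",".join(daily_variables),
        # Assumption: let the API resolve the local time zone from the coordinates.
        "timezone": "auto",
    }
    return f"{ARCHIVE_URL}?{urlencode(params)}"


# Approximate coordinates for London, used purely as an example.
example_url = build_archive_url(
    51.5,
    -0.13,
    "2023-01-01",
    "2023-12-31",
    ["precipitation_sum", "rain_sum", "precipitation_hours"],
)
print(example_url)
```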

## **Imports**

In [1]:
import os
import json

os.chdir("..")
from scripts.functions import get_lat_lon, get_rain_data

import requests

> ```python
> os.chdir("..") 
> ```
The command above moves the current working directory up one level, to the parent directory. This allows the `get_lat_lon` and `get_rain_data` functions to be imported from the `functions.py` file in the `scripts` folder.
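
An alternative that leaves the working directory untouched is to add the parent directory to `sys.path` before importing. This is only a sketch, assuming the notebook sits one level below the project root; it is not what this notebook does, and relative paths used later (such as `./data/rain_data.json`) would then resolve from the notebook's own folder instead.

```python
import sys
from pathlib import Path

# Assumption: the project root is the parent of the current working directory.
project_root = Path.cwd().parent

# Put the project root on the import path once, without calling os.chdir.
if str(project_root) not in sys.path:
    sys.path.insert(0, str(project_root))

# With the path in place, the import would look the same as before:
# from scripts.functions import get_lat_lon, get_rain_data
```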

## **Automated Data Collection**

**CHOSEN TIME PERIOD:** 01/01/2023 - 31/12/2023
> This is a recent time period, so the analysis reflects current climate conditions.
>
> A full calendar year allows us to take into account seasonal variability and typical weather patterns.

Setting the start and end dates by allocating to variables:

In [2]:
start_date = "2023-01-01"
end_date = "2023-12-31"
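
A quick sanity check on the chosen range, sketched with the standard library: parsing both strings as ISO dates confirms they are valid and shows how many daily records to expect per city. The variable names `start`, `end`, and `n_days` are introduced here for illustration.

```python
from datetime import date

start_date = "2023-01-01"
end_date = "2023-12-31"

# fromisoformat raises ValueError if a string is not a valid YYYY-MM-DD date.
start = date.fromisoformat(start_date)
end = date.fromisoformat(end_date)

# Both endpoints are included, hence the + 1.
n_days = (end - start).days + 1
print(n_days)  # 365
```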

**SELECTED CITIES (for comparison with London):** Bangalore (India), Bogota (Colombia), Riyadh (Saudi Arabia), Amsterdam (Netherlands).
> Bangalore: Known to receive significant amounts of rainfall from June to September due to its monsoon season.
>
> Bogota: Proximity to the equator and high altitude mean that the city experiences frequent rainfall throughout the year.
>
> Riyadh: Hot desert climate with little rainfall, offering a contrast to the wetter cities.
>
> Amsterdam: Mild climate and rain patterns very similar to those of the United Kingdom.

Creating a list to store the names of the 5 cities (London plus the four comparison cities) and their respective countries:

In [3]:
cities = [
    ("London", "United Kingdom"),
    ("Bangalore", "India"),
    ("Bogota", "Colombia"),
    ("Riyadh", "Saudi Arabia"),
    ("Amsterdam", "Netherlands")
]

print(cities)

[('London', 'United Kingdom'), ('Bangalore', 'India'), ('Bogota', 'Colombia'), ('Riyadh', 'Saudi Arabia'), ('Amsterdam', 'Netherlands')]


**VARIABLES:** precipitation_sum, rain_sum, precipitation_hours
> precipitation_sum: Measures the total amount of ALL types of precipitation (rain, sleet, hail, snow) that can contribute to the perception of a "rainy" climate.
>
> rain_sum: Isolates and measures rainfall on its own.
>
> precipitation_hours: Measures the number of hours with any type of precipitation.
>
Creating a dictionary to store each city's data for the above variables:

In [4]:
all_rain_data = {}

for city, country in cities:
    print("=" * 10770)
    try:
        output = get_rain_data(city, country, start_date, end_date)
        all_rain_data[city] = output
        print(f"Rain data for {city}, {country}:")
        print(output)
        print()
    except ValueError as e:
        print(f"Rain data for {city}, {country}:")
        print(e)

Response Status Code: 200
Rain data for London, United Kingdom:
{'time': ['2023-01-01', '2023-01-02', '2023-01-03', '2023-01-04', '2023-01-05', '2023-01-06', '2023-01-07', '2023-01-08', '2023-01-09', '2023-01-10', '2023-01-11', '2023-01-12', '2023-01-13', '2023-01-14', '2023-01-15', '2023-01-16', '2023-01-17', '2023-01-18', '2023-01-19', '2023-01-20', '2023-01-21', '2023-01-22', '2023-01-23', '2023-01-24', '2023-01-25', '2023-01-26', '2023-01-27', '2023-01-28', '2023-01-29', '2023-01-30', '2023-01-31', '2023-02-01', '2023-02-02', '2023-02-03', '2023-02-04', '2023-02-05', '2023-02-06', '2023-02-07', '2023-02-08', '2023-02-09', '2023-02-10', '2023-02-11', '2023-02-12', '2023-02-13', '2023-02-14', '2023-02-15', '2023-02-16', '2023-02-17', '2023-02-18', '2023-02-19', '2023-02-20', '2023-02-21', '2023-02-22', '2023-02-23', '2023-02-24', '2023-02-25', '2023-02-26', '2023-02-27', '2023-02-28', '2023-03-01', '2023-03-02', '2023-03-03', '2023-03-04', '2023-03-05', '2023-03-06', '2023-03-07', '2
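
`get_rain_data` is defined in `scripts/functions.py` and is not shown in this notebook. The printed dictionary starts with a `time` list, which matches the `daily` block of an Open-Meteo response. The sketch below shows one way such an extraction step could work; the function name `extract_daily`, the error message, and the sample payload values are all hypothetical, and the real helper may differ.

```python
def extract_daily(payload):
    """Return the 'daily' block of a parsed API response (hypothetical helper)."""
    daily = payload.get("daily")
    if daily is None:
        # Raising ValueError lets a caller handle failures with `except ValueError`.
        raise ValueError("No daily data found in the response.")
    return daily


# Made-up payload in the shape of an Open-Meteo response, for illustration only.
sample_payload = {
    "daily": {
        "time": ["2023-01-01", "2023-01-02"],
        "precipitation_sum": [1.2, 0.0],
        "rain_sum": [1.2, 0.0],
        "precipitation_hours": [3.0, 0.0],
    }
}

daily = extract_daily(sample_payload)
print(list(daily))
```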

Confirming that the function successfully collected the data:

In [5]:
all_rain_data.keys()

dict_keys(['London', 'Bangalore', 'Bogota', 'Riyadh', 'Amsterdam'])

In [6]:
for city, data in all_rain_data.items():
    print("=" * 93)
    print(f"City: {city}")
    print("Keys in the data:", data.keys())
    print()

City: London
Keys in the data: dict_keys(['time', 'precipitation_sum', 'rain_sum', 'precipitation_hours'])

City: Bangalore
Keys in the data: dict_keys(['time', 'precipitation_sum', 'rain_sum', 'precipitation_hours'])

City: Bogota
Keys in the data: dict_keys(['time', 'precipitation_sum', 'rain_sum', 'precipitation_hours'])

City: Riyadh
Keys in the data: dict_keys(['time', 'precipitation_sum', 'rain_sum', 'precipitation_hours'])

City: Amsterdam
Keys in the data: dict_keys(['time', 'precipitation_sum', 'rain_sum', 'precipitation_hours'])
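
Checking the keys confirms the structure of the data but not its completeness. A further check, sketched here with a hypothetical helper `find_length_mismatches` and made-up sample values, is that every variable holds one value per date. For the 2023 data, each list would be expected to contain 365 entries.

```python
def find_length_mismatches(data_by_city):
    """List (city, key, length) for every variable whose length differs from 'time'."""
    problems = []
    for city, data in data_by_city.items():
        expected = len(data["time"])
        for key, values in data.items():
            if len(values) != expected:
                problems.append((city, key, len(values)))
    return problems


# Made-up sample: CityB is missing one rain_sum value.
sample = {
    "CityA": {"time": ["2023-01-01", "2023-01-02"], "rain_sum": [0.4, 0.0]},
    "CityB": {"time": ["2023-01-01", "2023-01-02"], "rain_sum": [1.1]},
}

print(find_length_mismatches(sample))  # [('CityB', 'rain_sum', 1)]
```

An empty list means every variable lines up with the dates.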



Renaming the keys for clarity and readability:

In [7]:
renamed_rain_data = {}

for city, data in all_rain_data.items():
    renamed_data = {
        "Date": data["time"],
        "Total Precipitation (mm)": data["precipitation_sum"],
        "Total Rain (mm)": data["rain_sum"],
        "Hours of Precipitation": data["precipitation_hours"]
    }
    renamed_rain_data[city] = renamed_data
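
The same renaming can also be driven by a single lookup table, which keeps the old and new names side by side. This is a sketch rather than the notebook's method: `KEY_LABELS` and `rename_keys` are names introduced here, and the sample values are made up.

```python
# Old API key -> readable label, matching the names used in this notebook.
KEY_LABELS = {
    "time": "Date",
    "precipitation_sum": "Total Precipitation (mm)",
    "rain_sum": "Total Rain (mm)",
    "precipitation_hours": "Hours of Precipitation",
}


def rename_keys(data, labels=KEY_LABELS):
    """Return a new dict with each key replaced by its label (hypothetical helper)."""
    # A key missing from `labels` raises KeyError, which flags unexpected variables.
    return {labels[old_key]: values for old_key, values in data.items()}


# Made-up single-day sample for one city.
sample = {
    "time": ["2023-01-01"],
    "precipitation_sum": [0.0],
    "rain_sum": [0.0],
    "precipitation_hours": [0.0],
}

renamed = rename_keys(sample)
print(list(renamed))
```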

Confirming that the keys were renamed correctly:

In [8]:
renamed_rain_data.keys()

dict_keys(['London', 'Bangalore', 'Bogota', 'Riyadh', 'Amsterdam'])

In [9]:
for city, data in renamed_rain_data.items():
    print("=" * 110)
    print(f"City: {city}")
    print("Keys in the data:", data.keys())
    print()

City: London
Keys in the data: dict_keys(['Date', 'Total Precipitation (mm)', 'Total Rain (mm)', 'Hours of Precipitation'])

City: Bangalore
Keys in the data: dict_keys(['Date', 'Total Precipitation (mm)', 'Total Rain (mm)', 'Hours of Precipitation'])

City: Bogota
Keys in the data: dict_keys(['Date', 'Total Precipitation (mm)', 'Total Rain (mm)', 'Hours of Precipitation'])

City: Riyadh
Keys in the data: dict_keys(['Date', 'Total Precipitation (mm)', 'Total Rain (mm)', 'Hours of Precipitation'])

City: Amsterdam
Keys in the data: dict_keys(['Date', 'Total Precipitation (mm)', 'Total Rain (mm)', 'Hours of Precipitation'])



## **Saving to a JSON File**

In [10]:
with open("./data/rain_data.json", "w") as file:
    json.dump(renamed_rain_data, file)
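
To confirm that a file like this can be read back unchanged, a round-trip check is one option. The sketch below uses a temporary directory and made-up sample data so that it does not touch `./data/rain_data.json`; the variable names are illustrative.

```python
import json
import tempfile
from pathlib import Path

# Made-up sample in the same shape as the renamed data.
sample = {"CityA": {"Date": ["2023-01-01"], "Total Rain (mm)": [0.4]}}

with tempfile.TemporaryDirectory() as tmp_dir:
    path = Path(tmp_dir) / "rain_data.json"

    # Write the data, then read it straight back.
    with open(path, "w") as file:
        json.dump(sample, file)
    with open(path) as file:
        loaded = json.load(file)

print(loaded == sample)  # True
```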