# 1. ⚙️ Setup

## 1.1 Data Collection Strategy

In this notebook, I will focus on the **Historical Weather Data** endpoint on the [Open-Meteo API](https://open-meteo.com/en/docs):

| Endpoint         | URL starts with                                      |
|------------------|------------------------------------------------------|
| [Historical Weather Data](https://open-meteo.com/en/docs/historical-weather-api) | `https://archive-api.open-meteo.com/v1/archive` |

The objective is to assess the raininess of **London, UK** relative to other cities with varied climates. I will gather historical precipitation data for the entire year of 2023, allowing for a detailed comparison of annual rain patterns.

### Cities Selected for Comparison:
1. **London, UK** – The main city of interest, often portrayed as rainy.
2. **Singapore** – Known for its tropical climate with high annual rainfall.
3. **Cairo, Egypt** – Represents a dry climate with very low annual rainfall.
4. **Buenos Aires, Argentina** – A moderate climate with regular seasonal rainfall.
5. **Mumbai, India** – Known for very high rainfall, particularly during the monsoon season.

### Data Collection Scope

For each city, I retrieve the following daily variables for the period from **January 1, 2023, to December 31, 2023**:
- **Daily Precipitation** (`precipitation_sum`) to measure total rainfall.
- **Precipitation Hours** (`precipitation_hours`) to capture the duration of rainfall each day.

This dataset provides the basis for calculating metrics such as **Total Rainfall**, **Number of Rainy Days**, **Average Rain Intensity**, and **Average Rain Duration**. These metrics allow for a balanced and comprehensive comparison of raininess across regions, helping to evaluate if London’s reputation as a rainy city holds up against cities with distinct climates.
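
The four metrics named above could be computed along the following lines. This is a minimal sketch, not the notebook's own implementation: the function name `summarise_rainfall`, the 1.0 mm wet-day threshold, and the choice to average intensity and duration over rainy days only are all assumptions, and the sample values are made up for illustration.

```python
def summarise_rainfall(daily: dict, wet_day_mm: float = 1.0) -> dict:
    """Summarise one city's daily precipitation data.

    `daily` is assumed to hold two parallel lists, `precipitation_sum` (mm)
    and `precipitation_hours` (h), with None marking missing days.
    """
    # Pair up the two series and skip days where either value is missing
    days = [
        (mm, hours)
        for mm, hours in zip(daily["precipitation_sum"], daily["precipitation_hours"])
        if mm is not None and hours is not None
    ]
    # Assumption: a "rainy day" is any day at or above `wet_day_mm`
    rainy = [(mm, hours) for mm, hours in days if mm >= wet_day_mm]
    n_rainy = len(rainy)

    return {
        "total_rainfall_mm": sum(mm for mm, _ in days),
        "rainy_days": n_rainy,
        # Both averages are taken over rainy days only (an assumption)
        "avg_intensity_mm_per_rainy_day": sum(mm for mm, _ in rainy) / n_rainy if n_rainy else 0.0,
        "avg_duration_h_per_rainy_day": sum(h for _, h in rainy) / n_rainy if n_rainy else 0.0,
    }


# Made-up sample: one dry day, two rainy days, one missing day
sample = {
    "precipitation_sum": [0.0, 5.0, 1.0, None],
    "precipitation_hours": [0.0, 4.0, 2.0, None],
}
summary = summarise_rainfall(sample)
print(summary)
```

Changing `wet_day_mm` changes the rainy-day count and both averages, so the threshold should be stated alongside any comparison between cities.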


In [1]:
import os
import json

import requests

import pandas as pd


## 1.2 Helpful functions

**The code below reads the world cities data and returns the latitude and longitude of a city. Use it to get the coordinates of any city you want to analyse.**

In [2]:
def get_lat_lon(country_code, city_name):
    
    filepath = '../data/world_cities.csv'
    world_cities = pd.read_csv(filepath)

    # This is how we filter data in pandas
    city_data = world_cities[(world_cities['country'] == country_code) & 
                             (world_cities['name'] == city_name)]
    
    # Convert the data to a list of dictionaries
    city_data = city_data.to_dict('records')
    
    if len(city_data) == 0:
        raise ValueError(f"No records found for {city_name}, {country_code} in {filepath}")

    latitude = city_data[0]['lat']
    longitude = city_data[0]['lng']

    return latitude, longitude

**Let's test this function for Singapore, SG.**

In [3]:
get_lat_lon("SG", "Singapore")

(1.28967, 103.85007)

**I wrote a function to construct the URL for me. This way I can call it anytime inside or outside another function:**

In [4]:
def build_url(latitude: float, longitude: float, start_date: str, end_date: str) -> str:
    # The table in 1.1 lists the /v1/archive path; this notebook calls /v1/era5 on the same host.
    base_historical_url = "https://archive-api.open-meteo.com/v1/era5?"
    params_lat_long     = "latitude=" + str(latitude) + "&longitude="  + str(longitude)
    params_date         = "&start_date=" + start_date + "&end_date=" + end_date

    # Request the daily precipitation sum and the daily precipitation hours.
    # timezone=auto reports each day in the local time of the location.
    params_others       = "&daily=precipitation_sum,precipitation_hours&timezone=auto"

    final_url = base_historical_url + params_lat_long + params_date + params_others

    return final_url

**Let's test the function for Singapore, SG.**

In [5]:
build_url(1.28967, 103.8501, "2023-01-01", "2023-01-02")

'https://archive-api.open-meteo.com/v1/era5?latitude=1.28967&longitude=103.8501&start_date=2023-01-01&end_date=2023-01-02&daily=precipitation_sum,precipitation_hours&timezone=auto'
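
As an alternative to string concatenation, the same URL can be assembled from a dictionary of parameters with the standard library's `urlencode`. This is only a sketch: `build_url_encoded` and `BASE_URL` are hypothetical names, not part of the notebook, and the parameters are the same ones `build_url` already sends.

```python
from urllib.parse import urlencode

# Same host and path as build_url, without the trailing "?"
BASE_URL = "https://archive-api.open-meteo.com/v1/era5"


def build_url_encoded(latitude: float, longitude: float, start_date: str, end_date: str) -> str:
    params = {
        "latitude": latitude,
        "longitude": longitude,
        "start_date": start_date,
        "end_date": end_date,
        "daily": "precipitation_sum,precipitation_hours",
        "timezone": "auto",
    }
    # safe="," leaves the comma in the variable list unescaped
    return BASE_URL + "?" + urlencode(params, safe=",")


url = build_url_encoded(1.28967, 103.8501, "2023-01-01", "2023-01-02")
print(url)
```

Keeping the parameters in a dictionary makes it easier to add or remove a daily variable without touching the string-building logic.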

In [6]:
# Compile a list of city data, including country code and city name
cities = [
    ("GB", "London"),       
    ("SG", "Singapore"),    
    ("EG", "Cairo"),        
    ("AR", "Buenos Aires"), 
    ("IN", "Mumbai")        
]


**Compile latitudes and longitudes for all cities**

In [7]:
geo_data = []

for country_code, city_name in cities:
    latitude, longitude = get_lat_lon(country_code, city_name)
    geo_data.append((country_code, city_name, latitude, longitude))

geo_data

[('GB', 'London', 51.50853, -0.12574),
 ('SG', 'Singapore', 1.28967, 103.85007),
 ('EG', 'Cairo', 30.06263, 31.24967),
 ('AR', 'Buenos Aires', -34.61315, -58.37723),
 ('IN', 'Mumbai', 19.07283, 72.88261)]

# 2. Historical Rainfall

In [8]:
def get_historical_data(country_code, city_name, start_date=None, end_date=None):
    """
    Retrieves historical weather data for a specific city using default dates if none are provided.
    
    Parameters:
        country_code (str): The country code of the city.
        city_name (str): The name of the city.
        start_date (str): Optional; Start date in "YYYY-MM-DD" format. Defaults to Jan 1, 2023.
        end_date (str): Optional; End date in "YYYY-MM-DD" format. Defaults to Dec 31, 2023.
    
    Returns:
        dict: Dictionary of historical weather data.
    """
    # Set default dates to the full year 2023 if not provided
    if not start_date:
        start_date = "2023-01-01"
    if not end_date:
        end_date = "2023-12-31"
    
    # Retrieve latitude and longitude from the world_cities data
    latitude, longitude = get_lat_lon(country_code, city_name)
    
    # Use the build_url function to construct the final API URL
    url = build_url(latitude, longitude, start_date, end_date)
    
    # Make the API request; fail loudly on HTTP errors instead of returning empty data
    response = requests.get(url, timeout=30)
    response.raise_for_status()
    data = response.json()
    
    return data.get("daily", {})

# Main code for gathering historical rainfall data for each city
historical_rainfall = {}

for country_code, city_name, _, _ in geo_data:
    # Call get_historical_data without specifying dates
    rainfall = get_historical_data(country_code, city_name)
    historical_rainfall[city_name] = rainfall
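
Because `get_historical_data` falls back to an empty dictionary when the `daily` key is missing, a failed request can go unnoticed. One way to surface such problems is a small validation step on the decoded JSON. The sketch below is hypothetical: `extract_daily` is not part of the notebook, and the error shape it checks (`"error"` and `"reason"` keys) is an assumption about how the API reports bad requests. The sample payloads are made up.

```python
def extract_daily(payload: dict) -> dict:
    """Return the `daily` block of a decoded response, or raise if it looks wrong."""
    # Assumption: a rejected request comes back as {"error": true, "reason": "..."}
    if payload.get("error"):
        raise RuntimeError(f"API error: {payload.get('reason', 'no reason given')}")

    daily = payload.get("daily")
    if not daily:
        raise KeyError("Response has no 'daily' block")

    # All daily series should cover the same number of days
    lengths = {key: len(values) for key, values in daily.items()}
    if len(set(lengths.values())) > 1:
        raise ValueError(f"Daily series have different lengths: {lengths}")

    return daily


# Made-up payloads for illustration
good = {
    "daily": {
        "time": ["2023-01-01", "2023-01-02"],
        "precipitation_sum": [1.2, 0.0],
        "precipitation_hours": [3.0, 0.0],
    }
}
bad = {"error": True, "reason": "example failure"}

print(extract_daily(good)["precipitation_sum"])

try:
    extract_daily(bad)
except RuntimeError as exc:
    print(exc)
```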


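A few checks to confirm it worked. Each value should be the `daily` dictionary from the API response, with three keys (`time`, `precipitation_sum`, `precipitation_hours`):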

In [9]:
historical_rainfall.keys()

dict_keys(['London', 'Singapore', 'Cairo', 'Buenos Aires', 'Mumbai'])

In [10]:
for city, rainfall in historical_rainfall.items():
    print(f"The value for key {city:12s} is a dict with {len(rainfall)} keys")

The value for key London       is a dict with 3 keys
The value for key Singapore    is a dict with 3 keys
The value for key Cairo        is a dict with 3 keys
The value for key Buenos Aires is a dict with 3 keys
The value for key Mumbai       is a dict with 3 keys


**Save to file:**

In [11]:
with open('../data/multicity_historical.json', 'w') as file:
    json.dump(historical_rainfall, file)
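
To use the saved file later, it can be read back with `json.load` and flattened into one row per city and day. The sketch below assumes each city's entry holds the `time`, `precipitation_sum`, and `precipitation_hours` lists. The helper `to_rows` is a hypothetical name, the sample values are made up, and a temporary directory stands in for `../data/` so the example runs on its own.

```python
import json
import os
import tempfile


def to_rows(rainfall_by_city: dict) -> list:
    """Flatten {city: daily_dict} into a list of one-row-per-day dictionaries."""
    rows = []
    for city, daily in rainfall_by_city.items():
        for date, mm, hours in zip(
            daily["time"], daily["precipitation_sum"], daily["precipitation_hours"]
        ):
            rows.append(
                {"city": city, "date": date, "precipitation_sum": mm, "precipitation_hours": hours}
            )
    return rows


# Made-up sample with the same shape as historical_rainfall
sample = {
    "London": {
        "time": ["2023-01-01", "2023-01-02"],
        "precipitation_sum": [1.2, 0.0],
        "precipitation_hours": [3.0, 0.0],
    }
}

# Write and re-read the file in a temporary directory
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "multicity_historical.json")
    with open(path, "w") as file:
        json.dump(sample, file)
    with open(path) as file:
        reloaded = json.load(file)

rows = to_rows(reloaded)
print(rows[0])
```

Passing `rows` to `pd.DataFrame(rows)` would give a long-format table with one row per city and day, which is convenient for grouping by city.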