# Introduction
This notebook collects and processes the rainfall data required by our data analysis notebook.

To keep the notebook readable, the key data-processing functions have been moved to a separate Python file, `data_collection_utils.py`.

Organising the functions in `data_collection_utils.py` keeps the notebook clean and modular. The functions are imported at the start of the notebook, in section **1. Import Libraries**.

## 1. Import Libraries ⚙️


This code imports essential libraries required for data collection and analysis:

In [1]:
import os
import json

import requests

import pandas as pd

from datetime import datetime

import sys
sys.path.append('../scripts')  # Add the 'scripts' folder to the Python path

from data_collection_utils import get_lat_lon, build_url, get_historical_data



## 2. Function Imports and Testing

In this section, we test the key functions imported from `data_collection_utils.py` to ensure they work correctly. These functions include:
- `get_lat_lon`: Retrieves latitude and longitude for a given city.
- `build_url`: Constructs the API URL with the necessary parameters.
- `get_historical_data`: Fetches historical weather data for the specified city and date range.

Below, we test `get_lat_lon` using a sample city to verify it’s working as expected.


In [2]:
get_lat_lon("SG", "Singapore")

(1.28967, 103.85007)

Similarly, we test `build_url` using sample coordinates and a two-day date range to verify it's working as expected.

In [3]:
build_url(1.28967, 103.8501, "2023-01-01", "2023-01-02")

'https://archive-api.open-meteo.com/v1/era5?latitude=1.28967&longitude=103.8501&start_date=2023-01-01&end_date=2023-01-02&daily=precipitation_sum,precipitation_hours&timezone=auto'
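For reference, a URL of this shape can be assembled with the standard library. This is a minimal sketch, not the actual implementation in `data_collection_utils.py`: the function name `build_url_sketch` and the constant `BASE_URL` are assumptions made for illustration, while the endpoint and query parameters are copied from the output above.

```python
from urllib.parse import urlencode

# Endpoint copied from the output above; the constant name is hypothetical.
BASE_URL = "https://archive-api.open-meteo.com/v1/era5"


def build_url_sketch(latitude, longitude, start_date, end_date):
    """Assemble an archive URL with the same query parameters as the output above."""
    params = {
        "latitude": latitude,
        "longitude": longitude,
        "start_date": start_date,
        "end_date": end_date,
        "daily": "precipitation_sum,precipitation_hours",
        "timezone": "auto",
    }
    # safe="," leaves the comma in the `daily` list unescaped, as in the URL above.
    return f"{BASE_URL}?{urlencode(params, safe=',')}"


url = build_url_sketch(1.28967, 103.8501, "2023-01-01", "2023-01-02")
print(url)
```

Building the query from a dictionary keeps each parameter on its own line, which makes it easier to add or remove daily variables later.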

**Compile a list of city data, including country code and city name**

In [4]:
cities = [
    ("GB", "London"),       
    ("SG", "Singapore"),    
    ("EG", "Cairo"),        
    ("AR", "Buenos Aires"), 
    ("IN", "Mumbai")        
]


**Compile latitudes and longitudes for all cities required for our analysis using a `for` loop**

In [5]:
geo_data = []

# Loop through each city and country code in the cities list
for country_code, city_name in cities:
    # Get latitude and longitude for the specified city and country
    latitude, longitude = get_lat_lon(country_code, city_name)
    
    # Append the country code, city name, latitude, and longitude as a tuple to geo_data
    geo_data.append((country_code, city_name, latitude, longitude))

# Display the collected geographic data
geo_data


[('GB', 'London', 51.50853, -0.12574),
 ('SG', 'Singapore', 1.28967, 103.85007),
 ('EG', 'Cairo', 30.06263, 31.24967),
 ('AR', 'Buenos Aires', -34.61315, -58.37723),
 ('IN', 'Mumbai', 19.07283, 72.88261)]

## 3. Historical Rainfall 🌧️

I have already created the `get_historical_data` function, stored in `data_collection_utils.py`, so let's test it as well.

In [6]:
get_historical_data("SG", "Singapore", "2023-01-01", "2023-01-02")

{'time': ['2023-01-01', '2023-01-02'],
 'precipitation_sum': [0.0, 3.7],
 'precipitation_hours': [0.0, 5.0]}
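The result is column-oriented: three parallel lists that share the same index. When a per-day view is more convenient, the columns can be zipped into one record per date. The sketch below uses only the sample values shown above; the helper name `to_daily_records` is hypothetical and is not part of `data_collection_utils.py`.

```python
# Sample values copied from the output above.
sample = {
    "time": ["2023-01-01", "2023-01-02"],
    "precipitation_sum": [0.0, 3.7],
    "precipitation_hours": [0.0, 5.0],
}


def to_daily_records(data):
    """Turn a dict of parallel lists into a list with one dict per day."""
    keys = list(data)
    # zip(*columns) walks the parallel lists row by row.
    return [dict(zip(keys, row)) for row in zip(*data.values())]


records = to_daily_records(sample)
for record in records:
    print(record)
```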

Next, I create a dictionary to collect and store the historical rainfall data for each city, keyed by city name.

In [7]:
# Create an empty dictionary to store historical rainfall data for each city
historical_rainfall = {}

# Loop through each city and country code in geo_data
for country_code, city_name, _, _ in geo_data:
    # Retrieve historical rainfall data for the city (using default dates)
    rainfall = get_historical_data(country_code, city_name)
    
    # Store the rainfall data in a dictionary with the city name as the key
    historical_rainfall[city_name] = rainfall
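If a single request fails, the loop above stops and the remaining cities are never fetched. One possible variation, sketched here under assumptions, records failures separately and carries on. The names `collect_rainfall` and `fake_fetch` are hypothetical; in the notebook, `get_historical_data` would be passed as `fetch`.

```python
def collect_rainfall(cities, fetch):
    """Call fetch(country_code, city_name) for each city.

    Returns (results, failures), both keyed by city name.
    """
    results, failures = {}, {}
    for country_code, city_name in cities:
        try:
            results[city_name] = fetch(country_code, city_name)
        except Exception as exc:  # deliberately broad: this is only a sketch
            failures[city_name] = str(exc)
    return results, failures


# Hypothetical stand-in for get_historical_data so the sketch runs offline.
def fake_fetch(country_code, city_name):
    if city_name == "Atlantis":
        raise ValueError("city not found")
    return {"time": [], "precipitation_sum": [], "precipitation_hours": []}


results, failures = collect_rainfall(
    [("SG", "Singapore"), ("XX", "Atlantis")], fake_fetch
)
print(sorted(results), failures)
```

Passing the fetch function as an argument also makes the loop easy to test without network access.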


In [8]:
print(geo_data)

[('GB', 'London', 51.50853, -0.12574), ('SG', 'Singapore', 1.28967, 103.85007), ('EG', 'Cairo', 30.06263, 31.24967), ('AR', 'Buenos Aires', -34.61315, -58.37723), ('IN', 'Mumbai', 19.07283, 72.88261)]


A few checks to confirm it worked:

In [9]:
historical_rainfall.keys() # Check for the keys in the historical_rainfall dictionary

dict_keys(['London', 'Singapore', 'Cairo', 'Buenos Aires', 'Mumbai'])

In [10]:
# Loop through each city and its rainfall data in the historical_rainfall dictionary
for city, rainfall in historical_rainfall.items():
    # Each value is a dict (time, precipitation_sum, precipitation_hours), so len() counts its keys
    print(f"The value for key {city:12s} is a dict with {len(rainfall)} keys")


The value for key London       is a dict with 3 keys
The value for key Singapore    is a dict with 3 keys
The value for key Cairo        is a dict with 3 keys
The value for key Buenos Aires is a dict with 3 keys
The value for key Mumbai       is a dict with 3 keys
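Each value holds the three keys seen in the earlier sample output (`time`, `precipitation_sum`, `precipitation_hours`), so `len(rainfall)` counts keys rather than days. A slightly stronger check, sketched below, confirms that every series has one entry per date. The helper name `series_lengths_match` is hypothetical, and the sample values are the ones shown earlier for Singapore.

```python
def series_lengths_match(record):
    """Return True if every series in the record has as many entries as 'time'."""
    n_days = len(record["time"])
    return all(len(values) == n_days for values in record.values())


# Sample values copied from the earlier get_historical_data output.
sample = {
    "time": ["2023-01-01", "2023-01-02"],
    "precipitation_sum": [0.0, 3.7],
    "precipitation_hours": [0.0, 5.0],
}

print(series_lengths_match(sample))
```

In the notebook, the same function could be applied to each value of `historical_rainfall` before saving.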


**This code saves the `historical_rainfall` data to a JSON file:**

In [11]:
with open('../data/multicity_historical.json', 'w') as file:
    json.dump(historical_rainfall, file)
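To confirm that a saved file can be read back unchanged, a round-trip check along these lines may help. This is a sketch under assumptions: it writes to a temporary directory instead of `../data/`, and it uses a small stand-in dictionary built from the Singapore sample shown earlier, not the full `historical_rainfall`.

```python
import json
import os
import tempfile

# Small stand-in for historical_rainfall, using the sample values shown earlier.
data = {
    "Singapore": {
        "time": ["2023-01-01", "2023-01-02"],
        "precipitation_sum": [0.0, 3.7],
        "precipitation_hours": [0.0, 5.0],
    }
}

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "multicity_historical.json")

    # Write the dictionary, then read it straight back.
    with open(path, "w", encoding="utf-8") as file:
        json.dump(data, file)
    with open(path, encoding="utf-8") as file:
        reloaded = json.load(file)

# The reloaded structure should equal what was written.
print(reloaded == data)
```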