<font style='font-size:1.5em'>**✔️ Week 03 Formative Exercise Solution** </font>

<font style='font-size:1.2em'>LSE DS105A – Data for Data Science (2024/25)</font>


<div style="color: #333333; background-color:rgba(93, 158, 188, 0.15); border-radius: 10px; box-shadow: 0 4px 8px rgba(0, 0, 0, 0.1); padding: 20px; margin: 10px; flex: 1 1 calc(45% - 20px);min-width: 250px;max-width: 350px;align-items:top;min-height: calc(45% - 20px); box-sizing: border-box;font-size:0.9em;">

This is a notebook of solutions to 📝 **W03 Formative Exercise** (due 16 Oct 2024) of the course **DS105A - Data for Data Science** at the [LSE Data Science Institute](https://lse.ac.uk/dsi).

</div>


**AUTHORS:**  Dr. [Jon Cardoso-Silva](https://jonjoncardoso.github.io)

**DEPARTMENT:** [LSE Data Science Institute](https://lse.ac.uk/dsi)

**OBJECTIVE**: Demonstrate how one would use the building blocks of the Python programming language to create a solution to the W03 Formative Exercise.

---

In [1]:
import os
import json

import requests

from datetime import datetime

# Create the data folders if they don't exist
# (the JSON files in this notebook are saved under ../data/open-meteo)
os.makedirs('../data/open-meteo', exist_ok=True)

<details style="width:70%;font-size:0.9em;border: 1px solid #aaa;border-radius: 4px;padding: .5em;margin-left:1.5em"><summary style="    font-weight: bold;margin: -.5em -.5em 0;padding: .5em;border-bottom: 1px solid #aaa;">🔵 Click here if you got an error with the cell above</summary>

If the cell above throws an error when you run it, it's because you need to install additional Python libraries.

In that case, go to the menu and click "Terminal" -> "New Terminal". Then, on the terminal run:

```bash
pip install requests numpy pandas lets-plot
```

OR

```bash
python -m pip install requests numpy pandas lets-plot
```

Wait for it to complete, then come back here (you can close the Terminal window), click "Restart" at the top of this notebook and try again.

⭐ Pro-Tip: Alternatively, you can run Terminal commands from here! Open a new Python cell below and add a `!` to your prompt, like this:

```bash
! pwd
```

</details>

# 📃 Data Collection Strategy

In this notebook, I will focus on two endpoints of interest on the [Open-Meteo API](https://open-meteo.com/en/docs):

| Endpoint         | URL starts with                      |
|------------------|--------------------------------------|
| [Weather Forecast](https://open-meteo.com/en/docs)                            | `https://api.open-meteo.com/v1/forecast`          |
| [Historical Weather Data](https://open-meteo.com/en/docs/historical-weather-api) | `https://archive-api.open-meteo.com/v1/archive`   |

I will compare the weather forecast for the coming week with the historical data for the same week last year.

I will repeat the comparison for London, UK and Paris, FR. 

# 1. Helpful functions

<div style="width:50%;font-size:0.9em;background-color:#EED55544;padding:0.5rem;font-weight:350;">

⚠️ **WARNING**

What you see here is a version of the notebook that has gone through a lot of cleaning and refactoring before it reached its clean and tidy state.

No one starts out knowing precisely how to write the functions below. We go through a process of exploration and trial and error until we achieve the desired code. Only then do we wrap our code into loops and functions.

**Understanding the process is more important than the final solution!** 

For an idea of how one would go about writing a solution to this exercise, check out the recording of the 🧑‍🏫 **W03 Lecture**. For more details, check out the ✅ **W03 Solutions** page on Moodle/website.

</div>

## 1.1 Store latitude and longitude of the capitals

I went beyond what the instructions asked and added a function that reads `world_cities.csv` and converts it to a dictionary. This way, I don't need to look up the latitude and longitude of each city manually. I can simply pass the resulting dictionary to the other data collection functions.

The dictionary is structured as follows (illustrative values):

```python
{
    "GB": {"London": ["51.5074", "-0.1278"],
           "Edinburgh": ["55.9533", "-3.1883"],
           ...
    },
    "FR": {"Paris": ["48.8566", "2.3522"],
           "Marseille": ["43.2965", "5.3698"],
           ...
    },
    ...
}
```

That is, it is a nested dictionary: the country code is the first key, the city name is the second key, and the value is a `[latitude, longitude]` list. Both coordinates are kept as strings, exactly as they were read from the file.
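
As a minimal sketch of how rows could map onto this nested structure, the snippet below builds the same shape from a small in-memory CSV using Python's `csv` module. The header names (`country`, `name`, `lat`, `lng`), the sample rows, and the function name `build_nested_dict()` are assumptions made for illustration; the real `world_cities.csv` may use a different header.

```python
import csv
import io

# Hypothetical sample in the same four-column layout (header names are assumed)
SAMPLE_CSV = """country,name,lat,lng
GB,London,51.50853,-0.12574
FR,Paris,48.85341,2.3488
GB,Edinburgh,55.9533,-3.1883
"""

def build_nested_dict(csv_text):
    """Return {country_code: {city: [lat, lon]}} from CSV text with a header row."""
    output = {}
    reader = csv.reader(io.StringIO(csv_text))
    next(reader)  # skip the header row
    for row in reader:
        if not row:
            continue  # ignore blank lines
        country_code, city, lat, lon = row
        # setdefault creates the inner dictionary the first time a country appears
        output.setdefault(country_code, {})[city] = [lat, lon]
    return output

nested = build_nested_dict(SAMPLE_CSV)
print(nested["GB"]["London"])
```

Using `setdefault()` folds the "have we seen this country before?" check into a single line, and the coordinates stay as strings because nothing converts them.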


In [2]:
# Pro-tip: eventually, once we start using pandas, our life will be so much simpler.
#          All of the code below will be replaced by a single pd.read_csv() line
#          (See 💻 W03 Lab's template notebook for an example)
#          But it's important to understand how things work under the hood!

def read_world_cities(filepath):
    """
    Reads a CSV file containing world cities data and returns a dictionary
    where the keys are country codes and the values are dictionaries of cities
    with their corresponding latitude and longitude.

    Parameters:
        filepath (str): The path to the CSV file.

    Returns:
        dict: A dictionary where the keys are country codes and the values are
              dictionaries of cities with their corresponding latitude and longitude.
    """

    # Reuse code from past weeks to read a CSV file
    with open(filepath, 'r', encoding='utf-8') as file:
        world_cities = file.read().splitlines()

    # Each line has four elements:
    #    - country code (2 letters)
    #    - city
    #    - latitude
    #    - longitude
    # Skip the header (first line) and any blank lines
    lines = [line.split(',') for line in world_cities[1:] if line.strip()]

    # Start with empty dictionary
    output = {}

    for line in lines:
        country_code = line[0]
        city = line[1]
        lat  = line[2]
        lon  = line[3]

        # Create a dictionary for this city
        line_dict = {city: [lat, lon]}

        if country_code in output:
            # If we've seen this country code before:
            output[country_code].update(line_dict)
        else:
            # Else, create the country code key
            output[country_code] = line_dict
            

    return output

In [3]:
world_cities = read_world_cities('../data/world_cities.csv')

## 1.2 Get latitude and longitude of any city

I added a function to navigate the dictionary for me so that I can get the latitude and longitude of any city I want.

In [4]:
def get_lat_long(country_code, city_name, world_cities):
    """
    Retrieves the latitude and longitude of a given city in a specific country.

    Parameters:
        country_code (str): The country code of the city.
        city_name (str): The name of the city.
        world_cities (dict): A dictionary containing city data for different countries.

    Returns:
        tuple: A tuple containing the latitude and longitude of the city.
    """

    city_data = world_cities[country_code][city_name]
    return city_data[0], city_data[1]

In [5]:
london_latitude, london_longitude = get_lat_long('GB', 'London', world_cities)

print(f"London's latitude and longitude are: {london_latitude}, {london_longitude}")

London's latitude and longitude are: 51.50853, -0.12574
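
Country codes are easy to mix up (the lookup above uses `GB`, not `UK`), and a bare `KeyError` does not say which key was wrong. As a sketch, a hypothetical variant named `get_lat_long_checked()` could raise a more descriptive error. The function name and the error messages are assumptions for illustration, not part of the exercise.

```python
def get_lat_long_checked(country_code, city_name, world_cities):
    """Like get_lat_long(), but with descriptive errors for unknown keys."""
    if country_code not in world_cities:
        raise KeyError(f"Unknown country code: {country_code!r}")
    cities = world_cities[country_code]
    if city_name not in cities:
        raise KeyError(f"City {city_name!r} not found under country code {country_code!r}")
    latitude, longitude = cities[city_name]
    return latitude, longitude

# Tiny stand-in for the real dictionary, using the values printed above
sample_cities = {"GB": {"London": ["51.50853", "-0.12574"]}}

print(get_lat_long_checked("GB", "London", sample_cities))

try:
    get_lat_long_checked("UK", "London", sample_cities)
except KeyError as error:
    print(f"Lookup failed: {error}")
```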


# 2. Collect Temperature Forecast

At first, my function took `latitude` and `longitude` as parameters. Now that I have introduced `get_lat_long()`, I can simply pass the country code and city name to `get_forecast_data()`:

In [6]:
def get_forecast_data(country_code, city_name, world_cities):
    """
    Retrieves the forecasted temperatures for a given country code and city name.

    Parameters:
        country_code (str): The country code of the location.
        city_name (str): The name of the city.
        world_cities (dict): A dictionary containing world cities data.

    Returns:
        list: A list of 168 hourly forecasted temperatures (in Celsius)
              starting from today's date at 00:00.
    """

    latitude, longitude = get_lat_long(country_code, city_name, world_cities)

    base_forecast_url = "https://api.open-meteo.com/v1/forecast?"
    params_lat_long = "latitude=" + str(latitude) + "&longitude="  + str(longitude)
    params_others = "&hourly=temperature_2m"

    final_url = base_forecast_url + params_lat_long + params_others

    response = requests.get(final_url)

    forecast_data = response.json()
    forecast_temperatures = forecast_data['hourly']['temperature_2m']
    return forecast_temperatures

# Demonstrate that the function works as intended
london_forecast = get_forecast_data('GB', 'London', world_cities)
print(f"The function returned a list of {len(london_forecast)} elements.")
print(f"Head of the list: {london_forecast[0:10]}")
print(f"Tail of the list: {london_forecast[-10:]}")

The function returned a list of 168 elements.
Head of the list: [8.3, 8.3, 8.4, 8.2, 8.1, 8.2, 8.1, 8.1, 8.2, 8.3]
Tail of the list: [3.9, 3.9, 3.4, 2.5, 1.7, 1.3, 1.0, 0.8, 0.4, 0.1]
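
The query string in `get_forecast_data()` is assembled by hand with `+`. As an alternative sketch, the standard library's `urllib.parse.urlencode()` can build it from a dictionary, which also takes care of escaping special characters. The helper name `build_forecast_url()` is hypothetical; the base URL and parameter names are the ones already used in `get_forecast_data()`.

```python
from urllib.parse import urlencode

BASE_FORECAST_URL = "https://api.open-meteo.com/v1/forecast"

def build_forecast_url(latitude, longitude):
    """Build the forecast URL from a dictionary of query parameters."""
    params = {
        "latitude": latitude,
        "longitude": longitude,
        "hourly": "temperature_2m",
    }
    return BASE_FORECAST_URL + "?" + urlencode(params)

url = build_forecast_url("51.50853", "-0.12574")
print(url)
```

`requests.get()` also accepts such a dictionary directly through its `params` argument, and calling `response.raise_for_status()` before `response.json()` would surface HTTP errors early instead of failing later on a missing `'hourly'` key.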


In [7]:
temperatures = {
    'London': get_forecast_data('GB', 'London', world_cities),
    'Paris' : get_forecast_data('FR', 'Paris', world_cities)
}

with open("../data/open-meteo/forecasted_temperatures.json", 'w', encoding="UTF-8") as file:
    json.dump(temperatures, file)

# 3. Collect historical data

Similarly, I went beyond the requirements of the exercise and replaced the `latitude` and `longitude` parameters from the instructions with the more intuitive `country_code` and `city_name` parameters.

In [8]:
def get_historical_data(country_code, city_name, start_date, end_date, world_cities):
    """
    Retrieves the historical temperatures for a given country code and city name.

    Parameters:
        country_code (str): The country code of the location.
        city_name (str): The name of the city.
        start_date (str): The start date of the historical data in the format 'YYYY-MM-DD'.
        end_date (str): The end date of the historical data in the format 'YYYY-MM-DD'.
        world_cities (dict): A dictionary containing world cities data.

    Returns:
        list: A list of historical temperatures (in Celsius) for the given date range, in hourly intervals.
    """

    latitude, longitude = get_lat_long(country_code, city_name, world_cities)

    base_historical_url = "https://archive-api.open-meteo.com/v1/era5?"
    params_lat_long     = "latitude=" + str(latitude) + "&longitude="  + str(longitude)
    params_date         = "&start_date=" + str(start_date) + "&end_date=" + str(end_date)
    params_others       = "&hourly=temperature_2m"

    final_url = base_historical_url + params_lat_long + params_date + params_others

    response = requests.get(final_url)

    historical_data = response.json()
    historical_temperatures = historical_data['hourly']['temperature_2m']
    return historical_temperatures

# Demonstrate that this function works as intended
london_historical = get_historical_data('GB', 'London', '2023-10-13', '2023-10-19', world_cities)
print(f"The function returned a list of {len(london_historical)} elements.")
print(f"Head of the list: {london_historical[0:10]}")
print(f"Tail of the list: {london_historical[-10:]}")

The function returned a list of 168 elements.
Head of the list: [16.1, 16.6, 17.3, 17.0, 17.6, 17.8, 18.5, 19.0, 19.3, 18.3]
Tail of the list: [17.7, 17.3, 16.5, 15.2, 15.2, 14.9, 14.8, 14.4, 14.4, 14.3]


**Note to self:** Remember to edit `start_date` and `end_date` so that they match the forecast dates, shifted back one year.

\#TODO: In the future, I want `get_historical_data()` to calculate the dates automatically, based on today's date.
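
One possible way to tackle this TODO is sketched below, assuming the historical window should start on the same calendar day one year earlier and span seven days. The helper name `same_week_last_year()` and its `num_days` parameter are hypothetical, and the fallback for 29 February is just one reasonable choice.

```python
from datetime import date, timedelta

def same_week_last_year(reference_date, num_days=7):
    """Return (start_date, end_date) as 'YYYY-MM-DD' strings, one year before reference_date."""
    try:
        start = reference_date.replace(year=reference_date.year - 1)
    except ValueError:
        # 29 February has no counterpart in a non-leap year; fall back to 28 February
        start = reference_date.replace(year=reference_date.year - 1, day=28)
    end = start + timedelta(days=num_days - 1)
    return start.isoformat(), end.isoformat()

print(same_week_last_year(date(2024, 10, 20)))
```

Calling `same_week_last_year(date.today())` would then give the two strings to pass as `start_date` and `end_date`.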

In [10]:
temperatures = {
    'London': get_historical_data('GB', 'London', '2023-10-20', '2023-10-26', world_cities),
    'Paris' : get_historical_data('FR', 'Paris',  '2023-10-20', '2023-10-26', world_cities)
}

with open("../data/open-meteo/historical_temperatures.json", 'w', encoding="UTF-8") as file:
    json.dump(temperatures, file)

----

**🏆 Challenge**

I will collect the same type of data as above, but for the capitals of Western European countries (given the geographical proximity):

- London, UK
- Vienna, Austria
- Brussels, Belgium
- Paris, France
- Vaduz, Liechtenstein
- Luxembourg, Luxembourg
- Monaco, Monaco
- Amsterdam, Netherlands
- Bern, Switzerland

Normally, I'd overwrite the same `.json` files I had created, but to make the distinction clearer, I will call the new ones `multicity_forecast.json` and `multicity_historical.json` for the forecasted and historical temperatures, respectively.

---


# 4. Repeat for multiple cities

Get the latitude and longitude of all selected cities.

In [11]:
# List the selected cities
# I am using tuples to store the country code and city name
cities = [
    ("GB", "London"),
    ("AT", "Vienna"),
    ("BE", "Brussels"),
    ("FR", "Paris"),
    ("LI", "Vaduz"),
    ("LU", "Luxembourg"),
    ("MC", "Monaco"),
    ("NL", "Amsterdam"),
    ("CH", "Bern")
]

**Compile latitudes and longitudes for all cities**

In [12]:
geo_data = []

for country_code, city_name in cities:
    latitude, longitude = get_lat_long(country_code, city_name, world_cities)
    geo_data.append((country_code, city_name, latitude, longitude))

geo_data

[('GB', 'London', '51.50853', '-0.12574'),
 ('AT', 'Vienna', '48.20849', '16.37208'),
 ('BE', 'Brussels', '50.85045', '4.34878'),
 ('FR', 'Paris', '48.85341', '2.3488'),
 ('LI', 'Vaduz', '47.14151', '9.52154'),
 ('LU', 'Luxembourg', '49.61167', '6.13'),
 ('MC', 'Monaco', '43.73718', '7.42145'),
 ('NL', 'Amsterdam', '52.37403', '4.88969'),
 ('CH', 'Bern', '46.94809', '7.44744')]

## 4.1 Weather Forecast

Reference date: 20 Oct 2024

In [13]:
forecast_temperatures = {}

for country_code, city_name, _, _ in geo_data:
    temperatures = get_forecast_data(country_code, city_name, world_cities)
    forecast_temperatures[city_name] = temperatures

A few checks to confirm it worked:

In [14]:
forecast_temperatures.keys()

dict_keys(['London', 'Vienna', 'Brussels', 'Paris', 'Vaduz', 'Luxembourg', 'Monaco', 'Amsterdam', 'Bern'])

In [15]:
for city, temperatures in forecast_temperatures.items():
    print(f"The value for key {city:10s} is a list of {len(temperatures)} elements")

The value for key London     is a list of 168 elements
The value for key Vienna     is a list of 168 elements
The value for key Brussels   is a list of 168 elements
The value for key Paris      is a list of 168 elements
The value for key Vaduz      is a list of 168 elements
The value for key Luxembourg is a list of 168 elements
The value for key Monaco     is a list of 168 elements
The value for key Amsterdam  is a list of 168 elements
The value for key Bern       is a list of 168 elements


In [16]:
for city, temperatures in forecast_temperatures.items():
    print(f"The head of the {city:10s} list is {temperatures[:10]}")

print("-->  I can confirm that the values are different, as expected.")

The head of the London     list is [8.3, 8.3, 8.4, 8.2, 8.1, 8.2, 8.1, 8.1, 8.2, 8.3]
The head of the Vienna     list is [2.4, 2.4, 2.5, 2.2, 2.0, 2.0, 2.0, 1.9, 2.2, 2.4]
The head of the Brussels   list is [6.3, 7.1, 7.0, 5.6, 5.9, 5.7, 5.5, 5.9, 6.7, 7.4]
The head of the Paris      list is [9.1, 8.7, 8.1, 8.7, 8.5, 8.1, 8.3, 8.0, 7.9, 8.6]
The head of the Vaduz      list is [4.7, 4.4, 4.3, 4.5, 4.6, 4.4, 4.5, 4.4, 4.5, 4.9]
The head of the Luxembourg list is [4.2, 3.5, 3.9, 3.3, 3.3, 3.2, 3.7, 3.8, 4.3, 5.1]
The head of the Monaco     list is [13.3, 13.1, 13.3, 13.5, 13.4, 13.4, 13.4, 13.5, 14.7, 16.0]
The head of the Amsterdam  list is [8.3, 8.6, 8.6, 8.9, 8.4, 9.2, 10.3, 10.3, 10.4, 11.3]
The head of the Bern       list is [4.1, 3.5, 3.4, 3.6, 3.8, 3.5, 3.2, 3.2, 3.4, 3.6]
-->  I can confirm that the values are different, as expected.
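
The printed checks above rely on reading the output by eye. As a sketch, the same checks could be expressed as a small function that reports problems automatically. The name `find_problems()` and the rule that every city should have 7 × 24 = 168 non-missing hourly values are assumptions based on the outputs shown above.

```python
def find_problems(temperatures_by_city, expected_hours=7 * 24):
    """Return a list of human-readable problems; an empty list means all checks passed."""
    problems = []
    for city, temperatures in temperatures_by_city.items():
        if len(temperatures) != expected_hours:
            problems.append(f"{city}: expected {expected_hours} values, got {len(temperatures)}")
        missing = sum(value is None for value in temperatures)
        if missing:
            problems.append(f"{city}: {missing} missing value(s)")
    return problems

# Made-up data for illustration: one complete series and one faulty series
sample = {
    "London": [8.3] * 168,
    "Paris": [9.1, None, 8.1],
}
for problem in find_problems(sample):
    print(problem)
```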


**Dump the data to a JSON file**

In [17]:
# Save it to JSON file
with open('../data/open-meteo/multicity_forecast.json', 'w', encoding='UTF-8') as file:
    json.dump(forecast_temperatures, file)

## 4.2 Historical Temperatures

Reference period: 20 Oct 2023 to 26 Oct 2023

In [18]:
historical_temperatures = {}

# TODO: Find a way to let get_historical_data() calculate the dates for me
start_date = '2023-10-20'
end_date   = '2023-10-26'

for country_code, city_name, _, _ in geo_data:
    temperatures = get_historical_data(country_code, city_name, start_date, end_date, world_cities)
    historical_temperatures[city_name] = temperatures

A few checks to confirm it worked:

In [19]:
historical_temperatures.keys()

dict_keys(['London', 'Vienna', 'Brussels', 'Paris', 'Vaduz', 'Luxembourg', 'Monaco', 'Amsterdam', 'Bern'])

In [20]:
for city, temperatures in historical_temperatures.items():
    print(f"The value for key {city:10s} is a list of {len(temperatures)} elements")

The value for key London     is a list of 168 elements
The value for key Vienna     is a list of 168 elements
The value for key Brussels   is a list of 168 elements
The value for key Paris      is a list of 168 elements
The value for key Vaduz      is a list of 168 elements
The value for key Luxembourg is a list of 168 elements
The value for key Monaco     is a list of 168 elements
The value for key Amsterdam  is a list of 168 elements
The value for key Bern       is a list of 168 elements


In [21]:
for city, temperatures in historical_temperatures.items():
    print(f"The head of the {city:10s} list is {temperatures[:10]}")

print("-->  I can confirm that the values are different, as expected.")

The head of the London     list is [14.4, 14.3, 14.2, 14.0, 14.1, 14.0, 14.0, 13.9, 14.1, 13.9]
The head of the Vienna     list is [9.6, 9.6, 9.6, 9.8, 10.4, 10.6, 10.7, 11.2, 17.3, 21.0]
The head of the Brussels   list is [15.6, 15.3, 15.4, 15.2, 15.2, 14.9, 15.1, 15.1, 15.2, 15.1]
The head of the Paris      list is [15.6, 15.0, 14.8, 14.5, 14.2, 14.1, 14.0, 13.9, 14.5, 15.3]
The head of the Vaduz      list is [17.3, 15.0, 15.1, 14.5, 13.9, 15.5, 16.7, 14.9, 16.3, 17.3]
The head of the Luxembourg list is [13.8, 13.9, 14.0, 13.8, 13.7, 13.7, 13.7, 13.6, 13.9, 14.4]
The head of the Monaco     list is [18.8, 18.1, 18.4, 18.1, 19.4, 20.0, 19.4, 19.3, 19.4, 18.2]
The head of the Amsterdam  list is [13.0, 12.4, 12.0, 11.6, 11.0, 10.3, 9.9, 10.1, 10.3, 10.2]
The head of the Bern       list is [13.1, 13.0, 12.6, 12.4, 12.2, 12.7, 12.0, 12.6, 13.2, 14.1]
-->  I can confirm that the values are different, as expected.


**Save to file:**

In [22]:
with open('../data/open-meteo/multicity_historical.json', 'w', encoding='UTF-8') as file:
    json.dump(historical_temperatures, file)
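
With both files saved, the comparison described in the data collection strategy could start from something like the sketch below. It writes two small dictionaries to a temporary folder, reads them back, and computes the difference between the mean forecast and the mean historical temperature per city. The helper name `mean_difference()` and the temporary file names are hypothetical, and the sample values are only the first ten London readings printed above, not the full 168-hour series.

```python
import json
import tempfile
from pathlib import Path
from statistics import mean

# First ten London readings from the outputs above (not the full series)
forecast = {"London": [8.3, 8.3, 8.4, 8.2, 8.1, 8.2, 8.1, 8.1, 8.2, 8.3]}
historical = {"London": [14.4, 14.3, 14.2, 14.0, 14.1, 14.0, 14.0, 13.9, 14.1, 13.9]}

def mean_difference(forecast_data, historical_data):
    """Mean forecast minus mean historical temperature for cities present in both."""
    return {
        city: mean(forecast_data[city]) - mean(historical_data[city])
        for city in forecast_data
        if city in historical_data
    }

with tempfile.TemporaryDirectory() as folder:
    forecast_path = Path(folder) / "forecast.json"
    historical_path = Path(folder) / "historical.json"

    # Save both dictionaries as JSON text
    forecast_path.write_text(json.dumps(forecast), encoding="utf-8")
    historical_path.write_text(json.dumps(historical), encoding="utf-8")

    # Read them back before the temporary folder is removed
    loaded_forecast = json.loads(forecast_path.read_text(encoding="utf-8"))
    loaded_historical = json.loads(historical_path.read_text(encoding="utf-8"))

differences = mean_difference(loaded_forecast, loaded_historical)
for city, difference in differences.items():
    print(f"{city}: {difference:+.2f} °C")
```

A negative value means the forecast is colder, on average, than the same hours one year earlier.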

---

💡 **Ideas for the future**:

- Have a single function `get_weather_data(country_code, city_name, world_cities, reference_date)` that returns a single dictionary with the weather forecast **and** historical data for a city.
- Maybe this function could even have a parameter `num_years` to get historical data for the last `num_years` years.
