<font style='font-size:1.5em'>**💻 Week 03 Lab (✅ Solutions)** </font>

<font style='font-size:1.2em'>LSE DS105A – Data for Data Science (2024/25)</font>

**DATE:** 21 October 2024

**AUTHORS:**  

- Jon

**OBJECTIVE**: Provide a possible solution to the Week 03 Lab using lots of Python functions.


<div style="background-color: #fff; margin-top:1em;border-radius: 10px; box-shadow: 0 4px 8px rgba(0, 0, 0, 0.1); padding: 0.75em; flex: 1 1 calc(45% - 20px);align-items:middle;box-sizing: border-box;font-size:0.9em;width:300px">

📝 **Note:** there won't be any solutions for the Bonus Task or Challenge Task. 

But we will be more than happy to assist you in solving those. Just post your questions to Slack or attend a support session.

</div>

---

In [1]:
import os
import json

import requests

import pandas as pd

<details style="width:70%;font-size:0.9em;border: 1px solid #aaa;border-radius: 4px;padding: .5em;margin-left:1.5em"><summary style="    font-weight: bold;margin: -.5em -.5em 0;padding: .5em;border-bottom: 1px solid #aaa;">🔵 Click here if you got an error with the cell above</summary>

If the cell above throws an error when you run it, it's because you need to install additional Python libraries.

In that case, go to the menu and click "Terminal" -> "New Terminal". Then, on the terminal run:

```bash
pip install requests numpy pandas lets-plot
```

OR

```bash
python -m pip install requests numpy pandas lets-plot
```

Wait for it to complete, then come back here (you can close the Terminal window), click "Restart" at the top of this notebook and try again.

⭐ Pro-Tip: Alternatively, you can run Terminal commands from here! Open a new Python cell below and add a `!` to your prompt, like this:

```bash
! pwd
```

</details>

# 1. ⚙️ Setup

## 1.1 Data Collection Strategy

My focus here is just to collect daily minimum and maximum temperature data for the city stored in `selected_city` (defined in Section 1.2) over the year 2023. I will analyse the data later in NB02.

(_I will not work with forecast data in this notebook._)


For this, I will use the [Open-Meteo API](https://open-meteo.com/en/docs):

| Endpoint         | URL starts with                      |
|------------------|--------------------------------------|
| [Historical Weather Data](https://open-meteo.com/en/docs/historical-weather-api) | `https://archive-api.open-meteo.com/v1/archive`   |


<div style="background-color:white;padding:0.5em;border-radius:0.5em;font-family: monospace;border: 1px solid #eda291;font-size:1.05em;width:600px;margin-top:20px" id="yui_3_17_2_1_1729500472157_68">
<p style="margin-top: 0;margin-bottom: 1rem;">💽 <strong>DATA SPECIFICATION CARD:</strong></p>
<ul style="margin-top: 0;margin-bottom: 1rem;padding-left: 2rem;">
<li><strong>City:</strong> A selected city from the <code style="background-color: #f6f6f6;padding: .2em;font-size: 0.875em;color: #9753b8;white-space: pre-wrap;border-radius: .25rem;word-wrap: break-word;font-family:'SFMono-Regular','Menlo';direction: ltr;unicode-bidi: bidi-override;">world_cities.csv</code> file.</li>
<li><strong>Date Period:</strong> Every single day of the year 2023.</li>
<li><strong>Variables:</strong> Daily minimum and maximum temperatures.</li>
</ul>
</div>


## 1.2 Definitions


In [2]:
selected_country = 'GB'
selected_city    = 'London'

## 1.3 Helpful functions

If I create functions, I will place them here at the beginning of the notebook, so that I can reuse them later.


<div style="font-size:0.75em;margin-left:0.5em;width:60%;border: 1px solid #ddd;padding: .5em">

Note from Jon: I already left a function here to read the world cities data and return the latitude and longitude of a city. You can use it to get the coordinates of the city you want to analyse.

</div>

In [3]:
def get_lat_lon(country_code, city):
    
    filepath = '../data/world_cities.csv'
    world_cities = pd.read_csv(filepath)

    # This is how we filter data in pandas
    city_data = world_cities[(world_cities['country'] == country_code) & 
                             (world_cities['name'] == city)]
    
    # Convert the data to a list of dictionaries
    city_data = city_data.to_dict('records')
    
    if len(city_data) == 0:
        raise ValueError(f"No records found for {city}, {country_code} in {filepath}")

    latitude = city_data[0]['lat']
    longitude = city_data[0]['lng']

    return latitude, longitude

I wrote a function to construct the URL for me. This way I can call it anytime inside or outside another function:

In [4]:
def build_url(latitude: float, longitude: float, start_date: str, end_date: str):
    base_historical_url = "https://archive-api.open-meteo.com/v1/era5?"
    params_lat_long     = "latitude=" + str(latitude) + "&longitude="  + str(longitude)
    params_date         = "&start_date=" + start_date + "&end_date=" + end_date

    # I want the daily min and max temperatures
    # Setting the timezone to Europe/London
    params_others       = "&daily=temperature_2m_max,temperature_2m_min&timezone=Europe%2FLondon"

    final_url = base_historical_url + params_lat_long + params_date + params_others

    return final_url
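
As a possible alternative to concatenating strings by hand, the same URL can be assembled from a dictionary of query parameters with the standard library. This is only a sketch: `build_url_encoded()` and `BASE_URL` are hypothetical names that are not part of the lab, and the `timezone` argument is an assumption about how one might make the function reusable for other cities.

```python
from urllib.parse import urlencode

# Hypothetical constant: the same endpoint string that build_url() uses
BASE_URL = "https://archive-api.open-meteo.com/v1/era5"


def build_url_encoded(latitude, longitude, start_date, end_date,
                      timezone="Europe/London"):
    """Build the request URL from a dictionary of query parameters."""
    params = {
        "latitude": latitude,
        "longitude": longitude,
        "start_date": start_date,
        "end_date": end_date,
        "daily": "temperature_2m_max,temperature_2m_min",
        "timezone": timezone,
    }
    # safe="," keeps the comma between the two variables unescaped,
    # while the "/" in the timezone is still encoded as %2F
    return BASE_URL + "?" + urlencode(params, safe=",")


print(build_url_encoded(51.50853, -0.12574, "2023-01-01", "2023-12-31"))
```

With the London coordinates used in this notebook, this should produce the same string as `build_url()`. The benefit is that special characters are escaped automatically.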

Below is an adapted version of the `get_historical_data()` function from the ✅ [W03 Formative Exercise solutions](https://moodle.lse.ac.uk/mod/page/view.php?id=1538617).

I tweaked it so I can reuse the useful functions above:

- `get_lat_lon()`

- `build_url()`

Because I am not sure the same dictionary keys would work here, this version of the code returns the full dictionary parsed from the JSON response. 

I still have to extract the minimum and maximum temperatures from it afterwards.

In [5]:
def get_historical_data(country_code, city_name, start_date, end_date):
    """
    Retrieves the JSON returned by the Open-Meteo Historical Weather API.

    Parameters:
        country_code (str): The country code of the location.
        city_name (str): The name of the city.
        start_date (str): The start date of the historical data in the format 'YYYY-MM-DD'.
        end_date (str): The end date of the historical data in the format 'YYYY-MM-DD'.

    Returns:
        dict: The JSON response parsed into a dictionary
    """

    latitude, longitude = get_lat_lon(country_code, city_name)

    url = build_url(latitude, longitude, start_date, end_date)

    response = requests.get(url)

    historical_data = response.json()
    return historical_data


In [6]:
# Demonstrate that this function works as intended
sample_historical_data = get_historical_data('GB', 'London', '2023-10-13', '2023-10-19')
print(f"The function returned an object of type: {type(sample_historical_data)}")
print(f"This dictionary has the following keys: {sample_historical_data.keys()}")
print("The information I want is under the following keys:")
print(f"  sample_historical_data['daily']['time'] \t\t\t- Sample: {sample_historical_data['daily']['time'][0:3]}")
print(f"  sample_historical_data['daily']['temperature_2m_min'] \t- Sample: {sample_historical_data['daily']['temperature_2m_min'][0:3]}")
print(f"  sample_historical_data['daily']['temperature_2m_max'] \t- Sample: {sample_historical_data['daily']['temperature_2m_max'][0:3]}")

The function returned an object of type: <class 'dict'>
This dictionary has the following keys: dict_keys(['latitude', 'longitude', 'generationtime_ms', 'utc_offset_seconds', 'timezone', 'timezone_abbreviation', 'elevation', 'daily_units', 'daily'])
The information I want is under the following keys:
  sample_historical_data['daily']['time'] 			- Sample: ['2023-10-13', '2023-10-14', '2023-10-15']
  sample_historical_data['daily']['temperature_2m_min'] 	- Sample: [9.8, 6.7, 4.3]
  sample_historical_data['daily']['temperature_2m_max'] 	- Sample: [20.5, 13.3, 10.5]
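
Since `get_historical_data()` returns the whole dictionary, a small helper could pull out the three lists and fail early if something looks wrong. The sketch below is not part of the lab: `extract_daily_temps()` is a hypothetical name, and the check on an `"error"` key assumes the API reports problems in a JSON object of the form `{"error": true, "reason": "..."}`. The sample payload is hand-made to mimic the structure printed above.

```python
def extract_daily_temps(payload):
    """Return (dates, min_temps, max_temps) from a response dictionary."""
    # Assumption: a failed request comes back as {"error": true, "reason": "..."}
    if payload.get("error"):
        raise ValueError(f"API error: {payload.get('reason', 'unknown reason')}")

    daily = payload["daily"]
    dates = daily["time"]
    min_temps = daily["temperature_2m_min"]
    max_temps = daily["temperature_2m_max"]

    # The three lists must line up day by day
    if not (len(dates) == len(min_temps) == len(max_temps)):
        raise ValueError("The daily lists have different lengths")

    return dates, min_temps, max_temps


# Hand-made payload that imitates the structure of the real response
sample = {
    "daily": {
        "time": ["2023-10-13", "2023-10-14", "2023-10-15"],
        "temperature_2m_min": [9.8, 6.7, 4.3],
        "temperature_2m_max": [20.5, 13.3, 10.5],
    }
}

dates, min_temps, max_temps = extract_daily_temps(sample)
print(dates[0], min_temps[0], max_temps[0])
```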


## 1.4 Get latitude and longitude

In [7]:
latitude, longitude = get_lat_lon(selected_country, selected_city)

Checking that the function works:

In [8]:
print(f"The latitude & longitude of {selected_city} ({selected_country}) are: ({latitude}, {longitude})")

The latitude & longitude of London (GB) are: (51.50853, -0.12574)


# 2. Collecting Data

**Checking the URL**

I moved the code I originally wrote here to a function `build_url()` in Section 1.3 of this notebook.

This way, I can confirm that I get the URL by simply calling the function:

In [9]:
url = build_url(latitude, longitude, '2023-01-01', '2023-12-31')
url

'https://archive-api.open-meteo.com/v1/era5?latitude=51.50853&longitude=-0.12574&start_date=2023-01-01&end_date=2023-12-31&daily=temperature_2m_max,temperature_2m_min&timezone=Europe%2FLondon'

<div style="margin-left:2.5em">

☝️

When I click on it, I can see from my browser that I built the URL correctly. The data it returns is precisely what I need.

I could call `requests.get(url)` to obtain the response, whose body is JSON text encoded as UTF-8 bytes.

</div>
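
To make the "bytes, then text, then dictionary" idea concrete, here is a minimal sketch that uses only the standard library. The `raw_body` value is hand-written and heavily shortened (the temperature is made up), standing in for what `response.content` would hold. With `requests`, `response.text` gives the decoded string and `response.json()` performs the parsing step.

```python
import json

# Hand-written bytes that imitate a (much shortened) response body
raw_body = b'{"daily": {"time": ["2023-01-01"], "temperature_2m_min": [7.5]}}'

text = raw_body.decode("utf-8")   # bytes -> str
parsed = json.loads(text)         # str   -> dict

print(type(raw_body), type(text), type(parsed))
print(parsed["daily"]["temperature_2m_min"])
```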

**Checking the JSON response**

I added a function `get_historical_data()` to Section 1.3 to handle the request for me.

With this function, I don't need to provide the geographical coordinates myself; I just pass the country code and the name of the city. 

Internally, the function calls the other helpful functions I wrote as needed.

In [10]:
json_data = get_historical_data(selected_country, selected_city, '2023-01-01', '2023-12-31') 

I've already checked that this function works (in Section 1.3), so I'll just double-check that the lengths of the lists match the date range I provided:

In [11]:
dates    = json_data['daily']['time']
min_temp = json_data['daily']['temperature_2m_min']
max_temp = json_data['daily']['temperature_2m_max']

# There should be 365 elements in each of these lists
len(dates) == len(min_temp) == len(max_temp) == 365

True
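
The number 365 is hard-coded in the check above. If the date range changes (or covers a leap year), the expected length could be computed from the dates instead. A minimal sketch, where `expected_num_days()` is a hypothetical helper that is not part of the lab:

```python
from datetime import date


def expected_num_days(start_date, end_date):
    """Number of days between two 'YYYY-MM-DD' dates, both ends included."""
    start = date.fromisoformat(start_date)
    end = date.fromisoformat(end_date)
    return (end - start).days + 1


print(expected_num_days("2023-01-01", "2023-12-31"))  # 365
print(expected_num_days("2024-01-01", "2024-12-31"))  # 366 (leap year)
```

The comparison would then read `len(dates) == expected_num_days('2023-01-01', '2023-12-31')`.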

## Reshaping the data & saving to file

(this addresses step 6 of Part II of the lab)


In [12]:
json_data = get_historical_data(selected_country, selected_city, '2023-01-01', '2023-12-31') 

final_data = {
    "country"  : selected_country,
    "city"     : selected_city,
    "date"     : json_data['daily']['time'],
    "min_temp" : json_data['daily']['temperature_2m_min'],
    "max_temp" : json_data['daily']['temperature_2m_max']
}

with open('../data/open-meteo/daily_temp.json', 'w') as file:
    json.dump(final_data, file)

--- 

All done.

Now I don't need to download the data every time I want to run an analysis. I can simply read the resulting JSON file from another notebook.
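
As a rough illustration of that round trip, the sketch below writes a tiny dictionary in the same shape as `final_data` and reads it back. The values are made up, and a temporary folder stands in for `../data/open-meteo/` so the example runs anywhere.

```python
import json
import os
import tempfile

# Made-up values in the same shape as final_data
example = {
    "country": "GB",
    "city": "London",
    "date": ["2023-01-01", "2023-01-02"],
    "min_temp": [1.0, 2.0],
    "max_temp": [5.0, 6.0],
}

# A temporary folder stands in for ../data/open-meteo/
with tempfile.TemporaryDirectory() as folder:
    filepath = os.path.join(folder, "daily_temp.json")

    with open(filepath, "w") as file:
        json.dump(example, file)

    with open(filepath, "r") as file:
        loaded = json.load(file)

# One (date, min, max) tuple per day, convenient for looping later
rows = list(zip(loaded["date"], loaded["min_temp"], loaded["max_temp"]))

print(loaded == example)
print(rows[0])
```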

**\#TODO** | Things I would do if I had more time here:

- Further edit `get_historical_data()` so that it returns data in the `final_data` format
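
One way this TODO might be approached is to move the reshaping step into its own small function, which `get_historical_data()` could then call before returning. This is only a sketch: `to_final_format()` is a hypothetical name, and the payload below is hand-made with values taken from the sample output in Section 1.3.

```python
def to_final_format(payload, country_code, city_name):
    """Reshape a response dictionary into the final_data layout."""
    daily = payload["daily"]
    return {
        "country": country_code,
        "city": city_name,
        "date": daily["time"],
        "min_temp": daily["temperature_2m_min"],
        "max_temp": daily["temperature_2m_max"],
    }


# Hand-made payload with the same keys as the real response
payload = {
    "daily": {
        "time": ["2023-10-13", "2023-10-14"],
        "temperature_2m_min": [9.8, 6.7],
        "temperature_2m_max": [20.5, 13.3],
    }
}

reshaped = to_final_format(payload, "GB", "London")
print(list(reshaped.keys()))
```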