<font style='font-size:1.5em'>**✔️ Week 03 Formative Exercise Solution** </font>

<font style='font-size:1.2em'>LSE DS105A – Data for Data Science (2024/25)</font>


<div style="color: #333333; background-color:rgba(93, 158, 188, 0.15); border-radius: 10px; box-shadow: 0 4px 8px rgba(0, 0, 0, 0.1); padding: 20px; margin: 10px; flex: 1 1 calc(45% - 20px);min-width: 250px;max-width: 350px;align-items:top;min-height: calc(45% - 20px); box-sizing: border-box;font-size:0.9em;">

This is a notebook of solutions to 📝 **W03 Formative Exercise** (due 16 Oct 2024) of the course **DS105A - Data for Data Science** at the [LSE Data Science Institute](https://lse.ac.uk/dsi).

</div>


**AUTHORS:**  Dr. [Jon Cardoso-Silva](https://jonjoncardoso.github.io)

**DEPARTMENT:** [LSE Data Science Institute](https://lse.ac.uk/dsi)

**OBJECTIVE**: Use the exercises to show how to use loops and functions in Python.

---

<details style="width:70%;font-size:0.9em;border: 1px solid #aaa;border-radius: 4px;padding: .5em;margin-left:1.5em"><summary style="    font-weight: bold;margin: -.5em -.5em 0;padding: .5em;border-bottom: 1px solid #aaa;">🔵 Click here if you got an error with the imports cell below</summary>

If the imports cell below throws an error when you run it, it's because you need to install additional Python libraries.

In that case, go to the menu and click "Terminal" -> "New Terminal". Then, on the terminal run:

```bash
pip install requests numpy pandas lets-plot
```

OR

```bash
python -m pip install requests numpy pandas lets-plot
```

Wait for it to complete, then come back here (you can close the Terminal window), click "Restart" at the top of this notebook and try again.

⭐ Pro-Tip: Alternatively, you can run Terminal commands from here! Open a new Python cell below and start the command with a `!`, like this:

```bash
! pwd
```

</details>

In [2]:
import json

import numpy as np
import pandas as pd

from lets_plot import *
LetsPlot.setup_html()

from IPython.display import Image

# 1. Reading Data

I want to read data from the two JSON files created in NB01 and 'merge' them into a single dictionary in the following format:

<div style="font-size:0.8em;width:40%">

```python
{
    "forecast": {
        "London": [list of temperatures],
        "Paris" : [list of temperatures]
    },
    "historical": {
        "London": [list of temperatures],
        "Paris" : [list of temperatures]
    }
}
```

</div>

In [3]:
with open('../data/open-meteo/multicity_forecast.json', 'r', encoding='utf-8') as file:
    forecast = json.load(file)

with open('../data/open-meteo/multicity_historical.json', 'r', encoding='utf-8') as file:
    historical = json.load(file)
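
The two `with` blocks above repeat the same steps, so they could be wrapped in a small function. This is only a sketch: `load_json` is a hypothetical helper name, and the files it reads here are tiny made-up stand-ins written to a temporary folder, not the real Open-Meteo files.

```python
import json
import tempfile
from pathlib import Path


def load_json(path):
    """Read a UTF-8 encoded JSON file and return the parsed Python object."""
    with open(path, 'r', encoding='utf-8') as file:
        return json.load(file)


# Made-up stand-ins for the two files created in NB01
toy_files = {
    'multicity_forecast.json': {'London': [8.3, 8.4]},
    'multicity_historical.json': {'London': [14.4, 14.3]},
}

with tempfile.TemporaryDirectory() as tmp_dir:
    # Write the toy files, then read them back with the helper
    for filename, content in toy_files.items():
        (Path(tmp_dir) / filename).write_text(json.dumps(content), encoding='utf-8')

    toy_forecast = load_json(Path(tmp_dir) / 'multicity_forecast.json')
    toy_historical = load_json(Path(tmp_dir) / 'multicity_historical.json')

print(toy_forecast)
print(toy_historical)
```

With the real data, the same helper would be called once per file path.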

We put both dictionaries in a single dictionary with two keys: "forecast" and "historical":

In [4]:
temp_comparison = {
    'forecast': forecast,
    'historical': historical
}

Check that it looks as expected:

In [5]:
temp_comparison.keys()

dict_keys(['forecast', 'historical'])

Check that the cities match in both inner dictionaries:

In [6]:
temp_comparison['forecast'].keys() == temp_comparison['historical'].keys()

True

Check that the lists have the same number of elements (using London as an example):

In [7]:
len(temp_comparison['forecast']['London']) == len(temp_comparison['historical']['London'])

True
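
The check above only looks at London. Since this exercise is about loops and functions, here is a minimal sketch of running the same check for every city. The function name `lengths_match` and the toy dictionary are my own illustrative assumptions; with the real data you would pass `temp_comparison` instead.

```python
def lengths_match(comparison):
    """Map each city to True when its forecast and historical lists have equal length."""
    result = {}
    for city, forecast_temps in comparison['forecast'].items():
        # A city missing from 'historical' is treated as having an empty list
        historical_temps = comparison['historical'].get(city, [])
        result[city] = len(forecast_temps) == len(historical_temps)
    return result


# Toy data standing in for temp_comparison (values are made up)
toy_comparison = {
    'forecast':   {'London': [8.3, 8.4], 'Paris': [9.0, 9.1]},
    'historical': {'London': [14.4, 14.3], 'Paris': [15.0]},
}

print(lengths_match(toy_comparison))
# → {'London': True, 'Paris': False}
```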

# 2. Compare Temperatures

**Q:** Are the current forecast temperatures higher than the ones for the same period last year?

In [8]:
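df_comparison = (
    pd.DataFrame(temp_comparison)
    # Explode both columns together so each forecast stays paired with its historical value
    .explode(['forecast', 'historical'])
    .reset_index(names='city')
    .groupby(['city'])
    # Average the paired hourly differences (forecast minus historical) per city
    .apply(lambda x: pd.Series({'avg_temp_diff': np.mean(x['forecast'] - x['historical'])}), include_groups=False)
)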

print("Average difference in temperature between this year's forecast and last year's records:")
print(" (per location)")
df_comparison.style.background_gradient(subset='avg_temp_diff', cmap='cividis')

Average difference in temperature between this year's forecast and last year's records:
 (per location)


| city | avg_temp_diff |
|:--|--:|
| Amsterdam | -2.40119 |
| Bern | -5.747024 |
| Brussels | -4.677976 |
| London | -4.729762 |
| Luxembourg | -5.560714 |
| Monaco | -3.314881 |
| Paris | -5.201786 |
| Vaduz | -9.288095 |
| Vienna | -9.916667 |
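
The same per-city average can also be computed with plain loops and a function, which is closer to what the exercise practises. This is a sketch under assumptions: `average_difference` is a hypothetical helper, and the toy temperatures are made up so the arithmetic is easy to verify by hand.

```python
def average_difference(forecast_temps, historical_temps):
    """Mean of the pairwise differences (forecast minus historical)."""
    differences = [f - h for f, h in zip(forecast_temps, historical_temps)]
    return sum(differences) / len(differences)


# Toy data standing in for temp_comparison (values are made up)
toy_comparison = {
    'forecast':   {'London': [8.0, 9.0],   'Paris': [10.0, 12.0]},
    'historical': {'London': [14.0, 15.0], 'Paris': [15.0, 15.0]},
}

avg_temp_diff = {}
for city in toy_comparison['forecast']:
    avg_temp_diff[city] = average_difference(
        toy_comparison['forecast'][city],
        toy_comparison['historical'][city],
    )

print(avg_temp_diff)
# → {'London': -6.0, 'Paris': -4.0}
```

A negative value means the forecast is, on average, colder than last year's record for that city.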


---

I like to be extra, so I have a question:

**Q:** At a more granular level, instead of averaging, how do the temperatures compare hour by hour?

In [9]:
# First create a DataFrame with the forecast and historical data for each city

num_city = len(temp_comparison['forecast'])
# Number of hourly readings available for each city
num_hours = len(temp_comparison['forecast']['London'])

plot_df = (
    pd.DataFrame(temp_comparison)
        # For each city, add the position of every reading plus the day and hour derived from it
        .apply(lambda x: pd.Series({'forecast'  : x['forecast'],
                                    'historical': x['historical'],
                                    'index'     : [i % num_hours for i in range(len(x['forecast']))],
                                    'day'       : [(i % num_hours) // 24 + 1 for i in range(len(x['forecast']))],
                                    'hour'      : [i % 24 for i in range(len(x['forecast']))]}),
                                    axis=1)
        # Turn the lists into one row per city and hour
        .explode(['forecast', 'historical', 'index', 'day', 'hour'])
        .reset_index(names='city')
        # 'up' when the forecast is above last year's record, 'down' otherwise
        .assign(direction=lambda x: np.where(x['forecast'] > x['historical'], 'up', 'down'))
        # .melt(id_vars=['city', 'index', 'day', 'hour'], value_vars=['forecast', 'historical'], var_name='type', value_name='temperature')
)

plot_df

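| | city | forecast | historical | index | day | hour | direction |
|--:|:--|--:|--:|--:|--:|--:|:--|
| 0 | London | 8.3 | 14.4 | 0 | 1 | 0 | down |
| 1 | London | 8.3 | 14.3 | 1 | 1 | 1 | down |
| 2 | London | 8.4 | 14.2 | 2 | 1 | 2 | down |
| 3 | London | 8.2 | 14.0 | 3 | 1 | 3 | down |
| 4 | London | 8.1 | 14.1 | 4 | 1 | 4 | down |
| ... | ... | ... | ... | ... | ... | ... | ... |
| 1507 | Bern | 0.1 | 11.4 | 163 | 7 | 19 | down |
| 1508 | Bern | 0.3 | 11.2 | 164 | 7 | 20 | down |
| 1509 | Bern | 0.6 | 10.9 | 165 | 7 | 21 | down |
| 1510 | Bern | 0.7 | 10.7 | 166 | 7 | 22 | down |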
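
The `day` and `hour` columns come from each reading's position within its city's list: integer division by 24 gives the day, and the remainder gives the hour. A minimal sketch of that arithmetic, assuming hourly readings that start at midnight (`position_to_day_hour` is a hypothetical helper name):

```python
def position_to_day_hour(position):
    """Convert a 0-based hourly position into (day, hour), with days starting at 1."""
    day = position // 24 + 1   # every block of 24 readings is one day
    hour = position % 24       # remainder is the hour within that day
    return day, hour


for position in [0, 23, 24, 163]:
    print(position, position_to_day_hour(position))
# → 0 (1, 0)
# → 23 (1, 23)
# → 24 (2, 0)
# → 163 (7, 19)
```

The last line matches the Bern row with index 163 in the table above (day 7, hour 19).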


Here is the real work 🏗️:

In [10]:
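# Keep only the first day of data for Paris and Brussels
valid_rows = ((plot_df['city'] == 'Paris') | (plot_df['city'] == 'Brussels')) & (plot_df['day'] == 1)

# The same filter written as a .query() string:
# 'day == 1 & (city == "Paris" | city == "Brussels")'

plot_df = plot_df[valid_rows].copy()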
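
There is more than one way to write this filter in pandas. The sketch below compares a boolean mask, `.isin()` and `.query()` on a small made-up DataFrame (`toy_df` is an illustrative assumption, not the real `plot_df`) to show that they select the same rows.

```python
import pandas as pd

# Toy stand-in for plot_df (values are made up)
toy_df = pd.DataFrame({
    'city': ['Paris', 'Paris', 'Brussels', 'London', 'Brussels'],
    'day':  [1, 2, 1, 1, 2],
    'hour': [0, 0, 0, 0, 0],
})

# 1. Boolean mask built with | and &
mask = ((toy_df['city'] == 'Paris') | (toy_df['city'] == 'Brussels')) & (toy_df['day'] == 1)
mask_rows = toy_df[mask]

# 2. Same condition using .isin() for the list of cities
isin_rows = toy_df[toy_df['city'].isin(['Paris', 'Brussels']) & (toy_df['day'] == 1)]

# 3. Same condition as a .query() string
query_rows = toy_df.query('day == 1 & (city == "Paris" | city == "Brussels")')

print(mask_rows.equals(isin_rows), mask_rows.equals(query_rows))
# → True True
```

Which one to use is mostly a matter of taste; `.isin()` tends to scale better when the list of cities grows.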

In [11]:
(
    ggplot(data=plot_df, mapping=aes(x='hour')) +
    geom_point(aes(y='forecast'), color='red') +
    geom_point(aes(y='historical'), color='blue') +
    facet_grid(y='city')
)

In [13]:
forecast_tooltips = (
    layer_tooltips()
    .format('@hour', '{.0f}:00')
    .line("City | @city")
    .line("Day | @day")
    .line("Hour | @hour")
    .line("Forecast (*) | @forecast")
    .line("Historical | @historical")
)

historical_tooltips = (
    layer_tooltips()
    .format('@hour', '{.0f}:00')
    .line("City | @city")
    .line("Day | @day")
    .line("Hour | @hour")
    .line("Forecast | @forecast")
    .line("Historical (*) | @historical")
)

(
    ggplot(plot_df.query('day == 1 & (city == "Paris" | city == "Brussels")'), aes(x='hour')) +
    geom_point(aes(y='forecast'), size=2, tooltips=forecast_tooltips, shape=21, fill='#111111') +
    geom_point(aes(y='historical'), size=2.3, tooltips=historical_tooltips, shape=21, fill='white', stroke=0.7, alpha=0.5) +
    geom_segment(aes(xend='hour', y='forecast', yend='historical', color='direction'), 
                 arrow=arrow(ends="first", type = "closed", length=6)) +
    facet_grid(y='city') +
    scale_color_manual(name='Temperature change direction', guide='none',
                       values={'up': '#e26a4f', 'down': '#5d9ebc'}) +
    theme_bw() +
    scale_y_continuous(name="Temperature (°C)", 
                       breaks=list(range(0, 40, 10))) +
    labs(title="🥶 Temperatures were lower today compared to the same period last year",
         subtitle=("Black dots are this year's forecast temperatures, white dots are last year's records.\n"
                   "⬇ Downward arrows indicate that the forecast is lower than the historical records."),
         x="Hour of the day") +
    theme(axis_text_x=element_text(size=16),
          axis_text_y=element_text(size=14),
          axis_title=element_text(size=18, face='bold'),
          plot_title=element_text(size=24, face='bold'),
          plot_subtitle=element_text(margin=[5, 10]),
          strip_background=element_rect(fill='#324655'),
          strip_text=element_text(size=15, face='bold', color='white')) +
    ggsize(1000, 400)
)