# **W06 Summative**
## "Is London really as rainy as the movies make it out to be?"

--------------------------------------------------------------------------------------------

**By**: So Hoi Ling (Vienna)
<br><br>

**Project Description**: This project analyses the amount of rain that falls in London and compares it with the amount that falls in other cities around the world.

**API**: OpenMeteo API ([OpenMeteo](https://open-meteo.com/))
<br><br>

**Measure of Raininess**: Amount of rain (mm)

**Time Period**: 2019-2023

**Cities**: The list of cities can be found in `../data/cities_config.json`

**Variables**: rain_sum



--------------------------------------------------------------------------------------------

In [1]:
# Standard library imports
import csv

# Third-party imports
import pandas as pd
from IPython.display import display
import ipywidgets as widgets
from lets_plot import *

# Setup for LetsPlot
LetsPlot.setup_html()

In [20]:
# Run this cell when new functions have been added to my_functions.py
import importlib
import my_functions

importlib.reload(my_functions)

<module 'my_functions' from '/Users/shl/Documents/lse/ds105/ds105a-2024-w06-summative-so-hl/code/my_functions.py'>

--------------------------------------------------------------------------------------------

## **1. Data Collection and Preparation**

The data collection was conducted in `collect_data.py`, which can be run from the terminal with this command:

```bash 
python collect_data.py ../data/world_cities.csv --london_daily_output ../data/london_daily_rain.csv --london_hourly_output ../data/london_hourly_rain.csv --all_daily_output ../data/all_daily_rain.csv --all_hourly_output ../data/all_hourly_rain.csv
```

**EXPLANATION**

The key rationales for this approach are:
1. **Modularity** - to maintain and update the data collection logic independently from the analysis notebook  

2. **Performance** - faster terminal execution compared to running in a Jupyter notebook, especially considering the large dataset. 

*Note*: we will only be using the daily rain data for this notebook.
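
As a rough illustration of what a daily `rain_sum` request involves, here is a minimal sketch. It assumes the Open-Meteo historical archive endpoint is used; `build_params` and `daily_to_frame` are hypothetical helpers, not functions taken from `collect_data.py`, and the sample payload is hand-written rather than fetched.

```python
import pandas as pd

# Open-Meteo historical archive endpoint (assumed to be the one queried)
ARCHIVE_URL = "https://archive-api.open-meteo.com/v1/archive"


def build_params(latitude, longitude, start="2019-01-01", end="2023-12-31"):
    """Query parameters for a daily rain_sum request (hypothetical helper)."""
    return {
        "latitude": latitude,
        "longitude": longitude,
        "start_date": start,
        "end_date": end,
        "daily": "rain_sum",
        "timezone": "auto",
    }


def daily_to_frame(payload, city):
    """Turn the `daily` block of a JSON response into a tidy DataFrame."""
    daily = payload["daily"]
    return pd.DataFrame(
        {"date": daily["time"], "rain": daily["rain_sum"], "city": city}
    )


# Hand-written payload in the shape of a response; values are illustrative
sample = {"daily": {"time": ["2019-01-01", "2019-01-02"], "rain_sum": [0.2, 0.0]}}
frame = daily_to_frame(sample, "London")
print(build_params(51.5072, -0.1276))
print(frame)
```

In a real run, the payload would come from something like `requests.get(ARCHIVE_URL, params=build_params(lat, lon)).json()`, repeated once per city.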

-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

## **2. Load Data**

In this section, we will load the daily rainfall data for London and other cities from the preprocessed CSV files. This data will be used for our analysis to compare London's rainfall with that of other cities.

In [3]:
#  Load all data from CSV files
london = pd.read_csv("../data/london_daily_rain.csv")
cities = pd.read_csv("../data/all_daily_rain.csv")

-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

## **3. Data Preparation and Visualisation**

In this section, we will prepare the datasets by renaming columns for consistency and converting date columns into datetime formats.

In [4]:
# Aesthetics
for name, df in [("London", london), ("all cities", cities)]:
    my_functions.rename_columns(df)  # Rename columns for consistency
    # errors="coerce" turns unparseable dates into NaT instead of raising
    df["date"] = pd.to_datetime(df["date"], errors="coerce")
    if df["date"].isnull().any():  # Check for conversion errors
        print(
            f"Warning: Some date entries could not be converted in the {name} DataFrame."
        )

# Add a city label to the London DataFrame
london["city"] = "London"
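
A note on the date check: by default `pd.to_datetime` raises an error on an unparseable string, so a null check afterwards only catches anything when `errors="coerce"` is passed. The sketch below shows that behaviour on a small made-up series.

```python
import pandas as pd

# Made-up input: one valid date, one bad string, one missing value
raw = pd.Series(["2019-01-01", "not a date", None])

# errors="coerce" converts unparseable entries to NaT instead of raising
parsed = pd.to_datetime(raw, errors="coerce")
print(parsed)
print("entries that failed to convert:", parsed.isnull().sum())
```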

Before conducting any analysis, let's preview the data to get a sense of it.

In [5]:
# Preview London data
london.head()

| Unnamed: 0 | date | rain | city |
|---|---|---|---|
| 0 | 2019-01-01 | 0.2 | London |
| 1 | 2019-01-02 | 0.0 | London |
| 2 | 2019-01-03 | 0.0 | London |
| 3 | 2019-01-04 | 0.0 | London |
| 4 | 2019-01-05 | 0.0 | London |


In [6]:
# Preview all cities data
cities.head()

| Unnamed: 0 | date | rain | city |
|---|---|---|---|
| 0 | 2019-01-01 | 0.0 | NG,Abuja |
| 1 | 2019-01-02 | 0.0 | NG,Abuja |
| 2 | 2019-01-03 | 0.0 | NG,Abuja |
| 3 | 2019-01-04 | 0.0 | NG,Abuja |
| 4 | 2019-01-05 | 0.0 | NG,Abuja |


**What is the mean, median, maximum and minimum amount of rainfall in London?**

In [7]:
# Mean rainfall
mean_rain = london["rain"].mean()
print(f"The mean amount of rainfall per day is {round(mean_rain, 2)}mm.")

The mean amount of rainfall per day is 2.07mm.


In [8]:
# Median rainfall
median_rain = london["rain"].median()
print(f"The median amount of rainfall per day is {round(median_rain, 2)}mm.")

The median amount of rainfall per day is 0.3mm.


In [9]:
# Max rainfall
max_rain = london["rain"].max()
print(f"The maximum amount of rainfall per day in London is {max_rain}mm.")

The maximum amount of rainfall per day in London is 35.2mm.


In [10]:
# Min rainfall
min_rain = london["rain"].min()
print(f"The minimum amount of rainfall per day in London is {min_rain}mm.")

The minimum amount of rainfall per day in London is 0.0mm.
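
The gap between the mean (2.07mm) and the median (0.3mm) suggests a right-skewed distribution: many dry or nearly dry days and a few very wet ones. The sketch below reproduces that pattern on ten made-up daily totals; the numbers are illustrative and not taken from the London data.

```python
from statistics import mean, median

# Ten made-up daily totals (mm): mostly dry days plus two wet ones
daily_rain = [0.0, 0.0, 0.0, 0.1, 0.2, 0.4, 0.5, 1.0, 6.0, 12.0]

# Share of days with no rain at all
dry_share = sum(1 for r in daily_rain if r == 0.0) / len(daily_rain)

print(f"mean={mean(daily_rain):.2f}mm, median={median(daily_rain):.2f}mm")
print(f"share of completely dry days: {dry_share:.0%}")
```

A couple of heavy days are enough to pull the mean well above the median, which is why the later sections compare medians.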


--------------------------------------------------------------------------------------------

## **4. Data Analysis**

### Measure of rainfall: Amount of Rain

In this section, we will compare the amount of rain (measured in millimeters) in London against various cities globally. This analysis aims to determine if London is indeed as rainy as it is often portrayed.

To facilitate a thorough comparison, the *daily rain data* has been grouped by *days, months, and seasons*. This allows us to calculate the average amount of rain for each time period in both London and other cities. By analyzing these averages, we can gain insights into the typical rainfall patterns in London compared to other regions.

The following analyses will include:
- Daily rainfall comparisons
- Monthly rainfall averages
- Seasonal rainfall trends

These analyses will help answer the question: *Is London really as rainy as the movies make it out to be?*

--------------------------------------------------------------------------------------------

### **Average Daily Rainfall (mm)**  
In this section, we will compare the average daily rainfall (measured in millimeters) in London over a 5-year period (2019-2023) with the average daily rainfall from 40 other selected cities around the world. 

We will:
1. Use the `calculate_average_rainfall` function from `my_functions` to compute the average daily rainfall for the dataset containing all cities.
2. Plot the results using the `plot_rainfall` function (using `ggplot`) to visually compare the average daily rainfall in London against the average from the other cities.


This analysis will help us visualize the daily rainfall trends and assess whether London experiences significantly more or less rainfall compared to other urban areas over the given time period.
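
Step 1 can be pictured as a group-by on the date column. The following is a minimal sketch under that assumption; `average_by_date` is a hypothetical stand-in, and the actual `calculate_average_rainfall` in `my_functions` may differ.

```python
import pandas as pd


def average_by_date(df):
    """Mean rainfall across all cities for each date (hypothetical helper)."""
    out = df.groupby("date", as_index=False)["rain"].mean()
    out["city"] = "Average city"  # label used to tell the line apart in a plot
    return out


# Toy data: two made-up cities over two days
toy = pd.DataFrame(
    {
        "date": pd.to_datetime(
            ["2019-01-01", "2019-01-01", "2019-01-02", "2019-01-02"]
        ),
        "rain": [0.0, 4.0, 2.0, 0.0],
        "city": ["A", "B", "A", "B"],
    }
)
averaged = average_by_date(toy)
print(averaged)
```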

In [11]:
# Average rainfall per day
average_day = my_functions.calculate_average_rainfall(cities)

# Plotting line graphs of daily rainfall
my_functions.plot_rainfall(
    london, average_day, "London does not deviate much from average city's rainfall"
)

**ANALYSIS**:  
As the plot shows, this comparison does not yield much insight into how London's rainfall compares with the average city's. The main reason is the difference in variance: the "average city" line averages rainfall across 40 cities, which smooths out extremes, while the "London" line tracks a single city and therefore shows far more spikes.
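
The smoothing effect of averaging can be checked on synthetic data. The sketch below assumes 40 independent, right-skewed series, which is a simplification: real cities are correlated, so the reduction in variance would be smaller in practice.

```python
import numpy as np

rng = np.random.default_rng(0)

# 40 synthetic "cities" with independent daily totals over 1826 days (2019-2023)
rain = rng.exponential(scale=2.0, size=(40, 1826))

single_city_var = rain[0].var()  # variance of one city's series
average_city_var = rain.mean(axis=0).var()  # variance of the 40-city average

print(f"one city: {single_city_var:.2f}, 40-city average: {average_city_var:.2f}")
```

For independent series, the variance of the average is roughly 1/40 of the variance of a single series.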

**FURTHER ANALYSIS**:  
Let's conduct further analysis by comparing London's median daily rainfall with that of the average city.

In [12]:
# London's median daily rainfall
london_median_daily = london["rain"].median()

# Average city's median daily rainfall
city_median_daily = average_day["rain"].median()

# Print the results
if london_median_daily > city_median_daily:
    print(
        f"London's median daily rainfall ({london_median_daily}mm) is higher than that of the average city ({city_median_daily}mm)."
    )
elif london_median_daily < city_median_daily:
    print(
        f"London's median daily rainfall ({london_median_daily}mm) is lower than that of the average city ({city_median_daily}mm)."
    )
else:
    print(
        f"London's median daily rainfall is the same as that of the average city at ({london_median_daily}mm)."
    )

London's median daily rainfall (0.3mm) is lower than that of the average city (2.38mm).


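From this analysis, we conclude that London is actually significantly less rainy on a daily basis compared to the average city. Note that the two medians are not strictly like-for-like: a 40-city average is rarely completely dry on any given day, which pulls its median up relative to that of a single city.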
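
One thing to keep in mind when reading this comparison is that the median of a cross-city average can differ a lot from the median of a typical individual city. The toy example below uses made-up numbers and hypothetical city names to show the effect.

```python
import pandas as pd

# Three made-up cities, each with rain on a different one of three days
toy = pd.DataFrame(
    {
        "date": ["d1", "d2", "d3"] * 3,
        "city": ["A"] * 3 + ["B"] * 3 + ["C"] * 3,
        "rain": [9.0, 0.0, 0.0, 0.0, 9.0, 0.0, 0.0, 0.0, 9.0],
    }
)

# Median of the cross-city daily average
median_of_average = toy.groupby("date")["rain"].mean().median()

# Median across cities of each city's own daily median
median_of_medians = toy.groupby("city")["rain"].median().median()

print(median_of_average, median_of_medians)
```

Every city here is dry on most days, yet the averaged series never is. Comparing London with per-city medians would be a possible alternative.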

--------------------------------------------------------------------------------------------

### **Average Daily Rainfall (mm) by Region**  
In this section, we will compare the average daily rainfall (measured in millimeters) in London over a 5-year period (2019-2023) with the average daily rainfall by region from 40 other selected cities around the world. The cities are grouped by region in `map_region` based on continents, which are specified in `cities_config.json`. An approximately equal number of cities was used for each region to reduce bias.

We will:
1. Use the `calculate_region_average_rainfall` function from `my_functions` to compute the average daily rainfall by region for the dataset containing all cities.
2. Plot the results using the `plot_regional_rainfall` function (using `ggplot`) to visually compare the average daily rainfall in London against the average from the other cities by region. For regional analyses, I have created a widget so that the user can select which plots to view at one time. The instructions for usage are given below.


This analysis will help us visualise the daily rainfall trends and assess whether London experiences significantly more or less rainfall compared to the average city in other regions over the given time period.
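
The regional grouping can be sketched as a city-to-region lookup followed by a group-by. Everything named here is an assumption for illustration: `REGION_OF` stands in for whatever `map_region` derives from `cities_config.json`, `regional_average` is a hypothetical helper, and the city labels other than `NG,Abuja` are made up.

```python
import pandas as pd

# Hypothetical city -> region lookup (the real one comes from cities_config.json)
REGION_OF = {"NG,Abuja": "Africa", "KE,Nairobi": "Africa", "JP,Tokyo": "Asia"}


def regional_average(df, region_of=REGION_OF):
    """Mean rainfall per region and date (hypothetical helper)."""
    with_region = df.assign(region=df["city"].map(region_of))
    return with_region.groupby(["region", "date"], as_index=False)["rain"].mean()


# Toy data for a single day
toy = pd.DataFrame(
    {
        "date": ["2019-01-01"] * 3,
        "city": ["NG,Abuja", "KE,Nairobi", "JP,Tokyo"],
        "rain": [0.0, 3.0, 5.0],
    }
)
result = regional_average(toy)
print(result)
```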

**EXPLANATION**:  

For regional analyses, you will be able to select which plots you wish to view. Here is a rundown of how that was achieved: 

1. **`region_dropdown`** is a `widgets.SelectMultiple` dropdown built from the list `unique_regions`, allowing multiple regions to be selected at one time. 

2. **`update_plot`** then redraws the plot for the regions currently selected in the dropdown.

The dropdown is linked to `update_plot` through `observe`, and London is displayed by default for aesthetic purposes.
<br><br>

**INSTRUCTIONS**: 
1. **Select regions**  
* Use the dropdown to select regions you want to visualise alongside London.  
* You can select multiple regions by holding down `Ctrl` on Windows or `Cmd` on macOS while clicking on the region names.  
* For adjacent regions on the list, simply click on one and drag to the adjacent regions.


2. **View plot**  
Once you have made the selections, the rainfall plot will automatically update to reflect your choices.


3. **Reset selections**   
To start over, deselect all options in the dropdown. The default plot (London) will remain.

In [13]:
# Average rainfall per day by region
average_day_region = my_functions.calculate_region_average_rainfall(cities)

#  Create a dropdown for region selection
unique_regions = average_day_region["region"].unique().tolist()
region_dropdown = widgets.SelectMultiple(
    options=unique_regions, description="Select:", disabled=False
)

# Create an output widget to display the plot
output = widgets.Output()


def update_plot(change):
    """Function to plot line graphs of daily rainfall by region based on dropdown"""
    output.clear_output(wait=True)

    with output:
        selected_regions = list(change["new"])

        # If no regions are selected, default to the first unique region
        if not selected_regions:
            selected_regions = [unique_regions[0]]

        # Plot rainfall data for selected regions
        plot = my_functions.plot_regional_rainfall(
            london, average_day_region, selected_regions
        )
        display(plot)


# Link dropdown to update function
region_dropdown.observe(update_plot, names="value")

# Display dropdown and output
display(region_dropdown, output)

# Display London by default
update_plot({"new": [unique_regions[0]]})

SelectMultiple(description='Select:', options=('Africa', 'Asia', 'Europe', 'North America', 'Oceania', 'South …

Output()

**ANALYSIS**:   
As expected, the plot of the average city in a specific region is more variable than the 40-city average because fewer cities are averaged, so the gap in variance between London and the average city in a region is smaller. While the differences in daily rainfall between London and the average city in Africa, Europe, North America, Oceania and South America are marginal, the average city in Asia appears to have a significantly higher median rainfall.

**FURTHER ANALYSIS**:  
Let's conduct further analysis by comparing London's median daily rainfall with that of the average city in each region.

In [15]:
# Initialise a dictionary to store the results for each region
region_median_daily = {}

# Loop through each region in average_day_region
for region in average_day_region["region"].unique():
    # Get the median rain for current region
    region_median_daily[region] = average_day_region[
        average_day_region["region"] == region
    ]["rain"].median()

# Print the results
for region, median in region_median_daily.items():
    if london_median_daily > median:
        print(
            f"London's median daily rainfall ({london_median_daily:.2f}mm) is higher than that of the average city ({median:.2f}mm) in {region}."
        )
    elif london_median_daily < median:
        print(
            f"London's median daily rainfall ({london_median_daily:.2f}mm) is lower than that of the average city ({median:.2f}mm) in {region}."
        )
    else:
        print(
            f"London's median daily rainfall is the same as that of the average city in {region} at ({median:.2f}mm)."
        )

London's median daily rainfall (0.30mm) is lower than that of the average city (1.28mm) in Africa.
London's median daily rainfall (0.30mm) is lower than that of the average city (3.23mm) in Asia.
London's median daily rainfall (0.30mm) is lower than that of the average city (1.44mm) in Europe.
London's median daily rainfall (0.30mm) is lower than that of the average city (1.30mm) in North America.
London's median daily rainfall (0.30mm) is lower than that of the average city (1.40mm) in Oceania.
London's median daily rainfall (0.30mm) is lower than that of the average city (0.78mm) in South America.


In [21]:
# Create a raininess ranking
my_functions.raininess_rank(london_median_daily, region_median_daily)

Rainfall Ranking by Median Rainfall (in descending order):
1. Average city in Asia: 3.23mm
2. Average city in Europe: 1.44mm
3. Average city in Oceania: 1.40mm
4. Average city in North America: 1.30mm
5. Average city in Africa: 1.28mm
6. Average city in South America: 0.78mm
7. London: 0.30mm


Analysing the above, London is the least rainy by median daily rainfall, while the average city in Asia has the highest median rainfall (far higher than the average city in any other region). 
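
For readers without access to `my_functions`, a ranking like the one above can be produced with a plain sort. This is a sketch, not the actual `raininess_rank` implementation; `rank_by_median` is a hypothetical name, and the medians are copied from the output above.

```python
def rank_by_median(london_median, region_medians):
    """Return (label, median) pairs sorted from rainiest to least rainy."""
    entries = {f"Average city in {r}": m for r, m in region_medians.items()}
    entries["London"] = london_median
    return sorted(entries.items(), key=lambda item: item[1], reverse=True)


# Median daily rainfall (mm) per region, as printed above
regions = {
    "Africa": 1.28,
    "Asia": 3.23,
    "Europe": 1.44,
    "North America": 1.30,
    "Oceania": 1.40,
    "South America": 0.78,
}
ranking = rank_by_median(0.30, regions)
for position, (label, value) in enumerate(ranking, start=1):
    print(f"{position}. {label}: {value:.2f}mm")
```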

--------------------------------------------------------------------------------------------

### **Average Monthly Rainfall (mm)**  
In this section, we will compare the average monthly rainfall (measured in millimeters) in London over a 5-year period (2019-2023) with the average monthly rainfall from 40 other selected cities around the world. 

We will:
1. Use the `calculate_average_rainfall` function from `my_functions` to compute the average monthly rainfall for the dataset containing all cities. `group_month` is used in the function to group the rainfall data monthly. 
2. Plot the results using the `plot_rainfall` function (using `ggplot`) to visually compare the average monthly rainfall in London against the average from the other cities.


This analysis will help us visualize the monthly rainfall trends and assess whether London experiences significantly more or less rainfall compared to other urban areas over the given time period.
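
The monthly grouping can be sketched as follows. It assumes monthly figures are totals of the daily values, which is a guess based on the magnitude of the medians reported later; `monthly_totals` is a hypothetical stand-in for `group_month`, whose actual logic may differ.

```python
import pandas as pd


def monthly_totals(df):
    """Sum daily rainfall into one row per calendar month (hypothetical helper)."""
    month = df["date"].dt.to_period("M")
    out = df.groupby(month)["rain"].sum().reset_index()
    out["date"] = out["date"].dt.to_timestamp()  # first day of each month
    return out


# Toy data spanning a month boundary
toy = pd.DataFrame(
    {
        "date": pd.to_datetime(["2019-01-30", "2019-01-31", "2019-02-01"]),
        "rain": [1.0, 2.5, 4.0],
    }
)
monthly = monthly_totals(toy)
print(monthly)
```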

In [22]:
# Average rainfall per month
average_month = my_functions.calculate_average_rainfall(cities, frequency="M")
london_month = my_functions.group_month(london)
london_month["city"] = "London"
london_month["rain"] = round(london_month["rain"], 2)

# Plotting line graphs of monthly rainfall
my_functions.plot_rainfall(
    london_month,
    average_month,
    "London and average city's rainfall are more closely aligned towards the end of the time period",
)

**ANALYSIS**:  
The overall trend suggests that London's rainfall levels are generally in line with or slightly below the average city's levels, except for some extreme fluctuations.  
Interestingly, towards the end of the period (2023), London's rainfall levels and the average city's are more closely aligned, with fewer extreme differences.

**FURTHER ANALYSIS**:  
Let's conduct further analysis by comparing London's median monthly rainfall with that of the average city.

In [24]:
# London's median monthly rainfall
london_median_monthly = london_month["rain"].median()

# Average city's median monthly rainfall
city_median_monthly = average_month["rain"].median()

# Print the results
if london_median_monthly > city_median_monthly:
    print(
        f"London's median monthly rainfall ({london_median_monthly:.2f}mm) is higher than that of the average city ({city_median_monthly:.2f}mm)."
    )
elif london_median_monthly < city_median_monthly:
    print(
        f"London's median monthly rainfall ({london_median_monthly:.2f}mm) is lower than that of the average city ({city_median_monthly:.2f}mm)."
    )
else:
    print(
        f"London's median monthly rainfall is the same as that of the average city at ({london_median_monthly:.2f}mm)."
    )

London's median monthly rainfall (57.45mm) is lower than that of the average city (77.88mm).


From this analysis, we conclude that London is less rainy on a monthly basis compared to the average city.

--------------------------------------------------------------------------------------------

### **Average Monthly Rainfall (mm) by Region**  
In this section, we will compare the average monthly rainfall (measured in millimeters) in London over a 5-year period (2019-2023) with the average monthly rainfall by region from 40 other selected cities around the world. The cities are grouped by region in `map_region` based on continents, which are specified in `cities_config.json`. An approximately equal number of cities was used for each region to reduce bias.

We will:
1. Use the `calculate_region_average_rainfall` function from `my_functions` to compute the average monthly rainfall by region for the dataset containing all cities. `group_month` is used in the function to group the rainfall data monthly. 
2. Plot the results using the `plot_regional_rainfall` function (using `ggplot`) to visually compare the average monthly rainfall in London against the average from the other cities by region. For regional analyses, I have created a widget so that the user can select which plots to view at one time. The instructions for usage are given in the section linked below.


This analysis will help us visualise the monthly rainfall trends and assess whether London experiences significantly more or less rainfall compared to the average city in other regions over the given time period.

For **EXPLANATION** and **INSTRUCTIONS**, refer to [Average Daily Rainfall (mm) by Region](#average-daily-rainfall-mm-by-region)

In [25]:
# Average rainfall per month by region
average_month_region = my_functions.calculate_region_average_rainfall(
    cities, frequency="M"
)
london_month_region = london_month.copy()
london_month_region["region"] = "London"

#  Create a dropdown for region selection
unique_regions_month = average_month_region["region"].unique().tolist()

region_dropdown_month = widgets.SelectMultiple(
    options=unique_regions_month,
    description="Select:",
    disabled=False,
    value=(unique_regions_month[0],),  # Default selection
)

# Create an output widget to display the plot
output_month = widgets.Output()


def update_plot_month(change):
    """Function to plot line graphs of monthly rainfall by region based on dropdown"""
    with output_month:
        output_month.clear_output(wait=True)
        selected_regions_month = list(change["new"])

        # Always include London in the plot
        if "London" not in selected_regions_month:
            selected_regions_month.append("London")

        # Call plotting function
        plot = my_functions.plot_regional_rainfall(
            london_month_region,
            average_month_region,
            selected_regions_month,
            frequency="M",
        )
        # Display the plot
        display(plot)


# Link dropdown to update function
region_dropdown_month.observe(update_plot_month, names="value")
display(region_dropdown_month, output_month)

# Display London (with the default selection) on first render
update_plot_month({"new": region_dropdown_month.value})

SelectMultiple(description='Select:', index=(0,), options=('Africa', 'Asia', 'Europe', 'North America', 'Ocean…

Output()

**ANALYSIS**:  
London's monthly rainfall trend matches the general trend of monthly rainfall of an average city in Europe very closely, which is expected given that London is a city in Europe. 

The average city in Asia has a significantly higher amount of rainfall than London.  

The average cities in the other regions show trends comparable to London's monthly rainfall. A few interesting trends are noted below:
* The monthly rainfall in London and an average city in North America matched very closely from Oct 2021 to Dec 2021.
* Monthly rainfall in London and an average city in South America moved in opposite directions from Apr 2021 to Apr 2022, and the trend in an average city in South America trailed that of London from Oct 2022 to Dec 2023.
* The monthly rainfall in London was significantly higher than that of an average city in Oceania from Nov 2021 to Jun 2022. 

**FURTHER ANALYSIS**:  
Let's conduct further analysis by comparing London's median monthly rainfall with that of the average city in each region.

In [26]:
# Initialise a dictionary to store the results for each region
region_median_monthly = {}

# Loop through each region in average_month_region
for region in average_month_region["region"].unique():
    # Get the median rain for current region
    region_median_monthly[region] = average_month_region[
        average_month_region["region"] == region
    ]["rain"].median()

# Print the results
for region, median in region_median_monthly.items():
    if london_median_monthly > median:
        print(
            f"London's median monthly rainfall ({london_median_monthly:.2f}mm) is higher than that of the average city ({median:.2f}mm) in {region}."
        )
    elif london_median_monthly < median:
        print(
            f"London's median monthly rainfall ({london_median_monthly:.2f}mm) is lower than that of the average city ({median:.2f}mm) in {region}."
        )
    else:
        print(
            f"London's median monthly rainfall is the same as that of the average city in {region} at ({median:.2f}mm)."
        )

London's median monthly rainfall (57.45mm) is higher than that of the average city (52.69mm) in Africa.
London's median monthly rainfall (57.45mm) is lower than that of the average city (113.60mm) in Asia.
London's median monthly rainfall (57.45mm) is higher than that of the average city (52.99mm) in Europe.
London's median monthly rainfall (57.45mm) is lower than that of the average city (62.81mm) in North America.
London's median monthly rainfall (57.45mm) is lower than that of the average city (81.06mm) in Oceania.
London's median monthly rainfall (57.45mm) is higher than that of the average city (49.64mm) in South America.


In [27]:
# Create a raininess ranking
my_functions.raininess_rank(london_median_monthly, region_median_monthly)

Rainfall Ranking by Median Rainfall (in descending order):
1. Average city in Asia: 113.60mm
2. Average city in Oceania: 81.06mm
3. Average city in North America: 62.81mm
4. London: 57.45mm
5. Average city in Europe: 52.99mm
6. Average city in Africa: 52.69mm
7. Average city in South America: 49.64mm


Analysing the above, London is in the middle of the rankings, indicating an average amount of raininess by median rainfall on a monthly basis. The average city in Asia is the rainiest by median monthly rainfall, while the average city in South America is the least rainy by median monthly rainfall.

--------------------------------------------------------------------------------------------

### **Average Seasonal Rainfall (mm)**  
In this section, we will compare the average seasonal rainfall (measured in millimeters) in London over a 5-year period (2019-2023) with the average seasonal rainfall from 40 other selected cities around the world. 

We will:
1. Use the `group_season` function from `my_functions` to group and compute the average seasonal rainfall for London and other cities.
2. Plot the results using `ggplot` to visually compare the average seasonal rainfall in London against the average from the other cities.


This analysis will help us visualize the seasonal rainfall trends and assess whether London experiences significantly more or less rainfall compared to other urban areas over the given time period.

Bar graphs were used, as they are better suited than line graphs to representing discrete categories (i.e. seasons).
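
The seasonal grouping can be pictured as a month-to-season lookup. The sketch below assumes meteorological seasons in the northern hemisphere and counts December towards the following year's winter; both are assumptions, and `season_year` is a hypothetical helper rather than the logic inside `group_season`.

```python
from datetime import date

# Meteorological seasons (northern hemisphere) keyed by month number
SEASON_OF_MONTH = {
    12: "Winter", 1: "Winter", 2: "Winter",
    3: "Spring", 4: "Spring", 5: "Spring",
    6: "Summer", 7: "Summer", 8: "Summer",
    9: "Autumn", 10: "Autumn", 11: "Autumn",
}


def season_year(day):
    """Label a date with its season and year, e.g. 'Winter 2020'."""
    # Assumption: December counts towards the winter of the following year
    year = day.year + 1 if day.month == 12 else day.year
    return f"{SEASON_OF_MONTH[day.month]} {year}"


print(season_year(date(2019, 12, 15)))
print(season_year(date(2022, 10, 1)))
```

Labelling every city with northern-hemisphere seasons means that "Winter" corresponds to the warm season for cities south of the equator.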

In [28]:
# Average rainfall per season
london_season = my_functions.group_season(london_month)
average_season = my_functions.group_season(average_month)

# Combine dataframes
df_season = pd.concat([london_season, average_season])

# Plotting bar graphs of seasonal rainfall
(
    ggplot(df_season, aes(x="season_year", y="rain", fill="city"))
    + geom_bar(stat="identity", position="dodge", size=1.2)
    + labs(title="Average Seasonal Rainfall", x="Season and Year", y="Rainfall (mm)")
    + ggsize(3000, 800)
    + theme(
        plot_title=element_text(size=45, face="bold", hjust=0.5),
        axis_text_x=element_text(angle=45, hjust=1),
    )
)

**ANALYSIS**:  
Overall, London and the average city have similar average seasonal rainfall. However, the average city's seasonal rainfall is often marginally higher than London's, and markedly higher in Winter 2020 and Autumn 2022. The Winter 2020 gap is notable because winter is perceived as one of London's rainiest seasons. 

**FURTHER ANALYSIS**:  
Let's conduct further analysis by comparing London's median seasonal rainfall with that of the average city. 

In [30]:
# London's median seasonal rainfall
london_median_seasonally = london_season["rain"].median()

# Average city's median seasonal rainfall
city_median_seasonally = average_season["rain"].median()

# Print the results
if london_median_seasonally > city_median_seasonally:
    print(
        f"London's median seasonal rainfall ({london_median_seasonally:.2f}mm) is higher than that of the average city ({city_median_seasonally:.2f}mm)."
    )
elif london_median_seasonally < city_median_seasonally:
    print(
        f"London's median seasonal rainfall ({london_median_seasonally:.2f}mm) is lower than that of the average city ({city_median_seasonally:.2f}mm)."
    )
else:
    print(
        f"London's median seasonal rainfall is the same as that of the average city at ({london_median_seasonally:.2f}mm)."
    )

London's median seasonal rainfall (191.35mm) is lower than that of the average city (226.31mm).


From this analysis, we conclude that London is less rainy on a seasonal basis compared to the average city.

--------------------------------------------------------------------------------------------

### **Average Seasonal Rainfall (mm) by Region**  
In this section, we will compare the average seasonal rainfall (measured in millimeters) in London over a 5-year period (2019-2023) with the average seasonal rainfall by region from 40 other selected cities around the world. The cities are grouped by region in `map_region` based on continents, which are specified in `cities_config.json`. An approximately equal number of cities was used for each region to reduce bias.

We will:
1. Use the `group_season` function from `my_functions` to group and compute the average seasonal rainfall for London and other cities (by region). Note that `group_season` does not need to group by region again, since the parameter used is the monthly rainfall dataframe by region for London and other cities.
2. Plot the results using `ggplot` to visually compare the average seasonal rainfall in London against the average from the other cities by region. For regional analyses, I have created a special widget so that the user can select which plots to view at one time. The instructions for usage are elaborated on below.


This analysis will help us visualise the seasonal rainfall trends and assess whether London experiences significantly more or less rainfall compared to the average city in other regions over the given time period.


Bar graphs were used, as they are better suited than line graphs to representing discrete categories (i.e. seasons).

In [31]:
# Average rainfall per season by region
london_season_region = my_functions.group_season(london_month_region, False)
london_season_region["city"] = "London"
average_season_region = my_functions.group_season(average_month_region, False)

# Clarify regions as "Average city in {region}"
average_season_region = average_season_region.assign(
    city=average_season_region["region"].apply(lambda r: f"Average city in {r}")
)

# Combine dataframes
df_season_region = pd.concat([london_season_region, average_season_region])

#  Create a dropdown for region selection
unique_regions_season = average_season_region["region"].unique().tolist()
region_dropdown_season = widgets.SelectMultiple(
    options=unique_regions_season,
    description="Select:",
    disabled=False,
    layout=widgets.Layout(width="1000px"),
)

# Create an output widget to display the plot
output_season = widgets.Output()


# Function to plot bar graphs of seasonal rainfall by region
def create_plot(selected_regions):
    selected_regions = list(set(selected_regions) | {"London"})
    filtered_df = df_season_region[df_season_region["region"].isin(selected_regions)]

    seasonal_region_plot = (
        ggplot(filtered_df, aes(x="season_year", y="rain", fill="city"))
        + geom_bar(stat="identity", position="dodge", size=1.2)
        + labs(
            title="Average Seasonal Rainfall",
            subtitle="By Region",
            x="Season and Year",
            y="Rainfall (mm)",
        )
        + ggsize(3000, 800)
        + theme(
            plot_title=element_text(size=45, face="bold", hjust=0.5),
            plot_subtitle=element_text(hjust=0.5),
            axis_text_x=element_text(angle=45, hjust=1),
        )
    )
    return seasonal_region_plot


# Function to redraw the seasonal bar graph by region based on dropdown
def update_plot_season(change):
    with output_season:
        output_season.clear_output(wait=True)
        selected_regions_season = list(change["new"])

        # Call the create_plot function with the selected regions
        plot = create_plot(selected_regions_season)
        display(plot)


# Link dropdown to update function
region_dropdown_season.observe(update_plot_season, names="value")
display(region_dropdown_season, output_season)

# Display London by default
update_plot_season({"new": []})

SelectMultiple(description='Select:', layout=Layout(width='1000px'), options=('Africa', 'Asia', 'Europe', 'Nor…

Output()

**ANALYSIS**:  
Overall, London's rainfall is most closely correlated with the rainfall in an average city in Europe, although London's rainfall was significantly higher in multiple seasons (Summer 2019, Winter 2021, Summer 2021, Autumn 2022).  

Rainfall in London and the average city in Africa is fairly well correlated, although London often has higher rainfall than the average city in Africa, especially in Autumn 2022.  

The average city in Asia has higher rainfall than London in every season.  

Rainfall in London and the average city in North America is also fairly well correlated, although the average city in North America had significantly higher rainfall than London in Summer 2021, Autumn 2021 and Summer 2023.   

The average city in Oceania had significantly higher rainfall than London from Autumn 2021 to Winter 2023. 

Rainfall in London and the average city in South America is not very correlated, and rainfall in the average city in South America was more than twice London's in Winter 2022.

**FURTHER ANALYSIS**:  
Let's conduct further analysis by comparing London's median seasonal rainfall with that of the average city in each region.

In [33]:
# Initialise a dictionary to store the results for each region
region_median_seasonally = {}

# Loop through each region in average_season_region
for region in average_season_region["region"].unique():
    # Get the median rain for current region
    region_median_seasonally[region] = average_season_region[
        average_season_region["region"] == region
    ]["rain"].median()

# Print the results
for region, median in region_median_seasonally.items():
    if london_median_seasonally > median:
        print(
            f"London's median seasonal rainfall ({london_median_seasonally:.2f}mm) is higher than that of the average city ({median:.2f}mm) in {region}."
        )
    elif london_median_seasonally < median:
        print(
            f"London's median seasonal rainfall ({london_median_seasonally:.2f}mm) is lower than that of the average city ({median:.2f}mm) in {region}."
        )
    else:
        print(
            f"London's median seasonal rainfall is the same as that of the average city in {region} at ({median:.2f}mm)."
        )

London's median seasonal rainfall (191.35mm) is higher than that of the average city (159.49mm) in Africa.
London's median seasonal rainfall (191.35mm) is lower than that of the average city (354.89mm) in Asia.
London's median seasonal rainfall (191.35mm) is higher than that of the average city (160.34mm) in Europe.
London's median seasonal rainfall (191.35mm) is lower than that of the average city (207.69mm) in North America.
London's median seasonal rainfall (191.35mm) is lower than that of the average city (254.51mm) in Oceania.
London's median seasonal rainfall (191.35mm) is higher than that of the average city (147.39mm) in South America.


In [34]:
# Create a raininess ranking
my_functions.raininess_rank(london_median_seasonally, region_median_seasonally)

Rainfall Ranking by Median Rainfall (in descending order):
1. Average city in Asia: 354.89mm
2. Average city in Oceania: 254.51mm
3. Average city in North America: 207.69mm
4. London: 191.35mm
5. Average city in Europe: 160.34mm
6. Average city in Africa: 159.49mm
7. Average city in South America: 147.39mm


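The rankings are consistent with those for average monthly rainfall by region. Because the seasonal figures are aggregated from the same monthly data, this agreement is expected, and it shows that the ranking is stable across the two groupings. 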
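
The agreement between the two rankings can be checked directly. The sketch below copies the medians printed in the monthly and seasonal ranking outputs; `order` is a hypothetical helper.

```python
# Median monthly rainfall (mm), from the monthly ranking output
monthly = {
    "Asia": 113.60, "Oceania": 81.06, "North America": 62.81, "London": 57.45,
    "Europe": 52.99, "Africa": 52.69, "South America": 49.64,
}
# Median seasonal rainfall (mm), from the seasonal ranking output
seasonal = {
    "Asia": 354.89, "Oceania": 254.51, "North America": 207.69, "London": 191.35,
    "Europe": 160.34, "Africa": 159.49, "South America": 147.39,
}


def order(medians):
    """Names sorted from highest to lowest median rainfall."""
    return sorted(medians, key=medians.get, reverse=True)


print("same order:", order(monthly) == order(seasonal))
print("London's position:", order(monthly).index("London") + 1)
```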

## **5. Conclusion**

**London is not as rainy as we think!**

London sits in the middle (4th of 7) of our rankings by median monthly and seasonal rainfall by region. A caveat is that this analysis defined "raininess" as the amount of rain; when people talk about London as a rainy city, they may be referring to the number of hours of rain or the gloomy weather. Further analysis could be conducted on the number of rainfall hours, especially during the day (because that probably shapes the perception of raininess the most).
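
As a starting point for that follow-up, rainy hours could be counted along these lines. This is only a sketch on made-up readings: the column names `time` and `rain` and the 08:00 to 19:59 definition of daytime are assumptions, not properties of the hourly CSV files.

```python
import pandas as pd

# 24 made-up hourly readings for a single day (mm)
hours = pd.Timestamp("2019-01-01") + pd.to_timedelta(list(range(24)), unit="h")
hourly = pd.DataFrame(
    {
        "time": hours,
        "rain": [0.0] * 8 + [0.4, 0.0, 1.2, 0.0] + [0.0] * 8 + [0.3, 0.0, 0.0, 0.0],
    }
)

# Daytime is taken as 08:00-19:59, an arbitrary choice for illustration
is_daytime = hourly["time"].dt.hour.between(8, 19)
is_rainy = hourly["rain"] > 0

rainy_hours = int(is_rainy.sum())
rainy_daytime_hours = int((is_rainy & is_daytime).sum())
print(rainy_hours, rainy_daytime_hours)
```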