# W06 Summative
## "Is London really as rainy as the movies make it out to be?"

--------------------------------------------------------------------------------------------

**By**: So Hoi Ling (Vienna)
<br><br>

**Project description**: An analysis of the amount and frequency of rain in London, compared with other cities

**API**: [Open-Meteo](https://open-meteo.com/)
<br><br>

**Measure of raininess**:   
(1) Amount of rain  
(2) Number of rainy days per month and per season  
(3) Number of hours of rain per month and per season  
(4) Number of hours of rain per month and per season (during the day)

**Time period**: 2019-2023

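**Cities** (40, as listed in `cities_all` under Data Collection): Abuja, Cairo, Johannesburg, Nairobi, Casablanca, Accra, Almaty, Bangkok, Dubai, Istanbul, Jakarta, Kuala Lumpur, New Delhi, Seoul, Singapore, Tokyo, Athens, Berlin, Helsinki, Moscow, Oslo, Paris, Rome, Madrid, Vienna, Mexico City, New York City, Toronto, Washington, Chicago, Los Angeles, Auckland, Canberra, Sydney, Melbourne, Brasília, Lima, Rio de Janeiro, Buenos Aires, Santiago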

**Variables**: rain_sum, precipitation_sum, precipitation_hours, sunrise and sunset


--------------------------------------------------------------------------------------------

**Notes from support session**:
- Consider separate notebooks for different scenarios
- Use list comprehensions (one-line `for` loops)
- Keep functions in a separate file and import them
- Use pandas

In [19]:
#  Imports
import json
import csv
import statistics
from lets_plot import *
from datetime import datetime
import pandas as pd
import calendar
LetsPlot.setup_html()

In [23]:
# Run this cell when new functions have been added to my_functions.py
import importlib
import my_functions
importlib.reload(my_functions)

<module 'my_functions' from '/Users/shl/Documents/lse/ds105/ds105a-2024-w06-summative-so-hl/my_functions.py'>

In [21]:
# Constants
NUMBER_YEARS = 5
NUMBER_OTHER_CITIES = 40

--------------------------------------------------------------------------------------------

## 1. Data Collection

In [4]:
# Lists of cities grouped regionally
cities_all = {
    "Africa": ["NG,Abuja", "EG,Cairo", "ZA,Johannesburg", "KE,Nairobi", "MA,Casablanca", "GH,Accra"],
    "Asia": ["KZ,Almaty", "TH,Bangkok", "AE,Dubai", "TR,Istanbul", "ID,Jakarta", "MY,Kuala Lumpur", "IN,New Delhi", "KR,Seoul", "SG,Singapore", "JP,Tokyo"],
    "Europe": ["GR,Athens", "DE,Berlin", "FI,Helsinki", "RU,Moscow", "NO,Oslo", "FR,Paris", "IT,Rome", "ES,Madrid", "AT,Vienna"],
    "North America": ["MX,Mexico City", "US,New York City", "CA,Toronto", "US,Washington", "US,Chicago", "US,Los Angeles"],
    "Oceania": ["NZ,Auckland", "AU,Canberra", "AU,Sydney", "AU,Melbourne"],
    "South America": ["BR,Brasília", "PE,Lima", "BR,Rio de Janeiro", "AR,Buenos Aires", "CL,Santiago"]
}

# Initialise empty dictionary to store coordinates
coord_all = {}

# Define coordinates for London
coord_london = [51.50853, -0.12574]

coord_all = my_functions.extract_coord(
    "./data/world_cities.csv", cities_all, coord_all)

In [5]:
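The body of `my_functions.extract_coord` is not shown in this notebook, so the following is only a minimal sketch of how such a lookup could work. The function name `extract_coord_sketch` and the CSV column names (`country`, `name`, `lat`, `lng`) are assumptions made for illustration; the real `world_cities.csv` may be laid out differently. The sketch also reports any requested city that was not found, which helps catch a misspelt name before the API calls are made.

```python
import csv
import io


def extract_coord_sketch(csv_file, cities_by_region):
    """Return ({region: {"CC,City": [lat, lng]}}, [names not found]).

    Hypothetical helper: assumes the CSV has the columns
    'country', 'name', 'lat' and 'lng'.
    """
    coords = {region: {} for region in cities_by_region}
    # Reverse lookup: "CC,City" -> region
    wanted = {
        city: region
        for region, cities in cities_by_region.items()
        for city in cities
    }
    found = set()
    for row in csv.DictReader(csv_file):
        key = f"{row['country']},{row['name']}"
        if key in wanted:
            coords[wanted[key]][key] = [row["lat"], row["lng"]]
            found.add(key)
    missing = sorted(set(wanted) - found)
    return coords, missing


# Toy CSV standing in for the real file
sample_csv = io.StringIO(
    "country,name,lat,lng\n"
    "GB,London,51.50853,-0.12574\n"
    "FR,Paris,48.85341,2.3488\n"
    "JP,Tokyo,35.6895,139.69171\n"
)
coords, missing = extract_coord_sketch(
    sample_csv, {"Europe": ["FR,Paris"], "Asia": ["JP,Tokyo", "KR,Seoul"]}
)
print(coords)
print("Not found:", missing)
```

The coordinates stay as strings here, which is consistent with the `float(coord[0])` conversion in the collection loop.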
# Store results
results_all = {}

# Store results for London
rain_data_london_daily, rain_data_london_hourly = my_functions.get_rain_data(
    coord_london[0], coord_london[1])
results_all["GB,London"] = {
    "daily": rain_data_london_daily["daily"],
    "hourly": rain_data_london_hourly["hourly"]
}

# Store results for other cities
for regions, cities in coord_all.items():
    for city, coord in cities.items():
        lat = float(coord[0])
        long = float(coord[1])
        rain_data_daily, rain_data_hourly = my_functions.get_rain_data(
            lat, long)
        results_all[city] = {
            "daily": rain_data_daily["daily"],
            "hourly": rain_data_hourly["hourly"]
        }
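For readers without access to `my_functions.py`, here is a hedged sketch of how the two requests behind `get_rain_data` might be built. It assumes the Open-Meteo historical weather endpoint (`archive-api.open-meteo.com/v1/archive`) and the 2019-2023 period and variables listed at the top of the notebook; the helper name `build_archive_url` is hypothetical, and the actual function may build its requests differently. The sketch only constructs the URLs and does not call the API.

```python
from urllib.parse import urlencode

# Assumption: Open-Meteo's historical weather ("archive") endpoint
ARCHIVE_URL = "https://archive-api.open-meteo.com/v1/archive"


def build_archive_url(lat, lon, start="2019-01-01", end="2023-12-31",
                      daily=None, hourly=None):
    """Build a request URL for daily and/or hourly variables (hypothetical helper)."""
    params = {
        "latitude": lat,
        "longitude": lon,
        "start_date": start,
        "end_date": end,
        "timezone": "auto",  # use the local time zone of the coordinates
    }
    if daily:
        params["daily"] = ",".join(daily)
    if hourly:
        params["hourly"] = ",".join(hourly)
    return f"{ARCHIVE_URL}?{urlencode(params)}"


# One request for daily variables, one for hourly rain (London coordinates)
daily_url = build_archive_url(
    51.50853, -0.12574,
    daily=["rain_sum", "precipitation_sum", "precipitation_hours",
           "sunrise", "sunset"],
)
hourly_url = build_archive_url(51.50853, -0.12574, hourly=["rain"])
print(daily_url)
print(hourly_url)
```

Each URL could then be fetched with any HTTP client and parsed as JSON, giving the `"daily"` and `"hourly"` keys used in the loop above.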

In [6]:
# Save all data to a JSON file so it can be previewed
with open("data/all_data.json", "w") as file:
    json.dump(results_all, file)

# Save London's data to a JSON file so it can be previewed
london = results_all["GB,London"]
with open("data/london_rain.json", "w") as file:
    json.dump(london, file)

--------------------------------------------------------------------------------------------

## 2. Data Analysis

### A. Preliminary analysis of London's rainfall data

**What is the average amount of rainfall in London?**

In [7]:
# Average rainfall
mean_rain = statistics.mean(london["daily"]["rain_sum"])
print(f"The average amount of rainfall per day is {round(mean_rain, 2)}mm.")

The average amount of rainfall per day is 2.07mm.


**What is the average number of hours of rain per day in London?**

In [8]:
# Average number of hours of rain per day
# Count every hour with a non-zero rain reading
total_rain_hours = sum(
    1 for hourly_rain in london["hourly"]["rain"] if hourly_rain > 0)
# The hourly series has 24 readings per day
total_days = len(london["hourly"]["rain"])/24
mean_rain_hours = total_rain_hours / total_days
print(
    f"The average number of hours of rain per day is {round(mean_rain_hours, 2)}h.")

The average number of hours of rain per day is 4.17h.
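The same calculation can be sketched with pandas on toy data. This is an illustration only, not the notebook's method: the values below are made up, and the one behavioural difference is that a missing reading (`None`) becomes `NaN` and is simply not counted as a rainy hour, whereas the plain comparison `hourly_rain > 0` would raise a `TypeError` on `None`.

```python
import pandas as pd

# Toy data: two days of hourly rain (mm); None marks a missing reading
hourly_rain = [0.0, 0.2, 0.0, None] * 12  # 48 hourly values

rain = pd.Series(hourly_rain, dtype="float64")  # None becomes NaN
rainy_hours = int((rain > 0).sum())             # NaN > 0 evaluates to False
days = len(rain) / 24
mean_rain_hours = rainy_hours / days

print(f"{rainy_hours} rainy hours over {days:.0f} days "
      f"= {mean_rain_hours:.2f} h per day")
```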


--------------------------------------------------------------------------------------------

### B. Comparative analysis of London's rainfall data against other cities

### Measure: Amount of Rain

Let's compare the amount of rain (mm) in London vs other cities globally. 

The *daily rain data* was grouped by *day, month and season* to find the average amount of rain per day, month and season in London and in the other cities.
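As a rough illustration of the monthly grouping, the sketch below sums daily rainfall by calendar month and divides by the number of years. The function name `monthly_rain_sketch` and its exact behaviour are assumptions; the notebook's own helper, `my_functions.periodic_rain_amt`, is defined in a separate file and may aggregate differently. The sketch assumes dates in the `YYYY-MM-DD` form returned by the API.

```python
from collections import defaultdict
from datetime import datetime


def monthly_rain_sketch(dates, rain_sums, n_years):
    """Average total rainfall (mm) per calendar month across n_years.

    Hypothetical stand-in for the monthly branch of periodic_rain_amt.
    """
    totals = defaultdict(float)
    for date_str, rain in zip(dates, rain_sums):
        if rain is None:  # skip missing readings
            continue
        month = datetime.strptime(date_str, "%Y-%m-%d").month
        totals[month] += rain
    return {month: totals[month] / n_years for month in sorted(totals)}


# Toy data spanning two years
dates = ["2019-01-01", "2019-01-02", "2019-02-01", "2020-01-01"]
rain = [1.0, 3.0, 2.0, 4.0]
monthly = monthly_rain_sketch(dates, rain, n_years=2)
print(monthly)  # {1: 4.0, 2: 1.0}
```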

--------------------------------------------------------------------------------------------

**Average daily rainfall (mm)**  
This section compares London's average daily rainfall over 5 years with *the average of the 40 other cities*.

In [9]:
# Initialise empty dictionaries
london_daily_rain_amt = {}
other_cities_daily_rain_amt = {}

# Get results for each city
for city, data in results_all.items():
    time = data["daily"]["time"]
    rain_sum = data["daily"]["rain_sum"]

    # Calculate daily rainfall
    daily_rainfall = my_functions.periodic_rain_amt(time, rain_sum, False)

    if city == "GB,London":
        london_daily_rain_amt = daily_rainfall
    else:
        other_cities_daily_rain_amt[city] = daily_rainfall


# Calculate the average of data for other cities
sum_other_cities_daily_rain_amt = my_functions.add_cities(
    other_cities_daily_rain_amt)
ave_other_cities_daily_rain_amt = my_functions.ave_rain(
    sum_other_cities_daily_rain_amt, NUMBER_OTHER_CITIES)

In [10]:
# Plotting line graphs of the average rainfall per day
df_daily_rain_amount_London = pd.DataFrame({
    "Date": list(london_daily_rain_amt.keys()),
    "Rainfall": list(london_daily_rain_amt.values()),
    "Source": "London"
})

df_daily_rain_amount_others = pd.DataFrame({
    "Date": list(ave_other_cities_daily_rain_amt.keys()),
    "Rainfall": list(ave_other_cities_daily_rain_amt.values()),
    "Source": "Average city"
})

df_daily_combined = pd.concat(
    [df_daily_rain_amount_London, df_daily_rain_amount_others])
df_daily_combined["Date"] = pd.to_datetime(df_daily_combined["Date"])

(
    ggplot(df_daily_combined, aes(x="Date", y="Rainfall", color="Source", group="Source")) +
    geom_line(size=1.2) +
    labs(title="Average Daily Rainfall", x="Date", y="Rainfall (mm)") +
    ggsize(800, 600) +
    theme(plot_title=element_text(size=45, face="bold", hjust=0.5),
          axis_text_x=element_text(angle=45, hjust=1))
)

Because the daily data varies widely and there are too many data points, the graph gives little insight into the overall rainfall trend, although it does suggest that London's rainfall is moderate compared with the average of the other cities. We therefore look at average monthly and seasonal rainfall instead.

--------------------------------------------------------------------------------------------

This section compares London's average daily rainfall over 5 years with the averages for *different regions*.

In [11]:
# Initialise empty dictionary
daily_rain_amt = {region: {} for region in cities_all}

# Sort data regionally
for city, data in other_cities_daily_rain_amt.items():
    region = my_functions.match_city_to_region(city, cities_all)
    if region in daily_rain_amt:
        daily_rain_amt[region][city] = data

# Calculate the average of data for other cities regionally
ave_daily_rain_amt = my_functions.ave_regional_rain(daily_rain_amt)
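The regional grouping relies on two helpers from `my_functions.py` that are not reproduced in the notebook. The following is a sketch of what they might do, assuming every city in a region shares the same period keys; the `_sketch` names are hypothetical and the real implementations may differ.

```python
def match_city_to_region_sketch(city, cities_by_region):
    """Return the region whose city list contains `city`, or None."""
    for region, cities in cities_by_region.items():
        if city in cities:
            return region
    return None


def ave_regional_rain_sketch(by_region):
    """Average {period: rainfall} across the cities of each region."""
    averages = {}
    for region, per_city in by_region.items():
        if not per_city:  # skip regions with no data
            continue
        periods = next(iter(per_city.values())).keys()
        averages[region] = {
            period: sum(s[period] for s in per_city.values()) / len(per_city)
            for period in periods
        }
    return averages


# Toy data
regions = {"Europe": ["FR,Paris", "IT,Rome"], "Asia": ["JP,Tokyo"]}
region_of_rome = match_city_to_region_sketch("IT,Rome", regions)
regional = ave_regional_rain_sketch({
    "Europe": {"FR,Paris": {1: 50.0}, "IT,Rome": {1: 70.0}},
    "Asia": {},
})
print(region_of_rome, regional)  # Europe {'Europe': {1: 60.0}}
```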

In [24]:
# Plotting line graphs of the average rainfall per day
region_dfs_day = pd.concat(
    my_functions.create_region_dfs(ave_daily_rain_amt, "Date").values(),
    ignore_index=True,
)

df_daily_combined_regional = pd.concat(
    [df_daily_rain_amount_London, region_dfs_day])
df_daily_combined_regional["Date"] = pd.to_datetime(
    df_daily_combined_regional["Date"])

(
    ggplot(
        df_daily_combined_regional,
        aes(x="Date", y="Rainfall", color="Source", group="Source"),
    )
    + geom_line(size=1.2)
    + labs(title="Average Daily Rainfall", x="Date", y="Rainfall (mm)")
    + ggsize(800, 600)
    + theme(
        plot_title=element_text(size=45, face="bold", hjust=0.5),
        axis_text_x=element_text(angle=45, hjust=1),
    )
)

As the graph shows, the daily data is too variable and too dense to reveal any trends, so we proceed to average monthly and seasonal rainfall.
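The plotting cell above depends on `my_functions.create_region_dfs`, whose code is not included here. From the way its result is used (a dictionary whose values are concatenated and plotted with `Rainfall` and `Source` columns), one plausible shape is sketched below. The name `create_region_dfs_sketch` is hypothetical and this is an assumption about the helper, not its actual source.

```python
import pandas as pd


def create_region_dfs_sketch(regional_averages, period_col):
    """Build one long-format DataFrame per region.

    Each frame has the columns [period_col, "Rainfall", "Source"],
    with the region name as the "Source" label for the plot legend.
    """
    return {
        region: pd.DataFrame({
            period_col: list(series.keys()),
            "Rainfall": list(series.values()),
            "Source": region,
        })
        for region, series in regional_averages.items()
    }


# Toy regional averages
frames = create_region_dfs_sketch(
    {"Europe": {1: 60.0, 2: 55.0}, "Asia": {1: 120.0, 2: 90.0}},
    "Month",
)
combined = pd.concat(list(frames.values()), ignore_index=True)
print(combined)
```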

--------------------------------------------------------------------------------------------

**Average monthly rainfall (mm)**  
This section compares London's average monthly rainfall over 5 years with *the average of the 40 other cities*. 

In [13]:
# Initialise empty dictionaries
london_monthly_rain_amt = {}
other_cities_monthly_rain_amt = {}

# Get results for each city
for city, data in results_all.items():
    time = data["daily"]["time"]
    rain_sum = data["daily"]["rain_sum"]

    # Calculate monthly rainfall
    monthly_rainfall = my_functions.periodic_rain_amt(time, rain_sum, True)

    if city == "GB,London":
        london_monthly_rain_amt = monthly_rainfall
    else:
        other_cities_monthly_rain_amt[city] = monthly_rainfall

#  Calculate the average of data for other cities
sum_other_cities_monthly_rain_amt = my_functions.add_cities(
    other_cities_monthly_rain_amt)
ave_other_cities_monthly_rain_amt = my_functions.ave_rain(
    sum_other_cities_monthly_rain_amt, NUMBER_OTHER_CITIES)

In [14]:
# Plotting line graphs of the average rainfall per month
df_monthly_rain_amount_London = pd.DataFrame({
    "Month": list(london_monthly_rain_amt.keys()),
    "Rainfall": list(london_monthly_rain_amt.values()),
    "Source": "London"
})

df_monthly_rain_amount_others = pd.DataFrame({
    "Month": list(ave_other_cities_monthly_rain_amt.keys()),
    "Rainfall": list(ave_other_cities_monthly_rain_amt.values()),
    "Source": "Average city"
})

df_monthly_combined = pd.concat(
    [df_monthly_rain_amount_London, df_monthly_rain_amount_others])

(
    ggplot(df_monthly_combined, aes(x="Month", y="Rainfall", color="Source", group="Source")) +
    geom_line(size=1.2) +
    labs(title="Average Monthly Rainfall", x="Month", y="Rainfall (mm)") +
    ggsize(800, 600) +
    theme(plot_title=element_text(size=45, face="bold", hjust=0.5),
          axis_text_x=element_text(angle=45, hjust=1))
)

--------------------------------------------------------------------------------------------

This section compares London's average monthly rainfall over 5 years with the averages for *different regions*.

In [15]:
# Initialise empty dictionary
monthly_rain_amt = {region: {} for region in cities_all}

# Sort data regionally
for city, data in other_cities_monthly_rain_amt.items():
    region = my_functions.match_city_to_region(city, cities_all)
    if region in monthly_rain_amt:
        monthly_rain_amt[region][city] = data

# Calculate the average of data for other cities regionally
ave_monthly_rain_amt = my_functions.ave_regional_rain(monthly_rain_amt)

In [25]:
# Plotting line graphs of the average rainfall per month
region_dfs_month = pd.concat(
    my_functions.create_region_dfs(ave_monthly_rain_amt, "Month").values(),
    ignore_index=True,
)

df_monthly_combined_regional = pd.concat(
    [df_monthly_rain_amount_London, region_dfs_month]
)

(
    ggplot(
        df_monthly_combined_regional,
        aes(x="Month", y="Rainfall", color="Source", group="Source"),
    )
    + geom_line(size=1.2)
    + labs(title="Average Monthly Rainfall", x="Month", y="Rainfall (mm)")
    + ggsize(800, 600)
    + theme(
        plot_title=element_text(size=45, face="bold", hjust=0.5),
        axis_text_x=element_text(angle=45, hjust=1),
    )
)

--------------------------------------------------------------------------------------------

**Average seasonal rainfall (mm)**  
This section compares London's average seasonal rainfall over 5 years with the *average of the 40 other cities*.   
The seasonal figures build on the monthly rain data, which was previously stored as a DataFrame.

In [17]:
# Plotting line graphs of the average rainfall per season
df_season_combined = my_functions.calculate_seasonal_rain(df_monthly_combined)

(
    ggplot(df_season_combined, aes(x="Season", y="Rainfall", color="Source", group="Source")) +
    geom_line(size=1.2) +
    labs(title="Average Seasonal Rainfall", x="Season", y="Rainfall (mm)") +
    ggsize(800, 600) +
    theme(plot_title=element_text(size=45, face="bold", hjust=0.5),
          axis_text_x=element_text(angle=45, hjust=1))
)
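How `my_functions.calculate_seasonal_rain` turns monthly data into seasonal data is not shown in the notebook. One way to do it is sketched below, under several stated assumptions: the `Month` column holds integers 1 to 12, seasons follow the northern-hemisphere meteorological convention for every city, and a season's value is the sum of its three monthly values. The real helper may use different month labels or aggregate with a mean instead.

```python
import pandas as pd

# Assumption: northern-hemisphere meteorological seasons
SEASON_BY_MONTH = {
    12: "Winter", 1: "Winter", 2: "Winter",
    3: "Spring", 4: "Spring", 5: "Spring",
    6: "Summer", 7: "Summer", 8: "Summer",
    9: "Autumn", 10: "Autumn", 11: "Autumn",
}
SEASON_ORDER = {"Winter": 0, "Spring": 1, "Summer": 2, "Autumn": 3}


def seasonal_rain_sketch(df_monthly):
    """Aggregate a Month/Rainfall/Source frame into Season/Rainfall/Source."""
    df = df_monthly.copy()
    df["Season"] = df["Month"].map(SEASON_BY_MONTH)
    seasonal = df.groupby(["Source", "Season"], as_index=False)["Rainfall"].sum()
    # Sort seasons chronologically rather than alphabetically
    seasonal["_order"] = seasonal["Season"].map(SEASON_ORDER)
    return (
        seasonal.sort_values(["Source", "_order"])
        .drop(columns="_order")
        .reset_index(drop=True)
    )


# Toy data: 10 mm in every month
toy = pd.DataFrame({
    "Month": list(range(1, 13)),
    "Rainfall": [10.0] * 12,
    "Source": "London",
})
seasonal = seasonal_rain_sketch(toy)
print(seasonal)
```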

--------------------------------------------------------------------------------------------

This section compares London's average seasonal rainfall over 5 years with the averages for *different regions*.

In [27]:
# Plotting line graphs of the average rainfall per season
df_season_combined_regional = my_functions.calculate_seasonal_rain(
    df_monthly_combined_regional
)

(
    ggplot(
        df_season_combined_regional,
        aes(x="Season", y="Rainfall", color="Source", group="Source"),
    )
    + geom_line(size=1.2)
    + labs(title="Average Seasonal Rainfall", x="Season", y="Rainfall (mm)")
    + ggsize(800, 600)
    + theme(
        plot_title=element_text(size=45, face="bold", hjust=0.5),
        axis_text_x=element_text(angle=45, hjust=1),
    )
)