# Data Collection

To answer the question **“Is London really as rainy as the movies make it out to be?”**, several choices need to be made: the time period, the cities to compare against London, how to define and measure raininess, and therefore which Open-Meteo variables to use.

### The time period
To explore this question in depth, I will focus on a historical time period (1st January 1950 - 31st December 1970) and a more recent time period (1st January 2000 - 31st December 2020).
### Other cities to compare against London
To assess whether London is as rainy as movies suggest, a broad sample of global cities should be used in the comparison. This will enhance the reliability of findings, ensuring different climates are accounted for. As such the cities I will investigate are:
 - Manchester, UK: offers a regional counterpoint to London, indicating whether the amount of rain in London is abnormal for England
 - Edinburgh, Scotland: provides another UK perspective, allowing the assessment of London's weather in comparison to the broader UK climate
 - Paris, France: located relatively close to London and at a similar latitude, so it may have a similar climate, allowing a comparison between major European cities
 - Rome, Italy: offers a warmer Mediterranean climate to contrast London's apparent 'rainier' climate
 - Kyiv, Ukraine: a more eastern European city with a similar latitude to London
 - Seattle, Washington, USA: another city that possesses 'a rainy reputation', making it a potentially good benchmark for comparison
 - Toronto, Canada: a major city in eastern North America, adding to the geographic diversity and possessing a climate influenced by the Great Lakes
 - Phoenix, Arizona, USA: a drier North American desert climate for contrast, enabling the evaluation of London's weather on a global scale
 - Bogotá, Colombia: a tropical climate in South America, characterised by high humidity and significant rainfall, providing a valuable perspective on London's rainfall
 - Cairo, Egypt: a desert city with minimal rain, highlighting extremes in rainfall levels
 - Cape Town, South Africa: a city in the Southern Hemisphere with distinct wet and dry seasons
 - Omsk, Russia: located in Siberia to offer another contrasting point with London
 - Mumbai, India: a tropical monsoon climate, with heavy rains to offer comparison with London
 - Tokyo, Japan: an eastern Asian city to provide another global counterpoint
 - Auckland, New Zealand: a far-southern city to provide another global counterpoint

In [3]:
from lets_plot import *
import pandas as pd

LetsPlot.setup_html()

def plot_city_map(city_data, point_size=1, point_colour='blue', title='City Locations', title_size=30):
    
    # Parameters:
    #- city_data: List of tuples containing (city_name, latitude, longitude).
    #- point_size: Size of the points on the map (default is 1).
    #- point_colour: Colour of the points (default is 'blue').
    #- title: Title of the plot (default is 'City Locations').
    #- title_size: Font size of the plot title (default is 30).
    
    
    # Create a DataFrame from the provided list of tuples
    city_df = pd.DataFrame(city_data, columns=['city', 'latitude', 'longitude'])
    
    # Plotting the points on a map
    plot = (
        ggplot() +
        geom_livemap() +
        geom_point(aes(x='longitude', y='latitude'), size=point_size, colour=point_colour, show_legend=False, data=city_df) +
        ggtitle(title) +
        theme_minimal() + 
        theme(plot_title=element_text(size=title_size, hjust=0.5))
    )
    
    return plot

# Sample data as a list of tuples (city, latitude, longitude)
city_coords = [
    ("London", 51.5072, -0.1276),  
    ("Manchester", 53.4808, -2.2426),  
    ("Edinburgh", 55.9533, -3.1883),  
    ("Paris", 48.8575, 2.3514),
    ("Rome", 41.8967, 12.4822),
    ("Kyiv", 50.4502, 30.5245),
    ("Seattle", 47.6061, -122.3328),  
    ("Toronto", 43.6532, -79.3832),  
    ("Phoenix", 33.4484, -112.0740),  
    ("Bogota", 4.7110, -74.0721),  
    ("Cairo", 30.0444, 31.2357),
    ("Cape Town", -33.9221, 18.4231),
    ("Omsk", 54.9914, 73.3645),
    ("Mumbai", 19.0760, 72.8777),
    ("Tokyo", 35.6764, 139.6500),
    ("Auckland", -36.8509, 174.7645)
]

# Call the function to plot the city map
city_map_plot = plot_city_map(city_coords)
city_map_plot


As the map demonstrates, these locations will provide a wide range of counterpoints to compare London against.

### Measuring and defining raininess
Raininess depends on the frequency and duration of rain and on the amount of precipitation. As such, the Open-Meteo variables to be used are:
- Precipitation Sum 
- Rain Sum 
- Precipitation Hours
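
As a rough illustration of how these variables could map onto amount, frequency and duration, the sketch below summarises one city's daily records. The function name `summarise_raininess` and the 1.0 mm threshold for counting a day as "wet" are assumptions made for this example, not part of Open-Meteo, and the sample values are made up.

```python
def summarise_raininess(daily, wet_day_threshold_mm=1.0):
    """Summarise one city's daily records into simple raininess measures.

    `daily` is a dict of equal-length lists keyed like the daily variables
    requested from Open-Meteo. Missing values (None) are skipped.
    """
    precip = [p for p in daily["precipitation_sum"] if p is not None]
    hours = [h for h in daily["precipitation_hours"] if h is not None]

    # Frequency: share of days at or above the (assumed) wet-day threshold
    wet_days = sum(1 for p in precip if p >= wet_day_threshold_mm)

    return {
        "total_precipitation_mm": round(sum(precip), 1),  # amount
        "wet_day_fraction": wet_days / len(precip) if precip else None,
        "mean_precipitation_hours": sum(hours) / len(hours) if hours else None,  # duration
    }


# Made-up values for illustration only (not real observations)
sample = {
    "precipitation_sum": [0.0, 0.3, 4.5, None, 1.2],
    "precipitation_hours": [0.0, 2.0, 6.0, None, 4.0],
}
summary = summarise_raininess(sample)
print(summary)
```

The threshold is worth treating as a tunable choice: a lower value counts drizzle as a wet day, which may matter for a city like London.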

## Data collection

In [8]:
import requests

def get_historical_data(city_coords, start_date, end_date):
    base_historical_url = "https://archive-api.open-meteo.com/v1/archive"
    precipitation = {}
    
    for city, latitude, longitude in city_coords:
        params_lat_long = f"latitude={latitude}&longitude={longitude}"
        params_others = "&daily=precipitation_sum,rain_sum,precipitation_hours"  # Changed to daily parameters
        params_dates = f"&start_date={start_date}&end_date={end_date}"
        
        end_url = base_historical_url + '?' + params_lat_long + params_others + params_dates
        
        historical_response = requests.get(end_url)
        
        if historical_response.status_code == 200:
            historical_data = historical_response.json()
            # Extract the relevant precipitation data
            historical_precipitation = {
                "precipitation_sum": historical_data['daily']['precipitation_sum'],
                "rain_sum": historical_data['daily']['rain_sum'],
                "precipitation_hours": historical_data['daily']['precipitation_hours'],
            }
            # Store the historical data for each city
            precipitation[city] = historical_precipitation
        else:
            print(f"Error for {city}: {historical_response.status_code}")
            precipitation[city] = None  # Store None if there's an error for that city
    
    return precipitation  # Return the precipitation dictionary after the loop

# City coordinates
city_coords = [
    ("London", 51.5072, -0.1276),  
    ("Manchester", 53.4808, -2.2426),  
    ("Edinburgh", 55.9533, -3.1883),  
    ("Paris", 48.8575, 2.3514),
    ("Rome", 41.8967, 12.4822),
    ("Kyiv", 50.4502, 30.5245),
    ("Seattle", 47.6061, -122.3328),  
    ("Toronto", 43.6532, -79.3832),  
    ("Phoenix", 33.4484, -112.0740),  
    ("Bogota", 4.7110, -74.0721),  
    ("Cairo", 30.0444, 31.2357),
    ("Cape Town", -33.9221, 18.4231),
    ("Omsk", 54.9914, 73.3645),
    ("Mumbai", 19.0760, 72.8777),
    ("Tokyo", 35.6764, 139.6500),
    ("Auckland", -36.8509, 174.7645)
]

# Call the function to get historical data for the cities
precipitation_hist = get_historical_data(city_coords, start_date="1970-12-01", end_date="1970-12-31")
print(precipitation_hist)


Error for London: 429
Error for Manchester: 429
Error for Edinburgh: 429
Error for Paris: 429
Error for Rome: 429
Error for Kyiv: 429
Error for Seattle: 429
Error for Toronto: 429
Error for Phoenix: 429
Error for Bogota: 429
Error for Cairo: 429
Error for Cape Town: 429
Error for Omsk: 429
Error for Mumbai: 429
Error for Tokyo: 429
Error for Auckland: 429
{'London': None, 'Manchester': None, 'Edinburgh': None, 'Paris': None, 'Rome': None, 'Kyiv': None, 'Seattle': None, 'Toronto': None, 'Phoenix': None, 'Bogota': None, 'Cairo': None, 'Cape Town': None, 'Omsk': None, 'Mumbai': None, 'Tokyo': None, 'Auckland': None}


In [9]:
def get_historical_data(city_coords, start_date, end_date):
    base_historical_url = "https://archive-api.open-meteo.com/v1/archive"
    precipitation = {}
    
    for city, latitude, longitude in city_coords:
        params_lat_long = f"latitude={latitude}&longitude={longitude}"
        params_others = "&daily=precipitation_sum,rain_sum,precipitation_hours"  # Changed to daily parameters
        params_dates = f"&start_date={start_date}&end_date={end_date}"
        
        end_url = base_historical_url + '?' + params_lat_long + params_others + params_dates
        
        historical_response = requests.get(end_url)
        
        if historical_response.status_code == 200:
            historical_data = historical_response.json()
            # Extract the relevant precipitation data
            historical_precipitation = {
                "precipitation_sum": historical_data['daily']['precipitation_sum'],
                "rain_sum": historical_data['daily']['rain_sum'],
                "precipitation_hours": historical_data['daily']['precipitation_hours'],
            }
            # Store the historical data for each city
            precipitation[city] = historical_precipitation
        else:
            print(f"Error for {city}: {historical_response.status_code}")
            precipitation[city] = None  # Store None if there's an error for that city
    
    return precipitation  # Return the precipitation dictionary after the loop

# City coordinates
city_coords = [
    ("London", 51.5072, -0.1276),  
]

# Call the function to get historical data for the cities
precipitation_hist = get_historical_data(city_coords, start_date="1970-12-01", end_date="1970-12-31")
print(precipitation_hist)


Error for London: 429
{'London': None}


In [5]:
import requests
import time
import json 

def get_historical_data(city_coords, start_date, end_date):
    base_historical_url = "https://archive-api.open-meteo.com/v1/archive"
    precipitation = {}
    
    for city, latitude, longitude in city_coords:
        params_lat_long = f"latitude={latitude}&longitude={longitude}"
        params_others = "&daily=precipitation_sum,rain_sum,precipitation_hours"  
        params_dates = f"&start_date={start_date}&end_date={end_date}"
        
        end_url = base_historical_url + '?' + params_lat_long + params_others + params_dates
        
        # Initialize attempt counter
        attempts = 0
        max_attempts = 5  # Set a maximum number of attempts

        while attempts < max_attempts:
            historical_response = requests.get(end_url)
            
            if historical_response.status_code == 200:
                historical_data = historical_response.json()
                # Extract the relevant precipitation data
                historical_precipitation = {
                    "precipitation_sum": historical_data['daily']['precipitation_sum'],
                    "rain_sum": historical_data['daily']['rain_sum'],
                    "precipitation_hours": historical_data['daily']['precipitation_hours'],
                }
                # Store the historical data for each city
                precipitation[city] = historical_precipitation
                break  # Exit the loop on successful response
            elif historical_response.status_code == 429:
                print(f"Rate limit exceeded for {city}. Retrying...")
                attempts += 1
                time.sleep(2 ** attempts)  # Exponential backoff
            else:
                print(f"Error for {city}: {historical_response.status_code}")
                precipitation[city] = None  # Store None if there's an error for that city
                break  # Exit the loop on other errors
        else:
            # Every attempt was rate-limited, so record the failure explicitly
            precipitation[city] = None
    
    return precipitation  # Return the precipitation dictionary after the loop

# City coordinates
city_coords = [
    ("London", 51.5072, -0.1276),  
    ("Manchester", 53.4808, -2.2426),  
    ("Edinburgh", 55.9533, -3.1883),  
    ("Paris", 48.8575, 2.3514),
    ("Rome", 41.8967, 12.4822),
    ("Kyiv", 50.4502, 30.5245),
    ("Seattle", 47.6061, -122.3328),  
    ("Toronto", 43.6532, -79.3832),  
    ("Phoenix", 33.4484, -112.0740),  
    ("Bogota", 4.7110, -74.0721),  
    ("Cairo", 30.0444, 31.2357),
    ("Cape Town", -33.9221, 18.4231),
    ("Omsk", 54.9914, 73.3645),
    ("Mumbai", 19.0760, 72.8777),
    ("Tokyo", 35.6764, 139.6500),
    ("Auckland", -36.8509, 174.7645)
]

# Call the function to get historical data for the cities
precipitation_hist = get_historical_data(city_coords, start_date="1950-01-01", end_date="1970-12-31")

with open("precipitation_data.json", "w") as file:
    json.dump(precipitation_hist, file)

print("Precipitation data saved to 'precipitation_data.json'")

Rate limit exceeded for London. Retrying...
Rate limit exceeded for London. Retrying...
Rate limit exceeded for London. Retrying...
Rate limit exceeded for London. Retrying...
Rate limit exceeded for London. Retrying...
Rate limit exceeded for Manchester. Retrying...
Rate limit exceeded for Manchester. Retrying...
Rate limit exceeded for Manchester. Retrying...
Rate limit exceeded for Manchester. Retrying...
Rate limit exceeded for Manchester. Retrying...
Rate limit exceeded for Edinburgh. Retrying...
Rate limit exceeded for Edinburgh. Retrying...
Rate limit exceeded for Edinburgh. Retrying...
Rate limit exceeded for Edinburgh. Retrying...


KeyboardInterrupt: 

## Collecting historical data

In [7]:
import requests
import json

def get_historical_data(city_coords, start_date, end_date):
    base_historical_url = "https://archive-api.open-meteo.com/v1/archive"
    precipitation = {}
    
    for city, latitude, longitude in city_coords:
        params_lat_long = f"latitude={latitude}&longitude={longitude}"
        params_others = "&daily=precipitation_sum,rain_sum,precipitation_hours"  # Changed to daily parameters
        params_dates = f"&start_date={start_date}&end_date={end_date}"
        
        end_url = base_historical_url + '?' + params_lat_long + params_others + params_dates
        
        historical_response = requests.get(end_url)
        
        if historical_response.status_code == 200:
            historical_data = historical_response.json()
            # Extract the relevant precipitation data
            historical_precipitation = {
                "precipitation_sum": historical_data['daily']['precipitation_sum'],
                "rain_sum": historical_data['daily']['rain_sum'],
                "precipitation_hours": historical_data['daily']['precipitation_hours'],
            }
            # Store the historical data for each city
            precipitation[city] = historical_precipitation
        else:
            print(f"Error for {city}: {historical_response.status_code}")
            precipitation[city] = None  # Store None if there's an error for that city

    with open("precipitation_data_hist.json", "w") as file:
        json.dump(precipitation, file, indent=4) # saving the data to a json file
    
    return precipitation  # Return the precipitation dictionary after the loop

 

# City coordinates
city_coords = [
    ("London", 51.5072, -0.1276),  
    ("Manchester", 53.4808, -2.2426),  
    ("Edinburgh", 55.9533, -3.1883),  
    ("Paris", 48.8575, 2.3514),
    ("Rome", 41.8967, 12.4822),
    ("Seattle", 47.6061, -122.3328),    
    ("Bogota", 4.7110, -74.0721),  
    ("Cairo", 30.0444, 31.2357),
    ("Cape Town", -33.9221, 18.4231),
    ("Mumbai", 19.0760, 72.8777)
]

# Call the function to get historical data for the cities
precipitation_hist = get_historical_data(city_coords, start_date="1970-01-01", end_date="1980-12-31")
print(precipitation_hist)


Error for Paris: 429
Error for Seattle: 429
Error for Bogota: 429
Error for Cairo: 429
Error for Cape Town: 429
Error for Mumbai: 429
{'London': {'precipitation_sum': [0.0, 0.3, 0.3, 0.0, 0.0, 1.4, 0.0, 4.5, 5.6, 1.2, 4.7, 0.0, 2.1, 2.8, 1.5, 1.2, 0.0, 2.8, 0.0, 1.5, 1.5, 2.1, 10.9, 4.5, 1.9, 1.0, 0.0, 0.0, 5.5, 5.4, 0.0, 3.5, 8.0, 2.0, 2.4, 0.2, 0.0, 4.6, 2.1, 0.7, 0.2, 0.0, 12.8, 0.0, 0.2, 0.0, 0.3, 5.9, 2.9, 3.5, 0.7, 8.9, 8.8, 0.0, 0.2, 0.7, 2.7, 1.2, 0.0, 0.6, 1.6, 0.0, 11.8, 0.3, 0.6, 1.9, 0.0, 0.0, 1.2, 6.4, 1.9, 0.6, 0.3, 0.0, 0.0, 1.8, 2.5, 0.0, 0.6, 1.0, 0.0, 8.2, 2.6, 0.0, 0.6, 0.1, 2.3, 0.2, 1.7, 2.5, 2.2, 0.3, 1.5, 2.5, 1.9, 8.6, 1.8, 1.9, 2.0, 5.6, 0.6, 8.8, 0.4, 0.2, 2.4, 0.9, 0.3, 0.3, 1.0, 0.6, 5.3, 0.2, 1.9, 3.2, 4.6, 4.1, 1.3, 1.2, 2.2, 0.3, 1.6, 0.0, 0.0, 0.0, 0.0, 2.1, 10.2, 2.6, 0.4, 6.6, 7.0, 0.7, 0.1, 2.1, 0.0, 0.1, 0.8, 0.0, 0.0, 0.0, 0.5, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 2.7, 1.7, 0.2, 0.0, 0.1, 0.0, 0.0, 0.0, 0.0, 0.0, 0.5, 0.5, 1.1, 0.4, 0.6, 1.1, 0.5, 0.0, 0.0

### Collecting data for the more recent time period
This was done separately to avoid making too many API requests in one day.
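
One way to keep the number of requests down when re-running the notebook is to reuse a saved JSON file instead of calling the API again. The sketch below is only an illustration of that idea: `load_or_fetch`, `fake_fetch` and the demo file name are hypothetical, and the stand-in fetcher is used so that no network request is made.

```python
import json
import os
import tempfile


def load_or_fetch(path, fetch):
    """Return cached JSON from `path` if it exists; otherwise call `fetch()` and cache the result."""
    if os.path.exists(path):
        with open(path) as file:
            return json.load(file)
    data = fetch()
    with open(path, "w") as file:
        json.dump(data, file, indent=4)
    return data


# Stand-in for an API call, with made-up values (not real observations)
calls = []

def fake_fetch():
    calls.append(1)
    return {"London": {"precipitation_sum": [0.0, 1.2]}}


cache_path = os.path.join(tempfile.mkdtemp(), "precipitation_cache_demo.json")
first = load_or_fetch(cache_path, fake_fetch)   # calls fake_fetch and writes the file
second = load_or_fetch(cache_path, fake_fetch)  # reads the file, no second call
print(len(calls))
```

In this notebook, `fetch` could be a small wrapper such as `lambda: get_historical_data(city_coords, start_date="2010-01-01", end_date="2020-12-31")`, so the API is only contacted when the file is missing.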

In [6]:

# Get the historical data for the desired time period
precipitation_recent = get_historical_data(city_coords, start_date="2010-01-01", end_date="2020-12-31")

# Save the retrieved data to a JSON file
with open("precipitation_data_recent.json", "w") as file:
    json.dump(precipitation_recent, file, indent=4)  # saving the data to a json file

# Print the retrieved data
print(precipitation_recent)



{'London': {'precipitation_sum': [0.0, 0.0, 0.0, 0.0, 3.2, 9.4, 0.0, 0.0, 0.0, 3.1, 1.1, 0.0, 3.6, 2.8, 0.1, 5.6, 1.5, 0.0, 0.0, 4.9, 0.0, 2.7, 0.6, 0.5, 1.6, 0.0, 0.0, 0.7, 4.5, 0.0, 0.0, 0.0, 2.8, 2.0, 0.8, 4.8, 0.0, 0.0, 0.6, 0.0, 0.4, 0.0, 0.2, 0.2, 0.6, 0.0, 11.3, 1.3, 4.7, 0.2, 0.1, 7.5, 10.6, 1.8, 3.0, 5.7, 1.0, 6.2, 11.4, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 2.5, 0.0, 0.0, 0.0, 0.0, 0.0, 0.5, 3.9, 2.5, 0.1, 0.2, 0.2, 3.9, 5.4, 0.5, 1.0, 3.5, 8.1, 5.8, 3.9, 0.8, 6.3, 4.8, 2.3, 0.3, 0.0, 0.3, 0.0, 0.0, 0.0, 0.0, 0.6, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 2.0, 1.1, 0.0, 0.1, 2.0, 1.3, 4.5, 10.7, 0.3, 0.0, 0.0, 0.0, 0.3, 1.4, 0.2, 0.0, 0.3, 0.2, 0.4, 0.0, 0.0, 4.3, 0.3, 0.1, 0.0, 0.1, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 2.8, 0.0, 7.0, 0.0, 0.5, 6.0, 0.0, 0.0, 0.0, 0.9, 1.9, 2.6, 11.4, 2.8, 0.0, 3.1, 0.0, 1.2, 1.8, 0.0, 0.0, 0.0, 1.5, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.3, 0.0, 0.2, 3.0, 0.0, 0.0, 1.5, 0.0, 0.0, 0.0, 0.0, 2.1, 0.2, 0.0, 0.0, 0.0, 0

In [None]:
import requests
import json

def get_historical_data(city_coords, start_date, end_date):
    base_historical_url = "https://archive-api.open-meteo.com/v1/archive"
    precipitation = {}
    
    for city, latitude, longitude in city_coords:
        params_lat_long = f"latitude={latitude}&longitude={longitude}"
        params_others = "&daily=precipitation_sum,rain_sum,precipitation_hours"  # Changed to daily parameters
        params_dates = f"&start_date={start_date}&end_date={end_date}"
        
        end_url = base_historical_url + '?' + params_lat_long + params_others + params_dates
        
        recent_response = requests.get(end_url)
        
        if recent_response.status_code == 200:
            recent_data = recent_response.json()
            # Extract the relevant precipitation data
            recent_precipitation = {
                "precipitation_sum": recent_data['daily']['precipitation_sum'],
                "rain_sum": recent_data['daily']['rain_sum'],
                "precipitation_hours": recent_data['daily']['precipitation_hours'],
            }
            # Store the recent data for each city
            precipitation[city] = recent_precipitation
        else:
            print(f"Error for {city}: {recent_response.status_code}")
            precipitation[city] = None  # Store None if there's an error for that city

    with open("precipitation_data_rec.json", "w") as file:
        json.dump(precipitation, file, indent=4) # saving the data to a json file
    
    return precipitation  # Return the precipitation dictionary after the loop

 

# City coordinates
city_coords = [
    ("London", 51.5072, -0.1276),  
    ("Manchester", 53.4808, -2.2426),  
    ("Edinburgh", 55.9533, -3.1883),  
    ("Paris", 48.8575, 2.3514),
    ("Rome", 41.8967, 12.4822),
    ("Seattle", 47.6061, -122.3328),    
    ("Bogota", 4.7110, -74.0721),  
    ("Cairo", 30.0444, 31.2357),
    ("Cape Town", -33.9221, 18.4231),
    ("Mumbai", 19.0760, 72.8777)
]

# Call the function to get data for the more recent time period
precipitation_recent = get_historical_data(city_coords, start_date="2010-01-01", end_date="2020-12-31")
print(precipitation_recent)
