**OBJECTIVE:** Collect weather data from the Open-Meteo API and save it to a JSON file.

**AUTHOR:** Joshua Xu

**LAST EDITED:** 2024-10-26

---


# Data Collection
In this notebook, I use the [Open-Meteo API](https://open-meteo.com/en/docs) to retrieve historical weather data for London and the cities we are comparing it against.

URL of [Historical Weather Data](https://open-meteo.com/en/docs/historical-weather-api): `https://archive-api.open-meteo.com/v1/archive`

The world cities with their coordinates are found **[here](https://github.com/joelacus/world-cities/blob/main/world_cities.csv)**.

## 1. Retrieving Data from Open-Meteo

We first import the necessary modules:

In [1]:
import json 
import requests

We need to define a function which returns historical data:

In [2]:
# Takes a latitude and longitude and returns the daily weather data for that location
def get_historical_data(latitude, longitude):

    # Base URL and query parameters for the location and date range
    base_historic_url = "https://archive-api.open-meteo.com/v1/era5?"
    params_lat_long = "latitude=" + str(latitude) + "&longitude=" + str(longitude)
    params_date = "&start_date=2023-01-01&end_date=2024-01-01"

    # Daily variables used to measure 'raininess'
    params_others = "&hourly=&daily=weather_code,precipitation_sum,rain_sum,showers_sum,precipitation_hours"

    # Combine the parameters into the full request URL
    final_url = base_historic_url + params_lat_long + params_date + params_others

    # Request the data from the constructed URL
    response = requests.get(final_url)

    # Parse the JSON response into a Python dictionary
    historical_data = response.json()

    # Keep only the daily measurements
    historical_raininess = historical_data['daily']

    return historical_raininess
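
The query string above is assembled by hand. As a minimal sketch, the same URL could be built from a dictionary of parameters with the standard library's `urllib.parse.urlencode`. The helper name `build_archive_url` and its default arguments are assumptions for illustration and not part of the original notebook, and the empty `hourly=` parameter is left out.

```python
from urllib.parse import urlencode

# Hypothetical helper (not part of the original notebook).
BASE_URL = "https://archive-api.open-meteo.com/v1/era5"
DAILY_VARIABLES = [
    "weather_code",
    "precipitation_sum",
    "rain_sum",
    "showers_sum",
    "precipitation_hours",
]

def build_archive_url(latitude, longitude,
                      start_date="2023-01-01", end_date="2024-01-01"):
    """Return the request URL for one location and date range."""
    params = {
        "latitude": latitude,
        "longitude": longitude,
        "start_date": start_date,
        "end_date": end_date,
        "daily": ",".join(DAILY_VARIABLES),
    }
    # safe="," keeps the commas in the daily list instead of encoding them as %2C
    return BASE_URL + "?" + urlencode(params, safe=",")

london_url = build_archive_url(51.5085, -0.1257)
```

The resulting `london_url` can be passed to `requests.get` in the same way as `final_url` in the function above.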

To obtain London's historical data:

In [3]:
london_raininess = get_historical_data('51.5085', '-0.1257')

However, looking up each city's coordinates by hand is tedious. Let's try to automate this.

## 2. Automate Coordinate Extraction

The following function looks up a city's coordinates from its country code and city name.

In [4]:
# Look up a city's coordinates in the CSV file of world cities
def get_coord(country_code, city_name):
    with open('../../world_cities.csv') as file:
        # Split the file contents into lines
        lines = file.read().split('\n')

    # Parse each line into a dictionary and collect the dictionaries in a list
    dict_list = []

    for line in lines:
        line_elements = line.split(',')
        if len(line_elements) != 4:
            continue  # skip empty or malformed lines
        current_dictionary = {
            'cities_name': line_elements[1],
            'country_code': line_elements[0],
            'latitude': line_elements[2],
            'longitude': line_elements[3]
        }
        dict_list.append(current_dictionary)

    # Return the coordinates of the entry matching the given city name and country code
    for entry in dict_list:
        if entry['cities_name'] == str(city_name) and entry['country_code'] == str(country_code):
            return entry['latitude'], entry['longitude']
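
Splitting on commas works as long as no field itself contains a comma. As a hedged alternative, the sketch below uses Python's built-in `csv` module, which also handles quoted fields. It assumes the same column order the function above relies on (country code, city name, latitude, longitude); the name `find_coordinates` and the in-memory sample row are illustrative only.

```python
import csv

def find_coordinates(lines, country_code, city_name):
    """Return (latitude, longitude) as strings, or None if no row matches.

    `lines` is any iterable of CSV lines, such as an open file object.
    """
    for row in csv.reader(lines):
        if len(row) != 4:
            continue  # skip empty or malformed rows
        code, name, latitude, longitude = row
        if code == country_code and name == city_name:
            return latitude, longitude
    return None

# Illustrative in-memory sample using London's coordinates from earlier
sample = ["GB,London,51.5085,-0.1257"]
london_coords = find_coordinates(sample, "GB", "London")
```

With the real file, an open file object can be passed directly, for example `find_coordinates(file, 'GB', 'London')` inside a `with open(..., newline='') as file:` block.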


Combine the two functions:

In [5]:
def get_data(city_name, country_code):

    with open('../Data/world_cities.csv') as file:

            lines = file.read().split('\n')
        
    dict_list = []

    for line in lines:
        line_elements = line.split(',')
        if len(line_elements) != 4: 
             pass 
        else:
            current_dictionary = {
            'cities_name': line_elements[1],
            'country_code': line_elements[0],
            'latitude': line_elements[2],
            'longitude': line_elements[3]
            }
            dict_list.append(current_dictionary)
    # Find the coordinates of the requested city for use in the request below
    for entry in dict_list:
        if entry['cities_name'] == str(city_name) and entry['country_code'] == str(country_code):
            latitude = entry['latitude']
            longitude = entry['longitude']

    base_historic_url =  "https://archive-api.open-meteo.com/v1/era5?"
    params_lat_long = "latitude=" + str(latitude) + "&longitude="  + str(longitude)
    params_date = "&start_date=2023-01-01&end_date=2024-01-01"

    params_others = "&hourly=&daily=weather_code,precipitation_sum,rain_sum,showers_sum,precipitation_hours"

    final_url = base_historic_url + params_lat_long + params_date + params_others

    response = requests.get(final_url)

    data = response.json()

    raininess = {
         'Date': data['daily']['time'],
         'Weather code': data['daily']['weather_code'],
         'Precipitation sum': data['daily']['precipitation_sum'],
         'Rain sum': data['daily']['rain_sum'],
         'Shower sum': data['daily']['showers_sum'],
         'Precipitation hours': data['daily']['precipitation_hours']
    }
    return raininess
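
`get_data` repeats the bodies of the two earlier functions. One possible alternative, sketched here under assumptions, is to call them instead. The lookup and fetch functions are passed in as arguments (in this notebook they would be `get_coord` and `get_historical_data`), and a city that cannot be found raises a clear error rather than failing later. The names `relabel_daily` and `get_data_composed` are hypothetical.

```python
# Maps the API's daily field names to the labels used in this notebook
FIELD_LABELS = {
    "time": "Date",
    "weather_code": "Weather code",
    "precipitation_sum": "Precipitation sum",
    "rain_sum": "Rain sum",
    "showers_sum": "Shower sum",
    "precipitation_hours": "Precipitation hours",
}

def relabel_daily(daily):
    """Return the daily data keyed by the notebook's labels."""
    return {label: daily[field] for field, label in FIELD_LABELS.items()}

def get_data_composed(city_name, country_code, lookup, fetch):
    """Look up coordinates, fetch the daily data, and relabel it.

    `lookup(country_code, city_name)` should return (latitude, longitude)
    or None; `fetch(latitude, longitude)` should return the daily data.
    """
    coords = lookup(country_code, city_name)
    if coords is None:
        raise ValueError(f"No coordinates found for {city_name}, {country_code}")
    latitude, longitude = coords
    return relabel_daily(fetch(latitude, longitude))
```

In the notebook this would be called as `get_data_composed('London', 'GB', get_coord, get_historical_data)`.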

We can now collect London's data by passing 'London' and 'GB' to the function and saving the result to a variable:

In [6]:
london_data = get_data('London', 'GB')

Using our [randomiser](../Sampling/Randomiser.ipynb), we determined that the cities we will be comparing London with are:
- Porto Novo, CV
- Kigali, RW
- Apia, WS
- Tiraspol, MD
- Dublin, IE
- Ljubljana, SI
- Kuwait City, KW
- Bangkok, TH
- Santo Domingo, CO
- Brasiléia, BR 

In [7]:
porto_novo_data = get_data('Porto Novo', 'CV')
kigali_data = get_data('Kigali', 'RW')
apia_data = get_data('Apia', 'WS')
tiraspol_data = get_data('Tiraspol', 'MD')
dublin_data = get_data('Dublin', 'IE')
ljubljana_data = get_data('Ljubljana', 'SI')
kuwait_city_data = get_data('Kuwait City', 'KW')
bangkok_data = get_data('Bangkok', 'TH')
santo_domingo_data = get_data('Santo Domingo', 'CO')
brasileia_data = get_data('Brasiléia', 'BR')

Merging them into a *MEGA* dictionary:

In [8]:
mega_dict = {
    'London': london_data,
    'Porto Novo': porto_novo_data,
    'Kigali': kigali_data,
    'Apia': apia_data,
    'Tiraspol': tiraspol_data,
    'Dublin': dublin_data,
    'Ljubljana': ljubljana_data,
    'Kuwait City': kuwait_city_data,
    'Bangkok': bangkok_data,
    'Santo Domingo': santo_domingo_data,
    'Brasiléia': brasileia_data
}
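
Calling the function once per city and then listing every variable again in the dictionary works, but both steps could be done in a single loop. A minimal sketch, assuming a fetch function with the same signature as `get_data` (city name, then country code); `CITIES` and `collect_all` are hypothetical names.

```python
# City names and country codes as used in the calls above
CITIES = [
    ("London", "GB"),
    ("Porto Novo", "CV"),
    ("Kigali", "RW"),
    ("Apia", "WS"),
    ("Tiraspol", "MD"),
    ("Dublin", "IE"),
    ("Ljubljana", "SI"),
    ("Kuwait City", "KW"),
    ("Bangkok", "TH"),
    ("Santo Domingo", "CO"),
    ("Brasiléia", "BR"),
]

def collect_all(cities, fetch):
    """Return a dictionary mapping each city name to fetch(city, country_code)."""
    return {city: fetch(city, code) for city, code in cities}
```

In the notebook, `mega_dict = collect_all(CITIES, get_data)` would build the same dictionary as the cell above.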

## 3. Saving Collected Data

The data can then be saved as a JSON file to be further analysed:

In [9]:
with open("../Data/london_data.json", "w") as file:
    json.dump(london_data, file)

Saving all the data:

In [10]:
with open("../Data/all_data.json", "w") as file:
    json.dump(mega_dict, file)
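
By default, `json.dump` escapes non-ASCII characters, so a key such as 'Brasiléia' is written as `Brasil\u00e9ia`. This loads back correctly, but if a human-readable file is preferred, one option is `ensure_ascii=False` together with an explicit UTF-8 encoding. The sketch below writes to a temporary directory rather than `../Data`, uses a small placeholder dictionary, and the helper names `save_json` and `load_json` are hypothetical.

```python
import json
import os
import tempfile

def save_json(data, path):
    """Write data as UTF-8 JSON, keeping non-ASCII characters readable."""
    with open(path, "w", encoding="utf-8") as file:
        json.dump(data, file, ensure_ascii=False, indent=2)

def load_json(path):
    """Read a UTF-8 JSON file back into Python objects."""
    with open(path, encoding="utf-8") as file:
        return json.load(file)

# Round-trip check with a small placeholder dictionary
example = {"Brasiléia": {"Date": ["2023-01-01"]}}
with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "example.json")
    save_json(example, path)
    restored = load_json(path)
    with open(path, encoding="utf-8") as file:
        raw_text = file.read()
```

Reading the file back and comparing it with the original dictionary is a quick way to confirm nothing was lost on the way to disk.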

## Bonus
The code above can also be run in one step from this notebook using the `get_data.py` script:

In [11]:
%run get_data.py