**Topic: Weather**

In this notebook we extract weather data from https://www.meteomatics.com/ using the basic, non-commercial subscription.

We use an endpoint with the following structure: api.meteomatics.com/validdatetime/parameters/locations/format?optionals, authenticated with a username and password.
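To make the structure concrete, here is a minimal sketch that assembles such a URL from its parts. The helper name `build_url` and its argument names are illustrative assumptions, not part of the Meteomatics API, and the optional query string is left out.

```python
def build_url(valid_datetime, parameters, location, fmt,
              base="https://api.meteomatics.com"):
    """Join the path segments: validdatetime/parameters/locations/format."""
    return f"{base}/{valid_datetime}/{','.join(parameters)}/{location}/{fmt}"


# Example values: an hourly series of three parameters for one point, as CSV
example_url = build_url(
    valid_datetime="2024-08-30T00:00:00Z--2024-09-03T00:00:00Z:PT1H",
    parameters=["t_2m:C", "precip_1h:mm", "wind_speed_10m:ms"],
    location="52.520551,13.461804",
    fmt="csv",
)
print(example_url)
```

Keeping the parts separate makes it easy to change one of them, for example the date range, without retyping the whole URL.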

Only a small number of records is retrieved, far too few for real data research; however, the goal here is to showcase the process itself.


In [15]:
# let us install a couple of libraries
!pip install requests  
!pip install python-dotenv 



In [16]:
# Import the libraries we need
import requests
import pandas as pd
from dotenv import load_dotenv
import os
import io
import numpy as np


Here we manually create a .env file and add the following parameters to it: METEOMATICS_USERNAME, METEOMATICS_PASSWORD, API_URL.
As API_URL we use "https://api.meteomatics.com/2024-08-30T00:00:00Z--2024-09-03T00:00:00Z:PT1H/t_2m:C,precip_1h:mm,wind_speed_10m:ms/52.520551,13.461804/csv"

In the above example we have:

* 2024-08-30T00:00:00Z--2024-09-03T00:00:00Z - date range (start and end, in UTC)

* PT1H - time step, which is one hour in this case

* t_2m:C - temperature in degrees Celsius, 2 meters above ground level

* precip_1h:mm - precipitation accumulated over the previous hour, in mm

* wind_speed_10m:ms - wind speed in m/s, 10 meters above ground level

* 52.520551,13.461804 - latitude and longitude; this is the central part of Berlin, Germany

* csv - format for downloading
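The date range and the time step together determine how many rows to expect. A minimal sketch of that arithmetic, assuming the response includes both the start and the end timestamp (the function name `expected_rows` is hypothetical):

```python
from datetime import datetime, timedelta, timezone


def expected_rows(start, end, step):
    """Count the timestamps from start to end (both included) at a fixed step."""
    return (end - start) // step + 1


start = datetime(2024, 8, 30, tzinfo=timezone.utc)
end = datetime(2024, 9, 3, tzinfo=timezone.utc)

# Four days of hourly steps give 96 intervals, plus the starting timestamp
row_count = expected_rows(start, end, timedelta(hours=1))
print(row_count)  # 97
```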


To change the date interval, we edit it in .env, restart the Jupyter notebook kernel, and re-run the cells so that the updated data is retrieved.


In [17]:
# Load environment variables from .env file
load_dotenv() 

# Retrieve credentials from environment variables
username = os.getenv("METEOMATICS_USERNAME")
password = os.getenv("METEOMATICS_PASSWORD")
url = os.getenv("API_URL")

# we print the result to be sure parameters are fetched correctly
print(f"Loaded URL from .env: {url}")

Loaded URL from .env: https://api.meteomatics.com/2024-08-30T00:00:00Z--2024-09-03T00:00:00Z:PT1H/t_2m:C,precip_1h:mm,wind_speed_10m:ms/52.520551,13.461804/csv


In [18]:
# Send a parameterized GET request to the endpoint
response = requests.get(url, auth = (username, password)) 
# Check that something was retrieved; this prints all records for the trial period
print(response.content) 

b'validdate;t_2m:C;precip_1h:mm;wind_speed_10m:ms\n2024-08-30T00:00:00Z;22.2;0.00;0.2\n2024-08-30T01:00:00Z;20.8;0.00;0.3\n2024-08-30T02:00:00Z;20.7;0.00;1.3\n2024-08-30T03:00:00Z;18.7;0.00;2.1\n2024-08-30T04:00:00Z;19.4;0.00;1.4\n2024-08-30T05:00:00Z;21.1;0.00;4.0\n2024-08-30T06:00:00Z;22.7;0.00;4.1\n2024-08-30T07:00:00Z;23.7;0.00;4.6\n2024-08-30T08:00:00Z;24.0;0.08;7.9\n2024-08-30T09:00:00Z;25.7;0.00;7.1\n2024-08-30T10:00:00Z;27.4;0.00;5.0\n2024-08-30T11:00:00Z;28.6;0.00;3.0\n2024-08-30T12:00:00Z;28.7;0.00;4.8\n2024-08-30T13:00:00Z;29.0;0.00;5.7\n2024-08-30T14:00:00Z;26.5;0.00;6.8\n2024-08-30T15:00:00Z;25.6;0.00;6.7\n2024-08-30T16:00:00Z;25.1;0.00;5.8\n2024-08-30T17:00:00Z;23.7;0.00;6.0\n2024-08-30T18:00:00Z;22.1;0.00;4.9\n2024-08-30T19:00:00Z;21.1;0.00;5.9\n2024-08-30T20:00:00Z;20.4;0.02;5.9\n2024-08-30T21:00:00Z;19.6;0.05;4.0\n2024-08-30T22:00:00Z;17.8;0.00;6.0\n2024-08-30T23:00:00Z;16.7;0.00;5.0\n2024-08-31T00:00:00Z;15.7;0.68;5.0\n2024-08-31T01:00:00Z;15.2;0.00;3.1\n2024-08-31T02

When we send a GET request to the server, we need to know how it went. For this purpose the server returns an HTTP status code.

In our case we expect 200; this is our happy path. However, something else could be returned, e.g. 400, 401, 403 or 500.

Therefore, we check the status code inside the try / except construction below.
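For reference, the codes mentioned above can be turned into readable messages with the standard library alone. This is a minimal sketch; the helper `describe_status` is a hypothetical name and is not used elsewhere in the notebook.

```python
from http import HTTPStatus


def describe_status(code):
    """Return a short, human-readable description of an HTTP status code."""
    try:
        status = HTTPStatus(code)
    except ValueError:
        # The code is not a registered HTTP status
        return f"{code}: unknown status code"
    outcome = "success" if 200 <= code < 300 else "failure"
    return f"{code} {status.phrase} ({outcome})"


for code in (200, 400, 401, 403, 500):
    print(describe_status(code))
```

Only the 2xx range counts as success here; 4xx codes point to a problem with the request or the credentials, and 5xx codes to a problem on the server side.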

In [19]:
try:
    if response.status_code == 200:
        # read_csv expects a file-like object, so we wrap the response text in StringIO
        csv_data = io.StringIO(response.text)
        # Parse with ';' as the delimiter; skip the original header row and rename the columns
        df = pd.read_csv(csv_data, sep=';', names=['Validdate', 'Temperature_Celsius', 'Precipitation_mm', 'Wind_Speed_ms'], skiprows=1)
        # Check what it looks like
        print(df.head(10))
    else:
        # Any other status code means the request failed; we print the code so our beloved developer can understand what happened
        print(f"Error: {response.status_code}")
except (pd.errors.ParserError, pd.errors.EmptyDataError) as exc:
    # The request succeeded, but the body could not be parsed as CSV
    print(f"Could not parse the response: {exc}")

              Validdate  Temperature_Celsius  Precipitation_mm  Wind_Speed_ms
0  2024-08-30T00:00:00Z                 22.2              0.00            0.2
1  2024-08-30T01:00:00Z                 20.8              0.00            0.3
2  2024-08-30T02:00:00Z                 20.7              0.00            1.3
3  2024-08-30T03:00:00Z                 18.7              0.00            2.1
4  2024-08-30T04:00:00Z                 19.4              0.00            1.4
5  2024-08-30T05:00:00Z                 21.1              0.00            4.0
6  2024-08-30T06:00:00Z                 22.7              0.00            4.1
7  2024-08-30T07:00:00Z                 23.7              0.00            4.6
8  2024-08-30T08:00:00Z                 24.0              0.08            7.9
9  2024-08-30T09:00:00Z                 25.7              0.00            7.1


As you can see above, we got a dataframe. It consists of four columns.

Now let us work with the columns:
* we break the 'Validdate' column down into new columns;
* we add new columns (feature engineering).

For each record we do the following:
* we take the part of 'Validdate' before 'T' (the date) and the part between 'T' and 'Z' (the time) and put them into separate columns;
* once this is done, we remove the 'Validdate' column.
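On a single timestamp, the split can be sketched in plain Python. The helper `split_validdate` is hypothetical; the notebook itself applies the same idea to the whole column with pandas string methods.

```python
def split_validdate(validdate):
    """Split an ISO timestamp such as '2024-08-30T08:00:00Z' into date and time."""
    date_part, time_part = validdate.split("T")
    # Drop the trailing 'Z', which only marks the value as UTC
    time_part = time_part.rstrip("Z")
    return date_part, time_part


date_part, time_part = split_validdate("2024-08-30T08:00:00Z")
hour = int(time_part.split(":")[0])
print(date_part, time_part, hour)
```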


In [20]:
# Add a Date column: take the part before 'T' and parse it as a date
df['Date'] = df['Validdate'].str.split('T').str[0]
df['Date'] = pd.to_datetime(df['Date'], format='%Y-%m-%d')
# Keep only the part after 'T' (the time) in the 'Validdate' column
df['Validdate'] = df['Validdate'].str.split('T').str[1]

# Check if it looks nice
df.head(10)

   Validdate  Temperature_Celsius  Precipitation_mm  Wind_Speed_ms        Date
0  00:00:00Z                 22.2              0.00            0.2  2024-08-30
1  01:00:00Z                 20.8              0.00            0.3  2024-08-30
2  02:00:00Z                 20.7              0.00            1.3  2024-08-30
3  03:00:00Z                 18.7              0.00            2.1  2024-08-30
4  04:00:00Z                 19.4              0.00            1.4  2024-08-30
5  05:00:00Z                 21.1              0.00            4.0  2024-08-30
6  06:00:00Z                 22.7              0.00            4.1  2024-08-30
7  07:00:00Z                 23.7              0.00            4.6  2024-08-30
8  08:00:00Z                 24.0              0.08            7.9  2024-08-30
9  09:00:00Z                 25.7              0.00            7.1  2024-08-30


In [21]:
# Add a Time (Hours) column; we keep only the hour, since we fetch one record per hour, which is more user friendly
# Here we extract the fragment after 'T' and before 'Z'
df['Time (Hours)'] = df['Validdate'].str.split('Z').str[0]

# We parse the extracted fragment as a time and keep only the hour
df['Time (Hours)'] = pd.to_datetime(df['Time (Hours)'], format='%H:%M:%S').dt.hour

# Let us check what we got
df.head(10)

   Validdate  Temperature_Celsius  Precipitation_mm  Wind_Speed_ms        Date  Time (Hours)
0  00:00:00Z                 22.2              0.00            0.2  2024-08-30             0
1  01:00:00Z                 20.8              0.00            0.3  2024-08-30             1
2  02:00:00Z                 20.7              0.00            1.3  2024-08-30             2
3  03:00:00Z                 18.7              0.00            2.1  2024-08-30             3
4  04:00:00Z                 19.4              0.00            1.4  2024-08-30             4
5  05:00:00Z                 21.1              0.00            4.0  2024-08-30             5
6  06:00:00Z                 22.7              0.00            4.1  2024-08-30             6
7  07:00:00Z                 23.7              0.00            4.6  2024-08-30             7
8  08:00:00Z                 24.0              0.08            7.9  2024-08-30             8
9  09:00:00Z                 25.7              0.00            7.1  2024-08-30             9


In [22]:
# Now let us get rid of the 'Validdate' column; its content now lives in 'Date' and 'Time (Hours)'
df.drop(columns=['Validdate'], inplace=True)

# let us check
df.head(10)

   Temperature_Celsius  Precipitation_mm  Wind_Speed_ms        Date  Time (Hours)
0                 22.2              0.00            0.2  2024-08-30             0
1                 20.8              0.00            0.3  2024-08-30             1
2                 20.7              0.00            1.3  2024-08-30             2
3                 18.7              0.00            2.1  2024-08-30             3
4                 19.4              0.00            1.4  2024-08-30             4


Let us break 'Date' down into 'Month', 'Year' and 'Day'; this is more convenient for filtering and grouping when we start to explore the data, e.g. with visualizations.

Let us also add the name of the day, which is more user friendly.

Note: we do not remove the 'Date' column.

In [26]:
df['Month'] = df['Date'].dt.strftime('%B') # month in the form of a name
df['Year'] = df['Date'].dt.year # year
df['Day'] = df['Date'].dt.day # day
df['Day_Name'] = df['Date'].dt.day_name() # name of the day, i.e. Friday
df.head(10)

   Temperature_Celsius  Precipitation_mm  Wind_Speed_ms        Date  Time (Hours)   Month  Year  Day  Day_Name
0                 22.2              0.00            0.2  2024-08-30             0  August  2024   30    Friday
1                 20.8              0.00            0.3  2024-08-30             1  August  2024   30    Friday
2                 20.7              0.00            1.3  2024-08-30             2  August  2024   30    Friday
3                 18.7              0.00            2.1  2024-08-30             3  August  2024   30    Friday
4                 19.4              0.00            1.4  2024-08-30             4  August  2024   30    Friday
5                 21.1              0.00            4.0  2024-08-30             5  August  2024   30    Friday
6                 22.7              0.00            4.1  2024-08-30             6  August  2024   30    Friday
7                 23.7              0.00            4.6  2024-08-30             7  August  2024   30    Friday
8                 24.0              0.08            7.9  2024-08-30             8  August  2024   30    Friday
9                 25.7              0.00            7.1  2024-08-30             9  August  2024   30    Friday
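With the date broken down into separate columns, grouping becomes straightforward. A minimal sketch, assuming a small hand-made frame named `sample` that reuses the notebook's column names; the four temperatures are copied from the raw API response shown earlier.

```python
import pandas as pd

# Hand-made sample: the first two hours of 2024-08-30 and of 2024-08-31
sample = pd.DataFrame({
    "Date": pd.to_datetime(["2024-08-30", "2024-08-30", "2024-08-31", "2024-08-31"]),
    "Time (Hours)": [0, 1, 0, 1],
    "Temperature_Celsius": [22.2, 20.8, 15.7, 15.2],
})

# Same derived column as in the notebook
sample["Day_Name"] = sample["Date"].dt.day_name()

# Mean temperature per day name
mean_by_day = sample.groupby("Day_Name")["Temperature_Celsius"].mean()
print(mean_by_day)
```

The same pattern works for 'Month', 'Year' or 'Time (Hours)', for example to compare the average temperature at each hour of the day.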
