# Electroprophet⚡️

In this notebook, I will do the first steps on our final project for Le Wagon's Data Science bootcamp. Here, I'll try to model the carbon emission in France using weather data, in order to provide recomendations for users to reduce their carbon footprint 👣.

## Getting the data

In [1]:
%load_ext autoreload
%autoreload 2

### 1. Wheater API

Here we'll be able to get weather data of a chosen city. 

As the energy data is given by region, later we'll have to check how many cities we have to take in consideration in order to get the weather in a given region.

In [110]:
import requests
import datetime
from dateutil.relativedelta import relativedelta
import pandas as pd
import os.path

# Get the absolute path of the directory where the script is located
script_dir = os.getcwd()

# Construct the path to the "my_directory" directory relative to the script directory
raw_data_path = os.path.join(script_dir, "raw_data")

In [120]:
def get_weather(city, years=10, overwrite=False):
    
    '''
    This function receives the name of a city and a number of years, and returns a dataframe 
    with weather data from this city during those past years
    '''
    
    path = raw_data_path + '/df_' + city.lower() + '_weather.csv'
    file_exists = os.path.isfile(path) 
    
    if file_exists and not overwrite:
        
        print('Found a file for this city. Importing...')
        
        weather_df = pd.read_csv(path, index_col=0)
        
    else:
        
        print('Creating a new .csv file for this city')
        
        # First we declare the weather parameters. Here we'll be taking all params supported by the API
        weather_params = ['temperature_2m','relativehumidity_2m','dewpoint_2m',
                      'apparent_temperature','pressure_msl','surface_pressure',
                      'precipitation','rain','snowfall','cloudcover',
                      'cloudcover_low','cloudcover_mid','cloudcover_high',
                      'shortwave_radiation','direct_radiation','direct_normal_irradiance',
                      'diffuse_radiation','windspeed_10m','windspeed_100m',
                      'winddirection_10m','winddirection_100m','windgusts_10m',
                      'et0_fao_evapotranspiration','weathercode','vapor_pressure_deficit',
                      'soil_temperature_0_to_7cm','soil_temperature_7_to_28cm',
                      'soil_temperature_28_to_100cm','soil_temperature_100_to_255cm',
                      'soil_moisture_0_to_7cm','soil_moisture_7_to_28cm',
                      'soil_moisture_28_to_100cm','soil_moisture_100_to_255cm']

        # This request is done in order to get the latitude and longitude of the desired city
        city_response = requests.get('https://geocoding-api.open-meteo.com/v1/search',
                           params = {'name': city}).json()

        lat = city_response['results'][0]['latitude']
        lon = city_response['results'][0]['longitude']

        # Then we compute the dates used to get the weather data
        ## The API only has data until 9 days ago
        end_date = (datetime.date.today() - relativedelta(days=8)).strftime('%Y-%m-%d') 
        #start_date = (datetime.date.today() - relativedelta(years=years)).strftime('%Y-%m-%d')
        start_date = '2013-01-01'

        # So we make the request to the weather API archive
        weather_response = requests.get('https://archive-api.open-meteo.com/v1/archive',
                           params = {'latitude': lat,
                                    'longitude': lon,
                                    'start_date': start_date,
                                    'end_date': end_date,
                                    'hourly': weather_params,
                                    'timezone': 'auto'}).json()
        
        weather_df = pd.DataFrame(weather_response['hourly'], columns = ['time'] + weather_params)
        weather_df['time'] = pd.to_datetime(weather_df['time'], format='%Y-%m-%d')
        weather_df = weather_df.set_index('time')
        
        weather_df.to_csv(path)
    
    print('Done ✅')
    return weather_df

In [158]:
from prophecy import data

data.get_weather('Amiens')
#df = data.get_weather('Paris',overwrite=True)
#df = data.get_weather('Marseille',overwrite=True)

/home/caiodamasceno/code/Johnny4good/electroprophet/raw_data/df_amiens_weather.csv
Found a file for Amiens. Importing...
Done ✅


Unnamed: 0_level_0,temperature_2m,relativehumidity_2m,dewpoint_2m,apparent_temperature,pressure_msl,surface_pressure,precipitation,rain,snowfall,cloudcover,...,weathercode,vapor_pressure_deficit,soil_temperature_0_to_7cm,soil_temperature_7_to_28cm,soil_temperature_28_to_100cm,soil_temperature_100_to_255cm,soil_moisture_0_to_7cm,soil_moisture_7_to_28cm,soil_moisture_28_to_100cm,soil_moisture_100_to_255cm
time,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1
2013-01-01 00:00:00,8.5,87,6.4,3.5,1003.8,990.8,0.9,0.9,0.0,100,...,53,0.15,7.8,8.1,8.4,9.8,0.430,0.399,0.407,0.366
2013-01-01 01:00:00,8.3,88,6.4,3.8,1003.3,990.3,0.7,0.7,0.0,100,...,53,0.13,7.7,8.1,8.4,9.8,0.429,0.402,0.407,0.366
2013-01-01 02:00:00,8.3,89,6.7,4.5,1002.9,989.9,0.6,0.6,0.0,100,...,53,0.12,7.7,8.1,8.4,9.8,0.428,0.404,0.407,0.366
2013-01-01 03:00:00,8.3,92,7.0,5.1,1002.6,989.6,0.5,0.5,0.0,100,...,53,0.09,7.8,8.1,8.4,9.8,0.428,0.407,0.407,0.366
2013-01-01 04:00:00,8.3,92,7.2,5.3,1002.8,989.8,0.7,0.7,0.0,100,...,53,0.08,7.8,8.1,8.4,9.8,0.430,0.409,0.407,0.366
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
2021-08-09 07:00:00,14.3,86,11.9,13.1,1014.5,1001.6,0.2,0.2,0.0,100,...,51,0.23,14.9,16.2,16.5,14.4,0.376,0.374,0.315,0.355
2021-08-09 08:00:00,15.1,92,13.8,13.8,1014.9,1002.0,0.8,0.8,0.0,100,...,53,0.13,15.4,16.2,16.5,14.4,0.386,0.374,0.315,0.355
2021-08-09 09:00:00,15.7,91,14.2,13.9,1015.0,1002.1,0.4,0.4,0.0,100,...,51,0.16,15.8,16.1,16.5,14.4,0.390,0.374,0.315,0.355
2021-08-09 10:00:00,16.5,90,14.9,15.5,1015.2,1002.4,0.1,0.1,0.0,98,...,51,0.18,16.2,16.2,16.5,14.4,0.390,0.374,0.315,0.355


In [43]:
import pandas as pd
pd.to_datetime(df.index, format='%Y-%m-%d')

DatetimeIndex(['2013-03-09 00:00:00', '2013-03-09 01:00:00',
               '2013-03-09 02:00:00', '2013-03-09 03:00:00',
               '2013-03-09 04:00:00', '2013-03-09 05:00:00',
               '2013-03-09 06:00:00', '2013-03-09 07:00:00',
               '2013-03-09 08:00:00', '2013-03-09 09:00:00',
               ...
               '2023-03-01 14:00:00', '2023-03-01 15:00:00',
               '2023-03-01 16:00:00', '2023-03-01 17:00:00',
               '2023-03-01 18:00:00', '2023-03-01 19:00:00',
               '2023-03-01 20:00:00', '2023-03-01 21:00:00',
               '2023-03-01 22:00:00', '2023-03-01 23:00:00'],
              dtype='datetime64[ns]', name='time', length=87480, freq=None)

In [54]:
import datetime

print(datetime.datetime.strptime('2013-01-01','%Y-%m-%d'))

2013-01-01 00:00:00


## Preprocessing the weather data

First let's explore and see if we find any problems

In [5]:
df.duplicated().sum() # Checking for duplicates

0

In [None]:
df.isnull().sum().sort_values(ascending=False)/len(df) # Checking for null values

In [7]:
weather_params = ['temperature_2m','relativehumidity_2m','dewpoint_2m',
                      'apparent_temperature','pressure_msl','surface_pressure',
                      'precipitation','rain','snowfall','cloudcover',
                      'cloudcover_low','cloudcover_mid','cloudcover_high',
                      'shortwave_radiation','direct_radiation','direct_normal_irradiance',
                      'diffuse_radiation','windspeed_10m','windspeed_100m',
                      'winddirection_10m','winddirection_100m','windgusts_10m',
                      'et0_fao_evapotranspiration','weathercode','vapor_pressure_deficit',
                      'soil_temperature_0_to_7cm','soil_temperature_7_to_28cm',
                      'soil_temperature_28_to_100cm','soil_temperature_100_to_255cm',
                      'soil_moisture_0_to_7cm','soil_moisture_7_to_28cm',
                      'soil_moisture_28_to_100cm','soil_moisture_100_to_255cm']

In [97]:
import matplotlib.pyplot as plt
import seaborn as sns

Then we can preprocess our data

### 1. MinMax Scaling

In [89]:
from sklearn.preprocessing import MinMaxScaler

In [90]:
scaler = MinMaxScaler()
df_preproc = df.copy()
for feature in weather_params:
    scaler.fit(df[[feature]])
    df_preproc[feature] = scaler.transform(df[[feature]])

In [91]:
df_preproc.head()

Unnamed: 0_level_0,temperature_2m,relativehumidity_2m,dewpoint_2m,apparent_temperature,pressure_msl,surface_pressure,precipitation,rain,snowfall,cloudcover,...,weathercode,vapor_pressure_deficit,soil_temperature_0_to_7cm,soil_temperature_7_to_28cm,soil_temperature_28_to_100cm,soil_temperature_100_to_255cm,soil_moisture_0_to_7cm,soil_moisture_7_to_28cm,soil_moisture_28_to_100cm,soil_moisture_100_to_255cm
time,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1
2013-03-09T00:00,0.409804,0.963855,0.59466,0.401724,0.38191,0.383929,0.0,0.0,0.0,1.0,...,0.04,0.005525,0.244737,0.260563,0.134021,0.046512,0.863946,0.768421,0.822785,0.917647
2013-03-09T01:00,0.409804,0.963855,0.59466,0.408621,0.386935,0.389031,0.0,0.0,0.0,1.0,...,0.04,0.005525,0.242105,0.257042,0.139175,0.046512,0.857143,0.77193,0.822785,0.917647
2013-03-09T02:00,0.409804,0.963855,0.59466,0.415517,0.393216,0.395408,0.0,0.0,0.0,1.0,...,0.04,0.005525,0.242105,0.257042,0.139175,0.046512,0.853741,0.77193,0.822785,0.917647
2013-03-09T03:00,0.409804,0.963855,0.592233,0.422414,0.395729,0.397959,0.0,0.0,0.0,1.0,...,0.04,0.007366,0.239474,0.257042,0.139175,0.046512,0.846939,0.775439,0.822785,0.917647
2013-03-09T04:00,0.407843,0.963855,0.589806,0.424138,0.394472,0.396684,0.0,0.0,0.0,1.0,...,0.04,0.007366,0.236842,0.253521,0.139175,0.046512,0.840136,0.775439,0.822785,0.917647


In [None]:
import matplotlib.pyplot as plt
import seaborn as sns

index = 0

for index in enumerate(weather_params):
    index=index[0]
    fig, axes = plt.subplots(nrows=1, ncols=4, figsize=(12,3))

    sns.histplot(df[weather_params[index]], bins=200,kde = True, ax=axes[0]);
    sns.boxplot(data=df, x=weather_params[index], ax=axes[1]);
    sns.histplot(df_preproc[weather_params[index]], bins=200,kde = True, ax=axes[2], color='orange');
    sns.boxplot(data=df_preproc, x=weather_params[index], ax=axes[3], color='orange');

### 2. Robust Scaler

In [None]:
from sklearn.preprocessing import RobustScaler


rb_scaler = RobustScaler() 
df_preproc2 = df.copy()
for feature in weather_params:
    rb_scaler.fit(df[[feature]])
    df_preproc2[feature] = rb_scaler.transform(df[[feature]])

rb_scaler.fit(data[['GrLivArea']]) 

data['GrLivArea'] = rb_scaler.transform(data[['GrLivArea']]) 


In [None]:
df_weather = data.get_weather('Amiens')
df_weather

In [None]:
df_merge = data.merge_weather_energy_df('Amiens', 'Hauts-de-France', 'eolien')
df_merge

/home/caiodamasceno/code/Johnny4good/electroprophet/raw_data/df_amiens_weather.csv
Found a file for Amiens. Importing...
Done ✅
Found a file for Hauts-de-France. Importing...


In [187]:
prod_df = data.get_energy_production(-1, 0, 'Hauts-de-France')

Found a file for Hauts-de-France. Importing...
Done ✅


In [186]:
prod_df.columns

Index(['time', 'code_insee_region', 'libelle_region', 'nature', 'date',
       'heure', 'date_heure', 'consommation', 'thermique', 'nucleaire',
       'eolien', 'solaire', 'hydraulique', 'pompage', 'bioenergies',
       'ech_physiques', 'stockage_batterie', 'destockage_batterie',
       'eolien_terrestre', 'eolien_offshore', 'tco_thermique', 'tch_thermique',
       'tco_nucleaire', 'tch_nucleaire', 'tco_eolien', 'tch_eolien',
       'tco_solaire', 'tch_solaire', 'tco_hydraulique', 'tch_hydraulique',
       'tco_bioenergies', 'tch_bioenergies'],
      dtype='object')