# Electroprophet⚡️

In this notebook, I take the first steps on our final project for Le Wagon's Data Science bootcamp. The goal is to model carbon emissions in France using weather data, in order to provide recommendations that help users reduce their carbon footprint 👣.

## Getting the data

In [75]:
%load_ext autoreload
%autoreload 2

### 1. Weather API

Here we fetch the weather data for a chosen city.

As the energy data is given by region, we'll later have to check how many cities we need to take into consideration in order to get the weather for a given region.
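
One possible way to go from cities to a region, sketched below, is to average the hourly weather of a few representative cities. The helper name `regional_weather` and the toy values are illustrative assumptions, not part of the project code; the frames are assumed to have a `time` column plus numeric weather columns.

```python
import pandas as pd

def regional_weather(city_frames):
    """Average hourly weather over several cities of one region.

    city_frames: dict mapping city name -> DataFrame with a 'time' column
    plus numeric weather columns. Hypothetical helper, for illustration only.
    """
    stacked = pd.concat(list(city_frames.values()), ignore_index=True)
    # Group rows that share a timestamp and average each numeric column
    return stacked.groupby('time', as_index=False).mean(numeric_only=True)

# Toy example with made-up values for two cities and one parameter
amiens = pd.DataFrame({'time': ['2023-01-01T00:00', '2023-01-01T01:00'],
                       'temperature_2m': [2.0, 4.0]})
lille = pd.DataFrame({'time': ['2023-01-01T00:00', '2023-01-01T01:00'],
                      'temperature_2m': [4.0, 6.0]})
region_df = regional_weather({'Amiens': amiens, 'Lille': lille})
```

A plain mean is only meaningful for some parameters; columns such as `winddirection_10m` or `weathercode` would probably need a different aggregation.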

In [158]:
def get_weather(city, years=10, overwrite=False):
    
    '''
    Receives the name of a city and a number of years, and returns a dataframe
    with hourly weather data for that city over those past years.
    The data is cached in ./raw_data; set overwrite=True to download it again.
    '''
    
    import requests
    import datetime
    from dateutil.relativedelta import relativedelta
    import pandas as pd
    import os.path
    
    path = './raw_data/df_' + city.lower() + '_weather.csv'
    file_exists = os.path.isfile(path) 
    
    if file_exists and not overwrite:
        
        print('Found a file for this city. Importing...')
        
        # The file is written with index=False, so there is no index column to restore
        weather_df = pd.read_csv(path)
        
    else:
        
        print('Creating a new .csv file for this city')
        
        # First we declare the weather parameters. Here we'll be taking all params supported by the API
        weather_params = ['temperature_2m','relativehumidity_2m','dewpoint_2m',
                      'apparent_temperature','pressure_msl','surface_pressure',
                      'precipitation','rain','snowfall','cloudcover',
                      'cloudcover_low','cloudcover_mid','cloudcover_high',
                      'shortwave_radiation','direct_radiation','direct_normal_irradiance',
                      'diffuse_radiation','windspeed_10m','windspeed_100m',
                      'winddirection_10m','winddirection_100m','windgusts_10m',
                      'et0_fao_evapotranspiration','weathercode','vapor_pressure_deficit',
                      'soil_temperature_0_to_7cm','soil_temperature_7_to_28cm',
                      'soil_temperature_28_to_100cm','soil_temperature_100_to_255cm',
                      'soil_moisture_0_to_7cm','soil_moisture_7_to_28cm',
                      'soil_moisture_28_to_100cm','soil_moisture_100_to_255cm']

        # This request is done in order to get the latitude and longitude of the desired city
        city_response = requests.get('https://geocoding-api.open-meteo.com/v1/search',
                           params = {'name': city}).json()

        lat = city_response['results'][0]['latitude']
        lon = city_response['results'][0]['longitude']

        # Then we compute the dates used to get the weather data
        ## The API only has data until 9 days ago
        end_date = (datetime.date.today() - relativedelta(days=8)).strftime('%Y-%m-%d') 
        start_date = (datetime.date.today() - relativedelta(years=years)).strftime('%Y-%m-%d')

        # So we make the request to the weather API archive
        weather_response = requests.get('https://archive-api.open-meteo.com/v1/archive',
                           params = {'latitude': lat,
                                    'longitude': lon,
                                    'start_date': start_date,
                                    'end_date': end_date,
                                    'hourly': weather_params,
                                    'timezone': 'auto'}).json()

        weather_df = pd.DataFrame(weather_response['hourly'], columns = ['time'] + weather_params)
        
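        # Make sure the target folder exists before writing the cache file
        os.makedirs(os.path.dirname(path), exist_ok=True)
        weather_df.to_csv(path, index=False)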
    
    print('Done')
    return weather_df
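
The function above indexes `city_response['results'][0]` directly, so a misspelled city would fail with a rather unhelpful error if the geocoding response contains no results. A minimal sketch of a more explicit check follows; `extract_coordinates` is a hypothetical helper and the sample response is hand-written, not real API output.

```python
def extract_coordinates(city_response, city):
    """Return (latitude, longitude) of the first geocoding match.

    Hypothetical helper: city_response is the parsed JSON of the
    geocoding request made in get_weather.
    """
    # Covers both a missing 'results' key and an empty list
    results = city_response.get('results') or []
    if not results:
        raise ValueError(f"No geocoding result found for city '{city}'")
    first = results[0]
    return first['latitude'], first['longitude']

# Hand-written response with the same shape as the one used above
sample = {'results': [{'name': 'Amiens', 'latitude': 49.9, 'longitude': 2.3}]}
lat, lon = extract_coordinates(sample, 'Amiens')
```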

In [160]:
from prophecy import data

df = data.get_weather('Amiens',overwrite=True)
df = data.get_weather('Paris',overwrite=True)
df = data.get_weather('Marseille',overwrite=True)

Creating a new .csv file for Amiens
Done ✅
Creating a new .csv file for Paris
Done ✅
Creating a new .csv file for Marseille
Done ✅


## Preprocessing the weather data

First, let's explore the data and see if we find any problems.

In [144]:
df.duplicated().sum() # Checking for duplicates

0

In [145]:
df.isnull().sum().sort_values(ascending=False)/len(df) # Checking for null values

time                             0.0
vapor_pressure_deficit           0.0
windspeed_100m                   0.0
winddirection_10m                0.0
winddirection_100m               0.0
windgusts_10m                    0.0
et0_fao_evapotranspiration       0.0
weathercode                      0.0
soil_temperature_0_to_7cm        0.0
temperature_2m                   0.0
soil_temperature_7_to_28cm       0.0
soil_temperature_28_to_100cm     0.0
soil_temperature_100_to_255cm    0.0
soil_moisture_0_to_7cm           0.0
soil_moisture_7_to_28cm          0.0
soil_moisture_28_to_100cm        0.0
windspeed_10m                    0.0
diffuse_radiation                0.0
direct_normal_irradiance         0.0
direct_radiation                 0.0
shortwave_radiation              0.0
cloudcover_high                  0.0
cloudcover_mid                   0.0
cloudcover_low                   0.0
cloudcover                       0.0
snowfall                         0.0
rain                             0.0
...
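
Since the data is hourly, another check that may be worth running is whether any timestamps are missing from the series. A minimal sketch, assuming the `time` column holds ISO-formatted strings as in the API response; `missing_hours` is a hypothetical helper name and the toy frame is made up.

```python
import pandas as pd

def missing_hours(df, time_col='time'):
    """Return the hourly timestamps that are absent from df[time_col]."""
    times = pd.to_datetime(df[time_col])
    # Every hour between the first and last observation
    expected = pd.date_range(times.min(), times.max(), freq=pd.Timedelta(hours=1))
    return expected.difference(pd.DatetimeIndex(times))

# Toy frame where 02:00 is missing
toy = pd.DataFrame({'time': ['2013-03-09T00:00',
                             '2013-03-09T01:00',
                             '2013-03-09T03:00']})
gaps = missing_hours(toy)
```

An empty result means the hourly index has no holes between its first and last timestamp.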

Then we can preprocess our data.

In [146]:
from sklearn.preprocessing import MinMaxScaler

In [147]:
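scaler = MinMaxScaler()
df_preproc = df.copy()
# weather_params is local to get_weather, so rebuild the feature list here:
# every numeric column is scaled and 'time' is left untouched
features = df.select_dtypes(include='number').columns
for feature in features:
    # Fit on one column at a time so each feature is scaled to [0, 1] independently
    scaler.fit(df[[feature]])
    df_preproc[feature] = scaler.transform(df[[feature]])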
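
To make the scaling step concrete, the sketch below compares `MinMaxScaler` with the formula it applies by default, `(x - min) / (max - min)`, on a small made-up column. The toy values are assumptions for illustration only.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

toy = pd.DataFrame({'temperature_2m': [-5.0, 0.0, 10.0, 15.0]})

# Scaling done by scikit-learn (default feature_range is (0, 1))
scaled_sklearn = MinMaxScaler().fit_transform(toy[['temperature_2m']]).ravel()

# Same computation written out: (x - min) / (max - min)
col = toy['temperature_2m']
scaled_manual = ((col - col.min()) / (col.max() - col.min())).to_numpy()

print(np.allclose(scaled_sklearn, scaled_manual))
```

The minimum of each column maps to 0 and the maximum to 1, which is why the values in the preprocessed dataframe all lie between 0 and 1.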

In [149]:
df_preproc.head()

| | time | temperature_2m | relativehumidity_2m | dewpoint_2m | apparent_temperature | pressure_msl | surface_pressure | precipitation | rain | snowfall | ... | weathercode | vapor_pressure_deficit | soil_temperature_0_to_7cm | soil_temperature_7_to_28cm | soil_temperature_28_to_100cm | soil_temperature_100_to_255cm | soil_moisture_0_to_7cm | soil_moisture_7_to_28cm | soil_moisture_28_to_100cm | soil_moisture_100_to_255cm |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2013-03-09T00:00 | 0.334239 | 0.87013 | 0.547826 | 0.370722 | 0.423932 | 0.427574 | 0.0 | 0.0 | 0.0 | ... | 0.013333 | 0.034759 | 0.24282 | 0.275362 | 0.082474 | 0.022388 | 0.870968 | 0.862543 | 0.69962 | 0.430556 |
| 1 | 2013-03-09T01:00 | 0.358696 | 0.805195 | 0.547826 | 0.346008 | 0.415385 | 0.418848 | 0.0 | 0.0 | 0.0 | ... | 0.013333 | 0.053476 | 0.221932 | 0.271739 | 0.082474 | 0.022388 | 0.867742 | 0.862543 | 0.69962 | 0.430556 |
| 2 | 2013-03-09T02:00 | 0.36413 | 0.805195 | 0.553623 | 0.315589 | 0.415385 | 0.417103 | 0.0 | 0.0 | 0.0 | ... | 0.026667 | 0.05615 | 0.206266 | 0.264493 | 0.082474 | 0.022388 | 0.864516 | 0.862543 | 0.69962 | 0.430556 |
| 3 | 2013-03-09T03:00 | 0.277174 | 0.922078 | 0.507246 | 0.307985 | 0.403419 | 0.404887 | 0.0 | 0.0 | 0.0 | ... | 0.026667 | 0.018717 | 0.198433 | 0.257246 | 0.082474 | 0.022388 | 0.864516 | 0.862543 | 0.69962 | 0.430556 |
| 4 | 2013-03-09T04:00 | 0.277174 | 0.948052 | 0.515942 | 0.302281 | 0.388034 | 0.38918 | 0.008475 | 0.008475 | 0.0 | ... | 0.68 | 0.010695 | 0.203655 | 0.25 | 0.082474 | 0.022388 | 0.867742 | 0.859107 | 0.69962 | 0.430556 |
