# Data exploration

In this notebook, you will explore the data and complete the following tasks:
- **Univariate analysis**
    - study the `consommation_totale` time series in terms of trend, cycle, seasonality, and stationarity
    - **Tips**: you can analyse the ACF and PACF plots and perform an Augmented Dickey-Fuller test
- **Multivariate analysis**
    - study the other variables and extract insights
    - study the correlation between `consommation_totale` and the other variables of the dataset
    - **Tips**: Pearson correlation
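
As a minimal sketch of the Pearson tip above, the snippet below computes a pairwise Pearson correlation matrix with `DataFrame.corr`. The toy frame and its `temperature` column are made up for illustration; only the name `consommation_totale` comes from this notebook.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: `temperature` is an assumed companion variable,
# not a column of the notebook's dataset.
rng = np.random.default_rng(0)
temperature = rng.normal(loc=12.0, scale=6.0, size=200)
toy = pd.DataFrame({
    "temperature": temperature,
    # Consumption built to decrease with temperature, plus noise.
    "consommation_totale": 5e10 - 1e9 * temperature + rng.normal(0.0, 2e9, size=200),
})

# Pairwise Pearson correlation matrix (every value lies in [-1, 1]).
corr_matrix = toy.corr(method="pearson")
pearson_r = corr_matrix.loc["temperature", "consommation_totale"]
print(corr_matrix)
```

A coefficient close to -1 or 1 indicates a strong linear relationship; a value near 0 only rules out a linear one, not dependence in general.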

# Import packages

In [None]:
import requests
import pandas as pd
import os
from google.colab import drive

In [None]:
drive.mount('/content/gdrive')
if os.getcwd() != "/content/gdrive/MyDrive/Colab Notebooks/EI_TS_CS":
  os.chdir("/content/gdrive/MyDrive/Colab Notebooks/EI_TS_CS")

In [511]:
%run ./utils.ipynb

# Import data

Request

In [513]:
URL = "https://data.enedis.fr/api/explore/v2.0/catalog/datasets/bilan-electrique-demi-heure/exports/json"

PARAMS = {
        "limit" : -1,
        "sort" : "horodate"
    }

FEATURE = "consommation_totale"

DATA_PATH = "data/bilan-electrique.csv"

In [514]:
# Download the dataset only once, then reuse the cached CSV.
if not os.path.isfile(DATA_PATH):
    req = requests.get(URL, params=PARAMS).json()
    df = pd.json_normalize(req)
    df.to_csv(DATA_PATH)
df_full = pd.read_csv(DATA_PATH)

In [515]:
df.head()

Unnamed: 0,horodate,consommation_totale,date,year
0,2018-05-12 22:00:00,33736900000.0,2018-05-12,2018
1,2018-05-12 22:30:00,31183630000.0,2018-05-12,2018
2,2018-05-12 23:00:00,30245520000.0,2018-05-12,2018
3,2018-05-12 23:30:00,29533030000.0,2018-05-12,2018
4,2018-05-13 00:00:00,28996320000.0,2018-05-13,2018


Preprocess data

In [516]:
columns = ["horodate", FEATURE]

In [517]:
# Keep the first 19 characters (YYYY-MM-DDTHH:MM:SS) and parse them as datetimes.
df_full["horodate"] = pd.to_datetime(df_full["horodate"].str[:19],
               format='%Y-%m-%dT%H:%M:%S')

df = df_full[columns]

# Data exploration

Plot time series

In [518]:
fig = px.line(
    df,
    x="horodate",
    y=FEATURE,
    title='Evolution of Total Consumption',
    width=600
    )

fig.update_layout(
    yaxis_title="Total Consumption (W)", xaxis_title="Date"
)

fig.show()

Group data at day level

In [519]:
# assign returns a new DataFrame, which avoids pandas' SettingWithCopyWarning.
df = df.assign(date=df["horodate"].dt.date)
df_day = df.groupby("date")[FEATURE].mean().reset_index()

In [520]:
fig = px.line(
    df_day,
    x="date",
    y=FEATURE,
    title='Total consumption'
    )
fig.show()
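
One possible way to read trend and weekly seasonality off a daily series is a centered rolling mean combined with a day-of-week profile. The sketch below uses synthetic data; `toy_day` is a hypothetical stand-in that assumes the same two columns as `df_day` (`date` and `consommation_totale`).

```python
import numpy as np
import pandas as pd

# Synthetic daily series standing in for df_day (assumed columns:
# "date" and "consommation_totale").
dates = pd.date_range("2021-01-04", periods=8 * 7, freq="D")  # starts on a Monday
weekend_effect = np.where(dates.dayofweek >= 5, -5.0, 0.0)    # lower on weekends
toy_day = pd.DataFrame({
    "date": dates,
    "consommation_totale": 100.0 + 0.1 * np.arange(len(dates)) + weekend_effect,
})

series = toy_day.set_index("date")["consommation_totale"]

# A centered 7-day rolling mean averages out the weekly cycle and exposes the trend.
trend = series.rolling(window=7, center=True).mean()

# Mean by day of week (0 = Monday ... 6 = Sunday) exposes the weekly pattern.
weekday_profile = series.groupby(series.index.dayofweek).mean()
print(weekday_profile)
```

The 7-day window is chosen to match the weekly cycle; a yearly cycle would need a much longer window and several years of data.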

Plot correlation diagrams

In [521]:
create_corr_plot(df_day[FEATURE])

In [522]:
create_corr_plot(df_day[FEATURE], plot_pacf=True)
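
`create_corr_plot` comes from `utils.ipynb`, which is not shown here. For reference, the sample autocorrelation that an ACF plot displays can be computed directly. The helper name `sample_acf` is hypothetical; this is a sketch of the standard biased estimator, not necessarily what the notebook's helper does.

```python
import numpy as np

def sample_acf(values, nlags):
    """Sample autocorrelation for lags 0..nlags (biased estimator)."""
    x = np.asarray(values, dtype=float)
    x = x - x.mean()
    denom = np.dot(x, x)
    # Lag-k autocovariance divided by the lag-0 autocovariance.
    return np.array(
        [np.dot(x[: len(x) - k], x[k:]) / denom for k in range(nlags + 1)]
    )

# Synthetic series with a 7-step period, mimicking a weekly cycle in daily data.
t = np.arange(140)
toy = np.sin(2 * np.pi * t / 7)
acf_values = sample_acf(toy, nlags=14)
print(np.round(acf_values, 2))
```

A peak at lag 7 (and its multiples) is the signature of a weekly pattern; the same reading applies to the ACF of the daily consumption series.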

# Add your own data exploration here