# Assignment - Time Series Analysis of London Energy Consumption

KATE expects your code to define variables with specific names that correspond to certain things we are interested in.

KATE will run your notebook from top to bottom and check the latest value of those variables, so make sure you don't overwrite them.

* Remember to uncomment the line assigning the variable to your answer and don't change the variable or function names.
* Use copies of the original or previous DataFrames to make sure you do not overwrite them by mistake.

You will find instructions below about how to define each variable.

Once you're happy with your code, upload your notebook to KATE to check your feedback.

A time series from smart meters in London is provided, covering the end of 2011 to 2014. The data contains the daily consumption (kWh) averaged over several households in London.

The last timestamp for which consumption is available is January 31st 2014.

The aim of this assignment is to build a forecasting model that predicts the consumption for February 2014 (excluding the 28th, so from the 1st to the 27th).

## Get the data
Load the dataset using `pd.read_csv()` and assign it to a DataFrame, `df`:

In [None]:
import pandas as pd

#df = pd.read_csv("data/london_smartmeter_basic.csv")
df = pd.read_csv("london_smartmeter_basic.csv")


df.head()

## Get Training and Evaluation Time Series

Separate the original dataframe into a training set `ts` and evaluation set `ts_eval`.

`ts_eval` is the evaluation time series. It contains the days for which you need to predict consumption, so that KATE can evaluate the performance of your model.

In the dataframe provided, the column `evaluation_set` determines whether a row is for evaluation or not.

Get all the rows for the evaluation set using:

```
df.loc[df.evaluation_set]
```

Note, you also need to do the following:
* Set the index of the time series to be datetime values based on the column `day`. You can use `pd.DatetimeIndex()` for this.
* Rename the columns `day` to `ds` and `consumption` to `y` in the dataframe.
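
The two steps above, together with the boolean selection on `evaluation_set`, can be tried out on a small made-up frame before touching the real data. This is only a sketch: the `toy` values are invented, and treating every row that is not in the evaluation set as training data is an assumption rather than something the dataset description states.

```python
import pandas as pd

# Toy stand-in for the assignment data; the values are invented for illustration.
toy = pd.DataFrame(
    {
        "day": ["2014-01-30", "2014-01-31", "2014-02-01", "2014-02-02"],
        "consumption": [10.5, 11.0, None, None],
        "evaluation_set": [False, False, True, True],
    }
)

# Work on a copy so the original frame is not modified.
prepared = toy.copy()

# Use the dates in the `day` column as a datetime index.
prepared.index = pd.DatetimeIndex(prepared["day"])

# Rename the columns as requested: day -> ds, consumption -> y.
prepared = prepared.rename(columns={"day": "ds", "consumption": "y"})

# Boolean masks split the rows into evaluation and (assumed) training parts.
toy_eval = prepared.loc[prepared.evaluation_set]
toy_train = prepared.loc[~prepared.evaluation_set]

print(toy_train)
print(toy_eval)
```

Copying first follows the earlier advice about not overwriting the original DataFrame by mistake.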

In [None]:
def preprocess(df):
    """This function takes a dataframe and preprocesses it so it is
    ready for the training stage.

    The DataFrame contains the time axis and the target column.

    It also contains some rows for which the target column is unknown.
    Those are the observations you will need to predict for KATE
    to evaluate the performance of your model.

    Here you will need to return the training time series, ts, together
    with the preprocessed evaluation time series, ts_eval. You also need
    to set the index of the time series to be datetime values based
    on the column day. You can use pd.DatetimeIndex() for this.

    Make sure you return ts_eval separately! It needs to contain
    all the rows for evaluation -- they are marked with the column
    evaluation_set. You can easily select them with pandas:

         - df.loc[df.evaluation_set]


    :param df: the dataset
    :type df: pd.DataFrame
    :return: ts, ts_eval
    """
    raise NotImplementedError

Once you have implemented the `preprocess` function, you can call it with `ts, ts_eval = preprocess(df)`:

In [None]:
# Your code here...
#    ts, ts_eval = preprocess(df)


## Train an AR model

Write a function to train an AR model using your train time series `ts`.

You can use the `Prophet` model from the `prophet` module; do not forget to set the `growth` and seasonality parameters. Alternatively, you can train an autoregressive AR(p) model with `tsa.ar_model.AutoReg` from the `statsmodels` module.

*NOTE*: In this project your model is trained directly on KATE, so it is limited to models that train in under 1 minute. You will receive a `TimeoutError` if training takes too long.
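
One way to check training time locally before uploading is to time the call yourself. The helper below is a hypothetical convenience (`timed` is not part of KATE or of any library used here), and local timings are only a rough guide to how long the same code takes on KATE.

```python
import time


def timed(fn, *args):
    """Run fn(*args) and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = fn(*args)
    return result, time.perf_counter() - start


# `sum` stands in for a real training call such as train(ts).
result, elapsed = timed(sum, range(1_000_000))
print(f"took {elapsed:.3f}s")
```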

In [None]:
from prophet import Prophet

def train(ts):
    """Trains a new model on ts and returns it.

    :param ts: your processed training time series
    :type ts: pd.DataFrame
    :return: a trained model
    """
    raise NotImplementedError

Once you have implemented the `train` function, you can call it with `model = train(ts)`:

In [None]:
# Your code here...
#    model = train(ts)


## Evaluate Your Model

Write a function `predict` that takes the model you have trained as well as a test time series (on KATE this will be the `ts_eval` that you processed above, but you can test this function locally with your own time series). 

The function should return the predictions on the test set assigned to a variable called `y_pred`.
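
To illustrate the expected return shape, the sketch below wraps a set of forecast values in a `pd.Series` indexed by the test dates. The date range, the placeholder values, and the series name `yhat` are assumptions for the example; a real `predict` would take its values from the trained model.

```python
import numpy as np
import pandas as pd

# Hypothetical test index covering 1-27 February 2014.
test_index = pd.date_range("2014-02-01", "2014-02-27", freq="D")

# Placeholder forecast values; a real model would supply these.
values = np.full(len(test_index), 10.0)

# One prediction per test day, aligned on the datetime index.
y_pred = pd.Series(values, index=test_index, name="yhat")

print(y_pred.head())
```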


In [None]:
def predict(model, ts_test):
    """This function takes your trained model as well
    as a processed test time series and returns predictions.

    This should return your predictions either as a pd.DataFrame with one column
    or a pd.Series

    :param model: your trained model
    :param ts_test: a processed test time series (on KATE it will be ts_eval)
    :return: y_pred, your predictions
    """
    raise NotImplementedError


Once implemented, you can check your predictions by calling `predict` with `model` and `ts_eval`:

In [None]:
# Your code here...
#    y_pred = predict(model, ts_eval)
