[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/ourownstory/neural_prophet/blob/master/tutorials/feature-use/collect_predictions.ipynb)

# Prediction Collection

## Collect Predictions
First, let's fit a vanilla model:

In [1]:
if 'google.colab' in str(get_ipython()):
    !pip install git+https://github.com/ourownstory/neural_prophet.git # may take a while
    #!pip install neuralprophet # much faster, but may not have the latest upgrades/bugfixes
    
import pandas as pd
from neuralprophet import NeuralProphet, set_log_level
set_log_level("ERROR")

In [2]:
data_location = "https://raw.githubusercontent.com/ourownstory/neuralprophet-data/main/datasets/"
df = pd.read_csv(data_location + "air_passengers.csv")
df.tail(3)

Unnamed: 0,ds,y
141,1960-10-01,461
142,1960-11-01,390
143,1960-12-01,432


In [3]:
m = NeuralProphet(n_lags=5, n_forecasts=3)
metrics_train = m.fit(df=df, freq="MS")


## Getting the latest forecast df
We can extract the most recent forecast as its own df for further analysis.

In [4]:
forecast = m.predict(df)

In [5]:
df_fc = m.get_latest_forecast(forecast)
df_fc.head(3)

Unnamed: 0,ds,y,yhat1
0,1960-10-01,461.0,463.035004
1,1960-11-01,390.0,410.434906
2,1960-12-01,432.0,439.753998


Forecasts made before the latest one can be included as well. Here we include the 5 forecasts preceding the latest forecast.

In [6]:
df_fc = m.get_latest_forecast(forecast, include_previous_forecasts=5)
df_fc.head(3)

Unnamed: 0,ds,y,yhat6,yhat5,yhat4,yhat3,yhat2,yhat1
0,1960-05-01,472.0,476.862457,,,,,
1,1960-06-01,535.0,531.84314,527.112732,,,,
2,1960-07-01,622.0,579.601501,576.99292,590.679077,,,


Historical data can be included as well, but be aware that the resulting df can be large.

In [7]:
df_fc = m.get_latest_forecast(forecast, include_history_data=True)
df_fc.head(3)

Unnamed: 0,ds,y,yhat1
0,1949-01-01,112.0,
1,1949-02-01,118.0,
2,1949-03-01,132.0,


## Collect in-sample predictions

## Predictions sorted based on forecast target
Calling `predict`, we get a `df_forecast` where each `'yhat<i>'` refers to the `<i>`-step-ahead prediction for **this row's datetime being the target**.
Here, `<i>` refers to the age of the prediction.

E.g. `yhat3` is the prediction for this datetime that was made 3 steps ago; it is "3 steps old".

Note that when the dataframe is extended past the observed data (as in the out-of-sample section below), the last row `1961-03-01` only has a `yhat3`, which was forecasted at the last location with data, `1960-12-01`.
Because we lack inputs after that location, we have neither the more recent prediction `yhat1` from `1961-02-01` nor `yhat2` from `1961-01-01`.
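
The availability rule in the note above can be checked with a small sketch. It assumes monthly data and that a `yhat<i>` at a target date was produced `<i>` months earlier; the helpers `add_months` and `available_horizons` are hypothetical names used only for illustration and are not part of NeuralProphet.

```python
from datetime import date


def add_months(d, n):
    """Shift a month-start date by n months (n may be negative)."""
    total = d.year * 12 + (d.month - 1) + n
    return date(total // 12, total % 12 + 1, 1)


def available_horizons(target, last_observed, n_forecasts):
    """Return every age i for which a yhat<i> can exist at `target`.

    Assumption: yhat<i> for `target` was forecasted i months before `target`,
    so it exists only if that forecast origin is an observed date.
    """
    return [
        i
        for i in range(1, n_forecasts + 1)
        if add_months(target, -i) <= last_observed
    ]


last_observed = date(1960, 12, 1)
for target in (date(1961, 1, 1), date(1961, 2, 1), date(1961, 3, 1)):
    print(target.isoformat(), available_horizons(target, last_observed, n_forecasts=3))
# 1961-01-01 [1, 2, 3]
# 1961-02-01 [2, 3]
# 1961-03-01 [3]
```

The sketch ignores the start of the series, where the first `n_lags` rows cannot be forecast targets either.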

We also get the individual forecast components, each of which refers to its respective contribution to `yhat<i>`, forecasted `<i>` steps ago.

Components without an added number are only time-dependent or based on future regressors. Neither kind is lagged, so each has a single value per row.

In [8]:
df = pd.read_csv(data_location + "air_passengers.csv")
forecast = m.predict(df)
forecast.tail(3)

Unnamed: 0,ds,y,yhat1,residual1,yhat2,residual2,yhat3,residual3,ar1,ar2,ar3,trend,season_yearly
141,1960-10-01,461.0,463.035004,2.035004,466.74472,5.74472,473.37146,12.37146,-234.828171,-231.118423,-224.491684,717.820862,-19.95767
142,1960-11-01,390.0,411.421021,21.421021,410.434906,20.434906,419.941437,29.941437,-278.072388,-279.058502,-269.552002,724.978149,-35.484722
143,1960-12-01,432.0,422.152222,-9.847778,439.118073,7.118073,439.753998,7.753998,-322.755859,-305.790009,-305.154083,731.904602,13.003475
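
Comparing this table with the `get_latest_forecast` output from earlier, the latest forecast appears to lie on the diagonal of the last `n_forecasts` rows: `yhat1` of the first row, `yhat2` of the second, `yhat3` of the third. The sketch below reproduces that reading with the numbers printed above. It is an interpretation of the outputs, not the library's implementation, and `latest_forecast` is a hypothetical helper.

```python
# Last three rows of the target-aligned forecast, copied from the output above.
rows = [
    {"ds": "1960-10-01", "yhat1": 463.035004, "yhat2": 466.74472, "yhat3": 473.37146},
    {"ds": "1960-11-01", "yhat1": 411.421021, "yhat2": 410.434906, "yhat3": 419.941437},
    {"ds": "1960-12-01", "yhat1": 422.152222, "yhat2": 439.118073, "yhat3": 439.753998},
]


def latest_forecast(rows, n_forecasts):
    """Collect the most recent complete forecast from target-aligned rows.

    All values on the diagonal share one forecast origin: the i-th of the
    last n_forecasts rows contributes its yhat<i>.
    """
    tail = rows[-n_forecasts:]
    return [(row["ds"], row[f"yhat{i}"]) for i, row in enumerate(tail, start=1)]


for ds, value in latest_forecast(rows, n_forecasts=3):
    print(ds, value)
# 1960-10-01 463.035004
# 1960-11-01 410.434906
# 1960-12-01 439.753998
```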


## Predictions based on forecast start
Calling `predict_raw`, we get a `df` where each `'step<i>'` refers to the `<i>`th step-ahead prediction **starting at this row's datetime**.
Here, `<i>` refers to how many steps ahead the prediction is targeted at.

E.g. `step0` is the prediction for this datetime, and `step1` is the prediction for the next datetime.

All the predictions of a particular row were made at the same time: one step before the row's datestamp.
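
To relate the two layouts, here is a minimal sketch that rearranges start-aligned rows into target-aligned rows. It assumes, based on the description above, that `step<k>` in a row starting at position `s` targets position `s + k` and therefore corresponds to `yhat<k+1>` at that target. The function `to_target_aligned` and the placeholder values are hypothetical.

```python
def to_target_aligned(raw, n_forecasts):
    """Rearrange start-aligned forecasts into target-aligned rows.

    raw maps a start position to its list [step0, step1, ...].
    Returns {target_position: {"yhat1": ..., "yhat2": ..., ...}}.
    """
    aligned = {}
    for start, steps in raw.items():
        for k, value in enumerate(steps[:n_forecasts]):
            target = start + k  # step<k> lands k rows after the start row
            aligned.setdefault(target, {})[f"yhat{k + 1}"] = value
    return aligned


# Placeholder values: the letter names the forecast, the digit names the step.
raw = {
    10: ["a0", "a1", "a2"],
    11: ["b0", "b1", "b2"],
    12: ["c0", "c1", "c2"],
}
aligned = to_target_aligned(raw, n_forecasts=3)

print(aligned[12])  # one value from each of the three forecasts
print(aligned[14])  # only the oldest available prediction remains
```

Position 12 receives `c0` as `yhat1`, `b1` as `yhat2` and `a2` as `yhat3`, while position 14 only receives `c2` as `yhat3`, which mirrors the thinning of columns at the end of a target-aligned forecast.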

In [9]:
df = pd.read_csv(data_location + "air_passengers.csv")
forecast = m.predict(df, decompose=False, raw=True)
forecast.tail(3)

KeyError: 'ID'

Note that the last row contains the last possible forecast, forecasting `1961-01-01`, `1961-02-01` and `1961-03-01` with data available at `1960-12-01`.


Setting `decompose=True` will include the individual forecast components, each of which refers to its respective contribution to the prediction `step<i>` steps into the future.

In [None]:
df = pd.read_csv(data_location + "air_passengers.csv")
forecast = m.predict(df, decompose=True, raw=True)
forecast.tail(3)

## Collect out-of-sample predictions
This is how you can extend predictions into the unknown future:

In [None]:
df = pd.read_csv(data_location + "air_passengers.csv")
future = m.make_future_dataframe(df, periods=3) # periods=m.n_forecasts, n_historic_predictions=False

Now, the forecast dataframe only contains predictions about the yet unobserved future.

In [None]:
future.tail()
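
As a rough illustration of which timestamps `periods=3` adds for this month-start series, the sketch below generates the dates that follow the last observation `1960-12-01`. It only mimics the date arithmetic with hypothetical helpers (`add_months`, `future_month_starts`) and does not call NeuralProphet.

```python
from datetime import date


def add_months(d, n):
    """Shift a month-start date by n months."""
    total = d.year * 12 + (d.month - 1) + n
    return date(total // 12, total % 12 + 1, 1)


def future_month_starts(last_observed, periods):
    """Return the `periods` month-start dates following `last_observed`."""
    return [add_months(last_observed, k) for k in range(1, periods + 1)]


future_dates = future_month_starts(date(1960, 12, 1), periods=3)
print([d.isoformat() for d in future_dates])
# ['1961-01-01', '1961-02-01', '1961-03-01']
```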

## Predictions based on forecast target

In [None]:
forecast = m.predict(future)
forecast.tail(3)

## Predictions based on forecast start
We can also get the forecasts based on the forecast start. Here, each `stepX` refers to X steps from datestamp `ds`.

In [None]:
forecast = m.predict(future, raw=True, decompose=False)
forecast