[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/ourownstory/neural_prophet/blob/master/tutorials/feature-use/collect_predictions.ipynb)

First, let's fit a vanilla model:

In [1]:
import pandas as pd
from neuralprophet import NeuralProphet, set_log_level
set_log_level("ERROR")

In [2]:
data_location = "https://raw.githubusercontent.com/ourownstory/neuralprophet-data/main/datasets/"
df = pd.read_csv(data_location + "air_passengers.csv")
df.tail(3)

Unnamed: 0,ds,y
141,1960-10-01,461
142,1960-11-01,390
143,1960-12-01,432


In [3]:
m = NeuralProphet(n_lags=5, n_forecasts=3)
metrics_train = m.fit(df=df, freq="MS")


# Collect in-sample predictions

## Predictions sorted based on forecast target
Calling `predict` returns a forecast dataframe in which each `yhat<i>` column holds the `<i>`-step-ahead prediction **with this row's datetime as the target**.
Here, `<i>` is the age of the prediction.

For example, `yhat3` is the prediction for this datetime that was made 3 steps earlier; it is "3 steps old".

Note that the last row, `1961-03-01`, only has a `yhat3`, which was forecasted at the last location with data, `1960-12-01`.
Because we lack inputs after that location, we have neither the more recent prediction `yhat1` from `1961-02-01` nor `yhat2` from `1961-01-01`.
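
The link between a target datetime and the location each `yhat<i>` was forecasted from can be sketched with plain pandas date arithmetic. This is only an illustration under the monthly frequency used here; `forecast_origin` is a hypothetical helper and not part of NeuralProphet.

```python
import pandas as pd


def forecast_origin(target, age):
    """Return the last datetime with data behind a prediction that is `age` steps old.

    Hypothetical helper for illustration; assumes monthly ("MS") steps.
    """
    return pd.Timestamp(target) - pd.DateOffset(months=age)


# The three candidate predictions for the target 1961-03-01.
for age in (1, 2, 3):
    origin = forecast_origin("1961-03-01", age)
    print(f"yhat{age} for 1961-03-01 would be forecasted from {origin.date()}")
```

Only the origin of `yhat3` falls on a datetime that has data, which is why the other two columns are empty for that row.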

We also get the individual forecast components. Numbered components such as `ar<i>` give their respective contribution to `yhat<i>`, forecasted `<i>` steps ago.

Components without a number suffix depend only on time or on future regressors. They are not lagged, so each has a single value per row.

In [4]:
df = pd.read_csv("../../tests/test-data/air_passengers.csv")
forecast = m.predict(df)
forecast.tail(3)

Unnamed: 0,ds,y,yhat1,residual1,yhat2,residual2,yhat3,residual3,ar1,ar2,ar3,trend,season_yearly
141,1960-10-01,461,464.971893,3.971893,469.978088,8.978088,477.406097,16.406097,-203.164963,-198.158768,-190.730743,687.89325,-19.756405
142,1960-11-01,390,411.290985,21.290985,411.018768,21.018768,422.92392,32.92392,-248.486053,-248.758255,-236.853104,694.680359,-34.903366
143,1960-12-01,432,422.637665,-9.362335,441.483795,9.483795,442.197876,10.197876,-289.633789,-270.787628,-270.073578,701.248657,11.022773
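
In the rows shown above, each `residual<i>` matches `yhat<i> - y`. The sketch below checks this on values copied from that output and then computes a mean absolute error per prediction age. The three-row frame `tail` is rebuilt by hand for illustration, so the resulting numbers describe only these three rows.

```python
import pandas as pd

# Values copied from the last three rows of the output above.
tail = pd.DataFrame(
    {
        "ds": pd.to_datetime(["1960-10-01", "1960-11-01", "1960-12-01"]),
        "y": [461, 390, 432],
        "yhat1": [464.971893, 411.290985, 422.637665],
        "residual1": [3.971893, 21.290985, -9.362335],
        "yhat2": [469.978088, 411.018768, 441.483795],
        "residual2": [8.978088, 21.018768, 9.483795],
        "yhat3": [477.406097, 422.923920, 442.197876],
        "residual3": [16.406097, 32.923920, 10.197876],
    }
)

mae_per_age = {}
for i in (1, 2, 3):
    # Recompute the residual from the prediction and the observed value.
    recomputed = tail[f"yhat{i}"] - tail["y"]
    assert (recomputed - tail[f"residual{i}"]).abs().max() < 1e-4
    mae_per_age[f"yhat{i}"] = recomputed.abs().mean()

print(mae_per_age)
```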


## Predictions based on forecast start
Calling `predict` with `raw=True`, we get a dataframe where each `step<i>` column holds the prediction targeted `<i>` steps after this row's datetime, so every row is one forecast **starting at this row's datetime**.
Here, `<i>` is the number of steps ahead that the prediction targets.

For example, `step0` is the prediction for this datetime and `step1` is the prediction for the next datetime.

All the predictions in a row were made at the same time: one step before the row's datestamp.

In [5]:
df = pd.read_csv("../../tests/test-data/air_passengers.csv")
forecast = m.predict(df, decompose=False, raw=True)
forecast.tail(3)

Unnamed: 0,ds,step0,step1,step2
136,1960-10-01,464.971893,411.018768,442.197876
137,1960-11-01,411.290985,441.483795,458.19043
138,1960-12-01,422.637665,443.060791,457.305481


Note that the last row contains the last possible forecast, forecasting `1961-01-01`, `1961-02-01` and `1961-03-01` with the data available at `1960-12-01`.
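
The two layouts hold the same numbers in different positions. As a rough sketch, shifting `step<j>` down by `j` rows reproduces the `yhat<j+1>` column of the target-sorted output for the rows shown. The frame `raw` below is rebuilt by hand from the displayed values, and the realignment is an illustration rather than how NeuralProphet does it internally.

```python
import pandas as pd

# Values copied from the raw output above.
raw = pd.DataFrame(
    {
        "ds": pd.to_datetime(["1960-10-01", "1960-11-01", "1960-12-01"]),
        "step0": [464.971893, 411.290985, 422.637665],
        "step1": [411.018768, 441.483795, 443.060791],
        "step2": [442.197876, 458.190430, 457.305481],
    }
)

by_target = raw[["ds"]].copy()
for j in range(3):
    # step<j> targets the datetime j rows further down, so move it down by j rows.
    by_target[f"yhat{j + 1}"] = raw[f"step{j}"].shift(j)

print(by_target)
```

The last row of `by_target` matches `yhat1`, `yhat2` and `yhat3` for `1960-12-01` in the target-sorted output. Forecasts whose targets lie beyond the last row are dropped by the shift.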


Setting `decompose=True` also returns the individual forecast components, where `trend<i>`, `season_yearly<i>` and `ar<i>` give their respective contribution to `step<i>`.

In [6]:
df = pd.read_csv("../../tests/test-data/air_passengers.csv")
forecast = m.predict(df, decompose=True, raw=True)
forecast.tail(3)

Unnamed: 0,ds,step0,step1,step2,trend0,trend1,trend2,season_yearly0,season_yearly1,season_yearly2,ar0,ar1,ar2
136,1960-10-01,464.971893,411.018768,442.197876,687.89325,694.680359,701.248657,-19.756405,-34.903366,11.022773,-203.164963,-248.758255,-270.073578
137,1960-11-01,411.290985,441.483795,458.19043,694.680359,701.248657,708.035828,-34.903366,11.022773,4.086137,-248.486053,-270.787628,-253.931519
138,1960-12-01,422.637665,443.060791,457.305481,701.248657,708.035828,714.823059,11.022773,4.086137,-25.781528,-289.633789,-269.061157,-231.736038
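
For the rows shown above, the components of each step add up to the forecast of that step, up to float32 rounding. The sketch below verifies this on values copied from the output; the component list is specific to this model (trend, yearly seasonality, autoregression) and would differ for other configurations.

```python
import pandas as pd

# Values copied from the decomposed output above.
raw = pd.DataFrame(
    {
        "step0": [464.971893, 411.290985, 422.637665],
        "step1": [411.018768, 441.483795, 443.060791],
        "step2": [442.197876, 458.190430, 457.305481],
        "trend0": [687.893250, 694.680359, 701.248657],
        "trend1": [694.680359, 701.248657, 708.035828],
        "trend2": [701.248657, 708.035828, 714.823059],
        "season_yearly0": [-19.756405, -34.903366, 11.022773],
        "season_yearly1": [-34.903366, 11.022773, 4.086137],
        "season_yearly2": [11.022773, 4.086137, -25.781528],
        "ar0": [-203.164963, -248.486053, -289.633789],
        "ar1": [-248.758255, -270.787628, -269.061157],
        "ar2": [-270.073578, -253.931519, -231.736038],
    }
)

component_names = ["trend", "season_yearly", "ar"]
gaps = {}
for i in range(3):
    # Sum the components that belong to step i and compare with the forecast.
    total = raw[[f"{name}{i}" for name in component_names]].sum(axis=1)
    gaps[f"step{i}"] = (total - raw[f"step{i}"]).abs().max()
    print(f"step{i}: largest gap between component sum and forecast = {gaps[f'step{i}']:.6f}")
```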


# Collect out-of-sample predictions
This is how you can extend predictions into the unknown future:

In [7]:
df = pd.read_csv("../../tests/test-data/air_passengers.csv")
future = m.make_future_dataframe(df, periods=3) # periods=m.n_forecasts, n_historic_predictions=False

Predicting on `future` now yields a forecast dataframe that only contains predictions for the not-yet-observed future.
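
Judging from the row indices of the future rows in the forecast output in this section, the frame passed to `predict` appears to hold the last `n_lags` observed rows followed by `periods` rows with an empty `y`. The sketch below builds such a frame by hand to show the idea. `sketch_future_frame` is a hypothetical stand-in, not the implementation of `make_future_dataframe`, and it uses only the three history rows displayed earlier (so `n_lags=3` here instead of 5).

```python
import pandas as pd


def sketch_future_frame(df, n_lags, periods, freq="MS"):
    """Hypothetical illustration: last n_lags rows of history plus empty future rows."""
    history = df.tail(n_lags).copy()
    history["ds"] = pd.to_datetime(history["ds"])
    history["y"] = history["y"].astype(float)
    last = history["ds"].iloc[-1]
    # date_range includes the start date, so ask for one extra and drop it.
    future_dates = pd.date_range(start=last, periods=periods + 1, freq=freq)[1:]
    future_rows = pd.DataFrame({"ds": future_dates, "y": float("nan")})
    return pd.concat([history, future_rows], ignore_index=True)


# The three history rows shown at the top of this tutorial.
history_tail = pd.DataFrame(
    {"ds": ["1960-10-01", "1960-11-01", "1960-12-01"], "y": [461, 390, 432]}
)
sketch = sketch_future_frame(history_tail, n_lags=3, periods=3)
print(sketch)
```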

## Predictions based on forecast target

In [8]:
forecast = m.predict(future)
forecast.tail(3)

Unnamed: 0,ds,y,yhat1,residual1,yhat2,residual2,yhat3,residual3,ar1,ar2,ar3,trend,season_yearly
5,1961-01-01,,452.290009,,,,,,-259.83194,,,708.035828,4.086137
6,1961-02-01,,,,466.089172,,,,,-222.952362,,714.823059,-25.781528
7,1961-03-01,,,,,,525.474487,,,,-178.790146,720.95343,-16.688808
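
In the output above, every future row has exactly one filled `yhat<i>` column. If a single forecast column is more convenient, one option is to take the first non-missing value per row. The following is a minimal sketch on values copied from the output; the column name `yhat` is an arbitrary choice made here.

```python
import pandas as pd

nan = float("nan")

# Values copied from the last three rows of the output above.
future_forecast = pd.DataFrame(
    {
        "ds": pd.to_datetime(["1961-01-01", "1961-02-01", "1961-03-01"]),
        "yhat1": [452.290009, nan, nan],
        "yhat2": [nan, 466.089172, nan],
        "yhat3": [nan, nan, 525.474487],
    }
)

yhat_cols = ["yhat1", "yhat2", "yhat3"]
# Back-fill across columns, then keep the first column: the first non-missing value per row.
future_forecast["yhat"] = future_forecast[yhat_cols].bfill(axis=1).iloc[:, 0]

print(future_forecast[["ds", "yhat"]])
```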


## Predictions based on forecast start
We can also get the forecasts based on the forecast start:

In [9]:
forecast = m.predict(future, raw=True, decompose=False)
forecast

Unnamed: 0,ds,step0,step1,step2
0,1961-01-01,452.290009,466.089172,525.474487


### Advanced: Get predictions based on forecast start as arrays
The private method `_predict_raw` was not meant to be used directly, but if you have a specific need, it may be useful to get the values directly as arrays.
The returned predictions are also based on the forecast start.

In [10]:
dates, predicted, components = m._predict_raw(future, include_components=True)

In [11]:
dates[-3:]

5   1961-01-01
Name: ds, dtype: datetime64[ns]

In [12]:
predicted[-3:]

array([[452.29   , 466.08917, 525.4745 ]], dtype=float32)

In [13]:
[(key, values[-3:]) for key, values in components.items()]

[('trend', array([[708.0358 , 714.82306, 720.9534 ]], dtype=float32)),
 ('season_yearly',
  array([[  4.086137, -25.781528, -16.688808]], dtype=float32)),
 ('ar', array([[-259.83194, -222.95236, -178.79015]], dtype=float32))]
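
If the arrays are needed in tabular form again, they can be wrapped into a dataframe with one `step<i>` column per forecast step. The sketch below rebuilds the arrays by hand from the values printed above (so NeuralProphet is not required to run it) and also checks that the components add up to the predictions within float32 rounding.

```python
import numpy as np
import pandas as pd

# Arrays rebuilt from the printed outputs above; the index 5 mirrors the printed Series.
dates = pd.Series(pd.to_datetime(["1961-01-01"]), index=[5], name="ds")
predicted = np.array([[452.29, 466.08917, 525.4745]], dtype=np.float32)
components = {
    "trend": np.array([[708.0358, 714.82306, 720.9534]], dtype=np.float32),
    "season_yearly": np.array([[4.086137, -25.781528, -16.688808]], dtype=np.float32),
    "ar": np.array([[-259.83194, -222.95236, -178.79015]], dtype=np.float32),
}

# One column per forecast step, one row per forecast start.
step_cols = [f"step{i}" for i in range(predicted.shape[1])]
raw_frame = pd.DataFrame(predicted, columns=step_cols)
# Use the raw values so the non-zero index of `dates` does not cause misalignment.
raw_frame.insert(0, "ds", dates.to_numpy())

# Element-wise sum of all component arrays.
component_sum = sum(components.values())
components_match = np.allclose(component_sum, predicted, atol=1e-3)

print(raw_frame)
print(components_match)
```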