[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/ourownstory/neural_prophet/blob/master/tutorials/feature-use/collect_predictions.ipynb)

First, let's fit a vanilla model:

In [1]:
if 'google.colab' in str(get_ipython()):
    !pip install git+https://github.com/ourownstory/neural_prophet.git # may take a while
    #!pip install neuralprophet # much faster, but may not have the latest upgrades/bugfixes
    
import pandas as pd
from neuralprophet import NeuralProphet, set_log_level
set_log_level("ERROR")

In [2]:
data_location = "https://raw.githubusercontent.com/ourownstory/neuralprophet-data/main/datasets/"
df = pd.read_csv(data_location + "air_passengers.csv")
df.tail(3)

Unnamed: 0,ds,y
141,1960-10-01,461
142,1960-11-01,390
143,1960-12-01,432


In [3]:
m = NeuralProphet(n_lags=5, n_forecasts=3)
metrics_train = m.fit(df=df, freq="MS")


# Collect in-sample predictions

## Predictions sorted based on forecast target
Calling `predict` returns a `df_forecast` in which each `'yhat<i>'` column holds the `<i>`-step-ahead prediction **with this row's datetime as the target**.
Here, `<i>` is the age of the prediction.

e.g. `yhat3` is the prediction for this datetime that was made 3 steps earlier; it is "3 steps old".

Note that in a forecast that extends past the data (as in the out-of-sample output further down), the last row, `1961-03-01`, only has a `yhat3`, which was forecast from the last location with data, `1960-12-01`.
Because we lack inputs after that location, we have neither the more recent prediction `yhat1` from `1961-02-01` nor `yhat2` from `1961-01-01`.
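
The relation between a prediction's age and the date it was forecast from can be checked with plain date arithmetic. This is a minimal sketch that assumes the monthly frequency (`"MS"`) used in this tutorial; the `origins` dictionary is a hypothetical helper for illustration, not part of NeuralProphet:

```python
import pandas as pd

# Target date of the last row discussed above.
target = pd.Timestamp("1961-03-01")

# yhat<i> was produced i monthly steps before its target date.
origins = {f"yhat{i}": target - pd.DateOffset(months=i) for i in (1, 2, 3)}

for name, origin in origins.items():
    print(name, "was forecast from", origin.date())
```

Only the `yhat3` origin, `1960-12-01`, falls inside the observed data, which is why the other two columns are empty for that row.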

We also get the individual forecast components, each of which gives its respective contribution to `yhat<i>`, forecast `<i>` steps ago.

Components without a number suffix depend only on time or on future regressors. They are not lagged, so each has a single value per row.

In [4]:
df = pd.read_csv(data_location + "air_passengers.csv")
forecast = m.predict(df)
forecast.tail(3)

Unnamed: 0,ds,y,yhat1,residual1,yhat2,residual2,yhat3,residual3,ar1,ar2,ar3,trend,season_yearly
141,1960-10-01,461,465.766632,4.766632,470.808533,9.808533,479.141937,18.141937,-212.494751,-207.45285,-199.119446,697.528809,-19.267389
142,1960-11-01,390,410.267487,20.267487,411.062592,21.062592,422.379547,32.379547,-260.008392,-259.213318,-247.896347,704.447998,-34.172077
143,1960-12-01,432,421.338196,-10.661804,440.769836,8.769836,442.25235,10.25235,-301.709381,-282.27774,-280.795227,711.14386,11.903727


## Predictions based on forecast start
Calling `predict` with `raw=True` returns a `df` in which each `'step<i>'` column holds the prediction `<i>` steps ahead, **starting at this row's datetime**.
Here, `<i>` is the number of steps between this row's datetime and the prediction's target.

e.g. `step0` is the prediction for this datetime. `step1` is the prediction for the next datetime.

All the predictions in a given row were made at the same time: one step before the row's datestamp.

In [5]:
df = pd.read_csv(data_location + "air_passengers.csv")
forecast = m.predict(df, decompose=False, raw=True)
forecast.tail(3)

Unnamed: 0,ds,step0,step1,step2
136,1960-10-01,465.766632,411.062592,442.25235
137,1960-11-01,410.267487,440.769836,459.604309
138,1960-12-01,421.338196,444.649597,458.656647


Note that the last row shown contains the last in-sample forecast: it targets `1960-12-01`, `1961-01-01` and `1961-02-01`, using data available up to `1960-11-01`.
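
The two layouts hold the same numbers, only arranged differently. As a rough sketch in plain pandas (not a NeuralProphet API), the target-sorted columns can be rearranged by forecast start by shifting each `yhat<i>` column upward. The values are the last three rows of the target-sorted output printed earlier, and the names `by_target` and `by_start` are hypothetical:

```python
import pandas as pd

# Last three in-sample rows of the target-sorted forecast, copied from the output above.
by_target = pd.DataFrame({
    "ds": pd.to_datetime(["1960-10-01", "1960-11-01", "1960-12-01"]),
    "yhat1": [465.766632, 410.267487, 421.338196],
    "yhat2": [470.808533, 411.062592, 440.769836],
    "yhat3": [479.141937, 422.379547, 442.252350],
})

n_forecasts = 3
by_start = pd.DataFrame({"ds": by_target["ds"]})
for k in range(n_forecasts):
    # step<k> at a given row is the yhat<k+1> value stored k rows further down.
    by_start[f"step{k}"] = by_target[f"yhat{k + 1}"].shift(-k)

print(by_start)
```

The `1960-10-01` row reproduces the `step0`, `step1` and `step2` values printed for that date. Later rows contain `NaN` wherever the target lies beyond this three-row excerpt.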


Setting `decompose=True` also includes the individual forecast components, each of which gives its respective contribution to `step<i>` into the future.

In [6]:
df = pd.read_csv(data_location + "air_passengers.csv")
forecast = m.predict(df, decompose=True, raw=True)
forecast.tail(3)

Unnamed: 0,ds,step0,step1,step2,trend0,trend1,trend2,season_yearly0,season_yearly1,season_yearly2,ar0,ar1,ar2
136,1960-10-01,465.766632,411.062592,442.25235,697.528809,704.447998,711.14386,-19.267389,-34.172077,11.903727,-212.494751,-259.213318,-280.795227
137,1960-11-01,410.267487,440.769836,459.604309,704.447998,711.14386,718.062927,-34.172077,11.903727,3.945953,-260.008392,-282.27774,-262.404602
138,1960-12-01,421.338196,444.649597,458.656647,711.14386,718.062927,724.982117,11.903727,3.945953,-25.682663,-301.709381,-277.359314,-240.642838
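
The decomposed output can be sanity-checked by summing the components. The sketch below copies the `1960-10-01` row of the table above and assumes the components are additive, which the printed values are consistent with. A model configured with multiplicative components would not add up this way:

```python
import numpy as np

# Row with forecast start 1960-10-01, copied from the printed output above.
steps = np.array([465.766632, 411.062592, 442.252350])
trend = np.array([697.528809, 704.447998, 711.143860])
season_yearly = np.array([-19.267389, -34.172077, 11.903727])
ar = np.array([-212.494751, -259.213318, -280.795227])

# Under the additive assumption, the components sum to the prediction.
reconstructed = trend + season_yearly + ar
max_abs_error = np.abs(reconstructed - steps).max()

print(reconstructed)
print(max_abs_error)  # small, on the order of float32 rounding
```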


# Collect out-of-sample predictions
This is how you can extend predictions into the unknown future:

In [7]:
df = pd.read_csv(data_location + "air_passengers.csv")
future = m.make_future_dataframe(df, periods=3) # defaults: periods=m.n_forecasts, n_historic_predictions=False

The forecast dataframe now contains predictions only for the not-yet-observed future.

## Predictions based on forecast target

In [8]:
forecast = m.predict(future)
forecast.tail(3)

Unnamed: 0,ds,y,yhat1,residual1,yhat2,residual2,yhat3,residual3,ar1,ar2,ar3,trend,season_yearly
5,1961-01-01,,451.39209,,,,,,-270.616791,,,718.062927,3.945953
6,1961-02-01,,,,465.398132,,,,,-233.901337,,724.982117,-25.682663
7,1961-03-01,,,,,,525.936646,,,,-187.814682,731.231628,-17.480263
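
Because each target date in this out-of-sample forecast has exactly one non-empty `yhat<i>`, the diagonal layout can be collapsed into a single column. This is a minimal pandas sketch using the values printed above. The column name `yhat` is a hypothetical choice for illustration, not something the output above contains:

```python
import pandas as pd

nan = float("nan")

# yhat columns of the three future rows, copied from the output above.
out_of_sample = pd.DataFrame({
    "ds": pd.to_datetime(["1961-01-01", "1961-02-01", "1961-03-01"]),
    "yhat1": [451.392090, nan, nan],
    "yhat2": [nan, 465.398132, nan],
    "yhat3": [nan, nan, 525.936646],
})

yhat_cols = ["yhat1", "yhat2", "yhat3"]
# Back-fill across columns, then keep the first one: this picks, per row,
# the freshest prediction that is available (yhat1 before yhat2 before yhat3).
out_of_sample["yhat"] = out_of_sample[yhat_cols].bfill(axis=1).iloc[:, 0]

print(out_of_sample[["ds", "yhat"]])
```

The same approach applied to rows where several `yhat<i>` values are present would keep the most recent prediction for each date.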


## Predictions based on forecast start
We can also get the forecasts based on the forecast start:

In [9]:
forecast = m.predict(future, raw=True, decompose=False)
forecast

Unnamed: 0,ds,step0,step1,step2
0,1961-01-01,451.39209,465.398132,525.936646


### Advanced: Get predictions based on forecast start as arrays
`_predict_raw` is a private method that was not meant to be used directly, but if you have a specific need, it can be useful to get the values directly as arrays.
The returned predictions are also based on forecast start.

In [10]:
dates, predicted, components = m._predict_raw(future, include_components=True)

In [11]:
dates[-3:]

5   1961-01-01
Name: ds, dtype: datetime64[ns]

In [12]:
predicted[-3:]

array([[451.3921 , 465.39813, 525.93665]], dtype=float32)

In [13]:
[(key, values[-3:]) for key, values in components.items()]

[('trend', array([[718.0629, 724.9821, 731.2316]], dtype=float32)),
 ('season_yearly',
  array([[  3.9459527, -25.682663 , -17.480263 ]], dtype=float32)),
 ('ar', array([[-270.6168 , -233.90134, -187.81468]], dtype=float32))]
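
If the arrays are needed in tabular form again, they can be wrapped in a dataframe with `step<i>` column names. The sketch below rebuilds `dates` and `predicted` by hand from the values printed above, so it runs without a fitted model. The name `frame` is hypothetical:

```python
import numpy as np
import pandas as pd

# Stand-ins for the objects returned above, with the printed values.
dates = pd.Series(pd.to_datetime(["1961-01-01"]), index=[5], name="ds")
predicted = np.array([[451.3921, 465.39813, 525.93665]], dtype=np.float32)

# One column per forecast step, one row per forecast start.
step_names = [f"step{i}" for i in range(predicted.shape[1])]
frame = pd.DataFrame(predicted, columns=step_names)

# `dates` keeps its original index (5 here), so pass the raw values
# to avoid index misalignment when inserting the column.
frame.insert(0, "ds", dates.to_numpy())

print(frame)
```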