### Prophet baseline
This notebook builds a baseline forecast for the [Tabular Playground Series - Jan 2022](https://www.kaggle.com/c/tabular-playground-series-jan-2022) data using [Prophet](https://facebook.github.io/prophet/).

In [None]:
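import numpy as np
import pandas as pd

# the package was renamed from "fbprophet" to "prophet" in version 1.0
try:
    from prophet import Prophet
except ImportError:
    from fbprophet import Prophet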

Read in the competition data

In [None]:
train_df = pd.read_csv("../input/tabular-playground-series-jan-2022/train.csv", parse_dates=['date'])
test_df  = pd.read_csv("../input/tabular-playground-series-jan-2022/test.csv", parse_dates=['date'])

# Prophet requires the date column to be named ds and the target column y
train_df = train_df.rename(columns={"date": "ds", "num_sold": "y"})
# the test set has no num_sold column, so only the date needs renaming
test_df  = test_df.rename(columns={"date": "ds"})

Add a GDP column from the dataset ["GDP 2015-2019: Finland, Norway, and Sweden"](https://www.kaggle.com/carlmcbrideellis/gdp-20152019-finland-norway-and-sweden), which will be used as an [additional regressor](http://facebook.github.io/prophet/docs/seasonality,_holiday_effects,_and_regressors.html#additional-regressors)

In [None]:
GDP_data = pd.read_csv("../input/gdp-20152019-finland-norway-and-sweden/GDP_data_2015_to_2019_Finland_Norway_Sweden.csv",index_col=["year"])

def get_GDP(row):
    country = 'GDP_' + row.country
    return GDP_data.loc[row.ds.year, country]

# apply() already returns a Series aligned with the frame's index
train_df["GDP"] = train_df.apply(get_GDP, axis=1)
test_df["GDP"]  = test_df.apply(get_GDP, axis=1)
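
The row-wise `apply` above calls `get_GDP` once per row. A minimal sketch of a vectorised alternative follows, assuming the GDP table has a `year` index and one `GDP_<country>` column per country, as `get_GDP` implies. The frames `gdp_toy` and `sales_toy` and their numbers are made up for illustration and are not real GDP figures.

```python
import pandas as pd

# Toy stand-in for GDP_data: index "year", one column per country (made-up values)
gdp_toy = pd.DataFrame(
    {"GDP_Finland": [1.0, 2.0], "GDP_Norway": [3.0, 4.0]},
    index=pd.Index([2015, 2016], name="year"),
)

# Toy stand-in for train_df after renaming "date" to "ds"
sales_toy = pd.DataFrame({
    "ds": pd.to_datetime(["2015-06-01", "2016-06-01", "2016-07-01"]),
    "country": ["Finland", "Norway", "Finland"],
})

# Reshape the wide GDP table to long form: one row per (year, country)
gdp_long = gdp_toy.reset_index().melt(
    id_vars="year", var_name="country", value_name="GDP"
)
gdp_long["country"] = gdp_long["country"].str.replace("GDP_", "", regex=False)

# Join on (year, country) instead of looking values up row by row
sales_toy["year"] = sales_toy["ds"].dt.year
sales_with_gdp = sales_toy.merge(gdp_long, on=["year", "country"], how="left")
```

With `how="left"` the row order of the sales frame is preserved, so the new `GDP` column lines up with the original rows.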

Create a [holidays](http://facebook.github.io/prophet/docs/seasonality,_holiday_effects,_and_regressors.html#modeling-holidays-and-special-events) dataframe, with a window of days around each New Year and after each Easter

In [None]:
new_year = pd.DataFrame({
  'holiday': 'new_year',
  'ds': pd.to_datetime(['2015-01-01', '2016-01-01', '2017-01-01', '2018-01-01', '2019-01-01']),
  'lower_window': -8,
  'upper_window':  1,
})
Easter = pd.DataFrame({
  'holiday': 'Easter',
  'ds': pd.to_datetime(['2015-04-05', '2016-03-27', '2017-04-16', '2018-04-01', '2019-04-21']),
  'lower_window':  0,
  'upper_window': 56,
})
holidays = pd.concat((new_year, Easter))
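
To see which calendar days a `lower_window`/`upper_window` pair covers, the sketch below expands a holiday row into explicit dates. `expand_windows` is a hypothetical helper written for this illustration and is not part of Prophet; it assumes the documented convention that the effect runs from `ds + lower_window` to `ds + upper_window` days, inclusive.

```python
import pandas as pd

def expand_windows(holiday_df):
    """Return one row per (holiday, affected date), with inclusive windows."""
    rows = []
    for rec in holiday_df.itertuples(index=False):
        for offset in range(rec.lower_window, rec.upper_window + 1):
            rows.append({
                "holiday": rec.holiday,
                "date": rec.ds + pd.Timedelta(days=offset),
            })
    return pd.DataFrame(rows)

# One New Year entry with the same windows as in the holidays dataframe
new_year_2016 = pd.DataFrame({
    "holiday": "new_year",
    "ds": pd.to_datetime(["2016-01-01"]),
    "lower_window": -8,
    "upper_window": 1,
})
covered = expand_windows(new_year_2016)
```

Under that convention the New Year window spans ten days, from 24 December to 2 January.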

Here is our [Prophet](https://facebook.github.io/prophet/) model. The GDP regressor is annual, so it is piecewise constant and jumps at the start of each new year. We therefore [specify the locations of potential changepoints](http://facebook.github.io/prophet/docs/trend_changepoints.html#specifying-the-locations-of-the-changepoints) ourselves, placing them at the end of each year of the training data:

In [None]:
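def one_forecast(train, test):
    """Fit Prophet on one training series and predict the test dates."""
    model = Prophet(
        holidays=holidays,
        changepoints=['2015-12-31', '2016-12-31', '2017-12-31', '2018-12-31']
    )
    # the GDP column must be present in both the train and test frames
    model.add_regressor('GDP')
    model.fit(train)
    forecast_pd = model.predict(test)
    return forecast_pd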
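
As far as we are aware, Prophet rejects user-supplied changepoints that fall outside the training dates, so it can be worth checking them before fitting. The sketch below does this with pandas only. `changepoints_in_range` is a hypothetical helper, and `train_toy` assumes daily training data from 2015-01-01 to 2018-12-31.

```python
import pandas as pd

def changepoints_in_range(train, changepoints):
    """True if every changepoint lies within [min(ds), max(ds)] of the training frame."""
    cps = pd.to_datetime(pd.Series(changepoints))
    first, last = train["ds"].min(), train["ds"].max()
    return bool((cps >= first).all() and (cps <= last).all())

# Assumed training range: daily rows covering 2015 through 2018
train_toy = pd.DataFrame({"ds": pd.date_range("2015-01-01", "2018-12-31", freq="D")})
year_ends = ["2015-12-31", "2016-12-31", "2017-12-31", "2018-12-31"]

ok = changepoints_in_range(train_toy, year_ends)
too_late = changepoints_in_range(train_toy, ["2019-01-01"])
```

Note that the last changepoint sits on the final training day, so it is still inside the range.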

Define a function to select a given `country`, `store`, and `product` from the train and test data

In [None]:
def filter_csp(country, store, product):
    """Return the train and test rows for one country/store/product combination."""
    train_tmp = train_df.loc[(train_df['country']   == country)
                             & (train_df['store']   == store)
                             & (train_df['product'] == product)]
    test_tmp = test_df.loc[(test_df['country']   == country)
                           & (test_df['store']   == store)
                           & (test_df['product'] == product)]
    return train_tmp, test_tmp
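
An alternative to filtering three columns by hand is `groupby`, which splits a frame into one sub-frame per combination in a single pass. This is a sketch on a toy frame; the store and product names are placeholders, not the competition's actual values.

```python
import pandas as pd

toy = pd.DataFrame({
    "country": ["Finland", "Finland", "Norway", "Norway"],
    "store":   ["store_1", "store_1", "store_1", "store_2"],  # placeholder names
    "product": ["item_1", "item_1", "item_1", "item_2"],      # placeholder names
    "y":       [10, 12, 7, 3],
})

# Map each (country, store, product) key to its own sub-frame
groups = {
    key: frame
    for key, frame in toy.groupby(["country", "store", "product"])
}
```

Each key in `groups` is a `(country, store, product)` tuple, so `groups[("Finland", "store_1", "item_1")]` returns the same rows that three chained comparisons would select.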

Calculate 18 forecasts, one for each `country`, `store`, and `product` combination

In [None]:
# create an empty list to be filled with forecasts
forecasts_list = []

countries = train_df["country"].unique().tolist()
stores    = train_df["store"].unique().tolist()
products  = train_df["product"].unique().tolist()

for country in countries:
    for store in stores:
        for product in products:
            train_tmp, test_tmp = filter_csp(country,store,product)
            forecast_pd = one_forecast(train_tmp, test_tmp)
            single_forecast = pd.merge(test_tmp,forecast_pd,on='ds')
            # append the results to the list
            forecasts_list.append(single_forecast)
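
The three nested loops can also be written as a single loop over `itertools.product`, which makes the count of 18 combinations easy to verify. In this sketch the country names come from the GDP dataset title, while the store and product names are placeholders; two stores and three products are assumed so that 3 x 2 x 3 = 18.

```python
import itertools

countries = ["Finland", "Norway", "Sweden"]
stores    = ["store_1", "store_2"]            # placeholder names
products  = ["item_1", "item_2", "item_3"]    # placeholder names

# Same iteration order as the nested loops: the last list varies fastest
combos = list(itertools.product(countries, stores, products))

for country, store, product in combos:
    pass  # filter_csp(...) and one_forecast(...) would be called here
```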

Merge the forecast results with the test data

In [None]:
results = pd.concat(forecasts_list)
df_submission = pd.merge(test_df, results, on=['ds','country','store','product'])
# change the predicted target name back to "num_sold"
df_submission = df_submission.rename(columns={"yhat": "num_sold"})

Create a `submission.csv` file

In [None]:
sample = pd.read_csv("../input/tabular-playground-series-jan-2022/sample_submission.csv")
sample['num_sold'] = df_submission['num_sold']
sample[['row_id', 'num_sold']].to_csv('submission.csv', index=False)
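
The assignment `sample['num_sold'] = df_submission['num_sold']` aligns on the dataframe index, so it relies on both frames having the same row order. A more defensive sketch, assuming the predictions still carry the `row_id` column from the test file, joins on `row_id` explicitly. The frames `sample_toy` and `preds_toy` are made-up stand-ins.

```python
import pandas as pd

# Toy stand-in for sample_submission.csv
sample_toy = pd.DataFrame({"row_id": [100, 101, 102], "num_sold": [0, 0, 0]})

# Toy predictions, deliberately out of order
preds_toy = pd.DataFrame({
    "row_id": [102, 100, 101],
    "num_sold": [30.0, 10.0, 20.0],
})

# Join on row_id; validate raises if a row_id is duplicated on either side
aligned = sample_toy[["row_id"]].merge(
    preds_toy, on="row_id", how="left", validate="one_to_one"
)
```

Any `row_id` without a prediction would show up as `NaN` in `aligned`, which is easier to spot than a silent misalignment.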

### Related reading
* [Prophet](https://facebook.github.io/prophet/) homepage
* [Sean J. Taylor and Benjamin Letham "*Forecasting at scale*", PeerJ Preprints 5:e3190v2 (2017)](https://peerj.com/preprints/3190/)