<a href="https://colab.research.google.com/github/sankhauri/Sales-forecasting-By-Prophet/blob/main/Sales_forecasting.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

## Sales forecasting

In this project we are going to build a model for forecasting sales of different items at different stores. Our dataset contains the sales data for 10 stores and 50 items.

# Data set:
Data fields
date - Date of the sale data. There are no holiday effects or store closures.

store - Store ID

item - Item ID

sales - Number of items sold at a particular store on a particular date.

# Data Source
https://www.kaggle.com/competitions/demand-forecasting-kernels-only/data

In [5]:
## getting the dataset

from google.colab import drive
drive.mount('/content/drive')
file_path = '/content/drive/My Drive/train.csv'


Mounted at /content/drive


In [6]:
import pandas as pd
df = pd.read_csv(file_path)
df.head()

,date,store,item,sales
0,2013-01-01,1,1,13
1,2013-01-02,1,1,11
2,2013-01-03,1,1,14
3,2013-01-04,1,1,13
4,2013-01-05,1,1,10


In [7]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 913000 entries, 0 to 912999
Data columns (total 4 columns):
 #   Column  Non-Null Count   Dtype 
---  ------  --------------   ----- 
 0   date    913000 non-null  object
 1   store   913000 non-null  int64 
 2   item    913000 non-null  int64 
 3   sales   913000 non-null  int64 
dtypes: int64(3), object(1)
memory usage: 27.9+ MB


In [8]:
## Sorting the dataset according to store number and item number
df_sorted = df.sort_values(by=['store', 'item'], ascending=[True, True])
df_sorted

,date,store,item,sales
0,2013-01-01,1,1,13
1,2013-01-02,1,1,11
2,2013-01-03,1,1,14
3,2013-01-04,1,1,13
4,2013-01-05,1,1,10
...,...,...,...,...
912995,2017-12-27,10,50,63
912996,2017-12-28,10,50,59
912997,2017-12-29,10,50,74
912998,2017-12-30,10,50,62


In [9]:
## Create dataframe for store 1
df_1 = df_sorted[df_sorted['store'] == 1]  # select by store ID instead of a hard-coded row count
df_1

,date,store,item,sales
0,2013-01-01,1,1,13
1,2013-01-02,1,1,11
2,2013-01-03,1,1,14
3,2013-01-04,1,1,13
4,2013-01-05,1,1,10
...,...,...,...,...
896561,2017-12-27,1,50,38
896562,2017-12-28,1,50,52
896563,2017-12-29,1,50,59
896564,2017-12-30,1,50,66


In [10]:
# Pivot without aggregation (this puts the sales of each item in its own column, creating a multivariable time series)
pivot_df_1 = df_1.pivot(index='date', columns='item', values='sales')
# Rename the columns to store1_item_1, store1_item_2, ... using the item IDs
pivot_df_1 = pivot_df_1.rename(columns=lambda item: f"store1_item_{item}")
# Reuse the name df_1 for the pivoted dataframe
df_1 = pivot_df_1
df_1.head()  # date is already the index

date,store1_item_1,store1_item_2,store1_item_3,store1_item_4,store1_item_5,store1_item_6,store1_item_7,store1_item_8,store1_item_9,store1_item_10,...,store1_item_41,store1_item_42,store1_item_43,store1_item_44,store1_item_45,store1_item_46,store1_item_47,store1_item_48,store1_item_49,store1_item_50
2013-01-01,13,33,15,10,11,31,25,33,18,37,...,6,21,22,20,37,30,17,21,18,30
2013-01-02,11,43,30,11,6,36,23,37,23,34,...,15,24,27,15,40,30,15,26,10,32
2013-01-03,14,23,14,8,8,18,34,38,25,32,...,5,14,19,11,42,30,5,25,17,25
2013-01-04,13,18,10,19,9,19,36,54,22,45,...,9,22,29,22,49,37,13,26,22,32
2013-01-05,10,34,23,12,8,31,38,51,29,35,...,13,18,34,19,52,28,12,28,15,35


In [11]:
## Changing the index to datetime
df_1.index = pd.to_datetime(df_1.index)

In [12]:
# Checking the missing values
df_1.isna().sum()

item,0
store1_item_1,0
store1_item_2,0
store1_item_3,0
store1_item_4,0
store1_item_5,0
store1_item_6,0
store1_item_7,0
store1_item_8,0
store1_item_9,0
store1_item_10,0


In [13]:
print(df_1.index.min())
print(df_1.index.max())

2013-01-01 00:00:00
2017-12-31 00:00:00


### Plotting Time Series
We create time series plots for only the first five items. Putting all 50 graphs together would be messy.

In [14]:
import plotly.graph_objects as go
from plotly.subplots import make_subplots
# Define the number of rows and columns for the subplots
rows, cols = 5, 1  # 5x1 grid: one row per item

# Create subplots
fig = make_subplots(rows=rows, cols=cols, shared_xaxes=False, shared_yaxes=False,
                    vertical_spacing=0.1, horizontal_spacing=0.1,
                    subplot_titles=df_1.columns[0:5])
# Add each variable to a separate subplot
for i, column in enumerate(df_1.columns[0:5]):
    row = (i // cols) + 1
    col = (i % cols) + 1
    fig.add_trace(go.Scatter(x=df_1.index, y=df_1[column], mode='lines', name=column), row=row, col=col)

# Customize layout
fig.update_layout(
    title="Multivariable Time Series with Grid Subplots",
    xaxis_title="Date",
    height=1300,  # Adjust height for overall layout
    showlegend=False,
    template="plotly_dark"
)
# Customize y-axis titles for each subplot
for i, column in enumerate(df_1.columns[0:5], start=1):
    fig['layout'][f'yaxis{i}']['title'] = column

# Show the figure
fig.show()

### Seasonal Decomposition
Seasonal decomposition separates a time series into three components: Trend, Seasonal, and Residual (Error). For multiple time series variables, this can be done by decomposing each variable individually and then plotting these components side by side in Plotly.

To perform seasonal decomposition in Python, you can use the seasonal_decompose function from the statsmodels library, and then visualize each component in a Plotly subplot. For seasonal decomposition, we have considered weekly, monthly, quarterly, and yearly periods.
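To make the three components concrete, here is a minimal sketch of a classical additive decomposition done by hand with pandas. It is an illustration under stated assumptions, not the exact statsmodels implementation: the helper name `additive_decompose` and the toy series are made up for this example, and the period is assumed to be odd so that a plain centered moving average can serve as the trend.

```python
import numpy as np
import pandas as pd


def additive_decompose(series: pd.Series, period: int) -> pd.DataFrame:
    """Hypothetical helper: classical additive decomposition for an odd period."""
    if period % 2 == 0:
        raise ValueError("this sketch only handles odd periods")
    # Trend: centered moving average over one full period (NaN at both edges)
    trend = series.rolling(window=period, center=True).mean()
    # Seasonal: average detrended value at each position within the period
    detrended = series - trend
    position = np.arange(len(series)) % period
    seasonal_means = detrended.groupby(position).mean()
    seasonal_means = seasonal_means - seasonal_means.mean()  # center on zero
    seasonal = pd.Series(seasonal_means.loc[position].to_numpy(), index=series.index)
    # Residual: whatever trend and seasonality do not explain
    resid = series - trend - seasonal
    return pd.DataFrame({"observed": series, "trend": trend,
                         "seasonal": seasonal, "resid": resid})


# Toy daily series: linear trend plus a weekly pattern (synthetic data)
dates = pd.date_range("2013-01-01", periods=70, freq="D")
weekly_pattern = np.array([-3.0, -1.0, 0.0, 1.0, 2.0, 3.0, -2.0])
toy = pd.Series(10 + 0.1 * np.arange(70) + np.tile(weekly_pattern, 10), index=dates)

parts = additive_decompose(toy, period=7)
print(parts.head(5))
```

With period=7 the first three and last three trend values are NaN, because the centered window does not fit at the edges. The residual is undefined on the same rows.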

In [15]:
## Seasonal Decomposition with weekly period

from statsmodels.tsa.seasonal import seasonal_decompose
# Dictionary to store decomposition results
decomposed_data = {}

# Perform seasonal decomposition for each variable
for column in df_1.columns[:5]:
    decomposed = seasonal_decompose(df_1[column], model='additive', period=7)
    decomposed_data[column] = decomposed

# Set up the subplot grid
rows = 5 * 3  # 3 rows per variable (Trend, Seasonal, Residual)
cols = 1  # Only one column

# Create subplots
fig = make_subplots(
    rows=rows, cols=cols, shared_xaxes=False,
    vertical_spacing=0.02,
    subplot_titles=[f"{column} - {component}" for column in df_1.columns[ : 5] for component in ['Trend', 'Seasonal', 'Residual']]
)

# Add decomposed components to subplots
for var_idx, column in enumerate(df_1.columns[ : 5]):
    decomposed = decomposed_data[column]

    # Calculate the starting row for each variable
    start_row = var_idx * 3 + 1

    # Original time series
    #fig.add_trace(go.Scatter(x=df_1.index, y=df_1[column], mode='lines', name=f"{column}"),row=start_row, col=1)


    # Trend component
    fig.add_trace(go.Scatter(x=df_1.index, y=decomposed.trend, mode='lines', name=f"{column} Trend"),
                  row=start_row, col=1)

    # Seasonal component
    fig.add_trace(go.Scatter(x=df_1.index, y=decomposed.seasonal, mode='lines', name=f"{column} Seasonal"),
                  row=start_row + 1, col=1)

    # Residual component
    fig.add_trace(go.Scatter(x=df_1.index, y=decomposed.resid, mode='lines', name=f"{column} Residual"),
                  row=start_row + 2, col=1)

# Customize layout
fig.update_layout(
    title="Seasonal Decomposition of Multiple Items of Store 1",
    height=700 * len(df_1.columns[ : 5]),  # Adjust height based on number of variables
    template="plotly_dark",
    showlegend=False
)

# Customize y-axis titles for each subplot
for var_idx, column in enumerate(df_1.columns[ : 5]):
    start_row = var_idx * 3 + 1
    #fig.update_yaxes(title_text="column", row=start_row, col=1)
    fig.update_yaxes(title_text="Trend", row=start_row, col=1)
    fig.update_yaxes(title_text="Seasonal", row=start_row + 1, col=1)
    fig.update_yaxes(title_text="Residual", row=start_row + 2, col=1)

# Show the figure
fig.show()

In [16]:
# `decomposed` still refers to the last series from the loop above (store1_item_5, weekly period)
df_reconstructed = pd.concat([decomposed.seasonal, decomposed.trend, decomposed.resid, decomposed.observed], axis=1)
df_reconstructed.columns = ['seasonal', 'trend', 'resid', 'actual_values']
df_reconstructed.head(30)

date,seasonal,trend,resid,actual_values
2013-01-01,-1.374647,,,11.0
2013-01-02,-1.078493,,,6.0
2013-01-03,0.411617,,,8.0
2013-01-04,0.686892,9.428571,-1.115463,9.0
2013-01-05,1.922057,9.285714,-3.207771,8.0
2013-01-06,2.991287,9.428571,0.580141,13.0
2013-01-07,-3.558713,9.428571,5.130141,11.0
2013-01-08,-1.374647,9.571429,1.803218,10.0
2013-01-09,-1.078493,9.571429,-1.492936,7.0
2013-01-10,0.411617,9.285714,-1.697331,8.0


In [17]:
## Seasonal Decomposition with monthly period

from statsmodels.tsa.seasonal import seasonal_decompose
# Dictionary to store decomposition results
decomposed_data = {}

# Perform seasonal decomposition for each variable
for column in df_1.columns[:5]:
    decomposed = seasonal_decompose(df_1[column], model='additive', period=31)
    decomposed_data[column] = decomposed

# Set up the subplot grid
rows = 5 * 3  # 3 rows per variable (Trend, Seasonal, Residual)
cols = 1  # Only one column

# Create subplots
fig = make_subplots(
    rows=rows, cols=cols, shared_xaxes=False,
    vertical_spacing=0.02,
    subplot_titles=[f"{column} - {component}" for column in df_1.columns[ : 5] for component in ['Trend', 'Seasonal', 'Residual']]
)

# Add decomposed components to subplots
for var_idx, column in enumerate(df_1.columns[ : 5]):
    decomposed = decomposed_data[column]

    # Calculate the starting row for each variable
    start_row = var_idx * 3 + 1

    # Original time series
    #fig.add_trace(go.Scatter(x=df_1.index, y=df_1[column], mode='lines', name=f"{column}"),row=start_row, col=1)


    # Trend component
    fig.add_trace(go.Scatter(x=df_1.index, y=decomposed.trend, mode='lines', name=f"{column} Trend"),
                  row=start_row, col=1)

    # Seasonal component
    fig.add_trace(go.Scatter(x=df_1.index, y=decomposed.seasonal, mode='lines', name=f"{column} Seasonal"),
                  row=start_row + 1, col=1)

    # Residual component
    fig.add_trace(go.Scatter(x=df_1.index, y=decomposed.resid, mode='lines', name=f"{column} Residual"),
                  row=start_row + 2, col=1)

# Customize layout
fig.update_layout(
    title="Seasonal Decomposition of Multiple Items of Store 1",
    height=700 * len(df_1.columns[ : 5]),  # Adjust height based on number of variables
    template="plotly_dark",
    showlegend=False
)

# Customize y-axis titles for each subplot
for var_idx, column in enumerate(df_1.columns[ : 5]):
    start_row = var_idx * 3 + 1
    #fig.update_yaxes(title_text="column", row=start_row, col=1)
    fig.update_yaxes(title_text="Trend", row=start_row, col=1)
    fig.update_yaxes(title_text="Seasonal", row=start_row + 1, col=1)
    fig.update_yaxes(title_text="Residual", row=start_row + 2, col=1)

# Show the figure
fig.show()

In [18]:
## Seasonal Decomposition with quarterly period

from statsmodels.tsa.seasonal import seasonal_decompose
# Dictionary to store decomposition results
decomposed_data = {}

# Perform seasonal decomposition for each variable
for column in df_1.columns[:5]:
    decomposed = seasonal_decompose(df_1[column], model='additive', period=90)
    decomposed_data[column] = decomposed

# Set up the subplot grid
rows = 5 * 3  # 3 rows per variable (Trend, Seasonal, Residual)
cols = 1  # Only one column

# Create subplots
fig = make_subplots(
    rows=rows, cols=cols, shared_xaxes=False,
    vertical_spacing=0.02,
    subplot_titles=[f"{column} - {component}" for column in df_1.columns[ : 5] for component in ['Trend', 'Seasonal', 'Residual']]
)

# Add decomposed components to subplots
for var_idx, column in enumerate(df_1.columns[ : 5]):
    decomposed = decomposed_data[column]

    # Calculate the starting row for each variable
    start_row = var_idx * 3 + 1

    # Original time series
    #fig.add_trace(go.Scatter(x=df_1.index, y=df_1[column], mode='lines', name=f"{column}"),row=start_row, col=1)


    # Trend component
    fig.add_trace(go.Scatter(x=df_1.index, y=decomposed.trend, mode='lines', name=f"{column} Trend"),
                  row=start_row, col=1)

    # Seasonal component
    fig.add_trace(go.Scatter(x=df_1.index, y=decomposed.seasonal, mode='lines', name=f"{column} Seasonal"),
                  row=start_row + 1, col=1)

    # Residual component
    fig.add_trace(go.Scatter(x=df_1.index, y=decomposed.resid, mode='lines', name=f"{column} Residual"),
                  row=start_row + 2, col=1)

# Customize layout
fig.update_layout(
    title="Seasonal Decomposition of Multiple Items of Store 1",
    height=700 * len(df_1.columns[ : 5]),  # Adjust height based on number of variables
    template="plotly_dark",
    showlegend=False
)

# Customize y-axis titles for each subplot
for var_idx, column in enumerate(df_1.columns[ : 5]):
    start_row = var_idx * 3 + 1
    #fig.update_yaxes(title_text="column", row=start_row, col=1)
    fig.update_yaxes(title_text="Trend", row=start_row, col=1)
    fig.update_yaxes(title_text="Seasonal", row=start_row + 1, col=1)
    fig.update_yaxes(title_text="Residual", row=start_row + 2, col=1)

# Show the figure
fig.show()

In [19]:
## Seasonal Decomposition with yearly period

from statsmodels.tsa.seasonal import seasonal_decompose
# Dictionary to store decomposition results
decomposed_data = {}

# Perform seasonal decomposition for each variable
for column in df_1.columns[:5]:
    decomposed = seasonal_decompose(df_1[column], model='additive', period=365)
    decomposed_data[column] = decomposed

# Set up the subplot grid
rows = 5 * 3  # 3 rows per variable (Trend, Seasonal, Residual)
cols = 1  # Only one column

# Create subplots
fig = make_subplots(
    rows=rows, cols=cols, shared_xaxes=False,
    vertical_spacing=0.02,
    subplot_titles=[f"{column} - {component}" for column in df_1.columns[ : 5] for component in ['Trend', 'Seasonal', 'Residual']]
)

# Add decomposed components to subplots
for var_idx, column in enumerate(df_1.columns[ : 5]):
    decomposed = decomposed_data[column]

    # Calculate the starting row for each variable
    start_row = var_idx * 3 + 1

    # Original time series
    #fig.add_trace(go.Scatter(x=df_1.index, y=df_1[column], mode='lines', name=f"{column}"),row=start_row, col=1)


    # Trend component
    fig.add_trace(go.Scatter(x=df_1.index, y=decomposed.trend, mode='lines', name=f"{column} Trend"),
                  row=start_row, col=1)

    # Seasonal component
    fig.add_trace(go.Scatter(x=df_1.index, y=decomposed.seasonal, mode='lines', name=f"{column} Seasonal"),
                  row=start_row + 1, col=1)

    # Residual component
    fig.add_trace(go.Scatter(x=df_1.index, y=decomposed.resid, mode='lines', name=f"{column} Residual"),
                  row=start_row + 2, col=1)

# Customize layout
fig.update_layout(
    title="Seasonal Decomposition of Multiple Items of Store 1",
    height=700 * len(df_1.columns[ : 5]),  # Adjust height based on number of variables
    template="plotly_dark",
    showlegend=False
)

# Customize y-axis titles for each subplot
for var_idx, column in enumerate(df_1.columns[ : 5]):
    start_row = var_idx * 3 + 1
    #fig.update_yaxes(title_text="column", row=start_row, col=1)
    fig.update_yaxes(title_text="Trend", row=start_row, col=1)
    fig.update_yaxes(title_text="Seasonal", row=start_row + 1, col=1)
    fig.update_yaxes(title_text="Residual", row=start_row + 2, col=1)

# Show the figure
fig.show()

To compute and plot the Autocorrelation Function (ACF) and Partial Autocorrelation Function (PACF) in Python with Plotly, you can use statsmodels to get the ACF and PACF values and then plotly.graph_objects to create the plots.
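As a rough illustration of what the ACF measures, the sketch below computes the sample autocorrelation by hand with NumPy on a synthetic series with a 7-step cycle. The helper name `sample_acf` and the data are assumptions made for this example. The formula divides the lagged covariance by the lag-0 variance, which should agree with the statsmodels `acf` function under its default settings.

```python
import numpy as np


def sample_acf(values, nlags):
    """Hypothetical helper: sample autocorrelation for lags 0..nlags."""
    x = np.asarray(values, dtype=float)
    centered = x - x.mean()
    denom = np.sum(centered ** 2)
    acf_values = [1.0]
    for lag in range(1, nlags + 1):
        # Covariance between the series and a copy of itself shifted by `lag`
        acf_values.append(np.sum(centered[lag:] * centered[:-lag]) / denom)
    return np.array(acf_values)


# Synthetic series with a 7-step cycle plus a little noise
rng = np.random.default_rng(0)
t = np.arange(700)
cyclic = np.sin(2 * np.pi * t / 7) + 0.1 * rng.standard_normal(t.size)

acf_by_hand = sample_acf(cyclic, nlags=14)
print(np.round(acf_by_hand[[0, 3, 7, 14]], 2))
```

The autocorrelation peaks at lags 7 and 14 and is negative around half a cycle. A weekly sales pattern produces the same signature. The PACF additionally removes the influence of the intermediate lags, which this sketch does not attempt.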

In [20]:
import numpy as np
from statsmodels.tsa.stattools import acf, pacf
# Calculate ACF and PACF
acf_values = acf(df_1["store1_item_1"], nlags=365)  # Adjust nlags as necessary
pacf_values = pacf(df_1["store1_item_1"], nlags=365)

# Create ACF plot
acf_fig = go.Figure()
acf_fig.add_trace(go.Scatter(x=np.arange(len(acf_values)), y=acf_values, name='ACF'))
acf_fig.update_layout(
    title="Autocorrelation Function (ACF)",
    xaxis_title="Lags",
    yaxis_title="ACF",
    template="plotly_white"
)
# Create PACF plot
pacf_fig = go.Figure()
pacf_fig.add_trace(go.Scatter(x=np.arange(len(pacf_values)), y=pacf_values, name='PACF'))
pacf_fig.update_layout(
    title="Partial Autocorrelation Function (PACF)",
    xaxis_title="Lags",
    yaxis_title="PACF",
    template="plotly_white"
)

# Show the plots
acf_fig.show()
pacf_fig.show()

In [23]:
def create_corr_plot(series, plot_pacf=False, nlags=None):
    """Plot the ACF (or PACF) of `series` with its 95% confidence band."""
    corr_func = pacf if plot_pacf else acf
    # With alpha set, statsmodels returns (values, confidence_intervals)
    corr_values, conf_int = corr_func(series.dropna(), alpha=0.05, nlags=nlags)
    lags = np.arange(len(corr_values))
    # Center the confidence band on zero
    lower_y = conf_int[:, 0] - corr_values
    upper_y = conf_int[:, 1] - corr_values

    fig = go.Figure()
    # One vertical stem per lag
    for lag, value in enumerate(corr_values):
        fig.add_scatter(x=(lag, lag), y=(0, value), mode='lines', line_color='#3f3f3f')
    fig.add_scatter(x=lags, y=corr_values, mode='markers', marker_color='#1f77b4',
                    marker_size=6)
    fig.add_scatter(x=lags, y=upper_y, mode='lines', line_color='rgba(255,255,255,0)')
    fig.add_scatter(x=lags, y=lower_y, mode='lines', fillcolor='rgba(32, 146, 230,0.3)',
                    fill='tonexty', line_color='rgba(255,255,255,0)')
    fig.update_traces(showlegend=False)
    fig.update_xaxes(range=[-1, len(lags)])  # same as nlags + 1, but also works when nlags is None
    fig.update_yaxes(zerolinecolor='#000000')

    title = 'Partial Autocorrelation (PACF)' if plot_pacf else 'Autocorrelation (ACF)'
    fig.update_layout(title=title)
    fig.show()

In [24]:
create_corr_plot(df_1["store1_item_1"], plot_pacf=False,nlags=93)



In [25]:
create_corr_plot(df_1["store1_item_1"], plot_pacf=True,nlags=93)

In [26]:
create_corr_plot(df_1["store1_item_1"], plot_pacf=True,nlags=180)

The PACF plots suggest that an AR(7) model would be suitable for this time series, but we first need to check stationarity.

In [27]:
from statsmodels.tsa.stattools import adfuller
# Perform the ADF test
result = adfuller(df_1["store1_item_1"])

# Extract and display the test results
print("ADF Statistic:", result[0])
print("p-value:", result[1])
print("Critical Values:")
for key, value in result[4].items():
    print(f"   {key}: {value}")

# Interpret the p-value
if result[1] < 0.05:
    print("The time series is likely stationary (reject H0).")
else:
    print("The time series is likely non-stationary (fail to reject H0).")

ADF Statistic: -3.1576705563328042
p-value: 0.02256938062657153
Critical Values:
   1%: -3.4339840952648695
   5%: -2.8631452508003057
   10%: -2.567624583142913
The time series is likely stationary (reject H0).


**Check any other item of store 1**

In [28]:
create_corr_plot(df_1["store1_item_5"], plot_pacf=False,nlags=93)

In [29]:
create_corr_plot(df_1["store1_item_2"], plot_pacf=True,nlags=93)

In [30]:
from statsmodels.tsa.stattools import adfuller
# Perform the ADF test
result = adfuller(df_1["store1_item_2"])

# Extract and display the test results
print("ADF Statistic:", result[0])
print("p-value:", result[1])
print("Critical Values:")
for key, value in result[4].items():
    print(f"   {key}: {value}")

# Interpret the p-value
if result[1] < 0.05:
    print("The time series is likely stationary (reject H0).")
else:
    print("The time series is likely non-stationary (fail to reject H0).")

ADF Statistic: -3.1632412166455346
p-value: 0.02221347816461074
Critical Values:
   1%: -3.4339840952648695
   5%: -2.8631452508003057
   10%: -2.567624583142913
The time series is likely stationary (reject H0).


### Let's fit an AR(7) model.
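An AR(p) model predicts each value as an intercept plus a weighted sum of the previous p values. The sketch below shows the idea on a simulated AR(2) series by building the lag matrix and solving ordinary least squares with NumPy. The helper name `fit_ar_ols`, the simulated data, and the true coefficients are assumptions chosen for illustration. It is a simplified stand-in for what a library such as statsmodels does, not its actual code.

```python
import numpy as np


def fit_ar_ols(values, p):
    """Hypothetical helper: estimate AR(p) intercept and lag coefficients by OLS."""
    y = np.asarray(values, dtype=float)
    # Row for time t holds [1, y[t-1], y[t-2], ..., y[t-p]]
    lagged = [y[p - k:len(y) - k] for k in range(1, p + 1)]
    design = np.column_stack([np.ones(len(y) - p)] + lagged)
    target = y[p:]
    coeffs, *_ = np.linalg.lstsq(design, target, rcond=None)
    return coeffs  # [intercept, phi_1, ..., phi_p]


# Simulate a stationary AR(2) process with known coefficients
rng = np.random.default_rng(42)
true_phi = (0.6, -0.3)
series = np.zeros(5000)
for t in range(2, series.size):
    series[t] = (2.0 + true_phi[0] * series[t - 1]
                 + true_phi[1] * series[t - 2] + rng.standard_normal())

intercept, phi_1, phi_2 = fit_ar_ols(series, p=2)
print(round(phi_1, 2), round(phi_2, 2))
```

The estimated coefficients should land close to 0.6 and -0.3. With p=7, as used in this notebook, the design matrix simply gains five more lag columns.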

In [31]:
from statsmodels.tsa.ar_model import AutoReg
from sklearn.metrics import mean_squared_error
# Select the series for item 1 of store 1
data_series = df_1["store1_item_1"]
# Split the data chronologically into train (first 80%) and test (last 20%) sets
train_size = int(len(data_series) * 0.8)
train, test = data_series[:train_size], data_series[train_size:]
# Fit an autoregressive (AR) model with p=7 lags, as suggested by the PACF
model = AutoReg(train, lags=7).fit()

# Generate out-of-sample predictions for the test period
predictions = model.predict(start=len(train), end=len(data_series)-1, dynamic=False)
# Calculate the test error
error = mean_squared_error(test, predictions)
print(f"Test MSE: {error}")

# Plot the observed and predicted values
fig = go.Figure()

# Plot observed values
fig.add_trace(go.Scatter(x=data_series.index, y=data_series, mode='lines', name='Observed'))

# Plot predicted values
fig.add_trace(go.Scatter(x=test.index, y=predictions, mode='lines', name='Predicted'))

# Update plot layout
fig.update_layout(
    title="Observed vs Predicted Time Series",
    xaxis_title="Time",
    yaxis_title="Value",
    template="plotly_white"
)

fig.show()


# Silence UserWarnings in the cells that follow
import warnings

warnings.filterwarnings("ignore", category=UserWarning)

No frequency information was provided, so inferred frequency D will be used.


Only PeriodIndexes, DatetimeIndexes with a frequency set, RangesIndexes, and Index with a unit increment support extending. The index is set will contain the position relative to the data length.



Test MSE: 54.081146963781656


## Use Prophet to predict the time series.

*Facebook Prophet* (now simply known as "Prophet") is an open-source forecasting tool developed by Facebook's Core Data Science team. Designed for time series forecasting, it is especially useful for data with strong seasonal effects and multiple seasonalities, such as daily, weekly, and yearly patterns. Prophet can also model holidays and special events that the user supplies, making it suitable for real-world business scenarios with irregular patterns. With its intuitive interface, Prophet enables analysts and data scientists to quickly build forecasting models without deep knowledge of statistical modeling. It is implemented in both Python and R, so it fits into popular data science workflows.

Here are the main default parameters in Facebook Prophet that influence model behavior and flexibility:

**Growth: 'linear'**

Determines the type of trend. Defaults to 'linear', but can be set to 'logistic' for saturating growth.

**Yearly Seasonality:** Enabled by default with a fourier_order of 10.

**Weekly Seasonality:** Enabled by default with a fourier_order of 3.

**Daily Seasonality:** Disabled by default but can be enabled with a fourier_order of 4.

**Holidays:** Disabled by default

Users can specify holiday effects by providing a list of holiday dates.

**Changepoint Range: 0.8**

The proportion of data (from the start) in which Prophet will place potential changepoints. Default is 0.8, meaning changepoints are considered in the first 80% of the data.

**Changepoint Prior Scale: 0.05**

Controls the flexibility of the trend. Lower values (e.g., 0.01) create a smoother trend, while higher values (e.g., 0.5) allow more flexibility.

**Interval Width: 0.80**

Defines the uncertainty interval for forecasts. By default, it produces an 80% prediction interval.

**Uncertainty Samples: 1000**

The number of simulations Prophet runs to estimate forecast uncertainty.

**Seasonality Mode: 'additive'**

Can be set to 'additive' or 'multiplicative', depending on the data's seasonal behavior.

These default settings make Prophet a versatile tool for common forecasting tasks, but they can be fine-tuned to capture more complex time series behavior.
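For reference, the defaults listed above can be spelled out as keyword arguments. This is a sketch rather than a recommended configuration: the dictionary name `PROPHET_DEFAULTS` is made up for this example, and the import is guarded so the snippet still runs where Prophet is not installed. The `'auto'` values let Prophet decide from the data whether to switch each built-in seasonality on.

```python
# Defaults as described above, written out as keyword arguments
PROPHET_DEFAULTS = {
    "growth": "linear",
    "changepoint_range": 0.8,
    "changepoint_prior_scale": 0.05,
    "yearly_seasonality": "auto",
    "weekly_seasonality": "auto",
    "daily_seasonality": "auto",
    "seasonality_mode": "additive",
    "interval_width": 0.80,
    "uncertainty_samples": 1000,
}

try:
    from prophet import Prophet
except ImportError:  # Prophet is optional for reading this sketch
    Prophet = None

if Prophet is not None:
    # Should be equivalent to Prophet() with no arguments
    explicit_model = Prophet(**PROPHET_DEFAULTS)
    # Override a single default, e.g. a wider 95% uncertainty interval
    wider_model = Prophet(**{**PROPHET_DEFAULTS, "interval_width": 0.95})
```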

In [32]:
## Prophet with the default seasonality settings (built-in yearly and weekly)

import numpy as np
import pandas as pd
from prophet import Prophet
import plotly.graph_objects as go
from sklearn.metrics import mean_squared_error,mean_absolute_error, mean_absolute_percentage_error



# Create a DataFrame with columns 'ds' and 'y' for Prophet
df = pd.DataFrame({"ds": df_1.index, "y":df_1["store1_item_1"]})

# Split the data into train and test sets
train_size = int(len(df) * 0.8)
train_df = df[:train_size]
test_df = df[train_size:]

# Initialize and fit the Prophet model
model = Prophet(interval_width=0.95)  # 95% uncertainty interval (the default is 80%)
model.fit(train_df)

# Make future predictions
future = model.make_future_dataframe(periods=len(test_df), freq="D")
forecast = model.predict(future)

# Calculate the MSE and MAPE for the test set
test_forecast = forecast[-len(test_df):]
error1 = mean_squared_error(test_df['y'], test_forecast['yhat'])
error2 = mean_absolute_percentage_error(test_df['y'], test_forecast['yhat'])
print(f"Test MSE: {error1}")
print(f"Test MAPE: {error2}")

# Plot observed and predicted values
fig = go.Figure()

# Plot observed (train + test) values
fig.add_trace(go.Scatter(x=df['ds'], y=df['y'], mode='lines', name='Observed'))

# Plot predicted values
fig.add_trace(go.Scatter(x=test_forecast['ds'], y=test_forecast['yhat'], mode='lines', name='Predicted'))

# Update plot layout
fig.update_layout(
    title="Observed vs Predicted Time Series (Prophet)",
    xaxis_title="Date",
    yaxis_title="Value",
    template="plotly_white"
)

fig.show()

INFO:prophet:Disabling daily seasonality. Run prophet with daily_seasonality=True to override this.


Test MSE: 25.088016214302236
Test MAPE: 0.22613043015454443
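The two metrics printed above can be reproduced by hand, which also makes their units explicit. The following sketch uses made-up numbers and hypothetical helper names (`mse`, `mape`). It assumes no actual value is zero, since the percentage error is undefined in that case.

```python
import numpy as np


def mse(y_true, y_pred):
    """Mean squared error, in squared units of the target."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return float(np.mean((y_true - y_pred) ** 2))


def mape(y_true, y_pred):
    """Mean absolute percentage error as a fraction (0.25 means 25%)."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return float(np.mean(np.abs((y_true - y_pred) / y_true)))


actual = [10, 20, 40]     # made-up sales
predicted = [12, 18, 30]  # made-up forecasts
print(mse(actual, predicted))   # (4 + 4 + 100) / 3
print(mape(actual, predicted))  # (0.2 + 0.1 + 0.25) / 3
```

The scikit-learn MAPE is likewise reported as a fraction, so a value of about 0.226 corresponds to an average error of roughly 22.6% of the actual sales.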


In [33]:
# Plot observed and predicted values
fig = go.Figure()

# Plot observed values for the test period only
fig.add_trace(go.Scatter(x=df[train_size:]['ds'], y=df[train_size:]['y'], mode='lines', name='Observed'))
# Plot predicted values
fig.add_trace(go.Scatter(x=test_forecast['ds'], y=test_forecast['yhat'], mode='lines', name='Predicted'))

# Update plot layout
fig.update_layout(
    title="Observed vs Predicted Time Series (Prophet)",
    xaxis_title="Date",
    yaxis_title="Value",
    template="plotly_white"
)

fig.show()

Improving a Prophet model for better time series forecasting involves several strategies, including tuning model parameters, adding additional regressors, adjusting seasonalities, and making use of domain-specific knowledge. Here are some effective approaches:

## 1. Tune the Growth Model
Prophet can handle linear and logistic growth models. By default, it uses linear growth, but if your data shows saturation effects, switching to a logistic growth model with a defined capacity can improve accuracy.
Example:

model = Prophet(growth='logistic')

df['cap'] = 100  # Set the upper limit (capacity)

model.fit(df)
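To see why the capacity matters, the sketch below evaluates the basic logistic curve cap / (1 + exp(-k (t - m))) with NumPy. The parameter values are purely illustrative, and the helper name `logistic_trend` is an assumption for this example. Prophet's actual trend additionally allows the growth rate to change at changepoints, which is not modeled here.

```python
import numpy as np


def logistic_trend(t, cap, k, m):
    """Basic logistic growth curve: cap / (1 + exp(-k * (t - m)))."""
    t = np.asarray(t, dtype=float)
    return cap / (1.0 + np.exp(-k * (t - m)))


t = np.arange(0, 201)
# cap: saturation level, k: growth rate, m: midpoint (illustrative values)
curve = logistic_trend(t, cap=100, k=0.05, m=100)
print(curve[[0, 100, 200]].round(2))
```

The curve passes through half the capacity at the midpoint and flattens as it approaches `cap`, so a capacity set too low would clip the forecast.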


## 2. Add Seasonality Components
By default, Prophet detects yearly seasonality (for datasets with sufficient data) and weekly seasonality. However, you can manually add additional seasonal components (e.g., quarterly or custom seasonalities) if your data exhibits more complex patterns.


model = Prophet()

model.add_seasonality(name='quarterly', period=90.5, fourier_order=8)  # Adjust period and fourier_order as needed

model.fit(df)



## 3. Use Additional Regressors
Prophet allows for adding external regressors that may influence your forecast. For example, adding information on holidays, special events, or external factors like temperature, promotions, or economic indicators can improve accuracy.


df['regressor'] = external_data  # external_data is an array or series of additional data

model = Prophet()

model.add_regressor('regressor')

model.fit(df)

## 4. Incorporate Holiday Effects
Prophet has a built-in feature to account for holiday effects, which can be especially useful if the time series is affected by significant events or holidays. Prophet supports predefined holiday lists for various countries.


from prophet.make_holidays import make_holidays_df

model = Prophet(holidays=make_holidays_df(year_list=[2020, 2021, 2022], country='US'))

model.fit(df)


## 5. Adjust Changepoints and Flexibility
Changepoints allow the model to detect shifts in trend. Prophet automatically selects changepoints, but sometimes fine-tuning them helps. You can increase n_changepoints or adjust changepoint_prior_scale to allow more or less flexibility in trend shifts.

model = Prophet(changepoint_prior_scale=0.05)  # 0.05 is the default; higher values increase trend flexibility

model.fit(df)


## 6. Increase Fourier Order for Seasonalities
Increasing the fourier_order for seasonalities allows the model to capture more complex seasonal patterns, but it also increases the risk of overfitting. Experiment with higher values if your seasonality has more intricate patterns.


model = Prophet(yearly_seasonality=False)

model.add_seasonality(name='yearly', period=365.25, fourier_order=20)  # Custom yearly seasonality with higher complexity

model.fit(df)
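The effect of fourier_order is easier to see by building the seasonal features directly. The sketch below generates sine and cosine pairs for harmonics 1 to N of a given period, which is the general form of a Fourier-series seasonality. The helper name `fourier_features` is hypothetical, and this only illustrates the idea rather than reproducing Prophet's internal code.

```python
import numpy as np


def fourier_features(t_days, period, fourier_order):
    """Hypothetical helper: sin/cos pairs for harmonics 1..fourier_order."""
    t = np.asarray(t_days, dtype=float)
    columns = []
    for n in range(1, fourier_order + 1):
        angle = 2.0 * np.pi * n * t / period
        columns.append(np.sin(angle))
        columns.append(np.cos(angle))
    return np.column_stack(columns)


t_days = np.arange(730)  # two years of daily time stamps
low = fourier_features(t_days, period=365.25, fourier_order=3)
high = fourier_features(t_days, period=365.25, fourier_order=20)
print(low.shape, high.shape)
```

Each extra order adds two regression columns. A higher order can therefore follow sharper seasonal shapes, but it also gives the model more freedom to fit noise.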


## 7. Tune Seasonality and Trend Flexibility Using Cross-Validation
Prophet has a built-in cross-validation function that can help you assess the model’s performance with different tuning parameters and identify the best configuration.

from prophet.diagnostics import cross_validation, performance_metrics

df_cv = cross_validation(model, initial='730 days', period='180 days', horizon='365 days')

df_p = performance_metrics(df_cv)

print(df_p.head())


## 8. Increase Forecast Accuracy with Hyperparameter Tuning
Systematic tuning of changepoint_prior_scale, seasonality_prior_scale, holidays_prior_scale, and fourier_order for seasonalities can help find the best model. You could automate this using a search method (e.g., grid search) or use libraries like optuna for optimization.
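A grid search over these parameters can be sketched as follows. The helper names (`grid_search`, `prophet_cv_rmse`, `toy_score`) and the grid values are assumptions for illustration. The Prophet scoring function is only defined, not called, so the sketch runs without fitting any model. A stand-in scorer is used to exercise the search loop.

```python
from itertools import product


def grid_search(param_grid, score_fn):
    """Return (best_params, best_score) for the lowest score over the grid."""
    names = list(param_grid)
    best_params, best_score = None, float("inf")
    for values in product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        score = score_fn(params)
        if score < best_score:
            best_params, best_score = params, score
    return best_params, best_score


param_grid = {
    "changepoint_prior_scale": [0.01, 0.05, 0.5],
    "seasonality_prior_scale": [0.1, 1.0, 10.0],
}


def prophet_cv_rmse(params, train_df):
    """Assumed scoring function: cross-validated RMSE of a Prophet model."""
    from prophet import Prophet
    from prophet.diagnostics import cross_validation, performance_metrics

    model = Prophet(**params).fit(train_df)
    df_cv = cross_validation(model, initial='730 days', period='180 days', horizon='365 days')
    return performance_metrics(df_cv, rolling_window=1)['rmse'].values[0]


def toy_score(params):
    """Stand-in scorer so the sketch runs without fitting any model."""
    return (abs(params["changepoint_prior_scale"] - 0.05)
            + abs(params["seasonality_prior_scale"] - 1.0))


best_params, best_score = grid_search(param_grid, toy_score)
print(best_params, best_score)
```

To tune a real model, the stand-in would be replaced by something like `grid_search(param_grid, lambda p: prophet_cv_rmse(p, train_df))`. That call fits one model per grid combination, so it can be slow.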

## 9. Ensure Quality of Data
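Outliers and gaps in the data can negatively impact model performance. Prophet can be fitted with missing values in the history, but extreme outliers can distort the trend and the uncertainty intervals, so inspect missing values, outliers, and irregular frequency before modeling.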
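One possible way to prepare the data is sketched below with pandas: put the series on a regular daily index so gaps become explicit, then mask extreme values as missing using an interquartile-range rule. The helper name `prepare_daily_series`, the threshold `k=1.5`, and the toy data are assumptions for illustration. The right outlier rule depends on the data.

```python
import numpy as np
import pandas as pd


def prepare_daily_series(df, k=1.5):
    """Hypothetical helper: regular daily index, outliers masked as NaN."""
    out = df.set_index("ds").asfreq("D")  # missing dates become NaN rows
    q1, q3 = out["y"].quantile([0.25, 0.75])
    iqr = q3 - q1
    is_outlier = (out["y"] < q1 - k * iqr) | (out["y"] > q3 + k * iqr)
    out.loc[is_outlier, "y"] = np.nan
    return out.reset_index()


# Toy data: 2013-01-03 is missing and 500 is an obvious outlier
raw = pd.DataFrame({
    "ds": pd.to_datetime(["2013-01-01", "2013-01-02", "2013-01-04",
                          "2013-01-05", "2013-01-06", "2013-01-07"]),
    "y": [13.0, 11.0, 14.0, 500.0, 10.0, 12.0],
})
clean = prepare_daily_series(raw)
print(clean)
```

The result keeps the `ds` and `y` column names that Prophet expects, with the gap and the outlier both represented as missing values.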

By carefully tuning the Prophet model using these strategies, you can often capture more of the inherent patterns in your data and improve forecast accuracy.


## 1. Tune the Growth Model
cap=50

In [34]:
# Create a DataFrame with columns 'ds' and 'y' for Prophet
df = pd.DataFrame({"ds": df_1.index, "y":df_1["store1_item_1"]})
df['cap'] = 50
# Split the data into train and test sets
train_size = int(len(df) * 0.8)
train_df = df[:train_size]
test_df = df[train_size:]

# Initialize and fit the Prophet model
model = Prophet(growth='logistic')
model.fit(train_df)

# Make future predictions
future = model.make_future_dataframe(periods=len(test_df), freq="D")
future['cap'] = 50
forecast = model.predict(future)

# Calculate the MSE for the test set
test_forecast = forecast[-len(test_df):]
error1 = mean_squared_error(test_df['y'], test_forecast['yhat'])
error2=mean_absolute_percentage_error(test_df['y'], test_forecast['yhat'])
print(f"Test MSE: {error1}")
print(f"Test MAPE: {error2}")
# Plot observed and predicted values
fig = go.Figure()

# Plot observed (train + test) values
fig.add_trace(go.Scatter(x=df['ds'], y=df['y'], mode='lines', name='Observed'))

# Plot predicted values
fig.add_trace(go.Scatter(x=test_forecast['ds'], y=test_forecast['yhat'], mode='lines', name='Predicted'))

# Update plot layout
fig.update_layout(
    title="Observed vs Predicted Time Series (Prophet)",
    xaxis_title="Date",
    yaxis_title="Value",
    template="plotly_white"
)

fig.show()

INFO:prophet:Disabling daily seasonality. Run prophet with daily_seasonality=True to override this.


Test MSE: 26.216609920805624
Test MAPE: 0.23456089990818352
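
To make the two reported numbers easier to read, here is a small sketch of how MSE and MAPE are defined. The helper functions and sample values are illustrative assumptions. Note that scikit-learn's mean_absolute_percentage_error returns a fraction rather than a percentage, so a value of about 0.2346 corresponds to roughly 23.5%.

```python
def mse(y_true, y_pred):
    """Mean squared error: average of squared differences."""
    return sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / len(y_true)


def mape(y_true, y_pred):
    """Mean absolute percentage error as a fraction (0.25 means 25%).

    Assumes y_true contains no zeros.
    """
    return sum(abs(a - b) / abs(a) for a, b in zip(y_true, y_pred)) / len(y_true)


# Made-up values for illustration only.
y_true = [10, 20, 40]
y_pred = [12, 18, 30]

print("MSE:", mse(y_true, y_pred))
print("MAPE:", mape(y_true, y_pred))
```

MSE is expressed in squared sales units and penalizes large misses heavily, while MAPE is scale-free, which is why the notebook reports both.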


## 2. Add Seasonality Components
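
Prophet represents each seasonality as a truncated Fourier series, so fourier_order controls how many sine and cosine pairs are available to describe one period. The sketch below builds those terms by hand to show what period and fourier_order mean; the function name is an assumption for this example and is not part of Prophet's API.

```python
import math


def fourier_features(t, period, fourier_order):
    """Return the 2 * fourier_order sine/cosine terms for time t (in days)."""
    feats = []
    for n in range(1, fourier_order + 1):
        angle = 2.0 * math.pi * n * t / period
        feats.append(math.sin(angle))
        feats.append(math.cos(angle))
    return feats


# Same settings as the first cell in this section: period=365, fourier_order=8.
yearly = fourier_features(t=100, period=365, fourier_order=8)
print(len(yearly), "features per time step")
```

A higher fourier_order lets the seasonal curve change more quickly within one period, at the cost of a greater risk of overfitting.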

In [35]:
# Yearly seasonality: period=365, fourier_order=8, growth='logistic'
# Create a DataFrame with columns 'ds' and 'y' for Prophet
df = pd.DataFrame({"ds": df_1.index, "y":df_1["store1_item_1"]})
df['cap'] = 50
# Split the data into train and test sets
train_size = int(len(df) * 0.8)
train_df = df[:train_size]
test_df = df[train_size:]

# Initialize and fit the Prophet model
model = Prophet(growth='logistic')

# Custom yearly seasonality (replaces Prophet's built-in 'yearly' term); adjust period and fourier_order as needed
model = model.add_seasonality(name="yearly", period=365, fourier_order=8)
model.add_country_holidays("US")
model.fit(train_df)

# Make future predictions
future = model.make_future_dataframe(periods=len(test_df), freq="D")
future['cap'] = 50
forecast = model.predict(future)

# Calculate MSE and MAPE on the test set
test_forecast = forecast[-len(test_df):]
error1 = mean_squared_error(test_df['y'], test_forecast['yhat'])
error2 = mean_absolute_percentage_error(test_df['y'], test_forecast['yhat'])
print(f"Test MSE: {error1}")
print(f"Test MAPE: {error2}")

# Plot observed and predicted values
fig = go.Figure()

# Plot observed (train + test) values
fig.add_trace(go.Scatter(x=df['ds'], y=df['y'], mode='lines', name='Observed'))

# Plot predicted values
fig.add_trace(go.Scatter(x=test_forecast['ds'], y=test_forecast['yhat'], mode='lines', name='Predicted'))

# Update plot layout
fig.update_layout(
    title="Observed vs Predicted Time Series (Prophet)",
    xaxis_title="Date",
    yaxis_title="Value",
    template="plotly_white"
)

fig.show()

INFO:prophet:Found custom seasonality named 'yearly', disabling built-in 'yearly' seasonality.
INFO:prophet:Disabling daily seasonality. Run prophet with daily_seasonality=True to override this.


Test MSE: 26.3737800265772
Test MAPE: 0.23646009680222918


In [36]:
# Quarterly seasonality: period=93, fourier_order=8, growth='logistic'
# Create a DataFrame with columns 'ds' and 'y' for Prophet
df = pd.DataFrame({"ds": df_1.index, "y":df_1["store1_item_1"]})
df['cap'] = 50
# Split the data into train and test sets
train_size = int(len(df) * 0.8)
train_df = df[:train_size]
test_df = df[train_size:]

# Initialize and fit the Prophet model
model = Prophet(growth='logistic')

# Custom quarterly seasonality (period=93 days; a calendar quarter averages about 91.3 days); adjust as needed
model = model.add_seasonality(name="quarterly", period=93, fourier_order=8)
model.add_country_holidays("US")
model.fit(train_df)

# Make future predictions
future = model.make_future_dataframe(periods=len(test_df), freq="D")
future['cap'] = 50
forecast = model.predict(future)

# Calculate MSE and MAPE on the test set
test_forecast = forecast[-len(test_df):]
error1 = mean_squared_error(test_df['y'], test_forecast['yhat'])
error2 = mean_absolute_percentage_error(test_df['y'], test_forecast['yhat'])
print(f"Test MSE: {error1}")
print(f"Test MAPE: {error2}")

# Plot observed and predicted values
fig = go.Figure()

# Plot observed (train + test) values
fig.add_trace(go.Scatter(x=df['ds'], y=df['y'], mode='lines', name='Observed'))

# Plot predicted values
fig.add_trace(go.Scatter(x=test_forecast['ds'], y=test_forecast['yhat'], mode='lines', name='Predicted'))

# Update plot layout
fig.update_layout(
    title="Observed vs Predicted Time Series (Prophet)",
    xaxis_title="Date",
    yaxis_title="Value",
    template="plotly_white"
)

fig.show()

INFO:prophet:Disabling daily seasonality. Run prophet with daily_seasonality=True to override this.


Test MSE: 27.215334960545693
Test MAPE: 0.23779796583939983


In [37]:
# Monthly seasonality: period=31, fourier_order=5, growth='logistic'
# Create a DataFrame with columns 'ds' and 'y' for Prophet
df = pd.DataFrame({"ds": df_1.index, "y":df_1["store1_item_1"]})
df['cap'] = 50
# Split the data into train and test sets
train_size = int(len(df) * 0.8)
train_df = df[:train_size]
test_df = df[train_size:]

# Initialize and fit the Prophet model
model = Prophet(growth='logistic')

# Custom monthly seasonality (period=31 days; an average month is about 30.4 days); adjust as needed
model = model.add_seasonality(name="monthly", period=31, fourier_order=5)
model.add_country_holidays("US")
model.fit(train_df)

# Make future predictions
future = model.make_future_dataframe(periods=len(test_df), freq="D")
future['cap'] = 50
forecast = model.predict(future)

# Calculate MSE and MAPE on the test set
test_forecast = forecast[-len(test_df):]
error1 = mean_squared_error(test_df['y'], test_forecast['yhat'])
error2 = mean_absolute_percentage_error(test_df['y'], test_forecast['yhat'])
print(f"Test MSE: {error1}")
print(f"Test MAPE: {error2}")

# Plot observed and predicted values
fig = go.Figure()

# Plot observed (train + test) values
fig.add_trace(go.Scatter(x=df['ds'], y=df['y'], mode='lines', name='Observed'))

# Plot predicted values
fig.add_trace(go.Scatter(x=test_forecast['ds'], y=test_forecast['yhat'], mode='lines', name='Predicted'))

# Update plot layout
fig.update_layout(
    title="Observed vs Predicted Time Series (Prophet)",
    xaxis_title="Date",
    yaxis_title="Value",
    template="plotly_white"
)

fig.show()

INFO:prophet:Disabling daily seasonality. Run prophet with daily_seasonality=True to override this.


Test MSE: 26.407634327041976
Test MAPE: 0.23506649009000857


In [38]:
# Weekly seasonality: period=7, fourier_order=5, growth='logistic'
# Create a DataFrame with columns 'ds' and 'y' for Prophet
df = pd.DataFrame({"ds": df_1.index, "y":df_1["store1_item_1"]})
df['cap'] = 50
# Split the data into train and test sets
train_size = int(len(df) * 0.8)
train_df = df[:train_size]
test_df = df[train_size:]

# Initialize and fit the Prophet model
model = Prophet(growth='logistic')

# Custom weekly seasonality (replaces Prophet's built-in 'weekly' term); adjust period and fourier_order as needed
model = model.add_seasonality(name="weekly", period=7, fourier_order=5)
model.add_country_holidays("US")
model.fit(train_df)

# Make future predictions
future = model.make_future_dataframe(periods=len(test_df), freq="D")
future['cap'] = 50
forecast = model.predict(future)

# Calculate MSE and MAPE on the test set
test_forecast = forecast[-len(test_df):]
error1 = mean_squared_error(test_df['y'], test_forecast['yhat'])
error2 = mean_absolute_percentage_error(test_df['y'], test_forecast['yhat'])
print(f"Test MSE: {error1}")
print(f"Test MAPE: {error2}")

# Plot observed and predicted values
fig = go.Figure()

# Plot observed (train + test) values
fig.add_trace(go.Scatter(x=df['ds'], y=df['y'], mode='lines', name='Observed'))

# Plot predicted values
fig.add_trace(go.Scatter(x=test_forecast['ds'], y=test_forecast['yhat'], mode='lines', name='Predicted'))

# Update plot layout
fig.update_layout(
    title="Observed vs Predicted Time Series (Prophet)",
    xaxis_title="Date",
    yaxis_title="Value",
    template="plotly_white"
)

fig.show()

INFO:prophet:Found custom seasonality named 'weekly', disabling built-in 'weekly' seasonality.
INFO:prophet:Disabling daily seasonality. Run prophet with daily_seasonality=True to override this.


Test MSE: 26.328979063345596
Test MAPE: 0.23532009810426166


In [39]:
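# Multiplicative seasonality with weekly seasonality only, growth='logistic'
# This configuration gives the highest test error in this section. Yearly seasonality
# is also disabled here, so the gap cannot be attributed to the multiplicative mode alone.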

# Create a DataFrame with columns 'ds' and 'y' for Prophet
df = pd.DataFrame({"ds": df_1.index, "y":df_1["store1_item_1"]})
## Creating model parameters
model_param ={
    "daily_seasonality": False,
    "weekly_seasonality":True,
    "yearly_seasonality":False,
    "seasonality_mode": "multiplicative",
    "growth": "logistic"
}
# Training cap: slightly above the observed maximum
df['cap'] = df["y"].max() + df["y"].std() * 0.05
# Split the data into train and test sets
train_size = int(len(df) * 0.8)
train_df = df[:train_size]
test_df = df[train_size:]

# Initialize and fit the Prophet model
model = Prophet(**model_param)
model.fit(train_df)

# Make future predictions
future = model.make_future_dataframe(periods=len(test_df), freq="D")
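# Note: the training cap is max(y) + 0.05 * std(y), while the forecast cap here is fixed at 50;
# keep the two consistent when reusing this cell
future['cap'] = 50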
forecast = model.predict(future)

# Calculate MSE and MAPE on the test set
test_forecast = forecast[-len(test_df):]
error1 = mean_squared_error(test_df['y'], test_forecast['yhat'])
error2 = mean_absolute_percentage_error(test_df['y'], test_forecast['yhat'])
print(f"Test MSE: {error1}")
print(f"Test MAPE: {error2}")

# Plot observed and predicted values
fig = go.Figure()

# Plot observed (train + test) values
fig.add_trace(go.Scatter(x=df['ds'], y=df['y'], mode='lines', name='Observed'))

# Plot predicted values
fig.add_trace(go.Scatter(x=test_forecast['ds'], y=test_forecast['yhat'], mode='lines', name='Predicted'))

# Update plot layout
fig.update_layout(
    title="Observed vs Predicted Time Series (Prophet)",
    xaxis_title="Date",
    yaxis_title="Value",
    template="plotly_white"
)

fig.show()



Test MSE: 41.81820465237015
Test MAPE: 0.2988278253007243
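
The difference between the two seasonality modes can be sketched as below. In additive mode the seasonal effect is a fixed number of units added to the trend; in multiplicative mode it is a fraction of the trend, so its size grows with the level of the series. The function and the numbers are illustrative assumptions, not values taken from the fitted model.

```python
def combine(trend, seasonal, mode):
    """Combine a trend value with a seasonal effect in the given mode."""
    if mode == "additive":
        return trend + seasonal            # seasonal is in sales units
    if mode == "multiplicative":
        return trend * (1.0 + seasonal)    # seasonal is a fraction of the trend
    raise ValueError(f"unknown mode: {mode}")


# Made-up trend levels and seasonal effects for illustration.
for trend in (10, 40):
    additive = combine(trend, 3, "additive")
    multiplicative = combine(trend, 0.3, "multiplicative")
    print(trend, additive, round(multiplicative, 2))
```

When the seasonal swings of a series stay roughly constant in absolute size, the additive form is usually the more natural fit.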


## **Conclusion**
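Even though the time series appears stationary and the ACF and PACF plots suggest an AR(7) model, FB Prophet outperforms the AR models on this series. Among the Prophet configurations tried above, the plain logistic-growth model gave the lowest test error (MSE ≈ 26.22, MAPE ≈ 0.235). Adding a custom yearly, quarterly, monthly, or weekly seasonality together with US holidays left the error essentially unchanged (MSE between 26.33 and 27.22), and the multiplicative configuration without yearly seasonality was clearly worse (MSE ≈ 41.82, MAPE ≈ 0.299). Prophet's flexibility in capturing seasonality, holidays, and trend changes makes it a powerful alternative to traditional AR models for time series forecasting.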