# Loading Libraries

What is SARIMAX? Seasonal Auto-Regressive Integrated Moving Average with eXogenous factors, or SARIMAX, is an extension of the ARIMA class of models.

SARIMAX is suited to data sets with seasonal cycles. It differs from ARIMA by adding seasonal terms and optional exogenous regressors; plain ARIMA has no mechanism for modeling a repeating seasonal pattern.
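To make the seasonal part concrete, the sketch below shows seasonal differencing, the operation behind the seasonal "integrated" term. It uses synthetic data: the helper name `seasonal_difference`, the 7-day pattern, and the base level of 7.0 are all assumptions made for illustration and are not taken from the competition data.

```python
import numpy as np

def seasonal_difference(y, period):
    """Return y[t] - y[t - period] for every t >= period."""
    y = np.asarray(y, dtype=float)
    return y[period:] - y[:-period]

# Synthetic daily series: a constant level plus a repeating 7-day pattern.
weekly_pattern = np.array([0.0, 0.2, 0.1, -0.1, -0.3, 0.6, 0.9])
y = 7.0 + np.tile(weekly_pattern, 8)  # 56 days

# Differencing at lag 7 cancels a pattern that repeats exactly every 7 days.
diffed = seasonal_difference(y, period=7)
print(diffed.shape, np.allclose(diffed, 0.0))
```

Real series are noisier, so the differenced values would scatter around zero rather than vanish.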

Prophet is a forecasting procedure implemented in R and Python. It is fast and produces fully automated forecasts that data scientists and analysts can then tune by hand.

In [None]:
import numpy as np
import pandas as pd

import plotly.express as px

from prophet import Prophet

import statsmodels.api as sm
from statsmodels.tsa.statespace.sarimax import SARIMAX

# Reading Files

In [None]:
# Get train
train_df = pd.read_csv('/kaggle/input/kaggle-pog-series-s01e04/train.csv')
train_df['date'] = pd.to_datetime(train_df['date'])

# Get test
test_df = pd.read_csv('/kaggle/input/kaggle-pog-series-s01e04/test.csv')
test_df['date'] = pd.to_datetime(test_df['date'])

# Get submission
submission_df = pd.read_csv('/kaggle/input/kaggle-pog-series-s01e04/sample_submission.csv')
submission_df['date'] = pd.to_datetime(submission_df['date'])
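As an alternative to converting the `date` column after loading, `pd.read_csv` can parse it during the read through `parse_dates`. The sketch below uses a small in-memory sample (the first two rows shown by `train_df.head()`) instead of the Kaggle file, so `csv_text` and `sample_df` are illustrative names only.

```python
import io
import pandas as pd

# In-memory stand-in for train.csv, using two rows from the notebook's head() output.
csv_text = "date,sleep_hours\n2015-02-19,6.4\n2015-02-20,7.583333\n"

# parse_dates converts the listed columns to datetime64 while reading.
sample_df = pd.read_csv(io.StringIO(csv_text), parse_dates=["date"])
print(sample_df.dtypes)
```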

In [None]:
train_df.head()

Unnamed: 0,date,sleep_hours
0,2015-02-19,6.4
1,2015-02-20,7.583333
2,2015-02-21,6.35
3,2015-02-22,6.5
4,2015-02-23,8.916667


In [None]:
test_df.head()

Unnamed: 0,date,sleep_hours
0,2022-01-01,1
1,2022-01-02,1
2,2022-01-03,1
3,2022-01-04,1
4,2022-01-05,1


# Data Transformation

In [None]:
fig = px.line(x=train_df.date, y=train_df.sleep_hours)
fig.show()

In [None]:
# Process outliers: rescale sleep_hours in this date range by 1/1.94
train_mod = train_df.copy()
outlier_mask = (train_mod["date"] >= "2017-09-27") & (train_mod["date"] <= "2018-06-12")
# Assign through a single .loc call; chained indexing triggers SettingWithCopyWarning
train_mod.loc[outlier_mask, 'sleep_hours'] = train_mod.loc[outlier_mask, 'sleep_hours'] / 1.94

In [None]:
fig = px.line(x=train_mod.date, y=train_mod.sleep_hours)
fig.show()

In [None]:
# Remove outliers
# Calculate the IQR
Q1 = train_mod['sleep_hours'].quantile(0.25)
Q3 = train_mod['sleep_hours'].quantile(0.75)
IQR = Q3 - Q1

# Define the lower and upper bounds
lower_bound = Q1 - 1.5 * IQR
upper_bound = Q3 + 1.5 * IQR

# Remove outliers
train_mod = train_mod[(train_mod['sleep_hours'] >= lower_bound) & (train_mod['sleep_hours'] <= upper_bound)]
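The IQR rule used in this cell can be wrapped in a small helper and checked on toy data. This is a minimal sketch: `iqr_bounds`, the multiplier argument `k`, and the `toy` series are hypothetical names and values introduced here, not part of the original notebook.

```python
import pandas as pd

def iqr_bounds(values, k=1.5):
    """Return (lower, upper) fences at Q1 - k*IQR and Q3 + k*IQR."""
    q1 = values.quantile(0.25)
    q3 = values.quantile(0.75)
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

# Toy series with one obviously extreme value.
toy = pd.Series([6.0, 6.5, 7.0, 7.5, 8.0, 15.0])
low, high = iqr_bounds(toy)

# between() is inclusive on both ends, matching the >= / <= filter above.
kept = toy[toy.between(low, high)]
print(low, high, len(kept))
```

Note that filtering rows this way leaves gaps in the date sequence, which matters for models that expect a regular time index.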

In [None]:
fig = px.line(x=train_mod.date, y=train_mod.sleep_hours)
fig.show()

# Modeling 

In [None]:
# Predict - Quantile (constant baseline: the 0.54 quantile of the training sleep_hours)
submission_df['quantile'] = train_mod['sleep_hours'].dropna().quantile(0.54)

##################################################################################################

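# Predict - SARIMAX
# seasonal_order=(0, 0, 0, 12) contains no seasonal AR, differencing, or MA terms,
# so this is effectively a non-seasonal ARIMA(0, 1, 1).
df = pd.concat([train_mod, test_df])
arima_model = SARIMAX(train_mod['sleep_hours'], order=(0, 1, 1), seasonal_order=(0, 0, 0, 12))
arima_result = arima_model.fit()
# Predict one value per test row, continuing from the end of the training series
arima_pred = arima_result.predict(start=len(train_mod), end=len(df)-1, typ="levels")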

submission_df['sarimax'] = arima_pred.values
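Because the seasonal order above is all zeros, the fitted model behaves like ARIMA(0, 1, 1), whose point forecasts follow the simple exponential smoothing recursion with smoothing weight 1 + theta and are the same for every horizon. The sketch below illustrates that recursion in plain NumPy. It is not the statsmodels implementation: `ima11_forecast`, the initialization at the first observation, and the value `theta=-0.8` are assumptions for illustration, so the numbers will not match the fitted model exactly.

```python
import numpy as np

def ima11_forecast(y, theta, steps):
    """Flat forecast of y[t] = y[t-1] + e[t] + theta * e[t-1] (no constant)."""
    y = np.asarray(y, dtype=float)
    alpha = 1.0 + theta   # equivalent exponential-smoothing weight
    level = y[0]          # assumed initialization of the one-step forecast
    for obs in y:
        # New forecast = weighted average of the latest observation and the old forecast.
        level = alpha * obs + (1.0 - alpha) * level
    # Future shocks have zero expectation, so every horizon gets the same value.
    return np.full(steps, level)

history = [6.4, 7.583333, 6.35, 6.5, 8.916667]  # first training rows shown above
forecast = ima11_forecast(history, theta=-0.8, steps=3)
print(forecast)
```

In practice this means the `sarimax` column contributes a single constant level to the blend rather than a time-varying forecast.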

##################################################################################################

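# Predict - Prophet
# Note: daily_seasonality models within-day cycles, which one-row-per-day data cannot resolve
m = Prophet(growth="linear",
            yearly_seasonality=True,
            weekly_seasonality=True,
            daily_seasonality=True)
m.add_country_holidays(country_name='US')

# Prophet expects the columns to be named "ds" (date) and "y" (target)
train_mod_prophet = train_mod.copy()
train_mod_prophet = train_mod_prophet.rename(columns={"date": "ds", "sleep_hours": "y"})
m.fit(train_mod_prophet)

submission_df_prophet = submission_df.copy()
submission_df_prophet = submission_df_prophet.rename(columns={"date": "ds"})
forecast = m.predict(submission_df_prophet[["ds"]])
# Only the trend component is kept here; forecast['yhat'] would also include
# the seasonal and holiday effects
submission_df['prophet'] = forecast['trend'].copy()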

##################################################################################################


An unsupported index was provided and will be ignored when e.g. forecasting.


 This problem is unconstrained.


RUNNING THE L-BFGS-B CODE

           * * *

Machine precision = 2.220D-16
 N =            2     M =           10

At X0         0 variables are exactly at the bounds

At iterate    0    f=  1.19122D+00    |proj g|=  1.05044D-01

At iterate    5    f=  1.14017D+00    |proj g|=  1.49685D-02

At iterate   10    f=  1.13603D+00    |proj g|=  1.79572D-03

           * * *

Tit   = total number of iterations
Tnf   = total number of function evaluations
Tnint = total number of segments explored during Cauchy searches
Skip  = number of BFGS updates skipped
Nact  = number of active bounds at final generalized Cauchy point
Projg = norm of the final projected gradient
F     = final function value

           * * *

   N    Tit     Tnf  Tnint  Skip  Nact     Projg        F
    2     12     15      1     0     0   6.533D-07   1.136D+00
  F =   1.1360306522627714     

CONVERGENCE: NORM_OF_PROJECTED_GRADIENT_<=_PGTOL            



No supported index is available. Prediction results will be given with an integer index beginning at `start`.

08:40:06 - cmdstanpy - INFO - Chain [1] start processing
08:40:06 - cmdstanpy - INFO - Chain [1] done processing


# Submission

In [None]:
submission_df['sleep_hours'] = submission_df[['prophet', 'quantile', 'sarimax']].mean(axis=1)

sub = submission_df[['date', 'sleep_hours']]

# Create submission
sub.to_csv('submission.csv', index = False)
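The submission is an equal-weight average of the three model columns. The sketch below reproduces that blend on made-up numbers and shows a weighted variant next to it. The prediction values and the `weights` are hypothetical and only serve to illustrate the arithmetic; the notebook itself uses the plain mean.

```python
import numpy as np
import pandas as pd

# Hypothetical predictions for two test dates.
preds = pd.DataFrame({
    "prophet":  [6.6, 6.7],
    "quantile": [6.7, 6.7],
    "sarimax":  [6.5, 6.5],
})

# Equal-weight blend, as in the submission cell.
equal_blend = preds.mean(axis=1)

# Weighted blend: multiply each column by its weight, then normalize.
weights = pd.Series({"prophet": 0.5, "quantile": 0.25, "sarimax": 0.25})  # assumed weights
weighted_blend = preds.mul(weights, axis=1).sum(axis=1) / weights.sum()

print(equal_blend.round(4).tolist(), weighted_blend.round(4).tolist())
```

Choosing weights other than the equal mean would normally be based on validation error for each model.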