I will first collect Google stock price data from Yahoo Finance using the yfinance library:

In [None]:
import pandas as pd
import yfinance as yf
from datetime import date, timedelta

In [None]:
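# Download roughly one year of daily GOOG prices
end_date = date.today().strftime('%Y-%m-%d')
start_date = (date.today() - timedelta(days=365)).strftime('%Y-%m-%d')

data = yf.download('GOOG',
                   start=start_date,
                   end=end_date,
                   auto_adjust=False,  # keep the separate 'Adj Close' column
                   progress=False)
# Some yfinance versions return (field, ticker) MultiIndex columns; keep the field level
if isinstance(data.columns, pd.MultiIndex):
    data.columns = data.columns.get_level_values(0)
data['Date'] = data.index
data = data[['Date', 'Open', 'High', 'Low', 'Close', 'Adj Close', 'Volume']]
data.reset_index(drop=True, inplace=True)
display(data.head())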

I only need the Date and Close columns for the rest of the task, so let’s keep just those two:

In [None]:
data = data[['Date', 'Close']]
display(data.head())

Now let’s visualize the close prices of Google before moving forward:

In [None]:
import matplotlib.pyplot as plt

In [None]:
plt.style.use('Solarize_Light2')
plt.figure(figsize=(15, 10))
plt.plot(data['Date'], data['Close'], label='Closing Prices')
plt.xlabel('Date')
plt.ylabel('Closing Price ($)')
plt.title('Google Stock Prices')
plt.legend()
plt.show()


Before using the ARIMA model, we have to check whether the data is stationary and whether it has a seasonal pattern. The plot of closing prices above shows a level that changes over time, which suggests that the series is not stationary. To look at the structure more closely, we can use the **seasonal decomposition** method, which splits the time series into trend, seasonal, and residual components:

In [None]:
from statsmodels.tsa.seasonal import seasonal_decompose

In [None]:
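# Multiplicative decomposition with an assumed period of 30 observations
result = seasonal_decompose(data['Close'],
                            model='multiplicative', period=30)
fig = result.plot()  # result.plot() creates its own figure
fig.set_size_inches(15, 10)
plt.show()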

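In the decomposition, the trend panel shows how the level of the series changes over time, and the seasonal panel shows the pattern that repeats every 30 observations (the period we passed in). A trend that is not flat means the mean of the series changes over time, so the series is not stationary. If the decomposition fits well, the residual panel should look like random noise with a roughly constant mean and variance; a repeating pattern left in the residuals suggests seasonality that the chosen period did not capture. Keep in mind that seasonal_decompose always returns a seasonal component for the period it is given, so the plot alone does not prove that the data is seasonal.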

The data is not stationary, and the decomposition shows a seasonal component, so I will use the Seasonal ARIMA (SARIMA) model for time series forecasting on this data. Before using the SARIMA model, I will fit a plain ARIMA model for comparison.

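To use ARIMA or SARIMA, I need to choose the p, d, and q values: p is the order of the autoregressive part, d is the number of times the series is differenced to make it stationary, and q is the order of the moving average part. In this walkthrough I pick p from the autocorrelation plot of the Close column and q from the partial autocorrelation plot. Note that the usual Box-Jenkins convention is the reverse (the partial autocorrelation plot suggests p and the autocorrelation plot suggests q, both read on the differenced series), so treat the values chosen here as rough starting points. The value of d is usually 0 or 1: if the data is already stationary, use 0, and if it is not, use 1. As my data is not stationary, I use 1 as the d value.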

Now here’s how to find the value of p:

In [None]:
plt.figure(figsize=(10, 6))
pd.plotting.autocorrelation_plot(data['Close'], color='blue', linestyle='--')
plt.xlabel('Lag (days)')
plt.ylabel('Autocorrelation')
plt.title('Autocorrelation Plot')
plt.show()
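The horizontal boundary lines in a plot like this are confidence bands for white noise, at roughly ±1.96/√n for the 95% level. As a rough sketch of what the plot computes, the same quantities can be calculated directly with NumPy. `sample_acf` and `significant_lags` are hypothetical helper names, and the short alternating series is made-up illustration data:

```python
import numpy as np

def sample_acf(values, nlags):
    """Sample autocorrelation for lags 0..nlags."""
    x = np.asarray(values, dtype=float)
    x = x - x.mean()
    denom = np.sum(x ** 2)
    lagged = [np.sum(x[k:] * x[:-k]) / denom for k in range(1, nlags + 1)]
    return np.array([1.0] + lagged)

def significant_lags(values, nlags, z=1.96):
    """Lags whose |autocorrelation| exceeds the approximate 95% white-noise band."""
    acf = sample_acf(values, nlags)
    bound = z / np.sqrt(len(values))
    return [k for k in range(1, nlags + 1) if abs(acf[k]) > bound]

# Made-up series that flips sign at every step
series = [1.0, -1.0] * 50
print(sample_acf(series, 2))
print(significant_lags(series, 2))
```

On the notebook's data, `significant_lags(data['Close'], 30)` would list the lags that fall outside the band. For a trending price series, many lags are typically outside it, which is one reason the plot is usually read on the differenced series.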

In the above autocorrelation plot, the horizontal lines are the confidence boundaries, and the curve starts moving down after the 5th line of the first boundary. I use that as the value of p (not to be confused with a statistical p-value), so the value of p is 5. Now let’s find the value of q (moving average):

In [None]:
from statsmodels.graphics.tsaplots import plot_pacf
plot_pacf(data['Close'])
plt.show()  # avoids the figure being rendered twice in a notebook

In the above partial autocorrelation plot, only two points lie far away from all the others. I use that count as the value of q, so the value of q is 2. Now let’s build an ARIMA model:

In [None]:
from statsmodels.tsa.arima.model import ARIMA

p, d, q = 5, 1, 2  # orders chosen from the plots above

model = ARIMA(data['Close'], order=(p, d, q))
fitted_model = model.fit()
display(fitted_model.summary())

Now we compute the in-sample predictions of the ARIMA model:

In [None]:
predictions = fitted_model.predict()
display(predictions)
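To judge predictions like these with numbers rather than by eye, one option is to compare them with the actual closing prices using an error metric, and to compare that error with a naive baseline that predicts each day's price as the previous day's price. The sketch below assumes this kind of evaluation is wanted. `mae`, `rmse`, and `naive_forecast` are hypothetical helpers, and the prices are made-up illustration values:

```python
import numpy as np

def mae(actual, predicted):
    """Mean absolute error between two equal-length sequences."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return float(np.mean(np.abs(actual - predicted)))

def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return float(np.sqrt(np.mean((actual - predicted) ** 2)))

def naive_forecast(actual):
    """Previous-value prediction for every point after the first."""
    return np.asarray(actual, dtype=float)[:-1]

# Made-up closing prices and model predictions for closes[1:]
closes = [100.0, 102.0, 101.0, 105.0, 107.0]
model_preds = [101.0, 101.5, 104.0, 106.0]

print('model MAE:', mae(closes[1:], model_preds))
print('naive MAE:', mae(closes[1:], naive_forecast(closes)))
print('model RMSE:', rmse(closes[1:], model_preds))
```

With the notebook's objects, a comparable call would be `mae(data['Close'][1:], predictions[1:])`. The first in-sample prediction of a differenced model is usually not meaningful, so it is skipped here.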

These predictions do not account for the seasonal component of the data, because a non-seasonal ARIMA model has no terms for a repeating pattern. A SARIMA model adds seasonal terms for this purpose. So, here’s how to build a SARIMA model:

In [None]:
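import statsmodels.api as sm

# SARIMA: the same (p, d, q) is reused for the seasonal part,
# with an assumed seasonal period of 12 observations
model = sm.tsa.statespace.SARIMAX(data['Close'],
                                  order=(p, d, q),
                                  seasonal_order=(p, d, q, 12))
model = model.fit()  # `model` now holds the fitted results
display(model.summary())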

Now let’s predict the future stock prices using the SARIMA model for the next 10 trading days:

In [None]:
predictions = model.predict(start=len(data), end=len(data) + 9)  # 10 steps; both ends are inclusive
display(predictions)
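Because the index of `data` was reset, the forecast is indexed by integer position rather than by date. A minimal sketch for attaching calendar dates, assuming that Monday to Friday business days are a good enough approximation of trading days (exchange holidays are ignored), is shown below. `future_business_days` is a hypothetical helper, and the date is an arbitrary example:

```python
import pandas as pd

def future_business_days(last_date, steps):
    """Return `steps` business days (Mon-Fri) that follow `last_date`."""
    start = pd.Timestamp(last_date) + pd.offsets.BDay(1)
    return pd.bdate_range(start=start, periods=steps)

# Example: 10 business days after Friday, 5 January 2024
dates = future_business_days('2024-01-05', 10)
print(dates[0].date(), 'to', dates[-1].date())
```

With the notebook's objects, the forecast could then be relabelled with `predictions.index = future_business_days(data['Date'].iloc[-1], len(predictions))`.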

Then, we plot the predictions:

In [None]:
fig, ax = plt.subplots(figsize=(15, 10))
ax.plot(data['Close'], label='Training Data')
ax.plot(predictions, label='Predictions', color='red')
ax.set_xlabel('Observation index (trading days)')  # the index was reset, so x is not a date
ax.set_ylabel('Closing Price ($)')
ax.set_title('Closing Prices and SARIMA Forecast')
ax.legend()
plt.show()