# Time Series Analysis: Financial Data Forecasting

## Project Overview

In this project, we perform time series analysis on financial data to forecast future prices. We use Python and statistical time series models to explore the data, fit predictive models, and measure how closely their forecasts match held-out data.

The project will be divided into the following sections:

1. **Data Collection and Preprocessing**: Collect and preprocess financial data for analysis.
2. **Exploratory Data Analysis (EDA)**: Utilize EDA to uncover patterns and trends.
3. **Time Series Modeling**: Employ various time series models (ARIMA, SARIMA, Prophet) for forecasting.
4. **Model Evaluation**: Evaluate model performance and identify long-term trends.
5. **Seasonality Detection**: Detect seasonality patterns in the data.

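## Data Collection and Preprocessing

Let's start with the first step: downloading five years of daily AAPL price history with `yfinance` and keeping the closing prices.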

In [None]:
!pip install -q yfinance

In [None]:
import yfinance as yf

# Define the ticker symbol
ticker_symbol = 'AAPL'

# Get data on this ticker
ticker_data = yf.Ticker(ticker_symbol)

# Get the historical prices for this ticker (last five years, daily rows)
ticker_df = ticker_data.history(period='5y')

# Select only the close prices
close_prices = ticker_df['Close']
close_prices.head()
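
The download above covers collection but not preprocessing. The following is a minimal sketch of one possible cleanup, run on a small synthetic series rather than the real download. The helper name `preprocess_close`, the choice of a business-day frequency, and forward-filling of gaps are all assumptions made for illustration, not steps taken from this notebook.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for close_prices (hypothetical values): a tz-aware
# business-day index, similar in shape to what a price download returns.
idx = pd.date_range("2024-01-01", periods=10, freq="B", tz="America/New_York")
prices = pd.Series(np.linspace(100.0, 109.0, 10), index=idx, name="Close")
prices = prices.drop(prices.index[3])  # simulate a market holiday (missing row)
prices.iloc[5] = np.nan                # simulate a missing quote


def preprocess_close(series):
    """Return a copy with a naive business-day index and no missing values."""
    s = series.copy()
    s.index = s.index.tz_localize(None)  # drop the time zone, keep local dates
    s = s.asfreq("B")                    # regular business-day grid; gaps become NaN
    s = s.ffill()                        # carry the last known price forward
    if (s <= 0).any():
        raise ValueError("prices must be positive for a multiplicative model")
    return s


clean = preprocess_close(prices)
print(clean)
```

Forward-filling is only one option; dropping the gaps or interpolating are alternatives, and which is appropriate depends on the model fitted afterwards.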

## Exploratory Data Analysis (EDA)

In this section, we'll explore our data to uncover patterns and trends. We'll start by visualizing the closing prices over time.

In [None]:
import matplotlib.pyplot as plt

plt.figure(figsize=(10, 6))
close_prices.plot()
plt.title('AAPL Closing Prices')
plt.xlabel('Date')
plt.ylabel('Price')
plt.grid(True)
plt.show()

In [None]:
from statsmodels.tsa.seasonal import seasonal_decompose

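# Decompose the time series. The index returned by yfinance has no set frequency,
# so seasonal_decompose needs an explicit period; 252 (roughly the number of
# trading days in a year) is an assumed choice for a yearly cycle
decomposition = seasonal_decompose(close_prices, model='multiplicative', period=252)

# Plot the decomposed time series (decomposition.plot() creates its own figure)
fig = decomposition.plot()
fig.set_size_inches(10, 8)
plt.show()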

In [None]:
# Decompose the time series with a period of 5 (one trading week)
decomposition = seasonal_decompose(close_prices, model='multiplicative', period=5)

# Plot the decomposed time series (decomposition.plot() creates its own figure)
fig = decomposition.plot()
fig.set_size_inches(10, 8)
plt.show()
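
To make the multiplicative model (observed = trend × seasonal × residual) concrete, here is a minimal plain-Python sketch of the classical approach: estimate the trend with a centered moving average, divide it out, and average the ratios by position in the cycle. It is similar in spirit to what a library decomposition does but is not its implementation; the function names and the synthetic series are assumptions for illustration, and the sketch only handles an odd period.

```python
def centered_moving_average(series, window):
    """Centered moving average for an odd window; None where the window does not fit."""
    half = window // 2
    out = [None] * len(series)
    for t in range(half, len(series) - half):
        out[t] = sum(series[t - half:t + half + 1]) / window
    return out


def multiplicative_seasonal_indices(series, period):
    """Estimate one seasonal factor per position in the cycle (odd period only)."""
    trend = centered_moving_average(series, period)
    buckets = [[] for _ in range(period)]
    for t, (y, tr) in enumerate(zip(series, trend)):
        if tr is not None:
            buckets[t % period].append(y / tr)  # detrend by division
    raw = [sum(b) / len(b) for b in buckets]
    mean_raw = sum(raw) / period
    return [r / mean_raw for r in raw]          # normalize so the factors average to 1


# Synthetic series (hypothetical): slow linear trend times a 5-step seasonal pattern.
true_factors = [0.98, 1.01, 1.02, 1.00, 0.99]
series = [(100 + 0.1 * t) * true_factors[t % 5] for t in range(100)]

estimated = multiplicative_seasonal_indices(series, period=5)
print([round(f, 3) for f in estimated])
```

On this noise-free series the estimated factors land very close to the true ones. On real prices the residual component absorbs whatever the trend and seasonal terms do not explain.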

## Time Series Modeling

In this section, we'll build time series forecasting models to predict future closing prices. We'll start with an ARIMA model.

ARIMA, which stands for AutoRegressive Integrated Moving Average, is a class of models that is widely used for time series forecasting. ARIMA models take into account three aspects of the data:

- **AR (Autoregression):** The dependency between an observation and a number of lagged observations.
- **I (Integrated):** The use of differencing of raw observations to make the time series stationary.
- **MA (Moving Average):** The dependency between an observation and the residual errors of a moving average model applied to lagged observations, that is, a weighted sum of past forecast errors.
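
The three components above can be sketched in plain Python. This is an illustrative simulation, not a fitting procedure: the coefficients `phi` and `theta`, the seed, and the function names are assumptions chosen for the example.

```python
import random


def difference(series, lag=1):
    """The 'I' step: return y[t] - y[t - lag] for each valid t."""
    return [series[t] - series[t - lag] for t in range(lag, len(series))]


def simulate_arma11(n, phi=0.6, theta=0.3, seed=0):
    """Simulate y[t] = phi * y[t-1] + e[t] + theta * e[t-1] with Gaussian noise."""
    rng = random.Random(seed)
    y, prev_y, prev_e = [], 0.0, 0.0
    for _ in range(n):
        e = rng.gauss(0.0, 1.0)
        value = phi * prev_y + e + theta * prev_e  # AR term + noise + MA term
        y.append(value)
        prev_y, prev_e = value, e
    return y


def integrate(diffs, start=0.0):
    """Cumulatively sum differences: the inverse of first differencing."""
    levels, total = [], start
    for d in diffs:
        total += d
        levels.append(total)
    return levels


# An ARIMA(1, 1, 1)-style path: ARMA(1, 1) increments, summed once (d = 1).
increments = simulate_arma11(200)
levels = integrate(increments, start=100.0)

# Differencing the levels once recovers the ARMA increments (up to rounding).
recovered = difference(levels)
print(len(levels), len(recovered))
```

The point of the round trip is that a non-stationary level series can hide a stationary series of increments, which is what the AR and MA terms are then fitted to.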

Before we can fit an ARIMA model, we need to determine the order of differencing (d), the number of autoregressive terms (p), and the number of lagged forecast errors in the prediction equation (q). We'll use the `pmdarima` library's `auto_arima` function to automatically select the best parameters based on the Akaike Information Criterion (AIC).
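
For reference, the AIC of a fitted model is $\mathrm{AIC} = 2k - 2\ln\hat{L}$, where $k$ is the number of estimated parameters and $\hat{L}$ is the maximized likelihood; lower values are preferred. The sketch below shows the arithmetic with made-up log-likelihoods for two hypothetical candidate models. The numbers are not results from this dataset.

```python
def aic(log_likelihood, n_params):
    """Akaike Information Criterion: 2k - 2 * ln(L)."""
    return 2 * n_params - 2 * log_likelihood


# Hypothetical candidates: (label, maximized log-likelihood, parameter count).
candidates = [
    ("small model", -1200.0, 3),
    ("large model", -1199.5, 5),
]

scores = {label: aic(ll, k) for label, ll, k in candidates}
best = min(scores, key=scores.get)
print(scores, best)
```

Here the larger model fits slightly better, but not by enough to offset the penalty for its two extra parameters, so the smaller model has the lower AIC.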

In [None]:
!pip install -q pmdarima

In [None]:
from pmdarima import auto_arima

# Run a stepwise auto_arima search on the closing prices
stepwise_fit = auto_arima(close_prices, start_p=1, start_q=1,
                          max_p=3, max_q=3, m=5,
                          start_P=0, seasonal=True,
                          d=None, D=1, trace=True,
                          error_action='ignore',   # skip orders that fail to fit
                          suppress_warnings=True,  # hide convergence warnings
                          stepwise=True)           # stepwise search instead of a full grid

# Print the summary
stepwise_fit.summary()

In [None]:
from statsmodels.tsa.statespace.sarimax import SARIMAX

# Split data into train / test sets.
# The data is daily, so one year is about 252 trading days (holding out 12 rows
# would cover only 12 days, not 12 months)
n_test = 252
train = close_prices.iloc[:-n_test]
test = close_prices.iloc[-n_test:]

# Fit a SARIMAX(1, 0, 0)x(2, 1, 0, 5) on the training set
model = SARIMAX(train,
                order=(1, 0, 0),
                seasonal_order=(2, 1, 0, 5))

result = model.fit()
result.summary()

In [None]:
import matplotlib.pyplot as plt

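# Predictions for one-year against the test set
predictions = result.predict(start=len(train), end=len(train) + len(test) - 1).rename('Predictions')

# The training index has no set frequency, so the forecast may come back with an
# integer index; align it with the test dates so both series share one x-axis
predictions.index = test.index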

# plot predictions and actual values
predictions.plot(legend = True)
test.plot(legend = True)
plt.title('One-Year Forecast vs Actual Values')
plt.show()

In [None]:
from sklearn.metrics import mean_squared_error, mean_absolute_error
import numpy as np

# Calculate error metrics
mae = mean_absolute_error(test, predictions)
mse = mean_squared_error(test, predictions)
rmse = np.sqrt(mse)

{'MAE': mae, 'MSE': mse, 'RMSE': rmse}
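
As a worked example of the three metrics, the following plain-Python sketch computes them by hand on four made-up price pairs. The values are hypothetical and chosen so the arithmetic is easy to follow.

```python
import math

# Hypothetical actual and predicted prices.
actual = [100.0, 102.0, 101.0, 105.0]
predicted = [98.0, 103.0, 104.0, 105.0]

errors = [a - p for a, p in zip(actual, predicted)]  # [2, -1, -3, 0]

mae_example = sum(abs(e) for e in errors) / len(errors)  # mean absolute error
mse_example = sum(e * e for e in errors) / len(errors)   # mean squared error
rmse_example = math.sqrt(mse_example)                    # same units as the prices

print(mae_example, mse_example, round(rmse_example, 4))
```

MAE here is 1.5 and RMSE is about 1.87. RMSE is the larger of the two because squaring gives the single 3-point miss more weight than the smaller errors.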