# AI in Finance: Data-Driven Investment Strategies with Python | A1: Individual Assignment

**Student Name** : Jeffry Livingston

**Student Id** : 44774523

**Datasource** : https://www.kaggle.com/datasets/jessevent/all-crypto-currencies

# Introduction
This project uses the Cryptocurrency Daily Market Price dataset, which provides historical daily data for over 1,500 cryptocurrency tokens. The copy loaded here contains 942,297 daily observations beginning on April 28, 2013, and its Bitcoin series runs through November 29, 2018. The dataset includes essential trading metrics such as:

- Opening, high, low, and closing prices
- Daily trading volume
- Market capitalization
- Token identifiers (e.g., name, symbol, slug)

The data has been thoroughly cleaned to correct formatting issues and ensure compatibility for analytical use. This rich dataset enables deep exploration of price movements, volatility patterns, and market behavior across a wide range of digital assets.

# Objective

The primary objective of this analysis is to apply data science and machine learning techniques to extract insights from the cryptocurrency market. Specifically, we aim to:

**Perform Descriptive Analysis**

  Summarize key variables and understand distributions, central tendencies, and variations in price and volume data.

**Create Visualizations**

  Use Plotly to generate at least three interactive charts that highlight important trends and relationships in the dataset.

**Apply Advanced Techniques**

  Use NumPy for mathematical operations such as return calculations.

  Explore advanced analysis including risk-adjusted performance metrics.

  Implement machine learning models such as Random Forest to detect price behavior patterns or classify trends.

**Summarize Key Findings**

  Provide the major insights discovered and reflect on their implications for cryptocurrency trading and analysis.

In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import io

from statsmodels.tsa.stattools import adfuller
from statsmodels.tsa.seasonal import seasonal_decompose
from sklearn.model_selection import train_test_split, TimeSeriesSplit, GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import FunctionTransformer
from sklearn.metrics import (
    accuracy_score, confusion_matrix, classification_report,roc_curve,
    roc_auc_score, RocCurveDisplay, mean_absolute_error, r2_score, mean_squared_error
)

from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor
from sklearn.tree import DecisionTreeClassifier
from sklearn.neighbors import KNeighborsRegressor
from xgboost import XGBRegressor

import plotly.graph_objects as go
from plotly.subplots import make_subplots
import plotly.express as px

import warnings
from itertools import product
from statsmodels.tsa.arima.model import ARIMA

# Set default plot size
plt.rcParams['figure.figsize'] = (12, 4)

# Suppress library warnings to keep the output readable
# (note: this also hides model convergence warnings)
warnings.filterwarnings('ignore')

In [2]:
# Reading the dataset
df = pd.read_csv('crypto-markets.csv')

**Dataset Overview and Summary Statistics**

The initial exploration of the dataset provided a clear understanding of its structure and quality. A preview of the data showed consistent formatting across all columns, while the summary of data types confirmed that each variable is appropriately categorized. Additionally, there are no missing values, indicating the dataset is clean and complete. The statistical summary of the numerical features offered insights into the distribution, central tendency, and variability of key financial metrics such as price, volume, and market capitalization. This confirms that the dataset is well-prepared for further analysis and modeling.

In [3]:
# Data Types & Null Check
print(df.head())
print(df.info())

      slug symbol     name        date  ranknow    open    high     low  \
0  bitcoin    BTC  Bitcoin  2013-04-28        1  135.30  135.98  132.10   
1  bitcoin    BTC  Bitcoin  2013-04-29        1  134.44  147.49  134.00   
2  bitcoin    BTC  Bitcoin  2013-04-30        1  144.00  146.93  134.05   
3  bitcoin    BTC  Bitcoin  2013-05-01        1  139.00  139.89  107.72   
4  bitcoin    BTC  Bitcoin  2013-05-02        1  116.38  125.60   92.28   

    close  volume        market  close_ratio  spread  
0  134.21     0.0  1.488567e+09       0.5438    3.88  
1  144.54     0.0  1.603769e+09       0.7813   13.49  
2  139.00     0.0  1.542813e+09       0.3843   12.88  
3  116.99     0.0  1.298955e+09       0.2882   32.17  
4  105.21     0.0  1.168517e+09       0.3881   33.32  
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 942297 entries, 0 to 942296
Data columns (total 13 columns):
 #   Column       Non-Null Count   Dtype  
---  ------       --------------   -----  
 0   slug         9422

In [4]:
# Descriptive Statistics
print(df.describe())

             ranknow          open          high           low         close  \
count  942297.000000  9.422970e+05  9.422970e+05  9.422970e+05  9.422970e+05   
mean     1000.170608  3.483522e+02  4.085930e+02  2.962526e+02  3.461018e+02   
std       587.575283  1.318436e+04  1.616386e+04  1.092931e+04  1.309822e+04   
min         1.000000  2.500000e-09  3.200000e-09  2.500000e-10  2.000000e-10   
25%       465.000000  2.321000e-03  2.628000e-03  2.044000e-03  2.314000e-03   
50%      1072.000000  2.398300e-02  2.680200e-02  2.143700e-02  2.389200e-02   
75%      1484.000000  2.268600e-01  2.508940e-01  2.043910e-01  2.259340e-01   
max      2072.000000  2.298390e+06  2.926100e+06  2.030590e+06  2.300740e+06   

             volume        market    close_ratio        spread  
count  9.422970e+05  9.422970e+05  942297.000000  9.422970e+05  
mean   8.720383e+06  1.725060e+08       0.459499  1.123400e+02  
std    1.839802e+08  3.575590e+09       0.326160  6.783713e+03  
min    0.000000e+00

In [5]:
df.isna().sum()

slug,0
symbol,0
name,0
date,0
ranknow,0
open,0
high,0
low,0
close,0
volume,0


The above output confirms that there are no missing (null) values across any of the columns in the dataset. This indicates that the data is already clean and complete, and no additional imputation or data cleaning is necessary for handling missing values. We can proceed directly with exploratory analysis and modeling.
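A null check does not catch every data problem. In the preview above, for example, the earliest Bitcoin rows report a volume of 0.0, which is not null but probably means the value was not recorded. The sketch below shows a few extra checks that could be run alongside `isna()`; `quality_report` is a hypothetical helper and the sample frame is made up for illustration.

```python
import pandas as pd

def quality_report(frame):
    """Count data problems that a null check alone does not reveal."""
    return {
        "missing_values": int(frame.isna().sum().sum()),
        "duplicate_slug_date": int(frame.duplicated(subset=["slug", "date"]).sum()),
        "high_below_low": int((frame["high"] < frame["low"]).sum()),
        "zero_volume": int((frame["volume"] == 0).sum()),
    }

# Made-up sample: rows 0-1 mirror the preview above, row 2 duplicates row 1,
# and row 3 is a deliberately inconsistent record (high below low).
sample = pd.DataFrame({
    "slug":   ["bitcoin", "bitcoin", "bitcoin", "bitcoin"],
    "date":   ["2013-04-28", "2013-04-29", "2013-04-29", "2013-04-30"],
    "high":   [135.98, 147.49, 147.49, 130.00],
    "low":    [132.10, 134.00, 134.00, 134.05],
    "volume": [0.0, 0.0, 0.0, 5.0],
})

report = quality_report(sample)
print(report)
```

Calling `quality_report(df)` on the full dataset would show how many rows fall into each category before deciding whether any cleaning is needed.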

# Exploratory analysis

In this EDA section, I will carefully address the key questions outlined in the introduction. For each plot or analysis, I will walk you through the results, explaining what the output reveals and highlighting the most important observations. The goal is to provide a clear and logical flow of insights, so you can easily follow the analytical reasoning behind each step.

In [6]:
# List the distinct token symbols
print(df['symbol'].unique())

['BTC' 'XRP' 'ETH' ... '42' 'BTWTY' 'NANOX']


The symbol list shows some irregular entries (for example, `42`), so the symbol column would need to be checked and cleaned before analyzing the entire dataset. Because this analysis is restricted to Bitcoin, those rows are not used. I chose BTC (Bitcoin) for the focused analysis because it is the most dominant and widely traded cryptocurrency, consistently driving market trends and sentiment. Its high liquidity, extensive historical data, and central role in the crypto ecosystem make it ideal for uncovering meaningful insights and evaluating market dynamics.
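As a rough illustration of how irregular symbols could be flagged, the sketch below applies one assumed rule (uppercase letters only) to the symbols printed above. The rule is an assumption made for this example; an unusual symbol is not necessarily an error, since it may be a token's real ticker.

```python
import pandas as pd

# Symbols taken from the printed output above
symbols = pd.Series(["BTC", "XRP", "ETH", "42", "BTWTY", "NANOX"])

# Assumed rule for this sketch: a "regular" ticker is uppercase letters only
is_regular = symbols.str.fullmatch(r"[A-Z]+")
irregular = symbols[~is_regular].tolist()
print(irregular)
```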

In [7]:
# Filter dataset for Bitcoin only
btc_df = df[df['slug'] == 'bitcoin'].copy()
btc_df['date'] = pd.to_datetime(btc_df['date'])
btc_df = btc_df.sort_values('date')
btc_df.set_index('date', inplace=True)
btc_df.head()

date,slug,symbol,name,ranknow,open,high,low,close,volume,market,close_ratio,spread
2013-04-28,bitcoin,BTC,Bitcoin,1,135.3,135.98,132.1,134.21,0.0,1488567000.0,0.5438,3.88
2013-04-29,bitcoin,BTC,Bitcoin,1,134.44,147.49,134.0,144.54,0.0,1603769000.0,0.7813,13.49
2013-04-30,bitcoin,BTC,Bitcoin,1,144.0,146.93,134.05,139.0,0.0,1542813000.0,0.3843,12.88
2013-05-01,bitcoin,BTC,Bitcoin,1,139.0,139.89,107.72,116.99,0.0,1298955000.0,0.2882,32.17
2013-05-02,bitcoin,BTC,Bitcoin,1,116.38,125.6,92.28,105.21,0.0,1168517000.0,0.3881,33.32


**Plot Daily Closing Price Trend**


In [8]:
fig = px.line(btc_df, x=btc_df.index, y='close',
              title='Bitcoin Daily Closing Price Over Time',
              labels={'date': 'Date', 'close': 'Closing Price (USD)'})
fig.update_traces(line=dict(color='royalblue', width=2))
fig.show()

**Plot Trading Volume Over Time**

In [9]:
fig = px.area(btc_df, x=btc_df.index, y='volume',
              title='Bitcoin Daily Trading Volume Over Time',
              labels={'date': 'Date', 'volume': 'Volume (USD)'})
fig.update_traces(line=dict(color='Crimson', width=2))
fig.show()

**Plot High-Low Price Range**

In [10]:
btc_df['price_range'] = btc_df['high'] - btc_df['low']

fig = px.line(btc_df, x=btc_df.index, y='price_range',
              title='Bitcoin Daily High-Low Price Range',
              labels={'date': 'Date', 'price_range': 'Price Range (USD)'})
fig.update_traces(line=dict(color='seagreen', width=2))
fig.show()

**Time Series Stability & Seasonal Trend Analysis**

To ensure reliable forecasting, it is important to assess whether the time series is stationary, i.e., whether its statistical properties (mean and variance) remain constant over time. We apply the Augmented Dickey-Fuller test and perform an additive seasonal decomposition to examine trend and seasonal effects in Bitcoin's monthly average closing prices.

In [11]:
# Resample to monthly average
btc_monthly = btc_df['close'].resample('M').mean().dropna()

# Seasonal Decomposition (Additive)
decomposition = seasonal_decompose(btc_monthly, model='additive')

# Extract components
trend = decomposition.trend
seasonal = decomposition.seasonal
residual = decomposition.resid

# Create Plotly subplot
from plotly.subplots import make_subplots

fig = make_subplots(rows=4, cols=1, shared_xaxes=True,
                    subplot_titles=["Original Series", "Trend", "Seasonal", "Residual"])

fig.add_trace(go.Scatter(x=btc_monthly.index, y=btc_monthly,
                         name="Original", line=dict(color='blue')), row=1, col=1)

fig.add_trace(go.Scatter(x=trend.index, y=trend,
                         name="Trend", line=dict(color='orange')), row=2, col=1)

fig.add_trace(go.Scatter(x=seasonal.index, y=seasonal,
                         name="Seasonal", line=dict(color='green')), row=3, col=1)

fig.add_trace(go.Scatter(x=residual.index, y=residual,
                         name="Residual", line=dict(color='red')), row=4, col=1)

fig.update_layout(height=800, title_text="Seasonal Decomposition of Monthly Bitcoin Closing Prices")
fig.show()

# Dickey-Fuller test
p_value = adfuller(btc_monthly)[1]
print(f"Dickey–Fuller test p-value: {p_value:.5f}")

if p_value < 0.05:
    print("The series is stationary (reject null hypothesis).")
else:
    print("The series is not stationary (fail to reject null hypothesis).")

Dickey–Fuller test p-value: 0.47902
The series is not stationary (fail to reject null hypothesis).


**Interpretation of Dickey–Fuller Test Result**


The Augmented Dickey-Fuller test returned a p-value of 0.47902, well above the 0.05 threshold. We therefore fail to reject the null hypothesis of a unit root, so there is no evidence that the Bitcoin monthly closing price series is stationary. In other words, the series shows a changing level or variance over time, which can degrade forecasting models that assume stationarity.

In [12]:
btc_diff = btc_monthly.diff().dropna()

In [13]:
p_value_diff = adfuller(btc_diff)[1]
print(f"Differenced Series p-value: {p_value_diff:.5f}")

Differenced Series p-value: 0.00000


The Augmented Dickey-Fuller test on the first-differenced monthly Bitcoin prices returned a p-value that rounds to 0.00000, so the unit-root null hypothesis is rejected and the differenced series can be treated as stationary. This supports using first-order differencing (d=1) in the ARIMA model, which needs a stationary input to produce reliable forecasts.
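To see why differencing helps, consider a random walk, the textbook example of a non-stationary series: its level is the running sum of random shocks, so taking first differences recovers the shocks themselves, which are stationary. The sketch below uses simulated data only (not the Bitcoin series) and assumes a pure random walk, which is a simplification of real prices.

```python
import numpy as np

rng = np.random.default_rng(0)
shocks = rng.normal(loc=0.0, scale=1.0, size=500)  # stationary white-noise increments
random_walk = np.cumsum(shocks)                     # non-stationary level series

first_diff = np.diff(random_walk)                   # y_t - y_{t-1}

# Differencing the random walk gives back the shocks (all but the first one)
recovered = bool(np.allclose(first_diff, shocks[1:]))
print(recovered)
```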

**Using NumPy to Calculate the Returns**

In [14]:
# Daily returns
btc_df['daily_return'] = btc_df['close'].pct_change()
print(btc_df['daily_return'])

# Cumulative return
btc_df['cumulative_return'] = (1 + btc_df['daily_return']).cumprod()
print(btc_df['cumulative_return'])

# Rolling 30-day average
btc_df['rolling_mean_30'] = btc_df['close'].rolling(window=30).mean()
print(btc_df['rolling_mean_30'])

# Rolling 30-day volatility
btc_df['rolling_volatility_30'] = btc_df['daily_return'].rolling(window=30).std()
print(btc_df['rolling_volatility_30'])

date
2013-04-28         NaN
2013-04-29    0.076969
2013-04-30   -0.038328
2013-05-01   -0.158345
2013-05-02   -0.100692
                ...   
2018-11-25    0.033295
2018-11-26   -0.057567
2018-11-27    0.011005
2018-11-28    0.114298
2018-11-29    0.005034
Name: daily_return, Length: 2042, dtype: float64
date
2013-04-28          NaN
2013-04-29     1.076969
2013-04-30     1.035690
2013-05-01     0.871694
2013-05-02     0.783921
                ...    
2018-11-25    29.878325
2018-11-26    28.158334
2018-11-27    28.468221
2018-11-28    31.722077
2018-11-29    31.881752
Name: cumulative_return, Length: 2042, dtype: float64
date
2013-04-28            NaN
2013-04-29            NaN
2013-04-30            NaN
2013-05-01            NaN
2013-05-02            NaN
                 ...     
2018-11-25    5797.513333
2018-11-26    5707.471667
2018-11-27    5618.616000
2018-11-28    5549.442333
2018-11-29    5480.928333
Name: rolling_mean_30, Length: 2042, dtype: float64
date
2013-04-28         NaN
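The same return calculations can be written directly with NumPy arrays. The sketch below uses the first five Bitcoin closing prices from the preview earlier in the notebook; the variable names are illustrative. It also shows log returns, which add up over time and are a common alternative to simple returns.

```python
import numpy as np

# First five Bitcoin closes from the preview above
close = np.array([134.21, 144.54, 139.00, 116.99, 105.21])

simple_returns = close[1:] / close[:-1] - 1   # same values as pct_change(), without the leading NaN
log_returns = np.diff(np.log(close))          # log returns are additive over time
growth = np.cumprod(1 + simple_returns)       # value of 1 unit invested at the first close

print(np.round(simple_returns, 6))
print(np.round(growth, 6))
```

The first simple return and the last growth value agree with the `daily_return` and `cumulative_return` columns printed above for 2013-04-29 and 2013-05-02.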

**Calculate Risk-Adjusted Return Measures**

In [15]:
risk_free_rate = 0.01  # Annualized risk-free rate

# Convert to a daily rate (252 is the equity trading-day convention; Bitcoin trades every calendar day)
daily_rf = risk_free_rate / 252

# Sharpe Ratio on daily returns (not annualized)
sharpe_ratio = (btc_df['daily_return'].mean() - daily_rf) / btc_df['daily_return'].std()
print(f"Sharpe Ratio: {sharpe_ratio:.2f}")

# Sortino Ratio on daily returns, using the standard deviation of negative returns as downside risk
downside_std = btc_df.loc[btc_df['daily_return'] < 0, 'daily_return'].std()
sortino_ratio = (btc_df['daily_return'].mean() - daily_rf) / downside_std
print(f"Sortino Ratio: {sortino_ratio:.2f}")


Sharpe Ratio: 0.06
Sortino Ratio: 0.08


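A Sharpe Ratio of 0.06 looks low, but both ratios here are computed on daily returns and are not annualized, so they are not comparable with the annualized figures usually quoted. Scaling by √365 (Bitcoin trades every day) would put the Sharpe Ratio at roughly 1.1. The slightly higher Sortino Ratio indicates that the volatility of negative returns is smaller than total volatility.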
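For reference, here is a minimal sketch of annualized versions of both ratios. It makes two assumptions that differ from the cell above: 365 periods per year, and a downside deviation defined as the root mean square of below-target excess returns taken over all periods. `annualized_ratios` is a hypothetical helper and the demo returns are made up.

```python
import numpy as np

def annualized_ratios(returns, annual_rf=0.01, periods_per_year=365):
    """Annualized Sharpe and Sortino ratios from periodic simple returns."""
    r = np.asarray(returns, dtype=float)
    r = r[~np.isnan(r)]                           # drop the leading NaN from pct_change()
    excess = r - annual_rf / periods_per_year     # per-period excess return
    scale = np.sqrt(periods_per_year)

    sharpe = excess.mean() / excess.std(ddof=1) * scale

    # Downside deviation: root mean square of below-target excess returns,
    # averaged over all periods (not only the negative ones)
    downside_dev = np.sqrt(np.mean(np.minimum(excess, 0.0) ** 2))
    sortino = excess.mean() / downside_dev * scale
    return sharpe, sortino

# Made-up daily returns, for illustration only
demo = [np.nan, 0.02, -0.01, 0.03, -0.02, 0.01]
sharpe, sortino = annualized_ratios(demo, annual_rf=0.0)
print(f"Sharpe: {sharpe:.2f}, Sortino: {sortino:.2f}")
```

Passing `btc_df['daily_return']` to this function would give annualized figures for Bitcoin under these assumptions.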

**ARIMA Model**

An ARIMA model was fitted to Bitcoin's monthly average closing price. A grid search over p and q from 0 to 3, with d fixed at 1, selected (p=3, d=1, q=0), which had the lowest AIC on the training data (reported as 8.00). An AIC this small is unusual for price-level data and may indicate that the fit did not converge (warnings are suppressed in this notebook), so the selected order should be treated with caution. The selected model is then refitted on the full series, and its in-sample one-step-ahead predictions are plotted against the observed prices. These track the observed rises and falls closely, largely because each prediction uses the actual value from the previous month.

p (AutoRegressive order) – Represents the number of past observations (lags) the model considers when making predictions. It determines how many previous values influence the current one.

d (Degree of Differencing) – Indicates how many times the original time series is differenced to remove trends and achieve stationarity.

q (Moving Average order) – Refers to the number of past forecast errors included in the model. It helps smooth out the predictions by learning from previous inaccuracies.
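The AIC used to rank candidate orders is defined as AIC = 2k - 2 ln L, where k is the number of estimated parameters and ln L is the log-likelihood. The sketch below works backwards from a reported AIC to the log-likelihood it implies. It assumes k = 4 for an ARIMA(3,1,0) without a trend term (three AR coefficients plus the error variance); the helper names are hypothetical.

```python
def aic(log_likelihood, n_params):
    """Akaike information criterion: 2k - 2 ln L."""
    return 2 * n_params - 2 * log_likelihood

def implied_log_likelihood(aic_value, n_params):
    """Invert the AIC formula to recover ln L."""
    return (2 * n_params - aic_value) / 2

k = 3 + 1  # assumed: three AR coefficients plus the error variance
loglik = implied_log_likelihood(8.00, k)
print(loglik)
```

Under that assumption, an AIC of exactly 8.00 corresponds to a log-likelihood of zero, which would be surprising for monthly prices in the thousands of dollars and is worth checking against `result.llf` and the fit's convergence messages.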



In [16]:
# Step 1: Train-test split
train = btc_monthly[:'2017']
test = btc_monthly['2018':]

# Step 2: Grid search for best (p,d,q)
p = q = range(0, 4)
d = 1
params = list(product(p, q))
best_aic = float('inf')
best_order = None

for param in params:
    try:
        model = ARIMA(train, order=(param[0], d, param[1]))
        result = model.fit()
        if result.aic < best_aic:
            best_aic = result.aic
            best_order = (param[0], d, param[1])
    except Exception:
        # Skip orders that fail to fit
        continue

print(f"Best ARIMA Order: {best_order} with AIC: {best_aic:.2f}")

# Step 3: Refit the selected order on the full series (this includes the 2018 test months)
final_model = ARIMA(btc_monthly, order=best_order)
fitted_model = final_model.fit()

# Step 4: In-sample one-step-ahead predictions over the full date range
forecast = fitted_model.predict(start=btc_monthly.index[0], end=btc_monthly.index[-1])

# Step 5: Plot with Plotly
fig = go.Figure()

fig.add_trace(go.Scatter(
    x=btc_monthly.index, y=btc_monthly,
    mode='lines', name='Actual Close', line=dict(color='steelblue')
))

fig.add_trace(go.Scatter(
    x=forecast.index, y=forecast,
    mode='lines', name='Predicted Close', line=dict(dash='dash', color='red')
))

fig.update_layout(
    title='Bitcoin Monthly Forecast (ARIMA)',
    xaxis_title='Date',
    yaxis_title='Price (USD)',
    legend=dict(x=0.01, y=0.99),
    height=500,
    width=1000,
    template='plotly_white'
)

fig.show()


Best ARIMA Order: (3, 1, 0) with AIC: 8.00


In [17]:
# Align forecast and test for overlapping range
forecast_test = forecast[test.index.intersection(forecast.index)]

# Ensure test and forecast are aligned
test_aligned = test.loc[forecast_test.index]

# Calculate error metrics
from sklearn.metrics import mean_absolute_error, mean_squared_error

mae = mean_absolute_error(test_aligned, forecast_test)
rmse = np.sqrt(mean_squared_error(test_aligned, forecast_test))

print(f"MAE: {mae:.2f}")
print(f"RMSE: {rmse:.2f}")

MAE: 1067.13
RMSE: 1412.97


To evaluate the ARIMA model, I calculated error metrics between the actual and predicted Bitcoin prices for the 2018 test period. The Mean Absolute Error (MAE) was $1,067.13, meaning the monthly predicted price was off by about $1,067 on average. The Root Mean Squared Error (RMSE) was $1,412.97; the gap between RMSE and MAE indicates that a few months had larger errors. Note that these predictions come from a model refitted on the full series, including 2018, and each one uses the previous month's actual price, so the figures measure in-sample one-step-ahead fit rather than true out-of-sample forecast accuracy. Given Bitcoin's volatility around 2017-2018, the model follows the general direction of the price trend, but these errors likely understate what a genuine forecast would incur.
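A stricter evaluation would make each prediction using only data from before the target month. The sketch below shows a generic walk-forward loop under that rule. It uses a naive last-value forecaster and a made-up toy series so that it runs without fitting a model; `walk_forward` and `naive_last_value` are hypothetical helpers.

```python
import numpy as np
import pandas as pd

def naive_last_value(history):
    """Baseline: predict that the next month equals the last observed month."""
    return float(history.iloc[-1])

def walk_forward(series, n_test, forecaster):
    """One-step-ahead forecasts where each prediction only sees earlier data."""
    preds = []
    for i in range(len(series) - n_test, len(series)):
        history = series.iloc[:i]              # strictly before the target month
        preds.append(forecaster(history))
    preds = pd.Series(preds, index=series.index[-n_test:])
    actual = series.iloc[-n_test:]
    mae = float(np.mean(np.abs(actual - preds)))
    rmse = float(np.sqrt(np.mean((actual - preds) ** 2)))
    return preds, mae, rmse

# Toy monthly series (made-up numbers, not the Bitcoin data)
idx = pd.date_range("2017-01-01", periods=8, freq="MS")
toy = pd.Series([100.0, 110.0, 105.0, 120.0, 130.0, 125.0, 140.0, 150.0], index=idx)

preds, mae, rmse = walk_forward(toy, n_test=3, forecaster=naive_last_value)
print(f"MAE: {mae:.2f}, RMSE: {rmse:.2f}")
```

An ARIMA forecaster could be plugged in by refitting on `history` inside the callable and returning its one-step forecast, for example via `ARIMA(history, order=best_order).fit().forecast(steps=1).iloc[0]`. The naive baseline is also a useful yardstick: a model that cannot beat it adds little.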

**Random Forest Model**

Random Forest is an ensemble of decision trees that can capture complex non-linear relationships. To use it for time series forecasting, the series is turned into a supervised learning problem with engineered features: here, six lagged values, a 3-month rolling mean, and a 3-month momentum term. The data is split chronologically, with observations through 2017 used for training and 2018 held out for testing, and the 2018 forecast is produced recursively, one month at a time.

In [19]:
# Step 1: Split into train and test
train = btc_monthly[:'2017']
test = btc_monthly['2018':]

# Step 2: Create lagged features
def create_lagged_data(series, n_lags=6):
    df = pd.DataFrame(series.copy(), columns=['close'])
    # Note: both features below are computed from the current month's close (the target)
    df['rolling_mean_3'] = df['close'].rolling(3).mean()
    df['momentum'] = df['close'] - df['close'].shift(3)
    # lag_1 is the most recent past value, lag_n the oldest
    for i in range(1, n_lags + 1):
        df[f'lag_{i}'] = df['close'].shift(i)
    df.dropna(inplace=True)
    return df

# Step 3: Build training data
n_lags = 6
train_lagged = create_lagged_data(train, n_lags)
X_train = train_lagged.drop('close', axis=1)
y_train = train_lagged['close']

# Step 4: Train Random Forest
model_rf = RandomForestRegressor(n_estimators=100, random_state=42)
model_rf.fit(X_train, y_train)

# Step 5: Recursive Forecasting for test horizon
forecast_horizon = len(test)
last_known = train.copy()
predictions = []
forecast_index = []

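for _ in range(forecast_horizon):
    # MonthEnd keeps forecast dates on the same month-end grid as btc_monthly
    next_date = last_known.index[-1] + pd.offsets.MonthEnd(1)
    window = last_known.iloc[-(n_lags + 3):]

    # At forecast time these features use past values only (the current month is unknown)
    rolling_mean_3 = window.iloc[-3:].mean()
    momentum = window.iloc[-1] - window.iloc[-4]
    # Note: values are ordered oldest to newest, so lag_1 below receives the oldest value
    lag_values = window.iloc[-n_lags:].values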

    input_row = {
        'rolling_mean_3': rolling_mean_3,
        'momentum': momentum,
        **{f'lag_{j+1}': lag_values[j] for j in range(n_lags)}
    }

    input_df = pd.DataFrame([input_row])[X_train.columns]
    pred = model_rf.predict(input_df)[0]

    predictions.append(pred)
    forecast_index.append(next_date)
    last_known.loc[next_date] = pred

# Step 6: Create predicted series
predicted_series = pd.Series(predictions, index=forecast_index)

# Step 7: Evaluate
mae = mean_absolute_error(test, predicted_series)
rmse = np.sqrt(mean_squared_error(test, predicted_series))
print(f"Random Forest MAE: ${mae:.2f}")
print(f"Random Forest RMSE: ${rmse:.2f}")

# Step 8: Plot with Plotly
fig = go.Figure()

# Actual
fig.add_trace(go.Scatter(
    x=btc_monthly.index,
    y=btc_monthly,
    mode='lines',
    name='Actual',
    line=dict(color='blue')
))

# Forecast
fig.add_trace(go.Scatter(
    x=predicted_series.index,
    y=predicted_series,
    mode='lines',
    name='Random Forest Forecast',
    line=dict(color='orange', dash='dash')
))

# Vertical line to indicate forecast start
fig.add_shape(
    type="line",
    x0=forecast_index[0],
    y0=min(btc_monthly.min(), predicted_series.min()),
    x1=forecast_index[0],
    y1=max(btc_monthly.max(), predicted_series.max()),
    line=dict(color="gray", dash="dot"),
)

# Optionally add annotation
fig.add_annotation(
    x=forecast_index[0],
    y=max(btc_monthly.max(), predicted_series.max()),
    text="Forecast Start",
    showarrow=True,
    arrowhead=1,
    ax=0,
    ay=-40
)

# Layout settings
fig.update_layout(
    title='Random Forest Forecast from 2018 Onwards',
    xaxis_title='Date',
    yaxis_title='Price (USD)',
    template='plotly_white',
    height=500,
    width=1000,
    legend=dict(x=0.01, y=0.99)
)

fig.show()


Random Forest MAE: $2781.20
Random Forest RMSE: $3131.04
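The training features and the forecasting loop above define their inputs differently. In `create_lagged_data`, `rolling_mean_3` and `momentum` include the current month's close and `lag_1` is the most recent past value, while the loop builds the rolling mean and momentum from past values only and fills `lag_1` with the oldest value in the window. Below is a minimal sketch of one way to keep both sides consistent. `make_features` and `next_row` are hypothetical helpers, not part of the notebook, and the toy series is made up. A model trained this way would produce different errors from those reported above.

```python
import pandas as pd

def make_features(series, n_lags=6):
    """Training rows: features for month t use only values up to month t-1."""
    past = series.shift(1)                      # most recent value known before month t
    feats = pd.DataFrame({"target": series})
    feats["rolling_mean_3"] = past.rolling(3).mean()
    feats["momentum"] = past - past.shift(3)
    for i in range(1, n_lags + 1):
        feats[f"lag_{i}"] = series.shift(i)     # lag_1 = most recent past value
    return feats.dropna()

def next_row(history, n_lags=6):
    """Feature row for the month after `history`, using the same definitions."""
    row = {
        "rolling_mean_3": history.iloc[-3:].mean(),
        "momentum": history.iloc[-1] - history.iloc[-4],
    }
    for i in range(1, n_lags + 1):
        row[f"lag_{i}"] = history.iloc[-i]      # lag_1 = most recent value
    return pd.DataFrame([row])

# Toy monthly series (made-up numbers) to check that both builders agree
idx = pd.date_range("2017-01-01", periods=12, freq="MS")
s = pd.Series(range(1, 13), index=idx, dtype=float)

feats = make_features(s)
row = next_row(s.iloc[:-1])                     # features for the final month
cols = list(row.columns)
max_diff = float((feats[cols].iloc[-1] - row.iloc[0]).abs().max())
print(max_diff)
```

Because the training rows and the forecast-time row come from the same definitions, the model sees the same kind of input in both phases and the target never appears among its own features.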


# Conclusion

The analysis of Bitcoin's historical price data from 2013 to 2018 shows a market that moved from long stretches of relative stability to explosive, highly volatile growth, culminating in a significant correction. The dramatic increase in daily price range and trading volume during the late-2017 surge underscores the speculative nature of the rally and the increased market participation that accompanied the peak. The seasonal decomposition produced a repeating monthly pattern, but the dominant components were the strong trend and the large, unpredictable residuals, especially during periods of high volatility.

Both the ARIMA and Random Forest models, despite their distinct methodologies, showed clear limitations in forecasting such a dynamic and non-linear asset. They captured broader trends but struggled with the extreme price swings and sharp turning points of Bitcoin's 'bubble' and subsequent 'bust' phase. This suggests that for assets driven heavily by sentiment, external events, and rapid regime shifts, time series models, even those able to represent non-linear relationships, tend to provide smoothed expectations rather than precise predictions of extreme events. The large unexplained variance points to factors beyond historical price patterns, such as regulatory news, technological developments, or macroeconomic shifts, which these models do not capture.

These models therefore help in understanding underlying trends, but their usefulness for precise short-term prediction during periods of high market uncertainty is limited. Models that rely solely on historical price data cannot account for sudden shifts driven by external, unpredictable factors, which are common in nascent, speculative markets like cryptocurrency. Incorporating exogenous variables or more adaptive, real-time forecasting approaches is a natural next step.

# Key Insights
**Explosive Growth Followed by Decline**

The closing price chart shows a steady early trend followed by a sharp exponential rise in 2017, peaking near $20,000, then dropping significantly in 2018.

This volatility underlines Bitcoin's sensitivity to speculative activity, media coverage, and market hype rather than purely fundamental or seasonal indicators.

**High Volatility Periods Clearly Identified**

The High-Low Price Range plot spikes prominently around the 2017 bull market, showing extreme daily price swings of over $4000.

This suggests high trading risk during bull runs and reflects poor market stability during speculative bubbles.

**Volume Indicates Increased Market Participation**

The Trading Volume Over Time plot also peaks in late 2017, in line with the price surge, which is consistent with broader market participation. Volume data alone cannot separate institutional from retail activity.

Post-2017, despite the price fall, volume remains relatively elevated, signaling ongoing market activity and interest.

**Time Series Decomposition Shows Strong Seasonal and Trend Components**

The decomposition reveals:

- A strong upward trend from 2016 to the end of 2017.
- Cyclic seasonal patterns across years, possibly linked to macro events or market cycles. The additive decomposition assumes a fixed 12-month pattern, so this component should be read with caution.
- A surge in residuals in 2018, signaling unpredictable external shocks post-peak.

**Random Forest Forecast Lacks Long-Term Volatility Capture**

The Random Forest Forecast from 2018 onwards fails to capture the declining trend accurately. It flattens after an initial drop.

This is typical of tree-based models that struggle with extrapolation, especially in financial time series with sudden trend reversals.

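With an MAE of $2,781 and an RMSE of $3,131 over the 2018 test months, the forecast follows the initial direction of the decline but is not robust enough for volatile, longer-horizon forecasts. The forecast is also recursive: each prediction is fed back as an input for the next month, so errors compound over the horizon.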
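The extrapolation limit can be illustrated with a small synthetic example. Each tree predicts an average of training targets, so a Random Forest cannot output a value outside the range it saw during training. The sketch below uses a made-up linear trend with a time index as the only feature, which is a simplification of the lag-based setup used in this notebook.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Synthetic upward trend: the target doubles the time index, so the largest training target is 98
X_train = np.arange(0, 50, dtype=float).reshape(-1, 1)
y_train = 2.0 * X_train.ravel()

model = RandomForestRegressor(n_estimators=50, random_state=0)
model.fit(X_train, y_train)

# Time points well beyond the training range
X_future = np.array([[60.0], [80.0], [100.0]])
preds = model.predict(X_future)

# Predictions stay at or below the largest training target and are identical to each other
print(preds.max() <= y_train.max(), np.allclose(preds, preds[0]))
```

The true values at these points would be 120, 160, and 200, but the forest returns the same value near the top of its training range for all three. This matches the flattening seen in the 2018 forecast.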

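**ARIMA Predictions Track the Decline More Closely**

The ARIMA predictions follow the declining trend after 2017 more closely than the Random Forest forecast. The comparison is not like-for-like, however: the ARIMA values are one-step-ahead predictions from a model fitted on the full series, whereas the Random Forest forecasts all of 2018 recursively from data through 2017.

The ARIMA(3,1,0) model handles the trend through differencing and short-term autocorrelation through its autoregressive terms. It has no seasonal component, so capturing seasonality would require a seasonal variant such as SARIMA.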

# Strategic Implications
Short-Term Trading Models: Random Forest can be useful for short-term signals when supplemented with more dynamic features (e.g., RSI, MACD).

Long-Term Forecasting: Use ARIMA/SARIMA/Prophet or hybrid models to incorporate both historical seasonality and external events.
