**Machine Learning** (ML) is a subset of **Artificial Intelligence** (AI) that focuses on developing algorithms that enable computers to learn from data and make decisions based on it. Rather than being explicitly programmed to perform a task, an ML algorithm builds a model from sample inputs and uses it to make predictions or decisions. This learning process uses statistical techniques to identify patterns and relationships within the data, so the model's performance can improve as more data becomes available.

**Artificial Intelligence**, the more widely known term, encompasses a broader range of techniques, including rule-based systems, natural language processing, and robotics, with the goal of creating systems that can perform tasks typically requiring human intelligence. **Machine Learning** is a crucial part of AI because it lets a system adapt and improve from data rather than from hand-written rules. In essence, while AI aims to simulate intelligent behaviour, ML is one of the main methods by which that behaviour is achieved through data-driven learning, which makes it a natural fit for data-rich domains such as trading and financial markets.
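
The idea of building a model from sample inputs, rather than writing the rule by hand, can be sketched in a few lines of plain Python. This is only an illustration: the data points are made up, and `fit_line` is a hypothetical helper that fits a straight line by ordinary least squares, which is far simpler than the Random Forest used later in this notebook.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Covariance of x and y, and variance of x (both unnormalised).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    slope = cov / var
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Made-up sample inputs: the relationship y = 2x + 1 is never written down,
# it is recovered from the data.
xs = [1, 2, 3, 4]
ys = [3, 5, 7, 9]

slope, intercept = fit_line(xs, ys)
prediction = slope * 5 + intercept  # predict for an unseen input, x = 5
print(slope, intercept, prediction)  # 2.0 1.0 11.0
```

The "learning" here is just the estimation of `slope` and `intercept` from examples; the same pattern of fit, then predict, is what `rf.fit` and `rf.predict` do later on.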

### Random Forest Model in Trading Technical Analysis

In [None]:
!pip install -U --no-cache-dir eodhd config scikit-learn matplotlib seaborn

In [None]:
from eodhd import APIClient
import config as cfg

api = APIClient(cfg.API_KEY)


def get_ohlc_data():
    # df = api.get_historical_data("GSPC.INDX", "d", results=2000)
    df = api.get_historical_data("BTC-USD.CC", "d", results=2000)
    return df

if __name__ == "__main__":
    df = get_ohlc_data()
    print(df)

In [None]:
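def calculate_sma(data, window):
    """Simple moving average over the last `window` observations."""
    return data.rolling(window=window).mean()


def calculate_macd(data, short_window=12, long_window=26, signal_window=9):
    """Return the MACD line (short EMA minus long EMA) and its signal line."""
    short_ema = data.ewm(span=short_window, adjust=False).mean()
    long_ema = data.ewm(span=long_window, adjust=False).mean()
    macd = short_ema - long_ema
    # The signal line is an EMA of the MACD line itself.
    signal_line = macd.ewm(span=signal_window, adjust=False).mean()
    return macd, signal_line


def calculate_rsi(data, window=14):
    """Relative Strength Index using simple rolling averages of gains and losses."""
    delta = data.diff(1)
    # Keep only upward moves for gains and only downward moves (sign-flipped) for losses.
    gain = (delta.where(delta > 0, 0)).rolling(window=window).mean()
    loss = (-delta.where(delta < 0, 0)).rolling(window=window).mean()
    rs = gain / loss
    rsi = 100 - (100 / (1 + rs))
    return rsi


def calculate_vroc(volume, window=14):
    """Volume rate of change: percentage change in volume versus `window` periods ago."""
    vroc = ((volume.diff(window)) / volume.shift(window)) * 100
    return vroc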


if __name__ == "__main__":
    df = get_ohlc_data()

    df["sma50"] = calculate_sma(df["close"], 50)
    df["sma200"] = calculate_sma(df["close"], 200)
    df["macd"], df["signal"] = calculate_macd(df["close"])
    df["rsi14"] = calculate_rsi(df["close"])
    df["vroc14"] = calculate_vroc(df["volume"])

    df.dropna(inplace=True)

    print(df)
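
To make the arithmetic inside `calculate_rsi` concrete, the sketch below reproduces the same simple-average RSI for a single window in plain Python, so the result can be checked by hand. The prices are made up and `simple_rsi` is a hypothetical helper written for this illustration, not part of the notebook's pipeline.

```python
def simple_rsi(prices, window):
    """RSI over the most recent `window` price changes, using plain averages."""
    # Day-over-day changes, e.g. [10, 11, 12] -> [1, 1].
    deltas = [curr - prev for prev, curr in zip(prices, prices[1:])]
    recent = deltas[-window:]
    avg_gain = sum(d for d in recent if d > 0) / window
    avg_loss = sum(-d for d in recent if d < 0) / window
    if avg_loss == 0:
        # No losses in the window: treated here as the maximum reading.
        return 100.0
    rs = avg_gain / avg_loss
    return 100 - 100 / (1 + rs)


prices = [10, 11, 12, 11, 13]  # made-up closing prices
rsi = simple_rsi(prices, window=3)
print(rsi)
```

With a window of 3, the last three changes are +1, -1 and +2, so the average gain is 1, the average loss is 1/3, the ratio is 3, and the RSI comes out at about 75.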


In [None]:
# Add these imports at the top of your file.

from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

# Add the following at the end of your main block.

features = [
    "open",
    "high",
    "low",
    "volume",
    "sma50",
    "sma200",
    "macd",
    "signal",
    "rsi14",
    "vroc14",
]
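X = df[features]
y = df["close"]  # target: the same day's closing price

# train_test_split shuffles rows by default, so the test rows are
# interleaved in time with the training rows.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)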
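
Because `train_test_split` shuffles by default, the held-out rows here are scattered among the training dates. For time-ordered data, one common alternative is to hold out the most recent rows instead. The sketch below shows that idea on a plain list; `chronological_cut` is a hypothetical helper and the prices are made up.

```python
def chronological_cut(n_rows, test_size=0.2):
    """Index of the first test row when the last `test_size` fraction is held out."""
    n_test = max(1, round(n_rows * test_size))
    return n_rows - n_test


# Made-up closing prices, oldest first.
closes = [100, 101, 103, 102, 105, 107, 106, 108, 110, 111]

cut = chronological_cut(len(closes), test_size=0.2)
train, test = closes[:cut], closes[cut:]
print(train)  # the first 8 values
print(test)   # the last 2 values
```

With a DataFrame the same cut can be applied as `X.iloc[:cut]` and `X.iloc[cut:]`, and passing `shuffle=False` to `train_test_split` gives a similar ordered split. Whether this suits a given experiment depends on what the model is meant to demonstrate.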

In [None]:
print(X_train.shape, X_test.shape, y_train.shape, y_test.shape)

In [None]:
rf = RandomForestRegressor(n_estimators=100, random_state=42)
rf.fit(X_train, y_train)

#### Making Predictions

In [None]:
y_train_pred = rf.predict(X_train)
y_test_pred = rf.predict(X_test)

#### Visualisation of the Predictions

In [None]:
import matplotlib.pyplot as plt
import seaborn as sns

#### Scatter Plot of Actual vs. Predicted Values

In [None]:
plt.figure(figsize=(14, 7))

plt.subplot(1, 2, 1)
plt.scatter(y_train, y_train_pred, alpha=0.3)
plt.xlabel("Actual Close Price (Train)")
plt.ylabel("Predicted Close Price (Train)")
plt.title("Actual vs. Predicted Close Price (Training Set)")
plt.plot([y_train.min(), y_train.max()], [y_train.min(), y_train.max()], "r--")

plt.subplot(1, 2, 2)
plt.scatter(y_test, y_test_pred, alpha=0.3)
plt.xlabel("Actual Close Price (Test)")
plt.ylabel("Predicted Close Price (Test)")
plt.title("Actual vs. Predicted Close Price (Testing Set)")
plt.plot([y_test.min(), y_test.max()], [y_test.min(), y_test.max()], "r--")

plt.tight_layout()
plt.show()

#### Line Plot of Actual vs. Predicted Values Over Time

In [None]:
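plt.figure(figsize=(14, 7))

# train_test_split shuffled the rows, so put the test set back in index order
# before drawing lines; otherwise the lines zigzag across the chart.
order = y_test.index.argsort()

plt.plot(y_test.index[order], y_test.iloc[order], label="Actual Close Price")
plt.plot(y_test.index[order], y_test_pred[order], label="Predicted Close Price")
plt.xlabel("Date")
plt.ylabel("Close Price")
plt.title("Actual vs. Predicted Close Price Over Time (Testing Set)")
plt.legend()
plt.show()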

#### Evaluating the Performance of the Model

In [None]:
train_mae = mean_absolute_error(y_train, y_train_pred)
test_mae = mean_absolute_error(y_test, y_test_pred)
train_mse = mean_squared_error(y_train, y_train_pred)
test_mse = mean_squared_error(y_test, y_test_pred)
train_r2 = r2_score(y_train, y_train_pred)
test_r2 = r2_score(y_test, y_test_pred)

print(f"Training MAE: {train_mae}")
print(f"Testing MAE: {test_mae}")
print(f"Training MSE: {train_mse}")
print(f"Testing MSE: {test_mse}")
print(f"Training R²: {train_r2}")
print(f"Testing R²: {test_r2}")

#### **Mean Absolute Error (MAE)**

MAE is the average of the absolute differences between the predicted and actual values. It is expressed in the same units as the target (here, price) and gives a straightforward measure of how far off predictions are on average.

A lower MAE indicates better model performance.
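
As a hand-checkable illustration of the definition above, the sketch below computes MAE in plain Python on made-up prices. `mae_manual` is a hypothetical helper; in the notebook itself the value comes from scikit-learn's `mean_absolute_error`.

```python
def mae_manual(actual, predicted):
    """Mean of the absolute differences between actual and predicted values."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


actual = [100.0, 102.0, 105.0, 103.0]     # made-up closing prices
predicted = [101.0, 100.0, 105.0, 106.0]  # made-up predictions

# Absolute errors are 1, 2, 0 and 3, so the mean is 6 / 4.
mae = mae_manual(actual, predicted)
print(mae)  # 1.5
```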

#### **Mean Squared Error (MSE)**

MSE is the average of the squared differences between the predicted and actual values. Because the errors are squared, it penalises larger errors more than MAE does, making it sensitive to outliers.

A lower MSE indicates better model performance.
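
The sensitivity to large errors can be seen with two made-up sets of predictions that have the same MAE but very different MSE. The helpers `mae_manual` and `mse_manual` are hypothetical and written only for this illustration.

```python
def mae_manual(actual, predicted):
    """Mean of the absolute errors."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def mse_manual(actual, predicted):
    """Mean of the squared errors."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


actual = [100.0, 100.0, 100.0, 100.0]
pred_even = [102.0, 102.0, 102.0, 102.0]     # every prediction is off by 2
pred_outlier = [100.0, 100.0, 100.0, 108.0]  # one prediction is off by 8

print(mae_manual(actual, pred_even), mae_manual(actual, pred_outlier))  # 2.0 2.0
print(mse_manual(actual, pred_even), mse_manual(actual, pred_outlier))  # 4.0 16.0
```

Both sets are wrong by 2 on average, but the single large miss quadruples the MSE.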

#### **R-squared (R²)**

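R² measures the proportion of the variance in the dependent variable that is predictable from the independent variables. A value of 1 indicates perfect prediction and 0 means the model does no better than always predicting the mean; on unseen data R² can also be negative when the model does worse than that baseline.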

A higher R² indicates better model performance.
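
A small worked example may help here. The sketch below computes R² as one minus the ratio of the residual sum of squares to the total sum of squares, using made-up numbers, and includes a case where the score is negative. `r2_manual` is a hypothetical helper; the notebook itself uses scikit-learn's `r2_score`.

```python
def r2_manual(actual, predicted):
    """R^2 = 1 - SS_res / SS_tot."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1 - ss_res / ss_tot


actual = [10.0, 12.0, 14.0, 16.0]     # mean is 13, SS_tot is 20

good = [11.0, 12.0, 14.0, 15.0]       # SS_res = 2
baseline = [13.0, 13.0, 13.0, 13.0]   # always predicts the mean
inverted = [16.0, 14.0, 12.0, 10.0]   # trend is backwards, SS_res = 80

print(r2_manual(actual, good))      # close to 0.9
print(r2_manual(actual, baseline))  # 0.0
print(r2_manual(actual, inverted))  # -3.0
```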

In [None]:
# update this import at the top

from sklearn.model_selection import train_test_split, cross_val_score, GridSearchCV

# modify the model in your main

param_grid = {
    "n_estimators": [50, 100, 200],
    "max_depth": [10, 20, 30, None],
    "min_samples_split": [2, 5, 10],
    "min_samples_leaf": [1, 2, 4],
    "bootstrap": [True, False],
}

rf = RandomForestRegressor(random_state=42)
grid_search = GridSearchCV(
    estimator=rf, param_grid=param_grid, cv=5, n_jobs=-1, verbose=2
)
grid_search.fit(X_train, y_train)

print(f"Best parameters: {grid_search.best_params_}")

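# With the default refit=True, GridSearchCV has already refitted the best
# estimator on all of X_train, so no further fit call is needed.
best_rf = grid_search.best_estimator_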
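
It is worth estimating how much work this grid search does before running it. The sketch below counts the parameter combinations in the same `param_grid` with `itertools.product` and multiplies by the number of cross-validation folds. It does not train anything, and the count excludes the final refit on the full training set.

```python
from itertools import product

param_grid = {
    "n_estimators": [50, 100, 200],
    "max_depth": [10, 20, 30, None],
    "min_samples_split": [2, 5, 10],
    "min_samples_leaf": [1, 2, 4],
    "bootstrap": [True, False],
}
cv_folds = 5  # matches cv=5 in the GridSearchCV call

# Every combination of one value per parameter is one candidate model.
candidates = list(product(*param_grid.values()))
n_candidates = len(candidates)
total_fits = n_candidates * cv_folds

print(n_candidates, total_fits)  # 216 1080
```

With over a thousand forest fits, the search can take a while even with `n_jobs=-1`; trimming the grid is the simplest way to shorten it.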

In [None]:
import pandas as pd

#### Feature Importance

In [None]:
feature_importances = best_rf.feature_importances_
importance_df = pd.DataFrame(
    {"Feature": features, "Importance": feature_importances}
)

importance_df = importance_df.sort_values(by="Importance", ascending=False)

plt.figure(figsize=(12, 8))
sns.barplot(x="Importance", y="Feature", data=importance_df)
plt.title("Feature Importances of Technical Indicators")
plt.show()