Notebook: Random Forest Model Training for Stock Price Prediction

Introduction

This notebook trains and saves a Random Forest model that predicts the adjusted closing price of a single stock ticker. The same steps can be repeated for other tickers.

Key Steps

1.	Load processed data from SQLite.

2.	Filter data for a specific ticker (default: XOM).

3.	Select features and target, split the data, and normalize the features.

4.	Train a Random Forest model and evaluate it on the test set.

5.	Save the trained model and scaler for future evaluation and predictions.

Import Libraries

•	pandas: For data manipulation and analysis.

•	sqlite3: For interacting with the SQLite database storing stock data.

•	scikit-learn:

	•	RandomForestRegressor for training the Random Forest model.

	•	train_test_split for splitting data.

	•	StandardScaler for feature normalization.

	•	mean_squared_error, r2_score for evaluating model performance.

•	joblib: For saving trained models and scalers.

In [1]:
# Random Forest Model Training Notebook

# Import necessary libraries
import pandas as pd
import sqlite3
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import mean_squared_error, r2_score
import joblib

Step 1: Load Data from SQLite Database

•	Purpose: Load preprocessed stock data stored in the SQLite database (stocks_data.db).
•	Steps:

1.	Define the database path.

2.	Query the processed_stocks table to fetch the entire dataset.

3.	Load the data into a pandas DataFrame.

•	Output: Displays the number of rows loaded from the database.

In [2]:

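# Step 1: Load Data from SQLite Database
from contextlib import closing

db_path = 'database/stocks_data.db'
# sqlite3's own context manager only handles transactions;
# closing() makes sure the connection is also closed afterwards
with closing(sqlite3.connect(db_path)) as conn:
    query = "SELECT * FROM processed_stocks"
    data = pd.read_sql(query, conn)
print(f"Loaded processed data: {data.shape[0]} rows")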

Loaded processed data: 67834 rows


Step 2: Filter Data for Default Ticker

•	Purpose: Focus on a specific stock ticker for model training.

•	Steps:

1.	Define the default ticker (e.g., XOM for Exxon Mobil).

2.	Filter the dataset to include only rows corresponding to the selected ticker.

•	Output: Displays the number of rows for the selected ticker.

In [3]:

# Step 2: Filter Data for Default Ticker
default_ticker = 'XOM'
ticker_data = data[data['Ticker'] == default_ticker]
print(f"Loaded data for {default_ticker}: {ticker_data.shape[0]} rows")


Loaded data for XOM: 11791 rows
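
As an alternative to loading the whole table and filtering in pandas, the ticker filter can be pushed into the SQL query. The following is a minimal sketch only: it uses an in-memory database with made-up rows in place of database/stocks_data.db, and load_ticker is a hypothetical helper that is not part of this notebook.

```python
import sqlite3
import pandas as pd

def load_ticker(conn, ticker):
    """Return only the rows for one ticker, using a parameterized query."""
    query = "SELECT * FROM processed_stocks WHERE Ticker = ?"
    return pd.read_sql(query, conn, params=(ticker,))

# In-memory stand-in for the real database (illustrative rows only)
conn = sqlite3.connect(":memory:")
demo = pd.DataFrame({
    "Ticker": ["XOM", "XOM", "CVX"],
    "Adj Close": [100.0, 101.5, 150.0],
})
demo.to_sql("processed_stocks", conn, index=False)

xom_rows = load_ticker(conn, "XOM")
print(f"Loaded data for XOM: {len(xom_rows)} rows")
conn.close()
```

Passing the ticker through params rather than formatting it into the query string avoids quoting problems if the ticker ever comes from user input.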


Step 3: Define Features and Target

•	Features (X): Independent variables used for prediction: 7-day MA, 14-day MA, Volatility, Lag_1, Lag_2.

•	Target (y): Dependent variable to predict: Adj Close (adjusted closing price).

In [4]:

# Step 3: Define Features and Target
features = ['7-day MA', '14-day MA', 'Volatility', 'Lag_1', 'Lag_2']
target = 'Adj Close'
X = ticker_data[features]
y = ticker_data[target]
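
Rolling averages and lag columns often leave missing values in the first rows of a series, so it may be worth checking the selected columns before training. This is a sketch under assumptions: check_columns is a hypothetical helper, and the small DataFrame below is made up to show one missing column and one NaN.

```python
import pandas as pd

features = ['7-day MA', '14-day MA', 'Volatility', 'Lag_1', 'Lag_2']
target = 'Adj Close'

def check_columns(df, columns):
    """Return (names of absent columns, NaN count per present column)."""
    missing = [c for c in columns if c not in df.columns]
    present = [c for c in columns if c in df.columns]
    nan_counts = df[present].isna().sum().to_dict()
    return missing, nan_counts

# Made-up frame: 'Lag_2' is absent and '14-day MA' has one NaN
demo = pd.DataFrame({
    '7-day MA': [1.0, 2.0],
    '14-day MA': [1.0, None],
    'Volatility': [0.1, 0.2],
    'Lag_1': [1.0, 2.0],
    'Adj Close': [1.5, 2.5],
})
missing, nan_counts = check_columns(demo, features + [target])
print("Missing columns:", missing)
print("NaN counts:", nan_counts)
```

In the notebook itself the same check could be run as check_columns(ticker_data, features + [target]).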

Step 4: Split Data into Training and Testing Sets

•	Purpose: Divide the dataset into:

•	Training Set (80%): Used to train the model.

•	Testing Set (20%): Used to evaluate model performance.

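•	Output: Training and testing feature and target sets. Note that train_test_split shuffles rows by default, so test rows are interleaved in time with training rows; with lagged price features this tends to make the test score optimistic for time-series data.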

In [5]:

# Step 4: Split Data into Training and Testing Sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
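
For time-ordered data, one alternative to the shuffled split above is a chronological split, where the model is tested only on rows that come after the training rows. The sketch below uses train_test_split with shuffle=False on synthetic data and assumes the rows are already sorted from oldest to newest, which the source does not state for the processed_stocks table.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Synthetic stand-in: 10 rows, assumed sorted oldest to newest
X_demo = pd.DataFrame({"Lag_1": np.arange(10, dtype=float)})
y_demo = pd.Series(np.arange(1, 11, dtype=float), name="Adj Close")

# shuffle=False keeps the original order: first 80% train, last 20% test
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, shuffle=False
)
print("Last training row:", X_tr.index.max())
print("First testing row:", X_te.index.min())
```

With this split every test row is later than every training row, which is closer to how the model would be used for forecasting.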

Step 5: Normalize Features

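•	Purpose: Put all features on a common scale. Tree-based models such as Random Forest are largely insensitive to feature scaling, so this step is unlikely to change accuracy much here; the fitted scaler is kept so that the same transformation can be applied to new data at prediction time.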

•	Steps:

1.	Use StandardScaler to scale features to zero mean and unit variance.

2.	Fit the scaler to the training data.

3.	Transform both training and testing datasets using the scaler.

•	Output: Scaled training and testing feature sets.

In [6]:

# Step 5: Normalize Features
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)
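
The effect of fitting on the training data only can be checked directly: the scaled training features have zero mean and unit variance, while the scaled test features are only approximately standardized because they reuse the training statistics. A small sketch on synthetic data (the arrays and the *_demo names are illustrative, not the notebook's variables):

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Synthetic features standing in for X_train and X_test
rng = np.random.default_rng(42)
X_train_demo = rng.normal(50, 10, size=(200, 2))
X_test_demo = rng.normal(50, 10, size=(50, 2))

scaler_demo = StandardScaler()
X_train_s = scaler_demo.fit_transform(X_train_demo)  # learns mean/std from training data
X_test_s = scaler_demo.transform(X_test_demo)        # reuses the training mean/std

train_means = X_train_s.mean(axis=0)
train_stds = X_train_s.std(axis=0)
print("Train means:", train_means.round(6))
print("Train stds:", train_stds.round(6))
print("Test means:", X_test_s.mean(axis=0).round(3))
```

Fitting the scaler on the full dataset instead would let test-set statistics leak into training.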


Step 6: Train Random Forest Regressor

•	Purpose: Train a Random Forest model using the normalized training dataset.
•	Steps:

1.	Initialize the RandomForestRegressor with 100 trees (n_estimators=100) and a fixed random seed (random_state=42) for reproducibility.

2.	Train the model on the scaled training data.

•	Output: Confirms successful model training.

In [7]:

# Step 6: Train Random Forest Regressor
model = RandomForestRegressor(n_estimators=100, random_state=42)
model.fit(X_train_scaled, y_train)
print("Random Forest model trained successfully.")


Random Forest model trained successfully.
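
Once a forest is trained, its feature_importances_ attribute gives a rough view of which features the trees rely on. The sketch below uses synthetic data with one informative and one noise column; the column names are made up and are not the notebook's features.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor

# Synthetic data: 'signal' drives the target, 'noise' does not
rng = np.random.default_rng(0)
X_demo = pd.DataFrame({
    "signal": rng.normal(size=300),
    "noise": rng.normal(size=300),
})
y_demo = 3.0 * X_demo["signal"] + rng.normal(scale=0.1, size=300)

rf = RandomForestRegressor(n_estimators=50, random_state=42).fit(X_demo, y_demo)

# Impurity-based importances, one per feature, summing to 1
importances = pd.Series(rf.feature_importances_, index=X_demo.columns)
importances = importances.sort_values(ascending=False)
print(importances)
```

For the model in this notebook, the equivalent call would be pd.Series(model.feature_importances_, index=features).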


Step 7: Evaluate Model on Test Data

•	Purpose: Assess the model’s performance on the test dataset using:

1.	Mean Squared Error (MSE): The average of the squared prediction errors, in squared price units.

2.	R-squared (R²): Proportion of variance explained by the model.

•	Output: Displays key evaluation metrics.

In [8]:

# Step 7: Evaluate Model on Test Data
y_pred = model.predict(X_test_scaled)
mse = mean_squared_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)
print(f"Model Evaluation: MSE={mse:.2f}, R²={r2:.2f}")


Model Evaluation: MSE=0.24, R²=1.00
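
An R² that rounds to 1.00 is plausible when the features include lagged prices, because price levels are strongly autocorrelated. A naive baseline that predicts "today's price equals yesterday's price" helps put the numbers in context. The sketch below computes that baseline on a synthetic random walk; it assumes Lag_1 holds the previous day's Adj Close, which the source does not define explicitly.

```python
import numpy as np
from sklearn.metrics import mean_squared_error, r2_score

# Synthetic random-walk prices standing in for Adj Close
rng = np.random.default_rng(1)
prices = 100 + np.cumsum(rng.normal(0, 1, size=500))

y_true = prices[1:]       # price on day t
naive_pred = prices[:-1]  # price on day t-1 (the assumed meaning of Lag_1)

baseline_mse = mean_squared_error(y_true, naive_pred)
baseline_rmse = np.sqrt(baseline_mse)  # same units as the price
baseline_r2 = r2_score(y_true, naive_pred)
print(f"Naive baseline: MSE={baseline_mse:.2f}, RMSE={baseline_rmse:.2f}, R²={baseline_r2:.2f}")
```

In the notebook, the same comparison could be made with mean_squared_error(y_test, X_test['Lag_1']); a model is only adding value if it beats this baseline.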


Step 8: Save the Model and Scaler

•	Purpose: Save the trained model and scaler for future use.

•	Steps:

1.	Save the model as model_<ticker>_rf.pkl.

2.	Save the scaler as scaler_<ticker>_rf.pkl.

•	Output: Confirms successful saving of the model and scaler.

In [9]:

# Step 8: Save the Model and Scaler
import os

os.makedirs('models', exist_ok=True)  # joblib.dump fails if the directory is missing
model_filename = f'models/model_{default_ticker}_rf.pkl'
scaler_filename = f'models/scaler_{default_ticker}_rf.pkl'
joblib.dump(model, model_filename)
joblib.dump(scaler, scaler_filename)
print(f"Model saved as '{model_filename}' and scaler saved as '{scaler_filename}'.")

Model saved as 'models/model_XOM_rf.pkl' and scaler saved as 'models/scaler_XOM_rf.pkl'.
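
The saved files are meant to be loaded together later: new rows must pass through the saved scaler before they reach the saved model. A round-trip sketch, using a temporary directory, synthetic data, and a made-up DEMO ticker rather than the notebook's actual files:

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.preprocessing import StandardScaler

# Synthetic data with five features, mirroring the notebook's feature count
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(100, 5))
y_demo = X_demo.sum(axis=1)

scaler_demo = StandardScaler().fit(X_demo)
model_demo = RandomForestRegressor(n_estimators=10, random_state=42)
model_demo.fit(scaler_demo.transform(X_demo), y_demo)

with tempfile.TemporaryDirectory() as tmp:
    model_dir = os.path.join(tmp, "models")
    os.makedirs(model_dir, exist_ok=True)
    model_path = os.path.join(model_dir, "model_DEMO_rf.pkl")
    scaler_path = os.path.join(model_dir, "scaler_DEMO_rf.pkl")

    # Save, then load back as a separate notebook or app would
    joblib.dump(model_demo, model_path)
    joblib.dump(scaler_demo, scaler_path)
    loaded_model = joblib.load(model_path)
    loaded_scaler = joblib.load(scaler_path)

# Scale new rows with the loaded scaler before predicting
new_rows = rng.normal(size=(3, 5))
original = model_demo.predict(scaler_demo.transform(new_rows))
restored = loaded_model.predict(loaded_scaler.transform(new_rows))
print("Predictions match:", np.allclose(original, restored))
```

Files written by joblib are pickle-based, so they should only be loaded from trusted sources and ideally with the same scikit-learn version that produced them.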


Summary

This notebook successfully trains a Random Forest model for the default ticker (XOM) by:

1.	Loading preprocessed data.

2.	Filtering data for the selected ticker.

3.	Splitting the dataset and normalizing the features.

4.	Training the Random Forest model and evaluating it on the test set.

5.	Saving the trained model and scaler for evaluation and predictions.

Next Steps

1.	Evaluate model predictions using a separate evaluation notebook.

2.	Extend the Flask app to dynamically load and use the saved Random Forest model.

3.	Visualize the model’s predictions and residuals.