# Model Training & Evaluation

This notebook demonstrates the training and evaluation of our baseline Bayesian network model for predicting each stock's daily "Trend" (whether the next trading day's "Close" price is higher than today's).

In this notebook, we will:
- Load the processed stock data and prepare the binary target.
- Select a subset of features (preferring the binned versions where they exist).
- Build the Bayesian network model using functions from our `scripts/model_training.py`.
- Evaluate the model's accuracy on a test set.
- Document observations and discuss potential improvements.


In [9]:
import sys
import os
# Add the absolute path to the "scripts" folder to sys.path
scripts_path = os.path.abspath(os.path.join('..', 'scripts'))
sys.path.insert(0, scripts_path)
print("Added scripts path:", scripts_path)

import pandas as pd
import matplotlib.pyplot as plt
from model_training import build_bayesian_network, evaluate_model, prepare_target
from sklearn.model_selection import train_test_split

%matplotlib inline


Added scripts path: /workspaces/StockTradingAI-Project/scripts


In [11]:
# Load the processed stock data from the processed folder.
# Update the filename if your processed file is named differently.
df = pd.read_csv('../data/processed/all_stocks_2006-01-01_to_2018-01-01.csv')

# Prepare the binary target 'Trend' (1 if next day's Close > today's, else 0)
df = prepare_target(df)

print("Processed Data Head:")
display(df.head())

print("\nData Shape:", df.shape)


Processed Data Head:


|   | Date | Open | High | Low | Close | Volume | Name | Open_binned | High_binned | Low_binned | Close_binned | Volume_binned | Trend |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2006-01-03 | 0.335166 | 0.349163 | 0.338184 | 0.356895 | -1.170054 | MMM | 3 | 3 | 3 | 3 | 0 | 0 |
| 1 | 2006-01-04 | 0.363377 | 0.351427 | 0.354762 | 0.350397 | -1.357457 | MMM | 3 | 3 | 3 | 3 | 0 | 0 |
| 2 | 2006-01-05 | 0.345838 | 0.337781 | 0.34346 | 0.338616 | -1.368077 | MMM | 3 | 3 | 3 | 3 | 0 | 1 |
| 3 | 2006-01-06 | 0.349594 | 0.341858 | 0.344775 | 0.349093 | -1.387002 | MMM | 3 | 3 | 3 | 3 | 0 | 1 |
| 4 | 2006-01-09 | 0.347309 | 0.356909 | 0.358182 | 0.355436 | -1.666864 | MMM | 3 | 3 | 3 | 3 | 0 | 0 |



Data Shape: (93612, 13)
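The `Name` column shows that the file stacks many tickers (starting with MMM), so the next-day comparison behind `Trend` only makes sense within a single ticker. The sketch below is a hypothetical stand-in: `prepare_target_sketch` is our illustrative name, not the `prepare_target` in `scripts/model_training.py`, whose implementation is not shown here. It assumes `Date`, `Name`, and `Close` columns and, as one possible choice, labels the last row of each ticker 0 because there is no next day to compare against.

```python
import pandas as pd


def prepare_target_sketch(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical stand-in for prepare_target.

    Trend = 1 if the next trading day's Close (same ticker) is higher than
    today's Close, else 0. Assumes 'Date', 'Name' and 'Close' columns.
    """
    # Order rows by ticker, then by date, so shift(-1) looks at the next trading day.
    out = df.sort_values(["Name", "Date"]).copy()
    # Next day's Close within each ticker; the last row of each ticker becomes NaN.
    next_close = out.groupby("Name")["Close"].shift(-1)
    # NaN > x is False, so each ticker's final row is labelled 0.
    out["Trend"] = (next_close > out["Close"]).astype(int)
    return out


toy_prices = pd.DataFrame({
    "Date": ["2006-01-03", "2006-01-04", "2006-01-05"] * 2,
    "Name": ["AAA"] * 3 + ["BBB"] * 3,
    "Close": [1.0, 0.9, 1.1, 5.0, 5.5, 5.2],
})
print(prepare_target_sketch(toy_prices)["Trend"].tolist())  # [0, 1, 0, 1, 0, 0]
```

Dropping each ticker's final row instead (for example, rows where the shifted Close is NaN) avoids labelling an unknown outcome as "down".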


In [12]:
# Identify the numeric columns, excluding the target itself
numeric_cols = df.select_dtypes(include='number').columns.tolist()
if 'Trend' in numeric_cols:
    numeric_cols.remove('Trend')

# For this baseline, take the first three numeric columns in file order (Open, High, Low here).
features = numeric_cols[:3]

# Prefer using binned versions if they exist (e.g., "Open_binned" instead of "Open")
features_binned = [f"{col}_binned" if f"{col}_binned" in df.columns else col for col in features]
target = 'Trend'

print("Features used for modeling:", features_binned)


Features used for modeling: ['Open_binned', 'High_binned', 'Low_binned']
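Selecting `numeric_cols[:3]` ties the feature set to the column order of the CSV, so a reordered or re-exported file would silently change the model's inputs. A minimal alternative sketch, assuming you prefer to name the base columns explicitly (`choose_features` is a hypothetical helper, not part of `scripts/model_training.py`), applies the same binned-if-available rule and fails loudly when a column is missing:

```python
import pandas as pd


def choose_features(df: pd.DataFrame, base_cols, suffix="_binned"):
    """Map each base column to its binned counterpart if present, else keep it.

    Raises KeyError if neither the raw nor the binned column exists.
    """
    chosen = []
    for col in base_cols:
        binned = f"{col}{suffix}"
        if binned in df.columns:
            chosen.append(binned)
        elif col in df.columns:
            chosen.append(col)
        else:
            raise KeyError(f"Neither {col!r} nor {binned!r} is in the DataFrame")
    return chosen


toy_cols = pd.DataFrame(columns=["Open", "High", "Low", "Open_binned", "High_binned"])
print(choose_features(toy_cols, ["Open", "High", "Low"]))
# ['Open_binned', 'High_binned', 'Low']
```

On the processed file, `choose_features(df, ["Open", "High", "Low"])` should reproduce the list printed above while no longer depending on column position.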


In [13]:
# Build the Bayesian network model using the selected features and target.
model = build_bayesian_network(df, features_binned, target)
print("Bayesian Network model built successfully.")


Bayesian Network model built successfully.
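How `build_bayesian_network` structures the network is not shown in this notebook. For reference only, the simplest Bayesian network over discrete features is the naive Bayes structure, in which `Trend` is the single parent of every feature, so the posterior for `Trend` is proportional to P(Trend) times the product of P(feature | Trend). The sketch below estimates those conditional probability tables from counts with Laplace smoothing. `fit_naive_bayes`, `predict_naive_bayes`, and `alpha` are hypothetical names, and this is not claimed to match the structure our script builds.

```python
import numpy as np
import pandas as pd


def fit_naive_bayes(train: pd.DataFrame, features, target, alpha=1.0):
    """Estimate log P(target) and log P(feature | target) with Laplace smoothing."""
    classes = np.sort(train[target].unique())
    prior = train[target].value_counts().reindex(classes, fill_value=0)
    log_prior = np.log((prior + alpha) / (prior.sum() + alpha * len(classes)))
    log_cpts = {}
    for f in features:
        # Rows: observed values of f; columns: target classes.
        counts = pd.crosstab(train[f], train[target]).reindex(columns=classes, fill_value=0)
        # Each column sums to 1: smoothed P(f = value | target = class).
        probs = (counts + alpha) / (counts.sum(axis=0) + alpha * len(counts))
        log_cpts[f] = np.log(probs)
    return classes, log_prior, log_cpts


def predict_naive_bayes(model, data: pd.DataFrame, features):
    """Return the class with the highest log posterior for each row."""
    classes, log_prior, log_cpts = model
    scores = np.tile(log_prior.to_numpy(), (len(data), 1))
    for f in features:
        for j, c in enumerate(classes):
            # Values unseen in training add 0 to every class, leaving the ranking unchanged.
            scores[:, j] += data[f].map(log_cpts[f][c]).fillna(0.0).to_numpy()
    return classes[scores.argmax(axis=1)]


toy_train = pd.DataFrame({"f": [0, 0, 0, 1, 1, 1], "Trend": [0, 0, 0, 1, 1, 1]})
model_nb = fit_naive_bayes(toy_train, ["f"], "Trend")
print(predict_naive_bayes(model_nb, pd.DataFrame({"f": [0, 1, 2]}), ["f"]))
```

Ties (here the unseen value 2 with equal priors) resolve to the first class because `argmax` returns the first maximum.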


In [14]:
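# Evaluate the model and compute accuracy. Note: the same `df` used to build the model is
# passed here, so this is a held-out score only if evaluate_model splits the data internally.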
predictions, actuals, accuracy = evaluate_model(model, df, features_binned, target)
print("Model Accuracy:", accuracy)




Model Accuracy: 0.5111894461357689
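An accuracy of about 0.511 is only meaningful relative to what a trivial predictor scores on the same rows, and because the same `df` is passed to both `build_bayesian_network` and `evaluate_model`, it may not be a held-out figure. `train_test_split` shuffles by default, which lets later days leak into training; with `shuffle=False` it would instead hold out the last tickers in the file rather than the latest dates, since the head suggests rows are grouped ticker by ticker. The sketch below uses hypothetical helpers (`chronological_split`, `majority_baseline_accuracy`) to split on a date cutoff across all tickers and to compute the always-predict-the-majority-class baseline:

```python
import pandas as pd


def chronological_split(df: pd.DataFrame, date_col="Date", test_frac=0.2):
    """Put the latest test_frac of trading dates (all tickers) in the test set."""
    dates = pd.to_datetime(df[date_col])
    unique_dates = dates.drop_duplicates().sort_values().reset_index(drop=True)
    # Number of distinct dates to hold out (at least one).
    n_test = max(1, round(len(unique_dates) * test_frac))
    cutoff = unique_dates.iloc[len(unique_dates) - n_test]
    return df[dates < cutoff], df[dates >= cutoff]


def majority_baseline_accuracy(train: pd.DataFrame, test: pd.DataFrame, target="Trend"):
    """Accuracy on test of always predicting the most frequent training label."""
    majority = train[target].mode().iloc[0]
    return float((test[target] == majority).mean())


toy_days = pd.DataFrame({
    "Date": pd.date_range("2017-01-02", periods=10, freq="D").strftime("%Y-%m-%d"),
    "Trend": [1, 1, 0, 1, 1, 0, 1, 0, 0, 1],
})
train_df, test_df = chronological_split(toy_days, test_frac=0.2)
print(len(train_df), len(test_df))  # 8 2
print(majority_baseline_accuracy(train_df, test_df))  # 0.5
```

If `build_bayesian_network` and `evaluate_model` accept any DataFrame with the same columns (an assumption, since their internals are not shown here), the held-out version of the cells above would build on `train_df` and evaluate on `test_df`, and the resulting accuracy can then be read against `majority_baseline_accuracy(train_df, test_df)` rather than against 0.5.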
