# Sequence Model Research

The scope of this notebook is to assess and train different sequence models given the training data generated.

Training data is generated based on financial time series data labeled with potential profits using a buy-sell system.

The goal is to create a sequence model that can choose favourable stock charts equal to or better than a human can via traditional technical analysis.

## Import Libraries and Data

In [1]:
import os
import numpy as np

# Define the data directory relative to the script location
data_dir = 'data'

# Define the file paths
sequences_path = os.path.join(data_dir, 'sequences.npy')
labels_path = os.path.join(data_dir, 'labels.npy')
metadata_path = os.path.join(data_dir, 'metadata.npy')

# Load the data
try:
    data_sequences = np.load(sequences_path)
    data_labels = np.load(labels_path)
    data_metadata = np.load(metadata_path)

    # Number of examples to select
    num_examples = 115000

    # Generate a random permutation of indices
    indices = np.random.permutation(len(data_sequences))

    # Select the first `num_examples` indices
    selected_indices = indices[:num_examples]

    # Use the selected indices to create the random subset
    data_sequences = data_sequences[selected_indices, -84:, :]
    data_labels = data_labels[selected_indices]
    data_metadata = data_metadata[selected_indices]

    # Inspect the shape and size of the loaded data before slicing
    print(f'Loaded sequences shape: {data_sequences.shape}')
    print(f'Loaded sequences size: {data_sequences.size}')
    print(f'Loaded labels shape: {data_labels.shape}')
    print(f'Loaded metadata shape: {data_metadata.shape}')

except FileNotFoundError as e:
    print(f"Error loading files: {e}")
except ValueError as e:
    print(f"Value error: {e}")

# Calculate and print the expected total size
expected_total_size = num_examples * 252 * 15
print(f'Expected total size: {expected_total_size}')

# Define relevant columns and indices for normalization
relevant_columns = [
    'Open', 'High', 'Low', 'Close', 'Volume', 'Turnover', 'Consol_Detected',
    'Consol_Len_Bars', 'Consol_Depth_Percent', 'Close_21_bar_ema',
    'Close_50_bar_sma', 'Close_150_bar_sma', 'Close_200_bar_sma',
    'RSL', 'RSL_NH'
]

price_columns_indices = [0, 1, 2, 3, 9, 10, 11, 12]  # Indices of price-related columns in the sequence data

# Map indices to column names
price_columns = [relevant_columns[i] for i in price_columns_indices]

print("\nPrice-related columns:")
for index, column in zip(price_columns_indices, price_columns):
    print(f"Index: {index}, Column: {column}")


Loaded sequences shape: (115000, 84, 15)
Loaded sequences size: 144900000
Loaded labels shape: (115000,)
Loaded metadata shape: (115000, 2)
Expected total size: 434700000

Price-related columns:
Index: 0, Column: Open
Index: 1, Column: High
Index: 2, Column: Low
Index: 3, Column: Close
Index: 9, Column: Close_21_bar_ema
Index: 10, Column: Close_50_bar_sma
Index: 11, Column: Close_150_bar_sma
Index: 12, Column: Close_200_bar_sma


## Data Preprocessing

### NaN Removal

In [2]:
# Replace all NaNs with 0 due to moving averages having insufficient data to compute anything, leaving blank inputs.
# Check if NaNs exist

# Dictionary to map variable names to their corresponding data arrays
data_dict = {
    'data_sequences': data_sequences,
    'data_labels': data_labels,
}

# Using a dictionary to iterate over variables
for var_name, data in data_dict.items():
    num_nans = np.sum(np.isnan(data))
    print(f"NaNs in {var_name}: {num_nans}")

    # Remove NaNs
    if num_nans > 0:
        data_dict[var_name][:] = np.nan_to_num(data)
        num_nans = np.sum(np.isnan(data))
        print(f"NaNs remaining in {var_name} after removal: {num_nans}")

print(f"Data Seq Min: {np.min(data_sequences[:,:,price_columns_indices])}")
print(f"Data Seq Max: {np.max(data_sequences[:,:,price_columns_indices])}")


NaNs in data_sequences: 1471552
NaNs remaining in data_sequences after removal: 0
NaNs in data_labels: 0
Data Seq Min: 0.0
Data Seq Max: 468223510183936.0


### Corrupted sequence removal

99% of stocks I buy will be below 1000, with a few above 1000, although they are important.

I also noticed quite a few training examples have weird price data, which I filter out below.

I noticed with thresholds above 3e3, the max is the threshold, which is very suspect.

The loss of training examples is insignificant, and the result is better normalization of the data and obviously no corrupted sequences.

In [3]:
# Set the threshold for abnormal values based on domain knowledge
threshold = 3.0e3

# Detect all sequences with abnormally large price data
abnormal_sequences = []

# Iterate through each sequence to check for abnormal values
for sequence_index in range(data_sequences.shape[0]):
    # Extract price-related columns for the current sequence
    price_data = data_sequences[sequence_index, :, price_columns_indices]
    
    # Check if any value in the price_data exceeds the threshold
    if np.any(price_data > threshold):
        abnormal_sequences.append(sequence_index)

# Print the indices of the abnormal sequences
print(f"Abnormal Sequence Count: {len(abnormal_sequences)}")
print(f"Indices of abnormal sequences: {abnormal_sequences}")

# Create a mask for sequences that are not abnormal
mask = np.ones(data_sequences.shape[0], dtype=bool)
mask[abnormal_sequences] = False

# Filter out abnormal sequences from data_sequences and data_labels
filtered_data_sequences = data_sequences[mask]
filtered_data_labels = data_labels[mask]

# Print the shape of the filtered data
print(f"Filtered data_sequences shape: {filtered_data_sequences.shape}")
print(f"Filtered data_labels shape: {filtered_data_labels.shape}")

print(f"Data Seq Min: {np.min(filtered_data_sequences[:,:,price_columns_indices])}")
print(f"Data Seq Max: {np.max(filtered_data_sequences[:,:,price_columns_indices])}")


Abnormal Sequence Count: 1040
Indices of abnormal sequences: [234, 247, 274, 544, 589, 638, 666, 745, 839, 849, 1004, 1061, 1095, 1161, 1279, 1462, 1473, 1580, 1591, 1808, 1823, 1824, 2015, 2055, 2125, 2238, 2325, 2511, 2600, 2647, 2683, 2747, 2762, 2771, 2813, 2945, 3073, 3090, 3147, 3180, 3193, 3231, 3553, 3554, 3684, 3778, 4015, 4183, 4230, 4292, 4475, 4617, 4642, 4701, 4747, 4787, 4837, 4951, 5091, 5150, 5262, 5460, 5478, 5824, 5882, 5974, 6278, 6390, 6408, 6452, 6563, 6707, 6882, 6927, 7010, 7150, 7544, 7593, 7601, 7679, 7682, 7768, 7811, 7826, 7977, 8013, 8096, 8113, 8141, 8263, 8452, 8466, 8479, 8647, 8669, 8695, 8915, 9042, 9044, 9071, 9399, 9418, 9560, 9834, 9914, 9981, 10026, 10375, 10564, 10631, 10965, 11113, 11315, 11383, 11658, 12047, 12094, 12206, 12234, 12284, 12313, 12643, 12775, 12814, 12827, 12912, 13034, 13065, 13153, 13241, 13428, 13508, 13642, 13986, 14055, 14115, 14157, 14163, 14415, 14418, 14448, 14763, 14814, 14912, 14913, 14958, 14981, 15106, 15200, 15289, 1531

### Normalization of Training Data

In [4]:
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Indices of price-related columns
price_columns_indices = [0, 1, 2, 3]
ma_columns_indices = [9, 10, 11, 12]

# Extract data shapes
num_sequences, num_timesteps, num_features = data_sequences.shape

# Calculate log transformation for price-related features
price_data = data_sequences[:, :, price_columns_indices]
log_price_data = np.log(price_data + 1e-8)

# Replace the original price-related features with the log values
data_sequences[:, :, price_columns_indices] = log_price_data

# Calculate percentage away from the Close price for moving averages
close_price_data = data_sequences[:, :, 3].reshape(num_sequences, num_timesteps, 1)  # Close price at index 3
ma_data = data_sequences[:, :, ma_columns_indices]

# Avoid division by zero by adding epsilon
epsilon = 1e-8
percentage_away_from_close = (close_price_data - ma_data) / (ma_data + epsilon)

# Replace the original moving average features with the percentage away values
data_sequences[:, :, ma_columns_indices] = percentage_away_from_close

# Handle infinite values by replacing them with NaNs and then replacing NaNs with zero
data_sequences = np.nan_to_num(data_sequences, nan=0.0, posinf=0.0, neginf=0.0)

# Clip extreme values to avoid large outliers
data_sequences = np.clip(data_sequences, -1e3, 1e3)

# Normalize the price-related features together
price_scaler = MinMaxScaler(feature_range=(-1, 1))

# Reshape the price-related features to fit the scaler
original_shape = data_sequences[:, :, price_columns_indices].shape
reshaped_data = data_sequences[:, :, price_columns_indices].reshape(-1, len(price_columns_indices))

# Fit and transform the price-related features
normalized_price_data = price_scaler.fit_transform(reshaped_data)

# Reshape back to the original shape
normalized_price_data = normalized_price_data.reshape(original_shape)

# Replace the original price-related features with the normalized ones
data_sequences[:, :, price_columns_indices] = normalized_price_data

# Normalize the remaining features individually
for feature_index in range(num_features):
    if feature_index not in price_columns_indices and feature_index not in ma_columns_indices:
        # Initialize a new scaler for each feature
        feature_scaler = MinMaxScaler()

        # Extract the feature data
        feature_data = data_sequences[:, :, feature_index].reshape(-1, 1)

        # Fit and transform the scaler
        normalized_feature_data = feature_scaler.fit_transform(feature_data)

        # Reshape back to the original shape
        normalized_feature_data = normalized_feature_data.reshape(num_sequences, num_timesteps)

        # Replace the original feature with the normalized one
        data_sequences[:, :, feature_index] = normalized_feature_data

# Print normalized data sequences to check
print(f"Normalized Data Seq Min: {np.min(data_sequences)}")
print(f"Normalized Data Seq Max: {np.max(data_sequences)}")

# Make the labels a binary decision, rather than a profit
min_profit = 0.1  # implies a good decision is a breakout that produces more than min_profit (*100 for percent, 0.2 = 20%)

data_labels = (data_labels > min_profit).astype(int)

print(f"Data Labels for > {min_profit*100}% Min: {np.min(data_labels)}")
print(f"Data Labels for > {min_profit*100}% Max: {np.max(data_labels)}")

# Count how many labels are 1 and how many are 0
num_ones = np.sum(data_labels)
num_zeros = len(data_labels) - num_ones

print(f"Number of labels that are 1: {num_ones}")
print(f"Number of labels that are 0: {num_zeros}")
print(f"Probability of randomly selecting a stock making {min_profit*100}% is {num_ones/(num_ones+num_zeros)*100}%")


Normalized Data Seq Min: -1000.0
Normalized Data Seq Max: 1000.0
Data Labels for > 10.0% Min: 0
Data Labels for > 10.0% Max: 1
Number of labels that are 1: 21254
Number of labels that are 0: 93746
Probability of randomly selecting a stock making 10.0% is 18.481739130434782%


## Model -> Hyperparameter Tuning

In [5]:
import matplotlib.pyplot as plt
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense, Input, Attention, Flatten, Dropout, Conv1D, MaxPooling1D, GRU, BatchNormalization
from tensorflow.keras.callbacks import EarlyStopping, ModelCheckpoint
from tensorflow.keras.regularizers import l1, l2
from sklearn.model_selection import train_test_split, KFold
from sklearn.utils.class_weight import compute_class_weight
import keras_tuner as kt
import numpy as np
from imblearn.over_sampling import SMOTE
from sklearn.metrics import confusion_matrix

# Check if TensorFlow is using GPU
print("Num GPUs Available: ", len(tf.config.list_physical_devices('GPU')))

# Ensure data_sequences and data_labels are already normalized and available
print("Shape of data_sequences:", data_sequences.shape)
print("Shape of data_labels:", data_labels.shape)

# Indices of the columns to be removed
columns_to_remove = [0, 3, 5, 6, 13]

# Remove specified columns
data_sequences = np.delete(data_sequences, columns_to_remove, axis=2)

# Verify the shapes before modification
print(f"Modified data_sequences shape: {data_sequences.shape}")

# Split data into training, validation, and test sets
test_size = 0.2
val_size = 0.2
X, X_test, y, y_test = train_test_split(data_sequences, data_labels, test_size=test_size, random_state=42)
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=val_size, random_state=42)

print("Shape of X_train:", X_train.shape)
print("Shape of X_val:", X_val.shape)
print("Shape of X_test:", X_test.shape)
print("Shape of y_train:", y_train.shape)
print("Shape of y_val:", y_val.shape)
print("Shape of y_test:", y_test.shape)

# Oversample the minority class using SMOTE
smote = SMOTE(random_state=42)
X_train_reshaped = X_train.reshape((X_train.shape[0], -1))  # Reshape for SMOTE
X_train_resampled, y_train_resampled = smote.fit_resample(X_train_reshaped, y_train)
X_train_resampled = X_train_resampled.reshape((X_train_resampled.shape[0], X_train.shape[1], X_train.shape[2]))

# Calculate class weights
class_weights = compute_class_weight(class_weight='balanced', classes=np.unique(y_train_resampled), y=y_train_resampled)
class_weights = {i: class_weights[i] for i in range(len(class_weights))}

print("Class weights:", class_weights)

class AccuracyReward(tf.keras.metrics.Metric):
    def __init__(self, name='accuracy_reward', **kwargs):
        super(AccuracyReward, self).__init__(name=name, **kwargs)
        self.tp = self.add_weight(name='tp', initializer='zeros')
        self.fp = self.add_weight(name='fp', initializer='zeros')
        self.total = self.add_weight(name='total', initializer='zeros')

    def update_state(self, y_true, y_pred, sample_weight=None):
        y_pred = tf.cast(y_pred > 0.5, tf.int32)
        y_true = tf.cast(y_true, tf.int32)
        
        tp = tf.reduce_sum(tf.cast((y_true == 1) & (y_pred == 1), tf.float32))
        fp = tf.reduce_sum(tf.cast((y_true == 0) & (y_pred == 1), tf.float32))
        total = tf.cast(tf.size(y_true), tf.float32)
        
        self.tp.assign_add(tp)
        self.fp.assign_add(fp)
        self.total.assign_add(total)

    def result(self):
        batting_average = self.tp / (self.tp + self.fp + tf.keras.backend.epsilon())
        opportunities = self.tp + self.fp
        accuracy_reward = batting_average * (opportunities / self.total)
        return accuracy_reward

    def reset_state(self):
        self.tp.assign(0)
        self.fp.assign(0)
        self.total.assign(0)

# Define the custom callback to monitor predictions and display custom metrics
class CustomMetricsCallback(tf.keras.callbacks.Callback):
    def __init__(self, validation_data, model_stop_on_one_outcome=True):
        super(CustomMetricsCallback, self).__init__()
        self.validation_data = validation_data
        self.model_stop_on_one_outcome = model_stop_on_one_outcome

    def on_epoch_end(self, epoch, logs=None):
        X_val, y_val = self.validation_data
        y_pred = (self.model.predict(X_val) > 0.5).astype(int).flatten()
        y_true = y_val.flatten()

        # Check if all predictions are the same
        if np.all(y_pred == y_pred[0]):
            print(f"Stopping early: All predictions are the same at epoch {epoch + 1}.")
            self.model.stop_training = True
            return

        tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()

        total = len(y_true)
        tp_percent = tp / total * 100
        tn_percent = tn / total * 100
        fp_percent = fp / total * 100
        fn_percent = fn / total * 100

        batting_average = tp_percent / (tp_percent + fp_percent) if (tp_percent + fp_percent) != 0 else np.nan

        # Calculate opportunities taken
        opportunities = tp + fp
        accuracy_reward = batting_average * (opportunities / total)

        print(f"Epoch {epoch + 1}:")
        print(f"True Positives: {tp_percent:.2f}%")
        print(f"True Negatives: {tn_percent:.2f}%")
        print(f"False Positives: {fp_percent:.2f}%")
        print(f"False Negatives: {fn_percent:.2f}%")
        print(f"Batting Average: {batting_average:.2f}")
        print(f"Opportunities Taken: {opportunities}")
        print(f"Accuracy Reward: {accuracy_reward:.2f}")

        logs['tp_percent'] = tp_percent
        logs['tn_percent'] = tn_percent
        logs['fp_percent'] = fp_percent
        logs['fn_percent'] = fn_percent
        logs['batting_average'] = batting_average
        logs['opportunities'] = opportunities
        logs['accuracy_reward'] = accuracy_reward

# Define the hypermodel for Keras Tuner
def build_model(hp):
    input_layer = Input(shape=(X_train.shape[1], X_train.shape[2]))

    # Add a Conv1D layer for feature extraction, include as hyperparameter
    if hp.Boolean('use_conv'):
        conv_out = Conv1D(filters=hp.Int('conv_filters', min_value=32, max_value=128, step=32), 
                          kernel_size=hp.Int('conv_kernel_size', min_value=3, max_value=7, step=2), 
                          activation='relu')(input_layer)
        conv_out = MaxPooling1D(pool_size=2)(conv_out)
    else:
        conv_out = input_layer

    # Add LSTM layers
    lstm_out = LSTM(
        units=hp.Int('lstm_units_l1', min_value=64, max_value=256, step=32), 
        return_sequences=True,
        dropout=hp.Float('dropout_rate', min_value=0.1, max_value=0.5, step=0.1),
        kernel_regularizer=l2(hp.Float('l2_regularization', min_value=1e-5, max_value=1e-2, sampling='log')),
        kernel_initializer='glorot_uniform',
        recurrent_initializer='orthogonal',
        bias_initializer='zeros'
    )(conv_out)

    lstm_out = LSTM(
        units=hp.Int('lstm_units_l2', min_value=64, max_value=256, step=32), 
        return_sequences=True,
        dropout=hp.Float('dropout_rate', min_value=0.1, max_value=0.5, step=0.1),
        kernel_regularizer=l2(hp.Float('l2_regularization', min_value=1e-5, max_value=1e-2, sampling='log')),
        kernel_initializer='glorot_uniform',
        recurrent_initializer='orthogonal',
        bias_initializer='zeros'
    )(lstm_out)

    lstm_out = LSTM(
        units=hp.Int('lstm_units_l3', min_value=32, max_value=128, step=32),
        return_sequences=True,
        dropout=hp.Float('dropout_rate', min_value=0.1, max_value=0.5, step=0.1),
        kernel_regularizer=l2(hp.Float('l2_regularization', min_value=1e-5, max_value=1e-2, sampling='log')),
        kernel_initializer='glorot_uniform',
        recurrent_initializer='orthogonal',
        bias_initializer='zeros'
    )(lstm_out)

    # Add Attention layer
    attention = Attention()([lstm_out, lstm_out])  
    attention_flatten = Flatten()(attention)

    dense_out = Dense(
        units=hp.Int('dense_units', min_value=32, max_value=128, step=32), 
        activation='relu',
        kernel_initializer='glorot_uniform',
        bias_initializer='zeros'
    )(attention_flatten)
    
    dense_out = Dropout(
        rate=hp.Float('dropout_rate', min_value=0.1, max_value=0.5, step=0.1)
    )(dense_out)
    
    output_layer = Dense(
        1, 
        activation='sigmoid',
        kernel_initializer='glorot_uniform',
        bias_initializer='zeros'
    )(dense_out)

    model = tf.keras.Model(inputs=input_layer, outputs=output_layer)

    learning_rate = hp.Float('learning_rate', min_value=1e-5, max_value=1e-2, sampling='log')
    decay_steps = 1000
    decay_rate = hp.Float('decay_rate', min_value=0.9, max_value=0.999, sampling='log')
    
    lr_schedule = tf.keras.optimizers.schedules.ExponentialDecay(
        initial_learning_rate=learning_rate,
        decay_steps=decay_steps,
        decay_rate=decay_rate
    )
    
    model.compile(
        optimizer=tf.keras.optimizers.Adam(learning_rate=lr_schedule),
        loss='binary_crossentropy',
        metrics=['accuracy', AccuracyReward()]
    )
    
    return model

# Use BayesianOptimization for hyperparameter tuning
tuner = kt.BayesianOptimization(
    hypermodel=build_model,
    objective=kt.Objective("val_accuracy_reward", direction="max"),
    max_trials=20,  # Increased number of trials
    executions_per_trial=1,
    directory='my_dir',
    project_name='hyperparameter_tuning'
)

early_stopping = EarlyStopping(monitor='val_loss', patience=10, min_delta=1e-5, restore_best_weights=True)

# Perform K-Fold Cross-Validation
k = 5  # Number of folds
kf = KFold(n_splits=k, shuffle=True, random_state=42)

for fold, (train_index, val_index) in enumerate(kf.split(X_train_resampled)):
    print(f"Fold {fold+1}/{k}")
    X_train_fold, X_val_fold = X_train_resampled[train_index], X_train_resampled[val_index]
    y_train_fold, y_val_fold = y_train_resampled[train_index], y_train_resampled[val_index]
    
    # Oversample the minority class in the training fold using SMOTE
    X_train_reshaped = X_train_fold.reshape((X_train_fold.shape[0], -1))
    smote = SMOTE(random_state=42)
    X_train_resampled_fold, y_train_resampled_fold = smote.fit_resample(X_train_reshaped, y_train_fold)
    X_train_resampled_fold = X_train_resampled_fold.reshape((X_train_resampled_fold.shape[0], X_train_fold.shape[1], X_train_fold.shape[2]))

    # Calculate class weights for the current fold
    class_weights_fold = compute_class_weight(class_weight='balanced', classes=np.unique(y_train_resampled_fold), y=y_train_resampled_fold)
    class_weights_fold = {i: class_weights_fold[i] for i in range(len(class_weights_fold))}

    # Create unique checkpoint path
    checkpoint_path = f'best_model_fold_{fold+1}.keras'
    model_checkpoint = ModelCheckpoint(checkpoint_path, monitor='val_loss', save_best_only=True)
    
    # Perform hyperparameter search with the custom callback
    tuner.search(X_train_resampled_fold, y_train_resampled_fold, epochs=50, validation_data=(X_val_fold, y_val_fold), callbacks=[early_stopping, model_checkpoint, CustomMetricsCallback(validation_data=(X_val_fold, y_val_fold))], class_weight=class_weights_fold)

# Extract the best hyperparameters
best_hps = tuner.get_best_hyperparameters(num_trials=1)[0]

print(f"""
Should you use conv? {best_hps.get('use_conv')}
The optimal number of conv1d filters is {best_hps.get('conv_filters')}.
The optimal number of units in the first LSTM layer is {best_hps.get('lstm_units_l1')}.
The optimal number of units in the second LSTM layer is {best_hps.get('lstm_units_l2')}.
The optimal number of units in the third LSTM layer is {best_hps.get('lstm_units_l3')}.
The optimal number of units in the dense layer is {best_hps.get('dense_units')}.
The optimal dropout rate is {best_hps.get('dropout_rate')}.
The optimal L2 regularization rate is {best_hps.get('l2_regularization')}.
The optimal initial learning rate is {best_hps.get('learning_rate')}.
The optimal decay rate is {best_hps.get('decay_rate')}.
""")

# Build the final model using the best hyperparameters
model = tuner.hypermodel.build(best_hps)

# Train the final model on the entire training data
model.fit(X_train_resampled, y_train_resampled, epochs=50, validation_data=(X_val, y_val), callbacks=[early_stopping, ModelCheckpoint('best_model_final.keras', monitor='val_loss', save_best_only=True), CustomMetricsCallback(validation_data=(X_val, y_val))], class_weight=class_weights)

Trial 20 Complete [00h 01m 48s]
val_accuracy_reward: 0.4927482008934021

Best val_accuracy_reward So Far: 0.49390295147895813
Total elapsed time: 08h 19m 35s
Fold 2/5
Fold 3/5
Fold 4/5
Fold 5/5

The optimal number of units in the first LSTM layer is 96.
The optimal number of units in the second LSTM layer is 256.
The optimal number of units in the third LSTM layer is 32.
The optimal number of units in the dense layer is 32.
The optimal dropout rate is 0.1.
The optimal L2 regularization rate is 1.6599485653646474e-05.
The optimal initial learning rate is 0.004138474276706615.
The optimal decay rate is 0.9592295700572687.

Epoch 1/50


W0000 00:00:1718632636.280709  557502 op_level_cost_estimator.cc:699] Error in PredictCost() for the op: op: "Softmax" attr { key: "T" value { type: DT_FLOAT } } inputs { dtype: DT_FLOAT shape { unknown_rank: true } } device { type: "GPU" vendor: "NVIDIA" model: "NVIDIA GeForce RTX 3070 Ti Laptop GPU" frequency: 1410 num_cores: 46 environment { key: "architecture" value: "8.6" } environment { key: "cuda" value: "12000" } environment { key: "cudnn" value: "8800" } num_registers: 65536 l1_cache_size: 24576 l2_cache_size: 4194304 shared_memory_size_per_multiprocessor: 102400 memory_size: 5834276864 bandwidth: 448064000 } outputs { dtype: DT_FLOAT shape { unknown_rank: true } }




W0000 00:00:1718632699.649812  557502 op_level_cost_estimator.cc:699] Error in PredictCost() for the op: op: "Softmax" attr { key: "T" value { type: DT_FLOAT } } inputs { dtype: DT_FLOAT shape { unknown_rank: true } } device { type: "GPU" vendor: "NVIDIA" model: "NVIDIA GeForce RTX 3070 Ti Laptop GPU" frequency: 1410 num_cores: 46 environment { key: "architecture" value: "8.6" } environment { key: "cuda" value: "12000" } environment { key: "cudnn" value: "8800" } num_registers: 65536 l1_cache_size: 24576 l2_cache_size: 4194304 shared_memory_size_per_multiprocessor: 102400 memory_size: 5834276864 bandwidth: 448064000 } outputs { dtype: DT_FLOAT shape { unknown_rank: true } }


  8/575 [..............................] - ETA: 4s  

W0000 00:00:1718632705.287915  557502 op_level_cost_estimator.cc:699] Error in PredictCost() for the op: op: "Softmax" attr { key: "T" value { type: DT_FLOAT } } inputs { dtype: DT_FLOAT shape { unknown_rank: true } } device { type: "GPU" vendor: "NVIDIA" model: "NVIDIA GeForce RTX 3070 Ti Laptop GPU" frequency: 1410 num_cores: 46 environment { key: "architecture" value: "8.6" } environment { key: "cuda" value: "12000" } environment { key: "cudnn" value: "8800" } num_registers: 65536 l1_cache_size: 24576 l2_cache_size: 4194304 shared_memory_size_per_multiprocessor: 102400 memory_size: 5834276864 bandwidth: 448064000 } outputs { dtype: DT_FLOAT shape { unknown_rank: true } }


Stopping early: All predictions are the same at epoch 1.


<keras.src.callbacks.History at 0x7f4681c0d5d0>

In [6]:
import matplotlib.pyplot as plt
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense, Input, Attention, Flatten, Dropout, Conv1D, MaxPooling1D, GRU, BatchNormalization
from tensorflow.keras.callbacks import EarlyStopping, ModelCheckpoint
from tensorflow.keras.regularizers import l1, l2
from sklearn.model_selection import train_test_split, KFold
from sklearn.utils.class_weight import compute_class_weight
import keras_tuner as kt
import numpy as np
from imblearn.over_sampling import SMOTE
from sklearn.metrics import confusion_matrix
import json

# Load trials data
with open('trial_4.json') as f:
    trial_4 = json.load(f)

with open('trial_5.json') as f:
    trial_5 = json.load(f)

# Add more trials if needed
# with open('trial_X.json') as f:
#     trial_X = json.load(f)

# Function to get hyperparameters from trial
def get_hyperparameters(trial):
    return {
        "use_conv": trial['hyperparameters']['values']['use_conv'],
        "lstm_units_l1": trial['hyperparameters']['values']['lstm_units_l1'],
        "dropout_rate": trial['hyperparameters']['values']['dropout_rate'],
        "l2_regularization": trial['hyperparameters']['values']['l2_regularization'],
        "lstm_units_l2": trial['hyperparameters']['values']['lstm_units_l2'],
        "lstm_units_l3": trial['hyperparameters']['values']['lstm_units_l3'],
        "dense_units": trial['hyperparameters']['values']['dense_units'],
        "learning_rate": trial['hyperparameters']['values']['learning_rate'],
        "decay_rate": trial['hyperparameters']['values']['decay_rate'],
        "conv_filters": trial['hyperparameters']['values']['conv_filters'],
        "conv_kernel_size": trial['hyperparameters']['values']['conv_kernel_size']
    }

# Select trial to run
trial_to_run = 4
if trial_to_run == 4:
    hyperparameters = get_hyperparameters(trial_4)
elif trial_to_run == 5:
    hyperparameters = get_hyperparameters(trial_5)
# Add more trials if needed
# elif trial_to_run == X:
#     hyperparameters = get_hyperparameters(trial_X)

# Check if TensorFlow is using GPU
print("Num GPUs Available: ", len(tf.config.list_physical_devices('GPU')))

# Ensure data_sequences and data_labels are already normalized and available
print("Shape of data_sequences:", data_sequences.shape)
print("Shape of data_labels:", data_labels.shape)

# Indices of the columns to be removed
columns_to_remove = [0, 3, 5, 6, 13]

# Remove specified columns
#data_sequences = np.delete(data_sequences, columns_to_remove, axis=2)

# Verify the shapes before modification
print(f"Modified data_sequences shape: {data_sequences.shape}")

# Split data into training, validation, and test sets
test_size = 0.2
val_size = 0.2
X, X_test, y, y_test = train_test_split(data_sequences, data_labels, test_size=test_size, random_state=42)
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=val_size, random_state=42)

print("Shape of X_train:", X_train.shape)
print("Shape of X_val:", X_val.shape)
print("Shape of X_test:", X_test.shape)
print("Shape of y_train:", y_train.shape)
print("Shape of y_val:", y_val.shape)
print("Shape of y_test:", y_test.shape)

# Oversample the minority class using SMOTE
smote = SMOTE(random_state=42)
X_train_reshaped = X_train.reshape((X_train.shape[0], -1))  # Reshape for SMOTE
X_train_resampled, y_train_resampled = smote.fit_resample(X_train_reshaped, y_train)
X_train_resampled = X_train_resampled.reshape((X_train_resampled.shape[0], X_train.shape[1], X_train.shape[2]))

# Calculate class weights
class_weights = compute_class_weight(class_weight='balanced', classes=np.unique(y_train_resampled), y=y_train_resampled)
class_weights = {i: class_weights[i] for i in range(len(class_weights))}

print("Class weights:", class_weights)

class AccuracyReward(tf.keras.metrics.Metric):
    def __init__(self, name='accuracy_reward', **kwargs):
        super(AccuracyReward, self).__init__(name=name, **kwargs)
        self.tp = self.add_weight(name='tp', initializer='zeros')
        self.fp = self.add_weight(name='fp', initializer='zeros')
        self.total = self.add_weight(name='total', initializer='zeros')

    def update_state(self, y_true, y_pred, sample_weight=None):
        y_pred = tf.cast(y_pred > 0.5, tf.int32)
        y_true = tf.cast(y_true, tf.int32)
        
        tp = tf.reduce_sum(tf.cast((y_true == 1) & (y_pred == 1), tf.float32))
        fp = tf.reduce_sum(tf.cast((y_true == 0) & (y_pred == 1), tf.float32))
        total = tf.cast(tf.size(y_true), tf.float32)
        
        self.tp.assign_add(tp)
        self.fp.assign_add(fp)
        self.total.assign_add(total)

    def result(self):
        batting_average = self.tp / (self.tp + self.fp + tf.keras.backend.epsilon())
        opportunities = self.tp + self.fp
        optimal_trades = 4000  # Optimal number of trades
        deviation = tf.abs(opportunities - optimal_trades)
        reward_factor = tf.exp(-deviation / optimal_trades)
        accuracy_reward = batting_average * reward_factor
        return accuracy_reward

    def reset_state(self):
        self.tp.assign(0)
        self.fp.assign(0)
        self.total.assign(0)

# Define the custom callback to monitor predictions and display custom metrics
class CustomMetricsCallback(tf.keras.callbacks.Callback):
    def __init__(self, validation_data, model_stop_on_one_outcome=True):
        super(CustomMetricsCallback, self).__init__()
        self.validation_data = validation_data
        self.model_stop_on_one_outcome = model_stop_on_one_outcome

    def on_epoch_end(self, epoch, logs=None):
        X_val, y_val = self.validation_data
        y_pred = (self.model.predict(X_val) > 0.5).astype(int).flatten()
        y_true = y_val.flatten()

        # Check if all predictions are the same
        if np.all(y_pred == y_pred[0]):
            print(f"Stopping early: All predictions are the same at epoch {epoch + 1}.")
            self.model.stop_training = True
            return

        tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()

        total = len(y_true)
        tp_percent = tp / total * 100
        tn_percent = tn / total * 100
        fp_percent = fp / total * 100
        fn_percent = fn / total * 100

        batting_average = tp_percent / (tp_percent + fp_percent) if (tp_percent + fp_percent) != 0 else np.nan

        # Calculate opportunities taken
        opportunities = tp + fp
        accuracy_reward = batting_average * tf.exp(-tf.abs(opportunities - 4000) / 4000)  # Adjusted accuracy reward

        print(f"Epoch {epoch + 1}:")
        print(f"True Positives: {tp_percent:.2f}%")
        print(f"True Negatives: {tn_percent:.2f}%")
        print(f"False Positives: {fp_percent:.2f}%")
        print(f"False Negatives: {fn_percent:.2f}%")
        print(f"Batting Average: {batting_average:.2f}")
        print(f"Opportunities Taken: {opportunities}")
        print(f"Accuracy Reward: {accuracy_reward:.2f}")

        logs['tp_percent'] = tp_percent
        logs['tn_percent'] = tn_percent
        logs['fp_percent'] = fp_percent
        logs['fn_percent'] = fn_percent
        logs['batting_average'] = batting_average
        logs['opportunities'] = opportunities
        logs['accuracy_reward'] = accuracy_reward

# Build the model with the selected hyperparameters
def build_model(hyperparameters):
    input_layer = Input(shape=(X_train.shape[1], X_train.shape[2]))

    if hyperparameters["use_conv"]:
        conv_out = Conv1D(filters=hyperparameters["conv_filters"], 
                          kernel_size=hyperparameters["conv_kernel_size"], 
                          activation='relu')(input_layer)
        conv_out = MaxPooling1D(pool_size=2)(conv_out)
    else:
        conv_out = input_layer

    lstm_out = LSTM(
        units=hyperparameters["lstm_units_l1"], 
        return_sequences=True,
        dropout=hyperparameters["dropout_rate"],
        kernel_regularizer=l2(hyperparameters["l2_regularization"]),
        kernel_initializer='glorot_uniform',
        recurrent_initializer='orthogonal',
        bias_initializer='zeros'
    )(conv_out)

    lstm_out = LSTM(
        units=hyperparameters["lstm_units_l2"], 
        return_sequences=True,
        dropout=hyperparameters["dropout_rate"],
        kernel_regularizer=l2(hyperparameters["l2_regularization"]),
        kernel_initializer='glorot_uniform',
        recurrent_initializer='orthogonal',
        bias_initializer='zeros'
    )(lstm_out)

    lstm_out = LSTM(
        units=hyperparameters["lstm_units_l3"], 
        return_sequences=True,
        dropout=hyperparameters["dropout_rate"],
        kernel_regularizer=l2(hyperparameters["l2_regularization"]),
        kernel_initializer='glorot_uniform',
        recurrent_initializer='orthogonal',
        bias_initializer='zeros'
    )(lstm_out)

    attention = Attention()([lstm_out, lstm_out])  
    attention_flatten = Flatten()(attention)

    dense_out = Dense(
        units=hyperparameters["dense_units"], 
        activation='relu',
        kernel_initializer='glorot_uniform',
        bias_initializer='zeros'
    )(attention_flatten)
    
    dense_out = Dropout(
        rate=hyperparameters["dropout_rate"]
    )(dense_out)
    
    output_layer = Dense(
        1, 
        activation='sigmoid',
        kernel_initializer='glorot_uniform',
        bias_initializer='zeros'
    )(dense_out)

    model = tf.keras.Model(inputs=input_layer, outputs=output_layer)

    decay_steps = 1000
    lr_schedule = tf.keras.optimizers.schedules.ExponentialDecay(
        initial_learning_rate=hyperparameters["learning_rate"],
        decay_steps=decay_steps,
        decay_rate=hyperparameters["decay_rate"]
    )
    
    model.compile(
        optimizer=tf.keras.optimizers.Adam(learning_rate=lr_schedule),
        loss='binary_crossentropy',
        metrics=['accuracy', AccuracyReward()]
    )
    
    return model

# Build the final model using the selected hyperparameters
model = build_model(hyperparameters)

# Train the final model on the entire training data
model.fit(X_train_resampled, y_train_resampled, epochs=200, validation_data=(X_val, y_val), callbacks=[EarlyStopping(monitor='val_loss', patience=10, min_delta=1e-5, restore_best_weights=True), ModelCheckpoint('best_model_final.keras', monitor='val_loss', save_best_only=True), CustomMetricsCallback(validation_data=(X_val, y_val))], class_weight=class_weights)


FileNotFoundError: [Errno 2] No such file or directory: 'trial_4.json'

# Tuning results

| Trial Number | Validation Accuracy | Batting Average | Validation Accuracy Reward |
|--------------|---------------------|-----------------|----------------------------|
| 00           | 0.613756            | 0.712339        | 0.382110                   |
| 01           | 0.516621            | 0.505742        | 0.463459                   |
| 02           | 0.733901            | 0.712339        | 0.382110                   |
| 03           | 0.523929            | 0.509944        | 0.457237                   |
| 04           | 0.757245            | 0.770901        | 0.357346                   |
| 05           | 0.765639            | 0.751111        | 0.388165                   |
| 06           | 0.586869            | 0.565           | 0.3460                     |
| 07           | 0.493903            | Not available   | 0.225436                   |
| 08           | 0.493903            | Not available   | 0.236490                   |
| 09           | 0.765639            | 0.751111        | 0.388165                   |
| 10           | 0.582828            | 0.5708          | 0.3042                     |
| 11           | 0.589697            | 0.5781          | 0.3052                     |
| 12           | 0.589026            | 0.5737          | 0.3182                     |
| 13           | 0.748241            | 0.509944        | 0.457237                   |
| 14           | 0.757733            | 0.357346        | 0.357346                   |
| 15           | 0.523929            | 0.509944        | 0.457237                   |
| 16           | 0.586869            | 0.5650          | 0.3460                     |
| 17           | 0.493903            | Not available   | 0.225436                   |
| 18           | 0.586869            | 0.5650          | 0.3460                     |


# Model -> Training