<a href="https://colab.research.google.com/github/sachinbabuantony/AOG/blob/main/informatonics_deep_filter.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

### Step 1: Load Your Data

First, we'll load the provided CSV file into a pandas DataFrame. This allows us to inspect the data and prepare it for further processing.

In [1]:
import pandas as pd

# Load the CSV file into a DataFrame
df = pd.read_csv('/content/Asia_AMD_PO3_60%_win_rate_CME_MINI_MNQ1!_2026-02-28 (1).csv')

# Display the first 5 rows to get a glimpse of the data
display(df.head())

Unnamed: 0,Trade #,Type,Date and time,Signal,Price USD,Position size (qty),Position size (value),Net P&L USD,Net P&L %,Favorable excursion USD,Favorable excursion %,Adverse excursion USD,Adverse excursion %,Cumulative P&L USD,Cumulative P&L %
0,1,Exit long,2020-03-03 14:54,Long Exit,8763.75,1,17802.0,-274.5,-1.54,32.5,0.18,-274.5,-1.54,-274.5,-0.03
1,1,Entry long,2020-03-03 08:42,Long,8901.0,1,17802.0,-274.5,-1.54,32.5,0.18,-274.5,-1.54,-274.5,-0.03
2,2,Exit long,2020-03-09 16:44,Long Exit,8112.5,1,16368.0,-143.0,-0.87,112.5,0.69,-143.0,-0.87,-417.5,-0.04
3,2,Entry long,2020-03-09 15:36,Long,8184.0,1,16368.0,-143.0,-0.87,112.5,0.69,-143.0,-0.87,-417.5,-0.04
4,3,Exit short,2020-03-10 18:46,Short Exit,8181.25,1,16062.5,-300.0,-1.87,199.5,1.24,-300.0,-1.87,-717.5,-0.07


### Step 2: Preprocess for Deep Learning

We'll prepare the data by defining our target variable (win/loss), selecting relevant features, and then scaling these features to ensure optimal performance for our deep learning model. First, let's convert the `Net P&L USD` into a binary `outcome` (1 for win, 0 for loss).

In [2]:
from sklearn.preprocessing import StandardScaler

# Create a binary target variable: 1 for win (P&L > 0), 0 for loss (P&L <= 0)
df['outcome'] = (df['Net P&L USD'] > 0).astype(int)

# Define feature columns - initially selecting numerical columns that might influence trade outcome
feature_cols = [
    'Price USD',
    'Position size (qty)',
    'Position size (value)',
    'Favorable excursion USD',
    'Favorable excursion %',
    'Adverse excursion USD',
    'Adverse excursion %'
]

# Extract features and target
X = df[feature_cols].values
y = df['outcome'].values

# Initialize and apply StandardScaler to features
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)

print(f"Shape of scaled features (X_scaled): {X_scaled.shape}")
print(f"Shape of target variable (y): {y.shape}")
print("First 5 rows of scaled features:\n", X_scaled[:5])
print("First 5 values of target variable:\n", y[:5])

Shape of scaled features (X_scaled): (1868, 7)
Shape of target variable (y): (1868,)
First 5 rows of scaled features:
 [[-1.68005432  0.         -1.6486044  -0.81089023 -0.47532077 -2.42176518
  -4.46926255]
 [-1.6489925   0.         -1.6486044  -0.81089023 -0.47532077 -2.42176518
  -4.46926255]
 [-1.82744237  0.         -1.81081214  0.17065358  1.19575677 -0.65848325
  -2.032007  ]
 [-1.8112608   0.         -1.81081214  0.17065358  1.19575677 -0.65848325
  -2.032007  ]
 [-1.81188317  0.         -1.84536895  1.23808248  2.99789922 -2.76369438
  -5.66970185]]
First 5 values of target variable:
 [0 0 0 0 0]
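
The second column of the scaled output is 0 in every row shown, which is consistent with `Position size (qty)` being constant at 1. As a minimal sketch of what the scaling does (assuming the usual standardization $z = (x - \mu) / \sigma$ with a population standard deviation, and a divisor of 1 for constant columns), the same arithmetic can be reproduced in NumPy on the five `Price USD` and `Position size (qty)` values from `df.head()`. The numbers differ from the notebook output because the statistics here come from five rows only, not the full file.

```python
import numpy as np

# 'Price USD' and 'Position size (qty)' for the five rows shown by df.head()
X_head = np.array([
    [8763.75, 1.0],
    [8901.00, 1.0],
    [8112.50, 1.0],
    [8184.00, 1.0],
    [8181.25, 1.0],
])

mean = X_head.mean(axis=0)
std = X_head.std(axis=0)                 # population std (ddof=0)
std_safe = np.where(std == 0, 1.0, std)  # constant column: divide by 1, not 0

X_head_scaled = (X_head - mean) / std_safe

print(X_head_scaled[:, 1])                    # constant column becomes all zeros
print(X_head_scaled[:, 0].mean().round(6))    # scaled price column is centered
```

A constant column carries no information for the model, so dropping it from `feature_cols` would be a reasonable simplification.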


### Step 3: Build an LSTM/GRU Model (TensorFlow)

Now, let's construct our deep learning model using TensorFlow. We'll implement a GRU-based architecture, which is well-suited for sequential data like time series, and compile it for binary classification.

In [3]:
import tensorflow as tf
from tensorflow.keras import layers

# n_features is the second dimension of X_scaled.
# SEQ_LEN is only a placeholder here; the model is rebuilt with the real
# window length once the sequences have been created.
n_features = X_scaled.shape[1]
SEQ_LEN = 1  # placeholder

model = tf.keras.Sequential([
    layers.GRU(64, return_sequences=True, input_shape=(SEQ_LEN, n_features)),
    layers.GRU(32),
    layers.Dense(1, activation='sigmoid')
])

model.compile(
    loss='binary_crossentropy',
    optimizer='adam',
    metrics=['accuracy']
)

model.summary()

  super().__init__(**kwargs)
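
The `model.summary()` table is not present in the output above. As a rough cross-check, the trainable parameter counts can be worked out by hand. This sketch assumes the Keras default `reset_after=True` for `GRU`, which gives each layer an input kernel, a recurrent kernel, and two bias vectors per weight block; the helper names `gru_params` and `dense_params` are hypothetical.

```python
def gru_params(units, input_dim):
    # Three weight blocks (update gate, reset gate, candidate state), each with
    # an input kernel, a recurrent kernel, and two bias vectors (reset_after=True).
    return 3 * (units * (input_dim + units) + 2 * units)

def dense_params(units, input_dim):
    # One weight per input plus one bias, per output unit
    return units * (input_dim + 1)

n_features = 7  # from X_scaled.shape[1]

counts = {
    "gru_64": gru_params(64, n_features),
    "gru_32": gru_params(32, 64),
    "dense_1": dense_params(1, 32),
}
total = sum(counts.values())
print(counts, total)
# {'gru_64': 14016, 'gru_32': 9408, 'dense_1': 33} 23457
```

None of these counts depend on `SEQ_LEN`, so the placeholder value of 1 changes the expected input shape but not the size of the model.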


# Task
Prepare the `X_scaled` features and `y` target variable for a GRU model by transforming them into sequential data with a `SEQ_LEN` of 10. Then, split this sequential data into training and validation sets using an 80/20 ratio. Finally, train the already defined GRU model with the training data and evaluate its performance on the validation set.

## Prepare Data for Sequential Input

### Subtask:
Transform the scaled features `X_scaled` and target variable `y` into sequential data with a defined `SEQ_LEN` of 10, suitable for the GRU model. This involves creating a function for sequence windowing and applying it.


**Reasoning**:
I need to implement the `create_sequences` function as described in the instructions to transform the scaled features and target variable into sequential data suitable for an LSTM/GRU model. This involves defining `SEQ_LEN`, writing the function logic for windowing, and then applying it to the existing `X_scaled` and `y` data.



In [4]:
import numpy as np

# Window length: each sample is SEQ_LEN consecutive rows
SEQ_LEN = 10

def create_sequences(data, targets, seq_len):
    """Build overlapping windows of seq_len rows, each paired with the target of the next row."""
    X_seq = []
    y_seq = []
    for i in range(len(data) - seq_len):
        X_seq.append(data[i:i+seq_len])
        y_seq.append(targets[i+seq_len])  # target of the row right after the window

    # Stack the lists into arrays of shape (samples, seq_len, features) and (samples,)
    return np.array(X_seq), np.array(y_seq)

# Apply the windowing to the scaled features and binary targets
X_seq, y_seq = create_sequences(X_scaled, y, SEQ_LEN)

# Inspect the resulting shapes and the first few samples
print(f"Shape of sequential features (X_seq): {X_seq.shape}")
print(f"Shape of sequential target variable (y_seq): {y_seq.shape}")
print("First 2 sequences of features:\n", X_seq[:2])
print("First 2 values of sequential target variable:\n", y_seq[:2])

Shape of sequential features (X_seq): (1858, 10, 7)
Shape of sequential target variable (y_seq): (1858,)
First 2 sequences of features:
 [[[-1.68005432  0.         -1.6486044  -0.81089023 -0.47532077
   -2.42176518 -4.46926255]
  [-1.6489925   0.         -1.6486044  -0.81089023 -0.47532077
   -2.42176518 -4.46926255]
  [-1.82744237  0.         -1.81081214  0.17065358  1.19575677
   -0.65848325 -2.032007  ]
  [-1.8112608   0.         -1.81081214  0.17065358  1.19575677
   -0.65848325 -2.032007  ]
  [-1.81188317  0.         -1.84536895  1.23808248  2.99789922
   -2.76369438 -5.66970185]
  [-1.84583051  0.         -1.84536895  1.23808248  2.99789922
   -2.76369438 -5.66970185]
  [-1.96328832  0.         -2.0102349   3.93732796  8.3387941
    0.24662345 -0.7588138 ]
  [-2.01075803  0.         -2.0102349   3.93732796  8.3387941
    0.24662345 -0.7588138 ]
  [-2.11152506  0.         -2.06481317  3.79623103  8.40432655
    0.89025488  0.4416255 ]
  [-2.06535667  0.         -2.06481317  3.7962
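
The windowing can also be written without a Python loop. The sketch below is a hypothetical vectorized equivalent of `create_sequences` built on NumPy's `sliding_window_view`, run on toy data. The toy layout assumes the pattern visible in `df.head()`, where every trade occupies two consecutive rows (exit and entry) with identical P&L values, continues through the whole file; the single toy feature is the trade number so that the alignment is easy to read.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def create_sequences_vectorized(data, targets, seq_len):
    """Hypothetical vectorized equivalent of create_sequences."""
    # The window axis is appended last: (N - seq_len + 1, n_features, seq_len)
    windows = sliding_window_view(data, window_shape=seq_len, axis=0)
    # Reorder to (samples, seq_len, n_features) and drop the final window,
    # which has no following row to use as a target.
    X_seq = windows.transpose(0, 2, 1)[:-1]
    y_seq = targets[seq_len:]
    return X_seq, y_seq

# Toy stand-in: 6 trades, two rows each, trade number as the only feature
trade_no = np.repeat(np.arange(1, 7), 2).astype(float).reshape(-1, 1)
toy_y = np.repeat([0, 1, 0, 0, 1, 1], 2)

X_toy, y_toy = create_sequences_vectorized(trade_no, toy_y, seq_len=4)
print(X_toy.shape, y_toy.shape)  # (8, 4, 1) (8,)

# Does the last row of each window belong to the same trade as the target row?
target_trade = trade_no[4:, 0]
same_trade = X_toy[:, -1, 0] == target_trade
print(same_trade.tolist())
# [False, True, False, True, False, True, False, True]
```

Under that two-rows-per-trade assumption, every second window ends on a row from the same trade whose outcome it is asked to predict. Collapsing the data to one row per trade before windowing would avoid this overlap.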

**Reasoning**:
The previous step successfully transformed the data into sequential format. Now, as per the task, the sequential data needs to be split into training and validation sets using an 80/20 ratio to prepare for model training.



In [5]:
# Chronological split (no shuffling): the first 80% of windows are used for
# training and the remaining 20% for validation.
TRAIN_SPLIT_RATIO = 0.8

# Calculate the number of samples for training
train_size = int(len(X_seq) * TRAIN_SPLIT_RATIO)

# Split the sequential data into training and validation sets
X_train, X_val = X_seq[:train_size], X_seq[train_size:]
y_train, y_val = y_seq[:train_size], y_seq[train_size:]

# Print the shapes of the training and validation sets
print(f"Shape of X_train: {X_train.shape}")
print(f"Shape of y_train: {y_train.shape}")
print(f"Shape of X_val: {X_val.shape}")
print(f"Shape of y_val: {y_val.shape}")

Shape of X_train: (1486, 10, 7)
Shape of y_train: (1486,)
Shape of X_val: (372, 10, 7)
Shape of y_val: (372,)
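
The shapes above follow directly from the row count and the window length. The arithmetic is sketched below with a hypothetical helper, `chronological_split`, which is not part of the notebook. The sketch also counts how many rows the last training window shares with the first validation window, since consecutive windows overlap.

```python
def chronological_split(n_samples, train_ratio=0.8):
    """Return (train_size, val_size) for an unshuffled split."""
    train_size = int(n_samples * train_ratio)
    return train_size, n_samples - train_size

n_rows, seq_len = 1868, 10
n_windows = n_rows - seq_len           # one window per row that has a "next" target
train_size, val_size = chronological_split(n_windows)
print(n_windows, train_size, val_size)  # 1858 1486 372

# Window k covers rows k .. k + seq_len - 1
last_train_rows = set(range(train_size - 1, train_size - 1 + seq_len))
first_val_rows = set(range(train_size, train_size + seq_len))
shared = len(last_train_rows & first_val_rows)
print(shared)  # 9
```

Because windows on either side of the boundary share rows, leaving a gap of `seq_len` windows between the two sets is one way to keep them fully disjoint.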


**Reasoning**:
The data has been preprocessed, transformed into sequential data, and split into training and validation sets. The next step is to train the previously defined GRU model using the `X_train` and `y_train` data, and then evaluate its performance on the `X_val` and `y_val` sets as per the main task.



In [6]:
import tensorflow as tf

# Rebuild and recompile the model. It was first defined with the placeholder
# SEQ_LEN = 1, so the input shape must now be set to (SEQ_LEN, n_features) = (10, 7).
model = tf.keras.Sequential([
    tf.keras.layers.GRU(64, return_sequences=True, input_shape=(SEQ_LEN, n_features)),
    tf.keras.layers.GRU(32),
    tf.keras.layers.Dense(1, activation='sigmoid')
])

model.compile(
    loss='binary_crossentropy',
    optimizer='adam',
    metrics=['accuracy']
)

print("Model re-compiled with updated input shape. Summary:")
model.summary()

# Train the model
print("\nStarting model training...")
history = model.fit(
    X_train,
    y_train,
    epochs=20, # You can adjust the number of epochs
    batch_size=32, # You can adjust the batch size
    validation_data=(X_val, y_val),
    verbose=1
)

# Evaluate the model on the validation set
loss, accuracy = model.evaluate(X_val, y_val, verbose=0)
print(f"\nModel Evaluation on Validation Set:")
print(f"Loss: {loss:.4f}")
print(f"Accuracy: {accuracy:.4f}")

print("Training complete and evaluation performed.")

Model re-compiled with updated input shape. Summary:


  super().__init__(**kwargs)



Starting model training...
Epoch 1/20
47/47 ━━━━━━━━━━━━━━━━━━━━ 6s 26ms/step - accuracy: 0.5156 - loss: 0.6982 - val_accuracy: 0.4892 - val_loss: 0.7041
Epoch 2/20
47/47 ━━━━━━━━━━━━━━━━━━━━ 1s 15ms/step - accuracy: 0.6511 - loss: 0.6518 - val_accuracy: 0.5457 - val_loss: 0.6869
Epoch 3/20
47/47 ━━━━━━━━━━━━━━━━━━━━ 1s 15ms/step - accuracy: 0.6607 - loss: 0.6244 - val_accuracy: 0.5726 - val_loss: 0.6917
Epoch 4/20
47/47 ━━━━━━━━━━━━━━━━━━━━ 1s 15ms/step - accuracy: 0.7332 - loss: 0.5404 - val_accuracy: 0.5887 - val_loss: 0.7122
Epoch 5/20
47/47 ━━━━━━━━━━━━━━━━━━━━ 1s 15ms/step - accuracy: 0.7098 - loss: 0.5138 - val_accuracy: 0.6425 - val_loss: 0.6759
Epoch 6/20
47/47 ━━━━━━━━━━━━━━━━━━━━ 1s 15ms/step - accuracy: 0.7227 - loss: 0.4974 - val_accuracy: 0.6855 - val_loss: 0.6328
Epoch 
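
Two numbers in the training log can be checked by hand: the 47 steps per epoch and the starting loss of roughly 0.69. The sketch below computes binary cross-entropy in NumPy for a model that outputs 0.5 for every sample, which equals $\ln 2 \approx 0.6931$. A first-epoch loss near that value suggests the untrained model starts close to chance. The function `binary_crossentropy` here is a hand-written illustration with an assumed clipping epsilon, not the Keras implementation.

```python
import math
import numpy as np

def binary_crossentropy(y_true, p_pred, eps=1e-7):
    """Mean binary cross-entropy; eps is an assumed clipping value."""
    p = np.clip(p_pred, eps, 1 - eps)  # avoid log(0)
    return float(np.mean(-(y_true * np.log(p) + (1 - y_true) * np.log(1 - p))))

# Any label mix gives the same loss when every prediction is 0.5
y_true = np.array([1, 0, 1, 0])
chance_loss = binary_crossentropy(y_true, np.full(4, 0.5))
print(round(chance_loss, 4))  # 0.6931

# Steps per epoch: 1486 training windows in batches of 32
steps_per_epoch = math.ceil(1486 / 32)
print(steps_per_epoch)  # 47
```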

**Reasoning**:
The previous code executed successfully, and the model was trained and evaluated without any critical errors. The UserWarning regarding `input_shape` is a suggestion for best practice in Keras and does not prevent the model from functioning correctly. Therefore, no changes are needed for the code.



**Reasoning**:
The previous code block executed successfully, training the GRU model and evaluating its performance on the validation set. The task is complete as per the instructions.




## Summary:

### Q&A
The `X_scaled` features and `y` target variable were transformed into sequential data with a `SEQ_LEN` of 10, then split chronologically into training and validation sets using an 80/20 ratio. The trained GRU model reached a validation accuracy of approximately 70.43% and a validation loss of approximately 0.5237.

### Data Analysis Key Findings
*   The `X_scaled` features and `y` target variable were successfully transformed into sequential data, resulting in `X_seq` with a shape of (1858, 10, 7) and `y_seq` with a shape of (1858,).
*   The sequential data was split into training and validation sets with an 80/20 ratio, yielding `X_train` of shape (1486, 10, 7), `y_train` of shape (1486,), `X_val` of shape (372, 10, 7), and `y_val` of shape (372,).
*   The GRU model was trained and evaluated successfully, achieving a final validation accuracy of approximately 70.43% and a validation loss of approximately 0.5237.

### Insights or Next Steps
*   The validation accuracy of about 70% serves as a baseline; hyperparameter tuning or other model architectures may improve on it.
*   The Keras `UserWarning` concerns the `input_shape` argument passed to the first `GRU` layer. Declaring the input with a `tf.keras.Input(shape=(SEQ_LEN, n_features))` object as the first layer of the `Sequential` model is the recommended form and avoids the warning.
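
One way to put the baseline in context is to compare it with always predicting the majority class of the training set. The CSV filename suggests a strategy win rate of around 60%, so a naive predictor may already score well above 50%. The sketch below uses made-up toy labels purely to show the calculation; the helper `majority_baseline_accuracy` is hypothetical, and in the notebook it would be called with `y_train` and `y_val`.

```python
import numpy as np

def majority_baseline_accuracy(y_train, y_val):
    """Accuracy on y_val when always predicting the most common label in y_train."""
    majority = int(np.mean(y_train) >= 0.5)
    return float(np.mean(y_val == majority))

# Toy labels for illustration only (not taken from the trade data)
toy_train = np.array([1, 1, 1, 0, 0, 1, 0, 1])
toy_val = np.array([1, 0, 1, 1, 0])

baseline = majority_baseline_accuracy(toy_train, toy_val)
print(baseline)  # 0.6
```

A model is only adding information when its validation accuracy clearly exceeds this figure.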
