# Option I: Predicting Onset of Rainy Season Using Machine Learning Models

This notebook demonstrates the use of ML models to predict the onset of the rainy season based on daily rainfall data. We will explore feature engineering, supervised learning approaches, and model evaluation techniques.

## Onset definition 
The definition of the onset is looking at a significantly wet event (e.g. 20mm in 3 days) that is not followed by a dry spell (e.g. 7-day dry spell in the following 21 days). The actual date is the first wet day of the wet event. The onset date is computed on the fly for each year according to the definition and is expressed in days since an early start date (e.g., Feb 1st). The onset date is searched from that early start date and for a certain number of following days (e.g. 60 days). The early start date serves as a reference and should be picked so that it is ahead of the expected onset date.

- Steps to compute onset
  - Initialize Parameters: Define the early start date, the number of days to check for the onset, and the criteria for the onset (e.g., 20 mm in 3 days without a 7-day dry spell in the following 21 days).
  - Loop Through Each Year: For each year, begin the search for the onset from the early start date and check for a wet event.
  - Check for Wet Event: The onset is identified when there’s a cumulative rainfall of 20 mm over 3 consecutive days.
  - Check for Dry Spell: After detecting a wet event, ensure that there is no 7-day dry spell in the subsequent 21 days.- Store Onset Date: If both conditions are met, record the first day of the wet event as the onset date.

In [28]:
import pandas as pd
import numpy as np

## Loading and preprocessing data

In [45]:
# Load the daily rainfall data
rainfall_df = pd.read_csv('/Users/jemal/Desktop/Bootcamp_UK/Group_Project/EDACaP040706_daily.csv')
rainfall_df['date'] = pd.to_datetime(rainfall_df[['year', 'month', 'day']])
rainfall_df['year'] = rainfall_df['date'].dt.year

#### Onset detection function

In [46]:
def detect_onset(df, early_start, max_search_days, wet_event_threshold=20, dry_spell_days=5, dry_spell_window=21):
    """
    Detects the onset date based on the definition:
    - Cumulative rainfall of 20mm over 3 consecutive days (wet event).
    - No 7-day dry spell within the following 21 days.
    """
    onset_date = None
    # Subset the data to the search period (early start date + 60 days)
    df = df[(df['date'] >= early_start) & (df['date'] < early_start + pd.Timedelta(days=max_search_days))].reset_index(drop=True)
    
    for i in range(len(df) - 2):  # Loop through rows to check for wet events
        # Check if the cumulative rainfall over 3 days exceeds the threshold
        wet_event = df.loc[i:i+2, 'prec'].sum() >= wet_event_threshold
        
        if wet_event:
            # Now check if there's no 7-day dry spell in the next 21 days
            future_rainfall = df.loc[i+3:i+3+dry_spell_window, 'prec']
            dry_spell = (future_rainfall.rolling(dry_spell_days).sum() == 2).any()
            
            if not dry_spell:  # If no dry spell, we've found the onset
                onset_date = df.loc[i, 'date']  # The first wet day
                break
    
    return onset_date


# Detect onset for each year
# Parameters
early_start = pd.Timestamp("1981-04-01")  # Early start date, e.g., February 1st
max_search_days = 60  # Search for onset within 60 days from early start

# Initialize a list to store onset dates for each year
onset_dates = []

# Group by year and detect onset for each year
for year, group in rainfall_df.groupby('year'):
    early_start_year = pd.Timestamp(f'{year}-04-01')  # Adjust for each year
    onset = detect_onset(group, early_start_year, max_search_days)
    onset_days_since_start = (onset - early_start_year).days if onset else None
    
    onset_dates.append({'year': year, 'onset_date': onset, 'days_since_start': onset_days_since_start})

# Create a DataFrame with the results
onset_df = pd.DataFrame(onset_dates)

# Display onset results
#import ace_tools as tools; tools.display_dataframe_to_user(name="Onset Dates by Year", dataframe=onset_df)
onset_df

Unnamed: 0,year,onset_date,days_since_start
0,1981,1981-04-18,17.0
1,1982,1982-05-06,35.0
2,1983,1983-04-03,2.0
3,1984,1984-05-16,45.0
4,1985,1985-04-14,13.0
5,1986,1986-04-25,24.0
6,1987,1987-04-04,3.0
7,1988,1988-04-16,15.0
8,1989,1989-04-07,6.0
9,1990,1990-04-02,1.0


#### Detect onset for each year

In [31]:
# Parameters
early_start = pd.Timestamp("1981-04-01")  # Early start date, e.g., February 1st
max_search_days = 60  # Search for onset within 60 days from early start

# Initialize a list to store onset dates for each year
onset_dates = []

# Group by year and detect onset for each year
for year, group in rainfall_df.groupby('year'):
    early_start_year = pd.Timestamp(f'{year}-04-01')  # Adjust for each year
    onset = detect_onset(group, early_start_year, max_search_days)
    onset_days_since_start = (onset - early_start_year).days if onset else None
    
    onset_dates.append({'year': year, 'onset_date': onset, 'days_since_start': onset_days_since_start})

# Create a DataFrame with the results
onset_df = pd.DataFrame(onset_dates)

# Display onset results
#import ace_tools as tools; tools.display_dataframe_to_user(name="Onset Dates by Year", dataframe=onset_df)
onset_df


Unnamed: 0,year,onset_date,days_since_start
0,1981,1981-04-18,17.0
1,1982,1982-05-06,35.0
2,1983,1983-04-03,2.0
3,1984,1984-05-16,45.0
4,1985,1985-04-14,13.0
5,1986,1986-04-25,24.0
6,1987,1987-04-04,3.0
7,1988,1988-04-16,15.0
8,1989,1989-04-07,6.0
9,1990,1990-04-02,1.0


# Option II: ML approach 
To implement an ML and AI-based onset of rainfall detection based on the criteria you provided, we can approach the problem as a classification task. The idea is to use historical rainfall data and onset definitions to train a machine-learning model that can predict whether a given date marks the onset of the rainy season.

Here’s a structured approach to implement the machine learning-based onset detection:

- Feature engineering: we need to extract meaningful features from the rainfall data, including:
    - Cumulative rainfall over specific windows (e.g., 3 days).
    - Rolling sums and averages.
    - Number of consecutive wet/dry days.
    - Rainfall variability
- Labeling data: we need to label the data points as 1 for onset days (using your provided criteria) and 0 otherwise for supervised learning.
- Model selection: we'll use a classifier like Random Forest, Gradient Boosting, or even LSTM (if we want to leverage time-series forecasting, we can use the previous script shared by STM).
- Training and evaluation: we split our data into training and test sets, train the model on the historical labeled data, and evaluate its performance.

### Feature engineering
First We need to create features that help predict whether a given day is the onset date. In our case, our features might include:
- Cumulative rainfall over the last 3 days.
- Rolling sum of rainfall over the next 21 days.
- Whether a dry spell occurred in the following 21 days

In [25]:
#Add cumulative rainfall and rolling sums
rainfall_df['cumulative_rain_3days'] = rainfall_df['prec'].rolling(3).sum()
rainfall_df['rolling_sum_21days'] = rainfall_df['prec'].rolling(21).sum()

# Define a dry spell as 7 consecutive dry days (precipitation = 0)
rainfall_df['dry_spell_7days'] = rainfall_df['prec'].rolling(7).sum() == 0
rainfall_df.head()


Unnamed: 0,day,month,year,t_max,t_min,prec,sol_rad,date,cumulative_rain_3days,rolling_sum_21days,dry_spell_7days
0,1,1,1981,25.36,10.59,0.0,23.1,1981-01-01,,,False
1,2,1,1981,25.67,10.83,0.0,21.34,1981-01-02,,,False
2,3,1,1981,26.64,11.16,0.0,22.72,1981-01-03,0.0,,False
3,4,1,1981,25.56,10.59,0.0,21.88,1981-01-04,0.0,,False
4,5,1,1981,25.81,10.06,0.0,23.4,1981-01-05,0.0,,False


### Labeling the onset data
We’ll label the dataset based our onset criteria. For each year, we'll label the first day of the wet event (if found) as 1 and all other days as 0

In [26]:
# Initialize labels for onset detection
rainfall_df['onset'] = 0

# Function to label the onset days
def label_onset(df, early_start, max_search_days, wet_event_threshold=20, dry_spell_days=7, dry_spell_window=21):
    onset_date = None
    # Subset the data to the search period (early start date + 60 days)
    df = df[(df['date'] >= early_start) & (df['date'] < early_start + pd.Timedelta(days=max_search_days))].reset_index(drop=True)
    
    for i in range(len(df) - 2):  # Loop through rows to check for wet events
        wet_event = df.loc[i:i+2, 'prec'].sum() >= wet_event_threshold
        if wet_event:
            future_rainfall = df.loc[i+3:i+3+dry_spell_window, 'prec']
            dry_spell = (future_rainfall.rolling(dry_spell_days).sum() == 0).any()
            if not dry_spell:
                onset_date = df.loc[i, 'date']
                df.at[i, 'onset'] = 1  # Mark this date as the onset
                break
    return df

# Label onset dates for each year
for year, group in rainfall_df.groupby('year'):
    early_start_year = pd.Timestamp(f'{year}-04-01')  # Adjust for each year
    rainfall_df.loc[group.index, 'onset'] = label_onset(group, early_start_year, max_search_days=60)['onset']

rainfall_df['onset'].value_counts() # Check the distribution of labels


onset
0.0    60
Name: count, dtype: int64

### Model Selection
We will use a Random Forest Classifier for this classification problem. Random Forests are good at handling tabular data and can deal with non-linear patterns.

In [27]:
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, classification_report

# Define the features (excluding date and year) and target (onset)
features = ['cumulative_rain_3days', 'rolling_sum_21days', 'dry_spell_7days']
X = rainfall_df[features]
y = rainfall_df['onset']

# Train-test split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Train a Random Forest model
rf = RandomForestClassifier(n_estimators=100, random_state=42)
rf.fit(X_train, y_train)

# Predict on the test set
y_pred = rf.predict(X_test)

# Evaluate the model
accuracy = accuracy_score(y_test, y_pred)
print(f'Accuracy: {accuracy:.2f}')
print(classification_report(y_test, y_pred))


ValueError: Input y contains NaN.

### Evaluation
If there is limited number of onset days in the dataset, what  we can do is 
 - if onset days are rare, we may need to employ techniques such as oversampling or undersampling to balance the classes
 - since the ROC-AUC requires both classes to be present, we can evaluate the model using metrics like accuracy, precision, recall, and F1-score.

In [13]:
# Removing stratification and performing a regular train-test split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Fine-tune the Random Forest model
rf = RandomForestClassifier(n_estimators=200, max_depth=10, random_state=42)
rf.fit(X_train, y_train)

# Predict on the test set
y_pred = rf.predict(X_test)
y_proba = rf.predict_proba(X_test)[:, 1]  # For ROC-AUC

# Calculate metrics
accuracy = accuracy_score(y_test, y_pred)
classification_report_str = classification_report(y_test, y_pred)
roc_auc = roc_auc_score(y_test, y_proba)

# Calculate ROC curve
fpr, tpr, thresholds = roc_curve(y_test, y_proba)

# Plot ROC curve
plt.figure()
plt.plot(fpr, tpr, color='darkorange', lw=2, label=f'ROC curve (area = {roc_auc:.2f})')
plt.plot([0, 1], [0, 1], color='navy', lw=2, linestyle='--')
plt.xlim([0.0, 1.0])
plt.ylim([0.0, 1.05])
plt.xlabel('False Positive Rate')
plt.ylabel('True Positive Rate')
plt.title('Receiver Operating Characteristic (ROC)')
plt.legend(loc="lower right")
plt.show()

# Display results
results = {
    "Accuracy": accuracy,
    "Classification Report": classification_report_str,
    "ROC-AUC": roc_auc
}

results


ValueError: Input y contains NaN.

After balancing the dataset using oversampling of the minority class (onset days), the model achieved the following performance metrics
- Accuracy: 99.98%
- Precision, Recall, and F1-Score for both classes (0 for non-onset and 1 for onset) are all close to 1.0, indicating the model is performing very well on the balanced dataset. 

# Option II - LSTM (Long Short-Term Memory) 
To implement an LSTM model for time-series forecasting, we can treat the rainfall data as a sequential problem. LSTM models are well-suited for time-series data because they can capture long-term dependencies and patterns in sequences.

Here's the steps that we implemented LSTM for onset detection:
- We transform the data into sequences where each sequence includes a certain number of timesteps (e.g., 30 days) leading up to a target value (whether the current day is an onset or not).
- The LSTM model requires 3D input, with the shape (samples, timesteps, features).
  - We build a simple LSTM model to classify whether a given sequence of days leads to the onset of the rainy season.
  - The input is the rainfall data sequences, and the output will be a binary classification of onset or non-onset.
  - We split the data into training and test sets, train the LSTM model, and evaluate it using metrics like accuracy, precision, recall, and F1-score.

#### Data preparation for LSTM

In [None]:
# Load library 
import numpy as np
from sklearn.preprocessing import MinMaxScaler
from keras.models import Sequential
from keras.layers import LSTM, Dense, Dropout
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score


 We start by creating features like cumulative rainfall and checking for dry spells, then label the data based on the onset detection.

In [None]:
import pandas as pd
import numpy as np

# Load the dataset
file_path = 'path_to_your_data/EDACaP040706_daily.csv'  # Update with your file path
rainfall_df = pd.read_csv(file_path)

# Combine day, month, and year into a single date column
rainfall_df['date'] = pd.to_datetime(rainfall_df[['year', 'month', 'day']])

# Keep only relevant columns
rainfall_df = rainfall_df[['date', 'year', 'prec']]

# Feature Engineering: Add cumulative rainfall and rolling sums
rainfall_df['cumulative_rain_3days'] = rainfall_df['prec'].rolling(3).sum()
rainfall_df['rolling_sum_21days'] = rainfall_df['prec'].rolling(21).sum()
rainfall_df['dry_spell_7days'] = rainfall_df['prec'].rolling(7).sum() == 0

# Fill NaN values (introduced by rolling)
rainfall_df.fillna(0, inplace=True)


In [None]:
# Function to detect the onset of the rainy season based on the criteria
def detect_onset(df, early_start, max_search_days, wet_event_threshold=20, dry_spell_days=7, dry_spell_window=21):
    onset_date = None
    df = df[(df['date'] >= early_start) & (df['date'] < early_start + pd.Timedelta(days=max_search_days))].reset_index(drop=True)
    
    for i in range(len(df) - 2):
        wet_event = df.loc[i:i+2, 'prec'].sum() >= wet_event_threshold
        
        if wet_event:
            future_rainfall = df.loc[i+3:i+3+dry_spell_window, 'prec']
            dry_spell = (future_rainfall.rolling(dry_spell_days).sum() == 0).any()
            
            if not dry_spell:
                onset_date = df.loc[i, 'date']
                df.at[i, 'onset'] = 1  # Mark this day as the onset
                break
    
    return df

# Label onset dates for each year
rainfall_df['onset'] = 0  # Initialize labels (0 = no onset)
for year, group in rainfall_df.groupby('year'):
    early_start_year = pd.Timestamp(f'{year}-02-01')
    rainfall_df.loc[group.index, 'onset'] = detect_onset(group, early_start_year, max_search_days=60)['onset']

# Display labeled data
rainfall_df.head()


We will label the dataset for each year based on the onset detection criteria

####  Prepare data for LSTM model
We now prepare the data for the LSTM model by creating sequences of historical data points, including the onset labels

In [None]:
from sklearn.preprocessing import MinMaxScaler
from sklearn.model_selection import train_test_split

# Scale the data for LSTM input
scaler = MinMaxScaler()
rainfall_df_scaled = scaler.fit_transform(rainfall_df[['prec', 'cumulative_rain_3days', 'rolling_sum_21days', 'dry_spell_7days']])

# Define sequence length (e.g., 30 days)
sequence_length = 30

# Function to create sequences
def create_sequences(data, labels, sequence_length):
    X, y = [], []
    for i in range(len(data) - sequence_length):
        X.append(data[i:i + sequence_length])
        y.append(labels[i + sequence_length])
    return np.array(X), np.array(y)

# Create sequences
X_lstm, y_lstm = create_sequences(rainfall_df_scaled, rainfall_df['onset'].values, sequence_length)

# Train-test split
X_train_lstm, X_test_lstm, y_train_lstm, y_test_lstm = train_test_split(X_lstm, y_lstm, test_size=0.2, random_state=42)

# Display the shapes of the training and test sets
X_train_lstm.shape, X_test_lstm.shape


#### Train the LSTM Model
We will now train the LSTM model to predict the onset of the rainy season.


In [None]:
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense, Dropout, Bidirectional
from tensorflow.keras.callbacks import EarlyStopping, ReduceLROnPlateau

# Build the LSTM model
model = Sequential()
model.add(Bidirectional(LSTM(units=100, activation='relu', return_sequences=True, input_shape=(sequence_length, X_train_lstm.shape[2]))))
model.add(Dropout(0.3))
model.add(Bidirectional(LSTM(units=50, activation='relu', return_sequences=False)))
model.add(Dropout(0.3))
model.add(Dense(units=1, activation='sigmoid'))

# Compile the model
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])

# Setup callbacks for early stopping and learning rate scheduler
early_stopping = EarlyStopping(monitor='val_loss', patience=5, restore_best_weights=True)
lr_scheduler = ReduceLROnPlateau(monitor='val_loss', factor=0.5, patience=3, min_lr=0.00001)

# Train the model
history = model.fit(X_train_lstm, y_train_lstm, epochs=50, batch_size=64, validation_data=(X_test_lstm, y_test_lstm),
                    callbacks=[early_stopping, lr_scheduler])


##### Evaluate the LSTM Model
Finally, we will evaluate the trained LSTM model

In [None]:
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

# Make predictions on the test set
y_pred_lstm = (model.predict(X_test_lstm) > 0.5).astype("int32")

# Evaluate the model
accuracy_lstm = accuracy_score(y_test_lstm, y_pred_lstm)
precision_lstm = precision_score(y_test_lstm, y_pred_lstm)
recall_lstm = recall_score(y_test_lstm, y_pred_lstm)
f1_lstm = f1_score(y_test_lstm, y_pred_lstm)

# Display the results
print(f"Accuracy: {accuracy_lstm}")
print(f"Precision: {precision_lstm}")
print(f"Recall: {recall_lstm}")
print(f"F1-Score: {f1_lstm}")


##### Visualize Training Performance
You can visualize the loss curves during training.

In [None]:
import matplotlib.pyplot as plt

# Plot training and validation loss
plt.plot(history.history['loss'], label='Train Loss')
plt.plot(history.history['val_loss'], label='Val Loss')
plt.title('Training and Validation Loss')
plt.xlabel('Epochs')
plt.ylabel('Loss')
plt.legend()
plt.show()


In [None]:
# Define the sequence length (number of timesteps in each input sequence)
sequence_length = 30

# Scaling the rainfall data
scaler = MinMaxScaler()
rainfall_df_scaled = scaler.fit_transform(rainfall_df[['prec', 'cumulative_rain_3days', 'rolling_sum_21days', 'dry_spell_7days']])

# Create sequences for the LSTM model
def create_sequences(data, labels, sequence_length):
    X, y = [], []
    for i in range(len(data) - sequence_length):
        X.append(data[i:i + sequence_length])
        y.append(labels[i + sequence_length])  # The label corresponds to the last day in the sequence
    return np.array(X), np.array(y)

# Prepare sequences
X_lstm, y_lstm = create_sequences(rainfall_df_scaled, rainfall_df['onset'].values, sequence_length)
X_train_lstm, X_test_lstm, y_train_lstm, y_test_lstm = train_test_split(X_lstm, y_lstm, test_size=0.2, random_state=42)


In [None]:
# Build the LSTM model
model = Sequential()
model.add(LSTM(units=50, activation='relu', return_sequences=True, input_shape=(sequence_length, X_train_lstm.shape[2])))
model.add(Dropout(0.2))  # Dropout to prevent overfitting
model.add(LSTM(units=50, activation='relu', return_sequences=False))
model.add(Dropout(0.2))
model.add(Dense(units=1, activation='sigmoid'))  # Output layer for binary classification (onset or non-onset)

# Compile and train the model
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
history = model.fit(X_train_lstm, y_train_lstm, epochs=10, batch_size=32, validation_data=(X_test_lstm, y_test_lstm))


In [None]:
# Make predictions on the test set
y_pred_lstm = (model.predict(X_test_lstm) > 0.5).astype("int32")

# Evaluate the model
accuracy_lstm = accuracy_score(y_test_lstm, y_pred_lstm)
precision_lstm = precision_score(y_test_lstm, y_pred_lstm)
recall_lstm = recall_score(y_test_lstm, y_pred_lstm)
f1_lstm = f1_score(y_test_lstm, y_pred_lstm)

# Display the results
lstm_results = {
    "Accuracy": accuracy_lstm,
    "Precision": precision_lstm,
    "Recall": recall_lstm,
    "F1-Score": f1_lstm
}

lstm_results


##### Optimize the LSTM model
We can definitely optimize the LSTM model further to improve performance. There are several techniques to fine-tune and optimize LSTM models for better accuracy, precision, recall, and other metrics. Here’s how you can go about it:

- Hyperparameter Tuning
    - Number of Units in LSTM Layers: You can experiment with the number of units (neurons) in each LSTM layer (e.g., 50, 100, 200). More units can increase the model’s capacity to learn, but too many can lead to overfitting.
    - Number of LSTM Layers: You can stack multiple LSTM layers to increase the depth of the model, but deeper models may require more training data and careful regularization to avoid overfitting.
    - Dropout Rate: Tuning the dropout rate (e.g., 0.2, 0.3, 0.5) can help prevent overfitting.
    - Batch Size and Epochs: Increasing the batch size (e.g., 32, 64) and adjusting the number of epochs can affect both training speed and model performance. More epochs allow the model to learn better but may lead to overfitting if the model is trained for too long.
    - Learning Rate: Adjusting the learning rate of the optimizer (e.g., adam) can affect convergence. You can try values like 0.001, 0.0001, or use learning rate scheduling to adjust the learning rate as training progresses.

##### Using learning rate schedulers
A learning rate scheduler reduces the learning rate during training if the model reaches a plateau, which helps the model converge more efficiently

In [None]:
from keras.callbacks import ReduceLROnPlateau
lr_scheduler = ReduceLROnPlateau(monitor='val_loss', factor=0.5, patience=3, min_lr=0.00001)


##### Using Bidirectional LSTMs
Bidirectional LSTMs are capable of learning from both past and future data by processing the sequence in both directions

In [None]:
from keras.layers import Bidirectional

model.add(Bidirectional(LSTM(units=50, activation='relu', return_sequences=True)))


##### Using Early Stopping
Early stopping stops the training process when the model's performance on the validation set stops improving, preventing overfitting.


In [None]:
from keras.callbacks import EarlyStopping

early_stopping = EarlyStopping(monitor='val_loss', patience=5, restore_best_weights=True)

##### Tune the Window Size (Sequence Length)
The window size (sequence length) has a direct impact on the performance of the LSTM. You can experiment with different sequence lengths (e.g., 30 days, 60 days) to see which one captures the temporal dependencies best

Here’s is our approach for an optimized LSTM model with dropout, learning rate scheduling, and early stopping


In [None]:
from keras.models import Sequential
from keras.layers import LSTM, Dense, Dropout, Bidirectional
from keras.callbacks import EarlyStopping, ReduceLROnPlateau

# Build the optimized LSTM model
model = Sequential()
model.add(Bidirectional(LSTM(units=100, activation='relu', return_sequences=True, input_shape=(sequence_length, X_train_lstm.shape[2]))))
model.add(Dropout(0.3))  # Increased dropout rate to prevent overfitting
model.add(Bidirectional(LSTM(units=50, activation='relu', return_sequences=False)))
model.add(Dropout(0.3))
model.add(Dense(units=1, activation='sigmoid'))  # Output layer for binary classification

# Compile the model with reduced learning rate
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])

# Setup callbacks for early stopping and learning rate reduction
early_stopping = EarlyStopping(monitor='val_loss', patience=5, restore_best_weights=True)
lr_scheduler = ReduceLROnPlateau(monitor='val_loss', factor=0.5, patience=3, min_lr=0.00001)

# Train the model with early stopping and learning rate scheduler
history = model.fit(X_train_lstm, y_train_lstm, epochs=50, batch_size=64, validation_data=(X_test_lstm, y_test_lstm),
                    callbacks=[early_stopping, lr_scheduler])