# **1D Long Short-Term Memory (LSTM) Network**

**LSTM networks are a type of recurrent neural network (RNN) designed to capture dependencies in sequential data. Unlike CNNs, which detect spacial patterns, LSTMs are well-suited for learning temporal relationships, making them ideal for analyzing light curves over time. By maintaining a memory of past observations, LSTMs can recognize trends and fluctuations that may indicate exoplanet transits.**

---


In [1]:
import numpy as np
import tensorflow as tf
import random
import os
import pickle

# Load preprocessed data
X_train = np.load("X_train.npy")
X_test = np.load("X_test.npy")
y_train = np.load("y_train.npy")
y_test = np.load("y_test.npy")

SEED = 42
np.random.seed(SEED)
tf.random.set_seed(SEED)
random.seed(SEED)
os.environ['PYTHONHASHSEED'] = str(SEED) # Ensures consistent hashing
os.environ['TF_DETERMINISTIC_OPS'] = '1' # Ensures deterministic TensorFlow operations

In [2]:
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense, Dropout, Input
from tensorflow.keras.initializers import GlorotUniform

# Define the model
model = Sequential([
    Input(shape=(X_train.shape[1], 1)), # Input layer

    LSTM(128, return_sequences=True, kernel_initializer=GlorotUniform(seed=42)), # LSTM layer capturing sequential patterns
    Dropout(0.3), # Avoids overfitting

    LSTM(64, return_sequences=False, kernel_initializer=GlorotUniform(seed=42)), # Reducing dimensions
    Dropout(0.3),

    Dense(64, activation="relu", kernel_initializer=GlorotUniform(seed=42)), # Fully connected layer
    Dropout(0.3),

    Dense(2, activation="softmax") # Output
])

model.compile(optimizer="adam",
             loss="sparse_categorical_crossentropy",
             metrics=["accuracy"])

model.summary()

In [3]:
from tensorflow.keras.callbacks import EarlyStopping

# Define parameters for early stopping to avoid overfitting
early_stop = EarlyStopping(monitor="val_loss", patience=5, restore_best_weights=True)

# Convert labels from {1, 2} to {0, 1}
y_train = y_train - 1
y_test = y_test - 1

# Train the model
history = model.fit(
    X_train, y_train,
    validation_data=(X_test, y_test),
    epochs=30, batch_size=32,
    callbacks=[early_stop],
    verbose=1
)

with open("history_lstm_baseline.pkl", "wb") as f:
    pickle.dump(history.history, f)

model.save("lstm_baseline.keras")

Epoch 1/30
[1m316/316[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m917s[0m 3s/step - accuracy: 0.5234 - loss: 0.6915 - val_accuracy: 0.9439 - val_loss: 0.6808
Epoch 2/30
[1m316/316[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m963s[0m 3s/step - accuracy: 0.5196 - loss: 0.6899 - val_accuracy: 0.9614 - val_loss: 0.6704
Epoch 3/30
[1m316/316[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m983s[0m 3s/step - accuracy: 0.5248 - loss: 0.6874 - val_accuracy: 0.9912 - val_loss: 0.5764
Epoch 4/30
[1m316/316[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m998s[0m 3s/step - accuracy: 0.5006 - loss: 0.6962 - val_accuracy: 0.9912 - val_loss: 0.6906
Epoch 5/30
[1m316/316[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m989s[0m 3s/step - accuracy: 0.5046 - loss: 0.6926 - val_accuracy: 0.9912 - val_loss: 0.6899
Epoch 6/30
[1m316/316[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m991s[0m 3s/step - accuracy: 0.4975 - loss: 0.6929 - val_accuracy: 0.0123 - val_loss: 0.7076
Epoch 7/30
[1m316/316

In [4]:
from sklearn.metrics import classification_report

# Evaluate on test set
test_loss, test_accuracy = model.evaluate(X_test, y_test)
print(f"Test Accuracy: {test_accuracy:.4f}")

# Get predicted class labels (chooses the highest probability)
y_pred = model.predict(X_test)
y_pred_classes = np.argmax(y_pred, axis=1)  # Selects class with highest probability

# Generate classification report
print(classification_report(y_test, y_pred_classes))

[1m18/18[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m10s[0m 559ms/step - accuracy: 0.9708 - loss: 0.5824
Test Accuracy: 0.9912
[1m18/18[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m11s[0m 596ms/step
              precision    recall  f1-score   support

           0       0.99      1.00      1.00       565
           1       0.00      0.00      0.00         5

    accuracy                           0.99       570
   macro avg       0.50      0.50      0.50       570
weighted avg       0.98      0.99      0.99       570



  _warn_prf(average, modifier, f"{metric.capitalize()} is", len(result))
  _warn_prf(average, modifier, f"{metric.capitalize()} is", len(result))
  _warn_prf(average, modifier, f"{metric.capitalize()} is", len(result))


## **Model Analysis**
- **Architecture**: Single-directional LSTM with three convolutional layers.
- **Regularization**: Batch normalization and dropout (`0.3`).
- **Loss Function**: `sparse_categorical_crossentropy` with `softmax` activation.
- **Optimization**: Adam optimizer with default learning rate.
- **Early Stopping**: Enabled (`patience=5`), stopping at 8 epochs.

### **Results**
- **Overall Test Accuracy**: `99.12%`
- **Precision for Label 2**: `0.00`
- **Recall for Label 2**: `0.00`
- **F1-Score for Label 2**: `0.00`

### **Observations**
The LSTM model failed to classify label 2 entirely.  

### **Next Steps**
- Adding higher weight to label 2 (`class_weights = {0: 1, 1: 5}`) should encourage the model to classify it better.


In [5]:
# Add weight to class 1 (Label 2)
class_weights = {0:1, 1:5}

history = model.fit(
    X_train, y_train,
    validation_data=(X_test, y_test),
    epochs=30, batch_size=32,
    class_weight=class_weights,
    callbacks=[early_stop],
    verbose=1
)

with open("history_lstm_weighted.pkl", "wb") as f:
    pickle.dump(history.history, f)

model.save("lstm_weighted.keras")

Epoch 1/30
[1m316/316[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m986s[0m 3s/step - accuracy: 0.4971 - loss: 1.4440 - val_accuracy: 0.0193 - val_loss: 1.4991
Epoch 2/30
[1m316/316[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m990s[0m 3s/step - accuracy: 0.4976 - loss: 1.3678 - val_accuracy: 0.0211 - val_loss: 1.5436
Epoch 3/30
[1m316/316[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m988s[0m 3s/step - accuracy: 0.4953 - loss: 1.3890 - val_accuracy: 0.0105 - val_loss: 1.5496
Epoch 4/30
[1m316/316[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m992s[0m 3s/step - accuracy: 0.4983 - loss: 1.3662 - val_accuracy: 0.0123 - val_loss: 1.5467
Epoch 5/30
[1m316/316[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m990s[0m 3s/step - accuracy: 0.4955 - loss: 1.3640 - val_accuracy: 0.0123 - val_loss: 1.5606
Epoch 6/30
[1m316/316[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m986s[0m 3s/step - accuracy: 0.4945 - loss: 1.3703 - val_accuracy: 0.0105 - val_loss: 1.5756


In [6]:
# Evaluate on test set
test_loss, test_accuracy = model.evaluate(X_test, y_test)
print(f"Test Accuracy: {test_accuracy:.4f}")

# Get predicted class labels (chooses the highest probability)
y_pred = model.predict(X_test)
y_pred_classes = np.argmax(y_pred, axis=1)  # Selects class with highest probability

# Generate classification report
print(classification_report(y_test, y_pred_classes))

[1m18/18[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m10s[0m 557ms/step - accuracy: 0.0400 - loss: 1.4726
Test Accuracy: 0.0193
[1m18/18[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m11s[0m 612ms/step
              precision    recall  f1-score   support

           0       1.00      0.01      0.02       565
           1       0.01      1.00      0.02         5

    accuracy                           0.02       570
   macro avg       0.50      0.51      0.02       570
weighted avg       0.99      0.02      0.02       570



## **Model Analysis**
### **What Changed**
- **Class Weights**: Class 1 (Label 2) assigned a weight of 5.

### **Results**
- **Overall Test Accuracy**: `1.93%`
- **Precision for Label 2**: `0.01`
- **Recall for Label 2**: `1.00`
- **F1-Score for Label 2**: `0.02`

### **Observations**
The model overcompensated for Label 2, leading to nearly all samples being predicted as Label 2 and reducing accuracy drastically. Class weighting was too aggressive.  

### **Next Steps**
- Revert class weights and try increasing early stopping patience to give the model more time to learn from the training data before stopping.


In [7]:
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense, Dropout, Input
from tensorflow.keras.optimizers import Adam

# Define the model with reduced learning rate
model_2 = Sequential([
    Input(shape=(X_train.shape[1], 1)), # Input layer

    LSTM(128, return_sequences=True, kernel_initializer=GlorotUniform(seed=42)), # LSTM layer capturing sequential patterns
    Dropout(0.3), # Avoids overfitting

    LSTM(64, return_sequences=False, kernel_initializer=GlorotUniform(seed=42)), # Reducing dimensions
    Dropout(0.3),

    Dense(64, activation="relu", kernel_initializer=GlorotUniform(seed=42)), # Fully connected layer
    Dropout(0.3),

    Dense(2, activation="softmax") # Output
])

model_2.compile(optimizer=Adam(learning_rate=0.0003),
             loss="sparse_categorical_crossentropy",
             metrics=["accuracy"])

model_2.summary()

In [8]:
from tensorflow.keras.callbacks import EarlyStopping

# Increased patience for early stopping
early_stop = EarlyStopping(monitor="val_loss", patience=10, restore_best_weights=True)

# Train the model
history = model_2.fit(
    X_train, y_train,
    validation_data=(X_test, y_test),
    epochs=30, batch_size=32,
    callbacks=[early_stop],
    verbose=1
)

with open("history_lstm_patient.pkl", "wb") as f:
    pickle.dump(history.history, f)

model_2.save("lstm_patient.keras")

Epoch 1/30
[1m316/316[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m985s[0m 3s/step - accuracy: 0.5123 - loss: 0.6918 - val_accuracy: 0.9491 - val_loss: 0.6875
Epoch 2/30
[1m316/316[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m981s[0m 3s/step - accuracy: 0.5220 - loss: 0.6876 - val_accuracy: 0.9491 - val_loss: 0.6696
Epoch 3/30
[1m316/316[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m991s[0m 3s/step - accuracy: 0.5235 - loss: 0.6910 - val_accuracy: 0.0737 - val_loss: 0.7139
Epoch 4/30
[1m316/316[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m1001s[0m 3s/step - accuracy: 0.5289 - loss: 0.6852 - val_accuracy: 0.7947 - val_loss: 0.6793
Epoch 5/30
[1m316/316[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m998s[0m 3s/step - accuracy: 0.5306 - loss: 0.6808 - val_accuracy: 0.1632 - val_loss: 0.7478
Epoch 6/30
[1m316/316[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m1001s[0m 3s/step - accuracy: 0.5575 - loss: 0.6716 - val_accuracy: 0.7772 - val_loss: 0.5038
Epoch 7/30
[1m316/3

In [9]:
from sklearn.metrics import classification_report

# Evaluate on test set
test_loss, test_accuracy = model_2.evaluate(X_test, y_test)
print(f"Test Accuracy: {test_accuracy:.4f}")

# Get predicted class labels (chooses the highest probability)
y_pred = model_2.predict(X_test)
y_pred_classes = np.argmax(y_pred, axis=1)  # Selects class with highest probability

# Generate classification report
print(classification_report(y_test, y_pred_classes))

[1m18/18[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m10s[0m 553ms/step - accuracy: 0.7723 - loss: 0.5172
Test Accuracy: 0.7772
[1m18/18[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m12s[0m 643ms/step
              precision    recall  f1-score   support

           0       1.00      0.78      0.87       565
           1       0.02      0.60      0.05         5

    accuracy                           0.78       570
   macro avg       0.51      0.69      0.46       570
weighted avg       0.99      0.78      0.87       570



## **Model Analysis**
### **What Changed**
- **Removed Weights**: Class weights reverted to {0:1, 1:1}.
- **Early Stopping**: Enabled (`patience=10`), stopping at 16 epochs.
  
### **Results**
- **Overall Test Accuracy**: `77.72%`
- **Precision for Label 2**: `0.02`
- **Recall for Label 2**: `0.60`
- **F1-Score for Label 2**: `0.05`

### **Observations**
Increasing early stopping patienced improved recall for Label 2, however there is still a big imbalance. F1-score for Label 2 is only 0.05 because of very low precision. 

### **Next Steps**
- False positives for Label 2 need to be reduced.  Lowering the decision threshold may help.

In [11]:
# Lower default decision threshold from 0.5 to 0.3.

# Get predicted class labels (chooses the highest probability)
y_pred = model_2.predict(X_test)[:, 1]
threshold = 0.3
y_pred_adjusted = (y_pred > threshold).astype(int)

# Generate classification report
print(classification_report(y_test, y_pred_classes))

[1m18/18[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m9s[0m 477ms/step 
              precision    recall  f1-score   support

           0       1.00      0.78      0.87       565
           1       0.02      0.60      0.05         5

    accuracy                           0.78       570
   macro avg       0.51      0.69      0.46       570
weighted avg       0.99      0.78      0.87       570



## **Model Analysis**
### **What Changed**
- **Lowered Decision Threshold**: Class weights reverted to {0:1, 1:1}.
  
### **Results**
- **Precision for Label 2**: `0.02`
- **Recall for Label 2**: `0.60`
- **F1-Score for Label 2**: `0.05`

### **Observations**
Lowering the decision threshold did not change the performance of the model at all. 

---

## **Project Conclusions**

This project explored data preparation steps and the training of two deep learning architectures - 1D Convolutional Neural Networks (CNNs) and Long Short-Term Memory (LSTM) networks - to classify exoplanet candidates based on flux data. The CNN models focused on pattern detection within the light curves, while the LSTM models aimed to capture temporal dependencies.  The raw data was heavily imbalanced with Label 2 (exoplanet stars) samples representing only a tiny fraction of the total samples. To mitigate this imbalance, the Synthetic Minority Oversampling Technique (SMOTE) was applied and the data was normalized and reshaped for use in training. Despite training that involved tuning paramaters such as regularization, dropout, learning rate, class weighting, and early stopping, both architectures struggled to achieve strong generalization, particularly in identifying the minority class. The LSTM performed slightly better in recall for Label 2, but at the expense of overall accuracy. The CNN maintained higher precision, but failed to distiguish Label 2 samples effectively.  

---

## **Suggested Next Steps** (Outside the scope of this project)
- **Feature Engineering**: Extract additional time-series features, such as periodicity, to enhance model interpretability.
- **Alternative Architectures**: Experiment with hybrid models that combine CNNs for pattern detection and LSTMs for sequence learning.
- **Hyperparameter Optimization**: Conduct a more exhaustive exploration of hyperparameter optimizations.
- **Domain Specific Techniques**: Explore astrophysics-informed methods such as phase-folding or transit fitting to improve signal extraction.