## **Stock Price Prediction - NIFTY 50**

### **Notebook 07: Deep Learning I (ANN and LSTM Basics)**

[![Python](https://img.shields.io/badge/Python-3.8%2B-blue)](https://www.python.org/) [![TensorFlow](https://img.shields.io/badge/TensorFlow-2.10%2B-orange)](https://tensorflow.org/) [![Keras](https://img.shields.io/badge/Keras-Latest-red)](https://keras.io/) [![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)

---

**Part of the comprehensive learning series:** [Stock Price Prediction - NIFTY 50](https://github.com/prakash-ukhalkar/stock-price-prediction-nifty50)

**Learning Objectives:**
- Implement fundamental deep learning architectures (ANN and LSTM) for financial prediction
- Apply neural networks to capture dynamic and chaotic market patterns
- Compare tabular vs sequential modeling approaches on the same dataset
- Establish deep learning baselines for advanced architectures
- Understand temporal dependency modeling with LSTM networks

**Dataset Scope:** Apply deep learning to feature-engineered data and sequential price patterns. Compare ANN and LSTM approaches.

---

* This notebook begins the Deep Learning phase of the research, which aims to capture the **dynamic, inconsistent, and chaotic** nature of the stock market. We implement two fundamental architectures cited in the literature:
 
  1.  A standard **Artificial Neural Network (ANN)**, treating the input features (technical indicators and lagged values) as a static tabular problem.
  
  2.  A foundational **Univariate Long Short-Term Memory (LSTM)** network, designed to learn temporal dependencies using sequential price data. 

* Both models predict the next day's **Log Return**.
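
As a reminder of what the target represents, the sketch below computes log returns from a short price series. It assumes the conventional definition `ln(Close_t / Close_{t-1})`; the notebook loads a precomputed `Log_Return` column, so the exact upstream calculation is not shown here, and the prices are made-up values for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical closing prices for illustration only (not NIFTY 50 data).
close = pd.Series([100.0, 102.0, 101.0, 105.0], name="Close")

# Log return on day t: ln(Close_t / Close_{t-1}); the first value is NaN.
log_return = np.log(close / close.shift(1))

# Log returns are additive: their sum equals the log of the overall price ratio.
total = log_return.sum()  # NaN is skipped by default
print(log_return)
print(total, np.log(105.0 / 100.0))
```

The additive property is one common reason for modeling log returns instead of raw prices.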

## 1. Setup and Data Loading

We load the feature-engineered training data (`nifty50_train_features.csv`) and the raw test data (`nifty50_test.csv`). We use **TensorFlow/Keras** for building and training the neural networks.

In [4]:
# Cell 1: Import Libraries
import pandas as pd
import numpy as np
import os
import ta # Re-imported for feature manipulation if needed
from sklearn.preprocessing import MinMaxScaler
from sklearn.metrics import mean_squared_error, mean_absolute_error
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, LSTM, Dropout, InputLayer
from tensorflow.keras.callbacks import EarlyStopping

# Define Paths
TRAIN_FEATURES_PATH = '../data/processed/nifty50_train_features.csv'
TEST_DATA_PATH = '../data/processed/nifty50_test.csv'
MODEL_RESULTS_PATH = '../models/dl_model_results.csv' # New results file for DL models
ML_RESULTS_PATH = '../models/ml_model_results.csv' # For consolidation

# Load data
df_train = pd.read_csv(TRAIN_FEATURES_PATH, index_col='Date', parse_dates=True)
df_test = pd.read_csv(TEST_DATA_PATH, index_col='Date', parse_dates=True)

TARGET_COL = 'Log_Return'
print(f"Training Data loaded. Shape: {df_train.shape}")

Training Data loaded. Shape: (57311, 33)


## 2. Model I: Artificial Neural Network (ANN)

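**Explanation:** The ANN serves as the first Deep Learning benchmark. It treats the feature-engineered data like a standard tabular dataset. Because the network sees each row independently, it tests whether the engineered features alone carry predictive signal, without any sequential memory. **Features must be scaled before training**, since gradient-based optimization is sensitive to differing feature magnitudes.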
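
The scaling step can be illustrated on a toy array. This is a minimal sketch with made-up numbers (the arrays `train` and `test` are hypothetical): the scaler is fitted on the training rows only and then reused for the test rows, which means test values outside the training range fall outside `[0, 1]`.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Hypothetical two-feature data for illustration only.
train = np.array([[10.0, 0.1], [20.0, 0.3], [30.0, 0.5]])
test = np.array([[40.0, 0.2]])

scaler = MinMaxScaler()
train_scaled = scaler.fit_transform(train)  # learns per-column min and max from train
test_scaled = scaler.transform(test)        # reuses the training min and max

print(train_scaled)
print(test_scaled)  # first feature exceeds 1 because 40 > training max of 30
```

Fitting on the training data alone avoids leaking test-set statistics into the model inputs.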

In [5]:
# Cell 2: ANN Data Preparation and Scaling (FIXED)

X_train_ann = df_train.drop(columns=[TARGET_COL])
y_train_ann = df_train[TARGET_COL].values

# Drop non-numeric columns (e.g. 'Symbol'); otherwise scaling fails with 'ValueError: could not convert string to float'
non_numeric_cols = X_train_ann.select_dtypes(include=['object', 'category']).columns
if not non_numeric_cols.empty:
    print(f"Dropping non-numeric columns from X_train_ann: {non_numeric_cols.tolist()}")
    X_train_ann = X_train_ann.drop(columns=non_numeric_cols)
    
# Scale Features
scaler_ann = MinMaxScaler()
X_train_ann_scaled = scaler_ann.fit_transform(X_train_ann)
X_train_ann_scaled = pd.DataFrame(X_train_ann_scaled, index=X_train_ann.index, columns=X_train_ann.columns)

INPUT_DIM_ANN = X_train_ann_scaled.shape[1]

print(f"ANN Input Dimension (Features): {INPUT_DIM_ANN}")

Dropping non-numeric columns from X_train_ann: ['Symbol']
ANN Input Dimension (Features): 31


In [6]:
# Cell 3: Build and Train ANN Model

ann_model = Sequential([
    InputLayer(shape=(INPUT_DIM_ANN,)),
    Dense(64, activation='relu'),
    Dropout(0.2),
    Dense(32, activation='relu'),
    Dense(1) # Output layer for regression
])

ann_model.compile(optimizer='adam', loss='mse')

print("Starting ANN training...")
ann_history = ann_model.fit(
    X_train_ann_scaled, y_train_ann,
    epochs=50, 
    batch_size=32, 
    validation_split=0.1, 
    verbose=0
)

print("ANN Training complete.")

Starting ANN training...
ANN Training complete.


### 2.1 ANN Prediction on Test Set
We prepare the test data by re-creating the engineered features, dropping non-numeric columns, and applying the scaler that was fitted on the training data.

In [7]:
# Cell 4: ANN Prediction Data Prep and Prediction

# 1. Feature Re-Engineering (MUST be done on raw test data)
df_test_ann_features = df_test.copy()
LAG_PERIODS = [1, 2, 3, 5, 10]
for lag in LAG_PERIODS:
    df_test_ann_features[f'Close_Lag_{lag}'] = df_test_ann_features['Close'].shift(lag)
    df_test_ann_features[f'Return_Lag_{lag}'] = df_test_ann_features['Log_Return'].shift(lag)
WINDOW_TREND = [10, 20, 50] 
for window in WINDOW_TREND:
    df_test_ann_features[f'SMA_{window}'] = ta.trend.sma_indicator(df_test_ann_features['Close'], window=window, fillna=False)
    df_test_ann_features[f'EMA_{window}'] = ta.trend.ema_indicator(df_test_ann_features['Close'], window=window, fillna=False)
macd = ta.trend.MACD(df_test_ann_features['Close'], window_fast=12, window_slow=26, window_sign=9, fillna=False)
df_test_ann_features['MACD_Line'] = macd.macd()
df_test_ann_features['MACD_Signal'] = macd.macd_signal()
RSI_WINDOW = 14 
df_test_ann_features[f'RSI_{RSI_WINDOW}'] = ta.momentum.rsi(df_test_ann_features['Close'], window=RSI_WINDOW, fillna=False)
df_test_ann_features['MFI'] = ta.volume.money_flow_index(df_test_ann_features['High'], df_test_ann_features['Low'], df_test_ann_features['Close'], df_test_ann_features['Volume'], window=14, fillna=False)
df_test_ann_features['ATR'] = ta.volatility.average_true_range(df_test_ann_features['High'], df_test_ann_features['Low'], df_test_ann_features['Close'], window=14, fillna=False)
df_test_ann_features = df_test_ann_features.dropna()

# 2. Final Cleanup and Alignment
X_test_ann = df_test_ann_features.drop(columns=[TARGET_COL])

# Drop non-numeric columns from X_test to match trained feature set
non_numeric_cols_test = X_test_ann.select_dtypes(include=['object', 'category']).columns
X_test_ann = X_test_ann.drop(columns=non_numeric_cols_test)

# Select and reorder test columns to match the training feature set expected by the scaler and model
X_test_ann = X_test_ann[X_train_ann.columns] 

# 3. Scale and Predict
X_test_ann_scaled = scaler_ann.transform(X_test_ann)
ann_preds = ann_model.predict(X_test_ann_scaled).flatten()

print(f"ANN Predictions generated. Test features used: {X_test_ann.shape[0]}")

446/446 ━━━━━━━━━━━━━━━━━━━━ 1s 2ms/step
ANN Predictions generated. Test features used: 14254


## 3. Model II: Univariate LSTM (Long Short-Term Memory)

**Explanation:** LSTM is designed to capture **temporal dynamics**. We use a **univariate** approach here, feeding only a window of past scaled Close prices to predict the next return, so the model relies entirely on sequential memory. This requires **data windowing** and **3D input reshaping**.

### 3.1 Data Windowing and Reshaping
We create sequences of `TIME_STEPS` (lookback window) of scaled price data to predict the next day's Log Return.
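
To make the windowing and reshaping concrete, here is a minimal sketch on a six-element toy series with a lookback of 3. The helper `make_windows` is a hypothetical, simplified stand-in for illustration; it returns target positions instead of the Date labels used in the notebook's own function.

```python
import numpy as np

def make_windows(series, time_steps):
    """Build overlapping windows and the position of each window's target."""
    n_samples = len(series) - time_steps
    X = np.array([series[i:i + time_steps] for i in range(n_samples)])
    # The target for window i is the element right after the window.
    target_pos = np.arange(time_steps, len(series))
    # LSTM layers expect 3D input: (samples, timesteps, features).
    return X.reshape(-1, time_steps, 1), target_pos

series = np.arange(6, dtype=float)  # toy values 0.0 .. 5.0
X, target_pos = make_windows(series, time_steps=3)

print(X.shape)        # (3, 3, 1)
print(X[:, :, 0])     # rows: [0 1 2], [1 2 3], [2 3 4]
print(target_pos)     # [3 4 5]
```

A series of length `n` yields `n - time_steps` samples, which is why the first `TIME_STEPS` rows of any dataset have no corresponding prediction.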

In [8]:
# Cell 5: LSTM Data Windowing Function and Training Prep

TIME_STEPS = 20 # Lookback window: last 20 trading days

def create_lstm_dataset(data, time_steps):
    X, y_index = [], []
    # data should be the scaled feature (Close Price) series/df
    for i in range(len(data) - time_steps):
        # Features X: The last 'time_steps' of scaled Close Prices
        X.append(data.iloc[i:(i + time_steps), 0].values) 
        # Target y_index: The index (Date) immediately following the sequence
        y_index.append(data.iloc[i + time_steps].name)
    return np.array(X), np.array(y_index)

# Prepare Training Data (Scale Close Price)
df_lstm_train = df_train[['Close']].copy()
scaler_lstm = MinMaxScaler()
df_lstm_train['Close_Scaled'] = scaler_lstm.fit_transform(df_lstm_train[['Close']])

X_train_lstm, y_train_index = create_lstm_dataset(df_lstm_train.drop(columns=['Close']), TIME_STEPS)

# Map index back to y_train to get the actual target values
y_train_lstm = df_train.loc[y_train_index, TARGET_COL].values

# Reshape X for LSTM input: [samples, timesteps, features=1]
X_train_lstm = np.reshape(X_train_lstm, (X_train_lstm.shape[0], X_train_lstm.shape[1], 1))

print(f"LSTM Train X shape (samples, timesteps, features): {X_train_lstm.shape}")

LSTM Train X shape (samples, timesteps, features): (57291, 20, 1)


In [9]:
# Cell 6: Prepare Test Data for LSTM (initial attempt, superseded by the corrected cell below)

# Need a continuous sequence of data encompassing the lookback window plus the test period.
df_full = pd.concat([df_train[['Close']], df_test[['Close']]], axis=0)

# Scale the Close Price (using scaler_lstm fitted on train data)
df_full['Close_Scaled'] = scaler_lstm.transform(df_full[['Close']])

# Isolate the relevant test section for windowing (Start back by TIME_STEPS)
test_data_windowed = df_full.iloc[-(len(df_test) + TIME_STEPS):].drop(columns=['Close'])

X_test_lstm_temp, y_test_index = create_lstm_dataset(test_data_windowed, TIME_STEPS)

# Reshape X for LSTM input
X_test_lstm = np.reshape(X_test_lstm_temp, (X_test_lstm_temp.shape[0], X_test_lstm_temp.shape[1], 1))

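# Get the actual Log_Return target for the test set.
# NOTE: this label-based lookup returns 716600 values for 14340 windows (see output below),
# which suggests the Date index is not unique; the corrected cell that follows slices by position.
y_test_lstm = df_test.loc[y_test_index, TARGET_COL].values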

print(f"LSTM Test X shape: {X_test_lstm.shape}")
print(f"LSTM Test y shape: {y_test_lstm.shape}")

LSTM Test X shape: (14340, 20, 1)
LSTM Test y shape: (716600,)
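
The mismatch in this output (14340 windows but 716600 targets) is consistent with a `Date` index that contains repeated labels, as would happen if rows for several symbols share each date. That cause is an assumption, not something the output confirms. The sketch below reproduces the effect on a hypothetical four-row frame: a label-based `.loc` lookup returns every row matching each requested label, while positional slicing returns exactly one value per row.

```python
import pandas as pd

# Hypothetical frame: two symbols share each date, so the index is not unique.
dates = pd.to_datetime(["2024-01-01", "2024-01-01", "2024-01-02", "2024-01-02"])
df = pd.DataFrame(
    {"Symbol": ["AAA", "BBB", "AAA", "BBB"],
     "Log_Return": [0.01, -0.02, 0.03, 0.00]},
    index=dates,
)

labels = df.index  # 4 labels, but only 2 distinct dates

# Label-based lookup: each label matches 2 rows, so 4 labels return 8 values.
by_label = df.loc[labels, "Log_Return"]

# Positional lookup: one value per row.
by_position = df["Log_Return"].iloc[-len(labels):]

print(len(by_label), len(by_position))
print(df.index.is_unique)
```

Checking `df.index.is_unique` before any label-based target lookup is a cheap way to catch this situation early.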


In [24]:
# Cell 6: Prepare Test Data for LSTM (FIXED)

# Need a continuous sequence of data encompassing the lookback window plus the test period.
df_full = pd.concat([df_train[['Close', 'Log_Return']], df_test[['Close', 'Log_Return']]], axis=0)

# Scale the Close Price (using scaler_lstm fitted on train data)
df_full['Close_Scaled'] = scaler_lstm.transform(df_full[['Close']])

# Isolate the relevant test section for windowing (Start back by TIME_STEPS)
# This slice contains all the data required to form the sequences for the entire test period.
test_data_windowed = df_full.iloc[-(len(df_test) + TIME_STEPS):]

# 1. Create X_test_lstm (Features)
# Pass only the scaled column for feature array creation
X_test_lstm_temp, y_test_index = create_lstm_dataset(test_data_windowed.drop(columns=['Log_Return', 'Close']), TIME_STEPS)

# Reshape X for LSTM input
X_test_lstm = np.reshape(X_test_lstm_temp, (X_test_lstm_temp.shape[0], X_test_lstm_temp.shape[1], 1))

# 2. Create y_test_lstm (Target)
# Take the targets by position rather than by Date label. Each window's target is the row
# immediately after that window, so the targets are the last len(X_test_lstm) rows of df_full.
y_true_source = df_full['Log_Return']
y_test_lstm = y_true_source.tail(len(X_test_lstm)).values


print(f"LSTM Test X shape: {X_test_lstm.shape}")
print(f"LSTM Test y shape (Fixed Length): {y_test_lstm.shape}")

LSTM Test X shape: (14340, 20, 1)
LSTM Test y shape (Fixed Length): (14340,)


In [25]:
# Cell 7: Build and Train Basic LSTM Model

lstm_model = Sequential([
    InputLayer(shape=(TIME_STEPS, 1)),
    LSTM(50, return_sequences=False),
    Dropout(0.2),
    Dense(1)
])

lstm_model.compile(optimizer='adam', loss='mse')

# Use Early Stopping to prevent overfitting
es = EarlyStopping(monitor='val_loss', patience=10, verbose=1, mode='min', restore_best_weights=True)

print("Starting LSTM training...")
lstm_history = lstm_model.fit(
    X_train_lstm, y_train_lstm,
    epochs=100, 
    batch_size=32, 
    validation_split=0.1, 
    callbacks=[es],
    verbose=0
)

print("LSTM Training complete.")

# Generate Predictions
lstm_preds = lstm_model.predict(X_test_lstm).flatten()

print(f"LSTM Predictions shape: {lstm_preds.shape}")

Starting LSTM training...
Epoch 12: early stopping
Restoring model weights from the end of the best epoch: 2.
LSTM Training complete.
449/449 ━━━━━━━━━━━━━━━━━━━━ 4s 8ms/step
LSTM Predictions shape: (14340,)
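
The training log above reports a stop at epoch 12 with weights restored from epoch 2, which follows from `patience=10`. The sketch below is a simplified, framework-free illustration of patience-based stopping, not the Keras implementation. The function `early_stopping_epoch` and the validation losses are hypothetical, and it assumes any strict decrease counts as an improvement.

```python
def early_stopping_epoch(val_losses, patience):
    """Return (stop_epoch, best_epoch), both 1-indexed, for patience-based stopping."""
    best_loss = float("inf")
    best_epoch = 0
    wait = 0  # consecutive epochs without improvement
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best_loss:
            best_loss, best_epoch, wait = loss, epoch, 0
        else:
            wait += 1
            if wait >= patience:
                return epoch, best_epoch
    # Patience never ran out: training ends at the last epoch.
    return len(val_losses), best_epoch

# Hypothetical validation losses: the minimum occurs at epoch 2, then no improvement.
val_losses = [0.30, 0.20] + [0.25] * 20
stop_epoch, best_epoch = early_stopping_epoch(val_losses, patience=10)
print(stop_epoch, best_epoch)  # 12 2
```

A best epoch this early indicates that validation loss stopped improving almost immediately, which is worth keeping in mind when interpreting the LSTM's test error.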


## 4. Evaluation and Consolidation

We evaluate both the ANN and the basic LSTM models and save the results for comprehensive comparison in Notebook 10.
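
For reference, the three metrics reported in this section can be computed on a toy example as follows. The arrays are made-up values for illustration, not notebook results.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error

# Hypothetical true and predicted log returns for illustration only.
y_true = np.array([0.01, -0.02, 0.00, 0.03])
y_pred = np.array([0.00, -0.01, 0.01, 0.01])

mse = mean_squared_error(y_true, y_pred)   # mean of squared errors
mae = mean_absolute_error(y_true, y_pred)  # mean of absolute errors
rmse = np.sqrt(mse)                        # same units as the target

print(f"MSE={mse:.6f}  MAE={mae:.6f}  RMSE={rmse:.6f}")
```

RMSE is never smaller than MAE on the same errors, and a large gap between the two points to a few large misses dominating the squared error.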

In [26]:
# Cell 8: Define Evaluation and Consolidation Function

def evaluate_and_consolidate(y_true, y_pred, model_name, path):
    
    mse = mean_squared_error(y_true, y_pred)
    mae = mean_absolute_error(y_true, y_pred)
    rmse = np.sqrt(mse)
    
    new_results = pd.DataFrame([{'Model': model_name, 'MSE': mse, 'MAE': mae, 'RMSE': rmse}])
        
    # Load previous results (from classical/ML models) and append new results
    if os.path.exists(path):
        results_df = pd.read_csv(path)
    elif os.path.exists(ML_RESULTS_PATH):
        # Fallback to load ML results if the DL file is new
        results_df = pd.read_csv(ML_RESULTS_PATH)
    else:
        # Fallback to check Classical results if all others fail
        CLASSICAL_RESULTS_PATH = '../models/classical_model_results.csv'
        if os.path.exists(CLASSICAL_RESULTS_PATH):
            results_df = pd.read_csv(CLASSICAL_RESULTS_PATH)
        else:
            results_df = pd.DataFrame(columns=['Model', 'MSE', 'MAE', 'RMSE'])
        
    results_df = pd.concat([results_df, new_results], ignore_index=True)
    results_df = results_df.drop_duplicates(subset=['Model'], keep='last')
    results_df.to_csv(path, index=False)
    return results_df


# ---------------------------------------------------------------------------
# 1. Evaluate ANN
# ---------------------------------------------------------------------------

# y_true_ann is derived using slicing (.tail()) to guarantee length alignment.
y_true_ann = df_test_ann_features[TARGET_COL].tail(len(ann_preds)).values

if len(y_true_ann) == len(ann_preds):
    evaluate_and_consolidate(y_true_ann, ann_preds, 'ANN', MODEL_RESULTS_PATH)
    print("ANN model evaluated successfully.")
else:
    print(f"ERROR: ANN evaluation failed due to inconsistent sample length. True: {len(y_true_ann)}, Predicted: {len(ann_preds)}")


# ---------------------------------------------------------------------------
# 2. Evaluate LSTM
# ---------------------------------------------------------------------------

# y_test_lstm was built in Cell 6 with positional slicing (.tail()), so it should have
# the same length as lstm_preds. The check below guards against a mismatch.
if len(y_test_lstm) == len(lstm_preds):
    final_results = evaluate_and_consolidate(y_test_lstm, lstm_preds, 'LSTM_Univariate', MODEL_RESULTS_PATH)
    print("LSTM model evaluated successfully.")
else:
    # If the length is still inconsistent, there is a fundamental bug in Cell 6's windowing logic.
    print(f"ERROR: LSTM evaluation failed due to inconsistent sample length. True: {len(y_test_lstm)}, Predicted: {len(lstm_preds)}. Check Cell 6.")
    final_results = pd.DataFrame() 


print("\n--- Deep Learning I Model Evaluation (Log Returns) on Test Set ---")
if not final_results.empty:
    print(final_results.set_index('Model'))
else:
    print("Evaluation output consolidated (check results file for full table).")

print("Results consolidated in: ", MODEL_RESULTS_PATH)
print("Proceed to Notebook 08 for Advanced DL models (Bi-LSTM / Multivariate). ")

ANN model evaluated successfully.
LSTM model evaluated successfully.

--- Deep Learning I Model Evaluation (Log Returns) on Test Set ---
                      MSE       MAE      RMSE
Model                                        
ARIMA            0.000261  0.010981  0.016141
SARIMA           0.000260  0.010954  0.016126
Prophet          0.000273  0.011414  0.016520
KNN              0.000185  0.008851  0.013611
SVR              0.000026  0.004653  0.005069
RandomForest     0.000004  0.000026  0.001980
XGBoost          0.000007  0.000278  0.002640
ANN              0.002000  0.044632  0.044727
LSTM_Univariate  0.000261  0.010979  0.016143
Results consolidated in:  ../models/dl_model_results.csv
Proceed to Notebook 08 for Advanced DL models (Bi-LSTM / Multivariate). 


## Summary

### What We Accomplished:

  1. **Deep Learning Foundation**: Implemented fundamental neural network architectures (ANN and LSTM)

  2. **ANN Implementation**: Applied feedforward networks to feature-engineered tabular data

  3. **LSTM Basics**: Introduced sequential modeling with univariate time series approach

  4. **Data Windowing**: Created temporal sequences for LSTM training and prediction

  5. **Model Comparison**: Evaluated tabular vs sequential deep learning approaches

  6. **Performance Benchmarking**: Established deep learning baselines for advanced architectures

### Key Deep Learning Insights:

  - **ANN Performance**: The feedforward network had the highest test error in the comparison table (RMSE 0.044727), so the engineered features did not translate into accurate predictions with this architecture and preprocessing
  
  - **LSTM Sequential Learning**: The univariate LSTM over 20-day windows reached RMSE 0.016143, essentially matching ARIMA (0.016141) rather than improving on it
  
  - **Feature vs Temporal**: Compared feature-based (ANN) vs sequence-based (LSTM) modeling approaches
  
  - **Data Preprocessing**: Scaling and windowing are prerequisites for training both networks
  
  - **Early Stopping**: LSTM training stopped at epoch 12 and restored the weights from epoch 2, the epoch with the lowest validation loss

### Technical Implementation Notes:

  - **Neural Architecture**: Multi-layer networks with dropout for regularization
  
  - **Data Preparation**: Proper scaling, feature engineering, and sequence windowing
  
  - **Training Strategy**: Early stopping and validation splits for robust model development
  
  - **Input Shaping**: Windowed sequences reshaped into the 3D `(samples, timesteps, features)` tensors that LSTM layers expect

### Deep Learning Framework:

  - **TensorFlow/Keras**: Industry-standard framework for neural network implementation
  
  - **Sequential API**: Clean model building for straightforward architectures
  
  - **Temporal Modeling**: Foundation for advanced LSTM, BiLSTM, and GRU implementations
  
  - **Scalable Architecture**: Framework supporting complex multi-layer networks

### Next Steps:

**Notebook 08**: We'll advance to sophisticated deep learning architectures including:
- Bidirectional LSTM for enhanced temporal modeling
- Multivariate LSTM incorporating multiple features
- Advanced regularization and optimization techniques
- Ensemble deep learning approaches

---

### *Next Notebook Preview*

Building on fundamental deep learning concepts, we'll explore advanced neural architectures that combine the best of sequential modeling with sophisticated feature integration for enhanced financial prediction accuracy.

---

#### About This Project

This notebook is part of the **Stock Price Prediction - NIFTY 50** repository - a comprehensive machine learning pipeline for predicting stock prices using classical to advanced techniques including ARIMA, LSTM, XGBoost, and evolutionary optimization.

**Repository:** [`stock-price-prediction-nifty50`](https://github.com/prakash-ukhalkar/stock-price-prediction-nifty50)

**Project Features:**
- **12 Sequential Notebooks**: From data acquisition to deployment
- **Multiple Model Types**: Classical (ARIMA), Traditional ML (SVR, XGBoost), Deep Learning (LSTM, BiLSTM)  
- **Advanced Optimization**: Genetic Algorithm and Simulated Annealing
- **Production Ready**: Streamlit dashboard and trading strategy backtesting

**Notebook Sequence:**
1. COMPLETE - Data Acquisition and Preprocessing
2. COMPLETE - EDA and Time Series Foundations
3. COMPLETE - Feature Engineering and Technical Analysis
4. COMPLETE - Classical Models (ARIMA, SARIMA, Prophet)
5. COMPLETE - Traditional ML (KNN and SVR)
6. COMPLETE - Ensemble ML (XGBoost, Random Forest)
7. COMPLETE - **Deep Learning Basics (ANN, LSTM)** (Current)
8. NEXT - Advanced Deep Learning (BiLSTM, Multivariate)
9. PENDING - Evolutionary Optimization (GA, SA)

#### **Author**

**Prakash Ukhalkar**  
[![GitHub](https://img.shields.io/badge/GitHub-prakash--ukhalkar-blue?style=flat&logo=github)](https://github.com/prakash-ukhalkar)

---

<div align="center">
  <sub>Built with care for the quantitative finance and data science community</sub>
</div>