"""
# 📈 Professional Stock Market Prediction with LSTM

This notebook provides a complete, production-ready guide to implementing a sophisticated stock prediction model using Long Short-Term Memory (LSTM) networks. The methodology is synthesized from the findings of three key research papers, focusing on the highest-performing architecture and techniques to create a practical and robust tool.

**Core Methodology:** The implementation is primarily based on the work of **Shobayo et al. (2025)**, who achieved an **R-squared of 0.993** and a Mean Absolute Percentage Error (MAPE) of 1.33% with an optimized LSTM model[cite: 220, 221]. This model uses a 60-day time step and incorporates On-Balance Volume (OBV) as a key feature[cite: 221].

### Key Features of this Notebook:
* **Interactive & Customizable:** An interactive widget allows you to easily change the stock ticker and other key parameters[cite: 183].
* **End-to-End Pipeline:** Covers the entire workflow from data acquisition and feature engineering to model training, evaluation, and visualization.
* **Automated Evaluation:** Automatically evaluates the LSTM model against a naive baseline to provide a clear performance benchmark.
* **Robust & Reliable:** Includes input validation, error handling, and clear success/failure feedback for a smooth user experience.
* **Practical Outputs:** Provides functionality to save the trained model, load it for future use, and export predictions to a CSV file.

### Theoretical Backing (Adaptive Markets Hypothesis)
While the traditional Efficient Market Hypothesis (EMH) suggests that market prices are unpredictable, this notebook's approach is better supported by the **Adaptive Markets Hypothesis (AMH)**[cite: 158]. The AMH posits that markets are not perfectly efficient and that temporary inefficiencies arise, which advanced machine learning models like LSTM can learn to exploit for prediction[cite: 159].

---
"""

# @title
# # 📚 Table of Contents
#
# 1.  [Setup & Dependencies](#section1)
# 2.  [Configuration & Parameters](#section2)
# 3.  [Data Acquisition & Exploration](#section3)
# 4.  [Feature Engineering (Technical Indicators)](#section4)
# 5.  [Data Preprocessing](#section5)
# 6.  [LSTM Model Implementation](#section6)
# 7.  [Model Training](#section7)
# 8.  [Prediction & Performance Evaluation](#section8)
#     * [8.1 Baseline Model Comparison](#subsection8_1)
#     * [8.2 Visualizing Results](#subsection8_2)
#     * [8.3 Exporting Predictions](#subsection8_3)
# 9.  [Saving and Loading the Model](#section9)
# 10. [Live Prediction on New Data](#section10)
# 11. [Advanced Topics & Future Work](#section11)
#
# ---

"""
# <a id="section1"></a>
# ## 1. Setup & Dependencies
"""

# @markdown ### What's happening here?
# @markdown We are installing and importing all the necessary Python libraries for our project.
# @markdown - `yfinance`: To fetch historical stock market data from Yahoo Finance[cite: 106, 488].
# @markdown - `pandas` & `numpy`: For data manipulation and numerical operations[cite: 129, 418, 419].
# @markdown - `scikit-learn`: For data preprocessing (scaling) and evaluation metrics[cite: 129, 420].
# @markdown - `tensorflow`: The deep learning framework to build and train our LSTM model[cite: 129].
# @markdown - `plotly`: To create interactive and professional-looking visualizations.
# @markdown
# @markdown The code block also checks for GPU availability to accelerate model training, a key performance optimization technique mentioned in the research[cite: 459]. It also sets random seeds for NumPy and TensorFlow so that runs are more repeatable (some GPU operations can still introduce small run-to-run differences).

# @markdown ---
# @markdown **Run this cell to install and import dependencies.**

# Install libraries
!pip install yfinance pandas numpy scikit-learn tensorflow plotly kaleido -q

# Import libraries
import pandas as pd
import numpy as np
import yfinance as yf
from sklearn.preprocessing import MinMaxScaler
from sklearn.metrics import mean_squared_error, mean_absolute_error, r2_score
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense, Dropout
from tensorflow.keras.callbacks import EarlyStopping, ModelCheckpoint
from tensorflow.keras.optimizers import Adam
import plotly.graph_objects as go
from plotly.subplots import make_subplots
from google.colab import drive, files
import os
import gc

# Suppress TensorFlow warnings
import logging
logging.getLogger('tensorflow').setLevel(logging.ERROR)

# --- GPU Check ---
device_name = tf.test.gpu_device_name()
if device_name != '/device:GPU:0':
  print('⚠️ GPU device not found. Training will be on CPU.')
else:
  print(f'✅ Found GPU at: {device_name}. Training will be GPU-accelerated.')

# --- Reproducibility ---
def set_seeds(seed=42):
    os.environ['PYTHONHASHSEED'] = str(seed)
    tf.random.set_seed(seed)
    np.random.seed(seed)

set_seeds()
print("✅ Dependencies installed and imported successfully.")

"""
# <a id="section2"></a>
# ## 2. Configuration & Parameters
"""

# @markdown ### What's happening here?
# @markdown This interactive form centralizes all key parameters, making the notebook highly reusable and easy to customize. The default values are set to the **highest-performing configuration** identified in the research by Shobayo et al. (2025).
# @markdown
# @markdown - **Model:** Optimised LSTM (60-Day Time Step with OBV)[cite: 221, 303].
# @markdown - **Stock:** Enter any valid Yahoo Finance ticker symbol. `AAPL` is the default.
# @markdown - **Date Range:** A long period provides more data for robust training[cite: 512].
# @markdown - **Window Size:** Set to 60, found to be empirically optimal in multiple studies[cite: 62, 270, 341].
# @markdown - **Features:** The list includes `Close` and `OBV`, the combination that yielded an R² of 0.993[cite: 221].
# @markdown - **Hyperparameters:** The values for units, dropout rate, and learning rate are taken directly from the optimized model specifications[cite: 303].

# --- General Configuration ---
STOCK_TICKER = "AAPL" # @param {type:"string"}
START_DATE = "2010-01-01" # @param {type:"date"}
END_DATE = "2024-06-07" # @param {type:"date"}
TRAIN_TEST_SPLIT_RATIO = 0.85 # @param {type:"slider", min:0.7, max:0.9, step:0.05}

# --- Model & Feature Configuration ---
WINDOW_SIZE = 60 # @param {type:"integer"}
# The best model used 'Close' and 'OBV'[cite: 221]. You can add more features here.
# e.g., ['Close', 'OBV', 'RSI', 'MACD']
FEATURES_TO_USE = ['Close', 'OBV']

# --- LSTM Hyperparameters (from Shobayo et al., 2025) ---
LSTM_UNITS = 104 # @param {type:"integer"}
DROPOUT_RATE = 0.266 # @param {type:"number"}
LEARNING_RATE = 0.00369 # @param {type:"number"}
LOSS_FUNCTION = 'mean_squared_error'
OPTIMIZER = Adam(learning_rate=LEARNING_RATE)

# --- Training Configuration ---
EPOCHS = 100 # @param {type:"integer"}
BATCH_SIZE = 32 # @param {type:"integer"}
EARLY_STOPPING_PATIENCE = 10

print("✅ Configuration parameters set.")
print(f"Target Stock: {STOCK_TICKER}")
print(f"Features for Model: {FEATURES_TO_USE}")
print(f"Optimal Window Size: {WINDOW_SIZE} days")
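"""
Before downloading anything, it can help to sanity-check these values. The sketch below is a hypothetical helper (`check_config` and `KNOWN_COLUMNS` are our own names, not part of the notebook or the cited research) that flags unknown feature names and reports how many training and test sequences the Section 5 pipeline would produce for a given number of rows. The row count in the example is an assumption; the real number depends on the date range and on the rows dropped during feature engineering.

```python
# Hypothetical helper (not from the source papers): check configuration values and
# report how many sequences the Section 5 pipeline would produce for n_rows rows.
KNOWN_COLUMNS = {"Open", "High", "Low", "Close", "Volume",
                 "SMA_15", "SMA_45", "RSI", "MACD", "Signal_Line", "OBV"}


def check_config(features, window_size, split_ratio, n_rows):
    problems = []
    unknown = [f for f in features if f not in KNOWN_COLUMNS]
    if unknown:
        problems.append(f"unknown feature columns: {unknown}")
    if window_size < 1:
        problems.append("WINDOW_SIZE must be >= 1")
    if not 0.0 < split_ratio < 1.0:
        problems.append("TRAIN_TEST_SPLIT_RATIO must be strictly between 0 and 1")
    # One sequence per row after the first window_size rows ...
    n_sequences = max(n_rows - window_size, 0)
    # ... split chronologically with int(), like split_index in Section 5.
    n_train = int(n_sequences * split_ratio)
    n_test = n_sequences - n_train
    if n_train == 0 or n_test == 0:
        problems.append("not enough rows for both a training and a test set")
    return problems, n_train, n_test


# Example: an assumed ~14 years of trading days with the notebook's defaults.
problems, n_train, n_test = check_config(["Close", "OBV"], 60, 0.85, n_rows=3600)
print(problems, n_train, n_test)
```
"""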

"""
# <a id="section3"></a>
# ## 3. Data Acquisition & Exploration
"""

# @markdown ### What's happening here?
# @markdown We use the `yfinance` library to download historical stock data. This section includes robust error handling to ensure a valid ticker is provided and data is successfully fetched.
# @markdown
# @markdown 1.  **Fetch Data:** `yf.download()` pulls the historical market data[cite: 106].
# @markdown 2.  **Input Validation:** A `try-except` block validates the ticker symbol. If no data is returned, it prints a clear error message, and every later cell that needs the data is skipped (each is guarded by `if not data.empty`).
# @markdown 3.  **Inspect Data:** We display the first few rows and check for missing values, which is a critical preprocessing step[cite: 118, 325].
# @markdown 4.  **Visualize:** An interactive candlestick chart helps in understanding the data's trend and volatility.

# --- Fetch Data with Error Handling ---
data = pd.DataFrame()
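try:
    data = yf.download(STOCK_TICKER, start=START_DATE, end=END_DATE, progress=False)
    # Recent yfinance versions return (Price, Ticker) MultiIndex columns even for a single
    # ticker; keep only the price level so that data['Close'] is a Series, not a DataFrame.
    if isinstance(data.columns, pd.MultiIndex):
        data.columns = data.columns.get_level_values(0)
    if data.empty:
        raise ValueError(f"No data found for ticker '{STOCK_TICKER}'. It may be an invalid symbol or delisted.")
    print(f"✅ Successfully downloaded {len(data)} data points for {STOCK_TICKER}.")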
except Exception as e:
    print(f"❌ Error fetching data: {e}")

# --- Proceed only if data was downloaded successfully ---
if not data.empty:
    # --- Inspect Data ---
    print("\n--- First 5 Rows of Data ---")
    display(data.head())

    print("\n--- Data Info & Missing Values ---")
    # Handling missing values is a critical preprocessing step[cite: 118, 326].
    if data.isnull().sum().any():
        print("Missing values found. Applying dropna().")
        data.dropna(inplace=True)
    else:
        print("No missing values found.")


    # --- Visualize Historical Data ---
    fig_explore = go.Figure(data=[go.Candlestick(x=data.index,
                                               open=data['Open'],
                                               high=data['High'],
                                               low=data['Low'],
                                               close=data['Close'],
                                               name='Candlestick')])

    fig_explore.update_layout(
        title=f'Historical Price Data for {STOCK_TICKER}',
        xaxis_title='Date',
        yaxis_title='Stock Price (USD)',
        xaxis_rangeslider_visible=False,
        template='plotly_dark'
    )
    fig_explore.show()
else:
    print("\nStopping execution due to data fetching failure.")
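"""
Recent `yfinance` releases can return columns as a two-level `MultiIndex` (price field, ticker) even for a single ticker. In that case `data['Close']` is a one-column DataFrame rather than a Series, and later arithmetic can silently change shape. The sketch below builds a synthetic frame with that layout (no network access; the numbers are made up) and shows one way to flatten it. Whether your installed version returns this layout is something to check rather than assume.

```python
import numpy as np
import pandas as pd

# Synthetic frame mimicking the (Price, Ticker) column layout; values are illustrative only.
index = pd.date_range("2024-01-01", periods=3, freq="B", name="Date")
columns = pd.MultiIndex.from_product([["Close", "Volume"], ["AAPL"]], names=["Price", "Ticker"])
raw = pd.DataFrame(np.array([[100.0, 1000], [101.0, 1200], [99.5, 900]]),
                   index=index, columns=columns)

print(type(raw["Close"]).__name__)  # DataFrame: one column per ticker


def flatten_single_ticker(df):
    # Keep only the price-field level so df['Close'] becomes a Series again.
    if isinstance(df.columns, pd.MultiIndex):
        df = df.copy()
        df.columns = df.columns.get_level_values(0)
    return df


flat = flatten_single_ticker(raw)
print(type(flat["Close"]).__name__)  # Series
```

The same check is cheap to run right after `yf.download()` on real data.
"""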

"""
# <a id="section4"></a>
# ## 4. Feature Engineering (Technical Indicators)
"""

# @markdown ### What's happening here?
# @markdown We derive new features (technical indicators) from the raw price and volume data. The research highlights that these indicators provide the model with crucial insights into market dynamics, momentum, and volatility, thereby enhancing predictive accuracy[cite: 148, 305].
# @markdown
# @markdown We implement functions for each indicator mentioned in the source documents, with their formulas included as comments [cite: 306-320]. The critical feature for our best-performing model is **On-Balance Volume (OBV)**[cite: 221].

if not data.empty:
    # --- Indicator Implementation ---
    def calculate_sma(data, window):
        # Formula: SMA = (P1 + P2 + ... + Pn) / n [cite: 307]
        return data['Close'].rolling(window=window).mean()

    def calculate_rsi(data, window=14):
        # Formula: RSI = 100 - (100 / (1 + RS)) where RS = Avg Gain / Avg Loss [cite: 310]
        # The averages here are simple rolling means (Wilder's original RSI uses exponential smoothing).
        delta = data['Close'].diff()
        gain = (delta.where(delta > 0, 0)).rolling(window=window).mean()
        loss = (-delta.where(delta < 0, 0)).rolling(window=window).mean()
        rs = gain / loss
        return 100 - (100 / (1 + rs))

    def calculate_macd(data, short_window=12, long_window=26, signal_window=9):
        # Formula: MACD = EMA_short - EMA_long [cite: 312]
        short_ema = data['Close'].ewm(span=short_window, adjust=False).mean()
        long_ema = data['Close'].ewm(span=long_window, adjust=False).mean()
        macd = short_ema - long_ema
        signal_line = macd.ewm(span=signal_window, adjust=False).mean()
        return macd, signal_line

    def calculate_obv(data):
        # Formula: If Close > Prev_Close, OBV = Prev_OBV + Volume [cite: 316]
        # If Close < Prev_Close, OBV = Prev_OBV - Volume [cite: 317]; if unchanged, OBV = Prev_OBV
        obv = (np.sign(data['Close'].diff()) * data['Volume']).fillna(0).cumsum()
        return obv

    # --- Apply Indicators to DataFrame ---
    data['SMA_15'] = calculate_sma(data, 15)
    data['SMA_45'] = calculate_sma(data, 45)
    data['RSI'] = calculate_rsi(data)
    data['MACD'], data['Signal_Line'] = calculate_macd(data)
    data['OBV'] = calculate_obv(data)

    # Drop initial NaN values created by rolling windows
    data.dropna(inplace=True)

    print("✅ Technical indicators calculated and added to the dataframe.")
    print("--- Data with New Features ---")
    display(data.head())
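"""
To make the OBV rule concrete, here is a tiny hand-checkable example using the same vectorised expression as `calculate_obv` above (sign of the daily close change times volume, cumulatively summed). The five prices and volumes are invented for illustration.

```python
import numpy as np
import pandas as pd

toy = pd.DataFrame({
    "Close": [10.0, 11.0, 10.5, 10.5, 12.0],
    "Volume": [100, 200, 150, 120, 300],
})

# Up day: +Volume, down day: -Volume, unchanged close: 0. The first row has no
# previous close, so its NaN contribution is replaced by 0 before summing.
direction = np.sign(toy["Close"].diff())
toy["OBV"] = (direction * toy["Volume"]).fillna(0).cumsum()
print(toy)
# OBV column: 0, 200, 50, 50, 350
```

Note that OBV is a running total, so its absolute level depends on where the data starts; only its changes carry information about volume flow.
"""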

"""
# <a id="section5"></a>
# ## 5. Data Preprocessing
"""

# @markdown ### What's happening here?
# @markdown This is a critical phase to prepare the data for our LSTM model, following the pipeline described in the research[cite: 323].
# @markdown
# @markdown 1.  **Feature Selection:** We select the subset of features specified in our configuration (`Close` and `OBV` by default)[cite: 228].
# @markdown 2.  **Normalization:** We use `MinMaxScaler` to scale all selected features to a range of [0, 1]. This is vital for LSTM models as it helps with efficient optimization[cite: 152, 334]. We must save the `scaler` objects to reverse the transformation on our predictions later[cite: 192].
# @markdown 3.  **Sequence Generation (Sliding Window):** LSTMs require data in a sequential format. A "sliding window" of `WINDOW_SIZE` (60) days of feature data is used as input (`X`) to predict the closing price of the next day (`y`)[cite: 122, 338].
# @markdown 4.  **Data Splitting:** The data is split chronologically into training and testing sets to prevent data leakage[cite: 126]. The 85/15 split ratio is used as per Shobayo et al.[cite: 517].
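"""
The sliding-window step is easiest to see on a toy array. The sketch below reproduces the indexing used by `create_sequences` in this section on a 6-row, 2-feature matrix with a window of 3 (sizes chosen only for readability): input `i` covers rows `i-3` to `i-1`, and its target is column 0 of row `i`.

```python
import numpy as np

# Toy "scaled feature" matrix: 6 time steps x 2 features (think Close, OBV).
toy = np.arange(12, dtype=float).reshape(6, 2)
window = 3

X, y = [], []
for i in range(window, len(toy)):
    X.append(toy[i - window:i, :])   # the previous `window` rows, all features
    y.append(toy[i, 0])              # next-step value of column 0
X, y = np.array(X), np.array(y)

print(X.shape)     # (3, 3, 2): (samples, time steps, features), the layout Keras LSTMs expect
print(y.tolist())  # [6.0, 8.0, 10.0]
```

With `N` rows and a window of `W`, the loop yields `N - W` samples, which is why the first 60 rows of the real data never appear as targets.
"""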

if not data.empty:
    # --- 1. Feature Selection ---
    features = data[FEATURES_TO_USE]
    print(f"Selected features for model: {features.columns.to_list()}")

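    # --- 2. Normalization ---
    # Separate scalers for features and target. Both are fitted on the training rows only,
    # so the min/max of the test period do not leak into training. The first
    # split_index + WINDOW_SIZE rows (split_index as computed in step 4) feed the training sequences.
    train_rows = int((len(features) - WINDOW_SIZE) * TRAIN_TEST_SPLIT_RATIO) + WINDOW_SIZE
    feature_scaler = MinMaxScaler(feature_range=(0, 1))
    target_scaler = MinMaxScaler(feature_range=(0, 1))
    feature_scaler.fit(features.iloc[:train_rows])
    target_scaler.fit(data[['Close']].iloc[:train_rows])

    # Transform the full series; test values may fall outside [0, 1], which is expected
    scaled_features = feature_scaler.transform(features)
    scaled_target = target_scaler.transform(data[['Close']])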

    print("\n--- Data after Min-Max Scaling ---")
    print("Shape of scaled data:", scaled_features.shape)

    # --- 3. Sequence Generation ---
    def create_sequences(data, window_size):
        X, y = [], []
        for i in range(window_size, len(data)):
            X.append(data[i-window_size:i, :])
            y.append(data[i, 0]) # Target is the first column ('Close')
        return np.array(X), np.array(y)

    # X comes from the scaled feature matrix; y is taken from the separately scaled
    # 'Close' target so it can later be inverted with target_scaler (y_unsplit is discarded)
    X, y_unsplit = create_sequences(scaled_features, WINDOW_SIZE)
    y = scaled_target[WINDOW_SIZE:].flatten()


    print(f"\n--- Sequences Created (Window Size: {WINDOW_SIZE}) ---")
    print(f"Shape of X (input sequences): {X.shape}")
    print(f"Shape of y (target values): {y.shape}")

    # --- 4. Data Splitting ---
    split_index = int(len(X) * TRAIN_TEST_SPLIT_RATIO)
    X_train, X_test = X[:split_index], X[split_index:]
    y_train, y_test = y[:split_index], y[split_index:]

    print(f"\n--- Data Split (Train: {TRAIN_TEST_SPLIT_RATIO*100:.0f}%, Test: {(1-TRAIN_TEST_SPLIT_RATIO)*100:.0f}%) ---")
    print(f"X_train shape: {X_train.shape}, y_train shape: {y_train.shape}")
    print(f"X_test shape: {X_test.shape}, y_test shape: {y_test.shape}")
    print("✅ Data preprocessing complete.")

    # --- Memory Management ---
    del features, scaled_features, scaled_target, y_unsplit
    gc.collect()
    print("\n🧹 Unused dataframes cleared from memory.")
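"""
A common pitfall is fitting the scaler on the full series, which lets the test period's minimum and maximum leak into training. The sketch below implements min-max scaling by hand with NumPy (standing in for `MinMaxScaler` with its default [0, 1] range) and fits it on the training rows only. The prices are synthetic; test values above the training maximum map to numbers greater than 1, which is expected rather than an error.

```python
import numpy as np

# Synthetic upward-trending closes; the last 4 values play the role of the test period.
prices = np.array([10.0, 12.0, 11.0, 14.0, 13.0, 15.0, 18.0, 17.0, 20.0, 22.0]).reshape(-1, 1)
train_rows = 6

# "fit": statistics come from the training rows only.
data_min = prices[:train_rows].min(axis=0)
data_max = prices[:train_rows].max(axis=0)
scale = data_max - data_min


def transform(x):
    return (x - data_min) / scale


def inverse_transform(x_scaled):
    return x_scaled * scale + data_min


scaled = transform(prices)
print(scaled[:train_rows].min(), scaled[:train_rows].max())  # 0.0 1.0
print(scaled[train_rows:].ravel())  # > 1.0, because prices rose past the training max
restored = inverse_transform(scaled)
print(np.allclose(restored, prices))  # True
```

In the notebook, `MinMaxScaler` plays this role; the point of the sketch is only where the minimum and maximum come from.
"""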


"""
# <a id="section6"></a>
# ## 6. LSTM Model Implementation
"""

# @markdown ### What's happening here?
# @markdown We define the architecture of our LSTM model using TensorFlow's Keras API. The structure is based on the description of the best-performing model from the research, consisting of two stacked LSTM layers, a dropout layer for regularization, and a dense output layer[cite: 256].
# @markdown
# @markdown 1.  **`Sequential` Model:** A linear stack of layers.
# @markdown 2.  **First `LSTM` Layer:** Receives the input sequences. `return_sequences=True` is crucial for stacking another LSTM layer on top[cite: 53, 240].
# @markdown 3.  **Second `LSTM` Layer:** Learns higher-level temporal representations.
# @markdown 4.  **`Dropout` Layer:** A powerful technique to prevent overfitting by randomly setting a fraction of input units to 0 during training[cite: 54, 241, 439]. The rate of 0.266 is used from the optimized model[cite: 303].
# @markdown 5.  **`Dense` Output Layer:** A single neuron to output the predicted (normalized) stock price[cite: 56, 244].
# @markdown 6.  **`compile()`:** We configure the model for training with the Adam optimizer and Mean Squared Error loss function, as recommended[cite: 57, 58, 292, 293].

if not data.empty:
    # --- Define the Model Architecture ---
    model = Sequential()

    # First LSTM Layer
    model.add(LSTM(units=LSTM_UNITS,
                   return_sequences=True,
                   input_shape=(X_train.shape[1], X_train.shape[2])))

    # Second LSTM Layer
    model.add(LSTM(units=LSTM_UNITS,
                   return_sequences=False))

    # Dropout Layer for regularization
    model.add(Dropout(DROPOUT_RATE))

    # Dense Output Layer
    model.add(Dense(units=1))

    # --- Compile the Model ---
    model.compile(optimizer=OPTIMIZER, loss=LOSS_FUNCTION)

    print("✅ LSTM Model built successfully.")
    model.summary()
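"""
As a cross-check on `model.summary()`, the parameter count of a Keras `LSTM` layer follows from its four gates, each with an input kernel, a recurrent kernel and a bias: `4 * units * (input_dim + units + 1)`. The calculation below applies this to the default configuration (104 units, 2 input features). It assumes Keras's default `use_bias=True` and is plain arithmetic, not a call into Keras.

```python
def lstm_params(units, input_dim):
    # 4 gates x (input kernel + recurrent kernel + bias)
    return 4 * units * (input_dim + units + 1)


def dense_params(units_in, units_out):
    # weight matrix + bias vector
    return units_in * units_out + units_out


units, n_features = 104, 2                # Section 2 defaults: FEATURES_TO_USE = ['Close', 'OBV']
first = lstm_params(units, n_features)    # 44,512
second = lstm_params(units, units)        # 86,944: its input is the first layer's 104-dim output
head = dense_params(units, 1)             # 105 (the Dropout layer has no weights)
print(first, second, head, first + second + head)  # 44512 86944 105 131561
```

Adding features (for example `RSI` and `MACD`) only grows the first layer, by `4 * units` parameters per extra feature.
"""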

"""
# <a id="section7"></a>
# ## 7. Model Training
"""

# @markdown ### What's happening here?
# @markdown Now we train our compiled LSTM model on the preprocessed training data.
# @markdown
# @markdown 1.  **Callbacks:** We define two helpful callbacks:
# @markdown     - **`EarlyStopping`:** This monitors the validation loss and stops the training process if it doesn't improve for a set number of epochs (`patience`)[cite: 61, 268, 440]; with `restore_best_weights=True` it then restores the weights from the best epoch. This is a critical best practice to prevent overfitting and save computation time.
# @markdown     - **`ModelCheckpoint`:** This saves the best version of the model (lowest validation loss) to Google Drive during training; it is skipped if Drive cannot be mounted.
# @markdown 2.  **`model.fit()`:** This is the main training function. We pass the training data, epochs, batch size, and `validation_split=0.1`. Keras takes this validation slice from the last 10% of the training sequences (the most recent part of the training period), so performance is monitored on data the model does not train on[cite: 131].
# @markdown 3.  **Visualize Training History:** After training, we plot the training vs. validation loss. This plot is essential for diagnosing issues like overfitting or underfitting[cite: 137, 162, 164].

if not data.empty:
    # --- Define Callbacks ---
    early_stop = EarlyStopping(monitor='val_loss',
                               patience=EARLY_STOPPING_PATIENCE,
                               restore_best_weights=True,
                               verbose=1)

    # Mount Google Drive to save the model
    try:
        drive.mount('/content/drive', force_remount=True)
        # The native '.keras' format avoids the legacy HDF5 (.h5) path in recent Keras versions
        model_path = f'/content/drive/MyDrive/{STOCK_TICKER}_best_model.keras'
        checkpoint = ModelCheckpoint(filepath=model_path,
                                     monitor='val_loss',
                                     save_best_only=True,
                                     verbose=1)
        print("✅ Google Drive mounted successfully.")
    except Exception as e:
        print(f"❌ Error mounting Google Drive: {e}. Model will not be saved.")
        model_path = None
        checkpoint = None  # No checkpoint callback if Drive is unavailable

    # --- Train the Model ---
    print("\n--- Starting Model Training ---")
    history = model.fit(
        X_train, y_train,
        epochs=EPOCHS,
        batch_size=BATCH_SIZE,
        validation_split=0.1,
        callbacks=[early_stop] + ([checkpoint] if model_path else []),
        verbose=1
    )
    print("\n--- Model Training Complete ---")


    # --- Visualize Training History ---
    fig_history = go.Figure()
    fig_history.add_trace(go.Scatter(y=history.history['loss'], name='Training Loss'))
    fig_history.add_trace(go.Scatter(y=history.history['val_loss'], name='Validation Loss'))
    fig_history.update_layout(title='Model Training and Validation Loss Over Epochs',
                              xaxis_title='Epochs',
                              yaxis_title='Loss (MSE)',
                              template='plotly_dark')
    fig_history.show()
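"""
The `patience` rule is simple enough to simulate by hand. The function below is a simplified re-implementation of the logic described above (track the best validation loss, count epochs without improvement, stop once the count reaches `patience`, remember the best epoch for weight restoration). It is not Keras's own code and ignores options such as `min_delta` and `baseline`; the loss values are made up and epochs are counted from 0.

```python
def simulate_early_stopping(val_losses, patience):
    # Returns (epoch at which training stops, epoch whose weights are restored).
    best_loss = float("inf")
    best_epoch = 0
    wait = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:          # improvement: remember it and reset the counter
            best_loss, best_epoch, wait = loss, epoch, 0
        else:                         # no improvement: count it
            wait += 1
            if wait >= patience:      # patience exhausted: stop here
                return epoch, best_epoch
    return len(val_losses) - 1, best_epoch   # ran through every epoch


history = [1.00, 0.80, 0.70, 0.75, 0.72, 0.71, 0.69]
stopped, best = simulate_early_stopping(history, patience=3)
print(f"stopped after epoch {stopped}, weights restored from epoch {best}")  # epoch 5, epoch 2
```

With `patience=3` the run stops before the later improvement at epoch 6 is ever seen; a larger patience trades compute for a chance to find it.
"""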

"""
# <a id="section8"></a>
# ## 8. Prediction & Performance Evaluation
"""

# @markdown ### What's happening here?
# @markdown With our model trained, we evaluate its performance on the unseen test dataset to understand how well it generalizes.
# @markdown
# @markdown 1.  **Make Predictions:** We use `model.predict()` on `X_test` to get the forecasts[cite: 134].
# @markdown 2.  **Inverse Transform:** We convert the normalized predictions and actual values back to their original price scale for interpretation[cite: 135, 192].
# @markdown 3.  **Calculate Metrics:** We compute a comprehensive suite of evaluation metrics as recommended by the research, including RMSE, MAE, MAPE, and R-squared [cite: 41, 43, 44, 45, 468-484].
# @markdown 4.  **Baseline Comparison:** We compare the LSTM model's performance against a simple "Naive Forecast" baseline (predicting today's price is the same as yesterday's). This helps quantify the value added by our complex model.

if not data.empty:
    # --- 1. Make Predictions ---
    predicted_prices_scaled = model.predict(X_test)

    # --- 2. Inverse Transform ---
    predicted_prices = target_scaler.inverse_transform(predicted_prices_scaled)
    actual_prices = target_scaler.inverse_transform(y_test.reshape(-1, 1))

    # --- 3. Calculate Performance Metrics ---
    def calculate_metrics(y_true, y_pred):
        rmse = np.sqrt(mean_squared_error(y_true, y_pred))
        mae = mean_absolute_error(y_true, y_pred)
        mape = np.mean(np.abs((y_true - y_pred) / y_true)) * 100
        r2 = r2_score(y_true, y_pred)
        return rmse, mae, mape, r2

    lstm_rmse, lstm_mae, lstm_mape, lstm_r2 = calculate_metrics(actual_prices, predicted_prices)


    # <a id="subsection8_1"></a>
    # ### 8.1 Baseline Model Comparison

    # --- Naive Forecast Baseline ---
    # The naive forecast predicts the price of day t to be the price of day t-1.
    # We need the 'Close' price from the original dataframe corresponding to the test set.
    test_data_start_index = split_index + WINDOW_SIZE
    # The actual prices for the test set start at test_data_start_index
    # The naive prediction for the first test day is the close price of the day before it.
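    naive_predictions = data['Close'].iloc[test_data_start_index - 1 : -1].values.reshape(-1, 1)

    # Ensure the lengths match. Both arrays are column vectors of shape (n, 1), which keeps the
    # element-wise MAPE computation from broadcasting (n, 1) - (n,) into an (n, n) matrix.
    naive_actuals = actual_prices[:len(naive_predictions)]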

    naive_rmse, naive_mae, naive_mape, naive_r2 = calculate_metrics(naive_actuals, naive_predictions)

    # --- Display Metrics in a Comparison Table ---
    metrics_data = {
        'Metric': ['RMSE (USD)', 'MAE (USD)', 'MAPE (%)', 'R-squared (R²)'],
        'LSTM Model': [f"{lstm_rmse:.4f}", f"{lstm_mae:.4f}", f"{lstm_mape:.4f}", f"{lstm_r2:.4f}"],
        'Naive Baseline': [f"{naive_rmse:.4f}", f"{naive_mae:.4f}", f"{naive_mape:.4f}", f"{naive_r2:.4f}"],
        'Benchmark (Shobayo)': ['0.018522 (scaled)', 'N/A', '1.33', '0.993']
    }
    metrics_df = pd.DataFrame(metrics_data)

    print("\n--- Model Performance Comparison ---")
    display(metrics_df.style.hide(axis="index"))  # Styler.hide_index() was removed in pandas 2.0

    # <a id="subsection8_2"></a>
    # ### 8.2 Visualizing Results

    # --- Visualize Predictions vs Actual Prices ---
    test_dates = data.index[split_index + WINDOW_SIZE:]

    fig_results = go.Figure()
    fig_results.add_trace(go.Scatter(x=test_dates, y=actual_prices.flatten(),
                                     mode='lines', name='Actual Prices',
                                     line=dict(color='cyan')))
    fig_results.add_trace(go.Scatter(x=test_dates, y=predicted_prices.flatten(),
                                     mode='lines', name='LSTM Predicted Prices',
                                     line=dict(color='magenta')))

    fig_results.update_layout(
        title=f'LSTM Prediction vs. Actual Prices for {STOCK_TICKER}',
        xaxis_title='Date',
        yaxis_title='Stock Price (USD)',
        template='plotly_dark',
        legend=dict(yanchor="top", y=0.99, xanchor="left", x=0.01)
    )
    fig_results.show()

    # <a id="subsection8_3"></a>
    # ### 8.3 Exporting Predictions

    # --- Create a DataFrame with predictions and offer for download ---
    predictions_df = pd.DataFrame({
        'Date': test_dates,
        'Actual_Price': actual_prices.flatten(),
        'Predicted_Price': predicted_prices.flatten()
    })

    # Save to CSV
    csv_filename = f'{STOCK_TICKER}_predictions.csv'
    predictions_df.to_csv(csv_filename, index=False)

    print(f"\n✅ Predictions saved to '{csv_filename}'.")
    # Offer file for download in Colab
    try:
      files.download(csv_filename)
      print(f"Download started for {csv_filename}.")
    except Exception as e:
      print(f"Could not automatically download file. Please find it in the Colab file explorer to the left.")
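"""
To see how the metrics and the naive baseline fit together, the sketch below computes RMSE, MAE, MAPE and R² with plain NumPy on five invented closing prices, using the naive rule that the forecast for day t is the close of day t-1. The formulas follow the standard definitions that scikit-learn's `mean_squared_error`, `mean_absolute_error` and `r2_score` implement. It also shows why both arrays should share the same shape: subtracting a length-n vector from an (n, 1) column broadcasts to an n-by-n matrix instead of raising an error.

```python
import numpy as np

closes = np.array([100.0, 102.0, 101.0, 105.0, 104.0])   # invented prices

# Naive forecast: the prediction for day t is the close of day t-1.
actual = closes[1:].reshape(-1, 1)      # (4, 1), like actual_prices in Section 8
naive = closes[:-1].reshape(-1, 1)      # (4, 1), same shape on purpose


def metrics(y_true, y_pred):
    err = y_true - y_pred
    rmse = np.sqrt(np.mean(err ** 2))
    mae = np.mean(np.abs(err))
    mape = np.mean(np.abs(err / y_true)) * 100
    r2 = 1 - np.sum(err ** 2) / np.sum((y_true - y_true.mean()) ** 2)
    return rmse, mae, mape, r2


rmse, mae, mape, r2 = metrics(actual, naive)
print(f"RMSE={rmse:.3f} MAE={mae:.3f} MAPE={mape:.3f}% R2={r2:.3f}")

# Shape pitfall: (4, 1) minus (4,) silently becomes a (4, 4) matrix.
print((actual - closes[:-1]).shape)  # (4, 4)
```

On this tiny toy series R² comes out negative, which only reflects the invented numbers; on real daily prices the naive baseline is usually hard to beat, which is exactly why it is worth reporting next to the LSTM.
"""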


"""
# <a id="section9"></a>
# ## 9. Saving and Loading the Model
"""

# @markdown ### What's happening here?
# @markdown After training, we save the model for future use without retraining.
# @markdown
# @markdown 1.  **Saving:** The `ModelCheckpoint` callback has already saved the best model to Google Drive.
# @markdown 2.  **Loading:** We show how to load this saved model using `tf.keras.models.load_model()`.
# @markdown 3.  **Verification:** We use the loaded model to make a prediction and compare it with the original model's prediction to confirm that they match within floating-point tolerance.

if not data.empty and model_path and os.path.exists(model_path):
    print(f"The best model has been saved to: {model_path}")

    # --- Load the Saved Model ---
    try:
        loaded_model = tf.keras.models.load_model(model_path)
        print("\n✅ Successfully loaded the saved model from Google Drive.")
    except Exception as e:
        print(f"❌ Error loading model: {e}")
        loaded_model = None

    # --- Verification ---
    if loaded_model:
        original_prediction = model.predict(X_test[:1], verbose=0)
        loaded_prediction = loaded_model.predict(X_test[:1], verbose=0)

        if np.allclose(original_prediction, loaded_prediction):
            print("✅ Verification successful: Loaded model prediction matches original model.")
        else:
            print("❌ Verification failed: Predictions do not match.")
else:
    print("Skipping model loading as the model file was not saved or found.")


"""
# <a id="section10"></a>
# ## 10. Live Prediction on New Data
"""

# @markdown ### What's happening here?
# @markdown This function simulates a real-time prediction for the next trading day. It fetches the latest available data, processes it using the same pipeline (feature calculation, scaling, sequencing), and feeds it into the trained model to generate a forecast.

def predict_next_day(ticker, model_to_use, feature_scaler_to_use, target_scaler_to_use):
    """
    Predicts the next day's closing price for a given stock.
    """
    print(f"\n--- Generating Live Prediction for {ticker} ---")
    try:
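        # 1. Fetch history from START_DATE (not just recent months) so that cumulative
        #    features such as OBV share the baseline the scaler was fitted on
        latest_data = yf.download(ticker, start=START_DATE, progress=False)
        if isinstance(latest_data.columns, pd.MultiIndex):
            latest_data.columns = latest_data.columns.get_level_values(0)
        if latest_data.empty:
            print("❌ Could not fetch latest data.")
            return None

        # 2. Recompute the Section 4 indicators so every name in FEATURES_TO_USE exists
        latest_data['SMA_15'] = calculate_sma(latest_data, 15)
        latest_data['SMA_45'] = calculate_sma(latest_data, 45)
        latest_data['RSI'] = calculate_rsi(latest_data)
        latest_data['MACD'], latest_data['Signal_Line'] = calculate_macd(latest_data)
        latest_data['OBV'] = calculate_obv(latest_data)
        latest_data.dropna(inplace=True)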

        # 3. Get the last WINDOW_SIZE days
        last_window_data = latest_data[FEATURES_TO_USE].tail(WINDOW_SIZE)
        if len(last_window_data) < WINDOW_SIZE:
            print("❌ Not enough recent data to form a full prediction window.")
            return None

        # 4. Scale the data using the *same* scaler from training
        scaled_window = feature_scaler_to_use.transform(last_window_data)

        # 5. Reshape for the model
        X_pred = np.reshape(scaled_window, (1, WINDOW_SIZE, len(FEATURES_TO_USE)))

        # 6. Predict
        predicted_scaled_price = model_to_use.predict(X_pred, verbose=0)

        # 7. Inverse transform to get the actual price
        predicted_price = target_scaler_to_use.inverse_transform(predicted_scaled_price)

        return predicted_price[0][0]

    except Exception as e:
        print(f"❌ An error occurred during live prediction: {e}")
        return None

if 'loaded_model' in locals() and loaded_model is not None:
    next_day_prediction = predict_next_day(STOCK_TICKER, loaded_model, feature_scaler, target_scaler)
    if next_day_prediction is not None:
        print(f"\n✅ Predicted closing price for the next trading day for {STOCK_TICKER}: ${next_day_prediction:.2f}")
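"""
Cumulative features are sensitive to where the download starts. The sketch below (synthetic prices and volumes, no network access) computes OBV with the same expression as `calculate_obv` over a full history and over only its last 50 rows. The truncated version restarts at zero and differs from the full-history values by a constant offset, so a scaler fitted on full-history OBV would see values outside its training range. This is why the inputs for a live prediction should be built from the same start date as the training data.

```python
import numpy as np
import pandas as pd


def obv(close, volume):
    # Same rule as calculate_obv: signed volume, cumulatively summed, first row counted as 0.
    return (np.sign(close.diff()) * volume).fillna(0).cumsum()


rng = np.random.default_rng(0)
close = pd.Series(100 + rng.normal(0, 1, 200).cumsum())
volume = pd.Series(rng.integers(1_000, 5_000, 200).astype(float))

full = obv(close, volume)                                  # history from the "training" start
start = 150
recent = obv(close.iloc[start:], volume.iloc[start:])      # only the last 50 rows

offset = full.iloc[start]
print(np.allclose(recent, full.iloc[start:] - offset))     # True: same path, shifted baseline
print(f"offset between the two series: {offset:.0f}")
```
"""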

"""
# <a id="section11"></a>
# ## 11. Advanced Topics & Future Work
"""

# @markdown This section explores advanced concepts and future development directions discussed in the source documents.
# @markdown
# @markdown ### Sentiment Analysis Integration
# @markdown The research by Chaudhary (2022) showed that integrating sentiment scores from news can improve accuracy by 8-12%[cite: 213]. A full implementation would require:
# @markdown 1.  **Data Acquisition:** Using a news API (e.g., Bloomberg, Reuters) to fetch real-time text data[cite: 187].
# @markdown 2.  **Temporal Alignment:** Aggregating sentiment scores (using a tool like VADER) for each trading day[cite: 115, 116].
# @markdown 3.  **Integration:** Adding the normalized sentiment score as an additional feature to the model[cite: 132].
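"""
The temporal-alignment step can be sketched with pandas alone. Below, headline-level sentiment scores (invented numbers standing in for the compound scores a tool such as VADER would produce) are averaged per calendar day and aligned to a trading-day index, with days without news set to a neutral 0. The column names, the neutral fill value and the same-day mean are assumptions, not the method from Chaudhary (2022). The sketch also simply drops weekend headlines and does not move after-close news to the next trading day, which a real pipeline would need to handle to avoid look-ahead.

```python
import pandas as pd

# Invented headline-level sentiment scores in [-1, 1].
news = pd.DataFrame({
    "published": pd.to_datetime([
        "2024-06-03 09:15", "2024-06-03 14:40", "2024-06-05 11:00", "2024-06-08 10:00",
    ]),
    "sentiment": [0.6, -0.2, -0.5, 0.9],
})

# Business days Mon 2024-06-03 to Fri 2024-06-07 (2024-06-08 is a Saturday).
trading_days = pd.date_range("2024-06-03", "2024-06-07", freq="B")

# Average all headlines that fall on the same calendar day ...
daily = news.groupby(news["published"].dt.normalize())["sentiment"].mean()
# ... then align to trading days; days with no headlines get a neutral score of 0.
aligned = daily.reindex(trading_days, fill_value=0.0)

print(aligned)
```
"""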
# @markdown
# @markdown ### API Deployment for Real-time Inference
# @markdown For production use, the model should be deployed as a RESTful API[cite: 431]. This would allow trading platforms and dashboards to request predictions programmatically. A simple implementation could use a web framework like Flask or FastAPI.
# @markdown
# @markdown ### Robustness: Fallback Data Sources
# @markdown For a mission-critical system, relying on a single data source is risky. A robust implementation would include fallback logic.
# @markdown ```python
# @markdown def get_data_robust(ticker, start=None, end=None):
# @markdown     # Try the primary source first
# @markdown     try:
# @markdown         data = yf.download(ticker, start=start, end=end, progress=False)
# @markdown         if not data.empty:
# @markdown             return data
# @markdown         print("Primary source returned no data. Trying fallback...")
# @markdown     except Exception as e:
# @markdown         print(f"Primary source failed: {e}. Trying fallback...")
# @markdown     # A secondary source (e.g., Alpha Vantage, IEX Cloud) would be queried here, e.g.
# @markdown     # return fetch_from_secondary(ticker, start, end)  # hypothetical helper
# @markdown     return None  # all sources failed
# @markdown ```
# @markdown
# @markdown ### Memory Management for Large Datasets
# @markdown While stock data is typically manageable, for extremely large datasets (e.g., tick-level data over decades), memory can be an issue.
# @markdown - **Data Types:** Use memory-efficient data types in pandas (e.g., `float32` instead of `float64`).
# @markdown - **Garbage Collection:** We use `del` and `gc.collect()` to manually free up memory after variables are no longer needed.
# @markdown - **Generators/tf.data:** For datasets that don't fit in memory, use Python generators or the `tf.data` API to feed data to the model in batches without loading everything at once.
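"""
Two of these ideas can be sketched with NumPy and pandas alone. The first part compares the memory footprint of a float64 frame with its float32 copy. The second is a hypothetical generator (`window_batches` is our own name, not part of this notebook) that yields sliding-window batches on demand instead of materialising every window at once, using the same indexing as `create_sequences`. A generator like this could be wrapped with `tf.data.Dataset.from_generator`, but that step is not shown here.

```python
import numpy as np
import pandas as pd

# Part 1: float32 halves the memory used by numeric columns (at the cost of precision).
frame = pd.DataFrame(np.random.default_rng(0).random((10_000, 4)), columns=list("ABCD"))
mem64 = frame.memory_usage(index=False).sum()
mem32 = frame.astype("float32").memory_usage(index=False).sum()
print(mem64, mem32)  # 320000 160000


# Part 2: yield (X, y) batches of sliding windows without building the full 3-D array.
def window_batches(values, window_size, batch_size):
    X_batch, y_batch = [], []
    for i in range(window_size, len(values)):
        X_batch.append(values[i - window_size:i, :])
        y_batch.append(values[i, 0])
        if len(X_batch) == batch_size:
            yield np.array(X_batch), np.array(y_batch)
            X_batch, y_batch = [], []
    if X_batch:                       # final partial batch
        yield np.array(X_batch), np.array(y_batch)


values = np.arange(20, dtype="float32").reshape(10, 2)
for X_b, y_b in window_batches(values, window_size=3, batch_size=4):
    print(X_b.shape, y_b)
```

Only one batch of windows lives in memory at a time, at the cost of rebuilding windows on every pass over the data.
"""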