** *italicized text*BullRun.ai: Hybrid LSTM-Attention Model**  
**Objective**: Predict 5-day forecasts for crypto prices (open, high, low, close) and trading volume using historical data.  


---


### **Key Steps**  
1. **Data Prep**:  
   - **Input**: Daily prices, volume, marketCap for 5 cryptos (Bitcoin, Dogecoin, etc.).  
   - **Processing**: Clean NaNs/zeros, normalize features (MinMaxScaler), create 10-day sliding windows.  


2. **Model**:  
   - **Bidirectional LSTM**: Captures past/future trends in price sequences.  
   - **Attention Layer**: Focuses on critical time steps (e.g., volatility spikes).  
   - **Output**: Predicts all 6 features (open, high, low, close, volume, marketCap).  


3. **Training**:  
   - **Time-Series CV**: Maintains temporal order in train-test splits.  
   - **Regularization**: Dropout (30%) + L2 penalties to prevent overfitting.  
   - **Callbacks**: Early stopping + adaptive learning rates.  


4. **Forecasting**:  
   - Autoregressive prediction: Iteratively feed outputs as inputs for 5-day forecasts.  
   - Full precision preserved (critical for low-value coins like Dogecoin).  


---


### **Results**  
- **Metrics**: MAE, MSE, R² (e.g., Bitcoin R² ~90-95%).  
- **Output**: CSV with 5-day predictions for all cryptos, unrounded floats.  


---


### **Why This Works**  
- Handles multi-feature correlations (price + volume + marketCap).  
- Balances sequential learning (LSTM) and feature weighting (Attention).  
- Adaptable to volatile markets via iterative retraining.  


**Use Case**: Short-term trading strategy planning for crypto investors.


**BullRun.AI: Cryptocurrency Price Prediction Using Hybrid LSTM-Attention Model**


---


### **Objective**
To develop a robust machine learning system that predicts **5-day future prices** (open, high, low, close) and trading volume for multiple cryptocurrencies, using historical market data. The model aims to capture complex temporal patterns in volatile markets while preserving precision for low-value assets like Dogecoin.


---


### **Motivation**
Cryptocurrencies exhibit extreme volatility due to market sentiment, news, and macroeconomic factors. Accurate short-term predictions empower traders and investors to:
- Optimize entry/exit points
- Hedge risks
- Automate trading strategies  
Traditional statistical models (e.g., ARIMA) struggle with multi-feature crypto data. Deep learning architectures like **LSTMs** excel at sequence modeling, while **attention mechanisms** help focus on critical time steps. Combining these techniques improves forecast accuracy.


---


### **Input Data**
**Source**: Historical daily crypto data (CSV format) with columns:  
- `date`, `crypto_name`, `open`, `high`, `low`, `close`, `volume`, `marketCap`  
**Example Row**:  
`2023-01-01, Bitcoin, 45000.25, 45500.30, 44500.15, 45200.50, 2500000000, 850000000000`


**Key Characteristics**:
- Time-series data (date-sorted)
- Multiple cryptocurrencies in one dataset
- High variance in values (e.g., Bitcoin vs. Dogecoin)


---


### **Data Preprocessing**
1. **Filtering**:
   - Select target cryptos (Bitcoin, Litecoin, XRP, Dogecoin, Monero)
   - Remove rows with missing dates or invalid crypto names


2. **Feature Selection**:
   - Retain `open`, `high`, `low`, `close`, `volume`, `marketCap`  
   *Rationale*: These capture price movement and market activity.


3. **Cleaning**:
   - Drop rows with `NaN` or zero volume (invalid trading days)
   - Ensure chronological ordering by date


4. **Normalization**:
   - Apply `MinMaxScaler` to bound values between 0–1  
   *Why?*: LSTMs require normalized data for stable training.  
   *Formula*: \( X_{\text{scaled}} = \frac{X - X_{\min}}{X_{\max} - X_{\min}} \)


5. **Windowing**:
   - Convert time series into overlapping sequences using a **10-day window**  
   *Example*: Days 1–10 → Predict Day 11; Days 2–11 → Predict Day 12  
   *Output Shapes*:  
     - Input (`X`): `(num_samples, 10, 6)`  
     - Target (`y`): `(num_samples, 6)`


---


### **Model Architecture: Hybrid LSTM-Attention**
A bidirectional LSTM processes sequences forward/backward, while attention identifies critical historical patterns.


1. **Layers**:
   - **Input**: 3D tensor `(batch_size, 10, 6)`  
   - **Bidirectional LSTM**: 50 units per direction + dropout (30%) + L2 regularization  
     *Purpose*: Learn long-term dependencies in both temporal directions.  
   - **Attention**: Weights important timesteps dynamically  
     *Mechanism*: \( \text{Attention}(Q, K, V) = \text{softmax}\left(\frac{QK^T}{\sqrt{d_k}}\right)V \)  
   - **Concatenation**: Merge LSTM outputs with attention context  
   - **Dense Layers**: 64 ReLU units → 6 linear outputs (one per feature)


2. **Training**:
   - **Loss**: Mean Squared Error (MSE)  
   - **Optimizer**: Adam (learning rate = 0.001)  
   - **Callbacks**:  
     - Early stopping (patience=10)  
     - Learning rate reduction on plateau  


3. **Key Innovations**:
   - **Multi-Feature Prediction**: Simultaneously forecasts all 6 features.  
   - **Autoregressive Inference**: Uses predictions as input for subsequent days.  


---


### **Training Process**
1. **Time-Series Cross-Validation**:
   - Split data into 5 folds using `TimeSeriesSplit`  
   *Rationale*: Preserve temporal order during validation.


2. **Batch Training**:
   - Batch size: 32 (or smaller if data is limited)  
   - Epochs: 50 (early stopping may terminate early)


3. **Regularization**:
   - **Dropout**: 30% neuron deactivation to prevent overfitting  
   - **L2 Regularization**: Penalize large weights (\( \lambda = 0.001 \))


---


### **Prediction Workflow**
1. **Last Window Extraction**:
   - Use the most recent 10 days of scaled data  
   *Shape*: `(1, 10, 6)`


2. **Autoregressive Loop**:
   For each of the next 5 days:  
   - Predict all 6 features using the current window  
   - Append prediction to the window  
   - Remove oldest day to maintain 10-day window  


3. **Inverse Scaling**:
   - Convert normalized predictions back to original USD values  


---


### **Handling Low-Value Cryptos (e.g., Dogecoin)**
- **Precision Preservation**:  
  - Avoid rounding during CSV export (`float_format` disabled)  
  - Example: Dogecoin’s predicted `open` = `0.000123456` (not rounded to 0.00)  
- **Scaler Adaptation**:  
  - `MinMaxScaler` handles small values without precision loss.


---


### **Validation & Verification**
1. **Feature Alignment**:
   - Ensure input/output features match (e.g., 6 in → 6 out)  
   - Print actual vs predicted values during evaluation:  
     ```
     open: Predicted 45200.123 vs Actual 45150.456
     marketCap: Predicted 8.5e11 vs Actual 8.6e11
     ```


2. **Metrics**:
   - **MAE**: Average absolute error across all features  
   - **MSE**: Emphasizes larger errors (sensitive to outliers)  
   - **R²**: % variance explained (1.0 = perfect prediction)


3. **Data Sufficiency Checks**:
   - Require ≥11 days of data (10 for window + 1 prediction)  
   - Skip cryptos with insufficient history.


---


### **Output**
- **CSV File**: `cryptoprediction_full_precision.csv`  
- **Columns**: `open, high, low, close, volume, marketCap, crypto_name, date`  
- **Example Prediction**:  
  ```
  open      high       low      close    volume      marketCap   crypto_name   date
  0.000123  0.000135  0.000118  0.000128  15234567   18273645    Dogecoin     2023-12-10
  ```


---


### **Enhancements (Optional)**
1. **Feature Engineering**: Add technical indicators (RSI, Bollinger Bands).  
2. **Sentiment Integration**: Incorporate news/social media sentiment scores.  
3. **Uncertainty Quantification**: Use Monte Carlo dropout for confidence intervals.  
4. **Multi-Head Attention**: Capture diverse temporal patterns.  


---


### **Conclusion**
This approach provides a scalable framework for multi-crypto forecasting by:  
- Leveraging bidirectional LSTMs for temporal dynamics  
- Using attention to focus on critical historical patterns  
- Ensuring precision for low-value assets  
- Validating feature integrity at every stage  



In [4]:
# If you're in Colab or a new environment, uncomment to install packages:
!pip install pandas numpy matplotlib scikit-learn tensorflow

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from sklearn.preprocessing import MinMaxScaler
from sklearn.model_selection import TimeSeriesSplit
from tensorflow.keras.models import Model
from tensorflow.keras.layers import LSTM, Dense, Dropout, Attention, Concatenate, Input, Bidirectional
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.callbacks import EarlyStopping, ReduceLROnPlateau
from tensorflow.keras.regularizers import l2
import tensorflow as tf

# If you have a GPU, this prints it out. (Optional in Colab)
device = tf.config.list_physical_devices('GPU')
if device:
    print(f"GPU found: {device}")
else:
    print("No GPU found. Using CPU.")

# Mapping from crypto_name to shorter symbol used in plot filenames
symbol_map = {
    "Bitcoin": "btc",
    "Dogecoin": "doge",
    "Ethereum": "eth",
    "Litecoin": "ltc"
}

def create_sliding_windows(df, window_len=5):
    """
    Create overlapping windows of length `window_len` across df.
    Each window is used as an input sequence for training.
    """
    print(f"[INFO] Creating sliding windows with window_len={window_len} ...")
    data = df.values
    X = []
    for i in range(len(df) - window_len):
        X.append(data[i : i + window_len])
    X = np.array(X)
    print(f"[INFO] Created X shape: {X.shape} (samples, window_len, features)")
    return X

def build_hybrid_model(input_shape):
    """
    Build a Stacked Bidirectional LSTM + Attention model predicting
    the 4 features: [Price, High, Low, Close].
    """
    print("[INFO] Building a stacked Bidirectional LSTM + Attention model...")
    inputs = Input(shape=input_shape)

    # First Bidirectional LSTM
    x = Bidirectional(
        LSTM(
            64,
            return_sequences=True,
            kernel_regularizer=l2(0.001)
        )
    )(inputs)
    x = Dropout(0.2)(x)

    # Second Bidirectional LSTM
    x = Bidirectional(
        LSTM(
            64,
            return_sequences=True,
            kernel_regularizer=l2(0.001)
        )
    )(x)
    x = Dropout(0.2)(x)

    # Attention mechanism
    attention = Attention()([x, x])
    attention = Dense(1, activation='tanh')(attention)

    # Concatenate LSTM output with attention
    concat = Concatenate()([x, attention])
    concat = Dense(64, activation='relu', kernel_regularizer=l2(0.001))(concat)

    # Output layer (predicts all 4 features)
    outputs = Dense(input_shape[-1], activation='linear')(concat[:, -1, :])

    model = Model(inputs=inputs, outputs=outputs)
    model.compile(optimizer=Adam(learning_rate=0.001), loss='mse')
    print("[INFO] Model built and compiled.")
    return model

def evaluate_predictions(actual, preds, prefix=""):
    """
    Compute and print evaluation metrics.
    prefix: Just a string to label each fold's results.
    """
    mae = mean_absolute_error(actual, preds)
    mse = mean_squared_error(actual, preds)
    r2 = r2_score(actual, preds)
    print(f"{prefix}MAE: {mae:.6f}")
    print(f"{prefix}MSE: {mse:.6f}")
    print(f"{prefix}R²: {r2:.6f}")
    return mae, mse, r2

def predict_next_5_days(model, last_window_scaled, scaler=None, feature_cols=None):
    """
    Predict the next 5 days (with full-precision)
    given the last sliding window.
    """
    print("[INFO] Predicting the next 5 days...")
    predictions_scaled = []
    current_window = last_window_scaled.copy()

    for i in range(5):
        next_day_scaled = model.predict(current_window, verbose=0)[0]
        predictions_scaled.append(next_day_scaled)

        # Shift the window and add new day at the end
        new_window = np.roll(current_window, -1, axis=1)
        new_window[0, -1, :] = next_day_scaled
        current_window = new_window

    predictions_scaled = np.array(predictions_scaled)
    if scaler:
        predictions = scaler.inverse_transform(predictions_scaled)
    else:
        predictions = predictions_scaled

    if feature_cols is None:
        feature_cols = [f"f{i}" for i in range(predictions.shape[1])]

    print("[INFO] Predictions for 5 future days generated.")
    return pd.DataFrame(predictions, columns=feature_cols)

def plot_4_subplots_for_7_and_5_days(
    last_7: pd.DataFrame,
    predicted_5: pd.DataFrame,
    crypto: str,
    out_dir: str = ""
):
    """
    Creates one figure with 4 subplots (Price, High, Low, Close).
    - Last 7 days in a solid line
    - Next 5 days in a dashed line
    Saves as <symbol>_prediction.png (e.g., btc_prediction.png).
    """
    features = ["Price", "High", "Low", "Close"]
    fig, axes = plt.subplots(2, 2, figsize=(12, 8), sharex=True)

    for i, feature in enumerate(features):
        ax = axes.flat[i]

        # Plot last 7 days (actual) in solid line
        ax.plot(
            last_7["Date"],
            last_7[feature],
            label="Actual",
            marker='o',
            color='blue'
        )

        # Plot next 5 days (predicted) in dashed line
        ax.plot(
            predicted_5["Date"],
            predicted_5[feature],
            label="Predicted",
            linestyle='--',
            marker='x',
            color='red'
        )

        ax.set_title(feature)
        ax.legend()
        ax.grid(True)
        # Rotate x-ticks
        for tick in ax.get_xticklabels():
            tick.set_rotation(45)

    fig.suptitle(f"{crypto} - Last 7 days + Next 5 days Prediction")
    plt.tight_layout()
    # Adjust top to ensure suptitle isn't clipped
    plt.subplots_adjust(top=0.90)

    symbol = symbol_map.get(crypto, crypto.lower())
    out_file = f"{out_dir}{symbol}_prediction.png"
    plt.savefig(out_file)
    print(f"[INFO] Plot saved as: {out_file}")
    plt.close(fig)


# ====================== MAIN SCRIPT ======================
if __name__ == "__main__":
    print("[INFO] Reading dataset...")
    csv_path = "btc_doge_eth_ltc-alldata.csv"  # Make sure it has the correct columns
    df = pd.read_csv(csv_path)
    print(f"[INFO] Dataset loaded. Shape: {df.shape}")

    # Date handling: parse in DD-MM-YYYY format (dayfirst=True)
    print("[INFO] Converting 'Date' column to datetime (DD-MM-YYYY)...")
    df['Date'] = pd.to_datetime(df['Date'], dayfirst=True, errors='coerce')
    df = df.dropna(subset=['Date'])  # drop rows that fail conversion
    print(f"[INFO] Dataset shape after dropping invalid dates: {df.shape}")

    # Ensure consistent crypto naming (title-case)
    df['crypto_name'] = df['crypto_name'].str.title()

    # List of cryptos present in your dataset
    cryptos = ["Bitcoin", "Dogecoin", "Ethereum", "Litecoin"]
    print(f"[INFO] Cryptos to process: {cryptos}")

    # DataFrame to store all future-day predictions
    all_predictions = pd.DataFrame()

    # For each crypto...
    for crypto in cryptos:
        print(f"\n{'='*40}\n[INFO] PROCESSING {crypto.upper()}\n{'='*40}")

        # Subset the data for the specific crypto
        sub_df = df[df['crypto_name'] == crypto].copy()
        sub_df.set_index('Date', inplace=True)
        sub_df.sort_index(inplace=True)

        # Keep only [Price, High, Low, Close]
        required_features = ['Price', 'High', 'Low', 'Close']
        print("[INFO] Selecting required features:", required_features)
        sub_df = sub_df[required_features].dropna()
        print(f"[INFO] After dropping NaNs, data shape for {crypto}: {sub_df.shape}")

        # We need enough data for a 10-day window + at least 1 test sample
        if len(sub_df) < 11:
            print(f"[WARNING] Insufficient data for {crypto}. "
                  f"Needs at least 11 rows, found {len(sub_df)}. Skipping.")
            continue

        # Scale features
        print("[INFO] Applying MinMaxScaler on 4 features.")
        scaler = MinMaxScaler()
        scaled_data = scaler.fit_transform(sub_df)
        sub_df_scaled = pd.DataFrame(scaled_data, columns=sub_df.columns, index=sub_df.index)
        print("[INFO] Scaling complete. Example of scaled values:")
        print(sub_df_scaled.head())

        # Create windows of length 10
        window_len = 10
        X = create_sliding_windows(sub_df_scaled, window_len)
        # Targets start after the first window_len
        y = sub_df_scaled.iloc[window_len:].values  # shape: (#samples, 4)
        print(f"[INFO] y shape: {y.shape} (samples, features)")

        # TimeSeriesSplit for train/test
        tscv = TimeSeriesSplit(n_splits=5)
        print(f"[INFO] Performing 5-fold TimeSeriesSplit cross-validation on {crypto}...")

        # For collecting cross-validation results
        fold_metrics = []

        # Cross-validation loop
        for fold_idx, (train_index, test_index) in enumerate(tscv.split(X), start=1):
            print(f"\n[INFO] ---- Fold {fold_idx} of 5 ----")
            X_train, X_test = X[train_index], X[test_index]
            y_train, y_test = y[train_index], y[test_index]

            print(f"[INFO] X_train shape: {X_train.shape}, y_train shape: {y_train.shape}")
            print(f"[INFO] X_test  shape: {X_test.shape},  y_test  shape: {y_test.shape}")

            # Build a fresh model each fold
            model = build_hybrid_model((window_len, len(required_features)))

            # Train the model with verbose=1 to see the epoch logs
            print("[INFO] Training model...")
            history = model.fit(
                X_train,
                y_train,
                validation_data=(X_test, y_test),
                epochs=60,  # a bit more than before
                batch_size=32,
                callbacks=[
                    EarlyStopping(patience=10, restore_best_weights=True),
                    ReduceLROnPlateau(factor=0.5, patience=5)
                ],
                verbose=1  # prints full training logs
            )

            print("[INFO] Evaluating fold performance...")
            preds_scaled = model.predict(X_test, verbose=0)
            preds = scaler.inverse_transform(preds_scaled)
            actual = scaler.inverse_transform(y_test)

            # Calculate metrics
            print("[INFO] Fold metrics:")
            mae, mse, r2 = evaluate_predictions(actual, preds, prefix="  ")
            fold_metrics.append((mae, mse, r2))

        # Average metrics over all folds
        if fold_metrics:
            avg_mae = np.mean([m[0] for m in fold_metrics])
            avg_mse = np.mean([m[1] for m in fold_metrics])
            avg_r2 = np.mean([m[2] for m in fold_metrics])
            print(f"\n[INFO] Average Cross-Validation Metrics for {crypto}:")
            print(f"  MAE = {avg_mae:.6f}")
            print(f"  MSE = {avg_mse:.6f}")
            print(f"  R²  = {avg_r2:.6f}")
        else:
            print(f"[WARNING] No metrics computed for {crypto}. Possibly no valid folds.")
            continue

        print("\n[INFO] ----------- FINAL RETRAIN on ALL data -----------")
        # After cross-validation, we build a final model
        # using all samples for training to predict the future.
        model_final = build_hybrid_model((window_len, len(required_features)))
        print("[INFO] Training final model on the entire dataset (X, y) ...")
        model_final.fit(
            X, y,
            epochs=60,
            batch_size=32,
            callbacks=[
                EarlyStopping(patience=10, restore_best_weights=True),
                ReduceLROnPlateau(factor=0.5, patience=5)
            ],
            verbose=1  # prints training logs
        )

        # Predict next 5 days
        print("[INFO] Predicting next 5 days based on last window of scaled data...")
        last_window = sub_df_scaled.tail(window_len).values.reshape(1, window_len, -1)
        predictions = predict_next_5_days(model_final, last_window, scaler, required_features)

        # Create a date range for the next 5 predictions
        predictions['Date'] = pd.date_range(
            start=sub_df.index[-1] + pd.Timedelta(days=1),
            periods=5
        )
        predictions['crypto_name'] = crypto

        # Reorder columns
        predictions = predictions[['Price', 'High', 'Low', 'Close', 'crypto_name', 'Date']]
        all_predictions = pd.concat([all_predictions, predictions], ignore_index=True)
        print("[INFO] 5-day future predictions for", crypto, "complete.")

        # -----------------------------
        # Plot last 7 + next 5 days in 4 subplots
        # -----------------------------
        print("[INFO] Creating a 4-subplot chart for last 7 days + next 5 days ...")
        # Last 7 days (unscaled) from sub_df
        last_7_df = sub_df.tail(7).reset_index()  # keep the 'Date' as a column

        # The "predictions" DF is already unscaled
        predicted_5_df = predictions[['Date','Price','High','Low','Close']].copy()

        # Create the 4-subplot figure
        plot_4_subplots_for_7_and_5_days(
            last_7_df,
            predicted_5_df,
            crypto=crypto
        )

    # Save all predictions to a single CSV
    if not all_predictions.empty:
        out_file = "btc_doge_eth_ltc_predictions.csv"
        print(f"\n[INFO] Saving all predictions to {out_file} ...")
        all_predictions.to_csv(out_file, index=False)
        print("[INFO] Predictions saved successfully. Preview:")
        print(
            all_predictions.head(10).to_string(
                float_format=lambda x: f"{x:.10f}"
            )
        )
    else:
        print("[WARNING] No predictions generated for any crypto.")


GPU found: [PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU')]
[INFO] Reading dataset...
[INFO] Dataset loaded. Shape: (12950, 6)
[INFO] Converting 'Date' column to datetime (DD-MM-YYYY)...
[INFO] Dataset shape after dropping invalid dates: (12950, 6)
[INFO] Cryptos to process: ['Bitcoin', 'Dogecoin', 'Ethereum', 'Litecoin']

[INFO] PROCESSING BITCOIN
[INFO] Selecting required features: ['Price', 'High', 'Low', 'Close']
[INFO] After dropping NaNs, data shape for Bitcoin: (3812, 4)
[INFO] Applying MinMaxScaler on 4 features.
[INFO] Scaling complete. Example of scaled values:
               Price      High       Low     Close
Date                                              
2014-09-17  0.002635  0.002672  0.002727  0.002355
2014-09-18  0.002325  0.002298  0.002642  0.002251
2014-09-19  0.002045  0.002026  0.002333  0.001984
2014-09-20  0.002178  0.002077  0.002055  0.001943
2014-09-21  0.002083  0.002109  0.002182  0.001843
[INFO] Creating sliding windows with window_len

  current = self.get_monitor_value(logs)
  callback.on_epoch_end(epoch, logs)


[1m119/119[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m2s[0m 13ms/step - loss: 0.0235 - learning_rate: 0.0010
Epoch 3/60
[1m119/119[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m1s[0m 10ms/step - loss: 0.0060 - learning_rate: 0.0010
Epoch 4/60
[1m119/119[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m1s[0m 10ms/step - loss: 0.0033 - learning_rate: 0.0010
Epoch 5/60
[1m119/119[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m1s[0m 10ms/step - loss: 0.0020 - learning_rate: 0.0010
Epoch 6/60
[1m119/119[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m1s[0m 10ms/step - loss: 0.0020 - learning_rate: 0.0010
Epoch 7/60
[1m119/119[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m1s[0m 10ms/step - loss: 0.0013 - learning_rate: 0.0010
Epoch 8/60
[1m119/119[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m1s[0m 10ms/step - loss: 0.0011 - learning_rate: 0.0010
Epoch 9/60
[1m119/119[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m1s[0m 10ms/step - loss: 9.6193e-04 - learning_rate: 0.0010

  current = self.get_monitor_value(logs)
  callback.on_epoch_end(epoch, logs)


[1m83/83[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m1s[0m 10ms/step - loss: 0.0411 - learning_rate: 0.0010
Epoch 3/60
[1m83/83[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m1s[0m 10ms/step - loss: 0.0072 - learning_rate: 0.0010
Epoch 4/60
[1m83/83[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m1s[0m 12ms/step - loss: 0.0028 - learning_rate: 0.0010
Epoch 5/60
[1m83/83[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m1s[0m 15ms/step - loss: 0.0021 - learning_rate: 0.0010
Epoch 6/60
[1m83/83[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m1s[0m 10ms/step - loss: 0.0017 - learning_rate: 0.0010
Epoch 7/60
[1m83/83[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m1s[0m 10ms/step - loss: 0.0012 - learning_rate: 0.0010
Epoch 8/60
[1m83/83[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m1s[0m 10ms/step - loss: 0.0012 - learning_rate: 0.0010
Epoch 9/60
[1m83/83[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m1s[0m 10ms/step - loss: 0.0011 - learning_rate: 0.0010
Epoch 10/60
[1m83/

  current = self.get_monitor_value(logs)
  callback.on_epoch_end(epoch, logs)


[1m83/83[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m1s[0m 15ms/step - loss: 0.0685 - learning_rate: 0.0010
Epoch 3/60
[1m83/83[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m1s[0m 11ms/step - loss: 0.0234 - learning_rate: 0.0010
Epoch 4/60
[1m83/83[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m1s[0m 10ms/step - loss: 0.0123 - learning_rate: 0.0010
Epoch 5/60
[1m83/83[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m1s[0m 10ms/step - loss: 0.0081 - learning_rate: 0.0010
Epoch 6/60
[1m83/83[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m1s[0m 10ms/step - loss: 0.0063 - learning_rate: 0.0010
Epoch 7/60
[1m83/83[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m1s[0m 10ms/step - loss: 0.0045 - learning_rate: 0.0010
Epoch 8/60
[1m83/83[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m1s[0m 10ms/step - loss: 0.0036 - learning_rate: 0.0010
Epoch 9/60
[1m83/83[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m1s[0m 10ms/step - loss: 0.0029 - learning_rate: 0.0010
Epoch 10/60
[1m83/

  current = self.get_monitor_value(logs)
  callback.on_epoch_end(epoch, logs)


[1m119/119[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m1s[0m 10ms/step - loss: 0.0190 - learning_rate: 0.0010
Epoch 3/60
[1m119/119[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m1s[0m 10ms/step - loss: 0.0045 - learning_rate: 0.0010
Epoch 4/60
[1m119/119[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m2s[0m 12ms/step - loss: 0.0024 - learning_rate: 0.0010
Epoch 5/60
[1m119/119[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m2s[0m 10ms/step - loss: 0.0021 - learning_rate: 0.0010
Epoch 6/60
[1m119/119[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m1s[0m 10ms/step - loss: 0.0016 - learning_rate: 0.0010
Epoch 7/60
[1m119/119[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m1s[0m 10ms/step - loss: 0.0018 - learning_rate: 0.0010
Epoch 8/60
[1m119/119[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m1s[0m 10ms/step - loss: 0.0015 - learning_rate: 0.0010
Epoch 9/60
[1m119/119[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m1s[0m 10ms/step - loss: 0.0013 - learning_rate: 0.0010
Epo