# Time-Series-Forecasting

## 0. Research

## 0.1. Which neural networks could be used?

I have a time-series dataset of hourly Bitcoin prices for the past years. The dataset has 500000 rows. I want to train a neural network to predict the price in the next hour. I also want to use some technical indicators such as RSI, SMA, EMA, and MACD, as well as seasonality, trends, and autocorrelations.


To predict the next hourly Bitcoin price using your dataset, you can choose from several neural network architectures tailored to time-series forecasting. Here’s an overview of suitable networks and how they can incorporate technical indicators, seasonality, trends, and autocorrelations:

---

### 1. **Recurrent Neural Networks (RNNs)**
   - **Simple RNN**: Captures sequential dependencies but may struggle with long-term dependencies.
   - **Long Short-Term Memory (LSTM)**:
     - Designed to address the vanishing gradient problem, making it better for long-term dependencies in time series.
     - Use to model autocorrelations and temporal patterns effectively.
   - **Gated Recurrent Units (GRU)**:
     - Similar to LSTMs but with fewer parameters, often faster to train.

   **How to use:** 
   - Input historical prices and technical indicators as features.
   - Augment the input with Fourier-transformed or seasonal components for seasonality.
   - Use lagged features for autocorrelations.

---

### 2. **Convolutional Neural Networks (CNNs)**
   - Surprisingly effective for time-series forecasting by capturing local temporal patterns.
   - Use 1D convolutions to extract patterns from sequences of technical indicators and price history.

   **How to use:**
   - Stack CNN layers followed by fully connected layers.
   - Use sliding windows of input data (e.g., the past 24 hours of features).

---

### 3. **Temporal Convolutional Networks (TCNs)**
   - A modern alternative to RNNs with dilated convolutions to capture long-term dependencies.
   - Handles autoregressive tasks efficiently.

   **How to use:**
   - Use technical indicators and trends as input features.
   - Extract temporal dependencies without requiring sequential processing.
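
To judge how much history a TCN can actually use, it helps to compute its receptive field. The sketch below assumes one causal convolution per level, with dilations that double at each level. The function name `tcn_receptive_field` is made up for this illustration, and TCN variants that stack two convolutions per residual block see further back.

```python
def tcn_receptive_field(kernel_size, dilations):
    """Past time steps visible to the last output of a stack of causal
    dilated convolutions (assumes one convolution per dilation level)."""
    field = 1
    for dilation in dilations:
        # Each layer extends the field by (kernel_size - 1) * dilation steps
        field += (kernel_size - 1) * dilation
    return field


dilations = [2 ** i for i in range(6)]  # 1, 2, 4, 8, 16, 32
print(tcn_receptive_field(kernel_size=3, dilations=dilations))  # 127
```

With hourly data, 127 steps cover a little more than five days of history.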

---

### 4. **Transformer Models**
   - Initially designed for NLP but now adapted for time-series forecasting.
   - **Time-Series Transformers (TSTs)** are specifically tailored for this purpose.

   **How to use:**
   - Incorporate positional encoding for temporal information.
   - Use self-attention to weigh the importance of past data points, technical indicators, and trends.
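
Positional encoding can be sketched in plain NumPy. The example below uses the fixed sinusoidal scheme from the original Transformer design; this is one common choice, not the only one, since time-series models often use learned embeddings or calendar features instead. The function name and the sizes are illustrative assumptions.

```python
import numpy as np


def sinusoidal_positional_encoding(seq_len, d_model):
    """Return a (seq_len, d_model) array of sinusoidal position encodings."""
    positions = np.arange(seq_len)[:, None]      # shape (seq_len, 1)
    dims = np.arange(0, d_model, 2)[None, :]     # even dimension indices
    angles = positions / np.power(10000.0, dims / d_model)
    encoding = np.zeros((seq_len, d_model))
    encoding[:, 0::2] = np.sin(angles)                     # even dims: sine
    encoding[:, 1::2] = np.cos(angles[:, : d_model // 2])  # odd dims: cosine
    return encoding


pe = sinusoidal_positional_encoding(seq_len=24, d_model=16)
print(pe.shape)  # (24, 16)
```

The encoding is added to (or concatenated with) the projected input features so that self-attention can tell time steps apart.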

---

### 5. **Hybrid Architectures**
   - Combine models to leverage strengths of each:
     - **CNN + LSTM**: Use CNN layers to extract local patterns, followed by LSTMs for temporal dependencies.
     - **Attention Mechanisms in LSTMs**: Enhance LSTM predictions by focusing on relevant time steps.
     - **Transformer + LSTM/CNN**: Use Transformers for long-term dependencies and CNNs for feature extraction.

---

### 6. **Autoregressive Models with Neural Networks**
   - Combine autoregressive (AR) components with deep learning.
   - Neural networks model nonlinearities, while AR captures autocorrelation directly.

   **How to use:**
   - Incorporate lagged prices, technical indicators, and residuals from trend decomposition.
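
As a rough illustration of the AR part, the sketch below fits an AR(p) model by ordinary least squares with NumPy. The helper name `fit_ar` and the synthetic series are assumptions made for this example; in a hybrid setup, the residuals of such a fit are one possible target for the neural network.

```python
import numpy as np


def fit_ar(series, p):
    """Fit AR(p) with intercept by least squares.

    Returns (intercept, coefficients); coefficients[k] multiplies lag k + 1.
    """
    series = np.asarray(series, dtype=float)
    n = len(series)
    # Column k holds the value at lag k + 1 for each target series[t], t >= p
    lags = np.column_stack([series[p - k - 1 : n - k - 1] for k in range(p)])
    design = np.column_stack([np.ones(len(lags)), lags])
    target = series[p:]
    solution, *_ = np.linalg.lstsq(design, target, rcond=None)
    return solution[0], solution[1:]


# Synthetic AR(1) series with coefficient 0.8 (illustrative data only)
rng = np.random.default_rng(0)
series = np.zeros(500)
for t in range(1, len(series)):
    series[t] = 0.8 * series[t - 1] + rng.normal(scale=0.1)

intercept, coefs = fit_ar(series, p=2)
print(intercept, coefs)  # the first coefficient should land near 0.8
```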

---

### 7. **DeepAR or Probabilistic Models**
   - Predict probabilistic distributions instead of point estimates.
   - Useful for capturing uncertainty in volatile datasets like Bitcoin prices.
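
Probabilistic models are usually trained by maximizing a likelihood rather than minimizing squared error. The sketch below shows a Gaussian negative log-likelihood in NumPy, assuming the network outputs a mean and a log-variance for each time step. The Gaussian choice and the function name are assumptions for illustration; other output distributions are possible.

```python
import numpy as np


def gaussian_nll(y_true, mean, log_var):
    """Mean negative log-likelihood of y_true under N(mean, exp(log_var))."""
    y_true = np.asarray(y_true, dtype=float)
    mean = np.asarray(mean, dtype=float)
    log_var = np.asarray(log_var, dtype=float)
    squared_error = (y_true - mean) ** 2
    return float(np.mean(0.5 * (np.log(2 * np.pi) + log_var + squared_error / np.exp(log_var))))


# Same point forecast, different stated uncertainty
overconfident = gaussian_nll([1.0], [0.0], [np.log(0.01)])
hedged = gaussian_nll([1.0], [0.0], [0.0])
print(overconfident > hedged)  # True: a confident miss is penalized more
```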

---

### 8. **Graph Neural Networks (GNNs)**
   - Incorporate relationships between features (e.g., correlations among technical indicators).
   - Less common but can add a new perspective.

---

### Incorporating Seasonality, Trends, and Autocorrelations:
1. **Seasonality**:
   - Use Fourier transformations or seasonal decomposition (STL) to extract periodic components.
   - Include these as additional input features.

2. **Trends**:
   - Detrend the data and feed both original and detrended values into the model.
   - Use techniques like Hodrick-Prescott filter to isolate trends.

3. **Autocorrelations**:
   - Include lagged features or rolling statistics (e.g., past n-hour averages, autocorrelation lags).

---

### Tools and Frameworks:
- **Keras/TensorFlow**: Flexible and user-friendly.
- **PyTorch**: Excellent for custom and advanced models.
- **Darts, GluonTS**: Frameworks focused on time-series forecasting.

Let me know if you'd like detailed code or a step-by-step guide for any specific architecture!

## 02. Hybrid Architectures

I want to use a Hybrid Architectures. Give me more information 

Hybrid architectures combine the strengths of multiple neural network types to handle the complex dynamics of time-series data, such as trends, seasonality, and autocorrelations. Here’s how you can design and implement a hybrid architecture for your Bitcoin hourly price prediction:

---

### **Components of a Hybrid Architecture**

### 1. **CNN for Feature Extraction**
   - **Purpose**: Captures short-term patterns and local temporal dependencies from the data.
   - **Implementation**:
     - Use 1D convolutions over the time-series data (e.g., past 24–48 hours).
     - Extract features from technical indicators (RSI, SMA, EMA, MACD) and raw prices.

   **Key Hyperparameters**:
   - Kernel size: Determines how many time steps are analyzed together.
   - Strides: Controls the overlap between convolution windows.

   **Output**:
   - Feature maps summarizing localized patterns in the data.
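
Kernel size and stride together determine how long the feature map is. The helper below (a hypothetical name, not a Keras function) applies the standard output-length formulas for `valid` and `same` padding.

```python
def conv1d_output_length(input_length, kernel_size, stride=1, padding="valid"):
    """Output length of a 1D convolution or pooling layer."""
    if padding == "same":
        return -(-input_length // stride)  # ceil(input_length / stride)
    return (input_length - kernel_size) // stride + 1


after_conv = conv1d_output_length(24, kernel_size=3)                 # 22
after_pool = conv1d_output_length(after_conv, kernel_size=2, stride=2)  # 11
print(after_conv, after_pool)  # 22 11
```

For a 24-hour window, a kernel of size 3 followed by pooling of size 2 leaves 11 time steps per filter.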

---

### 2. **LSTM/GRU for Temporal Dependencies**
   - **Purpose**: Captures long-term dependencies and sequential trends in the data.
   - **Implementation**:
     - Feed the feature maps from the CNN into an LSTM or GRU.
     - Use the memory state to capture temporal relationships in extracted features.

   **Key Hyperparameters**:
   - Number of units: Controls the memory capacity of the network.
   - Dropout: Prevents overfitting by randomly deactivating neurons.

   **Output**:
   - Temporal feature representations.

---

### 3. **Attention Mechanism**
   - **Purpose**: Highlights the most relevant parts of the sequence for the prediction.
   - **Implementation**:
     - Add an attention layer on top of the LSTM/GRU output.
     - Compute a weighted sum of the LSTM outputs, where weights are determined dynamically.

   **Key Benefit**:
   - Allows the model to focus on important time steps (e.g., recent volatility spikes).
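
The weighted sum can be illustrated without any framework. The sketch below assumes dot-product scoring against a single query vector, which in a real model would be learned; the Keras `Attention` layer works on query and value tensors, so treat this only as a picture of the arithmetic.

```python
import numpy as np


def attention_pool(hidden_states, query):
    """hidden_states: (timesteps, units); query: (units,).

    Returns the context vector (units,) and the attention weights (timesteps,).
    """
    scores = hidden_states @ query        # relevance of each time step
    scores = scores - scores.max()        # subtract max for numerical stability
    weights = np.exp(scores) / np.exp(scores).sum()  # softmax over time
    context = weights @ hidden_states     # weighted sum of hidden states
    return context, weights


rng = np.random.default_rng(0)
hidden_states = rng.normal(size=(24, 8))  # stand-in for LSTM outputs
query = rng.normal(size=8)                # stand-in for a learned query
context, weights = attention_pool(hidden_states, query)
print(context.shape)  # (8,)
```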

---

### 4. **Dense (Fully Connected) Layers**
   - **Purpose**: Integrates all extracted features and generates the final prediction.
   - **Implementation**:
     - Combine outputs from CNN, LSTM, and attention layers.
     - Add dense layers with ReLU activations.
     - Use a linear activation for the output layer (for price regression).

---

### 5. **Additional Components**
   - **Seasonality and Trends**:
     - Decompose the series using STL or Fourier Transform.
     - Add the decomposed components as input features.
   - **Lagged Features**:
     - Include rolling statistics, such as moving averages or lagged values.
   - **External Data**:
     - Incorporate macroeconomic indicators, sentiment analysis, or news-derived features if available.

---

### **Architecture Design**
Here’s a step-by-step implementation plan:

### Input Data:
1. **Raw Data**: Hourly Bitcoin prices.
2. **Technical Indicators**: RSI, SMA, EMA, MACD.
3. **Additional Features**: Lagged features, seasonality, and trends.

---

### Model Architecture:
```python
from tensorflow.keras.models import Model
from tensorflow.keras.layers import (
    Input, Conv1D, MaxPooling1D, LSTM, GRU, Dense, Dropout, Flatten, Attention, Concatenate
)

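# Input layer (example shapes; set these to match your prepared data)
sequence_length = 24  # hours per input window
num_features = 19     # columns per time step (example value)
input_layer = Input(shape=(sequence_length, num_features))

# CNN branch: extracts local patterns
cnn_layer = Conv1D(filters=32, kernel_size=3, activation='relu')(input_layer)
cnn_layer = MaxPooling1D(pool_size=2)(cnn_layer)
cnn_layer = Flatten()(cnn_layer)

# LSTM branch: runs in parallel on the same input (not on the CNN feature maps)
lstm_layer = LSTM(64, return_sequences=True)(input_layer)
# Self-attention: the LSTM outputs serve as both query and value
attention_output = Attention()([lstm_layer, lstm_layer])
lstm_output = Flatten()(attention_output)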

# Combine CNN and LSTM outputs
combined = Concatenate()([cnn_layer, lstm_output])

# Fully connected layers
dense_layer = Dense(128, activation='relu')(combined)
dense_layer = Dropout(0.2)(dense_layer)
dense_layer = Dense(64, activation='relu')(dense_layer)

# Output layer
output_layer = Dense(1, activation='linear')(dense_layer)

# Build model
model = Model(inputs=input_layer, outputs=output_layer)
model.compile(optimizer='adam', loss='mse', metrics=['mae'])

model.summary()
```

---

### **Training Plan**
1. **Data Preprocessing**:
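   - Split chronologically (e.g., 80% train, 20% test) without shuffling, so the test period lies strictly after the training period.
   - Normalize the data to a range of [0, 1], fitting the scaler on the training portion only.
   - Create sequences of `n` past hours as input and the next hour’s price as the target.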
   - Handle missing data and outliers if present.

2. **Hyperparameter Tuning**:
   - Sequence length: Test with different window sizes (e.g., 24, 48, 72 hours).
   - Batch size: Experiment with sizes like 32, 64, 128.
   - Optimizer: Use Adam with learning rate scheduling.

3. **Evaluation Metrics**:
   - Root Mean Squared Error (RMSE).
   - Mean Absolute Error (MAE).
   - Mean Absolute Percentage Error (MAPE).
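
The three metrics can be computed in a few lines of NumPy. This is a minimal sketch with a hypothetical helper name; MAPE is undefined when a true value is zero, so compute it on prices in their original units rather than on values scaled to [0, 1].

```python
import numpy as np


def regression_metrics(y_true, y_pred):
    """Return RMSE, MAE, and MAPE (in percent) as a dictionary."""
    y_true = np.asarray(y_true, dtype=float).ravel()
    y_pred = np.asarray(y_pred, dtype=float).ravel()
    errors = y_true - y_pred
    return {
        "rmse": float(np.sqrt(np.mean(errors ** 2))),
        "mae": float(np.mean(np.abs(errors))),
        "mape": float(np.mean(np.abs(errors / y_true)) * 100),
    }


metrics = regression_metrics([100.0, 200.0], [110.0, 190.0])
print(metrics)  # rmse 10.0, mae 10.0, mape about 7.5
```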

---

### **Benefits of Hybrid Architectures**
- **CNN**: Efficiently captures short-term fluctuations and relationships between technical indicators.
- **LSTM/GRU**: Models temporal dependencies and autocorrelations effectively.
- **Attention**: Focuses on critical moments in the sequence.
- **Integration of Seasonality/Trends**: Improves long-term forecasting accuracy.

This setup is versatile and can adapt to various time-series forecasting challenges, such as volatility and seasonal variations. Let me know if you'd like help with specific implementation details!

## 03. Preparing of the dataset

Give me more details on preparing the dataset. My dataset has the following columns: [timestamp, open, high, low, close, volume]. Step by step, please.

Preparing your dataset for a neural network involves several steps, including feature engineering, scaling, and splitting the data into training and testing sets. Here’s a step-by-step guide tailored to your Bitcoin dataset with columns `[timestamp, open, high, low, close, volume]`.

---

### **Step 1: Import Libraries**
```python
import pandas as pd
import numpy as np
from sklearn.preprocessing import MinMaxScaler
from sklearn.model_selection import train_test_split
```

---

### **Step 2: Load the Dataset**
```python
# Load your dataset
df = pd.read_csv('bitcoin_data.csv')

# Convert the timestamp column to a datetime object
df['timestamp'] = pd.to_datetime(df['timestamp'])

# Sort the data by timestamp (if not already sorted)
df = df.sort_values(by='timestamp').reset_index(drop=True)
```

---

### **Step 3: Feature Engineering**
### 3.1. **Add Technical Indicators**
Use libraries like `ta` to calculate technical indicators:
```python
import ta

# Relative Strength Index (RSI)
df['RSI'] = ta.momentum.RSIIndicator(close=df['close'], window=14).rsi()

# Simple Moving Average (SMA)
df['SMA'] = df['close'].rolling(window=14).mean()

# Exponential Moving Averages (EMA12, EMA26)
df['EMA12'] = ta.trend.EMAIndicator(close=df['close'], window=12).ema_indicator()
df['EMA26'] = ta.trend.EMAIndicator(close=df['close'], window=26).ema_indicator()

# MACD and Signal Line
macd = ta.trend.MACD(close=df['close'])
df['MACD'] = macd.macd()
df['Signal_Line'] = macd.macd_signal()
```

---

### 3.2. **Add Lagged Features**
Lagged features capture autocorrelation and temporal patterns.
```python
# Create lagged close prices (e.g., past 3 hours)
for lag in range(1, 4):
    df[f'lag_{lag}'] = df['close'].shift(lag)

# Create lagged volume
for lag in range(1, 4):
    df[f'volume_lag_{lag}'] = df['volume'].shift(lag)
```

---

### 3.3. **Add Rolling Statistics**
Rolling statistics capture trends and smoothing.
```python
# Rolling mean and standard deviation of close price
df['rolling_mean'] = df['close'].rolling(window=24).mean()  # 24-hour rolling mean
df['rolling_std'] = df['close'].rolling(window=24).std()   # 24-hour rolling std
```

---

### 3.4. **Add Time-Based Features**
Extract seasonal and cyclical patterns from the timestamp.
```python
# Hour, day of the week, day of the month, etc.
df['hour'] = df['timestamp'].dt.hour
df['day_of_week'] = df['timestamp'].dt.dayofweek
df['month'] = df['timestamp'].dt.month

# Add cyclical transformations for time-based features
df['hour_sin'] = np.sin(2 * np.pi * df['hour'] / 24)
df['hour_cos'] = np.cos(2 * np.pi * df['hour'] / 24)
df['day_of_week_sin'] = np.sin(2 * np.pi * df['day_of_week'] / 7)
df['day_of_week_cos'] = np.cos(2 * np.pi * df['day_of_week'] / 7)
```

---

### **Step 4: Create Target Variable**
### Predict the **next hour’s price**:
```python
# Target: Next hour's close price
df['target'] = df['close'].shift(-1)
```

---

### **Step 5: Handle Missing Values**
Some features (e.g., RSI, lagged values) will have NaN values due to the calculation windows.
```python
# Drop rows with NaN values
df = df.dropna().reset_index(drop=True)
```

---

### **Step 6: Scale the Data**
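Neural networks train more reliably when inputs are on similar scales. Use `MinMaxScaler` for normalization, and fit it on the training portion only so that statistics from the validation and test periods do not leak into training. The `target` column is left in original price units here; if training is unstable, scale it with a separate scaler and invert the predictions afterwards.
```python
scaler = MinMaxScaler()

# Select features for scaling
features_to_scale = ['open', 'high', 'low', 'close', 'volume', 
                     'RSI', 'SMA', 'EMA12', 'EMA26', 'MACD', 'Signal_Line', 
                     'rolling_mean', 'rolling_std'] + \
                     [f'lag_{lag}' for lag in range(1, 4)] + \
                     [f'volume_lag_{lag}' for lag in range(1, 4)]

# Fit on the earliest 64% of rows only (roughly the training portion after the
# two chronological 80/20 splits in Step 8), then transform every row.
# Later values can fall outside [0, 1] if prices move beyond the training range.
train_rows = int(len(df) * 0.64)
scaler.fit(df.iloc[:train_rows][features_to_scale])
df[features_to_scale] = scaler.transform(df[features_to_scale])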
```

---

### **Step 7: Create Sequences**
Neural networks like CNNs, LSTMs, and GRUs require sequential input data. Convert your data into overlapping sequences.

### Define Sequence Length:
- For example, use the past **24 hours** to predict the next hour.
```python
sequence_length = 24

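# Use the scaled columns plus the cyclical time features from Step 3.4
feature_columns = features_to_scale + ['hour_sin', 'hour_cos',
                                       'day_of_week_sin', 'day_of_week_cos']
feature_values = df[feature_columns].to_numpy()
target_values = df['target'].to_numpy()

# Prepare sequences: each window ends at row `end`, and target[end] is the
# close one hour after that row, so the label is the hour right after the window
X = []
y = []

for end in range(sequence_length - 1, len(df)):
    X.append(feature_values[end - sequence_length + 1:end + 1])
    y.append(target_values[end])

X = np.array(X)
y = np.array(y)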
```
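
With several hundred thousand rows, a Python loop over windows can be slow. A possible vectorized alternative uses NumPy's `sliding_window_view`. The sketch below runs on toy data and assumes that `target[t]` already holds the value to predict from rows up to and including `t`, as set up in Step 4; the helper name `make_windows` is made up for this example.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def make_windows(features, target, sequence_length):
    """features: (n, f) array; target: (n,) array aligned row by row.

    Returns X with shape (n - L + 1, L, f) and y with shape (n - L + 1,).
    """
    windows = sliding_window_view(features, sequence_length, axis=0)  # (n-L+1, f, L)
    X = np.transpose(windows, (0, 2, 1))                              # (n-L+1, L, f)
    y = target[sequence_length - 1:]   # label belonging to each window's last row
    return X, y


# Toy data: 6 time steps, 2 features; the target is the next value of feature 0
features = np.column_stack([np.arange(6), 10 * np.arange(6)]).astype(float)
target = np.arange(1, 7, dtype=float)
X, y = make_windows(features, target, sequence_length=3)
print(X.shape, y.shape)  # (4, 3, 2) (4,)
```

The result is a read-only view onto the original array, so it uses little extra memory; call `np.ascontiguousarray(X)` if a framework needs a writable copy.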

---

### **Step 8: Train-Test Split**
Split your dataset into training, validation, and testing sets.
```python
# Train-test split (80% train, 20% test)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, shuffle=False)

# Further split the training set into training and validation sets (80% train, 20% validation)
X_train, X_val, y_train, y_val = train_test_split(X_train, y_train, test_size=0.2, shuffle=False)
```

---

### **Step 9: Prepare for Neural Networks**
Ensure the data is ready for your hybrid neural network:
- **Input Shape**: `(batch_size, sequence_length, num_features)`
- **Output**: Single value (the next hour’s price).

```python
print("X_train shape:", X_train.shape)  # (samples, sequence_length, num_features)
print("y_train shape:", y_train.shape)  # (samples,)
```

---

### **Next Steps**
- Use this preprocessed dataset to train your hybrid architecture.
- Tune hyperparameters, monitor training performance using RMSE or MAE, and adjust features as needed.

Let me know if you'd like help with a specific model implementation!

## 04. Training. Hyperparameter tuning.

Give an example of how to use the data for training. How can I tune the hyperparameters? Step by step, please.

Here’s a step-by-step guide on how to use your prepared dataset for training a neural network and tuning hyperparameters:

---

### **Step 1: Import Libraries**
Start by importing the necessary libraries for building, training, and evaluating the model.

```python
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense, Dropout, Conv1D, MaxPooling1D, Flatten, Input, Concatenate
from tensorflow.keras.callbacks import EarlyStopping
from sklearn.metrics import mean_squared_error, mean_absolute_error
from sklearn.model_selection import train_test_split
import numpy as np
```

---

### **Step 2: Define Model Architecture**
Let’s use a hybrid CNN + LSTM architecture for time-series forecasting.

```python
def build_model(sequence_length, num_features):
    # Input layer
    input_layer = Input(shape=(sequence_length, num_features))
    
    # CNN for feature extraction
    cnn_layer = Conv1D(filters=32, kernel_size=3, activation='relu')(input_layer)
    cnn_layer = MaxPooling1D(pool_size=2)(cnn_layer)
    cnn_layer = Flatten()(cnn_layer)
    
    # LSTM for temporal dependencies
    lstm_layer = LSTM(64, return_sequences=False)(input_layer)
    
    # Combine CNN and LSTM outputs
    combined = Concatenate()([cnn_layer, lstm_layer])
    
    # Fully connected layers
    dense_layer = Dense(128, activation='relu')(combined)
    dense_layer = Dropout(0.2)(dense_layer)
    dense_layer = Dense(64, activation='relu')(dense_layer)
    
    # Output layer (regression)
    output_layer = Dense(1, activation='linear')(dense_layer)
    
    # Build and compile the model
    model = tf.keras.Model(inputs=input_layer, outputs=output_layer)
    model.compile(optimizer='adam', loss='mse', metrics=['mae'])
    return model
```

---

### **Step 3: Initialize the Model**
Build your model by specifying the input shape based on your data.

```python
sequence_length = X_train.shape[1]  # Length of the sequences (e.g., 24 hours)
num_features = X_train.shape[2]  # Number of features in the dataset

model = build_model(sequence_length, num_features)
model.summary()
```

---

### **Step 4: Train the Model**
Set up training parameters and callbacks to monitor performance.

### Early Stopping
Early stopping prevents overfitting by stopping training when the validation performance stops improving.

```python
early_stopping = EarlyStopping(monitor='val_loss', patience=5, restore_best_weights=True)
```
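
To make the `patience` setting concrete, the sketch below replays a list of validation losses through a simplified version of the stopping rule. It is an illustration only: the function name is made up, and the real Keras callback also supports options such as `min_delta` that are ignored here.

```python
def early_stopping_epoch(val_losses, patience):
    """Return (stop_epoch, best_epoch), both 0-indexed.

    stop_epoch is the epoch after which training halts, or the last epoch
    if the rule never triggers.
    """
    best_loss = float("inf")
    best_epoch = 0
    wait = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            best_loss, best_epoch, wait = loss, epoch, 0
        else:
            wait += 1
            if wait >= patience:
                return epoch, best_epoch
    return len(val_losses) - 1, best_epoch


losses = [1.0, 0.8, 0.7, 0.72, 0.71, 0.73, 0.74, 0.75]
print(early_stopping_epoch(losses, patience=3))  # (5, 2)
```

With `restore_best_weights=True`, the weights from the best epoch (epoch 2 in this toy run) are the ones kept.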

### Training
Train the model with your training data, using validation data to track performance.

```python
history = model.fit(
    X_train, y_train,
    epochs=50,  # You can adjust this
    batch_size=32,  # Experiment with batch sizes
    validation_data=(X_val, y_val),
    callbacks=[early_stopping],
    verbose=1
)
```

---

### **Step 5: Evaluate the Model**
Evaluate the model on the test set to see how well it generalizes.

```python
# Predictions
y_pred = model.predict(X_test)

# Metrics
mse = mean_squared_error(y_test, y_pred)
mae = mean_absolute_error(y_test, y_pred)

print(f"Mean Squared Error (MSE): {mse}")
print(f"Mean Absolute Error (MAE): {mae}")
```
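
Raw error numbers are hard to interpret on their own. Hourly prices are strongly autocorrelated, so a useful sanity check is to compare the model against a persistence baseline that predicts "next close equals current close". The sketch below assumes you have the last known close for each test sample in the same units as the targets; `persistence_skill` is a hypothetical helper name.

```python
import numpy as np


def persistence_skill(y_true, y_pred, last_close):
    """Ratio of model MAE to persistence MAE; below 1 means the model wins."""
    y_true = np.asarray(y_true, dtype=float).ravel()
    y_pred = np.asarray(y_pred, dtype=float).ravel()
    last_close = np.asarray(last_close, dtype=float).ravel()
    model_mae = np.mean(np.abs(y_true - y_pred))
    naive_mae = np.mean(np.abs(y_true - last_close))
    return float(model_mae / naive_mae)


# Toy numbers for illustration only
ratio = persistence_skill(
    y_true=[102.0, 101.0, 105.0],
    y_pred=[101.0, 101.5, 103.0],
    last_close=[100.0, 102.0, 101.0],
)
print(round(ratio, 3))  # 0.5
```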

---

### **Step 6: Hyperparameter Tuning**
To tune hyperparameters, you can use tools like **Keras Tuner** or manually adjust parameters in a structured way.

### Option 1: Grid Search (Manual)
Manually try different combinations of hyperparameters:
- **Number of filters in CNN (e.g., 16, 32, 64).**
- **LSTM units (e.g., 32, 64, 128).**
- **Batch size (e.g., 16, 32, 64).**
- **Learning rate (e.g., 0.001, 0.0001).**
- **Dropout rate (e.g., 0.2, 0.3, 0.4).**

Example:
```python
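for learning_rate in [0.001, 0.0001]:
    for batch_size in [16, 32, 64]:
        # build_model uses fixed filter and LSTM sizes, so this loop varies the
        # learning rate and batch size; add parameters to build_model to search sizes too
        model = build_model(sequence_length, num_features)
        model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=learning_rate), loss='mse', metrics=['mae'])
        print(f"Training with learning_rate={learning_rate}, batch_size={batch_size}")
        model.fit(X_train, y_train, validation_data=(X_val, y_val), epochs=20, batch_size=batch_size)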
```

---

### Option 2: Keras Tuner (Automated)
Use **Keras Tuner** to automate hyperparameter tuning.

```python
import keras_tuner as kt

# Define a model-building function for Keras Tuner
def build_tuned_model(hp):
    input_layer = Input(shape=(sequence_length, num_features))
    
    # CNN
    cnn_layer = Conv1D(
        filters=hp.Int('filters', min_value=16, max_value=64, step=16), 
        kernel_size=3, activation='relu'
    )(input_layer)
    cnn_layer = MaxPooling1D(pool_size=2)(cnn_layer)
    cnn_layer = Flatten()(cnn_layer)
    
    # LSTM
    lstm_layer = LSTM(
        units=hp.Int('lstm_units', min_value=32, max_value=128, step=32), 
        return_sequences=False
    )(input_layer)
    
    # Combine
    combined = Concatenate()([cnn_layer, lstm_layer])
    dense_layer = Dense(
        units=hp.Int('dense_units', min_value=64, max_value=256, step=64), 
        activation='relu'
    )(combined)
    dense_layer = Dropout(hp.Float('dropout', 0.2, 0.5, step=0.1))(dense_layer)
    output_layer = Dense(1, activation='linear')(dense_layer)
    
    model = tf.keras.Model(inputs=input_layer, outputs=output_layer)
    model.compile(
        optimizer=tf.keras.optimizers.Adam(hp.Choice('learning_rate', [0.001, 0.0005, 0.0001])),
        loss='mse',
        metrics=['mae']
    )
    return model

# Initialize the tuner
tuner = kt.RandomSearch(
    build_tuned_model,
    objective='val_loss',
    max_trials=10,
    executions_per_trial=1,
    directory='my_tuning_dir',
    project_name='bitcoin_forecasting'
)

# Run the search
tuner.search(X_train, y_train, validation_data=(X_val, y_val), epochs=20, batch_size=32)

# Get the best hyperparameters and the best trained model
best_hps = tuner.get_best_hyperparameters(num_trials=1)[0]
print(f"Best hyperparameters: {best_hps.values}")
model = tuner.get_best_models(num_models=1)[0]
```

---

### **Step 7: Save and Load the Model**
Save the best model for future use.

```python
# Save model
model.save('bitcoin_price_model.h5')

# Load model
from tensorflow.keras.models import load_model
model = load_model('bitcoin_price_model.h5')
```

---

### **Step 8: Forecast Future Prices**
Once the model is trained, you can use it to forecast future prices.

```python
# Prepare a sequence of the last 24 hours
latest_sequence = X_test[-1].reshape(1, sequence_length, num_features)

# Predict the next hour's price
next_price = model.predict(latest_sequence)
print(f"Predicted next price: {next_price[0][0]}")
```

---

This approach allows you to train, evaluate, and tune your model effectively. Let me know if you need more help!

## 05. Hyperparameter tuning with Optuna

Here's a step-by-step guide to hyperparameter tuning with **Optuna**, a flexible and efficient framework for automated hyperparameter optimization.

---

### **Step 1: Install Optuna**
First, ensure you have Optuna installed. If not, install it using:

```bash
pip install optuna
```

---

### **Step 2: Import Libraries**
Import the required libraries, including Optuna.

```python
import optuna
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense, Dropout, Conv1D, MaxPooling1D, Flatten, Input, Concatenate
from sklearn.metrics import mean_squared_error
```

---

### **Step 3: Define the Objective Function**
The objective function defines the model architecture, hyperparameters, and training process. Optuna will optimize this function by finding the combination of hyperparameters that minimizes the validation loss.

```python
def objective(trial):
    # Define hyperparameters to tune
    filters = trial.suggest_int('filters', 16, 64, step=16)  # Filters for CNN
    lstm_units = trial.suggest_int('lstm_units', 32, 128, step=32)  # Units in LSTM
    dense_units = trial.suggest_int('dense_units', 64, 256, step=64)  # Units in Dense layer
    dropout_rate = trial.suggest_float('dropout_rate', 0.2, 0.5, step=0.1)  # Dropout rate
    learning_rate = trial.suggest_float('learning_rate', 1e-4, 1e-2, log=True)  # Learning rate (log scale; suggest_loguniform is deprecated)

    # Build the model
    input_layer = Input(shape=(X_train.shape[1], X_train.shape[2]))

    # CNN
    cnn_layer = Conv1D(filters=filters, kernel_size=3, activation='relu')(input_layer)
    cnn_layer = MaxPooling1D(pool_size=2)(cnn_layer)
    cnn_layer = Flatten()(cnn_layer)

    # LSTM
    lstm_layer = LSTM(lstm_units, return_sequences=False)(input_layer)

    # Combine CNN and LSTM
    combined = Concatenate()([cnn_layer, lstm_layer])

    # Dense layers
    dense_layer = Dense(dense_units, activation='relu')(combined)
    dense_layer = Dropout(dropout_rate)(dense_layer)
    output_layer = Dense(1, activation='linear')(dense_layer)

    # Compile the model
    model = tf.keras.Model(inputs=input_layer, outputs=output_layer)
    model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=learning_rate), loss='mse', metrics=['mae'])

    # Train the model
    history = model.fit(
        X_train, y_train,
        validation_data=(X_val, y_val),
        epochs=10,  # Keep epochs small for quicker tuning
        batch_size=32,  # You can also tune this as a hyperparameter
        verbose=0  # Suppress output during training
    )

    # Evaluate the model on the validation set
    val_loss = history.history['val_loss'][-1]

    return val_loss
```

---

### **Step 4: Create an Optuna Study**
Define a study and specify the optimization direction (`minimize` for validation loss).

```python
study = optuna.create_study(direction='minimize')
```

---

### **Step 5: Run the Optimization**
Run the optimization for a specified number of trials.

```python
study.optimize(objective, n_trials=20)  # Adjust the number of trials as needed
```

---

### **Step 6: Get the Best Hyperparameters**
After the optimization completes, retrieve the best hyperparameters and validation loss.

```python
print("Best Hyperparameters:", study.best_params)
print("Best Validation Loss:", study.best_value)
```

---

### **Step 7: Train the Final Model with Best Hyperparameters**
Rebuild and train the model using the best hyperparameters.

```python
best_params = study.best_params

# Build the model with the best hyperparameters
input_layer = Input(shape=(X_train.shape[1], X_train.shape[2]))

# CNN
cnn_layer = Conv1D(filters=best_params['filters'], kernel_size=3, activation='relu')(input_layer)
cnn_layer = MaxPooling1D(pool_size=2)(cnn_layer)
cnn_layer = Flatten()(cnn_layer)

# LSTM
lstm_layer = LSTM(best_params['lstm_units'], return_sequences=False)(input_layer)

# Combine CNN and LSTM
combined = Concatenate()([cnn_layer, lstm_layer])

# Dense layers
dense_layer = Dense(best_params['dense_units'], activation='relu')(combined)
dense_layer = Dropout(best_params['dropout_rate'])(dense_layer)
output_layer = Dense(1, activation='linear')(dense_layer)

# Compile the model
final_model = tf.keras.Model(inputs=input_layer, outputs=output_layer)
final_model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=best_params['learning_rate']), loss='mse', metrics=['mae'])

# Train the model
history = final_model.fit(
    X_train, y_train,
    validation_data=(X_val, y_val),
    epochs=50,  # Longer training for final model
    batch_size=32,
    verbose=1
)
```

---

### **Step 8: Evaluate the Final Model**
Evaluate the model on the test set.

```python
# Predictions on the test set
y_pred = final_model.predict(X_test)

# Compute metrics
mse = mean_squared_error(y_test, y_pred)
print(f"Test MSE: {mse}")
```

---

### **Step 9: Visualize the Results**
Visualize the optimization process and loss history.

### Visualize Hyperparameter Optimization
```python
optuna.visualization.plot_optimization_history(study).show()
```

### Visualize Hyperparameter Importance
```python
optuna.visualization.plot_param_importances(study).show()
```

### Training Loss History
```python
import matplotlib.pyplot as plt

plt.plot(history.history['loss'], label='Train Loss')
plt.plot(history.history['val_loss'], label='Validation Loss')
plt.xlabel('Epochs')
plt.ylabel('Loss')
plt.legend()
plt.show()
```

---

This setup ensures that you efficiently search for the best hyperparameters while maintaining flexibility. Let me know if you need help fine-tuning further!

## 06. Fine tuning

Fine-tuning in the context of neural networks involves refining the model to achieve the best performance on your dataset. Below is a detailed guide on different fine-tuning methods you can apply:

---

### **1. Hyperparameter Fine-Tuning**
This involves optimizing key parameters such as learning rate, batch size, number of layers, and dropout rates.

### Key Parameters to Fine-Tune:
- **Learning Rate**: Often the most influential hyperparameter. Start with a default value (e.g., 0.001) and adjust using:
  - A smaller learning rate (e.g., 0.0001) for fine adjustments.
  - A larger learning rate (e.g., 0.01) for faster convergence but with a risk of instability.
- **Batch Size**:
  - Smaller batch sizes (16, 32) can generalize better but take longer to train.
  - Larger batch sizes (64, 128) speed up training but may generalize slightly worse.
- **Dropout Rate**: Prevents overfitting; typically ranges between 0.2 and 0.5.
- **Number of Layers and Neurons**:
  - Experiment with deeper models or additional neurons in fully connected layers.

---

### **How to Implement Learning Rate Schedules**
Dynamic adjustment of the learning rate can enhance training efficiency. Use learning rate schedulers in Keras:

```python
from tensorflow.keras.callbacks import ReduceLROnPlateau

# Reduce learning rate when validation loss plateaus
reduce_lr = ReduceLROnPlateau(
    monitor='val_loss', factor=0.5, patience=3, min_lr=1e-6, verbose=1
)

model.fit(
    X_train, y_train,
    validation_data=(X_val, y_val),
    epochs=50,
    batch_size=32,
    callbacks=[reduce_lr]
)
```
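
The effect of `factor` and `patience` is easier to see on a toy loss curve. The sketch below replays validation losses through a simplified plateau rule; it is not the Keras implementation (which also has `min_delta` and `cooldown` options), and the function name is an assumption.

```python
def simulate_reduce_on_plateau(val_losses, initial_lr, factor=0.5, patience=3, min_lr=1e-6):
    """Return the learning rate in effect after each epoch."""
    lr = initial_lr
    best = float("inf")
    wait = 0
    schedule = []
    for loss in val_losses:
        if loss < best:
            best, wait = loss, 0
        else:
            wait += 1
            if wait >= patience:
                lr = max(lr * factor, min_lr)  # never go below min_lr
                wait = 0
        schedule.append(lr)
    return schedule


losses = [1.0, 0.9, 0.9, 0.9, 0.9, 0.85]
print(simulate_reduce_on_plateau(losses, initial_lr=0.001))
# the rate is halved once, after three epochs without improvement
```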

---

### **2. Regularization Fine-Tuning**
Regularization helps reduce overfitting and improve model generalization.

### Techniques:
1. **L2 Regularization (Weight Decay)**:
   Penalizes large weights in the loss function.
   ```python
   from tensorflow.keras.regularizers import l2

   dense_layer = Dense(128, activation='relu', kernel_regularizer=l2(0.01))(input_layer)
   ```

2. **Dropout**:
   Dropout randomly disables neurons during training.
   ```python
   dropout_layer = Dropout(0.3)(dense_layer)
   ```

3. **Batch Normalization**:
   Speeds up training and stabilizes the model.
   ```python
   from tensorflow.keras.layers import BatchNormalization

   batch_norm_layer = BatchNormalization()(dense_layer)
   ```

---

### **3. Data Augmentation for Time-Series**
Introduce variability in training data to improve robustness.

### Example Techniques:
- **Add Gaussian Noise**:
   Slightly alter the training data by adding random noise.
   ```python
   noise_factor = 0.01
   X_train_noisy = X_train + noise_factor * np.random.normal(size=X_train.shape)
   ```
- **Time Shifting**:
   Shift the entire time-series slightly forward or backward.
- **Scaling and Normalization Variants**:
   Experiment with scaling techniques like Min-Max, Standard Scaling, or Robust Scaling.

---

### **4. Use Transfer Learning**
If you have a small dataset, leverage a pre-trained model. While transfer learning is more common in image or NLP tasks, you can use pre-trained time-series models or design a custom transfer learning setup:
- **Train a base model on a larger dataset (e.g., S&P500, other crypto data).**
- **Freeze early layers** and fine-tune later layers on your dataset.

```python
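# `base_model` is assumed to be a Keras model already trained on the larger dataset
# Freeze all base model layers except the last two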
for layer in base_model.layers[:-2]:
    layer.trainable = False

# Add new trainable layers
x = Dense(64, activation='relu')(base_model.output)
x = Dropout(0.3)(x)
output = Dense(1, activation='linear')(x)

# Compile the fine-tuned model
fine_tuned_model = tf.keras.Model(inputs=base_model.input, outputs=output)
fine_tuned_model.compile(optimizer='adam', loss='mse', metrics=['mae'])
```

---

### **5. Early Stopping**
Monitor validation performance and stop training when improvement stalls.

```python
from tensorflow.keras.callbacks import EarlyStopping

early_stopping = EarlyStopping(
    monitor='val_loss',
    patience=5,
    restore_best_weights=True,
    verbose=1
)

model.fit(
    X_train, y_train,
    validation_data=(X_val, y_val),
    epochs=100,
    callbacks=[early_stopping]
)
```

---

### **6. Cross-Validation for Time-Series**
Standard k-fold cross-validation can disrupt temporal order in time-series. Use **TimeSeriesSplit** to preserve temporal structure.

```python
from sklearn.model_selection import TimeSeriesSplit

tscv = TimeSeriesSplit(n_splits=5)
for train_index, val_index in tscv.split(X):
    X_train, X_val = X[train_index], X[val_index]
    y_train, y_val = y[train_index], y[val_index]

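    # Train a fresh model on each split so weights do not carry over between folds
    # (build_fresh_model is a placeholder for your own model-building code)
    model = build_fresh_model()
    model.fit(X_train, y_train, validation_data=(X_val, y_val))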
```
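
To see what an order-preserving split looks like, the sketch below builds expanding-window folds by hand with NumPy. It is a hand-rolled illustration in the spirit of `TimeSeriesSplit`, not a reimplementation of it; the function name `expanding_window_splits` and the tiny sample count are assumptions chosen for readability.

```python
import numpy as np

def expanding_window_splits(n_samples, n_splits):
    """Yield (train_idx, val_idx) pairs where validation always follows training in time."""
    fold_size = n_samples // (n_splits + 1)
    indices = np.arange(n_samples)
    for k in range(n_splits):
        # Each validation fold starts where the previous one ended
        val_start = n_samples - (n_splits - k) * fold_size
        yield indices[:val_start], indices[val_start:val_start + fold_size]

splits = list(expanding_window_splits(n_samples=12, n_splits=3))
for train_idx, val_idx in splits:
    print("train:", train_idx.tolist(), "val:", val_idx.tolist())
```

With 12 samples and 3 splits, the first fold trains on indices 0 to 2 and validates on 3 to 5, and the last fold trains on 0 to 8 and validates on 9 to 11. The training window only ever grows forward, so no fold is validated on data older than its training data.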

---

### **7. Optimize Training with Mixed Precision**
Leverage GPUs by using mixed precision for faster training and reduced memory usage.

```python
from tensorflow.keras import mixed_precision

# Compute in float16 while keeping variables in float32
mixed_precision.set_global_policy('mixed_float16')

# Build and train the model as usual; give the output layer dtype='float32' for numerical stability
```

---

### **8. Hyperparameter Optimization with Optuna**
If you haven’t already, use Optuna for automated fine-tuning.

### Example Parameters to Optimize:
- Number of filters in CNN
- LSTM units
- Dense units
- Dropout rate
- Learning rate
- Batch size

```python
import optuna

def objective(trial):
    # Suggest hyperparameters
    filters = trial.suggest_int('filters', 16, 64, step=16)
    lstm_units = trial.suggest_int('lstm_units', 32, 128, step=32)
    dropout_rate = trial.suggest_float('dropout_rate', 0.2, 0.5)
    learning_rate = trial.suggest_float('learning_rate', 1e-4, 1e-2, log=True)

    # Build model (build_model is assumed to be your own function returning a compiled Keras model)
    model = build_model(filters, lstm_units, dropout_rate, learning_rate)
    
    # Train model
    history = model.fit(X_train, y_train, validation_data=(X_val, y_val), epochs=10, batch_size=32, verbose=0)
    
    # Return validation loss
    return min(history.history['val_loss'])

study = optuna.create_study(direction='minimize')
study.optimize(objective, n_trials=20)

# Best hyperparameters
print(study.best_params)
```

---

### **9. Ensemble Models**
Combine multiple models (e.g., CNN, LSTM, and Transformer models) to improve robustness. Train each model separately and average their predictions.

```python
# Average predictions from multiple models
y_pred_ensemble = (y_pred_model1 + y_pred_model2 + y_pred_model3) / 3
```

---

### **10. Monitoring Training**
Use tools like **TensorBoard** to visualize metrics and fine-tune based on trends.

```bash
tensorboard --logdir logs
```

Add a callback to log training:

```python
import tensorflow as tf

tensorboard_callback = tf.keras.callbacks.TensorBoard(log_dir='logs')

model.fit(
    X_train, y_train,
    validation_data=(X_val, y_val),
    epochs=50,
    callbacks=[tensorboard_callback]
)
```

---

Applying these fine-tuning techniques one at a time, and comparing validation metrics after each change, shows which ones actually improve your model’s performance. Let me know which aspect you'd like more details on!

## 0.7. Are 500_000 values enough for the training?

Whether 500,000 values are enough for training your model depends on several factors:

---

### **Factors to Consider**

#### **1. Complexity of the Model**
- **Simple models (e.g., MLP, basic LSTM)**: 500,000 rows can be sufficient, especially if the dataset is representative of the problem you're solving.
- **Complex models (e.g., hybrid CNN-LSTM, Transformers)**: These require more data to learn patterns effectively and avoid overfitting.

---

#### **2. Feature Engineering**
If you've added meaningful features like technical indicators (RSI, SMA, EMA, MACD) and seasonal/trend components, you reduce the model's reliance on raw data size. Good feature engineering can make 500,000 rows more effective.
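
As an illustration of such features, the sketch below computes SMA, EMA, and the MACD line with NumPy and stacks them next to the raw price. The function names, the toy price series, and the window lengths are assumptions for the example; the EMA uses the common recursive form with `alpha = 2 / (span + 1)`, seeded with the first price, and RSI is left out to keep the example short.

```python
import numpy as np

def sma(prices, window):
    """Simple moving average; the first window-1 entries are NaN (not enough history)."""
    out = np.full(len(prices), np.nan)
    kernel = np.ones(window) / window
    out[window - 1:] = np.convolve(prices, kernel, mode="valid")
    return out

def ema(prices, span):
    """Exponential moving average in recursive form, seeded with the first price."""
    alpha = 2.0 / (span + 1)
    out = np.empty(len(prices))
    out[0] = prices[0]
    for t in range(1, len(prices)):
        out[t] = alpha * prices[t] + (1 - alpha) * out[t - 1]
    return out

def macd_line(prices, fast=12, slow=26):
    """MACD line: fast EMA minus slow EMA."""
    return ema(prices, fast) - ema(prices, slow)

# Toy price series (hypothetical values)
prices = np.array([10.0, 11.0, 12.0, 13.0, 14.0, 15.0])
sma3 = sma(prices, 3)
ema3 = ema(prices, 3)

# One row per time step, one column per feature
features = np.column_stack([prices, sma3, ema3, macd_line(prices)])
```

Rows that still contain NaN (the warm-up period of the longest window) would need to be dropped before training. Each indicator at time `t` uses only prices up to `t`, so these features do not leak future information into the input.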

---

#### **3. Time Span of the Data**
- If your data covers **multiple years**, it is more likely to capture different market conditions (bull markets, bear markets, high/low volatility).
- If it covers only a short period (e.g., a few months), it might not be diverse enough for a robust model.
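
A quick way to check the time span is to convert the row count into years for a given sampling interval. The sketch below assumes evenly spaced rows with no gaps; the function name and the list of candidate intervals are illustrative.

```python
HOURS_PER_YEAR = 24 * 365.25

def span_in_years(n_rows, minutes_per_row):
    """Time covered by n_rows evenly spaced samples, in years."""
    hours = n_rows * minutes_per_row / 60
    return hours / HOURS_PER_YEAR

n_rows = 500_000
for label, minutes in [("1 minute", 1), ("5 minutes", 5), ("15 minutes", 15), ("1 hour", 60)]:
    print(f"{label:>10} bars: {span_in_years(n_rows, minutes):6.2f} years")
```

At a strictly hourly interval, 500,000 rows would cover about 57 years, which is longer than Bitcoin has existed (since 2009). At one-minute bars the same row count covers just under one year. It is therefore worth confirming the actual sampling interval of the dataset before judging how many market regimes it contains.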

---

#### **4. Pretraining for Financial Models**
Pretraining can improve your model if:
- Your dataset isn't large enough.
- You want your model to generalize better across different market conditions.
- You have access to additional financial data (e.g., other cryptocurrencies, stock prices).

For pretraining:
1. **Use related datasets**: Pretrain on other cryptocurrencies (e.g., Ethereum, Litecoin) or financial indices (e.g., S&P 500) that share similar patterns.
2. **Transfer learning**: Train the model on the additional dataset, freeze the earlier layers, and fine-tune on your Bitcoin dataset.

---

#### **5. Cross-Validation**
Even with 500,000 rows, it's important to assess the model’s generalization. Use **time-series cross-validation** to ensure your model performs well across different time periods.

---

#### **6. Compare with Other Studies**
In financial modeling:
- Some studies achieve good results with **10,000–100,000 rows** and proper feature engineering.
- Larger datasets (1M+ rows) are ideal for training advanced models like Transformers.

500,000 rows are **on the larger side for time-series models** and should generally be enough if the dataset is well-preprocessed.

---

### **Recommendations**
1. **Try Training Without Pretraining First**
   - With 500,000 rows, you likely don't need pretraining initially.
   - Focus on proper preprocessing, feature engineering, and hyperparameter tuning.

2. **Augment the Dataset (If Needed)**
   - Include more data from other cryptocurrencies or financial instruments to increase diversity.
   - Add synthetic data using techniques like time-series augmentation.

3. **Monitor Generalization**
   - Ensure the model performs well on validation/test sets using metrics like MSE, RMSE, or MAE.
   - Test it across different market conditions (e.g., bull vs. bear markets).

4. **Experiment with Pretraining**
   - If your model overfits or underperforms, pretrain it on a larger dataset (e.g., other crypto or stock market data).
   - Use transfer learning to fine-tune it on your specific Bitcoin dataset.
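
The metrics named in recommendation 3 can be computed in a few lines, both overall and per market condition. This is a minimal sketch: the function name `regression_metrics`, the toy values, and the `regimes` labels are hypothetical, and how you label bull and bear periods is up to you.

```python
import numpy as np

def regression_metrics(y_true, y_pred):
    """Return MSE, RMSE, and MAE for two equally long sequences."""
    err = np.asarray(y_pred, dtype=float) - np.asarray(y_true, dtype=float)
    mse = float(np.mean(err ** 2))
    return {"MSE": mse, "RMSE": mse ** 0.5, "MAE": float(np.mean(np.abs(err)))}

# Toy targets and predictions (hypothetical values)
y_true = np.array([100.0, 102.0, 101.0, 105.0])
y_pred = np.array([101.0, 101.0, 103.0, 105.0])

# Hypothetical market-condition label for each sample
regimes = np.array(["bull", "bull", "bear", "bear"])

overall = regression_metrics(y_true, y_pred)
per_regime = {
    str(r): regression_metrics(y_true[regimes == r], y_pred[regimes == r])
    for r in np.unique(regimes)
}
print(overall)
print(per_regime)
```

A model that looks fine overall but has a much higher error in one regime is a sign that it has not generalized across market conditions.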

---

In conclusion, **500,000 values should be sufficient for training your model**, especially if combined with good feature engineering and proper validation. However, if your model struggles to generalize or you want to further boost performance, pretraining on additional data can help.