<hr>
<div style="background-color: lightgray; padding: 20px; color: black;">
<div>
<img src="https://th.bing.com/th/id/R.3cd1c8dc996c5616cf6e65e20b6bf586?rik=09aaLyk4hfbBiQ&riu=http%3a%2f%2fcidics.uanl.mx%2fwp-content%2fuploads%2f2016%2f09%2fcimat.png&ehk=%2b0brgMUkA2BND22ixwLZheQrrOoYLO3o5cMRqsBOrlY%3d&risl=&pid=ImgRaw&r=0" style="float: right; margin-right: 30px;" width="200"/> 
<font size="5.5" color="8C3061"><b>Encoder-Only Transformer and Fine-Tuning for Time Series Forecasting </b></font> <br>
<font size="4.5" color="8C3061"><b>Machine Learning II - Assignment 2 </b></font> 
</div>
<div style="text-align: left">  <br>
Edison David Serrano Cárdenas. <br>
MSc in Applied Mathematics <br>
CIMAT - Guanajuato Campus <br>
</div>

</div>
<hr>


# <font color="8C3061" >**Load Libraries**</font> 

In [42]:
# Load basic libraries
import numpy as np
import pandas as pd
import random
import os

# Load libraries for data processing
from generate_data import GenerateData as gd
from only_encoder_transformer import ForecastingModel
from peft import get_peft_model, LoraConfig, TaskType
import torch

# Load libraries for model evaluation
from sklearn.metrics import mean_squared_error, mean_absolute_error, r2_score

# Load libraries for plotting
import plotly.graph_objects as go

**Check for CUDA and Set the Seed**


In [2]:
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
print("device is:",device)

SEED = 42
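def seed_everything(seed):
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)  # seeds every visible GPU; ignored without CUDA
    torch.backends.cudnn.deterministic = True
    torch.backends.cudnn.benchmark = False  # avoid non-deterministic kernel autotuning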

seed_everything(SEED)

device is: cuda


# <font color="8C3061" >**Problem Statement**</font> 

Consider the hourly historical values from a database of currency exchange rates and the daily price of oil in US dollars (USD). Obtain N series, each of length at least T = 100.

Then train a model using an encoder-only Transformer, applying both full fine-tuning and LoRA tuning, so that given the series S[:T-t] it predicts the final part of each series, S[T-t:]. A typical value is t=5.
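
As a minimal sketch of the split described above, the snippet below cuts one series S of length T into the input S[:T-t] and the target S[T-t:]. The helper name `split_series` and the placeholder values are assumptions made for illustration; they are not part of `generate_data.py`.

```python
import numpy as np

def split_series(series, t=5):
    """Split one series S into the model input S[:T-t] and the target S[T-t:]."""
    series = np.asarray(series)
    T = len(series)
    if not 0 < t < T:
        raise ValueError("t must satisfy 0 < t < T")
    return series[:T - t], series[T - t:]

# Hypothetical series of length T = 100 (placeholder values, not market data)
S = np.arange(100, dtype=float)
x, y = split_series(S, t=5)
print(x.shape, y.shape)  # (95,) (5,)
```

With T = 100 and t = 5 this yields 95 input steps and 5 target steps, which matches the `seq_len=95` and `pred_len=5` arguments used later in the notebook.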

# <font color="8C3061" >**Data Preprocessing**</font> 

Downloading the data:

In [3]:
crypto_data = gd.download_crypto_data()
exchange_data = gd.download_exchange_and_oil_data()

Loading existing data from data/crypto_data.csv
Loading existing data from data/exchange_oil_data.csv


Parameters for data generation:

We choose BATCH_SIZE = 32 and type_normalization = MinMaxScaler.

In [4]:
cryptos = ['BTC-USD', 'ETH-USD', 'BNB-USD', 'XRP-USD', 'ADA-USD', 'SOL-USD', 'DOGE-USD']
currencies = ['EUR','JPY', 'GBP', 'AUD', 'CAD','HKD', 'MXN','COP']
oil_ticker = 'CL=F'
period = '1mo'
interval = '1h'
folder_name = 'data'
file_name = 'exchange_oil_data'
type_normalization = "MinMaxScaler"
test_size = 0.2
batch_size = 32
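
To make the normalization choice concrete, here is a small sketch of min-max scaling. It assumes that `type_normalization = "MinMaxScaler"` refers to scikit-learn's `MinMaxScaler`, which is not confirmed by the notebook itself; the price values are placeholders.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Hypothetical price column (placeholder values, not market data)
prices = np.array([[10.0], [12.0], [15.0], [20.0]])

scaler = MinMaxScaler()                 # default feature_range is (0, 1)
scaled = scaler.fit_transform(prices)

# The same result from the formula (x - min) / (max - min)
manual = (prices - prices.min()) / (prices.max() - prices.min())

# The fitted scaler can map normalized values back to the original units
restored = scaler.inverse_transform(scaled)
```

Keeping the fitted scaler around matters later: the model predicts in the normalized range, so the forecasts have to go through `inverse_transform` before they can be compared with real prices.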

<div class="alert alert-block alert-info">
<b>Note:</b> The data were generated following the script <i><b>generate_data.py</b></i>. They were generated once and saved in the <i><b>data</b></i> folder; to use other cryptocurrencies or exchange rates, that folder must be deleted first. </div>


Generating the data:


In [5]:
(train_loader_crypto, test_loader_crypto,
 train_loader_exchange, test_loader_exchange,
 scalers_cd, scalers_ed) = gd.generate_data(
    cryptos=cryptos,
    currencies=currencies,
    oil_ticker=oil_ticker,
    period=period,
    interval=interval,
    folder_name=folder_name,
    file_name=file_name,
    type_normalization=type_normalization,
    test_size=test_size,
    batch_size=batch_size,
)

Loading existing data from data/exchange_oil_data.csv
Loading existing data from data/exchange_oil_data.csv


# <font color="8C3061" >**Model Training**</font> 

Training the models:

In [6]:
# Training function
def train_only_encoder(model, dataloader, optimizer, criterion, num_epochs=10):
    model.train()
    for epoch in range(num_epochs):
        total_loss = 0
        for x_batch, y_batch in dataloader:
            optimizer.zero_grad()
            
            # Cast the data to float32 and move it to the device
            x_batch = x_batch.to(device).float()
            y_batch = y_batch.to(device).float()
            
            # Forward pass
            output = model(x=x_batch)
            
            # Compute the loss
            loss = criterion(output, y_batch)
            total_loss += loss.item()
            
            # Backward pass and optimization step
            loss.backward()
            optimizer.step()
        
        # Report the mean loss per batch for this epoch
        print(f'Epoch {epoch+1}/{num_epochs}, Loss: {total_loss/len(dataloader)}')

Evaluating model performance:

In [7]:
def test_evaluation(model, test_loader,name_model = None):
    predicted_close_values = []
    true_close_values = []
    model.eval()
    with torch.no_grad():
        for x_batch, y_batch in test_loader:
            x_batch = x_batch.to(device).float()
            y_batch = y_batch.to(device).float()
            
            output = model(x= x_batch)
            
            predicted_close_values.extend(output.cpu().numpy())
            true_close_values.extend(y_batch.cpu().numpy())
            
    true_close_values = np.array(true_close_values)
    predicted_close_values = np.array(predicted_close_values)
    
    mse = mean_squared_error(true_close_values, predicted_close_values)
    mae = mean_absolute_error(true_close_values, predicted_close_values)
    r2 = r2_score(true_close_values, predicted_close_values)
    
    return mse, mae, r2
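
Because each target is a window of five values, the scikit-learn metrics above receive 2-D arrays. The sketch below, using made-up numbers, illustrates how the three scores behave in that case: by default they are averaged uniformly over the forecast steps (columns).

```python
import numpy as np
from sklearn.metrics import mean_squared_error, mean_absolute_error, r2_score

# Hypothetical targets and predictions: 3 windows, 5 forecast steps each
y_true = np.array([
    [0.1, 0.2, 0.3, 0.4, 0.5],
    [0.5, 0.4, 0.3, 0.2, 0.1],
    [0.2, 0.2, 0.6, 0.6, 0.9],
])
y_pred = y_true + 0.05  # constant offset, so every absolute error is 0.05

mse = mean_squared_error(y_true, y_pred)
mae = mean_absolute_error(y_true, y_pred)
r2 = r2_score(y_true, y_pred)  # default: uniform average of the per-column R2

# Per-step scores, one value for each of the 5 forecast positions
r2_per_step = r2_score(y_true, y_pred, multioutput="raw_values")
```

Looking at `r2_per_step` can be useful when the later forecast steps are harder to predict than the first ones, something a single averaged score hides.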

Saving the models:

In [8]:
# Save the model
def save_model(model, name_model):
    if not os.path.exists('models'):
        os.makedirs('models')
    
    model_path = f'models/{name_model}.pth'
    
    if os.path.exists(model_path):
        print(f"The model {name_model} already exists")
    else:
        torch.save(model.state_dict(), model_path)
        print(f"Model {name_model} saved successfully")

Loading saved models:

In [9]:
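def load_model(model, name_model):
    # map_location lets a checkpoint saved on GPU load on a CPU-only machine
    state_dict = torch.load(f'models/{name_model}.pth', map_location=device, weights_only=True)
    model.load_state_dict(state_dict)
    return model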

Counting the model's trainable parameters:

In [10]:
def count_trainable_parameters(model):
    return sum(p.numel() for p in model.parameters() if p.requires_grad)

Hyperparameter definitions:

In [11]:
# Hyperparameters
d_model = 64
nhead = 4
num_encoder_layers = 3
dim_feedforward =  2048
dropout = 0.1 

# create a score dataframe
score = pd.DataFrame(columns=["Model","MSE","MAE","R2","Trainable Parameters"])

## <font color="8C3061" >**Encoder-Only Model**</font> 

### <font color="8C3061" >**Model Trained from Scratch (Currency Exchange)**</font> 

In [12]:
only_encoder_model_exchange = ForecastingModel(seq_len=95,
                                               pred_len=5,
                                               embed_size=d_model,
                                               nhead=nhead,
                                               dim_feedforward=dim_feedforward,
                                               dropout=dropout,
                                               conv1d_emb=True,
                                               conv1d_kernel_size=3,
                                               device=device).to(device)

criterion_ex = torch.nn.MSELoss()
optimizer_ex = torch.optim.Adam(only_encoder_model_exchange.parameters(), lr=0.0001)

train_only_encoder(only_encoder_model_exchange, train_loader_exchange, optimizer_ex, criterion_ex, num_epochs=10)

Epoch 1/10, Loss: 0.04294068758775081
Epoch 2/10, Loss: 0.02530731393822602
Epoch 3/10, Loss: 0.014574544221561934
Epoch 4/10, Loss: 0.012691771412002189
Epoch 5/10, Loss: 0.010260386009966689
Epoch 6/10, Loss: 0.010950134568182486
Epoch 7/10, Loss: 0.008585814752482943
Epoch 8/10, Loss: 0.008049559363696192
Epoch 9/10, Loss: 0.008377493991117393
Epoch 10/10, Loss: 0.007347668388060161


In [13]:
mse, mae, r2 = test_evaluation(only_encoder_model_exchange, test_loader_exchange)
num_par = count_trainable_parameters(only_encoder_model_exchange)
score.loc[0] = ["Only Encoder Exchange", mse, mae, r2, num_par]
print("Only Encoder Exchange score:\nMSE: {}\nMAE: {}\nR2: {}".format(mse, mae, r2))
print("Trainable Parameters: {:,}".format(num_par))

Only Encoder Exchange score:
MSE: 0.009579326957464218
MAE: 0.07404439896345139
R2: 0.8826397657394409
Trainable Parameters: 15,923,909


### <font color="8C3061" >**Model Trained from Scratch (Cryptos)**</font> 

In [14]:
only_encoder_model_crypto = ForecastingModel(seq_len=95,
                                            pred_len=5,
                                            embed_size=d_model,
                                            nhead=nhead,
                                            dim_feedforward=dim_feedforward,
                                            dropout=dropout,
                                            conv1d_emb=True,
                                            conv1d_kernel_size=3,
                                            device=device).to(device)

criterion_cr = torch.nn.MSELoss()
optimizer_cr = torch.optim.Adam(only_encoder_model_crypto.parameters(), lr=0.0001)

train_only_encoder(only_encoder_model_crypto, train_loader_crypto, optimizer_cr, criterion_cr, num_epochs=10)

Epoch 1/10, Loss: 0.040095917148781676
Epoch 2/10, Loss: 0.020898833804364716
Epoch 3/10, Loss: 0.013555606993447458
Epoch 4/10, Loss: 0.012641879813080388
Epoch 5/10, Loss: 0.009917656159294503
Epoch 6/10, Loss: 0.008835517418836909
Epoch 7/10, Loss: 0.00865155687954809
Epoch 8/10, Loss: 0.008441472692149026
Epoch 9/10, Loss: 0.007273379347420165
Epoch 10/10, Loss: 0.007354637308578406


In [15]:
mse, mae, r2 = test_evaluation(only_encoder_model_crypto, test_loader_crypto)
num_par = count_trainable_parameters(only_encoder_model_crypto)
score.loc[1] = ["Only Encoder Crypto", mse, mae, r2, num_par]
print("Only Encoder Crypto score:\nMSE: {}\nMAE: {}\nR2: {}".format(mse, mae, r2))
print("Trainable Parameters: ", num_par)

Only Encoder Crypto score:
MSE: 0.009489606134593487
MAE: 0.07232800126075745
R2: 0.8836885690689087
Trainable Parameters:  15923909


In [16]:
save_model(only_encoder_model_exchange, "only_encoder_model_exchange")
save_model(only_encoder_model_crypto, "only_encoder_model_crypto")

The model only_encoder_model_exchange already exists
The model only_encoder_model_crypto already exists


## <font color="8C3061" >**Fine-Tuning**</font> 

Creating the LoRA models for different values of the rank $r$.

In [17]:
def LoRAModel(r = 1, epochs = 2):
    load_model_Lora = ForecastingModel(seq_len=95,
                                        pred_len=5,
                                        embed_size=d_model,
                                        nhead=nhead,
                                        dim_feedforward=dim_feedforward,
                                        dropout=dropout,
                                        conv1d_emb=True,
                                        conv1d_kernel_size=3,
                                        device=device).to(device)
    
    load_model_Lora = load_model(load_model_Lora, "only_encoder_model_crypto")
    print("Total parameters before fine-tuning:\t{:,.0f}".format(count_trainable_parameters(load_model_Lora)))
    
    # Define the LoRA configuration
    lora_config = LoraConfig(
        task_type=TaskType.FEATURE_EXTRACTION,  # Generic task type, used here for regression
        r=r,  # Rank of the LoRA update matrices
        lora_alpha=32,  # Scaling factor
        lora_dropout=0.1,  # Dropout for LoRA layers
        target_modules=[
            "self_attn.in_proj_weight", 
            "self_attn.out_proj",
            "linear1", 
            "linear2"
        ]  # Ensure these modules are appropriate for your model
    )
    
    lora_model = get_peft_model(load_model_Lora, lora_config)
    print("Total parameters after fine-tuning:\t{:,.0f}\n\n".format(count_trainable_parameters(lora_model)))
    
    # Define the loss function and optimizer
    criterion_lora1 = torch.nn.MSELoss()
    optimizer_lora1 = torch.optim.Adam(filter(lambda p: p.requires_grad, lora_model.parameters()), lr=0.0001)
    
    # Fine-tune the model for `epochs` epochs
    train_only_encoder(model=lora_model, 
                        dataloader=train_loader_crypto, 
                        optimizer=optimizer_lora1, 
                        criterion=criterion_lora1,
                        num_epochs=epochs)
    
    return lora_model 
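
For readers unfamiliar with what the rank $r$ controls, the following NumPy sketch shows the low-rank update that LoRA adds to a single frozen linear layer. It is an illustration of the general technique under simplifying assumptions (one layer, no bias, no LoRA dropout), not the code PEFT runs internally. The layer sizes are borrowed from `d_model` and `dim_feedforward` above, and all variable names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(42)

# Sizes borrowed from the notebook's hyperparameters; r and lora_alpha as in LoraConfig
d_in, d_out, r, lora_alpha = 64, 2048, 2, 32

W = rng.normal(size=(d_out, d_in))            # frozen pretrained weight
A = rng.normal(scale=0.01, size=(r, d_in))    # trainable, small random init
B = np.zeros((d_out, r))                      # trainable, zero init

def lora_forward(x, W, A, B, alpha, r):
    """y = x W^T + (alpha / r) * x A^T B^T; only A and B would be updated."""
    return x @ W.T + (alpha / r) * (x @ A.T) @ B.T

x = rng.normal(size=(4, d_in))
y0 = lora_forward(x, W, A, B, lora_alpha, r)

# With B = 0 the adapter is a no-op, so fine-tuning starts from the pretrained model
assert np.allclose(y0, x @ W.T)

trainable = A.size + B.size   # r * (d_in + d_out)
frozen = W.size               # d_in * d_out
print(trainable, frozen)
```

The adapter for this one layer holds `r * (d_in + d_out)` parameters, compared with `d_in * d_out` in the frozen weight, which is why small ranks leave so few parameters to train.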

### <font color="8C3061" >**Full Fine-Tuning (All Parameters)**</font> 

Load the saved model only_encoder_model_crypto:

In [18]:
load_model_to_complete_ft = ForecastingModel(seq_len=95,
                                    pred_len=5,
                                    embed_size=d_model,
                                    nhead=nhead,
                                    dim_feedforward=dim_feedforward,
                                    dropout=dropout,
                                    conv1d_emb=True,
                                    conv1d_kernel_size=3,
                                    device=device).to(device)

load_model_to_complete_ft = load_model(load_model_to_complete_ft, "only_encoder_model_crypto")

criterion_complete_ft = torch.nn.MSELoss()
optimizer_complete_ft = torch.optim.Adam(load_model_to_complete_ft.parameters(), lr=0.0001)

# Fine-tune for 2 epochs on the train_loader_exchange
train_only_encoder(model=load_model_to_complete_ft, 
                       dataloader=train_loader_exchange, 
                       optimizer=optimizer_complete_ft, 
                       criterion=criterion_complete_ft, 
                       num_epochs=2)

Epoch 1/2, Loss: 0.011290919409865248
Epoch 2/2, Loss: 0.007608411135151982


In [19]:
mse, mae, r2 = test_evaluation(load_model_to_complete_ft, test_loader_exchange)
num_par = count_trainable_parameters(load_model_to_complete_ft)
score.loc[2] = ["OE_full_fine_tuning", mse, mae, r2,num_par]
print("Only Encoder Crypto score after fine-tuning:\nMSE: {}\nMAE: {}\nR2: {}".format(mse, mae, r2))
print("Trainable Parameters: ", num_par)

Only Encoder Crypto score after fine-tuning:
MSE: 0.010599355213344097
MAE: 0.08186305314302444
R2: 0.8700926899909973
Trainable Parameters:  15923909


In [20]:
save_model(load_model_to_complete_ft, "only_encoder_model_crypto_complete_ft")

The model only_encoder_model_crypto_complete_ft already exists


### <font color="8C3061" >**Fine-Tuning (LoRA $r=1$)**</font> 

In [21]:
lora_model_1 = LoRAModel(r = 1, epochs = 2)

mse, mae, r2 = test_evaluation(lora_model_1, test_loader_crypto)
num_par = count_trainable_parameters(lora_model_1)
score.loc[3] = ["LoRA r=1", mse, mae, r2, num_par]

Total parameters before fine-tuning:	15,923,909
Total parameters after fine-tuning:	24,256


Epoch 1/2, Loss: 0.006881394132506102
Epoch 2/2, Loss: 0.0066655989503487945


### <font color="8C3061" >**Fine-Tuning (LoRA $r=2$)**</font> 

In [22]:
lora_model_2 = LoRAModel(r = 2, epochs = 2)

mse, mae, r2 = test_evaluation(lora_model_2, test_loader_crypto)
num_par = count_trainable_parameters(lora_model_2)
score.loc[4] = ["LoRA r=2", mse, mae, r2, num_par]

Total parameters before fine-tuning:	15,923,909
Total parameters after fine-tuning:	48,512


Epoch 1/2, Loss: 0.00670599882557456
Epoch 2/2, Loss: 0.006552244145755789


### <font color="8C3061" >**Fine-Tuning (LoRA $r=5$)**</font> 

In [23]:
lora_model_3 = LoRAModel(r = 5, epochs = 2)

mse, mae, r2 = test_evaluation(lora_model_3, test_loader_crypto)
num_par = count_trainable_parameters(lora_model_3)
score.loc[5] = ["LoRA r=5", mse, mae, r2, num_par]

Total parameters before fine-tuning:	15,923,909
Total parameters after fine-tuning:	121,280


Epoch 1/2, Loss: 0.006792780569022787
Epoch 2/2, Loss: 0.006537294913349407
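
The trainable-parameter counts printed for the three LoRA runs can be checked with a few lines of arithmetic. The snippet only reuses the numbers reported in the outputs above; it confirms that the count grows linearly with $r$ and shows how small it is relative to the full model.

```python
# Trainable-parameter counts as reported in the notebook outputs
total_params = 15_923_909
lora_trainable = {1: 24_256, 2: 48_512, 5: 121_280}

per_rank = lora_trainable[1]
fractions = {}
for r, n in lora_trainable.items():
    # Each adapted layer contributes r * (d_in + d_out) parameters, so the total is linear in r
    assert n == r * per_rank
    fractions[r] = 100 * n / total_params
    print(f"r={r}: {n:,} trainable parameters ({fractions[r]:.3f}% of the full model)")
```

Even at $r=5$ the adapters amount to well under 1% of the 15,923,909 parameters updated in full fine-tuning.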


# <font color="8C3061" >**Inference on the Mexican Peso**</font> 

In [35]:
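# MXN is an exchange-rate asset, so use the same scaler applied to the inputs below
scaler = scalers_ed['MXN']
def dnorm(y_close, scaler=scaler):
    # The scaler was fitted on 5 columns (Open, High, Low, Close, Volume);
    # fill only the Close column (index 3) and invert the scaling
    place_holder = np.zeros((5, 5))
    place_holder[:, 3] = np.ravel(y_close)
    return scaler.inverse_transform(place_holder)[:, 3]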

def inference(model,data_loader,currency = 'MXN'):
    
    predicted_close_values = []
    true_close_values = []
    
    model.eval()
    with torch.no_grad():
        for x_batch, y_batch in data_loader:
            x_batch = x_batch.to(device).float()
            y_batch = y_batch.to(device).float()
            
            output = model(x= x_batch)
            
            predicted_close_values.extend(output.cpu().numpy())
            true_close_values.extend(y_batch.cpu().numpy())
            
    true_close_values = np.array(true_close_values)
    predicted_close_values = np.array(predicted_close_values)

    return dnorm(true_close_values),dnorm(predicted_close_values)
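
The placeholder trick in `dnorm` works because min-max scaling is applied to each column independently. The self-contained sketch below reproduces the idea with a scikit-learn `MinMaxScaler` fitted on made-up OHLCV rows; the helper `denormalize_close` and the data are assumptions for illustration, and the scalers returned by `generate_data` may differ.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Hypothetical OHLCV rows (placeholder values): Open, High, Low, Close, Volume
ohlcv = np.array([
    [19.0, 19.5, 18.8, 19.2, 1000.0],
    [19.2, 19.9, 19.0, 19.7, 1500.0],
    [19.7, 20.4, 19.5, 20.2, 1200.0],
])
scaler = MinMaxScaler().fit(ohlcv)
CLOSE = 3  # column index of the Close price

def denormalize_close(y_scaled, scaler, n_features=5, col=CLOSE):
    """Invert the scaling of one column using a scaler fitted on n_features columns."""
    y_scaled = np.ravel(y_scaled)
    placeholder = np.zeros((len(y_scaled), n_features))
    placeholder[:, col] = y_scaled
    # Columns are rescaled independently, so the zeros in the other columns are harmless
    return scaler.inverse_transform(placeholder)[:, col]

scaled_close = scaler.transform(ohlcv)[:, CLOSE]
restored = denormalize_close(scaled_close, scaler)
```

Sizing the placeholder from the input length, instead of hard-coding five rows, lets the same helper handle any number of forecast steps.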

In [43]:
dataMXN = exchange_data[exchange_data['Asset'] == 'MXN'].iloc[-100:,:].reset_index(drop=True)

data = scalers_ed['MXN'].transform(dataMXN[['Open','High','Low','Close','Volume']])
x, y = data[:95,:],data[-5:,3]

inf_loader = gd.DataLoader_data(data=[(x,y)],batch_size=1)

inf_only_enc =  inference(only_encoder_model_exchange, inf_loader, currency='MXN')
inf_full_ft =   inference(load_model_to_complete_ft, inf_loader, currency='MXN')
inf_lora_1 =    inference(lora_model_1, inf_loader, currency='MXN')
inf_lora_2 =    inference(lora_model_2, inf_loader, currency='MXN')
inf_lora_3 =    inference(lora_model_3, inf_loader, currency='MXN')

In [47]:
fig = go.Figure()

# Add traces
fig.add_trace(go.Scatter(x = dataMXN['Datetime'].iloc[-5:], y = dataMXN['Close'].iloc[-5:], mode='lines+markers', name='True Close',line=dict(color='blue')))
fig.add_trace(go.Scatter(x = dataMXN['Datetime'].iloc[-5:], y = inf_only_enc[1], mode='lines+markers', name='Only Encoder Exchange',line=dict(color='red')))
fig.add_trace(go.Scatter(x = dataMXN['Datetime'].iloc[-5:], y = inf_full_ft[1], mode='lines+markers', name='Only Encoder Exchange Full Fine-Tuning',line=dict(color='green')))
fig.add_trace(go.Scatter(x = dataMXN['Datetime'].iloc[-5:], y = inf_lora_1[1], mode='lines+markers', name='LoRA r=1',line=dict(color='purple')))
fig.add_trace(go.Scatter(x = dataMXN['Datetime'].iloc[-5:], y = inf_lora_2[1], mode='lines+markers', name='LoRA r=2',line=dict(color='orange')))
fig.add_trace(go.Scatter(x = dataMXN['Datetime'].iloc[-5:], y = inf_lora_3[1], mode='lines+markers', name='LoRA r=5',line=dict(color='black')))

fig.update_layout(title='Comparison of the different models for the MXN currency',
                     xaxis_title='Datetime',
                     yaxis_title='Close',
                     xaxis_rangeslider_visible=False)

fig.show()


# <font color="8C3061" >**Conclusions**</font> 

> - Among the models implemented, the best performer was the one fine-tuned with LoRA (r=1), with an MSE of 0.009067 and an R2 of 0.88884.
> - Only 24,256 of the 15,923,909 parameters (about 0.15%) were retrained in the LoRA (r=1) model.
> - The worst-performing model was the one fine-tuned over all the parameters (full fine-tuning), with an MSE of 0.010599.
> - The inferences for the MXN currency are reasonably good.
> - The only_encoder_transformer model trained from scratch on the currency-exchange data (MSE 0.009579) outperformed only the full fine-tuning of the cryptocurrency model.

In [62]:
display(score.iloc[[0,2,3,4,5]].sort_values(by='MSE').reset_index(drop=True))

| | Model | MSE | MAE | R2 | Trainable Parameters |
|---|---|---|---|---|---|
| 0 | LoRA r=1 | 0.009067 | 0.07125 | 0.88884 | 24256 |
| 1 | LoRA r=2 | 0.009139 | 0.071598 | 0.887974 | 48512 |
| 2 | LoRA r=5 | 0.009167 | 0.071385 | 0.887663 | 121280 |
| 3 | Only Encoder Exchange | 0.009579 | 0.074044 | 0.88264 | 15923909 |
| 4 | OE_full_fine_tuning | 0.010599 | 0.081863 | 0.870093 | 15923909 |
