### Transformer for prediction 
The goal is to predict whether the close of BAP X minutes ahead (X = 6) will be higher than the current close, and to get a first-hand feel for how self-attention works.

**Key points about Transformers:**

- **B, L, D**: batch size, sequence length, and feature dimension
- The most important ones:
  - **L** is the sequence length
  - **D** is the embedding size, or the number of features in the use case

---

- **L** in this example is the length of the sequence of intraday intervals;  
  windows are sampled at random positions within each day.
- **D** is the number of features: closing price, volume, etc.

---

**The dataset has shape `29x49x5`:**

- 29 is the number of trading days
- 49 is the number of intraday intervals (bars),  
  i.e. the maximum number of intervals in a day.  
  For training, a fixed sequence length (for example 32) is used,  
  with windows drawn at random from the 49 bars
  (in this case each bar covers 2 minutes)
- 5 is the number of features (Open, High, Low, Close, Volume)
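
To make the `B x L x D` bookkeeping concrete, here is a minimal NumPy sketch of drawing one random window from an array shaped like the dataset. The array `X_demo` holds random stand-in values, and the helper `sample_window` and the `horizon` argument are hypothetical names used only for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for the real data: 29 days x 49 bars x 5 features (random values).
n_days, steps_per_day, n_features = 29, 49, 5
X_demo = rng.normal(size=(n_days, steps_per_day, n_features))

seq_len = 32   # L: window length fed to the model
horizon = 3    # bars kept free at the end of the day for the label look-ahead

def sample_window(X, day, seq_len, horizon, rng):
    """Return a (1, seq_len, D) window starting at a random bar of one day."""
    max_start = X.shape[1] - seq_len - horizon   # exclusive upper bound
    k = int(rng.integers(0, max_start))
    return X[day:day + 1, k:k + seq_len, :], k

window, start = sample_window(X_demo, day=0, seq_len=seq_len, horizon=horizon, rng=rng)
print(window.shape)  # (1, 32, 5) -> B=1, L=32, D=5
```

With 49 bars per day, a window of 32 and a 3-bar look-ahead, the start index can only take values 0 to 13.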

In [27]:
import math

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import torch
import torch.nn as nn
import torch.nn.functional as F
import yfinance as yf
from torch.utils.data import DataLoader, Dataset

In [28]:
class SelfAttention(nn.Module):

    def __init__(self, d: int):
        super().__init__()
        self.q = nn.Linear(d, d, bias=False)
        self.k = nn.Linear(d, d, bias=False)
        self.v = nn.Linear(d, d, bias=False)
        self.scale = math.sqrt(d)

    def forward(self, x):
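        Q = self.q(x)
        K = self.k(x)  # keys use their own projection
        V = self.v(x)  # values use their own projection
        score = Q @ K.transpose(-2, -1) / self.scale  # (B, L, L)
        attn_w = F.softmax(score, dim=-1)  # each row sums to 1
        context = attn_w @ V  # (B, L, D)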
        return context, attn_w

class FeedForward(nn.Module):

    def __init__(self, d: int, f: int):
        super().__init__()
        self.l1 = nn.Linear(d, f)
        self.l2 = nn.Linear(f, d)

    def forward(self, x):
        return self.l2(F.gelu(self.l1(x)))
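
The scaled dot-product attention used in `SelfAttention` can be checked by hand with a small NumPy sketch. This is not the notebook's PyTorch module: the weight matrices `Wq`, `Wk`, `Wv` are random stand-ins and `D = 8` is an arbitrary choice, used only to show the shapes and the fact that each row of attention weights sums to 1.

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)   # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(x, Wq, Wk, Wv):
    """x: (B, L, D); Wq, Wk, Wv: (D, D). Returns context (B, L, D) and weights (B, L, L)."""
    Q, K, V = x @ Wq, x @ Wk, x @ Wv
    scores = Q @ K.transpose(0, 2, 1) / np.sqrt(x.shape[-1])
    weights = softmax(scores, axis=-1)
    return weights @ V, weights

rng = np.random.default_rng(0)
B, L, D = 1, 32, 8
x = rng.normal(size=(B, L, D))
Wq, Wk, Wv = (rng.normal(size=(D, D)) for _ in range(3))
context, weights = self_attention(x, Wq, Wk, Wv)
print(context.shape, weights.shape)            # (1, 32, 8) (1, 32, 32)
print(np.allclose(weights.sum(axis=-1), 1.0))  # True
```

No causal mask is applied here or in `SelfAttention`, so every bar can attend to later bars in the same window.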

In [29]:
class EncoderLayer(nn.Module):

    def __init__(self, d: int, d_f: int):
        super().__init__()
        self.attn = SelfAttention(d)
        self.norm1 = nn.LayerNorm(d)
        self.ff = FeedForward(d, d_f)
        self.norm2 = nn.LayerNorm(d)

    def forward(self, x, return_attention=False):
        context, attn = self.attn(x)
        x = self.norm1(x + context)      # residual connection, then LayerNorm
        x = self.norm2(x + self.ff(x))   # same pattern around the feed-forward block
        return (x, attn) if return_attention else x

In [30]:
class FinanceTransformer(nn.Module):

    def __init__(self, in_dim=5, d=32, d_f=64):
        super().__init__()
        self.proj = nn.Linear(in_dim, d)
        self.enc = EncoderLayer(d, d_f)
        self.cls = nn.Linear(d, 1)

    def forward(self, x, return_attention=False):
        x = self.proj(x)
        x, attn = self.enc(x, return_attention=True)
        logits = self.cls(x).squeeze(-1)

        return (logits, attn) if return_attention else logits

In [31]:
TICKER   = "BAP"
PERIOD   = "60d"
INTERVAL = "2m"
df = yf.download(TICKER, period=PERIOD, interval=INTERVAL, progress=False)
df = df.between_time("09:30", "16:00")  # Regular trading hours


In [32]:
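# Label = 1 if the close 3 bars ahead (3 x 2 min = 6 min) is above the current close.
df["label"] = (df["Close"].shift(-3) > df["Close"]).astype(int)
# The last 3 rows have no future close; a comparison against NaN is False, so drop them explicitly.
df = df.iloc[:-3].dropna()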
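
The labelling rule can be sketched in plain Python on a toy price list. The helper `make_labels` and the prices are hypothetical and serve only to illustrate the rule: bar `t` gets label 1 when the close `horizon` bars later is higher.

```python
import math

def make_labels(closes, horizon=3):
    """Label bar t with 1 if closes[t + horizon] > closes[t], else 0.

    Only the first len(closes) - horizon bars get a label; the rest have no
    future close to compare against.
    """
    return [int(closes[t + horizon] > closes[t]) for t in range(len(closes) - horizon)]

closes = [10.0, 10.2, 10.1, 10.3, 10.0, 10.4, 10.5]
labels = make_labels(closes)
print(labels)  # [1, 0, 1, 1]

# A comparison against NaN is False, so a missing future close would silently become label 0.
print(math.nan > 10.0)  # False
```

The second print shows why the trailing bars need care: a shifted series ends in NaN, and the comparison turns those NaN values into zeros rather than leaving gaps.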

In [33]:
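# Average number of bars per trading day, and the number of full days that fit.
# Note: the reshape in the next cell only aligns rows with days if every day has the same number of bars.
steps_per_day = int(len(df) / len(df.index.normalize().unique()))
n_days = len(df) // steps_per_day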

In [34]:
# Features: OHLCV
cols = ["Open", "High", "Low", "Close", "Volume"]
X = df[cols].values[:n_days * steps_per_day].reshape(n_days, steps_per_day, len(cols))
y = df["label"].values[:n_days * steps_per_day].reshape(n_days, steps_per_day)

In [35]:
from sklearn.preprocessing import StandardScaler

scaler = StandardScaler()
X = scaler.fit_transform(X.reshape(-1, len(cols))).reshape(X.shape)
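
The flatten, scale, reshape pattern above can be reproduced with plain NumPy to see what it does to each feature. This is a sketch on random stand-in data (`X_demo` is hypothetical), assuming the scaler's default behaviour of subtracting the per-feature mean and dividing by the per-feature population standard deviation.

```python
import numpy as np

rng = np.random.default_rng(0)
X_demo = rng.normal(loc=50.0, scale=5.0, size=(29, 49, 5))

flat = X_demo.reshape(-1, X_demo.shape[-1])      # (29*49, 5): one row per bar
mean, std = flat.mean(axis=0), flat.std(axis=0)  # per-feature statistics
X_scaled = ((flat - mean) / std).reshape(X_demo.shape)

print(X_scaled.shape)  # (29, 49, 5)
print(np.allclose(X_scaled.reshape(-1, 5).mean(axis=0), 0.0))  # True
print(np.allclose(X_scaled.reshape(-1, 5).std(axis=0), 1.0))   # True
```

The statistics are computed over all days at once, so each feature ends up with zero mean and unit variance across the whole dataset rather than per day.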

In [36]:
device = "cuda" if torch.cuda.is_available() else "cpu"
model = FinanceTransformer().to(device)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.BCEWithLogitsLoss()
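
A useful reference point for reading the training loss is the value a model gets when it always outputs a logit of 0, i.e. a probability of 0.5. The sketch below computes binary cross-entropy from a raw logit in plain Python; `bce_with_logits` is a hypothetical helper written for illustration, not the PyTorch implementation.

```python
import math

def bce_with_logits(logit, target):
    """Binary cross-entropy for one example, computed from the raw logit."""
    p = 1.0 / (1.0 + math.exp(-logit))   # sigmoid
    return -(target * math.log(p) + (1 - target) * math.log(1 - p))

chance = bce_with_logits(0.0, 1.0)
print(round(chance, 4))  # 0.6931
```

That value is ln 2, so a per-epoch loss close to 0.69 means the model is doing no better than a coin flip on the sampled windows.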

In [37]:
# Use a sequence length of 32;
# the maximum possible length is the number of steps per day.
seq_len = 32

for epoch in range(64):
    epoch_loss = 0
    for i in range(n_days):
        # Pick random window of length seq_len
        k = np.random.randint(0, steps_per_day - seq_len - 3)
        xb = torch.tensor(X[i:i+1, k:k+seq_len, :], dtype=torch.float32).to(device)
        yb = torch.tensor(y[i:i+1, k:k+seq_len], dtype=torch.float32).to(device)

        logits, _ = model(xb, return_attention=True)
        loss = loss_fn(logits, yb)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        epoch_loss += loss.item()
    print(f"Epoch {epoch+1} loss: {epoch_loss / n_days:.4f}")

Epoch 1 loss: 0.6976
Epoch 2 loss: 0.6867
Epoch 3 loss: 0.6796
Epoch 4 loss: 0.6789
Epoch 5 loss: 0.6632
Epoch 6 loss: 0.6679
Epoch 7 loss: 0.6691
Epoch 8 loss: 0.6744
Epoch 9 loss: 0.6677
Epoch 10 loss: 0.6700
Epoch 11 loss: 0.6639
Epoch 12 loss: 0.6738
Epoch 13 loss: 0.6535
Epoch 14 loss: 0.6612
Epoch 15 loss: 0.6532
Epoch 16 loss: 0.6726
Epoch 17 loss: 0.6541
Epoch 18 loss: 0.6397
Epoch 19 loss: 0.6500
Epoch 20 loss: 0.6564
Epoch 21 loss: 0.6541
Epoch 22 loss: 0.6522
Epoch 23 loss: 0.6550
Epoch 24 loss: 0.6542
Epoch 25 loss: 0.6479
Epoch 26 loss: 0.6428
Epoch 27 loss: 0.6435
Epoch 28 loss: 0.6535
Epoch 29 loss: 0.6374
Epoch 30 loss: 0.6363
Epoch 31 loss: 0.6512
Epoch 32 loss: 0.6370
Epoch 33 loss: 0.6443
Epoch 34 loss: 0.6393
Epoch 35 loss: 0.6287
Epoch 36 loss: 0.6578
Epoch 37 loss: 0.6389
Epoch 38 loss: 0.6376
Epoch 39 loss: 0.6434
Epoch 40 loss: 0.6366
Epoch 41 loss: 0.6352
Epoch 42 loss: 0.6409
Epoch 43 loss: 0.6239
Epoch 44 loss: 0.6367
Epoch 45 loss: 0.6357
Epoch 46 loss: 0.63