## Module Imports

This cell imports the PyTorch modules (`torch`, `torch.nn`, `torch.nn.functional`) and the standard-library modules (`math`, `json`) used to build and train the HormoneLLM model.

In [20]:
import torch
import torch.nn as nn
import torch.nn.functional as F
import math
import json

## Positional Encoding

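This class implements fixed sinusoidal positional encoding, which adds information about each token's position in the sequence to its embedding. This is crucial for transformer models because self-attention on its own is insensitive to token order. The table is precomputed for `max_len` positions and registered as a buffer, so it moves with the model across devices but is not trained. The even/odd indexing assumes an even `d_model`.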

In [21]:
class PositionalEncoding(nn.Module):
    def __init__(self, d_model, max_len=4096):
        super().__init__()

        pe = torch.zeros(max_len, d_model)
        position = torch.arange(0, max_len).unsqueeze(1)
        div_term = torch.exp(
            torch.arange(0, d_model, 2) * (-math.log(10000.0) / d_model)
        )

        pe[:, 0::2] = torch.sin(position * div_term)
        pe[:, 1::2] = torch.cos(position * div_term)

        self.register_buffer("pe", pe.unsqueeze(0))

    def forward(self, x):
        return x + self.pe[:, : x.size(1)]
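
To make the formula concrete, the sketch below recomputes a single row of the table in plain Python. The helper name `sinusoidal_row` is hypothetical and exists only for illustration; it mirrors the `div_term` computation above, with sines at even indices and cosines at odd indices.

```python
import math

def sinusoidal_row(pos, d_model):
    """Return the positional-encoding vector for one position (illustrative helper)."""
    row = [0.0] * d_model
    for i in range(0, d_model, 2):
        # Same frequency as div_term in the class: 10000 ** (-i / d_model)
        freq = math.exp(i * (-math.log(10000.0) / d_model))
        row[i] = math.sin(pos * freq)
        if i + 1 < d_model:
            row[i + 1] = math.cos(pos * freq)
    return row

# Position 0: every sine is 0 and every cosine is 1.
print(sinusoidal_row(0, 4))  # [0.0, 1.0, 0.0, 1.0]
```

Low indices oscillate quickly with position while high indices change slowly, which is what lets the model tell nearby and distant positions apart.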

## Feed-Forward Network

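This class defines a position-wise feed-forward network: a linear layer that expands from `d_model` to `d_ff`, a GELU activation, dropout, and a linear layer that projects back to `d_model`. It supplies the non-linear transformation inside each Transformer block.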

In [22]:
class FeedForward(nn.Module):
    def __init__(self, d_model, d_ff, dropout=0.1):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(d_model, d_ff),
            nn.GELU(),
            nn.Dropout(dropout),
            nn.Linear(d_ff, d_model)
        )

    def forward(self, x):
        return self.net(x)

## Encoder Block

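This class represents a single encoder block of the Transformer architecture. It consists of a multi-head self-attention layer followed by a feed-forward network. Each sub-layer is wrapped in a residual connection followed by layer normalization (post-norm). The optional `mask` is passed to attention as `key_padding_mask`.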

In [23]:
class EncoderBlock(nn.Module):
    def __init__(self, d_model, n_heads, d_ff, dropout=0.1):
        super().__init__()

        self.attn = nn.MultiheadAttention(
            d_model, n_heads, dropout=dropout, batch_first=True
        )
        self.ffn = FeedForward(d_model, d_ff, dropout)

        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)

    def forward(self, x, mask=None):
        attn_out, _ = self.attn(x, x, x, key_padding_mask=mask)
        x = self.norm1(x + attn_out)

        ffn_out = self.ffn(x)
        x = self.norm2(x + ffn_out)

        return x
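
The `mask` argument is forwarded as `key_padding_mask`, where `True` marks key positions that attention should ignore. The sketch below illustrates that convention on a standalone `nn.MultiheadAttention` with made-up sizes rather than on the full block.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Illustrative sizes only: batch of 1, sequence of 4 tokens, 8-dim embeddings.
attn = nn.MultiheadAttention(embed_dim=8, num_heads=2, batch_first=True)
x = torch.randn(1, 4, 8)

# True marks a padded key position; here the last token is treated as padding.
padding_mask = torch.tensor([[False, False, False, True]])

with torch.no_grad():
    _, weights = attn(x, x, x, key_padding_mask=padding_mask)

# weights has shape [batch, query_len, key_len], averaged over heads.
print(weights.shape)     # torch.Size([1, 4, 4])
print(weights[0, :, 3])  # all zeros: no query attends to the padded key
```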

## Decoder Block

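This class represents a single decoder block. It includes a multi-head self-attention layer, a multi-head cross-attention layer (attending to the encoder output), and a feed-forward network, each wrapped in a residual connection followed by layer normalization. `tgt_mask` is passed to self-attention as `attn_mask`, while `memory_mask` is passed to cross-attention as `key_padding_mask`, so it marks padded encoder positions.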

In [24]:
class DecoderBlock(nn.Module):
    def __init__(self, d_model, n_heads, d_ff, dropout=0.1):
        super().__init__()

        self.self_attn = nn.MultiheadAttention(
            d_model, n_heads, dropout=dropout, batch_first=True
        )
        self.cross_attn = nn.MultiheadAttention(
            d_model, n_heads, dropout=dropout, batch_first=True
        )

        self.ffn = FeedForward(d_model, d_ff, dropout)

        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)
        self.norm3 = nn.LayerNorm(d_model)

    def forward(self, x, memory, tgt_mask=None, memory_mask=None):
        self_attn_out, _ = self.self_attn(x, x, x, attn_mask=tgt_mask)
        x = self.norm1(x + self_attn_out)

        cross_attn_out, _ = self.cross_attn(
            x, memory, memory, key_padding_mask=memory_mask
        )
        x = self.norm2(x + cross_attn_out)

        ffn_out = self.ffn(x)
        x = self.norm3(x + ffn_out)

        return x
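
For autoregressive decoding, `tgt_mask` is typically a causal mask that stops each position from attending to later ones. A minimal sketch of one way to build it follows; the helper name `causal_mask` is hypothetical and not part of the notebook.

```python
import torch

def causal_mask(size):
    """Boolean mask for attn_mask: True blocks attention to later positions."""
    return torch.triu(torch.ones(size, size, dtype=torch.bool), diagonal=1)

print(causal_mask(3))
# tensor([[False,  True,  True],
#         [False, False,  True],
#         [False, False, False]])
```

Row `i` of the mask describes what query position `i` may see: itself and everything before it.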

## Hormone Attention Head

This custom attention head is designed to extract specific 'hormone' signals from the encoder's states. It uses a single-head attention layer with a learnable query (divided by `temperature`) to attend to the encoder outputs, then passes the pooled vector through an MLP ending in a sigmoid, which yields one scalar hormone value in (0, 1) per sequence.

In [25]:
class HormoneAttentionHead(nn.Module):
    def __init__(self, d_model, temperature=0.5):
        super().__init__()

        self.temperature = temperature

        self.attn = nn.MultiheadAttention(
            d_model, 1, batch_first=True
        )

        self.query = nn.Parameter(torch.randn(1, 1, d_model))

        self.mlp = nn.Sequential(
            nn.Linear(d_model, d_model),
            nn.GELU(),
            nn.Linear(d_model, d_model // 2),
            nn.GELU(),
            nn.Linear(d_model // 2, 1),
            nn.Sigmoid()
        )

    def forward(self, encoder_states):
        B = encoder_states.size(0)

        q = self.query.expand(B, -1, -1) / self.temperature
        attn_out, _ = self.attn(q, encoder_states, encoder_states)

        return self.mlp(attn_out.squeeze(1))
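
The learnable query acts as a pooling mechanism: it collapses a variable-length sequence into a single vector. The sketch below traces the tensor shapes through that step, using illustrative sizes and leaving out the temperature scaling and the MLP.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

d_model = 16                                  # illustrative size
attn = nn.MultiheadAttention(d_model, 1, batch_first=True)
query = nn.Parameter(torch.randn(1, 1, d_model))

encoder_states = torch.randn(2, 5, d_model)   # [batch=2, tokens=5, d_model]
q = query.expand(encoder_states.size(0), -1, -1)

pooled, weights = attn(q, encoder_states, encoder_states)

print(pooled.shape)   # torch.Size([2, 1, 16]) -> one summary vector per sequence
print(weights.shape)  # torch.Size([2, 1, 5])  -> one weight per encoder token
```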

## Hormone Emotion Block

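This block modulates the encoder states based on the predicted hormone vector. It projects the hormone values into the model's dimension, squashes them to (-1, 1) with `Tanh`, and scales the encoder states element-wise by `1 + alpha * e`, introducing an 'emotional' influence. The learnable `alpha` starts at 0.3 and is clamped to [0.1, 0.5] in the forward pass.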

In [26]:
class HormoneEmotionBlock(nn.Module):
    def __init__(self, d_model, num_hormones):
        super().__init__()

        self.project = nn.Sequential(
            nn.Linear(num_hormones, d_model),
            nn.GELU(),
            nn.LayerNorm(d_model),
            nn.Tanh()
        )

        self.alpha = nn.Parameter(torch.tensor(0.3))

    def forward(self, encoder_states, hormone_vector):
        e = self.project(hormone_vector)  # [B, D]

        alpha = torch.clamp(self.alpha, 0.1, 0.5)
        modulation = 1 + alpha * e.unsqueeze(1)

        return encoder_states * modulation

## HormoneLLM Model Definition

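This is the main model class, integrating all the previously defined components. It is a Transformer-based encoder-decoder model with an added 'hormone' mechanism: one `HormoneAttentionHead` per hormone name predicts a scalar from the encoder states, and the resulting vector modulates the encoder output before it is passed to the decoder. The encoder and decoder share one embedding table. The hormone queries are orthogonalized at initialization with a QR decomposition, which requires the number of hormones not to exceed `d_model`. `forward` returns the logits, the hormone vector, and a name-to-value map.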

In [27]:
class HormoneLLM(nn.Module):
    def __init__(
        self,
        vocab_size,
        hormone_names,
        d_model=512,
        n_heads=8,
        d_ff=2048,
        num_layers=6,
        max_len=4096
    ):
        super().__init__()

        self.hormone_names = hormone_names
        self.num_hormones = len(hormone_names)

        self.embedding = nn.Embedding(vocab_size, d_model)
        self.positional = PositionalEncoding(d_model, max_len)

        # Encoder
        self.encoder_layers = nn.ModuleList([
            EncoderBlock(d_model, n_heads, d_ff)
            for _ in range(num_layers)
        ])

        # Hormone Heads (dynamic)
        self.hormone_heads = nn.ModuleDict({
            name: HormoneAttentionHead(d_model)
            for name in hormone_names
        })

        self.hormone_block = HormoneEmotionBlock(
            d_model, self.num_hormones
        )

        # Decoder
        self.decoder_layers = nn.ModuleList([
            DecoderBlock(d_model, n_heads, d_ff)
            for _ in range(num_layers)
        ])

        self.output_head = nn.Linear(d_model, vocab_size)

        self._init_orthogonal_hormones()

    def _init_orthogonal_hormones(self):
        with torch.no_grad():
            Q = torch.stack([
                head.query.squeeze()
                for head in self.hormone_heads.values()
            ])  # [H, D]

            Q, _ = torch.linalg.qr(Q.T)
            Q = Q.T

            for i, head in enumerate(self.hormone_heads.values()):
                head.query.copy_(Q[i].unsqueeze(0).unsqueeze(0))

    def encode(self, input_ids, src_mask=None):
        x = self.embedding(input_ids)
        x = self.positional(x)

        for layer in self.encoder_layers:
            x = layer(x, src_mask)

        return x

    def decode(self, tgt_ids, memory, tgt_mask=None, memory_mask=None):
        x = self.embedding(tgt_ids)
        x = self.positional(x)

        for layer in self.decoder_layers:
            x = layer(x, memory, tgt_mask, memory_mask)

        return self.output_head(x)

    def forward(self, input_ids, decoder_ids):
        encoder_states = self.encode(input_ids)

        hormone_values = []
        hormone_dict = {}

        for name, head in self.hormone_heads.items():
            value = head(encoder_states)
            hormone_values.append(value)
            hormone_dict[name] = value

        hormone_vector = torch.cat(hormone_values, dim=1)

        encoder_states = self.hormone_block(
            encoder_states, hormone_vector
        )

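        # Causal mask (True = blocked) so each target position attends only
        # to itself and earlier positions during teacher forcing.
        T = decoder_ids.size(1)
        causal_mask = torch.triu(
            torch.ones(T, T, dtype=torch.bool, device=decoder_ids.device),
            diagonal=1
        )

        logits = self.decode(decoder_ids, encoder_states, tgt_mask=causal_mask)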

        return {
            "logits": logits,
            "hormones": hormone_vector,
            "hormone_map": hormone_dict
        }
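
The orthogonal initialization relies on a reduced QR decomposition of the transposed query matrix. The standalone sketch below uses random stand-in queries (six of width 128, matching the sizes of the training cell) and checks that the resulting rows are orthonormal.

```python
import torch

torch.manual_seed(0)

H, D = 6, 128                      # illustrative: six queries of width 128
queries = torch.randn(H, D)

# QR of the transposed matrix yields D x H orthonormal columns;
# transposing back gives H orthonormal rows.
Q, _ = torch.linalg.qr(queries.T)
ortho = Q.T

gram = ortho @ ortho.T             # should be close to the identity
print(torch.allclose(gram, torch.eye(H), atol=1e-5))  # True
```

Starting from orthogonal queries means the heads initially attend along unrelated directions; nothing keeps them orthogonal once training begins.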

## Model and Hormone Initialization

This cell defines the list of 'hormone' names to be used by the model and then initializes an instance of the `HormoneLLM` with a vocabulary size of 32,000 and the default model parameters. This instance is replaced by a smaller model in the training cell.

In [28]:
hormones = [
    "dopamine",
    "serotonin",
    "cortisol",
    "oxytocin",
    "adrenaline",
    "endorphin",
]

model = HormoneLLM(
    vocab_size=32000,
    hormone_names=hormones
)

## Simple Tokenizer

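This class implements a basic tokenizer for converting text to numerical IDs and vice versa. It builds a vocabulary from lowercased, whitespace-split text, encodes sentences into token IDs wrapped in `<bos>` and `<eos>`, and decodes token IDs back into sentences. Unknown words map to `<unk>`. `decode` stops at the first `<eos>` but does not strip `<bos>`, so decoded strings begin with `<bos>`. The tokenizer also supports saving and loading its vocabulary as JSON.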

In [29]:
class SimpleTokenizer:
    def __init__(self, vocab=None):
        if vocab is None:
            self.vocab = {"<pad>":0, "<bos>":1, "<eos>":2, "<unk>":3}
        else:
            self.vocab = vocab

        self.inv_vocab = {v:k for k,v in self.vocab.items()}

    def build_vocab(self, texts):
        idx = len(self.vocab)
        for text in texts:
            for tok in text.lower().split():
                if tok not in self.vocab:
                    self.vocab[tok] = idx
                    idx += 1
        self.inv_vocab = {v:k for k,v in self.vocab.items()}

    def encode(self, text):
        tokens = text.lower().split()
        ids = [self.vocab.get(t, self.vocab["<unk>"]) for t in tokens]
        return [self.vocab["<bos>"]] + ids + [self.vocab["<eos>"]]

    def decode(self, ids):
        tokens = []
        for i in ids:
            if i == self.vocab["<eos>"]:
                break
            tokens.append(self.inv_vocab.get(i, "<unk>"))
        return " ".join(tokens)

    def save(self, path):
        with open(path, "w") as f:
            json.dump(self.vocab, f)

    @classmethod
    def load(cls, path):
        with open(path) as f:
            vocab = json.load(f)
        return cls(vocab)
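
The sketch below walks through the same vocabulary-building and encoding logic as standalone functions, so the resulting ids are easy to follow. The sample sentences are made up for illustration.

```python
# Vocabulary as build_vocab would produce it for the single text "i am happy".
vocab = {"<pad>": 0, "<bos>": 1, "<eos>": 2, "<unk>": 3}
for tok in "i am happy".lower().split():
    vocab.setdefault(tok, len(vocab))

def encode(text):
    ids = [vocab.get(t, vocab["<unk>"]) for t in text.lower().split()]
    return [vocab["<bos>"]] + ids + [vocab["<eos>"]]

print(vocab)
# {'<pad>': 0, '<bos>': 1, '<eos>': 2, '<unk>': 3, 'i': 4, 'am': 5, 'happy': 6}
print(encode("I am sad"))
# [1, 4, 5, 3, 2]  ("sad" is unseen, so it maps to <unk>)
```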

## Toy Dataset

This cell defines a small, synthetic dataset (`toy_data`) for training and demonstration purposes. Each entry includes an input text, an expected output text, and a corresponding vector of 'ground truth' hormone values. The `hormone_names` list is defined here as an alias of `hormones`, and each target vector follows that order (dopamine, serotonin, cortisol, oxytocin, adrenaline, endorphin).

In [30]:
hormone_names = hormones

toy_data = [
    {
        "input": "i am very happy today",
        "output": "that is wonderful to hear",
        "hormones": [0.9, 0.8, 0.1, 0.6, 0.3, 0.7]
    },
    {
        "input": "i feel sad and lonely",
        "output": "i am here for you",
        "hormones": [0.2, 0.3, 0.6, 0.8, 0.2, 0.3]
    },
    {
        "input": "i am angry right now",
        "output": "let us calm down together",
        "hormones": [0.3, 0.2, 0.8, 0.1, 0.7, 0.2]
    },
    {
        "input": "thank you for helping me",
        "output": "you are very welcome",
        "hormones": [0.7, 0.7, 0.1, 0.9, 0.2, 0.8]
    }
]

## Tokenizer Building and Batch Creation

This cell prepares the tokenizer by building its vocabulary from the `toy_data`. It also defines a utility function `make_batch` to convert a sample from the `toy_data` into PyTorch tensors suitable for model input.

In [31]:
texts = []
for d in toy_data:
    texts.append(d["input"])
    texts.append(d["output"])

tokenizer = SimpleTokenizer()
tokenizer.build_vocab(texts)

def make_batch(sample):
    src = torch.tensor(tokenizer.encode(sample["input"]))
    tgt = torch.tensor(tokenizer.encode(sample["output"]))
    hormones = torch.tensor(sample["hormones"], dtype=torch.float)
    return src, tgt, hormones

## Loss Functions

This cell defines the two loss functions used for training the model:
- `language_loss`: A cross-entropy loss for the language modeling part (predicting the next token). Targets equal to 0, the `<pad>` id, are ignored.
- `hormone_loss`: A Mean Squared Error (MSE) loss to guide the hormone predictions to match the target hormone values.

In [32]:
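def language_loss(logits, targets):
    # Flatten to [N, vocab] and [N]; reshape also handles non-contiguous
    # slices such as tgt[:, 1:]. Index 0 is <pad> and is excluded from the loss.
    return F.cross_entropy(
        logits.reshape(-1, logits.size(-1)),
        targets.reshape(-1),
        ignore_index=0
    )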

def hormone_loss(pred, target):
    return F.mse_loss(pred, target)
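
To see what `ignore_index=0` does, the sketch below uses uniform logits over a made-up four-token vocabulary, so every real token costs exactly ln(4) and the padded position contributes nothing.

```python
import math
import torch
import torch.nn.functional as F

# Uniform logits over a 4-token vocabulary: every real token costs ln(4).
logits = torch.zeros(1, 3, 4)        # [batch, seq_len, vocab]
targets = torch.tensor([[2, 3, 0]])  # last position is <pad> (id 0)

loss = F.cross_entropy(
    logits.reshape(-1, logits.size(-1)),
    targets.reshape(-1),
    ignore_index=0,
)

# The mean is taken over the two non-pad positions only.
print(math.isclose(loss.item(), math.log(4), rel_tol=1e-5))  # True
```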

## Model Training Loop

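This cell sets up and runs training for the `HormoneLLM`. It re-creates the model at a smaller size (`d_model=128`, 4 heads, `d_ff=512`, 3 layers), creates an AdamW optimizer, and trains for 300 epochs. In each epoch it iterates over `toy_data` one sample at a time, feeds the decoder the target sequence without its last token and scores the predictions against the same sequence without its first token (teacher forcing), adds the language loss to the hormone loss weighted by 0.5, and updates the model's parameters. The printed loss is the sum over the four samples.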

In [33]:
device = "cuda" if torch.cuda.is_available() else "cpu"

model = HormoneLLM(
    vocab_size=len(tokenizer.vocab),
    hormone_names=hormone_names,
    d_model=128,
    n_heads=4,
    d_ff=512,
    num_layers=3
).to(device)

optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4)

epochs = 300

model.train()
for epoch in range(epochs):
    total_loss = 0.0

    for sample in toy_data:
        src, tgt, hormone_target = make_batch(sample)
        src = src.unsqueeze(0).to(device)
        tgt = tgt.unsqueeze(0).to(device)
        hormone_target = hormone_target.unsqueeze(0).to(device)

        out = model(src, tgt[:, :-1])

        logits = out["logits"]
        hormone_pred = out["hormones"]

        l_lang = language_loss(logits, tgt[:, 1:])
        l_horm = hormone_loss(hormone_pred, hormone_target)

        loss = l_lang + 0.5 * l_horm

        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

        total_loss += loss.item()

    if epoch % 50 == 0:
        print(f"Epoch {epoch} | Loss {total_loss:.4f}")

Epoch 0 | Loss 14.5373
Epoch 50 | Loss 0.1150
Epoch 100 | Loss 0.0514
Epoch 150 | Loss 0.0328
Epoch 200 | Loss 0.0214
Epoch 250 | Loss 0.0153
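
The slicing `tgt[:, :-1]` and `tgt[:, 1:]` in the loop pairs each decoder input token with the token that should follow it. A plain-Python sketch with made-up token ids:

```python
# Token ids for a target such as "<bos> w1 w2 w3 <eos>" under a made-up vocabulary.
tgt = [1, 7, 8, 9, 2]

decoder_input = tgt[:-1]  # what the decoder sees
labels = tgt[1:]          # what it must predict at each step

for seen, expected in zip(decoder_input, labels):
    print(f"input {seen} -> predict {expected}")
# input 1 -> predict 7
# input 7 -> predict 8
# input 8 -> predict 9
# input 9 -> predict 2
```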


## Chat Inference Function

This function demonstrates how to use the trained `HormoneLLM` for generating responses in a chat-like manner. Given an input text, it encodes it, then decodes a response token by token with greedy (argmax) selection until an end-of-sequence token is generated or `max_len` tokens have been produced. The full model is rerun at every step on the tokens generated so far. The function also returns the predicted hormone values for the input.

In [34]:
@torch.no_grad()  # inference only: skip gradient tracking
def chat(model, tokenizer, text, max_len=20):
    model.eval()

    src = torch.tensor(tokenizer.encode(text)).unsqueeze(0).to(device)
    decoder = torch.tensor([[tokenizer.vocab["<bos>"]]]).to(device)

    hormones_out = None

    for _ in range(max_len):
        out = model(src, decoder)
        logits = out["logits"][:, -1]
        next_id = logits.argmax(-1).unsqueeze(1)

        decoder = torch.cat([decoder, next_id], dim=1)
        hormones_out = out["hormone_map"]

        if next_id.item() == tokenizer.vocab["<eos>"]:
            break

    response = tokenizer.decode(decoder[0].tolist())
    hormone_values = {
        k: float(v.item()) for k,v in hormones_out.items()
    }

    return response, hormone_values
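
The decoding loop itself is independent of the model. The sketch below isolates it, with a lookup table standing in for the network; `greedy_decode` and `toy_table` are hypothetical names used only for illustration.

```python
def greedy_decode(next_token, bos_id, eos_id, max_len=20):
    """Append the chosen next token until <eos> or max_len (illustrative helper)."""
    seq = [bos_id]
    for _ in range(max_len):
        tok = next_token(seq)
        seq.append(tok)
        if tok == eos_id:
            break
    return seq

# Stand-in for the model: a fixed lookup from the last token to the next one.
toy_table = {1: 5, 5: 6, 6: 2}
print(greedy_decode(lambda seq: toy_table[seq[-1]], bos_id=1, eos_id=2))
# [1, 5, 6, 2]
```

As in `chat`, the `<eos>` token is appended before the loop stops, and `max_len` caps the number of generated tokens when no `<eos>` appears.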


## Example Chat Interaction

This cell showcases an example usage of the `chat` function with the trained model and tokenizer. It provides an input phrase and prints the generated response along with the corresponding predicted hormone levels.

In [35]:
resp, hormones = chat(
    model,
    tokenizer,
    "i am very happy today"
)

print("Response:", resp)
print("Hormones:")
for k,v in hormones.items():
    print(f"  {k}: {v:.2f}")

Response: <bos> that is wonderful to hear
Hormones:
  dopamine: 0.90
  serotonin: 0.80
  cortisol: 0.10
  oxytocin: 0.60
  adrenaline: 0.30
  endorphin: 0.70


## Saving Model and Tokenizer

This cell saves the trained model's state dictionary and the tokenizer's vocabulary to disk. This allows for persistence of the trained components, so they can be loaded and reused later without retraining.

In [36]:
torch.save(model.state_dict(), "hormone_llm.pt")
tokenizer.save("tokenizer.json")
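
A state dictionary stores parameters only, not the architecture, so a module with matching shapes has to be constructed before loading. The sketch below shows the round trip on a tiny stand-in layer, writing to a temporary directory instead of the notebook's files.

```python
import os
import tempfile
import torch
import torch.nn as nn

layer = nn.Linear(4, 2)  # stand-in for the full model

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "weights.pt")
    torch.save(layer.state_dict(), path)

    # A fresh module must be built with the same shapes before loading.
    restored = nn.Linear(4, 2)
    restored.load_state_dict(torch.load(path, map_location="cpu"))

print(torch.equal(layer.weight, restored.weight))  # True
```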

## Loading and Testing the Model

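This cell demonstrates how to load a previously saved tokenizer and model. The model is rebuilt with the same hyperparameters used for training before its state dictionary is loaded. It then performs another chat interaction with the loaded model to verify that it is functioning correctly after being reloaded from disk. The test prompt contains a word that is not in the vocabulary ("hate"), which is encoded as `<unk>`.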

In [38]:
loaded_tokenizer = SimpleTokenizer.load("tokenizer.json")

loaded_model = HormoneLLM(
    vocab_size=len(loaded_tokenizer.vocab),
    hormone_names=hormone_names,
    d_model=128,
    n_heads=4,
    d_ff=512,
    num_layers=3
).to(device)

loaded_model.load_state_dict(
    torch.load("hormone_llm.pt", map_location=device)
)

resp, hormones = chat(
    loaded_model,
    loaded_tokenizer,
    "i hate you"
)

print("Loaded model response:", resp)
print("Hormones:", hormones)

Loaded model response: <bos> you
Hormones: {'dopamine': 0.45865148305892944, 'serotonin': 0.4118742346763611, 'cortisol': 0.433440625667572, 'oxytocin': 0.5855403542518616, 'adrenaline': 0.275560587644577, 'endorphin': 0.5346636772155762}
