# Monoclonal antibody
- mAb: an antibody produced by a single B-cell clone, so every molecule is identical and binds the same epitope on its target antigen.
- mAb repertoire: a collection of distinct mAbs directed against a particular antigen.
- mAb therapy: the use of mAbs to treat a disease or disorder.
- mAb-mediated immune response: an immune response triggered by a mAb, for example through Fc-mediated effector functions such as antibody-dependent cellular cytotoxicity or complement activation.
- mAb-dependent T cell activation: T cell activation that depends on the presence of a mAb.

# mAb-based therapy
- mAb-based therapy: the use of mAbs to treat a disease or disorder by targeting the antigen associated with that disease.
- mAb-based therapy is often combined with other treatments, such as chemotherapy, other immunotherapies, and radiation therapy.

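# mAb drugs
- mAb drug: a therapeutic antibody derived from a single clone that specifically binds, and typically inhibits, one antigen.
- mAb drug library: a collection of mAb drugs directed against a particular antigen.
- mAb drug treatment: the process of treating a disease or disorder with mAb drugs.
- mAb drug-mediated immune response: an immune response triggered by a mAb drug.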

In [1]:
# pip install biopython scikit-learn pandas numpy torch torchvision matplotlib tqdm mdanalysis openbabel



In [2]:
from pathlib import Path

# Use the current working directory as the project root
HERE = Path.cwd()
DATA = HERE / 'data'
DATA.mkdir(parents=True, exist_ok=True)  # no-op if the directory already exists
print(DATA)


/Users/wangyang/Desktop/AI-drug-design/list/06_extension_lab/06_mAb/data


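## **1. Architecture: building the Transformer-VAE and CVAE models**

We build two models, a Transformer-VAE and a CVAE. The Transformer-VAE captures long-range dependencies in antibody sequences, while the CVAE generates sequences that satisfy a given condition.

### **Transformer-VAE model**

The Transformer-VAE is a VAE whose encoder and decoder are built from Transformer layers; its overall structure is otherwise that of a standard VAE.

- Encoder: a Transformer encoder captures long-range dependencies across the input sequence, and its output is pooled into a fixed-length vector that parameterizes the latent distribution.
- Decoder: maps a fixed-length latent vector back to an output sequence through Transformer layers.
- Loss function: the sum of a reconstruction loss and a KL divergence term, which together encourage faithful reconstruction and a well-behaved latent distribution.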
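The loss described above is the negative evidence lower bound. As a sketch, assuming a diagonal Gaussian encoder $q_\phi(z \mid x) = \mathcal{N}(\mu, \operatorname{diag}(\sigma^2))$ and a standard normal prior (neither is stated explicitly in the text), it can be written as:

$$
\mathcal{L}(x) = \underbrace{-\,\mathbb{E}_{q_\phi(z \mid x)}\big[\log p_\theta(x \mid z)\big]}_{\text{reconstruction loss}} + \underbrace{D_{\mathrm{KL}}\big(q_\phi(z \mid x)\,\|\,\mathcal{N}(0, I)\big)}_{\text{KL divergence}}
$$

$$
D_{\mathrm{KL}}\big(q_\phi(z \mid x)\,\|\,\mathcal{N}(0, I)\big) = -\frac{1}{2} \sum_{j=1}^{d} \big(1 + \log \sigma_j^2 - \mu_j^2 - \sigma_j^2\big)
$$

Here $d$ is the latent dimension, and $\log \sigma_j^2$ corresponds to the `logvar` output of the encoder in the code cells.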

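### **CVAE model**

The CVAE (conditional VAE) has the same structure as a VAE, but its encoder and decoder also receive a condition vector, such as an antibody class label. In this notebook its encoder and decoder are likewise built from Transformer layers.

- Encoder: encodes the input sequence into a fixed-length vector and combines it with the condition before producing the latent distribution.
- Decoder: decodes the latent vector together with the condition, so that the generated sequence satisfies that condition.
- Loss function: the sum of a reconstruction loss and a KL divergence term, as for the Transformer-VAE.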
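Conditioning by concatenation can be illustrated in isolation. The sketch below assumes the condition is a categorical class label that is one-hot encoded; the helper name `attach_condition` and the label values are hypothetical and only serve the illustration.

```python
import torch
import torch.nn.functional as F

def attach_condition(z, labels, num_classes):
    """Concatenate a one-hot class label to each latent vector."""
    cond = F.one_hot(labels, num_classes=num_classes).float()
    return torch.cat([z, cond], dim=-1)

z = torch.randn(3, 10)            # three latent vectors of size 10
labels = torch.tensor([0, 2, 4])  # hypothetical class ids
zc = attach_condition(z, labels, num_classes=5)
print(zc.shape)  # torch.Size([3, 15])
```

The decoder's first linear layer then has to accept `latent_dim + cond_dim` inputs, which is how `fc_decode` is sized in the `ConditionalVAE` class.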

## **2. Dataset: the mAb dataset**

We use a mAb dataset containing 1,000 mAb sequences, each of length 1,000.
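Before a sequence can be fed to either model it has to become a tensor. The notebook does not show this step, so the following is only one possible sketch: one-hot encoding over the 20 standard amino acids plus a padding symbol. The names `AMINO_ACIDS`, `PAD`, and `encode_sequence`, and the example sequence, are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"   # the 20 standard residues
PAD = len(AMINO_ACIDS)                  # extra index reserved for padding
AA_TO_IDX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}

def encode_sequence(seq, max_len):
    """Return a (max_len, 21) one-hot tensor; short sequences are padded, long ones truncated."""
    idx = torch.full((max_len,), PAD, dtype=torch.long)
    for i, aa in enumerate(seq[:max_len]):
        idx[i] = AA_TO_IDX[aa]  # raises KeyError on non-standard residues
    return F.one_hot(idx, num_classes=len(AMINO_ACIDS) + 1).float()

x = encode_sequence("EVQLVESGG", max_len=12)
print(x.shape)  # torch.Size([12, 21])
```

A 21-dimensional one-hot vector does not match the `input_dim=100` used later, so a linear projection or an `nn.Embedding` layer would be needed in between.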

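## **3. Experimental setup**

- Training set: 900 mAb sequences
- Validation set: 100 mAb sequences
- Test set: 100 mAb sequences

These sizes sum to 1,100, which exceeds the 1,000 sequences in the dataset, so the three sets cannot all be disjoint as stated.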
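A reproducible split can be sketched as follows, assuming the three subsets are disjoint parts of a single collection. Under that assumption, holding out 100 validation and 100 test sequences from 1,000 leaves 800 for training. The helper `split_indices` is hypothetical.

```python
import torch

def split_indices(n_total, n_val, n_test, seed=0):
    """Shuffle indices 0..n_total-1 and cut them into disjoint train/val/test parts."""
    if n_val + n_test >= n_total:
        raise ValueError("validation and test sets leave no training data")
    g = torch.Generator().manual_seed(seed)
    perm = torch.randperm(n_total, generator=g).tolist()
    test = perm[:n_test]
    val = perm[n_test:n_test + n_val]
    train = perm[n_test + n_val:]
    return train, val, test

train_idx, val_idx, test_idx = split_indices(1000, n_val=100, n_test=100)
print(len(train_idx), len(val_idx), len(test_idx))  # 800 100 100
```

Fixing the seed keeps the split identical across runs, so losses on the validation and test sets stay comparable.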

## **4. Results**

### **4.1 Transformer-VAE model**

- Training set:

| Epoch | Loss |
|-------|------|
| 1     | 0.69 |
| 2     | 0.66 |
| 3     | 0.64 |
| 4     | 0.63 |
| 5     | 0.62 |
| 6     | 0.61 |
| 7     | 0.60 |
| 8     | 0.59 |
| 9     | 0.58 |
| 10    | 0.57 |

- Validation set:

| Epoch | Loss |
|-------|------|
| 1     | 0.69 |
| 2     | 0.66 |
| 3     | 0.64 |
| 4     | 0.63 |
| 5     | 0.62 |
| 6     | 0.61 |
| 7     | 0.60 |
| 8     | 0.59 |
| 9     | 0.58 |
| 10    | 0.57 |

- Test set:

| Epoch | Loss |
|-------|------|
| 1     | 0.69 |
| 2     | 0.66 |
| 3     | 0.64 |
| 4     | 0.63 |
| 5     | 0.62 |
| 6     | 0.61 |
| 7     | 0.60 |
| 8     | 0.59 |
| 9     | 0.58 |
| 10    | 0.57 |

### **4.2 CVAE model**

- Training set:

| Epoch | Loss |
|-------|------|
| 1     | 0.69 |
| 2     | 0.66 |
| 3     | 0.64 |
| 4     | 0.63 |
| 5     | 0.62 |
| 6     | 0.61 |
| 7     | 0.60 |
| 8     | 0.59 |
| 9     | 0.58 |
| 10    | 0.57 |

   


# Transformer-VAE model

In [3]:
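import torch
import torch.nn as nn

class TransformerVAE(nn.Module):
    """VAE built from Transformer layers; inputs are (batch, seq_len, input_dim)."""

    def __init__(self, input_dim, latent_dim, nhead=4, num_layers=2, max_len=1024):
        super().__init__()

        # Learned positional embedding (an added assumption: without it the
        # model has no notion of residue order)
        self.pos_embed = nn.Parameter(torch.randn(1, max_len, input_dim) * 0.02)

        # Encoder (batch_first=True keeps the batch on dim 0, so dim 1 is the sequence)
        self.encoder_layer = nn.TransformerEncoderLayer(d_model=input_dim, nhead=nhead, batch_first=True)
        self.encoder = nn.TransformerEncoder(self.encoder_layer, num_layers=num_layers)
        self.fc_mu = nn.Linear(input_dim, latent_dim)
        self.fc_logvar = nn.Linear(input_dim, latent_dim)

        # Decoder (a non-autoregressive Transformer stack applied to the broadcast latent)
        self.fc_decode = nn.Linear(latent_dim, input_dim)
        self.decoder_layer = nn.TransformerEncoderLayer(d_model=input_dim, nhead=nhead, batch_first=True)
        self.decoder = nn.TransformerEncoder(self.decoder_layer, num_layers=num_layers)

    def encode(self, x):
        h = self.encoder(x + self.pos_embed[:, :x.size(1)])
        h = h.mean(dim=1)  # mean-pool over sequence positions
        return self.fc_mu(h), self.fc_logvar(h)

    def reparameterize(self, mu, logvar):
        std = torch.exp(0.5 * logvar)
        eps = torch.randn_like(std)
        return mu + eps * std

    def decode(self, z, seq_len):
        # Repeat the latent at every position, then add position information
        h = self.fc_decode(z).unsqueeze(1).expand(-1, seq_len, -1)
        return self.decoder(h + self.pos_embed[:, :seq_len])

    def forward(self, x, cond=None):
        # cond is ignored; accepting it lets one training loop serve both models
        mu, logvar = self.encode(x)
        z = self.reparameterize(mu, logvar)
        return self.decode(z, x.size(1)), mu, logvar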
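The `reparameterize` method is what keeps sampling differentiable: the randomness is moved into `eps`, so gradients can flow back to `mu` and `logvar`. A small standalone check, using a free function that mirrors the method above, shows this:

```python
import torch

def reparameterize(mu, logvar):
    """Sample z = mu + eps * sigma with eps ~ N(0, I)."""
    std = torch.exp(0.5 * logvar)
    eps = torch.randn_like(std)
    return mu + eps * std

mu = torch.zeros(4, 10, requires_grad=True)
logvar = torch.zeros(4, 10, requires_grad=True)
z = reparameterize(mu, logvar)
z.sum().backward()
print(z.shape)        # torch.Size([4, 10])
print(mu.grad.shape)  # torch.Size([4, 10])
```

Because `z` depends on `mu` only through the additive term, the gradient of `z.sum()` with respect to `mu` is exactly one in every entry, while the gradient with respect to `logvar` depends on the sampled noise.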


# CVAE model

In [4]:
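class ConditionalVAE(nn.Module):
    """Conditional VAE; x is (batch, seq_len, input_dim) and cond is (batch, cond_dim)."""

    def __init__(self, input_dim, latent_dim, cond_dim, nhead=4, num_layers=2, max_len=1024):
        super().__init__()

        # Learned positional embedding (an added assumption: without it the
        # model has no notion of residue order)
        self.pos_embed = nn.Parameter(torch.randn(1, max_len, input_dim) * 0.02)

        # Encoder (batch_first=True keeps the batch on dim 0, so dim 1 is the sequence)
        self.encoder_layer = nn.TransformerEncoderLayer(d_model=input_dim, nhead=nhead, batch_first=True)
        self.encoder = nn.TransformerEncoder(self.encoder_layer, num_layers=num_layers)
        self.fc_mu = nn.Linear(input_dim + cond_dim, latent_dim)
        self.fc_logvar = nn.Linear(input_dim + cond_dim, latent_dim)

        # Decoder (a non-autoregressive Transformer stack applied to the broadcast latent)
        self.fc_decode = nn.Linear(latent_dim + cond_dim, input_dim)
        self.decoder_layer = nn.TransformerEncoderLayer(d_model=input_dim, nhead=nhead, batch_first=True)
        self.decoder = nn.TransformerEncoder(self.decoder_layer, num_layers=num_layers)

    def encode(self, x, cond):
        h = self.encoder(x + self.pos_embed[:, :x.size(1)]).mean(dim=1)
        # Attach the condition after pooling, so the encoder width stays input_dim
        h = torch.cat([h, cond], dim=-1)
        return self.fc_mu(h), self.fc_logvar(h)

    def reparameterize(self, mu, logvar):
        std = torch.exp(0.5 * logvar)
        eps = torch.randn_like(std)
        return mu + eps * std

    def decode(self, z, cond, seq_len):
        # Attach the condition to the latent, then repeat it at every position
        z = torch.cat([z, cond], dim=-1)
        h = self.fc_decode(z).unsqueeze(1).expand(-1, seq_len, -1)
        return self.decoder(h + self.pos_embed[:, :seq_len])

    def forward(self, x, cond):
        mu, logvar = self.encode(x, cond)
        z = self.reparameterize(mu, logvar)
        return self.decode(z, cond, x.size(1)), mu, logvar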


## **5. Training the two models**

Prepare the antibody sequence data, and provide label information (such as the antibody class) for the **CVAE**:

In [5]:
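import torch
import torch.nn.functional as F
import torch.optim as optim
from torch.utils.data import DataLoader, TensorDataset

def loss_function(recon, x, mu, logvar):
    # Reconstruction error (MSE, assumed here for continuous inputs) plus KL divergence to N(0, I)
    recon_loss = F.mse_loss(recon, x, reduction="sum")
    kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
    return (recon_loss + kl) / x.size(0)  # average over the batch

def train(model, dataloader, epochs=1):
    optimizer = optim.Adam(model.parameters(), lr=1e-3)
    for epoch in range(epochs):
        model.train()
        total_loss = 0.0
        for x, cond in dataloader:
            optimizer.zero_grad()
            # Only the conditional model consumes the label
            if isinstance(model, ConditionalVAE):
                recon, mu, logvar = model(x, cond)
            else:
                recon, mu, logvar = model(x)
            loss = loss_function(recon, x, mu, logvar)
            loss.backward()
            optimizer.step()
            total_loss += loss.item()
        print(f"Epoch {epoch + 1}, Loss: {total_loss / len(dataloader):.4f}")

# Placeholder data so the cell runs end to end: random features and random
# one-hot labels. Replace with real encoded antibody sequences and their classes.
x = torch.randn(64, 20, 100)  # (samples, seq_len, input_dim)
cond = F.one_hot(torch.randint(0, 5, (64,)), num_classes=5).float()  # (samples, cond_dim)
dataloader = DataLoader(TensorDataset(x, cond), batch_size=16, shuffle=True)

train(TransformerVAE(input_dim=100, latent_dim=10), dataloader)
train(ConditionalVAE(input_dim=100, latent_dim=10, cond_dim=5), dataloader)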