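**Data processing and analysis**: parse the uploaded antibody sequences, extract their physicochemical properties, and optimize the design according to chemical rules.

**Model-based generation**: starting from existing sequences, generate new antibody sequences that respect biochemical rules.

**Molecular docking (Smina)**: dock the newly generated antibodies against the target molecule.

**Detailed tutorial**: walk through the full workflow, from **data loading** to **generating new antibody sequences** and **molecular docking**.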

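**Immunoglobulin structure and function**: heavy and light chains, each with a variable domain (VH, VL) and constant domains.

**Antibody diversity and sequence optimization**: mimic the **hypervariable regions (CDRs)** found in natural antibody repertoires.

**Amino-acid preferences and physicochemical rules**: ensure the **stability, hydrophilicity and solubility** of the sequences.

**Generative model design**: use a **deep-learning generative model (VAE/GAN)** rather than simple random generation.

**Sequence quality validation**: rule-based screening on properties such as **hydrophobicity**, **aromaticity** and **predicted binding sites**.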

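## **1. Advanced antibody sequence generation: simulating the CDR variable regions**

Sequence variation in antibodies is concentrated in the **hypervariable regions (CDR1, CDR2 and CDR3)**. The CDRs directly shape the antigen-binding site, so these are the regions we need to model and generate.

### **Antibody structure overview**

- **Light chain**: a variable domain (VL, containing CDR1, CDR2 and CDR3) and a constant domain.
- **Heavy chain**: a variable domain (VH) with its own CDR1, CDR2 and CDR3, plus constant domains; together with VL it forms the antigen-binding site.

We use a layered generation strategy: **the constant regions are kept fixed, and the variable regions are sampled from a GAN or VAE**.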
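The layered strategy can be sketched as follows: fixed segments are interleaved with CDR segments supplied by a generator. This is only an illustration under stated assumptions. The fixed segments are short placeholder strings (not real antibody sequences), the CDR lengths are arbitrary, and `sample_cdr` and `assemble_variable_domain` are hypothetical helper names. In the actual pipeline the CDRs would come from the trained GAN or VAE rather than from uniform sampling.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Placeholder fixed segments (NOT real antibody sequences):
# four fixed segments surround three CDRs.
FIXED_SEGMENTS = ["AAAA", "GGGG", "SSSS", "TTTT"]


def sample_cdr(length, rng):
    """Stand-in for a generative model: draws residues uniformly at random."""
    return "".join(rng.choice(AMINO_ACIDS) for _ in range(length))


def assemble_variable_domain(fixed_segments, cdrs):
    """Interleave fixed segments with CDRs: F1-CDR1-F2-CDR2-F3-CDR3-F4."""
    if len(fixed_segments) != len(cdrs) + 1:
        raise ValueError("need exactly one more fixed segment than CDRs")
    parts = []
    for fixed, cdr in zip(fixed_segments, cdrs):
        parts.extend([fixed, cdr])
    parts.append(fixed_segments[-1])
    return "".join(parts)


rng = random.Random(0)       # seeded so the example is reproducible
cdr_lengths = (8, 7, 12)     # illustrative lengths only
cdrs = [sample_cdr(n, rng) for n in cdr_lengths]
domain = assemble_variable_domain(FIXED_SEGMENTS, cdrs)
print(domain)
```

Keeping the fixed segments outside the generator means only the CDR positions ever change, which is the point of the layered design.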

## **2. Generating antibody sequences with a deep-learning model (VAE)**

### **Implementation: VAE-based antibody generation**
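For reference, here is a sketch of the objective a VAE of this kind typically minimizes. It assumes a diagonal Gaussian posterior $q(z \mid x) = \mathcal{N}(\mu, \operatorname{diag}(\sigma^2))$, a standard normal prior $p(z)$, and a squared-error reconstruction term. The symbols $\hat{x}$ (reconstruction) and $d$ (latent dimension) follow common VAE notation and are not defined elsewhere in this notebook.

$$
\mathcal{L}(x) = \lVert x - \hat{x} \rVert_2^2 + D_{\mathrm{KL}}\big(q(z \mid x)\,\Vert\,p(z)\big),
\qquad
D_{\mathrm{KL}}\big(q(z \mid x)\,\Vert\,p(z)\big) = -\frac{1}{2} \sum_{j=1}^{d} \left(1 + \log \sigma_j^2 - \mu_j^2 - \sigma_j^2\right)
$$

Here $\log \sigma_j^2$ corresponds to the encoder's `logvar` output. Both terms are non-negative, so a correctly signed loss cannot fall below zero; a loss that runs off to large negative values usually points to a sign error in the KL term.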

In [1]:
import torch
import torch.nn as nn
import torch.optim as optim
import numpy as np
from Bio.Seq import Seq
from Bio.SeqRecord import SeqRecord
from Bio import SeqIO
from tqdm import tqdm
import random

# Define the VAE model
class AntibodyVAE(nn.Module):
    def __init__(self, input_dim, hidden_dim, latent_dim):
        super(AntibodyVAE, self).__init__()
        self.fc1 = nn.Linear(input_dim, hidden_dim)
        self.fc21 = nn.Linear(hidden_dim, latent_dim)  # mean
        self.fc22 = nn.Linear(hidden_dim, latent_dim)  # log-variance
        self.fc3 = nn.Linear(latent_dim, hidden_dim)
        self.fc4 = nn.Linear(hidden_dim, input_dim)

    def encode(self, x):
        h1 = torch.relu(self.fc1(x))
        return self.fc21(h1), self.fc22(h1)

    def reparameterize(self, mu, logvar):
        std = torch.exp(0.5 * logvar)
        eps = torch.randn_like(std)
        return mu + eps * std

    def decode(self, z):
        h3 = torch.relu(self.fc3(z))
        return torch.sigmoid(self.fc4(h3))

    def forward(self, x):
        mu, logvar = self.encode(x)
        z = self.reparameterize(mu, logvar)
        return self.decode(z), mu, logvar

# Initialize the VAE model
input_dim = 100  # simulated length of the CDR region
hidden_dim = 50
latent_dim = 10
vae = AntibodyVAE(input_dim, hidden_dim, latent_dim)
optimizer = optim.Adam(vae.parameters(), lr=1e-3)

# Simulated training data
data = np.random.rand(200, input_dim).astype(np.float32)  # example data, not real sequences

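# Train the VAE
for epoch in tqdm(range(100)):
    vae.train()
    total_loss = 0
    for i in range(len(data)):
        optimizer.zero_grad()
        x = torch.tensor(data[i]).unsqueeze(0)
        recon_x, mu, logvar = vae(x)
        # Loss = reconstruction error + KL divergence.
        # The KL term is -0.5 * sum(1 + logvar - mu^2 - exp(logvar)); with the
        # opposite sign the optimizer can push the loss towards minus infinity.
        recon_loss = ((x - recon_x) ** 2).sum()
        kl_loss = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
        loss = recon_loss + kl_loss
        loss.backward()
        optimizer.step()
        total_loss += loss.item()
    print(f"Epoch {epoch + 1}, Loss: {total_loss:.4f}")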

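# Generate new antibody sequences
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def generate_antibody_sequences(vae, num_sequences=1000):
    """Sample latent vectors and decode them into amino-acid strings."""
    sequences = []
    vae.eval()
    with torch.no_grad():
        for _ in range(num_sequences):
            z = torch.randn(1, latent_dim)
            generated = vae.decode(z).numpy().flatten()
            # Map each decoder output in [0, 1] to one of the 20 standard residues.
            # This simple binning is for illustration; a per-position one-hot
            # encoding would represent sequences more faithfully.
            indices = np.minimum((generated * len(AMINO_ACIDS)).astype(int), len(AMINO_ACIDS) - 1)
            sequence = ''.join(AMINO_ACIDS[i] for i in indices)
            sequences.append(sequence)
    return sequences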

# Save as a FASTA file
import os

def save_fasta(sequences, filename="data/generated_antibodies.fasta"):
    # Create the output directory if it does not exist yet
    os.makedirs(os.path.dirname(filename) or ".", exist_ok=True)
    records = [
        SeqRecord(Seq(seq), id=f"Antibody_{i+1}", description="Generated antibody sequence")
        for i, seq in enumerate(sequences)
    ]
    SeqIO.write(records, filename, "fasta")
    print(f"{len(sequences)} sequences saved to {filename}")

# Generate and save the sequences
new_sequences = generate_antibody_sequences(vae)
save_fasta(new_sequences)


1000 sequences saved to data/generated_antibodies.fasta
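
The training cell above uses random numbers (`np.random.rand`) as stand-in data. As a minimal sketch of how real CDR sequences might be turned into fixed-length vectors for such a model, the snippet below one-hot encodes each residue over the 20 standard amino acids and decodes by taking the highest-scoring residue per position. The helper names `one_hot_encode` and `one_hot_decode`, the zero-padding scheme and `MAX_LEN` are assumptions made for illustration and are not part of the original notebook.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
AA_INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}
MAX_LEN = 5  # hypothetical fixed CDR length; 5 * 20 = 100 matches input_dim above


def one_hot_encode(seq, max_len=MAX_LEN):
    """Return a flat list of length max_len * 20 with a single 1.0 per residue."""
    if len(seq) > max_len:
        raise ValueError("sequence longer than max_len")
    n = len(AMINO_ACIDS)
    vec = [0.0] * (max_len * n)
    for pos, aa in enumerate(seq):
        vec[pos * n + AA_INDEX[aa]] = 1.0
    return vec


def one_hot_decode(vec, max_len=MAX_LEN):
    """Pick the highest-scoring residue per position; an all-zero row is padding."""
    n = len(AMINO_ACIDS)
    residues = []
    for pos in range(max_len):
        row = vec[pos * n:(pos + 1) * n]
        best = max(range(n), key=row.__getitem__)
        if row[best] <= 0.0:
            break  # reached zero padding
        residues.append(AMINO_ACIDS[best])
    return "".join(residues)


encoded = one_hot_encode("ARDY")
decoded = one_hot_decode(encoded)
print(len(encoded), decoded)
```

A list of such vectors could be converted with `torch.tensor` and used in place of the random `data` array. Note that a sigmoid decoder never outputs exact zeros, so in practice a dedicated padding symbol would be needed instead of all-zero rows.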





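## **3. Making sure generated sequences are biologically and chemically plausible**

1. **Diversity screening**:
   - Evaluate the generated sequences with a diversity metric such as Shannon entropy.
   - Check that the CDR regions contain amino-acid combinations commonly seen in antibodies.
2. **Stability and solubility prediction**:
   - Use Biopython's **ProteinAnalysis** to compute physicochemical properties of the generated sequences, such as molecular weight, instability index and hydrophobicity.
3. **Biological compatibility check**:
   - Run a multiple sequence alignment against known antibodies (for example bevacizumab) to assess similarity.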
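
The first two checks can be sketched with the standard library alone. The snippet computes per-sequence Shannon entropy, the GRAVY hydropathy score (mean Kyte-Doolittle value) and aromaticity (fraction of F, W and Y), then applies a simple rule-based filter. Biopython's `ProteinAnalysis` offers `gravy()` and `aromaticity()` for the same quantities; this dependency-free version is only an illustration. The function names and all thresholds in `passes_filters` are hypothetical and would need tuning against real antibody data.

```python
import math
from collections import Counter

# Kyte-Doolittle hydropathy scale
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def shannon_entropy(seq):
    """Shannon entropy (bits) of the residue composition of one sequence."""
    n = len(seq)
    return -sum((c / n) * math.log2(c / n) for c in Counter(seq).values())


def gravy(seq):
    """Grand average of hydropathy: mean Kyte-Doolittle value per residue."""
    return sum(KYTE_DOOLITTLE[aa] for aa in seq) / len(seq)


def aromaticity(seq):
    """Fraction of aromatic residues (F, W, Y)."""
    return sum(seq.count(aa) for aa in "FWY") / len(seq)


def passes_filters(seq, min_entropy=2.0, max_gravy=0.5, max_aromaticity=0.3):
    """Rule-based screen; the default thresholds are placeholders, not validated cutoffs."""
    return (
        shannon_entropy(seq) >= min_entropy
        and gravy(seq) <= max_gravy
        and aromaticity(seq) <= max_aromaticity
    )


candidates = ["AAAAAAAA", "ACDEFGHIKLMNPQRSTVWY"]
kept = [s for s in candidates if passes_filters(s)]
print(kept)
```

The low-complexity sequence is rejected on entropy alone, while the sequence containing every residue once passes all three rules.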

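## **4. Advanced molecular docking: docking simulation against VEGF**

Use **Smina** to dock the generated antibody against the target protein (for example VEGF).

### **PDBQT conversion and docking execution**

### **1. Convert the antibody sequence to PDB format**

A FASTA file contains only the amino-acid sequence and cannot be used for docking directly. You need to:

1. Turn the sequence into a three-dimensional structure (a PDB file), either with a structure-prediction tool such as **AlphaFold** or with a model builder such as **PyMOL**.
2. Then use **Open Babel** to convert the PDB file to **PDBQT format**.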

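### **2. Generate the PDB file with PyMOL or AlphaFold**

#### **Using PyMOL**

1. Open PyMOL and have the generated antibody sequence at hand.
2. Build a peptide from the sequence with the `fab` command or the **Build** menu. The **Mutagenesis** wizard only changes residues in a structure that is already loaded, and a peptide built this way has an idealized backbone rather than a predicted antibody fold.
3. Save the result as a **PDB file**: `data/antibody.pdb`

#### **Using AlphaFold**

- The AlphaFold Protein Structure Database hosts precomputed predictions for known proteins; it does not predict structures for newly generated sequences.
- For a new antibody sequence, run AlphaFold itself (locally or through a hosted service such as ColabFold) and download the predicted PDB file.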

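### **3. Convert to PDBQT format with Open Babel**

Use Open Babel to convert the PDB file to PDBQT format:

```bash
obabel data/antibody.pdb -O data/antibody.pdbqt
```

- **Note**: the `--gen3D` option builds 3D coordinates from scratch and is intended for inputs that lack them (for example SMILES). A PDB file already contains coordinates, so the option is omitted here to keep the modeled structure intact.
- Make sure the resulting PDBQT file is in the **data/** folder.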
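
Smina also needs a search box, given as `--center_x/y/z` and `--size_x/y/z` (it can alternatively derive one from a reference ligand via `--autobox_ligand`). As a hedged sketch, the box can be taken as the padded bounding box of a set of atoms, for example the residues of the intended binding site. The function names `search_box_from_pdb` and `smina_box_args`, the default padding of 4 Å and the two-atom example are all assumptions for illustration.

```python
def search_box_from_pdb(pdb_text, padding=4.0):
    """Return (center, size) of the padded bounding box around ATOM/HETATM records."""
    coords = []
    for line in pdb_text.splitlines():
        if line.startswith(("ATOM", "HETATM")):
            # PDB fixed-width columns (1-based): x = 31-38, y = 39-46, z = 47-54
            coords.append((float(line[30:38]), float(line[38:46]), float(line[46:54])))
    if not coords:
        raise ValueError("no ATOM/HETATM records found")
    mins = [min(c[i] for c in coords) for i in range(3)]
    maxs = [max(c[i] for c in coords) for i in range(3)]
    center = tuple((lo + hi) / 2 for lo, hi in zip(mins, maxs))
    size = tuple((hi - lo) + 2 * padding for lo, hi in zip(mins, maxs))
    return center, size


def smina_box_args(center, size):
    """Format the box as command-line arguments for Smina."""
    args = []
    for axis, c, s in zip("xyz", center, size):
        args += [f"--center_{axis}", f"{c:.3f}", f"--size_{axis}", f"{s:.3f}"]
    return args


# Two synthetic atoms written in PDB column layout (illustrative data only)
example = "\n".join(
    f"ATOM  {i:5d}  CA  ALA A{i:4d}    {x:8.3f}{y:8.3f}{z:8.3f}"
    for i, (x, y, z) in enumerate([(0.0, 0.0, 0.0), (10.0, 20.0, 30.0)], start=1)
)
center, size = search_box_from_pdb(example, padding=4.0)
box_args = smina_box_args(center, size)
print(box_args)
```

The resulting argument list can be appended to the Smina command. In practice the box should cover only the binding site of interest, since larger boxes make the search slower and less focused.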

In [2]:
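import subprocess

def run_smina(ligand_path, protein_path, out_path, center=None, size=None):
    """Run a Smina docking job; center and size are optional (x, y, z) tuples for the search box."""
    cmd = [
        "smina", "--ligand", ligand_path, "--receptor", protein_path,
        "--out", out_path, "--log", out_path.replace(".sdf", ".log")
    ]
    # Smina requires a search box; append it when one is supplied
    if center is not None and size is not None:
        for axis, c, s in zip("xyz", center, size):
            cmd += [f"--center_{axis}", str(c), f"--size_{axis}", str(s)]
    try:
        subprocess.run(cmd, check=True)
        print(f"Docking completed. Results saved to {out_path}")
    except subprocess.CalledProcessError as e:
        print(f"Docking failed: {e}")
    except FileNotFoundError:
        print("Smina not found. Please ensure it is installed and in the PATH.")


# No search box is passed here, so Smina rejects the command and lists the
# required parameters; pass center=(x, y, z) and size=(x, y, z) for a real run.
run_smina("data/ligand.pdbqt", "data/protein.pdbqt", "data/docking_output.sdf")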


Docking failed: Command '['smina', '--ligand', 'data/ligand.pdbqt', '--receptor', 'data/protein.pdbqt', '--out', 'data/docking_output.sdf', '--log', 'data/docking_output.log']' returned non-zero exit status 1.


Required parameter --center_x is missing!
Required parameter --center_y is missing!
Required parameter --center_z is missing!
Required parameter --size_x is missing!
Required parameter --size_y is missing!
Required parameter --size_z is missing!

Correct usage:

Input:
  -r [ --receptor ] arg         rigid part of the receptor (PDBQT)
  --flex arg                    flexible side chains, if any (PDBQT)
  -l [ --ligand ] arg           ligand(s)
  --flexres arg                 flexible side chains specified by comma 
                                separated list of chain:resid or 
                                chain:resid:icode
  --flexdist_ligand arg         Ligand to use for flexdist
  --flexdist arg                set all side chains within specified distance 
                                to flexdist_ligand to flexible

Search space (required):
  --center_x arg                X coordinate of the center
  --center_y arg                Y coordinate of the center
  --center_z arg  