# 🧬 Multi-Candidate Peptide–Protein Binding Prediction Pipeline

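This notebook generates several peptide candidates with ProtGPT2 and, for each candidate, runs structure prediction, docking, interaction analysis, and binding affinity prediction (Pafnucy).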

- ProtGPT2 peptide generation → complex FASTA creation → structure prediction (ColabFold) → docking/PLIP/Pafnucy binding evaluation

## 0. Mount Google Drive

In [None]:
from google.colab import drive
import os

drive.mount('/content/drive')
work_dir = "/content/drive/MyDrive/peptide_docking_pipeline_multi"
os.makedirs(work_dir, exist_ok=True)
os.chdir(work_dir)
print(f"Working directory: {work_dir}")

## 1. Generate multiple peptide candidates with ProtGPT2

In [None]:
!pip install transformers sentencepiece

from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

tokenizer = AutoTokenizer.from_pretrained("nferruz/ProtGPT2")
model = AutoModelForCausalLM.from_pretrained("nferruz/ProtGPT2").to("cuda" if torch.cuda.is_available() else "cpu")

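N = 5  # number of candidates to generate (change as needed)

# ProtGPT2 is a protein language model, so it is prompted with its start-of-sequence
# token rather than a text instruction such as "generate:".
input_ids = tokenizer("<|endoftext|>", return_tensors="pt").input_ids.to(model.device)
outputs = model.generate(input_ids, max_length=30, num_return_sequences=N, do_sample=True,
                         top_k=950, top_p=0.96, pad_token_id=tokenizer.eos_token_id)

# The decoded text is FASTA-style with line breaks; strip all whitespace to get a plain sequence.
peptides = ["".join(tokenizer.decode(o, skip_special_tokens=True).split()) for o in outputs]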

# Save each peptide to its own FASTA file
for i, pep in enumerate(peptides):
    with open(f"peptide_{i}.fasta", "w") as f:
        f.write(f">pep{i}\n{pep}\n")

print("✅ Generated peptide candidates:")
for i, seq in enumerate(peptides):
    print(f"[{i}] {seq}")
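
Sampled sequences can be too short, too long, or contain non-standard residue codes, any of which may cause trouble in later steps. The following is a minimal sketch of a filter, assuming a length window of 5 to 50 residues; the helper name `clean_peptide` and the thresholds are illustrative choices, not part of the original pipeline.

```python
# The 20 standard amino acid one-letter codes.
STANDARD_AA = set("ACDEFGHIKLMNPQRSTVWY")

def clean_peptide(raw, min_len=5, max_len=50):
    """Return an uppercase, whitespace-free sequence, or None if it fails the checks.

    min_len and max_len are assumed thresholds; adjust them to the target system.
    """
    seq = "".join(raw.split()).upper()
    if not (min_len <= len(seq) <= max_len):
        return None
    if set(seq) - STANDARD_AA:  # any residue outside the standard alphabet
        return None
    return seq

# Hypothetical raw outputs: one with a line break, one too short, one lowercase.
candidates = ["MKTAYIAK\nQRQISF", "ACDX", "mkvlaagt"]
valid = [s for s in (clean_peptide(c) for c in candidates) if s is not None]
print(valid)
```

If candidates are dropped this way, more sequences would need to be sampled to keep `N` valid peptides for the later cells.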

## 2. Prepare the protein sequence

In [None]:
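# Replace this with the sequence of your target protein.
protein_sequence = "MTMKQLNDLENRLLGFLGNTILADATKSTQAKLEKELLGTTFGAEA"
with open("protein.fasta", "w") as f:
    # The trailing newline keeps the record intact when files are concatenated later.
    f.write(">protein\n" + protein_sequence + "\n")

print("✅ Protein sequence ready")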

## 3. Build complex inputs for structure prediction (ColabFold/AlphaFold-Multimer)

In [None]:
merged_files = []
pred_dirs = []

for i in range(N):
    fname = f"complex_{i}.fasta"
    # ColabFold treats each FASTA record as a separate job, so a complex must be
    # a single record with its chains joined by ":" (protein first, then peptide).
    with open(fname, "w") as out:
        out.write(f">complex_{i}\n{protein_sequence}:{peptides[i]}\n")
    merged_files.append(fname)
    pred_dirs.append(f"prediction_complex_{i}")

print("✅ Complex FASTA files and prediction folders ready:")
for f, d in zip(merged_files, pred_dirs):
    print(f"- {f} → {d}")
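
ColabFold's batch mode predicts one structure per FASTA record and reads `:` inside a sequence as a chain break, so it is worth checking that each complex file holds exactly one record with two chains before starting a long run. Below is a small sketch of such a check; `read_complex_fasta` is a hypothetical helper and the example sequences are placeholders.

```python
def read_complex_fasta(text):
    """Parse FASTA text into {record_name: [chain_sequences]}, splitting chains on ':'."""
    records = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Use the first word of the header as the record name.
            name = (line[1:].split() or ["unnamed"])[0]
            records[name] = ""
        elif name is not None:
            records[name] += line
    return {n: seq.split(":") for n, seq in records.items()}

# Placeholder record: one protein chain and one peptide chain.
example = ">complex_0\nMTMKQLNDLENRLLG:ACDEFGHIK\n"
parsed = read_complex_fasta(example)
for rec_name, chains in parsed.items():
    print(rec_name, len(chains), "chains")
```

A file that yields two records with one chain each would be predicted as two separate monomers rather than as a complex.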

## 4. Run structure prediction and collect the results for evaluation

In [None]:
try:
    import colabfold
except ImportError:
    !pip install -q colabfold

for fasta_file, out_dir in zip(merged_files, pred_dirs):
    print(f"Running colabfold_batch for {fasta_file} → {out_dir}")
    !colabfold_batch {fasta_file} {out_dir}

In [None]:
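import os
import glob

pdb_paths = []
for i, pred_dir in enumerate(pred_dirs):
    # Output file names vary by ColabFold version (e.g. "..._rank_001_..." or "..._rank_1_..."),
    # so match on "rank_" and take the first file in sorted order, i.e. the top-ranked model.
    ranked = sorted(glob.glob(f"{pred_dir}/*rank_*.pdb"))
    if ranked:
        pdb_file = ranked[0]
        print(f"[{i}] Structure prediction complete: {pdb_file}")
        pdb_paths.append(pdb_file)
    else:
        print(f"[{i}] ❌ Prediction failed or missing: no ranked PDB in {pred_dir}")

if len(pdb_paths) == 0:
    raise RuntimeError("⛔ No predicted structure files found! Check the ColabFold run.")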
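
AlphaFold-family models store the per-residue confidence (pLDDT) in the B-factor column of the output PDB, so a quick per-chain average can flag complexes where the peptide is placed with low confidence before any docking time is spent. The sketch below parses fixed-width `ATOM` records directly; the helper names and the example coordinates are made up for illustration.

```python
from collections import defaultdict

def mean_plddt_by_chain(pdb_text):
    """Average the B-factor column (pLDDT in AlphaFold output) over CA atoms per chain."""
    totals = defaultdict(lambda: [0.0, 0])
    for line in pdb_text.splitlines():
        if line.startswith("ATOM") and line[12:16].strip() == "CA":
            chain = line[21]
            totals[chain][0] += float(line[60:66])
            totals[chain][1] += 1
    return {chain: total / count for chain, (total, count) in totals.items()}

def atom_line(serial, name, res, chain, resseq, x, y, z, b):
    """Format one fixed-width PDB ATOM record (used only to build the example)."""
    return (f"ATOM  {serial:5d} {name:<4s} {res:>3s} {chain}{resseq:4d}    "
            f"{x:8.3f}{y:8.3f}{z:8.3f}{1.0:6.2f}{b:6.2f}")

# Made-up example: two residues in chain A (protein), two in chain B (peptide).
example_pdb = "\n".join([
    atom_line(1, "N", "MET", "A", 1, 0.0, 0.0, 0.0, 10.0),   # not a CA atom, ignored
    atom_line(2, "CA", "MET", "A", 1, 1.0, 0.0, 0.0, 90.0),
    atom_line(3, "CA", "THR", "A", 2, 2.0, 0.0, 0.0, 80.0),
    atom_line(4, "CA", "ALA", "B", 1, 5.0, 5.0, 5.0, 60.0),
    atom_line(5, "CA", "GLY", "B", 2, 6.0, 5.0, 5.0, 40.0),
])
plddt = mean_plddt_by_chain(example_pdb)
print(plddt)
```

In the notebook this could be applied to each entry of `pdb_paths` by reading the file contents and passing them to `mean_plddt_by_chain`; what counts as an acceptable peptide pLDDT is a judgment call.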

## 5. Binding evaluation and scoring with docking, PLIP, and Pafnucy

In [None]:
!apt-get install -y openbabel
!pip install -q plip
!git clone https://github.com/oddt/pafnucy.git
%cd pafnucy
!pip install -q -r requirements.txt
%cd ..

from plip.structure.preparation import PDBComplex
import pandas as pd

results = []

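# Install AutoDock Vina once, before the loop. The 1.2.x releases ship a bare executable.
vina_bin = "./vina_1.2.3_linux_x86_64"
if not os.path.exists(vina_bin):
    !wget -q https://github.com/ccsb-scripps/AutoDock-Vina/releases/download/v1.2.3/vina_1.2.3_linux_x86_64
    !chmod +x vina_1.2.3_linux_x86_64

def split_chain(pdb_path, chain_id, out_path):
    """Write the ATOM records of one chain to out_path and return their coordinates."""
    coords = []
    with open(pdb_path) as src, open(out_path, "w") as dst:
        for line in src:
            if line.startswith("ATOM") and line[21] == chain_id:
                dst.write(line)
                coords.append([float(line[30:38]), float(line[38:46]), float(line[46:54])])
        dst.write("END\n")
    return coords

for pred_pdb in pdb_paths:
    # Recover the candidate index from the prediction folder so that peptides[i]
    # stays aligned even when some predictions are missing.
    i = pred_dirs.index(os.path.dirname(pred_pdb))

    # Chains follow the input order: A = protein (receptor), B = peptide (ligand).
    split_chain(pred_pdb, "A", f"receptor_{i}.pdb")
    pep_coords = split_chain(pred_pdb, "B", f"ligand_{i}.pdb")
    if not pep_coords:
        print(f"[{i}] ❌ No peptide chain (B) in {pred_pdb}; skipping")
        continue
    os.system(f"obabel receptor_{i}.pdb -O receptor_{i}.pdbqt -xr")  # -xr: rigid receptor
    os.system(f"obabel ligand_{i}.pdb -O ligand_{i}.pdbqt")

    # Center the search box on the predicted peptide position instead of the origin.
    cx, cy, cz = (sum(c[k] for c in pep_coords) / len(pep_coords) for k in range(3))

    # Vina 1.2 no longer accepts --log; the scores are written into the output PDBQT.
    vina_cmd = (f"{vina_bin} --receptor receptor_{i}.pdbqt --ligand ligand_{i}.pdbqt "
                f"--center_x {cx:.3f} --center_y {cy:.3f} --center_z {cz:.3f} "
                f"--size_x 20 --size_y 20 --size_z 20 --out output_{i}.pdbqt")
    os.system(vina_cmd)

    # The first "REMARK VINA RESULT" line belongs to the best-scoring pose.
    vina_score = None
    if os.path.exists(f"output_{i}.pdbqt"):
        with open(f"output_{i}.pdbqt") as f:
            for line in f:
                if "REMARK VINA RESULT" in line:
                    vina_score = float(line.strip().split()[3])
                    break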

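    # PLIP interaction analysis
    # Vina's output file holds only the docked peptide poses, so the predicted complex
    # is analyzed instead, with chain B (the peptide) treated as the ligand.
    from plip.basic import config
    config.PEPTIDES = ["B"]
    structure = PDBComplex()
    structure.load_pdb(pred_pdb)
    structure.analyze()
    interaction_count = 0
    for inter in structure.interaction_sets.values():
        interaction_count += (len(inter.hbonds_ldon) + len(inter.hbonds_pdon)
                              + len(inter.hydrophobic_contacts)
                              + len(inter.saltbridge_lneg) + len(inter.saltbridge_pneg))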

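    # Pafnucy evaluation
    # Assumes predict.py accepts --pdb/--out and writes a 'predicted_affinity' column;
    # check these against the cloned Pafnucy version before running.
    os.system(f"obabel output_{i}.pdbqt -O complex_final_{i}.pdb")
    os.system(f"python pafnucy/predict.py --pdb complex_final_{i}.pdb --out affinity_{i}.csv")
    pafnucy_df = pd.read_csv(f"affinity_{i}.csv")
    pafnucy_affinity = float(pafnucy_df['predicted_affinity'].iloc[0])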

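    # Skip candidates whose docking run produced no score.
    if vina_score is None:
        print(f"[{i}] ❌ No Vina score found; skipping")
        continue

    # Higher is better for every term: Vina energies are negated (more negative means
    # stronger binding), while Pafnucy predicts pK (-log Kd/Ki), which already grows
    # with affinity and is therefore added as-is.
    final_score = (-1 * vina_score) + pafnucy_affinity + (0.5 * interaction_count)
    results.append({
        "index": i,
        "peptide": peptides[i],
        "vina_score": vina_score,
        "pafnucy": pafnucy_affinity,
        "interaction": interaction_count,
        "final_score": final_score
    })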

df = pd.DataFrame(results)
df_sorted = df.sort_values("final_score", ascending=False)
df_sorted.to_csv("peptide_binding_rank.csv", index=False)
print("✅ Ranking results (top 5):")
display(df_sorted.head())

print("Full results saved to: peptide_binding_rank.csv")
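
The final score adds three quantities on different scales (Vina energy in kcal/mol, Pafnucy pK, and a raw interaction count), so whichever term has the widest spread tends to dominate the ranking. One possible alternative, sketched here with equal weights as an assumption, is to convert each metric to z-scores across candidates before summing. The function names and the example values are hypothetical.

```python
from statistics import mean, pstdev

def zscores(values):
    """Standardize values to zero mean and unit variance (all zeros if there is no spread)."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]

def combined_rank(rows):
    """Rank rows (dicts with vina_score, pafnucy, interaction) by the sum of per-metric z-scores."""
    vina_z = zscores([-r["vina_score"] for r in rows])  # negate: more negative Vina is better
    paf_z = zscores([r["pafnucy"] for r in rows])
    int_z = zscores([r["interaction"] for r in rows])
    scored = [dict(r, z_score=v + p + c) for r, v, p, c in zip(rows, vina_z, paf_z, int_z)]
    return sorted(scored, key=lambda r: r["z_score"], reverse=True)

# Made-up values for three candidates, for illustration only.
rows = [
    {"index": 0, "vina_score": -7.0, "pafnucy": 6.0, "interaction": 10},
    {"index": 1, "vina_score": -5.0, "pafnucy": 5.0, "interaction": 4},
    {"index": 2, "vina_score": -6.0, "pafnucy": 5.5, "interaction": 7},
]
ranked = combined_rank(rows)
print([r["index"] for r in ranked])
```

With the notebook's data, `results` could be passed to `combined_rank` directly, since it already holds dictionaries with these keys. With only a handful of candidates the z-scores are noisy, so this is best read as a relative ordering rather than an absolute measure.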