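# Option 1: Tune ColabFold complex PDB poses with Vina, then run the existing pipeline from Step 4 through the final evaluation
- Input: a results folder from an existing pepbind_pipeline.py run (e.g. PDP_20251211_165304)
- Output: a PDP_20251211_165304_op1_vina_tuned folder created next to the input folder
- Tuning: a Vina pose search produces a tuned copy of each complex PDB
- Evaluation: the tuned PDBs go through the same steps as the original pipeline, from Step 4 (Vina) to the final table
- Final results: xlsx files saved under PDP_..._op1_vina_tuned/results/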

In [1]:
# %% [markdown]
# 0) Environment setup
# - pepbind_pipeline.py must be importable
# - Vina, OpenBabel (obabel), PLIP and PRODIGY must be on PATH (assumes the pepbind environment)

from pathlib import Path
import importlib
import shutil
import subprocess
import copy
import csv

import numpy as np
from Bio.PDB import PDBParser, PDBIO

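import pepbind_pipeline as pb
pb = importlib.reload(pb)  # re-executes the module so edits to pepbind_pipeline.py are picked up without restarting the kernel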

print("VINA_CMD   :", pb.VINA_CMD)
print("OBABEL_CMD :", pb.OBABEL_CMD)
print("PLIP_CMD   :", pb.PLIP_CMD)
print("PRODIGY_SCRIPT:", pb.PRODIGY_SCRIPT)



[INFO] PyTorch device: cuda
[INFO] PyTorch device: cuda
VINA_CMD   : vina
OBABEL_CMD : /home/aisys/miniconda3/envs/pepbind/bin/obabel
PLIP_CMD   : plip
PRODIGY_SCRIPT: prodigy


In [2]:
# %% [markdown]
# 1) Set the input RUN_ID and auto-locate the existing RUN_DIR

RUN_ID = "PDP_20251222_162855"   # ← change this to select a different run

def find_run_dir(run_id: str) -> Path:
    cwd = Path.cwd().resolve()
    candidate_dirs = [
        getattr(pb, "BASE_DIR", Path("~/work/pipeline").expanduser()) / run_id,
        cwd / "pipeline" / run_id,
        cwd.parent / "pipeline" / run_id,
        cwd.parent.parent / "pipeline" / run_id,
        cwd / run_id,
    ]
    print("===== RUN_DIR candidates =====")
    for p in candidate_dirs:
        print(" -", p, "OK" if p.exists() else "MISSING")
    for p in candidate_dirs:
        if p.exists():
            return p
    raise FileNotFoundError(f"Could not find the folder for RUN_ID={run_id}. Check the candidates above and set RUN_DIR manually.")

RUN_DIR = find_run_dir(RUN_ID)

FASTA_DIR      = RUN_DIR / "fasta"
PDB_DIR        = RUN_DIR / "pdb"
COLABFOLD_DIR  = PDB_DIR / "colabfold_output"

print("\n===== Input RUN_DIR =====")
print("RUN_DIR     :", RUN_DIR)
print("COLABFOLD   :", COLABFOLD_DIR, "OK" if COLABFOLD_DIR.exists() else "MISSING")

===== RUN_DIR candidates =====
 - /home/aisys/work/pipeline/PDP_20251222_162855 OK
 - /home/aisys/work/pipeline/pipeline/PDP_20251222_162855 MISSING
 - /home/aisys/work/pipeline/PDP_20251222_162855 OK
 - /home/aisys/pipeline/PDP_20251222_162855 MISSING
 - /home/aisys/work/pipeline/PDP_20251222_162855 OK

===== Input RUN_DIR =====
RUN_DIR     : /home/aisys/work/pipeline/PDP_20251222_162855
COLABFOLD   : /home/aisys/work/pipeline/PDP_20251222_162855/pdb/colabfold_output OK
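Three of the five candidates in the output above print the same directory, because `pb.BASE_DIR`, `cwd.parent / "pipeline"` and `cwd` all resolve to `/home/aisys/work/pipeline` in this run. If the repeated lines are distracting, `candidate_dirs` could be de-duplicated in order before printing. A minimal sketch; `unique_paths` is a hypothetical helper, not part of pepbind_pipeline:

```python
from pathlib import Path


def unique_paths(paths):
    """Drop duplicate paths, keeping the first occurrence and the original order.

    Paths are compared after expanduser().resolve(), so different spellings of the
    same directory collapse to one entry; resolve() does not require the path to exist.
    """
    first_seen = {}
    for p in map(Path, paths):
        first_seen.setdefault(p.expanduser().resolve(), p)
    return list(first_seen.values())


# The first and third entries are the same directory, as in the output above
candidates = [
    Path("/home/aisys/work/pipeline/PDP_20251222_162855"),
    Path("/home/aisys/work/pipeline/pipeline/PDP_20251222_162855"),
    Path("/home/aisys/work/pipeline/PDP_20251222_162855"),
]
for p in unique_paths(candidates):
    print(" -", p)
```

Inside `find_run_dir`, this would amount to `candidate_dirs = unique_paths(candidate_dirs)` before the first loop; the search order is unchanged.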


In [3]:
# %% [markdown]
# 2) Create the output folders
# - Copy the original colabfold_output into PDP_..._op1_vina_tuned, then add the tuned PDBs alongside it.
# - Result Excel files etc. are written under OUT_RUN_DIR/results/.

OUT_RUN_ID  = f"{RUN_ID}_op1_vina_tuned"
OUT_RUN_DIR = RUN_DIR.parent / OUT_RUN_ID

OUT_FASTA_DIR     = OUT_RUN_DIR / "fasta"
OUT_PDB_DIR       = OUT_RUN_DIR / "pdb"
OUT_COLABFOLD_DIR = OUT_PDB_DIR / "colabfold_output"
OUT_RESULTS_DIR   = OUT_RUN_DIR / "results"
OUT_VINA_TUNE_DIR = OUT_RESULTS_DIR / "vina_tune"
OUT_TEMP_DIR      = OUT_RUN_DIR / "temp"

print("OUT_RUN_DIR :", OUT_RUN_DIR)

OUT_RUN_DIR.mkdir(parents=True, exist_ok=True)
for d in [OUT_FASTA_DIR, OUT_PDB_DIR, OUT_RESULTS_DIR, OUT_VINA_TUNE_DIR, OUT_TEMP_DIR]:
    d.mkdir(parents=True, exist_ok=True)

# Copy fasta
if FASTA_DIR.exists():
    shutil.copytree(FASTA_DIR, OUT_FASTA_DIR, dirs_exist_ok=True)

# Copy colabfold_output (skipped if it already exists)
if COLABFOLD_DIR.exists():
    if not OUT_COLABFOLD_DIR.exists():
        shutil.copytree(COLABFOLD_DIR, OUT_COLABFOLD_DIR)
    else:
        print("[INFO] OUT_COLABFOLD_DIR already exists → skipping copy")
else:
    raise FileNotFoundError(f"Input colabfold_output folder not found: {COLABFOLD_DIR}")

print("\n===== Output folders ready =====")
print("OUT_COLABFOLD_DIR:", OUT_COLABFOLD_DIR)
print("OUT_RESULTS_DIR  :", OUT_RESULTS_DIR)

OUT_RUN_DIR : /home/aisys/work/pipeline/PDP_20251222_162855_op1_vina_tuned

===== Output folders ready =====
OUT_COLABFOLD_DIR: /home/aisys/work/pipeline/PDP_20251222_162855_op1_vina_tuned/pdb/colabfold_output
OUT_RESULTS_DIR  : /home/aisys/work/pipeline/PDP_20251222_162855_op1_vina_tuned/results


In [4]:
# %% [markdown]
# 3) Collect the PDBs to tune
# - By default only files whose name contains rank_001 are used

src_rank1_pdbs = sorted(COLABFOLD_DIR.glob("*rank_001*.pdb"))
print("src_rank1_pdbs:", len(src_rank1_pdbs))
if src_rank1_pdbs:
    print("Example:", src_rank1_pdbs[0].name)
else:
    print("[WARN] No rank_001 PDBs found. Adjust the glob pattern if needed.")

src_rank1_pdbs: 20
Example: complex_0_unrelaxed_rank_001_alphafold2_multimer_v3_model_1_seed_000.pdb
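Before running the tuning loop, it can help to preview which file names a rank_001 pattern will pick up, since colabfold_output also holds JSON score files and other ranks. `fnmatch.fnmatchcase` applies essentially the same shell-style wildcards as a single `Path.glob` component (case-sensitive, as `Path.glob` is on Linux). A sketch on a hypothetical listing modelled on the names above; `select_names` is an illustrative helper:

```python
from fnmatch import fnmatchcase

# Hypothetical directory listing modelled on the ColabFold file names in the output above
names = [
    "complex_0_unrelaxed_rank_001_alphafold2_multimer_v3_model_1_seed_000.pdb",
    "complex_0_unrelaxed_rank_002_alphafold2_multimer_v3_model_2_seed_000.pdb",
    "complex_0_unrelaxed_rank_001_alphafold2_multimer_v3_model_1_seed_000.pdbqt",
    "complex_0_scores_rank_001_alphafold2_multimer_v3_model_1_seed_000.json",
]


def select_names(file_names, pattern):
    """Names a single-component glob pattern would select (case-sensitive)."""
    return [n for n in file_names if fnmatchcase(n, pattern)]


for pattern in ("*rank_001*.pdb", "*rank_001*"):
    print(pattern, "->", select_names(names, pattern))
```

Without the `.pdb` ending, the pattern also pulls in the `.pdbqt` and scores JSON files, which the tuning loop would then try to parse as complexes.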


In [5]:
# %% [markdown]
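# 4) Vina tuning parameters
# - padding: extra margin (Å) around the ligand for the search box. Smaller values make the run closer to a local search around the current pose.
# - exhaustiveness: search effort (higher is slower)
# - num_modes: number of poses to save; 1 keeps only the best pose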

PADDING_ANGSTROM = 6.0
EXHAUSTIVENESS = 16
NUM_MODES = 1
CPU = 0          # 0 = do not pass --cpu, so Vina uses its default (auto-detect)
SEED = 42

print("PADDING_ANGSTROM:", PADDING_ANGSTROM)
print("EXHAUSTIVENESS  :", EXHAUSTIVENESS)
print("NUM_MODES       :", NUM_MODES)
print("CPU             :", CPU)
print("SEED            :", SEED)

PADDING_ANGSTROM: 6.0
EXHAUSTIVENESS  : 16
NUM_MODES       : 1
CPU             : 0
SEED            : 42
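`pb.compute_box_from_ligand` is not shown here, so its exact formula is unknown. To illustrate how `PADDING_ANGSTROM` could shape the search box, here is a sketch under one common convention (box centred on the ligand's axis-aligned bounding box, padding added on every side). The helper `box_from_coords` is hypothetical; its output keys mirror the `center_*` / `size_*` keys that the tuning cell reads when building the Vina command.

```python
def box_from_coords(coords, padding=6.0):
    """Axis-aligned Vina search box around (x, y, z) coordinates.

    Assumed convention (may differ from pb.compute_box_from_ligand):
    center = midpoint of the bounding box, size = extent + 2 * padding per axis.
    """
    if not coords:
        raise ValueError("need at least one coordinate")
    box = {}
    for axis, values in zip("xyz", zip(*coords)):
        lo, hi = min(values), max(values)
        box[f"center_{axis}"] = (lo + hi) / 2.0
        box[f"size_{axis}"] = (hi - lo) + 2.0 * padding
    return box


ligand_xyz = [(0.0, 0.0, 0.0), (4.0, 2.0, -2.0), (2.0, 1.0, 1.0)]
print(box_from_coords(ligand_xyz, padding=6.0))
```

Under this convention a ligand 4 Å wide along x gets a 16 Å box with 6 Å padding, so the padding, not the ligand size, dominates the search volume for short peptides.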


In [6]:
# %% [markdown]
# 5) Helper functions

def assert_tools_available():
    # obabel
    try:
        res = subprocess.run([pb.OBABEL_CMD, "-V"], capture_output=True, text=True)
        if res.returncode != 0:
            raise RuntimeError(res.stderr[:300])
    except FileNotFoundError:
        raise RuntimeError("obabel not found (install openbabel via conda/apt)") from None
    # vina: only check that --help printed something (the exit code is not checked)
    try:
        res = subprocess.run([pb.VINA_CMD, "--help"], capture_output=True, text=True)
        if (res.stdout or "") == "" and (res.stderr or "") == "":
            raise RuntimeError("vina check failed: --help produced no output")
    except FileNotFoundError:
        raise RuntimeError("vina not found (install autodock-vina via conda/apt, or set VINA_CMD)") from None

assert_tools_available()
print("[OK] vina / obabel found")


def obabel_pdbqt_to_pdb(pdbqt_path: Path, out_pdb_path: Path):
    cmd = [pb.OBABEL_CMD, "-ipdbqt", str(pdbqt_path), "-opdb", "-O", str(out_pdb_path)]
    res = subprocess.run(cmd, capture_output=True, text=True)
    if res.returncode != 0:
        raise RuntimeError(f"obabel pdbqt->pdb conversion failed: {pdbqt_path.name}\n{res.stderr[:300]}")
    return out_pdb_path


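def patch_coords_preserve_topology(original_lig_pdb: Path, tuned_lig_pdb: Path, out_pdb: Path) -> Path:
    """
    Copy the coordinates from tuned_lig_pdb onto the original_lig_pdb structure
    (keeping chain ID, residue numbering and atom names) and save the result to out_pdb.

    Atoms are matched by (residue number, insertion code, atom name) instead of by file
    order, because the PDB -> PDBQT conversion can reorder atoms. This assumes obabel keeps
    residue numbers and atom names through the round trip. If any original atom has no
    match, tuned_lig_pdb is copied as-is (chain ID/numbering may then differ).
    """
    parser = PDBParser(QUIET=True)
    orig = parser.get_structure("orig", str(original_lig_pdb))
    tuned = parser.get_structure("tuned", str(tuned_lig_pdb))

    def atom_key(atom):
        _, resseq, icode = atom.get_parent().get_id()
        return (resseq, icode.strip(), atom.get_id())

    orig_atoms = list(next(orig.get_models()).get_atoms())
    # Extra atoms in the tuned file (e.g. polar hydrogens) are simply ignored
    tuned_coords = {atom_key(a): a.get_coord() for a in next(tuned.get_models()).get_atoms()}

    if any(atom_key(a) not in tuned_coords for a in orig_atoms):
        shutil.copy2(tuned_lig_pdb, out_pdb)
        return out_pdb

    for a in orig_atoms:
        a.set_coord(tuned_coords[atom_key(a)])

    io = PDBIO()
    io.set_structure(orig)
    io.save(str(out_pdb))
    return out_pdb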


def merge_receptor_and_ligand(rec_pdb: Path, lig_pdb: Path, out_complex_pdb: Path) -> Path:
    parser = PDBParser(QUIET=True)
    rec = parser.get_structure("rec", str(rec_pdb))
    lig = parser.get_structure("lig", str(lig_pdb))

    rec_model = next(rec.get_models())
    lig_model = next(lig.get_models())

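    import string
    from Bio.PDB.Structure import Structure
    from Bio.PDB.Model import Model

    new_struct = Structure("complex")
    new_model = Model(0)
    new_struct.add(new_model)

    for ch in rec_model:
        new_model.add(copy.deepcopy(ch))

    used_ids = {c.id for c in new_model}
    for ch in lig_model:
        ch2 = copy.deepcopy(ch)
        ch2.detach_parent()  # the deep copy still points at a copy of the ligand model
        if ch2.id in used_ids:
            # Pick the first unused chain ID instead of assuming "B"/"C" are free
            ch2.id = next(c for c in string.ascii_uppercase if c not in used_ids)
        used_ids.add(ch2.id)
        new_model.add(ch2)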

    io = PDBIO()
    io.set_structure(new_struct)
    io.save(str(out_complex_pdb))
    return out_complex_pdb

[OK] vina / obabel found
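Converting the ligand to PDBQT can write its atoms in a different order from the input PDB (PDBQT lists atoms in torsion-tree order), so copying coordinates atom-by-atom in file order risks assigning them to the wrong atoms. As an alternative that works directly on PDB text lines, here is a stdlib-only sketch that pairs atoms by residue number, insertion code and atom name. The function names are hypothetical, and the approach assumes obabel keeps residue numbers and atom names through the round trip, which is worth checking on one file.

```python
ATOM_RECORDS = ("ATOM", "HETATM")


def atom_key(line):
    """(residue number, insertion code, atom name) from PDB columns 23-26, 27, 13-16."""
    return (line[22:26].strip(), line[26:27].strip(), line[12:16].strip())


def patch_pdb_coords(orig_lines, tuned_lines):
    """Copy x/y/z (columns 31-54) from tuned records onto the matching original records.

    Returns the patched original lines, or None if any original atom has no partner.
    Extra tuned atoms (e.g. added hydrogens) are ignored.
    """
    tuned_xyz = {
        atom_key(line): line[30:54]
        for line in tuned_lines
        if line.startswith(ATOM_RECORDS) and len(line) >= 54
    }
    patched = []
    for line in orig_lines:
        if line.startswith(ATOM_RECORDS):
            xyz = tuned_xyz.get(atom_key(line))
            if xyz is None:
                return None
            line = line[:30] + xyz + line[54:]
        patched.append(line)
    return patched


def pdb_atom(serial, name, resname, chain, resseq, x, y, z):
    """Build a minimal fixed-column ATOM record (used only for the example below)."""
    return (f"ATOM  {serial:5d} {name:<4} {resname:>3} {chain:1}{resseq:4d}    "
            f"{x:8.3f}{y:8.3f}{z:8.3f}")


# Original ligand (chain B) and a "tuned" copy with reordered atoms and a blank chain ID
orig = [pdb_atom(1, "N", "GLY", "B", 1, 0.0, 0.0, 0.0),
        pdb_atom(2, "CA", "GLY", "B", 1, 1.0, 0.0, 0.0),
        "END"]
tuned = [pdb_atom(1, "CA", "GLY", " ", 1, 5.0, 5.0, 5.0),
         pdb_atom(2, "N", "GLY", " ", 1, 4.0, 4.0, 4.0)]

for line in patch_pdb_coords(orig, tuned):
    print(line)
```

The original chain ID, serial numbers and residue names survive; only the coordinate columns change, and a missing atom makes the function return None rather than silently mixing poses.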


In [7]:
# %% [markdown]
# 6) Run Vina tuning
# - Writes *_op1_vina_tuned.pdb into OUT_COLABFOLD_DIR
# - Summary CSV: OUT_VINA_TUNE_DIR/vina_tune_summary.csv

summary_rows = []
summary_csv = OUT_VINA_TUNE_DIR / "vina_tune_summary.csv"

for complex_pdb in src_rank1_pdbs:
    base = complex_pdb.stem
    out_subdir = OUT_VINA_TUNE_DIR / base
    out_subdir.mkdir(parents=True, exist_ok=True)

    log_file = out_subdir / f"{base}_vina_tune.log"

    print("\n" + "="*80)
    print("[TUNE]", complex_pdb.name)

    chain_counts = pb.get_chain_residue_counts(complex_pdb)
    rec_chain, lig_chain = pb.auto_assign_receptor_ligand(chain_counts, prefer_receptor="A")

    row = {
        "complex": complex_pdb.name,
        "rec_chain": rec_chain,
        "lig_chain": lig_chain,
        "vina_score": "",
        "status": "",
        "tuned_pdb": "",
        "log_file": str(log_file),
    }

    if rec_chain is None or lig_chain is None:
        row["status"] = f"SKIP: receptor/ligand chain detection failed (chains={chain_counts})"
        summary_rows.append(row)
        print("[SKIP]", row["status"])
        continue

    try:
        # 1) Split the complex into receptor and ligand chains
        rec_pdb, lig_pdb = pb.split_complex_to_receptor_ligand(
            complex_pdb=complex_pdb,
            out_dir=out_subdir,
            receptor_chain=rec_chain,
            ligand_chain=lig_chain,
        )

        # 2) Prepare PDBQT files
        rec_pdbqt, lig_pdbqt = pb.prepare_pdbqt(rec_pdb, lig_pdb, out_subdir)

        # 3) Compute the search box around the original ligand pose
        box = pb.compute_box_from_ligand(lig_pdb, padding=PADDING_ANGSTROM)

        out_pdbqt = out_subdir / f"{base}_vina_out.pdbqt"

        # 4) Run Vina
        cmd = [
            pb.VINA_CMD,
            "--receptor", str(rec_pdbqt),
            "--ligand", str(lig_pdbqt),
            "--center_x", f"{box['center_x']:.3f}",
            "--center_y", f"{box['center_y']:.3f}",
            "--center_z", f"{box['center_z']:.3f}",
            "--size_x",   f"{box['size_x']:.3f}",
            "--size_y",   f"{box['size_y']:.3f}",
            "--size_z",   f"{box['size_z']:.3f}",
            "--out", str(out_pdbqt),
            "--exhaustiveness", str(EXHAUSTIVENESS),
            "--num_modes", str(NUM_MODES),
        ]
        if CPU and int(CPU) > 0:
            cmd += ["--cpu", str(CPU)]
        if SEED is not None:
            cmd += ["--seed", str(int(SEED))]

        result = subprocess.run(cmd, capture_output=True, text=True)

        log_file.write_text(
            "=== CMD ===\n" + " ".join(cmd) + "\n\n"
            "=== STDOUT ===\n" + (result.stdout or "") + "\n\n"
            "=== STDERR ===\n" + (result.stderr or "") + "\n",
            encoding="utf-8",
        )

        if result.returncode != 0:
            row["status"] = f"ERROR: vina returncode={result.returncode}"
            summary_rows.append(row)
            print("[ERROR] vina failed:", row["status"])
            continue

        score = pb.parse_vina_score_from_stdout(result.stdout or "")
        row["vina_score"] = score if score is not None else ""
        row["status"] = "OK"

        # 5) Convert out_pdbqt -> PDB (ligand only)
        lig_tuned_raw_pdb = out_subdir / f"{base}_ligand_tuned_raw.pdb"
        obabel_pdbqt_to_pdb(out_pdbqt, lig_tuned_raw_pdb)

        # 6) Overwrite only the coordinates, keeping the original ligand topology
        lig_tuned_pdb = out_subdir / f"{base}_ligand_tuned.pdb"
        patch_coords_preserve_topology(lig_pdb, lig_tuned_raw_pdb, lig_tuned_pdb)

        # 7) Merge the receptor with the tuned ligand
        out_complex_pdb = OUT_COLABFOLD_DIR / f"{base}_op1_vina_tuned.pdb"
        merge_receptor_and_ligand(rec_pdb, lig_tuned_pdb, out_complex_pdb)

        tuned_chains = pb.get_chain_residue_counts(out_complex_pdb)
        row["tuned_pdb"] = str(out_complex_pdb)
        row["status"] = f"OK (chains={tuned_chains})"

        print("[OK] vina_score:", row["vina_score"])
        print("[OK] tuned_pdb :", out_complex_pdb.name)

    except Exception as e:
        row["status"] = f"ERROR: {type(e).__name__}: {e}"
        print("[ERROR]", row["status"])

    summary_rows.append(row)

# Save the summary CSV
with open(summary_csv, "w", newline="", encoding="utf-8") as f:
    writer = csv.DictWriter(
        f,
        fieldnames=["complex","rec_chain","lig_chain","vina_score","status","tuned_pdb","log_file"],
    )
    writer.writeheader()
    writer.writerows(summary_rows)

print("\n===== Tuning finished =====")
print("OUT_RUN_DIR :", OUT_RUN_DIR)
print("summary_csv :", summary_csv)


[TUNE] complex_0_unrelaxed_rank_001_alphafold2_multimer_v3_model_1_seed_000.pdb
[RUN] /home/aisys/miniconda3/envs/pepbind/bin/obabel -ipdb /home/aisys/work/pipeline/PDP_20251222_162855_op1_vina_tuned/results/vina_tune/complex_0_unrelaxed_rank_001_alphafold2_multimer_v3_model_1_seed_000/complex_0_unrelaxed_rank_001_alphafold2_multimer_v3_model_1_seed_000_receptor_A.pdb -xr -opdbqt -O /home/aisys/work/pipeline/PDP_20251222_162855_op1_vina_tuned/results/vina_tune/complex_0_unrelaxed_rank_001_alphafold2_multimer_v3_model_1_seed_000/complex_0_unrelaxed_rank_001_alphafold2_multimer_v3_model_1_seed_000_receptor_A.pdbqt
[RUN] /home/aisys/miniconda3/envs/pepbind/bin/obabel -ipdb /home/aisys/work/pipeline/PDP_20251222_162855_op1_vina_tuned/results/vina_tune/complex_0_unrelaxed_rank_001_alphafold2_multimer_v3_model_1_seed_000/complex_0_unrelaxed_rank_001_alphafold2_multimer_v3_model_1_seed_000_ligand_B.pdb -opdbqt -O /home/aisys/work/pipeline/PDP_20251222_162855_op1_vina_tuned/results/vina_tune/
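Once the loop finishes, `vina_tune_summary.csv` holds one row per complex whose `status` starts with `OK (chains=...)`, `SKIP: ...` or `ERROR: ...`. A minimal sketch for tallying those outcomes; `tally_status` is a hypothetical helper, and the sample rows below are made up:

```python
import csv
import io
from collections import Counter


def tally_status(csv_text):
    """Count summary rows by the leading status word (OK / SKIP / ERROR)."""
    counts = Counter()
    for row in csv.DictReader(io.StringIO(csv_text)):
        status = (row.get("status") or "").strip()
        # "OK (chains=...)" -> "OK", "ERROR: ..." -> "ERROR", "SKIP: ..." -> "SKIP"
        word = status.split(":", 1)[0].split(" ", 1)[0] or "EMPTY"
        counts[word] += 1
    return counts


sample = (
    "complex,rec_chain,lig_chain,vina_score,status,tuned_pdb,log_file\n"
    "c0.pdb,A,B,-7.1,\"OK (chains={'A': 222, 'B': 4})\",c0_op1_vina_tuned.pdb,c0.log\n"
    "c1.pdb,A,B,-6.8,\"OK (chains={'A': 222, 'B': 4})\",c1_op1_vina_tuned.pdb,c1.log\n"
    "c2.pdb,A,B,,ERROR: vina returncode=1,,c2.log\n"
    "c3.pdb,,,,SKIP: receptor/ligand chain detection failed,,c3.log\n"
)
print(tally_status(sample))
```

In the notebook it could be called as `tally_status(summary_csv.read_text(encoding="utf-8"))` right after the summary is written.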

In [8]:
# %% [markdown]
# 7) Collect the tuned rank1 PDBs
# - Step 4 through the final evaluation use only this list

rank1_pdbs = sorted(OUT_COLABFOLD_DIR.glob("*_op1_vina_tuned.pdb"))
print("tuned rank1_pdbs:", len(rank1_pdbs))
if rank1_pdbs:
    print("Example:", rank1_pdbs[0].name)
else:
    raise FileNotFoundError("No tuned PDBs found (check OUT_COLABFOLD_DIR).")

tuned rank1_pdbs: 20
Example: complex_0_unrelaxed_rank_001_alphafold2_multimer_v3_model_1_seed_000_op1_vina_tuned.pdb
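Here 20 tuned files were found for 20 inputs. When the counts differ, the inputs whose tuning was skipped or failed can be listed by comparing stems. A sketch with a hypothetical helper `missing_tuned`, which could be called as `missing_tuned([p.name for p in src_rank1_pdbs], [p.name for p in rank1_pdbs])`:

```python
from pathlib import PurePath


def missing_tuned(src_names, tuned_names, suffix="_op1_vina_tuned"):
    """Stems of source PDBs that have no '<stem><suffix>.pdb' among tuned_names."""
    tuned_stems = {PurePath(n).stem for n in tuned_names}
    return sorted(
        PurePath(n).stem
        for n in src_names
        if PurePath(n).stem + suffix not in tuned_stems
    )


print(missing_tuned(["complex_0.pdb", "complex_1.pdb"], ["complex_0_op1_vina_tuned.pdb"]))
```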


In [9]:
# %% [markdown]
# 8) Build the folder dict for Step 4 through the final evaluation (same layout as check_pepbind_pipeline.ipynb)
# - The final results Excel is written under OUT_RESULTS_DIR

RESULTS_DIR = OUT_RESULTS_DIR
VINA_DIR    = RESULTS_DIR / "vina"
PLIP_DIR    = RESULTS_DIR / "plip"
PRODIGY_DIR = RESULTS_DIR / "prodigy"
TEMP_DIR    = OUT_TEMP_DIR

for d in [RESULTS_DIR, VINA_DIR, PLIP_DIR, PRODIGY_DIR, TEMP_DIR]:
    d.mkdir(parents=True, exist_ok=True)

FOLDERS = {
    "root": OUT_RUN_DIR,
    "fasta": OUT_FASTA_DIR,
    "pdb": OUT_PDB_DIR,
    "colabfold_out": OUT_COLABFOLD_DIR,
    "results": RESULTS_DIR,
    "vina": VINA_DIR,
    "plip": PLIP_DIR,
    "prodigy": PRODIGY_DIR,
    "temp": TEMP_DIR,
}

print("RESULTS_DIR:", RESULTS_DIR)

RESULTS_DIR: /home/aisys/work/pipeline/PDP_20251222_162855_op1_vina_tuned/results


In [10]:
# %% [markdown]
# 9) Step 4: run Vina docking (scoring)
# - Reuses the existing pepbind_pipeline.run_vina_on_rank1
# - Output: RESULTS_DIR/vina/vina_summary.xlsx

pb.run_vina_on_rank1(rank1_pdbs, VINA_DIR)

vina_summary = VINA_DIR / "vina_summary.xlsx"
print("vina_summary:", vina_summary, "OK" if vina_summary.exists() else "MISSING")


STEP 4: AutoDock Vina docking

[INFO] Preparing Vina: complex_0_unrelaxed_rank_001_alphafold2_multimer_v3_model_1_seed_000_op1_vina_tuned.pdb
[INFO] complex_0_unrelaxed_rank_001_alphafold2_multimer_v3_model_1_seed_000_op1_vina_tuned.pdb chain composition: {'A': 222, 'B': 4}
[INFO] Auto-assigned chains: receptor=A, ligand=B
[RUN] /home/aisys/miniconda3/envs/pepbind/bin/obabel -ipdb /home/aisys/work/pipeline/PDP_20251222_162855_op1_vina_tuned/results/vina/complex_0_unrelaxed_rank_001_alphafold2_multimer_v3_model_1_seed_000_op1_vina_tuned/complex_0_unrelaxed_rank_001_alphafold2_multimer_v3_model_1_seed_000_op1_vina_tuned_receptor_A.pdb -xr -opdbqt -O /home/aisys/work/pipeline/PDP_20251222_162855_op1_vina_tuned/results/vina/complex_0_unrelaxed_rank_001_alphafold2_multimer_v3_model_1_seed_000_op1_vina_tuned/complex_0_unrelaxed_rank_001_alphafold2_multimer_v3_model_1_seed_000_op1_vina_tuned_receptor_A.pdbqt
[RUN] /home/aisys/miniconda3/envs/pepbind/bin/obabel -ipdb /home/aisys/work/pipeline/PDP_20251222_162855_op1_vina_tuned/
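Both the tuning loop and Step 4 take the best score from Vina's stdout via `pb.parse_vina_score_from_stdout`, whose implementation is not shown here. For reference, a hypothetical stand-alone parser for the result table the `vina` command-line program prints; it assumes the usual layout (a `-----+------------+...` separator followed by rows whose first two columns are the mode number and the affinity in kcal/mol) and should be checked against the Vina version in use.

```python
def parse_best_affinity(stdout):
    """Affinity (kcal/mol) of mode 1 from Vina's result table, or None if absent."""
    lines = stdout.splitlines()
    for i, line in enumerate(lines):
        if line.strip().startswith("-----+"):
            for row in lines[i + 1:]:
                parts = row.split()
                if len(parts) >= 2 and parts[0] == "1":
                    try:
                        return float(parts[1])
                    except ValueError:
                        return None
            break
    return None


# Illustrative stdout fragment with a made-up score
example = """mode |   affinity | dist from best mode
     | (kcal/mol) | rmsd l.b.| rmsd u.b.
-----+------------+----------+----------
   1       -7.123          0          0
"""
print(parse_best_affinity(example))
```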

In [11]:
# %% [markdown]
# 10) Step 5: run PLIP
# - Output: RESULTS_DIR/plip/plip_summary.xlsx

pb.run_plip_on_rank1(rank1_pdbs, PLIP_DIR)

plip_summary = PLIP_DIR / "plip_summary.xlsx"
print("plip_summary:", plip_summary, "OK" if plip_summary.exists() else "MISSING")


STEP 5: PLIP interaction analysis
[RUN] plip -f /home/aisys/work/pipeline/PDP_20251222_162855_op1_vina_tuned/pdb/colabfold_output/complex_0_unrelaxed_rank_001_alphafold2_multimer_v3_model_1_seed_000_op1_vina_tuned.pdb -o /home/aisys/work/pipeline/PDP_20251222_162855_op1_vina_tuned/results/plip/complex_0_unrelaxed_rank_001_alphafold2_multimer_v3_model_1_seed_000_op1_vina_tuned -x -t --chains [['A'], ['B']]
✔️ PLIP done: complex_0_unrelaxed_rank_001_alphafold2_multimer_v3_model_1_seed_000_op1_vina_tuned.pdb → /home/aisys/work/pipeline/PDP_20251222_162855_op1_vina_tuned/results/plip/complex_0_unrelaxed_rank_001_alphafold2_multimer_v3_model_1_seed_000_op1_vina_tuned
[RUN] plip -f /home/aisys/work/pipeline/PDP_20251222_162855_op1_vina_tuned/pdb/colabfold_output/complex_10_unrelaxed_rank_001_alphafold2_multimer_v3_model_1_seed_000_op1_vina_tuned.pdb -o /home/aisys/work/pipeline/PDP_20251222_162855_op1_vina_tuned/results/plip/complex_10_unrelaxed_rank_001_alphafold2_multimer_v3_model_1_seed_000_op1_vina_tu

In [12]:
# %% [markdown]
# 11) Step 6: run PRODIGY
# - Output: RESULTS_DIR/prodigy/prodigy_summary.xlsx

pb.run_prodigy_on_rank1(rank1_pdbs, PRODIGY_DIR)

prodigy_summary = PRODIGY_DIR / "prodigy_summary.xlsx"
print("prodigy_summary:", prodigy_summary, "OK" if prodigy_summary.exists() else "MISSING")


STEP 6: PRODIGY binding affinity evaluation
[RUN] prodigy /home/aisys/work/pipeline/PDP_20251222_162855_op1_vina_tuned/pdb/colabfold_output/complex_0_unrelaxed_rank_001_alphafold2_multimer_v3_model_1_seed_000_op1_vina_tuned.pdb --selection A B
[RUN] prodigy /home/aisys/work/pipeline/PDP_20251222_162855_op1_vina_tuned/pdb/colabfold_output/complex_10_unrelaxed_rank_001_alphafold2_multimer_v3_model_1_seed_000_op1_vina_tuned.pdb --selection A B
[RUN] prodigy /home/aisys/work/pipeline/PDP_20251222_162855_op1_vina_tuned/pdb/colabfold_output/complex_11_unrelaxed_rank_001_alphafold2_multimer_v3_model_1_seed_000_op1_vina_tuned.pdb --selection A B
[RUN] prodigy /home/aisys/work/pipeline/PDP_20251222_162855_op1_vina_tuned/pdb/colabfold_output/complex_12_unrelaxed_rank_001_alphafold2_multimer_v3_model_1_seed_000_op1_vina_tuned.pdb --selection A B
[RUN] prodigy /home/aisys/work/pipeline/PDP_20251222_162855_op1_vina_tuned/pdb/colabfold_output/complex_13_unrelaxed_rank_001_alphafold2_multimer_v3_model_1_seed_000_op1_
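The `--selection A B` flag in the log above tells PRODIGY to treat chain A and chain B as the two interacting partners. The summary step then needs the predicted ΔG from each run's text output. A hypothetical regex-based sketch, assuming the output contains a line of the form `[++] Predicted binding affinity (kcal.mol-1): <value>`; verify the exact wording against your PRODIGY version before relying on it.

```python
import re

# Assumed line format: "... Predicted binding affinity (kcal.mol-1):   <number>"
_AFFINITY_RE = re.compile(r"Predicted binding affinity[^:\n]*:\s*([-+]?\d+(?:\.\d+)?)")


def parse_prodigy_dg(text):
    """Predicted binding affinity (kcal/mol) from PRODIGY text output, or None."""
    m = _AFFINITY_RE.search(text)
    return float(m.group(1)) if m else None


# Made-up example output fragment
example = "[++] Predicted binding affinity (kcal.mol-1):     -9.4\n"
print(parse_prodigy_dg(example))
```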

In [13]:
# %% [markdown]
# 12) (Important) Patch ipTM loading
# - The tuned PDB stems carry an extra '_op1_vina_tuned' suffix, so they may no longer
#   match the file names of ColabFold's ranking_debug/scores JSON files.
# - Replace pb.load_iptm_scores (used inside build_and_save_final_table) with a wrapper
#   that looks scores up by the stem without the suffix and re-keys them by the tuned stem.

# Unwrap first, so re-running this cell does not wrap the wrapper again
_orig_load_iptm_scores = getattr(pb.load_iptm_scores, "__wrapped__", pb.load_iptm_scores)

def _patched_load_iptm_scores(colabfold_out_dir: Path, rank1_pdbs_input):
    suffix = "_op1_vina_tuned"
    # 1) Build placeholder Paths using the original stems (suffix removed)
    fake_pdbs = []
    map_tuned_to_orig = {}
    for p in rank1_pdbs_input:
        b_tuned = p.stem
        b_orig = b_tuned.removesuffix(suffix)  # strip only the trailing suffix (Python 3.9+)
        map_tuned_to_orig[b_tuned] = b_orig
        fake_pdbs.append(colabfold_out_dir / f"{b_orig}.pdb")  # load_iptm_scores does not check that these files exist

    # 2) Load ipTM with the original loader (keys are the original stems)
    iptm_orig = _orig_load_iptm_scores(colabfold_out_dir, fake_pdbs)

    # 3) Re-key by the tuned stems
    iptm_tuned = {}
    for b_tuned, b_orig in map_tuned_to_orig.items():
        if b_orig in iptm_orig:
            iptm_tuned[b_tuned] = iptm_orig[b_orig]
    return iptm_tuned

_patched_load_iptm_scores.__wrapped__ = _orig_load_iptm_scores
pb.load_iptm_scores = _patched_load_iptm_scores
print("pb.load_iptm_scores patched")

pb.load_iptm_scores patched
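The key mapping inside the ipTM patch can be checked on its own, without any ColabFold files. This sketch isolates that step (`remap_to_tuned` is a hypothetical name). It uses `str.removesuffix` (Python 3.9+), which only strips the suffix at the end of the stem, whereas `str.replace` would also delete the same substring anywhere else in the name.

```python
def remap_to_tuned(tuned_stems, scores_by_orig, suffix="_op1_vina_tuned"):
    """Re-key {original_stem: score} as {tuned_stem: score}; stems without a score are dropped."""
    out = {}
    for tuned in tuned_stems:
        orig = tuned.removesuffix(suffix)
        if orig in scores_by_orig:
            out[tuned] = scores_by_orig[orig]
    return out


# Made-up ipTM value for illustration
print(remap_to_tuned(["complex_0_op1_vina_tuned", "complex_1_op1_vina_tuned"], {"complex_0": 0.81}))
```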


In [14]:
# %% [markdown]
# 13) Build the final evaluation table (reusing pepbind_pipeline.build_and_save_final_table)
# - Output: RESULTS_DIR/final_peptide_rank_*.xlsx

def load_peptides_from_fasta(fasta_path: Path):
    peptides = []
    seq = ""
    if not fasta_path.exists():
        return peptides
    with open(fasta_path, "r", encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line:
                continue
            if line.startswith(">"):
                if seq:
                    peptides.append(seq)
                    seq = ""
            else:
                seq += line
        if seq:
            peptides.append(seq)
    return peptides

PEPTIDES = load_peptides_from_fasta(OUT_FASTA_DIR / "peptides.fasta")
print("PEPTIDES:", len(PEPTIDES))

final_xlsx = pb.build_and_save_final_table(FOLDERS, PEPTIDES, rank1_pdbs)
print("Final results Excel:", final_xlsx)

print("\nExcel files in RESULTS_DIR:")
for xlsx_path in sorted(RESULTS_DIR.glob("*.xlsx")):
    print(" -", xlsx_path.name)

PEPTIDES: 20
[INFO] Vina scores loaded from summary Excel: /home/aisys/work/pipeline/PDP_20251222_162855_op1_vina_tuned/results/vina/vina_summary.xlsx
[INFO] Structures with Vina scores: 20
[INFO] PRODIGY scores loaded from summary Excel: /home/aisys/work/pipeline/PDP_20251222_162855_op1_vina_tuned/results/prodigy/prodigy_summary.xlsx
[INFO] PRODIGY ΔG loaded from summary file: 20 structures
[INFO] Structures with ipTM values: 20 / 20
[INFO] PLIP parsing debug log: /home/aisys/work/pipeline/PDP_20251222_162855_op1_vina_tuned/results/plip/plip_parse_debug.txt
[INFO] PLIP summary Excel saved: /home/aisys/work/pipeline/PDP_20251222_162855_op1_vina_tuned/results/plip/plip_summary.xlsx
[INFO] Structures with PLIP interactions: 20
✅ Final results Excel saved: /home/aisys/work/pipeline/PDP_20251222_162855_op1_vina_tuned/results/final_peptide_rank_20251222_173903.xlsx
Final results Excel: /home/aisys/work/pipeline/PDP_20251222_162855_op1_vina_tuned/results/final_peptide_rank_20251222_173903.xlsx

Excel files in RESULTS_DIR:
 - final_peptide_rank_20251222_173903.xlsx


In [15]:
# %% [markdown]
# 14) Collect only the final tuned PDBs into a zip archive (saved in the results folder)

import zipfile

# Output location / file name
zip_path = RESULTS_DIR / f"{OUT_RUN_ID}_tuned_pdbs.zip"

# Files to archive: rank1_pdbs from step 7) (final tuned PDBs only)
# If rank1_pdbs is not defined, it can be rebuilt with the line below
# rank1_pdbs = sorted(OUT_COLABFOLD_DIR.glob("*_op1_vina_tuned.pdb"))

if not rank1_pdbs:
    raise RuntimeError("rank1_pdbs is empty. Check that the step 7) cell has been run.")

n_written = 0
with zipfile.ZipFile(zip_path, mode="w", compression=zipfile.ZIP_DEFLATED) as zf:
    for pdb_path in map(Path, rank1_pdbs):
        if pdb_path.exists() and pdb_path.suffix.lower() == ".pdb":
            # Store only the file name inside the zip (no folder structure)
            zf.write(pdb_path, arcname=pdb_path.name)
            n_written += 1

print("Archive written:", zip_path)
print("PDB files included:", n_written)

Archive written: /home/aisys/work/pipeline/PDP_20251222_162855_op1_vina_tuned/results/PDP_20251222_162855_op1_vina_tuned_tuned_pdbs.zip
PDB files included: 20
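To confirm the archive is readable and holds exactly the expected files, it can be reopened and checked with `ZipFile.namelist()` and `ZipFile.testzip()` (which returns the name of the first member with a bad CRC or header, or None). A sketch with a hypothetical helper `verify_zip`; in the notebook it could be called as `verify_zip(zip_path, [p.name for p in rank1_pdbs])`. The demo at the bottom builds a one-file archive in a temporary directory.

```python
import tempfile
import zipfile
from pathlib import Path


def verify_zip(zip_path, expected_names):
    """Reopen the archive, check CRCs, and return (missing, extra) sets of member names."""
    with zipfile.ZipFile(zip_path) as zf:
        bad = zf.testzip()  # first member with a bad CRC/header, or None
        if bad is not None:
            raise zipfile.BadZipFile(f"CRC check failed for {bad}")
        names = set(zf.namelist())
    expected = set(expected_names)
    return expected - names, names - expected


with tempfile.TemporaryDirectory() as tmp:
    pdb = Path(tmp) / "demo_op1_vina_tuned.pdb"
    pdb.write_text("END\n")
    zpath = Path(tmp) / "demo.zip"
    with zipfile.ZipFile(zpath, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        zf.write(pdb, arcname=pdb.name)
    missing, extra = verify_zip(zpath, [pdb.name])

print("missing:", missing, "extra:", extra)
```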
