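# Option 2: Tune ColabFold complex PDB poses with OpenMM, then rerun the existing pipeline from Step 4 through the final evaluation
- Input: the result folder of an existing pepbind_pipeline.py run (e.g., PDP_20251211_165304)
- Output: a PDP_20251211_165304_op2_openmm_tuned folder created in the same location
- Tuning: OpenMM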

In [10]:
# Cell 1) Basic setup (input folder / output folder / code path)

from pathlib import Path
import shutil
import os

# 1) PepBind base path (adjust for your environment)
BASE_DIR = Path(os.environ.get("PEPBIND_BASE_DIR", "~/work/pipeline")).expanduser()

# 2) Name of the existing result workspace
SRC_WS = "PDP_20251222_162855"

# 3) Name of the new workspace (follows the requested naming rule)
DST_WS = f"{SRC_WS}_op2_openmm_tuned"

SRC_ROOT = BASE_DIR / SRC_WS
DST_ROOT = BASE_DIR / DST_WS

# 4) Path to the Python file that contains the OpenMM tuning functions
PIPELINE_PY_PATH = BASE_DIR / "pepbind_pipeline_openmm06.py"

print("BASE_DIR      =", BASE_DIR)
print("SRC_ROOT      =", SRC_ROOT)
print("DST_ROOT      =", DST_ROOT)
print("PIPELINE_PY   =", PIPELINE_PY_PATH)

if not SRC_ROOT.exists():
    raise FileNotFoundError(f"Existing workspace folder not found: {SRC_ROOT}")

if not PIPELINE_PY_PATH.exists():
    raise FileNotFoundError(f"pepbind_pipeline_openmm06.py not found: {PIPELINE_PY_PATH}")

# A leftover DST folder from a previous run is deleted so the copy starts clean
if DST_ROOT.exists():
    print("Deleting existing DST folder:", DST_ROOT)
    shutil.rmtree(DST_ROOT)
print("Ready")

BASE_DIR      = /home/aisys/work/pipeline
SRC_ROOT      = /home/aisys/work/pipeline/PDP_20251222_162855
DST_ROOT      = /home/aisys/work/pipeline/PDP_20251222_162855_op2_openmm_tuned
PIPELINE_PY   = /home/aisys/work/pipeline/pepbind_pipeline_openmm06.py
Ready
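
Cell 1 removes DST_ROOT with shutil.rmtree, so a wrong BASE_DIR or workspace name could delete the wrong folder. The sketch below shows one possible guard. The helper name safe_rmtree and its suffix check are hypothetical and are not part of the pipeline module.

```python
import shutil
import tempfile
from pathlib import Path


def safe_rmtree(target: Path, base_dir: Path,
                required_suffix: str = "_op2_openmm_tuned") -> bool:
    """Delete `target` only when it is a direct child of `base_dir` and its
    name ends with `required_suffix`. Returns True if a folder was deleted."""
    target = Path(target).resolve()
    base_dir = Path(base_dir).resolve()
    if target.parent != base_dir:
        raise ValueError(f"refusing to delete outside {base_dir}: {target}")
    if not target.name.endswith(required_suffix):
        raise ValueError(
            f"refusing to delete folder without suffix {required_suffix!r}: {target}"
        )
    if not target.is_dir():
        return False  # nothing to delete
    shutil.rmtree(target)
    return True


# Demo in a throwaway directory (stands in for BASE_DIR)
with tempfile.TemporaryDirectory() as tmp:
    base = Path(tmp)
    dst = base / "PDP_demo_op2_openmm_tuned"
    dst.mkdir()
    print(safe_rmtree(dst, base))  # True: the folder existed and was removed
    print(safe_rmtree(dst, base))  # False: already gone
```

In the notebook this would replace the bare shutil.rmtree(DST_ROOT) call with safe_rmtree(DST_ROOT, BASE_DIR).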


In [11]:
# Cell 2) Selectively copy the workspace (PDP_... → PDP_..._op2_openmm_tuned)

print("Starting selective copy...")

# Create the DST workspace (fails if it already exists)
DST_ROOT.mkdir(parents=True, exist_ok=False)

def copy_dir(src: Path, dst: Path):
    if not src.exists():
        print("Skipped (not found):", src)
        return
    shutil.copytree(src, dst)
    print("Copied (dir):", src, "->", dst)

def copy_file(src: Path, dst: Path):
    if not src.exists():
        print("Skipped (not found):", src)
        return
    dst.parent.mkdir(parents=True, exist_ok=True)
    shutil.copy2(src, dst)
    print("Copied (file):", src, "->", dst)

# Copy only the minimum required inputs
copy_dir(SRC_ROOT / "fasta", DST_ROOT / "fasta")
copy_dir(SRC_ROOT / "pdb" / "colabfold_output", DST_ROOT / "pdb" / "colabfold_output")

# Copy batch_complexes.csv if present (the rest of the temp folder is not copied)
copy_file(SRC_ROOT / "temp" / "batch_complexes.csv", DST_ROOT / "temp" / "batch_complexes.csv")

# To avoid confusion with the old results, create fresh, empty result folders
(DST_ROOT / "results" / "vina").mkdir(parents=True, exist_ok=True)
(DST_ROOT / "results" / "plip").mkdir(parents=True, exist_ok=True)
(DST_ROOT / "results" / "prodigy").mkdir(parents=True, exist_ok=True)

# Also create the folder for refined structures in advance
(DST_ROOT / "pdb" / "refined").mkdir(parents=True, exist_ok=True)

print("Selective copy done:", DST_ROOT)

Starting selective copy...
Copied (dir): /home/aisys/work/pipeline/PDP_20251222_162855/fasta -> /home/aisys/work/pipeline/PDP_20251222_162855_op2_openmm_tuned/fasta
Copied (dir): /home/aisys/work/pipeline/PDP_20251222_162855/pdb/colabfold_output -> /home/aisys/work/pipeline/PDP_20251222_162855_op2_openmm_tuned/pdb/colabfold_output
Copied (file): /home/aisys/work/pipeline/PDP_20251222_162855/temp/batch_complexes.csv -> /home/aisys/work/pipeline/PDP_20251222_162855_op2_openmm_tuned/temp/batch_complexes.csv
Selective copy done: /home/aisys/work/pipeline/PDP_20251222_162855_op2_openmm_tuned


In [12]:
# Cell 3) Load the module (import the file at PIPELINE_PY_PATH, pepbind_pipeline_openmm06.py, to reuse its functions)
# Note: the module is registered under the name "pepbind_openmm02", which does not match the file name.
import importlib.util
import sys

spec = importlib.util.spec_from_file_location("pepbind_openmm02", str(PIPELINE_PY_PATH))
pep = importlib.util.module_from_spec(spec)
sys.modules["pepbind_openmm02"] = pep
spec.loader.exec_module(pep)

print("Module loaded:", pep.__name__)

[INFO] PyTorch device: cuda
Module loaded: pepbind_openmm02
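
The module is registered as pepbind_openmm02 even though the loaded file is pepbind_pipeline_openmm06.py. A minimal sketch of a loader that derives the module name from the file name, so the two cannot drift apart, might look like this. The helper load_module_from_path is hypothetical; it uses only the standard importlib.util calls already used in Cell 3.

```python
import importlib.util
import sys
import tempfile
from pathlib import Path
from types import ModuleType
from typing import Optional


def load_module_from_path(py_path: Path, module_name: Optional[str] = None) -> ModuleType:
    """Import a .py file by path; the module name defaults to the file stem."""
    py_path = Path(py_path)
    name = module_name or py_path.stem
    spec = importlib.util.spec_from_file_location(name, str(py_path))
    if spec is None or spec.loader is None:
        raise ImportError(f"cannot build an import spec for {py_path}")
    module = importlib.util.module_from_spec(spec)
    sys.modules[name] = module
    try:
        spec.loader.exec_module(module)
    except Exception:
        # Do not leave a half-initialized module registered
        sys.modules.pop(name, None)
        raise
    return module


# Demo with a throwaway module file
with tempfile.TemporaryDirectory() as tmp:
    demo_file = Path(tmp) / "demo_pipeline_v06.py"
    demo_file.write_text("VALUE = 41 + 1\n")
    demo = load_module_from_path(demo_file)
    print(demo.__name__, demo.VALUE)  # demo_pipeline_v06 42
```

In the notebook, pep = load_module_from_path(PIPELINE_PY_PATH) would then report a module name that matches the file actually loaded.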


In [13]:
# Cell 4) Build the output folder dict and create any folders that are missing
from pathlib import Path

folders = {
    "root": DST_ROOT,
    "fasta": DST_ROOT / "fasta",
    "pdb": DST_ROOT / "pdb",
    "colabfold_out": DST_ROOT / "pdb" / "colabfold_output",
    "results": DST_ROOT / "results",
    "vina": DST_ROOT / "results" / "vina",
    "plip": DST_ROOT / "results" / "plip",
    "prodigy": DST_ROOT / "results" / "prodigy",
    "temp": DST_ROOT / "temp",
}

for p in folders.values():
    if isinstance(p, Path):
        p.mkdir(parents=True, exist_ok=True)

print("Folders ready")
for k, v in folders.items():
    print(k, "=>", v)

Folders ready
root => /home/aisys/work/pipeline/PDP_20251222_162855_op2_openmm_tuned
fasta => /home/aisys/work/pipeline/PDP_20251222_162855_op2_openmm_tuned/fasta
pdb => /home/aisys/work/pipeline/PDP_20251222_162855_op2_openmm_tuned/pdb
colabfold_out => /home/aisys/work/pipeline/PDP_20251222_162855_op2_openmm_tuned/pdb/colabfold_output
results => /home/aisys/work/pipeline/PDP_20251222_162855_op2_openmm_tuned/results
vina => /home/aisys/work/pipeline/PDP_20251222_162855_op2_openmm_tuned/results/vina
plip => /home/aisys/work/pipeline/PDP_20251222_162855_op2_openmm_tuned/results/plip
prodigy => /home/aisys/work/pipeline/PDP_20251222_162855_op2_openmm_tuned/results/prodigy
temp => /home/aisys/work/pipeline/PDP_20251222_162855_op2_openmm_tuned/temp


In [14]:
# Cell 5) Read the existing peptides (from peptides.fasta, falling back to batch_complexes.csv)
def read_peptides_from_fasta(fa_path: Path):
    if not fa_path.exists():
        return []
    peptides = []
    seq = []
    with open(fa_path, "r") as f:
        for line in f:
            line = line.strip()
            if not line:
                continue
            if line.startswith(">"):
                if seq:
                    peptides.append("".join(seq))
                    seq = []
            else:
                seq.append(line)
        if seq:
            peptides.append("".join(seq))
    return peptides

def read_peptides_from_batch_csv(csv_path: Path):
    if not csv_path.exists():
        return []
    import csv
    peptides = []
    with open(csv_path, "r", newline="") as f:
        reader = csv.DictReader(f)
        for row in reader:
            seq = row["sequence"]
            # "target:peptide"
            if ":" in seq:
                pep_seq = seq.split(":")[1].strip()
                peptides.append(pep_seq)
    return peptides

pep_fasta = folders["fasta"] / "peptides.fasta"
batch_csv = folders["temp"] / "batch_complexes.csv"

peptides = read_peptides_from_fasta(pep_fasta)
if not peptides:
    peptides = read_peptides_from_batch_csv(batch_csv)

print("Number of peptides:", len(peptides))
print("Example peptides:", peptides[:5])
if not peptides:
    raise RuntimeError("Could not read any peptides. Check peptides.fasta or batch_complexes.csv.")


Number of peptides: 20
Example peptides: ['AASL', 'LLIT', 'RVLA', 'GSVR', 'TSGE']


In [15]:
# Cell 6) Find the rank_001 PDBs (from the existing ColabFold results)
def find_rank1_pdbs(colabfold_out: Path):
    # 1) Prefer unrelaxed rank_001 files
    p = sorted(colabfold_out.glob("*_unrelaxed_*rank_001*.pdb"))
    if p:
        return p
    # 2) Fallback: any rank_001 PDB
    p = sorted(colabfold_out.glob("*rank_001*.pdb"))
    return p

rank1_pdbs = find_rank1_pdbs(folders["colabfold_out"])

print("Number of rank_001 PDBs:", len(rank1_pdbs))
print("Examples:", [x.name for x in rank1_pdbs[:3]])

if not rank1_pdbs:
    raise RuntimeError("No ColabFold rank_001 PDBs found. Check the colabfold_output folder.")

Number of rank_001 PDBs: 20
Examples: ['complex_0_unrelaxed_rank_001_alphafold2_multimer_v3_model_1_seed_000.pdb', 'complex_10_unrelaxed_rank_001_alphafold2_multimer_v3_model_1_seed_000.pdb', 'complex_11_unrelaxed_rank_001_alphafold2_multimer_v3_model_1_seed_000.pdb']
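
The example listing runs complex_0, complex_10, complex_11 because sorted() orders paths lexicographically. If any downstream step pairs PDBs with peptides by position (an assumption; the pipeline module may match them by name instead), sorting on the numeric complex index keeps the two lists aligned. The helper names below are hypothetical.

```python
import re
from pathlib import Path

# Matches the leading "complex_<N>_" part of ColabFold output names seen above
_COMPLEX_RE = re.compile(r"^complex_(\d+)_")


def complex_index(pdb_path) -> int:
    """Extract N from a file name that starts with 'complex_<N>_'."""
    m = _COMPLEX_RE.match(Path(pdb_path).name)
    if m is None:
        raise ValueError(f"no complex index in file name: {pdb_path}")
    return int(m.group(1))


def sort_by_complex_index(paths):
    """Return paths ordered by their numeric complex index."""
    return sorted(paths, key=complex_index)


names = [
    "complex_10_unrelaxed_rank_001_alphafold2_multimer_v3_model_1_seed_000.pdb",
    "complex_2_unrelaxed_rank_001_alphafold2_multimer_v3_model_1_seed_000.pdb",
    "complex_0_unrelaxed_rank_001_alphafold2_multimer_v3_model_1_seed_000.pdb",
]
print([complex_index(n) for n in sort_by_complex_index(names)])  # [0, 2, 10]
```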


In [16]:
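# Cell 7) Run OpenMM tuning (structure post-processing) → generate refined PDBs
# Tuning parameters (adjust as needed)
MD_TIME_PS  = 100.0  # length of the short MD run, in picoseconds
TIMESTEP_FS = 2.0    # integration timestep, in femtoseconds
RESTRAINT_K = 1.0    # restraint strength (units are defined by the pipeline module)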

refined_rank1_pdbs = pep.refine_structures_with_openmm_and_relax(
    rank1_pdbs,
    folders["pdb"],
    md_time_ps=MD_TIME_PS,
    timestep_fs=TIMESTEP_FS,
    restraint_k=RESTRAINT_K,
)

print("Number of refined PDBs:", len(refined_rank1_pdbs))
print("Refined examples:", [x.name for x in refined_rank1_pdbs[:3]])
print("Refined folder:", (folders["pdb"] / "refined"))


STEP 3b: Structure post-processing (OpenMM minimization / short MD / Rosetta Relax)

[REFINE] (1/20) complex_0_unrelaxed_rank_001_alphafold2_multimer_v3_model_1_seed_000.pdb
[OpenMM] Input structure: complex_0_unrelaxed_rank_001_alphafold2_multimer_v3_model_1_seed_000.pdb
[OpenMM] OXT fix: added OXT to 2 residues → complex_0_unrelaxed_rank_001_alphafold2_multimer_v3_model_1_seed_000__openmm_prep.pdb
[OpenMM] ForceField: amber14-all.xml + implicit/obc2.xml
[OpenMM] Platform: CUDA
[OpenMM] Running energy minimization (maxIterations=2000)
[OpenMM] Running short MD: 100.0 ps, timestep=2.0 fs, steps=50000
[OpenMM] Refinement done → /home/aisys/work/pipeline/PDP_20251222_162855_op2_openmm_tuned/pdb/refined/complex_0_unrelaxed_rank_001_alphafold2_multimer_v3_model_1_seed_000_openmm_refined.pdb
[INFO] RELAX_CMD not set → skipping the Rosetta Relax step
[REFINE] Final structure used: complex_0_unrelaxed_rank_001_alphafold2_multimer_v3_model_1_seed_000_openmm_refined.pdb

[REFINE] (2/20) complex_10_unrelaxed_rank_001_alphafold2_multimer_v3_model_1_seed_000.pdb
[OpenMM] In
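
The steps=50000 value in the log follows from the two tuning parameters: 100.0 ps is 100,000 fs, and 100,000 fs / 2.0 fs per step gives 50,000 steps. A small sketch of that conversion is below. The function name md_steps is hypothetical, and rounding to the nearest integer is an assumption about how the pipeline handles non-integer ratios.

```python
def md_steps(md_time_ps: float, timestep_fs: float) -> int:
    """Number of integration steps for a run of md_time_ps picoseconds
    with a timestep of timestep_fs femtoseconds (1 ps = 1000 fs)."""
    if md_time_ps < 0:
        raise ValueError("md_time_ps must be non-negative")
    if timestep_fs <= 0:
        raise ValueError("timestep_fs must be positive")
    return int(round(md_time_ps * 1000.0 / timestep_fs))


print(md_steps(100.0, 2.0))  # 50000, matching the log line above
```

Halving TIMESTEP_FS to 1.0 would double the step count to 100000 for the same simulated time.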

In [17]:
# Cell 8) Recompute Vina / PLIP / PRODIGY on the refined PDBs
# Vina
pep.run_vina_on_rank1(refined_rank1_pdbs, folders["vina"])
print("Vina done:", folders["vina"])

# PLIP
pep.run_plip_on_rank1(refined_rank1_pdbs, folders["plip"])
print("PLIP done:", folders["plip"])

# PRODIGY
pep.run_prodigy_on_rank1(refined_rank1_pdbs, folders["prodigy"])
print("PRODIGY done:", folders["prodigy"])


STEP 4: AutoDock Vina docking

[INFO] Preparing Vina: complex_0_unrelaxed_rank_001_alphafold2_multimer_v3_model_1_seed_000_openmm_refined.pdb
[INFO] complex_0_unrelaxed_rank_001_alphafold2_multimer_v3_model_1_seed_000_openmm_refined.pdb chain composition: {'A': 222, 'B': 4}
[INFO] Auto-assigned chains: receptor=A, ligand=B
[RUN] /home/aisys/miniconda3/envs/pepbind_openmm/bin/obabel -ipdb /home/aisys/work/pipeline/PDP_20251222_162855_op2_openmm_tuned/results/vina/complex_0_unrelaxed_rank_001_alphafold2_multimer_v3_model_1_seed_000_openmm_refined/complex_0_unrelaxed_rank_001_alphafold2_multimer_v3_model_1_seed_000_openmm_refined_receptor_A.pdb -xr -opdbqt -O /home/aisys/work/pipeline/PDP_20251222_162855_op2_openmm_tuned/results/vina/complex_0_unrelaxed_rank_001_alphafold2_multimer_v3_model_1_seed_000_openmm_refined/complex_0_unrelaxed_rank_001_alphafold2_multimer_v3_model_1_seed_000_openmm_refined_receptor_A.pdbqt
[RUN] /home/aisys/miniconda3/envs/pepbind_openmm/bin/obabel -ipdb /home/aisys/work/pipeline/PDP_20251222_1628
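
The log reports a chain composition of {'A': 222, 'B': 4} and assigns receptor=A, ligand=B. How the pipeline module makes that choice is not shown here. The sketch below illustrates one plausible rule, under the assumption that the chain with the most residues is the receptor and the chain with the fewest is the peptide ligand. Both function names are hypothetical; the column positions follow the fixed-width PDB ATOM record layout.

```python
def chain_residue_counts(pdb_text: str) -> dict:
    """Count distinct residues per chain from ATOM records.
    PDB columns: chain ID = col 22, resSeq = cols 23-26, iCode = col 27."""
    residues = {}
    for line in pdb_text.splitlines():
        if not line.startswith("ATOM") or len(line) < 27:
            continue
        chain_id = line[21]
        res_key = line[22:27]  # residue number plus insertion code
        residues.setdefault(chain_id, set()).add(res_key)
    return {chain: len(keys) for chain, keys in residues.items()}


def assign_receptor_ligand(counts: dict) -> tuple:
    """Assumed rule: largest chain is the receptor, smallest is the ligand."""
    if len(counts) < 2:
        raise ValueError("need at least two chains to assign receptor and ligand")
    ordered = sorted(counts, key=lambda c: counts[c], reverse=True)
    return ordered[0], ordered[-1]


print(assign_receptor_ligand({"A": 222, "B": 4}))  # ('A', 'B')
```

For a complex written to disk, chain_residue_counts(Path(pdb_file).read_text()) would give the dict to pass in.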

In [18]:
# Cell 9) Generate the final outputs (PDB zip + final Excel file)
from datetime import datetime

# (Optional) Zip the refined PDBs
pdb_zip = pep.zip_rank1_pdbs(refined_rank1_pdbs, folders["results"])
print("PDB zip:", pdb_zip)

# Final Excel table (timing information is not tracked in this notebook, so it is passed as None)
final_xlsx = pep.build_and_save_final_table(
    folders=folders,
    peptides=peptides,
    rank1_pdbs=refined_rank1_pdbs,
    start_time=None,
    end_time=None,
    step_timings=None,
)
print("Final Excel:", final_xlsx)

print("\nDone. Result workspace:", DST_ROOT)

✅ rank_001 PDBs zipped and saved: /home/aisys/work/pipeline/PDP_20251222_162855_op2_openmm_tuned/results/peptide_structures_20251222_175254.zip
PDB zip: /home/aisys/work/pipeline/PDP_20251222_162855_op2_openmm_tuned/results/peptide_structures_20251222_175254.zip
[INFO] Loading scores from the Vina summary Excel file: /home/aisys/work/pipeline/PDP_20251222_162855_op2_openmm_tuned/results/vina/vina_summary.xlsx
[INFO] Structures with Vina scores read: 20
[INFO] Loading scores from the PRODIGY summary Excel file: /home/aisys/work/pipeline/PDP_20251222_162855_op2_openmm_tuned/results/prodigy/prodigy_summary.xlsx
[INFO] PRODIGY ΔG loaded from the summary file: 20 structures
[INFO] Structures with ipTM values read: 39 / 20
[INFO] PLIP parsing debug log: /home/aisys/work/pipeline/PDP_20251222_162855_op2_openmm_tuned/results/plip/plip_parse_debug.txt
[INFO] PLIP summary Excel file saved: /home/aisys/work/pipeline/PDP_20251222_162855_op2_openmm_tuned/results/plip/plip_summary.xlsx
[INFO] Structures with PLIP interactions read: 20
✅ Final result Excel file saved: /home/aisys/work/pipeline/PDP_20251222_162855_op2_openmm_tuned/results/final_peptide_rank_20251222_175255.xlsx
Final Ex
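
The log reports ipTM values for 39 entries against 20 structures. One way to cross-check that count is to read ipTM only from the rank_001 score files, one per complex. The sketch below assumes ColabFold's *_scores_rank_001_*.json naming and an "iptm" key in each file; both should be verified against the actual files in colabfold_output. The function read_rank1_iptm is hypothetical and is not the pipeline's own parser.

```python
import json
import tempfile
from pathlib import Path


def read_rank1_iptm(colabfold_out: Path) -> dict:
    """Map complex id -> ipTM, reading only rank_001 score JSON files.
    Files without an 'iptm' key are skipped."""
    marker = "_scores_rank_001_"
    result = {}
    for js in sorted(Path(colabfold_out).glob(f"*{marker}*.json")):
        complex_id = js.name.split(marker)[0]
        with open(js, "r") as f:
            data = json.load(f)
        iptm = data.get("iptm")
        if iptm is not None:
            result[complex_id] = float(iptm)
    return result


# Demo with synthetic score files (values are made up for illustration)
with tempfile.TemporaryDirectory() as tmp:
    out = Path(tmp)
    suffix = "alphafold2_multimer_v3_model_1_seed_000.json"
    (out / f"complex_0_scores_rank_001_{suffix}").write_text(json.dumps({"iptm": 0.5}))
    (out / f"complex_0_scores_rank_002_{suffix}").write_text(json.dumps({"iptm": 0.25}))
    print(read_rank1_iptm(out))  # {'complex_0': 0.5}
```

Comparing len(read_rank1_iptm(folders["colabfold_out"])) with len(refined_rank1_pdbs) would show whether each structure has exactly one rank_001 ipTM value.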