#### Environment

conda create -n qwen3 python=3.10   
conda activate qwen3   
pip install torch==2.6.0 torchvision==0.21.0 torchaudio==2.6.0 --index-url https://download.pytorch.org/whl/cu124    ## pytorch    
pip install transformers accelerate bitsandbytes sentencepiece protobuf   ## LLM libraries    
pip install huggingface_hub     # Hugging Face login    
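
After the installs, it can help to confirm that the pinned PyTorch stack and the unpinned LLM libraries actually landed in the `qwen3` environment. The sketch below uses only the standard-library `importlib.metadata`. It assumes that CUDA wheels report versions such as `2.6.0+cu124`, so it compares only the part before `+`. It does not check CUDA itself.

```python
from importlib import metadata

# Versions pinned by the pip commands above; the other packages are unpinned
PINNED = {"torch": "2.6.0", "torchvision": "0.21.0", "torchaudio": "2.6.0"}
UNPINNED = ["transformers", "accelerate", "bitsandbytes", "sentencepiece",
            "protobuf", "huggingface_hub"]


def installed_version(package):
    """Return the installed distribution version, or None if it is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def check_environment():
    """List human-readable problems; an empty list means everything was found."""
    problems = []
    for pkg, wanted in PINNED.items():
        got = installed_version(pkg)
        # CUDA builds carry a local tag like "+cu124", so compare the base version
        if got is None or got.split("+")[0] != wanted:
            problems.append(f"{pkg}: expected {wanted}, found {got}")
    for pkg in UNPINNED:
        if installed_version(pkg) is None:
            problems.append(f"{pkg}: not installed")
    return problems


for line in check_environment() or ["environment looks OK"]:
    print(line)
```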

#### Model selection

- Estimate the GPU memory you need from the model's file size
- e.g., Qwen3-14B on two RTX 3090 GPUs
- Downloads are faster if you have a Hugging Face token
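
The file-size rule of thumb amounts to: weight memory ≈ parameter count × bytes per parameter. Here is a minimal sketch. It assumes Qwen3-14B has roughly 14.8B parameters and takes the 23.6 GiB per RTX 3090 reported by `torch.cuda.get_device_properties` in the cell below. The `overhead=1.2` factor is a hypothetical allowance for the KV cache and activations. Long generations such as `max_tokens=30000` can need considerably more.

```python
import math

# Approximate storage size of one parameter at each load precision
BYTES_PER_PARAM = {"fp32": 4.0, "fp16": 2.0, "bf16": 2.0, "int8": 1.0, "int4": 0.5}


def weight_memory_gib(num_params, dtype="fp16"):
    """Memory needed for the weights alone, in GiB."""
    return num_params * BYTES_PER_PARAM[dtype] / 1024**3


def gpus_needed(num_params, gpu_mem_gib, dtype="fp16", overhead=1.2):
    """Number of identical GPUs needed; overhead is a rough, assumed safety margin."""
    return math.ceil(weight_memory_gib(num_params, dtype) * overhead / gpu_mem_gib)


# Assumed figures: ~14.8B parameters for Qwen3-14B, 23.6 GiB per RTX 3090
print(f"{weight_memory_gib(14.8e9):.1f} GiB of fp16 weights")            # 27.6 GiB
print(gpus_needed(14.8e9, 23.6), "x RTX 3090 at fp16")                   # 2
print(gpus_needed(14.8e9, 23.6, dtype="int4"), "x RTX 3090 at 4-bit")    # 1
```

At fp16 the weights alone exceed one 24 GB card, which is why the model is split across two GPUs with `device_map="auto"`.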

In [1]:
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch
import gc
import os

model_name = "Qwen/Qwen3-14B"
hf_token = os.environ.get("HF_TOKEN")  # read the token from the environment instead of hard-coding it

# Check available GPUs
num_gpus = torch.cuda.device_count()
print(f"GPUs available: {num_gpus}")
for i in range(num_gpus):
    props = torch.cuda.get_device_properties(i)
    print(f"  GPU {i}: {props.name} - {props.total_memory / 1024**3:.1f}GB")

# Free GPU memory (collect Python garbage first so released tensors can be returned to the cache)
print("Cleaning GPU memory...")
gc.collect()
torch.cuda.empty_cache()

# Tokenizer
print("Step 1/2: Loading tokenizer...")
tokenizer = AutoTokenizer.from_pretrained(
    model_name,
    token=hf_token,
    trust_remote_code=True
)
print("  Tokenizer loaded")

# Model
print("Step 2/2: Loading model (this takes 3-5 minutes)...")

model = AutoModelForCausalLM.from_pretrained(
    model_name,
    token=hf_token,
    torch_dtype=torch.float16,  # newer transformers releases call this argument `dtype`
    device_map="auto",
    trust_remote_code=True,
    low_cpu_mem_usage=True
)



GPUs available: 2
  GPU 0: NVIDIA GeForce RTX 3090 - 23.6GB
  GPU 1: NVIDIA GeForce RTX 3090 - 23.6GB
Cleaning GPU memory...
Step 1/2: Loading tokenizer...
  Tokenizer loaded
Step 2/2: Loading model (this takes 3-5 minutes)...


`torch_dtype` is deprecated! Use `dtype` instead!
Loading checkpoint shards: 100%|██████████| 8/8 [00:09<00:00,  1.21s/it]
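
With `device_map="auto"`, accelerate decides which layers go on which GPU and records the result in `model.hf_device_map`. The helper below is a small sketch for summarizing that mapping and spotting any CPU or disk offload, which would make generation much slower. The example dictionary is made up to show the shape. In the notebook you would pass `model.hf_device_map` instead.

```python
from collections import Counter


def summarize_device_map(device_map):
    """Count modules per device and list any CPU/disk offload targets."""
    counts = Counter(str(device) for device in device_map.values())
    offloaded = sorted(d for d in counts if d in ("cpu", "disk"))
    return dict(counts), offloaded


# In the notebook: summarize_device_map(model.hf_device_map)
# Hypothetical mapping with the same shape as that attribute
example_map = {
    "model.embed_tokens": 0,
    "model.layers.0": 0,
    "model.layers.39": 1,
    "model.norm": 1,
    "lm_head": 1,
}
counts, offloaded = summarize_device_map(example_map)
print(counts)     # {'0': 2, '1': 3}
print(offloaded)  # []
```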


In [4]:
def chat(model, message, max_tokens=2000, temperature=0.7):
    messages = [
        {"role": "system", "content": "You are a clinical genetics expert specializing in ACMG/AMP variant interpretation."},
        {"role": "user", "content": message}
    ]
    text = tokenizer.apply_chat_template(
        messages,
        tokenize=False,
        add_generation_prompt=True
    )

    inputs = tokenizer([text], return_tensors="pt").to(model.device)

    print("Generating response...")

    with torch.no_grad():
        # Release cached GPU memory before generating
        torch.cuda.empty_cache()

        outputs = model.generate(
            **inputs,
            max_new_tokens=max_tokens,
            temperature=temperature,
            top_p=0.9,
            do_sample=True,
            pad_token_id=tokenizer.eos_token_id
        )

    # Slice off the prompt tokens so only the newly generated text is decoded
    prompt_len = inputs["input_ids"].shape[1]

    response = tokenizer.decode(
        outputs[0][prompt_len:],
        skip_special_tokens=True
    )

    return response
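
Qwen3 runs in thinking mode by default, so the decoded text starts with a `<think>...</think>` reasoning block. This block survives `skip_special_tokens=True`, as the output of `analyze_variant` further down shows. Qwen3's chat template also accepts `enable_thinking=False` in `apply_chat_template` to turn this off. If you keep thinking on, a small post-processing step can separate the reasoning from the answer. The sketch below assumes the tags appear literally in the decoded string. It also handles the case where generation stops before `</think>`, for example when `max_new_tokens` is reached.

```python
def split_thinking(response):
    """Split Qwen3 output into (reasoning, answer) around the closing </think> tag."""
    text = response.strip()
    head, sep, tail = text.partition("</think>")
    if sep:
        return head.replace("<think>", "", 1).strip(), tail.strip()
    if text.startswith("<think>"):
        # Generation stopped inside the reasoning block, so there is no answer yet
        return text[len("<think>"):].strip(), ""
    # Thinking disabled: the whole response is the answer
    return "", text


reasoning, answer = split_thinking("<think>\nCheck PVS1 first.\n</think>\n\nLikely pathogenic")
print(answer)  # Likely pathogenic
```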


In [5]:
output = chat(model, "What are the ACMG/AMP 2015 guidelines?", max_tokens=4000)
print(output)

Generating response...


In [9]:
def analyze_variant(model, variant_info):
    """Ask the model for an ACMG/AMP 2015 classification of one variant."""
    prompt = f"""You are a clinical geneticist who interprets variants according to the ACMG/AMP 2015 guidelines and the ClinGen Sequence Variant Interpretation (SVI) recommendations.

Variant information:
Gene: {variant_info.get('gene')}
Variant: {variant_info.get('variant')}
Type: {variant_info.get('type')}
cDNA: {variant_info.get('cdna', 'N/A')}
Protein: {variant_info.get('protein', 'N/A')}
Protein domain: {variant_info.get('protein_domain', 'N/A')}
Known hotspot: {variant_info.get('is_hotspot', 'N/A')}

Evidence:
Population data:
- gnomAD AF (overall): {variant_info.get('gnomad_af', 'Unknown')}
- gnomAD AF (popmax): {variant_info.get('gnomad_popmax', 'Unknown')}
- gnomAD homozygotes: {variant_info.get('gnomad_hom', 'Unknown')}

Database evidence:
- ClinVar: {variant_info.get('clinvar', 'Not found')}
- HGMD: {variant_info.get('hgmd', 'Not available')}

Computational predictions:
- CADD: {variant_info.get('cadd', 'N/A')}
- REVEL: {variant_info.get('revel', 'N/A')}
- SIFT: {variant_info.get('sift', 'N/A')}
- PolyPhen-2 (HVAR): {variant_info.get('polyphen2_hvar', 'N/A')}
- SpliceAI: {variant_info.get('spliceai', 'N/A')}
- Conservation (phyloP): {variant_info.get('phylop', 'N/A')}

Functional studies:
- In vitro data: {variant_info.get('functional', 'N/A')}
- In vivo data: {variant_info.get('in_vivo', 'N/A')}

Segregation and case data:
- Segregation: {variant_info.get('segregation', 'N/A')}
- De novo: {variant_info.get('de_novo', 'N/A')}
- Case-control studies: {variant_info.get('case_control', 'N/A')}

Task: Systematically evaluate the ACMG/AMP 2015 criteria. For each criterion (PVS1, PS1-4, PM1-6, PP1-5, BA1, BS1-4, BP1-7), state whether it is met, citing the evidence and your rationale. Apply the ClinGen SVI recommendations for PS3/BS3 and for adjusting PVS1 strength. Give a final classification (Pathogenic / Likely pathogenic / VUS / Likely benign / Benign) together with your confidence and a clinical interpretation."""

    return chat(model, prompt, max_tokens=30000, temperature=0.2)
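
The prompt asks the model for a final class. The ACMG/AMP 2015 rules for combining criteria are deterministic, so one way to cross-check that answer is to recompute the class from the criteria the model says are met. The sketch below encodes the 2015 combining table. Two choices in it are assumptions. It parses strength modifiers written in ClinGen's `PM2_Supporting` style. It also treats evidence as "contradictory" only when both the pathogenic and the benign side independently reach a classification. It does not implement later ClinGen SVI refinements such as the point-based framework.

```python
from collections import Counter

# Default strength implied by a criterion's prefix (ACMG/AMP 2015)
DEFAULT_STRENGTH = {
    "PVS": "very_strong", "PS": "strong", "PM": "moderate", "PP": "supporting",
    "BA": "stand_alone", "BS": "strong", "BP": "supporting",
}
# Assumed ClinGen-style suffixes for a modified strength, e.g. "PM2_Supporting"
SUFFIX_STRENGTH = {
    "VeryStrong": "very_strong", "Strong": "strong",
    "Moderate": "moderate", "Supporting": "supporting",
}


def parse_criterion(code):
    """Split a code such as "PS3" or "PM2_Supporting" into (prefix, strength)."""
    base, _, suffix = code.partition("_")
    prefix = base.rstrip("0123456789")
    strength = SUFFIX_STRENGTH[suffix] if suffix else DEFAULT_STRENGTH[prefix]
    return prefix, strength


def classify(criteria):
    """Combine met criteria into a class using the 2015 combining rules."""
    path, ben = Counter(), Counter()
    for code in criteria:
        prefix, strength = parse_criterion(code)
        (path if prefix.startswith("P") else ben)[strength] += 1

    vs, s = path["very_strong"], path["strong"]
    m, p = path["moderate"], path["supporting"]
    pathogenic = (
        (vs >= 1 and (s >= 1 or m >= 2 or (m >= 1 and p >= 1) or p >= 2))
        or s >= 2
        or (s >= 1 and (m >= 3 or (m >= 2 and p >= 2) or (m >= 1 and p >= 4)))
    )
    likely_pathogenic = (
        (vs >= 1 and m >= 1)
        or (s >= 1 and m >= 1)
        or (s >= 1 and p >= 2)
        or m >= 3
        or (m >= 2 and p >= 2)
        or (m >= 1 and p >= 4)
    )
    benign = ben["stand_alone"] >= 1 or ben["strong"] >= 2
    likely_benign = (ben["strong"] >= 1 and ben["supporting"] >= 1) or ben["supporting"] >= 2

    # Assumption: "contradictory" means both sides independently reach a classification
    if (pathogenic or likely_pathogenic) and (benign or likely_benign):
        return "Uncertain significance"
    if pathogenic:
        return "Pathogenic"
    if likely_pathogenic:
        return "Likely pathogenic"
    if benign:
        return "Benign"
    if likely_benign:
        return "Likely benign"
    return "Uncertain significance"


print(classify(["PS3", "PM1", "PM2", "PP3"]))  # Likely pathogenic
```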



In [10]:
variant = {
    'gene': 'TP53',
    'variant': 'c.818G>A',
    'type': 'missense',
    'cdna': 'c.818G>A',
    'protein': 'p.Arg273His',
    'gnomad_af': '0.000008',
    'clinvar': 'Pathogenic',
    'cadd': '28.5',
    'revel': '0.95',
    'sift': 'deleterious',
    'polyphen2_hvar': 'probably_damaging',
    'protein_domain': 'DNA binding domain',
    'is_hotspot': 'Yes',
    'functional': 'Loss of transactivation activity',
}

output = analyze_variant(model, variant)
print(output)

Generating response...
<think>
Okay, let's tackle this variant interpretation for TP53 c.818G>A (p.Arg273His). First, I need to recall the ACMG/AMP 2015 guidelines and the ClinGen SVI recommendations. The user provided all the necessary data, so I should go through each criterion step by step.

Starting with PVS1: This is for variants that cause a premature termination codon. But this is a missense variant, so PVS1 doesn't apply here. Next, PS1-4. PS1 is for a variant in a gene where all disease-causing variants are loss-of-function. TP53 is known for LOF variants causing Li-Fraumeni syndrome, but this is a missense. However, some missense variants in TP53 can be pathogenic. PS2 is for a variant in a critical and well-established functional domain. The Arg273 residue is in the DNA-binding domain of p53, which is crucial. So PS2 might apply. PS3 is for segregation data, but the user said it's not available. PS4 is for population data, but gnomAD AF is very low (0.000008), which is a str
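
The reasoning excerpt above describes several criteria differently from the 2015 guideline. In the guideline, PS1 means the same amino acid change as an established pathogenic variant. PS2 is confirmed de novo occurrence, PS3 is functional evidence, and PS4 is case-control prevalence. The domain or hotspot argument belongs to PM1, and segregation is PP1. When reviewing such output, it can help to pull out the criterion codes the model cites and set them beside the published definitions. The sketch below does this with a regular expression. The definitions are abbreviated paraphrases of the 2015 text for the codes discussed above, not the full table.

```python
import re

# Abbreviated ACMG/AMP 2015 definitions for the criteria discussed in the excerpt
ACMG_2015 = {
    "PVS1": "Null variant (nonsense, frameshift, canonical +/-1 or 2 splice site, initiation codon, "
            "single or multi-exon deletion) in a gene where loss of function is a known disease mechanism",
    "PS1": "Same amino acid change as an established pathogenic variant, regardless of nucleotide change",
    "PS2": "De novo (maternity and paternity confirmed) in a patient with the disease and no family history",
    "PS3": "Well-established in vitro or in vivo functional studies supportive of a damaging effect",
    "PS4": "Prevalence in affected individuals significantly increased compared with controls",
    "PM1": "Located in a mutational hot spot and/or critical, well-established functional domain "
           "without benign variation",
    "PM2": "Absent from controls (or at extremely low frequency if recessive) in population databases",
    "PP1": "Cosegregation with disease in multiple affected family members",
}

# Matches valid 2015 codes, optionally followed by a strength suffix such as "_Supporting"
CODE_RE = re.compile(r"\b(PVS1|PS[1-4]|PM[1-6]|PP[1-5]|BA1|BS[1-4]|BP[1-7])(?:_[A-Za-z]+)?\b")


def cited_criteria(text):
    """Return the criterion codes mentioned in text, in first-seen order."""
    seen = []
    for code in CODE_RE.findall(text):
        if code not in seen:
            seen.append(code)
    return seen


excerpt = "Starting with PVS1 ... PS2 is for a variant in a domain ... PS3 is for segregation ... PM2_Supporting"
for code in cited_criteria(excerpt):
    print(code, "->", ACMG_2015.get(code, "(definition not listed here)"))
```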