<a href="https://colab.research.google.com/github/jihyeyu33/LLM-Interactive-Clarification/blob/main/04_demo/ambiguity_classification_demo.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [1]:
!nvidia-smi
!pip install -q transformers accelerate bitsandbytes gradio peft

Mon Dec  8 09:15:12 2025       
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.54.15              Driver Version: 550.54.15      CUDA Version: 12.4     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|   0  NVIDIA A100-SXM4-80GB          Off |   00000000:00:05.0 Off |                    0 |
| N/A   31C    P0             55W /  400W |       0MiB /  81920MiB |      0%      Default |
|                                         |                        |             Disabled |
+-----------------------------------------+------------------------+----------------------+
                                                

In [2]:
!huggingface-cli login

    To log in, `huggingface_hub` requires a token generated from https://huggingface.co/settings/tokens .
Enter your token (input will not be visible): 
Add token as git credential? (Y/n) y
Token is valid (permission: write).
The token `ambiguous-question-generator` has been saved to /root/.cache/huggingface/stored_tokens
[1m[31mCannot authenticate through git-credential as no helper is defined on your machine.
You might 

In [3]:
import torch

print(f"GPU available: {torch.cuda.is_available()}")
print(f"GPU name: {torch.cuda.get_device_name(0) if torch.cuda.is_available() else 'N/A'}")

GPU available: True
GPU name: NVIDIA A100-SXM4-80GB


In [4]:
# Model configuration
CLASSIFY_BASE_MODEL = "microsoft/Phi-4-mini-reasoning"
CLASSIFY_LORA_ADAPTER = "jyering/classify-ambig"

QUESTION_GEN_BASE_MODEL = "microsoft/Phi-4-mini-reasoning"
QUESTION_GEN_LORA_ADAPTER = "Dayeoni/question-generator-dpo"

ANSWER_GEN_MODEL = "meta-llama/Llama-2-7b-chat-hf"

# System prompt for the classification model
CLASSIFY_SYSTEM_PROMPT = """You are an AI system that determines if the question requires clarification and classifies the ambiguity.

Task:
1. Determine if the question requires clarification: clear(no clarification needed) or ambiguous(clarification needed)
2. Classify the ambiguity:
 - If question is clear, set category=NONE and subclass=NONE
 - If question is ambiguous, classify category and subclass

Output format: category|subclass

Categories:
- EM (Epistemic Misalignment): Questions with unfamiliar entities or self-contradictions
- LA (Linguistic Ambiguity): Questions with lexical or semantic ambiguity
- AO (Aleatoric Output): Questions with missing contextual information causing confusion
- NONE: Clear questions that don't require clarification

Subclasses:
For EM:
- UNF (UNFAMILIAR): Query contains unfamiliar entities or facts
- CONT (CONTRADICTION): Query contains self-contradictions

For LA:
- LEX (LEXICAL): Query contains terms with multiple meanings
- SEM (SEMANTIC): Query lacks context leading to multiple interpretations

For AO:
- WHOM: Query output contains confusion due to missing personal elements
- WHEN: Query output contains confusion due to missing temporal elements
- WHERE: Query output contains confusion due to missing spatial elements
- WHAT: Query output contains confusion due to missing task-specific elements
"""

# System prompt for the question generation model
QUESTION_GEN_SYSTEM_PROMPT = """You are an AI that generates a single, concise clarifying question when a user's query is ambiguous.

Task:
Generate exactly one clarifying question based on the ambiguity type.

Output format: One clarifying question

Categories:
- EM (Epistemic Misalignment): Questions with unfamiliar entities or self-contradictions
- LA (Linguistic Ambiguity): Questions with lexical or semantic ambiguity
- AO (Aleatoric Output): Questions with missing contextual information causing confusion

Subclasses:
For EM:
- UNF (UNFAMILIAR): Query contains unfamiliar entities or facts
- CONT (CONTRADICTION): Query contains self-contradictions

For LA:
- LEX (LEXICAL): Query contains terms with multiple meanings
- SEM (SEMANTIC): Query lacks context leading to multiple interpretations

For AO:
- WHOM: Query output contains confusion due to missing personal elements
- WHEN: Query output contains confusion due to missing temporal elements
- WHERE: Query output contains confusion due to missing spatial elements
- WHAT: Query output contains confusion due to missing task-specific elements"""

print("Configuration complete")

Configuration complete
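
The classification prompt above fixes the output format as `category|subclass` and lists the allowed labels, so the raw model output can be validated before it is used for routing. The following is a minimal sketch of such a check, not part of the original notebook: `VALID_LABELS` and `parse_classification` are hypothetical names, and the sketch assumes the model returns only the label string with no surrounding explanation, which may not always hold.

```python
# Allowed subclasses per category, copied from the classification prompt.
VALID_LABELS = {
    "EM": {"UNF", "CONT"},
    "LA": {"LEX", "SEM"},
    "AO": {"WHOM", "WHEN", "WHERE", "WHAT"},
    "NONE": {"NONE"},
}


def parse_classification(raw):
    """Return (category, subclass) if `raw` matches the taxonomy, else None."""
    parts = [part.strip().upper() for part in raw.strip().split("|")]
    if len(parts) != 2:
        return None  # not in "category|subclass" form
    category, subclass = parts
    if subclass not in VALID_LABELS.get(category, set()):
        return None  # unknown category, or subclass from another category
    return category, subclass


print(parse_classification("AO|WHERE"))
print(parse_classification("EM|WHERE"))
```

Returning `None` for malformed output leaves the caller free to decide how to handle it, for example by retrying or by falling back to a clarifying question.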


In [6]:
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel
import torch


# Classification model
print("Loading classification model...")
classify_tokenizer = AutoTokenizer.from_pretrained(
    CLASSIFY_BASE_MODEL,
    trust_remote_code=True
)

classify_base_model = AutoModelForCausalLM.from_pretrained(
    CLASSIFY_BASE_MODEL,
    torch_dtype=torch.float16,
    device_map="auto",
    trust_remote_code=True
)

# Attach the LoRA adapter, then merge its weights into the base model
# so that inference runs on a plain model without the PEFT wrapper.
classify_model = PeftModel.from_pretrained(
    classify_base_model,
    CLASSIFY_LORA_ADAPTER
)
classify_model = classify_model.merge_and_unload()
print("Classification model loaded\n")

# Question generation model (same base model, different LoRA adapter)
print("Loading question generation model...")
question_gen_tokenizer = AutoTokenizer.from_pretrained(
    QUESTION_GEN_BASE_MODEL,
    trust_remote_code=True
)

question_gen_base_model = AutoModelForCausalLM.from_pretrained(
    QUESTION_GEN_BASE_MODEL,
    torch_dtype=torch.float16,
    device_map="auto",
    trust_remote_code=True
)

question_gen_model = PeftModel.from_pretrained(
    question_gen_base_model,
    QUESTION_GEN_LORA_ADAPTER
)
question_gen_model = question_gen_model.merge_and_unload()
print("Question generation model loaded\n")

# Answer generation model (no adapter)
print("Loading answer generation model...")
answer_gen_tokenizer = AutoTokenizer.from_pretrained(ANSWER_GEN_MODEL)
answer_gen_model = AutoModelForCausalLM.from_pretrained(
    ANSWER_GEN_MODEL,
    torch_dtype=torch.float16,
    device_map="auto"
)
print("Answer generation model loaded\n")

print(f"All models loaded (GPU memory: {torch.cuda.memory_allocated() / 1024**3:.2f} GB)")

Loading classification model...

`torch_dtype` is deprecated! Use `dtype` instead!

Classification model loaded

Loading question generation model...

Question generation model loaded

Loading answer generation model...

Answer generation model loaded

All models loaded (GPU memory: 26.85 GB)
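
As a rough cross-check of the 26.85 GB figure, the checkpoint shard sizes reported by the download progress bars in this run (4.90 GB + 2.77 GB for `microsoft/Phi-4-mini-reasoning`, 9.98 GB + 3.50 GB for `meta-llama/Llama-2-7b-chat-hf`) can be added up, counting the Phi-4 base twice because it is loaded once per adapter. The sketch below assumes that the progress bars report decimal gigabytes, that the fp16 weights take about as much GPU memory as the files on disk, and that merging the LoRA adapters adds no parameters. The helper name `gib` is illustrative.

```python
# Shard sizes in decimal GB, as reported by the download progress bars in this run.
PHI4_MINI_SHARDS_GB = [4.90, 2.77]
LLAMA2_7B_SHARDS_GB = [9.98, 3.50]


def gib(decimal_gb):
    """Convert decimal gigabytes (10**9 bytes) to GiB (1024**3 bytes)."""
    return decimal_gb * 10**9 / 1024**3


# The Phi-4 base is loaded twice: once for the classifier, once for the question generator.
total_gb = 2 * sum(PHI4_MINI_SHARDS_GB) + sum(LLAMA2_7B_SHARDS_GB)
print(f"estimated weights: {total_gb:.2f} GB = {gib(total_gb):.2f} GiB")
# prints: estimated weights: 28.82 GB = 26.84 GiB
```

The estimate is within about 0.01 of the reported value. This is consistent with `torch.cuda.memory_allocated() / 1024**3` being a GiB quantity even though the print statement labels it GB, and with model weights accounting for nearly all of the allocated memory.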


In [7]:
def classify_ambiguity(query):
    """Classify a query and return the model's raw 'category|subclass' string."""
    messages = [
        {"role": "system", "content": CLASSIFY_SYSTEM_PROMPT},
        {"role": "user", "content": query}
    ]

    input_ids = classify_tokenizer.apply_chat_template(
        messages,
        add_generation_prompt=True,
        return_tensors="pt"
    ).to(classify_model.device)

    outputs = classify_model.generate(
        input_ids,
        max_new_tokens=50,
        do_sample=False,
        pad_token_id=classify_tokenizer.eos_token_id
    )

    # generate() returns prompt + completion, so decode only the tokens after the prompt.
    response = classify_tokenizer.decode(
        outputs[0][input_ids.shape[1]:],
        skip_special_tokens=True
    )

    return response.strip()


def generate_clarifying_question(classification, query):
    """Generate one clarifying question for a query, given its classification label."""
    user_input = f"[{classification}] {query}"

    messages = [
        {"role": "system", "content": QUESTION_GEN_SYSTEM_PROMPT},
        {"role": "user", "content": user_input}
    ]

    input_ids = question_gen_tokenizer.apply_chat_template(
        messages,
        add_generation_prompt=True,
        return_tensors="pt"
    ).to(question_gen_model.device)

    outputs = question_gen_model.generate(
        input_ids,
        max_new_tokens=100,
        do_sample=False,
        pad_token_id=question_gen_tokenizer.eos_token_id
    )

    response = question_gen_tokenizer.decode(
        outputs[0][input_ids.shape[1]:],
        skip_special_tokens=True
    )

    return response.strip()


def generate_answer(query):
    """Generate a direct answer to a query with the answer model."""
    messages = [{"role": "user", "content": query}]

    input_ids = answer_gen_tokenizer.apply_chat_template(
        messages,
        add_generation_prompt=True,
        return_tensors="pt"
    ).to(answer_gen_model.device)

    outputs = answer_gen_model.generate(
        input_ids,
        max_new_tokens=200,
        do_sample=True,
        temperature=0.7,
        pad_token_id=answer_gen_tokenizer.eos_token_id
    )

    response = answer_gen_tokenizer.decode(
        outputs[0][input_ids.shape[1]:],
        skip_special_tokens=True
    )

    return response.strip()


def is_ambiguous(classification):
    """Return True unless the classification string contains the label NONE."""
    return "NONE" not in classification.upper()


print("Inference functions defined")

Inference functions defined
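
The three generation functions above repeat the same steps: apply the chat template, generate, strip the prompt tokens, and decode. One possible refactoring, shown here only as a sketch, folds them into a single helper. `generate_reply` is a hypothetical name that does not appear in the notebook; it uses only the tokenizer and model calls already used in the cell above.

```python
def generate_reply(model, tokenizer, messages, **gen_kwargs):
    """Run one chat turn and return only the newly generated text."""
    input_ids = tokenizer.apply_chat_template(
        messages,
        add_generation_prompt=True,
        return_tensors="pt",
    ).to(model.device)

    outputs = model.generate(
        input_ids,
        pad_token_id=tokenizer.eos_token_id,
        **gen_kwargs,
    )

    # generate() returns prompt + completion; keep only the completion tokens.
    new_tokens = outputs[0][input_ids.shape[1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True).strip()


# Hypothetical usage, mirroring classify_ambiguity:
# generate_reply(classify_model, classify_tokenizer, messages,
#                max_new_tokens=50, do_sample=False)
```

With such a helper, each of the three functions would reduce to building its `messages` list and choosing its generation settings.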


In [8]:
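def process_query(user_query):
    """Classify the query, then route it to answer generation or clarifying-question generation."""
    print(f"\n{'='*60}")
    print(f"Input: {user_query}")
    print(f"{'='*60}\n")

    classification = classify_ambiguity(user_query)
    print(f"Classification result: {classification}")

    if not is_ambiguous(classification):
        print("Routing: answer generation\n")
        output = generate_answer(user_query)
    else:
        print("Routing: clarifying question generation\n")
        output = generate_clarifying_question(classification, user_query)

    print(f"Final answer: {output}\n")

    # The Korean keys are kept as-is because gradio_interface looks them up by name.
    return {
        "분류 결과": classification,  # "classification result"
        "출력": output  # "output"
    }

print("Pipeline function defined")

Pipeline function defined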
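
The routing logic in `process_query` can be exercised without loading any model if the three stages are passed in as plain callables. The sketch below is an illustration under that assumption, not the notebook's implementation: `route_query` and the stub lambdas are hypothetical, and the NONE check mirrors `is_ambiguous`.

```python
def route_query(query, classify_fn, answer_fn, clarify_fn):
    """Classify `query`, then call exactly one of the two generators."""
    classification = classify_fn(query)
    # Same rule as is_ambiguous(): a label containing NONE means the query is clear.
    if "NONE" in classification.upper():
        route, output = "answer", answer_fn(query)
    else:
        route, output = "clarify", clarify_fn(classification, query)
    return {"classification": classification, "route": route, "output": output}


# Stubs stand in for the three models.
result = route_query(
    "What time does the store open?",
    classify_fn=lambda q: "AO|WHERE",
    answer_fn=lambda q: "stub answer",
    clarify_fn=lambda label, q: f"stub clarifying question for {label}",
)
print(result["route"])  # prints: clarify
```

Passing the real `classify_ambiguity`, `generate_answer`, and `generate_clarifying_question` functions in place of the stubs would give the same routing behavior as `process_query`, minus the progress prints.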


## Demo Interface

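An interactive demo built with Gradio. When you enter a question, it:
1. Classifies the ambiguity of the question
2. Generates either a clarifying question or a direct answer, depending on the classification result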

In [9]:
import gradio as gr

def gradio_interface(query):
    result = process_query(query)
    return (
        result["분류 결과"],
        result["출력"]
    )

demo = gr.Interface(
    fn=gradio_interface,
    inputs=gr.Textbox(
        label="Question",
        placeholder="e.g. Give me a list of good coffee shops?",
        lines=2
    ),
    outputs=[
        gr.Textbox(label="Classification result"),
        gr.Textbox(label="Final answer", lines=5)
    ],
    title="Ambiguous Query Handler",
    description="Classifies an ambiguous question and generates either a clarifying question or an answer.",
    examples=[
        ["Give me a list of good coffee shops?"],
        ["What time does the store open?"],
        ["What is the capital of France?"],
    ]
)

demo.launch(share=True)

Colab notebook detected. To show errors in colab notebook, set debug=True in launch()
* Running on public URL: https://27cef64a1f7742a458.gradio.live

This share link expires in 1 week. For free permanent hosting and GPU upgrades, run `gradio deploy` from the terminal in the working directory to deploy to Hugging Face Spaces (https://huggingface.co/spaces)
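
Because the demo is reachable through a public share link, it may be worth rejecting blank submissions before they reach the models. The following is a small sketch of such a guard, assuming the handler returns a dict with the same two keys as `process_query`. `guarded_interface` and its messages are hypothetical and not part of the original notebook.

```python
def guarded_interface(query, handler):
    """Reject blank input; otherwise run `handler` and unpack its result for Gradio."""
    if not query or not query.strip():
        # Skip the expensive model calls for empty or whitespace-only input.
        return "N/A", "Please enter a question."
    result = handler(query.strip())
    return result["분류 결과"], result["출력"]


# Hypothetical wiring in place of gradio_interface:
# demo = gr.Interface(fn=lambda q: guarded_interface(q, process_query), ...)
```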


