# Level 2 - Week 4 - 03 Refusal and Clarification

**Estimated time:** 60-90 minutes

## Learning Objectives

- Decide the response mode (`answer`, `clarify`, or `refuse`) from retrieval signals
- Use a deterministic score threshold instead of prompt wording
- Keep the modes explicit in code and logs


## Overview

Don’t rely on prompt wording to decide when the system should refuse.

Implement a deterministic rule in code first, then use prompting only to format the response for the chosen mode.

## Deterministic rules (examples)

- If no chunks retrieved:
  - ask a clarifying question OR refuse
- If top score < threshold:
  - ask a clarifying question
- If chunks conflict:
  - present both with citations OR ask for clarification
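
The rules above can be sketched as one function. This is a minimal illustration, not a definitive implementation: the name `apply_rules`, the `chunks_conflict` flag (how you detect a conflict is left to you), and the reason strings are all assumptions. Where a rule allows two options, the sketch arbitrarily picks `clarify`.

```python
def apply_rules(
    hits: list[dict], threshold: float, chunks_conflict: bool = False
) -> tuple[str, str]:
    """Return (mode, reason) for a set of retrieval hits (hypothetical helper)."""
    # Rule 1: nothing retrieved. The rule allows clarify OR refuse; this sketch clarifies.
    if not hits:
        return "clarify", "no_hits"
    # Rule 2: the best match is too weak. Assumes higher score means more relevant.
    top_score = max(hit.get("score", 0.0) for hit in hits)
    if top_score < threshold:
        return "clarify", "low_score"
    # Rule 3: retrieved chunks disagree. The flag is computed elsewhere (assumption).
    if chunks_conflict:
        return "clarify", "conflict"
    return "answer", "ok"


print(apply_rules([], threshold=0.5))
print(apply_rules([{"score": 0.82}], threshold=0.5))
print(apply_rules([{"score": 0.82}], threshold=0.5, chunks_conflict=True))
```

Returning a reason next to the mode keeps the decision easy to log and to test.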

## Underlying theory: refusal is a decision under uncertainty

Treat the top retrieval score $s$ as a signal of whether the KB is likely to contain relevant content.

A simple decision rule:

$$
\text{mode} =
\begin{cases}
\text{answer} & s \ge \tau \\
\text{clarify/refuse} & s < \tau
\end{cases}
$$

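Here $\tau$ is a threshold you choose. The rule assumes a similarity score where higher means more relevant; for a distance metric such as L2, where lower is better, the comparison flips.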

### False positives vs false negatives

- If $\tau$ is too low:
  - you answer out-of-KB questions (hallucination risk)
- If $\tau$ is too high:
  - you refuse in-KB questions (bad UX)

So threshold choice is a product tradeoff, not a universal constant.
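
One way to make the tradeoff concrete is to track two error rates as functions of $\tau$. The notation here is introduced for illustration and assumes a labeled set with $N_{\text{in}}$ in-KB questions and $N_{\text{out}}$ out-of-KB questions, each with a top score $s_i$:

$$
\text{FA}(\tau) = \frac{\bigl|\{\, i \in \text{out-of-KB} : s_i \ge \tau \,\}\bigr|}{N_{\text{out}}},
\qquad
\text{FR}(\tau) = \frac{\bigl|\{\, i \in \text{in-KB} : s_i < \tau \,\}\bigr|}{N_{\text{in}}}
$$

Raising $\tau$ can only lower the false-answer rate $\text{FA}$ and raise the false-refuse rate $\text{FR}$, so picking $\tau$ amounts to deciding how much of one error you will accept to reduce the other.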

### Why score thresholds are model- and metric-specific

Scores are not comparable across:

- different embedding models
- different distance metrics (cosine vs dot vs L2)
- different vector DB implementations
- different chunking strategies

So any threshold must be calibrated on your own data.
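
To see why a threshold does not transfer between metrics, the sketch below scores the same pair of vectors three ways. The vectors are made-up toy values and the helper names are hypothetical; a real system would take its scores from the vector store.

```python
import math


def dot(a: list[float], b: list[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def cosine(a: list[float], b: list[float]) -> float:
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))


def l2_distance(a: list[float], b: list[float]) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


query = [1.0, 2.0]
chunk = [2.0, 4.0]  # same direction as the query, twice the length

print("cosine:", cosine(query, chunk))       # close to 1.0, bounded to [-1, 1]
print("dot:", dot(query, chunk))             # 10.0, unbounded and scale-dependent
print("l2:", l2_distance(query, chunk))      # about 2.24, and LOWER means closer
```

A threshold such as 0.5 means something different in each case, and for L2 distance the `>=` comparison itself points the wrong way.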

## Recommended modes (make it explicit)

- `answer`: you have enough evidence in retrieved context
- `clarify`: you need the user to specify something to retrieve the right info
- `refuse`: question is out-of-scope/out-of-KB or unsafe to answer without evidence
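
One possible way to keep the modes explicit in code is an enum rather than free-form strings. This is a sketch; the class name `Mode` and the helper `parse_mode` are assumptions, not part of the lesson's sample code.

```python
from enum import Enum


class Mode(str, Enum):
    """The three response modes; the str mixin keeps values JSON-friendly."""

    ANSWER = "answer"
    CLARIFY = "clarify"
    REFUSE = "refuse"


def parse_mode(value: str) -> Mode:
    """Convert a raw string to a Mode; raises ValueError for unknown values."""
    return Mode(value)


print(parse_mode("clarify").value)
```

With an enum, a typo such as `"refused"` fails loudly at the boundary instead of silently creating a fourth mode.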

## Practice Steps

- Implement `decide_mode` based on hits and top score.
- Calibrate a threshold with labeled in-KB vs out-of-KB questions.
- Log `top_score` and `threshold` alongside the chosen mode.
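
The logging step could look like the sketch below, which writes one JSON line per decision using the standard `logging` and `json` modules. The function name `log_decision`, the logger name, and the field layout are assumptions; adapt them to whatever logging setup you already use.

```python
import json
import logging
from typing import Optional

logger = logging.getLogger("rag.mode")  # hypothetical logger name


def log_decision(mode: str, top_score: Optional[float], threshold: float) -> str:
    """Log one decision as a JSON line and return that line."""
    line = json.dumps(
        {"mode": mode, "top_score": top_score, "threshold": threshold},
        sort_keys=True,
    )
    logger.info(line)
    return line


logging.basicConfig(level=logging.INFO)
log_decision("answer", top_score=0.82, threshold=0.5)
log_decision("clarify", top_score=None, threshold=0.5)  # no hits, so no score
```

Logging the threshold with every decision lets you replay old logs against a new threshold without guessing which value was active at the time.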

### Sample code

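Mode decision based on the top score.

```python
def decide_mode(hits: list[dict], threshold: float) -> str:
    """Return 'answer', 'clarify', or 'refuse' from retrieval hits."""
    # No chunks retrieved: ask the user to clarify.
    if not hits:
        return 'clarify'
    # Assumes hits are sorted best first and that higher scores mean more relevant.
    top_score = hits[0].get('score', 0.0)
    return 'answer' if top_score >= threshold else 'refuse'
```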


### Student fill-in

1. Pick a threshold that matches your domain and embedding setup.
2. Test behavior on three kinds of questions:
   - in-KB
   - ambiguous
   - out-of-KB

Expected behavior:

- in-KB → `answer` with citations
- ambiguous → `clarify`
- out-of-KB → `refuse`

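Then record a small labeled set (5–10 items) with the expected mode for each item, and check that your system returns that mode on every run. Note that the sample `decide_mode` returns `clarify` only when no hits come back, so getting `clarify` for ambiguous questions with low-scoring hits requires extending the rule.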

```python
hits_good = [{'score': 0.82}]
hits_bad = [{'score': 0.12}]

# TODO: pick a threshold based on your data
print(decide_mode(hits_good, threshold=0.5))
print(decide_mode(hits_bad, threshold=0.5))
```
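
To cover all three expected behaviors with scores alone, one option is a two-threshold variant plus a small consistency check over a labeled set. Everything in this sketch is an assumption for illustration: the function names `decide_mode_banded` and `check_labeled`, the two threshold parameters, and the placeholder scores, which you should replace with hits from your own system.

```python
def decide_mode_banded(
    hits: list[dict], answer_threshold: float, refuse_threshold: float
) -> str:
    """Hypothetical variant: high score answers, middle band clarifies, low refuses."""
    # With no hits the top score defaults to 0.0, which falls in the refuse band.
    top_score = max((hit.get("score", 0.0) for hit in hits), default=0.0)
    if top_score >= answer_threshold:
        return "answer"
    if top_score >= refuse_threshold:
        return "clarify"
    return "refuse"


# Placeholder items; the scores are illustrative, not measured.
labeled = [
    {"kind": "in-KB", "hits": [{"score": 0.82}], "expected": "answer"},
    {"kind": "ambiguous", "hits": [{"score": 0.45}], "expected": "clarify"},
    {"kind": "out-of-KB", "hits": [{"score": 0.12}], "expected": "refuse"},
]


def check_labeled(
    items: list[dict], answer_threshold: float, refuse_threshold: float
) -> list[tuple[str, str, str]]:
    """Return (kind, expected, got) for every item whose mode is not the expected one."""
    mismatches = []
    for item in items:
        got = decide_mode_banded(item["hits"], answer_threshold, refuse_threshold)
        if got != item["expected"]:
            mismatches.append((item["kind"], item["expected"], got))
    return mismatches


print(check_labeled(labeled, answer_threshold=0.6, refuse_threshold=0.3))
```

An empty list means every labeled item produced its expected mode. Running the check twice with the same inputs should give the same result, which is what "deterministic" means here.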


### Exercise: Threshold calibration from labeled scores

Collect two small lists:

- `in_kb_scores`: top scores for questions you *know* are answerable from the KB
- `out_kb_scores`: top scores for questions you *know* are not in the KB

Your goal is to choose a threshold that reduces out-of-KB answers while keeping in-KB refusals acceptable.

```python
def evaluate_threshold(in_kb_scores: list[float], out_kb_scores: list[float], threshold: float) -> dict:
    """Count both kinds of error at a given threshold."""
    # In-KB questions that fall below the threshold would be refused (bad UX).
    in_kb_refuse = sum(s < threshold for s in in_kb_scores)
    # Out-of-KB questions at or above the threshold would be answered (hallucination risk).
    out_kb_answer = sum(s >= threshold for s in out_kb_scores)
    return {
        "threshold": threshold,
        "in_kb_total": len(in_kb_scores),
        "out_kb_total": len(out_kb_scores),
        "in_kb_refuse": in_kb_refuse,
        "out_kb_answer": out_kb_answer,
    }


# Example scores; replace with top scores collected from your own system.
in_kb_scores = [0.83, 0.79, 0.74, 0.71, 0.68]
out_kb_scores = [0.42, 0.31, 0.28, 0.19, 0.05]

for t in [0.3, 0.5, 0.65, 0.7]:
    print(evaluate_threshold(in_kb_scores, out_kb_scores, threshold=t))
```
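
Once you have the error counts, you still need a rule for picking among candidate thresholds. One possible approach, sketched below, is to weight the two error rates by how costly each is for your product and take the cheapest candidate. The function name `pick_threshold` and the default cost weights are assumptions, not recommended values.

```python
def pick_threshold(
    in_kb_scores: list[float],
    out_kb_scores: list[float],
    candidates: list[float],
    cost_false_answer: float = 2.0,  # assumed weight: answering out-of-KB is worse
    cost_false_refuse: float = 1.0,  # assumed weight: refusing in-KB is annoying
) -> tuple[float, float]:
    """Return (threshold, cost) for the candidate with the lowest weighted error."""
    best = None
    for t in candidates:
        false_refuse = sum(s < t for s in in_kb_scores) / len(in_kb_scores)
        false_answer = sum(s >= t for s in out_kb_scores) / len(out_kb_scores)
        cost = cost_false_answer * false_answer + cost_false_refuse * false_refuse
        # Strict '<' keeps the earliest candidate when costs tie.
        if best is None or cost < best[1]:
            best = (t, cost)
    return best


in_scores = [0.83, 0.79, 0.74, 0.71, 0.68]
out_scores = [0.42, 0.31, 0.28, 0.19, 0.05]
print(pick_threshold(in_scores, out_scores, candidates=[0.3, 0.5, 0.65, 0.7]))
```

With these example scores, any threshold above 0.42 and up to 0.68 separates the two lists perfectly, so several candidates tie at zero cost and the sketch returns the first one. On real data the lists usually overlap, and the cost weights then decide where the threshold lands.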

## Self-check

- Is the mode decision deterministic (the same hits and threshold always give the same mode)?
- Do you log `top_score` and `threshold` alongside the chosen mode?
