
# AI Interviewer — Conversation Flow Manager (LangGraph MVP)
This notebook demonstrates a prototype conversation flow manager for an adaptive AI interviewer built using LangGraph. The system dynamically adjusts its questioning strategy based on a participant’s responses to verify their claimed skills efficiently and accurately.

At its core, the agent maintains a per-skill belief state (a running mean and a confidence band) that represents how confident it is that the respondent truly possesses each skill. Each turn of the conversation updates these beliefs and influences what the next question should be. The goal is to maximise verification accuracy while minimising interview time.
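
One possible way to represent such a belief state is a Beta distribution per skill, whose mean and spread give the running mean and confidence band. The sketch below is an assumption rather than the notebook's implementation: the `SkillBelief` class, the uniform prior, the `z` value and the normal-approximation band are all illustrative choices.

```python
from dataclasses import dataclass
import math


@dataclass
class SkillBelief:
    """Hypothetical per-skill belief: Beta(alpha, beta) over 'has the skill'."""

    alpha: float = 1.0  # pseudo-count of supporting evidence (uniform prior)
    beta: float = 1.0  # pseudo-count of contradicting evidence

    @property
    def mean(self) -> float:
        return self.alpha / (self.alpha + self.beta)

    @property
    def std(self) -> float:
        n = self.alpha + self.beta
        return math.sqrt(self.alpha * self.beta / (n * n * (n + 1)))

    def band(self, z: float = 1.64) -> tuple[float, float]:
        """Approximate confidence band (normal approximation), clipped to [0, 1]."""
        low = max(0.0, self.mean - z * self.std)
        high = min(1.0, self.mean + z * self.std)
        return low, high

    def update(self, evidence: float) -> None:
        """Fold in one graded answer, with evidence already scaled to [0, 1]."""
        self.alpha += evidence
        self.beta += 1.0 - evidence


belief = SkillBelief()
for evidence in (0.75, 1.0, 0.75):  # three fairly strong answers (made-up values)
    belief.update(evidence)
low, high = belief.band()
print(f"mean={belief.mean:.2f} band=({low:.2f}, {high:.2f})")
```

Each answer adds one unit of pseudo-count, so the band narrows as evidence accumulates regardless of whether the answers were good or bad.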

## Objective
Design a conversation manager that can:
1.	Measure skill proficiency adaptively – gauge how well the respondent actually understands a claimed skill.
2.	Balance exploration and exploitation – explore unverified skills while deepening assessment of skills with partial evidence.
3.	Optimise for total verification reward – confirm as many skills as possible with strong evidence in limited turns.
4.	Minimise interview cost – reduce redundant or low-information questions to keep the process short and scalable. 

## System Overview
The system operates as an information-seeking loop:

Ask -> Evaluate -> Update Belief -> Choose Next Question 

Under the hood, it's implemented as a LangGraph state graph with a few key nodes: 

generate_questions → select_question → ask_question → grade_response → update_belief → should_continue


1. **Generate Questions**: Propose a set of questions to ask the respondent. 
2. **Select Question**: Choose the most informative question from the set. 
3. **Ask**: Collect a response from the respondent. 
4. **Evaluate**: Grade the response based on a predefined rubric. 
5. **Update Belief**: Update the belief state based on the response. 
6. **Decide**: Determine what to do next based on the belief state. 

At the end, the system outputs Verification Cards summarising each skill’s confidence, lower-bound certainty, and verification status.
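
A Verification Card could be modelled as a small record per skill. The following is a minimal sketch under stated assumptions: the `VerificationCard` and `make_card` names, the `verify_threshold` of 0.7 and the `min_turns` of 2 are hypothetical, and the numbers in the example are made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VerificationCard:
    """Hypothetical summary of one skill at the end of the interview."""

    skill: str
    confidence: float  # belief mean in [0, 1]
    lower_bound: float  # lower edge of the confidence band
    status: str  # "unverified", "probed" or "verified"


def make_card(skill, confidence, lower_bound, turns_asked,
              verify_threshold=0.7, min_turns=2):
    """Derive the status from the lower bound and the number of turns spent."""
    if turns_asked == 0:
        status = "unverified"
    elif turns_asked >= min_turns and lower_bound >= verify_threshold:
        status = "verified"
    else:
        status = "probed"
    return VerificationCard(skill, round(confidence, 2), round(lower_bound, 2), status)


cards = [
    make_card("ML/Frameworks/PyTorch", 0.88, 0.74, turns_asked=4),
    make_card("Data Science/Frameworks/Pandas", 0.65, 0.41, turns_asked=2),
]
for card in cards:
    print(card)
```

Gating on the lower bound rather than the mean makes verification conservative: a skill with a high mean but a wide band stays "probed".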

## Design Reasoning 
1. Problem challenge: 
Traditional interview scripts are static and waste turns on low-value or redundant questions. Probing every skill exhaustively instead could take 20+ turns.
We need an interviewer that learns during the interaction. 

2. Key insight: 
The interview can be framed as an active learning or adaptive testing process — each question acts as an experiment that reduces uncertainty about the candidate’s expertise.

3. Belief + policy:
Maintain a running estimate of each skill’s confidence and pick the next question that yields the highest expected information gain (uncertainty reduction) within the turn budget. Compared to a brute-force Large Language Model (LLM) approach that simply generates questions for a given skill, this approach learns and adapts within the interview, whilst still being able to verify the skills after the interview.

4. Stopping criterion:
Stop when further questioning provides minimal confidence improvement — efficient, not exhaustive.

5. Proof-of-concept scope:
   - Domain: 3 claimed skills (e.g., PyTorch, LLM Evaluation, CUDA).
   - Interview length: 8–10 turns.
   - Output: dynamic evolution of skill confidence + verification summary.

6. Risks and mitigations:
   - LLM grading bias: Use fixed rubric + schema validation.
   - Overconfidence early: Enforce minimum turns before verification.
   - Verbose or evasive answers: Penalise low specificity in scoring.
   - JSON drift: Strict schema with fallback templates.
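
The belief-plus-policy idea and the stopping criterion can be sketched together. This example is an assumption, not the notebook's policy: it stores each belief as a Beta `(alpha, beta)` pair and uses expected variance reduction as a simple stand-in for expected information gain. The function names and the `min_gain` threshold are hypothetical.

```python
def beta_variance(alpha: float, beta: float) -> float:
    """Variance of a Beta(alpha, beta) distribution."""
    n = alpha + beta
    return alpha * beta / (n * n * (n + 1))


def expected_variance_reduction(alpha: float, beta: float) -> float:
    """Expected drop in variance after observing one pass/fail answer."""
    p_pass = alpha / (alpha + beta)
    expected_after = (
        p_pass * beta_variance(alpha + 1, beta)
        + (1 - p_pass) * beta_variance(alpha, beta + 1)
    )
    return beta_variance(alpha, beta) - expected_after


def pick_next_skill(beliefs: dict[str, tuple[float, float]], min_gain: float = 1e-3):
    """Return the skill to probe next, or None when no skill clears min_gain."""
    gains = {
        skill: expected_variance_reduction(a, b) for skill, (a, b) in beliefs.items()
    }
    best = max(gains, key=gains.get)
    return best if gains[best] >= min_gain else None


# Made-up beliefs: PyTorch already has evidence, CUDA has none yet.
print(pick_next_skill({"PyTorch": (8.0, 2.0), "CUDA": (1.0, 1.0)}))
# Once every belief is tight, the policy signals that the interview can stop.
print(pick_next_skill({"PyTorch": (40.0, 10.0)}))
```

Returning `None` doubles as the stopping rule: when the best available question is expected to move the belief by less than `min_gain`, further turns are not worth their cost.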




Before we start, let's assume that we have reconciled the participant's CV, LinkedIn profile and Google Scholar profile into a single structured profile.  

In [2]:
SKILLS_PROFILE = {
    "ID": "1234567890",
    "NAME": "Jason T",
    "EMAIL": "jason.t@example.com",
    "PHONE": "+1234567890",
    "SKILLS": [
        {
            "taxonomy_id": "ML/Frameworks/PyTorch",
            "raw_aliases": ["pytorch", "pytorch lightning"],
            "experience_years_claimed": 3,
            "last_used_year": 2025,
            "evidence_sources": [
                {
                    "source": "cv",
                    "span": "Led development of a custom PyTorch-based deep learning pipeline for real-time object detection, achieving 95% accuracy while reducing inference time by 40%. Implemented distributed training using PyTorch Lightning across multiple GPUs.",
                    "confidence": 0.86,
                },
                {
                    "source": "linkedin",
                    "span": "Developed and optimized deep learning models using PyTorch for computer vision applications. Created reusable model architectures and training pipelines with PyTorch Lightning that reduced development time by 60%.",
                    "confidence": 0.74,
                },
            ],
        },
        {
            "taxonomy_id": "Data Science/Frameworks/Pandas",
            "raw_aliases": ["pandas"],
            "experience_years_claimed": 2,
            "last_used_year": 2024,
            "evidence_sources": [
                {
                    "source": "cv",
                    "span": "Built data processing pipelines using Pandas to clean and analyze 100GB+ of customer transaction data. Optimized memory usage by 70% through efficient DataFrame operations and custom aggregation functions.",
                    "confidence": 0.92,
                },
                {
                    "source": "linkedin",
                    "span": "Extensive experience with Pandas for data manipulation and analysis. Created automated reporting systems processing millions of rows of time-series data with custom rolling window calculations.",
                    "confidence": 0.88,
                },
            ],
        },
    ],
}
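
The state defined later in the notebook expects a `skills_taxonomy` list and a `skills_evidence_spans` mapping. One way to derive both from a profile of this shape is sketched below. The `extract_inputs` helper is hypothetical, and `demo_profile` is a trimmed stand-in with shortened spans so the snippet runs on its own.

```python
def extract_inputs(profile: dict) -> tuple[list[str], dict[str, list[str]]]:
    """Flatten a profile into a skill list and per-skill evidence spans."""
    skills = profile.get("SKILLS", [])
    taxonomy = [skill["taxonomy_id"] for skill in skills]
    spans = {
        skill["taxonomy_id"]: [src["span"] for src in skill.get("evidence_sources", [])]
        for skill in skills
    }
    return taxonomy, spans


# Trimmed stand-in for SKILLS_PROFILE (spans shortened for illustration).
demo_profile = {
    "SKILLS": [
        {
            "taxonomy_id": "ML/Frameworks/PyTorch",
            "evidence_sources": [
                {"source": "cv", "span": "Led development of a custom PyTorch-based pipeline."},
                {"source": "linkedin", "span": "Developed deep learning models using PyTorch."},
            ],
        },
        {
            "taxonomy_id": "Data Science/Frameworks/Pandas",
            "evidence_sources": [
                {"source": "cv", "span": "Built data processing pipelines using Pandas."},
            ],
        },
    ]
}

taxonomy, spans = extract_inputs(demo_profile)
print(taxonomy)
print({skill: len(items) for skill, items in spans.items()})
```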

In [None]:
from typing import TypedDict, List, Dict
from enum import Enum

# Lifecycle of a claimed skill during the interview.
# Named SkillStatus so it no longer shares a name with the state field below.
SkillStatus = Enum("SkillStatus", ["unverified", "probed", "verified"])


# --- 1. Define the State for the Graph ---
class InterviewState(TypedDict):
    """Represents the state of the interview at any given time."""

    # Inputs - Interview Data
    skills_taxonomy: List[str]
    skills_evidence_spans: Dict[str, List[str]]
    skills_profile: Dict | None

    # Conversation State
    conversation_history: List[str]  # The history of the conversation
    candidate_questions: List["Question"]  # A pool of questions to choose from
    current_question: str  # The current question
    skills_status: Dict[str, SkillStatus]  # The status of each skill

    # Processing State
    turn_count: int  # Number of turns asked so far
    max_turns: int  # Turn budget; read by should_continue_node

    # Verification State
    final_report: str  # The final report of the interview

In [8]:
# --- 2. Define Pydantic models for structured LLM outputs ---
from typing import Dict

from pydantic import BaseModel, Field


class Question(BaseModel):
    """A question to ask the candidate about a specific skill."""

    skill: str = Field(description="The skill this question is designed to test.")
    text: str = Field(description="The text of the question.")
    difficulty: int = Field(
        description="The difficulty of the question, from 1 (easy) to 5 (expert)."
    )


class FollowupQuestion(BaseModel):
    """Adaptive follow-up question conditioned on the last grade."""

    skill: str
    text: str
    difficulty: int
    rationale: str


class Grade(BaseModel):
    """A grade for the candidate's response."""

    score: int = Field(
        description="The score from 1 (poor) to 5 (excellent) based on the rubric."
    )
    reasoning: str = Field(description="A brief justification for the given score.")
    subscores: Dict[str, float] = Field(
        default_factory=dict,
        description="Subscores in [0,1]: explicitness, specificity, recency, competence, consistency",
    )
    confidence: float | None = Field(
        default=None, ge=0.0, le=1.0, description="Weighted confidence in [0,1]"
    )


class LLMGrade(BaseModel):
    """Minimal schema for structured LLM output."""

    score: int
    reasoning: str
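
The `confidence` field on `Grade` is described as a weighted confidence over the five subscores. One plausible reading is a weighted mean, sketched below in plain Python. The weights are hypothetical (the notebook does not specify them), as is the `weighted_confidence` helper. Giving `specificity` a sizeable weight is one way to penalise verbose or evasive answers.

```python
# Hypothetical weights; the notebook does not specify them.
SUBSCORE_WEIGHTS = {
    "explicitness": 0.15,
    "specificity": 0.25,
    "recency": 0.10,
    "competence": 0.35,
    "consistency": 0.15,
}


def weighted_confidence(subscores: dict[str, float]) -> float:
    """Weighted mean of the subscores.

    Missing subscores count as 0 and out-of-range values are clamped to [0, 1].
    """
    total = sum(
        weight * min(1.0, max(0.0, subscores.get(name, 0.0)))
        for name, weight in SUBSCORE_WEIGHTS.items()
    )
    return total / sum(SUBSCORE_WEIGHTS.values())


# Made-up subscores for an answer that is fluent but vague.
vague_answer = {
    "explicitness": 0.8,
    "specificity": 0.2,
    "recency": 0.6,
    "competence": 0.5,
    "consistency": 0.9,
}
print(round(weighted_confidence(vague_answer), 2))
```

The result always lies in [0, 1], so it can be assigned to `Grade.confidence` without violating the `ge=0.0, le=1.0` constraint.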

In [None]:
# --- 3. Define the Graph ---


def initialize_state(state: InterviewState) -> InterviewState:
    """Initialise the state of the interview."""
    return state


def generate_questions_node(state: InterviewState) -> InterviewState:
    """Generate a set of questions to ask the candidate."""
    return state


def select_question_node(state: InterviewState) -> InterviewState:
    """Select the most informative question from the set."""
    return state


def ask_question_node(state: InterviewState) -> InterviewState:
    """Ask the candidate a question."""
    return state


def grade_response_node(state: InterviewState) -> InterviewState:
    """Grade the candidate's response."""
    return state


def update_belief_node(state: InterviewState) -> InterviewState:
    """Update the belief state based on the response."""
    return state


def generate_report_node(state: InterviewState) -> InterviewState:
    """Generate a report of the interview."""
    return state


In [None]:
# Edges
def should_continue_node(state: InterviewState) -> str:
    """Route to the report once the turn budget is spent, otherwise keep asking."""
    if state["turn_count"] >= state["max_turns"]:
        return "end"
    return "continue"

In [None]:
# Assemble the graph
from langgraph.graph import StateGraph, END

workflow = StateGraph(InterviewState)
workflow.add_node("initialize", initialize_state)
workflow.add_node("generate_questions", generate_questions_node)
workflow.add_node("select_question", select_question_node)
workflow.add_node("ask_question", ask_question_node)
workflow.add_node("grade_response", grade_response_node)
# These two nodes are targeted by edges below, so they must be registered too
workflow.add_node("update_belief", update_belief_node)
workflow.add_node("generate_report", generate_report_node)

# Set entry point
workflow.set_entry_point("initialize")

# Add edges (one linear pass through a single interview turn)
workflow.add_edge("initialize", "generate_questions")
workflow.add_edge("generate_questions", "select_question")
workflow.add_edge("select_question", "ask_question")
workflow.add_edge("ask_question", "grade_response")
workflow.add_edge("grade_response", "update_belief")

# Add conditional edges: should_continue_node is a routing function, not a node,
# so it is attached to "update_belief" and decides whether to loop or finish
workflow.add_conditional_edges(
    "update_belief",
    should_continue_node,
    {"continue": "generate_questions", "end": "generate_report"},
)

workflow.add_edge("generate_report", END)