In [3]:
%load_ext autoreload
%autoreload 2

# Week 4 - Systematically Improving Your RAG Application

> **Prerequisites**: Complete the notebooks [1. Generate Dataset](1.%20Generate%20Dataset.ipynb) and [2. Topic Modelling](2.%20Topic%20Modelling.ipynb) before proceeding, so that you understand the dataset, the topics we're working with, and why we're doing this.

In this notebook, we'll build a classifier that assigns new queries to the topics we identified in the previous notebook. This lets us track performance per topic over time and alerts us when specific query types need attention.

## Why this matters

With a system that automatically classifies new queries as they come in, we can monitor how the distribution of queries shifts over time and judge how much each topic matters to our overall RAG application.

## What you'll learn

Through hands-on coding, you'll discover how to:

1. Define Query Categories
- Convert topic insights into clear categories
- Structure classifications in YAML format
- Make categories easy for teams to update

2. Build a Classifier
- Use LLMs for accurate query classification
- Validate results against defined categories
- Handle edge cases and ambiguous queries

3. Monitor Performance
- Track classification accuracy over time
- Identify problematic query types
- Use metrics to guide improvements

By the end of this notebook, you'll have a working classifier that can automatically tag incoming queries. This builds on our topic modeling work while preparing us for more advanced query handling in Week 5.

## Defining Query Categories

In the previous notebook, we used topic modelling to generate topics from our dataset. That analysis showed that a significant number of queries were related to refunds, and that we were performing poorly on them.

We also learnt that we could break down user intent into three distinct categories:

1. Payment Method Changes: Users frequently need to switch payment methods after making a purchase. This highlights a need for clearer instructions on how to update payment information before due dates, as many express confusion about locating this option in the app.
2. Early Payoff Requests: Many users want to pay their balance in full rather than continuing with installments. This reflects a desire for flexibility in payment schedules and suggests that the option to pay early may not be prominently displayed or intuitive to find.
3. Payment Due Date Concerns: Several queries express anxiety about upcoming payment deadlines and potential consequences. This indicates an opportunity to improve notification systems and provide clearer information about payment timing.

We want to take these insights and build a classifier that assigns these three query types to new queries, so that we can monitor and improve our performance on them over time. We can do so using a simple YAML file that contains the categories we've identified over time.

We've defined a few categories in our `categories.yml` file that we'll use to classify our queries. Keeping them in a file makes it easy to add new categories later and lets other members of the team add categories as they see fit.
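
The contents of `categories.yml` are not shown in this notebook. Judging from the loading code used later on, the file appears to have a top-level `question_type` key whose sub-keys each hold a list of `title`/`description` entries. The sketch below uses a plain Python dict as a stand-in for the parsed file and flattens it the same way. The group names (`payments`, `account`) and the helper `flatten_question_types` are hypothetical; the titles and descriptions are taken from the categories this notebook works with.

```python
# Assumed shape of the parsed categories.yml.
# The group names "payments" and "account" are hypothetical.
parsed_config = {
    "question_type": {
        "payments": [
            {
                "title": "Payment Method Changes",
                "description": "User needs to switch payment methods after making a purchase",
            },
            {
                "title": "Early Payoff",
                "description": "User wants to pay their balance in full rather than continuing with installments",
            },
        ],
        "account": [
            {
                "title": "Login Issues",
                "description": "User is having trouble logging into their account",
            },
        ],
    }
}


def flatten_question_types(config: dict) -> list[dict]:
    """Collect every category entry across all groups into one flat list."""
    flat = []
    for group_entries in config["question_type"].values():
        flat.extend(group_entries)
    return flat


flat_types = flatten_question_types(parsed_config)
titles = [question_type["title"] for question_type in flat_types]
print(titles)
```

Grouping the entries is only an organisational convenience for whoever edits the file; the classifier itself sees the flat list.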

### Using pre-defined categories to classify queries

We'll start by loading our categories from the `categories.yml` file. We'll use `instructor` and `pydantic` to handle the classification.


In [1]:
import yaml
from pydantic import BaseModel, ValidationInfo, field_validator


class QuestionType(BaseModel):
    types: list[str]

    @field_validator("types")
    @classmethod
    def validate_categories(cls, v: list[str], info: ValidationInfo):
        context = info.context
        if not context:
            raise ValueError("No context provided")
        question_types = context.get("question_types")
        if not question_types:
            raise ValueError("No question types provided")

        question_type_names = [
            question_type["title"] for question_type in question_types
        ]

        for question_type in v:
            if question_type not in question_type_names:
                raise ValueError(
                    f"Question type {question_type} not found in original list of question types"
                )
        return v


# safe_load only builds plain Python objects; the with-block closes the file
with open("categories.yml", "r") as f:
    yaml_config = yaml.safe_load(f)

question_types = []
for question_type_category in yaml_config["question_type"]:
    question_types.extend(yaml_config["question_type"][question_type_category])

question_types[:4]

[{'title': 'Payment Method Changes',
  'description': 'User needs to switch payment methods after making a purchase'},
 {'title': 'Payment Options',
  'description': 'User is asking about available payment methods or how to set them up'},
 {'title': 'Early Payoff',
  'description': 'User wants to pay their balance in full rather than continuing with installments'},
 {'title': 'Login Issues',
  'description': 'User is having trouble logging into their account'}]

Next, we load these categories into our prompt. `instructor` lets us pass the same `context` variable to both the template rendering and the validator, which makes it easy to check that the generated categories are valid.
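
To see the validation half in isolation, without calling an LLM, the following minimal sketch passes a context directly to pydantic's `model_validate`. It assumes pydantic v2. `QuestionTypeSketch`, `sample_question_types`, and `is_valid` are hypothetical names introduced for illustration, and the validator is a simplified version of the one above: it treats a missing context as an empty list of categories rather than raising.

```python
from pydantic import BaseModel, ValidationError, ValidationInfo, field_validator


class QuestionTypeSketch(BaseModel):
    """Simplified stand-in for the QuestionType model in this notebook."""

    types: list[str]

    @field_validator("types")
    @classmethod
    def check_known(cls, v: list[str], info: ValidationInfo) -> list[str]:
        # The allowed titles come from the validation context, not the model.
        context = info.context or {}
        allowed = {qt["title"] for qt in context.get("question_types", [])}
        unknown = [label for label in v if label not in allowed]
        if unknown:
            raise ValueError(f"Unknown question types: {unknown}")
        return v


# Hand-written sample categories, using titles from this notebook.
sample_question_types = [
    {
        "title": "Early Payoff",
        "description": "User wants to pay their balance in full rather than continuing with installments",
    },
    {
        "title": "Login Issues",
        "description": "User is having trouble logging into their account",
    },
]


def is_valid(labels: list[str]) -> bool:
    """Return True if every label passes validation against the sample categories."""
    try:
        QuestionTypeSketch.model_validate(
            {"types": labels},
            context={"question_types": sample_question_types},
        )
        return True
    except ValidationError:
        return False


print(is_valid(["Early Payoff", "Login Issues"]))
print(is_valid(["Dairy Farming"]))  # "Dairy Farming" is a made-up label
```

The first call should pass and the second should fail, because the made-up label is not among the titles in the context. Note that an empty list of labels also passes, which is how a query that matches no category can be represented.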

In [8]:
import openai
import instructor
from rich import print


client = instructor.from_openai(openai.OpenAI())


def classify_query(query: str) -> QuestionType:
    return client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {
                "role": "system",
                "content": """
            Break down the following query into the relevant question types. Select all that apply to the query itself.
            
            Question Types:
            {% for question_type in question_types %}
            - {{ question_type.title }} : {{ question_type.description }}
            {% endfor %}
            
            Make sure to only return the categories that are in the list of categories.
            """,
            },
            {"role": "user", "content": query},
        ],
        response_model=QuestionType,
        context={"question_types": question_types},
    )


queries = [
    "Can I change my payment method from credit card to debit card for my recent purchase? Is there a difference between the two in terms of fees?",
    "I can't seem to login into the app for some reason but I want to pay off my entire balance now instead of making monthly payments. How do I do that? ",
    "Does Klarna cover the cost of returning my order? I'm out of state and I need to ship it to New York",
    "How many litres of milk does a cow produce in a day?",
]

for query in queries:
    print(classify_query(query))
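
Once queries are tagged, monitoring their distribution can be sketched with a simple counter. The example below is a minimal illustration, not part of the original notebook: `category_distribution` and the `Uncategorized` bucket are hypothetical names, and `sample_predictions` is hand-written sample data rather than real model output. It assumes each prediction is a list of category titles, with an empty list for queries that match no category.

```python
from collections import Counter

UNCATEGORIZED = "Uncategorized"  # hypothetical bucket for queries with no labels


def category_distribution(predictions: list[list[str]]) -> dict[str, float]:
    """Return the share of queries tagged with each category.

    Queries can carry several labels, so the shares may sum to more than 1.
    """
    total = len(predictions)
    if total == 0:
        return {}

    counts: Counter = Counter()
    for labels in predictions:
        if not labels:
            counts[UNCATEGORIZED] += 1
        # set() ensures a label repeated within one query is counted once
        for label in set(labels):
            counts[label] += 1

    return {label: count / total for label, count in counts.items()}


# Hand-written sample labels, NOT actual classifier output.
sample_predictions = [
    ["Payment Method Changes", "Payment Options"],
    ["Login Issues", "Early Payoff"],
    [],
    [],
]

distribution = category_distribution(sample_predictions)
for label, share in sorted(distribution.items()):
    print(f"{label}: {share:.0%}")
```

A rising share in the uncategorized bucket is one signal that the category list needs a new entry.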

## Conclusion

In Week 4, we built a complete workflow for understanding and improving how we handle user queries:

1. We created diverse synthetic queries by varying personas, intents, and question types
2. We used topic modeling to discover that refund-related queries were a major pain point
3. We built a classifier that can automatically tag incoming queries based on these insights

This systematic approach lets us spot new query patterns as they emerge, track satisfaction scores and volume for each query type, and confirm that the system is improving over time. This sets us up well for Week 5, where we'll add structured metadata to our documents. For example, when we see lots of queries around payment methods, we might add specific methods for people to make that change automatically.

The key insight from Week 4 is that we can't improve what we don't measure. By building systems to track and categorize queries, we create a data-driven foundation for systematically improving our RAG application over time.