# Intent classification of Langfuse trace data

The traces that your application collects can tell you a lot about how users interact with your service. As the number of interactions grows, however, reviewing this data by hand becomes cumbersome. Intent classification automates the labeling: once each trace carries an intent label, you can filter, group, and analyze traces by what users were trying to do.

You can approach intent classification in two ways: supervised and unsupervised. In a supervised approach, you train a model on labeled data, and every prediction is one of the labels you defined in advance. In an unsupervised approach, a model looks for clusters in the data, and you label each cluster afterwards. This notebook helps you build a basic intent classification pipeline for each approach.

By the end of this notebook, you'll have two basic pipelines that will:
1. extract trace data from one of your Langfuse projects
2. train an intent classification model
3. predict the intent of traces (using both supervised and unsupervised approaches)
4. upload predicted intent results back to Langfuse

## Setup

You will need a .env file with the following variables for your Langfuse project. The unsupervised section also reads `OPENAI_API_KEY` to generate cluster labels.
```
LANGFUSE_SECRET_KEY=
LANGFUSE_PUBLIC_KEY=
LANGFUSE_HOST=
OPENAI_API_KEY=
```
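
Before running the rest of the notebook, it can help to confirm that these variables are actually set. The sketch below shows one way to do that. `missing_env_vars` and `example_env` are hypothetical names used for illustration; they are not part of Langfuse or python-dotenv.

```python
import os

# The Langfuse variable names from the .env file above.
REQUIRED_VARS = ["LANGFUSE_SECRET_KEY", "LANGFUSE_PUBLIC_KEY", "LANGFUSE_HOST"]


def missing_env_vars(required, environ=None):
    """Return the names in `required` that are unset or empty in `environ`."""
    environ = os.environ if environ is None else environ
    return [name for name in required if not environ.get(name)]


# Stand-in environment for the example (placeholder values, not real keys).
example_env = {"LANGFUSE_SECRET_KEY": "sk-placeholder", "LANGFUSE_HOST": ""}
print(missing_env_vars(REQUIRED_VARS, example_env))
```

Calling `missing_env_vars(REQUIRED_VARS)` without a second argument checks the real environment after `load_dotenv()` has run.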

In [1]:
import warnings
warnings.filterwarnings("ignore")

In [2]:
# Install Langfuse
%pip install --quiet langfuse

Note: you may need to restart the kernel to use updated packages.


In [3]:
# Install Notebook dependencies
%pip install --quiet ipywidgets



In [4]:
# Install dependencies for making a model
%pip install --quiet pandas scikit-learn sentence-transformers torch transformers



In [5]:
# Install dependencies for unsupervised intent recognition
%pip install --quiet chromadb hdbscan openai



In [6]:
# Install python-dotenv to read in secrets
%pip install --quiet python-dotenv



In [7]:
from sentence_transformers import SentenceTransformer
embedding_model = SentenceTransformer("all-mpnet-base-v2")

In [8]:
from dotenv import load_dotenv
load_dotenv()

True

## Supervised intent classification pipeline

### Retrieve Langfuse traces

In [9]:
import os
from langfuse import Langfuse

In [10]:
langfuse = Langfuse(
    secret_key=os.environ.get("LANGFUSE_SECRET_KEY"),
    public_key=os.environ.get("LANGFUSE_PUBLIC_KEY"),
    host=os.environ.get("LANGFUSE_HOST")
)

#### Create dummy trace data
If your project is empty, you can run the next two cells to create some simple dummy trace data to use for this notebook. The remainder of this section expects each trace to have a "message" key in its input. If your traces are structured differently, you will need to adjust the notebook accordingly.
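
If your traces do not all follow the `{"message": ...}` shape, a small extraction helper keeps the rest of the pipeline unchanged. The sketch below is one possible approach; `extract_message` is a hypothetical helper, and the plain-string case is an assumption about how other projects might store inputs.

```python
def extract_message(trace_input, key="message"):
    """Return the user utterance from a trace input, or None if it is absent.

    Handles the dict shape used in this notebook ({"message": "..."}) and,
    as an assumption about other projects, inputs that are plain strings.
    """
    if isinstance(trace_input, dict):
        value = trace_input.get(key)
        return value if isinstance(value, str) and value.strip() else None
    if isinstance(trace_input, str) and trace_input.strip():
        return trace_input
    return None


print(extract_message({"message": "Hello again"}))  # dict with the expected key
print(extract_message("Hello again"))               # plain string input
print(extract_message({"query": "Hello again"}))    # key missing
print(extract_message(None))                        # trace without input
```

Traces for which the helper returns `None` can then be skipped instead of raising a `KeyError` or `TypeError` when the trace list is built.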

In [11]:
sample_utterances = [
    "Hello again",
    "Can you do anything else?",
    "Could you recommend a good book?",
    "I'd like to watch a drama",
]

In [12]:
# Create dummy traces
for utterance in sample_utterances:
    # Named trace_input to avoid shadowing Python's built-in input()
    trace_input = {
        "message": utterance
    }
    langfuse.trace(
        input=trace_input
    )

#### Fetch data from your project

In [13]:
traces = langfuse.fetch_traces()
traces

FetchTracesResponse(data=[TraceWithDetails(id='9b73a066-5dfa-4005-bc7d-33c810a0b97a', timestamp=datetime.datetime(2024, 10, 6, 12, 20, 7, 344000, tzinfo=datetime.timezone.utc), name=None, input={'message': "I'd like to watch a drama"}, output=None, session_id=None, release=None, version=None, user_id=None, metadata=None, tags=[], public=False, html_path='/project/cm1s2r3d600hwlpo4x0iqsweb/traces/9b73a066-5dfa-4005-bc7d-33c810a0b97a', latency=0.0, total_cost=0.0, observations=[], scores=[], createdAt='2024-10-06T12:20:08.159Z', externalId=None, updatedAt='2024-10-06T12:20:08.159Z', projectId='cm1s2r3d600hwlpo4x0iqsweb', bookmarked=False), TraceWithDetails(id='4ef0e084-8145-465e-a966-7a5700b7827d', timestamp=datetime.datetime(2024, 10, 6, 12, 20, 7, 343000, tzinfo=datetime.timezone.utc), name=None, input={'message': 'Could you recommend a good book?'}, output=None, session_id=None, release=None, version=None, user_id=None, metadata=None, tags=[], public=False, html_path='/project/cm1s2r3

In [14]:
traces_list = []
for trace in traces.data:
    trace_info = [trace.id, trace.input["message"]]
    traces_list.append(trace_info)
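
A single `fetch_traces()` call returns one page of results, so larger projects need to page through them. The sketch below shows the general pattern with a stand-in data source. `fetch_all` and `fetch_page` are hypothetical names, and the convention that an empty page marks the end is an assumption of this example.

```python
def fetch_all(fetch_page, max_pages=100):
    """Collect items from a paginated source.

    `fetch_page(page)` is a hypothetical callable that returns the list of
    items for a 1-based page number and an empty list once pages run out.
    """
    items = []
    for page in range(1, max_pages + 1):
        batch = fetch_page(page)
        if not batch:
            break
        items.extend(batch)
    return items


# Stand-in for a real API: three pages of fake trace IDs.
fake_pages = {1: ["t1", "t2"], 2: ["t3", "t4"], 3: ["t5"]}
all_ids = fetch_all(lambda page: fake_pages.get(page, []))
print(all_ids)
```

With Langfuse, `fetch_page` could be something like `lambda page: langfuse.fetch_traces(page=page).data`, assuming your SDK version accepts a `page` argument; check the SDK reference before relying on it.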

In [15]:
import pandas as pd
traces_df = pd.DataFrame(traces_list, columns=["trace_id", "message"])
traces_df

|    | trace_id | message |
|----|----------|---------|
| 0 | 9b73a066-5dfa-4005-bc7d-33c810a0b97a | I'd like to watch a drama |
| 1 | 4ef0e084-8145-465e-a966-7a5700b7827d | Could you recommend a good book? |
| 2 | 969a3121-e34e-4bb2-8225-0912dff9a34c | Can you do anything else? |
| 3 | 3f1b06eb-aeee-4438-9204-9058ae49a5b0 | Hello again |


### Build and train an intent classification model

In [16]:
import numpy as np
from sklearn.base import TransformerMixin, BaseEstimator
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report, accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from tqdm.notebook import tqdm

In [17]:
# Note: This is a very small dataset.
# More data will help make the model more accurate and less prone to overfitting.
sample_data = {
    "text": [
        # Greeting utterances
        "hi",
        "hello",
        "howdy",
        "hey there",
        "greetings",
        "Nice to see you",
        "Let's start",
        "begin",
        "good morning",
        "Good afternoon",
        # Menu utterances
        "I want to talk about something else",
        "options",
        "menu, please",
        "Could we chat about another subject",
        "I want to see the menu",
        "switch topics",
        "What else can you do",
        "discuss about something else",
        "Show me the menu",
        "Can we do something else",
        # Restart utterances
        "restart",
        "I'd like to do this again",
        "let me try again",
        "one more time",
        "Can I review that?",
        "check again",
        "redo",
        "again please",
        "that was great, let's start from teh beginning",
        "go back to start",
    ],
    "intent": [
        "greeting",
        "greeting",
        "greeting",
        "greeting",
        "greeting",
        "greeting",
        "greeting",
        "greeting",
        "greeting",
        "greeting",
        "menu",
        "menu",
        "menu",
        "menu",
        "menu",
        "menu",
        "menu",
        "menu",
        "menu",
        "menu",
        "restart",
        "restart",
        "restart",
        "restart",
        "restart",
        "restart",
        "restart",
        "restart",
        "restart",
        "restart",
    ]
}

In [18]:
df = pd.DataFrame(sample_data)
df.head()

|    | text | intent |
|----|------|--------|
| 0 | hi | greeting |
| 1 | hello | greeting |
| 2 | howdy | greeting |
| 3 | hey there | greeting |
| 4 | greetings | greeting |


In [19]:
X_train, X_test, y_train, y_test = train_test_split(
    df["text"],
    df["intent"],
    test_size=0.5,
    random_state=14
)
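
With only ten examples per intent, a purely random split can leave the classes unevenly represented in the test set. A stratified split keeps each intent's share the same on both sides. The sketch below illustrates the idea in plain Python; `stratified_split` is a hypothetical helper and the placeholder utterances are made up for the example.

```python
import random
from collections import defaultdict


def stratified_split(texts, labels, test_size=0.5, seed=14):
    """Split (text, label) pairs so each label keeps the same share in both parts."""
    by_label = defaultdict(list)
    for text, label in zip(texts, labels):
        by_label[label].append(text)

    rng = random.Random(seed)
    train, test = [], []
    for label, group in by_label.items():
        rng.shuffle(group)  # shuffle within the label before cutting
        n_test = round(len(group) * test_size)
        test += [(text, label) for text in group[:n_test]]
        train += [(text, label) for text in group[n_test:]]
    return train, test


# Placeholder data with the same shape as the sample dataset: 10 per intent.
texts = [f"utterance {i}" for i in range(30)]
labels = ["greeting"] * 10 + ["menu"] * 10 + ["restart"] * 10
train, test = stratified_split(texts, labels)
print(len(train), len(test))
```

In practice you would not need a custom helper: scikit-learn's `train_test_split` accepts a `stratify` argument, for example `stratify=df["intent"]`.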

In [20]:
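class Encoder(BaseEstimator, TransformerMixin):
    """Pipeline step that turns raw text into sentence embeddings."""

    def __init__(self):
        # Reuse the SentenceTransformer loaded earlier in the notebook
        self.embedding_model = embedding_model

    def fit(self, X, y=None):
        # Nothing to learn here: the embedding model is already pretrained
        return self

    def transform(self, X):
        return self.embedding_model.encode(list(X))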

In [22]:
pipeline = Pipeline([
    ('encoder', Encoder()),
    ('clf', LogisticRegression()),
])

In [23]:
pipeline.fit(X_train, y_train)

In [24]:
y_pred = pipeline.predict(X_test)
y_pred

array(['greeting', 'menu', 'menu', 'greeting', 'restart', 'greeting',
       'restart', 'menu', 'greeting', 'greeting', 'restart', 'greeting',
       'menu', 'menu', 'menu'], dtype=object)

In [25]:
single_pred = pipeline.predict(["Please let's move on"])
single_pred

array(['menu'], dtype=object)

In [26]:
probas = pipeline.predict_proba(["Please let's move on"])
probas

array([[0.30275491, 0.39219676, 0.30504833]])

In [27]:
confidence_score = float(np.max(probas, axis=1)[0])
confidence_score

0.39219676403837844
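
Each column of `predict_proba` corresponds to one class, so the confidence score is simply the probability of the winning class. The sketch below shows how a label and a confidence threshold could fit together. `pick_label`, the `"unknown"` fallback, and the 0.50 threshold are hypothetical choices for illustration.

```python
def pick_label(probas, classes, threshold=0.5, fallback="unknown"):
    """Return (label, confidence), using `fallback` when confidence is below `threshold`."""
    best = max(range(len(probas)), key=lambda i: probas[i])
    confidence = probas[best]
    label = classes[best] if confidence >= threshold else fallback
    return label, confidence


# Probabilities copied from the output above; the class order assumes
# scikit-learn's sorted `classes_` attribute.
classes = ["greeting", "menu", "restart"]
probas = [0.30275491, 0.39219676, 0.30504833]

print(pick_label(probas, classes, threshold=0.30))  # ('menu', 0.39219676)
print(pick_label(probas, classes, threshold=0.50))  # ('unknown', 0.39219676)
```

A prediction with 39% confidence across three classes is only slightly better than a coin toss between them, which is why a threshold well above 1/3 may be worth considering before tagging traces.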

In [28]:
print("\nClassification Report:\n", classification_report(y_test, y_pred))


Classification Report:
               precision    recall  f1-score   support

    greeting       0.83      1.00      0.91         5
        menu       0.67      1.00      0.80         4
     restart       1.00      0.50      0.67         6

    accuracy                           0.80        15
   macro avg       0.83      0.83      0.79        15
weighted avg       0.86      0.80      0.78        15
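
To read the report, it helps to recompute one row by hand. The sketch below does this for the "restart" class; `precision_recall_f1` is a hypothetical helper, and the counts are inferred from the predictions and the report above.

```python
def precision_recall_f1(tp, fp, fn):
    """Compute precision, recall, and F1 from true positive, false positive, and false negative counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    return precision, recall, f1


# "restart" was predicted 3 times, all correct (tp=3, fp=0), while 3 of the
# 6 true "restart" utterances were assigned to another class (fn=3).
p, r, f1 = precision_recall_f1(tp=3, fp=0, fn=3)
print(round(p, 2), round(r, 2), round(f1, 2))  # 1.0 0.5 0.67
```

High precision with low recall means the model is rarely wrong when it says "restart" but misses half of the real cases, which suggests adding more varied "restart" examples to the training data.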



### Run predictions on traces

In [29]:
for index, row in traces_df.iterrows():
    message = [row["message"]]
    # predict() returns an array with one label per input message
    label = str(pipeline.predict(message)[0])
    # The highest class probability serves as the confidence score
    confidence_score = float(np.max(pipeline.predict_proba(message), axis=1)[0])

    traces_df.at[index, "label"] = label
    traces_df.at[index, "confidence_score"] = confidence_score

### Tag traces with labels

In [30]:
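# Note: The new tag is added to a trace's existing tags; duplicate tags are not created.
# With three classes the top probability is always at least 1/3, so a 0.30
# threshold lets every prediction through. Raise it to skip low-confidence labels.
for index, row in traces_df.iterrows():
    if row["confidence_score"] > 0.30:
        trace_id = row["trace_id"]
        label = row["label"]
        langfuse.trace(id=trace_id, tags=[label])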

## Unsupervised intent classification pipeline

### Upload sample utterances to Langfuse and retrieve them
As before, you can run the next few cells to create some simple dummy trace data, this time based on the training data from the previous section. The remainder of this section expects each trace to have a "message" key in its input. If your traces are structured differently, you will need to adjust the notebook accordingly.

In [42]:
# This uses the training data for the intent classification model earlier
sample_utterances_2 = sample_data["text"]

In [32]:
# Create dummy traces
for utterance in sample_utterances_2:
    # Named trace_input to avoid shadowing Python's built-in input()
    trace_input = {
        "message": utterance
    }
    langfuse.trace(
        input=trace_input
    )

In [33]:
traces = langfuse.fetch_traces()

In [34]:
traces_list = []
for trace in traces.data:
    trace_info = [trace.id, trace.input["message"]]
    traces_list.append(trace_info)

In [43]:
cluster_traces_df = pd.DataFrame(traces_list, columns=["trace_id", "message"])

In [44]:
# Exclude the four traces from before
cluster_traces_df = cluster_traces_df[~cluster_traces_df["trace_id"].isin(traces_df["trace_id"])]
cluster_traces_df[5:11]

|    | trace_id | message |
|----|----------|---------|
| 5 | da6beb43-e166-4eda-93e6-a24827cfff78 | Can I review that? |
| 6 | 2d7ec4e4-5719-4ba5-913b-a8cc9277b947 | one more time |
| 7 | 89857e58-cd81-40d9-9b7d-a1855142b709 | I want to see the menu |
| 8 | eb72ac35-4ef2-4a72-af3b-9b606175d3a4 | Could we chat about another subject |
| 9 | 9cf13b34-e89e-43d5-ba5d-7ef8cc8cb4c5 | menu, please |
| 10 | f94d4b68-1c1c-4ae3-9343-b5065bd601a4 | let me try again |


### Embed the utterances

In [45]:
embeddings = embedding_model.encode(sample_utterances_2)

### Save embeddings to a vector database
For this notebook, the embeddings in the `embeddings` variable are all you need.  However, saving embeddings and metadata to a vector database offers a more robust pipeline.  You can reuse the embeddings later in subsequent labeling sessions to recategorize intents or adjust labels.

In [46]:
import chromadb

In [47]:
# Note: use PersistentClient() to save to and load from your local machine
client = chromadb.Client()

In [48]:
collection = client.create_collection("traces")

In [49]:
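# Look up each utterance's trace ID by message text. fetch_traces() returned the
# traces newest-first above, so pairing by position would attach the wrong IDs.
trace_id_by_message = dict(zip(cluster_traces_df["message"], cluster_traces_df["trace_id"]))
for i, (utterance, embedding) in enumerate(zip(sample_utterances_2, embeddings)):
    collection.add(
        documents=[utterance],
        embeddings=[embedding],
        metadatas=[{"trace_id": trace_id_by_message[utterance]}],
        ids=[str(i)]
    )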

In [50]:
sample_chroma_records = collection.get(limit=3, include=["embeddings", "documents", "metadatas"])
sample_chroma_records

{'ids': ['0', '1', '2'],
 'embeddings': array([[ 0.03394877, -0.00561435, -0.00121842, ...,  0.00074777,
         -0.03420043,  0.01577002],
        [ 0.03063983, -0.00623007, -0.00212156, ...,  0.03398374,
         -0.01675461,  0.00519884],
        [ 0.0427538 ,  0.03953217,  0.00019924, ...,  0.0452202 ,
         -0.03691882,  0.01709924]]),
 'metadatas': [{'trace_id': '049ca5be-0726-4667-bb0c-97c788bd3a8d'},
  {'trace_id': '16756ba4-7b4c-4f7b-a537-9b6ae9d6bf82'},
  {'trace_id': '4f99dd4f-2ac6-4871-9e3b-71874bed6f3c'}],
 'documents': ['hi', 'hello', 'howdy'],
 'uris': None,
 'data': None,
 'included': ['embeddings', 'documents', 'metadatas']}

In [51]:
chroma_embeddings = collection.get(include=["embeddings"])

In [52]:
embeddings = chroma_embeddings["embeddings"]

### Find clusters among utterances

In [53]:
import hdbscan

In [54]:
clusterer = hdbscan.HDBSCAN(min_cluster_size=2)

In [55]:
cluster_labels = clusterer.fit_predict(embeddings)
cluster_labels

array([ 0,  0,  0,  0,  0,  0,  2,  2,  0,  0,  1, -1,  3,  1,  3,  1, -1,
        1,  3,  1,  2,  2,  2,  2, -1, -1,  2,  2,  2,  2])
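
HDBSCAN assigns the label `-1` to points it treats as noise rather than forcing them into a cluster. A quick tally shows how many clusters were found and how many utterances were left out. The sketch below uses the labels from the output above; `example_labels` is a hypothetical name chosen so the notebook's `cluster_labels` variable is not overwritten.

```python
from collections import Counter

# Cluster labels copied from the output above.
example_labels = [0, 0, 0, 0, 0, 0, 2, 2, 0, 0, 1, -1, 3, 1, 3, 1, -1,
                  1, 3, 1, 2, 2, 2, 2, -1, -1, 2, 2, 2, 2]

counts = Counter(example_labels)
noise = counts.pop(-1, 0)  # -1 marks points HDBSCAN left unclustered

print(f"{len(counts)} clusters, {noise} noise points")
for cluster_id, size in sorted(counts.items()):
    print(f"cluster {cluster_id}: {size} utterances")
```

The training data contained three intents, yet four clusters appear here, which illustrates that clusters do not necessarily line up with the labels you would have chosen by hand.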

In [56]:
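# Group utterances by cluster ID, skipping noise points (label -1)
clustered_utterances = {}
for idx, label in enumerate(cluster_labels):
    if label == -1:
        continue
    if label not in clustered_utterances:
        clustered_utterances[label] = []
    # cluster_labels is in the same order as sample_utterances_2
    clustered_utterances[label].append(sample_utterances_2[idx])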

In [57]:
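# cluster_labels follows the order of sample_utterances_2, while cluster_traces_df
# follows the order returned by fetch_traces(), so match on message text
# rather than on row position.
cluster_by_message = dict(zip(sample_utterances_2, cluster_labels))
cluster_traces_df["cluster"] = cluster_traces_df["message"].map(cluster_by_message).astype(int)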

In [58]:
cluster_traces_df[5:11]

|    | trace_id | message | cluster |
|----|----------|---------|---------|
| 5 | da6beb43-e166-4eda-93e6-a24827cfff78 | Can I review that? | 0 |
| 6 | 2d7ec4e4-5719-4ba5-913b-a8cc9277b947 | one more time | 2 |
| 7 | 89857e58-cd81-40d9-9b7d-a1855142b709 | I want to see the menu | 2 |
| 8 | eb72ac35-4ef2-4a72-af3b-9b606175d3a4 | Could we chat about another subject | 0 |
| 9 | 9cf13b34-e89e-43d5-ba5d-7ef8cc8cb4c5 | menu, please | 0 |
| 10 | f94d4b68-1c1c-4ae3-9343-b5065bd601a4 | let me try again | 1 |


### Generate and assign a label to each cluster

In [59]:
import openai

In [60]:
openai.api_key = os.environ.get("OPENAI_API_KEY")

In [61]:
# Note: Depending on the volume of data you are processing,
# you may want to limit the number of utterances representing each group (e.g., utterances_group[:5])

def generate_label(utterances_group):
    prompt = f"""
        # Task
        Your goal is to assign an intent label that most accurately fits the given group of utterances.
        You will only provide a single label, no explanation.  The label should be snake cased.

        ## Example utterances
        so long
        bye

        ## Example labels
        goodbye
        end_conversation        
        
        Utterances: {utterances_group}
        Label:
    """
    response = openai.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {
                "role": "user",
                "content": prompt
            }
        ],
        max_tokens=50
    )
    return response.choices[0].message.content.strip()
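
The prompt asks for a snake-cased label, but a language model may still return stray capitals, punctuation, or whitespace. Normalizing the response before storing it is one way to keep tags consistent. The sketch below is a possible post-processing step; `normalize_label` is a hypothetical helper, not part of the OpenAI SDK.

```python
import re


def normalize_label(raw_label):
    """Convert free-form text such as "Switch Topic!" into snake_case ("switch_topic")."""
    # Keep runs of lowercase letters and digits, then join them with underscores.
    words = re.findall(r"[a-z0-9]+", raw_label.lower())
    return "_".join(words)


print(normalize_label("Switch Topic!"))       # switch_topic
print(normalize_label("  view-menu \n"))      # view_menu
print(normalize_label("start_conversation"))  # start_conversation
```

You could apply it to the return value of `generate_label` before adding the label to `cluster_labels_map`.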

In [62]:
cluster_labels_map = {}
for cluster_id, utterances_group in clustered_utterances.items():
    label = generate_label(utterances_group)
    cluster_labels_map[cluster_id] = label
    print(f"Cluster {cluster_id}: {label}")

Cluster 0: greet
Cluster 2: start_conversation
Cluster 1: switch_topic
Cluster 3: view_menu


In [63]:
for index, row in cluster_traces_df.iterrows():
    cluster_id = row["cluster"]
    if cluster_id != -1:
        intent_label = cluster_labels_map[cluster_id]
        cluster_traces_df.at[index, "label"] = intent_label

In [64]:
cluster_traces_df

|    | trace_id | message | cluster | label |
|----|----------|---------|---------|-------|
| 0 | 049ca5be-0726-4667-bb0c-97c788bd3a8d | go back to start | 0 | greet |
| 1 | 16756ba4-7b4c-4f7b-a537-9b6ae9d6bf82 | that was great, let's start from teh beginning | 0 | greet |
| 2 | 4f99dd4f-2ac6-4871-9e3b-71874bed6f3c | again please | 0 | greet |
| 3 | e1081798-6ffc-4a7c-ac9e-4b3716d2fb80 | redo | 0 | greet |
| 4 | 3249d6c1-9ed3-42ab-9080-109f89840ffd | check again | 0 | greet |
| 5 | da6beb43-e166-4eda-93e6-a24827cfff78 | Can I review that? | 0 | greet |
| 6 | 2d7ec4e4-5719-4ba5-913b-a8cc9277b947 | one more time | 2 | start_conversation |
| 7 | 89857e58-cd81-40d9-9b7d-a1855142b709 | I want to see the menu | 2 | start_conversation |
| 8 | eb72ac35-4ef2-4a72-af3b-9b606175d3a4 | Could we chat about another subject | 0 | greet |
| 9 | 9cf13b34-e89e-43d5-ba5d-7ef8cc8cb4c5 | menu, please | 0 | greet |


### Update vector database with label metadata

In [65]:
# Add the generated label to each stored record, skipping noise points (label -1)
for i, label in enumerate(cluster_labels):
    if label != -1:
        cluster_label = cluster_labels_map[label]
        collection.update(ids=[str(i)], metadatas=[{"label": cluster_label}])

In [66]:
all_records = collection.get(limit=3, include=["embeddings", "metadatas"])
all_records

{'ids': ['0', '1', '2'],
 'embeddings': array([[ 0.03394877, -0.00561435, -0.00121842, ...,  0.00074777,
         -0.03420043,  0.01577002],
        [ 0.03063983, -0.00623007, -0.00212156, ...,  0.03398374,
         -0.01675461,  0.00519884],
        [ 0.0427538 ,  0.03953217,  0.00019924, ...,  0.0452202 ,
         -0.03691882,  0.01709924]]),
 'metadatas': [{'label': 'greet',
   'trace_id': '049ca5be-0726-4667-bb0c-97c788bd3a8d'},
  {'label': 'greet', 'trace_id': '16756ba4-7b4c-4f7b-a537-9b6ae9d6bf82'},
  {'label': 'greet', 'trace_id': '4f99dd4f-2ac6-4871-9e3b-71874bed6f3c'}],
 'documents': None,
 'uris': None,
 'data': None,
 'included': ['embeddings', 'metadatas']}

### Tag traces with labels

In [67]:
# Note: The new tag is added to a trace's existing tags; duplicate tags are not created.
for index, row in cluster_traces_df.iterrows():
    if row["cluster"] != -1:
        trace_id = row["trace_id"]
        label = row["label"]
        langfuse.trace(id=trace_id, tags=[label])

## Conclusion

Each approach has its pros and cons.  

The supervised approach requires a lot of upfront effort to prepare a labeled dataset of an appropriate size. During inference, it can only assign labels that it was trained on, so it will not handle new cases well. However, its predictions will be consistent.

The unsupervised approach offers more flexibility in working with unlabeled data. It can output a variety of new labels you didn't define beforehand. However, the labels may not be consistent between runs (e.g., 'hello', 'greeting', or 'start_conversation' for the same cluster). Additionally, the clusters may be more or less permissive than the groups you would have formed by labeling the data yourself.

Combining both approaches may be ideal. Unsupervised intent classification can give you a quick overview of a large volume of data, which helps with initial exploratory analysis. As you understand your trace data better and collect more samples, you may benefit from training the supervised model on the intent labels you care about most. Alternatively, you can use the embeddings stored in the vector database to run similarity searches and reuse the labels from previous runs on new instances.
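
The similarity-search idea can be sketched as a nearest-neighbor lookup: embed a new utterance, find the most similar stored embedding, and reuse its label. The example below uses plain Python and toy three-dimensional vectors in place of real sentence embeddings. `cosine_similarity`, `nearest_label`, and the vectors themselves are hypothetical and for illustration only.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def nearest_label(query, labeled_vectors):
    """Return the (label, similarity) of the stored vector closest to `query`."""
    return max(
        ((label, cosine_similarity(query, vector)) for vector, label in labeled_vectors),
        key=lambda pair: pair[1],
    )


# Toy vectors standing in for stored embeddings and their labels.
labeled_vectors = [
    ([0.9, 0.1, 0.0], "greet"),
    ([0.1, 0.9, 0.0], "view_menu"),
    ([0.0, 0.1, 0.9], "switch_topic"),
]
label, similarity = nearest_label([0.2, 0.8, 0.1], labeled_vectors)
print(label)  # view_menu
```

With the Chroma collection from this notebook, the collection's `query` method can perform the same kind of lookup against the stored embeddings, and the `label` metadata of the closest record can be reused for the new trace.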