# Intent classification of Langfuse trace data

The traces your application collects can reveal a lot about how users interact with your service. As the number of interactions grows, however, reviewing this data manually becomes impractical. Intent classification automates the labeling: by identifying the intent behind each user message, you can attach labels that make traces easier to filter, manage, and analyze. This notebook walks through building a basic intent classification pipeline.

By the end of this notebook, you'll have a basic pipeline that will:
1. extract trace data from one of your Langfuse projects
2. train an intent classification model
3. predict the intent of traces
4. upload predicted intent results back to Langfuse

## Setup

You will need a `.env` file containing the following variables for your Langfuse project.
```
LANGFUSE_SECRET_KEY=
LANGFUSE_PUBLIC_KEY=
LANGFUSE_HOST=
```
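Once the variables have been loaded (for example with python-dotenv, as in the cells that follow), it can help to confirm that none of them is missing or empty before creating the client. The sketch below is one way to do that. The helper name `missing_env_vars` is hypothetical and is not part of Langfuse or python-dotenv.

```python
import os

REQUIRED_VARS = ("LANGFUSE_SECRET_KEY", "LANGFUSE_PUBLIC_KEY", "LANGFUSE_HOST")


def missing_env_vars(required=REQUIRED_VARS, environ=None):
    """Return the names of required variables that are unset or empty."""
    # Default to the real process environment; a dict can be passed for testing
    environ = os.environ if environ is None else environ
    return [name for name in required if not environ.get(name)]


# Example with an explicit mapping instead of the real environment
example_env = {"LANGFUSE_SECRET_KEY": "sk-example", "LANGFUSE_HOST": ""}
print(missing_env_vars(environ=example_env))
```

In the notebook, calling `missing_env_vars()` with no arguments after `load_dotenv()` would check the real environment; an empty list means all three variables are set.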

In [1]:
# Install Langfuse
%pip install --quiet langfuse

Note: you may need to restart the kernel to use updated packages.


In [2]:
# Install Notebook dependencies
%pip install --quiet ipywidgets

Note: you may need to restart the kernel to use updated packages.


In [3]:
# Install dependencies for making a model
%pip install --quiet pandas scikit-learn sentence-transformers torch transformers

Note: you may need to restart the kernel to use updated packages.


In [4]:
# Install python-dotenv to read in secrets
%pip install --quiet python-dotenv

Note: you may need to restart the kernel to use updated packages.


In [5]:
from dotenv import load_dotenv
load_dotenv()

True

## Retrieve Langfuse traces

In [6]:
import os
from langfuse import Langfuse

In [7]:
langfuse = Langfuse(
    secret_key=os.environ.get("LANGFUSE_SECRET_KEY"),
    public_key=os.environ.get("LANGFUSE_PUBLIC_KEY"),
    host=os.environ.get("LANGFUSE_HOST")
)

### Create dummy trace data
If your project is empty, run the next two cells to create some simple dummy trace data for this notebook. The rest of the notebook expects each trace's input to contain a "message" key, so you may need to adapt the code if your trace data is structured differently.

In [8]:
sample_utterances = [
    "Hello again",
    "Can you do anything else?",
    "Could you recommend a good book?",
    "I'd like to watch a drama",
]

In [9]:
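# Create dummy traces
for utterance in sample_utterances:
    # Avoid naming the variable `input`, which would shadow the Python builtin
    trace_input = {"message": utterance}
    langfuse.trace(input=trace_input)

# The SDK queues events in the background; flush so they are sent before fetching
langfuse.flush()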

### Fetch data from your project

In [10]:
traces = langfuse.fetch_traces()
traces

FetchTracesResponse(data=[TraceWithDetails(id='71b8c019-1d66-4739-a158-4bc87cb3010e', timestamp=datetime.datetime(2024, 10, 4, 14, 36, 14, 697000, tzinfo=datetime.timezone.utc), name=None, input={'message': 'Can you do anything else?'}, output=None, session_id=None, release=None, version=None, user_id=None, metadata=None, tags=[], public=False, html_path='/project/cm1s2r3d600hwlpo4x0iqsweb/traces/71b8c019-1d66-4739-a158-4bc87cb3010e', latency=0.0, total_cost=0.0, observations=[], scores=[], externalId=None, updatedAt='2024-10-04T14:36:15.648Z', createdAt='2024-10-04T14:36:15.648Z', bookmarked=False, projectId='cm1s2r3d600hwlpo4x0iqsweb'), TraceWithDetails(id='a7fec83b-017d-4651-85d5-09d5d4d31310', timestamp=datetime.datetime(2024, 10, 4, 14, 36, 14, 697000, tzinfo=datetime.timezone.utc), name=None, input={'message': 'Hello again'}, output=None, session_id=None, release=None, version=None, user_id=None, metadata=None, tags=[], public=False, html_path='/project/cm1s2r3d600hwlpo4x0iqsweb/
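A single `fetch_traces()` call may return only one page of results, so larger projects need paging. The following sketch collects traces page by page until an empty page comes back. It assumes that `fetch_traces` accepts `page` and `limit` keyword arguments with pages starting at 1, and that the response exposes a `data` list as in the output above. `fetch_all_traces` and the stand-in `fake_fetch` are hypothetical names used only for illustration.

```python
from types import SimpleNamespace


def fetch_all_traces(fetch_page, limit=50, max_pages=100):
    """Collect traces across pages until an empty page is returned."""
    collected = []
    for page in range(1, max_pages + 1):
        response = fetch_page(page=page, limit=limit)
        if not response.data:
            break
        collected.extend(response.data)
    return collected


# Stand-in for langfuse.fetch_traces so the sketch runs without a server
def fake_fetch(page, limit):
    items = [SimpleNamespace(id=f"trace-{i}") for i in range(5)]
    start = (page - 1) * limit
    return SimpleNamespace(data=items[start:start + limit])


all_traces = fetch_all_traces(fake_fetch, limit=2)
print(len(all_traces))
```

With a real client this would be called as `fetch_all_traces(langfuse.fetch_traces)`, under the same assumptions. The `max_pages` cap is a safety limit so the loop cannot run indefinitely.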

In [11]:
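# Keep only traces whose input is a dict with a "message" key;
# other traces (for example, with no input) would raise an error here
traces_list = []
for trace in traces.data:
    if isinstance(trace.input, dict) and "message" in trace.input:
        traces_list.append([trace.id, trace.input["message"]])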

In [12]:
import pandas as pd
traces_df = pd.DataFrame(traces_list, columns=["trace_id", "message"])
traces_df

|   | trace_id | message |
|---|---|---|
| 0 | 71b8c019-1d66-4739-a158-4bc87cb3010e | Can you do anything else? |
| 1 | a7fec83b-017d-4651-85d5-09d5d4d31310 | Hello again |
| 2 | 2a6d3c59-0d76-48f0-aa8e-8701ea799055 | I'd like to watch a drama |
| 3 | 8ed0efea-051f-46e3-8f28-ce1df572faa8 | Could you recommend a good book? |


## Build and test an intent classifier

In [13]:
import warnings
warnings.filterwarnings("ignore")

In [14]:
import numpy as np
import pandas as pd
from sentence_transformers import SentenceTransformer
from sklearn.base import TransformerMixin, BaseEstimator
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline

In [15]:
# Note: This is a very small dataset.
# More data will help make the model more accurate and avoid overfitting.
data = {
    "text": [
        # Greeting utterances
        "hi",
        "hello",
        "howdy",
        "hey there",
        "greetings",
        "Nice to see you",
        "Let's start",
        "begin",
        "good morning",
        "Good afternoon",
        # Menu utterances
        "I want to talk about something else",
        "options",
        "menu, please",
        "Could we chat about another subject",
        "I want to see the menu",
        "switch topics",
        "What else can you do",
        "discuss about something else",
        "Show me the menu",
        "Can we do something else",
        # Restart utterances
        "restart",
        "I'd like to do this again",
        "let me try again",
        "one more time",
        "Can I review that?",
        "check again",
        "redo",
        "again please",
        "that was great, let's start from teh beginning",
        "go back to start",
    ],
    "intent": [
        "greeting",
        "greeting",
        "greeting",
        "greeting",
        "greeting",
        "greeting",
        "greeting",
        "greeting",
        "greeting",
        "greeting",
        "menu",
        "menu",
        "menu",
        "menu",
        "menu",
        "menu",
        "menu",
        "menu",
        "menu",
        "menu",
        "restart",
        "restart",
        "restart",
        "restart",
        "restart",
        "restart",
        "restart",
        "restart",
        "restart",
        "restart",
    ]
}

In [16]:
df = pd.DataFrame(data)
df.head()

|   | text | intent |
|---|---|---|
| 0 | hi | greeting |
| 1 | hello | greeting |
| 2 | howdy | greeting |
| 3 | hey there | greeting |
| 4 | greetings | greeting |


In [17]:
X_train, X_test, y_train, y_test = train_test_split(
    df["text"],
    df["intent"],
    test_size=0.5,
    random_state=14
)

In [18]:
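class Encoder(BaseEstimator, TransformerMixin):
    """Scikit-learn transformer that turns text into sentence embeddings."""

    def __init__(self, model_name="all-mpnet-base-v2"):
        # Store the parameter under its own name so get_params() can read it
        self.model_name = model_name
        self.encoding_model = SentenceTransformer(model_name)

    def fit(self, X, y=None):
        # The embedding model is pretrained, so there is nothing to fit
        return self

    def transform(self, X):
        # Encode the texts into a 2D array of embeddings
        return self.encoding_model.encode(list(X))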

In [19]:
pipeline = Pipeline([
    ('encoder', Encoder()),
    ('clf', LogisticRegression(class_weight="balanced")),
])

In [20]:
pipeline.fit(X_train, y_train)

In [21]:
y_pred = pipeline.predict(X_test)
y_pred

array(['greeting', 'menu', 'menu', 'greeting', 'restart', 'greeting',
       'restart', 'restart', 'greeting', 'greeting', 'restart',
       'greeting', 'menu', 'menu', 'restart'], dtype=object)

In [22]:
single_pred = pipeline.predict(["Please let's move on"])
single_pred

array(['restart'], dtype=object)

In [23]:
probas = pipeline.predict_proba(["Please let's move on"])
probas

array([[0.29999479, 0.33983355, 0.36017166]])

In [24]:
confidence_score = float(np.max(probas, axis=1)[0])
confidence_score

0.3601716561862601
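The probability array on its own does not say which class each value belongs to. In scikit-learn, the columns of `predict_proba` follow the order of the classifier's `classes_` attribute, which for string labels is sorted alphabetically. The sketch below pairs the values printed above with that ordering. The class list is written out by hand here as an assumption; in the notebook it would come from `pipeline.classes_`.

```python
# Assumed to match pipeline.classes_ (alphabetical order for string labels)
classes = ["greeting", "menu", "restart"]
# Probabilities copied from the predict_proba output above
probas_row = [0.29999479, 0.33983355, 0.36017166]

# Pair each class with its probability and sort from most to least likely
ranked = sorted(zip(classes, probas_row), key=lambda pair: pair[1], reverse=True)

top_label, top_score = ranked[0]
runner_up_label, runner_up_score = ranked[1]

print(f"top: {top_label} ({top_score:.2f})")
print(f"runner-up: {runner_up_label} ({runner_up_score:.2f})")
print(f"margin: {top_score - runner_up_score:.3f}")
```

All three values are close to 1/3, so the model is nearly undecided on this utterance even though it returns `restart`. Looking at the margin between the top two classes is one way to spot such cases.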

In [25]:
print("\nClassification Report:\n", classification_report(y_test, y_pred))


Classification Report:
               precision    recall  f1-score   support

    greeting       0.83      1.00      0.91         5
        menu       1.00      1.00      1.00         4
     restart       1.00      0.83      0.91         6

    accuracy                           0.93        15
   macro avg       0.94      0.94      0.94        15
weighted avg       0.94      0.93      0.93        15
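
To see where the numbers in the report come from, the per-class metrics can be recomputed from raw counts. The counts below are inferred from the report and the `y_pred` array above (for example, six `greeting` predictions against a support of five with recall 1.00), so treat them as an illustration rather than as output of the notebook.

```python
def precision_recall_f1(tp, fp, fn):
    """Compute precision, recall, and F1 from raw counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denominator = precision + recall
    f1 = 2 * precision * recall / denominator if denominator else 0.0
    return precision, recall, f1


# (true positives, false positives, false negatives), inferred from the report
counts = {
    "greeting": (5, 1, 0),
    "menu": (4, 0, 0),
    "restart": (5, 0, 1),
}

for label, (tp, fp, fn) in counts.items():
    p, r, f1 = precision_recall_f1(tp, fp, fn)
    print(f"{label:>8}  precision={p:.2f}  recall={r:.2f}  f1={f1:.2f}")

# Accuracy is the share of correct predictions across all classes
correct = sum(tp for tp, _, _ in counts.values())
total = sum(tp + fn for tp, _, fn in counts.values())
print(f"accuracy={correct / total:.2f}")
```

Under these counts, the single error is one `restart` utterance predicted as `greeting`, which lowers greeting precision and restart recall to 5/6.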



## Run predictions on traces

In [26]:
# Encode all messages once, then classify the embeddings in a single batch
embeddings = pipeline.named_steps["encoder"].transform(traces_df["message"])
clf = pipeline.named_steps["clf"]

traces_df["label"] = clf.predict(embeddings)
# Confidence is the highest class probability for each message
traces_df["confidence_score"] = clf.predict_proba(embeddings).max(axis=1)

## Tag traces with labels

In [27]:
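# Note: tags are merged with a trace's existing tags; duplicates are not added.
# With three classes the top probability is always at least 1/3, so a 0.30
# threshold accepts every prediction. Raise it to skip low-confidence labels.
CONFIDENCE_THRESHOLD = 0.30

for index, row in traces_df.iterrows():
    if row["confidence_score"] > CONFIDENCE_THRESHOLD:
        langfuse.trace(id=row["trace_id"], tags=[row["label"]])

# Send queued updates to Langfuse before the notebook exits
langfuse.flush()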