# DSPy example about modules and working with log data

The goal of the notebook is to give a brief introduction to the [DSPy](https://dspy.ai) framework.

If you don't have `dspy` installed, either run `!pip install dspy` in the cell below in your chosen environment or create a new one using, e.g., conda:
```
conda create -n dspy-example python=3.9
conda activate dspy-example
conda install ipykernel
pip install dspy pandas python-dotenv
python -m ipykernel install --user --name=dspy-example --display-name "dspy-example"
```



In [88]:
!pip install -q dspy

In [48]:
import os
import pandas as pd
import dspy
from typing import Literal
from dotenv import load_dotenv
from pathlib import Path

## Setting up the pipeline

We will use [OpenRouter](https://openrouter.ai/) to communicate with different LLMs. [This list](https://models.litellm.ai/) shows the models that are usable via the OpenRouter API (search the page for "openrouter", e.g., with `CTRL+F`). Clicking a model in the list shows its specs, e.g., context length and price per token processed/generated.

Let's get started by putting our OpenRouter API key into a `.env` file under the name `OPENROUTER_API_KEY`.

In [2]:
# Access the API key
load_dotenv()
api_key = os.getenv("OPENROUTER_API_KEY")
assert api_key is not None, "OPENROUTER_API_KEY is not set"
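
A bare `assert` gives little guidance when the key is missing. As a minimal sketch, the hypothetical helper `require_env` below (not part of DSPy or python-dotenv) reads a variable and fails with an explicit message. It assumes `load_dotenv()` has already been called, so that values from `.env` are present in the process environment.

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable or fail with a clear message.

    Hypothetical helper for illustration; an empty string counts as missing.
    """
    value = os.getenv(name)
    if not value:
        raise RuntimeError(
            f"Environment variable {name} is not set; add it to your .env file."
        )
    return value
```

In this notebook it could be used as `api_key = require_env("OPENROUTER_API_KEY")`.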

A good way to get started, debug, and try out DSPy is to use a free model, so let's choose Meta's 8B-parameter Llama model.

N.B.: Although these models are free to use, the number of API calls per minute is limited [as follows](https://openrouter.ai/docs/api-reference/limits).
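
Because of these limits, a call may occasionally be rejected. The following is a minimal sketch of retrying with exponential backoff. `call_with_retry` is a hypothetical helper, and since the exception raised on a rate limit is not specified here, the sketch catches any `Exception`, which is broader than one would want in production code.

```python
import time


def call_with_retry(fn, max_attempts=4, base_delay=1.0):
    """Call fn() and retry on failure, doubling the wait after each attempt.

    Hypothetical helper: the last failure is re-raised once attempts run out.
    """
    for attempt in range(max_attempts):
        try:
            return fn()
        except Exception:
            if attempt == max_attempts - 1:
                raise
            # Wait base_delay, 2 * base_delay, 4 * base_delay, ...
            time.sleep(base_delay * 2 ** attempt)
```

Once the language model has been created, a prompt could be wrapped as `call_with_retry(lambda: lm("hi"))`.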

In [3]:
# Set model name
model2 = "openrouter/meta-llama/llama-3-8b-instruct:free"  # free variant, kept as an alternative
model = "openrouter/meta-llama/llama-3-8b-instruct:extended"  # variant passed to dspy.LM below

Next, we need to set up this language model as part of our pipeline.

In [4]:
lm = dspy.LM(model, api_key=api_key)

We can check that the pipeline works by sending it a prompt.

In [5]:
lm("hi", temperature=0.7)


["Hi! It's nice to meet you. Is there something I can help you with or would you like to chat?"]

To make sure that our LLM completion was indeed free, we can check the cost of the previous calls:

In [6]:
cost = [x['cost'] for x in lm.history if x['cost']]
print(f"the cumulated total cost so far: {sum(cost)} dollars")

the cumulated total cost so far: 0 dollars
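
The list comprehension above skips entries whose cost is falsy. A minimal sketch of the same bookkeeping as a reusable function follows. `total_cost` and `demo_history` are hypothetical names, and the sketch assumes each history entry is a dictionary whose `cost` may be a number, `None`, or missing.

```python
def total_cost(history):
    """Sum the 'cost' of each call, counting missing or None costs as 0."""
    return sum((entry.get("cost") or 0.0) for entry in history)


# Hypothetical entries shaped like the ones read from lm.history above
demo_history = [{"cost": None}, {"cost": 0.0}, {"cost": 0.002}]
print(f"total cost: {total_cost(demo_history)} dollars")
```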


## Signatures

DSPy stands for `Declarative Self-improving Python`. The `Declarative` part comes from the fact that we only specify the expected input-output behaviour and types of a prompt and its completion via a `Signature` (see [here](https://dspy.ai/learn/programming/signatures/)).

A signature can be declared either inline via `input: [A] -> output: [B]`, where `[A]` and `[B]` stand for the field types, or explicitly as a class that inherits from `dspy.Signature`:
```
class Emotion(dspy.Signature):
    """The docstring in a DSPy signature actually matters."""
    input: [A] = dspy.InputField()
    output: [B] = dspy.OutputField()
```
Note that the docstring is passed on to the LLM as part of the prompt and is therefore meaningful.
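
To make the inline form concrete, the sketch below splits a string such as `'sentence: str -> toxic: int'` into its input and output fields. It is a hypothetical illustration of what such a string declares, not DSPy's own parser. Treating an untyped field as `str` is an assumption of this sketch, and the naive comma split would break on types that themselves contain commas.

```python
def parse_inline_signature(spec: str):
    """Split 'a: str, b -> c: int' into (inputs, outputs) dicts of name -> type name.

    Illustrative only: fields without a type are assumed to be 'str'.
    """
    inputs_part, arrow, outputs_part = spec.partition("->")
    if not arrow:
        raise ValueError("expected a signature of the form 'inputs -> outputs'")

    def parse_fields(part: str) -> dict:
        fields = {}
        for item in part.split(","):
            item = item.strip()
            if not item:
                continue
            name, _, type_name = item.partition(":")
            fields[name.strip()] = type_name.strip() or "str"
        return fields

    return parse_fields(inputs_part), parse_fields(outputs_part)


inputs, outputs = parse_inline_signature("sentence: str -> toxic: int")
print(inputs, outputs)
```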

## Modules

The basic building blocks of DSPy are [modules](https://dspy.ai/learn/programming/modules/). Essentially, they abstract common prompt-engineering techniques so that these happen under the hood.

We can use ready-made modules such as `dspy.Predict` and `dspy.ChainOfThought`, or declare our own.

To make things more concrete, let's see an example. 

Assume we want to do binary classification of toxicity.

Let us first do this using the inline syntax.

In [7]:
# 1) Configure lm to be part of our pipeline
dspy.configure(lm=lm)

# 2) Sentence to classify
statement = "i hate sundays"

# 3) Inline declaration of the expected input/output types
classification = dspy.Predict('sentence: str -> toxic: int')

# 4) Call LLM
response = classification(sentence=statement)

# 5) See response
print(response.toxic)

0


We could have also done this explicitly as a class, and given toxicity a labeled interpretation instead of 0/1-values:

In [8]:
class ToxicityClassifier(dspy.Signature):
    """Classify toxicity."""
    sentence: str = dspy.InputField()
    sentiment: Literal['toxic', 'not toxic'] = dspy.OutputField()

classification = dspy.Predict(ToxicityClassifier)
response = classification(sentence=statement)
print(response.sentiment)


toxic


What about a less ambiguous statement?

In [9]:
statement = "What led you to that conclusion?"
response = classification(sentence=statement)
print(response.sentiment)

toxic


Here the classification seems to fail: a harmless question is labeled `toxic`.

Above, `Predict` is the basic predictor among DSPy's `Module` classes. For more complex tasks, one may want to use the `ChainOfThought` module, which also produces a `reasoning` field alongside the final output. To get a better understanding of what's happening under the hood, see [here](https://github.com/stanfordnlp/dspy/blob/main/dspy/predict/chain_of_thought.py).

Let's see how a `ChainOfThought` module performs on the latter toxicity classification task.

In [18]:
classification_cot = dspy.ChainOfThought(ToxicityClassifier)
response_cot = classification_cot(sentence=statement)
print(response_cot.sentiment)

not toxic


With the `ChainOfThought` module we may also inspect the reasoning behind each response, i.e., why the output was generated.

In [19]:
# Access the outputs.
print(f"the sentence '{statement}' is {response_cot.sentiment} because: {response_cot.reasoning}")

the sentence 'What led you to that conclusion?' is not toxic because: The sentence is a question and does not contain any offensive language, so it is not toxic.


### Moving on to real data

Let's see if we can do classification on actual data. The following data examples are taken from [loghub's GitHub repo](https://github.com/logpai/loghub/tree/master).

In [20]:
data_dir = Path("./data")

file_names = [
    "Zookeeper_2k.log_structured.csv",
    "Thunderbird_2k.log_structured.csv",
    "Spark_2k.log_structured.csv",
    "Hadoop_2k.log_structured.csv",
    "Apache_2k.log_structured.csv"
]

for file_name in file_names:
    file_path = data_dir / file_name
    print(f"First 3 rows of {file_name}:")
    df = pd.read_csv(file_path)
    display(df.head(3))
    print("\n")

First 3 rows of Zookeeper_2k.log_structured.csv:


Unnamed: 0,LineId,Date,Time,Level,Node,Component,Id,Content,EventId,EventTemplate
0,1,2015-07-29,"17:41:44,747",INFO,QuorumPeer[myid=1]/0,0:0:0:0:0:0:0:2181:FastLeaderElection,774,Notification time out: 3200,E31,Notification time out: <*>
1,2,2015-07-29,"19:04:12,394",INFO,/10.10.34.11,3888:QuorumCnxManager$Listener,493,Received connection request /10.10.34.11:45307,E40,Received connection request /<*>:<*>
2,3,2015-07-29,"19:04:29,071",WARN,SendWorker,188978561024:QuorumCnxManager$SendWorker,688,Send worker leaving thread,E42,Send worker leaving thread




First 3 rows of Thunderbird_2k.log_structured.csv:


Unnamed: 0,LineId,Label,Timestamp,Date,User,Month,Day,Time,Location,Component,PID,Content,EventId,EventTemplate
0,1,-,1131566461,2005.11.09,dn228,Nov,9,12:01:01,dn228/dn228,crond(pam_unix),2915.0,session closed for user root,E117,session closed for user root
1,2,-,1131566461,2005.11.09,dn228,Nov,9,12:01:01,dn228/dn228,crond(pam_unix),2915.0,session opened for user root by (uid=0),E118,session opened for user root by (uid=0)
2,3,-,1131566461,2005.11.09,dn228,Nov,9,12:01:01,dn228/dn228,crond,2916.0,(root) CMD (run-parts /etc/cron.hourly),E3,(root) CMD (run-parts /etc/cron.hourly)




First 3 rows of Spark_2k.log_structured.csv:


Unnamed: 0,LineId,Date,Time,Level,Component,Content,EventId,EventTemplate
0,1,17/06/09,20:10:40,INFO,executor.CoarseGrainedExecutorBackend,"Registered signal handlers for [TERM, HUP, INT]",E22,"Registered signal handlers for [TERM, HUP, INT]"
1,2,17/06/09,20:10:40,INFO,spark.SecurityManager,"Changing view acls to: yarn,curi",E5,Changing view acls to: <*>
2,3,17/06/09,20:10:40,INFO,spark.SecurityManager,"Changing modify acls to: yarn,curi",E4,Changing modify acls to: <*>




First 3 rows of Hadoop_2k.log_structured.csv:


Unnamed: 0,LineId,Date,Time,Level,Process,Component,Content,EventId,EventTemplate
0,1,2015-10-18,"18:01:47,978",INFO,main,org.apache.hadoop.mapreduce.v2.app.MRAppMaster,Created MRAppMaster for application appattempt...,E29,Created MRAppMaster for application appattempt...
1,2,2015-10-18,"18:01:48,963",INFO,main,org.apache.hadoop.mapreduce.v2.app.MRAppMaster,Executing with tokens:,E42,Executing with tokens:
2,3,2015-10-18,"18:01:48,963",INFO,main,org.apache.hadoop.mapreduce.v2.app.MRAppMaster,"Kind: YARN_AM_RM_TOKEN, Service: , Ident: (app...",E61,"Kind: YARN_AM_RM_TOKEN, Service: , Ident: (app..."




First 3 rows of Apache_2k.log_structured.csv:


Unnamed: 0,LineId,Time,Level,Content,EventId,EventTemplate
0,1,Sun Dec 04 04:47:44 2005,notice,workerEnv.init() ok /etc/httpd/conf/workers2.p...,E2,workerEnv.init() ok <*>
1,2,Sun Dec 04 04:47:44 2005,error,mod_jk child workerEnv in error state 6,E3,mod_jk child workerEnv in error state <*>
2,3,Sun Dec 04 04:51:08 2005,notice,jk2_init() Found child 6725 in scoreboard slot 10,E1,jk2_init() Found child <*> in scoreboard slot <*>






For simplicity, assume that our task is simply to label which system each row comes from, based on its `Content` field.

Let's define a custom `Signature` for this purpose:

In [30]:
class DataClassifier(dspy.Signature):
    """Label the data source based on the content. """
    content: str = dspy.InputField()
    label: Literal['apache', 'hadoop', 'spark', 'thunderbird', 'zookeeper'] = dspy.OutputField()

Next, let's read some random data rows from the log files as DSPy `Example` objects.

In [31]:
file_label_mapping = {
    "Zookeeper_2k.log_structured.csv": "zookeeper",
    "Thunderbird_2k.log_structured.csv": "thunderbird",
    "Spark_2k.log_structured.csv": "spark",
    "Hadoop_2k.log_structured.csv": "hadoop",
    "Apache_2k.log_structured.csv": "apache"
}

def create_random_examples(n_examples=3):
    examples = []

    for file_name, label in file_label_mapping.items():
        file_path = data_dir / file_name
        df = pd.read_csv(file_path)
        
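        # Sample n_examples random rows (fixed seed) and extract the "Content" field
        sampled_rows = df.sample(n=n_examples, random_state=42)['Content'].tolist()
        
        # Create Example objects
        # NOTE: with_inputs("content", "label") also marks the label as an input;
        # the optimization cell further below re-declares "content" as the only input.
        for content in sampled_rows:
            example = dspy.Example(content=content, label=label).with_inputs("content", "label")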
            examples.append(example)
    return examples

examples = create_random_examples()
# Print the generated Example objects
for ex in examples:
    print(ex)

Example({'content': 'Interrupting SendWorker', 'label': 'zookeeper'}) (input_keys={'content', 'label'})
Example({'content': 'Received connection request /10.10.34.11:48609', 'label': 'zookeeper'}) (input_keys={'content', 'label'})
Example({'content': 'Connection request from old client /10.10.34.11:58424; will be dropped if server is in r-o mode', 'label': 'zookeeper'}) (input_keys={'content', 'label'})
Example({'content': 'synchronized to 10.100.18.250, stratum 3', 'label': 'thunderbird'}) (input_keys={'content', 'label'})
Example({'content': 'data_thread() got not answer from any [Thunderbird_A8] datasource', 'label': 'thunderbird'}) (input_keys={'content', 'label'})
Example({'content': 'probe new device 0x1028:0x0013:0x1028:0x016c: bus 2:slot 14:func 0', 'label': 'thunderbird'}) (input_keys={'content', 'label'})
Example({'content': 'Finished task 0.0 in stage 29.0 (TID 1320). 2128 bytes result sent to driver', 'label': 'spark'}) (input_keys={'content', 'label'})
Example({'content': 

Let's see how a `ChainOfThought` module does in this task:

In [32]:
dataclassifier_cot = dspy.ChainOfThought(DataClassifier)

# Initialize counters
total_examples = len(examples)
correct_classifications = 0

# Process each example and query the classifier
for example in examples:
    response_cot = dataclassifier_cot(content=example.content)
    predicted_label = response_cot.label

    # Check correctness
    correct = predicted_label == example.label
    correct_classifications += correct

    # Print reasoning
    print(f"Content: {example.content}")
    print(f"Predicted: {predicted_label}, Actual: {example.label}")
    print(f"Reasoning: {response_cot.reasoning}")
    print(f"Correct: {correct}\n")

# Compute and display success rate
success_rate = correct_classifications / total_examples * 100
print(f"Success Rate: {success_rate:.2f}%")

Content: Interrupting SendWorker
Predicted: apache, Actual: zookeeper
Reasoning: The content appears to be related to a network or distributed system, mentioning a "SendWorker" which is a common concept in distributed systems.
Correct: False

Content: Received connection request /10.10.34.11:48609
Predicted: zookeeper, Actual: zookeeper
Reasoning: The connection request is coming from a client with IP address 10.10.34.11, which is a common IP address range used by Apache ZooKeeper servers.
Correct: True

Content: Connection request from old client /10.10.34.11:58424; will be dropped if server is in r-o mode
Predicted: apache, Actual: zookeeper
Reasoning: This content appears to be related to a connection request and mentions a server, which is typical of distributed systems and network communication. The mention of "r-o mode" suggests that the server may be in read-only mode, which is a common configuration in distributed systems.
Correct: False

Content: synchronized to 10.100.18.250,
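
A single success rate hides which sources get confused with each other. As a minimal sketch, the hypothetical helper `per_label_accuracy` below computes the accuracy separately for each true label from `(actual, predicted)` pairs. The three demo pairs are the ZooKeeper results printed above.

```python
from collections import Counter


def per_label_accuracy(pairs):
    """Return {actual_label: fraction of correct predictions} for (actual, predicted) pairs."""
    totals, hits = Counter(), Counter()
    for actual, predicted in pairs:
        totals[actual] += 1
        hits[actual] += int(actual == predicted)
    return {label: hits[label] / totals[label] for label in totals}


# (actual, predicted) pairs taken from the first three results shown above
demo_pairs = [
    ("zookeeper", "apache"),
    ("zookeeper", "zookeeper"),
    ("zookeeper", "apache"),
]
print(per_label_accuracy(demo_pairs))
```

In the evaluation loop, the pairs could be collected as `(example.label, predicted_label)`.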

The `ChainOfThought` classifier reaches a success rate of roughly 67%. To improve the classification accuracy, let's optimize our program using [DSPy optimizers](https://dspy.ai/learn/optimization/optimizers/).

To do this, we first need an [evaluation metric](https://dspy.ai/learn/evaluation/metrics/). For a classification task, this is relatively simple: we use a 0-1 score based on correct labeling (1 if the predicted label matches the true label, 0 otherwise), which can be defined as a lambda function:


In [33]:
metric = (lambda x, y, trace=None: x.label == y.label)
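
The same metric can also be written as a named function, which is easier to test in isolation. The sketch below is a variation rather than a drop-in copy: `exact_label_match` is a hypothetical name, and it additionally ignores case and surrounding whitespace, which the lambda above does not. `SimpleNamespace` objects stand in for the example and the prediction, since the metric only needs a `label` attribute.

```python
from types import SimpleNamespace


def exact_label_match(example, prediction, trace=None) -> bool:
    """Return True when the predicted label equals the gold label.

    Assumption of this sketch: labels are compared case-insensitively
    after stripping whitespace.
    """
    gold = str(example.label).strip().lower()
    guess = str(prediction.label).strip().lower()
    return gold == guess


gold = SimpleNamespace(label="zookeeper")
print(exact_label_match(gold, SimpleNamespace(label="Zookeeper ")))
print(exact_label_match(gold, SimpleNamespace(label="apache")))
```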

Next, we need to choose an optimizer from the [list of DSPy optimizers](https://dspy.ai/learn/optimization/optimizers/). As mentioned on DSPy's website, there is no single heuristic for choosing the right optimizer:
```
Ultimately, finding the ‘right’ optimizer to use & the best configuration for your task will require experimentation. 
Success in DSPy is still an iterative process - getting the best performance on your task will require you to explore and iterate.
```
Let's try `MIPROv2`:

In [34]:
trainset = create_random_examples(n_examples=5)
devset = create_random_examples(n_examples=3)

converted_trainset = [ex.with_inputs("content") for ex in trainset]
converted_devset   = [ex.with_inputs("content") for ex in devset]

class CoT(dspy.Module):
    def __init__(self):
        super().__init__()
        self.prog = dataclassifier_cot

    def forward(self, content):
        return self.prog(content=content)

evaluate = dspy.Evaluate(devset=converted_devset, metric=metric, num_threads=8, display_progress=True, display_table=False)
program = CoT()
evaluate(program, devset=converted_devset)

Average Metric: 10.00 / 15 (66.7%): 100%|██████████| 15/15 [00:00<00:00, 807.62it/s]
2025/02/09 03:46:51 INFO dspy.evaluate.evaluate: Average Metric: 10 / 15 (66.7%)



66.67
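
Both sets above come from `create_random_examples` with the same fixed `random_state=42`, so they may share rows. A minimal sketch of a disjoint split follows. `split_train_dev` is a hypothetical helper that shuffles one pool of examples and cuts it in two, and the 30% dev fraction is an arbitrary choice for illustration.

```python
import random


def split_train_dev(items, dev_fraction=0.3, seed=0):
    """Shuffle a copy of items and split it into disjoint (train, dev) lists."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    # Keep at least one item in the dev set
    n_dev = max(1, int(len(shuffled) * dev_fraction))
    return shuffled[n_dev:], shuffled[:n_dev]


# Integers stand in for Example objects here
train, dev = split_train_dev(range(10))
print(len(train), len(dev))
```

With the notebook's data, one could sample a larger pool once and pass it through this split instead of calling `create_random_examples` twice.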

In [39]:
teleprompter = dspy.teleprompt.MIPROv2(
    metric=metric,
    auto="light",
)

# Optimize program
print("Optimizing program with MIPRO...")

optimized_program = teleprompter.compile(
    program.deepcopy(),
    trainset=converted_trainset,
    max_bootstrapped_demos=3,
    max_labeled_demos=4,
    requires_permission_to_run=False,
)

2025/02/09 03:49:24 INFO dspy.teleprompt.mipro_optimizer_v2: 
RUNNING WITH THE FOLLOWING LIGHT AUTO RUN SETTINGS:
num_trials: 7
minibatch: False
num_candidates: 5
valset size: 20

2025/02/09 03:49:24 INFO dspy.teleprompt.mipro_optimizer_v2: 
==> STEP 1: BOOTSTRAP FEWSHOT EXAMPLES <==
2025/02/09 03:49:24 INFO dspy.teleprompt.mipro_optimizer_v2: These will be used as few-shot example candidates for our program and for creating instructions.

2025/02/09 03:49:24 INFO dspy.teleprompt.mipro_optimizer_v2: Bootstrapping N=5 sets of demonstrations...
Optimizing program with MIPRO...
Bootstrapping set 1/5
Bootstrapping set 2/5
Bootstrapping set 3/5

 60%|██████    | 3/5 [00:00<00:00, 226.54it/s]
Bootstrapped 3 full traces after 3 examples for up to 1 rounds, amounting to 3 attempts.
Bootstrapping set 4/5

 20%|██        | 1/5 [00:00<00:00, 259.66it/s]
Bootstrapped 1 full traces after 1 examples for up to 1 rounds, amounting to 1 attempts.
Bootstrapping set 5/5

 60%|██████    | 3/5 [00:00<00:00

In [41]:
print("Evaluate optimized program...")
evaluate(optimized_program, devset=converted_devset)

Evaluate optimized program...
Average Metric: 12.00 / 15 (80.0%): 100%|██████████| 15/15 [00:00<00:00, 880.70it/s]
2025/02/09 03:51:35 INFO dspy.evaluate.evaluate: Average Metric: 12 / 15 (80.0%)



80.0

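It seems that our success rate increased from 66.7% to 80% as a result of optimization. Keep in mind that the dev set has only 15 examples and that `create_random_examples` uses the same fixed `random_state=42` for both sets, so the dev rows very likely also appear in the training set; treat the improvement as indicative only. Let's save our optimized program as a `json` file.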

In [43]:
optimized_program.save("mipro_optimized.json")

We can take a look at the final prompt using `lm.inspect_history`:

In [47]:
lm.inspect_history(n=1)  # prints the most recent prompt and completion

[2025-02-09T03:51:35.793939]

System message:

Your input fields are:
1. `content` (str)

Your output fields are:
1. `reasoning` (str)
2. `label` (typing.Literal['apache', 'hadoop', 'spark', 'thunderbird', 'zookeeper'])

All interactions will be structured in the following way, with the appropriate values filled in.

[[ ## content ## ]]
{content}

[[ ## reasoning ## ]]
{reasoning}

[[ ## label ## ]]
{label}        # note: the value you produce must exactly match (no extra characters) one of: apache; hadoop; spark; thunderbird; zookeeper

[[ ## completed ## ]]

In adhering to this structure, your objective is: 
        Provide a detailed classification of the content, including a clear explanation of the reasoning behind the classification, and output the corresponding label.


User message:

This is an example of the task, though some input or output fields are not supplied.

[[ ## content ## ]]
Connection broken for id 188978561024, my id = 2, error =

R