<a href="https://colab.research.google.com/github/lbhagavan/stanford_LLM_Leela/blob/homework/DSPy_Advanced_Prompt_Engineering_Tweet_Sentiment.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# DSPy - Advanced Prompt Engineering

In the following notebook, we'll explore an introduction to DSPy and what it can do in just a few lines of code!

To begin, we'll install our main dependency, DSPy! (The cell below also upgrades `pyarrow`.)

In [None]:
!pip install -qU dspy-ai
!pip install --upgrade pyarrow


DSPy can use OpenAI's models under the hood and still add value on top of them. To do so, however, we'll need to provide an OpenAI API key!

In [None]:
import os
import getpass

from google.colab import userdata
api_key = userdata.get('open_ai_key')

if not api_key:
  api_key = getpass.getpass("Enter your OpenAI API Key: ")

os.environ['OPENAI_API_KEY'] = api_key

## Model

Now we can set up our OpenAI language model, which we'll use throughout the remaining cells in the notebook.

In [None]:
from dspy import OpenAI

llm = OpenAI(model='gpt-3.5-turbo', api_key=api_key)

Similar to other libraries, we can call the LLM directly with a string to get a response!

In [None]:
llm("What is the square root of pi?")

['The square root of pi is approximately 1.77245385091.']

We'll also call `dspy.settings.configure` with our OpenAI model in the `lm` (Language Model) field. This sets the default LM to use whenever we don't specify one when calling our DSPy `Predictors`.

In [None]:
import dspy

dspy.settings.configure(lm=llm)

## Data

We're going to be using a dataset of short stock-market tweets, each labeled with a sentiment of 1 (positive) or 0 (negative).

We have a total of 5,791 rows of data, and will be splitting that into a `trainset` and a `valset` - for training and evaluation.

In [None]:
import pandas as pd

dataset = pd.read_csv("https://raw.githubusercontent.com/cjflanagan/cs68/master/stock_data_nlp.csv")
# change sentiment from 1 to "Positive" and 0 to "Negative"
# dataset['Sentiment'] = dataset['Sentiment'].replace([0, 1], ['Negative', 'Positive'])
dataset = dataset.sample(frac=1)  # frac=1 shuffles all rows
dataset.head()

Unnamed: 0,Text,Sentiment
3473,user: VZ 6.2 million of 9 million phones sold ...,1
4135,MAKO breaks out beautifully,1
773,AJ from watch list triggered a few cents yeste...,1
2581,QCO 5 Stocks ising on nusual Volume,1
1764,"Green Weekly Triangle on CB,....Open Sell Shor...",0


In [None]:
dataset.info()

<class 'pandas.core.frame.DataFrame'>
Index: 5791 entries, 3473 to 4980
Data columns (total 2 columns):
 #   Column     Non-Null Count  Dtype 
---  ------     --------------  ----- 
 0   Text       5791 non-null   object
 1   Sentiment  5791 non-null   int64 
dtypes: int64(1), object(1)
memory usage: 135.7+ KB


In [None]:
dataset.to_csv("tweet_sentiment.csv", index=False)

We shuffled the dataset above so that our labels are not clumped together and our `valset` is reasonably representative of our `trainset`.

We'll move our `Dataset` into the expected format in DSPy which is the [`Example`](https://dspy-docs.vercel.app/docs/deep-dive/data-handling/examples)!


Our examples will have two keys:

- `sentence`, our input sentence to be rated
- `rating`, our rating label

We'll mark `sentence` as our input with `with_inputs("sentence")`, so DSPy knows that `rating` is the label to predict.

In [None]:
from dspy import Example

trainset = []

# Iterate over rows in the DataFrame
for index, row in dataset.iterrows():
    trainset.append(Example(sentence=row["Text"], rating=row["Sentiment"]).with_inputs("sentence"))

len(trainset)

5791

In [None]:
trainset[0:10]

[Example({'sentence': 'user: VZ 6.2 million of 9 million phones sold were AAP strongest period of sales since 2011 FIE THE CEO!', 'rating': 1}) (input_keys={'sentence'}),
 Example({'sentence': 'MAKO breaks out beautifully', 'rating': 1}) (input_keys={'sentence'}),
 Example({'sentence': 'AJ from watch list triggered a few cents yesterday and continues today - volume 41%', 'rating': 1}) (input_keys={'sentence'}),
 Example({'sentence': 'QCO  5 Stocks ising on nusual Volume', 'rating': 1}) (input_keys={'sentence'}),
 Example({'sentence': 'Green Weekly Triangle on CB,....Open Sell Short at 3.38  ', 'rating': 0}) (input_keys={'sentence'}),
 Example({'sentence': 'HPQ user option guest on bloomberg tv just said buy march 18 calls too but also hedge by selling feb 17.50 calls other guests T', 'rating': 0}) (input_keys={'sentence'}),
 Example({'sentence': 'user I lost 15% of my holdings when it first opened, but have recouped it all through AAP and now FB. Buying more of both', 'rating': 1}) (in

We'll hold out the first 100 examples as our `valset` and keep the rest as our `trainset`.

In [None]:
valset = trainset[0:100]
trainset = trainset[100:]
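
Because `dataset.sample(frac=1)` is called without a `random_state`, the rows that end up in `valset` change on every run, and so do the scores further down. One option is to pass `random_state` to `sample`. Another is to shuffle and split the list of examples with a seeded generator. The block below is a minimal sketch of the second option using only the standard library; `split_holdout` and its `seed` parameter are hypothetical names, not part of DSPy.

```python
import random

def split_holdout(items, n_val, seed=42):
    """Shuffle a copy of `items` with a fixed seed and hold out `n_val` of them."""
    if not 0 <= n_val <= len(items):
        raise ValueError("n_val must be between 0 and len(items)")
    shuffled = list(items)                 # copy so the caller's list is untouched
    random.Random(seed).shuffle(shuffled)  # private RNG: global random state is unchanged
    return shuffled[n_val:], shuffled[:n_val]  # (train, val)

# Stand-in data; in the notebook this would be the list of `Example` objects.
toy_examples = [f"tweet {i}" for i in range(10)]
toy_train, toy_val = split_holdout(toy_examples, n_val=3)
print(len(toy_train), len(toy_val))  # 7 3
```

Calling `split_holdout` again with the same `seed` returns the same split, which makes evaluation numbers comparable across runs.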

Let's take a peek at an example from our `trainset` and `valset`!

In [None]:
train_example = trainset[0]
print(f"Sentence: {train_example.sentence}")
print(f"Label: {train_example.rating}")

Sentence: MTG closed at the low. still short from 5.70's
Label: 0


In [None]:
valset_example = valset[0]
print(f"Sentence: {valset_example.sentence}")
print(f"Label: {valset_example.rating}")

Sentence: user: VZ 6.2 million of 9 million phones sold were AAP strongest period of sales since 2011 FIE THE CEO!
Label: 1


## Signature

The first foundational unit in DSPy is the `Signature`.

In a sense, a `Signature` can be thought of as both a prompt, as well as metadata about that prompt.

Going beyond just a simple `SystemMessage`, as seen in other frameworks, the `Signature` helps DSPy validate datatypes, create examples, and more.

> NOTE: DSPy's [documentation](https://dspy-docs.vercel.app/docs/deep-dive/signature/understanding-signatures#what-is-a-signature) goes into more detail about what exactly a `Signature` is.

In [None]:
from dspy import Signature, InputField, OutputField

class PositiveOrNegativeSignature(Signature):
  """Rate the input as being either 1 or 0. Only return 1 or 0"""
  sentence: str = InputField()
  rating: int = OutputField(desc='key-value pairs')

## Predictor

Now that we have our `Signature`, we can build a `Predictor` that leverages it.

A `Predictor`, in the simplest terms, is what calls the LLM using our signature. Importantly, the `Predictor` knows how to leverage our signature to call the LLM. From DSPy's documentation, one of the most interesting parts of a `Predictor` is that it can *learn* to become better at the desired task!

Let's take a look at our `TypedPredictor` below to see more.

In [None]:
from dspy.functional import TypedPredictor

generate_label = TypedPredictor(PositiveOrNegativeSignature)

In [None]:
generate_label

TypedPredictor(PositiveOrNegativeSignature(sentence -> rating
    instructions='Rate the input as being either 1 or 0. Only return 1 or 0'
    sentence = Field(annotation=str required=True json_schema_extra={'__dspy_field_type': 'input', 'prefix': 'Sentence:', 'desc': '${sentence}'})
    rating = Field(annotation=int required=True json_schema_extra={'desc': 'key-value pairs', '__dspy_field_type': 'output', 'prefix': 'Rating:'})
))

In [None]:
label_prediction = generate_label(sentence=valset_example.sentence)
print(f"Sentence: {valset_example.sentence}")
print(f"Prediction: {label_prediction}")

Sentence: user: VZ 6.2 million of 9 million phones sold were AAP strongest period of sales since 2011 FIE THE CEO!
Prediction: Prediction(
    rating=1
)


We can, at any time, check our LLM's outputs through `inspect_history`.

In [None]:
llm.inspect_history(n=1)




Rate the input as being either 1 or 0. Only return 1 or 0

---

Follow the following format.

Sentence: ${sentence}
Rating: key-value pairs (Respond with a single int value)

---

Sentence: user: VZ 6.2 million of 9 million phones sold were AAP strongest period of sales since 2011 FIE THE CEO!
Rating: 1






Notice how, without any extra work on our part, the `TypedPredictor` has included format instructions for the LLM to help ensure the returned data has the type we asked for.
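
To make the layout of that prompt concrete, here is a rough sketch that rebuilds the same three-part structure (instructions, format description, filled-in input) with plain string handling. `render_prompt` is a hypothetical helper written for illustration; it is not how DSPy builds prompts internally.

```python
def render_prompt(instructions, fields, inputs):
    """Build a prompt with the layout seen in `inspect_history` above.

    fields: list of (prefix, description) pairs, in prompt order.
    inputs: dict mapping a prefix to the value already known for it.
    """
    format_block = "\n".join(f"{prefix}: {desc}" for prefix, desc in fields)
    # Fields with a known value are filled in; the rest are left open for the LLM.
    query_block = "\n".join(
        f"{prefix}: {inputs[prefix]}" if prefix in inputs else f"{prefix}:"
        for prefix, _ in fields
    )
    return "\n\n---\n\n".join([
        instructions,
        "Follow the following format.\n\n" + format_block,
        query_block,
    ])

prompt = render_prompt(
    instructions="Rate the input as being either 1 or 0. Only return 1 or 0",
    fields=[
        ("Sentence", "${sentence}"),
        ("Rating", "key-value pairs (Respond with a single int value)"),
    ],
    inputs={"Sentence": "MAKO breaks out beautifully"},
)
print(prompt)
```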

Let's look at another example of a `Predictor` - this time with Chain of Thought.

In order to use this - we don't have to do anything with our `Signature`! We can leave it exactly as is - and allow the `Predictor` to adapt to it.

> NOTE: We won't be using this predictor going forward - this is just to showcase the ease of using another `Predictor` with a `Signature`.

In [None]:
from dspy.functional import TypedChainOfThought

generate_label_with_chain_of_thought = TypedChainOfThought(PositiveOrNegativeSignature)

label_prediction = generate_label_with_chain_of_thought(sentence=valset_example.sentence)

In [None]:
print(f"Sentence: {valset_example.sentence}")
print(f"Reasoning: {label_prediction.reasoning}")
print(f"Ground Truth Label: {valset_example.rating}")
print(f"Prediction: {label_prediction.rating}")

Sentence: user: VZ 6.2 million of 9 million phones sold were AAP strongest period of sales since 2011 FIE THE CEO!
Reasoning: produce the rating. We see that the user is mentioning the number of phones sold by VZ, which is 6.2 million out of 9 million. They also mention that this is the strongest period of sales since 2011. The user is expressing excitement about this news.
Ground Truth Label: 1
Prediction: 1


We can, again, check our LLM's history to see what the actual prompt/response is.


In [None]:
llm.inspect_history(n=1)




Rate the input as being either 1 or 0. Only return 1 or 0

---

Follow the following format.

Sentence: ${sentence}
Reasoning: Let's think step by step in order to ${produce the rating}. We ...
Rating: key-value pairs (Respond with a single int value)

---

Sentence: user: VZ 6.2 million of 9 million phones sold were AAP strongest period of sales since 2011 FIE THE CEO!
Reasoning: Let's think step by step in order to produce the rating. We see that the user is mentioning the number of phones sold by VZ, which is 6.2 million out of 9 million. They also mention that this is the strongest period of sales since 2011. The user is expressing excitement about this news.
Rating: 1






## Modules

Now that we have our `TypedPredictor`, we can create a `Module`!

A `Module` is useful because it allows us to interact with the `Predictor` and `Signature` in a way that DSPy can leverage for optimization.

This helps the DSPy framework determine paths through your program - and helps during the `compilation` or optimization steps (formerly `teleprompting`).

> NOTE: You might notice this looks strikingly similar to PyTorch, and this is by design!

In [None]:
from dspy import Module, Prediction

class PositiveOrNegativeStudent(Module):
  def __init__(self):
    super().__init__()

    self.generate_rating = TypedPredictor(PositiveOrNegativeSignature)

  def forward(self, sentence):
    prediction = self.generate_rating(sentence=sentence)
    return Prediction(rating=prediction.rating)
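
The PyTorch-style pattern at work here is that you define `forward`, but you call the object itself. A minimal sketch of that dispatch follows; `TinyModule` and `KeywordRater` are hypothetical classes for illustration, and the keyword rule is only a stand-in for an LLM call.

```python
class TinyModule:
    """Hypothetical base class: calling the object runs its `forward` method."""

    def __call__(self, *args, **kwargs):
        return self.forward(*args, **kwargs)

    def forward(self, *args, **kwargs):
        raise NotImplementedError("subclasses define forward()")


class KeywordRater(TinyModule):
    """Stand-in for an LLM-backed module: rates by a fixed keyword list."""

    def __init__(self, negative_words):
        self.negative_words = set(negative_words)

    def forward(self, sentence):
        words = set(sentence.lower().split())
        # Any overlap with the negative list gives 0, otherwise 1.
        return 0 if words & self.negative_words else 1


rater = KeywordRater(negative_words=["short", "crash", "loss"])
print(rater(sentence="MAKO breaks out beautifully"))  # 1
print(rater(sentence="US Markets Crash Again"))       # 0
```

Keeping the logic in `forward` gives a framework one well-known entry point to wrap, which is roughly what lets an optimizer run and inspect a program.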

## Evaluate

As with any good framework, DSPy has the ability to `Evaluate` - we can leverage this to determine how our current DSPy "program" (our `Module` in this case) operates.

> NOTE: DSPy's "program" could be loosely related to a "chain" from the popular LLM Framework LangChain.

In [None]:
from dspy.evaluate.evaluate import Evaluate

evaluate_fewshot = Evaluate(devset=valset, num_threads=1, display_progress=True, display_table=10)

def exact_match_metric(answer, pred, trace=None):
  return answer.rating == pred.rating

evaluate_fewshot(PositiveOrNegativeStudent(), metric=exact_match_metric)

Average Metric: 61 / 100  (61.0): 100%|██████████| 100/100 [00:54<00:00,  1.84it/s]


Unnamed: 0,sentence,example_rating,pred_rating,exact_match_metric
0,user: VZ 6.2 million of 9 million phones sold were AAP strongest period of sales since 2011 FIE THE CEO!,1,1,✔️ [True]
1,MAKO breaks out beautifully,1,1,✔️ [True]
2,AJ from watch list triggered a few cents yesterday and continues today - volume 41%,1,1,✔️ [True]
3,QCO 5 Stocks ising on nusual Volume,1,1,✔️ [True]
4,"Green Weekly Triangle on CB,....Open Sell Short at 3.38",0,1,False
5,HPQ user option guest on bloomberg tv just said buy march 18 calls too but also hedge by selling feb 17.50 calls other guests T,0,1,False
6,"user I lost 15% of my holdings when it first opened, but have recouped it all through AAP and now FB. Buying more of both",1,1,✔️ [True]
7,"Government May Act Out Of Fear, Hold Back In COVID Fight: Rajiv Bajaj https://t.co/hJ0dxsJ0cR",0,1,False
8,i think we'll see aapl at sub 430s by the end of the day. not saying it pins 430 but i think it revisits it...,0,1,False
9,"US Markets Crash Again As Dow Plunges 1,700 Points In Early Trade https://t.co/oLizZCixYO",0,1,False


61.0
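
The reported score is the percentage of validation examples for which the metric returned `True` (61 of 100 here). The sketch below reproduces that arithmetic with stand-in objects. It also uses a slightly more forgiving comparison that casts both ratings to `int` first, so that `'1'` and `1` count as a match. `lenient_match_metric` and `average_metric` are hypothetical helpers, not DSPy functions.

```python
from types import SimpleNamespace

def lenient_match_metric(answer, pred, trace=None):
    """Exact match after casting both ratings to int; unparseable output is a miss."""
    try:
        return int(answer.rating) == int(pred.rating)
    except (TypeError, ValueError):
        return False

def average_metric(examples, predictions, metric):
    """Percentage of (example, prediction) pairs for which `metric` is true."""
    hits = sum(bool(metric(ex, pr)) for ex, pr in zip(examples, predictions))
    return 100.0 * hits / len(examples)

# Made-up labels and predictions, including a string rating and a bad output.
gold = [SimpleNamespace(rating=r) for r in (1, 1, 0, 0)]
preds = [SimpleNamespace(rating=r) for r in ("1", 1, 1, "n/a")]
score = average_metric(gold, preds, lenient_match_metric)
print(score)  # 50.0
```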

In [None]:
llm.inspect_history(n=1)




Rate the input as being either 1 or 0. Only return 1 or 0

---

Follow the following format.

Sentence: ${sentence}
Rating: key-value pairs (Respond with a single int value)

---

Sentence: The U.S.'s national medical stockpile has sent out nearly half of its ventilators - an amount that pales in compariso... https://t.co/fexQEHpg4n
Rating: 1






## Program Optimization (the Artist Formerly Known as Teleprompting)

Optimization is the crux of the DSPy framework - it is what allows it to operate at a level beyond traditional prompt engineering.

At a high level, optimization is a way for the DSPy framework to take the program, a training set, and a metric - and make changes to our program that improve our metric on our dataset.

Let's get started with the `LabeledFewShot` optimizer.

The `LabeledFewShot` optimizer very simply provides a sample of the `trainset` as few-shot examples!

In [None]:
from dspy.teleprompt import LabeledFewShot

labeled_fewshot_optimizer = LabeledFewShot(k=4)
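
Conceptually, labeled few-shot prompting amounts to picking `k` labeled examples and placing them ahead of the real input. The block below is a simplified sketch of that idea, not DSPy's implementation; `sample_demos` and `with_demos` are hypothetical helpers, and the toy sentences are taken or shortened from this notebook's data.

```python
import random

def sample_demos(trainset, k, seed=0):
    """Pick at most k labeled examples to show the model before the real input."""
    rng = random.Random(seed)
    return rng.sample(trainset, min(k, len(trainset)))

def with_demos(demos, sentence):
    """Prepend the demos to the query, one 'Sentence / Rating' block per demo."""
    blocks = [f"Sentence: {s}\nRating: {r}" for s, r in demos]
    blocks.append(f"Sentence: {sentence}\nRating:")  # left open for the LLM
    return "\n\n---\n\n".join(blocks)

toy_trainset = [
    ("CM is bull flagging.", 1),
    ("added to my AAP long", 1),
    ("MTG closed at the low. still short from 5.70's", 0),
    ("EBAY ebaynow was dead on arrival", 0),
    ("MAKO breaks out beautifully", 1),
]
demos = sample_demos(toy_trainset, k=4)
print(with_demos(demos, "BAC nice little rocket"))
```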

Once we define our optimizer, we can compile our program!

In [None]:
compiled_dspy = labeled_fewshot_optimizer.compile(student=PositiveOrNegativeStudent(), trainset=trainset)

Let's evaluate!

In [None]:
evaluate_fewshot(compiled_dspy, metric=exact_match_metric)

Average Metric: 62 / 100  (62.0): 100%|██████████| 100/100 [00:53<00:00,  1.85it/s]


Unnamed: 0,sentence,example_rating,pred_rating,exact_match_metric
0,user: VZ 6.2 million of 9 million phones sold were AAP strongest period of sales since 2011 FIE THE CEO!,1,1,✔️ [True]
1,MAKO breaks out beautifully,1,1,✔️ [True]
2,AJ from watch list triggered a few cents yesterday and continues today - volume 41%,1,1,✔️ [True]
3,QCO 5 Stocks ising on nusual Volume,1,1,✔️ [True]
4,"Green Weekly Triangle on CB,....Open Sell Short at 3.38",0,1,False
5,HPQ user option guest on bloomberg tv just said buy march 18 calls too but also hedge by selling feb 17.50 calls other guests T,0,1,False
6,"user I lost 15% of my holdings when it first opened, but have recouped it all through AAP and now FB. Buying more of both",1,1,✔️ [True]
7,"Government May Act Out Of Fear, Hold Back In COVID Fight: Rajiv Bajaj https://t.co/hJ0dxsJ0cR",0,1,False
8,i think we'll see aapl at sub 430s by the end of the day. not saying it pins 430 but i think it revisits it...,0,1,False
9,"US Markets Crash Again As Dow Plunges 1,700 Points In Early Trade https://t.co/oLizZCixYO",0,1,False


62.0

In [None]:
llm.inspect_history(n=1)




Rate the input as being either 1 or 0. Only return 1 or 0

---

Follow the following format.

Sentence: ${sentence}
Rating: key-value pairs (Respond with a single int value)

---

Sentence: ong EN with stop arnd 39.40- entry 40.10
Rating: 1

---

Sentence: AAP probably small fadethen a pop finish (I think 435-437).. everything is green today,  tomorrow.. who knows
Rating: 1

---

Sentence: CM is bull flagging.
Rating: 1

---

Sentence: Our software stopped us out of NSPH today for a 5% loss on the trade.  Still hanging on tight with HA -
Rating: 0

---

Sentence: The U.S.'s national medical stockpile has sent out nearly half of its ventilators - an amount that pales in compariso... https://t.co/fexQEHpg4n
Rating: 0






As you can see, with very little effort we get a small improvement on our `valset`: from 61 to 62 out of 100. A one-point gain on 100 examples may well be within run-to-run noise, though.

Let's try another optimizer - this time: [`BootstrapFewShot`](https://dspy-docs.vercel.app/docs/deep-dive/teleprompter/bootstrap-fewshot).

The key thing to note is that this optimizer works even with very few examples, because it uses the LLM itself to generate new examples!

In [None]:
from dspy.teleprompt import BootstrapFewShot

optimizer = BootstrapFewShot(metric=exact_match_metric, max_bootstrapped_demos=4, max_labeled_demos=12)

compiled_dspy_BOOTSTRAP = optimizer.compile(student=PositiveOrNegativeStudent(), trainset=trainset)

  0%|          | 6/5691 [00:03<49:48,  1.90it/s]

Bootstrapped 4 full traces after 7 examples in round 0.
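
The log line above hints at the general idea: run the program on training examples, keep the runs whose output passes the metric, and stop once enough have been collected. The block below is a simplified sketch of that loop under those assumptions. It is not DSPy's actual implementation, and `bootstrap_demos` and `toy_predict` are hypothetical names, with `toy_predict` standing in for the LLM call.

```python
def bootstrap_demos(predict, trainset, metric, max_demos):
    """Collect (input, predicted label) pairs whose prediction passes the metric."""
    demos = []
    for seen, (sentence, gold) in enumerate(trainset, start=1):
        guess = predict(sentence)
        if metric(gold, guess):        # keep only runs the metric accepts
            demos.append((sentence, guess))
        if len(demos) >= max_demos:    # stop early once enough demos are found
            return demos, seen
    return demos, len(trainset)

def toy_predict(sentence):
    """Stand-in for the LLM call: anything mentioning 'short' is rated 0."""
    return 0 if "short" in sentence.lower() else 1

toy_trainset = [
    ("MTG closed at the low. still short from 5.70's", 0),
    ("EBAY ebaynow was dead on arrival", 0),  # toy_predict gets this one wrong
    ("added to my AAP long", 1),
    ("CM is bull flagging.", 1),
    ("MAKO breaks out beautifully", 1),
]
demos, seen = bootstrap_demos(
    toy_predict, toy_trainset, lambda gold, guess: gold == guess, max_demos=3
)
print(f"Bootstrapped {len(demos)} demos after {seen} examples")
```

In this toy run the second example is skipped because the stand-in predictor mislabels it, which mirrors how more examples are visited than demos are kept.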





Let's finally evaluate!

In [None]:
eval_output = evaluate_fewshot(compiled_dspy_BOOTSTRAP, metric=exact_match_metric)
eval_output

Average Metric: 77 / 100  (77.0): 100%|██████████| 100/100 [00:56<00:00,  1.77it/s]


Unnamed: 0,sentence,example_rating,pred_rating,exact_match_metric
0,user: VZ 6.2 million of 9 million phones sold were AAP strongest period of sales since 2011 FIE THE CEO!,1,1,✔️ [True]
1,MAKO breaks out beautifully,1,1,✔️ [True]
2,AJ from watch list triggered a few cents yesterday and continues today - volume 41%,1,1,✔️ [True]
3,QCO 5 Stocks ising on nusual Volume,1,0,False
4,"Green Weekly Triangle on CB,....Open Sell Short at 3.38",0,0,✔️ [True]
5,HPQ user option guest on bloomberg tv just said buy march 18 calls too but also hedge by selling feb 17.50 calls other guests T,0,1,False
6,"user I lost 15% of my holdings when it first opened, but have recouped it all through AAP and now FB. Buying more of both",1,1,✔️ [True]
7,"Government May Act Out Of Fear, Hold Back In COVID Fight: Rajiv Bajaj https://t.co/hJ0dxsJ0cR",0,0,✔️ [True]
8,i think we'll see aapl at sub 430s by the end of the day. not saying it pins 430 but i think it revisits it...,0,1,False
9,"US Markets Crash Again As Dow Plunges 1,700 Points In Early Trade https://t.co/oLizZCixYO",0,0,✔️ [True]


77.0

We can see that this optimization helps our program score 16 points higher on our evaluation than the unoptimized baseline (77 vs. 61)!

In [None]:
llm.inspect_history(n=1)




Rate the input as being either 1 or 0. Only return 1 or 0

---

Follow the following format.

Sentence: ${sentence}
Rating: key-value pairs (Respond with a single int value)

---

Sentence: MTG closed at the low. still short from 5.70's
Rating: 0

---

Sentence: BAC Today on Weekly OPEX should  pin around 11.35-11.50 & next week going higher as  we approach March 7th StressTest IMO.
Rating: 1

---

Sentence:
Rupee Edges Lower To 76.43 Against Dollar Amid Coronavirus Crisis
https://t.co/7UCEt57hpb
Rating: 0

---

Sentence: added to my AAP long
Rating: 1

---

Sentence: SPY SPX ES  Green Hedges Stars model: next stop 1410.56 SPX 1387.92 if lower.
Rating: 0

---

Sentence: user: GMC looks like the free money trade for today. Headed to 49+ - so far so good
Rating: 1

---

Sentence: BAC nice little rocket, Friday was a headfake, get ready, she is going to 15 imo...
Rating: 1

---

Sentence: EBAY ebaynow was dead on arrival and the company itself is leveraged a lot.
Rating: 0

---

Senten

"\n\n\nRate the input as being either 1 or 0. Only return 1 or 0\n\n---\n\nFollow the following format.\n\nSentence: ${sentence}\nRating: key-value pairs (Respond with a single int value)\n\n---\n\nSentence: MTG closed at the low. still short from 5.70's\nRating: 0\n\n---\n\nSentence: BAC Today on Weekly OPEX should  pin around 11.35-11.50 & next week going higher as  we approach March 7th StressTest IMO.\nRating: 1\n\n---\n\nSentence:\nRupee Edges Lower To 76.43 Against Dollar Amid Coronavirus Crisis\nhttps://t.co/7UCEt57hpb\nRating: 0\n\n---\n\nSentence: added to my AAP long\nRating: 1\n\n---\n\nSentence: SPY SPX ES  Green Hedges Stars model: next stop 1410.56 SPX 1387.92 if lower.\nRating: 0\n\n---\n\nSentence: user: GMC looks like the free money trade for today. Headed to 49+ - so far so good\nRating: 1\n\n---\n\nSentence: BAC nice little rocket, Friday was a headfake, get ready, she is going to 15 imo...\nRating: 1\n\n---\n\nSentence: EBAY ebaynow was dead on arrival and the compa

In [None]:
for name, parameter in compiled_dspy_BOOTSTRAP.named_parameters():
  print(f"Parameter {name}: Num Examples: {len(parameter.demos)}, {parameter.demos[0]}")
  print()

Parameter generate_rating.predictor: Num Examples: 12, Example({'augmented': True, 'sentence': "MTG closed at the low. still short from 5.70's", 'rating': '0'}) (input_keys=None)



In [None]:
def return_rating(sentence):
  return compiled_dspy_BOOTSTRAP(sentence=sentence).rating

In [None]:
return_rating("This is bad")

0

### MIPRO

In [None]:
from dspy.teleprompt import MIPRO

optimizer = MIPRO(metric=exact_match_metric, max_bootstrapped_demos=4, max_labeled_demos=12)

compiled_dspy_MIPRO = optimizer.compile(student=PositiveOrNegativeStudent(), trainset=trainset)


TypeError: MIPRO.__init__() got an unexpected keyword argument 'max_bootstrapped_demos'

> NOTE: This cell fails as written. The installed version of `MIPRO` does not accept `max_bootstrapped_demos` as a constructor argument, so `compiled_dspy_MIPRO` is never created. The remaining cells use `compiled_dspy_BOOTSTRAP` instead.

# Testing classifiers

## Naive Bayes Classifier

In [None]:
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.metrics import roc_auc_score, classification_report


# Split the data into training and testing sets
X = dataset['Text']
y = dataset['Sentiment']

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Convert text data into numerical data using CountVectorizer
vectorizer = CountVectorizer()
X_train_vec = vectorizer.fit_transform(X_train)
X_test_vec = vectorizer.transform(X_test)

# Train a Naive Bayes classifier
nb_classifier = MultinomialNB()
nb_classifier.fit(X_train_vec, y_train)

# Predict hard class labels (0 or 1) for the test set
y_pred = nb_classifier.predict(X_test_vec)
# Convert string labels to numerical labels using NumPy's where function
# y_test_num = np.where(y_test == 'Positive', 1, 0)
# y_pred_num = np.where(y_pred == 'Positive', 1, 0)

# Calculate the ROC AUC metric
roc_auc = roc_auc_score(y_test, y_pred)

print("ROC AUC:", roc_auc)
print(classification_report(y_test, y_pred))
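
Note that `roc_auc_score` is given hard 0/1 predictions here rather than probabilities. With binary predictions the ROC curve has a single operating point, and the area under it reduces to the mean of the true-positive and true-negative rates (balanced accuracy). The sketch below checks that identity on made-up labels; `rank_auc` and `balanced_accuracy` are hypothetical helpers written for illustration, not scikit-learn functions.

```python
def rank_auc(y_true, y_score):
    """ROC AUC as the probability that a random positive outranks a random
    negative, counting ties as half (the Mann-Whitney formulation)."""
    pos = [s for t, s in zip(y_true, y_score) if t == 1]
    neg = [s for t, s in zip(y_true, y_score) if t == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def balanced_accuracy(y_true, y_pred):
    """Mean of the true-positive rate and the true-negative rate."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))
    n_pos = sum(t == 1 for t in y_true)
    n_neg = sum(t == 0 for t in y_true)
    return 0.5 * (tp / n_pos + tn / n_neg)

# Made-up ground truth and hard predictions.
y_true_toy = [1, 1, 1, 0, 0, 1, 0, 1]
y_pred_toy = [1, 0, 1, 0, 1, 1, 0, 1]
print(rank_auc(y_true_toy, y_pred_toy), balanced_accuracy(y_true_toy, y_pred_toy))
```

This matters when comparing against the LLM predictor, which also yields hard labels: both numbers are then balanced accuracies, so the comparison is like for like.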



## LLM Predictor

In [None]:
y_pred_llm = X_test.apply(return_rating)

In [None]:
for i in X_test[0:10].index:
  print(f"Sentence: {X_test[i]}")
  print(f"Prediction: {y_pred_llm[i]}")
  print(f"Ground Truth: {y_test[i]}")
  print()

In [None]:
# Calculate the ROC AUC metric
roc_auc = roc_auc_score(y_test, y_pred_llm)

print("ROC AUC:", roc_auc)

print(classification_report(y_test, y_pred_llm))