# Setup

Before running the example, make sure the environment is configured. We first load the OpenAI API key from a `.env` file, then import the necessary modules and configure the language model:

In [46]:
from dotenv import load_dotenv
import os

# Load the .env file
load_dotenv()

# Get the OpenAI API key
openai_api_key = os.getenv('OPENAI_API_KEY')
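
`os.getenv` returns `None` when the variable is missing, so a typo in `.env` only surfaces later as a failed API call. A minimal sketch of a fail-fast check, assuming a hypothetical helper named `require_env` (it is not part of `dotenv` or DSPy):

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable or raise a clear error.

    `require_env` is a hypothetical helper written for illustration only.
    """
    value = os.getenv(name)
    if not value:
        # Covers both a missing variable and an empty string.
        raise RuntimeError(f"{name} is not set; check your .env file")
    return value
```

After `load_dotenv()`, it could be called as `openai_api_key = require_env("OPENAI_API_KEY")`.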

In [47]:
import dspy
from dspy.datasets.gsm8k import GSM8K, gsm8k_metric

# Set up the LM
turbo = dspy.OpenAI(model='gpt-4', max_tokens=250)
dspy.settings.configure(lm=turbo)

# Load math questions from the GSM8K dataset
gsm8k = GSM8K()
gsm8k_trainset, gsm8k_devset = gsm8k.train[:1], gsm8k.dev[:1]

100%|██████████| 7473/7473 [00:00<00:00, 39542.95it/s]
100%|██████████| 1319/1319 [00:00<00:00, 40398.17it/s]


In [48]:
print(gsm8k_trainset)

[Example({'question': "The result from the 40-item Statistics exam Marion and Ella took already came out. Ella got 4 incorrect answers while Marion got 6 more than half the score of Ella. What is Marion's score?", 'gold_reasoning': "Ella's score is 40 items - 4 items = <<40-4=36>>36 items. Half of Ella's score is 36 items / 2 = <<36/2=18>>18 items. So, Marion's score is 18 items + 6 items = <<18+6=24>>24 items.", 'answer': '24'}) (input_keys={'question'})]
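
The `gold_reasoning` field embeds calculator annotations of the form `<<expression=result>>`. A minimal sketch that extracts them from the example above and re-checks the arithmetic. `check_annotations` is a hypothetical helper, and it assumes each annotation contains a single binary operation, which holds for this example but may not hold for the whole dataset:

```python
import operator
import re

# Matches annotations such as <<40-4=36>>: number, operator, number, result.
ANNOTATION = re.compile(
    r"<<(\d+(?:\.\d+)?)([-+*/])(\d+(?:\.\d+)?)=(\d+(?:\.\d+)?)>>"
)
OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul, "/": operator.truediv}


def check_annotations(reasoning: str) -> list[tuple[str, bool]]:
    """Return (annotation, is_consistent) for each simple annotation found."""
    results = []
    for match in ANNOTATION.finditer(reasoning):
        left, op, right, stated = match.groups()
        try:
            computed = OPS[op](float(left), float(right))
        except ZeroDivisionError:
            # A division by zero can never match a stated result.
            results.append((match.group(0), False))
            continue
        results.append((match.group(0), abs(computed - float(stated)) < 1e-9))
    return results


gold_reasoning = (
    "Ella's score is 40 items - 4 items = <<40-4=36>>36 items. "
    "Half of Ella's score is 36 items / 2 = <<36/2=18>>18 items. "
    "So, Marion's score is 18 items + 6 items = <<18+6=24>>24 items."
)
for annotation, ok in check_annotations(gold_reasoning):
    print(annotation, "consistent" if ok else "inconsistent")
```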


# Define the Module

With our environment set up, let's define a custom program that uses the `ChainOfThought` module to reason step by step before producing an answer:

In [49]:
class CoT(dspy.Module):
    def __init__(self):
        super().__init__()
        self.prog = dspy.ChainOfThought("question -> answer")
    
    def forward(self, question):
        return self.prog(question=question)
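
The string `"question -> answer"` is a DSPy signature: names to the left of the arrow are inputs, and names to the right are outputs. A minimal sketch of that reading, using a hypothetical `split_signature` helper written for illustration. It is not DSPy's own parser, and it ignores anything beyond plain comma-separated field names:

```python
def split_signature(signature: str) -> tuple[list[str], list[str]]:
    """Split 'a, b -> c' into (['a', 'b'], ['c']). Illustrative only."""
    if signature.count("->") != 1:
        raise ValueError(f"expected exactly one '->' in {signature!r}")
    left, right = signature.split("->")
    # Drop surrounding whitespace and skip empty names.
    inputs = [name.strip() for name in left.split(",") if name.strip()]
    outputs = [name.strip() for name in right.split(",") if name.strip()]
    return inputs, outputs


print(split_signature("question -> answer"))  # (['question'], ['answer'])
```

This is why `forward` passes `question=...` as a keyword argument and why the result exposes an `answer` field.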

# Compile and Evaluate the Model

With our simple program in place, let's move on to optimizing it using the BootstrapFewShot teleprompter:

In [50]:
from dspy.teleprompt import BootstrapFewShot

# Set up the optimizer: we want to "bootstrap" (i.e., self-generate) demonstrations for our CoT program. The config below allows at most 1 bootstrapped demo and 1 labeled demo.
config = dict(max_bootstrapped_demos=1, max_labeled_demos=1)

# Optimize! Use the `gsm8k_metric` here. In general, the metric is going to tell the optimizer how well it's doing.
teleprompter = BootstrapFewShot(metric=gsm8k_metric, **config)
optimized_cot = teleprompter.compile(CoT(), trainset=gsm8k_trainset, valset=gsm8k_devset)
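
A DSPy metric is a callable that takes a gold example, a prediction, and an optional trace, and returns a score. The sketch below is a simplified stand-in for `gsm8k_metric`, not its actual implementation: the name `last_number_match` and the rule of comparing the last number found in each answer are assumptions made for illustration. `SimpleNamespace` objects stand in for DSPy's example and prediction objects.

```python
import re
from types import SimpleNamespace


def last_number(text: str):
    """Return the last number in `text` as a float, or None if there is none."""
    # Remove thousands separators so "1,000" is read as one number.
    numbers = re.findall(r"-?\d+(?:\.\d+)?", text.replace(",", ""))
    return float(numbers[-1]) if numbers else None


def last_number_match(gold, pred, trace=None) -> bool:
    """Hypothetical metric: True when both answers end in the same number."""
    expected = last_number(str(gold.answer))
    actual = last_number(str(pred.answer))
    return expected is not None and expected == actual


gold = SimpleNamespace(answer="24")
pred = SimpleNamespace(answer="Marion's score is 24.")
print(last_number_match(gold, pred))  # True
```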

  0%|          | 0/1 [00:00<?, ?it/s]

Backing off 0.7 seconds after 1 tries calling function <function GPT3.request at 0x7fe82e46ee60> with kwargs {}
Backing off 1.3 seconds after 2 tries calling function <function GPT3.request at 0x7fe82e46ee60> with kwargs {}
Backing off 0.5 seconds after 3 tries calling function <function GPT3.request at 0x7fe82e46ee60> with kwargs {}
Backing off 0.4 seconds after 4 tries calling function <function GPT3.request at 0x7fe82e46ee60> with kwargs {}
Backing off 10.6 seconds after 5 tries calling function <function GPT3.request at 0x7fe82e46ee60> with kwargs {}
Backing off 1.6 seconds after 6 tries calling function <function GPT3.request at 0x7fe82e46ee60> with kwargs {}
Backing off 14.9 seconds after 7 tries calling function <function GPT3.request at 0x7fe82e46ee60> with kwargs {}
Backing off 102.6 seconds after 8 tries calling function <function GPT3.request at 0x7fe82e46ee60> with kwargs {}
Backing off 159.2 seconds after 9 tries calling function <function GPT3.request at 0x7fe82e46ee60> w

100%|██████████| 1/1 [16:45<00:00, 1005.34s/it]

Failed to run or to evaluate example Example({'question': "The result from the 40-item Statistics exam Marion and Ella took already came out. Ella got 4 incorrect answers while Marion got 6 more than half the score of Ella. What is Marion's score?", 'gold_reasoning': "Ella's score is 40 items - 4 items = <<40-4=36>>36 items. Half of Ella's score is 36 items / 2 = <<36/2=18>>18 items. So, Marion's score is 18 items + 6 items = <<18+6=24>>24 items.", 'answer': '24'}) (input_keys={'question'}) with <function gsm8k_metric at 0x7fe822156320> due to Error code: 429 - {'error': {'message': 'You exceeded your current quota, please check your plan and billing details. For more information on this error, read the docs: https://platform.openai.com/docs/guides/error-codes/api-errors.', 'type': 'insufficient_quota', 'param': None, 'code': 'insufficient_quota'}}.
Bootstrapped 0 full traces after 1 examples in round 0.





In [51]:
turbo.inspect_history(n=1)

In [52]:
optimized_cot(question='Who has the highest score?')

Backing off 0.1 seconds after 1 tries calling function <function GPT3.request at 0x7fe82e46ee60> with kwargs {}
Backing off 0.0 seconds after 2 tries calling function <function GPT3.request at 0x7fe82e46ee60> with kwargs {}
Backing off 0.4 seconds after 3 tries calling function <function GPT3.request at 0x7fe82e46ee60> with kwargs {}
Backing off 2.4 seconds after 4 tries calling function <function GPT3.request at 0x7fe82e46ee60> with kwargs {}
Backing off 10.3 seconds after 5 tries calling function <function GPT3.request at 0x7fe82e46ee60> with kwargs {}
Backing off 20.9 seconds after 6 tries calling function <function GPT3.request at 0x7fe82e46ee60> with kwargs {}
Backing off 14.4 seconds after 7 tries calling function <function GPT3.request at 0x7fe82e46ee60> with kwargs {}
Backing off 92.0 seconds after 8 tries calling function <function GPT3.request at 0x7fe82e46ee60> with kwargs {}
Backing off 29.2 seconds after 9 tries calling function <function GPT3.request at 0x7fe82e46ee60> wi

KeyboardInterrupt: 

## Shawn's notes
So I have issues with this starter module. This is a good example of how not to create a quickstart. OpenAI, even with the pro subscription, throttles the hell out of API requests, even on older models like GPT-3.x. It's to the point of being useless.

So I'm just going to assume that this works as advertised and hope I can find a solution as I continue to learn DSPy. Connecting to my own local model would be a much better solution.

I attempted a few fixes to get this to work, but so far, no luck. It could be the dataset, but the 429 `insufficient_quota` error in the log above points more toward OpenAI's API limits on my account.
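
The log above shows the client backing off and retrying for almost 17 minutes on a 429 response whose type is `insufficient_quota`. Retrying cannot fix a quota problem, so one option is to separate retryable rate limits from non-retryable quota errors before backing off. A minimal sketch, assuming the error payload has the shape printed in the log. `should_retry` is a hypothetical helper and is not part of DSPy or the OpenAI client:

```python
def should_retry(status_code: int, payload: dict) -> bool:
    """Decide whether backing off and retrying can plausibly help."""
    error = payload.get("error") or {}
    error_type = error.get("type") or error.get("code")
    if status_code == 429 and error_type == "insufficient_quota":
        # The account has no usable quota: waiting will not change that.
        return False
    # Other 429s (rate limits) and 5xx server errors are usually transient.
    return status_code == 429 or 500 <= status_code < 600


# Payload copied from the failed run above (message shortened).
quota_error = {
    "error": {
        "message": "You exceeded your current quota, please check your plan and billing details.",
        "type": "insufficient_quota",
        "param": None,
        "code": "insufficient_quota",
    }
}
print(should_retry(429, quota_error))  # False
```

With a check like this in front of the retry loop, the run would stop on the first response instead of waiting through nine backoff rounds.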
