# DSPy Take 2
  
In this exercise, we will use DSPy to:
1. Create a basic QA prediction module that summarizes a Java file (which is what this read-agent-java project is about).
2. Use GPT-4 with a hand-crafted prompt to summarize the sample Java project and build a "gold" dataset.
3. Run the QA bot with GPT-3.5 through DSPy's few-shot optimizer and collect the metrics.
4. If the GPT-3.5 results are close enough to the GPT-4 ones, call it a success and save the "compiled" QA prediction.

In [1]:
!pip3 install --quiet openai python-dotenv dspy-ai


In [2]:
import pkg_resources
pkg_resources.get_distribution("dspy-ai").version

'2.3.6'

**Use Arize Phoenix for tracing**

In [3]:
!pip3 install --quiet arize-phoenix openinference-instrumentation-dspy opentelemetry-exporter-otlp


In [None]:
import phoenix as px
px.launch_app()

In [5]:
from openinference.instrumentation.dspy import DSPyInstrumentor
from opentelemetry import trace as trace_api
from opentelemetry.exporter.otlp.proto.http.trace_exporter import OTLPSpanExporter
from opentelemetry.sdk import trace as trace_sdk
from opentelemetry.sdk.resources import Resource
from opentelemetry.sdk.trace.export import SimpleSpanProcessor

endpoint = "http://127.0.0.1:6006/v1/traces"
resource = Resource(attributes={})
tracer_provider = trace_sdk.TracerProvider(resource=resource)
span_otlp_exporter = OTLPSpanExporter(endpoint=endpoint)
tracer_provider.add_span_processor(SimpleSpanProcessor(span_exporter=span_otlp_exporter))
trace_api.set_tracer_provider(tracer_provider=tracer_provider)
DSPyInstrumentor().instrument()

**Set Up OpenAI GPT-3.5 for DSPy**

In [6]:
import dspy

In [7]:
import os
import getpass
os.environ["OPENAI_API_KEY"] = getpass.getpass("OpenAI API Key:")
lm = dspy.OpenAI(model="gpt-3.5-turbo", max_tokens=4000)
dspy.settings.configure(lm=lm)

OpenAI API Key: ········


## Basic QA bot with Code, Question input and Summary output

In [26]:
class BasicQA(dspy.Signature):
    code = dspy.InputField(desc="source code of the Java program")
    question = dspy.InputField()
    summary = dspy.OutputField()
    
class BasicQABot(dspy.Module):
    def __init__(self):
        super().__init__()

        self.generate = dspy.Predict(BasicQA)

    def forward(self, code, question):
        prediction = self.generate(code=code, question = question)
        return dspy.Prediction(summary = prediction.summary)

In [10]:
qa_bot = BasicQABot()

**A quick test of our QA bot**

In [12]:
import os
cwd = os.getcwd()
# load one Java file
filepath = os.path.join(cwd, "../../data/travel-service-dev/src/main/java/com/iky/travel/controller/travel/TravelController.java")
print(filepath)
with open(filepath, 'r') as file:
    code = file.read()
# calling the module directly invokes its forward() method
pred = qa_bot(code=code, question="what is the summary of the Java program?")
pred.summary

/Users/jyou/Documents/GitHub/read-agent-java/docs/dspy/../../data/travel-service-dev/src/main/java/com/iky/travel/controller/travel/TravelController.java


'This Java program defines a `TravelController` class that handles requests related to travel destinations. It includes methods to retrieve popular destinations, clear popular destinations from Redis cache, and retrieve all destinations. The controller uses a `TravelService` to interact with the data layer.'

## We need the Ground Truth for training ...

Next, we will create a training set using GPT-4. The idea is to use the more advanced LLM to produce the training data, and then use that data to tune a less expensive model such as GPT-3.5.

## Create Gold Set - One Time Only ##

We will scan all the Java files and summarize them with OpenAI GPT-4, arguably the best LLM available now.

In [80]:
files = []
cwd = os.getcwd()
folder_path = os.path.join(cwd, "../../data/travel-service-dev/src/main/java")
for root, dirs, filenames in os.walk(folder_path):
    for filename in filenames:
        if filename.endswith(".java"):
            files.append(os.path.join(root, filename))

In [156]:

summary_prompt = """
You are a world class Java developer. You are given a Java program to maintain. You need to read the code and write notes.
The notes should be short, concise and to the point.
Make sure to include the following points:
- The purpose of the code
- The functionality of the code
- The important classes and methods used in the code

Just return the notes. DO NOT explain your reason.

{}
"""


In [157]:
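import datetime  # used for timestamps in the retry messages
import time      # used for time.sleep between retries

import openai
gpt_client = openai.OpenAI()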

def query_gpt(
    prompt: str,
    lm: str = 'gpt-4-1106-preview',
    temperature: float = 0.0,
    max_decode_steps: int = 512,
    seconds_to_reset_tokens: float = 30.0,
) -> str:
  while True:
    try:
      raw_response = gpt_client.chat.completions.with_raw_response.create(
        model=lm,
        max_tokens=max_decode_steps,
        temperature=temperature,
        messages=[
          {'role': 'user', 'content': prompt},
        ]
      )
      completion = raw_response.parse()
      return completion.choices[0].message.content
    except openai.RateLimitError as e:
      print(f'{datetime.datetime.now()}: query_gpt_model: RateLimitError {e.message}: {e}')
      time.sleep(seconds_to_reset_tokens)
    except openai.APIError as e:
      print(f'{datetime.datetime.now()}: query_gpt_model: APIError {e.message}: {e}')
      print(f'{datetime.datetime.now()}: query_gpt_model: Retrying after 5 seconds...')
      time.sleep(5)
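
The `while True` loop above retries forever, which can hang the notebook if the API key is invalid or the service stays down. A minimal sketch of a bounded retry with exponential backoff follows. The helper name `call_with_retries` and all of its parameters are hypothetical and not part of the OpenAI SDK; the `sleep` argument exists only so the helper can be exercised without actually waiting.

```python
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def call_with_retries(
    fn: Callable[[], T],
    retry_on: Tuple[Type[BaseException], ...] = (Exception,),
    max_attempts: int = 5,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn() until it succeeds or max_attempts is reached.

    The wait doubles after each failed attempt: base_delay, 2 * base_delay, ...
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except retry_on:
            if attempt == max_attempts:
                raise  # give up and surface the last error
            sleep(base_delay * 2 ** (attempt - 1))
    # only reachable when max_attempts < 1
    raise ValueError("max_attempts must be at least 1")
```

In this notebook, such a helper could wrap a single-attempt version of `query_gpt`, passing `retry_on=(openai.RateLimitError, openai.APIError)` so that other errors fail fast.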

In [158]:
import time
dataset_file = os.path.join(cwd, "./java_summary_gpt4.txt")
with open(dataset_file, "a+") as f:
    for filepath in files:
        with open(filepath, 'r') as file:        
            code = file.read()
        # get summary from OpenAI GPT-4
        if code is not None:
            prompt = summary_prompt.format(code)
            summary = query_gpt(prompt)
            if summary:
                f.write(f"Path: {filepath}\n\nSummary: {summary}\n\n")
            # sleep
            time.sleep(5)

## Load the GPT-4 Gold Data (summary) into train-set

In [13]:
dataset = []
dataset_file = os.path.join(cwd, "./java_summary_gpt4.txt")
# Read the whole file, then walk through it record by record
with open(dataset_file, "r") as f:
    lines = f.readlines()

i = 0
while i < len(lines):
    if lines[i].startswith("Path:"):
        # split on the first colon only, so a path containing ':' stays intact
        path = lines[i].split(":", 1)[1].strip()
        # skip the "Path:" line and the blank line that follows it
        i += 2
        # the summary may span several lines; read until a blank line or end of file
        # (the leading "Summary:" prefix is kept as part of the text)
        summary = ""
        while i < len(lines) and lines[i].strip() != "":
            summary += lines[i]
            i += 1
        dataset.append((path, summary))
    else:
        i += 1
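
Note that the loop above ends a summary at the first blank line, so a multi-paragraph summary would be cut short. A sketch of an alternative parser that splits on the `Path:` markers instead is shown below. It assumes the `Path: ...` / `Summary: ...` layout written by the gold-set cell and that no summary line itself starts with `Path:`; `parse_gold_file` is a hypothetical helper name. Unlike the loop above, it strips leading and trailing whitespace from each summary.

```python
from typing import List, Optional, Tuple


def parse_gold_file(text: str) -> List[Tuple[str, str]]:
    """Parse 'Path: ...' / 'Summary: ...' records into (path, summary) pairs."""
    records: List[Tuple[str, str]] = []
    path: Optional[str] = None
    summary_lines: List[str] = []

    def flush() -> None:
        # store the record collected so far, if any
        if path is not None:
            summary = "\n".join(summary_lines).strip()
            records.append((path, summary))

    for line in text.splitlines():
        if line.startswith("Path:"):
            flush()
            # split on the first colon only, so the path may contain ':'
            path = line.split(":", 1)[1].strip()
            summary_lines = []
        elif path is not None:
            # everything up to the next "Path:" line belongs to the summary
            summary_lines.append(line)
    flush()
    return records
```

With the file from this notebook, it could be used as `dataset = parse_gold_file(open(dataset_file).read())`.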

In [14]:
dataset[-3]

('/Users/jyou/Documents/GitHub/read-agent-java/docs/dspy/../../data/travel-service-dev/src/main/java/com/iky/travel/exception/city/CityAlreadyExistsException.java',
 'Summary: - Purpose: Define a custom exception to handle scenarios where a city already exists in a given context.\n- Functionality: Extends `RuntimeException` to create a specific unchecked exception that can be thrown when attempting to add a city that is already present.\n- Important Classes/Methods:\n  - `CityAlreadyExistsException`: Custom exception class.\n  - Constructor `CityAlreadyExistsException(String message)`: Initializes the exception with a custom message.\n')

In [15]:
# Calculate the midpoint of the list
midpoint = len(dataset) // 2
# Split the list into two halves
first_half = dataset[:midpoint]
second_half = dataset[midpoint:]

In [27]:
# create train set
from dspy import Example
exampleset = []
# here we rewrite the question to our QABot to be similar to what we have used when creating the Gold data with GPT-4 earlier.
thoughtful_question = """
what is the summary of the Java program? Make sure to include the following points:
- The purpose of the code
- The functionality of the code
- The important classes and methods used in the code
"""
for data in first_half:
    filepath = data[0]
    # Normalize the path to resolve any symbols like '..'
    normalized_path = os.path.normpath(filepath)
    # Search for the '/src/main/java/' pattern to split the path
    base, package_path = normalized_path.split('/src/main/java/')
    # Replace os.sep with '/' if you want the package name in traditional Java package format
    java_package = package_path.rsplit('/', 1)[0].replace(os.sep, '.')
    summary = data[1]
    
    with open(filepath, 'r') as file:
        code = file.read()
    example = Example(code=code, question=thoughtful_question, summary=summary)
    exampleset.append(example)
trainset = [x.with_inputs("code", "question") for x in exampleset]

In [31]:
trainset[-3].question

'\nwhat is the summary of the Java program? Make sure to include the following points:\n- The purpose of the code\n- The functionality of the code\n- The important classes and methods used in the code\n'

In [32]:
trainset[-3].summary

'Summary: - Purpose: The code defines an interface for mapping between City domain model and CityDTO (Data Transfer Object).\n- Functionality: Provides methods to convert a CityDTO to a City entity and vice versa.\n- Important Classes/Methods:\n  - `@Mapper`: Annotation indicating the interface is a MapStruct mapper.\n  - `CityMapper INSTANCE`: Singleton instance of the mapper created by MapStruct.\n  - `dtoToCity(CityDTO cityDTO)`: Method to convert CityDTO to City entity.\n  - `cityToDto(City city)`: Method to convert City entity to CityDTO.\n'

## Run the QA Bot with GPT-3.5 on the Trainset, and compare the summary with gold data

In [29]:
class AssessSummary(dspy.Signature):
    """Assess how similar a given text is to the gold text."""
    gold_text = dspy.InputField()
    assessed_text = dspy.InputField()
    assessment_question = dspy.InputField()
    assessment_answer = dspy.OutputField(desc="from 0 to 1")

# A custom metric function returns either a number or a boolean value.
# The first parameter is the gold example, the second is the prediction.
def metric(gold, pred, trace=None):
    answer, summary = gold.summary, pred.summary
    # use the LLM to judge how close the generated summary is to the gold data
    closeness = "How close is the assessed text to the gold text, between 0 and 1?"
    close_enough = dspy.Predict(AssessSummary)(gold_text=answer, assessed_text=summary, assessment_question=closeness)
    print(close_enough.assessment_answer)
    # the LLM normally answers with a string such as "0.8"; convert it to a number
    if isinstance(close_enough.assessment_answer, str):
        return float(close_enough.assessment_answer)
    else:
        return close_enough.assessment_answer
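
The `float(...)` call in the metric above raises a `ValueError` whenever the judge replies with anything other than a bare number, for example "Score: 0.8". A minimal sketch of a more forgiving conversion is shown below; `parse_score` is a hypothetical helper, not a DSPy function. It takes the first number it finds, so a reply such as "between 0 and 1: 0.8" would still be misread as 0.

```python
import re


def parse_score(raw, default: float = 0.0) -> float:
    """Extract the first number from an LLM reply and clamp it to [0, 1]."""
    if isinstance(raw, (int, float)):
        value = float(raw)
    else:
        match = re.search(r"\d*\.?\d+", str(raw))
        if match is None:
            return default  # no number found in the reply
        value = float(match.group())
    # keep the score inside the range the metric is supposed to produce
    return min(1.0, max(0.0, value))
```

The metric could then end with `return parse_score(...)` applied to the judge's answer instead of the `isinstance` check.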

In [36]:
from dspy.teleprompt import BootstrapFewShot
teleprompter = BootstrapFewShot(metric=metric, max_bootstrapped_demos=7, max_rounds=2)
# now we run the BasicQABot with the trainset,
compiled_summarizer = teleprompter.compile(student = BasicQABot(), trainset=trainset)

 50%|██████████████████████                      | 7/14 [00:00<00:00, 90.90it/s]


0.7
0.7
0.8
0.8
0.9
0.8
0.9


  0%|                                                    | 0/14 [00:00<?, ?it/s]

Bootstrapped 7 full traces after 1 examples in round 1.





## With 7 few-shot samples, the closeness scores range from 0.7 to 0.9... it is good, right?

In [37]:
compiled_summarizer.save("trained_java_summarizer_few_shot.json")

In the case of few-shot prompts, the "compiled summarizer" is basically the prompt with the samples.

```json
{
  "generate": {
    "lm": null,
    "traces": [],
    "train": [],
    "demos": [
      {
        "augmented": true,
        "code": "package com.iky.travel;\n\nimport org.springframewo...\n}\n",
        "question": "what is the summary of the Java program?",
        "summary": "Summary: - Purpose: The code represents the main entry point ... application context."
      },
      {
        "augmented": true,
        "code": "package com.iky.travel.config;\n\ni...}\n}\n",
        "question": "what is the summary of the Java program?",
        "summary": "Summary: - Purpose: The code con...to create and configure a MongoTemplate instance. - `MONGO_DB_NAME`: Constant holding the name of the MongoDB database."
      },
      {
        "augmented": true,
        "code": "package com.iky.travel.config;\n\n...alizer(new GenericJackson2JsonRedisSerializer());\n    return template;\n  }\n}\n",
        "question": "what is the summary of the Java program?",
        "summary": "Summary: \n- Purpose: The code c...s and values."
      },
      {
        "augmented": true,
        "code": "package com.iky.travel...  .authorizeHttpRequests(auth -> auth\n            .requestMatchers(\"\/api\/**\", \"\/actuator\/**\").permitAll()\n            .anyRequest().authenticated()\n        )\n        .httpBasic(withDefaults());\n\n    return http.build();\n  }\n}\n",
        "question": "what is the summary of the Java program?",
        "summary": "Summary: - Purpose: T... rules for different endpoints. - `httpBasic(withDefaults())`: Configures HTTP Basic authentication with default settings."
      },
      {
        "code": "package com.iky.travel.constant.common;...g API_V1_CITY = API_V1_PREFIX + CITY_API_PREFIX;\n\n  private ApiPathConstants() {\n  }\n}\n",
        "question": "what is the summary of the Java program?",
        "summary": "Summary: - Purpose: Define con...uctor: Prevents instantiation of the utility class.\n"
      },
     ...
      {
        "code": "package com.iky.tra..ring cityName);\n}\n",
        "question": "what is the summary of the Java program?",
        "summary": "Summary: - Purpose: T...etes a city.\n"
      },
      {
        "code": "package co...eated(location).build();\n    } else {\n      throw new CityUpdateException(\"Error when updating city: \" + cityDTO.getName());\n    }\n  }\n}\n",
        "question": "what is the summary of the Java program?",
        "summary": "Summary: - Purpose of...ss for building URIs for newly created resources.\n"
      }
    ],
    "signature_instructions": "Given the fields `code`, `question`, produce the fields `summary`.",
    "signature_prefix": "Summary:"
  }
}
```
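
The scores printed during bootstrapping come from training examples, and the `second_half` split created earlier is never used. A sketch of scoring a program on held-out data is shown below. `average_metric` is a hypothetical helper written in plain Python so that it works with any callable; it assumes the held-out set is a sequence of `(inputs, gold)` pairs, where `inputs` is a dict of keyword arguments.

```python
from statistics import mean
from typing import Any, Callable, Iterable, Mapping, Tuple


def average_metric(
    program: Callable[..., Any],
    devset: Iterable[Tuple[Mapping[str, Any], Any]],
    metric: Callable[[Any, Any], float],
) -> float:
    """Run program on each held-out item and return the mean metric score."""
    scores = []
    for inputs, gold in devset:
        pred = program(**inputs)  # inputs are passed as keyword arguments
        scores.append(float(metric(gold, pred)))
    if not scores:
        raise ValueError("devset is empty")
    return mean(scores)
```

In this notebook, the pairs could be built from `second_half` in the same way the train set was built, with `compiled_summarizer` as the program and `metric` as the metric. DSPy also ships an `Evaluate` utility in `dspy.evaluate` that serves a similar purpose; its exact arguments should be checked against the installed version.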

## Thoughts

I think DSPy is targeting a very real problem: the dark art of LLM prompting. At the same time, I don't see huge value from it yet. For example, I assume the majority of enterprise GenAI use cases still focus on a couple of well-known areas, particularly Q/A style, and rely on:
1. summarization
2. extraction
3. classification

In such cases, the developer has to provide domain-specific hints in the "question". This is not what DSPy is solving right now, unless it wants to maintain some sort of "best practices" template catalog, which may help more than those optimizers.

I suspect we can get great results by using a couple of simple strategies, for example:
1. specify requirements like the ones below (obtained by simply asking ChatGPT for the top five things to summarize ...)
    * Purpose of the File: A brief description of the file's role in the project, such as the functionality it provides or its reason for existence.
    * Key Classes and Interfaces: The main classes and interfaces defined within the file, highlighting their significance.
    * Major Functions and Methods: Summarize the most critical functions and methods, including their purposes and how they contribute to the file's overall functionality.
    * Internal and External Dependencies: Note any significant dependencies, both within the project (internal) and on external libraries or frameworks (external), that are crucial for the file's operation.
    * Limitations or Known Issues: Briefly mention any limitations, bugs, or known issues within the file that could impact future maintenance or enhancements.
2. add a few example question-answer pairs (few-shot)
3. add chain-of-thought
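
The three strategies above can be sketched as a plain prompt builder. This is an illustration only: `build_summary_prompt`, `REQUIREMENTS`, and the exact wording of the instructions are hypothetical and were not evaluated in this notebook.

```python
from typing import List, Sequence, Tuple

# Short labels for the five requirement bullets listed above.
REQUIREMENTS: List[str] = [
    "Purpose of the file",
    "Key classes and interfaces",
    "Major functions and methods",
    "Internal and external dependencies",
    "Limitations or known issues",
]


def build_summary_prompt(
    code: str,
    requirements: Sequence[str] = REQUIREMENTS,
    examples: Sequence[Tuple[str, str]] = (),
    chain_of_thought: bool = True,
) -> str:
    """Combine requirements, few-shot examples and a chain-of-thought hint."""
    parts = ["Summarize the Java file below. Cover the following points:"]
    # strategy 1: explicit requirements
    parts += [f"- {req}" for req in requirements]
    # strategy 2: few-shot examples given as (code, summary) pairs
    for i, (ex_code, ex_summary) in enumerate(examples, start=1):
        parts.append(f"\nExample {i}:\nCode:\n{ex_code}\nSummary:\n{ex_summary}")
    # strategy 3: ask the model to reason before answering
    if chain_of_thought:
        parts.append("\nThink step by step about what the code does before writing the summary.")
    parts.append(f"\nCode:\n{code}\nSummary:")
    return "\n".join(parts)
```

The resulting string could be sent through `query_gpt` as is, which makes it easy to compare against the DSPy-compiled prompt on the same files.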

PS: a couple more research papers:

[A Prompt Pattern Catalog to Enhance Prompt Engineering with ChatGPT](https://arxiv.org/abs/2302.11382)

[Are Large Language Models Good Prompt Optimizers?](https://arxiv.org/abs/2402.02101)