In [1]:
!python -m pip install -r requirements.txt



We will set up a locally running LLM, a locally running vector database, and an embedding function. These will be:
* Ollama running Mistral
* ChromaDB, running locally, but persistently. The vectors are stored in the `./chroma` directory. This can be changed via the `chroma_dir` variable
* `DefaultEmbeddingFunction` (part of the ChromaDB lib), which takes care of creating embeddings

It is assumed that the ChromaDB database is already set up. To set it up, you can use the `create_knowledgebase` notebook.

In [31]:
import dspy
from dspy.retrieve.chromadb_rm import ChromadbRM
from chromadb.utils.embedding_functions import DefaultEmbeddingFunction

chroma_dir = './chroma'
chroma_collection = 'man_data'

chroma_rm = ChromadbRM(
    collection_name=chroma_collection,
    persist_directory=chroma_dir,
    embedding_function=DefaultEmbeddingFunction(),
    k=3,
)

mistral_ollama = dspy.OllamaLocal(
    model='mistral',
    max_tokens=1000,
)
dspy.configure(
    lm=mistral_ollama,
    rm=chroma_rm
)

In [3]:
# test the vanilla LLM
mistral_ollama('Who is the president of Brazil?')

[' As of my current knowledge up to 2021, the President of Brazil is Jair Bolsonaro. He took office on January 1, 2019. However, please check the most recent and reliable sources for the latest information as political situations can change.']

`DocumentFAQ` is a pipeline that retrieves passages from the knowledge base for a question and then uses an LLM to answer it.
It features a Chain of Thought step that should improve the quality of the predictions.

In [4]:
class DocumentFAQSignature(dspy.Signature):
    """Answer questions based on the provided context."""

    context = dspy.InputField(desc="facts here are assumed to be true")
    question = dspy.InputField()
    answer = dspy.OutputField()


class DocumentFAQ(dspy.Module):
    def __init__(self, num_passages=3):
        super().__init__()

        self.retrieve = dspy.Retrieve(k=num_passages)
        self.generate_answer = dspy.ChainOfThought(DocumentFAQSignature)
    
    def forward(self, question) -> dspy.Prediction:
        context = self.retrieve(question).passages
        prediction = self.generate_answer(context=context, question=question)
        return dspy.Prediction(context=context, answer=prediction.answer)

The following block tests the pipeline without any training or optimisation. It retrieves three passages from the knowledge base and puts them in the context. Then, using Ollama, it predicts the answer and displays it.

In [5]:
# Test pipeline

question = 'How can I make a multipart upload with curl? Can you write an example for me?'
pipeline = DocumentFAQ()
prediction = pipeline.forward(question)
print(prediction.answer)

To make a multipart upload using `curl`, you can use the `--form` or `-F` option. Here's an example of how to send a file named `image.jpg` along with some metadata as key-value pairs:

```bash
# Replace 'https://example.com/upload.php' with your actual upload URL
curl --form "name=JohnDoe" \
     --form "email=john.doe@example.com" \
     --form "file;file=@image.jpg" \
     https://example.com/upload.php
```

In this example, we use the `--


# Adding some examples

We will add some static examples and feed them as data into the pipeline. There, they can be used as few-shot examples for the prompt. Additionally, they can be used for optimising the parameters of the pipeline.

## Examples

You will find these examples in the `curl_examples.csv` file. They were generated ahead of time using ChatGPT 3.5. Their format appears to be on par with the answers Mistral returns, so the two should work well together.
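As a dependency-free illustration of the load-and-split step, the sketch below reads question/answer rows from CSV text and cuts them into a train and a dev part. The inline rows, the helper names `load_rows` and `split_rows`, and the seed are all made up for this example; the real data lives in `curl_examples.csv`.

```python
import csv
import io
import random

# Hypothetical inline rows standing in for curl_examples.csv.
SAMPLE_CSV = """num,question,answer
1,How do I download a file?,Use `curl -O [URL]`.
2,How do I rename the download?,Use `curl -o [name] [URL]`.
3,How do I send a POST request?,Use `curl -d [data] [URL]`.
4,How do I follow redirects?,Use `curl -L [URL]`.
"""


def load_rows(csv_file) -> list[dict[str, str]]:
    """Read every CSV row into a dict keyed by the header names."""
    return list(csv.DictReader(csv_file))


def split_rows(rows, train_size, seed=0):
    """Shuffle a copy deterministically, then cut it into train and dev parts."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    return shuffled[:train_size], shuffled[train_size:]


rows = load_rows(io.StringIO(SAMPLE_CSV))
train, dev = split_rows(rows, train_size=3)
print(len(train), len(dev))
```

Shuffling with a fixed seed keeps the split reproducible between runs, which makes scores from different optimisation attempts easier to compare.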

In [6]:
import csv
import dspy

from dspy.datasets import Dataset


examples_path = './curl_examples.csv'
with open(examples_path, 'r') as examples_file:
    csv_reader = csv.DictReader(examples_file)
    examples: list[dspy.Example] = []
    for example_dict in csv_reader:
        examples.append(
            dspy.Example(
                question=example_dict['question'],
                answer=example_dict['answer'],
            ).with_inputs('question')
        )

class CSVDataset(Dataset):
    def __init__(self, examples_path, *args, **kwargs) -> None:
        super().__init__(*args, **kwargs)

        with open(examples_path, 'r') as examples_file:
            csv_reader = csv.DictReader(examples_file)
            examples = [example for example in csv_reader]
            self._train = examples[:15]
            self._dev = examples[15:]

dataset = CSVDataset(examples_path)
dataset.train[:3]

[Example({'num': '2', 'question': 'How do I specify a different name for the downloaded file with `curl`?', 'answer': 'You can use `curl -o [output_file_name] [URL]` to specify a different name for the downloaded file.'}) (input_keys=None),
 Example({'num': '11', 'question': 'How can I download a file using FTP with `curl`?', 'answer': 'You can specify the FTP protocol by prefixing the URL with `ftp://` and use `curl` as usual.'}) (input_keys=None),
 Example({'num': '10', 'question': 'How do I send a POST request with `curl`?', 'answer': 'You can use the `-d` option with `curl` to send data as part of a POST request.'}) (input_keys=None)]

The next pipeline uses a multi-hop step to increase the accuracy of predictions for more complex queries.
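The multi-hop idea can be sketched without DSPy: at each hop a query is written from the question plus the context gathered so far, new passages are retrieved, and duplicates are dropped. Everything below is a stand-in made up for illustration: `fake_make_query` plays the role of the LLM, `FAKE_INDEX` plays the role of the vector store, and the local `deduplicate` is only assumed to behave like the helper imported from `dsp.utils`.

```python
from typing import Callable


def deduplicate(items: list[str]) -> list[str]:
    """Drop repeated passages while keeping first-seen order."""
    seen: set[str] = set()
    unique: list[str] = []
    for item in items:
        if item not in seen:
            seen.add(item)
            unique.append(item)
    return unique


def multihop_context(
    question: str,
    make_query: Callable[[list[str], str], str],
    retrieve: Callable[[str], list[str]],
    max_hops: int = 2,
) -> list[str]:
    """Collect context over several hops; each hop sees what earlier hops found."""
    context: list[str] = []
    for _ in range(max_hops):
        query = make_query(context, question)
        context = deduplicate(context + retrieve(query))
    return context


# Hypothetical stand-ins for the LLM query writer and the vector store.
FAKE_INDEX = {
    "curl headers": ["-H adds a request header", "-I fetches headers only"],
    "curl cache header": ["-H adds a request header", "Cache-Control: no-cache"],
}


def fake_make_query(context: list[str], question: str) -> str:
    # First hop: broad query. Later hops: narrower query.
    return "curl headers" if not context else "curl cache header"


def fake_retrieve(query: str) -> list[str]:
    return FAKE_INDEX.get(query, [])


context = multihop_context(
    "How do I bypass caches with curl?", fake_make_query, fake_retrieve
)
print(context)
```

The second hop returns one passage that the first hop already found, so the final context holds three passages rather than four.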

In [7]:
from dsp.utils import deduplicate

class GenerateSearchQuery(dspy.Signature):
    """Write a simple search query that will help answer a complex question."""

    context = dspy.InputField(desc="may contain relevant facts")
    question = dspy.InputField()
    query = dspy.OutputField()

class MultihopFAQ(dspy.Module):
    def __init__(self, passages_per_hop=2, max_hops=2):
        super().__init__()

        self.generate_query = [dspy.ChainOfThought(GenerateSearchQuery) for _ in range(max_hops)]
        self.retrieve = dspy.Retrieve(k=passages_per_hop)
        self.generate_answer = dspy.ChainOfThought(DocumentFAQSignature)
        self.max_hops = max_hops
    
    def forward(self, question):
        context = []
        
        for hop in range(self.max_hops):
            query = self.generate_query[hop](context=context, question=question).query
            passages = self.retrieve(query).passages
            context = deduplicate(context + passages)

        return self.generate_answer(context=context, question=question)
        # return dspy.Prediction(context=context, answer=pred.answer)

In [8]:
# Test the multi hop pipeline

question = 'How do I set the cache headers in curl so that the server does not give me cached results?'
pipeline = MultihopFAQ()
prediction = pipeline.forward(question)
print(prediction.answer)

To prevent the server from serving you cached results and also not sending any cache-related headers, you can use the `--no-cache` option in curl along with other options if needed. Here's an example of how to use it:
```bash
curl --no-cache [OPTIONS] URL
```
Replace `[OPTIONS]` with any additional options you might need, such as `--header`, `--data`, or `--compressed`. For instance:
```bash
curl --no-cache --header "User-Agent: Mozilla/5.0" https://example.com/path-to-resource
```
This command will send


In [9]:
# inspect the LLM usage
from summarize_usage import summarize_usages


summarize_usages(mistral_ollama.history[-3:])

LLM usage for chatcmpl-da39a3ee5e6b4b0d3255bfef95601890afd80709: 156 prompt, 150 completion, 306 total
LLM usage for chatcmpl-da39a3ee5e6b4b0d3255bfef95601890afd80709: 992 prompt, 150 completion, 1142 total
LLM usage for chatcmpl-da39a3ee5e6b4b0d3255bfef95601890afd80709: 620 prompt, 150 completion, 770 total
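
The `summarize_usage` module is a local helper that is not shown in this notebook. A rough sketch of what such a helper might do follows. It assumes each history entry is a dict with a `response` that carries an `id` and an OpenAI-style `usage` block; that layout is an assumption and may differ between DSPy versions. The name `summarize_usage_sketch` and the sample entry are hypothetical.

```python
def summarize_usage_sketch(history: list[dict]) -> list[str]:
    """Build one summary line per LLM call from an assumed history layout."""
    lines = []
    for entry in history:
        response = entry.get("response", {})
        usage = response.get("usage", {})
        prompt = usage.get("prompt_tokens", 0)
        completion = usage.get("completion_tokens", 0)
        # Fall back to the sum when no total is reported.
        total = usage.get("total_tokens", prompt + completion)
        call_id = response.get("id", "unknown")
        lines.append(
            f"LLM usage for {call_id}: "
            f"{prompt} prompt, {completion} completion, {total} total"
        )
    return lines


# Hypothetical entry; the token counts mirror the first line printed above.
sample_history = [
    {
        "response": {
            "id": "example-id",
            "usage": {"prompt_tokens": 156, "completion_tokens": 150},
        }
    }
]

lines = summarize_usage_sketch(sample_history)
for line in lines:
    print(line)
```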


# Adding an optimiser

We will add an optimiser to the pipeline. We have 30 examples that we generated manually via ChatGPT. Based on the recommendations from DSPy, we should use either the `BootstrapFewShot` optimiser (which adds a few examples to the prompt) or `BootstrapFewShotWithRandomSearch`. The latter searches for the optimal set of few-shot examples and adds them to the prompts.
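To give an intuition for the random-search part, here is a deliberately simplified sketch: sample several candidate sets of demonstrations, score each one, and keep the best. It is not DSPy's implementation. The function `random_search_demos`, the toy pool, and the toy scorer are invented for illustration only.

```python
import random
from typing import Callable, Sequence


def random_search_demos(
    pool: Sequence[str],
    score: Callable[[list[str]], float],
    demos_per_prompt: int = 3,
    num_candidates: int = 10,
    seed: int = 0,
) -> tuple[list[str], float]:
    """Sample demo subsets at random and keep the best-scoring one."""
    rng = random.Random(seed)
    best_demos: list[str] = []
    best_score = float("-inf")
    for _ in range(num_candidates):
        candidate = rng.sample(list(pool), k=min(demos_per_prompt, len(pool)))
        candidate_score = score(candidate)
        # Strict comparison: ties keep the earlier candidate.
        if candidate_score > best_score:
            best_demos, best_score = candidate, candidate_score
    return best_demos, best_score


# Toy pool and toy scorer (both hypothetical): prefer demos that mention a flag.
pool = ["use -L", "use -o", "read the manual", "use -d", "ask a friend"]


def toy_score(demos: list[str]) -> float:
    return sum(1.0 for demo in demos if "-" in demo) / len(demos)


best_demos, best_score = random_search_demos(pool, toy_score)
print(best_demos, best_score)
```

In this sketch, a scorer that returns the same value for every candidate leaves the search with nothing to choose between, so the first candidate simply wins.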

In [10]:
from dspy.teleprompt import BootstrapFewShotWithRandomSearch
from metric import metric

config = dict(max_bootstrapped_demos=3, max_labeled_demos=3, num_candidate_programs=10, num_threads=4)

teleprompter = BootstrapFewShotWithRandomSearch(metric=metric, **config)
optimized_program = teleprompter.compile(MultihopFAQ(), trainset=examples)

Going to sample between 1 and 3 traces per predictor.
Will attempt to train 10 candidate sets.


huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
	- Avoid using `tokenizers` before the fork if possible
	- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
Average Metric: 0.0 / 30  (0.0): 100%|██████████| 30/30 [19:23<00:00, 38.77s/it] 
  df = df.applymap(truncate_cell)


Average Metric: 0.0 / 30  (0.0%)
Score: 0.0 for set: [0, 0, 0]
New best score: 0.0 for seed -3
Scores so far: [0.0]
Best score: 0.0


Average Metric: 0.0 / 30  (0.0): 100%|██████████| 30/30 [19:37<00:00, 39.26s/it] 


Average Metric: 0.0 / 30  (0.0%)
Score: 0.0 for set: [3, 3, 3]
Scores so far: [0.0, 0.0]
Best score: 0.0


100%|██████████| 30/30 [17:48<00:00, 35.63s/it]


Bootstrapped 0 full traces after 30 examples in round 0.


Average Metric: 0.0 / 30  (0.0): 100%|██████████| 30/30 [19:38<00:00, 39.29s/it] 


Average Metric: 0.0 / 30  (0.0%)
Score: 0.0 for set: [3, 3, 3]
Scores so far: [0.0, 0.0, 0.0]
Best score: 0.0
Average of max per entry across top 1 scores: 0.0
Average of max per entry across top 2 scores: 0.0
Average of max per entry across top 3 scores: 0.0
Average of max per entry across top 5 scores: 0.0
Average of max per entry across top 8 scores: 0.0
Average of max per entry across top 9999 scores: 0.0


100%|██████████| 30/30 [17:48<00:00, 35.61s/it]


Bootstrapped 0 full traces after 30 examples in round 0.


Average Metric: 0.0 / 30  (0.0): 100%|██████████| 30/30 [19:50<00:00, 39.68s/it] 


Average Metric: 0.0 / 30  (0.0%)
Score: 0.0 for set: [3, 3, 3]
Scores so far: [0.0, 0.0, 0.0, 0.0]
Best score: 0.0
Average of max per entry across top 1 scores: 0.0
Average of max per entry across top 2 scores: 0.0
Average of max per entry across top 3 scores: 0.0
Average of max per entry across top 5 scores: 0.0
Average of max per entry across top 8 scores: 0.0
Average of max per entry across top 9999 scores: 0.0


100%|██████████| 30/30 [17:59<00:00, 35.98s/it]


Bootstrapped 0 full traces after 30 examples in round 0.


Average Metric: 0.0 / 30  (0.0): 100%|██████████| 30/30 [19:35<00:00, 39.18s/it] 


Average Metric: 0.0 / 30  (0.0%)
Score: 0.0 for set: [3, 3, 3]
Scores so far: [0.0, 0.0, 0.0, 0.0, 0.0]
Best score: 0.0
Average of max per entry across top 1 scores: 0.0
Average of max per entry across top 2 scores: 0.0
Average of max per entry across top 3 scores: 0.0
Average of max per entry across top 5 scores: 0.0
Average of max per entry across top 8 scores: 0.0
Average of max per entry across top 9999 scores: 0.0


100%|██████████| 30/30 [17:49<00:00, 35.66s/it]


Bootstrapped 0 full traces after 30 examples in round 0.


Average Metric: 0.0 / 13  (0.0):  43%|████▎     | 13/30 [12:32<24:25, 86.21s/it] 

Error for example in dev set: 		 HTTPConnectionPool(host='localhost', port=11434): Read timed out. (read timeout=120)


Average Metric: 0.0 / 30  (0.0): 100%|██████████| 30/30 [24:54<00:00, 49.83s/it] 


Average Metric: 0.0 / 30  (0.0%)
Score: 0.0 for set: [3, 3, 3]
Scores so far: [0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
Best score: 0.0
Average of max per entry across top 1 scores: 0.0
Average of max per entry across top 2 scores: 0.0
Average of max per entry across top 3 scores: 0.0
Average of max per entry across top 5 scores: 0.0
Average of max per entry across top 8 scores: 0.0
Average of max per entry across top 9999 scores: 0.0


100%|██████████| 30/30 [18:09<00:00, 36.31s/it]


Bootstrapped 0 full traces after 30 examples in round 0.


Average Metric: 0.0 / 30  (0.0): 100%|██████████| 30/30 [19:36<00:00, 39.23s/it] 


Average Metric: 0.0 / 30  (0.0%)
Score: 0.0 for set: [3, 3, 3]
Scores so far: [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
Best score: 0.0
Average of max per entry across top 1 scores: 0.0
Average of max per entry across top 2 scores: 0.0
Average of max per entry across top 3 scores: 0.0
Average of max per entry across top 5 scores: 0.0
Average of max per entry across top 8 scores: 0.0
Average of max per entry across top 9999 scores: 0.0


100%|██████████| 30/30 [17:47<00:00, 35.60s/it]


Bootstrapped 0 full traces after 30 examples in round 0.


Average Metric: 0.0 / 30  (0.0): 100%|██████████| 30/30 [19:23<00:00, 38.79s/it] 


Average Metric: 0.0 / 30  (0.0%)
Score: 0.0 for set: [3, 3, 3]
Scores so far: [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
Best score: 0.0
Average of max per entry across top 1 scores: 0.0
Average of max per entry across top 2 scores: 0.0
Average of max per entry across top 3 scores: 0.0
Average of max per entry across top 5 scores: 0.0
Average of max per entry across top 8 scores: 0.0
Average of max per entry across top 9999 scores: 0.0


100%|██████████| 30/30 [18:11<00:00, 36.38s/it]


Bootstrapped 0 full traces after 30 examples in round 0.


Average Metric: 0.0 / 30  (0.0): 100%|██████████| 30/30 [20:11<00:00, 40.37s/it] 


Average Metric: 0.0 / 30  (0.0%)
Score: 0.0 for set: [3, 3, 3]
Scores so far: [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
Best score: 0.0
Average of max per entry across top 1 scores: 0.0
Average of max per entry across top 2 scores: 0.0
Average of max per entry across top 3 scores: 0.0
Average of max per entry across top 5 scores: 0.0
Average of max per entry across top 8 scores: 0.0
Average of max per entry across top 9999 scores: 0.0


100%|██████████| 30/30 [17:53<00:00, 35.79s/it]


Bootstrapped 0 full traces after 30 examples in round 0.


Average Metric: 0.0 / 30  (0.0): 100%|██████████| 30/30 [19:36<00:00, 39.22s/it] 


Average Metric: 0.0 / 30  (0.0%)
Score: 0.0 for set: [3, 3, 3]
Scores so far: [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
Best score: 0.0
Average of max per entry across top 1 scores: 0.0
Average of max per entry across top 2 scores: 0.0
Average of max per entry across top 3 scores: 0.0
Average of max per entry across top 5 scores: 0.0
Average of max per entry across top 8 scores: 0.0
Average of max per entry across top 9999 scores: 0.0


100%|██████████| 30/30 [17:54<00:00, 35.83s/it]


Bootstrapped 0 full traces after 30 examples in round 0.


Average Metric: 0.0 / 30  (0.0): 100%|██████████| 30/30 [19:27<00:00, 38.93s/it] 


Average Metric: 0.0 / 30  (0.0%)
Score: 0.0 for set: [3, 3, 3]
Scores so far: [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
Best score: 0.0
Average of max per entry across top 1 scores: 0.0
Average of max per entry across top 2 scores: 0.0
Average of max per entry across top 3 scores: 0.0
Average of max per entry across top 5 scores: 0.0
Average of max per entry across top 8 scores: 0.0
Average of max per entry across top 9999 scores: 0.0


100%|██████████| 30/30 [17:48<00:00, 35.62s/it]


Bootstrapped 0 full traces after 30 examples in round 0.


Average Metric: 0.0 / 30  (0.0): 100%|██████████| 30/30 [19:28<00:00, 38.96s/it] 


Average Metric: 0.0 / 30  (0.0%)
Score: 0.0 for set: [3, 3, 3]
Scores so far: [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
Best score: 0.0
Average of max per entry across top 1 scores: 0.0
Average of max per entry across top 2 scores: 0.0
Average of max per entry across top 3 scores: 0.0
Average of max per entry across top 5 scores: 0.0
Average of max per entry across top 8 scores: 0.0
Average of max per entry across top 9999 scores: 0.0


100%|██████████| 30/30 [17:38<00:00, 35.29s/it]


Bootstrapped 0 full traces after 30 examples in round 0.


Average Metric: 0.0 / 30  (0.0): 100%|██████████| 30/30 [19:16<00:00, 38.54s/it] 

Average Metric: 0.0 / 30  (0.0%)
Score: 0.0 for set: [3, 3, 3]
Scores so far: [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
Best score: 0.0
Average of max per entry across top 1 scores: 0.0
Average of max per entry across top 2 scores: 0.0
Average of max per entry across top 3 scores: 0.0
Average of max per entry across top 5 scores: 0.0
Average of max per entry across top 8 scores: 0.0
Average of max per entry across top 9999 scores: 0.0
13 candidate programs found.





## Saving the optimised pipeline

Running optimisers can take a long time, and it only needs to be done once (in order to find the optimal parameters). After that, the compiled pipeline can simply be reused (much like the difference between training and inference for traditional ML models).

In order not to lose the optimised parameters, we can save the compiled pipeline and later read it from a file instead of re-compiling.

Training/optimising can be done during development, while production runs read a saved compiled pipeline and just run inference.
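Whether `save` creates missing directories has not been verified here, so it may be safer to create the target directory first. A small sketch follows; the helper name `prepare_save_path` is made up, and the demonstration runs in a temporary directory rather than in `./compiled`.

```python
import tempfile
from pathlib import Path


def prepare_save_path(path: str) -> Path:
    """Create the parent directory of `path` if needed and return it as a Path."""
    target = Path(path)
    target.parent.mkdir(parents=True, exist_ok=True)
    return target


# Demonstrated in a temporary directory so nothing is left behind.
with tempfile.TemporaryDirectory() as tmp:
    target = prepare_save_path(f"{tmp}/compiled/v1.json")
    parent_exists = target.parent.is_dir()
    print(parent_exists)
```

In the notebook this would be called with `saved_pipeline_path` before `optimized_program.save(...)`.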

In [14]:
saved_pipeline_path = './compiled/v1.json'
optimized_program.save(saved_pipeline_path)

## Loading a saved pipeline

Here, the saved parameters are just read from the file and loaded into a DSPy module.

In [23]:
loaded_pipeline = MultihopFAQ()
loaded_pipeline.load(path=saved_pipeline_path)
print(loaded_pipeline)

generate_query[0] = ChainOfThought(StringSignature(context, question -> query
    instructions='Write a simple search query that will help answer a complex question.'
    context = Field(annotation=str required=True json_schema_extra={'desc': 'may contain relevant facts', '__dspy_field_type': 'input', 'prefix': 'Context:'})
    question = Field(annotation=str required=True json_schema_extra={'__dspy_field_type': 'input', 'prefix': 'Question:', 'desc': '${question}'})
    query = Field(annotation=str required=True json_schema_extra={'__dspy_field_type': 'output', 'prefix': 'Query:', 'desc': '${query}'})
))
generate_query[1] = ChainOfThought(StringSignature(context, question -> query
    instructions='Write a simple search query that will help answer a complex question.'
    context = Field(annotation=str required=True json_schema_extra={'desc': 'may contain relevant facts', '__dspy_field_type': 'input', 'prefix': 'Context:'})
    question = Field(annotation=str required=True json_schema

In [24]:
response = loaded_pipeline.forward('How do I tell curl to follow redirects automatically?')
response

Prediction(
    rationale="To make curl follow redirects automatically, you can use the `-L` or `--location` option when invoking curl. This tells curl to follow any HTTP location headers it receives and perform a new request using the given URL.\n\nHere is an example command:\n\n```bash\ncurl -L https://example.com/some_page\n```\n\nThis command will start by requesting `https://example.com/some_page`. If there's a location header (`Location:`) in the response, curl will follow it and perform another request using the new URL. This process can continue until the final URL does not have any more redirects.\n\nUsing this option,",
    answer='To make `curl` follow redirects automatically, use the `-L` or `--location` option when invoking `curl`. For example:\n\n```bash\ncurl -L <URL>\n```\n\nor\n\n```bash\ncurl --location <URL>\n```\n\nThis tells `curl` to follow any HTTP location headers it receives and perform a new request using the given URL. The process can continue until the final

## Comparison with the unoptimised models

Let's compare the four options we have already:
* The vanilla Mistral model that probably already has enough knowledge about curl
* The single hop DSPy module we created initially
* The multihop module we created in order to give the model the ability to do more reasoning
* The optimised multihop module where we let a DSPy optimiser train and find the best parameters

We will let all models answer the same questions and compare their results.
We will also try increasingly complicated questions to see whether the simpler models start to fail and the more fine-tuned ones start to shine.
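
One way to organise such a comparison is to collect every answer into a single structure keyed by question and model, so the answers can be read side by side. The sketch below uses stub callables. `compare_models` and the stubs are hypothetical, and in this notebook the callables would wrap `mistral_ollama` and the `forward` methods of the DSPy modules.

```python
from typing import Callable


def compare_models(
    models: dict[str, Callable[[str], str]],
    questions: list[str],
) -> dict[str, dict[str, str]]:
    """Return {question: {model_name: answer}} for side-by-side reading."""
    results: dict[str, dict[str, str]] = {}
    for question in questions:
        results[question] = {
            name: answer(question) for name, answer in models.items()
        }
    return results


# Hypothetical stand-ins for the real models.
stub_models = {
    "vanilla": lambda q: f"vanilla answer to: {q}",
    "single-hop": lambda q: f"single-hop answer to: {q}",
}

results = compare_models(stub_models, ["How do I set a header?"])
for question, answers in results.items():
    print(question)
    for name, text in answers.items():
        print(f"  {name}: {text}")
```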

In [26]:
simple_question = 'How to I specify the HTTP method to use for a curl request?'
intermediate_question = 'I need a curl command that does a multipart upload of a file and specifies the media type as image/png. How can I do that?'
advanced_question = 'I need a script that uses curl in order to follow an OAuth flow in order to use turn an autorization code into an access token and refresh token. Can you write one for me?'
all_questions = [simple_question, intermediate_question, advanced_question]

In [25]:
def print_prediction(prediction: dspy.Prediction): 
    print(prediction.answer)

In [32]:
# Vanilla Ollama
for question in all_questions:
    response = mistral_ollama(question)
    print(response[0])
    print('--------------')

 To specify the HTTP method (such as GET, POST, PUT, DELETE, etc.) for a cURL request, you need to use the `-X` or `--request` option followed by the desired HTTP method. Here's an example using cURL in the command line:

```bash
curl -X GET <URL> \
  --header "Authorization: Token <your_token>" \
  --header 'Content-Type: application/json'
```

Replace `<URL>` with the target URL, and `<your_token>` with your valid authorization token or other required headers. In this example, we use the GET method.

For a POST request, you can set the data to be sent using the `--data` option:

```bash
curl -X POST <URL> \
  --header "Content-Type: application/json" \
  --data '{"key1":"value1","key2":"value2"}'
```

Replace `<URL>` with the target URL, and set the data as required. In this example, we use the POST method and send JSON data in the request body.

For other HTTP methods like PUT or DELETE, you can follow a similar pattern:

```bash
# For PUT
curl -X PUT <URL> \
  --header "Content-Typ

In [33]:
# DSPy - single hop
pipeline = DocumentFAQ()
for question in all_questions:
    prediction = pipeline.forward(question)
    print_prediction(prediction)
    print('--------------')

To specify the HTTP method for a curl request, you can use the `-X` or `--request` option followed by the desired HTTP method (such as GET, POST, PUT, DELETE, etc.). Here's an example using the `curl` command with a custom HTTP method called `HEAD_WITH_DATA`:
```bash
curl --url "https://example.com" \
  --X HEAD_WITH_DATA \
  --header 'My-Custom-Header: Value' \
  --data 'Data to send with the HEAD request'
```
In this example, we are using `curl` command to make a custom HTTP method called `HEAD_WITH_DATA`. This request combines the functionality of both `HEAD` and `POST` methods. The `--X` option is used to specify the custom HTTP method name, while the `--header` and `--data` options are used to send custom headers and data with the request respectively.

You can replace `HEAD_WITH_DATA` with any valid HTTP method like `GET`, `POST`, `PUT`, or `DELETE`. For example:
```bash
curl --url "https://example.com" \
  --X POST \
  --header 'My-Custom-Header: Value' \
  --data 'Data to send 

In [34]:
# Multihop
pipeline = MultihopFAQ()
for question in all_questions:
    prediction = pipeline.forward(question)
    print_prediction(prediction)
    print('--------------')

To specify the HTTP method for a curl request, you can use the `-X` or `--request` option followed by the desired HTTP method. Here are some examples using common HTTP methods:

1. GET request:
```bash
curl https://example.com/
```
This is the default HTTP method when no method is specified.

2. POST request with data:
```bash
curl -X POST \
  -H "Content-Type: application/json" \
  -d '{"key1":"value1","key2":"value2"}' \
  https://example.com/api
```
3. DELETE request:
```bash
curl -X DELETE \
  https://example.com/resource
```
4. PUT request with data:
```bash
curl -X PUT \
  -H "Content-Type: application/json" \
  -d '{"key1":"value1","key2":"value2"}' \
  https://example.com/resource
```
These examples demonstrate how to use different HTTP methods with curl.
--------------
To create a `curl` command that does a multipart upload of a file with the specified media type `image/png`, use the following command:

```bash
curl -X POST \
  -H "Content-Type: multipart/form-data" \
  -F "fi

In [36]:
# DSPy - optimised and loaded
pipeline = loaded_pipeline
for question in all_questions:
    prediction = pipeline.forward(question)
    print_prediction(prediction)
    print('--------------')

To specify the HTTP method for a curl request, you can use the `-X` or `--request` option followed by the desired HTTP method. Here are some examples using common HTTP methods:

1. GET request:
```bash
curl https://example.com/
```
This is the default HTTP method when no method is specified.

2. POST request with data:
```bash
curl -X POST \
  -H "Content-Type: application/json" \
  -d '{"key1":"value1","key2":"value2"}' \
  https://example.com/api
```
3. DELETE request:
```bash
curl -X DELETE \
  https://example.com/resource
```
4. PUT request with data:
```bash
curl -X PUT \
  -H "Content-Type: application/json" \
  -d '{"key1":"value1","key2":"value2"}' \
  https://example.com/resource
```
These examples demonstrate how to use different HTTP methods with curl.
--------------
To create a `curl` command that does a multipart upload of a file with the specified media type `image/png`, use the following command:

```bash
curl -X POST \
  -H "Content-Type: multipart/form-data" \
  -F "fi

## Test results

All of the approaches above produce quite good answers to all three questions. There are minor differences in style, but the contents of the answers are similar.

One weakness of this test is that curl is a fairly popular command that is widely discussed online. This leads to fairly good results even without providing additional context in the prompt. A more representative comparison between the models would use a knowledge base domain that is not available to the general public and hence is not something the base model was trained on.

We also see minor differences between the answers, such as how they handle the requirement that the multipart upload should be specified as PNG in the intermediate question. For the advanced question, some modules provided a more general answer, whereas others gave a concrete example using GitHub's OAuth scheme. In the latter case, the single-hop DSPy module (arguably) fell a little short: its answer was tied to that particular GitHub scheme, which was not mentioned in the question and might be considered a hallucination. The more advanced multi-hop module at least mentioned that it was using GitHub purely as an example and that a generic OAuth scheme would be similar.

The metric function used didn't seem to provide an adequate measurement of the quality of the results (at least not with the current questions), which resulted in poor optimisation of the pipeline. The optimiser logs report a score of 0.0 for every candidate program, and there is no improvement between the unoptimised multi-hop pipeline and the optimised one.
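
The `metric` module itself is not shown in this notebook, so the following is only a sketch of one possible softer alternative: a token-overlap F1 score, which gives partial credit when the predicted answer shares words and flags with the reference answer. The names `token_f1` and `overlap_metric` and the `threshold` parameter are made up. The `(example, pred, trace=None)` signature follows the usual DSPy metric convention, and returning a boolean when `trace` is set is an assumption about how bootstrapping is best served.

```python
import re
from collections import Counter
from types import SimpleNamespace


def tokenize(text: str) -> list[str]:
    """Lowercase and split into tokens; a leading dash is kept so flags stay distinct."""
    return re.findall(r"-{0,2}[a-z0-9]+", text.lower())


def token_f1(expected: str, predicted: str) -> float:
    """Harmonic mean of token precision and recall between two answers."""
    expected_tokens = Counter(tokenize(expected))
    predicted_tokens = Counter(tokenize(predicted))
    overlap = sum((expected_tokens & predicted_tokens).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(predicted_tokens.values())
    recall = overlap / sum(expected_tokens.values())
    return 2 * precision * recall / (precision + recall)


def overlap_metric(example, pred, trace=None, threshold=0.5):
    """Graded score for evaluation; pass/fail when a trace is supplied."""
    score = token_f1(example.answer, pred.answer)
    if trace is not None:
        return score >= threshold
    return score


# Stand-in objects with an `answer` attribute, instead of real DSPy objects.
reference = SimpleNamespace(answer="Use `curl -L [URL]` to follow redirects.")
partial = SimpleNamespace(answer="Pass -L to curl.")
unrelated = SimpleNamespace(answer="Reboot the machine")

print(overlap_metric(reference, reference))
print(overlap_metric(reference, partial))
print(overlap_metric(reference, unrelated))
```

A graded score like this would at least separate partially correct answers from unrelated ones, although it still cannot judge whether a command is actually valid.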

### Conclusion

All DSPy modules showed very good results in answering curl-related questions. There were some nuances and differences in the results, but ultimately all answers were good enough.
However, it seems this test was overly simplistic, as it failed to show bad results from any module, even using vanilla Ollama. For more relevant and decisive results, a more complicated and less publicly available knowledge domain is needed:
* More complicated questions, possibly with multiple instructions to follow
* A domain that doesn't have a lot of public knowledge available on the internet (so base LLMs lack knowledge of it)