A typed DSPy Signature generator that uses a DSPy program to help you create DSPy programs.

In [1]:
# imports
import dspy
import os
import dotenv
import requests
import pydantic

You will need to put `OPENAI_API_KEY=sk-...` (with your own key) in a `.env` file in this folder. Don't worry, `.gitignore` includes `.env`.

In [6]:
dotenv.load_dotenv()
assert 'OPENAI_API_KEY' in os.environ
llm = dspy.OpenAI(model='gpt-4o', max_tokens=4096, temperature=0.1)
dspy.settings.configure(lm=llm)

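Read the docs! The cell below fetches the DSPy Typed Predictors documentation page and displays it; its text becomes the `context` for our generator.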

In [7]:
from IPython.display import Markdown, display

url = "https://raw.githubusercontent.com/stanfordnlp/dspy/main/docs/docs/building-blocks/8-typed_predictors.md"
response = requests.get(url)
response.raise_for_status()  # fail loudly if the docs page could not be fetched
context = response.text

display(Markdown(context))

# Typed Predictors

In DSPy Signatures, we have `InputField` and `OutputField` that define the nature of the inputs and outputs of the field. However, the inputs and outputs of these fields are always `str`-typed, which requires manual string processing on both ends.

Pydantic `BaseModel` is a great way to enforce type constraints on fields, but it is not directly compatible with `dspy.Signature`. Typed Predictors resolve this by enforcing the type constraints on the inputs and outputs of the fields in a `dspy.Signature`.

## Executing Typed Predictors

Using Typed Predictors is not very different from using any other module: you add type hints to the signature attributes and use a special Predictor module instead of `dspy.Predict`. Let's look at a simple example.

### Defining Input and Output Models

Let's take a simple task as an example: given a `context` and a `query`, the LLM should return an `answer` and a `confidence` score. Let's define our `Input` and `Output` models with Pydantic.

```python
from pydantic import BaseModel, Field

class Input(BaseModel):
    context: str = Field(description="The context for the question")
    query: str = Field(description="The question to be answered")

class Output(BaseModel):
    answer: str = Field(description="The answer for the question")
    confidence: float = Field(ge=0, le=1, description="The confidence score for the answer")
```

As you can see, each attribute is described with `Field(description=...)`, and the `ge=0, le=1` bounds restrict `confidence` to the range 0 to 1. Next, we define a Signature that takes in the input and returns the output.
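To make the `ge=0, le=1` bounds concrete, here is a stdlib-only sketch of the same range check using a dataclass. `CheckedOutput` is a hypothetical stand-in for the Pydantic `Output` model; Pydantic performs this validation for you, so this only illustrates what gets rejected, for example a confidence reported on a 0-10 scale.

```python
from dataclasses import dataclass


@dataclass
class CheckedOutput:
    """Hypothetical stdlib stand-in for the Pydantic `Output` model."""
    answer: str
    confidence: float

    def __post_init__(self) -> None:
        # Mirror Field(ge=0, le=1): reject scores outside the closed interval [0, 1]
        if not 0 <= self.confidence <= 1:
            raise ValueError(f"confidence must be in [0, 1], got {self.confidence}")


ok = CheckedOutput(answer="the lazy dog", confidence=0.9)

try:
    CheckedOutput(answer="the lazy dog", confidence=8)  # a 0-10 style score
except ValueError as err:
    print(err)  # confidence must be in [0, 1], got 8
```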

### Creating Typed Predictor

A Typed Predictor needs a Typed Signature, which extends `dspy.Signature` by annotating each field with a type.

```python
class QASignature(dspy.Signature):
    """Answer the question based on the context and query provided, and on a scale of 0 to 1 tell how confident you are about the answer."""

    input: Input = dspy.InputField()
    output: Output = dspy.OutputField()
```

Now that we have the `QASignature`, let's define a Typed Predictor that executes this Signature while conforming to the type constraints.

```python
predictor = dspy.TypedPredictor(QASignature)
```

Similar to other modules, we pass the `QASignature` to `dspy.TypedPredictor`, which enforces the type constraints.

And similarly to `dspy.Predict`, we can also use a "string signature", written as:
```python
predictor = dspy.TypedPredictor("input:Input -> output:Output")
```
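The string form packs the same information as the class: input fields on the left of `->`, output fields on the right, each optionally annotated as `name:Type`. As a rough illustration, here is a toy parser for that layout. `parse_string_signature` is a hypothetical helper, not DSPy's parser; it keeps type names as plain strings and defaults untyped fields to `str`.

```python
def parse_string_signature(spec: str) -> tuple[dict[str, str], dict[str, str]]:
    """Split 'a:TypeA, b -> c:TypeC' into input and output {name: type} dicts."""
    inputs_part, outputs_part = spec.split("->")

    def parse_fields(part: str) -> dict[str, str]:
        fields = {}
        for item in part.split(","):
            item = item.strip()
            if not item:
                continue
            # "name:Type" -> ("name", "Type"); a bare "name" gets the default type
            name, _, type_name = item.partition(":")
            fields[name.strip()] = type_name.strip() or "str"
        return fields

    return parse_fields(inputs_part), parse_fields(outputs_part)


inputs, outputs = parse_string_signature("input:Input -> output:Output")
print(inputs)   # {'input': 'Input'}
print(outputs)  # {'output': 'Output'}
```

Splitting on commas is deliberately naive: a generic type such as `dict[str, int]` would be split apart, so a real parser needs to respect brackets.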

### I/O in Typed Predictors

Now let's test out the Typed Predictor by providing some sample input and verifying the output type. We create an `Input` instance and pass it to the predictor, which returns a `dspy.Prediction` whose `output` attribute is an `Output` instance.

```python
doc_query_pair = Input(
    context="The quick brown fox jumps over the lazy dog",
    query="What does the fox jump over?",
)

prediction = predictor(input=doc_query_pair)
```

Let's see the output and its type.

```python
answer = prediction.output.answer
confidence_score = prediction.output.confidence

print(f"Prediction: {prediction}\n\n")
print(f"Answer: {answer}, Answer Type: {type(answer)}")
print(f"Confidence Score: {confidence_score}, Confidence Score Type: {type(confidence_score)}")
```

## Typed Chain of Thought with `dspy.TypedChainOfThought`

Just as `TypedPredictor` is the typed counterpart of `dspy.Predict`, `TypedChainOfThought` is the typed counterpart of `dspy.ChainOfThought`:

```python
cot_predictor = dspy.TypedChainOfThought(QASignature)

doc_query_pair = Input(
    context="The quick brown fox jumps over the lazy dog",
    query="What does the fox jump over?",
)

prediction = cot_predictor(input=doc_query_pair)
```

## Typed Predictors as Decorators

While `dspy.TypedPredictor` and `dspy.TypedChainOfThought` provide a convenient way to use typed predictors, you can also use them as decorators to enforce type constraints on the inputs and outputs of a function. The decorator builds the Signature from the function's arguments, return type annotation, and docstring.

```python
@dspy.predictor
def answer(doc_query_pair: Input) -> Output:
    """Answer the question based on the context and query provided, and on the scale of 0-1 tell how confident you are about the answer."""
    pass

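@dspy.cot
def answer_cot(doc_query_pair: Input) -> Output:
    """Answer the question based on the context and query provided, and on the scale of 0-1 tell how confident you are about the answer."""
    pass

# Distinct names, so the chain-of-thought version does not shadow the plain predictor
prediction = answer(doc_query_pair=doc_query_pair)
cot_prediction = answer_cot(doc_query_pair=doc_query_pair)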
```
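A decorator like this has everything it needs on the function object itself. The sketch below shows how the standard `inspect` and `typing` modules expose the parameters, return annotation, and docstring. `describe_function` is a hypothetical helper, not DSPy's internals, and `dict`/`str` stand in for the Pydantic models so it runs without DSPy.

```python
import inspect
from typing import get_type_hints


def describe_function(func) -> dict:
    """Collect what a signature-building decorator could read from a function."""
    hints = get_type_hints(func)
    params = inspect.signature(func).parameters
    inputs = {
        name: hints.get(name, str)  # unannotated parameters fall back to str
        for name in params
        if name != "self"  # skip the bound-method receiver
    }
    return {
        "instructions": inspect.getdoc(func),  # docstring becomes the instructions
        "inputs": inputs,
        "output": hints.get("return", str),    # return annotation becomes the output type
    }


def answer(doc_query_pair: dict) -> str:
    """Answer the question based on the context and query provided."""


info = describe_function(answer)
print(info["instructions"])  # Answer the question based on the context and query provided.
print(info["inputs"])        # {'doc_query_pair': <class 'dict'>}
print(info["output"])        # <class 'str'>
```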

## Composing Functional Typed Predictors in `dspy.Module`

If you're creating DSPy pipelines via `dspy.Module`, you can use functional typed predictors by subclassing `FunctionalModule` and decorating its methods with `@cot` or `@predictor`. Here is an example of using functional typed predictors to create a `SimplifiedBaleen` pipeline:

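```python
import dspy
from dsp.utils import deduplicate  # order-preserving dedupe used in DSPy's examples
from dspy.functional import FunctionalModule, cot

class SimplifiedBaleen(FunctionalModule):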
    def __init__(self, passages_per_hop=3, max_hops=1):
        super().__init__()
        self.retrieve = dspy.Retrieve(k=passages_per_hop)
        self.max_hops = max_hops

    @cot
    def generate_query(self, context: list[str], question) -> str:
        """Write a simple search query that will help answer a complex question."""
        pass

    @cot
    def generate_answer(self, context: list[str], question) -> str:
        """Answer questions with short factoid answers."""
        pass

    def forward(self, question):
        context = []

        for _ in range(self.max_hops):
            query = self.generate_query(context=context, question=question)
            passages = self.retrieve(query).passages
            context = deduplicate(context + passages)

        answer = self.generate_answer(context=context, question=question)
        return dspy.Prediction(context=context, answer=answer)
```
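The pipeline relies on a `deduplicate` helper to merge newly retrieved passages into the running context without repeats while keeping their order. If you'd rather not depend on the one in `dsp.utils`, here is a minimal sketch, assuming passages are plain strings:

```python
def deduplicate(passages: list[str]) -> list[str]:
    """Drop repeated passages, keeping each one at its first-seen position."""
    # dicts preserve insertion order (Python 3.7+), so fromkeys keeps first-seen order
    return list(dict.fromkeys(passages))


context = ["Paris is the capital of France.", "The Seine flows through Paris."]
new_passages = ["The Seine flows through Paris.", "France is in Europe."]
print(deduplicate(context + new_passages))
# ['Paris is the capital of France.', 'The Seine flows through Paris.', 'France is in Europe.']
```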

## Optimizing Typed Predictors

Typed predictors can be optimized on the Signature instructions through the `optimize_signature` optimizer. Here is an example of this optimization on the `QASignature`:

```python
import dspy
from dspy.evaluate import Evaluate
from dspy.evaluate.metrics import answer_exact_match
from dspy.teleprompt.signature_opt_typed import optimize_signature

turbo = dspy.OpenAI(model='gpt-3.5-turbo', max_tokens=4000)
gpt4 = dspy.OpenAI(model='gpt-4', max_tokens=4000)
dspy.settings.configure(lm=turbo)

# `devset` is your own list of dspy.Example objects; it is not defined in this snippet.
evaluator = Evaluate(devset=devset, metric=answer_exact_match, num_threads=10, display_progress=True)

result = optimize_signature(
    student=dspy.TypedPredictor(QASignature),
    evaluator=evaluator,
    initial_prompts=6,
    n_iterations=100,
    max_examples=30,
    verbose=True,
    prompt_model=gpt4,
)
```
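The evaluator's metric decides what counts as a correct answer. As a hedged illustration, here is a SQuAD-style normalized exact match (lowercase, drop punctuation and the articles a/an/the, collapse whitespace), which is roughly the kind of comparison `answer_exact_match` performs. `normalize_answer` and `exact_match` are hypothetical names and this is not DSPy's implementation; details may differ.

```python
import re
import string

# Translation table that deletes every ASCII punctuation character
_PUNCT_TABLE = str.maketrans("", "", string.punctuation)


def normalize_answer(text: str) -> str:
    """Lowercase, strip punctuation and articles, and collapse whitespace."""
    text = text.lower().translate(_PUNCT_TABLE)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(prediction: str, gold_answers: list[str]) -> bool:
    """True if the normalized prediction equals any normalized gold answer."""
    normalized = normalize_answer(prediction)
    return any(normalized == normalize_answer(gold) for gold in gold_answers)


print(exact_match("The lazy dog.", ["lazy dog"]))  # True
print(exact_match("a fox", ["lazy dog"]))          # False
```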


First, let's prove we can validate Python code. The validator below is adapted from https://github.com/stanfordnlp/dspy/blob/main/examples/functional/functional.ipynb.

In [9]:
# We define a pydantic type that automatically checks if its argument is valid python code.
class PythonCode(pydantic.BaseModel):
    code: str

    @pydantic.field_validator('code')
    @classmethod
    def check_syntax(cls, v):
        try:
            # Attempt to compile the code snippet (this parses it but does not run it)
            compile(v, "<string>", "exec")
        except SyntaxError as e:
            # If a SyntaxError is raised, the code is not syntactically valid
            raise ValueError(f"Code is not syntactically valid: {e}") from e

        return v
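The core of `PythonCode` is the single `compile()` call. Pulled out as a plain function (a hypothetical helper, no Pydantic needed), it is easy to see what passes and what does not:

```python
def is_valid_python(source: str) -> bool:
    """Return True if `source` compiles as a Python module, False on SyntaxError."""
    try:
        # compile() parses and byte-compiles the source; it never executes it
        compile(source, "<string>", "exec")
    except SyntaxError:
        return False
    return True


print(is_valid_python("class TweetSignature:\n    pass"))  # True
print(is_valid_python("def broken(:"))                     # False
print(is_valid_python("undefined_name + 1"))               # True: syntax only
```

`compile()` only checks syntax. A snippet that references undefined names still compiles, which is why it is worth actually executing the generated code as well.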

The context is the DSPy Typed Predictors docs page fetched above. Here are the input and output models:

In [10]:
from dspy.functional import TypedPredictor, TypedChainOfThought
from pydantic import BaseModel, Field

class Input(BaseModel):
    context: str = Field(description="The context for the question.")
    query: str = Field(description="The user's query, to be transformed into the python code for the DSPy.Signature.")

class OutputCode(BaseModel):
    answer: PythonCode = Field(description="The answer for the question must be python code for the DSPy.Signature. Only return the signature.")





The code signature generator is defined here:

In [11]:
class CodeSignatureGenerator(dspy.Signature):
    """Answer the question based on the context and query provided. Use your knowledge of DSPy to generate a python code for the correctly typed DSPy.Signature. Only return the signature."""

    input: Input = dspy.InputField()
    output: OutputCode = dspy.OutputField()

Now we use a `TypedChainOfThought` to generate the code signature. Change `query` to whatever a user who wants a `dspy.Signature` would ask for.

In [12]:
cot_predictor = dspy.TypedChainOfThought(CodeSignatureGenerator, max_retries=3)

doc_query_pair = Input(
    context=context,
    query="input is a user query, output is a witty tweet based on the user's query.",
)

prediction = cot_predictor(input=doc_query_pair)
#display(prediction)
display(prediction.output.answer.code)

'from pydantic import BaseModel, Field\nimport dspy\n\nclass Input(BaseModel):\n    query: str = Field(description="The user\'s query")\n\nclass Output(BaseModel):\n    tweet: str = Field(description="A witty tweet based on the user\'s query")\n\nclass TweetSignature(dspy.Signature):\n    input: Input = dspy.InputField()\n    output: Output = dspy.OutputField()'
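The `max_retries=3` argument above controls a validate-and-retry loop: when the LM's output fails to parse or validate, the error is fed back into the prompt and the LM tries again, and once the retries are used up the predictor raises an error. The sketch below shows that pattern with the stdlib and a scripted fake model; `call_with_retries` and `parse_answer` are hypothetical names, and this is a simplification, not DSPy's implementation.

```python
import json


def call_with_retries(ask_model, validate, max_retries: int = 3):
    """Ask, validate, and re-ask with the error message until validation passes.

    `ask_model(feedback)` returns raw text; `validate(text)` returns a parsed
    value or raises ValueError. Both are caller-supplied placeholders.
    """
    feedback = None
    for _ in range(max_retries):
        raw = ask_model(feedback)
        try:
            return validate(raw)
        except ValueError as err:
            # Pass the validation error to the next attempt so it can self-correct
            feedback = str(err)
    raise ValueError(f"Too many retries; last error: {feedback}")


def parse_answer(raw: str) -> dict:
    data = json.loads(raw)  # json.JSONDecodeError is a subclass of ValueError
    if "answer" not in data:
        raise ValueError("missing field: answer")
    return data


# Scripted fake model: the first reply is missing the field, the second is valid
replies = iter(['{"tweet": "hi"}', '{"answer": "hi"}'])
result = call_with_retries(lambda feedback: next(replies), parse_answer)
print(result)  # {'answer': 'hi'}
```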

We can execute the generated code to prove to ourselves that it runs.

In [13]:
try:
    exec(prediction.output.answer.code)
except Exception as e:
    print(f"Error during execution: {e}")



Cool! 

But what if we want the program to return a valid `dspy.Signature` directly? (This isn't working for me... what am I missing?)

In [14]:
# We define a pydantic type that automatically checks if its argument is a valid dspy.Signature.
class ValidDSPySignature(pydantic.BaseModel):
    signature: str

    @pydantic.field_validator('signature')
    def check_signature(cls, v):
        try:
            # Attempt to validate the signature
            if not isinstance(eval(v), dspy.Signature):
                raise ValueError("Signature is not a valid dspy.Signature")
        except Exception as e:
            # If an error is raised, the signature is not valid
            raise ValueError(f"Signature is not valid: {e}")
            
        return v

In [15]:
class OutputSignature(BaseModel):
    answer: ValidDSPySignature = Field(description="The answer for the question must be python code for the DSPy.Signature. Only return the signature.")

In [16]:
class SignatureSignatureGenerator(dspy.Signature):
    """Answer the question based on the context and query provided. Use your knowledge of DSPy to generate a python code for the correctly typed DSPy.Signature. Only return the signature."""

    input: Input = dspy.InputField()
    output: OutputSignature = dspy.OutputField()

Now try to get the program to give us a `dspy.Signature`... but something about the validation is off.

In [18]:
cot_predictor = dspy.TypedChainOfThought(SignatureSignatureGenerator, max_retries=3)

doc_query_pair = Input(
    context=context,
    query="input is a user query, output is a witty tweet based on the user's query...",
)

prediction = cot_predictor(input=doc_query_pair)
#display(prediction)
display(prediction.output.answer.signature)

ValueError: ('Too many retries trying to get the correct output format. Try simplifying the requirements.', {'output': "Value error, Signature is not valid: name 'Signature' is not defined: answer, signature (error type: value_error)"})
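Two things in `check_signature` look likely to cause this. First, `eval()` only evaluates a single expression, so a string containing a `class ...:` definition raises `SyntaxError`; running definitions needs `exec()`. Second, a signature definition is a class, so the check should be `issubclass(..., dspy.Signature)` rather than `isinstance`. The `name 'Signature' is not defined` message also suggests the evaluated string referred to names that were not available in `eval`'s namespace. Here is a sketch of an alternative check under those assumptions; `find_signature_classes` is a hypothetical helper, and a stand-in `Signature` class replaces `dspy.Signature` so it runs without DSPy.

```python
from typing import Optional


def find_signature_classes(code: str, base: type, namespace: Optional[dict] = None) -> list:
    """Run `code` and return every new class it defines that subclasses `base`."""
    ns = dict(namespace or {})
    before = set(ns)
    # exec() (not eval()) because class definitions are statements
    exec(code, ns)
    return [
        obj
        for name, obj in ns.items()
        if name not in before
        and isinstance(obj, type)      # only classes
        and issubclass(obj, base)      # subclass check, not isinstance
        and obj is not base
    ]


# Stand-in for dspy.Signature so this sketch runs without DSPy installed
class Signature:
    pass


generated = "class TweetSignature(Signature):\n    '''Write a witty tweet.'''\n"
found = find_signature_classes(generated, Signature, {"Signature": Signature})
print([cls.__name__ for cls in found])  # ['TweetSignature']

try:
    eval(generated)
except SyntaxError:
    print("eval() cannot evaluate a class definition")
```

Inside the validator you would call something like `find_signature_classes(v, dspy.Signature)` and raise `ValueError` when the list is empty. The generated code shown earlier starts with `import dspy`, so it does not need `dspy` pre-loaded in the namespace.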

In [19]:
llm.inspect_history(n=4)





Make a very succinct json object that validates with the following schema

---

Follow the following format.

Json Schema: ${json_schema}
Json Object: ${json_object}

---

Json Schema: {"$defs": {"ValidDSPySignature": {"properties": {"signature": {"title": "Signature", "type": "string"}}, "required": ["signature"], "title": "ValidDSPySignature", "type": "object"}}, "properties": {"answer": {"allOf": [{"$ref": "#/$defs/ValidDSPySignature"}], "description": "The answer for the question must be python code for the DSPy.Signature. Only return the signature."}}, "required": ["answer"], "title": "OutputSignature", "type": "object"}
Json Object: ```json
{
  "answer": {
    "signature": "def function_name(param1: Type1, param2: Type2) -> ReturnType:"
  }
}
```







Answer the question based on the context and query provided. Use your knowledge of DSPy to generate a python code for the correctly typed DSPy.Signature. Only return the signature.

---

Follow the following format.

