update docs
jxnl committed Nov 20, 2023
1 parent e9ce94f commit df8cdea
Showing 11 changed files with 362 additions and 13 deletions.
3 changes: 3 additions & 0 deletions docs/concepts/alias.md
@@ -0,0 +1,3 @@
!!! warning "This page is a work in progress"

This page is a work in progress. Check out [Pydantic's documentation](https://docs.pydantic.dev/latest/concepts/alias/)
25 changes: 25 additions & 0 deletions docs/concepts/enums.md
@@ -0,0 +1,25 @@
To prevent data misalignment, we can use Enums for standardized fields. Always include an "Other" option as a fallback so the model can signal uncertainty.

```python hl_lines="9 14"
from enum import Enum

from pydantic import BaseModel, Field

class Role(Enum):
    PRINCIPAL = "PRINCIPAL"
    TEACHER = "TEACHER"
    STUDENT = "STUDENT"
    OTHER = "OTHER"

class UserDetail(BaseModel):
    age: int
    name: str
    role: Role = Field(description="Correctly assign one of the predefined roles to the user.")
```

If you're having a hard time with `Enum`, an alternative is to use `Literal` instead.

```python hl_lines="8"
from typing import Literal

from pydantic import BaseModel

class UserDetail(BaseModel):
    age: int
    name: str
    role: Literal["PRINCIPAL", "TEACHER", "STUDENT", "OTHER"]
```
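
Whichever approach you choose, values outside the allowed set are rejected at validation time. As a quick sketch (the `"JANITOR"` role is just an illustrative invalid input), Pydantic raises a `ValidationError` that explains which values are permitted:

```python
from pydantic import ValidationError

try:
    UserDetail(age=30, name="Alice", role="JANITOR")
except ValidationError as e:
    # the error message lists the allowed roles, e.g.
    # "Input should be 'PRINCIPAL', 'TEACHER', 'STUDENT' or 'OTHER'"
    print(e)
```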
160 changes: 160 additions & 0 deletions docs/concepts/fields.md
@@ -0,0 +1,160 @@
The `pydantic.Field` function is used to customize and add metadata to fields of models. To learn more, check out the Pydantic [documentation](https://docs.pydantic.dev/latest/concepts/fields/), as this page is a near replica of the parts of that documentation that are relevant to prompting.

## Default values

The `default` parameter is used to define a default value for a field.

```py
from pydantic import BaseModel, Field


class User(BaseModel):
name: str = Field(default='John Doe')


user = User()
print(user)
#> name='John Doe'
```

You can also use `default_factory` to define a callable that will be called to generate a default value.

```py
from uuid import uuid4

from pydantic import BaseModel, Field


class User(BaseModel):
id: str = Field(default_factory=lambda: uuid4().hex)
```

!!! info

The `default` and `default_factory` parameters are mutually exclusive.

!!! note

    Using `typing.Optional` does not by itself give the field a default value of `None`; you must set `default` or `default_factory` to define one. Only then is the field treated as not required when sent to the language model.
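
For example, here is a minimal sketch contrasting an `Optional` field without a default (still required) with one that has a default (dropped from the schema's `required` list):

```py
from typing import Optional

from pydantic import BaseModel


class WithoutDefault(BaseModel):
    email: Optional[str]


class WithDefault(BaseModel):
    email: Optional[str] = None


print(WithoutDefault.model_json_schema()["required"])
#> ['email']
print("required" in WithDefault.model_json_schema())
#> False
```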

## Using `Annotated`

The `Field` function can also be used together with `Annotated`.

```py
from uuid import uuid4

from typing_extensions import Annotated

from pydantic import BaseModel, Field


class User(BaseModel):
id: Annotated[str, Field(default_factory=lambda: uuid4().hex)]
```

## Exclude

The `exclude` parameter can be used to control which fields should be excluded from the
model when exporting it. This is helpful when you want to drop fields that only exist to aid
generation, such as `scratch_pad` or `chain_of_thought`, from the exported data.

See the following example:

```py
from datetime import date

from pydantic import BaseModel, Field


class DateRange(BaseModel):
    chain_of_thought: str = Field(
        description="Reasoning behind the date range.",
        exclude=True,
    )
    start_date: date
    end_date: date


date_range = DateRange(
    chain_of_thought="""
    I want to find the date range for the last 30 days.
    Today is 2021-01-30 therefore the start date
    should be 2021-01-01 and the end date is 2021-01-30""",
    start_date=date(2021, 1, 1),
    end_date=date(2021, 1, 30),
)
print(date_range.model_dump_json())
#> {"start_date":"2021-01-01","end_date":"2021-01-30"}
```
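
Note that `exclude=True` only affects serialization. The field still appears in the generated JSON schema, so the language model is still asked to produce it; it is simply dropped when you export the result. A quick check:

```py
print("chain_of_thought" in DateRange.model_json_schema()["properties"])
#> True
```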

## Customizing JSON Schema

Some parameters are used exclusively to customise the generated JSON Schema:

- `title`: The title of the field.
- `description`: The description of the field.
- `examples`: The examples of the field.
- `json_schema_extra`: Extra JSON Schema properties to be added to the field.

These are all great opportunities to add more information to the JSON Schema as part
of your prompt engineering.

Here's an example:

```py
from pydantic import BaseModel, EmailStr, Field, SecretStr


class User(BaseModel):
age: int = Field(description='Age of the user')
email: EmailStr = Field(examples=['marcelo@mail.com'])
name: str = Field(title='Username')
password: SecretStr = Field(
json_schema_extra={
'title': 'Password',
'description': 'Password of the user',
'examples': ['123456'],
}
)


print(User.model_json_schema())
"""
{
'properties': {
'age': {
'description': 'Age of the user',
'title': 'Age',
'type': 'integer',
},
'email': {
'examples': ['marcelo@mail.com'],
'format': 'email',
'title': 'Email',
'type': 'string',
},
'name': {'title': 'Username', 'type': 'string'},
'password': {
'description': 'Password of the user',
'examples': ['123456'],
'format': 'password',
'title': 'Password',
'type': 'string',
'writeOnly': True,
},
},
'required': ['age', 'email', 'name', 'password'],
'title': 'User',
'type': 'object',
}
"""
```

## General notes on JSON schema generation

- The JSON schema for Optional fields indicates that the value null is allowed.
- The Decimal type is exposed in JSON schema (and serialized) as a string.
- The JSON schema does not preserve namedtuples as namedtuples.
- When they differ, you can specify whether you want the JSON schema to represent the inputs to validation or the outputs from serialization.
- Sub-models used are added to the `$defs` JSON attribute and referenced, as per the spec.
- Sub-models with modifications (via the Field class) like a custom title, description, or default value, are recursively included instead of referenced.
- The description for models is taken from either the docstring of the class or the argument description to the Field class.
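
To illustrate a couple of these points, here is a small sketch with an optional sub-model: the nested model is added to `$defs` and referenced, and the optional field's schema allows `null`:

```py
from typing import Optional

from pydantic import BaseModel


class Address(BaseModel):
    city: str


class User(BaseModel):
    name: str
    address: Optional[Address] = None


schema = User.model_json_schema()
print(list(schema["$defs"]))
#> ['Address']
print(schema["properties"]["address"])
#> {'anyOf': [{'$ref': '#/$defs/Address'}, {'type': 'null'}], 'default': None}
```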
File renamed without changes.
14 changes: 4 additions & 10 deletions docs/concepts/maybe.md
@@ -1,12 +1,6 @@
# Handling Missing Data with `Maybe`
# Handling Missing Data

In this post, we will demonstrate how to use the `Maybe` pattern to manage missing data and employ pattern matching to handle errors in a structured manner.

## What is `Maybe`?

The `Maybe` pattern is a concept in functional programming used for error handling. Instead of raising exceptions or returning `None`, you can use a `Maybe` type to encapsulate both the result and potential errors. This pattern is particularly useful when making OpenAI API calls, as providing language models with an escape mechanism effectively reduces hallucinations. Consequently, we can construct a prompt that closely resembles regular programming.

Towards the end, we will demonstrate how to use `Maybe` instances in pattern matching, which offers an excellent approach for handling errors in a structured manner.
The `Maybe` pattern is a concept in functional programming used for error handling. Instead of raising exceptions or returning `None`, you can use a `Maybe` type to encapsulate both the result and potential errors. This pattern is particularly useful when making LLM calls, as providing language models with an escape hatch can effectively reduce hallucinations.

## Defining the Model

@@ -76,7 +70,7 @@ user2 = extract("Unknown user")

As you can see, when the data is extracted successfully, the `result` field contains the `UserDetail` instance. When an error occurs, the `error` field is set to `True`, and the `message` field contains the error message.

## Handle the result
## Handling the result

There are a few ways we can handle the result. Normally, we can just access the individual fields.

@@ -89,7 +83,7 @@ def process_user_detail(maybe_user: MaybeUser):
print(f"Not found: {user1.message}")
```

## Pattern Matching
### Pattern Matching

We can also use pattern matching to handle the result. This is a great way to handle errors in a structured way.

150 changes: 150 additions & 0 deletions docs/concepts/models.md
@@ -0,0 +1,150 @@
# Response Model

Defining LLM output schemas in Pydantic is done via `pydantic.BaseModel`. To learn more about models in Pydantic, check out their [documentation](https://docs.pydantic.dev/latest/concepts/models/).

After defining a Pydantic model, we can use it as the `response_model` in your client `create` calls to OpenAI. The job of the `response_model` is to define the schema and prompts for the language model, validate the response from the API, and return a Pydantic model instance.
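
As a rough end-to-end sketch (the model name and prompt below are placeholders, not prescriptions), the patched client validates the completion against the response model and returns an instance of it:

```python
import instructor
from openai import OpenAI
from pydantic import BaseModel

client = instructor.patch(OpenAI())


class UserDetail(BaseModel):
    name: str
    age: int


user = client.chat.completions.create(
    model="gpt-3.5-turbo",  # placeholder model name
    response_model=UserDetail,
    messages=[{"role": "user", "content": "Extract: Jason is 25 years old"}],
)
print(user)
#> name='Jason' age=25  (actual values depend on the completion)
```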

## Prompting

When defining a response model, we can use docstrings and field annotations to define the prompt that will be used to generate the response.

```python
from pydantic import BaseModel, Field

class User(BaseModel):
"""
This is the prompt that will be used to generate the response.
Any instructions here will be passed to the language model.
"""
name: str = Field(description="The name of the user.")
age: int = Field(description="The age of the user.")
```

Here, the docstring, types, and field annotations are all used to build the schema and prompt that the `create` method of the client sends to the language model to generate the response.
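
You can inspect roughly what the language model will see by printing the model's JSON schema: the class docstring becomes the model-level `description`, and the field descriptions are attached to each property (output lightly abbreviated):

```python
print(User.model_json_schema())
"""
{
    'description': 'This is the prompt that will be used to generate the response. ...',
    'properties': {
        'name': {'description': 'The name of the user.', 'title': 'Name', 'type': 'string'},
        'age': {'description': 'The age of the user.', 'title': 'Age', 'type': 'integer'},
    },
    'required': ['name', 'age'],
    'title': 'User',
    'type': 'object',
}
"""
```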

## Optional Values

If we use `Optional` and `default`, the field will be considered not required when sent to the language model.

```python
from typing import Optional

from pydantic import BaseModel, Field

class User(BaseModel):
    name: str = Field(description="The name of the user.")
    age: int = Field(description="The age of the user.")
    email: Optional[str] = Field(description="The email of the user.", default=None)
```

## Dynamic model creation

There are some occasions where it is desirable to create a model using runtime information to specify the fields. For this, Pydantic provides the `create_model` function, which allows models to be created on the fly:

```python
from pydantic import BaseModel, create_model


class FooModel(BaseModel):
foo: str
bar: int = 123


BarModel = create_model(
'BarModel',
apple=(str, 'russet'),
banana=(str, 'yellow'),
__base__=FooModel,
)
print(BarModel)
#> <class '__main__.BarModel'>
print(BarModel.model_fields.keys())
#> dict_keys(['foo', 'bar', 'apple', 'banana'])
```
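
Because the field tuples above provide defaults, the dynamically created model can be instantiated like any other (a quick sketch):

```python
print(BarModel(foo="hello"))
#> foo='hello' bar=123 apple='russet' banana='yellow'
```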

??? note "When would I use this?"

Consider a situation where the model is dynamically defined, based on some configuration or database. For example, we could have a database table that stores the properties of a model for
some model name or id. We could then query the database for the properties of the model and use that to create the model.

```sql
SELECT property_name, property_type, description
FROM prompt
WHERE model_name = {model_name}
```

We can then use this information to create the model.

```python
types = {
'string': str,
'integer': int,
'boolean': bool,
'number': float,
'List[str]': List[str],
}

BarModel = create_model(
'User',
**{
property_name: (types[property_type], description)
for property_name, property_type, description in cursor.fetchall()
},
__base__=BaseModel,
)
```

This would be useful when different users have different descriptions for the same model. We can use the same model but have different prompts for each user.

## Structural Pattern Matching

Pydantic supports structural pattern matching for models, as introduced by PEP 636 in Python 3.10.

```python
from pydantic import BaseModel


class Pet(BaseModel):
name: str
species: str


a = Pet(name='Bones', species='dog')

match a:
# match `species` to 'dog', declare and initialize `dog_name`
case Pet(species='dog', name=dog_name):
print(f'{dog_name} is a dog')
#> Bones is a dog
# default case
case _:
print('No dog matched')
```

## Adding Behavior

We can add methods to our Pydantic models just as we would to any plain Python class. We might want to do this to add some custom logic to our models.

```python
from pydantic import BaseModel
from typing import Literal

from openai import OpenAI

import instructor

client = instructor.patch(OpenAI())

class SearchQuery(BaseModel):
query: str
query_type: Literal["web", "image", "video"]

def execute(self):
# do some logic here
return results


query = client.chat.completions.create(
..., response_model=SearchQuery
)

results = query.execute()
```

Now we can call `execute` on our model instance after extracting it from a language model. If you want to see more examples of this, check out our post on [RAG is more than embeddings](../blog/posts/rag-and-beyond.md).
2 changes: 2 additions & 0 deletions docs/concepts/prompting.md
@@ -1,3 +1,5 @@
# General Tips for Prompt Engineering

The overarching theme of using Instructor and Pydantic for function calling is to make the models as self-descriptive, modular, and flexible as possible, while maintaining data integrity and ease of use.

- **Modularity**: Design self-contained components for reuse.
3 changes: 3 additions & 0 deletions docs/concepts/typeadapter.md
@@ -0,0 +1,3 @@
!!! warning "This page is a work in progress"

This page is a work in progress. Check out [Pydantic's documentation](https://docs.pydantic.dev/latest/concepts/type_adapter/)
3 changes: 3 additions & 0 deletions docs/concepts/types.md
@@ -0,0 +1,3 @@
!!! warning "This page is a work in progress"

This page is a work in progress. Check out [Pydantic's documentation](https://docs.pydantic.dev/latest/concepts/types/)
3 changes: 3 additions & 0 deletions docs/concepts/union.md
@@ -0,0 +1,3 @@
!!! warning "This page is a work in progress"

This page is a work in progress. Check out [Pydantic's documentation](https://docs.pydantic.dev/latest/concepts/union/)