update docs
jxnl committed Nov 20, 2023
1 parent e9ce94f commit df8cdea
Showing 11 changed files with 362 additions and 13 deletions.
3 changes: 3 additions & 0 deletions docs/concepts/alias.md
@@ -0,0 +1,3 @@
!!! warning "This page is a work in progress"

This page is a work in progress. Check out [Pydantic's documentation](https://docs.pydantic.dev/latest/concepts/alias/)
25 changes: 25 additions & 0 deletions docs/concepts/enums.md
@@ -0,0 +1,25 @@
To prevent data misalignment, we can use Enums for standardized fields. Always include an "Other" option as a fallback so the model can signal uncertainty.

```python hl_lines="9 14"
from enum import Enum

from pydantic import BaseModel, Field

class Role(Enum):
    PRINCIPAL = "PRINCIPAL"
    TEACHER = "TEACHER"
    STUDENT = "STUDENT"
    OTHER = "OTHER"

class UserDetail(BaseModel):
    age: int
    name: str
    role: Role = Field(description="Correctly assign one of the predefined roles to the user.")
```

If you're having a hard time with `Enum`, an alternative is to use `Literal` instead.

```python hl_lines="8"
from typing import Literal

from pydantic import BaseModel

class UserDetail(BaseModel):
    age: int
    name: str
    role: Literal["PRINCIPAL", "TEACHER", "STUDENT", "OTHER"]
```
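
Whichever approach you choose, values outside the allowed set are rejected at validation time. As a quick sketch (the `"JANITOR"` role is just an illustrative invalid input), Pydantic raises a `ValidationError` that explains which values are permitted:

```python
from pydantic import ValidationError

try:
    UserDetail(age=30, name="Alice", role="JANITOR")
except ValidationError as e:
    # the error message lists the allowed roles, e.g.
    # "Input should be 'PRINCIPAL', 'TEACHER', 'STUDENT' or 'OTHER'"
    print(e)
```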
160 changes: 160 additions & 0 deletions docs/concepts/fields.md
@@ -0,0 +1,160 @@
The `pydantic.Field` function is used to customize and add metadata to fields of models. To learn more, check out the Pydantic [documentation](https://docs.pydantic.dev/latest/concepts/fields/), as this page is a near replica of the parts of that documentation that are relevant to prompting.

## Default values

The `default` parameter is used to define a default value for a field.

```py
from pydantic import BaseModel, Field


class User(BaseModel):
name: str = Field(default='John Doe')


user = User()
print(user)
#> name='John Doe'
```

You can also use `default_factory` to define a callable that will be called to generate a default value.

```py
from uuid import uuid4

from pydantic import BaseModel, Field


class User(BaseModel):
id: str = Field(default_factory=lambda: uuid4().hex)
```

!!! info

The `default` and `default_factory` parameters are mutually exclusive.

!!! note

    Using `typing.Optional` does not by itself give the field a default value of `None`; you must set `default` or `default_factory` to define one. Only then is the field treated as not required when sent to the language model.
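
For example, here is a minimal sketch contrasting an `Optional` field without a default (still required) with one that has a default (dropped from the schema's `required` list):

```py
from typing import Optional

from pydantic import BaseModel


class WithoutDefault(BaseModel):
    email: Optional[str]


class WithDefault(BaseModel):
    email: Optional[str] = None


print(WithoutDefault.model_json_schema()["required"])
#> ['email']
print("required" in WithDefault.model_json_schema())
#> False
```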

## Using `Annotated`

The `Field` function can also be used together with `Annotated`.

```py
from uuid import uuid4

from typing_extensions import Annotated

from pydantic import BaseModel, Field


class User(BaseModel):
id: Annotated[str, Field(default_factory=lambda: uuid4().hex)]
```

## Exclude

The `exclude` parameter can be used to control which fields should be excluded from the
model when exporting it. This is helpful when you want to drop fields that only exist to aid
generation, such as `scratch_pad` or `chain_of_thought`, from the exported data.

See the following example:

```py
from datetime import date

from pydantic import BaseModel, Field


class DateRange(BaseModel):
    chain_of_thought: str = Field(
        description="Reasoning behind the date range.",
        exclude=True,
    )
    start_date: date
    end_date: date


date_range = DateRange(
    chain_of_thought="""
    I want to find the date range for the last 30 days.
    Today is 2021-01-30 therefore the start date
    should be 2021-01-01 and the end date is 2021-01-30""",
    start_date=date(2021, 1, 1),
    end_date=date(2021, 1, 30),
)
print(date_range.model_dump_json())
#> {"start_date":"2021-01-01","end_date":"2021-01-30"}
```
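
Note that `exclude=True` only affects serialization. The field still appears in the generated JSON schema, so the language model is still asked to produce it; it is simply dropped when you export the result. A quick check:

```py
print("chain_of_thought" in DateRange.model_json_schema()["properties"])
#> True
```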

## Customizing JSON Schema

Some parameters are used exclusively to customise the generated JSON Schema:

- `title`: The title of the field.
- `description`: The description of the field.
- `examples`: The examples of the field.
- `json_schema_extra`: Extra JSON Schema properties to be added to the field.

These are all great opportunities to add more information to the JSON Schema as part
of your prompt engineering.

Here's an example:

```py
from pydantic import BaseModel, EmailStr, Field, SecretStr


class User(BaseModel):
age: int = Field(description='Age of the user')
email: EmailStr = Field(examples=['marcelo@mail.com'])
name: str = Field(title='Username')
password: SecretStr = Field(
json_schema_extra={
'title': 'Password',
'description': 'Password of the user',
'examples': ['123456'],
}
)


print(User.model_json_schema())
"""
{
'properties': {
'age': {
'description': 'Age of the user',
'title': 'Age',
'type': 'integer',
},
'email': {
'examples': ['marcelo@mail.com'],
'format': 'email',
'title': 'Email',
'type': 'string',
},
'name': {'title': 'Username', 'type': 'string'},
'password': {
'description': 'Password of the user',
'examples': ['123456'],
'format': 'password',
'title': 'Password',
'type': 'string',
'writeOnly': True,
},
},
'required': ['age', 'email', 'name', 'password'],
'title': 'User',
'type': 'object',
}
"""
```

## General notes on JSON schema generation

- The JSON schema for Optional fields indicates that the value null is allowed.
- The Decimal type is exposed in JSON schema (and serialized) as a string.
- The JSON schema does not preserve namedtuples as namedtuples.
- When they differ, you can specify whether you want the JSON schema to represent the inputs to validation or the outputs from serialization.
- Sub-models used are added to the `$defs` JSON attribute and referenced, as per the spec.
- Sub-models with modifications (via the Field class) like a custom title, description, or default value, are recursively included instead of referenced.
- The description for models is taken from either the docstring of the class or the argument description to the Field class.
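
To illustrate a couple of these points, here is a small sketch with an optional sub-model: the nested model is added to `$defs` and referenced, and the optional field's schema allows `null`:

```py
from typing import Optional

from pydantic import BaseModel


class Address(BaseModel):
    city: str


class User(BaseModel):
    name: str
    address: Optional[Address] = None


schema = User.model_json_schema()
print(list(schema["$defs"]))
#> ['Address']
print(schema["properties"]["address"])
#> {'anyOf': [{'$ref': '#/$defs/Address'}, {'type': 'null'}], 'default': None}
```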
File renamed without changes.
14 changes: 4 additions & 10 deletions docs/concepts/maybe.md
@@ -1,12 +1,6 @@
# Handling Missing Data with `Maybe`
# Handling Missing Data

In this post, we will demonstrate how to use the `Maybe` pattern to manage missing data and employ pattern matching to handle errors in a structured manner.

## What is `Maybe`?

The `Maybe` pattern is a concept in functional programming used for error handling. Instead of raising exceptions or returning `None`, you can use a `Maybe` type to encapsulate both the result and potential errors. This pattern is particularly useful when making OpenAI API calls, as providing language models with an escape mechanism effectively reduces hallucinations. Consequently, we can construct a prompt that closely resembles regular programming.

Towards the end, we will demonstrate how to use `Maybe` instances in pattern matching, which offers an excellent approach for handling errors in a structured manner.
The `Maybe` pattern is a concept in functional programming used for error handling. Instead of raising exceptions or returning `None`, you can use a `Maybe` type to encapsulate both the result and potential errors. This pattern is particularly useful when making LLM calls, as providing language models with an escape hatch can effectively reduce hallucinations.

## Defining the Model

@@ -76,7 +70,7 @@ user2 = extract("Unknown user")

As you can see, when the data is extracted successfully, the `result` field contains the `UserDetail` instance. When an error occurs, the `error` field is set to `True`, and the `message` field contains the error message.

## Handle the result
## Handling the result

There are a few ways we can handle the result. Normally, we can just access the individual fields.

@@ -89,7 +83,7 @@ def process_user_detail(maybe_user: MaybeUser):
print(f"Not found: {user1.message}")
```

## Pattern Matching
### Pattern Matching

We can also use pattern matching to handle the result. This is a great way to handle errors in a structured way.

150 changes: 150 additions & 0 deletions docs/concepts/models.md
@@ -0,0 +1,150 @@
# Response Model

Defining LLM output schemas in Pydantic is done via `pydantic.BaseModel`. To learn more about models in Pydantic, check out their [documentation](https://docs.pydantic.dev/latest/concepts/models/).

After defining a Pydantic model, we can use it as the `response_model` in your client `create` calls to OpenAI. The job of the `response_model` is to define the schema and prompts for the language model, validate the response from the API, and return a Pydantic model instance.
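
As a rough end-to-end sketch (the model name and prompt below are placeholders, not prescriptions), the patched client validates the completion against the response model and returns an instance of it:

```python
import instructor
from openai import OpenAI
from pydantic import BaseModel

client = instructor.patch(OpenAI())


class UserDetail(BaseModel):
    name: str
    age: int


user = client.chat.completions.create(
    model="gpt-3.5-turbo",  # placeholder model name
    response_model=UserDetail,
    messages=[{"role": "user", "content": "Extract: Jason is 25 years old"}],
)
print(user)
#> name='Jason' age=25  (actual values depend on the completion)
```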

## Prompting

When defining a response model, we can use docstrings and field annotations to define the prompt that will be used to generate the response.

```python
from pydantic import BaseModel, Field

class User(BaseModel):
"""
This is the prompt that will be used to generate the response.
Any instructions here will be passed to the language model.
"""
name: str = Field(description="The name of the user.")
age: int = Field(description="The age of the user.")
```

Here, the docstring, types, and field annotations are all used to build the schema and prompt that the `create` method of the client sends to the language model to generate the response.
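
You can inspect roughly what the language model will see by printing the model's JSON schema: the class docstring becomes the model-level `description`, and the field descriptions are attached to each property (output lightly abbreviated):

```python
print(User.model_json_schema())
"""
{
    'description': 'This is the prompt that will be used to generate the response. ...',
    'properties': {
        'name': {'description': 'The name of the user.', 'title': 'Name', 'type': 'string'},
        'age': {'description': 'The age of the user.', 'title': 'Age', 'type': 'integer'},
    },
    'required': ['name', 'age'],
    'title': 'User',
    'type': 'object',
}
"""
```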

## Optional Values

If we use `Optional` and `default`, the field will be considered not required when sent to the language model.

```python
from typing import Optional

from pydantic import BaseModel, Field

class User(BaseModel):
    name: str = Field(description="The name of the user.")
    age: int = Field(description="The age of the user.")
    email: Optional[str] = Field(description="The email of the user.", default=None)
```

## Dynamic model creation

There are some occasions where it is desirable to create a model using runtime information to specify the fields. For this, Pydantic provides the `create_model` function, which allows models to be created on the fly:

```python
from pydantic import BaseModel, create_model


class FooModel(BaseModel):
foo: str
bar: int = 123


BarModel = create_model(
'BarModel',
apple=(str, 'russet'),
banana=(str, 'yellow'),
__base__=FooModel,
)
print(BarModel)
#> <class '__main__.BarModel'>
print(BarModel.model_fields.keys())
#> dict_keys(['foo', 'bar', 'apple', 'banana'])
```
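
Because the field tuples above provide defaults, the dynamically created model can be instantiated like any other (a quick sketch):

```python
print(BarModel(foo="hello"))
#> foo='hello' bar=123 apple='russet' banana='yellow'
```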

??? note "When would I use this?"

Consider a situation where the model is dynamically defined, based on some configuration or database. For example, we could have a database table that stores the properties of a model for
some model name or id. We could then query the database for the properties of the model and use that to create the model.

```sql
SELECT property_name, property_type, description
FROM prompt
WHERE model_name = {model_name}
```

We can then use this information to create the model.

```python
types = {
'string': str,
'integer': int,
'boolean': bool,
'number': float,
'List[str]': List[str],
}

BarModel = create_model(
'User',
**{
property_name: (types[property_type], description)
for property_name, property_type, description in cursor.fetchall()
},
__base__=BaseModel,
)
```

This would be useful when different users have different descriptions for the same model. We can use the same model but have different prompts for each user.

## Structural Pattern Matching

Pydantic supports structural pattern matching for models, as introduced by PEP 636 in Python 3.10.

```python
from pydantic import BaseModel


class Pet(BaseModel):
name: str
species: str


a = Pet(name='Bones', species='dog')

match a:
# match `species` to 'dog', declare and initialize `dog_name`
case Pet(species='dog', name=dog_name):
print(f'{dog_name} is a dog')
#> Bones is a dog
# default case
case _:
print('No dog matched')
```

## Adding Behavior

We can add methods to our Pydantic models just as we would to any plain Python class. We might want to do this to add some custom logic to our models.

```python
from pydantic import BaseModel
from typing import Literal

from openai import OpenAI

import instructor

client = instructor.patch(OpenAI())

class SearchQuery(BaseModel):
query: str
query_type: Literal["web", "image", "video"]

def execute(self):
# do some logic here
return results


query = client.chat.completions.create(
..., response_model=SearchQuery
)

results = query.execute()
```

Now we can call `execute` on our model instance after extracting it from a language model. If you want to see more examples of this, check out our post on [RAG is more than embeddings](../blog/posts/rag-and-beyond.md).
2 changes: 2 additions & 0 deletions docs/concepts/prompting.md
@@ -1,3 +1,5 @@
# General Tips for Prompt Engineering

The overarching theme of using Instructor and Pydantic for function calling is to make the models as self-descriptive, modular, and flexible as possible, while maintaining data integrity and ease of use.

- **Modularity**: Design self-contained components for reuse.
3 changes: 3 additions & 0 deletions docs/concepts/typeadapter.md
@@ -0,0 +1,3 @@
!!! warning "This page is a work in progress"

This page is a work in progress. Check out [Pydantic's documentation](https://docs.pydantic.dev/latest/concepts/type_adapter/)
3 changes: 3 additions & 0 deletions docs/concepts/types.md
@@ -0,0 +1,3 @@
!!! warning "This page is a work in progress"

This page is a work in progress. Check out [Pydantic's documentation](https://docs.pydantic.dev/latest/concepts/types/)
3 changes: 3 additions & 0 deletions docs/concepts/union.md
@@ -0,0 +1,3 @@
!!! warning "This page is a work in progress"

This page is a work in progress. Check out [Pydantic's documentation](https://docs.pydantic.dev/latest/concepts/union/)