Merge branch 'main' into jxnl-tutorial-live
jxnl committed Dec 22, 2023
2 parents feb2a53 + 8eef6da commit 2ea3ce2
Showing 11 changed files with 694 additions and 43 deletions.
4 changes: 3 additions & 1 deletion .gitignore
@@ -1,3 +1,4 @@
.DS_Store
# Byte-compiled / optimized / DLL files
__pycache__/
*.py[cod]
@@ -6,6 +7,7 @@ __pycache__/
# C extensions
*.so


# Distribution / packaging
.Python
build/
@@ -166,4 +168,4 @@ tutorials/results.csv
tutorials/results.jsonl
tutorials/results.jsonlines
tutorials/schema.json
wandb/settings
wandb/settings
37 changes: 18 additions & 19 deletions README.md
@@ -1,6 +1,6 @@
# Welcome to Instructor - Your Gateway to Structured Outputs with OpenAI

_Structured extraction in Python, powered by OpenAI's function calling api, designed for simplicity, transparency, and control._
_Structured extraction in Python, powered by OpenAI's function calling API, designed for simplicity, transparency, and control._

---

@@ -18,7 +18,7 @@ Dive into the world of Python-based structured extraction, empowered by OpenAI's

## Get Started in Moments

Installing Instructor is a breeze. Just run `pip install instructor` in your terminal and you're on your way to a smoother data handling experience.
Installing Instructor is a breeze. Simply run `pip install instructor` in your terminal and you're on your way to a smoother data handling experience!

## How Instructor Enhances Your Workflow

@@ -28,13 +28,11 @@ Our `instructor.patch` for the `OpenAI` class introduces three key enhancements:
- **Max Retries:** Set your desired number of retry attempts for requests.
- **Validation Context:** Provide a context object for enhanced validator access.
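
For example, all three can be combined in a single call. The following is a minimal sketch; the model name, messages, and context payload are illustrative:

```py
import instructor
from openai import OpenAI
from pydantic import BaseModel

# patch() augments create() with response_model, max_retries,
# and validation_context
client = instructor.patch(OpenAI())


class UserDetail(BaseModel):
    name: str
    age: int


user = client.chat.completions.create(
    model="gpt-3.5-turbo",
    response_model=UserDetail,  # the schema to extract into
    max_retries=2,  # reask the model on validation errors
    validation_context={"source": "illustrative"},  # visible to validators
    messages=[{"role": "user", "content": "Extract Jason is 25 years old"}],
)
```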

!!! note "Using Validators"

Learn more about validators checkout our blog post [Good llm validation is just good validation](https://jxnl.github.io/instructor/blog/2023/10/23/good-llm-validation-is-just-good-validation/)

With Instructor, your code becomes more efficient and readable. Here’s a quick peek:
### Using Validators
To learn more about validators, check out our blog post [Good LLM validation is just good validation](https://jxnl.github.io/instructor/blog/2023/10/23/good-llm-validation-is-just-good-validation/)

## Usage
With Instructor, your code becomes more efficient and readable. Here’s a quick peek:

```py hl_lines="5 13"
import instructor
@@ -61,7 +59,7 @@ assert user.name == "Jason"
assert user.age == 25
```

### "Using `openai<1.0.0`"
### Using `openai<1.0.0`

If you're using `openai<1.0.0`, then make sure you `pip install instructor<0.3.0`,
where you can patch a global client like so:
@@ -78,9 +76,9 @@ user = openai.ChatCompletion.create(
)
```

### "Using async clients"
### Using async clients

For async clients you must use apatch vs patch like so:
For async clients you must use `apatch` vs. `patch`, as shown:

```py
import instructor
@@ -106,7 +104,7 @@ assert isinstance(model, UserExtract)
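
A fuller sketch of the async flow, assuming `apatch` wraps `AsyncOpenAI` the same way `patch` wraps `OpenAI` (the model name and prompt are illustrative):

```py
import asyncio

import instructor
from openai import AsyncOpenAI
from pydantic import BaseModel

# apatch is the async counterpart to patch
aclient = instructor.apatch(AsyncOpenAI())


class UserExtract(BaseModel):
    name: str
    age: int


async def extract() -> UserExtract:
    return await aclient.chat.completions.create(
        model="gpt-3.5-turbo",
        response_model=UserExtract,
        messages=[{"role": "user", "content": "Extract jason is 25 years old"}],
    )


model = asyncio.run(extract())
assert isinstance(model, UserExtract)
```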

### Step 1: Patch the client

First, import the required libraries and apply the patch function to the OpenAI module. This exposes new functionality with the response_model parameter.
First, import the required libraries and apply the `patch` function to the OpenAI module. This exposes new functionality with the `response_model` parameter.

```python
import instructor
@@ -132,8 +130,7 @@ class UserDetail(BaseModel):

### Step 3: Extract

Use the `client.chat.completions.create` method to send a prompt and extract the data into the Pydantic object. The response_model parameter specifies the Pydantic model to use for extraction. Its helpful to annotate the variable with the type of the response model.
which will help your IDE provide autocomplete and spell check.
Use the `client.chat.completions.create` method to send a prompt and extract the data into the Pydantic object. The `response_model` parameter specifies the Pydantic model to use for extraction. It is helpful to annotate the variable with the type of the response model, which will help your IDE provide autocomplete and spell checking.

```python
user: UserDetail = client.chat.completions.create(
@@ -150,7 +147,9 @@ assert user.age == 25

## Pydantic Validation

Validation can also be plugged into the same Pydantic model. Here, if the answer attribute contains content that violates the rule "don't say objectionable things," Pydantic will raise a validation error.
Validation can also be plugged into the same Pydantic model.

In this example, if the `answer` attribute contains content that violates the rule "Do not say objectionable things", Pydantic will raise a validation error.

```python hl_lines="9 15"
from pydantic import BaseModel, ValidationError, BeforeValidator
@@ -173,15 +172,15 @@ except ValidationError as e:
print(e)
```
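
A self-contained version of this sketch, assuming `llm_validator` is imported from `instructor` (the question and answer strings are illustrative):

```python
from typing_extensions import Annotated
from pydantic import BaseModel, ValidationError, BeforeValidator

from instructor import llm_validator


class QuestionAnswer(BaseModel):
    question: str
    # the natural-language rule is enforced by an LLM at validation time
    answer: Annotated[
        str, BeforeValidator(llm_validator("don't say objectionable things"))
    ]


try:
    qa = QuestionAnswer(
        question="What is the meaning of life?",
        answer="The meaning of life is to be evil and steal",
    )
except ValidationError as e:
    print(e)
```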

Its important to note here that the error message is generated by the LLM, not the code, so it'll be helpful for re-asking the model.
It is important to note here that the **error message is generated by the LLM**, not the code. Thus, it is helpful for re-asking the model.

```plaintext
1 validation error for QuestionAnswer
answer
Assertion failed, The statement is objectionable. (type=assertion_error)
```

## Reask on validation error
## Re-ask on validation error

Here, the `UserDetails` model is passed as the `response_model`, and `max_retries` is set to 2.

@@ -219,15 +218,15 @@ assert model.name == "JASON"
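
A minimal sketch of the pattern, ending with the same assertion as the collapsed example above (the validator and prompt are illustrative):

```python
import instructor
from openai import OpenAI
from pydantic import BaseModel, field_validator

client = instructor.patch(OpenAI())


class UserDetails(BaseModel):
    name: str
    age: int

    @field_validator("name")
    @classmethod
    def name_must_be_uppercase(cls, v: str) -> str:
        # this error message is sent back to the model on each retry
        assert v.isupper(), "Name must be in uppercase."
        return v


model = client.chat.completions.create(
    model="gpt-3.5-turbo",
    response_model=UserDetails,
    max_retries=2,
    messages=[{"role": "user", "content": "Extract jason is 25 years old"}],
)
assert model.name == "JASON"
```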

## [Evals](https://github.com/jxnl/instructor/tree/main/tests/openai/evals)

We invite you to contribute evals in pytest as a way to monitor the quality of the openai models and the instructor library. To get started check out the [jxnl/instructor/tests/evals](https://github.com/jxnl/instructor/tree/main/tests/openai/evals) and contribute your own evals in the form of pytest tests. These evals will be run once a week and the results will be posted.
We invite you to contribute evals in `pytest` as a way to monitor the quality of the OpenAI models and the `instructor` library. To get started, check out the [jxnl/instructor/tests/evals](https://github.com/jxnl/instructor/tree/main/tests/openai/evals) and contribute your own evals in the form of pytest tests. These evals will be run once a week and the results will be posted.

## Contributing

If you want to help out checkout some of the issues marked as `good-first-issue` or `help-wanted`. Found [here](https://github.com/jxnl/instructor/labels/good%20first%20issue). They could be anything from code improvements, a guest blog post, or a new cook book.
If you want to help, check out some of the issues marked as `good-first-issue` or `help-wanted` found [here](https://github.com/jxnl/instructor/labels/good%20first%20issue). They could be anything from code improvements, a guest blog post, or a new cookbook.

## CLI

We also provide some added CLI functionality for easy convinience
We also provide some added CLI functionality for easy convenience:

- `instructor jobs` : This helps with the creation of fine-tuning jobs with OpenAI. Simply use `instructor jobs create-from-file --help` to get started creating your first fine-tuned GPT-3.5 model

7 changes: 3 additions & 4 deletions docs/concepts/fields.md
@@ -1,4 +1,4 @@
The `pydantic.Field` function is used to customize and add metadata to fields of models. To learn more check out the pydantic [documentation](https://docs.pydantic.dev/latest/concepts/fields/) as this is a near replica of that documentation that is relevant to prompting.
The `pydantic.Field` function is used to customize and add metadata to fields of models. To learn more, check out the Pydantic [documentation](https://docs.pydantic.dev/latest/concepts/fields/), as this is a near replica of the parts of that documentation relevant to prompting.

## Default values

@@ -88,15 +88,14 @@ print(date_range.model_dump_json())

## Customizing JSON Schema

There are fields that exclusively to customise the generated JSON Schema:
There are some fields that are exclusively used to customize the generated JSON Schema:

- `title`: The title of the field.
- `description`: The description of the field.
- `examples`: The examples of the field.
- `json_schema_extra`: Extra JSON Schema properties to be added to the field.

These all work as great opportunities to add more information to the JSON Schema as part
of your prompt engineering.
These all work as great opportunities to add more information to the JSON schema as part of your prompt engineering.

Here's an example:

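The following is a minimal sketch; the field and its metadata are illustrative:

```python
from pydantic import BaseModel, Field


class User(BaseModel):
    age: int = Field(
        title="Age",
        description="The age of the user, in whole years",
        examples=[25],
        json_schema_extra={"units": "years"},
    )
```
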
2 changes: 1 addition & 1 deletion docs/concepts/lists.md
@@ -20,7 +20,7 @@ Defining a task and creating a list of classes is a common enough pattern that w

## Extracting Tasks using Iterable

By using `Iterable` you get a very convient class with prompts and names automatically defined:
By using `Iterable` you get a very convenient class with prompts and names automatically defined:

```python
import instructor
```
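
A minimal sketch of the pattern (the model name and prompt are illustrative):

```python
from typing import Iterable

import instructor
from openai import OpenAI
from pydantic import BaseModel

client = instructor.patch(OpenAI())


class User(BaseModel):
    name: str
    age: int


# Iterable[User] tells instructor to extract multiple User objects
users = client.chat.completions.create(
    model="gpt-3.5-turbo",
    response_model=Iterable[User],
    messages=[{"role": "user", "content": "Jason is 10 and John is 30"}],
)
for user in users:
    print(user)
```
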
4 changes: 3 additions & 1 deletion docs/concepts/maybe.md
@@ -1,6 +1,8 @@
# Handling Missing Data

The `Maybe` pattern is a concept in functional programming used for error handling. Instead of raising exceptions or returning `None`, you can use a `Maybe` type to encapsulate both the result and potential errors. This pattern is particularly useful when making llm calls, as providing language models with an escape hatch can effectively reduce hallucinations.
The `Maybe` pattern is a concept in functional programming used for error handling. Instead of raising exceptions or returning `None`, you can use a `Maybe` type to encapsulate both the result and potential errors.

This pattern is particularly useful when making LLM calls, as providing language models with an escape hatch can effectively reduce hallucinations.

## Defining the Model

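A minimal sketch of such a wrapper (the exact fields in the library's version may differ):

```python
from typing import Optional

from pydantic import BaseModel, Field


class UserDetail(BaseModel):
    age: int
    name: str


class MaybeUser(BaseModel):
    # the escape hatch: when the data is missing, result stays None
    # and the model can explain itself in message instead
    result: Optional[UserDetail] = Field(default=None)
    error: bool = Field(default=False)
    message: Optional[str] = Field(default=None)

    def __bool__(self) -> bool:
        return self.result is not None
```
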
15 changes: 9 additions & 6 deletions docs/concepts/models.md
@@ -1,8 +1,11 @@
# Response Model

Defining llm output schemas in Pydantic is done via `pydantic.BaseModel`. To learn more about models in pydantic checkout their [documentation](https://docs.pydantic.dev/latest/concepts/models/).
Defining LLM output schemas in Pydantic is done via `pydantic.BaseModel`. To learn more about models in Pydantic, check out their [documentation](https://docs.pydantic.dev/latest/concepts/models/).

After defining a pydantic model, we can use it as as the `response_model` in your client `create` calls to openai. The job of the `response_model` is to define the schema and prompts for the language model and validate the response from the API and return a pydantic model instance.
After defining a Pydantic model, we can use it as the `response_model` in your client `create` calls to OpenAI or any other supported model. The job of the `response_model` parameter is to:

- Define the schema and prompts for the language model
- Validate the response from the API
- Return a Pydantic model instance
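
For example, the docstring and field descriptions of the response model all become part of the schema the language model sees. A minimal sketch (the names and descriptions are illustrative):

```python
from pydantic import BaseModel, Field


class UserDetail(BaseModel):
    """Details extracted about a single user mentioned in the text."""

    name: str = Field(..., description="The user's full name")
    age: int = Field(..., description="Age in years")
```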

## Prompting

@@ -24,7 +27,7 @@ Here all docstrings, types, and field annotations will be used to generate the p

## Optional Values

If we use `Optional` and `default` they will be considered not required when sent to the language model
If we use `Optional` and `default`, they will be considered not required when sent to the language model.

```python
class User(BaseModel):
@@ -35,7 +38,7 @@ class User(BaseModel):

## Dynamic model creation

There are some occasions where it is desirable to create a model using runtime information to specify the fields. For this Pydantic provides the create_model function to allow models to be created on the fly:
There are some occasions where it is desirable to create a model using runtime information to specify the fields. For this, Pydantic provides the `create_model` function to allow models to be created on the fly:

```python
from pydantic import BaseModel, create_model
@@ -94,7 +97,7 @@ print(BarModel.model_fields.keys())
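
A minimal, self-contained sketch of `create_model` usage (the field names and defaults are illustrative):

```python
from pydantic import create_model

# fields are given as name=(type, default); Ellipsis (...) marks required
BarModel = create_model(
    "BarModel",
    apple=(str, "russet"),
    banana=(str, ...),
)
print(BarModel.model_fields.keys())
# dict_keys(['apple', 'banana'])
```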

## Structural Pattern Matching

Pydantic supports structural pattern matching for models, as introduced by PEP 636 in Python 3.10.
Pydantic supports structural pattern matching for models, as introduced by [PEP 636](https://peps.python.org/pep-0636/) in Python 3.10.

```python
from pydantic import BaseModel
@@ -119,7 +122,7 @@ match a:
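
A minimal sketch of matching on a model (the `Pet` class is illustrative):

```python
from pydantic import BaseModel


class Pet(BaseModel):
    name: str
    species: str


a = Pet(name="Bones", species="dog")

match a:
    # keyword patterns match against the model's attributes
    case Pet(species="dog", name=name):
        print(f"{name} is a dog")
    case Pet(species="cat"):
        print("a cat")
    case _:
        print("something else")
```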

## Adding Behavior

We can add methods to our pydantic models just as any plain python class. We might want to do this to add some custom logic to our models.
We can add methods to our Pydantic models, just as with any plain Python class. We might want to do this to add some custom logic to our models.

```python
from pydantic import BaseModel
```
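
A minimal sketch (the method is illustrative):

```python
from pydantic import BaseModel


class User(BaseModel):
    name: str
    age: int

    def greet(self) -> str:
        # plain Python behavior alongside validated fields
        return f"Hello, {self.name}!"


print(User(name="Jason", age=25).greet())
```
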
20 changes: 9 additions & 11 deletions docs/concepts/reask_validation.md
@@ -1,16 +1,16 @@
# Validation and Reasking

Instead of framing "self-critique" or "self-reflection" in AI as new concepts, we can view them as validation errors with clear error messages that the systen can use to self correct.
Instead of framing "self-critique" or "self-reflection" in AI as new concepts, we can view them as validation errors with clear error messages that the system can use to self-correct.

## Pydantic

Pydantic offers a customizable and expressive validation framework for Python. Instructor leverages Pydantic's validation framework to provide a uniform developer experience for both code-based and LLM-based validation, as well as a reasking mechanism for correcting LLM outputs based on validation errors. To learn more, check out the [Pydantic docs](https://docs.pydantic.dev/latest/concepts/validators/) on validators.

!!! note "Good llm validation is just good validation"

If you want to see some more examples on validators checkout our blog post [Good llm validation is just good validation](https://jxnl.github.io/instructor/blog/2023/10/23/good-llm-validation-is-just-good-validation/)
If you want to see some more examples of validators, check out our blog post [Good LLM validation is just good validation](https://jxnl.github.io/instructor/blog/2023/10/23/good-llm-validation-is-just-good-validation/)

### Code-Based Validation Example
### Code-based Validation Example

First define a Pydantic model with a validator using the `Annotated` type from `typing_extensions`.
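
A minimal sketch of such a validator, assuming `AfterValidator` (the rule itself is illustrative):

```python
from typing_extensions import Annotated

from pydantic import AfterValidator, BaseModel, ValidationError


def name_must_be_uppercase(v: str) -> str:
    assert v.isupper(), "Name must be in uppercase."
    return v


class UserDetail(BaseModel):
    age: int
    name: Annotated[str, AfterValidator(name_must_be_uppercase)]


try:
    UserDetail(age=25, name="jason")
except ValidationError as e:
    print(e)
```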

@@ -80,7 +80,7 @@ except ValidationError as e:

#### Output for LLM-Based Validation

Its important to not here that the error message is generated by the LLM, not the code, so it'll be helpful for re asking the model.
It is important to note here that the error message is generated by the LLM, not the code, so it'll be helpful for re-asking the model.

```plaintext
1 validation error for QuestionAnswer
@@ -92,14 +92,13 @@ answer

Validators are a great tool for ensuring some property of the outputs. When you use the `patch()` method with the `openai` client, you can use the `max_retries` parameter to set the number of times you can reask the model to correct the output.

Its a great layer of defense against bad outputs of two forms.

It is a great layer of defense against bad outputs of two forms:

1. Pydantic Validation Errors (code- or LLM-based)
2. JSON Decoding Errors (when the model returns a bad response)

### Step 1: Define the Response Model with Validators

Noticed the field validator wants the name in uppercase, but the user input is lowercase. The validator will raise a `ValueError` if the name is not in uppercase.
Notice that the field validator wants the name in uppercase, but the user input is lowercase. The validator will raise a `ValueError` if the name is not in uppercase.

```python hl_lines="11-16"
import instructor
@@ -156,11 +155,10 @@ except (ValidationError, JSONDecodeError) as e:

## Advanced Validation Techniques

The docs are currently incomplete, but we have a few advanced validation techniques that we're working on documenting better, for a example of model level validation, and using a validation context check out our example on [verifying citations](../examples/exact_citations.md) which covers

The docs are currently incomplete, but we have a few advanced validation techniques that we're working on documenting better, such as model-level validation and using a validation context. Check out our example on [verifying citations](../examples/exact_citations.md), which covers:

1. Validating the entire object with all attributes rather than one attribute at a time
2. Using some 'context' to validate the object, in this case we use the `context` to check if the citation existed in the original text.
2. Using some 'context' to validate the object: In this case, we use the `context` to check if the citation existed in the original text.

## Takeaways

By integrating these advanced validation techniques, we not only improve the quality and reliability of LLM-generated content but also pave the way for more autonomous and effective systems.
By integrating these advanced validation techniques, we not only improve the quality and reliability of LLM-generated content, but also pave the way for more autonomous and effective systems.