diff --git a/docs/blog/.authors.yml b/docs/blog/.authors.yml
index 2441ac862..34e8d7f07 100644
--- a/docs/blog/.authors.yml
+++ b/docs/blog/.authors.yml
@@ -2,7 +2,7 @@ authors:
jxnl:
name: Jason Liu
description: Creator
- avatar: https://pbs.twimg.com/profile_images/1682267111555571714/VDORoUy__400x400.jpg
+ avatar: https://pbs.twimg.com/profile_images/1724672723748638720/qOBwmkOI_400x400.jpg
ivanleomk:
name: Ivan Leo
description: Contributor
diff --git a/docs/blog/index.md b/docs/blog/index.md
index 5058819f0..3c26d98a7 100644
--- a/docs/blog/index.md
+++ b/docs/blog/index.md
@@ -4,9 +4,10 @@ The goal of the blog is to capture some content that does not neatly fit within
## Advanced Topics
-- [Query Understanding and Expansion for RAG](posts/rag-and-beyond.md)
-- [GPT-4 Level summarization with GPT3.5 Finetuning](posts/chain-of-density.md)
-- [Deepdive on LLM Guardrails / Validation](posts/validation-part1.md)
+- [Query Understanding for RAG: Beyond Embeddings](posts/rag-and-beyond.md)
+- [Finetuning: GPT-4 level summaries with GPT-3.5-turbo](posts/chain-of-density.md)
+- [Introduction to Guardrails and Validation](posts/validation-part1.md)
+- [Validating Citations](posts/citations.md)
- [A Guide to Fine-Tuning and Distillation](posts/distilation-part1.md)
## Learning Python
diff --git a/docs/blog/posts/citations.md b/docs/blog/posts/citations.md
new file mode 100644
index 000000000..240c37a44
--- /dev/null
+++ b/docs/blog/posts/citations.md
@@ -0,0 +1,268 @@
+---
+draft: False
+date: 2023-11-18
+slug: validate-citations
+tags:
+ - pydantic
+ - validation
+  - finetuning
+ - citations
+ - hallucination
+authors:
+ - jxnl
+---
+
+# Verifying LLM Citations with Pydantic
+
+Ensuring that an LLM's answers are grounded in their sources is crucial. This blog post explores how Pydantic's validators can improve answer accuracy by verifying citations against the source text.
+
+We'll start with a simple substring check to verify citations. Next, we'll use `instructor` to have an LLM verify citations and check that the answers align with them. Finally, we'll discuss how these techniques can be used to generate a dataset of accurate responses.
+
+## Example 1: Simple Substring Check
+
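+In this example, the `Statements` model uses a field validator to check whether each `substring_quote` appears verbatim in one of the text chunks passed in through the validation context. If the quote is not found, the validator raises a `ValueError`, which Pydantic reports as a `ValidationError`.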
+
+### Code Example:
+
+```python
+from typing import List, Optional
+from openai import OpenAI
+from pydantic import BaseModel, Field, ValidationError, ValidationInfo, field_validator, model_validator
+import instructor
+
+client = instructor.patch(OpenAI())
+
+class Statements(BaseModel):
+ body: str
+ substring_quote: str
+
+ @field_validator("substring_quote")
+ @classmethod
+ def substring_quote_exists(cls, v: str, info: ValidationInfo):
+        # The text chunks are supplied at validation time via `context=...`
+        context = (info.context or {}).get("text_chunks", {})
+
+        for text_chunk in context.values():
+            if v in text_chunk:  # (1)
+                return v
+        raise ValueError(f"Could not find substring_quote `{v}` in contexts")
+
+
+class AnswerWithCitaton(BaseModel):
+ question: str
+ answer: List[Statements]
+```
+
+1. While we use a simple substring check in this example, we can use more complex techniques like regex or Levenshtein distance.
+
+Once the model is defined, we can validate an answer against a set of text chunks. Here the quote does not appear verbatim in any chunk (chunk 2 says Paris is *not* the capital), so validation fails.
+
+```python
+try:
+ AnswerWithCitaton.model_validate(
+ {
+ "question": "What is the capital of France?",
+ "answer": [
+ {"body": "Paris", "substring_quote": "Paris is the capital of France"},
+ ],
+ },
+ context={
+ "text_chunks": {
+ 1: "Jason is a pirate",
+ 2: "Paris is not the capital of France",
+ 3: "Irrelevant data",
+ }
+ },
+ )
+except ValidationError as e:
+ print(e)
+```
+
+### Error Message Example:
+
+```
+1 validation error for AnswerWithCitaton
+answer.0.substring_quote
+  Value error, Could not find substring_quote `Paris is the capital of France` in contexts [type=value_error, input_value='Paris is the capital of France', input_type=str]
+    For further information visit https://errors.pydantic.dev/2.4/v/value_error
+```
+
+Pydantic raises a `ValidationError` because the `substring_quote` does not appear verbatim in any text chunk. Exact matching is strict: a single typo or changed space in an otherwise correct quote also fails, which is where fuzzier techniques such as regex or Levenshtein distance can help.
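+
+One way to act on the Levenshtein idea is approximate substring matching: accept a quote if some chunk contains a span within a small number of edits of it. The sketch below is our own illustration, not part of `instructor` or Pydantic; `fuzzy_substring_distance`, `quote_in_chunks`, and the 10% edit budget are assumptions chosen for this example. It uses the standard edit-distance table, except that the first row is all zeros so a match may start anywhere in the chunk (Sellers' algorithm).
+
+```python
+def fuzzy_substring_distance(quote: str, text: str) -> int:
+    """Smallest Levenshtein distance between `quote` and any substring of `text`."""
+    # prev[j]: best cost of aligning quote[:i] with a substring of text ending at j.
+    # Row 0 is all zeros, so the matching substring may start at any position.
+    prev = [0] * (len(text) + 1)
+    for i in range(1, len(quote) + 1):
+        curr = [i] + [0] * len(text)
+        for j in range(1, len(text) + 1):
+            cost = 0 if quote[i - 1] == text[j - 1] else 1
+            curr[j] = min(
+                prev[j] + 1,         # drop a character of the quote
+                curr[j - 1] + 1,     # skip a character of the text
+                prev[j - 1] + cost,  # match or substitute
+            )
+        prev = curr
+    # Taking the minimum over the last row lets the substring end anywhere.
+    return min(prev)
+
+
+def quote_in_chunks(quote: str, chunks: dict[int, str], max_edit_ratio: float = 0.1) -> bool:
+    """Hypothetical check: accept the quote if some chunk is within the edit budget."""
+    budget = int(len(quote) * max_edit_ratio)
+    return any(fuzzy_substring_distance(quote, chunk) <= budget for chunk in chunks.values())
+
+
+chunks = {1: "Jason is a pirate", 2: "Paris is the capital of France", 3: "Irrelevant data"}
+print(quote_in_chunks("Paris is the capitol of France", chunks))  # True (1 edit)
+print(quote_in_chunks("Paris is the capital of France", {2: "Paris is not the capital of France"}))  # False (4 edits)
+```
+
+A typo such as `capitol` stays within the budget of 3 edits, while the "Paris is *not* the capital of France" chunk is still rejected because it needs 4. Inside a validator, `quote_in_chunks` could replace the `v in text_chunk` test, at a cost of O(len(quote) × len(chunk)) time per chunk.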
+
+## Example 2: Using LLM for Verification
+
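+Instead of comparing strings, this approach asks an OpenAI model, through the `instructor`-patched client, whether the citation exists in the context. The model responds with a `Validation` object; if `is_valid` is false, its `error_messages` are raised as a `ValueError` and surface in the `ValidationError`.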
+
+### Code Example:
+
+```python
+class Validation(BaseModel):
+ is_valid: bool
+ error_messages: Optional[str] = Field(None, description="Error messages if any")
+
+
+class Statements(BaseModel):
+ body: str
+ substring_quote: str
+
+ @model_validator(mode="after")
+ def substring_quote_exists(self, info: ValidationInfo):
+ context = info.context.get("text_chunks", None)
+
+ resp: Validation = client.chat.completions.create(
+ response_model=Validation,
+ messages=[
+ {
+ "role": "user",
+ "content": f"Does the following citation exist in the following context?\n\nCitation: {self.substring_quote}\n\nContext: {context}",
+ }
+ ],
+ model="gpt-3.5-turbo",
+ )
+
+ if resp.is_valid:
+ return self
+
+ raise ValueError(resp.error_messages)
+
+
+class AnswerWithCitaton(BaseModel):
+ question: str
+ answer: List[Statements]
+```
+
+When the citation does appear in the context, the LLM marks it as valid and the model validates successfully.
+
+```python
+resp = AnswerWithCitaton.model_validate(
+ {
+ "question": "What is the capital of France?",
+ "answer": [
+ {"body": "Paris", "substring_quote": "Paris is the capital of France"},
+ ],
+ },
+ context={
+ "text_chunks": {
+ 1: "Jason is a pirate",
+ 2: "Paris is the capital of France",
+ 3: "Irrelevant data",
+ }
+ },
+)
+print(resp.model_dump_json(indent=2))
+```
+
+### Result:
+
+```json
+{
+ "question": "What is the capital of France?",
+ "answer": [
+ {
+ "body": "Paris",
+ "substring_quote": "Paris is the capital of France"
+ }
+ ]
+}
+```
+
+If we change the context so that the citation no longer appears in it, the LLM flags the citation as invalid and its error message is raised as a validation error.
+
+```python
+try:
+ AnswerWithCitaton.model_validate(
+ {
+ "question": "What is the capital of France?",
+ "answer": [
+ {"body": "Paris", "substring_quote": "Paris is the capital of France"},
+ ],
+ },
+ context={
+ "text_chunks": {
+ 1: "Jason is a pirate",
+ 2: "Paris is not the capital of France",
+ 3: "Irrelevant data",
+ }
+ },
+ )
+except ValidationError as e:
+ print(e)
+```
+
+### Error Message Example:
+
+```
+1 validation error for AnswerWithCitaton
+answer.0
+ Value error, Citation not found in context [type=value_error, input_value={'body': 'Paris', 'substr... the capital of France'}, input_type=dict]
+    For further information visit https://errors.pydantic.dev/2.4/v/value_error
+```
+
+## Example 3: Aligning Citations and Answers
+
+In this example, we check that the answers are consistent with both the question and the context, not just that each citation exists. Again, the LLM performs the check.
+
+We reuse the `Statements` model from Example 2 and redefine `AnswerWithCitaton` with a model validator that checks the answer as a whole.
+
+### Code Example:
+
+```python
+class AnswerWithCitaton(BaseModel):
+ question: str
+ answer: List[Statements]
+
+ @model_validator(mode="after")
+ def validate_answer(self, info: ValidationInfo):
+ context = info.context.get("text_chunks", None)
+
+ resp: Validation = client.chat.completions.create(
+ response_model=Validation,
+ messages=[
+ {
+ "role": "user",
+ "content": f"Does the following answers match the question and the context?\n\nQuestion: {self.question}\n\nAnswer: {self.answer}\n\nContext: {context}",
+ }
+ ],
+ model="gpt-3.5-turbo",
+ )
+
+ if resp.is_valid:
+ return self
+
+ raise ValueError(resp.error_messages)
+```
+
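+Here the citation does exist in the context, so the statement-level check passes, but the answer ("Texas") contradicts the cited text. The answer-level validator catches the mismatch and returns an error message.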
+
+```python
+try:
+ AnswerWithCitaton.model_validate(
+ {
+ "question": "What is the capital of France?",
+ "answer": [
+ {"body": "Texas", "substring_quote": "Paris is the capital of France"},
+ ],
+ },
+ context={
+ "text_chunks": {
+ 1: "Jason is a pirate",
+ 2: "Paris is the capital of France",
+ 3: "Irrelevant data",
+ }
+ },
+ )
+except ValidationError as e:
+ print(e)
+```
+
+### Error Message Example:
+
+```
+1 validation error for AnswerWithCitaton
+ Value error, The answer does not match the question and context [type=value_error, input_value={'question': 'What is the...he capital of France'}]}, input_type=dict]
+    For further information visit https://errors.pydantic.dev/2.4/v/value_error
+```
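+
+The answer-level check costs one extra LLM call per answer. As a cheap first pass, a lexical heuristic can catch the most obvious mismatches before that call. The sketch below is our own assumption, not part of `instructor`: `words` and `body_supported_by_quote` are hypothetical helpers, and requiring every word of a statement's body to appear in its quote is a crude rule that will also flag valid paraphrases.
+
+```python
+import re
+
+
+def words(text: str) -> set[str]:
+    # Lowercased word tokens; punctuation is dropped.
+    return set(re.findall(r"\w+", text.lower()))
+
+
+def body_supported_by_quote(body: str, quote: str) -> bool:
+    # Crude heuristic: every word of the statement body must occur in its quote.
+    return words(body) <= words(quote)
+
+
+answer = [
+    {"body": "Paris", "substring_quote": "Paris is the capital of France"},
+    {"body": "Texas", "substring_quote": "Paris is the capital of France"},
+]
+flagged = [s["body"] for s in answer if not body_supported_by_quote(s["body"], s["substring_quote"])]
+print(flagged)  # ['Texas']
+```
+
+Passing this check does not make an answer correct, so it complements the LLM validator rather than replacing it.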
+
+## Conclusion
+
+These examples show how Pydantic and OpenAI can be combined to verify citations. Because the LLM-based validators make an API call for every check, they may be too slow and costly for runtime use. They are more promising during data generation: by keeping only responses whose citations pass validation, we can build a dataset of accurate responses and fine-tune a model that excels at citation accuracy, similar to our last post on [finetuning a better summarizer](https://jxnl.github.io/instructor/blog/2023/11/05/chain-of-density/).
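+
+The filtering step described above can be sketched with the standard library alone. This is a minimal sketch under our own assumptions: candidate responses are plain dicts stored with their context, the output is JSON Lines, and `citations_exist` and `build_dataset` are hypothetical helpers. In practice `is_valid` could instead call `AnswerWithCitaton.model_validate(record, context=...)` and return `False` on `ValidationError`.
+
+```python
+import json
+import os
+import tempfile
+from typing import Callable
+
+
+def citations_exist(record: dict) -> bool:
+    # Example 1's exact-substring rule: every quote must appear in some chunk.
+    chunks = list(record["context"].values())
+    return all(
+        any(statement["substring_quote"] in chunk for chunk in chunks)
+        for statement in record["answer"]
+    )
+
+
+def build_dataset(candidates: list[dict], is_valid: Callable[[dict], bool], path: str) -> int:
+    # Write only the candidates that pass `is_valid` as JSON Lines; return the count kept.
+    kept = 0
+    with open(path, "w", encoding="utf-8") as f:
+        for record in candidates:
+            if is_valid(record):
+                f.write(json.dumps(record) + "\n")
+                kept += 1
+    return kept
+
+
+statement = {"body": "Paris", "substring_quote": "Paris is the capital of France"}
+candidates = [
+    {"question": "What is the capital of France?", "answer": [statement],
+     "context": {1: "Paris is the capital of France"}},
+    {"question": "What is the capital of France?", "answer": [statement],
+     "context": {1: "Paris is not the capital of France"}},
+]
+path = os.path.join(tempfile.mkdtemp(), "citations.jsonl")
+print(build_dataset(candidates, citations_exist, path))  # 1
+```
+
+Only the first candidate is kept, because its quote appears in its context. Swapping `citations_exist` for a fuzzy or LLM-backed check changes what counts as "accurate" without changing the filtering loop.
+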
+
+If you like the content, check out the library on [GitHub](https://github.com/jxnl/instructor) and give us a star!
diff --git a/examples/citations/run.py b/examples/citations/run.py
new file mode 100644
index 000000000..c8af6994e
--- /dev/null
+++ b/examples/citations/run.py
@@ -0,0 +1,225 @@
+from typing import List, Optional
+from openai import OpenAI
+from pydantic import (
+ BaseModel,
+ Field,
+ ValidationError,
+ ValidationInfo,
+ field_validator,
+ model_validator,
+)
+
+import instructor
+
+client = instructor.patch(OpenAI())
+
+"""
+Example 1) Simple Substring check that compares a citation to a text chunk
+"""
+
+
+class Statements(BaseModel):
+ body: str
+ substring_quote: str
+
+ @field_validator("substring_quote")
+ @classmethod
+ def substring_quote_exists(cls, v: str, info: ValidationInfo):
+        context = (info.context or {}).get("text_chunks", {})
+
+ # Check if the substring_quote is in the text_chunk
+ # if not, raise an error
+ for text_chunk in context.values():
+ if v in text_chunk:
+ return v
+ raise ValueError(
+ f"Could not find substring_quote `{v}` in contexts",
+ )
+
+
+class AnswerWithCitaton(BaseModel):
+ question: str
+ answer: List[Statements]
+
+
+try:
+ AnswerWithCitaton.model_validate(
+ {
+ "question": "What is the capital of France?",
+ "answer": [
+ {"body": "Paris", "substring_quote": "Paris is the capital of France"},
+ ],
+ },
+ context={
+ "text_chunks": {
+ 1: "Jason is a pirate",
+ 2: "Paris is not the capital of France",
+ 3: "Irrelevant data",
+ }
+ },
+ )
+except ValidationError as e:
+ print(e)
+"""
+answer.0.substring_quote
+ Value error, Could not find substring_quote `Paris is the capital of France` in contexts [type=value_error, input_value='Paris is the capital of France', input_type=str]
+ For further information visit https://errors.pydantic.dev/2.4/v/value_error
+"""
+
+
+"""
+Example 2) Using an LLM to verify if a citation exists in the context
+"""
+
+
+class Validation(BaseModel):
+ """
+    Verification response from the LLM.
+    If is_valid is False, the error message should be detailed but shorter
+    than 100 characters; reference the specific attributes being compared
+    and use `...` if a string is too long.
+ """
+
+ is_valid: bool
+ error_messages: Optional[str] = Field(None, description="Error messages if any")
+
+
+class Statements(BaseModel):
+ body: str
+ substring_quote: str
+
+ @model_validator(mode="after")
+ def substring_quote_exists(self, info: ValidationInfo):
+ context = info.context.get("text_chunks", None)
+
+ resp: Validation = client.chat.completions.create(
+ response_model=Validation,
+ messages=[
+ {
+ "role": "user",
+ "content": f"Does the following citation exist in the following context?\n\nCitation: {self.substring_quote}\n\nContext: {context}",
+ }
+ ],
+ model="gpt-3.5-turbo",
+ )
+
+ if resp.is_valid:
+ return self
+
+ raise ValueError(resp.error_messages)
+
+
+class AnswerWithCitaton(BaseModel):
+ question: str
+ answer: List[Statements]
+
+
+resp = AnswerWithCitaton.model_validate(
+ {
+ "question": "What is the capital of France?",
+ "answer": [
+ {"body": "Paris", "substring_quote": "Paris is the capital of France"},
+ ],
+ },
+ context={
+ "text_chunks": {
+ 1: "Jason is a pirate",
+ 2: "Paris is the capital of France",
+ 3: "Irrelevant data",
+ }
+ },
+)
+# output: notice that there are no errors
+print(resp.model_dump_json(indent=2))
+"""
+{
+  "question": "What is the capital of France?",
+  "answer": [
+    {
+      "body": "Paris",
+      "substring_quote": "Paris is the capital of France"
+    }
+  ]
+}
+"""
+
+# Now we change the text chunk to something else, and we get an error
+try:
+ AnswerWithCitaton.model_validate(
+ {
+ "question": "What is the capital of France?",
+ "answer": [
+ {"body": "Paris", "substring_quote": "Paris is the capital of France"},
+ ],
+ },
+ context={
+ "text_chunks": {
+ 1: "Jason is a pirate",
+ 2: "Paris is not the capital of France",
+ 3: "Irrelevant data",
+ }
+ },
+ )
+except ValidationError as e:
+ print(e)
+"""
+1 validation error for AnswerWithCitaton
+answer.0
+ Value error, Citation not found in context [type=value_error, input_value={'body': 'Paris', 'substr... the capital of France'}, input_type=dict]
+ For further information visit https://errors.pydantic.dev/2.4/v/value_error
+"""
+
+# Example 3) Using an LLM to verify if the citations and the answers are all aligned
+
+
+# we keep the same model as above for Statements, but we add a new model for the answer
+# that also verifies that the citations are aligned with the answers
+class AnswerWithCitaton(BaseModel):
+ question: str
+ answer: List[Statements]
+
+ @model_validator(mode="after")
+ def validate_answer(self, info: ValidationInfo):
+ context = info.context.get("text_chunks", None)
+
+ resp: Validation = client.chat.completions.create(
+ response_model=Validation,
+ messages=[
+ {
+ "role": "user",
+ "content": f"Does the following answers match the question and the context?\n\nQuestion: {self.question}\n\nAnswer: {self.answer}\n\nContext: {context}",
+ }
+ ],
+ model="gpt-3.5-turbo",
+ )
+
+ if resp.is_valid:
+ return self
+
+ raise ValueError(resp.error_messages)
+
+
+"""
+Using LLMs for citation verification is inefficient during runtime.
+However, we can utilize them to create a dataset consisting only of accurate responses
+where citations must be valid (as determined by LLM, fuzzy text search, etc.).
+
+This approach would require an initial investment during data generation to obtain
+a fine-tuned model with improved citation accuracy.
+"""
+try:
+ AnswerWithCitaton.model_validate(
+ {
+ "question": "What is the capital of France?",
+ "answer": [
+ {"body": "Texas", "substring_quote": "Paris is the capital of France"},
+ ],
+ },
+ context={
+ "text_chunks": {
+ 1: "Jason is a pirate",
+ 2: "Paris is the capital of France",
+ 3: "Irrelevant data",
+ }
+ },
+ )
+except ValidationError as e:
+ print(e)
+"""
+1 validation error for AnswerWithCitaton
+ Value error, The answer does not match the question and context [type=value_error, input_value={'question': 'What is the...he capital of France'}]}, input_type=dict]
+ For further information visit https://errors.pydantic.dev/2.4/v/value_error
+"""
diff --git a/instructor/__init__.py b/instructor/__init__.py
index b28ce7903..4bd40d7a1 100644
--- a/instructor/__init__.py
+++ b/instructor/__init__.py
@@ -1,7 +1,6 @@
from .distil import FinetuneFormat, Instructions
from .dsl import CitationMixin, Maybe, MultiTask, llm_validator
from .function_calls import OpenAISchema, openai_function, openai_schema
-from .dsl import MultiTask, Maybe, llm_validator, CitationMixin
from .patch import patch, apatch
__all__ = [
diff --git a/tutorials/1.introduction.ipynb b/tutorials/1.introduction.ipynb
index d178beadd..6053d3f32 100644
--- a/tutorials/1.introduction.ipynb
+++ b/tutorials/1.introduction.ipynb
@@ -24,7 +24,7 @@
},
{
"cell_type": "code",
- "execution_count": 7,
+ "execution_count": 22,
"metadata": {},
"outputs": [],
"source": [
@@ -43,7 +43,7 @@
},
{
"cell_type": "code",
- "execution_count": 8,
+ "execution_count": 23,
"metadata": {},
"outputs": [
{
@@ -62,7 +62,7 @@
"traceback": [
"\u001b[0;31m---------------------------------------------------------------------------\u001b[0m",
"\u001b[0;31mTypeError\u001b[0m Traceback (most recent call last)",
- "\u001b[1;32m/home/m/cookbook/instructor/tutorials/1.introduction.ipynb Cell 5\u001b[0m line \u001b[0;36m5\n\u001b[1;32m 3\u001b[0m age \u001b[39m=\u001b[39m obj\u001b[39m.\u001b[39mget(\u001b[39m\"\u001b[39m\u001b[39mage\u001b[39m\u001b[39m\"\u001b[39m)\n\u001b[1;32m 4\u001b[0m \u001b[39mprint\u001b[39m(\u001b[39mf\u001b[39m\u001b[39m\"\u001b[39m\u001b[39m{\u001b[39;00mname\u001b[39m}\u001b[39;00m\u001b[39m is \u001b[39m\u001b[39m{\u001b[39;00mage\u001b[39m}\u001b[39;00m\u001b[39m\"\u001b[39m)\n\u001b[0;32m----> 5\u001b[0m \u001b[39mprint\u001b[39m(\u001b[39mf\u001b[39m\u001b[39m\"\u001b[39m\u001b[39mNext year he will be \u001b[39m\u001b[39m{\u001b[39;00mage\u001b[39m+\u001b[39;49m\u001b[39m1\u001b[39;49m\u001b[39m}\u001b[39;00m\u001b[39m years old\u001b[39m\u001b[39m\"\u001b[39m)\n",
+ "\u001b[1;32m/Users/jasonliu/dev/instructor/tutorials/1.introduction.ipynb Cell 5\u001b[0m line \u001b[0;36m5\n\u001b[1;32m 3\u001b[0m age \u001b[39m=\u001b[39m obj\u001b[39m.\u001b[39mget(\u001b[39m\"\u001b[39m\u001b[39mage\u001b[39m\u001b[39m\"\u001b[39m)\n\u001b[1;32m 4\u001b[0m \u001b[39mprint\u001b[39m(\u001b[39mf\u001b[39m\u001b[39m\"\u001b[39m\u001b[39m{\u001b[39;00mname\u001b[39m}\u001b[39;00m\u001b[39m is \u001b[39m\u001b[39m{\u001b[39;00mage\u001b[39m}\u001b[39;00m\u001b[39m\"\u001b[39m)\n\u001b[0;32m----> 5\u001b[0m \u001b[39mprint\u001b[39m(\u001b[39mf\u001b[39m\u001b[39m\"\u001b[39m\u001b[39mNext year he will be \u001b[39m\u001b[39m{\u001b[39;00mage\u001b[39m+\u001b[39;49m\u001b[39m1\u001b[39;49m\u001b[39m}\u001b[39;00m\u001b[39m years old\u001b[39m\u001b[39m\"\u001b[39m)\n",
"\u001b[0;31mTypeError\u001b[0m: can only concatenate str (not \"int\") to str"
]
}
@@ -93,7 +93,7 @@
},
{
"cell_type": "code",
- "execution_count": 2,
+ "execution_count": 24,
"metadata": {},
"outputs": [
{
@@ -102,7 +102,7 @@
"Person(name='Sam', age=30)"
]
},
- "execution_count": 2,
+ "execution_count": 24,
"metadata": {},
"output_type": "execute_result"
}
@@ -121,7 +121,7 @@
},
{
"cell_type": "code",
- "execution_count": null,
+ "execution_count": 25,
"metadata": {},
"outputs": [
{
@@ -130,7 +130,7 @@
"Person(name='Sam', age=30)"
]
},
- "execution_count": 4,
+ "execution_count": 25,
"metadata": {},
"output_type": "execute_result"
}
@@ -143,7 +143,7 @@
},
{
"cell_type": "code",
- "execution_count": null,
+ "execution_count": 26,
"metadata": {},
"outputs": [
{
@@ -153,7 +153,7 @@
"traceback": [
"\u001b[0;31m---------------------------------------------------------------------------\u001b[0m",
"\u001b[0;31mAssertionError\u001b[0m Traceback (most recent call last)",
- "\u001b[1;32m/home/m/cookbook/instructor/tutorials/1.introduction.ipynb Cell 10\u001b[0m line \u001b[0;36m2\n\u001b[1;32m 1\u001b[0m \u001b[39massert\u001b[39;00m person\u001b[39m.\u001b[39mname \u001b[39m==\u001b[39m \u001b[39m\"\u001b[39m\u001b[39mSam\u001b[39m\u001b[39m\"\u001b[39m\n\u001b[0;32m----> 2\u001b[0m \u001b[39massert\u001b[39;00m person\u001b[39m.\u001b[39mage \u001b[39m==\u001b[39m \u001b[39m20\u001b[39m\n",
+ "\u001b[1;32m/Users/jasonliu/dev/instructor/tutorials/1.introduction.ipynb Cell 10\u001b[0m line \u001b[0;36m2\n\u001b[1;32m 1\u001b[0m \u001b[39massert\u001b[39;00m person\u001b[39m.\u001b[39mname \u001b[39m==\u001b[39m \u001b[39m\"\u001b[39m\u001b[39mSam\u001b[39m\u001b[39m\"\u001b[39m\n\u001b[0;32m----> 2\u001b[0m \u001b[39massert\u001b[39;00m person\u001b[39m.\u001b[39mage \u001b[39m==\u001b[39m \u001b[39m20\u001b[39m\n",
"\u001b[0;31mAssertionError\u001b[0m: "
]
}
@@ -165,7 +165,7 @@
},
{
"cell_type": "code",
- "execution_count": null,
+ "execution_count": 27,
"metadata": {},
"outputs": [
{
@@ -175,7 +175,7 @@
"traceback": [
"\u001b[0;31m---------------------------------------------------------------------------\u001b[0m",
"\u001b[0;31mValidationError\u001b[0m Traceback (most recent call last)",
- "\u001b[1;32m/Users/jasonliu/dev/instructor/tutorials/1.introduction.ipynb Cell 11\u001b[0m line \u001b[0;36m2\n\u001b[1;32m 1\u001b[0m \u001b[39m# Data is validated to get better error messages\u001b[39;00m\n\u001b[0;32m----> 2\u001b[0m person \u001b[39m=\u001b[39m Person\u001b[39m.\u001b[39;49mmodel_validate({\u001b[39m\"\u001b[39;49m\u001b[39mname\u001b[39;49m\u001b[39m\"\u001b[39;49m: \u001b[39m\"\u001b[39;49m\u001b[39mSam\u001b[39;49m\u001b[39m\"\u001b[39;49m, \u001b[39m\"\u001b[39;49m\u001b[39mage\u001b[39;49m\u001b[39m\"\u001b[39;49m: \u001b[39m\"\u001b[39;49m\u001b[39m30.2\u001b[39;49m\u001b[39m\"\u001b[39;49m})\n\u001b[1;32m 3\u001b[0m person\n",
+ "\u001b[1;32m/Users/jasonliu/dev/instructor/tutorials/1.introduction.ipynb Cell 11\u001b[0m line \u001b[0;36m2\n\u001b[1;32m 1\u001b[0m \u001b[39m# Data is validated to get better error messages\u001b[39;00m\n\u001b[0;32m----> 2\u001b[0m person \u001b[39m=\u001b[39m Person\u001b[39m.\u001b[39;49mmodel_validate({\u001b[39m\"\u001b[39;49m\u001b[39mname\u001b[39;49m\u001b[39m\"\u001b[39;49m: \u001b[39m\"\u001b[39;49m\u001b[39mSam\u001b[39;49m\u001b[39m\"\u001b[39;49m, \u001b[39m\"\u001b[39;49m\u001b[39mage\u001b[39;49m\u001b[39m\"\u001b[39;49m: \u001b[39m\"\u001b[39;49m\u001b[39m30.2\u001b[39;49m\u001b[39m\"\u001b[39;49m})\n\u001b[1;32m 3\u001b[0m person\n",
"File \u001b[0;32m~/dev/instructor/.venv/lib/python3.11/site-packages/pydantic/main.py:503\u001b[0m, in \u001b[0;36mBaseModel.model_validate\u001b[0;34m(cls, obj, strict, from_attributes, context)\u001b[0m\n\u001b[1;32m 501\u001b[0m \u001b[39m# `__tracebackhide__` tells pytest and some other tools to omit this function from tracebacks\u001b[39;00m\n\u001b[1;32m 502\u001b[0m __tracebackhide__ \u001b[39m=\u001b[39m \u001b[39mTrue\u001b[39;00m\n\u001b[0;32m--> 503\u001b[0m \u001b[39mreturn\u001b[39;00m \u001b[39mcls\u001b[39;49m\u001b[39m.\u001b[39;49m__pydantic_validator__\u001b[39m.\u001b[39;49mvalidate_python(\n\u001b[1;32m 504\u001b[0m obj, strict\u001b[39m=\u001b[39;49mstrict, from_attributes\u001b[39m=\u001b[39;49mfrom_attributes, context\u001b[39m=\u001b[39;49mcontext\n\u001b[1;32m 505\u001b[0m )\n",
"\u001b[0;31mValidationError\u001b[0m: 1 validation error for Person\nage\n Input should be a valid integer, unable to parse string as an integer [type=int_parsing, input_value='30.2', input_type=str]\n For further information visit https://errors.pydantic.dev/2.4/v/int_parsing"
]
@@ -191,32 +191,144 @@
"cell_type": "markdown",
"metadata": {},
"source": [
- "By introducing pydantic into any python codebase you can get a lot of benefits. You can get type checking, you can get validation, and you can get autocomplete. This is a huge win, because it means you can catch errors before they happen. This is even more useful when we rely on language models to generate data for us."
+    "By introducing Pydantic into any Python codebase, you get type checking, validation, and autocomplete. This is a huge win because it means you can catch errors before they happen, which is even more useful when we rely on language models to generate data for us.\n",
+ "\n",
+    "You can also add constraints and validators that run on the data. For example, a plain `int` field accepts a negative age, but `Field(..., gt=0)` requires the age to be greater than 0."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 28,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ "Person(name='Sam', age=-10)"
+ ]
+ },
+ "execution_count": 28,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "Person(name=\"Sam\", age=-10)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 29,
+ "metadata": {},
+ "outputs": [
+ {
+ "ename": "ValidationError",
+ "evalue": "1 validation error for Person\nage\n Input should be greater than 0 [type=greater_than, input_value=-10, input_type=int]\n For further information visit https://errors.pydantic.dev/2.4/v/greater_than",
+ "output_type": "error",
+ "traceback": [
+ "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m",
+ "\u001b[0;31mValidationError\u001b[0m Traceback (most recent call last)",
+ "\u001b[1;32m/Users/jasonliu/dev/instructor/tutorials/1.introduction.ipynb Cell 14\u001b[0m line \u001b[0;36m5\n\u001b[1;32m 2\u001b[0m name: \u001b[39mstr\u001b[39m\n\u001b[1;32m 3\u001b[0m age: \u001b[39mint\u001b[39m \u001b[39m=\u001b[39m Field(\u001b[39m.\u001b[39m\u001b[39m.\u001b[39m\u001b[39m.\u001b[39m, gt\u001b[39m=\u001b[39m\u001b[39m0\u001b[39m)\n\u001b[0;32m----> 5\u001b[0m Person(name\u001b[39m=\u001b[39;49m\u001b[39m\"\u001b[39;49m\u001b[39mSam\u001b[39;49m\u001b[39m\"\u001b[39;49m, age\u001b[39m=\u001b[39;49m\u001b[39m-\u001b[39;49m\u001b[39m10\u001b[39;49m)\n",
+ "File \u001b[0;32m~/dev/instructor/.venv/lib/python3.11/site-packages/pydantic/main.py:164\u001b[0m, in \u001b[0;36mBaseModel.__init__\u001b[0;34m(__pydantic_self__, **data)\u001b[0m\n\u001b[1;32m 162\u001b[0m \u001b[39m# `__tracebackhide__` tells pytest and some other tools to omit this function from tracebacks\u001b[39;00m\n\u001b[1;32m 163\u001b[0m __tracebackhide__ \u001b[39m=\u001b[39m \u001b[39mTrue\u001b[39;00m\n\u001b[0;32m--> 164\u001b[0m __pydantic_self__\u001b[39m.\u001b[39;49m__pydantic_validator__\u001b[39m.\u001b[39;49mvalidate_python(data, self_instance\u001b[39m=\u001b[39;49m__pydantic_self__)\n",
+ "\u001b[0;31mValidationError\u001b[0m: 1 validation error for Person\nage\n Input should be greater than 0 [type=greater_than, input_value=-10, input_type=int]\n For further information visit https://errors.pydantic.dev/2.4/v/greater_than"
+ ]
+ }
+ ],
+ "source": [
+ "class Person(BaseModel):\n",
+ " name: str\n",
+ " age: int = Field(..., gt=0)\n",
+ "\n",
+ "Person(name=\"Sam\", age=-10)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
- "## Asking for JSON from OpenAI"
+    "Lastly, you can define custom validation functions with `field_validator` that run on the data."
]
},
{
"cell_type": "code",
- "execution_count": 3,
+ "execution_count": 30,
+ "metadata": {},
+ "outputs": [
+ {
+ "ename": "ValidationError",
+ "evalue": "1 validation error for Person\nname\n Value error, must contain a space [type=value_error, input_value='Sam', input_type=str]\n For further information visit https://errors.pydantic.dev/2.4/v/value_error",
+ "output_type": "error",
+ "traceback": [
+ "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m",
+ "\u001b[0;31mValidationError\u001b[0m Traceback (most recent call last)",
+ "\u001b[1;32m/Users/jasonliu/dev/instructor/tutorials/1.introduction.ipynb Cell 16\u001b[0m line \u001b[0;36m1\n\u001b[1;32m 11\u001b[0m \u001b[39mraise\u001b[39;00m \u001b[39mValueError\u001b[39;00m(\u001b[39m\"\u001b[39m\u001b[39mmust contain a space\u001b[39m\u001b[39m\"\u001b[39m)\n\u001b[1;32m 12\u001b[0m \u001b[39mreturn\u001b[39;00m v\n\u001b[0;32m---> 14\u001b[0m Person(name\u001b[39m=\u001b[39;49m\u001b[39m\"\u001b[39;49m\u001b[39mSam\u001b[39;49m\u001b[39m\"\u001b[39;49m, age\u001b[39m=\u001b[39;49m\u001b[39m10\u001b[39;49m)\n",
+ "File \u001b[0;32m~/dev/instructor/.venv/lib/python3.11/site-packages/pydantic/main.py:164\u001b[0m, in \u001b[0;36mBaseModel.__init__\u001b[0;34m(__pydantic_self__, **data)\u001b[0m\n\u001b[1;32m 162\u001b[0m \u001b[39m# `__tracebackhide__` tells pytest and some other tools to omit this function from tracebacks\u001b[39;00m\n\u001b[1;32m 163\u001b[0m __tracebackhide__ \u001b[39m=\u001b[39m \u001b[39mTrue\u001b[39;00m\n\u001b[0;32m--> 164\u001b[0m __pydantic_self__\u001b[39m.\u001b[39;49m__pydantic_validator__\u001b[39m.\u001b[39;49mvalidate_python(data, self_instance\u001b[39m=\u001b[39;49m__pydantic_self__)\n",
+ "\u001b[0;31mValidationError\u001b[0m: 1 validation error for Person\nname\n Value error, must contain a space [type=value_error, input_value='Sam', input_type=str]\n For further information visit https://errors.pydantic.dev/2.4/v/value_error"
+ ]
+ }
+ ],
+ "source": [
+ "from pydantic import field_validator\n",
+ "\n",
+ "\n",
+ "class Person(BaseModel):\n",
+ " name: str\n",
+ " age: int = Field(..., gt=0)\n",
+ "\n",
+ " @field_validator(\"name\")\n",
+ " def name_must_contain_space(cls, v):\n",
+ " if \" \" not in v:\n",
+ " raise ValueError(\"must contain a space\")\n",
+ " return v\n",
+ " \n",
+ "Person(name=\"Sam\", age=10)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 31,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
- "Person(name='Jason', age=25)"
+ "Person(name='Sam Liu', age=10)"
]
},
- "execution_count": 3,
+ "execution_count": 31,
"metadata": {},
"output_type": "execute_result"
}
],
+ "source": [
+ "Person(name=\"Sam Liu\", age=10)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "## Asking for JSON from OpenAI"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 32,
+ "metadata": {},
+ "outputs": [
+ {
+ "ename": "ValidationError",
+ "evalue": "1 validation error for Person\nname\n Value error, must contain a space [type=value_error, input_value='Jason', input_type=str]\n For further information visit https://errors.pydantic.dev/2.4/v/value_error",
+ "output_type": "error",
+ "traceback": [
+ "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m",
+ "\u001b[0;31mValidationError\u001b[0m Traceback (most recent call last)",
+ "\u001b[1;32m/Users/jasonliu/dev/instructor/tutorials/1.introduction.ipynb Cell 19\u001b[0m line \u001b[0;36m1\n\u001b[1;32m 3\u001b[0m client \u001b[39m=\u001b[39m OpenAI()\n\u001b[1;32m 5\u001b[0m resp \u001b[39m=\u001b[39m client\u001b[39m.\u001b[39mchat\u001b[39m.\u001b[39mcompletions\u001b[39m.\u001b[39mcreate(\n\u001b[1;32m 6\u001b[0m model\u001b[39m=\u001b[39m\u001b[39m\"\u001b[39m\u001b[39mgpt-3.5-turbo\u001b[39m\u001b[39m\"\u001b[39m,\n\u001b[1;32m 7\u001b[0m messages\u001b[39m=\u001b[39m[\n\u001b[1;32m 8\u001b[0m {\u001b[39m\"\u001b[39m\u001b[39mrole\u001b[39m\u001b[39m\"\u001b[39m: \u001b[39m\"\u001b[39m\u001b[39muser\u001b[39m\u001b[39m\"\u001b[39m, \u001b[39m\"\u001b[39m\u001b[39mcontent\u001b[39m\u001b[39m\"\u001b[39m: \u001b[39m\"\u001b[39m\u001b[39mExtract `Jason is 25 years old` into json\u001b[39m\u001b[39m\"\u001b[39m},\n\u001b[1;32m 9\u001b[0m ]\n\u001b[1;32m 10\u001b[0m )\n\u001b[0;32m---> 12\u001b[0m Person\u001b[39m.\u001b[39;49mmodel_validate_json(resp\u001b[39m.\u001b[39;49mchoices[\u001b[39m0\u001b[39;49m]\u001b[39m.\u001b[39;49mmessage\u001b[39m.\u001b[39;49mcontent)\n",
+ "File \u001b[0;32m~/dev/instructor/.venv/lib/python3.11/site-packages/pydantic/main.py:530\u001b[0m, in \u001b[0;36mBaseModel.model_validate_json\u001b[0;34m(cls, json_data, strict, context)\u001b[0m\n\u001b[1;32m 528\u001b[0m \u001b[39m# `__tracebackhide__` tells pytest and some other tools to omit this function from tracebacks\u001b[39;00m\n\u001b[1;32m 529\u001b[0m __tracebackhide__ \u001b[39m=\u001b[39m \u001b[39mTrue\u001b[39;00m\n\u001b[0;32m--> 530\u001b[0m \u001b[39mreturn\u001b[39;00m \u001b[39mcls\u001b[39;49m\u001b[39m.\u001b[39;49m__pydantic_validator__\u001b[39m.\u001b[39;49mvalidate_json(json_data, strict\u001b[39m=\u001b[39;49mstrict, context\u001b[39m=\u001b[39;49mcontext)\n",
+ "\u001b[0;31mValidationError\u001b[0m: 1 validation error for Person\nname\n Value error, must contain a space [type=value_error, input_value='Jason', input_type=str]\n For further information visit https://errors.pydantic.dev/2.4/v/value_error"
+ ]
+ }
+ ],
"source": [
"from openai import OpenAI\n",
"\n",
@@ -236,18 +348,7 @@
"cell_type": "code",
"execution_count": null,
"metadata": {},
- "outputs": [
- {
- "data": {
- "text/plain": [
- "Person(name='Jason Liu', age=30)"
- ]
- },
- "execution_count": 8,
- "metadata": {},
- "output_type": "execute_result"
- }
- ],
+ "outputs": [],
"source": [
"resp = client.chat.completions.create(\n",
" model=\"gpt-3.5-turbo\",\n",
@@ -268,32 +369,9 @@
},
{
"cell_type": "code",
- "execution_count": 11,
+ "execution_count": null,
"metadata": {},
- "outputs": [
- {
- "name": "stdout",
- "output_type": "stream",
- "text": [
- "{\n",
- " \"name\": \"Jason Liu\",\n",
- " \"age\": 30,\n",
- " \"birthday\": \"2023-11-17\"\n",
- "}\n",
- "name='Jason Liu' age=30\n"
- ]
- },
- {
- "data": {
- "text/plain": [
- "PersonBirthday(name='Jason Liu', age=30, birthday=datetime.date(2023, 11, 17))"
- ]
- },
- "execution_count": 11,
- "metadata": {},
- "output_type": "execute_result"
- }
- ],
+ "outputs": [],
"source": [
"import datetime\n",
"\n",
@@ -329,19 +407,18 @@
},
{
"cell_type": "code",
- "execution_count": null,
+ "execution_count": 33,
"metadata": {},
"outputs": [
{
- "ename": "ValidationError",
- "evalue": "1 validation error for PersonBirthday\nbirthday\n Input should be a valid date or datetime, input is too short [type=date_from_datetime_parsing, input_value='yesterday', input_type=str]\n For further information visit https://errors.pydantic.dev/2.4/v/date_from_datetime_parsing",
+ "ename": "NameError",
+ "evalue": "name 'datetime' is not defined",
"output_type": "error",
"traceback": [
"\u001b[0;31m---------------------------------------------------------------------------\u001b[0m",
- "\u001b[0;31mValidationError\u001b[0m Traceback (most recent call last)",
- "\u001b[1;32m/Users/jasonliu/dev/instructor/tutorials/1.introduction.ipynb Cell 19\u001b[0m line \u001b[0;36m2\n\u001b[1;32m 1\u001b[0m schema \u001b[39m=\u001b[39m {\n\u001b[1;32m 2\u001b[0m \u001b[39m'\u001b[39m\u001b[39mproperties\u001b[39m\u001b[39m'\u001b[39m: \n\u001b[1;32m 3\u001b[0m {\n\u001b[0;32m (...)\u001b[0m\n\u001b[1;32m 9\u001b[0m \u001b[39m'\u001b[39m\u001b[39mtype\u001b[39m\u001b[39m'\u001b[39m: \u001b[39m'\u001b[39m\u001b[39mobject\u001b[39m\u001b[39m'\u001b[39m\n\u001b[1;32m 10\u001b[0m }\n\u001b[1;32m 12\u001b[0m resp \u001b[39m=\u001b[39m client\u001b[39m.\u001b[39mchat\u001b[39m.\u001b[39mcompletions\u001b[39m.\u001b[39mcreate(\n\u001b[1;32m 13\u001b[0m model\u001b[39m=\u001b[39m\u001b[39m\"\u001b[39m\u001b[39mgpt-3.5-turbo\u001b[39m\u001b[39m\"\u001b[39m,\n\u001b[1;32m 14\u001b[0m messages\u001b[39m=\u001b[39m[\n\u001b[0;32m (...)\u001b[0m\n\u001b[1;32m 18\u001b[0m function_call\u001b[39m=\u001b[39m\u001b[39m\"\u001b[39m\u001b[39mauto\u001b[39m\u001b[39m\"\u001b[39m\n\u001b[1;32m 19\u001b[0m )\n\u001b[0;32m---> 22\u001b[0m PersonBirthday\u001b[39m.\u001b[39;49mmodel_validate_json(resp\u001b[39m.\u001b[39;49mchoices[\u001b[39m0\u001b[39;49m]\u001b[39m.\u001b[39;49mmessage\u001b[39m.\u001b[39;49mfunction_call\u001b[39m.\u001b[39;49marguments)\n",
- "File \u001b[0;32m~/dev/instructor/.venv/lib/python3.11/site-packages/pydantic/main.py:530\u001b[0m, in \u001b[0;36mBaseModel.model_validate_json\u001b[0;34m(cls, json_data, strict, context)\u001b[0m\n\u001b[1;32m 528\u001b[0m \u001b[39m# `__tracebackhide__` tells pytest and some other tools to omit this function from tracebacks\u001b[39;00m\n\u001b[1;32m 529\u001b[0m __tracebackhide__ \u001b[39m=\u001b[39m \u001b[39mTrue\u001b[39;00m\n\u001b[0;32m--> 530\u001b[0m \u001b[39mreturn\u001b[39;00m \u001b[39mcls\u001b[39;49m\u001b[39m.\u001b[39;49m__pydantic_validator__\u001b[39m.\u001b[39;49mvalidate_json(json_data, strict\u001b[39m=\u001b[39;49mstrict, context\u001b[39m=\u001b[39;49mcontext)\n",
- "\u001b[0;31mValidationError\u001b[0m: 1 validation error for PersonBirthday\nbirthday\n Input should be a valid date or datetime, input is too short [type=date_from_datetime_parsing, input_value='yesterday', input_type=str]\n For further information visit https://errors.pydantic.dev/2.4/v/date_from_datetime_parsing"
+ "\u001b[0;31mNameError\u001b[0m Traceback (most recent call last)",
+ "\u001b[1;32m/Users/jasonliu/dev/instructor/tutorials/1.introduction.ipynb Cell 24\u001b[0m line \u001b[0;36m1\n\u001b[1;32m 1\u001b[0m schema \u001b[39m=\u001b[39m {\n\u001b[1;32m 2\u001b[0m \u001b[39m'\u001b[39m\u001b[39mproperties\u001b[39m\u001b[39m'\u001b[39m: \n\u001b[1;32m 3\u001b[0m {\n\u001b[0;32m (...)\u001b[0m\n\u001b[1;32m 9\u001b[0m \u001b[39m'\u001b[39m\u001b[39mtype\u001b[39m\u001b[39m'\u001b[39m: \u001b[39m'\u001b[39m\u001b[39mobject\u001b[39m\u001b[39m'\u001b[39m\n\u001b[1;32m 10\u001b[0m }\n\u001b[1;32m 12\u001b[0m resp \u001b[39m=\u001b[39m client\u001b[39m.\u001b[39mchat\u001b[39m.\u001b[39mcompletions\u001b[39m.\u001b[39mcreate(\n\u001b[1;32m 13\u001b[0m model\u001b[39m=\u001b[39m\u001b[39m\"\u001b[39m\u001b[39mgpt-3.5-turbo\u001b[39m\u001b[39m\"\u001b[39m,\n\u001b[1;32m 14\u001b[0m messages\u001b[39m=\u001b[39m[\n\u001b[0;32m---> 15\u001b[0m {\u001b[39m\"\u001b[39m\u001b[39mrole\u001b[39m\u001b[39m\"\u001b[39m: \u001b[39m\"\u001b[39m\u001b[39muser\u001b[39m\u001b[39m\"\u001b[39m, \u001b[39m\"\u001b[39m\u001b[39mcontent\u001b[39m\u001b[39m\"\u001b[39m: \u001b[39mf\u001b[39m\u001b[39m\"\u001b[39m\u001b[39mExtract `Jason Liu is thirty years old his birthday is yesturday` into json today is \u001b[39m\u001b[39m{\u001b[39;00mdatetime\u001b[39m.\u001b[39mdate\u001b[39m.\u001b[39mtoday()\u001b[39m}\u001b[39;00m\u001b[39m\"\u001b[39m},\n\u001b[1;32m 16\u001b[0m ],\n\u001b[1;32m 17\u001b[0m functions\u001b[39m=\u001b[39m[{\u001b[39m\"\u001b[39m\u001b[39mname\u001b[39m\u001b[39m\"\u001b[39m: \u001b[39m\"\u001b[39m\u001b[39mPerson\u001b[39m\u001b[39m\"\u001b[39m, \u001b[39m\"\u001b[39m\u001b[39mparameters\u001b[39m\u001b[39m\"\u001b[39m: schema}],\n\u001b[1;32m 18\u001b[0m function_call\u001b[39m=\u001b[39m\u001b[39m\"\u001b[39m\u001b[39mauto\u001b[39m\u001b[39m\"\u001b[39m\n\u001b[1;32m 19\u001b[0m )\n\u001b[1;32m 22\u001b[0m PersonBirthday\u001b[39m.\u001b[39mmodel_validate_json(resp\u001b[39m.\u001b[39mchoices[\u001b[39m0\u001b[39m]\u001b[39m.\u001b[39mmessage\u001b[39m.\u001b[39mfunction_call\u001b[39m.\u001b[39marguments)\n",
+ "\u001b[0;31mNameError\u001b[0m: name 'datetime' is not defined"
]
}
],
@@ -379,23 +456,19 @@
},
{
"cell_type": "code",
- "execution_count": 12,
+ "execution_count": 34,
"metadata": {},
"outputs": [
{
- "data": {
- "text/plain": [
- "{'properties': {'name': {'title': 'Name', 'type': 'string'},\n",
- " 'age': {'title': 'Age', 'type': 'integer'},\n",
- " 'birthday': {'format': 'date', 'title': 'Birthday', 'type': 'string'}},\n",
- " 'required': ['name', 'age', 'birthday'],\n",
- " 'title': 'PersonBirthday',\n",
- " 'type': 'object'}"
- ]
- },
- "execution_count": 12,
- "metadata": {},
- "output_type": "execute_result"
+ "ename": "NameError",
+ "evalue": "name 'PersonBirthday' is not defined",
+ "output_type": "error",
+ "traceback": [
+ "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m",
+ "\u001b[0;31mNameError\u001b[0m Traceback (most recent call last)",
+ "\u001b[1;32m/Users/jasonliu/dev/instructor/tutorials/1.introduction.ipynb Cell 26\u001b[0m line \u001b[0;36m1\n\u001b[0;32m----> 1\u001b[0m PersonBirthday\u001b[39m.\u001b[39mmodel_json_schema()\n",
+ "\u001b[0;31mNameError\u001b[0m: name 'PersonBirthday' is not defined"
+ ]
}
],
"source": [
@@ -411,7 +484,7 @@
},
{
"cell_type": "code",
- "execution_count": 13,
+ "execution_count": 35,
"metadata": {},
"outputs": [
{
@@ -427,14 +500,14 @@
" 'type': 'object'}},\n",
" 'description': 'A Person with an address',\n",
" 'properties': {'name': {'title': 'Name', 'type': 'string'},\n",
- " 'age': {'title': 'Age', 'type': 'integer'},\n",
+ " 'age': {'exclusiveMinimum': 0, 'title': 'Age', 'type': 'integer'},\n",
" 'address': {'$ref': '#/$defs/Address'}},\n",
" 'required': ['name', 'age', 'address'],\n",
" 'title': 'PersonAddress',\n",
" 'type': 'object'}"
]
},
- "execution_count": 13,
+ "execution_count": 35,
"metadata": {},
"output_type": "execute_result"
}
@@ -464,7 +537,7 @@
},
{
"cell_type": "code",
- "execution_count": 14,
+ "execution_count": 37,
"metadata": {},
"outputs": [
{
@@ -473,13 +546,14 @@
"PersonAddress(name='Jason Liu', age=30, address=Address(address='123 Main St', city='San Francisco', state='CA'))"
]
},
- "execution_count": 14,
+ "execution_count": 37,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"import instructor\n",
+ "import datetime\n",
"\n",
"client = instructor.patch(client)\n",
"\n",
@@ -516,7 +590,14 @@
"\n",
"More importantly, we've also added straight forward validation and reasking to the mix.\n",
"\n",
- "The goal of instructor is to show you how to think about structured prompting and provide examples and documentation that you can take with you to any framework."
+ "The goal of instructor is to show you how to think about structured prompting and provide examples and documentation that you can take with you to any framework.\n",
+ "\n",
+ "Other libraries that provide similar functionality include:\n",
+ "\n",
+ "- [Marvin](https://www.askmarvin.ai/)\n",
+ "- [Langchain](https://python.langchain.com/docs/modules/model_io/output_parsers/pydantic)\n",
+ "- [LlamaIndex](https://gpt-index.readthedocs.io/en/latest/examples/output_parsing/openai_pydantic_program.html)\n",
+ "\n",
+ "These libraries differ mainly in their approach. With instructor, the goal is to be as lightweight as possible: get you as close as possible to the OpenAI API, and then get out of your way."
]
}
],
@@ -536,7 +617,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
- "version": "3.10.12"
+ "version": "3.11.6"
}
},
"nbformat": 4,
diff --git a/tutorials/2.tips.ipynb b/tutorials/2.tips.ipynb
index 7abb79ead..7fadceb9e 100644
--- a/tutorials/2.tips.ipynb
+++ b/tutorials/2.tips.ipynb
@@ -32,7 +32,7 @@
},
{
"cell_type": "code",
- "execution_count": 9,
+ "execution_count": 5,
"id": "fdf5e1d9-31ad-4e8a-a55e-e2e70fff598d",
"metadata": {},
"outputs": [
@@ -42,15 +42,16 @@
"{'age': 17, 'name': 'Harry Potter', 'house': }"
]
},
- "execution_count": 9,
+ "execution_count": 5,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"from enum import Enum\n",
- "from pydantic import BaseModel,Field\n",
- "from typing import List,Literal\n",
+ "from pydantic import BaseModel, Field\n",
+ "from typing_extensions import Literal\n",
+ "\n",
"import instructor\n",
"from openai import OpenAI\n",
"\n",
@@ -83,7 +84,7 @@
},
{
"cell_type": "code",
- "execution_count": 10,
+ "execution_count": 6,
"id": "03db160c-81e9-4373-bfec-7a107224b6dd",
"metadata": {},
"outputs": [
@@ -93,7 +94,7 @@
"{'age': 17, 'name': 'Harry Potter', 'house': 'Gryffindor'}"
]
},
- "execution_count": 10,
+ "execution_count": 6,
"metadata": {},
"output_type": "execute_result"
}
@@ -129,7 +130,7 @@
},
{
"cell_type": "code",
- "execution_count": 4,
+ "execution_count": 7,
"id": "0e7938b8-4666-4df4-bd80-f53e8baf7550",
"metadata": {},
"outputs": [
@@ -139,13 +140,13 @@
"{'age': 38,\n",
" 'name': 'Severus Snape',\n",
" 'house': 'Slytherin',\n",
- " 'properties': [{'key': 'role', 'value': 'Professor of Potions'},\n",
- " {'key': 'loyalty', 'value': 'Dumbledore'},\n",
+ " 'properties': [{'key': 'position', 'value': 'Professor of Potions'},\n",
+ " {'key': 'loyalty', 'value': 'Dumbledore, Hogwarts'},\n",
" {'key': 'patronus', 'value': 'Doe'},\n",
- " {'key': 'played_by', 'value': 'Alan Rickman'}]}"
+ " {'key': 'skill', 'value': 'Occlumency'}]}"
]
},
- "execution_count": 4,
+ "execution_count": 7,
"metadata": {},
"output_type": "execute_result"
}
@@ -190,7 +191,7 @@
},
{
"cell_type": "code",
- "execution_count": 5,
+ "execution_count": 8,
"id": "69a58d01-ab6f-41b6-bc0c-b0e55fdb6fe4",
"metadata": {},
"outputs": [
@@ -202,18 +203,18 @@
" 'house': 'Slytherin',\n",
" 'properties': [{'index': '1',\n",
" 'key': 'Occupation',\n",
- " 'value': 'Potions Master, Defense Against the Dark Arts Professor, Headmaster'},\n",
+ " 'value': 'Professor of Potions and later Defence Against the Dark Arts'},\n",
" {'index': '2',\n",
- " 'key': 'Loyalty',\n",
- " 'value': 'Hogwarts, Order of the Phoenix, Albus Dumbledore, Lily Potter'},\n",
- " {'index': '3',\n",
+ " 'key': 'Allegiance',\n",
+ " 'value': 'Order of the Phoenix, Hogwarts'},\n",
+ " {'index': '3', 'key': 'Patronus', 'value': 'Doe'},\n",
+ " {'index': '4',\n",
" 'key': 'Skills',\n",
- " 'value': 'Potions, Occlumency, Legilimency, Spell creation'},\n",
- " {'index': '4', 'key': 'Patronus', 'value': 'Doe'},\n",
- " {'index': '5', 'key': 'Actor', 'value': 'Alan Rickman'}]}"
+ " 'value': 'Potions master, Occlumens, Legilimens'},\n",
+ " {'index': '5', 'key': 'Portrayed by', 'value': 'Alan Rickman'}]}"
]
},
- "execution_count": 5,
+ "execution_count": 8,
"metadata": {},
"output_type": "execute_result"
}
@@ -255,7 +256,7 @@
},
{
"cell_type": "code",
- "execution_count": 6,
+ "execution_count": 9,
"id": "1f2a2b14-a956-4f96-90c9-e11ca04ab7d1",
"metadata": {},
"outputs": [
@@ -266,33 +267,33 @@
" 'name': 'Severus Snape',\n",
" 'house': 'Slytherin',\n",
" 'properties': [{'index': '1',\n",
- " 'key': 'Occupation',\n",
- " 'value': 'Potions Master, Defense Against the Dark Arts teacher, Headmaster'},\n",
- " {'index': '2',\n",
- " 'key': 'Loyalty',\n",
- " 'value': 'Hogwarts School, Order of the Phoenix, Albus Dumbledore'},\n",
- " {'index': '3',\n",
- " 'key': 'Skills',\n",
- " 'value': 'Potions, Occlumency, Legilimency'},\n",
- " {'index': '4', 'key': 'Patronus', 'value': 'Doe'},\n",
+ " 'key': 'Role',\n",
+ " 'value': 'Professor of Potions, later Defence Against the Dark Arts, and head of Slytherin House'},\n",
+ " {'index': '2', 'key': 'Patronus', 'value': 'Doe'},\n",
+ " {'index': '3', 'key': 'Loyalty', 'value': 'Dumbledore, Harry, Hogwarts'},\n",
+ " {'index': '4',\n",
+ " 'key': 'Special Skill',\n",
+ " 'value': 'Occlumens, Potions Master'},\n",
" {'index': '5', 'key': 'Played by', 'value': 'Alan Rickman'}]},\n",
" {'age': 115,\n",
" 'name': 'Albus Dumbledore',\n",
" 'house': 'Gryffindor',\n",
" 'properties': [{'index': '1',\n",
- " 'key': 'Occupation',\n",
- " 'value': 'Headmaster, Founder of the Order of the Phoenix'},\n",
- " {'index': '2',\n",
+ " 'key': 'Role',\n",
+ " 'value': 'Headmaster of Hogwarts'},\n",
+ " {'index': '2', 'key': 'Patronus', 'value': 'Phoenix'},\n",
+ " {'index': '3',\n",
" 'key': 'Loyalty',\n",
- " 'value': 'Hogwarts School, Order of the Phoenix'},\n",
- " {'index': '3', 'key': 'Skills', 'value': 'Transfiguration, Alchemy'},\n",
- " {'index': '4', 'key': 'Patronus', 'value': 'Phoenix'},\n",
+ " 'value': 'Order of the Phoenix, Hogwarts'},\n",
+ " {'index': '4',\n",
+ " 'key': 'Special Skill',\n",
+ " 'value': 'Considered to be the most powerful wizard of his time'},\n",
" {'index': '5',\n",
" 'key': 'Played by',\n",
- " 'value': 'Richard Harris (films 1-2), Michael Gambon (films 3-8)'}]}]}"
+ " 'value': 'Richard Harris (films 1-2), Michael Gambon (films 3-6)'}]}]}"
]
},
- "execution_count": 6,
+ "execution_count": 9,
"metadata": {},
"output_type": "execute_result"
}
@@ -326,25 +327,21 @@
},
{
"cell_type": "code",
- "execution_count": 7,
+ "execution_count": 12,
"id": "6de8768e-b36a-4a51-9cf9-940d178552f6",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
- "{'users': [{'id': 1, 'name': 'Harry Potter', 'friends': [2, 3, 4, 5, 6, 7]},\n",
- " {'id': 2, 'name': 'Ron Weasley', 'friends': [1, 3, 5, 6, 7]},\n",
- " {'id': 3, 'name': 'Hermione Granger', 'friends': [1, 2, 5, 6, 7]},\n",
- " {'id': 4, 'name': 'Draco Malfoy', 'friends': [1, 7]},\n",
- " {'id': 5, 'name': 'Neville Longbottom', 'friends': [1, 2, 3]},\n",
- " {'id': 6, 'name': 'Luna Lovegood', 'friends': [1, 2, 3]},\n",
- " {'id': 7, 'name': 'Ginny Weasley', 'friends': [1, 2, 3]},\n",
- " {'id': 8, 'name': 'Fred Weasley', 'friends': [2, 7]},\n",
- " {'id': 9, 'name': 'George Weasley', 'friends': [2, 7]}]}"
+ "{'users': [{'id': 1, 'name': 'Harry Potter', 'friends': [2, 3, 4, 5]},\n",
+ " {'id': 2, 'name': 'Hermione Granger', 'friends': [1, 3, 4, 5]},\n",
+ " {'id': 3, 'name': 'Ron Weasley', 'friends': [1, 2, 4, 5]},\n",
+ " {'id': 4, 'name': 'Ginny Weasley', 'friends': [1, 2, 3, 5]},\n",
+ " {'id': 5, 'name': 'Neville Longbottom', 'friends': [1, 2, 3, 4]}]}"
]
},
- "execution_count": 7,
+ "execution_count": 12,
"metadata": {},
"output_type": "execute_result"
}
@@ -363,7 +360,7 @@
" messages=[\n",
" {\n",
" \"role\": \"user\", \n",
- " \"content\": \"The kids from from Harry Potter\"\n",
+ " \"content\": \"The 5 kids from Harry Potter\"\n",
" }\n",
" ],\n",
" response_model=Characters\n",
@@ -373,7 +370,7 @@
},
{
"cell_type": "code",
- "execution_count": 8,
+ "execution_count": 13,
"id": "b31e10d7-ebd2-49b4-b2c4-20dd67ca135d",
"metadata": {},
"outputs": [
@@ -386,153 +383,105 @@
"\n",
"\n",
- "