<a href="https://colab.research.google.com/github/iIPM2023/superagent/blob/main/Instrctor_Tutorial.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

Support this genius project: https://github.com/jxnl/instructor

In [None]:
!pip install langchain pydantic[email] git+https://github.com/VRSEN/agency-swarm.git instructor youtube_search

Collecting git+https://github.com/VRSEN/agency-swarm.git
  Cloning https://github.com/VRSEN/agency-swarm.git to /tmp/pip-req-build-4mydr6e3
  Running command git clone --filter=blob:none --quiet https://github.com/VRSEN/agency-swarm.git /tmp/pip-req-build-4mydr6e3
  Resolved https://github.com/VRSEN/agency-swarm.git to commit 9e19c7dc033d1430ffa6aef2f1216e5ffdf1d728
  Preparing metadata (setup.py) ... done


In [None]:
import openai
from openai import OpenAI
import instructor
from getpass import getpass
openai.api_key = getpass("Paste your openai api key: ")

Paste your openai api key: ··········


# Patching the client
This exposes new functionality with the response_model parameter.

In [None]:
client = instructor.patch(OpenAI(api_key = openai.api_key))

# Basic Data Extraction
Use the `response_model` parameter to get structured outputs from your OpenAI completions!

In [None]:
from pydantic import BaseModel, Field

class UserDetail(BaseModel):
    """Extract user data"""
    name: str
    age: int

In [None]:
user: UserDetail = client.chat.completions.create(
    model="gpt-3.5-turbo",
    response_model=UserDetail,
    messages=[
        {"role": "user", "content": "Hi, my name is Arsenii, and I am 24 yo."},
    ]
)

In [None]:
user.name

'Arsenii'
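
To make the mechanism less opaque, here is a rough offline sketch of the two Pydantic v2 features that `response_model` appears to build on: generating a JSON Schema from the model, and validating a JSON string back into it. The `raw_json` string is a hard-coded stand-in for what the LLM might return (an assumption for illustration); no API call is made.

```python
from pydantic import BaseModel


class UserDetail(BaseModel):
    """Extract user data"""
    name: str
    age: int


# The JSON Schema describes the fields the LLM is asked to fill in.
schema = UserDetail.model_json_schema()
print(schema["required"])

# Hypothetical raw model output, hard-coded here instead of calling the API.
raw_json = '{"name": "Arsenii", "age": 24}'

# Validation parses the JSON and enforces the declared types.
user = UserDetail.model_validate_json(raw_json)
print(user.name, user.age)
```

Note that the class docstring ends up in the schema's `description`, which is why it is worth writing it as an instruction.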

# Fields
The `pydantic.Field` function customizes model fields and adds metadata to them. This section closely follows the Pydantic [documentation](https://docs.pydantic.dev/latest/concepts/fields/), trimmed to the parts that are relevant to prompting; check it out to learn more.

A value is not required when `default` or `default_factory` is set.

In [None]:
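from pydantic import BaseModel, Field
from uuid import uuid4
from typing import Optional

class User(BaseModel):
    # default_factory generates a fresh value whenever `id` is omitted
    id: str = Field(default_factory=lambda: uuid4().hex)
    # No default: in Pydantic v2 this field is still required, though it may be None
    id2: Optional[str]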

### Customizing JSON Schema

Some `Field` parameters exist exclusively to customise the generated JSON Schema:

- `title`: The title of the field.
- `description`: The description of the field.
- `examples`: The examples of the field.
- `json_schema_extra`: Extra JSON Schema properties to be added to the field.

These all work as great opportunities to add more information to the JSON Schema as part
of your prompt engineering.

In [None]:
from pydantic import BaseModel, EmailStr, Field, SecretStr

from instructor import OpenAISchema


class User(BaseModel):
    age: int = Field(description='Age of the user')
    email: EmailStr = Field(examples=['marcelo@mail.com'])
    name: str = Field(title='Username')
    password: SecretStr = Field(
        json_schema_extra={
            'title': 'Password',
            'description': 'Password of the user',
            'examples': ['123456'],
        }
    )
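
As a quick way to see what the LLM will actually receive, the sketch below prints the generated schema for a small model. `Profile` is a hypothetical model used only for illustration, and Pydantic v2 is assumed.

```python
from pydantic import BaseModel, Field


class Profile(BaseModel):  # hypothetical model, for illustration only
    age: int = Field(description="Age of the user")
    name: str = Field(title="Username", examples=["Marcelo"])


# Each Field argument shows up as a key on the matching schema property.
properties = Profile.model_json_schema()["properties"]
print(properties["age"])
print(properties["name"])
```

Inspecting the schema like this is a cheap sanity check before spending tokens on a real completion.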

# Validation
Instead of framing "self-critique" or "self-reflection" in AI as new concepts, we can view them as validation errors with clear error messages that the system can use to self-correct.

In [None]:
from pydantic import field_validator

class RefundDetails(BaseModel):
    """Use this function to issue a refund to the customer"""
    customer_name: str
    amount: int

    @field_validator("amount")
    @classmethod
    def validate_amount(cls, v):
        if v > 100:
            raise ValueError("Amount must not exceed 100")
        return v

In [None]:
refund: RefundDetails = client.chat.completions.create(
    model="gpt-3.5-turbo",
    response_model=RefundDetails,
    # max_retries=2,
    messages=[
        {"role": "user", "content": "Hi, my name is Arsenii, and I need a 1000$ refund"},
    ]
)

In [None]:
refund.amount

100

In [None]:
refund.customer_name

'Arsenii'
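
To see the kind of error message the model could use to correct itself, the validator can be triggered locally without any API call. This is a minimal sketch assuming Pydantic v2; as far as I understand, when `max_retries` is greater than zero Instructor sends a message like this back to the model and asks again.

```python
from pydantic import BaseModel, ValidationError, field_validator


class RefundDetails(BaseModel):
    customer_name: str
    amount: int

    @field_validator("amount")
    @classmethod
    def validate_amount(cls, v):
        if v > 100:
            raise ValueError("Amount must not exceed 100")
        return v


# Simulate the model proposing an amount that breaks the rule.
try:
    RefundDetails(customer_name="Arsenii", amount=1000)
    error_text = None
except ValidationError as exc:
    error_text = str(exc)

print(error_text)
```

The clearer the message raised inside the validator, the more useful the feedback is on a retry.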

### LLM-Based Validation Example

[Docs](https://jxnl.github.io/instructor/blog/2023/10/23/good-llm-validation-is-just-good-validation/?h=llm+val#creating-your-own-field-level-llm_validator)

LLM-based validation can also be plugged into the same Pydantic model. Here, if the answer attribute contains content that violates the rule "don't say objectionable things," Pydantic will raise a validation error.

In [None]:
question = "What is the meaning of life?"
context = "The meaning of life, according to the context, is to live a life of sin and debauchery."

In [None]:
from typing_extensions import Annotated
from pydantic import BeforeValidator, AfterValidator
from instructor import llm_validator

class QuestionAnswerNoEvil(BaseModel):
    question: str
    answer: Annotated[
        str,
        BeforeValidator(
            llm_validator("the answer must not say objectionable things", openai_client=client)
        ),
    ]

try:
    qa: QuestionAnswerNoEvil = client.chat.completions.create(
        model="gpt-3.5-turbo",
        response_model=QuestionAnswerNoEvil,
        max_retries=0, # change to 2 to let the model retry after a validation error
        messages=[
            {
                "role": "system",
                "content": "You are a system that answers questions based on the context. answer exactly what the question asks using the context.",
            },
            {
                "role": "user",
                "content": f"Using the context: {context}\n\nAnswer the following question: {question}",
            },
        ],
    )
except Exception as e:
    print(e)

In [None]:
qa.answer

'The meaning of life is to seek happiness, fulfillment, and personal growth.'

# Converting to OpenAI Schema
To convert your model into an OpenAI function, you simply need to extend `OpenAISchema`. All the same principles apply.

In [None]:
from instructor import OpenAISchema

class User(OpenAISchema):
    name: str
    age: int

    def run(self):
      print(f"Executing function. User name is {self.name}. Age is {self.age}")

In [None]:
completion = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[{
        "content": "Jason is 20 years old",
        "role": "user"
    }],
    tools=[{
        "type": "function",
        "function": User.openai_schema, # add your function
    }],
    tool_choice={"type": "function", "function": {"name": User.openai_schema['name']}}
)

In [None]:
import json

def execute_tool(tool_call, funcs):
    # Look up the tool class by name; default to None instead of raising StopIteration
    func = next((f for f in funcs if f.__name__ == tool_call.function.name), None)

    if not func:
        return f"Error: Function {tool_call.function.name} not found. Available functions: {[f.__name__ for f in funcs]}"

    try:
        # init tool; the arguments arrive as a JSON string, so parse them with json.loads rather than eval
        tool = func(**json.loads(tool_call.function.arguments))
        # get outputs from the tool
        output = tool.run()

        return output
    except Exception as e:
        return "Error: " + str(e)

In [None]:
for tool_call in completion.choices[0].message.tool_calls:
  execute_tool(tool_call, [User])

Executing function. User name is Jason. Age is 20
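
The `arguments` attribute of a tool call is a JSON string, not Python source. The sketch below uses a hypothetical arguments string (the `is_student` field is made up for illustration) to show why `json.loads` is the safer way to parse it: JSON literals such as `true` are not valid Python names, and `eval` would also execute whatever code the model happened to emit.

```python
import json

# Hypothetical arguments string, shaped like tool_call.function.arguments.
arguments = '{"name": "Jason", "age": 20, "is_student": true}'

# json.loads maps JSON true/false/null to Python True/False/None.
parsed = json.loads(arguments)
print(parsed)

# eval() fails on the same input because `true` is not a Python name.
try:
    eval(arguments)
    eval_failed = False
except NameError:
    eval_failed = True
print(eval_failed)
```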


# Tips
The overarching theme of using Instructor and Pydantic for function calling is to make the models as self-descriptive, modular, and flexible as possible, while maintaining data integrity and ease of use.


### Always include "other" options

In [None]:
from typing import Literal
from enum import Enum, auto
class Role(Enum):
    PRINCIPAL = auto()
    TEACHER = auto()
    STUDENT = auto()
    OTHER = auto()

class UserDetail(BaseModel):
    role: Literal["PRINCIPAL", "TEACHER", "STUDENT", "OTHER"]
    role2: Role = Field(..., description="Extract any other properties that might be relevant.")

### Use special fields

Adding a dedicated "chain of thought" field improves data quality, and it lets you apply the reasoning to individual components instead of using one global CoT. For complex attributes, it helps to reiterate the instructions in the field's description.

In [None]:
class Role(OpenAISchema):
    """
    Extract the role based on the following rules ...
    """
    instructions: str = Field(..., description="Restate the instructions and rules to correctly determine the title.")
    title: str

class Role(OpenAISchema):
    chain_of_thought: str = Field(
        ...,
        description="Think step by step to determine the correct title",
        exclude=True,
    )
    title: str
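
A small offline sketch of what `exclude=True` does, assuming Pydantic v2. `RoleSketch` is a hypothetical stand-in that extends plain `BaseModel` so the example runs without Instructor: the reasoning field is still part of the schema the model has to fill in, but it is dropped when the result is serialized.

```python
from pydantic import BaseModel, Field


class RoleSketch(BaseModel):  # hypothetical name; plain BaseModel to stay offline
    chain_of_thought: str = Field(
        ...,
        description="Think step by step to determine the correct title",
        exclude=True,
    )
    title: str


# Hand-written values standing in for an LLM response.
role = RoleSketch(
    chain_of_thought="Mentions grading homework, so probably a teacher.",
    title="TEACHER",
)

print(role.chain_of_thought)  # still available on the instance
print(role.model_dump())      # the excluded field is omitted from the dump
```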

### Define relationships

In [None]:
from typing import List

class UserDetail(OpenAISchema):
    id: int = Field(..., description="Unique identifier for each user.")
    age: int
    name: str
    friends: List[int] = Field(..., description="Correct and complete list of friend IDs, representing relationships between users.")

class UserRelationships(OpenAISchema):
    users: List[UserDetail] = Field(..., description="Collection of users, correctly capturing the relationships among them.")

# Using with Agency Swarm
https://github.com/VRSEN/agency-swarm

The only difference is that you need to extend `BaseTool`, which itself extends `OpenAISchema`, and implement the `run` method.

In [None]:
from agency_swarm import BaseTool

class User(BaseTool):
    name: str
    age: int

    def run(self):
      print(f"Executing function. User name is {self.name}. Age is {self.age}")

### Importing tools from langchain
You can now import tools from langchain in one line of code with a special `ToolFactory` class!

In [None]:
from langchain.tools import YouTubeSearchTool
from agency_swarm.tools import ToolFactory
from agency_swarm import Agent, Agency, set_openai_key
set_openai_key(openai.api_key)

LangchainTool = ToolFactory.from_langchain_tool(YouTubeSearchTool)

In [None]:
from langchain.agents import load_tools

tools = load_tools(
    ["human"],
)

tools = ToolFactory.from_langchain_tools(tools)

In [None]:
agent = Agent(
    name="test_agent",
    tools=[LangchainTool, *tools]
)

agency = Agency(
    [agent]
)

In [None]:
message = agency.get_completion("Search YouTube for a video about lex fridman", False)
print(message)

Here are some YouTube videos about Lex Fridman:

1. [Video 1](https://www.youtube.com/watch?v=JN3KPFbWCy8)
2. [Video 2](https://www.youtube.com/watch?v=r4wLXNydzeY)
3. [Video 3](https://www.youtube.com/watch?v=co_MeKSnyAo)
4. [Video 4](https://www.youtube.com/watch?v=MVYrJJNdrEg)
5. [Video 5](https://www.youtube.com/watch?v=uZN5xjoS6TU)

Please explore these links for content related to Lex Fridman.


# Example
Improving your RAG applications with direct quotes!

In [None]:
import re, json
# ValidationInfo is the current name; FieldValidationInfo is a deprecated alias in recent Pydantic v2 releases
from pydantic import Field, BaseModel, model_validator, ValidationInfo
from typing import List

class Fact(BaseModel):
    fact: str = Field(...)
    substring_quote: List[str] = Field(...)

    @model_validator(mode="after")
    def validate_sources(self, info: ValidationInfo) -> "Fact":
        text_chunks = info.context.get("text_chunk", None)
        spans = list(self.get_spans(text_chunks))
        self.substring_quote = [text_chunks[span[0] : span[1]] for span in spans]
        return self

    def get_spans(self, context):
        for quote in self.substring_quote:
            yield from self._get_span(quote, context)

    def _get_span(self, quote, context):
        for match in re.finditer(re.escape(quote), context):
            yield match.span()

In [None]:
class QuestionAnswer(BaseModel):
    question: str = Field(...)
    answer: List[Fact] = Field(...)

    @model_validator(mode="after")
    def validate_sources(self) -> "QuestionAnswer":
        self.answer = [fact for fact in self.answer if len(fact.substring_quote) > 0]
        return self

In [None]:
question = "What did the author do during college?"
context = """
My name is Jason Liu, and I grew up in Toronto Canada but I was born in China.
I went to an arts high school but in university I studied Computational Mathematics and physics.
As part of coop I worked at many companies including Stitchfix, Facebook.
I also started the Data Science club at the University of Waterloo and I was the president of the club for 2 years.
"""

In [None]:
qa = client.chat.completions.create(
        model="gpt-3.5-turbo-0613",
        temperature=0,
        response_model=QuestionAnswer,
        messages=[
            {"role": "system", "content": "You are a world class algorithm to answer questions with correct and exact citations."},
            {"role": "user", "content": f"{context}"},
            {"role": "user", "content": f"Question: {question}"}
        ],
        validation_context={"text_chunk": context},
    )

In [None]:
print(json.dumps(qa.model_dump(), indent=4))

{
    "question": "What did the author do during college?",
    "answer": [
        {
            "fact": "The author studied Computational Mathematics and physics in university.",
            "substring_quote": [
                "in university I studied Computational Mathematics and physics."
            ]
        },
        {
            "fact": "The author started the Data Science club at the University of Waterloo and was the president of the club for 2 years.",
            "substring_quote": [
                "started the Data Science club at the University of Waterloo",
                "president of the club for 2 years."
            ]
        }
    ]
}
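
The citation check comes down to exact substring matching. The standalone sketch below mirrors the `re.finditer` and `re.escape` lookup with a hypothetical helper, `find_spans`, and a shortened context: a quote that really occurs in the text yields a span, while an invented one yields nothing, which is what lets the `QuestionAnswer` validator drop unsupported facts.

```python
import re

context = (
    "I went to an arts high school but in university I studied "
    "Computational Mathematics and physics."
)


def find_spans(quote: str, text: str):
    """Return (start, end) offsets of every exact occurrence of quote in text."""
    # re.escape treats the quote as literal text, so punctuation is not read as regex syntax.
    return [match.span() for match in re.finditer(re.escape(quote), text)]


real_quote = "I studied Computational Mathematics"
invented_quote = "I studied Computer Science"

print(find_spans(real_quote, context))      # one span covering the quote
print(find_spans(invented_quote, context))  # empty list: no supporting text
```

Because the match is exact, a quote that the model paraphrases even slightly is treated as unsupported.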


# Check Out the Docs for More Details
[Instructor cookbook](https://jxnl.github.io/instructor/examples/)

[Docs](https://jxnl.github.io/instructor/)