# PydanticOutputParser
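- A class that helps convert a model's output into structured data.
- You define the desired output structure as a Pydantic class, and the parser converts the model's output to match it.
- Output parser classes like this one must implement the following two core methods.
    - `get_format_instructions()`: Returns instructions describing the format the model should produce. The instructions are generated automatically from the structure of the user-defined Pydantic class.
    - `parse()`: Converts the model's output into an instance of the user-defined Pydantic class. In the process, it validates the output against that class's schema.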
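
To make the two-method contract concrete, here is a minimal standard-library sketch. It is not LangChain's implementation: `MiniParser` and the `Person` schema are hypothetical names introduced only for illustration, and a dataclass stands in for the Pydantic class.

```python
import json
from dataclasses import dataclass, fields


@dataclass
class Person:
    # Hypothetical schema used only for this illustration.
    name: str
    age: int


class MiniParser:
    """Toy stand-in for an output parser; not LangChain's implementation."""

    def __init__(self, schema):
        self.schema = schema

    def get_format_instructions(self) -> str:
        # Derive the instruction text from the schema's field names and types.
        spec = {f.name: f.type.__name__ for f in fields(self.schema)}
        return "Return a JSON object with these keys and types: " + json.dumps(spec)

    def parse(self, text: str):
        # Decode the JSON, then check each declared field before building the object.
        data = json.loads(text)
        for f in fields(self.schema):
            if f.name not in data:
                raise ValueError(f"missing field: {f.name}")
            if not isinstance(data[f.name], f.type):
                raise ValueError(f"wrong type for field: {f.name}")
        return self.schema(**{f.name: data[f.name] for f in fields(self.schema)})


mini_parser = MiniParser(Person)
print(mini_parser.get_format_instructions())
print(mini_parser.parse('{"name": "Kim", "age": 30}'))
```

The instructions go into the prompt, and `parse()` is applied to the reply, so both sides are driven by the same schema definition.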

In [1]:
import os
from dotenv import load_dotenv

load_dotenv()
print(os.environ["MODEL_ID"])

meta-llama/Meta-Llama-3-8B-Instruct


In [11]:
from langchain_community.chat_models.huggingface import ChatHuggingFace
from langchain_community.llms import HuggingFaceEndpoint

llm = HuggingFaceEndpoint(
    repo_id=os.environ["MODEL_ID"], 
    # max_new_tokens=1024,
    temperature=0.1,
    huggingfacehub_api_token=os.environ["HF_API_KEY"],
)
model = ChatHuggingFace(llm=llm)


In [12]:
email_conversation = """From: Chulsoo Kim (chulsoo.kim@bikecorporation.me)
To: Eunchae Lee (eunchae@teddyinternational.me)
Subject: "ZENESIS" bicycle distribution cooperation and meeting schedule proposal

Hello, Assistant Manager Eunchae Lee,

My name is Kim Cheol-soo, and I am the managing director of Bike Corporation. I learned about your new bicycle “ZENESIS” through a recent press release. Bike Corporation is a leader in innovation and quality in bicycle manufacturing and distribution, with long-term experience and expertise in this field.

I would like to request a detailed brochure for the ZENESIS model. In particular, you will need information about technical specifications, battery performance, and design aspects. Through this, we will be able to further refine the distribution strategy and marketing plan we will propose.

We also propose a meeting next Tuesday (January 15th) at 10am to further discuss possibilities for cooperation. Can we meet at your office and talk?

thank you

Cheolsu Kim
Executive Director
Bike Corporation
"""

In [13]:
from langchain.output_parsers import PydanticOutputParser
from langchain_core.pydantic_v1 import BaseModel, Field

class EmailSummary(BaseModel):
    person: str = Field(description="Person who sent email")
    email: str = Field(description="Email address whose sent email")
    subject: str = Field(description="Subject of email")
    summary: str = Field(description="Summary of email content")
    date: str = Field(description="Day of Week, Month, Day mentioned in email content")

parser = PydanticOutputParser(pydantic_object=EmailSummary)
parser.get_format_instructions()

'The output should be formatted as a JSON instance that conforms to the JSON schema below.\n\nAs an example, for the schema {"properties": {"foo": {"title": "Foo", "description": "a list of strings", "type": "array", "items": {"type": "string"}}}, "required": ["foo"]}\nthe object {"foo": ["bar", "baz"]} is a well-formatted instance of the schema. The object {"properties": {"foo": ["bar", "baz"]}} is not well-formatted.\n\nHere is the output schema:\n```\n{"properties": {"person": {"title": "Person", "description": "Person who sent email", "type": "string"}, "email": {"title": "Email", "description": "Email address whose sent email", "type": "string"}, "subject": {"title": "Subject", "description": "Subject of email", "type": "string"}, "summary": {"title": "Summary", "description": "Summary of email content", "type": "string"}, "date": {"title": "Date", "description": "Day of Week, Month, Day mentioned in email content", "type": "string"}}, "required": ["person", "email", "subject", "s

In [24]:
from langchain.prompts import PromptTemplate
prompt = PromptTemplate.from_template(
    """
You are a helpful assistant. Please answer the following questions.

QUESTION:
{question}

EMAIL CONVERSATION:
{email_conversation}

FORMAT:
{format}
"""
)
# Pre-fill the {format} variable so that invoke() only needs the remaining inputs.
prompt = prompt.partial(format=parser.get_format_instructions())

In [25]:
chain = prompt | model
response = chain.invoke({"email_conversation": email_conversation, "question": "Extract main content of email"})
print(response.content)

Here is the extracted main content of the email in the required JSON format:

```
{
  "person": "Chulsoo Kim",
  "email": "chulsoo.kim@bikecorporation.me",
  "subject": "\"ZENESIS\" bicycle distribution cooperation and meeting schedule proposal",
  "summary": "Request for detailed brochure of ZENESIS bicycle model and proposal for meeting to discuss cooperation",
  "date": "Tuesday (January 15th)"
}
```

Let me know if you need any further assistance!


In [26]:
parser.parse(response.content)

EmailSummary(person='Chulsoo Kim', email='chulsoo.kim@bikecorporation.me', subject='"ZENESIS" bicycle distribution cooperation and meeting schedule proposal', summary='Request for detailed brochure of ZENESIS bicycle model and proposal for meeting to discuss cooperation', date='Tuesday (January 15th)')
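
Note that the call above succeeded even though the model's reply wraps the JSON in prose and a code fence. One plausible way to handle such replies is sketched below using only the standard library. This is an assumption-laden illustration, not LangChain's actual code; `extract_json` and the shortened `reply` are hypothetical.

```python
import json
import re

FENCE = "`" * 3  # three backticks, built this way to keep the example readable

# Hypothetical, shortened model reply: prose around a fenced JSON object.
reply = (
    "Here is the result:\n\n"
    f"{FENCE}\n"
    '{"person": "Chulsoo Kim", "date": "Tuesday (January 15th)"}\n'
    f"{FENCE}\n\n"
    "Let me know if you need anything else!"
)


def extract_json(text: str) -> dict:
    """Return the JSON object found in text, preferring a fenced block."""
    fenced = re.search(FENCE + r"(?:json)?\s*(.*?)" + FENCE, text, re.DOTALL)
    candidate = fenced.group(1) if fenced else text
    # Trim anything before the first "{" and after the last "}".
    start, end = candidate.find("{"), candidate.rfind("}")
    if start == -1 or end < start:
        raise ValueError("no JSON object found")
    return json.loads(candidate[start : end + 1])


print(extract_json(reply))
```

Once the JSON is isolated, schema validation can proceed as usual on the resulting dictionary.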

Like the `StrOutputParser` used earlier, `PydanticOutputParser` can also be composed into a chain.

In [28]:
chain = prompt | model | parser
response = chain.invoke({
    "email_conversation": email_conversation,
    "question": "Extract main content of email",
})
response

EmailSummary(person='Chulsoo Kim', email='chulsoo.kim@bikecorporation.me', subject='"ZENESIS" bicycle distribution cooperation and meeting schedule proposal', summary='Request for detailed brochure of ZENESIS bicycle model and proposal for meeting to discuss cooperation', date='Tuesday (January 15th)')
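
The `|` composition used in the chain can be pictured with a small standard-library sketch. It is only a conceptual model, assuming each stage exposes `invoke()`; the `Step` class and the fake stages are hypothetical and do not reproduce LangChain's runnable machinery.

```python
import json


class Step:
    """Hypothetical wrapper that makes plain functions composable with `|`."""

    def __init__(self, func):
        self.func = func

    def __or__(self, other):
        # a | b yields a new Step that runs a, then feeds its result to b.
        return Step(lambda value: other.invoke(self.invoke(value)))

    def invoke(self, value):
        return self.func(value)


# Fake stages standing in for the prompt, the model, and the parser.
fake_prompt = Step(lambda inputs: f"QUESTION: {inputs['question']}")
fake_model = Step(lambda text: '{"echo": "' + text + '"}')
fake_parser = Step(json.loads)

toy_chain = fake_prompt | fake_model | fake_parser
result = toy_chain.invoke({"question": "hi"})
print(result)
```

Because the parser is the last stage, the caller receives a structured object directly instead of raw text.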