## Output parsers

docs: https://python.langchain.com/en/latest/modules/prompts/output_parsers/getting_started.html

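Output parsers put a contract on what the model returns. By default an LLM outputs plain text; when structured data is needed, the parsers in `langchain.output_parsers` can produce it, including:

1. Custom output (for example, an instance of a class)
2. JSON output
3. List output
4. Enum output

In essence: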

```python
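from langchain.output_parsers import CommaSeparatedListOutputParser
from langchain.prompts import PromptTemplate

# 1. Choose or define an output parser (langchain.output_parsers provides several;
#    CommaSeparatedListOutputParser is used here only as an example)
parse = CommaSeparatedListOutputParser()
# 2. Get the format instructions
out_fmt = parse.get_format_instructions()
# 3. Pass them to the prompt as a partial variable
prompt = PromptTemplate(
    template='Answer the user query.\n{format_instructions}\n{query}\n',
    input_variables=['query'],
    partial_variables={'format_instructions': out_fmt},
)
# 4. Parse the string returned by the LLM
#    (llm is assumed to be an already initialized model, as in the cells below)
output: str = llm(prompt.format_prompt(query='<user query>').to_string())
parse.parse(output)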
```
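
The four steps above rely on only two methods of a parser: `get_format_instructions()` and `parse()`. The following is a minimal sketch of that contract in plain Python, so it runs without an API key. `YesNoParser` and `fake_llm` are hypothetical names invented for illustration and are not part of LangChain.

```python
class YesNoParser:
    """Hypothetical parser that turns a yes/no answer into a bool."""

    def get_format_instructions(self) -> str:
        # Text that is placed into the prompt to constrain the model's output.
        return 'Answer with a single word: "yes" or "no".'

    def parse(self, text: str) -> bool:
        # Convert the raw completion string into a Python value.
        cleaned = text.strip().strip('.').lower()
        if cleaned not in ('yes', 'no'):
            raise ValueError(f'unexpected answer: {text!r}')
        return cleaned == 'yes'


def fake_llm(prompt: str) -> str:
    # Stand-in for a real model call; always returns the same completion.
    return ' Yes.\n'


parser = YesNoParser()
template = 'Answer the user query.\n{format_instructions}\n{query}\n'
prompt_text = template.format(
    format_instructions=parser.get_format_instructions(),
    query='Is Python dynamically typed?',
)
result = parser.parse(fake_llm(prompt_text))
print(result)
```

The parser never talks to the model itself: it only contributes text to the prompt and interprets the string that comes back.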

In [1]:
# Get more structured information from the output of the model
from langchain.prompts import PromptTemplate, ChatPromptTemplate, HumanMessagePromptTemplate
from langchain.llms import OpenAI
from langchain.chat_models import ChatOpenAI

from langchain.output_parsers import PydanticOutputParser
from pydantic import BaseModel, Field, validator
from typing import List, Optional, Dict, Any

In [3]:
from app.init import init_llm
init_llm()

Init OpenAI env


<module 'openai' from '/Users/lingoace/Documents/GitHub/llm-101/venv/lib/python3.10/site-packages/openai/__init__.py'>

In [4]:
model_name = 'text-davinci-003'
temperature = 0.0
llm = OpenAI(model_name=model_name, temperature=temperature)

In [5]:
class Joke(BaseModel):
    setup: str = Field(description='question to set up a joke')
    punchline: str = Field(description='answer to resolve the joke')

    @validator('setup')
    def question_ends_with_question_mark(cls, field):
        if not field.endswith('?'):  # also safe when the string is empty
            raise ValueError('question must end with a question mark')
        return field

In [6]:
# set up a parser and inject its format instructions into the prompt template
parse = PydanticOutputParser(pydantic_object=Joke)
print(parse.get_format_instructions())

The output should be formatted as a JSON instance that conforms to the JSON schema below.

As an example, for the schema {"properties": {"foo": {"title": "Foo", "description": "a list of strings", "type": "array", "items": {"type": "string"}}}, "required": ["foo"]}}
the object {"foo": ["bar", "baz"]} is a well-formatted instance of the schema. The object {"properties": {"foo": ["bar", "baz"]}} is not well-formatted.

Here is the output schema:
```
{"properties": {"setup": {"title": "Setup", "description": "question to set up a joke", "type": "string"}, "punchline": {"title": "Punchline", "description": "answer to resolve the joke", "type": "string"}}, "required": ["setup", "punchline"]}
```


In [7]:
prompt = PromptTemplate(
    template='Answer the user query.\n{format_instructions}\n{query}\n',
    input_variables=['query'],
    partial_variables={'format_instructions': parse.get_format_instructions()} 
)

prompt

PromptTemplate(input_variables=['query'], output_parser=None, partial_variables={'format_instructions': 'The output should be formatted as a JSON instance that conforms to the JSON schema below.\n\nAs an example, for the schema {"properties": {"foo": {"title": "Foo", "description": "a list of strings", "type": "array", "items": {"type": "string"}}}, "required": ["foo"]}}\nthe object {"foo": ["bar", "baz"]} is a well-formatted instance of the schema. The object {"properties": {"foo": ["bar", "baz"]}} is not well-formatted.\n\nHere is the output schema:\n```\n{"properties": {"setup": {"title": "Setup", "description": "question to set up a joke", "type": "string"}, "punchline": {"title": "Punchline", "description": "answer to resolve the joke", "type": "string"}}, "required": ["setup", "punchline"]}\n```'}, template='Answer the user query.\n{format_instructions}\n{query}\n', template_format='f-string', validate_template=True)
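
Because `format_instructions` is supplied through `partial_variables`, only `query` has to be passed when the prompt is formatted. A rough plain-Python analogy is sketched below; it assumes nothing about LangChain's internals, and the `render` helper and the short instruction text are hypothetical.

```python
from functools import partial

TEMPLATE = 'Answer the user query.\n{format_instructions}\n{query}\n'


def render(template: str, **variables: str) -> str:
    # Fill every placeholder in the template with the given variables.
    return template.format(**variables)


# Bind the format instructions once, similar in spirit to partial_variables.
format_prompt = partial(
    render, TEMPLATE, format_instructions='Return a JSON object.'
)

# Later calls only need the remaining input variable.
text = format_prompt(query='Tell me a joke.')
print(text)
```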

In [8]:
output = llm(prompt.format_prompt(query='What do you call a pig that does karate?').to_string())
print(output)

# structured output
parse.parse(output)


{"setup": "What do you call a pig that does karate?", "punchline": "A pork chop!"}


Joke(setup='What do you call a pig that does karate?', punchline='A pork chop!')
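
Conceptually, the parse step decodes the JSON text and validates it against `Joke`, so a completion that breaks the schema or the validator raises an error instead of returning a half-filled object. A simplified stand-in using only the standard library is sketched below. `JokeRecord` and `parse_joke` are hypothetical names, and this is not how `PydanticOutputParser` is implemented.

```python
import json
from dataclasses import dataclass


@dataclass
class JokeRecord:
    setup: str
    punchline: str

    def __post_init__(self) -> None:
        # Mirror the validator on Joke: the setup must be a question.
        if not self.setup.endswith('?'):
            raise ValueError('question must end with a question mark')


def parse_joke(text: str) -> JokeRecord:
    data = json.loads(text)  # raises json.JSONDecodeError on malformed JSON
    # A missing key raises KeyError; extra keys are ignored in this sketch.
    return JokeRecord(setup=data['setup'], punchline=data['punchline'])


good = '{"setup": "What do you call a pig that does karate?", "punchline": "A pork chop!"}'
joke = parse_joke(good)
print(joke)

# A setup that is not a question is rejected by the validation step.
bad = '{"setup": "A pig does karate.", "punchline": "A pork chop!"}'
try:
    parse_joke(bad)
    error = None
except ValueError as exc:
    error = str(exc)
print(error)
```

Since a model is not guaranteed to follow the format instructions, wrapping the parse call in a `try`/`except` is a reasonable precaution.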

### CommaSeparatedListOutputParser
docs: https://python.langchain.com/en/latest/modules/prompts/output_parsers/examples/comma_separated.html

In [9]:
from langchain.output_parsers import CommaSeparatedListOutputParser

output_parse = CommaSeparatedListOutputParser()  # parses the output into a Python list

format_instructions = output_parse.get_format_instructions()
prompt = PromptTemplate(
    template='List five {subject}.\n{format_instructions}',
    input_variables=['subject'],
    partial_variables={'format_instructions': format_instructions}
)

output = llm(prompt.format_prompt(subject='programming languages').to_string())
print(output)

output_parse.parse(output)



Java, C++, Python, JavaScript, Ruby


['Java', 'C++', 'Python', 'JavaScript', 'Ruby']
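
What the list parser does with the completion can be approximated by a split on commas. The function below is a simplified sketch; the name `parse_comma_separated` is hypothetical, and the library's own implementation may treat whitespace or quoting differently.

```python
from typing import List


def parse_comma_separated(text: str) -> List[str]:
    # Split on commas, trim surrounding whitespace, and drop empty items.
    return [item.strip() for item in text.split(',') if item.strip()]


# The completion above starts with blank lines, which the trimming removes.
raw = '\n\nJava, C++, Python, JavaScript, Ruby'
items = parse_comma_separated(raw)
print(items)  # ['Java', 'C++', 'Python', 'JavaScript', 'Ruby']
```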

### StructuredOutputParser
docs: https://python.langchain.com/en/latest/modules/prompts/output_parsers/examples/structured.html

In [12]:
from langchain.output_parsers import StructuredOutputParser, ResponseSchema
from langchain.prompts import PromptTemplate, ChatPromptTemplate, HumanMessagePromptTemplate
from langchain.chat_models import ChatOpenAI

# here we define the response schema we want to receive.
response_schemas = [
    ResponseSchema(name='answer', description='answer to the question'),
    ResponseSchema(name='source', description="source used to answer the user's question, should be a website.")
]
output_parse = StructuredOutputParser.from_response_schemas(response_schemas)
output_parse

StructuredOutputParser(response_schemas=[ResponseSchema(name='answer', description='answer to the question', type='string'), ResponseSchema(name='source', description="source used to answer the user's question, should be a website.", type='string')])

In [13]:
format_instructions = output_parse.get_format_instructions()
format_instructions

'The output should be a markdown code snippet formatted in the following schema, including the leading and trailing "```json" and "```":\n\n```json\n{\n\t"answer": string  // answer to the question\n\t"source": string  // source used to answer the user\'s question, should be a website.\n}\n```'

In [15]:
prompt = PromptTemplate(
    template="answer the users question as best as possible.\n{format_instructions}\n{question}",
    input_variables=["question"],
    partial_variables={"format_instructions": format_instructions}
)

output = llm(prompt.format_prompt(question="what's the capital of france?").to_string())
print(output)

# structured output
output_parse.parse(output)



```json
{
	"answer": "Paris",
	"source": "https://en.wikipedia.org/wiki/Paris"
}
```


{'answer': 'Paris', 'source': 'https://en.wikipedia.org/wiki/Paris'}
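
The format instructions ask the model to wrap its answer in a json code fence, so parsing amounts to extracting that fenced block, decoding it, and checking for the expected keys. The sketch below illustrates the idea with the standard library only. `parse_fenced_json` is a hypothetical helper, not LangChain's implementation.

```python
import json
import re
from typing import Any, Dict, List

FENCE = '`' * 3  # three backticks, built this way to keep the example readable


def parse_fenced_json(text: str, expected_keys: List[str]) -> Dict[str, Any]:
    # Grab whatever sits between the opening json fence and the closing fence.
    pattern = FENCE + r'json\s*(.*?)' + FENCE
    match = re.search(pattern, text, re.DOTALL)
    if match is None:
        raise ValueError('no json code block found in the output')
    data = json.loads(match.group(1))
    # Check that every field named in the response schemas is present.
    missing = [key for key in expected_keys if key not in data]
    if missing:
        raise ValueError(f'missing keys: {missing}')
    return data


# Rebuild a completion shaped like the one printed above.
raw = (
    '\n\n' + FENCE + 'json\n'
    '{\n\t"answer": "Paris",\n\t"source": "https://en.wikipedia.org/wiki/Paris"\n}\n'
    + FENCE
)
result = parse_fenced_json(raw, ['answer', 'source'])
print(result)
```

Searching for the fence rather than decoding the whole string means any prose the model adds before or after the code block is ignored in this sketch.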

In [17]:
# the same parser also works with a chat model

chat_model = ChatOpenAI(temperature=0)
chat_prompt = ChatPromptTemplate(
    messages=[
        HumanMessagePromptTemplate.from_template('answer the users question as best as possible.\n{format_instructions}\n{question}'),
    ],
    input_variables=["question"],
    partial_variables={"format_instructions": format_instructions}
)

_input = chat_prompt.format_prompt(question="what's the capital of france?")
output = chat_model(_input.to_messages())
output_parse.parse(output.content)

{'answer': 'Paris', 'source': 'https://en.wikipedia.org/wiki/Paris'}