## 13. Structured Output

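Sometimes we want the output to be structured data, such as JSON, rather than plain string text, so that it can be handed to downstream services quickly and efficiently.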

In [6]:
from langchain.output_parsers import StructuredOutputParser, ResponseSchema
from langchain.prompts import PromptTemplate
from langchain import OpenAI
llm = OpenAI(model_name="text-davinci-003")

# Tell the model which fields the generated content needs and what each field is.
# Define the response structure (JSON) with two fields: answer and source.
response_schemas = [
    ResponseSchema(name="answer", description="answer to the user's question"),
    ResponseSchema(name="source", description="source referred to answer the user's question, should be a website.")
]

# Initialize the parser
output_parser = StructuredOutputParser.from_response_schemas(response_schemas)


# Get the format instructions for the response
format_instructions = output_parser.get_format_instructions()
format_instructions += "\n Only markdown content, no other else!"
format_instructions

'The output should be a markdown code snippet formatted in the following schema, including the leading and trailing "```json" and "```":\n\n```json\n{\n\t"answer": string  // answer to the user\'s question\n\t"source": string  // source referred to answer the user\'s question, should be a website.\n}\n```\n Only markdown content, no other else!'
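
The instructions above ask the model to wrap its JSON in a markdown code snippet, so parsing comes down to locating that snippet and decoding its body. The following is a minimal sketch of that idea in plain Python, not LangChain's actual implementation; `parse_fenced_json` and `sample_reply` are hypothetical names, and the reply text is made up for illustration.

```python
import json
import re

# Three backticks, built programmatically so this example does not nest fences.
FENCE = "`" * 3


def parse_fenced_json(text: str) -> dict:
    """Hypothetical helper: decode the JSON inside a json code snippet."""
    pattern = re.escape(FENCE) + r"json\s*(.*?)" + re.escape(FENCE)
    match = re.search(pattern, text, re.DOTALL)
    if match is None:
        raise ValueError("no json code snippet found in model output")
    return json.loads(match.group(1))


# Made-up model reply, used only for illustration.
sample_reply = (
    FENCE + "json\n"
    '{"answer": "Paris", "source": "https://en.wikipedia.org/wiki/Paris"}\n'
    + FENCE
)

parsed = parse_fenced_json(sample_reply)
print(parsed["answer"])
```

Raising `ValueError` when no snippet is found keeps a badly formatted reply from silently turning into an empty result.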

In [8]:
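# partial_variables lets you pre-fill some of the prompt template's variables in code.
# This resembles the relationship between an interface and an abstract class.
# A more compact alternative template:
# template="answer the users question as best as possible.\n{format_instructions}\n{question}"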
template = """
answer the users question as best as possible.

{format_instructions}

% QUESTION:
{question}

"""


# Embed the format description into the prompt to tell the LLM what output format we need
prompt = PromptTemplate(
    input_variables=["question"],
    partial_variables={"format_instructions": format_instructions},
    template=template
)
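
To make the idea of pre-filling concrete, here is a rough analogy using only the standard library: one placeholder is bound ahead of time and the other is supplied per call. This is a sketch of the concept, not how `PromptTemplate` works internally; `demo_template`, `fill_question`, and the instruction text are assumptions made for the example.

```python
from functools import partial

# Illustrative template with the same two placeholders as the prompt above.
demo_template = (
    "answer the users question as best as possible.\n"
    "{format_instructions}\n"
    "% QUESTION:\n"
    "{question}"
)

# Bind format_instructions once; only question is left to supply later.
fill_question = partial(demo_template.format, format_instructions="Reply in JSON.")

prompt_text = fill_question(question="what's the capital of France?")
print(prompt_text)
```

The caller then only deals with the variable that changes per request, which is the role `input_variables=["question"]` plays in the cell above.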

In [9]:
# Fill in the question, send the prompt text to the LLM, then parse the reply into a dict
response = prompt.format_prompt(question="what's the capital of France?")
output = llm(response.to_string())
output_parser.parse(output)

{'answer': 'Paris', 'source': 'https://en.wikipedia.org/wiki/Paris'}
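
A model is not guaranteed to follow the requested format, so downstream code may want to check that the expected fields are present before using them. The sketch below shows one way to do that in plain Python; `validate_answer` and `REQUIRED_KEYS` are hypothetical names and not part of LangChain.

```python
import json

# Mirrors the two response schemas defined earlier (answer and source).
REQUIRED_KEYS = ("answer", "source")


def validate_answer(raw_json: str) -> dict:
    """Decode raw_json and check that every expected field is present."""
    try:
        data = json.loads(raw_json)
    except json.JSONDecodeError as exc:
        raise ValueError(f"model output is not valid JSON: {exc}") from exc
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    missing = [key for key in REQUIRED_KEYS if key not in data]
    if missing:
        raise ValueError(f"missing fields: {missing}")
    return data


good = validate_answer(
    '{"answer": "Paris", "source": "https://en.wikipedia.org/wiki/Paris"}'
)

# A reply that omits a field is rejected instead of being passed downstream.
try:
    validate_answer('{"answer": "Paris"}')
except ValueError as err:
    problem = str(err)
```

Failing loudly here gives the caller a chance to retry the request or log the raw reply.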