<a href="https://colab.research.google.com/github/sugarforever/wtf-langchain/blob/main/05_Output_Parsers/05_Output_Parsers.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# 05 Output Parsers

An LLM produces text, but a program often needs more structured data than a string to display. This is where output parsers come in.

In [12]:
!pip install -q langchain==0.0.235 "openai<1"

## List Parser

The list parser parses comma-separated text into a list.

In [9]:
from langchain.output_parsers import CommaSeparatedListOutputParser

output_parser = CommaSeparatedListOutputParser()
output_parser.parse("black, yellow, red, green, white, blue")

['black', 'yellow', 'red', 'green', 'white', 'blue']
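To make the behavior concrete, here is a minimal standard-library sketch of comma-separated parsing. The function name `parse_comma_separated` is hypothetical and not part of LangChain. It only approximates what the parser does, and the library's own whitespace handling may differ.

```python
from typing import List


def parse_comma_separated(text: str) -> List[str]:
    """Split comma-separated text into a list, trimming whitespace around each item."""
    # Hypothetical stand-in for illustration; not LangChain's implementation.
    # Empty items (e.g. from "a,,b") are dropped.
    return [item.strip() for item in text.split(",") if item.strip()]


colors = parse_comma_separated("black, yellow, red, green, white, blue")
print(colors)
```

Trimming each item makes the sketch tolerant of inconsistent spacing, which is common in model output.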

## Structured Output Parser

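When we want a JSON-like data structure with multiple fields, we can use this output parser. It generates instructions that guide the LLM to return text in a structured form, and it also parses that text into structured data.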

In [10]:
from langchain.output_parsers import StructuredOutputParser, ResponseSchema
from langchain.prompts import PromptTemplate
from langchain.llms import OpenAI

# Define the response structure (JSON) with two fields: answer and source.
response_schemas = [
    ResponseSchema(name="answer", description="answer to the user's question"),
    ResponseSchema(name="source", description="source referred to answer the user's question, should be a website.")
]
output_parser = StructuredOutputParser.from_response_schemas(response_schemas)

# Get the format instructions for the response.
format_instructions = output_parser.get_format_instructions()

format_instructions

'The output should be a markdown code snippet formatted in the following schema, including the leading and trailing "```json" and "```":\n\n```json\n{\n\t"answer": string  // answer to the user\'s question\n\t"source": string  // source referred to answer the user\'s question, should be a website.\n}\n```'
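The instruction string above is plain text assembled from the field names and descriptions. As a rough sketch of how such a string could be built, the helper below reproduces the same layout using only the standard library. `build_format_instructions` and `FENCE` are hypothetical names introduced for illustration; this is not LangChain's implementation.

```python
from typing import List, Tuple

# Three backticks, built programmatically so the snippet is easy to embed in Markdown.
FENCE = "`" * 3


def build_format_instructions(fields: List[Tuple[str, str]]) -> str:
    """Render (name, description) pairs as a schema-style instruction string."""
    # Hypothetical helper for illustration; not LangChain's implementation.
    schema_lines = "\n".join(
        f'\t"{name}": string  // {description}' for name, description in fields
    )
    return (
        "The output should be a markdown code snippet formatted in the following "
        f'schema, including the leading and trailing "{FENCE}json" and "{FENCE}":\n\n'
        f"{FENCE}json\n{{\n{schema_lines}\n}}\n{FENCE}"
    )


instructions = build_format_instructions(
    [
        ("answer", "answer to the user's question"),
        ("source", "source referred to answer the user's question, should be a website."),
    ]
)
print(instructions)
```

Because the instructions are ordinary text, they can be injected into any prompt template as a variable.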

In [14]:
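# partial_variables pre-fills some of the prompt template's variables in code,
# so only the remaining variables (here, question) are supplied when formatting.
# This is loosely similar to an abstract class implementing part of an interface.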
prompt = PromptTemplate(
    template="answer the users question as best as possible.\n{format_instructions}\n{question}",
    input_variables=["question"],
    partial_variables={"format_instructions": format_instructions}
)

model = OpenAI(temperature=0, openai_api_key="your valid OpenAI API key")
response = prompt.format_prompt(question="Who is the CEO of Tesla?")
output = model(response.to_string())
output_parser.parse(output)

{'answer': 'Elon Musk is the CEO of Tesla.',
 'source': 'https://www.tesla.com/about/leadership'}
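The parsing half of this workflow can be sketched with the standard library: find the fenced JSON block in the model's reply, decode it, and check that the expected fields are present. `parse_fenced_json` and the hard-coded `sample_output` are assumptions made for illustration. This is not LangChain's implementation, and a real model reply may not follow the requested format.

```python
import json
import re
from typing import Any, Dict, List

# Three backticks, built programmatically so the snippet is easy to embed in Markdown.
FENCE = "`" * 3


def parse_fenced_json(text: str, expected_keys: List[str]) -> Dict[str, Any]:
    """Extract the first fenced json block from text and decode it."""
    # Hypothetical helper for illustration; not LangChain's implementation.
    pattern = re.escape(FENCE) + r"json\s*(.*?)" + re.escape(FENCE)
    match = re.search(pattern, text, re.DOTALL)
    if match is None:
        raise ValueError("no fenced json block found in model output")
    data = json.loads(match.group(1))
    missing = [key for key in expected_keys if key not in data]
    if missing:
        raise ValueError(f"missing expected keys: {missing}")
    return data


# A hard-coded reply standing in for real model output.
sample_output = (
    f"{FENCE}json\n"
    "{\n"
    '\t"answer": "Elon Musk is the CEO of Tesla.",\n'
    '\t"source": "https://www.tesla.com/about/leadership"\n'
    "}\n"
    f"{FENCE}"
)
parsed = parse_fenced_json(sample_output, ["answer", "source"])
print(parsed)
```

Raising an error on a missing fence or missing key surfaces malformed replies early instead of passing partial data downstream.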

## Custom Output Parser

Extend CommaSeparatedListOutputParser so that the list it returns is sorted.

In [15]:
from typing import List


class SortedCommaSeparatedListOutputParser(CommaSeparatedListOutputParser):
    def parse(self, text: str) -> List[str]:
        # Reuse the parent's comma-splitting logic, then sort the result.
        lst = super().parse(text)
        return sorted(lst)

output_parser = SortedCommaSeparatedListOutputParser()
output_parser.parse("black, yellow, red, green, white, blue")

['black', 'blue', 'green', 'red', 'white', 'yellow']
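One detail to keep in mind when sorting is that Python's default string ordering is case-sensitive, so capitalized items sort before lowercase ones. The sketch below shows the same override pattern with a case-insensitive sort key. `SimpleListParser` is a hypothetical stand-in for the base class so that the snippet runs without LangChain, and `CaseInsensitiveSortedListParser` is likewise an illustrative name.

```python
from typing import List


class SimpleListParser:
    """Hypothetical stand-in for a comma-separated list parser (not LangChain's class)."""

    def parse(self, text: str) -> List[str]:
        return [item.strip() for item in text.split(",") if item.strip()]


class CaseInsensitiveSortedListParser(SimpleListParser):
    """Override parse() to return the items sorted without regard to case."""

    def parse(self, text: str) -> List[str]:
        items = super().parse(text)
        return sorted(items, key=str.casefold)


mixed = "black, Yellow, red, Green, white, Blue"
plain_sorted = sorted(SimpleListParser().parse(mixed))
casefold_sorted = CaseInsensitiveSortedListParser().parse(mixed)
print(plain_sorted)     # capitalized items come first
print(casefold_sorted)  # ordering ignores case
```

The same approach applies to other post-processing steps, such as deduplication or filtering, by changing what the overridden `parse` does after calling the parent.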