[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/langchain-ai/langchain/blob/master/docs/docs/use_cases/extraction.ipynb)

# Parsing

LLMs that follow prompt instructions well can be asked to extract information and return it in a given format.

We can then parse the structured information out of the model's text response.
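
As a rough illustration of the parsing step, the sketch below pulls a JSON object out of a free-text reply using only the standard library. The `parse_reply` helper and the sample reply are hypothetical and are not part of LangChain; in practice the reply would come from the model.

```python
import json


def parse_reply(reply: str) -> dict:
    """Pull the JSON object out of a free-text model reply."""
    # Take everything between the first "{" and the last "}".
    start = reply.find("{")
    end = reply.rfind("}")
    if start == -1 or end < start:
        raise ValueError("No JSON object found in the reply")
    return json.loads(reply[start : end + 1])


# Hypothetical reply; a real one would come from the model.
reply = (
    "Here is the result:\n"
    '{"person_name": "Alex", "person_height": 5}\n'
    "Let me know if you need anything else."
)
parsed = parse_reply(reply)
print(parsed["person_name"], parsed["person_height"])
```

This simple slicing assumes the reply contains a single JSON object; anything more elaborate is better left to a dedicated parser.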

:::{.callout-tip}
All the considerations for extraction quality also apply to the parsing approach. Review the [guidelines](/docs/use_cases/extraction/guidelines) for extraction quality.
:::

In [5]:
from langchain_anthropic.chat_models import ChatAnthropic
from langchain_core.pydantic_v1 import BaseModel, Field, validator

In [3]:
model = ChatAnthropic(model_name='claude-2.1')

In [None]:
from langchain.chains import create_structured_output_runnable
from langchain_core.prompts import ChatPromptTemplate

prompt = ChatPromptTemplate.from_messages(
    [
        (
            "system",
            "You are an expert extraction algorithm. "
            "Only extract relevant information from the text. "
            "Do not extract information that does not match the description.",
        ),
        ("human", "{text}"),
    ]
)

# Assumes `Person` (the schema defined in the next code cell) and `llm`
# (a chat model instance) are already in scope.
runnable = create_structured_output_runnable(Person, llm, prompt=prompt)

text = "Earth has only 1 moon."
runnable.invoke({"text": text})



[Output parsers](/docs/modules/model_io/output_parsers/) are classes that help structure language model responses. 
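
To make the idea concrete, here is a minimal stand-in for an output parser, written with the standard library only. It is a sketch of the general pattern (produce format instructions for the prompt, then parse the reply into a typed object) and not LangChain's implementation; `SimpleJsonParser` and `City` are hypothetical names.

```python
import json
from dataclasses import dataclass, fields


@dataclass
class City:  # hypothetical schema used only for this sketch
    name: str
    country: str


class SimpleJsonParser:
    """Toy output parser: describes the format, then parses the reply."""

    def __init__(self, schema):
        self.schema = schema

    def get_format_instructions(self) -> str:
        # Text meant to be injected into the prompt.
        keys = ", ".join(f'"{f.name}"' for f in fields(self.schema))
        return f"Respond only with a JSON object containing the keys: {keys}."

    def parse(self, text: str):
        # Turn the raw reply into an instance of the schema.
        data = json.loads(text)
        return self.schema(**data)


parser = SimpleJsonParser(City)
instructions = parser.get_format_instructions()
city = parser.parse('{"name": "Paris", "country": "France"}')
print(instructions)
print(city)
```

The same two-step shape appears in the cells that follow, where `parser.get_format_instructions()` is injected into the prompt and `parser.parse(...)` is applied to the model output.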

In [None]:
from typing import Optional, Sequence

from langchain.output_parsers import PydanticOutputParser
from langchain.prompts import PromptTemplate
from langchain_core.pydantic_v1 import BaseModel, Field, validator
from langchain_openai import ChatOpenAI


class Person(BaseModel):
    person_name: str
    person_height: int
    person_hair_color: str
    dog_breed: Optional[str]
    dog_name: Optional[str]


class People(BaseModel):
    """Identifying information about all people in a text."""

    people: Sequence[Person]


# Query
query = """Alex is 5 feet tall. Claudia is 1 foot taller than Alex and jumps higher than him. Claudia is a brunette and Alex is blond."""

# Set up a parser + inject instructions into the prompt template.
parser = PydanticOutputParser(pydantic_object=People)

# Prompt
prompt = PromptTemplate(
    template="Answer the user query.\n{format_instructions}\n{query}\n",
    input_variables=["query"],
    partial_variables={"format_instructions": parser.get_format_instructions()},
)

# Run
_input = prompt.format_prompt(query=query)
model = ChatOpenAI()
output = model.invoke(_input.to_string())
parser.parse(output.content)

We can see from the [LangSmith trace](https://smith.langchain.com/public/aec42dd3-d471-4d34-801b-20dd88444931/r) that we get the same output as above.

![Image description](../../static/img/extraction_trace_parsing.png)

The trace shows that we provide a two-shot prompt to instruct the LLM to output in our desired format.
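
For readers unfamiliar with the term, a two-shot prompt places two worked examples ahead of the real input. The sketch below assembles one as a plain string; the `build_few_shot_prompt` helper, the instruction text, and the examples are all hypothetical and are not the prompt recorded in the trace.

```python
def build_few_shot_prompt(examples, query):
    """Render worked examples ahead of the real query."""
    parts = ["Extract the people mentioned in the text as JSON."]
    for text, answer in examples:
        parts.append(f"Text: {text}\nOutput: {answer}")
    # The final block leaves the output open for the model to complete.
    parts.append(f"Text: {query}\nOutput:")
    return "\n\n".join(parts)


# Two hypothetical demonstrations, hence "two-shot".
examples = [
    (
        "Sam has red hair.",
        '{"people": [{"person_name": "Sam", "person_hair_color": "red"}]}',
    ),
    ("It rained all day.", '{"people": []}'),
]
prompt_text = build_few_shot_prompt(examples, "Alex is blond.")
print(prompt_text)
```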

In [None]:
# Define your desired data structure.
class Joke(BaseModel):
    setup: str = Field(description="question to set up a joke")
    punchline: str = Field(description="answer to resolve the joke")

    # You can add custom validation logic easily with Pydantic.
    @validator("setup")
    def question_ends_with_question_mark(cls, field):
        if not field.endswith("?"):
            raise ValueError("Badly formed question!")
        return field


# And a query intended to prompt a language model to populate the data structure.
joke_query = "Tell me a joke."

# Set up a parser + inject instructions into the prompt template.
parser = PydanticOutputParser(pydantic_object=Joke)

# Prompt
prompt = PromptTemplate(
    template="Answer the user query.\n{format_instructions}\n{query}\n",
    input_variables=["query"],
    partial_variables={"format_instructions": parser.get_format_instructions()},
)

# Run
_input = prompt.format_prompt(query=joke_query)
model = ChatOpenAI(temperature=0)
output = model.invoke(_input.to_string())
parser.parse(output.content)

Joke(setup="Why couldn't the bicycle find its way home?", punchline='Because it lost its bearings!')

As we can see, we get back a `Joke` instance that respects our desired schema, with `setup` and `punchline` fields.
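
The custom validation rule also means that a malformed reply is rejected rather than silently accepted. The sketch below mimics that behaviour with a standard-library dataclass so it runs without Pydantic or an API key; `JokeSketch` is a hypothetical stand-in for the `Joke` model, and only the question-mark rule is reproduced.

```python
from dataclasses import dataclass


@dataclass
class JokeSketch:
    setup: str
    punchline: str

    def __post_init__(self):
        # Same rule as the `question_ends_with_question_mark` validator.
        if not self.setup.endswith("?"):
            raise ValueError("Badly formed question!")


# A well-formed joke is accepted.
ok = JokeSketch(
    setup="Why did the chicken cross the road?",
    punchline="To get to the other side.",
)

# A setup without a trailing question mark is rejected.
try:
    JokeSketch(setup="Tell me something funny", punchline="No.")
    error = None
except ValueError as exc:
    error = str(exc)

print(error)
```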

We can look at the [LangSmith trace](https://smith.langchain.com/public/557ad630-af35-43e9-b043-93800539025f/r) to see exactly what is going on under the hood.

[Kor](https://eyurtsev.github.io/kor/) is another library for extraction where schema and examples can be provided to the LLM.