## Output Parsers

Output parsers in LangChain are specialized classes that transform the raw text output of large language models (LLMs) into a structured format, such as a Python list or dictionary. This is useful when using LLMs to generate structured data.

Prompt Template + LLM + Output Parser => Formatted Output
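
To make the three-stage flow concrete, here is a minimal sketch in plain Python. It does not use LangChain: `format_prompt`, `fake_llm`, `parse_list`, and `run_chain` are hypothetical stand-ins, and the "model" simply returns a canned string so the example runs without an API key.

```python
from typing import List


def format_prompt(template: str, **variables: str) -> str:
    """Stage 1: fill the template placeholders with concrete values."""
    return template.format(**variables)


def fake_llm(prompt: str) -> str:
    """Stage 2: hypothetical model stand-in that always returns the same raw text."""
    return "Goa, Manali, Jaipur"


def parse_list(raw: str) -> List[str]:
    """Stage 3: turn the raw completion into a Python list."""
    return [item.strip() for item in raw.split(",") if item.strip()]


def run_chain(template: str, **variables: str) -> List[str]:
    """Pipe the stages together: prompt -> model -> parser."""
    return parse_list(fake_llm(format_prompt(template, **variables)))


result = run_chain("List three places {places}.", places="for summer tourism in India")
print(result)
```

The `prompt | model | output_parser` chains in the cells below follow the same shape, with real LangChain components in place of these stand-ins.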

In [7]:
from langchain.output_parsers import CommaSeparatedListOutputParser
from langchain.prompts import PromptTemplate
from langchain_openai import ChatOpenAI
from dotenv import load_dotenv
load_dotenv()

True

In [9]:
model = ChatOpenAI(temperature=0)

### CSV Parser

In [15]:
output_parser = CommaSeparatedListOutputParser()

format_instructions = output_parser.get_format_instructions()

prompt = PromptTemplate(
    template = "List five places {places}.\n{format_instructions}",
    input_variables = ["places"],
    partial_variables = {"format_instructions": format_instructions},
)

In [17]:
chain = prompt | model | output_parser

In [19]:
chain.invoke({"places": "for summer tourism in India"})

['Goa', 'Manali', 'Jaipur', 'Kerala', 'Ladakh']
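
The parser's job here is to split the model's raw comma-separated reply into a list. The sketch below shows one way that step could be done with the standard library, and why a naive `split(",")` can go wrong when an item itself contains a comma. `parse_comma_separated` is a hypothetical helper, not LangChain's implementation, and the sample reply is invented for illustration.

```python
import csv
import io
from typing import List


def parse_comma_separated(raw: str) -> List[str]:
    """Split a comma-separated reply, honouring double-quoted items."""
    reader = csv.reader(io.StringIO(raw.strip()), skipinitialspace=True)
    return [item.strip() for row in reader for item in row if item.strip()]


reply = 'Goa, "Leh, Ladakh", Manali'  # invented sample reply

# Naive splitting breaks the quoted item into two pieces.
naive = [part.strip() for part in reply.split(",")]

# The csv-based version keeps the quoted item together.
robust = parse_comma_separated(reply)

print(naive)
print(robust)
```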

### JSON Parser

This output parser lets users specify a JSON schema and query LLMs for outputs that conform to that schema. Keep in mind that large language models are leaky abstractions: you need a model with sufficient capacity to generate well-formed JSON. In the OpenAI family, DaVinci can do so reliably, but Curie's ability drops off dramatically.
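
Because a model may wrap its JSON in a Markdown code fence or return a truncated object, it can help to see what defensive parsing looks like. The following is a minimal standard-library sketch, assuming the reply is either bare JSON or JSON inside a fenced block. `extract_json` is a hypothetical helper, not part of LangChain.

```python
import json
import re
from typing import Any

# Built indirectly so the fence characters do not appear literally in this example.
FENCE = "`" * 3


def extract_json(raw: str) -> Any:
    """Parse JSON from a reply, unwrapping a Markdown code fence if present."""
    pattern = re.escape(FENCE) + r"(?:json)?\s*(.*?)\s*" + re.escape(FENCE)
    match = re.search(pattern, raw, flags=re.DOTALL)
    candidate = match.group(1) if match else raw
    return json.loads(candidate.strip())


# A well-formed reply wrapped in a fence parses cleanly.
wrapped = f'Here you go:\n{FENCE}json\n{{"place": "Leh-Ladakh"}}\n{FENCE}'
parsed = extract_json(wrapped)
print(parsed)

# A truncated reply raises an error instead of silently returning bad data.
try:
    extract_json('{"place": "Leh-Ladakh", ')
    malformed_rejected = False
except json.JSONDecodeError as exc:
    malformed_rejected = True
    print("malformed JSON:", exc.msg)
```

Failing loudly on malformed output makes it easier to retry the request or fall back to a stronger model.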

In [49]:
from typing import List
from langchain_core.output_parsers import JsonOutputParser
from pydantic import BaseModel, Field

### With Pydantic

In [51]:
class Travel(BaseModel):
    place: str = Field(description = "name of the place")
    description: str = Field(description = "description of the place")
    activities: str = Field(description = "what to do in that place")

In [55]:
travel_query = "Suggest a place in India for going on a trip this summer to avoid heat."


parser = JsonOutputParser(pydantic_object = Travel)

prompt = PromptTemplate(
    template = "Answer the user query.\n{format_instructions}\n{query}\n",
    input_variables = ["query"],
    partial_variables = {"format_instructions": parser.get_format_instructions()},
)

chain = prompt | model | parser

chain.invoke({"query": travel_query})

{'place': 'Leh-Ladakh',
 'description': 'A high-altitude desert region in the northern part of India, known for its stunning landscapes, Buddhist monasteries, and adventurous activities like trekking and river rafting.',
 'activities': 'Explore the monasteries, go trekking in the Himalayas, visit Pangong Lake, experience the local culture and cuisine.'}

### Without Pydantic

In [59]:
travel_query = "Suggest a place in India for going on a trip this summer to avoid heat."


parser = JsonOutputParser()

prompt = PromptTemplate(
    template = "Answer the user query.\n{format_instructions}\n{query}\n",
    input_variables = ["query"],
    partial_variables = {"format_instructions": parser.get_format_instructions()},
)


chain = prompt | model | parser

chain.invoke({"query": travel_query})

{'destination': 'Leh-Ladakh',
 'description': 'Located in the northernmost region of India, Leh-Ladakh offers a cool and pleasant climate during the summer months. With its stunning landscapes, monasteries, and adventure activities like trekking and river rafting, it is the perfect destination to escape the heat.',
 'activities': ['Trekking',
  'River rafting',
  'Visiting monasteries',
  'Exploring the stunning landscapes'],
 'average temperature': '15-25 degrees Celsius'}

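Pydantic helps us keep track of the fields coming back from the model and control their data formats. With the `Travel` schema, the output has exactly the declared keys; without it, the model picks its own keys and types (for example `destination` instead of `place`, and a list instead of a string for `activities`).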
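
The difference between the two outputs above can be checked mechanically. The sketch below is a dependency-free stand-in for the kind of field and type checking a Pydantic model performs. `TRAVEL_FIELDS` and `schema_problems` are hypothetical names, and the dictionaries are abbreviated copies of the outputs shown earlier. If `JsonOutputParser` only uses the Pydantic model to build format instructions (which appears to be the case, though this is an assumption worth verifying for your version), an explicit check like this may still be useful.

```python
from typing import Any, Dict, List

# Mirrors the fields declared on the Travel model above.
TRAVEL_FIELDS: Dict[str, type] = {"place": str, "description": str, "activities": str}


def schema_problems(data: Dict[str, Any], fields: Dict[str, type]) -> List[str]:
    """Return a list of human-readable mismatches between data and the expected fields."""
    problems = []
    for name, expected in fields.items():
        if name not in data:
            problems.append(f"missing field: {name}")
        elif not isinstance(data[name], expected):
            actual = type(data[name]).__name__
            problems.append(f"{name}: expected {expected.__name__}, got {actual}")
    return problems


# Abbreviated versions of the two outputs shown above.
with_schema = {
    "place": "Leh-Ladakh",
    "description": "A high-altitude desert region.",
    "activities": "Explore the monasteries, go trekking.",
}
without_schema = {
    "destination": "Leh-Ladakh",
    "description": "Cool and pleasant in summer.",
    "activities": ["Trekking", "River rafting"],
}

print(schema_problems(with_schema, TRAVEL_FIELDS))
print(schema_problems(without_schema, TRAVEL_FIELDS))
```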

### Structured Output Parser

Getting structured output without using Pydantic

In [63]:
from langchain.output_parsers import ResponseSchema, StructuredOutputParser

In [65]:
response_schemas = [
    ResponseSchema(name="answer", description = "answer to the user's question"),
    ResponseSchema(name="description", description = "detailed description on the answer topic"),
    ResponseSchema(name="applications", description = "real world applications of the answer topic"),
]

output_parser = StructuredOutputParser.from_response_schemas(response_schemas)

In [73]:
format_instructions = output_parser.get_format_instructions()

prompt = PromptTemplate(
    template = "Answer the user's question as best as possible.\n{format_instructions}\n{question}",
    input_variables = ["question"],
    partial_variables = {"format_instructions": format_instructions},
)

In [75]:
chain = prompt | model | output_parser
chain.invoke({"question": "Name an invention in Healthcare that has caused revolution in the twenty first century."})

{'answer': 'Telemedicine',
 'description': 'Telemedicine is the remote diagnosis and treatment of patients using telecommunications technology. It has revolutionized healthcare by allowing patients to consult with healthcare providers remotely, reducing the need for in-person visits and improving access to medical care.',
 'applications': 'Telemedicine is used for virtual doctor visits, remote monitoring of patients with chronic conditions, telepsychiatry, telestroke services, and more. It has been particularly valuable during the COVID-19 pandemic for providing safe and efficient healthcare services.'}
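
To give a feel for how a list of named fields with descriptions can be turned into format instructions for the prompt, here is a rough standard-library sketch. `render_schema` is a hypothetical helper; the exact text LangChain generates from `get_format_instructions()` may differ, so treat this only as an illustration of the idea.

```python
from typing import List, Tuple


def render_schema(fields: List[Tuple[str, str]]) -> str:
    """Render (name, description) pairs as a JSON-like schema for the prompt."""
    lines = [f'\t"{name}": string  // {description}' for name, description in fields]
    return "{\n" + "\n".join(lines) + "\n}"


# Same names and descriptions as the response_schemas defined above.
fields = [
    ("answer", "answer to the user's question"),
    ("description", "detailed description on the answer topic"),
    ("applications", "real world applications of the answer topic"),
]

rendered = render_schema(fields)
print(rendered)
```

Printing `format_instructions` in the notebook shows the text that is actually sent to the model.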