# **5-output-parsers**

## **🤖 Introduction**
- **Output Parsers**: Tools or methods that help **format** the LLM’s answer into a **desired structure** (e.g., JSON, CSV, bullet points).

## **⚙️ Setup**
1. **Clone** or **Download** the GitHub repository.
2. In **terminal**:
   ```
   cd project_name
   pyenv local 3.11.4
   poetry install
   poetry shell
   ```
3. Launch **Jupyter Lab**:
   ```
   jupyter lab
   ```
   - Open the **`005-output-parsers.ipynb`** file in the notebooks folder.
4. **View Code** in your preferred editor (e.g., VS Code):
   - Open **`005-output-parsers.py`** to see or edit parser logic.

---

## **🔐 Create Your `.env` File**
- Rename **`.env.example`** to **`.env`**.
- Insert your **API keys**:
  ```
  OPENAI_API_KEY=your_openai_api_key
  LANGCHAIN_TRACING_V2=true
  LANGCHAIN_ENDPOINT=https://api.smith.langchain.com
  LANGCHAIN_API_KEY=your_langchain_api_key
  LANGCHAIN_PROJECT=your_project_name
  ```
- This LangSmith project is named **`005-output-parsers`**.

---

## **📊 Track Operations**
- You can **monitor usage and costs** in **LangSmith**:
  ```
  smith.langchain.com
  ```

> **💡 Pro Tip**: With **output parsers**, you can **standardize** how the LLM’s text is returned, making it easier to integrate with downstream processes (like data analysis or UI rendering).


## Load the .env file located in the same directory as this notebook

If you are using the pre-loaded poetry shell, you do not need to install the following package because it is already pre-loaded for you:

In [None]:
#!pip install python-dotenv

In [None]:
import os
from dotenv import load_dotenv, find_dotenv
_ = load_dotenv(find_dotenv())
openai_api_key = os.environ["OPENAI_API_KEY"]
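
`os.environ["OPENAI_API_KEY"]` raises a bare `KeyError` when the variable is missing. A minimal sketch of a friendlier check, using only the standard library, could look like the following. The helper name `require_env` and the sample key are hypothetical and are not part of `python-dotenv`.

```python
import os


def require_env(name, env=None):
    """Return an environment variable's value, or fail with a clear message.

    `require_env` is an illustrative helper, not part of python-dotenv.
    """
    env = os.environ if env is None else env
    value = env.get(name, "").strip()
    if not value:
        raise RuntimeError(
            f"{name} is not set. Add it to your .env file and reload the environment."
        )
    return value


# Usage with an explicit mapping, so the example runs without a real key.
fake_env = {"OPENAI_API_KEY": "sk-example"}
print(require_env("OPENAI_API_KEY", fake_env))
```

Passing the mapping explicitly keeps the helper easy to test; calling `require_env("OPENAI_API_KEY")` with no second argument reads from the real environment.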

#### Install LangChain

If you are using the pre-loaded poetry shell, you do not need to install the following package because it is already pre-loaded for you:

In [None]:
#!pip install langchain

## Connect with an LLM

If you are using the pre-loaded poetry shell, you do not need to install the following package because it is already pre-loaded for you:

In [None]:
#!pip install langchain-openai

* NOTE: We use OpenAI by default because its models are currently among the strongest available. A later lesson shows how to connect to open-source LLMs such as Llama 3 or Mistral.

## LLM Model
* The prevailing approach before the launch of ChatGPT and GPT-4: the model takes a text string and returns a text completion.
* See LangChain documentation about LLM Models [here](https://python.langchain.com/v0.1/docs/modules/model_io/llms/).

In [None]:
from langchain_openai import OpenAI

llmModel = OpenAI()

## Chat Model
* The general trend after the launch of ChatGPT and GPT-4.
    * Often referred to as a "chatbot".
    * Conversation between Human and AI.
    * Can have a system prompt defining the tone or the role of the AI.
* See LangChain documentation about Chat Models [here](https://python.langchain.com/v0.1/docs/modules/model_io/chat/).
* By default we will work with ChatOpenAI. See [here](https://python.langchain.com/v0.1/docs/integrations/chat/openai/) the LangChain documentation page about it.

In [None]:
from langchain_openai import ChatOpenAI

chatModel = ChatOpenAI(model="gpt-3.5-turbo-0125")

## Parsing Outputs
* See the corresponding LangChain Documentation page [here](https://python.langchain.com/v0.1/docs/modules/model_io/output_parsers/).
* Language models output text, but sometimes we want the answer in a different format, such as a JSON dictionary or an XML document. Output parsers handle that conversion.
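
To make the idea concrete, here is a rough sketch of what a JSON output parser does conceptually: it takes the raw model text, strips an optional Markdown code fence, and loads the rest as JSON. This is a simplified stand-in, not LangChain's implementation; `parse_json_text` and the sample reply are made up for illustration.

```python
import json
import re

# Three backticks, built programmatically so this example stays easy to embed.
FENCE = "`" * 3
FENCED_BLOCK = re.compile(FENCE + r"(?:json)?\s*(.*?)" + FENCE, re.DOTALL)


def parse_json_text(text):
    """Parse model output as JSON, tolerating a surrounding Markdown code fence."""
    match = FENCED_BLOCK.search(text)
    # Keep only the fenced body when a fence is present; otherwise use the text as-is.
    payload = match.group(1) if match else text
    return json.loads(payload.strip())


# A made-up model reply, used only for illustration.
raw_reply = "Here you go:\n" + FENCE + 'json\n{"answer": "Russia"}\n' + FENCE
print(parse_json_text(raw_reply))
```

If the text is not valid JSON, `json.loads` raises `json.JSONDecodeError`, which is the kind of failure a real parser has to report.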

In [None]:
# 🤖 Import PromptTemplate to define a structured prompt
from langchain_core.prompts import PromptTemplate

# 🗄 Import SimpleJsonOutputParser to parse LLM output into JSON
from langchain.output_parsers.json import SimpleJsonOutputParser

# 🏷 Create a prompt that instructs the model to return a JSON object with an `answer` key
json_prompt = PromptTemplate.from_template(
    "Return a JSON object with an `answer` key that answers the following question: {question}"
)

# 🛠 Instantiate the parser that will handle the JSON output
json_parser = SimpleJsonOutputParser()

# 🔗 Chain the prompt, LLM, and parser together
json_chain = json_prompt | llmModel | json_parser

1. **Prompt Definition**:  
   - `json_prompt` directs the LLM to produce an **answer** inside a **JSON object**.  
   - By specifying `"Return a JSON object with an 'answer' key..."`, you shape how the model formats its output.

2. **Parser Setup**:  
   - `SimpleJsonOutputParser` parses the LLM’s textual response as JSON and returns the result as a **Python object** (here, a dictionary).  
   - If the LLM follows the prompt, the dictionary contains an `"answer"` key, with no manual parsing needed.

3. **Chaining**:  
   - `json_chain = json_prompt | llmModel | json_parser` creates a **pipeline**:  
     1. **Prompt** is formatted with user input.  
     2. **llmModel** generates text following the prompt’s instructions.  
     3. **json_parser** ensures the final output is a **Python dictionary** (or raises an error if the format is incorrect).
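
The three steps above can be sketched in plain Python. The example below imitates the `|` syntax with a small hypothetical `Step` class that overloads `__or__`; it is not how LangChain implements its runnables, and `fake_llm` returns a canned string instead of calling a model.

```python
import json


class Step:
    """Wrap a function so steps can be composed with `|`, left to right."""

    def __init__(self, func):
        self.func = func

    def __or__(self, other):
        # (a | b).invoke(x) runs a first, then feeds its result to b.
        return Step(lambda value: other.invoke(self.invoke(value)))

    def invoke(self, value):
        return self.func(value)


# 1. Prompt: fill the template with user input.
sketch_prompt = Step(lambda inputs: (
    "Return a JSON object with an `answer` key that answers "
    f"the following question: {inputs['question']}"
))

# 2. Model: a canned reply stands in for a real LLM call (assumption).
fake_llm = Step(lambda prompt_text: '{"answer": "Russia"}')

# 3. Parser: turn the text into a Python dictionary, or raise on invalid JSON.
sketch_parser = Step(json.loads)

sketch_chain = sketch_prompt | fake_llm | sketch_parser
result = sketch_chain.invoke({"question": "What is the biggest country?"})
print(result["answer"])
```

Because each step only sees the previous step's output, swapping the canned reply for a real model call would not require changes to the prompt or parser steps.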

### **💡 Why Use an Output Parser?**
- **Consistent Structure**: You can rely on a **known** schema (e.g., `{"answer": "..."}`).  
- **Reduced Post-Processing**: No need to manually parse or guess the LLM’s formatting.  
- **Automation-Friendly**: The parsed JSON is ready for further steps in your workflow, like storing results or passing them to another function.

> **🚀 Pro Tip**: Combine output parsers with **chains** to build complex pipelines—ensuring each step receives **structured** data rather than free-form text.

#### The previous prompt template includes the parser instructions

In [None]:
json_parser.get_format_instructions()

'Return a JSON object.'

In [None]:
json_chain.invoke({"question": "What is the biggest country?"})

{'answer': 'Russia'}

#### Optionally, you can use Pydantic to define a custom output format

In [None]:
from langchain_core.output_parsers import JsonOutputParser  # 🗃 Import the JSON Output Parser
from langchain_core.prompts import PromptTemplate           # 🏷 For creating structured prompts
from langchain_core.pydantic_v1 import BaseModel, Field     # 📦 Pydantic BaseModel & Field for defining a schema
from langchain_openai import ChatOpenAI                     # 🤖 ChatOpenAI to interface with OpenAI's chat models

In [None]:
# Define a Pydantic Object with the desired output format.
class Joke(BaseModel):
    setup: str = Field(description="question to set up a joke")     #  "setup" is the initial part of the joke
    punchline: str = Field(description="answer to resolve the joke")#  "punchline" is the comedic conclusion

In [None]:
# Define the parser, referencing the Pydantic object
parser = JsonOutputParser(pydantic_object=Joke)  # 🛠 Uses the Joke schema to build the format instructions

# Add the parser format instructions in the prompt definition.
prompt = PromptTemplate(
    template="Answer the user query.\n{format_instructions}\n{query}\n",
    input_variables=["query"],
    partial_variables={"format_instructions": parser.get_format_instructions()},
)

# Create a chain with the prompt and the parser
chain = prompt | chatModel | parser

chain.invoke({"query": "Tell me a joke."})

{'setup': "Why couldn't the bicycle find its way home?",
 'punchline': 'Because it lost its bearings!'}

### **🔎 How This Works**
1. **Pydantic Schema** (`Joke`):
   - **`setup`**: The “lead-in” question or statement for the joke.  
   - **`punchline`**: The comedic resolution or answer.

2. **`JsonOutputParser`**:
   - Uses the `Joke` schema to generate format instructions that ask the model for **matching** JSON.
   - If the model's output is not valid JSON, the parser raises an error. It does not check the parsed result against the schema.

3. **Prompt Definition**:
   - The template includes a `{format_instructions}` placeholder, which is populated by `parser.get_format_instructions()`.
   - This ensures the model knows exactly how to **format** its response as valid JSON.

4. **Chaining**:
   - `prompt | chatModel | parser` creates a **pipeline**:
     1. **Prompt** is prepared with the user’s query.
     2. **chatModel** generates text following the prompt’s rules.
     3. **parser** converts the reply into a **Python dictionary** with `setup` and `punchline` keys, as the output above shows. To get a validated `Joke` instance instead, LangChain provides `PydanticOutputParser`.
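
As a rough illustration of how format instructions can be derived from a schema and injected into the prompt, the sketch below uses a standard-library dataclass in place of the Pydantic model. The names `JokeSketch` and `build_format_instructions` are hypothetical, and the instruction wording is invented; the text LangChain generates is different.

```python
from dataclasses import dataclass, field, fields


@dataclass
class JokeSketch:
    # Field descriptions live in metadata, mirroring Field(description=...).
    setup: str = field(metadata={"description": "question to set up a joke"})
    punchline: str = field(metadata={"description": "answer to resolve the joke"})


def build_format_instructions(schema):
    """Build a plain-text instruction block listing each expected JSON key."""
    lines = ["Return a JSON object with exactly these keys:"]
    for f in fields(schema):
        type_name = getattr(f.type, "__name__", str(f.type))
        lines.append(f'- "{f.name}" ({type_name}): {f.metadata["description"]}')
    return "\n".join(lines)


# Same template shape as the notebook cell: instructions first, then the query.
template = "Answer the user query.\n{format_instructions}\n{query}\n"
prompt_text = template.format(
    format_instructions=build_format_instructions(JokeSketch),
    query="Tell me a joke.",
)
print(prompt_text)
```

Generating the instructions from the schema keeps the prompt and the expected keys in sync when fields are added or renamed.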

### **💡 Why Use a Custom Schema?**
- **Structured Output**: Instead of free-form text, you get **key-value** pairs matching your domain (e.g., `setup`, `punchline`).
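- **Validation**: The Pydantic schema documents which fields are required and what type each one has. Note that `JsonOutputParser` returns a plain dictionary without validating it; `PydanticOutputParser` validates the output against the model.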
- **Ease of Use**: Downstream code can rely on a well-defined object rather than unpredictable strings.
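
Since the chain above returns a plain dictionary, a validation step can be added downstream. The following is a minimal standard-library sketch of such a check, standing in for what Pydantic would do; `ValidatedJoke` and `validate_joke` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ValidatedJoke:
    setup: str
    punchline: str


def validate_joke(data):
    """Convert a parsed dict into a ValidatedJoke, rejecting missing or non-string fields."""
    problems = []
    for key in ("setup", "punchline"):
        if key not in data:
            problems.append(f"missing field: {key}")
        elif not isinstance(data[key], str):
            problems.append(f"field {key} must be a string")
    if problems:
        raise ValueError("; ".join(problems))
    return ValidatedJoke(setup=data["setup"], punchline=data["punchline"])


joke = validate_joke({
    "setup": "Why couldn't the bicycle find its way home?",
    "punchline": "Because it lost its bearings!",
})
print(joke.punchline)
```

Collecting all problems before raising gives one error message that lists every missing or mistyped field, which is easier to debug than failing on the first one.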

> **⚙️ Pro Tip**: You can extend this approach to **any** structured data (like events, product listings, or user profiles) by defining a suitable Pydantic model and letting `JsonOutputParser` handle the rest.

## How to execute the code from Visual Studio Code
* In Visual Studio Code, open the file `005-output-parsers.py`.
* In the terminal, make sure you are in the directory of the file and run:
    * `python 005-output-parsers.py`