# Getting Specific Formats out of LLMs

Plain text outputs are useful, but in some use cases you need the LLM to generate *structured* output, that is, output in a machine-readable format such as JSON, XML, or CSV, or even in a programming language such as Python or JavaScript. This is very useful when you intend to hand that output off to some other piece of code, so that the LLM plays a part in your larger application.

### JSON Output

The most common format to generate with LLMs is JSON. JSON outputs can (for example) be sent over the wire to your frontend code or be saved to a database.

When generating JSON, the first task is to define the schema you want the LLM to respect when producing the output. Then, you should include that schema in the prompt, along with the text you want to use as the source. Let’s see an example:

```python
from langchain_openai import ChatOpenAI
from pydantic import BaseModel, Field


class AnswerWithJustification(BaseModel):
    """
    An answer to the user's question along
    with justification for the answer.
    """

    # Field descriptions become part of the schema; bare string literals
    # placed after an attribute are ignored by Pydantic by default.
    answer: str = Field(description="The answer to the user's question")
    justification: str = Field(description="Justification for the answer")


# Structured Outputs are available in recent OpenAI models, starting with GPT-4o.
# See the documentation for more details:
# https://platform.openai.com/docs/guides/structured-outputs#supported-models

llm = ChatOpenAI(model="gpt-4o-mini-2024-07-18", temperature=0)
structured_llm = llm.with_structured_output(AnswerWithJustification)

structured_llm.invoke("""What weighs more, 
                        a pound of bricks or 
                        a pound of feathers""")
```

```
AnswerWithJustification(answer='A pound of bricks and a pound of feathers weigh the same.', justification='Both are measured as one pound, so regardless of the material, they have the same weight. The confusion often arises from the volume and density differences; bricks are denser and take up less space than feathers, which are lighter and take up more space. However, in terms of weight, one pound is equal to one pound.')
```

So, the first step is to define a schema. In Python, this is easiest to do with Pydantic (a library for validating data against schemas). In JavaScript, it is easiest to do with Zod (an equivalent library). The `with_structured_output` method uses that schema for two things:

- The schema is converted to a JSON Schema object (a JSON format used to describe the shape of JSON data: its types, names, and descriptions), which is sent to the LLM. For each LLM, LangChain picks the best method to do this, usually function calling or prompting.
- The schema is also used to validate the output returned by the LLM before returning it; this ensures that the output matches the schema you passed in exactly.
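
Both roles of the schema can be reproduced locally without calling a model. The following is a minimal sketch that assumes Pydantic v2: `model_json_schema()` and `model_validate_json()` are Pydantic methods, but whether LangChain calls exactly these internally is an assumption, and the two JSON strings are hand-written stand-ins for LLM output.

```python
from pydantic import BaseModel, Field, ValidationError


class AnswerWithJustification(BaseModel):
    """An answer to the user's question along with justification for the answer."""

    answer: str = Field(description="The answer to the user's question")
    justification: str = Field(description="Justification for the answer")


# Role 1: describe the expected shape of the output as JSON Schema.
schema = AnswerWithJustification.model_json_schema()
print(sorted(schema["properties"]))  # field names in the schema
print(schema["required"])            # fields the output must contain

# Role 2: validate raw output against the schema before using it.
# Both strings below are hand-written stand-ins for model output.
good_output = '{"answer": "They weigh the same.", "justification": "Both are one pound."}'
parsed = AnswerWithJustification.model_validate_json(good_output)
print(parsed.answer)

bad_output = '{"answer": "They weigh the same."}'  # "justification" is missing
try:
    AnswerWithJustification.model_validate_json(bad_output)
    validation_failed = False
except ValidationError as exc:
    validation_failed = True
    print(f"Rejected: {exc.error_count()} validation error(s)")
```

Printing `schema` in full is a quick way to inspect what a model would be asked to follow, including the field descriptions.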

# Use Case

### 1. Customer Feedback Analysis

This example uses an LLM to process customer feedback and output structured data about sentiment and key points. The model analyzes the text and returns:

- Overall sentiment (positive/negative/neutral)
- A numerical sentiment score
- Key points mentioned in the feedback
- Recommended action items based on the feedback

This structured approach allows companies to automatically categorize and prioritize customer feedback at scale, making it easy to identify trends and action items.

```python
from langchain_openai import ChatOpenAI
from pydantic import BaseModel, Field
from typing import List

class SentimentAnalysis(BaseModel):
    """Analysis of customer feedback with sentiment and key points."""
    
    overall_sentiment: str = Field(description="The overall sentiment: positive, negative, or neutral")
    sentiment_score: float = Field(description="A score from -1.0 (negative) to 1.0 (positive)")
    key_points: List[str] = Field(description="List of main points mentioned in the feedback")
    action_items: List[str] = Field(description="Suggested actions based on the feedback")

llm = ChatOpenAI(model="gpt-4o-mini-2024-07-18", temperature=0)
sentiment_analyzer = llm.with_structured_output(SentimentAnalysis)

customer_feedback = """
I've been using your product for 3 months now. The user interface is intuitive 
and I love the new dashboard feature. However, I keep experiencing crashes when 
uploading large files, which is frustrating. Your customer support team was helpful 
but couldn't fully resolve my issue.
"""

analysis = sentiment_analyzer.invoke(f"Analyze this customer feedback: {customer_feedback}")
```

```python
analysis
```

```
SentimentAnalysis(overall_sentiment='mixed', sentiment_score=0.2, key_points=['User interface is intuitive', 'Loves the new dashboard feature', 'Experiences crashes when uploading large files', "Customer support was helpful but couldn't fully resolve the issue"], action_items=['Investigate and fix the crashing issue when uploading large files', 'Enhance customer support training to better resolve technical issues', 'Consider gathering more user feedback on the dashboard feature for further improvements'])
```
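
Note that the output above reports `overall_sentiment='mixed'`, even though the field description lists only positive, negative, or neutral: a plain `str` field accepts any string, so the description acts as a hint rather than a constraint. One way to tighten this, sketched here under the assumption of Pydantic v2, is to use `Literal` and numeric bounds. The class name `StrictSentimentAnalysis` and the helper `is_valid` are hypothetical, and the dictionaries are hand-written stand-ins for model output.

```python
from typing import List, Literal

from pydantic import BaseModel, Field, ValidationError


class StrictSentimentAnalysis(BaseModel):
    """Hypothetical stricter variant of SentimentAnalysis."""

    # Literal restricts the field to exactly these three labels.
    overall_sentiment: Literal["positive", "negative", "neutral"] = Field(
        description="The overall sentiment"
    )
    # ge/le reject scores outside the documented range.
    sentiment_score: float = Field(
        ge=-1.0, le=1.0, description="A score from -1.0 (negative) to 1.0 (positive)"
    )
    key_points: List[str] = Field(description="List of main points mentioned in the feedback")
    action_items: List[str] = Field(description="Suggested actions based on the feedback")


def is_valid(candidate: dict) -> bool:
    """Return True if the candidate dict satisfies the strict schema."""
    try:
        StrictSentimentAnalysis.model_validate(candidate)
        return True
    except ValidationError:
        return False


# Hand-written stand-in for the non-sentiment part of a model response.
base = {
    "sentiment_score": 0.2,
    "key_points": ["User interface is intuitive"],
    "action_items": ["Investigate crashes when uploading large files"],
}

print(is_valid({**base, "overall_sentiment": "neutral"}))  # True
print(is_valid({**base, "overall_sentiment": "mixed"}))    # False
print(is_valid({**base, "overall_sentiment": "neutral", "sentiment_score": 1.5}))  # False
```

A model like this could be passed to `with_structured_output` in place of `SentimentAnalysis`. How strictly a given provider enforces the allowed values and numeric bounds during generation is not something this sketch verifies, and some providers may not accept every JSON Schema keyword.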