# JsonOutputParser
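- Extracts data from the model output according to the JSON schema the user specifies, and returns it.
- The LLM needs sufficient capacity to reliably generate output that follows the requested JSON schema.
- The JSON schema can be defined with a dictionary or with Pydantic.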
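
Conceptually, the extraction step described above comes down to locating a JSON object inside the model's reply and parsing it. The sketch below illustrates that idea with the standard library only. It is a simplified illustration, not LangChain's actual implementation; `extract_json` and the sample `reply` are hypothetical names made up for this example.

```python
import json
import re

FENCE = "`" * 3  # a Markdown code fence, built programmatically
FENCED_BLOCK = re.compile(FENCE + r"(?:json)?\s*(.*?)" + FENCE, re.DOTALL)


def extract_json(text):
    """Return the first JSON object found in a model reply (hypothetical helper)."""
    # Prefer the contents of a fenced block if the model wrapped its answer in one.
    fenced = FENCED_BLOCK.search(text)
    candidate = fenced.group(1) if fenced else text
    # Otherwise fall back to the outermost {...} span in the text.
    start, end = candidate.find("{"), candidate.rfind("}")
    if start == -1 or end == -1:
        raise ValueError("no JSON object found in model output")
    return json.loads(candidate[start : end + 1])


# A made-up reply that mixes chatter with a fenced JSON answer.
reply = (
    "Sure! Here is the answer:\n"
    + FENCE + "json\n"
    + '{"description": "A short summary", "hashtags": "#a #b"}\n'
    + FENCE
)
result = extract_json(reply)
print(result["description"])
```

A smaller model is more likely to emit text that fails at the `json.loads` step, which is one way to read the capacity requirement noted above.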

In [1]:
import os
from dotenv import load_dotenv

load_dotenv()
print(os.environ["MODEL_ID"])

meta-llama/Meta-Llama-3-8B-Instruct


In [2]:
from langchain_community.chat_models.huggingface import ChatHuggingFace
from langchain_community.llms import HuggingFaceEndpoint

llm = HuggingFaceEndpoint(
    repo_id=os.environ["MODEL_ID"], 
    # max_new_tokens=1024,
    temperature=0.1,
    huggingfacehub_api_token=os.environ["HF_API_KEY"],
)
model = ChatHuggingFace(llm=llm)


In [12]:
from langchain_core.pydantic_v1 import BaseModel, Field

class Topic(BaseModel):
    description: str = Field(description="Concise description about topic")
    hashtags: str = Field(description="Some keywords in hashtag format")

In [13]:
from langchain.prompts import PromptTemplate
from langchain_core.output_parsers import JsonOutputParser

query = "Explain about Global Warming."
parser = JsonOutputParser(pydantic_object=Topic)
prompt = PromptTemplate(
    template="Answer the user query.\n{format_instructions}\n{query}\n",
    input_variables=["query"],
    partial_variables={"format_instructions": parser.get_format_instructions()},
)

chain = prompt | model | parser

In [14]:
chain.invoke({"query": query})

{'description': 'Global warming is the long-term rise in the average surface temperature of the Earth due to the increasing levels of greenhouse gases in the atmosphere.',
 'hashtags': '#GlobalWarming #ClimateChange #Sustainability'}

- You can also skip Pydantic, provide no schema at all, and simply receive output in JSON format.

In [8]:
query = "Explain about global warming. Write description of global warming to `description`, and related keywords to `hashtags`"

parser = JsonOutputParser()  # Initialize the JSON output parser without a schema

prompt = PromptTemplate(
    # Template that asks the model to answer the user query
    template="Answer the user query.\n{format_instructions}\n{query}\n",
    input_variables=["query"],  # 'query' is the only input variable
    # Fill in the format instructions as a partial variable
    partial_variables={
        "format_instructions": parser.get_format_instructions()},
)

chain = prompt | model | parser  # Chain the prompt, model, and parser together

chain.invoke({"query": query})  # Invoke the chain on the global warming query

{'description': "Global warming, also known as climate change, is the gradual increase in the overall temperature of the Earth's atmosphere, primarily caused by human activities that release greenhouse gases, such as carbon dioxide and methane, into the atmosphere. These gases trap heat from the sun, leading to a rise in global temperatures. The effects of global warming include more frequent and severe heatwaves, droughts, and storms, as well as rising sea levels and melting of polar ice caps.",
 'hashtags': ['globalwarming',
  'climatechange',
  'greenhousegases',
  'carbonfootprint',
  'sustainability',
  'ecology',
  'environmentalissues']}
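
In the two runs above, `hashtags` came back as a single string in one case and as a list without `#` prefixes in the other. If downstream code needs one shape, a small normalization step can help. The following is a minimal sketch under that assumption; `normalize_hashtags` is a hypothetical helper, not part of LangChain.

```python
def normalize_hashtags(value):
    """Coerce a hashtags value (string or list) to a list of '#tag' strings."""
    # A string such as "#A #B" is split on whitespace; a list is taken as-is.
    items = value.split() if isinstance(value, str) else list(value)
    # Strip any existing leading '#' so the prefix is not duplicated,
    # and drop entries that are empty once the '#' is removed.
    return ["#" + str(item).lstrip("#") for item in items if str(item).strip("#")]


print(normalize_hashtags("#GlobalWarming #ClimateChange #Sustainability"))
print(normalize_hashtags(["globalwarming", "climatechange"]))
```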

# Streaming
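- When streaming, each chunk is a partial JSON object containing the values of all keys received so far.
- If `diff=True`, each chunk instead contains the difference between the current JSON object and the immediately preceding one, expressed as JSONPatch operations.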
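
To make the first bullet concrete, the sketch below shows one way a growing, incomplete JSON string can be turned into a sequence of partial objects: close any open string and brackets, then try to parse. This is a simplified standard-library illustration, not the library's own parser; `parse_partial` and the sample string `full` are hypothetical.

```python
import json


def parse_partial(prefix):
    """Best-effort parse of an incomplete JSON prefix (simplified illustration)."""
    stack = []         # closing brackets still owed, e.g. "}" or "]"
    in_string = False  # True while scanning inside a string literal
    escaped = False    # True right after a backslash inside a string
    for ch in prefix:
        if in_string:
            if escaped:
                escaped = False
            elif ch == "\\":
                escaped = True
            elif ch == '"':
                in_string = False
        elif ch == '"':
            in_string = True
        elif ch == "{":
            stack.append("}")
        elif ch == "[":
            stack.append("]")
        elif ch in "}]" and stack:
            stack.pop()
    # Close the open string (if any), then the open containers, innermost first.
    completed = prefix + ('"' if in_string else "") + "".join(reversed(stack))
    try:
        return json.loads(completed)
    except json.JSONDecodeError:
        return None  # e.g. the prefix stops in the middle of a key or after a comma


# Simulate a stream by feeding the text one character at a time.
full = '{"description": "Global warming", "hashtags": "#Climate"}'
snapshots = []
for end in range(1, len(full) + 1):
    obj = parse_partial(full[:end])
    # Keep only parseable prefixes that changed the object.
    if obj is not None and (not snapshots or obj != snapshots[-1]):
        snapshots.append(obj)

for snapshot in snapshots[:4]:
    print(snapshot)
```

Prefixes that end in the middle of a key are skipped, so a key only appears in a snapshot once its value has started to arrive.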

In [15]:
for s in chain.stream({"query": query}):
    print(s)

{}
{'description': ''}
{'description': 'Global'}
{'description': 'Global warming'}
{'description': 'Global warming is'}
{'description': 'Global warming is the'}
{'description': 'Global warming is the long'}
{'description': 'Global warming is the long-term'}
{'description': 'Global warming is the long-term rise'}
{'description': 'Global warming is the long-term rise in'}
{'description': 'Global warming is the long-term rise in the'}
{'description': 'Global warming is the long-term rise in the average'}
{'description': 'Global warming is the long-term rise in the average surface'}
{'description': 'Global warming is the long-term rise in the average surface temperature'}
{'description': 'Global warming is the long-term rise in the average surface temperature of'}
{'description': 'Global warming is the long-term rise in the average surface temperature of the'}
{'description': 'Global warming is the long-term rise in the average surface temperature of the Earth'}
{'description': 'Global war

In [16]:
parser = JsonOutputParser(pydantic_object=Topic, diff=True)
prompt = PromptTemplate(
    template="Answer the user query.\n{format_instructions}\n{query}\n",
    input_variables=["query"],
    partial_variables={"format_instructions": parser.get_format_instructions()},
)

chain = prompt | model | parser
for s in chain.stream({"query": query}):
    print(s)

[{'op': 'replace', 'path': '', 'value': {}}]
[{'op': 'add', 'path': '/description', 'value': ''}]
[{'op': 'replace', 'path': '/description', 'value': 'Global'}]
[{'op': 'replace', 'path': '/description', 'value': 'Global warming'}]
[{'op': 'replace', 'path': '/description', 'value': 'Global warming is'}]
[{'op': 'replace', 'path': '/description', 'value': 'Global warming is the'}]
[{'op': 'replace', 'path': '/description', 'value': 'Global warming is the long'}]
[{'op': 'replace', 'path': '/description', 'value': 'Global warming is the long-term'}]
[{'op': 'replace', 'path': '/description', 'value': 'Global warming is the long-term rise'}]
[{'op': 'replace', 'path': '/description', 'value': 'Global warming is the long-term rise in'}]
[{'op': 'replace', 'path': '/description', 'value': 'Global warming is the long-term rise in the'}]
[{'op': 'replace', 'path': '/description', 'value': 'Global warming is the long-term rise in the average'}]
[{'op': 'replace', 'path': '/description', 'valu
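
A consumer of the `diff=True` stream can rebuild the current object by applying each chunk's operations in order. The sketch below handles only the `add` and `replace` operations on nested dictionaries, which is all the output above uses. It is an assumption-laden illustration rather than a complete JSONPatch implementation (no array indices, no `remove`, `move`, `copy`, or `test`); `apply_patch` and the sample `chunks` are hypothetical.

```python
import copy


def apply_patch(state, ops):
    """Apply 'add'/'replace' JSONPatch operations to nested dicts (sketch only)."""
    for op in ops:
        if op["op"] not in ("add", "replace"):
            raise NotImplementedError(op["op"])
        if op["path"] == "":
            # An empty path targets the whole document.
            state = copy.deepcopy(op["value"])
            continue
        # Split the JSON Pointer and undo its "~1" -> "/" and "~0" -> "~" escapes.
        keys = [
            key.replace("~1", "/").replace("~0", "~")
            for key in op["path"].split("/")[1:]
        ]
        target = state
        for key in keys[:-1]:
            target = target[key]  # walk down to the parent of the final key
        target[keys[-1]] = copy.deepcopy(op["value"])
    return state


# Chunks shaped like the first few lines of the streamed output above.
chunks = [
    [{"op": "replace", "path": "", "value": {}}],
    [{"op": "add", "path": "/description", "value": ""}],
    [{"op": "replace", "path": "/description", "value": "Global"}],
    [{"op": "replace", "path": "/description", "value": "Global warming"}],
]

state = None
for ops in chunks:
    state = apply_patch(state, ops)

print(state)
```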