# OutputFixingParser
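- Automatically repairs errors that occur while parsing model output.
- Wraps another parser, such as `PydanticOutputParser`, and calls an LLM to fix the output when the wrapped parser cannot handle it.
- `OutputFixingParser` detects the parsing error and resubmits the output to the model together with a new instruction for fixing it. That instruction should point out the error precisely and be specific enough for the model to restructure the data into the correct format.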
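
The wrap-and-retry idea described above can be sketched in plain Python. This is only an illustration under stated assumptions, not LangChain's actual implementation: `ParseError`, `parse_actor`, `build_fix_prompt`, `FixingParser`, and `fake_llm` are hypothetical names, and `fake_llm` is a stand-in that returns a canned answer instead of calling a real model.

```python
import json
from typing import Callable


class ParseError(ValueError):
    """Raised when a completion does not match the expected schema."""


def parse_actor(text: str) -> dict:
    """Strict base parser: needs 'name' (str) and 'film_names' (list of str)."""
    try:
        data = json.loads(text)
    except json.JSONDecodeError as exc:
        raise ParseError(f"invalid JSON: {exc}") from exc
    if not isinstance(data, dict):
        raise ParseError("expected a JSON object")
    if not isinstance(data.get("name"), str):
        raise ParseError("field 'name' is required and must be a string")
    films = data.get("film_names")
    if not isinstance(films, list) or not all(isinstance(f, str) for f in films):
        raise ParseError("field 'film_names' is required and must be a list of strings")
    return data


def build_fix_prompt(instructions: str, completion: str, error: str) -> str:
    """Build the repair instruction: format rules, the bad output, and the exact error."""
    return (
        f"Instructions:\n{instructions}\n\n"
        f"Completion:\n{completion}\n\n"
        f"The completion failed with this error:\n{error}\n\n"
        "Return a corrected completion that satisfies the instructions."
    )


class FixingParser:
    """Wraps a base parser and asks an LLM to repair output the parser rejects."""

    def __init__(self, parse: Callable[[str], dict], llm: Callable[[str], str],
                 instructions: str, max_retries: int = 1) -> None:
        self._parse = parse
        self._llm = llm
        self._instructions = instructions
        self._max_retries = max_retries

    def parse(self, completion: str) -> dict:
        for attempt in range(self._max_retries + 1):
            try:
                return self._parse(completion)
            except ParseError as exc:
                if attempt == self._max_retries:
                    raise  # out of retries: surface the last error
                prompt = build_fix_prompt(self._instructions, completion, str(exc))
                completion = self._llm(prompt)  # try again with the repaired text


def fake_llm(prompt: str) -> str:
    """Hypothetical stand-in for a model call; always returns a valid record."""
    return '{"name": "Tom Hanks", "film_names": ["Forrest Gump"]}'


fixing = FixingParser(parse_actor, fake_llm, "Return JSON with name and film_names.")
result = fixing.parse('{"name": "Tom Hanks"}')
print(result)
```

Passing the error message into the prompt is what lets the model see exactly which field failed, rather than regenerating the answer blindly.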

In [2]:
import os
from dotenv import load_dotenv

load_dotenv()
print(os.environ["MODEL_ID"])

meta-llama/Meta-Llama-3-8B-Instruct


In [3]:
from langchain_community.chat_models.huggingface import ChatHuggingFace
from langchain_community.llms import HuggingFaceEndpoint

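# Text-generation endpoint on the Hugging Face Hub; MODEL_ID and HF_API_KEY come from .env
llm = HuggingFaceEndpoint(
    repo_id=os.environ["MODEL_ID"],
    max_new_tokens=1024,
    temperature=0.1,  # low temperature keeps the output close to deterministic
    huggingfacehub_api_token=os.environ["HF_API_KEY"],
)
# Chat wrapper around the same endpoint (the cells below pass `llm` directly)
model = ChatHuggingFace(llm=llm)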



In [4]:
from langchain.output_parsers import PydanticOutputParser
from langchain_core.pydantic_v1 import BaseModel, Field
from typing import List


class Actor(BaseModel):
    name: str = Field(description="name of an actor")
    film_names: List[str] = Field(description="list of names of films they starred in")


actor_query = "Generate the filmography for a random actor."

parser = PydanticOutputParser(pydantic_object=Actor)

In [9]:
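# Deliberately malformed input: the required `film_names` field is missing
misformatted = """{"name": "Tom Hanks"}"""
# Other malformed variants to try:
# misformatted = """{"name": "Tom Hanks", "film_names": 123}"""
# misformatted = "{'name': 'Tom Hanks', 'film_names': ['Forrest Gump']}"

# Try to parse the malformed input; this raises OutputParserException
parser.parse(misformatted)

# Error output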

OutputParserException: Failed to parse Actor from completion {"name": "Tom Hanks"}. Got: 1 validation error for Actor
film_names
  field required (type=value_error.missing)
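
The three `misformatted` variants in the cell above each break in a different way. The sketch below uses only the standard library to show roughly which check each one fails. `diagnose` is a hypothetical helper written for this illustration; the real `PydanticOutputParser` reports these problems with Pydantic's own messages, which are worded differently.

```python
import json

# The three malformed inputs from the cell above
candidates = {
    "missing field": '{"name": "Tom Hanks"}',
    "wrong type": '{"name": "Tom Hanks", "film_names": 123}',
    "single quotes": "{'name': 'Tom Hanks', 'film_names': ['Forrest Gump']}",
}


def diagnose(text: str) -> str:
    """Return a short description of the first problem found (hypothetical helper)."""
    try:
        data = json.loads(text)
    except json.JSONDecodeError:
        # JSON requires double-quoted strings, so the Python-style dict fails here
        return "not valid JSON"
    if "film_names" not in data:
        return "film_names is missing"
    if not isinstance(data["film_names"], list):
        return "film_names is not a list"
    return "ok"


for label, text in candidates.items():
    print(f"{label}: {diagnose(text)}")
```

Only the first variant reaches schema validation with a missing field; the single-quoted variant fails earlier, at the JSON decoding step.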

In [10]:
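from langchain.output_parsers import OutputFixingParser

# Wrap the original parser; when it fails, the LLM is asked to repair the output
new_parser = OutputFixingParser.from_llm(parser=parser, llm=llm)
# The missing `film_names` values are filled in by the LLM
actor = new_parser.parse(misformatted)
actor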

Actor(name='Tom Hanks', film_names=['Forrest Gump', 'Philadelphia'])