### **PandasDataFrameOutputParser**


This output parser lets you specify an arbitrary Pandas DataFrame and ask an LLM to query it. The parser extracts the requested data from that DataFrame and returns it as a formatted dictionary.


In [1]:
import pprint
from typing import Any, Dict

import pandas as pd
from langchain.output_parsers import PandasDataFrameOutputParser
from langchain_core.prompts import PromptTemplate
from langchain_groq import ChatGroq

In [2]:
# Display helper
# Used for output purposes only.
def format_parser_output(parser_output: Dict[str, Any]) -> None:
    # Iterate over the keys of the parser output.
    for key in parser_output.keys():
        # Convert each value (a pandas object) to a dictionary, in place.
        parser_output[key] = parser_output[key].to_dict()
    # Pretty-print the result.
    return pprint.PrettyPrinter(width=4, compact=True).pprint(parser_output)
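
The helper above calls `.to_dict()` on every value, which suits column and row results (pandas objects) but would raise `AttributeError` on a scalar result such as `{'mean': 31.2}`. A minimal sketch of a more tolerant variant follows; the name `to_plain_dict` is hypothetical and not part of LangChain, and unlike the helper above it returns a new dictionary instead of modifying its argument.

```python
import pprint
from typing import Any, Dict

import pandas as pd


def to_plain_dict(parser_output: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy in which pandas objects are converted to plain dicts.

    Hypothetical helper for illustration; scalar values pass through unchanged.
    """
    return {
        key: value.to_dict() if hasattr(value, "to_dict") else value
        for key, value in parser_output.items()
    }


# A column-style result (pandas Series) and a scalar-style result.
column_result = {"Age": pd.Series([22.0, 38.0, 26.0], name="Age")}
scalar_result = {"mean": 31.2}

pprint.pprint(to_plain_dict(column_result))
pprint.pprint(to_plain_dict(scalar_result))
```
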


In [5]:
df = pd.read_csv('data/titanic.csv')
df.head(3)

|   | PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 0 | 3 | Braund, Mr. Owen Harris | male | 22.0 | 1 | 0 | A/5 21171 | 7.25 | NaN | S |
| 1 | 2 | 1 | 1 | Cumings, Mrs. John Bradley (Florence Briggs Th... | female | 38.0 | 1 | 0 | PC 17599 | 71.2833 | C85 | C |
| 2 | 3 | 1 | 3 | Heikkinen, Miss. Laina | female | 26.0 | 0 | 0 | STON/O2. 3101282 | 7.925 | NaN | S |


In [42]:
parser = PandasDataFrameOutputParser(dataframe=df)
print(parser.get_format_instructions())

The output should be formatted as a string as the operation, followed by a colon, followed by the column or row to be queried on, followed by optional array parameters.
1. The column names are limited to the possible columns below.
2. Arrays must either be a comma-separated list of numbers formatted as [1,3,5], or it must be in range of numbers formatted as [0..4].
3. Remember that arrays are optional and not necessarily required.
4. If the column is not in the possible columns or the operation is not a valid Pandas DataFrame operation, return why it is invalid as a sentence starting with either "Invalid column" or "Invalid operation".

As an example, for the formats:
1. String "column:num_legs" is a well-formatted instance which gets the column num_legs, where num_legs is a possible column.
2. String "row:1" is a well-formatted instance which gets row 1.
3. String "column:num_legs[1,2]" is a well-formatted instance which gets the column num_legs for rows 1 and 2, where num_legs is a p
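
To make the query format described above concrete, here is a minimal standard-library sketch that splits such a string into its operation, target, and optional numeric array. It is not LangChain's actual implementation: the names `ParsedQuery` and `parse_query` are hypothetical, only numeric arrays are handled, and treating a range like `[0..4]` as inclusive of both ends is an assumption.

```python
import re
from typing import List, NamedTuple, Optional


class ParsedQuery(NamedTuple):
    operation: str                 # e.g. "column", "row", "mean"
    target: str                    # column name or row number, kept as text
    indices: Optional[List[int]]   # None when no array was given


# operation ":" target, optionally followed by "[...]"
_QUERY_RE = re.compile(r"^(?P<op>\w+):(?P<target>[^\[\]]+)(?:\[(?P<array>[^\[\]]*)\])?$")


def parse_query(text: str) -> ParsedQuery:
    """Parse strings such as 'column:num_legs[1,2]' (illustrative sketch only)."""
    match = _QUERY_RE.match(text.strip())
    if match is None:
        raise ValueError(f"Badly formatted query: {text!r}")

    indices: Optional[List[int]] = None
    array = match.group("array")
    if array is not None:
        array = array.replace(" ", "")
        range_match = re.fullmatch(r"(\d+)\.\.(\d+)", array)
        if range_match:
            start, end = map(int, range_match.groups())
            # Assumption: both endpoints of the range are included.
            indices = list(range(start, end + 1))
        elif re.fullmatch(r"\d+(,\d+)*", array):
            indices = [int(n) for n in array.split(",")]
        else:
            raise ValueError(f"Badly formatted array: [{array}]")

    return ParsedQuery(match.group("op"), match.group("target"), indices)


print(parse_query("column:num_legs"))
print(parse_query("row:1"))
print(parse_query("column:num_legs[1,2]"))
print(parse_query("mean:Age[0..4]"))
```
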

### **Writing the PromptTemplate: Method 1**

In [None]:
prompt = PromptTemplate.from_template(
 '''
 Answer the User Questions 
 
 # format 
 {format_instruct}
 
 # Questions
 {question}
 '''   
)

prompt = prompt.partial(format_instruct = parser.get_format_instructions())

llm = ChatGroq(model = 'gemma2-9b-it')

chain = prompt | llm | parser

chain.invoke('Age 컬럼을 조회해줘')  # Korean for "Retrieve the Age column"

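### **Writing the PromptTemplate: Method 2**

- Instead of `from_template`, call the `PromptTemplate` constructor and pass the variables explicitly. `partial_variables` takes a dictionary of variables to pre-fill in the template.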

In [55]:
prompt = PromptTemplate(
    template='Answer the User Questions \n format \n {format_instruct} \n Questions \n {question}',
    input_variables=['question'],
    partial_variables={'format_instruct': parser.get_format_instructions()},
)

llm = ChatGroq(model = 'gemma2-9b-it')

chain = prompt | llm | parser

chain.invoke('Age 컬럼을 조회해줘')  # Korean for "Retrieve the Age column"

{'Age': 0     22.0
 1     38.0
 2     26.0
 3     35.0
 4     35.0
 5      NaN
 6     54.0
 7      2.0
 8     27.0
 9     14.0
 10     4.0
 11    58.0
 12    20.0
 13    39.0
 14    14.0
 15    55.0
 16     2.0
 17     NaN
 18    31.0
 19     NaN
 Name: Age, dtype: float64}

### **Querying the first row**

In [71]:
chain.invoke('0행을 조회해줘')  # Korean for "Retrieve row 0"

{'0': PassengerId                          1
 Survived                             0
 Pclass                               3
 Name           Braund, Mr. Owen Harris
 Sex                               male
 Age                               22.0
 SibSp                                1
 Parch                                0
 Ticket                       A/5 21171
 Fare                              7.25
 Cabin                              NaN
 Embarked                             S
 Name: 0, dtype: object}

### **Mean of selected rows in a specific column**

In [82]:
chain.invoke('find the average of rows 0 to 4 by Age')

{'mean': 31.2}

In [None]:
chain.invoke('Age 컬럼에서 0행부터 4행까지의 평균을 구해줘')  # Korean for "Compute the mean of rows 0 through 4 of the Age column"

{'mean': 31.2}
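
As a sanity check on this result, the same mean can be computed directly with pandas. This sketch uses only the first five `Age` values shown in the column output earlier, so it runs without the CSV file or an LLM; the variable names are illustrative.

```python
import pandas as pd

# First five Age values (rows 0 through 4), copied from the column output above.
ages = pd.DataFrame({"Age": [22.0, 38.0, 26.0, 35.0, 35.0]})

# .loc slices by label and includes both endpoints, so 0:4 covers rows 0 through 4.
mean_age = float(ages.loc[0:4, "Age"].mean())

print({"mean": mean_age})
```

The five values sum to 156, and 156 / 5 = 31.2, which matches the parser output when rows 0 through 4 are all included.
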

In [83]:
chain.invoke("Calculate average `Fare` rate.")

{'mean': 22.19937}

In [87]:
chain.invoke("Fare의 평균을 구해줘")  # Korean for "Compute the mean of Fare"

{'mean': 22.19937}