# Adaptive RAG

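Recursive retrieval strategy for table information in PDFs
- Separate the chunks used for retrieval from the chunks used for answer generation, across multiple CSV tables
- Retrieve adaptively over the table-retrieval chunks, depending on the user query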


In [39]:
import camelot

from llama_index.core import VectorStoreIndex
from llama_index.experimental.query_engine import PandasQueryEngine
from llama_index.core.schema import IndexNode
from llama_index.llms.openai import OpenAI

from llama_index.readers.file import PyMuPDFReader
from typing import List

In [40]:
from llama_index.embeddings.openai import OpenAIEmbedding
from llama_index.llms.openai import OpenAI
from llama_index.core import Settings

# Define the LLM and embedding model used in the rest of the notebook
Settings.llm = OpenAI(model="gpt-4o-mini", temperature=0)
Settings.embed_model = OpenAIEmbedding(model="text-embedding-3-small")

In [41]:
# Path of the file to parse
file_path = "./billionaires_page.pdf"

In [42]:
# Define the PDF parser
reader = PyMuPDFReader()

In [43]:
# Load the file from the given path and store the result as Document objects
docs = reader.load(file_path)

In [44]:
# Inspect the loaded documents
docs

[Document(id_='359fcdd1-5cfc-4eca-b40d-40ec0b67ac93', embedding=None, metadata={'total_pages': 32, 'file_path': './billionaires_page.pdf', 'source': '1'}, excluded_embed_metadata_keys=[], excluded_llm_metadata_keys=[], relationships={}, metadata_template='{key}: {value}', metadata_separator='\n', text_resource=MediaResource(embeddings=None, data=None, text="29/09/2023, 10:17\nThe World's Billionaires - Wikipedia\nhttps://en.wikipedia.org/wiki/The_World%27s_Billionaires\n1/32\nThe World's Billionaires\nList of the world's billionaires, ranked in order of net worth\nThe net worth of the world's billionaires increased from\nless than US$1 trillion in 2000 to over $7 trillion in 2015.\nPublication details\nPublisher\nWhale Media Investments\nForbes family\nPublication\nForbes\nFirst published\nMarch 1987[1]\nLatest publication\nApril 4, 2023\nCurrent list details (2023)[2]\nWealthiest\nBernard Arnault\nNet worth (1st)\n\xa0US$211\xa0billion\nNumber of\nbillionaires\n\xa02,640 (from 2668)\n

In [45]:
from llama_index.core import Settings
# Parse the documents into nodes
doc_nodes = Settings.node_parser.get_nodes_from_documents(docs)

In [46]:
# Build a naive RAG baseline for comparison
vector_index0 = VectorStoreIndex(doc_nodes)
vector_query_engine0 = vector_index0.as_query_engine()

INFO:httpx:HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"


In [47]:

response = vector_query_engine0.query(
    "How many billionaires were there in 2009"
)

INFO:httpx:HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
INFO:httpx:HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"


In [48]:
print(response.source_nodes[0].node.get_content())

29/09/2023, 10:17
The World's Billionaires - Wikipedia
https://en.wikipedia.org/wiki/The_World%27s_Billionaires
12/32
No.
Name
Net worth
(USD)
Age
Nationality
Source(s) of wealth
1 
Carlos Slim
$74.0 billion 
71
 Mexico
América Móvil, Grupo Carso
2 
Bill Gates
$56.0 billion 
55
 United
States
Microsoft
3 
Warren Buffett
$50.0 billion 
80
 United
States
Berkshire Hathaway
4 
Bernard Arnault
$41.0 billion 
62
 France
LVMH Moët Hennessy • Louis
Vuitton
5 
Larry Ellison
$39.5 billion 
66
 United
States
Oracle Corporation
6 
Lakshmi Mittal
$31.1 billion 
60
 India
Arcelor Mittal
7 
Amancio Ortega
$31.0 billion 
74
 Spain
Inditex Group
8 
Eike Batista
$30.0 billion 
53
 Brazil
EBX Group
9 
Mukesh Ambani
$27.0 billion 
54
 India
Reliance Industries
10 
Christy Walton &
family
$26.5 billion 
62
 United
States
Walmart
Slim narrowly eclipsed Gates to top the billionaire list for the first time. Slim saw his estimated
worth surge $18.5 billion to $53.5 billion as shares of America Movil rose 35 p

In [49]:
print(str(response))

In 2009, there were a total of 1,011 billionaires.


In [50]:
response = vector_query_engine0.query(
    "What's the net worth of the second richest billionaire in 2023?"
)
print(str(response))

INFO:httpx:HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
INFO:httpx:HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
The net worth of the second richest billionaire in 2023 is $98 billion.




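- The basic PDF parsing module interprets non-text-only content such as tables poorly: for the 2023 runner-up it answered $98 billion, whereas the table parsed from the same PDF lists $180 billion.
- What if the table information is extracted separately and used to answer?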

In [52]:
# Parse the tables in the PDF with camelot
def get_tables(path: str, pages: List[int]):
    table_dfs = []
    for page in pages:
        table_list = camelot.read_pdf(path, pages=str(page))
        for table in table_list:
            table_df = table.df
            table_df = (
                table_df.rename(columns=table_df.iloc[0])
                .drop(table_df.index[0])
                .reset_index(drop=True)
            )
            table_dfs.append(table_df)
    return table_dfs

In [53]:
table_dfs = get_tables(file_path, pages=[3,4,24])

CropBox missing from /Page, defaulting to MediaBox
INFO:camelot:Processing page-3
CropBox missing from /Page, defaulting to MediaBox
INFO:camelot:Processing page-4
CropBox missing from /Page, defaulting to MediaBox
INFO:camelot:Processing page-24


In [54]:
# Check the number of parsed tables
len(table_dfs)

5

In [55]:
# Inspect the parsed result
table_dfs[0]

Unnamed: 0,No.,Name,Net worth\n(USD),Age,Nationality,Primary source(s) of\nwealth
0,1,Bernard Arnault &\nfamily,$211 billion,74,France,LVMH
1,2,Elon Musk,$180 billion,51,United\nStates,"Tesla, SpaceX"
2,3,Jeff Bezos,$114 billion,59,United\nStates,Amazon
3,4,Larry Ellison,$107 billion,78,United\nStates,Oracle Corporation
4,5,Warren Buffett,$106 billion,92,United\nStates,Berkshire Hathaway
5,6,Bill Gates,$104 billion,67,United\nStates,Microsoft
6,7,Michael Bloomberg,$94.5 billion,81,United\nStates,Bloomberg L.P.
7,8,Carlos Slim & family,$93 billion,83,Mexico,"Telmex, América Móvil, Grupo\nCarso"
8,9,Mukesh Ambani,$83.4 billion,65,India,Reliance Industries
9,10,Steve Ballmer,$80.7 billion,67,United\nStates,Microsoft
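
The parsed output above carries line breaks inside headers and cells (for example `Net worth\n(USD)` and `United\nStates`), which makes exact string matching against these values fragile. A minimal sketch of one way to normalize them, using plain lists instead of the DataFrame so it runs on its own; `clean_cell` and `clean_table` are hypothetical helper names, not part of camelot or LlamaIndex:

```python
import re


def clean_cell(text: str) -> str:
    """Collapse line breaks and repeated whitespace inside one table cell."""
    return re.sub(r"\s+", " ", text).strip()


def clean_table(header, rows):
    """Apply clean_cell to the header row and to every cell of every data row."""
    cleaned_header = [clean_cell(h) for h in header]
    cleaned_rows = [[clean_cell(c) for c in row] for row in rows]
    return cleaned_header, cleaned_rows


# Values copied from the first parsed table above.
header = ["No.", "Name", "Net worth\n(USD)", "Age", "Nationality", "Primary source(s) of\nwealth"]
rows = [
    ["1", "Bernard Arnault &\nfamily", "$211 billion", "74", "France", "LVMH"],
    ["2", "Elon Musk", "$180 billion", "51", "United\nStates", "Tesla, SpaceX"],
]

clean_header, clean_rows = clean_table(header, rows)
print(clean_header)
print(clean_rows[1])
```

The same idea can be applied to a DataFrame before it is handed to a query engine, so that generated pandas code can compare against `United States` rather than a value with an embedded newline.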


In [56]:
# Inspect the parsed result
table_dfs[1]

Unnamed: 0,No.,Name,Net worth\n(USD),Age,Nationality,Primary source(s) of\nwealth
0,1,Elon Musk,$219 billion,50,United\nStates,"Tesla, SpaceX"
1,2,Jeff Bezos,$177 billion,58,United\nStates,Amazon
2,3,Bernard Arnault &\nfamily,$158 billion,73,France,LVMH
3,4,Bill Gates,$129 billion,66,United\nStates,Microsoft
4,5,Warren Buffett,$118 billion,91,United\nStates,Berkshire Hathaway
5,6,Larry Page,$111 billion,49,United\nStates,Google
6,7,Sergey Brin,$107 billion,48,United\nStates,Google
7,8,Larry Ellison,$106 billion,77,United\nStates,Oracle Corporation
8,9,Steve Ballmer,$91.4 billion,66,United\nStates,Microsoft
9,10,Mukesh Ambani,$90.7 billion,64,India,Reliance Industries


In [57]:
# Inspect the parsed result
table_dfs[-1]

Unnamed: 0,No.[61],Name,Net worth (USD),Nationality
0,1,Yoshiaki Tsutsumi,$20 billion,Japan
1,2,Taikichiro Mori,$15 billion,Japan
2,3,Shigeru Kobayashi,$7.5 billion,Japan
3,4,Haruhiko Yoshimoto,$7.0 billion,Japan
4,5,Salim Ahmed Bin Mahfouz,$6.2 billion,Saudi Arabia
5,6,Hans and Gad Rausing,$6.0 billion,Sweden
6,7,Paul Reichmann,$6.0 billion,Canada
7,8,Yohachiro Iwasaki,$5.6 billion,Japan
8,9,Kenneth Thomson,$5.4 billion,Canada
9,10,Keizo Saji,$4.0 billion,Japan


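Now that all the tables are parsed, we could wire them up to answer questions directly.
But if there were thousands or tens of thousands of tables instead of five, scanning every table for every user query would be an impractical, naive approach (resources are not unlimited).

So instead:
1. First find the table relevant to the user's question.
2. Then extract the information needed from that table and answer with it.

To start, let's create a dedicated LlamaIndex query engine for each table.

(See the query engine router (Adaptive RAG) covered last week.)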

In [58]:
from llama_index.experimental.query_engine import PandasQueryEngine

llm = OpenAI(model="gpt-4o-mini")

# Build one table-based query engine (PandasQueryEngine) per extracted table and collect them in a list
df_query_engines = [
    PandasQueryEngine(table_df, llm=llm) for table_df in table_dfs
]

In [59]:
df_query_engines

[<llama_index.experimental.query_engine.pandas.pandas_query_engine.PandasQueryEngine at 0x3666b6110>,
 <llama_index.experimental.query_engine.pandas.pandas_query_engine.PandasQueryEngine at 0x15956add0>,
 <llama_index.experimental.query_engine.pandas.pandas_query_engine.PandasQueryEngine at 0x366217dd0>,
 <llama_index.experimental.query_engine.pandas.pandas_query_engine.PandasQueryEngine at 0x16917b390>,
 <llama_index.experimental.query_engine.pandas.pandas_query_engine.PandasQueryEngine at 0x3658faa90>]

In [60]:
# Query the matching table, selected by hand
response = df_query_engines[0].query(
    "What's the net worth of the second richest billionaire in 2023?"
)
print(str(response))

INFO:httpx:HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
180.0


In [61]:
# Query the matching table, selected by hand
response = df_query_engines[0].query(
    "Who is the richest guy in the world in 2023?"
)
print(str(response))

INFO:httpx:HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
Bernard Arnault &
family


In [62]:
table_dfs[0]

Unnamed: 0,No.,Name,Net worth\n(USD),Age,Nationality,Primary source(s) of\nwealth
0,1,Bernard Arnault &\nfamily,$211 billion,74,France,LVMH
1,2,Elon Musk,$180 billion,51,United\nStates,"Tesla, SpaceX"
2,3,Jeff Bezos,$114 billion,59,United\nStates,Amazon
3,4,Larry Ellison,$107 billion,78,United\nStates,Oracle Corporation
4,5,Warren Buffett,$106 billion,92,United\nStates,Berkshire Hathaway
5,6,Bill Gates,$104 billion,67,United\nStates,Microsoft
6,7,Michael Bloomberg,$94.5 billion,81,United\nStates,Bloomberg L.P.
7,8,Carlos Slim & family,$93 billion,83,Mexico,"Telmex, América Móvil, Grupo\nCarso"
8,9,Mukesh Ambani,$83.4 billion,65,India,Reliance Industries
9,10,Steve Ballmer,$80.7 billion,67,United\nStates,Microsoft


In [63]:
# Query the matching table, selected by hand
response = df_query_engines[2].query(
    "What's the net worth of the second richest billionaire in 2021?"
)
print(str(response))

INFO:httpx:HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
151.0


In [64]:
# Query the matching table, selected by hand
response = df_query_engines[2].query(
    "where does this Jeff Bezos guy gets money from?"
)
print(str(response))

INFO:httpx:HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
Amazon


In [65]:
table_dfs[2]

Unnamed: 0,No.,Name,Net worth (USD),Age,Nationality,Source(s) of wealth
0,1,Jeff Bezos,$177 billion,57,United States,Amazon
1,2,Elon Musk,$151 billion,49,United States,"Tesla, SpaceX"
2,3,Bernard Arnault & family,$150 billion,72,France,LVMH
3,4,Bill Gates,$124 billion,65,United States,Microsoft
4,5,Mark Zuckerberg,$97 billion,36,United States,Meta Platforms
5,6,Warren Buffett,$96 billion,90,United States,Berkshire Hathaway
6,7,Larry Ellison,$93 billion,76,United States,Oracle Corporation
7,8,Larry Page,$91.5 billion,48,United States,Google
8,9,Sergey Brin,$89 billion,47,United States,Google
9,10,Mukesh Ambani,$84.5 billion,63,India,Reliance Industries


In [66]:
response = df_query_engines[4].query(
    "How many billionaires were there in 2009?"
)
print(str(response))

INFO:httpx:HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
There was an error running the output as Python code. Error message: 'Year'


Traceback (most recent call last):
  File "/Users/hyeonjinho/.pyenv/versions/3.11.6/lib/python3.11/site-packages/pandas/core/indexes/base.py", line 3805, in get_loc
    return self._engine.get_loc(casted_key)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "index.pyx", line 167, in pandas._libs.index.IndexEngine.get_loc
  File "index.pyx", line 196, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/hashtable_class_helper.pxi", line 7081, in pandas._libs.hashtable.PyObjectHashTable.get_item
  File "pandas/_libs/hashtable_class_helper.pxi", line 7089, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: 'Year'

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/Users/hyeonjinho/.pyenv/versions/3.11.6/lib/python3.11/site-packages/llama_index/experimental/query_engine/pandas/output_parser.py", line 63, in default_output_processor
    output_str = str(safe_eval(module_end_str, global_vars, local_vars))


In [67]:
response = df_query_engines[4].query(
    "How much is the 2009 billionaires' combined net worth?"
)
print(str(response))

INFO:httpx:HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
There was an error running the output as Python code. Error message: could not convert string to float: '$20'


Traceback (most recent call last):
  File "/Users/hyeonjinho/.pyenv/versions/3.11.6/lib/python3.11/site-packages/llama_index/experimental/query_engine/pandas/output_parser.py", line 63, in default_output_processor
    output_str = str(safe_eval(module_end_str, global_vars, local_vars))
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/hyeonjinho/.pyenv/versions/3.11.6/lib/python3.11/site-packages/llama_index/experimental/exec_utils.py", line 159, in safe_eval
    return eval(__source, _get_restricted_globals(__globals), __locals)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "<string>", line 1, in <module>
  File "/Users/hyeonjinho/.pyenv/versions/3.11.6/lib/python3.11/site-packages/pandas/core/generic.py", line 6643, in astype
    new_data = self._mgr.astype(dtype=dtype, copy=copy, errors=errors)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/hyeonjinho/.pyenv/versions/3.11.6/l

In [68]:
table_dfs[4]

Unnamed: 0,No.[61],Name,Net worth (USD),Nationality
0,1,Yoshiaki Tsutsumi,$20 billion,Japan
1,2,Taikichiro Mori,$15 billion,Japan
2,3,Shigeru Kobayashi,$7.5 billion,Japan
3,4,Haruhiko Yoshimoto,$7.0 billion,Japan
4,5,Salim Ahmed Bin Mahfouz,$6.2 billion,Saudi Arabia
5,6,Hans and Gad Rausing,$6.0 billion,Sweden
6,7,Paul Reichmann,$6.0 billion,Canada
7,8,Yohachiro Iwasaki,$5.6 billion,Japan
8,9,Kenneth Thomson,$5.4 billion,Canada
9,10,Keizo Saji,$4.0 billion,Japan
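
The second error above (`could not convert string to float: '$20'`) comes from the `Net worth (USD)` column holding strings such as `$20 billion` rather than numbers. A minimal sketch of converting those strings before building the query engine, assuming every value has the `$<number> billion` shape seen in the tables above; `net_worth_in_billions` is a hypothetical helper:

```python
import re

# Matches values such as "$20 billion" or "$7.5 billion".
_NET_WORTH = re.compile(r"\$\s*(\d+(?:\.\d+)?)\s*billion", re.IGNORECASE)


def net_worth_in_billions(text: str) -> float:
    """Parse a '$<number> billion' string into a float number of billions."""
    match = _NET_WORTH.search(text)
    if match is None:
        # Fail loudly instead of guessing at an unknown format.
        raise ValueError(f"unrecognized net worth: {text!r}")
    return float(match.group(1))


# First three values of the table shown above.
values = ["$20 billion", "$15 billion", "$7.5 billion"]
total = sum(net_worth_in_billions(v) for v in values)
print(total)
```

With a numeric column in place, generated code such as a sum or a sort no longer depends on the model stripping the currency symbol and unit correctly.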


This confirms that assigning a dedicated query engine to each kind of question lets us heuristically shrink the search space up front.

In [69]:
# Write a summary for each query engine
summaries = [
    (
        "This node provides information about the world's richest billionaires"
        " in 2023"
    ),
    (
        "This node provides information about the world's richest billionaires"
        " in 2022"
    ),
    (
        "This node provides information about the world's richest billionaires"
        " in 2021"
    ),
    (
        "This node provides information about the world's richest billionaires"
        " in 2020"
    ),
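    (
        # NOTE: table_dfs[4] as parsed above is a top-10 list with no "Year" column,
        # so it does not match this summary; queries routed here fail with KeyError: 'Year'.
        "This node provides information on the number of billionaires and"
        " their combined net worth from 2000 to 2023."
    ),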
]

# Create one IndexNode per summary
df_nodes = [
    IndexNode(text=summary, index_id=f"pandas{idx}")
    for idx, summary in enumerate(summaries)
]

# Map summary node ids to query engines
df_id_query_engine_mapping = {
    f"pandas{idx}": df_query_engine
    for idx, df_query_engine in enumerate(df_query_engines)
}

In [70]:
# Inspect a generated node
df_nodes[0]

IndexNode(id_='e5a07043-d7c6-419b-9914-c4a4a9a96ff2', embedding=None, metadata={}, excluded_embed_metadata_keys=[], excluded_llm_metadata_keys=[], relationships={}, metadata_template='{key}: {value}', metadata_separator='\n', text="This node provides information about the world's richest billionaires in 2023", mimetype='text/plain', start_char_idx=None, end_char_idx=None, metadata_seperator='\n', text_template='{metadata_str}\n\n{content}', index_id='pandas0', obj=None)

In [71]:
# Define the top-level vector store index
vector_index = VectorStoreIndex(df_nodes)
vector_retriever = vector_index.as_retriever(similarity_top_k=1)

INFO:httpx:HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"


Adaptive engine selection with RecursiveRetriever
- Unlike the Pydantic selector, this is not LLM-based function calling but a two-step retrieval.
- Just as with chunk retrieval, the first step retrieves the closest query engine description by embedding distance (top_k=1); the query engine that owns the retrieved description then generates the answer.
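
The two steps above can be sketched without any external service. This is an illustration only: a bag-of-words count vector stands in for the embedding model, plain callables stand in for the query engines, and `embed`, `cosine`, and `route_and_answer` are hypothetical names rather than LlamaIndex APIs.

```python
import math
import re
from collections import Counter
from typing import Callable, Dict, Tuple


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(re.findall(r"\w+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[token] * b[token] for token in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def route_and_answer(
    query: str,
    descriptions: Dict[str, str],
    engines: Dict[str, Callable[[str], str]],
) -> Tuple[str, str]:
    """Step 1: pick the closest description (top_k=1). Step 2: ask its engine."""
    query_vec = embed(query)
    best_id = max(descriptions, key=lambda i: cosine(query_vec, embed(descriptions[i])))
    return best_id, engines[best_id](query)


descriptions = {
    "pandas0": "This node provides information about the world's richest billionaires in 2023",
    "pandas2": "This node provides information about the world's richest billionaires in 2021",
}
# Hypothetical engines; in the notebook these are PandasQueryEngine instances.
engines = {
    "pandas0": lambda q: "answer from the 2023 table",
    "pandas2": lambda q: "answer from the 2021 table",
}

chosen, answer = route_and_answer(
    "What's the net worth of the second richest billionaire in 2023?",
    descriptions,
    engines,
)
print(chosen, answer)
```

Because only the short descriptions are compared in step 1, routing quality depends on how well each description separates its table from the others; here the year is the only distinguishing token.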

In [72]:
from llama_index.core.retrievers import RecursiveRetriever

from llama_index.core.query_engine import RetrieverQueryEngine
from llama_index.core import get_response_synthesizer

recursive_retriever = RecursiveRetriever(
    "vector",
    retriever_dict={"vector": vector_retriever},
    query_engine_dict=df_id_query_engine_mapping,
    verbose=True,
)

response_synthesizer = get_response_synthesizer(response_mode="compact")

query_engine = RetrieverQueryEngine.from_args(
    recursive_retriever, response_synthesizer=response_synthesizer
)

In [73]:
response = query_engine.query(
    "What's the net worth of the second richest billionaire in 2023?"
)

Retrieving with query id None: What's the net worth of the second richest billionaire in 2023?
INFO:httpx:HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
Retrieved node with id, entering: pandas0
Retrieving with query id pandas0: What's the net worth of the second richest billionaire in 2023?
INFO:httpx:HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
Got response: 180.0
INFO:httpx:HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"


In [74]:
# Response returned by the lower-level (table) query engine
response.source_nodes[0].node.get_content()

"Query: What's the net worth of the second richest billionaire in 2023?\nResponse: 180.0"

In [75]:
# Final answer from the top-level query engine that received that response
str(response)

'180.0'

In [76]:
response = query_engine.query("How many billionaires were there in 2009?")

Retrieving with query id None: How many billionaires were there in 2009?
INFO:httpx:HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
Retrieved node with id, entering: pandas4
Retrieving with query id pandas4: How many billionaires were there in 2009?
INFO:httpx:HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
Got response: There was an error running the output as Python code. Error message: 'Year'

Traceback (most recent call last):
  File "/Users/hyeonjinho/.pyenv/versions/3.11.6/lib/python3.11/site-packages/pandas/core/indexes/base.py", line 3805, in get_loc
    return self._engine.get_loc(casted_key)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "index.pyx", line 167, in pandas._libs.index.IndexEngine.get_loc
  File "index.pyx", line 196, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/hashtable_class_helper.pxi", line 7081, in pandas._libs.hashtable.PyObjectHashTable.get_item
  File "pandas/_libs/hashtable_class_helper.pxi", line 7089, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: 'Year'

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/Users/hyeonjinho/.pyenv/versions/3.11.6/lib/python3.11/site-packages/llama_index/experimental/query_engine/pandas/output_parser.py", line 63, in default_output_processor
    output_str = str(safe_eval(module_end_str, global_vars, local_vars))


INFO:httpx:HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"


In [77]:
str(response)

'There was an error in retrieving the information regarding the number of billionaires in 2009.'
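
In the run above, the routed table engine failed and the failure message became the final answer. One possible mitigation, sketched under the assumption that failures can be recognized by the message prefix seen in the output above, is to fall back to another engine such as the naive vector baseline. `answer_with_fallback` and the three stand-in engines are hypothetical:

```python
from typing import Callable

# Prefix of the failure message returned by the table engine in the runs above.
ERROR_PREFIX = "There was an error running the output as Python code."


def answer_with_fallback(
    query: str,
    primary: Callable[[str], str],
    fallback: Callable[[str], str],
) -> str:
    """Use the routed engine's answer unless it is an execution error message."""
    result = primary(query)
    if result.startswith(ERROR_PREFIX):
        return fallback(query)
    return result


# Hypothetical stand-ins for a routed table engine and a plain vector engine.
def failing_table_engine(query: str) -> str:
    return ERROR_PREFIX + " Error message: 'Year'"


def working_table_engine(query: str) -> str:
    return "180.0"


def baseline_engine(query: str) -> str:
    return "answer from the baseline engine"


print(answer_with_fallback("How many billionaires were there in 2009?", failing_table_engine, baseline_engine))
print(answer_with_fallback("Net worth of the second richest in 2023?", working_table_engine, baseline_engine))
```

A fallback only hides the symptom; if the routed table does not contain the requested data, the table summary or the parsed pages need to be corrected as well.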

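DO IT YOURSELF
- Upload the Son Heung-min Wikipedia PDF to the Colab environment
- As in the exercise above, parse the tables in the PDF with camelot
- Build a pandas query engine for each parsed table
- Ask each query engine related questions
- Use recursive retrieval to complete the top-level query engine and build a Son Heung-min Adaptive RAG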

In [90]:
# Path of the file to parse
file_path = "./son.pdf"

In [91]:
# Define the PDF parser
reader = PyMuPDFReader()
# Load the file from the given path and store the result as Document objects
docs = reader.load(file_path)

In [92]:
doc_nodes = Settings.node_parser.get_nodes_from_documents(docs)

In [93]:
# Build a naive RAG baseline for comparison
vector_index0 = VectorStoreIndex(doc_nodes)
vector_query_engine0 = vector_index0.as_query_engine()


INFO:httpx:HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"


In [94]:
# Question 1: Which club does Son Heung-min play for?
response = vector_query_engine0.query('손흥민이 소속된 구단이 어디야?')

INFO:httpx:HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
INFO:httpx:HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"


In [95]:
print(str(response))

손흥민은 토트넘 홋스퍼 소속입니다.


In [96]:
print(response.source_nodes[0].node.get_content())

[184] 이에 해외에서는 "손흥민의 슈팅이 정확한 줄은 알았는데 저 정도였을 줄이야"라는 우스갯소리도 나오고 있다.
[185] 훈련 성적 상위 5명에게 주는 상이다.
[186] 물론 엄밀히 따지면 예비군훈련과 민방위가 있지만, 손흥민이 해외에 계속 체류한다는 전제하에 예비군훈련과 민방위는
자동으로 면제된다.
이 저작물은 CC BY-NC-SA 2.0 KR에 따라 이용할 수 있습니다. (단, 라이선스가 명시된 일부 문서 및 삽화 제외)
기여하신 문서의 저작권은 각 기여자에게 있으며, 각 기여자는 기여하신 부분의 저작권을 갖습니다.
25. 5. 2. 오전 12:17
손흥민 - 나무위키
https://namu.wiki/w/손흥민
32/33


In [97]:
# Question 2: In which season did Son Heung-min score 23 league goals?
response = vector_query_engine0.query(
    "손흥민이 리그에서 23골을 넣었던 시즌이 언제야?"
)

INFO:httpx:HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
INFO:httpx:HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"


In [98]:
print(str(response))

손흥민이 리그에서 23골을 넣었던 시즌은 2021-22 시즌이다.


In [99]:
print(response.source_nodes[0].node.get_content())

[118] 2019-20 시즌 FK 츠르베나 즈베즈다. 황희찬, 메흐디 타레미와 공동 최다 기록.
[119] 플레이오프 2골까지 포함하면 21골.
[120] 2019-20 시즌. 타레미와 공동 최다 기록.
[121] 2019-20 시즌. 타레미와 공동 최다 기록.
[122] 2024-25 시즌 FK 츠르베나 즈베즈다. 메흐디 타레미, 황희찬과 공동 최다 기록.
[123] 통산 9회. 2014-15시즌 조별리그 2 · 4라운드, 2016-17시즌 조별리그 2라운드, 2017-18시즌 조별리그 4라운드, 2
018-19시즌 8강 1 · 2차전, 2019-20시즌 조별리그 3 · 4라운드, 2022-23시즌 조별리그 4라운드
[124] 통산 8회. 2014-15시즌 조별리그 2 · 4라운드, 2017-18시즌 16강 2차전, 2018-19시즌 8강 1 · 2차전, 2019-20시
즌 조별리그 3 · 4라운드, 2022-23시즌 조별리그 4라운드
[125] 2022-23 시즌 조별리그 4차전, 메흐디 타레미와 더불어 둘뿐인 수상
[126] 2018-19시즌, 첫번째는 박지성
[127] 2021-22 시즌. 현재 손흥민의 한 시즌 개인 최다 골
[128] 2020-21 시즌
[129] 2020-21 시즌
[130] 2010-11 시즌 ~ 진행중
[131] 2016-17시즌 ~ 진행 중
[132] 2012-13시즌 부터 진행 중(2015-16시즌 제외)
[133] 2016-17시즌 ~ 진행 중
[134] 본인이 세웠던 기록인 22위를 경신했다.
[135] 유효표를 받은 것은 확실하게 기록되어 있으나 정확히 몇 표를 받은 것인지는 나오지 않았다.


In [100]:
print(response.source_nodes[1].node.get_content())

3시즌
합계
6
2
21
6
6
3
1
-
-
-
19
5
3
8
7
29
1
0
시즌
구단
리그
FA컵
EFL컵
대륙 대항전
총합
경
기
득점
도
움
경
기
득점
도
움
경
기
득
점
도
움
경기
득점
도움
경
기
득점
도
움
15-1
6[79]
토트넘
홋스퍼
2
8
4
1
4
1
1
1
0
0
7
[80]
3
3
4
0
8
5
16-1
7
3
4
14
6
5
6
[81]
1
-
-
-
8
[82]
1
[83]
0
4
7
21
7
17-1
8
3
7
12
6
7
2
3
2
0
2
7
[84]
4
0
5
3
18
11
18-1
9
3
1
12
6
1
1
2
4
3
0
12
[85]
4
1
4
8
20
9
19-2
0
3
0
11
1
0
4
2
0
1
0
0
6
[86]
5
1
4
1
18
11
20-2
1
3
7
17
1
0
2
0
4
3
1
0
9
[87]
4
[88]
3
[89]
5
1
22
1
7
21-2
2
3
5
23
[90]
7
2
0
0
4
0
0
4
[91]
1
[92]
1
[93]
4
5
24
8
22-2
3
3
6
10
6
3
2
0
-
-
-
8
[94]
2
0
4
7
14
6
23-2
4
3
5
17
1
0
-
-
-
1
0
0
-
-
-
3
6
17
1
0
24-2
5
2
8
7
9
2
0
1
4
1
0
9
3
2
4
3
11
1
2
10시
즌
합계
3
3
1
127
7
1
3
0
14
1
2
2
0
5
2
70
27
10
4
5
1
173
9
6
통산
4
6
6
168
8
0
4
1
17
1
3
2
0
5
2
88
32
14
6
1
6
222
[95]
1
0
9
[A] 진한 부분은 리그 최고 득점 기록이다
7.4.2. 국가대표
2025년 3월 25일 기준
25. 5. 2. 오전 12:17
손흥민 - 나무위키
https://namu.wiki/w/손흥민
11/33


In [109]:
# To improve answer quality, parse the tables with camelot via get_tables (pages 10, 11, 14, 15, 19)
table_dfs = get_tables(file_path, pages=[10,11,14,15,19])

CropBox missing from /Page, defaulting to MediaBox
INFO:camelot:Processing page-10
CropBox missing from /Page, defaulting to MediaBox
INFO:camelot:Processing page-11
CropBox missing from /Page, defaulting to MediaBox
INFO:camelot:Processing page-14
CropBox missing from /Page, defaulting to MediaBox
INFO:camelot:Processing page-15
CropBox missing from /Page, defaulting to MediaBox
INFO:camelot:Processing page-19


In [110]:
len(table_dfs)

4

In [111]:
# Inspect the parsed result
table_dfs[0]

Unnamed: 0,7 3\n3 0\n2 7\n리그\nDFB 포칼\n-\n대륙 대항전\n총합\n시즌\n구단\n10-11\n11-12\n함부르크\n12-1\n3\n3시즌\n합계\n리그\nDFB 포칼\n-\n대륙 대항전\n총합\n시즌\n구단\n13-1\n4[74]\n14-1\n레버쿠젠\n5,Unnamed: 2,Unnamed: 3,Unnamed: 4,Unnamed: 5,Unnamed: 6,Unnamed: 7,Unnamed: 8,Unnamed: 9,Unnamed: 10,Unnamed: 11,Unnamed: 12,Unnamed: 13,Unnamed: 14,Unnamed: 15,Unnamed: 16,Unnamed: 17
0,,,경 기,득점\n3,도 움,경 기,득점\n0,도 움,경 기,득 점\n-,도 움,경기,득점\n-,도움,경 기,득점\n3,도 움\n0
1,,,1 3,,0,1,,0,-,,-,-,,-,1 4,,
2,,,,5,1\n3,,0,0\n-,,-,-\n-,,-,3 0\n-,,5,1
3,,,3 3,12\n20,2\n3\n5,1,0\n0,0\n0\n-,-,-\n-,-\n-\n-,-,-\n-,7 8\n-\n-,3 4,12\n20,2\n3
4,,,,,,,,,,,,,,,,,
5,,,,,,,,,,,,,,,,,
6,,,경 기,득점\n10,도 움,경 기,득점\n2,도 움,경 기,득 점\n-,도 움,경기,득점\n0,도움,경 기,득점\n12,도 움\n7
7,,,3 1,,4,4,,1,-,,-,8,,2,4 3,,
8,,,,11,2\n2,,1,0\n-,,-,10\n-\n[75],,5\n[76],4 2\n1\n[77],,17,3
9,15-1\n1\n6,,,0,0\n-,,-,-\n-,,-,1\n-\n[78],,0,0\n2,,0,0


In [112]:
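import pandas as pd
# Remove line breaks inside cells
df = table_dfs[0].replace('\n', '', regex=True)

# Forward-fill club and season for rows that merged cells left empty.
# This assumes the parsed header already contains '클럽' and '시즌';
# if it does not, the lookup below raises KeyError.
df['클럽'] = df['클럽'].replace('', pd.NA).ffill()
df['시즌'] = df['시즌'].replace('', pd.NA).ffill()

# Give every column an explicit name (the table must have exactly 18 columns)
df.columns = [
    '클럽', '시즌', '리그', '리그_경기', '리그_골', '리그_도움',
    '국내컵_경기', '국내컵_골', '국내컵_도움', '리그컵_경기',
    '리그컵_골', '리그컵_도움', 'UEFA_경기', 'UEFA_골', 'UEFA_도움',
    '합계_경기', '합계_골', '합계_도움'
]

# Treat the dash placeholder as missing
df = df.replace('—', pd.NA)

numeric_columns = [
    '리그_경기', '리그_골', '리그_도움',
    '국내컵_경기', '국내컵_골', '국내컵_도움',
    '리그컵_경기', '리그컵_골', '리그컵_도움',
    'UEFA_경기', 'UEFA_골', 'UEFA_도움',
    '합계_경기', '합계_골', '합계_도움'
]

# Convert the count columns to numbers; unparseable values become NaN
df[numeric_columns] = df[numeric_columns].apply(pd.to_numeric, errors='coerce')
# Drop the leftover sub-header row
df = df.drop(index=0)

df = df.reset_index(drop=True)
table_dfs[0] = df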

KeyError: '클럽'

In [56]:
table_dfs[0]

Unnamed: 0,클럽,시즌,리그,리그_경기,리그_골,리그_도움,국내컵_경기,국내컵_골,국내컵_도움,리그컵_경기,리그컵_골,리그컵_도움,UEFA_경기,UEFA_골,UEFA_도움,합계_경기,합계_골,합계_도움
0,함부르크 SVII,2009-10,레기오날리가노드,6.0,1.0,0.0,,,,,,,,,,6.0,1.0,0.0
1,함부르크 SV,2010-11,분데스리가,13.0,3.0,0.0,1.0,0.0,0.0,,,,,,,14.0,3.0,0.0
2,함부르크 SV,2011-12,,27.0,5.0,1.0,3.0,0.0,0.0,,,,,,,30.0,5.0,1.0
3,함부르크 SV,2012-13,,33.0,12.0,2.0,1.0,0.0,0.0,,,,,,,34.0,12.0,2.0
4,함부르크 SV,합계,,73.0,20.0,3.0,5.0,0.0,0.0,,,,,,,78.0,20.0,3.0
5,바이어04 레버쿠젠,2013-14,분데스리가,31.0,10.0,4.0,4.0,2.0,1.0,,,,,0.0,2.0,43.0,12.0,7.0
6,바이어04 레버쿠젠,2014-15,,30.0,11.0,2.0,2.0,1.0,0.0,,,,,5.0,1.0,42.0,17.0,3.0
7,바이어04 레버쿠젠,2015-16,,1.0,0.0,0.0,0.0,0.0,0.0,,,,,0.0,0.0,2.0,0.0,0.0
8,바이어04 레버쿠젠,합계,,62.0,21.0,6.0,6.0,3.0,1.0,,,,19.0,5.0,3.0,87.0,29.0,10.0
9,토트넘홋스퍼,2015-16,프리미어리그,28.0,4.0,1.0,4.0,1.0,1.0,1.0,0.0,0.0,,3.0,4.0,40.0,8.0,6.0


In [57]:
table_dfs[1]

Unnamed: 0,상대팀,골 수,날짜
0,사우샘프턴 FC,10,"2016, 05, 08\n2016, 12, 29\n2017, 12, 26\n2018..."
1,레스터 시티 FC,9,"2017, 05, 19\n2018, 12, 09\n2019, 02, 10\n2022..."
2,크리스탈 팰리스 FC,9,"2015, 09, 20\n2017, 11, 05\n2019, 04, 04\n2019..."
3,AFC 본머스,7,"2017, 04, 15\n2018, 03, 12\n2018, 12, 27\n2023..."
4,왓퍼드 FC,6,"2015, 12, 29\n2017, 04, 08\n2017, 12, 03\n2019..."
5,리버풀 FC,6,"2017, 10, 23\n2020, 12, 17\n2021, 12, 20\n2022..."
6,웨스트햄 유나이티드 FC,6,"2018, 01, 05\n2019, 11, 23\n2020, 10, 19\n2022..."
7,번리 FC,6,"2017, 04, 01\n2019, 12, 07\n2020, 10, 27\n2023..."
8,아스널 FC,6,"2020, 07, 13\n2020, 12, 07\n2021, 09, 27"


In [58]:
table_dfs[2]

Unnamed: 0,#,일시,장소,상대 국가,득점,결 과,매치 형식
0,1,2011년 1월\n18일,카타르 도하 타니 빈 자심 스타디\n움,인도,4-1,4-1,2011년 AFC 아시안컵
1,2,2013년 3월\n26일,대한민국 서울 서울월드컵경기장,카타르,2-1,2-1,2014년 FIFA 월드컵 아시아\n지역 4차 예선
2,3\n4,2013년 9월 6\n일,대한민국 인천 인천축구전용경기\n장,아이티,1-0\n4-1,4-1,친선경기
3,5,2013년 10월\n15일,대한민국 천안 천안종합운동장,말리,2-1,3-1,친선경기
4,6,2014년 3월 5\n일,그리스 아테네 카라이스카키스\n스타디움,그리스,2-0,2-0,친선경기
5,7,2014년 6월\n22일,브라질 포르투알레그리 이스타지\n우 베이라히우,알제리,1-3,2-4,2014년 FIFA 월드컵
6,8\n9,2015년 1월\n22일,오스트레일리아 멜버른 멜버른\n렉탱귤러 스타디움,우즈베키\n스탄,1-0\n2-0,2-0,2015년 AFC 아시안컵
7,10,2015년 1월\n31일,오스트레일리아 시드니 스타디움\n오스트레일리아,오스트레\n일리아,1-1,1-2,2015년 AFC 아시안컵
8,11,2015년 6월\n16일,태국 방콕 라차망칼라 스타디움,미얀마,2-0,2-0,2018년 FIFA 월드컵 아시아\n지역 2차 예선
9,12\n13\n14,2015년 9월 3\n일,대한민국 화성 화성종합경기타운,라오스,2-0\n5-0\n7-0,8-0,2018년 FIFA 월드컵 아시아\n지역 2차 예선


In [59]:
# Assign a PandasQueryEngine to each parsed DataFrame
df_query_engines = [
    PandasQueryEngine(table_df, llm=llm) for table_df in table_dfs
]

In [60]:
# Question: "In which season did Son Heung-min score 23 league goals?"
response = df_query_engines[0].query(
    "손흥민이 리그에서 23골을 넣었던 시즌이 언제야?"
)
print(str(response))

2021-22


In [61]:
# Question: "How many goals has Son Heung-min scored against Liverpool?"
response = df_query_engines[1].query(
    "손흥민이 상대팀 리버풀 상대로 몇골 넣었지?"
)
print(str(response))

6


In [62]:
# Question: "Against which team has Son Heung-min scored the most goals?"
response = df_query_engines[1].query(
    "손흥민이 어떤 팀 상대로 가장 많은 골을 넣었지?"
)
print(str(response))

사우샘프턴 FC


In [63]:
# Question: "Against which teams did he score at the 2018 FIFA World Cup?"
response = df_query_engines[2].query(
    "2018 FIFA 월드컵에서 어떤 팀들 상대로 골 넣었었지?"
)
print(str(response))

15    멕시코
16     독일
Name: 상대 국가, dtype: object


In [64]:
table_dfs[2]

Unnamed: 0,#,일시,장소,상대 국가,득점,결 과,매치 형식
0,1,2011년 1월\n18일,카타르 도하 타니 빈 자심 스타디\n움,인도,4-1,4-1,2011년 AFC 아시안컵
1,2,2013년 3월\n26일,대한민국 서울 서울월드컵경기장,카타르,2-1,2-1,2014년 FIFA 월드컵 아시아\n지역 4차 예선
2,3\n4,2013년 9월 6\n일,대한민국 인천 인천축구전용경기\n장,아이티,1-0\n4-1,4-1,친선경기
3,5,2013년 10월\n15일,대한민국 천안 천안종합운동장,말리,2-1,3-1,친선경기
4,6,2014년 3월 5\n일,그리스 아테네 카라이스카키스\n스타디움,그리스,2-0,2-0,친선경기
5,7,2014년 6월\n22일,브라질 포르투알레그리 이스타지\n우 베이라히우,알제리,1-3,2-4,2014년 FIFA 월드컵
6,8\n9,2015년 1월\n22일,오스트레일리아 멜버른 멜버른\n렉탱귤러 스타디움,우즈베키\n스탄,1-0\n2-0,2-0,2015년 AFC 아시안컵
7,10,2015년 1월\n31일,오스트레일리아 시드니 스타디움\n오스트레일리아,오스트레\n일리아,1-1,1-2,2015년 AFC 아시안컵
8,11,2015년 6월\n16일,태국 방콕 라차망칼라 스타디움,미얀마,2-0,2-0,2018년 FIFA 월드컵 아시아\n지역 2차 예선
9,12\n13\n14,2015년 9월 3\n일,대한민국 화성 화성종합경기타운,라오스,2-0\n5-0\n7-0,8-0,2018년 FIFA 월드컵 아시아\n지역 2차 예선


In [113]:
# Write a summary for each query engine
summaries = [
    (
        "This node provides information about 손흥민의 시즌별 통산 득점 기록"
    ),
    (
        "This node provides information about 손흥민의 상대팀별 기록"
    ),
    (
        "This node provides information about 손흥민의 국가대표팀 득점 기록"
    ),
]

# Create one IndexNode per summary
df_nodes = [
    IndexNode(text=summary, index_id=f"pandas{idx}")
    for idx, summary in enumerate(summaries)
]

# Map summary node ids to query engines
df_id_query_engine_mapping = {
    f"pandas{idx}": df_query_engine
    for idx, df_query_engine in enumerate(df_query_engines)
}

In [114]:
# Build the top-level VectorStoreIndex over the summary nodes
vector_index = VectorStoreIndex(df_nodes)
vector_retriever = vector_index.as_retriever(similarity_top_k=1)

INFO:httpx:HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"


In [115]:
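# Imports repeated here so the cell runs on its own
from llama_index.core import get_response_synthesizer
from llama_index.core.query_engine import RetrieverQueryEngine
from llama_index.core.retrievers import RecursiveRetriever

# The root "vector" retriever picks a summary node; the query is then
# forwarded to the query engine mapped to that node's index_id
recursive_retriever = RecursiveRetriever(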
    "vector",
    retriever_dict={"vector": vector_retriever},
    query_engine_dict=df_id_query_engine_mapping,
    verbose=True,
)

response_synthesizer = get_response_synthesizer(response_mode="compact")

query_engine = RetrieverQueryEngine.from_args(
    recursive_retriever, response_synthesizer=response_synthesizer
)
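
The routing idea in this cell can be sketched without any API calls: score each summary against the query, keep the top-1 match, and forward the query to the engine registered under the same id. The sketch below is an assumption-laden toy, not how `RecursiveRetriever` is implemented. Token overlap stands in for embedding similarity, and `overlap_score`, `route`, the English summaries, and the stub engines are all hypothetical.

```python
from typing import Callable, Dict, Tuple

Engine = Callable[[str], str]


def overlap_score(query: str, summary: str) -> int:
    """Toy relevance: count of lowercase whitespace tokens shared by both texts."""
    return len(set(query.lower().split()) & set(summary.lower().split()))


def route(
    query: str, summaries: Dict[str, str], engines: Dict[str, Engine]
) -> Tuple[str, str]:
    """Pick the best-matching summary (top-1) and forward the query to its engine."""
    best_id = max(
        summaries, key=lambda node_id: overlap_score(query, summaries[node_id])
    )
    return best_id, engines[best_id](query)


# Hypothetical English summaries standing in for the three table descriptions.
toy_summaries = {
    "pandas0": "goals per season",
    "pandas1": "goals per opponent team",
    "pandas2": "national team goals",
}


def make_stub(node_id: str) -> Engine:
    """Build a fake engine that only reports which table would be queried."""
    return lambda q: f"{node_id} handles: {q}"


toy_engines = {node_id: make_stub(node_id) for node_id in toy_summaries}

chosen_id, answer = route(
    "goals against each opponent team", toy_summaries, toy_engines
)
print(chosen_id, "->", answer)
```

With `similarity_top_k=1`, exactly one table is consulted per question, so a poorly worded summary can send a query to the wrong table with no second chance.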

In [116]:
# Question: "How many goals did Son Heung-min score against Liverpool?"
response = query_engine.query("손흥민이 상대팀 리버풀 상대로 몇골 넣었지?")
print(str(response))

Retrieving with query id None: 손흥민이 상대팀 리버풀 상대로 몇골 넣었지?
INFO:httpx:HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
Retrieved node with id, entering: pandas1
Retrieving with query id pandas1: 손흥민이 상대팀 리버풀 상대로 몇골 넣었지?
INFO:httpx:HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
Got response: 0
INFO:httpx:HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
0


In [117]:
# Question: "In which season did Son Heung-min score 23 league goals?"
response = query_engine.query("손흥민이 리그에서 23골을 넣었던 시즌이 언제야?")
print(str(response))

Retrieving with query id None: 손흥민이 리그에서 23골을 넣었던 시즌이 언제야?
INFO:httpx:HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
Retrieved node with id, entering: pandas0
Retrieving with query id pandas0: 손흥민이 리그에서 23골을 넣었던 시즌이 언제야?
INFO:httpx:HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
Got response: There was an error running the output as Python code. Error message: attempt to get argmax of an empty sequence

Traceback (most recent call last):
  File "/Users/hyeonjinho/.pyenv/versions/3.11.6/lib/python3.11/site-packages/llama_index/experimental/query_engine/pandas/output_parser.py", line 63, in default_output_processor
    output_str = str(safe_eval(module_end_str, global_vars, local_vars))
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/hyeonjinho/.pyenv/versions/3.11.6/lib/python3.11/site-packages/llama_index/experimental/exec_utils.py", line 159, in safe_eval
    return eval(__source, _get_restricted_globals(__globals), __locals)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "<string>", line 1, in <module>
  File "/Users/hyeonjinho/.pyenv/versions/3.11.6/lib/python3.11/site-packages/pandas/core/series.py", line 2761, in idxmax
    i = self.argmax(axis, skipna, *args, **kwargs)
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/hyeonjinho/.pyenv/versions/3.11.6/lib/python3.11/site-packages/pandas/core/b

INFO:httpx:HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
손흥민이 리그에서 23골을 넣었던 시즌은 2021-2022 시즌입니다.


In [118]:
# Question: "Which team has Son Heung-min scored the most goals against?"
response = query_engine.query("손흥민이 어떤 팀 상대로 가장 많은 골을 넣었지?")
print(str(response))

Retrieving with query id None: 손흥민이 어떤 팀 상대로 가장 많은 골을 넣었지?
INFO:httpx:HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
Retrieved node with id, entering: pandas1
Retrieving with query id pandas1: 손흥민이 어떤 팀 상대로 가장 많은 골을 넣었지?
INFO:httpx:HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
Got response: There was an error running the output as Python code. Error message: attempt to get argmax of an empty sequence

Traceback (most recent call last):
  File "/Users/hyeonjinho/.pyenv/versions/3.11.6/lib/python3.11/site-packages/llama_index/experimental/query_engine/pandas/output_parser.py", line 63, in default_output_processor
    output_str = str(safe_eval(module_end_str, global_vars, local_vars))
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/hyeonjinho/.pyenv/versions/3.11.6/lib/python3.11/site-packages/llama_index/experimental/exec_utils.py", line 159, in safe_eval
    return eval(__source, _get_restricted_globals(__globals), __locals)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "<string>", line 1, in <module>
  File "/Users/hyeonjinho/.pyenv/versions/3.11.6/lib/python3.11/site-packages/pandas/core/series.py", line 2761, in idxmax
    i = self.argmax(axis, skipna, *args, **kwargs)
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/hyeonjinho/.pyenv/versions/3.11.6/lib/python3.11/site-packages/pandas/core/b

INFO:httpx:HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
손흥민이 가장 많은 골을 넣은 팀은 번리입니다.
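
In cells 117 and 118 the pandas engine returned an error message instead of a value, yet a fluent final answer was still produced. The answer here (번리) differs from what the direct engine returned in cell 62 (사우샘프턴 FC), which suggests the final text was not grounded in the table. One way to surface such failures is to check the table engine's raw response before it reaches synthesis. The sketch below is a minimal illustration: `checked_query`, `TableQueryError`, and the stub engines are hypothetical names, and the only detail taken from the notebook is the error prefix seen in the logs.

```python
from typing import Callable

# Prefix copied from the "Got response" lines in the logs above.
ERROR_PREFIX = "There was an error running the output as Python code."


class TableQueryError(RuntimeError):
    """Raised when the table engine reports that its generated code failed."""


def checked_query(run_query: Callable[[str], object], question: str) -> str:
    """Run a table query and refuse to pass an error message on as an answer."""
    raw = str(run_query(question)).strip()
    if raw.startswith(ERROR_PREFIX):
        raise TableQueryError(raw)
    return raw


# Stubs standing in for df_query_engines[i].query so the sketch runs offline.
def failing_engine(question: str) -> str:
    return ERROR_PREFIX + " Error message: attempt to get argmax of an empty sequence"


def working_engine(question: str) -> str:
    return "6"


print(checked_query(working_engine, "any question"))
try:
    checked_query(failing_engine, "any question")
except TableQueryError as err:
    print("table query failed:", err)
```

In the notebook, the callable would be something like `df_query_engines[1].query`. Raising instead of returning the message makes the failure visible to the caller, who can retry or report that no answer is available.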


In [119]:
# Question: "Which teams did he score against at the 2018 FIFA World Cup?"
response = query_engine.query("2018 FIFA 월드컵에서 어떤 팀들 상대로 골 넣었었지?")
print(str(response))

Retrieving with query id None: 2018 FIFA 월드컵에서 어떤 팀들 상대로 골 넣었었지?
INFO:httpx:HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
Retrieved node with id, entering: pandas2
Retrieving with query id pandas2: 2018 FIFA 월드컵에서 어떤 팀들 상대로 골 넣었었지?
INFO:httpx:HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
Got response: Series([], Name: Name, dtype: object)
INFO:httpx:HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
골을 넣은 팀에 대한 정보는 제공되지 않았습니다.
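
The routed answers in cells 116 to 119 do not match what the same engines returned when queried directly in cells 60 to 63, even though the retriever picked the expected table each time. A small comparison harness makes such regressions easy to spot. The sketch below is a hypothetical helper (`compare_paths` is not a LlamaIndex function) that takes two callables and lists the questions where their raw answers differ. The demo replays the recorded Liverpool answers from cells 61 and 116 rather than calling any API.

```python
from typing import Callable, Dict, List


def compare_paths(
    questions: List[str],
    direct: Callable[[str], object],
    routed: Callable[[str], object],
) -> List[Dict[str, str]]:
    """Return one record per question whose two answers differ after stripping."""
    mismatches = []
    for question in questions:
        direct_answer = str(direct(question)).strip()
        routed_answer = str(routed(question)).strip()
        if direct_answer != routed_answer:
            mismatches.append(
                {
                    "question": question,
                    "direct": direct_answer,
                    "routed": routed_answer,
                }
            )
    return mismatches


# Answers recorded in this notebook for the Liverpool question.
liverpool = "손흥민이 상대팀 리버풀 상대로 몇골 넣었지?"
recorded_direct = {liverpool: "6"}  # cell 61, df_query_engines[1]
recorded_routed = {liverpool: "0"}  # cell 116, recursive retriever

for item in compare_paths(
    [liverpool], recorded_direct.__getitem__, recorded_routed.__getitem__
):
    print(item)
```

Exact string comparison only works on raw table-engine outputs. Synthesized sentences such as the one in cell 117 would need a looser match.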
