Building 'Data Analyst Agent' via Agent Concept #13633

mraguth · 2024-05-22T00:51:02Z

Is it possible to build a Data Analyst Agent that gets data summaries or insights from a pandas DataFrame? Can you suggest any best practices? I am trying to replicate the concept below.

https://www.llamaindex.ai/blog/data-agents-eed797d7972f?gad_source=1&gclid=Cj0KCQjwjLGyBhCYARIsAPqTz19bwTE9wcn3RCNBmhA5RAklvdXL4MKeAyd0NBVLZTEKTZN1P0_taX8aAi_JEALw_wcB

@dosu · 2024-05-22T00:51:27Z

dosubot[bot]
bot May 22, 2024

Yes, it is possible to build a Data Analyst Agent to get data summaries or insights from a pandas DataFrame. Here is a basic example using the OpenAI Function API and the PandasQueryEngine from LlamaIndex:

Import necessary libraries and define tools: You will need to import pandas for DataFrame operations and define any tools that the agent will use to interact with the data.
Initialize the LLM and the agent: Use the OpenAI model and initialize the agent with the defined tools.
Define the reasoning loop and tool abstractions: This involves setting up the logic for how the agent will process tasks and interact with the tools.

Here is a sample implementation:

import pandas as pd
from llama_index.agent.openai import OpenAIAgent
from llama_index.core.schema import QueryBundle
from llama_index.core.tools import FunctionTool
from llama_index.experimental.query_engine.pandas import PandasQueryEngine
from llama_index.llms.openai import OpenAI

# Sample DataFrame
data = {
    'Name': ['Alice', 'Bob', 'Charlie', 'David'],
    'Age': [24, 30, 22, 35],
    'Salary': [70000, 80000, 60000, 90000]
}
df = pd.DataFrame(data)

# Initialize the LLM (the OpenAI API key must be available to the client)
llm = OpenAI(model="gpt-3.5-turbo-0613")

# Initialize the PandasQueryEngine
pandas_query_engine = PandasQueryEngine(df=df, llm=llm, verbose=True)

# Define a function for querying the DataFrame
def query_dataframe(query_str: str) -> str:
    """Answer a natural-language question about the sample DataFrame."""
    query_bundle = QueryBundle(query_str=query_str)
    # Call the public query method rather than the private _query
    response = pandas_query_engine.query(query_bundle)
    return str(response)

# Wrap the function as a tool; from_tools expects a list of tool objects
tools = [FunctionTool.from_defaults(fn=query_dataframe)]

# Initialize the OpenAI agent
agent = OpenAIAgent.from_tools(tools, llm=llm, verbose=True)

# Ask a question as a plain string; the agent decides when to call the tool
result = agent.chat('What is the average age and salary?')
print(result)

This setup lets the agent answer summary and insight questions about the DataFrame by delegating them to the PandasQueryEngine. Note that the PandasQueryEngine has the LLM generate pandas code and runs it with eval, which can execute arbitrary code, so use it with caution and appropriate security measures, such as a sandboxed environment.
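One possible security measure is to inspect a generated expression before it is evaluated. The following is a minimal sketch using only the standard library, not what PandasQueryEngine does internally. The names `ALLOWED_NODES` and `is_expression_allowed` are hypothetical, and the allowlist is an assumption you would tune for your own data. It narrows the attack surface but is not a sandbox: allowed DataFrame methods can still read or write files.

```python
import ast

# Hypothetical allowlist of AST node types accepted in a generated expression.
ALLOWED_NODES = (
    ast.Expression, ast.Call, ast.Attribute, ast.Name, ast.Load,
    ast.Constant, ast.Subscript, ast.List, ast.Tuple, ast.keyword,
    ast.BinOp, ast.Add, ast.Sub, ast.Mult, ast.Div,
    ast.Compare, ast.Gt, ast.Lt, ast.GtE, ast.LtE, ast.Eq, ast.NotEq,
)


def is_expression_allowed(code: str, allowed_names=frozenset({"df"})) -> bool:
    """Return True only if `code` is a single expression built from allowed nodes."""
    try:
        tree = ast.parse(code, mode="eval")  # statements such as `import` fail here
    except SyntaxError:
        return False
    for node in ast.walk(tree):
        if not isinstance(node, ALLOWED_NODES):
            return False  # e.g. lambdas, comprehensions, f-strings
        if isinstance(node, ast.Name) and node.id not in allowed_names:
            return False  # e.g. __import__, open, eval
        if isinstance(node, ast.Attribute) and node.attr.startswith("_"):
            return False  # block dunder and private attribute access
    return True
```

A caller would run this check on the generated code string and refuse to evaluate anything for which it returns False.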

For more advanced use cases and to build custom agents, LlamaIndex provides a comprehensive framework that includes components for query planning, tool use, and memory modules. You can explore these components and additional resources in the LlamaIndex documentation to enhance your agent's capabilities.
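To make the tool-use and memory ideas concrete, here is a library-free sketch of an agent loop. It is an illustration under stated assumptions, not LlamaIndex's implementation: `Tool`, `run_agent`, `column_mean`, and `scripted_policy` are hypothetical names, and the scripted policy stands in for the LLM that would normally choose the next tool call.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple

# Same sample values as the DataFrame above, kept as plain dicts for the sketch.
ROWS = [
    {"Age": 24, "Salary": 70000},
    {"Age": 30, "Salary": 80000},
    {"Age": 22, "Salary": 60000},
    {"Age": 35, "Salary": 90000},
]


@dataclass
class Tool:
    name: str
    description: str
    fn: Callable[[str], str]


def column_mean(column: str) -> str:
    """Hypothetical tool: mean of one numeric column, returned as text."""
    values = [row[column] for row in ROWS]
    return str(sum(values) / len(values))


Action = Optional[Tuple[str, str]]  # (tool name, tool input), or None to stop


def scripted_policy(question: str, memory: List[Tuple[str, str]]) -> Action:
    """Stand-in for the LLM: ask for the mean of each planned column, then stop."""
    plan = ["Age", "Salary"]
    if len(memory) < len(plan):
        return ("column_mean", plan[len(memory)])
    return None


def run_agent(question: str, tools: Dict[str, Tool],
              choose_action: Callable[[str, List[Tuple[str, str]]], Action],
              max_steps: int = 5) -> List[Tuple[str, str]]:
    """Reasoning loop: pick a tool, run it, store the observation in memory."""
    memory: List[Tuple[str, str]] = []
    for _ in range(max_steps):
        action = choose_action(question, memory)
        if action is None:
            break
        tool_name, tool_input = action
        observation = tools[tool_name].fn(tool_input)
        memory.append((tool_name, observation))
    return memory


tools = {"column_mean": Tool("column_mean", "Mean of a numeric column", column_mean)}
steps = run_agent("What is the average age and salary?", tools, scripted_policy)
print(steps)
```

In a real agent the policy is the LLM, which reads the tool descriptions and the accumulated observations before choosing the next action or producing a final answer.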

About Dosu
This response is meant to be useful and save you time. It is not meant to be a precise solution, but rather a starting point for your own research.

Help me be more useful! Please leave a 👍 if this is helpful and 👎 if it is irrelevant.

To continue the conversation, mention @dosu.

6 replies

mraguth May 22, 2024
Author

@dosu It is still not working. Now I am getting a str error.

dosubot[bot] bot May 22, 2024

The string type error you're encountering is likely caused by the types passed to the agent: OpenAIAgent.from_tools expects a list of tool objects rather than a dict of functions, and agent.chat expects a plain string rather than a task dict. Here is the corrected version of your code:

import pandas as pd
from llama_index.agent.openai import OpenAIAgent
from llama_index.core.schema import QueryBundle
from llama_index.core.tools import FunctionTool
from llama_index.experimental.query_engine.pandas import PandasQueryEngine
from llama_index.llms.openai import OpenAI

# Sample DataFrame
data = {
    'Name': ['Alice', 'Bob', 'Charlie', 'David'],
    'Age': [24, 30, 22, 35],
    'Salary': [70000, 80000, 60000, 90000]
}
df = pd.DataFrame(data)

# Initialize the LLM
llm = OpenAI(model="gpt-3.5-turbo-0613")

# Initialize the PandasQueryEngine
pandas_query_engine = PandasQueryEngine(df=df, llm=llm, verbose=True)

# Define a function for querying the DataFrame
def query_dataframe(query_str: str) -> str:
    """Answer a natural-language question about the sample DataFrame."""
    query_bundle = QueryBundle(query_str=query_str)
    response = pandas_query_engine.query(query_bundle)
    return str(response)

# Wrap the function as a tool and pass the tools as a list
tools = [FunctionTool.from_defaults(fn=query_dataframe)]

# Initialize the OpenAI agent
agent = OpenAIAgent.from_tools(tools=tools, llm=llm, verbose=True)

# Execute the task: pass the question to the agent as a plain string
result = agent.chat('What is the average age and salary?')
print(result)

Key Points:

Query Method Call: The query_dataframe function wraps query_str in a QueryBundle and calls the public query method of PandasQueryEngine.
Tool Definition: query_dataframe is wrapped with FunctionTool.from_defaults, and the tools are passed as a list, which is what OpenAIAgent.from_tools expects.
Agent Initialization: The OpenAIAgent is initialized with the tool list and the LLM, and agent.chat receives the question as a plain string.

Additional Checks:

API Key: Ensure the OpenAI API key is set in your environment variables or directly in the code.
Model Name: Verify that "gpt-3.5-turbo-0613" is a valid model name.
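The API key check can be made explicit before the LLM is constructed. This is a small sketch, assuming the key is supplied through the OPENAI_API_KEY environment variable; `require_env` is a hypothetical helper, not part of LlamaIndex.

```python
import os


def require_env(name: str = "OPENAI_API_KEY") -> str:
    """Return the value of an environment variable, or fail with a clear message."""
    value = os.environ.get(name, "").strip()
    if not value:
        raise RuntimeError(f"{name} is not set; export it before creating the LLM.")
    return value
```

Calling `require_env()` at the top of the script turns a confusing authentication failure deep inside the agent into an immediate, readable error.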

By ensuring these corrections and checks, the string type error should be resolved, and the task should execute successfully. If the issue persists, please provide the exact error message for further diagnosis.

mraguth May 23, 2024
Author

@dosu Thank you. How do I measure the token usage for the agent action above? Can you give me some sample code, please?

dosubot[bot] bot May 23, 2024

To measure the token usage for the agent's actions in your current setup, you can integrate the TokenCountingHandler from the LlamaIndex library. Here is the modified version of your code with the TokenCountingHandler integrated:

import pandas as pd
import tiktoken
from llama_index.agent.openai import OpenAIAgent
from llama_index.core import Settings
from llama_index.core.callbacks import CallbackManager, TokenCountingHandler
from llama_index.core.schema import QueryBundle
from llama_index.core.tools import FunctionTool
from llama_index.experimental.query_engine.pandas import PandasQueryEngine
from llama_index.llms.openai import OpenAI

# Sample DataFrame
data = {
    'Name': ['Alice', 'Bob', 'Charlie', 'David'],
    'Age': [24, 30, 22, 35],
    'Salary': [70000, 80000, 60000, 90000]
}
df = pd.DataFrame(data)

# Initialize the token counter with the appropriate tokenizer
token_counter = TokenCountingHandler(
    tokenizer=tiktoken.encoding_for_model("gpt-3.5-turbo").encode
)
callback_manager = CallbackManager([token_counter])

# Initialize the LLM and attach the callback manager so its calls are counted
llm = OpenAI(model="gpt-3.5-turbo-0613", callback_manager=callback_manager)

# Set the global settings for the LLM and callback manager
Settings.llm = llm
Settings.callback_manager = callback_manager

# Initialize the PandasQueryEngine
pandas_query_engine = PandasQueryEngine(df=df, llm=llm, verbose=True)

# Define a function for querying the DataFrame
def query_dataframe(query_str: str) -> str:
    """Answer a natural-language question about the sample DataFrame."""
    query_bundle = QueryBundle(query_str=query_str)
    response = pandas_query_engine.query(query_bundle)
    return str(response)

# Wrap the function as a tool and pass the tools as a list
tools = [FunctionTool.from_defaults(fn=query_dataframe)]

# Initialize the OpenAI agent
agent = OpenAIAgent.from_tools(tools=tools, llm=llm, verbose=True)

# Execute the task: pass the question to the agent as a plain string
result = agent.chat('What is the average age and salary?')
print(result)

# Print the token usage
print(f"Total LLM Token Count: {token_counter.total_llm_token_count}")
print(f"Prompt LLM Token Count: {token_counter.prompt_llm_token_count}")
print(f"Completion LLM Token Count: {token_counter.completion_llm_token_count}")
print(f"Total Embedding Token Count: {token_counter.total_embedding_token_count}")

This setup will allow you to measure the token usage for the agent's actions while querying the DataFrame.
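Conceptually, a token counter applies a tokenizer to each prompt and completion and accumulates the lengths. The sketch below illustrates that idea only. `SimpleTokenCounter` is a hypothetical class, and whitespace splitting is an assumption standing in for a real tokenizer such as the tiktoken encoder used above, so its numbers will not match actual model token counts.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class SimpleTokenCounter:
    """Hypothetical counter: sums tokenizer output lengths per call."""
    tokenizer: Callable[[str], List]
    prompt_tokens: int = 0
    completion_tokens: int = 0

    def record(self, prompt: str, completion: str) -> None:
        # Count the prompt and the completion separately, as the handler reports them.
        self.prompt_tokens += len(self.tokenizer(prompt))
        self.completion_tokens += len(self.tokenizer(completion))

    @property
    def total_tokens(self) -> int:
        return self.prompt_tokens + self.completion_tokens


# Whitespace splitting is a stand-in tokenizer for illustration only.
counter = SimpleTokenCounter(tokenizer=str.split)
counter.record("What is the average age and salary?", "Average age is 27.75")
print(counter.prompt_tokens, counter.completion_tokens, counter.total_tokens)
```

Because an agent may call the LLM several times per question (once to choose a tool, again inside the query engine, and again to phrase the answer), the totals reflect the whole chain rather than a single request.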

Answer selected by mraguth
