# PandasAI Implementation

Repo: https://github.com/sinaptik-ai/pandas-ai

PandaAI is a Python platform that makes it easy to ask questions of your data in natural language. It helps non-technical users interact with their data more naturally, and it helps technical users save time and effort when working with data.

In [10]:
import pandasai as pdai
import os
from dotenv import load_dotenv

# Load environment variables
load_dotenv()

# Fetch the API key from the environment
api_key = os.getenv("PANDASAI_API_KEY")

# Set the API key
pdai.api_key.set(api_key)

In [11]:
# Check api key loaded successfully
# print(api_key)

In [12]:
# Load your dataset
df = pdai.read_csv("data/Training Data Cleaned.csv")
df.head()

Unnamed: 0,Id,Income,Age,Experience,Marital_Status,House_Ownership,Car_Ownership,Profession,City,State,Current_Job_Years,Current_House_Years,Risk_Flag
0,1,1303834,23,3,single,rented,no,Mechanical_engineer,Rewa,Madhya_Pradesh,3,13,0
1,2,7574516,40,10,single,rented,no,Software_Developer,Parbhani,Maharashtra,9,13,0
2,3,3991815,66,4,married,rented,no,Technical_writer,Alappuzha,Kerala,4,10,0
3,4,6256451,41,2,single,rented,yes,Software_Developer,Bhubaneswar,Odisha,2,12,1
4,5,5768871,47,11,single,rented,no,Civil_servant,Tiruchirappalli[10],Tamil_Nadu,3,14,1


In [13]:
# Calculate the percentage of customers who defaulted
df["Risk_Flag"].value_counts(normalize=True).reset_index()

Unnamed: 0,Risk_Flag,proportion
0,0,0.804515
1,1,0.195485


In [14]:
# Test the chat function
response = df.chat("What is the percentage of customer that is defaulted?") # this should give us the answer that about 19.5% of customers defaulted
print(response)

19.54850659874971
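
The chat answer can be sanity-checked against the plain pandas result from the previous cell. The snippet below is a minimal sketch that only reuses the two numbers printed above; `answers_agree` and its tolerance are hypothetical names chosen for illustration, not part of pandasai.

```python
import math

# Values copied from the notebook outputs above
proportion_defaulted = 0.195485    # share of rows with Risk_Flag == 1 (value_counts, normalize=True)
chat_answer = 19.54850659874971    # number printed after df.chat(...)


def answers_agree(proportion, percentage, tol=1e-3):
    """Return True if a 0-1 proportion and a 0-100 percentage describe the same share."""
    return math.isclose(proportion * 100, percentage, abs_tol=tol)


print(answers_agree(proportion_defaulted, chat_answer))
```

A check like this is useful because the chat function returns a percentage while `value_counts(normalize=True)` returns a proportion, so the two outputs are not directly comparable by eye.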


> Initial trial (23/02/2025): The library imported successfully, the API key is saved, and the dataset loads through the pandasai library. However, the chat function does not work yet. Will try to fix this later.

> Successful trial (24/02/2025): The problem was that the API key was not set properly. After fixing that, the chat function returned the correct answer.
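
Because the failed trial came down to the API key, it may help to fail fast when the key is missing instead of discovering it at the first `chat` call. This is a minimal sketch, assuming the key lives in the `PANDASAI_API_KEY` environment variable as in the first cell; `require_api_key` is a hypothetical helper, not a pandasai function.

```python
import os


def require_api_key(env_var="PANDASAI_API_KEY"):
    """Return the key stored in env_var, or raise if it is missing or blank."""
    key = os.getenv(env_var)
    if key is None or not key.strip():
        raise RuntimeError(f"{env_var} is not set; check your .env file")
    return key.strip()


# Intended use in the first cell (after load_dotenv()):
# pdai.api_key.set(require_api_key())
```

With this guard, an empty or missing key raises a clear error right after the environment is loaded.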

In [None]:
## Second example: using the pandasai platform

# Load your CSV file
file = pdai.read_csv("data/Training Data Cleaned.csv")

# Save your dataset configuration
df = pdai.create(
  path="pai-personal-8b5a5/loan-default-prediction",
  df=file,
  description="Dataset for loan default prediction",
)

# Push your dataset to PandaBI
df.push()

Dataset saved successfully to path: pai-personal-8b5a5\loan-default-prediction
Your dataset was successfully pushed to the remote server!
🔗 URL: https://app.pandabi.ai/datasets/pai-personal-8b5a5/loan-default-prediction


> Another successful implementation (24/02/2025): Another way to use this is to push the dataset to the platform first. Once it appears among your datasets there, you can query it in natural language.

# Several things to note

## Implementation in the platform

In the platform, queries are executed against the data that has been pushed to pai-personal-8b5a5/loan-default-prediction. An example response is shown below:


**A. Platform Display**
![Platform display](assets/Platform%20display.png)

**B. Model Thinking**
![Model thinking](assets/Model%20thinking.png)

**C. Example Final Answer**
![Example answer](assets/Example%20answer.png)

As we can see, the response in the platform is more complete, with additional information rather than just the straightforward answer returned by the chat implementation above.

## Free tier limitation

- One important thing to remember is that this pandasai implementation has significant limitations in the free tier. The complete information can be found at this link: https://app.pandabi.ai/admin/settings/billing

- Here are the available upgrade plans and the free tier limitations:

![Limitation and plan offers](assets/Free%20tier%20limitation%20and%20upgrade%20plans.png)

This is quite expensive and probably not practical for large-scale projects, since the file size that can be handled is quite small. But it's a fun way to learn about this tool, and it could still be used for small projects.
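
Given the file size restriction, one option is to check a file's size locally before trying to push it. The following is a minimal sketch: `fits_size_limit` is a hypothetical helper, and `max_bytes` must be supplied by the caller from the billing page, since the actual limit is not stated in this text.

```python
import os


def fits_size_limit(path, max_bytes):
    """Return True if the file at path is no larger than max_bytes."""
    return os.path.getsize(path) <= max_bytes


# Intended use before pushing (the limit value is an assumption you must fill in):
# fits_size_limit("data/Training Data Cleaned.csv", max_bytes=your_plan_limit_in_bytes)
```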