# Pandas AI: The Generative AI Python Library

Since the launch of ChatGPT, many routine tasks have been automated with AI. PandasAI is one such tool: it uses ChatGPT to ease data manipulation tasks in Python. It sends your prompt to the model, which generates Python code; PandasAI then executes that code and returns the output. This lets you perform tasks involving the pandas library without explicitly writing the code yourself. In this article, we discuss how to use Pandas AI to simplify data manipulation.

# What is Pandas AI
Pandas AI is an add-on to the pandas library that uses generative AI models from OpenAI. With just a text prompt, you can produce insights from your DataFrame. It utilises the text-to-query generative AI developed by OpenAI.

Preparing data for analysis is a labor-intensive process for data scientists and analysts. With Pandas AI, data experts can cut down the time needed for data preparation and get on with the analysis itself, while still applying the methods and techniques they already know. PandasAI should be used in conjunction with pandas, not as a substitute for it. Instead of manually traversing the dataset to answer questions about it, you can ask PandasAI those questions, and it will return answers in the form of pandas DataFrames.

Pandas AI aims to let you communicate conversationally with a machine that then delivers the desired results, rather than having to program the work yourself. To do this, it uses the OpenAI GPT API to generate code that uses the pandas library and runs this code in the background. The results are then returned and can be saved in a variable.

# How to use the Pandas AI library
1. Install and import the Pandas AI library in your Python environment
Run the following command to install the pandasai library. It is shown here for the Anaconda Prompt; to run it from a Jupyter notebook cell, prefix it with an exclamation mark (!).

Anaconda Prompt:


pip install --trusted-host pypi.org --trusted-host pypi.python.org --trusted-host files.pythonhosted.org pandasai

In [1]:
import pandas as pd 
import numpy as np 
from pandasai import PandasAI 
from pandasai.llm.openai import OpenAI

2. Create a DataFrame with dummy data
Build a DataFrame from a dictionary of dummy data

In [2]:
data_dict = { 
	"country": [ 
		"Delhi", 
		"Mumbai", 
		"Kolkata", 
		"Chennai", 
		"Jaipur", 
		"Lucknow", 
		"Pune", 
		"Bengaluru", 
		"Amritsar", 
		"Agra", 
		"Kola", 
	], 
	"annual tax collected": [ 
		19294482072, 
		28916155672, 
		24112550372, 
		34358173362, 
		17454337886, 
		11812051350, 
		16074023894, 
		14909678554, 
		43807565410, 
		146318441864, 
		np.nan, 
	], 
	"happiness_index": [9.94, 7.16, 6.35, 8.07, 6.98, 6.1, 4.23, 8.22, 6.87, 3.36, np.nan], 
} 

df = pd.DataFrame(data_dict) 
df.head(10)


|    | country   | annual tax collected | happiness_index |
|----|-----------|----------------------|-----------------|
| 0  | Delhi     | 19294480000.0        | 9.94            |
| 1  | Mumbai    | 28916160000.0        | 7.16            |
| 2  | Kolkata   | 24112550000.0        | 6.35            |
| 3  | Chennai   | 34358170000.0        | 8.07            |
| 4  | Jaipur    | 17454340000.0        | 6.98            |
| 5  | Lucknow   | 11812050000.0        | 6.1             |
| 6  | Pune      | 16074020000.0        | 4.23            |
| 7  | Bengaluru | 14909680000.0        | 8.22            |
| 8  | Amritsar  | 43807570000.0        | 6.87            |
| 9  | Agra      | 146318400000.0       | 3.36            |


In [3]:
df.tail()


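|    | country   | annual tax collected | happiness_index |
|----|-----------|----------------------|-----------------|
| 6  | Pune      | 16074020000.0        | 4.23            |
| 7  | Bengaluru | 14909680000.0        | 8.22            |
| 8  | Amritsar  | 43807570000.0        | 6.87            |
| 9  | Agra      | 146318400000.0       | 3.36            |
| 10 | Kola      | NaN                  | NaN             |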


3. Initialize an instance of pandasai

In [4]:
# Replace the placeholder with your own OpenAI API key; never publish a real key.
llm = OpenAI(api_token="YOUR_OPENAI_API_KEY")
pandas_ai = PandasAI(llm, conversational=False)
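
A key typed directly into a notebook is easy to leak when the notebook is shared. One possible alternative is to read it from an environment variable. The sketch below assumes the key is stored in a variable named `OPENAI_API_KEY`; both that name and the `load_api_token` helper are illustrative assumptions, not part of pandasai.

```python
import os


def load_api_token(env_var: str = "OPENAI_API_KEY") -> str:
    """Return the API token stored in an environment variable.

    The variable name is an assumption made for this sketch.
    """
    token = os.environ.get(env_var, "").strip()
    if not token:
        raise RuntimeError(f"Environment variable {env_var} is not set")
    return token


# Intended use with the objects from the cell above (not executed here):
# llm = OpenAI(api_token=load_api_token())
# pandas_ai = PandasAI(llm, conversational=False)
```

With this approach the notebook itself never contains the secret, so it can be shared or committed without exposing the key.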



4. Trying pandas features using pandasai
# Prompt 1: Finding index of a value

In [5]:
# finding index of a row using value of a column 
response = pandas_ai(df, "What is the index of Pune?") 
print(response)


Unfortunately, I was not able to answer your question, because of the following error:

You exceeded your current quota, please check your plan and billing details. For more information on this error, read the docs: https://platform.openai.com/docs/guides/error-codes/api-errors.
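
Because the request returned a quota error in this run, no answer is shown. For comparison, here is a minimal plain-pandas sketch of the same lookup. It uses a shortened copy of the example data, and the variable names are illustrative.

```python
import pandas as pd

# Shortened copy of the example data; only the column needed here.
df = pd.DataFrame({"country": ["Delhi", "Mumbai", "Pune", "Agra"]})

# The boolean mask selects matching rows; .index gives their labels.
pune_index = df.index[df["country"] == "Pune"].tolist()
print(pune_index)  # [2] for this shortened frame
```

Applied to the full DataFrame from step 2, the same expression gives `[6]`, since Pune is the seventh entry.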




# Prompt 2: Using head() function of DataFrame

In [6]:
response = pandas_ai(df, "Show the first 5 rows of data in tabular form") 
print(response)


Unfortunately, I was not able to answer your question, because of the following error:

You exceeded your current quota, please check your plan and billing details. For more information on this error, read the docs: https://platform.openai.com/docs/guides/error-codes/api-errors.



# Prompt 3: Using tail() function of DataFrame

In [7]:
response = pandas_ai(df, "Show the last 5 rows of data in tabular form") 
print(response)


Unfortunately, I was not able to answer your question, because of the following error:

You exceeded your current quota, please check your plan and billing details. For more information on this error, read the docs: https://platform.openai.com/docs/guides/error-codes/api-errors.



# Prompt 4: Using describe() function of DataFrame

In [8]:
response = pandas_ai(df, "Show the description of data in tabular form") 
print(response)


Unfortunately, I was not able to answer your question, because of the following error:

You exceeded your current quota, please check your plan and billing details. For more information on this error, read the docs: https://platform.openai.com/docs/guides/error-codes/api-errors.



# Prompt 5: Using the info() function of DataFrame

In [9]:
response = pandas_ai(df, "Show the info of data in tabular form") 
print(response)


Unfortunately, I was not able to answer your question, because of the following error:

You exceeded your current quota, please check your plan and billing details. For more information on this error, read the docs: https://platform.openai.com/docs/guides/error-codes/api-errors.



# Prompt 6: Using shape attribute of dataframe

In [10]:
response = pandas_ai(df, "What is the shape of data?") 
print(response)


Unfortunately, I was not able to answer your question, because of the following error:

You exceeded your current quota, please check your plan and billing details. For more information on this error, read the docs: https://platform.openai.com/docs/guides/error-codes/api-errors.



# Prompt 7: Finding any duplicate rows

In [11]:
response = pandas_ai(df, "Are there any duplicate rows?") 
print(response)

Unfortunately, I was not able to answer your question, because of the following error:

You exceeded your current quota, please check your plan and billing details. For more information on this error, read the docs: https://platform.openai.com/docs/guides/error-codes/api-errors.
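
As a reference point, the duplicate check can be written by hand roughly as follows. The small frame below is made up for illustration and deliberately contains one repeated row; the example data in this article has none.

```python
import pandas as pd

# Illustrative frame with one deliberately repeated row.
df = pd.DataFrame(
    {
        "country": ["Delhi", "Mumbai", "Delhi"],
        "happiness_index": [9.94, 7.16, 9.94],
    }
)

# duplicated() marks every row that repeats an earlier row.
has_duplicates = bool(df.duplicated().any())
duplicate_rows = df[df.duplicated()]
print(has_duplicates)
print(duplicate_rows)
```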



# Prompt 8: Finding missing values

In [12]:
response = pandas_ai(df, "Are there any missing values?") 
print(response)

Unfortunately, I was not able to answer your question, because of the following error:

You exceeded your current quota, please check your plan and billing details. For more information on this error, read the docs: https://platform.openai.com/docs/guides/error-codes/api-errors.
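
A hedged sketch of what such a check might look like in plain pandas, using the last three rows of the example data (including the incomplete "Kola" row):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {
        "country": ["Amritsar", "Agra", "Kola"],
        "annual tax collected": [43807565410, 146318441864, np.nan],
        "happiness_index": [6.87, 3.36, np.nan],
    }
)

# isna() flags missing cells; summing counts them per column.
missing_per_column = df.isna().sum()
has_missing = bool(df.isna().any().any())
print(missing_per_column)
print(has_missing)
```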



# Prompt 9: Drop rows with missing values

In [13]:
response = pandas_ai(df, "Drop the row with missing values with inplace=True and return True when done else False ") 
print(response)

Unfortunately, I was not able to answer your question, because of the following error:

You exceeded your current quota, please check your plan and billing details. For more information on this error, read the docs: https://platform.openai.com/docs/guides/error-codes/api-errors.
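
The prompt asks for a True/False result because `dropna(inplace=True)` itself returns nothing useful. A minimal sketch of the underlying pandas calls, on a three-row excerpt of the example data:

```python
import numpy as np
import pandas as pd

# Last three rows of the example data, including the incomplete "Kola" row.
df = pd.DataFrame(
    {
        "country": ["Amritsar", "Agra", "Kola"],
        "annual tax collected": [43807565410, 146318441864, np.nan],
        "happiness_index": [6.87, 3.36, np.nan],
    }
)

# dropna() returns a new frame without rows that contain any missing value.
cleaned = df.dropna()

# With inplace=True the frame is modified directly and the call returns None.
result = df.dropna(inplace=True)
print(cleaned["country"].tolist(), result)
```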



Checking if the last row has been removed (it is still present here because the request above failed)

In [14]:
df.tail()


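|    | country   | annual tax collected | happiness_index |
|----|-----------|----------------------|-----------------|
| 6  | Pune      | 16074020000.0        | 4.23            |
| 7  | Bengaluru | 14909680000.0        | 8.22            |
| 8  | Amritsar  | 43807570000.0        | 6.87            |
| 9  | Agra      | 146318400000.0       | 3.36            |
| 10 | Kola      | NaN                  | NaN             |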


# Prompt 10: Print all column names

In [15]:
response = pandas_ai(df, "List all the column names") 
print(response)


Unfortunately, I was not able to answer your question, because of the following error:

You exceeded your current quota, please check your plan and billing details. For more information on this error, read the docs: https://platform.openai.com/docs/guides/error-codes/api-errors.



# Prompt 11: Rename a column

In [16]:
response = pandas_ai(df, "Rename column 'country' as 'Country' keep inplace=True and list all column names") 
print(response)

Unfortunately, I was not able to answer your question, because of the following error:

You exceeded your current quota, please check your plan and billing details. For more information on this error, read the docs: https://platform.openai.com/docs/guides/error-codes/api-errors.
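
Written by hand, the rename might look like the sketch below. It uses a two-row excerpt of the example data and reassigns the result, since `rename` returns a new DataFrame by default.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "country": ["Delhi", "Mumbai"],
        "happiness_index": [9.94, 7.16],
    }
)

# Map old column names to new ones; unlisted columns are left unchanged.
df = df.rename(columns={"country": "Country"})
column_names = df.columns.tolist()
print(column_names)  # ['Country', 'happiness_index']
```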



# Prompt 12: Add a row at the end of the dataframe

In [17]:
response = pandas_ai(df, "Add the list: ['A',None,None] at the end of the dataframe as last row keep inplace=True") 
print(response)

Unfortunately, I was not able to answer your question, because of the following error:

You exceeded your current quota, please check your plan and billing details. For more information on this error, read the docs: https://platform.openai.com/docs/guides/error-codes/api-errors.



# Prompt 13: Replace the missing values

In [18]:
response = pandas_ai(df, """Fill the NULL values in dataframe with 0 keep inplace=True 
and then print the last row of dataframe""") 
print(response)


Unfortunately, I was not able to answer your question, because of the following error:

You exceeded your current quota, please check your plan and billing details. For more information on this error, read the docs: https://platform.openai.com/docs/guides/error-codes/api-errors.
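
For readers who want to see the operation itself, this is roughly how the fill could be done directly in pandas. The two-row frame is an excerpt of the example data; `filled` and `last_row` are illustrative names.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {
        "country": ["Agra", "Kola"],
        "annual tax collected": [146318441864, np.nan],
        "happiness_index": [3.36, np.nan],
    }
)

# fillna(0) replaces every missing value with 0 and returns a new frame.
filled = df.fillna(0)
last_row = filled.tail(1)
print(last_row)
```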




# Prompt 14: Calculating mean of a column

In [19]:
response = pandas_ai(df, "What is the mean of annual tax collected") 
print(response)


Unfortunately, I was not able to answer your question, because of the following error:

You exceeded your current quota, please check your plan and billing details. For more information on this error, read the docs: https://platform.openai.com/docs/guides/error-codes/api-errors.



# Prompt 15: Finding frequency of unique values of a column

In [20]:
response = pandas_ai(df, "What are the value counts for the column 'Country'") 
print(response)


Unfortunately, I was not able to answer your question, because of the following error:

You exceeded your current quota, please check your plan and billing details. For more information on this error, read the docs: https://platform.openai.com/docs/guides/error-codes/api-errors.
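
The prompt refers to the column as 'Country', which assumes the earlier rename succeeded. The sketch below uses the original column name `country` and a made-up list of values with one repeat, so that the counts are not all 1.

```python
import pandas as pd

# Illustrative values; "Delhi" is repeated on purpose.
df = pd.DataFrame({"country": ["Delhi", "Mumbai", "Delhi", "Pune"]})

# value_counts() returns how often each distinct value occurs.
counts = df["country"].value_counts()
print(counts)
```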



# Prompt 16: Dataframe Slicing

In [21]:
response = pandas_ai(df, "Show first 3 rows of columns 'Country' and 'happiness index'") 
print(response)


Unfortunately, I was not able to answer your question, because of the following error:

You exceeded your current quota, please check your plan and billing details. For more information on this error, read the docs: https://platform.openai.com/docs/guides/error-codes/api-errors.



# Prompt 17: Using pandas where function

In [22]:
response = pandas_ai(df, "Show the data in the row where 'Country'='Mumbai'") 
print(response)


Unfortunately, I was not able to answer your question, because of the following error:

You exceeded your current quota, please check your plan and billing details. For more information on this error, read the docs: https://platform.openai.com/docs/guides/error-codes/api-errors.



# Prompt 18: Using pandas where function with a range of values

In [23]:
response = pandas_ai(df, "Show the rows where 'happiness index' is between 3 and 6") 
print(response)


Unfortunately, I was not able to answer your question, because of the following error:

You exceeded your current quota, please check your plan and billing details. For more information on this error, read the docs: https://platform.openai.com/docs/guides/error-codes/api-errors.
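
One way to express this range filter by hand is with `Series.between`, which includes both endpoints by default. The sketch uses four rows from the example data and the actual column name `happiness_index`.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "country": ["Jaipur", "Lucknow", "Pune", "Agra"],
        "happiness_index": [6.98, 6.1, 4.23, 3.36],
    }
)

# Keep rows whose happiness_index lies in the closed interval [3, 6].
in_range = df[df["happiness_index"].between(3, 6)]
print(in_range)
```

Lucknow (6.1) falls just outside the range, so only Pune and Agra remain.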



# Prompt 19: Finding 25th percentile of a column of continuous values

In [24]:
response = pandas_ai(df, "What is the 25th percentile value of 'annual tax collected'") 
print(response)


Unfortunately, I was not able to answer your question, because of the following error:

You exceeded your current quota, please check your plan and billing details. For more information on this error, read the docs: https://platform.openai.com/docs/guides/error-codes/api-errors.



# Prompt 20: Finding IQR of a column

In [25]:
response = pandas_ai(df, "What is the IQR value of 'happiness index'") 
print(response)


Unfortunately, I was not able to answer your question, because of the following error:

You exceeded your current quota, please check your plan and billing details. For more information on this error, read the docs: https://platform.openai.com/docs/guides/error-codes/api-errors.
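
The interquartile range is the difference between the 75th and 25th percentiles. A minimal sketch with the happiness values from step 2, assuming pandas' default linear interpolation for quantiles:

```python
import numpy as np
import pandas as pd

# happiness_index values from the example data, including the missing entry.
happiness = pd.Series(
    [9.94, 7.16, 6.35, 8.07, 6.98, 6.1, 4.23, 8.22, 6.87, 3.36, np.nan],
    name="happiness_index",
)

# quantile() skips missing values and interpolates linearly by default.
q1 = happiness.quantile(0.25)
q3 = happiness.quantile(0.75)
iqr = q3 - q1
print(q1, q3, iqr)
```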



# Prompt 21: Plotting a box plot for a continuous column

In [26]:
response = pandas_ai(df, "Plot a box plot for the column 'happiness index'") 
print(response)


Unfortunately, I was not able to answer your question, because of the following error:

You exceeded your current quota, please check your plan and billing details. For more information on this error, read the docs: https://platform.openai.com/docs/guides/error-codes/api-errors.



# Prompt 22: Find outliers in a column

In [27]:
response = pandas_ai(df, "Show the data of the outlier value in the columns 'happiness index'") 
print(response)


Unfortunately, I was not able to answer your question, because of the following error:

You exceeded your current quota, please check your plan and billing details. For more information on this error, read the docs: https://platform.openai.com/docs/guides/error-codes/api-errors.
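
"Outlier" has no single definition, so any answer depends on the rule the generated code happens to apply. The sketch below assumes the common 1.5 x IQR rule; that choice is an assumption for illustration, not something the prompt specifies.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {
        "country": [
            "Delhi", "Mumbai", "Kolkata", "Chennai", "Jaipur", "Lucknow",
            "Pune", "Bengaluru", "Amritsar", "Agra", "Kola",
        ],
        "happiness_index": [
            9.94, 7.16, 6.35, 8.07, 6.98, 6.1, 4.23, 8.22, 6.87, 3.36, np.nan,
        ],
    }
)

col = df["happiness_index"]
q1, q3 = col.quantile(0.25), col.quantile(0.75)
iqr = q3 - q1

# Assumed rule: values beyond 1.5 * IQR from the quartiles are outliers.
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
outliers = df[(col < lower) | (col > upper)]
print(outliers)
```

Rows with a missing value never satisfy either comparison, so the incomplete "Kola" row is not reported as an outlier.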



# Prompt 23: Plot a scatter plot between 2 columns

In [28]:
response = pandas_ai(df, "Plot a scatter plot for the columns 'annual tax collected' and 'happiness index'") 
print(response)


Unfortunately, I was not able to answer your question, because of the following error:

You exceeded your current quota, please check your plan and billing details. For more information on this error, read the docs: https://platform.openai.com/docs/guides/error-codes/api-errors.



# Prompt 24: Describing a column/series

In [29]:
response = pandas_ai(df, "Describe the column 'annual tax collected'") 
print(response)


Unfortunately, I was not able to answer your question, because of the following error:

You exceeded your current quota, please check your plan and billing details. For more information on this error, read the docs: https://platform.openai.com/docs/guides/error-codes/api-errors.



# Prompt 25: Plot a bar plot between 2 columns

In [30]:
response = pandas_ai(df, "Plot a bar plot for the columns 'annual tax collected' and 'Country'") 
print(response)


Unfortunately, I was not able to answer your question, because of the following error:

You exceeded your current quota, please check your plan and billing details. For more information on this error, read the docs: https://platform.openai.com/docs/guides/error-codes/api-errors.



# Prompt 26: Saving DataFrame as a CSV file and JSON file

In [31]:
# to save the dataframe as a CSV file 
response = pandas_ai(df, "Save the dataframe to 'temp.csv'") 
# to save the dataframe as a JSON file 
response = pandas_ai(df, "Save the dataframe to 'temp.json'")
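
For reference, saving a DataFrame by hand takes one call per format. This sketch writes to a temporary directory rather than the working directory; the directory handling and the `orient="records"` choice are assumptions made for the example.

```python
import tempfile
from pathlib import Path

import pandas as pd

df = pd.DataFrame(
    {
        "country": ["Delhi", "Mumbai"],
        "happiness_index": [9.94, 7.16],
    }
)

# Write into a throwaway directory so the example leaves no stray files.
out_dir = Path(tempfile.mkdtemp())
csv_path = out_dir / "temp.csv"
json_path = out_dir / "temp.json"

df.to_csv(csv_path, index=False)         # index=False omits the row labels
df.to_json(json_path, orient="records")  # one JSON object per row

# Read the CSV back to confirm the round trip.
restored = pd.read_csv(csv_path)
print(restored)
```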


# Pros and Cons of Pandas AI
# Pros of Pandas AI

- Can easily perform simple tasks without having to remember any complex syntax
- Capable of giving conversational replies
- Easy report generation for quick analysis or data manipulation

# Cons of Pandas AI

- Cannot perform complex tasks
- Cannot create or interact with variables other than the passed dataframe