<a href="https://colab.research.google.com/github/naashonomics/openai/blob/main/PandasAI_demo.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# PandasAI

PandasAI is a Python library that adds generative artificial intelligence capabilities to pandas, the popular data analysis and manipulation tool.

Here is a very simple demo of how it works.

First, we install the dependencies:

In [30]:
!pip install --upgrade pandas pandasai

Collecting pandas
  Using cached pandas-2.0.3-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (12.3 MB)
Collecting tzdata>=2022.1 (from pandas)
  Using cached tzdata-2023.3-py2.py3-none-any.whl (341 kB)


Now we import the dependencies:

In [31]:
import pandas as pd
import numpy as np
from pandasai import PandasAI
from pandasai.llm.openai import OpenAI

In [24]:
class MyPandasAI(PandasAI):
    def chat(self, df, prompt):
        # Get the answer from the OpenAI API
        answer = self.llm.chat(prompt)

        # Return the answer
        return answer

We instantiate the LLM (in this case OpenAI). Remember to replace the placeholder below with your own OpenAI API key.

In [32]:
OPENAI_API_KEY = "YOUR OPEN AI KEY"
llm = OpenAI(api_token=OPENAI_API_KEY)
pandas_ai = MyPandasAI(llm)
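
Hard-coding a key in a shared notebook is easy to leak. A minimal sketch of one alternative, assuming the key is stored in an environment variable named `OPENAI_API_KEY` (the variable name and the helper `load_api_key` are illustrative and not part of PandasAI):

```python
import os


def load_api_key(var_name="OPENAI_API_KEY"):
    """Return the API key stored in the given environment variable.

    Hypothetical helper: raises instead of silently using an empty key.
    """
    key = os.environ.get(var_name)
    if not key:
        raise RuntimeError(f"Set the {var_name} environment variable first")
    return key


# Usage in the notebook (commented out so the sketch runs without pandasai):
# llm = OpenAI(api_token=load_api_key())
```

The rest of the notebook is unchanged; only the source of the `api_token` value differs.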

We create a dataframe using pandas:

In [29]:
data_dict = {
    "country": [
        "Delhi",
        "Mumbai",
        "Kolkata",
        "Chennai",
        "Jaipur",
        "Lucknow",
        "Pune",
        "Bengaluru",
        "Amritsar",
        "Agra",
        "Kola",
    ],
    "annual tax collected": [
        19294482072,
        28916155672,
        24112550372,
        34358173362,
        17454337886,
        11812051350,
        16074023894,
        14909678554,
        43807565410,
        146318441864,
        np.nan,
    ],
    "happiness_index": [9.94, 7.16, 6.35, 8.07, 6.98, 6.1, 4.23, 8.22, 6.87, 3.36, np.nan],
}

df = pd.DataFrame(data_dict)
df.head(10)

     country  annual tax collected  happiness_index
0      Delhi          1.929448e+10             9.94
1     Mumbai          2.891616e+10             7.16
2    Kolkata          2.411255e+10             6.35
3    Chennai          3.435817e+10             8.07
4     Jaipur          1.745434e+10             6.98
5    Lucknow          1.181205e+10             6.10
6       Pune          1.607402e+10             4.23
7  Bengaluru          1.490968e+10             8.22
8   Amritsar          4.380757e+10             6.87
9       Agra          1.463184e+11             3.36


Initialize an instance of PandasAI

Trying pandas features using PandasAI
Prompt 1: Finding index of a value

In [None]:
# finding index of a row using value of a column
response = pandas_ai(df, "What is the index of Pune?")
print(response)
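
No output was recorded for this cell. As a cross-check for whatever the model returns, the same lookup can be sketched in plain pandas. The frame `df_check` below is a reduced copy holding only the `country` column, introduced here for illustration:

```python
import pandas as pd

# Only the column needed for the lookup; values copied from the DataFrame above.
cities = ["Delhi", "Mumbai", "Kolkata", "Chennai", "Jaipur", "Lucknow",
          "Pune", "Bengaluru", "Amritsar", "Agra", "Kola"]
df_check = pd.DataFrame({"country": cities})

# A boolean mask selects the matching rows; .index returns their labels.
pune_index = df_check.index[df_check["country"] == "Pune"].tolist()
print(pune_index)  # [6]
```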

Prompt 2: Using head() function of DataFrame

In [34]:
response = pandas_ai(df, "Show the first 5 rows of data in tabular form")
print(response)

   country  annual tax collected  happiness_index
0    Delhi          1.929448e+10             9.94
1   Mumbai          2.891616e+10             7.16
2  Kolkata          2.411255e+10             6.35
3  Chennai          3.435817e+10             8.07
4   Jaipur          1.745434e+10             6.98


Prompt 3: Using tail() function of DataFrame

In [35]:
response = pandas_ai(df, "Show the last 5 rows of data in tabular form")
print(response)

      country  annual tax collected  happiness_index
6        Pune          1.607402e+10             4.23
7   Bengaluru          1.490968e+10             8.22
8    Amritsar          4.380757e+10             6.87
9        Agra          1.463184e+11             3.36
10       Kola                   NaN              NaN


Prompt 4: Using describe() function of DataFrame

In [36]:
response = pandas_ai(df, "Show the description of data in tabular form")
print(response)

       annual tax collected  happiness_index
count          1.000000e+01        10.000000
mean           3.570575e+10         6.728000
std            4.010314e+10         1.907149
min            1.181205e+10         3.360000
25%            1.641910e+10         6.162500
50%            2.170352e+10         6.925000
75%            3.299767e+10         7.842500
max            1.463184e+11         9.940000


Prompt 5: Using the info() function of DataFrame

Prompt 6: Using shape attribute of dataframe

Prompt 7: Finding any duplicate rows

In [38]:
response = pandas_ai(df, "Are there any duplicate rows?")
print(response)

No duplicate rows found.


Prompt 8: Finding missing values

In [39]:

response = pandas_ai(df, "Are there any missing values?")
print(response)

Missing values: [2]
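
The response `[2]` does not say where the missing values are. A short pandas sketch, rebuilding the same data under the illustrative name `df_check`, shows the count per column and the total:

```python
import numpy as np
import pandas as pd

# Same data as the DataFrame above: 11 rows, the last one ("Kola") has no numeric values.
df_check = pd.DataFrame({
    "country": ["Delhi", "Mumbai", "Kolkata", "Chennai", "Jaipur", "Lucknow",
                "Pune", "Bengaluru", "Amritsar", "Agra", "Kola"],
    "annual tax collected": [19294482072, 28916155672, 24112550372, 34358173362,
                             17454337886, 11812051350, 16074023894, 14909678554,
                             43807565410, 146318441864, np.nan],
    "happiness_index": [9.94, 7.16, 6.35, 8.07, 6.98, 6.1, 4.23, 8.22, 6.87, 3.36, np.nan],
})

# isna() marks each missing cell; summing the booleans counts them per column.
missing_per_column = df_check.isna().sum()
total_missing = int(missing_per_column.sum())

print(missing_per_column)
print(total_missing)  # 2
```

Each numeric column has one missing value (the "Kola" row), which is consistent with the total of 2 reported above.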


Prompt 9: Drop rows with missing values

Prompt 10: Print all column names

In [41]:

response = pandas_ai(df, "List all the column names")
print(response)

country, annual tax collected, happiness_index


Prompt 11: Rename a column

In [42]:
response = pandas_ai(df, "Rename column 'country' as 'Country' keep inplace=True and list all column names")
print(response)

['Country', 'annual tax collected', 'happiness_index']


Prompt 12: Add a row at the end of the dataframe

In [43]:
response = pandas_ai(df, "Add the list: ['Noida',None,None] at the end of the dataframe as last row keep inplace=True")
print(response)

None
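
The response is `None`, so it does not confirm whether the row was added; the output of Prompt 13 further down does show a row with index 11. For comparison, a minimal pandas sketch of an in-place append on a two-row stand-in frame (`df_check` is illustrative, and `np.nan` is used in place of `None`):

```python
import numpy as np
import pandas as pd

# Two-row stand-in for the tail of the DataFrame after the rename in Prompt 11.
df_check = pd.DataFrame({
    "Country": ["Agra", "Kola"],
    "annual tax collected": [146318441864, np.nan],
    "happiness_index": [3.36, np.nan],
})

# Assigning to a label that does not exist yet appends a row in place.
df_check.loc[len(df_check)] = ["Noida", np.nan, np.nan]

print(df_check.tail(1))
```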


Prompt 13: Replace the missing values

In [44]:
response = pandas_ai(df, """Fill the NULL values in dataframe with 0 keep inplace=True
and then print the last row of dataframe""")
print(response)

   Country  annual tax collected  happiness_index
11   Noida                   0.0              0.0
   Country  annual tax collected  happiness_index
11   Noida                   0.0              0.0
None


Prompt 14: Calculating mean of a column

In [45]:
response = pandas_ai(df, "What is the mean of annual tax collected")
print(response)

29754788369.666668
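
This mean differs from the `3.570575e+10` shown by `describe()` under Prompt 4. The difference is explained by Prompts 12 and 13: the column now has 12 rows, and the two previously missing entries (Kola and Noida) were filled with 0, so they count toward the mean. A small sketch reproducing both numbers, with `tax` as an illustrative stand-alone copy of the column:

```python
import numpy as np
import pandas as pd

# The ten known values plus the two rows (Kola, Noida) that had no value.
tax = pd.Series([19294482072, 28916155672, 24112550372, 34358173362, 17454337886,
                 11812051350, 16074023894, 14909678554, 43807565410, 146318441864,
                 np.nan, np.nan])

# mean() skips NaN by default, so only the ten known values are averaged.
mean_skipping_nan = tax.mean()

# After filling with 0, all twelve rows are averaged.
mean_after_fill = tax.fillna(0).mean()

print(mean_skipping_nan)  # about 3.5706e10, as in describe()
print(mean_after_fill)    # about 2.9755e10, as in the response above
```

Filling missing values with 0 therefore changes the statistic; whether that is appropriate depends on what a missing value means in the data.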


Prompt 15: Finding frequency of unique values of a column

Prompt 16: Dataframe Slicing

In [47]:

response = pandas_ai(df, "Show first 3 rows of columns 'Country' and 'happiness index'")
print(response)

   Country  happiness_index
0    Delhi             9.94
1   Mumbai             7.16
2  Kolkata             6.35


Prompt 17: Filtering rows with a where-style condition

In [48]:
response = pandas_ai(df, "Show the data in the row where 'Country'='Mumbai'")
print(response)

  Country  annual tax collected  happiness_index
1  Mumbai          2.891616e+10             7.16


Prompt 18: Filtering rows with a where-style condition on a range of values

In [49]:

response = pandas_ai(df, "Show the rows where 'happiness index' is between 3 and 6")
print(response)

  Country  annual tax collected  happiness_index
6    Pune          1.607402e+10             4.23
9    Agra          1.463184e+11             3.36
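
In plain pandas this kind of selection is usually written with a boolean mask. A minimal sketch using `Series.between`, which includes both endpoints by default; `df_check` is an illustrative copy of the two relevant columns in their state after Prompt 13:

```python
import pandas as pd

# State after Prompts 11 to 13: renamed column, Noida appended, missing values filled with 0.
df_check = pd.DataFrame({
    "Country": ["Delhi", "Mumbai", "Kolkata", "Chennai", "Jaipur", "Lucknow",
                "Pune", "Bengaluru", "Amritsar", "Agra", "Kola", "Noida"],
    "happiness_index": [9.94, 7.16, 6.35, 8.07, 6.98, 6.1, 4.23, 8.22, 6.87, 3.36, 0.0, 0.0],
})

# True for rows where 3 <= happiness_index <= 6.
mask = df_check["happiness_index"].between(3, 6)
result = df_check[mask]

print(result)  # rows 6 (Pune) and 9 (Agra)
```

Lucknow (6.1) falls just outside the range, and the two rows filled with 0 are excluded as well.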


Prompt 19: Finding 25th percentile of a column of continuous values

In [50]:

response = pandas_ai(df, "What is the 25th percentile value of 'happiness index'")
print(response)

4.0125
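
The value 4.0125 is lower than the 25% figure of 6.1625 in the `describe()` output under Prompt 4, again because the column now contains two zeros from Prompt 13. A sketch that reproduces both values with `Series.quantile`, which interpolates linearly by default (`happiness` is an illustrative stand-alone copy of the column):

```python
import pandas as pd

# Ten original values followed by the two entries that were filled with 0.
happiness = pd.Series([9.94, 7.16, 6.35, 8.07, 6.98, 6.1, 4.23, 8.22, 6.87, 3.36, 0.0, 0.0])

# 25th percentile over all twelve rows (current state of the DataFrame).
q25_filled = happiness.quantile(0.25)

# 25th percentile over the ten original values only.
q25_original = happiness.iloc[:10].quantile(0.25)

print(round(q25_filled, 4))    # 4.0125
print(round(q25_original, 4))  # 6.1625
```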
