<a href="https://colab.research.google.com/github/naashonomics/openai/blob/main/PandasAI_demo.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# PandasAI

PandasAI is a Python library that adds generative artificial intelligence capabilities to pandas, the popular data analysis and manipulation tool.

Here is a very simple demo of how it works.

First, we install the dependencies:

In [None]:
!pip install --upgrade pandas pandasai

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting pandas
  Downloading pandas-2.0.2-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (12.3 MB)
Collecting pandasai
  Downloading pandasai-0.4.2-py3-none-any.whl (33 kB)
Collecting astor<0.9.0,>=0.8.1 (from pandasai)
  Downloading astor-0.8.1-py2.py3-none-any.whl (27 kB)
Collecting ipython<9.0.0,>=8.13.1 (from pandasai)
  Downloading ipython-8.14.0-py3-none-any.whl (798 kB)
Collecting openai<0.28.0,>=0.27.5 (from pandasai)
  Downloading openai-0.27.8-py3-none-any.whl (73 kB)
Collecting python-dotenv<2.0.

Now we import the dependencies:

In [None]:
import pandas as pd
import numpy as np
from pandasai import PandasAI
from pandasai.llm.openai import OpenAI

We instantiate the LLM (in this case OpenAI). Remember to replace the placeholder with your own OpenAI API key.

In [None]:
OPENAI_API_KEY = "YOUR API KEY"
llm = OpenAI(api_token=OPENAI_API_KEY)
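
Hard-coding the key in a shared notebook makes it easy to leak. As one possible alternative, the sketch below reads it from an environment variable. The variable name `OPENAI_API_KEY` and the `key_is_placeholder` flag are illustrative choices here, not something PandasAI requires.

```python
import os

# Read the key from the environment; fall back to the notebook's placeholder.
# The environment variable name is an assumption made for this sketch.
OPENAI_API_KEY = os.environ.get("OPENAI_API_KEY", "YOUR API KEY")

# Flag the placeholder early instead of hitting an authentication error later.
key_is_placeholder = OPENAI_API_KEY == "YOUR API KEY"
if key_is_placeholder:
    print("Set the OPENAI_API_KEY environment variable before calling the LLM.")
```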

We create a dataframe using pandas:

In [None]:
data_dict = {
    "country": [
        "Delhi",
        "Mumbai",
        "Kolkata",
        "Chennai",
        "Jaipur",
        "Lucknow",
        "Pune",
        "Bengaluru",
        "Amritsar",
        "Agra",
        "Kola",
    ],
    "annual tax collected": [
        19294482072,
        28916155672,
        24112550372,
        34358173362,
        17454337886,
        11812051350,
        16074023894,
        14909678554,
        43807565410,
        146318441864,
        np.nan,
    ],
    "happiness_index": [9.94, 7.16, 6.35, 8.07, 6.98, 6.1, 4.23, 8.22, 6.87, 3.36, np.nan],
}

df = pd.DataFrame(data_dict)
df.head()

Initialize an instance of PandasAI:

In [None]:
llm = OpenAI(api_token=OPENAI_API_KEY)
pandas_ai = PandasAI(llm, conversational=False)

Trying pandas features using PandasAI
Prompt 1: Finding index of a value

In [None]:
# finding index of a row using value of a column
response = pandas_ai(df, "What is the index of Pune?")
print(response)
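
Because the answer comes from generated code, it can help to cross-check it against plain pandas. A minimal sketch, assuming the same city list as the notebook's `country` column (the names `cities` and `pune_index` are only for illustration):

```python
import pandas as pd

# Same values as the notebook's "country" column.
cities = pd.DataFrame({
    "country": [
        "Delhi", "Mumbai", "Kolkata", "Chennai", "Jaipur", "Lucknow",
        "Pune", "Bengaluru", "Amritsar", "Agra", "Kola",
    ]
})

# A boolean mask selects the matching rows; .index gives their labels.
pune_index = cities.index[cities["country"] == "Pune"].tolist()
print(pune_index)  # [6]
```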

Prompt 2: Using the head() function of DataFrame

In [None]:
response = pandas_ai(df, "Show the first 5 rows of data in tabular form")
print(response)

Prompt 3: Using the tail() function of DataFrame

In [None]:
response = pandas_ai(df, "Show the last 5 rows of data in tabular form")
print(response)

Prompt 4: Using describe() function of DataFrame

In [None]:
response = pandas_ai(df, "Show the description of data in tabular form")
print(response)

Prompt 5: Using the info() function of DataFrame

In [None]:
response = pandas_ai(df, "Show the info of data in tabular form")
print(response)
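
For comparison, `DataFrame.info()` in plain pandas prints its report and returns `None`, so there is no table to return directly. The sketch below captures the report as text through the `buf` parameter; the three-row `sample` frame is a hypothetical stand-in for the notebook's dataframe.

```python
import io

import pandas as pd

# Hypothetical three-row stand-in for the notebook's dataframe.
sample = pd.DataFrame({
    "country": ["Delhi", "Mumbai", "Kola"],
    "happiness_index": [9.94, 7.16, float("nan")],
})

# info() writes to stdout by default; redirect it into a string buffer.
buffer = io.StringIO()
sample.info(buf=buffer)
info_text = buffer.getvalue()
print(info_text)
```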

Prompt 6: Using shape attribute of dataframe

In [None]:
response = pandas_ai(df, "What is the shape of data?")
print(response)

Prompt 7: Finding any duplicate rows

In [None]:
response = pandas_ai(df, "Are there any duplicate rows?")
print(response)
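
The same question can be answered directly with `DataFrame.duplicated()`. A small sketch on a hypothetical three-row frame, with a second frame that deliberately repeats a row to show the positive case:

```python
import pandas as pd

# Hypothetical stand-in with no repeated rows.
sample = pd.DataFrame({
    "country": ["Delhi", "Mumbai", "Pune"],
    "happiness_index": [9.94, 7.16, 4.23],
})

# duplicated() marks every row that repeats an earlier row.
has_duplicates = bool(sample.duplicated().any())

# Appending a copy of the first row makes the check positive.
with_copy = pd.concat([sample, sample.iloc[[0]]], ignore_index=True)
has_duplicates_after = bool(with_copy.duplicated().any())

print(has_duplicates, has_duplicates_after)
```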

Prompt 8: Finding missing values

In [None]:

response = pandas_ai(df, "Are there any missing values?")
print(response)
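
A plain-pandas check for missing values, useful for verifying the response, might look like the following. The two-row `sample` frame reuses the last two rows of the notebook's data.

```python
import pandas as pd

# Last two rows of the notebook's data; "Kola" has no numeric values.
sample = pd.DataFrame({
    "country": ["Agra", "Kola"],
    "annual tax collected": [146318441864, float("nan")],
    "happiness_index": [3.36, float("nan")],
})

# Count missing entries per column, then reduce to a single yes/no answer.
missing_per_column = sample.isna().sum()
any_missing = bool(sample.isna().any().any())

print(missing_per_column)
print(any_missing)
```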

Prompt 9: Drop rows with missing values

In [None]:
response = pandas_ai(df, "Drop the rows with missing values with inplace=True and return True when done, else False")
print(response)
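
In plain pandas, `dropna(inplace=True)` modifies the frame and returns `None`, which is why the prompt has to ask for an explicit `True`/`False`. The sketch below uses the non-mutating form instead, on a hypothetical three-row frame, so the effect is easy to inspect.

```python
import pandas as pd

# Last three rows of the notebook's data.
sample = pd.DataFrame({
    "country": ["Amritsar", "Agra", "Kola"],
    "annual tax collected": [43807565410, 146318441864, float("nan")],
    "happiness_index": [6.87, 3.36, float("nan")],
})

# dropna() returns a new frame and leaves the original untouched.
cleaned = sample.dropna()
rows_dropped = len(sample) - len(cleaned)
print(rows_dropped)
```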

Prompt 10: Print all column names

In [None]:

response = pandas_ai(df, "List all the column names")
print(response)

Prompt 11: Rename a column

In [None]:
response = pandas_ai(df, "Rename column 'country' as 'Country' keep inplace=True and list all column names")
print(response)
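
The equivalent pandas call is `DataFrame.rename`. A minimal sketch on a hypothetical one-row frame, using the non-mutating form so the original column names stay available for comparison:

```python
import pandas as pd

# Hypothetical one-row stand-in with the notebook's original column names.
sample = pd.DataFrame({"country": ["Delhi"], "happiness_index": [9.94]})

# rename() maps old column names to new ones and returns a new frame.
renamed = sample.rename(columns={"country": "Country"})
print(list(renamed.columns))  # ['Country', 'happiness_index']
```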

Prompt 12: Add a row at the end of the dataframe

In [None]:
response = pandas_ai(df, "Add the list: ['Noida',None,None] at the end of the dataframe as last row keep inplace=True")
print(response)
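
Note that `DataFrame.append` was removed in pandas 2.0, the version installed above, so generated code that relies on it will fail. One way to add a row by hand is `pd.concat`, sketched here on a hypothetical two-row frame; columns missing from the new row are filled with NaN.

```python
import pandas as pd

# Hypothetical stand-in for the notebook's dataframe.
sample = pd.DataFrame({
    "country": ["Amritsar", "Agra"],
    "annual tax collected": [43807565410, 146318441864],
    "happiness_index": [6.87, 3.36],
})

# The new row only sets "country"; the other columns become NaN.
new_row = pd.DataFrame([{"country": "Noida"}])
extended = pd.concat([sample, new_row], ignore_index=True)
print(extended.tail(1))
```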

Prompt 13: Replace the missing values

In [None]:
response = pandas_ai(df, """Fill the NULL values in dataframe with 0 keep inplace=True
and then print the last row of dataframe""")
print(response)
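
For reference, the same replacement in plain pandas is a single `fillna` call. The sketch assumes a frame whose last row ("Noida") has no numeric values, as it would after the previous prompt.

```python
import pandas as pd

# Hypothetical stand-in: the last row has no numeric values.
sample = pd.DataFrame({
    "country": ["Agra", "Noida"],
    "annual tax collected": [146318441864, float("nan")],
    "happiness_index": [3.36, float("nan")],
})

# fillna(0) returns a new frame with every missing value replaced by 0.
filled = sample.fillna(0)
last_row = filled.iloc[-1]
print(last_row)
```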

Prompt 14: Calculating mean of a column

In [None]:
response = pandas_ai(df, "What is the mean of annual tax collected")
print(response)
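
The reported mean depends on whether the earlier in-place steps actually modified `df`: `Series.mean()` skips NaN by default, but rows filled with 0 are counted. The sketch below shows both variants on the notebook's original column values (the names `mean_skipping_nan` and `mean_with_zero` are illustrative).

```python
import pandas as pd

# Original values of the "annual tax collected" column, including the NaN.
tax = pd.Series(
    [
        19294482072, 28916155672, 24112550372, 34358173362, 17454337886,
        11812051350, 16074023894, 14909678554, 43807565410, 146318441864,
        float("nan"),
    ],
    name="annual tax collected",
)

mean_skipping_nan = tax.mean()          # NaN is ignored (10 values)
mean_with_zero = tax.fillna(0).mean()   # NaN counted as 0 (11 values)
print(mean_skipping_nan, mean_with_zero)
```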

Prompt 15: Finding frequency of unique values of a column

In [None]:
response = pandas_ai(df, "What are the value counts for the column 'Country'")
print(response)

Prompt 16: Dataframe Slicing

In [None]:

response = pandas_ai(df, "Show first 3 rows of columns 'Country' and 'happiness index'")
print(response)
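
In plain pandas the column names must match exactly. The original dataframe uses `country` and `happiness_index` (with an underscore), while the prompt relies on the LLM to map the quoted names. A minimal sketch, assuming the original column names are still in place:

```python
import pandas as pd

# First four rows of the notebook's data.
sample = pd.DataFrame({
    "country": ["Delhi", "Mumbai", "Kolkata", "Chennai"],
    "annual tax collected": [19294482072, 28916155672, 24112550372, 34358173362],
    "happiness_index": [9.94, 7.16, 6.35, 8.07],
})

# Select two columns by name, then keep the first three rows.
first_three = sample[["country", "happiness_index"]].head(3)
print(first_three)
```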

Prompt 17: Using pandas where function

In [None]:
response = pandas_ai(df, "Show the data in the row where 'Country'='Mumbai'")
print(response)

Prompt 18: Using pandas where function with a range of values

In [None]:

response = pandas_ai(df, "Show the rows where 'happiness index' is between 3 and 6")
print(response)
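
Row filters like the two above are usually written in pandas as boolean masks. The sketch below reproduces both on the notebook's original data, assuming the original column names; `Series.between` includes both endpoints by default and treats NaN as not matching.

```python
import pandas as pd

# "country" and "happiness_index" columns from the notebook's data.
sample = pd.DataFrame({
    "country": [
        "Delhi", "Mumbai", "Kolkata", "Chennai", "Jaipur", "Lucknow",
        "Pune", "Bengaluru", "Amritsar", "Agra", "Kola",
    ],
    "happiness_index": [
        9.94, 7.16, 6.35, 8.07, 6.98, 6.1, 4.23, 8.22, 6.87, 3.36, float("nan"),
    ],
})

# Equality filter: rows where the city is Mumbai.
mumbai_rows = sample[sample["country"] == "Mumbai"]

# Range filter: rows with a happiness index from 3 to 6, inclusive.
in_range = sample[sample["happiness_index"].between(3, 6)]
print(in_range)
```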

Prompt 19: Finding 25th percentile of a column of continuous values

In [None]:

response = pandas_ai(df, "What is the 25th percentile value of 'happiness index'")
print(response)
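
To verify the percentile by hand, `Series.quantile` can be used. It skips NaN and interpolates linearly between neighbouring values by default. The sketch uses the notebook's original column values; if earlier prompts filled missing values with 0, the result on the modified dataframe would differ.

```python
import pandas as pd

# Original values of the "happiness_index" column, including the NaN.
happiness = pd.Series(
    [9.94, 7.16, 6.35, 8.07, 6.98, 6.1, 4.23, 8.22, 6.87, 3.36, float("nan")],
    name="happiness_index",
)

# 25th percentile with pandas' default linear interpolation.
q25 = happiness.quantile(0.25)
print(q25)
```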