Since the launch of ChatGPT, many routine tasks have been automated. PandasAI is one tool that uses ChatGPT to ease data manipulation in Python: it generates Python code from a text prompt, executes it, and returns the output of that code. This lets you perform tasks involving the pandas library without explicitly writing the code yourself. In this Jupyter notebook, we discuss how to use PandasAI to simplify data manipulation.

What is PandasAI?

PandasAI is an add-on to the pandas library that uses generative AI models from OpenAI. With just a text prompt, you can produce insights from your dataframe. It relies on <b>OpenAI's text-to-query generative AI</b>.
    
Preparing data for analysis is a labor-intensive process for data scientists and analysts. PandasAI cuts down the time needed for data preparation, so they can get on with the analysis itself while still applying the methods and techniques they already know. PandasAI should be used in conjunction with pandas, not as a substitute for it. Instead of manually exploring the dataset to answer questions about it, you can put those questions to PandasAI, and it will give you answers in the form of pandas DataFrames. The aim is to let you converse with the machine in natural language and receive the desired results, rather than having to program the work yourself. To do this, it uses the OpenAI GPT API to generate pandas code in Python and runs that code in the background. The results are then returned and can be saved in a variable.
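The generate-then-execute loop described above can be illustrated with a small sketch. This is not PandasAI's actual implementation; `ask_data`, the prompt wording, and the `result` variable convention are assumptions made for illustration, and a stub function stands in for the language model so the example runs offline. A plain dict stands in for the DataFrame.

```python
from typing import Any, Callable


def ask_data(df: Any, question: str, llm: Callable[[str], str]) -> Any:
    """Ask `llm` for code that answers `question` about `df`, run it, return `result`.

    Hypothetical helper: it only mirrors the idea of generating code and
    executing it in the background.
    """
    prompt = (
        "You are given a dataset in a variable named `df`.\n"
        f"Write Python code that answers: {question}\n"
        "Store the answer in a variable named `result`. Return only code."
    )
    code = llm(prompt)
    namespace = {"df": df}
    # WARNING: this executes model-generated code. A real tool should
    # validate or sandbox the code before running it.
    exec(code, namespace)
    return namespace.get("result")


def fake_llm(prompt: str) -> str:
    """Stub standing in for a real model call; always returns the same code."""
    return "result = max(df['sales'])"


answer = ask_data({"sales": [120, 340, 90]}, "What is the highest sales value?", fake_llm)
print(answer)  # 340
```

With a real model in place of `fake_llm`, the reply may include explanatory text or Markdown fences around the code, which would need to be stripped before execution.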

How can I use PandasAI in my projects?
1. Install and import the PandasAI library in your Python environment
Run `%pip install pandasai` in a Jupyter notebook cell to install the library from PyPI. The cells below call the `openai` package directly, so it must be installed as well.

In [1]:
import openai

In [2]:
import os

# Read the key from the OPENAI_API_KEY environment variable instead of hard-coding it in the notebook.
openai.api_key = os.environ["OPENAI_API_KEY"]


In [3]:
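def chat(message):
    """Send a single user message to gpt-3.5-turbo and return the reply text."""
    # Uses the legacy ChatCompletion interface (openai<1.0).
    response = openai.ChatCompletion.create(
        model="gpt-3.5-turbo",
        messages=[
            {"role": "user", "content": message},
        ],
    )
    return response["choices"][0]["message"]["content"]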
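The helper above relies on `openai.ChatCompletion`, which was removed in version 1.0 of the `openai` package. If you are on a newer release, a roughly equivalent helper might look like the sketch below. The name `chat_v1` and the optional `client` parameter are assumptions added here so the function can be exercised with a stand-in client; they are not part of the original notebook.

```python
def chat_v1(message, client=None, model="gpt-3.5-turbo"):
    """Send one user message and return the reply text (openai>=1.0 style)."""
    if client is None:
        # Imported lazily so the function can be tested with a fake client.
        from openai import OpenAI

        # OpenAI() reads the API key from the OPENAI_API_KEY environment variable.
        client = OpenAI()
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": message}],
    )
    return response.choices[0].message.content
```

Calling `chat_v1('letter')` should behave like `chat('letter')` in the next cell, assuming the key is set in the environment.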


In [4]:
res = chat('letter')
print(res)

Subject: Request for Information on Upcoming Conference

Dear [Conference Organizer's Name],

I hope this letter finds you well. I am writing to inquire about an upcoming conference that has been generating a lot of buzz in our industry. As an enthusiastic professional eager to expand my knowledge and network, I am keen to learn more about this event and explore the possibilities it offers.

Firstly, I would appreciate if you could provide me with information regarding the conference's theme, objectives, and target audience. Understanding the core focus of the event will help me evaluate its relevance to my current professional goals. Additionally, I would like to know if there are any industry experts or renowned speakers who have been confirmed as presenters or panelists. This knowledge will assist me in gauging the expertise and quality of the discussions and sessions.

Furthermore, could you kindly supply the dates, location, and duration of the conference? Considering my work comm

Prompts to Gather/Generate Datasets for Machine Learning

# Prompt 1:
Create a list of datasets that can be used to train {topic} models. Ensure that the datasets are available in CSV format. The objective is to use this dataset to learn about {topic}. Also, provide links to the dataset if possible. Create the list in tabular form with the following columns: dataset name, dataset URL, dataset description
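The `{topic}` placeholder in these prompts can be filled in programmatically rather than by hand. The sketch below uses Python's built-in `str.format`; the names `DATASET_LIST_TEMPLATE` and `build_prompt` are hypothetical and introduced only for illustration.

```python
# Prompt 1 as a reusable template; {topic} is substituted at call time.
DATASET_LIST_TEMPLATE = (
    "Create a list of datasets that can be used to train {topic} models. "
    "Ensure that the datasets are available in CSV format. "
    "The objective is to use this dataset to learn about {topic}. "
    "Also, provide links to the dataset if possible. "
    "Create the list in tabular form with the following columns: "
    "dataset name, dataset URL, dataset description"
)


def build_prompt(template: str, **fields: str) -> str:
    """Fill the named placeholders in `template`; raises KeyError if one is missing."""
    return template.format(**fields)


prompt = build_prompt(DATASET_LIST_TEMPLATE, topic="logistic regression")
print(prompt)
```

The resulting string can be passed to `chat` in the same way as the hand-written prompt in the next cell.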

In [9]:
prompt ='''
Create a list of datasets that can be used to train logistic regression models.
Ensure that the datasets are available in CSV format.
The objective is to use this dataset to learn about logistic regression models
and related nuances such as training the models. Also provide links to the dataset if possible.
Create the list in tabular form with the following columns:
dataset name, dataset URL, dataset description
'''
res = chat(prompt)
print(res)


Dataset Name | Dataset URL | Dataset Description
--- | --- | --- 
Titanic: Machine Learning from Disaster | https://www.kaggle.com/c/titanic/data | Contains information about passengers aboard the RMS Titanic such as age, sex, passenger class, and survival status. The objective is to predict whether a passenger survived or not.
Bank Marketing Dataset | https://archive.ics.uci.edu/ml/datasets/Bank+Marketing | This dataset is related to a direct marketing campaign of a Portuguese banking institution. It includes attributes like age, job, marital status, and previous marketing outcomes. The goal is to predict whether a client will subscribe to a term deposit or not.
Breast Cancer Wisconsin (Diagnostic) | https://archive.ics.uci.edu/ml/datasets/Breast+Cancer+Wisconsin+(Diagnostic) | This dataset contains features computed from digitized images of fine needle aspirates (FNA) of breast masses. The task is to predict whether a mass is benign or malignant.
Australian Credit Approval | https://

# Prompt 2:
Generate a dummy dataset to train and test a {machine learning model name} for educational purposes.

In [10]:
res = chat('generate a dummy dataset to train and test a logistic regression model '
           'for educational purposes. Ensure that the dataset is available in csv format')
print(res)


Sure! Here's an example of a dummy dataset that you can use to train and test a logistic regression model. It consists of two numerical features and a binary target variable.

```plaintext
feature1,feature2,target
1.2,3.4,1
4.5,6.7,0
2.1,5.6,1
5.5,7.8,0
3.6,4.5,0
6.7,8.9,1
4.3,2.1,1
2.8,6.3,0
```

You can copy this data into a text editor and save it as a CSV file named "dummy_dataset.csv".
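Instead of copying the data by hand, the CSV text can be pulled out of the reply and written to disk in code. The sketch below assumes the reply contains at most one fenced block holding the CSV, as in the output above; `save_csv_from_response` is a hypothetical helper name, and it uses only the standard library.

```python
import csv
import re

# Matches the first fenced block in a reply. The pattern uses `{3} so the
# fence characters never have to be written literally in this cell.
FENCE = re.compile(r"`{3}[^\n]*\n(.*?)`{3}", re.DOTALL)


def save_csv_from_response(response: str, path: str) -> list:
    """Write the CSV found in `response` to `path` and return its rows as dicts."""
    match = FENCE.search(response)
    # Fall back to the whole reply if the model did not fence the data.
    csv_text = (match.group(1) if match else response).strip() + "\n"
    with open(path, "w", newline="", encoding="utf-8") as f:
        f.write(csv_text)
    # Read the file back to confirm it parses as CSV.
    with open(path, newline="", encoding="utf-8") as f:
        return list(csv.DictReader(f))
```

For the reply shown above, `save_csv_from_response(res, "dummy_dataset.csv")` would write the file and return one dict per data row, with all values as strings.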


# Prompt 3:
List datasets to practice {topic} and, if possible, also attach dataset links and descriptions. Create the list in tabular format.

In [11]:
prompt ='''
List down datasets to practice object detection,
if possible also attach dataset links and description.
Create the list in tabular format
'''
res = chat(prompt)
print(res)


| Dataset Name                    | Dataset Link                                                                                                          | Description                                                                                                                         |
|-----------------------------|-----------------------------------------------------------------------------------------------------------------|------------------------------------------------------------------------------------------------------------------------------------|
| COCO (Common Objects in Context) | [Link](http://cocodataset.org/#home)                                                                              | COCO is a large-scale object detection, segmentation, and captioning dataset with over 200,000 labeled images and 80 object categories.                                                                                                         |
| Pascal VOC Dataset                

# Prompt 4:
Create a list of datasets for practicing on {topic}. Make sure they are available in CSV format. Also, provide links to the dataset.



In [12]:
prompt ="""
Create a list of datasets for practicing on machine translation from english to hindi.
Make sure they are available in text format.
Also, provide links to the dataset.
"""
res = chat(prompt)
print(res)


1. Hindi-English Parallel Corpora: This dataset contains parallel sentences in English and Hindi languages. It consists of multiple domains such as News, Medical, Tourism, and more. The dataset is available in text format and can be found at: https://huggingface.co/datasets/cc_mt

2. IIT Bombay English-Hindi Parallel Corpus: This dataset is provided by the Center for Indian Language Technology at IIT Bombay. It contains approximately 1.5 million sentence pairs in English and Hindi. The dataset is available in text format and can be found at: http://www.cfilt.iitb.ac.in/iitb_parallel/ 

3. Indic NLP Library Parallel Corpora: This repository contains parallel corpora for multiple Indian languages, including English to Hindi translation datasets. It includes different domains such as movies, law, science, and more. The dataset is available in text format and can be accessed at: https://indicnlp.ai4bharat.org/

4. Common Crawl Corpus: Common Crawl dataset provides a large-scale web crawl c

In [13]:
prompt ="""
Create a list of food image datasets for binary image classification.
Also, provide links to the dataset.
"""
res = chat(prompt)
print(res)

1. Food-101 Dataset: A large-scale dataset with 101 food categories for image recognition, containing 101,000 images. Each class consists of 1,000 images. [Link](https://www.vision.ee.ethz.ch/datasets_extra/food-101/)

2. UEC Food 256 Dataset: A dataset with 256 food categories for recognition, comprising of 28,673 images. [Link](http://foodcam.mobi/dataset256.html)

3. Food-50 Dataset: A dataset with 50 food categories, containing 25,000 images. Each class consists of 500 images. [Link](https://data.vision.ee.ethz.ch/cvl/datasets_extra/food-50/)

4. Stanford SNAP Food Dataset: A dataset with 11 food categories and a total of 310,000 images. [Link](https://snap.stanford.edu/data/vw-food-image-recognition.html)

5. ETHZ Food-101N Dataset: A variant of the Food-101 dataset, but with noisy annotations provided by the crowd. Contains 108,175 images in 101 categories. [Link](https://data.vision.ee.ethz.ch/cvl/datasets_extra/food-101-n/)

6. Chinese Food Dataset: A comprehensive dataset with

PandasAI on PyPI: https://pypi.org/project/pandasai/