## **Play Store Apps Analysis**
Imagine you're diving into a big dataset from the Google Play Store. You'll be using Python to explore it and find interesting insights.

First, you'll clean the data with Python, fixing errors and inconsistencies so it is ready for analysis.

Once the data is clean, SQL can be used to find hidden patterns and correlations in it. In this project, however, we focus only on the cleaning tasks.

Visualizations will also be created to make it easier to understand the data. Charts and graphs will show patterns and trends.

By the end, you'll have learned a lot about how to make smart decisions using data. This project helps everyone understand how to use data to make better choices in our digital world.

Join this adventure as you explore the Google Play Store dataset. Get ready to find out all sorts of interesting things about your favorite apps!

### **Install the required libraries**
**langchain**: An open source framework for building applications based on large language models (LLMs).

**langchain_experimental**: A companion package to langchain that holds experimental features, including the pandas DataFrame agent used in this project.

**openai**: The official Python client library for the OpenAI API. It gives access to OpenAI's language models for tasks such as text generation, text classification, and language translation.

In [None]:
!pip install langchain
!pip install langchain_experimental
!pip install openai

### Task 1: Data Preparation: Importing Libraries, Setting up API Key, and Reading Data
Before we start our analysis, let's make sure we have everything we need. First, we'll import the required libraries. Then, we'll set our OpenAI API key so the agent can call the model. Finally, we'll read the data from our CSV file into a DataFrame.




**To obtain an API key for OpenAI, follow these steps:**

* Log in to your OpenAI account on the [OpenAI login page](https://platform.openai.com/login?launch).

* Generate and retrieve your API key on the [API key generation page](https://platform.openai.com/api-keys).

Download the [Play Store datasets](https://drive.google.com/drive/folders/1l_TabQZpTRypVKwVcHoD98YSAyXZOu91?usp=sharing).



In [None]:
import os
import pandas as pd
from langchain_experimental.agents.agent_toolkits import create_pandas_dataframe_agent
from langchain.llms import OpenAI

# Set up the API key (replace the placeholder with your own key)
os.environ['OPENAI_API_KEY'] = "your-api-key"

# Load the dataset (playstore_apps.csv) and assign it to a variable 'df'.
df = pd.read_csv('playstore_apps.csv')

### Task 2: Display Top 5 Records.

Great! Our next step is to initialize our agent, which is backed by an OpenAI model. Once that's done, we'll create a prompt to display the first 5 records from our data table.

In [None]:
# Initializing the agent
agent = create_pandas_dataframe_agent(OpenAI(temperature=0),
              df, verbose=True)

agent("""Display top 5 records of df""")

### Task 3: Determine the Shape (Count Rows and Columns) of the Dataset.
Great! For our next step, let's create a prompt to gather information about the shape of our playstore app dataset. This will help us understand how many rows and columns it contains.

In [None]:
agent("What is the shape of the dataset df?")
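
As a cross-check on the agent's answers for Tasks 2 and 3, the same information can be read directly with pandas. The sketch below uses a small made-up DataFrame (`sample`) as a stand-in for `playstore_apps.csv`; its values are assumptions, not data from the real file.

```python
import pandas as pd

# Hypothetical stand-in for playstore_apps.csv; the values are made up.
sample = pd.DataFrame({
    "App": ["App A", "App B", "App C", "App D", "App E", "App F"],
    "Category": ["GAME", "TOOLS", "GAME", "FAMILY", "TOOLS", "GAME"],
    "Rating": [4.1, 3.9, 4.7, 4.0, 4.3, 3.5],
})

first_rows = sample.head()      # head() returns the first five rows by default
n_rows, n_cols = sample.shape   # shape is a (rows, columns) tuple

print(first_rows)
print(f"{n_rows} rows, {n_cols} columns")
```

Running `df.head()` and `df.shape` on the real DataFrame should agree with what the agent reports.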

### Task 4: Finding the Missing Data.
Wow! We've successfully found the shape of the dataset. Now, let's create a prompt to ask our agent for details about any missing values in the data table.

In [None]:
agent("How many missing values are there in each column in df?")
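
The per-column count the prompt asks for corresponds to a one-line pandas expression. Here is a minimal sketch on a made-up DataFrame (`sample` and its values are assumptions for illustration):

```python
import numpy as np
import pandas as pd

# Hypothetical data with a few gaps; not taken from the real file.
sample = pd.DataFrame({
    "App": ["App A", "App B", "App C", "App D"],
    "Rating": [4.1, np.nan, 4.7, np.nan],
    "Type": ["Free", "Paid", None, "Free"],
})

# isna() marks missing cells as True; sum() then counts them per column.
missing_per_column = sample.isna().sum()
print(missing_per_column)
```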

### Task 5: Identifying Duplicate Rows.
Fantastic! We've successfully found the missing values. Now, let's create a prompt to ask our agent to count the number of duplicate rows in the playstore app dataset.

In [None]:
agent("How many duplicate rows are there in df?")
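
Duplicates are a property of whole rows rather than of individual columns, so a single number is expected here. A minimal pandas sketch of the same count, on made-up data (`sample` is an assumption):

```python
import pandas as pd

# Hypothetical data: the row ("App A", "GAME") appears three times.
sample = pd.DataFrame({
    "App": ["App A", "App B", "App A", "App C", "App A"],
    "Category": ["GAME", "TOOLS", "GAME", "GAME", "GAME"],
})

# duplicated() flags every repeat of an earlier row; the first occurrence is not flagged.
duplicate_count = int(sample.duplicated().sum())
print(duplicate_count)
```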

### Task 6: Dropping Unnecessary Columns.
Fantastic! We successfully found the duplicates. Now, let's create a prompt to ask our agent to remove the Last Updated, Current Ver, Android Ver and Size columns from the playstore dataset.

In [None]:
agent("Remove the Last Updated, Current Ver, Android Ver and Size columns from df and update in place.")
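
It may be worth checking `df.columns` after the agent call, since whether the change reaches the notebook's `df` depends on the code the agent generates. For comparison, here is a minimal sketch of the same operation in plain pandas, using a made-up DataFrame (`sample` and its values are assumptions):

```python
import pandas as pd

# Hypothetical rows carrying the four columns named in the task.
sample = pd.DataFrame({
    "App": ["App A", "App B"],
    "Size": ["19M", "8.7M"],
    "Last Updated": ["January 7, 2018", "August 1, 2018"],
    "Current Ver": ["1.0.0", "2.3"],
    "Android Ver": ["4.0.3 and up", "4.4 and up"],
})

columns_to_drop = ["Last Updated", "Current Ver", "Android Ver", "Size"]
sample = sample.drop(columns=columns_to_drop)

print(list(sample.columns))
```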

### Task 7: Removing Duplicate Rows.
Great progress! We successfully dropped the columns. Now, let's create a prompt to ask our agent to remove duplicate rows from the playstore dataset.

In [None]:
agent("Please write code to remove duplicate rows from the DataFrame 'df' and update in place")
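
The code the agent is asked to write would typically be built on `drop_duplicates`. A minimal sketch on made-up data (`sample` is an assumption, not the real dataset):

```python
import pandas as pd

# Hypothetical data with one repeated row.
sample = pd.DataFrame({
    "App": ["App A", "App B", "App A", "App C"],
    "Category": ["GAME", "TOOLS", "GAME", "GAME"],
})

# Keep the first occurrence of each row and renumber the index.
deduplicated = sample.drop_duplicates().reset_index(drop=True)
print(len(sample), "->", len(deduplicated))
```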

### Task 8: Filling missing values in the DataFrame.
Good job! The duplicate rows have been dropped. Let's now build a prompt asking our agent to fill missing values in the Rating, Reviews, Price and Installs columns with each column's mean, fill missing values in the Type column with the string 'Free', and fill missing values in the Content Rating column with the string 'Everyone'.

In [None]:
# agent("""Fill missing values in the Rating column with their mean value.
# Fill missing values in the 'Reviews' column with their mean value.
# Fill missing values in the 'Price' column with their column mean() value and update in place.
# Fill missing values in the 'Installs' column with their column mean() value and update in place. After that,
# Fill missing values in the 'Type' column with the string 'Free' value and update in place.
# And, fill missing values in the 'Content Rating' column with the string 'Everyone' and update in place""")

In [None]:
agent("Fill missing values in the Rating column with their mean value")
agent("Fill missing values in the Reviews column with their mean value")
agent("Fill missing values in the Price column with their mean value")
agent("Fill missing values in the Installs column with their mean value")
agent("Fill missing values in the 'Type' column with the string 'Free' value")
agent("fill missing values in the 'Content Rating' column with the string 'Everyone'")
df.shape
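
The six fills can also be expressed directly in pandas. The sketch below assumes the four numeric columns are already stored as numbers; if the real file stores them as text, they would need converting first. The `sample` DataFrame and its values are made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical data with one gap per column.
sample = pd.DataFrame({
    "Rating": [4.0, np.nan, 5.0],
    "Reviews": [100.0, 300.0, np.nan],
    "Price": [0.0, np.nan, 2.0],
    "Installs": [1000.0, np.nan, 3000.0],
    "Type": ["Free", None, "Paid"],
    "Content Rating": ["Everyone", "Teen", None],
})

numeric_columns = ["Rating", "Reviews", "Price", "Installs"]
# mean() skips missing values, and fillna() matches the resulting means to columns by name.
sample[numeric_columns] = sample[numeric_columns].fillna(sample[numeric_columns].mean())

# Categorical columns get a fixed default string.
sample["Type"] = sample["Type"].fillna("Free")
sample["Content Rating"] = sample["Content Rating"].fillna("Everyone")

print(sample)
```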

### Task 9: Eliminating Data Anomalies.

Wow! We've successfully filled the missing values. Now, let's develop a prompt to tell our agent to remove the rows with Category equal to '1.9' from the DataFrame.

In [None]:
agent("Remove the rows where the Category column equals '1.9' from the DataFrame df and update in place")
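
In pandas terms this is a boolean filter on the Category column. A minimal sketch on made-up data (`sample` is an assumption):

```python
import pandas as pd

# Hypothetical data containing one anomalous Category value.
sample = pd.DataFrame({
    "App": ["App A", "App B", "App C"],
    "Category": ["GAME", "1.9", "TOOLS"],
})

# Keep only the rows whose Category is not the string '1.9'.
cleaned = sample[sample["Category"] != "1.9"].reset_index(drop=True)
print(cleaned)
```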

### Task 10: Check App Distribution.

Fantastic! We've successfully removed the anomalies. Now, let's construct a prompt that tells our agent to count occurrences of each unique value in the Type column.

In [None]:
agent("count occurrences of each unique value in the Type column")
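
The equivalent pandas call is `value_counts`. A minimal sketch on made-up data (`sample` is an assumption):

```python
import pandas as pd

# Hypothetical Type values.
sample = pd.DataFrame({"Type": ["Free", "Paid", "Free", "Free", "Paid"]})

# value_counts() returns one count per distinct value, most frequent first.
type_counts = sample["Type"].value_counts()
print(type_counts)
```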

### Task 11: Cleaning App Name.
Amazing! Let's now develop a prompt that asks our agent to find the rows in 'df' where the App column starts with '?' and remove them from the DataFrame.

In [None]:
agent("remove those rows where App column starts with '?' in df and update in place")
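
A minimal pandas sketch of the same filter, on made-up app names (`sample` is an assumption). Note that only names that begin with '?' are removed; a '?' elsewhere in the name is left alone.

```python
import pandas as pd

# Hypothetical app names; two of them begin with '?'.
sample = pd.DataFrame({
    "App": ["App A", "?? ???", "What? An App", "?App"],
})

# str.startswith does a literal prefix match, so '?' needs no escaping.
# na=False treats missing names as not matching.
starts_with_question = sample["App"].str.startswith("?", na=False)
cleaned = sample[~starts_with_question].reset_index(drop=True)
print(cleaned)
```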

In [None]:
# Export the cleaned DataFrame 'df' with the name 'cleaned_apps_db.csv' using the parameter index=True.
df.to_csv('cleaned_apps_db.csv', index=True)
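
Exporting with `index=True` writes the row index as an extra first column, which matters when the file is read back. The sketch below illustrates this with a made-up DataFrame and a temporary directory (both are assumptions for the example):

```python
import os
import tempfile

import pandas as pd

# Hypothetical cleaned data.
sample = pd.DataFrame({"App": ["App A", "App B"], "Rating": [4.1, 3.9]})

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "cleaned_apps_db.csv")
    sample.to_csv(path, index=True)

    # Read back naively: the saved index shows up as an 'Unnamed: 0' column.
    with_extra_column = pd.read_csv(path)
    # Read back with index_col=0: the first column becomes the index again.
    restored = pd.read_csv(path, index_col=0)

print(list(with_extra_column.columns))
print(list(restored.columns))
```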

### Task 12: Display Top 5 Records for playstore_reviews.

Great! Our next step is to initialize a second agent (OpenAI) for the reviews data. Once that's done, we'll create a prompt to display the first 5 records from our playstore_reviews table.

In [None]:
#Load the dataset (playstore_reviews.csv) and assign it to a variable 'df1'.
df1 = pd.read_csv('playstore_reviews.csv')
# Initializing the agent1.
agent1 = create_pandas_dataframe_agent(OpenAI(temperature=0),
              df1, verbose=True)

# write your prompt using the agent1

agent1('Display top 5 records of df1 dataframe')

### Task 13: Determine the Shape (Count Rows and Columns) of the Dataset.
Great! For our next step, let's create a prompt to gather information about the shape of our playstore_reviews dataset. This will help us understand how many rows and columns it contains.

In [None]:
agent1("What is the shape of the df1 dataset?")

### Task 14: Finding the Missing data.
Wow! We've successfully found out the shape of the dataset. Now, let's create a prompt to ask our agent to help us find details about any missing information in the playstore_reviews table.

In [None]:
agent1("How many missing values are there in each column in df1?")

### Task 15: Identifying Duplicate Rows.
Fantastic! We've successfully found the missing values. Now, let's create a prompt to ask our agent to count the number of duplicate rows in the playstore_reviews dataset.

In [None]:
agent1("Count the duplicate rows in the DataFrame df1.")

### Task 16: Removing Missing values.
Incredible! We successfully identified the duplicates. Now, let's create a prompt to ask our agent to drop rows with missing values from the playstore_reviews dataset.

In [None]:
agent1("Drop rows with missing values from the DataFrame df1 and update in place")

### Task 17: Removing Duplicate Rows.
Great progress! We successfully dropped the rows with missing values. Now, let's create a prompt to ask our agent to remove duplicate rows from the playstore_reviews dataset.

In [None]:
agent1("Please write code to remove duplicate rows from the DataFrame df1 and update in place")
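
Tasks 16 and 17 together amount to a `dropna` followed by a `drop_duplicates`. Here is a minimal sketch on a made-up reviews table; the column names `App` and `Review` and all values are assumptions, since the real columns of `playstore_reviews.csv` may differ.

```python
import numpy as np
import pandas as pd

# Hypothetical reviews: one row has a missing review, one row is repeated.
reviews = pd.DataFrame({
    "App": ["App A", "App A", "App B", "App B", "App C"],
    "Review": ["Great", "Great", np.nan, "Slow", "Okay"],
})

rows_before = len(reviews)

# Drop rows with any missing value, then drop repeated rows.
reviews = reviews.dropna().drop_duplicates().reset_index(drop=True)

print(rows_before, "->", len(reviews))
```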

In [None]:
# Export the cleaned DataFrame 'df1' with the name 'cleaned_reviews_db.csv' using the parameter index=True.
df1.to_csv('cleaned_reviews_db.csv', index=True)