## **Play Store Apps Analysis**
Imagine you're diving into a big dataset from the Google Play Store. You'll be using Python to explore it and find interesting insights.

First, we'll clean the data with Python. A LangChain agent backed by an OpenAI model turns plain-English prompts into pandas code, which we use to find missing values and duplicates, drop unneeded columns, and remove malformed rows.

Once the data is clean, SQL can be used to find hidden patterns and correlations in it. In this project, however, we only work on the cleaning tasks.

Visualizations such as charts and graphs can then make those patterns and trends easier to understand.

By the end, you'll have learned a lot about how to make smart decisions using data. This project helps everyone understand how to use data to make better choices in our digital world.

Join this adventure as you explore the Google Play Store dataset. Get ready to find out all sorts of interesting things about your favorite apps!

### **Install the required libraries**
**langchain**: An open-source framework for building applications based on large language models (LLMs).

**langchain_experimental**: An experimental companion package to langchain. This notebook uses its `create_pandas_dataframe_agent`, which lets an LLM answer questions about a pandas DataFrame by writing and running Python code.

**openai**: The official Python client library for the OpenAI API. LangChain's `OpenAI` wrapper relies on it to send prompts to OpenAI's models.

In [1]:
!pip install langchain
!pip install langchain_experimental
!pip install openai

Collecting langchain
  Downloading langchain-0.1.11-py3-none-any.whl (807 kB)
Collecting dataclasses-json<0.7,>=0.5.7 (from langchain)
  Downloading dataclasses_json-0.6.4-py3-none-any.whl (28 kB)
Collecting jsonpatch<2.0,>=1.33 (from langchain)
  Downloading jsonpatch-1.33-py2.py3-none-any.whl (12 kB)
Collecting langchain-community<0.1,>=0.0.25 (from langchain)
  Downloading langchain_community-0.0.27-py3-none-any.whl (1.8 MB)
Collecting langchain-core<0.2,>=0.1.29 (from langchain)
  Downloading langchain_core-0.1.30-py3-none-any.whl (256 kB)
Collecting langchain-text-splitters<0.1,>=0.0.1 (from langchain)
  Downloa

### Task 1: Data Preparation: Importing Libraries, Setting up API Key, and Reading Data
Before we start our analysis, let's make sure we have everything we need. First, we'll import the required libraries. Then, we'll set the OpenAI API key that the agent uses to call GPT-3.5. Finally, we'll read the data from the CSV file and get everything ready.




**To obtain an API key for OpenAI, follow these steps:**

* Log in to your OpenAI account on the [OpenAI login page](https://platform.openai.com/login?launch).

* Generate and copy your API key on the [API keys page](https://platform.openai.com/api-keys).



Download the [Play Store datasets](https://drive.google.com/drive/folders/1l_TabQZpTRypVKwVcHoD98YSAyXZOu91?usp=sharing).



In [2]:
import os
import pandas as pd
from langchain_experimental.agents.agent_toolkits import create_pandas_dataframe_agent
from langchain.llms import OpenAI

# Set the OpenAI API key (replace the placeholder with your own key)
os.environ['OPENAI_API_KEY'] = "YOUR_OPENAI_API_KEY"

# Load the dataset (playstore_apps.csv) and assign it to the variable 'df'
df = pd.read_csv('playstore_apps.csv')

### Task 2: Display Top 5 Records.

Great! Our next step is to initialize our agent (backed by OpenAI). Once that's done, we'll create a prompt to display the first 5 records of our data table.

In [3]:
# Initializing the agent
agent = create_pandas_dataframe_agent(OpenAI(temperature=0),
              df, verbose=True)

# write your prompt using the agent
agent("""Display Top 5 records from the df""")


> Entering new AgentExecutor chain...
Thought: I need to use the head() function to display the first 5 rows of the dataframe
Action: python_repl_ast
Action Input: print(df.head())
                                                 App        Category  Rating  \
0     Photo Editor & Candy Camera & Grid & ScrapBook  ART_AND_DESIGN     4.1   
1                                Coloring book moana  ART_AND_DESIGN     3.9   
2  U Launcher Lite – FREE Live Cool Themes, Hide ...  ART_AND_DESIGN     4.7   
3                              Sketch - Draw & Paint  ART_AND_DESIGN     4.5   
4              Pixel Draw - Number Art Coloring Book  ART_AND_DESIGN     4.3   

    Reviews  Size    Installs  Type  Price Content Rating  \
0     159.0   19M     10000.0  Free    0.0       Everyone   
1     967.0   14M    500000.0  Free    0.0       Everyone   
2   87510.0  8.7M   5000000.0  Free    0.0       Everyone   
3  215644.0   25M  50000000.0  Free    0.0           Te

{'input': 'Display Top 5 records from the df',
 'output': 'The top 5 records from the df have been displayed.'}

### Task 3: Determine the Shape (Count Rows and Columns) of the Dataset.
Great! For our next step, let's create a prompt to gather information about the shape of our playstore app dataset. This will help us understand how many rows and columns it contains.

In [4]:
# write your prompt using the agent
agent("""What is the shape of the df?""")



> Entering new AgentExecutor chain...
Thought: I need to use the shape attribute to get the number of rows and columns in the dataframe.
Action: python_repl_ast
Action Input: df.shape
(10841, 13)
10841 rows and 13 columns
Final Answer: The shape of the df is (10841, 13).

> Finished chain.


{'input': 'What is the shape of the df?',
 'output': 'The shape of the df is (10841, 13).'}

### Task 4: Finding the Missing data.
Wow! We've successfully found the shape of the dataset. Now, let's create a prompt that asks our agent to find any missing values in the data table.

In [5]:
# write your prompt using the agent
agent("""How many missing values are there in each column of the df?""")



> Entering new AgentExecutor chain...
Thought: I need to find a way to count the number of missing values in each column.
Action: python_repl_ast
Action Input: df.isnull().sum()
App                  0
Category             0
Rating            1474
Reviews              1
Size                 0
Installs             1
Type                 1
Price                1
Content Rating       1
Genres               0
Last Updated         1
Current Ver          8
Android Ver          3
dtype: int64
I now know the final answer
Final Answer: There are 1474 missing values in the "Rating" column, 1 missing value in the "Reviews", "Installs", "Type", "Price", "Content Rating", and "Last Updated" columns, 8 missing values in the "Current Ver" column, and 3 missing values in the "Android Ver" column.

> Finished chain.


{'input': 'How many missing values are there in each column of the df?',
 'output': 'There are 1474 missing values in the "Rating" column, 1 missing value in the "Reviews", "Installs", "Type", "Price", "Content Rating", and "Last Updated" columns, 8 missing values in the "Current Ver" column, and 3 missing values in the "Android Ver" column.'}
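Raw counts are easier to judge as a share of the rows: the 1474 missing ratings are about 13.6% of the 10841 apps, while the other gaps affect only a handful of rows. Here is a minimal sketch of computing that share per column with plain pandas. It uses a small made-up DataFrame, so the values are illustrative and do not come from the dataset:

```python
import numpy as np
import pandas as pd

# Small illustrative frame; column names mirror the Play Store data,
# the values are made up.
toy = pd.DataFrame({
    "App": ["A", "B", "C", "D"],
    "Rating": [4.1, np.nan, 4.7, np.nan],
    "Type": ["Free", "Free", None, "Paid"],
})

# isnull() yields a boolean frame; the mean of each boolean column is the
# fraction of missing values, scaled here to a percentage.
missing_pct = toy.isnull().mean().mul(100).round(1)
print(missing_pct)
```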

### Task 5: Identifying Duplicate Rows.
Fantastic! We've successfully found the missing values. Now, let's create a prompt to ask our agent to count the number of duplicate rows in the playstore app dataset.

In [6]:
# write your prompt using the agent
agent("""Give me the number of duplicate rows in the df.""")



> Entering new AgentExecutor chain...
Thought: I need to find a way to identify duplicate rows in the dataframe.
Action: python_repl_ast
Action Input: df.duplicated()
0        False
1        False
2        False
3        False
4        False
         ...  
10836    False
10837    False
10838    False
10839    False
10840    False
Length: 10841, dtype: bool
This output shows that there are no duplicate rows in the dataframe.
Action: python_repl_ast
Action Input: df.duplicated().sum()
483
483 duplicate rows were identified in the dataframe.
Final Answer: 483

> Finished chain.


{'input': 'Give me the number of duplicate rows in the df.', 'output': '483'}
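The figure 483 counts only the *extra* copies. `DataFrame.duplicated()` defaults to `keep='first'`, so the first occurrence of each repeated row is not flagged, and the count equals the number of rows `drop_duplicates()` would remove on the same columns. Here is a small illustration on made-up data:

```python
import pandas as pd

toy = pd.DataFrame({
    "App": ["A", "A", "A", "B"],
    "Category": ["GAME", "GAME", "GAME", "TOOLS"],
})

# Default keep="first": the first copy counts as the original and only later
# copies are flagged, so three identical rows contribute 2.
n_extra = toy.duplicated().sum()

# keep=False flags every row that has an identical twin, including the first.
n_involved = toy.duplicated(keep=False).sum()

print(n_extra, n_involved)
```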

### Task 6: Dropping Unnecessary Columns.
Fantastic! We successfully found the duplicates. Now, let's create a prompt that asks our agent to remove the 'Last Updated', 'Current Ver', 'Android Ver' and 'Size' columns from the Play Store dataset.

In [7]:
# write your prompt using the agent
agent("""Write a code to do the following:
 - Drop the columns 'Last Updated', 'Current Ver', 'Android Ver','Size' from the df inplace. Make sure that the df is updated inplace
 - Give me the shape of the df after the columns have been dropped
""")



> Entering new AgentExecutor chain...
Thought: I need to use the drop function from pandas and specify the columns I want to drop. I also need to set the inplace parameter to True to update the df inplace.

Action: python_repl_ast
Action Input: df.drop(columns=['Last Updated', 'Current Ver', 'Android Ver', 'Size'], inplace=True)
The columns have been dropped and the df has been updated inplace.
Action: python_repl_ast
Action Input: df.shape
(10841, 9)
I now know the final answer.
Final Answer: The shape of the df after dropping the specified columns is (10841, 9).

> Finished chain.


{'input': "Write a code to do the following:\n - Drop the columns 'Last Updated', 'Current Ver', 'Android Ver','Size' from the df inplace. Make sure that the df is updated inplace\n - Give me the shape of the df after the columns have been dropped\n",
 'output': 'The shape of the df after dropping the specified columns is (10841, 9).'}
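The prompts keep insisting on an in-place update for a reason. The agent's Python tool runs code in a namespace where `df` refers to the same DataFrame object as the notebook's `df`. An in-place call mutates that shared object, whereas `df = df.drop(...)` only rebinds the name inside the tool. The sketch below imitates this with plain `exec` and a dictionary namespace. `tool_locals` is a hypothetical stand-in that assumes the tool executes code roughly like `exec(code, globals, locals)` with `locals={'df': df}`; it is not LangChain's actual implementation.

```python
import pandas as pd

df = pd.DataFrame({"App": ["A", "B"], "Size": ["19M", "14M"]})

# Hypothetical stand-in for the agent tool's namespace: it holds a reference
# to the same DataFrame object as the notebook variable `df`.
tool_locals = {"df": df}

# Reassignment only rebinds the name inside tool_locals;
# the notebook's df still has its 'Size' column.
exec("df = df.drop(columns=['Size'])", {}, tool_locals)
after_reassign = list(df.columns)

# An in-place drop mutates the shared object, so the notebook sees it.
tool_locals = {"df": df}
exec("df.drop(columns=['Size'], inplace=True)", {}, tool_locals)
after_inplace = list(df.columns)

print(after_reassign)
print(after_inplace)
```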

### Task 7: Removing Duplicate Rows.
Great progress! We successfully dropped the columns. Now, let's create a prompt that asks our agent to remove duplicate rows from the Play Store dataset.

In [8]:
# write your prompt using the agent
agent(""""Write a code to do the following:
- Remove the duplicate rows in the df inplace while keeping the first occurrence of each duplicate rows. Make sure that the changes are made directly in the df using the appropriate command
- Give me the shape of the df after removing the duplicate rows
""")



> Entering new AgentExecutor chain...
Thought: I need to use the drop_duplicates() function to remove duplicate rows and set the inplace parameter to True to make changes directly in the df.
Action: python_repl_ast
Action Input: df.drop_duplicates(inplace=True)
I need to use the shape attribute to get the shape of the df.
Action: python_repl_ast
Action Input: df.shape
(10356, 9)
10356 rows and 9 columns are the shape of the df after removing duplicate rows.
Final Answer: (10356, 9)

> Finished chain.


{'input': '"Write a code to do the following:\n- Remove the duplicate rows in the df inplace while keeping the first occurrence of each duplicate rows. Make sure that the changes are made directly in the df using the appropriate command\n- Give me the shape of the df after removing the duplicate rows\n',
 'output': '(10356, 9)'}
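This step removed 485 rows (10841 - 10356), two more than the 483 duplicates counted in Task 5. The difference comes from the order of operations. Task 6 ran in between, and once 'Last Updated', 'Current Ver', 'Android Ver' and 'Size' are gone, rows that differed only in those columns become identical. Here is a toy illustration of the effect on made-up rows:

```python
import pandas as pd

toy = pd.DataFrame({
    "App": ["A", "A", "B"],
    "Rating": [4.1, 4.1, 3.9],
    "Current Ver": ["1.0", "1.1", "2.0"],  # the only column where rows 0 and 1 differ
})

before = toy.duplicated().sum()  # rows 0 and 1 differ in 'Current Ver'
trimmed = toy.drop(columns=["Current Ver"])
after = trimmed.duplicated().sum()  # rows 0 and 1 are now identical

print(before, after)
```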

### Task 8: Filling missing values in the DataFrame.
Good job! The duplicate rows have been dropped. Let's now build a prompt asking our agent to fill missing values in the 'Rating', 'Reviews', 'Price' and 'Installs' columns with each column's mean. Missing values in the 'Type' column should be filled with the string 'Free', and missing values in the 'Content Rating' column with the string 'Everyone'.

In [9]:
# write your prompt using the agent
agent("""Write a code to fill missing values in the DataFrame 'df' according to the following instructions:
1. Fill missing values in the 'Rating' column with the mean of the 'Rating' column. Make sure to update 'df' inplace.
2. Fill missing values in the 'Reviews' column with the mean of the 'Reviews' column. Make sure to update 'df' inplace.
3. Fill missing values in the 'Price' column with the mean of the 'Price' column. Make sure to update 'df' inplace.
4. Fill missing values in the 'Installs' column with the mean of the 'Installs' column. Make sure to update 'df' inplace.
5. Fill missing values in the 'Type' column with the string 'Free'. Make sure to update 'df' inplace.
6. Fill missing values in the 'Content Rating' column with the string 'Everyone'. Make sure to update 'df' inplace.
7. After filling missing values in the above columns, check the number of missing values in 'df' for each column.
""")



> Entering new AgentExecutor chain...
Thought: I need to use the fillna() function to fill in the missing values in the dataframe.
Action: python_repl_ast
Action Input: df.fillna()
ValueError: Must specify a fill 'value' or 'method'.
I need to specify a fill value or method for the fillna() function to work.
Action: python_repl_ast
Action Input: df.fillna(df.mean(), inplace=True)
I have now filled in the missing values with the mean of each column.
Action: python_repl_ast
Action Input: df.isnull().sum()
App               0
Category          0
Rating            0
Reviews           0
Installs          0
Type              1
Price             0
Content Rating    1
Genres            0
dtype: int64
There is still one missing value in the 'Type' column and one missing value in the 'Content Rating' column.
Action: python_repl_ast
Action Input: df['Type'].fillna('Free', inplace=True)
I have now filled in the missing value in the 'Type' column with the string 'Free'.
Action: python_repl_ast
Action Input: df['Content Rating'].fillna('Everyone', inplace=True)
I have now filled in the missing value in the 'Content Rating' column with the string 'Everyone'.
Action: python_repl_ast
Action Input: df.isnull().sum()
App               0
Category      

{'input': "Write a code to fill missing values in the DataFrame 'df' according to the following instructions:\n1. Fill missing values in the 'Rating' column with the mean of the 'Rating' column. Make sure to update 'df' inplace.\n2. Fill missing values in the 'Reviews' column with the mean of the 'Reviews' column. Make sure to update 'df' inplace.\n3. Fill missing values in the 'Price' column with the mean of the 'Price' column. Make sure to update 'df' inplace.\n4. Fill missing values in the 'Installs' column with the mean of the 'Installs' column. Make sure to update 'df' inplace.\n5. Fill missing values in the 'Type' column with the string 'Free'. Make sure to update 'df' inplace.\n6. Fill missing values in the 'Content Rating' column with the string 'Everyone'. Make sure to update 'df' inplace. \n7. After filling missing values in the above columns, check the number of missing values in 'df' for each column.\n",
 'output': "The final answer is that there are no more missing values 
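The agent did not follow the prompt step by step. It called `df.fillna(df.mean(), inplace=True)` on the whole frame and then filled 'Type' and 'Content Rating' with chained `inplace` calls. In pandas 2.x, `df.mean()` on a frame with text columns raises a `TypeError` unless `numeric_only=True` is passed; older versions silently dropped those columns, which is apparently what happened in this run. In recent versions, `df['Type'].fillna(..., inplace=True)` also triggers a chained-assignment warning and is not guaranteed to update `df` under copy-on-write. Below is a more explicit sketch of the same instructions in plain pandas. It passes a single column-to-value dictionary to `fillna` and reassigns the result instead of relying on `inplace`. The data is made up.

```python
import numpy as np
import pandas as pd

# Made-up rows with gaps in the columns named in the prompt.
df = pd.DataFrame({
    "Rating": [4.0, np.nan, 5.0],
    "Reviews": [100.0, 300.0, np.nan],
    "Price": [0.0, np.nan, 2.0],
    "Installs": [1000.0, np.nan, 3000.0],
    "Type": ["Free", None, "Paid"],
    "Content Rating": [None, "Teen", "Everyone"],
})

numeric_cols = ["Rating", "Reviews", "Price", "Installs"]

# Column -> fill value: means for the numeric columns (computed on those
# columns only), fixed strings for the two categorical ones.
fill_values = df[numeric_cols].mean().to_dict()
fill_values.update({"Type": "Free", "Content Rating": "Everyone"})

# Reassigning the result avoids chained inplace operations on a column.
df = df.fillna(value=fill_values)

print(df.isnull().sum())
```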

### Task 9: Eliminating Data Anomalies.

Wow! We've successfully filled the missing values. Now, let's write a prompt that tells our agent to remove the rows whose 'Category' value is '1.9' from the DataFrame. '1.9' is not a valid app category, so such a row signals malformed data.

In [10]:
# write your prompt using the agent
agent("""Write a code to remove all the rows that have 'Category' column value equal to '1.9'. Make sure to update the current df after removing the rows.
After removing all the rows, give the shape of the update df.
""")



> Entering new AgentExecutor chain...
Thought: I need to use the pandas dataframe method 'drop' to remove rows based on a condition.
Action: python_repl_ast
Action Input: df.drop(df[df['Category'] == '1.9'].index, inplace=True)
The rows with 'Category' column value equal to '1.9' have been removed.
Action: python_repl_ast
Action Input: df.shape
(10355, 9)
I now know the final answer.
Final Answer: The shape of the updated dataframe is (10355, 9).

> Finished chain.


{'input': "Write a code to remove all the rows that have 'Category' column value equal to '1.9'. Make sure to update the current df after removing the rows.\nAfter removing all the rows, give the shape of the update df.\n",
 'output': 'The shape of the updated dataframe is (10355, 9).'}
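The agent removed the row by index label with `df.drop(df[mask].index)`. That works here because the index labels are still unique, and the shape confirms that a single row was dropped (from 10356 to 10355). With duplicate index labels, however, dropping by label removes every row that shares the label. Selecting rows with a boolean mask avoids the issue. Here is a small sketch with a deliberately duplicated index:

```python
import pandas as pd

# Index label 1 is deliberately used twice.
toy = pd.DataFrame({"Category": ["GAME", "1.9", "TOOLS"]}, index=[0, 1, 1])

# Dropping by label removes *both* rows labelled 1, including the valid 'TOOLS' row.
by_label = toy.drop(toy[toy["Category"] == "1.9"].index)

# A boolean mask keeps exactly the rows that do not match.
by_mask = toy[toy["Category"] != "1.9"]

print(by_label["Category"].tolist())
print(by_mask["Category"].tolist())
```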

### Task 10: Check App Distribution.

Fantastic! We've successfully removed the anomalies. Now, let's write a prompt that tells our agent to count the occurrences of each unique value in the 'Type' column.

In [11]:
# write your prompt using the agent
agent("""Give me the count of occurrences of each unique value in the 'Type' column in the 'df'.""")



> Entering new AgentExecutor chain...
Thought: I need to use the value_counts() function to get the count of each unique value in the 'Type' column.
Action: python_repl_ast
Action Input: df['Type'].value_counts()
Free    9590
Paid     765
Name: Type, dtype: int64
I now know the final answer
Final Answer: Free    9590
Paid     765
Name: Type, dtype: int64

> Finished chain.


{'input': "Give me the count of occurrences of each unique value in the 'Type' column in the 'df'.",
 'output': 'Free    9590\nPaid     765\nName: Type, dtype: int64'}
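Proportions are often easier to read than raw counts. With 9590 free and 765 paid apps out of 10355, roughly 92.6% of the apps are free. `value_counts(normalize=True)` returns those shares directly, and `dropna=False` would also show any missing values that remain. Here is a sketch on a made-up column:

```python
import pandas as pd

types = pd.Series(["Free", "Free", "Free", "Paid"], name="Type")

# normalize=True turns counts into fractions of the total.
shares = types.value_counts(normalize=True)

# dropna=False adds a NaN bucket to the output if any values are missing.
counts = types.value_counts(dropna=False)

print(shares)
print(counts)
```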

### Task 11: Cleaning App Name.
Amazing! Let's now write a prompt that tells our agent to find the rows of `df` whose 'App' value starts with '?' and remove them from the DataFrame.

In [12]:
# write your prompt using the agent
agent("""Write a code to remove the rows from the 'df' where the value in the 'App' column starts with '?'. Make sure to update the current df after removing the rows.
And give me the shape of the updated df.
""")



> Entering new AgentExecutor chain...
Thought: I need to use the pandas dataframe method 'drop' to remove rows based on a condition.
Action: python_repl_ast
Action Input: df.drop(df[df['App'].str.startswith('?')].index, inplace=True)
I need to use the pandas dataframe attribute 'shape' to get the shape of the updated dataframe.
Action: python_repl_ast
Action Input: df.shape
(10325, 9)
I now know the final answer
Final Answer: The shape of the updated dataframe is (10325, 9).

> Finished chain.


{'input': "Write a code to remove the rows from the 'df' where the value in the 'App' column starts with '?'. Make sure to update the current df after removing the rows.\nAnd give me the shape of the updated df. \n",
 'output': 'The shape of the updated dataframe is (10325, 9).'}
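`str.startswith` returns NaN for missing values, and a mask containing NaN cannot be used to select rows. 'App' has no missing values here, so the agent's call was safe. Passing `na=False` makes the filter robust anyway. Here is a sketch with a boolean mask on made-up names:

```python
import numpy as np
import pandas as pd

toy = pd.DataFrame({"App": ["?????", "Sketch - Draw & Paint", np.nan, "?Quiz"]})

# na=False treats a missing name as "does not start with '?'" instead of NaN.
starts_with_q = toy["App"].str.startswith("?", na=False)

# Keep only the rows whose name does not start with '?'.
cleaned = toy[~starts_with_q]

print(len(cleaned))
```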

In [13]:
# export the cleaned dataframe 'df' with the name 'cleaned_apps_db.csv' using the parameter index=True.
df.to_csv('cleaned_apps_db.csv', index = True)
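With `index=True`, the DataFrame index is written as an unnamed first column. Rows have been dropped, so that index now has gaps. Reading the file back without `index_col=0` would add the index as an extra 'Unnamed: 0' column. Here is a sketch of the round trip that uses an in-memory buffer instead of a file:

```python
import io

import pandas as pd

# Made-up frame whose index has gaps, as it does after dropping rows.
toy = pd.DataFrame({"App": ["A", "C"], "Rating": [4.1, 3.9]}, index=[0, 2])

buffer = io.StringIO()
toy.to_csv(buffer, index=True)

# index_col=0 turns the written index column back into the index.
buffer.seek(0)
restored = pd.read_csv(buffer, index_col=0)

# Without index_col, the index comes back as an ordinary column.
buffer.seek(0)
naive = pd.read_csv(buffer)

print(restored.index.tolist())
print(naive.columns.tolist())
```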

### Task 12: Display Top 5 Records for playstore_reviews.

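Great! Our next step is to load the playstore_reviews data and initialize another agent (OpenAI). Once that's done, we'll create a prompt to display the first 5 records of the playstore_reviews table. Inside the agent's Python tool, the DataFrame is always named `df`. The agent's actions below therefore refer to `df` even though the notebook variable is `df1`. Both names point to the same object, so in-place changes made by the agent also update `df1`.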

In [14]:
#Load the dataset (playstore_reviews.csv) and assign it to a variable 'df1'.
df1 = pd.read_csv('playstore_reviews.csv')
# Initializing the agent1.
agent1 = create_pandas_dataframe_agent(OpenAI(temperature=0),
              df1, verbose=True)

# write your prompt using the agent1
agent1("""Display Top 5 rows from 'df1'.""")



> Entering new AgentExecutor chain...
Thought: I need to use the head() function to display the top 5 rows.
Action: python_repl_ast
Action Input: print(df.head())
                     App                                  Translated_Review  \
0  10 Best Foods for You  I like eat delicious food. That's I'm cooking ...   
1  10 Best Foods for You    This help eating healthy exercise regular basis   
2  10 Best Foods for You                                                NaN   
3  10 Best Foods for You         Works great especially going grocery store   
4  10 Best Foods for You                                       Best idea us   

  Sentiment  Sentiment_Polarity  Sentiment_Subjectivity  
0  Positive                1.00                0.533333  
1  Positive                0.25                0.288462  
2       NaN                 NaN                     NaN  
3  Positive                0.40                0.875000  
4  Positive                1.00  

{'input': "Display Top 5 rows from 'df1'.",
 'output': "The top 5 rows from 'df1' are displayed using the head() function."}

### Task 13: Determine the Shape (Count Rows and Columns) of the Dataset.
Great! For our next step, let's create a prompt to gather information about the shape of our playstore_reviews dataset. This will help us understand how many rows and columns it contains.

In [15]:
# write your prompt using the agent1
agent1("""What is the shape of df1?""")



> Entering new AgentExecutor chain...
Thought: I need to use the shape attribute to get the shape of the dataframe.
Action: python_repl_ast
Action Input: df.shape
(64295, 5)
I now know the final answer
Final Answer: The shape of df1 is (64295, 5).

> Finished chain.


{'input': 'What is the shape of df1?',
 'output': 'The shape of df1 is (64295, 5).'}

### Task 14: Finding the Missing data.
Wow! We've successfully found out the shape of the dataset. Now, let's create a prompt to ask our agent to help us find details about any missing information in the playstore_reviews table.

In [16]:
# write your prompt using the agent1
agent1("""How many missing values are present in each column of df1?""")



> Entering new AgentExecutor chain...
Thought: I need to find a way to count the number of missing values in each column.
Action: python_repl_ast
Action Input: df.isnull().sum()
App                           0
Translated_Review         26868
Sentiment                 26863
Sentiment_Polarity        26863
Sentiment_Subjectivity    26863
dtype: int64
I now know the final answer
Final Answer: There are 26868 missing values in the Translated_Review column, and 26863 missing values in the Sentiment, Sentiment_Polarity, and Sentiment_Subjectivity columns. The App column has no missing values.

> Finished chain.


{'input': 'How many missing values are present in each column of df1?',
 'output': 'There are 26868 missing values in the Translated_Review column, and 26863 missing values in the Sentiment, Sentiment_Polarity, and Sentiment_Subjectivity columns. The App column has no missing values.'}

### Task 15: Identifying Duplicate Rows.
Fantastic! We've successfully found the missing values. Now, let's create a prompt to ask our agent to count the number of duplicate rows in the playstore_reviews dataset.

In [17]:
# write your prompt using the agent1
agent1("""What is the number of duplicate rows in the df1? """)



> Entering new AgentExecutor chain...
Thought: I need to find the number of duplicate rows in the dataframe.
Action: python_repl_ast
Action Input: df.duplicated().sum()
33616
I now know the final answer.
Final Answer: 33616

> Finished chain.


{'input': 'What is the number of duplicate rows in the df1? ',
 'output': '33616'}
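This count is likely high partly because `duplicated()` treats missing values as equal to each other. Two rows for the same app whose review and sentiment fields are all NaN therefore count as duplicates. Here is a sketch on made-up rows:

```python
import numpy as np
import pandas as pd

toy = pd.DataFrame({
    "App": ["10 Best Foods for You"] * 3,
    "Translated_Review": ["Best idea us", np.nan, np.nan],
    "Sentiment": ["Positive", np.nan, np.nan],
})

# Rows 1 and 2 hold NaN in every non-App column; duplicated() treats those
# NaNs as equal, so row 2 is flagged as a duplicate of row 1.
flags = toy.duplicated().tolist()
print(flags)
```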

### Task 16: Removing Missing values.
Incredible! We successfully identified the duplicates. Now, let's create a prompt that asks our agent to drop rows with missing values from the playstore_reviews dataset.

In [18]:
# write your prompt using the agent1
agent1("""Drop the rows with the missing values in the df1. Make sure to drop the values inplace and update the df1 accordingly.
And give me the shape of the updated df1.
""")



> Entering new AgentExecutor chain...
Thought: I need to use the dropna() function to drop the rows with missing values.
Action: python_repl_ast
Action Input: df.dropna(inplace=True)
I need to use the shape attribute to get the shape of the updated dataframe.
Action: python_repl_ast
Action Input: df.shape
(37427, 5)
I now know the final answer
Final Answer: The shape of the updated dataframe is (37427, 5).

> Finished chain.


{'input': 'Drop the rows with the missing values in the df1. Make sure to drop the values inplace and update the df1 accordingly.\nAnd give me the shape of the updated df1.\n',
 'output': 'The shape of the updated dataframe is (37427, 5).'}
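With its default `how='any'`, `dropna()` removes every row that has at least one missing value. The number of dropped rows (64295 - 37427 = 26868) is therefore the count of rows with any missing field. That this equals the 26868 missing values in 'Translated_Review' suggests that every row missing a sentiment value is also missing its review text. Here is a sketch of checking that count before dropping, on made-up data:

```python
import numpy as np
import pandas as pd

toy = pd.DataFrame({
    "Translated_Review": ["Great app", np.nan, np.nan, "Works"],
    "Sentiment": ["Positive", np.nan, "Neutral", "Positive"],
})

# Rows with at least one missing field; dropna() (how='any') removes exactly these.
rows_with_missing = toy.isnull().any(axis=1).sum()

cleaned = toy.dropna()

print(rows_with_missing, len(toy) - len(cleaned))
```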

### Task 17: Removing Duplicate Rows.
Great progress! We successfully dropped the missing values. Now, let's create a prompt that asks our agent to remove duplicate rows from the playstore_reviews dataset.

In [19]:
# write your prompt using the agent1
agent1(""""Write a code to do the following to the 'df1':
- Remove the duplicate rows in the 'df1' inplace while keeping the first occurrence of each duplicate row. Make sure that the changes are made directly in the df1 using the appropriate command
- Give me the shape of the df after removing the duplicate rows.
""")



> Entering new AgentExecutor chain...
Thought: I need to use the drop_duplicates() function and set the inplace parameter to True to make changes directly in the dataframe.
Action: python_repl_ast
Action Input: df.drop_duplicates(inplace=True)
I need to use the shape attribute to get the shape of the dataframe.
Action: python_repl_ast
Action Input: df.shape
(29692, 5)
29692 is the number of rows and 5 is the number of columns.
Final Answer: The shape of the dataframe after removing duplicate rows is (29692, 5).

> Finished chain.


{'input': '"Write a code to do the following to the \'df1\':\n- Remove the duplicate rows in the \'df1\' inplace while keeping the first occurrence of each duplicate row. Make sure that the changes are made directly in the df1 using the appropriate command\n- Give me the shape of the df after removing the duplicate rows.\n',
 'output': 'The shape of the dataframe after removing duplicate rows is (29692, 5).'}

In [20]:
df1.duplicated().sum()

0

In [21]:
df1.shape

(29692, 5)

In [22]:
# export the cleaned dataframe 'df1' with the name 'cleaned_reviews_db.csv' using the parameter index=True.
df1.to_csv('cleaned_reviews_db.csv', index = True)