In [None]:
import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)
import matplotlib as mpl # create histogram
import matplotlib.pyplot as plt
from matplotlib import pyplot

# Load Apps Data

In [None]:
apps = pd.read_csv("/kaggle/input/google-playstore-apps/Google-Playstore.csv")

# Examine data structures

Let's look at the first five rows to see how the data is organized.

In [None]:
apps.head()

# Remove any spaces from variable names

We remove the spaces from the column names so that later cells can refer to columns by compact names such as `AdSupported` and `EditorsChoice`.

In [None]:
apps.columns = apps.columns.str.replace(' ', '')
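
As a minimal sketch of what this renaming does, the cell below applies the same call to a toy frame. The frame, its values, and the names `App Name` and `Ad Supported` are assumptions for illustration, not rows from the dataset.

```python
import pandas as pd

# Toy frame with made-up values; the column names only mimic the dataset's style.
toy = pd.DataFrame({
    "App Name": ["a", "b"],
    "Ad Supported": [True, False],
    "Rating": [4.5, 0.0],
})

# Remove every space character from the column labels (literal match, no regex).
toy.columns = toy.columns.str.replace(" ", "", regex=False)

print(list(toy.columns))
# The renamed columns can now be reached with attribute access as well.
print(toy.AdSupported.tolist())
```

Passing `regex=False` makes it explicit that the space is matched literally.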

In [None]:
apps.hist(bins=50, figsize=(20,15))

Because many of the values are 0, we keep only the non-zero values for the analysis.
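
One way to put a number on "many" is to compute the share of zeros in each numeric column. The sketch below uses a toy frame with made-up values; `RatingCount` is a hypothetical column name used only for illustration.

```python
import pandas as pd

# Toy data (made-up values); 'RatingCount' is a hypothetical column name.
toy = pd.DataFrame({
    "Rating": [0.0, 4.2, 0.0, 3.9],
    "RatingCount": [0, 120, 0, 35],
})

# Keep only numeric columns, then take the mean of the boolean mask:
# the mean of True/False values is the fraction of zeros per column.
numeric = toy.select_dtypes(include="number")
zero_share = (numeric == 0).mean()

print(zero_share)
```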

# Examine the variable type

In [None]:
apps.info()

# Remove 0 values from the Rating column

In [None]:
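# Keep rows with a non-zero rating; .copy() lets us add columns later without a SettingWithCopyWarning
apps_no_0 = apps[apps.Rating != 0].copy()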

In [None]:
apps_no_0.head()

In [None]:
apps_no_0.info()

After removing the 0 ratings, 1,253,182 of the original 2,312,944 entries remain. Almost half of the data (about 46%) had a rating of 0.
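
The removed share can also be computed directly instead of being read off `info()`. The sketch below runs on a toy frame with made-up values. It also illustrates one caveat: a `!= 0` filter keeps rows whose rating is missing (NaN), so the entry count can include rows without a rating. Whether the real dataset contains such rows is not checked here.

```python
import numpy as np
import pandas as pd

# Toy ratings (made-up values), including one missing rating.
toy = pd.DataFrame({"Rating": [0.0, 4.2, np.nan, 0.0, 3.9]})

# Same filter as in the notebook: drop rows whose rating equals 0.
kept = toy[toy["Rating"] != 0].copy()

# Fraction of rows removed by the filter.
removed_share = 1 - len(kept) / len(toy)

# NaN != 0 evaluates to True, so missing ratings survive the filter.
n_missing = int(kept["Rating"].isna().sum())

print(len(kept), removed_share, n_missing)
```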

# Check the histogram

In [None]:
apps_no_0.hist(bins=50, figsize=(20,15))

These graphs show that, apart from Rating, the variables offer little variation to analyze. We therefore divide the Rating variable into n groups using equal-frequency binning.
This method chooses the bin edges so that each group contains roughly the same number of observations.
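
To illustrate the idea, the sketch below compares equal-frequency binning (`pd.qcut`) with equal-width binning (`pd.cut`) on ten made-up ratings. The values are assumptions chosen for illustration, not data from the dataset.

```python
import pandas as pd

# Ten made-up ratings, skewed toward the high end.
ratings = pd.Series([1.0, 3.5, 3.8, 4.0, 4.1, 4.15, 4.3, 4.5, 4.8, 5.0])

# Equal-frequency: edges are quantiles, so each bin holds about the same number of rows.
equal_freq = pd.qcut(ratings, q=5)

# Equal-width: edges are evenly spaced, so skewed data piles up in a few bins.
equal_width = pd.cut(ratings, bins=5)

# sort=False keeps the bins in interval order rather than ordering by count.
freq_counts = equal_freq.value_counts(sort=False)
width_counts = equal_width.value_counts(sort=False)

print(freq_counts)
print(width_counts)
```

With skewed ratings, the equal-width bins are unevenly filled, which is the motivation for using quantile-based edges here.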

In [None]:
bin_labels_5 = ['Poor', 'Average', 'Good', 'Very Good', 'Excellent']
# Equal-frequency binning with n=5; the unlabeled result shows the interval edges
apps_label = pd.qcut(apps_no_0['Rating'], q=5)
# Same five bins, stored as a labeled column
apps_no_0['Description'] = pd.qcut(apps_no_0['Rating'], q=[0, .2, .4, .6, .8, 1], labels=bin_labels_5)
apps_label

We label the five bins poor, average, good, very good, and excellent:

1. Poor : (0.999,3.6]
2. Average : (3.6,4.1]
3. Good : (4.1,4.4]
4. Very Good : (4.4,4.7]
5. Excellent : (4.7,5.0]
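
As a quick check of how these intervals behave at their boundaries, the sketch below applies the edges listed above to a few made-up ratings with `pd.cut`. The edges are copied from the list as displayed, so they may be rounded versions of the exact quantiles that `pd.qcut` computed.

```python
import pandas as pd

# Edges and labels copied from the list above (possibly rounded by the display).
edges = [0.999, 3.6, 4.1, 4.4, 4.7, 5.0]
labels = ['Poor', 'Average', 'Good', 'Very Good', 'Excellent']

# Made-up ratings, several of them sitting exactly on an edge.
sample = pd.Series([1.0, 3.6, 3.7, 4.4, 4.5, 5.0])

# Intervals are closed on the right, so 3.6 falls in 'Poor' and 4.4 in 'Good'.
described = pd.cut(sample, bins=edges, labels=labels)

print(described.tolist())
```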

# Sort by the Description variable

We sort the data by the Description variable so that the rating labels appear in a consistent order.
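
The order that results depends on the column's type. `pd.qcut` with `labels=` returns an ordered categorical, so sorting follows the label order (Poor to Excellent) rather than alphabetical order. The sketch below shows the difference on a toy frame with made-up rows.

```python
import pandas as pd

labels = ['Poor', 'Average', 'Good', 'Very Good', 'Excellent']

# Toy frame (made-up rows) with an ordered categorical, like the one qcut produces.
toy = pd.DataFrame({
    "App": ["a", "b", "c", "d"],
    "Description": pd.Categorical(
        ["Good", "Excellent", "Poor", "Average"],
        categories=labels,
        ordered=True,
    ),
})

# Sorting an ordered categorical follows the category order.
by_category = toy.sort_values(by="Description")["Description"].tolist()

# Sorting the same values as plain strings is alphabetical instead.
by_text = sorted(toy["Description"].astype(str))

print(by_category)
print(by_text)
```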

In [None]:
apps_sort=apps_no_0.sort_values(by=["Description"])

In [None]:
apps_sort.hist(by = 'Category', column='Description',figsize=(20,15))
pyplot.tight_layout()
plt.show()

The graph above shows that apps in the Social, Shopping, Food & Drink, Events, and Arcade categories mostly receive high ratings.

**1. Social and events**

One likely reason is that many people rely on social apps such as WhatsApp, LINE, and YouTube to stay in touch with one another. Most users of these apps are employees or students.

**2. Shopping, Food & Drink**

Apps of this type are popular at the moment, most likely because of the ongoing COVID-19 pandemic, which pushes people to order goods quickly, especially groceries and beverages.

**3. Arcade**

A likely reason is that these are traditional games that many people enjoy: the gameplay is simple, and they are typically played in one's spare time.

By contrast, productivity, finance, and dating apps consistently receive low ratings.

**1. Productivity**

The main reason is probably that people expect productivity apps to make them more productive, yet an app can only influence a person's behavior so far. As a result, some people blame the app for their own mistakes.
    
**2. Finance**

Most finance apps deal with earning money, investing in stocks, and similar activities. Many people struggle with investing and lose money, and that frustration carries over into their ratings.
    
**3. Dating**

Single people use these apps to find a partner, but the apps cannot guarantee that anyone will get a date. As a result, users are more likely to give a low rating.
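
The category comparisons above are read off the histograms. A table of label shares per category would make them easier to verify. The sketch below shows one way to build such a table with `pd.crosstab`, using made-up rows; the numbers it prints are not results from the dataset.

```python
import pandas as pd

# Made-up rows for illustration only.
toy = pd.DataFrame({
    "Category": ["Social", "Social", "Finance", "Finance", "Finance"],
    "Description": ["Excellent", "Good", "Poor", "Poor", "Excellent"],
})

# normalize="index" turns the counts into row-wise shares,
# so each category's labels sum to 1.
shares = pd.crosstab(toy["Category"], toy["Description"], normalize="index")

print(shares)
```

On the real data, the same call could be applied to `apps_sort['Category']` and `apps_sort['Description']`.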

In [None]:
apps_sort.hist(by = 'Free', column='Description',figsize=(12,8))
pyplot.tight_layout()
plt.show()

True means the app is free; False means the app is paid.

In [None]:
apps_sort.hist(by = 'AdSupported', column='Description',figsize=(12,8))
pyplot.tight_layout()
plt.show()

True means the app is ad-supported; False means it is not.

In [None]:
apps_sort.hist(by = 'EditorsChoice', column='Description',figsize=(12,8))
pyplot.tight_layout()
plt.show()


True means the app is an Editor's Choice app; False means it is not.

In [None]:
apps_sort.hist(by = 'InAppPurchases', column='Description',figsize=(12,8))
pyplot.tight_layout()
plt.show()
