In [5]:
import pandas as pd
import numpy as np
import seaborn as sns

# Load Cleaned Data

In [6]:
google_play_df = pd.read_csv('data/googleplaystore-cleaned.csv')
google_play_reviews_df = pd.read_csv('data/googleplaystore_user_reviews-cleaned.csv')

# Question 1
Which app category, in your opinion, has the best ratings? How are you measuring "best ratings"?

Use the `googleplaystore.csv` data.

## Answer 1: Aggregated Rating
One way to answer the question is to aggregate ratings by category. Below, we compute the median rating for each category and sort the categories from highest to lowest.

In [7]:
median_rating_by_category = google_play_df[['Category', 'Rating']].groupby('Category').median().sort_values(by='Rating', ascending=False)
median_rating_by_category

Category,Rating
ART_AND_DESIGN,4.4
HEALTH_AND_FITNESS,4.4
EDUCATION,4.4
COMICS,4.35
PARENTING,4.35
BOOKS_AND_REFERENCE,4.3
SHOPPING,4.3
PHOTOGRAPHY,4.3
PERSONALIZATION,4.3
GAME,4.3


### Conclusion
We conclude that, by median rating per category, there is a three-way tie for "Highest Rated Category" among the categories listed below.

In [8]:
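# Keep every category whose median equals the maximum, so that ties are all reported.
top_median = median_rating_by_category[median_rating_by_category['Rating'] == median_rating_by_category['Rating'].max()]
for category in top_median.index:
    # Convert names such as ART_AND_DESIGN to "Art And Design" for display.
    print(category.replace('_', ' ').title()+'\n')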

Art And Design

Health And Fitness

Education



## Answer 2: Aggregated Median with Aggregated Apps
Above we noted that the Play Store listing data seems to have duplicate listings for some apps. Duplicate rows for a given app within a category cause that app's rating to be overrepresented in the category median. To combat this issue, we first aggregate ratings by app within each category (one median rating per app) and then compute the category-wide median over those per-app ratings.
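
To illustrate why duplicates matter, here is a minimal sketch on made-up data. The `TOOLS` category, the app names, and the ratings are hypothetical and are not taken from the Play Store dataset. It compares a row-level median with the two-stage, app-level median described above.

```python
import pandas as pd

# Hypothetical toy data: "App A" is listed three times in the same category.
toy_df = pd.DataFrame({
    'Category': ['TOOLS'] * 5,
    'App': ['App A', 'App A', 'App A', 'App B', 'App C'],
    'Rating': [5.0, 5.0, 5.0, 3.0, 4.0],
})

# Row-level median: the three duplicate rows of App A dominate the result.
row_level_median = toy_df.groupby('Category')['Rating'].median()

# App-level median: collapse each app to a single rating, then aggregate by category.
per_app = toy_df.groupby(['Category', 'App'])['Rating'].median().reset_index()
app_level_median = per_app.groupby('Category')['Rating'].median()

print(row_level_median['TOOLS'])  # 5.0
print(app_level_median['TOOLS'])  # 4.0
```

In this toy case the duplicated app pulls the row-level median up to its own rating, while the app-level median counts each app once.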

In [9]:
# Collapse duplicate listings: one median rating per (Category, App) pair.
rating_aggregated_by_app = google_play_df[['App', 'Category', 'Rating']].groupby(['Category', 'App']).median().reset_index()
rating_aggregated_by_app

,Category,App,Rating
0,ART_AND_DESIGN,350 Diy Room Decor Ideas,4.5
1,ART_AND_DESIGN,3D Color Pixel by Number - Sandbox Art Coloring,4.4
2,ART_AND_DESIGN,AJ Styles HD Wallpapers,4.8
3,ART_AND_DESIGN,AJ Styles Wallpaper 2018 - AJ Styles HD Wallpaper,4.0
4,ART_AND_DESIGN,Ai illustrator viewer,3.4
...,...,...,...
9739,WEATHER,Yahoo Weather,4.4
9740,WEATHER,Yahoo! Weather for SH Forecast for understandi...,4.2
9741,WEATHER,Yandex.Weather,4.5
9742,WEATHER,weather - weather forecast,4.7


In [10]:
app_aggregated_median_by_category = rating_aggregated_by_app[['Category', 'Rating']].groupby('Category').median().sort_values(by='Rating', ascending=False)
app_aggregated_median_by_category

Category,Rating
ART_AND_DESIGN,4.4
COMICS,4.4
EDUCATION,4.4
HEALTH_AND_FITNESS,4.4
PARENTING,4.35
BOOKS_AND_REFERENCE,4.3
PERSONALIZATION,4.3
GAME,4.3
EVENTS,4.25
AUTO_AND_VEHICLES,4.2


### Conclusion
We conclude that, by the category median of per-app ratings, there is a four-way tie for "Highest Rated Category" among the categories listed below.

In [11]:
# Keep every category whose app-aggregated median equals the maximum, so that ties are all reported.
top_app_aggregated_median = app_aggregated_median_by_category[app_aggregated_median_by_category['Rating'] == app_aggregated_median_by_category['Rating'].max()]
for category in top_app_aggregated_median.index:
    print(category.replace('_', ' ').title()+'\n')

Art And Design

Comics

Education

Health And Fitness
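
To see which categories moved between the two answers, the two rankings can be placed side by side. The sketch below is one possible way to do that. It hard-codes only the eight categories whose medians appear in both output tables above, and the names `row_level`, `app_level`, and `comparison` are illustrative rather than part of the notebook.

```python
import pandas as pd

# Medians copied from the two output tables above (only categories shown in both).
row_level = pd.Series({
    'ART_AND_DESIGN': 4.4, 'HEALTH_AND_FITNESS': 4.4, 'EDUCATION': 4.4,
    'COMICS': 4.35, 'PARENTING': 4.35, 'BOOKS_AND_REFERENCE': 4.3,
    'PERSONALIZATION': 4.3, 'GAME': 4.3,
}, name='row_level')
app_level = pd.Series({
    'ART_AND_DESIGN': 4.4, 'HEALTH_AND_FITNESS': 4.4, 'EDUCATION': 4.4,
    'COMICS': 4.4, 'PARENTING': 4.35, 'BOOKS_AND_REFERENCE': 4.3,
    'PERSONALIZATION': 4.3, 'GAME': 4.3,
}, name='app_level')

# Align the two series on category name and compute the change in median.
comparison = pd.concat([row_level, app_level], axis=1)
# Round to two decimals to absorb floating-point noise in the subtraction.
comparison['change'] = (comparison['app_level'] - comparison['row_level']).round(2)

# Categories whose median differs between the two approaches.
changed = comparison[comparison['change'] != 0]
print(changed)
```

Among these eight categories, only Comics changes (from 4.35 to 4.4), which is what turns the three-way tie into a four-way tie.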

