In [None]:
# This Python 3 environment comes with many helpful analytics libraries installed
# It is defined by the kaggle/python Docker image: https://github.com/kaggle/docker-python
# For example, here's several helpful packages to load

import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)
import matplotlib.pyplot as plt
import seaborn as sns

# Input data files are available in the read-only "../input/" directory
# For example, running this (by clicking run or pressing Shift+Enter) will list all files under the input directory

import os
for dirname, _, filenames in os.walk('/kaggle/input'):
    for filename in filenames:
        print(os.path.join(dirname, filename))

# You can write up to 20GB to the current directory (/kaggle/working/) that gets preserved as output when you create a version using "Save & Run All" 
# You can also write temporary files to /kaggle/temp/, but they won't be saved outside of the current session

### Importing and taking a look at the data

In [None]:
df = pd.read_csv('/kaggle/input/top-play-store-games/android-games.csv')
df.head()

The last column indicates whether a game is free or paid, so let's check whether there are any paid games.

In [None]:
df['paid'].value_counts()

As we can see, there are only 7 paid games, so we can ignore them.
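
If we preferred to drop the paid games explicitly instead of just ignoring the column, a boolean mask would do it. This is only a sketch on a toy frame: the titles are made up, and it assumes the 'paid' column holds booleans, which should be checked against the real data first.

```python
import pandas as pd

# Toy data for illustration only; titles are hypothetical and
# 'paid' is assumed to be a boolean column
toy = pd.DataFrame({
    'title': ['Game A', 'Game B', 'Game C', 'Game D'],
    'paid': [False, False, True, False],
})

# Keep only the rows where 'paid' is False
free_only = toy[~toy['paid']].reset_index(drop=True)
print(len(toy), len(free_only))
```

With so few paid games, dropping them or keeping them should barely change the category totals.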

The 'installs' column is a good indicator of a game's popularity, but it is stored as a string (object) with a magnitude suffix, so we have to convert it to float. To do that, we define a function that takes a value from the 'installs' column, strips the 'M' or 'k' suffix and returns the number as a float expressed in millions.

In [None]:
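def get_installs(value):
    """Convert an installs string (e.g. '500.0 M', '100.0 k') to millions."""
    value = value.strip()
    if value[-1] == 'M':
        # float() ignores the whitespace left before the suffix
        return float(value[:-1])
    elif value[-1] == 'k':
        # thousands -> millions
        return float(value[:-1]) / 1000
    # unrecognized suffix: mark as missing instead of silently returning None
    return np.nan

df['installs'] = df['installs'].apply(get_installs)
df['installs'].value_counts()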

Great! The 'installs' column is now numeric (in millions) and in a format that can be used for further analysis.
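
A quick way to convince ourselves that the conversion behaves as intended is to run it on a few hand-written strings. The sketch below uses a standalone helper, `parse_installs` (a hypothetical name), and sample values that only assume the "number, space, suffix" format; they are not taken from the dataset.

```python
import math

def parse_installs(value):
    """Convert strings such as '500.0 M' or '100.0 k' to millions of installs."""
    value = value.strip()
    suffix, number = value[-1], value[:-1]
    if suffix == 'M':
        return float(number)          # already in millions
    if suffix == 'k':
        return float(number) / 1000   # thousands -> millions
    return math.nan                   # unknown suffix

# Assumed sample inputs, for illustration only
samples = ['500.0 M', '100.0 M', '500.0 k', '1.0 M']
converted = [parse_installs(s) for s in samples]
print(converted)  # [500.0, 100.0, 0.5, 1.0]
```

Returning NaN for anything without a recognized suffix makes unexpected values show up in `isna()` checks rather than passing through unnoticed.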

Let's now take a look at the total installs.

In [None]:
# Select 'installs' before summing so that non-numeric columns are not aggregated
total_installs = df.groupby('category')[['installs']].sum()\
.sort_values(by='installs', ascending=False)

fig, ax = plt.subplots(figsize=(12,8))
sns.lineplot(x='category', y='installs', data=total_installs)
plt.xticks(rotation=-30);

total_installs

From the graph above, we can see that the arcade, casual, action and racing categories have the most installs. If we want to enter the market with a game, it's a good idea to make it in one of these categories.

But what about ratings? Ratings are an important indicator of the quality of an app or a game. Low ratings (1, 2) usually mean that a game has serious problems, like crashing or lagging. Medium ratings (3, 4) mean that a game is good but has some minor problems, like not being able to buy an item in the game. The highest rating (5) suggests that players had no significant complaints.

Let's take a look at rating percentages across categories.

In [None]:
rating_columns = ['5 star ratings', '4 star ratings', '3 star ratings',
                  '2 star ratings', '1 star ratings']

rating_perc = df.groupby('category')[rating_columns].sum()\
.apply(lambda x: x/x.sum(), axis=1)\
.sort_values(by='5 star ratings', ascending=False)

rating_and_install = rating_perc.merge(total_installs, left_index=True, right_index=True)
rating_and_install

The trivia, sports, role playing and music categories have less than 70% 5 star ratings. Except for trivia, these categories also have a higher share of 1 and 2 star ratings. There may be several reasons for this. One possibility is the amount of ads: music games tend to include a lot of ads, and many music games are similar to each other. It seems that making a quality music game is difficult.
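
To compare the low end of the scale directly, the 1 and 2 star shares can be added into a single "low rating" share per category. The sketch below uses made-up counts and category labels (modeled on the 'GAME ...' naming), so the numbers only illustrate the calculation and say nothing about the real data.

```python
import pandas as pd

rating_columns = ['5 star ratings', '4 star ratings', '3 star ratings',
                  '2 star ratings', '1 star ratings']

# Hypothetical rating counts for two categories, for illustration only
toy_counts = pd.DataFrame(
    [[700, 150, 50, 40, 60],
     [800, 120, 40, 15, 25]],
    index=['GAME MUSIC', 'GAME PUZZLE'],
    columns=rating_columns,
)

# Row-wise normalization: each row now sums to 1
toy_perc = toy_counts.div(toy_counts.sum(axis=1), axis=0)

# Combined share of 1 and 2 star ratings per category
low_share = toy_perc[['1 star ratings', '2 star ratings']].sum(axis=1)
print(low_share.sort_values(ascending=False))
```

`DataFrame.div(..., axis=0)` gives the same result as the row-wise `apply` used above, without calling a Python function for every row.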

Now that we've seen the individual ratings for each category, let's take a look at the average ratings and visualize them.

In [None]:
fig, ax = plt.subplots(figsize=(14,6))

avg_rating = df.groupby('category')['average rating'].mean().sort_values()
avg_rating.plot()
plt.xticks(ticks=np.arange(len(avg_rating)), labels=list(avg_rating.index), rotation=-30)  # one tick per category
plt.xlabel('Game Categories')
plt.ylabel('Average Ratings');

As expected, music and trivia games have the lowest average ratings. But the average rating of action games is also low. Let's look at its histogram.

In [None]:
action_ratings = df[df['category'] == 'GAME ACTION']

sns.histplot(x='average rating', data=action_ratings, bins=30);

Ratings are concentrated around 4.2. We can conclude that only a few games in the action category have an average rating close to 5, while the rest are in the 3 to 4 star range. We can also see from the data that the 5 most installed games are action games and that their average ratings are between 4 and 4.35. It's easy to get installs for action games, but difficult to satisfy players. So if we're entering the market with an action game, we have to make sure that it's bug free and that it doesn't get boring after a while.

The highest ratings belong to casino, card, puzzle and word games, probably because they're easy to make and simple to play. They don't require a powerful phone and they take up less storage.
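
The claim about the most installed games can be checked with `nlargest`. The sketch below runs on a toy frame with invented titles and values (and takes the top 3 rather than the top 5 to stay short); on the real data the same call would be made on `df` with `5` as the first argument.

```python
import pandas as pd

# Toy data for illustration only; titles, installs and ratings are made up
toy = pd.DataFrame({
    'title': ['A', 'B', 'C', 'D', 'E', 'F'],
    'category': ['GAME ACTION', 'GAME ACTION', 'GAME PUZZLE',
                 'GAME ACTION', 'GAME WORD', 'GAME ARCADE'],
    'installs': [1000.0, 500.0, 100.0, 400.0, 50.0, 90.0],
    'average rating': [4.1, 4.3, 4.6, 4.2, 4.5, 4.0],
})

# Rows with the largest 'installs', in descending order
top = toy.nlargest(3, 'installs')[['title', 'category', 'installs', 'average rating']]
print(top)

# Range of average ratings among the most installed games
print(top['average rating'].min(), top['average rating'].max())
```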

Now let's look at the growth of each category.

In [None]:
growth_subset = df.groupby('category')[['growth (60 days)', 'growth (30 days)']].mean()\
.sort_values(by='growth (30 days)', ascending=False)

growth_subset

In [None]:
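fig, (ax1, ax2) = plt.subplots(1,2, figsize=(15,8))

# Left: categories ordered by 30-day growth; right: ordered by 60-day growth
ax1.plot(growth_subset['growth (30 days)'])
ax1.set_title('growth (30 days)')
ax1.tick_params(axis='x', labelrotation=90)
ax2.plot(growth_subset.sort_values(by='growth (60 days)', ascending=False)['growth (60 days)'])
ax2.set_title('growth (60 days)')
ax2.tick_params(axis='x', labelrotation=90);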

Action and word games have grown remarkably in the past 30 days: almost 7 times the growth of the third-ranked category!
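
A gap this large is worth a second look, because a mean can be pulled up by a handful of extreme values. One way to check is to put the median next to the mean and to compute the ratio between ranks explicitly. The sketch below uses made-up growth figures and category labels, so it shows the technique only.

```python
import pandas as pd

# Toy data for illustration only; the growth values are made up
toy = pd.DataFrame({
    'category': ['GAME ACTION'] * 3 + ['GAME WORD'] * 3 + ['GAME CARD'] * 3,
    'growth (30 days)': [1.0, 2.0, 600.0, 3.0, 4.0, 5.0, 1.0, 1.0, 1.0],
})

# Mean and median growth per category, ranked by mean
summary = (toy.groupby('category')['growth (30 days)']
              .agg(['mean', 'median'])
              .sort_values(by='mean', ascending=False))
print(summary)

# How many times larger the top mean is than the third-ranked mean
ratio = summary['mean'].iloc[0] / summary['mean'].iloc[2]
print(ratio)
```

In this toy example a single outlier lifts the top category's mean far above its median. If the real data showed the same pattern, the growth would be driven by a few games and not by the category as a whole.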

### Last Words

Based on our analysis, if we want to make a game that will quickly gain popularity and installs, we might think of making an action or an arcade game. We may not be successful in terms of ratings, so we have to be very cautious and read all the negative feedback.

If we want slower but steadier progress, we can think of making board or card games. They're easy to make and they usually get higher ratings, but they are not installed as much as action or arcade games.

One other category that we can think of is music. Music games have a lot of 1 star ratings, which suggests that they're "bug prone" and that there aren't many quality games in the category. So a quality music game could quickly draw attention from many players. Music games also show a moderate amount of growth.