# Overview

The mobile games industry is worth billions of dollars, with companies spending vast amounts of money on the development and marketing of these games to an equally large market. Using this data set, insights can be gained into a sub-market of this market, strategy games. This sub-market includes titles such as Clash of Clans, Plants vs Zombies and Pokemon GO.


# Background

This is the data of 17007 strategy games on the Apple App Store. It was collected on the 3rd of August 2019, using the iTunes API and the App Store sitemap.

# Some ideas

You could use the number of ratings as a proxy indicator for the overall success of a game, and then work out what factors make a successful game. Or you could measure the state of the market over time and try predict where it is headed. And I think an analysis of the icons of the apps would be pretty cool.
Edit

If you want download all of the icons for these apps (as 512 x 512 jpegs), I have uploaded them here https://mega.nz/#!pQNX1a7Q!DjG5wTXJ0EKp31n2wxwYuJ_WRJ5cXHChEcKLAfzUYTM

In [1]:
import pandas as pd
import numpy as np
import re
import collections
from sklearn.feature_extraction.text import TfidfVectorizer


In [2]:
df = pd.read_csv('/Users/krisanaya/Downloads/appstore_games.csv')

In [3]:
# describe data

In [3]:
def fillColumnList(x, index):
    """populates a record for a series 
       overrides index and populates None if out of range."""
    try:
        return x[index]
    except Exception as e:
        raise e

def maxList(df, index):
    """returns the max number of a index series."""
    return df[index].max()

def uniformList(x, index):  
    """creates a list of the same lenght"""
    return x[index] + ['nosubgenre'] * (6 - len(x['GenresList']))

def moveGames(x, index):
    """deletes games in list and cast literal in new series."""
    del x[index][x[index].index('games')]
    return 'games'

def subGenreDataFrame(df):
    """create SubGenre columns."""
    for index in range(5):
        df[f'SubGenre_{index}'] = df_games['GenresList'].apply(lambda x: fillColumnList(x, index))
    return df

def uniqueWords(df, index):
    """just gives me a set of uniqueWords."""
    uniqueWords = list()
    for array in df[index].to_list():
        for word in array:
            uniqueWords.append(word)
    return set(uniqueWords)


def VectorizeWordsDataFrame(df, index):
    """vectorize data frame."""
    newList = list()
    # create one list for the genres list
    for array in df[index].to_list():
        for subGenre in array:
            newList.append(subGenre)
    count = CountVectorizer()
    bag_of_words = count.fit_transform(newList)
    # Get feature names
    feature_names = count.get_feature_names()
    # Create data frame
    df_vector = pd.DataFrame(bag_of_words.toarray(), columns=feature_names)
    return df_vector


def hasSubGenres(df, index):
    return df[df[f'SubGenre_{index}'] != 'nosubgenre']
    

def hasConnection(df, index):
    npUnique = df[f'SubGenre_{index}'].unique()
    d = dict()
    for value in npUnique:
        d[value] = set()
        for element in df[df[f'SubGenre_{index}'] == value][f'SubGenre_{index + 1}']:
            d[value].add(element)
    return d
            

In [6]:
# sub set by Genre
vectoriser = TfidfVectorizer(sublinear_tf=True)
df_games = df[df['Genres'].apply(lambda x: 'Games' in x.split(',')[0])]
df_games['GenresList'] = df_games['Genres'].apply(lambda x: sorted(x.lower().replace('&', '').replace(' ', '').split(',')))
df_games['GenresList'] = df_games.apply(lambda x: uniformList(x, 'GenresList'), axis=1)
df_games['AppType'] = df_games.apply(lambda x: moveGames(x, 'GenresList'), axis=1)
df_games = subGenreDataFrame(df=df_games)
df_games['GenresSplit'] = df_games['GenresList'].apply(lambda x: ','.join(x))
df_games['FitGenres'] = list(vectoriser.fit_transform(df_games['GenresSplit']).toarray())
df_games['Current Version Release Date'] = pd.to_datetime(df_games['Current Version Release Date'])
df_games['Original Release Date'] = pd.to_datetime(df_games['Original Release Date'])

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  after removing the cwd from sys.path.
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  """
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: htt

In [23]:
# take the descriptions and get rid of the stop words 
# create your own categories from this descriptions
# use the price and in-app purchase data and averate user rating
# try to calculate the bar codes 
# claculate the frequency of the words in description

Unnamed: 0,URL,ID,Name,Subtitle,Icon URL,Average User Rating,User Rating Count,Price,In-app Purchases,Description,...,Current Version Release Date,GenresList,AppType,SubGenre_0,SubGenre_1,SubGenre_2,SubGenre_3,SubGenre_4,GenresSplit,FitGenres
0,https://apps.apple.com/us/app/sudoku/id284921427,284921427,Sudoku,,https://is2-ssl.mzstatic.com/image/thumb/Purpl...,4.0,3553.0,2.99,,"Join over 21,000,000 of our fans and download ...",...,2017-05-30,"[puzzle, strategy, nosubgenre, nosubgenre, nos...",games,puzzle,strategy,nosubgenre,nosubgenre,nosubgenre,"puzzle,strategy,nosubgenre,nosubgenre,nosubgenre","[0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ..."
1,https://apps.apple.com/us/app/reversi/id284926400,284926400,Reversi,,https://is4-ssl.mzstatic.com/image/thumb/Purpl...,3.5,284.0,1.99,,"The classic game of Reversi, also known as Oth...",...,2018-05-17,"[board, strategy, nosubgenre, nosubgenre, nosu...",games,board,strategy,nosubgenre,nosubgenre,nosubgenre,"board,strategy,nosubgenre,nosubgenre,nosubgenre","[0.0, 0.0, 0.815930164600597, 0.0, 0.0, 0.0, 0..."
2,https://apps.apple.com/us/app/morocco/id284946595,284946595,Morocco,,https://is5-ssl.mzstatic.com/image/thumb/Purpl...,3.0,8376.0,0.00,,Play the classic strategy game Othello (also k...,...,2017-05-09,"[board, strategy, nosubgenre, nosubgenre, nosu...",games,board,strategy,nosubgenre,nosubgenre,nosubgenre,"board,strategy,nosubgenre,nosubgenre,nosubgenre","[0.0, 0.0, 0.815930164600597, 0.0, 0.0, 0.0, 0..."
3,https://apps.apple.com/us/app/sudoku-free/id28...,285755462,Sudoku (Free),,https://is3-ssl.mzstatic.com/image/thumb/Purpl...,3.5,190394.0,0.00,,"Top 100 free app for over a year.\nRated ""Best...",...,2017-05-30,"[puzzle, strategy, nosubgenre, nosubgenre, nos...",games,puzzle,strategy,nosubgenre,nosubgenre,nosubgenre,"puzzle,strategy,nosubgenre,nosubgenre,nosubgenre","[0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ..."
4,https://apps.apple.com/us/app/senet-deluxe/id2...,285831220,Senet Deluxe,,https://is1-ssl.mzstatic.com/image/thumb/Purpl...,3.5,28.0,2.99,,"""Senet Deluxe - The Ancient Game of Life and A...",...,2018-07-22,"[board, education, strategy, nosubgenre, nosub...",games,board,education,strategy,nosubgenre,nosubgenre,"board,education,strategy,nosubgenre,nosubgenre","[0.0, 0.0, 0.5847528434181892, 0.0, 0.0, 0.0, ..."
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
17002,https://apps.apple.com/us/app/stack-puzzle-ris...,1474626442,Stack Puzzle : Rise Tower,"Blast the cubes, solve puzzle!",https://is5-ssl.mzstatic.com/image/thumb/Purpl...,,,0.00,,"The goal is very simple, move the square horiz...",...,2019-07-30,"[casual, entertainment, strategy, nosubgenre, ...",games,casual,entertainment,strategy,nosubgenre,nosubgenre,"casual,entertainment,strategy,nosubgenre,nosub...","[0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.78043074..."
17003,https://apps.apple.com/us/app/eachother/id1474...,1474919257,EachOther,,https://is2-ssl.mzstatic.com/image/thumb/Purpl...,,,0.00,,Collect a score while you play!!\n\nBy linking...,...,2019-01-08,"[family, strategy, nosubgenre, nosubgenre, nos...",games,family,strategy,nosubgenre,nosubgenre,nosubgenre,"family,strategy,nosubgenre,nosubgenre,nosubgenre","[0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ..."
17004,https://apps.apple.com/us/app/rabbit-vs-tortoi...,1474962324,Rabbit Vs Tortoise,,https://is2-ssl.mzstatic.com/image/thumb/Purpl...,,,0.00,,"""Rabbit Vs Tortoise is chess type cool simple ...",...,2019-03-08,"[strategy, nosubgenre, nosubgenre, nosubgenre,...",games,strategy,nosubgenre,nosubgenre,nosubgenre,nosubgenre,"strategy,nosubgenre,nosubgenre,nosubgenre,nosu...","[0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ..."
17005,https://apps.apple.com/us/app/fatall/id1474963671,1474963671,FaTaLL,Most fun game!!!,https://is1-ssl.mzstatic.com/image/thumb/Purpl...,,,0.00,"9.99, 49.99, 3.99",Upgrade your character and use your skills to ...,...,2019-01-08,"[action, strategy, nosubgenre, nosubgenre, nos...",games,action,strategy,nosubgenre,nosubgenre,nosubgenre,"action,strategy,nosubgenre,nosubgenre,nosubgenre","[0.80171027585348, 0.0, 0.0, 0.0, 0.0, 0.0, 0...."


In [13]:
hasConnection(df_games, 0)

{'puzzle': {'reference', 'shopping', 'socialnetworking', 'sports', 'strategy'},
 'board': {'books',
  'business',
  'education',
  'emojiexpressions',
  'entertainment',
  'fooddrink',
  'lifestyle',
  'magazinesnewspapers',
  'music',
  'navigation',
  'news',
  'photovideo',
  'productivity',
  'reference',
  'socialnetworking',
  'sports',
  'strategy'},
 'entertainment': {'family',
  'music',
  'puzzle',
  'racing',
  'roleplaying',
  'simulation',
  'sports',
  'strategy'},
 'casual': {'education',
  'entertainment',
  'finance',
  'fooddrink',
  'healthfitness',
  'lifestyle',
  'music',
  'navigation',
  'news',
  'productivity',
  'reference',
  'shopping',
  'simulation',
  'socialnetworking',
  'sports',
  'strategy'},
 'education': {'family',
  'music',
  'puzzle',
  'racing',
  'roleplaying',
  'simulation',
  'sports',
  'strategy'},
 'action': {'books',
  'business',
  'education',
  'entertainment',
  'fooddrink',
  'healthfitness',
  'lifestyle',
  'music',
  'news',
  

In [119]:
import csv
# For python 2, skip the "newline" argument: open('dict.csv','w")
with open('dict.csv', 'w', newline="") as csv_file:  
    writer = csv.writer(csv_file)
    for key, value in connections.items():
        writer.writerow([key, value])

In [118]:
connections = hasConnection(df_games,0)

In [103]:
df_games['SubGenre_0'].unique()

array(['puzzle', 'board', 'entertainment', 'casual', 'education',
       'action', 'card', 'finance', 'adventure', 'strategy', 'simulation',
       'roleplaying', 'casino', 'family', 'navigation', 'lifestyle',
       'sports', 'racing', 'reference', 'medical', 'music',
       'socialnetworking', 'productivity', 'fooddrink', 'healthfitness',
       'news', 'photovideo', 'business', 'books'], dtype=object)

In [9]:
# import hvplot.pandas
# df_games.hvplot(width=2000, height=500, y='SubGenre_2', x='d',kind='line')

In [79]:
import hvplot.pandas
df_games.hvplot(width=2000, height=500, y='SubGenre_2', x='Current Version Release Date',kind='line')

In [234]:
# # Create the bag of words feature matrix

# newList = list()
# for array in df_games['GenresList'].to_list():
#     for subGenre in array:
#         newList.append(subGenre)
        
# count = CountVectorizer()
# bag_of_words = count.fit_transform(newList)

# # Show feature matrix
# bag_of_words.toarray()

# # Get feature names
# feature_names = count.get_feature_names()


# # View feature names
# feature_names

# df_genres = pd.DataFrame(bag_of_words.toarray(), columns=feature_names)

In [256]:
import pprint
import collections
pprint.pprint(uniqueWords(df=df_games, index='GenresList'))
collections.Counter([y for x in df_games[['SubGenre_0', 'SubGenre_1', 
                                          'SubGenre_2', 'SubGenre_3', 
                                          'SubGenre_4']].values.flatten() 
                     for y in x.split()])

{'action',
 'adventure',
 'board',
 'books',
 'business',
 'card',
 'casino',
 'casual',
 'education',
 'emojiexpressions',
 'entertainment',
 'family',
 'finance',
 'fooddrink',
 'gaming',
 'healthfitness',
 'lifestyle',
 'magazinesnewspapers',
 'medical',
 'music',
 'navigation',
 'news',
 'nosubgenre',
 'people',
 'photovideo',
 'productivity',
 'puzzle',
 'racing',
 'reference',
 'roleplaying',
 'shopping',
 'simulation',
 'socialnetworking',
 'sports',
 'stickers',
 'strategy',
 'travel',
 'trivia',
 'utilities',
 'weather',
 'word'}


Counter({'puzzle': 3849,
         'strategy': 16285,
         'nosubgenre': 40012,
         'board': 1663,
         'education': 729,
         'entertainment': 7793,
         'casual': 1673,
         'action': 1957,
         'card': 627,
         'simulation': 2068,
         'finance': 33,
         'word': 112,
         'roleplaying': 1096,
         'sports': 623,
         'adventure': 800,
         'family': 726,
         'travel': 92,
         'casino': 73,
         'business': 29,
         'navigation': 12,
         'lifestyle': 210,
         'socialnetworking': 117,
         'utilities': 100,
         'trivia': 257,
         'racing': 125,
         'reference': 52,
         'music': 105,
         'books': 29,
         'healthfitness': 29,
         'medical': 5,
         'productivity': 47,
         'fooddrink': 53,
         'news': 13,
         'photovideo': 25,
         'gaming': 2,
         'people': 1,
         'stickers': 2,
         'shopping': 3,
         'emojiexpressions': 

In [12]:
# import hvplot.pandas
# df_games.hvplot(x='Average User Rating', 
#                 y=['Subcode_0', 'Subcode_1', 'Subcode_2', 'Subcode_3', 'Subcode_4'],
#                 width=800, 
#                 height=500)


In [105]:
# newList = [y for x in df_games[['Description']].values.flatten() for y in x.lower().split()]
# count = CountVectorizer()
# bag_of_words = count.fit_transform(newList)

# # Show feature matrix
# bag_of_words.toarray()

# # Get feature names
# feature_names = count.get_feature_names()


# # View feature names
# feature_names

In [107]:
# transpose
import collections

# collections.Counter([y for x in df_games[['Description']].values.flatten() for y in x.lower().split()])

In [62]:
# # sub set of games
# # df[df['Primary Genre'] == 'Games']
# for d in df[df['Genres'].apply(lambda x: 'Games' in x.split(',')[0])]['Genres'].unique(): 
#     print(d)

In [63]:
# df_games = df[df['Genres'].apply(lambda x: 'Games' in x.split(',')[0])]
# df_ = df_games.groupby(['Genres'])

In [64]:
# df_.first().T

In [65]:
# for d in df_games['Name'].unique():
#     print(d)

# for d in df_games[df_games['Price'] >= 3.00]['Name'].unique():
#     print(d)


In [66]:
# df['Primary Genre'].value_counts().plot(kind='barh', figsize=(5,5))

In [67]:
# df['Current Version Release Date'].value_counts().sort_index().plot(kind='line', figsize=(30,20))