## Profitable App Profiles for the Apple App Store

Author: Julian Moors\
Contact: julian.moors@outlook.com

### Introduction
_The goal of this project is to analyse App Store data to help developers understand which types of apps are likely to attract the most users._

In [1]:
# load the Apple App Store dataset and inspect its columns
import pandas as pd

# show every row when displaying a DataFrame or Series
pd.set_option('display.max_rows', None)

apple_df = pd.read_csv('data/apple-appstore.csv')
apple_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 7197 entries, 0 to 7196
Data columns (total 16 columns):
 #   Column            Non-Null Count  Dtype  
---  ------            --------------  -----  
 0   id                7197 non-null   int64  
 1   track_name        7197 non-null   object 
 2   size_bytes        7197 non-null   int64  
 3   currency          7197 non-null   object 
 4   price             7197 non-null   float64
 5   rating_count_tot  7197 non-null   int64  
 6   rating_count_ver  7197 non-null   int64  
 7   user_rating       7197 non-null   float64
 8   user_rating_ver   7197 non-null   float64
 9   ver               7197 non-null   object 
 10  cont_rating       7197 non-null   object 
 11  prime_genre       7197 non-null   object 
 12  sup_devices.num   7197 non-null   int64  
 13  ipadSc_urls.num   7197 non-null   int64  
 14  lang.num          7197 non-null   int64  
 15  vpp_lic           7197 non-null   int64  
dtypes: float64(3), int64(8), object(5)
memory 

### Data Cleaning

In [2]:
# drop rows with a duplicate app name, keeping the first occurrence
apple_df = apple_df.drop_duplicates(subset=['track_name'])

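# keep only apps whose names consist solely of ASCII letters and whitespace;
# this is a rough proxy for English and also drops names containing digits or punctuation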
apple_df = apple_df[apple_df['track_name'].str.contains(r'^[a-zA-Z\s]+$', na=False)]

# show the top 20 free apps by rating count
apple_df[apple_df['price'] == 0.00].sort_values(by='rating_count_tot', ascending=False).head(20)

index,id,track_name,size_bytes,currency,price,rating_count_tot,rating_count_ver,user_rating,user_rating_ver,ver,cont_rating,prime_genre,sup_devices.num,ipadSc_urls.num,lang.num,vpp_lic
0,284882215,Facebook,389879808,USD,0.0,2974676,212,3.5,3.5,95.0,4+,Social Networking,37,1,29,1
1,389801252,Instagram,113954816,USD,0.0,2161558,1289,4.5,4.0,10.23,12+,Photo & Video,37,0,29,1
2,529479190,Clash of Clans,116476928,USD,0.0,2130805,579,4.5,4.5,9.24.12,9+,Games,38,5,18,1
3,420009108,Temple Run,65921024,USD,0.0,1724546,3842,4.5,4.0,1.6.2,9+,Games,40,5,1,1
5,429047995,Pinterest,74778624,USD,0.0,1061624,1814,4.5,4.0,6.26,12+,Social Networking,37,5,27,1
6,282935706,Bible,92774400,USD,0.0,985920,5320,4.5,5.0,7.5.1,4+,Reference,37,5,45,1
7,553834731,Candy Crush Saga,222846976,USD,0.0,961794,2453,4.5,4.5,1.101.0,4+,Games,43,5,24,1
8,324684580,Spotify Music,132510720,USD,0.0,878563,8253,4.5,4.5,8.4.3,12+,Music,37,5,18,1
9,343200656,Angry Birds,175966208,USD,0.0,824451,107,4.5,3.0,7.4.0,4+,Games,38,0,10,1
10,512939461,Subway Surfers,156038144,USD,0.0,706110,97,4.5,4.0,1.72.1,9+,Games,38,5,1,1
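
The letters-and-whitespace filter used in this cell is strict: any name containing a digit, hyphen, or ampersand is removed along with the genuinely non-English ones. A looser heuristic, sketched below, counts non-ASCII characters instead. The function name `is_probably_english`, the threshold of three characters, and the sample names are all assumptions made for illustration; none of them come from the dataset.

```python
def is_probably_english(name: str, max_non_ascii: int = 3) -> bool:
    """Return True if `name` contains at most `max_non_ascii` non-ASCII characters."""
    non_ascii = sum(1 for ch in name if ord(ch) > 127)
    return non_ascii <= max_non_ascii


# Hypothetical app names, made up for this example
names = ["Facebook", "Photo Editor 2", "Music & Radio", "日本語アプリ"]

kept = [n for n in names if is_probably_english(n)]
print(kept)
```

With a pandas column, the same check could be applied as `apple_df[apple_df['track_name'].apply(is_probably_english)]`. The threshold is a trade-off: a small allowance keeps names with an emoji or trademark symbol, at the cost of letting through a few very short non-English names.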

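`drop_duplicates` keeps whichever duplicate row happens to appear first in the file. If the entry with the most ratings is the one worth keeping, one possible approach is to sort by rating count before dropping duplicates. The sketch below uses a small made-up DataFrame (the names `toy` and `deduped` and all values are hypothetical) and is not necessarily how the notebook's author would handle it.

```python
import pandas as pd

# Hypothetical toy data: two rows share the same track_name
toy = pd.DataFrame({
    "track_name": ["Alpha", "Alpha", "Beta"],
    "rating_count_tot": [10, 250, 40],
})

deduped = (
    toy.sort_values("rating_count_tot", ascending=False)    # most-rated rows first
       .drop_duplicates(subset=["track_name"], keep="first")  # keep the most-rated duplicate
       .sort_index()                                         # restore the original row order
)
print(deduped)
```
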

### Data Analysis

In [3]:
# from the top 1000 free apps by rating count, count apps per genre and show the 21 most common genres
top_free = apple_df[apple_df['price'] == 0.00].sort_values(by='rating_count_tot', ascending=False).head(1000)
top_free['prime_genre'].value_counts().head(21)

prime_genre
Games                629
Entertainment         78
Social Networking     38
Education             35
Photo & Video         31
Music                 22
Utilities             19
Sports                19
Shopping              19
Travel                16
Productivity          16
News                  13
Health & Fitness      13
Lifestyle             12
Finance               10
Food & Drink           8
Business               7
Weather                5
Reference              4
Book                   3
Medical                2
Name: count, dtype: int64
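
Raw counts are easier to compare once expressed as shares of the sample. The sketch below converts the first three counts from the output above into percentages, assuming the sample contains exactly 1000 apps; the variable names `counts`, `top_n`, and `share_pct` are illustrative only.

```python
import pandas as pd

# Counts copied from the output above (top three genres only, for brevity)
counts = pd.Series(
    {"Games": 629, "Entertainment": 78, "Social Networking": 38},
    name="count",
)
top_n = 1000  # assumed size of the sample the counts were drawn from

# Share of the sample held by each genre, as a percentage
share_pct = (counts / top_n * 100).round(1)
print(share_pct)
```

On the full data, `value_counts(normalize=True)` on the `prime_genre` column would give the same proportions directly, without hard-coding the sample size.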