# Android and iOS mobile apps analysis

In this project, we aim to gain some insight into the mobile app market by comparing the apps offered in the stores of two fierce competitors, Google Play and the Apple App Store.

The goal of the project is to see which app store is the more profitable one to build an app for.

In [1]:
def explore_data(dataset, start, end, rows_and_columns=False):
    dataset_slice = dataset[start:end]    
    for row in dataset_slice:
        print(row)
        print('\n') # adds a new (empty) line after each row

    if rows_and_columns:
        print('Number of rows:', len(dataset))
        print('Number of columns:', len(dataset[0]))

In [2]:
from csv import reader

# the app names contain non-ASCII characters, so the encoding is set explicitly
opened_file = open('AppleStore.csv', encoding='utf8')
read_file = reader(opened_file)
appleStore_apps_data = list(read_file)
opened_file.close()

In [3]:
from csv import reader

# the app names contain non-ASCII characters, so the encoding is set explicitly
opened_file = open('googleplaystore.csv', encoding='utf8')
read_file = reader(opened_file)
googleStore_apps_data = list(read_file)
opened_file.close()
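
Both files are loaded with the same steps, so the loading could be wrapped in one helper. A minimal sketch, assuming the CSV files are UTF-8 encoded; `load_csv` is a hypothetical name and not part of the original notebook:

```python
from csv import reader

def load_csv(path, encoding='utf8'):
    """Return all rows of a CSV file as a list of lists (header included)."""
    # newline='' is what the csv module recommends for files passed to reader()
    with open(path, encoding=encoding, newline='') as opened_file:
        return list(reader(opened_file))
```

With such a helper, the two cells above would reduce to `load_csv('AppleStore.csv')` and `load_csv('googleplaystore.csv')`, and the file is closed even if reading fails.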

### Apple Store apps sample data

In [4]:
explore_data(appleStore_apps_data,1,5,True)

['284882215', 'Facebook', '389879808', 'USD', '0.0', '2974676', '212', '3.5', '3.5', '95.0', '4+', 'Social Networking', '37', '1', '29', '1']


['389801252', 'Instagram', '113954816', 'USD', '0.0', '2161558', '1289', '4.5', '4.0', '10.23', '12+', 'Photo & Video', '37', '0', '29', '1']


['529479190', 'Clash of Clans', '116476928', 'USD', '0.0', '2130805', '579', '4.5', '4.5', '9.24.12', '9+', 'Games', '38', '5', '18', '1']


['420009108', 'Temple Run', '65921024', 'USD', '0.0', '1724546', '3842', '4.5', '4.0', '1.6.2', '9+', 'Games', '40', '5', '1', '1']


Number of rows: 7198
Number of columns: 16


### Google Play Store apps sample data

In [5]:
explore_data(googleStore_apps_data,1,5,True)

['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


['Coloring book moana', 'ART_AND_DESIGN', '3.9', '967', '14M', '500,000+', 'Free', '0', 'Everyone', 'Art & Design;Pretend Play', 'January 15, 2018', '2.0.0', '4.0.3 and up']


['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up']


['Sketch - Draw & Paint', 'ART_AND_DESIGN', '4.5', '215644', '25M', '50,000,000+', 'Free', '0', 'Teen', 'Art & Design', 'June 8, 2018', 'Varies with device', '4.2 and up']


Number of rows: 10842
Number of columns: 13


### Column names for Apple Store data

The link below gives a clear definition of each column name:

[Source](https://www.kaggle.com/datasets/ramamet4/app-store-apple-data-set-10k-apps)

In [6]:
explore_data(appleStore_apps_data,0,1,True)


['id', 'track_name', 'size_bytes', 'currency', 'price', 'rating_count_tot', 'rating_count_ver', 'user_rating', 'user_rating_ver', 'ver', 'cont_rating', 'prime_genre', 'sup_devices.num', 'ipadSc_urls.num', 'lang.num', 'vpp_lic']


Number of rows: 7198
Number of columns: 16


### Column names for Google Play Store data

The link below gives a clear definition of each column name:

[Source](https://www.kaggle.com/datasets/lava18/google-play-store-apps)

In [7]:
explore_data(googleStore_apps_data,0,1,True)

['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver', 'Android Ver']


Number of rows: 10842
Number of columns: 13


### Finding an error in the data and deleting the row

In [8]:
googleStore_header = googleStore_apps_data[0]
for row in googleStore_apps_data[1:]:
    if len(row) !=len(googleStore_header):
        print(row)
        print(googleStore_apps_data.index(row))

['Life Made WI-Fi Touchscreen Photo Frame', '1.9', '19', '3.0M', '1,000+', 'Free', '0', 'Everyone', '', 'February 11, 2018', '1.0.19', '4.0 and up']
10473


In [9]:
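# 10473 is the index printed above; run this cell only once,
# otherwise a second, valid row is deleted
del googleStore_apps_data[10473]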
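
Deleting by a hard-coded index removes a different row if the cell is run a second time. A sketch of an alternative that keeps only the rows whose length matches the header; `drop_malformed_rows` is a hypothetical helper, not part of the original notebook, and the sample rows are shortened for illustration:

```python
def drop_malformed_rows(dataset):
    """Return (clean, dropped): rows matching the header length, and the rest."""
    header = dataset[0]
    clean = [header]
    dropped = []
    for row in dataset[1:]:
        if len(row) == len(header):
            clean.append(row)
        else:
            dropped.append(row)
    return clean, dropped

# shortened sample: the last row is missing its category value
sample = [
    ['App', 'Category', 'Rating'],
    ['Sketch - Draw & Paint', 'ART_AND_DESIGN', '4.5'],
    ['Life Made WI-Fi Touchscreen Photo Frame', '1.9'],
]
clean, dropped = drop_malformed_rows(sample)
print(len(clean), len(dropped))  # 2 1
```

Because the function returns a new list, it can be run any number of times with the same result.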

### Google Play Store data has duplicates

The code below counts repeated app names to flush out duplicates

In [10]:
duplicate_names =[]
unique_names = []

for app in googleStore_apps_data[1:]:
    name = app[0]
    if name in unique_names:
        duplicate_names.append(name)
    else:
        unique_names.append(name)
        
print('Number of duplicate apps : ',len(duplicate_names))
print('\n')
print('Sample of duplicate apps : ',duplicate_names[:10])

Number of duplicate apps :  1181


Sample of duplicate apps :  ['Quick PDF Scanner + OCR FREE', 'Box', 'Google My Business', 'ZOOM Cloud Meetings', 'join.me - Simple Meetings', 'Box', 'Zenefits', 'Google Ads', 'Google My Business', 'Slack']


These duplicates cannot simply be removed by deleting arbitrary rows and keeping one. Instead, we use the Reviews column: for each app we keep the entry with the highest number of reviews and remove the rest. The review count is the only difference among the duplicates, so the highest count identifies the most recent entry for that app.

### Removing duplicates
The code below prepares the removal of duplicates: we first find the highest review count for each app name.

In [11]:
reviews_max = {}
for row in googleStore_apps_data[1:]:
    name = row[0]
    n_reviews = float(row[3])
    if name in reviews_max and reviews_max[name] < n_reviews:
        reviews_max[name] = n_reviews
    elif name not in reviews_max:
        reviews_max[name] = n_reviews
        
print('Row length : ',len(reviews_max))
        


Row length :  9659


After finding the highest review count for each app, we use it to select the row to keep and put that row in a new, clean list.
We loop through the whole dataset and check whether the current row's review count equals the maximum stored in the dictionary above; if so, and the app has not been added yet, we add the row to the clean list.


In [12]:
android_clean = []
already_added = []
for row in googleStore_apps_data[1:]:
    name = row[0]
    n_reviews = float(row[3])
    # already_added prevents adding an app twice when duplicates tie on reviews
    if (n_reviews == reviews_max[name]) and (name not in already_added):
        android_clean.append(row)
        already_added.append(name)
            

print('Row length of cleaned data : ',len(android_clean))

    


Row length of cleaned data :  9659
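
The two passes above can also be folded into one by keeping, for each name, the row with the highest review count seen so far. A sketch under the same assumptions as the cells above (name in column 0, review count in column 3); `dedupe_by_max_reviews` is a hypothetical name and the sample review counts are made up for illustration:

```python
def dedupe_by_max_reviews(rows, name_index=0, reviews_index=3):
    """Keep one row per app name: the first row with the highest review count."""
    best = {}  # name -> row with the highest review count seen so far
    for row in rows:
        name = row[name_index]
        n_reviews = float(row[reviews_index])
        # strict '>' keeps the earliest row when review counts tie
        if name not in best or n_reviews > float(best[name][reviews_index]):
            best[name] = row
    return list(best.values())

# made-up review counts, for illustration only
sample = [
    ['Box', 'BUSINESS', '4.2', '159872'],
    ['Slack', 'BUSINESS', '4.4', '51507'],
    ['Box', 'BUSINESS', '4.2', '159900'],
]
for row in dedupe_by_max_reviews(sample):
    print(row[0], row[3])
```

The dictionary lookup replaces the `name not in already_added` list scan, so the work grows linearly with the number of rows. The result is ordered by the first appearance of each name, which can differ from the row order produced by the cells above.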


### Filter out non-English apps
The function below is meant to detect non-English (non-ASCII) characters in a string; we use it further down to filter out apps with non-English names.

In [13]:
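def isEnglishString(string):
    for char in string:
        if ord(char) > 127:  # code points above 127 are outside ASCII
            return False
        return True  # inside the loop, so only the first character is checked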
    
isEnglishString('Instagram')

True

In [14]:
isEnglishString('爱奇艺PPS -《欢乐颂2》电视剧热播')


False

In [15]:
isEnglishString('Docs To Go™ Free Office Suite')


True

In [16]:
isEnglishString('Instachat 😜')

True
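
Because `return True` sits inside the loop in `isEnglishString`, the function returns after inspecting the first character, which is why the last two names above pass. A sketch of a variant that scans the whole string and tolerates a few non-ASCII characters such as ™ or an emoji; the name `is_english` and the threshold of three are assumptions, not values taken from this notebook:

```python
def is_english(string, max_non_ascii=3):
    """Return True if the string has at most max_non_ascii non-ASCII characters."""
    non_ascii = 0
    for char in string:
        if ord(char) > 127:
            non_ascii += 1
            if non_ascii > max_non_ascii:
                return False
    return True

for name in ['Instagram', '爱奇艺PPS -《欢乐颂2》电视剧热播',
             'Docs To Go™ Free Office Suite', 'Instachat 😜']:
    print(name, is_english(name))
```

Swapping this in for `isEnglishString` would change the row counts reported below, so the figures in the rest of the notebook would need to be regenerated.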

Using the function above, we check the name column of each dataset to filter out non-English apps

In [17]:
android_clean_v2 = []
for row in android_clean:
        name = row[0]
        if isEnglishString(name):
            android_clean_v2.append(row)
            
ios_clean = []
for row in appleStore_apps_data[1:]:
        name = row[1]
        if isEnglishString(name):
            ios_clean.append(row)

print('Android rows after filtering : ',len(android_clean_v2))
print('iOS rows after filtering : ',len(ios_clean))


Android rows after filtering :  9623
iOS rows after filtering :  6273


In [18]:
explore_data(android_clean_v2,0,5,True)

['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up']


['Sketch - Draw & Paint', 'ART_AND_DESIGN', '4.5', '215644', '25M', '50,000,000+', 'Free', '0', 'Teen', 'Art & Design', 'June 8, 2018', 'Varies with device', '4.2 and up']


['Pixel Draw - Number Art Coloring Book', 'ART_AND_DESIGN', '4.3', '967', '2.8M', '100,000+', 'Free', '0', 'Everyone', 'Art & Design;Creativity', 'June 20, 2018', '1.1', '4.4 and up']


['Paper flowers instructions', 'ART_AND_DESIGN', '4.4', '167', '5.6M', '50,000+', 'Free', '0', 'Everyone', 'Art & Design', 'March 26, 2017', '1.0', '2.3 and up']


Number of rows: 9623
Number of columns: 13


In [19]:
explore_data(ios_clean,0,5,True)

['284882215', 'Facebook', '389879808', 'USD', '0.0', '2974676', '212', '3.5', '3.5', '95.0', '4+', 'Social Networking', '37', '1', '29', '1']


['389801252', 'Instagram', '113954816', 'USD', '0.0', '2161558', '1289', '4.5', '4.0', '10.23', '12+', 'Photo & Video', '37', '0', '29', '1']


['529479190', 'Clash of Clans', '116476928', 'USD', '0.0', '2130805', '579', '4.5', '4.5', '9.24.12', '9+', 'Games', '38', '5', '18', '1']


['420009108', 'Temple Run', '65921024', 'USD', '0.0', '1724546', '3842', '4.5', '4.0', '1.6.2', '9+', 'Games', '40', '5', '1', '1']


['284035177', 'Pandora - Music & Radio', '130242560', 'USD', '0.0', '1126879', '3594', '4.0', '4.5', '8.4.1', '12+', 'Music', '37', '4', '1', '1']


Number of rows: 6273
Number of columns: 16


### Isolating free apps
In the steps below we create lists containing only the free apps

In [20]:
android_free = []
for row in android_clean_v2:
        price = row[7]
        if price == '0':
            android_free.append(row)
            
ios_free = []
for row in ios_clean:
        price = row[4]
        if price == '0.0':
            ios_free.append(row)
            
print('Row length of cleaned data : ',len(android_free))
print('Row length of cleaned data : ',len(ios_free))

Row length of cleaned data :  8873
Row length of cleaned data :  3300
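
The two stores encode a zero price differently ('0' versus '0.0'), so the string comparisons above are store-specific. A sketch of a numeric check that works for both; `is_free` is a hypothetical helper, and stripping a leading '$' assumes that paid Google Play prices are written like '$4.99':

```python
def is_free(price_text):
    """Return True when a price string represents zero."""
    # assumption: paid prices may carry a leading '$', e.g. '$4.99'
    cleaned = price_text.strip().lstrip('$')
    return float(cleaned) == 0.0

print(is_free('0'), is_free('0.0'), is_free('$4.99'))  # True True False
```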


In [21]:
def exploreDataState(data1,data2):
    explore_data(data1,0,5,True)
    print(' ')
    explore_data(data2,0,5,True)

In [22]:
exploreDataState(android_free,ios_free)

['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up']


['Sketch - Draw & Paint', 'ART_AND_DESIGN', '4.5', '215644', '25M', '50,000,000+', 'Free', '0', 'Teen', 'Art & Design', 'June 8, 2018', 'Varies with device', '4.2 and up']


['Pixel Draw - Number Art Coloring Book', 'ART_AND_DESIGN', '4.3', '967', '2.8M', '100,000+', 'Free', '0', 'Everyone', 'Art & Design;Creativity', 'June 20, 2018', '1.1', '4.4 and up']


['Paper flowers instructions', 'ART_AND_DESIGN', '4.4', '167', '5.6M', '50,000+', 'Free', '0', 'Everyone', 'Art & Design', 'March 26, 2017', '1.0', '2.3 and up']


Number of rows: 8873
Number of columns: 13
 
['284882215', 'Facebook', '389879808', 'USD', 

### App analysis

After all the cleaning, we are ready to analyze the data and look for an app profile that can do well in both stores. We will validate the app idea in stages:
* Build a minimal version of the app and add it to Google Play
* Monitor the response; if it does well, develop it further
* If the app becomes profitable, build an iOS version for the App Store
    
Our end goal is an app that succeeds in both stores, so we need to investigate the successful apps in our datasets and look for the features they share.
We start with the columns that both stores have in common, together with the reviews and ratings of those apps; this gives us a feel for which categories to focus on if we want to create winning apps.

    

In [23]:
def freq_table(dataset,index):
    counting = {}
    total_number_of_data = len(dataset)
    
    for row in dataset:
        item = row[index]
        if item in counting:
            counting[item] += 1
        else:
            counting[item] = 1
            
    for iteration_variable in counting:
        counting[iteration_variable] /= total_number_of_data
        counting[iteration_variable] *= 100
    
    return counting

In [24]:
def display_table(dataset, index):
    table = freq_table(dataset, index)
    table_display = []
    for key in table:
        key_val_as_tuple = (table[key], key)
        table_display.append(key_val_as_tuple)

    table_sorted = sorted(table_display, reverse = True)
    for entry in table_sorted:
        print(entry[1], ':', entry[0])
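
The same percentage table can be built with `collections.Counter` from the standard library. A sketch, with `freq_table_counter` as a hypothetical name; it returns (value, percentage) pairs already sorted from most to least frequent, so no separate sorting step is needed:

```python
from collections import Counter

def freq_table_counter(dataset, index):
    """Return [(value, percentage), ...] sorted by descending percentage."""
    counts = Counter(row[index] for row in dataset)
    total = len(dataset)
    return [(value, count / total * 100) for value, count in counts.most_common()]

# toy rows: (name, genre)
sample = [['a', 'Games'], ['b', 'Games'], ['c', 'Music'], ['d', 'Games']]
for value, pct in freq_table_counter(sample, 1):
    print(value, ':', pct)
# Games : 75.0
# Music : 25.0
```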

### iOS Prime Genre

In [36]:
display_table(ios_free,11)

Games : 57.484848484848484
Entertainment : 7.96969696969697
Photo & Video : 4.878787878787879
Education : 3.5757575757575757
Social Networking : 3.3939393939393945
Utilities : 2.696969696969697
Shopping : 2.5757575757575757
Sports : 2.1212121212121215
Health & Fitness : 2.0606060606060606
Music : 2.0303030303030303
Productivity : 1.7575757575757573
Lifestyle : 1.6969696969696972
News : 1.3636363636363635
Travel : 1.1818181818181819
Finance : 1.1818181818181819
Weather : 0.8787878787878787
Food & Drink : 0.8787878787878787
Book : 0.5757575757575757
Reference : 0.5151515151515151
Business : 0.5151515151515151
Navigation : 0.27272727272727276
Medical : 0.21212121212121215
Catalogs : 0.18181818181818182


##### Analysis

From the iOS prime genres above, we can see the following:
* The most common genre is Games (about 57% of the free apps), followed by Entertainment (about 8%)
* Every other genre accounts for less than 5% of the apps
* Overall, apps built for fun (games, entertainment, photo and video) outnumber apps built for practical purposes

Games is by far the most dominant genre and looks like an easy pick for our ideal app for the iOS App Store

### Android Category

In [27]:
display_table(android_free,1)

FAMILY : 18.95638453736053
GAME : 9.703595176377776
TOOLS : 8.441338893271723
BUSINESS : 4.59821931702919
LIFESTYLE : 3.933280739321537
PRODUCTIVITY : 3.899470303166911
FINANCE : 3.685337540854277
MEDICAL : 3.5275555054660206
SPORTS : 3.3923137608475153
PERSONALIZATION : 3.3246928885382623
COMMUNICATION : 3.234531725459258
HEALTH_AND_FITNESS : 3.076749690071002
PHOTOGRAPHY : 2.9527780908373717
NEWS_AND_MAGAZINES : 2.817536346218866
SOCIAL : 2.6484841654457343
TRAVEL_AND_LOCAL : 2.332920094669221
SHOPPING : 2.231488786205342
BOOKS_AND_REFERENCE : 2.1751380592809646
DATING : 1.8595739885044518
VIDEO_PLAYERS : 1.791953116195199
MAPS_AND_NAVIGATION : 1.3974980277245577
FOOD_AND_DRINK : 1.2284458469514257
EDUCATION : 1.1720951200270482
ENTERTAINMENT : 0.946692212329539
LIBRARIES_AND_DEMO : 0.9354220669446636
AUTO_AND_VEHICLES : 0.9241519215597882
HOUSE_AND_HOME : 0.8114504677110335
WEATHER : 0.800180322326158
EVENTS : 0.7100191592471543
PARENTING : 0.6536684323227769
ART_AND_DESIGN : 0.6423

### Android Genres

In [34]:
display_table(android_free,9)

Tools : 8.430068747886848
Entertainment : 6.063338217063
Education : 5.375859348585597
Business : 4.59821931702919
Lifestyle : 3.922010593936662
Productivity : 3.899470303166911
Finance : 3.685337540854277
Medical : 3.5275555054660206
Sports : 3.459934633156768
Personalization : 3.3246928885382623
Communication : 3.234531725459258
Action : 3.0880198354558774
Health & Fitness : 3.076749690071002
Photography : 2.9527780908373717
News & Magazines : 2.817536346218866
Social : 2.6484841654457343
Travel & Local : 2.3216499492843456
Shopping : 2.231488786205342
Books & Reference : 2.1751380592809646
Simulation : 2.06243660543221
Dating : 1.8595739885044518
Arcade : 1.8370336977347006
Video Players & Editors : 1.7694128254254478
Casual : 1.7581426800405726
Maps & Navigation : 1.3974980277245577
Food & Drink : 1.2284458469514257
Puzzle : 1.1270145384875465
Racing : 0.9917727938690409
Role Playing : 0.9354220669446636
Libraries & Demo : 0.9354220669446636
Auto & Vehicles : 0.9241519215597882
Str

##### Analysis

From the Android genres and categories above, we can see the following:
* The most common genres are Tools (about 8.4%), followed by Entertainment (about 6.1%)
* FAMILY (about 19%) and GAME (about 9.7%) are the dominant categories
* Most other categories differ little from one another
* Unlike the iOS prime genres, Android has no single genre far ahead of the rest, although games still hold a major share, as they do on iOS

Combining Android and iOS, we can tentatively favor an Entertainment and Games
app profile, as these genres are the most common in both stores

#### Looking at the number of installs
One way to find the most popular apps is to look at the number of installs, which gives a feel for how many people are downloading an app.

##### iOS workaround
The iOS data has no installs column, so as a workaround the user ratings of the apps are used instead

In [38]:
ios_freq_genre = freq_table(ios_free,11)

In [67]:
for genre in ios_freq_genre:
    total = 0
    len_genre = 0
    for row in ios_free:
        genre_app = row[11]
        if genre_app == genre:
            # index 8 is user_rating_ver, the average rating of the current version
            user_ratings = float(row[8])
            total += user_ratings
            len_genre += 1
    average_user = total / len_genre
    
    print(genre, average_user)


Social Networking 2.8705357142857144
Photo & Video 3.3633540372670807
Games 3.8610964681075384
Music 3.9477611940298507
Reference 4.088235294117647
Health & Fitness 3.514705882352941
Weather 2.913793103448276
Utilities 3.106741573033708
Travel 2.8333333333333335
Shopping 3.3941176470588235
News 2.5444444444444443
Navigation 1.5
Lifestyle 2.7410714285714284
Entertainment 3.216730038022814
Food & Drink 2.913793103448276
Sports 2.642857142857143
Book 2.789473684210526
Finance 2.6794871794871793
Education 3.110169491525424
Productivity 3.8879310344827585
Business 3.0588235294117645
Catalogs 2.6666666666666665
Medical 3.5


From the above, the Reference prime genre has the highest average user rating (about 4.09), which we use here as a stand-in for users/installs, thus we can recommend it
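
The cell above averages `user_rating_ver` (column index 8), which measures how well apps are rated rather than how many people use them. The header also lists `rating_count_tot` (index 5), the total number of ratings, which may be a closer stand-in for installs. A single-pass sketch that averages any numeric column per group; `average_by_group` is a hypothetical helper, and the toy rows hold only a genre and a rating count taken from the sample rows shown earlier:

```python
from collections import defaultdict

def average_by_group(rows, group_index, value_index):
    """Return {group: mean of the numeric column}, computed in one pass."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for row in rows:
        group = row[group_index]
        totals[group] += float(row[value_index])
        counts[group] += 1
    return {group: totals[group] / counts[group] for group in totals}

# toy rows shaped as (prime_genre, rating_count_tot)
sample = [
    ['Games', '2130805'],
    ['Games', '1724546'],
    ['Music', '1126879'],
]
print(average_by_group(sample, 0, 1))
```

On the notebook's data the call would be `average_by_group(ios_free, 11, 5)`; its results are not shown here because they would have to be computed from the full dataset.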

#### Android installs

In [66]:
android_freq_category = freq_table(android_free,1)

for category in android_freq_category:
    total = 0
    len_category = 0
    for row in android_free:
        category_app = row[1]
        if category_app == category:
            installs = row[5]
            # strip '+' and ',' so that '10,000+' becomes 10000.0
            total += float(installs.replace('+','').replace(',',''))
            len_category += 1
            
    # guard against division by zero for an empty category
    if len_category == 0:
        average_installs = 0
    else:
        average_installs = total / len_category
    
    print(category, average_installs)


ART_AND_DESIGN 1986335.0877192982
AUTO_AND_VEHICLES 647317.8170731707
BEAUTY 513151.88679245283
BOOKS_AND_REFERENCE 8631794.093264248
BUSINESS 1708215.906862745
COMICS 828700.9433962264
COMMUNICATION 38456119.167247385
DATING 854028.8303030303
EDUCATION 1825480.7692307692
ENTERTAINMENT 11767380.952380951
EVENTS 253542.22222222222
FINANCE 1361355.1437308867
FOOD_AND_DRINK 1942465.605504587
HEALTH_AND_FITNESS 4188821.9853479853
HOUSE_AND_HOME 1348645.2916666667
LIBRARIES_AND_DEMO 638503.734939759
LIFESTYLE 1439955.3839541548
GAME 15547984.262485482
FAMILY 3682025.3810939356
MEDICAL 120550.61980830671
SOCIAL 23348348.519148935
SHOPPING 7072366.590909091
PHOTOGRAPHY 17772018.759541985
SPORTS 3638640.1428571427
TRAVEL_AND_LOCAL 13984077.710144928
TOOLS 10815793.690253671
PERSONALIZATION 5183850.806779661
PRODUCTIVITY 16738957.554913295
PARENTING 542603.6206896552
WEATHER 5074486.197183099
VIDEO_PLAYERS 24727872.452830188
NEWS_AND_MAGAZINES 9472829.04
MAPS_AND_NAVIGATION 4009361.209677419


From the above, PHOTOGRAPHY is among the categories with the most installs (about 17.8 million on average; COMMUNICATION, VIDEO_PLAYERS and SOCIAL rank higher), thus we can recommend it
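
With more than thirty categories, the top of the list is easier to read once it is sorted. A sketch that ranks a dictionary of averages; `top_n` is a hypothetical helper, and the values are a few of the averages printed above, rounded to whole installs:

```python
def top_n(averages, n=3):
    """Return the n (name, value) pairs with the largest values."""
    return sorted(averages.items(), key=lambda item: item[1], reverse=True)[:n]

# a few of the averages printed above, rounded to whole installs
averages = {
    'GAME': 15547984,
    'PHOTOGRAPHY': 17772019,
    'SOCIAL': 23348349,
    'VIDEO_PLAYERS': 24727872,
    'COMMUNICATION': 38456119,
}
for name, value in top_n(averages):
    print(name, value)
# COMMUNICATION 38456119
# VIDEO_PLAYERS 24727872
# SOCIAL 23348349
```

To use it on the full results, the cell above would need to store each `average_installs` in a dictionary keyed by category instead of only printing it.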

# Conclusion
Based on the users/installs figures, the iOS data shows Reference as the top prime genre on the user-rating workaround, while the Android data shows Photography among the categories with the most installs. A photo-manipulation app that blends entertainment with gamification could therefore be the ideal app to break into both markets.