# Analysis of Android and iOS Mobile Apps

Our aim in this project is to find mobile app profiles that are profitable for the App Store and Google Play markets. We're working as data analysts for a company that builds Android and iOS mobile apps, and our job is to enable our team of developers to make data-driven decisions with respect to the kind of apps they build.

At our company, we only build apps that are free to download and install, and our main source of revenue consists of in-app ads. This means that our revenue for any given app is mostly influenced by the number of users that use our app. **Our goal for this project is to analyze data to help our developers understand what kinds of apps are likely to attract more users.**

In [1]:
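# Import both files and convert them into lists of lists
from csv import reader

# Use `with` so each file is closed as soon as it has been read
with open("AppleStore.csv", encoding='utf8') as opened_file:
    applestore_data = list(reader(opened_file))
ios_header = applestore_data[0]
ios_data = applestore_data[1:]

with open("googleplaystore.csv", encoding='utf8') as opened_file:
    googleplaystore_data = list(reader(opened_file))
android_header = googleplaystore_data[0]
android_data = googleplaystore_data[1:]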

print(len(applestore_data))
print(len(googleplaystore_data))

7198
10842


In [2]:
# Improve readability by printing a blank line between rows
def explore_data(dataset, start, end, rows_and_columns=False):
    dataset_slice = dataset[start:end]    
    for row in dataset_slice:
        print(row)
        print('\n') # adds a new (empty) line after each row

    if rows_and_columns:
        print('Number of rows:', len(dataset))
        print('Number of columns:', len(dataset[0]))

#ios
print(ios_header)
print('\n')
explore_data(ios_data,0,3)

['id', 'track_name', 'size_bytes', 'currency', 'price', 'rating_count_tot', 'rating_count_ver', 'user_rating', 'user_rating_ver', 'ver', 'cont_rating', 'prime_genre', 'sup_devices.num', 'ipadSc_urls.num', 'lang.num', 'vpp_lic']


['284882215', 'Facebook', '389879808', 'USD', '0.0', '2974676', '212', '3.5', '3.5', '95.0', '4+', 'Social Networking', '37', '1', '29', '1']


['389801252', 'Instagram', '113954816', 'USD', '0.0', '2161558', '1289', '4.5', '4.0', '10.23', '12+', 'Photo & Video', '37', '0', '29', '1']


['529479190', 'Clash of Clans', '116476928', 'USD', '0.0', '2130805', '579', '4.5', '4.5', '9.24.12', '9+', 'Games', '38', '5', '18', '1']




In [3]:
#android
print(android_header)
print('\n')
explore_data(android_data,0,3)

['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver', 'Android Ver']


['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


['Coloring book moana', 'ART_AND_DESIGN', '3.9', '967', '14M', '500,000+', 'Free', '0', 'Everyone', 'Art & Design;Pretend Play', 'January 15, 2018', '2.0.0', '4.0.3 and up']


['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up']




**Our goal is to find which categories are popular among free apps, so we expect the `user_rating` and `prime_genre` columns to be important.**

| column_name | meaning |
| :-: | :-: |
| user_rating | Average user rating value (for all versions) |
| prime_genre | Primary genre |

## Deleting Wrong Data
**Discussion**

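The entry at index 10472 is missing a value, so all of the following columns are shifted one position to the left. (The discussion describes the missing value as 'Rating'; the row printed in the next cell suggests it is actually 'Category', since '1.9' sits in the 'Category' position.)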
10472 Life Made WI-Fi Touchscreen Photo Frame 1.9 19.0 3.0M 1,000+ Free 0 Everyone NaN February 11, 2018 1.0.19 4.0 and up NaN

In [4]:
# The malformed row is at index 10472
print(android_header)
print('\n')
print(android_data[10472])

['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver', 'Android Ver']


['Life Made WI-Fi Touchscreen Photo Frame', '1.9', '19', '3.0M', '1,000+', 'Free', '0', 'Everyone', '', 'February 11, 2018', '1.0.19', '4.0 and up']


In [5]:
# Remove the malformed row (run this cell only once, or a valid row will be deleted)
del android_data[10472]

In [6]:
print(android_data[10472])

['osmino Wi-Fi: free WiFi', 'TOOLS', '4.2', '134203', '4.1M', '10,000,000+', 'Free', '0', 'Everyone', 'Tools', 'August 7, 2018', '6.06.14', '4.4 and up']
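
The malformed row above was located through its known index. As a rough sketch (not part of the original notebook), rows like it could also be found programmatically by comparing each row's length with the header's length. The helper name `find_malformed_rows` and the sample data are made up for illustration.

```python
def find_malformed_rows(header, rows):
    """Return (index, row) pairs whose column count differs from the header's."""
    expected = len(header)
    return [(i, row) for i, row in enumerate(rows) if len(row) != expected]


# Made-up sample data; the second row is missing its 'Category' value.
sample_header = ['App', 'Category', 'Rating']
sample_rows = [
    ['App A', 'TOOLS', '4.1'],
    ['App B', '1.9'],
    ['App C', 'GAME', '3.9'],
]

malformed = find_malformed_rows(sample_header, sample_rows)
print(malformed)  # [(1, ['App B', '1.9'])]
```

With the notebook's data this would be called as `find_malformed_rows(android_header, android_data)`. If several rows had to be deleted by index, deleting from the highest index down would keep the remaining indices valid.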


## Removing Duplicate Entries: Part One

If we explore the Google Play data set long enough, we'll find that some apps have more than one entry. For instance, the application Instagram has four entries:

In [7]:
for row in android_data:
    app_name=row[0]
    if app_name == 'Instagram':
        print(row)

['Instagram', 'SOCIAL', '4.5', '66577313', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device']
['Instagram', 'SOCIAL', '4.5', '66577446', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device']
['Instagram', 'SOCIAL', '4.5', '66577313', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device']
['Instagram', 'SOCIAL', '4.5', '66509917', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device']


#### The example above shows four entries for the same app, so we need to remove the duplicate entries

In [8]:
duplicate_apps=[]
unique_apps=[]

for row in android_data:
    app_name=row[0]
    if app_name in unique_apps:
        duplicate_apps.append(app_name)
    else:
        unique_apps.append(app_name)
print(len(duplicate_apps))
duplicate_apps[0:5]

1181


['Quick PDF Scanner + OCR FREE',
 'Box',
 'Google My Business',
 'ZOOM Cloud Meetings',
 'join.me - Simple Meetings']

In the Instagram rows printed above, the main difference is in the fourth position of each row, which corresponds to the number of reviews. The different numbers show that the data was collected at different times. Rather than removing duplicates randomly, we'll keep only the row with the highest number of reviews for any given app and remove the other entries, because the higher the number of reviews, the more recent the data should be.

## Removing Duplicate Entries: Part Two

In [9]:
# Create a dictionary where each key is a unique app name and the corresponding value is the highest number of reviews of that app.
reviews_max={}
for row in android_data:
    app_name=row[0]
    n_reviews=float(row[3])
    if app_name in reviews_max and (reviews_max[app_name]<n_reviews):
        reviews_max[app_name]=n_reviews
    elif app_name not in reviews_max:
        reviews_max[app_name]=n_reviews
len(reviews_max)

9659

We don't use a plain `else` clause here because it would also cover the case where `app_name in reviews_max and reviews_max[app_name] >= n_reviews`.

In that case the stored value would be overwritten by a smaller (or equal) review count, so the dictionary would no longer hold the maximum for that app.
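
As a minimal sketch of the same idea, the two branches can be folded into a single `max()` call, which makes it harder to overwrite a stored maximum by accident. The function name `highest_review_counts`, its parameters, and the 'Example App' row are assumptions made for illustration; the Instagram review counts are taken from the output shown earlier.

```python
def highest_review_counts(rows, name_index=0, reviews_index=3):
    """Map each app name to the largest review count seen for it."""
    reviews_max = {}
    for row in rows:
        name = row[name_index]
        n_reviews = float(row[reviews_index])
        # dict.get falls back to n_reviews when the name is new, so max()
        # never replaces a stored value with a smaller one.
        reviews_max[name] = max(reviews_max.get(name, n_reviews), n_reviews)
    return reviews_max


sample = [
    ['Instagram', 'SOCIAL', '4.5', '66577313'],
    ['Instagram', 'SOCIAL', '4.5', '66577446'],
    ['Instagram', 'SOCIAL', '4.5', '66509917'],
    ['Example App', 'TOOLS', '4.0', '10'],  # made-up row
]

result = highest_review_counts(sample)
print(result)  # {'Instagram': 66577446.0, 'Example App': 10.0}
```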

In [10]:
# Use the dictionary you created above to remove the duplicate rows:
android_clean=[]
already_added=[]
for row in android_data:
    app_name=row[0]
    n_reviews=float(row[3])
    if n_reviews==reviews_max[app_name] and app_name not in already_added:
        android_clean.append(row)
        already_added.append(app_name)
len(android_clean)
        


9659

Besides matching the maximum review count, we also check that the name of the app is not already in the `already_added` list. We need this supplementary condition to account for those cases where the highest number of reviews of a duplicate app is the same for more than one entry (for example, the Box app has three entries, and the number of reviews is the same). If we just check for `reviews_max[app_name] == n_reviews`, we'll still end up with duplicate entries for some apps. In other words, if two rows for the same app share the highest review count, both would be added.
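
One possible alternative, sketched here under the assumption that the row order of the result does not matter much, is to do the whole de-duplication in a single pass by storing the best row per app name. A strict `>` comparison handles the tie case described above without a separate `already_added` list. The function name `keep_most_reviewed` and the sample rows are hypothetical.

```python
def keep_most_reviewed(rows, name_index=0, reviews_index=3):
    """Return one row per app name: the first row with the highest review count."""
    best = {}  # app name -> (review count, row)
    for row in rows:
        name = row[name_index]
        n_reviews = float(row[reviews_index])
        # Strict '>' means a later row with an equal count never replaces
        # the stored one, so ties cannot produce duplicates.
        if name not in best or n_reviews > best[name][0]:
            best[name] = (n_reviews, row)
    return [row for _, row in best.values()]


# Made-up rows: 'A' has two different counts, 'B' has a tie.
sample = [
    ['A', 'X', '4.0', '10'],
    ['B', 'X', '4.0', '5'],
    ['A', 'X', '4.0', '12'],
    ['B', 'X', '4.0', '5'],
]

cleaned = keep_most_reviewed(sample)
print(len(cleaned))  # 2
```

Unlike the two-step version, this returns rows ordered by the first appearance of each app name (relying on dictionaries preserving insertion order in Python 3.7+), so the output order may differ from `android_clean`.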

## Removing Non-English Apps: Part One

We'd like to analyze only the apps that are directed toward an English-speaking audience. However, if we explore the data long enough, we'll find that both data sets have apps with names that suggest they are not directed toward an English-speaking audience.

In [11]:
def is_english(name):
    for character in name:
        if ord(character)>127:
            return False
    return True

In [12]:
# Test
print(is_english('Instagram'))
print(is_english('爱奇艺PPS -《欢乐颂2》电视剧热播'))
print(is_english('Docs To Go™ Free Office Suite'))
print(is_english('Instachat 😜'))

True
False
False
False


## Removing Non-English Apps: Part Two

The function above misclassifies English app names that contain emoji or symbols such as ™, because those characters fall outside the ASCII range (0 to 127). To limit this, the function below flags a name as non-English only when it contains more than three non-ASCII characters.

In [13]:
def englishdetect(name):
    non_english=0
    for character in name:
        if ord(character)>127:
            non_english+=1
    if non_english<=3:
        return True
    else:
        return False

In [14]:
print(englishdetect('爱奇艺PPS -《欢乐颂2》电视剧热播'))
print(englishdetect('Docs To Go™ Free Office Suite'))
print(englishdetect('Instachat 😜'))

False
True
True
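
The same check can be sketched more compactly with the threshold exposed as a parameter, which makes it easy to experiment with stricter or looser limits. The name `is_mostly_ascii` and the `max_non_ascii` parameter are assumptions for illustration. This remains a heuristic: names in other Latin-alphabet languages with few accented characters would still pass.

```python
def is_mostly_ascii(name, max_non_ascii=3):
    """Return True when `name` has at most `max_non_ascii` non-ASCII characters."""
    non_ascii = sum(1 for character in name if ord(character) > 127)
    return non_ascii <= max_non_ascii


print(is_mostly_ascii('Instachat 😜'))                   # True
print(is_mostly_ascii('Docs To Go™ Free Office Suite'))  # True
print(is_mostly_ascii('爱奇艺PPS -《欢乐颂2》电视剧热播'))   # False
print(is_mostly_ascii('Instachat 😜', max_non_ascii=0))  # False
```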


In [15]:
android_english=[]
for row in android_clean:
    name=row[0]    
    if englishdetect(name):
        android_english.append(row)
len(android_english)

ios_english=[]
for row in ios_data:
    name=row[1]    
    if englishdetect(name):
        ios_english.append(row)
        
explore_data(android_english,0,3,True)
print('\n')
explore_data(ios_english,0,3,True)

['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up']


['Sketch - Draw & Paint', 'ART_AND_DESIGN', '4.5', '215644', '25M', '50,000,000+', 'Free', '0', 'Teen', 'Art & Design', 'June 8, 2018', 'Varies with device', '4.2 and up']


Number of rows: 9614
Number of columns: 13


['284882215', 'Facebook', '389879808', 'USD', '0.0', '2974676', '212', '3.5', '3.5', '95.0', '4+', 'Social Networking', '37', '1', '29', '1']


['389801252', 'Instagram', '113954816', 'USD', '0.0', '2161558', '1289', '4.5', '4.0', '10.23', '12+', 'Photo & Video', '37', '0', '29', '1']


['529479190', 'Clash of Clans', '116476928', 'USD', '0.0', '2130805', '579', '4.5', '4.5', '9.24.12', '9+', 

## Isolating the Free Apps

As mentioned in the introduction, we only build apps that are free to download and install, and our main source of revenue consists of in-app ads. Our data sets contain both free and non-free apps, so we'll need to isolate only the free apps for our analysis.
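
The cell below filters by exact string comparison (`'0'` for Google Play, `'0.0'` for the App Store). As a hedged alternative, a small helper could parse the price as a number so that one check works for both formats. The helper name `is_free` is hypothetical, and the `'$4.99'` format for paid apps is an assumption about the data rather than something shown in this notebook.

```python
def is_free(price_text):
    """Return True when a price string represents zero.

    Assumes prices look like '0', '0.0', or '$4.99'; a leading '$' is stripped.
    """
    cleaned = price_text.strip().lstrip('$')
    try:
        return float(cleaned) == 0.0
    except ValueError:
        # Unparseable prices are treated as not free rather than guessed at.
        return False


print(is_free('0'), is_free('0.0'), is_free('$4.99'), is_free('Everyone'))
# True True False False
```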

In [16]:
android_free=[]
ios_free=[]
for row in android_english:
    price=row[7]
    if price=="0":
        android_free.append(row)
for row in ios_english:
    price=row[4]
    if price=='0.0':
        ios_free.append(row)
        
explore_data(android_free,0,3,True)
print('\n')
explore_data(ios_free,0,3,True)

['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up']


['Sketch - Draw & Paint', 'ART_AND_DESIGN', '4.5', '215644', '25M', '50,000,000+', 'Free', '0', 'Teen', 'Art & Design', 'June 8, 2018', 'Varies with device', '4.2 and up']


Number of rows: 8864
Number of columns: 13


['284882215', 'Facebook', '389879808', 'USD', '0.0', '2974676', '212', '3.5', '3.5', '95.0', '4+', 'Social Networking', '37', '1', '29', '1']


['389801252', 'Instagram', '113954816', 'USD', '0.0', '2161558', '1289', '4.5', '4.0', '10.23', '12+', 'Photo & Video', '37', '0', '29', '1']


['529479190', 'Clash of Clans', '116476928', 'USD', '0.0', '2130805', '579', '4.5', '4.5', '9.24.12', '9+', 

## Most Common Apps by Genre: Part One

As we mentioned in the introduction, our aim is to determine the kinds of apps that are likely to attract more users because our revenue is highly influenced by the number of people using our apps.

**Because our end goal is to add the app on both Google Play and the App Store, we need to find app profiles that are successful on both markets. For instance, a profile that works well for both markets might be a productivity app that makes use of gamification.**

Let's begin the analysis by getting a sense of the most common genres for each market. For this, we'll build a frequency table for the prime_genre column of the App Store data set, and the Genres and Category columns of the Google Play data set.

In [17]:
def freq_table(dataset,index):
    frequency_table={}
    percentage_table={}
    for row in dataset:
        genre=row[index]
        if genre in frequency_table:
            frequency_table[genre]+=1
        else:
            frequency_table[genre]=1
    for key in frequency_table:
        percent=frequency_table[key]/len(dataset)*100
        percentage_table[key]=percent
    return percentage_table

In [18]:
def display_table(dataset, index):
    table = freq_table(dataset, index)
    table_display = []
    for key in table:
        key_val_as_tuple = (table[key], key) #tuple
        table_display.append(key_val_as_tuple) # list of tuples
    table_sorted = sorted(table_display, reverse = True) # list of (percentage, key) tuples, largest first
    for entry in table_sorted:
        print(entry[1], ':', entry[0])
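
For comparison, the two functions above can be approximated with `collections.Counter` from the standard library. This is only a sketch of an equivalent approach, not the notebook's method; the names `percentage_table` and `sorted_percentages` and the sample rows are made up.

```python
from collections import Counter


def percentage_table(dataset, index):
    """Return {value: percentage of rows} for the column at `index`."""
    counts = Counter(row[index] for row in dataset)
    total = len(dataset)
    return {value: count / total * 100 for value, count in counts.items()}


def sorted_percentages(dataset, index):
    """Return (value, percentage) pairs, largest percentage first."""
    table = percentage_table(dataset, index)
    return sorted(table.items(), key=lambda item: item[1], reverse=True)


# Made-up sample rows: three 'Games' apps and one 'Music' app.
sample = [['a', 'Games'], ['b', 'Games'], ['c', 'Music'], ['d', 'Games']]
for value, percent in sorted_percentages(sample, 1):
    print(value, ':', percent)
# Games : 75.0
# Music : 25.0
```

Sorting on the percentage alone (via `key=`) avoids comparing the keys themselves, which the tuple-based sort does whenever two percentages are equal.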

In [19]:
#prime_genre in applestore
display_table(ios_free,-5)

Games : 58.16263190564867
Entertainment : 7.883302296710118
Photo & Video : 4.9658597144630665
Education : 3.662321539416512
Social Networking : 3.2898820608317814
Shopping : 2.60707635009311
Utilities : 2.5139664804469275
Sports : 2.1415270018621975
Music : 2.0484171322160147
Health & Fitness : 2.0173805090006205
Productivity : 1.7380509000620732
Lifestyle : 1.5828677839851024
News : 1.3345747982619491
Travel : 1.2414649286157666
Finance : 1.1173184357541899
Weather : 0.8690254500310366
Food & Drink : 0.8069522036002483
Reference : 0.5586592178770949
Business : 0.5276225946617008
Book : 0.4345127250155183
Navigation : 0.186219739292365
Medical : 0.186219739292365
Catalogs : 0.12414649286157665


### In Apple Store
- **Games** is by far the most common genre (about 58% of the free apps), and **Entertainment** is the runner-up (about 8%).
- Most of the apps are designed for entertainment rather than practical purposes.
- Judging by frequency alone, games may be the genre to recommend.

In [20]:
# Category in google
display_table(android_free,1)

FAMILY : 18.907942238267147
GAME : 9.724729241877256
TOOLS : 8.461191335740072
BUSINESS : 4.591606498194946
LIFESTYLE : 3.9034296028880866
PRODUCTIVITY : 3.892148014440433
FINANCE : 3.7003610108303246
MEDICAL : 3.531137184115524
SPORTS : 3.395758122743682
PERSONALIZATION : 3.3167870036101084
COMMUNICATION : 3.2378158844765346
HEALTH_AND_FITNESS : 3.0798736462093865
PHOTOGRAPHY : 2.944494584837545
NEWS_AND_MAGAZINES : 2.7978339350180503
SOCIAL : 2.6624548736462095
TRAVEL_AND_LOCAL : 2.33528880866426
SHOPPING : 2.2450361010830324
BOOKS_AND_REFERENCE : 2.1435018050541514
DATING : 1.861462093862816
VIDEO_PLAYERS : 1.7937725631768955
MAPS_AND_NAVIGATION : 1.3989169675090252
FOOD_AND_DRINK : 1.2409747292418771
EDUCATION : 1.1620036101083033
ENTERTAINMENT : 0.9589350180505415
LIBRARIES_AND_DEMO : 0.9363718411552346
AUTO_AND_VEHICLES : 0.9250902527075812
HOUSE_AND_HOME : 0.8235559566787004
WEATHER : 0.8009927797833934
EVENTS : 0.7107400722021661
PARENTING : 0.6543321299638989
ART_AND_DESIGN : 

In [21]:
#Genres in google
display_table(android_free,-4)

Tools : 8.449909747292418
Entertainment : 6.069494584837545
Education : 5.347472924187725
Business : 4.591606498194946
Productivity : 3.892148014440433
Lifestyle : 3.892148014440433
Finance : 3.7003610108303246
Medical : 3.531137184115524
Sports : 3.463447653429603
Personalization : 3.3167870036101084
Communication : 3.2378158844765346
Action : 3.1024368231046933
Health & Fitness : 3.0798736462093865
Photography : 2.944494584837545
News & Magazines : 2.7978339350180503
Social : 2.6624548736462095
Travel & Local : 2.3240072202166067
Shopping : 2.2450361010830324
Books & Reference : 2.1435018050541514
Simulation : 2.0419675090252705
Dating : 1.861462093862816
Arcade : 1.8501805054151623
Video Players & Editors : 1.7712093862815883
Casual : 1.7599277978339352
Maps & Navigation : 1.3989169675090252
Food & Drink : 1.2409747292418771
Puzzle : 1.128158844765343
Racing : 0.9927797833935018
Role Playing : 0.9363718411552346
Libraries & Demo : 0.9363718411552346
Auto & Vehicles : 0.9250902527075

### In Google Play Store

- **Tools** is the most common genre in the `Genres` column (about 8%), while **FAMILY** leads the `Category` column (about 19%).
- On Google Play, many apps seem to be designed for practical use, but entertainment-oriented apps still hold a noticeable share.

The App Store is dominated by apps designed for fun, while Google Play shows a more balanced landscape of both practical and fun apps.

## Most Popular Apps by Genre on the App Store



In [22]:
genre_ios=freq_table(ios_free,-5)

In [23]:
for genre in genre_ios:
    total=0
    len_genre=0
    for row in ios_free:
        genre_app=row[-5]
        if genre_app==genre:
            n_ratings=float(row[5]) # rating_count_tot: total number of user ratings
            total+=n_ratings
            len_genre+=1
    avg_n_ratings=total/len_genre # average number of ratings per app in this genre
    print(genre, ':', avg_n_ratings)

Social Networking : 71548.34905660378
Photo & Video : 28441.54375
Games : 22788.6696905016
Music : 57326.530303030304
Reference : 74942.11111111111
Health & Fitness : 23298.015384615384
Weather : 52279.892857142855
Utilities : 18684.456790123455
Travel : 28243.8
Shopping : 26919.690476190477
News : 21248.023255813954
Navigation : 86090.33333333333
Lifestyle : 16485.764705882353
Entertainment : 14029.830708661417
Food & Drink : 33333.92307692308
Sports : 23008.898550724636
Book : 39758.5
Finance : 31467.944444444445
Education : 7003.983050847458
Productivity : 21028.410714285714
Business : 7491.117647058823
Catalogs : 4004.0
Medical : 612.0


Social Networking has the third-highest average number of user ratings (after Navigation and Reference), even though the genre accounts for only about 3% of the free apps.
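
The averages above are computed by rescanning the whole data set once per genre. A single-pass version could be sketched as follows; `average_by_group` and the sample rows are hypothetical and not part of the original notebook.

```python
from collections import defaultdict


def average_by_group(rows, group_index, value_index):
    """Return {group: mean of float(row[value_index])}, computed in one pass."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for row in rows:
        group = row[group_index]
        totals[group] += float(row[value_index])
        counts[group] += 1
    return {group: totals[group] / counts[group] for group in totals}


# Made-up rows: [name, genre, number of ratings]
sample = [
    ['a', 'Games', '10'],
    ['b', 'Games', '30'],
    ['c', 'Music', '5'],
]

averages = average_by_group(sample, 1, 2)
print(averages)  # {'Games': 20.0, 'Music': 5.0}
```

On the notebook's data, the equivalent call would be `average_by_group(ios_free, -5, 5)`.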

## Most Popular Apps by Genre on Google Play

In [24]:
display_table(android_free,5) #install columns

1,000,000+ : 15.726534296028879
100,000+ : 11.552346570397113
10,000,000+ : 10.548285198555957
10,000+ : 10.198555956678701
1,000+ : 8.393501805054152
100+ : 6.915613718411552
5,000,000+ : 6.825361010830325
500,000+ : 5.561823104693141
50,000+ : 4.7721119133574
5,000+ : 4.512635379061372
10+ : 3.5424187725631766
500+ : 3.2490974729241873
50,000,000+ : 2.3014440433213
100,000,000+ : 2.1322202166064983
50+ : 1.917870036101083
5+ : 0.78971119133574
1+ : 0.5076714801444043
500,000,000+ : 0.2707581227436823
1,000,000,000+ : 0.22563176895306858
0+ : 0.04512635379061372
0 : 0.01128158844765343


In [25]:
category=freq_table(android_free,1)

In [42]:
genre_list=[]
for genre in category:
    total=0
    len_category=0
    for row in android_free:
        name=row[1]
        install=row[5]
        if name==genre:
            install=install.replace('+','')
            install=install.replace(',','')
            total+=float(install)
            len_category+=1
    average_install=total/len_category
    genre_list.append((average_install,genre))
    #print(genre,":",average_install)
genre_list=sorted(genre_list,reverse = True)
for row in genre_list:
    print(row[1],row[0])

COMMUNICATION 38456119.167247385
VIDEO_PLAYERS 24727872.452830188
SOCIAL 23253652.127118643
PHOTOGRAPHY 17840110.40229885
PRODUCTIVITY 16787331.344927534
GAME 15588015.603248259
TRAVEL_AND_LOCAL 13984077.710144928
ENTERTAINMENT 11640705.88235294
TOOLS 10801391.298666667
NEWS_AND_MAGAZINES 9549178.467741935
BOOKS_AND_REFERENCE 8767811.894736841
SHOPPING 7036877.311557789
PERSONALIZATION 5201482.6122448975
WEATHER 5074486.197183099
HEALTH_AND_FITNESS 4188821.9853479853
MAPS_AND_NAVIGATION 4056941.7741935486
FAMILY 3695641.8198090694
SPORTS 3638640.1428571427
ART_AND_DESIGN 1986335.0877192982
FOOD_AND_DRINK 1924897.7363636363
EDUCATION 1833495.145631068
BUSINESS 1712290.1474201474
LIFESTYLE 1437816.2687861272
FINANCE 1387692.475609756
HOUSE_AND_HOME 1331540.5616438356
DATING 854028.8303030303
COMICS 817657.2727272727
AUTO_AND_VEHICLES 647317.8170731707
LIBRARIES_AND_DEMO 638503.734939759
PARENTING 542603.6206896552
BEAUTY 513151.88679245283
EVENTS 253542.22222222222
MEDICAL 120550.6198083
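
The install counts are stored as text buckets such as `1,000,000+`, and the same clean-up is repeated in several cells. A small helper could centralize it; `parse_installs` is a hypothetical name used only for this sketch. Because each bucket is open-ended, the resulting averages are best read as rough lower estimates rather than precise figures.

```python
def parse_installs(text):
    """Convert an install bucket such as '1,000,000+' to a float."""
    return float(text.replace('+', '').replace(',', ''))


print(parse_installs('1,000,000+'))  # 1000000.0
print(parse_installs('0'))           # 0.0
```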

In [27]:
for app in android_free:
    if app[1]=="COMMUNICATION" and (app[5] == '1,000,000,000+'
                                      or app[5] == '500,000,000+'
                                      or app[5] == '100,000,000+'):
        print(app[0], ':', app[5])

WhatsApp Messenger : 1,000,000,000+
imo beta free calls and text : 100,000,000+
Android Messages : 100,000,000+
Google Duo - High Quality Video Calls : 500,000,000+
Messenger – Text and Video Chat for Free : 1,000,000,000+
imo free video calls and chat : 500,000,000+
Skype - free IM & video calls : 1,000,000,000+
Who : 100,000,000+
GO SMS Pro - Messenger, Free Themes, Emoji : 100,000,000+
LINE: Free Calls & Messages : 500,000,000+
Google Chrome: Fast & Secure : 1,000,000,000+
Firefox Browser fast & private : 100,000,000+
UC Browser - Fast Download Private & Secure : 500,000,000+
Gmail : 1,000,000,000+
Hangouts : 1,000,000,000+
Messenger Lite: Free Calls & Messages : 100,000,000+
Kik : 100,000,000+
KakaoTalk: Free Calls & Text : 100,000,000+
Opera Mini - fast web browser : 100,000,000+
Opera Browser: Fast and Secure : 100,000,000+
Telegram : 100,000,000+
Truecaller: Caller ID, SMS spam blocking & Dialer : 100,000,000+
UC Browser Mini -Tiny Fast Private & Secure : 100,000,000+
Viber Mess

In [30]:
under_100_m=[]
for row in android_free:
    install=row[5]
    install=install.replace('+','')
    install=install.replace(',','')
    n_install=float(install)
    if (row[1]=="COMMUNICATION" ) and  ( n_install<100000000 ):
        under_100_m.append(n_install)

sum(under_100_m)/len(under_100_m)
    

3603485.3884615386

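Communication apps have the highest average number of installs (38,456,119), but this figure is heavily skewed by a few apps with over one billion installs (WhatsApp, Messenger, Skype, Google Chrome, Gmail, and Hangouts) and a few others with over 100 million. If we remove all communication apps with 100 million installs or more, the average drops to about 3,603,485.

We see the same pattern for the video players category, which is the runner-up with 24,727,872 installs. The market is dominated by apps like YouTube, Google Play Movies & TV, or MX Player. The pattern is repeated for social apps (where we have giants like Facebook, Instagram, Google+, etc.), photography apps (Google Photos and other popular photo editors), or productivity apps (Microsoft Word, Dropbox, Google Calendar, Evernote, etc.).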
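
To illustrate how a handful of giants can distort an average, the sketch below compares the mean and the median of a small, made-up list of install counts using the standard `statistics` module. The numbers and the `summarize` helper are invented for illustration and are not taken from the data set.

```python
from statistics import mean, median


def summarize(values):
    """Return (mean, median) for a list of numbers."""
    return mean(values), median(values)


# Made-up install counts: one giant and four small apps.
installs = [1_000_000_000, 500_000, 100_000, 50_000, 10_000]

avg, mid = summarize(installs)
print('mean:', avg)
print('median:', mid)
```

Here the mean is pulled up by the single largest value, while the median stays at the typical app's level, so reporting both can give a fairer picture of a category.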

## Conclusions

In this project, we analyzed data about the App Store and Google Play mobile apps with the goal of recommending an app profile that can be profitable for both markets.

We concluded that taking a popular book (perhaps a more recent book) and turning it into an app could be profitable for both the Google Play and the App Store markets. The markets are already full of libraries, so we need to add some special features besides the raw version of the book. This might include daily quotes from the book, an audio version of the book, quizzes on the book, a forum where people can discuss the book, etc.