# Determining What Type of App Could Potentially be Profitable For Both Markets.

The goal for this project is to analyze data to help devolopers understand what type of apps are more attractive to users. Our main source of income is based on ad-revenue and how many users engage with the ads on said apps. This was built with the mindset of a data analyst working for a company that builds apps that are availible on the Google Play and App store.

My goal for this project to provide insight on the different kinds of apps and ads that users find appealing and will interact with. I hope give devolopers an idea on how to structure their apps in the future, and what kind of ads to display in the apps to increase profits.

# Opening and Exploring the Data

As of September 2018, there were approximately 2 million iOS apps available on the App Store, and 2.1 million Android apps on Google Play.

Collecting data for over four million apps requires a significant amount of time and money, so we'll try to analyze a sample of data instead. To avoid spending resources with collecting new data ourselves, we should first try to see whether we can find any relevant existing data at no cost. Luckily, these are two data sets that seem suitable for our purpose:

A data set containing data about approximately ten thousand Android apps from Google Play. You can download the data set directly from this link.
A data set containing data about approximately seven thousand iOS apps from the App Store. You can download the data set directly from this link.
Lets start by opening the two data sets and then continue with exploring the data

In [2]:
def explore_data(dataset, start, end,
rows_and_columns=False):
    dataset_slice = dataset[start:end]
    for row in dataset_slice:
        print(row)
        print('\n') # adds a new (empty) line after each row
        
    if rows_and_columns:
        print('Number of rows:', len(dataset))
        print('Number of columns:', len(dataset[0]))

In [3]:

from csv import reader

### The Google Play data set ###
opened_file = open('googleplaystore.csv')
read_file = reader(opened_file)
android = list(read_file)
android_header = android[0]
android = android[1:]

### The App Store data set ###
opened_file = open('AppleStore.csv')
read_file = reader(opened_file)
ios = list(read_file)
ios_header = ios[0]
ios = ios[1:]

In [4]:
explore_data(ios, 1,5)
explore_data(android, 1,5)

['389801252', 'Instagram', '113954816', 'USD', '0.0', '2161558', '1289', '4.5', '4.0', '10.23', '12+', 'Photo & Video', '37', '0', '29', '1']


['529479190', 'Clash of Clans', '116476928', 'USD', '0.0', '2130805', '579', '4.5', '4.5', '9.24.12', '9+', 'Games', '38', '5', '18', '1']


['420009108', 'Temple Run', '65921024', 'USD', '0.0', '1724546', '3842', '4.5', '4.0', '1.6.2', '9+', 'Games', '40', '5', '1', '1']


['284035177', 'Pandora - Music & Radio', '130242560', 'USD', '0.0', '1126879', '3594', '4.0', '4.5', '8.4.1', '12+', 'Music', '37', '4', '1', '1']


['Coloring book moana', 'ART_AND_DESIGN', '3.9', '967', '14M', '500,000+', 'Free', '0', 'Everyone', 'Art & Design;Pretend Play', 'January 15, 2018', '2.0.0', '4.0.3 and up']


['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up']


['Sketch - Draw & Paint', 'ART_AND_DESIGN', '4.5', '215644'

In [5]:
print(ios_header)
print('\n')
explore_data(ios, 0, 3, True)

['id', 'track_name', 'size_bytes', 'currency', 'price', 'rating_count_tot', 'rating_count_ver', 'user_rating', 'user_rating_ver', 'ver', 'cont_rating', 'prime_genre', 'sup_devices.num', 'ipadSc_urls.num', 'lang.num', 'vpp_lic']


['284882215', 'Facebook', '389879808', 'USD', '0.0', '2974676', '212', '3.5', '3.5', '95.0', '4+', 'Social Networking', '37', '1', '29', '1']


['389801252', 'Instagram', '113954816', 'USD', '0.0', '2161558', '1289', '4.5', '4.0', '10.23', '12+', 'Photo & Video', '37', '0', '29', '1']


['529479190', 'Clash of Clans', '116476928', 'USD', '0.0', '2130805', '579', '4.5', '4.5', '9.24.12', '9+', 'Games', '38', '5', '18', '1']


Number of rows: 7197
Number of columns: 16


# Deleting Wrong Data

In [6]:
print(android[10472]) #Incorrect row
print('\n')
print(android_header)
print('\n')
print(android[0])


['Life Made WI-Fi Touchscreen Photo Frame', '1.9', '19', '3.0M', '1,000+', 'Free', '0', 'Everyone', '', 'February 11, 2018', '1.0.19', '4.0 and up']


['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver', 'Android Ver']


['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


In [7]:
print(len(android))
del android[10472]
print(len(android))

10841
10840


# Removing Duplicate entries.
# Part 1
Before we dive any deeper into the analysis we need to identify duplicate entries to make sure we are not counting apps twice.

In [8]:
for app in android:
    name = app[0]
    if name =='Instagram':
        print(app)
        print('\n')

['Instagram', 'SOCIAL', '4.5', '66577313', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device']


['Instagram', 'SOCIAL', '4.5', '66577446', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device']


['Instagram', 'SOCIAL', '4.5', '66577313', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device']


['Instagram', 'SOCIAL', '4.5', '66509917', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device']




In [9]:
#Checking for duplicate rows within google play store dataset.

duplicate_apps = []
unique_apps = []

for app in android:
    name = app[0]
    if name in unique_apps:
        duplicate_apps.append(name)
    else:
        unique_apps.append(name)
        
print('Number of duplicate apps:', len(duplicate_apps))
print('\n')
print('Examples of duplicate apps:', duplicate_apps[:15])

Number of duplicate apps: 1181


Examples of duplicate apps: ['Quick PDF Scanner + OCR FREE', 'Box', 'Google My Business', 'ZOOM Cloud Meetings', 'join.me - Simple Meetings', 'Box', 'Zenefits', 'Google Ads', 'Google My Business', 'Slack', 'FreshBooks Classic', 'Insightly CRM', 'QuickBooks Accounting: Invoicing & Expenses', 'HipChat - Chat Built for Teams', 'Xero Accounting Software']



We don't want to count certain apps more than once when we analyze data, so we need to remove the duplicate entries and keep only one entry per app. One thing we could do is remove the duplicate rows randomly, but we could probably find a better way.

If you examine the rows we printed two cells above for the Instagram app, the main difference happens on the fourth position of each row, which corresponds to the number of reviews. The different numbers show that the data was collected at different times. We can use this to build a criterion for keeping rows. We won't remove rows randomly, but rather we'll keep the rows that have the highest number of reviews because the higher the number of reviews, the more reliable the ratings.

To do that, we will:

Create a dictionary where each key is a unique app name, and the value is the highest number of reviews of that app
Use the dictionary to create a new data set, which will have only one entry per app (and we only select the apps with the highest number of reviews)

# Part 2
We will first begin by building a dictionary.


In [10]:
name_and_reviews = {'Instagram': 66577313, 'Facebook': 78158306} 
print('LinkedIn' not in name_and_reviews)
print('Instagram' not in name_and_reviews)

True
False


In [11]:
reviews_max = {}

for app in android:
    name = app[0]
    n_reviews = float(app[3])
    
    if name in reviews_max and reviews_max[name] < n_reviews:
        reviews_max[name] = n_reviews
        
    elif name not in reviews_max:
        reviews_max[name] = n_reviews

In [12]:
print('Expected Length:', len(android) - 1181)
print('Actual Length:', len(reviews_max))
      

Expected Length: 9659
Actual Length: 9659


In [13]:
android_clean = []
already_added = []

for app in android:
    name = app[0]
    n_reviews = float(app[3])
    
    if (reviews_max[name] == n_reviews) and (name not in already_added):
        android_clean.append(app)
        already_added.append(name)


Now let's quickly explore the new data set, and confirm that the number of rows is 9,659.

In [14]:
explore_data(android_clean, 0, 3, True)

['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up']


['Sketch - Draw & Paint', 'ART_AND_DESIGN', '4.5', '215644', '25M', '50,000,000+', 'Free', '0', 'Teen', 'Art & Design', 'June 8, 2018', 'Varies with device', '4.2 and up']


Number of rows: 9659
Number of columns: 13


We have 9659 rows just as expected

# Isolating English Apps

In [15]:
#Function to check whether app name is English or other 
#language.
def is_eng(str):
    
    for letter in str:
        if ord(letter) > 127:
            return False
        else:
            return True

#Testing Function
is_eng('爱奇艺PPS -《欢乐颂2》电视剧热播')

False

Below we use the is_eng() function to filter out non english apps.

In [16]:

android_english = []
ios_english = []

for app in android_clean:
    name = app[0]
    if is_eng(name):
        android_english.append(app)
        
for app in ios:
    name = app[1]
    if is_eng(name):
        ios_english.append(app)
        
explore_data(android_english, 0, 3, True)
print('\n')
explore_data(ios_english, 0, 3, True)
    

['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up']


['Sketch - Draw & Paint', 'ART_AND_DESIGN', '4.5', '215644', '25M', '50,000,000+', 'Free', '0', 'Teen', 'Art & Design', 'June 8, 2018', 'Varies with device', '4.2 and up']


Number of rows: 9623
Number of columns: 13


['284882215', 'Facebook', '389879808', 'USD', '0.0', '2974676', '212', '3.5', '3.5', '95.0', '4+', 'Social Networking', '37', '1', '29', '1']


['389801252', 'Instagram', '113954816', 'USD', '0.0', '2161558', '1289', '4.5', '4.0', '10.23', '12+', 'Photo & Video', '37', '0', '29', '1']


['529479190', 'Clash of Clans', '116476928', 'USD', '0.0', '2130805', '579', '4.5', '4.5', '9.24.12', '9+', 

# Isolating Free Apps

As mentioned earlier in this analysis, our main source of revenue is from in-app ads. These datasets contain both free and non-free apps; we'll need to isolate only the free apps.

In [17]:
android_final = []
ios_final = []

for app in android_english:
    price = app[7]
    if price == '0':
        android_final.append(app)
        
for app in ios_english:
    price = app[4]
    if price == '0.0':
        ios_final.append(app)


In [18]:
print(len(android_final))
print(len(ios_final))

8873
3300


We are left with 8873 android apps, and 3300 iOS apps. Which will be enough to complete our analysis.

# Determining Popular Genres for Both Markets

So far we have spent a fair amount of time cleaning our data and removing inaccurate data, duplicate data, and non-english apps. In the last code blocl we isolated the free-apps. Our aim is to determind the kinds of apps that are likely to attract more users because our revenue is influenced by the number of people using these apps.

Our end goal is to build an app that is successful on both the Google Play and App store. To do this we need to find app profiles that are successful on both markets. To begin we will build frequency tables to get a sense of what the most common genres for each market are.

The validation strategy for this assessment is based off 3 steps:
1. Build a minimal Android version of the app, and add it to Google Play.
2. If the app has a good response from users, we develop it further.
3. If the app is profitable after six months, we build an iOS version of the app and add it to the App Store

In [19]:

def freq_table(dataset, index):
    table = {}
    total = 0
    
    for row in dataset:
        total += 1
        value = row[index]
        if value in table:
            table[value] += 1
        else:
            table[value] = 1
    
    table_percentages = {}
    for key in table:
        percentage = (table[key] / total) * 100
        table_percentages[key] = percentage 
    
    return table_percentages


def display_table(dataset, index):
    table = freq_table(dataset, index)
    table_display = []
    for key in table:
        key_val_as_tuple = (table[key], key)
        table_display.append(key_val_as_tuple)
        
    table_sorted = sorted(table_display, reverse = True)
    for entry in table_sorted:
        print(entry[1], ':', entry[0])

# Part 3

We start by examining the frequency table for the prime_genre column of the App Store data set.

In [20]:
display_table(ios_final, -5)

Games : 57.484848484848484
Entertainment : 7.96969696969697
Photo & Video : 4.878787878787879
Education : 3.5757575757575757
Social Networking : 3.3939393939393945
Utilities : 2.696969696969697
Shopping : 2.5757575757575757
Sports : 2.1212121212121215
Health & Fitness : 2.0606060606060606
Music : 2.0303030303030303
Productivity : 1.7575757575757573
Lifestyle : 1.6969696969696972
News : 1.3636363636363635
Travel : 1.1818181818181819
Finance : 1.1818181818181819
Weather : 0.8787878787878787
Food & Drink : 0.8787878787878787
Book : 0.5757575757575757
Reference : 0.5151515151515151
Business : 0.5151515151515151
Navigation : 0.27272727272727276
Medical : 0.21212121212121215
Catalogs : 0.18181818181818182


We can see that among the free English apps, more than a half (58.16%) are games. Entertainment apps are close to 8%, followed by photo and video apps, which are close to 5%. Only 3.66% of the apps are designed for education, followed by social networking apps which amount for 3.29% of the apps in our data set.

The general impression is that App Store (at least the part containing free English apps) is dominated by apps that are designed for fun (games, entertainment, photo and video, social networking, sports, music, etc.), while apps with practical purposes (education, shopping, utilities, productivity, lifestyle, etc.) are more rare. However, the fact that fun apps are the most numerous doesn't also imply that they also have the greatest number of users — the demand might not be the same as the offer.

Let's continue by examining the Genres and Category columns of the Google Play data set (two columns which seem to be related).

In [21]:
display_table(android_final, 1) # Category

FAMILY : 18.95638453736053
GAME : 9.703595176377776
TOOLS : 8.441338893271723
BUSINESS : 4.59821931702919
LIFESTYLE : 3.933280739321537
PRODUCTIVITY : 3.899470303166911
FINANCE : 3.685337540854277
MEDICAL : 3.5275555054660206
SPORTS : 3.3923137608475153
PERSONALIZATION : 3.3246928885382623
COMMUNICATION : 3.234531725459258
HEALTH_AND_FITNESS : 3.076749690071002
PHOTOGRAPHY : 2.9527780908373717
NEWS_AND_MAGAZINES : 2.817536346218866
SOCIAL : 2.6484841654457343
TRAVEL_AND_LOCAL : 2.332920094669221
SHOPPING : 2.231488786205342
BOOKS_AND_REFERENCE : 2.1751380592809646
DATING : 1.8595739885044518
VIDEO_PLAYERS : 1.791953116195199
MAPS_AND_NAVIGATION : 1.3974980277245577
FOOD_AND_DRINK : 1.2284458469514257
EDUCATION : 1.1720951200270482
ENTERTAINMENT : 0.946692212329539
LIBRARIES_AND_DEMO : 0.9354220669446636
AUTO_AND_VEHICLES : 0.9241519215597882
HOUSE_AND_HOME : 0.8114504677110335
WEATHER : 0.800180322326158
EVENTS : 0.7100191592471543
PARENTING : 0.6536684323227769
ART_AND_DESIGN : 0.6423

Even so, practical apps seem to have a better representation on Google Play compared to App Store. This picture is also confirmed by the frequency table we see for the Genres column:

In [22]:
display_table(android_final, -4)

Tools : 8.430068747886848
Entertainment : 6.063338217063
Education : 5.375859348585597
Business : 4.59821931702919
Lifestyle : 3.922010593936662
Productivity : 3.899470303166911
Finance : 3.685337540854277
Medical : 3.5275555054660206
Sports : 3.459934633156768
Personalization : 3.3246928885382623
Communication : 3.234531725459258
Action : 3.0880198354558774
Health & Fitness : 3.076749690071002
Photography : 2.9527780908373717
News & Magazines : 2.817536346218866
Social : 2.6484841654457343
Travel & Local : 2.3216499492843456
Shopping : 2.231488786205342
Books & Reference : 2.1751380592809646
Simulation : 2.06243660543221
Dating : 1.8595739885044518
Arcade : 1.8370336977347006
Video Players & Editors : 1.7694128254254478
Casual : 1.7581426800405726
Maps & Navigation : 1.3974980277245577
Food & Drink : 1.2284458469514257
Puzzle : 1.1270145384875465
Racing : 0.9917727938690409
Role Playing : 0.9354220669446636
Libraries & Demo : 0.9354220669446636
Auto & Vehicles : 0.9241519215597882
Str

# Most Popular Apps by Genre on the App Store

For the Google Play market, we actually have data about the number of installs, so we should be able to get a clearer picture about genre popularity. However, the install numbers don't seem precise enough — we can see that most values are open-ended (100+, 1,000+, 5,000+, etc.):

In [23]:
genres_ios = freq_table(ios_final, -5)

for genre in genres_ios:
    total = 0
    len_genre = 0
    for app in ios_final:
        genre_app = app[-5]
        if genre_app == genre:            
            n_ratings = float(app[5])
            total += n_ratings
            len_genre += 1
    avg_n_ratings = total / len_genre
    print(genre, ':', avg_n_ratings)

Health & Fitness : 22278.352941176472
Weather : 50477.137931034486
Education : 7003.983050847458
Entertainment : 13549.794676806083
Catalogs : 2669.3333333333335
News : 20303.666666666668
Social Networking : 67731.21428571429
Finance : 29048.615384615383
Book : 29310.736842105263
Utilities : 17058.719101123595
Photo & Video : 28264.888198757762
Lifestyle : 15023.089285714286
Navigation : 57393.555555555555
Games : 22199.308381655246
Sports : 22680.2
Medical : 525.4285714285714
Food & Drink : 29886.931034482757
Travel : 28959.5641025641
Music : 56482.02985074627
Reference : 79350.4705882353
Productivity : 20303.310344827587
Business : 7491.117647058823
Shopping : 26586.788235294116


One problem with this data is that is not precise. For instance, we don't know whether an app with 100,000+ installs has 100,000 installs, 200,000, or 350,000. However, we don't need very precise data for our purposes — we only want to get an idea which app genres attract the most users, and we don't need perfect precision with respect to the number of users.

We're going to leave the numbers as they are, which means that we'll consider that an app with 100,000+ installs has 100,000 installs, and an app with 1,000,000+ installs has 1,000,000 installs, and so on.

To perform computations, however, we'll need to convert each install number to float — this means that we need to remove the commas and the plus characters, otherwise the conversion will fail and raise an error. We'll do this directly in the loop below, where we also compute the average number of installs for each genre (category).

In [24]:
categories_android = freq_table(android_final, 1)

for category in categories_android:
    total = 0
    len_category = 0
    for app in android_final:
        category_app = app[1]
        if category_app == category:            
            n_installs = app[5]
            n_installs = n_installs.replace(',', '')
            n_installs = n_installs.replace('+', '')
            total += float(n_installs)
            len_category += 1
    avg_n_installs = total / len_category
    print(category, ':', avg_n_installs)

TOOLS : 10815793.690253671
BUSINESS : 1708215.906862745
ENTERTAINMENT : 11767380.952380951
MEDICAL : 120550.61980830671
LIFESTYLE : 1439955.3839541548
SOCIAL : 23348348.519148935
WEATHER : 5074486.197183099
FAMILY : 3682025.3810939356
DATING : 854028.8303030303
TRAVEL_AND_LOCAL : 13984077.710144928
BOOKS_AND_REFERENCE : 8631794.093264248
GAME : 15547984.262485482
COMMUNICATION : 38456119.167247385
PHOTOGRAPHY : 17772018.759541985
HEALTH_AND_FITNESS : 4188821.9853479853
EVENTS : 253542.22222222222
ART_AND_DESIGN : 1986335.0877192982
NEWS_AND_MAGAZINES : 9472829.04
AUTO_AND_VEHICLES : 647317.8170731707
HOUSE_AND_HOME : 1348645.2916666667
PRODUCTIVITY : 16738957.554913295
SHOPPING : 7072366.590909091
PARENTING : 542603.6206896552
FOOD_AND_DRINK : 1942465.605504587
COMICS : 828700.9433962264
SPORTS : 3638640.1428571427
EDUCATION : 1825480.7692307692
VIDEO_PLAYERS : 24727872.452830188
BEAUTY : 513151.88679245283
PERSONALIZATION : 5183850.806779661
FINANCE : 1361355.1437308867
MAPS_AND_NAVIG


On average, communication apps have the most installs: 38,456,119. This number is heavily skewed up by a few apps that have over one billion installs (WhatsApp, Facebook Messenger, Skype, Google Chrome, Gmail, and Hangouts), and a few others with over 100 and 500 million installs:

In [25]:

for app in android_final:
    if app[1] == 'COMMUNICATION' and (app[5] == '1,000,000,000+'
                                      or app[5] == '500,000,000+'
                                      or app[5] == '100,000,000+'):
        print(app[0], ':', app[5])

WhatsApp Messenger : 1,000,000,000+
imo beta free calls and text : 100,000,000+
Android Messages : 100,000,000+
Google Duo - High Quality Video Calls : 500,000,000+
Messenger – Text and Video Chat for Free : 1,000,000,000+
imo free video calls and chat : 500,000,000+
Skype - free IM & video calls : 1,000,000,000+
Who : 100,000,000+
GO SMS Pro - Messenger, Free Themes, Emoji : 100,000,000+
LINE: Free Calls & Messages : 500,000,000+
Google Chrome: Fast & Secure : 1,000,000,000+
Firefox Browser fast & private : 100,000,000+
UC Browser - Fast Download Private & Secure : 500,000,000+
Gmail : 1,000,000,000+
Hangouts : 1,000,000,000+
Messenger Lite: Free Calls & Messages : 100,000,000+
Kik : 100,000,000+
KakaoTalk: Free Calls & Text : 100,000,000+
Opera Mini - fast web browser : 100,000,000+
Opera Browser: Fast and Secure : 100,000,000+
Telegram : 100,000,000+
Truecaller: Caller ID, SMS spam blocking & Dialer : 100,000,000+
UC Browser Mini -Tiny Fast Private & Secure : 100,000,000+
Viber Mess

If we removed all the communication apps that have over 100 million installs, the average would be reduced roughly ten times:


We see the same pattern for the video players category, which is the runner-up with 24,727,872 installs. The market is dominated by apps like Youtube, Google Play Movies & TV, or MX Player. The pattern is repeated for social apps (where we have giants like Facebook, Instagram, Google+, etc.), photography apps (Google Photos and other popular photo editors), or productivity apps (Microsoft Word, Dropbox, Google Calendar, Evernote, etc.).

Again, the main concern is that these app genres might seem more popular than they really are. Moreover, these niches seem to be dominated by a few giants who are hard to compete against.

The game genre seems pretty popular, but previously we found out this part of the market seems a bit saturated, so we'd like to come up with a different app recommendation if possible.

The books and reference genre looks fairly popular as well, with an average number of installs of 8,767,811. It's interesting to explore this in more depth, since we found this genre has some potential to work well on the App Store, and our aim is to recommend an app genre that shows potential for being profitable on both the App Store and Google Play.

Let's take a look at some of the apps from this genre and their number of installs

In [26]:
for app in android_final:
    if app[1] == 'BOOKS_AND_REFERENCE':
        print(app[0], ':', app[5])

E-Book Read - Read Book for free : 50,000+
Download free book with green book : 100,000+
Wikipedia : 10,000,000+
Cool Reader : 10,000,000+
Free Panda Radio Music : 100,000+
Book store : 1,000,000+
FBReader: Favorite Book Reader : 10,000,000+
English Grammar Complete Handbook : 500,000+
Free Books - Spirit Fanfiction and Stories : 1,000,000+
Google Play Books : 1,000,000,000+
AlReader -any text book reader : 5,000,000+
Offline English Dictionary : 100,000+
Offline: English to Tagalog Dictionary : 500,000+
FamilySearch Tree : 1,000,000+
Cloud of Books : 1,000,000+
Recipes of Prophetic Medicine for free : 500,000+
ReadEra – free ebook reader : 1,000,000+
Anonymous caller detection : 10,000+
Ebook Reader : 5,000,000+
Litnet - E-books : 100,000+
Read books online : 5,000,000+
English to Urdu Dictionary : 500,000+
eBoox: book reader fb2 epub zip : 1,000,000+
English Persian Dictionary : 500,000+
Flybook : 500,000+
All Maths Formulas : 1,000,000+
Ancestry : 5,000,000+
HTC Help : 10,000,000+
E

The book and reference genre includes a variety of apps: software for processing and reading ebooks, various collections of libraries, dictionaries, tutorials on programming or languages, etc. It seems there's still a small number of extremely popular apps that skew the average:

In [27]:
for app in android_final:
    if app[1] == 'BOOKS_AND_REFERENCE' and (app[5] == '1,000,000,000+'
                                            or app[5] == '500,000,000+'
                                            or app[5] == '100,000,000+'):
        print(app[0], ':', app[5])

Google Play Books : 1,000,000,000+
Bible : 100,000,000+
Amazon Kindle : 100,000,000+
Wattpad 📖 Free Books : 100,000,000+
Audiobooks from Audible : 100,000,000+



However, it looks like there are only a few very popular apps, so this market still shows potential. Let's try to get some app ideas based on the kind of apps that are somewhere in the middle in terms of popularity (between 1,000,000 and 100,000,000 downloads):

In [28]:
for app in android_final:
    if app[1] == 'BOOKS_AND_REFERENCE' and (app[5] == '1,000,000+'
                                            or app[5] == '5,000,000+'
                                            or app[5] == '10,000,000+'
                                            or app[5] == '50,000,000+'):
        print(app[0], ':', app[5])

Wikipedia : 10,000,000+
Cool Reader : 10,000,000+
Book store : 1,000,000+
FBReader: Favorite Book Reader : 10,000,000+
Free Books - Spirit Fanfiction and Stories : 1,000,000+
AlReader -any text book reader : 5,000,000+
FamilySearch Tree : 1,000,000+
Cloud of Books : 1,000,000+
ReadEra – free ebook reader : 1,000,000+
Ebook Reader : 5,000,000+
Read books online : 5,000,000+
eBoox: book reader fb2 epub zip : 1,000,000+
All Maths Formulas : 1,000,000+
Ancestry : 5,000,000+
HTC Help : 10,000,000+
Moon+ Reader : 10,000,000+
English-Myanmar Dictionary : 1,000,000+
Golden Dictionary (EN-AR) : 1,000,000+
All Language Translator Free : 1,000,000+
Aldiko Book Reader : 10,000,000+
Dictionary - WordWeb : 5,000,000+
50000 Free eBooks & Free AudioBooks : 5,000,000+
Al-Quran (Free) : 10,000,000+
Al Quran Indonesia : 10,000,000+
Al'Quran Bahasa Indonesia : 10,000,000+
Al Quran Al karim : 1,000,000+
Al Quran : EAlim - Translations & MP3 Offline : 5,000,000+
Koran Read &MP3 30 Juz Offline : 1,000,000+
H

This niche seems to be dominated by software for processing and reading ebooks, as well as various collections of libraries and dictionaries, so it's probably not a good idea to build similar apps since there'll be some significant competition.

We also notice there are quite a few apps built around the book Quran, which suggests that building an app around a popular book can be profitable. It seems that taking a popular book (perhaps a more recent book) and turning it into an app could be profitable for both the Google Play and the App Store markets.

However, it looks like the market is already full of libraries, so we need to add some special features besides the raw version of the book. This might include daily quotes from the book, an audio version of the book, quizzes on the book, a forum where people can discuss the book, etc.

# Markdown

In this project, we analyzed data about the App Store and Google Play mobile apps with the goal of recommending an app profile that can be profitable for both markets.

We concluded that taking a popular book (perhaps a more recent book) and turning it into an app could be profitable for both the Google Play and the App Store markets. The markets are already full of libraries, so we need to add some special features besides the raw version of the book. This might include daily quotes from the book, an audio version of the book, quizzes on the book, a forum where people can discuss the book,