 Profitable App Profiles for the App Store and Google Play Markets

In our data analysis project, we, as a mobile app development company, will be exploring data to gain valuable insights into the factors that influence the attraction of more users to our Android and iOS mobile apps, available on Google Play and the App Store.

As a company that heavily relies on in-app ads for revenue, our success is directly tied to the number of users we acquire. With a larger user base, we can enhance engagement with our in-app ads and generate higher revenue. Hence, the main objective of this project is to employ data analysis techniques to assist our developers in understanding the types of apps that are more likely to draw a larger audience.

Through the examination of various factors such as app categories, user ratings, app size, and even pricing (if applicable), along with other relevant metrics, we aim to uncover patterns and trends that can provide guidance to our development team. These insights will enable us to create apps with a higher potential for popularity and user attraction. Ultimately, our analysis will contribute to strategic decision-making and optimization of our app development process, fostering increased user engagement and revenue generation.

open both csv files using the default methods 

In [26]:
from csv import reader 

In [27]:
open_file = open('AppleStore.csv')
read_file = reader(open_file)
ios = list(read_file)
ios_header = ios[0]
ios_main = ios[1:]

In [28]:
open_file = open('googleplaystore.csv')
read_file = reader(open_file)
android = list(read_file)
android_header = android[0]
android_main = android[1:]

 created a function named explore_data() that you can repeatedly use to print rows in a readable way.

In [29]:
def explore_data(dataset, start, end, rows_and_columns=False):
    dataset_slice = dataset[start:end]    
    for row in dataset_slice:
        print(row)
        print('\n') # adds a new (empty) line after each row

    if rows_and_columns:
        print('Number of rows:', len(dataset))
        print('Number of columns:', len(dataset[0]))

Print the first few rows of each dataset.

In [30]:
#print(ios_main[:2])

In [31]:
#print(android_main[:2])

the number of rows and columns of each dataset 

In [32]:
#explore_data(ios_main , 0, 3, True)

In [33]:
#explore_data(android_main , 0, 3, True)

To Familirize with data we will print each of dataset column names 

In [34]:
print(android_header)

['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver', 'Android Ver']


In [35]:
print(ios_header)

['id', 'track_name', 'size_bytes', 'currency', 'price', 'rating_count_tot', 'rating_count_ver', 'user_rating', 'user_rating_ver', 'ver', 'cont_rating', 'prime_genre', 'sup_devices.num', 'ipadSc_urls.num', 'lang.num', 'vpp_lic']


After identifying an error in one row of the Google dataset during our discussion, we have decided to take the necessary step of deleting it to maintain the overall efficiency of our analysis

To ensure that the deletion has taken effect and to maintain transparency, we will print the length of the Google dataset before and after removing the row containing the error. This will allow us to verify the impact of the deletion on the dataset's overall length and confirm the desired efficiency in our analysis.

In [11]:
print(android_header)

['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver', 'Android Ver']


In [36]:
del android_main[10472]

In [37]:
print(len(android_main))

10840


Before we commence with our data analysis, it is essential to perform data cleaning tasks. One method we can employ is removing duplicates. However, before proceeding with duplicate removal, we need to identify whether there are any duplicates present in our dataset.

In [38]:
duplicate_apps = []
unique_apps = []
for app in android_main:
    name = app[0]
    if name not  in unique_apps:
        unique_apps.append(name)
    else:
        duplicate_apps.append(name)
print("Number of unique apps is :", len(unique_apps))
print("Example of duplicate apps are:", duplicate_apps[:5])      
        
        
        

Number of unique apps is : 9659
Example of duplicate apps are: ['Quick PDF Scanner + OCR FREE', 'Box', 'Google My Business', 'ZOOM Cloud Meetings', 'join.me - Simple Meetings']


As we can see there is alot duplicate apps in our dataset and we need to find a way to remove it 

In [40]:
reviews_max = {}

for app in android_main:
    name = app[0]
    n_reviews = float(app[3])
    
    if name in reviews_max and reviews_max[name] < n_reviews:
        reviews_max[name] = n_reviews
        
    elif name not in reviews_max:
        reviews_max[name] = n_reviews
        

In [71]:
print(len(android_main))

10840


In [44]:
android_clean = []
already_added = []
for app in android_main:
    name = app[0]
    n_reviews = float(app[3])
    if (n_reviews == reviews_max[name]) and (name not in already_added):
        android_clean.append(app)
        already_added.append(name)
    
    

In [43]:
print(len(android_clean))

9659


In [45]:
explore_data(android_clean, 0, 3, True)

['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up']


['Sketch - Draw & Paint', 'ART_AND_DESIGN', '4.5', '215644', '25M', '50,000,000+', 'Free', '0', 'Teen', 'Art & Design', 'June 8, 2018', 'Varies with device', '4.2 and up']


Number of rows: 9659
Number of columns: 13


In [27]:
def is_english(string):
    
    for character in string:
        if ord(character) > 127:
            return False
    
    return True

print(is_english('Instagram'))
print(is_english('爱奇艺PPS -《欢乐颂2》电视剧热播'))

True
False


In [28]:
print(is_english('Docs To Go™ Free Office Suite'))
print(is_english('Instachat 😜'))

print(ord('™'))
print(ord('😜'))

False
False
8482
128540


In [46]:
def is_english(string):
    non_ascii = 0  # Counter for non-ASCII characters
    
    for character in string:
        if ord(character) > 127:  # Check if character is non-ASCII
            non_ascii += 1
    
    if non_ascii > 3:  # If more than 3 non-ASCII characters are found
        return False  # Return False, indicating non-English
    else:
        return True  # Return True, indicating English

# Test cases
print(is_english('Docs To Go™ Free Office Suite'))  # Contains non-ASCII character (™)
print(is_english('Instachat 😜'))  # Contains non-ASCII characters (😜)


True
True


In [48]:
android_english = []  # Create an empty list to store English Android apps

ios_english = []  # Create an empty list to store English iOS apps

# Iterate over each app in the 'android_clean' list
for app in android_clean:
    name = app[0]  # Extract the app name from the current Android app
    if is_english(name):  # Check if the app name is in English
        android_english.append(app)  # If yes, add the app to the 'android_english' list

# Iterate over each app in the 'ios' list
for app in ios:
    name = app[1]  # Extract the app name from the current iOS app
    if is_english(name):  # Check if the app name is in English
        ios_english.append(app)  # If yes, add the app to the 'ios_english' list

# Call the 'explore_data' function to display information about the Android English apps
explore_data(android_english, 0, 3, True)
print('\n')
# Call the 'explore_data' function to display information about the iOS English apps
explore_data(ios_english, 0, 3, True)


['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up']


['Sketch - Draw & Paint', 'ART_AND_DESIGN', '4.5', '215644', '25M', '50,000,000+', 'Free', '0', 'Teen', 'Art & Design', 'June 8, 2018', 'Varies with device', '4.2 and up']


Number of rows: 9614
Number of columns: 13


['id', 'track_name', 'size_bytes', 'currency', 'price', 'rating_count_tot', 'rating_count_ver', 'user_rating', 'user_rating_ver', 'ver', 'cont_rating', 'prime_genre', 'sup_devices.num', 'ipadSc_urls.num', 'lang.num', 'vpp_lic']


['284882215', 'Facebook', '389879808', 'USD', '0.0', '2974676', '212', '3.5', '3.5', '95.0', '4+', 'Social Networking', '37', '1', '29', '1']


['389801252', 'Instagr

In [31]:
android_final = []
ios_final = []

for app in android_english:
    price = app[7]
    if price == '0':
        android_final.append(app)
        
for app in ios_english:
    price = app[4]
    if price == '0.0':
        ios_final.append(app)
        
print(len(android_final))
print(len(ios_final))

0
3222


As previously mentioned, our focus is on developing free-to-download and install apps, with our primary revenue stream coming from in-app advertisements. Our datasets consist of both free and non-free apps, but for our analysis, we need to filter and isolate only the free apps. In the following code, we accomplish this task by extracting and segregating the free apps from our datasets.

In [49]:
android_final = []  # Create an empty list to store the final filtered Android apps

ios_final = []  # Create an empty list to store the final filtered iOS apps

# Iterate over each app in the 'android_english' list
for app in android_english:
    price = app[7]  # Extract the price of the app from the current Android app
    if price == '0':  # Check if the app is free (price is '0')
        android_final.append(app)  # If yes, add the app to the 'android_final' list

# Iterate over each app in the 'ios_english' list
for app in ios_english:
    price = app[4]  # Extract the price of the app from the current iOS app
    if price == '0.0':  # Check if the app is free (price is '0.0')
        ios_final.append(app)  # If yes, add the app to the 'ios_final' list

print(len(android_final))  # Print the number of free Android apps in the 'android_final' list
print(len(ios_final))  # Print the number of free iOS apps in the 'ios_final' list


8864
3222




The goal of this analysis is to identify app genres that have a higher likelihood of attracting more users, as the revenue of the company is heavily influenced by the app's user base. The validation strategy involves three steps: starting with a minimal Android version of the app on Google Play, further developing it if it receives positive user feedback, and eventually creating an iOS version for the App Store if it proves to be profitable within six months. To achieve the objective of targeting both markets, it is necessary to find app profiles that are successful on both platforms.

The analysis begins by examining the most common genres in each market. Frequency tables will be created for the "prime_genre" column in the App Store dataset, and the "Genres" and "Category" columns in the Google Play dataset. Two functions will be developed to analyze these frequency tables: one to generate percentage-based frequency tables and another to display the percentages in descending order.

In summary, this analysis aims to identify app genres that have a high potential for success on both the App Store and Google Play.

In [50]:
def freq_table(dataset, index):
    table = {}  # Create an empty dictionary to store the frequency of values
    total = 0  # Variable to keep track of the total number of rows in the dataset

    # Iterate over each row in the dataset
    for row in dataset:
        total += 1  # Increment the total count for each row
        value = row[index]  # Get the value at the specified index in the row

        # Check if the value is already present in the table
        if value in table:
            table[value] += 1  # If it is present, increment its frequency
        else:
            table[value] = 1  # If it is not present, initialize its frequency to 1

    table_percentages = {}  # Create a new dictionary to store the percentages of each value
    for key in table:
        percentage = (table[key] / total) * 100  # Calculate the percentage for each value
        table_percentages[key] = percentage  # Store the percentage in the new dictionary

    return table_percentages  # Return the dictionary containing value frequencies as percentages


def display_table(dataset, index):
    table = freq_table(dataset, index)  # Get the frequency table using the freq_table function
    table_display = []  # Create an empty list to store the formatted table

    # Iterate over each key in the table dictionary
    for key in table:
        key_val_as_tuple = (table[key], key)  # Create a tuple with the frequency and value
        table_display.append(key_val_as_tuple)  # Append the tuple to the table_display list

    table_sorted = sorted(table_display, reverse=True)  # Sort the table in descending order of frequency

    # Iterate over each entry in the sorted table and display it
    for entry in table_sorted:
        print(entry[1], ':', entry[0])  # Print the value and its frequency


Let's begin by analyzing the frequency table pertaining to the "prime_genre" column within the App Store dataset.

In [51]:
display_table(ios_final, -5)

Games : 58.16263190564867
Entertainment : 7.883302296710118
Photo & Video : 4.9658597144630665
Education : 3.662321539416512
Social Networking : 3.2898820608317814
Shopping : 2.60707635009311
Utilities : 2.5139664804469275
Sports : 2.1415270018621975
Music : 2.0484171322160147
Health & Fitness : 2.0173805090006205
Productivity : 1.7380509000620732
Lifestyle : 1.5828677839851024
News : 1.3345747982619491
Travel : 1.2414649286157666
Finance : 1.1173184357541899
Weather : 0.8690254500310366
Food & Drink : 0.8069522036002483
Reference : 0.5586592178770949
Business : 0.5276225946617008
Book : 0.4345127250155183
Navigation : 0.186219739292365
Medical : 0.186219739292365
Catalogs : 0.12414649286157665


Among the free English apps, it can be observed that games constitute more than half (58.16%) of the total. Entertainment apps account for nearly 8%, followed closely by photo and video apps at around 5%. Educational apps represent only 3.66% of the total, followed by social networking apps at 3.29%.

The overall impression suggests that the App Store, specifically the section containing free English apps, is predominantly filled with apps designed for recreational purposes such as games, entertainment, photo and video, social networking, sports, and music, among others. On the other hand, apps with practical uses like education, shopping, utilities, productivity, and lifestyle are comparatively less common. However, it is important to note that while fun apps may be abundant, it doesn't necessarily indicate that they have the highest number of users, as demand may not align with the supply.

Let's proceed by analyzing the Genres and Category columns within the Google Play dataset, as they appear to have a connection.

In [52]:
display_table(android_final, 1) # Category

FAMILY : 18.907942238267147
GAME : 9.724729241877256
TOOLS : 8.461191335740072
BUSINESS : 4.591606498194946
LIFESTYLE : 3.9034296028880866
PRODUCTIVITY : 3.892148014440433
FINANCE : 3.7003610108303246
MEDICAL : 3.531137184115524
SPORTS : 3.395758122743682
PERSONALIZATION : 3.3167870036101084
COMMUNICATION : 3.2378158844765346
HEALTH_AND_FITNESS : 3.0798736462093865
PHOTOGRAPHY : 2.944494584837545
NEWS_AND_MAGAZINES : 2.7978339350180503
SOCIAL : 2.6624548736462095
TRAVEL_AND_LOCAL : 2.33528880866426
SHOPPING : 2.2450361010830324
BOOKS_AND_REFERENCE : 2.1435018050541514
DATING : 1.861462093862816
VIDEO_PLAYERS : 1.7937725631768955
MAPS_AND_NAVIGATION : 1.3989169675090252
FOOD_AND_DRINK : 1.2409747292418771
EDUCATION : 1.1620036101083033
ENTERTAINMENT : 0.9589350180505415
LIBRARIES_AND_DEMO : 0.9363718411552346
AUTO_AND_VEHICLES : 0.9250902527075812
HOUSE_AND_HOME : 0.8235559566787004
WEATHER : 0.8009927797833934
EVENTS : 0.7107400722021661
PARENTING : 0.6543321299638989
ART_AND_DESIGN : 


When looking at Google Play, it becomes evident that the app ecosystem has distinct characteristics. There is a conspicuous absence of apps focused on entertainment, and a significant proportion of the available apps cater to practical purposes like family, tools, business, lifestyle, productivity, and other similar categories. However, delving deeper into the analysis reveals that the family category, accounting for approximately 19% of the apps, predominantly comprises games specifically created for children.

However, when considering practical applications, it appears that Google Play has a stronger presence compared to the App Store. This observation is further supported by the frequency table displayed for the Genres column.

In [55]:
display_table(android_final, -4)

Tools : 8.449909747292418
Entertainment : 6.069494584837545
Education : 5.347472924187725
Business : 4.591606498194946
Productivity : 3.892148014440433
Lifestyle : 3.892148014440433
Finance : 3.7003610108303246
Medical : 3.531137184115524
Sports : 3.463447653429603
Personalization : 3.3167870036101084
Communication : 3.2378158844765346
Action : 3.1024368231046933
Health & Fitness : 3.0798736462093865
Photography : 2.944494584837545
News & Magazines : 2.7978339350180503
Social : 2.6624548736462095
Travel & Local : 2.3240072202166067
Shopping : 2.2450361010830324
Books & Reference : 2.1435018050541514
Simulation : 2.0419675090252705
Dating : 1.861462093862816
Arcade : 1.8501805054151623
Video Players & Editors : 1.7712093862815883
Casual : 1.7599277978339352
Maps & Navigation : 1.3989169675090252
Food & Drink : 1.2409747292418771
Puzzle : 1.128158844765343
Racing : 0.9927797833935018
Role Playing : 0.9363718411552346
Libraries & Demo : 0.9363718411552346
Auto & Vehicles : 0.9250902527075

Although the distinction between the Genres and Category columns is not completely transparent, it is evident that the Genres column provides a more detailed breakdown with a greater number of categories. However, for the purpose of our analysis, we will solely focus on the Category column in order to gain a broader understanding.

Thus far, our findings have revealed that the App Store primarily consists of entertainment-oriented applications, whereas Google Play offers a more diverse selection encompassing both practical and recreational apps. Now, our objective is to determine the types of apps that enjoy the largest user base.

Most Popular Apps by Genre on the App Store


To determine the most popular genres in terms of user base, one approach is to calculate the average number of installations for each app genre. In the case of the Google Play dataset, this information can be obtained from the Installs column. However, for the App Store dataset, the Installs column is unavailable. As an alternative, we will utilize the total number of user ratings as an approximation, which can be found in the rating_count_tot field.

Below, we present the calculation of the average number of user ratings per app genre on the App Store.

In [58]:
# Generate frequency table of genres in the ios_final dataset
genres_ios = freq_table(ios_final, -5)

# Iterate over each genre in the frequency table
for genre in genres_ios:
    total = 0
    len_genre = 0

    # Iterate over each app in the ios_final dataset
    for app in ios_final:
        # Get the genre of the current app
        genre_app = app[-5]

        # Check if the genre matches the current genre being iterated over
        if genre_app == genre:
            # Convert the number of ratings to float and add it to the total
            n_ratings = float(app[5])
            total += n_ratings

            # Increment the count of apps in the genre
            len_genre += 1

    # Calculate the average number of ratings for the current genre
    avg_n_ratings = total / len_genre

    # Print the genre and its corresponding average number of ratings
    print(genre, ':', avg_n_ratings)



Social Networking : 71548.34905660378
Photo & Video : 28441.54375
Games : 22788.6696905016
Music : 57326.530303030304
Reference : 74942.11111111111
Health & Fitness : 23298.015384615384
Weather : 52279.892857142855
Utilities : 18684.456790123455
Travel : 28243.8
Shopping : 26919.690476190477
News : 21248.023255813954
Navigation : 86090.33333333333
Lifestyle : 16485.764705882353
Entertainment : 14029.830708661417
Food & Drink : 33333.92307692308
Sports : 23008.898550724636
Book : 39758.5
Finance : 31467.944444444445
Education : 7003.983050847458
Productivity : 21028.410714285714
Business : 7491.117647058823
Catalogs : 4004.0
Medical : 612.0


On average, navigation apps have the highest number of user reviews, but this figure is heavily influenced by Waze and Google Maps, which have close to half a million user reviews together:

In [59]:
for app in ios_final:
    if app[-5] == 'Navigation':
        print(app[1], ':', app[5]) # print name and number of ratings

Waze - GPS Navigation, Maps & Real-time Traffic : 345046
Google Maps - Navigation & Transit : 154911
Geocaching® : 12811
CoPilot GPS – Car Navigation & Offline Maps : 3582
ImmobilienScout24: Real Estate Search in Germany : 187
Railway Route Search : 5


Social networking apps and music apps exhibit a similar trend where the average number of ratings is significantly influenced by a handful of dominant players like Facebook, Pinterest, Skype, Pandora, Spotify, and Shazam. As a result, the perceived popularity of these genres, particularly navigation, social networking, and music apps, may be inflated. The average ratings are heavily skewed by a small number of apps with a large user base, often reaching hundreds of thousands of ratings, while the majority of other apps struggle to surpass the 10,000 rating threshold. To gain a more accurate understanding of popularity, it would be beneficial to exclude these extremely popular apps within each genre and recalculate the averages. However, we will postpone this level of analysis for future consideration.

On average, reference apps receive 74,942 user ratings. It's important to note that the average rating is influenced by specific apps such as the Bible and Dictionary.com, which significantly contribute to the higher average rating.

In [60]:
for app in ios_final:
    if app[-5] == 'Reference':
        print(app[1], ':', app[5])

Bible : 985920
Dictionary.com Dictionary & Thesaurus : 200047
Dictionary.com Dictionary & Thesaurus for iPad : 54175
Google Translate : 26786
Muslim Pro: Ramadan 2017 Prayer Times, Azan, Quran : 18418
New Furniture Mods - Pocket Wiki & Game Tools for Minecraft PC Edition : 17588
Merriam-Webster Dictionary : 16849
Night Sky : 12122
City Maps for Minecraft PE - The Best Maps for Minecraft Pocket Edition (MCPE) : 8535
LUCKY BLOCK MOD ™ for Minecraft PC Edition - The Best Pocket Wiki & Mods Installer Tools : 4693
GUNS MODS for Minecraft PC Edition - Mods Tools : 1497
Guides for Pokémon GO - Pokemon GO News and Cheats : 826
WWDC : 762
Horror Maps for Minecraft PE - Download The Scariest Maps for Minecraft Pocket Edition (MCPE) Free : 718
VPN Express : 14
Real Bike Traffic Rider Virtual Reality Glasses : 8
教えて!goo : 0
Jishokun-Japanese English Dictionary & Translator : 0


The niche of turning popular books into feature-rich apps shows potential. By enhancing the raw version of a book with additional elements like daily quotes, audio versions, quizzes, and an embedded dictionary, we can create an app that stands out in the saturated for-fun app market dominated by the App Store. This practical approach has a higher chance of attracting attention among the vast number of existing apps.

While other popular genres like weather, food and drink, and finance exist, they don't align with our interests and capabilities:

Weather apps have limited user engagement and profit potential through in-app ads, and acquiring reliable live weather data may involve connecting with non-free APIs.
Food and drink apps, exemplified by major brands like Starbucks and McDonald's, require cooking and delivery services, which fall outside our company's scope.
Finance apps involve complex financial operations that require specialized domain knowledge, which we currently lack.
Next, let's delve into an analysis of the Google Play market.

Most Popular Apps by Genre on Google Play

When it comes to the Google Play market, we have access to data regarding the number of installs, which can provide us with a better understanding of genre popularity. However, it's worth noting that the install numbers available to us are not highly precise. In many cases, the values are presented in open-ended ranges such as "100+", "1,000+", "5,000+", and so on.

In [61]:
display_table(android_final, 5) # the Installs columns

1,000,000+ : 15.726534296028879
100,000+ : 11.552346570397113
10,000,000+ : 10.548285198555957
10,000+ : 10.198555956678701
1,000+ : 8.393501805054152
100+ : 6.915613718411552
5,000,000+ : 6.825361010830325
500,000+ : 5.561823104693141
50,000+ : 4.7721119133574
5,000+ : 4.512635379061372
10+ : 3.5424187725631766
500+ : 3.2490974729241873
50,000,000+ : 2.3014440433213
100,000,000+ : 2.1322202166064983
50+ : 1.917870036101083
5+ : 0.78971119133574
1+ : 0.5076714801444043
500,000,000+ : 0.2707581227436823
1,000,000,000+ : 0.22563176895306858
0+ : 0.04512635379061372
0 : 0.01128158844765343


The available data lacks precision regarding the number of installations for apps labeled with "100,000+" installs. We don't require exact precision for our goal of understanding which app genres attract the most users. To work with the data, we will consider apps with "100,000+" installs to have 100,000 installs, and "1,000,000+" installs to have 1,000,000 installs.

To perform calculations, we need to convert the installation numbers to float values. This requires removing commas and plus characters from the numbers to avoid conversion errors. We will handle this task within the provided loop, where we will also compute the average number of installs for each app genre (category).

In [62]:
categories_android = freq_table(android_final, 1)

# Iterate through each category in the `categories_android` list
for category in categories_android:
    total = 0  # Variable to store the total number of installs for a category
    len_category = 0  # Variable to store the number of apps in a category
    # Iterate through each app in the `android_final` list
    for app in android_final:
        category_app = app[1]  # Get the category of the current app
        if category_app == category:
            n_installs = app[5]  # Get the number of installs of the current app
            # Remove any commas and plus signs from the `n_installs` value
            n_installs = n_installs.replace(',', '')
            n_installs = n_installs.replace('+', '')
            total += float(n_installs)  # Add the number of installs to the total
            len_category += 1  # Increment the count of apps in the category
    avg_n_installs = total / len_category  # Calculate the average number of installs for the category
    print(category, ':', avg_n_installs)  # Print the category and its average number of installs


ART_AND_DESIGN : 1986335.0877192982
AUTO_AND_VEHICLES : 647317.8170731707
BEAUTY : 513151.88679245283
BOOKS_AND_REFERENCE : 8767811.894736841
BUSINESS : 1712290.1474201474
COMICS : 817657.2727272727
COMMUNICATION : 38456119.167247385
DATING : 854028.8303030303
EDUCATION : 1833495.145631068
ENTERTAINMENT : 11640705.88235294
EVENTS : 253542.22222222222
FINANCE : 1387692.475609756
FOOD_AND_DRINK : 1924897.7363636363
HEALTH_AND_FITNESS : 4188821.9853479853
HOUSE_AND_HOME : 1331540.5616438356
LIBRARIES_AND_DEMO : 638503.734939759
LIFESTYLE : 1437816.2687861272
GAME : 15588015.603248259
FAMILY : 3695641.8198090694
MEDICAL : 120550.61980830671
SOCIAL : 23253652.127118643
SHOPPING : 7036877.311557789
PHOTOGRAPHY : 17840110.40229885
SPORTS : 3638640.1428571427
TRAVEL_AND_LOCAL : 13984077.710144928
TOOLS : 10801391.298666667
PERSONALIZATION : 5201482.6122448975
PRODUCTIVITY : 16787331.344927534
PARENTING : 542603.6206896552
WEATHER : 5074486.197183099
VIDEO_PLAYERS : 24727872.452830188
NEWS_AND_

Communication apps, on average, have a high number of installations, with an average of 38,456,119 installs. This average is largely influenced by a handful of apps that have achieved over one billion installations, such as WhatsApp, Facebook Messenger, Skype, Google Chrome, Gmail, and Hangouts. Additionally, there are several other communication apps with significant installations ranging from 100 to 500 million.

In [63]:
for app in android_final:
    if app[1] == 'COMMUNICATION' and (app[5] == '1,000,000,000+'
                                      or app[5] == '500,000,000+'
                                      or app[5] == '100,000,000+'):
        print(app[0], ':', app[5])

WhatsApp Messenger : 1,000,000,000+
imo beta free calls and text : 100,000,000+
Android Messages : 100,000,000+
Google Duo - High Quality Video Calls : 500,000,000+
Messenger – Text and Video Chat for Free : 1,000,000,000+
imo free video calls and chat : 500,000,000+
Skype - free IM & video calls : 1,000,000,000+
Who : 100,000,000+
GO SMS Pro - Messenger, Free Themes, Emoji : 100,000,000+
LINE: Free Calls & Messages : 500,000,000+
Google Chrome: Fast & Secure : 1,000,000,000+
Firefox Browser fast & private : 100,000,000+
UC Browser - Fast Download Private & Secure : 500,000,000+
Gmail : 1,000,000,000+
Hangouts : 1,000,000,000+
Messenger Lite: Free Calls & Messages : 100,000,000+
Kik : 100,000,000+
KakaoTalk: Free Calls & Text : 100,000,000+
Opera Mini - fast web browser : 100,000,000+
Opera Browser: Fast and Secure : 100,000,000+
Telegram : 100,000,000+
Truecaller: Caller ID, SMS spam blocking & Dialer : 100,000,000+
UC Browser Mini -Tiny Fast Private & Secure : 100,000,000+
Viber Mess

If we removed all the communication apps that have over 100 million installs, the average would be reduced roughly ten times:

In [64]:
under_100_m = []

for app in android_final:
    n_installs = app[5]
    n_installs = n_installs.replace(',', '')
    n_installs = n_installs.replace('+', '')
    if (app[1] == 'COMMUNICATION') and (float(n_installs) < 100000000):
        under_100_m.append(float(n_installs))
        

In [65]:
sum(under_100_m) / len(under_100_m)

3603485.3884615386

The video players category follows a similar pattern with 24,727,872 installations, primarily dominated by popular apps like Youtube, Google Play Movies & TV, and MX Player. This trend of dominance by a few major players is also observed in other genres such as social apps (Facebook, Instagram, Google+), photography apps (Google Photos, popular photo editors), and productivity apps (Microsoft Word, Dropbox, Google Calendar, Evernote). While these genres appear popular, it raises concerns about the difficulty of competing against these established giants. The gaming genre is popular but saturated, so alternative app recommendations are sought. On the other hand, the books and reference genre shows promise, with an average of 8,767,811 installations, making it worth exploring further as it demonstrates potential for profitability on both the App Store and Google Play.

In [66]:
for app in android_final:
    if app[1] == 'BOOKS_AND_REFERENCE':
        print(app[0], ':', app[5])

E-Book Read - Read Book for free : 50,000+
Download free book with green book : 100,000+
Wikipedia : 10,000,000+
Cool Reader : 10,000,000+
Free Panda Radio Music : 100,000+
Book store : 1,000,000+
FBReader: Favorite Book Reader : 10,000,000+
English Grammar Complete Handbook : 500,000+
Free Books - Spirit Fanfiction and Stories : 1,000,000+
Google Play Books : 1,000,000,000+
AlReader -any text book reader : 5,000,000+
Offline English Dictionary : 100,000+
Offline: English to Tagalog Dictionary : 500,000+
FamilySearch Tree : 1,000,000+
Cloud of Books : 1,000,000+
Recipes of Prophetic Medicine for free : 500,000+
ReadEra – free ebook reader : 1,000,000+
Anonymous caller detection : 10,000+
Ebook Reader : 5,000,000+
Litnet - E-books : 100,000+
Read books online : 5,000,000+
English to Urdu Dictionary : 500,000+
eBoox: book reader fb2 epub zip : 1,000,000+
English Persian Dictionary : 500,000+
Flybook : 500,000+
All Maths Formulas : 1,000,000+
Ancestry : 5,000,000+
HTC Help : 10,000,000+
E

The book and reference genre encompasses a range of apps, including software for ebook processing and reading, diverse library collections, dictionaries, programming and language tutorials, and more. However, the average number of installations in this genre is influenced by a few highly popular apps that have a significant impact on the overall average.

In [67]:
for app in android_final:
    if app[1] == 'BOOKS_AND_REFERENCE' and (app[5] == '1,000,000,000+'
                                            or app[5] == '500,000,000+'
                                            or app[5] == '100,000,000+'):
        print(app[0], ':', app[5])

Google Play Books : 1,000,000,000+
Bible : 100,000,000+
Amazon Kindle : 100,000,000+
Wattpad 📖 Free Books : 100,000,000+
Audiobooks from Audible : 100,000,000+


Nevertheless, it appears that the market for book and reference apps is characterized by a limited number of highly popular applications, indicating potential opportunities. To generate app ideas, let's focus on apps that fall within the moderate popularity range, with downloads ranging from 1,000,000 to 100,000,000.

In [68]:
for app in android_final:
    if app[1] == 'BOOKS_AND_REFERENCE' and (app[5] == '1,000,000+'
                                            or app[5] == '5,000,000+'
                                            or app[5] == '10,000,000+'
                                            or app[5] == '50,000,000+'):
        print(app[0], ':', app[5])

Wikipedia : 10,000,000+
Cool Reader : 10,000,000+
Book store : 1,000,000+
FBReader: Favorite Book Reader : 10,000,000+
Free Books - Spirit Fanfiction and Stories : 1,000,000+
AlReader -any text book reader : 5,000,000+
FamilySearch Tree : 1,000,000+
Cloud of Books : 1,000,000+
ReadEra – free ebook reader : 1,000,000+
Ebook Reader : 5,000,000+
Read books online : 5,000,000+
eBoox: book reader fb2 epub zip : 1,000,000+
All Maths Formulas : 1,000,000+
Ancestry : 5,000,000+
HTC Help : 10,000,000+
Moon+ Reader : 10,000,000+
English-Myanmar Dictionary : 1,000,000+
Golden Dictionary (EN-AR) : 1,000,000+
All Language Translator Free : 1,000,000+
Aldiko Book Reader : 10,000,000+
Dictionary - WordWeb : 5,000,000+
50000 Free eBooks & Free AudioBooks : 5,000,000+
Al-Quran (Free) : 10,000,000+
Al Quran Indonesia : 10,000,000+
Al'Quran Bahasa Indonesia : 10,000,000+
Al Quran Al karim : 1,000,000+
Al Quran : EAlim - Translations & MP3 Offline : 5,000,000+
Koran Read &MP3 30 Juz Offline : 1,000,000+
H


The book and reference niche is currently dominated by ebook processing and reading software, library collections, and dictionaries. Consequently, it might not be advisable to develop similar apps as competition in this area is significant. Interestingly, we observe a considerable number of apps centered around the book Quran, indicating the potential profitability of building an app based on a popular book. This suggests that transforming a well-liked, potentially more recent book into an app could be lucrative in both the Google Play and App Store markets. However, since the market is already saturated with libraries, it would be essential to incorporate additional distinctive features beyond the basic book version. These could include daily quotes from the book, an audio rendition of the book, book-related quizzes, a discussion forum for users, and more.

Conclusion 

After analyzing data from the App Store and Google Play mobile apps, we have determined that creating an app based on a popular book, preferably a recent one, could be a lucrative venture in both markets. However, due to the existing abundance of library apps, incorporating unique elements beyond the book itself is essential. These could include daily quotes sourced from the book, an audio rendition, interactive quizzes, and a dedicated forum enabling users to engage in discussions centered around the book.