# Analysis of App User Data in App Store and Google Play

The aim of this project is to analyze data on popular apps in order to recommend a new, profitable application for the English-language App Store and Google Play markets.

My analysis will focus on apps that are in English, free to download and install, and where the main source of revenue consists of in-app ads. The goal is to showcase and analyze mobile app data to understand what kinds of apps attract more users. The analysis should inform my decision about what to build, since more user engagement results in more ad revenue.

## In the code below, I will:

Data Exploration

1. Choose a Representative Sample of Data
2. Open and Explore the Data

Data Cleaning

3. Delete Unnecessary Data
4. Remove Duplicate Entries
5. Remove Non-English Apps
6. Isolate Free Apps

Data Analysis

7. Group Most Used Apps by Genre
8. Analyze App Store and Google Play Markets Most Popular Apps by Genre
9. Provide a Conclusion of my Analysis

### Data Exploration

### 1. Choose a Representative Sample of Data:

There were approximately 2 million apps available in the App Store and 2.1 million in Google Play as of September 2018. Collecting data on all of these apps would take considerable time and money, so I will instead use an existing, free, representative sample of this data. This data can be accessed at no cost at [Kaggle.com](https://www.kaggle.com/ramamet4/app-store-apple-data-set-10k-apps).

### 2. Open and Explore the Data

#### Open Apple Store and Google Play Data

In [None]:
from csv import reader

### Apple Store Data ###
opened_file = open('AppleStore.csv', encoding='utf8')  # explicit encoding: app names contain non-ASCII characters
read_file = reader(opened_file)
ios_apps_data = list(read_file)
ios_header = ios_apps_data[0]
ios_apps_data = ios_apps_data[1:]

print(ios_apps_data[0:5])

print('\n') 

### Google Play Data ###
opened_file = open('googleplaystore.csv', encoding='utf8')  # explicit encoding: app names contain non-ASCII characters
read_file = reader(opened_file)
android_apps_data = list(read_file)
android_header = android_apps_data[0]
android_apps_data = android_apps_data[1:]
      
print(android_apps_data[0:5])
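
The two loading steps above follow the same pattern, so they could be wrapped in a small helper. The sketch below is only one possible way to do this: the function name `load_csv` and the temporary sample file are hypothetical and exist only to make the example self-contained.

```python
import os
import tempfile
from csv import reader


def load_csv(path, encoding='utf8'):
    """Read a CSV file and return (header, rows) as lists of strings."""
    # The context manager closes the file even if reading fails.
    with open(path, encoding=encoding, newline='') as opened_file:
        rows = list(reader(opened_file))
    return rows[0], rows[1:]


# Demonstration on a small temporary file with made-up contents.
with tempfile.NamedTemporaryFile('w', suffix='.csv', delete=False,
                                 encoding='utf8', newline='') as tmp:
    tmp.write('App,Category\nInstagram,SOCIAL\n')
    sample_path = tmp.name

header, rows = load_csv(sample_path)
os.remove(sample_path)

print(header)
print(rows)
```

With the real files, the equivalent calls would be `load_csv('AppleStore.csv')` and `load_csv('googleplaystore.csv')`. Note that the helper raises an `IndexError` on an empty file, because there is no header row to return.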

#### Explore Apple Store and Google Play Data

##### Apple Store

In [None]:
def explore_data(dataset, start, end, rows_and_columns = False):
    dataset_slice = dataset[start:end]    
    for row in dataset_slice:
        print(row)
        print('\n')

    if rows_and_columns:
        print('Number of rows:', len(dataset))
        print('Number of columns:', len(dataset[0]))
        
print(ios_header)
print('\n')

explore_data(ios_apps_data, 0, 5, True)

As can be seen from the data set, after excluding the header row, there are 7196 iOS apps in the dataset and 17 columns. For the purpose of this project the columns that may provide the most useful information are: `'price'` and `'currency'`, `'user_rating'` and `'rating_count_tot'`, `'lang.num'`, `'prime_genre'` and `'track_name'`. Not all column names are self-explanatory in this data, but details about each column can be found in the data set [documentation](https://www.kaggle.com/ramamet4/app-store-apple-data-set-10k-apps/home).

##### Google Play

In [None]:
print(android_header)
print('\n')
explore_data(android_apps_data, 0, 5, True)

##### Google Play data set 

As can be seen from the data set, after excluding the header row, there are 10840 Android apps in the dataset and 13 columns. For the purpose of this project the columns that may provide the most useful information are: `'Price'` and `'Type'`, `'Rating'` and `'Reviews'`, `'Genres'` and `'App'`. As with the iOS data, more details about this data set can be found in its [discussion section](https://www.kaggle.com/lava18/google-play-store-apps/discussion).

### Data Cleaning

To make my dataset more useful and representative I have to review it for inconsistencies, duplicate data, or unnecessary/misleading information. This will improve the quality of my analysis and perhaps even simplify it. The discussion sections available for these datasets may provide a lot of insight into this.

### 3. Delete Unnecessary Data

In order to clean our data, I will delete the data that I do not want to include in our analysis because it is inaccurate, misleading, duplicated, etc.

##### Google Play data set 

I can see that [one of the discussions](https://www.kaggle.com/lava18/google-play-store-apps/discussion/66015) outlines an error for row 10472. Let's print this row and compare it against the header and another row that is correct.

In [None]:
print(android_header)
print('\n')
print(android_apps_data[10472])
print('\n')
explore_data(android_apps_data, 0, 3, True)

##### Google Play data set 

As can be seen from the comparison of the rows, row `10472`, which corresponds to the app `'Life Made WI-Fi Touchscreen Photo Frame'`, is missing a value for its `'Category'` column. The remaining values are therefore shifted one column to the left, which produces a `'Rating'` of `'19'`, well above the maximum of 5 that is available for Android apps. As a result, I should delete this row, as it is inaccurate and would skew my analysis if I chose to analyze app ratings as a possible profitability indicator.

In [None]:
print(len(android_apps_data))
del android_apps_data[10472] 
print(len(android_apps_data))
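
Rows like this can also be found programmatically rather than through a forum post. The sketch below is one possible check, not part of the original workflow: it flags every row whose number of columns differs from the header's. The function name `find_malformed_rows` and the sample rows are hypothetical.

```python
def find_malformed_rows(dataset, header):
    """Return (index, row) pairs whose column count differs from the header's."""
    expected = len(header)
    return [(i, row) for i, row in enumerate(dataset) if len(row) != expected]


# Hypothetical miniature data set: the second row is missing one value.
sample_header = ['App', 'Category', 'Rating']
sample_rows = [
    ['Photo Editor', 'ART_AND_DESIGN', '4.1'],
    ['Broken Row', '1.9'],
    ['Sketch Pad', 'ART_AND_DESIGN', '4.5'],
]

for index, row in find_malformed_rows(sample_rows, sample_header):
    print(index, row)
```

Run before the deletion, `find_malformed_rows(android_apps_data, android_header)` would be expected to report index 10472, assuming that row really is one value short. Note also that re-running the deletion cell removes a second, valid row, so it should be executed only once.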

### 4. Remove Duplicate Data

To continue cleaning the data, I will delete the data that is duplicated or shows up more than once. In order to do that, I must first find the duplicate entries to make a decision about what to keep and what to delete. I can do this with code that gives me the number of duplicate entries and their names.

##### Apple Store and Google Play data set 

In [None]:
duplicate_ios_apps = []
unique_ios_apps = []

duplicate_android_apps = []
unique_android_apps = []

for app in ios_apps_data:
    name = app[2]
    if name in unique_ios_apps:
        duplicate_ios_apps.append(name)
    else:
        unique_ios_apps.append(name)

for app in android_apps_data:
    name = app[0]
    if name in unique_android_apps:
        duplicate_android_apps.append(name)
    else:
        unique_android_apps.append(name)
        
print('Number of duplicate iOS apps:', len(duplicate_ios_apps))
print('\n')
print('Examples of duplicate iOS apps:', duplicate_ios_apps[:5])

print('\n')

print('Number of duplicate Android apps:', len(duplicate_android_apps))
print('\n')
print('Examples of duplicate Android apps:', duplicate_android_apps[:30])
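
The loops above test membership in a list, which scans the whole list on every check. As a hedged alternative, the same count can be obtained in a single pass with `collections.Counter` from the standard library. The function name `count_duplicates` and the sample names are hypothetical.

```python
from collections import Counter


def count_duplicates(names):
    """Map each repeated name to its number of extra occurrences."""
    counts = Counter(names)
    return {name: n - 1 for name, n in counts.items() if n > 1}


# Made-up app names for illustration.
sample_names = ['Instagram', 'Box', 'Instagram', 'Slack', 'Box', 'Instagram']

extras = count_duplicates(sample_names)
print(extras)
print('Number of duplicate entries:', sum(extras.values()))
```

Applied to the notebook's data, the input would be the list of names, for example `[app[0] for app in android_apps_data]`. The sum of the extra occurrences corresponds to the length of the `duplicate_android_apps` list built above.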

If I look at the Android duplicate data, I find that one of those duplicate apps is `'Instagram'`.

In [None]:
for app in android_apps_data:
    name = app[0]
    
    if name == 'Instagram':
        print(app)

##### Choosing which apps to keep in our data set

To avoid counting apps twice in the data I need to remove the duplicate entries and keep only one entry per app. There are multiple ways to select the entry I will keep. One possibility is to randomly choose one of the duplicates to remain in the data set. A better way is to explore the duplicates for some of the entries and try to find a pattern that may help me select the best representative entry for that app.

For example, the Instagram duplicates differ mainly in the fourth position of each row, which corresponds to the number of reviews. The different numbers show that the data was collected at different times. I can use this to build a criterion for keeping rows. I won't remove rows randomly, but rather I'll keep the rows that have the highest number of reviews because the higher the number of reviews, the more reliable the ratings.

To do that, I will:

1. Create a dictionary where each key is a unique app name, and the value is the highest number of reviews of that app
2. Use the dictionary to create a new data set, which will have only one entry per app (and I only select the apps with the highest number of reviews)

##### Building the dictionary

In [None]:
android_max_reviews = {}

for app in android_apps_data:
    name = app[0]
    n_reviews = float(app[3])
    
    if name in android_max_reviews and android_max_reviews[name] < n_reviews:
        android_max_reviews[name] = n_reviews
        
    elif name not in android_max_reviews:
        android_max_reviews[name] = n_reviews

##### Length of dictionary

In a previous code cell, I found that there are 1,181 cases where an app occurs more than once, so the length of the dictionary (of unique apps) should be equal to the difference between the length of our data set and 1,181.

In [None]:
print('Expected Android length:', len(android_apps_data) - 1181)
print('\n')
print('Actual Android length:', len(android_max_reviews))

Now, I will use the `android_max_reviews` dictionary to remove the duplicates. For the duplicate cases, I'll only keep the entries with the highest number of reviews. In the code cell below:

1. I will start by initializing two empty lists, `android_clean` and `already_added`.
2. I will loop through the Android data set, and for every iteration:
3. I will isolate the name of the app and the number of reviews.
4. I will add the current row (`app`) to the `android_clean` list, and the app name (`name`) to the `already_added` list if:

    * The number of reviews of the current app matches the number of reviews of that app as described in the `android_max_reviews` dictionary; and
    * The name of the app is not already in the `already_added` list. I need to add this supplementary condition to account for those cases where the highest number of reviews of a duplicate app is the same for more than one entry (for example, the Box app has three entries, and the number of reviews is the same). If I only check for `android_max_reviews[name] == n_reviews`, I will still end up with duplicate entries for some apps.

In [None]:
android_clean = []
already_added = []

for app in android_apps_data:
    name = app[0]
    n_reviews = float(app[3])
    
    if (android_max_reviews[name] == n_reviews) and (name not in already_added):
        android_clean.append(app)
        already_added.append(name)
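
The two passes above (build the dictionary, then filter) can also be combined into one. The following is a minimal sketch under that assumption, not the method used in this notebook: the dictionary stores the best row seen so far for each name instead of only the review count. The function name `keep_most_reviewed` and the sample rows are hypothetical.

```python
def keep_most_reviewed(dataset, name_index, reviews_index):
    """Keep one row per app name: the first row with the highest review count."""
    best = {}
    for row in dataset:
        name = row[name_index]
        n_reviews = float(row[reviews_index])
        # Strict '>' means that on a tie the earlier row is kept.
        if name not in best or n_reviews > float(best[name][reviews_index]):
            best[name] = row
    return list(best.values())


# Made-up rows in the layout [App, Category, Rating, Reviews].
sample_rows = [
    ['Instagram', 'SOCIAL', '4.5', '100'],
    ['Box', 'BUSINESS', '4.2', '50'],
    ['Instagram', 'SOCIAL', '4.5', '120'],
    ['Box', 'BUSINESS', '4.2', '50'],
]

for row in keep_most_reviewed(sample_rows, 0, 3):
    print(row)
```

This selects the same rows as the two-step approach, but the output is ordered by the first appearance of each app name rather than by the position of the kept row.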

It is important to explore the new data set, and confirm that the number of rows is 9,659.

In [None]:
explore_data(android_clean, 0, 5, True)

### 5. Remove Non-English apps

#### Part 1

Because my analysis is geared toward an English-speaking audience, I will explore the data sets looking for other information that may not be useful to me. I must be clear: app data in other languages could provide useful information, but because of language barriers, this may be a project to take on in the future. As anticipated, the names of some of the apps in the dataset suggest they are not directed toward an English-speaking audience. Below, I will showcase a couple of examples from both data sets:

In [None]:
print(ios_apps_data[813][2])   # index 2 holds 'track_name' (index 1 is the app id)
print(ios_apps_data[6731][2])

print(android_clean[4412][0])
print(android_clean[7940][0])

Because these apps fall outside of the scope of this project, I will remove them. One way to go about this is to remove each app whose name contains a symbol that is not commonly used in English text — English text usually includes letters from the English alphabet, numbers composed of digits from 0 to 9, punctuation marks (., !, ?, ;, etc.), and other symbols (+, *, /, etc.).

All of the characters that are specific to English texts are encoded using the ASCII standard. Each ASCII character has a corresponding number between 0 and 127 associated with it. As a result, I can build a function that checks an app name and tells me whether it contains non-ASCII characters.

In the function below, I use the built-in `ord()` function to find out the corresponding encoding number of each character.

In [None]:
def is_english(string):
    
    for character in string:
        if ord(character) > 127:
            return False
    
    return True

print(is_english('Instagram'))
print(is_english('爱奇艺PPS -《欢乐颂2》电视剧热播'))

The function works well, but some English app names use emojis or other symbols (™, — (em dash), – (en dash), etc.) that fall outside of the ASCII range. Because of this, I would be removing useful apps if I used the function in its current form.

In [None]:
print(is_english('Docs To Go™ Free Office Suite'))
print(is_english('Instachat 😜'))

print(ord('™'))
print(ord('😜'))

#### Part 2

To minimize the impact of data loss, I will only remove an app if its name has more than three non-ASCII characters:

In [None]:
def is_english(string):
    non_ascii = 0
    
    for character in string:
        if ord(character) > 127:
            non_ascii += 1
    
    if non_ascii > 3:
        return False
    else:
        return True

print(is_english('Docs To Go™ Free Office Suite'))
print(is_english('Instachat 😜'))

Even though the function may not be perfect, and a few non-English apps might get past my filter, I believe the data is clean and representative enough for the purposes of this project. Spending too much time on optimization at this point would take away from the analysis that I'm trying to provide in a timely manner.
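
If the threshold of three characters ever needs tuning, it can be exposed as a parameter. The sketch below is a hypothetical variant of the filter (named `is_mostly_ascii` here so that it does not overwrite `is_english()`); the default value of 3 mirrors the rule described above.

```python
def is_mostly_ascii(text, max_non_ascii=3):
    """Return True when text has at most max_non_ascii characters above code 127."""
    non_ascii = sum(1 for character in text if ord(character) > 127)
    return non_ascii <= max_non_ascii


print(is_mostly_ascii('Docs To Go™ Free Office Suite'))
print(is_mostly_ascii('爱奇艺PPS -《欢乐颂2》电视剧热播'))
print(is_mostly_ascii('Instachat 😜', max_non_ascii=0))
```

The first call passes the filter because the name contains a single non-ASCII character, the second fails because it contains far more than three, and the third shows that a threshold of zero reproduces the stricter first version of the function.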

Below, I use the `is_english()` function to filter out the non-English apps for both data sets:

In [None]:
ios_english = []
android_english = []

for app in ios_apps_data:
    name = app[2]
    if is_english(name):
        ios_english.append(app)

for app in android_clean:
    name = app[0]
    if is_english(name):
        android_english.append(app)
        
explore_data(ios_english, 0, 3, True)
print('\n')
explore_data(android_english, 0, 3, True)

### 6. Isolate Free Apps

As I mentioned in the introduction, my analysis is focused on apps that are free to download and install. This means that the main source of revenue consists of in-app ads. The data sets contain both free and non-free apps, and I will need to isolate only the free apps for my analysis. Below, I isolate the free apps for both data sets.

In [None]:
ios_final = []
android_final = []

for app in ios_english:
    price = app[5]
    if price == '0':
        ios_final.append(app)

for app in android_english:
    price = app[7]
    if price == '0':
        android_final.append(app)
        
print(len(ios_final))
print(len(android_final))
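
The comparison `price == '0'` depends on how the file happens to format prices: if a copy of the data stored free apps as `'0.0'` or `'$0.00'`, the filter would silently drop all of them. The sketch below shows a more tolerant check, offered as an assumption-laden alternative rather than what this notebook uses. The function name `is_free` is hypothetical.

```python
def is_free(price_text):
    """Return True when a price string such as '0', '0.0', or '$0.00' equals zero."""
    cleaned = price_text.strip().lstrip('$')
    try:
        return float(cleaned) == 0.0
    except ValueError:
        # Non-numeric values (for example a stray word) are treated as not free.
        return False


for sample_price in ['0', '0.0', '$0.00', '$4.99', 'Everyone']:
    print(repr(sample_price), is_free(sample_price))
```

In the loops above, `if price == '0':` could then be replaced with `if is_free(price):` for both data sets.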

I'm left with 3222 iOS apps, and 8864 Android apps. This is sufficient data for our analysis.

### Data Analysis

### 7. Group Most Used Apps by Genre

#### Part 1

As I mentioned in the introduction, my aim is to determine the kinds of apps that are likely to attract more users. This is because revenue will come from in-app ads, and ad revenue depends heavily on the number of people using the app.

To minimize risks and overhead, my validation strategy for an app idea comprises three steps:

   1. Build a minimal Android version of the app, and add it to Google Play.
   2. If the app has a good response from users, I recommend developing it further.
   3. If the app is profitable after six months, I recommend building an iOS version of the app and adding it to the App Store.

Because my end goal is to add the app on both the App Store and Google Play, I need to find app profiles that are successful on both markets.

I will begin the analysis by getting a sense of the most common genres for each market. For this, I will build a frequency table for the `'prime_genre'` column of the iOS data set, and the `'Genres'` and `'Category'` columns of the Android data set.

#### Part 2

I will build two functions I can use to analyze the frequency tables:

   1. One function to generate frequency tables that show percentages of the appearance of the genres (columns) evaluated in my data
   2. Another function that will display the percentages of the genres in a descending order


In [None]:
def freq_table(dataset, index):
    table = {}
    total = 0
    
    for row in dataset:
        total += 1
        value = row[index]
        if value in table:
            table[value] += 1
        else:
            table[value] = 1
    
    table_percentages = {}
    for key in table:
        percentage = (table[key] / total) * 100
        table_percentages[key] = percentage 
    
    return table_percentages


def display_table(dataset, index):
    table = freq_table(dataset, index)
    table_display = []
    for key in table:
        key_val_as_tuple = (table[key], key)
        table_display.append(key_val_as_tuple)
        
    table_sorted = sorted(table_display, reverse = True)
    for entry in table_sorted:
        print(entry[1], ':', entry[0])
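
For comparison, the same frequency table can be sketched with `collections.Counter` from the standard library, which handles the counting step. The names `freq_table_counter` and `sorted_percentages` and the sample rows are hypothetical; this is an alternative formulation, not a replacement for the functions above.

```python
from collections import Counter


def freq_table_counter(dataset, index):
    """Return {value: percentage of rows having that value in column index}."""
    counts = Counter(row[index] for row in dataset)
    total = sum(counts.values())
    return {value: count / total * 100 for value, count in counts.items()}


def sorted_percentages(dataset, index):
    """Return (value, percentage) pairs sorted from most to least frequent."""
    table = freq_table_counter(dataset, index)
    return sorted(table.items(), key=lambda item: item[1], reverse=True)


# Made-up rows in the layout [name, genre].
sample_rows = [['a', 'Games'], ['b', 'Games'], ['c', 'Music'], ['d', 'Games']]

for genre, percentage in sorted_percentages(sample_rows, 1):
    print(genre, ':', percentage)
```

Sorting with an explicit `key` avoids building `(percentage, name)` tuples just to get the ordering. Ties are kept in their original order rather than being broken by name.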

#### Part 3

I will examine the frequency table for the `'prime_genre'` column of the App Store data set.

In [None]:
display_table(ios_final, -5)

![](Apps_AS_Prime_Genre.png)

It is clear that among the free English iOS apps, more than half (58.65%) are games. Entertainment apps make up 8.23%, followed by photo and video apps (4.11%). Only 3.52% of the apps are designed for social networking, followed by education apps, which amount to 3.25% of the apps in the data set.

If I assign the genres to three categories (fun, practical, and other), the apps will not be distributed equally among them. I must also note that this assignment is somewhat subjective, given that some apps are commonly used for fun while certain users use them for productivity. For example, influencers may use social networking apps for both fun and practical purposes, just as musicians may use music apps for both. For this reason I've created an "other" category and tried to classify each genre by how most people use it.

Still, it is evident that the App Store (English apps) is dominated by apps that are designed for fun (games, entertainment, social networking, music, etc.). On the other hand, apps with practical purposes (education, shopping, utilities, productivity, lifestyle, etc.) are less prominent. However, the fact that certain genres are the most numerous does not imply that they also have the greatest number of users; demand might not be aligned with the most abundant products available.

I will continue by examining the `'Genres'` and `'Category'` columns of the Google Play data set, because the two columns appear to be related.

In [None]:
display_table(android_final, 1) # Category

![](Apps_GP_Prime_Genre_1.png)
![](Apps_GP_Prime_Genre_2.png)

The proportion of genres seems significantly different on Google Play. There are not as many apps designed for fun as in the App Store, and it seems that a good number of apps are available for practical purposes (family, tools, business, lifestyle, productivity, etc.). However, if I take a close look at the Google Play Store, I can see that the family category, accounting for almost 19% of the apps, consists mostly of games for kids.

![](GooglePlayImage2.png)

Yet, practical apps seem to have a better representation on Google Play compared to the App Store. This is also confirmed by the frequency table for the `'Genres'` column:

In [None]:
display_table(android_final, -4)

The difference between the `'Genres'` and the `'Category'` columns is not completely clear, but one thing I notice is that the `'Genres'` column is more detailed because it has more categories. Because I'm only looking for the bigger picture at the moment, I will only work with the `'Category'` column moving forward.

In sum, I have found that the App Store is dominated by apps designed for fun, while Google Play shows a more diverse representation of both practical and for-fun apps. Now, I would like to get an idea about the kind of apps that have the most users.

### 8. Analyze App Store and Google Play Markets Most Popular Apps by Genre

One way to find out which genres have the most users is to calculate the average number of installs for each app genre. For the Google Play data set, I can find this information in the `'Installs'` column, but for the App Store data set this information is missing. As a workaround, I can take the total number of user ratings as a proxy, which I can find in the `'rating_count_tot'` column.

Below, I calculate the average number of user ratings per app genre on the App Store:

In [None]:
genres_ios = freq_table(ios_final, -5)

for genre in genres_ios:
    total = 0
    len_genre = 0
    for app in ios_final:
        genre_app = app[-5]
        if genre_app == genre:            
            n_ratings = float(app[6])
            total += n_ratings
            len_genre += 1
    avg_n_ratings = total / len_genre
    print(genre, ':', avg_n_ratings)
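
The nested loop above walks through the whole data set once per genre. As a hedged alternative, the averages can be accumulated in a single pass by keeping running totals and counts per genre. The function name `average_by_group` and the sample rows are hypothetical.

```python
from collections import defaultdict


def average_by_group(dataset, group_index, value_index):
    """Return {group: mean of the numeric column value_index} in one pass."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for row in dataset:
        group = row[group_index]
        totals[group] += float(row[value_index])
        counts[group] += 1
    return {group: totals[group] / counts[group] for group in totals}


# Made-up rows in the layout [name, genre, rating count].
sample_rows = [
    ['App A', 'Navigation', '300'],
    ['App B', 'Navigation', '100'],
    ['App C', 'Reference', '500'],
]

for genre, average in average_by_group(sample_rows, 1, 2).items():
    print(genre, ':', average)
```

For the iOS data, the corresponding call would be `average_by_group(ios_final, -5, 6)`, using the same column positions as the loop above.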

Navigation apps have the highest number of user reviews, but this figure is heavily influenced by Waze and Google Maps, which have close to half a million user reviews together:

In [None]:
for app in ios_final:
    if app[-5] == 'Navigation':
        print(app[2], ':', app[6]) # print name and number of ratings

In [None]:
for app in ios_final:
    if app[-5] == 'Social Networking':
        print(app[2], ':', app[6])

The same pattern applies to social networking apps, where the average number of user reviews is heavily influenced by a few giants like Facebook, Pinterest, Skype, WhatsApp, Messenger, etc. Similarly, the average for music apps is heavily influenced by a few big players like Pandora, Spotify, and Shazam.

My aim is to find popular genres, but navigation, social networking, or music apps might seem more popular than they actually are. The average number of ratings seems to be skewed by a very few apps that have 100,000+ user ratings, while the other apps don't reach the 10,000 threshold. One way to address this is to remove these extremely popular apps from each genre and then recompute the averages.
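
A related option, which this analysis does not use, is to report the median instead of the mean, because the median is far less sensitive to a handful of very large values. The sketch below illustrates the difference on made-up numbers; the function name `median_by_group` and the sample rows are hypothetical.

```python
from statistics import mean, median


def median_by_group(dataset, group_index, value_index):
    """Return {group: median of the numeric column value_index}."""
    groups = {}
    for row in dataset:
        groups.setdefault(row[group_index], []).append(float(row[value_index]))
    return {group: median(values) for group, values in groups.items()}


# Made-up rows: one giant app and two small ones in the same genre.
sample_rows = [
    ['Giant App', 'Navigation', '300000'],
    ['Small App 1', 'Navigation', '150'],
    ['Small App 2', 'Navigation', '50'],
]

ratings = [float(row[2]) for row in sample_rows]
print('Mean:', mean(ratings))
print('Median:', median_by_group(sample_rows, 1, 2)['Navigation'])
```

Here the mean is pulled above 100,000 by a single app, while the median stays at 150, which is closer to what a typical app in this made-up genre receives.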

First, I will focus on another group of popular apps. Reference apps have 74,942 user ratings on average, but it is actually the Bible and Dictionary.com that skew the average upward:


In [None]:
for app in ios_final:
    if app[-5] == 'Reference':
        print(app[2], ':', app[6])

This niche seems to show some potential. The review counts are fairly consistent across the eight most popular apps. The App Store is dominated by for-fun apps, which suggests that the market might be saturated with them and therefore difficult to compete in. As a result, a practical app might have more of a chance to stand out among the huge number of apps on the App Store.

One suggestion is to make an app that takes popular books and includes features in supplement to the raw version of the book. This could include a summary of the book, a discussion of controversial topics in the book, daily quotes from the book, an audio version of the book, commentary from the author of the book, quizzes or crossword puzzles and games about the book, etc. Additionally, I could also embed a dictionary within the app, so users can access definitions without needing to exit our app to look up words in an external app.

Other genres that seem popular include weather, book, food and drink, and finance. The book genre overlaps with the app idea I described above, and reinforces the potential for user engagement in the suggested app. I will not focus on the other three genres in my recommendation, for the following reasons:

- <b>Weather apps:</b>  generally, users don't spend much time engaging with the app. As a result, the chances of making a profit from in-app ads are low. In addition, getting reliable live weather data may require the app to connect to non-free APIs.

- <b>Food and drink:</b>  examples here include Starbucks, Dunkin' Donuts, McDonald's, etc. Making a popular food and drink app may therefore require an established cooking and delivery service, which is outside the scope of this analysis.

- <b>Finance apps:</b>  these apps involve banking, paying bills, money transfer, etc. Building a finance app requires domain knowledge that I would have to obtain from a financial expert, which adds expenses and additional research. This is also outside the scope of this analysis.

Now I will analyze the Google Play market.

For the Google Play market, the data includes the number of installs, so genre popularity should be more evident in this column. Unfortunately, the install numbers don't seem precise enough — most values are open-ended (100+, 1,000+, 5,000+, etc.):

In [None]:
display_table(android_final, 5) # the Installs column

One thing to notice is that this data is not precise. For example, an app with 100,000+ installs could have 100,000, 200,000, or 350,000 installs. However, I don't need very precise data for this project. I only want to identify the genres that attract the most users, and this doesn't require perfect precision with respect to the number of users for each genre.

Because I will consider the numbers as they are, an app with 100,000+ installs will be assumed to have 100,000 installs, and an app with 1,000,000+ installs will be assumed to have 1,000,000 installs, and so on.

I will need to perform computations on these numbers. As a result, I will convert each install number to a float. This requires that I remove the commas and the plus characters. Below I will do this, and also compute the average number of installs for each genre.

In [None]:
categories_android = freq_table(android_final, 1)

for category in categories_android:
    total = 0
    len_category = 0
    for app in android_final:
        category_app = app[1]
        if category_app == category:            
            n_installs = app[5]
            n_installs = n_installs.replace(',', '')
            n_installs = n_installs.replace('+', '')
            total += float(n_installs)
            len_category += 1
    avg_n_installs = total / len_category
    print(category, ':', avg_n_installs)
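
The install-label cleanup is repeated in later cells, so it could be isolated in a small helper. This is a minimal sketch assuming labels always have the form shown in the frequency table (digits, commas, and an optional trailing plus sign); the function name `parse_installs` is hypothetical.

```python
def parse_installs(installs_text):
    """Convert an install label such as '1,000,000+' to its lower bound as a float."""
    return float(installs_text.replace(',', '').replace('+', ''))


for label in ['100+', '1,000,000+', '0']:
    print(label, '->', parse_installs(label))
```

With this helper, the body of the loop above would reduce to `total += parse_installs(app[5])`. A label in any other format raises a `ValueError`, which makes unexpected values visible instead of silently miscounting them.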

Communication apps have the most installs on average (38,456,119), but this number is heavily skewed by a few apps that have over one billion installs (WhatsApp, Facebook Messenger, Skype, Google Chrome, Gmail, and Hangouts).

In [None]:
for app in android_final:
    if app[1] == 'COMMUNICATION' and (app[5] == '1,000,000,000+'
                                      or app[5] == '500,000,000+'
                                      or app[5] == '100,000,000+'):
        print(app[0], ':', app[5])

These apps make up less than 50% of the communication apps. If I remove all the communication apps that have 100 million or more installs, the average is reduced roughly tenfold:

In [None]:
under_100_m = []

for app in android_final:
    n_installs = app[5]
    n_installs = n_installs.replace(',', '')
    n_installs = n_installs.replace('+', '')
    if (app[1] == 'COMMUNICATION') and (float(n_installs) < 100000000):
        under_100_m.append(float(n_installs))
        
print(sum(under_100_m) / len(under_100_m))
print(len(under_100_m))

The same applies to the video players category, which is next in line, with 24,727,872 installs. The Android market is dominated by apps like YouTube and Google Play Movies & TV. Similarly, social app installs are dominated by Facebook, Instagram, and Google+; photography apps by Google Photos and other popular photo editors; and productivity apps by Microsoft Word, Dropbox, Google Calendar, Evernote, etc.

All of this suggests that these app genres might seem more popular than they actually are. Additionally, these niches are dominated by a few giants, and this is hard to compete against.

Even though the game genre seems pretty popular, I found that this part of the market seems a bit saturated. I would like to come up with a different app recommendation.

This takes me to the books and reference genre. This genre looks fairly popular, with an average of 8,767,811 installs. My analysis suggests that this genre is worth exploring because it has some potential to work well on the App Store. Because my aim is to recommend an app genre that shows potential for being profitable on both the App Store and Google Play, I will also look at the data from this genre in the Android dataset.

In [None]:
for app in android_final:
    if app[1] == 'BOOKS_AND_REFERENCE':
        print(app[0], ':', app[5])

The book and reference genre includes a variety of apps: software for processing and reading ebooks, various collections of libraries, dictionaries, tutorials on programming or languages, etc. Yet, there's a small number of extremely popular apps that skew the average. 

In [None]:
for app in android_final:
    if app[1] == 'BOOKS_AND_REFERENCE' and (app[5] == '1,000,000,000+'
                                            or app[5] == '500,000,000+'
                                            or app[5] == '100,000,000+'
                                            or app[5] == '50,000,000+'):
        print(app[0], ':', app[5])

Only 5 apps have 100,000,000 or more installs. I could argue that this is a good market to explore for a recommendation because it is not heavily saturated, not heavily dominated by certain apps, and popular enough to suggest that it is worth looking into. To begin, I will try to get some app ideas based on the kinds of apps that are somewhere in the top (between 100,000,000 and 1,000,000,000 downloads) and in the middle in terms of popularity (between 1,000,000 and 100,000,000 downloads).

In [None]:
for app in android_final:
    if app[1] == 'BOOKS_AND_REFERENCE' and (app[5] == '100,000,000+'
                                            or app[5] == '1,000,000,000+'):
        print(app[0], ':', app[5])
        
print('\n')
        
for app in android_final:
    if app[1] == 'BOOKS_AND_REFERENCE' and (app[5] == '1,000,000+'
                                            or app[5] == '5,000,000+'
                                            or app[5] == '10,000,000+'
                                            or app[5] == '50,000,000+'):
        print(app[0], ':', app[5])

This genre seems to be dominated by software for processing and reading a wide variety of ebooks, as well as various collections of libraries and dictionaries. Even though I may recommend building an app that will not intend to compete with the dominant apps of this genre, I would suggest including some of the most attractive features of these apps while also incorporating a new niche in itself into this new app.

It is also clear that popular books like the Bible and Quran pull a big following and as a result, one may consider building an app around a popular book that can be profitable. The data shows that taking a popular book and turning it into an app could be profitable for both the Google Play and the App Store markets.

I must note that my analysis shows that the market is already saturated with libraries. To stand out, I recommend adding some special features besides the raw version of the book. This might include better branding and user interface, daily quotes from the book, an audio version of the book, quizzes on the book, a forum where people can discuss the book, etc.

## 9. Conclusion

In this project, I analyzed freely available data on App Store and Google Play mobile apps. My goal was to produce an analysis to support my recommendation of an app profile that can be profitable for both markets.

I conclude that, given the lack of dominant competition and the popularity of language, library, and other learning and education related applications, a reasonable recommendation for a new app seeking user engagement would be to develop an app in the reference genre that provides the user with access to desirable information in written, audio, visual, and interactive form. For example, it may be profitable to take a popular book (perhaps a recent bestseller, a Booker Prize winner, one that has been adapted into film or TV, or a "classic") and develop an app that incorporates new and varied ways for the user to engage with its content. Because both markets have multiple libraries, there is a need to add some special features to supplement the raw version of the books or information. This could include: better marketing, branding and user interface, commentary from the author, an embedded dictionary, highlighting and comment tools, a summary, reviews, analysis, daily quotes from a book, an audio version of a book, crossword puzzles or quizzes on the book, a forum where people can discuss the book, an interactive book club feature, etc.

Creating an app in this way may result in favorable ratings and user engagement, which could translate into ad revenue.