# Profitable App Profiles for the iOS App Store and Google Play Store

In this project, I will analyze data from the iOS App Store and the Google Play Store to find out whether certain types of applications are better received by users than others.

We will be using data from the IOS Appstore and the Android Google Play Store.

The goal for this project is to recommend an app profile that can be profitable in both the Apple App Store and the Android Google Play Store.

Our company generates revenue mainly from in-app ads, so our revenue is directly linked to the number of users. It is therefore important to identify the app profiles that attract the most users.

Links to the dataset can be obtained from here:

[Apple Store](https://www.kaggle.com/ramamet4/app-store-apple-data-set-10k-apps)

[Android Google Play Store](https://www.kaggle.com/lava18/google-play-store-apps)

## Import the Data Sets

First, we open the two data sets linked above.

In [1]:
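from csv import reader

# Pass the encoding explicitly; the platform default may not decode non-ASCII app names.
with open('googleplaystore.csv', encoding='utf8') as opened_file:
    android = list(reader(opened_file))

with open('AppleStore.csv', encoding='utf8') as opened_file:
    ios = list(reader(opened_file))

Without an explicit encoding, reading the files fails with:

UnicodeDecodeError: 'charmap' codec can't decode byte 0x90 in position 2755: character maps to <undefined>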

To make exploring the data sets easier as we go, I am creating a function that we can reuse, which saves both time and code.

To start, we will explore the iOS data set.

In [None]:
def explore_data(dataset, start, end, rows_and_columns=False):
    dataset_slice = dataset[start:end]    
    for row in dataset_slice:
        print(row)
        print('\n') # adds a new (empty) line after each row

    if rows_and_columns:
        print('Number of rows:', len(dataset))
        print('Number of columns:', len(dataset[0]))

explore_data(ios, 0, 5, True)

We can see that the iOS data set has 7198 rows, including the header, and 16 columns.

Let's take a look at the Android Google Play Data Set.

In [None]:
explore_data(android,0,5,True)

Columns that could potentially aid our analysis are:

1. App Name
2. Price
3. Average User Rating Count
4. Language 
5. Age
6. Genre

You can find clearer documentation of the column names [here](https://www.kaggle.com/ramamet4/app-store-apple-data-set-10k-apps).

In the [discussion section](https://www.kaggle.com/lava18/google-play-store-apps/discussion), users mention that line 10473 is missing a value. Let's investigate whether this is true.
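
A row with a missing value can also be found without knowing its line number in advance, because it will have fewer fields than the header. The following is a minimal sketch of that idea; `find_malformed_rows` is a hypothetical helper and the sample rows are made up for illustration.

```python
def find_malformed_rows(dataset):
    """Return (index, row) pairs whose column count differs from the header's."""
    header_length = len(dataset[0])
    return [
        (index, row)
        for index, row in enumerate(dataset[1:], start=1)
        if len(row) != header_length
    ]


# Toy data (not from the real file): the last row is missing one field.
sample = [
    ['App', 'Category', 'Rating'],
    ['App A', 'TOOLS', '4.1'],
    ['App B', '1.9'],
]
malformed = find_malformed_rows(sample)
print(malformed)
```

The returned index counts the header as row 0, so it can be used directly with `del dataset[index]`.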

In [None]:
print(android[0])
print('\n')
print(android[10473])

The third field of the printed row, `Rating`, is 19. However, the maximum rating on the Google Play Store is 5, so this row contains incorrect data.

I will delete the row with the incorrect data.

In [None]:
del android[10473]  # run this cell only once; rerunning it would delete a valid row

Let's check for any duplicate values present in the data set.

In [None]:
unique_entry = []
duplicate_entry = []
for row in android:
    app = row[0]
    if app in unique_entry:
        duplicate_entry.append(app)
    else:
        unique_entry.append(app)
print('Number of duplicate apps:', len(duplicate_entry))

Duplicates in the data set will not be removed at random.
I will use the number of reviews as the criterion when removing duplicates. For the same application, the entry with more reviews is likely the more recent one, so that entry is kept in the data set.

To achieve this, we will:
- Create a dictionary where each key is a unique app name, and the value is the highest number of reviews of that app.
- Use the dictionary to create a new data set with only one entry per application: the one with the highest number of reviews.
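
The two steps above can also be folded into a single pass by storing the whole row in the dictionary instead of only the review count. This is a sketch of that variant on made-up rows, not the approach used in the cells of this notebook; `keep_most_reviewed` is a hypothetical helper.

```python
def keep_most_reviewed(rows, name_index=0, reviews_index=1):
    """Keep one row per app name: the first row with the highest review count."""
    best = {}
    for row in rows:
        name = row[name_index]
        n_reviews = float(row[reviews_index])
        # Replace the stored row only when this one has strictly more reviews.
        if name not in best or n_reviews > float(best[name][reviews_index]):
            best[name] = row
    return list(best.values())


# Toy rows (name, number of reviews), invented for illustration.
sample_rows = [['Chat', '100'], ['Maps', '50'], ['Chat', '250'], ['Maps', '50']]
deduplicated = keep_most_reviewed(sample_rows)
print(deduplicated)
```

Because ties are resolved in favour of the row seen first, exact duplicates with the same review count collapse to a single entry.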

In [None]:
reviews_max = {}

for row in android[1:]:
    name = row[0]
    n_reviews = float(row[3])
    
    if name in reviews_max and reviews_max[name] < n_reviews:
        reviews_max[name] = n_reviews
        
    elif name not in reviews_max:
        reviews_max[name] = n_reviews

print('Expected Length:', len(android[1:]) - 1181)
print('Actual Length: ',len(reviews_max))

In [None]:
android_clean = []
already_added = []

for row in android[1:]:
    name = row[0]
    n_reviews = float(row[3])
    
    if (n_reviews == reviews_max[name]) and (name not in already_added):
        android_clean.append(row)
        already_added.append(name)
        
explore_data(android_clean, 0,4, True)

Next, we check the number of duplicate applications in the Apple App Store.

In [None]:
unique_entry1 = []
duplicate_entry1 = []
for row in ios[1:]:
    app = row[0]
    if app in unique_entry1:
        duplicate_entry1.append(app)
    else:
        unique_entry1.append(app)
print('Number of duplicate apps:', len(duplicate_entry1))

No duplicate entries were found in the Apple App Store.

Now, we will proceed to remove non-English applications, as we only intend to produce applications for an English-speaking audience.

In [None]:
def englishcheck(language):
    for letter in language:
        if ord(letter) > 127:
            return False
        
    return True

print(englishcheck('Instagram'))
print(englishcheck('爱奇艺PPS -《欢乐颂2》电视剧热播'))
print(englishcheck('Docs To Go™ Free Office Suite'))
print(englishcheck('Instachat 😜'))

As seen, `False` is returned for the last two names even though they are in English. This is because the "™" sign and the emoji fall outside the ASCII range (their code points are greater than 127), so the function wrongly classifies these names as non-English. We will amend the function to tolerate up to three such characters per name.
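
To see why those names fail the check, it helps to print the code point of a few characters. This small illustration uses only the built-in `ord`; the characters are taken from the example names above.

```python
# Print each character, its Unicode code point, and whether it is outside ASCII.
for character in ['a', '™', '😜', '爱']:
    print(repr(character), ord(character), ord(character) > 127)

# Collect the non-ASCII characters of one of the example names.
non_ascii = [ch for ch in 'Docs To Go™ Free Office Suite' if ord(ch) > 127]
print(non_ascii)
```

Only the plain letter stays within the ASCII range, and the second name contains exactly one character outside it.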

In [None]:
def englishcheck(language):
    non_ascii_count = 0  # characters outside the ASCII range (code point > 127)
    for letter in language:
        if ord(letter) > 127:
            non_ascii_count += 1
            if non_ascii_count > 3:
                return False
    return True

print(englishcheck('Instagram'))
print(englishcheck('爱奇艺PPS -《欢乐颂2》电视剧热播'))
print(englishcheck('Docs To Go™ Free Office Suite'))
print(englishcheck('Instachat 😜'))

Using this updated function, I will keep only the rows for English applications.

In [None]:
android_english = []

for app in android_clean:
    name = app[0]
    if englishcheck(name):
        android_english.append(app)
    
explore_data(android_english, 0,5, True)

In [None]:
ios_english = []
for row in ios[1:]:
    name = row[1]
    if englishcheck(name):
        ios_english.append(row)

In [None]:
explore_data(ios_english, 0,3,True)

We can see that we're left with 9614 Android apps and 6183 iOS apps.

Since we only intend to build free apps, paid apps are not relevant to our analysis. We therefore isolate the free apps in each data set.

In [None]:
print(android[0])
print(ios[0])

In [None]:
print(android_english[:3])

IOS price index - 4

Android price index - 7 
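
The two stores write a zero price differently (`'0'` on Android, `'0.0'` on iOS), so the filters in the next cells compare against different strings. A possible alternative is to convert the price to a number first. The sketch below assumes that paid prices may carry a leading `$`; `is_free` is a hypothetical helper and is not used elsewhere in this notebook.

```python
def is_free(price_text):
    """Return True when the price string represents zero."""
    # Assumption: paid prices may be written with a leading dollar sign.
    cleaned = price_text.strip().lstrip('$')
    try:
        return float(cleaned) == 0.0
    except ValueError:
        # Anything that is not a number is treated as not free.
        return False


print(is_free('0'), is_free('0.0'), is_free('$4.99'))
```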


In [None]:
android_free = [] 

for row in android_english:
    price = row[7]
    
    if price == '0':
        android_free.append(row)
        
explore_data(android_free, 0, 3 , True)


In [None]:
ios_free  = []

for row in ios_english:
    price = row[4]
    
    if price == '0.0':
        ios_free.append(row)
        
explore_data(ios_free, 0, 3 , True)

Now we are left with 3222 iOS apps and 8864 Android apps for our analysis.

To ensure our app is profitable, we will make use of a validation strategy. 

Our final goal is to make an app that is successful on both markets, iOS and Android. We will therefore first launch a minimal version of the app on the Google Play Store. If the app is well received by users, we develop it further. If we see profits after six months, we will then build an iOS version of the app and add it to the App Store.

Thus, we will now analyze which apps are profitable on both the Android and iOS application stores. We will get started by highlighting the genres most commonly found in each market.


In [None]:
print(ios[0])
print(android[0])


The genre columns for both platforms are at the following indexes:

iOS
- index 11: `prime_genre`

Android
- index -4: `Genres`
- index 1: `Category`

Now we are going to build a table showing how many apps fall under each genre in the respective application stores. We will use a frequency table for this. Afterwards, we will convert the table to a list of tuples so that it can be sorted and displayed in descending order.
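
Raw counts are hard to compare between two stores of different sizes, so it can help to express each genre as a share of the whole data set. The following is a sketch of that variant on made-up rows; `percentage_table` is a hypothetical helper and is separate from the functions defined in the cells below.

```python
def percentage_table(dataset, index):
    """Return (value, percentage) pairs for one column, largest share first."""
    counts = {}
    for row in dataset:
        value = row[index]
        counts[value] = counts.get(value, 0) + 1

    total = len(dataset)
    percentages = {value: count / total * 100 for value, count in counts.items()}
    # Sort on the percentage (second element of each pair), descending.
    return sorted(percentages.items(), key=lambda item: item[1], reverse=True)


# Toy rows (name, genre), invented for illustration.
sample = [['A', 'Games'], ['B', 'Games'], ['C', 'Books'], ['D', 'Games']]
table = percentage_table(sample, 1)
print(table)
```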

In [None]:
def freq_table(dataset, index):
    table ={}
    for row in dataset:
        name = row[index]
        if name in table:
            table[name] += 1
        else: 
            table[name] = 1 
    return table

    

In [None]:
def display_table(dataset,index):
    table = freq_table(dataset,index)
    table_display = [] 
    for key in table:
        key_val_as_tuple = (table[key], key)
        table_display.append(key_val_as_tuple)
        
    table_sorted = sorted(table_display, reverse = True)
    for entry in table_sorted:
        print(entry[1], ':', entry[0])

In [None]:
display_table(ios_free, 11)

In [None]:
display_table(android_free, -4)

In [None]:
display_table(android_free, 1)

**IOS APPLICATIONS**

Looking at the most common genres in the iOS App Store, Games tops the chart and Entertainment comes next.

The number of game applications in the iOS App Store is significantly higher than that of any other genre: 1874 applications fall under Games, while the second-largest genre, Entertainment, has only 254.

Most of the applications in the iOS App Store seem to be for entertainment purposes: the top 3 genres are all entertainment-oriented, with Games being far more numerous than the others.

However, this frequency table may be misleading to a certain degree.
This is because a large number of applications in a genre does not necessarily mean that demand for that genre is higher than for the rest. For instance, a major application such as Netflix may hold a huge market share among Entertainment applications, making it difficult for new applications to enter the market, so fewer developers produce entertainment applications. Users, on the other hand, may always be looking for new games on their devices and so tend to download more of them. This increases the demand for games, which leads developers to produce more of them.


**ANDROID APPLICATIONS**

Looking at the Android application genres, the apps are spread more evenly across the different genres than on iOS.

The most common category in the Google Play Store is Family, with Games in second place.
The most common genre in the Google Play Store is Tools, with Entertainment close behind in second place.

The Google Play Store seems to offer more diversified options, as each genre contains many apps. This differs from the iOS App Store, where genres outside the top few contain very few apps.

However, I am still unable to recommend an app profile based on the data so far. The data produced does not accurately show the popularity of the applications: as mentioned earlier, more applications in a genre does not necessarily signify higher popularity. It is therefore important to find out which genres have the most users in order to accurately portray the market demand for each genre.

We will move on to calculate the number of users for each genre of application
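
The per-genre averages can be computed in a single pass over the data by keeping a running total and a running count for every genre. This is a sketch of that approach on made-up rows; `average_by_group` is a hypothetical helper, and the cells below use nested loops instead.

```python
def average_by_group(dataset, group_index, value_index):
    """Return {group: mean of the numeric column} using one pass over the rows."""
    totals = {}
    counts = {}
    for row in dataset:
        group = row[group_index]
        totals[group] = totals.get(group, 0.0) + float(row[value_index])
        counts[group] = counts.get(group, 0) + 1
    return {group: totals[group] / counts[group] for group in totals}


# Toy rows (genre, number of ratings), invented for illustration.
sample = [['Games', '100'], ['Games', '300'], ['Book', '50']]
averages = average_by_group(sample, 0, 1)
print(averages)
```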

# Most Popular IOS Apps by Genre

In [None]:
prime_genre_table = freq_table(ios_free,11)
print(prime_genre_table)

In [None]:
for genre in prime_genre_table:
    total = 0
    len_genre = 0 
    
    for row in ios_free:
        genre_app = row[11]
        if genre_app == genre:
            ratings = float(row[5])
            total += ratings
            len_genre += 1
            
    average_n_ratings = total/len_genre  # average number of user ratings per app, used as a proxy for users
    print(genre, ':', average_n_ratings)

In [None]:
for row in ios_free:
    if row[11] == 'Book':
        print (row[1],':', row[5])

I would recommend Book as the app profile of our new application.

The genre has nearly 40 thousand user ratings per app on average, which we use as a proxy for the number of users and which is substantial relative to the other genres. Few applications exist in this genre, so we would face less competition. There are also few "giants" in this genre compared to Social Networking, where established platforms such as Facebook are hard to compete with. As the main source of revenue for the application is in-app advertising, it also helps that advertisements can be placed in book applications without frustrating users.

# Most Popular Android Apps by Genre

We now analyze the average number of installs for each category in the Play Store. The install counts are given as open-ended ranges (for example `1,000,000+`), so we strip the `+` and the commas and treat each value as the exact number of installs. This makes our results slightly inaccurate.
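
The conversion of an install string into a number can be isolated in a small function, which makes it easy to check on its own. This is a minimal sketch; `parse_installs` is a hypothetical helper, and the resulting value is only a lower bound on the real install count.

```python
def parse_installs(installs_text):
    """Convert a string such as '1,000,000+' into a float lower bound."""
    return float(installs_text.replace('+', '').replace(',', ''))


print(parse_installs('1,000,000+'))
print(parse_installs('500+'))
```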

In [None]:
category_freqtable = freq_table(android_free, 1)
print(category_freqtable)

In [None]:
for category in category_freqtable:
    total  = 0
    len_category = 0 
    
    for row in android_free:
        category_app = row[1]
        if category_app == category:
            installs = row[5]
            installs = installs.replace('+', '')
            installs = installs.replace(',','')
            total += float(installs)
            len_category += 1
    
    average_user_install = total/len_category
    print(category, ':',average_user_install)

As our aim is to launch a product that would be profitable in both the IOS app store and the Google Play Store, let's take a look at the market for Books in the Google Play Store.

In [None]:
for row in android_free:
    if row[1] == 'BOOKS_AND_REFERENCE':
        print(row[0], ':', row[5])

There is significantly more competition in the Google Play Store. This might be because Android combines the Books category with the Reference category, which are separate on iOS. Let's take a look at the major competitors.
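
Instead of listing every install bucket by name, the major competitors can be selected with a numeric threshold. The sketch below assumes the column positions used in this notebook (name at index 0, category at index 1, installs at index 5); `apps_with_min_installs` is a hypothetical helper and the sample rows are made up.

```python
def apps_with_min_installs(dataset, category, min_installs,
                           name_index=0, category_index=1, installs_index=5):
    """Return (name, installs) pairs in a category at or above a threshold."""
    matches = []
    for row in dataset:
        if row[category_index] != category:
            continue
        # Strip the '+' and the commas before converting to a number.
        installs = float(row[installs_index].replace('+', '').replace(',', ''))
        if installs >= min_installs:
            matches.append((row[name_index], installs))
    # Largest install counts first.
    return sorted(matches, key=lambda pair: pair[1], reverse=True)


# Toy rows with the installs value in column 5, invented for illustration.
sample = [
    ['Verse Daily', 'BOOKS_AND_REFERENCE', '', '', '', '100,000,000+'],
    ['Tiny Dictionary', 'BOOKS_AND_REFERENCE', '', '', '', '10,000+'],
    ['Reader X', 'BOOKS_AND_REFERENCE', '', '', '', '1,000,000,000+'],
    ['Puzzle Z', 'GAME', '', '', '', '500,000,000+'],
]
top_apps = apps_with_min_installs(sample, 'BOOKS_AND_REFERENCE', 100_000_000)
print(top_apps)
```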

In [None]:
for row in android_free:
    if row[1] == 'BOOKS_AND_REFERENCE' and (row[5] == '1,000,000,000+'
                                           or row[5] == '500,000,000+'
                                           or row[5] == '100,000,000+'):
        print(row[0], ':', row[5])

The Google Play Store has a huge competitor in Google Play Books, which is seemingly doing significantly better than the other applications in this category. This could make it very difficult for our app to be profitable in this category.