# Recommending apps that can be profitable on the Google Play Store and the App Store

The goal of this notebook is to suggest which kinds of apps are likely to be profitable on the Google Play Store and the App Store.

Before beginning the project, we assume that the recommendations apply only to free apps: apps that cost nothing to download and whose main source of revenue is in-app advertising. The more users who see and engage with the ads, the better. This notebook therefore aims to help developers understand which types of apps are likely to attract more users.

## Exploring the data

Since there are millions of apps on these two platforms (2 million iOS apps and 2.1 million Android apps as of August 2018, [Source](https://www.statista.com/statistics/276623/number-of-apps-available-in-leading-app-stores/)), rather than spending time and money collecting data about all of them, we will base our study on relevant, publicly available datasets.

Below are two examples of such datasets:
* [~10k Android apps collected in August 2018](https://dq-content.s3.amazonaws.com/350/googleplaystore.csv)
* [~7k iOS apps collected in July 2017](https://dq-content.s3.amazonaws.com/350/AppleStore.csv)

For our analysis, these two samples should suffice.
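The notebook reads `AppleStore.csv` and `googleplaystore.csv` from the working directory. If the files are not there yet, a minimal sketch like the one below could fetch them from the links above using the standard library's `urllib.request.urlretrieve`. The helper `download_if_missing` and the `DATASETS` dictionary are illustrative names, not part of the original notebook.

```python
import os
import urllib.request

# Local filename -> download URL, taken from the dataset links above
DATASETS = {
    "googleplaystore.csv": "https://dq-content.s3.amazonaws.com/350/googleplaystore.csv",
    "AppleStore.csv": "https://dq-content.s3.amazonaws.com/350/AppleStore.csv",
}

def download_if_missing(url, path):
    """Download url to path unless a file already exists at path; return path."""
    if not os.path.exists(path):
        # urlretrieve writes the response body directly to the given filename
        urllib.request.urlretrieve(url, path)
    return path
```

Calling `download_if_missing(url, filename)` for each item in `DATASETS` downloads each file once; later runs simply reuse the local copy.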

Let us see how the data looks.

In [2]:
# Read the data
from csv import reader

# Reading iOS data; the with-block closes the file once it has been read
with open("AppleStore.csv", encoding="utf8", newline="") as iosData:
    iosApps = list(reader(iosData))

# Reading Android data
with open("googleplaystore.csv", encoding="utf8", newline="") as androidData:
    androidApps = list(reader(androidData))

In [5]:
# Exploring the dataset
def explore_data(dataset, start, end, rows_and_columns=False):
    '''
        Function to explore a dataset in the form of a list of lists
        
        Parameters
            dataset - list of lists
            start - index of the first row to print
            end - index one past the last row to print (exclusive)
            rows_and_columns - if True, also print the total number of
            rows and columns; default is False
            
        Returns
            None
    '''
    dataset_slice = dataset[start:end]
    for row in dataset_slice:
        print(row, "\n")
    
    if rows_and_columns:
        print("Number of rows : ", len(dataset))
        print("Number of columns : ", len(dataset[0]))

In [6]:
# Print the first 3 rows of the iOS data, along with the total
# number of rows and columns
explore_data(iosApps, 0, 3, True)

['id', 'track_name', 'size_bytes', 'currency', 'price', 'rating_count_tot', 'rating_count_ver', 'user_rating', 'user_rating_ver', 'ver', 'cont_rating', 'prime_genre', 'sup_devices.num', 'ipadSc_urls.num', 'lang.num', 'vpp_lic'] 

['284882215', 'Facebook', '389879808', 'USD', '0.0', '2974676', '212', '3.5', '3.5', '95.0', '4+', 'Social Networking', '37', '1', '29', '1'] 

['389801252', 'Instagram', '113954816', 'USD', '0.0', '2161558', '1289', '4.5', '4.0', '10.23', '12+', 'Photo & Video', '37', '0', '29', '1'] 

Number of rows :  7198
Number of columns :  16


Looking at the output above, we can see that the first row is actually the header, i.e. the column names. Let us save it in a separate headers list.

In [7]:
iosHeaders = iosApps[0]
print(iosHeaders)

['id', 'track_name', 'size_bytes', 'currency', 'price', 'rating_count_tot', 'rating_count_ver', 'user_rating', 'user_rating_ver', 'ver', 'cont_rating', 'prime_genre', 'sup_devices.num', 'ipadSc_urls.num', 'lang.num', 'vpp_lic']


So there are 7197 rows of data (7198 minus 1, since the first row is the header) and
16 columns, or properties, per app. Let us do the same for the Android data now.
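The header-versus-data split and the row arithmetic above can be captured in two small helpers. This is a sketch: `split_header` and `shape` are hypothetical names, and `sample` is a trimmed excerpt of the iOS rows printed earlier, keeping only the `id`, `track_name`, and `price` columns.

```python
def split_header(dataset):
    """Separate a CSV dataset (list of lists) into its header row and data rows."""
    header, rows = dataset[0], dataset[1:]
    return header, rows

def shape(rows):
    """Return (number of data rows, number of columns) for a list of rows."""
    return len(rows), (len(rows[0]) if rows else 0)

# Trimmed excerpt of the iOS data shown above
sample = [
    ["id", "track_name", "price"],
    ["284882215", "Facebook", "0.0"],
    ["389801252", "Instagram", "0.0"],
]
header, rows = split_header(sample)
print(header)       # ['id', 'track_name', 'price']
print(shape(rows))  # (2, 3)
```

Applied to the full `iosApps` list, `shape(split_header(iosApps)[1])` should give `(7197, 16)`, matching the count above.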

In [8]:
explore_data(androidApps, 0, 3, True)

['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver', 'Android Ver'] 

['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up'] 

['Coloring book moana', 'ART_AND_DESIGN', '3.9', '967', '14M', '500,000+', 'Free', '0', 'Everyone', 'Art & Design;Pretend Play', 'January 15, 2018', '2.0.0', '4.0.3 and up'] 

Number of rows :  10842
Number of columns :  13


In [9]:
androidHeaders = androidApps[0]
print(androidHeaders)

['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver', 'Android Ver']


Excluding the header, the Android data has 10841 rows and 13 columns. While the Android column names are quite descriptive, the iOS ones are less so. The documentation for the [iOS](https://www.kaggle.com/ramamet4/app-store-apple-data-set-10k-apps) and [Android](https://www.kaggle.com/lava18/google-play-store-apps) datasets describes each column in more detail.
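Because the iOS names are terse, it can be convenient to look columns up by name instead of remembering their positions. A minimal sketch; `column_index` and `iosIndex` are illustrative names, and the header list is copied from the output above (in the notebook, `iosHeaders` already holds it).

```python
# iOS header, copied from the output above
iosHeaders = ['id', 'track_name', 'size_bytes', 'currency', 'price',
              'rating_count_tot', 'rating_count_ver', 'user_rating',
              'user_rating_ver', 'ver', 'cont_rating', 'prime_genre',
              'sup_devices.num', 'ipadSc_urls.num', 'lang.num', 'vpp_lic']

def column_index(header):
    """Map each column name to its position so fields can be looked up by name."""
    return {name: i for i, name in enumerate(header)}

iosIndex = column_index(iosHeaders)
print(iosIndex["prime_genre"])  # 11
print(iosIndex["user_rating"])  # 7
```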

In [10]:
print(iosHeaders)

['id', 'track_name', 'size_bytes', 'currency', 'price', 'rating_count_tot', 'rating_count_ver', 'user_rating', 'user_rating_ver', 'ver', 'cont_rating', 'prime_genre', 'sup_devices.num', 'ipadSc_urls.num', 'lang.num', 'vpp_lic']


Looking at both headers gives a good idea of the type of data. The fields that seem important for iOS are:
* currency
* price
* rating_count_tot
* user_rating
* prime_genre
* cont_rating
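To work only with the iOS fields listed above, each row can be reduced to a dictionary keyed by column name. This is a sketch: `select_fields` and `IOS_FIELDS` are hypothetical names, and the example row is the Facebook row printed earlier.

```python
# iOS header and the Facebook row, both copied from the output above
iosHeaders = ['id', 'track_name', 'size_bytes', 'currency', 'price',
              'rating_count_tot', 'rating_count_ver', 'user_rating',
              'user_rating_ver', 'ver', 'cont_rating', 'prime_genre',
              'sup_devices.num', 'ipadSc_urls.num', 'lang.num', 'vpp_lic']
facebookRow = ['284882215', 'Facebook', '389879808', 'USD', '0.0', '2974676',
               '212', '3.5', '3.5', '95.0', '4+', 'Social Networking',
               '37', '1', '29', '1']

# The fields judged important for iOS
IOS_FIELDS = ["currency", "price", "rating_count_tot",
              "user_rating", "prime_genre", "cont_rating"]

def select_fields(header, row, fields):
    """Return a dict holding only the named fields of a single data row."""
    position = {name: i for i, name in enumerate(header)}
    return {field: row[position[field]] for field in fields}

print(select_fields(iosHeaders, facebookRow, IOS_FIELDS))
# {'currency': 'USD', 'price': '0.0', 'rating_count_tot': '2974676', 'user_rating': '3.5', 'prime_genre': 'Social Networking', 'cont_rating': '4+'}
```

The values are still strings, so numeric fields such as `price` and `user_rating` need `float()` before any arithmetic.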

The fields that seem important for Android are:
* Category
* Rating
* Reviews
* Installs
* Type
* Price
* Content Rating
* Genres
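Since the analysis is restricted to free apps, the `Type` and `Price` fields listed above are the ones that identify them. A minimal sketch, assuming an Android app counts as free only when `Type` is `'Free'` and `Price` is `'0'` (as in the rows printed earlier). `is_free_android` is a hypothetical helper, and the third row is made up purely to show a paid app being filtered out.

```python
# Android header, copied from the output above
androidHeaders = ['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs',
                  'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated',
                  'Current Ver', 'Android Ver']

def is_free_android(row, header):
    """True when the row is marked 'Free' and its Price is '0' (assumed rule)."""
    position = {name: i for i, name in enumerate(header)}
    return row[position["Type"]] == "Free" and row[position["Price"]] == "0"

rows = [
    # The two data rows printed earlier
    ['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M',
     '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up'],
    ['Coloring book moana', 'ART_AND_DESIGN', '3.9', '967', '14M', '500,000+', 'Free', '0',
     'Everyone', 'Art & Design;Pretend Play', 'January 15, 2018', '2.0.0', '4.0.3 and up'],
    # Hypothetical paid app, made up for illustration only
    ['Hypothetical Paid App', 'TOOLS', '4.0', '10', '5M', '1,000+', 'Paid', '$1.99',
     'Everyone', 'Tools', 'March 1, 2018', '1.0', '4.1 and up'],
]

freeRows = [row for row in rows if is_free_android(row, androidHeaders)]
print(len(freeRows))  # 2
```

For iOS, the analogous check would convert the `price` column (stored as a string such as `'0.0'`) with `float()` and compare it against `0.0`.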
