# Mobile App Analysis:
#### aka: Profitable App Profiles for the App Store and Google Play Markets
--------------

* This is a Data Analysis project where I analyze two sources of data:

  1. The Apple App Store dataset available at [Kaggle](https://www.kaggle.com/ramamet4/app-store-apple-data-set-10k-apps/version/2), and
  2. The Google Play store dataset, also available at [Kaggle](https://www.kaggle.com/lava18/google-play-store-apps).


* The goal is to understand and identify the types of *free* mobile apps that are most likely to attract more users over time.


### First, general Exploration:

In [1]:
# This is a csv to list of lists func.:

def csv2lol(csvFile,trim_headers=0):
    
    '''
    Description:
        A "CSV" to "list of lists" Function,
        expects 2 arguments:
            1. Dataset CSV File Path+Name. (String)
            2. Should the headers be removed.
               (Boolean, Optional, Defaults to False)
    
    Usage:
        csv2lol("googleplaystore.csv")
        csv2lol("googleplaystore.csv",1)
    '''
    from csv import reader

    # Open with a context manager so the file is closed after reading.
    with open(csvFile, encoding="utf8", newline="") as opFile:
        lol = list(reader(opFile))
    if trim_headers:
        return lol[1:]
    return lol

In [2]:
# This is a data 'Scanner' func.:

def dataScan(dataset,start=0,end=0,general_info=1):
    
    '''
    Description:
        Data exploration Function that prints some data
        and some general info,
        expects 4 arguments:
            1. Dataset Name.
               (List of Lists)
            2. Where to start.
               (int, Optional)
            3. Where to end, should be greater than start.
               (int, Optional)
            4. Print General info?
               (Boolean, Optional, Defaults to True)
    
    Usage:
        dataScan(dataset)        Prints General Info Only
        dataScan(dataset,1,5,0)  Prints the first 4 rows only
    '''
    print('========================================')

    if general_info:
        print('Dataset General Info :\n====================')
        print('Columns = '+str(len(dataset[0])))
        print('Rows    = '+str(len(dataset))+'   (including headers if present)')
        print('\n')
        print('Dataset Header : ')
        print(dataset[0])
        print('\n')

    if start<end:
        print("Requested Data (%s Rows) :\n======================="%(end-start))
        for i in dataset[start:end]:
            print(i)
    else:
        print('No Data Requested.')
        print('to get some data set the start and end arguments, eg:')
        print('dataScan(dataset,1,5,0)  Prints the first 4 rows only')
    print('========================================')

In [3]:
# help(csv2lol)
# print('\n')
# help(dataScan)

# dataScan(csv2lol("googleplaystore.csv"))
# dataScan(csv2lol("AppleStore.csv"),0,0,0)

In [4]:
dataScan(csv2lol("googleplaystore.csv"),1,2)

Dataset General Info :
Columns = 13
Rows    = 10842   (including headers if present)


Dataset Header : 
['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver', 'Android Ver']


Requested Data (1 Rows) :
['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


In [5]:
dataScan(csv2lol("AppleStore.csv"),1,2)

Dataset General Info :
Columns = 16
Rows    = 7198   (including headers if present)


Dataset Header : 
['id', 'track_name', 'size_bytes', 'currency', 'price', 'rating_count_tot', 'rating_count_ver', 'user_rating', 'user_rating_ver', 'ver', 'cont_rating', 'prime_genre', 'sup_devices.num', 'ipadSc_urls.num', 'lang.num', 'vpp_lic']


Requested Data (1 Rows) :
['284882215', 'Facebook', '389879808', 'USD', '0.0', '2974676', '212', '3.5', '3.5', '95.0', '4+', 'Social Networking', '37', '1', '29', '1']


### Next, Error Analysis:
- Find Apps w/ missing data points (rows with fewer fields than the header).
- Find Apps w/ empty data points (fields that are empty strings).

In [6]:
# this Func. finds MISSING data points:

def missFinder(lol):
        
    '''
    Description:
        Prints rows with missing data along
        with their index, by comparing every
        row's length to header row's length,
        expects 1 argument:
            Dataset Name. (List of Lists)
               
    Usage:
        missFinder(Dataset_as_List_of_Lists)
    '''
    
    headlen = len(lol[0])
    # enumerate gives each row's real position, even when rows are identical
    for idx, row in enumerate(lol):
        if len(row) != headlen:
            print('Found Row with index number : ',end='')
            print(idx)
            print(row)

In [7]:
missFinder(csv2lol("googleplaystore.csv"))

Found Row with index number : 10473
['Life Made WI-Fi Touchscreen Photo Frame', '1.9', '19', '3.0M', '1,000+', 'Free', '0', 'Everyone', '', 'February 11, 2018', '1.0.19', '4.0 and up']
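
The row at index 10473 has 12 fields instead of 13: the `Category` value is absent, so every later field sits one column to the left. A minimal sketch of dropping such rows, assuming the header is the first row of the list. `drop_malformed` is a hypothetical helper name and the sample rows are made up for illustration:

```python
def drop_malformed(lol):
    """Return a new list keeping only rows as long as the header row."""
    head_len = len(lol[0])
    return [row for row in lol if len(row) == head_len]


# Tiny made-up sample: header, one complete row, one row missing a field.
sample = [
    ['App', 'Category', 'Rating'],
    ['App A', 'TOOLS', '4.1'],
    ['App B', '1.9'],
]

cleaned = drop_malformed(sample)
print(cleaned)
```

Building a new list, rather than deleting by index, avoids shifting the positions of later rows while the data is still being inspected.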


In [8]:
missFinder(csv2lol("AppleStore.csv"))

In [9]:
# this Func. finds EMPTY data points:

def empFinder(lol):
        
    '''
    Description:
        Prints rows with empty data points along
        with their index,
        expects 1 argument:
            Dataset Name. (List of Lists)
               
    Usage:
        empFinder(Dataset_as_List_of_Lists)
    '''
    
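    # enumerate gives each row's real position; any() reports a row once
    # even if it has several empty fields
    for idx, row in enumerate(lol):
        if any(not len(field) for field in row):
            print('Found Row with index number : ',end='')
            print(idx)
            print(row)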

In [10]:
empFinder(csv2lol("googleplaystore.csv"))

Found Row with index number : 1554
['Market Update Helper', 'LIBRARIES_AND_DEMO', '4.1', '20145', '11k', '1,000,000+', 'Free', '0', 'Everyone', 'Libraries & Demo', 'February 12, 2013', '', '1.5 and up']
Found Row with index number : 10473
['Life Made WI-Fi Touchscreen Photo Frame', '1.9', '19', '3.0M', '1,000+', 'Free', '0', 'Everyone', '', 'February 11, 2018', '1.0.19', '4.0 and up']


In [11]:
empFinder(csv2lol("AppleStore.csv"))

### Next, Data Cleaning:
- Remove Apps w/ missing data.
- Remove non-English apps.
- Remove non-free apps.
- Remove Duplicates.

In [12]:
# search for repeated App data

apps_data = csv2lol("googleplaystore.csv")

for x in apps_data:
    if x[0] == 'Subway Surfers':
        print(apps_data.index(x))


1655
1701
1751
1873
1873
3897


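# ^ This has a bug: `apps_data.index(x)` returns the position of the first row equal to `x`, so identical duplicate rows print the same index (note the repeated 1873).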
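
One way to avoid a repeated index is to carry each row's position with `enumerate` instead of looking it up with `list.index`, which always returns the first equal row. A minimal sketch, assuming the app name is in column 0. `find_app_rows` is a hypothetical helper and the sample rows are made up:

```python
def find_app_rows(lol, app_name, name_idx=0):
    """Return the positions of every row whose name column equals app_name."""
    return [i for i, row in enumerate(lol) if row[name_idx] == app_name]


# Made-up sample with one exact duplicate row.
sample = [
    ['App', 'Reviews'],
    ['Subway Surfers', '100'],
    ['Other App', '5'],
    ['Subway Surfers', '200'],
    ['Subway Surfers', '200'],  # identical to the previous row
]

positions = find_app_rows(sample, 'Subway Surfers')
print(positions)

# The list.index pattern reports the first equal row for both duplicates.
buggy = [sample.index(row) for row in sample if row[0] == 'Subway Surfers']
print(buggy)
```

With the positions known, the duplicate-removal step can decide which entry of each app to keep, for example the one with the highest review count.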