# Analyzing Successful Apps on App Stores
This project looks through a data set of popular (and unpopular) apps on the Apple App Store and the Google Play Store to see which kinds of apps have the highest chance of generating profits.

In [None]:
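from csv import reader
# Forward slashes work on every OS and avoid accidental escape sequences such as '\g'.
with open('Datasets/AppleStore.csv', encoding='utf8') as apple_file:
    apple_data = list(reader(apple_file))
with open('Datasets/googleplaystore.csv', encoding='utf8') as google_file:
    google_data = list(reader(google_file))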

In [None]:
def explore_data(dataset, start, end, rows_and_columns=False):
    dataset_slice = dataset[start:end]    
    for row in dataset_slice:
        print(row)
        print('\n') # adds a new (empty) line after each row

    if rows_and_columns:
        print('Number of rows:', len(dataset))
        print('Number of columns:', len(dataset[0]))

# Issues With the Google Play Store Dataset
---
The Google Play Store dataset has a faulty row at index 10473 (the 10,474th line of the file, counting the header): a field is missing, which leaves one of the variables as NaN. This row should simply be thrown out, since it can cause problems when parsing the data. The other issue is the number of duplicate entries in the CSV, around 1181. For each duplicated app, we keep the entry with the highest number of reviews and throw away the rest.
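
Both issues can be checked programmatically rather than by eye. The sketch below is only an illustration: it assumes a faulty row shows up as a row whose number of fields differs from the header's, and it uses a small hypothetical `sample` list in place of `google_data`. The helper names `find_malformed_rows` and `count_duplicates` are not part of the original notebook.

```python
# Hypothetical sample standing in for the parsed CSV (header row first).
sample = [
    ['App', 'Category', 'Rating', 'Reviews'],
    ['Instagram', 'SOCIAL', '4.5', '66577313'],
    ['Instagram', 'SOCIAL', '4.5', '66577446'],
    ['Broken Row', '1.9', '19'],  # one field missing
    ['Photo Editor', 'ART_AND_DESIGN', '4.1', '159'],
]


def find_malformed_rows(dataset):
    """Return (index, row) pairs whose length differs from the header's."""
    expected = len(dataset[0])
    return [(i, row) for i, row in enumerate(dataset) if len(row) != expected]


def count_duplicates(dataset, name_index=0):
    """Count rows (header excluded) whose name already appeared earlier."""
    seen = set()
    duplicates = 0
    for row in dataset[1:]:
        name = row[name_index]
        if name in seen:
            duplicates += 1
        else:
            seen.add(name)
    return duplicates


malformed = find_malformed_rows(sample)
print(malformed)
print(count_duplicates(sample))
```

Running the same two helpers on `google_data` would be one way to confirm the index of the faulty row and the duplicate count before deleting anything.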

In [None]:
# Index 10473 counts the header row; checking the name first makes the cell safe to re-run.
if google_data[10473][0] == 'Life Made WI-Fi Touchscreen Photo Frame':
    del google_data[10473]



In [None]:
# Map each app name to the highest review count seen among its duplicate rows.
reviews_max = {}

for app in google_data[1:]:
    name = app[0]
    n_reviews = float(app[3])
    if name not in reviews_max or n_reviews > reviews_max[name]:
        reviews_max[name] = n_reviews

The code below builds a new list that keeps, for each duplicated app, only the entry with the highest number of user reviews, leaving 9659 entries.

In [None]:
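android_clean = []
already_added = set()  # set membership checks are O(1), unlike list lookups

for app in google_data[1:]:
    name = app[0]
    n_reviews = float(app[3])
    # Keep only the first row that matches the maximum review count for this name.
    if reviews_max[name] == n_reviews and name not in already_added:
        android_clean.append(app)
        already_added.add(name)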
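
The two passes above (first find the maximum review count per name, then collect the matching rows) could also be folded into a single pass. The following is a minimal sketch under the assumption that the app name is in column 0 and the review count in column 3, as in the cells above. `keep_most_reviewed` is a hypothetical helper, and unlike the original it orders the result by the first appearance of each name rather than by the position of the kept row.

```python
def keep_most_reviewed(rows, name_index=0, reviews_index=3):
    """Keep, for each app name, the row with the highest review count."""
    best = {}
    for row in rows:
        name = row[name_index]
        n_reviews = float(row[reviews_index])
        # Strict '>' means the first row wins when review counts are tied.
        if name not in best or n_reviews > float(best[name][reviews_index]):
            best[name] = row
    return list(best.values())


# Hypothetical rows (no header) for illustration only.
rows = [
    ['Instagram', 'SOCIAL', '4.5', '66577313'],
    ['Photo Editor', 'ART_AND_DESIGN', '4.1', '159'],
    ['Instagram', 'SOCIAL', '4.5', '66577446'],
]

deduplicated = keep_most_reviewed(rows)
for row in deduplicated:
    print(row)
```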

In [None]:
def english_text(string):
    """Return False if the name contains more than three non-ASCII characters."""
    faulty_chars = 0
    for char in string:
        if ord(char) > 127:
            faulty_chars += 1
            if faulty_chars > 3:
                return False
    return True


In [None]:
eng_android_clean = []
eng_apple_clean = []

for app in android_clean:
    if english_text(app[0]):
        eng_android_clean.append(app)

for app in apple_data[1:]:
    if english_text(app[2]):
        eng_apple_clean.append(app)

In [None]:
final_android_set = []
final_apple_set = []

for app in eng_android_clean:
    if app[7] == '0':
        final_android_set.append(app)
for app in eng_apple_clean:
    if app[5] == '0':
        final_apple_set.append(app)

In [None]:
common_android_genres = {}
common_apple_genres = {}

for app in final_android_set:
    if app[1] in common_android_genres:
        common_android_genres[app[1]] += 1
    else:
        common_android_genres[app[1]] = 1
for app in final_apple_set:
    if app[12] in common_apple_genres:
        common_apple_genres[app[12]] += 1
    else:
        common_apple_genres[app[12]] = 1

print (common_android_genres, common_apple_genres)

In [None]:
def freq_table(dataset,index):
    frequency_table = {}
    print(len(dataset))
    for app in dataset:
        if app[index] in frequency_table:
            frequency_table[app[index]] += 1
        else:
            frequency_table[app[index]] = 1
    for frequency in frequency_table:
        frequency_table[frequency] = (frequency_table[frequency]/len(dataset)) * 100
    return frequency_table

def display_table(dataset, index):
    table = freq_table(dataset, index)
    table_display = []
    for key in table:
        key_val_as_tuple = (table[key], key)
        table_display.append(key_val_as_tuple)

    table_sorted = sorted(table_display, reverse = True)
    for entry in table_sorted:
        print(entry[1], ':', entry[0])
    print('\n')

display_table(final_android_set,1) #Category
display_table(final_android_set,9) #Genres
display_table(final_apple_set, 12) #prime genre
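
As an aside, the same percentage table can be built with `collections.Counter` from the standard library. This is an alternative sketch, not the notebook's own method; `freq_table_counter` and `sorted_percentages` are hypothetical names, and the `sample` rows are made up for illustration. Ties are ordered by first appearance here, whereas `display_table` breaks ties by key.

```python
from collections import Counter


def freq_table_counter(dataset, index):
    """Return {value: percentage of rows} for the given column index."""
    counts = Counter(row[index] for row in dataset)
    total = len(dataset)
    return {key: count / total * 100 for key, count in counts.items()}


def sorted_percentages(dataset, index):
    """Return (value, percentage) pairs, largest percentage first."""
    table = freq_table_counter(dataset, index)
    return sorted(table.items(), key=lambda item: item[1], reverse=True)


# Hypothetical rows: [name, genre]
sample = [['a', 'Games'], ['b', 'Games'], ['c', 'Weather'], ['d', 'Games']]

for genre, percentage in sorted_percentages(sample, 1):
    print(genre, ':', percentage)
```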

# Analyzing the most common genres on each store
Among the free English apps on the Apple App Store, the most common genres are:

- Games (58%)

- Entertainment (8%)

- Photo & Video (5%)

---
Games make up a far larger share of the Apple App Store than any other genre. This doesn't by itself imply that gaming has the most users on the App Store, since the share of apps reflects supply rather than demand (game apps may simply be easier to make than other types of apps), but there certainly is a market for them.

On the Google Play Store, the most common categories are:

- Family (19%)

- Game (10%)

- Tools (8.5%)

The Google Play Store seems more balanced between entertainment and productivity apps, meaning that a developer or company could build an app for either side and still be successful.

In [58]:
freq_prime_genre = freq_table(final_apple_set,12)
print("Average number of ratings for each genre")
for genre in freq_prime_genre:
    total = 0
    len_genre = 0
    for app in final_apple_set:
        genre_app = app[12]
        if genre_app == genre:
            len_genre += 1
            total += float(app[6])
    print(genre + ": " + str( total/len_genre ))


3222
Average number of ratings for each genre
Productivity: 21028.410714285714
Weather: 52279.892857142855
Shopping: 26919.690476190477
Reference: 74942.11111111111
Finance: 31467.944444444445
Music: 57326.530303030304
Utilities: 18684.456790123455
Travel: 28243.8
Social Networking: 71548.34905660378
Sports: 23008.898550724636
Health & Fitness: 23298.015384615384
Games: 22788.6696905016
Food & Drink: 33333.92307692308
News: 21248.023255813954
Book: 39758.5
Photo & Video: 28441.54375
Entertainment: 14029.830708661417
Business: 7491.117647058823
Lifestyle: 16485.764705882353
Education: 7003.983050847458
Navigation: 86090.33333333333
Medical: 612.0
Catalogs: 4004.0



# Average number of ratings for each genre on the Apple App Store
- Productivity: 21028.410714285714

- Weather: 52279.892857142855

- Shopping: 26919.690476190477

- Reference: 74942.11111111111

- Finance: 31467.944444444445

- Music: 57326.530303030304

- Utilities: 18684.456790123455

- Travel: 28243.8

- Social Networking: 71548.34905660378

- Sports: 23008.898550724636

- Health & Fitness: 23298.015384615384

- Games: 22788.6696905016

- Food & Drink: 33333.92307692308

- News: 21248.023255813954

- Book: 39758.5

- Photo & Video: 28441.54375

- Entertainment: 14029.830708661417

- Business: 7491.117647058823

- Lifestyle: 16485.764705882353

- Education: 7003.983050847458

- Navigation: 86090.33333333333

- Medical: 612.0

- Catalogs: 4004.0

From these results, creating a social networking app or possibly a reference app looks like the best bet. Navigation has the highest average number of user ratings, but Reference and Social Networking come right after it, and they seem easier to develop and break into than other app categories.
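
The per-genre averages above are computed by rescanning the whole dataset once per genre. A single-pass version can be sketched as follows. It assumes the group label and the numeric value are stored as strings at known column indexes (12 and 6 for the Apple data above); `average_by_group` is a hypothetical helper, and the `sample` rows are invented for illustration.

```python
from collections import defaultdict


def average_by_group(dataset, group_index, value_index):
    """Return {group: mean of the value column} using one pass over the rows."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for row in dataset:
        group = row[group_index]
        totals[group] += float(row[value_index])
        counts[group] += 1
    return {group: totals[group] / counts[group] for group in totals}


# Hypothetical rows: [genre, rating count]
sample = [['Games', '100'], ['Games', '300'], ['Weather', '50']]

averages = average_by_group(sample, 0, 1)
# Sorting by the average makes the top genres easier to spot than dictionary order.
for genre, average in sorted(averages.items(), key=lambda item: item[1], reverse=True):
    print(genre + ': ' + str(average))
```

With the notebook's data the call would presumably be `average_by_group(final_apple_set, 12, 6)`.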

In [63]:
freq_categories = freq_table(final_android_set,1)
print("Average number of installs for each category")
for category in freq_categories:
    total = 0
    len_category = 0
    for app in final_android_set:
        category_app = app[1]
        if category_app == category:
            numInstalls = app[5]
            numInstalls = numInstalls.replace('+','')
            numInstalls = numInstalls.replace(',','')
            total += float(numInstalls)
            len_category += 1
    print("- " + category + ": " + str(total/len_category) + '\n')

8864
Average number of installs for each category
- ART_AND_DESIGN: 1986335.0877192982

- AUTO_AND_VEHICLES: 647317.8170731707

- BEAUTY: 513151.88679245283

- BOOKS_AND_REFERENCE: 8767811.894736841

- BUSINESS: 1712290.1474201474

- COMICS: 817657.2727272727

- COMMUNICATION: 38456119.167247385

- DATING: 854028.8303030303

- EDUCATION: 1833495.145631068

- ENTERTAINMENT: 11640705.88235294

- EVENTS: 253542.22222222222

- FINANCE: 1387692.475609756

- FOOD_AND_DRINK: 1924897.7363636363

- HEALTH_AND_FITNESS: 4188821.9853479853

- HOUSE_AND_HOME: 1331540.5616438356

- LIBRARIES_AND_DEMO: 638503.734939759

- LIFESTYLE: 1437816.2687861272

- GAME: 15588015.603248259

- FAMILY: 3695641.8198090694

- MEDICAL: 120550.61980830671

- SOCIAL: 23253652.127118643

- SHOPPING: 7036877.311557789

- PHOTOGRAPHY: 17840110.40229885

- SPORTS: 3638640.1428571427

- TRAVEL_AND_LOCAL: 13984077.710144928

- TOOLS: 10801391.298666667

- PERSONALIZATION: 5201482.6122448975

- PRODUCTIVITY: 16787331.3449275

# Average number of installs for each category

- ART_AND_DESIGN: 1986335.0877192982

- AUTO_AND_VEHICLES: 647317.8170731707

- BEAUTY: 513151.88679245283

- BOOKS_AND_REFERENCE: 8767811.894736841

- BUSINESS: 1712290.1474201474

- COMICS: 817657.2727272727

- COMMUNICATION: 38456119.167247385

- DATING: 854028.8303030303

- EDUCATION: 1833495.145631068

- ENTERTAINMENT: 11640705.88235294

- EVENTS: 253542.22222222222

- FINANCE: 1387692.475609756

- FOOD_AND_DRINK: 1924897.7363636363

- HEALTH_AND_FITNESS: 4188821.9853479853

- HOUSE_AND_HOME: 1331540.5616438356

- LIBRARIES_AND_DEMO: 638503.734939759

- LIFESTYLE: 1437816.2687861272

- GAME: 15588015.603248259

- FAMILY: 3695641.8198090694

- MEDICAL: 120550.61980830671

- SOCIAL: 23253652.127118643

- SHOPPING: 7036877.311557789

- PHOTOGRAPHY: 17840110.40229885

- SPORTS: 3638640.1428571427

- TRAVEL_AND_LOCAL: 13984077.710144928

- TOOLS: 10801391.298666667

- PERSONALIZATION: 5201482.6122448975

- PRODUCTIVITY: 16787331.344927534

- PARENTING: 542603.6206896552

- WEATHER: 5074486.197183099

- VIDEO_PLAYERS: 24727872.452830188

- NEWS_AND_MAGAZINES: 9549178.467741935

- MAPS_AND_NAVIGATION: 4056941.7741935486

On the Google Play Store, making a new hit game or a new video player looks like a good bet. A reference/book app on the App Store might also work well if it were built around a popular book (like the Bible or another popular religious book).
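
One caveat about the install averages: the install counts are stored as strings containing `+` and `,` characters, which the cleaning code above strips before converting to a number. The conversion can be isolated in a small helper, sketched below. `parse_installs` is a hypothetical name, and the example strings are illustrative, not rows copied from the dataset.

```python
def parse_installs(text):
    """Convert an install string such as '1,000,000+' to a float."""
    return float(text.replace('+', '').replace(',', ''))


print(parse_installs('1,000,000+'))
print(parse_installs('500+'))
print(parse_installs('0'))
```

Since a trailing `+` presumably marks an open-ended bucket, the parsed numbers are better read as lower bounds than as exact install counts.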