# Seeking profitable Android app profiles

We focus on Android apps in the top three languages in the dataset: English (~9,500 apps), Spanish (35 apps), and Japanese (10 apps). Keep in mind that this sample is only a small fraction of the millions of apps available.

ISA: Run the cleaning and language sorting scripts below.

In [1]:
from retrieve_csv import retrive_datafile
import csv

In [2]:
def extract_app_data(filename, input_folder, tag,
                     encoding="utf8", header=True):
    """Read a CSV file into a list of rows.

    Returns (header_row, data_rows) if header is True,
    otherwise the list of all rows.
    """
    print("Extracting {} app data...".format(tag))
    datafile = retrive_datafile(filename, input_folder)
    # Pass the encoding through; newline='' is what the csv module expects
    with open(datafile, 'r', encoding=encoding, newline='') as data:
        dataset = list(csv.reader(data))
    # Split off the header row only if the file has one
    if header:
        return dataset[0], dataset[1:]
    return dataset

In [3]:
data_folder = '../output/android/sorted-by-language/'
english_app_filename = 'english_apps.csv'
spanish_app_filename = 'spanish_apps.csv'
japanese_app_filename = 'japanese_apps.csv'

In [6]:
english_app_dataset = extract_app_data(english_app_filename, data_folder, tag="English", header=False)
spanish_app_dataset = extract_app_data(spanish_app_filename, data_folder, tag="Spanish", header=False)
japanese_app_dataset = extract_app_data(japanese_app_filename, data_folder, tag="Japanese", header=False)

Extracting English app data...
Extracting Spanish app data...
Extracting Japanese app data...


In [7]:
android_english_free = []
price_index = 7
for app in english_app_dataset:
    price = app[price_index]
    # Free apps have a price of exactly '0'
    if price == '0':
        android_english_free.append(app)

print("There are {} free English apps.".format(len(android_english_free)))

There are 8845 free English apps.
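The filter above compares the price string to `'0'` exactly. A slightly more tolerant check is sketched below, assuming paid prices are stored as strings like `'$4.99'` and that a free price could also appear as `'0.00'`. The helper name `is_free` is hypothetical and the rows are made-up sample data.

```python
def is_free(price):
    """Return True if a price string such as '0', '0.00' or '$0' means free."""
    # Assumption: paid prices look like '$4.99', so strip the currency sign first.
    # float() raises ValueError on non-numeric text, which surfaces malformed rows.
    return float(price.strip().lstrip("$")) == 0.0


# Usage with hypothetical rows laid out as [name, price]
rows = [["a", "0"], ["b", "$4.99"], ["c", "0.00"]]
free_rows = [row for row in rows if is_free(row[1])]
print(len(free_rows))
```

In the notebook, the same check would replace `price == '0'` inside the loop over `english_app_dataset`.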


In [8]:
def frequency_table(dataset, index, display=True):
    """Return {value: percentage of rows} for column `index`."""
    freq_table = {}
    total = 0

    # Count how often each value appears in the column
    for row in dataset:
        value = row[index]
        if value in freq_table:
            freq_table[value] += 1
        else:
            freq_table[value] = 1
        total += 1

    # Convert the counts to percentages of all rows
    table_percentages = {}
    for key in freq_table:
        proportion = freq_table[key] / total
        table_percentages[key] = proportion * 100

    if display:
        # (key, percentage) tuples sort by key first, so this prints
        # the values in reverse alphabetical order, not by percentage
        table_display = []
        for key in table_percentages:
            table_display.append((key, table_percentages[key]))
        for key, value in sorted(table_display, reverse=True):
            print("{}: {:.2f}%".format(key, value))
    return table_percentages
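The display step of `frequency_table` sorts `(key, percentage)` tuples with `reverse=True`, which orders rows by key rather than by share. If the goal is to see the most common values first, a compact variant can lean on `collections.Counter`. This is a sketch only; the name `frequency_table_by_share` is hypothetical and the rows are made-up sample data.

```python
from collections import Counter


def frequency_table_by_share(dataset, index):
    """Return (value, percentage) pairs for column `index`, largest share first."""
    counts = Counter(row[index] for row in dataset)
    total = sum(counts.values())
    # most_common() orders values by descending count (ties keep first-seen order)
    return [(value, count / total * 100) for value, count in counts.most_common()]


# Usage with a few made-up rows (genre is column 1 here)
rows = [["a", "GAME"], ["b", "FAMILY"], ["c", "GAME"], ["d", "TOOLS"]]
for value, pct in frequency_table_by_share(rows, 1):
    print("{}: {:.2f}%".format(value, pct))
```

On the real data this would be called as `frequency_table_by_share(android_english_free, genre_index)`.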

In [9]:
genre_index = 1
android_genres = frequency_table(android_english_free, genre_index)

WEATHER: 0.80%
VIDEO_PLAYERS: 1.81%
TRAVEL_AND_LOCAL: 2.33%
TOOLS: 8.46%
SPORTS: 3.31%
SOCIAL: 2.65%
SHOPPING: 2.24%
PRODUCTIVITY: 3.91%
PHOTOGRAPHY: 2.96%
PERSONALIZATION: 3.32%
PARENTING: 0.66%
NEWS_AND_MAGAZINES: 2.77%
MEDICAL: 3.52%
MAPS_AND_NAVIGATION: 1.41%
LIFESTYLE: 3.91%
LIBRARIES_AND_DEMO: 0.94%
HOUSE_AND_HOME: 0.83%
HEALTH_AND_FITNESS: 3.08%
GAME: 9.70%
FOOD_AND_DRINK: 1.24%
FINANCE: 3.66%
FAMILY: 18.96%
EVENTS: 0.71%
ENTERTAINMENT: 0.96%
EDUCATION: 1.18%
DATING: 1.85%
COMMUNICATION: 3.24%
COMICS: 0.62%
BUSINESS: 4.61%
BOOKS_AND_REFERENCE: 2.17%
BEAUTY: 0.60%
AUTO_AND_VEHICLES: 0.93%
ART_AND_DESIGN: 0.66%


In [10]:
installs_index = 5
android_installs = frequency_table(android_english_free, installs_index)

500,000,000+: 0.27%
500,000+: 5.53%
500+: 3.21%
50,000,000+: 2.28%
50,000+: 4.80%
50+: 1.91%
5,000,000+: 6.83%
5,000+: 4.52%
5+: 0.79%
100,000,000+: 2.10%
100,000+: 11.55%
100+: 6.87%
10,000,000+: 10.57%
10,000+: 10.24%
10+: 3.55%
1,000,000,000+: 0.23%
1,000,000+: 15.74%
1,000+: 8.42%
1+: 0.51%
0+: 0.05%
0: 0.01%
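Because the install labels are strings, the reverse sort orders the buckets character by character (for example, `500+` lands before `50,000,000+`). A sketch that orders the buckets by their numeric lower bound instead is below; the helper names are hypothetical, and the sample percentages are copied from the table above. In the notebook it could be called as `sort_install_buckets(android_installs)`.

```python
def installs_lower_bound(label):
    """Convert an install bucket such as '1,000,000+' to its lower bound (1000000)."""
    return int(label.replace(",", "").replace("+", ""))


def sort_install_buckets(percentages):
    """Return (bucket, percentage) pairs ordered from largest to smallest bucket."""
    return sorted(percentages.items(),
                  key=lambda item: installs_lower_bound(item[0]),
                  reverse=True)


# Usage with a subset of the buckets printed above
sample = {"500+": 3.21, "1,000,000+": 15.74, "50,000+": 4.80, "0": 0.01}
for bucket, pct in sort_install_buckets(sample):
    print("{:>15}: {:.2f}%".format(bucket, pct))
```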


In [11]:
for genre in android_genres:
    total = 0
    len_genre = 0
    for app in android_english_free:
        app_genre = app[genre_index]
        if app_genre == genre:
            # Install counts are ranges such as '1,000,000+'; strip the
            # punctuation and use the lower bound as the install count
            num_installs = app[installs_index]
            num_installs = num_installs.replace(',', '').replace('+', '')
            total += float(num_installs)
            len_genre += 1
    average_num_installs = total / len_genre
    print("{}: {:,.0f}".format(genre, average_num_installs))

ART_AND_DESIGN: 1,952,105
AUTO_AND_VEHICLES: 647,318
BEAUTY: 513,152
BOOKS_AND_REFERENCE: 8,155,944
BUSINESS: 1,708,216
COMICS: 817,657
COMMUNICATION: 38,456,119
DATING: 859,206
EDUCATION: 1,825,481
ENTERTAINMENT: 11,640,706
EVENTS: 253,542
FINANCE: 1,402,817
FOOD_AND_DRINK: 1,924,898
HEALTH_AND_FITNESS: 4,204,222
HOUSE_AND_HOME: 1,331,541
LIBRARIES_AND_DEMO: 638,504
LIFESTYLE: 1,452,527
GAME: 15,542,732
FAMILY: 3,607,021
MEDICAL: 119,718
SOCIAL: 23,450,260
SHOPPING: 7,067,367
PHOTOGRAPHY: 17,772,019
SPORTS: 3,679,626
TRAVEL_AND_LOCAL: 14,051,913
TOOLS: 10,763,428
PERSONALIZATION: 5,201,480
PRODUCTIVITY: 16,738,958
PARENTING: 542,604
WEATHER: 5,074,486
VIDEO_PLAYERS: 24,573,948
NEWS_AND_MAGAZINES: 9,667,594
MAPS_AND_NAVIGATION: 4,025,282
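The loop above rescans every app once per genre, and a mean can be pulled up by a handful of very popular apps. A single-pass sketch that groups install lower bounds by genre and reports the median next to the mean is below. The function name `installs_by_genre` is hypothetical, the rows are made-up sample data, and since installs are bucketed lower bounds both statistics remain approximate.

```python
from collections import defaultdict
from statistics import mean, median


def installs_by_genre(apps, genre_index, installs_index):
    """Group install lower bounds by genre in a single pass over the rows."""
    groups = defaultdict(list)
    for app in apps:
        label = app[installs_index].replace(",", "").replace("+", "")
        groups[app[genre_index]].append(int(label))
    # One (genre, mean, median) tuple per genre, highest mean first
    summary = [(genre, mean(vals), median(vals)) for genre, vals in groups.items()]
    return sorted(summary, key=lambda row: row[1], reverse=True)


# Usage with hypothetical rows laid out as [name, genre, installs]
rows = [
    ["app1", "SOCIAL", "1,000,000,000+"],
    ["app2", "SOCIAL", "1,000+"],
    ["app3", "SOCIAL", "1,000+"],
    ["app4", "TOOLS", "10,000,000+"],
]
for genre, avg, mid in installs_by_genre(rows, genre_index=1, installs_index=2):
    print("{}: mean {:,.0f}, median {:,.0f}".format(genre, avg, mid))
```

In the sample, one blockbuster app gives SOCIAL a mean of hundreds of millions while its median stays at 1,000. On the notebook data the call would be `installs_by_genre(android_english_free, genre_index, installs_index)`.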


## Exploring with Tableau
To interact with the data viz displayed below, [click on this link](https://public.tableau.com/views/Dataquest-Project1/Dashboard1?:language=en&:display_count=y&publish=yes&:origin=viz_share_link).

<img src="../output/Tableau/Dashboard.png">

## Conclusions

Among free English Android apps, family apps and games make up the largest shares (18.96% and 9.70%), while communication (e.g., WhatsApp), video player, and social apps have the highest average installs. Taking this into consideration, along with the increasing normalization of remote work, we want to consider genres such as productivity, books, and education. An app that draws elements from multiple genres, for example an educational comic/manga reference (which could even serve as a companion app to existing books such as ["The Manga Guide to Physics"](https://www.goodreads.com/book/show/6291415-the-manga-guide-to-physics)), would be one way to tap into a potentially growing market.