# Profitable App Profiles for the App Store and Google Play Markets

This project aims to build a better understanding of how many app users see and engage with in-app ads. The goal is to analyze the data and give developers insight into which types of apps are likely to attract more users.

Prerequisites:
* The app's name is in English
* The app is free

**Load the data**

In [1]:
from csv import reader

In [2]:
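def load_data(dataset, num_of_rows, header=True):
    # Read the CSV and close the file as soon as the rows are loaded
    with open(dataset, encoding='utf8') as opened_file:
        df = list(reader(opened_file))
    # Keep the header row plus the first num_of_rows data rows
    df_slice = df[:num_of_rows+1]
    if header:
        # Return (header_row, data_rows)
        return df_slice[0], df_slice[1:]
    return df_slice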

In [3]:
googlestore_df = load_data('googleplaystore.csv', 5000)
applestore_df = load_data('AppleStore.csv', 5000)

In [4]:
googlestore_df_value = googlestore_df[1]

In [5]:
googlestore_df[0]

['App',
 'Category',
 'Rating',
 'Reviews',
 'Size',
 'Installs',
 'Type',
 'Price',
 'Content Rating',
 'Genres',
 'Last Updated',
 'Current Ver',
 'Android Ver']

In [6]:
def extract_data(data, index):
    column = {}
    for row in data:
        cat = row[index]
        if cat in column:
            column[cat] += 1
        else:
            column[cat] = 1   
    return column

In [7]:
extract_data(googlestore_df_value, 6)

{'Free': 4682, 'Paid': 318}
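The frequency table built by `extract_data` can also be produced with `collections.Counter` from the standard library. The following is a minimal sketch on made-up rows; `count_values` and `sample_rows` are hypothetical names used only for illustration, not part of the notebook.

```python
from collections import Counter

def count_values(rows, index):
    """Return a frequency table for the column at position `index`."""
    return Counter(row[index] for row in rows)

# Hypothetical sample rows: [app name, type]
sample_rows = [
    ['App A', 'Free'],
    ['App B', 'Paid'],
    ['App C', 'Free'],
]

type_counts = count_values(sample_rows, 1)
print(type_counts)
```

A `Counter` behaves like a dictionary, so it can be used wherever the result of `extract_data` is expected.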

In [8]:
header = googlestore_df[0]
check_total = {}

# Sum the frequency counts of each column and store the total under that
# column's own name; every total should equal the number of rows
for i, name in enumerate(header):
    check_total[name] = sum(extract_data(googlestore_df_value, i).values())

check_total

{'App': 5000,
 'Category': 5000,
 'Rating': 5000,
 'Reviews': 5000,
 'Size': 5000,
 'Installs': 5000,
 'Type': 5000,
 'Price': 5000,
 'Content Rating': 5000,
 'Genres': 5000,
 'Last Updated': 5000,
 'Current Ver': 5000,
 'Android Ver': 5000}

Every column totals 5,000 entries, so no row is missing a field.
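The totals above show that every row has a value at each column position, but they would not reveal empty strings. A stricter check could look like the sketch below, which flags rows whose length differs from the header or that contain a blank cell. The function name and the sample data are hypothetical.

```python
def find_incomplete_rows(header, rows):
    """Return the indexes of rows that are short, long, or contain blank cells."""
    problems = []
    for i, row in enumerate(rows):
        wrong_length = len(row) != len(header)
        has_blank = any(cell.strip() == '' for cell in row)
        if wrong_length or has_blank:
            problems.append(i)
    return problems

# Hypothetical sample: row 1 has a blank cell, row 2 is missing a field
sample_header = ['App', 'Category', 'Rating']
sample_rows = [
    ['App A', 'GAME', '4.1'],
    ['App B', '', '3.9'],
    ['App C', 'TOOLS'],
]

incomplete = find_incomplete_rows(sample_header, sample_rows)
print(incomplete)
```

Calling `find_incomplete_rows(googlestore_df[0], googlestore_df_value)` would apply the same check to the loaded data.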

**Data cleaning**

- Unidentified symbols

In [9]:
string1 = 'asdf'
string2 = '1234'
string3 = '日本語'
string4 = '®'
string5 = "Lep's World 2 🍀🍀"
string6 = '-'
string7 = '–'
string8 = 'Röhrich Werner Soundboard'
string9 = '’'

print(string1.isascii())
print(string2.isascii())
print(string3.isascii())
print(string4.isascii())
print(string5.isascii())
print(string6.isascii())
print(string7.isascii())
print(string8.isascii())
print(string9.isascii())

True
True
False
False
False
True
False
False
False
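`str.isascii()` (available from Python 3.7) returns `True` only when every character has a code point below 128, so a single symbol is enough to make a whole name fail. A small helper such as the hypothetical `non_ascii_chars` below can show which characters are responsible; this is a sketch for inspection only.

```python
def non_ascii_chars(text):
    """Return the characters in `text` whose code point is above 127."""
    return [ch for ch in text if ord(ch) > 127]

# '\u2122' is the trademark sign that appears in some app names
offenders = non_ascii_chars('Docs To Go\u2122 Free Office Suite')
print(offenders)

# A plain ASCII name yields an empty list
print(non_ascii_chars('asdf'))
```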


In [10]:
temp = 0
for row in googlestore_df_value:
    if not row[0].isascii():
        temp += 1
temp

259

In [11]:
for row in googlestore_df_value:
    if not row[0].isascii():
        print(row[0])

U Launcher Lite – FREE Live Cool Themes, Hide Apps
CarMax – Cars for Sale: Search Used Car Inventory
AutoScout24 Switzerland – Find your new car
Zona Azul Digital Fácil SP CET - OFFICIAL São Paulo
Wattpad 📖 Free Books
ReadEra – free ebook reader
Docs To Go™ Free Office Suite
USPS MOBILE®
Invoice 2go — Professional Invoices and Estimates
Invoice 2go — Professional Invoices and Estimates
Docs To Go™ Free Office Suite
Röhrich Werner Soundboard
Manga Net – Best Online Manga Reader
Truyện Vui Tý Quậy
Comic Es - Shojo manga / love comics free of charge ♪ ♪
漫咖 Comics - Manga,Novel and Stories
Tapas – Comics, Novels, and Stories
【Ranobbe complete free】 Novelba - Free app that you can read and write novels
Messenger – Text and Video Chat for Free
Yahoo Mail – Stay Organized
Call Free – Free Call
Xperia Link™
Messenger – Text and Video Chat for Free
Dolphin Browser - Fast, Private & Adblock🐬
DU Browser—Browse fast & fun
Sync.ME – Caller ID & Block
Yahoo Mail – Stay Organized
myMail – Email for H

In [12]:
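# Replace common typographic symbols with ASCII equivalents.
# Note: because of the elif chain, only the first matching symbol
# is replaced in each name; any other symbols in that name remain.
for row in googlestore_df_value:
    if '–' in row[0]:
        row[0] = row[0].replace('–', '-')
    elif '—' in row[0]:
        row[0] = row[0].replace('—', '-')
    elif '™' in row[0]:
        row[0] = row[0].replace('™', '')
    elif '®' in row[0]:
        row[0] = row[0].replace('®', '')
    elif '’' in row[0]:
        row[0] = row[0].replace('’', "'")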
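Because the cell above uses an `elif` chain, a name that contains two different symbols only has the first one replaced. One way to apply every replacement in a single pass is `str.translate` with a table from `str.maketrans`. The sketch below is an alternative, not the method used in this notebook; `REPLACEMENTS`, `normalize_name`, and the sample name are hypothetical.

```python
# Map each symbol to its ASCII replacement; None deletes the character
REPLACEMENTS = str.maketrans({
    '\u2013': '-',   # en dash
    '\u2014': '-',   # em dash
    '\u2122': None,  # trademark sign
    '\u00ae': None,  # registered sign
    '\u2019': "'",   # right single quotation mark
})

def normalize_name(name):
    """Apply all symbol replacements to `name` in one pass."""
    return name.translate(REPLACEMENTS)

# Hypothetical name containing three different symbols
cleaned = normalize_name('Wendy\u2019s \u2013 Food and Offers\u2122')
print(cleaned)
```

Applying this to the data would likely leave fewer non-ASCII names than the count reported in the next cell, so the later outputs would change.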

In [13]:
temp = 0
for row in googlestore_df_value:
    if not row[0].isascii():
        temp += 1
temp

77

In [14]:
for row in googlestore_df_value:
    if not row[0].isascii():
        print(row[0])

Zona Azul Digital Fácil SP CET - OFFICIAL São Paulo
Wattpad 📖 Free Books
Röhrich Werner Soundboard
Truyện Vui Tý Quậy
Comic Es - Shojo manga / love comics free of charge ♪ ♪
漫咖 Comics - Manga,Novel and Stories
【Ranobbe complete free】 Novelba - Free app that you can read and write novels
Dolphin Browser - Fast, Private & Adblock🐬
FlirtChat - ♥Free Dating/Flirting App♥
FlirtChat - ♥Free Dating/Flirting App♥
Learn Spanish - Español
Flame - درب عقلك يوميا
Kanji test · Han search Kanji training (free version)
🔥 Football Wallpapers 4K | Full HD Backgrounds 😍
İşCep
Cookpad - FREE recipe search makes fun cooking · musical making!
Wendy’s - Food and Offers
Homes.com 🏠 For Sale, Rent
At home - rental · real estate · room finding application such as apartment · apartment
乐屋网: Buying a house, selling a house, renting a house
Best New Ringtones 2018 Free 🔥 For Android
Top Popular Ringtones 2018 Free 🔥
Super Funny Ringtones 2018 🔔
Cool Popular Ringtones 2018 🔥
သိင်္ Astrology - Min Thein Kha BayDin


The remaining names, which contain emojis or non-English characters, will be removed.

In [30]:
# Keep only the rows whose app name is pure ASCII. The list is rebuilt in a
# single pass, because popping items while iterating over the same list
# skips rows and would require running the cell several times.
googlestore_df_value[:] = [row for row in googlestore_df_value if row[0].isascii()]

In [31]:
for row in googlestore_df_value:
    if not row[0].isascii():
        print(row[0])

In [32]:
len(googlestore_df_value)

4923

- Duplicated values

In [35]:
dupli = []
unique = []

for row in googlestore_df_value:
    name = row[0]
    if name in unique:
        dupli.append(name)
    else:
        unique.append(name)
        
len(dupli)

943

In [36]:
for row in googlestore_df_value:
    name = row[0]
    if name == 'Rosetta Stone: Learn to Speak & Read New Languages':
        print(name)

Rosetta Stone: Learn to Speak & Read New Languages
Rosetta Stone: Learn to Speak & Read New Languages
Rosetta Stone: Learn to Speak & Read New Languages
Rosetta Stone: Learn to Speak & Read New Languages


In [20]:
lst1 = ['a', 'b', 'c']
lst2 = [['a', 'b', 'c']]

In [93]:
google_cp = googlestore_df_value.copy()

In [92]:
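dupli = []
unique = []

# Note: `unique` stores whole rows, so `row[0] in unique` compares a name
# against lists and is never True. Every row therefore lands in `unique`
# and `dupli` stays empty.
for row in google_cp:
    if row[0] in unique:
        dupli.append(row)
    else:
        unique.append(row)

len(dupli)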

0
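The cell above reports 0 duplicates because it tests an app name against a list of whole rows. Tracking the names already seen in a separate set avoids that mismatch and removes the need to pop from the list. The sketch below keeps the first occurrence of each name, which is an assumption; another rule, such as keeping the row with the most reviews, may suit the analysis better. `drop_duplicate_names` and the sample rows are hypothetical.

```python
def drop_duplicate_names(rows):
    """Return rows with only the first occurrence of each app name (column 0)."""
    seen = set()
    unique_rows = []
    for row in rows:
        name = row[0]
        if name not in seen:
            seen.add(name)
            unique_rows.append(row)
    return unique_rows

# Hypothetical sample: 'Telegram' appears three times
sample_rows = [
    ['Telegram', '4.4'],
    ['Box', '4.2'],
    ['Telegram', '4.4'],
    ['Telegram', '4.3'],
]

deduped = drop_duplicate_names(sample_rows)
print(len(deduped))
```

With this approach the number of rows kept equals the number of distinct names, which makes the result easy to verify against `len(unique)` from the name-based check.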

In [59]:
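# Caution: this loop pops from google_cp while iterating over it, so it may
# skip rows or remove more copies of a name than intended.
for row in google_cp:
    if row[0] in dupli:
        idx = [(i, lst.index(row[0])) for i, lst in enumerate(google_cp) if row[0] in lst]
        pop_index = idx[0][0]
        google_cp.pop(pop_index)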

In [72]:
dupli

[['Quick PDF Scanner + OCR FREE',
  'BUSINESS',
  '4.2',
  '80805',
  'Varies with device',
  '5,000,000+',
  'Free',
  '0',
  'Everyone',
  'Business',
  'February 26, 2018',
  'Varies with device',
  '4.0.3 and up'],
 ['Box',
  'BUSINESS',
  '4.2',
  '159872',
  'Varies with device',
  '10,000,000+',
  'Free',
  '0',
  'Everyone',
  'Business',
  'July 31, 2018',
  'Varies with device',
  'Varies with device'],
 ['Google My Business',
  'BUSINESS',
  '4.4',
  '70991',
  'Varies with device',
  '5,000,000+',
  'Free',
  '0',
  'Everyone',
  'Business',
  'July 24, 2018',
  '2.19.0.204537701',
  '4.4 and up'],
 ['ZOOM Cloud Meetings',
  'BUSINESS',
  '4.4',
  '31614',
  '37M',
  '10,000,000+',
  'Free',
  '0',
  'Everyone',
  'Business',
  'July 20, 2018',
  '4.1.28165.0716',
  '4.0 and up'],
 ['join.me - Simple Meetings',
  'BUSINESS',
  '4.0',
  '6989',
  'Varies with device',
  '1,000,000+',
  'Free',
  '0',
  'Everyone',
  'Business',
  'July 16, 2018',
  '4.3.0.508',
  '4.4 and up

In [86]:
for row in unique:
    if row[0] == 'Telegram':
        print(row[0])

Telegram
Telegram
Telegram


In [60]:
dupli = []
unique = []

for row in google_cp:
    name = row[0]
    if name in unique:
        dupli.append(name)
    else:
        unique.append(name)
        
len(dupli)

0

In [61]:
dupli

[]

In [94]:
for row in google_cp:
    name = row[0]
    if name == 'Telegram':
        print(name)

Telegram
Telegram
Telegram


In [28]:
len(google_cp)

3898

In [29]:
4923-943

3980

In [89]:
unique[0]

['Photo Editor & Candy Camera & Grid & ScrapBook',
 'ART_AND_DESIGN',
 '4.1',
 '159',
 '19M',
 '10,000+',
 'Free',
 '0',
 'Everyone',
 'Art & Design',
 'January 7, 2018',
 '1.0.0',
 '4.0.3 and up']