## Python Programming Project: Profitable App Data Profiling
---

### Description
- This project demonstrates the looping and data profiling skills gained from Dataquest.io training.
- The perspective taken is that of a company that develops Android and iOS apps for Google Play and the Apple App Store.
- GOAL: to understand the market for free apps, which is the business model of our fictitious company.

### Data
The data describes apps available for download, including their prices and user ratings. Other metrics are provided to help gauge how successful apps are on each platform.
- The [source](https://www.kaggle.com/ramamet4/app-store-apple-data-set-10k-apps/data) of the Apple app data is a Kaggle dataset.
- The [source](https://www.kaggle.com/lava18/google-play-store-apps) of the Android app data is another Kaggle dataset.

### Key 

| iOS Name | Android Name |Description |
| ----------- | ----------- |---------------------- |
| "id" | NA | App ID |
| "track_name" |"App" |  App Name |
| "prime_genre" | "Category" | Category the app belongs to |
| "size_bytes" |"Size" | Size (in Bytes) |
| "currency" | NA | Currency of price |
| "price" |"Price" | Purchase price of the app |
| "rating_count_tot" |"Reviews" | Total count of ratings |
| "user_rating" |"Rating" | Average of user ratings |
| "cont_rating" |"Content Rating" | Age group the app is targeted at |
| "rating_count_ver" | NA | Count of current version ratings |
| "user_rating_ver" | NA | Average user ratings for current version |
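| "sup_devices.num" | NA | Number of supported devices |
| "ipadSc_urls.num" | NA | Number of screenshots shown for display |
| "lang.num" | NA | Number of supported languages |
| "vpp_lic" | NA | Whether VPP device-based licensing is enabled |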
| NA | "Genres" | An app can belong to multiple genres (apart from its main category) |
| NA | "Installs" | Number of user downloads/installs for the app (as of when scraped) |
| NA | "Last Updated" | Date when the app was last updated on Play Store (as of when scraped) |
| NA | "Current Ver" | Current version of the app available on Play Store (as of when scraped) |
| NA | "Android Ver" | Minimum required Android version (as of when scraped) |
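
The key above can also be written down as a small lookup, which makes it easier to find the matching column in each data set. The following is only a sketch: the `IOS_TO_ANDROID` dictionary and the `column_index` helper are hypothetical names introduced for illustration, and the two header lists are copied from the header rows of the CSV files.

```python
# Hypothetical mapping from iOS column names to their Android counterparts,
# taken from the key table; columns marked NA on either side are left out.
IOS_TO_ANDROID = {
    'track_name': 'App',
    'prime_genre': 'Category',
    'size_bytes': 'Size',
    'price': 'Price',
    'rating_count_tot': 'Reviews',
    'user_rating': 'Rating',
    'cont_rating': 'Content Rating',
}


def column_index(header, name):
    """Return the position of a column name within a header row."""
    return header.index(name)


# Header rows as they appear in the two CSV files.
ios_header_example = ['', 'id', 'track_name', 'size_bytes', 'currency',
                      'price', 'rating_count_tot', 'rating_count_ver',
                      'user_rating', 'user_rating_ver', 'ver', 'cont_rating',
                      'prime_genre', 'sup_devices.num', 'ipadSc_urls.num',
                      'lang.num', 'vpp_lic']
android_header_example = ['App', 'Category', 'Rating', 'Reviews', 'Size',
                          'Installs', 'Type', 'Price', 'Content Rating',
                          'Genres', 'Last Updated', 'Current Ver',
                          'Android Ver']

# Look up where the price lives in each data set.
price_ios = column_index(ios_header_example, 'price')
price_android = column_index(android_header_example, IOS_TO_ANDROID['price'])
print(price_ios, price_android)  # 5 7
```

Looking columns up by name rather than hard-coding positions keeps later cells readable, at the cost of one extra `index` call per lookup.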

In [2]:
from csv import reader

In [11]:
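# Open both data sets and wrap each file in a csv reader.
apple_file = open('AppleStore.csv', encoding='utf8')
android_file = open('googleplaystore.csv', encoding='utf8')
read_apple_file = reader(apple_file)
read_android_file = reader(android_file)

# Separate the header row from the data rows.
ios = list(read_apple_file)
ios_header = ios[0]
ios = ios[1:]

android = list(read_android_file)
android_header = android[0]
android = android[1:]

# All rows are now in memory, so the files can be closed.
apple_file.close()
android_file.close()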
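
The loading step could also be wrapped in a small helper that uses a context manager, so each file is closed automatically once its rows have been read. This is a minimal sketch rather than the approach used in this notebook; `load_csv` is a hypothetical name.

```python
from csv import reader


def load_csv(path, encoding='utf8'):
    """Read a CSV file and return (header, rows) as lists of strings."""
    # newline='' is what the csv module recommends when opening files.
    with open(path, encoding=encoding, newline='') as f:
        rows = list(reader(f))
    # The file is already closed here; split the header from the data.
    return rows[0], rows[1:]
```

Under these assumptions, the cell above would reduce to `ios_header, ios = load_csv('AppleStore.csv')` and `android_header, android = load_csv('googleplaystore.csv')`.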

In [12]:
def explore_data(dataset, start, end, rows_and_columns=False):
    # Print the rows in dataset[start:end], optionally followed by the
    # row and column counts of the full data set.
    dataset_slice = dataset[start:end]
    for row in dataset_slice:
        print(row)
        print('\n') # leaves two blank lines after each row

    if rows_and_columns:
        print('Number of rows:', len(dataset))
        print('Number of columns:', len(dataset[0]))

In [17]:
print(ios_header)

['', 'id', 'track_name', 'size_bytes', 'currency', 'price', 'rating_count_tot', 'rating_count_ver', 'user_rating', 'user_rating_ver', 'ver', 'cont_rating', 'prime_genre', 'sup_devices.num', 'ipadSc_urls.num', 'lang.num', 'vpp_lic']


In [13]:
explore_data(ios,0,2, True)

['1', '281656475', 'PAC-MAN Premium', '100788224', 'USD', '3.99', '21292', '26', '4', '4.5', '6.3.5', '4+', 'Games', '38', '5', '10', '1']


['2', '281796108', 'Evernote - stay organized', '158578688', 'USD', '0', '161065', '26', '4', '3.5', '8.2.2', '4+', 'Productivity', '37', '5', '23', '1']


Number of rows: 7197
Number of columns: 17


In [16]:
print(android_header)

['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver', 'Android Ver']


In [14]:
explore_data(android,0,2, True)

['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


['Coloring book moana', 'ART_AND_DESIGN', '3.9', '967', '14M', '500,000+', 'Free', '0', 'Everyone', 'Art & Design;Pretend Play', 'January 15, 2018', '2.0.0', '4.0.3 and up']


Number of rows: 10841
Number of columns: 13
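
The `Installs` values in the rows above are strings such as `'10,000+'`, so they need to be converted before they can be compared or summed. The sketch below shows one way to do that; `parse_installs` is a hypothetical helper, and it assumes every value follows the digits, commas, and trailing `+` pattern seen in these sample rows. A value in any other format would raise a `ValueError`.

```python
def parse_installs(value):
    """Convert a string such as '10,000+' to the integer 10000."""
    # Drop the thousands separators and the trailing plus sign.
    return int(value.replace(',', '').replace('+', ''))


# Values in the format shown by the Android rows in this notebook.
examples = ['10,000+', '500,000+', '10,000,000+']
parsed = [parse_installs(v) for v in examples]
print(parsed)  # [10000, 500000, 10000000]
```

Because the trailing `+` means "at least this many", the parsed numbers are lower bounds rather than exact install counts.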


In [20]:
# From the discussion boards, we find out that there is a row that
# is missing its category.
print(android[10472])

['osmino Wi-Fi: free WiFi', 'TOOLS', '4.2', '134203', '4.1M', '10,000,000+', 'Free', '0', 'Everyone', 'Tools', 'August 7, 2018', '6.06.14', '4.4 and up']


In [19]:
# Run this deletion only once; running it again would remove a valid row.
# del android[10472]
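
Instead of relying on a known row index, a row with a missing field can often be found by comparing each row's length with the header's. This is a sketch under the assumption that a missing category leaves the row one field short; `find_malformed_rows` is a hypothetical helper, and the sample data is made up for illustration.

```python
def find_malformed_rows(header, rows):
    """Return (index, row) pairs whose field count differs from the header."""
    expected = len(header)
    return [(i, row) for i, row in enumerate(rows) if len(row) != expected]


# Made-up sample: the second row is missing its category, so it is one
# field short and the remaining values shift left.
sample_header = ['App', 'Category', 'Rating']
sample_rows = [
    ['App A', 'TOOLS', '4.2'],
    ['App B', '1.9'],
    ['App C', 'GAME', '3.9'],
]

bad = find_malformed_rows(sample_header, sample_rows)
print(bad)  # [(1, ['App B', '1.9'])]
```

Applied to the real data, the call would be `find_malformed_rows(android_header, android)`. Printing the result before deleting anything makes it possible to confirm the index first.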