# App Market Research

In this project, publicly available data about dog play-date apps on the Google Play Store is scraped and analyzed to assess the potential of such apps in the app market. Insights sought from the analysis include:
* What do people love most and least about such apps?
* How well do such apps generally tend to do financially?
* What regions of the world are such apps currently found in?

In [26]:
# Imports.
import datetime
import pycountry
from google_play_scraper import app
from utility_functions import get_now
from google_play_scraper import search
from utility_functions import inspect_function

%load_ext autoreload
%autoreload 2

The autoreload extension is already loaded. To reload it, use:
  %reload_ext autoreload


In [30]:
# 20 Countries based on dogs per capita.
# References: 
# 1. https://www.mappr.co/thematic-maps/world-pet-ownership/
# 2. https://www.petsecure.com.au/pet-care/a-guide-to-worldwide-pet-ownership/

country_names = [ # ISO standard names.
    'United States', 'Brazil', 'China', 'Russian Federation', 'Japan', 
    'Philippines', 'India', 'Argentina', 'United Kingdom', 'France', 
    'Poland', 'Spain', 'Romania', 'Australia', 'Hungary', 
    'Czechia', 'South Africa', 'Germany', 'Ethiopia',  'Canada'
]

languages = {'english': 'en'} # English apps only.

## 1. Data Acquisition

In [39]:
data = {} # Dictionary with app IDs as keys and, as values, a dictionary of all data about each app.

In [42]:
search_str = "dog play date" # Search string => kind of app we're interested in.
n_apps = 3 # No. of apps per country (value range = [1, 30]).

# For each of our shortlisted countries ...
country = country_names[0] # Currently 'United States'. Do manually for all shortlisted countries.
# Try to get information about some dog play-date apps.
country_code = pycountry.countries.get(name=country).alpha_2.lower()
language_code = languages['english']
app_ids = [a['appId'] for a in search(search_str, lang=language_code, country=country_code, n_hits=n_apps)]
# For each app ...
for app_id in app_ids:
    # Get details regarding the app.
    app_details = app(app_id, lang=language_code, country=country_code)
    # Add potentially useful new data fields.
    app_details['searchMoment'] = str(datetime.datetime.now())
    app_details['countryCode'] = country_code
    app_details['languageCode'] = language_code
    # Remove less useful data fields.
    del app_details['video']
    del app_details['videoImage']
    del app_details['descriptionHTML']
    # Add app to data dictionary.
    data[app_id] = app_details
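
The cell above handles one country at a time. A minimal sketch of how the same steps could be looped over several country codes follows. The function name `collect_app_details` and its parameters are hypothetical; `search_fn` and `details_fn` are meant to be `search` and `app` from `google_play_scraper`, called with the same keyword arguments as in the cell above. Using a UTC ISO timestamp for `searchMoment` is a choice made here, not something the notebook does.

```python
from datetime import datetime, timezone


def collect_app_details(country_codes, search_fn, details_fn,
                        query="dog play date", lang="en", n_apps=3):
    """Gather app details for every country code into one dictionary."""
    collected = {}
    for country_code in country_codes:
        # Look up candidate apps for this country.
        hits = search_fn(query, lang=lang, country=country_code, n_hits=n_apps)
        for hit in hits:
            app_id = hit["appId"]
            # Copy so the caller's object is not modified.
            details = dict(details_fn(app_id, lang=lang, country=country_code))
            details["searchMoment"] = datetime.now(timezone.utc).isoformat()
            details["countryCode"] = country_code
            details["languageCode"] = lang
            # Same key as the notebook: an app found in several countries
            # keeps only the details from the last country processed.
            collected[app_id] = details
    return collected
```

With the notebook's imports in place, this would presumably be called as `data = collect_app_details(['us', 'br'], search, app)`. Because the dictionary is keyed by app ID alone, an app listed in more than one country is stored only once.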

Each app details request returns data with the following fields.
* `title`: Brief title.
* `description`: Description in plain text.
* `descriptionHTML`: Description in HTML format.
* `summary`: Summary of what the app is about.
* `installs`: Number of installs as a display string.
* `minInstalls`: Lower bound on the number of installs.
* `realInstalls`: Exact number of installs.
* `score`: Average user rating out of 5.
* `ratings`: No. of ratings.
* `reviews`: No. of reviews.
* `histogram`: List of the number of 1-, 2-, 3-, 4- and 5-star ratings, in that order.
* `price`: Price of install.
* `free`: Whether or not this app is free to install.
* `currency`: Currency that the price is expressed in.
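* `sale`: Presumably whether or not this app is currently on sale (discounted).
* `saleTime`: Presumably the time at which the sale ends.
* `originalPrice`: Presumably the price before the sale discount.
* `saleText`: Presumably the promotional text shown for the sale.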
* `offersIAP`: Whether or not this app offers in-app purchases.
* `inAppProductPrice`: String indicating prices of purchasable items in the app.
* `developer`: Developer of the app.
* `developerId`: Identification string corresponding to app developer.
* `developerEmail`: Email corresponding to app developer.
* `developerWebsite`: Website corresponding to app developer.
* `developerAddress`: Address corresponding to app developer.
* `privacyPolicy`: Link to the privacy policy of this app.
* `genre`: A string trying to encompass the main category that this app may be put into.
* `genreId`: Identifier string trying to encompass the main category that this app may be put into.
* `categories`: List of {'name', 'id'} objects corresponding to categories that this app may be put into.
* `icon`: Link to app icon image.
* `headerImage`: Link to app header image that shows up as part of a search result.
* `screenshots`: List of links to screenshots of the app.
* `video`: A video associated with this app.
* `videoImage`: Link to a thumbnail image of above video.
* `contentRating`: Audience maturity rating, i.e. who the app's content is considered suitable for.
* `contentRatingDescription`: Additional descriptors qualifying the content rating, where provided.
* `adSupported`: Whether or not this app supports ads.
* `containsAds`: Whether or not this app contains ads.
* `released`: String corresponding to date of release of this app.
* `updated`: Timestamp of the most recent update to this app.
* `version`: Current version string.
* `comments`: List of some comments.
* `appId`: App identifier string.
* `url`: Link to this app on Google Play Store.
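
As a quick sanity check on how `histogram`, `ratings` and `score` relate, the average rating can be recomputed from the histogram. The sketch below assumes the histogram ordering described in the list above and assumes `score` is a plain mean of the star ratings, which may not be exactly how Google Play computes it. The function name and the sample values are made up for illustration.

```python
def mean_rating_from_histogram(histogram):
    """Return the average star rating implied by [n1, n2, n3, n4, n5] counts."""
    total = sum(histogram)
    if total == 0:
        return None  # No ratings yet, so no average can be computed.
    # Weight each count by its star value (1 through 5).
    weighted = sum(stars * count
                   for stars, count in enumerate(histogram, start=1))
    return weighted / total


# Illustrative values only, not taken from a real app.
sample_histogram = [10, 0, 0, 0, 30]
print(mean_rating_from_histogram(sample_histogram))  # 4.0
```

A large gap between this value and the reported `score` for an app would suggest that one of the assumptions above does not hold.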

The following potentially useful data fields were added to the details of each app.
* `searchMoment`: Date-time string marking the moment at which app details were fetched.
* `countryCode`: Lowercase ISO two-letter code of the country queried.
* `languageCode`: Code of the language used for the query.

The following less useful data fields were removed from the details of each app.
* `video`: Not looking to work with videos.
* `videoImage`: Not looking to work with videos.
* `descriptionHTML`: Redundant since `description` already holds the same text in plain form.
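
Removing fields with `del` raises a `KeyError` if a field is ever absent from a response. A minimal sketch of a more tolerant alternative is shown below; the names `FIELDS_TO_DROP` and `drop_fields` are hypothetical and not part of the notebook.

```python
FIELDS_TO_DROP = ("video", "videoImage", "descriptionHTML")


def drop_fields(details, fields=FIELDS_TO_DROP):
    """Return a copy of details without the given fields.

    Fields that are not present are silently ignored, and the
    input dictionary is left unchanged.
    """
    return {key: value for key, value in details.items()
            if key not in fields}


# Illustrative input only.
cleaned = drop_fields({"title": "Example", "video": None})
print(cleaned)  # {'title': 'Example'}
```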

In [43]:
len(data)

6

## 2. Data Storage
Storage shall be managed in the cloud using the following services offered by `AWS`.
1. `DynamoDB`: Transactional NoSQL database.
2. `Redshift`: Analytical database.
3. `AWS Data Pipeline`: For data transfer between DynamoDB and Redshift.
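
One practical detail when writing app details to DynamoDB from Python: the `boto3` DynamoDB resource interface does not accept `float` values and expects `decimal.Decimal` instead. Fields such as `score` and `price` would therefore need converting first. The helper below is a minimal sketch using only the standard library. The name `to_dynamodb_item` is hypothetical, and the table name and key mentioned in the comments are assumptions, since the notebook does not define a table schema.

```python
from decimal import Decimal


def to_dynamodb_item(value):
    """Recursively convert floats to Decimal, leaving other values as-is."""
    if isinstance(value, float):
        # Go through str() to avoid binary floating-point noise.
        return Decimal(str(value))
    if isinstance(value, dict):
        return {key: to_dynamodb_item(item) for key, item in value.items()}
    if isinstance(value, (list, tuple)):
        return [to_dynamodb_item(item) for item in value]
    return value


# Hypothetical usage, assuming a table named 'dog_playdate_apps'
# with 'appId' as its partition key:
#   table = boto3.resource('dynamodb').Table('dog_playdate_apps')
#   table.put_item(Item=to_dynamodb_item(app_details))
```

This sketch does not handle special float values such as NaN or infinity, which DynamoDB number attributes cannot store.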