# Web Scraping / Data Import Notebook
## ADS 509 Final Project
## Authors: Emina Belekanich, Roberto Cancel, Martin Zagari
## Date: 10/3/2022

## Reference

The code used in this notebook comes from:
Author: Max Steele    
Date: March 24, 2021    
Scraping App Store Reviews with Python  
Code Version: not specified (the code is published inline in the linked article)  
Type: Python code to scrape Apple App Store reviews    
Web Address: https://python.plainenglish.io/scraping-app-store-reviews-with-python-90e4117ccdfb  
Publisher: Medium.com  


In [None]:
# Required installations:
# pip install itunes-app-scraper-dmi
# pip install app-store-scraper
# pip install tzlocal

In [8]:
import pandas as pd

# for scraping app info from App Store
from itunes_app_scraper.scraper import AppStoreScraper

# for scraping app reviews from App Store
from app_store_scraper import AppStore

# for pretty printing data structures
from pprint import pprint

# for keeping track of timing
import datetime as dt
from tzlocal import get_localzone

# for building in wait times
import random
import time

In [14]:
# Define the apps to scrape: display name, App Store URL slug, and App Store ID
data = {'app_name': ['HBO Max', 'Netflix', 'Hulu', 'Disney+', 'YouTube'],
        'iOS_app_name': ['hbo-max-stream-tv-movies', 'netflix', 'hulu-stream-shows-movies', 'disney', 'youtube-watch-listen-stream'],
        'iOS_app_id': [971265422, 363590051, 376510438, 1446075923, 544007664]}

# Create a dataframe to house our app information
app_df = pd.DataFrame(data)

In [15]:
# Create list of app names and app IDs
app_names = list(app_df['iOS_app_name'])
app_ids = list(app_df['iOS_app_id'])

In [16]:
app_ids

[971265422, 363590051, 376510438, 1446075923, 544007664]

In [17]:
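## Set up App Store Scraper
## Note: no country is passed here, so the scraper uses its default store.
## The Dutch text and /nl/ URLs in the output below suggest that default is the Netherlands.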
scraper = AppStoreScraper()
app_store_list = list(scraper.get_multiple_app_details(app_ids))

## Pretty print the data for the first app
pprint(app_store_list[0])

{'advisories': 'Regelmatig/Intens grof taalgebruik of grove '
               'humor,Regelmatig/Intens animatiegeweld of fictief '
               'geweld,Soms/Milde seksuele content en '
               'naaktheid,Regelmatig/Intense horror-/angstthema’s,Soms/Milde '
               'medische/behandelingsinformatie,Soms/Mild realistisch '
               'geweld,Soms/Mild alcohol-, tabaks- of drugsgebruik of '
               'referenties ernaar,Soms/Milde simulatie van het '
               'gokken,Soms/Milde volwassen/suggestieve thema’s',
 'appletvScreenshotUrls': '',
 'artistId': 1514826633,
 'artistName': 'WarnerMedia',
 'artistViewUrl': 'https://apps.apple.com/nl/developer/warnermedia/id1514826633?uo=4',
 'artworkUrl100': 'https://is5-ssl.mzstatic.com/image/thumb/Purple112/v4/ef/1b/2c/ef1b2c6e-d54b-f385-5ddf-ca32f85ce972/AppIcon-Release-0-0-1x_U007emarketing-0-0-0-7-0-0-sRGB-0-0-0-GLES2_U002c0-512MB-85-220-0-0.png/100x100bb.jpg',
 'artworkUrl512': 'https://is5-ssl.mzstatic.com/image/thu

In [19]:
## Convert list of dicts to Pandas DataFrame and preview the first rows
app_info_df = pd.DataFrame(app_store_list)
app_info_df.head()

Unnamed: 0,isGameCenterEnabled,features,supportedDevices,advisories,appletvScreenshotUrls,artistViewUrl,screenshotUrls,artworkUrl100,artworkUrl60,ipadScreenshotUrls,...,sellerUrl,trackViewUrl,minimumOsVersion,contentAdvisoryRating,trackContentRating,languageCodesISO2A,trackCensoredName,version,wrapperType,userRatingCount
0,False,iosUniversal,"iPhone5s-iPhone5s,iPadAir-iPadAir,iPadAirCellu...",Regelmatig/Intens grof taalgebruik of grove hu...,,https://apps.apple.com/nl/developer/warnermedi...,https://is2-ssl.mzstatic.com/image/thumb/Purpl...,https://is5-ssl.mzstatic.com/image/thumb/Purpl...,https://is5-ssl.mzstatic.com/image/thumb/Purpl...,https://is2-ssl.mzstatic.com/image/thumb/Purpl...,...,https://www.hbomax.com/,https://apps.apple.com/nl/app/hbo-max-kijk-tv-...,12.2,12+,12+,"BG,HR,CS,DA,NL,EN,ET,FI,HU,LV,LT,MK,NB,PL,PT,R...",HBO Max: Kijk TV en films,52.45.0,software,1449
1,False,iosUniversal,"iPhone5s-iPhone5s,iPadAir-iPadAir,iPadAirCellu...","Soms/Milde horror-/angstthema’s,Soms/Mild real...",,https://apps.apple.com/nl/developer/netflix-in...,https://is3-ssl.mzstatic.com/image/thumb/Purpl...,https://is5-ssl.mzstatic.com/image/thumb/Purpl...,https://is5-ssl.mzstatic.com/image/thumb/Purpl...,https://is4-ssl.mzstatic.com/image/thumb/Purpl...,...,http://www.netflix.com,https://apps.apple.com/nl/app/netflix/id363590...,15.0,12+,12+,"AR,HR,CS,DA,NL,EN,FI,FR,DE,EL,HE,HI,HU,ID,IT,J...",Netflix,15.1.1,software,10625
2,False,iosUniversal,"iPhone5s-iPhone5s,iPadAir-iPadAir,iPadAirCellu...",,,https://apps.apple.com/nl/developer/disney/id2...,https://is4-ssl.mzstatic.com/image/thumb/Purpl...,https://is2-ssl.mzstatic.com/image/thumb/Purpl...,https://is2-ssl.mzstatic.com/image/thumb/Purpl...,https://is4-ssl.mzstatic.com/image/thumb/Purpl...,...,https://disneyplus.com,https://apps.apple.com/nl/app/disney/id1446075...,14.0,4+,4+,"CS,DA,NL,EN,FI,FR,DE,EL,HU,IT,JA,KO,NB,PL,PT,R...",Disney+,2.11.2,software,18360
3,False,iosUniversal,"iPhone5s-iPhone5s,iPadAir-iPadAir,iPadAirCellu...","Soms/Mild animatiegeweld of fictief geweld,Som...",,https://apps.apple.com/nl/developer/google-llc...,https://is2-ssl.mzstatic.com/image/thumb/Purpl...,https://is2-ssl.mzstatic.com/image/thumb/Purpl...,https://is2-ssl.mzstatic.com/image/thumb/Purpl...,https://is2-ssl.mzstatic.com/image/thumb/Purpl...,...,https://www.youtube.com,https://apps.apple.com/nl/app/youtube/id544007...,12.0,17+,17+,"AF,SQ,AM,AR,HY,AZ,EU,BE,BN,BS,BG,MY,KM,CA,HR,C...",YouTube,17.39.4,software,861240
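
The table above has four rows even though five app IDs were requested, so one lookup returned nothing. A minimal sketch of how such a gap could be detected follows. It assumes each detail record stores the numeric ID under a `trackId` key, which is not visible in the truncated output above. The helper name `missing_app_ids` and the sample records are hypothetical.

```python
def missing_app_ids(requested_ids, details, id_key="trackId"):
    """Return the requested IDs that have no matching detail record.

    `id_key` is an assumption about the record layout; adjust it if the
    scraper stores the app ID under a different key.
    """
    returned = {record.get(id_key) for record in details}
    return [app_id for app_id in requested_ids if app_id not in returned]


# Hypothetical stand-in for the scraper output: two of three lookups succeeded.
requested = [971265422, 363590051, 376510438]
sample_details = [
    {"trackId": 971265422},
    {"trackId": 363590051},
]

print(missing_app_ids(requested, sample_details))  # [376510438]
```

In the notebook this might be called as `missing_app_ids(app_ids, app_store_list)` right after the lookup, before building the DataFrame.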


In [None]:
## Set up loop to go through all apps
for app_name, app_id in zip(app_names, app_ids):
    
    # Get start time
    start = dt.datetime.now(tz=get_localzone())
    # Use documented strftime codes; %I is the 12-hour clock that pairs with %p
    fmt = "%m/%d/%y - %I:%M:%S %p"
    
    # Print starting output for app
    print('---'*20)
    print('---'*20)    
    print(f'***** {app_name} started at {start.strftime(fmt)}')
    print()
    
    # Instantiate AppStore for app
    app_ = AppStore(country='us', app_name=app_name, app_id=app_id)
    
    # Scrape reviews posted since January 1, 2022 and limit to 1,000 reviews
    app_.review(how_many=1000,
                after=dt.datetime(2022, 1, 1),
                sleep=random.randint(20,25))
    
    reviews = app_.reviews
    
    # Add keys to store information about which app each review is for
    for rvw in reviews:
        rvw['app_name'] = app_name
        rvw['app_id'] = app_id
    
    # Print update that scraping was completed
    print(f"""Done scraping {app_name}. 
    Scraped a total of {app_.reviews_count} reviews.\n""")
    
    # Convert list of dicts to Pandas DataFrame and write to csv
    review_df = pd.DataFrame(reviews)
    review_df.to_csv('C:/Users/rober/OneDrive - University of San Diego/Documents/GitHub/ADS509_Final_Project/Data/' + app_name + '.csv', index=False)
    
    # Get end time
    end = dt.datetime.now(tz=get_localzone())
    
    # Print ending output for app
    print(f"""Successfully wrote {app_name} reviews to csv
    at {end.strftime(fmt)}.\n""")
    print(f'Time elapsed for {app_name}: {end-start}')
    print('---'*20)
    print('---'*20)
    print('\n')
    
    # Wait 5 to 10 seconds before scraping the next app
    time.sleep(random.randint(5,10))

------------------------------------------------------------
------------------------------------------------------------
***** hbo-max-stream-tv-movies started at 10/03/22 - 11:28:32 AM



2022-10-03 11:28:33,022 [INFO] Base - Initialised: AppStore('us', 'hbo-max-stream-tv-movies', 971265422)
2022-10-03 11:28:33,022 [INFO] Base - Ready to fetch reviews from: https://apps.apple.com/us/app/hbo-max-stream-tv-movies/id971265422
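
The loop above writes each CSV to an absolute path that exists on one machine only. A sketch of a more portable alternative builds the path with `pathlib`, relative to a data directory the caller chooses. The helper name `reviews_csv_path` and the `Data` folder name are illustrative assumptions, not part of the original notebook.

```python
import tempfile
from pathlib import Path


def reviews_csv_path(data_dir, app_name):
    """Return `<data_dir>/<app_name>.csv`, creating the directory if needed."""
    data_dir = Path(data_dir)
    data_dir.mkdir(parents=True, exist_ok=True)
    return data_dir / f"{app_name}.csv"


# Demonstrate inside a temporary directory so nothing is left on disk.
with tempfile.TemporaryDirectory() as tmp:
    out_path = reviews_csv_path(Path(tmp) / "Data", "netflix")
    print(out_path.name)  # netflix.csv
```

Inside the loop, `review_df.to_csv(reviews_csv_path(data_dir, app_name), index=False)` could then replace the hard-coded string, with `data_dir` set once near the top of the notebook.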


In [23]:
review_df

Unnamed: 0,userName,developerResponse,date,rating,review,title,isEdited,app_name,app_id
0,SirHotPockets,"{'id': 27814282, 'body': 'Thanks so much for s...",2022-02-01 04:39:57,5,EDIT: Freedom to watch content in preferred as...,HBO Enforces Mandatory Letterboxing for “CoNsI...,False,hbo-max-stream-tv-movies,971265422
1,melerzz,"{'id': 27934564, 'body': 'Hi, thank you for ta...",2022-02-07 08:46:08,2,I love the amount of shows and movies HBO Max ...,Great movies terrible app,False,hbo-max-stream-tv-movies,971265422
2,DanShockley,"{'id': 28138401, 'body': 'Hi Dan! Thanks for l...",2022-02-19 21:25:09,2,"Got a decent size iPhone? Good thing, cause th...","Buggy, slow, poorly designed, screen-waster",True,hbo-max-stream-tv-movies,971265422
3,caterpillar27,"{'id': 29201710, 'body': 'Hi, thank you for ta...",2022-04-14 23:55:52,2,The content this streaming app has already and...,Great content/terrible app,False,hbo-max-stream-tv-movies,971265422
4,JD3825426,"{'id': 31437937, 'body': 'Hi, thank you for ta...",2022-08-12 00:14:51,1,I’m not sure any elaboration is necessary but ...,Thoroughly broken apps & utterly inept custome...,False,hbo-max-stream-tv-movies,971265422
...,...,...,...,...,...,...,...,...,...
996,rhpowers,"{'id': 29612646, 'body': 'We want to make sure...",2022-05-04 20:12:50,2,Downloads unpredictable. And when I've got a d...,Poor downloads and offline use. And many Crashes.,False,hbo-max-stream-tv-movies,971265422
997,Wickedmic,"{'id': 29348950, 'body': 'We want to make sure...",2022-04-22 11:52:55,2,This app is excellent when it works. Force clo...,Good when it works,False,hbo-max-stream-tv-movies,971265422
998,Maryam11,"{'id': 29385983, 'body': 'Hi there, we'd be mo...",2022-04-24 07:12:34,1,"I’m sorry, but HBO is the worst streaming site...",So glitchy,False,hbo-max-stream-tv-movies,971265422
999,nicolekaym,"{'id': 29294823, 'body': 'Hi, thank you for ta...",2022-04-17 22:26:49,1,"Great content, but Apple TV app keeps crashing...","Broken app, Keeps crashing",False,hbo-max-stream-tv-movies,971265422
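
The `developerResponse` column holds a dict per review, with at least `id` and `body` keys as the output above shows. Once the data is written to CSV and read back, those dicts become strings. A minimal sketch of pulling out the reply text in either case, using only the standard library, is given here. The helper name `response_body` is hypothetical, and rows without a developer reply are assumed to hold a missing value.

```python
import ast


def response_body(value):
    """Return the 'body' text of a developer response, or None if unavailable."""
    if isinstance(value, dict):
        return value.get("body")
    if isinstance(value, str):
        # After a CSV round trip the dict arrives as its string representation.
        try:
            parsed = ast.literal_eval(value)
        except (ValueError, SyntaxError):
            return None
        return parsed.get("body") if isinstance(parsed, dict) else None
    # Missing values (for example NaN) and any other type have no reply text.
    return None


# Hypothetical sample values covering the three cases.
samples = [
    {"id": 1, "body": "Thanks!"},
    "{'id': 2, 'body': 'Hi there'}",
    float("nan"),
]
print([response_body(v) for v in samples])  # ['Thanks!', 'Hi there', None]
```

With pandas, this could be applied as `review_df['developerResponse'].apply(response_body)` to create a plain-text column.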
