# Web Scraping / Data Import Notebook
## ADS 509 Final Project
## Authors: Emina Belekanich, Roberto Cancel, Martin Zagari
## Date: 10/3/2022

## Reference:

## The code used in this notebook comes from:
Author: Max Steele    
Date: March 24, 2021    
Title: Scraping App Store Reviews with Python  
Code Version: not specified (the code was taken from the article linked below)  
Type: Python code to scrape Apple App Store reviews    
Web Address: https://python.plainenglish.io/scraping-app-store-reviews-with-python-90e4117ccdfb  
Publisher: Medium.com  


In [None]:
# Required installations:
# pip install itunes-app-scraper-dmi
# pip install app-store-scraper
# pip install tzlocal

In [8]:
import pandas as pd

# for scraping app info from App Store
from itunes_app_scraper.scraper import AppStoreScraper

# for scraping app reviews from App Store
from app_store_scraper import AppStore

# for pretty printing data structures
from pprint import pprint

# for keeping track of timing
import datetime as dt
from tzlocal import get_localzone

# for building in wait times
import random
import time

In [14]:
# Define the apps to scrape: display name, App Store URL slug, and App Store ID
data = {'app_name': ['HBO Max', 'Netflix', 'Hulu', 'Disney+', 'YouTube'],
        'iOS_app_name': ['hbo-max-stream-tv-movies', 'netflix', 'hulu-stream-shows-movies', 'disney', 'youtube-watch-listen-stream'],
        'iOS_app_id': [971265422, 363590051, 376510438, 1446075923, 544007664]}

# Create a dataframe to house our app information
app_df = pd.DataFrame(data)

In [15]:
# Create list of app names and app IDs
app_names = list(app_df['iOS_app_name'])
app_ids = list(app_df['iOS_app_id'])

In [16]:
app_ids

[971265422, 363590051, 376510438, 1446075923, 544007664]
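
The slug in `iOS_app_name` and the number in `iOS_app_id` are the two app-specific parts of an App Store page URL. As a minimal sketch of that mapping (the helper name `app_store_url` is hypothetical and not part of either scraping library), the URL can be rebuilt and opened in a browser to confirm each pair before starting a long scrape:

```python
def app_store_url(country: str, app_name: str, app_id: int) -> str:
    """Build the public App Store page URL for one app (hypothetical helper)."""
    return f"https://apps.apple.com/{country}/app/{app_name}/id{app_id}"


# Two slug/ID pairs copied from the app table defined earlier in this notebook
apps = {
    "hbo-max-stream-tv-movies": 971265422,
    "netflix": 363590051,
}

urls = [app_store_url("us", name, app_id) for name, app_id in apps.items()]
for url in urls:
    print(url)
```

A wrong slug or ID is much cheaper to catch this way than partway through a scrape that takes hours.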

In [17]:
## Set up App Store Scraper
scraper = AppStoreScraper()
app_store_list = list(scraper.get_multiple_app_details(app_ids))

# Inspect the details returned for the first app.
# Note: the output below is in Dutch, presumably because the 'nl' storefront was queried.
pprint(app_store_list[0])

## Reference only (not executed): usage pattern of the AppStore class
# my_app = AppStore(
#   country='us',        # required, 2-letter code
#   app_name='aloe-bud', # required, found in app's url
#   app_id=1318382054    # technically not required, found in app's url
# )
#
## Use review method to scrape reviews from App Store
# my_app.review(
#   how_many=100, # if not provided, defaults to scraping all
#   after=...,    # optional, datetime object, filter old reviews
#   sleep=...     # optional, int, seconds between each call
# )

{'advisories': 'Regelmatig/Intens grof taalgebruik of grove '
               'humor,Regelmatig/Intens animatiegeweld of fictief '
               'geweld,Soms/Milde seksuele content en '
               'naaktheid,Regelmatig/Intense horror-/angstthema’s,Soms/Milde '
               'medische/behandelingsinformatie,Soms/Mild realistisch '
               'geweld,Soms/Mild alcohol-, tabaks- of drugsgebruik of '
               'referenties ernaar,Soms/Milde simulatie van het '
               'gokken,Soms/Milde volwassen/suggestieve thema’s',
 'appletvScreenshotUrls': '',
 'artistId': 1514826633,
 'artistName': 'WarnerMedia',
 'artistViewUrl': 'https://apps.apple.com/nl/developer/warnermedia/id1514826633?uo=4',
 'artworkUrl100': 'https://is5-ssl.mzstatic.com/image/thumb/Purple112/v4/ef/1b/2c/ef1b2c6e-d54b-f385-5ddf-ca32f85ce972/AppIcon-Release-0-0-1x_U007emarketing-0-0-0-7-0-0-sRGB-0-0-0-GLES2_U002c0-512MB-85-220-0-0.png/100x100bb.jpg',
 'artworkUrl512': 'https://is5-ssl.mzstatic.com/image/thu

In [19]:
## Convert list of dicts to Pandas DataFrame and preview the first rows
app_info_df = pd.DataFrame(app_store_list)
app_info_df.head()

Unnamed: 0,isGameCenterEnabled,features,supportedDevices,advisories,appletvScreenshotUrls,artistViewUrl,screenshotUrls,artworkUrl100,artworkUrl60,ipadScreenshotUrls,...,sellerUrl,trackViewUrl,minimumOsVersion,contentAdvisoryRating,trackContentRating,languageCodesISO2A,trackCensoredName,version,wrapperType,userRatingCount
0,False,iosUniversal,"iPhone5s-iPhone5s,iPadAir-iPadAir,iPadAirCellu...",Regelmatig/Intens grof taalgebruik of grove hu...,,https://apps.apple.com/nl/developer/warnermedi...,https://is2-ssl.mzstatic.com/image/thumb/Purpl...,https://is5-ssl.mzstatic.com/image/thumb/Purpl...,https://is5-ssl.mzstatic.com/image/thumb/Purpl...,https://is2-ssl.mzstatic.com/image/thumb/Purpl...,...,https://www.hbomax.com/,https://apps.apple.com/nl/app/hbo-max-kijk-tv-...,12.2,12+,12+,"BG,HR,CS,DA,NL,EN,ET,FI,HU,LV,LT,MK,NB,PL,PT,R...",HBO Max: Kijk TV en films,52.45.0,software,1449
1,False,iosUniversal,"iPhone5s-iPhone5s,iPadAir-iPadAir,iPadAirCellu...","Soms/Milde horror-/angstthema’s,Soms/Mild real...",,https://apps.apple.com/nl/developer/netflix-in...,https://is3-ssl.mzstatic.com/image/thumb/Purpl...,https://is5-ssl.mzstatic.com/image/thumb/Purpl...,https://is5-ssl.mzstatic.com/image/thumb/Purpl...,https://is4-ssl.mzstatic.com/image/thumb/Purpl...,...,http://www.netflix.com,https://apps.apple.com/nl/app/netflix/id363590...,15.0,12+,12+,"AR,HR,CS,DA,NL,EN,FI,FR,DE,EL,HE,HI,HU,ID,IT,J...",Netflix,15.1.1,software,10625
2,False,iosUniversal,"iPhone5s-iPhone5s,iPadAir-iPadAir,iPadAirCellu...",,,https://apps.apple.com/nl/developer/disney/id2...,https://is4-ssl.mzstatic.com/image/thumb/Purpl...,https://is2-ssl.mzstatic.com/image/thumb/Purpl...,https://is2-ssl.mzstatic.com/image/thumb/Purpl...,https://is4-ssl.mzstatic.com/image/thumb/Purpl...,...,https://disneyplus.com,https://apps.apple.com/nl/app/disney/id1446075...,14.0,4+,4+,"CS,DA,NL,EN,FI,FR,DE,EL,HU,IT,JA,KO,NB,PL,PT,R...",Disney+,2.11.2,software,18360
3,False,iosUniversal,"iPhone5s-iPhone5s,iPadAir-iPadAir,iPadAirCellu...","Soms/Mild animatiegeweld of fictief geweld,Som...",,https://apps.apple.com/nl/developer/google-llc...,https://is2-ssl.mzstatic.com/image/thumb/Purpl...,https://is2-ssl.mzstatic.com/image/thumb/Purpl...,https://is2-ssl.mzstatic.com/image/thumb/Purpl...,https://is2-ssl.mzstatic.com/image/thumb/Purpl...,...,https://www.youtube.com,https://apps.apple.com/nl/app/youtube/id544007...,12.0,17+,17+,"AF,SQ,AM,AR,HY,AZ,EU,BE,BN,BS,BG,MY,KM,CA,HR,C...",YouTube,17.39.4,software,861240


In [27]:
## Set up loop to go through all apps
for app_name, app_id in zip(app_names, app_ids):
    
    # Get start time
    start = dt.datetime.now(tz=get_localzone())
    fmt= "%m/%d/%y - %T %p"
    
    # Print starting output for app
    print('---'*20)
    print('---'*20)    
    print(f'***** {app_name} started at {start.strftime(fmt)}')
    print()
    
    # Instantiate AppStore for app
    app_ = AppStore(country='us', app_name=app_name, app_id=app_id)
    
    # Scrape reviews posted since January 1, 2022 and limit to 1,000 reviews
    app_.review(how_many=1000,
                after=dt.datetime(2022, 1, 1),
                sleep=random.randint(20,25))
    
    reviews = app_.reviews
    
    # Add keys to store information about which app each review is for
    for rvw in reviews:
        rvw['app_name'] = app_name
        rvw['app_id'] = app_id
    
    # Print update that scraping was completed
    print(f"""Done scraping {app_name}. 
    Scraped a total of {app_.reviews_count} reviews.\n""")
    
    # Convert list of dicts to Pandas DataFrame and write to csv
    review_df = pd.DataFrame(reviews)
    review_df.to_csv('C:/Users/rober/OneDrive - University of San Diego/Documents/GitHub/ADS509_Final_Project/Data/' + app_name + '.csv', index=False)
    
    # Get end time
    end = dt.datetime.now(tz=get_localzone())
    
    # Print ending output for app
    print(f"""Successfully wrote {app_name} reviews to csv
    at {end.strftime(fmt)}.\n""")
    print(f'Time elapsed for {app_name}: {end-start}')
    print('---'*20)
    print('---'*20)
    print('\n')
    
    # Wait 5 to 10 seconds before scraping the next app
    time.sleep(random.randint(5,10))

------------------------------------------------------------
------------------------------------------------------------
***** hbo-max-stream-tv-movies started at 10/03/22 - 12:50:16 PM



2022-10-03 12:50:16,691 [INFO] Base - Initialised: AppStore('us', 'hbo-max-stream-tv-movies', 971265422)
2022-10-03 12:50:16,691 [INFO] Base - Ready to fetch reviews from: https://apps.apple.com/us/app/hbo-max-stream-tv-movies/id971265422
2022-10-03 12:50:41,898 [INFO] Base - [id:971265422] Fetched 8 reviews (8 fetched in total)
2022-10-03 12:51:32,347 [INFO] Base - [id:971265422] Fetched 22 reviews (22 fetched in total)
2022-10-03 12:52:22,801 [INFO] Base - [id:971265422] Fetched 33 reviews (33 fetched in total)
2022-10-03 12:53:13,257 [INFO] Base - [id:971265422] Fetched 44 reviews (44 fetched in total)
2022-10-03 12:54:04,083 [INFO] Base - [id:971265422] Fetched 48 reviews (48 fetched in total)
2022-10-03 12:54:54,548 [INFO] Base - [id:971265422] Fetched 61 reviews (61 fetched in total)
2022-10-03 12:55:45,033 [INFO] Base - [id:971265422] Fetched 67 reviews (67 fetched in total)
2022-10-03 12:56:35,596 [INFO] Base - [id:971265422] Fetched 79 reviews (79 fetched in total)
2022-10-03 

Done scraping hbo-max-stream-tv-movies. 
    Scraped a total of 1001 reviews.

Successfully wrote hbo-max-stream-tv-movies reviews to csv
    at 10/03/22 - 14:11:35 PM.

Time elapsed for hbo-max-stream-tv-movies: 1:21:19.890091
------------------------------------------------------------
------------------------------------------------------------


------------------------------------------------------------
------------------------------------------------------------
***** netflix started at 10/03/22 - 14:11:40 PM



2022-10-03 14:11:41,787 [INFO] Base - Initialised: AppStore('us', 'netflix', 363590051)
2022-10-03 14:11:41,787 [INFO] Base - Ready to fetch reviews from: https://apps.apple.com/us/app/netflix/id363590051
2022-10-03 14:12:07,054 [INFO] Base - [id:363590051] Fetched 5 reviews (5 fetched in total)
2022-10-03 14:12:57,667 [INFO] Base - [id:363590051] Fetched 9 reviews (9 fetched in total)
2022-10-03 14:13:48,241 [INFO] Base - [id:363590051] Fetched 16 reviews (16 fetched in total)
2022-10-03 14:14:38,922 [INFO] Base - [id:363590051] Fetched 24 reviews (24 fetched in total)
2022-10-03 14:15:29,552 [INFO] Base - [id:363590051] Fetched 31 reviews (31 fetched in total)
2022-10-03 14:16:20,392 [INFO] Base - [id:363590051] Fetched 39 reviews (39 fetched in total)
2022-10-03 14:17:11,081 [INFO] Base - [id:363590051] Fetched 47 reviews (47 fetched in total)
2022-10-03 14:18:01,805 [INFO] Base - [id:363590051] Fetched 52 reviews (52 fetched in total)
2022-10-03 14:18:52,471 [INFO] Base - [id:36359

Done scraping netflix. 
    Scraped a total of 174 reviews.

Successfully wrote netflix reviews to csv
    at 10/03/22 - 14:36:12 PM.

Time elapsed for netflix: 0:24:31.593227
------------------------------------------------------------
------------------------------------------------------------


------------------------------------------------------------
------------------------------------------------------------
***** hulu-stream-shows-movies started at 10/03/22 - 14:36:20 PM



2022-10-03 14:36:21,457 [INFO] Base - Initialised: AppStore('us', 'hulu-stream-shows-movies', 376510438)
2022-10-03 14:36:21,460 [INFO] Base - Ready to fetch reviews from: https://apps.apple.com/us/app/hulu-stream-shows-movies/id376510438
2022-10-03 14:36:45,652 [INFO] Base - [id:376510438] Fetched 6 reviews (6 fetched in total)
2022-10-03 14:37:34,093 [INFO] Base - [id:376510438] Fetched 7 reviews (7 fetched in total)
2022-10-03 14:38:22,563 [INFO] Base - [id:376510438] Fetched 9 reviews (9 fetched in total)
2022-10-03 14:39:11,019 [INFO] Base - [id:376510438] Fetched 14 reviews (14 fetched in total)
2022-10-03 14:39:59,682 [INFO] Base - [id:376510438] Fetched 15 reviews (15 fetched in total)
2022-10-03 14:40:48,180 [INFO] Base - [id:376510438] Fetched 19 reviews (19 fetched in total)
2022-10-03 14:41:36,628 [INFO] Base - [id:376510438] Fetched 24 reviews (24 fetched in total)
2022-10-03 14:42:25,122 [INFO] Base - [id:376510438] Fetched 26 reviews (26 fetched in total)
2022-10-03 14:4

Done scraping hulu-stream-shows-movies. 
    Scraped a total of 1000 reviews.

Successfully wrote hulu-stream-shows-movies reviews to csv
    at 10/03/22 - 18:01:30 PM.

Time elapsed for hulu-stream-shows-movies: 3:25:09.571416
------------------------------------------------------------
------------------------------------------------------------


------------------------------------------------------------
------------------------------------------------------------
***** disney started at 10/03/22 - 18:01:36 PM



2022-10-03 18:01:37,284 [INFO] Base - Initialised: AppStore('us', 'disney', 1446075923)
2022-10-03 18:01:37,285 [INFO] Base - Ready to fetch reviews from: https://apps.apple.com/us/app/disney/id1446075923
2022-10-03 18:02:02,515 [INFO] Base - [id:1446075923] Fetched 3 reviews (3 fetched in total)
2022-10-03 18:02:53,048 [INFO] Base - [id:1446075923] Fetched 7 reviews (7 fetched in total)
2022-10-03 18:03:43,535 [INFO] Base - [id:1446075923] Fetched 11 reviews (11 fetched in total)
2022-10-03 18:04:34,057 [INFO] Base - [id:1446075923] Fetched 11 reviews (11 fetched in total)
2022-10-03 18:05:24,545 [INFO] Base - [id:1446075923] Fetched 14 reviews (14 fetched in total)
2022-10-03 18:06:14,975 [INFO] Base - [id:1446075923] Fetched 19 reviews (19 fetched in total)
2022-10-03 18:07:05,670 [INFO] Base - [id:1446075923] Fetched 23 reviews (23 fetched in total)
2022-10-03 18:07:56,127 [INFO] Base - [id:1446075923] Fetched 26 reviews (26 fetched in total)
2022-10-03 18:08:46,598 [INFO] Base - [

Done scraping disney. 
    Scraped a total of 1008 reviews.

Successfully wrote disney reviews to csv
    at 10/03/22 - 21:23:23 PM.

Time elapsed for disney: 3:21:47.879555
------------------------------------------------------------
------------------------------------------------------------


------------------------------------------------------------
------------------------------------------------------------
***** youtube-watch-listen-stream started at 10/03/22 - 21:23:32 PM



2022-10-03 21:23:34,008 [INFO] Base - Initialised: AppStore('us', 'youtube-watch-listen-stream', 544007664)
2022-10-03 21:23:34,010 [INFO] Base - Ready to fetch reviews from: https://apps.apple.com/us/app/youtube-watch-listen-stream/id544007664
2022-10-03 21:23:58,266 [INFO] Base - [id:544007664] Fetched 11 reviews (11 fetched in total)
2022-10-03 21:24:46,958 [INFO] Base - [id:544007664] Fetched 24 reviews (24 fetched in total)
2022-10-03 21:25:35,597 [INFO] Base - [id:544007664] Fetched 33 reviews (33 fetched in total)
2022-10-03 21:26:24,509 [INFO] Base - [id:544007664] Fetched 34 reviews (34 fetched in total)
2022-10-03 21:27:13,435 [INFO] Base - [id:544007664] Fetched 41 reviews (41 fetched in total)
2022-10-03 21:28:02,228 [INFO] Base - [id:544007664] Fetched 48 reviews (48 fetched in total)
2022-10-03 21:28:51,091 [INFO] Base - [id:544007664] Fetched 55 reviews (55 fetched in total)
2022-10-03 21:29:39,908 [INFO] Base - [id:544007664] Fetched 59 reviews (59 fetched in total)
202

Done scraping youtube-watch-listen-stream. 
    Scraped a total of 1019 reviews.

Successfully wrote youtube-watch-listen-stream reviews to csv
    at 10/03/22 - 23:46:33 PM.

Time elapsed for youtube-watch-listen-stream: 2:23:00.880807
------------------------------------------------------------
------------------------------------------------------------
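
The totals reported above (1001, 1008 and 1019) slightly overshoot the requested 1,000. This suggests that `how_many` works as a stopping threshold checked between batches rather than as a hard cap; that is an inference from the log, not documented behavior. If exactly 1,000 reviews per app are wanted, the list can be trimmed before it is written to CSV. The sketch below assumes each review dict has a `date` key holding a `datetime`; the helper name `keep_most_recent` and the sample data are made up for illustration.

```python
import datetime as dt


def keep_most_recent(reviews, limit):
    """Return at most `limit` reviews, newest first, without mutating the input."""
    ordered = sorted(reviews, key=lambda r: r["date"], reverse=True)
    return ordered[:limit]


# Tiny made-up sample standing in for `app_.reviews`
sample = [
    {"date": dt.datetime(2022, 3, 1), "rating": 5},
    {"date": dt.datetime(2022, 9, 15), "rating": 2},
    {"date": dt.datetime(2022, 6, 30), "rating": 4},
]

trimmed = keep_most_recent(sample, limit=2)
print([r["date"].date().isoformat() for r in trimmed])
```

In the loop, this would be called as `keep_most_recent(app_.reviews, 1000)` just before the DataFrame is built.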




# IMPORTANT NOTE:  

The Netflix scrape returned only 174 reviews, well short of the 1,000 targeted. The resulting Netflix file was therefore deleted, and a separate Netflix scrape notebook was created to collect the 1,000 reviews needed for this project.  
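
A shortfall like this is easy to miss in a long log. The sketch below counts the rows in each saved CSV and reports any file below the target. It uses only the standard library, and the function names and the `Data` folder in the usage note are illustrative assumptions rather than part of the project code.

```python
import csv
from pathlib import Path


def count_reviews(csv_path):
    """Count data rows (excluding the header) in one review CSV."""
    # newline="" lets the csv module handle line breaks inside quoted review text
    with open(csv_path, newline="", encoding="utf-8") as f:
        reader = csv.reader(f)
        next(reader, None)  # skip the header row, if present
        return sum(1 for _ in reader)


def find_short_scrapes(data_dir, target=1000):
    """Return {file stem: row count} for every CSV with fewer than `target` rows."""
    counts = {p.stem: count_reviews(p) for p in sorted(Path(data_dir).glob("*.csv"))}
    return {name: n for name, n in counts.items() if n < target}
```

Calling `find_short_scrapes("Data")` after the loop finishes would be expected to return something like `{'netflix': 174}` for the run recorded above, assuming the other files reached the target.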
