#### App Store Reviews Scraper

---

This notebook scrapes the Apple App Store for the reviews of a particular app. In this case, we will use the scraper to get the reviews for GCash. 

Reference: https://python.plainenglish.io/scraping-app-store-reviews-with-python-90e4117ccdfb

In [1]:
import pandas as pd
import datetime as dt
import random
import time

from app_store_scraper import AppStore
from tzlocal import get_localzone

#### Scraping Reviews

This section focuses on how to use the AppStore library and discusses its various parameters. 

---

First, we look up the name and ID of the application of interest. Both are available from the URL of the app's App Store page.

For example, here's the URL for the App Store page of GCash: https://apps.apple.com/ph/app/gcash/id520020791. The application name is the value between "app/" and "/id", while the application ID is the number after "/id".
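The same lookup can be done in code. The snippet below is a minimal sketch, assuming the URL follows the pattern shown above; the helper name `parse_app_store_url` is hypothetical and is not part of the _app_store_scraper_ library.

```python
import re


def parse_app_store_url(url):
    """Return (country, app_name, app_id) parsed from an App Store page URL.

    Hypothetical helper: assumes URLs shaped like
    https://apps.apple.com/<country>/app/<app_name>/id<app_id>
    """
    match = re.search(r"apps\.apple\.com/([a-z]{2})/app/([^/]+)/id(\d+)", url)
    if match is None:
        raise ValueError(f"Not a recognised App Store URL: {url}")
    country, name, number = match.groups()
    # The ID is converted to an integer, matching how it is typed by hand below
    return country, name, int(number)


print(parse_app_store_url("https://apps.apple.com/ph/app/gcash/id520020791"))
# ('ph', 'gcash', 520020791)
```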



In [8]:
app_name = "gcash"
app_id = 520020791

In [2]:
# Alternative target (PayMaya); running this cell after the one above replaces the GCash values
app_name = "paymaya"
app_id = 991673877

We first instantiate _AppStore_ so that we can call its _review_ method. We pass three parameters: _country_, _app_name_, and _app_id_. For _app_name_ and _app_id_, we use the values obtained previously, and we set _country_ to match the App Store region of the page. In this case, we set _country_ to "ph" (Philippines).

In [9]:
app_store = AppStore(country='ph', 
                     app_name=app_name, 
                     app_id=app_id)

2021-06-13 14:05:53,113 [INFO] Base - Initialised: AppStore('ph', 'gcash', 520020791)
2021-06-13 14:05:53,114 [INFO] Base - Ready to fetch reviews from: https://apps.apple.com/ph/app/gcash/id520020791


Using the _review_ method, we scrape the reviews of the specified app. We may limit the number of reviews scraped by setting a value for the _how_many_ parameter. 

If _how_many_ is not set, the method scrapes all available reviews. We can also filter reviews by date by passing a datetime object to the _after_ parameter, in which case the scraper only keeps reviews posted after that date. Lastly, _sleep_ sets the number of seconds to wait between successive requests.

Since we want to get all of the reviews for GCash, we do not provide a value for the _how_many_ and _after_ parameters.

In [10]:
# The delay is drawn once (3 to 5 seconds) and reused between all requests
app_store.review(sleep=random.randint(3, 5))

reviews = app_store.reviews
print(f"Number of reviews scraped: {app_store.reviews_count}")

2021-06-13 14:06:00,918 [INFO] Base - [id:520020791] Fetched 20 reviews (20 fetched in total)
2021-06-13 14:06:11,571 [INFO] Base - [id:520020791] Fetched 60 reviews (60 fetched in total)
2021-06-13 14:06:22,073 [INFO] Base - [id:520020791] Fetched 100 reviews (100 fetched in total)
2021-06-13 14:06:32,530 [INFO] Base - [id:520020791] Fetched 140 reviews (140 fetched in total)
2021-06-13 14:06:43,133 [INFO] Base - [id:520020791] Fetched 180 reviews (180 fetched in total)
2021-06-13 14:06:54,193 [INFO] Base - [id:520020791] Fetched 220 reviews (220 fetched in total)
2021-06-13 14:07:05,001 [INFO] Base - [id:520020791] Fetched 260 reviews (260 fetched in total)
2021-06-13 14:07:15,683 [INFO] Base - [id:520020791] Fetched 300 reviews (300 fetched in total)
2021-06-13 14:07:26,399 [INFO] Base - [id:520020791] Fetched 340 reviews (340 fetched in total)
2021-06-13 14:07:37,286 [INFO] Base - [id:520020791] Fetched 380 reviews (380 fetched in total)
2021-06-13 14:07:48,079 [INFO] Base - [id:52

Number of reviews scraped: 4867


The scraped reviews are stored in _app_store.reviews_ as a list of dictionaries with the following keys: title, date, rating, userName, isEdited, and review. A sample review is shown below.

In [11]:
app_store.reviews[0]

{'title': 'It is starting to become pretty annoying',
 'date': datetime.datetime(2018, 12, 29, 15, 24, 59),
 'rating': 3,
 'userName': 'Fhayerie0406',
 'isEdited': False,
 'review': 'Almost everything is smooth except for three things that i had experienced; the first one was i purchase a load amounting to 100 i received a text message that it was an error but gcash deducted the 100 on my balance and it never returned next i was trying to pay my electric bill through pelco2 and i keep on getting an error of invalid account number wherein last month my transaction went smoothly with the same company and lastly i was trying to cashin my remittance through moneygram and the only options are to provide the amount and the control number i did that and was getting a pop up stating that i need to provide the sender name and receiver name but there is no option for me to type that hence, i tried it again same error and on the fourth attempt i receive a pop up error stating that the control num
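Because each review is a plain dictionary with a datetime under the date key, the list can also be filtered after scraping. The snippet below is a minimal sketch of such a local filter, similar in spirit to the _after_ parameter; the helper `reviews_after` is hypothetical, the second and third sample reviews are made up for illustration, and whether the library itself treats the cutoff as inclusive is not checked here.

```python
import datetime as dt
from collections import Counter


def reviews_after(reviews, cutoff):
    """Keep only reviews whose 'date' is strictly later than cutoff."""
    return [review for review in reviews if review["date"] > cutoff]


# Trimmed-down sample in the same shape as app_store.reviews.
# Only the first entry comes from the output above; the rest are invented.
sample_reviews = [
    {"title": "It is starting to become pretty annoying",
     "date": dt.datetime(2018, 12, 29, 15, 24, 59), "rating": 3},
    {"title": "Made-up review A", "date": dt.datetime(2020, 5, 1, 9, 0, 0), "rating": 5},
    {"title": "Made-up review B", "date": dt.datetime(2021, 6, 1, 18, 30, 0), "rating": 1},
]

recent = reviews_after(sample_reviews, dt.datetime(2020, 1, 1))

# Count how many of the remaining reviews gave each star rating
rating_counts = Counter(review["rating"] for review in recent)
print(len(recent), dict(rating_counts))
```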

#### Storing the Reviews as CSV

This section focuses on the storage of the reviews for future use.

---

Since the output does not include the name and ID of the app by default, we add _app_name_ and _app_id_ to each review for reference.

In [12]:
for review in reviews:
    review['app_name'] = app_name
    review['app_id'] = app_id

Finally, we convert the reviews to a DataFrame and store it as a CSV file using the _to_csv_ method. The file name includes a timestamp for reference.

In [13]:
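# Timestamp (month, day, hour, minute) to tell separate scraping runs apart
time_now = time.strftime("%m%d-%H%M")
review_df = pd.DataFrame(reviews)
# The Data/ directory must already exist, otherwise to_csv raises an error
path = f"Data/{app_name}_{time_now}.csv"
review_df.to_csv(path, index=False)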
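If the output folder might not exist yet, the path can be prepared before saving. The snippet below is a minimal sketch using only the standard library; the helper name `build_output_path` is hypothetical, and adding the year to the timestamp is a variation on the format used above, not the notebook's original behaviour.

```python
import time
from pathlib import Path


def build_output_path(app_name, out_dir="Data"):
    """Create out_dir if needed and return a timestamped CSV path inside it."""
    folder = Path(out_dir)
    # parents=True also creates missing parent folders; exist_ok avoids an
    # error when the folder is already there
    folder.mkdir(parents=True, exist_ok=True)
    # Year included so files from different years do not collide
    stamp = time.strftime("%Y%m%d-%H%M")
    return folder / f"{app_name}_{stamp}.csv"


# Usage (left as a comment so the sketch creates no folders when run as-is):
# path = build_output_path(app_name)
# review_df.to_csv(path, index=False)
```

The returned object is a `pathlib.Path`, which _to_csv_ accepts in place of a string.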