#### App Store Reviews Scraper

---

This notebook scrapes the Apple App Store for the reviews of a particular app. The text uses GCash as its running example, while the executed cells below were run against PayMaya.

Reference: https://python.plainenglish.io/scraping-app-store-reviews-with-python-90e4117ccdfb

In [1]:
import pandas as pd
import datetime as dt
import random
import time

from app_store_scraper import AppStore
from tzlocal import get_localzone

#### Scraping Reviews

This section shows how to use the _AppStore_ class from the app_store_scraper library and discusses its parameters.

---

First, we look up the name and ID of the app of interest. Both are readily available from the URL of the app's App Store page.

For example, here is the URL of the App Store page for GCash: https://apps.apple.com/ph/app/gcash/id520020791. The application name is the value between "app/" and "/id...", while the application ID is the number after "/id".
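The same lookup can be done programmatically. The following is a minimal sketch, assuming the URL follows the shape shown above; the helper name `parse_app_store_url` is hypothetical and is not part of app_store_scraper.

```python
import re

def parse_app_store_url(url):
    """Return (country, app_name, app_id) parsed from an App Store page URL."""
    # Assumed shape: https://apps.apple.com/<country>/app/<app_name>/id<app_id>
    match = re.search(r"apps\.apple\.com/([a-z]{2})/app/([^/]+)/id(\d+)", url)
    if match is None:
        raise ValueError(f"Not a recognised App Store URL: {url}")
    country, name, raw_id = match.groups()
    return country, name, int(raw_id)

parsed = parse_app_store_url("https://apps.apple.com/ph/app/gcash/id520020791")
print(parsed)  # ('ph', 'gcash', 520020791)
```

Parsing the URL avoids copy-paste mistakes when switching between apps, and it also yields the country code needed later.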



In [None]:
app_name = "gcash"
app_id = 520020791

In [2]:
app_name = "paymaya"
app_id = 991673877

We first instantiate _AppStore_ so that we can call its _review_ method. Here we pass three parameters: _country_, _app_name_, and _app_id_. For _app_name_ and _app_id_, we use the values obtained previously, and we set _country_ to the App Store region of interest. In this case, we set _country_ to "ph" for the Philippines.

In [3]:
app_store = AppStore(country='ph', 
                     app_name=app_name, 
                     app_id=app_id)

2021-06-13 13:53:34,441 [INFO] Base - Initialised: AppStore('ph', 'paymaya', 991673877)
2021-06-13 13:53:34,442 [INFO] Base - Ready to fetch reviews from: https://apps.apple.com/ph/app/paymaya/id991673877


Using the _review_ method, we scrape the reviews of the specified app. We may limit the number of reviews scraped by setting a value for the _how_many_ parameter. 

Without a value for this parameter, the method scrapes all reviews. We can also filter reviews by date by passing a datetime object to the _after_ parameter, in which case the scraper only collects reviews posted after that date. Lastly, _sleep_ adds a pause between calls.

Since we want all of the reviews for the app, we do not provide values for the _how_many_ and _after_ parameters.

In [4]:
# random.randint(3, 5) is evaluated once, so the same delay applies to every call.
app_store.review(sleep=random.randint(3, 5))

reviews = app_store.reviews
print(f"Number of reviews scraped: {app_store.reviews_count}")

2021-06-13 13:53:42,951 [INFO] Base - [id:991673877] Fetched 40 reviews (40 fetched in total)
2021-06-13 13:53:55,688 [INFO] Base - [id:991673877] Fetched 100 reviews (100 fetched in total)
2021-06-13 13:54:08,732 [INFO] Base - [id:991673877] Fetched 160 reviews (160 fetched in total)
2021-06-13 13:54:21,413 [INFO] Base - [id:991673877] Fetched 220 reviews (220 fetched in total)
2021-06-13 13:54:34,486 [INFO] Base - [id:991673877] Fetched 280 reviews (280 fetched in total)
2021-06-13 13:54:47,394 [INFO] Base - [id:991673877] Fetched 340 reviews (340 fetched in total)
2021-06-13 13:55:00,309 [INFO] Base - [id:991673877] Fetched 400 reviews (400 fetched in total)
2021-06-13 13:55:13,184 [INFO] Base - [id:991673877] Fetched 460 reviews (460 fetched in total)
2021-06-13 13:55:26,041 [INFO] Base - [id:991673877] Fetched 520 reviews (520 fetched in total)
2021-06-13 13:55:38,906 [INFO] Base - [id:991673877] Fetched 580 reviews (580 fetched in total)
2021-06-13 13:55:52,157 [INFO] Base - [id:

Number of reviews scraped: 2569
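When only a recent sample is needed, the _how_many_ and _after_ parameters described above can be combined. The sketch below is an illustration under assumptions: `fetch_recent_reviews` is a hypothetical helper, and `store` is assumed to behave like the _AppStore_ instance created earlier. Nothing is fetched until the function is called.

```python
import datetime as dt

def fetch_recent_reviews(store, days=30, limit=200, pause=3):
    """Request roughly `limit` reviews posted within the last `days` days.

    `store` is assumed to expose review(how_many=..., after=..., sleep=...)
    and a `reviews` list, like the AppStore instance used in this notebook.
    """
    # Naive datetime, matching the naive dates shown in the scraped reviews.
    cutoff = dt.datetime.now() - dt.timedelta(days=days)
    store.review(how_many=limit, after=cutoff, sleep=pause)
    return store.reviews

# Example call (performs network requests when run against a real instance):
# recent = fetch_recent_reviews(app_store, days=30, limit=200)
```

Limiting the request this way shortens scraping time considerably compared with fetching every review.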


The scraped reviews are stored in the _reviews_ attribute as a list of dictionaries with the following keys: date, rating, isEdited, review, title, and userName. Some reviews also include a developerResponse entry. A sample review is shown below.

In [5]:
app_store.reviews[0]

{'date': datetime.datetime(2020, 3, 30, 1, 10, 54),
 'rating': 5,
 'isEdited': False,
 'review': 'I have been contacting them about a payment i made via this app because such payment did not reach the biller. It has been two weeks since the payment was made (as of writing). I have been contacting them via email, messenger, call and to no avail no one responded with a resolution nor offered to help. Until now my line is cut because i do not have extra to pay off my bill twice. VERY DISAPPOINTING!!!\nUPDATE: NO TICKET WAS PROVIDED. I checked the email and Miyuki replied to me asking for my details last March 10 5:45AM(for reference isent the screenshot of the email to paymaya cares messenger). I provided it on the same day and i got no response since then. I also communicated via Paymaya Cares and got no response either. I followed-up last March 28 via email and messenger and still no response.',
 'title': 'VERY BAD SERVICE',
 'userName': 'Hyrucha',
 'developerResponse': {'id': 14329762,
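Because not every review carries every key, it can help to check how often each key appears before building a table. This is a minimal sketch using only the standard library; the records in `sample_reviews` are made up for illustration, and `count_keys` is a hypothetical helper.

```python
from collections import Counter

def count_keys(reviews):
    """Count how many review dictionaries contain each key."""
    counts = Counter()
    for review in reviews:
        counts.update(review.keys())
    return counts

# Illustrative records only; real ones come from app_store.reviews.
sample_reviews = [
    {"userName": "user_a", "rating": 5, "title": "Great", "review": "Works well."},
    {"userName": "user_b", "rating": 1, "title": "Bad", "review": "Crashes.",
     "developerResponse": {"id": 1}},
]

key_counts = count_keys(sample_reviews)
print(key_counts["userName"], key_counts["developerResponse"])  # 2 1
```

Keys that are missing from some records end up as empty cells once the list is converted to a DataFrame, so a quick count like this shows which columns will be sparse.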

#### Storing the Reviews as CSV

This section focuses on the storage of the reviews for future use.

---

By default, the output does not include the name or ID of the app, so we add the app_name and app_id columns for reference.

In [6]:
for review in reviews:
    review['app_name'] = app_name
    review['app_id'] = app_id

Finally, we convert the reviews to a DataFrame and store it as a CSV file using the _to_csv_ method. The file name includes a timestamp for reference.

In [7]:
time_now = time.strftime("%m%d-%H%M")
review_df = pd.DataFrame(reviews)
path = f"Data/{app_name}_{time_now}.csv"
review_df.to_csv(path, index=False)
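The cell above assumes that a `Data` directory already exists, and its timestamp omits the year. A possible variation is sketched below; `build_output_path` is a hypothetical helper, and the `%Y%m%d-%H%M` format is a choice made here rather than something the original notebook uses.

```python
import time
from pathlib import Path

def build_output_path(app_name, directory="Data", when=None):
    """Return <directory>/<app_name>_<YYYYmmdd-HHMM>.csv, creating the directory."""
    folder = Path(directory)
    # Create the target directory if it is missing; do nothing if it exists.
    folder.mkdir(parents=True, exist_ok=True)
    # `when` is an optional time.struct_time; default to the current local time.
    stamp = time.strftime("%Y%m%d-%H%M", when or time.localtime())
    return folder / f"{app_name}_{stamp}.csv"
```

The returned path can be passed directly to _to_csv_, for example `review_df.to_csv(build_output_path(app_name), index=False)`.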