# *Web Scraping* Process for Peduli Lindungi App Reviews

### Installing the *scraping package*

In [None]:
!pip install google-play-scraper

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting google-play-scraper
  Downloading google_play_scraper-1.2.3-py3-none-any.whl (28 kB)
Installing collected packages: google-play-scraper
Successfully installed google-play-scraper-1.2.3


### Importing the required *libraries*

In [None]:
from google_play_scraper import app, Sort, reviews_all
import pandas as pd 
import numpy as np

### *Scraping* reviews of the Peduli Lindungi app

In [None]:
us_reviews = reviews_all(
    "com.telkom.tracencare",   # app ID of Peduli Lindungi on Google Play
    sleep_milliseconds=0,      # defaults to 0
    lang='en',                 # defaults to 'en'
    country='id',              # defaults to 'us'
    sort=Sort.MOST_RELEVANT    # defaults to Sort.MOST_RELEVANT
)
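
`reviews_all` keeps requesting pages until no more reviews are returned, which for this app amounts to tens of thousands of rows. When only a bounded sample is needed, a paginated loop is one alternative. The following is a minimal sketch rather than the method used in this notebook: `collect_reviews`, `fake_fetch_page`, and their parameters are hypothetical names, and `fetch_page` is assumed to behave like `google_play_scraper.reviews`, that is, to accept `count` and `continuation_token` keyword arguments and to return a `(batch, token)` pair. The stand-in fetcher lets the sketch run without network access.

```python
from typing import Any, Callable, Dict, List, Optional, Tuple

Review = Dict[str, Any]
FetchPage = Callable[..., Tuple[List[Review], Optional[Any]]]


def collect_reviews(app_id: str, total: int, fetch_page: FetchPage,
                    page_size: int = 200) -> List[Review]:
    """Collect up to `total` reviews by calling `fetch_page` repeatedly."""
    collected: List[Review] = []
    token = None  # no continuation token before the first request
    while len(collected) < total:
        wanted = min(page_size, total - len(collected))
        batch, token = fetch_page(app_id, count=wanted, continuation_token=token)
        if not batch:  # the source has no more reviews
            break
        collected.extend(batch)
    # A fetcher may return more than requested, so trim to the limit
    return collected[:total]


# Hypothetical offline stand-in for the real fetcher (assumption for illustration)
def fake_fetch_page(app_id, count, continuation_token=None):
    start = continuation_token or 0
    stop = min(start + count, 5)  # pretend the app has only 5 reviews
    batch = [{"reviewId": str(i), "score": 5} for i in range(start, stop)]
    return batch, stop


sample = collect_reviews("com.telkom.tracencare", total=3,
                         fetch_page=fake_fetch_page, page_size=2)
print(len(sample))
```

With the real library, `fetch_page` would presumably be `reviews` with `lang` and `country` bound beforehand (for example through `functools.partial`). That pairing has not been run here and should be treated as an assumption.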

### Displaying the data as a table

In [None]:
# Each review is a dict, so the list converts directly into a table
df_review = pd.DataFrame(us_reviews)
df_review.head()

| | reviewId | userName | userImage | content | score | thumbsUpCount | reviewCreatedVersion | at | replyContent | repliedAt |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | f8817df9-e3d1-428a-95fb-3e6e1bdaca04 | Wine Amanda | https://play-lh.googleusercontent.com/a-/ACNPE... | Its mandatory to use this app to scan the QR c... | 4 | 366 | 4.4.6 | 2022-06-16 03:08:12 | Halo Kak,\n\nKami mohon maaf atas ketidaknyama... | 2022-06-16 13:17:12 |
| 1 | 9ce3a2c9-714a-4bf4-b943-5c9a7fe99d4a | A Google user | https://play-lh.googleusercontent.com/EGemoI2N... | The app stopped working multiple times and I c... | 3 | 440 | 2.0.0 | 2020-04-21 12:37:10 | | NaT |
| 2 | b138f46d-b6a3-459d-8a71-c28bfb43a2a9 | Joel David | https://play-lh.googleusercontent.com/a-/ACNPE... | On my first review, I wrote that I can't sync.... | 2 | 159 | 3.4.0 | 2021-08-03 22:48:25 | "Halo,\n\nKami mohon maaf atas ketidaknyamanan... | 2021-09-05 06:14:40 |
| 3 | d6067d62-0379-4ed9-a081-944223ea3868 | D. T. | https://play-lh.googleusercontent.com/a-/ACNPE... | We are supposed to use it the app when to go t... | 1 | 94 | 3.4.6 | 2021-10-19 16:45:33 | Halo D. T.,\n\nMohon maaf atas ketidaknyamanan... | 2021-10-22 04:01:20 |
| 4 | 5ca0e0b4-0e01-4b7a-8499-dd49969189b8 | Haraz Anadiya | https://play-lh.googleusercontent.com/a-/ACNPE... | Latest update makes the app unable to open. An... | 1 | 206 | 3.4.5 | 2021-09-09 16:23:46 | Hello Haraz,\n\nWe apologize for your inconven... | 2021-09-17 07:55:15 |


### Displaying the number of rows and columns

In [None]:
df_review.shape

(51863, 10)

### Displaying only the required columns

In [None]:
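# Keep only the columns needed; .copy() avoids working on a view of df_review
my_df = df_review[['userName', 'score', 'at', 'content']].copy()
# Indonesian column names: reviewer name, rating, review date, review text
my_df.columns = ["Nama reviewer", "Rating", "Tanggal ulasan", "Ulasan"]
my_df.head()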

| | Nama reviewer | Rating | Tanggal ulasan | Ulasan |
|---|---|---|---|---|
| 0 | Wine Amanda | 4 | 2022-06-16 03:08:12 | Its mandatory to use this app to scan the QR c... |
| 1 | A Google user | 3 | 2020-04-21 12:37:10 | The app stopped working multiple times and I c... |
| 2 | Joel David | 2 | 2021-08-03 22:48:25 | On my first review, I wrote that I can't sync.... |
| 3 | D. T. | 1 | 2021-10-19 16:45:33 | We are supposed to use it the app when to go t... |
| 4 | Haraz Anadiya | 1 | 2021-09-09 16:23:46 | Latest update makes the app unable to open. An... |


### Filtering the data by review creation date

In [None]:
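# Keep reviews from 2021-12-16 up to 2022-04-16 00:00:00 (a date string is parsed as midnight)
date_mask = (my_df["Tanggal ulasan"] >= '2021-12-16') & (my_df["Tanggal ulasan"] <= '2022-04-16')
# reset_index(drop=True) renumbers the rows without keeping the old index as a column
data_review = my_df[date_mask].reset_index(drop=True)
data_review.head()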

| | Nama reviewer | Rating | Tanggal ulasan | Ulasan |
|---|---|---|---|---|
| 0 | edyh widiyanto | 1 | 2021-12-26 11:27:32 | It's a bold move to make this app mandatory wh... |
| 1 | Fathur Firmansyah | 4 | 2021-12-20 15:04:25 | We all know the devs won't ever give us the op... |
| 2 | Edward JP | 3 | 2022-02-26 09:42:38 | i was tested positive for covid. yet the app k... |
| 3 | Maria Mauri | 1 | 2022-03-17 06:01:42 | It is impossible for foreigners to use this ap... |
| 4 | AlwaysOn Forever | 2 | 2022-01-13 06:53:17 | Great idea to make an app like this and make t... |
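
Because `Tanggal ulasan` holds full timestamps, comparing it with the string `'2022-04-16'` means comparing it with midnight at the start of that day, so reviews posted later on 16 April fall outside the range. The sketch below illustrates this on a few made-up rows and shows one possible way to make the end date inclusive. The sample values and the `filter_by_date` helper are hypothetical and not part of the notebook.

```python
import pandas as pd


def filter_by_date(df: pd.DataFrame, column: str, start: str, end: str,
                   include_end_day: bool = True) -> pd.DataFrame:
    """Return the rows whose `column` lies between `start` and `end`."""
    lower = pd.Timestamp(start)
    upper = pd.Timestamp(end)
    if include_end_day:
        # Exclusive bound at the start of the following day
        mask = (df[column] >= lower) & (df[column] < upper + pd.Timedelta(days=1))
    else:
        # Same behaviour as comparing against the bare date strings
        mask = (df[column] >= lower) & (df[column] <= upper)
    return df[mask].reset_index(drop=True)


# Made-up rows for illustration only
demo = pd.DataFrame({
    "Rating": [1, 4, 3, 5],
    "Tanggal ulasan": pd.to_datetime([
        "2021-12-15 23:59:59",  # before the window
        "2021-12-16 00:00:00",  # first instant of the window
        "2022-04-16 09:30:00",  # during the last day
        "2022-04-17 00:00:00",  # after the window
    ]),
})

strict = filter_by_date(demo, "Tanggal ulasan", "2021-12-16", "2022-04-16",
                        include_end_day=False)
inclusive = filter_by_date(demo, "Tanggal ulasan", "2021-12-16", "2022-04-16")
print(len(strict), len(inclusive))
```

Whether the last day should be included depends on the intended study period. The notebook's filter corresponds to `include_end_day=False`.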


### Displaying the number of reviews

In [None]:
len(data_review)

9257

### Saving the data as *CSV* and *Excel* files

In [None]:
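# index=False omits the row index from both files
data_review.to_csv('Scrapped_data.csv', index=False)
# Writing .xlsx requires an Excel engine such as openpyxl to be installed
data_review.to_excel('Scrapped_data.xlsx', index=False)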
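
As a quick sanity check after saving, the CSV file can be read back and compared with the in-memory table. The sketch below does this with a small made-up frame written to a temporary directory. The sample rows and the `roundtrip_csv` helper are assumptions for illustration. `parse_dates` is passed because CSV stores timestamps as plain text.

```python
import os
import tempfile

import pandas as pd


def roundtrip_csv(df: pd.DataFrame, path: str, date_column: str) -> pd.DataFrame:
    """Write `df` to `path` as CSV and read it back with parsed dates."""
    df.to_csv(path, index=False)
    return pd.read_csv(path, parse_dates=[date_column])


# Made-up rows using the same column names as the notebook
demo = pd.DataFrame({
    "Nama reviewer": ["A", "B"],
    "Rating": [1, 5],
    "Tanggal ulasan": pd.to_datetime(["2021-12-26 11:27:32",
                                      "2022-03-17 06:01:42"]),
    "Ulasan": ["first sample text", "second sample text"],
})

with tempfile.TemporaryDirectory() as tmp_dir:
    restored = roundtrip_csv(demo, os.path.join(tmp_dir, "demo.csv"),
                             "Tanggal ulasan")

# The restored table should have the same shape and column names
print(restored.shape == demo.shape)
print(list(restored.columns) == list(demo.columns))
```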