
# **Web Scraping**

Data scraping (or _web scraping_) is the process of automatically extracting data from websites with a computer program. The technique is useful for collecting information from the internet without copying it by hand. Typical use cases:

- Collecting product prices from e-commerce sites (Sh**e, Tok*****a).
- Gathering a list of the latest news articles.
- Collecting app reviews and ratings from the Google Play Store.
- Collecting job vacancy data from career websites.
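
As a minimal sketch of the idea, the snippet below pulls product names and prices out of a small, made-up HTML fragment using only Python's standard library. The HTML, the `name` and `price` class names, and the `PriceParser` helper are all assumptions for illustration; a real page needs its own selectors.

```python
from html.parser import HTMLParser

# Hypothetical HTML fragment standing in for a downloaded product page
SAMPLE_HTML = """
<ul>
  <li><span class="name">Kopi Arabika</span> <span class="price">45000</span></li>
  <li><span class="name">Teh Hijau</span> <span class="price">30000</span></li>
</ul>
"""


class PriceParser(HTMLParser):
    """Collect the text inside <span class="name"> and <span class="price">."""

    def __init__(self):
        super().__init__()
        self.current_field = None  # which field the parser is currently inside
        self.names = []
        self.prices = []

    def handle_starttag(self, tag, attrs):
        css_class = dict(attrs).get("class")
        if tag == "span" and css_class in ("name", "price"):
            self.current_field = css_class

    def handle_endtag(self, tag):
        if tag == "span":
            self.current_field = None

    def handle_data(self, data):
        if self.current_field == "name":
            self.names.append(data.strip())
        elif self.current_field == "price":
            self.prices.append(int(data.strip()))


parser = PriceParser()
parser.feed(SAMPLE_HTML)
products = dict(zip(parser.names, parser.prices))
print(products)
```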

## **Install Google Play Scraper**

In [None]:
pip install google-play-scraper

Collecting google-play-scraper
  Downloading google_play_scraper-1.2.7-py3-none-any.whl.metadata (50 kB)
Downloading google_play_scraper-1.2.7-py3-none-any.whl (28 kB)
Installing collected packages: google-play-scraper
Successfully installed google-play-scraper-1.2.7


## **Scraping Sharia Mobile Banking App Links**

Scrape the list of apps from the Google Play Store that match a given search keyword, in this case "bank syariah".

In [None]:
import warnings
warnings.filterwarnings("ignore")

In [None]:
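import requests
from bs4 import BeautifulSoup

# Play Store search results for the keyword "bank syariah"
url = 'https://play.google.com/store/search?q=bank%20syariah&c=apps&hl=en'
response = requests.get(url, timeout=30)
response.raise_for_status()  # stop early if the request failed
soup = BeautifulSoup(response.content, 'html.parser')

# Result cards; these attribute values depend on the page markup and may change
get_all = soup.find_all('div', {'jsname' : 'O2DNWb', 'class': 'fUEl2e'})

app_links = []
for item in get_all:
    app_link_elements = item.find_all('a', {'class': 'Si6A0c Gy4nib'})
    for link_element in app_link_elements:
        # Keep only the package ID from an href such as /store/apps/details?id=<package>
        id_link = link_element['href'].replace('/store/apps/details?id=', '')
        app_links.append(id_link)

print(app_links)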

['com.bsm.activity2', 'co.id.bankbsi.superapp', 'id.aladinbank.mobile', 'id.co.bcasyariah.bsya', 'id.dana', 'com.jago.digitalBanking', 'com.chase.sig.android', 'id.co.bri.brimo', 'co.uk.getmondo', 'com.citi.citimobile', 'com.usbank.mobilebanking', 'com.simas.mobile.SimobiPlusSyariah', 'id.bmri.livin', 'com.bca', 'src.com.bni', 'com.bnc.finance', 'com.gma.mbanking.bmd', 'id.bni.wondr', 'com.bca.mybca.omni.android', 'id.co.bankmas.bebas.retail', 'com.bibit.bibitid', 'com.ruya.bank', 'com.bsn.mybsn', 'com.dbs.sg.dbsmbanking', 'ae.ahb.digital', 'com.alami_funder', 'com.roshandigital', 'com.SIBMobile', 'com.netspend.mobileapp.ace.flare', 'id.socialbanking']
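
The `replace` call above assumes every `href` has exactly the form `/store/apps/details?id=<package>`. A slightly more defensive sketch reads the `id` query parameter with `urllib.parse`, so that extra parameters such as `&hl=en` do not end up in the package name. The helper name `extract_app_id` and the sample links are assumptions for illustration.

```python
from urllib.parse import parse_qs, urlparse


def extract_app_id(href):
    """Return the value of the `id` query parameter, or None if it is absent."""
    query = parse_qs(urlparse(href).query)
    values = query.get("id")
    return values[0] if values else None


# Hypothetical hrefs in the shape found on a Play Store search page
sample_hrefs = [
    "/store/apps/details?id=com.bsm.activity2",
    "/store/apps/details?id=id.aladinbank.mobile&hl=en",
    "/store/search?q=bank",  # not an app link, so it has no id
]

app_ids = [extract_app_id(href) for href in sample_hrefs]
app_ids = [app_id for app_id in app_ids if app_id is not None]
print(app_ids)
```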


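## **Evaluating the Results**

An evaluation step is needed because not every app in the scraped list is relevant: the search results also include conventional and foreign banking apps such as `com.bca` and `com.chase.sig.android`. The list of sharia banking app IDs is therefore curated manually.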

In [None]:
master_id_bank_syariah = [
    'com.bsm.activity2',
    'com.megasyariah',
    'id.aladinbank.mobile',
    'com.mobilemaslahah',
    'com.app.vioss4',
    'com.dwidasa.bcas.mb.android',
    'com.bankbtpns.mobilebanking',
    'com.simas.mobile.SimobiPlusSyariah',
    'com.aceh.action',
    'mlpt.siemo.mobilebanking.riau',
    'com.bankntbsyariah.mobilebanking',
    'com.gma.mbanking.bmd',
    'com.jago.digitalBanking',
    'com.alami_funder',
    'com.danasyariah.mobiledanasyariah',
    'id.co.danamonsyariah.shafa'
]
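
One way to make this evaluation explicit is to compare the scraped IDs against the curated list with set operations. The sketch below uses a short excerpt of both lists from this notebook; the variable names `scraped_sample` and `master_sample` are introduced here for illustration only.

```python
# Excerpt of the IDs returned by the search-page scrape
scraped_sample = [
    "com.bsm.activity2",
    "id.aladinbank.mobile",
    "id.dana",
    "com.chase.sig.android",
    "com.jago.digitalBanking",
]

# Excerpt of the manually curated sharia banking IDs
master_sample = [
    "com.bsm.activity2",
    "com.megasyariah",
    "id.aladinbank.mobile",
    "com.jago.digitalBanking",
]

scraped_set = set(scraped_sample)
master_set = set(master_sample)

relevant = sorted(scraped_set & master_set)    # scraped and wanted
irrelevant = sorted(scraped_set - master_set)  # scraped but not wanted
missed = sorted(master_set - scraped_set)      # wanted but not found by the search

print("relevant  :", relevant)
print("irrelevant:", irrelevant)
print("missed    :", missed)
```

The `missed` group shows why the curated list cannot be derived from the search results alone: some sharia banking apps never appear for this keyword.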

## **Scraping Reviews & Ratings of Sharia Mobile Banking Apps**


In [None]:
import pandas as pd
from google_play_scraper import app, reviews_all

def convert_result_to_dataframe(results):
    hasil = dict()
    for data in results:
        for key, value in data.items():
            if key not in hasil:
                hasil[key] = []
            hasil[key].append(value)

    hasil_akhir = pd.DataFrame(hasil)
    return(hasil_akhir)

iterasi = 1
data_integrasi = pd.DataFrame()

for app_package in master_id_bank_syariah:
    try:
        app_detail = app(app_package)
        print(f"Nama Aplikasi : {app_detail['title']}")

        result = reviews_all(
            app_package,
            #lang = 'id'
        )

        data = convert_result_to_dataframe(result)
        data['app_name'] = app_detail['title'].upper()

        if(iterasi == 1):
            data_integrasi = data
        else:
            data_integrasi = pd.concat([data_integrasi, data])

        iterasi += 1
    except Exception:
        # Skip apps whose details or reviews could not be fetched
        pass

for col_date in ['at', 'repliedAt']:
    data_integrasi[col_date] = pd.to_datetime(data_integrasi[col_date], errors='coerce')

Nama Aplikasi : BSI Mobile
Nama Aplikasi : M-Syariah
Nama Aplikasi : Aladin : Bank Syariah Digital
Nama Aplikasi : Mobile Maslahah by bjb syariah
Nama Aplikasi : BISA Mobile by KBBS
Nama Aplikasi : Tepat Mobile
Nama Aplikasi : Aira Mobile
Nama Aplikasi : Action Mobile
Nama Aplikasi : BRKS Mobile
Nama Aplikasi : Bank NTB Syariah mBanking
Nama Aplikasi : BMD Syariah Mobile System
Nama Aplikasi : Bank Jago/Jago Syariah
Nama Aplikasi : ALAMI P2P Funding Sharia
Nama Aplikasi : Dana Syariah
Nama Aplikasi : Shafa by Danamon Syariah
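
The loop above receives 16 package IDs but prints only 15 app names, so one app was skipped silently by the `except` branch. A hedged sketch of one way to keep a record of such failures follows. `fetch_reviews` is a hypothetical stand-in for the `app` and `reviews_all` calls so that the example runs without network access, and the package IDs and review records are made up.

```python
def fetch_reviews(app_package):
    """Hypothetical stand-in for app() + reviews_all(); raises for unknown IDs."""
    fake_store = {
        "com.example.bank_a": [{"reviewId": "r1", "score": 5}],
        "com.example.bank_b": [{"reviewId": "r2", "score": 1},
                               {"reviewId": "r3", "score": 4}],
    }
    if app_package not in fake_store:
        raise LookupError(f"app not found: {app_package}")
    return fake_store[app_package]


package_ids = ["com.example.bank_a", "com.example.missing", "com.example.bank_b"]

all_reviews = []  # one flat list of review dicts across all apps
failed = {}       # package ID -> error message, instead of a silent skip

for app_package in package_ids:
    try:
        app_reviews = fetch_reviews(app_package)
    except Exception as error:
        failed[app_package] = str(error)
        continue
    for review in app_reviews:
        # Tag each review with the app it belongs to
        all_reviews.append({**review, "app_id": app_package})

print(f"{len(all_reviews)} reviews collected, {len(failed)} app(s) failed")
print(failed)
```

A flat list of dictionaries such as `all_reviews` can be passed directly to `pd.DataFrame(...)`, which avoids concatenating DataFrames inside the loop.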


In [None]:
# Show the first 10 rows of the result
data_integrasi.head(10)

Unnamed: 0,reviewId,userName,userImage,content,score,thumbsUpCount,reviewCreatedVersion,at,replyContent,repliedAt,appVersion,app_name
0,33ee743d-8bec-41d7-8e7a-b09455e65a1d,A Google user,https://play-lh.googleusercontent.com/EGemoI2N...,sy dan istri pengguna 2 akun bsi mobile dengan...,5.0,0.0,6.17.0,2025-07-13 11:53:49,Assalamualaikum Bapak Teguh Wasisto mohon maaf...,2025-07-13 12:35:07,6.17.0,BSI MOBILE
1,2724411c-f2ff-4c03-9ee8-5f29392ec001,A Google user,https://play-lh.googleusercontent.com/EGemoI2N...,udah 3 minggu lebih top up ke DANA ga bisa dil...,1.0,0.0,6.26.0,2025-07-12 06:34:23,"Assalamualaikum Bapak Handy, mohon maaf transa...",2025-07-12 06:46:41,6.26.0,BSI MOBILE
2,19ef70fd-61f0-4fbf-8459-68f4b0a4db7b,A Google user,https://play-lh.googleusercontent.com/EGemoI2N...,Setelah diupdate setiap buka aplikasi harus ny...,1.0,0.0,6.26.0,2025-07-05 02:57:19,"Assalamualaikum Bapak/Ibu,mohon maaf atas keti...",2025-07-05 03:15:49,6.26.0,BSI MOBILE
3,407f5822-65ba-4556-9485-bbcfead85437,A Google user,https://play-lh.googleusercontent.com/EGemoI2N...,"busuk banget sumpah nih aplikasi, gangguan mul...",1.0,0.0,,2025-06-29 03:16:43,"Assalamualaikum Bapak Rachmat , mohon maaf ata...",2025-06-29 06:10:54,,BSI MOBILE
4,ee4eaef4-7c50-4293-8fa0-dd3d35ba1884,A Google user,https://play-lh.googleusercontent.com/EGemoI2N...,Ribet kali mau cek saldo aja harus install apl...,1.0,1.0,6.26.0,2025-06-28 19:20:53,"Assalamualaikum Ibu Hazeeza, mohon maaf atas k...",2025-06-28 23:57:39,6.26.0,BSI MOBILE
5,36ce483e-5688-4341-a46e-d8f3dfb90134,A Google user,https://play-lh.googleusercontent.com/EGemoI2N...,"susah betul buka app ini, keluar sendiri setel...",1.0,0.0,,2025-06-27 02:10:53,"Assalamualaikum Bapak Arya, mohon maaf atas ke...",2025-06-27 03:26:27,,BSI MOBILE
6,afbc1dd3-4b12-48f6-b389-3ee290b38c75,A Google user,https://play-lh.googleusercontent.com/EGemoI2N...,di aplikasi tidak ada fitur buka rekening baru...,1.0,0.0,6.26.0,2025-06-26 07:02:34,"Assalamualaikum Bapak Fajar, mohon maaf atas k...",2025-06-26 10:04:49,6.26.0,BSI MOBILE
7,0d9e91c6-9c2a-46fa-800c-0ec4c59afd3a,A Google user,https://play-lh.googleusercontent.com/EGemoI2N...,Tidak memikirkan warga menengah ke bawah yang ...,2.0,0.0,6.26.0,2025-06-25 23:43:39,"Assalamualaikum Bapak Cahyo, mohon maaf atas k...",2025-06-26 07:55:38,6.26.0,BSI MOBILE
8,6524d048-3549-4e44-acec-c5d2b966e1e0,A Google user,https://play-lh.googleusercontent.com/EGemoI2N...,"Sekarang Gak Asik, Udah mah sering Eror, gak b...",2.0,2.0,6.26.0,2025-06-24 21:43:51,"Assalamualaikum Bapak Nuril, mohon maaf atas k...",2025-06-25 08:08:14,6.26.0,BSI MOBILE
9,f663df66-feae-423a-9d48-366f714ea3d7,A Google user,https://play-lh.googleusercontent.com/EGemoI2N...,Ini ngelag terus ga bisa dibuka kenapa?,2.0,0.0,6.26.0,2025-06-24 01:12:49,"Assalamualaikum Ibu Fanila, mohon maaf atas ke...",2025-06-24 01:19:45,6.26.0,BSI MOBILE


## **Saving the Scraping Results**


In [None]:
data_integrasi.to_excel('data_scrapping.xlsx', index = False, engine = 'openpyxl')

## **Example Analysis**


In [None]:
rataan_score = data_integrasi.groupby(['app_name'], as_index = False).agg(
    jumlah_data = ('reviewId', 'count'),
    total_score = ('score', 'sum'),
    avg_score = ('score', 'mean'),
    stdev_score = ('score', 'std')
)

rataan_score = rataan_score.sort_values(by = ['avg_score'], ascending = False, ignore_index = True)
rataan_score

Unnamed: 0,app_name,jumlah_data,total_score,avg_score,stdev_score
0,BMD SYARIAH MOBILE SYSTEM,3,15.0,5.0,0.0
1,ALADIN : BANK SYARIAH DIGITAL,2542,11939.0,4.696696,1.001996
2,ACTION MOBILE,177,744.0,4.20339,1.362397
3,DANA SYARIAH,1036,4317.0,4.166988,1.473145
4,BISA MOBILE BY KBBS,118,487.0,4.127119,1.51091
5,TEPAT MOBILE,47,193.0,4.106383,1.563636
6,MOBILE MASLAHAH BY BJB SYARIAH,220,882.0,4.009091,1.523007
7,ALAMI P2P FUNDING SHARIA,274,1090.0,3.978102,1.571109
8,M-SYARIAH,303,1162.0,3.834983,1.698996
9,BANK JAGO/JAGO SYARIAH,7694,27715.0,3.602158,1.729956
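
A ranking by mean score alone can be misleading when an app has very few reviews: in the output above, BMD SYARIAH MOBILE SYSTEM ranks first with only 3 reviews. A minimal sketch of filtering by review count before ranking is shown below, using four rows copied from that output. The threshold `min_reviews = 100` is an arbitrary assumption, not a value from the original analysis.

```python
import pandas as pd

# Four rows copied from the aggregated output
rataan_sample = pd.DataFrame({
    "app_name": [
        "BMD SYARIAH MOBILE SYSTEM",
        "ALADIN : BANK SYARIAH DIGITAL",
        "ACTION MOBILE",
        "BANK JAGO/JAGO SYARIAH",
    ],
    "jumlah_data": [3, 2542, 177, 7694],
    "avg_score": [5.0, 4.696696, 4.20339, 3.602158],
})

min_reviews = 100  # assumed threshold; tune for the analysis at hand

# Drop apps with too few reviews, then rank the rest by mean score
ranked = (
    rataan_sample[rataan_sample["jumlah_data"] >= min_reviews]
    .sort_values(by="avg_score", ascending=False, ignore_index=True)
)
print(ranked)
```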
