# **Scraping | Cyber University (BRI Institute)**

## Prerequisites

**Modules to install:**<br>
`serpapi` : fetches data from the Google search engine<br>
`google-search-results` : fetches data from the Google search engine (provides the `GoogleSearch` class imported below)<br>
`pandas` : saves the scraping results to CSV

**How to install:**
```bash
pip install serpapi
pip install google-search-results
pip install pandas
```
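
The scraping code also needs a SerpApi API key. As a minimal sketch, and assuming the key has been exported under an environment variable named `SERPAPI_API_KEY` (a name chosen for this example, not something the library requires), it could be read at runtime instead of being written into the notebook. The helper name `load_api_key` is likewise hypothetical.

```python
import os


def load_api_key(env_var: str = "SERPAPI_API_KEY") -> str:
    """Return the API key stored in `env_var`, or raise if it is missing.

    The variable name is an assumption made for this sketch.
    """
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(f"Set the {env_var} environment variable first.")
    return key
```

Calling `load_api_key()` before building the search parameters fails early with a clear message when the key is absent, rather than sending a request that the API would reject.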

In [1]:
# API_KEY= <your SerpApi API key>
# Target scraping: Cyber University (BRI Institute)
# Site: https://www.google.com/maps/place/Cyber+University/@-6.3035261,106.8428613,17z/data=!4m8!3m7!1s0x2e69ed8c761affff:0x85573c41e3336634!8m2!3d-6.3035261!4d106.8454362!9m1!1b1!16s%2Fg%2F11srmdvw0y?hl=id&entry=ttu&g_ep=EgoyMDI0MTIxMS4wIKXMDSoASAFQAw%3D%3D
# Data id: 0x2e69ed8c761affff:0x85573c41e3336634
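
The data ID above was copied by hand from the place URL. A rough sketch of doing the same with a regular expression follows. It assumes the ID always appears after a `!1s` token inside the `data=` part, as it does in this URL; Google does not document this format, so treat the pattern as a guess. The function name `extract_data_id` is hypothetical, and the URL is a shortened copy of the one above.

```python
import re
from typing import Optional

# Matches the "!1s<hex>:<hex>" token seen in the `data=` part of the place URL.
# Assumption: the data ID is always prefixed with "!1s".
DATA_ID_PATTERN = re.compile(r"!1s(0x[0-9a-fA-F]+:0x[0-9a-fA-F]+)")


def extract_data_id(maps_url: str) -> Optional[str]:
    """Return the data ID found in a Google Maps place URL, or None."""
    match = DATA_ID_PATTERN.search(maps_url)
    return match.group(1) if match else None


url = (
    "https://www.google.com/maps/place/Cyber+University/"
    "@-6.3035261,106.8428613,17z/"
    "data=!4m8!3m7!1s0x2e69ed8c761affff:0x85573c41e3336634!8m2!3d-6.3035261!4d106.8454362"
)
print(extract_data_id(url))  # prints 0x2e69ed8c761affff:0x85573c41e3336634
```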

In [2]:
# import the necessary libraries
from serpapi import GoogleSearch
from urllib.parse import urlsplit, parse_qsl
import json
import pandas as pd
import os
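
The scraping cell moves to the next page by copying the query string of the `serpapi_pagination["next"]` link into the search parameters. The offline sketch below shows only that step with `urlsplit` and `parse_qsl`. The `next_url` value and its `EXAMPLE_TOKEN` are made up for illustration, and `demo_params` is a stand-in for the real parameter dictionary.

```python
from urllib.parse import urlsplit, parse_qsl

# Hypothetical pagination link; the token is a placeholder, not a real one.
next_url = (
    "https://serpapi.com/search.json"
    "?data_id=0x2e69ed8c761affff%3A0x85573c41e3336634"
    "&engine=google_maps_reviews&hl=id&next_page_token=EXAMPLE_TOKEN"
)

# Stand-in for the parameters of the first request.
demo_params = {
    "engine": "google_maps_reviews",
    "hl": "id",
    "data_id": "0x2e69ed8c761affff:0x85573c41e3336634",
}

# Split the URL, decode its query string into key/value pairs,
# and merge them over the existing parameters.
next_params = dict(parse_qsl(urlsplit(next_url).query))
demo_params.update(next_params)

print(demo_params["next_page_token"])  # prints EXAMPLE_TOKEN
```

Because `parse_qsl` percent-decodes values, the `%3A` in the link becomes `:` again, so `data_id` keeps its original form after the update.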

In [3]:
# define the API key and data id
API_KEY = os.environ.get("SERPAPI_API_KEY", "")  # assumption: the key is exported under this variable name instead of being hard-coded
data_id = "0x2e69ed8c761affff:0x85573c41e3336634"  # place data ID, found in the `data=` part of the Google Maps place URL

# define the output directory and file name
output_dir = "../../dataset"
output_name = "CyberU_reviews.csv"

# define the search parameters
params = {
  "api_key": API_KEY,                                 # serpapi api key
  "engine": "google_maps_reviews",                    # serpapi search engine
  "hl": "id",                                         # language of the search (Indonesian)
  "data_id": data_id                                  # Google Maps place data ID defined above
}

# search for the reviews
search = GoogleSearch(params)

# create an empty list to store the reviews
reviews = []

# loop through the pages to extract the reviews
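page_num = 0
max_retries = 3  # stop after this many consecutive connection failures
failures = 0
while True:
    try:
        results = search.get_dict()
    except OSError as e:  # covers built-in ConnectionError and the network errors raised by requests
        failures += 1
        print(f"Connection error: {e}")
        if failures >= max_retries:
            break
        continue
    failures = 0
    page_num += 1  # count a page only after a successful request

    if "error" not in results:
        print(f"Extracting reviews from {page_num} page.")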
        for result in results.get("reviews", []): # return an empty list [] if no reviews from the place
            user = result.get("user") or {}  # avoid AttributeError when a review has no user data
            reviews.append({
                "page": page_num,
                "name": user.get("name"),
                "link": user.get("link"),
                "thumbnail": user.get("thumbnail"),
                "rating": result.get("rating"),
                "date": result.get("date"),
                "snippet": result.get("snippet"),
                "images": result.get("images"),
                "local_guide": user.get("local_guide"),
                # other data
            })
    else:
        print(results["error"])
        break

    serpapi_pagination = results.get("serpapi_pagination")
    if serpapi_pagination and serpapi_pagination.get("next") and serpapi_pagination.get("next_page_token"):
        # split URL in parts as a dict and update search "params" variable to a new page that will be passed to GoogleSearch()
        search.params_dict.update(dict(parse_qsl(urlsplit(serpapi_pagination["next"]).query)))
    else:
        break

    
print(json.dumps(reviews, indent=2, ensure_ascii=False))
df = pd.DataFrame(reviews)
# Ensure the directory exists
os.makedirs(output_dir, exist_ok=True)

# Save the dataframe to a CSV file
df.to_csv(os.path.join(output_dir, output_name), index=False)

Extracting reviews from 1 page.
Extracting reviews from 2 page.
Extracting reviews from 3 page.
Extracting reviews from 4 page.
Extracting reviews from 5 page.
Extracting reviews from 6 page.
Extracting reviews from 7 page.
Extracting reviews from 8 page.
Google hasn't returned any results for this query.
[
  {
    "page": 1,
    "name": "Mahasiswa Pelajar",
    "link": "https://www.google.com/maps/contrib/112543931638230760355?hl=id",
    "thumbnail": "https://lh3.googleusercontent.com/a-/ALV-UjVjyYXn2YYmpStUE3RTNihsw_Cq2gw4-UB6Yv24cqmM-TnXGMjP=s120-c-rp-mo-br100",
    "rating": 5.0,
    "date": "seminggu lalu",
    "snippet": "Lokasi Kampusnya strategis, 1 menit keluar tol pasar minggu\nlulusan nya memiliki kompetensi dan knowledge keren, ga percaya? cek di google via meta ai juga boleh\n\n#Cyberian #NewGeneration",
    "images": null,
    "local_guide": null
  },
  {
    "page": 1,
    "name": "Fathia Rizky",
    "link": "https://www.google.com/maps/contrib/103682754447822984153?hl=