# **Scraping | Universitas Internasional Semen Indonesia (UISI)**

## Prerequisites

**Modules to install:**<br>
`serpapi` : fetches data from the Google search engine<br>
`google-search-results` : fetches data from the Google search engine<br>
`pandas` : saves the scraping results to CSV

**How to install:**
```bash
pip install serpapi
pip install google-search-results
pip install pandas
```

In [1]:
# API_KEY: your SerpApi key (redacted here; do not commit real keys)
# Scraping target: Universitas Internasional Semen Indonesia (UISI)
# Site: https://www.google.com/maps/place/Universitas+Internasional+Semen+Indonesia/@-7.17562,112.6465981,17z/data=!4m18!1m9!3m8!1s0x2dd8003eae3b5885:0xe591511ea76dac1d!2sUniversitas+Internasional+Semen+Indonesia!8m2!3d-7.17562!4d112.649173!9m1!1b1!16s%2Fg%2F1jkw8vfbv!3m7!1s0x2dd8003eae3b5885:0xe591511ea76dac1d!8m2!3d-7.17562!4d112.649173!9m1!1b1!16s%2Fg%2F1jkw8vfbv?hl=id&entry=ttu&g_ep=EgoyMDI0MTIxMS4wIKXMDSoASAFQAw%3D%3D
# Data id: 0x2dd8003eae3b5885:0xe591511ea76dac1d
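The data ID noted above is the `0x…:0x…` pair that follows `!1s` in the place URL. A minimal sketch of pulling it out with a regular expression follows. The helper name `extract_data_id` is hypothetical, the URL is a shortened form of the one above, and the `!1s` pattern is an assumption based on this one URL rather than a documented Google format.

```python
import re
from typing import Optional

# Assumed pattern: "!1s" followed by two hex numbers joined by a colon.
DATA_ID_PATTERN = re.compile(r"!1s(0x[0-9a-fA-F]+:0x[0-9a-fA-F]+)")


def extract_data_id(maps_url: str) -> Optional[str]:
    """Return the first data ID found in a Google Maps place URL, or None."""
    match = DATA_ID_PATTERN.search(maps_url)
    return match.group(1) if match else None


# Shortened version of the place URL from the cell above.
example_url = (
    "https://www.google.com/maps/place/Universitas+Internasional+Semen+Indonesia/"
    "data=!4m18!1m9!3m8!1s0x2dd8003eae3b5885:0xe591511ea76dac1d!2sUniversitas"
)
print(extract_data_id(example_url))
```

If the pattern is not found, the helper returns `None`, so the ID can still be copied by hand as done in this notebook.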

In [2]:
# import the necessary libraries
from serpapi import GoogleSearch
from urllib.parse import urlsplit, parse_qsl
import json
import pandas as pd
import os

In [3]:
# define the API key and data id
API_KEY = os.environ["SERPAPI_API_KEY"]  # read the key from an environment variable (name chosen here) instead of hard-coding it
data_id = "0x2dd8003eae3b5885:0xe591511ea76dac1d"  # place ID taken from the `data=` segment of the Google Maps place URL

# define the output directory and file name
output_dir = "../../dataset"
output_name = "UISI_reviews.csv"

# define the search parameters
params = {
  "api_key": API_KEY,                                 # serpapi api key
  "engine": "google_maps_reviews",                    # serpapi search engine
  "hl": "id",                                         # language of the search
  "data_id": data_id                                  # place ID from the Google Maps place URL
}

# search for the reviews
search = GoogleSearch(params)

# create an empty list to store the reviews
reviews = []

# loop through the pages to extract the reviews
page_num = 0
retries = 0
max_retries = 3  # give up after a few consecutive network failures
while True:
    try:
        results = search.get_dict()
    except OSError as e:  # covers the built-in ConnectionError and requests' network errors
        retries += 1
        print(f"Connection error: {e}")
        if retries >= max_retries:
            break
        continue
    retries = 0

    if "error" in results:
        print(results["error"])
        break

    # count the page only after it has been fetched successfully
    page_num += 1
    print(f"Extracting reviews from {page_num} page.")
    for result in results.get("reviews", []):  # empty list if the page has no reviews
        user = result.get("user") or {}  # guard against reviews without a "user" object
        reviews.append({
            "page": page_num,
            "name": user.get("name"),
            "link": user.get("link"),
            "thumbnail": user.get("thumbnail"),
            "rating": result.get("rating"),
            "date": result.get("date"),
            "snippet": result.get("snippet"),
            "images": result.get("images"),
            "local_guide": user.get("local_guide"),
            # other data
        })

    serpapi_pagination = results.get("serpapi_pagination") or {}
    if serpapi_pagination.get("next") and serpapi_pagination.get("next_page_token"):
        # parse the query string of the "next" URL and merge it into the search parameters for the following request
        search.params_dict.update(dict(parse_qsl(urlsplit(serpapi_pagination["next"]).query)))
    else:
        break

    
print(json.dumps(reviews, indent=2, ensure_ascii=False))
df = pd.DataFrame(reviews)
# Ensure the directory exists
os.makedirs(output_dir, exist_ok=True)

# Save the dataframe to a CSV file
df.to_csv(os.path.join(output_dir, output_name), index=False)
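The pagination step in the loop turns the `next` URL into a dictionary of query parameters before merging it into the search parameters. The sketch below shows that conversion in isolation using only the standard library. The URL and its `EXAMPLE_TOKEN` value are made up for illustration, and `next_page_params` is a hypothetical helper name, not part of SerpApi.

```python
from urllib.parse import urlsplit, parse_qsl


def next_page_params(next_url: str) -> dict:
    """Split a URL and return its query string as a plain dict."""
    return dict(parse_qsl(urlsplit(next_url).query))


# Illustrative URL only; real "next" links come from results["serpapi_pagination"].
example_next = (
    "https://serpapi.com/search.json"
    "?data_id=0x2dd8003eae3b5885%3A0xe591511ea76dac1d"
    "&engine=google_maps_reviews&hl=id&next_page_token=EXAMPLE_TOKEN"
)
page_params = next_page_params(example_next)
print(page_params)
```

`parse_qsl` percent-decodes values, so `%3A` in the data ID comes back as `:` and the dictionary can be passed to `dict.update` directly.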

Extracting reviews from 1 page.
Extracting reviews from 2 page.
Extracting reviews from 3 page.
Extracting reviews from 4 page.
Extracting reviews from 5 page.
Extracting reviews from 6 page.
Extracting reviews from 7 page.
Extracting reviews from 8 page.
Extracting reviews from 9 page.
Extracting reviews from 10 page.
Extracting reviews from 11 page.
Extracting reviews from 12 page.
Extracting reviews from 13 page.
Extracting reviews from 14 page.
Extracting reviews from 15 page.
Extracting reviews from 16 page.
Extracting reviews from 17 page.
Extracting reviews from 18 page.
Extracting reviews from 19 page.
Extracting reviews from 20 page.
Extracting reviews from 21 page.
Extracting reviews from 22 page.
[
  {
    "page": 1,
    "name": "Firdaus Agil Prasetyo",
    "link": "https://www.google.com/maps/contrib/111423810639753200894?hl=id",
    "thumbnail": "https://lh3.googleusercontent.com/a-/ALV-UjWYynH8NXJHZVjRP02hX7-xS8E1cZreid20nSAOnwlgqTebzLBTMg=s120-c-rp-mo-ba3-br100",
    "ra