# **Scraping | *Campus...........***

## Prerequisites

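**Modules to install:**<br>
`serpapi` : For retrieving data from the Google search engine<br>
`google-search-results` : For retrieving data from the Google search engine; provides the `GoogleSearch` class imported from the `serpapi` module below<br>
`pandas` : For saving the scraping results to CSV<br>
`python-dotenv` : For loading `SERPAPI_KEY` from a `.env` file (required by `from dotenv import load_dotenv`)

**How to install:**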
```bash
pip install serpapi
pip install google-search-results
pip install pandas
pip install python-dotenv
```

In [None]:
# API_KEY= 
# Scraping target: https://www.google.com/maps/place/Universitas+Telkom+Surabaya/@-7.3111665,112.7263401,17z/data=!4m8!3m7!1s0x2dd7fbd1cb925a1d:0x1dbecb0b2e9b059f!8m2!3d-7.3111665!4d112.728915!9m1!1b1!16s%2Fg%2F11ghfs58ly?entry=ttu&g_ep=EgoyMDI1MDYwMy4wIKXMDSoASAFQAw%3D%3D
# Site: 
# Data id: 0x2dd7fbd1cb925a1d:0x1dbecb0b2e9b059f
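
The data ID noted above sits in the `data=` segment of the place URL, right after the `!1s` marker. As a minimal sketch, it can be extracted with a regular expression. The helper name `extract_data_id` and the pattern are illustrative assumptions inferred from this one URL, not a documented Google Maps format, and the URL below is a shortened copy of the scraping target.

```python
import re

# Assumed pattern: two hexadecimal numbers joined by a colon, right after "!1s".
DATA_ID_PATTERN = re.compile(r"!1s(0x[0-9a-fA-F]+:0x[0-9a-fA-F]+)")


def extract_data_id(maps_url):
    """Return the data ID found in a Google Maps place URL, or None if absent."""
    match = DATA_ID_PATTERN.search(maps_url)
    return match.group(1) if match else None


# Shortened version of the target URL, kept only up to the coordinates.
url = (
    "https://www.google.com/maps/place/Universitas+Telkom+Surabaya/"
    "@-7.3111665,112.7263401,17z/data=!4m8!3m7"
    "!1s0x2dd7fbd1cb925a1d:0x1dbecb0b2e9b059f"
    "!8m2!3d-7.3111665!4d112.728915"
)
print(extract_data_id(url))
```

If the URL does not contain a matching segment, the helper returns `None`, so the caller can fall back to copying the ID by hand.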

In [1]:
# import the necessary libraries
from serpapi import GoogleSearch
from urllib.parse import urlsplit, parse_qsl
import json
import pandas as pd
import os
from dotenv import load_dotenv

# Load environment variables from .env file
load_dotenv()

# Get SERPAPI_KEY from environment variables
SERPAPI_KEY = os.getenv("SERPAPI_KEY")
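
`os.getenv` returns `None` when the variable is missing, so a forgotten `.env` entry would only surface later as an API error. A small guard, sketched here with a hypothetical helper named `require_env` (not part of the original notebook), fails early with a clearer message.

```python
import os


def require_env(name):
    """Return the value of environment variable `name`; raise if unset or empty."""
    value = os.getenv(name)
    if not value:
        raise RuntimeError(
            f"{name} is not set; add it to your .env file or shell environment."
        )
    return value
```

In the cell above, this could replace the direct lookup with `SERPAPI_KEY = require_env("SERPAPI_KEY")`, called after `load_dotenv()`.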

In [2]:
# define the data id
# (place id found inside the `data=` segment of the Google Maps place URL)
data_id = "0x2dd7fbd1cb925a1d:0x1dbecb0b2e9b059f"

# define the output directory and file name
output_dir = "../dataset"
output_name = "data_ITTS.csv"

# define the search parameters
params = {
    "api_key": SERPAPI_KEY,           # SerpApi API key, loaded from the environment
    "engine": "google_maps_reviews",  # SerpApi search engine
    "hl": "id",                       # language of the results (Indonesian)
    "data_id": data_id,               # place to fetch reviews for
}

# search for the reviews
search = GoogleSearch(params)

# create an empty list to store the reviews
reviews = []

# loop through the pages to extract the reviews
page_num = 0
max_retries = 3  # stop after this many consecutive connection errors
retries = 0
while True:
    try:
        results = search.get_dict()
    except ConnectionError as e:
        retries += 1
        print(f"Connection error: {e}")
        if retries >= max_retries:
            break
        continue  # retry the same page without advancing the page counter
    retries = 0

    # stop on an API-level error (for example an invalid key or no results)
    if "error" in results:
        print(results["error"])
        break

    page_num += 1
    print(f"Extracting reviews from {page_num} page.")

    for result in results.get("reviews", []):  # empty list if the place has no reviews
        user = result.get("user") or {}        # guard against a missing "user" object
        reviews.append({
            "page": page_num,
            "name": user.get("name"),
            "link": user.get("link"),
            "thumbnail": user.get("thumbnail"),
            "rating": result.get("rating"),
            "date": result.get("date"),
            "snippet": result.get("snippet"),
            "images": result.get("images"),
            "local_guide": user.get("local_guide"),
            # other data
        })

    serpapi_pagination = results.get("serpapi_pagination")
    if serpapi_pagination and serpapi_pagination.get("next") and serpapi_pagination.get("next_page_token"):
        # parse the query string of the "next" URL into a dict and merge it into
        # the search parameters, so the next get_dict() call fetches the next page
        search.params_dict.update(dict(parse_qsl(urlsplit(serpapi_pagination["next"]).query)))
    else:
        break

    
print(json.dumps(reviews, indent=2, ensure_ascii=False))
df = pd.DataFrame(reviews)
# Ensure the directory exists
os.makedirs(output_dir, exist_ok=True)

# Save the dataframe to a CSV file
df.to_csv(os.path.join(output_dir, output_name), index=False)
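
The pagination step in the loop turns the `next` URL into a dictionary of query parameters and merges it into the current search parameters. The sketch below shows that transformation in isolation. The `next_url` value and its token are made up for illustration and are only assumed to resemble what `serpapi_pagination["next"]` contains.

```python
from urllib.parse import urlsplit, parse_qsl

# Hypothetical "next" URL; the token is a placeholder, not a real value.
next_url = (
    "https://serpapi.com/search.json?engine=google_maps_reviews"
    "&data_id=0x2dd7fbd1cb925a1d%3A0x1dbecb0b2e9b059f"
    "&hl=id&next_page_token=EXAMPLE_TOKEN"
)

# Parameters of the current request (placeholder API key).
current_params = {
    "api_key": "YOUR_KEY",
    "engine": "google_maps_reviews",
    "hl": "id",
}

# urlsplit isolates the query string; parse_qsl decodes it into (key, value) pairs.
next_params = dict(parse_qsl(urlsplit(next_url).query))

# Merging keeps the API key and adds the page token for the following request.
current_params.update(next_params)
print(current_params)
```

Because `update` only overwrites keys present in the `next` URL, the API key from the original parameters carries over to every subsequent page request.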

Extracting reviews from 1 page.
Extracting reviews from 2 page.
Extracting reviews from 3 page.
Extracting reviews from 4 page.
Extracting reviews from 5 page.
Extracting reviews from 6 page.
Extracting reviews from 7 page.
Extracting reviews from 8 page.
Extracting reviews from 9 page.
Extracting reviews from 10 page.
Extracting reviews from 11 page.
Extracting reviews from 12 page.
Extracting reviews from 13 page.
Extracting reviews from 14 page.
Extracting reviews from 15 page.
Extracting reviews from 16 page.
Extracting reviews from 17 page.
Extracting reviews from 18 page.
Extracting reviews from 19 page.
Extracting reviews from 20 page.
Extracting reviews from 21 page.
Extracting reviews from 22 page.
Extracting reviews from 23 page.
Extracting reviews from 24 page.
Extracting reviews from 25 page.
Extracting reviews from 26 page.
Extracting reviews from 27 page.
Extracting reviews from 28 page.
Extracting reviews from 29 page.
Extracting reviews from 30 page.
Extracting reviews 