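# Dynamic Web Scraping: Collecting Google Maps Store Reviews with Selenium and BeautifulSoup

## Preface
This article uses Selenium and BeautifulSoup to automatically scrape customer ratings, review text, and related data for a Costco store on Google Maps.

### Outline
1. Open the reviews page
2. Choose the review sort order
3. Extract the review data (reviewer name, review time, review text, star rating, number of likes)
4. Load and save the data

### Import the required tools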

In [1]:
import os
import time
import pandas as pd
from itertools import zip_longest
from bs4 import BeautifulSoup as Soup
from selenium.webdriver import Chrome
from selenium.webdriver.common.by import By

### Search for Costco reviews and sort them

In [2]:
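# Open the reviews page
# Selenium 4 expects the driver path in a Service object; passing it positionally is deprecated
from selenium.webdriver.chrome.service import Service

driver = Chrome(service=Service('./chromedriver'))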
url = 'https://www.google.com/maps/place/%E5%A5%BD%E5%B8%82%E5%A4%9A+%E5%8F%B0%E4%B8%AD%E5%BA%97/@24.1327624,120.6468621,17z/data=!4m7!3m6!1s0x34693db15b74e261:0x400cf6c2b8dac047!8m2!3d24.1327624!4d120.6490508!9m1!1b1'

driver.get(url)
time.sleep(5)

# Sort the reviews by time (newest first)

## "Sort" button
driver.find_element(by = By.XPATH, value = '//*[@id="QA0Szd"]/div/div/div[1]/div[2]/div/div[1]/div/div/div[2]/div[7]/div[2]/button').click()
time.sleep(5)

## "Newest" section
driver.find_element(by = By.XPATH, value = '//*[@id="action-menu"]/div[2]').click()
time.sleep(5)
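
The place URL opened above carries the store name in percent-encoded form, so the name can be read back from the URL rather than typed by hand. The following is a minimal sketch, assuming the name always sits in the path segment right after `/place/` and that `+` stands for a space. `store_name_from_url` is a hypothetical helper, and the example URL is a shortened copy of the one used in this article.

```python
from urllib.parse import unquote_plus, urlparse


def store_name_from_url(url):
    """Return the decoded place name from a Google Maps place URL.

    Assumes the path looks like /maps/place/<encoded name>/...; this layout
    is inferred from the URL used in this article, not from a documented format.
    """
    segments = urlparse(url).path.split('/')
    # The segment right after "place" holds the percent-encoded name
    encoded_name = segments[segments.index('place') + 1]
    return unquote_plus(encoded_name)


# Shortened copy of the article's URL (the data=... part is dropped)
example_url = ('https://www.google.com/maps/place/'
               '%E5%A5%BD%E5%B8%82%E5%A4%9A+%E5%8F%B0%E4%B8%AD%E5%BA%97/'
               '@24.1327624,120.6468621,17z')
print(store_name_from_url(example_url))
```

The decoded value could then replace a hard-coded store name when the same notebook is reused for another store.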




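### Load more reviews
After sorting from newest to oldest, only the first ten or so reviews are loaded by default. We therefore scroll the reviews pane several times so that roughly the first 50 reviews are loaded.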

In [3]:
# Scroll the reviews pane down to load more reviews

# Scroll five times, which is enough to load roughly 50 reviews
for _ in range(5):
    pane = driver.find_element(by = By.XPATH, value = '//*[@id="QA0Szd"]/div/div/div[1]/div[2]/div/div[1]/div/div/div[2]')
    driver.execute_script("arguments[0].scrollTop = arguments[0].scrollHeight", pane)

    # Wait for the next batch of reviews to load
    time.sleep(5)
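
A fixed number of scrolls can load too few or too many reviews, depending on how many the page returns per scroll. The sketch below is one possible alternative: it keeps scrolling until a target count is reached or no new reviews appear. `scroll_until_loaded` and its parameters are hypothetical names introduced here; the only Selenium call it relies on is `driver.execute_script`, which the cell above already uses.

```python
import time


def scroll_until_loaded(driver, pane, count_reviews, target=50,
                        max_scrolls=20, pause=5):
    """Scroll `pane` until `count_reviews()` reaches `target` or stops growing.

    `driver` only needs an `execute_script` method, as Selenium drivers have.
    `count_reviews` is a caller-supplied function (an assumption of this
    sketch) that returns how many review cards are currently on the page.
    """
    loaded = count_reviews()
    for _ in range(max_scrolls):
        if loaded >= target:
            break
        driver.execute_script(
            "arguments[0].scrollTop = arguments[0].scrollHeight", pane)
        time.sleep(pause)  # give the page time to fetch the next batch
        now_loaded = count_reviews()
        if now_loaded == loaded:
            break  # nothing new appeared, so the end of the list was reached
        loaded = now_loaded
    return loaded
```

With the real driver, `count_reviews` could be something like `lambda: len(driver.find_elements(by=By.CLASS_NAME, value='jftiEf'))`, assuming each review card carries the `jftiEf` class that the like-count step of this article looks for.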


In [4]:
item = driver.find_elements(by = By.XPATH, value = '//*[@id="QA0Szd"]/div/div/div[1]/div[2]/div/div[1]/div/div/div[2]/div[9]')
time.sleep(3)

username_list = []
comment_time_list = []
review_list = []
rate_list = []
likes_list = []

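# Expand truncated reviews by clicking every "全文" ("full text") button
for i in item:

    button = i.find_elements(by = By.TAG_NAME, value = 'button')
    for m in button:
        if m.text == "全文":
            m.click()
    time.sleep(5)
    
    # Collect the review fields
    username = i.find_elements(by = By.CLASS_NAME, value = 'd4r55')
    comment_time = i.find_elements(by = By.CLASS_NAME, value = 'rsqaWe')
    review = i.find_elements(by = By.CLASS_NAME, value = 'wiI7pd')
    rate = i.find_elements(by = By.CLASS_NAME, value = 'kvMYJc')
    
    # zip_longest pads the shorter lists with None (for example when a review has no text),
    # so guard every field instead of calling .text on None.
    # Padding only prevents a crash: rows can still be misaligned if a field is missing mid-list.
    for j,k,l,p in zip_longest(username,comment_time,review,rate):
        username_list.append(j.text if j is not None else '')
        comment_time_list.append(k.text if k is not None else '')
        review_list.append(l.text if l is not None else '')
        # Remove the "顆星" ("stars") suffix from the aria-label so only the number remains
        label = (p.get_attribute("aria-label") or '') if p is not None else ''
        rate_list.append(label.strip().strip("顆星").strip())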
        

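# Parse the rendered page with BeautifulSoup to read each review's like count
# ('html.parser' names Python's built-in parser explicitly)
soup = Soup(driver.page_source, 'html.parser')
cards = soup.find_all(class_ = 'jftiEf fontBodyMedium')
for s in cards:
    likes_tag = s.find(class_ = "pkWtMe")
    # Reviews without likes have no counter element, so record '0'
    likes_list.append(likes_tag.text if likes_tag is not None else '0')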
    
time.sleep(5)
# quit() ends the whole browser session, not just the current window
driver.quit()
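
Stripping the characters "顆星" from the `aria-label` only works while Google Maps is displayed in Traditional Chinese. As a hedged alternative, the rating could be read as the first integer in the label, which does not depend on the interface language. `parse_rating` is a hypothetical helper, and the label format ("4 顆星") is an assumption inferred from the `strip` call in the cell above.

```python
import re


def parse_rating(aria_label):
    """Return the star rating as an int, or None if no number is found.

    Assumes the rating is the first integer in the label, e.g. "4 顆星".
    """
    match = re.search(r'\d+', aria_label or '')
    return int(match.group()) if match else None


print(parse_rating(' 4 顆星 '))
print(parse_rating(None))
```

Returning `None` for a missing label keeps the row in place, so the other columns stay aligned and the gap can be handled later in pandas.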

### Load and save the reviews

In [6]:
d1 = pd.DataFrame(
    {'store_name': '好市多 台中店',
     'username': username_list,
     'comment_time': comment_time_list,
     'review': review_list,
     'rate': rate_list,
     'likes': likes_list})  

d1.to_csv('g_review.csv',index = False)
d1.head()

| | store_name | username | comment_time | review | rate | likes |
|---|---|---|---|---|---|---|
| 0 | 好市多 台中店 | Felicia Peng | 5 個月前 | 週五晚上人也好多\n停車場停超滿滿滿～～\n每一條走道都有很多人😅 … | 4 | 0 |
| 1 | 好市多 台中店 | May Zhou | 11 個月前 | 很方便的美式賣場，自從北屯店開了之後，終於分散了一些人潮。 … | 5 | 8 |
| 2 | 好市多 台中店 | 紅晨星 | 2 個月前 | 會員制，每年年費1350元，如果晉升黑卡每筆消費還優惠2%，以各式生鮮商品還有紅白酒最超值，... | 5 | 0 |
| 3 | 好市多 台中店 | Jimmy Chen | 2 個月前 | 位於台中的好市多一店，吸引了許多彰化員林的人前往購買，消費力不容小覷。汽車時常需要繞好幾圈才... | 4 | 2 |
| 4 | 好市多 台中店 | 坤澤 _Cliff | 6 個月前 | 停車目前不用收費，賣場很大很好逛，東西都是家庭號的分量，除下來還是划算，目前商場裡面有試吃，... | 5 | 0 |
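
`pd.DataFrame` raises a `ValueError` when the lists passed to it differ in length. That can happen here, because the names, times, texts, and ratings are collected with Selenium while the like counts are collected separately with BeautifulSoup. A small sketch of a length check and padding step is shown below; `pad_columns` is a hypothetical helper and the sample lists are made up for illustration.

```python
def pad_columns(columns, fill=''):
    """Return a copy of `columns` with every list padded to the longest length."""
    longest = max((len(values) for values in columns.values()), default=0)
    return {name: list(values) + [fill] * (longest - len(values))
            for name, values in columns.items()}


# Made-up sample data: one like count is missing
columns = {'username': ['A', 'B', 'C'], 'likes': ['0', '8']}

# Inspect the lengths first; a mismatch usually means a selector missed some reviews
lengths = {name: len(values) for name, values in columns.items()}
print(lengths)

padded = pad_columns(columns)
print(padded)
```

Padding lets the DataFrame be built, but it hides the mismatch rather than explaining it, so printing the lengths first is the more informative step.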
