# Day29
## Getting Started with Dynamic Web Scraping: Selenium
- Introduce Selenium and its applications
- Understand how Selenium is used for scraping dynamic pages



## Assignment
Practice using Selenium to scrape the links in a category list
- Target site: https://channel.jd.com/outdoor.html

Goal:
- Get the names of all the subcategories currently listed

![](https://i.imgur.com/2aCtZM5.png)

Hint:
- Refer to the Day29 lecture notes
- Remember to install the Chrome browser first
- You will also use the XPath covered in Day20

### Import packages

In [1]:

import time

import numpy as np
from fake_useragent import UserAgent
from selenium import webdriver
from selenium.webdriver.common.by import By
from tqdm import tqdm

### Generate a User-Agent with fake-useragent

In [2]:
# Set a random user agent
opt = webdriver.ChromeOptions()
ua = UserAgent()
opt.add_argument(f'--user-agent={ua.random}')
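The cell above only passes Chrome's `--user-agent=...` command-line flag, with `ua.random` supplying a random real-world User-Agent string. If `fake_useragent` is unavailable, a minimal stand-in could pick from a hand-maintained list. The helper name and the UA strings below are hypothetical examples (they will go stale), not part of the original notebook.

```python
import random

# Hypothetical fallback pool; example strings only, refresh them periodically.
FALLBACK_USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36",
]


def user_agent_argument(rng=random):
    """Build the Chrome flag that overrides the User-Agent header (hypothetical helper)."""
    return f"--user-agent={rng.choice(FALLBACK_USER_AGENTS)}"


arg = user_agent_argument()
print(arg)
```

It would be used the same way as the original: `opt.add_argument(user_agent_argument())`.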

### Check that the webdriver starts and opens the target URL


In [3]:
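# Start the webdriver and load the target URL
base_url = 'https://channel.jd.com/outdoor.html'

# Connect to a remote Selenium server reachable at the host name "selenium" (e.g. a container on the same network)
# To drive a locally installed Chrome instead: driver = webdriver.Chrome(options=opt)
driver = webdriver.Remote(command_executor='http://selenium:4444/wd/hub', options=opt)
driver.maximize_window()
driver.get(base_url)
# Wait a random 3-5 seconds so the JavaScript-rendered content has time to load
time.sleep(np.random.uniform(3, 5))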
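A fixed `time.sleep(np.random.uniform(3, 5))` either wastes time or, on a slow connection, is too short. An alternative is an explicit wait that polls until the content appears. With Selenium this would be `WebDriverWait(driver, 10).until(EC.presence_of_element_located((By.XPATH, '//div[@id="Categorys"]')))`, using `WebDriverWait` from `selenium.webdriver.support.ui` and `expected_conditions as EC` from `selenium.webdriver.support`. The stdlib-only sketch below shows the same polling idea. `wait_until` and `page_ready` are hypothetical names.

```python
import time


def wait_until(condition, timeout=10.0, poll=0.5):
    """Call `condition` until it returns a truthy value or `timeout` seconds pass.

    Returns that truthy value; raises TimeoutError if the deadline is reached first.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} s")
        time.sleep(poll)


# Demo: a stand-in condition that becomes true on its third check
checks = {"n": 0}


def page_ready():
    checks["n"] += 1
    return checks["n"] >= 3


print(wait_until(page_ready, timeout=2, poll=0.01), checks["n"])  # True 3
```

The wait returns as soon as the condition holds, so a fast page costs almost no waiting. The timeout still bounds the worst case.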

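### Extract the data
- Practice Selenium operations
- Make use of methods such as `WebElement.find_elements(By.XPATH, ...)`, `WebElement.text`, and `WebElement.get_attribute()` (the older `find_elements_by_xpath()` style was removed in later Selenium 4 releases)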

In [4]:
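# Inspecting the page shows two patterns:
# 1. Large-font categories: <div id="Categorys">...<dt>"戶外鞋服"</dt>   (the site itself misspells the plural as "Categorys")
# 2. Small-font subcategories: <div id="Categorys">...<dd>...<a>"衝鋒衣褲"</a>

# Get <div id="Categorys">
categories = driver.find_element(By.XPATH, '//div[@id="Categorys"]')

# Collect every large-font category together with its subcategories
# Use relative paths starting with "."; a bare "//" would search the whole page instead of the current element
cate_names = []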

for ele_group in tqdm(categories.find_elements(By.XPATH, './/dl[@class="item-inner"]')):
    medium_cate = ele_group.find_element(By.XPATH, './/dt').text
    for small_cate in ele_group.find_elements(By.XPATH, './/a'):
        cate_names.append((medium_cate, small_cate.text, small_cate.get_attribute('href')))

100%|██████████| 5/5 [00:01<00:00,  3.54it/s]
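The nested loop above works because each inner search is relative to one `<dl>` group, so `.//a` returns only that group's links. The sketch below reproduces the pattern offline with the standard library's `xml.etree.ElementTree`, which supports a limited XPath subset. The markup is a hypothetical, simplified stand-in for the real page. The category names and links are taken from the results table.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified markup mimicking the structure described above;
# the real JD page has many more attributes and nesting levels.
SAMPLE = """
<div id="Categorys">
  <dl class="item-inner">
    <dt>户外鞋服</dt>
    <dd><a href="https://list.jd.com/list.html?cat=1318,2628,12128">抓绒衣裤</a>
        <a href="https://list.jd.com/list.html?cat=1318,2628,12126">羽绒服棉服</a></dd>
  </dl>
  <dl class="item-inner">
    <dt>户外装备</dt>
    <dd><a href="https://list.jd.com/list.html?cat=1318,1462,1473">帐篷</a></dd>
  </dl>
</div>
"""


def extract_categories(markup):
    root = ET.fromstring(markup.strip())  # root is the <div id="Categorys"> element
    rows = []
    for group in root.findall(".//dl[@class='item-inner']"):
        mid = group.find(".//dt").text
        # ".//a" is relative to `group`, so only this group's links are returned
        for link in group.findall(".//a"):
            rows.append((mid, link.text, link.get("href")))
    return rows


rows = extract_categories(SAMPLE)
for row in rows:
    print(row)
```

Searching `.//a` from the root instead would return all three links at once and lose the link between each link and its `<dt>` heading.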


In [5]:
# Close the browser and end the remote session
driver.quit()
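Because `webdriver.Remote` holds a browser session on the Selenium server, an exception in the scraping cell would skip `driver.quit()` and leave that session open until the server times it out. Wrapping the work in `try`/`finally` guarantees the cleanup. The sketch below uses a hypothetical `StandInDriver` class in place of the real driver so it can run without a browser.

```python
class StandInDriver:
    """Hypothetical stand-in for webdriver.Remote that records whether quit() ran."""

    def __init__(self):
        self.closed = False

    def get(self, url):
        raise RuntimeError(f"simulated failure loading {url}")

    def quit(self):
        self.closed = True


driver = StandInDriver()
try:
    driver.get("https://channel.jd.com/outdoor.html")
except RuntimeError as exc:
    print("scrape failed:", exc)
finally:
    # Runs on success and on error, so the browser session is always released
    driver.quit()

print(driver.closed)  # True
```

With the real driver, the scraping loop would go inside the `try` block and `driver.quit()` inside `finally`.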

In [6]:
import pandas as pd

# Tabulate the (mid-level category, subcategory, link) tuples
pd.DataFrame(cate_names, columns=["中類別","小類別","類別頁連結"])


| | 中類別 (mid-level category) | 小類別 (subcategory) | 類別頁連結 (category page link) |
|---|---|---|---|
| 0 | 户外鞋服 | 冲锋衣裤 | https://list.jd.com/list.html?cat=1318%2C2628%... |
| 1 | 户外鞋服 | 徒步鞋 | https://list.jd.com/list.html?cat=1318%2C2628%... |
| 2 | 户外鞋服 | 抓绒衣裤 | https://list.jd.com/list.html?cat=1318,2628,12128 |
| 3 | 户外鞋服 | 羽绒服棉服 | https://list.jd.com/list.html?cat=1318,2628,12126 |
| 4 | 户外鞋服 | 越野跑鞋 | https://list.jd.com/list.html?cat=1318,2628,12137 |
| 5 | 户外鞋服 | 软壳 | https://list.jd.com/list.html?cat=1318,2628,12129 |
| 6 | 户外鞋服 | 登山鞋 | https://list.jd.com/list.html?cat=1318,2628,12134 |
| 7 | 户外鞋服 | 休闲鞋 | https://list.jd.com/list.html?cat=1318,2628,12138 |
| 8 | 户外装备 | 帐篷 | https://list.jd.com/list.html?cat=1318,1462,1473 |
| 9 | 户外装备 | 照明 | https://list.jd.com/list.html?cat=1318,1462,1476 |
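Some links in the table show `cat=1318%2C2628%...` and others show `cat=1318,2628,...`. `%2C` is the percent-encoded comma, so both forms carry the same comma-separated ID list. In the rows shown, the first two IDs appear to follow the mid-level category. The sketch below normalizes either form with `urllib.parse`. `category_ids` is a hypothetical helper. The encoded URL is built from a full link in the table, not recovered from the truncated ones.

```python
from urllib.parse import parse_qs, quote, urlsplit


def category_ids(url):
    """Return the IDs in the `cat` query parameter as a list of ints (hypothetical helper)."""
    query = parse_qs(urlsplit(url).query)
    # parse_qs percent-decodes values, so "%2C" becomes ","
    return [int(part) for part in query["cat"][0].split(",")]


plain = "https://list.jd.com/list.html?cat=1318,2628,12128"
# Illustrative: the same link with its commas percent-encoded
encoded = "https://list.jd.com/list.html?cat=" + quote("1318,2628,12128", safe="")
print(encoded)  # https://list.jd.com/list.html?cat=1318%2C2628%2C12128
print(category_ids(plain), category_ids(encoded))  # [1318, 2628, 12128] [1318, 2628, 12128]
```

Comparing parsed ID lists rather than raw strings avoids treating the two spellings of one category page as different links, for example when de-duplicating results.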
