# selenium
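Selenium is suitable for scraping a small number of pages and a small amount of data.

Its advantage is simplicity: you do not need to work out which request produces the data.

Its disadvantage is speed: scraping a large number of pages and a lot of data with it is painfully slow.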

https://selenium-python.readthedocs.io/

In [3]:
# !pip install selenium
!pip freeze | grep selenium

selenium==3.141.0
snapshot-selenium==0.0.2


## Drivers
Selenium requires a driver to interface with the chosen browser. 

Make sure it’s in your PATH, e.g., place it in /usr/bin or /usr/local/bin.

Chrome:	https://sites.google.com/a/chromium.org/chromedriver/downloads

# Getting Started
https://selenium-python.readthedocs.io/getting-started.html

## Simple Usage

In [4]:
from selenium import webdriver
from selenium.webdriver.common.keys import Keys

In [5]:
driver = webdriver.Chrome()

In [6]:
driver.get("http://www.python.org")

In [7]:
driver.title

'Welcome to Python.org'

In [8]:
assert "Python" in driver.title

In [9]:
elem = driver.find_element_by_name("q")
elem

<selenium.webdriver.remote.webelement.WebElement (session="cf207d82035ffd892c4c441bee023c7b", element="6e0b9d6b-a220-4728-b241-1f4e1e81517c")>

In [10]:
elem.text

''

In [11]:
elem.clear()

In [12]:
elem.send_keys("pycon")

In [13]:
elem.send_keys(Keys.RETURN)

In [15]:
driver.page_source



In [16]:
driver.page_source.find('No results found.')

-1

In [14]:
assert "No results found." not in driver.page_source

In [17]:
driver.close()

## Using Selenium to write tests

In [19]:
!python test_python_org_search.py

.
----------------------------------------------------------------------
Ran 1 test in 4.883s

OK


## Navigating
https://selenium-python.readthedocs.io/navigating.html

In [None]:
driver.get("http://www.google.com")

### Interacting with the page

Given an element defined as:

<input type="text" name="passwd" id="passwd-id" />

In [None]:
# Any one of these calls locates the <input> element defined above:
element = driver.find_element_by_id("passwd-id")
element = driver.find_element_by_name("passwd")
element = driver.find_element_by_xpath("//input[@id='passwd-id']")

In [None]:
# you may want to enter some text into a text field:
element.send_keys("some text")

In [None]:
# You can simulate pressing the arrow keys by using the “Keys” class:
element.send_keys(" and some", Keys.ARROW_DOWN)

It is possible to call send_keys on any element, which makes it possible to test keyboard shortcuts such as those used on Gmail. A side effect of this is that typing something into a text field won’t automatically clear it. Instead, what you type will be appended to what’s already there.

In [None]:
# You can easily clear the contents of a text field or textarea with the clear method:

element.clear()
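
Because send_keys appends rather than replaces, a small helper that clears the field first can be handy. The following is a minimal sketch, not part of Selenium: `replace_text` and `FakeField` are hypothetical names, and `FakeField` is only a stand-in that mimics the append behavior so the example runs without a browser. With a real driver you would pass a WebElement instead.

```python
def replace_text(element, text):
    """Clear a text field, then type `text` into it.

    `element` is anything with clear() and send_keys() methods,
    such as a Selenium WebElement.
    """
    element.clear()          # remove whatever is already in the field
    element.send_keys(text)  # send_keys appends, so clear first


class FakeField:
    """Hypothetical stand-in for a text input; send_keys appends like a real field."""

    def __init__(self, value=""):
        self.value = value

    def clear(self):
        self.value = ""

    def send_keys(self, text):
        self.value += text


field = FakeField("old text")
field.send_keys(" and some")      # appended to the existing content
appended = field.value
replace_text(field, "some text")  # existing content is discarded first
replaced = field.value
```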

### Filling in forms

## Locating Elements

* find_element_by_id
* find_element_by_name
* find_element_by_xpath
* find_element_by_link_text
* find_element_by_partial_link_text
* find_element_by_tag_name
* find_element_by_class_name
* find_element_by_css_selector

To find multiple elements (these methods will return a list):

* find_elements_by_name
* find_elements_by_xpath
* find_elements_by_link_text
* find_elements_by_partial_link_text
* find_elements_by_tag_name
* find_elements_by_class_name
* find_elements_by_css_selector

## Waits
Most web apps today use AJAX techniques. When a page is loaded by the browser, the elements within that page may load at different time intervals. This makes locating elements difficult: if an element is not yet present in the DOM, a locate function will raise a NoSuchElementException (an element that is present but not yet visible can instead raise ElementNotVisibleException when you interact with it). Using waits, we can solve this issue. Waiting provides some slack between actions performed, mostly locating an element or any other operation with the element.

Selenium WebDriver provides two types of waits: implicit and explicit. An explicit wait makes WebDriver wait for a certain condition to occur before proceeding further with execution. An implicit wait makes WebDriver poll the DOM for a certain amount of time when trying to locate an element.



### Explicit Waits
An explicit wait is code you define to wait for a certain condition to occur before proceeding further in the code. The extreme case of this is time.sleep(), which sets the condition to an exact time period to wait. There are some convenience methods provided that help you write code that will wait only as long as required. WebDriverWait in combination with ExpectedCondition is one way this can be accomplished.

In [None]:
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

driver = webdriver.Firefox()
driver.get("http://somedomain/url_that_delays_loading")
try:
    element = WebDriverWait(driver, 10).until(
        EC.presence_of_element_located((By.ID, "myDynamicElement"))
    )
finally:
    driver.quit()

This waits up to 10 seconds for the element and returns it as soon as it is found; if it is not found within 10 seconds, a TimeoutException is thrown. By default, WebDriverWait calls the ExpectedCondition every 500 milliseconds until it returns successfully. A successful return is True for Boolean ExpectedCondition types, or a non-null value for all other ExpectedCondition types.
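
The polling behavior described above can be illustrated with a small pure-Python loop. This is a sketch of the idea only, not Selenium's implementation: `wait_until` and `WaitTimeout` are hypothetical names, and the real WebDriverWait additionally passes the driver to the condition and ignores NoSuchElementException while polling.

```python
import time


class WaitTimeout(Exception):
    """Raised when the condition never becomes truthy (hypothetical name)."""


def wait_until(condition, timeout=10.0, poll_interval=0.5):
    """Call condition() repeatedly until it returns a truthy value.

    Returns that value, or raises WaitTimeout once `timeout` seconds pass.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:  # True, or any non-None, non-empty value
            return result
        if time.monotonic() >= deadline:
            raise WaitTimeout(f"condition not met within {timeout} seconds")
        time.sleep(poll_interval)


# A condition that only succeeds on its third call, like an element
# that appears in the DOM a little while after the page loads.
calls = []


def ready_on_third_call():
    calls.append(1)
    return "element" if len(calls) >= 3 else None


result = wait_until(ready_on_third_call, timeout=2.0, poll_interval=0.01)
```

The loop returns as soon as the condition succeeds, which is why an explicit wait usually finishes well before its timeout, unlike a fixed time.sleep().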

#### Expected Conditions

Some conditions come up frequently when automating web browsers. Their names are listed below. The Selenium Python bindings provide them as convenience methods so you don’t have to code an expected_condition class yourself or create your own utility package for them.

* title_is
* title_contains
* presence_of_element_located
* visibility_of_element_located
* visibility_of
* presence_of_all_elements_located
* text_to_be_present_in_element
* text_to_be_present_in_element_value
* frame_to_be_available_and_switch_to_it
* invisibility_of_element_located
* element_to_be_clickable
* staleness_of
* element_to_be_selected
* element_located_to_be_selected
* element_selection_state_to_be
* element_located_selection_state_to_be
* alert_is_present
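
When none of the built-in conditions fits, a condition can also be any callable that takes the driver and returns a truthy value on success. The following is a minimal sketch: `title_starts_with` and `StubDriver` are hypothetical names, and the stub only provides a `title` attribute so the example runs without a browser.

```python
class title_starts_with:
    """Custom condition: truthy once the page title starts with `prefix`."""

    def __init__(self, prefix):
        self.prefix = prefix

    def __call__(self, driver):
        # WebDriverWait calls the condition with the driver on every poll.
        return driver.title.startswith(self.prefix)


class StubDriver:
    """Hypothetical stand-in for a WebDriver; only exposes a title."""

    title = "Welcome to Python.org"


matches = title_starts_with("Welcome")(StubDriver())
no_match = title_starts_with("Goodbye")(StubDriver())

# With a real driver, the condition would be used like a built-in one:
# WebDriverWait(driver, 10).until(title_starts_with("Welcome"))
```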

### Implicit Waits
An implicit wait tells WebDriver to poll the DOM for a certain amount of time when trying to find any element (or elements) not immediately available. The default setting is 0. Once set, the implicit wait is set for the life of the WebDriver object.



In [None]:
from selenium import webdriver

driver = webdriver.Firefox()
driver.implicitly_wait(10) # seconds
driver.get("http://somedomain/url_that_delays_loading")
myDynamicElement = driver.find_element_by_id("myDynamicElement")