# Selenium vs Normal Request

On the sample website, the port numbers of the proxy IPs are loaded dynamically after the page itself has loaded. Thus, if we fetch the HTML with a normal request and parse the port numbers from it, we get wrong values, as shown below.

### Normal Request

In [26]:
# coding : UTF-8
import json
from bs4 import BeautifulSoup
from utils import request_util  # project-local helper that fetches a page's HTML

In [31]:
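# Fetch the HTML as served (no scripts are executed) and parse it
soup = BeautifulSoup(request_util.get_html("http://www.goubanjia.com/"), "lxml")
td_list = soup.select("table.table-hover tbody td.ip")
for td in td_list:
    ip = ""
    # Join the text of all but the last descendant tag, skipping <p> tags;
    # the port is read separately below
    for child in td.find_all()[:-1]:
        if child.name != "p":
            ip += child.text
    port = td.select("span.port")[0].text
    cmd = "http_proxy=%s:%s curl ip.gs" % (ip, port)
    print(cmd)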

http_proxy=185.136.158.14:8638 curl ip.gs
http_proxy=206.189.66.204:8486 curl ip.gs
http_proxy=118.24.185.22:8800 curl ip.gs
http_proxy=128.199.222.51:8676 curl ip.gs
http_proxy=47.104.172.227:8167 curl ip.gs
http_proxy=178.128.91.23:8912 curl ip.gs
http_proxy=144.208.88.204:8381 curl ip.gs
http_proxy=195.235.204.60:8904 curl ip.gs
http_proxy=64.19.101.22:8670 curl ip.gs
http_proxy=195.189.60.23:8269 curl ip.gs
http_proxy=39.137.107.34:8865 curl ip.gs
http_proxy=185.136.157.235:8351 curl ip.gs
http_proxy=60.255.186.169:8605 curl ip.gs
http_proxy=103.108.140.216:8702 curl ip.gs
http_proxy=202.155.82.222:8867 curl ip.gs
http_proxy=188.235.152.114:8416 curl ip.gs
http_proxy=35.224.248.29:8105 curl ip.gs
http_proxy=69.51.6.201:9058 curl ip.gs
http_proxy=41.164.31.154:9040 curl ip.gs
http_proxy=194.44.246.83:8973 curl ip.gs


### Selenium

Using Selenium, we can mimic the behavior of a real browser. In this case, we wait up to 3 seconds to let the website load the correct port numbers, then parse them and get the correct values. Notice how the port numbers in the results below differ from those above.
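
The waiting step relies on polling: `WebDriverWait(...).until(...)` calls a condition repeatedly until it returns a truthy value or the timeout expires. The sketch below reproduces that idea with the standard library only. `wait_until` and `FakePage` are hypothetical names made up for this illustration, not part of Selenium, and the fake page only simulates a port value that changes shortly after load (using the first row of the outputs in this notebook as sample values).

```python
import time


class FakePage:
    """Hypothetical stand-in for a page whose port is rewritten after load."""

    def __init__(self, placeholder, real_port, delay):
        self._placeholder = placeholder
        self._real_port = real_port
        self._ready_at = time.monotonic() + delay

    def port_text(self):
        # Before the delay has passed, only the placeholder is visible.
        if time.monotonic() < self._ready_at:
            return self._placeholder
        return self._real_port


def wait_until(condition, timeout=3.0, poll=0.1):
    """Call condition() until it returns a truthy value or timeout seconds pass."""
    deadline = time.monotonic() + timeout
    while True:
        value = condition()
        if value:
            return value
        if time.monotonic() >= deadline:
            raise TimeoutError("condition not met within %.1f s" % timeout)
        time.sleep(poll)


page = FakePage(placeholder="8638", real_port="1080", delay=0.3)
first_read = page.port_text()  # read immediately, like a plain request

# Wait until the visible port differs from the value seen at load time.
port = wait_until(lambda: page.port_text() != first_read and page.port_text())
print(first_read, "->", port)
```

Waiting on a condition rather than sleeping for a fixed time returns as soon as the value changes and fails loudly if it never does.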

In [28]:
from selenium import webdriver
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.common.exceptions import TimeoutException

In [29]:
browser = webdriver.Chrome('/Users/frank/Desktop/crawler/chromedriver')
browser.get("http://www.goubanjia.com")
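# Block until at least one port cell is present (at most 3 seconds);
# constructing WebDriverWait without calling until() does not wait at all
WebDriverWait(browser, 3).until(lambda d: d.find_elements_by_css_selector("span.port"))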

In [30]:
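td_list = browser.find_elements_by_css_selector("table.table-hover tbody td.ip")
for td in td_list:
    ip = ""
    # Same extraction as in the normal request, but against the rendered DOM:
    # join all but the last descendant tag, skipping <p> tags
    for child in td.find_elements_by_css_selector("*")[:-1]:
        if child.tag_name != "p":
            ip += child.text
    port = td.find_elements_by_css_selector("span.port")[0].text
    cmd = "http_proxy=%s:%s curl ip.gs" % (ip, port)
    print(cmd)

# quit() ends the whole session, including the chromedriver process
browser.quit()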

http_proxy=185.136.158.14:1080 curl ip.gs
http_proxy=206.189.66.204:3128 curl ip.gs
http_proxy=118.24.185.22:80 curl ip.gs
http_proxy=128.199.222.51:8080 curl ip.gs
http_proxy=47.104.172.227:80 curl ip.gs
http_proxy=178.128.91.23:8080 curl ip.gs
http_proxy=144.208.88.204:3128 curl ip.gs
http_proxy=195.235.204.60:3128 curl ip.gs
http_proxy=64.19.101.22:23500 curl ip.gs
http_proxy=195.189.60.23:3128 curl ip.gs
http_proxy=39.137.107.34:8080 curl ip.gs
http_proxy=185.136.157.235:1080 curl ip.gs
http_proxy=60.255.186.169:8888 curl ip.gs
http_proxy=103.108.140.216:3128 curl ip.gs
http_proxy=202.155.82.222:40408 curl ip.gs
http_proxy=188.235.152.114:8080 curl ip.gs
http_proxy=35.224.248.29:3128 curl ip.gs
http_proxy=69.51.6.201:8080 curl ip.gs
http_proxy=41.164.31.154:59565 curl ip.gs
http_proxy=194.44.246.83:41601 curl ip.gs
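
A quick way to confirm the difference is to compare the two outputs row by row. The sketch below does that for the first five commands of each run, copied from the outputs above. `parse_proxy` is a hypothetical helper introduced only for this illustration.

```python
def parse_proxy(cmd):
    """Split 'http_proxy=IP:PORT curl ip.gs' into (ip, port)."""
    address = cmd.split()[0].split("=", 1)[1]
    ip, port = address.rsplit(":", 1)
    return ip, int(port)


# First five lines of each output, copied from the two runs above.
request_output = """\
http_proxy=185.136.158.14:8638 curl ip.gs
http_proxy=206.189.66.204:8486 curl ip.gs
http_proxy=118.24.185.22:8800 curl ip.gs
http_proxy=128.199.222.51:8676 curl ip.gs
http_proxy=47.104.172.227:8167 curl ip.gs
"""
selenium_output = """\
http_proxy=185.136.158.14:1080 curl ip.gs
http_proxy=206.189.66.204:3128 curl ip.gs
http_proxy=118.24.185.22:80 curl ip.gs
http_proxy=128.199.222.51:8080 curl ip.gs
http_proxy=47.104.172.227:80 curl ip.gs
"""

static = dict(parse_proxy(line) for line in request_output.splitlines())
rendered = dict(parse_proxy(line) for line in selenium_output.splitlines())

# Same IPs in both runs, but the ports parsed from the static HTML differ.
mismatches = {ip: (static[ip], rendered[ip])
              for ip in static if static[ip] != rendered[ip]}
for ip, (wrong, right) in mismatches.items():
    print("%-16s static=%d rendered=%d" % (ip, wrong, right))
```

For these five rows the IPs match and every port differs, which is consistent with the ports being rewritten after the page loads.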
