## 5.4 Dynamic Crawling

Reference: https://selenium-python.readthedocs.io/index.html

### 5.4.1 Installation

```cmd
pip install selenium
```

Chrome Driver: https://chromedriver.chromium.org/downloads

### 5.4.2 Usage

Basic usage is as follows:
```python
from selenium import webdriver
 
# executable_path: path to the chromedriver binary
# (Selenium 4 deprecates this argument in favor of a Service object)
driver = webdriver.Chrome(executable_path='chromedriver')
# driver = webdriver.Firefox()

driver.get(url="http://www.naver.com")

driver.close()
```
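
If an exception is raised between `driver.get` and `driver.close`, the browser process keeps running. The sketch below wraps the session in `try`/`finally` so that `driver.quit()` always runs; `quit()` ends the whole session, whereas `close()` closes only the current window. The helper name `fetch_page_source` and its `make_driver` parameter are illustrative assumptions, not part of Selenium.

```python
def fetch_page_source(make_driver, url):
    """Open url with a fresh driver and return the rendered HTML.

    make_driver is a hypothetical parameter: any zero-argument callable
    that returns a WebDriver, for example webdriver.Chrome.
    """
    driver = make_driver()
    try:
        driver.get(url)
        return driver.page_source
    finally:
        # quit() ends the session and closes every window, even after an error
        driver.quit()
```

A call might look like `fetch_page_source(webdriver.Chrome, "http://www.naver.com")`, assuming the driver binary can be found without an explicit path.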

#### 5.4.2.1 Locating Elements

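`find_element`: returns the first element that matches the condition  
`find_elements`: returns a list of all elements that match the condition

The methods for locating an element are listed below. Each name is prefixed with `find_element_` or `find_elements_` and corresponds to a `By` strategy such as `By.XPATH`.
- by_id
- by_xpath
- by_link_text
- by_partial_link_text
- by_name
- by_tag_name
- by_class_name
- by_css_selector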

```python
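from selenium.webdriver.common.by import By

# By-based form (the form supported in Selenium 4)
element = driver.find_element(By.XPATH, x_path)
# Legacy form (removed in Selenium 4.3)
element = driver.find_element_by_xpath(x_path)

element = driver.find_element(By.CSS_SELECTOR, css)
element = driver.find_element_by_css_selector(css)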
```

If no matching element is found, `find_element` raises `NoSuchElementException`, while `find_elements` returns an empty list.

#### 5.4.2.2 Navigating

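##### Send Keys
```python
# xpath, id, and password are placeholders for each field's XPath and value
driver.find_element(By.XPATH, xpath).send_keys(id)        # type into the ID field
driver.find_element(By.XPATH, xpath).send_keys(password)  # type into the password field
driver.find_element(By.XPATH, xpath).click()              # click the element
```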

In addition to plain text, the special keys defined in `Keys` can be sent.
```python
from selenium.webdriver.common.keys import Keys
```

|Command           |Action     |
|------------------|-----------|
|Keys.ENTER        |Enter      |
|Keys.RETURN       |Return     |
|Keys.SPACE        |Space      |
|Keys.ARROW_UP     |Up arrow   |
|Keys.ARROW_DOWN   |Down arrow |
|Keys.ARROW_LEFT   |Left arrow |
|Keys.ARROW_RIGHT  |Right arrow|
|Keys.BACK_SPACE   |Backspace  |
|Keys.DELETE       |Delete     |
|Keys.CONTROL      |Ctrl       |
|Keys.ALT          |Alt        |
|Keys.SHIFT        |Shift      |
|Keys.TAB          |Tab        |
|Keys.PAGE_UP      |Page Up    |
|Keys.PAGE_DOWN    |Page Down  |
|Keys.F1 ~ Keys.F12|F1 ~ F12   |
|Keys.ESCAPE       |Esc        |
|Keys.HOME         |Home       |
|Keys.INSERT       |Insert     |
|...               |...        |

##### Drag and Drop

```python
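from selenium.webdriver import ActionChains
from selenium.webdriver.common.by import By

# "source" and "target" are the name attributes of the two elements
element = driver.find_element(By.NAME, "source")
target = driver.find_element(By.NAME, "target")

# Drag element onto target; perform() executes the queued actions
action_chains = ActionChains(driver)
action_chains.drag_and_drop(element, target).perform()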
```

##### Moving between Windows
```python
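# window_handles lists the handle of every open window or tab
previous_window = driver.window_handles[0]
new_window = driver.window_handles[1]
# Move the driver's focus to the new window
driver.switch_to.window(new_window)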
```
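
After working in a new window, the driver has to be switched back explicitly; closing a window does not move focus on its own. The context manager below is a minimal sketch of that round trip. The name `in_new_window` is hypothetical, and picking the last unfamiliar handle is an assumption, since the order of `window_handles` is not guaranteed.

```python
from contextlib import contextmanager


@contextmanager
def in_new_window(driver):
    """Work inside a newly opened window, then close it and switch back."""
    original = driver.current_window_handle
    new_handles = [h for h in driver.window_handles if h != original]
    if not new_handles:
        raise RuntimeError("no new window was opened")
    # Assumption: the most recently opened window is the last new handle
    driver.switch_to.window(new_handles[-1])
    try:
        yield driver
    finally:
        driver.close()                     # closes only the new window
        driver.switch_to.window(original)  # refocus a window that still exists
```

Inside `with in_new_window(driver):` the driver reads from the new window; when the block exits, the new window is closed and the original one is focused again.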


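##### Moving through History
```python
driver.forward()  # go forward one page in the browser history
driver.back()     # go back one page in the browser history
```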

#### 5.4.2.3 Waiting

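Selenium extracts information through the running driver.  
If the driver has not finished loading the data, the data cannot be collected.  
Therefore, after an action such as page navigation, the script must wait until loading has completed before moving on to the next step.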

##### Time

The most basic approach is to specify a fixed waiting time with the `time` module.

```python
import time
from selenium import webdriver


driver = webdriver.Chrome(executable_path='chromedriver')
driver.get(url="http://www.naver.com")
time.sleep(5)  # pause for 5 seconds regardless of the page state
driver.close()
```

##### Implicit Wait

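The `time` approach above has two problems:
1. A sleep has to be added after every dynamic action to wait for the page to load.
2. If the action finishes sooner than the 5 seconds that were set, the remaining time is wasted.

Implicit wait addresses both problems.  
It is a driver-level setting that defines the maximum time allowed for dynamic behavior; strictly, the driver keeps retrying element lookups for up to this long.  
If loading completes within the allowed time, the next step proceeds immediately.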

```python
from selenium import webdriver


driver = webdriver.Chrome(executable_path='chromedriver')
driver.implicitly_wait(time_to_wait=5)  # wait up to 5 seconds when locating elements
driver.get(url="http://www.naver.com")
driver.close()
```

##### Explicit Wait

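With an implicit wait, the driver is given a maximum loading time, so even with a 5-second setting the next step can proceed as soon as loading finishes.  
This still has a problem: if the data you want has not loaded after 5 seconds, it cannot be collected.  
It is therefore necessary to wait until a specific value has loaded and only then continue with the next step.  
Explicit wait is used for this.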

```python
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

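driver = webdriver.Chrome(executable_path='chromedriver')
driver.get(url="http://www.naver.com")

try:
    # Wait up to 5 seconds until an element with class 'paging' is present in the DOM
    element = WebDriverWait(driver, 5).until(
        EC.presence_of_element_located((By.CLASS_NAME, 'paging'))
    )
finally:
    # quit() already ends the session, so no further close() call is needed
    driver.quit()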
```

`until`: keeps calling the condition while it returns a falsy value and returns as soon as it becomes truthy  
`until_not`: keeps calling the condition while it returns a truthy value and returns as soon as it becomes falsy

If the timeout expires first, both raise `TimeoutException`.

Types of Expected Conditions:
- title_is
- title_contains
- presence_of_element_located
- visibility_of_element_located
- visibility_of
- presence_of_all_elements_located
- text_to_be_present_in_element
- text_to_be_present_in_element_value
- frame_to_be_available_and_switch_to_it
- invisibility_of_element_located
- element_to_be_clickable
- staleness_of
- element_to_be_selected
- element_located_to_be_selected
- element_selection_state_to_be
- element_located_selection_state_to_be
- alert_is_present
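
The polling idea behind `until` and `until_not` can be sketched in plain Python. This is a simplified illustration, not Selenium's implementation: the names `poll_until` and `poll_until_not` are hypothetical, the built-in `TimeoutError` stands in for `TimeoutException`, and the real `WebDriverWait` additionally ignores `NoSuchElementException` while polling. The 0.5-second default interval mirrors `WebDriverWait`'s default poll frequency.

```python
import time


def poll_until(condition, timeout=5.0, interval=0.5):
    """Call condition() until it returns a truthy value, then return it."""
    deadline = time.monotonic() + timeout
    while True:
        value = condition()
        if value:
            return value
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} s")
        time.sleep(interval)


def poll_until_not(condition, timeout=5.0, interval=0.5):
    """Call condition() until it returns a falsy value, then return it."""
    deadline = time.monotonic() + timeout
    while True:
        value = condition()
        if not value:
            return value
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition still met after {timeout} s")
        time.sleep(interval)
```

Unlike a fixed sleep, the loop returns as soon as the condition changes, and the timeout only bounds the worst case.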

#### 5.4.2.4 ActionChains

ActionChains queues several actions and performs them in sequence.

Example: Ctrl + C
```python
ActionChains(driver).key_down(Keys.CONTROL).send_keys('c').key_up(Keys.CONTROL).perform()
```

#### 5.4.2.5 Others

##### Options
```python
from selenium import webdriver

options = webdriver.ChromeOptions()
options.add_argument('window-size=1920,1080')  # set the browser window size

driver = webdriver.Chrome(executable_path='chromedriver', options=options)
```

##### Alert

When an alert box appears, it can be accepted, dismissed, or sent text.

```python
from selenium.webdriver.common.alert import Alert

Alert(driver).accept()  # accept (OK)
Alert(driver).dismiss()  # dismiss (Cancel)
Alert(driver).send_keys(keysToSend=key)  # send keystrokes to the alert prompt
```

##### Scroll Down

Scroll to the bottom of the page:
```python
driver.execute_script('window.scrollTo(0, document.body.scrollHeight);')
```

Scroll until a specific element comes into view:
```python
from selenium.webdriver import ActionChains
from selenium.webdriver.common.by import By

some_tag = driver.find_element(By.ID, 'gorio')
ActionChains(driver).move_to_element(some_tag).perform()
```
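
On pages that load more content each time the bottom is reached, a single scroll is not enough. The sketch below repeats the scroll-to-bottom script until `document.body.scrollHeight` stops growing. The helper name `scroll_to_end` and the `pause` and `max_rounds` parameters are assumptions for illustration, and a fixed pause is only a rough stand-in for a proper wait condition.

```python
import time


def scroll_to_end(driver, pause=1.0, max_rounds=20):
    """Scroll down repeatedly until the page height stops growing.

    Returns the number of scrolls performed.
    """
    last_height = driver.execute_script('return document.body.scrollHeight;')
    for round_no in range(1, max_rounds + 1):
        driver.execute_script('window.scrollTo(0, document.body.scrollHeight);')
        time.sleep(pause)  # give lazily loaded content time to arrive
        new_height = driver.execute_script('return document.body.scrollHeight;')
        if new_height == last_height:
            return round_no  # nothing new was loaded
        last_height = new_height
    return max_rounds
```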

##### Minimize/Maximize

```python
driver.minimize_window()
driver.maximize_window()
```

##### Screen Shot
```python
driver.save_screenshot('screenshot.png')
```

##### 5.4.3.3 DART

Moving between windows

```python
from selenium import webdriver
from selenium.webdriver.common.by import By
import time
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from bs4 import BeautifulSoup
```

```python
# executable_path: path to the chromedriver binary
driver = webdriver.Chrome(executable_path='C:\\Users\\user\\Documents\\github\\2022_08_06_big_data\\chromedriver.exe')
# driver = webdriver.Firefox()

driver.get(url="https://dart.fss.or.kr")
search_xpath = '/html/body/div[2]/div[1]/div[3]/div/div[2]/form/div[1]/div[2]/span[2]/div/input'
click_xpath = '/html/body/div[2]/div[1]/div[3]/div/div[2]/form/div[1]/div[3]/a'
# Wait up to 5 seconds until the search box is clickable
WebDriverWait(driver, 5).until(
    EC.element_to_be_clickable((By.XPATH, search_xpath))
)
driver.find_element(By.XPATH, search_xpath).send_keys('삼성전자')  # search for Samsung Electronics
driver.find_element(By.XPATH, click_xpath).click()
```


```python
# Click a link in the search results table
driver.find_element(By.XPATH,'/html/body/div[4]/div[2]/div[1]/div[2]/div[2]/form[3]/div/div[1]/table/tbody/tr[6]/td[3]/a').click()
```

```python
# Check the open windows
driver.window_handles
```

```
['CDwindow-062000EA02B6864BF730BFCFFD27A684',
 'CDwindow-C42ED6E4F5652E3712736F7BCDBD25E2']
```

```python
driver.switch_to.window(driver.window_handles[1])
```

```python
driver.page_source
```

```
'<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="ko" lang="ko" class=" ext-strict"><head>\n\t<title>\n\t삼성전자/반기보고서/2022.08.16\n\t</title>\n\t<meta charset="UTF-8">\n\t<meta name="viewport" content="width=device-width, user-scaleable=no">\n\t<meta http-equiv="X-UA-Compatible" content="ie=edge">\n\t\n\t\n\n\n\t\n<script type="text/javascript" src="/resource/js/jquery-3.3.1.min.js"></script>\n\n<link rel="stylesheet" href="/js/jquery-ui/jquery-ui.min.css">\n<script type="text/javascript" src="/js/jquery-ui/jquery-ui.min.js"></script>\n\n<!-- 2011.11.01 ext 2.3 -->\n<!--[if lte IE 8]><link rel="stylesheet" type="text/css" href="/js/ext-main/resources/css/ext-all-ie8.css" /><![endif]-->\n<script type="text/javascript" src="/js/ext-main/adapter/ext/ext-base.js"></script>\n<script type="text/javascript" src="/js/ext-main/ext-all.js"></script>\n\n<!-- x-xeries js libraries  -->\n<script type="text/javascript" src="/js/xjs.js?ver=1.17"></script>\n\n<!-- application js libraries -->\n<scrip
```

```python
# Financial statements
driver.find_element(By.XPATH, '/html/body/div[3]/div/div[2]/div[1]/div[2]/ul/ul/li[5]/ul/li[2]/a').click()

bs = BeautifulSoup(driver.page_source, 'lxml')
```

```python
bs.select('div.leftPane1')
bs.select('table')  # fails to retrieve the table data
```

```
[<div class="loadingMask" id="loadingMask" style="display:none"></div>,
 <div class="loadingWrap" id="loadingWrap" style="display:none">
 <div class="loadingImg"></div>
 <div class="loadingMessage">잠시만 기다려주세요.</div>
 </div>,
 <div class="loadingImg"></div>,
 <div class="loadingMessage">잠시만 기다려주세요.</div>,
 <div class="wrapper">
 <form action="" method="post" name="fmove">
 <input name="goAction" style="display:none" type="submit"/>
 </form>
 <!-- viewerPop -->
 <div class="viewerPop">
 <!-- header -->
 <div class="header">
 <div class="top">
 <h1 style="cursor:auto;"></h1>
 <div class="nameWrap"><span class="tagCom_kospi" title="유가증권시장">유</span>
 <span onclick="openCorpInfoNew('00126380', 'winCorpInfo', '/dsae001/selectPopup.ax');" style="cursor:pointer;">삼성전자</span></div>
 <div class="searchWrap">
 <span class="frmCheck">
 <input checked="" id="searchGubun1" name="searchGubun" title="현재목차" type="radio" value="2"/>
 <label for="searchGubun1" title="현재목차">현재목차</label>
 </span>
 <span cla
```

```python
driver.close()  # close the window the driver is currently focused on
driver.switch_to.window(driver.window_handles[0])  # return focus to the first window
```

Running this cell raised the following error, whose message indicates that the window the driver was focused on had already been closed:

```
NoSuchWindowException: Message: no such window: target window already closed
from unknown error: web view not found
  (Session info: chrome=104.0.5112.102)
```
