<a href="https://colab.research.google.com/github/lblogan14/web_scraping_with_python/blob/master/ch15_test_website.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

#An Introduction to Testing
A suite of tests can be run to ensure that code performs as expected.

##What are Unit Tests
A unit test has the following characteristics:
* Each unit test tests one aspect of the functionality of a component. Often, unit tests are grouped together in the same class, based on the component they are testing.
* Each unit test can be run completely independently, and any setup or teardown required for the unit test must be handled by the unit test itself.
* Each unit test usually contains at least one *assertion*.
* Unit tests are separated from the bulk of the code.

#Python `unittest`
The `unittest.TestCase` class can do the following:
* Provide `setUp` and `tearDown` functions that run before and after each unit test
* Provide several types of “assert” statements to allow tests to pass or fail
* Run all functions whose names begin with `test` as unit tests, and ignore functions that
do not have that prefix

The following provides a simple unit test for ensuring that 2+2=4:

In [0]:
import unittest

In [3]:
class TestAddition(unittest.TestCase):
  
  def setUp(self):
    print('Setting up the test')
    
  def tearDown(self):
    print('Tearing down the test')
    
  def test_twoPlusTwo(self):
    total = 2+2
    self.assertEqual(4, total)
    
if __name__ == '__main__':
  unittest.main(argv=[''], exit=False)

.

Setting up the test
Tearing down the test



----------------------------------------------------------------------
Ran 1 test in 0.004s

OK


`setUp` and `tearDown` are included here only for the purposes of illustration. These functions are run before and after each individual test, not before and after all the tests in the class.

###Running `unittest` in Jupyter-like environment


```
if __name__ == '__main__':
  unittest.main()
```

The condition `if __name__ == '__main__'` is true only if the file is executed directly in Python, and not via an import statement. This allows a file that defines a `unittest.TestCase` subclass to run its own unit tests directly from the command line. \\
In a Jupyter-like environment, things are a little different. The `argv` parameters created by Jupyter can cause errors in the unit test, and the `unittest` framework exits Python by default after the tests are run, which causes problems in the notebook kernel. \\
To prevent this, launch unit tests in the following way:


```
if __name__ == '__main__':
  unittest.main(argv=[''], exit=False)
  %reset
```

The second line sets the `argv` list (the command-line arguments) to a single
empty string, which is ignored by `unittest.main`. It also prevents `unittest` from
exiting after the test is run. \\
The `%reset` line resets the memory and destroys all user-created variables in the Jupyter notebook. Without it, every `unittest.TestCase` subclass defined earlier in the notebook is still in memory, so each call to `unittest.main` also runs all previously defined tests, including their `setUp` and `tearDown` methods.
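
One possible alternative, sketched here under the assumption that only a single class should be run, is to build the suite explicitly instead of calling `unittest.main`. This route never reads `argv`, never exits the interpreter, and runs only the class that is passed in. `run_test_case` and `TestMultiplication` are hypothetical names used for illustration:

```python
import io
import unittest

class TestMultiplication(unittest.TestCase):

  def test_twoTimesTwo(self):
    self.assertEqual(4, 2 * 2)

def run_test_case(test_case):
  """Run one TestCase class and return the result object and the report text."""
  suite = unittest.TestLoader().loadTestsFromTestCase(test_case)
  stream = io.StringIO()
  result = unittest.TextTestRunner(stream=stream, verbosity=2).run(suite)
  return result, stream.getvalue()

result, report = run_test_case(TestMultiplication)
print(report)
```

Because the class to run is named explicitly, other `TestCase` subclasses left over in the notebook are not picked up.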

##Testing Wikipedia

Testing the frontend of a target website (excluding JavaScript) combines the Python `unittest` library with a web scraper:

In [0]:
from urllib.request import urlopen
from bs4 import BeautifulSoup
import unittest

In [5]:
class TestWikipedia(unittest.TestCase):
  
  bs = None

  @classmethod
  def setUpClass(cls):
    # Load the page once and share the parsed result between all tests
    url = 'http://en.wikipedia.org/wiki/Monty_Python'
    cls.bs = BeautifulSoup(urlopen(url), 'html.parser')
    
  def test_titleText(self):
    pageTitle = TestWikipedia.bs.find('h1').get_text()
    self.assertEqual('Monty Python', pageTitle)
    
  def test_contentExists(self):
    content = TestWikipedia.bs.find('div', {'id': 'mw-content-text'})
    self.assertIsNotNone(content)
    
if __name__ == '__main__':
  unittest.main(argv=[''], exit=False)
  %reset

.

Setting up the test
Tearing down the test


..
----------------------------------------------------------------------
Ran 3 tests in 0.894s

OK


Once deleted, variables cannot be recovered. Proceed (y/[n])? y


There are two tests this time: the first tests whether the title of the page is the
expected “Monty Python,” and the second makes sure that the page has a content div. 

The content of the page is loaded only once, and the class-level object `bs` is shared between tests. The `setUpClass` function is run only once at the start of the class (unlike `setUp`, which is run before every individual test). Use `setUpClass` instead of `setUp` to avoid unnecessary page loads. \\
`setUpClass` is a class-level method (conventionally declared with `@classmethod`) that “belongs” to the
class itself and works with class variables, whereas `setUp` is an instance method that
belongs to a particular instance of the class.

When running the same tests across many pages, be careful to load each page only once for each set of tests on that page, and avoid holding large amounts of information in memory at once:

In [0]:
from urllib.request import urlopen
from bs4 import BeautifulSoup
import unittest
import re
import random
from urllib.parse import unquote

In [10]:
class TestWikipedia(unittest.TestCase):
  
  def test_PageProperties(self):
    self.url = 'http://en.wikipedia.org/wiki/Monty_Python'
    #Test the first 10 pages encountered
    for i in range(10):
      self.bs = BeautifulSoup(urlopen(self.url), 'html.parser')
      titles = self.titleMatchesURL()
      self.assertEqual(titles[0], titles[1])
      self.assertTrue(self.contentExists())
      self.url = self.getNextLink()
    print('Done!')
    
  def titleMatchesURL(self):
    pageTitle = self.bs.find('h1').get_text()
    urlTitle = self.url[(self.url.index('/wiki/')+6):]
    urlTitle = urlTitle.replace('_', ' ')
    urlTitle = unquote(urlTitle)
    return [pageTitle.lower(), urlTitle.lower()]
  
  def contentExists(self):
    content = self.bs.find('div', {'id': 'mw-content-text'})
    if content is not None:
      return True
    return False
  
  def getNextLink(self):
    #Returns random link on page, using technique from Chapter 3
    links = self.bs.find('div', {'id': 'bodyContent'}).find_all('a',
                                                                href=re.compile('^(/wiki/)((?!:).)*$'))
    randomLink = random.SystemRandom().choice(links)
    return 'https://en.wikipedia.org{}'.format(randomLink.attrs['href'])
  
if __name__ == '__main__':
  unittest.main(argv=[''], exit=False)
  %reset

.
----------------------------------------------------------------------
Ran 1 test in 10.841s

OK


Done!
Once deleted, variables cannot be recovered. Proceed (y/[n])? y


* There is only one actual test in this class. Other functions are technically only helper functions.
* While `contentExists` returns a boolean, `titleMatchesURL` returns the values themselves back for evaluation.
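
Because the helpers only transform strings, their logic can also be tried out on its own, without loading a page. The sketch below copies the URL handling from `titleMatchesURL` and the link pattern from `getNextLink` into two standalone functions; the names `title_from_url` and `is_article_link` are hypothetical:

```python
import re
from urllib.parse import unquote

# Same pattern as in getNextLink: starts with /wiki/ and contains no colon
ARTICLE_LINK = re.compile('^(/wiki/)((?!:).)*$')

def title_from_url(url):
  """Derive the expected lowercase page title from an article URL."""
  url_title = url[(url.index('/wiki/') + 6):]
  url_title = unquote(url_title.replace('_', ' '))
  return url_title.lower()

def is_article_link(href):
  """Return True if the href looks like a link to an article page."""
  return ARTICLE_LINK.match(href) is not None

print(title_from_url('http://en.wikipedia.org/wiki/Monty_Python'))  # monty python
print(title_from_url('http://en.wikipedia.org/wiki/Caf%C3%A9'))     # café
print(is_article_link('/wiki/Monty_Python'))                        # True
print(is_article_link('/wiki/Category:Monty_Python'))               # False
```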

#Testing with Selenium
Selenium tests can be written more casually than Python unit tests, and
assert statements can even be integrated into regular code, where it is desirable for
code execution to terminate if some condition is not met.

```
from selenium import webdriver

driver = webdriver.PhantomJS()
driver.get('http://en.wikipedia.org/wiki/Monty_Python')
assert 'Monty Python' in driver.title
driver.close()
```
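
How a plain `assert` stops execution can be shown without a browser. This is a minimal sketch in which `check_title` is a hypothetical helper and the title strings are made-up sample values, not output from a real page:

```python
def check_title(title, expected='Monty Python'):
  # Raises AssertionError, stopping execution, when the title is unexpected
  assert expected in title, 'Expected {!r} in title {!r}'.format(expected, title)
  return title

check_title('Monty Python - Wikipedia')  # passes silently

try:
  check_title('Page not found')
except AssertionError as error:
  message = str(error)
  print(message)  # Expected 'Monty Python' in title 'Page not found'
```

Note that Python skips `assert` statements entirely when it is started with the `-O` flag, so checks like this only work in a normal run.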



##Interacting with the Site
Everything introduced so far is designed to bypass the browser interface rather than use it. Selenium, on the other hand, can literally enter text, click buttons, and do everything through the browser, which lets it detect things like broken forms, badly coded JavaScript, HTML typos, and other issues.
Key to this testing is the concept of Selenium `elements`:


```
usernameField = driver.find_element_by_name('username')
```

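There are many actions Selenium can perform on any given element. Some are methods of the element itself, while others are `ActionChains` methods that take the element as an argument:


```
myElement.click()
myElement.send_keys('content to enter')
ActionChains(driver).click_and_hold(myElement).perform()
ActionChains(driver).release(myElement).perform()
ActionChains(driver).double_click(myElement).perform()
ActionChains(driver).send_keys_to_element(myElement, 'content to enter').perform()
```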

Strings of actions can be combined into *action chains*, which can be stored and executed once or multiple times in a program. The example is performed on the form page at *http://pythonscraping.com/pages/files/form.html*:

In [0]:
from selenium import webdriver
from selenium.webdriver.remote.webelement import WebElement
from selenium.webdriver.common.keys import Keys
from selenium.webdriver import ActionChains

In [0]:
driver = webdriver.PhantomJS(executable_path='<Path to Phantom JS>')
driver.get('http://pythonscraping.com/pages/files/form.html')

firstnameField = driver.find_element_by_name('firstname')
lastnameField = driver.find_element_by_name('lastname')
submitButton = driver.find_element_by_id('submit')

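### METHOD 1 ###
firstnameField.send_keys('Ryan')
lastnameField.send_keys('Mitchell')
submitButton.click()

### METHOD 2 ###
# Alternative to METHOD 1; run one or the other, because submitting the
# form loads a new page and the field references above no longer apply
#actions = ActionChains(driver).click(firstnameField).send_keys('Ryan')\
#                        .click(lastnameField).send_keys('Mitchell')\
#                        .send_keys(Keys.RETURN)
#actions.perform()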

print(driver.find_element_by_tag_name('body').text)
driver.close()

Method 1 calls `send_keys` on the two fields and then clicks the submit button.
Method 2 uses a single action chain to click and enter text in each field, which happens
in sequence after the `perform` method is called.

To run it in Chrome:

In [0]:
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
from selenium.webdriver import ActionChains
from selenium.webdriver.chrome.options import Options

In [0]:
chrome_options = Options()
chrome_options.add_argument("--headless")

driver = webdriver.Chrome(executable_path='drivers/chromedriver',
                          options=chrome_options)
driver.get('http://pythonscraping.com/pages/files/form.html')

firstnameField = driver.find_element_by_name('firstname')
lastnameField = driver.find_element_by_name('lastname')
submitButton = driver.find_element_by_id('submit')

### METHOD 1 ###
firstnameField.send_keys('Ryan')
lastnameField.send_keys('Mitchell')
submitButton.click()


### METHOD 2 ###
#actions = ActionChains(driver).click(firstnameField).send_keys('Ryan').click(lastnameField).send_keys('Mitchell').send_keys(Keys.RETURN)
#actions.perform()


print(driver.find_element_by_tag_name('body').text)

driver.close()
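
In recent Selenium 4 releases the `find_element_by_*` helpers used above are no longer available, and elements are located with `find_element(by, value)` instead. The following is a minimal sketch of Method 1 in that style, not a definitive port. `submit_name` is a hypothetical function name, and the locator strings `'name'`, `'id'`, and `'tag name'` are the values of `By.NAME`, `By.ID`, and `By.TAG_NAME` from `selenium.webdriver.common.by`:

```python
def submit_name(driver, first, last):
  """Fill in and submit the demo form, then return the resulting page text."""
  # 'name', 'id' and 'tag name' are the string values of By.NAME, By.ID
  # and By.TAG_NAME
  driver.find_element('name', 'firstname').send_keys(first)
  driver.find_element('name', 'lastname').send_keys(last)
  driver.find_element('id', 'submit').click()
  # Look up the body only after submitting, so it refers to the new page
  return driver.find_element('tag name', 'body').text
```

Any driver created as in the cells above can be passed in after `driver.get(...)` has loaded the form page, for example `print(submit_name(driver, 'Ryan', 'Mitchell'))`.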

###Drag and drop
Using Selenium's drag-and-drop function requires specifying a *source* element (the element to be dragged) and either an offset to drag it across or a target element to drag it to.
The demo page is at *http://pythonscraping.com/pages/javascript/draggableDemo.html*:

In [0]:
from selenium import webdriver
from selenium.webdriver.remote.webelement import WebElement
from selenium.webdriver import ActionChains

In [0]:
driver = webdriver.PhantomJS(executable_path='<Path to Phantom JS>')
driver.get('http://pythonscraping.com/pages/javascript/draggableDemo.html')

print(driver.find_element_by_id('message').text)

element = driver.find_element_by_id('draggable')
target = driver.find_element_by_id('div2')
actions = ActionChains(driver)
actions.drag_and_drop(element, target).perform()

print(driver.find_element_by_id('message').text)

Dragging elements to prove that the user is not a bot is a common theme in many CAPTCHAs.

###Taking screenshots


```
from selenium import webdriver

driver = webdriver.PhantomJS()
driver.get('http://www.pythonscraping.com/')
driver.get_screenshot_as_file('tmp/pythonscraping.png')
```

This script navigates to *http://pythonscraping.com* and then stores a screenshot of the
home page in the local `tmp` folder (the folder must already exist for the file to be saved).
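
A small wrapper can create the target folder first and report a failed write. This is a sketch under the assumption that `driver` is any Selenium driver object; `save_screenshot` is a hypothetical helper name, while `get_screenshot_as_file` is the same call used above:

```python
import os

def save_screenshot(driver, folder, filename):
  """Create the folder if needed, then save a screenshot into it."""
  os.makedirs(folder, exist_ok=True)
  path = os.path.join(folder, filename)
  # get_screenshot_as_file returns False when the file cannot be written
  if not driver.get_screenshot_as_file(path):
    raise IOError('Could not write screenshot to {}'.format(path))
  return path
```

With a driver that has already loaded the page, `save_screenshot(driver, 'tmp', 'pythonscraping.png')` stores the same file as the script above.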

#`unittest` or Selenium
Selenium can easily be used to obtain
information about a website, and `unittest` can evaluate whether that information
meets the criteria for passing the test.

For example, the following script creates a unit test for a website’s draggable interface,
asserting that it correctly says, “You are definitely not a bot!” after one element has been dragged to another:

In [0]:
from selenium import webdriver
from selenium.webdriver.remote.webelement import WebElement
from selenium.webdriver import ActionChains
import unittest

In [0]:
class TestDragAndDrop(unittest.TestCase):
  
  driver = None
  def setUp(self):
    self.driver = webdriver.PhantomJS(executable_path='<Path to PhantomJS>')
    url = 'http://pythonscraping.com/pages/javascript/draggableDemo.html'
    self.driver.get(url)

  def tearDown(self):
    print("Tearing down the test")
    # Close the browser so each test does not leave a driver running
    self.driver.close()

  def test_drag(self):
    element = self.driver.find_element_by_id('draggable')
    target = self.driver.find_element_by_id('div2')
    actions = ActionChains(self.driver)
    actions.drag_and_drop(element, target).perform()
    self.assertEqual('You are definitely not a bot!',self.driver.find_element_by_id('message').text)

if __name__ == '__main__':
  unittest.main(argv=[''], exit=False)

Virtually anything on a website can be tested with the combination of Python’s
`unittest` and Selenium.

To run it in Chrome:

In [0]:
from selenium import webdriver
from selenium.webdriver import ActionChains
from selenium.webdriver.chrome.options import Options
import unittest

In [0]:
class TestDragAndDrop(unittest.TestCase):
    
  driver = None
  def setUp(self):
    chrome_options = Options()
    chrome_options.add_argument("--headless")
    self.driver = webdriver.Chrome(executable_path='drivers/chromedriver',
                                   options=chrome_options)
    url = 'http://pythonscraping.com/pages/javascript/draggableDemo.html'
    self.driver.get(url)

  def tearDown(self):
    self.driver.close()

  def test_drag(self):
    element = self.driver.find_element_by_id('draggable')
    target = self.driver.find_element_by_id('div2')
    actions = ActionChains(self.driver)
    actions.drag_and_drop(element, target).perform()
    self.assertEqual('You are definitely not a bot!',
                     self.driver.find_element_by_id('message').text)

if __name__ == '__main__':
  unittest.main(argv=[''], exit=False)
  %reset