# Project

The project's objective is to create a scraper that checks the portfolio section of a web page. It is divided into three main sections:

* Scraper: retrieves a screenshot of the website section that belongs to the portfolio
* Image Checker: compares the image retrieved by the scraper against a previously saved template
* Checker: our main module, which implements the main logic of our website tester

## Scraper

```python
# Import packages
from selenium import webdriver
from selenium.webdriver.common.by import By
```

```python
# Create a browser instance and load the web page
browser = webdriver.Firefox()
browser.get("file:///home/jota/pypereira/websites/magnum-template/index.html")
```

```python
# The goal is to check the portfolio section, so let's look up all the elements with the class 'page-scroll'
page_scroll_objects = browser.find_elements(By.CLASS_NAME, "page-scroll")
len(page_scroll_objects)
# Output: 5
```

```python
# There are 5 elements, so we loop over them to find out which one is the portfolio link button
for obj in page_scroll_objects:
    print(obj.text)

# Output:
# LEARN MORE
# ABOUT
# PORTFOLIO
# CONTACT
# MY PORTFOLIO
```

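```python
# Keep the first element whose text is exactly "PORTFOLIO"
portfolio = None
for obj in page_scroll_objects:
    if obj.text == "PORTFOLIO":
        portfolio = obj
        break

# Check the type of portfolio; it should be a Selenium web element
# (the exact class name depends on the Selenium version)
type(portfolio)
# Output: selenium.webdriver.firefox.webelement.FirefoxWebElement
```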
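The loop above keeps the first element whose text is exactly `PORTFOLIO`. The exact comparison matters: a substring test such as `"PORTFOLIO" in obj.text` would also accept the `MY PORTFOLIO` button, so the result would then depend on element order. Below is a minimal sketch of the same search written more compactly and made case-insensitive. `find_by_text` is a hypothetical helper, not part of Selenium. It works on any objects that expose a `.text` attribute, so it is demonstrated with simple stand-ins instead of real web elements.

```python
from types import SimpleNamespace


def find_by_text(elements, text):
    """Return the first element whose visible text equals `text`, ignoring case
    and surrounding whitespace, or None if there is no such element.

    Hypothetical helper: `elements` can be any iterable of objects with a
    `.text` attribute, such as the Selenium elements returned by find_elements.
    """
    wanted = text.strip().upper()
    return next((el for el in elements if el.text.strip().upper() == wanted), None)


# Stand-ins for the navbar elements printed above
buttons = [SimpleNamespace(text=t) for t in
           ["LEARN MORE", "ABOUT", "PORTFOLIO", "CONTACT", "MY PORTFOLIO"]]

print(find_by_text(buttons, "Portfolio").text)  # PORTFOLIO
print(find_by_text(buttons, "pricing"))         # None
```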

```python
# Click the element
portfolio.click()
```

```python
# Take a screenshot (returns True if the file was written)
browser.save_screenshot("screencaptures/portfolio_template.png")
# Output: True
```
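Note that `save_screenshot` reports failure through its return value. If the file cannot be written (for example, because the `screencaptures` folder does not exist), it returns `False` instead of raising an error. Below is a minimal sketch of a helper that creates the folder first and then builds the file name. `screenshot_path` is a hypothetical name. The `template_`/`test_` prefixes follow the naming this notebook uses for its images.

```python
import os


def screenshot_path(directory, section_name, template=False):
    """Hypothetical helper: make sure `directory` exists and return the path
    of the PNG file for `section_name`, e.g. <directory>/template_portfolio.png.
    """
    os.makedirs(directory, exist_ok=True)  # no error if it already exists
    prefix = "template" if template else "test"
    return os.path.join(directory, "{}_{}.png".format(prefix, section_name.lower()))


# Usage with the browser from the cells above (not executed here):
# browser.save_screenshot(screenshot_path("screencaptures", "PORTFOLIO", template=True))
```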

Nice, you have created a scraper that retrieves information from the website. Now let's wrap all of this in a function so it is easier to call later.

```python
# Function to create the scraper of the navbar
import os
import time

from selenium import webdriver
from selenium.webdriver.common.by import By


def scrapper_navbar(url, section_name, path_to_store, template=False):
    '''@url: The url of the website
       @section_name: The navbar section name
       @path_to_store: The path where the images will be stored
       @template: If True, saves a template with the section name
    '''
    browser = webdriver.Firefox()
    try:
        browser.get(url)
        page_scroll_objects = browser.find_elements(By.CLASS_NAME, "page-scroll")

        # Look for the navbar button whose text matches the requested section
        section = None
        for obj in page_scroll_objects:
            if obj.text.upper() == section_name.upper():
                section = obj
                break

        # Stop if the section button was not found
        if section is None:
            print("[Error getting the navbar section]")
            return None

        # Click the section button and give the page time to scroll
        section.click()
        time.sleep(5)

        # Save either a template or a test image, depending on the flag
        prefix = "template" if template else "test"
        file_path = os.path.join(path_to_store, "{}_{}.png".format(prefix, section_name.lower()))
        return browser.save_screenshot(file_path)
    finally:
        # Always close the browser, even after an error or an early return
        browser.quit()
```
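After the click, `scrapper_navbar` waits a fixed `time.sleep(5)` so the page can finish scrolling before the screenshot is taken. A fixed delay is simple, but it either wastes time or, on a slow machine, turns out to be too short. One alternative is to poll a condition until it holds or a timeout expires. Selenium ships its own `WebDriverWait` for this, but the idea can be sketched in plain Python. `wait_until` is a hypothetical helper. The condition you would actually poll (for instance, that the scroll position has stopped changing) is an assumption left to you.

```python
import time


def wait_until(condition, timeout=5.0, interval=0.1):
    """Call `condition()` every `interval` seconds until it returns a truthy
    value or `timeout` seconds have passed.

    Returns True if the condition was met, False on timeout.
    Hypothetical helper, not part of Selenium.
    """
    deadline = time.monotonic() + timeout
    while True:
        if condition():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)


# Example: a condition that becomes true on the third check
calls = {"n": 0}


def ready():
    calls["n"] += 1
    return calls["n"] >= 3


print(wait_until(ready, timeout=1.0, interval=0.01))  # True
```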

```python
# Test your function
url = "file:///home/jota/pypereira/websites/magnum-template/index.html"
section_name = "PORTFOLIO"
path_to_store = "/home/jota/pypereira/screencaptures/"

# Let's create a template
scrapper_navbar(url, section_name, path_to_store, template=True)
# Output: True
```

```python
# Let's create a testing image
scrapper_navbar(url, section_name, path_to_store)
# Output: True
```

You should now have two images of the portfolio section: a template and a test image. Nice! :)

## Image Checker

Now let's create an image checker using the OpenCV concepts we have learned.

```python
import cv2
import numpy as np


def image_comparator(path_to_store, section_name):
    '''
       @section_name: The navbar section name
       @path_to_store: The path where the images are stored
    '''
    template_path = "{}/template_{}.png".format(path_to_store, section_name.lower())
    testing_path = "{}/test_{}.png".format(path_to_store, section_name.lower())
    template = cv2.imread(template_path)
    testing = cv2.imread(testing_path)

    if template is None or testing is None:
        print("[ERROR] could not load images, please check your section name")
        return None

    # Images of different sizes cannot be compared pixel by pixel: treat them as different
    if template.shape != testing.shape:
        return True

    # cv2.imread loads images in BGR channel order
    template_gray = cv2.cvtColor(template, cv2.COLOR_BGR2GRAY)
    testing_gray = cv2.cvtColor(testing, cv2.COLOR_BGR2GRAY)

    # A bitwise XOR is non-zero exactly where two pixels differ
    xor = np.bitwise_xor(template_gray, testing_gray)
    ones = cv2.countNonZero(xor)

    return ones > 0
```


This function checks whether two images (our test image and the template) are equal. It first loads both images from the given path and section name. It assumes that the two image files are named after the section, prefixed with 'template_' and 'test_' respectively.

It returns True if any difference is found, False if the images are identical, and None if either image could not be loaded.
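To see why the XOR test works, note that for two 8-bit values `a ^ b` is zero exactly when `a == b`. Counting the non-zero pixels of the XOR image therefore counts the pixels that changed. The sketch below reproduces the idea on tiny hand-made grayscale arrays using NumPy alone, with `np.count_nonzero` standing in for `cv2.countNonZero`.

```python
import numpy as np

# Two tiny 3x3 "grayscale screenshots" (uint8, like the output of cv2.cvtColor)
template_gray = np.array([[10, 10, 10],
                          [10, 200, 10],
                          [10, 10, 10]], dtype=np.uint8)
testing_gray = template_gray.copy()
testing_gray[0, 2] = 11  # a one-level change, invisible to the eye

# XOR is non-zero exactly where the pixels differ
xor = np.bitwise_xor(template_gray, testing_gray)
changed = np.count_nonzero(xor)

print(changed)      # 1
print(changed > 0)  # True, so image_comparator would report a difference
```

For arrays of equal shape, this gives the same answer as `not np.array_equal(template_gray, testing_gray)`. One caveat: comparing in grayscale can miss a color change that happens to produce the same gray level. Comparing the original BGR arrays avoids that.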

```python
import cv2


def store_differences(path_to_store, section_name):
    '''
       @section_name: The navbar section name
       @path_to_store: The path where the images are stored
    '''
    template_path = "{}/template_{}.png".format(path_to_store, section_name.lower())
    testing_path = "{}/test_{}.png".format(path_to_store, section_name.lower())
    template = cv2.imread(template_path)
    testing = cv2.imread(testing_path)

    if template is None or testing is None:
        print("[ERROR] could not load images, please check your section name")
        return None

    # cv2.absdiff requires both images to have the same size
    if template.shape != testing.shape:
        print("[ERROR] images have different sizes, cannot build a difference mask")
        return None

    # Find differences: pixels whose gray-level difference is greater than 1 become white
    result = cv2.absdiff(template, testing)
    gray = cv2.cvtColor(result, cv2.COLOR_BGR2GRAY)
    _, thresh = cv2.threshold(gray, 1, 255, cv2.THRESH_BINARY)

    # Find contours; [-2] selects the contour list in both OpenCV 3 and OpenCV 4
    cnts = cv2.findContours(thresh, cv2.RETR_TREE, cv2.CHAIN_APPROX_SIMPLE)[-2]
    # Draw the contours in red (BGR order) on the difference image
    cv2.drawContours(result, cnts, -1, (0, 0, 255), 1)

    # Store the differences
    file_path = "{}/differences_{}.png".format(path_to_store, section_name.lower())
    return cv2.imwrite(file_path, result)
```

This function was taken directly from our OpenCV lesson. It loads both images, computes their absolute difference, and then finds and draws the contours of the changed regions. Finally, it stores the resulting difference image.
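The first steps of `store_differences` can be mimicked with NumPy to see what the mask contains. One detail worth knowing: subtracting two `uint8` arrays directly wraps around (for example, `90 - 100` becomes `246`). This is why the lesson uses `cv2.absdiff`, which computes the absolute difference without that overflow. The sketch below emulates it by casting to a signed type first. It then thresholds the result the way `cv2.threshold(gray, 1, 255, cv2.THRESH_BINARY)` does and reports the bounding box of the changed region. The bounding-box step is an illustrative addition; the original function draws contours instead.

```python
import numpy as np

# Two tiny single-channel 4x5 images; the test image has a changed 2x2 block
template = np.full((4, 5), 100, dtype=np.uint8)
testing = template.copy()
testing[1:3, 2:4] = 90

# Naive uint8 subtraction wraps around: 90 - 100 becomes 246, not -10
wrapped = testing - template

# Absolute difference without wraparound (what cv2.absdiff computes)
diff = np.abs(template.astype(np.int16) - testing.astype(np.int16)).astype(np.uint8)

# Binary threshold: pixels with a difference greater than 1 become 255, the rest 0
thresh = np.where(diff > 1, 255, 0).astype(np.uint8)

# Bounding box of the changed pixels (rows and columns containing a 255)
rows = np.flatnonzero(thresh.any(axis=1))
cols = np.flatnonzero(thresh.any(axis=0))

print(wrapped[1, 2])                                    # 246
print(rows.min(), rows.max(), cols.min(), cols.max())   # 1 2 2 3
```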

## Checker

Let's finish by implementing our main algorithm.

```python
def checker(url, section_name, path_to_store, template=False):
    '''
    @url: The url of the website
    @section_name: The navbar section name
    @path_to_store: The path where the images will be stored
    @template: If True, saves a template with the section name
    '''

    # Create a template
    if template:
        print("[INFO] Attempting to store template")
        result = scrapper_navbar(url, section_name, path_to_store, template=template)
        if not result:
            print("[ERROR] Could not save your template")
            return result

    # Create a test image
    test = scrapper_navbar(url, section_name, path_to_store)
    if not test:
        print("[ERROR] Could not save your test image")
        return test

    # Compare images (None means they could not be loaded)
    result = image_comparator(path_to_store, section_name)
    if result is None:
        print("[ERROR] Could not compare the images")
        return result
    if result:  # Images are different
        print("[INFO] Images are different, storing difference mask")
        stored = store_differences(path_to_store, section_name)
        if not stored:
            print("[ERROR] could not store the difference mask")
            return stored
        print("[INFO] difference mask stored")
        return stored

    print("[INFO] Images are equal, your deployment did not affect your website!")
    return result
```
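The checker can end in three ways: an error (a step failed), a difference (the mask was stored), or no change. The real steps launch Firefox, which makes this control flow hard to unit-test directly. One option is to pass the steps in as functions and replace them with fakes in tests. `run_check` and its parameters below are hypothetical. They sketch this dependency-injection idea and are not part of the original project.

```python
def run_check(capture, compare, store_diff, make_template=False):
    """Hypothetical, testable version of checker's control flow.

    capture(template) -> bool   saves a screenshot, True on success
    compare() -> bool or None   True if the images differ, None on error
    store_diff() -> bool        saves the difference mask, True on success
    Returns one of "error", "different", "equal".
    """
    if make_template and not capture(True):
        return "error"
    if not capture(False):
        return "error"
    differs = compare()
    if differs is None:
        return "error"
    if differs:
        return "different" if store_diff() else "error"
    return "equal"


def ok(*args):
    """Fake step that always succeeds (stands in for Selenium/OpenCV calls)."""
    return True


print(run_check(ok, lambda: False, ok))  # equal
print(run_check(ok, lambda: True, ok))   # different
print(run_check(ok, lambda: None, ok))   # error
```

Returning a status string also separates "nothing changed" from "a step failed". `checker` can report both of these with the same falsy value.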

```python
# Let's test the checker
url = "file:///home/jota/pypereira/websites/magnum-template/index.html"
section_name = "PORTFOLIO"
path_to_store = "/home/jota/pypereira/screencaptures/"

checker(url, section_name, path_to_store, template=False)
# Output:
# [INFO] Images are equal, your deployment did not affect your website!
# False
```

Now go to the magnum website folder and open the index.html file in a text editor. Go to line 95 and edit the portfolio header, changing it to any word you like. I will change it to 'portafolio', which is not an English word. Run the checker again and see whether it catches this small but undesired difference.

```python
checker(url, section_name, path_to_store, template=False)
# Output:
# [INFO] Images are different, storing difference mask
# [INFO] difference mask stored
# True
```

The resulting difference image is shown below:

<img src="screencaptures/differences_portfolio_test_1.png">

Now let's run another test. Edit line 95 of index.html again and restore the header to 'portfolio'.

Go to the CSS folder and open the 'style.css' file. Edit line 11 and change the color to `#777`. This is a small change, but again we want to catch ANY change in our deployments.

```python
checker(url, section_name, path_to_store, template=False)
# Output:
# [INFO] Images are different, storing difference mask
# [INFO] difference mask stored
# True
```

The resulting difference image is shown below:

<img src="screencaptures/differences_portfolio_test_2.png">

Nice, you now know the basics of using OpenCV and Selenium for scraping and testing! :)