# Screenshot Grabber

Grab a screenshot of a linked page and save it as an image.

One of the issues with embedding links in long-lived educational materials to third party web resources is that the pages being referred to can disappear over time.

For someone perhaps new to a course and tasked with repairing or finding replacements for broken links, it may not be immediately obvious from the text what the intention was in including a link to a particular resource. Submitting the original link to a web archiving service during production means that there is a chance of being able to revisit a version of the site from the time it was originally linked to. However, with sites increasingly dependent on JavaScript and dynamic page construction, there is no guarantee that an archived page will work or look anything like the original version.

Whilst archiving pages using third party services means we stand a chance of providing a "backup" link to students in the event of a link breaking, an issue still remains about how best to support a module team looking for a replacement resource if the original has disappeared.

Another approach is to just grab a screenshot of the original linked resource and retain it as a local archival copy that can be used for internal reference purposes.

Note that if the linked resource is actually a landing page that provides a gateway to one or more other pages, a screenshot of just the landing page, and not of the views linked from that page, will not help us recall the original views that were perhaps the point of the linked resource. (That said, in some cases, we may have generated screenshots or screencasts within the course material showing the journey through or from the linked resource. Generally, it would be useful to record, even informally, a screencast for reference purposes showing the intended usage of a linked resource if it is an application, for example.)
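If screenshots are to be kept as local archival copies, each one needs a filename that can be traced back to the link it came from. The following is a minimal sketch of one way to do that using only the standard library; the function name `screenshot_filename` and the naming scheme (readable host and path plus a short hash of the full URL) are assumptions made for illustration, not part of the tool described here.

```python
import hashlib
import re
from urllib.parse import urlparse


def screenshot_filename(url, ext="png"):
    """Return a filesystem-safe, repeatable filename for a URL's screenshot."""
    parsed = urlparse(url)
    # Human-readable part: host plus path, with unsafe characters collapsed
    readable = re.sub(r"[^A-Za-z0-9]+", "-", parsed.netloc + parsed.path).strip("-")
    # Short hash of the full URL so that different query strings do not collide
    digest = hashlib.sha1(url.encode("utf-8")).hexdigest()[:8]
    return f"{readable[:60]}_{digest}.{ext}"


print(screenshot_filename("https://example.com/docs/intro.html?lang=en"))
```

Because the name is derived only from the URL, running the grabber again over the same link overwrites the earlier image rather than creating a duplicate. Whether that is desirable depends on whether dated copies are wanted.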

The tool described in this recipe uses browser automation to load the resolved URL into a headless web browser (which is to say, a browser running in the background and not displaying itself on a screen).
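As a rough sketch of the idea, the core step is simply "load the page, then ask the browser for an image". The helper name `grab_screenshot` below is hypothetical; it accepts any already-created selenium driver and relies only on the driver's `get()` and `save_screenshot()` methods. The commented usage assumes that selenium, Firefox and geckodriver are available.

```python
def grab_screenshot(driver, url, path):
    """Load url in an already-created webdriver and save a PNG of the viewport.

    Returns the path if the browser reports the image was written, else None.
    """
    driver.get(url)
    # Selenium's save_screenshot() returns True on success, False otherwise
    saved = driver.save_screenshot(path)
    return path if saved else None


# Example usage (assumes selenium, Firefox and geckodriver are available):
#
# from selenium import webdriver
# from selenium.webdriver.firefox.options import Options
#
# opts = Options()
# opts.add_argument("-headless")  # run Firefox without opening a window
# driver = webdriver.Firefox(options=opts)
# try:
#     grab_screenshot(driver, "https://example.com", "example.png")
# finally:
#     driver.quit()
```

Note that `save_screenshot()` captures only the visible viewport, so long pages are cropped unless the window is resized first.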

Note that grabbing screenshots requires a web browser to be installed (or otherwise available), as well as browser automation tools (here, `selenium`), neither of which is required for simple link checking.
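Before attempting to grab screenshots, it may be useful to check whether these prerequisites appear to be in place. The sketch below uses only the standard library. The function name `screenshot_prerequisites` and the list of browser executable names are assumptions; on some platforms (macOS, for example) browsers are often not on the `PATH` even when installed, so an empty list is a hint rather than proof of absence.

```python
import importlib.util
import shutil


def screenshot_prerequisites(browser_names=("firefox", "google-chrome", "chromium")):
    """Report whether selenium and a browser executable can be found."""
    # Browser executables that can be located on the PATH
    found = [name for name in browser_names if shutil.which(name)]
    return {
        "selenium": importlib.util.find_spec("selenium") is not None,
        "webdriver_manager": importlib.util.find_spec("webdriver_manager") is not None,
        "browsers_on_path": found,
    }


print(screenshot_prerequisites())
```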

### Set up the webdriver

We can automate installation of a browser webdriver using the `webdriver-manager` package:

In [None]:
# Required packages:
#%pip install selenium  webdriver-manager

In [11]:

from selenium import webdriver
from webdriver_manager.firefox import GeckoDriverManager

driver = webdriver.Firefox(executable_path=GeckoDriverManager().install())



Current firefox version is 88.0
Get LATEST driver version for 88.0
Driver [/Users/tonyhirst/.wdm/drivers/geckodriver/macos/v0.29.1/geckodriver] found in cache


In [12]:
driver.close()

In [None]:
#https://pypi.org/project/webdriver-manager/
from webdriver_manager.chrome import ChromeDriverManager

driver = webdriver.Chrome(ChromeDriverManager().install())

In [None]:
# Should we really support all browsers?
# webdriver_manager can handle

def webdriver_fetch(browser='Firefox'):
    """Fetch (and install if required) the webdriver; return a driver, or None."""
    # Browsers that webdriver_manager provides driver managers for
    supported_browsers = ['firefox', 'chrome', 'ie', 'edge', 'chromium', 'opera']
    browser = browser.lower()
    driver = None
    if browser not in supported_browsers:
        print(f"Browser *{browser}* is not supported. Use one of {', '.join(supported_browsers)}")
    elif browser=='chrome':
        driver = webdriver.Chrome(ChromeDriverManager().install())
    elif browser=='firefox':
        driver = webdriver.Firefox(executable_path=GeckoDriverManager().install())
    else:
        # Recognised browser, but not yet handled by this function
        print(f"Browser *{browser}* is not yet handled by this function.")
    return driver
        

In [None]:
# Chrome
from selenium import webdriver
from webdriver_manager.chrome import ChromeDriverManager

driver = webdriver.Chrome(ChromeDriverManager().install())

In [None]:
# Opera
from webdriver_manager.opera import OperaDriverManager

driver = webdriver.Opera(executable_path=OperaDriverManager().install())

In [None]:
from selenium import webdriver
from webdriver_manager.firefox import GeckoDriverManager

driver = webdriver.Firefox(executable_path=GeckoDriverManager().install())

In [None]:
# Edge
from webdriver_manager.microsoft import EdgeChromiumDriverManager

driver = webdriver.Edge(EdgeChromiumDriverManager().install())

In [None]:
#ie
from selenium import webdriver
from webdriver_manager.microsoft import IEDriverManager

driver = webdriver.Ie(IEDriverManager().install())

In [None]:
# Chromium
from webdriver_manager.chrome import ChromeDriverManager
from webdriver_manager.utils import ChromeType

driver = webdriver.Chrome(ChromeDriverManager(chrome_type=ChromeType.CHROMIUM).install())

In [None]:
# Via https://stackoverflow.com/a/52572919/454773
def setup_screenshot(driver, path):
    """Resize the window to the full page size and save a screenshot to path."""
    original_size = driver.get_window_size()
    # Size of the whole document, not just the visible viewport
    required_width = driver.execute_script('return document.body.parentNode.scrollWidth')
    required_height = driver.execute_script('return document.body.parentNode.scrollHeight')
    driver.set_window_size(required_width, required_height)
    # driver.save_screenshot(path)  # has scrollbar
    # find_element('tag name', ...) is available in both selenium 3 and 4
    driver.find_element('tag name', 'body').screenshot(path)  # avoids scrollbar
    # Restore the original window size
    driver.set_window_size(original_size['width'], original_size['height'])

In [None]:
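import os
import time

def init_browser(scale_factor=2, headless=True, browser_type='firefox'):
    """Create a selenium browser instance for the requested browser type."""
    if browser_type=='chrome':
        opt = webdriver.ChromeOptions()
        # The scale factor is only applied for Chrome
        opt.add_argument('--force-device-scale-factor={}'.format(scale_factor))
        if headless:
            opt.add_argument('--headless')

        browser = webdriver.Chrome(options=opt)
        
    elif browser_type=='firefox':
        #print('using firefox')
        from selenium.webdriver.firefox.options import Options

        opt = Options()
        if headless:
            opt.add_argument('-headless')

        browser = webdriver.Firefox(options=opt)

    else:
        # Avoid returning an undefined browser for unhandled types
        raise ValueError("Unsupported browser_type: {}".format(browser_type))
    
    return browser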
    
def getTableImage(url, fn='dummy_table', basepath='.', path='.',
                  delay=None, scale_factor=2, height=420, width=800, headless=True,
                  logging=False, browser=None):
    ''' Render HTML file in browser and grab a screenshot. '''
    
    #options = Options()
    #options.headless = True

    if browser is None:
        browser = init_browser(scale_factor=scale_factor,
                               headless=headless)
        reset_browser = True
    else:
        reset_browser = False

    #browser.set_window_size(width, height)
    browser.get(url)
    # Give dynamic content (e.g. map tiles) some time to load
    # Should really do this with some sort of browser onload check
    if delay is not None:
        time.sleep(delay)
    imgpath='{}/{}.png'.format(path,fn)
    imgfn = '{}/{}'.format(basepath, imgpath)
    imgfile = '{}/{}'.format(os.getcwd(),imgfn)
    
    setup_screenshot(browser,imgfile)
    
    if reset_browser:
        browser.quit()
        
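    # Remove the temporary HTML file, if one was created alongside the image
    htmlfile = imgfile.replace('.png','.html')
    if os.path.exists(htmlfile):
        os.remove(htmlfile)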
    if logging:
        print("Save to {}".format(imgfn))
    return imgpath
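
The fixed `delay` used above is a blunt instrument. One possible alternative, sketched here under stated assumptions, is to poll the page's `document.readyState` until the browser reports `complete`. The helper name `wait_for_page_load` and its default timings are hypothetical. Note also that `complete` only means the initial document and its resources have loaded: content fetched later by JavaScript (map tiles, for example) may still be arriving, so a short additional delay may still be needed.

```python
import time


def wait_for_page_load(driver, timeout=10.0, poll=0.25):
    """Poll document.readyState until it is 'complete' or the timeout expires.

    Returns True if the page reported 'complete' in time, otherwise False.
    """
    deadline = time.monotonic() + timeout
    while True:
        if driver.execute_script("return document.readyState") == "complete":
            return True
        if time.monotonic() >= deadline:
            return False
        # Wait a little before asking the browser again
        time.sleep(poll)
```

This could be called straight after `browser.get(url)`, with any remaining `delay` applied afterwards for late-loading content.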

In [None]:
def getTablePNG(tablehtml, basepath='.', path='testpng',
                fnstub='testhtml', scale_factor=2,
                browser=None):
    ''' Save an HTML table to a temporary file and render it as a PNG. '''
    # Create the output directory (relative to the current directory) if needed
    outdir = '{}/{}'.format(basepath, path)
    os.makedirs(outdir, exist_ok=True)
    fn='{cwd}/{outdir}/{fn}.html'.format(cwd=os.getcwd(),
                                         outdir=outdir,
                                         fn=fnstub)
    tmpurl='file://{fn}'.format(fn=fn)
    with open(fn, 'w') as out:
        out.write(tablehtml)
    return getTableImage(tmpurl, fnstub, basepath, path,
                         scale_factor=scale_factor, browser=browser)