# Drivers

`drivers` is a module of the `scrape` package that contains functions for retrieving HTML pages over HTTP.

# Initialization

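The following code imports `drivers`. It adds the project folder to `sys.path`, taking the project folder to be two levels above the current directory, and assumes that this folder contains the `scrape` package.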

In [1]:
import os
import sys
PROJECT_DIR = os.path.dirname(os.path.abspath('..'))
print('Project folder: ' + PROJECT_DIR)
sys.path.append(PROJECT_DIR)

from scrape import drivers

Project folder: D:\Projects\Python\projects\scrape
Initializing scrape ...


# Working with drivers

### Creating parameters 

In order to work with a driver one needs to provide a dictionary of parameters. A default dictionary can be created with the function `get_params`.

The main parameter is `package`. The supported packages are `requests` and `selenium`:
- `requests` is an elegant and simple HTTP library for Python, built for human beings. More information [here](https://docs.python-requests.org/en/master/).
- `selenium` is used to automate web browser interaction from Python. More information [here](https://pypi.org/project/selenium/).


In [3]:
params = drivers.get_params()
params

{'package': 'requests',
 'headers': '',
 'timeout': 10,
 'filename': '',
 'log': '',
 'headless': True}
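
Individual settings can be changed before the dictionary is passed to a driver. The sketch below works on a literal copy of the defaults printed above, so it runs without the `scrape` package. `with_overrides` and `SUPPORTED_PACKAGES` are hypothetical names introduced for illustration, not part of `drivers`, and this notebook does not show whether `drivers` validates the parameters itself.

```python
# Literal copy of the defaults printed above (stand-in for drivers.get_params()).
DEFAULT_PARAMS = {
    'package': 'requests',
    'headers': '',
    'timeout': 10,
    'filename': '',
    'log': '',
    'headless': True,
}

# The two packages named in the text above.
SUPPORTED_PACKAGES = ('requests', 'selenium')


def with_overrides(params, **overrides):
    """Return a new parameter dictionary with the given keys replaced.

    Hypothetical helper: rejects unknown keys and unsupported packages
    so that typos surface before a driver is started.
    """
    unknown = set(overrides) - set(params)
    if unknown:
        raise KeyError('unknown parameter(s): ' + ', '.join(sorted(unknown)))
    updated = {**params, **overrides}
    if updated['package'] not in SUPPORTED_PACKAGES:
        raise ValueError('unsupported package: ' + repr(updated['package']))
    return updated


# Switch to selenium with a longer timeout; the defaults stay untouched.
selenium_params = with_overrides(DEFAULT_PARAMS, package='selenium', timeout=30)
print(selenium_params['package'], selenium_params['timeout'])
```

With the package available, the same idea would presumably start from `drivers.get_params()` instead of the literal copy.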

### Starting and stopping a driver
In this example we will start and stop a driver. We use the functions `driver_start` and `driver_stop`.

In [4]:
adriver = drivers.driver_start(params)
drivers.driver_stop(adriver)  
type(adriver)

requests.sessions.Session
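
If an error occurs between starting and stopping, the stop call is skipped and the driver stays open. One possible way to guard against this is a small context manager. The sketch below is an assumption, not part of `drivers`: `managed_driver` is a hypothetical helper, and `fake_start` and `fake_stop` are stand-ins that only record calls so the example runs without the `scrape` package.

```python
from contextlib import contextmanager


@contextmanager
def managed_driver(start, stop, params):
    """Start a driver, yield it, and always stop it afterwards.

    `start` and `stop` are passed in so the sketch stays independent of
    the scrape package.
    """
    driver = start(params)
    try:
        yield driver
    finally:
        # Runs on normal exit and when the body raises.
        stop(driver)


# Hypothetical stand-ins that only record what happened.
events = []


def fake_start(params):
    events.append('start')
    return {'package': params['package']}


def fake_stop(driver):
    events.append('stop')


try:
    with managed_driver(fake_start, fake_stop, {'package': 'requests'}) as adriver:
        events.append('use')
        raise RuntimeError('simulated failure while using the driver')
except RuntimeError:
    pass

print(events)  # ['start', 'use', 'stop']
```

With the real module one would pass `drivers.driver_start` and `drivers.driver_stop` in place of the stand-ins.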

### Retrieving HTML pages
There are two ways to retrieve HTML pages: use `get_page` to retrieve a single page, or use `request_page` to retrieve multiple pages over the same driver connection.

**Get a page**

`get_page` only requires a URL as input.

In [5]:
url = 'https://www.crummy.com/software/BeautifulSoup/bs4/doc/'
page = drivers.get_page(url)
page[:100]

'\n<!DOCTYPE html>\n\n<html>\n  <head>\n    <meta charset="utf-8" />\n    <meta name="viewport" content="wi'

**Request a page**

`request_page` requires a driver and a URL as input.

In [8]:
url = 'https://www.crummy.com/software/BeautifulSoup/bs4/doc/'
adriver = drivers.driver_start(params)
page = drivers.request_page(adriver, url)
drivers.driver_stop(adriver)
page[:100]

'\n<!DOCTYPE html>\n\n<html>\n  <head>\n    <meta charset="utf-8" />\n    <meta name="viewport" content="wi'
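
The benefit of `request_page` shows when several pages are fetched over one driver. The following is a minimal sketch of such a loop. `fetch_pages` is a hypothetical helper, and `fake_request_page` is a stand-in with the same `(driver, url)` argument order as `request_page`, so the example runs without network access. Which exceptions `request_page` raises is not shown in this notebook, so the broad `except Exception` is an assumption.

```python
def fetch_pages(driver, urls, request_page):
    """Fetch every URL with the same driver.

    Returns a dictionary mapping each URL to its page text, or to None
    when the request failed.
    """
    pages = {}
    for url in urls:
        try:
            pages[url] = request_page(driver, url)
        except Exception as error:  # assumption: the failure types are unknown
            print('Failed to fetch ' + url + ': ' + str(error))
            pages[url] = None
    return pages


# Hypothetical stand-in for drivers.request_page.
def fake_request_page(driver, url):
    if 'missing' in url:
        raise ValueError('page not found')
    return '<html>' + url + '</html>'


urls = ['https://example.com/a', 'https://example.com/missing']
pages = fetch_pages(object(), urls, fake_request_page)
print({url: page is not None for url, page in pages.items()})
```

With the real module the call would be `fetch_pages(adriver, urls, drivers.request_page)`, placed between `driver_start` and `driver_stop`.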

# Versions

In [7]:
%reload_ext watermark
%watermark

%watermark -iv

packages = ['requests','selenium']
for package in packages:
    %watermark -p {package}

Last updated: 2021-10-30T13:13:40.570962+02:00

Python implementation: CPython
Python version       : 3.7.9
IPython version      : 7.19.0

Compiler    : MSC v.1916 64 bit (AMD64)
OS          : Windows
Release     : 10
Machine     : AMD64
Processor   : Intel64 Family 6 Model 60 Stepping 3, GenuineIntel
CPU cores   : 8
Architecture: 64bit

sys: 3.7.9 | packaged by conda-forge | (default, Dec  9 2020, 20:36:16) [MSC v.1916 64 bit (AMD64)]

requests: 2.24.0

selenium: 3.141.0

