# Onboarding Task - Webscraping Police Data
For this task, you'll need to collect and aggregate the information that gets posted daily to a police bulletin. There's no way to download this data in a structured format and it would be impractical to manually copy it from the page every day, so instead you'll be building a custom webscraper to do all of the work for you.
### What is a webscraper?
A webscraper is a program that automatically collects (or "scrapes") data from a website. 
### How do I build one?
Python lets us build webscrapers using the Selenium library, which drives a real browser and gives you a live view of what's going on as the process runs. Pandas will also be useful for organizing the data we end up collecting. You can import these tools as follows:

In [None]:
from selenium import webdriver
from selenium.webdriver.common.by import By
import pandas as pd

To browse the web, Selenium relies on the Gecko webdriver, which you should download [here](https://github.com/mozilla/geckodriver/releases). Once it's installed, copy the path to `geckodriver.exe` and paste it below:

In [None]:
GECKO_PATH = 'your-path-here'

Now, you can create an instance of Firefox for Selenium to use:

In [None]:
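from selenium.webdriver.firefox.service import Service  # Selenium 4 takes the driver path via a Service
driver = webdriver.Firefox(service=Service(executable_path=GECKO_PATH))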

Tell the driver to navigate to your police bulletin by copying the link into the code below:

In [None]:
driver.get('your-bulletin-url')

Once you run the cell above, a Firefox window should open and automatically go to the URL you specified. All of the data you now see is accessible to Selenium; you just have to tell it what to look for!

### How do I tell Selenium to select specific parts of a webpage?


The underlying structure of a webpage is written in HTML code, which can be seen in the browser's inspect element window (open this either by pressing `ctrl+shift+i` or right clicking and selecting `Inspect`).

HTML elements generally look like this:
1. 
```html
<tag class="..." id="..." name="...">...</tag>
```
or this:
2.
```html
<tag class="..." id="..." name="..."/>
```
*Note: Not all elements have `id`s, `name`s, or `class`es, and some have other attributes that you can ignore for now.*

A tag identifies the type of element being used. Classes are labels that assign an element to one or more groups of elements with the same class (an element can have multiple class names separated by spaces). An id is meant to be a unique identifier for a particular element and should not be shared with any others on the same page. Names are typically used for labelling the input fields of a form.

Many elements have additional elements nested within them, which are known as children. These will show up in the `>...<` area if an element is formatted as in (1).
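To make the structure above concrete, here is a minimal sketch that uses Python's built-in `html.parser` module (not Selenium) to list each element in a small snippet along with its nesting depth and attributes. The snippet and its class, id, and name values are made up for illustration.

```python
from html.parser import HTMLParser

# Made-up HTML: a <div> with two children, one of them self-closing
SAMPLE_HTML = """
<div id="bulletin" class="panel wide">
  <p class="entry">Noise complaint</p>
  <input name="search"/>
</div>
"""


class OutlineParser(HTMLParser):
    """Record (depth, tag, attributes) for every element encountered."""

    def __init__(self):
        super().__init__()
        self.depth = 0
        self.outline = []

    def handle_starttag(self, tag, attrs):
        self.outline.append((self.depth, tag, dict(attrs)))
        self.depth += 1  # anything until the closing tag is nested inside

    def handle_endtag(self, tag):
        self.depth -= 1

    def handle_startendtag(self, tag, attrs):
        # Self-closing elements such as <input .../> have no children
        self.outline.append((self.depth, tag, dict(attrs)))


parser = OutlineParser()
parser.feed(SAMPLE_HTML)
for depth, tag, attrs in parser.outline:
    print("  " * depth + tag, attrs)
```

The indented lines in the output are the children of the `div`. Note that the `div` carries two classes, `panel` and `wide`, in a single `class` attribute.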

Selenium lets us select elements by these attributes as follows:

In [None]:
# Select by tag
elem1 = driver.find_element(By.TAG_NAME, 'some-tag')

# Select by a single class
elem2 = driver.find_element(By.CLASS_NAME, 'some-class')
# Select by multiple classes (By.CLASS_NAME accepts only one class, so use a CSS selector)
elem3 = driver.find_element(By.CSS_SELECTOR, '.some-class1.some-class2')

# Select by id
elem4 = driver.find_element(By.ID, 'some-id')

# Select by name
elem5 = driver.find_element(By.NAME, 'some-name')

*Note: `find_element` returns the first matched element by default. To get all matches, use `find_elements` instead.*

`find_element(s)` can also be called from any returned element to search only within that element (its children and anything nested inside them).

To "scrape" a piece of data, you'll need to find the lowest-level element that contains it (should look like `<tag ...>raw-data</tag>`). Then, use `find_element` along with either its `name`, `id`, `class`, `tag`, or containing elements to select this element (be careful about other elements it might share attributes with!). This data will be contained in its `.text` property.

In [None]:
# Select the element here
element = ...

# Get the data it contains
my_data = element.text

### Other ways to interact with a page
Sometimes, webscraping isn't as simple as going to a webpage and directly pulling data. You may need to click buttons, enter input data, or submit forms. Here are some examples demonstrating how to do so:

In [None]:
# Click a button element
button_element = driver.find_element(By.TAG_NAME, 'button')
button_element.click()

# Set input text
input_element = driver.find_element(By.TAG_NAME, 'input')
input_element.send_keys('input-data')

# Submit a form
form_element = driver.find_element(By.ID, 'login-form')
form_element.submit()

### Organizing scraped data

Once you figure out how to pull your target data from the webpage, you'll want to store it somehow. This is where pandas comes in. First, edit the `DataFrame` constructor below to create a table with the columns necessary to hold your data:

In [None]:
table = pd.DataFrame({
    'my-col-1': [],
    'my-col-2': [],
    # ...add one entry per column you need
})

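Now that you have a structured table, each data point can be stored as a row. Put the column names and values in a dictionary, wrap it in a one-row `DataFrame`, and add it to the table with `pd.concat` (the older `DataFrame.append` method was removed in pandas 2.0). `pd.concat` returns a new table rather than modifying the existing one, so assign the result back to `table`:

In [None]:
row = {'my-col-1': 'my-col-1-value', 'my-col-2': 'my-col-2-value'}  # one key per column
table = pd.concat([table, pd.DataFrame([row])], ignore_index=True)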
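If you are scraping many rows in a loop, an alternative worth considering is to collect the dictionaries in a plain Python list and build the `DataFrame` once at the end, which avoids creating a new table for every row. The column names and values below are placeholders, not real bulletin data.

```python
import pandas as pd

rows = []  # plain list; adding to it is cheap

# Placeholder values standing in for data scraped from the page
for date, summary in [("day-1", "first entry"), ("day-2", "second entry")]:
    rows.append({"date": date, "summary": summary})

# Build the table in one step; `columns` fixes the column order
table = pd.DataFrame(rows, columns=["date", "summary"])
print(table)
```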

Since this table is stored in program memory, it's also useful to save it from time to time, which can be done with `to_csv`:

In [None]:
table.to_csv('my-folder/my-table.csv')

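The saved data can be loaded later on by passing the path it was saved at to `read_csv`. Because `to_csv` writes the row index as the first column by default, `index_col=0` tells pandas to read that column back as the index:

In [None]:
table = pd.read_csv('my-folder/my-table.csv', index_col=0)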
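Since the bulletin is updated daily, you will probably want each run to add to what you saved before without storing the same entry twice. Here is one possible sketch: the helper `update_saved_table` (a made-up name) merges new rows into an existing CSV and drops exact duplicates. It assumes every scraped value is text, and it writes with `index=False` so the file holds only your data columns. The demo uses a temporary folder and placeholder values.

```python
import os
import tempfile

import pandas as pd


def update_saved_table(new_rows, path):
    """Merge freshly scraped rows into the CSV at `path` and re-save it."""
    merged = pd.DataFrame(new_rows)
    if os.path.exists(path):
        # dtype=str keeps saved values as text so they compare equal
        # to the freshly scraped strings
        saved = pd.read_csv(path, dtype=str)
        merged = pd.concat([saved, merged], ignore_index=True)
    # Remove rows that an earlier run already stored
    merged = merged.drop_duplicates(ignore_index=True)
    merged.to_csv(path, index=False)
    return merged


with tempfile.TemporaryDirectory() as folder:
    csv_path = os.path.join(folder, "my-table.csv")
    # First run: one entry
    update_saved_table([{"date": "day-1", "summary": "first entry"}], csv_path)
    # Second run: the same entry again plus a new one
    merged = update_saved_table(
        [{"date": "day-1", "summary": "first entry"},
         {"date": "day-2", "summary": "second entry"}],
        csv_path,
    )
    print(merged)
```

Only exact duplicates are removed, so if the bulletin edits an entry after posting it, both versions would be kept.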