## Beautiful Soup Documentation
    https://www.crummy.com/software/BeautifulSoup/bs4/doc/
    https://www.dataquest.io/blog/web-scraping-tutorial-python/


### Useful Functions
    find_all()  : looks through a tag's descendants and returns a list of all descendants matching the filter
    find()      : returns the first descendant matching the filter
    find_next() : returns the first element matching the filter that appears after the current tag in the document (not only siblings)
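
As a small illustration of the three functions above, here is a sketch that runs on a made-up HTML snippet. The table contents and the link are invented for the example and are not taken from any real page.

```python
from bs4 import BeautifulSoup

# Hypothetical HTML, made up purely for this illustration.
html = """
<table>
  <tr><th>Date</th><td>1 January 2000</td></tr>
  <tr><th>Operator</th><td>Example Air</td></tr>
</table>
<p><b><a href="/wiki/Example_accident">Example accident</a></b></p>
"""
soup = BeautifulSoup(html, 'html.parser')

all_cells = soup.find_all('td')            # list-like result with every <td>
first_header = soup.find('th')             # first <th> in the document
next_cell = first_header.find_next('td')   # first <td> that comes after that <th>
missing = soup.find('h1')                  # None, because there is no <h1>

print(len(all_cells))
print(first_header.get_text())
print(next_cell.get_text())
```

Note that find() returns None when nothing matches, so it is worth checking the result before calling further methods on it.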

In [1]:
import requests
from bs4 import BeautifulSoup
import pandas as pd
import time

## Code Structure for Scrape.py
    (1) Define the starting point (the base webpage)
    (2) Use requests.get(<URL of the starting base webpage>) : to get an HTML document
    (3) Use BeautifulSoup(response.text, 'html.parser') : to extract data out of the HTML document
    (4) Use page.find_all('b') to get a list of all bold tags (adding [-1] would return only the last tag, not a list)
    (5) Loop (for) over the list of bold tags, follow the link (URL) inside each bold tag, and retrieve the data from the linked page
            - Use for b in bold_tags: b.find('a')
            - On each iteration, call a function 'get_data' to follow the link and get the data (it should return a dictionary)
                  - Check what a.attrs gives you (the link target is stored under a.attrs['href'])
                  - For each 'a' you will have to follow Steps (2) & (3) : get HTML document, use BS4 to extract data from it
                  - headers = ['Date', 'Operator', 'Flight origin', 'Destination', 'Fatalities']
                  - Insert the 'Name' of the accident in the dictionary to return under key = 'Name'
                  - Iterating over the list above, call another function 'get_accident_data'
                        - Use th = a_page.find('th', text=header) to find the table header cell (<th>) whose text equals the header (newer Beautiful Soup releases prefer string=header)
                        - Use th.find_next('td') to get the <td> cell that follows it, which holds the value
            - Append the result of the 'get_data' function to a larger list (this will be a list of dictionaries)


    (6) Finally, once the Loop over bold_tags ends, convert your list of dictionaries into a DataFrame
    (7) Drop duplicates : Use df.drop_duplicates(inplace=True)
    (8) Write the DataFrame onto a CSV file (without the Index) : Use df.to_csv('accidents.csv', index=False)
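
The parsing and DataFrame steps above could be sketched roughly as follows. This is only one possible layout, not the actual Scrape.py: it assumes that fetching and parsing are split apart, so get_data here takes an already downloaded HTML string rather than following the link itself. The sample HTML is made up, and string= is used in place of text=, which recent Beautiful Soup releases prefer.

```python
from bs4 import BeautifulSoup
import pandas as pd

# Header list taken from the notes above.
HEADERS = ['Date', 'Operator', 'Flight origin', 'Destination', 'Fatalities']

def get_accident_data(a_page, header):
    """Return the text of the <td> following the <th> labelled `header`, or None."""
    th = a_page.find('th', string=header)
    if th is None:
        return None
    td = th.find_next('td')
    return td.get_text(strip=True) if td is not None else None

def get_data(name, html):
    """Build one record (a dictionary) from the HTML of a single accident page."""
    a_page = BeautifulSoup(html, 'html.parser')
    record = {'Name': name}
    for header in HEADERS:
        record[header] = get_accident_data(a_page, header)
    return record

# Hypothetical page content, invented for this sketch.
sample_html = """
<table>
  <tr><th>Date</th><td>1 January 2000</td></tr>
  <tr><th>Operator</th><td>Example Air</td></tr>
  <tr><th>Fatalities</th><td>0</td></tr>
</table>
"""

# Two identical records, to show what drop_duplicates does.
records = [get_data('Example accident', sample_html),
           get_data('Example accident', sample_html)]

df = pd.DataFrame(records)          # step (6): list of dictionaries -> DataFrame
df.drop_duplicates(inplace=True)    # step (7)
csv_text = df.to_csv(index=False)   # step (8), returned as a string since no path is given
```

Headers that are missing from a page end up as None in the record, so one incomplete page does not stop the loop. An exact string match can fail when the <th> contains extra whitespace or nested tags, so the matching may need loosening on real pages.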

In [17]:
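## Step 1
# Note: this path is a Wikipedia article path (https://en.wikipedia.org), not an
# nytimes.com path, so with this base the request returns a "Page Not Found" page.
base = 'https://www.nytimes.com'
path = '/wiki/List_of_accidents_and_incidents_involving_commercial_aircraft'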

In [18]:
## Step 2
response = requests.get(base + path)
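
Before parsing, it can help to confirm that the request actually succeeded, because an error page parses without complaint and simply contains none of the expected tags. The sketch below uses a hand-built Response object as an offline stand-in for a failed request; the URL is a placeholder, not a real address.

```python
import requests

# Offline stand-in for a failed request (built by hand, not a real server reply).
fake = requests.Response()
fake.status_code = 404
fake.url = 'https://example.invalid/missing'

print(fake.ok)   # False for 4xx and 5xx status codes

try:
    fake.raise_for_status()   # raises requests.HTTPError for 4xx/5xx codes
    failed = False
except requests.HTTPError:
    failed = True

print(failed)
```

With a real request, calling response.raise_for_status() right after requests.get(...) stops the script at the point of failure instead of a later find_all() quietly returning an empty list.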

In [19]:
## Step 3
page = BeautifulSoup(response.text, 'html.parser')

In [20]:
## Step 4
page.find_all('b')

[]

In [21]:
page

<!DOCTYPE html>

<!--[if (gt IE 9)|!(IE)]> <!--> <html class="no-js " itemscope="" lang="en" xmlns:og="http://opengraphprotocol.org/schema/"> <!--<![endif]-->
<!--[if IE 9]> <html lang="en" class="no-js ie9 lt-ie10 " xmlns:og="http://opengraphprotocol.org/schema/"> <![endif]-->
<!--[if IE 8]> <html lang="en" class="no-js ie8 lt-ie10 lt-ie9 " xmlns:og="http://opengraphprotocol.org/schema/"> <![endif]-->
<!--[if (lt IE 8)]> <html lang="en" class="no-js lt-ie10 lt-ie9 lt-ie8 " xmlns:og="http://opengraphprotocol.org/schema/"> <![endif]-->
<head>
<title>Page Not Found</title>
<meta content="true" name="errorpage"/>
<meta content="IE=edge,chrome=1" http-equiv="X-UA-Compatible"/>
<meta content="nyt-v5" name="sourceApp"/>
<meta content="" id="foundation-build-id" name="foundation-build-id"/>
<meta content="404 - Not Found" name="errortype"/>
<meta content="" name="PST"/>
<!--[if (gt IE 9)|!(IE)]> <!-->
<link href="https://g1.nyt.com/assets/error/20180503-144802/css/error/styles.css" media="scr