### High level
What is web scraping?
> Web scraping (web harvesting or web data extraction) is a computer software technique of extracting information from websites.

What questions can you answer with web scraping?
- What TV shows are airing tonight?
- What are the names and prices of the first 5 results for X on eBay?
- How many words are on the wiki page for X?
- Has X been updated recently with this text?
- Is X band playing at Doug Fir any time soon?
- Is that [refurbished Baratza](http://www.baratza.com/product/encore-refurb/) in stock yet?
- Are tickets available for sale yet?


Ethics of web scraping
- https://news.ycombinator.com/item?id=12345693


### Tools
Name | Purpose
-----|--------
[Selector Gadget](http://selectorgadget.com/) | Find CSS selectors visually
[CSS selector cheat-sheet](http://www.cheetyr.com/css-selectors) | CSS selector reference
[BeautifulSoup4](http://beautiful-soup-4.readthedocs.io/en/latest/) | Parse HTML webpages with selectors
[requests](http://docs.python-requests.org/en/master/) | Connect to and download webpages (HTML)

## HTML
**HyperText Markup Language** 

It's the code that forms websites.  We won't be learning HTML today, but we'll learn enough to understand how we can navigate it.
 
#### HTML is made up of elements as its base components

Elements have structure:

![element structure](https://upload.wikimedia.org/wikipedia/commons/thumb/5/55/HTML_element_structure.svg/330px-HTML_element_structure.svg.png)



When nested inside each other, they give the document form.

![html structure](http://www.htmlgoodies.com/img/2007/06/page_container.gif)


This can also be viewed as a tree-like structure.  Here's the above when we only care about *children* and *ancestors*:
![html tree-like structure](http://www.htmlgoodies.com/img/2007/06/flowChart2.gif)
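
To make the tree idea concrete, here's a small sketch using BeautifulSoup (introduced below) on a made-up snippet of HTML.  The snippet and the variable names are just for illustration; the point is that every tag knows its parent, its ancestors and its children.

```python
import bs4

# A tiny, made-up document: <b> sits inside <p>, which sits inside <div>, and so on.
html = '<html><body><div><h1>Title</h1><p>Some <b>bold</b> text</p></div></body></html>'
soup = bs4.BeautifulSoup(html, 'html.parser')

bold = soup.b                      # first <b> tag in the document
print(bold.parent.name)            # the tag directly above <b>

# Walk up the tree: each ancestor of <b>, nearest first
print([tag.name for tag in bold.parents])

# Walk down the tree: the direct children of <div>
print([child.name for child in soup.div.children])
```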


In [86]:
import bs4
import requests

## Fetching the HTML

The first step is to actually get the website's HTML.  To do that, we'll be using the third-party *requests*\* module.
This simulates:
1. opening your browser
2. typing in the url you want to visit
3. selecting 'View Source'
4. copying the text
5. pasting it into a variable.

\* We could do this using just the standard library, but requests is popular enough that you'll encounter it often.
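
For the curious, here's roughly what the standard-library version might look like.  This is only a sketch: `fetch_text` is a name made up for this example, and it skips most of the conveniences (sessions, redirect handling options, friendlier errors) that make requests pleasant to use.

```python
from urllib.request import urlopen


def fetch_text(url, timeout=10):
    """Download a URL and return its body as text (hypothetical helper)."""
    with urlopen(url, timeout=timeout) as resp:
        # Use the charset the server declared, falling back to UTF-8
        charset = resp.headers.get_content_charset() or 'utf-8'
        return resp.read().decode(charset)


# Usage (needs network access):
# text = fetch_text('https://raw.githubusercontent.com/hassanshamim/python_foundations/master/README.md')
# print(text[:100])
```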

In [14]:
url = 'https://raw.githubusercontent.com/hassanshamim/python_foundations/master/README.md'
response = requests.get(url)

In [15]:
response # If you're not familiar with HTTP codes, this output might be totally useless.

<Response [200]>

In [16]:
help(response) # let's see what this *response object* can do.

Help on Response in module requests.models object:

class Response(builtins.object)
 |  The :class:`Response <Response>` object, which contains a
 |  server's response to an HTTP request.
 |  
 |  Methods defined here:
 |  
 |  __bool__(self)
 |      Returns true if :attr:`status_code` is 'OK'.
 |  
 |  __getstate__(self)
 |  
 |  __init__(self)
 |      Initialize self.  See help(type(self)) for accurate signature.
 |  
 |  __iter__(self)
 |      Allows you to use a response as an iterator.
 |  
 |  __nonzero__(self)
 |      Returns true if :attr:`status_code` is 'OK'.
 |  
 |  __repr__(self)
 |      Return repr(self).
 |  
 |  __setstate__(self, state)
 |  
 |  close(self)
 |      Releases the connection back to the pool. Once this method has been
 |      called the underlying ``raw`` object must not be accessed again.
 |      
 |      *Note: Should not normally need to be called explicitly.*
 |  
 |  iter_content(self, chunk_size=1, decode_unicode=False)
 |      Iterates over the res

In [18]:
response.status_code

200

In [17]:
response.ok # Did the website/server respond properly?

True
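
If the numeric codes don't mean much to you yet, the standard library can translate them.  A small sketch (the helper name `describe_status` is made up for this example):

```python
from http import HTTPStatus


def describe_status(code):
    """Turn a numeric HTTP status code into something readable."""
    try:
        status = HTTPStatus(code)
    except ValueError:
        # Not every integer is a registered HTTP status
        return f'{code} (not a registered status code)'
    return f'{status.value} {status.phrase}'


for code in (200, 301, 404, 500):
    print(describe_status(code))
```

Roughly speaking, codes in the 200s mean success, 300s are redirects, 400s mean something was wrong with our request, and 500s mean the server had a problem.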

The following result is Markdown, not HTML.  Why?
The page we requested was a plain-text Markdown file (`README.md`), not an HTML page.

In [19]:
response.text # the contents.  In this example it's Markdown, not HTML.

'# [Python Foundations](http://www.hackoregon.org/beginner-ish)\n\n**Summer 2016**\xa0- (Oct 11th - December 8th)\n\n**Falcon Building**\xa0- 321 NW Glisan St, Portland, OR 97209\xa0 Tuesday and Thursday evenings 6-9 PM\n\n**Instructor:**\xa0Hassan Shamim\xa0*contact info distributed in class*\n\n**Office Hours:**\xa0TBD\n\n------\n\n\n\n## About\n\nThe purpose of this course is to introduce students to the Python programming language. \xa0We will also cover general programming concepts, methodologies, tools and vocabulary. \xa0Towards the end of the course we will apply what we’ve learned to some common, tedious tasks so we may leverage our newfound programming skills to solve real-world problems.\n\n**Note**: This course draws heavily from [Automate the Boring Stuff with Python](https://automatetheboringstuff.com/).  This book available for free online and for purchase in print.  While we shall follow the general outline of the book, some course content shall differ.  I\'ll do my bes

In [20]:
'Core Principles' in response.text

True

So let's try a real web page!

In [3]:
response2 = requests.get('http://www.hackoregon.org/upcoming-courses')
response2.ok

True

In [37]:
response2.text

'<!doctype html>\n<html xmlns:og="http://opengraphprotocol.org/schema/" xmlns:fb="http://www.facebook.com/2008/fbml" xmlns:website="http://ogp.me/ns/website" lang="en-US" itemscope itemtype="http://schema.org/WebPage"  class="touch-styles">\n  <head>\n    <meta http-equiv="X-UA-Compatible" content="IE=edge,chrome=1">\n    \n    <meta name="viewport" content="width=device-width, initial-scale=1">\n    \n    <!-- This is Squarespace. --><!-- catherine-nikolovski -->\n<base href="">\n<meta charset="utf-8" />\n<title>Upcoming Courses — Hack Oregon</title>\n<link rel="shortcut icon" type="image/x-icon" href="/favicon.ico"/>\n<link rel="canonical" href="http://www.hackoregon.org/upcoming-courses/"/>\n<meta property="og:site_name" content="Hack Oregon"/>\n<meta property="og:title" content="Upcoming Courses"/>\n<meta property="og:url" content="http://www.hackoregon.org/upcoming-courses/"/>\n<meta property="og:type" content="website"/>\n<meta itemprop="name" content="Upcoming Courses"/>\n<meta 

In [39]:
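soup = bs4.BeautifulSoup(response2.text, 'html.parser') # parse the raw HTML so we can pretty-print it
print(soup.prettify())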

<!DOCTYPE doctype html>
<html class="touch-styles" itemscope="" itemtype="http://schema.org/WebPage" lang="en-US" xmlns:fb="http://www.facebook.com/2008/fbml" xmlns:og="http://opengraphprotocol.org/schema/" xmlns:website="http://ogp.me/ns/website">
 <head>
  <meta content="IE=edge,chrome=1" http-equiv="X-UA-Compatible">
   <meta content="width=device-width, initial-scale=1" name="viewport">
    <!-- This is Squarespace. -->
    <!-- catherine-nikolovski -->
    <base href="">
     <meta charset="utf-8"/>
     <title>
      Upcoming Courses — Hack Oregon
     </title>
     <link href="/favicon.ico" rel="shortcut icon" type="image/x-icon"/>
     <link href="http://www.hackoregon.org/upcoming-courses/" rel="canonical"/>
     <meta content="Hack Oregon" property="og:site_name"/>
     <meta content="Upcoming Courses" property="og:title"/>
     <meta content="http://www.hackoregon.org/upcoming-courses/" property="og:url"/>
     <meta content="website" property="og:type"/>
     <meta content=

YAY! It's working.  But what if the thing we're getting isn't text?  What if it's an image?

Well, that's out of scope for today, but the general process is:
- get response from image url - `requests.get('http://www.website.com/file/image.jpeg')`
- get the binary data out, **not** the text - `response.content`
- save it to a file or render it as an image in Python
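
The steps above could be sketched like this.  `save_binary` is a made-up helper name, and the URL in the usage comment is the placeholder from the list, not a real image.

```python
from pathlib import Path


def save_binary(response, path):
    """Write the raw bytes of a response to a file and return the path."""
    path = Path(path)
    # response.content is bytes; response.text would try to decode it as text
    path.write_bytes(response.content)
    return path


# Usage (needs network access and a real image URL):
# import requests
# response = requests.get('http://www.website.com/file/image.jpeg')
# if response.ok:
#     save_binary(response, 'image.jpeg')
```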

## Finding the Data we want
If we want all the dates on a webpage, we can't just search for 'dates'.
We either:
- have to know **where the dates occur** consistently in the webpage (structurally)
- have to know **how the dates are marked** (are they all in an element with a certain keyword, like 'arrival-date'?)
- or have to know **how the dates are formatted**, and look for everything that follows that format (e.g. some numbers, a slash, more numbers, another slash, then more numbers; this is what regular expressions do)

We'll be using a combination of the first two, with some help from Selector Gadget.
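
We won't lean on the third approach today, but here's a quick sketch of what it looks like with the standard-library `re` module.  The pattern and the sample sentence are made up for this example, and the pattern only knows about dates written as `month/day/year` with slashes.

```python
import re

# one or two digits, a slash, one or two digits, a slash, two to four digits
DATE_PATTERN = re.compile(r'\b\d{1,2}/\d{1,2}/\d{2,4}\b')

sample = 'Class starts 10/11/2016 and ends 12/8/2016. Room 3/4 is closed.'
dates = DATE_PATTERN.findall(sample)
print(dates)  # 'Room 3/4' is skipped because it has only one slash
```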



### Beautiful Soup cheatsheet

**NOTE**: the traversal methods (select, find, .h3) can be used on tags as well as the whole soup

command | what it does
--------|------------
bs4.BeautifulSoup(data, 'html.parser') | creates our soup object that we use to scan the document
soup.find_all | return a *list* of tag objects that match our query
soup.find | returns the *first* tag object that matches our query
soup.select | uses **css selectors** to query our data.  returns a *list* of every match
soup.select_one | same as above, but returns only the *first* match
soup.h3 | returns the first h3 tag matched.  same as `soup.find('h3')`  Works for any tag name
tag.text | returns text inside
tag.get_text() | fetches inner text ignoring any tags
tag.stripped_strings | returns a *generator* of component strings with whitespace removed.  Pass to `list()` to get a list object from the generator
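
Here's a quick tour of the cheatsheet on a tiny, made-up snippet of HTML, so you can try the calls without hitting a website.  The markup is invented for the example.

```python
import bs4

html = '''
<div id="course">
  <h2>Applied Data Visualization</h2>
  <h3>LEVEL: Advanced</h3>
  <h3>DURATION: <strong>8 WEEKS</strong></h3>
</div>
'''
soup = bs4.BeautifulSoup(html, 'html.parser')

first_h3 = soup.find('h3')               # first match only (same tag as soup.h3)
all_h3 = soup.find_all('h3')             # list of every match
by_css = soup.select('#course h3')       # CSS selector, list of every match
one_css = soup.select_one('#course h2')  # CSS selector, first match only

print(first_h3.get_text())
print(len(all_h3), len(by_css))
print(one_css.get_text())
print(all_h3[1].get_text())                # inner text, with the <strong> tag ignored
print(list(all_h3[1].stripped_strings))    # one string per piece of text
```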


### Hack University Example

In [11]:
#selector = '.span-6 h2 , .span-8 h2 , .span-7 h2 , .span-7 strong'
selector = 'h3, #block-yui_3_17_2_5_1480111306117_36598 h3'

soup = bs4.BeautifulSoup(response2.text, 'html.parser')

In [34]:
result = soup.select(selector)
result

[<h3>LEVEL: BEGINNER</h3>,
 <h3>START DATE: FEB 8TH, TUES + THURS, 6-9PM</h3>,
 <h3>DURATION: 6 WEEKS</h3>,
 <h3>COST $850</h3>,
 <h3>INSTRUCTOR: DAVID ZULAICA</h3>,
 <h3><strong></strong></h3>,
 <h3><strong>DESCRIPTION: </strong></h3>,
 <h3><strong>WHO SHOULD TAKE IT?</strong></h3>,
 <h3>LEVEL: Intermediate</h3>,
 <h3>START DATE: JAN 23rd, Mon + Wed, 6-9pm</h3>,
 <h3>DURATION: 8 WEEKS</h3>,
 <h3>React Office Hours: +$250, Tues + Thurs, 6-9pm</h3>,
 <h3 dir="ltr">Instructor: Andrew Brenwald</h3>,
 <h3> </h3>,
 <h3><strong><span style="font-size:14.6667px"></span></strong></h3>,
 <h3><strong><span style="font-size:14.6667px">DESCRIPTION</span></strong></h3>,
 <h3><strong><span style="font-size:14.6667px">WHO SHOULD TAKE IT?</span></strong></h3>,
 <h3><strong><span style="font-size:14.6667px">HOW DO I KNOW I’M READY?</span></strong></h3>,
 <h3><strong><span style="font-size:14.6667px">WHAT'S OFFICE HOURS?</span></strong></h3>,
 <h3>LEVEL: Advanced</h3>,
 <h3>START DATE: JAN 23RD, MON + W

In [15]:
result

[<h3>LEVEL: BEGINNER</h3>,
 <h3>START DATE: FEB 8TH, TUES + THURS, 6-9PM</h3>,
 <h3>DURATION: 6 WEEKS</h3>,
 <h3>COST $850</h3>,
 <h3>INSTRUCTOR: DAVID ZULAICA</h3>,
 <h3><strong></strong></h3>,
 <h3><strong>DESCRIPTION: </strong></h3>,
 <h3><strong>WHO SHOULD TAKE IT?</strong></h3>,
 <h3>LEVEL: Intermediate</h3>,
 <h3>START DATE: JAN 23rd, Mon + Wed, 6-9pm</h3>]

In [17]:
for tag in result:
    print(type(tag), tag.name, tag.get_text(), sep=',  ')

<class 'bs4.element.Tag'>,  h3,  LEVEL: BEGINNER
<class 'bs4.element.Tag'>,  h3,  START DATE: FEB 8TH, TUES + THURS, 6-9PM
<class 'bs4.element.Tag'>,  h3,  DURATION: 6 WEEKS
<class 'bs4.element.Tag'>,  h3,  COST $850
<class 'bs4.element.Tag'>,  h3,  INSTRUCTOR: DAVID ZULAICA
<class 'bs4.element.Tag'>,  h3,  
<class 'bs4.element.Tag'>,  h3,  DESCRIPTION: 
<class 'bs4.element.Tag'>,  h3,  WHO SHOULD TAKE IT?
<class 'bs4.element.Tag'>,  h3,  LEVEL: Intermediate
<class 'bs4.element.Tag'>,  h3,  START DATE: JAN 23rd, Mon + Wed, 6-9pm


In [18]:
t = result[0]

In [19]:
type(t)

bs4.element.Tag

In [23]:
t

<h3>LEVEL: BEGINNER</h3>

In [24]:
list(t.stripped_strings)

['LEVEL: BEGINNER']

In [29]:
[r.get_text() for r in result if r.get_text().startswith('START DATE:')]

['START DATE: FEB 8TH, TUES + THURS, 6-9PM',
 'START DATE: JAN 23rd, Mon + Wed, 6-9pm']

In [35]:
for tag in result:
    text = tag.get_text()
    if text.startswith('START DATE:'):
        print(text)

START DATE: FEB 8TH, TUES + THURS, 6-9PM
START DATE: JAN 23rd, Mon + Wed, 6-9pm
START DATE: JAN 23RD, MON + WED, 6-9PM
START DATE: JAN 23RD, tues + thurs, 6-9PM
START DATE: JAN 10TH, TUES + THURS, 6-9PM
START DATE: JAN 23RD, TUES + THURS, 6-9PM


In [40]:
test = bs4.BeautifulSoup('<div class="sqs-block-content" id="yui_3_17_2_1_1480579287682_383"><h2 id="yui_3_17_2_1_1480579287682_382">Applied Data Visualization</h2><h3>LEVEL: Advanced</h3><h3>START DATE: JAN 23RD, MON + WED, 6-9PM</h3><h3>DURATION: 8 WEEKS</h3><h3>COST $850</h3><h3>REACT OFFICE HOURS: +$250, TUES + THURS, 6-9PM</h3><h3>Instructor: David Daniel</h3></div>', 'html.parser')

In [42]:
print(test.prettify())

<div class="sqs-block-content" id="yui_3_17_2_1_1480579287682_383">
 <h2 id="yui_3_17_2_1_1480579287682_382">
  Applied Data Visualization
 </h2>
 <h3>
  LEVEL: Advanced
 </h3>
 <h3>
  START DATE: JAN 23RD, MON + WED, 6-9PM
 </h3>
 <h3>
  DURATION: 8 WEEKS
 </h3>
 <h3>
  COST $850
 </h3>
 <h3>
  REACT OFFICE HOURS: +$250, TUES + THURS, 6-9PM
 </h3>
 <h3>
  Instructor: David Daniel
 </h3>
</div>


In [47]:
test.h2

<h2 id="yui_3_17_2_1_1480579287682_382">Applied Data Visualization</h2>

In [50]:
test.find_all('h3')

[<h3>LEVEL: Advanced</h3>,
 <h3>START DATE: JAN 23RD, MON + WED, 6-9PM</h3>,
 <h3>DURATION: 8 WEEKS</h3>,
 <h3>COST $850</h3>,
 <h3>REACT OFFICE HOURS: +$250, TUES + THURS, 6-9PM</h3>,
 <h3>Instructor: David Daniel</h3>]

In [58]:
res = test.find(['h2'])

In [59]:
test.select('#yui_3_17_2_1_1480579287682_382')

[<h2 id="yui_3_17_2_1_1480579287682_382">Applied Data Visualization</h2>]

In [60]:
result = soup.select('.span-6 h2 , .span-8 h2 , .span-7 h2 , .span-7 strong')
[tag.text for tag in result if tag.text]

['DIGITIAL MARKETING AND BRANDING',
 '3D WITH UNITY',
 'Data Science in the Wild',
 'DEV OPS',
 'Backend Integration',
 'MODERN CSS',
 'ReactJS',
 'Applied Data Visualization',
 'MODERN CSS']

In [63]:
res

<h2 id="yui_3_17_2_1_1480579287682_382">Applied Data Visualization</h2>

In [68]:
res.next_sibling

<h3>LEVEL: Advanced</h3>
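
`next_sibling` works nicely here because the test HTML has no whitespace between tags.  On real pages the next sibling is often just a newline string, so `find_next_siblings` tends to be safer.  Here's a sketch that collects the `KEY: value` headings following a course title into a dictionary; the HTML is a trimmed copy of the test block above, and the dictionary layout is just one way to do it.

```python
import bs4

html = ('<div><h2>Applied Data Visualization</h2>'
        '<h3>LEVEL: Advanced</h3>'
        '<h3>DURATION: 8 WEEKS</h3>'
        '<h3>COST $850</h3></div>')
block = bs4.BeautifulSoup(html, 'html.parser')

title = block.h2.get_text()
details = {}
for h3 in block.h2.find_next_siblings('h3'):   # only tags, never whitespace strings
    key, sep, value = h3.get_text().partition(':')
    if sep:                                    # skip headings without a colon, like 'COST $850'
        details[key.strip().title()] = value.strip()

print(title)
print(details)
```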

### Yelp Example

In [94]:
yelp_url = 'https://www.yelp.com/search?find_desc=pizza&find_loc=Portland'

In [70]:
yelp_page = requests.get(yelp_url)

In [71]:
yelp_page.status_code

200

In [72]:
selector = '.js-analytics-click span'

In [73]:
yelp_soup = bs4.BeautifulSoup(yelp_page.text, 'html.parser') # name the parser explicitly to avoid a warning


In [79]:
result = yelp_soup.select(selector)
result

[<span>Pizza Factory</span>,
 <span>Life of Pie Pizza</span>,
 <span>Apizza Scholls</span>,
 <span>Scottie’s Pizza Parlor</span>,
 <span>Sizzle Pie</span>,
 <span>Baby Doll Pizza</span>,
 <span>Red Sauce Pizza</span>,
 <span>East Glisan Pizza Lounge</span>,
 <span>Escape From New York Pizza</span>,
 <span>The Pocket Pub</span>,
 <span>Giants NY Pizza &amp; Subs</span>]

In [84]:
t = result[0]
t.get_text()

'Pizza Factory'

In [85]:
all_pizza_names = [pizza.get_text() for pizza in result]
all_pizza_names

['Pizza Factory',
 'Life of Pie Pizza',
 'Apizza Scholls',
 'Scottie’s Pizza Parlor',
 'Sizzle Pie',
 'Baby Doll Pizza',
 'Red Sauce Pizza',
 'East Glisan Pizza Lounge',
 'Escape From New York Pizza',
 'The Pocket Pub',
 'Giants NY Pizza & Subs']

In [92]:
def get_biznames(url):
    page = requests.get(url)
    soup = bs4.BeautifulSoup(page.text, 'html.parser')
    selector = '.biz-name'
    results = soup.select(selector)
    names = [tag.get_text() for tag in results]
    return names
yelp_url

'https://www.yelp.com/search?find_desc=pizza&find_loc=Portland'

In [93]:
get_biznames('https://www.yelp.com/search?find_desc=poodles&find_loc=Paris')

['Au Paradis Canin',
 'Animalis',
 'Salon Pluche',
 'Toilettage Au Poil !',
 'Dog in the City',
 'Dog’s Store',
 'Chats Comme Chiens',
 'Canicrèche',
 'Prima Toilettage',
 'MiaouWaou']

In [95]:
def get_soup(url):
    page = requests.get(url)
    soup = bs4.BeautifulSoup(page.text, 'html.parser')
    return soup


In [96]:
yelp_url = 'https://www.yelp.com/search?find_desc=poodles&find_loc=Paris'
soup = get_soup(yelp_url)

In [124]:
result = soup.select('a.biz-name')
t = result[0]

In [111]:
print(t.prettify())

<a class="biz-name js-analytics-click" data-analytics-label="biz-name" data-hovercard-id="pQKVA8h_bZReyUlvtnyQwA" href="/biz/au-paradis-canin-paris?osq=poodles">
 <span>
  Au Paradis Canin
 </span>
</a>



In [121]:
'http://www.yelp.com' + t.get('href')

'http://www.yelp.com/biz/au-paradis-canin-paris?osq=poodles'
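
Gluing strings together works, but the standard library's `urljoin` handles the details for us (keeping the scheme, coping with hrefs that are already absolute).  A small sketch using the href we just pulled out:

```python
from urllib.parse import urljoin

page_url = 'https://www.yelp.com/search?find_desc=poodles&find_loc=Paris'
href = '/biz/au-paradis-canin-paris?osq=poodles'

# Resolve the relative link against the page it was found on
absolute = urljoin(page_url, href)
print(absolute)

# An href that is already absolute comes back unchanged
print(urljoin(page_url, 'https://example.com/elsewhere'))
```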

In [122]:
yelp_soup.body.find_all('li', {'class': 'regular-search-result'})

[<li class="regular-search-result">
 <div class="search-result natural-search-result" data-key="1">
 <div class="biz-listing-large">
 <div class="main-attributes">
 <div class="media-block media-block--12">
 <div class="media-avatar">
 <div class="photo-box pb-90s">
 <a class="js-analytics-click" data-analytics-label="biz-photo" href="/biz/life-of-pie-pizza-portland?osq=pizza">
 <img alt="Life of Pie Pizza" class="photo-box-img" height="90" src="https://s3-media4.fl.yelpcdn.com/bphoto/vCRvwVkKB26ud3BfqnnJUQ/90s.jpg" width="90">
 </img></a>
 </div>
 </div>
 <div class="media-story">
 <h3 class="search-result-title">
 <span class="indexed-biz-name">1.         <a class="biz-name js-analytics-click" data-analytics-label="biz-name" data-hovercard-id="E_gJtcdekNi8vLI3-XcI_Q" href="/biz/life-of-pie-pizza-portland?osq=pizza"><span>Life of Pie Pizza</span></a>
 </span>
 </h3>
 <div class="biz-rating biz-rating-large clearfix">
 <div class="i-stars i-stars--regular-4-half rating-large" title="4.

In [None]:
yelp_soup.body.find_all('li', class_='regular-search-result')

In [None]:
result = yelp_soup.body.select('li.regular-search-result')

In [None]:
r = result[0]

In [None]:
r.find('a', class_='biz-name').span.text

In [None]:
r.select_one('a.biz-name span').text

In [None]:
r.select_one('div.i-stars').get('title').split()
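
Putting the pieces together, here's a sketch that loops over every search result and pulls out a name and link.  The HTML is a heavily trimmed version of the listing above, plus one made-up entry with no link, to show that `select_one` returns `None` when nothing matches.

```python
import bs4

html = '''
<ul>
  <li class="regular-search-result">
    <a class="biz-name" href="/biz/life-of-pie-pizza-portland?osq=pizza"><span>Life of Pie Pizza</span></a>
  </li>
  <li class="regular-search-result">
    <span>An entry with no biz-name link (made up for this example)</span>
  </li>
</ul>
'''
soup = bs4.BeautifulSoup(html, 'html.parser')

listings = []
for li in soup.select('li.regular-search-result'):
    link = li.select_one('a.biz-name')
    if link is None:       # nothing matched inside this result, so skip it
        continue
    listings.append({'name': link.get_text(strip=True), 'href': link.get('href')})

print(listings)
```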

In [123]:
list(r.address.stripped_strings)

#### Pagination

After we hit 'next' on the Yelp search page, we get the second page of results.  The URL looks like this:

`https://www.yelp.com/search?find_desc=pizza&find_loc=Portland&start=10`

It's the same as our original URL, but notice the **&start=10**.  This is called a **query parameter**.  It's a key/value pair (in this case *start* and *10* respectively) that Yelp uses to find and create the page we're looking for.

We can manually or programmatically adjust this to get the page we want.  Alternatively, we could find the 'next' button every time and follow that link.
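
Here's a sketch of the programmatic approach.  `page_urls` is a made-up helper, and the 10-results-per-page step is an assumption based on the second page starting at `start=10`.

```python
from urllib.parse import urlencode


def page_urls(base, query, results_per_page, pages):
    """Yield one search URL per page by bumping the 'start' parameter."""
    for page in range(pages):
        params = dict(query, start=page * results_per_page)
        yield base + '?' + urlencode(params)


urls = list(page_urls('https://www.yelp.com/search',
                      {'find_desc': 'pizza', 'find_loc': 'Portland'},
                      results_per_page=10, pages=3))
for url in urls:
    print(url)
```

If you loop over these with `requests.get`, it's polite to pause between requests (for example with `time.sleep`) so you don't hammer the site.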

In [None]:
hundreth_page = requests.get('https://www.yelp.com/search?find_desc=pizza&find_loc=Portland+OR&start=100')

In [None]:
# Same request as above, but requests builds and encodes the query string for us
params = {'find_desc': 'pizza', 'find_loc': 'Portland OR', 'start': 100}
hundreth_page = requests.get('https://www.yelp.com/search', params=params)

In [None]:
hundreth_page.url

In [None]:
requests.utils.urlparse('https://www.yelp.com/search?find_desc=pizza&start=100&find_loc=Portland+OR').query
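
The query comes back as one long string.  To get at the individual key/value pairs, the standard library's `parse_qs` can split it into a dictionary.  A small sketch:

```python
from urllib.parse import urlparse, parse_qs

url = 'https://www.yelp.com/search?find_desc=pizza&start=100&find_loc=Portland+OR'

query = parse_qs(urlparse(url).query)
print(query)

# Each value is a list, because a key may appear more than once in a URL
print(query['find_loc'][0])
print(int(query['start'][0]))
```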

In [None]:
hundreth_page.ok
soup = bs4.BeautifulSoup(hundreth_page.text, 'html.parser')

In [None]:
soup.select('li.regular-search-result')

In [None]:
soup.select('#super-container > div > div > div > div > h3')

### Wikipedia Example

In [None]:
wiki_page_url = 'https://en.wikipedia.org/wiki/ISO_4217'
wiki_html = requests.get(wiki_page_url).text
wsoup = bs4.BeautifulSoup(wiki_html, 'html.parser')

In [None]:
wsoup.find_all('table')

In [None]:
wsoup.select_one('#Active_codes').parent.next_sibling.next_sibling.next_sibling.next_sibling

In [None]:
currency_table = wsoup.select_one('h2 + p + table')

In [None]:
row = currency_table.select('tr')[1]
row

In [None]:
wsoup.select('h2 + p + table tr')[1]

In [None]:
row.find_all('td')
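
Once we have rows and cells, turning a table into Python data is mostly a loop.  Here's a sketch on a tiny hand-written table; the markup is made up for this example and is much simpler than the real Wikipedia page.

```python
import bs4

html = '''
<table>
  <tr><th>Code</th><th>Num</th><th>Currency</th></tr>
  <tr><td>EUR</td><td>978</td><td>Euro</td></tr>
  <tr><td>USD</td><td>840</td><td>United States dollar</td></tr>
</table>
'''
table = bs4.BeautifulSoup(html, 'html.parser').table

# The header row gives us the keys for each record
header = [th.get_text(strip=True) for th in table.find_all('th')]

rows = []
for tr in table.find_all('tr')[1:]:            # skip the header row
    cells = [td.get_text(strip=True) for td in tr.find_all('td')]
    rows.append(dict(zip(header, cells)))

for row in rows:
    print(row)
```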

In [None]:
requests.utils.quote('Portland, OR')

## Practice:
- Write a script to play the Wikipedia game.
- Write a script to download all the comics from xkcd.
- Write a function that pulls the current weather.
- Just think of a website you use often and play around.

### Additional References:
- https://automatetheboringstuff.com/chapter11/
- [HTTP Status Codes](http://www.restapitutorial.com/httpstatuscodes.html)