# Web scraping
A common data collection task is to gather data from web pages and transform it into an analysis-ready format. In this exercise, you'll scrape the [Informatics Course Information](https://www.washington.edu/students/crscat/info.html) page to answer some basic questions about the courses offered. To do so, we'll use the [Beautiful Soup](https://www.crummy.com/software/BeautifulSoup/bs4/doc/) package, which should have been included in the Anaconda Python distribution. A great starting point for understanding the package is [this tutorial](https://www.dataquest.io/blog/web-scraping-tutorial-python/).

## Setup
To use a package in a script (including packages installed as part of the Anaconda distribution), you need to `import` it. There are several approaches: you can import an entire package, or import only some of the names (functions or classes) it defines. It's common to import a package under an abbreviation for easier use, with this syntax:

```python
# Import a package as an abbreviation
import requests as r

# Only import some names (here, two classes) as abbreviations
from bs4 import BeautifulSoup as bs, SoupStrainer as ss
```

```python
# Import the `requests` and `BeautifulSoup` packages using the code above
```

## Scraping content

```python
# Use the `get` function of the `requests` library to fetch the page content
```

```python
# Use `bs` to parse the returned HTML
```
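A minimal sketch of the fetch-and-parse steps. Fetching is shown only in comments, using the `r` alias from above; so that the example runs offline, it parses a small hard-coded HTML string instead. That markup is a hypothetical stand-in, not the real structure of the catalog page, so inspect the page's HTML before relying on any tag names.

```python
from bs4 import BeautifulSoup as bs

# Fetching the live page would look roughly like this (not run here):
#   import requests as r
#   response = r.get("https://www.washington.edu/students/crscat/info.html")
#   response.raise_for_status()  # stop early on HTTP errors
#   html = response.text
# Hypothetical snippet standing in for response.text:
html = """
<html><body>
  <h1>Informatics</h1>
  <p><b>INFO 101 Sample Course A (5)</b><br>Description of course A.</p>
</body></html>
"""

# "html.parser" is Python's built-in parser, so no extra parser install is needed
soup = bs(html, "html.parser")

# Tags can be reached as attributes; get_text() returns their text content
print(soup.h1.get_text())
print(soup.b.get_text())
```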

```python
# We can now use the `find_all` method to find all course title elements
# Store the *text* of the course titles in a variable
# Hint: you'll need to review the page's HTML to figure out how to identify them
# Hint: use a list comprehension!
```
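One way to collect the title text, assuming (hypothetically) that each course title sits in a `<b>` tag; the real page may use different markup. `find_all` returns a list of matching tags, and the list comprehension keeps only their text:

```python
from bs4 import BeautifulSoup as bs

# Hypothetical markup: each course is a <p> whose <b> tag holds the title line
html = """
<p><b>INFO 101 Sample Course A (5)</b><br>Description of course A.</p>
<p><b>INFO 310 Sample Course B (3-5)</b><br>Description of course B.</p>
<p><b>INFO 450 Sample Course C (1-5)</b><br>Description of course C.</p>
"""
soup = bs(html, "html.parser")

# Keep the *text* of each matching tag, not the Tag objects themselves
course_titles = [tag.get_text() for tag in soup.find_all("b")]
print(course_titles)
```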

```python
# We can now use the `find_all` method to find all course description elements
# Store the *text* of the course descriptions in a variable
# Hint: you'll need to review the page's HTML to figure out how to identify them
# Hint: you may have to skip certain elements...
```
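A sketch of the description step under the same hypothetical markup, where a description is the text that follows the title inside each `<p>`. It illustrates one kind of skipping the hint may refer to: `<p>` elements without a title (here, an invented footer) are not course entries and are passed over. The real page may need different rules.

```python
from bs4 import BeautifulSoup as bs

# Hypothetical markup, including a <p> that is not a course entry
html = """
<p><b>INFO 101 Sample Course A (5)</b><br>Description of course A.</p>
<p><b>INFO 310 Sample Course B (3-5)</b><br>Description of course B.</p>
<p>Hypothetical page footer.</p>
"""
soup = bs(html, "html.parser")

course_descriptions = []
for p in soup.find_all("p"):
    if p.b is None:
        continue  # skip paragraphs with no title, i.e. not course entries
    # Everything in the <p> except the title text is the description
    title = p.b.get_text()
    course_descriptions.append(p.get_text().replace(title, "", 1).strip())

print(course_descriptions)
```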

## Data processing
Now that you have the data, we'll restructure it so that we can easily ask questions about it.

```python
# Create a dictionary where the *keys are course numbers* and the values are *dictionaries*
# with information about that course. Specifically, include the following values:
#     - "title": title of the course (from above)
#     - "description": description of the course (from above)
#     - "credits": the number of credits, as a string (some courses offer a range)
#     - "level": 100, 200, 300, or 400 (an *integer*)
# Hint: start with an empty dictionary and use a loop, keeping track of the *index* with the built-in `enumerate` function
# Hint: think of creative ways to get the credits/level from your string;
# the `.find` method can help you find characters in a string
```
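A sketch of the restructuring step, using hypothetical sample lists in place of the scraped titles and descriptions. It assumes each title looks like `"INFO 310 Sample Course B (3-5)"`: the course number is the second space-separated token, the credits sit in the last pair of parentheses (located with `.rfind` and `.find`), and the level is the number's first digit times 100. Real titles may break these assumptions, for example with extra text after the credits.

```python
# Hypothetical stand-ins for the scraped lists from the previous steps
course_titles = [
    "INFO 101 Sample Course A (5)",
    "INFO 310 Sample Course B (3-5)",
    "INFO 340 Sample Course C (5)",
    "INFO 450 Sample Course D (1-5)",
]
course_descriptions = [
    "Description of course A.",
    "Description of course B.",
    "Description of course C.",
    "Description of course D.",
]

courses = {}
for i, title in enumerate(course_titles):
    # Course number: the second token, e.g. "310" in "INFO 310 ..."
    number = title.split()[1]
    # Credits: the text between the last "(" and the next ")"
    start = title.rfind("(")
    end = title.find(")", start)
    courses[number] = {
        "title": title,
        "description": course_descriptions[i],
        "credits": title[start + 1:end],  # kept as a string, since some are ranges
        "level": int(number[0]) * 100,    # e.g. "310" -> 300
    }

print(courses["310"])
```

Keying by the number string (`"310"`) keeps lookups simple; storing `level` as an `int` is what lets the later steps compare against `300` directly.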

## Asking questions of the data
Now we can filter the dataset to ask questions of interest.

```python
# How many courses are 300-level courses?
# Hint: use a list comprehension!
```
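With the dictionary in place, counting 300-level courses is a filter plus `len`. The `courses` dictionary below is a hypothetical, trimmed-down sample containing only the fields this step needs:

```python
# Hypothetical sample in the shape built above (other fields omitted)
courses = {
    "101": {"title": "INFO 101 Sample Course A (5)", "level": 100},
    "310": {"title": "INFO 310 Sample Course B (3-5)", "level": 300},
    "340": {"title": "INFO 340 Sample Course C (5)", "level": 300},
    "450": {"title": "INFO 450 Sample Course D (1-5)", "level": 400},
}

# Keep the course numbers whose level is 300, then count them
level_300 = [number for number, info in courses.items() if info["level"] == 300]
print(len(level_300))  # 2
```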

```python
# Write a function that takes in your `courses` object and a course level (100, 200, etc.) and
# returns all of the *course titles* of courses at that level

# Make sure to use a docstring to document your function
```

```python
# Demonstrate that your function works by passing in the `courses` object and a course level
```
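A sketch of the function together with a quick demonstration. The name `titles_by_level` and the sample `courses` dictionary are hypothetical; the function only assumes that each value has `"title"` and `"level"` keys.

```python
def titles_by_level(courses, level):
    """Return the titles of all courses at a given level.

    Args:
        courses: dict mapping course numbers to dicts with at least
            "title" and "level" keys.
        level: the course level as an int, e.g. 100, 200, 300, or 400.

    Returns:
        A list of title strings, in the dictionary's insertion order.
    """
    return [info["title"] for info in courses.values() if info["level"] == level]


# Hypothetical sample data for the demonstration
courses = {
    "101": {"title": "INFO 101 Sample Course A (5)", "level": 100},
    "310": {"title": "INFO 310 Sample Course B (3-5)", "level": 300},
    "340": {"title": "INFO 340 Sample Course C (5)", "level": 300},
}

print(titles_by_level(courses, 300))
print(titles_by_level(courses, 200))  # no 200-level courses, so []
```

Passing `level` as an `int` matters here: calling `titles_by_level(courses, "300")` would return an empty list, because the levels are stored as integers.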