# Cleaning Quiz: Udacity's Course Catalog
It's your turn! Udacity's [course catalog page](https://www.udacity.com/courses/all) has changed since the last video was filmed. One notable change is the introduction of _schools_.

In this activity, you're going to perform similar actions with BeautifulSoup to extract the following information from each course listing on the page:
1. The course name - e.g. "Data Analyst"
2. The school the course belongs to - e.g. "School of Data Science"

**Note: All solution notebooks can be found by clicking on the Jupyter icon on the top left of this workspace.**

### Step 1: Get text from Udacity's course catalog web page
You can use the `requests` library to do this.

You may have to scroll down past the JavaScript and CSS in the output of the last cell in this section to see the text.

In [None]:
# import statements
import requests
from bs4 import BeautifulSoup

In [None]:
# fetch web page
r = requests.get("https://www.udacity.com/courses/all")

In [None]:
# display text from web page
print(r.text)

### Step 2: Use BeautifulSoup to remove HTML tags
Use `"lxml"` rather than `"html5lib"`.

Again, you may have to scroll down past the JavaScript and CSS in the output of the last cell in this section to see the text. **Alternatively,** you can run the following two lines right before running `soup.get_text()`:

```python
for script in soup(["script", "style"]):
    script.decompose()
```
Read more about this [here](https://stackoverflow.com/questions/22799990/beatifulsoup4-get-text-still-has-javascript).

In [12]:
BeautifulSoup?

In [8]:
# parse the page with the "lxml" parser, as recommended above
soup = BeautifulSoup(r.text, "lxml")
# kill all script and style elements
for script in soup(["script", "style"]):
    script.decompose()    # rip it out

print(soup.get_text())

Explore SchoolsBack to Menu Artificial Intelligence Back to MenuNanodegree Programs Machine Learning Engineer  AI Programming with Python  Deep Learning  Artificial Intelligence for Trading  Computer Vision  Natural Language Processing  Deep Reinforcement Learning  Artificial Intelligence  Data Science Back to MenuNanodegree Programs Data Analyst  Predictive Analytics for Business  Data Scientist  Business Analytics  Programming for Data Science  Programming and Development Back to MenuNanodegree Programs Introduction to Programming  Front End Web Developer  iOS Developer  Full Stack Web Developer  React  Mobile Web Specialist  VR Foundations  VR Mobile 360  VR High-Immersion  Learn Unreal VR Foundations  Blockchain Developer  Android Developer  Android Basics  Autonomous Systems Back to MenuNanodegree Programs Self Driving Car Engineer Nanodegree  Intro to Self-Driving Cars Nanodegree  Robotics Software Engineer Nanodegree  Flying Car and Autonomous Flight

### Step 3: Find all course summaries
Use BeautifulSoup's `find_all` method to select based on tag type and class name. Just like in the video, you can right-click on an item on the web page and click "Inspect" to view its HTML.

In [11]:
# Find all course summaries
summaries = soup.find_all("summary")
print('Number of Courses:', len(summaries))

Number of Courses: 0
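
A count of 0 means the selector matched nothing: `find_all("summary")` looks for tags literally named `<summary>`, not for course cards. The sketch below shows the tag-plus-class form on a small inline HTML string. The markup and the class name `course-card` are invented for illustration, so the real tag and class still have to be found with "Inspect". It uses the built-in `"html.parser"` only so that it runs without `lxml` installed.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: tag and class names are made up for this example.
sample_html = """
<div class="course-card"><h3><a href="#">Data Analyst</a></h3></div>
<div class="course-card"><h3><a href="#">React</a></h3></div>
<div class="footer">Not a course</div>
"""

sample_soup = BeautifulSoup(sample_html, "html.parser")

# Selecting by a tag name that does not occur in the document returns an empty list.
no_match = sample_soup.find_all("summary")

# Selecting by tag type AND class name returns only the matching elements.
cards = sample_soup.find_all("div", class_="course-card")

print("No match:", len(no_match), "| Cards:", len(cards))
```

The dictionary form `find_all("div", {"class": "course-card"})` is equivalent to the `class_` keyword used here.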


### Step 4: Inspect the first summary to find selectors for the course name and school
Tip: `.prettify()` is a helpful BeautifulSoup method that outputs HTML in a nicely indented form! Make sure to wrap it in `print()` so the whitespace is displayed properly.
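
As a rough illustration of why `print()` matters, the sketch below calls `.prettify()` on one element from an invented HTML string (the `course-card` class is hypothetical). Without `print()`, a notebook cell shows the string's `repr`, where newlines appear as `\n` on a single line.

```python
from bs4 import BeautifulSoup

# Hypothetical markup, written on one line on purpose.
sample_html = '<div class="course-card"><h3><a href="#">Data Analyst</a></h3><h4>School of Data Science</h4></div>'
card = BeautifulSoup(sample_html, "html.parser").find("div")

pretty = card.prettify()  # a string with one tag or text node per line

print(repr(pretty))  # roughly what a bare cell expression displays: escaped "\n"
print(pretty)        # indented, one element per line
```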

In [None]:
# print the first summary in summaries


Look for selectors that contain the course title and school name text you want to extract. Then, use the `select_one` method on the summary object to pull out the HTML with those selectors. Afterwards, don't forget to do some extra cleaning to isolate the names (get rid of unnecessary HTML), as you saw in the last video.
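
The pattern described above can be sketched on an invented snippet. The selectors `h3 a` and `h4` and the class `course-card` are assumptions made for this example, not the real page's structure, so substitute whatever "Inspect" shows you.

```python
from bs4 import BeautifulSoup

# Hypothetical markup with stray whitespace around the text we want.
sample_html = """
<div class="course-card">
  <h3><a href="#">
    Data Analyst
  </a></h3>
  <h4>  School of Data Science  </h4>
</div>
"""

card = BeautifulSoup(sample_html, "html.parser").select_one("div.course-card")

# select_one returns the first element matching a CSS selector (or None).
# get_text() drops the tags; strip() trims the surrounding whitespace.
title = card.select_one("h3 a").get_text().strip()
school = card.select_one("h4").get_text().strip()

print(title, "|", school)
```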

In [None]:
# Extract course title


In [None]:
# Extract school


### Step 5: Collect names and schools of ALL course listings
Reuse your code from the previous step, but now in a loop to extract the name and school from every course summary in `summaries`!
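
One possible shape for that loop, shown on invented markup rather than the real page (the `course-card` class and the `h3 a` / `h4` selectors are hypothetical). Because `select_one` returns `None` when nothing matches, the sketch skips cards that lack either piece instead of raising an `AttributeError`.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: the last card deliberately has no school.
sample_html = """
<div class="course-card"><h3><a href="#">Data Analyst</a></h3><h4>School of Data Science</h4></div>
<div class="course-card"><h3><a href="#">React</a></h3><h4>School of Programming</h4></div>
<div class="course-card"><h3><a href="#">Untitled Draft</a></h3></div>
"""

sample_soup = BeautifulSoup(sample_html, "html.parser")

sample_courses = []
for card in sample_soup.find_all("div", class_="course-card"):
    title_tag = card.select_one("h3 a")
    school_tag = card.select_one("h4")
    # Skip incomplete cards rather than calling get_text() on None.
    if title_tag is None or school_tag is None:
        continue
    sample_courses.append((title_tag.get_text().strip(), school_tag.get_text().strip()))

print(sample_courses)
```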

In [None]:
courses = []
for summary in summaries:
    # append name and school of each summary to courses list
    pass  # placeholder so the cell runs; replace with your extraction code


In [None]:
# display results
print(len(courses), "course summaries found. Sample:")
courses[:20]