# Cleaning Quiz: Udacity's Course Catalog
It's your turn! Udacity's [course catalog page](https://www.udacity.com/courses/all) has changed since the last video was filmed. One notable change is the introduction of _schools_.

In this activity, you're going to perform similar actions with BeautifulSoup to extract the following information from each course listing on the page:
1. The course name - e.g. "Data Analyst"
2. The school the course belongs to - e.g. "School of Data Science"

### Step 1: Get text from Udacity's course catalog web page
You can use the `requests` library to do this.

In [1]:
# import statements
import requests
from bs4 import BeautifulSoup

In [2]:
# fetch web page
r = requests.get("https://www.udacity.com/courses/all")
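A slightly more defensive way to fetch the page is sketched below. The helper name `fetch_html` and the 10-second default timeout are assumptions made for illustration, not part of the quiz. It relies only on the `timeout` argument of `requests.get` and on `Response.raise_for_status()`.

```python
import requests


def fetch_html(url, timeout=10):
    """Return the body of `url` as text, raising on network or HTTP errors.

    Hypothetical helper for illustration; the name and default timeout
    are assumptions, not part of the original quiz.
    """
    # Without a timeout, requests can wait indefinitely for a response.
    response = requests.get(url, timeout=timeout)
    # Raises requests.HTTPError if the server answered with a 4xx/5xx status.
    response.raise_for_status()
    return response.text
```

With this helper, the cell above could read `html = fetch_html("https://www.udacity.com/courses/all")`, and a failed request would raise an exception instead of silently handing an error page to the parser.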

### Step 2: Use BeautifulSoup to remove HTML tags
Pass `"lxml"` as the parser rather than `"html5lib"`.

In [3]:
soup = BeautifulSoup(r.text, "lxml")
# print(soup.get_text())
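To see what "removing HTML tags" means in isolation, here is a minimal sketch on a hand-written snippet. The HTML string is made up for illustration, and it uses Python's built-in `"html.parser"` so that it runs even without `lxml` installed. In the quiz itself, keep using `"lxml"`.

```python
from bs4 import BeautifulSoup

# Made-up HTML for illustration; not taken from the Udacity page.
html = "<html><body><h3>Data Analyst</h3><h4>School of Data Science</h4></body></html>"

# "html.parser" ships with Python, so no extra parser package is needed here.
demo_soup = BeautifulSoup(html, "html.parser")

# get_text() drops the tags and keeps only the text nodes;
# separator and strip control how the pieces are joined and trimmed.
text = demo_soup.get_text(separator="\n", strip=True)
print(text)
```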

### Step 3: Find all course summaries
Use BeautifulSoup's `find_all` method to select elements by tag type and class name. Just like in the video, you can right-click an item on the web page and click "Inspect" to view its HTML.

In [4]:
# Find all course summaries
summaries = soup.find_all("div", {"class":"course-summary-card"})
print('Number of Courses:', len(summaries))

Number of Courses: 225
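Note that searching by class matches any element whose `class` attribute *includes* that class, even when the element carries several other classes. The sketch below shows this on a made-up snippet (not the real page) and compares three equivalent ways of writing the query. It uses `"html.parser"` only so that it runs without `lxml`.

```python
from bs4 import BeautifulSoup

# Made-up HTML for illustration: two cards (one with extra classes) and one unrelated div.
html = """
<div class="course-summary-card row catalog-card"><h3>A</h3></div>
<div class="course-summary-card"><h3>B</h3></div>
<div class="other-card"><h3>C</h3></div>
"""
demo_soup = BeautifulSoup(html, "html.parser")

# Attribute dictionary, as used in the cell above.
by_dict = demo_soup.find_all("div", {"class": "course-summary-card"})
# Keyword form; the trailing underscore avoids clashing with Python's `class` keyword.
by_kwarg = demo_soup.find_all("div", class_="course-summary-card")
# CSS selector form.
by_css = demo_soup.select("div.course-summary-card")

print(len(by_dict), len(by_kwarg), len(by_css))
```

All three queries return the first two `div` elements and skip the third.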


### Step 4: Inspect the first summary to find selectors for the course name and school
Tip: `.prettify()` is a super helpful method BeautifulSoup provides to output HTML in a nicely indented form! Make sure to use `print()` so the whitespace is displayed properly.
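As a small illustration of this tip, the sketch below runs `.prettify()` on a made-up tag and shows why `print()` matters: without it, the notebook displays the string with literal `\n` escapes.

```python
from bs4 import BeautifulSoup

# Made-up HTML for illustration; "html.parser" is used so the sketch runs without lxml.
html = '<div class="card"><h3>Data Analyst</h3></div>'
tag = BeautifulSoup(html, "html.parser").div

compact = str(tag)       # everything on a single line
pretty = tag.prettify()  # one tag or string per line, indented by nesting depth

print(repr(pretty))  # what you see without print(): newlines appear as \n
print(pretty)        # what you see with print(): an indented tree
```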

In [5]:
# print the first summary in summaries
print(summaries[0].prettify())

<div _ngcontent-sc259="" class="course-summary-card row row-gap-medium catalog-card nanodegree-card ng-star-inserted">
 <ir-catalog-card _ngcontent-sc259="" _nghost-sc262="">
  <div _ngcontent-sc262="" class="card-wrapper is-collapsed">
   <div _ngcontent-sc262="" class="card__inner card mb-0">
    <div _ngcontent-sc262="" class="card__inner--upper">
     <div _ngcontent-sc262="" class="image_wrapper hidden-md-down">
      <a _ngcontent-sc262="" href="/course/intro-to-programming-nanodegree--nd000">
       <!-- -->
       <div _ngcontent-sc262="" class="image-container ng-star-inserted" style="background-image:url(https://d20vrrgs8k4bvw.cloudfront.net/images/degrees/nd000/nd-card.jpg);">
        <div _ngcontent-sc262="" class="image-overlay">
        </div>
       </div>
      </a>
      <!-- -->
     </div>
     <div _ngcontent-sc262="" class="card-content">
      <!-- -->
      <!-- -->
      <div _ngcontent-sc262="" class="category-wrapper">
       <span _ngcontent-sc262="" class="m

Look for selectors that contain the course title and school name text you want to extract. Then, use the `select_one` method on the summary object to pull out the HTML with those selectors. Afterwards, don't forget to do some extra cleaning to isolate the names (get rid of unnecessary HTML), as you saw in the last video.

In [6]:
# Extract course title
summaries[0].select_one("h3").get_text().strip()

'Introduction to Programming'

In [7]:
# Extract school
summaries[0].select_one("h4").get_text().strip()

'School of Development'

### Step 5: Collect names and schools of ALL course listings
Reuse your code from the previous step, but now in a loop to extract the name and school from every course summary in `summaries`!

In [8]:
courses = []
for summary in summaries:
    # append name and school of each summary to courses list
    title = summary.select_one("h3").get_text().strip()
    school = summary.select_one("h4").get_text().strip()
    courses.append((title, school))
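One thing to watch for in a loop like this: `select_one` returns `None` when nothing matches, so a single card without an `h4` would raise an `AttributeError` on `.get_text()`. The sketch below shows one way to guard against that. The helper name `text_or_none` and the sample HTML are assumptions made for illustration, and `"html.parser"` is used only so the snippet runs without `lxml`.

```python
from bs4 import BeautifulSoup


def text_or_none(tag, selector):
    """Return the stripped text of the first match for `selector`, or None.

    Hypothetical helper for illustration; not part of the original quiz.
    """
    found = tag.select_one(selector)
    return found.get_text().strip() if found is not None else None


# Made-up HTML: the second card deliberately has no <h4> school element.
html = """
<div class="course-summary-card"><h3> Data Analyst </h3><h4>School of Data Science</h4></div>
<div class="course-summary-card"><h3>React</h3></div>
"""
demo_summaries = BeautifulSoup(html, "html.parser").find_all(
    "div", {"class": "course-summary-card"}
)

# Same (title, school) tuples as in the loop above, but missing fields become None.
demo_courses = [
    (text_or_none(summary, "h3"), text_or_none(summary, "h4"))
    for summary in demo_summaries
]
print(demo_courses)
```

Whether to keep such entries with `None` or skip them is a judgment call that depends on what you plan to do with the list afterwards.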

In [9]:
# display results
print(len(courses), "course summaries found. Sample:")
courses[:20]

225 course summaries found. Sample:


[('Introduction to Programming', 'School of Development'),
 ('What is Programming?', 'School of Development'),
 ('Data Analyst', 'School of Data Science'),
 ('iOS Developer', 'School of Development'),
 ('Full Stack Web Developer', 'School of Development'),
 ('Predictive Analytics for Business', 'School of Business'),
 ('Machine Learning Engineer', 'School of Artificial Intelligence'),
 ('Self Driving Car Engineer Nanodegree', 'School of Autonomous Systems'),
 ('VR Developer', 'School of Development'),
 ('Digital Marketing', 'School of Business'),
 ('React', 'School of Development'),
 ('Mobile Web Specialist', 'School of Development'),
 ('Data Scientist', 'School of Data Science'),
 ('AI Programming with Python', 'School of Artificial Intelligence'),
 ('Business Analytics', 'School of Business'),
 ('Deep Learning', 'School of Artificial Intelligence'),
 ('Programming for Data Science', 'School of Data Science'),
 ('Intro to Self-Driving Cars', 'School of Autonomous Systems'),
 ('Learn U