# Cleaning Quiz: Udacity's Course Catalog
It's your turn! Udacity's [course catalog page](https://www.udacity.com/courses/all) has changed since the last video was filmed. One notable change is the introduction of _schools_.

In this activity, you're going to perform similar actions with BeautifulSoup to extract the following information from each course listing on the page:
1. The course name - e.g. "Data Analyst"
2. The school the course belongs to - e.g. "School of Data Science"

### Step 1: Get text from Udacity's course catalog web page
You can use the `requests` library to do this.

You may have to scroll down past the JavaScript and CSS in the output of the last cell in this section to see the text.

In [1]:
# import statements
import requests

In [2]:
# fetch web page
r = requests.get('https://www.udacity.com/courses/all')

In [3]:
# display text from web page
print(r.text)

<!DOCTYPE html><html><head>
  <meta charset="utf-8">
  <script type="text/javascript" class="ng-star-inserted">window.NREUM||(NREUM={}),__nr_require=function(t,n,e){function r(e){if(!n[e]){var o=n[e]={exports:{}};t[e][0].call(o.exports,function(n){var o=t[e][1][n];return r(o||n)},o,o.exports)}return n[e].exports}if("function"==typeof __nr_require)return __nr_require;for(var o=0;o<e.length;o++)r(e[o]);return r}({1:[function(t,n,e){function r(t){try{s.console&&console.log(t)}catch(n){}}var o,i=t("ee"),a=t(15),s={};try{o=localStorage.getItem("__nr_flags").split(","),console&&"function"==typeof console.log&&(s.console=!0,o.indexOf("dev")!==-1&&(s.dev=!0),o.indexOf("nr_dev")!==-1&&(s.nrDev=!0))}catch(c){}s.nrDev&&i.on("internal-error",function(t){r(t.stack)}),s.dev&&i.on("fn-err",function(t,n,e){r(e.stack)}),s.dev&&(r("NR AGENT IN DEVELOPMENT MODE"),r("flags: "+a(s,function(t,n){return t}).join(", ")))},{}],2:[function(t,n,e){function r(t,n,e,r,s){try{p?p-=1:o(s||new UncaughtException(t,n,e
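The fetch above assumes the request succeeds. A slightly more defensive sketch is shown here; the `fetch_html` helper name and the 10-second timeout are illustrative assumptions, not part of the original notebook.

```python
import requests


def fetch_html(url, timeout=10):
    """Return the body of `url` as text, raising on HTTP errors."""
    response = requests.get(url, timeout=timeout)
    # raise_for_status() raises requests.HTTPError for 4xx/5xx responses
    response.raise_for_status()
    return response.text


# Example call (needs network access, so it is left commented out):
# html = fetch_html('https://www.udacity.com/courses/all')
```

Checking the status before parsing makes a blocked or failed request show up as an error rather than as an empty list of courses later on.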

### Step 2: Use BeautifulSoup to remove HTML tags
Use `"lxml"` rather than `"html5lib"`.

Again, you may have to scroll down past the JavaScript and CSS in the output of the last cell in this section to see the text. **Alternatively,** you can run the following two lines right before running `soup.get_text()`:

```python
for script in soup(["script", "style"]):
    script.decompose()
```
Read more about this [here](https://stackoverflow.com/questions/22799990/beatifulsoup4-get-text-still-has-javascript).
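To see what `decompose()` does without fetching the live page, here is a minimal sketch on a hand-written snippet. The sample HTML and the `demo_soup` name are made up for illustration, and the built-in `'html.parser'` is used only so the example runs without `lxml` installed.

```python
from bs4 import BeautifulSoup

# Tiny hand-written page: one <style>, one <script>, one paragraph of real text
sample_html = (
    '<html><head><style>p { color: red; }</style>'
    '<script>var x = 1;</script></head>'
    '<body><p>Data Analyst</p></body></html>'
)

demo_soup = BeautifulSoup(sample_html, 'html.parser')

# Calling the soup like a function is shorthand for find_all()
for tag in demo_soup(['script', 'style']):
    tag.decompose()  # removes the tag and its contents from the tree

clean_text = demo_soup.get_text().strip()
print(clean_text)
```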

In [4]:
from bs4 import BeautifulSoup

In [13]:
soup = BeautifulSoup(r.text, 'lxml')

# Remove script and styling
for script in soup(['script', 'style']):
    script.decompose()
print(soup.get_text())




Free Courses and Nanodegree Programs | Udacity

 Nanodegrees  All Courses  For Business  Blog  Sign In  Get Started  Nanodegrees  All Courses  For Business  Blog  Sign In  Get Started Program CatalogSort by PopularSort by NewestFilters Filter BySelect Program DetailsTypeNanodegreeFree CoursesSkill LevelBeginnerIntermediateAdvancedEstimated Duration<1 month1 - 3 months3+ monthsIndustry SkillsSkillsAI AlgorithmsAR developmentARKitAlteryxAndroid DevelopmentBlueprint programmingC++Career AdvancementCSSControls and EstimationCore DataDeep LearningDesign SprintDigital MarketingDisplay AdsExcelFacebook MarketingGoogle VR SDK for UnityHTMLJava programmingJavaScriptJupyter NotebooksKalman FiltersMachine LearningMobile Web AppsNetworkingNeural NetworksOffline Capable Web AppsOptimizationPredictive ModelingProbabilityPrototypingPythonReactReact NativeReduxRobot Operating SystemRoboticsSQLSearch AdsSearch AlgorithmsSelf-Driving CarsSocial Media MarketingStatisticsSupe

### Step 3: Find all course summaries
Use BeautifulSoup's `find_all` method to select elements by tag type and class name. Just like in the video, you can right-click an item on the web page and click "Inspect" to view its HTML.

In [20]:
# Find all course summaries
summaries = soup.find_all('div', {'class':'card__expander--summary'})
print('Number of Courses:', len(summaries))
summaries[0].get_text()

Number of Courses: 222


"Udacity's Intro to Programming course is your first step towards careers in Web and App Development, Machine Learning, Data Science, AI, and more! This program is perfect for beginners."
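The same `find_all` call can be tried on a small hand-written snippet. This is only a sketch: the sample HTML is invented (it borrows the class names seen on the page), and `'html.parser'` is used so it runs without `lxml`.

```python
from bs4 import BeautifulSoup

sample_html = """
<div class="course-summary-card row">
  <h3>Learn to Code</h3>
  <div class="card__expander--summary">First summary.</div>
</div>
<div class="course-summary-card row">
  <h3>Become a Data Analyst</h3>
  <div class="card__expander--summary">Second summary.</div>
</div>
"""

demo_soup = BeautifulSoup(sample_html, 'html.parser')

# Two equivalent ways to filter by class: an attrs dict or the class_ keyword
by_dict = demo_soup.find_all('div', {'class': 'card__expander--summary'})
by_keyword = demo_soup.find_all('div', class_='card__expander--summary')

# A class filter matches any one of an element's classes, so
# 'course-summary-card' also matches class="course-summary-card row"
demo_cards = demo_soup.find_all('div', class_='course-summary-card')

print(len(by_dict), len(by_keyword), len(demo_cards))
```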

### Step 4: Inspect the first summary to find selectors for the course name and school
Tip: `.prettify()` is a super helpful method BeautifulSoup provides to output HTML in a nicely indented form! Make sure to use `print()` to ensure whitespace is displayed properly.
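As a quick illustration of why `print()` matters here, the sketch below runs `.prettify()` on a one-line snippet. The HTML and the `demo_card` name are made up for the example.

```python
from bs4 import BeautifulSoup

demo_card = BeautifulSoup(
    '<div class="card"><h3>Learn to Code</h3><h4>School of Development</h4></div>',
    'html.parser',
)

pretty = demo_card.prettify()

# Without print(), a notebook shows the string's repr, with literal \n escapes
print(repr(pretty))

# With print(), the newlines and indentation are rendered
print(pretty)
```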

In [21]:
# select every course card, then print the first one
cards = soup.find_all('div', {'class':'course-summary-card'})
print(cards[0].prettify())

<div _ngcontent-sc212="" class="course-summary-card row row-gap-medium catalog-card nanodegree-card ng-star-inserted">
 <ir-catalog-card _ngcontent-sc212="" _nghost-sc215="">
  <div _ngcontent-sc215="" class="card-wrapper is-collapsed">
   <div _ngcontent-sc215="" class="card__inner card mb-0">
    <div _ngcontent-sc215="" class="card__inner--upper">
     <div _ngcontent-sc215="" class="image_wrapper hidden-md-down">
      <a _ngcontent-sc215="" href="/course/intro-to-programming-nanodegree--nd000">
       <!-- -->
       <div _ngcontent-sc215="" class="image-container ng-star-inserted" style="background-image:url(https://eu.udacity.com/assets/iridium/images/shared/catalog-images/nd000.png);">
        <div _ngcontent-sc215="" class="image-overlay">
        </div>
       </div>
      </a>
      <!-- -->
     </div>
     <div _ngcontent-sc215="" class="card-content">
      <!-- -->
      <!-- -->
      <div _ngcontent-sc215="" class="category-wrapper">
       <span _ngcontent-sc215="" cl

Look for selectors that contain the course title and school name text you want to extract. Then, use the `select_one` method on the card object to pull out the HTML matching those selectors. Afterwards, don't forget to do some extra cleaning to isolate the names (get rid of unnecessary HTML), as you saw in the last video.

In [40]:
# Extract course title
cards[0].select_one('h3').get_text()

'Learn to Code'

In [24]:
# Extract school
cards[0].select_one('h4').get_text()

'School of Development'

In [48]:
# Extract detail
cards[0].select_one('div .card__expander--summary').get_text().strip()

"Udacity's Intro to Programming course is your first step towards careers in Web and App Development, Machine Learning, Data Science, AI, and more! This program is perfect for beginners."
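The extraction pattern used in this step can be sketched on a simplified card. The HTML below is a hand-written stand-in, not the real page markup, and `'html.parser'` is used so the example runs without `lxml`.

```python
from bs4 import BeautifulSoup

sample_html = """
<div class="course-summary-card">
  <h3> Learn to Code </h3>
  <h4>School of Development</h4>
  <div class="card__expander--summary">
    A first step towards programming.
  </div>
</div>
"""

demo_card = BeautifulSoup(sample_html, 'html.parser').select_one('div.course-summary-card')

# select_one() returns the first match for a CSS selector;
# get_text(strip=True) drops the tags and the surrounding whitespace
title = demo_card.select_one('h3').get_text(strip=True)
school = demo_card.select_one('h4').get_text(strip=True)
summary = demo_card.select_one('.card__expander--summary').get_text(strip=True)

# select_one() returns None when nothing matches
missing = demo_card.select_one('h5')

print(title, '|', school, '|', summary, '|', missing)
```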

### Step 5: Collect names and schools of ALL course listings
Reuse your code from the previous step, but now in a loop to extract the name and school (plus the summary text) from every course card in `cards`!

In [71]:
courses = []
for card in cards:
    # append the name, school, and summary text of each card to the courses list
    title = card.select_one('h3')
    school = card.select_one('h4')
    detail = card.select_one('div .card__expander--summary')
    courses.append(tuple([t.get_text().strip() for t in (title, school, detail)]))
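The loop above raises an `AttributeError` if any card lacks one of the three elements, because `select_one` returns `None` when nothing matches. A more defensive variant might look like the sketch below; the `text_or_none` helper and the sample HTML are hypothetical, chosen only to show the idea.

```python
from bs4 import BeautifulSoup


def text_or_none(tag):
    """Return the stripped text of a tag, or None if the tag was not found."""
    return tag.get_text().strip() if tag is not None else None


# Hand-written sample: the second card deliberately has no <h4> school
sample_html = """
<div class="course-summary-card">
  <h3>Learn to Code</h3>
  <h4>School of Development</h4>
  <div class="card__expander--summary">First summary.</div>
</div>
<div class="course-summary-card">
  <h3>What is Programming?</h3>
  <div class="card__expander--summary">Second summary.</div>
</div>
"""

demo_cards = BeautifulSoup(sample_html, 'html.parser').find_all(
    'div', {'class': 'course-summary-card'}
)

demo_courses = []
for card in demo_cards:
    demo_courses.append((
        text_or_none(card.select_one('h3')),
        text_or_none(card.select_one('h4')),
        text_or_none(card.select_one('.card__expander--summary')),
    ))

print(demo_courses)
```

Keeping `None` for a missing field, rather than skipping the card, preserves the one-tuple-per-card shape so the count still matches the number of cards.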

In [74]:
# display results
print(len(courses), "course summaries found. Sample:")
courses[:5]

222 course summaries found. Sample:


[('Learn to Code',
  'School of Development',
  "Udacity's Intro to Programming course is your first step towards careers in Web and App Development, Machine Learning, Data Science, AI, and more! This program is perfect for beginners."),
 ('What is Programming?',
  'School of Development',
  'This course is your first step towards a career in programming.'),
 ('Become a Data Analyst',
  'School of Data Science',
  'Prepare for a career in data analytics. Learn the skills and tools to uncover insights, communicate critical findings, and create data-driven solutions.'),
 ('Become an iOS Developer',
  'School of Development',
  'Master the Swift programming language and create a portfolio of iOS apps for iPhone and iPad to showcase your skills!'),
 ('Become a Professional Full Stack Developer',
  'School of Development',
  'In this program, you’ll prepare for a job as a Full Stack Web Developer, and learn to create websites, and complex server-side web applications that use powerful relat