# Challenges Scraping Non-Tabular Data

On <a href="https://sandeepmj.github.io/scrape-example-page">this demo page</a> I've reproduced several variations of issues we are likely to encounter when scraping.

- Reviewing a scrape of a well-organized page.
- Dynamically getting column names.
- Scraping a disorganized page.
- Excluding multi-classes.


Let's start by scraping <a href="https://sandeepmj.github.io/scrape-example-page/#organized">the organized CEO data</a>.

### The same steps each time:

* Is the content on the page (use ```Reveal Source```)?
* Where and how is the content held on the page?
* Which classes and IDs do we target?
* Is there a pattern?
* Is there anything that breaks the pattern?
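The first question on the checklist can also be checked in code. Here is a minimal sketch, assuming a hypothetical helper called `content_in_source` and two made-up HTML snippets (neither is the demo page). If a phrase you can see in the browser is missing from the raw HTML, the content is probably injected by JavaScript and a plain request will not return it.

```python
def content_in_source(html_text, phrase):
    """Return True if the phrase appears in the raw HTML (case-insensitive)."""
    return phrase.lower() in html_text.lower()

# Made-up snippets for illustration only
static_html = '<div class="ceo"><p class="name">Name: Hock E. Tan</p></div>'
dynamic_html = '<div id="app"></div><script src="load-ceos.js"></script>'

print(content_in_source(static_html, "Hock E. Tan"))   # content is in the source
print(content_in_source(dynamic_html, "Hock E. Tan"))  # content is loaded later by a script
```

With a real page you would pass the text of the response instead of a hard-coded string.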

In [1]:
pip install icecream

Collecting icecream
  Downloading icecream-2.1.3-py2.py3-none-any.whl (8.4 kB)
Collecting asttokens>=2.0.1
  Downloading asttokens-2.4.0-py2.py3-none-any.whl (27 kB)
Collecting executing>=0.3.1
  Downloading executing-2.0.0-py2.py3-none-any.whl (24 kB)
Installing collected packages: executing, asttokens, icecream
Successfully installed asttokens-2.4.0 executing-2.0.0 icecream-2.1.3
Note: you may need to restart the kernel to use updated packages.


In [2]:
## import libraries
from bs4 import BeautifulSoup
import pandas as pd
import requests ## a library that returns information from websites.
from icecream import ic

In [6]:
## link to scrape
url = "https://sandeepmj.github.io/scrape-example-page/"

In [10]:
## request site html
response = requests.get(url)
response.status_code

200

In [23]:
## what type of data is this?
type(response)

requests.models.Response

In [22]:
## turn into soup
soup = BeautifulSoup(response.text, "html.parser")

In [21]:
## call soup
print(soup.prettify())

<!DOCTYPE html>
<!--
   Basic template
-->
<html lang="en">
 <head>
  <!-- Makes the page responsive and scaled to be read easily -->
  <meta content="width=device-width, initial-scale=1" name="viewport"/>
  <!-- Links to stylesheet -->
  <link href="style.css" rel="stylesheet" type="text/css"/>
  <!-- Remember to update page title -->
  <title>
   Demo Webpage for Scraping
  </title>
 </head>
 <body>
  <!-- All content goes here -->
  <div class="container">
   <div class="headline">
    Demo Webpage for Scraping
   </div>
   <div class="text">
    <p>
     This page holds some content to demo scraping.
    </p>
    <ul>
     <li>
      <a href="#bev">
       Morning Beverages
      </a>
     </li>
     <li>
      <a href="#organized">
       Organized Data
      </a>
     </li>
     <li>
      <a href="#disorganized">
       Disorganized Data
      </a>
     </li>
     <li>
      <a href="#spanned">
       Extra Spans
      </a>
     </li>
     <li>
      <a href="#exclude">
       E

In [25]:
## what do we target? Which class or ID?
organized = soup.find(id = "organized")
organized

<section id="organized">
<h2>Organized - Top 5 Compensated CEOs in 2018</h2>
<div class="ceo">
<p class="rank">Rank: 1</p>
<p class="name">Name: Hock E. Tan</p>
<p class="annual_compensation">Annual Compensation: $103.2 million</p>
<p class="company">Company: Broadcom</p>
</div>
<div class="ceo">
<p class="rank">Rank: 2</p>
<p class="name">Name: Frank Bisignano</p>
<p class="annual_compensation">Annual Compensation: $102.2 million</p>
<p class="company">Company: First Data (FDC)</p>
</div>
<div class="ceo">
<p class="rank">Rank: 3</p>
<p class="name">Name: Michael Rapino</p>
<p class="annual_compensation">Annual Compensation: $70.6 million</p>
<p class="company">Company: Live Nation Entertainment (LYV)</p>
</div>
<div class="ceo">
<p class="rank">Rank: 4</p>
<p class="name">Name: Leslie Moonves</p>
<p class="annual_compensation">Annual Compensation: 68.4 million</p>
<p class="company">Company: CBS</p>
</div>
<div class="ceo">
<p class="rank">Rank: 5</p>
<p class="name">Name: Gregory Ma

In [26]:
## what type of data?
## if list, we will need to iterate
type(organized)

bs4.element.Tag

In [32]:
## Isolate to CEOs only rather than h2 etc
ceos = organized.find_all("div", class_="ceo")
ceos

[<div class="ceo">
 <p class="rank">Rank: 1</p>
 <p class="name">Name: Hock E. Tan</p>
 <p class="annual_compensation">Annual Compensation: $103.2 million</p>
 <p class="company">Company: Broadcom</p>
 </div>,
 <div class="ceo">
 <p class="rank">Rank: 2</p>
 <p class="name">Name: Frank Bisignano</p>
 <p class="annual_compensation">Annual Compensation: $102.2 million</p>
 <p class="company">Company: First Data (FDC)</p>
 </div>,
 <div class="ceo">
 <p class="rank">Rank: 3</p>
 <p class="name">Name: Michael Rapino</p>
 <p class="annual_compensation">Annual Compensation: $70.6 million</p>
 <p class="company">Company: Live Nation Entertainment (LYV)</p>
 </div>,
 <div class="ceo">
 <p class="rank">Rank: 4</p>
 <p class="name">Name: Leslie Moonves</p>
 <p class="annual_compensation">Annual Compensation: 68.4 million</p>
 <p class="company">Company: CBS</p>
 </div>,
 <div class="ceo">
 <p class="rank">Rank: 5</p>
 <p class="name">Name: Gregory Maffei</p>
 <p class="annual_compensation">Annua

In [33]:
## type?
type(ceos)

bs4.element.ResultSet

In [34]:
## length
len(ceos)

5
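The type checks above matter because `find` and `find_all` behave differently: `find` returns a single `Tag` (or `None` when nothing matches), while `find_all` returns a list-like `ResultSet`. A small sketch, using made-up HTML and a hypothetical helper named `safe_text`, shows one way to guard against a missing tag:

```python
from bs4 import BeautifulSoup

# Made-up HTML: the second div is missing its rank
html = '''<div class="ceo"><p class="rank">Rank: 1</p></div>
<div class="ceo"><p class="name">Name: No Rank</p></div>'''
demo_soup = BeautifulSoup(html, "html.parser")

def safe_text(parent, class_name, default=""):
    """Return the stripped text of the first child with class_name, or default."""
    tag = parent.find(class_=class_name)  # a Tag, or None if nothing matches
    return tag.get_text(strip=True) if tag is not None else default

demo_ceos = demo_soup.find_all("div", class_="ceo")  # ResultSet, so we iterate
demo_ranks = [safe_text(div, "rank", default="missing") for div in demo_ceos]
print(demo_ranks)
```

Without the `None` check, calling `.get_text()` on the second div's missing rank would raise an `AttributeError`.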

In [62]:
## get all ranks into a list
## via for loop
ranks_fl = []
for rank in ceos:
    ranks_fl.append(rank.find(class_ = "rank").get_text().replace("Rank: ", ""))
ranks_fl

['1', '2', '3', '4', '5']

In [63]:
## get all ranks into a list
## via List comprehension
ranks_lc = [rank.find(class_ = "rank").get_text().replace("Rank: ", "") for rank in ceos]
ranks_lc

['1', '2', '3', '4', '5']

In [65]:
## find all the names using LC
names_lc = [name.find(class_ = "name").get_text().replace("Name: ", "") for name in ceos]
names_lc

['Hock E. Tan',
 'Frank Bisignano',
 'Michael Rapino',
 'Leslie Moonves',
 'Gregory Maffei']

In [66]:
## annual_comp via lc:
annual_comp_lc = [compensation.find(class_ = "annual_compensation").get_text().replace("Annual Compensation: ", "") for compensation in ceos]
annual_comp_lc

['$103.2 million',
 '$102.2 million',
 '$70.6 million',
 '68.4 million',
 '$67.2 million']
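Notice that the fourth value has no dollar sign. That is the kind of thing that breaks a pattern. If you later want numbers rather than strings, a sketch like the following could normalize the values; the function name `compensation_to_millions` is made up for this example, and it assumes every value ends in "million".

```python
def compensation_to_millions(text):
    """Convert strings like '$103.2 million' or '68.4 million' to a float."""
    cleaned = text.replace("$", "").replace("million", "").strip()
    return float(cleaned)

# The values scraped above, including the one without a dollar sign
sample = ['$103.2 million', '$102.2 million', '$70.6 million',
          '68.4 million', '$67.2 million']
comp_millions = [compensation_to_millions(value) for value in sample]
print(comp_millions)
```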

In [68]:
## company name via LC:
company_lc = [company.find(class_ = "company").get_text().replace("Company: ", "") for company in ceos]
company_lc

['Broadcom',
 'First Data (FDC)',
 'Live Nation Entertainment (LYV)',
 'CBS',
 'Liberty Media & Qurate Retail Group']
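The four list comprehensions above repeat the same pattern with a different class and label each time. One possible way to cut the repetition is a small helper; `extract_field` is a hypothetical name, and the HTML below is a shortened copy of the organized section so the example runs on its own.

```python
from bs4 import BeautifulSoup

html = '''<div class="ceo">
<p class="rank">Rank: 1</p>
<p class="name">Name: Hock E. Tan</p>
</div>
<div class="ceo">
<p class="rank">Rank: 2</p>
<p class="name">Name: Frank Bisignano</p>
</div>'''
demo_ceos = BeautifulSoup(html, "html.parser").find_all("div", class_="ceo")

def extract_field(divs, class_name, label):
    """Collect the text of class_name from every div, minus its label prefix."""
    return [
        div.find(class_=class_name).get_text().replace(label, "").strip()
        for div in divs
    ]

demo_ranks = extract_field(demo_ceos, "rank", "Rank: ")
demo_names = extract_field(demo_ceos, "name", "Name: ")
print(demo_ranks, demo_names)
```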

In [71]:
## FOR Loop and zip into list of tuples
top_ceos = []
for item in zip (ranks_lc, names_lc, company_lc, annual_comp_lc):
    top_ceos.append(item)
top_ceos

[('1', 'Hock E. Tan', 'Broadcom', '$103.2 million'),
 ('2', 'Frank Bisignano', 'First Data (FDC)', '$102.2 million'),
 ('3', 'Michael Rapino', 'Live Nation Entertainment (LYV)', '$70.6 million'),
 ('4', 'Leslie Moonves', 'CBS', '68.4 million'),
 ('5',
  'Gregory Maffei',
  'Liberty Media & Qurate Retail Group',
  '$67.2 million')]

In [72]:
df = pd.DataFrame(top_ceos)
df.columns = ["Rank", "Name", "Company", "Annual compensation"]
df

|   | Rank | Name | Company | Annual compensation |
|---|------|------|---------|---------------------|
| 0 | 1 | Hock E. Tan | Broadcom | $103.2 million |
| 1 | 2 | Frank Bisignano | First Data (FDC) | $102.2 million |
| 2 | 3 | Michael Rapino | Live Nation Entertainment (LYV) | $70.6 million |
| 3 | 4 | Leslie Moonves | CBS | 68.4 million |
| 4 | 5 | Gregory Maffei | Liberty Media & Qurate Retail Group | $67.2 million |


### Built-in functions are usually faster

In [75]:
## instead of for loop zip, we can use list and zip
ceos_x = zip(ranks_lc, names_lc, company_lc, annual_comp_lc)
ceos_x  ## shows the zip object and where it lives in memory, not the tuples themselves

<zip at 0x7faea2d65980>

In [77]:
list(ceos_x)  ## usually faster than the for loop version, which is slow by comparison

[('1', 'Hock E. Tan', 'Broadcom', '$103.2 million'),
 ('2', 'Frank Bisignano', 'First Data (FDC)', '$102.2 million'),
 ('3', 'Michael Rapino', 'Live Nation Entertainment (LYV)', '$70.6 million'),
 ('4', 'Leslie Moonves', 'CBS', '68.4 million'),
 ('5',
  'Gregory Maffei',
  'Liberty Media & Qurate Retail Group',
  '$67.2 million')]
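One thing to keep in mind: a `zip` object is an iterator, so it can only be consumed once. The short sketch below uses two small made-up lists to show what happens on a second pass.

```python
ranks = ['1', '2']
names = ['Hock E. Tan', 'Frank Bisignano']

pairs = zip(ranks, names)    # lazy iterator, nothing is built yet
first_pass = list(pairs)     # consumes the iterator
second_pass = list(pairs)    # the iterator is already exhausted

print(first_pass)
print(second_pass)
```

If you need the tuples more than once, store the result of `list(zip(...))` in a variable and reuse that list.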

In [79]:
## turn into dataframe
df2 = pd.DataFrame(top_ceos,  columns = ["Rank", "Name", "Company", "Annual compensation"])
df2

|   | Rank | Name | Company | Annual compensation |
|---|------|------|---------|---------------------|
| 0 | 1 | Hock E. Tan | Broadcom | $103.2 million |
| 1 | 2 | Frank Bisignano | First Data (FDC) | $102.2 million |
| 2 | 3 | Michael Rapino | Live Nation Entertainment (LYV) | $70.6 million |
| 3 | 4 | Leslie Moonves | CBS | 68.4 million |
| 4 | 5 | Gregory Maffei | Liberty Media & Qurate Retail Group | $67.2 million |


## What if there are dozens or more column header names?
### That's a lot of typing

In [80]:
## recall what CEOs was
ceos

[<div class="ceo">
 <p class="rank">Rank: 1</p>
 <p class="name">Name: Hock E. Tan</p>
 <p class="annual_compensation">Annual Compensation: $103.2 million</p>
 <p class="company">Company: Broadcom</p>
 </div>,
 <div class="ceo">
 <p class="rank">Rank: 2</p>
 <p class="name">Name: Frank Bisignano</p>
 <p class="annual_compensation">Annual Compensation: $102.2 million</p>
 <p class="company">Company: First Data (FDC)</p>
 </div>,
 <div class="ceo">
 <p class="rank">Rank: 3</p>
 <p class="name">Name: Michael Rapino</p>
 <p class="annual_compensation">Annual Compensation: $70.6 million</p>
 <p class="company">Company: Live Nation Entertainment (LYV)</p>
 </div>,
 <div class="ceo">
 <p class="rank">Rank: 4</p>
 <p class="name">Name: Leslie Moonves</p>
 <p class="annual_compensation">Annual Compensation: 68.4 million</p>
 <p class="company">Company: CBS</p>
 </div>,
 <div class="ceo">
 <p class="rank">Rank: 5</p>
 <p class="name">Name: Gregory Maffei</p>
 <p class="annual_compensation">Annua

In [81]:
## what does a single item in the list hold
ceos[0]

<div class="ceo">
<p class="rank">Rank: 1</p>
<p class="name">Name: Hock E. Tan</p>
<p class="annual_compensation">Annual Compensation: $103.2 million</p>
<p class="company">Company: Broadcom</p>
</div>

## ```HTML Attributes```

In the following ```html```:

```<div class="some_class">This div class holds something</div>```

- ```div``` is a tag.
- ```class``` is an attribute.
- ```some_class``` holds a value.

We can use ```BeautifulSoup``` to grab what an attribute holds:

```div['class']``` will return ```['some_class']```. It is a list because a tag can hold more than one class.
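A quick sketch of attribute access, using made-up HTML. It assumes the standard `html.parser`, which treats `class` as a multi-valued attribute and so hands back a list even when there is only one class.

```python
from bs4 import BeautifulSoup

snippet = ('<div class="some_class">This div class holds something</div>'
           '<p class="name former">Ex-CEO</p>'
           '<span>no class here</span>')
demo = BeautifulSoup(snippet, "html.parser")

single = demo.div["class"]        # list with one class
multiple = demo.p["class"]        # list with two classes
missing = demo.span.get("class")  # .get returns None instead of raising a KeyError

print(single, multiple, missing)
```

This is why grabbing a usable column name takes an extra `[0]` after `["class"]`.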

In [93]:
## return the class of the very first item
ceos[0].p["class"]

['rank']

In [99]:
## zoom into a single instance of ceos
ptags = ceos[0].find_all("p")
ptags    

[<p class="rank">Rank: 1</p>,
 <p class="name">Name: Hock E. Tan</p>,
 <p class="annual_compensation">Annual Compensation: $103.2 million</p>,
 <p class="company">Company: Broadcom</p>]

In [101]:
## let's get class values in our ptags
col_names = []
for p in ptags:
    col_names.append(p["class"])
col_names

[['rank'], ['name'], ['annual_compensation'], ['company']]

In [123]:
## let's actually grab the values via FL:
col_names_fl = []
for ptag in ptags:
    col_names_fl.append(ptag["class"][0])
col_names_fl

['rank', 'name', 'annual_compensation', 'company']

In [124]:
## let's actually grab the values via LC:
col_names_lc = [ptag["class"][0] for ptag in ptags]
col_names_lc

['rank', 'name', 'annual_compensation', 'company']

In [125]:
import this

In [129]:
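## Pandas to create data frame
## the class names come in page order (rank, name, annual_compensation, company),
## so zip the lists in that same order before using them as column names
rows = list(zip(ranks_lc, names_lc, annual_comp_lc, company_lc))
df3 = pd.DataFrame(rows,
                   columns = col_names_fl)
df3

|   | rank | name | annual_compensation | company |
|---|------|------|---------------------|---------|
| 0 | 1 | Hock E. Tan | $103.2 million | Broadcom |
| 1 | 2 | Frank Bisignano | $102.2 million | First Data (FDC) |
| 2 | 3 | Michael Rapino | $70.6 million | Live Nation Entertainment (LYV) |
| 3 | 4 | Leslie Moonves | 68.4 million | CBS |
| 4 | 5 | Gregory Maffei | $67.2 million | Liberty Media & Qurate Retail Group |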
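Matching column names to values by position is fragile: if the two orders differ, the data ends up under the wrong header. One alternative, sketched here with a single CEO copied from the organized section, is to build each row as a dictionary keyed by the class name, so each value carries its own label. The variable names `records` and `record` are made up for this example, and it assumes every `p` tag follows the "Label: value" pattern.

```python
from bs4 import BeautifulSoup

html = '''<div class="ceo">
<p class="rank">Rank: 1</p>
<p class="name">Name: Hock E. Tan</p>
<p class="annual_compensation">Annual Compensation: $103.2 million</p>
<p class="company">Company: Broadcom</p>
</div>'''
demo_ceos = BeautifulSoup(html, "html.parser").find_all("div", class_="ceo")

records = []
for div in demo_ceos:
    record = {}
    for ptag in div.find_all("p"):
        key = ptag["class"][0]                       # class name becomes the column name
        value = ptag.get_text().partition(": ")[2]   # text after the first "Label: "
        record[key] = value
    records.append(record)

print(records)
```

Passing `records` to `pd.DataFrame` would then produce the column names without typing any of them.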


# Reality - not handed to you so cleanly

### Let's scrape the disorganized section

In [130]:
## recall soup
soup

<!DOCTYPE html>

<!--
   Basic template
-->
<html lang="en">
<head>
<!-- Makes the page responsive and scaled to be read easily -->
<meta content="width=device-width, initial-scale=1" name="viewport"/>
<!-- Links to stylesheet -->
<link href="style.css" rel="stylesheet" type="text/css"/>
<!-- Remember to update page title -->
<title>Demo Webpage for Scraping</title>
</head>
<body>
<!-- All content goes here -->
<div class="container">
<div class="headline">Demo Webpage for Scraping</div>
<div class="text">
<p>This page holds some content to demo scraping.</p>
<ul>
<li><a href="#bev">Morning Beverages</a></li>
<li><a href="#organized">Organized Data</a></li>
<li><a href="#disorganized">Disorganized Data</a></li>
<li><a href="#spanned">Extra Spans</a></li>
<li><a href="#exclude">Exclude a Class in Common</a></li>
<li><a href="#nfl_table">Tabular Data</a></li>
<li><a href="heaviest-animals-page1.html">Multipage Tabular Data</a></li>
</ul>
</div>
<section id="bev">
<table class="full_table

In [132]:
## grab appropriate section
disorg = soup.find(id = "disorganized")
disorg

<section id="disorganized">
<h2>Disorganized - Top 5 Compensated CEOs in 2018</h2>
<div class="ceo">
<span>Rank:</span><dt> 1</dt>
<span>Name:</span><dt> Hock E. Tan</dt>
<span>Annual compensation:</span><dt> $103.2 million</dt>
<span>Company:</span><dt> Broadcom</dt>
</div>
<div class="ceo">
<span>Rank: </span><dt> 2</dt>
<span>Name:</span><dt> Frank Bisignano</dt>
<span>Annual Compensation:</span><dt> $102.2 million</dt>
<span>Company:</span><dt> First Data (FDC)</dt>
</div>
<div class="ceo">
<span>Rank: </span><dt> 3</dt>
<span>Name:</span><dt> Michael Rapino</dt>
<span>Annual Compensation:</span><dt> $70.6 million</dt>
<span>Company:</span><dt> Live Nation Entertainment (LYV)</dt>
</div>
<div class="ceo">
<span>Rank: </span><dt> 4</dt>
<span>Name:</span><dt> Leslie Moonves</dt>
<span>Annual Compensation:</span><dt> 68.4 million</dt>
<span>Company:</span><dt> CBS</dt>
</div>
<div class="ceo">
<span>Rank: </span><dt> 5</dt>
<span>Name:</span> <dt> Gregory Maffei</dt>
<span>Annual Com

### Why is this disorganized?

In [133]:
## type of data
type(disorg)

bs4.element.Tag

In [140]:
## find class ceo and store in object called ceosd
ceosd = disorg.find_all(class_ = "ceo")
ceosd

[<div class="ceo">
 <span>Rank:</span><dt> 1</dt>
 <span>Name:</span><dt> Hock E. Tan</dt>
 <span>Annual compensation:</span><dt> $103.2 million</dt>
 <span>Company:</span><dt> Broadcom</dt>
 </div>,
 <div class="ceo">
 <span>Rank: </span><dt> 2</dt>
 <span>Name:</span><dt> Frank Bisignano</dt>
 <span>Annual Compensation:</span><dt> $102.2 million</dt>
 <span>Company:</span><dt> First Data (FDC)</dt>
 </div>,
 <div class="ceo">
 <span>Rank: </span><dt> 3</dt>
 <span>Name:</span><dt> Michael Rapino</dt>
 <span>Annual Compensation:</span><dt> $70.6 million</dt>
 <span>Company:</span><dt> Live Nation Entertainment (LYV)</dt>
 </div>,
 <div class="ceo">
 <span>Rank: </span><dt> 4</dt>
 <span>Name:</span><dt> Leslie Moonves</dt>
 <span>Annual Compensation:</span><dt> 68.4 million</dt>
 <span>Company:</span><dt> CBS</dt>
 </div>,
 <div class="ceo">
 <span>Rank: </span><dt> 5</dt>
 <span>Name:</span> <dt> Gregory Maffei</dt>
 <span>Annual Compensation:</span><dt> $67.2 million</dt>
 <span>Com

In [156]:
## type
type(ceosd)

bs4.element.ResultSet

In [164]:
## store the items of interest for each CEO
## the dt tags come in page order: rank, name, annual compensation, company
ceo_dict_list = []
for ceo in ceosd:
    all_targets = ceo.find_all("dt")
    ranksd = all_targets[0].get_text(strip = True)  ## strip = True removes surrounding whitespace
    namesd = all_targets[1].get_text(strip = True)
    annual_compd = all_targets[2].get_text(strip = True)
    companiesd = all_targets[3].get_text(strip = True)
    ceo_temp_dict = {
        "rank": ranksd,
        "name": namesd,
        "company" : companiesd,
        "annual_comp" : annual_compd
    }
    ceo_dict_list.append(ceo_temp_dict)


In [165]:
## call list
ceo_dict_list

[{'rank': '1',
  'name': 'Hock E. Tan',
  'company': 'Broadcom',
  'annual_comp': '$103.2 million'},
 {'rank': '2',
  'name': 'Frank Bisignano',
  'company': 'First Data (FDC)',
  'annual_comp': '$102.2 million'},
 {'rank': '3',
  'name': 'Michael Rapino',
  'company': 'Live Nation Entertainment (LYV)',
  'annual_comp': '$70.6 million'},
 {'rank': '4',
  'name': 'Leslie Moonves',
  'company': 'CBS',
  'annual_comp': '68.4 million'},
 {'rank': '5',
  'name': 'Gregory Maffei',
  'company': 'Liberty Media & Qurate Retail Group',
  'annual_comp': '$67.2 million'}]

In [167]:
## turn into df
df4 = pd.DataFrame(ceo_dict_list)
df4

|   | rank | name | company | annual_comp |
|---|------|------|---------|-------------|
| 0 | 1 | Hock E. Tan | Broadcom | $103.2 million |
| 1 | 2 | Frank Bisignano | First Data (FDC) | $102.2 million |
| 2 | 3 | Michael Rapino | Live Nation Entertainment (LYV) | $70.6 million |
| 3 | 4 | Leslie Moonves | CBS | 68.4 million |
| 4 | 5 | Gregory Maffei | Liberty Media & Qurate Retail Group | $67.2 million |
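Part of what makes this section disorganized is that the values sit in `dt` tags with no classes, and the labels in the `span` tags are inconsistent ("Annual compensation:" vs. "Annual Compensation:", "Rank:" vs. "Rank: "). Instead of relying on index positions, one option is to pair each label with the value next to it. The sketch below uses the first two CEOs from the section; `normalize_label` and `paired` are hypothetical names, and it assumes each `div` holds the same number of `span` and `dt` tags.

```python
from bs4 import BeautifulSoup

html = '''<div class="ceo">
<span>Rank:</span><dt> 1</dt>
<span>Name:</span><dt> Hock E. Tan</dt>
<span>Annual compensation:</span><dt> $103.2 million</dt>
<span>Company:</span><dt> Broadcom</dt>
</div>
<div class="ceo">
<span>Rank: </span><dt> 2</dt>
<span>Name:</span><dt> Frank Bisignano</dt>
<span>Annual Compensation:</span><dt> $102.2 million</dt>
<span>Company:</span><dt> First Data (FDC)</dt>
</div>'''
demo_divs = BeautifulSoup(html, "html.parser").find_all(class_="ceo")

def normalize_label(text):
    """Turn 'Annual Compensation: ' into 'annual_compensation'."""
    return text.strip().rstrip(":").strip().lower().replace(" ", "_")

paired = []
for div in demo_divs:
    labels = [normalize_label(span.get_text()) for span in div.find_all("span")]
    values = [dt.get_text(strip=True) for dt in div.find_all("dt")]
    paired.append(dict(zip(labels, values)))

print(paired)
```

Because the labels are normalized first, both CEOs end up with identical keys even though the page spells the labels differently.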


# Excluding classes

Most modern sites have tags that include multiple classes.

What if you want to target tags that hold a single class, but that class also appears alongside other classes in tags that hold different types of content?

For example, capture the ```Exclude Some Classes``` section of our page in a ```BeautifulSoup``` object.



### ```.select```

In this case, we want to target the ```name``` class that does not also have the ```former``` class.

For example:

```<p class="name">Name: Hock E. Tan</p>```

vs.

```<p class="name former">Ex-CEO: Charlie Fote</p>```

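In this case we use ```.select```, which takes a CSS selector. An attribute selector matches only tags whose ```class``` attribute is exactly that value, so tags that carry additional classes are left out.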

```soup.select('[class="class_name"]')```

A simple example:

In [168]:
## RUN this cell that holds some html
some_html = '''<li> Silly List </li>
<li class="a"> A alone  - UNWANTED </li>
<li class="a z"> A and Z  - UNWANTED </li>
<li class="z"> Z first - my target</li>
<li class="b z"> B and Z  - UNWANTED</li>
<li class="x z"> X and Z - UNWANTED </li>
<li class="z"> Z second - my target</li>'''



In [169]:
## print it
print (some_html)

<li> Silly List </li>
<li class="a"> A alone  - UNWANTED </li>
<li class="a z"> A and Z  - UNWANTED </li>
<li class="z"> Z first - my target</li>
<li class="b z"> B and Z  - UNWANTED</li>
<li class="x z"> X and Z - UNWANTED </li>
<li class="z"> Z second - my target</li>


In [177]:
## convert to temp_soup
temp_soup = BeautifulSoup(some_html, "html.parser")
temp_soup

<li> Silly List </li>
<li class="a"> A alone  - UNWANTED </li>
<li class="a z"> A and Z  - UNWANTED </li>
<li class="z"> Z first - my target</li>
<li class="b z"> B and Z  - UNWANTED</li>
<li class="x z"> X and Z - UNWANTED </li>
<li class="z"> Z second - my target</li>

In [178]:
print(temp_soup.prettify())

<li>
 Silly List
</li>
<li class="a">
 A alone  - UNWANTED
</li>
<li class="a z">
 A and Z  - UNWANTED
</li>
<li class="z">
 Z first - my target
</li>
<li class="b z">
 B and Z  - UNWANTED
</li>
<li class="x z">
 X and Z - UNWANTED
</li>
<li class="z">
 Z second - my target
</li>


In [182]:
## find returns the first tag whose classes include z, even if it has other classes too
temp_soup.find(class_ = "z")

<li class="a z"> A and Z  - UNWANTED </li>

In [183]:
## find all
## target li tags, class of z
temp_soup.find_all(class_ = "z")

[<li class="a z"> A and Z  - UNWANTED </li>,
 <li class="z"> Z first - my target</li>,
 <li class="b z"> B and Z  - UNWANTED</li>,
 <li class="x z"> X and Z - UNWANTED </li>,
 <li class="z"> Z second - my target</li>]

In [187]:
## select li tag class z only!
temp_soup.select('li[class = "z"]')

[<li class="z"> Z first - my target</li>,
 <li class="z"> Z second - my target</li>]
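The exact-match attribute selector is not the only route. Two other approaches that should give the same result on this list are sketched below: a filter function that compares the whole class list, and the CSS `:not()` pseudo-class to rule out specific companion classes. The names `demo_html`, `only_z` and `not_abx` are made up for this example.

```python
from bs4 import BeautifulSoup

demo_html = '''<li> Silly List </li>
<li class="a"> A alone  - UNWANTED </li>
<li class="a z"> A and Z  - UNWANTED </li>
<li class="z"> Z first - my target</li>
<li class="b z"> B and Z  - UNWANTED</li>
<li class="x z"> X and Z - UNWANTED </li>
<li class="z"> Z second - my target</li>'''
demo_soup = BeautifulSoup(demo_html, "html.parser")

# Option 1: a filter function that compares the full class list
only_z = demo_soup.find_all(
    lambda tag: tag.name == "li" and tag.get("class") == ["z"]
)

# Option 2: CSS :not() to exclude the companion classes we know about
not_abx = demo_soup.select("li.z:not(.a):not(.b):not(.x)")

texts = [li.get_text(strip=True) for li in only_z]
print(texts)
```

Option 2 only works when you know which extra classes to exclude, so the exact match or the filter function is the safer choice on an unfamiliar page.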

### Back to our CEOs

In [188]:
## what do you see that might be a challenge?
exclude = soup.find(id = "nonsense")
exclude

<section id="nonsense">
<h2>Exclude Some Classes</h2>
<div class="ceo">
<p class="rank">Rank: 1</p>
<p class="name">Name: Hock E. Tan</p>
<p class="annual_compensation">Annual Compensation: $103.2 million</p>
<p class="company">Company: Broadcom</p>
<p class="rank">Rank: 2</p>
<p class="name">Name: Frank Bisignano</p>
<p class="annual_compensation">Annual Compensation: $102.2 million</p>
<p class="company">Company: First Data (FDC)</p>
<p class="rank">Rank: 3</p>
<p class="name">Name: Michael Rapino</p>
<p class="annual_compensation">Annual Compensation: $70.6 million</p>
<p class="company">Company: Live Nation Entertainment (LYV)</p>
<p class="rank">Rank: 4</p>
<p class="name">Name: Bob Bakish</p>
<p class="annual_compensation">Annual Compensation: 68.4 million</p>
<p class="company">Company: CBS</p>
<p class="rank">Rank: 5</p>
<p class="name">Name: David Rawlinson </p>
<p class="annual_compensation">Annual Compensation: $67.2 million</p>
<p class="company">Company: Qurate Retail Grou

In [191]:
## find all p tags whose classes include name (this also returns the former CEOs)
exclude.find_all ("p", class_ ="name")

[<p class="name">Name: Hock E. Tan</p>,
 <p class="name">Name: Frank Bisignano</p>,
 <p class="name">Name: Michael Rapino</p>,
 <p class="name">Name: Bob Bakish</p>,
 <p class="name">Name: David Rawlinson </p>,
 <p class="name former">Ex-CEO: Henry Thompson Nicholas III</p>,
 <p class="name former">Ex-CEO: Charlie Fote</p>,
 <p class="name former">Ex-CEO: Irving Azoff </p>,
 <p class="name former">Ex-CEO: Leslie Moonves</p>,
 <p class="name former">Ex-CEO: Gregory Maffei</p>]

In [None]:
## ic all ceos and former ceos names



In [192]:
## find all former ceos only
exclude.find_all ("p", class_ ="former")

[<p class="name former">Ex-CEO: Henry Thompson Nicholas III</p>,
 <p class="name former">Ex-CEO: Charlie Fote</p>,
 <p class="name former">Ex-CEO: Irving Azoff </p>,
 <p class="name former">Ex-CEO: Leslie Moonves</p>,
 <p class="name former">Ex-CEO: Gregory Maffei</p>]

In [212]:
## find only current ceos
current_ceos = exclude.select('p[class = "name"]')  ## select already returns a list
current_ceos

[<p class="name">Name: Hock E. Tan</p>,
 <p class="name">Name: Frank Bisignano</p>,
 <p class="name">Name: Michael Rapino</p>,
 <p class="name">Name: Bob Bakish</p>,
 <p class="name">Name: David Rawlinson </p>]

In [214]:
for ceo in current_ceos:
    print(ceo.get_text().replace("Name: ", ""))

Hock E. Tan
Frank Bisignano
Michael Rapino
Bob Bakish
David Rawlinson 
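Excluding is one option; labeling is another. If you want to keep both groups but tell them apart, a sketch like this could tag each person as current or former based on the class list. The HTML is a shortened copy of the names above, and `people` and `current_names` are made-up variable names.

```python
from bs4 import BeautifulSoup

html = '''<p class="name">Name: Hock E. Tan</p>
<p class="name">Name: Frank Bisignano</p>
<p class="name former">Ex-CEO: Henry Thompson Nicholas III</p>
<p class="name former">Ex-CEO: Charlie Fote</p>'''
demo_soup = BeautifulSoup(html, "html.parser")

people = []
for ptag in demo_soup.find_all("p", class_="name"):
    # split "Name: ..." or "Ex-CEO: ..." into label and person
    label, _, person = ptag.get_text().partition(": ")
    status = "former" if "former" in ptag["class"] else "current"
    people.append({"name": person.strip(), "status": status})

current_names = [p["name"] for p in people if p["status"] == "current"]
print(current_names)
```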
