# Scraping the Street

![](images/21_jump_street.png)

One of the most important television shows of all time was **21 Jump Street**. The show launched the careers of stars such as Johnny Depp, Richard Grieco, and Holly Robinson Peete. It also spoke to the youth of the late '80s and early '90s, with a crew of undercover cops tackling lawbreakers.


### Wikipedia List of Guest Stars

![](images/wiki_jump.png)

Wikipedia has a page listing the guest stars for all five seasons of 21 Jump Street. Our goal is to build a single table containing the information on every guest star.

In [8]:
import requests
from bs4 import BeautifulSoup

In [9]:
url = 'https://en.wikipedia.org/wiki/List_of_guest_stars_on_21_Jump_Street'

In [10]:
page = requests.get(url)

In [11]:
page

<Response [200]>
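A `200` response means the request succeeded. As a minimal sketch, the download can be wrapped so that a failed request stops the notebook early instead of handing an error page to the parser. The helper name `fetch_html` and the 10-second timeout are assumptions made for illustration, not part of the original notebook.

```python
import requests


def fetch_html(url, timeout=10.0):
    """Download a page and return its HTML text (hypothetical helper)."""
    # timeout keeps the call from hanging forever on a stalled connection
    response = requests.get(url, timeout=timeout)
    # raise_for_status() raises requests.HTTPError for 4xx and 5xx responses
    response.raise_for_status()
    return response.text
```

With this helper, `soup = BeautifulSoup(fetch_html(url), 'html.parser')` would replace the separate `requests.get` and `page.text` steps.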

In [5]:
soup = BeautifulSoup(page.text, 'html.parser')

In [6]:
soup.title.text

'List of guest stars on 21 Jump Street - Wikipedia'

In [7]:
soup.title.string

'List of guest stars on 21 Jump Street - Wikipedia'
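Both attributes return the same value here because `<title>` holds a single piece of text. They differ once a tag has nested children: `.string` returns `None` when a tag has more than one child, while `.text` joins all the text inside the tag. A small sketch on a made-up snippet (the HTML below is illustrative, not copied from the Wikipedia page):

```python
from bs4 import BeautifulSoup

# Hypothetical HTML used only to show the difference between .text and .string
html = (
    "<html><head><title>Demo page</title></head>"
    "<body><p>Guest star: <b>Josh Brolin</b></p></body></html>"
)
demo = BeautifulSoup(html, "html.parser")

title_text = demo.title.text      # the tag has one text child
title_string = demo.title.string  # same value as .text in this case

p_text = demo.p.text      # text of <p> and its nested <b>, joined together
p_string = demo.p.string  # None: <p> has two children (a string and a <b> tag)
```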

In [11]:
soup.a

<a id="top"></a>

In [12]:
soup.div

<div class="noprint" id="mw-page-base"></div>

In [13]:
soup.find_all('a')

[<a id="top"></a>,
 <a href="#mw-head">navigation</a>,
 <a href="#p-search">search</a>,
 <a class="mw-redirect" href="/wiki/Television_personality" title="Television personality">television</a>,
 <a href="/wiki/Fox_Broadcasting_Company" title="Fox Broadcasting Company">Fox</a>,
 <a href="/wiki/21_Jump_Street" title="21 Jump Street">21 Jump Street</a>,
 <a href="#Season_1"><span class="tocnumber">1</span> <span class="toctext">Season 1</span></a>,
 <a href="#Season_2"><span class="tocnumber">2</span> <span class="toctext">Season 2</span></a>,
 <a href="#Season_3"><span class="tocnumber">3</span> <span class="toctext">Season 3</span></a>,
 <a href="#Season_4"><span class="tocnumber">4</span> <span class="toctext">Season 4</span></a>,
 <a href="#Season_5"><span class="tocnumber">5</span> <span class="toctext">Season 5</span></a>,
 <a href="#See_also"><span class="tocnumber">6</span> <span class="toctext">See also</span></a>,
 <a href="#References"><span class="tocnumber">7</span> <span cl

In [14]:
all_links = soup.find_all("a")
for link in all_links:
    print(link.get("href"))

None
#mw-head
#p-search
/wiki/Television_personality
/wiki/Fox_Broadcasting_Company
/wiki/21_Jump_Street
#Season_1
#Season_2
#Season_3
#Season_4
#Season_5
#See_also
#References
/w/index.php?title=List_of_guest_stars_on_21_Jump_Street&action=edit&section=1
/wiki/Barney_Martin
/wiki/Brandon_Douglas
/w/index.php?title=Reginald_T._Dorsey&action=edit&redlink=1
/wiki/Billy_Jayne
/wiki/Steve_Antin
/wiki/Traci_Lind
/wiki/Leah_Ayres
/wiki/Geoffrey_Blake_(actor)
/wiki/Josh_Brolin
/w/index.php?title=Jamie_Bozian&action=edit&redlink=1
/wiki/John_D%27Aquino
/w/index.php?title=Troy_Byer&action=edit&redlink=1
/wiki/Lezlie_Deane
/wiki/Blair_Underwood
/wiki/Robert_Picardo
/wiki/Scott_Schwartz
/wiki/Liane_Curtis
/wiki/Byron_Thames
/wiki/Sherilyn_Fenn
/wiki/Christopher_Heyerdahl
/wiki/Kurtwood_Smith
/wiki/Sarah_G._Buxton
/wiki/Jason_Priestley
/w/index.php?title=List_of_guest_stars_on_21_Jump_Street&action=edit&section=2
/wiki/Kurtwood_Smith
/wiki/Ray_Walston
/wiki/Pauly_Shore
/wiki/Shannon_Tweed
/wiki/Lo
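The first value printed is `None` because `<a id="top">` has no `href` attribute, and most of the remaining links are relative. A minimal sketch of one way to handle both, using a few anchors copied from the output above: `find_all('a', href=True)` skips anchors without an `href`, and `urljoin` from the standard library turns relative paths into full URLs.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup

base_url = "https://en.wikipedia.org/wiki/List_of_guest_stars_on_21_Jump_Street"

# Three anchors taken from the output above, used as a small stand-in page
html = """
<a id="top"></a>
<a href="#Season_1">Season 1</a>
<a href="/wiki/Josh_Brolin" title="Josh Brolin">Josh Brolin</a>
"""
demo = BeautifulSoup(html, "html.parser")

# href=True keeps only the anchors that actually have an href attribute
absolute_links = [
    urljoin(base_url, link["href"]) for link in demo.find_all("a", href=True)
]
```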

In [15]:
all_tables = soup.find_all('table')

In [17]:
all_tables[0].text

'\n\nActor\nCharacter\nSeason #\nEpisode #\nEpisode Title\n\n\nBarney Martin\nCharlie\n1\n1\n"Pilot"\n\n\nBrandon Douglas\nKenny Weckerle\n1\n1 & 2\n"Pilot"\n\n\nReginald T. Dorsey\nTyrell "Waxer" Thompson\n1\n1 & 2\n"Pilot"\n\n\nBilly Jayne\nMark Dorian\n1\n2\n"America, What a Town"\n\n\nSteve Antin\nStevie Delano\n1\n2\n"America, What a Town"\n\n\nTraci Lind\nNadia\n1\n2\n"America, What a Town"\n\n\nLeah Ayres\nSusan Chadwick\n1\n3\n"Don\'t Pet the Teacher"\n\n\nGeoffrey Blake\nJeffrey Stone\n1\n3\n"Don\'t Pet the Teacher"\n\n\nJosh Brolin\nTaylor Rolator\n1\n4\n"My Future\'s So Bright, I Gotta Wear Shades"\n\n\nJamie Bozian\nKurt Niles\n1\n4\n"My Future\'s So Bright, I Gotta Wear Shades"\n\n\nJohn D\'Aquino\nVinny Morgan\n1\n4\n"My Future\'s So Bright, I Gotta Wear Shades"\n\n\nTroy Byer\nPatty Blatcher\n1\n5\n"The Worst Night of Your Life"\n\n\nLezlie Deane\nJane Kinney\n1\n5\n"The Worst Night of Your Life"\n\n\nBlair Underwood\nReginald Brooks\n1\n6\n"Gotta Finish the Riff"\n\n\nR

In [18]:
right_tables = soup.find_all('table', class_='wikitable')

In [19]:
len(right_tables)

5
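Filtering on `class_='wikitable'` keeps only the tables that carry that CSS class, so any other table on a page is ignored. The sketch below shows the effect on a made-up page; the `navbox` table and the `sortable` class are assumptions added for illustration.

```python
from bs4 import BeautifulSoup

# Hypothetical page with one layout table and one data table
html = """
<table class="navbox"><tr><td>navigation</td></tr></table>
<table class="wikitable sortable"><tr><td>Barney Martin</td></tr></table>
"""
demo = BeautifulSoup(html, "html.parser")

every_table = demo.find_all("table")
# class_ matches if "wikitable" is any one of the tag's classes
wiki_tables = demo.find_all("table", class_="wikitable")
```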

In [137]:
right_tables[0]

<table class="wikitable">
<tr>
<th>Actor</th>
<th>Character</th>
<th>Season #</th>
<th>Episode #</th>
<th>Episode Title</th>
</tr>
<tr>
<td><a href="/wiki/Barney_Martin" title="Barney Martin">Barney Martin</a></td>
<td>Charlie</td>
<td>1</td>
<td>1</td>
<td>"Pilot"</td>
</tr>
<tr>
<td><a href="/wiki/Brandon_Douglas" title="Brandon Douglas">Brandon Douglas</a></td>
<td>Kenny Weckerle</td>
<td>1</td>
<td>1 &amp; 2</td>
<td>"Pilot"</td>
</tr>
<tr>
<td><a class="new" href="/w/index.php?title=Reginald_T._Dorsey&amp;action=edit&amp;redlink=1" title="Reginald T. Dorsey (page does not exist)">Reginald T. Dorsey</a></td>
<td>Tyrell "Waxer" Thompson</td>
<td>1</td>
<td>1 &amp; 2</td>
<td>"Pilot"</td>
</tr>
<tr>
<td><a href="/wiki/Billy_Jayne" title="Billy Jayne">Billy Jayne</a></td>
<td>Mark Dorian</td>
<td>1</td>
<td>2</td>
<td>"America, What a Town"</td>
</tr>
<tr>
<td><a href="/wiki/Steve_Antin" title="Steve Antin">Steve Antin</a></td>
<td>Stevie Delano</td>
<td>1</td>
<td>2</td>
<td>"America

In [41]:
right_tables[0].find_all('tr')[0].text

'\nActor\nCharacter\nSeason #\nEpisode #\nEpisode Title\n'

In [46]:
right_tables[0].find_all('tr')[3].text

'\nReginald T. Dorsey\nTyrell "Waxer" Thompson\n1\n1 & 2\n"Pilot"\n'

In [66]:
for row in right_tables[0].find_all('tr'):
    cells = row.find_all('td')  # reassigned on every pass, so only the last row is kept

In [67]:
cells

[<td><a href="/wiki/Jason_Priestley" title="Jason Priestley">Jason Priestley</a></td>,
 <td>Tober</td>,
 <td>1</td>,
 <td>12</td>,
 <td>"Mean Streets and Pastel Houses"</td>]
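The loop above leaves only the final row in `cells` because the variable is reassigned on every pass. A minimal sketch of collecting every row instead, run on a two-row excerpt of the Season 1 table shown earlier (the names `table` and `rows` are introduced here for illustration):

```python
from bs4 import BeautifulSoup

# Header plus the first two data rows of the Season 1 table
html = """
<table class="wikitable">
<tr><th>Actor</th><th>Character</th><th>Season #</th><th>Episode #</th><th>Episode Title</th></tr>
<tr><td><a href="/wiki/Barney_Martin">Barney Martin</a></td><td>Charlie</td><td>1</td><td>1</td><td>"Pilot"</td></tr>
<tr><td><a href="/wiki/Brandon_Douglas">Brandon Douglas</a></td><td>Kenny Weckerle</td><td>1</td><td>1 &amp; 2</td><td>"Pilot"</td></tr>
</table>
"""
table = BeautifulSoup(html, "html.parser").find("table", class_="wikitable")

rows = []
for tr in table.find_all("tr"):
    cells = tr.find_all("td")
    if not cells:
        # the header row contains only <th> tags, so it has no <td> cells
        continue
    # append instead of overwrite, so every row is kept
    rows.append([cell.get_text(strip=True) for cell in cells])
```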

In [76]:
for i in range(5):
    for row in right_tables[i].find_all('tr'):
        cells = row.find_all('td')  # still overwritten each pass: ends on the last row of the last table

In [77]:
cells[0].text

'Jada Pinkett Smith'

In [79]:
cells[1].text

'Nicole'

In [63]:
act

['\nBrigitta Dau\nClaire\n5\n3\n"Buddy System"\n']

In [89]:
right_tables[0].find_all('td')[0].text

'Barney Martin'

In [90]:
right_tables[0].find_all('td')[1].text

'Charlie'

In [91]:
right_tables[0].find_all('td')[2].text

'1'

In [92]:
right_tables[0].find_all('td')[3].text

'1'

In [93]:
right_tables[0].find_all('td')[4].text

'"Pilot"'

In [96]:
right_tables[0].find_all('td')[5].text

'Brandon Douglas'

In [97]:
len(right_tables[0].find_all('td'))

120

In [118]:
# Collect the text of every <td> cell in the Season 1 table as one flat list
a = [td.text for td in right_tables[0].find_all('td')]

In [120]:
a[:20]

['Barney Martin',
 'Charlie',
 '1',
 '1',
 '"Pilot"',
 'Brandon Douglas',
 'Kenny Weckerle',
 '1',
 '1 & 2',
 '"Pilot"',
 'Reginald T. Dorsey',
 'Tyrell "Waxer" Thompson',
 '1',
 '1 & 2',
 '"Pilot"',
 'Billy Jayne',
 'Mark Dorian',
 '1',
 '2',
 '"America, What a Town"']

In [122]:
a[::5]

['Barney Martin',
 'Brandon Douglas',
 'Reginald T. Dorsey',
 'Billy Jayne',
 'Steve Antin',
 'Traci Lind',
 'Leah Ayres',
 'Geoffrey Blake',
 'Josh Brolin',
 'Jamie Bozian',
 "John D'Aquino",
 'Troy Byer',
 'Lezlie Deane',
 'Blair Underwood',
 'Robert Picardo',
 'Scott Schwartz',
 'Liane Curtis',
 'Byron Thames',
 'Sherilyn Fenn',
 'Christopher Heyerdahl',
 'Kurtwood Smith',
 'David Raynr',
 'Sarah G. Buxton',
 'Jason Priestley']

In [123]:
actors = a[::5]
character = a[1::5]
season = a[2::5]
episode = a[3::5]
title = a[4::5]

In [124]:
title[:4]

['"Pilot"', '"Pilot"', '"Pilot"', '"America, What a Town"']
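The stride slices work only because every row has exactly five cells. As a hedged alternative, the flat list can be regrouped into rows first, which makes that assumption explicit and fails loudly if a cell is missing. The sketch uses the first ten values of `a`; `n_cols` and `rows` are names introduced for illustration.

```python
# First two rows of the flat list built above
a = ['Barney Martin', 'Charlie', '1', '1', '"Pilot"',
     'Brandon Douglas', 'Kenny Weckerle', '1', '1 & 2', '"Pilot"']

n_cols = 5  # Actor, Character, Season #, Episode #, Episode Title
if len(a) % n_cols != 0:
    raise ValueError("cell count is not a multiple of the column count")

# Cut the flat list into consecutive rows of five cells
rows = [a[i:i + n_cols] for i in range(0, len(a), n_cols)]

# zip(*rows) transposes the rows into columns
actors, character, season, episode, title = (list(col) for col in zip(*rows))
```

The resulting lists hold the same values as `a[::5]`, `a[1::5]`, and so on.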

In [134]:
import pandas as pd

In [129]:
df = pd.DataFrame()

In [132]:
df['Actors'] = actors
df['Character'] = character
df['Season'] = season
df['Episode'] = episode
df['Title'] = title

In [133]:
df.head()

|   | Actors | Character | Season | Episode | Title |
|---|--------|-----------|--------|---------|-------|
| 0 | Barney Martin | Charlie | 1 | 1 | "Pilot" |
| 1 | Brandon Douglas | Kenny Weckerle | 1 | 1 & 2 | "Pilot" |
| 2 | Reginald T. Dorsey | Tyrell "Waxer" Thompson | 1 | 1 & 2 | "Pilot" |
| 3 | Billy Jayne | Mark Dorian | 1 | 2 | "America, What a Town" |
| 4 | Steve Antin | Stevie Delano | 1 | 2 | "America, What a Town" |


In [135]:
df.shape

(24, 5)

Can you get me the other seasons?

In [136]:
df["Season"].unique()

array(['1'], dtype=object)

### Problem

Add the other four seasons to produce a complete table of all guest stars across the five seasons of 21 Jump Street.
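As a starting hint, here is one possible approach, sketched on a tiny stand-in page rather than the live Wikipedia HTML: loop over every `wikitable`, keep the rows that have five `<td>` cells, and build the DataFrame in one step. The two data rows come from the outputs shown earlier; the names `records` and `all_seasons` are assumptions made for this sketch.

```python
import pandas as pd
from bs4 import BeautifulSoup

# Stand-in page: one row from Season 1 and one from Season 5, as seen above
html = """
<table class="wikitable">
<tr><th>Actor</th><th>Character</th><th>Season #</th><th>Episode #</th><th>Episode Title</th></tr>
<tr><td>Barney Martin</td><td>Charlie</td><td>1</td><td>1</td><td>"Pilot"</td></tr>
</table>
<table class="wikitable">
<tr><th>Actor</th><th>Character</th><th>Season #</th><th>Episode #</th><th>Episode Title</th></tr>
<tr><td>Brigitta Dau</td><td>Claire</td><td>5</td><td>3</td><td>"Buddy System"</td></tr>
</table>
"""
demo = BeautifulSoup(html, "html.parser")

columns = ["Actors", "Character", "Season", "Episode", "Title"]
records = []
for table in demo.find_all("table", class_="wikitable"):
    for tr in table.find_all("tr"):
        cells = [td.get_text(strip=True) for td in tr.find_all("td")]
        # header rows have no <td> cells, so this check skips them
        if len(cells) == len(columns):
            records.append(cells)

all_seasons = pd.DataFrame(records, columns=columns)
```

On the real page, replacing `demo` with the `soup` object built earlier should apply the same loop to all five season tables, assuming each of them uses the same five-column layout.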