# Best Scenic Design

#### Author: Obuchi Adikema
#### Date: June 27, 2019

## Introduction

Broadway. 

The flash, the style, the elegance. Scenic designers are among the people most responsible for making Broadway plays and musicals the sensations that they are.

From the first Tony Awards in 1947 until the categories changed in 2005, a single Tony Award for Best Scenic Design was presented each year, covering both plays and musicals. From then on, two awards have been presented: Best Scenic Design in a Play and Best Scenic Design in a Musical.

### Who has been awarded and nominated for the most Tony Awards for Best Scenic Design? 
Who, historically, has been the best scenic designer? Or, at least, who has been awarded and nominated for the most Tony Awards in this category? This analysis uses data from the Internet Broadway Database (IBDB). The database contains information on designers, shows, and awards received from 1947 to 2004.

For this project, I will be building the following table: `Designers.csv`

## Import Primary Packages

In [1]:
# Import all necessary packages
import lxml.html
import pandas as pd
import re
import requests

## To answer my question, I must scrape the data from IBDB.

In [2]:
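# Raw form-encoded request string captured from the IBDB search form.
# Shown for reference only: the dictionary in the next cell replaces it.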
data = "__RequestVerificationToken=IYWa4XVN6rW9Ril9alhCIsQ3cgg1FCl-QGGO6MH6OYwxC0xzddPFruiG_OFdgDpA1YuNjN6IUbs5yF3cP8HutxKctpHF80UfaUlxIs-7pg41&Keyword=&FuncNo=Designer&Gender=&birthstartmonth=01&birthstartday=01&birthstartyear=1950&birthendmonth=01&birthendday=01&birthendyear=1960&birthcity=&BirthState=&deathstartmonth=&deathstartday=&deathstartyear=&deathendmonth=&deathendday=&deathendyear=&deathcity=&DeathState="

In [3]:
data = {
    "FuncNo": "Designer",
    "birthstartmonth": "01",
    "birthstartday": "01",
    "birthstartyear": "1800",
    "birthendmonth": "06",
    "birthendday": "06",
    "birthendyear": "2005"
}

r = requests.post("http://www.ibdb.com/cast-staff", data=data)
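
The raw string in `In [2]` and the dictionary in `In [3]` are two encodings of the same kind of form payload: when `requests.post` is given a dictionary as `data`, it form-encodes it before sending. As a minimal sketch using only the standard library (the helper name `form_string_to_dict` and the shortened sample string are assumptions for illustration, not part of the original notebook), one form can be converted into the other to see which fields actually carry values:

```python
from urllib.parse import parse_qsl, urlencode

def form_string_to_dict(raw, keep_blank=False):
    """Parse a URL-encoded form string into a dict of field -> value."""
    return dict(parse_qsl(raw, keep_blank_values=keep_blank))

# Shortened, illustrative sample in the same style as the captured string.
sample = "Keyword=&FuncNo=Designer&birthstartmonth=01&birthstartyear=1950&birthcity="

# Blank fields are dropped by default, leaving only the filters in use.
fields = form_string_to_dict(sample)
print(fields)

# Round-trip back to the encoded form that would be sent as the POST body.
print(urlencode(fields))
```

Dropping the blank fields is a design choice for readability here; whether the server also needs them (or the verification token) is something to confirm against the live site.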

In [4]:
raw_string = lxml.html.fromstring(r.text)

Let's try to get the names of all of the designers. We have access to a database of all of the designers, not just the scenic ones. That's fine, because we will get rid of all of the extra names when we cross-reference with the list of Tony scenic design winners.

First, we'll get part of the URLs for each designer's page in the database.

In [5]:
path = "//div[contains(@class, 'person-info')]/div//a/@href"

designers = raw_string.xpath(path)
designers

['/broadway-cast-staff/james-acheson-70969',
 '/broadway-cast-staff/kevin-adams-25504',
 '/broadway-cast-staff/adrian-24600',
 '/broadway-cast-staff/ray-aghayan-24603',
 '/broadway-cast-staff/christopher-akerlind-25507',
 '/broadway-cast-staff/theoni-v-aldredge-24606',
 '/broadway-cast-staff/cris-alexander-69325',
 '/broadway-cast-staff/ren-allio-24612',
 '/broadway-cast-staff/ralph-alswang-14031',
 '/broadway-cast-staff/david-amram-11309',
 '/broadway-cast-staff/john-murray-anderson-6874',
 '/broadway-cast-staff/mariano-andreu-24619',
 '/broadway-cast-staff/waldo-angelo-24622',
 '/broadway-cast-staff/kathleen-ankers-24623',
 '/broadway-cast-staff/michael-annals-24628',
 '/broadway-cast-staff/theodore-antoniou-73379',
 '/broadway-cast-staff/emile-ardolino-88451',
 '/broadway-cast-staff/will-steven-armstrong-24634',
 '/broadway-cast-staff/boris-aronson-24635',
 '/broadway-cast-staff/martin-aronstein-25526',
 '/broadway-cast-staff/colleen-atwood-493877',
 '/broadway-cast-staff/lemuel-aye

In [6]:
# Find the names of the designers
def clean_names(paths):
    """Convert paths like '/broadway-cast-staff/kevin-adams-25504' to 'kevin adams'."""
    names = []
    for path in paths:
        slug = path.rsplit("/")[2]  # e.g. 'kevin-adams-25504'
        no_digits = ''.join(ch for ch in slug if not ch.isdigit())  # drop the numeric ID
        names.append(no_digits.replace("-", " ").strip())
    return names

designerNames = clean_names(designers)
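
To make the cleaning step concrete, here is a small sketch that applies the same idea to a few of the paths printed above. It uses a regular expression to remove the trailing ID rather than filtering digits one character at a time; `slug_to_name` is a hypothetical helper, not the notebook's own function.

```python
import re

def slug_to_name(path):
    """Turn '/broadway-cast-staff/kevin-adams-25504' into 'kevin adams'."""
    slug = path.rsplit("/", 1)[-1]     # last path segment
    slug = re.sub(r"-\d+$", "", slug)  # drop the trailing numeric ID
    return slug.replace("-", " ").strip()

sample_paths = [
    "/broadway-cast-staff/james-acheson-70969",
    "/broadway-cast-staff/adrian-24600",
    "/broadway-cast-staff/theoni-v-aldredge-24606",
]
print([slug_to_name(p) for p in sample_paths])
# ['james acheson', 'adrian', 'theoni v aldredge']
```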

Let's create a DataFrame to store all of these names.

In [7]:
df1 = pd.DataFrame(designerNames, columns =['Designer Names'])
df1.head()

| | Designer Names |
|---|---|
| 0 | james acheson |
| 1 | kevin adams |
| 2 | adrian |
| 3 | ray aghayan |
| 4 | christopher akerlind |


Now it would be nice to get all of the plays that each designer has worked on.

In [8]:
path = "//div[@id='broadway']//a/@href"
plays = []
for x in designers:
  request_string = "http://www.ibdb.com" + x
  r2 = requests.get(request_string)
  html_tree = lxml.html.fromstring(r2.text)
  plays.append(html_tree.xpath(path))
plays 


[['/broadway-production/hamlet-4299'],
 ['/broadway-production/the-cher-show-518460',
  '/broadway-production/head-over-heels-517717',
  '/broadway-production/spongebob-squarepants-515039',
  '/broadway-production/the-terms-of-my-surrender-514772',
  '/broadway-production/hedwig-and-the-angry-inch-495294',
  '/broadway-production/hands-on-a-hardbody-493526',
  '/broadway-production/on-a-clear-day-you-can-see-forever-490538',
  '/broadway-production/man-and-boy-490386',
  '/broadway-production/hair-490433',
  '/broadway-production/everyday-rapture-487670',
  '/broadway-production/american-idiot-485578',
  '/broadway-production/next-to-normal-483136',
  '/broadway-production/hair-481766',
  '/broadway-production/passing-strange-475091',
  '/broadway-production/the-39-steps-469215',
  '/broadway-production/spring-awakening-448811',
  '/broadway-production/latinologues-401915',
  '/broadway-production/the-good-body-388460',
  '/broadway-production/sexaholix-13581',
  '/broadway-production/
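
The loop in the cell above sends one request per designer, so a single failed request would stop the whole run. A hedged sketch of a more forgiving loop follows. `collect_pages` and its `fetch` parameter are assumptions introduced for illustration; in the notebook, `fetch` could be something like `lambda url: requests.get(url, timeout=10).text`.

```python
import time

def collect_pages(paths, fetch, base_url="http://www.ibdb.com", delay=0.0):
    """Fetch each page, pausing between requests and recording failures as None."""
    pages = {}
    for path in paths:
        try:
            pages[path] = fetch(base_url + path)
        except Exception as exc:  # keep going; report and mark the gap
            print(f"failed: {path} ({exc})")
            pages[path] = None
        time.sleep(delay)
    return pages

# Offline demonstration with a fake fetcher (no network access needed).
def fake_fetch(url):
    if url.endswith("adrian-24600"):
        raise RuntimeError("simulated timeout")
    return "<html>" + url + "</html>"

result = collect_pages(
    ["/broadway-cast-staff/james-acheson-70969", "/broadway-cast-staff/adrian-24600"],
    fake_fetch,
)
print(result)
```

Passing the fetcher in as a parameter keeps the loop testable without touching the network, and a non-zero `delay` spaces out the requests.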

In [9]:
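def clean_plays(designer_play_paths):
    """Convert each designer's production paths into readable play names."""
    outer_list = []
    for designer_plays in designer_play_paths:
        inner_list = []
        for play in designer_plays:
            # Take the slug after '/broadway-production/'
            slug = re.match(r"/[\w\-.]+/(.*)", play).group(1)
            # Drop only the trailing numeric ID, then turn hyphens into spaces,
            # so titles containing digits (e.g. 'the-39-steps-469215') stay intact
            play_name = re.sub(r"-\d+$", "", slug).replace("-", " ")
            inner_list.append(play_name.strip())
        outer_list.append(inner_list)
    return outer_list

playNames = clean_plays(plays)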
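
Titles that contain digits are a useful check for this kind of cleaning, because the production ID at the end of each path is numeric too. The sketch below compares two strategies on paths taken from the output above. Both helper names are hypothetical and introduced only for this comparison.

```python
import re

def title_by_prefix(path):
    """Keep only the leading run of lowercase letters and spaces."""
    slug = path.rsplit("/", 1)[-1].replace("-", " ")
    return re.match(r"[a-z ]*", slug).group().strip()

def title_by_trailing_id(path):
    """Remove only the numeric ID at the end of the slug."""
    slug = re.sub(r"-\d+$", "", path.rsplit("/", 1)[-1])
    return slug.replace("-", " ")

for path in ["/broadway-production/hair-490433",
             "/broadway-production/the-39-steps-469215"]:
    print(title_by_prefix(path), "|", title_by_trailing_id(path))
# hair | hair
# the | the 39 steps
```

Matching a letters-only prefix cuts a title off at its first digit, while stripping just the trailing ID keeps the full title.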

Let's add the plays that each designer has worked on to that DataFrame.

In [10]:
df1["Plays"] = playNames
df1.head()

| | Designer Names | Plays |
|---|---|---|
| 0 | james acheson | [hamlet] |
| 1 | kevin adams | [the cher show, head over heels, spongebob squ... |
| 2 | adrian | [camelot, obsession, in bed we cry, slightly s... |
| 3 | ray aghayan | [lorelei, on the town, applause, eddie fisher ... |
| 4 | christopher akerlind | [time and the conways, indecent, waitress, the... |


In [11]:
# Count the number of plays 
def count_shows(grid):
  count_list = []
  for row in grid:
    count_list.append(len(row))
  return count_list

numberOfPlays = count_shows(plays)


In [12]:
df1["Number of Plays"] = numberOfPlays
df1.head()

| | Designer Names | Plays | Number of Plays |
|---|---|---|---|
| 0 | james acheson | [hamlet] | 1 |
| 1 | kevin adams | [the cher show, head over heels, spongebob squ... | 25 |
| 2 | adrian | [camelot, obsession, in bed we cry, slightly s... | 8 |
| 3 | ray aghayan | [lorelei, on the town, applause, eddie fisher ... | 6 |
| 4 | christopher akerlind | [time and the conways, indecent, waitress, the... | 23 |


Now that we have all of the designers, and the plays that they have contributed to, we can check if those designers have had any Tony nods. 

In [13]:
award_data = "__RequestVerificationToken=CmELdELi-0U39OV1eQDJpJhpQUkKXZvq3cewhn-R4sDd5v1-1963vBJ7GrJ3sYrgpIITRnw8nzuFT8LHRvSJ2IvftGWMixu5hWvpdKDzIlw1"

In [14]:
award_data_2005 = {
    "AwdAliasNo": "1001",
    "Year": "",
    "AwdCatNo": "1017"
}

r3 = requests.post("http://www.ibdb.com/awards", data=award_data_2005) 

In [15]:
# Scenic design Tony winners and nominees up to 2005
path = "//div[contains(string(), 'Winner')]//preceding-sibling::a[contains(@href, 'cast-staff')]/@href"

html_tree2 = lxml.html.fromstring(r3.text)
winners_nominees = html_tree2.xpath(path)
winnerNomineeNames = clean_names(winners_nominees)
winnerNomineeNames

['eugene lee',
 'robert brill',
 'tom pye',
 'ralph funicello',
 'catherine martin',
 'john lee beatty',
 'david rockwell',
 'santo loquasto',
 'tim hatley',
 'douglas w schmidt',
 'daniel ostling',
 'john lee beatty',
 'robin wagner',
 'douglas w schmidt',
 'heidi ettinger',
 'bob crowley',
 'bob crowley',
 'robin wagner',
 'thomas lynch',
 'tony walton',
 'richard hoover',
 'riccardo hernndez',
 'bob crowley',
 'bob crowley',
 'richard hudson',
 'eugene lee',
 'bob crowley',
 'stewart laing',
 'gw mercier',
 'julie taymor',
 'tony walton',
 'john lee beatty',
 'brian thomson',
 'john lee beatty',
 'anthony ward',
 'scott bradley',
 'john napier',
 'mark thompson',
 'stephen brimson lewis',
 'john lee beatty',
 'bob crowley',
 'ian macneil',
 'peter j davison',
 'tony walton',
 'john arnone',
 'robin wagner',
 'jerome sirlin',
 'john lee beatty',
 'tony walton',
 'john lee beatty',
 'joe vanek',
 'robin wagner',
 'heidi landesman',
 'richard hudson',
 'john napier',
 'tony walton',
 '

## Calculate which designers have been nominated for or won Tonys for Best Scenic Design, and how many times.

In [16]:
tonyAwardWN = []     # 1 if the designer has at least one nomination or win, else 0
numberOfAwards = []  # total nominations and wins per designer

for designer in designerNames:
    # Count how many times this designer appears in the winner/nominee list
    awardCounter = winnerNomineeNames.count(designer)
    tonyAwardWN.append(1 if awardCounter > 0 else 0)
    numberOfAwards.append(awardCounter)
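
The cell above counts, for each designer, how many times that name appears in the winner and nominee list. The same tally can be sketched with `collections.Counter`, which builds all the counts in one pass over the list. The sample names below are a short illustrative subset, not the full scraped lists, and the snake_case variable names are assumptions for this sketch.

```python
from collections import Counter

sample_designers = ["bob crowley", "kevin adams", "eugene lee"]
sample_nods = ["eugene lee", "bob crowley", "bob crowley", "eugene lee", "bob crowley"]

nod_counts = Counter(sample_nods)  # name -> number of appearances
# A Counter returns 0 for names it has never seen.
number_of_awards = [nod_counts[name] for name in sample_designers]
tony_flags = [int(n > 0) for n in number_of_awards]

print(number_of_awards)  # [3, 0, 2]
print(tony_flags)        # [1, 0, 1]
```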

In [17]:
df1["Tony?"] = tonyAwardWN
df1["Number of Tony Nominations/Wins"] = numberOfAwards
df1.head()

| | Designer Names | Plays | Number of Plays | Tony? | Number of Tony Nominations/Wins |
|---|---|---|---|---|---|
| 0 | james acheson | [hamlet] | 1 | 0 | 0 |
| 1 | kevin adams | [the cher show, head over heels, spongebob squ... | 25 | 0 | 0 |
| 2 | adrian | [camelot, obsession, in bed we cry, slightly s... | 8 | 0 | 0 |
| 3 | ray aghayan | [lorelei, on the town, applause, eddie fisher ... | 6 | 0 | 0 |
| 4 | christopher akerlind | [time and the conways, indecent, waitress, the... | 23 | 0 | 0 |


In [18]:
df1.to_csv("Designers.csv")

## Now, let's finally find out which scenic designer has received the most nods and wins from The Tony Awards. 

In [28]:
greatestWinner = designerNames[numberOfAwards.index(max(numberOfAwards))]
mostAwards = max(numberOfAwards)
totalPlays = numberOfPlays[numberOfAwards.index(max(numberOfAwards))]


print("The designer with the most Tony Awards and nominations is "+ str(greatestWinner) + " with a combination of "
     + str(mostAwards) + " wins and nominations for Best Scenic Design. They have been credited for work on " 
      + str(totalPlays) + " plays. ")


The designer with the most Tony Awards and nominations is oliver smith with a combination of 23 wins and nominations for Best Scenic Design. They have been credited for work on 138 plays. 
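
One caveat about the lookup above: `list.index(max(...))` returns only the first position holding the top count, so a tie would go unreported. As a sketch on made-up numbers (the names and counts here are illustrative, not scraped results, and `top_designers` is a hypothetical helper), all designers sharing the maximum can be collected like this:

```python
def top_designers(names, counts):
    """Return the highest count and every name that reaches it."""
    best = max(counts)
    return best, [name for name, count in zip(names, counts) if count == best]

names = ["designer a", "designer b", "designer c"]
counts = [4, 7, 7]

best, leaders = top_designers(names, counts)
print(best, leaders)  # 7 ['designer b', 'designer c']
```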


## References

[Internet Broadway Database](http://www.ibdb.com)