# Attempting to merge Emmy Award data with our datasets

In [1]:
library(ggplot2)
library(dplyr)
library(tidyverse)  # also attaches ggplot2, dplyr, and stringr, so the separate calls are redundant
library(stringr)

Registered S3 methods overwritten by 'ggplot2':
  method         from 
  [.quosures     rlang
  c.quosures     rlang
  print.quosures rlang

Attaching package: ‘dplyr’

The following objects are masked from ‘package:stats’:

    filter, lag

The following objects are masked from ‘package:base’:

    intersect, setdiff, setequal, union

Registered S3 method overwritten by 'rvest':
  method            from
  read_xml.response xml2
── Attaching packages ─────────────────────────────────────── tidyverse 1.2.1 ──
✔ tibble  3.0.4     ✔ purrr   0.3.2
✔ tidyr   0.8.3     ✔ stringr 1.4.0
✔ readr   1.3.1     ✔ forcats 0.4.0
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()


## Import datasets

In [2]:
title_basics=read.delim("./../dataset/filtered.title.basics.tsv",header=TRUE)

In [3]:
emmy_awards = read.csv("./../dataset/the_emmy_awards.csv",header=TRUE) 
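
The title file marks missing values with a literal `\N`, as the previews and the later `summary()` output in this notebook show, so columns such as `runtimeMinutes` are not read as numbers. A minimal sketch of one way to handle this at import time is given here, using a small inline stand-in for the real file. The names `tsv_text` and `toy_basics` are hypothetical, and the options shown are an assumption about what suits this dataset, not what the notebook actually ran.

```r
# Toy stand-in for filtered.title.basics.tsv (tab-separated, "\N" = missing).
tsv_text <- paste(
  "tconst\tprimaryTitle\tstartYear\truntimeMinutes",
  "tt0000001\tCarmencita\t1894\t1",
  "tt9916756\tPretty Pretty Black Girl\t2019\t\\N",
  sep = "\n"
)

# na.strings = "\\N" turns the literal backslash-N marker into NA.
# quote = "" treats quotation marks inside titles as ordinary characters.
# stringsAsFactors = FALSE keeps titles as character vectors, not factors.
toy_basics <- read.delim(
  text = tsv_text,
  header = TRUE,
  na.strings = "\\N",
  quote = "",
  stringsAsFactors = FALSE
)

str(toy_basics)
```

With these options `runtimeMinutes` comes back as an integer column with `NA` for the missing entry, which keeps numeric summaries meaningful later on.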

In [4]:
head(title_basics)

tconst,titleType,primaryTitle,originalTitle,isAdult,startYear,endYear,runtimeMinutes,genres
tt0000001,short,Carmencita,Carmencita,0,1894,\N,1,"Documentary,Short"
tt0000002,short,Le clown et ses chiens,Le clown et ses chiens,0,1892,\N,5,"Animation,Short"
tt0000003,short,Pauvre Pierrot,Pauvre Pierrot,0,1892,\N,4,"Animation,Comedy,Romance"
tt0000004,short,Un bon bock,Un bon bock,0,1892,\N,12,"Animation,Short"
tt0000005,short,Blacksmith Scene,Blacksmith Scene,0,1893,\N,1,"Comedy,Short"
tt0000006,short,Chinese Opium Den,Chinese Opium Den,0,1894,\N,1,Short


In [5]:
head(emmy_awards)

id,startYear,category,primaryTitle,staff,company,producer,win
1,2019,Outstanding Character Voice-Over Performance,The Simpsons,"Hank Azaria, as Moe, Carl, Duffman, Kirk",FOX,Gracie Films in association with 20th Century Fox Television,False
2,2019,Outstanding Character Voice-Over Performance,Family Guy,"Alex Borstein, as Lois Griffin, Tricia Takanawa",FOX,20th Century Fox Television,False
3,2019,Outstanding Character Voice-Over Performance,When You Wish Upon A Pickle: A Sesame Street Special,"Eric Jacobson, as Bert, Grover, Oscar",HBO,Sesame Street Workshop,False
4,2019,Outstanding Character Voice-Over Performance,F Is For Family,"Kevin Michael Richardson, as Rosie",Netflix,Wild West Television in association with Gaumont Television,False
5,2019,Outstanding Production Design For A Narrative Contemporary Program (One Hour Or More),Escape At Dannemora,"Mark Ricker, Production Designer; James Truesdale, Art Director; Cherish M. Hale, Set Decorator",Showtime,"Red Hour, Busyhands, The White Mountain Company, Michael De Luca Productions, BZ Entertainment",False
6,2019,Outstanding Production Design For A Narrative Contemporary Program (One Hour Or More),Killing Eve,"Laurence Dorman, Production Designer; Beckie Harvey, Art Director; Linda Wilson, Set Decorator",BBC America,Sid Gentle Films Ltd.,False


In [6]:
tail(emmy_awards)

,id,startYear,category,primaryTitle,staff,company,producer,win
21498,21498,1949,MOST POPULAR TELEVISION PROGRAM,Armchair Detective,"n/a,",KTLA,,False
21499,21499,1949,MOST POPULAR TELEVISION PROGRAM,Don Lee Music Hall,"n/a,",KTSL,,False
21500,21500,1949,MOST POPULAR TELEVISION PROGRAM,Felix De Cola Show,"n/a,",KTLA,,False
21501,21501,1949,SPECIAL AWARD,Louis McManus For His Original Design of the Emmy,"n/a,",,,True
21502,21502,1949,STATION AWARD,KTLA for Outstanding overall achievement in 1948,"n/a,",KTLA,,True
21503,21503,1949,TECHNICAL AWARD,Charles Mesak/Don Lee TV for Phasefader - In Recog,"n/a,",,,True


In [7]:
emmy_awards$producer=NULL

In [8]:
emmy_awards$id=NULL

In [9]:
emmy_awards$staff=NULL

In [10]:
emmy_awards$company=NULL

In [11]:
emmy_awards$staff=NULL  # no-op: staff was already dropped in In [9]
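
The cells above remove one column per cell. As a hedged alternative, the same result can be reached in a single step by listing the unwanted columns once. The sketch below uses a small made-up frame (`toy_awards`, `drop_cols`, and `toy_trimmed` are hypothetical names) with the same column names as the awards file.

```r
# Toy frame with the same column names as the_emmy_awards.csv.
toy_awards <- data.frame(
  id = 1:2,
  startYear = c(2019L, 2019L),
  category = c("Outstanding Character Voice-Over Performance",
               "Outstanding Character Voice-Over Performance"),
  primaryTitle = c("The Simpsons", "Family Guy"),
  staff = c("Hank Azaria", "Alex Borstein"),
  company = c("FOX", "FOX"),
  producer = c("Gracie Films", "20th Century Fox Television"),
  win = c("False", "False"),
  stringsAsFactors = FALSE
)

# Columns that are not needed for the merge.
drop_cols <- c("id", "staff", "company", "producer")

# Keep every column whose name is not in drop_cols.
toy_trimmed <- toy_awards[, !(names(toy_awards) %in% drop_cols), drop = FALSE]

names(toy_trimmed)
```

Listing the columns in one vector makes it harder to drop the same column twice by accident.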

In [12]:
head(emmy_awards)

startYear,category,primaryTitle,win
2019,Outstanding Character Voice-Over Performance,The Simpsons,False
2019,Outstanding Character Voice-Over Performance,Family Guy,False
2019,Outstanding Character Voice-Over Performance,When You Wish Upon A Pickle: A Sesame Street Special,False
2019,Outstanding Character Voice-Over Performance,F Is For Family,False
2019,Outstanding Production Design For A Narrative Contemporary Program (One Hour Or More),Escape At Dannemora,False
2019,Outstanding Production Design For A Narrative Contemporary Program (One Hour Or More),Killing Eve,False


In [13]:
unique(emmy_awards$win)

In [14]:
new_awards = emmy_awards

In [15]:
class(new_awards$win)

In [16]:
# "True"/"False" labels -> TRUE/FALSE -> 1/0
new_awards$win = as.integer(as.logical(new_awards$win))

In [17]:
unique(new_awards$win)
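
The conversion in In [16] relies on `as.logical()` accepting the labels `"True"` and `"False"`. A small sketch of the same idea on a toy vector follows, with an explicit `as.character()` step so it behaves the same whether the column was read as a factor or as text. The names `win_raw` and `win_int` are hypothetical.

```r
# Toy stand-in for the win column as read from the CSV.
win_raw <- factor(c("False", "True", "False"))

# as.character() makes the label-based conversion explicit;
# as.logical() maps "True"/"False" to TRUE/FALSE, as.integer() to 1/0.
win_int <- as.integer(as.logical(as.character(win_raw)))

# Any label as.logical() does not recognise becomes NA, so check for that.
stopifnot(!anyNA(win_int))

win_int
```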

In [18]:
head(new_awards)

startYear,category,primaryTitle,win
2019,Outstanding Character Voice-Over Performance,The Simpsons,0
2019,Outstanding Character Voice-Over Performance,Family Guy,0
2019,Outstanding Character Voice-Over Performance,When You Wish Upon A Pickle: A Sesame Street Special,0
2019,Outstanding Character Voice-Over Performance,F Is For Family,0
2019,Outstanding Production Design For A Narrative Contemporary Program (One Hour Or More),Escape At Dannemora,0
2019,Outstanding Production Design For A Narrative Contemporary Program (One Hour Or More),Killing Eve,0


In [19]:
title_basics$endYear=NULL

## Testing the merge

In [20]:
str(new_awards)

'data.frame':	21503 obs. of  4 variables:
 $ startYear   : int  2019 2019 2019 2019 2019 2019 2019 2019 2019 2019 ...
 $ category    : Factor w/ 1333 levels "Achievements In Educational Television",..: 451 451 451 451 1001 1001 1001 1001 1002 1002 ...
 $ primaryTitle: Factor w/ 5628 levels "\"A Streetcar Named Desire\" From The San Francisco",..: 4963 1610 5484 1588 1554 2453 3246 5084 1767 4720 ...
 $ win         : int  0 0 0 0 0 0 0 0 0 0 ...


In [21]:
summary(as.factor(new_awards$win))

In [22]:
# Inner join: keeps only rows whose primaryTitle and startYear match in both tables
merging = merge(title_basics,new_awards,by=c("primaryTitle","startYear"))
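
By default `merge()` performs an inner join, so award rows without an exact title-and-year match are dropped silently. Judging from the row names in the outputs, the awards table has 21503 rows while the merged result has 1911. The sketch below, on made-up toy frames (`toy_titles`, `toy_nominations`, and the derived names are hypothetical), shows one way to list the award rows that found no match so they can be inspected.

```r
# Toy title table (two titles taken from the merged output).
toy_titles <- data.frame(
  tconst = c("tt0250934", "tt0397022"),
  primaryTitle = c("61*", "6 Rms Riv Vu"),
  startYear = c(2001L, 1974L),
  stringsAsFactors = FALSE
)

# Toy award rows; "The Simpsons" has no counterpart in toy_titles.
toy_nominations <- data.frame(
  primaryTitle = c("61*", "61*", "The Simpsons"),
  startYear = c(2001L, 2001L, 2019L),
  win = c(0L, 1L, 0L),
  stringsAsFactors = FALSE
)

keys <- c("primaryTitle", "startYear")

# Inner join, as in the notebook: unmatched award rows disappear.
matched <- merge(toy_titles, toy_nominations, by = keys)

# Left join from the awards side: every award row is kept,
# and tconst is NA wherever no title matched.
with_unmatched <- merge(toy_nominations, toy_titles, by = keys, all.x = TRUE)
unmatched <- with_unmatched[is.na(with_unmatched$tconst), ]

nrow(matched)
unmatched$primaryTitle
```

Inspecting the unmatched rows is a quick way to see whether the losses come from spelling or capitalisation differences in titles, or from the two `startYear` columns meaning different things in the two sources.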

In [23]:
head(merging)

primaryTitle,startYear,tconst,titleType,originalTitle,isAdult,runtimeMinutes,genres,category,win
3,2005,tt0485074,short,3,0,22,Short,"Outstanding Sound Editing For A Miniseries, Movie Or A Special",0
5 American Kids - 5 American Handguns,1995,tt0112261,tvMovie,5 American Kids - 5 American Handguns,0,60,Documentary,OUTSTANDING INFORMATIONAL SPECIAL,0
6 Rms Riv Vu,1974,tt0397022,tvMovie,6 Rms Riv Vu,0,110,Comedy,BEST LEAD ACTRESS IN A DRAMA,0
6 Rms Riv Vu,1974,tt0397022,tvMovie,6 Rms Riv Vu,0,110,Comedy,OUTSTANDING LIMITED SERIES,0
6 Rms Riv Vu,1974,tt0397022,tvMovie,6 Rms Riv Vu,0,110,Comedy,BEST LEAD ACTOR IN A DRAMA,0
61*,2001,tt0250934,tvMovie,61*,0,129,"Biography,Drama,History",Outstanding Lead Actor In A Miniseries Or A Movie,0


In [24]:
tail(merging)

,primaryTitle,startYear,tconst,titleType,originalTitle,isAdult,runtimeMinutes,genres,category,win
1906,You Don't Know Jack,2010,tt1132623,tvMovie,You Don't Know Jack,0,134,"Biography,Drama",Outstanding Supporting Actress in a Miniseries or Movie,0
1907,You Don't Know Jack,2010,tt1132623,tvMovie,You Don't Know Jack,0,134,"Biography,Drama",Outstanding Hairstyling For A Miniseries Or A Movie,0
1908,You Don't Know Jack,2010,tt1132623,tvMovie,You Don't Know Jack,0,134,"Biography,Drama",Outstanding Supporting Actress in a Miniseries or Movie,0
1909,You Don't Know Jack,2010,tt1132623,tvMovie,You Don't Know Jack,0,134,"Biography,Drama","Outstanding Costumes For A Miniseries, Movie Or A Special",0
1910,Young Catherine,1991,tt0103311,tvMovie,Young Catherine,0,150,"Biography,Drama,History",OUTSTANDING ACHIEVEMENT IN COSTUMING FOR A MINISERIES or a special,0
1911,Young Catherine,1991,tt0103311,tvMovie,Young Catherine,0,150,"Biography,Drama,History",OUTSTANDING SUPPORTING ACTRESS IN A MINISERIES OR SPECIAL,0


In [25]:
tail(title_basics)

,tconst,titleType,primaryTitle,originalTitle,isAdult,startYear,runtimeMinutes,genres
1443754,tt9916724,short,Hay Que Ser Paciente,Hay Que Ser Paciente,0,2015,3,"Documentary,Short"
1443755,tt9916730,movie,6 Gunn,6 Gunn,0,2017,116,\N
1443756,tt9916754,movie,Chico Albuquerque - Revelações,Chico Albuquerque - Revelações,0,2013,49,Documentary
1443757,tt9916756,short,Pretty Pretty Black Girl,Pretty Pretty Black Girl,0,2019,\N,Short
1443758,tt9916764,short,38,38,0,2018,\N,Short
1443759,tt9916856,short,The Wind,The Wind,0,2015,27,Short


In [26]:
summary(merging)

        primaryTitle    startYear           tconst       titleType  
 Lost         :  89   2018   : 132   tt10913716:  19   movie  :321  
 The Voice    :  72   2016   : 122   tt0758751 :  17   short  :785  
 Survivor     :  51   2012   : 108   tt2087883 :  17   tvMovie:805  
 Homeland     :  50   2019   : 105   tt2091354 :  17                
 Modern Family:  48   2017   :  99   tt9731004 :  17                
 Friends      :  42   2013   :  97   tt0423510 :  16                
 (Other)      :1559   (Other):1248   (Other)   :1808                
   originalTitle     isAdult        runtimeMinutes
 Lost     :  89   Min.   :0.00000   \\N    : 250  
 The Voice:  59   1st Qu.:0.00000   100    :  61  
 Survivor :  51   Median :0.00000   5      :  58  
 Friends  :  42   Mean   :0.00157   6      :  56  
 Broken   :  40   3rd Qu.:0.00000   120    :  50  
 Mom      :  40   Max.   :1.00000   96     :  48  
 (Other)  :1590                     (Other):1388  
                     genres    
 Drama,S

In [27]:
summary(as.factor(merging$win))

In [28]:
merging

primaryTitle,startYear,tconst,titleType,originalTitle,isAdult,runtimeMinutes,genres,category,win
3,2005,tt0485074,short,3,0,22,Short,"Outstanding Sound Editing For A Miniseries, Movie Or A Special",0
5 American Kids - 5 American Handguns,1995,tt0112261,tvMovie,5 American Kids - 5 American Handguns,0,60,Documentary,OUTSTANDING INFORMATIONAL SPECIAL,0
6 Rms Riv Vu,1974,tt0397022,tvMovie,6 Rms Riv Vu,0,110,Comedy,BEST LEAD ACTRESS IN A DRAMA,0
6 Rms Riv Vu,1974,tt0397022,tvMovie,6 Rms Riv Vu,0,110,Comedy,OUTSTANDING LIMITED SERIES,0
6 Rms Riv Vu,1974,tt0397022,tvMovie,6 Rms Riv Vu,0,110,Comedy,BEST LEAD ACTOR IN A DRAMA,0
61*,2001,tt0250934,tvMovie,61*,0,129,"Biography,Drama,History",Outstanding Lead Actor In A Miniseries Or A Movie,0
61*,2001,tt0250934,tvMovie,61*,0,129,"Biography,Drama,History","Outstanding Sound Editing For A Miniseries, Movie Or A Special",1
61*,2001,tt0250934,tvMovie,61*,0,129,"Biography,Drama,History","Outstanding Casting For A Miniseries, Movie Or A Special",1
61*,2001,tt0250934,tvMovie,61*,0,129,"Biography,Drama,History","OUTSTANDING SINGLE-CAMERA PICTURE EDITING FOR A MINISERIES, MOVIE OR A SPECIAL",0
61*,2001,tt0250934,tvMovie,61*,0,129,"Biography,Drama,History","OUTSTANDING DIRECTING FOR A MINISERIES, MOVIE OR A SPECIAL",0
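
The merged frame holds one row per nomination, so a title such as `61*` appears several times. If a per-title view is wanted, the rows can be collapsed into nomination and win counts. The following is a minimal base-R sketch on a toy frame; `toy_merged`, `nominations`, `wins`, and `per_title` are hypothetical names, and the toy values only mirror a few rows shown above.

```r
# Toy stand-in for a few rows of the merged frame.
toy_merged <- data.frame(
  tconst = c("tt0250934", "tt0250934", "tt0250934", "tt0397022"),
  primaryTitle = c("61*", "61*", "61*", "6 Rms Riv Vu"),
  win = c(0L, 1L, 1L, 0L),
  stringsAsFactors = FALSE
)

# Number of nomination rows per title.
nominations <- aggregate(win ~ tconst, data = toy_merged, FUN = length)
names(nominations)[2] <- "nominations"

# Number of wins per title (win is coded 0/1, so the sum counts wins).
wins <- aggregate(win ~ tconst, data = toy_merged, FUN = sum)
names(wins)[2] <- "wins"

# One row per title with both counts.
per_title <- merge(nominations, wins, by = "tconst")

per_title
```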
