Schrute R package - image of a beet

The Office - Words and Numbers

The data this week comes from the schrute R package for The Office transcripts and for IMDB ratings of each episode.

If you'd like to use the schrute R package for ALL the lines/dialogue from the show - please install it from CRAN via install.packages("schrute"). A quick example from the vignette can be found here.

If you want to do text analysis - make sure to check out the tidytext package. A vignette can be found here, and the Tidy Text Mining with R book is freely available online here.
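As a rough starting point for that kind of text analysis, here is a minimal sketch that splits dialogue into words with tidytext and counts them per character. The `lines` tibble is a hypothetical stand-in with the same `character` and `text` columns as the schrute data; with the package installed you would swap in the real transcripts (the vignette loads them as `schrute::theoffice`, which is an assumption to check against your installed version).

```r
library(dplyr)
library(tidytext)

# Hypothetical stand-in rows that mimic the `character` and `text` columns
# of the schrute data. Replace with the real transcripts when available.
lines <- tibble::tibble(
  character = c("Jim", "Dwight", "Dwight"),
  text = c(
    "Bears. Beets. Battlestar Galactica.",
    "Bears do not eat beets.",
    "Beets are the best."
  )
)

word_counts <- lines %>%
  unnest_tokens(word, text) %>%            # one lowercased word per row, punctuation dropped
  anti_join(stop_words, by = "word") %>%   # remove common English stop words
  count(character, word, sort = TRUE)      # most frequent words first

word_counts
```

Tokenizing first and filtering stop words second keeps each step inspectable, which helps when deciding whether show-specific words should also be dropped.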

Lastly - The Pudding analyzed The Office dialogue across a few charts - their article is here.

Get the data here

# Get the Data

office_ratings <- readr::read_csv('')

# Or read in with tidytuesdayR package
# PLEASE NOTE TO USE 2020 DATA YOU NEED TO USE tidytuesdayR version ? from GitHub

# Either ISO-8601 date or year/week works!

# Install via devtools::install_github("thebioengineer/tidytuesdayR")

tuesdata <- tidytuesdayR::tt_load('2020-03-17')
tuesdata <- tidytuesdayR::tt_load(2020, week = 12)

office_ratings <- tuesdata$office_ratings

Data Dictionary


| variable | class | description |
|:---|:---|:---|
| season | double | Season number |
| episode | double | Episode number |
| title | character | Title of episode |
| imdb_rating | double | IMDB Rating (10 is best) |
| total_votes | double | Total votes by users |
| air_date | date | Original air date |
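
To show how these columns fit together, here is a small sketch that summarises ratings per season with dplyr. The rows below are placeholders with made-up values (not real IMDB ratings), used only so the example runs on its own. With the real data you would skip the stand-in and use `office_ratings` as loaded above.

```r
library(dplyr)

# Placeholder rows using the office_ratings columns. All values are made up
# for illustration and are NOT real ratings, votes, or air dates.
office_ratings <- tibble::tibble(
  season      = c(1, 1, 2, 2),
  episode     = c(1, 2, 1, 2),
  title       = c("Episode A", "Episode B", "Episode C", "Episode D"),
  imdb_rating = c(7.0, 8.0, 8.5, 9.5),
  total_votes = c(100, 300, 200, 200),
  air_date    = as.Date(c("2000-01-01", "2000-01-08", "2001-01-01", "2001-01-08"))
)

season_summary <- office_ratings %>%
  group_by(season) %>%
  summarise(
    episodes      = n(),
    mean_rating   = mean(imdb_rating),
    # weight each episode's rating by the number of users who voted on it
    weighted_mean = weighted.mean(imdb_rating, total_votes),
    first_aired   = min(air_date)
  )

season_summary
```

The vote-weighted mean is one possible design choice: it lets heavily rated episodes count for more than episodes with only a handful of votes.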

schrute data

| variable | class | description |
|:---|:---|:---|
| index | integer | Index |
| season | character | Season Number |
| episode | character | Season episode |
| episode_name | character | Episode title |
| director | character | Episode Director |
| writer | character | Episode Writer |
| character | character | Episode Character |
| text | character | Dialogue as text |
| text_w_direction | character | Dialogue as text with direction |
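
One thing to watch when combining the two datasets: the dictionaries list `season` and `episode` as double in the ratings data but as character in the schrute data, so a direct join on those keys would fail on mismatched types. Below is a hedged sketch of one way to line them up, using hypothetical stand-in rows with made-up values and assuming the character columns hold plain or zero-padded numbers.

```r
library(dplyr)

# Hypothetical stand-in rows. Column names and classes follow the data
# dictionaries; the values themselves are made up.
ratings <- tibble::tibble(
  season      = c(1, 1),
  episode     = c(1, 2),
  imdb_rating = c(7.0, 8.0)
)

lines <- tibble::tibble(
  index     = 1:3,
  season    = c("01", "01", "01"),
  episode   = c("01", "01", "02"),
  character = c("Michael", "Jim", "Dwight"),
  text      = c("Line one.", "Line two.", "Line three.")
)

lines_rated <- lines %>%
  # convert the character keys to numbers so they match the ratings table
  mutate(
    season  = as.numeric(season),
    episode = as.numeric(episode)
  ) %>%
  # keep every line of dialogue and attach its episode's rating
  left_join(ratings, by = c("season", "episode"))

lines_rated
```

A left join from the dialogue side keeps every line even when an episode has no rating row, which makes missing matches easy to spot as `NA`.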

Cleaning Script

No cleaning this week!
