Permalink
Find file Copy path
Fetching contributors…
Cannot retrieve contributors at this time
197 lines (152 sloc) 12.5 KB
---
title: "How Many Fucks Does Tarantino Give?"
author: "Olivia Barrows, Jojo Miller, and Jayla Nakayama"
date: "`r Sys.Date()`"
output:
rmarkdown::html_vignette:
df_print: kable
vignette: |
%\VignetteIndexEntry{How Many Fucks Does Tarantino Give?}
%\VignetteEngine{knitr::rmarkdown}
%\VignetteEncoding{UTF-8}
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(warning=FALSE, message=FALSE, fig.width=7.2)
```
This vignette is based on the data collected for the FiveThirtyEight study [A Complete Catalog Of Every Time Someone Cursed Or Bled Out In A Quentin Tarantino Movie](https://fivethirtyeight.com/features/complete-catalog-curses-deaths-quentin-tarantino-films/), with a focus on the use of swear words in his films.
```{r libraries, message = FALSE, warning = FALSE}
library(fivethirtyeight)
# library(tidyverse)
library(ggplot2)
library(dplyr)
library(stringr)
library(knitr)
library(ggthemes)
# Set default number of digits printed
options(digits = 2)
```
## Points of Analysis
* comparison of total swear word usage over the course of Tarantino's career
* number of uses of the word 'fuck' and its variations, per movie
The `tarantino` dataframe in the `fivethirtyeight` package presents data relating to the amount of deaths and swear words used in each of the seven movies noted via the `movie` variable. The `profane` variable notes whether the instance being examined relates to a swear word or not, and the `word` variable gives the actual form of the swear word used. Both deaths and instances of swearing are then noted as to when exactly they occur in the movie via the `minutes_in` variable. This analysis focuses on the usage of swear words in Tarantino's movies, and does not deal with the number of deaths tallied for his works.
### Preparing the Data
For this analysis, there is first the need to create a new dataframe, which will be called `tarantino_year`. This new dataframe includes the year of release for each film. Once created, the `tarantino_year` dataframe needs to be merged with the already existing `tarantino` dataframe, so the `year` variable will be present for analysis in the `tarantino_plus_year` data frame.
```{r merge}
# Create new dataframe assigning year of release to movies
movie <- c("Reservoir Dogs", "Pulp Fiction", "Jackie Brown", "Kill Bill: Vol. 1", "Kill Bill: Vol. 2", "Inglorious Basterds", "Django Unchained")
year <- c(1992, 1994, 1997, 2003, 2004, 2009, 2012)
tarantino_year <- data_frame(movie, year)
# Combine with existing `tarantino` dataframe
tarantino_plus_year <- inner_join(x = tarantino, y = tarantino_year, by = "movie")
```
The final step in preparing the `tarantino_plus_year` dataframe for this analysis is to filter out unnecessary information, keeping only entries with the value of `TRUE` under the `profane` variable.
```{r profane}
tarantino_swears <- tarantino_plus_year %>% filter(profane == TRUE)
```
### Total Swear Word Usage Over Tarantino's Career
Using the altered `tarantino_swears` dataframe, we can see how many times curse words are used in each of the movies. To better visualize how Tarantino's usage of swearing has changed over the years, we first ordered the movies by year of release, and then created a graphic with this re-ordered data. A table is provided below, for referencing year of release.
```{r plot1, fig.height = 8}
# Ordering the movies by release year
by_year <- c("Reservoir Dogs", "Pulp Fiction", "Jackie Brown", "Kill Bill: Vol. 1", "Kill Bill: Vol. 2",
"Inglorious Basterds", "Django Unchained")
tarantino_factor <- tarantino_swears %>%
mutate(movie = factor(tarantino_swears$movie, levels = by_year))
# Plotting the amount of swear words used in each movie
ggplot(data = tarantino_factor, mapping = aes(x = movie, fill = movie)) +
geom_bar(color = "white") +
theme_fivethirtyeight() +
labs(x = "Movie", y = "Swear Count", title = "Total Swear Word Usage Per Movie") +
theme(axis.text.x = element_text(angle = 45, hjust = 1))
```
```{r echo = FALSE}
#Presenting a table of movie and year of release, for reference
tarantino_year
```
In order to better represent this data available on the profanities used in Tarantino's movies, it is beneficial to first divide the swear words used into categories via the `word` variable. For the purposes of this analysis, eight were defined and stored in the `swear_category` variable:
* swears including the word 'ass'
* swears including the word 'damn'
* swears including the word 'dick' or 'cock'
* swears including the word 'fuck'
* swears including the word 'shit', as well as 'merde'
* swears referring to damnation, and the word 'hell'
* swears with derogatory gender implications
* swears with derogatory racial implications
```{r case_when}
# Creating the categories
tarantino_swears <- tarantino_swears %>%
mutate(swear_category =
case_when(grepl("ass", word) ~ "ass",
grepl("shit|merde", word) ~ "shit",
grepl("fuck", word) ~ "fuck",
grepl("damn|hell", word) ~ "damnation",
grepl("bastard", word) ~ "bastard",
grepl("dick|cock", word) ~ "dick",
grepl("bitch|cunt|pussy|faggot|slut", word) ~ "gender",
grepl("gook|jap|jew|n-word|negro|slope|wetback|squaw", word) ~ "race"))
```
With these categories defined, we can produce a table called `Profanity_Sum` showing how often swear words of each category were used during Tarantino's movies.
``` {r Profanity_Sum}
Profanity_Sum <- tarantino_swears %>%
group_by(movie) %>%
summarize(Ass = mean(swear_category == "ass") * 100,
Shit = mean(swear_category == "shit") * 100,
Fuck = mean(swear_category == "fuck") * 100,
Dick = mean(swear_category == "dick") * 100,
Damnation = mean(swear_category == "damnation") * 100,
Bastard = mean(swear_category == "bastard") * 100,
Gender = mean(swear_category == "gender") * 100,
Race = mean(swear_category == "race") * 100,
Unspeakable = Gender + Race)
Profanity_Sum
```
This table allows us to conclude the following:
* Despite what the title would suggest, _Inglorious Basterds_ does not use the word 'bastard' at all
* Apart from _Django Unchained_, the word 'fuck' and its variations are the most commonly used swear words in Tarantino's movies
* _Django Unchained_ features `Unspeakable` as its category with the highest percentage of swear words
* Overall, swear words falling in the `Bastard` and `Dick` categories are used the least often in Tarantino's movies
You may notice the `Unspeakable` variable, which is a combination of the `gender` and `race` categories from the earlier code. It is possible to break this variable down relative to each movie, showing how often either category had swear words used.
```{r Unspeakable_Sum}
Unspeakable_Sum <- tarantino_swears %>%
group_by(movie) %>%
summarize(gen_por = mean(swear_category == "gender"),
race_por = mean(swear_category == "race")) %>%
mutate(Gender_Derogatory = gen_por * 100, Race_Derogatory = race_por * 100) %>%
select(-gen_por, -race_por)
Unspeakable_Sum
```
The resulting table allows us to conclude the following:
* _Django Unchained_ has the highest percentage of racially derogatory swear words of all seven movies
* _Kill Bill, Vol. 1_ has the highest percentage of gender derogatory swear words of all seven movies
* _Kill Bill, Vol. 1_ is the only one of Tarantino's movies to not use any racially derogatory swear words
The observation about _Django Unchained_ having the highest percentage of racially derogatory swear words used compared to not only the other movies analyzed but also in the context of the movie itself may seem unusual, but actually makes sense within the context of the movie. Because the plot takes place in a time where slavery still exists, and the titular character is also a former slave, it is easier to see why this extreme number would be present as an outlier for this particular set of tables.
Based on the above information as a whole, it is possible to see that there is a general trend of less swearing over time in Tarantino's movies, apart from _Django Unchained_. This is somewhat unexpected, as swearing has generally become more accepted over time; however, the later rise in swearing for _Django Unchained_ does better reflect this fact. The use of such a large number of racially derogatory swear words in a movie released in 2012 is also of note.
### How Many Fucks Does Tarantino Actually Give?
Looking at the dataframe `tarantino_swears`, the next question we have to ask is 'How many fucks _exactly_ does Tarantino give in each of his movies?' First, however, it is necessary to filter for only usages of the word 'fuck' and its variations.
```{r fbomb}
tarantino_fuck <- tarantino_swears %>% filter(swear_category == "fuck")
```
At this point, we are able to create a preliminary graph, which details the amount of all fucks given throughout each movie.The following information displays what this amount is, spread out over the time of the movie. We chose to divide the amount of swearing over the course of the movie in this manner because it allows us to examine what parts of each movie include the most swearing.
```{r given, fig.height=8}
ggplot(data = tarantino_fuck, mapping = aes(x = minutes_in)) +
geom_histogram(binwidth = 10, color = "white", fill = "springgreen2") + facet_wrap(~movie) +
theme_fivethirtyeight() +
labs(x = "Minutes In", y = "Fucks Given", title = "Fucks Tarantino Gives Per Movie") +
theme(axis.text.x = element_text(angle = 45, hjust = 1))
```
This can be further broken down to show what _types_ of fucks Tarantino gives.
```{r type_given, fig.height=8}
ggplot(data = tarantino_fuck, mapping = aes(x = minutes_in, fill = word)) +
geom_histogram(binwidth = 10, color = "white") + facet_wrap(~movie) +
theme_fivethirtyeight() +
labs(x = "Minutes In", y = "Fucks Given", title = "Fucks Tarantino Gives Per Movie (by Type)") +
theme(axis.text.x = element_text(angle = 45, hjust = 1))
```
From this information, it is possible to see that in terms of fucks given:
* Tarantino most often uses the variations 'fuck' and 'fucking'
* _Kill Bill: Vol. 1_ had relatively few uses of the word 'fuck' and its variations
* _Reservoir Dogs_ sees a spike in 'fuck'-related swearing at about midway through the movie
The spike in 'fuck'-related swearing about halfway through _Reservoir Dogs_ coincides with rising action of the plot nearing what would be the climax, and as such is an understandable increases. _Pulp Fiction_ also follows this, with the 'fuck'-related swearing more heavily weighted toward the end of the movie, around the time where the plot's climax would be taking place. Overall, the above plots show that 'fuck' has a number of variations which are used with varying frequency.
## Conclusions
With the above data analyzed, we can see that Quentin Tarantino has used a large number of swear words throughout his career, arguably to great effect in some cases. Although the general trend seems to be a decline in the amount of swearing present in his movies over time, his 2012 movie _Django Unchained_ broke free of that trend and also used a staggeringly large number of racially derogatory swear words. Other than this statistical outlier, it's easy to notice that the word 'fuck' and its different variations is by far the most commonly used category of swear words in most of Tarantino's movies. A very versatile word, 'fuck' can be used in a variety of ways, and it would not be much of a stretch, based on this data, to say that Tarantino has capitalized on its different uses throughout his career.
Sociologically, it is interesting to analyze the different patterns of swear word usage over time, particularly with the gender and racially derogatory categories of swear words. For the most part, Tarantino does not use many gender derogatory swear words for all of his movies apart from _Kill Bill: Vol. 1_ and _Kill Bill: Vol. 2_, gender derogatory swear words amounted to less than ten percent of the total swear words used for each movie. The two outliers, however, showed gender derogatory terms being used nearly twenty and sixteen percent of the time, respectively. Even racially derogatory swear words accounted for about ten percent or less of the swear words used in Tarantino's movies, apart from _Django Unchained_. Furthermore, viewing the word 'fuck' and its variations as a very versatile method of swearing is interesting to consider, though it's worth noting that even that amount has declined over the course of Tarantino's career.
At the end of the day, it can be said with a fair amount of confidence that Tarantino did, indeed, give quite a few fucks.