<h1><center>FEMALE REPRESENTATION IN MOVIES</center></h1>

![title](./movie_banners.png)

## Introduction

### Motivation

Since movies have become so instrumental in influencing viewers’ opinions, it is crucial to ensure that the messages they send are the sort we want to see in our society. As we read more about movies and the social issues related to them, we grew increasingly interested in gender bias and how it is monitored and measured in the film industry.

We found some [shocking statistics](https://www.huffingtonpost.com/soraya-chemaly/20-mustknow-facts-about-g_b_5869564.html). In 2014, globally:

* There were __2.24 male characters__ for every __1 female character__.
* __23.3%__ of films surveyed had a female lead or co-lead.
* Females comprised __7% of directors__, __19.7% of writers__, and __22.7% of producers__.

### The Bechdel Test
Further research led us to the Bechdel Test, a metric originally proposed in 1985 by Alison Bechdel that is used to "grade" movies on their representation of women.

To pass the test, a movie needs to meet all of the following criteria:
1. Have at least two named women in it
2. Who talk to each other
3. About something besides a man

A film is given a score from 0 to 3 based on how many of the Bechdel Test criteria it satisfies, checked in order. For example, a film with no female characters gets a score of 0, a film with two named women who do not speak to each other gets a 1, a film with at least two named women who talk to each other only about a man gets a 2, and a film that meets all three criteria gets a 3 and passes the test.
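The scoring rule above can be sketched in a few lines. This is only an illustration of the rule as described here: the function names and arguments are hypothetical, and BechdelTest.com scores are assigned by human reviewers, not by code.

```python
def bechdel_score(named_women: int, women_talk: bool, about_non_man: bool) -> int:
    """Return how many Bechdel criteria a film meets, checked in order (0 to 3)."""
    if named_women < 2:
        return 0  # criterion 1 fails: fewer than two named women
    if not women_talk:
        return 1  # criterion 2 fails: the women never talk to each other
    if not about_non_man:
        return 2  # criterion 3 fails: they only talk about a man
    return 3      # all three criteria are met


def passes_bechdel(score: int) -> bool:
    """A film passes only when it meets all three criteria."""
    return score == 3


# A film with a single named woman, like the Gravity example discussed below
print(bechdel_score(named_women=1, women_talk=False, about_non_man=False))
# A film with two named women who discuss something other than a man
print(passes_bechdel(bechdel_score(named_women=2, women_talk=True, about_non_man=True)))
```

Because the criteria are cumulative, a later criterion is only checked once the earlier ones hold, which is why the score can be read as "how far the film got".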

The test itself has been questioned as a metric for gender bias: is it accurate, sufficient, and indicative of equal representation of women in film?

For example, we found it interesting that the movie Gravity fails the test, even though its protagonist, played by Sandra Bullock, has an extremely well-developed plotline and background. There are no other women in the movie, so it cannot satisfy even the first criterion. On the other hand, the blockbuster Titanic is a movie about the journey of Kate Winslet's character and her growing into herself, but the movie passes because of conversations in which her mother and her friends gossip about another female character. We felt that this movie passed for the _wrong_ reasons, because the scene for which it passed the test involved demeaning a woman.

The __two key questions__ we wanted to answer for our project were: 
1. What makes a movie likely to pass or fail the Bechdel test? 
2. Is the Bechdel test a good metric for evaluating female representation?

In [1]:
import csv
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

## Dataset
###  Data Collection

We found a [website](https://bechdeltest.com/) with a crowd-sourced list of movies and their Bechdel test scores. We used their API to create a CSV file of ~7500 movies and their Bechdel test scores. The API also provided each movie's IMDB ID, the unique identifier for each movie on [IMDB.com](http://www.imdb.com). This was useful for combining this data with other information about the movies.

Since IMDB does not provide a public API, we used an API provided by [The Movie Database](https://www.themoviedb.org/?language=en) (TMDB) to get details about each movie's production and success. We used the API to connect a movie's IMDB ID to its TMDB ID, and then sent requests to endpoints to get each movie's details, credits, and similar movies that TMDB would recommend to someone who liked it. We then went through each movie's first ten cast members, directors, and writers and sent requests to obtain their details.
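As a rough sketch of the ID-linking step, the snippet below builds a lookup URL from an IMDB ID. It assumes TMDB's v3 `find` endpoint with `external_source=imdb_id`, and it assumes IMDB IDs are stored as bare numbers (as in the `IMDB_ID` column of our CSV) that need the `tt` prefix and zero-padding restored. The helper names and the `YOUR_API_KEY` placeholder are hypothetical; the requests we actually sent are in the linked `get_movies.py` and may differ.

```python
from urllib.parse import urlencode

TMDB_API_BASE = "https://api.themoviedb.org/3"  # assumption: TMDB API v3 base URL


def imdb_numeric_to_tt(imdb_id) -> str:
    """Turn a numeric ID such as 120338 into IMDB's 'tt0120338' form."""
    return "tt" + str(int(imdb_id)).zfill(7)


def tmdb_find_url(imdb_id, api_key: str) -> str:
    """Build the URL that asks TMDB which of its entries matches an IMDB ID."""
    query = urlencode({"api_key": api_key, "external_source": "imdb_id"})
    return f"{TMDB_API_BASE}/find/{imdb_numeric_to_tt(imdb_id)}?{query}"


# 120338 is Titanic's IMDB_ID in our dataset; no request is sent here
print(tmdb_find_url(120338, "YOUR_API_KEY"))
```

Sending a GET request to that URL with a valid key should return the matching TMDB entry, whose ID can then be used for the details, credits, and recommendations requests.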

Our dataset about movies is hosted [here](https://github.com/shelly/pds-bechdel-test/blob/master/movies.csv), generated by our code available [here](https://github.com/shelly/pds-bechdel-test/blob/master/get_movies.py).

Our dataset about people is hosted [here](https://github.com/shelly/pds-bechdel-test/blob/master/people.csv), generated by our code available [here](https://github.com/shelly/pds-bechdel-test/blob/master/get_people.py).   

In [2]:
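# Load the movie data
import ast

movie_data = pd.read_csv("https://raw.githubusercontent.com/shelly/pds-bechdel-test/master/movies.csv")
# The Crew column is stored as a string; parse it into a list of dicts.
# ast.literal_eval only accepts Python literals, so it is safer than eval on downloaded data.
movie_data["Crew"] = movie_data["Crew"].apply(ast.literal_eval)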
movie_data.set_index("TMDB_ID").loc[[268896, 284054, 49047, 597, 15121]]

| TMDB_ID | Unnamed: 0 | Title | IMDB_ID | Year | Bechdel_Rating | Budget | Overview | Popularity | Revenue | Genres | Cast | Crew | Recommendations |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 268896 | 7652 | Pacific Rim: Uprising | 2557478 | 2018 | 3 | 150000000 | It has been ten years since The Battle of the ... | 39.495549 | 286536960 | `[{'id': 28, 'name': 'Action'}, {'id': 14, 'nam...` | `[{'id': 236695, 'order': 0, 'character': 'Jake...` | `[{'id': 10828, 'department': 'Production', 'jo...` | `[333339, 338970, 299536, 427641, 284054]` |
| 284054 | 7641 | Black Panther | 1825683 | 2018 | 3 | 200000000 | King T'Challa returns home from America to the... | 361.506277 | 1325776812 | `[{'id': 28, 'name': 'Action'}, {'id': 12, 'nam...` | `[{'id': 172069, 'order': 0, 'character': "T'Ch...` | `[{'id': 1376891, 'department': 'Art', 'job': '...` | `[299536, 284053, 333339, 141052, 181808]` |
| 49047 | 6146 | Gravity | 1454468 | 2013 | 0 | 105000000 | Dr. Ryan Stone, a brilliant medical engineer o... | 20.679122 | 716392705 | `[{'id': 878, 'name': 'Science Fiction'}, {'id'...` | `[{'id': 18277, 'order': 0, 'character': 'Dr. R...` | `[{'id': 11218, 'department': 'Directing', 'job...` | `[68724, 109424, 137113, 17654, 9693]` |
| 597 | 2694 | Titanic | 120338 | 1997 | 3 | 200000000 | 84 years later, a 101-year-old woman named Ros... | 22.530842 | 1845034188 | `[{'id': 18, 'name': 'Drama'}, {'id': 10749, 'n...` | `[{'id': 204, 'order': 0, 'character': 'Rose De...` | `[{'id': 2710, 'department': 'Directing', 'job'...` | `[8587, 425, 808, 12, 607]` |
| 15121 | 976 | The Sound of Music | 59742 | 1965 | 3 | 8200000 | Film adaptation of a classic Rodgers and Hamme... | 9.550476 | 286214286 | `[{'id': 18, 'name': 'Drama'}, {'id': 10751, 'n...` | `[{'id': 5823, 'order': 0, 'character': 'Maria'...` | `[{'id': 1744, 'department': 'Directing', 'job'...` | `[433, 11113, 630, 872, 11708]` |


In [3]:
# Load the cast and crew data 
people_data = pd.read_csv("https://raw.githubusercontent.com/shelly/pds-bechdel-test/master/people.csv").set_index('TMDB_ID')
people_data.loc[[54697, 10990, 488, 6884, 1932]]

| TMDB_ID | Unnamed: 0 | Name | Birthday | Deathday | Gender | Place of Birth | Popularity |
|---|---|---|---|---|---|---|---|
| 54697 | 54697 | Dave Franco | 1985-06-12 | NaN | 2.0 | Palo Alto, California, USA | 3.095248 |
| 10990 | 10990 | Emma Watson | 1990-04-15 | NaN | 1.0 | Paris, France | 11.092018 |
| 488 | 488 | Steven Spielberg | 1946-12-18 | NaN | 2.0 | Cincinnati - Ohio - USA | 8.235147 |
| 6884 | 6884 | Patty Jenkins | 1971-07-24 | NaN | 1.0 | Victorville - California - USA | 0.215638 |
| 1932 | 1932 | Audrey Hepburn | 1929-05-04 | 1993-01-20 | 1.0 | Ixelles, Belgium | 3.081924 |


### Exploratory Data Analysis

Since our dataset is determined by what scores were submitted to [BechdelTest.com](https://bechdeltest.com/), our next step was to inspect our data and understand the composition of the movies we were working with. 

For example, we looked at the percentage of movies from each decade in our dataset that received each score on the Bechdel test:

<img src="bechdel_test_over_year.png" width="50%" height="50%">

This shows that our dataset has a trend similar to the real-world trend of a larger percentage of movies passing the Bechdel test with each decade. 
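As a minimal sketch of how such a per-decade breakdown could be computed (not necessarily how the plot above was produced), the helper below groups `(year, rating)` pairs by decade and converts counts to percentages. The function name is hypothetical, and the toy rows are the five sample movies shown earlier rather than the full dataset.

```python
from collections import Counter, defaultdict


def score_share_by_decade(movies):
    """movies: iterable of (year, bechdel_rating) pairs.
    Returns {decade: {rating: percent of that decade's movies with that rating}}."""
    counts = defaultdict(Counter)
    for year, rating in movies:
        counts[(year // 10) * 10][rating] += 1  # e.g. 1997 -> 1990
    shares = {}
    for decade, ratings in sorted(counts.items()):
        total = sum(ratings.values())
        shares[decade] = {r: 100 * n / total for r, n in sorted(ratings.items())}
    return shares


# Toy input: (Year, Bechdel_Rating) of the five sample movies shown earlier
sample = [(2018, 3), (2018, 3), (2013, 0), (1997, 3), (1965, 3)]
for decade, share in score_share_by_decade(sample).items():
    print(decade, share)
```

On the real data, the same function could be called with `zip(movie_data["Year"], movie_data["Bechdel_Rating"])`. With pandas, `pd.crosstab` with `normalize="index"` gives an equivalent table of row-wise shares.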

We also looked at what percentage of movies from each genre received each score on the test:

<img src="bechdel_test_over_genre.png" width="50%" height="50%">

Again, this seemed to follow real-world trends. We determined that while our dataset is biased towards blockbusters and popular movies, there seems to be a fairly good distribution of movies from different time periods and of different types. However, no initial feature in our dataset was a strongly correlated predictor of whether or not a movie passed the Bechdel Test.

## Data Analysis

### Feature Engineering
We realized that training a model on the rudimentary features we had collated (such as year of release or names of directors) would not produce a strong model: it would be unable to correctly and consistently predict whether or not a movie passed the Bechdel Test.

Based on our EDA, we created new, improved features that we thought would better correlate with whether or not movies passed the Bechdel test. The file containing all the functions we wrote to create these features is available [here](https://github.com/shelly/pds-bechdel-test/blob/master/feature_engineering.py).

The new features we created included a categorical variable indicating whether or not the first-billed actor was female, the fraction of a movie's writers who were female, and the average age of its directors.

The example below shows how we created a new feature to find the fraction of women in the writing crew.


In [4]:
# Returns fraction of females in writing crew

def get_female_writing_score():
    crews = movie_data['Crew'] # extract the relevant column from the dataset
    scores = np.zeros(crews.shape)
    ind = 0
    for crew in crews:
        if (len(crew) == 0):
            scores[ind] = float('nan') # account for cases of missing data
        else:
            writers, fem_writers, no_gend = 0, 0, 0
            for mem in crew:
                if (mem['department'] == 'Writing'): # Only considering writers
                    person_id = int(mem['id']) 
                    if (person_id in people_data.index): # Ensure we have data on this individual
                        person_info = people_data.loc[person_id]
                        person_gender = person_info['Gender']
                        if (not person_gender): no_gend += 1 # Unknown gender
                        if (person_gender): writers += 1 # Found a writer
                        if (person_gender == 1): fem_writers += 1 # Found a female writer
            if (writers):
                scores[ind] = float('nan') if (no_gend == len(crew)) else fem_writers/writers
        ind += 1
    return scores

print(get_female_writing_score())

[0.  0.  0.  ... 0.  0.  0.4]
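Other features from the list above follow the same pattern. As a hedged sketch (our actual implementation lives in the linked `feature_engineering.py` and may differ), the function below computes the "first-billed actor is female" feature for a single movie. It assumes the cast has already been parsed into a list of dicts with `id` and `order` keys, and that `gender_by_id` is a hypothetical lookup built from the people table, where 1 means female and 0 or a missing value means unknown.

```python
import math

FEMALE = 1  # gender code used for women in the people table


def first_billed_is_female(cast, gender_by_id):
    """Return 1.0 if the first-billed actor is female, 0.0 if not,
    and NaN when the cast is empty or the actor's gender is unknown."""
    if not cast:
        return math.nan
    lead = min(cast, key=lambda member: member["order"])  # lowest billing order
    gender = gender_by_id.get(int(lead["id"]))
    if gender is None or math.isnan(gender) or gender == 0:
        return math.nan
    return 1.0 if gender == FEMALE else 0.0


# Toy example using two person IDs from the people table shown earlier;
# the cast list itself is made up for illustration
genders = {10990: 1.0, 54697: 2.0}
toy_cast = [
    {"id": 54697, "order": 1, "character": "Supporting (placeholder)"},
    {"id": 10990, "order": 0, "character": "Lead (placeholder)"},
]
print(first_billed_is_female(toy_cast, genders))
```

Returning NaN for unknown cases keeps "no data" distinct from "the lead is not female", which matters when the feature is later fed to a model.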


### Modeling

Our aim was to train a model that would accurately predict whether or not a movie would pass the Bechdel Test. We created models using the following algorithms:

* Support Vector Machines
  * RBF Kernel
  * Linear Kernel
* Decision Trees
  * Max-Depth 3
  * Max-Depth 4
* Gaussian Mixture Models
* Naive Bayes
  * Gaussian Naive Bayes
  * Multinomial Naive Bayes
  
In our dataset, 58% of movies passed the Bechdel Test, so a model that always predicts "pass" would be right 58% of the time; we treated this as our baseline. Our best model, an SVM, gave us a testing accuracy of 71%.
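To make the baseline concrete, here is a small sketch of a majority-class baseline: the accuracy obtained by always predicting the most common label. The function name is hypothetical, and the toy labels only mimic the 58% pass rate; they are not our data.

```python
def majority_baseline_accuracy(labels):
    """Accuracy of a classifier that always predicts the most common label."""
    labels = list(labels)
    passes = sum(1 for passed in labels if passed)
    return max(passes, len(labels) - passes) / len(labels)


# Toy labels with the same 58% pass rate as our dataset (not the real data)
toy_labels = [True] * 58 + [False] * 42
print(majority_baseline_accuracy(toy_labels))  # 0.58
```

Measured against this baseline, a testing accuracy of 71% is an improvement of 13 percentage points.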

We also looked at predicting whether a movie would pass the Bechdel Test given the scores of movies that are similar to it. We tried three different measures of similarity: two movies were similar if TMDB recommended one of them to people who liked the other, if they shared a director, or if they shared an actor. 

For example, here is a visual showing the relationship between movies in our dataset from 2018:
<img src="network.png" width="60%" height="60%">

It is very evident from this network graph that there are distinct clusters of movies.

For each movie, we averaged the Bechdel scores of all movies similar to it, and observed that:

P(fail | average Bechdel score of recommended movies is under 1): __0.71__  
P(pass | average Bechdel score of recommended movies is over 2): __0.67__  
P(fail | average Bechdel score of cast is under 1): __0.75__  
P(pass | average Bechdel score of cast is over 2): __0.64__  
P(fail | average Bechdel score of directors is under 1): __0.72__  
P(pass | average Bechdel score of directors is over 2): __0.62__  

This shows that similar movies behave similarly on the Bechdel Test - so a movie is more likely to pass if other movies similar to it also pass, and vice versa.

In [16]:
from collections import defaultdict

def generate_directors_to_movies_map():
    dirs_to_movies = defaultdict(list)
    for movie_tmdb_id in movie_data.index:
        crew = movie_data.loc[movie_tmdb_id]['Crew']
        for mem in crew:
            if (mem['department'] == 'Directing'):
                person_id = int(mem['id'])
                dirs_to_movies[person_id].append(movie_tmdb_id)
    return dirs_to_movies

def ave_bechdel_dir_score():
    scores = np.zeros(movie_data.shape[0])
    dirs_to_movies = generate_directors_to_movies_map()
    ind = 0
    for movie_tmdb_id in movie_data.index:
        crew = movie_data.loc[movie_tmdb_id]['Crew']
        dir_score = 0
        num_dirs = 0
        for mem in crew:
            if (mem['department'] == 'Directing'):
                person_id = int(mem['id'])
                person_movies = dirs_to_movies[person_id]
                if len(person_movies) > 1:
                    dir_score += (sum(map(lambda movie: movie_data.loc[movie]['Bechdel_Rating'], 
                        person_movies)) - movie_data.loc[movie_tmdb_id]['Bechdel_Rating'])/(len(person_movies) - 1)
                    num_dirs += 1
        scores[ind] = (dir_score / num_dirs) if (num_dirs > 0) else np.nan
        ind += 1
    return scores 

def avg_bechdel_dir_score_cond_prob(): 
    dir_avg = ave_bechdel_dir_score()
    bechdel_rating = movie_data['Bechdel_Rating']

    dir_under_thresh = dir_avg < 1
    dir_over_thresh = dir_avg >= 2

    bechdel_pass = bechdel_rating == 3
    bechdel_fail = bechdel_rating < 3
    print("P(fail | average score of directors is under 1):", 
        sum(np.multiply(dir_under_thresh, bechdel_fail)) / sum(dir_under_thresh))
    print("P(pass | average score of directors is over 2)", 
        sum(np.multiply(dir_over_thresh, bechdel_pass)) / sum(dir_over_thresh))
    
avg_bechdel_dir_score_cond_prob()

P(fail | average score of directors is under 1): 0.7238805970149254
P(pass | average score of directors is over 2) 0.6211565585331452
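The recommendation-based numbers can be estimated in the same way. The sketch below is an assumed analogue of the director code above, not the code we ran: it averages the ratings of each movie's recommended movies (skipping any that are not in the dataset) and then estimates the two conditional probabilities, using the same `< 1` and `>= 2` thresholds. All names and the toy data are hypothetical.

```python
import math


def avg_recommended_score(recommended_ids, rating_by_id):
    """Mean Bechdel rating of the recommended movies we have ratings for, else NaN."""
    known = [rating_by_id[m] for m in recommended_ids if m in rating_by_id]
    return sum(known) / len(known) if known else math.nan


def conditional_probability(condition, outcome):
    """Estimate P(outcome | condition) from two parallel lists of booleans."""
    selected = [o for c, o in zip(condition, outcome) if c]
    return sum(selected) / len(selected) if selected else math.nan


# Toy data: movie ID -> Bechdel rating, and movie ID -> recommended movie IDs
ratings = {1: 3, 2: 3, 3: 0, 4: 3, 5: 0}
recommendations = {1: [2], 2: [1, 4], 3: [4, 5], 4: [3, 5], 5: [3, 99]}

movie_ids = sorted(ratings)
averages = [avg_recommended_score(recommendations[m], ratings) for m in movie_ids]
failed = [ratings[m] < 3 for m in movie_ids]
passed = [ratings[m] == 3 for m in movie_ids]

# Comparisons with NaN are False, so movies without rated recommendations drop out
print("P(fail | avg < 1):", conditional_probability([a < 1 for a in averages], failed))
print("P(pass | avg >= 2):", conditional_probability([a >= 2 for a in averages], passed))
```

On the real data, `ratings` would map each movie's TMDB ID to its `Bechdel_Rating` and `recommendations` would come from the parsed `Recommendations` column.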




## Conclusion

### Results
We were able to train a relatively successful SVM model with our engineered features. SVMs look for a maximum-margin boundary between classes (a hyperplane, either in the original feature space or in one induced by a kernel), so our model's improvement over the baseline suggests that some combination of our features helps separate movies that pass the Bechdel Test from movies that do not.

Our findings also indicate that clusters of movies tend to pass or fail the test together. For example, movies that have at least one female writer are likely to pass, and Westerns are likely to fail.

Overall, we found multiple features that correlate with whether or not a movie will pass the Bechdel test, including: having at least one female writer, having a female director, being more recent, having a female lead, and belonging to certain genres. Our work supports the hypothesis that the Bechdel test is a good baseline metric for female representation. Many of our features were intentionally engineered to reflect real-world approaches to improving female representation, such as hiring female writers or having a larger percentage of female cast and crew. However, as mentioned before, the fact that Gravity does not pass but Titanic does raises doubt in our minds as to whether the test always captures the spirit of female representation in movies.

### Future Work and Extensions

We suggest evaluating other methods of measuring female representation in a similar manner to our analysis, to find metrics that improve upon the Bechdel Test. One such method is the [Mako Mori Test](http://geekfeminism.wikia.com/wiki/Mako_Mori_test). In a [study by FiveThirtyEight](https://projects.fivethirtyeight.com/next-bechdel/), numerous women currently working as writers, directors, and actors were asked to reflect on the Bechdel Test and suggest improvements and alternatives, all of which also seem worth formally evaluating.

A logical next step in further investigating gender bias in movies would be to analyze movie scripts or plot overviews, to automatically assess how well-rounded female characters are based on their occupations, actions, and dialogue, such as in this [paper](http://proceedings.mlr.press/v81/madaan18a/madaan18a.pdf).