FilmCharacterRelationship

This repository contains the scripts and data for our project.

The report and slides are hosted on my website.

Folder Structure

data/: contains some of the required data
rmn/: contains original RMN model
rmn_w_pa/: contains modified RMN model
rmndata/: intermediately stores data in prep step before it is copied to both models
vmaps/: contains verb maps, power and agency maps

cmdn_mds: contains the cornell dataset. For more information about the dataset please take a look at cdmn_mds/README.txt
cmu_ng: The list that maps characters to their genders
corenlpsw: List of stopwords downloaded from core nlp
imsdb_ssd: Scripts of movies,from the imsdb database,separated into dialogues and scenes
ms_cfap: character->power and character->agency map .

Note: this folder does not contain some data which may be required by other scripts. Please look at the script descriptions for more information.

Shell scripts:

preparedata.sh: Runs extractMetadataForRMN.py and getConversationsSpanUnigrams.py to prepare data for RMN. Copies over to required locations.
train_rmn.sh: Starts training the RMN model with existing parameters in the background.

Python scripts:

extractCharacterVerbs.py: Extracts the verbs used by each character in the dataset via dependancy parsing and coreference resolution using SpaCy. Download 'GoogleNews-vectors-negative300.bin' to run this script.
extractMetadataForRMN.py: Extracted the metadata, span, movie and character embeddings, for the Relationship Modeling Network proposed in the paper "Feuding Families and Former Friends"" written by Mohit Iyyer et al. Download glove embeddings to run this script.
fixCharacterGenders.py: Some of the characters in the cornell dataset,data/cdmn_mds, don't have genders so the script uses CMU name-gender corpus to obtain the genders for as many characters as possible.
getConversationsSpanUnigrams.py: Creates a bag of words embedding for each span in a movie.
util.py: Utility functions uses in another scripts.
positive-negative.py: An attempt to predict whether a relationship is positive or negative. Uses a logistic regression model input as power and agency of a pair of characters.

Name		Name	Last commit message	Last commit date
Latest commit History 99 Commits
data		data
rmn		rmn
rmn_w_pa		rmn_w_pa
rmndata		rmndata
vmaps		vmaps
.gitignore		.gitignore
README.md		README.md
analysis.ipynb		analysis.ipynb
extractCharacterVerbs.py		extractCharacterVerbs.py
extractMetadataForRMN.py		extractMetadataForRMN.py
fixCharacterGenders.py		fixCharacterGenders.py
getConversationSpanUnigrams.py		getConversationSpanUnigrams.py
positive-negative.py		positive-negative.py
powerAgencyFromVerbs.py		powerAgencyFromVerbs.py
preparedata.sh		preparedata.sh
train_rmn.sh		train_rmn.sh
util.py		util.py