The goal of this project is to extract the emotions contained in movies from movie reviews, filtering out plot descriptions, reviewers' emotions about the movie, and comments about the physical media (Blu-ray…) or shipping. This is the foundation for a search engine that would let users select a movie based on the emotions reviewers found in it.
Given a review such as: ‘Great movie! Takes place in an isolated outpost in the galaxy. The hero hates aliens. The Blu-ray contains awesome bonus material.’
The goal was to isolate: ‘the hero hates aliens’. I used the following approach:
- Removed sentences about the physical media or shipping (e.g. 'The Blu-ray contains awesome bonus material') via keyword search.
- Removed descriptive sentences ('takes place in an isolated outpost in the galaxy') by vectorizing the text into a space of 7 emotions and dropping sentences whose emotion score fell below a threshold.
- Removed sentences expressing only the reviewer's feelings about the movie ('great movie!') by modeling how they differ from descriptions of feelings in the plot ('the hero hates aliens').
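The first two filtering steps can be sketched as follows. The support keywords, the emotion lexicon entries, and the threshold value are all illustrative assumptions, not the actual ones used:

```python
import numpy as np

# Hypothetical emotion lexicon mapping words to 7-dim vectors over
# (disgust, surprise, anger, sad, happy, fear, neutral); values are
# illustrative, not taken from the real 23,000-keyword data set.
EMOTION_LEXICON = {
    "hates":   np.array([0, 0, 1, 0, 0, 0, 0], dtype=float),
    "great":   np.array([0, 0, 0, 0, 1, 0, 0], dtype=float),
    "awesome": np.array([0, 0, 0, 0, 1, 0, 0], dtype=float),
}

# Assumed keyword list for media/shipping sentences.
SUPPORT_KEYWORDS = {"blu-ray", "dvd", "shipping", "packaging"}

def is_support_sentence(sentence):
    """Step 1: flag sentences about the physical media or delivery."""
    return any(kw in sentence.lower().split() for kw in SUPPORT_KEYWORDS)

def emotion_vector(sentence):
    """Step 2: sum the emotion vectors of the words in the sentence."""
    vec = np.zeros(7)
    for word in sentence.lower().rstrip(".!").split():
        vec += EMOTION_LEXICON.get(word, np.zeros(7))
    return vec

def keep_sentence(sentence, threshold=0.5):
    if is_support_sentence(sentence):
        return False
    # Ignore the 'neutral' dimension (index 6) when scoring emotional charge.
    return emotion_vector(sentence)[:6].sum() >= threshold

review = [
    "Great movie!",
    "Takes place in an isolated outpost in the galaxy.",
    "The hero hates aliens.",
    "The Blu-ray contains awesome bonus material.",
]
kept = [s for s in review if keep_sentence(s)]
# kept -> ["Great movie!", "The hero hates aliens."]
```

Note that 'Great movie!' survives these two steps: separating reviewer feelings from plot feelings is exactly what the third step's classifier is for.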
In practice, reviewers combine their emotional reactions with emotions found in movies, sometimes in the same sentence. For example:
'a tender and touching drama based on the true story of a troubled african-american's quest to come to terms with his origins.'
In my modeling, I tried to keep sentences that mix movie and reviewer emotions, re-classifying mixed sentences like the one above when needed, since the emotions they express are highly correlated with the emotions in the movie itself.
I used the star ratings associated with the reviews as a way to validate my investigations. I found that negative emotions expressed in reviews correlate negatively with ratings, and positive emotions correlate positively with ratings:
Reviews with strong scared feelings have lower ratings than others:
Reviews with strong happy feelings have higher ratings than others:
It makes sense that reviewers' sentiments correspond to ratings. However, emotions in the movies themselves should not be strongly related to reviewers' ratings, as there are good scary movies out there! (There could still be some correlation between emotions in the plot and reviewer ratings if, for example, making a good scary movie is generally harder than making a happy one.)
The bag-of-words approach above cannot differentiate between emotions in the plot (‘the hero hates aliens’) and the reviewer’s emotions (‘great movie!’).
Since the bag of emotional words approach was not sufficient, I built a classifier to differentiate reviewer feelings from plot sentences.
I also built a sentiment predictor model to test the classifier: assuming that the reviewer's emotions are the main determinant of the star rating, removing all sentences unrelated to the reviewer's emotions should not hurt a sentiment predictor's ability to predict the star rating, whereas removing the sentences expressing the reviewer's emotions should.
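This ablation test can be sketched as follows, assuming a trained `sentiment_model` with a scikit-learn-style `score` method and a `classify_sentence` function returning 'reviewer' or 'plot' (both are stand-ins for the models built in the notebooks):

```python
def ablate(reviews, kind, classify_sentence):
    """Return the reviews with every sentence of the given kind removed."""
    shortened = []
    for review in reviews:
        sentences = review.split(". ")
        kept = [s for s in sentences if classify_sentence(s) != kind]
        shortened.append(". ".join(kept))
    return shortened

def ablation_gap(sentiment_model, reviews, ratings, classify_sentence):
    """If the classifier works, removing reviewer-emotion sentences should
    hurt rating prediction far more than removing plot sentences does."""
    acc_without_reviewer = sentiment_model.score(
        ablate(reviews, "reviewer", classify_sentence), ratings)
    acc_without_plot = sentiment_model.score(
        ablate(reviews, "plot", classify_sentence), ratings)
    # A large positive gap supports the classifier.
    return acc_without_plot - acc_without_reviewer
```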
I computed the accuracy of the sentiment analysis model after the classifier removed either reviewer-feeling or plot sentences. Removing sentences with reviewer feelings (blue bars below) reduced the accuracy of the sentiment predictor much more than removing plot-related sentences did (red bars below):
The height differences between the blue and red bars in the graph above show that the reviewer-emotions vs. plot-emotions classifier does a fairly good job.
- Notebook that creates a balanced data set to train the sentiment analysis model
- Notebook to create shortened reviews with reviewer or plot emotions removed
- Notebook that runs the sentiment analysis model on shortened reviews
- Notebook that displays the shortened reviews testing results
- Implement an LSTM model to better differentiate reviewer emotions & emotions in movies
- Create the emotion-driven movie selection UI
I obtained 4.6 million Amazon movie & TV reviews from J. McAuley of UCSD, collected for the research paper 'Ups and Downs: Modeling the Visual Evolution of Fashion Trends with One-Class Collaborative Filtering' by R. He and J. McAuley, WWW 2016 pdf
I used a dataset of 23,000 keywords labeled with the following 7 emotions: 'disgust', 'surprise', 'anger', 'sad', 'happy', 'fear' and 'neutral'.
Used 5,000 sentences from a plot description site and 5,000 sentences from Rotten Tomatoes as the labeled data. Cleaned up the labels manually by inspecting data misclassified by the models.
Reviewer feelings vs. plot classifier: model tuning notebook
Created it to validate the reviewer/plot classifier.
Isolated a set of 30,000 movie reviews with 5 sentences and balanced +/- sentiment, from the Amazon data set.
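Balancing could be done along these lines; the pandas column names and the star thresholds for positive/negative sentiment are assumptions:

```python
import pandas as pd

def balanced_subset(df, n_per_class, seed=0):
    """Sample equal numbers of positive (4-5 stars) and negative (1-2 stars)
    reviews; 3-star reviews are dropped as ambiguous (an assumption)."""
    pos = df[df["stars"] >= 4].sample(n_per_class, random_state=seed)
    neg = df[df["stars"] <= 2].sample(n_per_class, random_state=seed)
    # Shuffle so positive and negative examples are interleaved.
    return pd.concat([pos, neg]).sample(frac=1, random_state=seed)
```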
Sentiment classifier model tuning notebook
Performed grid searches with logistic regressors, random forests and gradient boosting classifiers (GBC). Found best results with GBC.
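A minimal sketch of such a grid search for the GBC, on synthetic stand-in data; the parameter grid is an illustrative assumption, not the one actually searched:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import GridSearchCV

# Synthetic stand-in for the vectorized review features and +/- labels.
X, y = make_classification(n_samples=200, n_features=20, random_state=0)

grid = GridSearchCV(
    GradientBoostingClassifier(random_state=0),
    param_grid={
        "n_estimators": [50, 100],
        "learning_rate": [0.05, 0.1],
        "max_depth": [2, 3],
    },
    cv=3,
    scoring="accuracy",
)
grid.fit(X, y)
best_gbc = grid.best_estimator_  # refit on all the data with the best params
```

The same `GridSearchCV` call works for the logistic regression and random forest baselines by swapping the estimator and parameter grid.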
A slide presentation of this readme file is available here.
In the project root directory, run: pytest test/unittests.py