Given a movie review, finds the sentence that best summarizes the review. Based on research by Zhuang et al., available here.
Requires the Stanford Parser, which is written in Java. We've included the Python wrapper, but you need to download the Java package separately, as described below.
The NYTimes review for "Pain and Gain" is summarized by the sentence:
It all leaves you pondering whether you have just seen a monumentally stupid movie or a brilliant movie about the nature and consequences of stupidity
And Roger Ebert's review of the estimable Mean Girls is summarized by:
Mean Girls dissects high school society with a lot of observant detail which seems surprisingly well-informed
- git clone git@github.com:gpleiss/ai-final-project.git
- Download the Java code from MIT and copy the 3rdParty directory into this project. (We added it to the .gitignore because it is huge.)
- You'll need to install JPype for Python. We found these instructions helpful for installing JPype on our Macs.
- There's a chance you'll have more debugging to do. It took us about 4 hours to get the Stanford Parser working.
- To see whether things are set up properly, open a Python shell and run:
    >>> from stanford_parser import parser as sp
    >>> parser = sp.Parser()
    Loading parser from serialized file 3rdParty/stanford-parser/stanford-parser-2010-08-20/../englishPCFG.July-2010.ser ... done [0.9 sec].
    >>> print parser.parseToStanfordDependencies("this movie was utterly fantastic")
    sentence='this movie was utterly fantastic'
    det(movie, this)
    nsubj(fantastic, movie)
    cop(fantastic, was)
    advmod(fantastic, utterly)
- If the above code works, you're good to go.
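If you want to poke at the parser output from the sanity check, the dependency lines can be turned into tuples. This is only a rough sketch and not part of the project: `parse_dependencies` and `DEP_PATTERN` are hypothetical names, and the code assumes the textual `relation(head, dependent)` form shown above rather than any particular object returned by the wrapper.

```python
import re

# Matches one dependency in the textual form shown above,
# e.g. "nsubj(fantastic, movie)".
# Assumption: relation(head, dependent), with no commas or
# parentheses inside the words themselves.
DEP_PATTERN = re.compile(r"(\w+)\(([^,()]+),\s*([^,()]+)\)")


def parse_dependencies(text):
    """Return (relation, head, dependent) triples found in `text`."""
    return [
        (rel, head.strip(), dep.strip())
        for rel, head, dep in DEP_PATTERN.findall(text)
    ]


# The output of the sanity check, pasted in as a plain string.
sample = (
    "sentence='this movie was utterly fantastic' "
    "det(movie, this) nsubj(fantastic, movie) "
    "cop(fantastic, was) advmod(fantastic, utterly)"
)
triples = parse_dependencies(sample)

# Words attached directly to the opinion word "fantastic".
modifiers = [dep for rel, head, dep in triples if head == "fantastic"]
```

Here `triples` holds four entries, one per dependency line, and `modifiers` collects the words whose head is "fantastic".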
To summarize each review included in the NLTK movie_reviews corpus:
$ python summarizer.py
To summarize one or more movie reviews not included in the NLTK corpus:
$ python summarizer.py filename1.txt filename2.txt ...
(We include 2 extra movie reviews, review_painandgain.txt and review_meangirls.txt.)
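If you keep a folder of your own reviews, a small helper can assemble the command line above for every .txt file in it. This is a sketch under assumptions: `build_summarizer_command` is a hypothetical helper that is not part of this project, and it assumes `python` on your PATH is the interpreter you set up for the parser.

```python
import glob
import os


def build_summarizer_command(review_dir, script="summarizer.py"):
    """Build the argument list for summarizing every .txt file in review_dir.

    Mirrors the manual invocation:
        python summarizer.py filename1.txt filename2.txt ...
    """
    # Sort so the reviews are processed in a predictable order.
    files = sorted(glob.glob(os.path.join(review_dir, "*.txt")))
    return ["python", script] + files
```

The returned list can be handed to `subprocess.call` from the project directory. Check that it contains at least one review file first, since an invocation with no filenames summarizes the whole NLTK corpus instead.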