# Aspect Based Sentiment Analysis on Movie Reviews

Hi, guys. I am trying to reproduce the technique described in [1]. The process involves no machine learning and consists of five steps: tokenizing reviews into sentences, POS tagging, identifying aspects, locating the opinion words for each aspect, and calculating opinion polarity.


## Tokenize into Sentences

In [2]:
import csv
import re
import sys
from nltk.tokenize import sent_tokenize

# choose positive or negative reviews; defaults to "pos" unless a valid argument is given
review_type = "pos"
if len(sys.argv) > 1 and sys.argv[1] in ["pos", "neg"]:
    review_type = sys.argv[1]

input_files = ['../dataset/review_test_' + review_type + '.csv']
items = []

with open("fullsen_" + review_type + ".csv", 'w', newline='') as out_file:
    csvwriter = csv.writer(out_file, delimiter=",", quotechar='"', quoting=csv.QUOTE_ALL)
    for input_file in input_files:
        with open(input_file, newline='') as in_file:
            for row in csv.reader(in_file):
                reviewId = row[0]
                # replace HTML line breaks with spaces
                reviewText = re.sub('<br />', ' ', row[1])
                movieId = row[3]
                try:
                    sentences = sent_tokenize(reviewText)
                except Exception:
                    # skip reviews that cannot be tokenized
                    continue
                # one output row per sentence: movieId, reviewId, sentence index, text
                for idx, sentence in enumerate(sentences):
                    items.append([movieId, reviewId, idx, sentence])
                    csvwriter.writerow([movieId, reviewId, idx, sentence])
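
To make the output format concrete, here is a minimal sketch of the same record layout using only the standard library. The regex splitter `naive_split` is a hypothetical stand-in for `sent_tokenize` (it is much cruder than NLTK's tokenizer), and the sample row, including its unused third column, is made up for illustration.

```python
import csv
import io
import re

def naive_split(text):
    """Hypothetical stand-in for nltk's sent_tokenize: split after '.', '!' or '?'."""
    parts = re.split(r'(?<=[.!?])\s+', text.strip())
    return [p for p in parts if p]

def review_to_rows(row):
    """Turn one review row into [movieId, reviewId, idx, sentence] rows."""
    review_id, raw_text, movie_id = row[0], row[1], row[3]
    text = re.sub('<br />', ' ', raw_text)  # replace HTML line breaks with spaces
    return [[movie_id, review_id, idx, s] for idx, s in enumerate(naive_split(text))]

# Made-up sample row: reviewId, reviewText, <unused column>, movieId
sample_row = ["r1", "Great cast.<br /><br />The plot drags a bit!", "", "m42"]
rows = review_to_rows(sample_row)

# Write to an in-memory buffer with the same quoting settings as the cell above
buffer = io.StringIO()
csv.writer(buffer, delimiter=",", quotechar='"', quoting=csv.QUOTE_ALL).writerows(rows)
csv_text = buffer.getvalue()
print(csv_text)
```

Because of `csv.QUOTE_ALL`, every field is quoted in the output, including the integer sentence index, so it comes back as a string when the file is read again.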

## POS Tagging

In [8]:
# read all sentences
review_type = "pos"
input_files = ['../python-port/fullsen_' + review_type + '.csv']
sentences = []

for input_file in input_files:
    with open(input_file, newline='') as in_file:
        for row in csv.reader(in_file):
            # each row: movieId, reviewId, sentence index, sentence text
            sentence = {"movieId": row[0], "reviewId": row[1], "sentenceId": row[2], "text": row[3]}
            sentences.append(sentence)

In [14]:
# apply POS tagging to every sentence
import nltk
from nltk.tokenize import word_tokenize

for sentence in sentences:
    words = word_tokenize(sentence["text"])
    # list of (token, Penn Treebank tag) tuples
    sentence["pos_tag"] = nltk.pos_tag(words)
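
`nltk.pos_tag` returns a list of `(token, tag)` tuples with Penn Treebank tags. As a sketch of how that output can be consumed in the later steps, the helper below (`content_words`, a hypothetical name) keeps only nouns and verbs. The tagged sentence is hand-written for illustration and is not actual tagger output.

```python
def content_words(tagged_tokens):
    """Keep tokens whose Penn Treebank tag marks a noun (NN*) or a verb (VB*)."""
    return [(tok, tag) for tok, tag in tagged_tokens if tag.startswith(("NN", "VB"))]

# Hand-written example in the (token, tag) format that nltk.pos_tag produces
tagged = [("The", "DT"), ("actors", "NNS"), ("played", "VBD"),
          ("their", "PRP$"), ("roles", "NNS"), ("well", "RB"), (".", ".")]

selected = content_words(tagged)
print(selected)
```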

## Identify Aspects

I used the term list in [2] (see Table 2) and the movie metadata (movie title, actor names, character names, director names) to identify aspects. The feature terms are expanded using WordNet similarity (threshold= ... ): whenever we find an NP (noun phrase) in a sentence, we map it to its WordNet sense using WSD (word-sense disambiguation) and calculate its average similarity to the terms already in the list.
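
The expansion rule can be sketched roughly as follows. This is only an illustration under several assumptions: the function names are hypothetical, the similarity measure is passed in as a callable (in practice it could wrap a WordNet measure such as NLTK's `Synset.path_similarity`), the scores in the toy table are made up, and the thresholds 0.5 and 0.9 are arbitrary demo values, since the actual threshold is not fixed above.

```python
def average_similarity(candidate, known_terms, similarity):
    """Mean similarity between a candidate and the known terms of one category."""
    scores = [similarity(candidate, term) for term in known_terms]
    scores = [s for s in scores if s is not None]  # some measures return None
    return sum(scores) / len(scores) if scores else 0.0

def best_category(candidate, categories, similarity, threshold):
    """Return the category with the highest average similarity, or None if below threshold."""
    averages = {name: average_similarity(candidate, terms, similarity)
                for name, terms in categories.items()}
    name = max(averages, key=averages.get)
    return name if averages[name] >= threshold else None

# Toy similarity table (made-up numbers) standing in for a WordNet measure
toy_scores = {("star", "actor"): 0.8, ("star", "actress"): 0.6,
              ("star", "movie"): 0.2, ("star", "film"): 0.2}

def toy_similarity(a, b):
    return toy_scores.get((a, b))

categories = {"cast": ["actor", "actress"], "overall": ["movie", "film"]}
loose = best_category("star", categories, toy_similarity, threshold=0.5)
strict = best_category("star", categories, toy_similarity, threshold=0.9)
print(loose, strict)
```

With the toy numbers, "star" averages about 0.7 against the cast terms, so it is accepted at the looser threshold and rejected at the stricter one.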

Aspect Categories:
* Overall
* Cast
* Director
* Story
* Scene / Cinematography
* Music

First we build an aspect category dictionary consisting of related word senses. The terms are taken from Table 2 of [2], and the appropriate word sense for each was checked manually via [this tool](http://wordnetweb.princeton.edu/perl/webwn). For now we start with only the 'overall' and 'cast' aspect categories.

In [None]:
from nltk.corpus import wordnet as wn

# each category maps to a list of WordNet synsets
# synset names must use the base lemma (e.g. 'perform', 'play') and a valid part of speech
aspect_terms = {
    "overall": [wn.synset('movie.n.01'), wn.synset('film.n.01')],
    "cast": [
        wn.synset('act.v.10'), wn.synset('act.n.04'), wn.synset('acting.n.01'),
        wn.synset('actress.n.01'), wn.synset('actor.n.01'), wn.synset('role.n.02'),
        wn.synset('portray.v.03'), wn.synset('character.n.04'), wn.synset('villain.n.02'),
        wn.synset('performance.n.02'), wn.synset('perform.v.03'), wn.synset('play.v.25'),
        wn.synset('casting.n.04'), wn.synset('cast.n.01'), wn.synset('cast.v.03'),
    ],
}

Next, for each sentence, we extract the verbs and nouns, apply word-sense disambiguation to determine each word's sense, and check whether that sense exists in `aspect_terms`. Stop words and punctuation are skipped.

In [None]:
import nltk
from pywsd.lesk import simple_lesk
from pywsd.utils import penn2morphy

# load the stop word list once instead of on every call
STOPWORDS = set(nltk.corpus.stopwords.words('english'))

def is_stopword(string):
    return string.lower() in STOPWORDS

def is_punctuation(string):
    # True when the token contains no letters or digits
    return not any(char.isalpha() or char.isdigit() for char in string)


for sentence in sentences:
    for token, pos in sentence["pos_tag"]:
        if not is_stopword(token) and not is_punctuation(token):
            if pos.startswith('NN') or pos.startswith('VB'):  # noun or verb
                # disambiguate the token in the context of its sentence
                word_sense = simple_lesk(sentence["text"], token, pos=penn2morphy(pos))
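
The remaining step, checking whether a disambiguated sense belongs to an aspect category, could look something like the sketch below. To keep it runnable without WordNet data it uses synset names as plain strings and only a small subset of the dictionary; `find_aspects` and the sample WSD output are hypothetical. With NLTK one would compare `Synset` objects instead, keeping in mind that several lemma-based identifiers can resolve to the same synset.

```python
# Illustrative subset of the aspect dictionary, with synset names as plain strings
aspect_terms = {
    "overall": ["movie.n.01"],
    "cast": ["actor.n.01", "actress.n.01", "cast.n.01"],
}

# Invert once so that each lookup is a single dictionary access
sense_to_category = {sense: category
                     for category, senses in aspect_terms.items()
                     for sense in senses}

def find_aspects(disambiguated):
    """Map (token, synset_name) pairs to (token, category) for known aspect senses."""
    return [(token, sense_to_category[sense])
            for token, sense in disambiguated
            if sense in sense_to_category]

# Hypothetical WSD output for "The actor carried the film"
disambiguated = [("actor", "actor.n.01"), ("carried", "carry.v.01"), ("film", "movie.n.01")]
aspects = find_aspects(disambiguated)
print(aspects)
```

Inverting the dictionary up front avoids scanning every category list for each token, which matters once the loop runs over all review sentences.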
            

## Locating Opinion Word

## Calculating Opinion Polarity

## Evaluation

## References
[1]: [Piryani, R., Gupta, V., Singh, V. K., & Ghose, U. (2017). A Linguistic Rule-Based Approach for Aspect-Level Sentiment Analysis of Movie Reviews. In Advances in Computer and Computational Sciences (pp. 201-209). Springer, Singapore.](https://link.springer.com/chapter/10.1007/978-981-10-3770-2_19)
[2]: [Thet, T. T., Na, J. C., & Khoo, C. S. (2010). Aspect-based sentiment analysis of movie reviews on discussion boards. Journal of Information Science, 36(6), 823-848.](https://dl.acm.org/citation.cfm?id=1899344)