<div style="font-weight: bold; color:#5D8AA8" align="center">
    <div style="font-size: xx-large">Procesamiento del Lenguaje Natural 2021-22</div><br>
    <div style="font-size: x-large; color:gray">Aspect opinion extraction</div><br>
    <div style="font-size: large">María Barroso - Gloria del Valle</div><br></div><hr>
</div>

In [1]:
import json
from nltk.corpus import wordnet as wn
import numpy as np
import pandas as pd

## Assignment 1: Review datasets

**yelp_hotels.json**: json containing 5,034 reviews generated by 4,148 Yelp users about 284 hotels.

**yelp_beauty_spas.json** and **yelp_restaurants.json**: which contain Yelp reviews about beauty/spa resorts and restaurants.

Each review (JSON record) has the following fields:
* *reviewerID*: the identifier of the user who wrote the review
* *asin*: the identifier of the reviewed hotel
* *reviewText*: the text of the user’s review about the hotel
* *overall*: the 1-5 Likert scale rating assigned by the user to the hotel

### Task 1.1
Loading all the hotel reviews from the Yelp hotel reviews file.

In [2]:
def load_all_json_yelp(data_name, data_path = 'yelp_dataset'):
    with open(f'{data_path}/{data_name}.json', encoding='utf-8') as f:
        reviews = json.load(f)
    numReviews = len(reviews)
    print(f'{data_name}: {numReviews} reviews loaded')
    return reviews

reviews_hotels = load_all_json_yelp('yelp_hotels')

yelp_hotels: 5034 reviews loaded


In [3]:
print(reviews_hotels[0])
print(reviews_hotels[0].get('reviewerID'))

{'reviewerID': 'qLCpuCWCyPb4G2vN-WZz-Q', 'asin': '8ZwO9VuLDWJOXmtAdc7LXQ', 'summary': 'summary', 'reviewText': "Great hotel in Central Phoenix for a stay-cation, but not necessarily a place to stay out of town and without a car. Not much around the area, and unless you're familiar with downtown, I would rather have a guest stay in Old Town Scottsdale, etc. BUT if you do stay here, it's awesome. Great boutique rooms. Awesome pool that's happening in the summer. A GREAT rooftop patio bar, and a very very busy lobby with Gallo Blanco attached. A great place to stay, but have a car!", 'overall': 4.0}
qLCpuCWCyPb4G2vN-WZz-Q


### Task 1.2
Loading line by line the reviews from the Yelp beauty/spa resorts and restaurants reviews files

In [4]:
def load_by_line_json_yelp(data_name, data_path = 'yelp_dataset'):
    reviews = []
    with open(f'{data_path}/{data_name}.json', encoding='utf-8') as f:
        f.readline() # first line '['
        numReviews = 0
        while True:
            numReviews += 1
            line = f.readline().strip() # Get next line from file
            if line == ']': # end of file is reached ']'
                print(f'{data_name}: {numReviews} reviews loaded')
                break
            if line[-1] == ',':
                line = line[:-1]
            reviews.append(json.loads(line))
    return reviews

In [5]:
reviews_spas = load_by_line_json_yelp('yelp_beauty_spas')
print(reviews_spas[0])
print(reviews_spas[0].get('reviewerID'))

yelp_beauty_spas: 5580 reviews loaded
{'reviewerID': 'Xm8HXE1JHqscXe5BKf0GFQ', 'asin': 'WGNIYMeXPyoWav1APUq7jA', 'summary': 'summary', 'reviewText': "Good tattoo shop. Clean space, multiple artists to choose from and books of their work are available for you to look though and decide who's style most mirrors what you're looking for. I chose Jet to do a cover-up for me and he worked with me on the design and our ideas and communication flowed very well. He's a very personable guy, is friendly and keeps the conversation going while he's working on you, and he doesn't dick around (read: He starts to work and continues until the job is done). He's very professional and informative. Good customer service combines with talent at the craft.", 'overall': 4.0}
Xm8HXE1JHqscXe5BKf0GFQ


In [6]:
reviews_restaurants = load_by_line_json_yelp('yelp_restaurants')
print(reviews_restaurants[0])
print(reviews_restaurants[0].get('reviewerID'))

yelp_restaurants: 158431 reviews loaded
{'reviewerID': 'rLtl8ZkDX5vH5nAx9C3q5Q', 'asin': '9yKzy9PApeiPPOUJEtnvkg', 'summary': 'summary', 'reviewText': 'My wife took me here on my birthday for breakfast and it was excellent. The weather was perfect which made sitting outside overlooking their grounds an absolute pleasure. Our waitress was excellent and our food arrived quickly on the semi-busy Saturday morning. It looked like the place fills up pretty quickly so the earlier you get here the better.Do yourself a favor and get their Bloody Mary. It was phenomenal and simply the best I\'ve ever had. I\'m pretty sure they only use ingredients from their garden and blend them fresh when you order it. It was amazing.While EVERYTHING on the menu looks excellent, I had the white truffle scrambled eggs vegetable skillet and it was tasty and delicious. It came with 2 pieces of their griddled bread with was amazing and it absolutely made the meal complete. It was the best "toast" I\'ve ever had.An

### Task 1.3
Loading line by line* reviews on other domains like digital music from McAuley’s Amazon dataset2.

Una opción de leer linea por linea un json muy grande es utilizar la función *read_json* de pandas con el atributo 'lines' a True. Después se ha realizado una limpieza del dataframe (eliminado columnas que no estuviesen en el dataset yelp)

In [7]:
#def read_by_line_json_amazon(data_name, data_path = 'amazon_dataset'):
#    df = pd.read_json(f'{data_path}/{data_name}.json', lines=True)
#    df.drop(inplace=True, columns=['verified', 'reviewTime', 'reviewerName', 'reviewText', 'unixReviewTime', 'style', 'image', 'vote'])
#    return df.to_dict('records')

In [8]:
#fashion_reviews = read_by_line_json_amazon('digital_music')
#print(fashion_reviews[0])
#print(fashion_reviews[0].get('reviewerID'))

## Assignment 2: Aspect vocabularies

### Task 2.1
Loading (and printing on screen) the vocabulary of the aspects_hotels.csv
file, and directly using it to identify aspect references in the reviews. In particular, the aspects terms
could be mapped by exact matching with nouns appearing in the reviews. 

In [13]:
def load_aspects(data_name, data_path = 'aspects'):
    with open(f'{data_path}/{data_name}.csv', encoding='utf-8') as f:
        aspects = {}
        for line in f:
            key, synonymous = line.rstrip('\n').split(',')
            if key in aspects and synonymous not in aspects[key]:
                aspects[key].append(synonymous)
            else:
                aspects[key] = []
    return aspects

In [14]:
aspects_hotels = load_aspects('aspects_hotels')
aspects_hotels

{'amenities': ['amenities', 'services'],
 'atmosphere': ['atmospheres',
  'ambiance',
  'ambiances',
  'light',
  'lighting',
  'lights',
  'music'],
 'bar': ['bars', 'bartender', 'bartenders'],
 'bathrooms': ['bathrooms',
  'bath',
  'baths',
  'bathtub',
  'bathtubs',
  'shampoo',
  'shampoos',
  'shower',
  'showers',
  'towel',
  'towels',
  'tub',
  'tubs'],
 'bedrooms': ['bedrooms',
  'bed',
  'beds',
  'pillow',
  'pillows',
  'sheet',
  'sheets',
  'sleep',
  'suite',
  'suites'],
 'booking': ['book', 'reservation', 'reservations', 'reserve'],
 'breakfast': ['breakfasts',
  'morning',
  'mornings',
  'toast',
  'toasts',
  'moorning meal',
  'moorning menu'],
 'building': ['decor',
  'decoration',
  'decorations',
  'furniture',
  'furnitures',
  'garden',
  'gardens',
  'hall',
  'halls',
  'lobbies',
  'lobby',
  'lounge',
  'lounges',
  'patio',
  'patios',
  'salon',
  'salons',
  'spot',
  'spots'],
 'checking': ['check-in',
  'check in',
  'check ins',
  'check out',
  'c

### Task 2.2 

Generating or extending the lists of terms of each aspect with synonyms extracted from WordNet

In [15]:
def extend_aspects(aspects):
    for key in aspects:
        synsets = wn.synsets(key)
        for synset in synsets:
            lemmas = synset.lemma_names()
            aspects[key] = list(set(aspects[key]  + lemmas))
    return aspects

In [16]:
aspects_hotels = extend_aspects(aspects_hotels)
aspects_hotels

{'amenities': ['creature_comforts',
  'conveniences',
  'agreeableness',
  'comforts',
  'amenity',
  'amenities',
  'services'],
 'atmosphere': ['atmosphere',
  'ambiance',
  'lights',
  'atmospheric_state',
  'air',
  'aura',
  'ambience',
  'atmospheres',
  'lighting',
  'light',
  'atm',
  'music',
  'ambiances',
  'standard_atmosphere',
  'standard_pressure'],
 'bar': ['stripe',
  'barroom',
  'barricade',
  'taproom',
  'blockade',
  'bar',
  'ginmill',
  'legal_profession',
  'bartenders',
  'Browning_automatic_rifle',
  'saloon',
  'bartender',
  'bars',
  'prevention',
  'relegate',
  'debar',
  'streak',
  'legal_community',
  'banish',
  'measure',
  'cake',
  'stop',
  'block_up',
  'BAR',
  'exclude',
  'block',
  'block_off'],
 'bathrooms': ['privy',
  'can',
  'baths',
  'shampoo',
  'toilet',
  'bathroom',
  'showers',
  'shampoos',
  'bathtubs',
  'bathrooms',
  'john',
  'tub',
  'bathtub',
  'towels',
  'lavatory',
  'lav',
  'towel',
  'shower',
  'bath',
  'tubs'],

### Task 2.3 
Managing vocabularies for additional Yelp or Amazon domains.

In [17]:
aspects_spas = load_aspects('aspects_spas')
aspects_spas = extend_aspects(aspects_spas)
aspects_spas

{'amenities': ['creature_comforts',
  'conveniences',
  'agreeableness',
  'comforts',
  'amenity',
  'amenities',
  'services'],
 'atmosphere': ['atmosphere',
  'ambiance',
  'lights',
  'atmospheric_state',
  'air',
  'aura',
  'ambience',
  'atmospheres',
  'lighting',
  'light',
  'atm',
  'music',
  'ambiances',
  'standard_atmosphere',
  'standard_pressure',
  'ambiences'],
 'bar': ['stripe',
  'barroom',
  'barricade',
  'taproom',
  'blockade',
  'bar',
  'ginmill',
  'legal_profession',
  'bartenders',
  'Browning_automatic_rifle',
  'saloon',
  'bartender',
  'bars',
  'prevention',
  'relegate',
  'debar',
  'streak',
  'legal_community',
  'banish',
  'measure',
  'cake',
  'stop',
  'block_up',
  'BAR',
  'exclude',
  'block',
  'block_off'],
 'bathrooms': ['privy',
  'can',
  'baths',
  'shampoo',
  'toilet',
  'bathroom',
  'showers',
  'shampoos',
  'bathtubs',
  'bathrooms',
  'john',
  'tub',
  'bathtub',
  'towels',
  'lavatory',
  'lav',
  'towel',
  'shower',
  'ba

In [18]:
aspects_restaurants = load_aspects('aspects_restaurants')
aspects_restaurants = extend_aspects(aspects_restaurants)
aspects_restaurants

{'appetizers': ['entree',
  'appetiser',
  'entrees',
  'starter',
  'starters',
  'appetizer',
  'appetizers'],
 'asian': ['curry',
  'Asiatic',
  'noodle',
  'sushies',
  'Asian',
  'curries',
  'noodles',
  'sushi'],
 'atmosphere': ['atmosphere',
  'ambiance',
  'lights',
  'atmospheric_state',
  'air',
  'aura',
  'ambience',
  'atmospheres',
  'lighting',
  'light',
  'atm',
  'music',
  'ambiances',
  'standard_atmosphere',
  'standard_pressure'],
 'bar': ['stripe',
  'barroom',
  'barricade',
  'taproom',
  'blockade',
  'bar',
  'ginmill',
  'legal_profession',
  'bartenders',
  'Browning_automatic_rifle',
  'saloon',
  'bartender',
  'bars',
  'prevention',
  'relegate',
  'debar',
  'streak',
  'legal_community',
  'banish',
  'measure',
  'cake',
  'stop',
  'block_up',
  'BAR',
  'exclude',
  'block',
  'block_off'],
 'booking': ['reservation',
  'book',
  'reserve',
  'engagement',
  'hold',
  'reservations',
  'booking'],
 'bread': ['scratch',
  'dinero',
  'wampum',
  'c

### Task 2.4
Identifying hidden/implicit aspect references in reviews. For instance, the example review of page 1 has references to the hotel’s location and transportation aspects, since there is “not much around the area” and "going by car to the hotel is recommendable".


For this task, we are going to considerer the hyponym of words. For example, that 'area' is a hyponym of 'location'.

## Assignment 3: Opinion Lexicon

### Task 3.1

Loading Liu’s opinion lexicon composed of positive and negative words, accessible as an NLKT corpus, and exploiting it to assign the polarity values to aspect opinions in assignment 4. Instead of this lexicon, you are allowed to use others, such as SentiWordNet.

In [19]:
import nltk
#nltk.download('opinion_lexicon')
from nltk.corpus import opinion_lexicon

negativeWords = opinion_lexicon.negative()
positiveWords = opinion_lexicon.positive()

In [20]:
from nltk.corpus import sentiwordnet as swn
#nltk.download('sentiwordnet')

# lexicon values between -1,1
lexicon = {}
for negativeWord in negativeWords:
    for senti_word in swn.senti_synsets(negativeWord):
        lexicon[senti_word] = float(- senti_word.neg_score())
for positiveWord in positiveWords:
    for senti_word in swn.senti_synsets(positiveWord):
        lexicon[senti_word] = float(senti_word.pos_score())

In [21]:
import spacy
#!python -m spacy download en_core_web_lg
file = nltk.data.load("vader_lexicon/vader_lexicon.txt")
nlp = spacy.load("en_core_web_lg")

2022-03-25 10:21:42.196735: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory
2022-03-25 10:21:42.196759: I tensorflow/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine.


In [22]:
# lexicon2 between -3, 3
lexicon2 = {}
# $:	-1.5	0.80623	[-1, -1, -1, -1, -3, -1, -3, -1, -2, -1]
for l in file.split("\n"):
    word, polarity = l.strip().split("\t")[0:2]
    lexicon2[word] = float(polarity)

In [23]:
doc = nlp("I do not think the hotel staff was friendly")
displacy.render(doc, style="dep")

NameError: name 'displacy' is not defined

### Task 3.2

Considering modifiers to adjust the polarity values of the aspect opinions in Assignment 4. The modifiers to use could be those provided with the NLTK Sentiment Analyzer (see Appendix G) and/or those given in modifiers.csv.

In [24]:
modifiers = pd.read_csv("modifiers/modifiers.csv")
modifiers

Unnamed: 0,above,2
0,absolutely,2.0
1,abundantly,2.0
2,acutely,2.0
3,amazingly,2.0
4,amply,2.0
...,...,...
294,violently,-1.0
295,whimsically,-1.0
296,wickedly,-1.0
297,wretchedly,-1.0


In [25]:
from nltk.sentiment import SentimentIntensityAnalyzer
from nltk.sentiment.vader import VaderConstants

constants = VaderConstants()
print(constants.BOOSTER_DICT)

{'absolutely': 0.293, 'amazingly': 0.293, 'awfully': 0.293, 'completely': 0.293, 'considerably': 0.293, 'decidedly': 0.293, 'deeply': 0.293, 'effing': 0.293, 'enormously': 0.293, 'entirely': 0.293, 'especially': 0.293, 'exceptionally': 0.293, 'extremely': 0.293, 'fabulously': 0.293, 'flipping': 0.293, 'flippin': 0.293, 'fricking': 0.293, 'frickin': 0.293, 'frigging': 0.293, 'friggin': 0.293, 'fully': 0.293, 'fucking': 0.293, 'greatly': 0.293, 'hella': 0.293, 'highly': 0.293, 'hugely': 0.293, 'incredibly': 0.293, 'intensely': 0.293, 'majorly': 0.293, 'more': 0.293, 'most': 0.293, 'particularly': 0.293, 'purely': 0.293, 'quite': 0.293, 'really': 0.293, 'remarkably': 0.293, 'so': 0.293, 'substantially': 0.293, 'thoroughly': 0.293, 'totally': 0.293, 'tremendously': 0.293, 'uber': 0.293, 'unbelievably': 0.293, 'unusually': 0.293, 'utterly': 0.293, 'very': 0.293, 'almost': -0.293, 'barely': -0.293, 'hardly': -0.293, 'just enough': -0.293, 'kind of': -0.293, 'kinda': -0.293, 'kindof': -0.293,

In [28]:
#nltk.download('vader_lexicon')

analyzer = SentimentIntensityAnalyzer()
s = "I'm really interested in music"
print(s, " => ", analyzer.polarity_scores(s))

I'm really interested in music  =>  {'neg': 0.0, 'neu': 0.572, 'pos': 0.428, 'compound': 0.4576}


## Assignment 4: Aspect opinions

Once the aspect vocabulary and opinion lexicons are loaded, the opinions about aspects have to be extracted
from the reviews. For this purpose, POS tagging, constituency and dependency parsing could be used.
- POS tagging would allow identifying the adjectives in the sentences.
- Constituency and dependency parsing would allow extracting the relations between nouns and adjectives and adverbs.

#### Task 4.1 -: 
Extracting the [aspect, aspect term, opinion word, polarity] tuples from the input reviews

In [191]:
# testing 
review = reviews_hotels[10].get("reviewText")
review = clean(review)

sentences = nltk.sent_tokenize(review)
s = sentences
print(s)
result, = dependency_parser.raw_parse(s[3])
for head, relation, dependent in result.triples():
    print(head, relation, dependent)

['I was hoping for a more adult friendly place.', "I haven't been here in years, it used to be a gay-friendly hotspot.", 'Now it seems to caters more to families and children.', "I've heard the restaurant is good, but I haven't visited yet."]
('heard', 'VBN') nsubj ('I', 'PRP')
('heard', 'VBN') aux ("'ve", 'VBP')
('heard', 'VBN') ccomp ('good', 'JJ')
('good', 'JJ') nsubj ('restaurant', 'NN')
('restaurant', 'NN') det ('the', 'DT')
('good', 'JJ') cop ('is', 'VBZ')
('heard', 'VBN') punct (',', ',')
('heard', 'VBN') conj ('visited', 'VBN')
('visited', 'VBN') cc ('but', 'CC')
('visited', 'VBN') nsubj ('I', 'PRP')
('visited', 'VBN') aux ('have', 'VBP')
('visited', 'VBN') advmod ("n't", 'RB')
('visited', 'VBN') advmod ('yet', 'RB')
('heard', 'VBN') punct ('.', '.')


In [195]:
import re

# Define a function to clean the text
def clean(text):
    # Removes all special characters and numericals leaving the alphabets
    text = re.sub("[^A-Za-z-.]'+", " ", text) 
    return text

def get_aspect(word, aspects):
    for key in aspects:
        if word in aspects[key]:
            return key
    return word

def get_polarity(dict_inf):
    polarity_scores = analyzer.polarity_scores(dict_inf['adjetive'] +' '+ dict_inf["word"])
    if polarity_scores['pos'] >= 0.5 and polarity_scores['neu']<=0.5:
        return 1.0
    elif polarity_scores['neg'] >= 0.5 and polarity_scores['neu']<=0.5:
        return -1.0
    return 0.0

def get_information(head, relation, dependent,  aspect, dict_inf, data):
    word1, pos1 = head
    word2, pos2 = dependent
        
    if relation == 'amod' and pos1.startswith('NN') and pos2.startswith('JJ'):
        if pos1 == 'NN':
            aspect = get_aspect(word1, aspects_hotels)
            if 'adjetive' in dict_inf:
                dict_inf['aspect'] = aspect.upper()
                dict_inf['polarity'] = get_polarity(dict_inf)
                data = data.append(dict_inf, ignore_index=True)
                dict_inf = {}

            dict_inf['adjetive'] = word2.lower()
            dict_inf['word'] = word1.lower()
    
    if relation == 'nsubj' and pos1.startswith('JJ') and pos2.startswith('NN'):
        if pos2 == 'NN':
            aspect = get_aspect(word2, aspects_hotels)
            if 'adjetive' in dict_inf:
                dict_inf['aspect'] = aspect.upper()
                dict_inf['polarity'] = get_polarity(dict_inf)
                data = data.append(dict_inf, ignore_index=True)
                dict_inf = {}

            dict_inf['adjetive'] = word1.lower()
            dict_inf['word'] = word2.lower()
    
    elif relation == 'compound' and pos1.startswith('NN') and pos2.startswith('NN'):
        if 'word' in dict_inf and word1 in dict_inf['word']:
            word_list =  dict_inf['word'].split()
            word_list.insert(-1, word2)
            dict_inf['word'] = ' '.join(word_list).lower()
        
        if pos1 == 'NNS' and pos2 == 'NN':
            aspect = get_aspect(word2, aspects_hotels)
    
    elif relation == 'advmod' and pos1.startswith('JJ') and pos2.startswith('RB'):
        if 'adjetive' in dict_inf and word1 in dict_inf['adjetive']:
            adj_list =  dict_inf['adjetive'].split()
            adj_list.insert(-1, word2)
            dict_inf['adjetive'] = ' '.join(adj_list).lower()
    return aspect, dict_inf, data

def save_information(aspect, dict_inf, data):
    if aspect != None and 'adjetive' in dict_inf:       
        dict_inf['aspect'] = aspect.upper()
        dict_inf['polarity'] = get_polarity(dict_inf)
        data = data.append(dict_inf, ignore_index=True)
    return data

In [207]:
import pandas as pd
from nltk.parse.corenlp import CoreNLPDependencyParser

data = pd.DataFrame(columns = ['polarity', 'aspect', 'adjetive', 'word'])

dependency_parser = CoreNLPDependencyParser()

review = reviews_hotels[0].get("reviewText")
review = clean(review)

sentences = nltk.sent_tokenize(review)

for s in sentences:
    print(s)
    
    result, = dependency_parser.raw_parse(s)
    
    dict_inf = {}
    aspect = None
    for head, relation, dependent in result.triples():
        aspect, dict_inf, data = get_information(head, relation, dependent, aspect, dict_inf, data)
    data = save_information(aspect, dict_inf, data)
    

data

Great hotel in Central Phoenix for a stay-cation, but not necessarily a place to stay out of town and without a car.
Not much around the area, and unless you're familiar with downtown, I would rather have a guest stay in Old Town Scottsdale, etc.
BUT if you do stay here, it's awesome.
Great boutique rooms.
Awesome pool that's happening in the summer.
A GREAT rooftop patio bar, and a very very busy lobby with Gallo Blanco attached.
A great place to stay, but have a car!


Unnamed: 0,polarity,aspect,adjetive,word
0,1.0,HOTEL,great,hotel
1,1.0,POOL,awesome,pool
2,1.0,BUILDING,great,rooftop patio bar
3,0.0,BUILDING,very very busy,lobby
4,1.0,PLACE,great,place


In [68]:
#nltk.download('stopwords')
from nltk.corpus import stopwords

def token_stop_pos(text):
    tags = nltk.pos_tag(nltk.word_tokenize(text))
    newlist = []
    for word, tag in tags:
        if word.lower() not in set(stopwords.words('english')):
            newlist.append(tuple([word, tag]))
    return newlist

token_stop_pos(sentences[0])

[('Great', 'JJ'),
 ('hotel', 'NN'),
 ('Central', 'NNP'),
 ('Phoenix', 'NNP'),
 ('stay-cation', 'NN'),
 ('necessarily', 'RB'),
 ('place', 'NN'),
 ('stay', 'VB'),
 ('town', 'NN'),
 ('without', 'IN'),
 ('car', 'NN'),
 ('.', '.')]

In [71]:
for s in sentences:
    result, = dependency_parser.raw_parse(s)

In [29]:
analyzer.polarity_scores('very very busy')

{'neg': 0.0, 'neu': 1.0, 'pos': 0.0, 'compound': 0.0}

In [20]:
def get_aspect(word, aspects):
    for key in aspects:
        if word in aspects[key]:
            return key

get_aspect('place', aspects_hotels)

In [41]:
import pandas as pd

review = reviews_hotels[0].get("reviewText")

from nltk.parse.corenlp import CoreNLPDependencyParser
dependency_parser = CoreNLPDependencyParser()


sentences = nltk.sent_tokenize(review)

data = pd.DataFrame(columns = ['polarity', 'aspect', 'adjetive', 'word'])
for s in sentences:
    print(s)
    
    result, = dependency_parser.raw_parse(s)
    
    dict_inf = {}
    for head, relation, dependent in result.triples():
        word1, pos1 = head
        word2, pos2 = dependent      
        
        if pos1.startswith('NN') and pos2.startswith('JJ') and relation == 'amod':
            
            if pos1 == 'NN':
                aspect = get_aspect(word1, aspects_hotels)
            if 'adjetive' in dict_inf:
                if aspect != None:
                    dict_inf['aspect'] = aspect.upper()
                    
                    polarity_scores = analyzer.polarity_scores(dict_inf['adjetive'] +' '+ dict_inf["word"])
                    if polarity_scores['pos'] > polarity_scores['neg']:
                        polarity = 1.0
                    else:
                        polarity = -1.0

                    dict_inf['polarity'] = polarity
        
                    
                    data = data.append(dict_inf, ignore_index=True)
                    aspect = get_aspect(aspect, aspects_hotels)
                    dict_inf = {}
                
            dict_inf['adjetive'] = word2.lower()
            dict_inf['word'] = word1.lower()
        elif pos1.startswith('NN') and pos2.startswith('NN') and relation == 'compound':
            
            if 'word' in dict_inf and word1 in dict_inf['word']:
                word_list =  dict_inf['word'].split()
                word_list.insert(-1, word2)
                dict_inf['word'] = ' '.join(word_list).lower()
            if pos1 == 'NNS' and pos2=='NN':
                aspect = get_aspect(word2, aspects_hotels)
        elif pos1.startswith('JJ') and pos2.startswith('RB') and relation == 'advmod':
            if 'adjetive' in dict_inf and word1 in dict_inf['adjetive']:
                adj_list =  dict_inf['adjetive'].split()
                adj_list.insert(-1, word2)
                dict_inf['adjetive'] = ' '.join(adj_list).lower()
            
    if aspect != None:
        dict_inf['aspect'] = aspect.upper()
        
        polarity_scores = analyzer.polarity_scores(dict_inf['adjetive'] +' '+ dict_inf["word"])
        if polarity_scores['pos'] > polarity_scores['neg']:
            polarity = 1.0
        else:
            polarity = -1.0
        
        dict_inf['polarity'] = polarity
        data = data.append(dict_inf, ignore_index=True)
        aspect = None
data

Great hotel in Central Phoenix for a stay-cation, but not necessarily a place to stay out of town and without a car.
Not much around the area, and unless you're familiar with downtown, I would rather have a guest stay in Old Town Scottsdale, etc.
BUT if you do stay here, it's awesome.
Great boutique rooms.
Awesome pool that's happening in the summer.
A GREAT rooftop patio bar, and a very very busy lobby with Gallo Blanco attached.
A great place to stay, but have a car!


Unnamed: 0,polarity,aspect,adjetive,word
0,1.0,UNKNOWN,great,hotel
1,1.0,SHOPPING,great,boutique rooms
2,1.0,POOL,awesome,pool
3,1.0,BUILDING,great,rooftop patio bar
4,1.0,BUILDING,very very busy,lobby
5,1.0,UNKNOWN,great,place


In [77]:
def pos_tagging(text):
    sentences = nltk.sent_tokenize(text)
    sentences = [nltk.word_tokenize(s) for s in sentences]
    sentences = [nltk.pos_tag(s) for s in sentences]
    return sentences

postagged_sentences = pos_tagging(reviews_hotels[0].get("reviewText"))
print(postagged_sentences[3])

[('Great', 'NNP'), ('boutique', 'NN'), ('rooms', 'NNS'), ('.', '.')]
