# Load all hotel reviews

As before, load the hotel reviews file already stored in your Google Drive account.

In [0]:
from google.colab import drive
import os
import pandas as pd

drive.mount("/content/gdrive")

REVIEWS_PATH = "/content/gdrive/My Drive/Data/new-york-city.csv"

if os.path.isfile(REVIEWS_PATH):
  # The file is tab-separated with no header row; quoting=3 (csv.QUOTE_NONE)
  # treats quote characters as ordinary text
  reviews = pd.read_csv(REVIEWS_PATH, sep="\t", header=None, usecols=[0, 1, 2, 3], quoting=3,
                        names=["Hotel Name", "Date of Review", "Review Headline", "Review Text"])
  # Skip empty reviews by dropping whole rows. Assigning column.dropna() back to the
  # same column removes nothing, because pandas realigns the result on the index.
  # reset_index gives the remaining rows consecutive labels again.
  reviews = reviews.dropna(subset=["Review Headline", "Review Text"]).reset_index(drop=True)
  reviews["Review Headline"] = reviews["Review Headline"].str.lower()  # lowercase all review headlines
  reviews["Review Text"] = reviews["Review Text"].str.lower()  # lowercase all review text

  print("Reviews file read successfully")
else:
  print("Data folder does not contain 'new-york-city.csv'")

Drive already mounted at /content/gdrive; to attempt to forcibly remount, call drive.mount("/content/gdrive", force_remount=True).
Reviews file read successfully
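The `quoting=3` argument above is the numeric value of `csv.QUOTE_NONE`: quote characters are kept as ordinary text instead of being treated as field delimiters. As a rough illustration of what the pandas call does, the stdlib sketch below parses two made-up tab-separated rows (invented for this example, not taken from the dataset), skips a row with an empty review text, and lowercases the rest. `load_reviews` and `sample_reviews` are hypothetical names chosen so they do not overwrite the notebook's `reviews` DataFrame.

```python
import csv
import io

# csv.QUOTE_NONE is the constant behind pandas' quoting=3
assert csv.QUOTE_NONE == 3

COLUMNS = ["Hotel Name", "Date of Review", "Review Headline", "Review Text"]

# Two hypothetical tab-separated rows: hotel, date, headline, text.
# The second row has an empty review text and should be skipped.
sample = (
    'Hotel A\tJan 1, 2010\tGreat "Value"\tLoved It\n'
    'Hotel B\tFeb 2, 2010\tMeh\t\n'
)

def load_reviews(handle):
    """Parse tab-separated reviews, keeping quote characters as literal text."""
    rows = []
    for fields in csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE):
        record = dict(zip(COLUMNS, fields[:4]))
        # skip rows whose headline or text is missing or empty
        if not record.get("Review Headline") or not record.get("Review Text"):
            continue
        record["Review Headline"] = record["Review Headline"].lower()
        record["Review Text"] = record["Review Text"].lower()
        rows.append(record)
    return rows

sample_reviews = load_reviews(io.StringIO(sample))
print(sample_reviews)
```

Because quoting is disabled, the headline keeps its literal quote marks ('great "value"') rather than having them stripped.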


# Load the spacy NLP library

This library will be used to perform Named Entity Recognition (NER).

In [0]:
import spacy
from spacy import displacy
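# The 'en' shortcut was removed in spaCy v3; load the small English model by its package name
# (install it first if needed: python -m spacy download en_core_web_sm)
nlp = spacy.load('en_core_web_sm')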

Now examine a particular review, selected by review_number on the first line of the code cell below. The cell prints each entity found in the review together with its entity type. The entity codes are listed here: https://spacy.io/api/annotation#named-entities
For example, 'GPE' marks countries, cities, and states, while 'LOC' marks other locations such as mountain ranges and bodies of water.

In [0]:
review_number = 1 # change this number to any review number desired (a position, not an index label)

selected_review = reviews.iloc[review_number]
selected_review_text = selected_review["Review Text"]

print(selected_review_text)
print()
print()

doc = nlp(selected_review_text)  # run the spaCy pipeline, including the entity recognizer
print("Entities found in review #{}:".format(review_number))
print('ENTITY\t\t\tTYPE')
for ent in doc.ents:  # each entity is a Span with .text and .label_
  print("{:<16}\t{}".format(ent.text, ent.label_))
print()


ok i had a one day stay in nyc on business and wanted to splurge a bit. after searching through dozens of hotels on essentravel.com i found the sherry netherland. the price was $700 a night but i said what the heck.i was not dissapointed. the staff were more than helpful, the rooms were opulent and clean, i had a great stay. if you want to splurge a bit, give the sherry a try. its great!!


Entities found in review #1:
ENTITY			TYPE
one day         	DATE
dozens          	CARDINAL
700             	MONEY
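To read the TYPE column, it helps to keep the label descriptions at hand. spaCy itself offers `spacy.explain(label)` for this; the stdlib sketch below instead hard-codes a subset of the labels used by spaCy's English models (descriptions paraphrased from the spaCy documentation) and groups the three entities printed above by type. `LABEL_DESCRIPTIONS` and `group_by_label` are hypothetical names, not part of spaCy.

```python
from collections import defaultdict

# A subset of the entity labels used by spaCy's English models,
# with short descriptions paraphrased from the spaCy documentation.
LABEL_DESCRIPTIONS = {
    "PERSON": "people, including fictional",
    "ORG": "companies, agencies, institutions",
    "GPE": "countries, cities, states",
    "LOC": "non-GPE locations, mountain ranges, bodies of water",
    "DATE": "absolute or relative dates or periods",
    "TIME": "times smaller than a day",
    "MONEY": "monetary values, including unit",
    "CARDINAL": "numerals that do not fall under another type",
}

def group_by_label(entities):
    """Group (text, label) pairs by label, preserving first-seen order."""
    groups = defaultdict(list)
    for text, label in entities:
        groups[label].append(text)
    return dict(groups)

# The entities printed for review #1 above
found = [("one day", "DATE"), ("dozens", "CARDINAL"), ("700", "MONEY")]
grouped = group_by_label(found)
for label, texts in grouped.items():
    description = LABEL_DESCRIPTIONS.get(label, "see the spaCy docs")
    print(f"{label:<10}{description}: {texts}")
```

In the notebook, the same list could be built directly from the parsed review with [(ent.text, ent.label_) for ent in doc.ents].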



We can also print the entity tag assigned to each token. spaCy uses the IOB scheme: 'B' marks the first token of an entity (beginning), 'I' marks a later token inside the same entity, and 'O' means the token is outside any entity.
The code below prints the tag and entity type only for tokens that belong to an entity.

In [0]:
for token in doc:
  if token.ent_iob_ != 'O':
    print("{:<16}\t{}\t{}".format(token.text, token.ent_iob_, token.ent_type_))

one             	B	DATE
day             	I	DATE
dozens          	B	CARDINAL
700             	B	MONEY
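The B/I/O tags carry enough information to rebuild the entity spans: a B tag opens a new span, each following I tag with the same type extends it, and an O tag closes it. The sketch below does this in plain Python on tokens from review #1. The O-tagged neighbours ('a', 'stay', 'through', 'of', '$') come from the review text and are inferred to be O because the cell above printed only entity tokens. `iob_to_spans` is a hypothetical helper; in the notebook the triples could come from [(t.text, t.ent_iob_, t.ent_type_) for t in doc].

```python
def iob_to_spans(tagged_tokens):
    """Rebuild (entity_text, label) spans from (token, iob, label) triples.

    iob is "B" (beginning of an entity), "I" (inside one) or "O" (outside).
    Tokens are joined with single spaces, a simplification: the original
    whitespace between tokens is not preserved.
    """
    spans = []
    current_tokens, current_label = [], None
    for token, iob, label in tagged_tokens:
        if iob == "B" or (iob == "I" and label != current_label):
            # close any open span, then start a new one at this token
            if current_tokens:
                spans.append((" ".join(current_tokens), current_label))
            current_tokens, current_label = [token], label
        elif iob == "I":
            current_tokens.append(token)  # extend the open span
        else:
            # "O" (or an empty tag) ends any open span
            if current_tokens:
                spans.append((" ".join(current_tokens), current_label))
            current_tokens, current_label = [], None
    if current_tokens:  # a span may run to the end of the input
        spans.append((" ".join(current_tokens), current_label))
    return spans

tagged = [
    ("a", "O", ""), ("one", "B", "DATE"), ("day", "I", "DATE"), ("stay", "O", ""),
    ("through", "O", ""), ("dozens", "B", "CARDINAL"), ("of", "O", ""),
    ("$", "O", ""), ("700", "B", "MONEY"),
]
spans = iob_to_spans(tagged)
print(spans)
```

Treating a stray I whose type differs from the open span as a new beginning is a design choice that keeps malformed tag sequences from merging unrelated entities.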


spaCy's displaCy visualizer gives a more readable rendering: it highlights each entity in the text and labels it with the same entity codes.

In [0]:
displacy.render(doc, jupyter=True, style='ent')

You can replace the string below with any text you like to see which entities spaCy finds in it.

In [0]:
text = "IBM released earnings of $5 per share today. I went to New york City and then NYC."
doc = nlp(text)
displacy.render(doc, jupyter=True, style='ent')

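displaCy can also render entities that you specify by hand. With manual=True, render takes a list of dicts, each holding the raw "text" and an "ents" list of character offsets ("start" inclusive, "end" exclusive) with a label. Here "Google" is tagged ORG and "starting" is given the custom label ACT, which is not one of spaCy's built-in entity types.

In [0]:
text = "But Google is starting from behind."

# Character offsets: "Google" spans [4, 10) and "starting" spans [14, 22)
ex = [{"text": text,
       "ents": [{"start": 4, "end": 10, "label": "ORG"},
                {"start": 14, "end": 22, "label": "ACT"}]
      }]
# With jupyter=True the markup is displayed inline in the notebook rather than returned
displacy.render(ex, style="ent", manual=True, jupyter=True)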
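Counting character offsets by hand for manual=True is easy to get off by one. A small helper that locates each phrase with `str.find` avoids that. `char_span_ents` below is a hypothetical helper, not part of spaCy; it uses only the first occurrence of each phrase and returns dicts with the same "start", "end", "label" keys used in the "ents" field above.

```python
def char_span_ents(text, phrases):
    """Return displaCy-style entity dicts for (phrase, label) pairs.

    Only the first occurrence of each phrase is used; a phrase that does
    not occur in the text raises ValueError.
    """
    ents = []
    for phrase, label in phrases:
        start = text.find(phrase)
        if start == -1:
            raise ValueError(f"{phrase!r} not found in text")
        ents.append({"start": start, "end": start + len(phrase), "label": label})
    # keep entities in the order they appear in the text
    return sorted(ents, key=lambda e: e["start"])

text = "But Google is starting from behind."
ents = char_span_ents(text, [("starting", "ACT"), ("Google", "ORG")])
print(ents)
```

In the notebook, the result could then be passed as ex = [{"text": text, "ents": ents}] to displacy.render(ex, style="ent", manual=True, jupyter=True).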

