### All Reviews

To help TOMIS determine the features of negative reviews, the following analysis takes 9,905 reviews scraped from TripAdvisor, runs them through a Naive Bayes classifier, looks at the 50 most informative features, and then examines the context surrounding the words that emerge.

The features I found most interesting, and the context in which they appear, are listed at the bottom of the notebook.

I then repeated this analysis for specific groups of companies to compare the results, which are captured in separate notebooks labeled by company type.

In [None]:
import nltk
import random
import csv
import re

working_dir = "C:\\Users\\Mary Makris\\Documents\\Applied Data Analytics\\Projects\\TOMIS\\"

The first step is to label each review. 

In [None]:
#initialize list to store labeled reviews in
labeled_reviews = list()

#open file of all the reviews, split it on a comma
with open (working_dir + 'Trip_Advisor_Scrapes.csv', 'r', encoding="latin-1") as xfile:                                        
    splitline = csv.reader(xfile, delimiter = ",")
    next(splitline)
    for line in splitline:
        ##join the title of the review with the content of the review so all the text can be analyzed as one
        joinline = line[0] + " " + line[3]
        #label reviews rated 40 or 50 as good and those rated 10, 20, or 30 as bad
        if line[2] == "50" or line[2] == "40":
            labeled_reviews.append((joinline, 'Good'))
        else:
            labeled_reviews.append((joinline, 'Bad'))
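
Before training, it can help to check how many reviews fall under each label, since a heavily skewed split affects how the classifier's accuracy should be read. A minimal sketch, using a small made-up `sample_reviews` list in place of the real `labeled_reviews`:

```python
from collections import Counter

# Hypothetical stand-in for labeled_reviews: (text, label) tuples
sample_reviews = [
    ("Great trip Loved every minute", "Good"),
    ("Fun guide Would go again", "Good"),
    ("Terrible food Not worth the money", "Bad"),
    ("Best day ever Highly recommend", "Good"),
]

def label_counts(reviews):
    """Count how many reviews carry each label."""
    return Counter(label for _, label in reviews)

counts = label_counts(sample_reviews)
total = sum(counts.values())
for label, n in counts.most_common():
    print(f"{label}: {n} ({n / total:.0%})")
```

In the notebook itself, `label_counts(labeled_reviews)` would give the same breakdown for the full data set.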

Once the reviews were labeled, I shuffled them so that the split into training and test sets does not depend on the order of the source file.

In [None]:
random.shuffle(labeled_reviews)
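
`random.shuffle` produces a different order on every run, so the train/test split and the resulting accuracy will vary between runs. If repeatable results are wanted, one option is to shuffle with a fixed seed. A minimal sketch; the seed value 42 and the `shuffled_copy` helper are arbitrary choices for illustration, not part of the original notebook:

```python
import random

def shuffled_copy(items, seed=42):
    """Return a shuffled copy of items; the same seed gives the same order."""
    rng = random.Random(seed)   # private generator, leaves the global state alone
    result = list(items)
    rng.shuffle(result)
    return result

# Hypothetical stand-in for labeled_reviews
sample = [("review %d" % i, "Good") for i in range(10)]
first = shuffled_copy(sample)
second = shuffled_copy(sample)
print(first == second)  # same seed, same order
```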

In [None]:
def review_features(review):
    '''Take a review and return a dictionary of features, with each
    word in the review as a key and True as its value.'''

    split_review = review.split(" ")                            ### split the review into words
    lowcase_review = [thing.lower() for thing in split_review]     ### make all the words lowercase
    clean_review = [re.sub(r'[^\w\s]','',word) for word in lowcase_review]  ## strip punctuation
    ret_val = {}
    for word in clean_review:
        ret_val[word] = True
    return ret_val

#Test the function
review_features("Is this working correctly")
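
The features record only whether a word is present in a review, not how often it occurs. The sketch below shows this on a punctuated example. It is a variant written for illustration (the name `review_features_sketch` is hypothetical): unlike the function above, it splits on any run of whitespace and drops tokens that are empty once punctuation is removed.

```python
import re

def review_features_sketch(review):
    """Variant of review_features: lowercase, strip punctuation, drop empty tokens."""
    features = {}
    for token in review.split():              # split on any run of whitespace
        word = re.sub(r'[^\w\s]', '', token.lower())
        if word:                              # skip tokens that were only punctuation
            features[word] = True
    return features

example = review_features_sketch("Terrible food!!  Not worth it ...")
print(example)
# {'terrible': True, 'food': True, 'not': True, 'worth': True, 'it': True}
```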

In [None]:
##run each review through the function and store the results in a variable called featuresets
featuresets = [(review_features(review), rating) for (review, rating) in labeled_reviews]
#Split the reviews: the first 500 form the test set and the rest form the training set
train_set, test_set = featuresets[500:], featuresets[:500]
#run the training set through the Naive Bayes classifier to train it
classifier = nltk.NaiveBayesClassifier.train(train_set)   

In [None]:
#look at how accurate the classifier was when analyzing the test set of reviews that were left out
print(nltk.classify.accuracy(classifier, test_set))
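
Accuracy on its own can be misleading when one label dominates: a classifier that always answers 'Good' would score well on a test set that is mostly positive reviews. A minimal sketch of comparing accuracy against that majority-class baseline and checking recall on the 'Bad' label; the `gold` and `predicted` lists are made-up values, not results from this analysis:

```python
from collections import Counter

def accuracy(gold, predicted):
    """Share of predictions that match the true labels."""
    return sum(g == p for g, p in zip(gold, predicted)) / len(gold)

def majority_baseline(gold):
    """Accuracy obtained by always predicting the most common label."""
    return Counter(gold).most_common(1)[0][1] / len(gold)

def recall(gold, predicted, label):
    """Share of reviews with the given true label that were predicted as such."""
    relevant = [p for g, p in zip(gold, predicted) if g == label]
    return sum(p == label for p in relevant) / len(relevant)

# Hypothetical labels for ten test reviews
gold      = ["Good"] * 8 + ["Bad"] * 2
predicted = ["Good"] * 7 + ["Bad"] + ["Bad", "Good"]

print(accuracy(gold, predicted))          # 0.8
print(majority_baseline(gold))            # 0.8
print(recall(gold, predicted, "Bad"))     # 0.5
```

In this invented case the classifier is no better than always guessing 'Good', even though 80% accuracy sounds reasonable. With the notebook's data, `gold` would be the labels in `test_set` and `predicted` the output of `classifier.classify(features)` for each feature set.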

In [None]:
#look at the most informative features determined by the classifier
classifier.show_most_informative_features(50)
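
Each line of that output is a ratio such as `Bad : Good = 79.2 : 1.0`, which reads as: the word is about 79 times more likely to be present in a 'Bad' review than in a 'Good' one. A simplified sketch of how such a ratio can be computed from counts; the add-0.5 smoothing is an assumption meant to resemble NLTK's default estimator rather than reproduce it exactly, and the counts are invented:

```python
def presence_ratio(bad_with_word, bad_total, good_with_word, good_total, gamma=0.5):
    """Ratio of P(word present | Bad) to P(word present | Good), with smoothing.

    gamma is added to each count so that a word never seen in one class
    does not produce a zero probability (assumed smoothing over two
    outcomes: present or absent).
    """
    p_bad = (bad_with_word + gamma) / (bad_total + 2 * gamma)
    p_good = (good_with_word + gamma) / (good_total + 2 * gamma)
    return p_bad / p_good

# Invented counts: the word appears in 12 of 300 bad reviews and 5 of 9,000 good ones
ratio = presence_ratio(12, 300, 5, 9000)
print(f"Bad : Good = {ratio:.1f} : 1.0")
```

A large ratio can rest on only a handful of occurrences, which is one reason to check the surrounding context of each word before drawing conclusions.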

In [None]:
#initialize a list to store all the reviews
all_reviews = []

#split each review into individual words, and store the words in the all_reviews list
for review, label in labeled_reviews :
    words = review.split() 
    for word in words :
        all_reviews.append(word)

In [None]:
##tokenize each word in the list of reviews
all_reviews = nltk.word_tokenize(" ".join(all_reviews))

In [None]:
##According to the NLTK documentation, nltk.Text is a wrapper around the tokens that allows you to then run functions like
##concordance. So I run it on the now tokenized words in all the reviews and store the result under the same name as before
all_reviews = nltk.Text(all_reviews)

In [None]:
#look at the context surrounding each word to get more information about what the review entailed
all_reviews.concordance("informativefeatureword")
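
`informativefeatureword` above is a placeholder for whichever feature is being inspected. To show what a concordance view does, here is a simplified keyword-in-context sketch in plain Python. It is a stand-in for illustration, not NLTK's implementation; the `keyword_in_context` name and the sample tokens are hypothetical:

```python
def keyword_in_context(tokens, keyword, window=4):
    """Return each occurrence of keyword with up to `window` tokens on either side."""
    keyword = keyword.lower()
    lines = []
    for i, token in enumerate(tokens):
        if token.lower() == keyword:          # case-insensitive match
            left = " ".join(tokens[max(0, i - window):i])
            right = " ".join(tokens[i + 1:i + 1 + window])
            lines.append(f"{left} [{token}] {right}".strip())
    return lines

sample_tokens = "The food was terrible but the show was great . Terrible service overall".split()
kwic_lines = keyword_in_context(sample_tokens, "terrible", window=3)
for line in kwic_lines:
    print(line)
```

Because the tokens from all reviews are joined into one sequence, a context window can run across the boundary between two reviews, which is why some of the excerpts below mix negative and positive wording.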

After looking at the top 50 features and the context surrounding them, I pulled out the 5 to 10 that I felt would be most useful and informative from a business perspective for TOMIS to use in its work with clients. Likely because of the low number of negative reviews, the classifier had low accuracy, and the context sometimes showed that a flagged word appeared more often in positive reviews.

### Informative Features with their context:

DISAPPOINTING:
(Bad : Good   =     79.2 : 1.0)

experience with Hydra other than disappointing . A group of 3 of us rafted the 
t Hydra on the Kicking Horse ... Disappointing at Best No crowds , great raftin
far distance away but we got ... Disappointing for the night cruise ... I would
 for a table ... Lovely Cruise , Disappointing Brunch I really did n't think th
ng from the outside but a little disappointing . Room was not ready when we arr
sive . But the show was the most disappointing . We ... Not so great . We have 
o deal with . The Ghost tour ... Disappointing Ghost Tour We did n't do the mea
then walk to ... Ghost Tour Very Disappointing ! Ca n't say enough about how gr
f you like country music and ... Disappointing If you are undecided about wethe
g was excellent ... .rafting was disappointing we had an awesome time at missio
hat both my husband and I do ... Disappointing If I could give YFA more than 5 
 was served immediately . It was disappointing . Pork chop was overcooked and d
nd we were wary that it would be disappointing as a result . It was exactly the
tures ! Without a doubt the most disappointing part of a wonderful weekend in N

TERRIBLE: 
(Bad : Good   =     54.4 : 1.0)

they would n't even reschedule ... Terrible service would n't use them again !
ide was worth the price . Food was terrible and the servers were -- -different
vers should at least be fluent ... Terrible food The staff was amazing and pro
. Definitely would not recommend . Terrible The Dolores River has carved spect
hand ... Great show , but food was terrible I was lucky enough to spend 5 days
ay . I must say I read some of the terrible and poor reviews before staying at
 Excellent ! The experienced wasnt terrible , but far from good , i felt total
a bum guide or what , but this was terrible ( Ghost Tour ) This is a great way
d an awesome experience ! I have a terrible fear of heights , but with Carolin
l . There was one group that had a terrible accent and we could n't understand
ight . I ate better at Taco Bell . Terrible food We went on this ride last Mon
at ... Evening float . Nice ride , terrible access . When some plans changed l
ther companies rooms and they were terrible compared to Mission Escape ! We ca
ood . Over cooked Pork tender with terrible green beans and awful chutney some
y a 1 hr show after . The meal was terrible ... DRY , overcooked chicken ! Ver
 We , of course , did not ... Just Terrible We did a session for a bachelorett
 and entertaining ... the food was terrible . If ... Not worth the money My fi

HORRIBLE:
(Bad : Good   =     51.3 : 1.0)

plementation of ... Embarrassingly Horrible Staff , Earth Shatteringly Awesome
 $ and non-refundable . Drinks ... HORRIBLE , WISH WE NEVER WENT . NOT WORTH T
t from the very beginning this was HORRIBLE . We arrived at 6 to 3 people sitt
n't save it..They had so weird ... Horrible ..abandon ship This was a great ex
in front of stage -- early act was horrible and food the same -- the brussel s
, consoling my group and I for our horrible loss at the end , etc ) ! The Area
 Wasted half a day driving out ... Horrible I have rafted in three other state

RUDE:
(Bad : Good   =     54.4 : 1.0)  

ld n't use them again ! ! ! And very rude ! ! ! Very disappointed ! ! ! Alpine
 and well kept . The staff was a tad rude and in fact gave my husband a hard t
lso be good . But our guide was very rude , he ... Good facility but very rude
rude , he ... Good facility but very rude Guide My wife , three kids ( 12,15,1
 available for the ... Employees are rude and uninformed . My 18-year old son 

PAYMENT:
(Bad : Good   =     37.8 : 1.0)

 cancellation policy as it requires payment well in advance and is not very for
e ( they also offer cabins ) . Full payment in advance was required . After che
d was told that they would hold the payment on my card and then we could all pa

RESPONSE:
(Bad : Good   =     37.8 : 1.0)

te . I received a helpful , upbeat response in less than 24 hours from Arik Wa
time got a prompt and professional response from Catherine . We had explained 
I was planning for . Emails of her response were vague and delayed , however i
se one-off but the lack of a response says otherwise . As a local with f

CLEANING:
(Bad : Good   =     37.8 : 1.0)

o cabin and they were still inside cleaning . There was mold in downstairs bat
own and the venue could use a good cleaning . Lots of fun . Sketchy part of to
the park . The maintenance and the cleaning of the cabins could be improved . 

DIRTY:
(Bad : Good   =     31.8 : 1.0)

n . Instead of giving us a quick and dirty version and moving on with his day ,
ite city ! You helped reveal all the dirty fun little secrets . We had a blast 
rvice was sloppy . Wine glasses were dirty and when I pointed it out to the bar
laces have wet suits that smell like dirty feet ) . Jake is an excellent guide 
ere was nothing related to zombies . Dirty and quickly became boring Name of ac
is excursion . The Boat is a rusty , dirty , Spider infested Tub . You go down

CHEAP:
(Bad : Good   =     31.8 : 1.0)

is place kinda sucked . It was super cheap and cheesy . I felt like I wasted my
ere very disappointed and it was n't cheap Was disappointed in this attraction 
 , no better than one could get at a cheap buffet . If you like country music a
I actually enjoy to paddle , not the cheap ones other companies use . The sched
 . They also only serve one brand of cheap wine . But the show and night time v