### Boat/Park Tours

The Boat and National Park Tour group included 10 companies and about 1,700 reviews.

In [None]:
import nltk
import random
import csv
import re

working_dir = "C:\\Users\\Mary Makris\\Documents\\Applied Data Analytics\\Projects\\TOMIS\\"

In [None]:
# Initialize a list to store the labeled reviews
labeled_reviews = []

# Open the file of all the reviews and parse it as comma-separated rows
with open(working_dir + 'boat_parktours.csv', 'r', encoding="Latin-1") as xfile:
    splitline = csv.reader(xfile, delimiter=",")
    next(splitline)  # skip the first row (assumed to be the header)
    for line in splitline:
        # Join the title of the review with its content so all the text can be analyzed as one
        joinline = line[0] + " " + line[3]
        # Label reviews rated 40 or 50 as Good and those rated 10, 20, or 30 as Bad
        if line[2] == "50" or line[2] == "40":
            labeled_reviews.append((joinline, 'Good'))
        else:
            labeled_reviews.append((joinline, 'Bad'))
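
Since the low number of negative reviews comes up later as a likely reason for the classifier's weak accuracy, it can help to check the class balance right after labeling. The sketch below is only an illustration: `label_counts` is a hypothetical helper, and the `sample` list is made-up data standing in for `labeled_reviews`.

```python
from collections import Counter

def label_counts(labeled):
    """Return a Counter of labels from a list of (text, label) pairs."""
    return Counter(label for _text, label in labeled)

# Hypothetical stand-in data; in the notebook, pass labeled_reviews instead.
sample = [("Great cruise", "Good"), ("Loved it", "Good"), ("Engine quit", "Bad")]

counts = label_counts(sample)
total = sum(counts.values())
for label, n in counts.most_common():
    # Show each label's count and its share of all reviews
    print(f"{label}: {n} ({n / total:.0%})")
```

A heavily skewed split here would suggest reading the accuracy figure with caution.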

In [None]:
#shuffle all the reviews up in case they are in some type of order
random.shuffle(labeled_reviews)
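
An unseeded shuffle gives a different train/test split, and therefore a different accuracy and feature list, on every run. A minimal sketch of a reproducible shuffle follows; the helper name `shuffled_copy` and the seed value 42 are assumptions for illustration, not part of the original analysis.

```python
import random

def shuffled_copy(items, seed=42):
    """Return a shuffled copy of items; the same seed gives the same order."""
    rng = random.Random(seed)  # private generator, leaves the global random state alone
    copy = list(items)         # copy so the caller's list is not modified
    rng.shuffle(copy)
    return copy

demo = list(range(10))
first = shuffled_copy(demo)
second = shuffled_copy(demo)
print(first == second)  # both calls use the same seed
```

In the notebook this could be applied as `labeled_reviews = shuffled_copy(labeled_reviews)`.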

In [None]:
def review_features(review):
    '''Take a review string and return a dictionary that maps each
    cleaned word in the review to True (a bag-of-words feature set).'''

    split_review = review.split(" ")                                # split the review into words
    lowcase_review = [thing.lower() for thing in split_review]      # make every word lowercase
    clean_review = [re.sub(r'[^\w\s]', '', word) for word in lowcase_review]  # strip punctuation
    ret_val = {}                                                    # store each word as a feature
    for word in clean_review:
        ret_val[word] = True
    return ret_val

#Test the function
review_features("Is this working correctly")
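
One caveat with splitting on a single space: double spaces and punctuation-only tokens such as "-" end up as the empty string, which then becomes a feature of its own. The variant below is a sketch of one way to avoid that. The name `review_features_no_blanks` is hypothetical, and using it in place of the original function would change the feature counts and ratios reported further down.

```python
import re

def review_features_no_blanks(review):
    """Variant of review_features that skips empty tokens (hypothetical name)."""
    features = {}
    for token in review.lower().split():      # split() collapses runs of whitespace
        word = re.sub(r'[^\w\s]', '', token)  # same punctuation rule as the original
        if word:                              # drop tokens that were only punctuation
            features[word] = True
    return features

example = review_features_no_blanks("Great  tour - loved it!")
print(example)
```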

In [None]:
# Run each review through the function and store the results in a variable called featuresets
featuresets = [(review_features(review), rating) for (review, rating) in labeled_reviews]
# Hold out the first 500 reviews as the test set and train on the rest
train_set, test_set = featuresets[500:], featuresets[:500]
# Train the Naive Bayes classifier on the training set
classifier = nltk.NaiveBayesClassifier.train(train_set)

In [None]:
#look at how accurate the classifier was when analyzing the test set of reviews that were left out
print(nltk.classify.accuracy(classifier, test_set))
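
With far more Good than Bad reviews, overall accuracy can look acceptable even if the Bad class is handled poorly. A rough sketch of two extra checks follows: the accuracy of always guessing the majority label, and recall per label. The helper names and the tiny label lists are hypothetical. In the notebook, the lists could be built with `gold = [label for _, label in test_set]` and `predicted = [classifier.classify(feats) for feats, _ in test_set]`.

```python
from collections import Counter

def majority_baseline(gold_labels):
    """Accuracy obtained by always predicting the most common label."""
    counts = Counter(gold_labels)
    return counts.most_common(1)[0][1] / len(gold_labels)

def recall_by_label(gold_labels, predicted_labels):
    """Fraction of each gold label that was predicted correctly."""
    totals = Counter(gold_labels)
    hits = Counter(g for g, p in zip(gold_labels, predicted_labels) if g == p)
    return {label: hits[label] / totals[label] for label in totals}

# Hypothetical labels for illustration only
gold = ["Good", "Good", "Good", "Good", "Bad"]
predicted = ["Good", "Good", "Good", "Bad", "Bad"]

baseline = majority_baseline(gold)
recall = recall_by_label(gold, predicted)
print(baseline, recall)
```

If the classifier's accuracy is not clearly above the baseline, the informative features are probably a better guide than the accuracy number itself.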

In [None]:
#look at the most informative features determined by the classifier
classifier.show_most_informative_features(50)
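
Each line of that output, such as `Bad : Good = 21.5 : 1.0`, is a ratio of how likely the word is to appear in a review of one label versus the other. The sketch below shows the arithmetic under two assumptions: that NLTK's default expected-likelihood smoothing (add 0.5 to each count) is in effect, and that each word feature has two possible values (present or absent). The function names and the counts are hypothetical and are not taken from the TOMIS data.

```python
def smoothed_prob(count, n_label, bins=2, gamma=0.5):
    """Smoothed estimate of P(word present | label).

    count   -- number of reviews with this label that contain the word
    n_label -- total number of reviews with this label
    bins    -- number of possible feature values (assumed: present / absent)
    gamma   -- smoothing constant (0.5 assumed, as in expected-likelihood estimation)
    """
    return (count + gamma) / (n_label + gamma * bins)

def likelihood_ratio(count_a, n_a, count_b, n_b):
    """Ratio P(word | label A) / P(word | label B)."""
    return smoothed_prob(count_a, n_a) / smoothed_prob(count_b, n_b)

# Hypothetical counts: the word appears in 10 of 100 Bad reviews and 1 of 100 Good reviews
ratio = likelihood_ratio(10, 100, 1, 100)
print(f"Bad : Good = {ratio:.1f} : 1.0")
```

Because the ratio depends on only a handful of Bad reviews, a few extra or missing occurrences can move it a lot, which is one more reason to read the concordance context before trusting a feature.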

In [None]:
#initialize a list to store all the reviews
all_reviews = []

#split each review into individual words, and store the words in the all_reviews list
for review, label in labeled_reviews :
    words = review.split() 
    for word in words :
        all_reviews.append(word)

In [None]:
##tokenize each word in the list of reviews
all_reviews = nltk.word_tokenize(" ".join(all_reviews))

In [None]:
##According to the NLTK documentation, nltk.Text is a wrapper around a sequence of tokens that lets you run functions
##like concordance. So I run it on the tokenized words from all the reviews and store the result under the same name as before
all_reviews = nltk.Text(all_reviews)

In [None]:
#look at the context surrounding a word to get more information about what the review entailed
#("informativefeatureword" is a placeholder; replace it with the feature word of interest)
all_reviews.concordance("informativefeatureword")

After looking at the top 50 features and the context surrounding them, I pulled the 5 to 10 that I felt would
be most useful and informative from a business perspective, and that TOMIS could use in their work with clients.
Likely because of the low number of negative reviews, the classifier had low accuracy, and the context for
some of these words comes from positive reviews.

### Top Informative Features & Probabilities

PRICED:
(Bad : Good   =     21.5 : 1.0)

t was short boat ride that was over priced . My grandson wanted to ride it but
nts . In fact , it was just an over priced boat.. No History - just a boat rid
dly smiling crew , clean ship , and priced reasonably . Loved the waving lady 
ere well appreciated and reasonably priced . Was cold at times ... One of the 
 we made invested in ... A Bit High Priced , But Worth The Memory We took the 
t tour was comfortable , reasonably priced , and informative . This is fun for
ful . The cruise was 90 minutes and priced reasonably . great scenery on a bea
eal this time , we felt it was over priced . We did the cruise and show . The 
enjoy it ! However , it is way over priced for adults . On top of that if you 
e Lodge ) ... and it was reasonably priced ( $ 16ish for adults , $ 8ish for k

DRY:
(Bad : Good   =     17.6 : 1.0)

 the food lacked luster . Chicken was dry and the short ribs were awful . I was
inting . Pork chop was overcooked and dry ... . Great Entertainment We had a gr
how after . The meal was terrible ... DRY , overcooked chicken ! Very pushy wai
ind of fruit . Stove top dressing was dry and tasted like something from ... SO
e food was not good . The chicken was dry and tasteless . The waiters was pushi
 the afternoon , we were successfully dry casting , roll casting , and catching
ock until totally dark outside . Food dry , no better than one could get at a c
.. Boxed mashed potatoes ... Roast so dry that all the gravy in the world could

ENGINE:
(Bad : Good   =     13.7 : 1.0)

 with us as I could n't restart the engine when he was teaching me what to do 
 very hot , loud , exhaust smelling engine . No refund was given . Rudeness fr
! 30 min into our 8hp boat ride the engine quit , then one of the oars split i
e last night of our honeymoon . The engine on the boat broke so the ship cruis

SERVER:
(Bad : Good   =      9.8 : 1.0)
    
ally a better than I expected . Our server , Claude , was ... Sunday Brunch Cr
, prime rib was the highlight . Our server was really good too . The ... What 
f difficulty communicating with our server ... Disappointed We booked a full d
had to eat almost immediately . The server kept telling the people at our tabl
estrooms where not very clean , the server we had English was not good and aft

SOUND:
(Bad : Good   =      9.8 : 1.0)

my son was extremely scared from the sound on the boat and movement of the ... 
h to see along the shoreline and the sound system was poor so we could n't real
 cruise . As someone mentioned , the sound system sometimes made it hard to hea
s some work , definitely needs a new sound man . it was overamplified for the .

BORED:
(Bad : Good   =      9.8 : 1.0)   

year old with us , and she would get bored of some place ) . The only thing I d
 more interested and the kids seemed bored . There ... Relaxing and Informative
d lunch . Had we known how tired and bored we would be at the end of 6 uneventf
 several times and I have never been bored . Pend Orielle is the largest lake i

LOUD:
(Bad : Good   =      9.8 : 1.0)

ery Fun After 3 days of BBQ , beer & loud bars , this was a welcome change of 
 d'Alene . My only complaint was the loud music . Why do people need music eve
tting was right above the very hot , loud , exhaust smelling engine . No refun
by mediocre band that played too too loud ! My interest was to get outside and
e cruise and city sights . Music too loud This was the highlight of our Yosemi