### Escape Rooms

The escape room group covered 4 different companies and about 1,415 reviews.

In [None]:
import nltk
import random
import csv
import re

working_dir = "C:\\Users\\Mary Makris\\Documents\\Applied Data Analytics\\Projects\\TOMIS\\"

In [None]:
#initialize list to store labeled reviews in
labeled_reviews = list()

#open the file of all the reviews and parse it as comma-separated values
with open(working_dir + 'escape_room.csv', 'r', encoding="UTF-8") as xfile:
    splitline = csv.reader(xfile, delimiter = ",")
    #skip the header row
    next(splitline)
    for line in splitline:
        #join the title of the review (column 0) with its content (column 3) so all the text can be analyzed as one
        joinline = line[0] + " " + line[3]
        #label reviews rated 40 or 50 (column 2) as good and those rated 10, 20, or 30 as bad
        if line[2] == "50" or line[2] == "40":
            labeled_reviews.append((joinline, 'Good'))
        else:
            labeled_reviews.append((joinline, 'Bad'))

In [None]:
#shuffle all the reviews up in case they are in some type of order
random.shuffle(labeled_reviews)

In [None]:
def review_features(description):
    '''Take the text of a review and return a dictionary with
    each cleaned word as a key and True as its value.'''

    split_review = description.split()                                        # split the review into words
    lowcase_review = [thing.lower() for thing in split_review]                # make every word lowercase
    clean_review = [re.sub(r'[^\w\s]', '', word) for word in lowcase_review]  # strip punctuation
    ret_val = {}
    for word in clean_review:                                                 # store each non-empty word as a feature
        if word:
            ret_val[word] = True
    return ret_val

#Test the function
review_features("Is this working correctly")

In [None]:
##run each review through the function and store the results in a variable called featuresets
featuresets = [(review_features(review), rating) for (review, rating) in labeled_reviews]
#split the reviews into a test set (the first 500) and a training set (the rest)
train_set, test_set = featuresets[500:], featuresets[:500]
#run the training set through the Naive Bayes classifier to train it
classifier = nltk.NaiveBayesClassifier.train(train_set)

In [None]:
#look at how accurate the classifier was when analyzing the test set of reviews that were left out
print(nltk.classify.accuracy(classifier, test_set))

In [None]:
#look at the most informative features determined by the classifier
classifier.show_most_informative_features(50)

In [None]:
#initialize a list to store all the reviews
all_reviews = []

#split each review into individual words, and store the words in the all_reviews list
for review, label in labeled_reviews :
    words = review.split() 
    for word in words :
        all_reviews.append(word)

In [None]:
##tokenize each word in the list of reviews
all_reviews = nltk.word_tokenize(" ".join(all_reviews))

In [None]:
##According to the NLTK documentation, nltk.Text is a wrapper around a sequence of tokens that lets you run
##functions like concordance. So I run it on the tokenized words from all the reviews and store the result under the same name as before
all_reviews = nltk.Text(all_reviews)

In [None]:
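#look at the context surrounding each word to get more information about what the review entailed
#("informativefeatureword" is a placeholder; swap in each feature of interest, e.g. "instructions")
all_reviews.concordance("informativefeatureword")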

After looking at the top 50 features and the context surrounding them, I pulled the 5 to 10 that I felt would
be most useful and informative from a business perspective for TOMIS to use in their work with clients.
Likely because of the low number of negative reviews, the classifier had low accuracy, and the context
for many of these words also includes reviews that are positive.
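
One way to check how lopsided the labels are is to count them and compare the classifier's accuracy with the score you would get by always guessing the more common label. The snippet below is a minimal sketch of that check. The helper name `class_balance` and the four sample reviews are made up for illustration; in the notebook you would pass in `labeled_reviews` instead.

```python
from collections import Counter

def class_balance(labeled_reviews):
    """Return the label counts, the most common label, and the accuracy
    of a baseline that always predicts that label."""
    counts = Counter(label for _, label in labeled_reviews)
    total = sum(counts.values())
    majority_label, majority_count = counts.most_common(1)[0]
    return counts, majority_label, majority_count / total

# Hypothetical sample data, not taken from escape_room.csv
sample = [
    ("great room", "Good"),
    ("so much fun", "Good"),
    ("locks were broken", "Bad"),
    ("loved it", "Good"),
]

counts, majority_label, baseline = class_balance(sample)
print(counts)
print(majority_label, baseline)
```

If the accuracy printed by `nltk.classify.accuracy` is not clearly above this baseline, the classifier is adding little beyond guessing the majority label.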

### Top Informative Features & Probabilities
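
Each ratio below compares how likely a word is to appear in a Bad review with how likely it is to appear in a Good review. The snippet below is a rough sketch of that calculation on made-up data. The helper name `likelihood_ratio`, the toy feature sets, and the add-0.5 smoothing are all assumptions for illustration, so it will not necessarily reproduce the numbers NLTK prints.

```python
from collections import Counter

def likelihood_ratio(featuresets, feature, label_a="Bad", label_b="Good", gamma=0.5):
    """Estimate P(feature present | label_a) / P(feature present | label_b).

    Uses add-gamma smoothing over two outcomes (present, absent) so that a
    word never seen with one label does not cause a division by zero.
    """
    totals = Counter(label for _, label in featuresets)
    present = Counter(label for feats, label in featuresets if feats.get(feature))

    def prob(label):
        return (present[label] + gamma) / (totals[label] + 2 * gamma)

    return prob(label_a) / prob(label_b)

# Hypothetical feature sets in the same shape review_features produces
toy = [
    ({"locks": True}, "Bad"),
    ({"locks": True, "fun": True}, "Bad"),
    ({"fun": True}, "Good"),
    ({"fun": True}, "Good"),
    ({"locks": True}, "Good"),
    ({"fun": True}, "Good"),
]

ratio = likelihood_ratio(toy, "locks")
print(round(ratio, 2))
```

Under this kind of estimate, two words with the same counts in each class get exactly the same ratio, and a word that shows up in only a handful of Bad reviews and no Good ones can still score very high.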

INSTRUCTIONS:
(instructions = True              Bad : Good   =     23.6 : 1.0)
    
tv was not working to provide us instructions . The rep just told us to start 
elpful in his explanation of the instructions Awesome experience I'be become s
m and I enjoyed this a lot . The instructions and guidance were a little vague
s fantastic , funny , clear with instructions and glad to help and answer any 
ob explaining and acting out the instructions then sent us off to escape . So 
ge amount of help with hints and instructions , ... Fun but way too complicate

LOCKS:
(locks = True              Bad : Good   =     23.6 : 1.0)

way If looking for keys or combos to locks is your thing this is it ! We were p
 done . Many rooms are just a lot of locks and keys , but this room actually to
to come back A lot of fun One of the locks for the boxes could not be unlocked 
it . We had an issue with one of the locks that a staff member had to come in .

WAITING:
(Bad : Good   =     23.6 : 1.0)

vague from the staff while we were waiting for our group ... Fun Escape Room T
let us play with some puzzles while waiting for our whole group to arrive . The
be a good idea . Also , we ended up waiting over thirty minutes for a couple ..

STORY:
(Bad : Good   =     10.8 : 1.0)

sloppy and poorly thought out . The story did not make sense , and many clues 
zzle we finished was okay . The back story did n't make any sense , and sometim

(SENSE: Bad : Good   =     16.9 : 1.0; this word appeared in the same comments as STORY)

UNDER:
(Bad : Good   =     14.2 : 1.0)

s , two 30+ year olds , and two kids under 12 ) , and even though we did n't su
ving it . Too hard really for anyone under 18 to be of much help . My only knoc
 other escape rooms that were either under developed or way to unrealistic ... 
d and figured out how to communicate under the pressure . This is highly recomm
d who are the ones that get stressed under pressure ... but it was an amazing t
ed with a family that had 2 children under the age of about 5 . It completely h
tly telling the children ... No kids under a certain age please ! We came so cl
icult challenge that we completed in under an hour . I only have two minor sugg

ZOMBIES:
(Bad : Good   =     23.6 : 1.0)

It would be cool if there were fake zombies trying to get you once you go into 
 've been better if their were real zombies trying to get in ) . Izzy was a gre
Almost all my family was there . No zombies was a let down though . We still ha
bie '' there was nothing related to zombies . Dirty and quickly became boring N

MONEY:
(Bad : Good   =     23.6 : 1.0)

me . I do n't think it was worth the money . not what I expected I had no clue 
n and challenging . It was worth the money and a fun date night activity . Izzy
oyed the experience ! Well worth the money . So much fun ! Our family really en
he room was awesome . Well worth the money ! They stayed later so we could have
and cheesy . I felt like I wasted my money and they really got over . I 'll nev
to try another room . Well worth the money paid . Great Trip Quite the challeng
h is perfect bc you want to get your money 's worth . We went with 5 people whi
e . I get people ... Did n't get our money 's worth , but it seems most do We a