# LDA Model Results
<p>Here we review the results of the various LDA models that were generated. The goal is to identify a model that has enough topics of interest while still providing significant delineation between the topics. Models are numbered Model 1 through Model 5, and descriptions of each are included below.</p>
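The notebook does not say how delineation between topics was judged. One hedged way to quantify it is the average pairwise Jaccard similarity of the topics' top-term lists, where lower values mean the topics share fewer terms. This is an illustration, not necessarily the method used to compare Models 1 through 5. The function names are hypothetical, and the sample terms are copied from the Model 5 output in Step 1.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard similarity between two collections of terms (0 = disjoint, 1 = identical)."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 0.0


def mean_topic_overlap(topic_terms):
    """Average pairwise Jaccard similarity across topics' top-term lists.

    topic_terms: dict mapping topic id -> list of top terms.
    Lower values suggest better-delineated topics.
    """
    pairs = list(combinations(topic_terms.values(), 2))
    if not pairs:
        return 0.0
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)


# Top terms for three Model 5 topics, copied from the Step 1 output
sample = {
    0: ['dessert', 'wine', 'ice', 'cream', 'cake', 'entree', 'meal', 'chocolate', 'course', 'appetizer'],
    7: ['wait', 'minute', 'time', 'food', 'get', 'order', 'long', 'line', 'hour', 'waiting'],
    9: ['order', 'ordered', 'called', 'delivery', 'extra', 'got', 'time', 'get', 'card', 'call'],
}
print(round(mean_topic_overlap(sample), 3))  # → 0.059
```

Only topics 7 and 9 overlap here (`time`, `get`, `order`), which is exactly the kind of shared vocabulary the `unique=True` option of the pretty printer is meant to expose.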

# Results
<p><b>Model 5</b>, which includes all reviews, provides the best results and will therefore be used to identify subtopics in review texts.</p>

## Step 0: Import packages

In [1]:
from IPython.display import display, HTML
display(HTML("<style>.container { width:95% !important; }</style>"))

In [2]:
from gensim.models.ldamulticore import LdaMulticore
import itertools
from collections import Counter

import pandas as pd
import numpy as np

___

### Pretty Printer Function

In [3]:
def print_topic_terms(model, num_topics=-1, num_words=10, unique=False, topics_of_interest=None):
    """Print the top terms of each topic in a gensim LDA model.

    unique=True drops terms that appear in more than one topic's top-term list.
    topics_of_interest, a dict of {label: topic_id}, restricts output to those topics.
    """
    # print_topics returns (topic_id, '0.030*"term" + ...') pairs;
    # splitting on '"' leaves the terms at the odd indices
    results = model.print_topics(num_topics=num_topics, num_words=num_words)
    parsed = [(topic, term_str.split('"')[1::2]) for topic, term_str in results]

    if unique:
        # count how many topics each term appears in, then keep only single-topic terms
        term_counts = Counter(itertools.chain.from_iterable(terms for _, terms in parsed))
        non_unique_terms = {term for term, count in term_counts.items() if count > 1}
        parsed = [(topic, [t for t in terms if t not in non_unique_terms]) for topic, terms in parsed]
        print('============================ Unique Terms Per Topic ===========================')
    else:
        print('=============================== Terms Per Topic ===============================')

    wanted = set(topics_of_interest.values()) if topics_of_interest else None
    for topic, terms in parsed:
        if wanted is None:
            print('{:>2}\t{}'.format(topic, terms))
        elif topic in wanted:
            print('{}\t{}'.format(topic, terms))
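The parsing trick in `print_topic_terms` relies on gensim's `print_topics` returning `(topic_id, string)` pairs in which each term is wrapped in double quotes, e.g. `'0.040*"wait" + 0.030*"minute"'`. A minimal stdlib sketch of that parse and of the `unique=True` filter, run on hand-written strings in the same format (the weights are made up and the helper names are hypothetical):

```python
import itertools
from collections import Counter


def parse_topic_string(term_str):
    """Extract the quoted terms from a '0.030*"term" + ...' topic string."""
    # Splitting on double quotes puts every term at an odd index
    return term_str.split('"')[1::2]


def unique_terms_per_topic(results):
    """Drop terms shared by more than one topic, mirroring unique=True."""
    parsed = [(topic, parse_topic_string(s)) for topic, s in results]
    counts = Counter(itertools.chain.from_iterable(terms for _, terms in parsed))
    return [(topic, [t for t in terms if counts[t] == 1]) for topic, terms in parsed]


# Hand-written strings in the print_topics format (illustrative weights)
results = [
    (7, '0.040*"wait" + 0.030*"minute" + 0.020*"time"'),
    (9, '0.050*"order" + 0.030*"delivery" + 0.020*"time"'),
]
print(unique_terms_per_topic(results))
# → [(7, ['wait', 'minute']), (9, ['order', 'delivery'])]
```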
            

___

## Step 1: Review Model 5 - All Reviews, All Tokens
<p>This model uses all reviews and limits the vocabulary to noun and verb tokens that are more common than the 10,000th most common noun or verb token.</p>

<ul>
    <li>Num Topics: 50</li>
    <li>Num Terms: 10</li>
    <li>Num Passes: 50</li>
    <li>Key Topics Identified: 1:Loyalty, 7:Wait Time, 8:Atmosphere, 9:Ordering, 13:Cleanliness, 15:Food Quality, 17:Customer Service, 26:Lunch Parking, 35:Price Value</li>
</ul>
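The preprocessing code is not shown in this notebook. As a hedged sketch of the vocabulary rule described above, assuming the reviews were already POS-tagged into `(token, tag)` pairs with universal-style `NOUN`/`VERB` tags (an assumption), the cutoff could be applied as follows. `restrict_vocabulary` is a hypothetical helper and `rank` stands in for the 10,000 threshold.

```python
from collections import Counter


def restrict_vocabulary(tagged_docs, rank=10000, keep_tags=('NOUN', 'VERB')):
    """Keep noun/verb tokens more frequent than the rank-th most common noun/verb token.

    tagged_docs: list of documents, each a list of (token, pos_tag) pairs.
    The tag names are assumptions; adapt them to the tagger actually used.
    """
    counts = Counter(tok for doc in tagged_docs for tok, tag in doc if tag in keep_tags)
    ranked = counts.most_common()
    if len(ranked) < rank:
        # Fewer than `rank` distinct tokens: the cutoff does not apply
        vocab = set(counts)
    else:
        cutoff = ranked[rank - 1][1]  # frequency of the rank-th most common token
        vocab = {tok for tok, c in counts.items() if c > cutoff}
    return [[tok for tok, tag in doc if tag in keep_tags and tok in vocab] for doc in tagged_docs]


docs = [
    [('food', 'NOUN'), ('wait', 'VERB'), ('the', 'DET')],
    [('food', 'NOUN'), ('order', 'VERB'), ('wait', 'VERB')],
    [('food', 'NOUN')],
]
print(restrict_vocabulary(docs, rank=2))  # → [['food'], ['food'], ['food']]
```

With `rank=2`, only `food` survives because it is the only token strictly more frequent than the second most common token (`wait`). Using a strict `>` means tokens tied with the cutoff token are dropped as well.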

In [4]:
model_05 = LdaMulticore.load('../models/ldam_all_restaurants_50_topics_10_terms_50_passes.model')

In [5]:
print_topic_terms(model_05, num_topics=-1, num_words=10, unique=False)

 0	['dessert', 'wine', 'ice', 'cream', 'cake', 'entree', 'meal', 'chocolate', 'course', 'appetizer']
 1	['time', 'back', 'first', 'try', 'place', 'went', 'definitely', 'go', 'next', 'great']
 2	['u', 'table', 'came', 'server', 'food', 'drink', 'waitress', 'asked', 'ordered', 'minute']
 3	['dish', 'flavor', 'sauce', 'like', 'taste', 'menu', 'one', 'would', 'bit', 'meat']
 4	['pizza', 'crust', 'slice', 'topping', 'cheese', 'pie', 'thin', 'good', 'sauce', 'pepperoni']
 5	['crab', 'leg', 'shell', 'pound', 'coworker', 'saving', 'panini', 'hub', 'e', 'angry']
 6	['chicken', 'rice', 'chinese', 'fried', 'food', 'beef', 'soup', 'egg', 'orange', 'sour']
 7	['wait', 'minute', 'time', 'food', 'get', 'order', 'long', 'line', 'hour', 'waiting']
 8	['great', 'nice', 'patio', 'atmosphere', 'outside', 'place', 'cool', 'fun', 'inside', 'food']
 9	['order', 'ordered', 'called', 'delivery', 'extra', 'got', 'time', 'get', 'card', 'call']
10	['tempe', 'school', 'opening', 'mill', 'w', 'college', 'b', 'asu',

## Step 2: Assign labels to interesting topics
<p>The goal here is to inspect qualities and attributes of the restaurant, not what is on the menu. Many of the identified topics contain highly specific menu categories; this information is useful to set aside from the other subtopics.</p>
<p>In another pass, these menu topics could be used to double-check the cuisine categories assigned to each restaurant.</p>
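A minimal sketch of that cuisine check, under loud assumptions: the topic-to-cuisine mapping is hypothetical (read off the Step 1 output, where topic 4 holds pizza terms and topic 6 holds Chinese-food terms), and each restaurant's reviews are assumed to have already been reduced to a list of topic ids, for example with per-word topic counting like that in Step 6.

```python
from collections import Counter

# Hypothetical mapping from menu-specific topic ids to cuisines (from Step 1 output)
MENU_TOPIC_CUISINE = {4: 'Pizza', 6: 'Chinese'}


def dominant_cuisine(review_topic_ids, menu_topic_cuisine=MENU_TOPIC_CUISINE):
    """Return the cuisine whose menu topics appear most often, or None if none appear."""
    counts = Counter(menu_topic_cuisine[t] for t in review_topic_ids if t in menu_topic_cuisine)
    return counts.most_common(1)[0][0] if counts else None


def flag_mismatch(assigned_cuisine, review_topic_ids):
    """True when the topic-based guess disagrees with the assigned cuisine category."""
    guess = dominant_cuisine(review_topic_ids)
    return guess is not None and guess != assigned_cuisine


print(dominant_cuisine([4, 4, 6, 7]))       # Pizza
print(flag_mismatch('Chinese', [4, 4, 6]))  # True
```

Restaurants whose reviews hit no menu topics return `None` and are never flagged, so the check only speaks to restaurants with clear menu signal.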

In [6]:
topics_of_interest = {'retention_1': 1,
                      'food_quality_3': 3,
                      'wait_time_7': 7,
                      'atmosphere_8': 8,
                      'ordering_9': 9,
                      'cleanliness_13' : 13,
                      'menu_options_19' : 19,
                      'food_quality_20': 20,
                      'food_quality_21': 21,
                      'customer_service_27' : 27,
                      'customer_Service_44': 44,
                      'value_35': 35}
topic_ids = list(topics_of_interest.values())

## Step 3: Inspect Topics of Interest

In [7]:
print_topic_terms(model_05, num_topics=-1, num_words=10, unique=False, topics_of_interest=topics_of_interest)

1	['time', 'back', 'first', 'try', 'place', 'went', 'definitely', 'go', 'next', 'great']
3	['dish', 'flavor', 'sauce', 'like', 'taste', 'menu', 'one', 'would', 'bit', 'meat']
7	['wait', 'minute', 'time', 'food', 'get', 'order', 'long', 'line', 'hour', 'waiting']
8	['great', 'nice', 'patio', 'atmosphere', 'outside', 'place', 'cool', 'fun', 'inside', 'food']
9	['order', 'ordered', 'called', 'delivery', 'extra', 'got', 'time', 'get', 'card', 'call']
13	['table', 'dirty', 'clean', 'bathroom', 'floor', 'plate', 'hand', 'cup', 'paper', 'chair']
19	['option', 'menu', 'free', 'gyro', 'meat', 'vegetarian', 'veggie', 'choose', 'vegan', 'choice']
20	['food', 'like', 'place', 'ordered', 'tasted', 'bad', 'even', 'back', 'cold', 'taste']
21	['good', 'food', 'place', 'price', 'pretty', 'service', 'better', 'like', 'would', 'really']
27	['great', 'food', 'service', 'place', 'friendly', 'good', 'recommend', 'staff', 'delicious', 'price']
35	['good', 'really', 'got', 'ordered', 'little', 'nice', 'pretty

## Step 4: Assigning Topics to Reviews

### Step 4a: Load Review Data and Restaurant Business Data

In [8]:
reviews = pd.read_csv('../clean_data/az_restaurant_reviews.csv', index_col=0, parse_dates=['date'])

biz = pd.read_csv('../clean_data/az_restaurant_business_clean.csv', index_col=0)
biz = biz.iloc[:,:9].copy()



### Step 4b: Merge Restaurant Name to Reviews

In [9]:
review_df = reviews.merge(biz[['name', 'business_id']], on='business_id', how='left')
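`how='left'` keeps every review even if its `business_id` is missing from `biz`, but duplicate `business_id` values in `biz` would silently duplicate review rows; pandas' `merge` accepts `validate='many_to_one'` to guard against that. A stdlib sketch of the same left-join semantics plus the uniqueness check, with rows as dicts (`attach_names` is a hypothetical helper):

```python
def attach_names(reviews, businesses):
    """Left-join restaurant names onto reviews by business_id.

    reviews: list of dicts with a 'business_id' key.
    businesses: list of dicts with 'business_id' and 'name' keys.
    """
    ids = [b['business_id'] for b in businesses]
    if len(ids) != len(set(ids)):
        # A pandas merge would duplicate review rows for repeated keys
        raise ValueError('business_id is not unique in the business table')
    names = {b['business_id']: b['name'] for b in businesses}
    # Like how='left': every review is kept; unmatched ids get None (pandas uses NaN)
    return [{**r, 'name': names.get(r['business_id'])} for r in reviews]


print(attach_names([{'business_id': 'a'}, {'business_id': 'z'}],
                   [{'business_id': 'a', 'name': 'Pita Jungle'}]))
```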

In [10]:
review_df.head(3)

|   | business_id | cool | date | funny | review_id | stars | text | useful | user_id | is_fast_food | review_len | name |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | JlNeaOymdVbE6_bubqjohg | 0 | 2014-08-09 | 0.0 | BF0ANB54sc_f-3_howQBCg | 1.0 | we always go to the chevo's in chandler which ... | 3.0 | ssuXFjkH4neiBgwv-oN4IA | 0.0 | 422.0 | Papa Chevo's Taco Shop |
| 1 | 0Rni7ocMC_Lg2UH0lDeKMQ | 0 | 2014-08-09 | 0.0 | DbLUpPT61ykLTakknCF9CQ | 1.0 | this place is always so dirty and grimy been t... | 6.0 | ssuXFjkH4neiBgwv-oN4IA | 0.0 | 111.0 | Barro's Pizza |
| 2 | S-oLPRdhlyL5HAknBKTUcQ | 0 | 2017-11-30 | 0.0 | z_mVLygzPn8uHp63SSCErw | 4.0 | holy portion sizes! you get a lot of bang for ... | 0.0 | MzEnYCyZlRYQRISNMXTWIg | 0.0 | 130.0 | Harumi Sushi |


## Step 5: Examples - Extract Reviews from the Most Reviewed Fast-Food and Non-Fast-Food Restaurants

In [11]:
nff_review_counter = Counter(review_df[review_df.is_fast_food == 0].name.values).most_common(5)
nff_most_reviewed = nff_review_counter[0][0]
print(nff_review_counter)

[('Pita Jungle', 4266), ("Oregano's Pizza Bistro", 3926), ("Lo-Lo's Chicken & Waffles", 2730), ('Pizzeria Bianco', 2499), ("Matt's Big Breakfast", 2416)]


In [12]:
pita_jungle_reviews = review_df[review_df.name == nff_most_reviewed].copy()
pita_jungle_reviews.head(3)

|   | business_id | cool | date | funny | review_id | stars | text | useful | user_id | is_fast_food | review_len | name |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 32 | XvTBQotmJpVPjPNaMM7qLA | 5 | 2007-09-26 | 6.0 | P5T_vC327RVB3m8V405Oeg | 3.0 | i enjoy the juicy succulent taste of murder bu... | 5.0 | dyhTHLIf6eWBvU78Y3T06A | 0.0 | 3788.0 | Pita Jungle |
| 72 | ept9mIlqGIsemr6w0RczaA | 1 | 2011-04-29 | 0.0 | ojC-NdzaX_sOxn6znQIEAA | 3.0 | first rule of brunch club is that if you're go... | 1.0 | 5V8eXkTJb6IejJkMDaj_Bw | 0.0 | 722.0 | Pita Jungle |
| 182 | XvTBQotmJpVPjPNaMM7qLA | 1 | 2007-02-27 | 0.0 | 07z7AKpnIsjxquIcfmfRNA | 3.0 | i will only add to the many reviews saying tha... | 1.0 | qYxGJKlYrqNgodzMWHaaGw | 0.0 | 466.0 | Pita Jungle |


In [13]:
ff_review_counter = Counter(review_df[review_df.is_fast_food == 1].name.values).most_common(5)
ff_most_reviewed = ff_review_counter[0][0]
print(ff_review_counter)

[('Chipotle Mexican Grill', 2920), ("McDonald's", 2833), ('In-N-Out Burger', 2151), ('Pei Wei', 1913), ('Subway', 1810)]
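The two `Counter(...).most_common(5)` cells differ only in the `is_fast_food` flag. A hedged stdlib sketch of a shared helper, assuming one `(name, is_fast_food)` pair per review; `most_reviewed` and the toy records are hypothetical:

```python
from collections import Counter


def most_reviewed(records, is_fast_food, top_n=5):
    """Return the top_n (name, review_count) pairs for one restaurant group.

    records: iterable of (restaurant_name, is_fast_food_flag) pairs, one per review.
    """
    return Counter(name for name, flag in records if flag == is_fast_food).most_common(top_n)


records = [
    ('Pita Jungle', 0), ('Subway', 1), ("McDonald's", 1), ('Pita Jungle', 0),
    ('Subway', 1), ("McDonald's", 1), ('Subway', 1),
]
print(most_reviewed(records, 1))                # [('Subway', 3), ("McDonald's", 2)]
print(most_reviewed(records, 0, top_n=1)[0][0])  # Pita Jungle
```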


In [14]:
chipotle_reviews = review_df[review_df.name == ff_most_reviewed].copy()
chipotle_reviews.head(3)

|   | business_id | cool | date | funny | review_id | stars | text | useful | user_id | is_fast_food | review_len | name |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 140 | uovqgCcWIqTwUeH_A54t2A | 0 | 2014-05-15 | 0.0 | VSVXCLsGO_MduBX4nfE6dw | 2.0 | this location is not as stellar as others. the... | 1.0 | Jt4u7qnfrk35buainfOuGA | 1.0 | 213.0 | Chipotle Mexican Grill |
| 177 | uovqgCcWIqTwUeH_A54t2A | 1 | 2008-05-10 | 1.0 | -niC6oq1n697C-dIDWaOnA | 4.0 | i have always been a fan of chipotle's food bu... | 2.0 | qYxGJKlYrqNgodzMWHaaGw | 1.0 | 1177.0 | Chipotle Mexican Grill |
| 213 | wAXYLmHuysYTz8i4VPKmaQ | 0 | 2011-02-02 | 0.0 | 5MAbf8n0niuIRU1P4rdhZw | 4.0 | everyone already knows how good chipotle is bu... | 0.0 | 4I_woZLXCO9jaVZvDi18CA | 1.0 | 266.0 | Chipotle Mexican Grill |


## Step 6: Print the Most Frequent Subtopics Identified in a Given Review

In [15]:
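def print_top_n_review_topics(model, review, n_topics=5, valid_topics=None):
    """Print a review and the labels of its most frequent topics.

    Each word is looked up with model.get_term_topics(); topic ids are counted across
    the review, the n_topics most common are kept, and then (optionally) pruned to
    valid_topics, so fewer than n_topics labels may be printed.
    """
    review_topic_categories = []
    for word in review.split(' '):
        try:
            # (topic_id, probability) pairs for this word
            r = model.get_term_topics(word_id=word)
        except (KeyError, IndexError):
            # words missing from the model's dictionary (including tokens with
            # attached punctuation) raise here; skip them
            continue
        review_topic_categories.extend(x[0] for x in r)

    # count occurrences of each identified topic
    topic_counter = Counter(review_topic_categories)
    top_n_topics = [x[0] for x in topic_counter.most_common(n_topics)]

    if valid_topics:
        # prune to only topics we care about
        topics = [topic for topic in top_n_topics if topic in valid_topics]
    else:
        topics = top_n_topics

    # map topic ids back to the labels in topics_of_interest (Step 2);
    # fall back to the raw id for unlabeled topics instead of raising ValueError
    id_to_label = {topic_id: label for label, topic_id in topics_of_interest.items()}

    print('Review Text:\n\t{}'.format(review.replace('\n', ' ')))
    print('Topics Identified:')
    for n in topics:
        print('\t{}'.format(id_to_label.get(n, 'topic_{}'.format(n))))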
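The scoring in `print_top_n_review_topics` can be sketched without a trained model by replacing `model.get_term_topics` with a hypothetical dictionary of `word -> [(topic_id, probability), ...]`. The toy probabilities are invented for illustration, and returning ids instead of printing makes the logic easy to check.

```python
from collections import Counter


def top_review_topics(review, term_topics, n_topics=5, valid_topics=None):
    """Return the n most frequent topic ids across a review's words, optionally pruned.

    term_topics: dict mapping word -> list of (topic_id, probability) pairs,
    a stand-in for model.get_term_topics(word).
    """
    topic_ids = []
    for word in review.split():
        # Words missing from the vocabulary contribute no topics
        topic_ids.extend(topic for topic, _ in term_topics.get(word, []))
    top_n = [topic for topic, _ in Counter(topic_ids).most_common(n_topics)]
    if valid_topics:
        # Pruning happens after the top-n cut, so fewer than n topics may remain
        top_n = [t for t in top_n if t in valid_topics]
    return top_n


# Toy term -> topic table (hypothetical probabilities)
term_topics = {
    'wait': [(7, 0.12)],
    'minute': [(7, 0.08), (2, 0.05)],
    'order': [(9, 0.10), (7, 0.03)],
    'patio': [(8, 0.09)],
}
print(top_review_topics('long wait minute order patio', term_topics,
                        n_topics=2, valid_topics=[7, 8, 9]))  # → [7]
```

Topic 2 makes the top two (ties keep first-seen order) but is not in `valid_topics`, so only `[7]` comes back. Pruning after truncation is one reason some reviews in Step 6 list fewer than five topics.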

### Step 6a: Non-Fast-Food Sample Reviews

In [16]:
# text of the 2nd-5th matching reviews; 3-star reviews fall in neither group
nff_bad_reviews = pita_jungle_reviews[pita_jungle_reviews.stars < 3]['text'].iloc[1:5]
nff_good_reviews = pita_jungle_reviews[pita_jungle_reviews.stars > 3]['text'].iloc[1:5]
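The star thresholds `< 3` and `> 3` leave 3-star reviews out of both samples, and `.iloc[1:5]` skips the first match and keeps the next four. A stdlib sketch of the same selection on `(stars, text)` pairs; `sample_by_stars` is a hypothetical helper:

```python
def sample_by_stars(reviews, start=1, stop=5):
    """Split (stars, text) pairs into bad (< 3 stars) and good (> 3 stars) samples.

    start/stop mirror .iloc[1:5]: skip the first match and keep the next four.
    3-star reviews are treated as neutral and land in neither group.
    """
    bad = [text for stars, text in reviews if stars < 3][start:stop]
    good = [text for stars, text in reviews if stars > 3][start:stop]
    return bad, good


reviews = [(1.0, 'a'), (2.0, 'b'), (3.0, 'c'), (5.0, 'd'), (4.0, 'e')]
print(sample_by_stars(reviews, start=0, stop=2))  # (['a', 'b'], ['d', 'e'])
```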

In [17]:
for rev in nff_bad_reviews:
    print_top_n_review_topics(model_05, rev, n_topics=5, valid_topics = topic_ids)
    print('='*80)

Review Text:
	very disappointed in the staff here due to the gossip aloud mocking others included  a manager with blonde hair arguing with a worker infront of customers. the hummus with chicken came over cooked with hard to chew pieces.  he had sharma sandwhich.  well to some it up we won't be coming back to this pita j. shea location blows this one away for food and service!.
Topics Identified:
	customer_Service_44
	customer_service_27
Review Text:
	i am so disappointed. i never have any problems with this location as far as service and the quality of my food but today my experience was less than stellar. i usually order the lentil fatoosh salad but today i decided to order the wood fired salmon and boy was that a mistake. first of all my food took forever to come i waited 40 minutes. once my food arrived my salmon was not well done as requested and the portion size was super small considering the price of $15.99. it was literally maybe 2 big tablespoons of mashed potatoes and a sprin

In [18]:
for rev in nff_good_reviews:
    print_top_n_review_topics(model_05, rev, n_topics=5, valid_topics = topic_ids)
    print('='*80)

Review Text:
	mmmmmm vegetarian heaven!  so many choices i never know what to do!  this visit i tried the cilantro & jalapeno hummus which was delicious and the greek salad.  always yummy!  service is always a little slow but i like to call it more laid back....just enjoy yourself!
Topics Identified:
	retention_1
	menu_options_19
	food_quality_20
Review Text:
	love it love it love it! i have been eating at various pita jungles around the valley since i moved to az over 6 years ago and i have yet to have anything that i didn't love. my favorites include the black bean burger lentil fetoosh salad all three hummus types dolmas and gazpacho but everything is delicious so you really can't go wrong! the tempe location also has breakfast which is amazing!   each location is fairly small so luch and dinner may require a short wait but its worth it. the prices are fair and the portions are huge! don't be afraid to split a dish with a loved one.
Topics Identified:
	atmosphere_8
	retention_1
	val

### Step 6b: Fast-Food Sample Reviews

In [19]:
# text of the 2nd-5th matching reviews; 3-star reviews fall in neither group
ff_bad_reviews = chipotle_reviews[chipotle_reviews.stars < 3]['text'].iloc[1:5]
ff_good_reviews = chipotle_reviews[chipotle_reviews.stars > 3]['text'].iloc[1:5]

In [20]:
for rev in ff_bad_reviews:
    print_top_n_review_topics(model_05, rev, n_topics=5, valid_topics = topic_ids)
    print('='*80)

Review Text:
	i'm not sure how a mexican'ish place runs out of white rice when it's a main staple but they did. this location is hit or miss. if i were you i'd miss it and head down the road.
Topics Identified:
	retention_1
	atmosphere_8
	food_quality_20
Review Text:
	why wait 20 minutes to repeat your order 3 times and then eat a lukewarm bowl of cat food?  not a fan but one extra star for the hot sauce because it's actually hot.  skip the steak...
Topics Identified:
	wait_time_7
	ordering_9
	food_quality_3
Review Text:
	*sniff* i can't believe i was made to cry (on the inside) no less on new year's eve. my hubby and i came here thinking we would split a burrito since i know how big the burritos can be at chipotle.    we waited in line. i watched the two guys in front of me each get 2 tortillas each for their burrito and the girl behind the line really piled on the rice beans and their choice of meat. i was thinking wow these burritos are even bigger than what i remembered.   then whe

In [21]:
for rev in ff_good_reviews:
    print_top_n_review_topics(model_05, rev, n_topics=5, valid_topics = topic_ids)
    print('='*80)

Review Text:
	everyone already knows how good chipotle is but this location has always given us good service they are pretty speedy. they screwed up on on line order we placed once and they fixed it right away and gave a free drink and extra chips really cool we weren't even mad.
Topics Identified:
	food_quality_21
	value_35
	customer_service_27
Review Text:
	this chipotle has the best customer service.  food is always super fresh and tastes delicious. best location!!
Topics Identified:
	customer_service_27
	customer_Service_44
Review Text:
	i like everything about chipotle. i prefer a bowl with easy brown rice heavy fajita veggies strained brown beans chicken two scoops of strained pico de gallo and cheese. i can usually only eat half. i chill the rest and eat it later. i can't decide it i like it better hot or cold. i guess i'll have to come back and eat more to figure it out.
Topics Identified:
	food_quality_20
	food_quality_21
	value_35
	menu_options_19
Review Text:
	stopped by for