Now that our process has been validated in previous files, it's time to turn that code into functions that can be applied to any business on the website.

Again, to keep the workload manageable on our VM, we limit the analysis to the reviews in the cafe_reviews DataFrame.

In [47]:
import pandas as pd
import nltk
from nltk.corpus import stopwords, words
from nltk.tokenize.casual import TweetTokenizer
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import NMF
from textblob import TextBlob

# One-time downloads of the NLTK corpora used below, if they are not installed yet:
# nltk.download("stopwords"); nltk.download("words")

reviews = pd.read_parquet("cafe_reviews.parquet")
business = pd.read_parquet("business.parquet")

In [79]:
reviews.info()

<class 'pandas.core.frame.DataFrame'>
Index: 978 entries, 26696 to 6986107
Data columns (total 10 columns):
 #   Column         Non-Null Count  Dtype         
---  ------         --------------  -----         
 0   review_id      978 non-null    object        
 1   business_id    978 non-null    object        
 2   user_id        978 non-null    object        
 3   stars          978 non-null    int64         
 4   text           978 non-null    object        
 5   date           978 non-null    datetime64[ns]
 6   name           978 non-null    object        
 7   review_count   978 non-null    float64       
 8   yelping_since  978 non-null    object        
 9   cafe_review    978 non-null    bool          
dtypes: bool(1), datetime64[ns](1), float64(1), int64(1), object(6)
memory usage: 77.4+ KB


In [90]:
def topic_model(tfidf_df, num_topics=4):
    # Factor the TF-IDF matrix into W (review-by-topic weights) and H (topic-by-term weights).
    nmf_model = NMF(n_components=num_topics, random_state=42, max_iter=500)
    W = nmf_model.fit_transform(tfidf_df)
    H = nmf_model.components_
    return W, H, nmf_model
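To make W and H concrete, here is a minimal pure-Python sketch of non-negative matrix factorization on a hypothetical 4-review, 4-term matrix. Note that it uses Lee-Seung multiplicative updates (what scikit-learn exposes as `solver="mu"`), whereas `NMF` defaults to a coordinate-descent solver; both minimize the same Frobenius-norm objective. The function name `nmf_multiplicative` and the toy matrix are illustrative assumptions, not part of the notebook.

```python
import random

def matmul(A, B):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

def nmf_multiplicative(X, k, n_iter=500, seed=42, eps=1e-10):
    """Hypothetical helper: factor non-negative X (n x m) into W (n x k) and H (k x m).

    Multiplicative updates keep every entry non-negative because each step
    multiplies by a ratio of non-negative quantities.
    """
    rng = random.Random(seed)
    n, m = len(X), len(X[0])
    W = [[rng.random() for _ in range(k)] for _ in range(n)]
    H = [[rng.random() for _ in range(m)] for _ in range(k)]
    for _ in range(n_iter):
        # H <- H * (W^T X) / (W^T W H)
        Wt = transpose(W)
        num = matmul(Wt, X)
        den = matmul(matmul(Wt, W), H)
        H = [[h * nu / (d + eps) for h, nu, d in zip(hr, nr, dr)]
             for hr, nr, dr in zip(H, num, den)]
        # W <- W * (X H^T) / (W H H^T)
        Ht = transpose(H)
        num = matmul(X, Ht)
        den = matmul(W, matmul(H, Ht))
        W = [[w * nu / (d + eps) for w, nu, d in zip(wr, nr, dr)]
             for wr, nr, dr in zip(W, num, den)]
    return W, H

# Toy data: reviews 0-1 use terms 0-1, reviews 2-3 use terms 2-3.
X = [[1, 1, 0, 0],
     [2, 2, 0, 0],
     [0, 0, 1, 1],
     [0, 0, 3, 3]]
W, H = nmf_multiplicative(X, 2)
# Dominant topic per review, like W.argmax(axis=1) in the notebook.
dominant = [max(range(2), key=lambda t: row[t]) for row in W]
print(dominant)
```

Reviews that share vocabulary end up with the same dominant topic, which is exactly how `review_prep["dominant_topic"]` is assigned from W.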

In [91]:
def sent_analysis(relevant_reviews):
    # TextBlob polarity ranges from -1.0 (negative) to 1.0 (positive).
    relevant_reviews["sentiment"] = relevant_reviews["clean_text"].apply(
        lambda review: TextBlob(review).sentiment.polarity)
    # Cut-offs: > 0.1 is Positive, < 0.05 is Negative, anything in between is Neutral.
    relevant_reviews["Sentiment_Category"] = relevant_reviews["sentiment"].apply(
        lambda polarity: "Positive" if polarity > 0.1 else ("Negative" if polarity < 0.05 else "Neutral"))
    return relevant_reviews
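The cut-offs in `sent_analysis` are asymmetric, which is easy to miss. This small sketch isolates the same rule as a plain function (the name `sentiment_category` is hypothetical) so its consequences are visible: only polarities from 0.05 to 0.1 count as Neutral, and a polarity of exactly 0.0, which TextBlob typically returns when it finds no opinion words, is labelled Negative.

```python
def sentiment_category(polarity):
    """Same thresholds as sent_analysis: > 0.1 Positive, < 0.05 Negative, else Neutral."""
    if polarity > 0.1:
        return "Positive"
    if polarity < 0.05:
        return "Negative"
    return "Neutral"

for p in (-0.4, 0.0, 0.07, 0.1, 0.25):
    print(p, sentiment_category(p))
# -0.4 Negative
# 0.0 Negative
# 0.07 Neutral
# 0.1 Neutral
# 0.25 Positive
```

A symmetric band around zero, for example Negative below -0.05, would treat opinion-free reviews as Neutral instead; whether that is preferable depends on how the "mostly negative" summaries are meant to be read.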

In [92]:
def display_nmf_topics(model, feature_names, num_words=10):
    for topic_idx, topic in enumerate(model.components_):
        top_words = [feature_names[i] for i in topic.argsort()[:-num_words - 1:-1]]
        print(f"Topic {topic_idx + 1}: {', '.join(top_words)}")
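The slice `topic.argsort()[:-num_words - 1:-1]` in `display_nmf_topics` is compact but cryptic: `argsort` orders term indices by ascending weight, and the reversed slice walks back from the end to take the `num_words` largest. A minimal pure-Python equivalent (the helper name `top_words` and the toy weights are assumptions) makes the intent explicit:

```python
def top_words(topic_weights, feature_names, num_words=10):
    """Return the num_words highest-weighted terms, largest weight first.

    Matches [feature_names[i] for i in topic.argsort()[:-num_words - 1:-1]],
    up to the ordering of tied weights.
    """
    order = sorted(range(len(topic_weights)), key=lambda i: topic_weights[i], reverse=True)
    return [feature_names[i] for i in order[:num_words]]

print(top_words([0.1, 0.9, 0.0, 0.5], ["bed", "coffee", "canal", "staff"], num_words=2))
# ['coffee', 'staff']
```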

In [96]:
english_words = set(words.words())
stop_words = set(stopwords.words('english'))

def get_relevant_review(business_id, preference):
    # Note: this reads the global `reviews` DataFrame, so reassigning `reviews`
    # changes which data the function searches.
    business_reviews = reviews[reviews["business_id"] == business_id]
    if business_reviews.empty:
        print("No reviews found for this business.")
        return

    # tokenizing preference statement and business reviews
    business_reviews = business_reviews.copy()
    business_reviews["lower_text"] = business_reviews["text"].str.lower()
    review_prep = business_reviews[["lower_text"]].copy()
    casual_tknzr = TweetTokenizer(reduce_len=True)
    review_tokenized = [casual_tknzr.tokenize(t) for t in business_reviews["text"]]
    pref_tokens = casual_tknzr.tokenize(preference.lower())
    # keep only English dictionary words that are not stopwords
    token_txt_filtered = [[word for word in sentence
                           if word.lower() not in stop_words and word.lower() in english_words]
                          for sentence in review_tokenized]

    # vectorization: fit TF-IDF on the reviews, then map the preference into the same space
    review_prep["clean_text"] = [" ".join(sentence) for sentence in token_txt_filtered]
    tfidf_model = TfidfVectorizer()
    df_tfidf_transformed = tfidf_model.fit_transform(review_prep["clean_text"])
    pref_trans = tfidf_model.transform([" ".join(pref_tokens)])

    # topic modeling + matching preference to appropriate topic
    W, H, nmf_model = topic_model(df_tfidf_transformed)
    print("Top words per topic in NMF:")    # printed to check that the topics make sense
    display_nmf_topics(nmf_model, tfidf_model.get_feature_names_out())
    review_prep["dominant_topic"] = W.argmax(axis=1)
    pref_topics = nmf_model.transform(pref_trans)
    # if the preference shares no words with the review vocabulary, all topic weights
    # should be 0 and argmax() then falls back to the first topic
    most_relevant_topic = pref_topics.argmax()
    relevant_reviews = review_prep[review_prep["dominant_topic"] == most_relevant_topic]

    # sentiment analysis
    relevant_reviews_sent = sent_analysis(relevant_reviews.copy())
    sentiment_ratio = relevant_reviews_sent["Sentiment_Category"].value_counts()
    dominant_sentiment = sentiment_ratio.idxmax().lower()
    print(f"Reviews most related to your profile preference ({most_relevant_topic + 1}) for this business are mostly {dominant_sentiment}.")

    # presenting the most relevant review: filter by stars to match the dominant sentiment,
    # keep the 10 rows with the highest review_count, then take the most recent of those
    needed_text = business_reviews[business_reviews["lower_text"].isin(relevant_reviews_sent["lower_text"])]
    if dominant_sentiment == "positive":
        matching = needed_text[needed_text["stars"] > 2]
    elif dominant_sentiment == "negative":
        matching = needed_text[needed_text["stars"] < 3]
    else:
        matching = needed_text[(needed_text["stars"] > 1) & (needed_text["stars"] < 4)]
    given_reviews = matching.sort_values("review_count", ascending=False).iloc[0:10]
    if given_reviews.empty:
        print("\nNo review matches the dominant sentiment for this topic.")
        return
    # the neutral case previously took the most recent row of needed_text instead of given_reviews
    most_recent_review = given_reviews[given_reviews["date"] == given_reviews["date"].max()]
    the_review = business_reviews.loc[business_reviews["lower_text"].isin(most_recent_review["lower_text"]), "text"]
    print('\nHere is the most relevant review for you: \n \n' + the_review.iloc[0])
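The vectorization step fits TF-IDF on the cleaned reviews and then projects the preference sentence into that same vocabulary. The sketch below reimplements, in pure Python, what we understand `TfidfVectorizer`'s defaults to be: lowercasing, the default token pattern (tokens of two or more word characters), smoothed IDF `ln((1 + n) / (1 + df)) + 1`, raw counts as term frequency, and L2 normalization. The helper names and the three toy documents are hypothetical.

```python
import math
import re
from collections import Counter

TOKEN = re.compile(r"(?u)\b\w\w+\b")  # TfidfVectorizer's default token_pattern

def fit_idf(docs):
    """Smoothed IDF per term: ln((1 + n) / (1 + df)) + 1."""
    n = len(docs)
    df = Counter()
    for doc in docs:
        df.update(set(TOKEN.findall(doc.lower())))
    return {term: math.log((1 + n) / (1 + d)) + 1 for term, d in df.items()}

def tfidf_vector(text, idf):
    """Count * IDF, L2-normalized; words outside the fitted vocabulary are dropped."""
    counts = Counter(t for t in TOKEN.findall(text.lower()) if t in idf)
    vec = {t: c * idf[t] for t, c in counts.items()}
    norm = math.sqrt(sum(v * v for v in vec.values()))
    return {t: v / norm for t, v in vec.items()} if norm else {}

docs = ["good coffee good location", "drink order milk", "canal street seating"]
idf = fit_idf(docs)
print(tfidf_vector("I prefer a good location.", idf))  # only 'good' and 'location' survive
print(tfidf_vector("I want tasty drinks.", idf))       # {} : no overlap with this toy vocabulary
```

Only preference words that also appear in the cleaned reviews carry any signal. Since no stemming is applied, "drinks" and "drink" are different terms; in this toy vocabulary the second preference becomes an empty (all-zero) vector.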

In [100]:
get_relevant_review("5qVG7T82jRnKGR9O7_0cFw", "I prefer a good location.")

Top words per topic in NMF:
Topic 1: coffee, good, day, location, pretty, staff, though, busy, even, best
Topic 2: drink, order, time, get, said, milk, always, go, ever, went
Topic 3: great, service, super, people, slow, friendly, everyone, fantastic, sometimes, rude
Topic 4: canal, place, one, street, st, like, seating, clean, new, location
Reviews most related to your profile preference (1) for this business are mostly positive.

Here is the most relevant review for you: 
 
We stopped in two times to grab tea/water and use the bathroom. The staff is very nice and the store is clean and well stocked. The women's restroom was down and had to use the men's which needed a bit of cleaning. Good location for a pop up wedding parade as well as we saw one each day.
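The review shown above is chosen in three steps: keep reviews whose star rating agrees with the dominant sentiment, keep the 10 rows with the highest `review_count`, then return the most recent one. Here is a minimal sketch of that rule on plain dictionaries standing in for DataFrame rows. The records are made up, `pick_review` is a hypothetical name, and we assume `review_count` is the reviewer's total number of reviews (it arrives via the users table).

```python
from datetime import date

# Star-rating filters for each dominant sentiment, as in get_relevant_review.
STAR_FILTERS = {
    "positive": lambda stars: stars > 2,
    "negative": lambda stars: stars < 3,
    "neutral": lambda stars: 1 < stars < 4,
}

def pick_review(candidates, dominant_sentiment, top_n=10):
    """Filter by stars, keep the top_n by review_count, return the most recent text (or None)."""
    keep = STAR_FILTERS[dominant_sentiment]
    matching = [r for r in candidates if keep(r["stars"])]
    experienced = sorted(matching, key=lambda r: r["review_count"], reverse=True)[:top_n]
    if not experienced:
        return None
    return max(experienced, key=lambda r: r["date"])["text"]

toy_reviews = [
    {"text": "A", "stars": 5, "review_count": 300, "date": date(2019, 1, 5)},
    {"text": "B", "stars": 4, "review_count": 12, "date": date(2021, 6, 1)},
    {"text": "C", "stars": 1, "review_count": 80, "date": date(2022, 3, 9)},
]
print(pick_review(toy_reviews, "positive"))           # B: most recent of A and B
print(pick_review(toy_reviews, "positive", top_n=1))  # A: only the top reviewer survives
```

The `top_n` cut favours experienced reviewers first and recency second, so a very recent review from a new account can lose to an older one.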


What if you care about the quality of drinks?

In [101]:
get_relevant_review("5qVG7T82jRnKGR9O7_0cFw", "I want tasty drinks.")

Top words per topic in NMF:
Topic 1: coffee, good, day, location, pretty, staff, though, busy, even, best
Topic 2: drink, order, time, get, said, milk, always, go, ever, went
Topic 3: great, service, super, people, slow, friendly, everyone, fantastic, sometimes, rude
Topic 4: canal, place, one, street, st, like, seating, clean, new, location
Reviews most related to your profile preference (2) for this business are mostly negative.

Here is the most relevant review for you: 
 
The WORSE SERVICE EVER if you drink Starbucks then you know the mixes and you know when the baristas do not have the mix right it so we're coming from LA long travels right soooo happy to see SB go in order a drink and it taste SOUR so I ask if we can just change the drink because it's not a mix the person doesn't know how to fix it and instead they try to argue me down so they turn around and make the first one over I give it to the other person in my party it's is still not right I told them thank but no I don't

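A different preference maps to a different topic, and that topic paints a very different picture of the Starbucks we are analyzing: reviews in the location topic are mostly positive, while those in the drink topic are mostly negative.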
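Which topic a preference lands on is decided by `nmf_model.transform`, which keeps the topic-term matrix H fixed and solves for non-negative topic weights for the new vector. As a rough sketch, the function below does the same with multiplicative updates for a single vector (scikit-learn's transform uses its configured solver, so numbers may differ). The hypothetical toy H has three topics over six terms: coffee, location, drink, milk, canal, street.

```python
def project_onto_topics(x, H, n_iter=200, eps=1e-10):
    """Non-negative weights w (one per topic) with w @ H ~= x, H held fixed.

    Multiplicative update for min ||x - H^T w||^2: w <- w * (H x) / (H H^T w).
    """
    k = len(H)
    if not any(x):
        return [0.0] * k  # no overlap with the vocabulary: every topic weight is zero
    w = [1.0] * k
    Hx = [sum(h * xi for h, xi in zip(row, x)) for row in H]
    HHt = [[sum(a * b for a, b in zip(r1, r2)) for r2 in H] for r1 in H]
    for _ in range(n_iter):
        denom = [sum(HHt[i][j] * w[j] for j in range(k)) for i in range(k)]
        w = [w[i] * Hx[i] / (denom[i] + eps) for i in range(k)]
    return w

H = [[1, 1, 0, 0, 0, 0],   # topic 0: coffee, location
     [0, 0, 1, 1, 0, 0],   # topic 1: drink, milk
     [0, 0, 0, 0, 1, 1]]   # topic 2: canal, street

def best_topic(x):
    w = project_onto_topics(x, H)
    return max(range(len(w)), key=lambda t: w[t])  # first index wins ties, like argmax()

print(best_topic([0, 1, 0, 0, 0, 0]))  # "location" -> topic 0
print(best_topic([0, 0, 1, 0, 0, 0]))  # "drink" -> topic 1
```

Because the matched topic alone decides which reviews feed the sentiment step, two preferences for the same business can surface very different reviews.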

To check whether the approach generalizes beyond cafes, we apply it to businesses in other industries.

In [102]:
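# get_relevant_review reads the global `reviews`, so reassigning it here
# points the function at the reviews loaded below.
reviews = pd.read_parquet("reviews2.parquet")
users = pd.read_parquet("users2.parquet")
reviews = reviews.merge(users, on="user_id", how="left", suffixes=("_review", "_user"))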

In [103]:
tattoo_b = business[business["categories"].str.contains("Tattoo", case = False, na = False)]

In [104]:
tattoo_reviews = reviews[reviews["business_id"].isin(tattoo_b["business_id"])]

In [105]:
tattoo_reviews["business_id"].value_counts()

business_id
i7_3lQzM5SP8Cc2YT-X3BA    420
AsPwZ_T0_eJrKbr-hAqqMw    391
JJt2xm-_1pIbtB6H8-f0Dw    362
QPh2o_h5nVN2_xY7CEM0ug    347
y_4wb-Kds_cvdTlYeWYMdA    261
                         ... 
7iJDafOK6yate0J4DV6AvQ      5
iW3CDR5W6loaYH7yOIM0Zw      5
XCGJsUwRmL8U-OoUABf3qg      5
qU1zyJadUoblMK84KuN1gQ      5
FCx-ClnmgHlxqHGCONMEZw      5
Name: count, Length: 672, dtype: int64
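Picking a business to test follows the same pattern for every industry: match a keyword in the `categories` string, keep the reviews of those businesses, and count reviews per business. This pure-Python sketch mirrors `str.contains(keyword, case=False, na=False)` followed by `value_counts()`; the records and helper names are hypothetical stand-ins for the `business` and `reviews` DataFrames.

```python
from collections import Counter

def businesses_in_category(businesses, keyword):
    """Case-insensitive substring match on categories; missing categories never match."""
    kw = keyword.lower()
    return {b["business_id"] for b in businesses
            if b.get("categories") and kw in b["categories"].lower()}

def review_counts(review_rows, business_ids):
    """Reviews per business, most reviewed first (like value_counts())."""
    return Counter(r["business_id"] for r in review_rows
                   if r["business_id"] in business_ids).most_common()

businesses = [
    {"business_id": "b1", "categories": "Tattoo, Piercing, Beauty & Spas"},
    {"business_id": "b2", "categories": "Coffee & Tea, Cafes"},
    {"business_id": "b3", "categories": None},
    {"business_id": "b4", "categories": "Tattoo Removal"},
]
review_rows = [{"business_id": b} for b in ("b1", "b4", "b1", "b2")]
ids = businesses_in_category(businesses, "tattoo")
print(review_counts(review_rows, ids))  # [('b1', 2), ('b4', 1)]
```

A substring match is deliberately broad: "Tattoo" also catches "Tattoo Removal", and a keyword such as "Hotel" would also match a category string like "Hotels & Travel", so the matched set can include neighbouring business types.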

In [110]:
get_relevant_review("i7_3lQzM5SP8Cc2YT-X3BA","I want a calm experience")

Top words per topic in NMF:
Topic 1: piercing, got, pierced, made, went, super, experience, get, process, done
Topic 2: place, professional, clean, friendly, go, recommend, knowledgeable, infinite, staff, piercing
Topic 3: infinite, piercing, get, jewelry, people, one, back, like, go, time
Topic 4: great, jewelry, selection, shop, definitely, helpful, also, staff, friendly, experience
Reviews most related to your profile preference (1) for this business are mostly positive.

Here is the most relevant review for you: 
 
I got several piercings done by guest piercer Jerry and really enjoyed the experience! I set up my appointment quickly and easily using their online system. I called prior to my appointment with a few questions which were answered promptly by the staff. Upon arriving I checked in and had my temperature taken and then was able to ask a few more specific questions to my piercer. We discussed my eligibility based on my anatomy and selected the jewelry before getting started

In [111]:
get_relevant_review("i7_3lQzM5SP8Cc2YT-X3BA","I would like a knowledgeable staff member")

Top words per topic in NMF:
Topic 1: piercing, got, pierced, made, went, super, experience, get, process, done
Topic 2: place, professional, clean, friendly, go, recommend, knowledgeable, infinite, staff, piercing
Topic 3: infinite, piercing, get, jewelry, people, one, back, like, go, time
Topic 4: great, jewelry, selection, shop, definitely, helpful, also, staff, friendly, experience
Reviews most related to your profile preference (2) for this business are mostly positive.

Here is the most relevant review for you: 
 
This place was great. Front desk staff amiable and informed.  My piercer John was extremely well-versed and versatile, and was able to pierce my ear in the exact, unusual area I wanted him to - he told me to tell him exactly where I wanted to get pierced and said we'd work from there, and he made it work.  The place is clean, though in the midst of a renovation (the place does need a little extra loving)  Pricing was reasonable, and they had a good selection of jewelry. 

Lastly, let's try hotels.

In [112]:
hotel_b = business[business["categories"].str.contains("Hotel", case = False, na = False)]
hotel_reviews = reviews[reviews["business_id"].isin(hotel_b["business_id"])]
hotel_reviews["business_id"].value_counts()

business_id
wz8ZPfySQczcPgSyd33-HQ    3634
PY9GRfzr4nTZeINf346QOw    2715
-QI8Qi8XWH3D8y8ethnajA    2235
c_4c5rJECZSfNgFj7frwHQ    1896
Eb1XmmLWyt_way5NNZ7-Pw    1850
                          ... 
UCDghhOm9vMk3HS6ObV85w       5
RjHTlFGxb_GHooFFBZ-jYQ       5
8N4moq96N16VxvlwSq4i5Q       5
mMnsoqPeg7b0lBtE37r0aw       5
XPkkWyAkkPeH-PqXmBMmpg       5
Name: count, Length: 5858, dtype: int64

In [129]:
get_relevant_review("wz8ZPfySQczcPgSyd33-HQ","I want a good bed")

Top words per topic in NMF:
Topic 1: us, get, told, said, front, desk, would, back, service, check
Topic 2: casino, nice, pool, bowling, hotel, free, good, like, really, resort
Topic 3: room, bed, floor, hotel, king, clean, smoking, night, stay, shower
Topic 4: great, love, place, always, staff, clean, friendly, stay, food, service
Reviews most related to your profile preference (3) for this business are mostly positive.

Here is the most relevant review for you: 
 
Recently stayed here again as my family were visiting and had a very early flight out the next day. I am not much of a player here but even then, it was easy to score a free room a few days away. The bell boy who brought up a mountain of luggage was super helpful and scored extra water bottles and blankets for us (by the way, they do not give out extra comforters, just blankets). 

We got a Summit room on the 21st floor (I think), so it was fun to watch the mountains change colors at sunset. There was a $25 resort fee which

In [128]:
get_relevant_review("wz8ZPfySQczcPgSyd33-HQ","I like helpful casino staff")

Top words per topic in NMF:
Topic 1: us, get, told, said, front, desk, would, back, service, check
Topic 2: casino, nice, pool, bowling, hotel, free, good, like, really, resort
Topic 3: room, bed, floor, hotel, king, clean, smoking, night, stay, shower
Topic 4: great, love, place, always, staff, clean, friendly, stay, food, service
Reviews most related to your profile preference (4) for this business are mostly positive.

Here is the most relevant review for you: 
 
Top notch staff. Starbuck full on .. Great big rooms, great pool, great fun stuff, casino, salon, bowling you name it you got here and a bus stop We like to take the bus around town. Great sight seeing
