## Recommendation Model

The goal of this notebook is to take the text review of a coffee (in effect, a flavor description) and recommend the most similar coffee. Each review has been assigned a nine-dimensional flavor vector based on NMF topic modeling, and these vectors are compared using cosine distance to find the most similar coffees.
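
For reference, the cosine distance used throughout is the standard one (one minus the cosine similarity), which is what scikit-learn computes for `metric='cosine'`. For two flavor vectors $u$ and $v$ (symbols introduced here only for illustration):

$$
d_{\cos}(u, v) = 1 - \frac{u \cdot v}{\lVert u \rVert \, \lVert v \rVert}
$$

A distance of 0 means the two vectors point in the same direction. Because NMF topic weights are non-negative, the distance here always falls between 0 and 1.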

In [1]:
import pandas as pd
pd.set_option('display.max_columns', None)
import numpy as np
import re
import requests
import pickle
import seaborn as sns
import matplotlib.pyplot as plt

from sklearn.model_selection import train_test_split
from sklearn.feature_extraction.text import CountVectorizer
from nltk.tokenize import word_tokenize
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import NMF
from sklearn.metrics import pairwise_distances

In [2]:
with open('coffee_words.pickle','rb') as read_file:
    coffee = pickle.load(read_file)
with open('coffee_ratings.pickle','rb') as read_file:
    ratings = pickle.load(read_file)
with open('combined.pickle','rb') as read_file:
    combined = pickle.load(read_file)
with open('df_full.pickle','rb') as read_file:
    df = pickle.load(read_file)
with open('df_topic_breakdown.pickle','rb') as read_file:
    df_topic_breakdown = pickle.load(read_file)
with open('sentiment.pickle','rb') as read_file:
    sentiment = pickle.load(read_file)

with open('blindtfidf_vec.pickle', 'rb') as read_file:
    blindtfidf = pickle.load(read_file)
with open('blindtfidf_mat.pickle', 'rb') as read_file:
    tfidf_blind = pickle.load(read_file)
ratings = ratings.reset_index().rename(columns={'index':'Roaster'})


In [3]:
from nltk.corpus import stopwords
sw = stopwords.words("english")
sw = sw + ['coffee','coffees','cup','john', 'diruocco','jen','apodaca','ken','kevin','keurig','espresso','serve','capsule','device','serving','flavor','notes','mouthfeel','aroma','finish','brewed','brewing','parts','one','two','three','evaluate','evaluated','hint']

In [4]:
with open('nmf_tfidfblind.pickle', 'rb') as read_file:
    nmf_tfidfblind = pickle.load(read_file)

with open('blindtfidf_topic.pickle', 'rb') as read_file:
    blindtfidf_topic = pickle.load(read_file)

with open('blindtopic_tfidf.pickle', 'rb') as read_file:
    blindtopic_tfidf = pickle.load(read_file)

## NMF topics
As a reminder from previous work, here are the six highest-weighted words for each of the nine topics found by the NMF model on the coffee reviews.

In [5]:
doc_word = tfidf_blind

nmf_model = nmf_tfidfblind
doc_topic = blindtfidf_topic
topic_word = nmf_model.components_

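# get_feature_names() was removed in scikit-learn 1.2; on older versions keep the original call.
words = blindtfidf.get_feature_names_out()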
t = nmf_model.components_.argsort(axis=1)[:,-1:-7:-1]

topic_words = [[words[e] for e in l] for l in t]
topic_words

[['black', 'currant', 'cherry', 'savory', 'red', 'pungent'],
 ['chocolate', 'dark', 'cedar', 'milk', 'small', 'chocolaty'],
 ['structure', 'tart', 'sweet', 'zest', 'richly', 'savory'],
 ['cocoa', 'toned', 'powder', 'nib', 'cedar', 'structure'],
 ['fresh', 'cut', 'fir', 'lightly', 'syrupy', 'drying'],
 ['cacao', 'nib', 'roasted', 'drying', 'lively', 'juicy'],
 ['flowers', 'honey', 'silky', 'acidity', 'like', 'bright'],
 ['wood', 'body', 'nut', 'aromatic', 'sweetness', 'rather'],
 ['fruit', 'toned', 'cherry', 'chocolate', 'sweet', 'rich']]
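
The slicing in the cell above can be hard to read at a glance. A minimal sketch on a toy matrix, assuming a made-up four-word vocabulary and two topics (none of these values come from the real model), shows how `argsort` followed by a reversed slice picks the highest-weighted words per topic:

```python
import numpy as np

# Hypothetical vocabulary and topic-word weights, for illustration only.
vocab = ["cherry", "chocolate", "honey", "cedar"]
toy_topic_word = np.array([
    [0.9, 0.1, 0.0, 0.3],   # topic 0
    [0.0, 0.8, 0.2, 0.5],   # topic 1
])

# argsort sorts ascending, so the last columns hold the largest weights;
# the slice -1:-3:-1 walks backwards over the final two positions.
top_idx = toy_topic_word.argsort(axis=1)[:, -1:-3:-1]
top_words = [[vocab[i] for i in row] for row in top_idx]
print(top_words)
```

The notebook uses the same pattern with `-1:-7:-1`, which keeps six words per topic.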

## Example coffee

Below is an example of a coffee that was reviewed as part of the original corpus. That review was then converted into a nine-dimensional flavor vector using its NMF topic values.

In [6]:
coffee.iloc[10].Review

'Deeply pungent, sweetly savory. Dark chocolate, narcissus, black cherry, cardamom, cashew in aroma and cup. Sweet-savory structure with roundly tart acidity; full, creamy mouthfeel. The floral-toned finish leads with notes of narcissus, balanced by dark chocolate and cashew underneath. '

In [7]:
doc_topic[10]

array([0.06078979, 0.06336553, 0.08229621, 0.        , 0.        ,
       0.        , 0.        , 0.        , 0.00760888])
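
One way to read this vector is to rescale it into shares and look at the largest entries. The sketch below copies the values printed above; the topic labels are shorthand built from the first two words of each topic listed earlier and are an assumption made here, not names used by the model.

```python
import numpy as np

# Shorthand labels (hypothetical) taken from the top two words of each topic.
topic_labels = ["black/currant", "chocolate/dark", "structure/tart",
                "cocoa/toned", "fresh/cut", "cacao/nib",
                "flowers/honey", "wood/body", "fruit/toned"]

# Topic weights for coffee 10, copied from the output above.
flavor_vec = np.array([0.06078979, 0.06336553, 0.08229621, 0.0, 0.0,
                       0.0, 0.0, 0.0, 0.00760888])

# Rescale so the weights sum to 1, then rank topics from largest to smallest.
shares = flavor_vec / flavor_vec.sum()
top3 = shares.argsort()[::-1][:3]
for i in top3:
    print(f"{topic_labels[i]}: {shares[i]:.0%}")
```

The three leading topics come out as structure/tart, chocolate/dark, and black/currant, which lines up with the "sweet-savory structure", "dark chocolate", and "black cherry" wording in the review.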

Pairwise cosine distances are computed between the above vector and the vectors of every coffee in the corpus. Sorting by distance puts the most similar coffees first (`recs`). The first entry is the original coffee itself, so the second entry is the most similarly described other coffee.

In [23]:
indices = pairwise_distances(doc_topic[10].reshape(1,-1),doc_topic,metric='cosine').argsort()
recs = list(indices[0][0:4])
df_topic_breakdown.iloc[recs]

coffee.iloc[recs[1]].Review

'Sweetly savory, layered. Rhododendron-like flowers, cinnamon, baker’s chocolate, cedar, red currant in aroma and cup. Round, savory-leaning structure; crisp, velvety mouthfeel. The quiet finish centers on notes of vanilla-like florals and baker’s chocolate, with a hint of cinnamon. '
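
The lookup above can be sketched on a toy matrix to make the mechanics visible. This is an illustration under assumptions: the four three-dimensional vectors are invented, and the distance is computed directly with NumPy rather than with `pairwise_distances`. It also drops the query coffee by index instead of relying on it being first in the sorted order.

```python
import numpy as np

def cosine_distances_to_all(query, matrix):
    """Cosine distance from one vector to every row of a matrix."""
    query = np.asarray(query, dtype=float)
    matrix = np.asarray(matrix, dtype=float)
    norms = np.linalg.norm(matrix, axis=1) * np.linalg.norm(query)
    sims = np.zeros(len(matrix))
    nonzero = norms > 0            # all-zero rows keep similarity 0
    sims[nonzero] = (matrix[nonzero] @ query) / norms[nonzero]
    return 1.0 - sims

# Invented topic vectors standing in for doc_topic.
toy_topics = np.array([
    [0.06, 0.06, 0.08],
    [0.05, 0.07, 0.09],
    [0.00, 0.00, 0.10],
    [0.10, 0.00, 0.00],
])

query_idx = 0
dists = cosine_distances_to_all(toy_topics[query_idx], toy_topics)
order = dists.argsort()
# Remove the query itself explicitly rather than assuming it sorts first.
neighbours = [int(i) for i in order if i != query_idx]
print(neighbours)
```

Filtering by index is a small safeguard: if two reviews happen to share an identical topic vector, the query is not guaranteed to land in position 0 after sorting.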

## Formatting

This section repeats the steps above for a review supplied as free text, cleaning up the process before moving it to the Streamlit app.

In [20]:
# t = ['Delicate, lyrically sweet, gently tart. Tea rose, pink grapefruit zest, cocoa nib, fresh-cut oak, wild honey in aroma and cup. Sweet structure with gently bright acidity; plush, satiny mouthfeel. The finish consolidates to richly sweet notes of tea rose and honey with cocoa nib undertones.']
# item = ['Crisp, balanced, richly nut-toned. Nutella, red apple, freesia, agave syrup, cedar in aroma and cup. Very sweet in structure with brisk acidity; plush, syrupy-smooth mouthfeel. Sweetly nut-toned finish supported by freesia-like floral tones.']
item = ['Crisply sweet, nut-toned, richly floral. Almond butter, lilac, lemon verbena, red apple, oak in aroma and cup. Sweet structure with soft, round acidity; very full, syrupy-smooth mouthfeel. Lemon verbena supports the resonantly nut-toned finish.']
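# Convert the review to TF-IDF features, then to NMF topic weights.
# .toarray() gives a plain ndarray; recent scikit-learn releases no longer accept the np.matrix returned by .todense().
vt = blindtfidf.transform(item).toarray()
tt1 = nmf_model.transform(vt)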
tt1

array([[0.00036605, 0.        , 0.09780715, 0.02406052, 0.00661351,
        0.        , 0.02649725, 0.03088748, 0.00711122]])

In [19]:
indices = pairwise_distances(tt1.reshape(1,-1),doc_topic,metric='cosine').argsort()
recs = list(indices[0][0:4])
df_topic_breakdown.iloc[recs]
print('The coffee you liked was described as:',str(item))
print('\n')
print('Based on your input coffee, I recommend you try the',ratings.iloc[recs[0]]['Roast Level'],'roasted',ratings.iloc[recs[0]]['Coffee Origin'],'by',ratings.iloc[recs[0]]['Roaster'],'.','\n','It could be described as:',coffee.iloc[recs[0]].Review)

The coffee you liked was described as: ['Crisp, balanced, richly nut-toned. Nutella, red apple, freesia, agave syrup, cedar in aroma and cup. Very sweet in structure with brisk acidity; plush, syrupy-smooth mouthfeel. Sweetly nut-toned finish supported by freesia-like floral tones.']


Based on your input coffee, I recommend you try the Medium-Light roasted Santa Barbara, Honduras by Small Eyes Cafe . 
 It could be described as: Crisply sweet, nut-toned, richly floral. Almond butter, lilac, lemon verbena, red apple, oak in aroma and cup. Sweet structure with soft, round acidity; very full, syrupy-smooth mouthfeel. Lemon verbena supports the resonantly nut-toned finish.
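
For the app, the transform-and-rank steps could be wrapped in a single function. The following is only a sketch of one possible shape: the name `recommend`, its parameters, and the two stub classes are hypothetical, and the cosine distance is computed with NumPy instead of `pairwise_distances` so the example runs without the pickled models.

```python
import numpy as np

def recommend(review_text, vectorizer, nmf_model, doc_topic, top_n=3):
    """Return (indices, distances) of the top_n closest corpus rows."""
    features = vectorizer.transform([review_text])
    topic_vec = np.asarray(nmf_model.transform(features), dtype=float).ravel()
    doc_topic = np.asarray(doc_topic, dtype=float)
    norms = np.linalg.norm(doc_topic, axis=1) * np.linalg.norm(topic_vec)
    sims = np.zeros(len(doc_topic))
    ok = norms > 0                      # skip all-zero vectors
    sims[ok] = (doc_topic[ok] @ topic_vec) / norms[ok]
    dists = 1.0 - sims
    order = dists.argsort()[:top_n]
    return order, dists[order]

# Hypothetical stand-ins for the fitted vectorizer and NMF model.
class StubVectorizer:
    def transform(self, texts):
        # Counts of two made-up keywords per text.
        return np.array([[t.count("nut"), t.count("floral")] for t in texts],
                        dtype=float)

class StubNMF:
    def transform(self, features):
        return features                 # identity mapping, illustration only

toy_doc_topic = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
idx, dist = recommend("nut-toned, richly floral, nut butter",
                      StubVectorizer(), StubNMF(), toy_doc_topic, top_n=2)
print(idx, dist)
```

In the notebook the same call would presumably receive `blindtfidf`, `nmf_tfidfblind`, and `blindtfidf_topic`, and the returned indices would be used with `ratings.iloc` and `coffee.iloc` as in the cell above.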


In [88]:
t = [coffee.iloc[recs[0]].Review]
# Re-embed the recommended review so its topic vector can be compared with the input's.
vt = blindtfidf.transform(t).toarray()
tt2 = nmf_model.transform(vt)
tt2

array([[0.02186967, 0.00078154, 0.04314666, 0.01241035, 0.00014772,
        0.00245299, 0.        , 0.        , 0.03365916]])

In [89]:
pairwise_distances(tt1.reshape(1,-1),tt2.reshape(1,-1),metric='cosine')

array([[0.00301434]])