# Building a Model Using Spruce Eats Data
In this notebook, I use the scraped and cleaned Spruce Eats data to build a cocktail recommender engine.

### 1. Imports and Functions
* **var_to_pickle**: Writes the given variable to a pickle file
* **read_pickle**: Reads the given pickle file
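
The `code.lw_pickle` module itself is not shown in this notebook. As a rough idea of what the two helpers do, here is a minimal sketch built on the standard `pickle` module. The signatures, and in particular the argument order of `var_to_pickle`, are assumptions rather than the module's actual code.

```python
import os
import pickle
import tempfile


def var_to_pickle(var, filename):
    """Write `var` to `filename` as a pickle (hypothetical signature)."""
    with open(filename, "wb") as f:
        pickle.dump(var, f)


def read_pickle(filename):
    """Read and return the object stored in the pickle file `filename`."""
    with open(filename, "rb") as f:
        return pickle.load(f)


# Round trip through a temporary file to show the two helpers together
demo_path = os.path.join(tempfile.mkdtemp(), "demo.pk")
var_to_pickle({"name": "daiquiri", "base_spirits": ["rum"]}, demo_path)
restored = read_pickle(demo_path)
```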

In [36]:
import pandas as pd
import numpy as np
import re
import spacy
from sklearn.feature_extraction import text
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import pairwise_distances
from sklearn.decomposition import NMF

from code.lw_pickle import var_to_pickle, read_pickle

### 2. Load DataFrame From Pickle

In [6]:
df_pk = '../data/se_df.pk'
df = read_pickle(df_pk)

### 3. Pre-process Descriptions
In this section I lemmatize descriptions using spaCy, keeping only adjectives, nouns, and proper nouns and dropping hyphenated tokens.

In [7]:
scy = spacy.load("en_core_web_sm")

In [30]:
# Lemmatize the unique words in lists of names or base spirits, dropping hyphenated lemmas
def list_prepro(items):
    item_str = ' '.join(set([i for row in items for i in row]))
    doc = scy(item_str)
    words = [token.lemma_ for token in doc]
    words = list(set(filter(lambda w: '-' not in w, words)))
    return words

# Lemmatize a description, keeping only adjectives, nouns, and proper nouns
def desc_prepro(desc):
    pos_keep = ['ADJ', 'NOUN', 'PROPN']
    doc = scy(desc)
    words = [token.lemma_ for token in doc if token.pos_ in pos_keep]
    words = list(filter(lambda w: '-' not in w, words))
    return ' '.join(words)

In [27]:
descriptions = df['description'].map(desc_prepro)

### 4. Create Lists of Stop Words
I created separate lists of stop words for two models: one includes several shared stop words and the other is more aggressive, containing drink names and base spirits.

In [34]:
# Manually-populated list of generic stop words
gen_stop_words = ['cocktail', 'drink', 'recipe', 'make', 'mix', 'flavor', 'good',
                  'ingredient', 'taste', 'perfect', 'little', 'bar', 'nice', 'blue',
                  'great', 'way', 'favorite', 'new', 'popular', 'delicious', 'green',
                  'party', 'fun', 'black', 'sure', 'time', 'glass', 'woo', 'year',
                  'st', 'shot', 'garnish', 'pink', 'bit', 'different', 'choice',
                  'bartender', 'fantastic', 'use', 'liquor', 'drinker', 'try']
safe_sw = text.ENGLISH_STOP_WORDS.union(gen_stop_words)

# Lemmatized lists of base spirits and drink names
base_spirits = list_prepro(df['base_spirits'].tolist())
name_words = list_prepro(df['name_words'].tolist())

fun_sw = text.ENGLISH_STOP_WORDS.union(gen_stop_words + base_spirits + name_words)

### 5. Create Safe NMF Recommender
This recommender uses the less aggressive, safe stop words list, so its recommendations tend to share names and base spirits with the given cocktail.

In [39]:
# Create TF-IDF Matrix
# Newer scikit-learn versions expect stop_words as a list, not a frozenset
safe_tfidf = TfidfVectorizer(stop_words=list(safe_sw))
safe_mtx = safe_tfidf.fit_transform(descriptions.values)

# Create NMF Vectors
safe_nmf = NMF(n_components=30)
safe_drink_vec = safe_nmf.fit_transform(safe_mtx)
safe_word_vec = safe_nmf.components_.transpose()

### 6. Create Fun NMF Recommender
This recommender uses the more aggressive stop words list, which also removes drink names and base spirits, so its recommendations can differ wildly from the given cocktail.

In [46]:
# Create TF-IDF Matrix
# Newer scikit-learn versions expect stop_words as a list, not a frozenset
fun_tfidf = TfidfVectorizer(stop_words=list(fun_sw))
fun_mtx = fun_tfidf.fit_transform(descriptions.values)

# Create NMF Vectors
fun_nmf = NMF(n_components=25)
fun_drink_vec = fun_nmf.fit_transform(fun_mtx)
fun_word_vec = fun_nmf.components_.transpose()
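
One way to sanity-check either model is to list the highest-weighted words in each NMF topic. The sketch below is not from the original notebook: `top_words` is a hypothetical helper, and the toy matrix stands in for `safe_word_vec` or `fun_word_vec` (words in rows, topics in columns). With the real models, the vocabulary could come from `safe_tfidf.get_feature_names_out()`, assuming scikit-learn 1.0 or later.

```python
import numpy as np


def top_words(word_vec, vocab, n_words=3):
    """Return the n_words highest-weighted terms for each topic (column)."""
    vocab = np.asarray(vocab)
    # argsort is ascending, so negate the weights to rank them descending
    order = np.argsort(-word_vec, axis=0)[:n_words]
    return [vocab[order[:, k]].tolist() for k in range(word_vec.shape[1])]


# Toy stand-in for a word-topic matrix: 5 words x 2 topics (values are made up)
toy_vocab = ["rum", "lime", "scotch", "smoky", "mint"]
toy_word_vec = np.array([
    [0.9, 0.0],
    [0.7, 0.1],
    [0.0, 0.8],
    [0.1, 0.6],
    [0.3, 0.2],
])
topics = top_words(toy_word_vec, toy_vocab, n_words=2)
print(topics)
```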

### NMF Recommender Comparison

In [299]:
all_idx = df[df['name'].str.contains('daiquiri')].index
idx = all_idx[1]
all_idx

Int64Index([44, 201, 265, 331, 592, 680], dtype='int64')

In [300]:
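# Safe model: rank drinks by cosine distance to the query drink (the query itself ranks first)
drink_dist = pairwise_distances(safe_drink_vec, metric='cosine')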
recos = drink_dist[idx].argsort()[:10].tolist()
df.loc[recos][['name', 'proc_desc', 'ingredients']]

| | name | proc_desc | ingredients |
|---|---|---|---|
| 201 | daiquiri | original daiquiri simple recipe that common in... | [1 1/2 ounces rum (light), 3/4 ounce lime ju... |
| 483 | naked lady | element great tasting cocktail bacardi rum nak... | [1 1/2 ounces rum (Bacardi Superior), 1 1/2 ou... |
| 265 | frozen daiquiri | daiquiri popular tropical cocktail many fruity... | [3/4 cup ice, 2 ounces rum (aged or gold), 1... |
| 331 | hemingway daiquiri | hemingway daiquiri papa doble hemingway specia... | [2 ounces white rum, 1/4 ounce maraschino liqu... |
| 195 | cuba libre | cuba libre recipe easy popular mixed drink fam... | [1/2 lime (juiced), 2 ounces light rum, 4 ounc... |
| 36 | bacardi cocktail | drink bacardi rum few recipe fitting original ... | [2 ounces Light Rum (Bacardi), 1/2 ounce lemon... |
| 44 | banana daiquiri | banana daiquiri favorite many frozen cocktail ... | [1 1/2 ounces light rum, 1/2 ounce triple sec,... |
| 126 | cable car | cable car excellent rum sour cocktail tony abo... | [1 1/2 ounces Captain Morgan Spice Rum, 3/4 ou... |
| 434 | mai tai | mai tai iconic rum tiki scene classic rum cock... | [1 ounce light rum, 1 ounce Jamaican rum, 1 ou... |
| 228 | el presidente | el presidente fantastic rum cocktail vermouth ... | [1 1/2 ounces light rum, 3/4 ounce orange cura... |


In [311]:
# Fun model: same ranking, using the vectors built with the aggressive stop words list
drink_dist2 = pairwise_distances(fun_drink_vec, metric='cosine')
recos2 = drink_dist2[idx].argsort()[:10].tolist()
df.loc[recos2][['name', 'proc_desc', 'ingredients']]

| | name | proc_desc | ingredients |
|---|---|---|---|
| 201 | daiquiri | original daiquiri simple recipe that common in... | [1 1/2 ounces rum (light), 3/4 ounce lime ju... |
| 197 | cucumber mint margarita | frozen margarita cool crisp cocktail secret br... | [1/2 cup cucumber (peeled, seeded and chopped)... |
| 439 | mango spice | pepper cocktail attention spicy cocktail mango... | [1 1/2 ounces Absolut Mango Vodka, 1 ounce man... |
| 697 | tangerine margarita | tangerine margarita fresh taste citrus fruit w... | [2 ounces tequila, 1 ounce orange liqueur, 1... |
| 210 | devil's handshake | devil handshake pleasant mix fruit tequila sem... | [1 1/2 ounces tequila, 3/4 ounces lime juice,... |
| 661 | sour patch margarita | little sugar gummy child sour patch kids that ... | [2 ounces lime tequila, 1/2 ounce melon liqueu... |
| 754 | watermelon margarita | traditional margaritas call triple sec lime l... | [5 cups watermelon (about one 3 to 4-pound mel... |
| 320 | habanero blood orange margarita | spicy margarita recipe that little different m... | [For the Blood Orange Habanero Puree:, 2 pints... |
| 233 | english christmas punch | english christmas punch traditional warm drink... | [750 ml bottle dark rum, 750 ml bottle dry re... |
| 611 | ruby rum sunrise | switch rum grapefruit tart upgrade tequila sun... | [2 ounces rum, 1/2 ounce sour mix, 1 1/2 ounce... |
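
Both comparison cells follow the same pattern: compute cosine distances from one drink to all others, then take the closest rows. A minimal sketch of that pattern as a reusable function is shown below. It uses plain NumPy instead of `pairwise_distances`, and the names `cosine_distances_to` and `recommend` are hypothetical, not part of the notebook.

```python
import numpy as np


def cosine_distances_to(row, matrix):
    """Cosine distance from `row` to every row of `matrix`."""
    norms = np.linalg.norm(matrix, axis=1) * np.linalg.norm(row)
    # All-zero vectors have no direction; treat their similarity as 0
    sims = np.divide(matrix @ row, norms,
                     out=np.zeros(len(matrix)), where=norms > 0)
    return 1.0 - sims


def recommend(drink_vec, idx, n=10):
    """Return the row positions of the n drinks closest to row `idx`."""
    dist = cosine_distances_to(drink_vec[idx], drink_vec)
    return dist.argsort()[:n].tolist()


# Toy topic vectors for four drinks (values are made up)
toy_drink_vec = np.array([
    [1.0, 0.0],
    [0.9, 0.1],
    [0.0, 1.0],
    [0.5, 0.5],
])
recos_demo = recommend(toy_drink_vec, idx=0, n=3)
print(recos_demo)
```

As in the tables above, the query drink ranks first because its distance to itself is zero; slicing the result with `[1:]` would leave only the other drinks.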


### Test Recommender With Search Term

In [307]:
idx = 201
search_term = 'strong scotch smokey'
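# Each search word is transformed as its own document, giving one topic vector per word
search_vec = safe_nmf.transform(safe_tfidf.transform(search_term.split()))
search_vec2 = fun_nmf.transform(fun_tfidf.transform(search_term.split()))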
search_vec.sum(), search_vec2.sum()

(0.5548451272089946, 0.08294488036634144)

Cull drink names from the search terms: if a drink is found, look up its recommendations and multiply that vector by the search recommendation vector.

If none of the search terms appear in a model's vocabulary, that model's search vector sums to zero. If only the less-populated vector (the second one, from the fun model) is zero, we can use the first vector alone. If both are zero, the model cannot use the search term. One option in that case is to return a few items selected at random from a list of popular drinks.
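
The fallback rules above could be sketched roughly as follows. This is only an illustration: `usable_models` and `random_popular` are hypothetical helpers, the toy vectors are made up, and `popular_idx` stands for a list of popular drinks that this notebook does not define.

```python
import numpy as np


def usable_models(safe_vec, fun_vec):
    """List the models whose search vector has any non-zero weight."""
    usable = []
    # NMF weights are non-negative, so "sums to zero" means "all zeros"
    if np.any(safe_vec > 0):
        usable.append("safe")
    if np.any(fun_vec > 0):
        usable.append("fun")
    return usable


def random_popular(popular_idx, n=10, seed=None):
    """Fallback: sample up to n row indices from a list of popular drinks."""
    rng = np.random.default_rng(seed)
    size = min(n, len(popular_idx))
    return rng.choice(popular_idx, size=size, replace=False).tolist()


# Toy vectors: the search term is known to the safe model only
safe_only = usable_models(np.array([[0.2, 0.0]]), np.zeros((1, 2)))
# Toy vectors: the search term is unknown to both models
neither = usable_models(np.zeros((1, 2)), np.zeros((1, 2)))
# popular_idx is hypothetical; these indices are placeholders
fallback = random_popular([201, 434, 195, 615], n=2, seed=0)
```

When `usable_models` returns an empty list, the search term carries no signal for either model, which is the case where the random fallback would apply.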

In [308]:
# Average the per-word topic vectors into a single query vector for each model
search_vec = np.mean(search_vec, axis=0, keepdims=True)
#search_vec = np.mean([search_vec, drink_vec[[idx]]], axis=0)
search_vec2 = np.mean(search_vec2, axis=0, keepdims=True)
#search_vec2 = np.mean([search_vec2, drink_vec2[[idx]]], axis=0)

In [309]:
# Safe model: rank drinks by cosine distance to the averaged search vector
search_dist = pairwise_distances(X=safe_drink_vec, Y=search_vec, metric='cosine')
search_recos = search_dist.transpose()[0].argsort()[:10].tolist()
df.loc[search_recos][['name', 'proc_desc', 'ingredients']]

| | name | proc_desc | ingredients |
|---|---|---|---|
| 615 | rusty nail | rusty nail ultimate scotch cocktail interested... | [1 1/2 ounces Scotch whiskey, 3/4 ounce Drambuie] |
| 629 | scotch & soda | easy scotch soda name everything popular scotc... | [2 ounces scotch whisky, 6 ounces club soda] |
| 596 | rob roy | rob roy scotch manhattan choice whiskey that o... | [1 1/2 ounces Scotch whisky, 3/4 ounce sweet v... |
| 297 | godfather | godfather nice simple drink that perfect time ... | [1 1/2 ounces Scotch whisky, 1/2 ounce amarett... |
| 6 | affinity | affinity slight variation perfect manhattan th... | [1 1/2 ounce Scotch whisky, 1/2 ounce sweet ve... |
| 48 | barbary coast | barbary coast unique flavor everyone liking re... | [1/2 ounce gin, 1/2 ounce light rum, 1/2 ounce... |
| 770 | witch hunt | witch hunt intriguing scotch lemonade drink na... | [1 1/2 ounces Scotch whisky, 1/2 ounce dry ver... |
| 299 | gold mine | gold mine intriguing lowball drink available m... | [1/2 ounce Scotch whiskey, 1/2 ounce Galliano ... |
| 597 | robert burns | robert burns great classic cocktail golden age... | [2 ounces Scotch whisky, 3/4 ounce sweet vermo... |
| 775 | zesty irishman | zesty irishman surface similar whiskey sour ex... | [3/4 ounce Drambuie, 1 ounce Irish whiskey, 1/... |


In [310]:
# Fun model: same ranking against the fun model's averaged search vector
search_dist2 = pairwise_distances(X=fun_drink_vec, Y=search_vec2, metric='cosine')
search_recos2 = search_dist2.transpose()[0].argsort()[:10].tolist()
df.loc[search_recos2][['name', 'proc_desc', 'ingredients']]

| | name | proc_desc | ingredients |
|---|---|---|---|
| 71 | black cat | black cat simple mixed drink that perfect part... | [1 ounce vodka, 1 ounce cherry brandy, 4 ounce... |
| 141 | cape codder | cape cod cape codder easy mixed drink mystery ... | [3 ounces cranberry juice, 2 ounces vodka, Gar... |
| 584 | red lotus | distinct flavor lychee fruit star red lotus co... | [1 1/2 ounces vodka, 1 1/2 ounces lychee lique... |
| 8 | alabama slammer | alabama slammer short history highball shooter... | [1 ounce Southern Comfort, 1 ounce amaretto, 1... |
| 213 | dirty bird | dirty bird fun little cocktail familiar white ... | [1 ounce vodka (or tequila), 1 ounce coffee li... |
| 705 | texas tea | texas tea long island tea shot bourbon simple ... | [1/2 ounce tequila, 1/2 ounce bourbon, 1/2 oun... |
| 497 | nutty irishman | nutty irishman tasty popular drink that variet... | [3/4 ounce Irish cream liqueur, 3/4 ounce haze... |
| 423 | long island iced tea | long island iced tea popular mixed drink name ... | [1/2 ounce triple sec, 1/2 ounce light rum, 1/... |
| 749 | washington apple | washington apple fabulous fun drink tasty whis... | [1 ounce whiskey (Crown Royal Canadian Whiskey... |
| 713 | tornado | tornado strong mixed drink that great visual i... | [1 ounce whiskey, 1 ounce vodka, 1 ounce rum, ... |
