# Building NMF Model Using Spruce Eats Data
In this notebook I used the scraped and cleaned Spruce Eats data to build a cocktail recommender engine based on non-negative matrix factorization (NMF). It loads the **se_df.pk** pickle file created in the **scrape_spruce_eats** notebook.

### Table of Contents
* [1. Imports and Functions](#sec1)
* [2. Load DataFrame From Pickle](#sec2)
* [3. Pre-process Descriptions](#sec3)
* [4. Create Lists of Stop Words](#sec4)
* [5. Create Recommender Model](#sec5)
* [6. Recommender Testing](#sec6)
* [7. Pickle DataFrame](#sec7)

<a id='sec1'></a>
### 1. Imports and Functions
* **var_to_pickle**: Writes the given variable to a pickle file
* **read_pickle**: Reads the given pickle file
* **cocktail_recommender**: Class that builds the recommendation engine using NMF

In [1]:
import sys
import pandas as pd
import numpy as np
import re
import spacy
from sklearn.feature_extraction import text

sys.path.append('../code')
from lw_pickle import var_to_pickle, read_pickle
from cocktail_recommender import cocktail_recommender
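
The `lw_pickle` module is not reproduced in this notebook. As a rough sketch, assuming the two helpers are thin wrappers around the standard `pickle` module, they might look like the following. The argument order is inferred from how the helpers are called in this notebook; the function bodies are my assumption, and the real implementations may differ.

```python
import pickle

# Hypothetical stand-ins for the helpers imported from lw_pickle.
def var_to_pickle(var, filename):
    """Write any picklable Python object to the given file path."""
    with open(filename, 'wb') as f:
        pickle.dump(var, f)

def read_pickle(filename):
    """Load and return the object stored in the given pickle file."""
    with open(filename, 'rb') as f:
        return pickle.load(f)
```

Opening the file in a `with` block closes it even if pickling fails partway through.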

<a id='sec2'></a>
### 2. Load DataFrame From Pickle
This cell loads the final DataFrame of scraped and organized cocktail recipes.

In [2]:
df_pk = '../data/se_df.pk'
df = read_pickle(df_pk)

<a id='sec3'></a>
### 3. Pre-process Descriptions
In this section I created a pair of text preprocessing functions that lemmatize words using spaCy. I then restricted the drink descriptions to adjectives, nouns, and proper nouns and lemmatized them.

In [3]:
scy = spacy.load("en_core_web_sm")

In [4]:
# Lemmatizes a column of word lists (drink names or base spirits) and
# returns a de-duplicated list of lemmas
def list_prepro(items):
    # Flatten the lists into a set of unique words and join them for spaCy
    item_str = ' '.join({i for row in items for i in row})
    doc = scy(item_str)
    # Drop any lemma that contains a hyphen and remove duplicates
    words = {token.lemma_ for token in doc if '-' not in token.lemma_}
    return list(words)

# Lemmatizes a description, keeping only adjectives, nouns, and proper nouns
def desc_prepro(desc):
    pos_keep = ['ADJ', 'NOUN', 'PROPN']
    doc = scy(desc)
    words = [token.lemma_ for token in doc if token.pos_ in pos_keep]
    # Drop any lemma that contains a hyphen
    words = [w for w in words if '-' not in w]
    return ' '.join(words)

In [5]:
df['description'] = df['description'].map(desc_prepro)
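
To illustrate what `desc_prepro` keeps and drops without loading a spaCy model, here is a small sketch that applies the same two filters to hand-tagged tokens. The `(lemma, pos)` pairs and the `filter_tagged` helper are made up for illustration; in the notebook the lemmas and tags come from spaCy's `token.lemma_` and `token.pos_`, and the hyphenated entry is included only to exercise the second filter.

```python
POS_KEEP = {'ADJ', 'NOUN', 'PROPN'}

def filter_tagged(tagged_tokens):
    """Keep lemmas with an allowed POS tag and no hyphen, joined by spaces."""
    words = [lemma for lemma, pos in tagged_tokens if pos in POS_KEEP]
    words = [w for w in words if '-' not in w]
    return ' '.join(words)

# Hypothetical hand-tagged tokens standing in for spaCy output.
tagged = [
    ('a', 'DET'), ('classic', 'ADJ'), ('whiskey', 'NOUN'),
    ('cocktail', 'NOUN'), ('from', 'ADP'), ('Kentucky', 'PROPN'),
    ('serve', 'VERB'), ('ice-cold', 'ADJ'),
]

print(filter_tagged(tagged))  # classic whiskey cocktail Kentucky
```

The determiner, preposition, and verb are removed by the part-of-speech filter, and `ice-cold` is removed by the hyphen filter even though it is tagged as an adjective.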

<a id='sec4'></a>
### 4. Create Lists of Stop Words
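I created a separate list of stop words for each of two models. The "safe" list combines scikit-learn's English stop words with a manually populated list of generic cocktail terms. The "fun" list is more aggressive: it contains everything in the safe list plus the lemmatized drink names and base spirits.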

In [6]:
# Manually-populated list of generic stop words
gen_stop_words = ['cocktail', 'drink', 'recipe', 'make', 'mix', 'flavor', 'good',
                  'ingredient', 'taste', 'perfect', 'little', 'bar', 'nice', 'blue',
                  'great', 'way', 'favorite', 'new', 'popular', 'delicious', 'green',
                  'party', 'fun', 'black', 'sure', 'time', 'glass', 'woo', 'year',
                  'st', 'shot', 'garnish', 'pink', 'bit', 'different', 'choice',
                  'bartender', 'fantastic', 'use', 'liquor', 'drinker', 'try']
safe_sw = text.ENGLISH_STOP_WORDS.union(gen_stop_words)

# Lemmatized lists of base spirits and drink names
base_spirits = list_prepro(df['base_spirits'].tolist())
name_words = list_prepro(df['name_words'].tolist())

fun_sw = text.ENGLISH_STOP_WORDS.union(gen_stop_words + base_spirits + name_words)
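
As a minimal sketch of how the two lists change what a model sees, the example below builds toy versions of `safe_sw` and `fun_sw` and compares the vocabulary that survives each one. The short word lists and the sample sentence are illustrative stand-ins, not the notebook's data, and `CountVectorizer` is used here only to show the filtering.

```python
from sklearn.feature_extraction import text

# Toy stand-ins for the notebook's lists (illustrative values only).
gen_stop_words = ['cocktail', 'drink']
base_spirits = ['rum', 'gin']
name_words = ['mojito']

safe_sw = text.ENGLISH_STOP_WORDS.union(gen_stop_words)
fun_sw = text.ENGLISH_STOP_WORDS.union(gen_stop_words + base_spirits + name_words)

doc = ['the mojito is a minty rum cocktail with lime']

vocab = {}
for label, sw in [('safe', safe_sw), ('fun', fun_sw)]:
    # The stop words are passed as a list, which the vectorizers accept.
    vec = text.CountVectorizer(stop_words=list(sw))
    vec.fit(doc)
    vocab[label] = sorted(vec.vocabulary_)

print(vocab['safe'])  # ['lime', 'minty', 'mojito', 'rum']
print(vocab['fun'])   # ['lime', 'minty']
```

With the fun list, the drink name and base spirit disappear, so only the descriptive words are left to drive similarity.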

<a id='sec5'></a>
### 5. Create Recommender Model
The imported **cocktail_recommender** class takes the cocktail DataFrame and the two stop word lists as input and creates two sets of NMF vectors, one per list. The safe and fun vectors are blended into a single, adjustable model. At query time, the input string is converted to an NMF vector, which is then used to find the recipes most similar to that input.

In [7]:
cr = cocktail_recommender(df, safe_sw, fun_sw)
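
The **cocktail_recommender** class lives in `../code` and is not reproduced here. The sketch below shows one way the approach described above could work: fit TF-IDF and NMF once per stop word list, project the query into each topic space, and rank recipes by a weighted mix of the two cosine similarities. The function names, `n_topics`, the toy descriptions, and the linear blend controlled by `weirdness` are all my assumptions for illustration, not the class's actual internals.

```python
import numpy as np
from sklearn.decomposition import NMF
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def fit_topic_space(docs, stop_words, n_topics=2):
    """Fit TF-IDF + NMF on docs; return the fitted pieces and doc vectors."""
    vec = TfidfVectorizer(stop_words=list(stop_words))
    nmf = NMF(n_components=n_topics, init='nndsvda', random_state=0, max_iter=500)
    doc_vecs = nmf.fit_transform(vec.fit_transform(docs))
    return vec, nmf, doc_vecs

def recommend(query, spaces, weirdness=0.5, top_n=3):
    """Rank docs by a weighted blend of cosine similarities in two spaces."""
    sims = []
    for vec, nmf, doc_vecs in spaces:
        # Project the query into the same topic space as the documents
        q = nmf.transform(vec.transform([query]))
        sims.append(cosine_similarity(q, doc_vecs).ravel())
    # Assumed blend: weirdness=0 uses only the safe space, 1 only the fun space
    blended = (1 - weirdness) * sims[0] + weirdness * sims[1]
    return np.argsort(blended)[::-1][:top_n]

# Toy descriptions standing in for the pre-processed recipe text.
docs = [
    'tropical rum pineapple coconut sweet',
    'rum lime mint refreshing',
    'whiskey bitters smoky strong',
    'whiskey vermouth cherry strong',
    'tropical pineapple coconut sweet juice',
]
safe_sw = ['drink']
fun_sw = ['drink', 'rum', 'whiskey']

spaces = (fit_topic_space(docs, safe_sw), fit_topic_space(docs, fun_sw))
top = recommend('tropical fruity rum', spaces, weirdness=0.5)
print([docs[i] for i in top])
```

In this sketch, raising `weirdness` shifts weight toward the space where spirit names are ignored, so matches depend more on descriptive words than on the base spirit.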

<a id='sec6'></a>
### 6. Recommender Testing
This cell tests a recommender call: it queries the model with the input 'rum' and displays the names of the recommended recipes.

In [8]:
cr.recommend('rum', exclude_inputs=False, weirdness=.5)[1]['name']

613        rustic manhattan
45           banana hammock
612              rum runner
359               hurricane
196              cuban rose
426           lounge lizard
488         nevada cocktail
774          zesty irishman
688    swamp water surprise
36         bacardi cocktail
Name: name, dtype: object

<a id='sec7'></a>
### 7. Pickle DataFrame
This cell saves the recommender object, rather than the DataFrame, to a pickle file.

In [9]:
reco_pk = '../data/reco.pk'
var_to_pickle(cr, reco_pk)