In [1]:
import numpy as np
import pandas as pd

# Recipe Recommendation: matrix factorization


In this notebook, I improve my naive nearest-neighbor recommendation system with something a little smarter.  The matrix factorization approach assumes that there are hidden features for both users and recipes.  The dot product of a user's feature vector with a recipe's feature vector should predict how much the user will like that recipe.


Normally, the features for both users and items are learned by gradient descent.  Unfortunately, in my case, I'm the only user, which makes it impossible to learn features for recipes, as I haven't even tried most of them.  But I can use the ingredients as recipe features and learn user features for myself.
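
To make the idea concrete, here is a toy sketch of the prediction step.  The ingredient names and weights are made up for illustration; they are not taken from my data.

```python
import numpy as np

# Hypothetical ingredient vocabulary (illustration only, not the real columns).
ingredients = ['cauliflower', 'miso', 'tofu', 'honey']

# One-hot recipe features: 1 if the recipe uses the ingredient, else 0.
recipe = np.array([1, 1, 0, 0])   # a recipe with cauliflower and miso

# Made-up user features: how much each ingredient adds to my rating.
user = np.array([2.5, 1.5, 0.5, 1.0])

# The predicted rating is the dot product of the two vectors,
# i.e. the sum of the weights of the ingredients that are present.
predicted = user.dot(recipe)
print(predicted)
```

With one-hot ingredients, learning the user vector amounts to learning one weight per ingredient.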


We'll start by writing a quick function that will predict a recipe's rating, once we have the user features.

In [131]:
df = pd.read_csv('one_hot_data.csv')

In [132]:
rated = df.dropna()
unrated = df.drop(rated.index)

def rate_recipe(name, U):
    x = np.array(df[df.Recipe_name==name].drop(columns=['Rating', 'Recipe_name']))
    return x.dot(U.T)[0][0]

I tried using scikit-learn's NMF class first, but it learns both user and item features; you can't provide one of them and only learn the other.  So I'll have to write my own function.  It's a heck of a lot slower, but it seems to work.

In [81]:
def decompose(R, U, I, nsteps=5000, alpha=.0002, beta=.02, epsilon=.001):
    # R: ratings (given; what U.dot(I) should reproduce)
    # U: user features (what to learn; updated in place)
    # I: item features (given, in my case)

    N, M = R.shape  # 1 x 58
    #        U is    1 x 282
    #        I is    282 x 58

    K = U.shape[1]  # latent space size

    for z in range(nsteps):
        for i in range(N):
            for j in range(M):
                e_ij = R[i,j] - U[i,:].dot(I[:,j])  # the error
                for k in range(K):
                    U[i,k] += alpha * (2 * e_ij * I[k,j] - beta * U[i,k])  # step down the gradient

        # compute loss: mean squared error plus an L2 penalty on U
        mse = ((R - U.dot(I)) ** 2).mean()
        reg = .5 * beta * (U ** 2).sum()
        loss = mse + reg
        if loss < epsilon:
            break
    return U
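
The triple loop above updates U one rating at a time.  As a rough sketch of how the same kind of step could be vectorized, here is a batch variant that uses the error on all ratings at once.  This is not what I ran in this notebook: `decompose_batch` is a hypothetical name, the update is batch rather than per-rating, and the check at the bottom uses synthetic data with a larger step size chosen for the toy problem.

```python
import numpy as np

def decompose_batch(R, U, I, nsteps=5000, alpha=.0002, beta=.02):
    # Hypothetical batch variant: same shapes as decompose above
    # (R is N x M, U is N x K, I is K x M), but each step uses the
    # error on all ratings at once instead of one rating at a time.
    U = U.copy()  # leave the caller's initial guess untouched
    for _ in range(nsteps):
        E = R - U.dot(I)                          # N x M matrix of errors
        U += alpha * (2 * E.dot(I.T) - beta * U)  # gradient step on U only
    return U

# Tiny synthetic check (made-up data, not my recipes).
rng = np.random.default_rng(0)
I_toy = rng.integers(0, 2, size=(6, 20)).astype(float)  # 6 features, 20 items
U_true = rng.random((1, 6)) * 2
R_toy = U_true.dot(I_toy)                               # noise-free ratings
U_fit = decompose_batch(R_toy, rng.random((1, 6)), I_toy,
                        nsteps=20000, alpha=.005, beta=0.0)
max_err = np.abs(R_toy - U_fit.dot(I_toy)).max()
print(max_err)
```

Because the item features are fixed, this is just linear regression of the ratings on the ingredients, so a batch step like this should reach a similar fit with far fewer Python-level loops.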
    

In [82]:
I = np.array(rated.drop(columns=['Rating','Recipe_name'])).T
U_init = np.random.rand(1, I.shape[0])  # one weight per ingredient column (282 here)
R = np.array(rated.Rating).reshape(1,-1)

In [83]:
U = decompose(R, U_init, I)

Now that we have the user feature vector U, we can first check that it reproduces the rated recipes accurately.  In the next cell we print actual and predicted ratings for 15 rated recipes.  Most of them are within 0.2 or so of the true value.  These are the same recipes U was fit on, and there are more features (282) than ratings (58), so a close fit is expected; still, it gives a little confidence that a new prediction might be accurate as well.

In [121]:
for r in rated.Recipe_name[:15]:
    print('{}: Rated {}, predicted rating {:.2f}'.format(r, 
                                    rated[rated.Recipe_name==r].Rating.values[0], rate_recipe(r, U)))

Farmers Market Farro Bowls: Rated 5.0, predicted rating 5.03
Vegetarian Taco Bowls: Rated 4.0, predicted rating 4.04
Smoked Fish and Rice Breakfast Bowl: Rated 4.0, predicted rating 3.99
Mojo Meatballs: Rated 4.0, predicted rating 4.04
Shepherd's Pie: Rated 4.0, predicted rating 4.01
Root Vegetable Gratin: Rated 5.0, predicted rating 4.80
Cauliflower Tacos With Cashew Crema: Rated 5.0, predicted rating 4.91
Cauliflower Bolognese: Rated 5.0, predicted rating 5.24
Kung Pao Cauliflower: Rated 4.0, predicted rating 4.06
Roasted Cauliflower: Rated 5.0, predicted rating 4.37
Crispy-Skin Salmon with Miso-Honey Sauce: Rated 4.0, predicted rating 3.92
Ramen Noodles with Miso Pesto: Rated 4.0, predicted rating 3.63
Make-Ahead Broccoli and Quinoa Salad: Rated 5.0, predicted rating 5.04
Vegetarian Meatballs with Soy-Honey Glaze: Rated 2.0, predicted rating 2.13
Sesame Noodles with Crispy Tofu: Rated 4.0, predicted rating 3.92
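
To put a number on "within 0.2 or so", here is a small sketch that computes the mean absolute error and the worst miss from the 15 lines printed above.  The values are copied by hand from that output; nothing here is recomputed from the model.

```python
import numpy as np

# Actual and predicted ratings copied from the 15 lines printed above.
actual = np.array([5, 4, 4, 4, 4, 5, 5, 5, 4, 5, 4, 4, 5, 2, 4], dtype=float)
predicted = np.array([5.03, 4.04, 3.99, 4.04, 4.01, 4.80, 4.91, 5.24,
                      4.06, 4.37, 3.92, 3.63, 5.04, 2.13, 3.92])

abs_err = np.abs(actual - predicted)
mae = abs_err.mean()    # mean absolute error
worst = abs_err.max()   # largest single miss
print('MAE {:.3f}, worst miss {:.2f}'.format(mae, worst))
```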


Now let's predict ratings for some unrated recipes, and look at some of the highest and lowest ones.  A proper evaluation requires that I make the recipes, but that will take a while.

In [122]:
which = np.random.randint(0, unrated.shape[0], 20) # choose some random recipes

In [123]:
for i in which:
    name = unrated.iloc[i].Recipe_name  # the indices were drawn for unrated, not df
    rating = rate_recipe(name, U)
    print('{}: predicted rating {}'.format(name, rating))

Aromatic Pork and Noodle Soup: predicted rating 3.672084291951153
Brown Butter Apple Tart: predicted rating 3.7262293851143298
Smoked Salmon with Asparagus Toasts: predicted rating 2.1286361616123592
Pupusas: predicted rating 3.6030874550667926
Crunchy Baked Saffron Rice with Barberries (Tachin): predicted rating 3.317922772503179
Rice Salad with Fava Beans and Pistachios: predicted rating 3.528258154843909
Green Shakshuka: predicted rating 4.160252198407605
Marinated Mixed Beans: predicted rating 3.884124483675059
Sweet and Spicy Antipasto Salad: predicted rating 4.363642377738933
Roasted Squash and Grains with Tahini-Honey: predicted rating 4.8512325940521785
Anything Goes Donabe: predicted rating 0.46564863079061514
Ramen Noodle Bowl with Escarole and Spicy Tofu Crumbles: predicted rating 3.2083541921750993
Roasted Cauliflower Larb: predicted rating 4.107674950154039
Grilled Fennel-Rubbed Triple-Cut Pork Chops: predicted rating 3.4441314408890733
Turnip and Kale Gratin: predicted ra

Looks like it predicts pretty safe values around 3.5 for most things.  

I'll create a function to find highly or poorly rated recipes.


In [125]:
def recommend(n=5, neg=False):
    # Recommend n unrated recipes: predicted above 4.9, or below 2 if neg=True.
    # Note: loops forever if fewer than n unrated recipes pass the threshold.
    names = []
    while len(names) < n:
        i = np.random.randint(0, unrated.shape[0])
        name = unrated.iloc[i].Recipe_name  # sample from unrated, not df
        if name in names: continue
        r = rate_recipe(name, U)
        if neg:
            if r < 2: names.append(name)
        else:
            if r > 4.9: names.append(name)

    return names
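
The function above samples at random until it finds enough recipes past a threshold.  An alternative sketch, assuming the same column layout as one_hot_data.csv (Recipe_name, Rating, then one-hot ingredient columns), scores every recipe once and sorts.  `rank_recipes` and the toy data are hypothetical and only illustrate the idea.

```python
import numpy as np
import pandas as pd

def rank_recipes(frame, U, n=5, lowest=False):
    # Hypothetical helper: frame has Recipe_name, Rating and one-hot
    # ingredient columns (the layout I assume for one_hot_data.csv).
    X = frame.drop(columns=['Rating', 'Recipe_name']).to_numpy(dtype=float)
    scores = X.dot(U.T).ravel()            # one predicted rating per recipe
    ranked = pd.Series(scores, index=frame.Recipe_name.to_numpy())
    ranked = ranked.sort_values(ascending=lowest)
    return ranked.head(n)

# Toy data for illustration only (made-up recipes and weights).
toy = pd.DataFrame({
    'Recipe_name': ['A', 'B', 'C'],
    'Rating': [np.nan, np.nan, np.nan],
    'garlic': [1, 0, 1],
    'tofu': [0, 1, 1],
})
U_toy = np.array([[3.0, 1.0]])
top = rank_recipes(toy, U_toy, n=2)
print(top)
```

Sorting always terminates and returns the same list every time, at the cost of the variety that random sampling gives.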

Let's recommend a few recipes!

In [127]:
recommend()

['Chilaquiles with Fried Eggs',
 'Party-Ready Italian Heros',
 'Steak Salad with Horseradish Dressing',
 'Tofu Sloppy Joes',
 'Smoked Salmon Smørrebrød']

And let's see what the model thinks I won't like:

In [128]:
recommend(neg=True)

['Antipasti Hand Salad',
 'Potluck Chopped Salad',
 "BA's Best Bolognese",
 'Bo Zai Fan (Chinese Chicken and Mushroom Clay Pot Rice)',
 'Chicken Khao Soi']

I have to say the recommendations look pretty good . . . as do most of the negatives.

I guess I'll just have to make more food to get more data!

In [133]:
recommend()

['Spicy Cabbage Salad with Turkey and Peanuts',
 'Spring Chicken Salad with Smashed Green Beans',
 'Chicken Stew with Cannellini Beans and Dried Cherries',
 'Eggplant and Country Ham Ragù',
 'Soba with Tofu and Miso-Mustard Dressing']