In [None]:
import pandas as pd
import numpy as np

In [None]:
# Skip malformed rows while reading (on_bad_lines requires pandas >= 1.3)
recipes = pd.read_csv('myRec/app/recommender_comp/datasets/new_allrecipes.csv', sep=",", on_bad_lines='skip', encoding="latin-1")
recipes

In [33]:
recipes['ingredients'].head()

0    ['5 cups cubed potatoes', '2 cups carrots, sli...
1    ['1/2 cup butter', '1 cup white sugar', '1 cup...
2    ['1 (10 pound) whole goose', '2 tablespoons ko...
3    ['1 cup packed brown sugar', '1 cup white suga...
4    ['10 pounds white potatoes, peeled and quarter...
Name: ingredients, dtype: object

In [34]:
recipes['instructions'].head()

0    ['In a 4 quart casserole dish combine cubed po...
1    ['Cream 1/2 cup butter or margarine and 1 cup ...
2    ['Rinse goose and pat dry. Remove excess fat. ...
3    ['In a saucepan, combine the brown sugar, whit...
4    ['Preheat oven to 350 degrees F (175 degrees C...
Name: instructions, dtype: object

###  Compute Term Frequency-Inverse Document Frequency (TF-IDF) vectors for each document
This will give you a matrix where each column represents a word in the ingredients vocabulary (all the words that appear in at least one document) and each row represents a recipe.

The TF-IDF score is the frequency of a word occurring in a document, down-weighted by the number of documents in which it occurs. This reduces the importance of words that occur frequently across ingredient lists and, therefore, their significance in computing the final similarity score.
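
To make the down-weighting concrete, here is a minimal sketch that computes the textbook form, term count times log(N / df), by hand. The three ingredient strings and the helper name tf_idf are made up for illustration. scikit-learn's TfidfVectorizer uses a smoothed IDF and L2-normalizes each row by default, so its numbers will differ from these, but the effect on common versus rare words is similar.

```python
import math
from collections import Counter

# Toy corpus (made up for illustration, not taken from the dataset)
docs = [
    "butter sugar flour banana",
    "butter sugar flour cinnamon",
    "butter mushroom garlic",
]
tokenized = [d.split() for d in docs]
n_docs = len(tokenized)

# Document frequency: number of documents that contain each term
df = Counter(term for tokens in tokenized for term in set(tokens))

def tf_idf(tokens):
    """Textbook TF-IDF: raw term count times log(N / df)."""
    counts = Counter(tokens)
    return {term: counts[term] * math.log(n_docs / df[term]) for term in counts}

# Scores for the first document, highest first
scores = tf_idf(tokenized[0])
for term, score in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{term:10s} {score:.3f}")
```

In this toy corpus "butter" appears in every document and scores zero, while "banana" appears in only one and scores highest.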

In [35]:
#Import TfIdfVectorizer from scikit-learn
from sklearn.feature_extraction.text import TfidfVectorizer

In [36]:
#Define a TF-IDF Vectorizer Object. Remove all english stop words such as 'the', 'a'
tfidf = TfidfVectorizer(stop_words='english')

In [37]:
#Replace NaN with an empty string
recipes['ingredients'] =recipes['ingredients'].fillna('')
recipes['ingredients'].head()

0    ['5 cups cubed potatoes', '2 cups carrots, sli...
1    ['1/2 cup butter', '1 cup white sugar', '1 cup...
2    ['1 (10 pound) whole goose', '2 tablespoons ko...
3    ['1 cup packed brown sugar', '1 cup white suga...
4    ['10 pounds white potatoes, peeled and quarter...
Name: ingredients, dtype: object

In [38]:
#Construct the required TF-IDF matrix by fitting and transforming the data
tfidf_matrix = tfidf.fit_transform(recipes['ingredients'])

## 1621 words were used to describe ingredients, for 5422 recipes

In [39]:
#Output the shape of tfidf_matrix
tfidf_matrix.shape

(5422, 1621)

You will use cosine similarity to calculate a numeric quantity that denotes the similarity between two recipes. The cosine similarity score is independent of vector magnitude and is relatively easy and fast to calculate, especially in conjunction with TF-IDF scores, as explained below. Mathematically, it is defined as follows:

$$\text{cosine}(x, y) = \frac{x \cdot y^\top}{\lVert x \rVert \, \lVert y \rVert}$$

Since TfidfVectorizer L2-normalizes each row by default, every recipe vector has unit length, so calculating the dot product directly gives you the cosine similarity score. Therefore, you will use sklearn's linear_kernel() instead of cosine_similarity() since it is faster.
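
The shortcut relies on the rows having unit length, which TfidfVectorizer provides by default (norm='l2'). The following sketch checks the identity on two made-up weight vectors using only the standard library. The helper names are illustrative and not part of scikit-learn.

```python
import math

def dot(x, y):
    """Plain dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(x, y))

def norm(x):
    """Euclidean (L2) length of a vector."""
    return math.sqrt(dot(x, x))

def cosine(x, y):
    """Cosine similarity: dot product divided by the product of the lengths."""
    return dot(x, y) / (norm(x) * norm(y))

def l2_normalize(x):
    """Scale a vector to unit length."""
    n = norm(x)
    return [a / n for a in x]

# Two made-up weight vectors over the same four-term vocabulary
x = [1.0, 2.0, 0.0, 2.0]
y = [2.0, 0.0, 1.0, 2.0]

x_unit, y_unit = l2_normalize(x), l2_normalize(y)

print(cosine(x, y))         # cosine similarity of the raw vectors
print(dot(x_unit, y_unit))  # plain dot product of the unit-length vectors
```

Both printed values agree up to floating-point rounding, which is why a plain dot product is enough once the rows are normalized.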

In [40]:
# Import linear_kernel
from sklearn.metrics.pairwise import linear_kernel

In [41]:
# Compute the cosine similarity matrix
cosine_sim = linear_kernel(tfidf_matrix, tfidf_matrix)

You're going to define a function that takes a recipe title as input and outputs a list of the 10 most similar recipes. First, you need a reverse mapping of recipe titles to DataFrame indices. In other words, you need a mechanism to identify the index of a recipe in the recipes DataFrame, given its title.

In [42]:
#Construct a reverse map of indices and recipe titles
indices = pd.Series(recipes.index, index=recipes['title']).drop_duplicates()

- Get the index of the recipe given its title.
- Get the list of cosine similarity scores for that particular recipe with all recipes. Convert it into a list of tuples where the first element is its position and the second is the similarity score.
- Sort this list of tuples by similarity score, that is, by the second element.
- Get the top 10 elements of this list. Ignore the first element, as it refers to the recipe itself (the recipe most similar to a particular recipe is that same recipe).
- Return the titles corresponding to the indices of the top elements.

In [None]:
# Function that takes in a recipe title as input and outputs the most similar recipes
def get_recommendations(title, cosine_sim=cosine_sim):
    # Get the index of the recipe that matches the title
    idx = indices[title]

    # Get the pairwise similarity scores of all recipes with that recipe
    sim_scores = list(enumerate(cosine_sim[idx]))

    # Sort the recipes based on the similarity scores
    sim_scores = sorted(sim_scores, key=lambda x: x[1], reverse=True)

    # Get the scores of the 10 most similar recipes (skip the first, which is the recipe itself)
    sim_scores = sim_scores[1:11]

    # Get the recipe indices
    recipe_indices = [i[0] for i in sim_scores]

    # Return the titles of the 10 most similar recipes
    return recipes['title'].iloc[recipe_indices]

In [44]:
get_recommendations('Banana Crumb Muffins')

1009               Banana Split Cookies
2586             Cinnamon Sugar Cookies
4344         Janine's Best Banana Bread
2065                    Pineapple Puffs
717                  Butterscotch Bread
5007    Grandma's Homemade Banana Bread
2411             Lighter Banana Muffins
4214                    Streusel Kuchen
2959                Finnish Pannu Kakku
5054                Banana Bran Muffins
Name: title, dtype: object

In [45]:
get_recommendations('Stuffed Mushrooms IV')

2094    Best Ever Meatloaf with Brown Gravy
3856                      Mushroom Meatloaf
3970                       Mushroom Risotto
1924                            Peanut Soup
2985                   Tomato-Mushroom Soup
4712            Oven Fried Parmesan Chicken
4534       Mouth-Watering Stuffed Mushrooms
3105       Mushroom Stuffed Chicken Rollups
5160                         Mushroom Sauce
3021                       Creamy Corn Soup
Name: title, dtype: object

In [46]:
get_recommendations('Maple Roast Turkey')

4517        Maple Roast Turkey and Gravy
2340     Awesome Tangerine-Glazed Turkey
4813                   Vegetable Chowder
4533            Beef and Barley Soup III
3667                   Oyster Dressing I
4862    Ibby's Pumpkin Mushroom Stuffing
1874                 Veggie Cheddar Soup
4024                 Chicken Jambalaya I
1433                   Mushroom Stuffing
5387               Mulligatawny Soup III
Name: title, dtype: object

# Ingredients, Category and Description Based Recommender

The movie version of this recommender uses the 3 top actors, the director, related genres and the movie plot keywords.

The recipe equivalents are: category, instructions keywords, description.

In [None]:
# Skip malformed rows while reading (on_bad_lines requires pandas >= 1.3)
description = pd.read_csv('myRec/app/recommender_comp/datasets/data_description.csv', sep=",", on_bad_lines='skip', encoding="latin-1")
description

In [None]:
recipes = pd.merge(recipes, description.iloc[:, [0,4]], how='left', on='id')
recipes.head()

In [None]:
from ast import literal_eval

In [None]:
features = ['ingredients', 'calories', 'category', 'description']

In [None]:
import numpy as np

In [None]:

recipes[features].head()

In [81]:
from functools import reduce

# Substrings to strip from the text, as (old, new) pairs.
# Longer forms come first so 'cups' is removed before 'cup'.
repls = ('cups', ''), ('cup', ''), ('potatoes', ''), ('tablespoons', ''), ('pounds', ''), ('pound', '')

def strip_terms(text):
    # Apply every replacement in turn, then convert to lower case
    return str.lower(reduce(lambda a, kv: a.replace(*kv), repls, text))

# Function to strip the terms above and convert all strings to lower case
def clean_data(x):
    if isinstance(x, list):
        # Clean each item of the list individually
        return [strip_terms(i) for i in x]
    else:
        # Check if the value is a string. If not (e.g. NaN), return empty string
        if isinstance(x, str):
            return strip_terms(x)
        else:
            return ''

In [None]:
# Apply clean_data function to your features.
features = ['ingredients', 'category', 'description']

for feature in features:
    recipes[feature] = recipes[feature].apply(clean_data)

    
recipes.head()

In [None]:
recipes[features].head()

In [69]:
def create_soup(x):
    return ' '.join(x['ingredients']) + ' ' + x['category'] + ' ' + ' '.join(x['description'])

In [70]:
# Create a new soup feature
recipes['soup'] = recipes.apply(create_soup, axis=1)

In [82]:
# Import CountVectorizer and create the count matrix
from sklearn.feature_extraction.text import CountVectorizer

count = CountVectorizer(stop_words='english')
count_matrix = count.fit_transform(recipes['soup'])
count_matrix.shape

(5422, 28)

In [74]:
# Compute the Cosine Similarity matrix based on the count_matrix
from sklearn.metrics.pairwise import cosine_similarity

cosine_sim2 = cosine_similarity(count_matrix, count_matrix)


In [78]:
# Construct the reverse mapping of recipe titles to DataFrame indices as before

indices = pd.Series(recipes.index, index=recipes['title']).drop_duplicates()
recipes['soup']

0       [ ' 5     c u b e d   ' ,   ' 2     c a r r o ...
1       [ ' 1 / 2     b u t t e r ' ,   ' 1     w h i ...
2       [ ' 1   ( 1 0   )   w h o l e   g o o s e ' , ...
3       [ ' 1     p a c k e d   b r o w n   s u g a r ...
4       [ ' 1 0     w h i t e   ,   p e e l e d   a n ...
5       [ ' 4   e g g   w h i t e s ' ,   ' 1 / 4   t ...
6       [ ' 1   ( 9   i n c h )   p r e p a r e d   c ...
7       [ ' 1   ( 9   i n c h )   p r e p a r e d   g ...
8       [ ' 1   ( 9   i n c h )   p i e   s h e l l , ...
9       [ ' 1   ( 9   i n c h )   u n b a k e d   p i ...
10      [ ' 1     g o l d e n   d e l i c i o u s   a ...
11      [ ' 2   ( 1   o u n c e )   s q u a r e s   u ...
12      [ ' 1   ( 9   i n c h )   p i e   s h e l l ' ...
13      [ ' 1   1 / 2     s o u r   c r e a m ' ,   ' ...
14      [ ' 4     g r o u n d   h a m ' ,   ' 1 / 3   ...
15      [ ' 1     r o t i n i   p a s t a ' ,   ' 1   ...
16      [ ' 2     u n s a l t e d   b u t t e r ' ,   ...
17      [ ' 1 
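
The output above shows a space between every character, which is what ' '.join(...) produces when it is given a string rather than a list. CountVectorizer's default token pattern only keeps tokens of two or more characters, which would also explain the very small vocabulary. A possible fix, sketched here under the assumption that the ingredients column holds stringified lists and that category and description are plain strings, is to parse the list with ast.literal_eval before joining. The helper names parse_list and create_soup_from_strings are hypothetical, as is the sample row.

```python
from ast import literal_eval

def parse_list(value):
    """Turn a stringified list such as "['a', 'b']" into a real list of strings."""
    if isinstance(value, list):
        return list(value)
    if not isinstance(value, str) or not value.strip():
        return []
    try:
        parsed = literal_eval(value)
    except (ValueError, SyntaxError):
        # Not a valid Python literal: treat the whole string as a single item
        return [value]
    if isinstance(parsed, list):
        return [str(item) for item in parsed]
    return [str(parsed)]

def create_soup_from_strings(row):
    """Join ingredient items with spaces; append category and description whole."""
    parts = parse_list(row.get('ingredients', ''))
    parts.append(str(row.get('category', '') or ''))
    parts.append(str(row.get('description', '') or ''))
    return ' '.join(p for p in parts if p)

# Made-up row, shaped like one record of the recipes DataFrame
row = {
    'ingredients': "['1/2 cup butter', '1 cup white sugar']",
    'category': 'desserts',
    'description': 'a quick cookie recipe',
}
print(create_soup_from_strings(row))
```

On the DataFrame this could be applied with recipes.apply(create_soup_from_strings, axis=1), assuming missing values were already replaced with empty strings. The similarity matrix and the recommendations would then need to be recomputed.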

In [225]:
get_recommendations('Maple Roast Turkey', cosine_sim2)

87          Spicy Pork and Cabbage
116         Dinner in a Pumpkin II
126                    Pork Afelia
162    Almost Beau Monde Seasoning
166            Beef Summer Sausage
182                  Beef and Brew
185               Carrot Rice Loaf
235         Cottage Cheese Loaf II
265                    Yuck-a-Muck
360        Texas Chili Beef Slices
Name: title, dtype: object

In [226]:
get_recommendations('Banana Crumb Muffins', cosine_sim2)

20            Chocolate Coffee Bread
24              Sweet Bread Overnite
42                  Rhubarb Bread II
47                 Totally Rye Bread
51            Whole Wheat Croissants
76                  Pumpkin Bread II
80                         Churros I
86            Amber's Zucchini Bread
100    Pineapple Macadamia Nut Bread
135                       Prusurates
Name: title, dtype: object

In [227]:
get_recommendations('Stuffed Mushrooms IV', cosine_sim2)

72                     Crunchy Carrot Ball
77            Italian Fried Eggplant Balls
124                   Pickled Pig's Feet I
127                      Legion Cheese Dip
132                   Light Pimento Cheese
139    Franks in Peanut Butter and Chutney
147                 Creamy Garlic Escargot
150                          Cheeseball II
180                 Original Buffalo Wings
193                  Good, Good Greenbeans
Name: title, dtype: object