# A Recommendation Engine for The Recipes by Using Collaborative Filtering in Python

#### About the Dataset:
pre-existing dataset for food.com from Kaggle, which includes two CSV files. One is the interactions_train CSV file containing around 160,000 recipe IDs rated by 25,000 user IDs in a total of 699,000 records, and the other is the RAW_recipes CSV file containing 230,000 recipes with names, ingredients, description, and steps, etc.

In [1]:
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.metrics.pairwise import pairwise_distances

In [2]:
interactions_data = pd.read_csv('./Data_Files/interactions_train.csv')
recipes_data = pd.read_csv('./Data_Files/RAW_recipes.csv')

## Data Preprocessing

We reate a new DataFrame _all by dropping unnecessary columns ('date', 'u', 'i') from the 'interactions_train.csv' dataset.

In [3]:
interactions_data_new = interactions_data.drop(['date', 'u', 'i'], axis = 1)

We group the data by 'user_id', count the number of recipes each user has reviewed, rename the column, and select the top 7500 users based on the number of reviews.

In [4]:
grouped_users = interactions_data_new.groupby(['user_id'], as_index = False, sort = False).agg({'recipe_id':'count'}).reset_index(drop = True)
grouped_users = grouped_users.rename(columns = {'recipe_id':'reviews_count'})
grouped_users = grouped_users.sort_values('reviews_count', ascending = False).iloc[:7500,:]

Similarly, we group the data by 'recipe_id', count the number of reviews for each recipe, rename the column, and select the top 7500 recipes based on the number of reviews.

In [5]:
grouped_recipes = interactions_data_new.groupby(['recipe_id'], as_index = False, sort = False).agg({'user_id':'count'}).reset_index(drop = True)
grouped_recipes = grouped_recipes.rename(columns = {'user_id':'reviews_count'})
grouped_recipes = grouped_recipes.sort_values('reviews_count', ascending = False).iloc[:7500,:]

We merge the original data with the user and recipe information, removing unnecessary columns.

In [6]:
merged_data = pd.merge(interactions_data_new.merge(grouped_users).drop(['reviews_count'], axis = 1), grouped_recipes).drop(['reviews_count'], axis = 1)

The following lines create new DataFrames, grouped_user and grouped_recipe, by grouping data by user and recipe and counting the number of reviews for each.

In [7]:
grouped_user = merged_data.groupby(['user_id'], as_index = False, sort = False).agg({'recipe_id':'count'}).reset_index(drop = True)
grouped_user = grouped_user.rename(columns = {'recipe_id':'reviews_count'})

grouped_recipe = merged_data.groupby(['recipe_id'], as_index = False, sort = False).agg({'user_id':'count'}).reset_index(drop = True)
grouped_recipe = grouped_recipe.rename(columns = {'user_id':'reviews_count'})

These lines create dictionaries to map original user and recipe IDs to new IDs.

In [8]:
new_userID = dict(zip(list(merged_data['user_id'].unique()),
                      list(range(len(merged_data['user_id'].unique())))))

new_recipeID = dict(zip(list(merged_data['recipe_id'].unique()),
                      list(range(len(merged_data['recipe_id'].unique())))))

It replaces the original user and recipe IDs in the dataset with the new IDs.

In [9]:
df = merged_data.replace({'user_id': new_userID, 'recipe_id': new_recipeID})

We creates a new DataFrame recipe by merging recipe names and ingredients with the updated dataset.

In [10]:
recipe = recipes_data[['name', 'id', 'ingredients']].merge(merged_data[['recipe_id']], left_on = 'id', right_on = 'recipe_id', how = 'right').drop(['id'], axis = 1).drop_duplicates().reset_index(drop = True)

Now we calculate the mean rating for each user and add columns for the mean rating and adjusted rating to the dataset.

In [11]:
mean = df.groupby(['user_id'], as_index = False, sort = False).mean().rename(columns = {'rating':'mean_rating'})
df = df.merge(mean[['user_id','mean_rating']], how = 'left')
df.insert(2, 'adjusted_rating', df['rating'] - df['mean_rating'])

We split the dataset into training and testing sets for model evaluation.

In [12]:
train_data, test_data = train_test_split(df, test_size = 0.25)

These lines create a user-item matrix for the training data, where each entry represents a user's rating for a recipe.

In [13]:
number_users = df.user_id.unique()
number_items = df.recipe_id.unique()

train_data_matrix = np.zeros((number_users.shape[0], number_items.shape[0]))
for row in train_data.itertuples():
    train_data_matrix[row[1]-1, row[2]-1] = row[3]

Similarly, we create a user-item matrix for the testing data.

In [14]:
test_data_matrix = np.zeros((number_users.shape[0], number_items.shape[0]))
for row in test_data.itertuples():
    test_data_matrix[row[1]-1, row[2]-1] = row[3]

## Centered cosine similarity

We calculates user similarity based on the centered cosine similarity metric.

In [15]:
user_similarity = 1 - pairwise_distances(train_data_matrix, metric = 'cosine')

## Prediction of ratings

This function predict takes user ratings, similarity matrix, and type ('user' or 'item') as input and returns the predicted ratings.

In [16]:
def predict(ratings, similarity):
    pred = similarity.dot(ratings) / np.array([np.abs(similarity).sum(axis = np.newaxis)]) 
    return pred

We generate user-based predictions for the training data.

In [17]:
prediction = predict(train_data_matrix, user_similarity)

Next we are creating a DataFrame containing user-based predictions for each recipe.

In [18]:
prediction_df = pd.DataFrame(prediction, columns = list(number_items))
prediction_df.insert(0, 'user_id', list(number_users))

## Recommendation Engine

This function getRecommendations_UserBased takes a user ID and the number of top recommendations as input and prints the top recommended recipes for that user.

In [25]:
def get_recommendations(user_id, top_n = 10):
    for old_user, new_user in new_userID.items():
        if user_id == new_user:
            print(f'Top {top_n} Recommended Recipes for Original User ID: {old_user}\n')
    
    recipes_rated = list(df['recipe_id'].loc[df['user_id'] == user_id])
    predictions = prediction_df.loc[prediction_df['user_id'] == user_id].copy()
    predictions.drop(prediction_df[recipes_rated], axis = 1, inplace = True)
    unwatch_sorted = prediction_df.iloc[:,1:].sort_values(by = predictions.index[0], axis = 1, ascending = False)
    top_preds = unwatch_sorted.iloc[:, :top_n].to_dict(orient = 'records')

    i = 1
    for recipe_id in list(top_preds[0].keys()):
        for old_recipe, new_recipe in new_recipeID.items():
            if recipe_id == new_recipe:
                name = recipe[recipe['recipe_id'] == old_recipe]['name'].values[0]
                ingredients = recipe[recipe['recipe_id'] == old_recipe]['ingredients'].values[0]

                print(f'Top {i} Original Recipe ID: {old_recipe} - {name}\n Ingredients: {ingredients}\n')
                
                i += 1
                
    return top_preds[0]

It calls the getRecommendations_UserBased function for a specific user ID (702) and displays the top 10 recommendations.

In [26]:
recommendation2 = get_recommendations(1)

Top 10 Recommended Recipes for Original User ID: 11044
Top 1 Original Recipe ID: 210757 - weight watchers broccoli cheese soup   2 pts per cup
 Ingredients: ['chicken broth', 'frozen broccoli', 'tomatoes and green chilies', 'velveeta reduced fat cheese product']

Top 2 Original Recipe ID: 97838 - super easy honey curry chicken
 Ingredients: ['chicken thighs', 'butter', 'honey', 'mustard', 'salt', 'curry powder']

Top 3 Original Recipe ID: 4571 - layer cookies  magic layer bars
 Ingredients: ['butter', 'graham cracker', 'flaked coconut', 'chocolate chips', 'butterscotch chips', 'sweetened condensed milk', 'nuts']

Top 4 Original Recipe ID: 13640 - mifgash mushrooms
 Ingredients: ['oil', 'onion', 'button mushroom', 'soup mix', 'paprika', 'black pepper', 'water']

Top 5 Original Recipe ID: 68091 - fake rotisserie chicken
 Ingredients: ['chicken', 'onion', 'lemon', 'tarragon', 'soy sauce']

Top 6 Original Recipe ID: 100474 - easy crispy taco turnovers
 Ingredients: ['lean ground beef', 'on

Similarly, it calls the getRecommendations_UserBased function for another user ID (408) with a specified number of top 5 recommendations.

In [28]:
recommendations = get_recommendations(12, 8)

Top 8 Recommended Recipes for Original User ID: 29063

Top 1 Original Recipe ID: 3368 - blackberry pie iii
 Ingredients: ['sugar', 'all-purpose flour', 'cornstarch', 'salt', 'blackberries', 'pastry for double-crust pie']

Top 2 Original Recipe ID: 308022 - summer egg and bacon scramble
 Ingredients: ['onion', 'red pepper', 'olive oil', 'eggs', 'milk', 'bacon', 'toast', 'salt and pepper']

Top 3 Original Recipe ID: 363073 - kristen s grilled cheese and red onion sandwich
 Ingredients: ['rye bread', 'red onions', 'cheddar cheese', 'butter', 'pepper']

Top 4 Original Recipe ID: 273838 - moroccan harira soup
 Ingredients: ['lentils', 'olive oil', 'onion', 'fresh parsley', 'cilantro', 'ground ginger', 'cinnamon', 'diced tomatoes', 'vegetable broth', 'chickpeas', 'orzo pasta', 'flour', 'lemon juice', 'tomato paste']

Top 5 Original Recipe ID: 302000 - ww tomato salad with red onion and basil 2 points
 Ingredients: ['red wine vinegar', 'olive oil', 'sugar', 'salt', 'dijon mustard', 'fresh gro