# I. Project Team Members

| Prepared by | Email | Prepared for |
| :-: | :-: | :-: |
| **Hardefa Rogonondo** | hardefarogonondo@gmail.com | **Food.com Recipe Recommendation Engine** |

# II. Notebook Target Definition

This notebook covers the feature engineering phase of the Food.com Recipe Recommendation Engine project. Unlike conventional feature engineering, where new attributes are derived from existing columns, the work here is specific to recommendation systems. Starting from the cleaned and preprocessed Food.com dataset, we build a sparse user-item matrix in which each row is a user, each column is a recipe, and each stored value is a rating. This matrix is the input for model training, so this step prepares the data for the next phase: model building and validation.
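To make the idea of a user-item matrix concrete, here is a minimal sketch on a toy table. The IDs and ratings are invented for illustration, and `pd.factorize` is used only as a compact way to obtain contiguous indices; it is not the mapping approach used later in this notebook.

```python
import pandas as pd
from scipy.sparse import csr_matrix

# Hypothetical toy ratings; the IDs and values are made up for illustration.
toy = pd.DataFrame({
    "user_id": [101, 101, 202, 303],
    "recipe_id": [9001, 9002, 9002, 9003],
    "rating": [5, 3, 4, 1],
})

# Map raw IDs to contiguous row/column positions (order of first appearance).
user_codes, user_index = pd.factorize(toy["user_id"])
recipe_codes, recipe_index = pd.factorize(toy["recipe_id"])

# Rows are users, columns are recipes, stored values are ratings.
toy_matrix = csr_matrix(
    (toy["rating"].to_numpy(), (user_codes, recipe_codes)),
    shape=(len(user_index), len(recipe_index)),
)
print(toy_matrix.toarray())
```

Each of the three users occupies one row and each of the three recipes one column; cells without a rating remain unstored and show up as 0 when the matrix is densified.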

# III. Notebook Setup

## III.A. Import Libraries

In [1]:
from scipy.sparse import csr_matrix
import numpy as np
import pandas as pd
import pickle

pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

## III.B. Import Data

In [2]:
train = pd.read_pickle('../../data/processed/train.pkl')
test = pd.read_pickle('../../data/processed/test.pkl')

In [3]:
train.head()

| | user_id | recipe_id | date | rating | review | name | minutes | contributor_id | submitted | tags | nutrition | n_steps | steps | description | ingredients | n_ingredients |
| :-: | :-: | :-: | :-: | :-: | :-: | :-: | :-: | :-: | :-: | :-: | :-: | :-: | :-: | :-: | :-: | :-: |
| 59204 | 794029 | 144195 | 2016-09-12 | 5 | This was great. Others had commented that the ... | kittencal s perfect pesto | 1 | 89831 | 2005-11-07 | ['15-minutes-or-less', 'time-to-make', 'course... | [1418.9, 226.0, 9.0, 89.0, 43.0, 107.0, 4.0] | 6 | ['combine the basil , olive oil , pine nuts an... | i have experimented with many ingredient amoun... | ['fresh basil leaves', 'olive oil', 'pine nuts... | 7 |
| 51201 | 128473 | 527468 | 2016-08-08 | 5 | Jackie we loved this jerk pork recipe, it was ... | grilled jerk pork tenderloin | 20 | 386585 | 2016-07-08 | ['30-minutes-or-less', 'time-to-make', 'course... | [268.0, 7.0, 106.0, 5.0, 55.0, 8.0, 9.0] | 7 | ['preheat grill to medium-high heat', 'in a me... | the zesty spices and refreshing mango make you... | ['mangoes', 'red onions', 'light brown sugar',... | 8 |
| 56295 | 2001345166 | 20376 | 2017-02-01 | 5 | I am told that these are the best cookies I've... | nestle oatmeal scotchies | 25 | 28846 | 2002-02-22 | ['30-minutes-or-less', 'time-to-make', 'course... | [1311.5, 109.0, 520.0, 47.0, 18.0, 244.0, 53.0] | 14 | ['preheat oven to 375 degrees f', 'combine flo... | one of our families favorite cookies. i often ... | ['unsifted flour', 'baking soda', 'salt', 'cin... | 11 |
| 61487 | 2001276950 | 293275 | 2017-02-11 | 5 | After trying several different stew recipes I ... | quick and easy beef stew | 25 | 742802 | 2008-03-21 | ['30-minutes-or-less', 'time-to-make', 'course... | [496.2, 21.0, 97.0, 871.0, 48.0, 28.0, 23.0] | 8 | ['in a medium mixing bowl put flour and teaspo... | we don't eat red meat very often in our house,... | ['sirloin tip roast', 'potatoes', 'carrots', '... | 11 |
| 16278 | 1742738 | 448113 | 2018-03-27 | 5 | This was very, very tasty. Only change I made ... | crockpot hamburger stroganoff | 390 | 680724 | 2011-02-04 | ['course', 'preparation', 'main-dish', 'crock-... | [184.3, 17.0, 21.0, 30.0, 5.0, 33.0, 3.0] | 7 | ['brown ground beef with onion and garlic', 'd... | i have searched high and low for the quickest ... | ['lean ground beef', 'onion', 'garlic clove', ... | 10 |


In [4]:
test.head()

| | user_id | recipe_id | date | rating | review | name | minutes | contributor_id | submitted | tags | nutrition | n_steps | steps | description | ingredients | n_ingredients |
| :-: | :-: | :-: | :-: | :-: | :-: | :-: | :-: | :-: | :-: | :-: | :-: | :-: | :-: | :-: | :-: | :-: |
| 42674 | 2000301121 | 59987 | 2017-11-26 | 0 | I just made some with my left over turkey and ... | chicken or turkey salad | 130 | 58407 | 2003-04-17 | ['lactose', 'weeknight', 'time-to-make', 'cour... | [204.2, 13.0, 11.0, 28.0, 49.0, 12.0, 1.0] | 7 | ['combine chicken and celery in bowl', 'mix sa... | this is very good served over a bed of lettuce... | ['cooked chicken', 'celery', 'salad dressing',... | 7 |
| 54112 | 2001102678 | 40725 | 2017-04-18 | 5 | love the punch | island fruit punch | 5 | 30716 | 2002-09-20 | ['15-minutes-or-less', 'time-to-make', 'course... | [224.6, 1.0, 161.0, 0.0, 4.0, 0.0, 18.0] | 6 | ['chill all of the ingredients', 'you can even... | fresh and fruity tasting cocktail. i recommend... | ['pineapple', 'pineapple juice', 'mango', 'ban... | 8 |
| 47057 | 45027729 | 444189 | 2018-06-24 | 1 | There is no club soda in a real Old Fashioned. | whiskey old fashioned | 5 | 177443 | 2010-12-14 | ['15-minutes-or-less', 'time-to-make', 'course... | [34.9, 0.0, 30.0, 0.0, 0.0, 0.0, 3.0] | 15 | ['first you will want to grab a cocktail or ro... | an old-fashioned whiskey cocktail. this is the... | ['bourbon whiskey', 'sugar', 'angostura bitter... | 9 |
| 41159 | 1277979 | 58047 | 2016-10-03 | 5 | This method of preparation sounds great, and I... | ratatouille crock pot | 200 | 8688 | 2003-04-04 | ['time-to-make', 'course', 'main-ingredient', ... | [195.7, 16.0, 52.0, 18.0, 10.0, 7.0, 8.0] | 10 | ['sprinkle the eggplant with salt', 'let stand... | make a whole meal out of the french vegetarian... | ['eggplant', 'salt', 'onions', 'fresh tomatoes... | 14 |
| 12017 | 2000788124 | 8782 | 2016-05-21 | 5 | Thanks so much for this recipe! Its reallllly ... | roast sticky chicken | 325 | 6411 | 2001-04-20 | ['weeknight', 'time-to-make', 'course', 'main-... | [238.0, 24.0, 9.0, 67.0, 35.0, 22.0, 2.0] | 12 | ['blend all spices together and set aside', 'r... | beautiful and delicious, this incredibly moist... | ['salt', 'paprika', 'cayenne pepper', 'onion p... | 10 |


# IV. Feature Engineering

## IV.A. Data Shape Inspection

In [5]:
train.shape, test.shape

((13652, 16), (5851, 16))

## IV.B. Data Information Inspection

In [6]:
train.info()

<class 'pandas.core.frame.DataFrame'>
Index: 13652 entries, 59204 to 29511
Data columns (total 16 columns):
 #   Column          Non-Null Count  Dtype         
---  ------          --------------  -----         
 0   user_id         13652 non-null  int64         
 1   recipe_id       13652 non-null  int64         
 2   date            13652 non-null  datetime64[ns]
 3   rating          13652 non-null  int64         
 4   review          13639 non-null  object        
 5   name            13652 non-null  object        
 6   minutes         13652 non-null  int64         
 7   contributor_id  13652 non-null  int64         
 8   submitted       13652 non-null  datetime64[ns]
 9   tags            13652 non-null  object        
 10  nutrition       13652 non-null  object        
 11  n_steps         13652 non-null  int64         
 12  steps           13652 non-null  object        
 13  description     13652 non-null  object        
 14  ingredients     13652 non-null  object        
 15  n_i

In [7]:
test.info()

<class 'pandas.core.frame.DataFrame'>
Index: 5851 entries, 42674 to 46481
Data columns (total 16 columns):
 #   Column          Non-Null Count  Dtype         
---  ------          --------------  -----         
 0   user_id         5851 non-null   int64         
 1   recipe_id       5851 non-null   int64         
 2   date            5851 non-null   datetime64[ns]
 3   rating          5851 non-null   int64         
 4   review          5849 non-null   object        
 5   name            5851 non-null   object        
 6   minutes         5851 non-null   int64         
 7   contributor_id  5851 non-null   int64         
 8   submitted       5851 non-null   datetime64[ns]
 9   tags            5851 non-null   object        
 10  nutrition       5851 non-null   object        
 11  n_steps         5851 non-null   int64         
 12  steps           5851 non-null   object        
 13  description     5851 non-null   object        
 14  ingredients     5851 non-null   object        
 15  n_in

## IV.C. Unused Feature Removal

In [8]:
def unused_feat_removal(df, feature_to_remove):
    # Drops the listed columns in place; the same DataFrame is returned for convenience.
    df.drop(columns=feature_to_remove, inplace=True)
    return df

In [9]:
feature_to_remove = ["date", "review", "name", "minutes", "contributor_id", "submitted",
                     "tags", "nutrition", "n_steps", "steps", "description", "ingredients", "n_ingredients"]

In [10]:
unused_feat_removal(train, feature_to_remove)
unused_feat_removal(test, feature_to_remove)
train.shape, test.shape

((13652, 3), (5851, 3))

In [11]:
train.head()

| | user_id | recipe_id | rating |
| :-: | :-: | :-: | :-: |
| 59204 | 794029 | 144195 | 5 |
| 51201 | 128473 | 527468 | 5 |
| 56295 | 2001345166 | 20376 | 5 |
| 61487 | 2001276950 | 293275 | 5 |
| 16278 | 1742738 | 448113 | 5 |


In [12]:
test.head()

| | user_id | recipe_id | rating |
| :-: | :-: | :-: | :-: |
| 42674 | 2000301121 | 59987 | 0 |
| 54112 | 2001102678 | 40725 | 5 |
| 47057 | 45027729 | 444189 | 1 |
| 41159 | 1277979 | 58047 | 5 |
| 12017 | 2000788124 | 8782 | 5 |


## IV.D. User-Item Matrix Creation

In [13]:
def create_mappings(train, test):
    # Users are indexed from the training set only; recipes are indexed from
    # the union of both splits so the train and test matrices share columns.
    user_mapping = {user: i for i, user in enumerate(
        train["user_id"].unique())}
    all_recipe_ids = np.union1d(
        train["recipe_id"].unique(), test["recipe_id"].unique())
    recipe_mapping = {recipe: i for i, recipe in enumerate(all_recipe_ids)}
    return user_mapping, recipe_mapping


def apply_mappings(df, user_mapping, recipe_mapping):
    # Assumes every ID in df has a mapping entry: an unmapped ID becomes NaN
    # and astype(int) then raises an error.
    df["user_id_mapped"] = df["user_id"].map(user_mapping).astype(int)
    df["recipe_id_mapped"] = df["recipe_id"].map(recipe_mapping).astype(int)
    return df


def create_sparse_user_item_matrix(df, n_users, n_items):
    # Rows are users, columns are recipes, stored values are ratings.
    rows = df["user_id_mapped"].values
    cols = df["recipe_id_mapped"].values
    data = df["rating"].values
    return csr_matrix((data, (rows, cols)), shape=(n_users, n_items))
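One property of this construction is worth knowing: when `csr_matrix` is built from `(data, (rows, cols))`, duplicate `(row, col)` pairs are summed. The sketch below uses a hypothetical toy frame and a stand-in helper named `to_matrix` to show the effect and one possible way to keep only the last rating per pair. Whether the Food.com splits contain such duplicates is not checked here.

```python
import pandas as pd
from scipy.sparse import csr_matrix

# Hypothetical toy data: user 0 rated recipe 2 twice (5, then 4).
dup = pd.DataFrame({
    "user_id_mapped": [0, 0, 1],
    "recipe_id_mapped": [2, 2, 0],
    "rating": [5, 4, 3],
})


def to_matrix(df, n_users, n_items):
    # Stand-in for create_sparse_user_item_matrix, repeated so the sketch is self-contained.
    rows = df["user_id_mapped"].to_numpy()
    cols = df["recipe_id_mapped"].to_numpy()
    data = df["rating"].to_numpy()
    return csr_matrix((data, (rows, cols)), shape=(n_users, n_items))


# Duplicate (row, col) entries are summed: cell (0, 2) becomes 5 + 4 = 9.
summed = to_matrix(dup, 2, 3)
print(summed[0, 2])

# One option (an assumption, not the notebook's method): keep the last rating per pair.
deduped = dup.drop_duplicates(
    subset=["user_id_mapped", "recipe_id_mapped"], keep="last")
clean = to_matrix(deduped, 2, 3)
print(clean[0, 2])
```

A summed cell can exceed the rating scale, so a quick `df.duplicated(subset=[...]).any()` check before building the matrix is a cheap safeguard.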

In [14]:
user_mapping, recipe_mapping = create_mappings(train, test)
train = apply_mappings(train, user_mapping, recipe_mapping)
test = apply_mappings(test, user_mapping, recipe_mapping)
# Every mapped index comes from these dictionaries, so their sizes give the matrix shape.
n_users = len(user_mapping)
n_items = len(recipe_mapping)
train_sparse_matrix = create_sparse_user_item_matrix(train, n_users, n_items)
test_sparse_matrix = create_sparse_user_item_matrix(test, n_users, n_items)
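Because the user mapping is built from the training split only, a test user who never appears in training has no row index. The following sketch, on hypothetical toy frames, shows one way to detect such users and keep only the mappable test rows before the integer cast. The names `toy_train`, `toy_test`, and `toy_test_known` are illustrative and not part of the notebook.

```python
import pandas as pd

# Hypothetical toy frames; user 30 appears only in the test split.
toy_train = pd.DataFrame(
    {"user_id": [10, 20, 10], "recipe_id": [1, 2, 3], "rating": [5, 4, 3]})
toy_test = pd.DataFrame(
    {"user_id": [20, 30], "recipe_id": [1, 2], "rating": [5, 2]})

# Same construction as create_mappings: only training users get an index.
toy_user_mapping = {user: i for i, user in enumerate(
    toy_train["user_id"].unique())}

# Users missing from the mapping come back as NaN.
mapped = toy_test["user_id"].map(toy_user_mapping)
unseen_users = toy_test.loc[mapped.isna(), "user_id"].unique()
print(f"Test users without a training row: {unseen_users.tolist()}")

# Keep only the rows that can be mapped, then cast to int safely.
toy_test_known = toy_test.loc[mapped.notna()].copy()
toy_test_known["user_id_mapped"] = mapped[mapped.notna()].astype(int)
```

Dropping unseen users is only one option; how cold-start users should be handled depends on the evaluation design chosen in the modeling phase.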

## IV.E. Final Feature Inspection

In [15]:
train_sparse_matrix.shape, test_sparse_matrix.shape

((4447, 12395), (4447, 12395))

In [16]:
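def calculate_sparsity(sparse_matrix):
    # Sparsity = percentage of cells with no stored entry.
    matrix_size = sparse_matrix.shape[0] * sparse_matrix.shape[1]
    # nnz counts stored entries, including explicitly stored zero ratings.
    num_non_zero = sparse_matrix.nnz
    sparsity = (1 - (num_non_zero / matrix_size)) * 100
    return sparsity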

In [17]:
train_sparsity = calculate_sparsity(train_sparse_matrix)
test_sparsity = calculate_sparsity(test_sparse_matrix)

print(f"Training Matrix Sparsity: {train_sparsity}%")
print(f"Testing Matrix Sparsity: {test_sparsity}%")

Training Matrix Sparsity: 99.9752324744857%
Testing Matrix Sparsity: 99.98938508703603%
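The dataset contains ratings of 0 (see the first test row above), and `nnz` counts stored entries rather than entries whose value is non-zero. The toy sketch below, with made-up values, illustrates the difference between `nnz` and `count_nonzero()` and how it affects a sparsity figure.

```python
from scipy.sparse import csr_matrix

# Hypothetical toy ratings: the second interaction has a rating of 0.
ratings = [5, 0, 3]
rows = [0, 1, 2]
cols = [1, 0, 2]
m = csr_matrix((ratings, (rows, cols)), shape=(3, 4))

n_cells = m.shape[0] * m.shape[1]
stored = m.nnz                      # stored entries, explicit zeros included
truly_nonzero = m.count_nonzero()   # entries whose value is not 0

sparsity_stored = (1 - stored / n_cells) * 100
sparsity_nonzero = (1 - truly_nonzero / n_cells) * 100
print(stored, truly_nonzero)
print(f"{sparsity_stored:.2f}% vs {sparsity_nonzero:.2f}%")
```

Using `nnz` treats a 0 rating as an observed interaction, which matches counting rows in the ratings table. Once the matrix is densified, however, a 0 rating looks the same as a missing one.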


In [18]:
train_dense_submatrix = train_sparse_matrix[:10, :10].todense()
train_submatrix = pd.DataFrame(train_dense_submatrix)
train_submatrix

Unnamed: 0,0,1,2,3,4,5,6,7,8,9
0,0,0,0,0,0,0,0,0,0,0
1,0,0,0,0,0,0,0,0,0,0
2,0,0,0,0,0,0,0,0,0,0
3,0,0,0,0,0,0,0,0,0,0
4,0,0,0,0,0,0,0,0,0,0
5,0,0,0,0,0,0,0,0,0,0
6,0,0,0,0,5,0,0,0,0,0
7,0,0,0,0,0,0,0,0,0,0
8,0,0,0,0,0,0,0,0,0,0
9,0,0,0,0,0,0,0,0,0,0


In [19]:
# Slice the test matrix here; the training matrix was inspected in the previous cell.
test_dense_submatrix = test_sparse_matrix[:10, :10].todense()
test_submatrix = pd.DataFrame(test_dense_submatrix)
test_submatrix


## IV.F. Export Data

In [20]:
with open('../../data/processed/train_sparse_matrix.pkl', 'wb') as file:
    pickle.dump(train_sparse_matrix, file)

with open('../../data/processed/test_sparse_matrix.pkl', 'wb') as file:
    pickle.dump(test_sparse_matrix, file)

with open('../../data/processed/user_mapping.pkl', 'wb') as file:
    pickle.dump(user_mapping, file)

with open('../../data/processed/recipe_mapping.pkl', 'wb') as file:
    pickle.dump(recipe_mapping, file)
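A quick round trip can confirm that an exported matrix loads back unchanged. The sketch below uses a toy matrix and a temporary directory, not the project paths. It also shows `scipy.sparse.save_npz` / `load_npz` as an alternative storage format; that is an optional suggestion, not what this notebook uses.

```python
import os
import pickle
import tempfile

from scipy.sparse import csr_matrix, load_npz, save_npz

# Hypothetical toy matrix standing in for train_sparse_matrix.
toy_matrix = csr_matrix(([5, 3], ([0, 1], [1, 0])), shape=(2, 2))

with tempfile.TemporaryDirectory() as tmp_dir:
    # Pickle round trip, mirroring the export cell above.
    pkl_path = os.path.join(tmp_dir, "toy_sparse_matrix.pkl")
    with open(pkl_path, "wb") as file:
        pickle.dump(toy_matrix, file)
    with open(pkl_path, "rb") as file:
        from_pickle = pickle.load(file)

    # Alternative: SciPy's own .npz format for sparse matrices.
    npz_path = os.path.join(tmp_dir, "toy_sparse_matrix.npz")
    save_npz(npz_path, toy_matrix)
    from_npz = load_npz(npz_path)

# Both copies should match the original cell for cell.
print((from_pickle.toarray() == toy_matrix.toarray()).all())
print((from_npz.toarray() == toy_matrix.toarray()).all())
```

Pickle files should only be loaded from trusted sources, since unpickling can execute arbitrary code. The `.npz` format avoids that concern for the matrices, though the mapping dictionaries would still need a separate format.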