# Recipe Recommender System

## Dataset Description

The dataset is a CSV file containing detailed information on recipes that users have published on Food.com. Its columns are:

1. Basic Information:
- id: Unique identifier for each recipe.
- name: The title of the recipe.
- description: A brief overview of the recipe.

2. Detailed Recipe Breakdown:
- steps: Step-by-step cooking instructions.

3. Ingredient Insights:
- ingredients_raw: Original list of ingredients as written in the recipe.
- ingredients: Cleaned and processed list of ingredients.

4. Additional Details:
- servings: Suggested number of servings.
- serving_size: The amount per serving.
- tags: Keywords associated with the recipe (e.g., vegan, quick, summer).

Data: https://www.kaggle.com/datasets/realalexanderwei/food-com-recipes-with-ingredients-and-tags?resource=download

## Data Preprocessing

In [99]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns 

import os
from collections import Counter
import ast 

In [68]:
data_folder = 'C:\\Users\\Solid State Drive\\Documents\\GitHub\\empathy-food-waste'
file_path = os.path.join(data_folder, 'recipes_ingredients.csv')

orig_df = pd.read_csv(file_path)
df = orig_df.copy()  # work on a copy so orig_df keeps the raw values without reading the file twice

In [69]:
df

Unnamed: 0,id,name,description,ingredients,ingredients_raw,steps,servings,serving_size,tags
0,71247,Cherry Streusel Cobbler,"I haven't made this in years, so I'm just gues...","[""cherry pie filling"", ""condensed milk"", ""melt...","[""2 (21 ounce) cans cherry pie filling"",""2...","[""Preheat oven to 375°F."", ""Spread cherry pie...",6.0,1 (347 g),"[""60-minutes-or-less"", ""time-to-make"", ""course..."
1,76133,Reuben and Swiss Casserole Bake,I think this is even better than a reuben sand...,"[""corned beef chopped"", ""sauerkraut cold water...","[""1/2-1 lb corned beef, cooked and choppe...","[""Set oven to 350 degrees F."", ""Butter a 9 x 1...",4.0,1 (207 g),"[""60-minutes-or-less"", ""time-to-make"", ""course..."
2,503816,Yam-Pecan Recipe,A lady I work with heard me taking about ZWT a...,"[""unsalted butter"", ""vegetable oil"", ""all - pu...","[""3/4 cup unsalted butter, at room tempera...","[""Preheat oven to 350°F In a mixing bowl, us...",8.0,1 (198 g),"[""time-to-make"", ""course"", ""main-ingredient"", ..."
3,418749,Tropical Orange Layer Cake,An easy and delicious cake. Great for a summ...,"[""orange cake mix"", ""instant vanilla pudding"",...","[""1 (18 ounce) pkge.orange cake mix"",""1 (3...","[""In a large mixing bowl, combine the first 6 ...",16.0,1 (191 g),"[""60-minutes-or-less"", ""time-to-make"", ""course..."
4,392934,Safe to Eat Raw Chocolate Chip Oreo Cookie &qu...,I was searching the web for something like thi...,"[""butter"", ""brown sugar"", ""granulated sugar"", ...","[""1/2 cup butter, room temperature "",""1/2 ...","[""Cream butter and sugars together."", ""Blend i...",24.0,1 (26 g),"[""15-minutes-or-less"", ""time-to-make"", ""course..."
...,...,...,...,...,...,...,...,...,...
500466,173790,Sausage Meatloaf OAMC,"I grew up on this meatloaf, and I love it. It...","[""ground beef"", ""ground sausage"", ""celery chop...","[""1 lb lean ground beef"",""1/2 lb grou...","[""Combine ground beef, sausage, celery, onion,...",6.0,1 (244 g),"[""time-to-make"", ""course"", ""main-ingredient"", ..."
500467,301838,Potato Salad With Olives Capers and Parmesan,Cooking Light.,"[""red potatoes"", ""fresh ground black pepper"", ...","[""3 lbs small red potatoes"",""2 tablespo...","[""Preheat oven to 375°."", ""To prepare potatoe...",8.0,1 (223 g),"[""time-to-make"", ""course"", ""main-ingredient"", ..."
500468,130682,Chocolate Banana Pound Cake,"This recipe was passed to me by my mother, and...","[""butter"", ""brown sugar"", ""bananas"", ""buttermi...","[""1 cup butter (room temperature)"",""2 l...","[""Preheat oven to 350 degrees."", ""Grease and f...",24.0,1 (99 g),"[""time-to-make"", ""course"", ""main-ingredient"", ..."
500469,353659,Cheesy Ground Beef and Rice Casserole,This is one of those comfort food dishes I cre...,"[""ground beef"", ""medium onion chopped"", ""brown...","[""1 1/2 lbs ground beef"",""1 medium on...","[""*What I mean by ""partly drained"" with the mu...",6.0,1 (394 g),"[""60-minutes-or-less"", ""time-to-make"", ""prepar..."


In [70]:
df.dtypes

id                   int64
name                object
description         object
ingredients         object
ingredients_raw     object
steps               object
servings           float64
serving_size        object
tags                object
dtype: object

### Datatype Conversion

In [71]:
df['id'] = df['id'].astype(str)
df['name'] = df['name'].astype(str)
df['description'] = df['description'].astype(str)
df['serving_size'] = df['serving_size'].astype(str)
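
The four conversions above can also be written as a single `astype` call with a column-to-dtype mapping. The sketch below is only an illustration: `toy_df` and its values are made up, not rows from the dataset. Note that `astype(str)` may turn missing values into literal strings (this depends on the pandas version), which would be consistent with `description` showing no nulls in the `df.info()` output further down.

```python
import pandas as pd

# Toy frame standing in for the recipes table (values are made up for illustration).
toy_df = pd.DataFrame({
    "id": [101, 102],
    "name": ["Toy Cobbler", "Toy Casserole"],
    "description": ["first toy row", None],
    "serving_size": ["1 (100 g)", "1 (200 g)"],
})

# One astype call with a mapping converts several columns at once.
text_columns = ["id", "name", "description", "serving_size"]
converted = toy_df.astype({col: str for col in text_columns})

# Inspect the resulting dtypes and how many descriptions still count as missing.
print(converted.dtypes)
print(converted["description"].isna().sum())
```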

In [72]:
# Lists are unhashable, so convert any list values to strings before checking for duplicates
df['tags'] = df['tags'].apply(lambda x: str(x) if isinstance(x, list) else x)
df['ingredients'] = df['ingredients'].apply(lambda x: str(x) if isinstance(x, list) else x)
df['ingredients_raw'] = df['ingredients_raw'].apply(lambda x: str(x) if isinstance(x, list) else x)
df['steps'] = df['steps'].apply(lambda x: str(x) if isinstance(x, list) else x)

In [73]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 500471 entries, 0 to 500470
Data columns (total 9 columns):
 #   Column           Non-Null Count   Dtype  
---  ------           --------------   -----  
 0   id               500471 non-null  object 
 1   name             500471 non-null  object 
 2   description      500471 non-null  object 
 3   ingredients      500471 non-null  object 
 4   ingredients_raw  500436 non-null  object 
 5   steps            500471 non-null  object 
 6   servings         499749 non-null  float64
 7   serving_size     500471 non-null  object 
 8   tags             500436 non-null  object 
dtypes: float64(1), object(8)
memory usage: 34.4+ MB


### Cleaning

In [74]:
# Drop rows with missing values?
# ----------------------
# Open question: is this step still needed?
# Recipes without tags or a specified number of servings could still be displayed.
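
One way to inform the question above is to count missing values per column before deciding whether anything should be dropped. The sketch below uses a hypothetical `toy_df`. It keeps every row and fills missing `tags` with an empty list, which is one possible choice and not a decision this notebook has made.

```python
import numpy as np
import pandas as pd

# Toy frame with the same kinds of gaps as the recipes table (values are made up).
toy_df = pd.DataFrame({
    "name": ["a", "b", "c"],
    "servings": [4.0, np.nan, 8.0],
    "tags": [["vegan"], None, ["quick", "summer"]],
})

# Count missing values per column to see how much data a drop would cost.
missing_counts = toy_df.isna().sum()
print(missing_counts)

# Option: keep every row and substitute an empty list for missing tags,
# so later code can iterate over tags without special-casing NaN.
toy_df["tags"] = toy_df["tags"].apply(lambda t: t if isinstance(t, list) else [])
```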

### Duplication

In [75]:
# Duplicated rows
print("duplicated rows in recipes dataset:", df.duplicated().sum())

duplicated rows in recipes dataset: 0
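
`df.duplicated()` with no arguments only flags rows that match in every column, so recipes that repeat an `id` or a `name` but differ elsewhere are not counted. A narrower check is possible with the `subset` parameter. The sketch below uses a hypothetical `toy_df`, not the real data:

```python
import pandas as pd

# Toy frame: two recipes share a name but differ in their steps (values are made up).
toy_df = pd.DataFrame({
    "id": ["1", "2", "3"],
    "name": ["Banana Bread", "Banana Bread", "Pound Cake"],
    "steps": ["mix; bake", "mix well; bake", "cream; bake"],
})

# Rows identical in every column.
full_row_dupes = toy_df.duplicated().sum()
# Rows that repeat an earlier value in one chosen column.
id_dupes = toy_df.duplicated(subset=["id"]).sum()
name_dupes = toy_df.duplicated(subset=["name"]).sum()

print(full_row_dupes, id_dupes, name_dupes)
```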


In [86]:
# Convert back to original for processing later
df['tags'] = orig_df['tags']
df['ingredients'] = orig_df['ingredients']
df['ingredients_raw'] = orig_df['ingredients_raw']
df['steps'] = orig_df['steps']

### Count Unique Tags

This step may need revising because the data contains malformed strings (problems with `"` and `\` characters). A placeholder is used for now.

In [83]:
def safe_literal_eval(s):
    """Parse a stringified Python literal; return [] as a placeholder if parsing fails."""
    # Pass missing values through unchanged instead of trying to parse them
    if pd.isna(s):
        return s
    
    try:
        return ast.literal_eval(s)
    except (ValueError, SyntaxError):  # malformed strings
        return []

In [87]:
df['tags'] = df['tags'].apply(lambda x: ast.literal_eval(x) if pd.notna(x) else x)

In [90]:
df['ingredients'] = df['ingredients'].apply(lambda x: ast.literal_eval(x) if pd.notna(x) else x)

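The other list-like columns (`ingredients_raw`, `steps`) are not converted yet because some of their values appear to contain syntax errors that `ast.literal_eval` cannot parse.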
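
To check this suspicion, one option is to count how many values in a column fail `ast.literal_eval` and inspect a few of them before deciding how to repair them. The sketch below uses a hypothetical helper, `parse_failures`, and toy values. The second toy value is modeled on the unescaped inner quotes visible in the `steps` preview of the last displayed row.

```python
import ast
import pandas as pd

def parse_failures(series):
    """Return the non-missing values that ast.literal_eval cannot parse (hypothetical helper)."""
    def fails(value):
        if pd.isna(value):
            return False  # missing values are not counted as parse failures
        try:
            ast.literal_eval(value)
            return False
        except (ValueError, SyntaxError):
            return True
    return series[series.apply(fails)]

# Toy values: one well-formed list, one with unescaped inner quotes, one missing.
toy_steps = pd.Series([
    '["Preheat oven.", "Bake."]',
    '["*What I mean by "partly drained" is..."]',
    None,
])

bad = parse_failures(toy_steps)
print(len(bad), "of", toy_steps.notna().sum(), "non-missing values failed to parse")
```

Running the same helper on `df['ingredients_raw']` and `df['steps']` would show whether the failures are rare enough to patch by hand or need a systematic fix.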

In [91]:
# # Convert strings back to lists for accessibility
# df['ingredients'] = df['ingredients'].apply(lambda x: ast.literal_eval(x) if pd.notna(x) else x)
# df['ingredients_raw'] = df['ingredients_raw'].apply(lambda x: ast.literal_eval(x) if pd.notna(x) else x)
# df['steps'] = df['steps'].apply(lambda x: ast.literal_eval(x) if pd.notna(x) else x)

In [89]:
# Now, extract all tags from the DataFrame's 'tags' column
# This also includes a check to ensure we only attempt to iterate over lists
all_tags = [tag for sublist in df['tags'] if isinstance(sublist, list) for tag in sublist]

# Count the occurrence of each tag
tags_counter = Counter(all_tags)

# Sort tags alphabetically and prepare a list of tuples with tag and its count
sorted_tags_with_count = sorted(tags_counter.items())

sorted_tags_with_count

[('', 388),
 (' juice and zest of', 1),
 (' recipes', 2),
 (' recipies', 2),
 ('1-day-or-more', 4956),
 ('15-minutes-or-less', 84516),
 ('3-steps-or-less', 105146),
 ('30-minutes-or-less', 117567),
 ('4-hours-or-less', 113460),
 ('5-ingredients-or-less', 71605),
 ('60-minutes-or-less', 155872),
 ('Cool Whip', 1),
 ('Throw the ultimate fiesta with this sopaipillas recipe from Food.com.', 1),
 ('a1-sauce', 103),
 ('african', 4598),
 ('american', 62117),
 ('amish-mennonite', 349),
 ('angolan', 19),
 ('appetizers', 45980),
 ('appetizers-seafood', 1),
 ('apple-pie', 1),
 ('apples', 9565),
 ('april-fools-day', 23),
 ('argentine', 198),
 ('artichoke', 183),
 ('asian', 27195),
 ('asparagus', 2891),
 ('australian', 5040),
 ('austrian', 320),
 ('avocado', 406),
 ('bacon', 3217),
 ('baja', 319),
 ('baked-beans', 1),
 ('baking', 1272),
 ('bananas', 4041),
 ('bar-cookies', 7126),
 ('barbecue', 6757),
 ('bass', 168),
 ('bath-beauty', 41),
 ('bean-soup', 5),
 ('beans', 19493),
 ('beans-side-dishes', 
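
The counts above include entries that look like noise rather than real tags: the empty string, values with leading spaces, and full sentences. The sketch below filters such entries out before ranking, using a few of the counts listed above as sample data. The `looks_like_tag` rule (lowercase letters and digits separated by hyphens) is an assumption about the tag format, not something the dataset documents.

```python
import re
from collections import Counter

# Sample taken from the counts listed above.
tags_counter = Counter({
    "": 388,
    " recipes": 2,
    "60-minutes-or-less": 155872,
    "30-minutes-or-less": 117567,
    "Cool Whip": 1,
    "american": 62117,
})

# Assumed tag format: lowercase alphanumeric words joined by single hyphens.
TAG_PATTERN = re.compile(r"[a-z0-9]+(?:-[a-z0-9]+)*")

def looks_like_tag(tag):
    """Return True if the whole string matches the assumed tag format."""
    return TAG_PATTERN.fullmatch(tag) is not None

# Keep only entries that match the assumed format, then rank by frequency.
clean_counter = Counter({t: n for t, n in tags_counter.items() if looks_like_tag(t)})
print(clean_counter.most_common(3))
```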

### Adding Features/Columns

In [93]:
# Possibly not needed for now, since there are many user preferences to consider?

## Modelling