# Data Scientist Assessment: Recipe Recommendation Model

## Objective: 
Design and build a model that generates or suggests new food recipes based on a set of 
ingredients, dietary preferences, and cuisine styles.

## Dataset Collection and Preparation:

### Load and Explore Dataset:
Load and inspect the dataset to identify its structure, null values, and the features relevant to the recommendation system.

In [1]:
import pandas as pd

In [2]:
df=pd.read_csv("recipes.csv")

In [3]:
# df.info() prints its summary directly and returns None, so it is called without print()
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1090 entries, 0 to 1089
Data columns (total 15 columns):
 #   Column        Non-Null Count  Dtype  
---  ------        --------------  -----  
 0   Unnamed: 0    1090 non-null   int64  
 1   recipe_name   1090 non-null   object 
 2   prep_time     1039 non-null   object 
 3   cook_time     782 non-null    object 
 4   total_time    1045 non-null   object 
 5   servings      1090 non-null   int64  
 6   yield         879 non-null    object 
 7   ingredients   1090 non-null   object 
 8   directions    1090 non-null   object 
 9   rating        1090 non-null   float64
 10  url           1090 non-null   object 
 11  cuisine_path  1090 non-null   object 
 12  nutrition     1090 non-null   object 
 13  timing        1090 non-null   object 
 14  img_src       1090 non-null   object 
dtypes: float64(1), int64(2), object(12)
memory usage: 127.9+ KB


In [4]:
df.shape

(1090, 15)

In [5]:
print("Sample Data:", df.head())

Sample Data:    Unnamed: 0                  recipe_name prep_time cook_time     total_time  \
0           0     Apple-Cranberry Crostada       NaN       NaN            NaN   
1           1    Apple Pie by Grandma Ople   30 mins     1 hrs  1 hrs 30 mins   
2           2  Sarah's Homemade Applesauce   10 mins   15 mins        25 mins   
3           3                  Apple Crisp   30 mins   45 mins  1 hrs 15 mins   
4           4            Apple Pie Filling   20 mins   20 mins  2 hrs 40 mins   

   servings              yield  \
0         8  6 to 8 - servings   
1         8       1 9-inch pie   
2         4                NaN   
3        12    1 9x13-inch pan   
4        40      5 9-inch pies   

                                         ingredients  \
0  3 tablespoons butter, 2 pounds Granny Smith ap...   
1  8 small Granny Smith apples, or as needed, ½ c...   
2  4  apples - peeled, cored and chopped, ¾ cup w...   
3  10 cups all-purpose apples, peeled, cored and ...   
4  18 cups thin

In [6]:
pd.set_option('display.max_colwidth', None)

### Clean Dataset:
Handle null values and remove irrelevant columns to focus on the essential recipe attributes (e.g., name, ingredients, and cuisine).


In [7]:
# Count null values per column
df.isnull().sum()


Unnamed: 0        0
recipe_name       0
prep_time        51
cook_time       308
total_time       45
servings          0
yield           211
ingredients       0
directions        0
rating            0
url               0
cuisine_path      0
nutrition         0
timing            0
img_src           0
dtype: int64

In [8]:
# Drop columns not used by the recommender; this also removes every column that contains nulls
df = df.drop(columns=['Unnamed: 0', 'yield', 'timing', 'url', 'servings', 'rating', 'img_src', 'nutrition', 'prep_time', 'cook_time', 'total_time'])

In [9]:
# normalize recipe names
df['recipe_name'] = df['recipe_name'].str.strip().str.lower()

### Clean Ingredient Data:
Remove parenthetical notes and extra spaces from each ingredient so the entries are uniform and easier to process.

In [10]:
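import re

def clean_ingredients(ingredients):
    # Split the raw string into individual items on commas
    items = ingredients.split(",")
    clean_list = []
    for item in items:
        # Remove parenthetical notes such as "(optional)"; a parenthetical that
        # itself contains a comma has already been split and is left in place
        item = re.sub(r"\([^)]*\)", "", item)
        # Collapse repeated spaces and trim the ends
        item = re.sub(r" +", " ", item).strip()
        clean_list.append(item)
    return ",".join(clean_list)

df['ingredients_clean'] = df['ingredients'].apply(clean_ingredients)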
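
Because the function splits on commas before removing parentheses, a note such as `(or other firm, crisp apples)` is cut in two and survives the cleaning step. The following is a minimal sketch of a variant that strips parentheticals from the whole string first. The name `clean_ingredients_v2` and the sample string are hypothetical, and this variant is not the one used to produce the outputs in this notebook.

```python
import re

def clean_ingredients_v2(ingredients):
    """Hypothetical variant: drop parentheticals before splitting on commas."""
    # Remove "(...)" groups first, so commas inside them cannot split an item
    without_parens = re.sub(r"\([^)]*\)", "", ingredients)
    items = []
    for item in without_parens.split(","):
        # Collapse runs of whitespace and trim the ends
        item = re.sub(r"\s+", " ", item).strip()
        if item:  # skip fragments left empty by the removal
            items.append(item)
    return ",".join(items)

# Illustrative input only, not a row from recipes.csv
sample = "2 pounds Granny Smith apples (or other firm, crisp apples), ½ cup  sugar"
print(clean_ingredients_v2(sample))
```

Preparation words such as `peeled` or `quartered` would still appear as separate items with this variant, since they are comma-separated in the raw text.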

### Extract Cuisine Labels:
Extract the most specific cuisine category from the cuisine path to use as a categorical label.

In [11]:
from sklearn.preprocessing import LabelEncoder
# Take the second-to-last "/"-separated segment of the path as the label
df['cuisine_label'] = df['cuisine_path'].apply(lambda x: x.split('/')[-2])
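
To illustrate why index `-2` is used, the sketch below assumes a `cuisine_path` that ends with a trailing slash. The path value is hypothetical (the raw paths are not displayed in this notebook), and `last_segment` is an illustrative helper rather than part of the pipeline.

```python
# Hypothetical path; the real cuisine_path values are assumed to end with "/"
path = "/Desserts/Fruit Desserts/Apple Dessert Recipes/"

parts = path.split("/")
print(parts)       # leading and trailing slashes produce empty strings at both ends
print(parts[-2])   # the last non-empty segment when the path ends with "/"

def last_segment(p):
    """Illustrative helper: last non-empty segment, with or without a trailing slash."""
    segments = [s for s in p.split("/") if s]
    return segments[-1] if segments else None

print(last_segment("/Desserts/Fruit Desserts/Apple Dessert Recipes"))
```

Filtering out empty segments is one way to avoid depending on the trailing slash.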


In [12]:
df=df.drop(['ingredients', 'directions', 'cuisine_path'], axis=1)

In [13]:
df.head(1)

,recipe_name,ingredients_clean,cuisine_label
0,apple-cranberry crostada,"3 tablespoons butter,2 pounds Granny Smith apples (or other firm,crisp apples),peeled,quartered,cored and sliced 1/4-inch thick,1 pound Macintosh apples,peeled,quartered,cored,and sliced 1/4-inch thick,½ cup sugar,½ cup dried cranberries,1 sheet frozen puff pastry,thawed but still cold,1 egg white,lightly beaten,1 tablespoon sugar,1 cup Optional: Ice cream or lightly sweetened whipped cream",Apple Dessert Recipes


In [14]:
df.to_csv("cleaned_recipes.csv", index=False)

## Model Development

### Vectorize Ingredient Data:
Convert the ingredient lists into TF-IDF vectors for similarity computation.

In [15]:
from sklearn.feature_extraction.text import TfidfVectorizer

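# Treat each comma-separated ingredient as one token; token_pattern=None
# avoids the warning that the default pattern is unused with a custom tokenizer
vectorizer = TfidfVectorizer(tokenizer=lambda x: x.split(','), token_pattern=None, binary=False)
ingredient_vectors = vectorizer.fit_transform(df['ingredients_clean'])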
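
To make the weighting concrete, the following standard-library sketch computes TF-IDF vectors for three toy ingredient lists. It uses a smoothed IDF, `ln((1 + n) / (1 + df)) + 1`, followed by L2 normalization, which is intended to mirror scikit-learn's default settings. The function name `tfidf_vectors` and the toy data are assumptions for illustration only.

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Toy TF-IDF over comma-separated tokens, L2-normalized per document."""
    tokenized = [doc.lower().split(",") for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents that contain each token
    doc_freq = Counter(tok for toks in tokenized for tok in set(toks))
    # Smoothed inverse document frequency: rarer tokens get larger weights
    idf = {tok: math.log((1 + n_docs) / (1 + df)) + 1 for tok, df in doc_freq.items()}
    vectors = []
    for toks in tokenized:
        # Term frequency (raw count) times IDF
        weights = {tok: count * idf[tok] for tok, count in Counter(toks).items()}
        # Scale each document vector to unit length
        norm = math.sqrt(sum(w * w for w in weights.values()))
        vectors.append({tok: w / norm for tok, w in weights.items()})
    return vectors

# Toy ingredient lists, not rows from the dataset
docs = ["flour,sugar,butter", "flour,egg,milk", "egg,cheese,butter"]
for vec in tfidf_vectors(docs):
    print({tok: round(w, 3) for tok, w in vec.items()})
```

In the first list, `sugar` appears in only one document and so receives a larger weight than `flour`, which appears in two.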



### Compute Cosine Similarity:
Measure the similarity between recipes based on their ingredient vectors.

In [16]:
from sklearn.metrics.pairwise import cosine_similarity

# Compute the pairwise cosine similarity matrix (one row and one column per recipe)
cosine_sim_matrix=cosine_similarity(ingredient_vectors)
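
Each entry of the matrix is the dot product of two recipe vectors divided by the product of their lengths. The sketch below shows that calculation for sparse vectors stored as dictionaries, using only the standard library. The function name `cosine` and the example vectors are illustrative assumptions.

```python
import math

def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as {token: weight} dicts."""
    # Only tokens present in both vectors contribute to the dot product
    dot = sum(w * v.get(tok, 0.0) for tok, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # define similarity with an empty vector as zero
    return dot / (norm_u * norm_v)

# Illustrative vectors, not taken from the dataset
a = {"flour": 1.0, "sugar": 1.0}
b = {"flour": 1.0, "egg": 1.0}
c = {"cheese": 2.0}
print(cosine(a, a), cosine(a, b), cosine(a, c))
```

Identical vectors score approximately 1, vectors sharing one of two equally weighted ingredients score approximately 0.5, and vectors with no shared ingredient score 0.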


### Build Recommendation Function:
Develop a function that suggests recipes based on ingredient similarity, with an optional cuisine filter.

In [17]:
# Recommend the recipes whose ingredients are most similar to a given recipe
def recommend_recipes(recipe_name, num_recommendations=5, cuisine_filter=None):
    # Recipe names were stripped and lowercased during cleaning, so normalize the query the same way
    recipe_name = recipe_name.strip().lower()
    matches = df.index[df['recipe_name'] == recipe_name]
    if len(matches) == 0:
        raise ValueError(f"Recipe not found: {recipe_name!r}")
    recipe_index = matches[0]

    # Pair every recipe index with its similarity to the query recipe, highest first
    sim_scores = sorted(enumerate(cosine_sim_matrix[recipe_index]), key=lambda x: x[1], reverse=True)

    # Exclude the query recipe itself
    candidate_indices = [i for i, _ in sim_scores if i != recipe_index]

    # Apply the cuisine filter before truncating, so up to num_recommendations matches are returned
    if cuisine_filter:
        candidate_indices = [
            idx for idx in candidate_indices
            if df.iloc[idx]['cuisine_label'] == cuisine_filter
        ]

    # Return the top recommended recipes
    return df.iloc[candidate_indices[:num_recommendations]][['recipe_name', 'cuisine_label']]
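
The objective also mentions suggesting recipes from a set of ingredients rather than from an existing recipe name. One possible approach, sketched below on toy data, is to project the ingredient query into the fitted TF-IDF space with `transform` and rank recipes by cosine similarity. The helper `recommend_by_ingredients`, the variable names, and the toy recipes are all hypothetical and are not part of the notebook's pipeline.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Toy data standing in for df['recipe_name'] and df['ingredients_clean']
names = ["pancakes", "omelette", "shortbread"]
ingredients = ["flour,milk,egg", "egg,cheese,butter", "flour,butter,sugar"]

toy_vectorizer = TfidfVectorizer(tokenizer=lambda s: s.split(","), token_pattern=None)
toy_vectors = toy_vectorizer.fit_transform(ingredients)

def recommend_by_ingredients(query, top_n=2):
    """Hypothetical helper: rank recipes against a comma-separated ingredient query."""
    # Project the query into the fitted TF-IDF space (unseen tokens are ignored)
    query_vector = toy_vectorizer.transform([query])
    # One similarity score per recipe
    scores = cosine_similarity(query_vector, toy_vectors).ravel()
    # Indices of the highest scores first
    order = np.argsort(scores)[::-1][:top_n]
    return [(names[i], float(scores[i])) for i in order]

print(recommend_by_ingredients("flour,sugar"))
```

In the notebook's data each token still carries its quantity (for example `3 tablespoons butter`), so a query like `butter` would not match unless quantities and units were stripped from the ingredients first.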

### Test the Model:
Validate the recommendation system with a sample query and observe results.

In [18]:
# Recommend 5 recipes similar to "Espresso Biscotti"
recommendations=recommend_recipes("espresso biscotti", num_recommendations=5)
print(recommendations)


                                 recipe_name          cuisine_label
591                     banana crumb muffins  Banana Muffin Recipes
572                    persimmon pudding pie    Pudding Pie Recipes
390  peachy bread pudding with caramel sauce  Bread Pudding Recipes
0                   apple-cranberry crostada  Apple Dessert Recipes
31                  apple-cranberry crostada  Apple Dessert Recipes
