In [1]:
# Import libraries 
import pandas as pd
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity
from sklearn.metrics import jaccard_score

# 6.0 Recipe recommender system

The trained TensorFlow image classifier outputs the ingredient name followed by its freshness level. This output can be passed to a recommender system, which then suggests a few recipes that use the ingredient. In this section, we build several recommender systems and compare them to determine which is the most reliable.
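As a minimal sketch of how the classifier and the recommender could be joined, the hypothetical helper below turns a prediction into the column-name format used in the ingredient sheet (for example `red_chili_slightly_unfresh`). It assumes the classifier returns the ingredient and the freshness level as two separate human-readable strings; both the function name and that output format are illustrative assumptions, not the project's actual interface.

```python
def to_ingredient_column(ingredient: str, freshness: str) -> str:
    """Build a column name such as 'red_chili_slightly_unfresh' (hypothetical helper).

    Assumes the classifier emits the ingredient name and freshness level as
    separate strings, e.g. ('Red Chili', 'slightly unfresh').
    """
    def normalise(text: str) -> str:
        # Lower-case, trim, and join whitespace-separated words with underscores
        return "_".join(text.strip().lower().split())

    return f"{normalise(ingredient)}_{normalise(freshness)}"


print(to_ingredient_column("Red Chili", "slightly unfresh"))  # red_chili_slightly_unfresh
```

One column in the sheet is spelled `cabbage _fresh` (with a space before the underscore), so a name built this way would not match it unless that column is renamed.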

## 6.1 Loading the data and checking the dataset

In [2]:
# Loading the data 
ingredient_df = pd.read_excel('recipe_list_2.0.xlsx', sheet_name='ingredients')
recipes_df = pd.read_excel('recipe_list_2.0.xlsx', sheet_name='recipes')


In [3]:
# Check that every recipe_id in ingredient_df also exists in recipes_df
ingredient_df.set_index('recipe_id', inplace=True)
recipes_df.set_index('recipe_id', inplace=True)

matching_ids = ingredient_df.index.isin(recipes_df.index)
if not matching_ids.all():
    missing_ids = ingredient_df.index[~matching_ids]
    print(f"Missing recipe IDs in recipes_df: {missing_ids}")
else:
    print("All recipe IDs in ingredient_df have corresponding entries in recipes_df.")

All recipe IDs in ingredient_df have corresponding entries in recipes_df.
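The check above only runs in one direction: it confirms that every ID in `ingredient_df` appears in `recipes_df`, but not the reverse. A sketch of a two-way check using pandas `Index.difference` is shown below; the helper name is hypothetical and the two small DataFrames are made-up stand-ins for the Excel sheets.

```python
import pandas as pd

def compare_recipe_ids(ingredient_df: pd.DataFrame, recipes_df: pd.DataFrame):
    """Return (IDs only in ingredient_df, IDs only in recipes_df); both are empty if the sets match."""
    only_in_ingredients = ingredient_df.index.difference(recipes_df.index)
    only_in_recipes = recipes_df.index.difference(ingredient_df.index)
    return only_in_ingredients, only_in_recipes

# Toy stand-ins indexed by recipe_id (illustrative data only)
ingredients = pd.DataFrame({"tomatoes_fresh": [1, 0, 1]}, index=pd.Index([1, 2, 3], name="recipe_id"))
recipes = pd.DataFrame({"recipe_name": ["A", "B", "D"]}, index=pd.Index([1, 2, 4], name="recipe_id"))

missing_recipes, missing_ingredients = compare_recipe_ids(ingredients, recipes)
print(missing_recipes.tolist())      # [3] -> has an ingredient row but no recipe details
print(missing_ingredients.tolist())  # [4] -> has recipe details but no ingredient row
```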


In [4]:
# Display the first few rows of the dataframe
ingredient_df.head()

| recipe_id | cabbage _fresh | cabbage_slightly_unfresh | cauliflower_fresh | cauliflower_slightly_unfresh | cherry_tomatoes_fresh | cherry_tomatoes_slightly_unfresh | green_chili_fresh | red_chili_fresh | red_chili_slightly_unfresh | tomatoes_fresh | tomatoes_slightly_unfresh |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 2 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 3 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 4 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 5 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 |


In [5]:
recipes_df.head()

| recipe_id | recipe_name | cooking_method | recipe_details |
|---|---|---|---|
| 1 | Classic Coleslaw | raw | Ingredients:\n\n1/2 head of fresh green cabbag... |
| 2 | Stir-Fried Cabbage | stir-fried | Ingredients:\n\n1 head of cabbage (fresh or sl... |
| 3 | Roasted Cauliflower Steaks | roast | Ingredients:\n\n1 large head of fresh cauliflo... |
| 4 | Stir-Fried Cauliflower | stir-fried | Ingredients:\n\n1 head of cauliflower, cut int... |
| 5 | Cherry Tomato Bruschetta | raw | Ingredients:\n\n2 cups fresh cherry tomatoes, ... |


## 6.2.0 Recommender system using the Jaccard score


### 6.2.1 create_single_ingredient_profile function

This function creates a user profile from the ingredient identified by the image classifier. A conventional recommender would build the profile from recipes the user has liked; here it is built from the detected ingredient instead. For example, if tomatoes_fresh is identified, the profile is a vector in which tomatoes_fresh is set to 1 and every other ingredient is set to 0.

In [19]:
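def create_single_ingredient_profile(data, ingredient):
    # Fail early on unknown names; assigning a missing label would silently add a new entry
    if ingredient not in data.columns:
        raise KeyError(f"Ingredient '{ingredient}' not found in the data.")
    # Initialize a user profile with zeros
    user_profile = pd.Series(0, index=data.columns)
    # Set the ingredient identified by the image classifier to 1
    user_profile[ingredient] = 1
    return user_profile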

# Create the user profile
user_profile = create_single_ingredient_profile(ingredient_df, 'red_chili_slightly_unfresh')

### 6.2.2 calculate_similarity function
This function calculates the Jaccard similarity between every pair of recipes in the dataset. It builds a square matrix in which the value at row i and column j is the similarity between recipe i and recipe j based on their ingredients. The recommendation function in the next step scores recipes directly against the user profile, so it does not use this matrix.

In [20]:
def calculate_similarity(data):
    # Create an empty DataFrame to store similarity scores
    similarity_matrix = pd.DataFrame(index=data.index, columns=data.index)

    # Calculate Jaccard similarity between each pair of recipes
    for i in data.index:
        for j in data.index:
            similarity_matrix.loc[i, j] = jaccard_score(data.loc[i], data.loc[j])

    return similarity_matrix

# Calculate the similarity matrix
similarity_matrix = calculate_similarity(ingredient_df)
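The nested loop above calls `jaccard_score` once per pair of recipes, so its cost grows quadratically with the number of recipes. For 0/1 ingredient vectors the Jaccard similarity is |A ∩ B| / |A ∪ B|, which means the whole matrix can be computed with a single matrix product. The sketch below is an alternative to the loop, not the notebook's original method; it assumes every entry of the DataFrame is 0 or 1, and the helper name and toy data are illustrative.

```python
import numpy as np
import pandas as pd

def jaccard_similarity_matrix(data: pd.DataFrame) -> pd.DataFrame:
    """Pairwise Jaccard similarity between the rows of a 0/1 DataFrame."""
    X = data.to_numpy(dtype=float)
    intersection = X @ X.T                                          # |A ∩ B| for every pair of rows
    row_sizes = X.sum(axis=1)                                       # |A| for every row
    union = row_sizes[:, None] + row_sizes[None, :] - intersection  # |A ∪ B|
    # Pairs of all-zero rows have an empty union; score them as 0
    with np.errstate(divide="ignore", invalid="ignore"):
        sim = np.where(union > 0, intersection / union, 0.0)
    return pd.DataFrame(sim, index=data.index, columns=data.index)

# Toy ingredient matrix (illustrative data only)
toy = pd.DataFrame(
    {"tomatoes_fresh": [1, 1, 0], "red_chili_fresh": [1, 0, 0], "cabbage_slightly_unfresh": [0, 1, 1]},
    index=pd.Index([1, 2, 3], name="recipe_id"),
)
print(jaccard_similarity_matrix(toy).round(3))
```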

### 6.2.3 recommend_recipes function
This function takes a user profile (for example, the vector representing tomatoes_fresh) and the ingredient matrix, computes the Jaccard similarity between the profile and each recipe, sorts the scores, and returns the top N recipe IDs as recommendations.

In [21]:
def recommend_recipes(user_profile, data, top_n):
    # Score each recipe by the Jaccard similarity between its ingredient vector and the user profile
    scores = pd.Series(index=data.index, dtype='float64')
    for recipe_id in data.index:
        scores.loc[recipe_id] = jaccard_score(user_profile, data.loc[recipe_id], average='binary')

    # Sort the scores in descending order and keep the top N recipe IDs
    recommendations = scores.sort_values(ascending=False).head(top_n).index
    return recommendations

In [22]:
# Get top N recommendations
top_n_recommendations = recommend_recipes(user_profile, ingredient_df, 3)

# Retrieve recommended recipes details
recommended_recipes = recipes_df.loc[top_n_recommendations]
recommended_recipes

| recipe_id | recipe_name | cooking_method | recipe_details |
|---|---|---|---|
| 9 | Sambal Belacan with Shrimp | stir-fried | Ingredients:\n\n200 grams of shrimp, peeled an... |
| 10 | Tomato Egg Drop Soup | boil | Ingredients:\n\n4 ripe tomatoes, chopped (cann... |
| 16 | Steamed Cauliflower with Minced Meat and Red C... | steam | Ingredients:\n\n1 head of cauliflower, cut int... |


### 6.2.4 Review of recommender system using Jaccard score

When red chili at a 'slightly unfresh' freshness level is detected, the system recommends three recipes, but only one of them, 'Sambal Belacan with Shrimp', can be used. The other two recipes do not contain red chili at a 'slightly unfresh' freshness level, so their Jaccard score against the profile is 0; they appear only because the function always returns the top 3 recipes.
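With a single-ingredient profile, a recipe's Jaccard score is 1/k if it contains that ingredient (k being the number of ingredients in the recipe) and 0 otherwise, so every recipe without the ingredient ties at 0 and the remaining top-N slots are filled from those ties. A minimal sketch of one possible fix is to drop zero-score recipes before taking the top N. The helper name and the example scores below are hypothetical.

```python
import pandas as pd

def top_n_nonzero(scores: pd.Series, top_n: int) -> pd.Index:
    """Return up to top_n recipe IDs with the highest strictly positive scores (hypothetical helper)."""
    positive = scores[scores > 0]  # recipes sharing no ingredient with the profile score exactly 0
    return positive.sort_values(ascending=False).head(top_n).index

# Illustrative scores: only recipe 9 contains the detected ingredient
scores = pd.Series([0.0, 0.0, 0.25, 0.0], index=pd.Index([2, 8, 9, 10], name="recipe_id"))
print(top_n_nonzero(scores, 3).tolist())  # [9]
```

The trade-off is that the list can be shorter than N, which is arguably the honest answer when few recipes use the detected ingredient.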

## 6.3.0 Recommender system using cosine similarity 

In [24]:
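def create_single_ingredient_profile(data, ingredient):
    # Fail early on unknown names; assigning a missing label would silently add a new entry
    if ingredient not in data.columns:
        raise KeyError(f"Ingredient '{ingredient}' not found in the data.")
    # Initialize a user profile with zeros
    user_profile = pd.Series(0, index=data.columns, name='user_profile')
    # Set the ingredient identified by the image classifier to 1
    user_profile[ingredient] = 1
    return user_profile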

# Function to calculate cosine similarity between the user profile and each recipe
def calculate_cosine_similarity(data, user_profile):
    # Concatenate the user_profile to the data as if it was another recipe
    data_with_user = pd.concat([data, user_profile.to_frame().T])
    # Compute the cosine similarity matrix
    cosine_sim = cosine_similarity(data_with_user)
    # Get the similarity values for the user profile against all recipes
    user_similarity_scores = cosine_sim[-1][:-1]  # Exclude the last element, which is the user itself
    return user_similarity_scores

# Function to recommend top N recipes based on the cosine similarity scores
def recommend_recipes_cosine(data, user_similarity_scores, top_n):
    # Get the indices of the top N scores
    top_indices = np.argsort(user_similarity_scores)[::-1][:top_n]
    # Retrieve the recipe IDs corresponding to these indices
    recommendations = data.iloc[top_indices].index
    return recommendations

# Create the user profile
user_profile = create_single_ingredient_profile(ingredient_df, 'red_chili_slightly_unfresh')

# Calculate the cosine similarity scores
user_similarity_scores = calculate_cosine_similarity(ingredient_df, user_profile)

# Get the top N recommendations
top_n_recommendations_cosine = recommend_recipes_cosine(ingredient_df, user_similarity_scores, 3)

# Retrieve recommended recipes details assuming recipes_df is a pandas DataFrame with recipe details
recommended_recipes_cosine = recipes_df.loc[top_n_recommendations_cosine]
recommended_recipes_cosine
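For 0/1 vectors the cosine similarity is |A ∩ B| / (√|A| · √|B|), so a single-ingredient profile scores 1/√k against a recipe with k ingredients that contains it, and 0 against every recipe that does not, which again leaves those recipes tied at 0. The sketch below computes the scores directly with NumPy instead of concatenating the profile onto the recipe matrix (scikit-learn's `cosine_similarity(X, Y)` also accepts two separate arrays and would avoid the concatenation too). The helper name and toy data are illustrative.

```python
import numpy as np
import pandas as pd

def cosine_scores(data: pd.DataFrame, user_profile: pd.Series) -> pd.Series:
    """Cosine similarity between a profile vector and each recipe row."""
    X = data.to_numpy(dtype=float)
    # Align the profile with the recipe columns; missing columns count as 0
    u = user_profile.reindex(data.columns, fill_value=0).to_numpy(dtype=float)
    norms = np.linalg.norm(X, axis=1) * np.linalg.norm(u)
    dots = X @ u
    # A zero norm means an empty recipe or profile; score it as 0
    with np.errstate(divide="ignore", invalid="ignore"):
        scores = np.where(norms > 0, dots / norms, 0.0)
    return pd.Series(scores, index=data.index, name="cosine")

# Toy ingredient matrix (illustrative data only)
toy = pd.DataFrame(
    {"red_chili_slightly_unfresh": [1, 0, 0], "red_chili_fresh": [0, 1, 0], "tomatoes_fresh": [1, 1, 1]},
    index=pd.Index([1, 2, 3], name="recipe_id"),
)
profile = pd.Series(0, index=toy.columns)
profile["red_chili_slightly_unfresh"] = 1
print(cosine_scores(toy, profile))
```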


| recipe_id | recipe_name | cooking_method | recipe_details |
|---|---|---|---|
| 9 | Sambal Belacan with Shrimp | stir-fried | Ingredients:\n\n200 grams of shrimp, peeled an... |
| 8 | Red Chili Sambal | stir-fried | Ingredients:\n\n200 grams red chilies (adjust ... |
| 2 | Stir-Fried Cabbage | stir-fried | Ingredients:\n\n1 head of cabbage (fresh or sl... |


### 6.3.1 Review of recommender system using cosine similarity

As with the Jaccard-based system, the cosine-similarity system recommends three recipes, of which only 'Sambal Belacan with Shrimp' can be used. 'Red Chili Sambal' and 'Stir-Fried Cabbage' do not contain red chili at a 'slightly unfresh' freshness level, so their cosine similarity with the profile is 0 and they are included only to fill the top-3 list.

## 6.4.0 Recommender system using filter-based recommendation

In [25]:
def create_user_profile(ingredients, data):
    # Report the number of ingredients
    print(f"Number of ingredients provided: {len(ingredients)}")
    
    # Initialize a user profile with zeros
    user_profile = pd.Series(0, index=data.columns)
    
    # Set the ingredients to 1
    for ingredient in ingredients:
        if ingredient in user_profile.index:
            user_profile[ingredient] = 1
        else:
            print(f"Ingredient '{ingredient}' not found in the data.")
    return user_profile

def recommend_recipes_by_profile(user_profile, data):
    # Filter recipes by matching the user profile
    mask = (data * user_profile).sum(axis=1) > 0
    recommended = data[mask]
    return recommended.index

# Manually input the ingredients
input_ingredients = ['red_chili_slightly_unfresh']  # replace with your input method if needed

# Create the user profile with the input ingredients
user_profile = create_user_profile(input_ingredients, ingredient_df)

# Get recommended recipe IDs
recommended_recipe_ids = recommend_recipes_by_profile(user_profile, ingredient_df)

# Retrieve recommended recipes details
recommended_recipes = recipes_df.loc[recommended_recipe_ids]
recommended_recipes


Number of ingredients provided: 1


| recipe_id | recipe_name | cooking_method | recipe_details |
|---|---|---|---|
| 9 | Sambal Belacan with Shrimp | stir-fried | Ingredients:\n\n200 grams of shrimp, peeled an... |


### 6.4.1 Review of recommender system using filter-based recommendation
When red chili at a 'slightly unfresh' freshness level is detected, the system recommends a single recipe, 'Sambal Belacan with Shrimp'. This is correct, as it is the only recipe in the dataset that uses this ingredient at this freshness level.
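The mask `(data * user_profile).sum(axis=1) > 0` keeps every recipe that contains at least one of the input ingredients. When several ingredients are detected, a stricter variant could require all of them. The sketch below exposes both behaviours through a hypothetical `match` parameter; it is an illustrative extension with made-up data, not part of the original notebook.

```python
import pandas as pd

def filter_recipes(data: pd.DataFrame, ingredients: list, match: str = "any") -> pd.Index:
    """Return recipe IDs that contain any (or all) of the given ingredient columns."""
    unknown = [name for name in ingredients if name not in data.columns]
    if unknown:
        raise KeyError(f"Ingredients not found in the data: {unknown}")
    selected = data[ingredients].astype(bool)
    if match == "any":
        mask = selected.any(axis=1)   # at least one detected ingredient is used
    elif match == "all":
        mask = selected.all(axis=1)   # every detected ingredient is used
    else:
        raise ValueError("match must be 'any' or 'all'")
    return data.index[mask.to_numpy()]

# Toy ingredient matrix (illustrative data only)
toy = pd.DataFrame(
    {"tomatoes_fresh": [1, 1, 0], "red_chili_fresh": [1, 0, 1], "cabbage_fresh": [0, 1, 0]},
    index=pd.Index([1, 2, 3], name="recipe_id"),
)
print(filter_recipes(toy, ["tomatoes_fresh", "red_chili_fresh"], match="any").tolist())  # [1, 2, 3]
print(filter_recipes(toy, ["tomatoes_fresh", "red_chili_fresh"], match="all").tolist())  # [1]
```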

# 7.0 Key Findings and Limitations

**Key finding for Image Classification Model:**

During testing, I noticed that placing an ingredient on a white background in a well-lit area gave more accurate results than placing it on a non-white background in poor lighting. This may be because ingredients that are not fresh tend to be darker or brownish, so a dark background or poor lighting can make a fresh ingredient look less fresh. A white background with good lighting is therefore a sensible way to improve accuracy.

**Key finding for Recommendation System:**

During testing, I observed that the filter-based recommender performed better than the recommenders based on the Jaccard score and cosine similarity. The latter two often recommended recipes that could not be used, because they always return the top N recipes even when some of those recipes do not contain the detected ingredients. The filter-based recommender, by contrast, consistently returned only recipes that contain the identified ingredients; as implemented, it keeps any recipe that contains at least one of the input ingredients.

**Limitations of Image Classification Model:**

Given the project's time constraints, I could only source an online dataset of fresh ingredients; there was no time to document ingredients as they progressed from fresh to slightly unfresh and then to unfresh. This significantly limited the model's performance: the best accuracy was only 0.713 (achieved by EfficientNetB3), likely because the dataset lacked images covering the various freshness levels of each ingredient.

Moreover, most of the ingredient images came from an online dataset, so their size and physical appearance may differ from ingredients found in local supermarkets. Consequently, the model's performance may decline when tested on local ingredients.


# 7.1 Learning and Future Improvement

**Learning from Image Augmentation and future improvements:**

When augmenting images for the training set, avoid augmenting the same image more than three times, as many near-identical copies of one image may bias the model toward it. Aim for variety in the ingredient images, including a good spread of lighting conditions; for instance, the training set should include photos of fresh tomatoes taken in both well-lit and poorly lit areas.
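A minimal NumPy sketch of these two guidelines: it caps the number of augmented copies per source image at three and varies brightness so that both brighter and darker versions end up in the training set. The transforms (horizontal flip and brightness scaling), the 0.6 to 1.4 brightness range, and the helper name are illustrative assumptions; in practice Keras preprocessing layers such as `RandomFlip` and `RandomBrightness` would typically be used instead.

```python
import numpy as np

def augment_with_cap(image: np.ndarray, n_copies: int, rng: np.random.Generator, max_copies: int = 3) -> list:
    """Return up to max_copies augmented variants of one uint8 H x W x C image (hypothetical helper)."""
    n_copies = min(n_copies, max_copies)   # never exceed the per-image cap
    variants = []
    for _ in range(n_copies):
        out = image.astype(np.float32)
        if rng.random() < 0.5:
            out = out[:, ::-1, :]          # horizontal flip
        factor = rng.uniform(0.6, 1.4)     # < 1 darkens (poor lighting), > 1 brightens
        out = np.clip(out * factor, 0, 255)
        variants.append(out.astype(np.uint8))
    return variants

rng = np.random.default_rng(0)
image = rng.integers(0, 256, size=(4, 4, 3), dtype=np.uint8)  # stand-in for a real photo
copies = augment_with_cap(image, n_copies=10, rng=rng)
print(len(copies))  # 3
```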

**Learning from Preprocessing Images for modelling and future improvements:**

When using pre-built models from keras.applications, you can apply the preprocess_input function that belongs to the chosen model. However, using that preprocess_input can sometimes make the model perform worse during training, in which case you can preprocess the data yourself. I recommend first training with the model's own preprocess_input and switching to custom preprocessing only if it does not yield good results.
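Whether custom preprocessing helps depends on the input range the chosen backbone expects. For example, the Keras EfficientNet models include their own rescaling layer and expect raw pixel values in [0, 255]; their preprocess_input is a pass-through, so scaling images to [0, 1] beforehand would shrink the inputs well below the range the network expects. The sketch below shows two common hand-written options in NumPy: scaling to [0, 1], and per-channel standardisation with the ImageNet mean and standard deviation. Which option (if either) suits a given model is an assumption to verify against that model's documentation.

```python
import numpy as np

# Standard ImageNet per-channel statistics (RGB, for pixels in [0, 1])
IMAGENET_MEAN = np.array([0.485, 0.456, 0.406], dtype=np.float32)
IMAGENET_STD = np.array([0.229, 0.224, 0.225], dtype=np.float32)

def scale_to_unit(images: np.ndarray) -> np.ndarray:
    """Map uint8 pixels in [0, 255] to float32 values in [0, 1]."""
    return images.astype(np.float32) / 255.0

def standardise_imagenet(images: np.ndarray) -> np.ndarray:
    """Scale to [0, 1], then subtract the channel mean and divide by the channel std."""
    return (scale_to_unit(images) - IMAGENET_MEAN) / IMAGENET_STD  # broadcasts over N, H, W

batch = np.full((2, 8, 8, 3), 255, dtype=np.uint8)  # stand-in batch of white images (N, H, W, C)
print(scale_to_unit(batch).max())  # 1.0
print(standardise_imagenet(batch)[0, 0, 0])
```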

**Learning from Building the model and future improvements:**

After training, I noticed that a heavier or larger model does not always give the best accuracy. Heavier models also need significantly more computing resources to train and take longer at inference time. A future improvement would be to include lightweight and medium-weight models in the comparison as well.


# 7.2 Conclusion

In conclusion, the ZeroWasteMate project, which aims to reduce food waste in Singapore, yielded valuable insights. Key findings include the superior performance of filter-based recommendations and the importance of background and lighting for ingredient recognition. The project was limited mainly by dataset diversity and model performance. Future improvements should focus on expanding the dataset and exploring lightweight models. Despite these challenges, ZeroWasteMate is a meaningful step towards more sustainable food management and shows how technology can help reduce food waste.




