# Q1 Matching Dishes



## Approach

To return the dishes that most closely match a given flavor profile, we use the K-Nearest Neighbors (KNN) algorithm. We first represent each dish in our dataset as a vector in a multi-dimensional space, with one dimension per flavor. KNN then measures how similar each dish is to the target flavor profile using a distance metric. Several metrics are available, such as Manhattan distance (L1), Euclidean distance (L2), or cosine similarity. We use Euclidean distance.

Euclidean distance is well suited to this task because it measures the straight-line distance between two points in a multi-dimensional space. Unlike cosine similarity, which focuses on direction rather than magnitude, Euclidean distance captures both the intensity and the balance of flavors. Manhattan distance, by contrast, sums the absolute differences across flavors, which makes it less sensitive to outliers. Here, however, we want to penalize more heavily those dishes that differ strongly from the target in even a single flavor, and the squared differences inside Euclidean distance do exactly that.
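
The trade-off between the three metrics can be checked on a small example. The sketch below reuses the savory target profile from this notebook, but the three dishes (`spread_out`, `one_big_gap`, `scaled_up`) and the helper functions are hypothetical, invented only to illustrate the argument; they are not rows from the dataset.

```python
import numpy as np

# Target profile (sweet, salty, sour, bitter, umami), same values as savory_profile.
target = np.array([10, 25, 15, 5, 45])

# Hypothetical dishes, invented for illustration (not taken from the dataset).
spread_out = target + np.array([5, 5, 5, 5, 0])    # small gaps in four flavors
one_big_gap = target + np.array([20, 0, 0, 0, 0])  # one large gap in a single flavor
scaled_up = target * 2                             # same balance, double intensity


def manhattan(a, b):
    """L1 distance: sum of absolute per-flavor differences."""
    return float(np.abs(a - b).sum())


def euclidean(a, b):
    """L2 distance: square root of the sum of squared per-flavor differences."""
    return float(np.sqrt(((a - b) ** 2).sum()))


def cosine_distance(a, b):
    """1 - cosine similarity: depends only on direction, not on magnitude."""
    return float(1.0 - (a @ b) / (np.linalg.norm(a) * np.linalg.norm(b)))


for name, dish in [("spread_out", spread_out),
                   ("one_big_gap", one_big_gap),
                   ("scaled_up", scaled_up)]:
    print(f"{name:12s} L1={manhattan(target, dish):6.2f} "
          f"L2={euclidean(target, dish):6.2f} "
          f"cos={cosine_distance(target, dish):.4f}")
```

Manhattan distance rates `spread_out` and `one_big_gap` as equally far from the target, while Euclidean distance ranks the dish with the single large gap as farther away. Cosine distance treats `scaled_up` as an almost perfect match even though every flavor is twice as intense.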

In [39]:
import pandas as pd
import numpy as np
from sklearn.neighbors import NearestNeighbors


def load_data(path):
    df = pd.read_csv(path)
    return df

def get_flavor_values(df):
    flavor_cols = ['sweet', 'salty', 'sour', 'bitter', 'umami']
    flavor_data = df[flavor_cols]
    return flavor_data.values


In [24]:
dish_df = load_data('../csv_data/data.csv')

In [25]:
flavor_values = get_flavor_values(dish_df)

flavor_values

array([[80, 10,  5,  5,  0],
       [10, 30,  0, 20, 40],
       [15, 30,  5, 10, 20],
       ...,
       [ 0, 10,  0,  0,  0],
       [20, 30, 10, 10, 60],
       [10, 60, 20, 10, 40]], dtype=int64)

In [26]:
def fit_knn(data, distance_metric):
    knn = NearestNeighbors(metric=distance_metric)
    knn.fit(data)
    return knn

knn = fit_knn(data=flavor_values, distance_metric='euclidean')

In [27]:
# Define flavor profiles
savory_profile = np.array([[10, 25, 15, 5, 45]])
sweet_profile = np.array([[65, 5, 10, 10, 10]])
mixed_profile = np.array([[15, 25, 30, 10, 20]])

In [None]:
# Retrieve the nearest neighbors for each flavor profile (15 for savory, 5 for sweet and mixed)

distances_savory, indices_savory = knn.kneighbors(savory_profile, n_neighbors=15)
distances_sweet, indices_sweet = knn.kneighbors(sweet_profile, n_neighbors=5)
distances_mixed, indices_mixed = knn.kneighbors(mixed_profile, n_neighbors=5)
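
Conceptually, the `kneighbors` call with the Euclidean metric returns the rows with the smallest straight-line distance to the query, nearest first. The following is a minimal brute-force sketch of that idea, not scikit-learn's actual implementation (which may use tree-based indexes). The helper `brute_force_neighbors` is hypothetical, and the toy data consists of the six rows visible in the printed `flavor_values` array, re-indexed 0 to 5 for illustration, so the indices do not correspond to the real dataset.

```python
import numpy as np

# Six rows visible in the printed flavor_values output, re-indexed 0..5 (toy subset).
dishes = np.array([
    [80, 10,  5,  5,  0],
    [10, 30,  0, 20, 40],
    [15, 30,  5, 10, 20],
    [ 0, 10,  0,  0,  0],
    [20, 30, 10, 10, 60],
    [10, 60, 20, 10, 40],
])

query = np.array([10, 25, 15, 5, 45])  # same values as savory_profile


def brute_force_neighbors(data, query, k):
    """Return (distances, indices) of the k rows of data closest to query."""
    # Euclidean distance from the query to every row.
    dists = np.linalg.norm(data - query, axis=1)
    # Sort ascending; a stable sort keeps the original order among ties.
    order = np.argsort(dists, kind="stable")[:k]
    return dists[order], order


distances, indices = brute_force_neighbors(dishes, query, k=3)
print(indices)
print(distances)
```

On the full dataset the ordering among equally distant dishes may differ from scikit-learn's, since ties can be broken differently.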



In [35]:
distances_savory

array([[10.        , 10.        , 10.        , 10.        , 14.14213562,
        14.14213562, 14.14213562, 14.14213562, 14.14213562, 17.32050808,
        19.36491673, 19.36491673, 19.36491673, 20.        , 20.        ]])

In [36]:
indices_savory

array([[105, 370,   7,  51, 270,   5, 361,  46,  84, 395,  49, 234,  32,
          6, 125]], dtype=int64)

In [30]:
# Extract dish names
savory_dishes = dish_df.iloc[indices_savory[0]]
sweet_dishes = dish_df.iloc[indices_sweet[0]]
mixed_dishes = dish_df.iloc[indices_mixed[0]]

In [37]:
savory_dishes

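| Unnamed: 0 | dish_name | sweet | salty | sour | bitter | umami |
|---|---|---|---|---|---|---|
| 105 | cauliflower bites | 10 | 30 | 10 | 10 | 40 |
| 370 | soup of the day | 10 | 30 | 10 | 10 | 40 |
| 7 | soup | 10 | 30 | 10 | 10 | 40 |
| 51 | chicken on mixed greens | 10 | 30 | 10 | 10 | 40 |
| 270 | shrimp salad wrap | 20 | 30 | 10 | 10 | 40 |
| 5 | chicken salad | 20 | 30 | 10 | 10 | 40 |
| 361 | chicken salad wrap | 20 | 30 | 10 | 10 | 40 |
| 46 | leo | 20 | 30 | 10 | 10 | 50 |
| 84 | chicken salad scoop | 20 | 30 | 10 | 10 | 40 |
| 395 | italian salad | 10 | 30 | 20 | 20 | 40 |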


In [34]:
sweet_dishes

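| Unnamed: 0 | dish_name | sweet | salty | sour | bitter | umami |
|---|---|---|---|---|---|---|
| 479 | peanut butter | 70 | 10 | 10 | 5 | 5 |
| 29 | muffin | 70 | 10 | 10 | 5 | 5 |
| 36 | french toast | 70 | 10 | 10 | 5 | 5 |
| 456 | fried oreo madness | 60 | 10 | 10 | 5 | 15 |
| 73 | buttermilk pancakes | 70 | 10 | 0 | 10 | 10 |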


In [38]:
mixed_dishes

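| Unnamed: 0 | dish_name | sweet | salty | sour | bitter | umami |
|---|---|---|---|---|---|---|
| 201 | vegetable cold sub | 10 | 30 | 30 | 10 | 20 |
| 53 | toss salad | 15 | 20 | 30 | 10 | 5 |
| 184 | breakfast flatbread | 10 | 40 | 20 | 10 | 20 |
| 12 | egg salad scoop | 10 | 40 | 20 | 10 | 20 |
| 342 | egg salad sub | 10 | 40 | 20 | 10 | 20 |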
