# Food Recommendation System

## Overview

This project is a vector-based food recommendation system utilizing LanceDB for full-text search (FTS), hybrid search, and vector search. It integrates the  reranker model to enhance search results and provide accurate food recommendations.

## Features

- **Vector-Based Recommendations**: Utilizes advanced vector search to find similar food items.
- **Full-Text Search (FTS)**: Enables efficient searching of food items based on text descriptions.
- **Hybrid Search**: Combines both vector search and full-text search for comprehensive results.
- **Jina Reranker Model**: Improves search result accuracy by reranking models. 




### Install required dependencies

In [None]:
#install packages
!pip install pandas
!pip install lancedb

### Download Data

For this notebook walkthrough, we will use food recommendation data from Kaggle. You can download the dataset from the following link:

Download the food recommendation data from Kaggle

https://www.kaggle.com/datasets/schemersays/food-recommendation-system

In [2]:
#Loading and Merging Data into a Single File
import pandas as pd
df = pd.read_csv("main_food.csv")
df_rating = pd.read_csv("ratings.csv")

In [3]:

main_df = pd.merge(df_rating, df, on='Food_ID', how='inner')
main_df.to_csv('main_df.csv')

In [4]:
#Now, open the main file which contains both merged datasets.
df = pd.read_csv("main_df.csv")

In [5]:
df.head()

Unnamed: 0.1,Unnamed: 0,User_ID,Food_ID,Rating,Name,C_Type,Veg_Non,Describe
0,0,1.0,88.0,4.0,peri peri chicken satay,Snack,non-veg,"boneless skinless chicken thigh (trimmed), sal..."
1,1,1.0,46.0,3.0,steam bunny chicken bao,Japanese,non-veg,"buns, all purpose white flour, dry yeast, suga..."
2,2,1.0,24.0,5.0,green lentil dessert fudge,Dessert,veg,"whole moong beans, cow ghee, raisins, whole mi..."
3,3,1.0,25.0,4.0,cashew nut cookies,Dessert,veg,"cashew paste, ghee, khaand (a sweetening agent..."
4,4,2.0,49.0,1.0,christmas tree pizza,Italian,veg,"pizza dough (2 boules), red pepper, red onion,..."


### Data Preprocessing

In [6]:
# We are adding all important columns into the text column to enhance full-text search (FTS) and overall search performance.
df['text'] = df.apply(lambda row: f"{row['Name']} {row['C_Type']} {row['Veg_Non']}: {row['Describe']}", axis=1)

In [7]:
# just chcking our text data
df['text'][0]

'peri peri chicken satay Snack non-veg: boneless skinless chicken thigh (trimmed), salt and pepper, yogurt, chilli powder, ginger garlic paste, coriander leaves, oil to fry, peri peri sauce, potato fries'

In [8]:
df.head()

Unnamed: 0.1,Unnamed: 0,User_ID,Food_ID,Rating,Name,C_Type,Veg_Non,Describe,text
0,0,1.0,88.0,4.0,peri peri chicken satay,Snack,non-veg,"boneless skinless chicken thigh (trimmed), sal...",peri peri chicken satay Snack non-veg: boneles...
1,1,1.0,46.0,3.0,steam bunny chicken bao,Japanese,non-veg,"buns, all purpose white flour, dry yeast, suga...",steam bunny chicken bao Japanese non-veg: buns...
2,2,1.0,24.0,5.0,green lentil dessert fudge,Dessert,veg,"whole moong beans, cow ghee, raisins, whole mi...",green lentil dessert fudge Dessert veg: whole ...
3,3,1.0,25.0,4.0,cashew nut cookies,Dessert,veg,"cashew paste, ghee, khaand (a sweetening agent...","cashew nut cookies Dessert veg: cashew paste, ..."
4,4,2.0,49.0,1.0,christmas tree pizza,Italian,veg,"pizza dough (2 boules), red pepper, red onion,...",christmas tree pizza Italian veg: pizza dough ...




To improve accuracy, we should include both numerical and string representations of ratings. First, add a new column, rating_str, containing the string values for each rating. Then, append both the numerical and string ratings to the text column. This approach increases the chances of achieving better accuracy.
this kind of trick exp you need to do for improving your accuracy


In [9]:
# Create a mapping from numbers to strings
num_to_string = {
    0.0: 'zero', 1.0: 'one', 2.0: 'two', 3.0: 'three', 4.0: 'four',
    5.0: 'five', 6.0: 'six', 7.0: 'seven', 8.0: 'eight', 9.0: 'nine', 10.0: 'ten'
}
# Replace numerical ratings with their string equivalents
df['Rating_str'] = df['Rating'].map(num_to_string)

In [10]:
df['Rating'] = df['Rating'].astype(int)

In [11]:
df.head()

Unnamed: 0.1,Unnamed: 0,User_ID,Food_ID,Rating,Name,C_Type,Veg_Non,Describe,text,Rating_str
0,0,1.0,88.0,4,peri peri chicken satay,Snack,non-veg,"boneless skinless chicken thigh (trimmed), sal...",peri peri chicken satay Snack non-veg: boneles...,four
1,1,1.0,46.0,3,steam bunny chicken bao,Japanese,non-veg,"buns, all purpose white flour, dry yeast, suga...",steam bunny chicken bao Japanese non-veg: buns...,three
2,2,1.0,24.0,5,green lentil dessert fudge,Dessert,veg,"whole moong beans, cow ghee, raisins, whole mi...",green lentil dessert fudge Dessert veg: whole ...,five
3,3,1.0,25.0,4,cashew nut cookies,Dessert,veg,"cashew paste, ghee, khaand (a sweetening agent...","cashew nut cookies Dessert veg: cashew paste, ...",four
4,4,2.0,49.0,1,christmas tree pizza,Italian,veg,"pizza dough (2 boules), red pepper, red onion,...",christmas tree pizza Italian veg: pizza dough ...,one


In [12]:
df['text'] = df.apply(lambda row: f"{row['text']} rating: {row['Rating']} {row['Rating_str']}", axis=1)

In [13]:
df.head()

Unnamed: 0.1,Unnamed: 0,User_ID,Food_ID,Rating,Name,C_Type,Veg_Non,Describe,text,Rating_str
0,0,1.0,88.0,4,peri peri chicken satay,Snack,non-veg,"boneless skinless chicken thigh (trimmed), sal...",peri peri chicken satay Snack non-veg: boneles...,four
1,1,1.0,46.0,3,steam bunny chicken bao,Japanese,non-veg,"buns, all purpose white flour, dry yeast, suga...",steam bunny chicken bao Japanese non-veg: buns...,three
2,2,1.0,24.0,5,green lentil dessert fudge,Dessert,veg,"whole moong beans, cow ghee, raisins, whole mi...",green lentil dessert fudge Dessert veg: whole ...,five
3,3,1.0,25.0,4,cashew nut cookies,Dessert,veg,"cashew paste, ghee, khaand (a sweetening agent...","cashew nut cookies Dessert veg: cashew paste, ...",four
4,4,2.0,49.0,1,christmas tree pizza,Italian,veg,"pizza dough (2 boules), red pepper, red onion,...",christmas tree pizza Italian veg: pizza dough ...,one


In [14]:
df = df.drop(['User_ID', 'Describe','Unnamed: 0','Rating_str'], axis=1)

In [15]:
df.head()

Unnamed: 0,Food_ID,Rating,Name,C_Type,Veg_Non,text
0,88.0,4,peri peri chicken satay,Snack,non-veg,peri peri chicken satay Snack non-veg: boneles...
1,46.0,3,steam bunny chicken bao,Japanese,non-veg,steam bunny chicken bao Japanese non-veg: buns...
2,24.0,5,green lentil dessert fudge,Dessert,veg,green lentil dessert fudge Dessert veg: whole ...
3,25.0,4,cashew nut cookies,Dessert,veg,"cashew nut cookies Dessert veg: cashew paste, ..."
4,49.0,1,christmas tree pizza,Italian,veg,christmas tree pizza Italian veg: pizza dough ...


In [16]:
# Saving our data
df.to_csv('final_food_recom_data.csv')

In [17]:
# your openai api key for embedding model
import os
os.environ['OPENAI_API_KEY'] = 'sk-proj-'

In [18]:
import numpy as np

import lancedb
from lancedb.embeddings import (
    EmbeddingFunctionRegistry,
    get_registry
)
from lancedb.pydantic import LanceModel, Vector
from lancedb.rerankers import (
    ColbertReranker,
    JinaReranker,
    CohereReranker,
    LinearCombinationReranker
)


db = lancedb.connect("/tmp/foods")

# HF sentence transformer embeddings
registry = EmbeddingFunctionRegistry.get_instance()
func = registry.get("sentence-transformers").create(device="cpu")

#uncomment below things for openai embeddings
# openai embeddings 
# func = get_registry().get("openai").create(name="text-embedding-ada-002")

class Words(LanceModel):

    text: str = func.SourceField()       # Text column is combinations of all columns
    Food_ID: str = func.SourceField()    # food id is food store name
    Name: str = func.SourceField()       #Name of menu
    Rating : str = func.SourceField()    # Rating given by users
    C_Type: str = func.SourceField()     # category type of food
    Veg_Non:str = func.SourceField()     # type of food its veg or non-veg
    vector: Vector(func.ndims()) = func.VectorField()

table = db.create_table("food_recommandations", schema=Words, mode="overwrite")
table.add(data=df)

#Full text search support
table.create_fts_index("text",replace=True)

# check our guidance for othe for reranker  models https://lancedb.github.io/lancedb/reranking/ 
#reranker = JinaReranker(api_key="key")
reranker = ColbertReranker()

query  = ' 6 rating non-veg meal ' 

#lance_reranker_hybrid = table.search(query, query_type="hybrid").rerank(reranker=reranker).limit(5).to_pandas()    # use Hybrid search also 
lance_reranker_fts = table.search(query, query_type="fts").rerank(reranker=reranker).limit(4).to_pandas() 

lance_reranker_fts

  from .autonotebook import tqdm as notebook_tqdm
2024-07-28 12:06:59.058090: I tensorflow/core/util/port.cc:153] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.
2024-07-28 12:06:59.065225: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:485] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
2024-07-28 12:06:59.074678: E external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:8454] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
2024-07-28 12:06:59.077317: E external/local_xla/xla/stream_executor/cuda/cuda_blas.cc:1452] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2024-07-28 12:06:59.0

Unnamed: 0,Food_ID,Name,Rating,C_Type,Veg_Non,vector,text,_relevance_score
0,98,chicken potli,6,Chinese,non-veg,"[-0.04389098, 0.009811659, -0.026069013, 0.008...","chicken potli Chinese non-veg: chicken, onion,...",0.694098
1,132,coffee marinated mutton chops,6,Thai,non-veg,"[-0.04389098, 0.009811659, -0.026069013, 0.008...",coffee marinated mutton chops Thai non-veg: mu...,0.670877
2,136,malabari fish curry,6,Indian,non-veg,"[-0.04389098, 0.009811659, -0.026069013, 0.008...","malabari fish curry Indian non-veg: sear fish,...",0.670778
3,128,thai lamb balls,6,Thai,non-veg,"[-0.04389098, 0.009811659, -0.026069013, 0.008...","thai lamb balls Thai non-veg: lamb (minced), c...",0.668333


In [19]:
#recommendations
def get_recommendations(query):
    results = table.search(query, query_type="fts").rerank(reranker=reranker).limit(4).to_pandas()
    return results[['Food_ID', 'Name','C_Type', 'Veg_Non','Rating']]

# Example usage
query = 'give me rating 6 non-veg food '
recommendations = get_recommendations(query)
print(recommendations)

  Food_ID                      Name        C_Type Veg_Non Rating
0     303                  red rice  Healthy Food     veg      6
1      10  broccoli and almond soup  Healthy Food     veg      6
2      10  broccoli and almond soup  Healthy Food     veg      6
3      36     spicy watermelon soup  Healthy Food     veg      6


In [20]:
# Example usage
query = 'Non veg food near me '
recommendations = get_recommendations(query)
print(recommendations)

  Food_ID                                      Name        C_Type  Veg_Non  \
0     247                   microwave chicken steak  Healthy Food  non-veg   
1      87  roasted spring chicken with root veggies  Healthy Food  non-veg   
2      86         roast turkey with cranberry sauce  Healthy Food  non-veg   
3      86         roast turkey with cranberry sauce  Healthy Food  non-veg   

  Rating  
0      5  
1      8  
2      4  
3      4  


In [21]:
query = ' rating 8 '
recommendations = get_recommendations(query)
print(recommendations)

  Food_ID                          Name    C_Type  Veg_Non Rating
0      81             fruit infused tea  Beverage      veg      8
1     232         apple and walnut cake   Dessert      veg      8
2     292                 chicken tikka    Indian  non-veg      8
3      69  banana and maple ice lollies   Dessert      veg      8


In [22]:

query = 'red wine with chicken'
recommendations = get_recommendations(query)
print(recommendations)
#here we have only one non veg with rating 9 so getting

  Food_ID                                               Name   C_Type  \
0     185                red wine braised mushroom flatbread  Italian   
1     142  fish skewers with coriander and red wine vineg...     Thai   
2      85  garlic and pinenut soup with burnt butter essence   French   
3      85  garlic and pinenut soup with burnt butter essence   French   

   Veg_Non Rating  
0      veg      7  
1  non-veg      6  
2      veg     10  
3      veg      3  


In [23]:

query = 'veg food with rating 6'
recommendations = get_recommendations(query)
print(recommendations)
#here we have only one non veg with rating 9 so getting

  Food_ID                                               Name        C_Type  \
0     303                                           red rice  Healthy Food   
1      10                           broccoli and almond soup  Healthy Food   
2      36                              spicy watermelon soup  Healthy Food   
3     221  amaranthus granola with lemon yogurt, berries ...  Healthy Food   

  Veg_Non Rating  
0     veg      6  
1     veg      6  
2     veg      6  
3     veg      6  


In [24]:
query = ' veg  food menu only'
recommendations = get_recommendations(query)
print(recommendations)

  Food_ID                     Name        C_Type Veg_Non Rating
0     301               brown rice  Healthy Food     veg      1
1     300               black rice  Healthy Food     veg      9
2     270  jalapeno cheese fingers       Mexican     veg      3
3     270  jalapeno cheese fingers       Mexican     veg      5


In [25]:
# Example usage
query = 'rice with chicken spicy  '
recommendations = get_recommendations(query)
print(recommendations)

  Food_ID                            Name    C_Type  Veg_Non Rating
0     100             spicy chicken curry    Indian  non-veg      3
1     100             spicy chicken curry    Indian  non-veg      4
2     100             spicy chicken curry    Indian  non-veg      1
3      93  buldak (hot and spicy chicken)  Japanese  non-veg      7


In [26]:
# Example usage
query = 'coffee '
recommendations = get_recommendations(query)
print(recommendations)

  Food_ID           Name    C_Type Veg_Non Rating
0      83  spiced coffee  Beverage     veg      9
1      84  filter coffee  Beverage     veg     10
2      84  filter coffee  Beverage     veg     10
3      84  filter coffee  Beverage     veg      2


In [27]:
# Example usage
query = 'soup chinese please'
recommendations = get_recommendations(query)
print(recommendations)

  Food_ID                           Name        C_Type  Veg_Non Rating
0     162              prawn potato soup          Thai      veg      9
1      79  beetroot and green apple soup  Healthy Food      veg      1
2     302                 koldil chicken       Chinese  non-veg      5
3     298                     chicken 65       Chinese  non-veg      4



---

Due to limited data, there may be instances where mixed results are returned, especially with a recommendation limit set to 4. The key to achieving better results lies in how you prepare your text data and optimize various hyperparameters, such as query types (hybrid, FTS, vector search). Additionally, experiment with different reranker methods. For further improvements, refer to our vector recipe repository for enhancing RAG methods and consult the LanceDB documentation for more details.
docs: https://lancedb.github.io/lancedb/search/
more such genai projects:https://github.com/lancedb/vectordb-recipes

---