<a href="https://colab.research.google.com/github/husam6101/Food-Detection-Recognition-and-Ingredient-Allergy-Detection/blob/main/Food_Detection%2C_Regocnition%2C_and_Ingredient_Allergy_Prediction_with_YOLOv5%2C_ResNet%2C_and_XGBoost_Fine_Tuning.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **Food Detection, Recognition, and Ingredient-Allergy Prediction with YOLOv5, ResNet, and XGBoost Fine-Tuning**
**Project by:** Husam Shamseddine and Roy Zoghbi

In [None]:
#@title Mount google drive
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


In [None]:
# @title Imports
# TensorFlow and Keras
import tensorflow as tf
from tensorflow.keras.applications.resnet50 import ResNet50, preprocess_input, decode_predictions
from tensorflow.keras.layers import GlobalAveragePooling2D, Dense, LSTM, Embedding
from tensorflow.keras.models import Sequential, Model, load_model
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.callbacks import ModelCheckpoint
from tensorflow.keras.preprocessing import image as keras_image
from tensorflow.keras.preprocessing.image import ImageDataGenerator
from tensorflow.keras.preprocessing.sequence import pad_sequences
from tensorflow.keras.utils import to_categorical
from tensorflow.keras.preprocessing.text import Tokenizer

# Other ML and data processing libraries
import pandas as pd
import xgboost as xgb
from sklearn.preprocessing import LabelEncoder
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# Other libraries
import numpy as np
import matplotlib.pyplot as plt
from PIL import Image
import os
import subprocess
import glob
import math

## **Args Definition and Initialization**

In [None]:
# @markdown # **Defining Args Class**
# @markdown For easier control over hyperparameters and training settings
# @markdown ## **Parameters**
# @markdown - `resnet_batch_size`: Batch size for the ResNet model
# @markdown - `resnet_learning_rate`: Learning rate for the ResNet model
# @markdown - `resnet_epochs`: Number of epochs for the ResNet model
# @markdown - `resnet_split`: Fraction of the data held out for validation of the ResNet model
# @markdown - `resnet_number_of_classes`: Number of classes that ResNet is being trained on,
# @markdown calculated automatically from the dataset
# @markdown - `resnet_model_path`: Save file path for ResNet
# @markdown - `yolov5_path`: Save file path for YOLOv5
# @markdown - `xgboost_learning_rate`: Learning rate for the XGBoost model
# @markdown - `xgboost_n_estimators`: Number of estimators for the XGBoost model
# @markdown - `xgboost_max_depth`: Max depth parameter for the XGBoost model
# @markdown - `xgboost_split`: Fraction of the data held out for testing the XGBoost model
# @markdown - `xgboost_random_state`: Random state used when splitting the data for the XGBoost model
# @markdown - `xgboost_model_path`: Save file path for XGBoost


class Args:
  def __init__(
      self,
      resnet_batch_size,
      resnet_learning_rate,
      resnet_epochs,
      resnet_split,
      resnet_model_path,
      yolov5_path,
      xgboost_n_estimators,
      xgboost_learning_rate,
      xgboost_max_depth,
      xgboost_split,
      xgboost_random_state,
      xgboost_model_path
  ):
    self.resnet_batch_size = resnet_batch_size
    self.resnet_learning_rate = resnet_learning_rate
    self.resnet_epochs = resnet_epochs
    self.resnet_split = resnet_split
    self.resnet_model_path = resnet_model_path
    self.yolov5_path = yolov5_path
    self.xgboost_n_estimators = xgboost_n_estimators
    self.xgboost_learning_rate = xgboost_learning_rate
    self.xgboost_max_depth = xgboost_max_depth
    self.xgboost_split = xgboost_split
    self.xgboost_random_state = xgboost_random_state
    self.xgboost_model_path = xgboost_model_path

    self.resnet_number_of_classes = None
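
As an aside, the same settings container could be written more compactly with a dataclass. The following is only a sketch, not the notebook's actual class: `ArgsSketch` and `sketch_args` are hypothetical names, and the values are the defaults used later in this notebook.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ArgsSketch:
    """Hypothetical dataclass variant of Args; field names mirror the original."""
    resnet_batch_size: int
    resnet_learning_rate: float
    resnet_epochs: int
    resnet_split: float
    resnet_model_path: str
    yolov5_path: str
    xgboost_n_estimators: int
    xgboost_learning_rate: float
    xgboost_max_depth: int
    xgboost_split: float
    xgboost_random_state: int
    xgboost_model_path: str
    # Filled in later from the dataset, as in the original class
    resnet_number_of_classes: Optional[int] = None


sketch_args = ArgsSketch(
    resnet_batch_size=32,
    resnet_learning_rate=0.0002,
    resnet_epochs=20,
    resnet_split=0.2,
    resnet_model_path="/content/drive/MyDrive/project/final/ResNet_best.h5",
    yolov5_path="/content/drive/MyDrive/project/final/",
    xgboost_n_estimators=100,
    xgboost_learning_rate=0.05,
    xgboost_max_depth=4,
    xgboost_split=0.2,
    xgboost_random_state=42,
    xgboost_model_path="/content/drive/MyDrive/project/final/XGBoost_best.h5",
)
print(sketch_args.resnet_batch_size, sketch_args.resnet_number_of_classes)
```

A dataclass generates `__init__` and a readable `repr` automatically, which may be convenient when printing the active settings.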

In [None]:
# @markdown # **Initializing Default Args**
# @markdown ## **ResNet For Object Recognition**
resnet_batch_size = 32 # @param {type:"integer"}
resnet_learning_rate = 0.0002 # @param {type:"number"}
resnet_epochs = 20 # @param {type:"integer"}
resnet_split = 0.2 # @param {type:"number"}
resnet_model_path = "/content/drive/MyDrive/project/final/ResNet_best.h5" # @param {type:"string"}
# @markdown ## **YOLOv5 for Food Object Detection**
yolov5_path = "/content/drive/MyDrive/project/final/" # @param {type:"string"}
# @markdown ## **XGBoost for Ingredient and Allergy Prediction**
xgboost_n_estimators = 100 # @param {type:"integer"}
xgboost_learning_rate = 0.05 # @param {type:"number"}
xgboost_max_depth = 4 # @param {type:"integer"}
xgboost_split = 0.2 # @param {type:"number"}
xgboost_random_state = 42 # @param {type:"integer"}
xgboost_model_path = "/content/drive/MyDrive/project/final/XGBoost_best.h5" # @param {type:"string"}

args = Args(
    resnet_batch_size = resnet_batch_size,
    resnet_learning_rate = resnet_learning_rate,
    resnet_epochs = resnet_epochs,
    resnet_split = resnet_split,
    resnet_model_path = resnet_model_path,
    yolov5_path = yolov5_path,
    xgboost_n_estimators = xgboost_n_estimators,
    xgboost_learning_rate = xgboost_learning_rate,
    xgboost_max_depth = xgboost_max_depth,
    xgboost_split = xgboost_split,
    xgboost_random_state = xgboost_random_state,
    xgboost_model_path = xgboost_model_path,
)

# Delete the form variables so they do not linger as globals beyond this cell
del resnet_batch_size
del resnet_learning_rate
del resnet_epochs
del resnet_split
del resnet_model_path
del yolov5_path
del xgboost_n_estimators
del xgboost_learning_rate
del xgboost_max_depth
del xgboost_split
del xgboost_random_state
del xgboost_model_path

## **YOLOv5**

### **Cloning and Pre-Treating Data**

In [None]:
# @markdown ## **Get Food Detection Data for YOLOv5**
%cd /content/
!gdown https://drive.google.com/uc?id=1vKfBWxTTu2Bcvu-MofBw4GMDsPK8AukH

/content
Downloading...
From: https://drive.google.com/uc?id=1vKfBWxTTu2Bcvu-MofBw4GMDsPK8AukH
To: /content/OID.zip
100% 976M/976M [00:14<00:00, 68.2MB/s]


In [None]:
# @markdown ## **Extract the Data from the .zip File**
%cd /content/
!unzip -o "/content/OID.zip" -d '/content/OID'

In [None]:
# @markdown # **Treating Issues with OID Data**
# @markdown The downloaded and extracted OID data is missing its data.yaml file,
# @markdown so every label .txt file must be modified to set its class to `0`,
# @markdown and a data.yaml file must be created for the single `food` detection class
# @markdown ## **Process**
# @markdown 1. **Modify text files for `train_coco` and `val_coco`:** Loop through all text files and set the first field of every line,
# @markdown   which holds the class id, to `0`
# @markdown 2. **Change directory names:** Rename `train_coco` and `val_coco` to `train` and `valid`
def modify_text_files(folder_path):
    for filename in os.listdir(folder_path):
        if filename.endswith('.txt'):  # Only process label files
            file_path = os.path.join(folder_path, filename)
            with open(file_path, 'r') as file:
                lines = file.readlines()

            with open(file_path, 'w') as file:
                for line in lines:
                    parts = line.split()
                    if parts:  # Non-empty line: replace the class id (first token) with 0
                        parts[0] = '0'  # Works for multi-digit class ids as well
                        file.write(' '.join(parts) + '\n')
                    else:
                        file.write(line)

def change_folder_name(old_name, new_name):
    try:
        os.rename(old_name, new_name)
        print(f"Folder renamed from {old_name} to {new_name}")
    except OSError as e:
        print(f"Error: {e.strerror}")

modify_text_files('/content/OID/OID/labels/train_coco/')
modify_text_files('/content/OID/OID/labels/val_coco/')
change_folder_name('/content/OID/OID/labels/train_coco/', '/content/OID/OID/labels/train/')
change_folder_name('/content/OID/OID/labels/val_coco/', '/content/OID/OID/labels/valid/')

Folder renamed from /content/OID/OID/labels/train_coco/ to /content/OID/OID/labels/train/
Folder renamed from /content/OID/OID/labels/val_coco/ to /content/OID/OID/labels/valid/
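
The relabeling step can be illustrated on a single label line. This is a minimal sketch, assuming the usual YOLO text format of `class x_center y_center width height`; `relabel_line` is a hypothetical helper, not part of the notebook. It replaces the whole first token, so a multi-digit class id is handled too.

```python
def relabel_line(line, new_class="0"):
    """Return a YOLO label line with its class id (first token) replaced."""
    parts = line.split()
    if not parts:
        return line  # Leave blank lines untouched
    parts[0] = new_class
    return " ".join(parts) + "\n"


# A label whose original class id has two digits
print(relabel_line("12 0.5 0.5 0.25 0.25\n"), end="")
```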


In [None]:
# @markdown # **Create data.yaml**
# @markdown Create a data.yaml file and fill it as follows:
# @markdown > train: /content/OID/OID/images/train<br>
# @markdown > val: /content/OID/OID/images/valid<br><br>
# @markdown > nc: 1<br>
# @markdown > names: ['food']
print('Creating data.yaml...')
with open('/content/OID/OID/data.yaml', 'w') as config_file:
    config_file.write("train: /content/OID/OID/images/train\n")
    config_file.write("val: /content/OID/OID/images/valid\n\n")
    config_file.write("nc: 1\n")
    config_file.write("names: ['food']\n")

print('Finished.')

Creating data.yaml...
Finished.
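
If more classes were ever added, the same file could be generated from a list of names so that `nc` cannot drift out of sync with `names`. A small sketch under that assumption; `build_data_yaml` is a hypothetical helper.

```python
def build_data_yaml(train_dir, val_dir, class_names):
    """Build the text of a YOLOv5 data.yaml from paths and class names."""
    names = ", ".join(f"'{name}'" for name in class_names)
    return (
        f"train: {train_dir}\n"
        f"val: {val_dir}\n\n"
        f"nc: {len(class_names)}\n"  # Derived from the list, never typed by hand
        f"names: [{names}]\n"
    )


yaml_text = build_data_yaml(
    "/content/OID/OID/images/train",
    "/content/OID/OID/images/valid",
    ["food"],
)
print(yaml_text)
```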


### **Cloning, Training, and Testing of YOLOv5**

In [None]:
# @markdown ## **Clone YOLOv5 and Install Requirements**
# @markdown > Note: Colab often requires a runtime restart
# @markdown after installing the requirements,
# @markdown so earlier cells with imports and definitions must be rerun
%cd {args.yolov5_path}
!git clone https://github.com/ultralytics/yolov5
%cd yolov5
!pip install -r requirements.txt

/content/drive/MyDrive/project/final
Cloning into 'yolov5'...
remote: Enumerating objects: 16094, done.
remote: Counting objects: 100% (5/5), done.
remote: Compressing objects: 100% (5/5), done.
remote: Total 16094 (delta 0), reused 4 (delta 0), pack-reused 16089
Receiving objects: 100% (16094/16094), 14.76 MiB | 14.12 MiB/s, done.
Resolving deltas: 100% (11024/11024), done.
/content/drive/MyDrive/project/final/yolov5
Collecting gitpython>=3.1.30 (from -r requirements.txt (line 5))
  Downloading GitPython-3.1.40-py3-none-any.whl (190 kB)
Collecting Pillow>=10.0.1 (from -r requirements.txt (line 9))
  Downloading Pillow-10.1.0-cp310-cp310-manylinux_2_28_x86_64.whl (3.6 MB)
Collecting thop>=0.1.1 (from -r requirements.txt (line 14))
  Down

In [None]:
%cd /content/drive/MyDrive/project/final/yolov5
!pip install -r requirements.txt

/content/drive/MyDrive/project/final/yolov5


In [None]:
# @markdown # **Train YOLOv5**
# @markdown Train YOLOv5 on the previously treated OID data for food detection
# @markdown > Note: Some warnings may appear due to corrupt image data,
# @markdown > but the remaining, uncorrupted data is still more than enough to get a good result
%cd {os.path.join(args.yolov5_path, 'yolov5')}
!python train.py --img 640 --batch 16 --epochs 10 --data '/content/OID/OID/data.yaml' --weights yolov5s.pt --cache

In [None]:
# @title **Test YOLOv5**
# @markdown # **Test YOLOv5**
# @markdown Test YOLOv5 on an image
# @markdown > This test cell is currently commented out for the sake of a smoother-running process
# %cd {args.yolov5_path + 'yolov5'}
# !python detect.py --weights /content/yolov5/runs/train/exp/weights/best.pt --img 640 --conf 0.25 --source "/content/apple-pie.png" --save-txt

## **ResNet50**

### **Kaggle Setup**

In [None]:
# @markdown # **Setup kaggle.json**
# @markdown > Upload **Kaggle.json** in */content/* then run
import os

# Create a Kaggle folder
os.makedirs('/root/.kaggle', exist_ok=True)

# Move the kaggle.json file
%cd /content/
!mv kaggle.json /root/.kaggle/
# Set permissions
!chmod 600 /root/.kaggle/kaggle.json

/content
mv: cannot stat 'kaggle.json': No such file or directory


In [None]:
# @markdown # **Download Data from Kaggle**
# @markdown Download and unzip the **food101tiny** from Kaggle in <u>*/content*</u>
%cd /content/
!kaggle datasets download -d msarmi9/food101tiny -p /content/
# unzipping contents of .zip dataset
!unzip "/content/food101tiny.zip" -d '/content/food101tiny'

/content
food101tiny.zip: Skipping, found more recently modified local copy (use --force to force download)
Archive:  /content/food101tiny.zip
replace /content/food101tiny/data/food-101-tiny/train/apple_pie/1005649.jpg? [y]es, [n]o, [A]ll, [N]one, [r]ename: 

### **ResNet Setup and Preprocessing**

In [None]:
# @markdown # **Loading Data from Directories + Augmentation**
# @markdown ## **Dataset Directory Parameters**
train_dir = '/content/food101tiny/data/food-101-tiny/train' # @param {type: "string"}
test_dir = '/content/food101tiny/data/food-101-tiny/valid' # @param {type: "string"}

# @markdown ## **Data Augmentation Parameters**
rotation_range = 20 #@param {type: "integer"}
width_shift_range = 0.2 #@param {type: "number"}
height_shift_range = 0.2 #@param {type: "number"}
shear_range = 0.2 #@param {type: "number"}
zoom_range = 0.2 #@param {type: "number"}
horizontal_flip = True #@param {type: "boolean"}

# @markdown ## **Process**
# @markdown Load data and perform data augmentation using `ImageDataGenerator`
# @markdown as well as split the data through to `train_generator` and `validation_generator`
# @markdown where the `target_size` is appropriately set to `(224,224)` for ResNet fine-tuning

# Define the data augmentation and preprocessing
train_datagen = ImageDataGenerator(
    preprocessing_function=preprocess_input,
    rotation_range = rotation_range,
    width_shift_range = width_shift_range,
    height_shift_range = height_shift_range,
    shear_range = shear_range,
    zoom_range = zoom_range,
    horizontal_flip = horizontal_flip,
    fill_mode = 'nearest',
    validation_split = args.resnet_split
)

train_generator = train_datagen.flow_from_directory(
    train_dir,
    target_size = (224, 224),
    batch_size = args.resnet_batch_size,
    class_mode = 'categorical',
    subset = 'training'
)

validation_generator = train_datagen.flow_from_directory(
    test_dir,
    target_size = (224, 224),
    batch_size = args.resnet_batch_size,
    class_mode = 'categorical',
    subset = 'validation'
)

del train_dir
del test_dir
del rotation_range
del width_shift_range
del height_shift_range
del shear_range
del zoom_range
del horizontal_flip

Found 1200 images belonging to 10 classes.
Found 100 images belonging to 10 classes.
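
The image counts reported by each generator depend on `validation_split`. As a rough sketch, assuming Keras applies the split separately inside each class folder and truncates the validation share to an integer (an assumption about the library's internals, not something shown in this notebook), the arithmetic for one class folder looks like this. `split_counts` is a hypothetical helper and the folder size of 150 is a made-up figure.

```python
def split_counts(files_per_class, validation_split):
    """Return (training, validation) image counts for one class folder.

    Assumes the validation share is truncated to an integer and the
    remainder goes to training.
    """
    validation = int(validation_split * files_per_class)
    training = files_per_class - validation
    return training, validation


# Hypothetical folder with 150 images and the notebook's 0.2 split
print(split_counts(150, 0.2))
```

Note that the cell above draws the `training` subset from `train_dir` and the `validation` subset from `test_dir`, so each directory contributes only its own share.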


In [None]:
# @markdown # **Getting Class Number and Treating Labels**
# @markdown ## **Process**
# @markdown First, extract the number of classes as well as `class_labels` from `train_generator` (`validation_generator` would work too)
# @markdown then replace the underscores in `class_labels` with a space

args.resnet_number_of_classes = len(train_generator.class_indices)

# Getting class labels
class_indices = train_generator.class_indices
class_labels = [label.replace('_', ' ') for label in class_indices.keys()]
class_labels.sort(key=lambda label: class_indices[label.replace(' ', '_')])

print(f'Number of classes: {args.resnet_number_of_classes}')
print(f'Labels: {class_labels}')

Number of classes: 10
Labels: ['apple pie', 'bibimbap', 'cannoli', 'edamame', 'falafel', 'french toast', 'ice cream', 'ramen', 'sushi', 'tiramisu']
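
The label list must follow the class indices, because the model's output position `i` corresponds to the class with index `i`. One way to make that ordering explicit is to sort the dictionary items by index. This is a sketch on a hypothetical three-class mapping; `class_indices_example` is not taken from the generator.

```python
# Hypothetical mapping in the same shape as train_generator.class_indices
class_indices_example = {"french_toast": 1, "apple_pie": 0, "ice_cream": 2}

# Sort by the index value, then turn folder names into display labels
ordered_labels = [
    name.replace("_", " ")
    for name, _ in sorted(class_indices_example.items(), key=lambda item: item[1])
]
print(ordered_labels)
```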


### **Training and Testing Fine-Tuned ResNet Model**

In [None]:
# @title Fine-Tune the Layers
# @markdown # **Finetune ResNet Layers**
# @markdown ## **Process**
# @markdown 1. Load the ResNet50 model pre-trained on ImageNet data
# @markdown 2. Freeze the layers in the base model
# @markdown 3. Add the layers:
# @markdown   1. `GlobalAveragePooling2D` to the top layer
# @markdown   2. A fully connected `Dense` layer with `ReLU` activation
# @markdown   3. A softmax `Dense` layer with the previously extracted `args.resnet_number_of_classes` units
# @markdown     that will serve as the output
# @markdown 4. Compile the model
# @markdown
# @markdown
# @markdown > Note: The model uses `categorical_crossentropy` as a loss function
# @markdown and keeps track of `accuracy` in its metrics

base_model = ResNet50(weights = 'imagenet', include_top = False)
for layer in base_model.layers:
    layer.trainable = False

x = base_model.output
x = GlobalAveragePooling2D()(x)
x = Dense(1024, activation = 'relu')(x)
predictions = Dense(args.resnet_number_of_classes, activation = 'softmax')(x)

model = Model(inputs = base_model.input, outputs = predictions)
model.summary()
model.compile(
    optimizer = Adam(learning_rate = args.resnet_learning_rate),
    loss = 'categorical_crossentropy',
    metrics = ['accuracy'])

Downloading data from https://storage.googleapis.com/tensorflow/keras-applications/resnet/resnet50_weights_tf_dim_ordering_tf_kernels_notop.h5
Model: "model"
__________________________________________________________________________________________________
 Layer (type)                Output Shape                 Param #   Connected to                  
 input_1 (InputLayer)        [(None, None, None, 3)]      0         []                            
                                                                                                  
 conv1_pad (ZeroPadding2D)   (None, None, None, 3)        0         ['input_1[0][0]']             
                                                                                                  
 conv1_conv (Conv2D)         (None, None, None, 64)       9472      ['conv1_pad[0][0]']           
                                                                                                  
 conv1_bn (BatchNormalizati  (None, None, None, 64

In [None]:
# @title Train the Model
# @markdown ## **Training the model**
checkpoint_path = args.resnet_model_path
# @markdown ## **Process**

# @markdown 1. Implementing checkpointing for best model with `ModelCheckpoint`
# @markdown   > Note that the checkpoint monitors `val_accuracy` in `max` mode
# @markdown     and that `save_best_only` is set to `True`
# @markdown
# @markdown
# @markdown 2. Calculate the values of `steps_per_epoch` and `validation_steps` based on (samples / batch_size)
# @markdown 3. Train the model

checkpoint = ModelCheckpoint(
    checkpoint_path,
    monitor='val_accuracy',
    verbose=1,
    save_best_only=True,
    mode='max'
)

steps_per_epoch = math.ceil(train_generator.samples / train_generator.batch_size)
validation_steps = math.ceil(validation_generator.samples / validation_generator.batch_size)

history = model.fit(
    train_generator,
    steps_per_epoch=steps_per_epoch,
    validation_data=validation_generator,
    validation_steps=validation_steps,
    epochs=args.resnet_epochs,
    callbacks = [checkpoint])

Epoch 1/20
Epoch 1: val_accuracy improved from -inf to 0.71000, saving model to /content/drive/MyDrive/project/final/ResNet_best.h5




Epoch 2/20
Epoch 2: val_accuracy improved from 0.71000 to 0.75000, saving model to /content/drive/MyDrive/project/final/ResNet_best.h5
Epoch 3/20
Epoch 3: val_accuracy improved from 0.75000 to 0.77000, saving model to /content/drive/MyDrive/project/final/ResNet_best.h5
Epoch 4/20
Epoch 4: val_accuracy did not improve from 0.77000
Epoch 5/20
Epoch 5: val_accuracy did not improve from 0.77000
Epoch 6/20
Epoch 6: val_accuracy improved from 0.77000 to 0.79000, saving model to /content/drive/MyDrive/project/final/ResNet_best.h5
Epoch 7/20
Epoch 7: val_accuracy improved from 0.79000 to 0.81000, saving model to /content/drive/MyDrive/project/final/ResNet_best.h5
Epoch 8/20
Epoch 8: val_accuracy did not improve from 0.81000
Epoch 9/20
Epoch 9: val_accuracy did not improve from 0.81000
Epoch 10/20
Epoch 10: val_accuracy did not improve from 0.81000
Epoch 11/20
Epoch 11: val_accuracy improved from 0.81000 to 0.82000, saving model to /content/drive/MyDrive/project/final/ResNet_best.h5
Epoch 12/20
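
For reference, the step counts passed to `model.fit` can be checked by hand. The sketch below uses the image counts reported by the generators (1200 and 100) and the default batch size of 32; `steps_for` is a hypothetical helper.

```python
import math


def steps_for(samples, batch_size):
    """Number of batches needed to see every sample once; the last may be partial."""
    return math.ceil(samples / batch_size)


train_steps = steps_for(1200, 32)  # 37.5 batches rounds up
val_steps = steps_for(100, 32)     # 3.125 batches rounds up
print(train_steps, val_steps)
```

Rounding up rather than down ensures the final partial batch is not skipped.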

In [None]:
# @title Test the Model
# @markdown > This test cell is currently commented out for the sake of a smoother-running process
# @markdown # **Load the Model and Test it Against an Image**
# @markdown ## **Parameters**
img_path = '/content/apple-pie.png' # @param {type:"string"}

# @markdown ## **Process**
# @markdown 1. Load the model
# @markdown 2. Load the image and resize it
# @markdown 3. Process the image into an `img_array`
# @markdown 4. Predict using the loaded model
# @markdown 5. Get the index of the top prediction
# @markdown 6. Get the top prediction's label and probability
# @markdown 7. Display results

# loaded_model = load_model(args.resnet_model_path)

# img = keras_image.load_img(img_path, target_size=(224, 224))
# img_array = keras_image.img_to_array(img)
# img_array = np.expand_dims(img_array, axis=0)
# img_array = preprocess_input(img_array)

# predictions = loaded_model.predict(img_array)

# # Get the index of the top prediction
# top_index = predictions[0].argmax()

# # Get the top prediction label and probability
# top_prediction_label = class_labels[top_index]
# top_prediction_probability = predictions[0][top_index]

# print("Top Prediction:", top_prediction_label)
# print("Probability:", top_prediction_probability)
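
Looking at more than the single best class can help when two dishes look alike. The following pure-Python sketch ranks a probability vector against its labels. `top_k` is a hypothetical helper, and the probabilities are invented for illustration rather than produced by the model.

```python
def top_k(probabilities, labels, k=3):
    """Return the k (label, probability) pairs with the highest probability."""
    ranked = sorted(zip(labels, probabilities), key=lambda pair: pair[1], reverse=True)
    return ranked[:k]


example_labels = ["apple pie", "bibimbap", "cannoli"]
example_probabilities = [0.7, 0.1, 0.2]  # Invented values for illustration
best_two = top_k(example_probabilities, example_labels, k=2)
print(best_two)
```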

## **XGBoost**

### **Loading the Dataset and Preprocessing It**

In [None]:
# @markdown # **Download the Dataset .csv from Drive**
%cd /content/
!gdown https://drive.google.com/uc?id=1yxoxiYNQW-tylsdBX-nrqSZbzfd_nZoV

/content
Downloading...
From: https://drive.google.com/uc?id=1yxoxiYNQW-tylsdBX-nrqSZbzfd_nZoV
To: /content/food-to-allergies-optimized-dataset.csv
100% 201k/201k [00:00<00:00, 65.9MB/s]


In [None]:
# @markdown # **Load the Dataset and Preprocess it for XGBoost's Fine-Tuning**
# @markdown ## **Process**
# @markdown 1. Load the dataset
# @markdown 2. Check for missing values and handle them if found
# @markdown 3. Encode the `DISH` and `Allergy` columns separately
# @markdown 4. Define the features (`DISH`) and target (`Allergy`)
# @markdown 5. Split the dataset

df = pd.read_csv('/content/food-to-allergies-optimized-dataset.csv') # 1
df.dropna(inplace=True)  # 2. removes rows with missing values

# 3
dish_label_encoder = LabelEncoder()
df['DISH'] = dish_label_encoder.fit_transform(df['DISH'].astype(str))
allergy_label_encoder = LabelEncoder()
df['Allergy'] = allergy_label_encoder.fit_transform(df['Allergy'].astype(str))

# 4
X = df[['DISH']]
y = df['Allergy']

# 5
X_train, X_test, y_train, y_test = train_test_split(
    X,
    y,
    test_size=args.xgboost_split,
    random_state=args.xgboost_random_state
)
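
To make the encoding step concrete, here is a pure-Python sketch of what label encoding does: each distinct string gets an integer according to its position in the sorted list of unique values, which is also how `LabelEncoder` orders its classes. `fit_label_encoding` is a hypothetical stand-in, not the scikit-learn implementation, and the dish names are sample values.

```python
def fit_label_encoding(values):
    """Map each distinct value to its position in the sorted unique values."""
    classes = sorted(set(values))
    return {label: index for index, label in enumerate(classes)}


dishes = ["sushi", "apple pie", "ramen", "sushi"]
mapping = fit_label_encoding(dishes)
encoded = [mapping[dish] for dish in dishes]
print(mapping)
print(encoded)
```

A label that was never seen during fitting has no code. In this sketch that shows up as a `KeyError`, whereas `LabelEncoder.transform` raises a `ValueError`.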

### **Train, Evaluate, and Setup XGBoost for Implementation with YOLOv5 and ResNet50**

In [None]:
# @markdown # **Train and Evaluate XGBoost**
# Train the XGBoost model
model = xgb.XGBClassifier(
    n_estimators=args.xgboost_n_estimators,
    learning_rate=args.xgboost_learning_rate,
    max_depth=args.xgboost_max_depth
)
model.fit(X_train, y_train)

model.save_model(args.xgboost_model_path)

# Evaluate the model
predictions = model.predict(X_test)
accuracy = accuracy_score(y_test, predictions)
print(f"Accuracy: {accuracy * 100:.2f}%")

Accuracy: 22.22%
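
The reported figure is plain classification accuracy: the share of test rows whose predicted class equals the true class. A minimal sketch of that calculation follows, with made-up labels; `accuracy` is a hypothetical helper equivalent in spirit to `accuracy_score`.

```python
def accuracy(y_true, y_pred):
    """Fraction of positions where the prediction matches the true label."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    correct = sum(truth == guess for truth, guess in zip(y_true, y_pred))
    return correct / len(y_true)


# Made-up labels: three of four predictions are correct
score = accuracy([0, 1, 2, 2], [0, 2, 2, 2])
print(f"Accuracy: {score * 100:.2f}%")
```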




In [None]:
# @markdown # **Extract Ingredients and Allergies, and Construct a Sentence**
# @markdown By decoding the encoded result from the prediction with the corresponding decoder
# @markdown ## **Process**
# @markdown 1. `get_ingredients_and_allergies`
# @markdown   1. Takes in `dish_encoded`
# @markdown   and filters the dataframe for the rows of that dish
# @markdown   2. Returns the `ingredients` and `allergies` associated with the dish
# @markdown 2. `construct_sentence`: Decodes the allergy codes and builds a readable sentence from the ingredients and allergies
def get_ingredients_and_allergies(dish_encoded, df):
    dish_data = df[df['DISH'] == dish_encoded]
    ingredients = dish_data['Food'].tolist()
    allergies = dish_data['Allergy'].tolist()
    return ingredients, allergies

def construct_sentence(dish_encoded, df, dish_name, allergy_label_encoder):
    ingredients, allergy_codes = get_ingredients_and_allergies(dish_encoded, df)
    if not ingredients or not allergy_codes:
        return f"No allergy information available for {dish_name}."

    allergy_names = allergy_label_encoder.inverse_transform(allergy_codes)
    parts = [f"{ingredient} (which may cause {allergy})" for ingredient, allergy in zip(ingredients, allergy_names)]
    sentence = f"{dish_name.title()} contains " + ', '.join(parts) + '.'
    return sentence
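
The sentence format can be tried out without the dataframe or the encoders. This is a simplified sketch that takes already-decoded `(ingredient, allergy)` pairs; `build_sentence` is a hypothetical helper, and the pairs are sample rows chosen for illustration.

```python
def build_sentence(dish_name, pairs):
    """Format (ingredient, allergy) pairs in the same style as construct_sentence."""
    if not pairs:
        return f"No allergy information available for {dish_name}."
    parts = [
        f"{ingredient} (which may cause {allergy})"
        for ingredient, allergy in pairs
    ]
    return f"{dish_name.title()} contains " + ", ".join(parts) + "."


sample_pairs = [
    ("Buttermilk", "Milk allergy / Lactose intolerance"),
    ("Milk", "Milk allergy / Lactose intolerance"),
]
print(build_sentence("ice cream", sample_pairs))
print(build_sentence("ramen", []))
```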

## **Integration of YOLOv5, ResNet50, and XGBoost**

In [None]:
# @markdown # **YOLOv5 and ResNet50 Automatic Run Functions**
# @markdown **Short summary of the functions in this cell:**<br>
# @markdown `get_latest_exp_folder` fetches the latest exp folder, which contains the latest YOLOv5 result<br>
# @markdown `run_yolo_and_parse_results` runs YOLOv5's `detect.py` on an image and parses the saved label file into a list of detections<br>
# @markdown `crop_image` crops the original image to the bbox provided by YOLOv5<br>
# @markdown `load_and_preprocess_for_resnet` resizes an image (path or PIL image) to `(224, 224)` and applies ResNet's `preprocess_input`<br>
# @markdown `predict_with_resnet` uses the fine-tuned ResNet model to recognize the food detected by the YOLOv5 model

def get_latest_exp_folder(base_path):
    exp_folders = glob.glob(os.path.join(base_path, 'exp*'))
    if exp_folders:
        latest_folder = max(exp_folders, key=os.path.getmtime)
        print(latest_folder)
        return latest_folder
    else:
        return None

def run_yolo_and_parse_results(image_path, weights_path, confidence=0.25):
    print('Running YOLOv5 script')

    yolo_command = f"python {args.yolov5_path}yolov5/detect.py --weights {weights_path} --img 640 --conf {confidence} --source {image_path} --save-txt"
    print(yolo_command)
    # result = subprocess.run(yolo_command.split(), shell=True, check=True, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    # os.system(yolo_command)
    with subprocess.Popen(yolo_command.split(), stdout=subprocess.PIPE, stderr=subprocess.PIPE) as proc:
      stdout, stderr = proc.communicate()
      print("Output:", stdout.decode())
      print("Error:", stderr.decode())
    print('Fetching detected result')
    yolo_base_dir = f'{args.yolov5_path}yolov5/runs/detect'
    latest_exp_dir = get_latest_exp_folder(yolo_base_dir)

    print('Treating output')
    if latest_exp_dir:
        # Swap the image extension (whatever it is) for .txt to find the label file
        yolo_output_file = os.path.join(latest_exp_dir, 'labels', os.path.splitext(os.path.basename(image_path))[0] + '.txt')
        detections = []
        if os.path.exists(yolo_output_file):
            with open(yolo_output_file, 'r') as file:
                for line in file:
                    detections.append(line.strip().split())
        return detections
    else:
        return []

def crop_image(image_path, bbox):
    with Image.open(image_path) as img:
        cropped_img = img.crop((bbox[0], bbox[1], bbox[2], bbox[3]))  # left, top, right, bottom
        return cropped_img

def load_and_preprocess_for_resnet(img):
    print('Preprocessing image for ResNet')
    # If img is a path, load the image
    if isinstance(img, str):
        img = keras_image.load_img(img, target_size=(224, 224))

    # If img is a PIL Image, resize it
    elif isinstance(img, Image.Image):
        img = img.resize((224, 224))

    img_array = keras_image.img_to_array(img)
    img_array = np.expand_dims(img_array, axis=0)
    img_array = preprocess_input(img_array)
    return img_array


# Function to predict with ResNet
def predict_with_resnet(model_path, img_array):
  print('Feeding through ResNet')
  # Load ResNet model
  loaded_model = load_model(model_path)
  predictions = loaded_model.predict(img_array)

  # Get the top prediction
  top_index = predictions[0].argmax()
  top_prediction_label = class_labels[top_index]  # class_labels must be defined previously
  top_prediction_probability = predictions[0][top_index]

  return top_prediction_label, top_prediction_probability
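
`crop_image` expects a pixel box of `(left, top, right, bottom)`, while each line YOLOv5 writes with `--save-txt` holds `class x_center y_center width height` normalized to the range 0 to 1. The conversion between the two can be sketched as below. `yolo_to_pixel_box` is a hypothetical helper, and the clamping to the image bounds is an addition for safety that is not part of the notebook's own conversion.

```python
def yolo_to_pixel_box(x_center, y_center, width, height, image_width, image_height):
    """Convert a normalized YOLO box to pixel (left, top, right, bottom)."""
    left = int((x_center - width / 2) * image_width)
    top = int((y_center - height / 2) * image_height)
    right = int((x_center + width / 2) * image_width)
    bottom = int((y_center + height / 2) * image_height)
    # Added safeguard: keep the box inside the image
    left, top = max(0, left), max(0, top)
    right, bottom = min(image_width, right), min(image_height, bottom)
    return left, top, right, bottom


# A centered box covering half of a 640x480 image in each dimension
print(yolo_to_pixel_box(0.5, 0.5, 0.5, 0.5, 640, 480))
```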

In [None]:
# @markdown # **Run the Process From Start To Finish**
# @markdown YOLOv5 > ResNet50 > XGBoost > Final Result
# @markdown ## **Parameters**
image_test = '/content/drive/MyDrive/project/final/apple-pie.png' # @param {type:"string"}
yolo_weights = '/content/drive/MyDrive/project/final/yolov5/runs/train/exp/weights/best.pt' # @param {type:"string"}

# @markdown ## **Process**
# @markdown 1. Run YOLOv5 and save its `detections`
# @markdown 2. Loop through `detections`, read their normalized coordinates, and scale them to the image dimensions
# @markdown 3. Crop the image, preprocess it, and run it through the ResNet50 model
# @markdown 4. Encode the label output by the ResNet50 model and run it through XGBoost
# @markdown 5. Look up the dish's ingredients and their associated allergies in the dataset,
# @markdown   then print them in an easy-to-understand format

# Updated usage of YOLO and ResNet
detections = run_yolo_and_parse_results(image_test, yolo_weights)

for detection in detections:
    # Convert bbox coordinates from YOLO format
    # Assuming bbox is [class, x_center, y_center, width, height]
    x_center, y_center, width, height = map(float, detection[1:5])
    image_width, image_height = Image.open(image_test).size
    left = int((x_center - width / 2) * image_width)
    top = int((y_center - height / 2) * image_height)
    right = int((x_center + width / 2) * image_width)
    bottom = int((y_center + height / 2) * image_height)
    bbox = [left, top, right, bottom]

    cropped_img = crop_image(image_test, bbox)
    img_array = load_and_preprocess_for_resnet(cropped_img)
    label, probability = predict_with_resnet(args.resnet_model_path, img_array)
    print("Top Prediction:", label)
    print("Probability:", probability)

    new_dish = label
    try:
      # Load the XGBoost model
      xgboost_model = xgb.XGBClassifier()
      xgboost_model.load_model(args.xgboost_model_path)  # Ensure this is the correct path to your model file

      new_dish_encoded = dish_label_encoder.transform([new_dish.lower()])[0]
      predicted_allergy = xgboost_model.predict([[new_dish_encoded]])[0]
      detailed_info = construct_sentence(new_dish_encoded, df, new_dish, allergy_label_encoder)
      print(detailed_info)
    except ValueError as e:
        # Handle the case where the dish is not in the dataset
        print(f"The dish '{label}' is not available in the dataset.")

Running YOLOv5 script
python /content/drive/MyDrive/project/final/yolov5/detect.py --weights /content/drive/MyDrive/project/final/yolov5/runs/train/exp/weights/best.pt --img 640 --conf 0.25 --source /content/drive/MyDrive/project/final/apple-pie.png --save-txt
Output: 
Error: detect: weights=['/content/drive/MyDrive/project/final/yolov5/runs/train/exp/weights/best.pt'], source=/content/drive/MyDrive/project/final/apple-pie.png, data=drive/MyDrive/project/final/yolov5/data/coco128.yaml, imgsz=[640, 640], conf_thres=0.25, iou_thres=0.45, max_det=1000, device=, view_img=False, save_txt=True, save_csv=False, save_conf=False, save_crop=False, nosave=False, classes=None, agnostic_nms=False, augment=False, visualize=False, update=False, project=drive/MyDrive/project/final/yolov5/runs/detect, name=exp, exist_ok=False, line_thickness=3, hide_labels=False, hide_conf=False, half=False, dnn=False, vid_stride=1
YOLOv5 🚀 v7.0-249-gf400bba Python-3.10.12 torch-2.1.0+cu118 CPU

Fusing lay



Top Prediction: apple pie
Probability: 0.90357524
Apple Pie contains Butter (which may cause Milk allergy / Lactose intolerance), Butter bean (which may cause Legume Allergy), Buttermilk (which may cause Milk allergy / Lactose intolerance), Sugar (which may cause Sugar Allergy / Intolerance), Sugar beet (which may cause Sugar Allergy / Intolerance), Sugarcane (which may cause Sugar Allergy / Intolerance), Apple (which may cause Oral Allergy Syndrome), Pineapple (which may cause Oral Allergy Syndrome), Chestnut (which may cause Nut Allergy), Ginkgo nut (which may cause Nut Allergy), Peanut (which may cause Peanut Allergy), Walnut (which may cause Nut Allergy).
Preprocessing image for ResNet
Feeding through ResNet




Top Prediction: ice cream
Probability: 0.520703
Ice Cream contains Buttermilk (which may cause Milk allergy / Lactose intolerance), Milk (which may cause Milk allergy / Lactose intolerance).
