## Step 1: Load and Prepare the Dataset
### 1. Organize Image Paths and Labels

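Use the metadata files (`train.txt` and `test.txt`) to build one pandas DataFrame per split. Each line in these files has the form `class_name/image_id`, so the label is the part before the slash and the image path is the line plus a `.jpg` extension.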

In [1]:
import pandas as pd

# Define paths
base_path = '/Users/shubhamgaur/Desktop/NU/Sem4/Gen AI/Project/Dataset/food-101/images/'
meta_path = '/Users/shubhamgaur/Desktop/NU/Sem4/Gen AI/Project/Dataset/food-101/meta/'

# Load training and testing data
with open(meta_path + "train.txt", "r") as file:
    train_images = file.read().splitlines()
with open(meta_path + "test.txt", "r") as file:
    test_images = file.read().splitlines()

# Create DataFrames
train_df = pd.DataFrame({
    "image_path": [base_path + path + ".jpg" for path in train_images],
    "label": [path.split("/")[0] for path in train_images]
})

test_df = pd.DataFrame({
    "image_path": [base_path + path + ".jpg" for path in test_images],
    "label": [path.split("/")[0] for path in test_images]
})

print(train_df.head())
print(test_df.head())

                                          image_path      label
0  /Users/shubhamgaur/Desktop/NU/Sem4/Gen AI/Proj...  apple_pie
1  /Users/shubhamgaur/Desktop/NU/Sem4/Gen AI/Proj...  apple_pie
2  /Users/shubhamgaur/Desktop/NU/Sem4/Gen AI/Proj...  apple_pie
3  /Users/shubhamgaur/Desktop/NU/Sem4/Gen AI/Proj...  apple_pie
4  /Users/shubhamgaur/Desktop/NU/Sem4/Gen AI/Proj...  apple_pie
                                          image_path      label
0  /Users/shubhamgaur/Desktop/NU/Sem4/Gen AI/Proj...  apple_pie
1  /Users/shubhamgaur/Desktop/NU/Sem4/Gen AI/Proj...  apple_pie
2  /Users/shubhamgaur/Desktop/NU/Sem4/Gen AI/Proj...  apple_pie
3  /Users/shubhamgaur/Desktop/NU/Sem4/Gen AI/Proj...  apple_pie
4  /Users/shubhamgaur/Desktop/NU/Sem4/Gen AI/Proj...  apple_pie
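Before training, it can help to confirm that the split is balanced. The generator output in Step 2 reports 75,750 training and 25,250 test images across 101 classes, which implies 750 and 250 images per class. The sketch below shows one way to check this; `count_labels` is a hypothetical helper, and the sample lines are made up to mimic the `class_name/image_id` format rather than copied from the real files.

```python
from collections import Counter

def count_labels(lines):
    """Count images per class from 'class_name/image_id' metadata lines."""
    return Counter(line.split("/")[0] for line in lines if line.strip())

# Made-up sample lines in the same format as train.txt
sample_lines = [
    "apple_pie/1005649",
    "apple_pie/1014775",
    "baklava/1006121",
]
counts = count_labels(sample_lines)
print(counts.most_common())

# Per-class counts implied by the generator output (75750 and 25250 images, 101 classes)
train_per_class = 75750 // 101
test_per_class = 25250 // 101
print(train_per_class, test_per_class)
```

On the real data, `count_labels(train_images)` should report the same count for every class if the metadata files are intact.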


## Step 2: Preprocessing the Images

### 1. Define Image Preprocessing Pipeline

- Resize images to a standard input size (e.g., 224x224 for models like ResNet or MobileNet).
- Normalize pixel values to the range [0, 1].
- Apply light augmentation (rotation, zoom, horizontal flip) to the training images only.

In [2]:
from tensorflow.keras.preprocessing.image import ImageDataGenerator

# Parameters
image_size = (224, 224)
batch_size = 32

# Define data generators
train_datagen = ImageDataGenerator(
    rescale=1.0 / 255,  # Normalize pixel values
    rotation_range=20,
    zoom_range=0.2,
    horizontal_flip=True
)

test_datagen = ImageDataGenerator(rescale=1.0 / 255)  # Only rescale for test data

# Flow images from DataFrame
train_generator = train_datagen.flow_from_dataframe(
    train_df,
    x_col="image_path",
    y_col="label",
    target_size=image_size,
    batch_size=batch_size,
    class_mode="categorical"
)

test_generator = test_datagen.flow_from_dataframe(
    test_df,
    x_col="image_path",
    y_col="label",
    target_size=image_size,
    batch_size=batch_size,
    class_mode="categorical",
    shuffle=False  # Keep a fixed order so predictions line up with test_df rows
)


Found 75750 validated image filenames belonging to 101 classes.
Found 25250 validated image filenames belonging to 101 classes.


### 2. Encode Labels

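Convert the string labels to integer class indices. `LabelEncoder` does not produce one-hot vectors; the generators above already yield one-hot targets through `class_mode="categorical"`. The encoder is kept so that a predicted index can be mapped back to a class name later: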

In [3]:
from sklearn.preprocessing import LabelEncoder

# Encode labels
label_encoder = LabelEncoder()
train_df["label_encoded"] = label_encoder.fit_transform(train_df["label"])
test_df["label_encoded"] = label_encoder.transform(test_df["label"])

print("Encoded Labels:", label_encoder.classes_[:5])


Encoded Labels: ['apple_pie' 'baby_back_ribs' 'baklava' 'beef_carpaccio' 'beef_tartare']
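To make the difference between integer encoding and one-hot encoding concrete, here is a small sketch in plain Python. The helpers `build_index` and `one_hot` are illustrative names, not part of scikit-learn or Keras; they only mirror the idea that classes are sorted alphabetically and each class gets one position.

```python
def build_index(labels):
    """Map each distinct label to an integer, in sorted order (as LabelEncoder does)."""
    return {name: i for i, name in enumerate(sorted(set(labels)))}

def one_hot(index, num_classes):
    """Return a one-hot list with 1.0 at position `index` and 0.0 elsewhere."""
    return [1.0 if i == index else 0.0 for i in range(num_classes)]

labels = ["baklava", "apple_pie", "baby_back_ribs", "apple_pie"]
class_to_index = build_index(labels)
encoded = [class_to_index[name] for name in labels]
vectors = [one_hot(i, len(class_to_index)) for i in encoded]

print(class_to_index)  # integer index per class
print(encoded)         # what an integer encoder returns
print(vectors[0])      # what a one-hot target looks like for "baklava"
```

With `categorical_crossentropy` the model expects the one-hot form; the integer form would pair with `sparse_categorical_crossentropy` instead.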


## Step 4: Define the Model

### 4.1 Use a Pre-trained Model (Transfer Learning)

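Use a backbone pre-trained on ImageNet as a frozen feature extractor. The code below uses EfficientNetB0; MobileNetV2 or ResNet50 are common alternatives.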

In [4]:
from tensorflow.keras.applications import EfficientNetB0
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, GlobalAveragePooling2D, Dropout, BatchNormalization
from tensorflow.keras.callbacks import ReduceLROnPlateau, EarlyStopping

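# Load pre-trained model
# Note: Keras EfficientNet models include their own input rescaling and expect raw
# pixel values in [0, 255]. The generators in Step 2 rescale to [0, 1], so consider
# dropping `rescale=1.0 / 255` there (and `/ 255.0` at prediction time) with this backbone.
base_model = EfficientNetB0(weights="imagenet", include_top=False, input_shape=(224, 224, 3))
base_model.trainable = False  # Freeze pre-trained layers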

# Build model
model = Sequential([
    base_model,
    GlobalAveragePooling2D(),
    BatchNormalization(),
    Dropout(0.3),
    Dense(101, activation="softmax", kernel_regularizer="l2")
])

# Compile model
model.compile(optimizer="adam", loss="categorical_crossentropy", metrics=["accuracy"])

# Callbacks
lr_scheduler = ReduceLROnPlateau(monitor="val_loss", factor=0.5, patience=3, verbose=1, min_lr=1e-6)
early_stopping = EarlyStopping(monitor="val_loss", patience=5, restore_best_weights=True)

Downloading data from https://storage.googleapis.com/keras-applications/efficientnetb0_notop.h5
16705208/16705208 ━━━━━━━━━━━━━━━━━━━━ 1s 0us/step

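The two callbacks interact: with a patience of 3 the learning rate is halved before early stopping (patience 5) ends training. The following is a simplified simulation of that interplay in plain Python, not the Keras implementation. It ignores `min_delta` and `cooldown`, assumes Adam's default starting learning rate of 1e-3, and uses a made-up validation loss curve; `simulate_callbacks` is a hypothetical helper.

```python
def simulate_callbacks(val_losses, lr=1e-3, factor=0.5, lr_patience=3,
                       min_lr=1e-6, stop_patience=5):
    """Return (epoch, val_loss, lr) per epoch until the simulated early stop."""
    best = float("inf")
    stale_lr = 0    # epochs without improvement since the last LR change
    stale_stop = 0  # epochs without improvement since the best epoch
    history = []
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best:
            best = loss
            stale_lr = 0
            stale_stop = 0
        else:
            stale_lr += 1
            stale_stop += 1
            if stale_lr >= lr_patience:
                lr = max(lr * factor, min_lr)  # Halve, but never go below min_lr
                stale_lr = 0
        history.append((epoch, loss, lr))
        if stale_stop >= stop_patience:
            break  # Early stopping would end training here
    return history

# Made-up validation losses: improvement for three epochs, then a plateau
val_losses = [2.0, 1.5, 1.4, 1.45, 1.46, 1.47, 1.48, 1.49, 1.3]
history = simulate_callbacks(val_losses)
for epoch, loss, lr in history:
    print(f"epoch {epoch}: val_loss={loss:.2f} lr={lr:.6f}")
```

In this toy run the learning rate is halved once, at epoch 6, and training stops at epoch 8, so the final improvement at epoch 9 is never reached. That trade-off is worth keeping in mind when choosing the two patience values.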

## Step 5: Train the Model
### 5.1 Fit the Model

Train the model using the training data and validate on the testing data:

In [None]:
# Train model

history = model.fit(
    train_generator,
    validation_data=test_generator,
    epochs=30,  # Increase epochs
    steps_per_epoch=train_generator.samples // batch_size,
    validation_steps=test_generator.samples // batch_size,
    callbacks=[lr_scheduler, early_stopping]
)

Epoch 1/30
  90/2367 ━━━━━━━━━━━━━━━━━━━━ 15:26 407ms/step - accuracy: 0.0074 - loss: 6.4172

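The 2367 steps shown in the progress bar follow directly from the floor division used for `steps_per_epoch`. The arithmetic below shows how many images that leaves out of each pass and what rounding up would give instead. Whether to round up is a judgment call; omitting both arguments and letting Keras use the generator's own length is another option.

```python
import math

train_samples, test_samples, batch_size = 75750, 25250, 32

# Floor division, as in the fit() call above
steps_floor = train_samples // batch_size
val_steps_floor = test_samples // batch_size

# Images that do not fit into a full batch and are skipped each pass
skipped_train = train_samples - steps_floor * batch_size
skipped_val = test_samples - val_steps_floor * batch_size

# Rounding up covers every image, with a smaller final batch
steps_ceil = math.ceil(train_samples / batch_size)
val_steps_ceil = math.ceil(test_samples / batch_size)

print(steps_floor, val_steps_floor)
print(skipped_train, skipped_val)
print(steps_ceil, val_steps_ceil)
```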
### 5.2 Visualize Training Progress

Plot accuracy and loss over epochs:

In [None]:
import matplotlib.pyplot as plt

# Plot accuracy
plt.plot(history.history["accuracy"], label="Train Accuracy")
plt.plot(history.history["val_accuracy"], label="Validation Accuracy")
plt.legend()
plt.title("Model Accuracy")
plt.show()

# Plot loss
plt.plot(history.history["loss"], label="Train Loss")
plt.plot(history.history["val_loss"], label="Validation Loss")
plt.legend()
plt.title("Model Loss")
plt.show()
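Alongside the plots, a short numeric summary can make the curves easier to compare between runs. The sketch below picks the epoch with the lowest validation loss and reports the train/validation accuracy gap at that epoch. `summarize_history` is a hypothetical helper, and `fake_history` contains made-up numbers shaped like `history.history`.

```python
def summarize_history(history_dict):
    """Find the epoch with the lowest val_loss and the accuracy gap at that epoch."""
    val_loss = history_dict["val_loss"]
    best = min(range(len(val_loss)), key=val_loss.__getitem__)
    gap = history_dict["accuracy"][best] - history_dict["val_accuracy"][best]
    return {
        "best_epoch": best + 1,  # Epochs are reported 1-based
        "best_val_loss": val_loss[best],
        "accuracy_gap": gap,
    }

# Made-up values with the same keys as history.history
fake_history = {
    "accuracy": [0.30, 0.45, 0.55, 0.62],
    "val_accuracy": [0.35, 0.48, 0.52, 0.51],
    "loss": [3.1, 2.3, 1.9, 1.6],
    "val_loss": [2.9, 2.2, 2.0, 2.1],
}
summary = summarize_history(fake_history)
print(summary)
```

On a real run this would be called as `summarize_history(history.history)`. A growing accuracy gap after the best epoch is a common sign of overfitting.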


## Step 6: Save and Evaluate the Model

### 6.1 Save the Trained Model

Save the model for future use:

In [None]:
model.save("food101_model.keras")

### 6.2 Evaluate on Test Data

Evaluate the model’s performance:

In [None]:
loss, accuracy = model.evaluate(test_generator)
print(f"Test Accuracy: {accuracy * 100:.2f}%")


## Step 7: Predict on New Images

### 7.1 Load the Saved Model
Load the model and predict on new images:

In [None]:
from tensorflow.keras.models import load_model
from tensorflow.keras.preprocessing.image import load_img, img_to_array

model = load_model("food101_model.keras")


# Load and preprocess a single image
img = load_img('/Users/shubhamgaur/Desktop/NU/Sem4/Gen AI/Project/Dataset/food-101/images/apple_pie/3917257.jpg', target_size=(224, 224))
img_array = img_to_array(img) / 255.0
img_array = img_array.reshape(1, 224, 224, 3)

# Predict
predictions = model.predict(img_array)
predicted_class = label_encoder.inverse_transform([predictions.argmax()])
print("Predicted Class:", predicted_class[0])
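With 101 visually similar classes, the single most probable class is often less informative than the top few. A minimal sketch of a top-k decoder follows; `top_k` is a hypothetical helper, and the probabilities are made up for illustration.

```python
def top_k(probabilities, class_names, k=3):
    """Return the k most probable (class_name, probability) pairs, best first."""
    order = sorted(range(len(probabilities)),
                   key=lambda i: probabilities[i], reverse=True)
    return [(class_names[i], probabilities[i]) for i in order[:k]]

# Made-up probabilities over four classes
class_names = ["apple_pie", "baby_back_ribs", "baklava", "beef_carpaccio"]
probabilities = [0.10, 0.05, 0.60, 0.25]

best_three = top_k(probabilities, class_names, k=3)
for name, prob in best_three:
    print(f"{name}: {prob:.2f}")
```

With the model above, the call would look like `top_k(predictions[0], label_encoder.classes_)`, since `predictions` has one row per input image.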


## 2. USDA FoodData Central

### Step 1: Load the Data
Read the CSV file into a Pandas DataFrame and inspect the data:

In [None]:
import pandas as pd

# Load the dataset
file_path = "/Users/shubhamgaur/Desktop/NU/Sem4/Gen AI/Project/Dataset/USDA FoodData/fda_approved_food_items_w_nutrient_info.csv"
data = pd.read_csv(file_path)

# Display column information
print(data.info())

### Step 2: Handle Missing Values
Inspect and handle missing values:

In [None]:
# Check for missing values
missing_values = data.isnull().sum()
print("Missing Values:\n", missing_values)

In [None]:
# Drop columns with more than 50% missing values.
# `thresh` is the minimum number of non-null values a column needs in order to be kept.
threshold = int(len(data) * 0.5)
data = data.dropna(axis=1, thresh=threshold)

# Fill remaining missing values with 0 or other appropriate placeholders
data = data.fillna(0)
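The `thresh` argument is easy to misread: it counts non-null values to keep, not nulls to drop. The toy example below illustrates this on a four-row frame and fills only a numeric column afterwards, which avoids writing `0` into text fields. The column names and values are made up for illustration and are not taken from the USDA file.

```python
import pandas as pd

# Made-up data: "Calories" is 25% missing, "Sodium" is 75% missing
df = pd.DataFrame({
    "Description": ["apple pie", "baklava", "bibimbap", "churros"],
    "Calories": [237.0, None, 110.0, 400.0],
    "Sodium": [None, None, None, 12.0],
})

# Keep a column only if at least half of its values are present
min_non_null = int(len(df) * 0.5)
cleaned = df.dropna(axis=1, thresh=min_non_null)

# Fill the remaining gaps in the numeric column only
cleaned = cleaned.fillna({"Calories": 0})

print(list(cleaned.columns))
print(cleaned["Calories"].tolist())
```

Filling with 0 treats "unknown" as "none", which can understate nutrient totals later. Keeping NaN or filling with a category median are alternatives worth considering.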

### Step 3: Rename Columns
Rename columns for easier access:

In [None]:
# Rename columns for consistency and usability
data = data.rename(columns={
    "fdc_id": "FDC_ID",
    "brand_owner": "Brand",
    "description": "Description",
    "ingredients": "Ingredients",
    "gtin_upc": "UPC",
    "serving_size": "ServingSize",
    "serving_size_unit": "ServingUnit",
    "branded_food_category": "FoodCategory",
    "modified_date": "ModifiedDate",
    "available_date": "AvailableDate",
    "Energy-KCAL": "Calories",
    "Protein-G": "Protein",
    "Total lipid (fat)-G": "Fat",
    "Carbohydrate, by difference-G": "Carbohydrates"
})


### Step 4: Convert Data Types
Convert data types for numerical and date columns:

In [None]:
# Convert date columns to datetime
data["ModifiedDate"] = pd.to_datetime(data["ModifiedDate"], errors="coerce")
data["AvailableDate"] = pd.to_datetime(data["AvailableDate"], errors="coerce")

# Convert numeric columns to appropriate data types
numeric_columns = [
    "ServingSize", "Calories", "Protein", "Fat", "Carbohydrates"
]
data[numeric_columns] = data[numeric_columns].apply(pd.to_numeric, errors="coerce")

# Fill any remaining NaN values in numeric columns
data[numeric_columns] = data[numeric_columns].fillna(0)


### Step 5: Filter and Select Relevant Columns
Drop irrelevant columns or focus only on required fields:

In [None]:
# Select relevant columns
data = data[[
    "FDC_ID", "Brand", "Description", "Ingredients", "UPC",
    "ServingSize", "ServingUnit", "FoodCategory", "Calories",
    "Protein", "Fat", "Carbohydrates", "ModifiedDate", "AvailableDate"
]]


### Step 6: Remove Duplicates
Remove duplicate rows if any:

In [None]:
data = data.drop_duplicates()


### Step 7: Save the Preprocessed Data
Save the cleaned data into a new CSV file:

In [None]:
# Save the cleaned data to a new file
output_file = "/Users/shubhamgaur/Desktop/NU/Sem4/Gen AI/Project/Dataset/USDA FoodData/cleaned_food_data.csv"
data.to_csv(output_file, index=False)

print(f"Cleaned data saved to: {output_file}")


### Step 8: Validate the Cleaned Data
Inspect the final cleaned dataset:

In [None]:
# Load and inspect the cleaned data
cleaned_data = pd.read_csv(output_file)
print(cleaned_data.info())

## Implementation Details

### Step 1: Frontend for Uploading Image

Use Streamlit to create the image upload interface:

In [None]:
import streamlit as st
from PIL import Image

# Upload an image
st.title("NutriVision")
uploaded_file = st.file_uploader("Upload a food image", type=["jpg", "png", "jpeg"])

if uploaded_file:
    image = Image.open(uploaded_file)
    st.image(image, caption="Uploaded Image", use_column_width=True)
    st.write("Analyzing the image...")


### Step 2: Run Food-101 Model
Load and run the pre-trained Food-101 model to predict the food class:

In [None]:
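import tensorflow as tf

# Load the trained model
model = tf.keras.models.load_model('/Users/shubhamgaur/Desktop/NU/Sem4/Gen AI/Project/food101_model.keras')

# Rebuild the class-name list in the alphabetical order used during training
meta_path = '/Users/shubhamgaur/Desktop/NU/Sem4/Gen AI/Project/Dataset/food-101/meta/'
with open(meta_path + "train.txt", "r") as file:
    class_names = sorted({line.split("/")[0] for line in file.read().splitlines()})

# Preprocess the uploaded image for Food-101
def preprocess_image(image):
    image = image.convert("RGB").resize((224, 224))  # Force 3 channels (PNGs may be RGBA), then resize
    image = tf.keras.preprocessing.image.img_to_array(image) / 255.0  # Normalize
    image = tf.expand_dims(image, axis=0)  # Add batch dimension
    return image

# Predict the food class
if uploaded_file:
    processed_image = preprocess_image(image)
    prediction = model.predict(processed_image)
    class_index = int(prediction.argmax(axis=-1)[0])  # Index of the most probable class
    predicted_class = class_names[class_index].replace("_", " ")  # e.g. "apple_pie" -> "apple pie"
    st.write(f"Predicted Food: {predicted_class}")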


### Step 3: Query USDA FoodData Central
Search for the predicted class in the USDA dataset:

In [None]:
# Load the cleaned USDA dataset
usda_data = pd.read_csv("path_to_cleaned_usda_data.csv")

# Search for the predicted food in the USDA dataset
def search_usda(predicted_food, usda_data):
    # Filter rows containing the predicted food in the Description or FoodCategory.
    # str() guards against a non-string query; regex=False treats it as literal text.
    query = str(predicted_food)
    matches = usda_data[
        usda_data["Description"].str.contains(query, case=False, na=False, regex=False)
        | usda_data["FoodCategory"].str.contains(query, case=False, na=False, regex=False)
    ]
    return matches

# Get matching entries (an empty frame when no image has been uploaded yet)
matches = search_usda(predicted_class, usda_data) if uploaded_file else usda_data.iloc[0:0]

# Display top match (if available)
if not matches.empty:
    top_match = matches.iloc[0]
    st.write("Matched Food Item:", top_match["Description"])
    st.write("Nutritional Information:")
    st.write(f"Calories: {top_match['Calories']} kcal")
    st.write(f"Protein: {top_match['Protein']} g")
    st.write(f"Fat: {top_match['Fat']} g")
    st.write(f"Carbohydrates: {top_match['Carbohydrates']} g")
elif uploaded_file:
    st.write("No matching food item found in the USDA dataset.")
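Substring search is a blunt tool: Food-101 labels use underscores (`apple_pie`), while product descriptions are free text. One possible refinement, sketched here in plain Python, is to split the label into words and rank descriptions by how many of those words they contain. `normalize_label` and `rank_matches` are hypothetical helpers, and the descriptions are made up rather than taken from the USDA file.

```python
import re

def normalize_label(label):
    """Turn a Food-101 label such as 'apple_pie' into 'apple pie'."""
    return label.replace("_", " ").lower()

def rank_matches(label, descriptions):
    """Rank descriptions by the fraction of label words they contain."""
    tokens = set(normalize_label(label).split())
    scored = []
    for description in descriptions:
        words = set(re.findall(r"[a-z]+", description.lower()))
        overlap = len(tokens & words)
        if overlap:
            scored.append((overlap / len(tokens), description))
    # Python's sort is stable, so ties keep their original order
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [description for _, description in scored]

# Made-up descriptions for illustration
descriptions = ["APPLE PIE, FROZEN", "Apple juice", "Chicken pot pie", "Baklava"]
ranked = rank_matches("apple_pie", descriptions)
print(ranked)
```

Here the full match comes first and partial matches follow. In practice a minimum score (for example, requiring all label words) would help avoid showing nutrition data for the wrong food.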


### Step 4: Add Generative AI Insights (Optional)
Use a language model to generate personalized suggestions based on the nutritional data:

In [None]:
from transformers import pipeline

# Load a text generation model ("gpt2" is the model ID on the Hugging Face Hub)
generator = pipeline("text-generation", model="gpt2")

# Generate dietary insights
if not matches.empty:
    nutrition_text = f"This {predicted_class} contains {top_match['Calories']} kcal, {top_match['Protein']} g protein, {top_match['Fat']} g fat, and {top_match['Carbohydrates']} g carbohydrates."
    # max_new_tokens limits only the generated continuation; max_length would also count the prompt
    suggestion = generator(f"{nutrition_text} Provide dietary advice:", max_new_tokens=50, num_return_sequences=1)
    st.write("Dietary Insight:")
    st.write(suggestion[0]["generated_text"])


### End-to-End Workflow

1. User uploads an image.
2. Image is analyzed by the Food-101 model to predict the food class.
3. USDA FoodData Central is queried for the nutritional information of the predicted class.
4. The results (nutritional information) are displayed.
5. (Optional) Generative AI provides additional dietary insights.

### Next Steps

- **Integrate Code:** Combine the frontend, model prediction, and USDA dataset query.
- **Test the Pipeline:** Test with a variety of food images to ensure accurate predictions and USDA matches.
- **Refine Matching Logic:** Improve the search function for better matching between Food-101 classes and USDA dataset entries.
- **Add Features (Optional):** Allow users to edit serving size and recalculate nutrition.
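The serving-size feature mentioned above comes down to a proportional rescale. The sketch below assumes the nutrient values are stored per 100 g, which should be verified against the dataset before relying on it. `scale_nutrition` is a hypothetical helper, and the numbers are made up.

```python
def scale_nutrition(per_100g, grams):
    """Scale per-100 g nutrient values to a serving of `grams` grams."""
    factor = grams / 100.0
    return {name: round(value * factor, 2) for name, value in per_100g.items()}

# Made-up per-100 g values for illustration
per_100g = {"Calories": 250.0, "Protein": 4.0, "Fat": 12.0, "Carbohydrates": 32.0}
scaled = scale_nutrition(per_100g, grams=150)
print(scaled)
```

In the Streamlit app, the `grams` value could come from a widget such as `st.number_input`, with the displayed nutrition recomputed on each change.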

In [None]:
%pip install --upgrade transformers huggingface-hub

In [None]:
from transformers import pipeline

# Initialize the text generation pipeline.
# The Hub ID is "gpt2" (not "gpt-2"); it is a public model, so no authentication token is needed.
generator = pipeline("text-generation", model="gpt2")
