<a href="https://colab.research.google.com/github/oliviasteeed/ChefGPT/blob/main/ChefGPT_FINAL.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

## Chef GPT: IAT 360 Final Project
Olivia Steed & Welle Dias Ouambo

This project returns a recipe suggestion based on input available ingredients. It uses RAG retrieval to match the input to the closest recipe, and then a fine-tuned version of GPT2 to return the result in a recipe format.

### Import Dependencies

In [19]:
import torch
from transformers import GPT2Tokenizer, GPT2LMHeadModel, Trainer, TrainingArguments
from datasets import load_dataset, Dataset
import pandas as pd
import csv
import glob
from transformers import pipeline
from sentence_transformers import SentenceTransformer
import numpy as np
import ast

### Load data

In [13]:
# open combined recipes dataframe csv - this is data for RAG to draw from (run if starting from a new runtime)
df = pd.read_csv('combined_recipes_dataframe.csv')

embedding_model = SentenceTransformer('all-MiniLM-L6-v2')  # use lightweight model for embedding
embeddings = embedding_model.encode(df['Embedding'])  # load embeddings based on key ingredients

In [7]:
# load the fine-tuned GPT2 model
model = GPT2LMHeadModel.from_pretrained("/Users/oliviasteed/Desktop/test 2/gpt2_recipe_model")
tokenizer = GPT2Tokenizer.from_pretrained("/Users/oliviasteed/Desktop/test 2/gpt2_recipe_tokenizer")

### RAG retrieval and generate result

In [36]:
# RAG retrieval functions to get closest recipe and return it using llm

def retrieve_with_pandas(query, top_k=1):

    # Generate embedding for the query
    query_embedding = embedding_model.encode([query])[0]

    # Compute similarity scores

    similarities = np.dot(embeddings, query_embedding) / (np.linalg.norm(embeddings, axis=1) * np.linalg.norm(query_embedding))
    df['Similarity'] = similarities

    # Sort by similarity and return top-k results
    results = df.sort_values(by="Similarity", ascending=False).head(top_k)
    return results[["Title", "Ingredients", "Instructions", "Similarity"]]


# Define a function to use RAG
# def generate_with_rag(query):

#     # Retrieve context
#     context = retrieve_with_pandas(query)[["Title", "Ingredients", "Instructions"]]

#     # Combine context with query
#     input_text = f"Explain a recipe to make a meal using these ingredients:({query}). Start with this title: {context['Title'].iloc[0]}. Then provide a list of these ingredients: {context['Ingredients'].iloc[0]}. Then explain a list of instructions: {context['Instructions'].iloc[0]}."

#     # Use LLM to generate a response
#     response = llm(input_text, max_length=700, num_return_sequences=3, truncation=True)

#     recipe_title = f"Recipe title is: {context['Title'].iloc[0]}."
#     recipe_ingr = f"Recipe ingredients are: {context['Ingredients'].iloc[0]}."
#     recipe_instr = f"Recipe instructions are: {context['Instructions'].iloc[0]}."

#     title_response = llm(recipe_title, max_length=300, num_return_sequences=1)
#     ingr_response = llm(recipe_ingr, max_length=400, num_return_sequences=1)
#     instr_response = llm(recipe_instr, max_length=400, num_return_sequences=1)

#     # return response
#     return title_response[0]['generated_text'], ingr_response[0]['generated_text'], instr_response[0]['generated_text']


In [39]:
# function to generate recipe based on input ingredients

def generate_recipe(query):

    # prompt = f"Here's what you can make with: {ingredients} as key ingredients \n \nIngredients: {ingredients}\nInstructions:"

    context = retrieve_with_pandas(query)[["Title", "Ingredients", "Instructions"]]

    prompt = f"You can make {context['Title'].iloc[0]} with {query} as key ingredients \n\n Ingredients: {context['Ingredients'].iloc[0]} \n\n Instructions:"


    inputs = tokenizer(prompt, return_tensors="pt")

    outputs = model.generate(inputs["input_ids"], max_length=500, num_return_sequences=1, no_repeat_ngram_size=2)

    generated_text = tokenizer.decode(outputs[0], skip_special_tokens=True)

    return generated_text


### Generate response from user input

Input what ingredients you have and closest recipe will be returned with RAG, and delivered using fine-tuned GPT2.

In [40]:
# try it out
ingredients = "rice, pineapple, chicken thighs"

recipe = generate_recipe(ingredients)

print("\n*******")
print(recipe)

The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:None for open-end generation.



*******
You can make Peach Cobbler with rice, pineapple, chicken thighs as key ingredients 

 Ingredients: 1 c. sugar, 1 c. milk, 1 c. self-rising flour, 1 large can peaches, 1 stick butter 

 Instructions: Mix all ingredients together in a bowl and pour into a greased 9 x 13-inch pan. Bake at 350\u00b0 for 30 minutes or until a toothpick inserted comes out clean. Cool completely.
