# Recipe finder Chatbot

We will build a chatbot that allows the end user to query for a vegetarian recipe. Here are the steps that we'd need to follow to make this bot work:

- Context creation: load the recipe document.
- Generate embeddings: split the document into chunks with a text splitter and embed each chunk.
- Vector store: store the embeddings in a local vector store.
- Retriever creation: retrieve the stored documents that are most similar to the query.
- Prompt engineering: create a prompt from the user query and the retrieved context.
- Response parsing: query the model and parse the response into a human-readable format.

## Context creation
Load the recipe document.

In [None]:
from langchain_community.document_loaders import PyPDFLoader

recipe_file = "assets/vegetarian-recipes.pdf"

loader = PyPDFLoader(recipe_file)
pages = []

async for page in loader.alazy_load():
    pages.append(page)

print("Number of pages extracted from the recipe file: ", len(pages))
page_100 = str(pages[100])
print("\nSample 1000 characters of 100th page: ", page_100[50:1050])

Number of pages extracted from the recipe file:  198

Sample 1000 characters of 100th page:   TOMATOES 
Put 1 tablespoon of butter in a frying pan, and when melted lay in thickly sliced 
tomatoes which have been rolled in egg and crumbs; when browned on one side turn 
them with a pancake turner and brown the other side, seasoning with pepper and salt. 
Remove to the serving dish with a pancake turner, seasoning the first side cooked 
after they are turned onto the dish. A half a teaspoon of onion juice may be added to 
the butter in which they are cooking if desired. Serve plain or with white sauce. 
 
DEVILLED TOMATOES 
Cut in half and broil three or four nice solid tomatoes, and serve them with a sauce 
made as follows: Take the yolks of 4 hard-boiled eggs and crush them with a fork, 
add to them a scant teaspoon of dry mustard, 1 heaping saltspoon of salt, and several 
shakes of paprika, or a dash of cayenne pepper; mix these dry ingredients well 
together, and then add to them 5 ta

## Generate Embeddings 

Each model comes with a context length limit. Passing the entire document as context may exceed that limit (at worst) or be expensive (at best). One solution is to split the document into chunks and pass along only the chunks that are relevant to a particular question.

Each of these chunks is embedded and loaded into the local vector store. The store uses cosine similarity to find the few embeddings that are closest to the embedding of the question. The matching documents are then passed as input to the model.
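
To make the similarity step concrete, here is a minimal sketch of cosine similarity and top-k ranking in plain Python. The three-dimensional vectors and their labels are made up for illustration (real embeddings have far more dimensions), and `cosine_similarity` and `top_k` are hypothetical helper names, not part of LangChain or the vector store used later.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query_vec, vectors_by_label, k=2):
    """Return the k labels whose vectors are most similar to the query."""
    ranked = sorted(
        vectors_by_label,
        key=lambda label: cosine_similarity(query_vec, vectors_by_label[label]),
        reverse=True,
    )
    return ranked[:k]


# Toy "embeddings", invented for this sketch
toy_vectors = {
    "tomato soup": [0.9, 0.1, 0.0],
    "devilled tomatoes": [0.7, 0.6, 0.1],
    "barley broth": [0.1, 0.2, 0.9],
}
toy_query = [1.0, 0.0, 0.0]

print(top_k(toy_query, toy_vectors))  # ['tomato soup', 'devilled tomatoes']
```

A real vector store does the same ranking over many high-dimensional vectors, usually with optimized numeric code rather than Python loops.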

### Recipe document split

In [None]:
from langchain_text_splitters import RecursiveCharacterTextSplitter

# split the pages into chunks of at most 1000 characters with a 20-character overlap
text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=20)
chunks = text_splitter.split_documents(pages)
chunks[:10]

[Document(metadata={'source': 'assets/vegetarian-recipes.pdf', 'page': 1}, page_content='Veggie! - www.obooko.com \n2 \n \n \n \nVeggie! \nA bumper harvest of delicious meat-free recipes \nwith the compliments of Obooko \n \nEdited by Harrison Parke \n \nThis free edition is published by Obooko Publishing, with immense respect and gratitude to the original \nauthor and compiler, Maud Sharpe, and the book’s transcribers. You may use this book strictly for your \npersonal enjoyment only: it must not be hosted or redistributed on other websites without the written \npermission of the publishers nor offered for sale in any form. If you paid for this book, or to gain access to it, \nwe suggest you demand a refund and report the transaction to the publisher. \nPublisher’s note: the recipes in this book are for recreational purposes only and have not been formally tested \nby us; we therefore do not provide any assurances or guarantees, nor do we accept any responsibility or \nliability with 
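
To show what the `chunk_size` and `chunk_overlap` settings mean, here is a simplified sketch of fixed-size splitting with overlap. It is not how `RecursiveCharacterTextSplitter` works internally: that splitter first tries to break on separators such as paragraphs, lines, and spaces, which this sketch ignores. `split_with_overlap` is a hypothetical helper written for illustration.

```python
def split_with_overlap(text, chunk_size, chunk_overlap):
    """Cut text into windows of chunk_size characters, where each window
    repeats the last chunk_overlap characters of the previous one."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be non-negative and smaller than chunk_size")
    step = chunk_size - chunk_overlap
    pieces = []
    start = 0
    while start < len(text):
        pieces.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the current window already reaches the end of the text
        start += step
    return pieces


sample_text = "abcdefghijklmnopqrstuvwxyz"
toy_pieces = split_with_overlap(sample_text, chunk_size=10, chunk_overlap=2)
print(toy_pieces)  # ['abcdefghij', 'ijklmnopqr', 'qrstuvwxyz']
```

The overlap means a sentence that straddles a boundary still appears partly in both neighbouring chunks, which lowers the chance of cutting a recipe step in a way that loses its meaning.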

### Embedding generation 

In [26]:
from langchain_openai import OpenAIEmbeddings

embeddings = OpenAIEmbeddings()

embedded_query = embeddings.embed_query("How to make Tomato Soup?")
print("Embeddings for the query:", embedded_query)

Embeddings for the query: [0.015288429334759712, -0.010443132370710373, -0.009089882485568523, -0.014667914249002934, 0.0006060746964067221, 0.01975085586309433, -0.010370519012212753, -0.002011722419410944, -0.00738676730543375, -0.042670298367738724, 0.019566021859645844, 0.015130000188946724, -0.026787757873535156, 0.0167803056538105, -0.014192626811563969, 0.010429929941892624, 0.025586334988474846, 0.002792316721752286, 0.02660292387008667, -0.0349072590470314, -0.0020529800094664097, 0.011941609904170036, 0.028174014762043953, -0.004469027277082205, -0.021968865767121315, -0.004330401308834553, 0.005469112191349268, -0.00766401831060648, 0.019275566563010216, -0.003921125549823046, 0.013275057077407837, -0.016027767211198807, -0.0029424945823848248, -0.022919442504644394, 0.004838695749640465, 0.027461081743240356, -0.0015644895611330867, 0.005010327324271202, 0.013994590379297733, -0.021810436621308327, 0.023698385804891586, -0.009934838861227036, -0.0007162325782701373, -0.0085

### Vector store setup

The chunked documents and their embeddings need to be stored in a database that can perform similarity searches at scale. This is where a *vector store* comes in. Many hosted vector stores exist, but we will use an in-memory vector store for now.

In [27]:
from langchain_community.vectorstores import DocArrayInMemorySearch

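# build the in-memory store; note that this indexes the full pages, not the split chunks
vector_store_local = DocArrayInMemorySearch.from_documents(pages, embeddings)
# wrap the store in a retriever and try it with a sample question
retriever_test = vector_store_local.as_retriever()
retriever_test.invoke("How to make Tomato Soup?")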



[Document(metadata={'source': 'assets/vegetarian-recipes.pdf', 'page': 29}, page_content='Veggie! - www.obooko.com \n30 \nslowly. Cut up 2 carrots, 2 turnips, and 3 large onions, and fry in 2 tablespoons of \nbutter. Chop a sprig of parsley very fine, and put with the other vegetables into the \nbarley and water. Let cook slowly for two hours, season with pepper and salt, and \nserve. A ½ teaspoon of soup-browning improves the appearance of the broth. \n  \nSPANISH TOMATO SOUP \nPut 1 tablespoon of butter in a saucepan, and when melted stir into it 3 onions thinly \nsliced, and let simmer for ten minutes; then add to them the juice from 1 can of \ntomatoes and 2 of the tomatoes, and let cook slowly for twenty minutes; strain, \npressing through a sieve, return to the fire, add 1 tablespoon of butter, some pepper \nand salt, and stir in 2 well-beaten eggs. Do not let the soup boil after adding the eggs. \n  \nTOMATO-TAPIOCA SOUP \nPut 2 quarts of water into a double boiler, and when it 

## Prompt Engineering 

Our vector store and embeddings are ready to supply context to the model. Let's create a chatbot template that will be used to build the prompt.

We will chain this prompt to an OpenAI LLM which will answer our question.

In [None]:
import os
from dotenv import load_dotenv
from langchain_openai.chat_models import ChatOpenAI
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.runnables import RunnablePassthrough

load_dotenv()

OPENAI_API_KEY = os.getenv("OPENAI_API_KEY")

print("Secured the OpenAI key of length =", len(OPENAI_API_KEY))

# creating the prompt
template = """
Please answer the question based on the context below. 
Format the response as steps in making a recipe.
If you don't know the answer to the question, reply with "I don't know."
Context: {context}
Question: {question}
"""
prompt = ChatPromptTemplate.from_template(template)

# create the LLM and the parser that translates the LLM output into a human-readable string
model = ChatOpenAI(openai_api_key = OPENAI_API_KEY, model = "gpt-3.5-turbo")
parser = StrOutputParser()

# create a chain with the local vector store as context
chain = (
    {"context": vector_store_local.as_retriever(), "question": RunnablePassthrough()}
    | prompt
    | model
    | parser
)

# invoke the chain with the recipe question
chain.invoke("How to make a tomato soup?")


Secured the OpenAI key of length = 164


'To make tomato soup, you can follow the recipe provided in the context. One way is to simmer 1 quart can of tomatoes, 2 cups of water (or rice stock), a sprig of parsley, 1 bay leaf, and 1 onion together for fifteen minutes, then press through a sieve and return to the fire to boil. Rub 1 tablespoon of butter and 1 tablespoon of flour together, and stir into the boiling soup until smooth. Add salt, pepper, and a pinch of soda, and serve immediately with croutons. If water in which rice has boiled is used omit the flour and the soda.'
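
To make the prompt step concrete, the sketch below approximates how the template placeholders get filled before the text reaches the model, using plain string formatting instead of `ChatPromptTemplate`. The retrieved snippets are shortened stand-ins, `fill_prompt` is a hypothetical helper, and joining the snippets with blank lines is a choice made for this sketch rather than what the chain above does with the retriever output.

```python
toy_template = (
    "Please answer the question based on the context below.\n"
    "Format the response as steps in making a recipe.\n"
    "Context: {context}\n"
    "Question: {question}\n"
)


def fill_prompt(template, retrieved_snippets, question):
    """Join the retrieved snippets and substitute both placeholders."""
    context = "\n\n".join(retrieved_snippets)
    return template.format(context=context, question=question)


# Shortened stand-ins for what a retriever might return
toy_snippets = [
    "SPANISH TOMATO SOUP: Put 1 tablespoon of butter in a saucepan ...",
    "TOMATO-TAPIOCA SOUP: Put 2 quarts of water into a double boiler ...",
]

filled_prompt = fill_prompt(toy_template, toy_snippets, "How to make tomato soup?")
print(filled_prompt)
```

Printing the filled prompt like this is a quick way to check what context the model actually sees when an answer looks wrong.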