# Chat With Your Data

## Build prompt chains with retrievers

## Install libraries

In [None]:
pip install openai

In [None]:
pip install python-dotenv

In [None]:
pip install langchain

In [None]:
pip install langchain-openai

In [None]:
pip install pypdf

In [None]:
pip install faiss-cpu

In [None]:
pip install langchainhub

In [None]:
pip install langchain-community

## Helper functions

In [43]:
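import textwrap

def print_output(docs, kind: int = 1):
    """Print a list of documents (kind=1) or a single chat message (kind=2), wrapped at 100 characters."""
    match kind:  # the match statement requires Python 3.10+
        case 1:
            # docs is a list of Document objects
            for doc in docs:
                print('The metadata is: {}'.format(doc.metadata))
                for line in textwrap.wrap(doc.page_content, width=100):
                    print(line)
        case 2:
            # docs is a single chat message; uncomment the next line to inspect token usage
            #print('The metadata is: {}'.format(docs.response_metadata))
            for line in textwrap.wrap(docs.content, width=100):
                print(line)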


## Load OpenAI API Key

In [2]:
import os

from dotenv import load_dotenv, find_dotenv
_ = load_dotenv(find_dotenv()) # read local .env file

OPENAI_API_KEY=os.environ['OPENAI_API_KEY']
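
Indexing `os.environ` directly raises a bare `KeyError` when the variable is missing. The helper below is a minimal sketch of a friendlier check. `require_env` and `DEMO_API_KEY` are hypothetical names used only for illustration and are not part of the original notebook.

```python
import os

def require_env(name):
    """Return the value of an environment variable, or fail with a readable message."""
    value = os.environ.get(name)
    if not value:
        raise RuntimeError(
            f"{name} is not set; add it to your .env file or shell environment"
        )
    return value

# Demonstration with a made-up variable so that no real key is needed
os.environ["DEMO_API_KEY"] = "example-value"
demo_key = require_env("DEMO_API_KEY")
print(demo_key)
```

In the notebook this could be called as `require_env('OPENAI_API_KEY')` after `load_dotenv` has run.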

## Prompt model with no knowledge of the Voynich manuscript

In [3]:
from langchain_openai import ChatOpenAI

#initialize the LLM we'll use - OpenAI GPT 3.5 Turbo
llm = ChatOpenAI(openai_api_key=OPENAI_API_KEY, model="gpt-3.5-turbo-0125")

In [42]:
#prompt the model with no additional knowledge of the Voynich manuscript beyond pretraining
response = llm.invoke("What are the medicinal insights from the Voynich manuscript?")
print_output(response, 2)

The medatadata is: {'token_usage': {'completion_tokens': 184, 'prompt_tokens': 19, 'total_tokens': 203, 'completion_tokens_details': {'accepted_prediction_tokens': 0, 'audio_tokens': 0, 'reasoning_tokens': 0, 'rejected_prediction_tokens': 0}, 'prompt_tokens_details': {'audio_tokens': 0, 'cached_tokens': 0}}, 'model_name': 'gpt-3.5-turbo-0125', 'system_fingerprint': None, 'finish_reason': 'stop', 'logprobs': None}
The Voynich manuscript, a mysterious and undeciphered text dating back to the 15th century, has been
the subject of much speculation and study by historians, linguists, and cryptographers. While the
exact contents and purpose of the manuscript remain unknown, some researchers have suggested that it
may contain information related to medicinal herbs and plants.  One theory is that the Voynich
manuscript is a botanical or herbal guide, containing detailed illustrations and descriptions of
various plants and their medicinal uses. Some researchers have identified similarities betw

In [44]:
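#ask about a term from the document; without retrieval the model can only guess at a meaning
response = llm.invoke("What is Aetherfloris Ventus?")
print_output(response, 2)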

Aetherfloris Ventus is a Latin term that translates to "airflower wind" in English. It could
potentially refer to a specific type of wind or air current that is associated with the presence of
flowers or a floral scent. However, without further context, it is difficult to determine a specific
meaning for the term.


## Load vector database from disk

In [45]:
from langchain_openai import OpenAIEmbeddings
from langchain_community.vectorstores import FAISS


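#load the FAISS index from disk; the embedding model should be the same one that was used to build the index
#allow_dangerous_deserialization=True unpickles stored data, so only load index files you trust
db = FAISS.load_local("../faiss_index",
                      OpenAIEmbeddings(openai_api_key=OPENAI_API_KEY, model="text-embedding-3-small"),
                      allow_dangerous_deserialization=True)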

## Configure retriever
### Use the similarity search capabilities of a vector store to facilitate retrieval

In [49]:
#return the 6 chunks most similar to the query
retriever = db.as_retriever(search_type="similarity", search_kwargs={"k": 6})
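
To make the idea of top-k similarity retrieval concrete, here is a toy sketch in plain Python. It uses cosine similarity over hand-made 3-dimensional vectors. The vectors, the chunk texts, and the choice of cosine are assumptions for illustration only; the real FAISS index works on the OpenAI embeddings and may rank by a different metric such as Euclidean distance.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def top_k(query_vec, indexed_chunks, k):
    """Return the k chunk texts whose vectors are most similar to the query."""
    ranked = sorted(
        indexed_chunks,
        key=lambda item: cosine_similarity(query_vec, item[1]),
        reverse=True,
    )
    return [text for text, _ in ranked[:k]]

# Toy "embeddings" invented for this example
toy_index = [
    ("herbal remedies", [0.9, 0.1, 0.0]),
    ("star charts", [0.0, 0.2, 0.9]),
    ("healing plants", [0.8, 0.3, 0.1]),
]
toy_query = [1.0, 0.2, 0.0]

nearest = top_k(toy_query, toy_index, k=2)
print(nearest)  # ['herbal remedies', 'healing plants']
```

A larger `k` gives the model more context to draw on, at the cost of a longer prompt and a greater chance of including loosely related chunks.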

## Implement a chain
### Chain together multiple calls in a logical sequence

In [46]:
from langchain import hub

#pull a ready-made RAG prompt template from the LangChain Hub
prompt = hub.pull("rlm/rag-prompt")

In [47]:
prompt

ChatPromptTemplate(input_variables=['context', 'question'], input_types={}, partial_variables={}, metadata={'lc_hub_owner': 'rlm', 'lc_hub_repo': 'rag-prompt', 'lc_hub_commit_hash': '50442af133e61576e74536c6556cefe1fac147cad032f4377b60c436e6cdcb6e'}, messages=[HumanMessagePromptTemplate(prompt=PromptTemplate(input_variables=['context', 'question'], input_types={}, partial_variables={}, template="You are an assistant for question-answering tasks. Use the following pieces of retrieved context to answer the question. If you don't know the answer, just say that you don't know. Use three sentences maximum and keep the answer concise.\nQuestion: {question} \nContext: {context} \nAnswer:"), additional_kwargs={})])
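
The template above has two placeholders, `{question}` and `{context}`. As a rough sketch of what the prompt step does, the snippet below fills the same template text using plain `str.format`. `RAG_TEMPLATE` and `fill_prompt` are hypothetical names, and the context string is a placeholder rather than real retrieved text.

```python
# Template text copied from the hub prompt printed above
RAG_TEMPLATE = (
    "You are an assistant for question-answering tasks. "
    "Use the following pieces of retrieved context to answer the question. "
    "If you don't know the answer, just say that you don't know. "
    "Use three sentences maximum and keep the answer concise.\n"
    "Question: {question} \nContext: {context} \nAnswer:"
)

def fill_prompt(question, context):
    """Substitute the question and the retrieved context into the template."""
    return RAG_TEMPLATE.format(question=question, context=context)

example = fill_prompt(
    "What is Aetherfloris Ventus?",
    "(retrieved chunks would go here)",
)
print(example)
```

The instruction to admit not knowing the answer is what discourages the model from guessing when the retrieved context does not cover the question.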

In [50]:
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnablePassthrough

def format_docs(docs):
    return "\n\n".join(doc.page_content for doc in docs)

#combine multiple steps in a single chain
rag_chain = (
    {"context": retriever | format_docs, "question": RunnablePassthrough()}
    | prompt
    | llm
    | StrOutputParser() #convert the chat message to a string
)
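
The pipe syntax can be read as ordinary function composition. The sketch below is a plain-Python analogue of the chain, not LangChain code: `fake_retriever` and `fake_llm` are hypothetical stand-ins that return canned values so the data flow can be followed without an API key.

```python
def fake_retriever(question):
    """Hypothetical stand-in for the retriever: returns canned chunks."""
    return ["chunk one about herbs", "chunk two about diagrams"]

def join_chunks(chunks):
    """Same idea as format_docs, applied to plain strings."""
    return "\n\n".join(chunks)

def build_prompt(inputs):
    """Stand-in for the prompt step: fills a shortened template."""
    return "Question: {question}\nContext: {context}\nAnswer:".format(**inputs)

def fake_llm(prompt_text):
    """Hypothetical stand-in for the model: reports what it received."""
    return {"content": "received {} characters".format(len(prompt_text))}

def parse_output(message):
    """Stand-in for StrOutputParser: extracts the text from the message."""
    return message["content"]

def run_chain(question):
    # Step 1: fan the question out into the two prompt inputs
    inputs = {
        "context": join_chunks(fake_retriever(question)),
        "question": question,  # passed through unchanged
    }
    # Steps 2 to 4: prompt, then model, then string
    return parse_output(fake_llm(build_prompt(inputs)))

answer = run_chain("What is Aetherfloris Ventus?")
print(answer)
```

The dictionary in step 1 plays the role of the first element of `rag_chain`: one input is sent both through the retriever and straight through to the prompt.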

## Send LLM's response to the user

In [51]:
for chunk in rag_chain.stream("What are the medicinal insights from the Voynich manuscript?"):
    print(chunk, end="", flush=True)

The Voynich Manuscript contains detailed anatomical diagrams of mythical beings, possibly conveying ancient medical knowledge intertwined with fantasy. The manuscript may serve as a guide for medicinal and alchemical uses, with the organs of mythical creatures believed to possess magical properties. The Herbal Remedies section showcases diverse herbs illustrated with detailed annotations on their healing properties and potential medicinal uses.

In [52]:
for chunk in rag_chain.stream("What is Aetherfloris Ventus?"):
    print(chunk, end="", flush=True)

Aetherfloris Ventus is a celestial flora with petals lighter than air, appearing to float freely. Its nearly invisible stem dances with the breeze, leading the petals in a delicate ballet. The essence of Aetherfloris Ventus, captured in rare vials, is said to bestow the gift of lightness upon those who partake.

In [53]:
for chunk in rag_chain.stream("What's the most important part of the Voynich manuscript?"):
    print(chunk, end="", flush=True)

The most important part of the Voynich manuscript is the detailed anatomical diagrams of mythical beings, which possibly served medicinal or alchemical purposes and include annotations explaining the function of each organ and system. These diagrams offer insights into ancient medical knowledge intertwined with fantasy, showcasing the manuscript creator's meticulous attention to detail and vivid imagination. The fusion of plant and animal features in the manuscript may carry deeper symbolic or mythological meanings, reflecting ancient beliefs about the unity and connectivity of all life forms.