<a href="https://colab.research.google.com/github/j4jefferson/dataScienceColabs/blob/main/RAG_LAB_1.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

#### <font color=FF595E>Installing packages</font>



In [None]:
! pip install langchain_community tiktoken langchain-openai langchainhub chromadb langchain pypdf rapidocr-onnxruntime streamlit unstructured pdf2image pdfminer.six pikepdf pillow_heif langchain_experimental




#### <font color=FF595E>OpenAI API</font>


In [None]:
#OpenAI API key
from google.colab import userdata
import os
os.environ["OPENAI_API_KEY"] = userdata.get('OPEN_AI_KEY')

In [None]:
#Setup LangSmith to trace development
from langsmith import Client
os.environ["LANGCHAIN_PROJECT"] = 'RAG_LAB'
os.environ["LANGCHAIN_ENDPOINT"] = "https://api.smith.langchain.com"
os.environ["LANGCHAIN_API_KEY"] = userdata.get('LANGSMITH')
os.environ["LANGCHAIN_TRACING_V2"] = "true"

## <font color=FF595E>Creating ChatBot</font>


#### Define models

In [None]:
GPT4 = 'gpt-4-0125-preview'
GPT3 = 'gpt-3.5-turbo-0125'

#### Simple ChatBot, no memory

In [None]:
#Import ChatOpenAI class
from langchain_openai import ChatOpenAI


In [None]:
#Define the LLM. Specify model
Chat = ChatOpenAI(model = GPT4)

In [None]:
# Invoke the chat with simple question to test it out
Chat.invoke('What is your knowledge cutoff date?')

AIMessage(content='My knowledge is up to date until April 2023.', response_metadata={'finish_reason': 'stop', 'logprobs': None})

#### Adding memory and memory management

In [None]:
# Import the ChatMessageHistory class, which will store our chat history.
# Import the chat prompt template and message placeholder classes
from langchain.memory import ChatMessageHistory
from langchain.prompts import ChatPromptTemplate, MessagesPlaceholder

In [None]:
# Initialize a new ChatMessageHistory object
chat_history = ChatMessageHistory()

In [None]:
# Add a user message to the chat history
chat_history.add_user_message("What day ChatGPT was launched")

In [None]:
# Add an AI response message to the chat history
chat_history.add_ai_message("ChatGPT was launched at November 30, 2022")


In [None]:
# Access the messages property of the chat_history object
chat_history.messages

[HumanMessage(content='What day ChatGPT was launched'),
 AIMessage(content='ChatGPT was launched at November 30, 2022')]

In [None]:
# Add another user message to the chat history
chat_history.add_user_message("Was it a successful launch?")

In [None]:
# Create a ChatPromptTemplate using messages
prompt = ChatPromptTemplate.from_messages(
    [
        # Define a system message as a tuple
        (
            "system",
            "You are a helpful assistant. Answer all questions to the best of your ability.",
        ),
        # Add a placeholder for the chat messages
        MessagesPlaceholder(variable_name="messages"),
    ]
)

In [None]:
# Create a simple Chain by passing prompt to LLM
Chain = prompt | Chat

In [None]:
#Invoking simple chain from messages
Chain.invoke({"messages": chat_history.messages})

AIMessage(content="Yes, the launch of ChatGPT by OpenAI in November 2022 was highly successful. It quickly gained widespread attention for its ability to generate coherent and contextually relevant text based on the prompts given to it. Users were impressed by its capabilities in generating human-like text responses, making it useful for a wide range of applications such as conversation simulation, content creation, and more. The launch led to significant media coverage and public discussion, further increasing its popularity and the awareness of AI's potential in natural language processing.", response_metadata={'finish_reason': 'stop', 'logprobs': None})
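
To make the role of the history object concrete, here is a minimal sketch in plain Python of what a message history does conceptually: it keeps an ordered list of role-tagged messages, and the whole list is sent to the model on each call. `SimpleHistory` and its methods are hypothetical stand-ins written for this illustration, not LangChain classes.

```python
from dataclasses import dataclass, field


@dataclass
class SimpleHistory:
    """Hypothetical stand-in for a chat history: an ordered list of (role, text) pairs."""

    messages: list = field(default_factory=list)

    def add_user_message(self, text: str) -> None:
        self.messages.append(("human", text))

    def add_ai_message(self, text: str) -> None:
        self.messages.append(("ai", text))


history = SimpleHistory()
history.add_user_message("What day ChatGPT was launched")
history.add_ai_message("ChatGPT was launched at November 30, 2022")
history.add_user_message("Was it a successful launch?")

# The model receives every stored turn, which is how "it" in the last question can be resolved
for role, text in history.messages:
    print(f"{role}: {text}")
```

Because the full list is resent every time, long conversations grow the prompt with each turn.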

#### Creating a loop to run Chat with history, printable user inputs and chat outputs

In [None]:
#Import
from langchain_core.runnables.history import RunnableWithMessageHistory
#Define stop words for our chatbot
stop_words = ["exit", "quit", "stop"]

In [None]:
#Define chat history
chat_history = ChatMessageHistory()

#Define LLM

Chat = ChatOpenAI(model = GPT4)

# Create a ChatPromptTemplate using messages
prompt = ChatPromptTemplate.from_messages(
    [
        # Define a system message as a tuple
        (
            "system",
            "You are a helpful assistant. Answer all questions to the best of your ability.",
        ),
        # Placeholder that the history wrapper fills with earlier turns
        MessagesPlaceholder(variable_name="chat_history"),
        # The current user question
        ("human", "{input}"),
    ]
)

#Define the chain
Chat_chain = prompt | Chat

#Use RunnableWithMessageHistory as a wrapper to manage message history
Chain_with_message_history = RunnableWithMessageHistory(
    Chat_chain,
    #define access to chat history (one shared history, so session_id is ignored)
    lambda session_id : chat_history,
    input_messages_key="input",
    history_messages_key="chat_history"
)

# Perform chat turns
print("Starting the chat...")
while True:
    question = input("User: ")

    # Check if the user input matches a stop word
    if question.strip().lower() in stop_words:
        print("Exiting the chat...")
        break

    #Generate AI response. The wrapper loads the history into the prompt and
    #saves the new question and answer afterwards, so calling add_user_message /
    #add_ai_message here as well would store every turn twice
    ai_response = Chain_with_message_history.invoke(
        {"input": question},
        {"configurable": {"session_id": "default"}},
    )

    #Display AI answer
    print(f"AI: {ai_response.content}")

Starting the chat...
User: What is the biggest country in Central America?
AI: The biggest country in Central America by land area is Nicaragua. It covers an area of about 130,373 square kilometers (50,337 square miles). This makes Nicaragua the largest country in the region, followed by Honduras.
User: What is its capital? 
AI: The capital of Nicaragua, the largest country in Central America, is Managua.
User: What is its population?
AI: As of my last update in April 2023, the estimated population of Managua, the capital of Nicaragua, was about 1.5 million in the city proper, with the metropolitan area having a larger population. However, it's important to note that population figures can vary depending on the source and the specific year of the estimate. For the most current and precise population data, it's recommended to consult the latest statistics from reliable sources such as the National Institute of Information Development (INIDE) in Nicaragua or international demographic dat
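
The control flow of the loop can be exercised without an API key or keyboard input. The sketch below replaces `input()` with a scripted list of questions and the LLM with a hypothetical `echo_model` function that only reports how many messages it was given; both are assumptions made for illustration, not part of the notebook.

```python
STOP_WORDS = {"exit", "quit", "stop"}


def echo_model(messages):
    """Hypothetical stand-in for the LLM: reports how many messages it was given."""
    return f"I can see {len(messages)} message(s) so far."


def run_chat(user_inputs, model=echo_model):
    """Run chat turns over scripted inputs; return the transcript as (role, text) pairs."""
    history = []
    for question in user_inputs:
        # Normalise before comparing so that "Stop " or "EXIT" also end the chat
        if question.strip().lower() in STOP_WORDS:
            break
        history.append(("human", question))
        answer = model(history)
        history.append(("ai", answer))
    return history


transcript = run_chat(["Hello", "What did I just say?", "stop", "never reached"])
for role, text in transcript:
    print(f"{role}: {text}")
```

Each question and each answer is appended exactly once per turn, and nothing after the stop word is processed.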

## <font color=FF595E>Building RAG Chatbot</font>

#### <font color=FF595E>Load documents</font>

##### Grab the "Cognition is All You Need" paper from [arxiv.org](https://arxiv.org/abs/2403.02164)

In [None]:
#Import pdf loader.
from langchain_community.document_loaders import UnstructuredPDFLoader

In [None]:
#Define loader
loader_pdf = UnstructuredPDFLoader("/content/Cognition is All You Need - Article.pdf")

In [None]:
#Load an article
article_pdf = loader_pdf.load()

[nltk_data] Downloading package punkt to /root/nltk_data...
[nltk_data]   Unzipping tokenizers/punkt.zip.
[nltk_data] Downloading package averaged_perceptron_tagger to
[nltk_data]     /root/nltk_data...
[nltk_data]   Unzipping taggers/averaged_perceptron_tagger.zip.


In [None]:
#Print doc to check it out
print(article_pdf)

[Document(page_content='Cognition is All You Need The Next Layer of AI Above Large Language Models\n\nPre-Publication Position Paper Draft 1.1 March 4, 2024, For Comments\n\nNova Spivack1, Sam Douglas1, Michelle Crames1, Tim Connors1\n\n1 Mindcorp, Inc (www.mindcorp.ai) contact@mindcorp.ai www.mindcorp.ai www.linkedin.com/company/mindcorp-ai twitter.com/mindcorpai\n\nContents Abstract...................................................................................................................................2 Introduction..................................................................................................................................2 Related Research..................................................................................................................... 5 Defining Conversational AI...................................................................................................... 8 Intelligence Versus Cognition........................................

#### <font color=FF595E>Split the document</font>

In [None]:
# Import text splitter
from langchain.text_splitter import RecursiveCharacterTextSplitter

# Create an instance of RecursiveCharacterTextSplitter with custom chunk size and overlap
chunk_size = 750  # Adjust the chunk size as needed
chunk_overlap = 0  # Set the overlap between chunks

#Initiate splitter with desired parameters
splitter = RecursiveCharacterTextSplitter(chunk_size=chunk_size, chunk_overlap=chunk_overlap)

In [None]:
# Split the document into chunks using the RecursiveCharacterTextSplitter
splits = splitter.split_documents(article_pdf)

In [None]:
# Print the number of splits in the doc
print(len(splits))

273
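
To see how `chunk_size` and `chunk_overlap` interact, here is a deliberately simplified sketch that cuts text into fixed-size character windows. It is not the algorithm `RecursiveCharacterTextSplitter` uses (that one first tries to split on separators such as paragraph and line breaks); `chunk_text` is a hypothetical helper written only for this illustration.

```python
def chunk_text(text, chunk_size, chunk_overlap=0):
    """Split text into fixed-size windows; consecutive windows share chunk_overlap characters."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")
    # Each new window starts this many characters after the previous one
    step = chunk_size - chunk_overlap
    return [text[start:start + chunk_size] for start in range(0, len(text), step)]


sample = "abcdefghijklmnopqrstuvwxyz"
no_overlap = chunk_text(sample, chunk_size=10, chunk_overlap=0)
with_overlap = chunk_text(sample, chunk_size=10, chunk_overlap=4)
print(no_overlap)
print(with_overlap)
```

With an overlap of 0, as in the cell above, a sentence that straddles a chunk boundary is cut in two; a non-zero overlap repeats the tail of one chunk at the head of the next, at the cost of more chunks to embed.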


In [None]:
# Print each split and a separator for readability
for split in splits:
    print(split)
    print("---")

page_content='Cognition is All You Need The Next Layer of AI Above Large Language Models\n\nPre-Publication Position Paper Draft 1.1 March 4, 2024, For Comments\n\nNova Spivack1, Sam Douglas1, Michelle Crames1, Tim Connors1\n\n1 Mindcorp, Inc (www.mindcorp.ai) contact@mindcorp.ai www.mindcorp.ai www.linkedin.com/company/mindcorp-ai twitter.com/mindcorpai' metadata={'source': '/content/Cognition is All You Need - Article.pdf'}
---
page_content='Contents Abstract...................................................................................................................................2 Introduction..................................................................................................................................2 Related Research..................................................................................................................... 5 Defining Conversational AI................................................................................................

#### <font color=FF595E>Create embeddings</font>

In [None]:
#Import vectorstore database and embeddings model
from langchain_community.vectorstores import Chroma
from langchain_openai import OpenAIEmbeddings

# Embeddings model
embeddings_model = OpenAIEmbeddings()

In [None]:
#Define vector DB. Run this line of code only once.
#If you accidentally run it more than once, delete the DB with the cell below and recreate it

vector_db = Chroma.from_documents(documents=splits, embedding=embeddings_model)

In [None]:
#Code to delete db. (if needed)

# Delete the collection
# vector_db.delete_collection()
# print("Collection deleted successfully.")


#### <font color=FF595E>Define Retriever</font>

In [None]:
retriever = vector_db.as_retriever()

#### <font color=FF595E>Test retriever</font>

In [None]:
#Define question
question = 'What are functional requirements for cognitive AI'

#Fetch 3 documents from vector store related to question
vector_db.similarity_search_with_score(question, k=3)

[(Document(page_content="Figure 6. Conversational Versus Cognitive AI Quadrants.\n\nCognitive AI Functional Architecture\n\nCognitive AI represents a paradigm shift, moving beyond the confines of Conversational AI's reliance on probabilistic reasoning simulations to actual programmatic reasoning. This shift is embodied in a dual-layer architecture that elevates reasoning, self-improvement, and adaptability to second-order intelligence, fundamentally distinguishing Cognitive AI from its predecessors. Below we will discuss the functional architecture and formal requirements for Cognitive AI systems.\n\nFunctional Requirements for Cognitive AI\n\nTo qualify as Cognitive AI, a system must be architected to meet the following functional criteria:", metadata={'source': '/content/Cognition is All You Need - Article.pdf'}),
  0.1989898979663849),
 (Document(page_content='Reasoning....................................................................................... 13 Defining Cognitive AI...
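
The score returned with each document is a distance, so a lower value means a closer match. The sketch below shows the idea on toy two-dimensional vectors, assuming squared Euclidean distance as the metric (the metric is configurable in Chroma); `squared_l2`, `top_k`, and the toy vectors are hypothetical and stand in for real embeddings.

```python
def squared_l2(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def top_k(query_vec, store, k=3):
    """Return the k (text, distance) pairs closest to query_vec, nearest first."""
    scored = [(text, squared_l2(query_vec, vec)) for text, vec in store]
    return sorted(scored, key=lambda pair: pair[1])[:k]


# Toy 2-D "embeddings"; real embedding vectors have hundreds or thousands of dimensions
toy_store = [
    ("functional requirements", [1.0, 0.0]),
    ("table of contents", [0.0, 1.0]),
    ("cognitive architecture", [0.8, 0.2]),
]
results = top_k([1.0, 0.1], toy_store, k=2)
for text, distance in results:
    print(f"{distance:.2f}  {text}")
```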

#### <font color=FF595E>Build chain that will answer questions over defined docs</font>

In [None]:
from langchain.prompts import ChatPromptTemplate
# Prompt
template = """Answer the question based on the following context:
{context}

Question: {question}
"""

#Define rag_prompt from template
rag_prompt = ChatPromptTemplate.from_template(template)

#Print the prompt to check that everything is ok
rag_prompt

ChatPromptTemplate(input_variables=['context', 'question'], messages=[HumanMessagePromptTemplate(prompt=PromptTemplate(input_variables=['context', 'question'], template='Answer the question based on the following context:\n{context}\n\nQuestion: {question}\n'))])

In [None]:
#Define LLM
RAG_llm = ChatOpenAI(model=GPT4)

In [None]:
#Define Chain
RAG_chain = rag_prompt | RAG_llm

In [None]:
#Assign docs
docs = vector_db.similarity_search(question, k=3)

In [None]:
#Chain to answer question based on defined docs
RAG_chain.invoke({"context":docs,"question": question})

AIMessage(content="The functional requirements for Cognitive AI, as outlined in the provided context, are not fully enumerated in the excerpts given. However, from the information provided, we can infer some of the key elements that are considered essential for a system to qualify as Cognitive AI:\n\n1. **Reasoning:** Cognitive AI must be capable of programmatic reasoning, moving beyond probabilistic reasoning simulations that are typical in Conversational AI. This implies a more advanced form of reasoning that can deal with complex, abstract concepts and execute conditional logic workflows.\n\n2. **Self-Improvement:** The architecture of Cognitive AI includes the capability for recursive self-improvement. This means the system is designed to continuously refine and enhance its own cognitive functions, including planning and reasoning, without external intervention.\n\n3. **Adaptability:** Cognitive AI systems must be adaptable, able to handle and adjust to new information or changes i
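
In the cell above, `docs` is a list of document objects, so the text substituted into `{context}` is the list's Python representation, including the `page_content=` and `metadata=` wrappers. A common alternative is to join only the text of each document before filling the prompt. The sketch below illustrates this with a hypothetical `Doc` dataclass and `format_docs` helper; neither is part of the notebook.

```python
from dataclasses import dataclass, field


@dataclass
class Doc:
    """Hypothetical stand-in for a retrieved document."""

    page_content: str
    metadata: dict = field(default_factory=dict)


def format_docs(docs, separator="\n\n"):
    """Join the text of each document, dropping metadata and object wrappers."""
    return separator.join(doc.page_content for doc in docs)


retrieved = [
    Doc("Functional Requirements for Cognitive AI", {"source": "article.pdf"}),
    Doc("Cognitive AI Functional Architecture", {"source": "article.pdf"}),
]
context_text = format_docs(retrieved)
print(context_text)
```

Passing the joined string as `context` keeps the prompt shorter and free of escape sequences such as `\n\n` appearing literally.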

#### <font color=FF595E>Composing the Retrieval-Augmented Generation Chain with dynamic retrieval</font>

In [None]:
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnablePassthrough


# Create the Retrieval-Augmented Generation (RAG) chain with dynamic retrieval
rag_chain = (
    # Define the input variables for the chain
    {"context": retriever, "question": RunnablePassthrough()}
    # Pipe the input through the RAG prompt template
    | rag_prompt
    # Pass the formatted prompt to the language model (LLM)
    | RAG_llm
    # Parse the LLM's output using the StrOutputParser
    | StrOutputParser()
)
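
The dictionary at the head of the chain sends the same input to each of its values and collects the results under the matching keys, which then fill the prompt variables. The plain-Python sketch below mimics that data flow only; `retriever_stub`, `passthrough`, `fan_out`, and `fill_prompt` are hypothetical names and do not reproduce how LangChain implements it.

```python
def retriever_stub(question):
    """Hypothetical retriever: returns canned passages regardless of the question."""
    return ["Passage A about the Cognitive Layer.", "Passage B about the Conversational Layer."]


def passthrough(value):
    """Return the input unchanged, mirroring the role of RunnablePassthrough."""
    return value


def fan_out(branches, value):
    """Call every branch with the same input and collect the results by key."""
    return {key: branch(value) for key, branch in branches.items()}


def fill_prompt(variables):
    """Fill the same template text that rag_prompt uses."""
    return (
        "Answer the question based on the following context:\n"
        f"{variables['context']}\n\nQuestion: {variables['question']}\n"
    )


inputs = fan_out({"context": retriever_stub, "question": passthrough}, "What is Cognitive AI?")
prompt_text = fill_prompt(inputs)
print(prompt_text)
```

The question string therefore plays two roles: it is the retrieval query and, unchanged, the question shown to the model.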

In [None]:
#Invoke the chain

rag_chain.invoke("What lies at the core of Cognitive AI's functional architecture?")


"At the core of Cognitive AI's functional architecture is an intelligence stack comprising two critical layers: a Cognitive Layer and a Conversational Layer."

#### <font color=FF595E>Build RAG BOT chain prompt template</font>

In [None]:
#Import required classes for prompt template

from langchain.prompts import PromptTemplate
from langchain.schema import HumanMessage

# Define the human prompt template
# Here we make sure that the context is passed to the LLM before the human adds the question.
human_prompt = """Answer the question based on the following context: {context}

"""

In [None]:
# Initialize the chat history
chat_history = ChatMessageHistory()

In [None]:
# Define the question
question = "What lies at the core of Cognitive AI's functional architecture?"

In [None]:
# Retrieve relevant context based on the question
context = vector_db.similarity_search(question, k=3)

In [None]:
# Create a PromptTemplate from the human prompt
prompt_template = PromptTemplate.from_template(human_prompt)

In [None]:
# Format the prompt with the retrieved context
formatted_prompt = prompt_template.format(context=context)

In [None]:
# Create a HumanMessage with the formatted prompt
formatted_human_message = [HumanMessage(content=formatted_prompt)]

In [None]:
# Define the RAG prompt template
rag_bot_prompt = ChatPromptTemplate.from_messages(
    [
        ("system", "You are a helpful assistant. Answer all questions over the documents to the best of your ability."),
        # Use human_prompt as a template, so that {context} is filled with freshly
        # retrieved documents on every turn instead of being fixed to the
        # formatted_human_message built above for a single question
        ("human", human_prompt),
        # Earlier turns, filled in by the history wrapper
        MessagesPlaceholder(variable_name="chat_history"),
        # The current user question
        ("human", "{input}"),
    ]
)


In [None]:
# Create the RAG LLM chain by piping the RAG prompt to the LLM
rag_bot_chain = rag_bot_prompt | RAG_llm

In [None]:
#Define stop words for our chatbot
stop_words = ["exit", "quit", "stop"]

chat_history = ChatMessageHistory()


rag_chain_with_message_history = RunnableWithMessageHistory(
    rag_bot_chain,
    lambda session_id : chat_history,
    input_messages_key="input",
    history_messages_key="chat_history"
)

# Perform chat turns
print("Starting the chat...")
while True:
    question = input("User: ")

    # Check if the user input matches a stop word
    if question.strip().lower() in stop_words:
        print("Exiting the chat...")
        break

    # Retrieve relevant context based on the question
    context = vector_db.similarity_search(question, k=3)

    #Generate AI response. The retrieved context is passed in on every turn;
    #the wrapper loads the history and saves the new question and answer itself
    ai_response = rag_chain_with_message_history.invoke(
        {"input": question, "context": context},
        {"configurable": {"session_id": "default"}},
    )

    #Display AI answer
    print(f"AI: {ai_response.content}")

Starting the chat...
AI: Alright, imagine your brain is like a super cool toy that can think and solve puzzles. This toy has two special parts. The first part is really good at playing with words, like when you tell stories or chat with your friends. The second part is like a superhero that can think really hard about problems, make smart guesses, and learn new things from puzzles and games you play. 

Neuro-symbolic reasoning is when both parts work together to help the toy (which is like a computer) understand and think about things almost like a human does. So, if you ask it a tricky question or give it a tough puzzle, it uses the word-playing part to understand the question and the superhero part to figure out the answer. It's like having a buddy who's really good at both telling stories and solving riddles!
User: stop
Exiting the chat...
