 # Install Dependencies

In [67]:
# Install libraries
!pip install openai 
!pip install langchain-openai
!pip install langchain
!pip install langchain-community
!pip install qdrant-client

In [3]:
import os 
from langchain_openai import ChatOpenAI
from openai import OpenAI

# Without using RAG

In [4]:
model = ChatOpenAI(
    api_key="",
    model = "gpt-3.5-turbo-0125", 
    temperature = 0.7, 
    max_tokens = 512, 
    max_retries = 2, 
)

In [5]:
from langchain.schema import SystemMessage, HumanMessage,AIMessage

text = [
    SystemMessage(content="You are an AI assistant."),
    HumanMessage(content="Explain AI technology"),
    AIMessage(content="Model number of parameters"),
]

In [None]:
text.append(HumanMessage(content="What is so special about Mistral 7B?")) 

In [8]:
response = model.invoke(text)
print(response.content)

AI, or artificial intelligence, is a branch of computer science that involves creating computer systems capable of performing tasks that typically require human intelligence, such as visual perception, speech recognition, decision-making, and language translation. AI technologies are designed to simulate human cognitive functions, learn from data, and adapt to new information or stimuli.

There are several types of AI technologies, including machine learning, deep learning, natural language processing, computer vision, and robotic process automation. These technologies enable computers to analyze large amounts of data, recognize patterns, make decisions, and carry out tasks autonomously.

Overall, AI technology aims to improve efficiency, accuracy, and automation in various industries and applications, such as healthcare, finance, transportation, cybersecurity, and customer service. It has the potential to revolutionize many aspects of our daily lives and drive innovation in numerous f

# Adding Context 

In [13]:
# adding context with the question
information_about_Mistral = [
    "Unlike older models, Mistral 7B can grasp intricate details and context in language. This allows it to interpret and respond with more human-like fluency and coherence.",
    "Long attention span: Mistral 7B can consider a whopping 8,192 tokens of text, which is significantly longer than most models. This enables it to analyze broader contexts and generate more relevant responses.",
    "Efficient memory usage:  To handle such a long context without ballooning memory usage, Mistral 7B utilizes a unique rolling buffer cache. This cache only retains a set amount of past information, like a carousel with limited seats, ensuring efficient memory management", 
    "Specialized attention: Traditional models allow every word to attend to every other preceding word. Mistral 7B implements a sliding window attention where each word focuses on a specific window of preceding words. This streamlines processing without sacrificing accuracy.",
    "Fine-tuning capabilities: The base Mistral 7B model excels at understanding language. Additionally, it can be further customized for specific tasks by fine-tuning it on relevant datasets. For instance, Mistral 7B Instruct is fine-tuned for following instructions and achieves impressive results.", 
    "Safety features: While powerful, Mistral 7B incorporates safety features like prompting and content moderation. Prompting allows users to guide the model's output towards a desired outcome, while moderation helps prevent potentially harmful content generation.", 
    "Overall, Mistral 7B's ability to understand complex language, handle long contexts efficiently, and be fine-tuned for specific tasks makes it a significant advancement in NLP. Its safety features further ensure responsible use of this powerful technology."
]

knowledge = "\n".join(information_about_Mistral)

In [14]:
len(source_knowledge)

1020

In [15]:
text = [
    SystemMessage(content="You are an AI assistant."),
    HumanMessage(content="Explain AI technology"),
    AIMessage(content="Model number of parameters"),
]

In [16]:
question = "What is so special about Mistral 7B?"

prompt = f"""Using the contexts below to answer the question.

Contexts:
{knowledge}

Question: {question}"""

In [19]:
full_prompt = HumanMessage(
    content=prompt
)

text.append(full_prompt)

response = model.invoke(text)

In [20]:
# as shown in the response above, the sytem provider model is acble of definiing the feature of Mistral 7B better 
print(response.content)

Mistral 7B stands out due to its ability to grasp intricate details and context in language, enabling it to respond with human-like fluency and coherence. It has a long attention span, capable of considering 8,192 tokens of text, allowing it to analyze broader contexts and generate more relevant responses. The model efficiently manages memory usage through a unique rolling buffer cache, ensuring optimal performance. Mistral 7B's specialized attention mechanism uses a sliding window approach, enhancing processing efficiency without compromising accuracy. Additionally, its fine-tuning capabilities enable customization for specific tasks, enhancing performance in various applications. The model also incorporates safety features like prompting and content moderation to ensure responsible and safe use of its powerful technology. Overall, Mistral 7B's advanced language understanding, efficient handling of long contexts, and adaptability for specific tasks make it a significant advancement in

# Implementing Retrieval Augmented Generation (RAG)

# Load Dataset

In [28]:
# This dataset consists of selected portions from the Mistral 7B research paper.
from datasets import load_dataset

dataset = load_dataset("infoslack/mistral-7b-arxiv-paper-chunked", split="train")

print(dataset[0])

{'doi': '2310.06825', 'chunk-id': '0', 'chunk': 'Mistral 7B\nAlbert Q. Jiang, Alexandre Sablayrolles, Arthur Mensch, Chris Bamford,\nDevendra Singh Chaplot, Diego de las Casas, Florian Bressand, Gianna Lengyel,\nGuillaume Lample, Lucile Saulnier, Lélio Renard Lavaud, Marie-Anne Lachaux,\nPierre Stock, Teven Le Scao, Thibaut Lavril, Thomas Wang, Timothée Lacroix,\nWilliam El Sayed\nAbstract\nWe introduce Mistral 7B, a 7–billion-parameter language model engineered for\nsuperior performance and efficiency. Mistral 7B outperforms the best open 13B\nmodel (Llama 2) across all evaluated benchmarks, and the best released 34B\nmodel (Llama 1) in reasoning, mathematics, and code generation. Our model\nleverages grouped-query attention (GQA) for faster inference, coupled with sliding\nwindow attention (SWA) to effectively handle sequences of arbitrary length with a\nreduced inference cost. We also provide a model fine-tuned to follow instructions,\nMistral 7B – Instruct, that surpasses Llama 2 1

In [34]:
# convert it into a dataframe to more readable
data = dataset.to_pandas()

In [49]:
chunks = data[["chunk", "source"]]
chunks.head(4)

Unnamed: 0,chunk,source
0,"Mistral 7B\nAlbert Q. Jiang, Alexandre Sablayr...",http://arxiv.org/pdf/2310.06825
1,automated benchmarks. Our models are released ...,http://arxiv.org/pdf/2310.06825
2,GQA significantly accelerates the inference sp...,http://arxiv.org/pdf/2310.06825
3,Mistral 7B takes a significant step in balanci...,http://arxiv.org/pdf/2310.06825


In [50]:
from langchain_community.document_loaders import DataFrameLoader

loader = DataFrameLoader(chunks, page_content_column="chunk")
documents = loader.load()

In [51]:
len(documents)

25

In [56]:
print(documents[0].metadata)
print(documents[0].page_content)

{'source': 'http://arxiv.org/pdf/2310.06825'}
Mistral 7B
Albert Q. Jiang, Alexandre Sablayrolles, Arthur Mensch, Chris Bamford,
Devendra Singh Chaplot, Diego de las Casas, Florian Bressand, Gianna Lengyel,
Guillaume Lample, Lucile Saulnier, Lélio Renard Lavaud, Marie-Anne Lachaux,
Pierre Stock, Teven Le Scao, Thibaut Lavril, Thomas Wang, Timothée Lacroix,
William El Sayed
Abstract
We introduce Mistral 7B, a 7–billion-parameter language model engineered for
superior performance and efficiency. Mistral 7B outperforms the best open 13B
model (Llama 2) across all evaluated benchmarks, and the best released 34B
model (Llama 1) in reasoning, mathematics, and code generation. Our model
leverages grouped-query attention (GQA) for faster inference, coupled with sliding
window attention (SWA) to effectively handle sequences of arbitrary length with a
reduced inference cost. We also provide a model fine-tuned to follow instructions,
Mistral 7B – Instruct, that surpasses Llama 2 13B – chat model b

In [59]:
from langchain_community.vectorstores import Qdrant
from langchain_openai import OpenAIEmbeddings
embeddings = OpenAIEmbeddings(model="text-embedding-3-small", api_key="",)

qdrant = Qdrant.from_documents(
    documents=documents,
    embedding=embeddings,
    url="",
    collection_name="chatbot",
    api_key= ""
)

In [60]:
# Finding the chunks that are similar to the question.
query = "What is so special about Mistral 7B?"
qdrant.similarity_search(query, k=3)

[Document(metadata={'source': 'http://arxiv.org/pdf/2310.06825', '_id': '0c984efe-db82-47dc-b9f5-73b9d3ff6e6c', '_collection_name': 'chatbot'}, page_content='Mistral 7B\nAlbert Q. Jiang, Alexandre Sablayrolles, Arthur Mensch, Chris Bamford,\nDevendra Singh Chaplot, Diego de las Casas, Florian Bressand, Gianna Lengyel,\nGuillaume Lample, Lucile Saulnier, Lélio Renard Lavaud, Marie-Anne Lachaux,\nPierre Stock, Teven Le Scao, Thibaut Lavril, Thomas Wang, Timothée Lacroix,\nWilliam El Sayed\nAbstract\nWe introduce Mistral 7B, a 7–billion-parameter language model engineered for\nsuperior performance and efficiency. Mistral 7B outperforms the best open 13B\nmodel (Llama 2) across all evaluated benchmarks, and the best released 34B\nmodel (Llama 1) in reasoning, mathematics, and code generation. Our model\nleverages grouped-query attention (GQA) for faster inference, coupled with sliding\nwindow attention (SWA) to effectively handle sequences of arbitrary length with a\nreduced inference cost

In [61]:
def custom_prompt(query: str):
    results = qdrant.similarity_search(query, k=3)
    source_knowledge = "\n".join([x.page_content for x in results])
    augment_prompt = f"""Using the contexts below, answer the query:

    Contexts:
    {source_knowledge}

    Query: {query}"""
    return augment_prompt

In [62]:
print(custom_prompt(query))

Using the contexts below, answer the query:

    Contexts:
    Mistral 7B
Albert Q. Jiang, Alexandre Sablayrolles, Arthur Mensch, Chris Bamford,
Devendra Singh Chaplot, Diego de las Casas, Florian Bressand, Gianna Lengyel,
Guillaume Lample, Lucile Saulnier, Lélio Renard Lavaud, Marie-Anne Lachaux,
Pierre Stock, Teven Le Scao, Thibaut Lavril, Thomas Wang, Timothée Lacroix,
William El Sayed
Abstract
We introduce Mistral 7B, a 7–billion-parameter language model engineered for
superior performance and efficiency. Mistral 7B outperforms the best open 13B
model (Llama 2) across all evaluated benchmarks, and the best released 34B
model (Llama 1) in reasoning, mathematics, and code generation. Our model
leverages grouped-query attention (GQA) for faster inference, coupled with sliding
window attention (SWA) to effectively handle sequences of arbitrary length with a
reduced inference cost. We also provide a model fine-tuned to follow instructions,
Mistral 7B – Instruct, that surpasses Llama 2 1

In [63]:
text = [
    SystemMessage(content="You are an AI assistant."),
    HumanMessage(content="Explain AI technology"),
    AIMessage(content="Model number of parameters"),
]

In [66]:
prompt = HumanMessage(
    content=custom_prompt(query)
)

text.append(prompt)

reponse = model.invoke(text)

print(reponse.content)

Mistral 7B is a 7-billion-parameter language model that stands out for its superior performance and efficiency. It surpasses other models such as the best open 13B model (Llama 2) and the best released 34B model (Llama 1) in reasoning, mathematics, and code generation across various benchmarks. Mistral 7B leverages innovative attention mechanisms like grouped-query attention (GQA) for faster inference and sliding window attention (SWA) to handle sequences of arbitrary length efficiently with reduced inference costs. These attention mechanisms contribute to Mistral 7B's enhanced performance and efficiency. Additionally, Mistral 7B is designed for ease of fine-tuning across a wide range of tasks and is accompanied by a reference implementation for easy deployment on various platforms, making it adaptable and high-performing.
