## Retrieval-Augmented Generation (RAG) ##

Now we use LangChain to implement the same chat.

In [24]:
# !pip install -qU \
#     langchain==0.0.292 \
#     openai==0.28.0 \
#     datasets==2.10.1 \
#     pinecone-client==2.2.4 \
#     tiktoken==0.5.1

In [2]:
import os
from langchain.chat_models import ChatOpenAI

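# fail early with a clear message if the key is missing
if os.getenv("OPENAI_API_KEY") is None:
    raise RuntimeError("Set the OPENAI_API_KEY environment variable before running this cell")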
chat = ChatOpenAI(
    openai_api_key=os.environ["OPENAI_API_KEY"],
    model='gpt-3.5-turbo'
    # model='gpt-4'
)

In [2]:
from langchain.schema import (
    SystemMessage,
    HumanMessage,
    AIMessage
)

messages = [
    SystemMessage(content="You are a university computer science professor that aims to help students pass their assessments"),
    HumanMessage(content="""
Can you guide me through the process of writing a project proposal? A summary of the project requirement is as follows:

The project proposal requirements for this coursework assignment involve designing and developing a software project. The proposal should be no longer than 8,000 words and should be submitted as a single PDF document. The proposal should explore design decisions, consider the context of use, and identify the software development process. It should include a justification for the deliverable components of the software and the necessary research and iterative prototyping to complete them. The proposal should also define the timescale of work, including dependencies, milestones, and contingencies, with a clear breakdown of work and activities. A formal specification of the desired system should be included, along with user-acceptance criteria for testing. The scope of the project should be clearly defined, indicating what will and will not be delivered. Evidence of requirements elicitation involving project stakeholders should be provided, either through literature sources or empirical proof such as usability studies. A research summary should highlight the challenges of the chosen domain and the capabilities of similar tools. The approach to the project should be described, including motivations and reasoning, and the tasks required to complete the project should be identified. Early prototypes should demonstrate iterative design and development activities, highlighting strengths and weaknesses. Assumption testing and validation of designs should also be included. Finally, a critical evaluation of the concept, current state of the project, and proposed software project should be provided, with a clear and systematic rhetoric and overall evaluation of feasibility."""
                 ),
    # AIMessage(content="I'm great thank you. How can I help you?"),
    # HumanMessage(content="I'd like to understand string theory.")
]
res = chat(messages)
print(res.content)

Certainly! Writing a project proposal can be a complex task, but by breaking it down into smaller steps, you can approach it more effectively. Here's a step-by-step guide to help you through the process:

1. Understand the Requirements: Begin by thoroughly understanding the project requirements outlined in the summary. Pay attention to the specific components that need to be included in your proposal.

2. Define the Problem Statement: Clearly state the problem or need that your software project aims to address. This sets the context for your proposal and helps the reader understand the purpose and importance of the project.

3. Conduct Research: Gather relevant information about the chosen domain and similar tools or projects. Identify the challenges and existing capabilities in the field. This research will help you make informed decisions and demonstrate your understanding of the project's context.

4. Define the Scope: Clearly define the scope of your project by specifying what will

In [3]:
# add latest AI response to messages
messages.append(res)

# now create a new user prompt
prompt = HumanMessage(
    content="Can you generate a proposal template?"
)
# add to messages
messages.append(prompt)

# send to chat-gpt
res = chat(messages)

print(res.content)

Certainly! Here's a template for a software project proposal based on the requirements you provided:

[Your Name]
[Your Institution]
[Date]

[Title of the Proposal]

1. Introduction
   - Problem Statement: [Clearly state the problem or need that the software project aims to address]
   - Objectives: [List the objectives and goals of the project]

2. Research Summary
   - Domain Challenges: [Discuss the challenges and complexities of the chosen domain]
   - Existing Capabilities: [Highlight the capabilities of similar tools or projects]

3. Scope
   - Inclusions: [Clearly define what will be delivered as part of the software project]
   - Exclusions: [Specify what will not be included in the project scope]

4. Stakeholder Involvement
   - Requirements Elicitation: [Explain how you plan to involve project stakeholders, either through literature sources or empirical proof such as usability studies]

5. Design and Development Process
   - Software Development Methodology: [Describe the dev

## RAG ##

### Giving a summary as context ###
Recall that LLMs can hallucinate. Let's use RAG to tackle this problem. The simplest approach is to write a summary of Llama 2 and add it to the prompt as context. We first ask the question without any context, then ask it again with the summary, and see whether the answer improves.

In [4]:
# now create a new user prompt
messages = [HumanMessage(
    content="What is so special about Llama 2?"
)]

# send to OpenAI
res = chat(messages)

print(res.content)

Llama 2 refers to the second version of the Llama programming language. Llama is a functional programming language that is designed to be simple, yet expressive. It focuses on immutability, referential transparency, and composability.

Some of the special features of Llama 2 include:

1. Purely Functional: Llama 2 is a purely functional programming language, meaning that it treats computation as the evaluation of mathematical functions. This allows for easier reasoning about code and facilitates the use of techniques like lazy evaluation and function composition.

2. Immutability: Llama 2 promotes immutability, meaning that once a value is assigned, it cannot be changed. This helps in writing more robust and thread-safe code, as there are no concerns about unexpected side effects or concurrent modifications.

3. Type Inference: Llama 2 incorporates type inference, which means that the language can automatically deduce the types of expressions and variables. This reduces the need for ex

In [5]:
llmchain_information = ["""
What is LLAMA 2?
LLAMA 2 is an open-source language model that can be downloaded and run locally, with different versions available for download, including chat models and normal models.
How does LLAMA 2 perform compared to other models?
LLAMA 2 performs well in terms of sentiment scores and human evaluation results, outperforming open source chat models on most benchmarks.
What are the trade-offs between safety and helpfulness in LLAMA 2?
LLAMA 2 prioritizes both safety and helpfulness in its outputs, thanks to safety patterns and improvements such as a larger pre-training data set, grouped query attention, and reinforcement learning with human feedback.
What are the safety improvements in LLAMA 2?
LLAMA 2 incorporates safety improvements such as a larger pre-training data set, grouped query attention, and reinforcement learning with human feedback.
Can LLAMA 2 be used for startups?
Yes, the 13B model of LLAMA 2 is recommended for startups as it performs well on academic benchmarks.
"""
]

source_knowledge = "\n".join(llmchain_information)

query = "What's so special about LLama 2?"

augmented_prompt = f"""Using the contexts below delimited by tag <knowledge></knowledge>, answer the query delimited by tag <query></query>.

Contexts:

<knowledge>{source_knowledge}</knowledge>

Query: 

<query>{query}</query>"""

print(augmented_prompt)

Using the contexts below delimited by tag <knowledge></knowledge>, answer the query delimited by tag <query></query>.

Contexts:

<knowledge>
What is LLAMA 2?
LLAMA 2 is an open-source language model that can be downloaded and run locally, with different versions available for download, including chat models and normal models.
How does LLAMA 2 perform compared to other models?
LLAMA 2 performs well in terms of sentiment scores and human evaluation results, outperforming open source chat models on most benchmarks.
What are the trade-offs between safety and helpfulness in LLAMA 2?
LLAMA 2 prioritizes both safety and helpfulness in its outputs, thanks to safety patterns and improvements such as a larger pre-training data set, grouped query attention, and reinforcement learning with human feedback.
What are the safety improvements in LLAMA 2?
LLAMA 2 incorporates safety improvements such as a larger pre-training data set, grouped query attention, and reinforcement learning with human feedb

In [17]:
# create a new user prompt
prompt = HumanMessage(
    content=augmented_prompt
)
# add to messages
messages.append(prompt)

# send to OpenAI
res = chat(messages)

print(res.content)

LLAMA 2 is special because it is an open-source language model that can be downloaded and run locally. It offers different versions, including chat models and normal models, providing flexibility and versatility for users. Additionally, LLAMA 2 performs well compared to other models, with high sentiment scores and positive human evaluation results, particularly outperforming open source chat models on most benchmarks. It also prioritizes both safety and helpfulness in its outputs, incorporating safety patterns and improvements such as a larger pre-training data set, grouped query attention, and reinforcement learning with human feedback. This ensures that LLAMA 2 provides reliable and valuable responses while maintaining user safety. Furthermore, LLAMA 2 is recommended for startups, especially the 13B model, as it performs well on academic benchmarks. Overall, LLAMA 2 stands out for its performance, safety features, and suitability for various applications, making it a special language

### Building a knowledge base ###
A brief summary as context makes the LLM's response relevant, but it is not enough to produce detailed answers. We need to give the LLM a large knowledge base to refer to.

Next, we will use a collection of research publications related to Llama 2 as the knowledge base.

We will use this dataset from Hugging Face:

https://huggingface.co/datasets/jamescalam/llama-2-arxiv-papers-chunked

In [6]:
from datasets import load_dataset

dataset = load_dataset(
    "jamescalam/llama-2-arxiv-papers-chunked",
    split="train"
)

dataset


Dataset({
    features: ['doi', 'chunk-id', 'chunk', 'id', 'title', 'summary', 'source', 'authors', 'categories', 'comment', 'journal_ref', 'primary_category', 'published', 'updated', 'references'],
    num_rows: 4838
})

In [21]:
dataset[0]

{'doi': '1102.0183',
 'chunk-id': '0',
 'chunk': 'High-Performance Neural Networks\nfor Visual Object Classi\x0ccation\nDan C. Cire\x18 san, Ueli Meier, Jonathan Masci,\nLuca M. Gambardella and J\x7f urgen Schmidhuber\nTechnical Report No. IDSIA-01-11\nJanuary 2011\nIDSIA / USI-SUPSI\nDalle Molle Institute for Arti\x0ccial Intelligence\nGalleria 2, 6928 Manno, Switzerland\nIDSIA is a joint institute of both University of Lugano (USI) and University of Applied Sciences of Southern Switzerland (SUPSI),\nand was founded in 1988 by the Dalle Molle Foundation which promoted quality of life.\nThis work was partially supported by the Swiss Commission for Technology and Innovation (CTI), Project n. 9688.1 IFF:\nIntelligent Fill in Form.arXiv:1102.0183v1  [cs.AI]  1 Feb 2011\nTechnical Report No. IDSIA-01-11 1\nHigh-Performance Neural Networks\nfor Visual Object Classi\x0ccation\nDan C. Cire\x18 san, Ueli Meier, Jonathan Masci,\nLuca M. Gambardella and J\x7f urgen Schmidhuber\nJanuary 2011\nAbs

Next, we turn all the chunks into vectors (known as **embeddings**) and store them in a vector database.
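To make the idea concrete, here is a minimal sketch of what a vector database does at query time: it scores stored vectors against a query vector and returns the closest ones. The three-dimensional vectors, the chunk IDs, and the helper names `cosine_similarity` and `top_k` are made up for illustration; real embeddings have far more dimensions, and a real vector database uses approximate search rather than this brute-force loop.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query_vec, store, k=3):
    """Return the k (id, score) pairs most similar to query_vec, best first."""
    scored = [
        (doc_id, cosine_similarity(query_vec, vec))
        for doc_id, vec in store.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Hypothetical toy "embeddings" keyed by chunk ID (illustration only)
toy_store = {
    "chunk-a": [1.0, 0.0, 0.0],
    "chunk-b": [0.9, 0.1, 0.0],
    "chunk-c": [0.0, 1.0, 0.0],
}

print(top_k([1.0, 0.0, 0.0], toy_store, k=2))
```

The query vector points in the same direction as `chunk-a`, so that chunk ranks first, followed by the nearly parallel `chunk-b`.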

We will use **Pinecone** as the vector database.

Go to https://app.pinecone.io and register a free account. Then go to **API Keys** to retrieve your api_key and environment.

In [3]:
import pinecone

# get API key from app.pinecone.io and environment from console
# fail early with a clear message if the key is missing
if os.getenv("PINECONE_API_KEY") is None:
    raise RuntimeError("Set the PINECONE_API_KEY environment variable before running this cell")
pinecone.init(
    api_key=os.environ["PINECONE_API_KEY"],
    environment="us-east1-gcp"  # replace with the environment shown in your console
)

Then we initialize the index. We will be using OpenAI's `text-embedding-ada-002` model for creating the embeddings, so we set the dimension to 1536.

In [8]:
import time

index_name = 'llama-2-rag'

if index_name not in pinecone.list_indexes():
    pinecone.create_index(
        index_name,
        dimension=1536,
        metric='cosine'
    )
    # wait for index to finish initialization
    while not pinecone.describe_index(index_name).status['ready']:
        time.sleep(1)

index = pinecone.Index(index_name)

index.describe_index_stats()

{'dimension': 1536,
 'index_fullness': 0.0,
 'namespaces': {'': {'vector_count': 4838}},
 'total_vector_count': 4838}

Now let's create the embeddings and add them to Pinecone.

In [9]:
from tqdm.auto import tqdm  # for progress bar

from langchain.embeddings.openai import OpenAIEmbeddings

embed_model = OpenAIEmbeddings(model="text-embedding-ada-002")

data = dataset.to_pandas()  # this makes it easier to iterate over the dataset

batch_size = 100

# for i in tqdm(range(0, len(data), batch_size)):
#     i_end = min(len(data), i+batch_size)
#     # get batch of data
#     batch = data.iloc[i:i_end]
#     # generate unique ids for each chunk
#     ids = [f"{x['doi']}-{x['chunk-id']}" for _, x in batch.iterrows()]
#     # get text to embed
#     texts = [x['chunk'] for _, x in batch.iterrows()]
#     # embed text
#     embeds = embed_model.embed_documents(texts)
#     # get metadata to store in Pinecone
#     metadata = [
#         {'text': x['chunk'],
#          'source': x['source'],
#          'title': x['title']} for _, x in batch.iterrows()
#     ]
#     # add to Pinecone
#     index.upsert(vectors=zip(ids, embeds, metadata))
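The batching logic in the loop above can be sketched without any API calls. The example below is an illustration under assumptions, not the notebook's actual pipeline: `iter_batches`, `to_upsert_records`, the sample rows, and `fake_embed` are hypothetical stand-ins. In the notebook, the rows could come from something like `data.to_dict("records")` and the vectors from `embed_model.embed_documents`.

```python
def iter_batches(rows, batch_size=100):
    """Yield consecutive slices of rows, each at most batch_size long."""
    for start in range(0, len(rows), batch_size):
        yield rows[start:start + batch_size]


def to_upsert_records(batch, embeddings):
    """Pair each row with its embedding as (id, vector, metadata) tuples."""
    records = []
    for row, vector in zip(batch, embeddings):
        # same ID scheme as the loop above: "<doi>-<chunk-id>"
        record_id = f"{row['doi']}-{row['chunk-id']}"
        metadata = {
            "text": row["chunk"],
            "source": row["source"],
            "title": row["title"],
        }
        records.append((record_id, vector, metadata))
    return records


# Hypothetical rows standing in for dataset records (illustration only)
rows = [
    {"doi": "1102.0183", "chunk-id": str(i), "chunk": f"text {i}",
     "source": "http://example.org", "title": "Example title"}
    for i in range(5)
]


def fake_embed(texts):
    """Stand-in for an embedding call; returns one dummy vector per text."""
    return [[0.0, 0.0] for _ in texts]


for batch in iter_batches(rows, batch_size=2):
    vectors = fake_embed([row["chunk"] for row in batch])
    records = to_upsert_records(batch, vectors)
    print([record_id for record_id, _, _ in records])
```

Keeping the record-building step separate from the network calls makes it easy to check the IDs and metadata before anything is sent to the index.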

In [10]:
index.describe_index_stats()


{'dimension': 1536,
 'index_fullness': 0.0,
 'namespaces': {'': {'vector_count': 4838}},
 'total_vector_count': 4838}

### Retrieval Augmented Generation ###
Finally, it's time to do some RAG! We'll be diving back into LangChain and reusing our template prompt from earlier.

To use LangChain here we need to load the LangChain abstraction for a vector index, called a vectorstore. We pass in our vector index to initialize the object.

In [11]:
from langchain.vectorstores import Pinecone

text_field = "text"  # the metadata field that contains our text

# initialize the vector store object
vectorstore = Pinecone(
    index, embed_model, text_field
)

query = "What is so special about Llama 2?"

vectorstore.similarity_search(query, k=3)

[Document(page_content='Alan Schelten Ruan Silva Eric Michael Smith Ranjan Subramanian Xiaoqing Ellen Tan Binh Tang\nRoss Taylor Adina Williams Jian Xiang Kuan Puxin Xu Zheng Yan Iliyan Zarov Yuchen Zhang\nAngela Fan Melanie Kambadur Sharan Narang Aurelien Rodriguez Robert Stojnic\nSergey Edunov Thomas Scialom\x03\nGenAI, Meta\nAbstract\nIn this work, we develop and release Llama 2, a collection of pretrained and ﬁne-tuned\nlarge language models (LLMs) ranging in scale from 7 billion to 70 billion parameters.\nOur ﬁne-tuned LLMs, called L/l.sc/a.sc/m.sc/a.sc /two.taboldstyle-C/h.sc/a.sc/t.sc , are optimized for dialogue use cases. Our\nmodels outperform open-source chat models on most benchmarks we tested, and based on\nourhumanevaluationsforhelpfulnessandsafety,maybeasuitablesubstituteforclosedsource models. We provide a detailed description of our approach to ﬁne-tuning and safety', metadata={'source': 'http://arxiv.org/pdf/2307.09288', 'title': 'Llama 2: Open Foundation and Fine-Tun

We have now retrieved the three most relevant text chunks. Next, we attach the retrieved text to the prompt to augment it. Let's see if the results get better.

In [12]:
def augment_prompt(query: str):
    # get top 3 results from knowledge base
    results = vectorstore.similarity_search(query, k=3)
    # get the text from the results
    source_knowledge = "\n".join([x.page_content for x in results])
    # feed into an augmented prompt
    augmented_prompt = f"""Using the contexts below delimited by tag <knowledge></knowledge>, answer the query delimited by tag <query></query>.

Contexts:

<knowledge>{source_knowledge}</knowledge>

Query: 

<query>{query}</query>"""
    return augmented_prompt

print(augment_prompt(query))

Using the contexts below delimited by tag <knowledge></knowledge>, answer the query delimited by tag <query></query>.

Contexts:

<knowledge>Alan Schelten Ruan Silva Eric Michael Smith Ranjan Subramanian Xiaoqing Ellen Tan Binh Tang
Ross Taylor Adina Williams Jian Xiang Kuan Puxin Xu Zheng Yan Iliyan Zarov Yuchen Zhang
Angela Fan Melanie Kambadur Sharan Narang Aurelien Rodriguez Robert Stojnic
Sergey Edunov Thomas Scialom
GenAI, Meta
Abstract
In this work, we develop and release Llama 2, a collection of pretrained and ﬁne-tuned
large language models (LLMs) ranging in scale from 7 billion to 70 billion parameters.
Our ﬁne-tuned LLMs, called L/l.sc/a.sc/m.sc/a.sc /two.taboldstyle-C/h.sc/a.sc/t.sc , are optimized for dialogue use cases. Our
models outperform open-source chat models on most benchmarks we tested, and based on
ourhumanevaluationsforhelpfulnessandsafety,maybeasuitablesubstituteforclosedsource models. We provide a detailed description of our approach to ﬁne-tuning and safety


In [13]:
# create a new user prompt
prompt = HumanMessage(
    content=augment_prompt(query)
)
# add to messages
messages.append(prompt)

res = chat(messages)

print(res.content)

Llama 2 is a collection of pretrained and fine-tuned large language models (LLMs) ranging in scale from 7 billion to 70 billion parameters. These fine-tuned LLMs, specifically L/l.sc/a.sc/m.sc/a.sc/t.sc, are optimized for dialogue use cases. According to benchmarks and human evaluations for helpfulness and safety, Llama 2 models outperform open-source chat models and may be a suitable substitute for closed-source models. The approach to fine-tuning and safety in Llama 2 is described in detail, and it is highlighted that closed-source LLMs are heavily fine-tuned to align with human preferences, enhancing their usability and safety. Llama 2 aims to provide an open and efficient foundation for language models.


Let's ask more questions and compare the answers with and without RAG.

In [14]:
prompt = HumanMessage(
    content="what safety measures were used in the development of llama 2?"
)

res = chat(messages + [prompt])
print(res.content)

The safety measures used in the development of Llama 2 are not explicitly mentioned in the provided contexts.


In [15]:
prompt = HumanMessage(
    content=augment_prompt(
        "what safety measures were used in the development of llama 2?"
    )
)

res = chat(messages + [prompt])
print(res.content)

In the development of Llama 2, safety measures were taken to increase the safety of the models. These measures include safety-specific data annotation and tuning, red-teaming, and iterative evaluations. The goal of these safety measures is to improve the safety of the models and enable more responsible development of large language models (LLMs).
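The side-by-side comparison above can be wrapped in a small helper. This is a sketch under assumptions: `compare_answers`, `fake_ask`, and `fake_augment` are hypothetical names, and the two stand-in functions exist only so the example runs without API keys. In the notebook, `ask` could wrap `chat(messages + [HumanMessage(content=text)]).content` and `augment` could be `augment_prompt`.

```python
def compare_answers(ask, augment, question):
    """Ask the same question with and without retrieved context."""
    return {
        "without_rag": ask(question),
        "with_rag": ask(augment(question)),
    }


def fake_augment(query):
    """Stand-in for augment_prompt: wraps the query with dummy context."""
    return f"<knowledge>dummy context</knowledge>\n<query>{query}</query>"


def fake_ask(prompt):
    """Stand-in for the chat model: reports whether context was supplied."""
    if "<knowledge>" in prompt:
        return "answer grounded in the supplied context"
    return "answer from model knowledge only"


answers = compare_answers(
    fake_ask,
    fake_augment,
    "what safety measures were used in the development of llama 2?",
)
for label, text in answers.items():
    print(f"{label}: {text}")
```

Passing the model call and the augmentation step in as functions keeps the helper independent of any particular chat client or vector store.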
