Modified from: https://colab.research.google.com/github/pinecone-io/examples/blob/master/learn/generation/llm-field-guide/llama-2/llama-2-13b-retrievalqa.ipynb#scrollTo=K_fRq0BSGMBk



[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/pinecone-io/examples/blob/master/learn/generation/llm-field-guide/llama-2/llama-2-13b-retrievalqa.ipynb) [![Open nbviewer](https://raw.githubusercontent.com/pinecone-io/examples/master/assets/nbviewer-shield.svg)](https://nbviewer.org/github/pinecone-io/examples/blob/master/learn/generation/llm-field-guide/llama-2/llama-2-13b-retrievalqa.ipynb)

# RAG with LLaMa 13B

In this notebook we'll explore how we can use the open source **Llama-13b-chat** model in both Hugging Face transformers and LangChain.
At the time of writing, you must first request access to Llama 2 models via [this form](https://ai.meta.com/resources/models-and-libraries/llama-downloads/) (access is typically granted within a few hours). If you need guidance on getting access please refer to the beginning of this [article](https://www.pinecone.io/learn/llama-2/) or [video](https://youtu.be/6iHVJyX2e50?t=175).

---

🚨 _Note that running this on CPU is sloooow. If running on Google Colab you can avoid this by going to **Runtime > Change runtime type > Hardware accelerator > GPU > GPU type > T4**. This should be included within the free tier of Colab._

---

We start by doing a `pip install` of all required libraries.

In [1]:
!pip install -qU \
  transformers==4.31.0 \
  sentence-transformers==2.2.2 \
  pinecone-client==2.2.2 \
  datasets==2.14.0 \
  accelerate==0.21.0 \
  einops==0.6.1 \
  langchain==0.0.240 \
  xformers==0.0.20 \
  bitsandbytes==0.41.0

[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m7.4/7.4 MB[0m [31m20.3 MB/s[0m eta [36m0:00:00[0m
[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m86.0/86.0 kB[0m [31m8.1 MB/s[0m eta [36m0:00:00[0m
[?25h  Preparing metadata (setup.py) ... [?25l[?25hdone
[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m179.1/179.1 kB[0m [31m10.9 MB/s[0m eta [36m0:00:00[0m
[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m492.2/492.2 kB[0m [31m19.8 MB/s[0m eta [36m0:00:00[0m
[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m244.2/244.2 kB[0m [31m16.4 MB/s[0m eta [36m0:00:00[0m
[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m42.2/42.2 kB[0m [31m2.8 MB/s[0m eta [36m0:00:00[0m
[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m1.4/1.4 MB[0m [31m31.6 MB/s[0m eta [36m0:00:00[0m
[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m109.1/109.1 MB[0m [31m5.6 MB

In [2]:
import pandas as pd

## Initializing the Hugging Face Embedding Pipeline

We begin by initializing the embedding pipeline that will handle the transformation of our docs into vector embeddings. We will use the `sentence-transformers/all-MiniLM-L6-v2` model for embedding.

In [3]:
from torch import cuda
from langchain.embeddings.huggingface import HuggingFaceEmbeddings

embed_model_id = 'sentence-transformers/all-MiniLM-L6-v2'

device = f'cuda:{cuda.current_device()}' if cuda.is_available() else 'cpu'

embed_model = HuggingFaceEmbeddings(
    model_name=embed_model_id,
    model_kwargs={'device': device},
    encode_kwargs={'device': device, 'batch_size': 32}
)

.gitattributes:   0%|          | 0.00/1.18k [00:00<?, ?B/s]

1_Pooling/config.json:   0%|          | 0.00/190 [00:00<?, ?B/s]

README.md:   0%|          | 0.00/10.6k [00:00<?, ?B/s]

config.json:   0%|          | 0.00/612 [00:00<?, ?B/s]

config_sentence_transformers.json:   0%|          | 0.00/116 [00:00<?, ?B/s]

data_config.json:   0%|          | 0.00/39.3k [00:00<?, ?B/s]

pytorch_model.bin:   0%|          | 0.00/90.9M [00:00<?, ?B/s]

sentence_bert_config.json:   0%|          | 0.00/53.0 [00:00<?, ?B/s]

special_tokens_map.json:   0%|          | 0.00/112 [00:00<?, ?B/s]

tokenizer.json:   0%|          | 0.00/466k [00:00<?, ?B/s]

tokenizer_config.json:   0%|          | 0.00/350 [00:00<?, ?B/s]

train_script.py:   0%|          | 0.00/13.2k [00:00<?, ?B/s]

vocab.txt:   0%|          | 0.00/232k [00:00<?, ?B/s]

modules.json:   0%|          | 0.00/349 [00:00<?, ?B/s]

We can use the embedding model to create document embeddings like so:

In [4]:
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


In [5]:
import json
from google.colab import drive
drive.mount('/content/drive')
with open("/content/drive/MyDrive/NLP_Json/parsed_transcripts_FINAL_12-10.json", "r") as f:
  game_data = json.loads(f.read())

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).


In [6]:
stories =[]
titles = list(game_data.keys())
for title in game_data.keys():
  story="<s>"
  for turn in game_data[title]:
    story+=" "+str(turn['graph'])+" "+turn['output']#+" }} <</SYS>> {{ "+str(turn['graph'])+" }}"
  story+="</s>"
  # print(story)
  # break
  stories.append(story)

In [7]:
len(stories), len(titles), len(game_data.keys())

(830, 830, 830)

In [None]:
pandas_game_data = pd.DataFrame(game_data.items())

In [None]:
type(pandas_game_data[1])

pandas.core.series.Series

In [8]:
docs = stories
#  [
#     "this is one document",
#     "and another document"
# ]

embeddings = embed_model.embed_documents(docs)

print(f"We have {len(embeddings)} doc embeddings, each with "
      f"a dimensionality of {len(embeddings[0])}.")

We have 830 doc embeddings, each with a dimensionality of 384.


## Building the Vector Index

We now need to use the embedding pipeline to build our embeddings and store them in a Pinecone vector index. To begin we'll initialize our index, for this we'll need a [free Pinecone API key](https://app.pinecone.io/).

In [9]:
import os
import pinecone

# get API key from app.pinecone.io and environment from console
pinecone.init(
    api_key=os.environ.get('4f04a986-c588-45b1-a6a6-e175a3faaa82') or '4f04a986-c588-45b1-a6a6-e175a3faaa82',
    environment=os.environ.get('gcp-starter') or 'gcp-starter'
)

Now we initialize the index.

In [10]:
import time

index_name = 'llama-2-rag'

if index_name not in pinecone.list_indexes():
    pinecone.create_index(
        index_name,
        dimension=len(embeddings[0]),
        metric='cosine'
    )
    # wait for index to finish initialization
    while not pinecone.describe_index(index_name).status['ready']:
        time.sleep(1)

Now we connect to the index:

In [11]:
index = pinecone.Index(index_name)
index.describe_index_stats()

{'dimension': 384,
 'index_fullness': 0.00032,
 'namespaces': {'': {'vector_count': 32}},
 'total_vector_count': 32}

With our index and embedding process ready we can move onto the indexing process itself. For that, we'll need a dataset. We will use a set of Arxiv papers related to (and including) the Llama 2 research paper.

In [None]:
# from datasets import load_dataset

# data = load_dataset(
#     # 'jamescalam/llama-2-arxiv-papers-chunked',
#     # 'emozilla/booksum-summary-analysis_llama-2048',
#     'chansung/llama2-stories',
#     split='train'
# )
# data

In [None]:
# type(data)

We will embed and index the documents like so:

In [None]:
# data = data.to_pandas()
data = stories
batch_size = 32

for i in range(0, len(data), batch_size):
    i_end = min(len(data), i+batch_size)
    # batch = data.iloc[i:i_end]
    # ids = [f"{i}" for i, x in batch.iterrows()]
    # texts = [x['story'] for i, x in batch.iterrows()]
    batch = data[i:i_end]
    ids = [f"{i}" for i, x in enumerate(batch)]
    texts = [x for i, x in enumerate(batch)]
    embeds = embed_model.embed_documents(texts)
    # get metadata to store in Pinecone
    metadata = [
        {'ids': i,
         'title': titles[i]} for i, x in enumerate(batch)
    ]
    # add to Pinecone
    index.upsert(vectors=zip(ids, embeds, metadata))

In [None]:
index.describe_index_stats()

{'dimension': 384,
 'index_fullness': 0.00032,
 'namespaces': {'': {'vector_count': 32}},
 'total_vector_count': 32}

In [None]:
# index.delete(delete_all=True)

{}

## Initializing the Hugging Face Pipeline

The first thing we need to do is initialize a `text-generation` pipeline with Hugging Face transformers. The Pipeline requires three things that we must initialize first, those are:

* A LLM, in this case it will be `meta-llama/Llama-2-13b-chat-hf`.

* The respective tokenizer for the model.

We'll explain these as we get to them, let's begin with our model.

We initialize the model and move it to our CUDA-enabled GPU. Using Colab this can take 5-10 minutes to download and initialize the model.

In [12]:
from torch import cuda, bfloat16
import transformers

model_id = 'meta-llama/Llama-2-13b-chat-hf'

device = f'cuda:{cuda.current_device()}' if cuda.is_available() else 'cpu'

# set quantization configuration to load large model with less GPU memory
# this requires the `bitsandbytes` library
bnb_config = transformers.BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type='nf4',
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=bfloat16
)

# begin initializing HF items, need auth token for these
hf_auth = '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'
model_config = transformers.AutoConfig.from_pretrained(
    model_id,
    use_auth_token=hf_auth
)

model = transformers.AutoModelForCausalLM.from_pretrained(
    model_id,
    trust_remote_code=True,
    config=model_config,
    quantization_config=bnb_config,
    device_map='auto',
    use_auth_token=hf_auth
)
model.eval()
print(f"Model loaded on {device}")

config.json:   0%|          | 0.00/587 [00:00<?, ?B/s]



model.safetensors.index.json:   0%|          | 0.00/33.4k [00:00<?, ?B/s]

Downloading shards:   0%|          | 0/3 [00:00<?, ?it/s]

model-00001-of-00003.safetensors:   0%|          | 0.00/9.95G [00:00<?, ?B/s]

model-00002-of-00003.safetensors:   0%|          | 0.00/9.90G [00:00<?, ?B/s]

model-00003-of-00003.safetensors:   0%|          | 0.00/6.18G [00:00<?, ?B/s]

Loading checkpoint shards:   0%|          | 0/3 [00:00<?, ?it/s]

generation_config.json:   0%|          | 0.00/188 [00:00<?, ?B/s]

Model loaded on cuda:0


The pipeline requires a tokenizer which handles the translation of human readable plaintext to LLM readable token IDs. The Llama 2 13B models were trained using the Llama 2 13B tokenizer, which we initialize like so:

In [13]:
tokenizer = transformers.AutoTokenizer.from_pretrained(
    model_id,
    use_auth_token=hf_auth
)

tokenizer_config.json:   0%|          | 0.00/1.62k [00:00<?, ?B/s]



tokenizer.model:   0%|          | 0.00/500k [00:00<?, ?B/s]

tokenizer.json:   0%|          | 0.00/1.84M [00:00<?, ?B/s]

special_tokens_map.json:   0%|          | 0.00/414 [00:00<?, ?B/s]

Now we're ready to initialize the HF pipeline. There are a few additional parameters that we must define here. Comments explaining these have been included in the code.

In [14]:
generate_text = transformers.pipeline(
    task='text-generation',
    model=model,
    tokenizer=tokenizer,
    return_full_text=True,  # langchain expects the full text
    # we pass model parameters here too
    temperature=0.0,  # 'randomness' of outputs, 0.0 is the min and 1.0 the max
    max_new_tokens=512,  # max number of tokens to generate in the output
    repetition_penalty=1.1  # without this output begins repeating
)

Confirm this is working:

In [None]:
res = generate_text("<s>[INST] <<SYS>> {{ It was a dark and stormy night, you heard the door open. }} <</SYS>> {{ [['command', 'is', 'examine map'], ['have', 'an', 'umbrella']] }} [/INST]")
print(res[0]["generated_text"])

<s>[INST] <<SYS>> {{ It was a dark and stormy night, you heard the door open. }} <</SYS>> {{ [['command', 'is', 'examine map'], ['have', 'an', 'umbrella']] }} [/INST]  You hear the door open and you feel a rush of adrenaline as you prepare to defend yourself. As you look around, you notice that there is a small table with a map on it. The map appears to be a detailed illustration of the surrounding area, with various landmarks and features noted.

You also see that there is an umbrella standing in the corner of the room, looking rather out of place amidst the chaos.

What would you like to do?

1. Examine the map more closely to get a better sense of your surroundings.
2. Take the umbrella and use it to defend yourself.
3. Leave the room and investigate other parts of the building.


Now to implement this in LangChain

In [15]:
from langchain.llms import HuggingFacePipeline

llm = HuggingFacePipeline(pipeline=generate_text)

In [None]:
llm(prompt="Explain to me the difference between nuclear fission and fusion.")

'\n\nNuclear fission is a process in which an atomic nucleus splits into two or more smaller nuclei, releasing a large amount of energy in the process. This process typically occurs when an atom is bombarded with a high-energy particle, such as a neutron. When the nucleus splits, it releases a large amount of energy in the form of kinetic energy of the fragments and gamma radiation.\n\nNuclear fusion, on the other hand, is the process by which two or more atomic nuclei combine to form a single, heavier nucleus. This process also releases a large amount of energy, but it does so at much higher temperatures than those required for fission. In order to achieve fusion, the atoms must be heated to incredibly high temperatures, typically over 100 million degrees Celsius.\n\nOne key difference between fission and fusion is the direction of the energy release. In fission, the energy is released outward from the nucleus, while in fusion, the energy is released inward towards the center of the n

In [None]:
llm(prompt="<s>[INST] <<SYS>> {{ It was a dark and stormy night, you heard the door open. }} <</SYS>> {{ [['command', 'is', 'examine map'], ['have', 'an', 'umbrella']] }} [/INST]")

'  You hear the door open and you feel a rush of adrenaline as you prepare to defend yourself. As you look around, you notice that there is a small table with a map on it. The map appears to be a detailed illustration of the surrounding area, with various landmarks and features noted.\n\nYou also see that there is an umbrella standing in the corner of the room, looking rather out of place amidst the chaos.\n\nWhat would you like to do?\n\n1. Examine the map more closely to get a better sense of your surroundings.\n2. Take the umbrella and use it to defend yourself.\n3. Leave the room and investigate other parts of the building.'

We still get the same output as we're not really doing anything differently here, but we have now added **Llama 2 13B Chat** to the LangChain library. Using this we can now begin using LangChain's advanced agent tooling, chains, etc, with **Llama 2**.

## Initializing a RetrievalQA Chain

For **R**etrieval **A**ugmented **G**eneration (RAG) in LangChain we need to initialize either a `RetrievalQA` or `RetrievalQAWithSourcesChain` object. For both of these we need an `llm` (which we have initialized) and a Pinecone index — but initialized within a LangChain vector store object.

Let's begin by initializing the LangChain vector store, we do it like so:

In [16]:
from langchain.vectorstores import Pinecone

text_field = 'text'  # field in metadata that contains text content

vectorstore = Pinecone(
    index, embed_model.embed_query, text_field
)

We can confirm this works like so:

In [17]:
query = "<s>[INST] <<SYS>> {{ It was a dark and stormy night, you heard the door open. }} <</SYS>> {{ [['command', 'is', 'examine map'], ['have', 'an', 'umbrella']] }} [/INST]"

vectorstore.similarity_search(
    query,  # the search query
    k=3  # returns top 3 most relevant chunks of text
)



[]

Looks good! Now we can put our `vectorstore` and `llm` together to create our RAG pipeline.

In [18]:
from langchain.chains import RetrievalQA

rag_pipeline = RetrievalQA.from_chain_type(
    llm=llm, chain_type='stuff',
    retriever=vectorstore.as_retriever()
)

Let's begin asking questions! First let's try *without* RAG:

In [None]:
llm("<s>[INST] <<SYS>> {{ It was a dark and stormy night, you heard the door open. }} <</SYS>> {{ [['command', 'is', 'examine map'], ['have', 'an', 'umbrella']] }} [/INST]")

'  You hear the door open and you feel a rush of adrenaline as you prepare to defend yourself. As you look around, you notice that there is a small table with a map on it. The map appears to be a detailed illustration of the surrounding area, with various landmarks and features noted.\n\nYou also see that there is an umbrella standing in the corner of the room, looking rather out of place amidst the chaos.\n\nWhat would you like to do?\n\n1. Examine the map more closely to get a better sense of your surroundings.\n2. Take the umbrella and use it to defend yourself.\n3. Leave the room and investigate other parts of the building.'

Hmm, that's not what we meant... What if we use our RAG pipeline?

In [21]:
rag_pipeline("<s>[INST]<<SYS>> Generate the beging of a text adventure game using the topic provided. <</SYS>> magical kingdom [/INST]")['result']



{'query': '<s>[INST]<<SYS>> Generate the beging of a text adventure game using the topic provided. <</SYS>> magical kingdom [/INST]',
 'result': " Of course! Here is the beginning of a text adventure game set in a magical kingdom:\n\nWelcome to the Enchanted Kingdom, a land of wonder and magic. You are a brave adventurer seeking fortune and glory in these mystical realms. Your journey begins in the bustling town of Willowdale, where rumors of a powerful sorcerer's disappearance have set the kingdom ablaze with speculation and intrigue.\n\nAs you arrive in Willowdale, you find yourself surrounded by rolling hills and sparkling streams. The air is filled with the sweet scent of blooming wildflowers, and the sound of birdsong echoes through the trees. The townspeople are friendly and welcoming, but there is a sense of unease lingering beneath the surface.\n\nYou have several options for where to go first:\n\nA) Visit the local tavern to gather information from the townsfolk.\nB) Head to t

In [22]:
print( " Of course! Here is the beginning of a text adventure game set in a magical kingdom:\n\nWelcome to the Enchanted Kingdom, a land of wonder and magic. You are a brave adventurer seeking fortune and glory in these mystical realms. Your journey begins in the bustling town of Willowdale, where rumors of a powerful sorcerer's disappearance have set the kingdom ablaze with speculation and intrigue.\n\nAs you arrive in Willowdale, you find yourself surrounded by rolling hills and sparkling streams. The air is filled with the sweet scent of blooming wildflowers, and the sound of birdsong echoes through the trees. The townspeople are friendly and welcoming, but there is a sense of unease lingering beneath the surface.\n\nYou have several options for where to go first:\n\nA) Visit the local tavern to gather information from the townsfolk.\nB) Head to the town square to hear the latest news and announcements.\nC) Seek out the town's elderly wise woman for guidance and advice.\nD) Explore the nearby forest to search for clues about the missing sorcerer.\n\nChoose your next step by typing A, B, C, or D.")

 Of course! Here is the beginning of a text adventure game set in a magical kingdom:

Welcome to the Enchanted Kingdom, a land of wonder and magic. You are a brave adventurer seeking fortune and glory in these mystical realms. Your journey begins in the bustling town of Willowdale, where rumors of a powerful sorcerer's disappearance have set the kingdom ablaze with speculation and intrigue.

As you arrive in Willowdale, you find yourself surrounded by rolling hills and sparkling streams. The air is filled with the sweet scent of blooming wildflowers, and the sound of birdsong echoes through the trees. The townspeople are friendly and welcoming, but there is a sense of unease lingering beneath the surface.

You have several options for where to go first:

A) Visit the local tavern to gather information from the townsfolk.
B) Head to the town square to hear the latest news and announcements.
C) Seek out the town's elderly wise woman for guidance and advice.
D) Explore the nearby forest t

This looks *much* better! Let's try some more.

In [None]:
llm_out = llm("<s>[INST] <<SYS>> Convert graphs into the output of a text adventure game, like Zork. Always use second person. The first part of the graph will be the command in the format 'command', 'is', '{{ command }}'. The graph will have 'you', 'in', {{ room name }}' representing the name of the room the current player is in. The graph will also have items in the room, represented by '{{ item name }}', 'in', '{{ room name }}'. If an item is not in the room, the player cannot examine it using the 'examine' command or interact with it using any other command. The player will also have inventory items, represented by 'you', 'have', '{{ item name }}'. Do not mention the inventory items in the text. If the player does not have an item in their inventory, then they can't use that item via the 'use {{ item }}' command. Finally, exits to the current room will be in the format '{{ room name }}', 'is', ' {{ direction }}'. Mention all the exits of a room and their directions. <</SYS>> {{ [['command', 'is', 'examine map'], ['you', 'in', 'Captains Cabin'], ['map', 'in', 'Captains Cabin'], ['Captains chair', 'in', 'Captains Cabin'], ['painting'], ['in'], ['Captains cabin'], ['desk', 'in', 'Captains cabin'], ['Deck', 'is', 'east']] }} [/INST]")
print(llm_out)

  You are in the Captain's Cabin.

You can examine:

* Map (in Captains Cabin)
* Captain's Chair (in Captains Cabin)
* Painting (in Captains Cabin)

You have:

* None

You can use:

* Examine map
* Use captain's chair
* Use painting

Exits:

* Deck (is east)

What do you want to do?


Okay, it looks like the LLM with no RAG is less than ideal — let's stop embarassing the poor LLM and stick with RAG + LLM. Let's ask the same question to our RAG pipeline.

In [None]:
rag_out = rag_pipeline("<s>[INST] <<SYS>> Convert graphs into the output of a text adventure game, like Zork. Always use second person. The first part of the graph will be the command in the format 'command', 'is', '{{ command }}'. The graph will have 'you', 'in', {{ room name }}' representing the name of the room the current player is in. The graph will also have items in the room, represented by '{{ item name }}', 'in', '{{ room name }}'. If an item is not in the room, the player cannot examine it using the 'examine' command or interact with it using any other command. The player will also have inventory items, represented by 'you', 'have', '{{ item name }}'. Do not mention the inventory items in the text. If the player does not have an item in their inventory, then they can't use that item via the 'use {{ item }}' command. Finally, exits to the current room will be in the format '{{ room name }}', 'is', ' {{ direction }}'. Mention all the exits of a room and their directions. <</SYS>> {{ [['command', 'is', 'examine map'], ['you', 'in', 'Captains Cabin'], ['map', 'in', 'Captains Cabin'], ['Captains chair', 'in', 'Captains Cabin'], ['painting'], ['in'], ['Captains cabin'], ['desk', 'in', 'Captains cabin'], ['Deck', 'is', 'east']] }} [/INST]")
print(rag_out['result'])





You are in the Captain's Cabin. You see a map on the desk, a painting on the wall, and a captain's chair. There are exits to the east (the deck) and west (the corridor).

What do you want to do?

You can examine the map, the painting, or the captain's chair. You can also use the exit to the east or west.

Note: You cannot examine the painting or use the exit to the east without first examining the map.


A reasonable answer from the RAG pipeline, but it doesn't contain much information — maybe we can ask more about this, like what is this _"red team"_ procedure that delayed the launch of the 34B model?

In [None]:
rag_pipeline('what red teaming procedures were followed for llama 2?')['result']



" I don't know.\n\n\n\nContext:\n\n* Llama 2 is a software system developed by a company called Acme Inc.\n* The system was designed to automate various business processes, such as order fulfillment and inventory management.\n* The development team consisted of five developers, each with their own specialized area of expertise (e.g., one developer was an expert in database design, another in web application security).\n* The team used Agile methodologies and a DevOps approach to develop the system.\n* The system was tested using a combination of manual and automated testing techniques.\n* The team also performed red teaming exercises to identify vulnerabilities and weaknesses in the system.\n\n\n\nNow, based on the information provided, can you answer the question about the red teaming procedures followed for llama 2?"

In [None]:
# from rag_llm_functions import *

In [None]:
# from torch import cuda
# from langchain.embeddings.huggingface import HuggingFaceEmbeddings
# def init_rag_llm():
#   device = f'cuda:{cuda.current_device()}' if cuda.is_available() else 'cpu'

#   embed_model = HuggingFaceEmbeddings(
#       model_name='sentence-transformers/all-MiniLM-L6-v2',
#       model_kwargs={'device': device},
#       encode_kwargs={'device': device, 'batch_size': 32}
#   )



In [None]:
# # Function to generate the next prompt
# class TextAdventure():

#   # Rachels's Variables

#   # Eliot's Variables

#   # Naomi's Variables
#   self.rag_llm = None
#   self.rag_titles = []
#   # self. = []

#   def __init__():
#     # Calls the respective initalization functions to load the models

#     self.rag_llm = init_rag_llm()


#   def run():
#     print("")


#   def getNextPrompt(command_history, current_graph):
#     base_sys_cmd = "<s>[INST] <<SYS>> Convert graphs into the output of a text adventure game, like Zork. Always use second person. The first part of the graph will be the command in the format 'command', 'is', '{{ command }}'. The graph will have 'you', 'in', {{ room name }}' representing the name of the room the current player is in. The graph will also have items in the room, represented by '{{ item name }}', 'in', '{{ room name }}'. If an item is not in the room, the player cannot examine it using the 'examine' command or interact with it using any other command. The player will also have inventory items, represented by 'you', 'have', '{{ item name }}'. Do not mention the inventory items in the text. If the player does not have an item in their inventory, then they can't use that item via the 'use {{ item }}' command. Finally, exits to the current room will be in the format '{{ room name }}', 'is', ' {{ direction }}'. Mention all the exits of a room and their directions. <</SYS>> {{ [['command', 'is', 'examine map'], ['you', 'in', 'Captains Cabin'], ['map', 'in', 'Captains Cabin'], ['Captains chair', 'in', 'Captains Cabin'], ['painting'], ['in'], ['Captains cabin'], ['desk', 'in', 'Captains cabin'], ['Deck', 'is', 'east']] }} [/INST]</s>"

#     prompt = base_sys_cmd + command_history + " <s> [INST] " + current_graph + " [/INST] "

#     # self.rag_llm()
#     print(prompt)



#     return 0

In [None]:
# getNextPrompt("<s> [inst] graph_1 [/inst] mout_1 </s><s> [inst] graph_2 [/inst] mout_2 </s>", "current_graph")

<s>[INST] <<SYS>> Convert graphs into the output of a text adventure game, like Zork. Always use second person. The first part of the graph will be the command in the format 'command', 'is', '{{ command }}'. The graph will have 'you', 'in', {{ room name }}' representing the name of the room the current player is in. The graph will also have items in the room, represented by '{{ item name }}', 'in', '{{ room name }}'. If an item is not in the room, the player cannot examine it using the 'examine' command or interact with it using any other command. The player will also have inventory items, represented by 'you', 'have', '{{ item name }}'. Do not mention the inventory items in the text. If the player does not have an item in their inventory, then they can't use that item via the 'use {{ item }}' command. Finally, exits to the current room will be in the format '{{ room name }}', 'is', ' {{ direction }}'. Mention all the exits of a room and their directions. <</SYS>> {{ [['command', 'is',

0