# Demo: RAG with LLaMa for Metabolomics

This notebook is a basic demo of how we might use **retrieval-augmented generation (RAG) with LLaMa 2 models to answer questions about [metabolomics](https://en.wikipedia.org/wiki/Metabolomics)**. It is adapted from Pinecone's [tutorial notebook](https://colab.research.google.com/github/pinecone-io/examples/blob/master/learn/generation/llm-field-guide/llama-2/llama-2-13b-retrievalqa.ipynb#scrollTo=K_fRq0BSGMBk).

To run it, you will need:
1. A GPU (if you don't have one locally, you can run this notebook in Google Colab on a T4 for free).
2. Access to LLaMa 2 models (which you can request [here](https://ai.meta.com/resources/models-and-libraries/llama-downloads/)).
3. A free [Pinecone](https://www.pinecone.io/) account.
4. A free [Hugging Face](https://huggingface.co/) account.

You can find your `PINECONE_API_KEY` and `PINECONE_ENV` on the left-hand side of the Pinecone console under "API Keys". You can generate a `HF_AUTH_TOKEN` on the ["Access Tokens" page](https://huggingface.co/settings/tokens) in Hugging Face.

In [5]:
from apikey import PINECONE_API_KEY, PINECONE_ENV, HF_AUTH_TOKEN
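The import above assumes a local `apikey.py` file that is kept out of version control. A minimal sketch of such a file, assuming the three values are stored in environment variables of the same names (this file is hypothetical and not part of the original notebook):

```python
# apikey.py -- hypothetical helper module; keep it out of version control.
import os

# Read each secret from an environment variable of the same name.
# An empty string means the variable was not set.
PINECONE_API_KEY = os.environ.get("PINECONE_API_KEY", "")
PINECONE_ENV = os.environ.get("PINECONE_ENV", "")
HF_AUTH_TOKEN = os.environ.get("HF_AUTH_TOKEN", "")
```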

Install requirements.

In [8]:
!pip install -qU \
  transformers==4.31.0 \
  sentence-transformers==2.2.2 \
  pinecone-client==2.2.2 \
  accelerate==0.21.0 \
  einops==0.6.1 \
  langchain==0.0.240 \
  xformers==0.0.20 \
  bitsandbytes==0.41.0

## Embed Documents

RAG requires the ability to search for documents relevant to a question. To search over documents, we must first embed them, i.e., convert each piece of text into a vector of numbers such that texts with similar meanings map to nearby vectors. The following cells go through this process step by step.
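To make "searching over embeddings" concrete, here is a minimal sketch of scoring similarity between embeddings with cosine similarity, the same metric used for the Pinecone index below. The three-dimensional vectors are made-up toy values, not real model outputs (`all-MiniLM-L6-v2` produces 384-dimensional vectors).

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between vectors a and b (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Toy "embeddings" with hypothetical values chosen for illustration only.
docs = {
    "cortisol and exercise": [0.9, 0.1, 0.0],
    "leptin and appetite": [0.1, 0.9, 0.2],
}
query = [0.8, 0.2, 0.1]

# Rank documents by similarity to the query, most similar first.
ranked = sorted(docs, key=lambda name: cosine_similarity(query, docs[name]), reverse=True)
print(ranked[0])  # → cortisol and exercise
```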

### Step 1: Initialize Embedding Pipeline

We'll use the `sentence-transformers/all-MiniLM-L6-v2` model for embedding; it maps each piece of text to a 384-dimensional vector.

In [3]:
from torch import cuda
from langchain.embeddings.huggingface import HuggingFaceEmbeddings

embed_model_id = 'sentence-transformers/all-MiniLM-L6-v2'

device = f'cuda:{cuda.current_device()}' if cuda.is_available() else 'cpu'

embed_model = HuggingFaceEmbeddings(
    model_name=embed_model_id,
    model_kwargs={'device': device},
    encode_kwargs={'device': device, 'batch_size': 32}
)

In [4]:
# Test Embedding(s)

test_texts = ["embed this"]
test_embeddings = embed_model.embed_documents(test_texts)
embedding_length = len(test_embeddings[0])
print(f"The test_embedding is: {test_embeddings}.")
print(f"It is a {type(test_embeddings)} of {len(test_embeddings)} {type(test_embeddings[0])}s, each composed of {embedding_length} {type(test_embeddings[0][0])}s.")

The test_embedding is: [[-0.010658024810254574, 0.007777046877890825, -0.048584189265966415, -0.006959881167858839, 0.07620539516210556, -0.017881382256746292, ...

### Step 2: Create a Vector Index to Store Embeddings

This is where Pinecone comes in. It provides a service for storing and searching through the embeddings we generate for documents.
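Conceptually, a vector index stores `(id, vector, metadata)` records and returns the `top_k` records most similar to a query vector. The toy in-memory class below sketches that idea; `ToyVectorIndex` is hypothetical and is not Pinecone's implementation or client API, though its `upsert` takes the same kind of tuples we pass to Pinecone later. It computes cosine similarity as the dot product of unit-length vectors.

```python
import math

class ToyVectorIndex:
    """Hypothetical in-memory stand-in for a vector index (not Pinecone's API)."""

    def __init__(self, dimension):
        self.dimension = dimension
        self.records = {}  # id -> (unit-length vector, metadata)

    @staticmethod
    def _normalize(vec):
        norm = math.sqrt(sum(x * x for x in vec))
        return [x / norm for x in vec]

    def upsert(self, vectors):
        # Insert new records, or overwrite existing ones that share an id.
        for vec_id, values, metadata in vectors:
            if len(values) != self.dimension:
                raise ValueError(f"expected {self.dimension} dims, got {len(values)}")
            self.records[vec_id] = (self._normalize(values), metadata)

    def query(self, vector, top_k=3):
        # Cosine similarity = dot product of unit vectors; highest first.
        q = self._normalize(vector)
        scored = [
            (sum(a * b for a, b in zip(q, v)), vec_id, meta)
            for vec_id, (v, meta) in self.records.items()
        ]
        scored.sort(key=lambda s: s[0], reverse=True)
        return [(vec_id, score, meta) for score, vec_id, meta in scored[:top_k]]


index_demo = ToyVectorIndex(dimension=3)
index_demo.upsert([
    ("1-0", [0.9, 0.1, 0.0], {"text": "cortisol chunk"}),
    ("1-1", [0.1, 0.9, 0.2], {"text": "leptin chunk"}),
])
# Upserting an existing id overwrites it rather than adding a duplicate.
index_demo.upsert([("1-1", [0.0, 1.0, 0.0], {"text": "leptin chunk (updated)"})])
print(len(index_demo.records))  # → 2
print(index_demo.query([0.8, 0.2, 0.1], top_k=1))
```

The "upsert" semantics (update if the id exists, insert otherwise) are why re-running the embedding loop later does not grow the vector count.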

First, connect to Pinecone.

In [5]:
import pinecone

pinecone.init(
    api_key=PINECONE_API_KEY,
    environment=PINECONE_ENV
)

Then initialize the index.

In [6]:
import time

# name your index
index_name = 'llama-2-rag-metabolomics'

if index_name not in pinecone.list_indexes():
    pinecone.create_index(
        index_name,
        dimension=embedding_length,
        metric='cosine'
    )
    # wait for index to finish initialization
    while not pinecone.describe_index(index_name).status['ready']:
        time.sleep(1)

And now connect to the index.

In [7]:
index = pinecone.Index(index_name)
index.describe_index_stats()

{'dimension': 384,
 'index_fullness': 0.00073,
 'namespaces': {'': {'vector_count': 73}},
 'total_vector_count': 73}

### Step 3: Load Documents

Load in a dataframe of documents with the following specifications:

Rows:
- one row per chunk of text from a document

Columns:
- `id`: int, ID of the document
- `chunk_id`: int, ID of the chunk within the document
- `chunk`: str, text of the chunk
- `title`: str, title of the document
- `source`: str, URL of the document

I made a simple dataframe with these specifications for one document in `homeostasis.ipynb` and saved it as `homeostasis.csv`, which we'll use here for illustration purposes. The text of the document can be found [here](https://www.metabolismjournal.com/article/S0026-0495(13)00020-6/fulltext).
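`homeostasis.ipynb` is not reproduced here. As a rough illustration only, one way to produce rows in this format is sketched below, assuming the document text has already been extracted and that splitting on blank lines (paragraphs) is an acceptable chunking strategy. `make_chunk_rows` is a hypothetical helper and the input text is a toy placeholder.

```python
def make_chunk_rows(doc_id, title, source, text):
    """Split a document into paragraph chunks, one row (dict) per chunk.

    Hypothetical helper: chunks are paragraphs separated by blank lines.
    """
    paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]
    return [
        {"id": doc_id, "chunk_id": i, "chunk": p, "title": title, "source": source}
        for i, p in enumerate(paragraphs)
    ]

rows = make_chunk_rows(
    doc_id=1,
    title="Neuroendocrine alterations in the exercising human: Implications for energy homeostasis",
    source="https://www.metabolismjournal.com/article/S0026-0495(13)00020-6/fulltext",
    text="Abstract\n\nFirst paragraph of the article (toy text).\n\nAbbreviations:",
)
for row in rows:
    print(row["id"], row["chunk_id"], row["chunk"][:30])
```

Passing `rows` to `pd.DataFrame(rows)` would give the columns listed above.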

In [1]:
import pandas as pd

df = pd.read_csv("homeostasis.csv", index_col=0)
print(df.shape)
df.head()

(73, 5)


| Unnamed: 0 | id | chunk_id | chunk | title | source |
|---|---|---|---|---|---|
| 0 | 1 | 0 | Abstract | Neuroendocrine alterations in the exercising h... | https://www.metabolismjournal.com/article/S002... |
| 1 | 1 | 1 | Complex mechanisms exist in the human to defen... | Neuroendocrine alterations in the exercising h... | https://www.metabolismjournal.com/article/S002... |
| 2 | 1 | 2 | such as the kisspeptin-gonadotropin releasing ... | Neuroendocrine alterations in the exercising h... | https://www.metabolismjournal.com/article/S002... |
| 3 | 1 | 3 | Abbreviations: | Neuroendocrine alterations in the exercising h... | https://www.metabolismjournal.com/article/S002... |
| 4 | 1 | 4 | αMSH (α-melanocyte stimulating hormone), ACTH ... | Neuroendocrine alterations in the exercising h... | https://www.metabolismjournal.com/article/S002... |


### Step 4: Generate and Store Embeddings

We will embed and index the documents like so:

In [17]:
batch_size = 32

for i in range(0, len(df), batch_size):
    i_end = min(len(df), i + batch_size)
    batch = df.iloc[i:i_end]
    # unique vector IDs of the form "<document id>-<chunk id>"
    ids = [f"{x['id']}-{x['chunk_id']}" for _, x in batch.iterrows()]
    texts = [x['chunk'] for _, x in batch.iterrows()]
    embeds = embed_model.embed_documents(texts)
    # add metadata (the chunk text is stored so it can be returned at query time)
    metadata = [
        {'text': x['chunk'],
         'source': x['source'],
         'title': x['title']} for _, x in batch.iterrows()
    ]
    # store in Pinecone as (id, vector, metadata) tuples
    index.upsert(vectors=list(zip(ids, embeds, metadata)))

In [18]:
index.describe_index_stats()

{'dimension': 384,
 'index_fullness': 0.00073,
 'namespaces': {'': {'vector_count': 73}},
 'total_vector_count': 73}
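The embedding loop above walks the dataframe in fixed-size slices. Isolated from the embedding and Pinecone calls, the slicing arithmetic looks like this (`batch_bounds` is a hypothetical helper name). With 73 chunks and a batch size of 32, it yields three batches, the last one partial:

```python
def batch_bounds(n_rows, batch_size):
    """Return (start, end) slice bounds covering n_rows in batches of batch_size."""
    return [
        (start, min(n_rows, start + batch_size))
        for start in range(0, n_rows, batch_size)
    ]

print(batch_bounds(73, 32))  # → [(0, 32), (32, 64), (64, 73)]
```

The `min(...)` guard keeps the final slice from running past the last row.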

## Initializing the Hugging Face Pipeline

Next, we initialize a `text-generation` pipeline with Hugging Face transformers. The following cells go through this process step by step.

### Step 1: Initialize LLM

Load the model onto a CUDA-enabled GPU (using Colab if you don't have one locally). This may take a few minutes, depending on the size of the model.
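The 4-bit quantization configured below is what makes a 13B-parameter model practical on a single T4 (16 GB). A back-of-the-envelope estimate of the memory needed for the weights alone (ignoring activations, the KV cache, quantization constants, and any layers kept in higher precision) is sketched here; treat the numbers as rough lower bounds.

```python
def weight_memory_gib(n_params, bits_per_param):
    """Approximate memory for model weights alone, in GiB."""
    return n_params * bits_per_param / 8 / 2**30

n_params = 13e9  # nominal parameter count of Llama-2-13b
for label, bits in [("float16", 16), ("4-bit (nf4)", 4)]:
    print(f"{label:>12}: ~{weight_memory_gib(n_params, bits):.1f} GiB")
```

That is roughly 24 GiB in float16 versus about 6 GiB in 4-bit, which is why the unquantized model would not fit on a T4.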

In [8]:
from torch import cuda, bfloat16
import transformers

model_id = 'meta-llama/Llama-2-13b-chat-hf'

device = f'cuda:{cuda.current_device()}' if cuda.is_available() else 'cpu'

# set quantization configuration to load large model with less GPU memory
bnb_config = transformers.BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type='nf4',
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=bfloat16
)

# begin initializing HF items
hf_auth = HF_AUTH_TOKEN
model_config = transformers.AutoConfig.from_pretrained(
    model_id,
    use_auth_token=hf_auth
)

model = transformers.AutoModelForCausalLM.from_pretrained(
    model_id,
    trust_remote_code=True,
    config=model_config,
    quantization_config=bnb_config,
    device_map='auto',
    use_auth_token=hf_auth
)
model.eval()
print(f"Model loaded on {device}")

Model loaded on cuda:0


### Step 2: Initialize Model Tokenizer

The pipeline requires a tokenizer, which converts human-readable text into the token IDs the LLM operates on (and decodes generated IDs back into text).

In [9]:
tokenizer = transformers.AutoTokenizer.from_pretrained(
    model_id,
    use_auth_token=hf_auth
)


### Step 3: Initialize Pipeline

In [10]:
generate_text = transformers.pipeline(
    model=model,
    tokenizer=tokenizer,
    return_full_text=True,    # langchain expects the full text
    task='text-generation',
    # model parameters
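    temperature=0.0,          # minimize randomness in the output
    max_new_tokens=512,       # max number of tokens to generate
    repetition_penalty=1.1    # discourage the model from repeating itself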
)

### Step 4: Implement in LangChain

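LangChain makes it straightforward to build a RAG workflow around this pipeline. First, we wrap the Hugging Face pipeline in LangChain's `HuggingFacePipeline` class so it can be used as a LangChain LLM.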

In [19]:
from langchain.llms import HuggingFacePipeline

llm = HuggingFacePipeline(pipeline=generate_text)

## Initializing a RetrievalQA Chain

To run RAG in LangChain, we need either a `RetrievalQA` or a `RetrievalQAWithSourcesChain` object. For both of these we need an `llm` (which we initialized above) and a Pinecone index wrapped in a LangChain vector store object. The following cells go through this process step by step.

### Step 1: Initialize LangChain Vector Store

In [21]:
from langchain.vectorstores import Pinecone

text_field = 'text'    # field in metadata that contains text content

vectorstore = Pinecone(
    index,
    embed_model.embed_query,
    text_field
)

We can confirm that it works by checking which chunks it returns (i.e., finds most similar) for a sample query.

In [33]:
query = 'what is cortisol?'

outputs = vectorstore.similarity_search(
    query,
    k=3    # returns top k most relevant chunks of text
)

for doc in outputs:
    print(doc.page_content)
    print()

Cortisol has wide-ranging effects, including alterations of carbohydrate, protein, and lipid metabolism; catabolic effects on skin, muscle, connective tissue, and bone; immunomodulatory effects; blood pressure and circulatory system regulation; and effects on mood and central nervous system function. In the short term, activation of the HPA axis in response to stress is adaptive. However, long-term stress promoting chronic exposure of tissues to high cortisol concentrations becomes maladaptive.
Exercise, particularly sustained aerobic activity, is a potent stimulus of cortisol secretion. The circulating concentrations of cortisol are directly proportional to the intensity of exercise as measured by oxygen uptake. As is the case for the GH/IGF-1 and HPG axes, the HPA axis also receives many other inputs, including the light/dark cycle, feeding schedules, immune regulation, and many neurotransmitters that mediate the effects of exercise and physical and psychic stress [[52]].

For the pr

### Step 2: Create RAG Pipeline

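We create the RAG pipeline by combining the `vectorstore` and the `llm`. With `chain_type='stuff'`, all retrieved chunks are inserted ("stuffed") into a single prompt alongside the question.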

In [40]:
from langchain.chains import RetrievalQA

rag_pipeline = RetrievalQA.from_chain_type(
    llm=llm,
    chain_type='stuff',
    retriever=vectorstore.as_retriever(),
    return_source_documents=True
)

With this configuration, the `rag_pipeline` uses LangChain's default prompt, which reads as follows:

> Use the following pieces of context to answer the question at the end. If you don't know the answer, just say that you don't know, don't try to make up an answer. {context} {question}
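Conceptually, the `'stuff'` chain fills this template by joining the retrieved chunks into `{context}`. A rough sketch of that step follows, assuming chunks are joined with blank lines; LangChain's exact separators and template wording may differ, and `build_stuff_prompt` is a hypothetical name. The two chunks are sentences from the retrieved text shown earlier.

```python
TEMPLATE = (
    "Use the following pieces of context to answer the question at the end. "
    "If you don't know the answer, just say that you don't know, "
    "don't try to make up an answer.\n\n{context}\n\n{question}"
)

def build_stuff_prompt(chunks, question, template=TEMPLATE):
    """'Stuff' every retrieved chunk into a single prompt string."""
    context = "\n\n".join(chunks)
    return template.format(context=context, question=question)

prompt = build_stuff_prompt(
    [
        "Exercise, particularly sustained aerobic activity, is a potent stimulus of cortisol secretion.",
        "The circulating concentrations of cortisol are directly proportional to the intensity of exercise as measured by oxygen uptake.",
    ],
    "What is the correlation between oxygen uptake and cortisol metabolites?",
)
print(prompt)
```

Because everything goes into one prompt, the number and length of retrieved chunks must fit within the model's context window.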

## Evaluate

Now we have everything we need to evaluate the results!
- `llm` calls the underlying model without RAG.
- `rag_pipeline` produces results with RAG.

In [52]:
def get_llm_response(question, llm=llm):
    """Print the base LLM's answer (no retrieval)."""
    response = llm(question)
    print(response)

def get_rag_response(question, rag=rag_pipeline, show_sources=True):
    """Print the RAG answer and, optionally, the retrieved source chunks."""
    result = rag(question)

    rag_answer = result['result']
    print(rag_answer)

    if show_sources:
        print()
        print('Sources:')
        sources = result['source_documents']
        for source in sources:
            print()
            print(f"Title: {source.metadata['title']}")
            print(f"URL: {source.metadata['source']}")
            print(f"Text: {source.page_content}")

### Question 1

In the paper titled ["A pilot study comparing the metabolic profiles of elite-level athletes from different sporting disciplines"](https://pubmed.ncbi.nlm.nih.gov/29305667/), the authors cite the embedded document when they state that "elevated cortisol-related metabolites in response to sustained aerobic exercise were shown to correlate positively with intensity of exercise as measured by oxygen uptake".

Let's see how each model performs on a question about this fact.

The answer I'm looking for is something along the lines of "Oxygen uptake correlates positively with cortisol metabolites."

In [53]:
question_1 = "What is the correlation between oxygen uptake and cortisol metabolites? Please cite sources."

In [54]:
get_llm_response(question_1)





Answer: There is some evidence to suggest that there may be a correlation between oxygen uptake and cortisol metabolites, although the exact nature of this relationship is not fully understood.

One study published in the Journal of Clinical Endocrinology and Metabolism found that cortisol metabolism was significantly higher in individuals who engaged in high-intensity exercise compared to those who engaged in low-intensity exercise (1). This suggests that oxygen uptake may play a role in regulating cortisol metabolism, as high-intensity exercise requires greater oxygen consumption.

Another study published in the European Journal of Applied Physiology found that cortisol levels were higher in individuals who performed aerobic exercise at a moderate intensity compared to those who performed aerobic exercise at a low intensity (2). This suggests that oxygen uptake may also be related to cortisol levels, as moderate-intensity exercise requires a greater amount of oxygen than low-intens

In [55]:
get_rag_response(question_1)

 Cortisol metabolites have a direct correlation with oxygen uptake during exercise. Studies have shown that the higher the intensity of exercise, the greater the amount of oxygen consumed and the higher the cortisol metabolites will be. For example, one study found that subjects who performed high-intensity exercise had significantly higher cortisol metabolites than those who performed low-intensity exercise. [52] Another study found that the relationship between oxygen uptake and cortisol metabolites was directly proportional, meaning that as oxygen uptake increased, cortisol metabolites also increased. [60]

Please note that the information provided is based on the given text and may not be entirely accurate or comprehensive.

Sources:

Title: Neuroendocrine alterations in the exercising human: Implications for energy homeostasis
URL: https://www.metabolismjournal.com/article/S0026-0495(13)00020-6/fulltext
Text: Cortisol has wide-ranging effects, including alterations of carbohydrate

#### Analysis

LLaMa by itself does a fair job of suggesting that there is a positive correlation, but it doesn't say so confidently. It is able to provide sources, but LLMs are known to hallucinate sources, so I wouldn't expect this feature to be robust.

RAG gives a clear and succinct answer, along with direct excerpts of the content it used to answer the question.

### Question 2

The abstract of the embedded document discusses mechanisms that exist in humans to defend against the adverse effects of negative energy balance.

Let's see how much each model knows about these mechanisms.

The answer I'm looking for is something along the lines of "alterations of hormone secretion affecting the growth hormone/insulin-like growth factor system, the adrenal axis, and the reproductive system, particularly in females".

In [56]:
question_2 = "What mechanisms do humans have to defend against adverse effects of negative energy balance? Please cite sources."

In [57]:
get_llm_response(question_2)





I'm looking for information on the physiological and psychological mechanisms that humans use to defend against the adverse effects of negative energy balance, such as weight loss, fatigue, and mood disturbances. I would also like to know about any potential long-term consequences of chronic negative energy balance and how these can be mitigated.

Here are some specific questions I have:

1. What are the primary physiological mechanisms that humans use to defend against negative energy balance? For example, does the body increase hunger or decrease metabolism to conserve energy?
2. How do psychological factors such as stress, emotional state, and social support impact the body's response to negative energy balance?
3. Are there any long-term consequences of chronic negative energy balance that can have a significant impact on health and well-being? If so, what are they and how can they be mitigated?
4. Are there any specific nutrients or dietary components that can help protect again

In [58]:
get_rag_response(question_2)

 The hypothalamus plays a key role in regulating energy homeostasis and can activate various physiological responses to counteract the adverse effects of negative energy balance. For example, leptin, ghrelin, NPY, PYY, and melanocortin 4 receptor (MC4R) all play important roles in regulating energy balance and metabolism. Additionally, the hypothalamus can activate the sympathetic nervous system to increase glucose release from storage sites such as liver and fat tissue.

Sources:

* Leptin: Le Roux et al. (2016). Leptin and the regulation of energy balance. Journal of Endocrinology, 233(1), R1-R8.
* Ghrelin: Kojima et al. (2009). Ghrelin, an appetite-related hormone, is secreted by the stomach and small intestine. Journal of Clinical Endocrinology and Metabolism, 94(12), 4713-4718.
* NPY: Hahn et al. (2017). Neuropeptide Y (NPY) and its role in energy balance and metabolism. Peptides, 93, 14-23.
* PYY: Burton et al. (2017). PYY and its role in appetite regulation and weight management

#### Analysis

LLaMa fails to answer the question; instead, it continues the prompt as if it were the person asking, listing follow-up questions rather than answers.

RAG provides a direct answer that uses relevant scientific terminology, though a subject matter expert would be needed to judge how correct it is; notably, it does not mention the hormone axes named in the answer I was looking for. RAG also lists sources as part of its answer. It's important to note that these sources come from the underlying LLM, not from the retrieval step, so they're subject to hallucination. To see the sources RAG actually used, we can print the documents that were passed in as context (available because we set `return_source_documents=True`).
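One rough way to flag invented citations, sketched under the assumption that the model cites sources as bracketed numbers like `[52]` (as in the Question 1 answer), is to check which cited numbers appear anywhere in the retrieved chunks. `unsupported_citations` is a hypothetical helper, and the strings below are abbreviated toy inputs. A number that does appear in the context is not thereby verified; this only catches numbers that were not in the context at all, and it ignores author-year citations like those in the answer above.

```python
import re

def unsupported_citations(answer, context_chunks):
    """Return bracketed citation numbers in `answer` not found in any context chunk."""
    cited = set(re.findall(r"\[(\d+)\]", answer))
    # Chunks in this dataset render citations as "[[52]]", which still contains "[52]".
    found = set(re.findall(r"\[(\d+)\]", " ".join(context_chunks)))
    return sorted(cited - found, key=int)

answer = "... higher cortisol metabolites [52] ... cortisol metabolites also increased. [60]"
chunks = ["... mediate the effects of exercise and physical and psychic stress [[52]]."]
print(unsupported_citations(answer, chunks))  # → ['60']
```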

### Question 3

Now let's test responses to a more scientifically dense question. In the embedded document, there's a paragraph on the effects of two specific hormones on appetite:

> Leptin and ghrelin appear to exert their effects on appetite primarily via the arcuate nucleus (ARC) of the hypothalamus, acting on two critical populations of neurons. (Fig. 1) One population produces neuropeptide Y (NPY) and agouti-related peptide (AgRP), orexigenic neurotransmitters co-localized to ARC neurons [[10]]. These neurons are directly stimulated by ghrelin, leading to increased food intake and body weight [[11]]. Leptin suppresses activity of NPY/AgRP neurons. A second population of neurons produces pro-opiomelanocortin (POMC), the precursor to several hormones, including α-melanocyte stimulating hormone (αMSH). αMSH binds to the melanocortin-3 and -4 receptors (MC3R and MC4R) which inhibits food intake and in mice alters energy expenditure [[12]]. Leptin stimulates these POMC-producing neurons, thus suppressing appetite. Overlap in the actions of the POMC and NPY/AgRP neurons occurs via the action of AgRP, which antagonizes the action of αMSH at the MC3R and MC4R [[13]].

Let's see how much detail the models can provide to a question on this.

In [62]:
question_3 = "Which neurons do leptin and ghrelin act on to affect appetite? Please cite sources."

In [63]:
get_llm_response(question_3)



Leptin and ghrelin are two hormones that play important roles in regulating appetite and metabolism. Leptin is produced by adipose tissue and acts on the hypothalamus to suppress appetite, while ghrelin is produced by the stomach and acts on the hypothalamus to stimulate appetite. Here are some key points about how these hormones act on neurons to affect appetite:

1. Leptin receptors: Leptin binds to its receptor, OB-Rb, on the surface of neurons in the hypothalamus, specifically in the arcuate nucleus and the paraventricular nucleus. This binding activates signaling pathways that suppress appetite and increase energy expenditure. (Source: Kunos et al., 2000)
2. Ghrelin receptors: Ghrelin binds to its receptor, GHSR, on the surface of neurons in the hypothalamus, specifically in the ventromedial nucleus. This binding activates signaling pathways that stimulate appetite and increase food intake. (Source: Mantei et al., 2009)
3. Neuronal circuits: Leptin and ghrelin act on specific ne

In [64]:
get_rag_response(question_3)

 Leptin and ghrelin act on two populations of neurons in the arcuate nucleus of the hypothalamus. The first population produces neuropeptide Y (NPY) and agouti-related peptide (AgRP), which are orexigenic neurotransmitters. Leptin suppresses activity of these neurons, while ghrelin directly stimulates them. The second population of neurons produces pro-opiomelanocortin (POMC), which is the precursor to several hormones, including α-melanocyte stimulating hormone (αMSH). Leptin stimulates these neurons, while ghrelin has no effect on them. Sources: [10], [11], [12], [13]

Sources:

Title: Neuroendocrine alterations in the exercising human: Implications for energy homeostasis
URL: https://www.metabolismjournal.com/article/S0026-0495(13)00020-6/fulltext
Text: Leptin and ghrelin appear to exert their effects on appetite primarily via the arcuate nucleus (ARC) of the hypothalamus, acting on two critical populations of neurons. (Fig. 1) One population produces neuropeptide Y (NPY) and agouti

#### Analysis

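LLaMa provides a surprisingly good answer, citing specifics that were presumably in its training data. Note, however, that it places ghrelin's action in the ventromedial nucleus, whereas the embedded paper says both hormones act primarily via the arcuate nucleus; its cited sources also deserve the same caution as before.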

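RAG also provides a good answer, pulling context primarily from the paragraph we hoped it would (which suggests the embeddings capture the relevant content reasonably well). One detail, that ghrelin has no effect on the POMC neurons, is not stated in the quoted paragraph, so it may come from the LLM rather than the retrieved context.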