# __opensource-rag-example__
### _University of Oklahoma_
### _October 2023_

_NOTE: This notebook works ONLY on a Colab runtime that has a GPU.
Make sure you are connected to a GPU instance: Runtime -> Change Runtime Type -> GPU -> T4_

This notebook demonstrates how to set up a Retrieval Augmented Generation (RAG) engine.  The engine uses open source resources, so the administrator does not have to pay for OpenAI API access.  It uses the following resources:
* __ChromaDB__:  vector storage database
* __HuggingFace__:  transformer library and API
* __LangChain__:  wrapper library for building prompts and chaining calls to transformer models


In [1]:
# Clone the Git repository into the local runtime
!git clone https://github.com/mjbeattie/OU-RAG-seminar/

Cloning into 'LLMCourseware'...
remote: Enumerating objects: 55, done.
remote: Counting objects: 100% (55/55), done.
remote: Compressing objects: 100% (44/44), done.
remote: Total 55 (delta 25), reused 18 (delta 8), pack-reused 0
Receiving objects: 100% (55/55), 2.83 MiB | 17.05 MiB/s, done.
Resolving deltas: 100% (25/25), done.


In [2]:
# Display the status of CUDA.  You must connect to a GPU runtime for this to work.
!nvidia-smi

Mon Oct  9 20:27:53 2023       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 525.105.17   Driver Version: 525.105.17   CUDA Version: 12.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|   0  Tesla T4            Off  | 00000000:00:04.0 Off |                    0 |
| N/A   65C    P8    11W /  70W |      0MiB / 15360MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Proces



Install prerequisites:
* datasets, transformers - to use the Hugging Face transformers library
* langchain - LangChain Python library for chaining, RAG, and agent examples
* bitsandbytes - to enable loading models in 8-bit
* accelerate - runtime optimization of inference
* ChromaDB - vector database for indexing and RAG examples
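
To see why 8-bit loading matters on a small GPU, here is a rough back-of-the-envelope estimate of the memory needed for model weights at different precisions. This is only a sketch: the parameter count is an assumed, approximate figure (roughly the size of the `google/flan-t5-base` model used later in this notebook), the helper name `weight_memory_mib` is made up for illustration, and the estimate ignores activations and any overhead added by the quantization library.

```python
# Rough weight-memory estimate for a model at different precisions.
# Only the weights are counted; activations and library overhead are ignored.

BYTES_PER_PARAM = {"float32": 4, "float16": 2, "int8": 1}


def weight_memory_mib(num_params: int, dtype: str) -> float:
    """Return the approximate size of the weights alone, in MiB."""
    return num_params * BYTES_PER_PARAM[dtype] / (1024 ** 2)


# Assumption: roughly 250 million parameters (an approximate, not measured, figure).
approx_params = 250_000_000

for dtype in BYTES_PER_PARAM:
    print(f"{dtype:>8}: {weight_memory_mib(approx_params, dtype):8.1f} MiB")
```

A model this small fits on a T4 at any of these precisions, so 8-bit loading becomes important mainly when we swap in a much larger model.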

In [3]:
!pip install datasets transformers==4.28.0 numpy langchain bitsandbytes accelerate chromadb pdfplumber pypdf sentence-transformers

Collecting datasets
  Downloading datasets-2.14.5-py3-none-any.whl (519 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 519.6/519.6 kB 8.1 MB/s eta 0:00:00
Collecting transformers==4.28.0
  Downloading transformers-4.28.0-py3-none-any.whl (7.0 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 7.0/7.0 MB 27.2 MB/s eta 0:00:00
Collecting langchain
  Downloading langchain-0.0.311-py3-none-any.whl (1.8 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 1.8/1.8 MB 39.3 MB/s eta 0:00:00
Collecting bitsandbytes
  Downloading bitsandbytes-0.41.1-py3-none-any.whl (92.6 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 92.6/92.6 MB 10.3 MB/s eta 0:00:00
Collecting accelerate
  Downloading accelerate-0.23.0-py3-none-any.whl (258 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 258.1/258.1 kB 27.2

## __Create a vector store database__
Take a set of information, in this case a set of PDFs, and load them into a vector database.

We use ChromaDB, an open source vector database, to store the information.

In [4]:
from langchain.document_loaders import PyPDFDirectoryLoader
from langchain.text_splitter import CharacterTextSplitter
from langchain.vectorstores import Chroma
from langchain.embeddings import HuggingFaceEmbeddings, SentenceTransformerEmbeddings
import chromadb

# Define directory and system variables
config = {"pdfpath" : "./OU-RAG-seminar/sample_pdfs/",
          "vdbmspath" : "./vdbms",
          "load_in_8bit" : False,
          "embedder" : "all-MiniLM-L6-v2",
          "llm": "google/flan-t5-base"
          }

# Load the sentence-transformer embedder.  This is a BERT-type model
embeddings = HuggingFaceEmbeddings(model_name=config["embedder"])

# Read the PDFs using LangChain
pdf_loader = PyPDFDirectoryLoader(config["pdfpath"])
documents = pdf_loader.load()

# Split text into chunks
text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=100)
chunks = text_splitter.split_documents(documents)

# Create vector store and store chunks in it
vector_store = Chroma.from_documents(documents=chunks, embedding=embeddings, persist_directory=config["vdbmspath"])
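
The `chunk_size` and `chunk_overlap` settings above control how much text goes into each chunk and how much neighboring chunks share. The sketch below illustrates the idea with a plain sliding window over characters. It is a simplified stand-in, not LangChain's implementation: `CharacterTextSplitter` first splits on a separator (by default a blank line) and then merges the pieces up to the size limit. The function name `chunk_text` is hypothetical.

```python
def chunk_text(text: str, chunk_size: int, chunk_overlap: int) -> list:
    """Split text into fixed-size character windows that overlap their neighbors."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be non-negative and smaller than chunk_size")

    step = chunk_size - chunk_overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the window already reached the end of the text
    return chunks


# Each chunk repeats the last character of the previous one.
print(chunk_text("abcdefghij", chunk_size=4, chunk_overlap=1))
# ['abcd', 'defg', 'ghij']
```

The overlap means that a sentence cut at a chunk boundary still appears whole in at least one chunk, provided it is shorter than the overlap.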



## __Interact with the vector store__

LangChain creates a rudimentary vector store: a single default collection with little metadata beyond what the document loader supplies.  It would be better to create the vector store directly in ChromaDB, but we won't do that here.

We also run a test query against the vector store.
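
Conceptually, a similarity search embeds the query, compares that vector with every stored chunk vector, and returns the closest matches. The sketch below shows this with cosine similarity on toy 2-dimensional vectors. The vectors, the document ids, and the function names are all made up for illustration, and the metric a real vector store uses depends on how its collection is configured.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat a zero vector as similar to nothing
    return dot / (norm_a * norm_b)


def top_k(query_vec, doc_vecs, k):
    """Return the ids of the k stored vectors most similar to the query."""
    scored = [(cosine_similarity(query_vec, vec), doc_id)
              for doc_id, vec in doc_vecs.items()]
    scored.sort(reverse=True)  # highest similarity first
    return [doc_id for _, doc_id in scored[:k]]


# Toy "embeddings"; real ones from all-MiniLM-L6-v2 have many more dimensions.
toy_store = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [1.0, 1.0]}
print(top_k([1.0, 0.1], toy_store, k=2))
# ['a', 'c']
```

A real vector database avoids comparing against every vector by using an approximate nearest-neighbor index, but the ranking idea is the same.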

In [5]:
import chromadb

# Look at the collection.  Notice it's called langchain
client = chromadb.PersistentClient(path=config["vdbmspath"])
collections = client.list_collections()
print("List of collections in the vector store:")
for collection in collections:
    print(collection.name)

# Define a test query
query = "what is the gateway hypothesis"
print("\nThe query we will ask is:  ", query, "\n\n")

# Load the vector store from persistent storage and retrieve the 3 most similar chunks
vdbms = Chroma(persist_directory=config["vdbmspath"], embedding_function=embeddings)
docs = vdbms.similarity_search(query, k=3)
for doc in docs:
    print(doc.page_content)

List of collections in the vector store:
langchain

The query we will ask is:   what is the gateway hypothesis 


4.3 Identiﬁcation of the Stable Clusters . . . . . . . . . . . . . . . . . 68
4.4 Characteristics of the Stable Clusters . . . . . . . . . . . . . . . . 71
5 Discussion 76
5.1 Review . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 76
5.2 Findings . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 78
5.3 Limitations of this study . . . . . . . . . . . . . . . . . . . . . . . 81
5.4 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 82
6 Investigating the Gateway Hypothesis 84
6.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 84
6.2 Gateway Hypothesis literature review . . . . . . . . . . . . . . . . 87
6.3 Gaps in the literature . . . . . . . . . . . . . . . . . . . . . . . . . 90
6.4 Methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 91
6.4.1 Evaluating drug initiation sequen

## __Define an LLM__

HuggingFace has thousands of models that can be accessed either via API or by downloading them to a local resource.  For this example, we will download the model and load it into the Colab runtime.  This method is generally more difficult than passing queries to an API.  However, if we were to retrain a general model, we might want to load it to a local resource.

We select a model from HuggingFace, set up its parameters, and load it into an LLM class instance from the LangChain library.

In [31]:
from langchain.llms import HuggingFacePipeline

llm = HuggingFacePipeline.from_model_id(
    model_id=config["llm"],
    task="text2text-generation",
    device=0,
    model_kwargs={"load_in_8bit": False, "max_length": 512, "temperature": 0.}
)

In [32]:
# Query vector store using RetrievalQA
from langchain.chains import RetrievalQA
qa = RetrievalQA.from_chain_type(llm=llm, chain_type='stuff', retriever=vector_store.as_retriever())
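
The `chain_type='stuff'` setting means the retrieved chunks are concatenated ("stuffed") into a single prompt together with the question, and the LLM is called once. The sketch below shows the idea in plain Python. The function name `build_stuffed_prompt`, the sample passages, and the template wording are illustrative assumptions and are not LangChain's actual prompt.

```python
def build_stuffed_prompt(question: str, passages: list) -> str:
    """Concatenate all retrieved passages and the question into one prompt."""
    context = "\n\n".join(passages)
    return (
        "Use the following pieces of context to answer the question at the end.\n\n"
        f"{context}\n\n"
        f"Question: {question}\n"
        "Answer:"
    )


# Hypothetical passages standing in for chunks returned by the retriever.
sample_passages = ["First retrieved chunk.", "Second retrieved chunk."]
print(build_stuffed_prompt("What do the chunks say?", sample_passages))
```

Because everything goes into one prompt, its length grows with the number and size of the retrieved chunks, so it can exceed the input limit of a small model.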

In [41]:
query = "Who was the author of this information?"

In [42]:
qa.run(query)

'MATTHEW J. BEATTIE'