## RAG with LangChain on a local Ollama LLM deployment
Author: **Peeyush Sharma**; Feedback: **PSharma3@gmail.com**

This notebook demonstrates a sample Retrieval-Augmented Generation (RAG) workflow. Components covered in this scaled-down implementation include:
- Meta's Llama 3.2 as the LLM.
- Ollama as the local LLM deployment framework.
- The sentence-transformers model 'all-MiniLM-L6-v2' for generating embeddings.
- Publicly available Loan IQ sample documents as the ingested corpus. Loan IQ is used in the commercial syndicated lending business.
- Chroma DB as the vector database. A production implementation could use Zilliz, pgvector, MongoDB, Pinecone, etc., depending on latency and cost requirements.




In [14]:
import os
from uuid import uuid4

import chromadb
import torch
from langchain_chroma import Chroma
from langchain_community.document_loaders import PyPDFLoader
from langchain_community.llms import Ollama
from langchain_core.prompts import ChatPromptTemplate
from langchain_huggingface import HuggingFaceEmbeddings
from langchain_text_splitters import CharacterTextSplitter

In [15]:
DOC_DIR = "../documents"
DOC_TYPE = "LoanIQ_Docs"
EMBEDDING_MODEL_NAME = "all-MiniLM-L6-v2"
LLM_MODEL_NAME = "llama3.2"
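# Chunk sizes are in characters (CharacterTextSplitter uses len() by default), not tokens.
# all-MiniLM-L6-v2 truncates its input at 256 word pieces, so only the opening part
# of a 4096-character chunk contributes to that chunk's embedding.
CHUNK_SIZE = 4096
CHUNK_OVERLAP = 256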
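
`CharacterTextSplitter` first splits each page on its `separator` and then merges the pieces back into chunks of at most `CHUNK_SIZE` characters, with consecutive chunks sharing up to `CHUNK_OVERLAP` characters so that text near a boundary appears in both neighbouring chunks. The function below is a simplified, dependency-free sketch of that idea for illustration only: `split_text` is a hypothetical name, and LangChain's actual implementation differs in details such as whitespace stripping and pluggable length functions.

```python
def split_text(text: str, chunk_size: int, chunk_overlap: int, separator: str = "\n") -> list[str]:
    """Simplified separator-based splitter (illustrative sketch, not LangChain's code).

    Greedily packs pieces into chunks of at most chunk_size characters and carries up to
    chunk_overlap characters of trailing pieces into the next chunk. A single piece longer
    than chunk_size becomes its own oversized chunk.
    """
    pieces = [p for p in text.split(separator) if p]
    chunks: list[str] = []
    current: list[str] = []  # pieces in the chunk being built
    length = 0  # number of characters in separator.join(current)
    sep_len = len(separator)

    for piece in pieces:
        if current and length + sep_len + len(piece) > chunk_size:
            chunks.append(separator.join(current))
            # Drop leading pieces until the remainder fits the overlap budget
            # and leaves room for the new piece.
            while current and (length > chunk_overlap or length + sep_len + len(piece) > chunk_size):
                removed = current.pop(0)
                length -= len(removed) + (sep_len if current else 0)
        length += len(piece) + (sep_len if current else 0)
        current.append(piece)

    if current:
        chunks.append(separator.join(current))
    return chunks


print(split_text("aaaa\nbbbb\ncccc\ndddd", chunk_size=9, chunk_overlap=4))
# ['aaaa\nbbbb', 'bbbb\ncccc', 'cccc\ndddd']
```

With `chunk_overlap=0` the same input yields two disjoint chunks instead; the overlap trades some storage and duplicated context for fewer facts being cut in half at a chunk boundary.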

In [18]:
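# Pick the fastest available device for the embedding model: CUDA GPU, Apple-silicon MPS, else CPU
device = "cuda" if torch.cuda.is_available() else "mps" if torch.backends.mps.is_available() else "cpu"
embeddings = HuggingFaceEmbeddings(model_name=EMBEDDING_MODEL_NAME, model_kwargs={"device": device})
collection_name = "rag_collection"
chroma_client = chromadb.PersistentClient()  # persists to disk (default path: ./chroma)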

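try:
    # Reuse the collection if it already exists. To rebuild the index from scratch,
    # uncomment the next line (otherwise re-running ingestion adds duplicate chunks).
    # chroma_client.delete_collection(name=collection_name)
    chroma_client.get_or_create_collection(collection_name)
except ValueError:
    # If opening the collection fails with a ValueError, drop it; it is recreated below.
    chroma_client.delete_collection(collection_name)

collection = chroma_client.get_or_create_collection(collection_name)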

vector_store = Chroma(
    client=chroma_client,
    collection_name=collection_name,
    embedding_function=embeddings
)
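
Conceptually, `vector_store` keeps one embedding per chunk and, at query time, returns the chunks whose embeddings lie closest to the query's embedding. The dependency-free sketch below illustrates that ranking with hypothetical 3-dimensional toy vectors (real all-MiniLM-L6-v2 embeddings have 384 dimensions) and an exact brute-force search. It is not how Chroma works internally: Chroma uses an approximate HNSW index and, unless configured otherwise, L2 distance. The sketch also checks that for unit-length vectors L2 and cosine give the same ranking, because ‖u − v‖² = 2 − 2·cos(u, v).

```python
import math

Vector = list[float]


def norm(v: Vector) -> float:
    """Euclidean length of a vector."""
    return math.sqrt(sum(x * x for x in v))


def normalize(v: Vector) -> Vector:
    """Scale a vector to unit length (returned unchanged if it is all zeros)."""
    n = norm(v)
    return [x / n for x in v] if n else list(v)


def cosine_similarity(u: Vector, v: Vector) -> float:
    """Cosine of the angle between u and v (higher means more similar)."""
    denom = norm(u) * norm(v)
    return sum(a * b for a, b in zip(u, v)) / denom if denom else 0.0


def squared_l2(u: Vector, v: Vector) -> float:
    """Squared Euclidean distance (lower means more similar)."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def top_k(query: Vector, store: dict[str, Vector], k: int, metric: str = "cosine") -> list[str]:
    """Return the IDs of the k stored vectors most similar to the query (exact search)."""
    if metric == "cosine":
        ranked = sorted(store, key=lambda doc_id: cosine_similarity(query, store[doc_id]), reverse=True)
    elif metric == "l2":
        ranked = sorted(store, key=lambda doc_id: squared_l2(query, store[doc_id]))
    else:
        raise ValueError(f"unknown metric: {metric}")
    return ranked[:k]


# Hypothetical toy "embeddings" for three chunks, normalized to unit length.
store = {
    "chunk-interest": normalize([0.9, 0.1, 0.0]),
    "chunk-fees": normalize([0.2, 0.9, 0.1]),
    "chunk-agent": normalize([0.1, 0.2, 0.9]),
}
query = normalize([0.8, 0.3, 0.1])

print(top_k(query, store, k=2))  # ['chunk-interest', 'chunk-fees']
print(top_k(query, store, k=3, metric="l2") == top_k(query, store, k=3))  # True
```

The retriever created later with `search_kwargs={'k': 10}` plays the role of `k` here.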

In [20]:
def insert_document_to_vector_db(doc_path):
    """Load a PDF, split it into overlapping chunks, embed them, and add them to the vector store.

    Returns the list of IDs assigned to the inserted chunks.
    """
    loader = PyPDFLoader(doc_path)
    documents = loader.load()  # one Document per PDF page
    text_splitter = CharacterTextSplitter(chunk_size=CHUNK_SIZE, chunk_overlap=CHUNK_OVERLAP, separator="\n")
    docs = text_splitter.split_documents(documents=documents)
    uuids = [str(uuid4()) for _ in docs]  # one random ID per chunk
    return vector_store.add_documents(documents=docs, ids=uuids)
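
Because `chroma_client` is a `PersistentClient` and `uuid4()` produces a fresh random ID on every call, running the ingestion cell twice stores every chunk twice, and the duplicates then crowd each other out of the top-k results. One option, sketched below as an assumption rather than this notebook's method, is to derive a deterministic ID from each chunk's file, position, and text with `uuid.uuid5`. As far as I know, `langchain_chroma` writes `add_documents` with an upsert, so re-ingesting an unchanged file would then overwrite rather than duplicate; verify this against your installed version. `chunk_id` and the file name are hypothetical.

```python
import uuid

# Arbitrary fixed namespace for uuid5; any constant UUID works as long as it never changes.
CHUNK_NAMESPACE = uuid.NAMESPACE_URL


def chunk_id(source: str, index: int, text: str) -> str:
    """Hypothetical helper: a stable ID from the chunk's file, position, and text.

    uuid5 hashes its name (SHA-1) into a UUID, so identical inputs always give the same ID.
    """
    return str(uuid.uuid5(CHUNK_NAMESPACE, f"{source}:{index}:{text}"))


texts = ["chunk A", "chunk B"]
first_run = [chunk_id("LoanIQ_Docs/guide.pdf", i, t) for i, t in enumerate(texts)]
second_run = [chunk_id("LoanIQ_Docs/guide.pdf", i, t) for i, t in enumerate(texts)]
print(first_run == second_run)  # True: re-ingesting produces the same IDs
```

Inside `insert_document_to_vector_db`, this would replace the random IDs with `[chunk_id(doc_path, i, d.page_content) for i, d in enumerate(docs)]`. One trade-off: if a file is edited, chunks whose text changed get new IDs and their old versions stay in the collection until removed.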

In [21]:
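dir_path = os.path.join(DOC_DIR, DOC_TYPE)
vector_indices = []  # flat list of the chunk IDs inserted into the vector store
if os.path.isdir(dir_path):  # isdir() also returns False for missing paths
    for doc in sorted(os.listdir(dir_path)):
        # PyPDFLoader only reads PDFs, so skip any other files (e.g. .DS_Store)
        if doc.lower().endswith(".pdf"):
            vector_indices.extend(insert_document_to_vector_db(os.path.join(dir_path, doc)))
# vector_indices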


In [25]:
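# Initialize the local model (needs a running Ollama server with the model pulled, e.g. `ollama pull llama3.2`)
llm = Ollama(model=LLM_MODEL_NAME)

# Create a retriever that returns the 10 chunks most similar to each question
retriever = vector_store.as_retriever(search_type="similarity",
                                      search_kwargs={'k': 10})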

template = """
You are a Loan IQ consultant answering questions about Loan IQ.
Loan IQ is a commercial lending software managing the origination
and cash flows of syndicated loans.

You are given Loan IQ documentation: {documentation}

Answer the user's question: {question}
"""

prompt = ChatPromptTemplate.from_template(template=template)
chain = prompt | llm

questions = [
    "What is syndicate lending?",
    "Who is the vendor behind Loan IQ? What business purposes does Loan IQ serve?",
    "What are the different interest rate calculation methods that Loan IQ supports?",
]

for question in questions:
    print("\n\n----------------------")
    print("Question: " + question)
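    documents = retriever.invoke(question)
    # Send the chunks' text to the prompt rather than a list of Document objects,
    # whose repr would also include metadata and Python syntax
    documentation = "\n\n".join(doc.page_content for doc in documents)
    result = chain.invoke({"documentation": documentation, "question": question})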
    print(f"\nLLM Response: {result}")
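
`chain.invoke` substitutes whatever object it receives into `{documentation}`, so the text the model actually sees depends on how the retrieved chunks are rendered. A common pattern, sketched below with a hypothetical `Doc` stand-in for LangChain's `Document` and a hypothetical `format_docs` helper, is to join the chunks' `page_content` and tag each with its source and page so the model (and a human reviewer) can tell chunks apart. PyPDFLoader records the file path under `source` and a 0-based page number under `page` in each document's metadata; the sample documents below are made up.

```python
from dataclasses import dataclass, field


@dataclass
class Doc:
    """Hypothetical stand-in for langchain_core.documents.Document (only the fields used here)."""
    page_content: str
    metadata: dict = field(default_factory=dict)


def format_docs(docs: list[Doc]) -> str:
    """Render retrieved chunks as numbered blocks tagged with source file and page."""
    blocks = []
    for i, doc in enumerate(docs, start=1):
        source = doc.metadata.get("source", "unknown")
        page = doc.metadata.get("page", "?")
        blocks.append(f"[{i}] {source}, page {page}\n{doc.page_content.strip()}")
    return "\n\n".join(blocks)


# Made-up retrieval results, for illustration only.
retrieved = [
    Doc("Text of the first retrieved chunk.", {"source": "guide.pdf", "page": 3}),
    Doc("Text of the second retrieved chunk.", {"source": "guide.pdf", "page": 7}),
]
context = format_docs(retrieved)
print(context)
```

Real LangChain `Document` objects expose the same `page_content` and `metadata` attributes, so `format_docs(documents)` could be passed as the `documentation` value in the loop above.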




----------------------
Question: What is syndicate lending?

LLM Response: Based on the provided documents, I can answer that:

Syndicated lending refers to a type of loan structure where multiple lenders participate in a single loan transaction. In this structure, one lender acts as the Administrative Agent (also known as the Lead Arranger or Coordinating Bank) and coordinates the participation of other lenders, who are typically referred to as Syndicators.

The Administrative Agent is responsible for managing the loan, negotiating with borrowers, and ensuring that all participating lenders receive their share of the loan proceeds. The syndicate structure allows multiple lenders to participate in a single transaction, which can be beneficial for both lenders and borrowers.

In the context of the provided documents, the role of Syndicated Lending is mentioned in the job description for a Syndicated Loan Operations role at Truist Bank, where the candidate will be responsible for suppo