<a href="https://colab.research.google.com/github/popolome/SmartDoc-A-Private-AI-Assistant-for-Your-Company-Files/blob/main/Enterprise_Ready_RAG_Assistant_with_Intent_Guardrails.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Introduction
* I use a **hybrid-cloud architecture** here: **Colab's L4 GPU** handles vectorization, and **Groq's API** provides ultra-low latency inference.

# Install dependencies

In [23]:
%%writefile requirements.txt
langchain
langchain-classic
langchain-groq
langchain-community
langchain_huggingface
langchain_text_splitters
langchain_chroma
langchain-core
chromadb
pypdf
sentence-transformers
streamlit
python-dotenv

Overwriting requirements.txt


In [24]:
!pip install -r requirements.txt

Collecting streamlit (from -r requirements.txt (line 12))
  Downloading streamlit-1.53.1-py3-none-any.whl.metadata (10 kB)
Collecting pydeck<1,>=0.8.0b4 (from streamlit->-r requirements.txt (line 12))
  Downloading pydeck-0.9.1-py2.py3-none-any.whl.metadata (4.1 kB)
Downloading streamlit-1.53.1-py3-none-any.whl (9.1 MB)
Downloading pydeck-0.9.1-py2.py3-none-any.whl (6.9 MB)
Installing collected packages: pydeck, streamlit
Successfully installed pydeck-0.9.1 streamlit-1.53.1
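
Because `requirements.txt` does not pin versions, it can help to record which versions were actually installed. The sketch below uses only the standard library; the helper name `installed_versions` is hypothetical and not part of the original notebook.

```python
from importlib import metadata


def installed_versions(packages):
    """Map each package name to its installed version, or None if it is missing."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # The package is not installed in this environment
            versions[name] = None
    return versions


# Example: check two of the packages listed in requirements.txt
print(installed_versions(["streamlit", "chromadb"]))
```

The reported versions could then be copied into `requirements.txt` as pins if reproducible installs are needed.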


# Set up Groq API Key

In [4]:
import os
from getpass import getpass

# Use getpass to hide the Groq API key as you type it
os.environ["GROQ_API_KEY"] = getpass("Enter your Groq API Key here: ")

Enter your Groq API Key here: ··········
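
When the notebook is re-run, or when the app later runs outside Colab, the key may already be set in the environment. A minimal sketch, assuming that case should skip the prompt; `resolve_api_key` and its `prompt` parameter are hypothetical names introduced for illustration.

```python
import os
from getpass import getpass


def resolve_api_key(env_var="GROQ_API_KEY", prompt=getpass):
    """Return the key from the environment, prompting only when it is absent."""
    key = os.environ.get(env_var, "").strip()
    if not key:
        # Fall back to a hidden interactive prompt
        key = prompt(f"Enter your {env_var}: ").strip()
    if not key:
        raise ValueError(f"{env_var} is empty")
    # Store the key so libraries that read the environment can find it
    os.environ[env_var] = key
    return key
```

Passing a different `prompt` callable makes the helper easy to exercise without typing a key by hand.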


# Initialize the Embedding Model (Hugging Face)

In [5]:
from langchain_huggingface import HuggingFaceEmbeddings

# Convert text into dense vectors so that chunks can be compared by
# semantic similarity during retrieval
embeddings = HuggingFaceEmbeddings(model="all-MiniLM-L6-v2")

print("Embedding model loaded successfully!")

The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.


Embedding model loaded successfully!
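
The embedding model maps each text to a vector (384 dimensions for `all-MiniLM-L6-v2`), and texts with similar meaning end up with vectors that point in similar directions. The sketch below shows the cosine similarity commonly used to compare such vectors. It uses toy 2-dimensional vectors and plain Python, and is an illustration rather than the code the embedding library runs internally.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        # A zero vector has no direction; treat it as unrelated
        return 0.0
    return dot / (norm_a * norm_b)


# Toy vectors standing in for real embeddings
print(cosine_similarity([1.0, 0.0], [1.0, 0.0]))  # same direction
print(cosine_similarity([1.0, 0.0], [0.0, 1.0]))  # orthogonal
```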


# Initialize the Language Model (Processor)

In [7]:
# Groq serves the model on its LPU hardware for fast inference
from langchain_groq import ChatGroq

# temperature=0 keeps answers as deterministic as possible; higher values add randomness
llm = ChatGroq(
    model_name="llama-3.1-8b-instant",
    temperature=0,
    max_tokens=None,
    timeout=30,
    max_retries=2
)

print("LLM initialized with ChatGroq!")

LLM initialized with ChatGroq!
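
To see why a low temperature makes output less random, here is a minimal sketch of temperature-scaled softmax over toy logits. It only illustrates the general idea; it is not Groq's implementation. A temperature of exactly 0 would divide by zero, and providers typically handle that value as near-greedy decoding, so the sketch requires a positive value.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Turn raw scores into probabilities; lower temperature sharpens the distribution."""
    if temperature <= 0:
        raise ValueError("temperature must be positive in this sketch")
    scaled = [value / temperature for value in logits]
    # Subtract the maximum for numerical stability before exponentiating
    peak = max(scaled)
    exps = [math.exp(value - peak) for value in scaled]
    total = sum(exps)
    return [value / total for value in exps]


toy_logits = [2.0, 1.0, 0.0]
for t in (0.1, 1.0, 10.0):
    print(t, [round(p, 3) for p in softmax_with_temperature(toy_logits, t)])
```

As the temperature falls, almost all probability mass moves to the highest-scoring token, which is the behaviour wanted for factual question answering.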


# Upload Apple_10K_Report.pdf (or any other PDF) to the Colab Files panel
* The Files panel is the folder icon in the left sidebar.

# Load and chunk the document

In [8]:
from langchain_community.document_loaders import PyPDFLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter

# Use PyPDFLoader to load the PDF report
loader = PyPDFLoader("Apple_10K_Report.pdf")
docs = loader.load()

# Split the text into chunks of at most 1000 characters; consecutive chunks
# share up to 100 characters so context is not lost at chunk boundaries
text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=100)
chunks = text_splitter.split_documents(docs)

print(f"Document loaded. Total chunks created: {len(chunks)}")

Document loaded. Total chunks created: 330
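
The roles of `chunk_size` and `chunk_overlap` can be illustrated with a deliberately simplified sliding-window splitter. This is not how `RecursiveCharacterTextSplitter` works internally (it prefers to break on separators such as paragraphs and sentences); the hypothetical `chunk_text` helper below only shows how the overlap carries the tail of one chunk into the next.

```python
def chunk_text(text, chunk_size=1000, chunk_overlap=100):
    """Split text into fixed-size windows where neighbours share chunk_overlap characters."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once the current window reaches the end of the text
        if start + chunk_size >= len(text):
            break
    return chunks


# Small numbers make the overlap easy to see
print(chunk_text("abcdefghij", chunk_size=4, chunk_overlap=1))
```

In the toy call, each chunk starts with the last character of the previous one, which is the same idea as the 100-character overlap used in the cell above.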


# Create Vector store

In [9]:
from langchain_chroma import Chroma

# Embed each chunk and store the resulting vectors in Chroma
# persist_directory saves the database to disk so it can be reused later
vector_db = Chroma.from_documents(
    documents=chunks,
    embedding=embeddings,
    persist_directory="./apple_chroma_db"
)

print("Vector database has been created and saved!")

Vector database has been created and saved!
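
Conceptually, a vector store keeps one vector per chunk and returns the chunks whose vectors are closest to the query vector. The sketch below is a brute-force, in-memory stand-in for that idea; `TinyVectorStore` is a hypothetical class written for illustration, and Chroma itself uses an approximate index rather than this exhaustive scan.

```python
import math


class TinyVectorStore:
    """Brute-force nearest-neighbour search over unit-length vectors."""

    def __init__(self):
        self._items = []  # list of (unit_vector, text) pairs

    @staticmethod
    def _normalize(vector):
        norm = math.sqrt(sum(x * x for x in vector))
        if norm == 0:
            raise ValueError("cannot normalize a zero vector")
        return [x / norm for x in vector]

    def add(self, vector, text):
        self._items.append((self._normalize(vector), text))

    def similarity_search(self, query_vector, k=10):
        """Return the k texts whose vectors have the highest cosine similarity to the query."""
        query = self._normalize(query_vector)
        scored = [
            (sum(q * v for q, v in zip(query, vector)), text)
            for vector, text in self._items
        ]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return [text for _, text in scored[:k]]


store = TinyVectorStore()
store.add([1.0, 0.0], "revenue")
store.add([0.0, 1.0], "risk")
store.add([1.0, 1.0], "mixed")
print(store.similarity_search([1.0, 0.1], k=2))
```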


# Create the Guardrail Chatbot

In [25]:
from langchain_classic.memory import ConversationBufferWindowMemory
from langchain_classic.chains import ConversationalRetrievalChain
from langchain_core.prompts import PromptTemplate

# Guardrail prompt: tells the model to fall back to a fixed refusal when the
# context cannot answer the question, which reduces hallucinated answers
template = """
### ROLE
You are Popo, a Senior Financial Analyst specializing in Apple Inc. Your tone is professional, objective, and precise.

### INSTRUCTIONS
1. **Scope Control**: Use ONLY the provided context and chat history. If the information isn't there, say: "I'm sorry, I only have the ability to answer questions about the provided Apple 10-K report."
2. **Precision**: Always specify the exact fiscal year (e.g., 'In fiscal year 2025...').
3. **Social Guardrail**: If the user greets you or says 'thank you', respond warmly as Popo and offer to assist with further analysis of the 10-K.
4. **Context Awareness**: Use the history to handle follow-up questions accurately.
5. **Ethical Boundary**: Strictly refuse to give personal investment advice. If asked, politely explain that your expertise is limited to analyzing the facts within the Apple 10-K report and suggest the user consult a certified financial advisor.
6. **Formatting**: Use bullet points for lists of risks or financial metrics to improve readability.
7. **Identity**: Do not assume the user's name or identity. Address the user respectfully as "User" or simply dive into the analysis unless they explicitly introduce themselves.

### CONTEXT
{context}

### CHAT HISTORY
{chat_history}

### USER QUERY
{question}

### POPO's ANALYSIS:
"""

qa_prompt = PromptTemplate(
    template=template,
    input_variables=['context', 'chat_history', 'question']
)

# Conversation memory: keeps only the last 3 question/answer exchanges
memory = ConversationBufferWindowMemory(
    memory_key='chat_history',
    k=3,
    return_messages=True,
    output_key='answer'
)

# Build the ConversationalRetrievalChain (the Apple bot)
# It uses the Groq-hosted Llama 3.1 model as the LLM, retrieves the top 10
# chunks from the vector database, and applies the custom prompt instead of the default
apple_bot = ConversationalRetrievalChain.from_llm(
    llm=llm,
    chain_type='stuff',
    retriever=vector_db.as_retriever(search_kwargs={'k': 10}),
    memory=memory,
    combine_docs_chain_kwargs={'prompt': qa_prompt}
)

print("Popo is online and ready!")

Popo is online and ready!
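
A missing comma between two adjacent string literals in `input_variables` does not raise a syntax error, because Python silently joins them into one string. A small standard-library check can catch that kind of mismatch before the chain is built. The helper names below are hypothetical, and the check assumes the template uses plain `{name}` placeholders as in the cell above.

```python
from string import Formatter


def template_fields(template):
    """Collect the placeholder names that appear as {name} in a format string."""
    return {name for _, name, _, _ in Formatter().parse(template) if name}


def check_input_variables(template, input_variables):
    """Raise ValueError when declared variables and template placeholders differ."""
    declared = set(input_variables)
    found = template_fields(template)
    if declared != found:
        raise ValueError(
            f"missing={sorted(found - declared)}, unexpected={sorted(declared - found)}"
        )


demo_template = "{context}\n{chat_history}\n{question}"
check_input_variables(demo_template, ["context", "chat_history", "question"])  # passes

try:
    # Adjacent literals are concatenated into 'chat_historyquestion'
    check_input_variables(demo_template, ["context", "chat_history" "question"])
except ValueError as error:
    print(error)
```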

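The windowed memory can be pictured as a fixed-length queue of question/answer pairs: once it is full, adding a new exchange drops the oldest one. The sketch below illustrates that behaviour with `collections.deque`; `WindowMemory` is a hypothetical class, not LangChain's implementation.

```python
from collections import deque


class WindowMemory:
    """Keep only the most recent k question/answer exchanges."""

    def __init__(self, k=3):
        # A deque with maxlen discards the oldest item automatically
        self._turns = deque(maxlen=k)

    def save(self, question, answer):
        self._turns.append((question, answer))

    def as_text(self):
        """Render the remembered exchanges as a plain-text chat history."""
        return "\n".join(f"Human: {q}\nAI: {a}" for q, a in self._turns)


window = WindowMemory(k=3)
for i in range(1, 5):
    window.save(f"question {i}", f"answer {i}")
print(window.as_text())  # the first exchange is no longer included
```

A small window keeps the prompt short, at the cost of forgetting anything said more than `k` exchanges ago.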

# Test the Guardrails

In [22]:
import sys
import time

print("--- Apple 10-K Financial Analyst (v1.0) ---")
print("Hello! My name is Popo, how may I assist you today? Type 'exit' to quit." )

while True:
  user_query = input("\nYou: ")

  if user_query.lower() in ['exit', 'quit', 'q']:
    print("Goodbye! Popo is signing off.")
    break

  # Show a temporary status line until the reply arrives
  print("Popo is thinking...", end="\r")

  try:
    print("Popo: ", end="")

    # Iterate over the streamed chunks and pick out the 'answer' field
    for chunk in apple_bot.stream({"question": user_query}):
      if 'answer' in chunk:
        answer_text = chunk['answer']

        # Typewriter effect: a smaller time.sleep value prints faster
        for char in answer_text:
          sys.stdout.write(char)
          sys.stdout.flush()
          time.sleep(0.01)

    # End the line after each answer
    print()

  except Exception as e:
    # HTTP 429 means the API rate limit was hit
    if "429" in str(e):
      print("\n[Rate Limit] Popo hit the rate limit; pausing for 5 seconds...")
      time.sleep(5)
    else:
      print(f"\nAn error has occurred: {e}")

  print("-" * 50)

--- Apple 10-K Financial Analyst (v1.0) ---
Hello! My name is Popo, how may I assist you today? Type 'exit' to quit.

You: Thanks for the help, Popo! You're a great analyst.
Popo: You're welcome, Monica. It was my pleasure to assist you. I'm glad I could provide you with valuable insights from Apple's 10-K report. If you have any more questions or need further analysis, please don't hesitate to ask. I'm here to help.

Would you like to explore more aspects of Apple's financials, such as their revenue streams, cost structure, or capital allocation strategy? Or perhaps you'd like to discuss their risk management practices or corporate governance?
--------------------------------------------------

You: Since we were talking about risks, compare the 2025 'Product' gross margin to 2024. Which one was higher?
Popo: Based on the provided context, the comparison between Apple's 2025 'Product' gross margin and 2024 'Product' gross margin is as follows:

* 2025 'Product' gross margin: $112,887 
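
The chat loop above pauses once when it sees a 429 error and then waits for the next user input. An alternative is to retry the same call automatically with exponential backoff. The sketch below is one possible approach under stated assumptions: `call_with_backoff` and its parameters are hypothetical, and detecting rate limits by looking for "429" in the message mirrors the loop above rather than any documented Groq error type.

```python
import random
import time


def call_with_backoff(fn, max_attempts=4, base_delay=1.0, jitter=0.1, sleep=time.sleep):
    """Call fn(), retrying with exponentially growing delays on rate-limit errors."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except Exception as exc:
            is_rate_limit = "429" in str(exc)
            # Give up on other errors, or when the retries are used up
            if not is_rate_limit or attempt == max_attempts - 1:
                raise
            # Delays grow as 1x, 2x, 4x ... base_delay, plus a little random jitter
            delay = base_delay * (2 ** attempt) + random.uniform(0, jitter)
            sleep(delay)
```

In the notebook, this could wrap the chain call, for example `call_with_backoff(lambda: apple_bot.invoke({"question": user_query}))`.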

# Export the vector database for deployment

In [26]:
import shutil

# Zip the vector database so it can be downloaded
shutil.make_archive('apple_chroma_db_export', 'zip', 'apple_chroma_db')
print("Database zipped! Look for 'apple_chroma_db_export.zip' in your file menu.")

Database zipped! Look for 'apple_chroma_db_export.zip' in your file menu.
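
On the deployment side, the archive has to be unpacked before the app can open the database. A minimal sketch, assuming the zip was produced as above; `restore_vector_db` is a hypothetical helper, and refusing to overwrite a non-empty target directory is a design choice of this sketch, not something the notebook requires.

```python
import shutil
from pathlib import Path


def restore_vector_db(archive_path, target_dir):
    """Unpack a zipped vector database into target_dir and return that path."""
    target = Path(target_dir)
    if target.exists() and any(target.iterdir()):
        # Avoid mixing files from an old database with the new one
        raise FileExistsError(f"{target} already exists and is not empty")
    shutil.unpack_archive(str(archive_path), str(target), "zip")
    return target
```

A call such as `restore_vector_db("apple_chroma_db_export.zip", "apple_chroma_db")` would recreate the folder that the vector store cell pointed `persist_directory` at.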
