# Project Idea: Ideal Judah Agent
## Business Requirements & Goals
- Build an AI agent that acts as the ideal version of yourself (Judah), providing guidance, motivation, and actionable plans.
- The agent should accept your personal goals, routines, and preferences, and use them to generate daily/weekly plans, reminders, and suggestions.
- When asked, the agent should retrieve and summarize your goals, routines, and progress.
- If you feel down or need motivation, the agent should suggest personalized visualizations, affirmations, and uplifting content.
- Integrate Retrieval-Augmented Generation (RAG) to pull relevant information from your personal notes, documents, or curated knowledge base to provide context-aware advice and answers.
- The agent should be able to answer questions about your goals, suggest improvements, and adapt its responses based on your feedback and emotional state.
- Provide a simple interface (chat, dashboard, or voice) for easy interaction.
- Ensure privacy and security of your personal data.
## Example Use Cases
- "What are my goals for this week?"
- "Remind me of my morning routine."
- "I'm feeling low, can you help?" (Agent responds with affirmations, visualizations, and motivational content)
- "How am I progressing towards my fitness goal?" (Agent uses RAG to pull from your logs/notes)
- "Suggest improvements to my daily schedule."
## RAG Opportunities
- Use RAG to fetch and summarize your personal notes, past journal entries, or curated articles for context-aware responses.
- Enhance affirmations and visualizations by retrieving relevant content from a knowledge base or external sources.
- Provide tailored advice by combining your data with best practices and expert recommendations.

In [10]:
# Import required libraries for PDF extraction, text processing, and vector storage
from pypdf import PdfReader  # For reading PDF files
import os  # For file and folder operations
import glob  # For listing PDF files in a folder
from chromadb import Client  # For using ChromaDB as a vector store
from chromadb.utils import embedding_functions  # For embedding text
import numpy as np  # For numerical operations (if needed)
from typing import List, Dict  # For type hints
import gradio as gr  # For building the UI
import logging  # For logging and debugging
import sys  # For system operations
from openai import OpenAI
from dotenv import load_dotenv  # For loading environment variables

In [11]:
load_dotenv(override=True)
openai = OpenAI()

In [12]:
# list all the pdf files in the documents directory
pdf_files = glob.glob("documents/*.pdf")
print(f"Found {len(pdf_files)} PDF files in the documents directory.")


Found 6 PDF files in the documents directory.


In [13]:
# Extract the text of a single PDF file with pypdf; on failure, log the error and return an empty string
# (the next cell stores the results in a dictionary with filename as key and text as value)
def extract_text_from_pdf(file_path: str) -> str:
    try:
        reader = PdfReader(file_path)
        text = ""
        for page in reader.pages:
            text += page.extract_text() + "\n"
        return text
    except Exception as e:
        logging.error(f"Error reading {file_path}: {e}")
        return ""
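
Text extracted from PDFs often comes back with irregular whitespace: doubled spaces, and single words stranded on their own lines. Below is a minimal sketch of a cleanup step that could run on the return value of `extract_text_from_pdf` before indexing. The helper name `normalize_whitespace` is hypothetical, and collapsing every whitespace run into a single space is an assumption that also discards paragraph breaks, so treat it as a starting point rather than a definitive cleaner.

```python
import re

def normalize_whitespace(text: str) -> str:
    """Collapse every run of whitespace (spaces, tabs, newlines) into one space."""
    return re.sub(r"\s+", " ", text).strip()

# Hypothetical sample imitating scattered PDF extraction output
sample = "Led  Network  Automation\ninitiatives,\n \nreducing\n \nalert  noise."
print(normalize_whitespace(sample))
# -> Led Network Automation initiatives, reducing alert noise.
```

If paragraph boundaries matter for chunking later, a gentler variant could collapse only spaces and tabs and leave blank lines in place.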

In [14]:
pdf_texts = {}
for pdf_file in pdf_files:
    text = extract_text_from_pdf(pdf_file)
    if text:
        pdf_texts[os.path.basename(pdf_file)] = text

In [15]:
print(f"Extracted text from {len(pdf_texts)} PDF files.")

Extracted text from 6 PDF files.


In [16]:
# Create and configure ChromaDB client
chroma_client = Client()

In [23]:
# Create and configure ChromaDB client with OpenAI embedding function
openai_ef = embedding_functions.OpenAIEmbeddingFunction(api_key=os.getenv("OPENAI_API_KEY"), model_name="text-embedding-ada-002")
chroma_client = Client()

In [21]:
# Delete existing judah_documents collection and create a new one with OpenAI embeddings
chroma_client.delete_collection(name="judah_documents")
collection = chroma_client.create_collection(name="judah_documents", embedding_function=openai_ef)
for filename, text in pdf_texts.items():
    collection.add(
        documents=[text],
        ids=[filename],
        metadatas=[{"filename": filename}]
    )
print(f"Re-indexed {len(pdf_texts)} documents in new ChromaDB collection.")

BadRequestError: Error code: 400 - {'error': {'message': "This model's maximum context length is 8192 tokens, however you requested 10541 tokens (10541 in your prompt; 0 for the completion). Please reduce your prompt; or completion length.", 'type': 'invalid_request_error', 'param': None, 'code': None}} in add.
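
The error above comes from sending a whole document as a single embedding input. A rough pre-check can flag oversized texts before the API is called. The sketch below assumes about 4 characters per token, the same rule of thumb the chunking helper in this notebook uses (~8000 characters for ~2000 tokens). Real counts depend on the tokenizer, so the numbers are estimates only. The names `estimate_tokens`, `oversized_documents`, `MAX_EMBED_TOKENS`, and `CHARS_PER_TOKEN` are hypothetical.

```python
from typing import Dict, List

MAX_EMBED_TOKENS = 8192   # limit reported in the error message
CHARS_PER_TOKEN = 4       # assumption: rough average for English text

def estimate_tokens(text: str) -> int:
    """Approximate the token count from the character count (ceiling division)."""
    return -(-len(text) // CHARS_PER_TOKEN)

def oversized_documents(texts: Dict[str, str], limit: int = MAX_EMBED_TOKENS) -> List[str]:
    """Return the names of documents whose estimated token count exceeds the limit."""
    return [name for name, text in texts.items() if estimate_tokens(text) > limit]

# Hypothetical documents: one small, one well above the limit
docs = {"short.pdf": "a" * 1_000, "long.pdf": "a" * 40_000}
print(oversized_documents(docs))
```

Running the same check on `pdf_texts` would show which files need to be split before they are added to the collection.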

In [None]:
# Helper function to split text into chunks (approx. 2000 tokens, ~8000 characters)
def split_text(text, max_length=8000):
    return [text[i:i+max_length] for i in range(0, len(text), max_length)]

# Create the judah_documents collection with OpenAI embeddings
# (uncomment the next line first if a collection with this name already exists)
#chroma_client.delete_collection(name="judah_documents")
collection = chroma_client.create_collection(name="judah_documents", embedding_function=openai_ef)

# Add all PDF texts in chunks to ChromaDB
for filename, text in pdf_texts.items():
    chunks = split_text(text)
    for idx, chunk in enumerate(chunks):
        chunk_id = f"{filename}_chunk{idx+1}"
        collection.add(
            documents=[chunk],
            ids=[chunk_id],
            metadatas=[{"filename": filename, "chunk": idx+1}]
        )
print(f"Re-indexed {sum(len(split_text(t)) for t in pdf_texts.values())} chunks in new ChromaDB collection.")

Re-indexed 15 chunks in new ChromaDB collection.
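
The fixed-width `split_text` helper cuts at a character offset, so a chunk can begin or end mid-sentence. A minimal sketch of a sentence-aware alternative follows. The function name `split_on_sentences` is hypothetical, and the sentence boundary rule (`.`, `!` or `?` followed by whitespace) is a simplifying assumption that will misfire on abbreviations and decimal numbers.

```python
import re
from typing import List

def split_on_sentences(text: str, max_length: int = 8000) -> List[str]:
    """Greedily pack whole sentences into chunks of at most max_length characters.

    A single sentence longer than max_length is hard-split so no chunk exceeds the limit.
    """
    # Assumption: a sentence ends with ., ! or ? followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks: List[str] = []
    current = ""
    for sentence in sentences:
        if not sentence:
            continue
        candidate = f"{current} {sentence}" if current else sentence
        if len(candidate) <= max_length:
            current = candidate
            continue
        if current:
            chunks.append(current)
        # Hard-split an oversized sentence; keep its last piece open for more text.
        pieces = [sentence[i:i + max_length] for i in range(0, len(sentence), max_length)]
        chunks.extend(pieces[:-1])
        current = pieces[-1]
    if current:
        chunks.append(current)
    return chunks

demo = "First sentence. Second one is here! Third? Fourth sentence ends."
for chunk in split_on_sentences(demo, max_length=36):
    print(repr(chunk))
```

With the default `max_length=8000` it could stand in for `split_text` in the indexing loop. Chunk counts may differ from the fixed-width version because chunks close at sentence boundaries.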


In [17]:
# Create a ChromaDB collection and add PDF texts as documents
collection = chroma_client.create_collection(name="judah_documents")
for filename, text in pdf_texts.items():
    collection.add(
        documents=[text],
        ids=[filename],
        metadatas=[{"filename": filename}]
    )
print(f"Indexed {len(pdf_texts)} documents in ChromaDB.")

/Users/judahgeorge/.cache/chroma/onnx_models/all-MiniLM-L6-v2/onnx.tar.gz: 100%|██████████| 79.3M/79.3M [00:09<00:00, 9.03MiB/s]


Indexed 6 documents in ChromaDB.


In [36]:
# test query
query = "work I did in 1 year"
results = collection.query(query_texts=[query], n_results=3)
for i, (result, meta) in enumerate(zip(results['documents'][0], results['metadatas'][0])):
    print(f"Result {i+1} (File: {meta['filename']}, Chunk: {meta['chunk']}):\n{result}\n")

Result 1 (File: Resume to update.pdf, Chunk: 1):
These were the work I did in 1 year, can you sum it up into 10 lines so that I can update in my resume
Led Network Automation initiatives, detecting outages and traffic fluctuations, reducing alert noise, and implementing Machine Learning models for improved efficiency.
Played a pivotal role in CCP Migration, ensuring seamless application connectivity, fixing post-migration issues, and demonstrating leadership and troubleshooting skills.
Managed Legacy CM responsibilities adeptly, swiftly resolving application issues, conducting knowledge sharing sessions, and assisting in team transition post-migration.
Enhanced IOP TTS Deflection, deploying solutions and adjusting automations to handle delays effectively, ensuring incident resolution and reporting to stakeholders.
Strengthened Auto Paging Email

In [38]:
# RAG: Use retrieved chunks as context for OpenAI LLM answer, with improved prompt for Ideal Judah Agent objectives
query = "work I did in 1 year"
results = collection.query(query_texts=[query], n_results=3)
context = "\n---\n".join(results['documents'][0])
prompt = f"""You are the Ideal Judah Agent: an AI designed to help Judah achieve his goals, stay motivated, and provide actionable, personalized advice.\n\nYour objectives:\n- Summarize Judah's achievements, routines, and progress.\n- Offer encouragement and motivation based on his personal context.\n- Suggest improvements and next steps.\n- Ensure privacy and empathy in your responses.\n\nUsing the following context from Judah's personal documents, answer the question: '{query}'\n\nContext:\n{context}\n\nYour response should be clear, supportive, and tailored to Judah's needs.\n\nAnswer:"""
response = openai.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[
        {"role": "system", "content": "You are the Ideal Judah Agent, a helpful, empathetic, and motivating assistant."},
        {"role": "user", "content": prompt}
    ]
)
print("LLM Answer:\n", response.choices[0].message.content)

LLM Answer:
 In the past year, you've achieved remarkable milestones in network automation, migration projects, and technology adoption. Your leadership in various initiatives, such as Network Automation, CCP Migration, and IOP TTS Deflection, has showcased your problem-solving skills and technical expertise. Your dedication to continuous improvement and learning has been evident with your diverse technology exploration and application development projects.

Moreover, your proactive troubleshooting approach, teamwork spirit, and mentorship have been valuable contributions to the team and organization. Your positive feedback and recognition from colleagues, customers, and leadership highlight the impact of your problem-solving abilities and project execution.

To enhance your achievements further, consider focusing on sustaining a balance between multiple projects, prioritizing work effectively, and seeking collaboration opportunities. Strive to improve time management skills and seek s

In [42]:
def answer_query(user_query, history):
    results = collection.query(query_texts=[user_query], n_results=3)
    context = "\n---\n".join(results['documents'][0])
    prompt = f"""You are the Ideal Judah Agent: an AI designed to help Judah achieve his goals, stay motivated, and provide actionable, personalized advice.

Your objectives:
- Summarize Judah's achievements, routines, and progress.
- Offer encouragement and motivation based on his personal context.
- Suggest improvements and next steps.
- Ensure privacy and empathy in your responses.

Using the following context from Judah's personal documents, answer the question: '{user_query}'

Context:
{context}

Your response should be clear, supportive, and tailored to Judah's needs.

Answer:"""
    response = openai.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[
            {"role": "system", "content": "You are the Ideal Judah Agent, a helpful, empathetic, and motivating assistant."},
            {"role": "user", "content": prompt}
        ]
    )
    return response.choices[0].message.content

In [43]:
# let's create a chat interface using Gradio
iface = gr.ChatInterface(fn=answer_query, title="Ideal Judah Agent", description="Ask the Ideal Judah Agent anything about your goals, progress, and motivation!")
iface.launch()



* Running on local URL:  http://127.0.0.1:7862
* To create a public link, set `share=True` in `launch()`.




In [44]:
# Chunking kept each embedded text under the embedding model's 8192-token limit and returned smaller, more relevant passages.
# In this notebook, OpenAI embeddings (text-embedding-ada-002) gave more relevant results than ChromaDB's default embeddings (all-MiniLM-L6-v2).
# This is an informal observation, not a benchmark. A plausible reason is that the default model is much smaller and truncates long inputs, so most of a whole-document text is ignored when it is embedded, while ada-002 accepts inputs up to the 8192-token limit seen in the error above.
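
A claim about which embedding model retrieves better is easier to trust with a small measurement. The sketch below computes a hit rate at k over hand-labelled queries. Everything in it is an assumption for illustration: the function name `hit_rate_at_k`, the toy corpus, the filenames, and the word-overlap retriever that stands in for a real one. In this notebook, `retrieve` could instead wrap `collection.query` and return the filenames found in `results['metadatas'][0]`, once per collection being compared.

```python
from typing import Callable, Dict, List

def hit_rate_at_k(retrieve: Callable[[str, int], List[str]],
                  labeled_queries: Dict[str, str], k: int = 3) -> float:
    """Fraction of queries whose expected filename appears in the top-k results."""
    if not labeled_queries:
        return 0.0
    hits = sum(1 for query, expected in labeled_queries.items()
               if expected in retrieve(query, k))
    return hits / len(labeled_queries)

# Toy stand-in retriever (hypothetical): ranks files by words shared with the query.
toy_corpus = {
    "resume.pdf": "network automation migration work",
    "fitness.pdf": "running gym weekly fitness log",
}

def toy_retrieve(query: str, k: int) -> List[str]:
    words = set(query.lower().split())
    ranked = sorted(toy_corpus, key=lambda name: -len(words & set(toy_corpus[name].split())))
    return ranked[:k]

# Hand-labelled queries (hypothetical): query text -> file expected in the results
labels = {"work on network automation": "resume.pdf", "fitness progress": "fitness.pdf"}
print(hit_rate_at_k(toy_retrieve, labels, k=1))
```

Running the same labelled queries against a default-embedding collection and an OpenAI-embedding collection would give two numbers to compare directly.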

In [45]:
## Improvements to be done to the agent:
# 1. Better chunking strategy to avoid breaking sentences.
# 2. More sophisticated prompt engineering for the LLM.
# 3. Add memory to the agent to remember past interactions.
# 4. Add more valuable documents to the corpus for richer context.
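
For item 3, one lightweight form of memory is to replay recent turns to the model. `answer_query` already receives `history` but ignores it. The sketch below builds a message list from that history. It assumes `history` is a sequence of `(user_text, assistant_text)` pairs; some Gradio configurations pass a list of role/content dictionaries instead, in which case the loop would need adapting. The name `build_messages` and the `max_turns` parameter are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

def build_messages(system_prompt: str,
                   history: Sequence[Tuple[str, str]],
                   user_prompt: str,
                   max_turns: int = 5) -> List[Dict[str, str]]:
    """Assemble a chat message list that replays the most recent turns before the new prompt."""
    messages = [{"role": "system", "content": system_prompt}]
    # Keep only the last max_turns exchanges to bound prompt size.
    recent = list(history)[-max_turns:] if max_turns > 0 else []
    for user_text, assistant_text in recent:
        messages.append({"role": "user", "content": user_text})
        messages.append({"role": "assistant", "content": assistant_text})
    messages.append({"role": "user", "content": user_prompt})
    return messages

# Hypothetical earlier exchange followed by a new question
past = [("What are my goals for this week?", "You planned three workouts.")]
for message in build_messages("You are the Ideal Judah Agent.", past, "How am I progressing?"):
    print(message["role"], "->", message["content"])
```

The returned list could be passed as `messages=` in the `openai.chat.completions.create` call inside `answer_query`, with the RAG prompt as `user_prompt`.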