# UPLOADING A DOCUMENT AND CREATING A VECTOR STORE

In [2]:
# RAG System using CrewAI for a Jupyter Notebook Environment
# Step 1: Ingestion - Processing the document and creating a vector store.

import os
import warnings
from langchain_community.document_loaders import PyPDFLoader # Using LangChain's loader

# LangChain components for document processing
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_community.vectorstores import FAISS
from langchain_community.embeddings import HuggingFaceEmbeddings

# --- Suppress Warnings ---
warnings.filterwarnings("ignore", category=DeprecationWarning)
warnings.filterwarnings("ignore", category=UserWarning)

# --- 1. Configuration & Setup ---

# IMPORTANT: Place the document you want to query in the same directory as the notebook,
# or provide its full path as shown below.
file_path = r"C:\Users\reuel\Downloads\UNIT-1_Notes.pdf"  # Full path to the source PDF

# Define the path for the local vector store
VECTOR_STORE_PATH = "vectorstore/faiss_index"

# Initialize embeddings model
# This will be used to convert text into numerical vectors for the database.
print("Initializing embeddings model...")
embeddings = HuggingFaceEmbeddings(model_name="all-MiniLM-L6-v2")
print("Embeddings model initialized.")

# --- 2. Ingestion Pipeline: Process and Store the Document ---

def process_document(path):
    """Loads a document, chunks its text, and creates a vector store."""
    if not os.path.exists(path):
        print(f"Error: File not found at '{path}'. Please make sure the file is in the correct directory.")
        return False

    print(f"\nProcessing document: {path}...")

    # Load the document using PyPDFLoader
    try:
        print("Loading document with PyPDFLoader...")
        loader = PyPDFLoader(path)
        documents = loader.load()
        print(f"Successfully loaded {len(documents)} pages from the document.")
    except Exception as e:
        print(f"An error occurred during document loading: {e}")
        return False

    # Chunk the loaded documents
    text_splitter = RecursiveCharacterTextSplitter(
        chunk_size=1000,
        chunk_overlap=200
    )
    text_chunks = text_splitter.split_documents(documents)
    print(f"Document split into {len(text_chunks)} chunks.")

    # Create and save a FAISS vector store from the document chunks
    try:
        print("Creating vector store... This may take a moment.")
        # Use from_documents since we have LangChain Document objects
        vectorstore = FAISS.from_documents(documents=text_chunks, embedding=embeddings)
        vectorstore.save_local(VECTOR_STORE_PATH)
        print(f"Vector store created and saved successfully in the '{VECTOR_STORE_PATH}' folder.")
        return True
    except Exception as e:
        print(f"An error occurred while creating the vector store: {e}")
        return False

# --- 3. Main Execution Block ---

# Run the ingestion pipeline. In a notebook, __name__ is "__main__", so this executes when the cell runs.
if __name__ == "__main__":
    process_document(file_path)



Initializing embeddings model...
Embeddings model initialized.

Processing document: C:\Users\reuel\Downloads\UNIT-1_Notes.pdf...
Loading document with PyPDFLoader...
Successfully loaded 13 pages from the document.
Document split into 32 chunks.
Creating vector store... This may take a moment.
Vector store created and saved successfully in the 'vectorstore/faiss_index' folder.
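
To make the `chunk_size=1000` and `chunk_overlap=200` settings above more concrete, here is a minimal sketch of fixed-size chunking with overlap. It is a simplification: `RecursiveCharacterTextSplitter` also tries to break on separators such as paragraph and line boundaries, which this sketch ignores. The helper name `sliding_window_chunks` is hypothetical and not part of LangChain.

```python
def sliding_window_chunks(text, chunk_size=1000, chunk_overlap=200):
    """Split text into fixed-size character windows that overlap.

    Simplified illustration only: the real splitter also respects
    separators, so its chunk boundaries will differ.
    """
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far each window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reaches the end of the text
    return chunks


sample = "abcdefghijklmnopqrstuvwxyz"  # 26 characters
demo_chunks = sliding_window_chunks(sample, chunk_size=10, chunk_overlap=4)
for chunk in demo_chunks:
    print(chunk)
```

Each chunk repeats the last four characters of the previous one. This is the purpose of the overlap: a sentence that straddles a chunk boundary still appears whole in at least one chunk.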


In [3]:
from langchain_community.vectorstores import FAISS
from langchain_community.embeddings import HuggingFaceEmbeddings

VECTOR_STORE_PATH = "vectorstore/faiss_index"
EMBEDDINGS_MODEL = HuggingFaceEmbeddings(model_name="all-MiniLM-L6-v2")

def retrieve_context(user_query, vector_store_path, embedding_model, top_k=4):
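    # Load the vector store. LangChain saves part of a FAISS index as a pickle file, so
    # allow_dangerous_deserialization=True should only be used for indexes you created yourself.
    vector_store = FAISS.load_local(vector_store_path, embedding_model, allow_dangerous_deserialization=True)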
    retriever = vector_store.as_retriever(search_kwargs={"k": top_k})
    # Retrieve top relevant chunks
    retrieved_docs = retriever.invoke(user_query)
    # Concatenate their text
    context = "\n\n".join([doc.page_content for doc in retrieved_docs])
    print(f"\nRetrieved {len(retrieved_docs)} relevant context chunks for the question:")
    print(f"----- CONTEXT START -----\n{context}\n----- CONTEXT END -----")
    return context

if __name__ == "__main__":
    # Assume ingestion has already run successfully
    user_query = input("Enter your question: ")
    context = retrieve_context(user_query, VECTOR_STORE_PATH, EMBEDDINGS_MODEL)



Retrieved 4 relevant context chunks for the question:
----- CONTEXT START -----
o
 
Avoid  using  long-term  access  keys.   
Demo:  IAM  Setup  and  Usage   
Scenario :  Create  an  IAM  user,  assign  permissions,  and  test  access.   
Step  1:  Create  an  IAM  User   
1.  Go  to  the  AWS  Management  Console  >  IAM  >  Users .   2.  Click  Add  user .

AWS  Identity  and  Access  Management  (IAM)  with  Demo   
Introduction  to  IAM   
•
 
What  is  IAM?   
o
 
AWS  Identity  and  Access  Management  (IAM)  is  a  service  that  enables  you  to   securely  manage  access  to  AWS  resources.   
o
 
It  allows  fine-grained  control  over  who  can  access  resources  and  what  actions   they  can  perform.   
•
 
Purpose  of  IAM :   
o
 
Enhance  security  by  controlling  user  access.   
o
 
Manage  permissions  for  multiple  users,  groups,  and  roles.   
o
 
Ensure  least-privilege  access.

high  availability  and  disaster  recovery.  6.  Elasticity :   
o
 
Quickly
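
The retrieval step above can be pictured as a nearest-neighbour search over embedding vectors. The sketch below runs a brute-force top-k search by Euclidean distance on made-up 3-dimensional vectors. The vectors, chunk labels, and the helper `top_k_by_l2` are illustrative assumptions; real `all-MiniLM-L6-v2` embeddings have 384 dimensions. It shows the idea only, not how FAISS works internally.

```python
import math


def top_k_by_l2(query_vec, chunk_vecs, k=4):
    """Return the k (label, distance) pairs closest to query_vec."""
    scored = [
        (label, math.dist(query_vec, vec))  # Euclidean (L2) distance
        for label, vec in chunk_vecs.items()
    ]
    scored.sort(key=lambda pair: pair[1])  # smaller distance = more similar
    return scored[:k]


# Toy "embeddings" for illustration only.
toy_index = {
    "IAM intro": (0.9, 0.1, 0.0),
    "IAM demo": (0.8, 0.2, 0.1),
    "Elasticity": (0.1, 0.9, 0.2),
    "Pricing": (0.0, 0.2, 0.9),
}
toy_query = (1.0, 0.0, 0.0)  # pretend this encodes a question about IAM

nearest = top_k_by_l2(toy_query, toy_index, k=3)
for label, distance in nearest:
    print(f"{label}: {distance:.3f}")
```

A top-k search always returns k results, however distant the last ones are. This may explain why an off-topic fragment such as the "Elasticity" chunk can appear among otherwise relevant IAM chunks in the output above.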

## Working

In [2]:
from crewai import Agent, Task, Crew
import concurrent.futures
from langchain_community.vectorstores import FAISS
from langchain_community.embeddings import HuggingFaceEmbeddings

# ------------------------------------------------------------------
# Vector Store Setup
# ------------------------------------------------------------------
VECTOR_STORE_PATH = "vectorstore/faiss_index"
EMBEDDINGS_MODEL = HuggingFaceEmbeddings(model_name="all-MiniLM-L6-v2")

def retrieve_context(user_query, vector_store_path, embedding_model, top_k=4):
    # Load the vector store (only enable pickle deserialization for an index you built yourself)
    vector_store = FAISS.load_local(vector_store_path, embedding_model, allow_dangerous_deserialization=True)
    retriever = vector_store.as_retriever(search_kwargs={"k": top_k})
    # Retrieve top relevant chunks
    retrieved_docs = retriever.invoke(user_query)
    # Concatenate their text
    context = "\n\n".join([doc.page_content for doc in retrieved_docs])
    print(f"\nRetrieved {len(retrieved_docs)} relevant context chunks for the question:")
    print(f"----- CONTEXT START -----\n{context}\n----- CONTEXT END -----")
    return context

# ------------------------------------------------------------------
# Create Expert Agents
# ------------------------------------------------------------------
llama_agent = Agent(
    role="Tutor: Llama",
    goal="Explain the concept with clarity and foundational knowledge using the provided context.",
    backstory="A patient tutor who focuses on basics and clear examples.",
    llm="ollama/llama2",
    verbose=True
)

deepseek_agent = Agent(
    role="Coach: DeepSeek",
    goal="Explain with analogies, encouragement, and friendly examples using the provided context.",
    backstory="A motivational coach who helps learning feel fun and inspiring.",
    llm="ollama/deepseek-r1",
    verbose=True
)

mistral_agent = Agent(
    role="Analyst: Mistral",
    goal="Provide deep insights, point out pitfalls, and enrich understanding using the provided context.",
    backstory="An analytical thinker highlighting nuances and deeper context.",
    llm="ollama/mistral",
    verbose=True
)

synth_agent = Agent(
    role="Synthesizer",
    goal="Merge the three expert answers into a coherent, polished explanation for students.",
    backstory="An editor that harmonizes different teaching styles into one effective answer.",
    llm="ollama/llama2",
    verbose=True
)

# ------------------------------------------------------------------
# Define Tasks (Modified to include context)
# ------------------------------------------------------------------
def create_expert_task(agent, description, context, question):
    return Task(
        description=f"{description}\n\nContext from documents:\n{context}\n\nStudent Question: {question}",
        agent=agent,
        expected_output="A helpful explanation based on the provided context and question."
    )

def create_synthesis_task(agent, expert_answers):
    return Task(
        description=f"Combine these three expert answers into a single final answer:\n\n{expert_answers}",
        agent=agent,
        expected_output="A polished, engaging answer perfect for students—combines clarity, examples, and depth."
    )

# ------------------------------------------------------------------
# Helper to run individual tasks
# ------------------------------------------------------------------
def _run_single_task(task_obj):
    temp_crew = Crew(
        agents=[task_obj.agent],
        tasks=[task_obj],
        verbose=False
    )
    result = temp_crew.kickoff()
    return task_obj.agent.role, str(result)


# ------------------------------------------------------------------
# Main Execution
# ------------------------------------------------------------------
if __name__ == "__main__":
    # Get user question and retrieve context
    user_query = input("Enter your question: ")
    context = retrieve_context(user_query, VECTOR_STORE_PATH, EMBEDDINGS_MODEL)
    
    # Create tasks with context
    task1 = create_expert_task(
        llama_agent, 
        "Provide a clear, foundational explanation of the topic", 
        context, 
        user_query
    )
    
    task2 = create_expert_task(
        deepseek_agent, 
        "Explain the same topic with friendly analogies and encouragement", 
        context, 
        user_query
    )
    
    task3 = create_expert_task(
        mistral_agent, 
        "Provide deeper insights, cautionary points, and analytical depth", 
        context, 
        user_query
    )

    # --- Run the three experts in parallel ---
    expert_tasks = [task1, task2, task3]
    expert_outputs = {}

    with concurrent.futures.ThreadPoolExecutor() as pool:
        futures = {pool.submit(_run_single_task, t): t for t in expert_tasks}
        for fut in concurrent.futures.as_completed(futures):
            role, answer = fut.result()
            expert_outputs[role] = answer
            print(f"\n[{role} completed]\n{answer}\n")

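    # --- Synthesis step ---
    # Label each answer with its role and keep the original task order,
    # since as_completed() yields results in completion order.
    combined_context = "\n\n".join(
        f"[{t.agent.role}]\n{expert_outputs[t.agent.role]}" for t in expert_tasks
    )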
    
    task4 = create_synthesis_task(synth_agent, combined_context)
    synth_crew = Crew(
        agents=[synth_agent],
        tasks=[task4],
        verbose=True
    )
    
    final_answer = synth_crew.kickoff()

    print("\n========== FINAL SYNTHESIZED ANSWER ==========\n")
    print(final_answer)



Retrieved 4 relevant context chunks for the question:
----- CONTEXT START -----
AWS  Identity  and  Access  Management  (IAM)  with  Demo   
Introduction  to  IAM   
•
 
What  is  IAM?   
o
 
AWS  Identity  and  Access  Management  (IAM)  is  a  service  that  enables  you  to   securely  manage  access  to  AWS  resources.   
o
 
It  allows  fine-grained  control  over  who  can  access  resources  and  what  actions   they  can  perform.   
•
 
Purpose  of  IAM :   
o
 
Enhance  security  by  controlling  user  access.   
o
 
Manage  permissions  for  multiple  users,  groups,  and  roles.   
o
 
Ensure  least-privilege  access.

Key  Features  of  IAM   
1.  User  Management :   
o
 
Create  and  manage  IAM  users  to  access  AWS  services.   
o
 
Each  user  has  their  own  credentials  (username  and  password,  access  keys).  2.  Groups :   
o
 
Organize  users  into  groups  to  manage  permissions  collectively.   
o
 
Example:  
AdminGroup
,  
ReadOnlyGroup
.   3.  Roles 


[Tutor: Llama completed]
IAM (AWS Identity and Access Management) is a service provided by Amazon Web Services that enables organizations to securely manage access to AWS resources. It allows for fine-grained control over who can access resources and what actions they can perform, providing a centralized and secure way to manage user identities and permissions.

IAM provides several key features to help organizations manage access to their AWS resources:

1. User Management: IAM users can be created, managed, and assigned permissions to access AWS resources. Each user has their own credentials (username and password or access keys) that are stored securely in AWS.
2. Groups: Users can be organized into groups, allowing administrators to manage permissions collectively. For example, an administrator can create a group called "Developers" and assign it permissions to access certain AWS resources.
3. Roles: IAM roles can be assigned to EC2 instances or Lambda functions, allowing them to 


[Coach: DeepSeek completed]
Okay, let's imagine you're building your dream house (or apartment complex) using services from AWS – think of all the virtual servers, storage buckets, databases, etc., as individual rooms or buildings. IAM is like the **security system and access control center** for this entire development!

Here’s a breakdown:

1.  **Who Needs Access?**
    *   You need keys (or permissions) to enter different parts.
    *   IAM lets you create specific identities within your AWS world – these are called **IAM Users** or sometimes just "Users".
    *   Think of it like having individual house keys for each family member living in the apartment complex. Each person gets their own unique key.

2.  **Different Needs, Different Keys?**
    *   You don't want every single person to be able to open everything.
    *   IAM allows you to define exactly what they can and cannot do (the actions) on specific resources (like "only the kitchen door" or "all the server rooms").
    *


[Analyst: Mistral completed]
AWS Identity and Access Management (IAM) is a service offered by Amazon Web Services that enables secure and centralized management of access to AWS resources. IAM allows fine-grained control over who can access these resources and what actions they can perform, thereby enhancing security in your AWS environment. Key components of IAM include users, groups, roles, and policies.

Users are individual identities within your AWS account with their own credentials (username and password, access keys). Groups organize users into manageable collections for easier administration. Roles assign permissions to AWS resources or services, allowing them to perform specific tasks without requiring explicit user credentials. Policies are JSON documents that define permissions, which can be either predefined policies provided by AWS (AWS Managed Policies) or custom policies created by users (Customer Managed Policies).

To further secure your IAM environment, Multi-Factor



AWS Identity and Access Management (IAM) is a crucial service provided by Amazon Web Services that enables organizations to securely manage access to their AWS resources. It offers several key features to help organizations manage permissions, monitor resource usage, and improve security and compliance. These features include User Management, Group Management, Role-Based Access Control (RBAC), Policy Management, Multi-Factor Authentication (MFA), and Federated Identity Management.

User Management allows for the creation, management, and assignment of permissions to access AWS resources. Users have their own credentials (username and password or access keys) that are stored securely in AWS. Group Management enables administrators to organize users into manageable collections, making it easier to administer permissions. Roles allow for fine-grained control over what actions can be performed on AWS resources, without requiring explicit user credentials. Policies are JSON documents that
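
For readers who want to try the fan-out-then-synthesize control flow without running any Ollama models, the sketch below replaces each CrewAI crew with a plain stub function. `fake_expert`, `run_experts`, and `build_synthesis_input` are hypothetical names introduced for illustration. Only the threading pattern from `concurrent.futures` mirrors the notebook.

```python
import concurrent.futures


def fake_expert(role, question):
    """Hypothetical stand-in for a crew run; returns (role, answer)."""
    return role, f"{role} answer to: {question}"


def run_experts(roles, question):
    """Run every expert in its own thread and return answers in role order."""
    with concurrent.futures.ThreadPoolExecutor() as pool:
        futures = [pool.submit(fake_expert, role, question) for role in roles]
        # Iterating the list of futures (rather than as_completed) keeps
        # the submission order, whichever thread finishes first.
        return [fut.result() for fut in futures]


def build_synthesis_input(answers):
    """Label each answer so the synthesizer can tell the experts apart."""
    return "\n\n".join(f"[{role}]\n{text}" for role, text in answers)


roles = ["Tutor: Llama", "Coach: DeepSeek", "Analyst: Mistral"]
answers = run_experts(roles, "What is IAM?")
synthesis_input = build_synthesis_input(answers)
print(synthesis_input)
```

Collecting results in submission order makes the synthesis prompt reproducible between runs. The trade-off is that per-expert progress cannot be printed as soon as each one finishes, which is what `as_completed` provides.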