In [1]:
from typing import List, TypedDict, Union, Optional
from pathlib import Path

from langchain_core.documents import Document
from langchain_community.document_loaders import PyPDFDirectoryLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_huggingface import HuggingFaceEmbeddings
from langchain_core.embeddings import Embeddings
from langchain_chroma import Chroma
from langchain_core.messages import AIMessage, HumanMessage
from langchain_ollama import OllamaLLM
from langgraph.graph import StateGraph, END

# 1. Document preparation

This section covers document processing. `PyPDFDirectoryLoader` extracts text from the PDF files in the `pdf_docs` directory, producing one document per page. `RecursiveCharacterTextSplitter` then splits the pages into chunks, preferring paragraph, line, and word boundaries, with overlap between consecutive chunks so that context is not lost at chunk borders.

The `prepare_documents` function handles the entire document preparation workflow:
1. Loading PDFs from the specified directory
2. Extracting text content, skipping files that cannot be parsed (`silent_errors=True`)
3. Splitting documents into smaller chunks (up to 1000 characters, with a 200-character overlap)
4. Keeping the loader's source and page metadata and recording each chunk's start offset (`add_start_index=True`)
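
The chunk size and overlap from step 3 can be illustrated with a fixed-width sliding window. This is a simplified sketch, not the `RecursiveCharacterTextSplitter` algorithm, which also prefers to break on separators such as blank lines. `sliding_chunks` is a hypothetical helper that only shows how the 200-character overlap and the start offset arise.

```python
from typing import Dict, List, Union


def sliding_chunks(
    text: str, chunk_size: int = 1000, chunk_overlap: int = 200
) -> List[Dict[str, Union[int, str]]]:
    """Split text into fixed-width windows that overlap by chunk_overlap characters."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be non-negative and smaller than chunk_size")

    step = chunk_size - chunk_overlap  # each window starts this far after the previous one
    chunks: List[Dict[str, Union[int, str]]] = []
    for start in range(0, len(text), step):
        chunks.append({"start_index": start, "text": text[start:start + chunk_size]})
        if start + chunk_size >= len(text):
            break  # the current window already reaches the end of the text
    return chunks


sample = "".join(str(i % 10) for i in range(2500))
for piece in sliding_chunks(sample):
    print(piece["start_index"], len(piece["text"]))
```

With a 2500-character input the windows start at offsets 0, 800, and 1600, so each chunk repeats the last 200 characters of the previous one.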

In [2]:
def prepare_documents(docs_folder: str = "./pdf_docs") -> List[Document]:
    """
    Load and split PDF documents from a specified directory.

    Args:
        docs_folder (str, optional): The folder containing the PDF files. Defaults to "./pdf_docs".

    Returns:
        List[Document]: A list of Document objects containing the text from the PDFs split into chunks.
    """

    loader = PyPDFDirectoryLoader(
        path=docs_folder,
        glob="**/*.pdf",
        silent_errors=True
    )

    documents = loader.load()

    text_splitter = RecursiveCharacterTextSplitter(
        chunk_size=1000,
        chunk_overlap=200,
        length_function=len,
        add_start_index=True,
    )

    chunks = text_splitter.split_documents(documents)

    return chunks

In [3]:
chunks = prepare_documents()

# 2. Vector store creation

This section focuses on creating a vector store to enable semantic search capabilities in my application. After processing and splitting the PDF documents into chunks, I need to convert them into vector representations for efficient retrieval.

The process involves:

1. Initializing a Hugging Face embedding model (`sentence-transformers/all-MiniLM-L6-v2`) that converts text into numerical vector representations
2. Creating a Chroma vector database from the document chunks using the embedding model
3. Persisting the vector store to disk for future use without reprocessing documents

The vector store serves as the foundation for my retrieval-augmented generation system, allowing the application to find the most relevant document chunks based on semantic similarity rather than simple keyword matching.

Reasons for choosing the Hugging Face embedding model:
- `all-MiniLM-L6-v2` is a small sentence-transformers model that produces 384-dimensional vectors and runs acceptably on a CPU, making it suitable for real-time applications.
- Its embeddings capture semantic meaning, so relevant chunks can be retrieved even when they share few exact words with the query.

Reasons for choosing the Chroma vector database:
- Chroma is designed for fast vector storage and nearest-neighbor retrieval, and it can run embedded in the application process and persist to a local directory.
- It supports metadata filtering alongside similarity search, which leaves room for more targeted queries later.
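
The idea of retrieving by semantic similarity can be sketched without any vector database. The example below ranks toy three-dimensional vectors by cosine similarity. The vectors and the `top_k` helper are made up for illustration, and the distance metric that Chroma uses by default may differ from cosine similarity.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(
    query_vec: Sequence[float], doc_vecs: Dict[str, Sequence[float]], k: int = 2
) -> List[Tuple[str, float]]:
    """Return the k document ids most similar to the query, best first."""
    scored = [(doc_id, cosine_similarity(query_vec, vec)) for doc_id, vec in doc_vecs.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy "embeddings": real ones from all-MiniLM-L6-v2 have 384 dimensions.
toy_store = {
    "cats": [1.0, 0.0, 0.0],
    "dogs": [0.9, 0.1, 0.0],
    "cars": [0.0, 0.0, 1.0],
}
print(top_k([1.0, 0.0, 0.0], toy_store))
```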

In [4]:
embedding_model = HuggingFaceEmbeddings(
    model_name="sentence-transformers/all-MiniLM-L6-v2",
)

In [5]:
def create_chroma_vector_store(documents: List[Document], embedding_model: Embeddings, persist_dir: str = "./chroma_db") -> Chroma:
    """
    Create a Chroma vector store from the provided documents and embedding model.

    Args:
        documents (List[Document]): List of Document objects to be stored in the vector store.
        embedding_model (Embeddings): The embedding model to be used for vectorization.
        persist_dir (str, optional): Directory to persist the collection. Defaults to "./chroma_db".

    Returns:
        Chroma: The vector store containing the embedded document chunks.
    """
    vector_store = Chroma.from_documents(
        documents=documents,
        embedding=embedding_model,
        persist_directory=persist_dir,
    )
    return vector_store

In [6]:
vector_store = create_chroma_vector_store(chunks, embedding_model)

# 3. Agent state definition

This section defines the state management for the Retrieval-Augmented Generation (RAG) workflow. The `AgentState` uses a TypedDict to provide a structured representation of the agent's state throughout the execution process.

The state contains:

- **messages**: Conversation history between the user and AI
- **context**: Additional information or extracted context relevant to the query
- **query**: The user's original question or request
- **retrieval_strategy**: Method used to retrieve documents (e.g., "multi_step", "deep_analysis", "direct")
- **next_nodes**: List of nodes to be executed in the workflow
- **retrieved_docs**: Documents fetched from the vector store based on the query
- **response**: The generated answer to the user's query
- **needs_correction**: Flag indicating if the response needs revision
- **reflection_feedback**: Feedback from verification processes

This state-based approach enables the workflow to maintain context across different processing nodes and make decisions based on accumulated information, creating a more coherent and informed interaction.
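
Because later nodes read keys such as `retrieval_strategy` and `needs_correction`, it can help to start every run from a state in which all declared keys are present. The sketch below is stdlib-only: it repeats the schema with loosely typed messages, and `make_initial_state` is a hypothetical helper, not part of the notebook.

```python
from typing import Any, List, Optional, TypedDict


class AgentState(TypedDict):
    """Stdlib-only copy of the state schema; messages are typed loosely here."""
    messages: List[Any]
    context: Optional[str]
    query: str
    retrieval_strategy: Optional[str]
    next_nodes: Optional[List[str]]
    retrieved_docs: List[str]
    response: Optional[str]
    needs_correction: Optional[bool]
    reflection_feedback: Optional[str]


def make_initial_state(query: str) -> AgentState:
    """Build a state with every declared key set to a neutral starting value."""
    return AgentState(
        messages=[query],  # the notebook would wrap this in HumanMessage
        context=None,
        query=query,
        retrieval_strategy=None,
        next_nodes=None,
        retrieved_docs=[],
        response=None,
        needs_correction=None,
        reflection_feedback=None,
    )


print(sorted(make_initial_state("What do cats do around humans?")))
```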

In [7]:
class AgentState(TypedDict):
    """
    AgentState represents the state of the agent in the state graph.
    """
    messages: List[Union[HumanMessage, AIMessage]]
    context: Optional[str]
    query: str
    retrieval_strategy: Optional[str]
    next_nodes: Optional[List[str]]
    retrieved_docs: List[str]
    response: Optional[str]
    needs_correction: Optional[bool]
    reflection_feedback: Optional[str]

# 4. LLM model definition

In this section, I initialize the large language model (LLM) that handles the natural language processing tasks in my RAG application. I use Ollama to run a local instance of the Llama 3 model, which offers several advantages for this implementation:

- **Local deployment**: Documents and queries stay on the machine, and no network round trip to a hosted API is needed
- **No API costs**: Eliminates usage fees associated with commercial LLM services
- **Customization**: Generation parameters can be adjusted for specific use cases
- **Offline operation**: Can function without internet connectivity once the model has been downloaded

The Llama 3 model provides a good balance between performance and resource requirements, making it suitable for deployment on consumer hardware while still delivering high-quality responses for document-based question answering.

This LLM powers various components of the workflow, including query analysis, document retrieval optimization, response generation, and fact-checking processes.

In [8]:
llm = OllamaLLM(model="llama3")

# 5. Node implementations

### Orchestrator node

The Orchestrator node is the central decision-making component in the RAG workflow. It analyzes the user's query to determine the most appropriate processing path based on the query's nature and complexity.

Key functions of the Orchestrator:

- Query analysis: Examines the question to classify it into different categories (factual information search, complex analysis, or simple question)
- Workflow routing: Selects the optimal processing flow based on the query classification
- Strategy assignment: Sets the retrieval strategy to guide how documents are fetched and processed
- Context enhancement: Adds query classification information to the state context

The Orchestrator records its decision in `retrieval_strategy` and `next_nodes`, so that simple questions can be answered directly while complex queries trigger deeper retrieval and analysis. Note that `next_nodes` only takes effect if the graph routes on it: the fixed edges defined in section 6 run every node in sequence, so in the current graph only `retrieval_strategy` changes behavior.
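
Mapping the model's free-text reply to a strategy is the fragile part of this node. One possible approach, shown here as a sketch, is to look for the first standalone option digit and fall back to the simple path otherwise. `parse_classification` and `ROUTES` are hypothetical names; the strategy and node names are the ones used in this notebook.

```python
import re
from typing import Dict, List, Tuple

# Option number -> (retrieval strategy, planned nodes), as used by the orchestrator.
ROUTES: Dict[str, Tuple[str, List[str]]] = {
    "1": ("multi_step", ["keyword_analyzer", "document_retriever"]),
    "2": ("deep_analysis", ["keyword_analyzer", "document_retriever", "summarizer"]),
    "3": ("direct", ["response_generator"]),
}


def parse_classification(llm_reply: str) -> Tuple[str, List[str]]:
    """Pick the route for the first standalone 1, 2, or 3 in the reply."""
    # \b keeps digits inside longer numbers (for example "2021") from matching.
    match = re.search(r"\b([123])\b", llm_reply)
    choice = match.group(1) if match else "3"  # default to the simple path
    strategy, nodes = ROUTES[choice]
    return strategy, list(nodes)


print(parse_classification("Option 2: this needs complex analysis."))
```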

In [9]:
def orchestrator(state: AgentState) -> AgentState:
    """
    Orchestrator function to determine the next steps based on the user's query.

    Args:
        state (AgentState): The current state of the agent, including the user's query and context.

    Returns:
        AgentState: The updated state of the agent with the next nodes to process.
    """

    query = state["query"]

    analysis_prompt = f"""
    Analyze the following user question and determine the appropriate processing flow:

    QUESTION: {query}

    Choose from the following options and begin your reply with the option number:
    1. Factual information search
    2. Complex analysis
    3. Simple question
    """

    analysis_result = llm.invoke(analysis_prompt)

    # Route on the first option digit in the reply. A plain substring test for
    # "1" would also match a "1" that appears later in an explanation.
    choice = next((ch for ch in analysis_result if ch in "123"), "3")

    if choice == "1":
        state["retrieval_strategy"] = "multi_step"
        state["next_nodes"] = ["keyword_analyzer", "document_retriever"]
    elif choice == "2":
        state["retrieval_strategy"] = "deep_analysis"
        state["next_nodes"] = ["keyword_analyzer", "document_retriever", "summarizer"]
    else:
        state["retrieval_strategy"] = "direct"
        state["next_nodes"] = ["response_generator"]

    state["context"] = f"Type of the question: {analysis_result.strip()}"
    return state

### Document retriever node

The Document Retriever node is responsible for querying the vector database to find the most relevant document chunks based on the user's query. This critical component implements different retrieval strategies based on the query type identified by the Orchestrator:

- **Multi-step retrieval**: Uses Maximum Marginal Relevance (MMR) search to find diverse but relevant results, balancing similarity with information diversity
- **Deep analysis retrieval**: Expands the original query using the LLM before searching, retrieving more comprehensive results for complex questions
- **Direct retrieval**: Uses standard similarity search for straightforward questions that need concise answers

The node processes the retrieved documents by extracting metadata (source file and page number) and formatting the content for easy reference in subsequent processing steps. This approach ensures that responses can be properly attributed to their sources, maintaining traceability and allowing for fact verification.

By adapting the retrieval method to the query type, the system optimizes both search relevance and computational efficiency, retrieving just enough information to answer the question accurately without overwhelming later processing steps with irrelevant content.
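
To make the difference between plain similarity search and MMR concrete, here is a small pure-Python sketch of greedy MMR selection. It is an illustration under simplifying assumptions (toy two-dimensional vectors, cosine similarity, a hypothetical `mmr_select` helper), not the code that the vector store runs internally.

```python
import math
from typing import List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def mmr_select(
    query: Sequence[float],
    docs: List[Sequence[float]],
    k: int,
    lambda_mult: float = 0.5,
) -> List[int]:
    """Greedily pick k document indices, trading relevance against redundancy."""
    selected: List[int] = []
    candidates = list(range(len(docs)))
    while candidates and len(selected) < k:
        best_index, best_score = candidates[0], -math.inf
        for i in candidates:
            relevance = cosine(query, docs[i])
            # Redundancy is the highest similarity to anything already chosen.
            redundancy = max((cosine(docs[i], docs[j]) for j in selected), default=0.0)
            score = lambda_mult * relevance - (1.0 - lambda_mult) * redundancy
            if score > best_score:
                best_index, best_score = i, score
        selected.append(best_index)
        candidates.remove(best_index)
    return selected


query = [1.0, 1.0]
docs = [[1.0, 0.9], [1.0, 0.91], [0.2, 1.0]]  # docs 0 and 1 are near-duplicates
print(mmr_select(query, docs, k=2))                   # balances relevance and diversity
print(mmr_select(query, docs, k=2, lambda_mult=1.0))  # relevance only
```

With `lambda_mult=1.0` the redundancy term vanishes and the selection reduces to ordinary top-k by similarity, which picks both near-duplicates. With the default of 0.5, the second pick is the less similar but more distinct document.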

In [10]:
def document_retriever(state: AgentState) -> AgentState:
    """
    Document retrieval function to fetch relevant documents based on the user's query.

    Args:
        state (AgentState): The current state of the agent, including the user's query and context.

    Returns:
        AgentState: The updated state of the agent with the retrieved documents.
    """

    query = state["query"]
    strategy = state["retrieval_strategy"]

    if strategy == "multi_step":
        docs = vector_store.max_marginal_relevance_search(query, k=5, fetch_k=20)
    elif strategy == "deep_analysis":
        expanded_query = llm.invoke(f"Expand the search query: {query}")
        docs = vector_store.similarity_search(expanded_query, k=7)
    else:
        docs = vector_store.similarity_search(query, k=3)

    processed_docs = []
    for doc in docs:
        metadata = doc.metadata
        source = metadata.get("source", "Unknown source")
        page = metadata.get("page", "N/A")
        processed_docs.append(
            f"[{Path(source).name} - page {page}]\n{doc.page_content}"
        )

    state["retrieved_docs"] = processed_docs
    return state

### Fact checker node

The Fact Checker node is responsible for verifying the accuracy of generated responses against the retrieved document evidence. This critical validation step ensures that the information provided to users is factually correct and properly supported by source materials.

Key functions of the Fact Checker:

- Systematic verification: Compares each statement in the response against information in the retrieved documents
- Contradiction detection: Identifies claims that conflict with source materials or lack supporting evidence
- Quantitative assessment: Provides a numerical accuracy score to measure response quality
- Correction guidance: When inaccuracies are found, highlights specific errors and suggests improvements

The node uses a structured prompt that guides the LLM to methodically evaluate response accuracy, looking for supported statements, contradictions, and information gaps. This approach creates a feedback loop that improves response quality and maintains factual integrity throughout the RAG workflow.

Running this verification step before delivering responses helps reduce the risk of hallucinations and factual errors that can undermine trust in AI-generated content.
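
The structured format requested from the model (accuracy level, incorrect items, suggestions) can be turned into fields that the workflow can act on. The parser below is a sketch that assumes the model follows that format reasonably closely. `parse_verification` is a hypothetical helper, and treating "none" or an empty list as "no errors" is an assumption about how the model phrases a clean result.

```python
import re
from typing import Any, Dict, Optional


def parse_verification(text: str) -> Dict[str, Any]:
    """Extract the accuracy percentage and the incorrect-items section."""
    accuracy: Optional[float] = None
    accuracy_match = re.search(r"Accuracy level:\s*(\d+(?:\.\d+)?)\s*%", text)
    if accuracy_match:
        accuracy = float(accuracy_match.group(1))

    incorrect = ""
    items_match = re.search(
        r"Incorrect items:(.*?)(?:-?\s*Suggestions:|\Z)", text, flags=re.DOTALL
    )
    if items_match:
        incorrect = items_match.group(1).strip()

    # An empty section, "[]", "none", or "n/a" all count as "nothing to fix".
    nothing_listed = incorrect.lower().strip("[]. ") in {"", "none", "n/a"}
    return {
        "accuracy": accuracy,
        "incorrect_items": incorrect,
        "needs_correction": not nothing_listed,
    }


sample = "- Accuracy level: 85%\n- Incorrect items: [none]\n- Suggestions: [add page numbers]"
print(parse_verification(sample))
```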

In [11]:
def fact_checker(state: AgentState) -> AgentState:
    """
    Fact-checking function to verify the accuracy of the generated response based on retrieved documents.

    Args:
        state (AgentState): The current state of the agent, including the user's query and context.

    Returns:
        AgentState: The updated state of the agent with the verification results.
    """

    response = state["response"]
    docs = state["retrieved_docs"]

    verification_prompt = f"""
    Check the accuracy of the following answer based on the retrieved documents:

    ANSWER: {response}

    DOCUMENTS: {docs}

    Identify:
    1. Statements supported by the documents
    2. Statements that contradict the documents
    3. Missing information

    Format:
    - Accuracy level: X%
    - Incorrect items: [list]
    - Suggestions: [list]
    """

    verification_result = llm.invoke(verification_prompt)

    # The requested format always contains "Incorrect items:", so flag a
    # correction only when that section actually lists something.
    errors = ""
    if "Incorrect items:" in verification_result:
        errors = verification_result.split("Incorrect items:")[1].split("- Suggestions:")[0].strip()

    if errors.strip("[]. ").lower() not in ("", "none", "n/a"):
        state["needs_correction"] = True
        state["reflection_feedback"] = f"Errors found: {errors}"
    else:
        state["needs_correction"] = False
    return state

### Keyword analyzer node

The Keyword Analyzer node extracts key search terms and contextual information from user queries to enhance retrieval accuracy. It serves as a preprocessing step that helps narrow down and focus document searches by identifying:

- Main topics and concepts relevant to the query
- Potential search operators for more targeted retrieval
- Time-based constraints or references that might limit the search scope

This analysis provides additional context that improves downstream processing by:

1. Enriching the query with semantic dimensions beyond simple keyword matching
2. Identifying specialized terminology that might require focused attention
3. Determining temporal aspects that might affect document relevance

By breaking down complex queries into their fundamental components, the keyword analyzer enables more precise document retrieval and helps optimize the use of computational resources by focusing searches on the most relevant content areas.

The structured output from this node is stored in the state context. In the current graph the retriever searches with the raw query and the summarizer later overwrites `context`, so this output is informational unless a downstream node is changed to read it.
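
If a later node is meant to use the extracted keywords, the formatted reply first has to be parsed. The following sketch turns the three expected fields into lists. `parse_keyword_output` is a hypothetical helper, and it assumes the model keeps the field labels from the prompt.

```python
from typing import Dict, List

EXPECTED_FIELDS = ("main topics", "search operators", "time limits")


def parse_keyword_output(text: str) -> Dict[str, List[str]]:
    """Parse '- Field: a, b' lines into {field: [a, b]} for the expected fields."""
    fields: Dict[str, List[str]] = {}
    for raw_line in text.splitlines():
        line = raw_line.strip().lstrip("-").strip()
        if ":" not in line:
            continue
        key, value = line.split(":", 1)  # split on the first colon only
        key = key.strip().lower()
        if key not in EXPECTED_FIELDS:
            continue
        items = [item.strip() for item in value.strip().strip("[]").split(",")]
        fields[key] = [item for item in items if item]
    return fields


reply = "- Main topics: cats, human interaction\n- Search operators: none\n- Time limits: [2020, 2021]"
print(parse_keyword_output(reply))
```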

In [12]:
def keyword_analyzer(state: AgentState) -> AgentState:
    """
    Keyword analysis function to extract keywords and search contexts from the user's query.

    Args:
        state (AgentState): The current state of the agent, including the user's query and context.

    Returns:
        AgentState: The updated state of the agent with the extracted keywords and contexts.
    """

    query = state["query"]

    extraction_prompt = f"""
    Break down the following question into keywords and search contexts:

    QUESTION: {query}

    Format:
    - Main topics: [comma separated]
    - Search operators: [site:, filetype:, etc.]
    - Time limits: [years, dates]
    """

    extraction_result = llm.invoke(extraction_prompt)
    state["context"] = extraction_result
    return state

### Summarizer node

The Summarizer node creates concise summaries of retrieved document chunks, serving as an intermediate processing step for complex queries. This component streamlines vast amounts of information into digestible formats that:

1. **Highlight key information**: Extracts the most relevant facts and concepts from document chunks
2. **Reduce cognitive load**: Condenses lengthy text into essential points for easier comprehension
3. **Maintain source attribution**: Preserves document origins for traceability and verification

When the Orchestrator determines that a query requires deep analysis, the summarization process runs after document retrieval but before response generation. This approach helps manage information complexity by:

- Synthesizing multiple document perspectives into a coherent narrative
- Eliminating redundant information across document chunks
- Preserving crucial details while reducing overall volume

The summarizer uses a structured prompt template that instructs the LLM to create bullet-point summaries with source citations, ensuring that information remains attributable and verifiable. By functioning as an information filter, this node significantly improves response quality for complex analytical queries.
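
Before the excerpts go into the prompt, exact repeats can be dropped and the total size capped so that the prompt stays within the model's context window. This is a sketch under assumptions: `build_excerpt_block` is a hypothetical helper, duplicates are detected only after whitespace and case normalization, and the 6000-character budget is an arbitrary example value.

```python
from typing import List


def build_excerpt_block(docs: List[str], max_chars: int = 6000) -> str:
    """Join unique excerpts with blank lines, stopping before max_chars is exceeded."""
    seen = set()
    parts: List[str] = []
    used = 0
    for doc in docs:
        key = " ".join(doc.split()).lower()  # normalize whitespace and case
        if key in seen:
            continue  # skip an excerpt that was already included
        separator_cost = 2 if parts else 0  # the "\n\n" between excerpts
        if used + separator_cost + len(doc) > max_chars:
            break  # keep earlier (higher-ranked) excerpts, drop the rest
        seen.add(key)
        parts.append(doc)
        used += separator_cost + len(doc)
    return "\n\n".join(parts)


excerpts = [
    "[a.pdf - page 1]\nCats purr.",
    "[a.pdf - page 1]\nCats  purr.",
    "[b.pdf - page 2]\nDogs bark.",
]
print(build_excerpt_block(excerpts))
```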

In [13]:
def summarizer(state: AgentState) -> AgentState:
    """
    Summarization function to create a concise summary of the retrieved documents.

    Args:
        state (AgentState): The current state of the agent, including the user's query and context.

    Returns:
        AgentState: The updated state of the agent with the generated summary.
    """

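    docs = state["retrieved_docs"]

    # Separate excerpts with blank lines so that one chunk's text does not run
    # into the next chunk's source header.
    excerpts = "\n\n".join(docs)

    summary_prompt = f"""
    Summarize the following document excerpts, highlighting the most important information:

    {excerpts}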

    Important:
    - Maximum 5 sentences
    - Use bullet points (but only as text)
    - Cite sources: [Document Name - page X]
    """

    state["context"] = llm.invoke(summary_prompt)
    return state

### Response generator node

The Response Generator node is responsible for creating comprehensive and informative answers to user queries based on the retrieved documents and analyzed context. This component transforms raw document information into coherent, user-friendly responses that directly address the original question.

Key functions of the Response Generator:

1. **Contextual understanding**: Leverages both the original query and accumulated context to generate relevant responses
2. **Structured output**: Creates organized answers with clear sections including summary, detailed analysis, and source references
3. **Source attribution**: Maintains academic integrity by citing the specific documents that informed the response
4. **Supplementary guidance**: Provides additional suggestions or related topics when appropriate

The node uses a structured prompt template that guides the LLM to create consistent, well-formatted responses that balance brevity with comprehensiveness. This approach ensures that users receive clear, actionable information while maintaining transparency about the source of that information.

By functioning as the synthesis engine of the RAG workflow, this node translates complex document retrieval and analysis into natural language responses that effectively satisfy user information needs while maintaining factual accuracy and source traceability.

In [14]:
def response_generator(state: AgentState) -> AgentState:
    """
    Response generation function to create a comprehensive answer based on the user's query and context.

    Args:
        state (AgentState): The current state of the agent, including the user's query and context.

    Returns:
        AgentState: The updated state of the agent with the generated response.
    """

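    context = state["context"]
    query = state["query"]

    # On a correction pass, hand the reviewer's feedback to the model;
    # otherwise the same prompt would simply be sent again.
    feedback = state.get("reflection_feedback") if state.get("needs_correction") else None
    revision_note = f"\n    A previous answer was rejected. Fix these issues: {feedback}\n" if feedback else ""

    response_prompt = f"""
    Prepare a comprehensive answer to the following question:

    QUESTION: {query}

    CONTEXT: {context}
    {revision_note}
    Follow the structure below:
    1. Short summary (1 sentence)
    2. Detailed analysis (maximum 5 sentences)
    3. References: [Document Name - page X]
    4. Additional suggestions (if relevant)
    """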

    state["response"] = llm.invoke(response_prompt)
    return state

### Reflection node

The Reflection node serves as a quality control mechanism that critically evaluates generated responses before they reach the user. This node performs a thorough analysis of the response against the retrieved documents to ensure factual accuracy and consistency.

Key functions of the Reflection node:

- **Deep verification**: Analyzes the response for factual accuracy by cross-referencing with retrieved documents
- **Error identification**: Detects inaccurate statements, contradictions, missing citations, and inconsistencies
- **Feedback loop**: Provides detailed analysis when issues are found to guide correction
- **Quality assurance**: Acts as a final checkpoint before delivering information to the user

The node uses a structured prompt that guides the LLM to systematically evaluate the response across multiple dimensions of accuracy. When errors are detected, the workflow routes the response back to the Response Generator for revision based on the reflection feedback.

This self-correcting mechanism significantly improves response quality by catching potential errors that might have escaped earlier verification steps. By implementing this multi-layered verification approach, the system can maintain high standards of accuracy while reducing the likelihood of delivering misleading information to users.

In [15]:
def reflection(state: AgentState) -> AgentState:
    """
    Reflection function to analyze the generated response and check for inconsistencies or errors.

    Args:
        state (AgentState): The current state of the agent, including the user's query and context.

    Returns:
        AgentState: The updated state of the agent with the reflection results.
    """

    response = state["response"]
    retrieved_docs = state["retrieved_docs"]

    reflection_prompt = f"""
    Analyze the following answer using the retrieved documents:

    ANSWER: {response}

    DOCUMENTS: {retrieved_docs}

    Identify:
    - Inaccurate statements
    - Contradictory statements
    - Missing source references
    - Inconsistencies with documents

    If the answer is correct, respond with "allcorrect". If not, provide a detailed analysis of the issues found.
    """

    reflection_result = llm.invoke(reflection_prompt)
    # Compare case-insensitively: the model may answer "AllCorrect" or "ALLCORRECT".
    if "allcorrect" not in reflection_result.lower():
        state["needs_correction"] = True
        state["reflection_feedback"] = reflection_result
    else:
        state["needs_correction"] = False
    return state

# 6. Workflow building

This section details the construction of a state-based workflow graph for our Retrieval-Augmented Generation (RAG) system. The workflow orchestrates how information flows between the specialized nodes to create an intelligent document retrieval and question answering system.

## Workflow architecture

The workflow follows a directed graph structure where:

- Each node represents a specialized processing component (orchestrator, retriever, generator, etc.)
- Edges determine the flow of information between components
- Conditional logic enables dynamic routing based on processing outcomes
- State management preserves context across the entire processing pipeline

## Key workflow components

1. **Core processing path**: Orchestrates the flow from query analysis through document retrieval to response generation
2. **Adaptive routing**: Implements conditional paths based on query complexity and verification results
3. **Self-correction mechanism**: Creates feedback loops when responses need improvement
4. **State preservation**: Maintains context, retrieved documents, and processing history throughout execution

## Benefits of graph-based workflow

- **Modularity**: Each node can be developed, tested, and optimized independently
- **Flexibility**: Easy to modify the workflow by adding, removing, or rerouting nodes
- **Visibility**: Clear visualization of information flow and decision points
- **Maintainability**: Well-defined interfaces between components simplify updates

The graph design pattern enables complex decision-making while maintaining clean separation of concerns between different stages of the RAG process, resulting in more robust and adaptable document-based question answering.
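
The ideas above (nodes as functions, fixed edges, conditional routing, shared state) can be shown with a toy runner that uses only the standard library. This is not how LangGraph is implemented. `run_graph`, the node names, and the step limit are all made up for this sketch.

```python
from typing import Callable, Dict, List, Tuple

State = Dict[str, object]
END = "__end__"  # sentinel used by this sketch to mark the end of a run


def run_graph(
    entry: str,
    nodes: Dict[str, Callable[[State], State]],
    edges: Dict[str, str],
    routers: Dict[str, Callable[[State], str]],
    state: State,
    max_steps: int = 25,
) -> Tuple[State, List[str]]:
    """Run nodes from entry until END, following routers before fixed edges."""
    current, visited = entry, []
    while current != END:
        if len(visited) >= max_steps:
            raise RuntimeError("step limit reached; the graph may be looping")
        visited.append(current)
        state = nodes[current](state)
        router = routers.get(current)
        current = router(state) if router else edges.get(current, END)
    return state, visited


def generate(state: State) -> State:
    attempts = int(state.get("attempts", 0)) + 1
    return {**state, "attempts": attempts, "response": f"draft {attempts}"}


def reflect(state: State) -> State:
    # Pretend the first draft is rejected and the second one is accepted.
    return {**state, "needs_correction": int(state["attempts"]) < 2}


final_state, path = run_graph(
    entry="generate",
    nodes={"generate": generate, "reflect": reflect},
    edges={"generate": "reflect"},
    routers={"reflect": lambda s: "generate" if s["needs_correction"] else END},
    state={},
)
print(path, final_state["response"])
```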

### Initializing the graph

In [16]:
workflow = StateGraph(AgentState)

### Adding the nodes

In [17]:
workflow.add_node("orchestrator", orchestrator)
workflow.add_node("keyword_analyzer", keyword_analyzer)
workflow.add_node("document_retriever", document_retriever)
workflow.add_node("summarizer", summarizer)
workflow.add_node("response_generator", response_generator)
workflow.add_node("fact_checker", fact_checker)
workflow.add_node("reflection", reflection)


### Adding the edges

In [18]:
workflow.add_edge("orchestrator", "keyword_analyzer")
workflow.add_edge("keyword_analyzer", "document_retriever")
workflow.add_edge("document_retriever", "summarizer")
workflow.add_edge("summarizer", "response_generator")
workflow.add_edge("response_generator", "fact_checker")
workflow.add_edge("fact_checker", "reflection")


### Adding conditional transitions

This section implements conditional routing logic in the RAG workflow graph, enabling dynamic decision paths based on processing outcomes. The conditional transitions allow the system to:

1. **Self-correct responses**: When the reflection step flags a problem, the workflow routes back to the response generator for revision (the flag set by the fact checker is overwritten by the reflection node before routing happens)
2. **Avoid unnecessary work**: End the run as soon as the reflection step accepts the response
3. **Create feedback loops**: Enable iterative improvement through verification-correction cycles

The branching logic particularly focuses on the reflection step, where the system determines whether to:
- Return to the response generator when corrections are needed
- Conclude the workflow when the response meets quality standards

This conditional architecture creates a self-improving system that can identify and address potential inaccuracies before delivering final responses to users, significantly enhancing the reliability of AI-generated content.
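
A verification-correction cycle has no natural stopping point if the reviewer never accepts an answer. One possible safeguard, sketched here, is a routing function that also checks an attempt counter. The `correction_attempts` key and the `MAX_CORRECTIONS` limit are assumptions that are not part of `AgentState` as defined above; the response generator would have to increment the counter on each pass.

```python
from typing import Any, Dict

END = "__end__"      # stand-in for langgraph.graph.END so the sketch runs on its own
MAX_CORRECTIONS = 2  # assumed limit on regeneration passes


def route_with_limit(state: Dict[str, Any]) -> str:
    """Route back for a correction only while the attempt budget is not used up."""
    attempts = state.get("correction_attempts", 0)
    if state.get("needs_correction") and attempts < MAX_CORRECTIONS:
        return "response_generator"
    return END


print(route_with_limit({"needs_correction": True, "correction_attempts": 0}))
print(route_with_limit({"needs_correction": True, "correction_attempts": 2}))
```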

In [19]:
def route_after_reflection(state: AgentState) -> str:
    """
    Routing function to determine the next step after reflection based on the state.

    Args:
        state (AgentState): The current state of the agent, including the user's query and context.

    Returns:
        str: The next node to process based on the reflection results.
    """

    return "response_generator" if state["needs_correction"] else END

workflow.add_conditional_edges(
    "reflection",
    route_after_reflection,
    {
        "response_generator": "response_generator",
        END: END,
    },
)


### Defining the entry point and end point

In [20]:
workflow.set_entry_point("orchestrator")
workflow.add_edge("reflection", END)


# 7. Compilation

In [21]:
agentic_rag = workflow.compile()

# 8. Starting the workflow

In [22]:
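def chat_with_agent():
    """Run an interactive question-answer loop until the user types 'exit'."""
    while True:
        query = input("Ask something (type 'exit' to leave): ")
        if query.lower().strip() == "exit":
            break

        # Initialize every key declared in AgentState.
        initial_state = {
            "messages": [HumanMessage(content=query)],
            "context": None,
            "query": query,
            "retrieval_strategy": None,
            "next_nodes": None,
            "retrieved_docs": [],
            "response": None,
            "needs_correction": None,
            "reflection_feedback": None,
        }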

        final_state = agentic_rag.invoke(initial_state)

        print("\nResponse:")
        print(final_state["response"])
        print("\nUsed documents:")
        for doc in final_state["retrieved_docs"]:
            print(f"- {doc[:100]}...")

In [23]:
chat_with_agent()


Response:
Here is a comprehensive answer to the question "What cats do around humans":

**Summary:** Domestic cats form social groups and have forms of communication that help them live harmoniously, even in close proximity to humans.

**Detailed Analysis:** While it's common to think of cats as solitary animals, research has shown that domestic cats are actually social creatures. They have a natural inclination to form colonies or groups when there is sufficient food resources available, which allows for social interaction and communication. This social behavior is not limited to free-living domestic cats; even those living with humans can exhibit forms of social organization and communication. In fact, many domestic cats choose to live in close proximity to their human caregivers, often seeking out human interaction and affection.

**References:** [FelineBehaviorGLS.pdf - page 8]

**Additional Suggestions:** For cat owners or those interested in feline behavior, observing and unders

# -- How to test and measure the performance of the system --

## Performance metrics

The evaluation of the RAG system should be based on both quantitative metrics and qualitative assessments:

### Quantitative metrics
- **Response time**: Measure the time taken from query submission to final response
- **Retrieval precision**: Calculate the relevance of retrieved documents to the query
- **Retrieval recall**: Assess how many relevant documents were retrieved versus missed
- **Answer accuracy**: Compare generated responses against ground truth answers
- **Correction rate**: Track how often the system needed to self-correct responses
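
Retrieval precision and recall can be computed per query once a set of relevant chunk identifiers has been labeled by hand. A minimal sketch, assuming chunks are identified by strings such as `"file.pdf:8"` (a made-up convention) and that `precision_recall_at_k` is a hypothetical helper:

```python
from typing import List, Set, Tuple


def precision_recall_at_k(
    retrieved: List[str], relevant: Set[str], k: int
) -> Tuple[float, float]:
    """Precision and recall over the first k retrieved ids."""
    top = retrieved[:k]
    hits = sum(1 for doc_id in top if doc_id in relevant)
    precision = hits / len(top) if top else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


retrieved_ids = ["a.pdf:1", "b.pdf:4", "a.pdf:8", "c.pdf:2"]
relevant_ids = {"a.pdf:1", "a.pdf:8", "d.pdf:3"}
print(precision_recall_at_k(retrieved_ids, relevant_ids, k=3))
```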

### Qualitative assessments
- **Response coherence**: Evaluate the logical flow and organization of responses
- **Source attribution**: Check if responses properly cite relevant document sources
- **Factual consistency**: Verify that responses align with information in the documents
- **Query understanding**: Assess how well the system interprets different question types

## Testing methodology

### Benchmark testing
1. Create a test set of representative queries with known ground truth answers
2. Execute each query through the workflow and record performance metrics
3. Compare results across different query types (factual, analytical, simple)
4. Identify patterns in performance variations based on query complexity
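
The four steps above can be sketched as a small harness that times each query and groups results by query type. `run_benchmark`, `summarize`, and the test-case fields are hypothetical, and the substring check is only a crude stand-in for a real answer-accuracy measure. In the notebook, `answer_fn` would wrap `agentic_rag.invoke`.

```python
import time
from collections import defaultdict
from statistics import mean
from typing import Callable, Dict, List


def run_benchmark(answer_fn: Callable[[str], str], test_cases: List[dict]) -> List[dict]:
    """Run every test query once and record latency and a crude correctness flag."""
    rows = []
    for case in test_cases:
        started = time.perf_counter()
        answer = answer_fn(case["query"])
        elapsed = time.perf_counter() - started
        rows.append({
            "type": case["type"],
            "seconds": elapsed,
            # Crude check: does the answer mention the expected phrase?
            "correct": case["expected"].lower() in answer.lower(),
        })
    return rows


def summarize(rows: List[dict]) -> Dict[str, dict]:
    """Aggregate accuracy and mean latency per query type."""
    grouped = defaultdict(list)
    for row in rows:
        grouped[row["type"]].append(row)
    return {
        query_type: {
            "n": len(items),
            "accuracy": mean([1.0 if r["correct"] else 0.0 for r in items]),
            "mean_seconds": mean([r["seconds"] for r in items]),
        }
        for query_type, items in grouped.items()
    }


def stub_answer(query: str) -> str:
    """Placeholder for the real workflow call."""
    return "Cats form social groups." if "cats" in query.lower() else "I do not know."


cases = [
    {"query": "Do cats form groups?", "type": "factual", "expected": "social groups"},
    {"query": "Why do dogs bark?", "type": "analytical", "expected": "communication"},
]
print(summarize(run_benchmark(stub_answer, cases)))
```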

### Component-level testing
- Test each node individually with controlled inputs to validate its specific functionality
- Measure processing time for each component to identify bottlenecks
- Evaluate embedding model performance using standard NLP metrics
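
Testing a node in isolation is easier when the LLM can be swapped for a stub that returns a canned reply. The sketch below assumes the node receives the model as a parameter, which differs from the notebook, where nodes use a global `llm`. `FakeLLM` and `keyword_node` are hypothetical.

```python
from typing import Any, Dict, List


class FakeLLM:
    """Stub with the same invoke(prompt) shape as the real model wrapper."""

    def __init__(self, reply: str) -> None:
        self.reply = reply
        self.prompts: List[str] = []

    def invoke(self, prompt: str) -> str:
        self.prompts.append(prompt)  # record prompts so tests can inspect them
        return self.reply


def keyword_node(state: Dict[str, Any], llm: Any) -> Dict[str, Any]:
    """Simplified keyword-analysis node that takes its model as an argument."""
    prompt = f"Break down the following question into keywords:\n\nQUESTION: {state['query']}"
    return {**state, "context": llm.invoke(prompt)}


fake = FakeLLM("- Main topics: cats, diet")
result = keyword_node({"query": "What do cats eat?", "context": None}, fake)
assert result["context"] == "- Main topics: cats, diet"
assert "What do cats eat?" in fake.prompts[0]
print("keyword_node passed")
```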

### End-to-end evaluation
- Conduct user acceptance testing with real-world queries
- Gather feedback on response quality, relevance, and usefulness
- Use A/B testing to compare different workflow configurations

## Continuous improvement
- Implement logging to track performance metrics over time
- Establish baseline performance and set improvement targets
- Regularly review and refine the workflow based on performance data
- Consider creating a feedback mechanism for users to rate response quality

# -- Bottlenecks of the current implementation --

## Performance limitations
- **Memory consumption**: Loading multiple document embeddings simultaneously creates significant RAM requirements
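- **Processing latency**: Every query makes at least six sequential LLM calls (seven when the deep-analysis strategy expands the query), and each correction cycle adds three more, so complex queries can take several seconds or longer to complete
- **Local model constraints**: Running the model locally through Ollama gains privacy and avoids API costs, but the models that fit on consumer hardware are less capable than large cloud-hosted models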

## Architectural constraints
- **Sequential processing**: The workflow executes nodes sequentially rather than leveraging parallel processing
- **Limited error handling**: Error management primarily focuses on content accuracy rather than technical failures
- **Fixed retrieval strategies**: Predefined retrieval methods may not adapt well to highly specialized queries
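
Keyword analysis and document retrieval both read only the query and the chosen strategy, so they are candidates for running side by side. The sketch below shows the general pattern with `concurrent.futures`. The two worker functions are stubs that sleep instead of calling a model or a vector store, and `run_in_parallel` is a hypothetical helper.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Dict


def analyze_keywords(query: str) -> Dict[str, Any]:
    time.sleep(0.05)  # stands in for an LLM call
    return {"context": f"keywords for: {query}"}


def retrieve_documents(query: str) -> Dict[str, Any]:
    time.sleep(0.05)  # stands in for a vector store lookup
    return {"retrieved_docs": [f"doc about {query}"]}


def run_in_parallel(query: str) -> Dict[str, Any]:
    """Run both independent steps concurrently and merge their partial results."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        keyword_future = pool.submit(analyze_keywords, query)
        docs_future = pool.submit(retrieve_documents, query)
        merged: Dict[str, Any] = {"query": query}
        merged.update(keyword_future.result())
        merged.update(docs_future.result())
    return merged


print(run_in_parallel("cat behavior"))
```

Threads suit this case because both steps mostly wait on I/O (an HTTP call to the local model server or a database lookup) rather than on Python computation.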

## Scaling challenges
- **Document volume limitations**: Performance degrades with very large document collections
- **Cold-start efficiency**: Initial setup requires full processing of all documents before first query

## Future optimization opportunities
- Add caching mechanisms for frequently accessed documents and embeddings
- Introduce parallel processing for independent workflow nodes
- Develop more sophisticated retrieval strategies with hybrid search capabilities
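
As a starting point for the caching idea, repeated texts can be memoized so that the embedding function runs only once per distinct input. This is a sketch: `embed_cached` returns a hash-derived placeholder vector, not a real embedding, and the call counter exists only to show the cache working.

```python
import hashlib
from functools import lru_cache
from typing import Tuple

CALLS = {"count": 0}  # counts how often the expensive function body actually runs


@lru_cache(maxsize=1024)
def embed_cached(text: str) -> Tuple[float, ...]:
    """Return a placeholder vector for text; repeated inputs are served from the cache."""
    CALLS["count"] += 1
    digest = hashlib.sha256(text.encode("utf-8")).digest()
    # Placeholder: eight bytes scaled to [0, 1]. A real version would call the embedding model.
    return tuple(byte / 255 for byte in digest[:8])


first = embed_cached("What do cats do around humans?")
second = embed_cached("What do cats do around humans?")
print(first == second, CALLS["count"], embed_cached.cache_info().hits)
```

A tuple is returned rather than a list so that callers cannot mutate the cached value by accident.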