# Manual Agentic AI – Pure Python Baseline

This notebook first demonstrates a basic example of agentic behavior in pure Python, without relying on high-level orchestration frameworks like LangChain or CrewAI.

We define an agent-like structure by combining:
- **Perception:** Load and embed document content into vector representations.
- **Planning/Reasoning:** Retrieve the most relevant paragraph for a question using semantic similarity.
- **Action:** Use a question-answering model to extract an answer from the selected context.

This hands-on example shows how we can *manually construct agentic behavior* using existing NLP tools such as `sentence-transformers` and `transformers`.

In later parts of the notebook, we will revisit this same logic using LangChain and CrewAI to contrast manual and framework-based approaches to agentic AI.
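
Before bringing in the real models, the three stages can be sketched with the standard library alone. This is only an illustration under simplifying assumptions: word counts stand in for neural embeddings, and the `act` function is a hypothetical placeholder for the question-answering model. The names `embed`, `cosine`, `perceive`, `plan`, and `act` are invented for this sketch and are not part of any library.

```python
import math
import re
from collections import Counter

def embed(text):
    """Toy 'embedding': lowercase word counts (a stand-in for a neural encoder)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def perceive(document):
    """Perception: split the document into paragraphs and embed each one."""
    paragraphs = [p.strip() for p in document.split("\n\n") if p.strip()]
    return paragraphs, [embed(p) for p in paragraphs]

def plan(question, paragraphs, vectors):
    """Planning: pick the paragraph most similar to the question."""
    q = embed(question)
    scores = [cosine(q, v) for v in vectors]
    best = max(range(len(scores)), key=scores.__getitem__)
    return paragraphs[best], scores[best]

def act(question, context):
    """Action: placeholder for the QA model; returns the first sentence of the context."""
    return context.split(". ")[0].rstrip(".")

document = (
    "Burning fossil fuels releases most of the CO2 emissions caused by humans.\n\n"
    "Sea levels are rising as glaciers and ice sheets melt."
)
question = "What causes the most CO2 emissions?"
paragraphs, vectors = perceive(document)
context, score = plan(question, paragraphs, vectors)
answer = act(question, context)
print(answer)
```

The real cell below keeps the same three-step shape and swaps in `SentenceTransformer` for `embed` and the `transformers` pipeline for `act`.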


In [1]:
# Agentic LLMs Demo with Local Ollama
# Goal: A student uploads a climate document. Agents can summarize it, answer questions, and use tools (e.g. web search if needed).
# Implemented progressively with: Pure Python, LangChain, LlamaIndex, CrewAI, AutoGen

# ------------------- Pure Python Agentic AI -------------------

# Import required libraries for sentence embedding and question answering
from sentence_transformers import SentenceTransformer, util
from transformers import pipeline

# Load the sentence transformer model for converting text to embeddings.
# This will allow us to compare text paragraphs and questions semantically.
embedder = SentenceTransformer("all-MiniLM-L6-v2")

# Load a question-answering pipeline using a pretrained transformer model.
# This model will extract answers from the relevant paragraph we provide.
qa_pipeline = pipeline("question-answering", model="distilbert-base-cased-distilled-squad")

# ------------------- Perception Step: Ingest and Encode Knowledge -------------------

# Load the knowledge base from a local text file.
# Assume each paragraph is separated by two line breaks ("\n\n").
with open("climate.txt") as f:
    paragraphs = f.read().split("\n\n")

# Convert each paragraph into a vector representation (embedding).
# These embeddings will later be compared against the question to find the best match.
embeddings = embedder.encode(paragraphs, convert_to_tensor=True)

# ------------------- Planning Step: Interpret Question and Select Context -------------------

# Define a user question – this is the task the agent will try to solve.
question = "What causes the most CO2 emissions?"

# Convert the question into the same embedding space as the document.
q_embedding = embedder.encode(question, convert_to_tensor=True)

# Compute cosine similarity between the question and each paragraph.
# Find the paragraph most semantically similar to the question.
# .item() converts the 0-dim tensor into a plain int for list indexing.
idx = util.pytorch_cos_sim(q_embedding, embeddings)[0].argmax().item()

# Retrieve the best-matching paragraph as the context for answering the question.
context = paragraphs[idx]

# ------------------- Action Step: Answer the Question -------------------

# Pass the question and selected context to the QA model.
# The model returns a short answer extracted from the context.
result = qa_pipeline(question=question, context=context)

# Display the answer to the user.
print("[Pure Python]", result["answer"])



Hardware accelerator e.g. GPU is available in the environment, but no `device` argument is passed to the `Pipeline` object. Model will be on CPU.


[Pure Python] burning of fossil fuels


# Agent with Decision-Making Logic (Multi-Source QA)

In this example, we extend the previous agent to include a simple decision-making mechanism.

The agent tries to answer the question using a local document (`climate.txt`). If the most relevant paragraph from that document is not semantically close enough to the question, the agent assumes that the document does not contain the needed information.

It then **falls back to a secondary source** — here, Wikipedia — and attempts to retrieve a summary related to the question and answer based on that.

This illustrates:
- Basic **planning** (decide between multiple tools/sources),
- Use of **semantic similarity threshold** to evaluate confidence,
- Use of external resources for dynamic behavior.
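
The decision rule itself is small enough to isolate. The sketch below assumes a similarity score has already been computed; `choose_source` and `fake_wikipedia` are hypothetical names used only for illustration, and the fallback is passed in as a callable so that it is only invoked when needed.

```python
def choose_source(local_score, local_context, fallback, threshold=0.5):
    """Return (source_name, context), preferring the local document when confident."""
    if local_score > threshold:
        return "local", local_context
    # fallback is any zero-argument callable that fetches external context
    return "fallback", fallback()

def fake_wikipedia():
    """Hypothetical stand-in for a real Wikipedia lookup."""
    return "Summary text fetched from an external source."

print(choose_source(0.80, "Local paragraph about CO2.", fake_wikipedia))
print(choose_source(0.31, "Unrelated local paragraph.", fake_wikipedia))
```

Passing the fallback as a function rather than a value means the external request is never made when the local document is good enough.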


In [2]:
# ------------------- Agentic QA with Fallback -------------------

from sentence_transformers import SentenceTransformer, util
from transformers import pipeline
import wikipedia

# Load models
embedder = SentenceTransformer("all-MiniLM-L6-v2")
qa_pipeline = pipeline("question-answering", model="distilbert-base-cased-distilled-squad")

# Load document and compute embeddings - try climate.txt and Noclimate.txt
with open("climate.txt") as f:
    paragraphs = f.read().split("\n\n")
embeddings = embedder.encode(paragraphs, convert_to_tensor=True)

# Define the question
question = "What causes the most CO2 emissions?"
q_embedding = embedder.encode(question, convert_to_tensor=True)

# Compute similarity between the question and all paragraphs
cosine_similarities = util.pytorch_cos_sim(q_embedding, embeddings)[0]
best_idx = cosine_similarities.argmax()
best_score = cosine_similarities[best_idx].item()
best_context = paragraphs[best_idx]

# Define a confidence threshold (tunable)
THRESHOLD = 0.5

if best_score > THRESHOLD:
    print(f"Using local context (score={best_score:.2f})")
    context = best_context
else:
    print(f"Local context not relevant enough (score={best_score:.2f}), using Wikipedia instead.")
    
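    # Fallback: Use Wikipedia summary as external context.
    # The `wikipedia` package is configured through module-level functions;
    # it has no `Wikipedia(...)` class, so calling one would raise a NameError.
    wikipedia.set_lang("en")
    wikipedia.set_user_agent("AgenticAITutorial/1.0 (yourname@example.com)")
    wikipedia.set_rate_limiting(True)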

    try:
        # Search Wikipedia for the most relevant page titles
        search_results = wikipedia.search(question)
    
        if search_results:
            # Use the top result
            page_title = search_results[0]
            context = wikipedia.summary(page_title, sentences=5)
            print(f"Using Wikipedia page: {page_title}")
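        else:
            context = None
            print("No relevant Wikipedia search result found.")

    except Exception as e:
        # Network errors, disambiguation pages, and missing pages all land here
        context = None
        print(f"Error retrieving from Wikipedia: {e}")

# Run question-answering only if a usable context was found;
# otherwise the QA model would extract an "answer" from the error message itself.
if context:
    result = qa_pipeline(question=question, context=context)
    print("[Agent Answer]", result["answer"])
else:
    print("[Agent Answer] No context available to answer the question.")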


Hardware accelerator e.g. GPU is available in the environment, but no `device` argument is passed to the `Pipeline` object. Model will be on CPU.


Using local context (score=0.80)
[Agent Answer] burning of fossil fuels


# Agentic QA with LangChain and Ollama

In this version, we implement the same logic as the manual agentic example — retrieve context, answer a question — but using **LangChain**, a framework that abstracts and automates agentic patterns.

LangChain allows us to:
- Load and split documents into chunks,
- Convert these chunks into vector embeddings and store them in a vector database,
- Retrieve the most relevant chunk using semantic search (e.g. via Max Marginal Relevance),
- Send the selected context and question to a language model for answering.
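
Max Marginal Relevance can be illustrated without any framework. The following is a minimal sketch of the general idea, not LangChain's implementation: at each step it picks the candidate that scores highest on relevance to the query minus similarity to what has already been selected. The function names, the toy 2-D vectors, and the `lambda_mult` default are assumptions chosen for the example.

```python
import math

def cosine(a, b):
    """Cosine similarity between two dense vectors given as lists of floats."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def mmr(query_vec, doc_vecs, k=2, lambda_mult=0.5):
    """Return indices of k documents, trading relevance against redundancy."""
    selected = []
    candidates = list(range(len(doc_vecs)))
    while candidates and len(selected) < k:
        def score(i):
            relevance = cosine(query_vec, doc_vecs[i])
            # Redundancy: highest similarity to anything already picked
            redundancy = max(
                (cosine(doc_vecs[i], doc_vecs[j]) for j in selected), default=0.0
            )
            return lambda_mult * relevance - (1 - lambda_mult) * redundancy
        best = max(candidates, key=score)
        selected.append(best)
        candidates.remove(best)
    return selected

query = [1.0, 0.3]
docs = [
    [1.0, 0.2],   # very relevant
    [1.0, 0.25],  # very relevant, near-duplicate of the first
    [0.5, 1.0],   # less relevant, but different
]
print(mmr(query, docs, k=2, lambda_mult=0.5))  # balances relevance and diversity
print(mmr(query, docs, k=2, lambda_mult=1.0))  # pure relevance ranking
```

With `lambda_mult=0.5` the second pick skips the near-duplicate in favor of the more distinct vector; with `lambda_mult=1.0` the procedure reduces to plain similarity ranking.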

We use the following LangChain modules:
- `TextLoader` to load the document,
- `RecursiveCharacterTextSplitter` to chunk it,
- `OllamaEmbeddings` to generate vector embeddings (via a local embedding model),
- `Chroma` as the vector database,
- `RetrievalQA` to wrap everything into an end-to-end QA system.
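
To see what chunking with overlap does, here is a simplified fixed-width splitter. It is an assumption-laden sketch: `RecursiveCharacterTextSplitter` additionally tries to break on separators such as paragraph and sentence boundaries, which this version ignores. `split_with_overlap` is a hypothetical helper, not a LangChain function.

```python
def split_with_overlap(text, chunk_size=500, chunk_overlap=100):
    """Cut text into fixed-width chunks where neighbors share chunk_overlap characters."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last chunk already reaches the end of the text
    return chunks

text = "abcdefghij" * 3  # 30 characters
chunks = split_with_overlap(text, chunk_size=12, chunk_overlap=4)
for c in chunks:
    print(repr(c))
```

The overlap means a sentence that straddles a chunk boundary still appears intact in at least one chunk, at the cost of storing some text twice.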

This implementation demonstrates how **agentic workflows** (perception → retrieval → reasoning → output) can be modularized and scaled using LangChain components.


In [3]:
# --------------- LANGCHAIN + OLLAMA (compact version) ---------------
from langchain_community.llms import Ollama
from langchain.embeddings import OllamaEmbeddings
from langchain.vectorstores import Chroma
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.document_loaders import TextLoader
from langchain.chains import RetrievalQA

loader = TextLoader("climate.txt")
raw_docs = loader.load()
splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=100)
docs = splitter.split_documents(raw_docs)

embeddings = OllamaEmbeddings(model="mxbai-embed-large")
vs = Chroma.from_documents(docs, embeddings, persist_directory="lc_chroma")
retriever = vs.as_retriever(search_type="mmr")

llm = Ollama(model="llama3")
qa = RetrievalQA.from_chain_type(llm=llm, retriever=retriever)
print("[LangChain]", qa.run("What causes the most CO2 emissions?"))
# --------------- LANGCHAIN + OLLAMA (annotated version of the pipeline above) ---------------

# Load the required LangChain components
from langchain_community.llms import Ollama                    # Local LLM interface (e.g., llama3)
from langchain.embeddings import OllamaEmbeddings              # Embedding model via Ollama (e.g., mxbai-embed-large)
from langchain.vectorstores import Chroma                      # Vector store to store and search chunk embeddings
from langchain.text_splitter import RecursiveCharacterTextSplitter  # Splits long text into manageable chunks
from langchain.document_loaders import TextLoader              # Utility to load text files as documents
from langchain.chains import RetrievalQA                       # Retrieval-augmented QA pipeline

# ------------------- Perception: Load and Prepare Knowledge -------------------

# Load the raw document from file
loader = TextLoader("climate.txt")
raw_docs = loader.load()  # List of Document objects

# Split the document into smaller overlapping chunks for better semantic matching
splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=100)
docs = splitter.split_documents(raw_docs)

# ------------------- Representation: Embed and Store Chunks -------------------

# Load a local embedding model via Ollama (can be changed to any compatible embedding model)
embeddings = OllamaEmbeddings(model="mxbai-embed-large")

# Create or load a Chroma vector store from the split chunks
vs = Chroma.from_documents(docs, embeddings, persist_directory="lc_chroma")

# Create a retriever using Max Marginal Relevance (MMR) to reduce redundancy
retriever = vs.as_retriever(search_type="mmr", search_kwargs={"k": 4})

# ------------------- Reasoning: Setup the LLM -------------------

# Load the local language model (e.g., LLaMA 3 via Ollama)
llm = Ollama(model="llama3")

# Wrap everything in a RetrievalQA chain: retrieve relevant chunk, pass to LLM, and get answer
qa = RetrievalQA.from_chain_type(llm=llm, retriever=retriever)

# ------------------- Action: Ask a Question -------------------

# Run the full agentic pipeline
print("[LangChain]", qa.invoke("What causes the most CO2 emissions?"))






[LangChain] The largest contributor to CO2 emissions is the burning of fossil fuels for energy and transportation. This includes coal-fired power plants, gasoline-powered vehicles, and industrial processes like cement production.
[LangChain] {'query': 'What causes the most CO2 emissions?', 'result': 'The largest contributor to CO2 emissions is the burning of fossil fuels for energy and transportation. This includes coal-fired power plants, gasoline-powered vehicles, and industrial processes like cement production.'}


This LangChain RAG setup behaves like an agent in a sandbox: it perceives its environment (a fixed document), decides which information is most relevant (via retrieval), and takes a single goal-driven action (answering the question). It is not a fully autonomous agent: it does not choose between multiple tools, keep memory of past steps, or perform multi-step reasoning. It does, however, capture the core structure of one.


In [4]:
# --------------- LLAMAINDEX + OLLAMA ---------------

# Import the vector index and document loader
from llama_index.core import VectorStoreIndex, SimpleDirectoryReader

# Import Ollama-compatible embedding and LLM modules
from llama_index.embeddings.ollama import OllamaEmbedding
from llama_index.llms.ollama import Ollama as LlamaIndexOllama

# ------------------- Perception: Load and Understand Data -------------------

# Instantiate an Ollama-powered LLM (e.g., llama3)
llm = LlamaIndexOllama(model="llama3")

# Create an embedding model for representing text chunks semantically
embed = OllamaEmbedding(model_name="mxbai-embed-large")

# Load the raw document(s) using LlamaIndex's simple file reader
docs = SimpleDirectoryReader(input_files=["climate.txt"]).load_data()

# ------------------- Representation: Build Index -------------------

# Create a vector index from the loaded documents using the embedding model
index = VectorStoreIndex.from_documents(docs, embed_model=embed)

# ------------------- Reasoning & Action: Query Engine -------------------

# Create a query engine that:
# - Uses the LLM to reason over chunks
# - Automatically retrieves the most relevant content from the index
qe = index.as_query_engine(llm=llm)

# Ask a question and get the response from the LLM
response = qe.query("What causes the most CO2 emissions?")

# Output the result
print("[LlamaIndex]", response)




[LlamaIndex] The burning of fossil fuels for energy and transportation.


### Why This Is Agentic

Even though the code looks shorter than LangChain’s version, the behavior is still *agentic* because:

- **Autonomous Retrieval**: The system decides *what context to retrieve* from the index in response to a user query.
- **LLM Reasoning**: The model interprets the question, synthesizes the answer, and doesn’t follow a rigid path.
- **Separation of Concerns**: The document is separated from the logic. The system autonomously chooses relevant chunks at runtime.
- **Tool Use (Implied)**: The LLM is *using* the retrieval tool (index) to guide its generation — a core sign of agentic reasoning.
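
What a query engine packages can be approximated in a few lines. The class below is a toy sketch, not LlamaIndex's implementation: retrieval uses word overlap instead of embeddings, and `echo_llm` is a hypothetical stand-in for a real model. `TinyQueryEngine` and its methods are names invented for this example.

```python
import re

class TinyQueryEngine:
    """Toy retrieve-then-generate engine illustrating the two hidden steps."""

    def __init__(self, chunks, llm, top_k=2):
        self.chunks = chunks
        self.llm = llm        # any callable: prompt string -> answer string
        self.top_k = top_k

    @staticmethod
    def _tokens(text):
        return set(re.findall(r"[a-z0-9]+", text.lower()))

    def retrieve(self, question):
        """Rank chunks by how many words they share with the question."""
        q = self._tokens(question)
        ranked = sorted(self.chunks, key=lambda c: len(q & self._tokens(c)), reverse=True)
        return ranked[: self.top_k]

    def query(self, question):
        """Build a prompt from the retrieved chunks and hand it to the LLM."""
        context = "\n".join(self.retrieve(question))
        prompt = f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"
        return self.llm(prompt)

def echo_llm(prompt):
    """Hypothetical stand-in for a real model: returns the first context line."""
    return prompt.splitlines()[1]

engine = TinyQueryEngine(
    chunks=[
        "Renewable energy includes wind and solar power.",
        "Burning fossil fuels causes the most CO2 emissions.",
    ],
    llm=echo_llm,
    top_k=1,
)
print(engine.query("What causes the most CO2 emissions?"))
```

The caller only ever sees `query`; which chunk is retrieved is decided at runtime from the question.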

---

### Comparison with LangChain (in short)

| Aspect              | LangChain                                                                 | LlamaIndex                                                   |
|---------------------|---------------------------------------------------------------------------|---------------------------------------------------------------|
| **Design Style**    | **Modular**: You explicitly wire LLM, embeddings, and tools               | **Monolithic**: Index handles loading, embedding, retrieval   |
| **Control Granularity** | **Fine-grained** control over each step (useful for chaining tools)    | **High-level abstraction**, more compact but less modular     |
| **Agentic Flow**    | **Explicit**: Each step can be reasoned about individually                | **Implicit**: Agentic behavior is packaged in `as_query_engine` |
| **Ease of Use**     | Requires more setup but more flexible                                     | Easier to get started, excellent for document QA              |


In [5]:
# --------------- CREWAI + OpenAI  ---------------
# In the CrewAI setup used for this notebook, Ollama and other local models are not configured by default.
# Use LangChain if you need a local model; this cell uses a cloud model from OpenAI instead.

import os
from dotenv import load_dotenv
from langchain_openai import ChatOpenAI
from crewai import Agent, Task, Crew

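# Step 1: Load API key
load_dotenv("keys.env")
api_key = os.getenv("OPENAI_API_KEY")
if not api_key:
    # Fail early with a clear message instead of a TypeError on api_key[:8]
    raise RuntimeError("OPENAI_API_KEY not found; add it to keys.env or the environment.")
print("[INFO] Using OpenAI API key starts with:", api_key[:8])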

# Step 2: Define the LLM with stricter parameters
llm = ChatOpenAI(
    openai_api_key=api_key, 
    model="gpt-3.5-turbo", 
    temperature=0.1,  # Lower temperature for more focused responses
    max_tokens=500    # Limit response length
)

# Step 3: Define Agents with more specific configurations
summarizer = Agent(
    role="Document Summarizer",
    goal="Read the provided climate document and create exactly one concise summary.",
    backstory="You are an expert at reading documents and creating brief summaries. You always complete your task in one attempt.",
    llm=llm,
    verbose=False,  # Reduce verbosity to prevent loops
    allow_delegation=False,
    max_iter=1,  # Force completion in one iteration
    memory=False  # Disable memory to prevent confusion
)

responder = Agent(
    role="Climate QA Analyst", 
    goal="Analyze the summary and extract key climate issues and solutions.",
    backstory="You are a climate policy expert who identifies key issues and solutions from summaries.",
    llm=llm,
    verbose=False,
    allow_delegation=False,
    max_iter=1,
    memory=False
)

# Step 4: Load document
with open("climate.txt", "r") as f:
    doc = f.read()

print(f"[INFO] Document length: {len(doc)} characters")

# Step 5: Create Tasks with very specific, clear instructions
t1 = Task(
    description=f"""
DOCUMENT TO SUMMARIZE:
{doc}

TASK: Write a summary of this document in 100-150 words. Focus on the main climate topics discussed.
Do not repeat this instruction. Just provide the summary and nothing else.
""",
    expected_output="A 100-150 word summary of the climate document",
    agent=summarizer
)

t2 = Task(
    description="""
Based on the summary from the previous task, identify:
1. The 3 most important climate issues mentioned
2. Any solutions or recommendations discussed

Format your response as:
ISSUES:
- Issue 1
- Issue 2  
- Issue 3

SOLUTIONS:
- Solution 1
- Solution 2
- etc.
""",
    expected_output="A formatted list of climate issues and solutions",
    agent=responder,
    context=[t1]
)

# Step 6: Create Crew with minimal configuration
crew = Crew(
    agents=[summarizer, responder],
    tasks=[t1, t2],
    verbose=False,  # Turn off verbose to reduce output
    process="sequential",
    max_rpm=10,  # Limit requests per minute
    memory=False  # Disable crew memory
)

try:
    print("[INFO] Starting crew execution...")
    result = crew.kickoff()
    print("\n[FINAL OUTPUT]")
    print(result)
except Exception as e:
    print(f"[ERROR] An error occurred: {e}")

[INFO] Using OpenAI API key starts with: sk-proj-
[INFO] Document length: 1292 characters
[INFO] Starting crew execution...

[FINAL OUTPUT]
ISSUES:
- Greenhouse gas emissions from human activities, particularly from fossil fuel burning and deforestation
- Global warming leading to rising sea levels and extreme weather events
- Ecosystem disruptions due to climate change impacts

SOLUTIONS:
- Transitioning to renewable energy sources to reduce reliance on fossil fuels
- Implementing policies like carbon pricing to incentivize emission reductions
- Enhancing global cooperation and commitment to climate action, as demonstrated in the Paris Agreement
- Investing in adaptation measures such as building defenses and developing resilient crops to address the impacts of climate change


### Why This Is Agentic (CrewAI)

This code demonstrates an **agentic system** using the **CrewAI framework**, where **multiple specialized agents** collaborate to solve a problem using an LLM. Here's why this qualifies as *agentic*:

- **Role-Based Agents**: Each agent is defined with a clear *goal*, *backstory*, and *task-specific instructions* — emulating autonomous reasoning entities.
- **Tool Use (LLM as Cognitive Engine)**: Each agent uses the LLM (OpenAI GPT-3.5-Turbo) to analyze, summarize, and reason through their part of the problem.
- **Task-Driven Execution**: Agents are assigned distinct tasks with defined outputs, and the system coordinates the handoff (e.g., Task 2 builds on Task 1).
- **No Fixed Script**: Although the process is `sequential`, the content and structure of the response are *dynamically generated* by the agents at runtime.
- **Independent Execution Flow**: Agents operate independently, without loops or memory, using their configured reasoning capability.
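
The handoff between tasks can be sketched in plain Python. This is a simplified illustration and not the CrewAI API: `SimpleTask` and `run_sequential` are hypothetical names, and the two "agents" are stub functions standing in for LLM calls.

```python
from dataclasses import dataclass, field
from typing import Callable, List

@dataclass
class SimpleTask:
    """Hypothetical stand-in for a framework task."""
    name: str
    agent: Callable[[str], str]  # takes upstream text, returns this task's output
    context: List["SimpleTask"] = field(default_factory=list)

def run_sequential(tasks, initial_input):
    """Run tasks in order; a task with context receives those tasks' outputs."""
    outputs = {}
    for task in tasks:
        if task.context:
            upstream = "\n".join(outputs[t.name] for t in task.context)
        else:
            upstream = initial_input
        outputs[task.name] = task.agent(upstream)
    return outputs

def summarizer(text):
    # Stub agent: keep only the first sentence
    return text.split(". ")[0].rstrip(".") + "."

def analyst(summary):
    # Stub agent: wrap the summary in the expected output format
    return f"ISSUES:\n- {summary}"

t1 = SimpleTask("summarize", summarizer)
t2 = SimpleTask("analyze", analyst, context=[t1])
outputs = run_sequential(
    [t1, t2], "Fossil fuel burning drives emissions. Sea levels are rising."
)
print(outputs["analyze"])
```

The second task never sees the original document, only the first task's output, which mirrors the `context=[t1]` wiring in the cell above.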

---

### Why OpenAI Instead of Ollama?

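In the setup used in this notebook, CrewAI **is not connected to local LLMs like Ollama**. Support for local models depends on the CrewAI version, so check the documentation for your installed release. Here, it relies on: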

- The `langchain_openai.ChatOpenAI` wrapper to integrate OpenAI models (e.g., `gpt-3.5-turbo`, `gpt-4`)
- A cloud-based inference flow (API key required)

If you want to use **local models like LLaMA via Ollama**, you should consider using **LangChain**, which:
- Supports `Ollama` as a local LLM backend
- Allows more modular and customizable agent-tool combinations

---

### Agentic Flow Summary (CrewAI)

| Component        | Description                                                                 |
|------------------|-----------------------------------------------------------------------------|
| **LLM**          | `ChatOpenAI` provides reasoning capabilities to all agents                  |
| **Agents**       | Defined roles (Summarizer, QA Analyst) handle specific tasks independently  |
| **Tasks**        | Contain scoped instructions and expected output, forming the task plan      |
| **Crew**         | Orchestrates execution order and coordination between agents                |
| **Mode**         | `sequential` process emulates multi-agent workflow                         |

This setup offers a powerful example of **modular agent design** using declarative instructions — well suited for teaching autonomous behavior without needing to build complex control logic.

