## Environment Setup

This notebook demonstrates LangChain concepts using **local models via Ollama**. Before continuing, make sure the required models and Python packages are installed.

---

### 1. Install and Configure Ollama

If you haven't already, install [Ollama](https://ollama.com) and ensure it's running locally.

Then pull the required models:

```bash
ollama pull gemma3
ollama pull nomic-embed-text
```

> These models will be used for generation (`gemma3`) and embeddings (`nomic-embed-text`).

---

### 2. Install Required Python Packages

In [17]:
'''
!pip install --upgrade pip --quiet
!pip install -U \
  langchain langchain-ollama langchain-community langchain-core \
  langchain-experimental chromadb pandas pymupdf ipython \
  duckduckgo-search --quiet
'''


### 3. Start Ollama (if not already running)

Make sure Ollama is running in the background. You can check by running:

```bash
ollama list
```

This should list both `gemma3:latest` and `nomic-embed-text:latest` models.
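The same check can be scripted from Python. The sketch below assumes Ollama's local REST endpoint `GET /api/tags` on the default port 11434 and compares the installed model names against the two models this notebook needs. The helper names `missing_models` and `fetch_tags` are illustrative, not part of any library.

```python
import json
import urllib.error
import urllib.request

REQUIRED_MODELS = ["gemma3", "nomic-embed-text"]


def missing_models(tags_payload, required=REQUIRED_MODELS):
    """Return the required models that are absent from an /api/tags payload."""
    # Installed names look like "gemma3:latest"; compare on the part before ":".
    installed = {m["name"].split(":")[0] for m in tags_payload.get("models", [])}
    return [name for name in required if name not in installed]


def fetch_tags(base_url="http://localhost:11434", timeout=3):
    """Fetch the local model list, or return None if Ollama is unreachable."""
    try:
        with urllib.request.urlopen(f"{base_url}/api/tags", timeout=timeout) as resp:
            return json.load(resp)
    except (urllib.error.URLError, OSError, ValueError):
        return None


if __name__ == "__main__":
    tags = fetch_tags()
    if tags is None:
        print("Ollama does not appear to be reachable on localhost:11434")
    else:
        print("Missing models:", missing_models(tags) or "none")
```

If a model is reported missing, `ollama pull <model>` from step 1 should resolve it.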

## Local vs. Cloud Models

### Ollama (Local Execution)

Ollama allows you to run open-source models (e.g., `gemma`, `llama`, `nomic-embed-text`) locally. This provides control, privacy, and fast iteration without API latency or cost.

### Amazon Bedrock (Cloud Execution)

[Amazon Bedrock](https://aws.amazon.com/bedrock/) provides access to multiple foundation models through a single API. It is fully managed, scalable, and integrates with AWS’s security and governance ecosystem.

## Basic Concepts of Generative AI

This section introduces core concepts for working with Large Language Models (LLMs), including how to prompt them effectively and build applications using frameworks like **LangChain**, whether deploying models locally via **Ollama** or through cloud services like **Amazon Bedrock**.

---

### What Are LLMs?

**Large Language Models (LLMs)** are machine learning models trained on massive text corpora. They predict the next token in a sequence based on context, making them powerful pattern-recognition tools for generating human-like text. However, they do not understand meaning or possess reasoning capabilities; their output is probabilistic, not factual by default.

---

### Prompting and Prompt Engineering

**Prompts** are structured inputs that tell an LLM what to do. A well-crafted prompt can significantly influence the quality, accuracy, and usefulness of the model's response.

Prompts generally consist of three key components:

---

#### Instruction

The **instruction** specifies the task or behavior expected from the model. It can be a simple question, a directive, or even a persona setup.

**Examples:**
- “Generate a list of known malicious domains associated with the Qakbot malware family.”
- “Write a poem about the Qakbot malware family.”
- “Produce a sonnet about my love of Franklin’s BBQ.”

---

#### Context

**Context** includes relevant information provided to the model to help it complete the instruction. This is particularly important for specialized or domain-specific use cases.

**Examples of context:**
- Extracted text from a PDF or website
- Lists of known facts or structured datasets
- Internal business knowledge

Context is critical when the model lacks the necessary background knowledge. It also enables **in-context learning** and is foundational to **RAG (Retrieval-Augmented Generation)** systems.

**Reference:**  
- [AWS Blog: Context Window Overflow – Breaking the Barrier](https://aws.amazon.com/blogs/security/context-window-overflow-breaking-the-barrier/)

---

#### Output Format

Defining the **output format** ensures the model returns a response in a structure that's compatible with downstream processing or presentation.

**Examples:**
- JSON/JSONL for programmatic consumption
- Markdown for documentation or emails
- Lists, CSV, or tabular formats for parsing

Providing formatting examples in your prompt often leads to more consistent and usable output.
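As a minimal sketch of how the three components (instruction, context, output format) can be combined, the hypothetical helper below joins them into one prompt string. The section labels and the `build_prompt` name are illustrative choices, not a LangChain convention.

```python
def build_prompt(instruction, context="", output_format=""):
    """Assemble a prompt from an instruction plus optional context and format."""
    parts = [f"Instruction:\n{instruction.strip()}"]
    if context.strip():
        parts.append(f"Context:\n{context.strip()}")
    if output_format.strip():
        parts.append(f"Output format:\n{output_format.strip()}")
    # Blank lines between sections keep the components visually distinct.
    return "\n\n".join(parts)


prompt = build_prompt(
    instruction="Summarize the key points of the report below.",
    context="(extracted text from a PDF would go here)",
    output_format='Return JSON with the keys "summary" and "key_points".',
)
print(prompt)
```

Keeping the components separate makes it easy to swap in different context or a different output format without rewriting the instruction.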

---

#### Best Practices for Prompt Engineering

- Be explicit and unambiguous in your instructions.
- Focus on what you want, not just what to avoid.
- Supply sufficient context for the model to reason accurately.
- Specify format, tone, language, or response length as needed.
- Iterate and refine; prompt design is an iterative process.
- Be cost-conscious: longer prompts and responses mean higher token usage.
- Leverage in-context examples to teach the model new behavior.

**Prompting strategies:**
- **Zero-shot**: No examples provided.
- **One-shot**: One example included.
- **Few-shot**: Multiple examples included.
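The difference between these strategies comes down to how many worked examples precede the real query. The sketch below is one possible layout; the `Input:`/`Output:` labels, the helper name, and the sample email subjects are all made up for illustration.

```python
def build_few_shot_prompt(task, examples, query):
    """Build a prompt with zero or more (input, output) examples before the query."""
    lines = [task.strip(), ""]
    for text, label in examples:
        lines += [f"Input: {text}", f"Output: {label}", ""]
    # The prompt ends with an open "Output:" for the model to complete.
    lines += [f"Input: {query}", "Output:"]
    return "\n".join(lines)


task = "Classify the email subject as phishing or benign."
examples = [
    ("Invoice attached, please open urgently", "phishing"),
    ("Team lunch is moved to Thursday", "benign"),
]
query = "Your account is locked, verify now"

zero_shot = build_few_shot_prompt(task, [], query)
few_shot = build_few_shot_prompt(task, examples, query)
print(few_shot)
```

Passing an empty example list yields the zero-shot form, and a single example yields the one-shot form.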

---

### System Prompts

Some platforms (e.g., OpenAI, Anthropic) support **system prompts**: instructions that shape the model’s persona, tone, or behavioral constraints.

**Example system message:**  
> “You are a cybersecurity analyst generating threat intelligence summaries.”

System prompts are not typically hidden; providers often document their usage:

- [Anthropic: System Prompt Guidelines](https://docs.anthropic.com/en/release-notes/system-prompts)

---

### Fine-Tuning vs. In-Context Learning

- **Fine-tuning** modifies the model’s internal weights using domain-specific data. It’s powerful but resource-intensive and harder to maintain.
- **In-context learning** gives examples inline in the prompt. It’s flexible, efficient, and sufficient for many real-world use cases.

---

**Learn more:** [Prompt Engineering Guide](https://www.promptingguide.ai/)

## Core LangChain Concepts

[LangChain](https://python.langchain.com/) is a modular framework for building generative AI applications powered by Large Language Models (LLMs). It simplifies the development of multi-step pipelines, retrieval-augmented workflows, and agent-based reasoning systems.

---

### LangChain Capabilities

LangChain provides key building blocks for LLM applications:

- **Prompt templates** for reusability and structure
- **Memory** for multi-turn interactions
- **Tools** to access external data sources
- **Chains** to compose tasks in sequence or branching logic
- **Agents** for dynamic tool use and reasoning
- **Document loaders, chunkers, and retrievers** for knowledge ingestion

---

### Prompt Templates

**Prompt templates** allow you to define reusable prompts with variables. This is essential for consistent prompt formatting in production use cases.

```python
from langchain_core.prompts import PromptTemplate

# Define a reusable template with one variable, then fill it in
template = "Generate a list of domains associated with {malware_family}"
prompt = PromptTemplate.from_template(template)
print(prompt.format(malware_family="Qakbot"))
```

---

### Memory

LLMs are inherently **stateless**, meaning they do not retain context between interactions. LangChain provides memory modules such as `ConversationBufferMemory` that persist prior messages and automatically re-inject them into prompts.

```python
from langchain.memory import ConversationBufferMemory

memory = ConversationBufferMemory()
```

When used in a chain:

```python
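from langchain.chains import LLMChain

# Assumes `llm` and `prompt_template` have already been defined
chain = LLMChain(llm=llm, prompt=prompt_template, memory=memory)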
```

Memory is critical for applications like chatbots, copilots, or multi-turn reasoning agents.
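To make the "persist and re-inject" idea concrete, here is a toy buffer written in plain Python. It is only a sketch of the concept and not how `ConversationBufferMemory` is implemented. The `BufferMemory` class, the `prompt_with_history` helper, and the placeholder answer are hypothetical.

```python
class BufferMemory:
    """Toy conversation buffer: stores turns and renders them as a transcript."""

    def __init__(self):
        self.turns = []  # list of (speaker, text) tuples

    def save(self, user_text, ai_text):
        self.turns.append(("Human", user_text))
        self.turns.append(("AI", ai_text))

    def render(self):
        return "\n".join(f"{speaker}: {text}" for speaker, text in self.turns)


def prompt_with_history(memory, user_text):
    """Prepend the stored transcript so a stateless model sees earlier turns."""
    history = memory.render()
    prefix = f"{history}\n" if history else ""
    return f"{prefix}Human: {user_text}\nAI:"


memory = BufferMemory()
memory.save("What is the Qakbot malware?", "(model answer)")
print(prompt_with_history(memory, "What industry was primarily targeted by this malware?"))
```

Because the whole transcript is resent on every turn, the prompt grows with the conversation, which is why token limits matter for long sessions.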

---

### Tools

**Tools** allow LLMs to interact with external systems: APIs, web search engines, internal databases, or third-party services. LangChain supports tool integration via descriptors that let agents reason about what tool to use and when.

**Examples of tools:**
- Google/Bing search
- DomainTools, VirusTotal, Wikipedia
- Custom internal APIs (e.g., threat intelligence, ticketing systems)

More: [LangChain Tools Documentation](https://python.langchain.com/docs/modules/tools/)
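Conceptually, a tool is a callable paired with a name and a description that an agent can read. The plain-Python sketch below illustrates that pairing. The `SimpleTool` dataclass, the `domain_reputation` tool, and its hard-coded lookup are hypothetical stand-ins, not LangChain classes or a real threat-intelligence API.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class SimpleTool:
    name: str
    description: str
    func: Callable[[str], str]


def lookup_domain_reputation(domain: str) -> str:
    """Hypothetical stand-in for a threat-intelligence API call."""
    known_bad = {"bad.example"}
    return "malicious" if domain in known_bad else "unknown"


TOOLS: Dict[str, SimpleTool] = {
    tool.name: tool
    for tool in [
        SimpleTool(
            name="domain_reputation",
            description="Look up whether a domain is known to be malicious.",
            func=lookup_domain_reputation,
        )
    ]
}


def describe_tools(tools):
    """Render the tool list as text that could be placed in an agent prompt."""
    return "\n".join(f"- {t.name}: {t.description}" for t in tools.values())


def call_tool(tools, name, tool_input):
    """Dispatch to a tool by name, failing loudly on unknown names."""
    if name not in tools:
        raise KeyError(f"Unknown tool: {name}")
    return tools[name].func(tool_input)


print(describe_tools(TOOLS))
print(call_tool(TOOLS, "domain_reputation", "bad.example"))
```

The description is what lets a model choose between tools, so it is worth writing as carefully as a prompt.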

---

### Chains

**Chains** are workflows composed of modular steps. Each chain links one or more components (e.g., prompt → LLM → post-processor).

Common chain types include:

- `LLMChain`: single prompt-response (Note: LLMChain is deprecated in favor of LCEL - LangChain Expression Language)
- `SimpleSequentialChain`: sequential chaining of steps
- `SequentialChain`: sequential steps that accept multiple named inputs and outputs
- `RouterChain`: dynamically selects downstream chains based on input

Example with memory:

```python
chain = LLMChain(llm=llm, prompt=prompt_template, memory=ConversationBufferMemory())
```

Chains provide structure and allow you to build reliable pipelines with conditional logic and state.
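The LCEL note above refers to composing steps with the `|` operator. As a rough illustration of the idea only (this is not LangChain's implementation), the sketch below defines a hypothetical `Step` class whose `|` feeds one step's output into the next. `fake_llm` is a stand-in that makes no model call.

```python
class Step:
    """Wraps a one-argument function so steps can be composed with `|`."""

    def __init__(self, func):
        self.func = func

    def __or__(self, other):
        # (a | b).invoke(x) runs a first, then feeds its result to b.
        return Step(lambda value: other.invoke(self.invoke(value)))

    def invoke(self, value):
        return self.func(value)


format_prompt = Step(lambda inputs: f"Question: {inputs['question']}\nAnswer:")
fake_llm = Step(lambda prompt: f"  [model output for: {prompt!r}]  ")  # stand-in
strip_output = Step(str.strip)  # simple post-processor

chain = format_prompt | fake_llm | strip_output
result = chain.invoke({"question": "Why is the sky blue?"})
print(result)
```

Each step stays independently testable, and the composed chain exposes the same `invoke` interface as its parts.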

---

### Agents

**Agents** enable dynamic, reasoning-driven workflows. Unlike static chains, agents interpret user input, determine which tools or sub-chains to invoke, gather additional data, and synthesize final responses.

They excel at handling ambiguous prompts, complex logic, or open-ended tasks.

More: [LangChain Agents Documentation](https://python.langchain.com/docs/modules/agents/)

## RAG and Embeddings

### Vector Stores

Vector databases (e.g., Chroma, FAISS, Pinecone) store **text embeddings**, which allow similarity searches against known documents or facts.

### RAG (Retrieval-Augmented Generation)

**RAG** improves response quality by retrieving relevant content from an external knowledge source before prompting the model. This enables high-accuracy answers even from models that weren’t explicitly trained on the content.
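The retrieve-then-prompt flow can be sketched without any model at all. In the example below, retrieval is a deliberately simple word-overlap ranking standing in for embedding search. The documents, helper names, and prompt wording are invented for illustration.

```python
import re


def tokenize(text):
    """Lowercase the text and return its set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(question, documents, k=2):
    """Rank documents by how many words they share with the question."""
    q = tokenize(question)
    ranked = sorted(documents, key=lambda d: len(q & tokenize(d)), reverse=True)
    return ranked[:k]


def build_rag_prompt(question, documents, k=2):
    """Place the top-k retrieved documents in the prompt as context."""
    context = "\n".join(f"- {doc}" for doc in retrieve(question, documents, k))
    return (
        "Answer using only the context below. "
        "If the context is insufficient, say so.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"
    )


docs = [
    "The on-call rotation changes every Monday at 09:00.",
    "Incident reports must be filed within 24 hours.",
    "The cafeteria is closed on public holidays.",
]
question = "When does the on-call rotation change?"
rag_prompt = build_rag_prompt(question, docs, k=1)
print(rag_prompt)
```

In a real pipeline the `retrieve` step would query a vector store, but the prompt assembly would look much the same.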

### Chunking

Long documents are split into smaller **chunks** to improve retrieval accuracy and avoid exceeding token limits. LangChain includes text splitters like `RecursiveCharacterTextSplitter` that optimize chunk size and overlap.
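To show what chunk size and overlap mean, here is a minimal fixed-width chunker. It is a simplification: it cuts at fixed character offsets and makes no attempt to respect paragraph or sentence boundaries, so treat it as an illustration of the two parameters rather than a replacement for a real text splitter. The function name `chunk_text` is hypothetical.

```python
def chunk_text(text, chunk_size=200, overlap=50):
    """Split text into fixed-size chunks that share `overlap` characters."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be >= 0 and smaller than chunk_size")

    step = chunk_size - overlap
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        # Stop once a chunk reaches the end, to avoid a redundant tail chunk.
        if start + chunk_size >= len(text):
            break
        start += step
    return chunks


print(chunk_text("abcdefghij", chunk_size=4, overlap=2))
```

Overlap means a sentence that straddles a boundary still appears intact in at least one chunk, at the cost of storing some text twice.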

### Embeddings

Embeddings are numeric vectors that represent semantic meaning. They are used for similarity search in vector stores.

- **Local**: e.g., `nomic-embed-text` via Ollama
- **Cloud**: e.g., Titan via AWS Bedrock, HuggingFace SentenceTransformers
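Similarity search over embeddings usually comes down to comparing vectors, often with cosine similarity. The sketch below uses tiny hand-made 3-dimensional vectors purely for illustration. Real embedding models return far more dimensions, and the texts and numbers here are invented.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat a zero vector as similar to nothing
    return dot / (norm_a * norm_b)


def top_k(query_vector, store, k=1):
    """Return the k texts whose vectors are most similar to the query."""
    ranked = sorted(
        store,
        key=lambda item: cosine_similarity(query_vector, item[1]),
        reverse=True,
    )
    return [text for text, _ in ranked[:k]]


# A toy "vector store": (text, embedding) pairs with made-up values.
store = [
    ("phishing email report", [0.9, 0.1, 0.0]),
    ("barbecue recipe", [0.0, 0.2, 0.9]),
]
query_vector = [0.8, 0.2, 0.1]
print(top_k(query_vector, store))
```

Vector databases such as Chroma or FAISS perform the same kind of ranking, with indexing that keeps it fast over large collections.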

---

### LLM Limitations

- **Hallucination**: Models may fabricate plausible but incorrect information.
- **Context limitations**: Only a fixed number of tokens can be processed per prompt.
- **Data leakage risk**: LLMs trained on sensitive data may inadvertently reveal it.
- **Stateless**: Models forget previous interactions unless provided memory mechanisms.

---

### Understanding Tokenization

LLMs process input/output as **tokens**, not words. Token count affects:

- Cost (in usage-based billing models)
- Whether your full prompt fits in the model’s context window
- Output truncation risk

Useful tools:
- [Claude Tokenizer](https://claude-tokenizer.vercel.app/)
- [Tiktokenizer](https://tiktokenizer.vercel.app/)
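When no tokenizer is at hand, a rough estimate can still help with budgeting. The sketch below assumes about four characters per token, a common rule of thumb for English text and not a real tokenizer. Actual counts vary by model and language, so use it for ballpark checks only.

```python
import math

CHARS_PER_TOKEN = 4  # assumption: rough average for English text


def estimate_tokens(text):
    """Very rough token estimate based on character count."""
    return math.ceil(len(text) / CHARS_PER_TOKEN)


def fits_context(prompt, context_window, reserved_for_output=256):
    """Check whether the prompt plus reserved output tokens fit in the window."""
    return estimate_tokens(prompt) + reserved_for_output <= context_window


prompt = "Summarize the following incident report in three bullet points."
print(estimate_tokens(prompt), fits_context(prompt, context_window=2048))
```

Reserving room for the response matters because input and output share the same context window.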

## Calling Ollama via LangChain - Zero Shot

More: [ChatOllama API reference](https://python.langchain.com/api_reference/ollama/chat_models/langchain_ollama.chat_models.ChatOllama.html)

In [18]:
from IPython.display import Markdown
from langchain_ollama import ChatOllama 

# Note: Ensure Ollama is running locally and the gemma3 model is installed
# Run: ollama pull gemma3
# If Ollama is not running, you'll get a connection error

llm = ChatOllama(
    model="gemma3",
    temperature=0.8,
    # base_url="http://remote-ip:port", # Ollama default port: 11434
    # timeout=5,
    num_predict=256,  # Use num_predict instead of max_tokens for Ollama
    # other params ...
)

messages = [
    ("system", "You are a helpful assistant."),
    ("human", "why is the sky blue?"),
]

Markdown(llm.invoke(messages).content)

Okay, let's talk about why the sky is blue! It’s a really fascinating phenomenon, and the short answer is due to something called **Rayleigh scattering**. Here’s a breakdown:

**1. Sunlight and Colors:**

* Sunlight appears white, but it’s actually made up of all the colors of the rainbow – red, orange, yellow, green, blue, indigo, and violet. Think of a prism splitting light.

**2. Entering the Atmosphere:**

* When sunlight enters the Earth’s atmosphere, it bumps into tiny air molecules (mostly nitrogen and oxygen).

**3. Rayleigh Scattering – The Key!**

* **Rayleigh scattering** is the scattering of electromagnetic radiation (like light) by particles of a much smaller wavelength.  In this case, the air molecules are much smaller than the wavelengths of visible light.
* **Shorter wavelengths scatter more:**  Crucially, blue and violet light have shorter wavelengths than other colors. This means they are scattered *much* more strongly by these air molecules.  It’s like throwing a small ball (blue light) against a bumpy surface – it bounces off in all directions.  A larger ball (red light) would be less affected.

## Using Prompt Templates and LangChain Expression Syntax

This example introduces the use of a `ChatPromptTemplate` combined with an Ollama-backed LLM using LangChain's expression syntax. This pattern allows for reusable, parameterized prompts and demonstrates how to integrate structured prompting into your pipeline. The `|` operator chains the prompt and model together, creating a modular and declarative workflow.

We’ll also simulate a multi-role setup (e.g., system + human) within the prompt string.


In [19]:
from langchain_core.prompts import ChatPromptTemplate
from langchain_ollama.llms import OllamaLLM

template = """Question: {question}

Answer: Let's think step by step."""

prompt = ChatPromptTemplate.from_template(template)

model = OllamaLLM(model="gemma3")

chain = prompt | model

Markdown(chain.invoke({"question": "System: You are a cybersecurity expert and an AI assistant to AWS.\n\nHuman: What is the LummaC2 malware?"}))

Okay, let's break down LummaC2 – it's a pretty nasty piece of malware that's been making waves in the cybersecurity world. Here's a step-by-step breakdown:

**1. What is LummaC2?**

LummaC2 (also known as "NightSky") is a sophisticated, multi-stage cyber espionage malware developed by the North Korean Lazarus Group. It’s a Command and Control (C2) infrastructure, meaning it’s the backbone of a cyberattack. It’s *not* a standalone virus that directly infects systems. Instead, it’s used to control and coordinate other malware that *does* infect systems.

**2. Key Characteristics & Functionality:**

* **Multi-Stage Architecture:** This is the most critical aspect. LummaC2 operates in three distinct stages:
    * **Initial Access:**  Typically, LummaC2 gains initial access through phishing emails containing malicious documents (often Microsoft Office documents). These documents contain macros that, when enabled, trigger the download and execution of the initial LummaC2 components.
    * **Beaconing & Data Exfiltration:** Once established, LummaC2 establishes a persistent connection (a "beacon") to a command server. It then gathers intelligence – usually targeting organizations in the finance, technology, and defense sectors. This intelligence can include sensitive information, intellectual property, and strategic planning details.
    * **Secondary Malware Deployment:** This is where it gets really dangerous.  LummaC2 doesn't just collect data; it *actively* deploys secondary malware, often Cobalt Strike, to further compromise the victim’s network, spread laterally, and escalate privileges.

* **Advanced Techniques:**
    * **Dynamic DNS:** LummaC2 utilizes dynamic DNS services to constantly change its C2 server addresses, making it incredibly difficult to track and block.
    * **SSL/TLS Encryption:** Communication between the beaconing agents and the C2 server is heavily encrypted using SSL/TLS, further obscuring the activity.
    * **Staged Downloads:**  The malware is broken down into smaller, encrypted modules which are downloaded sequentially, making detection harder.
    * **Use of Cryptocurrency:** The group uses cryptocurrency (primarily Bitcoin) to pay for services and to facilitate the exfiltration of stolen data.


**3. Why it’s a Threat:**

* **Sophisticated Threat Actor:**  It's operated by the Lazarus Group, a well-known and highly skilled cyber espionage unit affiliated with the North Korean government.  They're known for operations like the WannaCry ransomware attack and the Sony Pictures hack.
* **Espionage Focused:** LummaC2's primary goal isn't typically financial gain (though that's a byproduct); it’s about gathering intelligence for strategic advantage.
* **Ongoing Activity:**  LummaC2 is still actively being used in attacks, meaning organizations need to remain vigilant.

**4.  AWS Relevance (as your AI assistant to AWS):**

* **Threat Intelligence Integration:**  AWS services like GuardDuty and Security Hub regularly ingest threat intelligence feeds that include information about LummaC2.  You can configure these services to automatically detect and alert you to activity associated with LummaC2.
* **Network Security:**  Ensure your VPCs, EC2 instances, and other AWS resources are properly configured with security groups and network ACLs to restrict inbound and outbound traffic, limiting the potential for LummaC2 to establish a beacon.
* **Monitoring & Logging:**  Enable comprehensive logging and monitoring across your AWS environment to detect suspicious activity.  Analyze logs for unusual patterns of communication.


---

Do you want me to delve deeper into a specific aspect of LummaC2, such as:

*   Its technical details (e.g., the malware's architecture)?
*   How to detect it on AWS?
*   Specific tactics, techniques, and procedures (TTPs) used by the Lazarus Group?

## Memory

It's nice to be able to ask a question and get an answer, but that seems pretty transactional and impersonal. I'd like the LLM I'm talking to to show me they are listening to me and paying attention to what I am saying. I am, after all, human and I need to be loved or at least feel like it.

But LLMs are cold and heartless (technically, they are stateless, but we are talking about my feelings here). This means that they don’t ‘remember’ interactions from prompt to prompt. You can fine-tune them to persist data, but short of updating the weights through expensive training, they don’t ‘remember’ anything beyond what was learned when the model was frozen at a checkpoint. So if I just asked what the LummaC2 malware was and then followed up with a question like "What industry was primarily targeted by this malware?", the model would not be able to answer within the context of LummaC2. Since show beats tell, let's see how that works firsthand.

## Now let's ask the follow-up question...

In [20]:
Markdown(llm.invoke("System: You are a cybersecurity expert and an AI assistant.\n\nHuman: What industry was primarily targeted by this malware?").content)

Okay, to help you determine which industry was primarily targeted by this malware, I need some information about the malware itself. Please tell me:

*   **What is the name of the malware?** (e.g., WannaCry, NotPetya, Emotet, etc.)
*   **What are its key characteristics?** (e.g., ransomware, banking trojan, spyware, etc.)
*   **What is its primary method of infection?** (e.g., phishing emails, exploited vulnerabilities, drive-by downloads, etc.)
*   **What are the known targets or affected sectors?** (e.g., hospitals, manufacturing, finance, government, etc.)

The more detail you can provide, the more accurately I can pinpoint the industry that was primarily targeted. 

Once I have this information, I can give you a much more specific and helpful answer.
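One way to give the model the missing context by hand is to resend the earlier turns along with the new question. The sketch below builds a message list in the same `(role, content)` tuple form used in the zero-shot cell. The `build_messages` helper and the placeholder answer are hypothetical, and the model call itself is left as a comment because it needs a running Ollama instance.

```python
def build_messages(system_prompt, history, question):
    """Combine a system prompt, earlier (question, answer) turns, and a new question."""
    messages = [("system", system_prompt)]
    for past_question, past_answer in history:
        messages.append(("human", past_question))
        messages.append(("ai", past_answer))
    messages.append(("human", question))
    return messages


history = [("What is the LummaC2 malware?", "(the model's earlier answer)")]
messages = build_messages(
    "You are a helpful assistant.",
    history,
    "What industry was primarily targeted by this malware?",
)
for role, content in messages:
    print(f"{role}: {content}")

# With Ollama running, the list could be passed to the chat model defined earlier:
# llm.invoke(messages)
```

With the earlier exchange included, "this malware" has something to refer to. LangChain's memory modules automate this bookkeeping.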

## LangChain Memory Components: Enabling Stateful Conversations
The memory component of LangChain allows the LLM to become 'stateful'. This quality is quite useful when developing applications driven by LLMs. For instance, a conversational system or a chatbot is required to recall past interactions to maintain a conversation. Without memory, the system would not be able to handle follow-up messages or recollect key pieces of information mentioned earlier in the conversation. 

In this section, we will explore the memory modules provided by LangChain. LangChain offers several types of memory modules depending on the task and the properties of the LLM. We will incorporate memory into chains and examine changes in performance due to it.

In [21]:
from langchain_ollama.llms import OllamaLLM
from langchain.chains import ConversationChain
from langchain.memory import ConversationBufferMemory
import pandas as pd
import warnings

warnings.filterwarnings("ignore")

# Initialize Ollama with the gemma3 model (same integration package as the earlier cells)
llm = OllamaLLM(model="gemma3")

In [22]:
# Set up conversation memory
cbm_memory = ConversationBufferMemory()

# Adding memory to a conversational chain
chain_with_buffer_memory = ConversationChain(llm=llm, memory=cbm_memory)

# Helper function to prompt and print memory state
def prompt_and_print_memory(prompts, chain_with_memory):
    # Store the responses
    responses = {"input": [], "history": [], "response": []}

    # Repeatedly prompting the chain and observing the memory
    for prompt in prompts:
        response = chain_with_memory.invoke({"input": prompt})
        responses["input"].append(prompt)
        # Read memory from the chain passed in rather than from a global variable
        responses["history"].append(chain_with_memory.memory.load_memory_variables({}))
        responses["response"].append(response["response"])

    # Display responses in a dataframe
    df = pd.DataFrame.from_dict(responses)
    with pd.option_context("display.max_colwidth", None):
        display(df)

# Sequence of prompts for demonstration
cbm_prompts = [
    "What is the LummaC2 malware?",
    "What industry was primarily targeted by this malware?",
]

# Invoke helper function
prompt_and_print_memory(cbm_prompts, chain_with_buffer_memory)


Unnamed: 0,input,history,response
0,What is the LummaC2 malware?,"{'history': 'Human: What is the LummaC2 malware? AI: Oh, LummaC2! That’s a really interesting piece of malware that’s been getting a lot of attention lately. It’s a sophisticated, modular post-exploitation malware developed by the Lazarus Group, which is, in turn, linked to North Korea. Essentially, it’s a reconnaissance and data exfiltration tool. What makes it particularly noteworthy is its incredibly detailed and persistent reconnaissance capabilities. It doesn't just grab a few files; it’s designed to *deeply* understand a compromised network. Here’s a breakdown of what it does and some of the specific details that have come to light: * **Modular Design:** LummaC2 is built around a modular architecture. This means it’s comprised of several distinct modules, each with a specific function. These modules include:  * **Credential Dumping:** This is a core function – it actively tries to steal credentials from various systems, including those stored in memory, on local disks, and even within Windows' LSASS process. It's been documented to extract credentials from a wide range of applications like RDP, Skype, and even Microsoft Office.  * **Webcam Access:** It attempts to activate webcams on compromised systems – often to capture screenshots or even use the webcam for live surveillance.  * **System Information Gathering:** It collects a massive amount of information about the target system, including hardware details, operating system versions, installed software, network configurations, and user accounts.  * **Lateral Movement:** Once it has established a foothold, LummaC2 uses this information to move laterally within the network, searching for more valuable targets. * **Persistence:** A key element of its design is its ability to maintain a persistent presence on the compromised network, even if the initial access vector is removed. 
It does this by establishing multiple, seemingly legitimate, processes running in the background. * **Stealth:** The malware is designed to be very stealthy, making it difficult to detect. It uses techniques like process injection and memory manipulation to hide its activities. * **Recent Activity:** There’s been a surge in LummaC2 activity recently, with reports of it being used in attacks against organizations in the technology, gaming, and cryptocurrency sectors. Analysts at Mandiant and other cybersecurity firms have been closely tracking its evolution. They've noted changes to its communication protocols and the addition of new modules. You can find more detailed information about LummaC2 on sites like Mandiant's threat intelligence reports, and on security blogs that cover cyber threat analysis. Do you want me to delve into any particular aspect of it, like the specific communication protocols they use, or perhaps the technical methods employed for stealth?'}","Oh, LummaC2! That’s a really interesting piece of malware that’s been getting a lot of attention lately. It’s a sophisticated, modular post-exploitation malware developed by the Lazarus Group, which is, in turn, linked to North Korea. \n\nEssentially, it’s a reconnaissance and data exfiltration tool. What makes it particularly noteworthy is its incredibly detailed and persistent reconnaissance capabilities. It doesn't just grab a few files; it’s designed to *deeply* understand a compromised network. \n\nHere’s a breakdown of what it does and some of the specific details that have come to light:\n\n* **Modular Design:** LummaC2 is built around a modular architecture. This means it’s comprised of several distinct modules, each with a specific function. These modules include:\n * **Credential Dumping:** This is a core function – it actively tries to steal credentials from various systems, including those stored in memory, on local disks, and even within Windows' LSASS process. 
It's been documented to extract credentials from a wide range of applications like RDP, Skype, and even Microsoft Office.\n * **Webcam Access:** It attempts to activate webcams on compromised systems – often to capture screenshots or even use the webcam for live surveillance.\n * **System Information Gathering:** It collects a massive amount of information about the target system, including hardware details, operating system versions, installed software, network configurations, and user accounts.\n * **Lateral Movement:** Once it has established a foothold, LummaC2 uses this information to move laterally within the network, searching for more valuable targets.\n\n* **Persistence:** A key element of its design is its ability to maintain a persistent presence on the compromised network, even if the initial access vector is removed. It does this by establishing multiple, seemingly legitimate, processes running in the background. \n\n* **Stealth:** The malware is designed to be very stealthy, making it difficult to detect. It uses techniques like process injection and memory manipulation to hide its activities.\n\n* **Recent Activity:** There’s been a surge in LummaC2 activity recently, with reports of it being used in attacks against organizations in the technology, gaming, and cryptocurrency sectors. Analysts at Mandiant and other cybersecurity firms have been closely tracking its evolution. They've noted changes to its communication protocols and the addition of new modules. \n\nYou can find more detailed information about LummaC2 on sites like Mandiant's threat intelligence reports, and on security blogs that cover cyber threat analysis. Do you want me to delve into any particular aspect of it, like the specific communication protocols they use, or perhaps the technical methods employed for stealth?"
1,What industry was primarily targeted by this malware?,"{'history': 'Human: What is the LummaC2 malware? AI: Oh, LummaC2! That’s a really interesting piece of malware that’s been getting a lot of attention lately. It’s a sophisticated, modular post-exploitation malware developed by the Lazarus Group, which is, in turn, linked to North Korea. Essentially, it’s a reconnaissance and data exfiltration tool. What makes it particularly noteworthy is its incredibly detailed and persistent reconnaissance capabilities. It doesn't just grab a few files; it’s designed to *deeply* understand a compromised network. Here’s a breakdown of what it does and some of the specific details that have come to light: * **Modular Design:** LummaC2 is built around a modular architecture. This means it’s comprised of several distinct modules, each with a specific function. These modules include:  * **Credential Dumping:** This is a core function – it actively tries to steal credentials from various systems, including those stored in memory, on local disks, and even within Windows' LSASS process. It's been documented to extract credentials from a wide range of applications like RDP, Skype, and even Microsoft Office.  * **Webcam Access:** It attempts to activate webcams on compromised systems – often to capture screenshots or even use the webcam for live surveillance.  * **System Information Gathering:** It collects a massive amount of information about the target system, including hardware details, operating system versions, installed software, network configurations, and user accounts.  * **Lateral Movement:** Once it has established a foothold, LummaC2 uses this information to move laterally within the network, searching for more valuable targets. * **Persistence:** A key element of its design is its ability to maintain a persistent presence on the compromised network, even if the initial access vector is removed. 
It does this by establishing multiple, seemingly legitimate, processes running in the background. * **Stealth:** The malware is designed to be very stealthy, making it difficult to detect. It uses techniques like process injection and memory manipulation to hide its activities. * **Recent Activity:** There’s been a surge in LummaC2 activity recently, with reports of it being used in attacks against organizations in the technology, gaming, and cryptocurrency sectors. Analysts at Mandiant and other cybersecurity firms have been closely tracking its evolution. They've noted changes to its communication protocols and the addition of new modules. You can find more detailed information about LummaC2 on sites like Mandiant's threat intelligence reports, and on security blogs that cover cyber threat analysis. Do you want me to delve into any particular aspect of it, like the specific communication protocols they use, or perhaps the technical methods employed for stealth? Human: What industry was primarily targeted by this malware? AI: That’s a really good question! While LummaC2 has been observed targeting a *variety* of sectors, the technology, gaming, and cryptocurrency industries have been the most heavily targeted. Specifically, it’s been frequently used against companies involved in cryptocurrency – think exchanges and wallet providers – likely to steal digital assets. We've seen a significant amount of activity directed at gaming companies, potentially related to intellectual property theft or financial gain through fraudulent activities. However, it’s important to note that its modular design and reconnaissance capabilities mean it *could* be used against any organization with valuable data, so the targets aren’t limited to just these three. Do you want me to elaborate on the specific tactics they’re using against each of those industries, or perhaps discuss the broader implications of this targeting?'}","That’s a really good question! 
While LummaC2 has been observed targeting a *variety* of sectors, the technology, gaming, and cryptocurrency industries have been the most heavily targeted. \n\nSpecifically, it’s been frequently used against companies involved in cryptocurrency – think exchanges and wallet providers – likely to steal digital assets. We've seen a significant amount of activity directed at gaming companies, potentially related to intellectual property theft or financial gain through fraudulent activities. \n\nHowever, it’s important to note that its modular design and reconnaissance capabilities mean it *could* be used against any organization with valuable data, so the targets aren’t limited to just these three. Do you want me to elaborate on the specific tactics they’re using against each of those industries, or perhaps discuss the broader implications of this targeting?"


## Tools
### We'll use a tool supplied with LangChain and a custom tool we create

[DuckDuckGo](https://duckduckgo.com/) is an internet privacy company best known for its private search engine. The company treats privacy and anonymity as key principles behind all of its products.

Let's create a DuckDuckGo tool that is capable of retrieving results from a web search.

In [23]:
from IPython.display import Markdown
from langchain.tools import Tool
from langchain_community.tools import DuckDuckGoSearchRun

# Create the built-in DuckDuckGo search tool
duckduckgo_search = DuckDuckGoSearchRun()

# Wrap it in a Tool; the description tells the agent when this tool is useful
duckduckgo_tool = Tool(
    name="DuckDuckGoSearch",
    func=duckduckgo_search.run,
    description="useful for when you need to answer questions about current weather and other current events",
)

# Test the DuckDuckGo tool (invoke replaces the deprecated direct call)
Markdown(duckduckgo_tool.invoke("What is a honeybee?"))

The best-known honey bee species is the western honey bee (Apis mellifera), which was domesticated and farmed (i.e. beekeeping) for honey production and crop pollination. The only other domesticated species is the eastern honey bee (Apis cerana), which are raised in South, Southeast and East Asia. Nov 20, 2025 · A honeybee is any of a small group of social bee s that make honey. All honeybees live together in nests or hives. There are two honeybee sexes, male and female, and two female castes. May 15, 2024 · Honey bee swarms may contain several hundred to several thousand worker bees, a few drones, and one queen. Swarming bees fly around briefly and then cluster on a tree limb, shrub, or another object. Jun 12, 2017 · Bees spend their lives collecting pollen, which provide a source of protein to their developing youngsters. As pollen collects on their hairy legs, they move some of the pollen from the male to the female part of a flower. Simply, a Honey Bee is a small vegetarian insect which lives in a highly structured colony with thousands of its sisters (and a few brothers along with one Queen), all working toward the goal of storing enough food (honey) for the winter when flowers are not present. Oct 17, 2016 · Only one species of honeybee inhabits the United States - the Western Honeybee , Apis mellifera. Native to Europe, the Middle East, and parts of Africa, this species was introduced to North America by European settlers in the 1600s. Honeybees are important pollinators for flowers, fruits, and vegetables . They live on stored honey and pollen all winter and cluster into a ball to conserve warmth. All honeybees are social and... Honeybees live in colonies with one queen running the whole hive. Worker honeybees are all females and are the only bees most people ever see flying around outside of the hive. They forage for food, build the honeycombs, and protect the hive. 
Oct 17, 2016 · Only one species of honeybee inhabits the United States - the Western Honeybee , Apis mellifera. Native to Europe, the Middle East, and parts of Africa, this species was introduced to North America by European settlers in the 1600s. Lots of people get confused about the difference between bumblebees and honeybees (shown in the photo). Watch our video for a short introduction to the five main differences between these two types of bee . A close-up of a honeybee inspecting the hive cells.

## Custom Tools


You can define custom tools using the `@tool` decorator.

When writing custom tools, it is critical to write clear docstrings, use Python's type annotation syntax, and apply the `@tool` decorator supplied by LangChain (imported in the next cell). The agent uses the docstring and the type annotations to reason about what a tool is useful for and how to call it. If your agent isn't picking the correct tools, or is calling them incorrectly, these are common culprits.

Keep in mind that a custom tool can do anything you can define in code, including calling the APIs of much larger systems. In practice, the possibilities are nearly boundless.
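
To make the point about docstrings and type annotations concrete, here is a minimal, library-free sketch of how a tool description could be derived from a plain function. It uses only Python's standard `inspect` module. The names `describe_tool`, `word_count`, and `undocumented` are hypothetical, and this is an illustration of the idea, not LangChain's actual implementation.

```python
import inspect


def describe_tool(func) -> str:
    """Build a one-line tool description from a function's signature and docstring."""
    signature = inspect.signature(func)
    doc = inspect.getdoc(func) or "(no description provided)"
    return f"{func.__name__}{signature} - {doc}"


def word_count(text: str) -> int:
    """Count the whitespace-separated words in the given text."""
    return len(text.split())


def undocumented(x):
    return x


# A documented, annotated function yields a description a model can act on.
print(describe_tool(word_count))
# An undocumented, unannotated function leaves the model guessing.
print(describe_tool(undocumented))
```

The second description carries no hint about purpose or argument types, which is the situation the paragraph above warns about.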

Here is a simple custom tool that returns the current date.

In [24]:
from langchain.tools import tool
from datetime import date

@tool
def curr_date(text: str) -> str:
    """Returns today's date. Use this for any \
    questions that require knowing today's date. \
    The input should always be an empty string, \
    and this function will always return today's \
    date. Any date math should occur \
    outside this function."""
    return str(date.today())

In [25]:

# Define the date tool (pass curr_date.invoke, since calling a tool object directly is deprecated)
date_tool = Tool(
    name="DateTool", func=curr_date.invoke, description="Useful to retrieve the current date"
)

# Test date tool
print(date_tool.invoke(""))

2025-12-04


## Agents

Finally, we get to the good stuff: agents. Generally speaking, an agent has a model and a set of tools at its disposal. It takes a prompt, breaks it down into the steps needed to produce a meaningful response and, using the descriptions of its tools, decides which tool is most likely to accomplish each step. It then acts on those conclusions: it executes the chosen tools, pulls in the resulting context, and responds to your prompt.

They are simple to build and powerful to use.

Let's see if we can solve the dot counting problem now...

In [26]:
from langchain_experimental.tools import PythonREPLTool
from langchain.agents import AgentType, initialize_agent
from langchain_ollama import ChatOllama
from IPython.display import Markdown

llm = ChatOllama(
    model="gemma3",
    temperature=0.1,
    num_predict=256,  # Use num_predict instead of max_tokens for Ollama
)

# Define the Python REPL tool
python_repl = PythonREPLTool()

# Initialize the agent using the local model and the Python REPL tool
python_agent = initialize_agent(
    [python_repl],
    llm,  # Using our previously initialized Ollama Gemma model here
    agent=AgentType.ZERO_SHOT_REACT_DESCRIPTION,
    verbose=True,
    handle_parsing_errors=True,
)

# Test the Python Agent with a simple input
result = python_agent.invoke(
    {"input": "How many dots are in the string ......................................?"}
)

# Display the result using Markdown
Markdown(result['output'])



> Entering new AgentExecutor chain...
Thought: The string contains a series of dots. I need to count the number of dots.
Action: Python_REPL
Action Input: s = "....................................."
print(s.count('.'))
Observation: 37

Thought: Final Answer: 37

> Finished chain.


37
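
The trace above follows a fixed text shape: a thought, then either an `Action` / `Action Input` pair or a `Final Answer`. As a rough illustration of why that shape matters, here is a small sketch of a parser for it. The function name `parse_react_step` and both regular expressions are assumptions made for this example and are not LangChain's own parser. When the model's text fits neither shape, parsing fails; with `handle_parsing_errors=True`, the executor reports such a failure back to the model as an observation instead of raising.

```python
import re

# Matches the "Action: ... / Action Input: ..." pair seen in the trace.
ACTION_PATTERN = re.compile(r"Action:\s*(.+?)\s*\nAction Input:\s*(.*)", re.DOTALL)
FINAL_PATTERN = re.compile(r"Final Answer:\s*(.*)", re.DOTALL)


def parse_react_step(text: str) -> tuple[str, str, str]:
    """Return ("final", "", answer) or ("action", tool_name, tool_input).

    Raises ValueError when the text matches neither shape.
    """
    final = FINAL_PATTERN.search(text)
    if final:
        return ("final", "", final.group(1).strip())
    action = ACTION_PATTERN.search(text)
    if action:
        return ("action", action.group(1).strip(), action.group(2).strip())
    raise ValueError(f"Could not parse model output: {text!r}")


step = 'Thought: I need to count.\nAction: Python_REPL\nAction Input: print("...".count("."))'
print(parse_react_step(step))
print(parse_react_step("Thought: Final Answer: 37"))
```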

## Pulling It All Together: Letting the Model Reason Its Way to an Answer

### Agent with Automatic Tool Selection
The real power of agents is when they are allowed to reason about which tools can be used to solve different tasks or sub-tasks to respond to a prompt. This is what makes them seem intelligent...but remember they are not truly intelligent! They're following sophisticated pattern recognition.

The agent reasoning flow works like this:

1. **User Input** → The user submits a query or request
2. **LLM Reasoning** → The LLM analyzes the request and determines an approach
3. **Tool Selection** → Based on tool descriptions, the LLM selects the appropriate tool(s)
4. **Tool Execution** → The selected tools are executed with parameters from the LLM
5. **Result Synthesis** → The LLM combines tool outputs into a coherent response

Hopefully, the model decides correctly and you get great results. Sometimes you have to prompt the model to use the right tools.

For example, if a model is trying to solve a math problem without using your math-specific tool, you can just add "You are bad at math so always use the wolfram tool to solve math problems" in the prompt. Simple guidance like this is usually enough to get the model working the way you intend.

This reasoning ability to select appropriate tools creates a flexible system that can handle diverse queries by combining specialized tools with general language understanding. The agent can break down complex problems, identify which tools would be most helpful for each component, and then integrate the results into a coherent response.
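
The reasoning flow described in this section can be sketched without any LLM at all. In the minimal example below, `scripted_model` is a hypothetical stand-in for the model: it returns either a tool request or a final answer. The `TOOLS` registry, `run_agent`, and the `"FINAL"` marker are likewise assumptions made for illustration. A real agent replaces the scripted function with model calls, but the select, execute, observe, and synthesize loop has the same outline.

```python
from datetime import date
from typing import Callable

# Hypothetical tool registry: name -> (description, function).
TOOLS: dict[str, tuple[str, Callable[[str], str]]] = {
    "DateTool": ("Returns today's date.", lambda _: str(date.today())),
    "Echo": ("Repeats its input.", lambda text: text),
}


def scripted_model(question: str, observations: list[str]) -> tuple[str, str]:
    """Stand-in for the LLM: returns (tool_name, tool_input) or ("FINAL", answer)."""
    if not observations:
        return ("DateTool", "")  # first step: ask for the date
    return ("FINAL", f"Today is {observations[-1]}.")


def run_agent(question: str, model: Callable, max_steps: int = 5) -> str:
    observations: list[str] = []
    for _ in range(max_steps):
        name, payload = model(question, observations)  # reasoning and tool selection
        if name == "FINAL":
            return payload  # result synthesis
        if name not in TOOLS:
            observations.append(f"Unknown tool: {name}")
            continue
        _, func = TOOLS[name]
        observations.append(func(payload))  # tool execution
    return "Stopped: step limit reached."


print(run_agent("What is today's date?", scripted_model))
```

The step limit is a deliberate design choice: a model that keeps requesting tools, or keeps naming tools that do not exist, cannot loop forever.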

In [27]:
from langchain.agents import initialize_agent

# Define a list of available tools for the agent
tools = [
    duckduckgo_tool,
    date_tool,
]

# Initialize the agent with access to all the tools in the list
agent_executor = initialize_agent(
    tools, 
    llm, 
    agent=AgentType.ZERO_SHOT_REACT_DESCRIPTION, 
    verbose=True, 
    handle_parsing_errors=True,
)

In [28]:
agent_executor.invoke(
    {
        "input": """
        System: You are not good at determining dates and times.  When you are asked about weather reports, you should first make sure you know what date is being asked about.
        If you are asked about weather and a day was not specified, you should use the date_tool to get today's date and then use the duckduckgo_tool to find out the weather for the date.
        When asking about weather, you should always include the date in your search for weather so you get the weather report for the correct date.
        
        Human: I am in Washington DC.  Will I need an umbrella today?
        """
    }
)



> Entering new AgentExecutor chain...
I need to find out the weather in Washington DC today to determine if I need an umbrella. Since the date was not specified, I should use the DateTool to get today's date and then use the DuckDuckGoSearch tool to find the weather for that date.
Action: DateTool
Action Input: ''
Observation: 2025-12-04
Thought: Okay, I now have today's date as 2025-12-04. I will use the DuckDuckGoSearch tool to find the weather in Washington DC for this date.
Action: DuckDuckGoSearch
Action Input: 'weather Washington DC 2025-12-04'
Observation: Washington , DC 39°F Cloudy- Light rain is expected. ... Delightful, is there generally a big difference in weather between S Scotland and the ... Washington , DC Weather ... 10 Day Weather - Washington , DC ... NFL 2025 Schedule: The Hottest, Coldest, Wettest, Snowiest Games The Weather Channel uses data, cookies and other similar technologies on this b

{'input': "\n        System: You are not good at determining dates and times.  When you are asked about weather reports, you should first make sure you know what date is being asked about.\n        If you are asked about weather and a day was not specified, you should use the date_tool to get today's date and then use the duckduckgo_tool to find out the weather for the date.\n        When asking about weather, you should always include the date in your search for weather so you get the weather report for the correct date.\n\n        Human: I am in Washington DC.  Will I need an umbrella today?\n        ",
 'output': 'Cloudy- Light rain is expected.'}

# RAG Demonstration with ChromaDB

The following cells demonstrate building a Retrieval-Augmented Generation (RAG) system using ChromaDB as the vector store. The code loads PDF documents, splits them into manageable chunks, creates vector embeddings with the local `nomic-embed-text` model served by Ollama, and stores them in a persistent ChromaDB database. A `RetrievalQA` chain then retrieves the chunks most semantically similar to each question and passes them to the LLM (`gemma3`) as context, so answers are grounded in the retrieved text and each response lists its source documents. If the PDF directory is empty, the code falls back to a single sample document so the cell still runs. The vector database is persisted to disk, so it can be reused in later sessions without regenerating embeddings.
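
Before looking at the real pipeline, it may help to see the retrieval step in miniature. The sketch below ranks text chunks against a query by cosine similarity and places the best match into a prompt. The "embedding" here is only a bag-of-words count, an assumption chosen so the example runs with the standard library alone; a real system uses a learned embedding model and a vector store. The names `embed`, `cosine_similarity`, and `top_k` are hypothetical.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Toy embedding: lowercase word counts (a real system uses a learned model)."""
    return Counter(text.lower().split())


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine of the angle between two sparse count vectors; 0.0 if either is empty."""
    dot = sum(a[token] * b[token] for token in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_k(query: str, chunks: list[str], k: int = 2) -> list[str]:
    """Return the k chunks most similar to the query, best first."""
    query_vec = embed(query)
    ranked = sorted(chunks, key=lambda c: cosine_similarity(query_vec, embed(c)), reverse=True)
    return ranked[:k]


chunks = [
    "bees collect pollen from flowers",
    "the agent selects a tool for each step",
    "vector stores index document chunks by embedding",
]
question = "which tool does the agent use"
context = top_k(question, chunks, k=1)

# The retrieved chunk becomes the context the LLM is asked to answer from.
prompt = f"Context:\n{context[0]}\n\nQuestion: {question}\n\nAnswer:"
print(prompt)
```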

In [29]:
import os
import pandas as pd
from IPython.display import Markdown, display

# LangChain imports
from langchain_community.document_loaders import PyMuPDFLoader
from langchain_community.vectorstores import Chroma
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_community.embeddings import OllamaEmbeddings
from langchain.chains import RetrievalQA
from langchain.prompts import PromptTemplate
from langchain_community.llms import Ollama
from chromadb.config import Settings

In [30]:
from langchain_community.llms import Ollama
from langchain_community.embeddings import OllamaEmbeddings
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_community.vectorstores import Chroma
from langchain_core.prompts import PromptTemplate
from langchain.chains import RetrievalQA
from langchain_community.document_loaders import PyMuPDFLoader

import os
from IPython.display import Markdown, display
from langchain_core.documents import Document

# Initialize Ollama LLM (Gemma model)
# Make sure these are pulled once:
#   ollama pull gemma3
#   ollama pull nomic-embed-text
llm = Ollama(
    model="gemma3",
    base_url="http://localhost:11434",     # explicit, just to be clear
)

# Embeddings model: add num_ctx and base_url
embedding_model = OllamaEmbeddings(
    model="nomic-embed-text",
    base_url="http://localhost:11434",
    num_ctx=2048,     # <= keep context modest to avoid crashes
    # you could even try 1024 if needed
)

# Create directory for PDFs
pdf_dir = "./data/pdfs"
os.makedirs(pdf_dir, exist_ok=True)
print(f"PDF directory is at: {os.path.abspath(pdf_dir)}")

# Text splitter: smaller chunks
recur_splitter = RecursiveCharacterTextSplitter(
    chunk_size=500,        # was 1000
    chunk_overlap=100,     # was 200
    separators=["\n\n", "\n", ".", "!", "?", ",", " ", ""],
    length_function=len
)

# Load and process PDFs using PyMuPDFLoader for each file in the directory
data = []
for filename in os.listdir(pdf_dir):
    if filename.endswith(".pdf"):
        loader = PyMuPDFLoader(file_path=os.path.join(pdf_dir, filename))
        data.extend(loader.load())

# Ensure there's data, create a demo if not
if len(data) == 0:
    data = [
        Document(
            page_content="Sample text about LummaC2.",
            metadata={"source": "sample.pdf", "page": 1},
        ),
    ]

# Split documents into chunks
data_splits = recur_splitter.split_documents(data)

# (Optional but helpful debug: quickly sanity-check chunks)
# for i, d in enumerate(data_splits[:3]):
#     print(i, len(d.page_content))

# Create ChromaDB vector store (persists locally)
vectordb = Chroma.from_documents(
    documents=data_splits,
    embedding=embedding_model,
    collection_name="rag-demo",
    persist_directory="./chroma_llm_training"
)

# Define RAG QA prompt template
qa_template = """
You are a helpful assistant answering questions based on the provided context.
Use the following pieces of context to answer the user's question. If unsure, state you don't know.

Context:
{context}

Question: {question}

Answer:
"""
qa_prompt_template = PromptTemplate.from_template(qa_template)

# Define RetrievalQA chain
qa_chain = RetrievalQA.from_chain_type(
    llm=llm,
    retriever=vectordb.as_retriever(search_kwargs={"k": 5}),
    return_source_documents=True,
    chain_type="stuff",
    chain_type_kwargs={"prompt": qa_prompt_template},
)

# Test questions
questions = [
    "What is LummaC2?",
    "What is a ClickFix?",
    "Why is the sky blue?"
]

# Run questions through the QA chain
for question in questions:
    print(f"\nQuestion: {question}")
    response = qa_chain.invoke({"query": question})
    display(Markdown(response["result"]))

    print("\nSources:")
    seen_sources = set()
    for doc in response["source_documents"][:2]:
        source_info = f"{os.path.basename(doc.metadata['source'])}, Page {doc.metadata.get('page', 'N/A')}"
        if source_info not in seen_sources:
            print(source_info)
            seen_sources.add(source_info)


PDF directory is at: /Users/schwartz/src/genai-essentials/data/pdfs

Question: What is LummaC2?


I don't know. The provided context discusses securing remote access software and incident root cause analysis, but it does not contain information about LummaC2.


Sources:
clickfix-attacks-sector-alert-tlpclear.pdf, Page 1

Question: What is a ClickFix?


I don‚Äôt know. The provided context does not contain information about ClickFix.


Sources:
clickfix-attacks-sector-alert-tlpclear.pdf, Page 1

Question: Why is the sky blue?


I do not know. The provided context discusses IT and OT network configurations and cloud portals, not the reason why the sky is blue.


Sources:
ts_10399402v010101p.pdf, Page 9
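
The splitter in the cell above is configured with `chunk_size=500` and `chunk_overlap=100`. To show what those two settings mean, here is a simplified sliding-window sketch. It cuts at fixed character offsets, whereas `RecursiveCharacterTextSplitter` prefers to break at the separators it is given, so treat this as an illustration of size and overlap only. The function name `chunk_text` is hypothetical.

```python
def chunk_text(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    """Split text into character windows where neighbours share chunk_overlap characters."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    if not text:
        return []
    step = chunk_size - chunk_overlap
    # Stop once the previous window has already reached the end of the text.
    last_start = max(len(text) - chunk_overlap, 1)
    return [text[start:start + chunk_size] for start in range(0, last_start, step)]


for chunk in chunk_text("abcdefghij", chunk_size=4, chunk_overlap=2):
    print(chunk)
```

Each window repeats the tail of the previous one, so a sentence that straddles a boundary still appears whole in at least one chunk when it is shorter than the overlap. Smaller chunks make each retrieved passage more focused, at the cost of more embeddings to compute and store.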
