<a href="https://colab.research.google.com/github/micah-shull/LLMs/blob/main/LLM_053_RAG_CahsFlow4Cast_Embeddings.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

### Import Libraries

In [5]:
!pip install -q sentence-transformers faiss-cpu
!pip install -q transformers python-dotenv huggingface_hub


### Import Blog Chunks

In [6]:
import pandas as pd
import numpy as np

# Load the cleaned blog chunks from local Colab storage
csv_path = "/content/cleaned_blog_chunks.csv"
df_clean_chunks = pd.read_csv(csv_path)
df_clean_chunks

Unnamed: 0,title,filename,chunk_id,text
0,About Micah Shull,about_micah_shull.txt,0,About Micah Shull | Data Scientist & Founder o...
1,About Micah Shull,about_micah_shull.txt,1,That’s why I created Cashflow 4Cast — a servic...
2,Consistency That Builds Confidence,consistency_that_builds_confidence.txt,0,It Wasn’t Just One Store — It Was Every Store ...
3,Consistency That Builds Confidence,consistency_that_builds_confidence.txt,1,"And in every case, it consistently cut forecas..."
4,Consistency That Builds Confidence,consistency_that_builds_confidence.txt,2,MAPE (Mean Absolute Percentage Error): Shows f...
...,...,...,...,...
135,🤖 How it Works,🤖_how_it_works.txt,0,Forecasting Face-Off:Traditional Tools vs Mach...
136,🤖 How it Works,🤖_how_it_works.txt,1,Includes real-world economic indicators to ant...
137,🤖 How it Works,🤖_how_it_works.txt,2,"sales) 10+ (sales, category, date, promotions,..."
138,🤖 How it Works,🤖_how_it_works.txt,3,MAPE (Mean Absolute Percentage Error): Shows f...


# Retrieval-Augmented Generation (RAG)

The code block below is fundamental to how **Retrieval-Augmented Generation (RAG)** works.

---

### ✅ Purpose of This Code

This block takes your **blog content**, turns it into **vector embeddings**, and builds a **FAISS index** for fast semantic search. This allows a chatbot (or any system) to *retrieve relevant content* by meaning, not just keywords.

---

### 🧠 Step-by-Step Breakdown

```python
# 1. Load Sentence Transformer model
embedder = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")
```
- Loads a lightweight, powerful model that converts text into vectors (embeddings) that capture meaning.
- Each chunk of your blog becomes a **384-dimensional vector**.

---

```python
# 2. Embed all chunks
doc_embeddings = embedder.encode(texts, convert_to_numpy=True)
```
- Takes every blog chunk (from your CSV) and embeds it into a numeric vector.
- These vectors represent meaning — so similar topics are **close together in vector space**.
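
To make "close together in vector space" concrete, here is a minimal sketch using hand-made 3-dimensional toy vectors. These are an assumption for illustration only: real embeddings come from `embedder.encode` and have 384 dimensions. The helper name `cosine_similarity` is hypothetical and defined here, not imported from any library used in this notebook.

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Toy "embeddings" (made up for illustration, not produced by a model).
forecasting = np.array([0.9, 0.1, 0.0])
cash_flow = np.array([0.8, 0.2, 0.1])
hiring = np.array([0.0, 0.2, 0.9])

sim_related = cosine_similarity(forecasting, cash_flow)
sim_unrelated = cosine_similarity(forecasting, hiring)
print(f"forecasting vs cash_flow: {sim_related:.3f}")
print(f"forecasting vs hiring   : {sim_unrelated:.3f}")
```

Vectors pointing in nearly the same direction score close to 1, while unrelated ones score close to 0.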

---

```python
# 3. Build FAISS index
index = faiss.IndexFlatL2(embedding_dim)
index.add(doc_embeddings.astype("float32"))
```
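- Creates a flat **FAISS index** (`IndexFlatL2`), which compares a query against every stored vector using L2 (Euclidean) distance. The search is exact, and fast enough for a small collection like this one.
- Allows you to later embed a user’s question, then **find the top-k most relevant chunks**.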
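
The NumPy sketch below imitates what an exact L2 search does, using random toy data instead of real embeddings. `brute_force_l2_search` is a hypothetical helper written for illustration, not FAISS code, although like `IndexFlatL2` it reports squared L2 distances.

```python
import numpy as np

def brute_force_l2_search(docs: np.ndarray, query: np.ndarray, k: int):
    """Return (squared distances, indices) of the k rows of docs nearest to query."""
    diffs = docs - query                     # broadcast the query against every row
    sq_dists = np.sum(diffs ** 2, axis=1)    # squared L2 distance per document
    top_k = np.argsort(sq_dists)[:k]         # indices of the k smallest distances
    return sq_dists[top_k], top_k

rng = np.random.default_rng(0)
docs = rng.normal(size=(10, 4)).astype("float32")   # 10 toy "chunks", 4 dimensions
query = docs[3] + 0.01                               # a query very close to row 3

distances, indices = brute_force_l2_search(docs, query, k=3)
print(indices, distances)
```

The first returned index is the row the query was derived from, and distances come back in ascending order.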

---

### 📌 What You Should Learn From This Code

> **This is how you prepare a knowledge base for semantic search** — the core of Retrieval-Augmented Generation.

Key takeaways:
- **Embeddings** turn human language into numbers that represent meaning.
- **FAISS** enables fast similarity search in that space.
- Once set up, you can answer questions by finding the most relevant content and feeding it to a language model (LLM).
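
One practical detail about this similarity space: when vectors are scaled to unit length, squared L2 distance and cosine similarity rank neighbors identically, because ||a - b||² = 2 - 2·cos(a, b). The sketch below checks the identity numerically on random toy vectors (not real embeddings). Whether a given model's output is already unit length is an assumption you should verify for your own setup.

```python
import numpy as np

rng = np.random.default_rng(42)
a, b = rng.normal(size=(2, 384))   # two random toy vectors with 384 dimensions
a /= np.linalg.norm(a)             # scale each vector to unit length
b /= np.linalg.norm(b)

squared_l2 = float(np.sum((a - b) ** 2))
cosine = float(np.dot(a, b))

# For unit vectors: ||a - b||^2 == 2 - 2 * cos(a, b)
print(squared_l2, 2 - 2 * cosine)
```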

This block is the engine behind search, recommendation, and context-aware AI — so it's an essential building block of modern NLP pipelines.


In [7]:
import pandas as pd
import numpy as np
import faiss
from sentence_transformers import SentenceTransformer

# Step 1: Load Sentence Transformer model
model_name = "sentence-transformers/all-MiniLM-L6-v2"
embedder = SentenceTransformer(model_name)

# Step 2: Embed all chunks
texts = df_clean_chunks["text"].tolist()
doc_embeddings = embedder.encode(texts, convert_to_numpy=True)

# Step 3: Create FAISS index and add vectors
embedding_dim = doc_embeddings.shape[1]
index = faiss.IndexFlatL2(embedding_dim)
index.add(doc_embeddings.astype("float32"))

# Final confirmation
print(f"✅ Embedded {len(doc_embeddings)} chunks into {embedding_dim}-dimensional vectors.")
print("📦 FAISS index is ready for semantic search.")


The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.


✅ Embedded 140 chunks into 384-dimensional vectors.
📦 FAISS index is ready for semantic search.


### 💾 Save the FAISS Index and Metadata

In [None]:
# Create folders if needed
import os
os.makedirs("faiss_index", exist_ok=True)

# Save the FAISS index to disk
faiss.write_index(index, "faiss_index/blog_index.faiss")

# Save the DataFrame with metadata (title, filename, text, etc.)
df_clean_chunks.to_csv("faiss_index/blog_metadata.csv", index=False)

print("✅ FAISS index and metadata saved to 'faiss_index/' folder.")


✅ FAISS index and metadata saved to 'faiss_index/' folder.


### 🔁 Reload Later When Needed

In [None]:
import faiss
import pandas as pd

# Reload FAISS index and metadata
index = faiss.read_index("faiss_index/blog_index.faiss")
df_clean_chunks = pd.read_csv("faiss_index/blog_metadata.csv")

print("✅ FAISS index and metadata reloaded.")
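
Search results are mapped back to text by row position, so the reloaded index and metadata must describe the same chunks in the same order. A minimal sanity check is sketched below. `check_alignment` is a hypothetical helper; it relies only on the index's `ntotal` attribute and `len()` of the metadata, and the stand-in objects exist so the sketch runs without FAISS.

```python
from types import SimpleNamespace

def check_alignment(index, metadata) -> int:
    """Raise if the index and the metadata disagree on the number of chunks."""
    n_vectors, n_rows = index.ntotal, len(metadata)
    if n_vectors != n_rows:
        raise ValueError(
            f"Index holds {n_vectors} vectors but metadata has {n_rows} rows."
        )
    return n_vectors

# Stand-ins for illustration; in the notebook you would call
# check_alignment(index, df_clean_chunks) instead.
fake_index = SimpleNamespace(ntotal=3)
fake_metadata = ["chunk 0", "chunk 1", "chunk 2"]
print(check_alignment(fake_index, fake_metadata))
```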


# 🧠 Semantic Search (QA via FAISS)

In [None]:
import textwrap

def search_blog(question, k=3, wrap_width=100):
    # Embed the user query
    query_vec = embedder.encode([question])[0].astype("float32")

    # Search top-k results
    D, I = index.search(np.array([query_vec]), k)

    print(f"\n📌 Question: {question}\n{'=' * 80}")

    for rank, idx in enumerate(I[0]):
        result = df_clean_chunks.iloc[idx]
        print(f"\n🔹 Result {rank + 1}")
        print(f"📘 Title   : {result['title']}")
        print("📄 Chunk   :")
        print(textwrap.fill(result['text'], width=wrap_width))
        print(f"🔗 Source  : {result['filename']}")
        print("-" * 80)
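
`search_blog` discards the distances in `D` and indexes the DataFrame with every entry of `I`. Two details are worth handling in a variant: smaller L2 distances mean closer matches, and FAISS fills unused result slots with `-1` when the index holds fewer than `k` vectors, which `iloc` would silently interpret as the last row. The sketch below uses toy arrays in place of a real `index.search` call, and `collect_hits` is a hypothetical helper.

```python
import numpy as np

def collect_hits(D: np.ndarray, I: np.ndarray, texts: list) -> list:
    """Pair each retrieved chunk with its distance, skipping empty result slots."""
    hits = []
    for dist, idx in zip(D[0], I[0]):
        if idx == -1:  # FAISS pads with -1 when fewer than k vectors exist
            continue
        hits.append({"index": int(idx), "distance": float(dist), "text": texts[idx]})
    return hits

# Toy arrays shaped like the output of index.search for one query with k=3.
D = np.array([[0.21, 0.58, 3.4e38]], dtype="float32")
I = np.array([[2, 0, -1]], dtype="int64")
texts = ["chunk A", "chunk B", "chunk C"]

for hit in collect_hits(D, I, texts):
    print(hit)
```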




In [None]:
search_blog("What economic indicators should I monitor as a small business?")



📌 Question: What economic indicators should I monitor as a small business?

🔹 Result 1
📘 Title   : 🚀 Looking Ahead: The Power of Economic Indicators in Forecasting
📄 Chunk   :
Why Economic Indicators Matter Looking Ahead: The Power of Economic Indicators in Forecasting Most
businesses use Excel or QuickBooks forecasts that only look inward — past sales, simple averages,
and outdated assumptions. We look outward. “You’re modeling what's coming, not just what already
happened.” That simple difference makes all the difference when it comes to preparing for risk,
spotting opportunity, and staying one step ahead. 📊 What We Include That Excel Doesn’t Local
Employment Trends (Retail, Manufacturing, Hospitality) Alachua County Median Household Income
Average Weekly Wages in Gainesville Per Capita Income in Alachua Federal Reserve Interest Rates
Inflation & Loan Delinquency Rates Statewide Unemployment & Payrolls These factors shape the
financial environment your customers live in — and your s

The response above is *too focused on one post*, even though the knowledge base spans dozens of entries. The system should feel like a **real assistant**, not just a blog highlighter.

---

## ✅ Goal: Make Answers Feel More Holistic

Right now, we're retrieving and printing **individual chunks**. A better answer would be:

- 🧠 **Integrated** — combines info across multiple posts
- 💬 **Conversational** — gives a real answer, not a quote
- 🧾 **Cited** — points to the original blog post(s) as backup

To do that, we can now introduce a **response generator**, and here's how:

---

## 🧪 Plan: RAG-Style Response Generation

### 🔄 New Flow

1. **Embed the user question**
2. **Retrieve top-k chunks across all posts**
3. **Concatenate** them into a reference document
4. **Send to an LLM** with a prompt like:

```text
You are a helpful business assistant. Use the following blog content to answer the user's question:

[Retrieved chunks...]

Question: What economic indicators should I monitor as a small business?
Answer:
```
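
Steps 3 and 4 can be sketched as a small prompt builder. This is a minimal illustration, not the notebook's final function: `build_prompt` and its `max_chars` budget are hypothetical, and the character limit is only a crude stand-in for a model's token limit.

```python
def build_prompt(chunks: list, question: str, max_chars: int = 2000) -> str:
    """Join retrieved chunks into a prompt, stopping before max_chars of context."""
    context, used = [], 0
    for chunk in chunks:
        if used + len(chunk) > max_chars:
            break  # drop this chunk and the rest once the budget is exceeded
        context.append(chunk)
        used += len(chunk)
    joined = "\n\n".join(context)
    return (
        "You are a helpful business assistant. "
        "Use the following blog content to answer the user's question:\n\n"
        f"{joined}\n\n"
        f"Question: {question}\n"
        "Answer:"
    )

prompt = build_prompt(
    ["Interest rates affect borrowing costs.", "Local employment drives demand."],
    "What economic indicators should I monitor as a small business?",
)
print(prompt)
```

Because chunks arrive ordered by relevance, cutting from the end keeps the closest matches when the budget is tight.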

---

## ✅ What We Need

- A small LLM (a `transformers` pipeline, or an OpenAI-compatible API if you have a key)
- A combined retrieval + generation function

This lets the chatbot answer questions with **clarity, citations, and confidence**, powered by your own blog.

In [None]:
from huggingface_hub import login
from dotenv import load_dotenv
import os

# Load the .env file containing your token
load_dotenv("/content/HUGGINGFACE_HUB_TOKEN.env")

# Login using the token
login(token=os.environ["HUGGINGFACE_HUB_TOKEN"])
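
If the `.env` file is missing or the variable name differs, `os.environ[...]` raises a bare `KeyError`. A slightly friendlier variant is sketched below, assuming the token lives in `HUGGINGFACE_HUB_TOKEN` as above. `get_hf_token` is a hypothetical helper, and a plain dict stands in for `os.environ` in the demonstration.

```python
import os

def get_hf_token(env=None, name: str = "HUGGINGFACE_HUB_TOKEN") -> str:
    """Return the token from the environment, or fail with a clear message."""
    env = os.environ if env is None else env
    token = (env.get(name) or "").strip()
    if not token:
        raise RuntimeError(f"{name} is not set; check the path passed to load_dotenv.")
    return token

# Demonstration with a plain dict standing in for os.environ.
print(get_hf_token({"HUGGINGFACE_HUB_TOKEN": "hf_example"}))
```

In the notebook this could be used as `login(token=get_hf_token())`.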


# 🧠 Set Up a Text Generation Pipeline



In [None]:
from transformers import pipeline

# Fast and efficient instruction-tuned model
generator = pipeline(
    "text2text-generation",
    model="google/flan-t5-base",
    tokenizer="google/flan-t5-base"
)

Device set to use cpu


### Define the RAG-Style Answer Function

In [None]:
import textwrap

def rag_answer(question, k=4, show_chunks=False):
    # Embed question
    query_vec = embedder.encode([question])[0].astype("float32")
    D, I = index.search(np.array([query_vec]), k)

    # Combine top-k retrieved chunks
    retrieved_chunks = "\n\n".join(df_clean_chunks.iloc[i]["text"] for i in I[0])

    # Optional: show chunks retrieved
    if show_chunks:
        print("\n📚 Retrieved Chunks:\n" + "-" * 80)
        for i in I[0]:
            chunk = df_clean_chunks.iloc[i]
            print(f"📘 {chunk['title']} → {chunk['filename']}")
            print(textwrap.fill(chunk['text'], width=100))
            print("-" * 80)

    # Construct prompt (assembled from separate strings to avoid stray indentation)
    prompt = (
        "Answer the question using the blog content below.\n\n"
        f"{retrieved_chunks}\n\n"
        f"Question: {question}"
    )

    # Generate the response
    response = generator(prompt, max_new_tokens=250, do_sample=False)[0]["generated_text"]

    # The text2text pipeline returns only the generated answer, not the prompt
    answer = response.strip()

    # Print the final result
    print(f"\n📌 Question: {question}")
    print("=" * 80)
    print(textwrap.fill(answer, width=100))
    print("=" * 80)
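
The goal stated earlier includes citing the original posts, but `rag_answer` prints only the generated text. One way to add a source line is sketched below with a hypothetical `format_citations` helper. It takes `(title, filename)` pairs such as those stored in `df_clean_chunks`, and the sample pairs reuse titles from the table shown at the top of the notebook.

```python
def format_citations(sources: list) -> str:
    """Build a 'Sources:' line from (title, filename) pairs, without repeats."""
    seen = []
    for title, filename in sources:
        entry = f"{title} ({filename})"
        if entry not in seen:  # keep first occurrence, preserve retrieval order
            seen.append(entry)
    return "Sources: " + "; ".join(seen)

retrieved = [
    ("About Micah Shull", "about_micah_shull.txt"),
    ("Consistency That Builds Confidence", "consistency_that_builds_confidence.txt"),
    ("About Micah Shull", "about_micah_shull.txt"),
]
print(format_citations(retrieved))
```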



In [None]:
# Ask a question through the full retrieve-then-generate pipeline
rag_answer("What economic indicators should I monitor as a small business?", show_chunks=True)

#### Clean up Memory

In [None]:
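import gc
import torch

# Collect unreferenced Python objects, then free cached GPU memory if a GPU is in use
gc.collect()
if torch.cuda.is_available():
    torch.cuda.empty_cache()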

#### Remove Widgets from Notebook to Save to GitHub

In [None]:
import json
from google.colab import drive
drive.mount('/content/drive')

# Path to your current notebook file (adjust if different)
notebook_path = "/content/drive/My Drive/LLM/LLM_053_RAG_CahsFlow4Cast_Embeddings.ipynb"


# Load the notebook JSON
with open(notebook_path, 'r', encoding='utf-8') as f:
    nb = json.load(f)

# Remove the widget metadata if it exists
if 'widgets' in nb.get('metadata', {}):
    del nb['metadata']['widgets']

# Save the cleaned notebook
with open(notebook_path, 'w', encoding='utf-8') as f:
    json.dump(nb, f, indent=2)

print("Notebook metadata cleaned. Try saving to GitHub again.")
