<a href="https://colab.research.google.com/github/micah-shull/LLMs/blob/main/LLM_054_RAG_CahsFlow4Cast_Chatbot.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# ✅ The Goal:  
Run a chatbot that uses your **FAISS + blog chunks** to answer questions 24/7, likely via your website or app.

---

### 🧱 Key Components of a 24/7 Chatbot

| Component         | Purpose |
|------------------|---------|
| 🔍 **Vector Index (FAISS)** | Fast document retrieval |
| 📚 **Blog Chunks**          | Your knowledge base |
| 🤖 **LLM** (like Falcon or OpenAI) | Generates the response |
| 💬 **Chat UI / API**        | User-facing interface |
| ⚙️ **Backend App** (e.g., FastAPI) | Runs RAG logic and serves responses |
| ☁️ **Hosting Platform**     | Keeps your app running 24/7 |

---

### 🧱 Step-by-Step Plan

#### 1. **Start with a Simple Chatbot (Gradio + LLM)**
- Use `Gradio` to set up a basic chatbot interface
- Use a **helpful, instruction-tuned model** (like `tiiuae/falcon-7b-instruct`, `mistralai/Mistral-7B-Instruct-v0.2`, or, for a lighter start, `google/flan-t5-base`)
- Let the chatbot answer general questions naturally
- Test the tone, quality, and response clarity

---

#### 2. **Add Your Brand Voice or Domain Focus**
- Adjust prompts: "You are a helpful assistant for small businesses trying to forecast revenue."
- Add example questions to guide response style
- Validate that it feels like *your* bot

---

#### 3. **Integrate RAG for Accuracy + Depth**
- Once basic chat is solid:
  - Load your FAISS index
  - On user query, retrieve top chunks
  - Add those chunks as **context** to the prompt
  - Let the LLM answer with that grounded knowledge
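
A minimal sketch of the "add chunks as context" step, assuming the retriever returns a plain list of text chunks. `build_rag_prompt` is a hypothetical helper, the chunks are made up, and the "answer only from the context" instruction is one possible design choice rather than a requirement:

```python
def build_rag_prompt(question: str, chunks: list[str], max_chunks: int = 3) -> str:
    """Assemble a grounded prompt from retrieved blog chunks (hypothetical helper)."""
    # Number each chunk so you can see exactly what was injected when debugging
    context = "\n\n".join(
        f"[{i}] {chunk.strip()}" for i, chunk in enumerate(chunks[:max_chunks], start=1)
    )
    return (
        "You are a helpful assistant for small business owners.\n"
        "Answer using only the context below. "
        "If the context does not cover the question, say so.\n\n"
        f"Context:\n{context}\n\n"
        f"Q: {question}\nA:"
    )


# Made-up chunks standing in for FAISS search results
retrieved = [
    "CPI measures how prices for everyday goods change over time.",
    "Shorter forecast windows, such as 30 days, make it easier to react to change.",
]
print(build_rag_prompt("What is CPI?", retrieved))
```

Keeping the same `Q:` / `A:` ending as the earlier prompts means the model sees a familiar pattern; only the context block is new.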

---

### 🔄 Why This Works Better

| 🚫 All-at-once RAG Bot       | ✅ Build-Then-Integrate |
|-----------------------------|-------------------------|
| Hard to debug               | Easy to test components |
| Can feel robotic or brittle | Lets you iterate naturally |
| Context injection can break | You know what "normal" output looks like first |
| Slower to deploy            | Can get to a working version in minutes |

---

Planning ahead will save frustration and make this smoother. Here's a list of **common issues** we might encounter when setting up the Gradio chatbot, along with **proactive solutions**:

---

### 🚧 Potential Problems & ✅ Proactive Fixes

| 🛑 Issue | 🧠 What It Means | ✅ How to Prevent or Fix |
|--------|----------------|-------------------------|
| **Model loading fails** | Large models like `falcon-7b` might be too heavy for free-tier Spaces or Colab | ✅ Start with a lightweight instruction-tuned model like `google/flan-t5-base`; treat 7B models such as `mistralai/Mistral-7B-Instruct-v0.2` or `tiiuae/falcon-7b-instruct` with caution and test them on a GPU first |
| **Long load times** | First-time model loading can take minutes | ✅ Print a "loading model..." message, and load the model once at startup (not inside the response function) so later requests reuse it |
| **Memory crashes (OOM)** | Large models on limited RAM environments crash | ✅ Try smaller models (like `flan-t5-base`), or test in Colab before deploying |
| **Chatbot doesn't respond naturally** | Prompting may not guide the model well | ✅ Use clear system prompts like: `"You are a helpful assistant for small business owners..."` |
| **API rate limits / Hugging Face issues** | Too many requests or expired token | ✅ Log in once with `huggingface_hub.login()` and monitor API usage. Use `.env` for token safety. |
| **Gradio UI issues (formatting, text overflow)** | Output is too long or hard to read | ✅ Use `Textbox`, add a line-wrap or max length, and validate layout in Colab |
| **Chatbot resets every time** | No memory between turns | ✅ This is expected unless we add memory — which we can layer in later |
| **Hugging Face Spaces deployment errors** | Missing files or bad requirements.txt | ✅ Include all files: `app.py`, `requirements.txt`, index files, metadata, and test the app locally before uploading |
| **RAG integration breaks** | When we add chunked context, prompt gets too long | ✅ Start simple, test with 1 chunk, and truncate carefully with `tokenizer.encode(..., truncation=True)` when we get there |
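
A minimal sketch of keeping injected context under a token budget, assuming chunks arrive already ranked by relevance. `fit_chunks_to_budget` is hypothetical, and the default whitespace word count is only a rough stand-in for real tokens; with a Hugging Face tokenizer loaded you could pass something like `lambda t: len(tokenizer(t).input_ids)` instead:

```python
from typing import Callable


def fit_chunks_to_budget(
    chunks: list[str],
    budget: int,
    count_tokens: Callable[[str], int] = lambda text: len(text.split()),
) -> list[str]:
    """Keep ranked chunks in order until the next one would exceed the budget."""
    kept, used = [], 0
    for chunk in chunks:
        cost = count_tokens(chunk)
        if used + cost > budget:
            # Stop at the first chunk that doesn't fit, so lower-ranked
            # chunks never jump ahead of higher-ranked ones
            break
        kept.append(chunk)
        used += cost
    return kept


chunks = ["one two three", "four five", "six seven eight nine"]
print(fit_chunks_to_budget(chunks, budget=5))  # ['one two three', 'four five']
```

Dropping whole chunks, rather than cutting the prompt mid-sentence, keeps each piece of context readable for the model.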

---

### 🔒 Best Practices (Now)

- ✅ **Test in Colab first** before pushing to Hugging Face Spaces
- ✅ **Pick a small model first** (`flan-t5-base`, `mistral-7b` if GPU available)
- ✅ **Start with a single-turn chatbot**
- ✅ **Keep your Hugging Face token in a `.env` file** instead of hard-coding it in a notebook cell
- ✅ **Install dependencies cleanly** (`!pip install gradio transformers`)




## Install and Import Dependencies

In [None]:
!pip install -q gradio transformers python-dotenv huggingface_hub


## Load the Model and Set Up Chat Function

### ✅ Key Code Blocks

#### 1. **Model Setup**
```python
chatbot = pipeline("text2text-generation", model="google/flan-t5-base")
```
- **What it does:** Loads an instruction-tuned model (FLAN-T5) that can generate text responses.
- **Why it matters:** The core intelligence of your chatbot. You can swap this out for other models later.

---

#### 2. **Prompt Engineering**
```python
prompt = f"Answer the following question clearly and helpfully:\n\n{message}"
```
- **What it does:** Gives the model context to shape its response.
- **Why it matters:** This is where you can guide behavior, tone, focus, and even safety. This is one of the most important skills when building chatbots.

---

#### 3. **Gradio UI**
```python
demo = gr.Interface(
    fn=respond,
    inputs=gr.Textbox(...),
    outputs="text",
    title="💬 Simple Small Business Chatbot"
)
```
- **What it does:** Creates a front-end interface with a textbox input and text output.
- **Why it matters:** Gradio handles the interaction between user and model, so you can focus on the model’s logic.

---

### 💡 What Is an **Instruction-Tuned Model**?

An **instruction-tuned model** is a language model that has been **trained to follow human instructions**.

#### 🔍 Normal vs. Instruction-Tuned:

| Type               | Behavior                                                                 |
|--------------------|--------------------------------------------------------------------------|
| **Standard model** | Predicts the next word in a sentence (language modeling objective)       |
| **Instruction-tuned** | Learns to follow **explicit human instructions** and complete tasks        |

#### 🧠 Example:

If you ask a **standard model**:
> "What are economic indicators?"

It might just continue the sentence or generate random facts.

But if you ask an **instruction-tuned model**:
> "Explain economic indicators in simple terms."

It knows you're asking for a **helpful explanation** and will give a more structured, user-focused answer.

#### ✅ FLAN-T5 is instruction-tuned.
That’s why a prompt like this one works:
```python
f"Answer the following question clearly and helpfully:\n\n{message}"
```


### 📌 What to Learn Next

| Topic                        | Why It’s Important                                           |
|-----------------------------|---------------------------------------------------------------|
| **Prompt Engineering**       | Crafting effective instructions is key to good responses.     |
| **State and History Handling** | Allows multi-turn conversations that feel natural.           |
| **System Prompt Design**     | Sets tone, focus, safety boundaries.                         |
| **Model Selection**          | Smaller = faster, larger = smarter (usually).                |
| **RAG Integration**          | Adds real-world knowledge your model wasn’t trained on.       |
| **Deployment Options**       | Hugging Face Spaces, Flask, FastAPI, etc., for 24/7 hosting.  |
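
A minimal sketch of state and history handling, assuming the same `(message, response)` tuples that the `chat_log` list in the cells below stores. `build_history_prompt` is a hypothetical helper; small models like `flan-t5-base` may still ignore the history, and every extra turn uses up input tokens:

```python
def build_history_prompt(
    history: list[tuple[str, str]], message: str, max_turns: int = 3
) -> str:
    """Prepend the last few turns so follow-ups like 'What else?' have context."""
    # history[-0:] would return the whole list, so handle max_turns=0 explicitly
    recent = history[-max_turns:] if max_turns > 0 else []
    parts = [f"Q: {user_msg}\nA: {bot_msg}" for user_msg, bot_msg in recent]
    parts.append(f"Q: {message}\nA:")
    return "\n\n".join(parts)


example_history = [
    ("How can I improve cash flow?", "Track expenses weekly and forecast sales 30 days out."),
]
print(build_history_prompt(example_history, "What else would you recommend?"))
```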



In [None]:
from huggingface_hub import login
from dotenv import load_dotenv
import os
from transformers import pipeline
import gradio as gr
import warnings
warnings.filterwarnings("ignore", message=".*The secret.*")

# Load the .env file
load_dotenv("/content/HUGGINGFACE_HUB_TOKEN.env")
# Login using the token
login(token=os.environ["HUGGINGFACE_HUB_TOKEN"])

# Load the model
chatbot = pipeline("text2text-generation", model="google/flan-t5-base")

# Define a basic response function
def respond(message):
    prompt = f"Answer the following question clearly and helpfully:\n\n{message}"
    response = chatbot(prompt, max_new_tokens=150)[0]["generated_text"]
    return response

# Create the Gradio interface
demo = gr.Interface(
    fn=respond,
    inputs=gr.Textbox(lines=2, placeholder="Ask a question..."),
    outputs="text",
    title="💬 Simple Small Business Chatbot"
)

# Launch the chatbot
demo.launch(share=True)


Device set to use cpu


Colab notebook detected. To show errors in colab notebook, set debug=True in launch()
* Running on public URL: https://a95020aae5cd43e22d.gradio.live

This share link expires in 72 hours. For free permanent hosting and GPU upgrades, run `gradio deploy` from the terminal in the working directory to deploy to Hugging Face Spaces (https://huggingface.co/spaces)




## Model 1 | Prompt 1

### Updated Prompt with a Business Forecasting Persona

In [None]:
# Load the model
chatbot = pipeline("text2text-generation", model="google/flan-t5-base")

# Store chat history for debugging
chat_log = []

# Define a smarter response function
def respond(message):
    prompt = f"""You are a helpful assistant for small business owners.

Your job is to answer questions related to:
- Business forecasting
- Economic indicators (like inflation, unemployment, sales data)
- Improving cash flow
- Making data-driven decisions

Always explain in plain English and give clear, helpful advice.

Question: {message}
Answer:"""

    response = chatbot(prompt, max_new_tokens=200)[0]["generated_text"].strip()

    # Log this interaction
    chat_log.append((message, response))

    # Print entire chat history (for you, not the user)
    print("\n🧾 Chat Log:")
    for i, (q, a) in enumerate(chat_log, 1):
        print(f"{i}. Q: {q}\n   A: {a}\n{'-'*50}")

    return response

# Create the Gradio interface
demo = gr.Interface(
    fn=respond,
    inputs=gr.Textbox(lines=2, placeholder="Ask about forecasting, economics, or small business strategy..."),
    outputs="text",
    title="📊 Small Business Forecasting Assistant"
)

# Launch the chatbot
demo.launch(share=True)


Device set to use cpu


Colab notebook detected. To show errors in colab notebook, set debug=True in launch()
* Running on public URL: https://3975d6d24f5552a66a.gradio.live

This share link expires in 72 hours. For free permanent hosting and GPU upgrades, run `gradio deploy` from the terminal in the working directory to deploy to Hugging Face Spaces (https://huggingface.co/spaces)




### Chat Logs

In [None]:
print("🧾 Chat Log\n" + "=" * 40)
for i, (user_msg, bot_msg) in enumerate(chat_log, start=1):
    print(f"\n🧑‍💼 User {i}: {user_msg}")
    print(f"  🤖 Bot  {i}: {bot_msg}")
    # print("-" * 40)

🧾 Chat Log

🧑‍💼 User 1: hi can you help me optimize my business?

  🤖 Bot  1: Improving cash flow

🧑‍💼 User 2: hi can you help me optimize my business?
What else would you recommend?

  🤖 Bot  2: Improving cash flow

🧑‍💼 User 3: 
What else would you recommend?

  🤖 Bot  3: Improving cash flow

🧑‍💼 User 4: what are economic indicators?

  🤖 Bot  4: inflation, unemployment, sales data

🧑‍💼 User 5: what do they tell me about my business?

  🤖 Bot  5: Making data-driven decisions

🧑‍💼 User 6: can you give me a more elaborate answer?

  🤖 Bot  6: Making data-driven decisions


# Next Steps

### 🧠 Option 1: **Refine the Chat Prompt First**
This gives you **stronger answers** *immediately* and helps validate your use case before adding complexity.

#### Why it’s valuable:
- Clarifies your assistant’s **tone, role, and style**
- Prevents vague or generic answers
- Easier to test how well the base model works *before* adding RAG

#### What we can do:
- Add a **clear system-style prompt** (like: "You are a helpful forecasting assistant for small business owners.")
- Preload common topics (economic indicators, forecasting accuracy, etc.)
- Add examples of ideal responses as in-context few-shot learning

✅ **Recommended next if you want better answers right now.**

---

### 🔍 Option 2: **Add RAG (Retrieval-Augmented Generation)**
This helps you **pull in real knowledge** from your blog and business content.

#### Why it’s valuable:
- Gives the model *real substance* from your blog
- Helps it answer questions that aren't in the base model's training
- Makes your chatbot **truly yours**

#### What we can do:
- Embed the question
- Retrieve top-k chunks from your blog using FAISS
- Concatenate the chunks and send them as context with the question
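
A minimal sketch of the embed / retrieve top-k / concatenate steps. It swaps in a toy bag-of-words "embedding" and a brute-force cosine search so it runs with the standard library only; in the real bot a sentence-embedding model and FAISS would take their place (for example, an `IndexFlatIP` over normalized vectors also ranks by cosine similarity). All names and chunks here are hypothetical:

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Toy bag-of-words 'embedding' (stand-in for a real embedding model)."""
    return Counter(word.strip(".,?!").lower() for word in text.split())


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve_top_k(question: str, chunks: list[str], k: int = 2) -> list[str]:
    """Score every chunk against the question and keep the k best (brute force)."""
    q = embed(question)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)
    return ranked[:k]


blog_chunks = [
    "Inflation, measured by CPI, shows how fast prices are rising.",
    "A 30-day cash flow forecast helps you plan payroll and inventory.",
    "Local unemployment trends affect how much customers can spend.",
]
top = retrieve_top_k("How does inflation affect my prices?", blog_chunks, k=1)
context = "\n\n".join(top)  # concatenated chunks go into the prompt as context
print(context)
```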

✅ **Recommended next if you're ready to scale to real business Q&A.**

---

### 🚀 Best Practice Order:
1. ✅ **Refine your prompt** to get the tone, role, and helpfulness right  
2. ✅ Then **add RAG** to feed it real data for more grounded answers  
3. ✅ Later: add multi-turn memory, logging, and deployment (e.g., Hugging Face Spaces)


## ✅ Step 1: Create a Strong System Prompt

Adding **guardrails through a strong system prompt** is **one of the most effective ways** to:

✅ Keep the chatbot focused  
✅ Avoid wandering into unsafe or off-topic territory  
✅ Increase user trust and experience  

---

### 🔒 Why Guardrails Matter

Even small models like `flan-t5-base` will try to answer **anything** you throw at them unless you tell them:

> ❌ "That's not within my expertise."  
> ✅ "Let me help you with something related to forecasting or business planning."

And in public-facing tools, this **prevents confusion**, **content liability**, and **weird answers** that can harm credibility.

---

### ✅ How to Add These Guardrails (Prompt Snippet)

Here's how you might extend your system prompt:

```txt
You are a helpful assistant for small business owners.

Your expertise includes:
- Cash flow forecasting
- Economic indicators
- Business decision-making
- Data-driven growth strategies

Only answer questions related to these topics. If a question is outside your domain (e.g. politics, health, personal advice), politely respond:

"I'm trained to assist with business forecasting and strategy. Let me know how I can help in that area!"

Here are a few examples of questions you can help with:
Q: What economic indicators matter most for retail?
A: ...
...
```

---

### 🛡️ Bonus Techniques You Can Add Later

- **Classify user questions** before answering (zero-shot or intent classifier)
- **Reject or redirect** with: `"I'm only able to assist with..."`  
- **Log off-topic queries** to improve your examples over time
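
A minimal sketch of "classify, then reject or redirect, then log", using a simple keyword check as a swapped-in stand-in for a real intent or zero-shot classifier. The keyword list, `off_topic_log`, and `guarded_respond` are hypothetical; `answer_fn` would be whatever calls the model:

```python
from typing import Callable

FALLBACK = (
    "I'm trained to assist with business forecasting and strategy. "
    "Let me know how I can help in that area!"
)

# Hypothetical keyword list; a zero-shot or intent classifier could replace it later
ON_TOPIC_KEYWORDS = {
    "forecast", "forecasting", "cash", "flow", "sales", "revenue", "inflation",
    "cpi", "unemployment", "economic", "indicator", "indicators", "budget", "pricing",
}

off_topic_log: list[str] = []


def is_on_topic(message: str) -> bool:
    words = {w.strip(".,?!").lower() for w in message.split()}
    return bool(words & ON_TOPIC_KEYWORDS)


def guarded_respond(message: str, answer_fn: Callable[[str], str]) -> str:
    """Redirect out-of-scope questions before they reach the model."""
    if not is_on_topic(message):
        off_topic_log.append(message)  # review later to improve examples and keywords
        return FALLBACK
    return answer_fn(message)


print(guarded_respond("Who will win the election?", lambda m: "model answer"))
print(guarded_respond("How do I improve cash flow?", lambda m: "model answer"))
```

Keyword gates are brittle: a misspelling like "econimic" or a general greeting would be redirected, which is one reason a learned classifier is the better long-term option.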

---

This restriction, plus a fallback message for out-of-scope topics, can be built directly into the current prompt.

### Model 1 | Prompt 2

In [None]:
# Load the model
chatbot = pipeline("text2text-generation", model="google/flan-t5-base")

# Store chat history for debugging
chat_log = []

# Define a smarter response function
def respond(message):
    prompt = f"""
You are a helpful assistant for small business owners.

Your job is to answer questions related to:
- Business forecasting
- Economic indicators (like inflation, unemployment, sales data)
- Improving cash flow
- Making data-driven decisions

Always explain in plain English and give clear, helpful advice.

Examples:
Q: What are economic indicators?
A: Economic indicators like inflation, consumer confidence, and employment rates help you anticipate changes in customer spending and business risks.

Q: How can I improve cash flow?
A: Track expenses weekly, forecast sales 30 days out, and reduce inventory waste. These steps can increase your available cash.

Now answer this question:

Q: {message}
A:"""

    response = chatbot(prompt, max_new_tokens=200)[0]["generated_text"].strip()

    # Log this interaction
    chat_log.append((message, response))

    # Print entire chat history (for debugging in notebook)
    print("\n🧾 Chat Log:")
    for i, (q, a) in enumerate(chat_log, 1):
        print(f"{i}. Q: {q}\n   A: {a}\n{'-'*50}")

    return response

# Create the Gradio interface
demo = gr.Interface(
    fn=respond,
    inputs=gr.Textbox(lines=2, placeholder="Ask about forecasting, economics, or small business strategy..."),
    outputs="text",
    title="📊 Small Business Forecasting Assistant"
)

# Launch the chatbot
demo.launch(share=True)


Device set to use cpu


Colab notebook detected. To show errors in colab notebook, set debug=True in launch()
* Running on public URL: https://5ccc952365652dd8be.gradio.live

This share link expires in 72 hours. For free permanent hosting and GPU upgrades, run `gradio deploy` from the terminal in the working directory to deploy to Hugging Face Spaces (https://huggingface.co/spaces)




#### Chat Logs

In [None]:
print("🧾 Chat Log\n" + "=" * 40)
for i, (user_msg, bot_msg) in enumerate(chat_log, start=1):
    print(f"\n🧑‍💼 User {i}: {user_msg}")
    print(f"  🤖 Bot  {i}: {bot_msg}")
    # print("-" * 40)

🧾 Chat Log

🧑‍💼 User 1: hi can you help me optimize by small business?
  🤖 Bot  1: Improving cash flow

🧑‍💼 User 2: how do i improve my cash flow?
  🤖 Bot  2: Track expenses weekly, forecast sales 30 days out, and reduce inventory waste. These steps can increase your available cash.

🧑‍💼 User 3: What about forecasting? an i iprve my cash flow with more accurate  cash flow forecasting?
  🤖 Bot  3: Improving cash flow

🧑‍💼 User 4: Can you teach me about econimic indicators?
  🤖 Bot  4: Economic indicators like inflation, unemployment, and sales data help you anticipate changes in customer spending and business risks.


### Model 2 | Try a Bigger Instruction-Tuned Model



### 🔍 What We're Seeing in the Logs:

| Prompt | Bot Response | Feedback |
|-------|---------------|---------|
| "hi can you help me optimize by small business?" | "Improving cash flow" | 🟥 **Too short and generic** — looks like it's picking a single keyword. |
| "how do i improve my cash flow?" | ✅ Full, helpful response | ✅ Great! The model picked up on your in-context example. |
| "What about forecasting? Can I improve..." | "Improving cash flow" | 🟥 Weak again — it dropped context and defaulted to short text. |
| "Can you teach me about economic indicators?" | ✅ Great explanation | ✅ Another success — matches few-shot example. |

---

### 🧠 Likely Cause: Model Limitation

You're using **`flan-t5-base`** — a fantastic small model for quick prototyping, but:
- It's **small (~250M parameters)** and may struggle with nuance or retaining context across varying phrasing.
- It does well with **questions that match your examples**, but falters when you phrase things differently.

---

### ✅ Recommendation: Try a Bigger Instruction-Tuned Model

To really test your prompt design and get better completions, try one of these:

| Model | Size | Why Use It |
|-------|------|------------|
| `google/flan-t5-large` | ~780M | Same family, better quality, works in Colab with GPU |
| `tiiuae/falcon-7b-instruct` | 7B | Very capable, good instruction tuning, larger GPU required |
| `mistralai/Mistral-7B-Instruct-v0.2` | 7B | Excellent overall performance, Hugging Face compatible |

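Falcon and Mistral are decoder-only models, so they need `pipeline("text-generation", ...)` rather than `"text2text-generation"`. The simplest step up is to stay in the FLAN-T5 family: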

```python
chatbot = pipeline("text2text-generation", model="google/flan-t5-large")
```

---

### ⚠️ Tip: Bigger Models Need More Time and GPU

Try `flan-t5-large` first — it's a safe step up and works well in Colab.



In [None]:
# Load the model
chatbot = pipeline("text2text-generation", model="google/flan-t5-large")

Device set to use cpu


In [None]:
# Store chat history for debugging
chat_log = []

# Define a smarter response function
def respond(message):
    prompt = f"""
You are a helpful assistant for small business owners.

Your job is to answer questions related to:
- Business forecasting
- Economic indicators (like inflation, unemployment, sales data)
- Improving cash flow
- Making data-driven decisions

Always explain in plain English and give clear, helpful advice.

Examples:
Q: What are economic indicators?
A: Economic indicators like inflation, consumer confidence, and employment rates help you anticipate changes in customer spending and business risks.

Q: How can I improve cash flow?
A: Track expenses weekly, forecast sales 30 days out, and reduce inventory waste. These steps can increase your available cash.

Now answer this question:

Q: {message}
A:"""

    response = chatbot(prompt, max_new_tokens=200)[0]["generated_text"].strip()

    # Log this interaction
    chat_log.append((message, response))

    # Print entire chat history (for debugging in notebook)
    print("\n🧾 Chat Log:")
    for i, (q, a) in enumerate(chat_log, 1):
        print(f"{i}. Q: {q}\n   A: {a}\n{'-'*50}")

    return response

# Create the Gradio interface
demo = gr.Interface(
    fn=respond,
    inputs=gr.Textbox(lines=2, placeholder="Ask about forecasting, economics, or small business strategy..."),
    outputs="text",
    title="📊 Small Business Forecasting Assistant"
)

# Launch the chatbot
demo.launch(share=True)


Colab notebook detected. To show errors in colab notebook, set debug=True in launch()
* Running on public URL: https://f1f525f5ce25e4c121.gradio.live

This share link expires in 72 hours. For free permanent hosting and GPU upgrades, run `gradio deploy` from the terminal in the working directory to deploy to Hugging Face Spaces (https://huggingface.co/spaces)




#### Chat Logs

In [None]:
print("🧾 Chat Log\n" + "=" * 40)
for i, (user_msg, bot_msg) in enumerate(chat_log, start=1):
    print(f"\n🧑‍💼 User {i}: {user_msg}")
    print(f"  🤖 Bot  {i}: {bot_msg}")
    # print("-" * 40)

🧾 Chat Log

🧑‍💼 User 1: hi can you help me optimize by small business?
  🤖 Bot  1: Can you help me optimize by small business?

🧑‍💼 User 2: how do i improve my cash flow?
  🤖 Bot  2: Track expenses weekly, forecast sales 30 days out, and reduce inventory waste. These steps can increase your available cash.

🧑‍💼 User 3: What about forecasting? an i iprve my cash flow with more accurate  cash flow forecasting?
  🤖 Bot  3: Track expenses weekly, forecast sales 30 days out, and reduce inventory waste. These steps can increase your available cash.

🧑‍💼 User 4: Can you teach me about econimic indicators?
  🤖 Bot  4: Economic indicators like inflation, consumer confidence, and employment rates help you anticipate changes in customer spending and business risks.


### Model 2 | Prompt 2: Extensive Prompt


### 🧠 Key Takeaway for Learning

The most important things to understand here if you're learning how chatbots work:

| 🔍 Concept | 💡 What to Learn |
|-----------|------------------|
| **System Prompt** | Shapes the assistant’s personality, domain, and rules. This is the *heart* of in-context learning. |
| **Prompt Formatting** | Using `Q:` and `A:` makes it easier for the model to follow expected patterns. |
| **Inference Parameters** | `max_new_tokens`, `temperature`, `top_k`, etc., control output length and creativity. `temperature` and `top_k` only take effect when sampling is enabled (`do_sample=True`). |
| **Pipeline Type** | You're using `text2text-generation`, which works well for task-based, formatted outputs. |
| **User Interface** | Gradio is your frontend — it makes testing and deploying fast, easy, and interactive. |
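
To see what `temperature` actually does, here is a small sketch of temperature scaling: the model's raw scores (logits) are divided by the temperature before being turned into probabilities. The three logits are made-up numbers, not output from FLAN-T5:

```python
import math


def softmax_with_temperature(logits: list[float], temperature: float) -> list[float]:
    """Divide logits by temperature, then normalize into probabilities."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


logits = [2.0, 1.0, 0.1]  # made-up scores for three candidate next tokens
for t in (0.5, 1.0, 2.0):
    probs = softmax_with_temperature(logits, t)
    print(t, [round(p, 3) for p in probs])
```

Lower temperatures concentrate probability on the top token; higher ones flatten the distribution so less likely tokens get sampled more often. With `transformers`, this only matters when `do_sample=True`; under the default greedy decoding the highest-scoring token is always chosen.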


In [None]:
# Store chat history for debugging
chat_log = []

# Define a smarter response function with a detailed system prompt
SYSTEM_PROMPT = """You are a helpful, knowledgeable assistant for small business owners.

Your role is to answer questions related to:
- Business forecasting
- Economic indicators (local, state, national)
- Improving cash flow
- Reducing uncertainty in business decisions
- Data-driven strategy for small businesses

You are trained using content from the Cashflow 4Cast blog, which covers:
- Forecasting accuracy
- Interpreting inflation, unemployment, and sales data
- How local and national economics affect small businesses
- How to reduce forecasting errors

You specialize in:
- Explaining economic indicators like CPI, consumer confidence, and retail sales
- Helping owners interpret forecasting accuracy
- Recommending how to adjust based on business conditions
- Using real-world examples and simple language

📌 Always keep your answers focused on business forecasting, cash flow, or economic strategy.
📌 If a question is outside your domain (e.g. health, politics, personal advice), respond with:

"I'm here to help with forecasting, business planning, and data insights. Let me know how I can assist in that area!"

Answer clearly and concisely in plain English, avoid overly technical jargon, and always aim to educate and guide with a friendly tone.

---

Here are examples of how you answer:

Q: What economic indicators matter most for small business?
A: Key indicators include inflation (CPI), consumer confidence, retail sales, and local employment trends. These help you anticipate changes in customer spending and adjust your cash flow planning accordingly.

Q: How accurate are forecasting models like Prophet or Excel?
A: Traditional tools like Excel rely heavily on past averages. In contrast, machine learning models trained on local data can reduce forecasting errors by 50% or more, especially when accounting for current economic conditions.

Q: What should I do if my forecast is off?
A: Look for patterns — was there a spike in costs? Did a key economic indicator change? Adjust your assumptions and use shorter forecast windows (e.g. 30-day) for better agility.

Q: Can you help me understand my recent drop in sales?
A: Yes! Let’s explore recent local economic shifts — like employment, inflation, or consumer sentiment — and how they may be affecting your customers.

---
Always stay on-topic, helpful, and calm."""

def respond(message):
    # Combine system prompt + user message
    prompt = f"{SYSTEM_PROMPT}\n\nQ: {message}\nA:"

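    # do_sample=True is needed for temperature to take effect; without it, decoding is greedy
    response = chatbot(prompt, max_new_tokens=300, do_sample=True, temperature=0.7)[0]["generated_text"].strip()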
    chat_log.append((message, response))

    # Print the chat log
    print("\n" + "="*60 + "\n🧾 Chat Log\n" + "="*60)
    for i, (q, a) in enumerate(chat_log, 1):
        print(f"\n{i}. Q: {q}\n   A: {a}\n{'-'*50}")

    return response

# Create the Gradio interface
demo = gr.Interface(
    fn=respond,
    inputs=gr.Textbox(lines=2, placeholder="Ask about forecasting, economics, or small business strategy..."),
    outputs=gr.Textbox(lines=6, label="Assistant Response"),
    title="📊 Small Business Forecasting Assistant"
)

# Launch the chatbot
demo.launch(share=True)

Colab notebook detected. To show errors in colab notebook, set debug=True in launch()
* Running on public URL: https://77ec0a54daab550483.gradio.live

This share link expires in 72 hours. For free permanent hosting and GPU upgrades, run `gradio deploy` from the terminal in the working directory to deploy to Hugging Face Spaces (https://huggingface.co/spaces)




In [None]:
print("🧾 Chat Log\n" + "=" * 40)
for i, (user_msg, bot_msg) in enumerate(chat_log, start=1):
    print(f"\n🧑‍💼 User {i}: {user_msg}")
    print(f"  🤖 Bot  {i}: {bot_msg}")
    # print("-" * 40)

🧾 Chat Log

🧑‍💼 User 1: hi can you help me optimize by small business?
  🤖 Bot  1: I'm here to help with forecasting, business planning, and data insights. Let me know how I can assist in that area! ---

🧑‍💼 User 2:  how do i improve my cash flow?

  🤖 Bot  2: i'm here to help with forecasting, business planning, and data insights. Let me know how I can assist in that area.


## ✅ Step 2: Add In-Context Few-Shot Examples


You're performing what's called **prompt engineering** or **in-context learning**, which is a powerful way to *steer the model* without needing to retrain it.

---

### 🧠 What’s Happening Behind the Scenes

- When you send a message like  
  > “What is CPI and why does it matter?”  
  you're not just sending that alone.

- You're sending the entire crafted prompt:

```plaintext
You are a helpful, friendly assistant who helps small business owners...
[+ examples of great Q&A]
Now answer this new question:
Q: What is CPI and why does it matter?
A:
```

- The model uses the **patterns** in your examples to generate a consistent, on-brand, and topic-aware response.
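
A minimal sketch of how that crafted prompt can be assembled from a list of example pairs, so examples can be swapped or A/B tested without editing the response function. `build_few_shot_prompt` is a hypothetical helper; the example answers are shortened from the ones used in this notebook:

```python
SYSTEM = "You are a helpful, friendly assistant who helps small business owners."

# (question, answer) pairs used as in-context examples
FEW_SHOT = [
    ("What are economic indicators?",
     "Economic indicators like inflation and employment rates help you "
     "anticipate changes in customer spending."),
    ("How can I improve cash flow?",
     "Track expenses weekly, forecast sales 30 days out, and reduce inventory waste."),
]


def build_few_shot_prompt(
    message: str,
    examples: list[tuple[str, str]] = FEW_SHOT,
    system: str = SYSTEM,
) -> str:
    shots = "\n\n".join(f"Q: {q}\nA: {a}" for q, a in examples)
    return f"{system}\n\n{shots}\n\nNow answer this new question:\nQ: {message}\nA:"


print(build_few_shot_prompt("What is CPI and why does it matter?"))
```

Inside `respond`, the call would then be something like `chatbot(build_few_shot_prompt(message), max_new_tokens=200)`.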

---

### 🔍 Why This Works So Well

✅ No fine-tuning required  
✅ Super flexible — just update your examples  
✅ You can A/B test prompts easily  
✅ It's cheaper and faster than model training  
✅ Perfect for early-stage prototypes or small business chatbots

---

If you're up for it, we can also:
- Add **topic-aware branching** (e.g. different examples depending on whether it's about inflation or sales)
- Store previous prompts to experiment with **prompt refinement**
- Add a slider for *response length or helpfulness*
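
Topic-aware branching can be sketched by picking a different example set based on which topic's keywords appear in the question. The topics, keywords, and example answers below are hypothetical placeholders, and keyword overlap is a deliberately simple stand-in for smarter routing:

```python
# Hypothetical topic -> examples mapping (answers are illustrative)
TOPIC_EXAMPLES = {
    "inflation": [
        ("What is CPI and why does it matter for my store?",
         "CPI tracks how fast everyday prices are rising. Rising CPI can mean "
         "customers cut back, so plan for slower sales."),
    ],
    "sales": [
        ("How can I tell if my forecasts are accurate?",
         "Compare forecast sales to actual sales each period and track the "
         "percentage error over time."),
    ],
}
TOPIC_KEYWORDS = {
    "inflation": {"inflation", "cpi", "prices"},
    "sales": {"sales", "forecast", "forecasts", "revenue"},
}


def pick_examples(message: str, default_topic: str = "sales") -> list[tuple[str, str]]:
    words = {w.strip(".,?!").lower() for w in message.split()}
    # Choose the topic whose keywords overlap most with the message
    best = max(TOPIC_KEYWORDS, key=lambda t: len(words & TOPIC_KEYWORDS[t]))
    if not words & TOPIC_KEYWORDS[best]:
        best = default_topic  # no keyword matched at all
    return TOPIC_EXAMPLES[best]


print(pick_examples("Is CPI going up?")[0][0])
```

Only the selected examples go into the prompt, which keeps it short; this is a small step toward what RAG does automatically.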

There’s a **spectrum** between **in-context learning** and **RAG**: if you keep feeding the model more and more context manually, you’re slowly building a *manual RAG system* without calling it that.

---

## 🧠 In-Context Learning vs RAG

| Feature | **In-Context Learning** | **RAG (Retrieval-Augmented Generation)** |
|--------|-------------------------|-----------------------------------------|
| 🔧 Setup | Handcrafted prompt with examples | Dynamic retrieval of relevant info |
| 📦 Data | Static and hardcoded | Stored in an index (like FAISS) |
| 📈 Scaling | Manual — hits token limit fast | Automatic — fetches only what’s needed |
| 🔁 Adaptability | Rigid, needs manual updates | Flexible, can grow as your docs grow |
| 💡 Use Case | Small fixed topics | Large, evolving knowledge bases |

---

## ✅ So What Are the Limits of In-Context Learning?

1. **Token Limits**  
   Small models have tight input limits: the `flan-t5-base` tokenizer, for example, is configured for a maximum of 512 tokens. You’ll run out of room fast if you add too many examples.

2. **Relevance Weakens**  
   The more you pack in, the harder it is for the model to focus. Unlike RAG, it won’t *retrieve* the best examples — it just reads all of them equally.

3. **Hard to Update**  
   If your blog changes or you want to update one fact, you have to manually rewrite your prompt examples.

---

## 🎯 Where RAG Begins to Shine

RAG steps in **exactly when**:
- You have too many examples to fit in context
- Your content changes often
- You want to scale without prompt clutter

So yes — in-context learning is great for:
> “Small, focused, handcrafted expertise.”

RAG is great for:
> “Scalable, flexible knowledge bases.”



In [None]:
EXAMPLES = """
Q: What is CPI and why does it matter for my store?
A: CPI stands for Consumer Price Index. It tracks how much prices are rising for everyday goods. If CPI is going up, it means inflation is rising — and your customers may start cutting back on spending. Watching CPI helps you plan for slower sales and adjust your pricing.

Q: How can I tell if my forecasts are accurate?
A: One good way is to look at your forecasting error. If your forecast says you’ll sell $10,000 but you only sell $8,000, you were off by $2,000, which is 20% of your forecast. A well-tuned model should get you under 10% error most of the time.

Q: What economic indicators are most useful?
A: Start with local unemployment, consumer spending, and inflation. These show how confident your customers are and how much they can afford. If those numbers shift, your sales probably will too.
"""


## Model 2 | Prompt 3

In [None]:
import gradio as gr

# Assumes `chatbot` (e.g. a Hugging Face `pipeline`) and `chat_log` (a list of
# (question, answer) tuples) were created in earlier cells.

def respond(message):
    # Few-shot prompt: three example Q&A pairs, then the user's question
    prompt = f"""Q: What are economic indicators?
A: Economic indicators like inflation, consumer confidence, and employment rates help you anticipate changes in customer spending.

Q: How can I improve cash flow?
A: Track expenses weekly, forecast sales 30 days out, and reduce inventory waste.

Q: How do forecasting models work?
A: Traditional tools like Excel rely on past data averages. More advanced models use machine learning and local trends to improve forecast accuracy.

Q: {message}
A:"""

    # `temperature` only takes effect when sampling is enabled (do_sample=True);
    # when sampling is off (the usual default), it is ignored.
    response = chatbot(prompt, max_new_tokens=250, do_sample=True, temperature=0.7)[0]["generated_text"].strip()
    chat_log.append((message, response))

    # Print the full conversation so far after every turn
    print("\n" + "="*60 + "\n🧾 Chat Log\n" + "="*60)
    for i, (q, a) in enumerate(chat_log, 1):
        print(f"\n{i}. Q: {q}\n   A: {a}\n{'-'*50}")

    return response

# Create the Gradio interface
demo = gr.Interface(
    fn=respond,
    inputs=gr.Textbox(lines=2, placeholder="Ask about forecasting, economics, or small business strategy..."),
    outputs=gr.Textbox(lines=6, label="Assistant Response"),
    title="📊 Small Business Forecasting Assistant"
)

# Launch the chatbot
demo.launch(share=True)

Colab notebook detected. To show errors in colab notebook, set debug=True in launch()
* Running on public URL: https://4427f9d365f12538ee.gradio.live

This share link expires in 72 hours. For free permanent hosting and GPU upgrades, run `gradio deploy` from the terminal in the working directory to deploy to Hugging Face Spaces (https://huggingface.co/spaces)
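`respond` hardcodes its own three examples, so the `EXAMPLES` string defined earlier isn't used by this prompt. If you want to swap example sets between prompt versions, a small helper that appends the user's question to any block of examples keeps the `Q:`/`A:` format in one place. `make_prompt` is a hypothetical helper, and `SAMPLE_EXAMPLES` is a short stand-in (named so it won't overwrite the notebook's `EXAMPLES`).

```python
def make_prompt(examples: str, message: str) -> str:
    """Append the user's question to a block of Q:/A: examples, ending with 'A:' for the model to complete."""
    return f"{examples.strip()}\n\nQ: {message.strip()}\nA:"


SAMPLE_EXAMPLES = """
Q: How can I improve cash flow?
A: Track expenses weekly, forecast sales 30 days out, and reduce inventory waste.
"""

print(make_prompt(SAMPLE_EXAMPLES, "What is CPI?"))
```

Inside `respond`, the prompt line would become `prompt = make_prompt(EXAMPLES, message)`, leaving the rest of the function unchanged. Stripping the message also tidies inputs like `" how do i improve my cash flow?"` that arrive with stray whitespace.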




In [None]:
print("🧾 Chat Log\n" + "=" * 40)
for i, (user_msg, bot_msg) in enumerate(chat_log, start=1):
    print(f"\n🧑‍💼 User {i}: {user_msg}")
    print(f"  🤖 Bot  {i}: {bot_msg}")
    # print("-" * 40)

🧾 Chat Log

🧑‍💼 User 1: hi can you help me optimize by small business?
  🤖 Bot  1: I'm here to help with forecasting, business planning, and data insights. Let me know how I can assist in that area! ---

🧑‍💼 User 2:  how do i improve my cash flow?

  🤖 Bot  2: i'm here to help with forecasting, business planning, and data insights. Let me know how I can assist in that area.

🧑‍💼 User 3: hi can you help me optimize by small business?
  🤖 Bot  3: Small business is a category of business.

🧑‍💼 User 4: how do i improve my cash flow?
  🤖 Bot  4: Track expenses weekly, forecast sales 30 days out, and reduce inventory waste.

🧑‍💼 User 5: What about forecasting? an i iprve my cash flow with more accurate  cash flow forecasting?
  🤖 Bot  5: Traditional tools like Excel rely on past data averages. More advanced models use machine learning and local trends to improve forecast accuracy.

🧑‍💼 User 6: Can you teach me about econimic indicators?
  🤖 Bot  6: Economic indicators like inflation, consume
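Entries 1 and 3 ask the same question but get different answers, presumably because the prompt changed between runs, yet the log doesn't record which prompt produced which answer. For the prompt-refinement idea mentioned earlier, one option is to tag each exchange with a prompt label and append it to a JSON Lines file. A minimal sketch; `save_chat_log`, `load_chat_log`, and the `chat_log.jsonl` filename are hypothetical.

```python
import json
from pathlib import Path


def save_chat_log(chat_log, prompt_version, path="chat_log.jsonl"):
    """Append each (question, answer) pair as one JSON object per line, tagged with the prompt label."""
    with Path(path).open("a", encoding="utf-8") as f:
        for question, answer in chat_log:
            record = {"prompt_version": prompt_version, "question": question, "answer": answer}
            f.write(json.dumps(record, ensure_ascii=False) + "\n")


def load_chat_log(path="chat_log.jsonl"):
    """Read the records back as a list of dicts."""
    with Path(path).open(encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


# Example: save_chat_log(chat_log, "Model 2 | Prompt 3")
```

Because the file is opened in append mode, saving the same `chat_log` twice duplicates its entries; save once per prompt version, or clear `chat_log` after saving.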

# Memory Clean Up & Remove Widgets from Notebook to Save to GitHub

In [1]:
import json
from google.colab import drive
drive.mount('/content/drive', force_remount=True)

notebook_path = "/content/drive/My Drive/LLM/LLM_054_RAG_CahsFlow4Cast_Chatbot.ipynb"

# Load the notebook JSON
with open(notebook_path, 'r', encoding='utf-8') as f:
    nb = json.load(f)

# 1. Remove widgets from notebook-level metadata
if "widgets" in nb.get("metadata", {}):
    del nb["metadata"]["widgets"]
    print("✅ Removed notebook-level 'widgets' metadata.")

# 2. Remove widgets from each cell's metadata
for i, cell in enumerate(nb.get("cells", [])):
    if "metadata" in cell and "widgets" in cell["metadata"]:
        del cell["metadata"]["widgets"]
        print(f"✅ Removed 'widgets' from cell {i}")

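# Save the cleaned notebook (indent=1 matches Jupyter's usual on-disk format;
# ensure_ascii=False keeps emoji and other non-ASCII text readable)
with open(notebook_path, 'w', encoding='utf-8') as f:
    json.dump(nb, f, indent=1, ensure_ascii=False)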

print("✅ Notebook deeply cleaned. Try uploading to GitHub again.")


Mounted at /content/drive
✅ Notebook deeply cleaned. Try uploading to GitHub again.
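The output above has no "Removed" lines, which means this run found no `widgets` keys in either place it checked. If you want to test the cleanup logic without touching Drive, the same steps can live in a pure function that works on the dict `json.load` returns and reports how much it removed. A sketch; `strip_widget_metadata` is a hypothetical name, and it only looks at the same two places as the cell above (notebook-level and cell-level `metadata`).

```python
def strip_widget_metadata(nb: dict) -> int:
    """Delete 'widgets' from notebook-level and cell-level metadata.

    Mutates `nb` in place and returns how many entries were removed.
    """
    metadata_dicts = [nb.get("metadata", {})]
    metadata_dicts += [cell.get("metadata", {}) for cell in nb.get("cells", [])]
    removed = 0
    for meta in metadata_dicts:
        if "widgets" in meta:
            del meta["widgets"]
            removed += 1
    return removed


notebook = {
    "metadata": {"widgets": {}, "kernelspec": {"name": "python3"}},
    "cells": [{"metadata": {"widgets": {}}}, {"metadata": {}}],
}
print(strip_widget_metadata(notebook))  # 2
print(notebook["metadata"])             # {'kernelspec': {'name': 'python3'}}
```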
