<a href="https://colab.research.google.com/github/micah-shull/RAG-LangChain/blob/main/LC_010_RAG_CustServiceBot_Model_DocOptimzation.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

## Pip Install


In [20]:
!pip install --upgrade --quiet \
    langchain \
    langchain-huggingface \
    langchain-openai \
    langchain-community \
    chromadb \
    python-dotenv \
    transformers \
    accelerate \
    sentencepiece

## Load Libs

In [21]:
# 🌿 Environment setup
import os                                 # File paths and OS interaction
from dotenv import load_dotenv            # Load environment variables from .env file
import langchain; print(langchain.__version__)  # Check LangChain version

# 📄 Document loading and preprocessing
from langchain_core.documents import Document                   # Base document type
from langchain_community.document_loaders import TextLoader     # Loads plain text files
from langchain.text_splitter import RecursiveCharacterTextSplitter  # Splits long docs into smaller chunks

# 🔢 Embeddings + vector storage
from langchain_huggingface import HuggingFaceEmbeddings         # HuggingFace embedding model
from langchain_community.vectorstores import Chroma             # Persistent vector DB (Chroma)

# 💬 Prompting + output
from langchain_core.prompts import ChatPromptTemplate           # Chat-style prompt templates
from langchain_core.output_parsers import StrOutputParser       # Converts model output to string

# 🔗 Chains / pipelines
from langchain_core.runnables import Runnable, RunnableLambda   # Compose custom pipelines

# 🧠 (Optional) Hugging Face LLM client setup
# from langchain_huggingface import HuggingFaceEndpoint, ChatHuggingFace  # For HF inference API

# 🧾 Pretty printing
import textwrap                         # Format long strings for printing
from pprint import pprint               # Nicely format nested data structures

0.3.25




## 🤖 LLM Selection for RAG Customer Service Chatbot

### 🎯 Use Case

Cashflow4cast’s chatbot provides customer support by answering questions about:

* Business purpose and services
* Pricing and features
* How the platform works
* Differentiators and benefits

The system must deliver concise, accurate, and professional responses using information retrieved from curated `.txt` documents.

---

### 🧠 Model Options

| Model                 | Type                 | Pros                                               | Cons                                          | Use Case Fit               |
| --------------------- | -------------------- | -------------------------------------------------- | --------------------------------------------- | -------------------------- |
| **`gpt-3.5-turbo`**   | GPT                  | Fast, cost-efficient, strong instruction following | May occasionally miss nuance in complex cases | ✅ Recommended              |
| **`gpt-4-turbo`**     | GPT                  | Higher accuracy, better consistency with retrieval | Higher cost and slightly slower               | ✅ Ideal for production     |
| `gpt-4o`              | Multimodal Reasoning | Advanced reasoning, handles tools and files        | Overkill for simple RAG QA; adds latency      | ❌ Not needed for now       |
| Claude, Mistral, etc. | Varies               | Alternative providers; some (e.g., Mistral) offer open-weight models | Less aligned with ChatGPT prompt tuning       | Optional exploration later |

---

### ✅ Recommendation

Use the **GPT family** of models—specifically:

* **Dev/Test**: `gpt-3.5-turbo` (cheaper, fast, sufficient for most queries)
* **Prod or Premium**: `gpt-4-turbo` (more reliable for edge cases and nuanced phrasing)
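
One way to keep the dev/prod split out of the notebook body is a small lookup keyed on an environment name. This is a minimal sketch, not part of the original notebook: the `APP_ENV` variable and the `pick_model_name` helper are hypothetical names, and only the two model names come from the recommendation above.

```python
import os

# Model names come from the recommendation above.
# APP_ENV and pick_model_name are hypothetical, for illustration only.
MODEL_BY_ENV = {
    "dev": "gpt-3.5-turbo",
    "test": "gpt-3.5-turbo",
    "prod": "gpt-4-turbo",
}

def pick_model_name(env=None):
    """Return the model name for an environment, falling back to the dev model."""
    env = (env or os.getenv("APP_ENV", "dev")).lower()
    return MODEL_BY_ENV.get(env, MODEL_BY_ENV["dev"])

print(pick_model_name("prod"))
```

The returned string could then be passed to `ChatOpenAI` in place of a hard-coded model name.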




## Set Params


## 📏 Chunk Size & Overlap

### ✅ Why it Works:

* Your docs are **short and structured**, thanks to the optimization work — most sections now fall well within the 200-token range.
* Smaller chunks make the **retriever more precise** and reduce context mixing.
* Overlap of 50 ensures **section continuity** without overloading the vector DB.

### ⚠️ Minor Note:

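* You might want to **verify that chunks aren't too small** (i.e., below \~100 tokens), which could dilute embedding quality. Keep in mind that `RecursiveCharacterTextSplitter` measures `chunk_size` in characters by default, not tokens, so compare like with like. If you notice retrieval noise later, consider bumping `CHUNK_SIZE` to 250 or 300.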
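
To see how chunk size and overlap interact, here is a simplified stand-in for a text splitter. It is only a sketch: it cuts fixed-size character windows, whereas `RecursiveCharacterTextSplitter` prefers to break on separators such as blank lines. The `chunk_text` helper is a hypothetical name, and the defaults mirror the `CHUNK_SIZE` and `CHUNK_OVERLAP` values set in this notebook.

```python
def chunk_text(text, chunk_size=250, chunk_overlap=50):
    """Split text into fixed-size character windows that overlap."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far each window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reached the end of the text
    return chunks

sample = "x" * 600
print([len(c) for c in chunk_text(sample)])
```

Each window starts `chunk_size - chunk_overlap` characters after the previous one, so the tail of one chunk is repeated at the head of the next.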

---

## 🔍 K = 2 for Retriever

### ✅ Why it Works:

* Small `k` = **tight, focused context** for the LLM.
* Reduces the risk of hallucinations or conflicting information from adjacent (but irrelevant) chunks.
* Your docs are **distinct and purpose-driven**, so fewer chunks = cleaner answers.

### 💡 Optional Enhancement Later:

* Use `k=2` for general Q\&A, and consider `k=3` or `k=4` only if:

  * You add longer-form docs (e.g., whitepapers, multi-topic blogs)
  * You use rerankers or hybrid search models that can filter relevance more finely
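
The effect of `k` is easier to picture with a toy retriever. The sketch below ranks chunks by cosine similarity and keeps the top `k`. It assumes hand-made three-dimensional vectors rather than real embeddings, and cosine similarity is just one common metric; the one actually used depends on how the vector store is configured. All names here are hypothetical.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def top_k(query_vec, chunk_vecs, k=2):
    """Return the ids of the k chunks most similar to the query."""
    ranked = sorted(
        chunk_vecs,
        key=lambda cid: cosine_similarity(query_vec, chunk_vecs[cid]),
        reverse=True,
    )
    return ranked[:k]

# Toy vectors standing in for embeddings (made up for illustration).
toy_chunks = {
    "pricing": [0.9, 0.1, 0.0],
    "how_it_works": [0.1, 0.9, 0.1],
    "indicators": [0.0, 0.2, 0.9],
}
query = [0.8, 0.3, 0.0]
print(top_k(query, toy_chunks, k=2))
```

With `k=2` the weakest match never reaches the prompt; raising `k` admits it.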

---

## 🤖 Embedding Model

### `all-MiniLM-L6-v2` is:

* Fast and light
* Good for short texts and precise queries
* Well suited to your tight, domain-specific docs

Unless you notice low retrieval precision, there's **no urgent need to upgrade** to a larger embedding model like `bge-large` or `text-embedding-3-small`.

---

## ✅ LLM Setup

```python
LLM_MODEL = ChatOpenAI(
    model_name="gpt-3.5-turbo",
    temperature=0,
    max_tokens=500
)
```

Perfect for now:

* Temperature = 0 → more deterministic, repeatable answers (it does not by itself guarantee factual accuracy)
* Max tokens = 500 → good for clear Q\&A; increase only if your responses are getting cut off

---

## ✅ Final Verdict

No changes needed right now. Your config is well tuned for:

* Short, optimized documents
* Fast, cost-efficient querying
* Clean, trustworthy RAG responses




In [30]:
from langchain_openai import ChatOpenAI

# SET MODEL PARAMS
EMBED_MODEL = "all-MiniLM-L6-v2"
CHUNK_SIZE = 250
CHUNK_OVERLAP = 50
K = 2

# Load token from .env.
load_dotenv("/content/API_KEYS.env", override=True)

LLM_MODEL = ChatOpenAI(
    model_name="gpt-3.5-turbo",  # or "gpt-4-turbo" if budget allows
    temperature=0,  # low randomness for repeatable answers
    max_tokens=500  # adjust based on response length
)




## 📄 Document Optimization for RAG

### 🎯 Purpose

Before embedding documents into the RAG system, it is essential to **optimize their structure and content**. Once documents are vectorized, any change to the content means re-embedding and re-indexing them, so it pays to make the first embedding clean, concise, and structured for efficient retrieval.

---

### ✅ Optimization Goals

* **Improve retrieval precision** by focusing content on factual, answerable information.
* **Reduce token usage and latency** by eliminating fluff and redundant language.
* **Support semantic chunking** by grouping related content and maintaining context within each chunk.
* **Enhance clarity** for the LLM by using structured formatting.

---

### 🧰 Optimization Process

1. **Review existing documents** (repurposed blog articles in `.txt` format).
2. **Restructure content** using:

   * Clear section headers (e.g., `## How It Works`, `## Pricing`)
   * Bullet points for features, steps, and benefits
   * Direct, concise language
3. **Front-load key information** (e.g., pricing, product value, use cases).
4. **Remove irrelevant marketing language** or repetitive phrasing.
5. **Tag or label sections** if needed for traceability in future updates.
6. **Finalize and embed** the cleaned documents into the vector database.
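
A quick structural check can flag which documents still need the steps above. This is a rough sketch under simple assumptions: headers start with `#`, bullets start with `- ` or `* `, and a tab inside a line suggests a flattened table row. The `structure_report` helper and the 200-character threshold for long lines are hypothetical choices.

```python
def structure_report(doc):
    """Count simple structural signals in a plain-text or markdown document."""
    lines = [line for line in doc.splitlines() if line.strip()]
    return {
        "headers": sum(line.lstrip().startswith("#") for line in lines),
        "bullets": sum(line.lstrip().startswith(("- ", "* ")) for line in lines),
        "tab_rows": sum("\t" in line for line in lines),   # likely table rows
        "long_lines": sum(len(line) > 200 for line in lines),
    }

raw = "Pricing\nBusiness Type\tRevenue Low\nSmall Retail Shop\t$20,000\n"
clean = "## Pricing\n- Basic Plan: $199/mo\n- Advanced Plan: $499/mo\n"
print(structure_report(raw))
print(structure_report(clean))
```

A document with zero headers, zero bullets, and several tab rows is a good candidate for restructuring before it is embedded.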



### DOC 1: What If You Could Cut Cash Flow Forecasting Errors by 50%?

---

## ✅ Strengths of This Doc for RAG

* **Clear narrative** about the problem and your solution.
* **Data-rich** comparison between spreadsheet forecasting vs. ML model.
* **Concrete financial examples** that illustrate value.
* **Localized context** (Gainesville economic indicators).

---

## ⚠️ Limitations (for RAG Retrieval)

### 1. **Buried Key Info**

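The doc opens like a blog post, and the most useful factual info (the comparison table) is about halfway down. Similarity search ranks chunks by content rather than position, but the narrative opening produces chunks with **little answerable content** that still compete with the factual chunks for the top-`k` slots.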

### 2. **Table Format May Not Be Parsed Well**

Tables in plain `.txt` can be lossy:

* Most retrievers don’t treat them as structured data.
* LLMs can misunderstand them unless clearly labeled or rewritten as bullet points.

### 3. **No Headings/Anchors**

Without subheaders (`## Forecasting Accuracy`, `## ML vs. Spreadsheets`), chunks carry no label for the section they came from, and unrelated ideas are more likely to blend together in one chunk.

---

## ✨ Optimization Plan

Here’s how I recommend restructuring this for RAG use:

### 🔹 Add Section Headers

Use markdown-style headers:

```markdown
## Overview
## Why Forecasting Accuracy Matters
## Traditional Forecasting (Excel/Prophet)
## Cashflow4cast Model Comparison
## Real Dollar Impact by Business Type
## Summary Benefits
```
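
Once headers like these exist, sections can be separated mechanically. The sketch below groups lines under each `## ` header. It is a simplified illustration with a hypothetical `split_by_headers` helper; LangChain also ships a `MarkdownHeaderTextSplitter` for the same purpose, which this notebook does not use.

```python
def split_by_headers(doc, prefix="## "):
    """Group lines into sections, starting a new section at each header."""
    sections = []
    current = {"title": "(preamble)", "lines": []}
    for line in doc.splitlines():
        if line.startswith(prefix):
            sections.append(current)
            current = {"title": line[len(prefix):].strip(), "lines": []}
        else:
            current["lines"].append(line)
    sections.append(current)

    result = []
    for section in sections:
        text = "\n".join(section["lines"]).strip()
        if section["title"] == "(preamble)" and not text:
            continue  # skip an empty preamble before the first header
        result.append({"title": section["title"], "text": text})
    return result

doc = "## Overview\nWe forecast cash flow.\n## Pricing\n- Basic: $199/mo\n"
for section in split_by_headers(doc):
    print(section["title"], "->", section["text"])
```

Keeping the title alongside the text makes it possible to store the section name as chunk metadata later.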

### 🔹 Reformat the Tables into Bullet Points

Turn each row into its own readable list, e.g.:

```markdown
### Traditional Forecasting (Excel or Prophet)

**Small Retail Shop**
- Revenue range: $20,000 – $50,000
- Forecast error: 14%
- Dollar error range: $2,740 – $6,850

**Restaurant or Café**
- Revenue range: $30,000 – $75,000
- Forecast error: 14%
- Dollar error range: $4,110 – $10,275
```

Repeat for the ML model section.
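
The row-to-bullets conversion can be scripted rather than done by hand. This sketch assumes each row is tab-separated with the six columns from the original table, in order: business type, revenue low, revenue high, error percent, error low, error high. The `row_to_bullets` helper is a hypothetical name.

```python
def row_to_bullets(row):
    """Turn one tab-separated table row into a labeled bullet list."""
    name, rev_low, rev_high, err_pct, err_low, err_high = row.split("\t")
    return "\n".join([
        f"**{name}**",
        f"- Revenue range: {rev_low} – {rev_high}",
        f"- Forecast error: {err_pct}",
        f"- Dollar error range: {err_low} – {err_high}",
    ])

row = "Small Retail Shop\t$20,000\t$50,000\t14%\t$2,740\t$6,850"
print(row_to_bullets(row))
```

A row with a different number of columns raises a `ValueError` on unpacking, which is a useful signal that the table layout changed.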

### 🔹 Bring Key Value Proposition to the Top

Consider starting with a “TL;DR” style summary at the very top of the doc like:

```markdown
### 🚀 Summary
- Cut forecasting errors by **50%** using our ML model
- Reduce cash flow risk with **earlier warnings**
- Built for **Gainesville businesses** with local economic data
- Better than spreadsheets, across all major business types
```



### DOC 1 - What If You Could Cut Cash Flow Forecasting Errors by 50%?

In [2]:
original_doc_1 = '''
What If You Could Cut Cash Flow Forecasting Errors by 50%?
Every business lives or dies by its ability to manage cash flow.

Whether it’s covering payroll, restocking inventory, or preparing for a seasonal dip — having reliable numbers makes all the difference. And yet, most small business owners are flying blind with clunky spreadsheets or outdated tools that leave them guessing.

That’s where CashFlow4Cast comes in.

Using advanced machine learning, we help you cut forecasting errors in half — giving you clearer insight, earlier warnings, and greater confidence in your day-to-day decisions.


📉 Forecasting in Uncertain Times
Lately, it’s been harder to plan with confidence. Consumer confidence is down, retail spending has softened, and Gainesville’s unemployment rate is ticking up. Even if your business is steady, uncertainty can sneak in — and shake up your revenue from month to month.

That’s why better forecasting matters. We built a smarter, locally trained machine learning model that helps Gainesville businesses reduce costly forecasting errors — often by 50% or more.

Here’s what that means in real dollars:

🧮 Spreadsheet-Based Forecasting (Excel or Prophet)
Business Type	Revenue Low	Revenue High	Error (%)	Error (Low)	Error (High)
Small Retail Shop	$20,000	$50,000	14%	$2,740	$6,850
Restaurant or Café	$30,000	$75,000	14%	$4,110	$10,275
Personal Services	$10,000	$25,000	14%	$1,370	$3,425
Local Professional Services	$15,000	$40,000	14%	$2,055	$5,480
🤖 Our Local Machine Learning Model
Business Type	Revenue Low	Revenue High	Error (%)	Error (Low)	Error (High)
Small Retail Shop	$20,000	$50,000	7%	$1,370	$3,425
Restaurant or Café	$30,000	$75,000	7%	$2,055	$5,138
Personal Services	$10,000	$25,000	7%	$685	$1,713
Local Professional Services	$15,000	$40,000	7%	$1,028	$2,740
Even small gains in forecasting accuracy can unlock real savings — every single month. Better ordering. Smarter staffing. More confident planning. And peace of mind.

If you're a mid-size business owner looking to get more accurate sales forecasts, streamline cash flow, and make better decisions with less stress — I’d love to hear from you.
'''


In [23]:
CFFC_Overview_ForecastingBenefits = '''

# 🚀 Summary

- Reduce cash flow forecasting errors by **up to 50%**
- Built using a **locally trained machine learning model**
- Built with national and local indicators to reflect your county’s unique economic conditions
- More accurate than spreadsheets across all business types
- Unlock savings, smarter planning, and peace of mind

---

## 💡 Overview

Cashflow4cast helps small and mid-sized businesses improve financial forecasting accuracy using machine learning and economic indicators.

Traditional forecasting tools like Excel or Prophet often result in large error margins. Our model significantly reduces these errors — helping business owners plan with greater confidence and clarity.

---

## 📉 Why Forecasting Accuracy Matters

- Planning is harder than ever due to shifting economic conditions:
  - Declining consumer confidence
  - Softening retail sales
  - Rising unemployment in Gainesville
- Even steady businesses face revenue uncertainty from month to month
- Better forecasting leads to:
  - Smarter inventory and staffing decisions
  - Fewer surprises
  - Better financial control

---

## 📊 Traditional Forecasting (Excel or Prophet)

**Forecast Error: ~14%**

**Small Retail Shop**
- Revenue range: $20,000 – $50,000
- Forecast error: 14%
- Dollar error range: $2,740 – $6,850

**Restaurant or Café**
- Revenue range: $30,000 – $75,000
- Forecast error: 14%
- Dollar error range: $4,110 – $10,275

**Personal Services**
- Revenue range: $10,000 – $25,000
- Forecast error: 14%
- Dollar error range: $1,370 – $3,425

**Local Professional Services**
- Revenue range: $15,000 – $40,000
- Forecast error: 14%
- Dollar error range: $2,055 – $5,480

---

## 🤖 Cashflow4cast ML Model

**Forecast Error: ~7%**

**Small Retail Shop**
- Revenue range: $20,000 – $50,000
- Forecast error: 7%
- Dollar error range: $1,370 – $3,425

**Restaurant or Café**
- Revenue range: $30,000 – $75,000
- Forecast error: 7%
- Dollar error range: $2,055 – $5,138

**Personal Services**
- Revenue range: $10,000 – $25,000
- Forecast error: 7%
- Dollar error range: $685 – $1,713

**Local Professional Services**
- Revenue range: $15,000 – $40,000
- Forecast error: 7%
- Dollar error range: $1,028 – $2,740

---

## 💰 Real-World Impact

Even small gains in forecasting accuracy lead to real savings:

- Better ordering
- Smarter staffing
- More confident planning
- Less stress, more peace of mind

---

## 📞 Let’s Talk

If you're a business owner looking to:
- Get more accurate sales forecasts
- Streamline cash flow
- Make better, faster decisions

**We’d love to hear from you.**
'''
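
As a sanity check on the "up to 50%" claim, the dollar error figures in the two sections above can be compared directly. This is a small sketch; the numbers are copied from the document, and the `relative_reduction` helper is a hypothetical name.

```python
# (spreadsheet error, ML error) in dollars, copied from the sections above:
# first pair is the low end of the range, second pair is the high end.
dollar_errors = {
    "Small Retail Shop": [(2740, 1370), (6850, 3425)],
    "Restaurant or Café": [(4110, 2055), (10275, 5138)],
    "Personal Services": [(1370, 685), (3425, 1713)],
    "Local Professional Services": [(2055, 1028), (5480, 2740)],
}

def relative_reduction(before, after):
    """Percentage by which the error shrank."""
    return (before - after) / before * 100

for business, pairs in dollar_errors.items():
    reductions = [round(relative_reduction(b, a), 1) for b, a in pairs]
    print(business, reductions)
```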

### DOC 2 - 📘 Federal Economic Indicators That Impact Local Businesses

In [3]:
original_doc_2 = '''
📘 Federal Economic Indicators That Impact Gainesville Businesses
1. Inflation (CPI)
Inflation Chart
What It Is:
The Consumer Price Index (CPI) tracks the average change in prices paid by consumers for everyday goods and services — things like groceries, gas, rent, and utilities. It's one of the most widely followed measures of inflation in the U.S.

Why It Matters for Gainesville:
When prices go up, Gainesville customers feel it — especially in a town with a large student population, retirees, and working families. Rising inflation affects what people can afford and how much they’re willing to spend. It also puts pressure on small business owners, who have to balance rising costs of supplies, shipping, and labor with what customers are willing to pay.

💬 Local Insight
Since October 4, 2024, the CPI has increased from 315.56 to 319.77, a 1.33% rise. That may not sound like much, but for many local businesses, that’s more than enough to:

Shrink profit margins
Drive up wholesale and supplier costs
Force hard decisions about pricing and payroll
Even if sales volumes stay steady, your bottom line may feel tighter than it did just a few months ago — and inflation is often the invisible reason why.

2. Consumer Confidence Index
Consumer Confidence Chart
What It Is:
This index measures how people feel about the economy — whether they think it’s a good time to buy, save, or invest. It doesn’t track what people are doing, but how optimistic (or pessimistic) they feel.

Why It Matters for Gainesville:
Even if people have money to spend, uncertainty can make them hesitate. A drop in confidence can lead to:

Fewer customers walking into stores
Smaller average purchases
Slower restaurant nights or service bookings
💬 Local Insight
Since December 1, 2024, the Consumer Confidence Index has dropped from 74.0 to 64.7, a 12.6% decline. That’s a noticeable shift in how people feel about the economy — and it’s the kind of thing that can ripple through a local business even if nothing internal has changed.

For Gainesville businesses, this can show up as:

Customers hesitating before making purchases
Smaller transactions or more price shopping
Uneven sales patterns from week to week
Even if your products or services are solid, consumer anxiety can quietly slow things down. This indicator helps explain why things might feel “off” — even if your store, team, or marketing hasn't changed.

3. Retail Sales (RSAFS)
Retail Sales Chart
What It Is:
This indicator tracks how much money is being spent across the country — in stores, restaurants, and online. It’s a monthly pulse check on consumer activity and buying habits.

Why It Matters for Gainesville:
Retail trends don’t stop at state lines. When national sales dip, Gainesville businesses often feel it too. It’s like a headwind — even if your store is doing everything right, consumer behavior shaped by national trends can start to drag down your numbers.

💬 Local Insight
Since December 1, 2024, national retail sales have slipped from $730,336 million to $722,708 million, a decline of about 1.04%.

That might not sound like much on the surface — but for many local businesses, even small slowdowns can throw off inventory planning, staffing, and cash flow. If your revenue feels like it’s just slightly off, or if customer volume feels unpredictable, this broader trend might be part of the story.

'''

In [24]:
CFFC_EconomicIndicators_Federal = '''

# 📘 Federal Economic Indicators That Impact Local Businesses

Understanding national economic indicators is essential for business owners. These data points influence consumer behavior, pricing, staffing, and inventory decisions — even at the local level. Here are three major indicators to watch and how they can affect your business operations.

---

## 📈 Inflation (CPI)

**What It Is**
The Consumer Price Index (CPI) tracks the average price changes over time for goods and services such as groceries, gas, rent, and utilities. It is a key measure of inflation in the U.S.

**Why It Matters**
Inflation affects both consumers and businesses:
- Customers have less discretionary income
- Prices for supplies, shipping, and labor rise
- Business owners face tighter margins and tougher pricing decisions

**Example Insight**
- CPI rose from **315.56 to 319.77** (a **1.33% increase**) since October 4, 2024
- Even small increases can:
  - Shrink profit margins
  - Drive up supplier and operating costs
  - Pressure staffing or wage budgets

---

## 💬 Consumer Confidence Index

**What It Is**
This index measures how optimistic people feel about the economy — whether they think it’s a good time to buy, save, or invest. It reflects sentiment, not behavior.

**Why It Matters**
Consumer sentiment often drives spending habits:
- Hesitation to make purchases
- Smaller average transactions
- Reduced frequency of visits to stores or service providers

**Example Insight**
- Consumer Confidence Index dropped from **74.0 to 64.7** since December 1, 2024 — a **12.6% decline**
- Impacts may include:
  - Customers delaying purchases or shopping for lower prices
  - Slower sales without any internal change to your business
  - Uneven or unpredictable weekly revenue

---

## 🛍️ Retail Sales (RSAFS)

**What It Is**
This indicator measures monthly retail spending across the U.S., including physical stores, restaurants, and e-commerce. It provides a national snapshot of consumer activity.

**Why It Matters**
National trends shape local behavior:
- Lower national retail spending can create a local "headwind"
- Even well-run businesses may see softened demand or irregular sales

**Example Insight**
- Retail sales dropped from **$730.3B to $722.7B** (a **1.04% decline**) since December 1, 2024
- This can lead to:
  - Over- or under-buying inventory
  - Staffing mismatches
  - Short-term cash flow crunches

---

## 🧠 Summary: Why This Matters for You

Even if your business is stable and well-managed, national economic indicators influence customer behavior, costs, and sales rhythms. Monitoring these metrics helps you:

- Anticipate changes in demand
- Adjust pricing and promotions proactively
- Make smarter decisions around inventory and labor
- Stay ahead of invisible risks affecting your bottom line
'''
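
The percentage changes quoted in the document can be re-derived from the raw index values, which is worth doing before the text is embedded. A minimal sketch follows; the values come from the original article, and `pct_change` is a hypothetical helper.

```python
def pct_change(old, new):
    """Percentage change from old to new (negative means a decline)."""
    return (new - old) / old * 100

# (starting value, ending value) as quoted in the original article
indicators = {
    "CPI": (315.56, 319.77),
    "Consumer Confidence Index": (74.0, 64.7),
    "Retail sales ($ millions)": (730336, 722708),
}

for name, (old, new) in indicators.items():
    print(f"{name}: {pct_change(old, new):+.2f}%")
```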



### DOC 3 - 📊 Forecasting You Can Trust in Uncertain Times

In [None]:
original_doc_3 = '''

Forecasting You Can Trust in Uncertain Times
We help mid-sized businesses stay ahead of sales volatility and protect their bottom line by cutting forecast errors in half — using powerful machine learning models.

😞 1. Risk: Can I avoid costly surprises — and capitalize on hidden gains?
❌ With Excel or QuickBooks:
Assumes tomorrow will look like yesterday. When demand drops or costs spike, you're the last to know — and the first to lose money. And when sales surge unexpectedly, your forecasts stay flat — so you miss the chance to scale up and cash in.

✅ With ML:
Tracks volatility, promotions, seasonality, and macroeconomic shifts — and alerts you early when things start to move. It learns when your sales tend to rise (not just fall), so you can order more, staff up, and maximize revenue before the opportunity passes.

Why This Matters
📉 Reduce painful surprises — like slow days that wreck your cash flow
📈 Spot hidden upside — like early demand spikes your old system can’t see
⚖️ Balance risk and opportunity — and make smarter, faster decisions
Time Series Forecast - Store 40 ML Forecast - Store 40
📉 Forecasting in Uncertain Times
Lately, it’s been harder to plan with confidence. Consumer confidence is down, retail spending has softened, and Gainesville’s unemployment rate is ticking up. Even if your business is steady, uncertainty can sneak in — and shake up your revenue from month to month.

That’s why better forecasting matters. We built a smarter, locally trained machine learning model that helps Gainesville businesses reduce costly forecasting errors — often by 50% or more.

Here’s what that means in real dollars:

📦2. Preparedness: Will I know what’s coming?
❌ With Excel: Short-term averages break under pressure. You scramble to adjust too late — or miss key turns completely.

✅ With ML: Recognizes trends faster. Captures seasonality, economic shifts, and the impact of major events — automatically.

📈3. Opportunity: Can I scale without chaos?
❌ With Excel: Every new product, promotion, or region adds another layer of complexity — and guesswork.

✅ With ML: Learns from your data to forecast by product, category, or store — and adapts as your business grows.

🕒4. Focus: Why 30-Day Forecasts Deliver More Value
❌ With Excel & Long-Term Models: Longer-term forecasts quickly become unreliable. The farther out you look, the harder it is to trust — especially in a volatile economy.

✅ With ML (and Us): We focus on 30-day rolling forecasts because they’re the sweet spot for balancing accuracy, agility, and impact. Our models are optimized to deliver high-confidence predictions over the next 4 weeks — when your decisions matter most.

🎯 Why it works: You get fast, high-quality forecasts when you need them most — and you’re always one step ahead.
🤝Why Local Matters
We’re based in Gainesville, FL — and we work directly with local businesses who want better forecasting but don’t have time to wrangle complicated models or consultants.

📊 Deliver accurate forecasts with less hassle
📉 Cut your errors by 50% or more
📆 Help you plan inventory, staffing, and cash flow with confidence
🛠️How It Works
We start with a quick, free consultation to make sure your data is a good fit for machine learning forecasting. If it is, we’ll:

Train our model on your sales data
Compare ML and Excel-style forecasts side-by-side
Walk you through the differences and explain what the model sees
If you see strong improvements, we’ll help you choose a plan that fits your needs. If not — no worries. You’ll still walk away with a free benchmark and insights into your sales volatility.
'''


In [25]:
CFFC_Features_UseCases = '''
# 📊 Forecasting You Can Trust in Uncertain Times

We help mid-sized businesses stay ahead of sales volatility and protect their bottom line by cutting forecast errors in half — using powerful machine learning models.

---

## 🚀 Key Benefits at a Glance

- Cut forecasting errors by **50% or more**
- Detect changes in demand **before they impact cash flow**
- Respond to opportunities with **confidence and speed**
- Replace spreadsheets with **smarter, adaptive forecasting**

---

## ⚠️ 1. Risk: Can I Avoid Costly Surprises?

**Traditional Tools (Excel, QuickBooks):**
- Assume tomorrow looks like yesterday
- Miss sudden drops in demand or spikes in costs
- Under-forecast surges, causing missed growth

**ML-Based Forecasting:**
- Tracks volatility, promotions, and macro shifts
- Alerts you early when patterns shift
- Sees both upside and downside risk

**Why It Matters:**
- Reduce painful surprises like slow days that kill cash flow
- Spot early spikes in demand and capture upside
- Make fast, informed decisions under pressure

---

## 📦 2. Preparedness: Will I Know What’s Coming?

**Traditional Tools:**
- Use short-term averages that break under pressure
- React too late to market shifts or economic trends

**ML-Based Forecasting:**
- Detects patterns, seasonality, and macroeconomic events
- Reacts faster than human-built models

**Result:** Stay proactive, not reactive.

---

## 📈 3. Opportunity: Can I Scale Without Chaos?

**Traditional Tools:**
- Each new product or region adds guesswork
- Models don’t adapt as your business evolves

**ML-Based Forecasting:**
- Learns across products, stores, and categories
- Adapts dynamically as your business grows

**Result:** Confidently expand without overwhelming your systems.

---

## 🕒 4. Focus: Why 30-Day Forecasts Deliver More Value

**Traditional Tools:**
- Long-term forecasts lose reliability fast
- Noise increases the farther out you go

**Our ML Model:**
- Focuses on 30-day rolling forecasts
- Optimized for agility and decision-making impact

**Result:** High-confidence, short-term forecasts where it matters most.

---

## 🌐 Why Economic Context Matters

Recent trends (example data as of Q4 2024):
- **Inflation (CPI)** up by 1.33%
- **Consumer confidence** down by 12.6%
- **Retail sales** down by 1.04%

**What this means for you:**
- Reduced consumer spending
- More erratic buying behavior
- More value in accurate, responsive forecasting

---

## 🛠️ How It Works

**Step 1: Free Consultation**
- We assess if your data is a good fit for ML forecasting

**Step 2: Test the Model**
- We train our model on your historical sales data
- You compare ML and Excel-style forecasts side-by-side

**Step 3: Review & Decide**
- We walk you through what the model sees
- If it delivers strong improvements, you choose a plan
- If not, you still leave with a free benchmark and insights

---

## ✅ Summary of What You Get

- Accurate forecasts with **less hassle**
- 50%+ **error reduction**
- Smart planning for **inventory, staffing, and cash flow**
- A partner who works with your data — not against it


'''

### DOC 4 - Consistency That Builds Confidence

In [None]:
original_doc_4 = '''

Consistency That Builds Confidence

When it comes to cash flow forecasting, one-off success is easy to dismiss. But what if you could deliver predictable accuracy across every location — every week?

That’s exactly what our machine learning model did. We tested it across 20+ stores, each with unique demand patterns. And in every case, it consistently cut forecasting error by at least 50%. It Wasn’t Just One Store — It Was Every Store

📉 Error Reduction Across the Board
We compared traditional forecasting (Excel-style models using Prophet) to our ML approach. Here’s how our ML forecasts stacked up against traditional models:

💼 Why Consistency Matters
Forecasting accuracy isn’t just a statistic — it’s the difference between:

📦 Stocking just the right amount of inventory
💵 Paying suppliers and staff on time
📈 Planning with confidence — instead of crossing your fingers
When your forecasts are consistently reliable, your business decisions get better, faster, and less stressful.

That’s more than 50% better performance — store after store after store.

If you're a mid-size business owner looking to get more accurate sales forecasts, streamline cash flow, and make better decisions with less stress — I’d love to hear from you.

'''

In [26]:
CFFC_Proof_MultiStoreAccuracy = '''
# 📊 Consistency That Builds Confidence

Forecasting accuracy isn’t just about one-time wins — it’s about **reliable, repeatable performance** across your business. Cashflow4cast delivers consistent forecasting accuracy, store after store.

---

## 🏪 Proven Across 20+ Locations

We tested our machine learning model across **20+ different store locations**, each with unique demand patterns.

**Result:**
In every case, the ML model **reduced forecasting errors by at least 50%**, outperforming traditional Excel-style models (including Prophet).

---

## 📉 Error Reduction: ML vs. Traditional Forecasting

- **Traditional Tools (Excel / Prophet):**
  - Static, lagging predictions
  - High error rates, especially with demand shifts

- **ML-Based Forecasting:**
  - Adapts to each location’s patterns
  - Maintains accuracy even with volatility

**Outcome:**
> “More than 50% better performance — consistently, across every store.”

---

## 💼 Why Consistency Matters

Reliable forecasts are a competitive advantage:

- 📦 Stock the right inventory at the right time
- 💵 Pay suppliers and staff without stress
- 📈 Plan confidently, without guessing

Inconsistent forecasts create chaos. Consistent accuracy builds trust in your numbers — and frees you to lead, not react.

---

## 🎯 Summary

- Tested across **multiple business units**
- Achieved **50%+ error reduction** in every case
- Enables smarter decisions across the organization
- Designed for businesses that operate **beyond a single store or product line**

If you're a growing business looking for forecasting you can count on — across every location — let's talk.
'''

### DOC 5 - Pricing

In [None]:
original_doc_5 = '''

Pricing

💰 Pricing & Service Packages
We offer three tiers based on your needs:

📊 Basic Plan
Best For: Small businesses new to forecasting

Features: Monthly forecasts, category-level predictions, Excel/CSV reports

$199/mo

🚀 Advanced Plan
Best For: Growing teams needing more detail

Features: Weekly forecasts, SKU-level insights, promotion impact

$499/mo

💎 Premium Plan
Best For: Businesses needing real-time insight

Features: Daily forecasts, dashboards, economic indicators, 1-on-1 consulting

$999/mo

📣 Ready to See If Machine Learning Forecasting Is Right for You?
Our ML model learns from your actual sales history to spot patterns and trends a spreadsheet can’t. It works with detailed data — by product, category, or region — and picks up on subtle relationships between items that influence high or low sales.

💡 Want to forecast which products will rise or fall next month?
💡 Curious how a promotion in one category affects others?
ML can handle it — and it gets better the more data you have.

We also offer custom data analysis and visualizations if you want to better understand your data before we build your forecast. Whether it’s modeling cash flow, sales trends, or category/regional performance, we can surface insights to guide your decisions.

🧪 Free Fit Assessment
We offer a free consultation to see if we’re a good fit. Not every business has the right data for ML forecasting — and that’s OK.

You provide a sample of your recent sales history, and we’ll:

Train our ML model on your data
Forecast the last 2–3 months using ML and Excel-style models
Compare the results side-by-side
Walk through what the model sees — and how it helps you
If the results show a strong improvement, great!
If not, you’ll still walk away with a valuable forecasting benchmark — no strings attached.

If you're a mid-size business owner looking to get more accurate sales forecasts, streamline cash flow, and make better decisions with less stress — I’d love to hear from you.

'''

In [27]:
CFFC_Pricing_PlansAndFit = '''
# 💰 Pricing & Service Packages

We offer three pricing tiers to match your business needs and forecasting complexity.

---

## 📊 Basic Plan – $199/mo

**Best for:** Small businesses new to forecasting

**Includes:**
- Monthly forecasts
- Category-level predictions
- Excel/CSV downloadable reports

---

## 🚀 Advanced Plan – $499/mo

**Best for:** Growing teams needing more detail and flexibility

**Includes:**
- Weekly forecasts
- SKU-level insights
- Promotion impact tracking

---

## 💎 Premium Plan – $999/mo

**Best for:** Businesses that require real-time insight and support

**Includes:**
- Daily forecasts
- Interactive dashboards
- Built-in economic indicators
- 1-on-1 expert consulting

---

## 🧠 What Machine Learning Forecasting Can Do

Our ML model learns from your actual sales history — picking up on patterns spreadsheets miss.

**Use cases include:**
- Forecasting product performance for upcoming months
- Understanding how promotions affect other categories
- Generating insights by product, category, or region

The more historical data you provide, the better the model performs.

---

## 🧪 Free Fit Assessment (No Obligation)

We offer a **free consultation** to assess if your data is a good fit for ML forecasting.

**How it works:**
1. You send us a sample of your recent sales history
2. We train our ML model and generate 2–3 months of forecasts
3. You compare the ML forecast to a traditional Excel-style model
4. We explain what the model sees — and how it helps you

**If it works:** We’ll recommend the best plan
**If not:** You still get a free benchmark and insight report — no strings attached

---

## ✅ Summary

- Plans range from **$199–$999/month**
- Designed for businesses of all sizes and forecasting maturity
- Free fit assessment ensures ML is right for your business
- We help you streamline cash flow and forecast with confidence
'''



## 🧠 Rename Doc Titles?

When using `.txt` filenames as **document metadata or chunk labels**, clear titles help:

* **Improve traceability** (retrieval source display, debugging)
* **Enhance context** if you expose `doc.metadata["source"]` in responses
* **Support filtering or re-indexing** in future updates
* Reduce confusion if you add many more docs later

---

## ✅ Suggested Naming Convention

Use a consistent, functional format:

```text
CFFC_<Topic>_<Purpose>.txt
```
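
A minimal sketch of how the convention could be checked before files are written. The helper names (`follows_convention`, `build_filename`) are hypothetical, and the sketch assumes that `<Topic>` and `<Purpose>` contain only letters and digits, as in the filenames recommended below.

```python
import re

# Pattern for the convention: CFFC_<Topic>_<Purpose>.txt
# Assumption: Topic and Purpose are alphanumeric (e.g. "Pricing", "PlansAndFit").
FILENAME_PATTERN = re.compile(r"CFFC_[A-Za-z0-9]+_[A-Za-z0-9]+\.txt")


def follows_convention(filename: str) -> bool:
    """Return True if the whole filename matches CFFC_<Topic>_<Purpose>.txt."""
    return FILENAME_PATTERN.fullmatch(filename) is not None


def build_filename(topic: str, purpose: str) -> str:
    """Build a filename from its parts and reject anything off-pattern."""
    name = f"CFFC_{topic}_{purpose}.txt"
    if not follows_convention(name):
        raise ValueError(f"Invalid topic or purpose: {name!r}")
    return name


new_name = build_filename("Pricing", "PlansAndFit")
old_name_ok = follows_convention("CFFC_Forecasting You Can Trust in Uncertain Times")
print(new_name, old_name_ok)
```

A check like this is cheap to run in the loop that writes the files, so an off-pattern name fails early instead of surfacing later as a confusing `source` label.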

---

## 🏷️ Recommended New Filenames

| Old Filename                                                           | Suggested New Filename                                                        |
| ---------------------------------------------------------------------- | ----------------------------------------------------------------------------- |
| `CFFC_What If You Could Cut Cash Flow Forecasting Errors by 50%?`      | `CFFC_Overview_ForecastingBenefits.txt`                                       |
| `CFFC_Federal Economic Indicators That Impact Gainesville Businesses`  | `CFFC_EconomicIndicators_Federal.txt`                                         |
| `CFFC_Forecasting You Can Trust in Uncertain Times`                    | `CFFC_Features_UseCases.txt`                                                  |
| `CFFC_Gainesville Economic Indicators That Matter to Local Businesses` | `CFFC_EconomicIndicators_LocalTemplate.txt` (keep for future regionalization) |
| `CFFC_Consistency That Builds Confidence`                              | `CFFC_Proof_MultiStoreAccuracy.txt`                                           |
| `CFFC_Pricing`                                                         | `CFFC_Pricing_PlansAndFit.txt`                                                |




In [28]:
# Define your document path
docs_path = "/content/CFFC_RAG_DOCS"
os.makedirs(docs_path, exist_ok=True)  # Create folder if it doesn't exist

documents = {
    "CFFC_Overview_ForecastingBenefits.txt": CFFC_Overview_ForecastingBenefits,
    "CFFC_EconomicIndicators_Federal.txt": CFFC_EconomicIndicators_Federal,
    "CFFC_Features_UseCases.txt": CFFC_Features_UseCases,
    "CFFC_Proof_MultiStoreAccuracy.txt": CFFC_Proof_MultiStoreAccuracy,
    "CFFC_Pricing_PlansAndFit.txt": CFFC_Pricing_PlansAndFit
}

# Write each file into the directory
for filename, content in documents.items():
    with open(os.path.join(docs_path, filename), "w", encoding="utf-8") as f:
        f.write(content.strip())

## Load Docs - Clean & Chunk

In [31]:
# Path to your documents
docs_path = "/content/CFFC_RAG_DOCS"

# Step 1: Load all .txt files in the folder
raw_documents = []
for filename in os.listdir(docs_path):
    if filename.endswith(".txt"):
        file_path = os.path.join(docs_path, filename)
        loader = TextLoader(file_path, encoding="utf-8")
        docs = loader.load()
        for doc in docs:
            doc.metadata["source"] = filename  # Preserve the source filename for tracing
        raw_documents.extend(docs)

print(f"Loaded {len(raw_documents)} documents.")

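# Step 2 (optional): Collapse newlines and repeated whitespace into single spaces.
# Note: this also flattens Markdown headings and list breaks, so the splitter below
# can no longer split on "\n\n" or "\n" and falls back to splitting on spaces.
def clean_doc(doc: Document) -> Document:
    cleaned = " ".join(doc.page_content.split())  # Collapse all whitespace runs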
    return Document(page_content=cleaned, metadata=doc.metadata)

cleaned_documents = [clean_doc(doc) for doc in raw_documents]

# Step 3: Split documents into chunks
splitter = RecursiveCharacterTextSplitter(
    chunk_size=CHUNK_SIZE,
    chunk_overlap=CHUNK_OVERLAP
)

chunked_documents = splitter.split_documents(cleaned_documents)

# Check average chunk size (debug aid)
avg_length = sum(len(doc.page_content) for doc in chunked_documents) / len(chunked_documents)
print(f"📏 Average chunk length: {avg_length:.1f} characters")


print(f"Split into {len(chunked_documents)} total chunks.")

# Preview the first 5 chunks
print(f"Showing first 5 of {len(chunked_documents)} chunks:\n")

for i, doc in enumerate(chunked_documents[:5]):
    print(f"--- Chunk {i+1} ---")
    print(f"Source: {doc.metadata.get('source', 'N/A')}\n")
    print(textwrap.fill(doc.page_content[:500], width=100))  # limit preview to 500 characters
    print("\n")

Loaded 5 documents.
📏 Average chunk length: 238.5 characters
Split into 60 total chunks.
Showing first 5 of 60 chunks:

--- Chunk 1 ---
Source: CFFC_Features_UseCases.txt

# 📊 Forecasting You Can Trust in Uncertain Times We help mid-sized businesses stay ahead of sales
volatility and protect their bottom line by cutting forecast errors in half — using powerful machine
learning models. --- ## 🚀 Key Benefits at a Glance


--- Chunk 2 ---
Source: CFFC_Features_UseCases.txt

models. --- ## 🚀 Key Benefits at a Glance - Cut forecasting errors by **50% or more** - Detect
changes in demand **before they impact cash flow** - Respond to opportunities with **confidence and
speed** - Replace spreadsheets with **smarter,


--- Chunk 3 ---
Source: CFFC_Features_UseCases.txt

speed** - Replace spreadsheets with **smarter, adaptive forecasting** --- ## ⚠️ 1. Risk: Can I Avoid
Costly Surprises? **Traditional Tools (Excel, QuickBooks):** - Assume tomorrow looks like yesterday
- Miss sudden drops in dema
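
The preview shows each chunk starting with the tail of the previous one (for example, "speed** - Replace spreadsheets with **smarter," appears in both Chunk 2 and Chunk 3). That repetition comes from `chunk_overlap`. The sketch below illustrates the idea with a plain fixed-size sliding window. It is a simplified stand-in, not the algorithm `RecursiveCharacterTextSplitter` uses (which prefers to break on separators), and `sliding_chunks` is a hypothetical helper with made-up sizes.

```python
def sliding_chunks(text: str, chunk_size: int, overlap: int) -> list[str]:
    """Split text into fixed-size windows where neighbours share `overlap` characters."""
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= overlap < chunk_size")

    step = chunk_size - overlap  # how far the window advances each time
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the window already reached the end of the text
        start += step
    return chunks


chunks = sliding_chunks("abcdefghij", chunk_size=4, overlap=2)
print(chunks)  # ['abcd', 'cdef', 'efgh', 'ghij']
```

A larger overlap makes it less likely that a sentence is cut off from its context at a chunk boundary, at the cost of more chunks and more duplicated text in the vector store.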

## Embeddings

In [32]:
# Step 1: Set up Hugging Face embedding model
embedding_model = HuggingFaceEmbeddings(model_name=EMBED_MODEL)

# Step 2: Set up Chroma with persistence
persist_dir = "chroma_db"

vectorstore = Chroma.from_documents(
    documents=chunked_documents,
    embedding=embedding_model,
    persist_directory=persist_dir
)

print(f"✅ Stored {len(chunked_documents)} chunks in Chroma at '{persist_dir}'")

The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.



✅ Stored 60 chunks in Chroma at 'chroma_db'


## Retriever & Prompt

In [33]:
retriever = vectorstore.as_retriever(search_kwargs={"k": K})

# Prompt template
prompt_template = ChatPromptTemplate.from_template("""
You are a customer support assistant for **Cashflow4cast**, a platform that helps businesses improve cash flow forecasting using machine learning and economic indicators.

Answer the user's question using only the information provided in the business documents below.

Instructions:
- If the answer is not in the context, respond with: "I'm sorry, I don't have that information based on the current documentation."
- Be clear, concise, and professional.
- Use bullet points or short lists where appropriate (e.g., pricing, features).
- Do not include personal opinions or make assumptions.
- Do not mention services, features, or pricing unless explicitly included in the provided context.

Context:
{context}

Question:
{question}

Answer:
""")


## RAG Chain

In [35]:
# Define RAG chain
rag_chain = (
    RunnableLambda(lambda d: {
        "question": d["question"],
        "docs": retriever.invoke(d["question"])
    })
    # Label each chunk with its source file so answers can be traced back
    | RunnableLambda(lambda d: {
        "context": "\n\n".join(
            f"[Source: {doc.metadata.get('source', 'N/A')}]\n{doc.page_content}"
            for doc in d["docs"]
        ),
        "question": d["question"]
    })

    | prompt_template
    | LLM_MODEL
    | StrOutputParser()
)

# Invoke RAG
response = rag_chain.invoke({
    "question": "What are the recent economic indicators that affect local businesses?"
})

# Print response nicely
import textwrap
print("\n" + textwrap.fill(response, width=100))



- Inflation (CPI) - Consumer behavior - Pricing - Staffing - Inventory decisions


The terse answer above is a classic case of the LLM **strictly following the prompt’s instructions** (especially with `temperature=0`) and coming across as clipped or “robotic”. That is a good sign: it means your RAG setup and guardrails are working.

Still, the answer could be **more natural and conversational** while staying concise and factual.

---

## 🧠 Response too Blunt

The model was told to:

* Be concise ✅
* Avoid assumptions ✅
* Only use what's in the context ✅

So it likely defaulted to listing keywords from your economic indicators section — not wrong, but not warm or clear.

---

## ✅ How to Fix It (Without Breaking the Rules)

We can **nudge the tone** a bit in your prompt with a small tweak:

### ✍️ Suggested Prompt Tweak

Change this line:

> * Be clear, concise, and professional.

To:

> * Be clear, concise, and professional, but use natural language — answer like you're explaining to a business owner, not a computer.

This encourages full-sentence answers while keeping things factual and brand-aligned.
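
If you expect to iterate on the instructions, the one-line change can also be applied programmatically rather than by retyping the whole template. This is a minimal sketch on a plain string: `swap_instruction` is a hypothetical helper, the template is a shortened stand-in for the real prompt, and the new line is a shortened version of the suggested wording.

```python
def swap_instruction(template: str, old_line: str, new_line: str) -> str:
    """Replace exactly one instruction line, failing loudly if it is missing."""
    if old_line not in template:
        raise ValueError(f"Instruction not found in template: {old_line!r}")
    return template.replace(old_line, new_line, 1)


# Shortened stand-in for the notebook's prompt (assumption: same bullet wording)
base_template = (
    "Instructions:\n"
    "- Be clear, concise, and professional.\n"
    "- Do not include personal opinions or make assumptions.\n"
)

old_line = "- Be clear, concise, and professional."
new_line = "- Be clear, concise, and professional, but use natural language."

tweaked_template = swap_instruction(base_template, old_line, new_line)
print(tweaked_template)
```

The resulting string can be passed to `ChatPromptTemplate.from_template(...)` as usual. Raising on a missing line guards against silently testing an unchanged prompt.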

---

### 🧾 Example Before & After

**Before (your output):**

> * Inflation (CPI)
> * Consumer behavior
> * Pricing
> * Staffing
> * Inventory decisions

**After (the kind of answer the new tone prompt is aiming for):**

> Recent economic indicators affecting local businesses include rising inflation (CPI), declining consumer confidence, and lower retail sales. These trends impact customer spending habits, inventory planning, staffing decisions, and overall cash flow — even for otherwise stable businesses.

---

## ✅ Summary

* Keep your strict fallback logic (for accuracy)
* Add **tone coaching** for friendly, complete sentences
* Test a few outputs after adding the phrase:

  > "…answer like you're explaining to a business owner, not a computer."



In [36]:
# Prompt template
prompt_template = ChatPromptTemplate.from_template("""
You are a customer support assistant for **Cashflow4cast**, a platform that helps businesses improve cash flow forecasting using machine learning and economic indicators.

Answer the user's question using only the information provided in the business documents below.

Instructions:
- If the answer is not in the context, respond with: "I'm sorry, I don't have that information based on the current documentation."
- Be clear, concise, and professional, but use natural language — answer like you're explaining to a business owner, not a computer.
- Use bullet points or short lists where appropriate (e.g., pricing, features).
- Do not include personal opinions or make assumptions.
- Do not mention services, features, or pricing unless explicitly included in the provided context.

Context:
{context}

Question:
{question}

Answer:
""")

# Define RAG chain
rag_chain = (
    RunnableLambda(lambda d: {
        "question": d["question"],
        "docs": retriever.invoke(d["question"])
    })

    # Label each chunk with its source file so answers can be traced back
    | RunnableLambda(lambda d: {
        "context": "\n\n".join(
            f"[Source: {doc.metadata.get('source', 'N/A')}]\n{doc.page_content}"
            for doc in d["docs"]
        ),
        "question": d["question"]
    })

    | prompt_template
    | LLM_MODEL
    | StrOutputParser()
)

# Invoke RAG
response = rag_chain.invoke({
    "question": "What are the recent economic indicators that affect local businesses?"
})

# Print response nicely
import textwrap
print("\n" + textwrap.fill(response, width=100))


- Inflation (CPI) - Consumer behavior - Pricing - Staffing - Inventory decisions


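### Increase Temperature

The prompt tweak alone left the answer unchanged at `temperature=0`. Raising the temperature to `0.3` (next cell) is a **clear improvement**: the answer is clearer, more natural, and better structured. The model is now: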

* Using **full sentences** in a list format ✅
* Giving brief **definitions** of each indicator ✅
* Providing **business-relevant framing** (without overexplaining) ✅

---

### 🧠 Final Polish Suggestions (Optional)

If you want to *nudge it even further* into a customer-friendly tone, you could tweak your prompt slightly to suggest a purpose:

#### 🔧 Add to Prompt Instructions:

> * When listing economic indicators, briefly explain why each one matters to local businesses.

This tells the model to not just define terms, but tie them back to the use case (e.g., “Unemployment affects consumer spending,” etc.).

---

### 📌 Summary

Your current combo of:

* `temperature = 0.3`
* Refined prompt with tone coaching
* Well-structured RAG context

...is now **functionally production-ready**.
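
To build intuition for why a small temperature change matters, the sketch below applies the textbook temperature-scaled softmax to three made-up token scores. It illustrates the general idea only: the scores are invented, `softmax_with_temperature` is a hypothetical helper, and the provider's actual sampling procedure may differ.

```python
import math


def softmax_with_temperature(logits: list[float], temperature: float) -> list[float]:
    """Turn raw scores into probabilities; lower temperature sharpens the distribution."""
    if temperature <= 0:
        # T = 0 is usually handled as greedy (argmax) decoding, not as a division
        raise ValueError("temperature must be > 0")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


logits = [2.0, 1.0, 0.5]  # made-up scores for three candidate tokens
probs_default = softmax_with_temperature(logits, 1.0)
probs_low = softmax_with_temperature(logits, 0.3)

for label, probs in (("T=1.0", probs_default), ("T=0.3", probs_low)):
    print(label, [round(p, 3) for p in probs])
```

At `T=0.3` almost all probability sits on the top-scoring token, but not all of it, which is why the answers stay close to the context while the phrasing can vary between runs.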



In [37]:
LLM_MODEL = ChatOpenAI(
    model_name="gpt-3.5-turbo",  # or "gpt-4-turbo" if budget allows
    temperature=0.3,  # low but nonzero: stays factual, allows some variation in phrasing
    max_tokens=500  # adjust based on response length
)

# Prompt template
prompt_template = ChatPromptTemplate.from_template("""
You are a customer support assistant for **Cashflow4cast**, a platform that helps businesses improve cash flow forecasting using machine learning and economic indicators.

Answer the user's question using only the information provided in the business documents below.

Instructions:
- If the answer is not in the context, respond with: "I'm sorry, I don't have that information based on the current documentation."
- Be clear, concise, and professional, but use natural language — answer like you're explaining to a business owner, not a computer.
- Use bullet points or short lists where appropriate (e.g., pricing, features).
- Do not include personal opinions or make assumptions.
- Do not mention services, features, or pricing unless explicitly included in the provided context.

Context:
{context}

Question:
{question}

Answer:
""")

# Define RAG chain
rag_chain = (
    RunnableLambda(lambda d: {
        "question": d["question"],
        "docs": retriever.invoke(d["question"])
    })

    # Label each chunk with its source file so answers can be traced back
    | RunnableLambda(lambda d: {
        "context": "\n\n".join(
            f"[Source: {doc.metadata.get('source', 'N/A')}]\n{doc.page_content}"
            for doc in d["docs"]
        ),
        "question": d["question"]
    })

    | prompt_template
    | LLM_MODEL
    | StrOutputParser()
)

# Invoke RAG
response = rag_chain.invoke({
    "question": "What are the recent economic indicators that affect local businesses?"
})

# Print response nicely
import textwrap
print("\n" + textwrap.fill(response, width=100))


- **Inflation (CPI)**: The Consumer Price Index (CPI) tracks average price changes. - **Unemployment
Rate**: Indicates the percentage of people actively seeking employment. - **GDP Growth**: Measures
the economic output of a country.


# TEST QUESTIONS

In [38]:
# Define test questions
test_questions = [
    "What does Cashflow4cast actually do?",
    "How does this forecasting service work?",
    "Is this based on my actual sales data or general trends?"
]

# Function to run and display answers
def run_rag_tests(questions, rag_chain, wrap_width=100):
    for i, question in enumerate(questions, 1):
        print(f"\n🔹 Test {i}: {question}\n{'-' * wrap_width}")
        response = rag_chain.invoke({"question": question})
        print(textwrap.fill(response, width=wrap_width))
        print("\n" + "=" * wrap_width)

# Run the first three tests
run_rag_tests(test_questions, rag_chain)



🔹 Test 1: What does Cashflow4cast actually do?
----------------------------------------------------------------------------------------------------
- Cashflow4cast helps small and mid-sized businesses improve financial forecasting accuracy - It
provides more accurate forecasting than spreadsheets across all business types - Cashflow4cast
unlocks savings, enables smarter planning, and offers peace of mind - The platform delivers
consistent forecasting accuracy, store after store, ensuring reliable and repeatable performance


🔹 Test 2: How does this forecasting service work?
----------------------------------------------------------------------------------------------------
Based on the provided information, here is how the forecasting service of Cashflow4cast works:  -
Adapts dynamically as your business grows, allowing you to confidently expand without overwhelming
your systems - Focuses on 30-day forecasts to deliver more value compared to traditional tools that
lose reliability fas

### TEST 1 - 🧠 General Understanding

In [41]:
test_general = [
    "What does Cashflow4cast actually do?",
    "How does this forecasting service work?",
    "Is this based on my actual sales data or general trends?",
    "How is this different from Excel or QuickBooks forecasts?",
    "Can I trust forecasts during unstable economic conditions?"
]

# Run all general-understanding tests
run_rag_tests(test_general, rag_chain)


🔹 Test 1: What does Cashflow4cast actually do?
----------------------------------------------------------------------------------------------------
- Cashflow4cast helps small and mid-sized businesses improve financial forecasting accuracy - It
provides more accurate forecasting than spreadsheets for all types of businesses - Cashflow4cast
unlocks savings, enables smarter planning, and offers peace of mind for business owners - The
platform delivers consistent forecasting accuracy across multiple stores, ensuring reliable and
repeatable performance


🔹 Test 2: How does this forecasting service work?
----------------------------------------------------------------------------------------------------
Based on the provided information, here is how the forecasting service of Cashflow4cast works:  -
Adapts dynamically as your business grows, allowing you to confidently expand without overwhelming
your systems - Focuses on 30-day forecasts to deliver more value compared to traditional tools

### TEST 2 - 💰 Pricing & Plans

In [42]:
test_pricing = [
    "What does it cost?",
    "What’s included in each plan?",
    "Is there a free trial?",
    "Do you offer consulting or custom analysis?",
    "What’s the difference between the Basic and Premium plans?"
]

# Run all pricing tests
run_rag_tests(test_pricing, rag_chain)


🔹 Test 1: What does it cost?
----------------------------------------------------------------------------------------------------
- The pricing for Cashflow4cast ranges from **$199–$999/month** depending on the plan chosen. - The
Basic Plan is priced at $199/month and is best suited for small businesses new to forecasting.


🔹 Test 2: What’s included in each plan?
----------------------------------------------------------------------------------------------------
- **Basic Plan:**   - Monthly forecasts   - Category-level predictions   - Excel/CSV downloadable
reports  - **Advanced Plan:**   - Weekly forecasts   - SKU-level insights   - Promotion


🔹 Test 3: Is there a free trial?
----------------------------------------------------------------------------------------------------
I'm sorry, I don't have that information based on the current documentation.


🔹 Test 4: Do you offer consulting or custom analysis?
----------------------------------------------------------------------------

### TEST 3 - 📈 Forecasting & Accuracy

In [43]:
test_forecasting = [
    "How accurate are your forecasts?",
    "How do you reduce forecasting errors?",
    "Can you forecast by product or store?",
    "How do you handle promotions or seasonal trends?",
    "How often are forecasts updated?"
]

# Run all forecasting tests
run_rag_tests(test_forecasting, rag_chain)


🔹 Test 1: How accurate are your forecasts?
----------------------------------------------------------------------------------------------------
- Accurate forecasts with less hassle - 50%+ error reduction - Maintains accuracy even with
volatility, consistently performing more than 50% better across every store


🔹 Test 2: How do you reduce forecasting errors?
----------------------------------------------------------------------------------------------------
To reduce forecasting errors, Cashflow4cast offers the following features based on the provided
information:  - Accurate forecasts with less hassle - 50%+ error reduction - Smart planning for
inventory, staffing, and cash flow


🔹 Test 3: Can you forecast by product or store?
----------------------------------------------------------------------------------------------------
- **Forecasting by Product or Store:**   - **Traditional Tools:**     - Each new product or region
adds guesswork     - Models don’t adapt as your business ev

### TEST 4 - 🌐 Economic Indicators

In [44]:
test_economics = [
    "What economic indicators do you use?",
    "How does inflation affect my business?",
    "What does the Consumer Confidence Index mean for sales?",
    "Do national trends really affect local businesses?",
    "What should I be watching in the economy right now?"
]

# Run all economic-indicator tests
run_rag_tests(test_economics, rag_chain)


🔹 Test 1: What economic indicators do you use?
----------------------------------------------------------------------------------------------------
- Cashflow4cast uses federal economic indicators to help businesses improve cash flow forecasting -
These indicators influence customer behavior, costs, and sales rhythms - Monitoring these metrics
helps anticipate changes in demand and adjust pricing and promotions proactively


🔹 Test 2: How does inflation affect my business?
----------------------------------------------------------------------------------------------------
- Customers may have less discretionary income due to inflation - Prices for supplies, shipping, and
labor may rise - Business owners may face tighter margins and tougher pricing decisions


🔹 Test 3: What does the Consumer Confidence Index mean for sales?
----------------------------------------------------------------------------------------------------
- The Consumer Confidence Index reflects how optimistic people

### TEST 5 - 👣 Onboarding & Setup

In [46]:
test_onboarding = [
    "What’s the first step if I want to try this?",
    "How do you test if my data is a good fit?",
    "Do I need to install anything?",
    "How long does setup take?"
]

# Run all onboarding tests
run_rag_tests(test_onboarding, rag_chain)


🔹 Test 1: What’s the first step if I want to try this?
----------------------------------------------------------------------------------------------------
The first step to try Cashflow4cast is to schedule a free consultation where they will assess if
your data is a good fit for machine learning forecasting.


🔹 Test 2: How do you test if my data is a good fit?
----------------------------------------------------------------------------------------------------
To test if your data is a good fit for ML forecasting with Cashflow4cast, you can follow these
steps:  - Send us a sample of your recent sales history - We will train our ML model and generate
2–3 months of forecasts - Compare the ML forecast to a traditional Excel-style model - We will
explain the results and provide insights on the fit of your data


🔹 Test 3: Do I need to install anything?
----------------------------------------------------------------------------------------------------
Based on the provided information, y

### TEST 6 - 🚫 Fallback Testing (Out-of-Scope)



In [45]:
test_fallbacks = [
    "Do you support inventory tracking and reordering?",
    "Can I use this to file my taxes?",
    "How do I open a store in another state?"
]

# Run all fallback tests
run_rag_tests(test_fallbacks, rag_chain)


🔹 Test 1: Do you support inventory tracking and reordering?
----------------------------------------------------------------------------------------------------
- Stock the right inventory at the right time - Adjust pricing and promotions proactively - Make
smarter decisions around inventory and labor


🔹 Test 2: Can I use this to file my taxes?
----------------------------------------------------------------------------------------------------
I'm sorry, I don't have that information based on the current documentation.


🔹 Test 3: How do I open a store in another state?
----------------------------------------------------------------------------------------------------
I'm sorry, I don't have that information based on the current documentation.
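
Reading fallback results by eye gets tedious as the question list grows. The sketch below flags which answers contain the fallback sentence from the prompt, using the three responses printed above as sample data. `is_fallback` and `summarize_fallbacks` are hypothetical helpers, not part of LangChain.

```python
# The exact fallback sentence the prompt tells the model to use
FALLBACK_MESSAGE = (
    "I'm sorry, I don't have that information based on the current documentation."
)


def is_fallback(response: str) -> bool:
    """Return True if the response contains the fallback sentence."""
    normalized = " ".join(response.split())  # ignore line wrapping and extra spaces
    return FALLBACK_MESSAGE in normalized


def summarize_fallbacks(results: dict[str, str]) -> dict[str, bool]:
    """Map each question to whether its answer was the fallback."""
    return {question: is_fallback(answer) for question, answer in results.items()}


# Sample data: the three responses printed in the test above
results = {
    "Do you support inventory tracking and reordering?": (
        "- Stock the right inventory at the right time - Adjust pricing and "
        "promotions proactively - Make smarter decisions around inventory and labor"
    ),
    "Can I use this to file my taxes?": FALLBACK_MESSAGE,
    "How do I open a store in another state?": FALLBACK_MESSAGE,
}

report = summarize_fallbacks(results)
for question, fell_back in report.items():
    print(f"{'fallback' if fell_back else 'ANSWERED'}: {question}")
```

On this sample, the inventory-tracking question is the one that did not trigger the fallback: the model answered from loosely related context instead, which makes it a natural candidate for a closer look at retrieval or prompt wording.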



#### CLEAN WIDGETS

In [1]:
import json
from google.colab import drive
drive.mount('/content/drive', force_remount=True)

notebook_path = "/content/drive/My Drive/LANGCHAIN/LC_010_RAG_CustServiceBot_Model_DocOptimzation.ipynb"

# Load the notebook JSON
with open(notebook_path, 'r', encoding='utf-8') as f:
    nb = json.load(f)

# 1. Remove widgets from notebook-level metadata
if "widgets" in nb.get("metadata", {}):
    del nb["metadata"]["widgets"]
    print("✅ Removed notebook-level 'widgets' metadata.")

# 2. Remove widgets from each cell's metadata
for i, cell in enumerate(nb.get("cells", [])):
    if "metadata" in cell and "widgets" in cell["metadata"]:
        del cell["metadata"]["widgets"]
        print(f"✅ Removed 'widgets' from cell {i}")

# Save the cleaned notebook
with open(notebook_path, 'w', encoding='utf-8') as f:
    json.dump(nb, f, indent=2)

print("✅ Notebook deeply cleaned. Try uploading to GitHub again.")

Mounted at /content/drive
✅ Removed notebook-level 'widgets' metadata.
✅ Notebook deeply cleaned. Try uploading to GitHub again.
