This repository contains code and instructions for running your own Firebolt-powered chatbot that uses retrieval-augmented generation (RAG).
- Python 3.12 (for local setup)
- Docker and Docker Compose (for Docker setup)
- GPU with NVIDIA drivers (optional, for better performance)
- Git
Task is a task runner / build tool that aims to be simpler and easier to use than, for example, GNU Make. It's used in this project to simplify common operations.
Install Task using one of the following methods:
```bash
# macOS (via Homebrew)
brew install go-task/tap/go-task

# Linux (via script)
sh -c "$(curl --location https://taskfile.dev/install.sh)" -- -d
```

For other installation methods, see the official Task installation guide.
```bash
# Clone the repository
git clone <repository-url>
cd rag_chatbot

# Run the automated setup script (interactive mode)
./setup.sh

# Or specify setup mode directly:
./setup.sh --docker  # for Docker setup
./setup.sh --local   # for local Python setup
```

The script will:
- Check your system for required dependencies
- Install Ollama if not already installed
- Set up environment files
- Install Python dependencies or build Docker containers
- Download required Ollama models
If you have Task installed:
```bash
# Install dependencies
task install-deps

# Setup Ollama models
task setup-ollama

# Start the server
task start-server
```

- Copy `.env.example` to `.env` and fill in your Firebolt credentials
- Update the GitHub repository paths and chunking strategy configuration in your `.env` file
- Run `task populate` or `python populate_table.py` to populate your vector database
- The system automatically ensures chunking strategy consistency across embedding generation and retrieval (see the sketch below)
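
As a rough illustration of that consistency guarantee, the check could look like the sketch below. The function name and the way the previously used strategy is stored are assumptions for illustration; only the `FIREBOLT_RAG_CHATBOT_CHUNKING_STRATEGY` variable comes from this project's configuration.

```python
import os
import warnings

def check_chunking_consistency(stored_strategy: str | None) -> None:
    """Warn when the configured chunking strategy differs from the one
    already used for the embeddings in the table (illustrative helper)."""
    configured = os.environ.get(
        "FIREBOLT_RAG_CHATBOT_CHUNKING_STRATEGY",
        "recursive_character_text_splitting",
    )
    if stored_strategy is not None and stored_strategy != configured:
        warnings.warn(
            f"Chunking strategy mismatch: embeddings were generated with "
            f"'{stored_strategy}' but '{configured}' is now configured. "
            "Re-populate the table to avoid retrieval inconsistencies."
        )
```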
When using Docker, you can populate the table using the following methods:
Option 1: Using Task (Recommended)
```bash
# This automatically detects Docker vs local setup
task populate
```

Option 2: Direct Docker Command

```bash
# Ensure your Docker services are running
docker compose up -d

# Run populate_table.py inside the container
docker compose exec rag_chatbot python populate_table.py
```

Important Notes for Docker:
- The `FIREBOLT_RAG_CHATBOT_LOCAL_GITHUB_PATH` environment variable should point to your local GitHub repositories directory
- This directory is automatically mounted to `/github` inside the Docker container
- The script will automatically use `/github` as the base path when running in Docker (see the sketch after this list)
- Make sure your document repositories are cloned locally in the `FIREBOLT_RAG_CHATBOT_LOCAL_GITHUB_PATH` directory before running
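
The base-path switch can be pictured with a sketch like the one below. The `/.dockerenv` check is just one common heuristic for detecting a container, and the repository's actual detection logic may differ.

```python
import os

def github_base_path() -> str:
    """Return /github inside the Docker container, otherwise the locally
    configured repositories directory. Illustrative sketch only."""
    if os.path.exists("/.dockerenv"):  # common (not guaranteed) Docker marker
        return "/github"
    return os.environ["FIREBOLT_RAG_CHATBOT_LOCAL_GITHUB_PATH"]
```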
- GPU Support: For Docker GPU support, ensure NVIDIA Docker runtime is installed
- Ollama Models: Models are automatically downloaded but may take time on first run
- Port Conflicts: Default ports are 5000 (web) and 11434 (Ollama)
- Ollama Performance: For better performance with large models, consider the following:
  - On macOS: `OLLAMA_FLASH_ATTENTION="1" OLLAMA_KV_CACHE_TYPE="q8_0" /usr/local/opt/ollama/bin/ollama serve`
  - For production: Use GPU-accelerated or cloud-hosted inference services
- Register for Firebolt
- Set up your account following these instructions
- Create a database by following the Create a Database section
- Create or use an existing engine (Firebolt may have automatically created `my_engine`)
- Create a service account:
  - Follow the service account setup instructions
  - When creating a user, select `Service Account` in the `Assign To` dropdown and `account_admin` for the role
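
Once the service account exists, you can sanity-check the credentials from Python. This is a minimal sketch assuming the `firebolt-sdk` package; consult the SDK documentation for the exact API of the version you install.

```python
from firebolt.client.auth import ClientCredentials
from firebolt.db import connect

# Values correspond to the FIREBOLT_RAG_CHATBOT_* variables configured later.
connection = connect(
    auth=ClientCredentials("<your-service-account-id>", "<your-service-account-secret>"),
    account_name="<your-account-name>",
    database="<your-database-name>",
    engine_name="<your-engine-name>",
)
cursor = connection.cursor()
cursor.execute("SELECT 1")
print(cursor.fetchall())  # a result confirms the engine is reachable
```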
- Python 3.12
- GPU support (recommended): many local machines have GPUs, but for better performance consider a cloud GPU instance
Run the automated script which will handle all dependencies:
```bash
./setup.sh
```

Choose either Docker or local setup when prompted, or specify directly:

```bash
./setup.sh --docker  # for Docker setup
./setup.sh --local   # for local Python setup
```

If you prefer to set up manually:
1. Install Ollama:
   - macOS: `brew install ollama && brew services start ollama`
   - Linux: `curl -fsSL https://ollama.com/install.sh | sh`
   - Windows: Download from ollama.com/download

2. Pull Required Models:

   ```bash
   ollama pull llama3.1
   ollama pull nomic-embed-text
   ```

3. Setup Python Environment:

   ```bash
   python -m venv .venv
   source .venv/bin/activate  # On Windows: .venv\Scripts\activate
   pip install -r requirements.txt
   ```

4. Environment Variables:
   - Copy `.env.example` to `.env`
   - Fill in the required Firebolt credentials and configuration:

   ```bash
   # Firebolt Database Configuration
   FIREBOLT_RAG_CHATBOT_CLIENT_ID=<your-service-account-id>
   FIREBOLT_RAG_CHATBOT_CLIENT_SECRET=<your-service-account-secret>
   FIREBOLT_RAG_CHATBOT_ENGINE=<your-engine-name>
   FIREBOLT_RAG_CHATBOT_DB=<your-database-name>
   FIREBOLT_RAG_CHATBOT_ACCOUNT_NAME=<your-account-name>
   FIREBOLT_RAG_CHATBOT_TABLE_NAME=<your-table-name>
   FIREBOLT_RAG_CHATBOT_LOCAL_GITHUB_PATH=<path-to-your-github-repos>

   # Chunking Strategy Configuration (Environment-Driven)
   FIREBOLT_RAG_CHATBOT_CHUNKING_STRATEGY=recursive_character_text_splitting
   FIREBOLT_RAG_CHATBOT_CHUNK_SIZE=300
   FIREBOLT_RAG_CHATBOT_CHUNK_OVERLAP=50
   FIREBOLT_RAG_CHATBOT_NUM_WORDS_PER_CHUNK=100
   FIREBOLT_RAG_CHATBOT_NUM_SENTENCES_PER_CHUNK=3
   FIREBOLT_RAG_CHATBOT_BATCH_SIZE=150
   ```

   Chunking Strategy Options:
   - `recursive_character_text_splitting` (recommended)
   - `semantic_chunking`
   - `by_paragraph`
   - `by_sentence`
   - `by_sentence_with_sliding_window`
   - `every_n_words`

5. Prepare Documents for RAG:
   - Clone your document repositories locally
   - Update `repo_dict` in `populate_table.py` with your repositories (see the illustrative sketch after these steps)
   - Configure chunking strategy and parameters via environment variables (no code changes needed)
   - Optionally, add file names to `DISALLOWED_FILENAMES` in `constants.py` to exclude them

6. Populate the Vector Database:
   - The script automatically validates chunking strategy consistency to prevent embedding mismatches
   - For local setup: `python populate_table.py`
   - For Docker setup: `task populate` or `docker compose exec rag_chatbot python populate_table.py`
   - Important: The system will warn you if you change chunking strategies on existing embeddings

7. Customize the Chatbot:
   - Modify the prompt in the `run_chatbot()` function in `run_llm.py` to suit your use case
   - Configure chunking strategy and parameters via environment variables in your `.env` file
   - The system automatically ensures consistency between embedding generation and retrieval phases
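
As referenced in step 5, `repo_dict` in `populate_table.py` lists the repositories to index. The snippet below is only a hypothetical illustration of that kind of mapping (assuming local directory names map to repository URLs); check the file itself for the exact structure it expects.

```python
# Hypothetical illustration -- see populate_table.py for the real structure.
repo_dict = {
    # "<local-directory-name>": "<repository-url>"
    "transformers": "https://github.com/huggingface/transformers",
}
```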
For local setup:

```bash
python web_server.py
```

For Docker setup:

```bash
docker-compose up -d
```

Access the web UI at http://127.0.0.1:5000
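
To confirm the server came up, a quick check from Python is enough (this uses the `requests` package, an assumption on our part; the port is the default noted below):

```python
import requests

# Quick smoke test against the default web port.
response = requests.get("http://127.0.0.1:5000", timeout=10)
print(response.status_code)  # expect 200 once the server is ready
```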
- Supported Formats: Only `.docx`, `.txt`, and `.md` files are processed (see the sketch after this list)
- Character Issues: Null characters and certain Unicode values may cause errors in Firebolt tables
- Markdown Syntax: Ensure all Markdown files have valid syntax to prevent errors
- Environment-Driven: All chunking parameters are configurable via environment variables
- Consistency Validation: The system automatically validates chunking strategy consistency before processing
- No Code Changes: Switch between chunking strategies by updating your `.env` file only
- Strategy Mismatch Warning: The system warns when you attempt to mix different chunking strategies in the same database
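
The format and null-character notes above can be enforced up front with a small pre-filter. The helpers below are not part of the repository; they sketch the kind of filtering those notes imply.

```python
from pathlib import Path

SUPPORTED_EXTENSIONS = {".docx", ".txt", ".md"}  # formats the pipeline processes

def iter_supported_files(root: str):
    """Yield only files with supported extensions (illustrative helper)."""
    for path in Path(root).rglob("*"):
        if path.is_file() and path.suffix.lower() in SUPPORTED_EXTENSIONS:
            yield path

def strip_null_chars(text: str) -> str:
    """Remove null characters, which can cause errors in Firebolt tables."""
    return text.replace("\x00", "")
```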
To toggle between internal/external user access:
- Go to `web_server.py`
- Set `is_customer=True` in the `run_chatbot()` function to restrict access to public documents only
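
For example, the call in `web_server.py` would then pass the flag along. The exact signature lives in `run_llm.py`, so treat this as a hypothetical shape:

```python
from run_llm import run_chatbot  # assumed import path based on the files above

# Hypothetical call shape -- verify run_chatbot()'s actual parameters in run_llm.py.
answer = run_chatbot("How do I create an engine?", is_customer=True)  # public docs only
```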
We have provided an example dataset that you can use to build your chatbot! You can find the dataset at this GitHub repository, which contains documentation for HuggingFace Transformers.