A semantic search service that bridges Firecrawl web scraping with vector search capabilities using HuggingFace Text Embeddings Inference (TEI) and Qdrant.
Firecrawl (steamy-wsl) → Search Bridge API → Redis Queue → Background Worker
├─> HuggingFace TEI (embeddings)
├─> Qdrant (vector storage)
└─> BM25 (keyword search)
- Hybrid Search: Combines vector similarity (semantic) + BM25 (keyword) using Reciprocal Rank Fusion (RRF)
- Token-based Chunking: Intelligent text splitting using actual token counts (not characters)
- Async Processing: Background job queue for non-blocking document indexing
- Rich Filtering: Domain, language, country, mobile device filters
- Multiple Search Modes: Hybrid, semantic-only, keyword-only, BM25-only
- Python 3.11+
- Docker & Docker Compose
- UV package manager
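
The token-based chunking named in the feature list can be illustrated with a minimal sketch. It is not the code in `app/utils/text_processing.py`: the function name `chunk_by_tokens`, the `max_tokens` and `overlap` defaults, and the whitespace tokenizer are all assumptions made for illustration. A real deployment would presumably count tokens with the embedding model's own tokenizer.

```python
from typing import Callable, List


def chunk_by_tokens(
    text: str,
    max_tokens: int = 256,
    overlap: int = 32,
    tokenize: Callable[[str], List[str]] = str.split,
) -> List[str]:
    """Split text into chunks of at most `max_tokens` tokens.

    Consecutive chunks share `overlap` tokens so that a sentence cut at a
    boundary still appears intact in one of the two chunks.
    The default tokenizer (whitespace split) is a stand-in, not a model tokenizer.
    """
    if max_tokens <= 0:
        raise ValueError("max_tokens must be positive")
    if not 0 <= overlap < max_tokens:
        raise ValueError("overlap must be in [0, max_tokens)")

    tokens = tokenize(text)
    step = max_tokens - overlap
    chunks: List[str] = []
    for start in range(0, len(tokens), step):
        window = tokens[start:start + max_tokens]
        # Joining with spaces only makes sense for the whitespace stand-in.
        chunks.append(" ".join(window))
        if start + max_tokens >= len(tokens):
            break  # the last window already reached the end of the text
    return chunks
```

Sizing chunks by token count rather than character count keeps each chunk within the embedding model's input limit regardless of how long the individual words are.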
# Clone repository
git clone https://github.com/jmagar/fc-bridge.git
cd fc-bridge
# Copy environment template
cp .env.example .env
# Edit .env with your configuration
# Install dependencies
make install
# Check port availability
make ports
# Start Docker services (Redis, Qdrant, TEI)
make services
# Run API server (in one terminal)
make dev
# Run background worker (in another terminal)
make worker

- POST /api/index - Queue document for indexing
- POST /api/search - Search indexed documents
- GET /health - Health check
- GET /api/stats - Index statistics

| Port | Service | Description |
|---|---|---|
| 52100 | search-bridge | FastAPI REST API |
| 52101 | redis | Redis Queue |
| 52102 | qdrant | Qdrant HTTP API |
| 52103 | qdrant | Qdrant gRPC API |
| 52104 | tei | HuggingFace TEI |
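
Before starting the services, it can help to confirm that the ports in the table above are not already taken. The sketch below is one way to do that from Python; it is not what `make ports` runs (that target's implementation is not shown here), and the names `SERVICE_PORTS` and `is_port_free` are hypothetical.

```python
import socket

# Port assignments copied from the table above.
SERVICE_PORTS = {
    52100: "search-bridge (FastAPI REST API)",
    52101: "redis (Redis Queue)",
    52102: "qdrant (HTTP API)",
    52103: "qdrant (gRPC API)",
    52104: "tei (HuggingFace TEI)",
}


def is_port_free(port: int, host: str = "127.0.0.1") -> bool:
    """Return True if a TCP socket can be bound to host:port right now."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((host, port))
        except OSError:
            return False  # something else already holds the port
        return True


if __name__ == "__main__":
    for port, service in SERVICE_PORTS.items():
        status = "free" if is_port_free(port) else "IN USE"
        print(f"{port}  {status:6}  {service}")
```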
Update Firecrawl's .env (on steamy-wsl):
ENABLE_SEARCH_INDEX=true
SEARCH_SERVICE_URL=http://<IP_OF_THIS_MACHINE>:52100
SEARCH_SERVICE_API_SECRET=your-secret-key
SEARCH_INDEX_SAMPLE_RATE=0.1

To find this machine's IP:
hostname -I | awk '{print $1}'

# Run tests
make test
# Format code
make format
# Lint code
make lint
# Type check
make type-check
# Run all checks
make check

fc-bridge/
├── app/
│ ├── main.py # FastAPI application
│ ├── config.py # Settings
│ ├── models.py # Pydantic schemas
│ ├── api/
│ │ ├── routes.py # API endpoints
│ │ └── dependencies.py # Shared dependencies
│ ├── services/
│ │ ├── embedding.py # HF TEI client
│ │ ├── vector_store.py # Qdrant client
│ │ ├── bm25_engine.py # BM25 indexing
│ │ ├── search.py # Hybrid search
│ │ └── indexing.py # Document processing
│ ├── utils/
│ │ └── text_processing.py # Token-based chunking
│ └── worker.py # Background worker
├── tests/
├── data/ # Docker volume mounts
├── docker-compose.yaml
├── pyproject.toml
└── README.md
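
Hybrid search merges the vector and BM25 result lists with Reciprocal Rank Fusion. A minimal sketch of RRF follows, assuming each engine returns document IDs ordered best-first. The function name `reciprocal_rank_fusion` is hypothetical, and `k = 60` is the constant commonly used with RRF, not necessarily the value `app/services/search.py` uses.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


def reciprocal_rank_fusion(
    rankings: Sequence[Sequence[str]],
    k: int = 60,
) -> List[Tuple[str, float]]:
    """Fuse several best-first rankings into one list of (doc_id, score).

    Each document earns 1 / (k + rank) from every ranking it appears in,
    with rank starting at 1. Documents absent from a ranking earn nothing
    from it.
    """
    scores: Dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties are broken by doc_id for determinism.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


if __name__ == "__main__":
    semantic_hits = ["doc_a", "doc_b", "doc_c"]  # illustrative IDs
    keyword_hits = ["doc_b", "doc_d", "doc_a"]
    for doc_id, score in reciprocal_rank_fusion([semantic_hits, keyword_hits]):
        print(f"{doc_id}: {score:.5f}")
```

Because RRF uses only rank positions, it needs no normalization between cosine similarities and BM25 scores, which live on different scales.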
Combines vector similarity + BM25 using RRF:
{
  "query": "machine learning",
  "mode": "hybrid",
  "limit": 10
}

Pure vector similarity:
{
  "query": "machine learning",
  "mode": "semantic"
}

Traditional keyword search:
{
  "query": "machine learning",
  "mode": "keyword"
}

# View service logs
make services-logs
# Check health
curl http://localhost:52100/health
# View stats
curl http://localhost:52100/api/stats

License: MIT
See FIRECRAWL_BRIDGE.md for complete implementation details.
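
For callers that prefer Python over curl, the search request bodies shown earlier can be sent to `POST /api/search` with the standard library alone. This is a sketch under assumptions: the helper names `build_search_request` and `search` are hypothetical, the response is assumed to be JSON, and no authentication header is sent because the header the service expects is not documented here.

```python
import json
import urllib.request
from typing import Any

BASE_URL = "http://localhost:52100"  # search-bridge port from the ports table


def build_search_request(
    query: str,
    mode: str = "hybrid",
    limit: int = 10,
    base_url: str = BASE_URL,
) -> urllib.request.Request:
    """Build (but do not send) a POST /api/search request."""
    payload = {"query": query, "mode": mode, "limit": limit}
    return urllib.request.Request(
        url=f"{base_url}/api/search",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def search(query: str, mode: str = "hybrid", limit: int = 10) -> Any:
    """Send the request and return the parsed body (assumed to be JSON)."""
    request = build_search_request(query, mode, limit)
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.loads(response.read().decode("utf-8"))


if __name__ == "__main__":
    # Requires the API server to be running (make dev).
    print(search("machine learning", mode="semantic"))
```

Separating request construction from sending keeps the payload easy to inspect or unit test without a running server.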