Transform natural language questions into production-ready SQL with ontological context and warm, human-centered design.
- Business teams wait on backlogs of custom SQL while analysts juggle endless report tweaks.
- Complex domains like mortgage analytics demand institutional knowledge that traditional BI tools cannot encode.
- Open data is abundant, but combining it with AI safely and accurately remains tedious.
- Ontology-first modeling captures relationships, risk logic, and business vocabulary once and reuses it everywhere.
- Adapter-based AI orchestration lets you swap Claude, Bedrock, Gemini, or local engines without touching the UI.
- Streamlit experience design bridges analysts and executives with curated prompts, cached schemas, and explainable results.
The reference app ships with 9M+ rows of Fannie Mae loan performance data. Ask the AI for "high-risk California loans under 620 credit score" and get DuckDB-ready SQL plus rich metrics at a glance.
- 🧠 110+ fields grouped into 15 ontology domains with risk heuristics baked into prompts.
- ⚡ CSV → Parquet pipeline with enforced types, 10× compression, and predicate pushdown via DuckDB.
- 🔒 Google OAuth guardrails with optional Cloudflare D1 logging.
- 🤖 Multi-provider AI adapters (Bedrock, Claude, Gemini) with graceful fallbacks and prompt caching.
```sql
SELECT LOAN_ID, STATE, CSCORE_B, OLTV, DTI, DLQ_STATUS, CURRENT_UPB
FROM data
WHERE STATE = 'CA'
  AND CSCORE_B < 620
  AND CSCORE_B IS NOT NULL
ORDER BY CSCORE_B ASC, OLTV DESC
LIMIT 20;
```
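To try the generated SQL locally, here is a minimal sketch using DuckDB's Python API; the Parquet glob path is an assumption for illustration, but the `data` view name matches the query above:

```python
import duckdb

# Expose the processed Parquet files under the view name `data`
# (the path is illustrative; adjust to your local layout).
con = duckdb.connect()
con.execute(
    "CREATE VIEW data AS SELECT * FROM read_parquet('data/processed/*.parquet')"
)

sql = """
SELECT LOAN_ID, STATE, CSCORE_B, OLTV, DTI, DLQ_STATUS, CURRENT_UPB
FROM data
WHERE STATE = 'CA' AND CSCORE_B < 620 AND CSCORE_B IS NOT NULL
ORDER BY CSCORE_B ASC, OLTV DESC
LIMIT 20;
"""
print(con.execute(sql).df())
```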
```
Streamlit UI (app.py)
├── Core orchestration (src/core.py)
│   ├── DuckDB execution
│   ├── Cached schema + ontology context
│   └── Data sync checks (scripts/sync_data.py)
└── AI service (src/ai_service.py)
    ├── Adapter registry (src/ai_engines/*)
    ├── Prompt construction with risk framework
    └── Clean SQL post-processing
```
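A minimal sketch of how the pieces in this tree could be stitched together; every helper name here is illustrative rather than the repo's actual API:

```python
import duckdb


def clean_sql(raw: str) -> str:
    """Drop markdown fence lines an LLM may wrap around generated SQL."""
    lines = [ln for ln in raw.strip().splitlines() if not ln.strip().startswith("```")]
    return "\n".join(lines).strip()


def answer_question(question: str, adapter, con: duckdb.DuckDBPyConnection):
    """Illustrative flow: schema context -> prompt -> adapter -> cleaned SQL -> DuckDB."""
    schema = con.execute("DESCRIBE data").df().to_string()  # stands in for the cached schema + ontology context
    prompt = f"Schema:\n{schema}\n\nQuestion: {question}"
    sql = clean_sql(adapter.generate_sql(prompt))  # adapter: any object exposing generate_sql()
    return sql, con.execute(sql).df()
```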
converSQL follows a clean, layered architecture designed for extensibility:
```
┌──────────────────────────────────────────────────────────────┐
│                      Application Layer                       │
│      (Streamlit UI • Query Builder • Ontology Explorer)      │
└─────────────────────────────┬────────────────────────────────┘
                              │
┌─────────────────────────────┴────────────────────────────────┐
│                       AI Engine Layer                        │
│    (Adapter Pattern: Bedrock • Claude • Gemini • Ollama)     │
└─────────────────────────────┬────────────────────────────────┘
                              │
┌─────────────────────────────┴────────────────────────────────┐
│                      Intelligence Layer                      │
│         (Ontology • Schema Context • Business Rules)         │
└─────────────────────────────┬────────────────────────────────┘
                              │
┌─────────────────────────────┴────────────────────────────────┐
│                          Data Layer                          │
│   (Parquet Files • DuckDB • R2 Storage • Query Execution)    │
└──────────────────────────────────────────────────────────────┘
```
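The adapter pattern in the AI Engine Layer is what lets providers swap without touching the UI. Here is a runnable sketch of that contract, assuming a simple `generate_sql()` interface; the project's real contracts live in `src/ai_engines/`, and `EchoAdapter` is a stand-in invented only to make this example run:

```python
from abc import ABC, abstractmethod


class AIEngineAdapter(ABC):
    """Common contract every provider adapter implements (illustrative)."""

    @abstractmethod
    def generate_sql(self, prompt: str) -> str:
        """Return SQL for a natural-language prompt."""

    @abstractmethod
    def is_available(self) -> bool:
        """Report whether credentials/connectivity allow this engine to run."""


class EchoAdapter(AIEngineAdapter):
    """Stand-in adapter so the sketch runs end to end."""

    def generate_sql(self, prompt: str) -> str:
        return "SELECT 1;"

    def is_available(self) -> bool:
        return True


def first_available(adapters: list[AIEngineAdapter]) -> AIEngineAdapter:
    """Graceful fallback: walk the registry until an engine responds."""
    for adapter in adapters:
        if adapter.is_available():
            return adapter
    raise RuntimeError("No AI engine is configured")


print(first_available([EchoAdapter()]).generate_sql("high-risk CA loans"))
```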
Our showcase implementation demonstrates a complete data engineering workflow:
- Ingestion: Fannie Mae's pipe-separated loan performance files
- Transformation: Schema enforcement with explicit data types (VARCHAR, Float, Int16, etc.)
- Storage: Parquet format with SNAPPY compression (10x size reduction)
- Performance: DuckDB for blazing-fast analytical queries
- Ontology: Structured metadata linking business concepts to database schema
📖 Learn more about the data pipeline →
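The transformation and storage steps above condense to a few lines of pandas. This sketch assumes a hypothetical pipe-separated extract containing just the columns shown; the real pipeline enforces the full 110+ field schema:

```python
import pandas as pd

# Illustrative subset of the enforced schema (the real pipeline covers 110+ fields).
DTYPES = {
    "LOAN_ID": "string",
    "STATE": "string",
    "CSCORE_B": "Int16",      # nullable integer: credit scores can be missing
    "OLTV": "Float32",
    "CURRENT_UPB": "Float64",
}

# Hypothetical source path; the raw files are pipe-separated without headers,
# so column names come from the schema definition.
df = pd.read_csv("data/raw/loans.psv", sep="|", names=list(DTYPES), dtype=DTYPES)

# SNAPPY-compressed Parquet is what yields the ~10x size reduction.
df.to_parquet("data/processed/loans.parquet", compression="snappy", index=False)
```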
| Token | Hex | Description |
| --- | --- | --- |
| `--color-background` | `#FAF6F0` | Ivory linen canvas across the app |
| `--color-background-alt` | `#FDFDFD` | Porcelain surfaces for cards and modals |
| `--color-text-primary` | `#3A3A3A` | Charcoal Plum headings |
| `--color-text-secondary` | `#7C6F64` | Warm Taupe body copy |
| `--color-accent-primary` | `#DDBEA9` | Soft Clay primary accent |
| `--color-accent-primary-darker` | `#B45F4D` | Terracotta hover and emphasis |
| `--color-border-light` | `#E4C590` | Gold Sand borders, dividers, and tags |
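In a Streamlit app, tokens like these are typically applied by injecting CSS custom properties. A minimal sketch follows; the exact selectors and injection point are assumptions, not the app's actual styling code:

```python
import streamlit as st

# Inject the design tokens as CSS custom properties (illustrative subset).
DESIGN_TOKENS = """
<style>
:root {
  --color-background: #FAF6F0;        /* Ivory linen */
  --color-text-primary: #3A3A3A;      /* Charcoal Plum */
  --color-accent-primary: #DDBEA9;    /* Soft Clay */
}
.stApp { background-color: var(--color-background); }
</style>
"""
st.markdown(DESIGN_TOKENS, unsafe_allow_html=True)
```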
- Install prerequisites

  ```bash
  git clone https://github.com/ravishan16/converSQL.git
  cd converSQL
  pip install -r requirements.txt
  ```

- Configure environment

  ```bash
  cp .env.example .env
  # Enable one AI block (CLAUDE_API_KEY, AWS_* for Bedrock, or GEMINI_API_KEY)
  # Provide Google OAuth or set ENABLE_AUTH=false for local dev
  ```

- Launch the app

  ```bash
  streamlit run app.py
  ```
- Architecture – layered design and component interactions.
- Data pipeline – ingest, transformation, and Parquet strategy.
- AI engines – adapter contracts and extension guides.
- Environment setup – required variables for auth, data, and providers.
- `make setup` – clean install + cache purge.
- `make test-unit` / `make test` – pytest with coverage that mirrors CI.
- `make format` and `make lint` – Black (120 cols), isort, flake8, mypy.
- Cached helpers such as `scan_parquet_files()` trigger `scripts/sync_data.py` when Parquet is missing; keep `data/processed/` warm during tests. A sketch of this helper follows.
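A sketch of how such a cached helper could behave, assuming `st.cache_data` and a subprocess call into the sync script; the names mirror the ones above but the body is illustrative:

```python
import glob
import subprocess

import streamlit as st


@st.cache_data
def scan_parquet_files(data_dir: str = "data/processed") -> list[str]:
    """Return processed Parquet files, re-syncing once if the cache is cold."""
    files = sorted(glob.glob(f"{data_dir}/*.parquet"))
    if not files:
        # Illustrative: pull data down before tests/queries run against it.
        subprocess.run(["python", "scripts/sync_data.py"], check=True)
        files = sorted(glob.glob(f"{data_dir}/*.parquet"))
    return files
```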
- Fork and branch: `git checkout -b feature/my-update`.
- Run formatting + tests before committing.
- Open a PR describing the change, any provider credentials needed (if applicable), and your test strategy.
See CONTRIBUTING.md for templates, AI adapter expectations, and review checklists.
- Financial services – credit risk, portfolio concentrations, regulatory stress tests.
- Healthcare – patient outcomes, clinical trial cohorts, claims analytics.
- E-commerce – customer segments, inventory velocity, supply chain exceptions.
- Any ontology-driven domain – define your schema metadata and let converSQL converse (see the sketch below).
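Here is what "define your schema metadata" can look like in practice. This is a hypothetical shape invented for illustration, not the project's actual ontology format:

```python
# Hypothetical ontology entry for a new domain: map business vocabulary
# and risk heuristics onto database columns so prompts can use them.
ONTOLOGY = {
    "credit_risk": {
        "description": "Borrower credit quality and risk heuristics",
        "fields": {
            "CSCORE_B": {"label": "Borrower credit score", "high_risk": "< 620"},
            "OLTV": {"label": "Original loan-to-value", "high_risk": "> 80"},
        },
    },
}
```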
- ✅ Multi-AI adapter support with prompt caching and fallbacks.
- ✅ Mortgage analytics reference implementation.
- 🚧 Ollama adapter and enhanced SQL validation.
- 🔮 Upcoming: multi-table joins, query explanations, historical learning, self-serve ontology editor.
Released under the MIT License.
- Fannie Mae for the Single Family Loan Performance dataset.
- The DuckDB, Streamlit, and Anthropic/AWS/Google teams for exceptional tooling.
- The converSQL community for ideas, issues, and adapters.
- ⭐ Star the repo to follow releases.
- 💬 Join discussions or open issues at github.com/ravishan16/converSQL/issues.
- 🎨 Share what you build; data should feel conversational.