
ocelma/mega-djinn


Mega Djinn

An AI-powered data agent that enables natural-language querying over your entire Databricks lakehouse.

Simply ask a question in plain English.

The agent automatically discovers the right tables in Databricks Unity Catalog, enriches the context with business definitions and approved SQL patterns from Alation (when available), generates an optimized SQL query, and returns the results. No SQL knowledge required, and no more hunting for tables.


🧞✨ Explore Your Organization's Data

Ask a business question in plain English and get a detailed report back.

- Which articles kept readers on the page the longest last month?
- How has subscriber growth trended over the past 6 months?
- Which markets drove the biggest revenue increase in Q1 2026?
- Compare mobile vs desktop subs conversion rates for the last 90 days

No SQL knowledge required. No need to learn what a LEFT OUTER JOIN does!

Quick Install

Run this command in your terminal:

curl -fsSL https://raw.githubusercontent.com/ocelma/mega-djinn/main/install.sh | bash

The installer will:

  • clone the repo
  • install the Databricks CLI if not already present
  • ask for your Databricks endpoint
  • (optional) ask for your Alation token
  • create .env
  • create .venv
  • install the Python dependencies from requirements.txt

For the full step-by-step setup, use the manual install flow below.

How to Use It?

Once the Mega Djinn agent is installed and configured (see Quick Install above or Manual Setup below), open a terminal, cd into the repo, and launch Claude Code (or Codex, Gemini CLI, or Cursor):

cd /path/to/mega-djinn
claude

🧞✨ Your Mega Djinn is officially out of the bottle and ready to grant your data wishes!

The agent will find the right tables, show you the SQL it plans to run, ask for your confirmation, execute it, and save the results as an HTML report in reports/.

What the Agent Does

The Mega Djinn agent works its magic and makes your dreams (well, questions) come true!

  1. Takes a natural language question from a user (no SQL knowledge needed)
  2. Searches the local knowledge base (.ai/knowledge/) for verified queries and known pitfalls from teammates (no network cost)
  3. Retrieves relevant tables from Databricks Unity Catalog as well as approved queries, table metadata, and business glossary terms from Alation (if available)
  4. Generates the SQL with your preferred LLM, grounded in your org's conventions
  5. Executes the query on Databricks and returns the results, so you never need to open Databricks
  6. Saves results as a styled HTML report in ./reports/
  7. Saves the verified SQL (or failure notes) to .ai/knowledge/ so teammates benefit from it on their next query
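Step 2 can be pictured as ranking the files in .ai/knowledge/ by how many keywords they share with the question. The sketch below is illustrative only: it assumes entries are plain .md, .sql, or .txt files, and the helper name search_knowledge is hypothetical. The actual skill may organize and search the knowledge base differently.

```python
import re
from pathlib import Path


def _tokens(text: str) -> set[str]:
    # Lowercase word tokens of 3+ characters; very short words add little signal.
    return {t for t in re.findall(r"[a-z0-9_]+", text.lower()) if len(t) >= 3}


def search_knowledge(question: str, kb_dir: str = ".ai/knowledge", top_k: int = 3) -> list[tuple[str, int]]:
    """Rank knowledge files by how many question keywords they share (hypothetical helper)."""
    q = _tokens(question)
    scored = []
    for path in Path(kb_dir).rglob("*"):
        # Assumption: knowledge entries are plain-text files with these suffixes.
        if path.suffix not in {".md", ".sql", ".txt"} or not path.is_file():
            continue
        overlap = len(q & _tokens(path.read_text(encoding="utf-8", errors="ignore")))
        if overlap:
            scored.append((str(path), overlap))
    # Highest overlap first; ties broken by path for a stable order.
    scored.sort(key=lambda item: (-item[1], item[0]))
    return scored[:top_k]
```

Keyword overlap is cheap and needs no network, which matches the "no network cost" goal of this step; a real implementation might prefer embeddings or let the LLM read the matching files.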

Caution

Always verify that the SQL queries and results are correct. Free-will Djinns are sometimes evil or mischievous!

Agent Workflow

          User (natural language query)
                        ↓
                    Agent (LLM)
                        ↓ retrieves context from
    ┌───────────────────┼──────────────────────┐
    ↓                   ↓                      ↓
.ai/knowledge/   Databricks Unity Catalog   Alation (Optional)
(verified queries,  (schemas, tables/       (approved queries,
 known pitfalls)     column metadata)        glossary, lineage)
    └───────────────────┼──────────────────────┘
                        ↓
               Agent creates a Plan
                        ↓
     Agent generates SQL (user agrees to run the query)
                        ↓
           Databricks SQL execution
                        ↓
                     Results
                        ↓
          ┌─────────────┼─────────────┐
          ↓             ↓             ↓
        User    HTML in ./reports  .ai/knowledge/
                                  (saves verified SQL
                                   or failure notes)

Agent Integration with CLI Coding Agents

Claude Code

The project ships a Claude Code skill, two slash commands, project-level instructions, and a shared knowledge base:

Path                                Purpose
.claude/skills/mega-djinn/SKILL.md  Full workflow, tables, SQL rules, and safety guardrails; auto-loaded by Claude Code
.claude/commands/query.md           /query "<question>": runs the full plan → SQL → report workflow
.claude/commands/report.md          /report: regenerates the last result as a dated HTML report
CLAUDE.md                           Project-level invariants (confirm before execute, read-only, report format)
.ai/knowledge/                      Shared knowledge base (verified queries, table notes, known pitfalls); grows via git push/pull
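Two of the CLAUDE.md invariants are easy to picture in code: only read-only SQL is allowed, and nothing runs until the user confirms. Below is a minimal sketch of such a gate. The names is_read_only and confirm_and_run are hypothetical, and a keyword filter like this is a coarse safety net (it can reject harmless queries whose string literals contain words such as 'delete'); granting the querying identity read-only permissions in Databricks is the sturdier guarantee.

```python
import re

# Keywords for statements that modify data or schema; any match is refused.
_WRITE_KEYWORDS = re.compile(
    r"\b(insert|update|delete|merge|drop|create|alter|truncate|grant|revoke|copy|optimize|vacuum)\b",
    re.IGNORECASE,
)


def is_read_only(sql: str) -> bool:
    """Accept a single SELECT/WITH statement with no write keywords (coarse heuristic)."""
    # Remove -- line comments and /* */ block comments before inspecting.
    stripped = re.sub(r"--[^\n]*|/\*.*?\*/", " ", sql, flags=re.DOTALL).strip().rstrip(";")
    if ";" in stripped:  # more than one statement
        return False
    if not re.match(r"(select|with)\b", stripped, re.IGNORECASE):
        return False
    return not _WRITE_KEYWORDS.search(stripped)


def confirm_and_run(sql, run, ask=input):
    """Refuse non-read-only SQL, show the query, and call run(sql) only after an explicit 'y'."""
    if not is_read_only(sql):
        raise ValueError("Refusing to run a statement that is not read-only")
    print(sql)
    if ask("Run this query? [y/N] ").strip().lower() != "y":
        return None
    return run(sql)
```

Passing run and ask as parameters keeps the gate testable without a live warehouse or a terminal.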

Codex

Path       Purpose
AGENTS.md  Codex reads this file to learn what Mega Djinn can do

Gemini CLI

Path            Purpose
GEMINI.md       Project-level invariants for Gemini CLI (mirrors CLAUDE.md)
.gemini/skills  Symlink to .claude/skills; shares the same skill definitions as Claude Code

Cursor IDE

Path                              Purpose
.cursor/rules/mega-djinn.mdc      Always-on workspace rule; mirrors the skill for Cursor
.cursor/rules/python-scripts.mdc  Auto-attaches when editing scripts/*.py
.cursor/mcp.json                  Databricks MCP server config for Cursor
.cursorignore                     Excludes .venv/ and __pycache__/ from Cursor indexing

Manual Setup

If you did not run Quick Install, follow the steps below.

Prerequisites

  • Python 3.13+
  • Databricks CLI installed and authenticated. If it is not installed, run (via Homebrew):
brew tap databricks/tap
brew install databricks
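To check both prerequisites from Python, a small sketch like the one below works. It is not part of the repository and the name check_prerequisites is hypothetical: it only confirms that the interpreter is 3.13 or newer and that a databricks executable is on PATH; it does not verify that the CLI is authenticated.

```python
import shutil
import sys


def check_prerequisites(min_python: tuple[int, int] = (3, 13)) -> list[str]:
    """Return human-readable problems; an empty list means both checks passed."""
    problems = []
    if sys.version_info[:2] < min_python:
        problems.append(
            f"Python {min_python[0]}.{min_python[1]}+ required, found {sys.version.split()[0]}"
        )
    if shutil.which("databricks") is None:
        problems.append("Databricks CLI not found on PATH (brew install databricks)")
    return problems


if __name__ == "__main__":
    for problem in check_prerequisites():
        print("Missing:", problem)
```

Run it with the same interpreter you will use for .venv so the version check reflects the project's Python.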

Install

python -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt

Run the helper script through the project virtualenv so the installed SDK dependencies are used:

.venv/bin/python scripts/execute.py --help

Databricks CLI Auth (Mandatory)

Auth is handled via the existing [dev] or [prod] profile in ~/.databrickscfg, using OAuth with auto-refresh. Make sure the file exists:

test -f ~/.databrickscfg && echo "Databricks Config File Exists" || echo "Error. Not found"

It should contain a [dev] and/or [prod] profile like this:

# ~/.databrickscfg
[dev]
host      = https://your-org-dev.cloud.databricks.com
auth_type = databricks-cli

[prod]
host      = https://your-org-prod.cloud.databricks.com
auth_type = databricks-cli
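~/.databrickscfg is an INI-style file, so Python's standard configparser can list the profiles it defines. The sketch below (hypothetical helpers, not part of the repo) checks that the profile you plan to put in DATABRICKS_CONFIG_PROFILE exists and has a host.

```python
import configparser
from pathlib import Path


def profile_hosts(text: str) -> dict[str, str]:
    """Map each profile name in .databrickscfg-style text to its host (empty if unset)."""
    # interpolation=None: values may contain '%'. The custom default_section keeps a
    # [DEFAULT] profile from being merged into every other profile.
    parser = configparser.ConfigParser(interpolation=None, default_section="__unused_default__")
    parser.read_string(text)
    return {name: parser.get(name, "host", fallback="") for name in parser.sections()}


def check_profile(profile: str, cfg_path: Path = Path.home() / ".databrickscfg") -> str:
    """Return the host for `profile`, raising if the file or profile is missing."""
    if not cfg_path.is_file():
        raise FileNotFoundError(f"{cfg_path} not found; run `databricks auth login` first")
    hosts = profile_hosts(cfg_path.read_text(encoding="utf-8"))
    if not hosts.get(profile):
        raise KeyError(f"profile [{profile}] missing or has no host in {cfg_path}")
    return hosts[profile]
```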

The SDK reads DATABRICKS_CONFIG_PROFILE from .env and automatically refreshes the OAuth token using the cached refresh token in ~/.databricks/token-cache.json.

Alation Auth (optional, but recommended if available)

(Skip this step if Alation is not available; the agent will still work using Databricks Unity Catalog metadata).

If your organization uses Alation, connecting it gives the agent access to your company's accumulated data knowledge (business glossary definitions, approved SQL patterns, table documentation, and curated queries), so answers are grounded in your org's conventions rather than inferred from schema names alone. Access uses OAuth 2.0 client credentials (machine-to-machine), so no browser login or manual token renewal is required.

Caution

An Alation Admin (most likely this is NOT you!) must create an OAuth client application for the agent:

  1. The Alation Admin opens Settings (gear icon, top right) → Authentication → OAuth Client Applications → Add.
  2. Enter a name, set Access Token Duration in seconds (e.g. 3600 for 1 hour), and set User Role to Viewer.
  3. Click Save, then immediately copy the Client ID and Client Secret (the secret is shown only once) and add them to your .env (see the Config file (.env) section below).

scripts/execute.py fetches a JWT automatically on each run. No manual token refresh needed.
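The OAuth 2.0 client credentials grant (RFC 6749, section 4.4) is a single POST that exchanges the client ID and secret for a short-lived access token, which can be cached until it expires. The sketch below only builds that request with the standard library and caches the result. The token endpoint URL is deliberately left as a parameter because the exact Alation path is not given here, and build_token_request / TokenCache are hypothetical names, not what scripts/execute.py actually uses.

```python
import base64
import json
import time
import urllib.parse
import urllib.request


def build_token_request(token_url: str, client_id: str, client_secret: str) -> urllib.request.Request:
    """Build a client_credentials token request using HTTP Basic client authentication."""
    # RFC 6749 section 2.3.1: form-encode the ID and secret before Basic-encoding them.
    creds = f"{urllib.parse.quote_plus(client_id)}:{urllib.parse.quote_plus(client_secret)}"
    basic = base64.b64encode(creds.encode()).decode()
    body = urllib.parse.urlencode({"grant_type": "client_credentials"}).encode()
    return urllib.request.Request(
        token_url,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Basic {basic}",
            "Content-Type": "application/x-www-form-urlencoded",
        },
    )


class TokenCache:
    """Fetch a token once and reuse it until shortly before `expires_in` elapses."""

    def __init__(self, request: urllib.request.Request, opener=urllib.request.urlopen, skew: int = 60):
        self._request, self._opener, self._skew = request, opener, skew
        self._token, self._expires_at = None, 0.0

    def get(self) -> str:
        if self._token is None or time.time() >= self._expires_at - self._skew:
            with self._opener(self._request, timeout=30) as resp:
                payload = json.load(resp)
            self._token = payload["access_token"]
            # Assumption: fall back to 1 hour if the server omits expires_in.
            self._expires_at = time.time() + float(payload.get("expires_in", 3600))
        return self._token
```

The skew refreshes the token a minute early so a request does not start with a token that expires mid-flight; the injectable opener makes the cache testable without network access.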

Config file (.env)

Copy .env.example to a new file named .env and set the Databricks variables below. If you completed Alation Auth (optional) above, also set ALATION_BASE_URL, ALATION_CLIENT_ID, and ALATION_CLIENT_SECRET; otherwise leave them empty.

# Databricks Configuration
DATABRICKS_CONFIG_PROFILE=dev
DATABRICKS_AUTH_TYPE=databricks-cli
DATABRICKS_HOST=https://your-org-dev.cloud.databricks.com

# Optional: pin a specific SQL warehouse. If unset, execute.py auto-selects a running warehouse.
# DATABRICKS_WAREHOUSE=

# Alation Configuration. Optional (only if organization uses it)
ALATION_BASE_URL=https://your-org.alationcloud.com
ALATION_CLIENT_ID=
ALATION_CLIENT_SECRET=
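To read the same .env file from your own scripts without extra dependencies, a minimal parser might look like this. It is a sketch, not the project's loader (which may rely on a library such as python-dotenv), and parse_env / load_env are hypothetical names. It handles KEY=VALUE lines, blank lines, # comments, one pair of surrounding quotes, and values that contain '='; unlike a shell, it does not expand variables or handle export prefixes.

```python
from pathlib import Path


def parse_env(text: str) -> dict[str, str]:
    """Parse KEY=VALUE lines; skip blanks and # comments; keep '=' inside values."""
    env = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        value = value.strip()
        # Drop one pair of matching surrounding quotes, if present.
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "\"'":
            value = value[1:-1]
        env[key.strip()] = value
    return env


def load_env(path: str = ".env") -> dict[str, str]:
    """Return the parsed file, or an empty dict if it does not exist."""
    p = Path(path)
    return parse_env(p.read_text(encoding="utf-8")) if p.is_file() else {}
```

Note that a commented-out key such as # DATABRICKS_WAREHOUSE= is ignored entirely, while an uncommented empty key such as ALATION_CLIENT_ID= is kept as an empty string.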

When you run .venv/bin/python scripts/execute.py --sql, the helper auto-selects a running SQL warehouse; set DATABRICKS_WAREHOUSE in .env to pin a specific one. The helper defaults to DATABRICKS_AUTH_TYPE=databricks-cli, so it authenticates directly with the CLI profile named in DATABRICKS_CONFIG_PROFILE instead of relying on automatic host metadata resolution.

Caution

Never commit .env to git!

To re-authenticate with Databricks once the refresh token expires, run:

MY_PROFILE=$(grep '^DATABRICKS_CONFIG_PROFILE=' .env | cut -d '=' -f2-)
databricks auth login --profile "$MY_PROFILE"

Caution

To use Mega Djinn, you must be logged in to Databricks (see the command above).

Optional: Databricks MCP server (AI Dev Kit)

Databricks AI Dev Kit ships databricks-mcp-server, which exposes Databricks actions (SQL, table metadata, etc.) to Claude Code, Codex, or Cursor IDE. This repository already includes a root .mcp.json that starts that server using your DATABRICKS_CONFIG_PROFILE profile (same as Databricks CLI Auth).

You do not need MCP to use Mega Djinn: .venv/bin/python scripts/execute.py --sql talks to Databricks directly using your Databricks CLI profile. MCP is only for agents running inside your editor.

Prerequisites

  • Finish Databricks CLI Auth (databricks auth login --profile $MY_PROFILE).
  • Install uv if you do not have it; the AI Dev Kit installer uses it to create the MCP Python environment.

Install Databricks MCP server (step by step)

  1. Run the official installer (macOS / Linux; can be run from any directory):

    curl -fsSL https://raw.githubusercontent.com/databricks-solutions/ai-dev-kit/main/install.sh | bash
  2. Answer the prompts so that MCP gets installed. In particular:

    • Enable MCP when asked which components to install.
    • Select Cursor (and any other editors you use) if you want the kit to drop helper config into those tools.
    • When asked for the MCP server install path, accept the default ~/.ai-dev-kit unless you want to use a different location. This repo’s .mcp.json assumes that default (see step 4 if you change it).
  3. Wait for the script to finish. It will create:

    • ~/.ai-dev-kit/.venv/bin/python: interpreter for the MCP server
    • ~/.ai-dev-kit/repo/databricks-mcp-server/run_server.py: Databricks MCP entrypoint
  4. Open .mcp.json and verify the paths point to $HOME/.ai-dev-kit/ (the default install location), or update them to the custom path where you installed it.

Verify paths exist (optional):

test -x "$HOME/.ai-dev-kit/.venv/bin/python" \
  && test -f "$HOME/.ai-dev-kit/repo/databricks-mcp-server/run_server.py" \
  && echo "MCP runtime OK"

Windows: the committed .mcp.json uses /bin/sh and Unix-style paths. On native Windows, configure the Databricks MCP server in your editor with explicit paths to your Python interpreter and run_server.py, or use WSL and run the steps above inside WSL.

