Skip to content

v0.8.0 — MCP Server, Narrative Growth Patterns, 18 Domains, DetectionReport

Choose a tag to compare

@rasinmuhammed rasinmuhammed released this 10 May 11:25
· 96 commits to main since this release

What's new in 0.8.0

This is the biggest release since the realism engine in 0.5. Three major systems land together: a built-in MCP server so AI assistants can generate data on your behalf, a full narrative pattern engine that converts plain English into exact monthly data shapes, and 5 new industry domains bringing the total to 18.


MCP server — use Misata from Claude, Cursor, Windsurf

pip install "misata[mcp]"

Wire it into Claude Desktop — edit ~/Library/Application Support/Claude/claude_desktop_config.json:

{
  "mcpServers": {
    "misata": { "command": "misata-mcp" }
  }
}

Then just describe what you need:

  • "Generate a fintech fraud dataset with 10k customers and a 2% fraud rate."
  • "Show me the tables Misata would create for an HR system with 200 German employees."
  • "Build SaaS data — MRR from $50k in January to $200k in December, Q3 slump, Black Friday spike."

Five tools exposed over stdio:

Tool Purpose
list_domains List all 18 domains with sample stories
preview_story Dry-run detection — domain, scale, table layout, zero rows generated
inspect_schema Full schema: tables, columns, FK relationships, outcome curves
generate_dataset Generate CSV files on disk, return paths + 5-row previews
validate_yaml Two-layer structural + semantic validation of misata.yaml

All tools return {"ok": true/false, ...} — agents receive structured errors with actionable suggestions instead of Python tracebacks. Compatible with Claude Desktop, Cursor, Windsurf, Zed, and Continue.


Narrative growth patterns

Describe a business trajectory in plain English — Misata converts it to exact per-month targets and shapes generated data to match.

# Ecommerce with full seasonal story
misata.generate(
    "Ecommerce store with 10k customers — revenue from $200k in Jan to $350k in Sep, "
    "Black Friday spike, Christmas peak, Q1 slump after holidays",
    rows=10_000, seed=42
)

# SaaS hockey-stick
misata.generate(
    "SaaS startup — MRR $5k in January, 10x growth over the year, strong Q4",
    rows=2000, seed=42
)

# B2B with anchor + multiplier
misata.generate(
    "B2B SaaS — ARR $500k in Jan, doubled by December, summer slump",
    rows=1000, seed=42
)

Three pattern types, all composable in the same story:

Pattern Example phrase Effect
Monthly anchor "MRR $50k in January" Pins exact value for that month
Quarter modifier "Q4 spike" Oct / Nov / Dec boosted by 1.3×
Named event "Black Friday spike" November +1.55×
Multiplier "doubled by December" End value = 2× start
From–to "from $50k to $200k" Linear interpolation across 12 months

Named seasonal events: Black Friday (Nov ×1.55), Christmas (Dec ×1.40), holiday season (Dec ×1.35), summer slump (Jul+Aug ×0.75), back to school (Aug ×1.20), New Year (Jan ×1.25), tax season (Apr ×1.20), Valentine (Feb ×1.20).

Quarter modifiers: Q1 slump, Q2 flat, Q3 dip, Q4 spike, strong Q4, Q3 peak, and all synonyms (boom, surge, decline, crash, slow, push, flat).

Multipliers: doubled, tripled, 10x growth, halved, Nx notation, grew 300%.


preview() and DetectionReport

Inspect what Misata understood before generating a single row:

report = misata.preview(
    "A SaaS company with 5k users, MRR from $50k in Jan to $200k in Dec"
)

print(report.domain)             # "saas"
print(report.domain_confidence)  # "high"
print(report.matched_keywords)   # ["saas", "mrr"]
print(report.table_preview)
# [{"name": "users",         "rows": 5000,  "columns": 12},
#  {"name": "subscriptions", "rows": 5000,  "columns": 8},
#  {"name": "invoices",      "rows": 20000, "columns": 6}]

print(report.summary())
# ✓ Domain: saas  [high]  matched: saas, mrr
# ✓ Scale: users=5,000
# ✓ Events: 2 detected
#
#   Will generate 3 table(s), 30,000 total rows:
#     users          5,000 rows  (12 columns)
#     subscriptions  5,000 rows  (8 columns)
#     invoices      20,000 rows  (6 columns)

preview() is pure inspection — no generators, no rows, instant feedback. Call it before any large generate() to catch ambiguity early.


5 new domains (18 total)

New domain Trigger keywords Tables
crm crm, contacts, deals, pipeline companies, contacts, deals, activities
crypto crypto, blockchain, defi, wallet wallets, tokens, transactions, token_prices
insurance insurance, policy, claims, premium customers, policies, claims, payments
travel travel, hotel, flights, bookings users, hotels, flights, bookings, reviews
streaming streaming, netflix, subscribers subscribers, content, watch_history, ratings

All 18 domains have a full hardening suite: parse + validate, generate at two scales, FK integrity, YAML roundtrip, determinism, and rows=1 edge case.


Scored domain detection

Detection changed from first-match to scored:

  • +5 if the literal domain name appears in the story
  • +1 per matched keyword

"A fintech company with crypto wallets" correctly detects as fintech even though crypto and wallet are crypto keywords — the literal "fintech" earns +5 and wins. The near_misses field on DetectionReport shows which other domains also fired.


JSON Schema for misata.yaml

Add one line to misata.yaml for in-editor validation and auto-complete in VS Code, PyCharm, and any JSON Schema–aware editor:

# yaml-language-server: $schema=https://rasinmuhammed.github.io/misata/schema/misata_schema.json

tables:
  - name: users
    rows: 5000
    columns:
      - name: user_id
        type: int
        unique: true

misata init writes this header automatically. The schema covers all 18 domain names, all column types, all distribution names, and FK target resolution.


Other improvements

  • Locale-aware phone numbers — format-correct per country (+49 ### ####### for DE, +91-#####-##### for IN); auto-applied to any phone, mobile, or tel column
  • Temporal coherencedate_diff_to: "today" derives exact tenure years from hire date; max_date: "today" caps future-dated events; age ≥ 18 enforced at hire in HR domain
  • Name-derived emails — email columns adjacent to first_name + last_name adopt the person's name (jane.doe@acmecorp.com)
  • Actionable validation hints — every SchemaValidationError now includes a specific fix (exact probability delta to correct, exact relationship call to add, column list when a curve references a missing column)
  • 581 tests passing across Python 3.10, 3.11, and 3.12

Breaking changes

None. All existing generate(), parse(), and generate_from_schema() calls are fully compatible.


Upgrade

pip install --upgrade misata

# For MCP server support
pip install --upgrade "misata[mcp]"

Full changelog: https://github.com/rasinmuhammed/misata/blob/main/CHANGELOG.md
Documentation: https://rasinmuhammed.github.io/misata
PyPI: https://pypi.org/project/misata/0.8.0/