Add full docs/ tree #213

Merged
jogrogan merged 24 commits into main from docs/restructure
Apr 28, 2026

@jogrogan (Collaborator) commented Apr 27, 2026

Summary

Provide comprehensive documentation for an open-source audience.
The README becomes a slim landing page; detailed content moves into a journey-based docs/ tree.

Motivation

The previous README.md was ~230 lines of mixed elevator-pitch + quickstart + Kafka/Flink runbooks + extension API. There was no place that explained concepts to a newcomer, no CRD reference, no extension guide. The two LinkedIn engineering blog posts position Hoptimator as a "control plane for data planes" — that framing wasn't reflected anywhere in the repo.

New SQL syntax wasn't documented, nor was there a clear path for extending this repo.

What's new

docs/
├── index.md -- top-level landing
├── getting-started/
│ ├── index.md
│ ├── quickstart.md -- 5-min walkthrough on Docker Desktop
│ ├── concepts.md -- vocabulary reference
│ └── architecture.md -- life of a SQL statement, module map
├── user-guide/
│ ├── index.md
│ ├── sql-cli.md -- ./hoptimator + custom commands
│ ├── jdbc.md -- driver-level connection properties
│ ├── mcp-server.md -- MCP tool surface, agent workflow
│ ├── ddl-reference.md -- CREATE/DROP/PAUSE/RESUME, partial views, WITH syntax
│ └── hints.md -- template vs connector hints
├── kubernetes/
│ ├── index.md
│ ├── operator.md -- controllers, RBAC, lifecycle
│ ├── crd-reference.md -- field-by-field for all 10 CRDs
│ ├── templates.md -- TableTemplate / JobTemplate authoring
│ ├── triggers.md -- TableTrigger operational guide
│ └── configuration.md -- configmap, ConfigProvider, k8s.* properties
├── extending/
│ ├── index.md
│ ├── data-sources.md -- JDBC adapter + Database CRD
│ ├── deployers.md -- Deployer SPI
│ ├── validators.md -- Validator SPI
│ └── config-providers.md -- ConfigProvider SPI
└── resources/
  └── learn-more.md -- engineering blog posts

Plus:

  • CLAUDE.md — context file for Claude Code agents working in this repo.

jogrogan added 24 commits April 27, 2026 14:54
Restructures documentation for an open-source audience. The README is now
a slim landing page (project framing, why-Hoptimator, quickstart pointer,
status, license) instead of a mixed dev-guide. Detailed content moves into
a journey-based docs/ tree modeled on linkedin/venice's structure:

  docs/
    index.md                    -- top-level landing
    getting-started/
      index.md                  -- section landing
      quickstart.md             -- 5-minute walkthrough on Docker Desktop
      concepts.md               -- vocabulary reference
      architecture.md           -- life of a SQL statement, module map
    resources/
      learn-more.md             -- engineering blog posts and case studies

Also cleans up CONTRIBUTING.md: removes placeholder "(link to more info)"
URLs and adds a how-to-file-an-issue / how-to-send-a-PR section.

Phase 2 (user guide) and beyond will follow on the same branch.

Adds the User guide section to docs/. Covers the three client interfaces
(SQL CLI, JDBC, MCP) and the two reference pages (DDL, Hints):

  docs/user-guide/
    index.md            -- section landing
    sql-cli.md          -- ./hoptimator script, sqlline + custom commands
                            (!intro, !resolve, !pipeline, !specify)
    jdbc.md             -- jdbc:hoptimator:// URL format, full connection-
                            property reference, Java example, system tables
    mcp-server.md       -- MCP tool surface (discovery / planning /
                            execution), recommended agent workflow
    ddl-reference.md    -- CREATE/DROP for views/materialized views/triggers
                            /functions/tables, PAUSE/RESUME/REFRESH/FIRE,
                            identifier rules, k8s system schema, what isn't
                            supported
    hints.md            -- template hints vs connector hints, where to set
                            them, how to read what was applied

Also promotes the docs/index.md "User guide" entry from "coming soon" to
linked, and updates the README's Documentation section to match.

Two corrections after a closer look at what the executor actually handles:

DDL reference:
- Drops CREATE FUNCTION as a documented form. The grammar accepts it,
  but no executor handler exists.
- Drops the standalone REFRESH/FIRE section and the "PAUSE/RESUME also
  works for materialized views" line. None of these have executor support.
- Adds a "Reserved syntax" section that lists what parses but does not
  execute today (REFRESH MATERIALIZED VIEW, FIRE *, PAUSE/RESUME
  MATERIALIZED VIEW, CREATE FUNCTION) so readers don't get a false
  positive from a successful parse.

MCP server:
- Rewrites the `query` description to explain why it's restricted to
  ADS / PROFILE / METADATA / K8S: not a safety allowlist, but the only
  schemas Hoptimator can answer queries from without a configured engine.
  The Zeppelin POC for notebook-style execution is acknowledged as
  incomplete.
- Updates the limitations bullet accordingly so it doesn't suggest
  "just widen the allowlist" as a fix.

Six corrections from review:

CONTRIBUTING:
- Add a "cover your changes" step. Document `make coverage` and an 80%
  line-coverage target on changed code (CI currently enforces a softer
  60% / 40%).

README:
- Badge: add explicit `&label=CI` so the build status renders as "CI"
  instead of the default "build".
- Drop the link to GitHub Packages. The page is empty in practice; only
  JFrog has artifacts today.
- Soften the "That one statement becomes:" list. The exact resources
  emitted depend on the registered Databases and templates, not on
  Hoptimator. Frame it as "with a typical Kafka + Flink setup" and add
  a paragraph explaining that the same SQL can target a different stack
  by swapping templates.
- Replace the "Kubernetes-native" why-bullet with a more honest
  "Kubernetes out of the box, not as a hard requirement" — the bundled
  deployers target K8s, but `Deployer` is the actual extension point.

Quickstart:
- Add a note on `CREATE OR REPLACE MATERIALIZED VIEW`. Without it,
  re-running a CREATE for an existing view fails; with it, the
  development loop is much faster.

Concepts (engine clarification):
- Split "Engines and connectors" into separate "Connectors" and
  "Engines (optional)" sections. Connectors do not require an Engine
  to function — Hoptimator emits YAML (e.g. a FlinkSessionJob) and an
  unrelated operator (Flink Kubernetes Operator, etc.) runs it.
- The Engine CRD is specifically about *query* execution (e.g. running
  SELECT against tables that need a runtime). Pipeline materialization
  does not need one. Mark the Engine path as partially developed today.
- Strengthen the Deployers section to lead with "Kubernetes is the
  default, not a hard requirement."
- Rename the trailing "Engines today" section to "Bundled adapters and
  runtimes" and reframe accordingly.

Architecture:
- Step 4 (Deploy) now leads with "Kubernetes is the path of least
  resistance, not a hard requirement" and notes that the implementation
  resources Hoptimator emits aren't run by Hoptimator — the relevant
  operator (Strimzi, Flink Kubernetes Operator, etc.) runs them.
- Reword the "A new engine" extension bullet so it doesn't conflate
  pipeline runtimes with the Engine CRD's query path.

JDBC user guide:
- Drop the GitHub Packages link from the dependency section to match
  the README change.

… MCP DDL)

Four corrections from review.

DDL reference:
- Add a "Partial views (multiple pipelines into one sink)" subsection
  under CREATE MATERIALIZED VIEW. Explains the `$<suffix>` syntax,
  shows the multi-writer pattern with two views feeding the same
  VENICE.AUDIENCE sink, and recommends partial views as the default
  for production cases. Cross-link from the CREATE MATERIALIZED VIEW
  bullet list.
- Rewrite CREATE TABLE. The previous text underplayed it — CREATE
  TABLE goes through the Deployer SPI to actually provision real
  infrastructure (e.g. creating a Kafka topic via the Kafka deployer
  rather than a separate Strimzi manifest). Show the example of
  declaring a Kafka topic with partitions and then using a partial
  view to write to it. Note that AS <query> isn't supported today.
- Trigger section: add one-line framing about what triggers enable
  (backfills, rETL refreshes, downstream notifications, ops hooks)
  and link to the concepts page for the bigger picture.

Concepts:
- Expand the TableTrigger section. Lead with what triggers actually
  let you express (backfills tied to offline-tier arrivals, rETL on
  cron, downstream notifications, operational hooks). Explain the
  status-patch mechanism that fires triggers and how that makes them
  composable with whatever already owns the upstream system. Note
  that triggers can be auto-generated from TableTemplates so adapters
  can ship sensible defaults.
- Close with the design summary: "pipelines stay pure data-flow
  expressions, triggers carry the imperative side effects, and the
  two compose at the table level."

MCP server:
- Add a limitations bullet flagging that `modify` only accepts
  CREATE [OR REPLACE] MATERIALIZED VIEW and DROP today, not the full
  Hoptimator DDL surface. Triggers, plain views, tables, and the
  inspection-only DDL still need the JDBC driver or the SQL CLI.

Plus an intentional in-flight edit to docs/user-guide/sql-cli.md:
- Replace placeholder `MY.AUDIENCE` examples with `ADS.AUDIENCE`
  (matches the demo's registered schemas) and the elided `!pipeline`
  / `!specify` outputs with the actual ones produced by the demo.

The "Kubernetes-native control plane for multi-hop data pipelines" framing
oversold the K8s coupling and undersold the SQL-first part. Kubernetes is
the default deployer, not the differentiator — Hoptimator's job is to
compile SQL into multi-system pipelines, with the runtime substrate
pluggable underneath.

New phrasing:

- README h3:    "A SQL control plane for multi-system data pipelines"
- README intro: "Hoptimator turns SQL into running, multi-hop data
                 pipelines that span Kafka, Flink, Venice, and anything
                 else you plug in."
- docs/index:   "Hoptimator is a SQL control plane for multi-system data
                 pipelines. You write SQL; it figures out the topology
                 across Kafka, Flink, Venice, and whatever else you plug
                 in, generates the specs, deploys them, and reconciles
                 them."

This keeps the "control plane" framing (planner-not-runtime) while
removing the K8s lock-in suggestion and naming the actual systems
Hoptimator spans up front.

The previous section described LogicalTables in six lines, framed mostly
around the YAML shape. That undersells what they actually do — the
abstraction is the point, not the driver.

Rewrites the "Logical tables" section in concepts.md to cover:

- The abstraction value (one named entity, N physical backends; collapses
  the typical mess of three names + hand-built sync jobs into a single
  declaration).
- The tier model (nearline / online / offline) with a table mapping each
  tier to typical backends and the role it plays.
- What you get for free at deploy time: physical tier resources via the
  Deployer SPI, implicit inter-tier sync pipelines (nearline → online,
  nearline → offline), auto-backfill triggers when offline is bound, and
  one schema source-of-truth resolved from nearline.
- Why this matters as an abstraction: tier-agnostic application code,
  the right topology being the cheap path, and clean composition with
  partial views / materialized views.
- The classic use case (lambda / kappa for feature stores) so readers
  recognize the pattern.
- An explicit note that LogicalTables ship as a JDBC driver today but
  function as an abstraction model — the deployer does the heavy lifting
  at create time, not the driver at query time.

Also updates the at-a-glance table entry from "A single logical entity
that spans multiple physical storage tiers" to lead with "An abstraction
model" and call out the auto-sync/auto-backfill behaviors.

Hoptimator's parser uses Calcite's `'key' 'value'` form (whitespace, no
`=`) inside WITH clauses, not the `'key'='value'` form. The `=` form is
Flink's syntax and only shows up in auto-generated pipeline output.

Fixes the four user-facing DDL signatures and the example:

- CREATE MATERIALIZED VIEW
- CREATE TRIGGER
- CREATE TABLE (both signature and example)
- CREATE TABLE example: `'kafka.partitions' '8'`

Adds a "WITH options syntax" section explicitly noting the difference and
calling out that the `=` form readers see in `!pipeline` / `!specify`
output is the Flink engine's syntax, not Hoptimator's input grammar.

The `=` instances elsewhere in the docs (auto-generated Flink SQL in
sql-cli.md output blocks, the truncated pipeline SQL in quickstart.md)
are correct as-is and were left alone.
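To make the two shapes concrete, here is a small Java sketch that renders the same option map both ways. `WithOptions` and its methods are hypothetical helpers written for this illustration; they are not part of Hoptimator or Calcite. The surrounding `WITH (...)` parentheses and the comma separator are also assumptions. Only the two quoting shapes and the `'kafka.partitions' '8'` example come from the commit message above.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;

/** Hypothetical helper: renders WITH options in the two shapes described above. */
final class WithOptions {

  /** Hoptimator input form (Calcite-style): 'key' 'value', separated by whitespace. */
  static String hoptimatorForm(Map<String, String> options) {
    return render(options, " ");
  }

  /** Flink form, as it appears in auto-generated pipeline output: 'key'='value'. */
  static String flinkForm(Map<String, String> options) {
    return render(options, "=");
  }

  // Assumption: options are comma-separated inside WITH ( ... ).
  private static String render(Map<String, String> options, String separator) {
    return options.entrySet().stream()
        .map(e -> quote(e.getKey()) + separator + quote(e.getValue()))
        .collect(Collectors.joining(", ", "WITH (", ")"));
  }

  /** Single-quotes a literal, doubling any embedded single quote. */
  private static String quote(String s) {
    return "'" + s.replace("'", "''") + "'";
  }

  public static void main(String[] args) {
    Map<String, String> options = new LinkedHashMap<>();
    options.put("kafka.partitions", "8");
    System.out.println(hoptimatorForm(options)); // WITH ('kafka.partitions' '8')
    System.out.println(flinkForm(options));      // WITH ('kafka.partitions'='8')
  }
}
```

The first line is the shape a user would type in DDL; the second is the shape that can show up in `!pipeline` / `!specify` output.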
Adds the Kubernetes section to docs/. Five new pages covering everything
needed to operate Hoptimator on a cluster:

  docs/kubernetes/
    index.md            -- section landing
    operator.md         -- what hoptimator-operator does, controllers it
                            runs (PipelineReconciler, TableTriggerReconciler,
                            ViewReconciler), how to deploy it, RBAC,
                            namespace scoping, lifecycle of a pipeline,
                            when not to run the operator
    crd-reference.md    -- field-by-field for all 10 CRDs (Database, View,
                            Pipeline, TableTemplate, JobTemplate,
                            TableTrigger, Subscription, LogicalTable,
                            Engine, SqlJob) with spec/status/printer-column
                            tables and one example per kind
    templates.md        -- TableTemplate / JobTemplate authoring deep
                            dive: matching rules (databases, methods),
                            full placeholder syntax (subst, defaults,
                            conditionals, transforms, multiline), the
                            default placeholders K8sSourceDeployer and
                            K8sJobDeployer inject, where hint and
                            configmap values fit, common patterns
    triggers.md         -- TableTrigger operational guide: cron vs
                            status-driven firing, pause/resume,
                            jobProperties, common patterns (offline-tier
                            backfill, rETL, downstream notification, ops
                            hooks), when not to use a trigger
    configuration.md    -- hoptimator-configmap, ConfigProvider SPI,
                            three-source precedence (system properties <
                            configmap < hints), file-like keys and lazy
                            expansion, pod-namespace detection, writing
                            a custom ConfigProvider

Also promotes the docs/index.md and README.md "Kubernetes guide
(coming soon)" entries to live links.
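The three-source precedence listed for configuration.md (system properties < configmap < hints) can be pictured as a layered merge in which later sources overwrite earlier ones. The sketch below is an illustration under that assumption only: `ConfigPrecedence` and the `flink.parallelism` / `region` keys are hypothetical, and this is not the ConfigProvider SPI itself.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical illustration of layered configuration; not Hoptimator's ConfigProvider API. */
final class ConfigPrecedence {

  /** Merges sources from lowest to highest precedence; later maps overwrite earlier ones. */
  @SafeVarargs
  static Map<String, String> merge(Map<String, String>... lowestToHighest) {
    Map<String, String> merged = new LinkedHashMap<>();
    for (Map<String, String> source : lowestToHighest) {
      merged.putAll(source);
    }
    return merged;
  }

  public static void main(String[] args) {
    // Hypothetical keys and values, chosen only to show the override order.
    Map<String, String> systemProps = Map.of("flink.parallelism", "1", "region", "local");
    Map<String, String> configMap = Map.of("flink.parallelism", "2");
    Map<String, String> hints = Map.of("flink.parallelism", "4");

    // system properties < configmap < hints
    Map<String, String> effective = merge(systemProps, configMap, hints);
    System.out.println(effective.get("flink.parallelism")); // 4 (the hint wins)
    System.out.println(effective.get("region"));            // local (only set in system properties)
  }
}
```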
- SqlJob: reframe as a primitive consumed by an external SqlJob
  operator that deploys Flink and Flink-Beam SQL jobs. Drop the
  "useful when a job doesn't fit CREATE MATERIALIZED VIEW" framing,
  which conflated SqlJob with materialized-view tooling.
- operator.md: drop the "does not yet emit Kubernetes events" line.
  Confirmed by grep that no events are emitted, but the term is
  jargon that doesn't help an open-source reader without a side
  explanation. The remaining "logs are the primary debugging surface"
  carries the practical guidance.

The full `k8s.*` connection-property table in the JDBC user guide
contradicted the "Kubernetes is the default deployer, not a hard
requirement" framing established elsewhere — those properties are
deployer-specific, not driver-specific.

Consolidates so the table lives in one place:

- jdbc.md: replaces the "Kubernetes context" subsection with a short
  "Deployer-specific properties" note pointing at the Kubernetes guide,
  and explicitly calls out that a different deployer would expose its
  own `<deployer>.*` properties.
- kubernetes/configuration.md: takes the full table (now with the
  Default column merged in from the jdbc.md version), replaces the
  pointer to jdbc.md with one that names the driver-level surface
  (catalogs, hints, fun) so readers know what each page covers.

The table content is unchanged; this is a re-homing edit.

The Java `V1alpha1*` model classes under hoptimator-k8s are generated
from the CRD YAMLs by `make generate-models` (which shells out to the
upstream Kubernetes Java client's `crd-model-gen` Docker image). Without
that callout, contributors who add a CRD field have no obvious way to
discover that they need to regenerate.

Adds the reference in the two natural places:

- CONTRIBUTING.md: new step in the PR checklist ("Regenerate Java models
  if you touched a CRD"), with the command, the Docker requirement, and
  a pointer to the upstream tool.
- docs/kubernetes/crd-reference.md: a callout block near the top of the
  page so it's visible to anyone landing on the CRD reference while
  modifying one.

The previous text described `{{var==value}}` as an inline conditional
that emits a block of the template when the condition matches. That's
wrong. Reading Template.java's `render` method:

  - On a `==` / `!=` marker whose condition is satisfied, the marker
    is erased and rendering continues normally.
  - On a marker whose condition is *not* satisfied, the renderer
    `return null;`s the entire template — it produces nothing.

So the markers are template-level guards, not inline blocks. The
real-world pattern is "two templates with mirrored guards; whichever
condition matches is the one that fires." The bundled flink-template
uses this to swap between SQL-job and Beam-job entry classes — there
isn't an `{{end}}` companion marker (which the docs invented) and the
syntax doesn't conditionally include a single line.

Fixes:

- Placeholder syntax table: drop the imagined `{{end}}` companion;
  reword the `==` / `!=` rows as "template-level guard: render this
  template only if X; otherwise skip the whole template."
- Drop `{{var toName}}` from the table — it's documented in
  Template.java's javadoc but not implemented in `applyTransform()`,
  which only handles `toLowerCase`, `toUpperCase`, and `concat`.
  Documenting it would invite reports of a "broken" feature.
- Replace the "Conditional rendering example" section with one that
  shows the actual pattern: two JobTemplates with mirroring `==`
  guards on `flink.app.type`, only one of which renders for a given
  pipeline.
- Rename the section "Conditional templates" so the heading no longer
  implies block-level conditionals.
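The guard semantics described above (a satisfied marker is erased, an unsatisfied marker makes the whole template render to nothing) can be modeled in a few lines of Java. This is a simplified sketch, not the real Template.java: the class name `GuardedTemplate`, the regular expressions, the `SQL` / `BEAM` values, the `com.example` entry classes, and the treatment of unset variables are all assumptions made for illustration.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Simplified model of template-level guards; not the real Template.java. */
final class GuardedTemplate {
  // {{name==value}} or {{name!=value}}
  private static final Pattern GUARD =
      Pattern.compile("\\{\\{(\\w[\\w.]*)(==|!=)([^}]*)\\}\\}");
  // {{name}}
  private static final Pattern VAR = Pattern.compile("\\{\\{(\\w[\\w.]*)\\}\\}");

  /** Returns the rendered text, or null if any guard fails (the whole template is skipped). */
  static String render(String template, Map<String, String> env) {
    Matcher guards = GUARD.matcher(template);
    while (guards.find()) {
      boolean equal = guards.group(3).equals(env.get(guards.group(1)));
      boolean wantEqual = guards.group(2).equals("==");
      if (equal != wantEqual) {
        return null; // unsatisfied guard: produce nothing at all
      }
    }
    // All guards satisfied: erase the markers, then substitute plain variables.
    String withoutGuards = GUARD.matcher(template).replaceAll("");
    Matcher vars = VAR.matcher(withoutGuards);
    StringBuffer out = new StringBuffer();
    while (vars.find()) {
      // Assumption: an unset variable renders as the empty string.
      String value = env.getOrDefault(vars.group(1), "");
      vars.appendReplacement(out, Matcher.quoteReplacement(value));
    }
    vars.appendTail(out);
    return out.toString();
  }

  public static void main(String[] args) {
    // Two templates with mirrored guards; only one renders for a given pipeline.
    String sqlJob = "{{flink.app.type==SQL}}entryClass: com.example.SqlRunner # {{name}}";
    String beamJob = "{{flink.app.type==BEAM}}entryClass: com.example.BeamRunner # {{name}}";
    Map<String, String> env = Map.of("flink.app.type", "SQL", "name", "my-pipeline");

    System.out.println(render(sqlJob, env));  // entryClass: com.example.SqlRunner # my-pipeline
    System.out.println(render(beamJob, env)); // null
  }
}
```

Because a failed guard discards the entire template, there is nothing for a closing marker to delimit, which is consistent with the absence of an `{{end}}` companion.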
Adds the Extending section. Five pages covering the SPI surfaces a
contributor would touch when integrating a new system or customizing
behavior end-to-end:

  docs/extending/
    index.md              -- section landing with a "pick the right
                              surface" decision table; explains the
                              ServiceLoader-based loading pattern and
                              the META-INF/services file layout
    data-sources.md       -- JDBC adapter walkthrough (DriverVersion,
                              Schema, register()), Database CRD wiring,
                              TableTemplate-only path for declarative
                              integrations, when to reach for a custom
                              Deployer instead, the connector-only
                              pattern for pre-existing infrastructure,
                              when ConnectorProvider applies
    deployers.md          -- Deployer interface + lifecycle (create /
                              update / delete / specify / restore),
                              DeployerProvider with priority semantics,
                              KafkaDeployerProvider as a concrete shape
                              to copy, opt-in Validation, testing
                              surface, common pitfalls (missing restore,
                              side effects in specify, wrong priority,
                              wrong exception types)
    validators.md         -- the three points validation runs
                              (parsed SQL, resolved object, deployer
                              collection), Issues tree API with severity
                              levels, the two participation patterns
                              (Validated on your own type vs.
                              ValidatorProvider for cross-cutting
                              policy), built-in providers as references,
                              authoring patterns and testing
    config-providers.md   -- ConfigProvider SPI mechanics, when to write
                              one (vault, central config service,
                              dynamic per-connection values), example
                              sketch, interaction with hints/JDBC props,
                              caveats (latency on every connection,
                              tolerated errors, no SPI ordering control)

Also promotes the docs/index.md and README.md "Extending Hoptimator
(coming soon)" entries to live links.
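For readers unfamiliar with the ServiceLoader-based loading pattern and the META-INF/services file layout mentioned for extending/index.md, the sketch below shows the standard JDK mechanism in isolation. `GreeterProvider` and `SpiDiscovery` are hypothetical names invented for this example; they are not Hoptimator interfaces, and the real SPIs (Deployer, Validator, ConfigProvider) have their own method signatures.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.ServiceLoader;

/** Hypothetical SPI used only for this illustration; not a Hoptimator interface. */
interface GreeterProvider {
  String greet(String name);
}

final class SpiDiscovery {

  /**
   * Finds every implementation registered on the classpath. An implementation
   * is registered by listing its fully qualified class name in a resource file
   * named META-INF/services/<fully.qualified.InterfaceName>.
   */
  static <T> List<T> discover(Class<T> spi) {
    List<T> found = new ArrayList<>();
    for (T provider : ServiceLoader.load(spi)) {
      found.add(provider);
    }
    return found;
  }

  public static void main(String[] args) {
    List<GreeterProvider> providers = discover(GreeterProvider.class);
    // With no META-INF/services entry on the classpath, this list is empty.
    System.out.println("providers found: " + providers.size());
  }
}
```

Adding a jar that contains both an implementation class and the matching META-INF/services file is enough for `discover` to return it; no registration call is needed in the host application.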
Lands a top-level CLAUDE.md so Claude Code agents pick up the
high-leverage context for this repo automatically — what the project
is, where the docs live, common commands, the active module layout
(plus the marked-for-deletion list), the gotchas that aren't obvious
from reading code, the patterns to prefer, distilled testing rules,
and a "keep docs in sync" reminder.

Sections:

- What this repo is — three roles (planner/adapter/operator);
  Kubernetes is default-not-required; Deployer SPI is the actual
  extension point.
- Read these first — pointers into docs/ rather than duplicated
  content.
- Common commands — make build/test/integration-tests/coverage/
  generate-models/deploy*; ./hoptimator; ./start-mcp-server. Calls
  out that generate-models is required after any CRD edit.
- Module layout — active modules with one-line purpose; explicit
  list of marked-for-deletion modules so contributions don't land
  there.
- Gotchas — the items an agent would otherwise discover by trial:
  WITH 'key' 'value' (not =), partial-views as default, Engine CRD
  partially developed, reserved-but-unimplemented DDL, template
  guards are template-level not inline, toName documented but not
  implemented, MCP query allowlist isn't a safety mechanism, alpha
  status (don't add backcompat shims unless asked), Checkstyle +
  SpotBugs enforced.
- Patterns to prefer — declarative > imperative, validators >
  runtime checks, hints > template edits, configmap > hints,
  CREATE OR REPLACE for iteration.
- Testing — distilled from the internal testing-best-practices file.
  Focus on the non-obvious anti-patterns an agent would otherwise
  reach for: never @MockitoSettings LENIENT class-wide, no reflection
  field injection, no coverage exclusions for new files, doReturn
  for wildcard generics, MockedStatic as @Mock field not
  try-with-resources, "find bugs not coverage."
- Keep docs in sync — concrete mapping from change-type to which
  doc(s) to update, plus the "README is slim, docs/ is journey-based"
  framing reminder.

~175 lines. Designed to load fast and survive multiple sessions
without rotting.

…e code

Four targeted improvements from review.

Concepts:
- Add a Validators section. Validators were only mentioned in
  extending/validators.md, so the conceptual surface was missing from
  the place readers learn vocabulary. Covers the "Deployer makes things
  real, Validator says whether they're allowed" framing, when validation
  runs, and what the bundled validators do.
- Add a Validator row to the at-a-glance table.
- Open the page with an Apache Calcite acknowledgment + link to the
  Calcite reference, since the parser/planner/JDBC layer all build on
  it and readers should know where to look for SELECT/expression syntax.

DDL reference:
- Add a Quidem callout in the page intro pointing at the .id files
  under each module's src/test/resources/ as a fast way to see
  currently-passing examples of every DDL form.

CONTRIBUTING:
- Add a Quidem paragraph to the "Build and test locally" step
  explaining what .id files are, where they live, and that they're
  the right place to extend coverage when changing DDL parsing,
  planning, or any user-facing SQL behavior.

Extending → deployers:
- Drop the trimmed KafkaDeployerProvider Java snippet; replace with
  bullets pointing at the real KafkaDeployerProvider and KafkaDeployer
  files on GitHub. Reading the source is more reliable than reading a
  drift-prone copy of it.

Extending → data-sources:
- Drop the synthesized MySystemDriver code block; replace with a
  short list pointing at the bundled hoptimator-demodb / -kafka /
  -venice / -mysql modules as progressively richer reference
  adapters. Keep the prose description of the common shape.

Other illustrative code blocks (NamingPolicyValidator,
VaultConfigProvider, the deployer testing snippet) are kept — they're
not copies of repo code, they're shape-of-what-you'd-write examples
that don't drift with the implementation.

Five overlap cleanups identified in review. Net effect: concepts.md
goes from ~430 lines to 328 with no information loss — every cut
already lives in a more authoritative page.

Concepts:

- TableTriggers: drop the four-bullet "what triggers enable" list
  and the status-patch explanation, since kubernetes/triggers.md
  now owns the operational depth. Keep the one-line decoupling
  philosophy ("pipelines stay pure data-flow expressions, triggers
  carry the imperative side effects") and link out for patterns.
  ~35 lines removed.

- Logical tables: drop the "Why this matters as an abstraction"
  three-bullet section, which restated "What you get for free" from
  a different angle. Drop the "Implementation note" header and
  fold its substance into a one-line aside. The "classic use case"
  section becomes one paragraph instead of a bulleted breakdown.
  ~50 lines removed; the unique abstraction-model framing stays.

- Configuration and hints: replace the ~25-line section with a
  5-line stub pointing at user-guide/hints.md (full hint mechanics)
  and kubernetes/configuration.md (configmap + precedence). The
  details lived in three places; now they live in two.

Quickstart:

- Drop the six-row CLI command table that duplicated
  user-guide/sql-cli.md. Keep the !intro / !quit pointers and
  link out for the rest. ~10 lines removed; one fewer place for
  the table to drift.

Plus user's intentional in-flight edits to extending/data-sources.md
and extending/deployers.md trimming "Read the source rather than rely
on a snippet here" to cleaner intro lines.

Cleanups #6-#8 from review.

#6 Engine "partially developed" caveat:
- crd-reference.md kept its full Engine section content — but the
  "partially developed" framing was identical to concepts.md's
  Engines (optional) section. Replace with a one-line pointer to
  concepts; keep the field tables since those are the unique value
  of a CRD-reference page.
- Trim the at-a-glance row similarly.
- concepts.md remains the canonical home of the Engine framing; CLAUDE.md
  preserves it as a gotcha for agents.

#7 K8s-as-pluggable framing:
- architecture.md had it twice — once as prose in "Step 4 — Deploy"
  and once as a bullet in "Where to extend". Trim the Step 4 prose:
  drop the "path of least resistance, not a hard requirement"
  sentence and the "every page assumes a cluster" reasoning, since
  the same point is made in the extension bullet and on the README's
  Why-Hoptimator list. The Step 4 paragraph still notes that bundled
  deployers target K8s and that the SPI is the swap point.
- README's Why-Hoptimator bullet remains the canonical statement.

#8 DDL reference's two non-support sections:
- Merge "Reserved syntax" and "What is *not* supported" into one
  "What's not supported" section with two clearly labeled subgroups:
  "Out of scope" (INSERT/UPDATE/DELETE against arbitrary tables,
  ALTER TABLE, transactions, stored procedures) and "Parses but not
  yet executed" (REFRESH MATERIALIZED VIEW, FIRE *, PAUSE/RESUME MV,
  CREATE FUNCTION). Same content; readers no longer have to scroll
  past identifiers/WITH/system-tables to find the second list.

Two normalization passes.

hints.md (user guide):
- Drop the deep "Template hints" / "Connector hints" sub-sections that
  duplicated kubernetes templates content; replace with a brief two-flavor
  paragraph and a pointer to "Templates and configuration" for the full
  story.
- Remove the connector-hint examples block and the source/sink
  segment-meaning table (lives in templates/configuration now).
- Keep the user-side surface that's unique to this page: where to set
  hints (CLI, JDBC, Subscription, MCP), format, advisory nature,
  reading what was applied via `kubectl get pipeline ... -o yaml`,
  and the hint-vs-template-default decision rule.

kubernetes/templates.md → kubernetes/configuration.md (merged):
- Templates were already part of "configuration" in the broader sense
  (templates are the artifacts; configmap/hints/system-props supply the
  values). Merge the two pages into a single "Templates and
  configuration" reference, file named configuration.md (the more
  general name).
- Top half: template authoring (matching, structure, placeholder
  syntax, conditionals, patterns, tips) — content from the old
  templates.md.
- Bottom half: where placeholder values come from — deployer-injected
  defaults, the three configuration sources (hoptimator-configmap,
  JDBC connection properties / Subscription hints, JVM system
  properties), precedence, file-like keys, k8s.* connection-properties
  reference, pod-namespace detection, ConfigProvider extension pointer.
- Update every cross-link: docs/index.md, docs/user-guide/hints.md,
  docs/extending/{data-sources,deployers,index}.md,
  docs/kubernetes/{crd-reference,index}.md.
- Delete docs/kubernetes/templates.md.

Plus user-side fixes in this round:
- ddl-reference.md: drop the standalone "WITH options syntax" section
  (the form signatures already show the 'key' 'value' shape inline).
- architecture.md, jdbc.md: change example schema MY.AUDIENCE →
  ADS.AUDIENCE to match the demo's registered schemas.

@github-actions

Code Coverage

Overall Project 84.62% 🟢

There is no coverage information present for the Files changed

@ryannedolan
Collaborator

Nice! Wonder if we can set up some sort of automation to revisit this as changes are made.

@jogrogan
Collaborator Author

> Nice! Wonder if we can set up some sort of automation to revisit this as changes are made.

Automate, not sure. I did add a blurb to the CLAUDE.md that will hopefully help it not go out of date, assuming we use Claude. But I did go through this change and purposefully exclude really specific things that seemed likely to drift.

Collaborator

@ryannedolan ryannedolan left a comment

Crazy to see how many features we have hidden in here :)

@jogrogan jogrogan merged commit e79eb1c into main Apr 28, 2026
1 check passed
@jogrogan jogrogan deleted the docs/restructure branch April 28, 2026 04:03
jogrogan added a commit that referenced this pull request May 1, 2026
* Docs: rewrite README and add docs/ tree (Phase 1)

* Docs: add user guide (Phase 2)

Adds the User guide section to docs/. Covers the three client interfaces
(SQL CLI, JDBC, MCP) and the two reference pages (DDL, Hints):

  docs/user-guide/
    index.md            -- section landing
    sql-cli.md          -- ./hoptimator script, sqlline + custom commands
                            (!intro, !resolve, !pipeline, !specify)
    jdbc.md             -- jdbc:hoptimator:// URL format, full connection-
                            property reference, Java example, system tables
    mcp-server.md       -- MCP tool surface (discovery / planning /
                            execution), recommended agent workflow
    ddl-reference.md    -- CREATE/DROP for views/materialized views/triggers
                            /functions/tables, PAUSE/RESUME/REFRESH/FIRE,
                            identifier rules, k8s system schema, what isn't
                            supported
    hints.md            -- template hints vs connector hints, where to set
                            them, how to read what was applied

Also promotes the docs/index.md "User guide" entry from "coming soon" to
linked, and updates the README's Documentation section to match.

* Docs: correct DDL/MCP coverage based on what's actually wired up

Two corrections after a closer look at what the executor actually handles:

DDL reference:
- Drops CREATE FUNCTION as a documented form. The grammar accepts it,
  but no executor handler exists.
- Drops the standalone REFRESH/FIRE section and the "PAUSE/RESUME also
  works for materialized views" line. None of these have executor support.
- Adds a "Reserved syntax" section that lists what parses but does not
  execute today (REFRESH MATERIALIZED VIEW, FIRE *, PAUSE/RESUME
  MATERIALIZED VIEW, CREATE FUNCTION) so readers don't get a false
  positive from a successful parse.

MCP server:
- Rewrites the `query` description to explain why it's restricted to
  ADS / PROFILE / METADATA / K8S: not a safety allowlist, but the only
  schemas Hoptimator can answer queries from without a configured engine.
  The Zeppelin POC for notebook-style execution is acknowledged as
  incomplete.
- Updates the limitations bullet accordingly so it doesn't suggest
  "just widen the allowlist" as a fix.

* Docs: address Phase 1 review feedback

Six corrections from review:

CONTRIBUTING:
- Add a "cover your changes" step. Document `make coverage` and an 80%
  line-coverage target on changed code (CI currently enforces a softer
  60% / 40%).

README:
- Badge: add explicit `&label=CI` so the build status renders as "CI"
  instead of the default "build".
- Drop the link to GitHub Packages. The page is empty in practice; only
  JFrog has artifacts today.
- Soften the "That one statement becomes:" list. The exact resources
  emitted depend on the registered Databases and templates, not on
  Hoptimator. Frame it as "with a typical Kafka + Flink setup" and add
  a paragraph explaining that the same SQL can target a different stack
  by swapping templates.
- Replace the "Kubernetes-native" why-bullet with a more honest
  "Kubernetes out of the box, not as a hard requirement" — the bundled
  deployers target K8s, but `Deployer` is the actual extension point.

Quickstart:
- Add a note on `CREATE OR REPLACE MATERIALIZED VIEW`. Without it,
  re-running a CREATE for an existing view fails; with it, the
  development loop is much faster.

Concepts (engine clarification):
- Split "Engines and connectors" into separate "Connectors" and
  "Engines (optional)" sections. Connectors do not require an Engine
  to function — Hoptimator emits YAML (e.g. a FlinkSessionJob) and an
  unrelated operator (Flink Kubernetes Operator, etc.) runs it.
- The Engine CRD is specifically about *query* execution (e.g. running
  SELECT against tables that need a runtime). Pipeline materialization
  does not need one. Mark the Engine path as partially developed today.
- Strengthen the Deployers section to lead with "Kubernetes is the
  default, not a hard requirement."
- Rename the trailing "Engines today" section to "Bundled adapters and
  runtimes" and reframe accordingly.

Architecture:
- Step 4 (Deploy) now leads with "Kubernetes is the path of least
  resistance, not a hard requirement" and notes that the implementation
  resources Hoptimator emits aren't run by Hoptimator — the relevant
  operator (Strimzi, Flink Kubernetes Operator, etc.) runs them.
- Reword the "A new engine" extension bullet so it doesn't conflate
  pipeline runtimes with the Engine CRD's query path.

JDBC user guide:
- Drop the GitHub Packages link from the dependency section to match
  the README change.

* Docs: Phase 2 review feedback (partial views, triggers, CREATE TABLE, MCP DDL)

Four corrections from review.

DDL reference:
- Add a "Partial views (multiple pipelines into one sink)" subsection
  under CREATE MATERIALIZED VIEW. Explains the `$<suffix>` syntax,
  shows the multi-writer pattern with two views feeding the same
  VENICE.AUDIENCE sink, and recommends partial views as the default
  for production cases. Cross-link from the CREATE MATERIALIZED VIEW
  bullet list.
- Rewrite CREATE TABLE. The previous text underplayed it — CREATE
  TABLE goes through the Deployer SPI to actually provision real
  infrastructure (e.g. creating a Kafka topic via the Kafka deployer
  rather than a separate Strimzi manifest). Show the example of
  declaring a Kafka topic with partitions and then using a partial
  view to write to it. Note that AS <query> isn't supported today.
- Trigger section: add one-line framing about what triggers enable
  (backfills, rETL refreshes, downstream notifications, ops hooks)
  and link to the concepts page for the bigger picture.

Concepts:
- Expand the TableTrigger section. Lead with what triggers actually
  let you express (backfills tied to offline-tier arrivals, rETL on
  cron, downstream notifications, operational hooks). Explain the
  status-patch mechanism that fires triggers and how that makes them
  composable with whatever already owns the upstream system. Note
  that triggers can be auto-generated from TableTemplates so adapters
  can ship sensible defaults.
- Close with the design summary: "pipelines stay pure data-flow
  expressions, triggers carry the imperative side effects, and the
  two compose at the table level."

MCP server:
- Add a limitations bullet flagging that `modify` only accepts
  CREATE [OR REPLACE] MATERIALIZED VIEW and DROP today, not the full
  Hoptimator DDL surface. Triggers, plain views, tables, and the
  inspection-only DDL still need the JDBC driver or the SQL CLI.

Plus an intentional in-flight edit to docs/user-guide/sql-cli.md:
- Replace placeholder `MY.AUDIENCE` examples with `ADS.AUDIENCE`
  (matches the demo's registered schemas) and the elided `!pipeline`
  / `!specify` outputs with the actual ones produced by the demo.

* Docs: switch tagline from "Kubernetes-native" to "SQL control plane"

The "Kubernetes-native control plane for multi-hop data pipelines" framing
oversold the K8s coupling and undersold the SQL-first part. Kubernetes is
the default deployer, not the differentiator — Hoptimator's job is to
compile SQL into multi-system pipelines, with the runtime substrate
pluggable underneath.

New phrasing:

- README h3:    "A SQL control plane for multi-system data pipelines"
- README intro: "Hoptimator turns SQL into running, multi-hop data
                 pipelines that span Kafka, Flink, Venice, and anything
                 else you plug in."
- docs/index:   "Hoptimator is a SQL control plane for multi-system data
                 pipelines. You write SQL; it figures out the topology
                 across Kafka, Flink, Venice, and whatever else you plug
                 in, generates the specs, deploys them, and reconciles
                 them."

This keeps the "control plane" framing (planner-not-runtime) while
removing the K8s lock-in suggestion and naming the actual systems
Hoptimator spans up front.

* Docs: expand LogicalTable concept to explain the abstraction model

The previous section described LogicalTables in six lines, framed mostly
around the YAML shape. That undersells what they actually do — the
abstraction is the point, not the driver.

Rewrites the "Logical tables" section in concepts.md to cover:

- The abstraction value (one named entity, N physical backends; collapses
  the typical mess of three names + hand-built sync jobs into a single
  declaration).
- The tier model (nearline / online / offline) with a table mapping each
  tier to typical backends and the role it plays.
- What you get for free at deploy time: physical tier resources via the
  Deployer SPI, implicit inter-tier sync pipelines (nearline → online,
  nearline → offline), auto-backfill triggers when offline is bound, and
  one schema source-of-truth resolved from nearline.
- Why this matters as an abstraction: tier-agnostic application code,
  the right topology being the cheap path, and clean composition with
  partial views / materialized views.
- The classic use case (lambda / kappa for feature stores) so readers
  recognize the pattern.
- An explicit note that LogicalTables ship as a JDBC driver today but
  function as an abstraction model — the deployer does the heavy lifting
  at create time, not the driver at query time.

Also updates the at-a-glance table entry from "A single logical entity
that spans multiple physical storage tiers" to lead with "An abstraction
model" and call out the auto-sync/auto-backfill behaviors.

* Docs: fix WITH options syntax in DDL reference

Hoptimator's parser uses Calcite's `'key' 'value'` form (whitespace, no
`=`) inside WITH clauses, not the `'key'='value'` form. The `=` form is
Flink's syntax and only shows up in auto-generated pipeline output.

Fixes the four user-facing DDL signatures and the example:

- CREATE MATERIALIZED VIEW
- CREATE TRIGGER
- CREATE TABLE (both signature and example)
- CREATE TABLE example: `'kafka.partitions' '8'`

Adds a "WITH options syntax" section explicitly noting the difference and
calling out that the `=` form readers see in `!pipeline` / `!specify`
output is the Flink engine's syntax, not Hoptimator's input grammar.

The `=` instances elsewhere in the docs (auto-generated Flink SQL in
sql-cli.md output blocks, the truncated pipeline SQL in quickstart.md)
are correct as-is and were left alone.

* Docs: tone down LogicalTable opening paragraph

* manual cleanup

* Docs: add Kubernetes guide (Phase 3)

Adds the Kubernetes section to docs/. Five new pages covering everything
needed to operate Hoptimator on a cluster:

  docs/kubernetes/
    index.md            -- section landing
    operator.md         -- what hoptimator-operator does, controllers it
                            runs (PipelineReconciler, TableTriggerReconciler,
                            ViewReconciler), how to deploy it, RBAC,
                            namespace scoping, lifecycle of a pipeline,
                            when not to run the operator
    crd-reference.md    -- field-by-field for all 10 CRDs (Database, View,
                            Pipeline, TableTemplate, JobTemplate,
                            TableTrigger, Subscription, LogicalTable,
                            Engine, SqlJob) with spec/status/printer-column
                            tables and one example per kind
    templates.md        -- TableTemplate / JobTemplate authoring deep
                            dive: matching rules (databases, methods),
                            full placeholder syntax (subst, defaults,
                            conditionals, transforms, multiline), the
                            default placeholders K8sSourceDeployer and
                            K8sJobDeployer inject, where hint and
                            configmap values fit, common patterns
    triggers.md         -- TableTrigger operational guide: cron vs
                            status-driven firing, pause/resume,
                            jobProperties, common patterns (offline-tier
                            backfill, rETL, downstream notification, ops
                            hooks), when not to use a trigger
    configuration.md    -- hoptimator-configmap, ConfigProvider SPI,
                            three-source precedence (system properties <
                            configmap < hints), file-like keys and lazy
                            expansion, pod-namespace detection, writing
                            a custom ConfigProvider
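
To make the three-source precedence noted for configuration.md concrete
(system properties < configmap < hints), here is a minimal Java sketch that
merges three maps lowest-precedence first. `PrecedenceSketch`, `resolve`, and
the sample keys are hypothetical names used only for illustration; this is an
assumption about the shape of the rule, not Hoptimator's actual
ConfigProvider code.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical illustration of "system properties < configmap < hints". */
public class PrecedenceSketch {

  /** Merges lowest-precedence first, so later sources overwrite earlier ones. */
  public static Map<String, String> resolve(
      Map<String, String> systemProps,
      Map<String, String> configMap,
      Map<String, String> hints) {
    Map<String, String> merged = new LinkedHashMap<>(systemProps);
    merged.putAll(configMap); // configmap overrides system properties
    merged.putAll(hints);     // hints override both
    return merged;
  }
}
```

A key set in all three sources resolves to the hint value; a key set only as
a system property passes through unchanged.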

Also promotes the docs/index.md and README.md "Kubernetes guide
(coming soon)" entries to live links.

* Docs: tighten SqlJob and operator-logging notes

- SqlJob: reframe as a primitive consumed by an external SqlJob
  operator that deploys Flink and Flink-Beam SQL jobs. Drop the
  "useful when a job doesn't fit CREATE MATERIALIZED VIEW" framing,
  which conflated SqlJob with materialized-view tooling.
- operator.md: drop the "does not yet emit Kubernetes events" line.
  Confirmed by grep that no events are emitted, but the term is
  jargon that doesn't help an open-source reader without a side
  explanation. The remaining "logs are the primary debugging surface"
  carries the practical guidance.

* Docs: move k8s connection properties out of jdbc.md

The full `k8s.*` connection-property table in the JDBC user guide
contradicted the "Kubernetes is the default deployer, not a hard
requirement" framing established elsewhere — those properties are
deployer-specific, not driver-specific.

Consolidates so the table lives in one place:

- jdbc.md: replaces the "Kubernetes context" subsection with a short
  "Deployer-specific properties" note pointing at the Kubernetes guide,
  and explicitly calls out that a different deployer would expose its
  own `<deployer>.*` properties.
- kubernetes/configuration.md: takes the full table (now with the
  Default column merged in from the jdbc.md version), replaces the
  pointer to jdbc.md with one that names the driver-level surface
  (catalogs, hints, fun) so readers know what each page covers.

The table content is unchanged; this is a re-homing edit.

* Docs: document `make generate-models` for CRD model regeneration

The Java `V1alpha1*` model classes under hoptimator-k8s are generated
from the CRD YAMLs by `make generate-models` (which shells out to the
upstream Kubernetes Java client's `crd-model-gen` Docker image). Without
that callout, contributors who add a CRD field have no obvious way to
discover that they need to regenerate.

Adds the reference in the two natural places:

- CONTRIBUTING.md: new step in the PR checklist ("Regenerate Java models
  if you touched a CRD"), with the command, the Docker requirement, and
  a pointer to the upstream tool.
- docs/kubernetes/crd-reference.md: a callout block near the top of the
  page so it's visible to anyone landing on the CRD reference while
  modifying one.

* Docs: correct conditional template syntax (template-level, not inline)

The previous text described `{{var==value}}` as an inline conditional
that emits a block of the template when the condition matches. That's
wrong. Reading Template.java's `render` method:

  - On a `==` / `!=` marker whose condition is satisfied, the marker
    is erased and rendering continues normally.
  - On a marker whose condition is *not* satisfied, the renderer returns
    `null` for the entire template, so it produces nothing.

So the markers are template-level guards, not inline blocks. The
real-world pattern is "two templates with mirrored guards; whichever
condition matches is the one that fires." The bundled flink-template
uses this to swap between SQL-job and Beam-job entry classes — there
isn't an `{{end}}` companion marker (which the docs invented) and the
syntax doesn't conditionally include a single line.
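
The guard semantics described above can be sketched roughly as follows. This
is a simplified, hypothetical re-implementation for illustration only: the
class name `GuardSketch`, the regex, and the sample values are assumptions,
and the real Template.java handles more syntax (defaults, transforms,
multiline) than this does.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical sketch of template-level guards; not the real Template.java. */
public class GuardSketch {
  // Matches {{name}}, {{name==value}} and {{name!=value}}.
  private static final Pattern MARKER =
      Pattern.compile("\\{\\{\\s*([\\w.]+)\\s*(?:(==|!=)\\s*([^}]*?))?\\s*\\}\\}");

  /** Returns the rendered text, or null when any guard is not satisfied. */
  public static String render(String template, Map<String, String> env) {
    Matcher m = MARKER.matcher(template);
    StringBuilder out = new StringBuilder();
    while (m.find()) {
      String actual = env.getOrDefault(m.group(1), "");
      String op = m.group(2);
      String replacement;
      if (op == null) {
        replacement = actual; // plain substitution
      } else {
        boolean equal = actual.equals(m.group(3));
        boolean satisfied = op.equals("==") ? equal : !equal;
        if (!satisfied) {
          return null; // guard failed: skip the whole template
        }
        replacement = ""; // guard passed: erase the marker and continue
      }
      m.appendReplacement(out, Matcher.quoteReplacement(replacement));
    }
    m.appendTail(out);
    return out.toString();
  }
}
```

With two templates carrying mirrored guards on the same variable, at most one
returns non-null for a given environment, which is the "whichever condition
matches is the one that fires" pattern.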

Fixes:

- Placeholder syntax table: drop the imagined `{{end}}` companion;
  reword the `==` / `!=` rows as "template-level guard: render this
  template only if X; otherwise skip the whole template."
- Drop `{{var toName}}` from the table — it's documented in
  Template.java's javadoc but not implemented in `applyTransform()`,
  which only handles `toLowerCase`, `toUpperCase`, and `concat`.
  Documenting it would invite reports of a "broken" feature.
- Replace the "Conditional rendering example" section with one that
  shows the actual pattern: two JobTemplates with mirroring `==`
  guards on `flink.app.type`, only one of which renders for a given
  pipeline.
- Rename the section "Conditional templates" so the heading no longer
  implies block-level conditionals.

* Docs: add Extending Hoptimator guide (Phase 4)

Adds the Extending section. Five pages covering the SPI surfaces a
contributor would touch when integrating a new system or customizing
behavior end-to-end:

  docs/extending/
    index.md              -- section landing with a "pick the right
                              surface" decision table; explains the
                              ServiceLoader-based loading pattern and
                              the META-INF/services file layout
    data-sources.md       -- JDBC adapter walkthrough (DriverVersion,
                              Schema, register()), Database CRD wiring,
                              TableTemplate-only path for declarative
                              integrations, when to reach for a custom
                              Deployer instead, the connector-only
                              pattern for pre-existing infrastructure,
                              when ConnectorProvider applies
    deployers.md          -- Deployer interface + lifecycle (create /
                              update / delete / specify / restore),
                              DeployerProvider with priority semantics,
                              KafkaDeployerProvider as a concrete shape
                              to copy, opt-in Validation, testing
                              surface, common pitfalls (missing restore,
                              side effects in specify, wrong priority,
                              wrong exception types)
    validators.md         -- the three points where validation runs
                              (parsed SQL, resolved object, deployer
                              collection), Issues tree API with severity
                              levels, the two participation patterns
                              (Validated on your own type vs.
                              ValidatorProvider for cross-cutting
                              policy), built-in providers as references,
                              authoring patterns and testing
    config-providers.md   -- ConfigProvider SPI mechanics, when to write
                              one (vault, central config service,
                              dynamic per-connection values), example
                              sketch, interaction with hints/JDBC props,
                              caveats (latency on every connection,
                              tolerated errors, no SPI ordering control)
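
For readers unfamiliar with the ServiceLoader-based loading pattern that
index.md explains, a generic Java sketch follows. `ExampleProvider` is a
hypothetical stand-in for SPI interfaces such as DeployerProvider or
ValidatorProvider; it is not Hoptimator code, and priority handling is
deliberately left out because its semantics are not spelled out here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.ServiceLoader;

/** Generic ServiceLoader discovery; ExampleProvider is a hypothetical SPI. */
public class ProviderLoadingSketch {

  /** Stand-in for an SPI interface that extensions would implement. */
  public interface ExampleProvider {
    String name();
  }

  /** Collects the name of every implementation ServiceLoader can discover. */
  public static List<String> discover() {
    List<String> names = new ArrayList<>();
    // ServiceLoader looks for a classpath resource named
    // META-INF/services/<binary name of the interface>, one implementation
    // class name per line, and instantiates each via its no-arg constructor.
    for (ExampleProvider provider : ServiceLoader.load(ExampleProvider.class)) {
      names.add(provider.name());
    }
    return names;
  }
}
```

With no registration file on the classpath, `discover()` returns an empty
list; adding a jar that ships the META-INF/services entry is enough for the
implementation to be picked up without any code change in the caller.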

Also promotes the docs/index.md and README.md "Extending Hoptimator
(coming soon)" entries to live links.

* Add CLAUDE.md for agent onboarding

Lands a top-level CLAUDE.md so Claude Code agents pick up the
high-leverage context for this repo automatically — what the project
is, where the docs live, common commands, the active module layout
(plus the marked-for-deletion list), the gotchas that aren't obvious
from reading code, the patterns to prefer, distilled testing rules,
and a "keep docs in sync" reminder.

Sections:

- What this repo is — three roles (planner/adapter/operator);
  Kubernetes is default-not-required; Deployer SPI is the actual
  extension point.
- Read these first — pointers into docs/ rather than duplicated
  content.
- Common commands — make build/test/integration-tests/coverage/
  generate-models/deploy*; ./hoptimator; ./start-mcp-server. Calls
  out that generate-models is required after any CRD edit.
- Module layout — active modules with one-line purpose; explicit
  list of marked-for-deletion modules so contributions don't land
  there.
- Gotchas — the items an agent would otherwise discover by trial:
  WITH 'key' 'value' (not =), partial-views as default, Engine CRD
  partially developed, reserved-but-unimplemented DDL, template
  guards are template-level not inline, toName documented but not
  implemented, MCP query allowlist isn't a safety mechanism, alpha
  status (don't add backcompat shims unless asked), Checkstyle +
  SpotBugs enforced.
- Patterns to prefer — declarative > imperative, validators >
  runtime checks, hints > template edits, configmap > hints,
  CREATE OR REPLACE for iteration.
- Testing — distilled from the internal testing-best-practices file.
  Focus on the non-obvious anti-patterns an agent would otherwise
  reach for: never @MockitoSettings LENIENT class-wide, no reflection
  field injection, no coverage exclusions for new files, doReturn
  for wildcard generics, MockedStatic as @Mock field not
  try-with-resources, "find bugs not coverage."
- Keep docs in sync — concrete mapping from change-type to which
  doc(s) to update, plus the "README is slim, docs/ is journey-based"
  framing reminder.

~175 lines. Designed to load fast and survive multiple sessions
without rotting.

* Docs: validators in concepts, Calcite + Quidem links, drop drift-prone code

Four targeted improvements from review.

Concepts:
- Add a Validators section. Validators were only mentioned in
  extending/validators.md, so the conceptual surface was missing from
  the place readers learn vocabulary. Covers the "Deployer makes things
  real, Validator says whether they're allowed" framing, when validation
  runs, and what the bundled validators do.
- Add a Validator row to the at-a-glance table.
- Open the page with an Apache Calcite acknowledgment + link to the
  Calcite reference, since the parser/planner/JDBC layer all build on
  it and readers should know where to look for SELECT/expression syntax.

DDL reference:
- Add a Quidem callout in the page intro pointing at the .id files
  under each module's src/test/resources/ as a fast way to see
  currently-passing examples of every DDL form.

CONTRIBUTING:
- Add a Quidem paragraph to the "Build and test locally" step
  explaining what .id files are, where they live, and that they're
  the right place to extend coverage when changing DDL parsing,
  planning, or any user-facing SQL behavior.

Extending → deployers:
- Drop the trimmed KafkaDeployerProvider Java snippet; replace with
  bullets pointing at the real KafkaDeployerProvider and KafkaDeployer
  files on GitHub. Reading the source is more reliable than reading a
  drift-prone copy of it.

Extending → data-sources:
- Drop the synthesized MySystemDriver code block; replace with a
  short list pointing at the bundled hoptimator-demodb / -kafka /
  -venice / -mysql modules as progressively richer reference
  adapters. Keep the prose description of the common shape.

Other illustrative code blocks (NamingPolicyValidator,
VaultConfigProvider, the deployer testing snippet) are kept — they're
not copies of repo code, they're shape-of-what-you'd-write examples
that don't drift with the implementation.

* slim CLAUDE.md

* Docs: trim concepts.md and quickstart for duplication

Five overlap cleanups identified in review. Net effect: concepts.md
goes from ~430 lines to 328 with no information loss — every cut
already lives in a more authoritative page.

Concepts:

- TableTriggers: drop the four-bullet "what triggers enable" list
  and the status-patch explanation, since kubernetes/triggers.md
  now owns the operational depth. Keep the one-line decoupling
  philosophy ("pipelines stay pure data-flow expressions, triggers
  carry the imperative side effects") and link out for patterns.
  ~35 lines removed.

- Logical tables: drop the "Why this matters as an abstraction"
  three-bullet section, which restated "What you get for free" from
  a different angle. Drop the "Implementation note" header and
  fold its substance into a one-line aside. The "classic use case"
  section becomes one paragraph instead of a bulleted breakdown.
  ~50 lines removed; the unique abstraction-model framing stays.

- Configuration and hints: replace the ~25-line section with a
  5-line stub pointing at user-guide/hints.md (full hint mechanics)
  and kubernetes/configuration.md (configmap + precedence). The
  details lived in three places; now they live in two.

Quickstart:

- Drop the six-row CLI command table that duplicated
  user-guide/sql-cli.md. Keep the !intro / !quit pointers and
  link out for the rest. ~10 lines removed; one fewer place for
  the table to drift.

Plus user's intentional in-flight edits to extending/data-sources.md
and extending/deployers.md trimming "Read the source rather than rely
on a snippet here" to cleaner intro lines.

* Docs: dedupe Engine caveat, K8s framing, and DDL non-support sections

Cleanups #6-#8 from review.

#6 Engine "partially developed" caveat:
- crd-reference.md kept its full Engine section content — but the
  "partially developed" framing was identical to concepts.md's
  Engines (optional) section. Replace with a one-line pointer to
  concepts; keep the field tables since those are the unique value
  of a CRD-reference page.
- Trim the at-a-glance row similarly.
- concepts.md remains the canonical home of the Engine framing; CLAUDE.md
  preserves it as a gotcha for agents.

#7 K8s-as-pluggable framing:
- architecture.md had it twice — once as prose in "Step 4 — Deploy"
  and once as a bullet in "Where to extend". Trim the Step 4 prose:
  drop the "path of least resistance, not a hard requirement"
  sentence and the "every page assumes a cluster" reasoning, since
  the same point is made in the extension bullet and on the README's
  Why-Hoptimator list. The Step 4 paragraph still notes that bundled
  deployers target K8s and that the SPI is the swap point.
- README's Why-Hoptimator bullet remains the canonical statement.

#8 DDL reference's two non-support sections:
- Merge "Reserved syntax" and "What is *not* supported" into one
  "What's not supported" section with two clearly labeled subgroups:
  "Out of scope" (INSERT/UPDATE/DELETE against arbitrary tables,
  ALTER TABLE, transactions, stored procedures) and "Parses but not
  yet executed" (REFRESH MATERIALIZED VIEW, FIRE *, PAUSE/RESUME MV,
  CREATE FUNCTION). Same content; readers no longer have to scroll
  past identifiers/WITH/system-tables to find the second list.

* Docs: surface !tables and !schemas in the SQL CLI command table

* Docs: trim hints.md and merge templates.md into configuration.md

Two normalization passes.

hints.md (user guide):
- Drop the deep "Template hints" / "Connector hints" sub-sections that
  duplicated kubernetes templates content; replace with a brief two-flavor
  paragraph and a pointer to "Templates and configuration" for the full
  story.
- Remove the connector-hint examples block and the source/sink
  segment-meaning table (lives in templates/configuration now).
- Keep the user-side surface that's unique to this page: where to set
  hints (CLI, JDBC, Subscription, MCP), format, advisory nature,
  reading what was applied via `kubectl get pipeline ... -o yaml`,
  and the hint-vs-template-default decision rule.

kubernetes/templates.md → kubernetes/configuration.md (merged):
- Templates were already part of "configuration" in the broader sense
  (templates are the artifacts; configmap/hints/system-props supply the
  values). Merge the two pages into a single "Templates and
  configuration" reference, file named configuration.md (the more
  general name).
- Top half: template authoring (matching, structure, placeholder
  syntax, conditionals, patterns, tips) — content from the old
  templates.md.
- Bottom half: where placeholder values come from — deployer-injected
  defaults, the three configuration sources (hoptimator-configmap,
  JDBC connection properties / Subscription hints, JVM system
  properties), precedence, file-like keys, k8s.* connection-properties
  reference, pod-namespace detection, ConfigProvider extension pointer.
- Update every cross-link: docs/index.md, docs/user-guide/hints.md,
  docs/extending/{data-sources,deployers,index}.md,
  docs/kubernetes/{crd-reference,index}.md.
- Delete docs/kubernetes/templates.md.

Plus user-side fixes in this round:
- ddl-reference.md: drop the standalone "WITH options syntax" section
  (the form signatures already show the 'key' 'value' shape inline).
- architecture.md, jdbc.md: change example schema MY.AUDIENCE →
  ADS.AUDIENCE to match the demo's registered schemas.

* Rename configuration -> templates