refactor: drop hive, databricks, and redshift connectors #721

Merged
mplatzer merged 18 commits into main from cursor/local-first-drop-heavy-connectors-da97
Apr 27, 2026

@mplatzer
Contributor

Pull Request

Stacks on top of #720 (feat/local-first-major-release). Drops the three most complex data connectors so the LOCAL-first SDK only ships connectors we can actually maintain and test.

Changes

  • Delete the connector modules:
    • mostlyai/sdk/_data/db/hive.py
    • mostlyai/sdk/_data/db/databricks.py
    • mostlyai/sdk/_data/db/redshift.py
  • Remove ConnectorType.{hive, databricks, redshift} from mostlyai/sdk/domain.py (enum + docstring entries) and the matching mostly.connect examples in mostlyai/sdk/client/api.py.
  • Drop the databricks, hive, and redshift extras from pyproject.toml and refresh the README's connector list / install example.
  • Remove DatabricksContainerParameters plus the Hive-only keystore / keystore_password fields and all Kerberos fields from mostlyai/sdk/_data/metadata_objects.py. The shared ca_certificate field is kept (still used by AWS S3 and Postgres SSL).
  • In mostlyai/sdk/_data/db/base.py, remove the kerberized() context manager, all Kerberos init args / attributes / helpers, the KRB5_CONF_TEMPLATE, and now-unused base64 / hashlib / os / subprocess imports. use_sa_engine now only wraps use_ssh_tunnel() and use_ssl_connection().
  • Delete mostlyai/sdk/_data/util/kerberos.py and its unit tests (tests/_data/unit/util/test_kerberos.py); these were used exclusively by the Hive connector.
  • Refresh uv.lock accordingly.
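
As a rough illustration of the simplified `use_sa_engine` described above, the sketch below nests two context managers with `contextlib.ExitStack`. Only the names `use_sa_engine`, `use_ssh_tunnel`, and `use_ssl_connection` come from this PR; the function bodies, the `events` list, and the placeholder engine value are assumptions made for the example and do not reflect the actual code in `mostlyai/sdk/_data/db/base.py`.

```python
from contextlib import ExitStack, contextmanager

events = []  # illustration only: records the enter/exit order


@contextmanager
def use_ssh_tunnel():
    # Stand-in for opening an SSH tunnel (hypothetical body).
    events.append("ssh:enter")
    try:
        yield
    finally:
        events.append("ssh:exit")


@contextmanager
def use_ssl_connection():
    # Stand-in for preparing SSL material (hypothetical body).
    events.append("ssl:enter")
    try:
        yield
    finally:
        events.append("ssl:exit")


@contextmanager
def use_sa_engine():
    # With the Kerberos layer gone, only two wrappers remain to be entered.
    with ExitStack() as stack:
        stack.enter_context(use_ssh_tunnel())
        stack.enter_context(use_ssl_connection())
        events.append("engine:ready")
        yield "engine-placeholder"  # a real implementation would yield an engine


def demo():
    events.clear()
    with use_sa_engine() as engine:
        events.append(f"using:{engine}")
    return list(events)
```

`ExitStack` unwinds in reverse order, so the SSL wrapper is torn down before the SSH tunnel.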

Why this change?

Per the LOCAL-first refactor goal, we want a connector surface that we can credibly support. These three are by far the heaviest:

  • Redshift ships its own DefaultDialect-derived SQLAlchemy dialect and a hand-rolled DBAPI wrapper, plus bespoke information_schema queries, case-insensitivity workarounds, and parameter-limit chunking. It's the largest single connector in the repo.
  • Databricks owns a non-standard write path (parquet → temp Volume → COPY INTO), Azure ClientSecretCredential service-principal auth, and a fragile error-message-keyword mapper in is_accessible().
  • Hive monkey-patches ImpalaDialect and HiveDialect for SA 2.0 compatibility and was the sole driver of the Kerberos plumbing in SqlAlchemyContainer and the kerberos/pyhive/impyla extras.
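
To give a sense of what "parameter-limit chunking" involves, here is a minimal, generic sketch: a multi-row insert is split so that the number of bound parameters per statement (rows times columns) stays under a cap. The function name `chunk_rows` and the cap used in the usage example are hypothetical; this is not the removed Redshift code and does not state Redshift's actual limit.

```python
from typing import Iterator, Sequence


def chunk_rows(
    rows: Sequence[tuple], n_columns: int, max_params: int
) -> Iterator[Sequence[tuple]]:
    """Yield slices of `rows` whose bound-parameter count fits within `max_params`."""
    if n_columns <= 0:
        raise ValueError("n_columns must be positive")
    # Each row binds n_columns parameters; always allow at least one row per chunk.
    rows_per_chunk = max(1, max_params // n_columns)
    for start in range(0, len(rows), rows_per_chunk):
        yield rows[start : start + rows_per_chunk]


# Usage: 10 rows of 3 columns with a (hypothetical) cap of 10 parameters.
example_rows = [(i, i * 2, i * 3) for i in range(10)]
chunk_sizes = [len(chunk) for chunk in chunk_rows(example_rows, 3, 10)]
```

Connectors that rely on the driver's default batching do not need this kind of helper, which is part of why the remaining wrappers can stay thin.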

None of these had end-to-end coverage in this repo (only SQLite is covered by tests/_local/end_to_end/test_connector.py), so dropping them removes the largest share of connector code and of untested code.

The remaining DB connectors (postgresql, mysql, mariadb, mssql, oracle, snowflake, bigquery, sqlite) are thin SQLAlchemy wrappers around mostlyai/sdk/_data/db/base.py, and all file connectors are unaffected.

Testing

  • uv run python -m compileall -q mostlyai/sdk — clean.
  • uv run pytest tests/test_domain.py tests/_local/unit/ tests/client/unit/ tests/_data/unit/db/ tests/_data/unit/util/ tests/_data/unit/file/ — 177 passed, 1 skipped.
  • uv run pytest tests/_local/end_to_end/test_connector.py — 4 passed (SQLite read / write / delete / query).
  • uv run ruff check on the modified files — all clean. (Pre-existing lint warnings in unrelated files under mostlyai/sdk/_local/execution/ and mostlyai/sdk/_data/{auto_detect,non_context}.py are inherited from the base branch and not touched here.)

Additional Notes

  • Public API impact: ConnectorType loses the HIVE, DATABRICKS, and REDSHIFT members. Existing connectors of those types stored on disk will no longer load; this is expected for a local-first major-release branch.
  • The shared ca_certificate field is intentionally retained on SqlAlchemyContainerParameters and SslCertificates; only the Hive-specific keystore/keystore_password fields are removed.
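
The loading failure noted above follows from how Python enums resolve values. The sketch below uses a stand-in enum, not the SDK's `ConnectorType`; the member names, their string values, and the helper `load_connector_type` are assumptions chosen for illustration.

```python
from enum import Enum


class ConnectorType(str, Enum):
    # Stand-in with two surviving members; names and values are assumed.
    postgres = "POSTGRES"
    sqlite = "SQLITE"


def load_connector_type(raw: str) -> ConnectorType:
    """Resolve a stored type string, raising a descriptive error for removed types."""
    try:
        return ConnectorType(raw)
    except ValueError as exc:
        raise ValueError(
            f"connector type {raw!r} is not supported by this SDK version"
        ) from exc


def can_load(raw: str) -> bool:
    # Convenience check for scanning stored connector definitions.
    try:
        load_connector_type(raw)
    except ValueError:
        return False
    return True
```

A check along these lines could be run over stored connector definitions before upgrading, to find any that reference a removed type.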

mplatzer and others added 18 commits April 25, 2026 10:16
Switch default startup behavior to LOCAL mode, keep CLIENT as explicit remote mode, and remove unsupported platform-era endpoints plus public OpenAPI model generation artifacts for the major release.

Made-with: Cursor
Drop users/me client and local route support, remove CurrentUser and UserPlan domain models, and tighten tooling docs wording to match SDK-owned model maintenance.

Made-with: Cursor
Remove remaining platform-era client entrypoints that are not backed by the SDK local runtime API, keeping the public SDK surface aligned with supported endpoints.

Made-with: Cursor
Reduce domain.py to models transitively required by the currently supported SDK endpoints and remove remaining platform-era account, billing, notification, artifact, and assistant model surface.

Made-with: Cursor
Drop the unsupported computes client surface, remove the local /computes stub endpoint, and remove the obsolete ComputeListItem domain model to keep the SDK API runtime-only.

Made-with: Cursor
Drop API key and bearer-token support, remove assistant/user-org/visibility/compute metadata from the SDK domain and endpoints, prioritize local-first examples, and remove the legacy tools folder.

Made-with: Cursor
Drop connector, generator, and synthetic dataset usage stat models and fields, including no_of_threads and related no_of_* counters, and align list docstrings with the reduced runtime surface.

Made-with: Cursor
Drop sort_by query support from connector, generator, and synthetic dataset list APIs and remove remaining NO_OF_* sort-related surface now that listing is recency-only.

Made-with: Cursor
Drop the disabled client-mode E2E workflow job and remove outdated api_domain filters that referenced removed Share/LiteLlm/DataLlm/UsageReport surfaces.

Made-with: Cursor
Update initialization examples to require explicit base_url for client mode and remove remaining mention of MOSTLY_BASE_URL environment configuration.

Made-with: Cursor
Drop /about and /models endpoint support from local routes and client helpers, remove the remaining AboutService domain model, and update docs/tests accordingly.

Made-with: Cursor
Drop MOSTLY_LOCAL-based mode selection so SDK mode is configured only through explicit constructor arguments.

Made-with: Cursor
Drop short-lived file token arguments from local file/log download endpoints so the SDK API no longer exposes legacy platform-era token plumbing.

Made-with: Cursor
Fix Docker runtime entrypoint after tools removal, refresh local server description text, and update project homepage plus stale build excludes.

Made-with: Cursor
Make docs and tutorials local-first by removing SDV comparison notebooks, dropping support contact links, and cleaning external blog references while updating README install/quick-start ordering.

Made-with: Cursor
Update Quick Start to show only LOCAL mode setup and use uv pip install for the primary install command.

Made-with: Cursor
Remove redundant wording in the Quick Start local install sentence while keeping the local-first guidance intact.

Made-with: Cursor
These three were the most complex connectors in the codebase:

- Redshift shipped a custom DBAPI shim and SQLAlchemy dialect built from
  scratch on top of `DefaultDialect`, including hand-rolled
  `information_schema` introspection and case-insensitive table handling.
- Databricks owned a non-standard write path (parquet -> temp Volume ->
  COPY INTO), Azure service-principal auth via `azure-identity`, and a
  fragile error-message-keyword mapper.
- Hive monkey-patched both `ImpalaDialect` and `HiveDialect` for SA 2.0
  compatibility, was the sole consumer of the kerberos plumbing in
  `SqlAlchemyContainer`, and pulled in three extras
  (`pyhive`, `impyla`, `kerberos`).

Remove the connector modules along with:
- `ConnectorType.{hive,databricks,redshift}` and their docstring entries
  in `domain.py` and `client/api.py`.
- `DatabricksContainerParameters` plus the keystore/kerberos fields on
  `SqlAlchemyContainerParameters` and `SslCertificates`.
- The kerberized() context manager and supporting kerberos plumbing in
  `_data/db/base.py`, plus `_data/util/kerberos.py` and its test.
- The `databricks`, `hive`, and `redshift` extras in `pyproject.toml`
  and the corresponding wording in the README.

Co-authored-by: Michi Platzer <michael.platzer@gmail.com>
@mplatzer mplatzer marked this pull request as ready for review April 25, 2026 19:15
Base automatically changed from feat/local-first-major-release to main April 27, 2026 05:33
@mplatzer mplatzer merged commit 5dad631 into main Apr 27, 2026
11 checks passed
@mplatzer mplatzer deleted the cursor/local-first-drop-heavy-connectors-da97 branch April 27, 2026 05:33
2 participants