refactor: drop hive, databricks, and redshift connectors by mplatzer · Pull Request #721 · mostly-ai/mostlyai

mplatzer · 2026-04-25T19:12:13Z

Pull Request

Stacks on top of #720 (feat/local-first-major-release). Drops the three most complex data connectors so the LOCAL-first SDK only ships connectors we can actually maintain and test.

Changes

- Delete the connector modules:
  - mostlyai/sdk/_data/db/hive.py
  - mostlyai/sdk/_data/db/databricks.py
  - mostlyai/sdk/_data/db/redshift.py
- Remove ConnectorType.{hive, databricks, redshift} from mostlyai/sdk/domain.py (enum + docstring entries) and the matching mostly.connect examples in mostlyai/sdk/client/api.py.
- Drop the databricks, hive, and redshift extras from pyproject.toml and refresh the README's connector list / install example.
- Remove DatabricksContainerParameters plus the Hive-only keystore / keystore_password fields and all Kerberos fields from mostlyai/sdk/_data/metadata_objects.py. The shared ca_certificate field is kept (still used by AWS S3 and Postgres SSL).
- In mostlyai/sdk/_data/db/base.py, remove the kerberized() context manager, all Kerberos init args / attributes / helpers, the KRB5_CONF_TEMPLATE, and the now-unused base64 / hashlib / os / subprocess imports. use_sa_engine now only wraps use_ssh_tunnel() and use_ssl_connection().
- Delete mostlyai/sdk/_data/util/kerberos.py and its unit tests (tests/_data/unit/util/test_kerberos.py); these were used exclusively by the Hive connector.
- Refresh uv.lock accordingly.
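
For readers unfamiliar with the layering described for use_sa_engine, the following is a minimal sketch of how two context managers can be composed into one. It is not the SDK's code: SketchContainer, the event strings, and the placeholder engine value are hypothetical; only the method names use_sa_engine, use_ssh_tunnel, and use_ssl_connection are taken from the description above.

```python
from contextlib import ExitStack, contextmanager


class SketchContainer:
    """Hypothetical stand-in for SqlAlchemyContainer; illustrative only."""

    def __init__(self):
        # Records the order in which layers are entered and exited.
        self.events = []

    @contextmanager
    def use_ssh_tunnel(self):
        self.events.append("ssh:open")
        try:
            yield
        finally:
            self.events.append("ssh:close")

    @contextmanager
    def use_ssl_connection(self):
        self.events.append("ssl:open")
        try:
            yield
        finally:
            self.events.append("ssl:close")

    @contextmanager
    def use_sa_engine(self):
        # Assumption: only two layers remain once the Kerberos layer is gone.
        with ExitStack() as stack:
            stack.enter_context(self.use_ssh_tunnel())
            stack.enter_context(self.use_ssl_connection())
            self.events.append("engine:ready")
            # Placeholder value; a real container would yield an engine object.
            yield "engine-placeholder"
```

ExitStack unwinds the layers in reverse order, so in this sketch the SSL layer is closed before the SSH tunnel, even if the body of the `with` block raises.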

Why this change?

Per the LOCAL-first refactor goal, we want a connector surface that we can credibly support. These three are by far the heaviest:

- Redshift ships its own DefaultDialect-derived SQLAlchemy dialect and a hand-rolled DBAPI wrapper, plus bespoke information_schema queries, case-insensitivity workarounds, and parameter-limit chunking. It's the largest single connector in the repo.
- Databricks owns a non-standard write path (parquet → temp Volume → COPY INTO), Azure ClientSecretCredential service-principal auth, and a fragile error-message-keyword mapper in is_accessible().
- Hive monkey-patches ImpalaDialect and HiveDialect for SA 2.0 compatibility and was the sole driver of the Kerberos plumbing in SqlAlchemyContainer and the kerberos/pyhive/impyla extras.
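
To illustrate what "parameter-limit chunking" means in general, here is a minimal sketch of splitting an insert into batches so that no single statement exceeds a driver's bind-parameter limit. It is not the removed Redshift code: the function name chunk_rows and the max_params argument are hypothetical, and the actual limit and batching strategy used by the connector are not stated in this PR.

```python
from typing import Iterator, Sequence


def chunk_rows(
    rows: Sequence[tuple], n_columns: int, max_params: int
) -> Iterator[Sequence[tuple]]:
    """Yield slices of rows so that len(slice) * n_columns <= max_params."""
    if n_columns <= 0 or n_columns > max_params:
        raise ValueError("n_columns must be between 1 and max_params")
    # Each row consumes one bind parameter per column.
    rows_per_chunk = max_params // n_columns
    for start in range(0, len(rows), rows_per_chunk):
        yield rows[start : start + rows_per_chunk]
```

Each yielded batch would then be sent as its own multi-row INSERT. Logic of this kind has to be maintained and tested per backend, which is part of the maintenance cost described above.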

None of these had end-to-end coverage in this repo (only SQLite is covered by tests/_local/end_to_end/test_connector.py), so dropping them removes both the most code and the most untested code.

The remaining DB connectors (postgresql, mysql, mariadb, mssql, oracle, snowflake, bigquery, sqlite) are thin SQLAlchemy wrappers around mostlyai/sdk/_data/db/base.py, and all file connectors are unaffected.

Testing

- uv run python -m compileall -q mostlyai/sdk: clean.
- uv run pytest tests/test_domain.py tests/_local/unit/ tests/client/unit/ tests/_data/unit/db/ tests/_data/unit/util/ tests/_data/unit/file/: 177 passed, 1 skipped.
- uv run pytest tests/_local/end_to_end/test_connector.py: 4 passed (SQLite read / write / delete / query).
- uv run ruff check on the modified files: all clean. (Pre-existing lint warnings in unrelated files under mostlyai/sdk/_local/execution/ and mostlyai/sdk/_data/{auto_detect,non_context}.py are inherited from the base branch and not touched here.)

Additional Notes

Public API impact: ConnectorType loses the HIVE, DATABRICKS, and REDSHIFT members. Existing connectors of those types stored on disk would no longer load — this is expected for a local-first major-release branch.
The shared ca_certificate field is intentionally retained on SqlAlchemyContainerParameters and SslCertificates; only the Hive-specific keystore/keystore_password fields are removed.
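
The load failure mentioned above follows from how Python enums validate values. The sketch below shows the general mechanism with a hypothetical ConnectorTypeSketch enum and a hypothetical load_connector_type helper; the member names, values, and error message are assumptions, and the SDK's own loading path may surface the error differently.

```python
from enum import Enum


class ConnectorTypeSketch(str, Enum):
    """Hypothetical reduced enum; not the SDK's actual ConnectorType."""

    postgres = "POSTGRES"
    sqlite = "SQLITE"


def load_connector_type(raw: str) -> ConnectorTypeSketch:
    """Map a stored string to an enum member, failing on removed types."""
    try:
        return ConnectorTypeSketch(raw)
    except ValueError as exc:
        # A value such as "HIVE" stored by an older release lands here.
        raise ValueError(f"connector type {raw!r} is no longer supported") from exc
```

Under these assumptions, a connector persisted with a removed type would have to be deleted or recreated with a supported type before it can be loaded again.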

These three were the most complex connectors in the codebase:

- Redshift shipped a custom DBAPI shim and SQLAlchemy dialect built from scratch on top of `DefaultDialect`, including hand-rolled `information_schema` introspection and case-insensitive table handling.
- Databricks owned a non-standard write path (parquet -> temp Volume -> COPY INTO), Azure service-principal auth via `azure-identity`, and a fragile error-message-keyword mapper.
- Hive monkey-patched both `ImpalaDialect` and `HiveDialect` for SA 2.0 compatibility, was the sole consumer of the kerberos plumbing in `SqlAlchemyContainer`, and pulled in three extras (`pyhive`, `impyla`, `kerberos`).

Remove the connector modules along with:

- `ConnectorType.{hive,databricks,redshift}` and their docstring entries in `domain.py` and `client/api.py`.
- `DatabricksContainerParameters` plus the keystore/kerberos fields on `SqlAlchemyContainerParameters` and `SslCertificates`.
- The `kerberized()` context manager and supporting kerberos plumbing in `_data/db/base.py`, plus `_data/util/kerberos.py` and its test.
- The `databricks`, `hive`, and `redshift` extras in `pyproject.toml` and the corresponding wording in the README.

Co-authored-by: Michi Platzer <michael.platzer@gmail.com>

mplatzer and others added 18 commits April 25, 2026 10:16

feat: make SDK local-first and remove platform-only surfaces

052aa48

Switch default startup behavior to LOCAL mode, keep CLIENT as explicit remote mode, and remove unsupported platform-era endpoints plus public OpenAPI model generation artifacts for the major release. Made-with: Cursor

refactor: remove remaining user-account platform API surface

c15e83f

Drop users/me client and local route support, remove CurrentUser and UserPlan domain models, and tighten tooling docs wording to match SDK-owned model maintenance. Made-with: Cursor

refactor: drop unsupported artifacts and integrations clients

acf8da6

Remove remaining platform-era client entrypoints that are not backed by the SDK local runtime API, keeping the public SDK surface aligned with supported endpoints. Made-with: Cursor

refactor: hard-prune domain models to SDK-supported API

d008167

Reduce domain.py to models transitively required by the currently supported SDK endpoints and remove remaining platform-era account, billing, notification, artifact, and assistant model surface. Made-with: Cursor

refactor: remove computes API and local dummy endpoint

81b7fde

Drop the unsupported computes client surface, remove the local /computes stub endpoint, and remove the obsolete ComputeListItem domain model to keep the SDK API runtime-only. Made-with: Cursor

refactor: remove remaining platform-era auth and metadata surface

e35a3ef

Drop API key and bearer-token support, remove assistant/user-org/visibility/compute metadata from the SDK domain and endpoints, prioritize local-first examples, and remove the legacy tools folder. Made-with: Cursor

refactor: remove usage statistics from SDK domain models

b33e84f

Drop connector, generator, and synthetic dataset usage stat models and fields, including no_of_threads and related no_of_* counters, and align list docstrings with the reduced runtime surface. Made-with: Cursor

refactor: remove sort_by from list endpoints

b0c035a

Drop sort_by query support from connector, generator, and synthetic dataset list APIs and remove remaining NO_OF_* sort-related surface now that listing is recency-only. Made-with: Cursor

chore: remove legacy client CI section and stale domain filters

f948a56

Drop the disabled client-mode E2E workflow job and remove outdated api_domain filters that referenced removed Share/LiteLlm/DataLlm/UsageReport surfaces. Made-with: Cursor

docs: remove MOSTLY_BASE_URL configuration mention

19ea662

Update initialization examples to require explicit base_url for client mode and remove remaining mention of MOSTLY_BASE_URL environment configuration. Made-with: Cursor

refactor: remove about and models endpoints from SDK API

c690ed2

Drop /about and /models endpoint support from local routes and client helpers, remove the remaining AboutService domain model, and update docs/tests accordingly. Made-with: Cursor

refactor: remove MOSTLY_LOCAL environment mode switch

e291ff1

Drop MOSTLY_LOCAL-based mode selection so SDK mode is configured only through explicit constructor arguments. Made-with: Cursor

refactor: remove slft query parameter from local routes

d122ba7

Drop short-lived file token arguments from local file/log download endpoints so the SDK API no longer exposes legacy platform-era token plumbing. Made-with: Cursor

chore: clean up local-first leftovers in packaging and metadata

fdcec9c

Fix Docker runtime entrypoint after tools removal, refresh local server description text, and update project homepage plus stale build excludes. Made-with: Cursor

docs: remove SDV comparison and external support/blog references

8d85af8

Make docs and tutorials local-first by removing SDV comparison notebooks, dropping support contact links, and cleaning external blog references while updating README install/quick-start ordering. Made-with: Cursor

docs: make quick start local-only with uv install

e81f820

Update Quick Start to show only LOCAL mode setup and use uv pip install for the primary install command. Made-with: Cursor

docs: simplify quick start local wording

dd88603

Remove redundant wording in the Quick Start local install sentence while keeping the local-first guidance intact. Made-with: Cursor

mplatzer marked this pull request as ready for review April 25, 2026 19:15

Base automatically changed from feat/local-first-major-release to main April 27, 2026 05:33

mplatzer merged commit 5dad631 into main Apr 27, 2026
11 checks passed

mplatzer deleted the cursor/local-first-drop-heavy-connectors-da97 branch April 27, 2026 05:33
