refactor: drop hive, databricks, and redshift connectors #721
Merged
Switch default startup behavior to LOCAL mode, keep CLIENT as explicit remote mode, and remove unsupported platform-era endpoints plus public OpenAPI model generation artifacts for the major release. Made-with: Cursor
Drop users/me client and local route support, remove CurrentUser and UserPlan domain models, and tighten tooling docs wording to match SDK-owned model maintenance. Made-with: Cursor
Remove remaining platform-era client entrypoints that are not backed by the SDK local runtime API, keeping the public SDK surface aligned with supported endpoints. Made-with: Cursor
Reduce domain.py to models transitively required by the currently supported SDK endpoints and remove remaining platform-era account, billing, notification, artifact, and assistant model surface. Made-with: Cursor
Drop the unsupported computes client surface, remove the local /computes stub endpoint, and remove the obsolete ComputeListItem domain model to keep the SDK API runtime-only. Made-with: Cursor
Drop API key and bearer-token support, remove assistant/user-org/visibility/compute metadata from the SDK domain and endpoints, prioritize local-first examples, and remove the legacy tools folder. Made-with: Cursor
Drop connector, generator, and synthetic dataset usage stat models and fields, including no_of_threads and related no_of_* counters, and align list docstrings with the reduced runtime surface. Made-with: Cursor
Drop sort_by query support from connector, generator, and synthetic dataset list APIs and remove remaining NO_OF_* sort-related surface now that listing is recency-only. Made-with: Cursor
Drop the disabled client-mode E2E workflow job and remove outdated api_domain filters that referenced removed Share/LiteLlm/DataLlm/UsageReport surfaces. Made-with: Cursor
Update initialization examples to require explicit base_url for client mode and remove remaining mention of MOSTLY_BASE_URL environment configuration. Made-with: Cursor
Drop /about and /models endpoint support from local routes and client helpers, remove the remaining AboutService domain model, and update docs/tests accordingly. Made-with: Cursor
Drop MOSTLY_LOCAL-based mode selection so SDK mode is configured only through explicit constructor arguments. Made-with: Cursor
Drop short-lived file token arguments from local file/log download endpoints so the SDK API no longer exposes legacy platform-era token plumbing. Made-with: Cursor
Fix Docker runtime entrypoint after tools removal, refresh local server description text, and update project homepage plus stale build excludes. Made-with: Cursor
Make docs and tutorials local-first by removing SDV comparison notebooks, dropping support contact links, and cleaning external blog references while updating README install/quick-start ordering. Made-with: Cursor
Update Quick Start to show only LOCAL mode setup and use uv pip install for the primary install command. Made-with: Cursor
Remove redundant wording in the Quick Start local install sentence while keeping the local-first guidance intact. Made-with: Cursor
These three were the most complex connectors in the codebase:
- Redshift shipped a custom DBAPI shim and SQLAlchemy dialect built from
scratch on top of `DefaultDialect`, including hand-rolled
`information_schema` introspection and case-insensitive table handling.
- Databricks owned a non-standard write path (parquet -> temp Volume ->
COPY INTO), Azure service-principal auth via `azure-identity`, and a
fragile error-message-keyword mapper.
- Hive monkey-patched both `ImpalaDialect` and `HiveDialect` for SA 2.0
compatibility, was the sole consumer of the kerberos plumbing in
`SqlAlchemyContainer`, and pulled in three extras
(`pyhive`, `impyla`, `kerberos`).
Remove the connector modules along with:
- `ConnectorType.{hive,databricks,redshift}` and their docstring entries
in `domain.py` and `client/api.py`.
- `DatabricksContainerParameters` plus the keystore/kerberos fields on
`SqlAlchemyContainerParameters` and `SslCertificates`.
- The `kerberized()` context manager and supporting kerberos plumbing in
`_data/db/base.py`, plus `_data/util/kerberos.py` and its test.
- The `databricks`, `hive`, and `redshift` extras in `pyproject.toml`
and the corresponding wording in the README.
Co-authored-by: Michi Platzer <michael.platzer@gmail.com>
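With `kerberized()` gone, the PR description says `use_sa_engine` only wraps `use_ssh_tunnel()` and `use_ssl_connection()`. The following is a minimal sketch of what that nesting could look like, not the SDK's actual code: the three functions are hypothetical stand-ins that share only their names with the real helpers, the nesting order (SSH tunnel outermost) is an assumption, and the `events` list exists purely to make the enter/exit order visible.

```python
from contextlib import contextmanager

events = []  # records enter/exit order so the nesting is observable


@contextmanager
def use_ssh_tunnel():
    # Hypothetical stand-in: the real helper would open an SSH tunnel here.
    events.append("ssh_tunnel:open")
    try:
        yield
    finally:
        events.append("ssh_tunnel:close")


@contextmanager
def use_ssl_connection():
    # Hypothetical stand-in: the real helper would prepare SSL material here.
    events.append("ssl:open")
    try:
        yield
    finally:
        events.append("ssl:close")


@contextmanager
def use_sa_engine():
    # No Kerberos layer: only the two remaining wrappers are nested
    # (assumed order: tunnel first, then SSL).
    with use_ssh_tunnel(), use_ssl_connection():
        events.append("engine:ready")
        yield "engine"  # placeholder for a SQLAlchemy engine object


with use_sa_engine() as engine:
    events.append(f"using {engine}")
```

Because the wrappers are plain context managers, they unwind in reverse order on exit, so removing one layer does not change how the remaining two clean up.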
Pull Request

Stacks on top of #720 (`feat/local-first-major-release`). Drops the three most complex data connectors so the LOCAL-first SDK only ships connectors we can actually maintain and test.

Changes

- Remove `mostlyai/sdk/_data/db/hive.py`, `mostlyai/sdk/_data/db/databricks.py`, and `mostlyai/sdk/_data/db/redshift.py`.
- Remove `ConnectorType.{hive, databricks, redshift}` from `mostlyai/sdk/domain.py` (enum + docstring entries) and the matching `mostly.connect` examples in `mostlyai/sdk/client/api.py`.
- Remove the `databricks`, `hive`, and `redshift` extras from `pyproject.toml` and refresh the README's connector list / install example.
- Remove `DatabricksContainerParameters` plus the Hive-only `keystore` / `keystore_password` fields and all Kerberos fields from `mostlyai/sdk/_data/metadata_objects.py`. The shared `ca_certificate` field is kept (still used by AWS S3 and Postgres SSL).
- In `mostlyai/sdk/_data/db/base.py`, remove the `kerberized()` context manager, all Kerberos init args / attributes / helpers, the `KRB5_CONF_TEMPLATE`, and now-unused `base64` / `hashlib` / `os` / `subprocess` imports. `use_sa_engine` now only wraps `use_ssh_tunnel()` and `use_ssl_connection()`.
- Remove `mostlyai/sdk/_data/util/kerberos.py` and its unit tests (`tests/_data/unit/util/test_kerberos.py`); these were used exclusively by the Hive connector.
- Update `uv.lock` accordingly.

Why this change?

Per the LOCAL-first refactor goal, we want a connector surface that we can credibly support. These three are by far the heaviest:

- Redshift: a `DefaultDialect`-derived SQLAlchemy dialect and a hand-rolled DBAPI wrapper, plus bespoke `information_schema` queries, case-insensitivity workarounds, and parameter-limit chunking. It's the largest single connector in the repo.
- Databricks: a non-standard write path (parquet -> temp Volume -> `COPY INTO`), Azure `ClientSecretCredential` service-principal auth, and a fragile error-message-keyword mapper in `is_accessible()`.
- Hive: monkey-patched both `ImpalaDialect` and `HiveDialect` for SA 2.0 compatibility and was the sole driver of the Kerberos plumbing in `SqlAlchemyContainer` and the `kerberos` / `pyhive` / `impyla` extras.

None of these had end-to-end coverage in this repo (only SQLite is covered by `tests/_local/end_to_end/test_connector.py`), so dropping them removes both the most code and the most untested code.

The remaining DB connectors (`postgresql`, `mysql`, `mariadb`, `mssql`, `oracle`, `snowflake`, `bigquery`, `sqlite`) are thin SQLAlchemy wrappers around `mostlyai/sdk/_data/db/base.py`, and all file connectors are unaffected.

Testing

- `uv run python -m compileall -q mostlyai/sdk`: clean.
- `uv run pytest tests/test_domain.py tests/_local/unit/ tests/client/unit/ tests/_data/unit/db/ tests/_data/unit/util/ tests/_data/unit/file/`: 177 passed, 1 skipped.
- `uv run pytest tests/_local/end_to_end/test_connector.py`: 4 passed (SQLite read / write / delete / query).
- `uv run ruff check` on the modified files: all clean. (Pre-existing lint warnings in unrelated files under `mostlyai/sdk/_local/execution/` and `mostlyai/sdk/_data/{auto_detect,non_context}.py` are inherited from the base branch and not touched here.)

Additional Notes

- `ConnectorType` loses the `HIVE`, `DATABRICKS`, and `REDSHIFT` members. Existing connectors of those types stored on disk would no longer load; this is expected for a local-first major-release branch.
- The `ca_certificate` field is intentionally retained on `SqlAlchemyContainerParameters` and `SslCertificates`; only the Hive-specific `keystore` / `keystore_password` fields are removed.
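
To illustrate the note that stored connectors of the removed types would no longer load, here is a small sketch using a plain `Enum`. The `ConnectorType` class below is a hypothetical stand-in, not the SDK's model: it lists only the DB connector names mentioned in this PR, the uppercase string values are assumed, and `load_connector_type` is an invented helper. How the real SDK surfaces the failure when reading from disk may differ.

```python
from enum import Enum


class ConnectorType(str, Enum):
    # Hypothetical stand-in for the enum after this PR. Member names follow
    # the DB connectors listed in the description; the values are assumed.
    postgresql = "POSTGRESQL"
    mysql = "MYSQL"
    mariadb = "MARIADB"
    mssql = "MSSQL"
    oracle = "ORACLE"
    snowflake = "SNOWFLAKE"
    bigquery = "BIGQUERY"
    sqlite = "SQLITE"


def load_connector_type(raw: str) -> ConnectorType:
    # Lookup by value raises ValueError for members that no longer exist.
    try:
        return ConnectorType(raw)
    except ValueError as exc:
        raise ValueError(f"unsupported connector type: {raw!r}") from exc


# Simulated type strings as they might appear in previously stored configs.
stored = ["SQLITE", "HIVE", "POSTGRESQL", "REDSHIFT"]
loadable, rejected = [], []
for raw in stored:
    try:
        loadable.append(load_connector_type(raw))
    except ValueError:
        rejected.append(raw)
```

After running this, `loadable` holds the SQLite and PostgreSQL members while `rejected` holds `"HIVE"` and `"REDSHIFT"`, which mirrors the expected breakage for on-disk connectors of the dropped types.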