feat: add DuckDB, Trino, Dremio & Spark support to CI and CLI #2135
devin-ai-integration[bot] wants to merge 85 commits into master from
Conversation
- pyproject.toml: add dbt-duckdb and dbt-dremio as optional dependencies
- Docker config files for Trino and Spark (non-credential files)
- test-all-warehouses.yml: add duckdb, trino, dremio, spark to CI matrix
- schema.yml: update data_type expressions for new adapter type mappings
- test_alerts_union.sql: exclude schema_changes for Spark (like Databricks)
- drop_test_schemas.sql: add dispatched edr_drop_schema for all new adapters
- transient_errors.py: add spark and duckdb entries to _ADAPTER_PATTERNS
- get_adapter_type_and_unique_id.sql: add duckdb dispatch (uses target.path)

Co-Authored-By: Itamar Hartstein <haritamar@gmail.com>
📝 Walkthrough

Adds support for four new warehouses—DuckDB, Trino, Dremio, and Spark—across CI workflows, dependencies, dbt profiles/macros, transient error handling, Docker orchestration, test macros, deterministic e2e data generation, and e2e test assets.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    autonumber
    participant GH as GitHub Actions
    participant DC as Docker Compose
    participant MINIO as MinIO
    participant MS as Metastore
    participant SVC as Service (Trino/Dremio/Spark)
    participant DBT as dbt Runner
    GH->>DC: start orchestration (docker-compose up)
    DC->>MINIO: start MinIO
    DC->>MS: start Metastore (Hive/Postgres)
    DC->>SVC: start main service (Trino/Dremio/Spark)
    SVC-->>DC: respond to health checks
    MINIO-->>DC: report ready
    MS-->>DC: report ready
    DC->>GH: services ready
    GH->>DBT: restore seed cache (if available) & run seeds
    DBT->>SVC: connect and execute tests
    DBT-->>GH: return results
```
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes
🚥 Pre-merge checks: ✅ 2 | ❌ 1

❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
Actionable comments posted: 6
🧹 Nitpick comments (2)
tests/e2e_dbt_project/docker/spark/Dockerfile (1)
1-30: Run the Spark container as a non-root user. The image currently runs as root; adding a dedicated runtime user improves baseline container hardening.
Suggested hardening diff
```diff
 COPY entrypoint.sh /scripts/
-RUN chmod +x /scripts/entrypoint.sh
+RUN chmod +x /scripts/entrypoint.sh && \
+    groupadd -r spark && useradd -r -g spark -m spark && \
+    chown -R spark:spark /usr/spark /scripts
+USER spark
 ENTRYPOINT ["/scripts/entrypoint.sh"]
 CMD ["--help"]
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/e2e_dbt_project/docker/spark/Dockerfile` around lines 1 - 30, The Dockerfile runs the Spark image as root; create and switch to a non-root runtime user to improve container hardening by adding a dedicated user/group (e.g., spark:spark) after installing packages and before COPY/ENTRYPOINT, ensure ownership of SPARK_HOME (/usr/spark) and /scripts is changed to that user (chown -R spark:spark /usr/spark /scripts) and then add a USER spark line so the ENTRYPOINT ("/scripts/entrypoint.sh") and CMD run unprivileged; make sure any later runtime actions in entrypoint.sh do not require root or handle privilege escalation explicitly.

tests/e2e_dbt_project/docker-compose.yml (1)
106-106: Pin new container images to explicit versions/digests. Using `latest` (or no tag) makes CI non-deterministic and harder to debug when upstream images change.

🔧 Suggested direction

```diff
-  image: projectnessie/nessie:latest
+  image: projectnessie/nessie:<pinned-version-or-digest>
-  image: minio/minio:latest
+  image: minio/minio:<pinned-version-or-digest>
-  image: minio/mc
+  image: minio/mc:<pinned-version-or-digest>
-  image: dremio/dremio-oss:latest
+  image: dremio/dremio-oss:<pinned-version-or-digest>
-  image: alpine:latest
+  image: alpine:<pinned-version-or-digest>
```

Also applies to: 114-114, 132-132, 150-150, 175-175
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/e2e_dbt_project/docker-compose.yml` at line 106, Replace unconstrained image tags using "latest" with explicit version tags or digests to ensure deterministic CI; update the image entry "image: projectnessie/nessie:latest" in the docker-compose.yml to a specific version or sha256 digest, and apply the same change for the other image entries referenced (lines 114, 132, 150, 175) so all container images are pinned to fixed tags/digests.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In @.github/workflows/test-warehouse.yml:
- Around line 109-113: The readiness loops for Trino, Dremio and Spark currently
break on success but never fail on timeout; update each curl loop (the
Trino/Dremio/Spark readiness checks) to detect if the loop completed without
success by adding a post-loop conditional that checks whether the last curl
succeeded and, if not, prints an error and exits non‑zero (e.g., after the seq
loop, if the curl never returned success then echo "Timed out waiting for
<Service>" and exit 1). Ensure you apply this pattern to the Trino readiness
loop and the analogous Dremio and Spark loops so the job fails fast when
dependencies never become healthy.
- Line 216: The DuckDB target is using an in-memory database (path ":memory:")
which cannot persist across workflow steps; change the DuckDB configuration used
in CI so it uses a persistent file path (for example set path to
"/tmp/elementary_test.db") or exclude DuckDB from the multi-step workflow when
in-memory testing is intended; update the workflow logic that references
inputs.warehouse-type (and any DuckDB-specific job/step configs) to use the
persistent file path or add a conditional to skip DuckDB when
inputs.warehouse-type == 'duckdb' and in-memory mode is active.
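The timeout-fail pattern the readiness-loop comment above describes can be sketched as a small helper. This is a minimal sketch, not the workflow's actual code: the function name `wait_for_service` and its defaults are illustrative.

```shell
# wait_for_service: run a check command repeatedly until it succeeds,
# or fail with a non-zero exit once the attempts are exhausted.
# Usage: wait_for_service <max_attempts> <sleep_seconds> <command...>
# Helper name and argument layout are illustrative, not from the workflow.
wait_for_service() {
  max_attempts="$1"; shift
  sleep_seconds="$1"; shift
  attempt=1
  while [ "$attempt" -le "$max_attempts" ]; do
    if "$@" > /dev/null 2>&1; then
      return 0
    fi
    echo "Waiting... ($attempt/$max_attempts)" >&2
    sleep "$sleep_seconds"
    attempt=$((attempt + 1))
  done
  echo "Timed out after $max_attempts attempts" >&2
  return 1
}

# In the workflow this could replace the bare curl loop, e.g.:
# wait_for_service 60 5 curl -sf http://localhost:8086/v1/info || exit 1
```

Because the helper's exit status reflects the timeout, a `|| exit 1` at the call site gives the fail-fast behavior the review asks for.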
In `@tests/e2e_dbt_project/docker-compose.yml`:
- Around line 95-100: The trino-mc-job entrypoint uses a fixed sleep which can
fail if MinIO starts slower; update the trino-mc-job entrypoint to replace
"sleep 5" with a readiness loop that waits for minio:9000 to accept connections
(e.g., a small loop using curl or nc that retries until success) before running
the mc commands, then proceed with "/usr/bin/mc config ...", "/usr/bin/mc mb
..."; modify the entrypoint block in the trino-mc-job service so it polls minio
(hostname "minio", port 9000) and exits the loop only when reachable.
In `@tests/e2e_dbt_project/docker/dremio/dremio-setup.sh`:
- Around line 15-33: The script currently treats AUTH_TOKEN and the S3 source
creation as always successful; update the AUTH_TOKEN acquisition and the
subsequent curl call that creates S3Source to fail fast: after the login curl
that sets AUTH_TOKEN (variable AUTH_TOKEN), check for empty string OR the
literal "null" and exit non‑zero with an error message if found; for both curl
calls (the login POST and the PUT to create S3Source) capture HTTP status (e.g.,
via curl -w/%{http_code}% and/or -f) and the response body, verify the status is
a 2xx before proceeding, and on non‑2xx log the response body and exit 1 so the
script does not print "Obtained Dremio auth token" or "S3 Source created in
Dremio" on failure.
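The validation steps that prompt describes can be sketched as two small helpers. The function names `require_2xx` and `require_token` are illustrative; the actual script would inline these checks around its existing curl calls.

```shell
# require_2xx: fail unless an HTTP status code is in the 2xx range.
# require_token: fail on an empty or literal "null" token.
# Both helper names are illustrative sketches, not part of dremio-setup.sh.
require_2xx() {
  code="$1"; what="$2"
  if [ "$code" -lt 200 ] || [ "$code" -ge 300 ]; then
    echo "$what failed (HTTP $code)" >&2
    return 1
  fi
}

require_token() {
  token="$1"
  if [ -z "$token" ] || [ "$token" = "null" ]; then
    echo "Failed to obtain Dremio auth token" >&2
    return 1
  fi
}

# Sketch of how the login call (endpoint from the script above) could use them;
# the temp-file path is an assumption:
# HTTP_CODE=$(curl -sS -o /tmp/login.json -w "%{http_code}" \
#   -X POST "http://dremio:9047/apiv2/login" ...)
# require_2xx "$HTTP_CODE" "Dremio login" || exit 1
# AUTH_TOKEN=$(jq -r '.token // empty' /tmp/login.json)
# require_token "$AUTH_TOKEN" || exit 1
```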
In `@tests/e2e_dbt_project/macros/system/drop_test_schemas.sql`:
- Around line 43-53: The DROP macros interpolate schema_name unquoted which
breaks on special chars/reserved words; update dremio__edr_drop_schema,
duckdb__edr_drop_schema, and spark__edr_drop_schema to use
adapter.quote(schema_name) (same pattern as clickhouse__edr_drop_schema) when
building the raw DROP SQL so the identifier is safely quoted (e.g., replace
plain schema_name concatenation with adapter.quote(schema_name) in the run_query
argument).
In `@tests/profiles/profiles.yml.j2`:
- Around line 49-53: The DuckDB profile currently uses the process-local
in-memory path ":memory:", which breaks across SubprocessDbtRunner subprocesses;
update the profiles template (the duckdb &duckdb block) to use a persistent or
shared DB URI instead of ":memory:"—either switch to a file-backed path (e.g., a
temp DB file under a test-specific directory) or use a shared-memory URI like
the SQLite-style shared cache (e.g., file:...?mode=memory&cache=shared) so
multiple dbt subprocesses can see the same database; keep the schema
interpolation ({{ schema_name }}) and threads setting unchanged.
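A file-backed variant of that DuckDB profile block could look roughly like the following. The path is an assumption for illustration; the real template keeps the `{{ schema_name }}` interpolation, and `threads: 1` follows the concurrent-commit note elsewhere in this PR.

```yaml
duckdb: &duckdb
  type: duckdb
  # Illustrative file-backed path so every dbt subprocess sees the same DB;
  # the actual template may use a test-specific temp directory instead.
  path: /tmp/elementary_e2e.duckdb
  schema: "{{ schema_name }}"
  threads: 1
```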
---
Nitpick comments:
In `@tests/e2e_dbt_project/docker-compose.yml`:
- Line 106: Replace unconstrained image tags using "latest" with explicit
version tags or digests to ensure deterministic CI; update the image entry
"image: projectnessie/nessie:latest" in the docker-compose.yml to a specific
version or sha256 digest, and apply the same change for the other image entries
referenced (lines 114, 132, 150, 175) so all container images are pinned to
fixed tags/digests.
In `@tests/e2e_dbt_project/docker/spark/Dockerfile`:
- Around line 1-30: The Dockerfile runs the Spark image as root; create and
switch to a non-root runtime user to improve container hardening by adding a
dedicated user/group (e.g., spark:spark) after installing packages and before
COPY/ENTRYPOINT, ensure ownership of SPARK_HOME (/usr/spark) and /scripts is
changed to that user (chown -R spark:spark /usr/spark /scripts) and then add a
USER spark line so the ENTRYPOINT ("/scripts/entrypoint.sh") and CMD run
unprivileged; make sure any later runtime actions in entrypoint.sh do not
require root or handle privilege escalation explicitly.
ℹ️ Review info
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Disabled knowledge base sources:
- Linear integration is disabled
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (20)
- .github/workflows/test-all-warehouses.yml
- .github/workflows/test-warehouse.yml
- elementary/clients/dbt/transient_errors.py
- elementary/monitor/dbt_project/macros/get_adapter_type_and_unique_id.sql
- pyproject.toml
- tests/e2e_dbt_project/docker-compose.yml
- tests/e2e_dbt_project/docker/dremio/dremio-setup.sh
- tests/e2e_dbt_project/docker/spark/Dockerfile
- tests/e2e_dbt_project/docker/spark/entrypoint.sh
- tests/e2e_dbt_project/docker/spark/hive-site.xml
- tests/e2e_dbt_project/docker/spark/spark-defaults.conf
- tests/e2e_dbt_project/docker/trino/catalog/iceberg.properties
- tests/e2e_dbt_project/docker/trino/catalog/memory.properties
- tests/e2e_dbt_project/docker/trino/etc/config.properties
- tests/e2e_dbt_project/docker/trino/etc/jvm.config
- tests/e2e_dbt_project/docker/trino/etc/node.properties
- tests/e2e_dbt_project/macros/system/drop_test_schemas.sql
- tests/e2e_dbt_project/models/schema.yml
- tests/e2e_dbt_project/models/test_alerts_union.sql
- tests/profiles/profiles.yml.j2
- DuckDB: use file-backed DB path instead of :memory: to persist across subprocess calls, and reduce threads to 1 to avoid concurrent commit errors
- Spark: install dbt-spark[PyHive] extras required for thrift connection method
- Dremio: add dremio__target_database() dispatch override in e2e project to return target.database (upstream elementary package lacks this dispatch)

Co-Authored-By: Itamar Hartstein <haritamar@gmail.com>
♻️ Duplicate comments (1)
.github/workflows/test-warehouse.yml (1)
109-113: ⚠️ Potential issue | 🟠 Major: Fail fast when readiness polling times out.
Line 109, Line 124, and Line 141 loops still continue even if services never become ready. This can allow later steps to run against unhealthy dependencies and fail non-deterministically.
Suggested fix pattern (apply to all three loops)
```diff
-for i in $(seq 1 60); do
-  curl -sf http://localhost:8086/v1/info && break
+ready=0
+for i in $(seq 1 60); do
+  if curl -sf http://localhost:8086/v1/info > /dev/null; then
+    ready=1
+    break
+  fi
   echo "Waiting for Trino... ($i/60)"
   sleep 5
 done
+if [ "$ready" -ne 1 ]; then
+  echo "Timed out waiting for Trino"
+  exit 1
+fi
```

Also applies to: 124-128, 141-145
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In @.github/workflows/test-warehouse.yml around lines 109 - 113, The readiness polling loops that use "for i in $(seq 1 60); do curl -sf ... && break; ... done" should fail fast when the max attempts are reached; update each such loop (the Trino/other service readiness loops) to check after sleeping whether i equals the max (e.g., 60) and, if so, print an error to stderr and exit 1 so the workflow stops rather than continuing with unhealthy dependencies. Ensure the same pattern is applied to all three polling loops to provide deterministic failure on timeout.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Duplicate comments:
In @.github/workflows/test-warehouse.yml:
- Around line 109-113: The readiness polling loops that use "for i in $(seq 1
60); do curl -sf ... && break; ... done" should fail fast when the max attempts
are reached; update each such loop (the Trino/other service readiness loops) to
check after sleeping whether i equals the max (e.g., 60) and, if so, print an
error to stderr and exit 1 so the workflow stops rather than continuing with
unhealthy dependencies. Ensure the same pattern is applied to all three polling
loops to provide deterministic failure on timeout.
ℹ️ Review info
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Disabled knowledge base sources:
- Linear integration is disabled
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (3)
- .github/workflows/test-warehouse.yml
- tests/e2e_dbt_project/macros/system/dremio_target_database.sql
- tests/profiles/profiles.yml.j2
🚧 Files skipped from review as they are similar to previous changes (1)
- tests/profiles/profiles.yml.j2
Plain S3 sources in Dremio do not support CREATE TABLE (needed for dbt seed). Switch to Nessie catalog source which supports table creation via Iceberg. Co-Authored-By: Itamar Hartstein <haritamar@gmail.com>
Actionable comments posted: 1
♻️ Duplicate comments (1)
tests/e2e_dbt_project/docker/dremio/dremio-setup.sh (1)
15-31: ⚠️ Potential issue | 🟠 Major: Fail fast on login/source API errors and reject "null" tokens.

At Line 20, checking only -z misses the literal "null" token case; and Lines 15-31 do not validate HTTP status for either API call, so setup can report success on non-2xx responses.

🔧 Suggested patch
```diff
-# Log in to Dremio to get the auth token
-AUTH_TOKEN=$(curl -s -X POST "http://dremio:9047/apiv2/login" \
-  -H "Content-Type: application/json" \
-  --data "{\"userName\":\"dremio\", \"password\":\"dremio123\"}" | jq -r .token)
+# Log in to Dremio to get the auth token
+LOGIN_RESPONSE="$(mktemp)"
+LOGIN_HTTP_CODE=$(curl -sS -o "$LOGIN_RESPONSE" -w "%{http_code}" -X POST "http://dremio:9047/apiv2/login" \
+  -H "Content-Type: application/json" \
+  --data "{\"userName\":\"dremio\", \"password\":\"dremio123\"}")
+if [ "$LOGIN_HTTP_CODE" -lt 200 ] || [ "$LOGIN_HTTP_CODE" -ge 300 ]; then
+  echo "Failed to log in to Dremio (HTTP $LOGIN_HTTP_CODE)"
+  cat "$LOGIN_RESPONSE"
+  exit 1
+fi
+AUTH_TOKEN=$(jq -r '.token // empty' "$LOGIN_RESPONSE")

 # Check if AUTH_TOKEN is not empty
-if [ -z "$AUTH_TOKEN" ]; then
+if [ -z "$AUTH_TOKEN" ] || [ "$AUTH_TOKEN" = "null" ]; then
   echo "Failed to obtain Dremio auth token"
   exit 1
 fi
 echo "Obtained Dremio auth token"

 # Create a Nessie catalog source in Dremio (supports CREATE TABLE for dbt seed)
-curl -s -X PUT "http://dremio:9047/apiv2/source/NessieSource" \
+SOURCE_RESPONSE="$(mktemp)"
+SOURCE_HTTP_CODE=$(curl -sS -o "$SOURCE_RESPONSE" -w "%{http_code}" -X PUT "http://dremio:9047/apiv2/source/NessieSource" \
   -H "Content-Type: application/json" \
   -H "Authorization: _dremio$AUTH_TOKEN" \
-  --data "{\"name\":\"NessieSource\",\"config\":{\"nessieEndpoint\":\"http://catalog:19120/api/v2\",\"nessieAuthType\":\"NONE\",\"credentialType\":\"ACCESS_KEY\",\"awsAccessKey\":\"admin\",\"awsAccessSecret\":\"password\",\"awsRootPath\":\"datalake\",\"secure\":false,\"propertyList\":[{\"name\":\"fs.s3a.path.style.access\",\"value\":\"true\"},{\"name\":\"fs.s3a.endpoint\",\"value\":\"dremio-storage:9000\"},{\"name\":\"dremio.s3.compat\",\"value\":\"true\"}]},\"type\":\"NESSIE\",\"metadataPolicy\":{\"deleteUnavailableDatasets\":true,\"autoPromoteDatasets\":false,\"namesRefreshMillis\":3600000,\"datasetDefinitionRefreshAfterMillis\":3600000,\"datasetDefinitionExpireAfterMillis\":10800000,\"authTTLMillis\":86400000,\"updateMode\":\"PREFETCH_QUERIED\"}}"
+  --data "{\"name\":\"NessieSource\",\"config\":{\"nessieEndpoint\":\"http://catalog:19120/api/v2\",\"nessieAuthType\":\"NONE\",\"credentialType\":\"ACCESS_KEY\",\"awsAccessKey\":\"admin\",\"awsAccessSecret\":\"password\",\"awsRootPath\":\"datalake\",\"secure\":false,\"propertyList\":[{\"name\":\"fs.s3a.path.style.access\",\"value\":\"true\"},{\"name\":\"fs.s3a.endpoint\",\"value\":\"dremio-storage:9000\"},{\"name\":\"dremio.s3.compat\",\"value\":\"true\"}]},\"type\":\"NESSIE\",\"metadataPolicy\":{\"deleteUnavailableDatasets\":true,\"autoPromoteDatasets\":false,\"namesRefreshMillis\":3600000,\"datasetDefinitionRefreshAfterMillis\":3600000,\"datasetDefinitionExpireAfterMillis\":10800000,\"authTTLMillis\":86400000,\"updateMode\":\"PREFETCH_QUERIED\"}}")
+if [ "$SOURCE_HTTP_CODE" -lt 200 ] || [ "$SOURCE_HTTP_CODE" -ge 300 ]; then
+  echo "Failed to create Nessie Source in Dremio (HTTP $SOURCE_HTTP_CODE)"
+  cat "$SOURCE_RESPONSE"
+  exit 1
+fi
```

```shell
#!/bin/bash
# Verify login/source calls capture and validate HTTP status and token nullability.
FILE="$(fd 'dremio-setup.sh' | head -n1)"
nl -ba "$FILE" | sed -n '12,90p'
rg -n 'apiv2/login|apiv2/source/NessieSource|http_code|AUTH_TOKEN|jq -r| -z ' "$FILE"
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/e2e_dbt_project/docker/dremio/dremio-setup.sh` around lines 15 - 31, The login and source API calls must validate HTTP status and reject a literal "null" token; modify the AUTH_TOKEN retrieval block to capture both response body and HTTP status from the login curl (use curl -w to produce HTTP code and capture body separately), then extract token with jq -r .token and fail if the HTTP status is not 2xx or if AUTH_TOKEN is empty or equals "null" (print the response body/error and exit 1); do the same for the PUT creating NessieSource: capture response body and HTTP status, and exit non-zero with the response on any non-2xx status. Ensure you reference the existing AUTH_TOKEN variable, the login curl to "apiv2/login", and the PUT to "apiv2/source/NessieSource" when making these checks.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@tests/e2e_dbt_project/docker/dremio/dremio-setup.sh`:
- Around line 7-10: The readiness loop that uses "until curl -s
http://dremio:9047; do ... done" must be bounded: add a MAX_ATTEMPTS variable
and a counter, set curl timeouts (e.g., --connect-timeout and/or --max-time) so
each try fails fast, increment the counter inside the loop, and when attempts >=
MAX_ATTEMPTS log an error and exit non-zero; update the existing until/curl loop
code in the script to use these variables and checks so CI will fail instead of
hanging indefinitely.
---
Duplicate comments:
In `@tests/e2e_dbt_project/docker/dremio/dremio-setup.sh`:
- Around line 15-31: The login and source API calls must validate HTTP status
and reject a literal "null" token; modify the AUTH_TOKEN retrieval block to
capture both response body and HTTP status from the login curl (use curl -w to
produce HTTP code and capture body separately), then extract token with jq -r
.token and fail if the HTTP status is not 2xx or if AUTH_TOKEN is empty or
equals "null" (print the response body/error and exit 1); do the same for the
PUT creating NessieSource: capture response body and HTTP status, and exit
non-zero with the response on any non-2xx status. Ensure you reference the
existing AUTH_TOKEN variable, the login curl to "apiv2/login", and the PUT to
"apiv2/source/NessieSource" when making these checks.
ℹ️ Review info
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Disabled knowledge base sources:
- Linear integration is disabled
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (2)
- tests/e2e_dbt_project/docker/dremio/dremio-setup.sh
- tests/profiles/profiles.yml.j2
🚧 Files skipped from review as they are similar to previous changes (1)
- tests/profiles/profiles.yml.j2
```shell
until curl -s http://dremio:9047; do
  echo "Waiting for Dremio..."
  sleep 5
done
```
🧩 Analysis chain
🏁 Script executed:
fd 'dremio-setup.sh' --type f

Repository: elementary-data/elementary
Length of output: 121
🏁 Script executed:
cat -n tests/e2e_dbt_project/docker/dremio/dremio-setup.sh

Repository: elementary-data/elementary
Length of output: 1945
Bound the Dremio readiness wait to avoid CI hangs.
Lines 7-10 create an unbounded loop that retries indefinitely if Dremio is unreachable or if curl cannot connect. This can stall the CI pipeline with no failure signal. The loop needs an explicit max-attempts limit, a timeout on curl, and a hard exit on failure.
🔧 Suggested patch
+max_attempts=60
+attempt=1
-until curl -s http://dremio:9047; do
+until curl --silent --show-error --fail --max-time 3 http://dremio:9047 >/dev/null; do
+ if [ "$attempt" -ge "$max_attempts" ]; then
+ echo "Dremio did not become ready after $max_attempts attempts"
+ exit 1
+ fi
echo "Waiting for Dremio..."
sleep 5
+ attempt=$((attempt + 1))
done

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
Original:

```shell
until curl -s http://dremio:9047; do
  echo "Waiting for Dremio..."
  sleep 5
done
```

Suggested:

```shell
max_attempts=60
attempt=1
until curl --silent --show-error --fail --max-time 3 http://dremio:9047 >/dev/null; do
  if [ "$attempt" -ge "$max_attempts" ]; then
    echo "Dremio did not become ready after $max_attempts attempts"
    exit 1
  fi
  echo "Waiting for Dremio..."
  sleep 5
  attempt=$((attempt + 1))
done
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@tests/e2e_dbt_project/docker/dremio/dremio-setup.sh` around lines 7 - 10, The
readiness loop that uses "until curl -s http://dremio:9047; do ... done" must be
bounded: add a MAX_ATTEMPTS variable and a counter, set curl timeouts (e.g.,
--connect-timeout and/or --max-time) so each try fails fast, increment the
counter inside the loop, and when attempts >= MAX_ATTEMPTS log an error and exit
non-zero; update the existing until/curl loop code in the script to use these
variables and checks so CI will fail instead of hanging indefinitely.
- Make generate_data.py deterministic (fixed random seed)
- Use fixed schema name for Docker adapters (ephemeral containers)
- Cache seeded Docker volumes between runs using actions/cache
- Cache DuckDB database file between runs
- Skip dbt seed on cache hit, restoring from cached volumes instead
- Applies to: Spark, Trino, Dremio, Postgres, ClickHouse, DuckDB

Co-Authored-By: Itamar Hartstein <haritamar@gmail.com>
Actionable comments posted: 1
♻️ Duplicate comments (1)
.github/workflows/test-warehouse.yml (1)
109-113: ⚠️ Potential issue | 🟠 Major: Readiness loops still don’t fail on timeout.
These checks can still silently continue after 60 attempts if the service never becomes healthy, which allows downstream steps to run against broken dependencies.
Also applies to: 124-128, 141-145
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In @.github/workflows/test-warehouse.yml around lines 109 - 113, The readiness loops using "for i in $(seq 1 60); do curl -sf http://localhost:8086/v1/info && break; ... done" do not fail when the service never becomes healthy; update each such loop (the curl-based readiness loops at the three occurrences) to detect timeout and exit non-zero if the loop completes without success — e.g., after the done add a check like "if [ $i -eq 60 ] && ! curl -sf http://localhost:8086/v1/info; then echo 'Service did not become ready' >&2; exit 1; fi" (or equivalent) so the workflow fails instead of continuing.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In @.github/workflows/test-warehouse.yml:
- Around line 249-264: The "Restore cached seed data into Docker volumes"
workflow step currently restores tarballs into volumes after the warehouse
services have started (uses CACHE_DIR, docker volume create, docker run ... tar
xzf), which can corrupt live DB files; move this step to run before the job step
that starts warehouse containers (the service startup block) so volumes are
populated prior to container creation, or alternatively modify the step to stop
the warehouse containers before restore and then restart and run a readiness
check (e.g., wait-for/postgres healthcheck) after extraction to ensure services
reinitialize cleanly.
---
Duplicate comments:
In @.github/workflows/test-warehouse.yml:
- Around line 109-113: The readiness loops using "for i in $(seq 1 60); do curl
-sf http://localhost:8086/v1/info && break; ... done" do not fail when the
service never becomes healthy; update each such loop (the curl-based readiness
loops at the three occurrences) to detect timeout and exit non-zero if the loop
completes without success — e.g., after the done add a check like "if [ $i -eq
60 ] && ! curl -sf http://localhost:8086/v1/info; then echo 'Service did not
become ready' >&2; exit 1; fi" (or equivalent) so the workflow fails instead of
continuing.
ℹ️ Review info
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Disabled knowledge base sources:
- Linear integration is disabled
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (2)
- .github/workflows/test-warehouse.yml
- tests/e2e_dbt_project/generate_data.py
Addresses CodeRabbit review: restoring cached tarballs into Docker volumes while containers are already running risks data corruption. Now the cache key computation and volume restore happen before any Docker services are started, so containers initialise with the pre-seeded data. Co-Authored-By: Itamar Hartstein <haritamar@gmail.com>
Actionable comments posted: 1
♻️ Duplicate comments (1)
.github/workflows/test-warehouse.yml (1)
160-164: ⚠️ Potential issue | 🟠 Major: Readiness loops still don’t fail the job on timeout.

On Line 160-164, Line 175-179, and Line 192-196, the loops stop on success but never exit 1 when retries are exhausted. The workflow can continue with unhealthy dependencies and fail later with misleading errors.

Fail-fast pattern to apply to each loop

```diff
-for i in $(seq 1 60); do
-  curl -sf http://localhost:8086/v1/info && break
-  echo "Waiting for Trino... ($i/60)"
-  sleep 5
-done
+ready=0
+for i in $(seq 1 60); do
+  if curl -sf http://localhost:8086/v1/info > /dev/null; then
+    ready=1
+    break
+  fi
+  echo "Waiting for Trino... ($i/60)"
+  sleep 5
+done
+if [ "$ready" -ne 1 ]; then
+  echo "Timed out waiting for Trino"
+  exit 1
+fi
```

Use the same timeout-fail pattern for Dremio and Spark checks.
Also applies to: 175-179, 192-196
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In @.github/workflows/test-warehouse.yml around lines 160 - 164, The readiness loops that poll services (the Trino loop shown and the similar Dremio and Spark loops) currently break on success but do nothing when retries are exhausted; update each loop so that if the curl never succeeds within the retries it fails the job (exit 1). Concretely, after each for/seq loop (the Trino, Dremio, and Spark readiness checks), check whether the last curl succeeded and, if not, print an error and run exit 1 (or set a failure flag inside the loop and exit after the loop) so the workflow stops immediately on timeout.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In @.github/workflows/test-warehouse.yml:
- Around line 96-103: The SEED_HASH calculation (the SEED_HASH env var block)
currently hashes generate_data.py, dbt_project.yml and the profiles.yml.j2
template but omits the Docker stack/compose definition, allowing cached DB
volume tarballs to be restored against incompatible images or volume layouts;
update the SEED_HASH command to also include the Docker stack/compose file(s)
used to bring up the test DB (e.g., docker-compose.yml,
docker-compose.override.yml or any stack definition referenced by the workflow)
so changes in images/layouts change the cache key and prevent restoring
incompatible tarballs.
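A minimal sketch of a cache key that also covers the compose file, as that comment suggests. The function name and file list are illustrative assumptions; the real workflow step may hash different paths or inline the command:

```shell
# compute_seed_hash: derive one cache key from every file that affects the
# seeded data, including the Docker stack definition, so a change to any of
# them invalidates cached volume tarballs. Helper name is illustrative.
compute_seed_hash() {
  cat "$@" | sha256sum | cut -d' ' -f1
}

# Example call mirroring the files the review mentions (paths assumed):
# SEED_HASH=$(compute_seed_hash \
#   tests/e2e_dbt_project/generate_data.py \
#   tests/e2e_dbt_project/dbt_project.yml \
#   tests/profiles/profiles.yml.j2 \
#   tests/e2e_dbt_project/docker-compose.yml)
```

Because the hash is over concatenated file contents, editing the compose file (new image tags, volume layout changes) produces a different key and prevents restoring an incompatible tarball.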
---
Duplicate comments:
In @.github/workflows/test-warehouse.yml:
- Around line 160-164: The readiness loops that poll services (the Trino loop
shown and the similar Dremio and Spark loops) currently break on success but do
nothing when retries are exhausted; update each loop so that if the curl never
succeeds within the retries it fails the job (exit 1). Concretely, after each
for/seq loop (the Trino, Dremio, and Spark readiness checks), check whether the
last curl succeeded and, if not, print an error and run exit 1 (or set a failure
flag inside the loop and exit after the loop) so the workflow stops immediately
on timeout.
ℹ️ Review info
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Disabled knowledge base sources:
- Linear integration is disabled
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (1)
.github/workflows/test-warehouse.yml
… loops Co-Authored-By: Itamar Hartstein <haritamar@gmail.com>
…ching Co-Authored-By: Itamar Hartstein <haritamar@gmail.com>
…upport

Points to dbt-data-reliability#948 which adds:
- trino__full_name_split (1-based array indexing)
- trino__edr_get_create_table_as_sql (bypass model.config issue)
- spark__edr_get_create_table_as_sql

TODO: revert after dbt-data-reliability#948 is merged

Co-Authored-By: Itamar Hartstein <haritamar@gmail.com>
Prevents tar race condition where ClickHouse temporary merge files disappear during archiving, causing 'No such file or directory' errors. Co-Authored-By: Itamar Hartstein <haritamar@gmail.com>
…ache After stopping containers for volume archiving and restarting them, services like Trino need time to reinitialize. Added per-adapter health checks to wait for readiness before proceeding. Co-Authored-By: Itamar Hartstein <haritamar@gmail.com>
The /v1/info endpoint returns HTTP 200 even when Trino is still initializing. Check for '"starting":false' in the response body to ensure Trino is fully ready before proceeding. Co-Authored-By: Itamar Hartstein <haritamar@gmail.com>
…rino Co-Authored-By: Itamar Hartstein <haritamar@gmail.com>
…d for CI Co-Authored-By: Itamar Hartstein <haritamar@gmail.com>
…24 and hive images) Co-Authored-By: Itamar Hartstein <haritamar@gmail.com>
…io123) Co-Authored-By: Itamar Hartstein <haritamar@gmail.com>
…ults Co-Authored-By: Itamar Hartstein <haritamar@gmail.com>
…_period for CI Co-Authored-By: Itamar Hartstein <haritamar@gmail.com>
…code-from) Co-Authored-By: Itamar Hartstein <haritamar@gmail.com>
…ucceeds Co-Authored-By: Itamar Hartstein <haritamar@gmail.com>
Co-Authored-By: Itamar Hartstein <haritamar@gmail.com>
…incompatibility) Co-Authored-By: Itamar Hartstein <haritamar@gmail.com>
…w on master) Co-Authored-By: Itamar Hartstein <haritamar@gmail.com>
dbt-dremio's generate_schema_name uses dot separation for nested Nessie namespaces (e.g. elementary_tests.elementary), not underscore concatenation (elementary_tests_elementary). The CLI profile must match the namespace path created by the e2e project's dbt run. Co-Authored-By: Itamar Hartstein <haritamar@gmail.com>
Dremio's Calcite-based SQL parser treats 'snapshots' as a reserved keyword, causing 'Encountered ", snapshots" at line 6, column 6' error in the populate_model_alerts_query post-hook. Renamed to 'snapshots_data'. Co-Authored-By: Itamar Hartstein <haritamar@gmail.com>
Dremio's Calcite-based SQL parser treats 'filter' as a reserved keyword, causing 'Encountered ". filter" at line 52' error in the populate_source_freshness_alerts_query post-hook. Co-Authored-By: Itamar Hartstein <haritamar@gmail.com>
…case issue Snowflake stores columns as UPPERCASE, so quoting as "filter" (lowercase) breaks column resolution. Only quote for Dremio where it's a reserved keyword. Co-Authored-By: Itamar Hartstein <haritamar@gmail.com>
dbt-dremio's dateadd macro calls interval.replace() which fails when interval is an integer. This override casts to string first. Upstream bug in dbt-dremio's macros/utils/date_spine.sql. Co-Authored-By: Itamar Hartstein <haritamar@gmail.com>
…ERY error dbt-dremio's dateadd wraps result in 'select TIMESTAMPADD(...)' which creates a scalar subquery when embedded in larger SQL. Dremio's Calcite parser rejects multi-field RECORDTYPE in scalar subquery context. Output just TIMESTAMPADD(...) as a plain expression instead. Co-Authored-By: Itamar Hartstein <haritamar@gmail.com>
…xception Dremio's Gandiva (Arrow execution engine) cannot parse ISO 8601 timestamps with the 'Z' UTC timezone suffix (e.g. '2026-03-02T22:50:42.101Z'). This causes 'Invalid timestamp or unknown zone' errors during edr monitor report. Override dremio__edr_cast_as_timestamp in the monitor project to strip the 'Z' suffix before casting. Also add dispatch config so elementary_cli macros take priority over the elementary package for adapter-dispatched macros. Co-Authored-By: Itamar Hartstein <haritamar@gmail.com>
Co-Authored-By: Itamar Hartstein <haritamar@gmail.com>
Gandiva rejects both 'Z' suffix and 'T' separator in ISO 8601 timestamps. Normalize '2026-03-02T23:31:12.443Z' to '2026-03-02 23:31:12.443'. Co-Authored-By: Itamar Hartstein <haritamar@gmail.com>
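A minimal Python sketch of that normalization (the actual fix lives in the `dremio__edr_cast_as_timestamp` macro override, not in Python):

```python
import re

def normalize_for_gandiva(ts: str) -> str:
    """Normalize an ISO 8601 timestamp for Dremio's Gandiva engine.

    Gandiva rejects both the 'Z' UTC suffix and the 'T' date/time
    separator, so '2026-03-02T23:31:12.443Z' must become
    '2026-03-02 23:31:12.443' before casting to timestamp.
    """
    ts = re.sub(r"Z$", "", ts)      # drop the UTC suffix
    return ts.replace("T", " ", 1)  # ISO separator -> space

print(normalize_for_gandiva("2026-03-02T23:31:12.443Z"))
# → 2026-03-02 23:31:12.443
```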
…imestamp text Co-Authored-By: Itamar Hartstein <haritamar@gmail.com>
…for Dremio Co-Authored-By: Itamar Hartstein <haritamar@gmail.com>
Dremio's Calcite SQL parser reserves ROW_NUMBER and COUNT as keywords. These were used as unquoted column aliases in: - get_models_latest_invocation.sql - get_models_latest_invocations_data.sql - can_upload_source_freshness.sql Applied Dremio-specific double-quoting via target.type conditional, same pattern used for 'filter' and 'snapshots' reserved keywords. Co-Authored-By: Itamar Hartstein <haritamar@gmail.com>
  where {{ elementary.edr_datediff(elementary.edr_cast_as_timestamp('generated_at'), elementary.edr_current_timestamp(), 'day') }} < {{ days_back }}
  )
- select count(*) as count
+ select count(*) as {% if target.type == 'dremio' %}"count"{% else %}count{% endif %}
do you think we can use elementary.escape_reserved_keywords? It was created for this purpose
(its code is in the dbt-data-reliability repo)
Good call! Refactored all manual {% if target.type == 'dremio' %} quoting to use elementary.escape_reserved_keywords() instead. This covers:
- `row_number` alias (2 files)
- `count` alias (1 file)
- `filter` column reference (2 files)
Also created dbt-data-reliability#955 to add row_number and snapshots to the reserved keywords list. CI is temporarily pinned to that branch.
…ed words
Replace manual {% if target.type == 'dremio' %} quoting with the existing
elementary.escape_reserved_keywords() utility from dbt-data-reliability.
Files updated:
- get_models_latest_invocation.sql: row_number alias
- get_models_latest_invocations_data.sql: row_number alias
- can_upload_source_freshness.sql: count alias
- source_freshness_alerts.sql: filter column reference
- get_source_freshness_results.sql: filter column reference
Also temporarily pins dbt-data-reliability to branch with row_number
and snapshots added to the reserved keywords list (PR #955).
Co-Authored-By: Itamar Hartstein <haritamar@gmail.com>
Co-Authored-By: Itamar Hartstein <haritamar@gmail.com>
Replace USING (unique_id) with explicit ON clause and select specific columns instead of SELECT * to avoid ambiguous column references in Dremio's SQL engine, which doesn't deduplicate join columns with USING. Co-Authored-By: Itamar Hartstein <haritamar@gmail.com>
…N join The switch from USING to ON for Dremio compatibility requires qualifying column references since ON doesn't deduplicate join columns like USING does. Co-Authored-By: Itamar Hartstein <haritamar@gmail.com>
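The ambiguity this commit fixes is easy to reproduce in any engine that, like Dremio, does not coalesce join columns for `ON` joins — here with stdlib sqlite3 and toy versions of the two tables:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE run_results (unique_id TEXT, status TEXT);
    CREATE TABLE models      (unique_id TEXT, name   TEXT);
    INSERT INTO run_results VALUES ('m1', 'success');
    INSERT INTO models      VALUES ('m1', 'orders');
""")

# USING deduplicates the join column, so the bare name resolves fine.
rows = con.execute(
    "SELECT unique_id, status, name FROM run_results JOIN models USING (unique_id)"
).fetchall()
print(rows)  # → [('m1', 'success', 'orders')]

# With ON, the unqualified column is ambiguous -- the error Dremio raised.
try:
    con.execute(
        "SELECT unique_id FROM run_results JOIN models "
        "ON run_results.unique_id = models.unique_id"
    )
except sqlite3.OperationalError as e:
    print(e)  # ambiguous column name: unique_id

# The fix: qualify every reference when joining with ON.
rows = con.execute(
    "SELECT run_results.unique_id, status, name FROM run_results "
    "JOIN models ON run_results.unique_id = models.unique_id"
).fetchall()
print(rows)  # → [('m1', 'success', 'orders')]
```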
Summary
Adds full CI infrastructure and CLI support for DuckDB, Trino, Dremio, and Spark adapters to elementary-cli. All 11 warehouse tests (including the 4 new ones) plus code-quality now pass in CI.
Key changes:
- `pyproject.toml`: Added `dbt-duckdb` and `dbt-dremio` as optional dependencies
- `profiles.yml.j2`: Added profile targets for all 4 adapters (DuckDB file-based, Trino with Iceberg, Dremio with Nessie enterprise catalog, Spark Thrift Server with Delta)
- `docker-compose.yml`: Added Docker service stacks for Trino (Iceberg + Hive Metastore + MinIO), Dremio (Nessie + MinIO), Spark (Thrift Server + Delta Lake)
- Test assets: renamed reserved-word aliases (`min` → `min_val`, `max` → `max_val`, `sum` → `sum_val`, `one` → `col1`), updated `data_type` mappings, excluded `schema_changes` for Spark, added adapter-specific schema drop dispatches
- Added a `get_adapter_unique_id` dispatch for DuckDB

Dremio-specific workarounds (dbt-dremio adapter limitations):
- `--exclude tag:ephemeral_model` on `dbt test` — dbt-dremio doesn't support ephemeral model CTEs
- `enterprise_catalog_folder` instead of `schema` to avoid duplicate-key errors from dbt-dremio's alias chain
- Single-threaded execution (`--threads 1`) to avoid Nessie catalog race conditions

Updates since last revision
- Adopted `elementary.escape_reserved_keywords()` — replaced `{% if target.type == 'dremio' %}..."keyword"...{% else %}keyword{% endif %}` patterns with the centralized utility macro in 5 files:
  - `row_number` alias in `get_models_latest_invocation.sql` and `get_models_latest_invocations_data.sql`
  - `count` alias in `can_upload_source_freshness.sql`
  - `filter` column reference in `source_freshness_alerts.sql` and `get_source_freshness_results.sql`
- Added `row_number` and `snapshots` to the Dremio reserved keywords list — dbt-data-reliability#955 (merged).
- Fixed the `Column 'unique_id' is ambiguous` error — replaced `USING (unique_id)` with an explicit `ON run_results.unique_id = models.unique_id` join and qualified column references. Dremio (and some other databases) don't deduplicate join columns when using `ON` instead of `USING`, so unqualified references became ambiguous.
- Fixed the `invocation_id is ambiguous` error — qualified all column references in the `get_models_latest_invocations_data.sql` final SELECT to avoid ambiguity with the `ON` join.
- Removed the `continue-on-error` workaround for Dremio edr steps. Now that PR #2137 (ref() macro override for dbt-core 1.11 compatibility) is merged to master, the Dremio edr monitor/report/send-report/validate-alerts/e2e steps are fully validated in CI without any silent failures.
- Merged latest master into this branch to pick up the fix.
- Renamed the `snapshots` CTE to `snapshots_data` in `model_alerts.sql` — `snapshots` is reserved in Dremio's Calcite parser
- `elementary/monitor/dbt_project/macros/overrides/dateadd.sql`: cast the interval to string before `.replace()` (dbt-dremio calls it on an int, causing an AttributeError) and drop the `select` prefix from the TIMESTAMPADD output (dbt-dremio wraps it in a scalar subquery, causing `$SCALAR_QUERY` errors in expression contexts)
- The CLI profile uses `{{ schema_name }}.elementary` instead of an underscore to match how the elementary tables are created
- Removed the `dbt-data-reliability` branch pin now that dbt-data-reliability#948 and #955 are merged.
- `projectnessie/nessie:0.80.0` no longer exists on Docker Hub; now uses `ghcr.io/projectnessie/nessie:0.80.0`.
- Healthchecks no longer depend on `curl`/`nc`; replaced with `bash -c 'echo > /dev/tcp/localhost/PORT'`.
- `dremio-setup.sh` default password aligned to `dremio123` to match `docker-compose.yml`.
- `dremio-setup` now uses `--exit-code-from` to wait for NessieSource creation before the external seeder runs.
- Added a `nessie: condition: service_healthy` dependency to ensure the Nessie catalog is ready.
- Extended the `hive-metastore` healthcheck timeout: 60 retries with a 60s start_period for CI environments.

Review & Testing Checklist for Human
- `USING` → `ON` join refactoring affects all adapters — the change from `USING (unique_id)` to an explicit `ON` join with qualified column references fixes Dremio's ambiguous column error, but applies to ALL adapters. CI passes for all 16 checks, but verify edge cases around column resolution don't break for specific adapters.
- `escape_reserved_keywords()` utility macro dependency — the refactored code depends on dbt-data-reliability having `row_number` and `snapshots` in the Dremio reserved keywords list (PR #955, now merged). Verify the elementary package correctly picks up the merged version.
- CTE rename (`snapshots` → `snapshots_data`) — this change in `model_alerts.sql` affects ALL adapters, not just Dremio. Verify no downstream references to the old CTE name exist.
- `elementary/monitor/dbt_project/macros/overrides/dateadd.sql` fixes two dbt-dremio bugs (int interval AttributeError + scalar subquery wrapping). Verify this doesn't break other date arithmetic operations or introduce edge cases with different interval types.
- The CLI profile uses `{{ schema_name }}.elementary` instead of an underscore. Verify elementary CLI can correctly resolve this schema for Dremio connections.
- The `continue-on-error` workaround is now removed, so Dremio's edr monitor/report/send-report steps are fully tested in CI. All 16 CI checks pass, confirming the ref() override + dateadd override + reserved keyword fixes work end-to-end.

Suggested Test Plan
- Verify the Dremio edr steps pass in CI without `continue-on-error`
- Verify the CTE rename (`snapshots_data`) doesn't break other adapters in CI
- Verify `escape_reserved_keywords()` works correctly for Dremio and doesn't affect other adapters

Notes
- All passwords used in the Docker configs (`minio123`, `dremio123`) are local-only test credentials, not real secrets
- The `data_type` mappings in `schema.yml` assume Spark/BigQuery/Databricks/Athena use `STRING` and Trino/Dremio/DuckDB use `varchar`
- Spark excludes `schema_changes` tests like Databricks (same limitation)
- Fixed `python-dev` → `python3-dev` for Spark requirements (existing bug)
- DuckDB uses `threads: 1` to match dbt-duckdb's concurrency model
- Uses a fixed schema `elementary_tests` to enable seed caching (safe because the concurrency group prevents parallel runs)
- The dateadd override fixes two dbt-dremio bugs: (1) `interval.replace()` called on an int parameter, (2) `select TIMESTAMPADD(...)` wraps in a scalar subquery causing Calcite parser errors in expression contexts
- The `escape_reserved_keywords()` macro dispatches to adapter-specific implementations — for Dremio it wraps keywords in double quotes; for other adapters it's a no-op

Requested by: @haritamar
Link to Devin Session: https://app.devin.ai/sessions/6c12a19d456541568bcd7974623f3557