
feat: support starting_timestamp to start from an offset #12

Merged
mdrakiburrahman merged 2 commits into main from dev/mdrrahman/starting-timestamp on Apr 7, 2026


@mdrakiburrahman
Contributor

Why this change is needed

Closes #11

When a model has no checkpoint yet, dbt-scope processes every source file from the beginning of time. For large datasets with years of historical files, that first run can be very expensive. Users need a way to skip historical files and start processing from a given point in time, analogous to Delta Lake streaming's startingTimestamp option.

How

A new starting_timestamp config parameter is added to the adapter's file discovery pipeline. It takes an ISO-8601 string with an explicit timezone (e.g. "2026-04-07T10:00:00+00:00"), which is normalized to UTC.
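A minimal sketch of the validation this PR attributes to `_parse_starting_timestamp()`, assuming it is built on the standard library's `datetime.fromisoformat()`. The exception class is a hypothetical stand-in for dbt's `DbtRuntimeError` so the snippet runs without dbt installed; the function name and error messages are illustrative, not the PR's actual code.

```python
from datetime import datetime, timezone


class StartingTimestampError(RuntimeError):
    """Hypothetical stand-in for dbt's DbtRuntimeError."""


def parse_starting_timestamp(value: str) -> datetime:
    """Parse an ISO-8601 string, reject naive values, and normalize to UTC."""
    try:
        parsed = datetime.fromisoformat(value)
    except (TypeError, ValueError) as exc:
        # Covers garbage strings, the empty string, and non-string input.
        raise StartingTimestampError(
            f"starting_timestamp {value!r} is not a valid ISO-8601 timestamp"
        ) from exc

    # A datetime is "aware" only if it has a tzinfo that yields an offset.
    if parsed.tzinfo is None or parsed.utcoffset() is None:
        raise StartingTimestampError(
            f"starting_timestamp {value!r} has no timezone; add one, e.g. '+00:00'"
        )

    # Convert any offset (e.g. +02:00) to the equivalent UTC instant.
    return parsed.astimezone(timezone.utc)


if __name__ == "__main__":
    print(parse_starting_timestamp("2026-04-07T12:00:00+02:00").isoformat())
    # prints 2026-04-07T10:00:00+00:00
```

Before Python 3.11, `fromisoformat()` does not accept a trailing `Z`, so `+00:00` is the portable way to spell UTC in this sketch.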

Behavior:

| Checkpoint exists? | starting_timestamp set? | Behavior |
|---|---|---|
| Yes | Any | No-op: the checkpoint watermark takes precedence |
| No | Not set | Process all files (existing behavior, backward compatible) |
| No | Valid, before latest file | Skip files ≤ timestamp, process the rest |
| No | Valid, after all files | Throw DbtRuntimeError |
| No | Invalid/unparseable | Throw DbtRuntimeError |
| No | Naive (no timezone) | Throw DbtRuntimeError |
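The behavior table can be read as a small decision function. The sketch below assumes each source file exposes a timezone-aware last-modified time and that, as with the synthetic watermark, a file is kept only if it is strictly newer than whichever watermark applies. `SourceFile`, `files_to_process`, and `StartingTimestampError` are hypothetical names; the last two table rows (invalid and naive input) are left to the parser.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import List, Optional, Sequence


class StartingTimestampError(RuntimeError):
    """Hypothetical stand-in for dbt's DbtRuntimeError."""


@dataclass(frozen=True)
class SourceFile:
    path: str
    modified_at: datetime  # assumed timezone-aware (UTC)


def files_to_process(
    files: Sequence[SourceFile],
    checkpoint: Optional[datetime],
    starting_ts: Optional[datetime],
) -> List[SourceFile]:
    # Row 1: a checkpoint exists, so starting_timestamp is ignored.
    if checkpoint is not None:
        return [f for f in files if f.modified_at > checkpoint]

    # Row 2: no checkpoint and no starting_timestamp, so process everything.
    if starting_ts is None:
        return list(files)

    # Row 3: skip files at or before the timestamp, keep the rest.
    remaining = [f for f in files if f.modified_at > starting_ts]

    # Row 4: the source has files, but every one of them is <= starting_ts.
    if files and not remaining:
        raise StartingTimestampError(
            f"starting_timestamp {starting_ts.isoformat()} is after all source files"
        )
    # An empty source simply yields an empty list.
    return remaining


if __name__ == "__main__":
    def at(hour: int) -> datetime:
        return datetime(2026, 1, 1, hour, tzinfo=timezone.utc)

    files = [SourceFile("a", at(1)), SourceFile("b", at(2)), SourceFile("c", at(3))]
    print([f.path for f in files_to_process(files, None, at(2))])
    # prints ['c']
```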

Key implementation details:

  • _parse_starting_timestamp() validates the timestamp string, rejects values without timezone info, and converts the result to UTC
  • When no checkpoint exists and starting_timestamp is provided, a synthetic Watermark is created from it and used for filtering — no changes to file_tracker.py or checkpoint.py were needed
  • The "timestamp after all files" check only triggers an extra LIST call in the error case (no performance impact on the happy path)
  • Both table.sql and incremental.sql macros read the config and pass it through to adapter.discover_files()
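The synthetic-watermark approach and the lazy "after all files" check can be sketched together. Here `list_files(after)` is an assumed callable that performs one LIST call and returns only files modified strictly after `after` (or every file when `after` is `None`), and `Watermark` is a hypothetical one-field stand-in for the class in checkpoint.py. The second, unfiltered LIST runs only when the filtered result is empty, which is one way to keep the happy path at a single LIST call, consistent with the description above.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Callable, List, Optional


class StartingTimestampError(RuntimeError):
    """Hypothetical stand-in for dbt's DbtRuntimeError."""


@dataclass(frozen=True)
class Watermark:
    # Hypothetical shape; the real class in checkpoint.py may differ.
    last_modified: datetime


# Assumed listing primitive: one LIST call returning paths modified strictly
# after `after`, or every path when `after` is None.
ListFn = Callable[[Optional[datetime]], List[str]]


def discover_files(
    list_files: ListFn,
    checkpoint: Optional[Watermark],
    starting_ts: Optional[datetime],
) -> List[str]:
    watermark = checkpoint
    if watermark is None and starting_ts is not None:
        # No checkpoint yet: build a synthetic watermark so the existing
        # watermark-based filter handles starting_timestamp unchanged.
        watermark = Watermark(last_modified=starting_ts)

    files = list_files(watermark.last_modified if watermark is not None else None)

    # Only when starting_timestamp filtered out everything do we pay for a
    # second, unfiltered LIST to tell "empty source" from "after all files".
    if not files and checkpoint is None and starting_ts is not None:
        if list_files(None):
            raise StartingTimestampError(
                f"starting_timestamp {starting_ts.isoformat()} is after all source files"
            )
    return files


if __name__ == "__main__":
    mtimes = {
        "a.json": datetime(2026, 1, 1, 1, tzinfo=timezone.utc),
        "b.json": datetime(2026, 1, 1, 2, tzinfo=timezone.utc),
    }
    calls: List[Optional[datetime]] = []

    def fake_list(after: Optional[datetime]) -> List[str]:
        calls.append(after)  # record each simulated LIST call
        return sorted(p for p, m in mtimes.items() if after is None or m > after)

    start = datetime(2026, 1, 1, 1, 30, tzinfo=timezone.utc)
    print(discover_files(fake_list, None, start), len(calls))
    # prints ['b.json'] 1
```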

Files changed:

  • dbt/adapters/scope/impl.py — core logic: _parse_starting_timestamp(), extended discover_files() and has_unprocessed_files()
  • dbt/include/scope/macros/materializations/table.sql — reads and passes starting_timestamp
  • dbt/include/scope/macros/materializations/incremental.sql — reads and passes starting_timestamp
  • tests/integration/dbt_project/models/append_no_delete.sql — uses 1900-01-01 (process all)
  • tests/integration/dbt_project/models/filtered_edition.sql — uses 2026-01-01 (skip pre-2026)

Test

  • 12 new unit tests in tests/unit/test_starting_timestamp.py covering:
    • Timestamp parsing: valid UTC, valid with offset, invalid string, empty string, naive (no tz)
    • Filtering: used when no watermark, ignored when watermark exists, correct file filtering
    • Validation: invalid raises, after-all-files raises, empty source returns empty, None processes all
  • All 210 unit tests pass (uv run pytest tests/unit/ -q)
  • Ruff lint and format checks pass on all changed files

mdrakiburrahman and others added 2 commits April 7, 2026 18:23
Add a starting_timestamp config parameter (ISO-8601 UTC) that allows
skipping historical source files when no checkpoint exists. When a
checkpoint watermark is present, the parameter is silently ignored.

- Add _parse_starting_timestamp() validator in impl.py
- Extend discover_files() and has_unprocessed_files() with the new param
- Create synthetic Watermark from starting_timestamp when no checkpoint
- Throw DbtRuntimeError on invalid or future timestamps
- Update table.sql and incremental.sql macros to pass the parameter
- Add 12 unit tests covering all behaviors
- Update integration test models with starting_timestamp config

Closes #11

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
When any step fails, upload /tmp/imds-router.log and .logs/
as a downloadable artifact with 7-day retention.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
mdrakiburrahman merged commit 9983455 into main Apr 7, 2026
2 checks passed
mdrakiburrahman deleted the dev/mdrrahman/starting-timestamp branch April 7, 2026 20:32


Development

Successfully merging this pull request may close these issues.

feat: Support starting_timestamp to start from an offset
