Commit speedups and dbt integration #510

Merged
mildbyte merged 15 commits into master from feature/airbyte-fixes-CU-pvg55y
Aug 3, 2021

Conversation

@mildbyte (Contributor) commented Aug 3, 2021

  • Speed up partitioning on commit by recording only the partition boundaries instead of copying the whole table with a partition ID on each row
  • Write partitions in multiple threads
  • Add a size-per-row estimate to the LQ planner
  • Add helper functions to run a dbt transformation from Git against a single schema (clone the repository, patch the dbt project)
  • Add the ability to choose a normalization mode when running Airbyte data sources, in line with the Airbyte runner itself:
    • none: only produce the _raw tables
    • basic: Airbyte's basic normalization
    • custom: a Git URL to a custom dbt project that converts the _raw tables
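
For readers unfamiliar with the boundary-based approach in the first two bullets, here is a minimal, database-free sketch of the idea. It is not the code from this PR: `chunk_boundaries`, `write_partitions` and the `write_chunk` callback are hypothetical names, and the key list stands in for an ordered scan over the table's primary key. The point is that only one `(first_key, last_key)` pair per chunk is materialized, and each worker thread then selects its own rows by range.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence, Tuple, TypeVar

K = TypeVar("K")
R = TypeVar("R")


def chunk_boundaries(sorted_keys: Sequence[K], chunk_size: int) -> List[Tuple[K, K]]:
    """Return inclusive (first_key, last_key) pairs, one per chunk.

    Assumes `sorted_keys` is already ordered by the partitioning key.
    Only the boundaries are kept, not a chunk ID for every row.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    total = len(sorted_keys)
    return [
        (sorted_keys[start], sorted_keys[min(start + chunk_size, total) - 1])
        for start in range(0, total, chunk_size)
    ]


def write_partitions(
    boundaries: Sequence[Tuple[K, K]],
    write_chunk: Callable[[K, K], R],
    max_workers: int = 4,
) -> List[R]:
    """Call the (hypothetical) write_chunk(first_key, last_key) for each chunk
    in a thread pool and return the results in boundary order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(lambda bounds: write_chunk(*bounds), boundaries))


if __name__ == "__main__":
    keys = list(range(1, 11))
    bounds = chunk_boundaries(keys, 4)
    # Stand-in for "write the rows between the two keys": count them instead.
    sizes = write_partitions(bounds, lambda lo, hi: sum(lo <= k <= hi for k in keys))
    print(bounds, sizes)
```

In a real database setting each worker would typically use its own connection, so the boundaries have to be visible to all of them before the workers start.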

mildbyte added 15 commits August 3, 2021 13:03
…only making a temporary table with chunk boundaries (instead of copying every row and its chunk ID over into a temporary table).
…ph (get a repo, patch its data sources to point to the same schema).
…rmalization and pass in a custom Git repo with a dbt project. Note that downstream users inheriting from this data source should merge the `params_schema`/`credentials_schema` using the `merge_jsonschema` function.
…e AB normalization, as well as a test for no normalization at all.
… it so that other threads that chunk the table up can see it (since they use different PG connections).
…avoid corner cases with PK-less tables having NULLs in them and breaking comparisons.
… row to text) if there's no PK and make sure to do it consistently (order by this PK and filter by it too when partitioning).
@mildbyte mildbyte merged commit 7c143e2 into master Aug 3, 2021
mildbyte added a commit that referenced this pull request Aug 18, 2021
  * Various Airbyte ingestion improvements and support for different normalization modes, including a custom dbt model (#510, #513, #514)
  * Fix mount for data source with empty credentials schema (#515)
  * Fix `sgr cloud load`/`dump` (#520)

Full set of changes: [`v0.2.15...v0.2.16`](v0.2.15...v0.2.16)