Initial support for "transforming" plugins, incl. dbt #574
Merged
Add ability to compile and export the model's manifest file. This sadly requires a connection to an engine, even though it just returns a `manifest.json` file with compiled dbt views for every model. This is useful for finding out which models a dbt repository provides as well as their dependency tree.
Add ability to override each dbt data source's schema separately.
Add ability to filter a list of models when building a dbt model.
* introspect() gets the repository's manifest and gets a list of all models
from it that materialize as tables, allowing the user to select models
that will get loaded into their Splitgraph repo.
* The plugin requires a map of dbt source names to Splitgraph images (uses
a JSONSchema with an array)
* Yet-unimplemented preload step mounts all required images into temporary
schemas and passes them to the plugin (similar to the Splitfile executor).
* load() builds the source -> temporary schema map and calls our dbt
wrapper that builds the required tables.
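The `introspect()` step described in the list above can be sketched roughly as follows. This is a minimal illustration rather than the plugin's actual code: it assumes the dbt `manifest.json` layout (a top-level `nodes` map whose entries carry `resource_type`, `name` and `config.materialized`), and `list_table_models` and `SAMPLE_MANIFEST` are hypothetical names made up for this example.

```python
from typing import Any, Dict, List


def list_table_models(manifest: Dict[str, Any]) -> List[str]:
    """Return the names of all dbt models that materialize as tables.

    Hypothetical helper: assumes each entry under "nodes" has
    "resource_type", "name" and "config" -> "materialized".
    """
    return sorted(
        node["name"]
        for node in manifest.get("nodes", {}).values()
        if node.get("resource_type") == "model"
        and node.get("config", {}).get("materialized") == "table"
    )


# A trimmed-down, made-up manifest with two tables, one view and one test.
SAMPLE_MANIFEST: Dict[str, Any] = {
    "nodes": {
        "model.jaffle_shop.customers": {
            "resource_type": "model",
            "name": "customers",
            "config": {"materialized": "table"},
        },
        "model.jaffle_shop.stg_customers": {
            "resource_type": "model",
            "name": "stg_customers",
            "config": {"materialized": "view"},
        },
        "model.jaffle_shop.orders": {
            "resource_type": "model",
            "name": "orders",
            "config": {"materialized": "table"},
        },
        "test.jaffle_shop.unique_customers": {
            "resource_type": "test",
            "name": "unique_customers",
            "config": {"materialized": "test"},
        },
    }
}

if __name__ == "__main__":
    # Only the models materialized as tables are offered for loading.
    print(list_table_models(SAMPLE_MANIFEST))
```

Views and tests are skipped here because only models that end up as tables can be loaded into a Splitgraph repository.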
* Add a `TransformingDataSource` mixin that provides a context manager temporarily
mounting images into throwaway schemas
* Add an `ImageMounter` interface that converts image reference tuples to
schemas (TODO: Splitfile execution does something similar, need to figure out
how to merge them)
* Add the `TransformingDataSource` mixin to the DBT plugin and use it to
generate the schema map.
* Switch the type required by data sources back to `PostgresEngine`, as we
need to use that engine to generate Repository objects.
* Add tables and mounter to the constructor
* Get `DBTDataSource` to call the correct parent (`TransformingDataSource`).
(only data sources that support mount are meant to be shown)
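As a rough illustration of the mixin and mounter described in the commits above, the sketch below shows one way a context manager could mount images into throwaway schemas and hand back a source -> schema map. Only the names `TransformingDataSource` and `ImageMounter` come from this pull request; the method names, signatures, the `(namespace, repository, tag)` tuple shape and the in-memory `FakeImageMounter` are assumptions made so that the example runs on its own.

```python
from abc import ABC, abstractmethod
from contextlib import contextmanager
from typing import Dict, Iterator, List, Tuple

# Assumed shape of an image reference: (namespace, repository, tag or hash).
ImageRef = Tuple[str, str, str]


class ImageMounter(ABC):
    """Converts image references into schemas (interface shape is assumed)."""

    @abstractmethod
    def mount(self, images: List[ImageRef]) -> List[str]:
        """Mount the images and return one schema name per image."""

    @abstractmethod
    def unmount(self) -> None:
        """Drop every schema created by mount()."""


class FakeImageMounter(ImageMounter):
    """In-memory stand-in that only tracks schema names, for illustration."""

    def __init__(self) -> None:
        self.mounted: List[str] = []

    def mount(self, images: List[ImageRef]) -> List[str]:
        schemas = [f"tmp_mount_{i}" for i, _ in enumerate(images)]
        self.mounted.extend(schemas)
        return schemas

    def unmount(self) -> None:
        self.mounted.clear()


class TransformingDataSourceSketch:
    """Hypothetical mixin-like class: maps source names to mounted schemas."""

    def __init__(self, mounter: ImageMounter, source_images: Dict[str, ImageRef]):
        self.mounter = mounter
        self.source_images = source_images

    @contextmanager
    def mount_required_images(self) -> Iterator[Dict[str, str]]:
        names = list(self.source_images)
        try:
            schemas = self.mounter.mount([self.source_images[n] for n in names])
            # Yield the source name -> temporary schema map to the caller.
            yield dict(zip(names, schemas))
        finally:
            # Always clean up the throwaway schemas, even on errors.
            self.mounter.unmount()
```

A data source's `load()` could then wrap its dbt invocation in `with self.mount_required_images() as schema_map:` so that the temporary schemas exist only for the duration of the build.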
gruuya approved these changes Nov 26, 2021
mildbyte added a commit that referenced this pull request Dec 17, 2021
Fleshing out the `splitgraph.yml` (aka `repositories.yml`) format that defines a Splitgraph Cloud "project" (datasets, their sources and metadata).

Existing users of `repositories.yml` don't need to change anything, though note that `sgr cloud` commands using the YAML format will now default to `splitgraph.yml` unless explicitly set to `repositories.yml`.

New sgr cloud commands:

See #582 and #587

These let users manipulate Splitgraph Cloud and ingestion jobs from the CLI:

* `sgr cloud status`: view the status of ingestion jobs in the current project
* `sgr cloud logs`: view job logs
* `sgr cloud upload`: upload a CSV file to Splitgraph Cloud (without using the engine)
* `sgr cloud sync`: trigger a one-off load of a dataset
* `sgr cloud stub`: generate a `splitgraph.yml` file
* `sgr cloud seed`: generate a Splitgraph Cloud project with a `splitgraph.yml`, GitHub Actions, dbt etc
* `sgr cloud validate`: merge multiple project files and output the result (like `docker-compose config`)
* `sgr cloud download`: download a query result from Splitgraph Cloud as a CSV file, bypassing time/query size limits.

repositories.yml/splitgraph.yml format:

* Change various commands that use `repositories.yml` to default to `splitgraph.yml` instead.
* Allow "mixing in" multiple `.yml` files Docker Compose-style, useful for splitting credentials (and not checking them in) and data settings.

Temporary location for the new full documentation on `splitgraph.yml`: https://github.com/splitgraph/splitgraph.com/blob/f7ac524cb5023091832e8bf51b277991c435f241/content/docs/0900_splitgraph-cloud/0500_splitgraph-yml.mdx

Miscellaneous:

* Initial backend support for "transforming" Splitgraph plugins, including dbt (#574)
* Dump scheduled ingestion/transformation jobs with `sgr cloud dump` (#577)
* `TransformingDataSource` mixin that lets a data source define what images it requires to run, with an `ImageMounter` class that can mount and give it links to the schemas these images are mounted in at runtime.
* dbt wrapper that runs dbt in `compile` mode and extracts its manifest (https://docs.getdbt.com/reference/artifacts/manifest-json) to list models that the project can build
* `DBTDataSource` that runs dbt transformations from Git:
  * `introspect()` gets the repository's manifest and gets a list of all models from it that materialize as tables, allowing the user to select models that will get loaded into their Splitgraph repo.
  * The plugin requires a map of dbt source names to Splitgraph images (uses a JSONSchema with an array)
  * `load()` uses the `ImageMounter` functionality to build the source -> temporary schema map and calls the dbt wrapper that injects the Splitgraph image references instead of dbt sources and builds the required tables
Tested against a real-life project at https://github.com/splitgraph/jaffle_shop_archive/tree/sg-integration-test
Not implemented yet:
* `ref()` invocations that don't map to any models/sources