Conversation
(WIP: still need to unpack the response more cleanly and fix the mypy errors)
WIP: doesn't use settings from repositories.yml
Required a small wrapper for `yaml.safe_load`/`safe_dump` to avoid deprecation warnings, but otherwise a drop-in replacement.
(bring it back in line with the PyYAML output which adds a line break after every dict element)
Limitations:
- Isn't (and can't be) aware of the tables in the source repositories, so we use a placeholder there
- Using a placeholder for the Git URL so that we can inject the repo URL
at runtime in a GitHub Action
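As an illustration of the Git URL placeholder idea, the runtime substitution in a GitHub Action could look like the following sketch (the placeholder name, template shape and env-var fallback are assumptions, not the actual implementation):

```python
import os
from string import Template

# Hypothetical template snippet with a Git URL placeholder; the real
# generated file uses its own placeholder convention.
template = Template("external:\n  params:\n    url: $GIT_URL\n")

# GitHub Actions sets GITHUB_REPOSITORY to "owner/repo"; fall back to a
# dummy value so the sketch also runs outside of CI.
repo = os.environ.get("GITHUB_REPOSITORY", "example/repo")
rendered = template.substitute(GIT_URL="https://github.com/%s.git" % repo)
print(rendered)
```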
Optionally add a final stage to the GHA pipeline that runs dbt against all loaded repos. Also set repositories as live/not live based on whether they support mount. For repositories that don't support mount, run ingestion as before and use `sgr cloud load` to set up the metadata. For live repos, use `sgr cloud load` to set up both the metadata and the external data source settings.
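A minimal sketch of that live/not-live branching, using hypothetical step names (`run_ingestion`, `load_metadata`, `load_external_settings` are stand-ins for illustration, not real sgr internals):

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Repo:
    name: str
    supports_mount: bool
    steps: List[str] = field(default_factory=list)

def set_up(repo: Repo) -> None:
    """Record which pipeline steps a repository would get."""
    if repo.supports_mount:
        # Live repo: metadata plus external data source settings.
        repo.steps += ["load_metadata", "load_external_settings"]
    else:
        # Non-live repo: ingest as before, then set up metadata only.
        repo.steps += ["run_ingestion", "load_metadata"]

live = Repo("org/live-dataset", supports_mount=True)
batch = Repo("org/batch-dataset", supports_mount=False)
for r in (live, batch):
    set_up(r)
print(live.steps, batch.steps)
```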
(including defaults and tests). Also delete the inline `repositories.yml` format documentation from the `sgr cloud load` command-line help (wrote actual docs instead).
The default is still public; override with `--initial-private`.
Run `sgr cloud sync` first with `--initial-private` so that the user's repo becomes private by default; only then run `sgr cloud load` to set up the metadata. Doing it the other way around will make `sgr cloud load` create a public repo (and if we're passing `--skip-external`, we'll only be implicitly creating the repo through the Postgraphile API, where we can't edit the initial visibility settings).
Wire it to the `AddExternalRepositoryRequest` model.
…aded. Avoid redundantly setting up credentials if we're running multiple `sgr cloud load` instances from different jobs (otherwise it'll upload all credentials for every repository in `splitgraph.yml` in every job). This is idempotent but still a waste of time.
Log the errors for credential/add-external endpoints (for credentials, the JSONSchema error text also quotes the original object, so we mask it unless the user runs the command with `--verbosity DEBUG`).
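The masking behaviour might look like this sketch (the function name and message wording are assumptions; only the behaviour — full JSONSchema text at DEBUG verbosity, a generic message otherwise — comes from the change above):

```python
def format_endpoint_error(endpoint: str, error_text: str, debug: bool) -> str:
    """Return the log line for a failed credential/add-external call.

    The raw JSONSchema error quotes the original object (which may contain
    credentials), so it is only included at DEBUG verbosity.
    """
    if debug:
        return "Error from %s: %s" % (endpoint, error_text)
    return ("Error from %s (rerun with --verbosity DEBUG "
            "to see the full error)" % endpoint)

print(format_endpoint_error("credentials", "'hunter2' is not valid", debug=False))
```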
mildbyte added a commit that referenced this pull request on Dec 17, 2021
Fleshing out the `splitgraph.yml` (aka `repositories.yml`) format that defines a Splitgraph Cloud "project" (datasets, their sources and metadata). Existing users of `repositories.yml` don't need to change anything, though note that `sgr cloud` commands using the YAML format will now default to `splitgraph.yml` unless explicitly set to `repositories.yml`.

New sgr cloud commands (see #582 and #587): these let users manipulate Splitgraph Cloud and ingestion jobs from the CLI:

* `sgr cloud status`: view the status of ingestion jobs in the current project
* `sgr cloud logs`: view job logs
* `sgr cloud upload`: upload a CSV file to Splitgraph Cloud (without using the engine)
* `sgr cloud sync`: trigger a one-off load of a dataset
* `sgr cloud stub`: generate a `splitgraph.yml` file
* `sgr cloud seed`: generate a Splitgraph Cloud project with a `splitgraph.yml`, GitHub Actions, dbt etc.
* `sgr cloud validate`: merge multiple project files and output the result (like `docker-compose config`)
* `sgr cloud download`: download a query result from Splitgraph Cloud as a CSV file, bypassing time/query size limits

repositories.yml/splitgraph.yml format: change various commands that use `repositories.yml` to default to `splitgraph.yml` instead. Allow "mixing in" multiple `.yml` files Docker Compose-style, useful for splitting credentials (and not checking them in) from data settings. Temporary location for the new full documentation on `splitgraph.yml`: https://github.com/splitgraph/splitgraph.com/blob/f7ac524cb5023091832e8bf51b277991c435f241/content/docs/0900_splitgraph-cloud/0500_splitgraph-yml.mdx

Miscellaneous:

* Initial backend support for "transforming" Splitgraph plugins, including dbt (#574)
* Dump scheduled ingestion/transformation jobs with `sgr cloud dump` (#577)
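The "mixing in" behaviour can be illustrated with a small deep-merge sketch. This shows the general docker-compose-style idea (later files override scalars, nested mappings merge recursively), not Splitgraph's actual merge code:

```python
def merge(base: dict, override: dict) -> dict:
    """Recursively merge `override` into `base`, later values winning."""
    result = dict(base)
    for key, value in override.items():
        if isinstance(result.get(key), dict) and isinstance(value, dict):
            result[key] = merge(result[key], value)
        else:
            result[key] = value
    return result

# Hypothetical example: data settings in one file, credentials in another.
data = {"external": {"plugin": "csv",
                     "params": {"url": "https://example.com/data.csv"}}}
creds = {"external": {"credential": "csv_credential"}}
print(merge(data, creds))
```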
New `sgr cloud` commands

These let users manipulate Splitgraph Cloud and ingestion jobs from the CLI:

* `sgr cloud status`: view the status of ingestion jobs in the current project
* `sgr cloud logs`: view job logs
* `sgr cloud csv`: upload a CSV file to Splitgraph Cloud (without using the engine)
* `sgr cloud sync`: trigger a one-off load of a dataset
* `sgr cloud stub`: generate a `splitgraph.yml` file
* `sgr cloud seed`: generate a Splitgraph Cloud project with a `splitgraph.yml`, GitHub Actions, dbt etc.
* `sgr cloud validate`: merge multiple project files and output the result (like `docker-compose config`)

`splitgraph.yml`

Default various commands that use `repositories.yml` to `splitgraph.yml` instead. Allow "mixing in" multiple `.yml` files Docker Compose-style (mostly useful for keeping credentials separate from data settings). Wrote some documentation on the new format, GitHub Actions workflow reference-style (a header for every field with its full path in the YAML). It temporarily lives here while we can't easily deploy the docs site: https://github.com/splitgraph/splitgraph.com/blob/f7ac524cb5023091832e8bf51b277991c435f241/content/docs/0900_splitgraph-cloud/0500_splitgraph-yml.mdx
Sample project generation
`sgr cloud seed` generates a sample Splitgraph Cloud project from a base64-encoded "seed" (e.g. `eyJuYW1lc3BhY2UiOiJtaWxkYnl0ZSIsInBsdWdpbnMiOlsicG9zdGdyZXNfZmR3Iiwic25vd2ZsYWtlIl0sImluY2x1ZGVfZGJ0Ijp0cnVlfQo=`). This is mostly for our marketing website, which will let people "check out" with a Splitgraph Cloud project that contains their chosen data sources + a dbt transformation. Interested CLI users can still use it by encoding a JSON as base64, e.g. `{"namespace":"mildbyte","plugins":["postgres_fdw","snowflake"],"include_dbt":true}`, and passing it to `sgr cloud seed`.
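Producing such a seed by hand is just compact JSON plus base64; a stdlib sketch (the trailing newline matches the example seed above):

```python
import base64
import json

seed = {"namespace": "mildbyte",
        "plugins": ["postgres_fdw", "snowflake"],
        "include_dbt": True}

# Compact separators plus a trailing newline reproduce the example string.
encoded = base64.b64encode(
    (json.dumps(seed, separators=(",", ":")) + "\n").encode()
).decode()
print(encoded)

# Decoding the seed recovers the original object.
decoded = json.loads(base64.b64decode(encoded))
```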
Miscellaneous
`sgr cloud sync`/`sgr cloud load` (pass `--initial-private` to create the repo as private if it doesn't yet exist)