Conversation
(WIP: still need to unpack the response more cleanly and fix the mypy errors)
WIP: doesn't use settings from repositories.yml
Required a small wrapper for `yaml.safe_load`/`safe_dump` to avoid deprecation warnings, but otherwise a drop-in replacement.
(bring it back in line with the PyYAML output which adds a line break after every dict element)
Limitations:
- Isn't (and can't be) aware of the tables in the source repositories, so we use a placeholder there
- Using a placeholder for the Git URL so that we can inject the repo URL
at runtime in a GitHub Action
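As an illustration of the Git URL placeholder idea, the runtime substitution in a GitHub Action could look like the following sketch (the placeholder name, template shape and env-var fallback are assumptions, not the actual implementation):

```python
import os
from string import Template

# Hypothetical template snippet with a Git URL placeholder; the real
# generated file uses its own placeholder convention.
template = Template("external:\n  params:\n    url: $GIT_URL\n")

# GitHub Actions sets GITHUB_REPOSITORY to "owner/repo"; fall back to a
# dummy value so the sketch also runs outside of CI.
repo = os.environ.get("GITHUB_REPOSITORY", "example/repo")
rendered = template.substitute(GIT_URL="https://github.com/%s.git" % repo)
print(rendered)
```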
Optionally add a final stage to the GHA pipeline that runs dbt against all loaded repos. Also set repositories as live/not live based on whether they support mount. For repositories that don't support mount, run ingestion as before and use `sgr cloud load` to set up the metadata. For live repos, use `sgr cloud load` to set up both the metadata and the external data source settings.
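A minimal sketch of that live/not-live branching, using hypothetical step names (`run_ingestion`, `load_metadata`, `load_external_settings` are stand-ins for illustration, not real sgr internals):

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Repo:
    name: str
    supports_mount: bool
    steps: List[str] = field(default_factory=list)

def set_up(repo: Repo) -> None:
    """Record which pipeline steps a repository would get."""
    if repo.supports_mount:
        # Live repo: metadata plus external data source settings.
        repo.steps += ["load_metadata", "load_external_settings"]
    else:
        # Non-live repo: ingest as before, then set up metadata only.
        repo.steps += ["run_ingestion", "load_metadata"]

live = Repo("org/live-dataset", supports_mount=True)
batch = Repo("org/batch-dataset", supports_mount=False)
for r in (live, batch):
    set_up(r)
print(live.steps, batch.steps)
```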
(including defaults and tests). Also delete the inline `repositories.yml` format documentation from the `sgr cloud load` command-line help (wrote actual docs instead).
The default is still public; override with `--initial-private`.
Run `sgr cloud sync` first with `--initial-private` so that the user's repo becomes private by default; only then run `sgr cloud load` to set up the metadata. Doing it the other way around will make `sgr cloud load` create a public repo (and if we're passing `--skip-external`, we'll only be implicitly creating the repo through the Postgraphile API, where we can't edit the initial visibility settings).
Wire it to the `AddExternalRepositoryRequest` model.
…aded. Avoid redundantly setting up credentials if we're running multiple `sgr cloud load` instances from different jobs (otherwise it'll upload all credentials for every repository in `splitgraph.yml` in every job). This is idempotent but still a waste of time.
Log the errors for credential/add-external endpoints (for credentials, the JSONSchema error text also quotes the original object, so we mask it unless the user runs the command with `--verbosity DEBUG`).
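The masking behaviour might look like this sketch (the function name and message wording are assumptions; only the behaviour — full JSONSchema text at DEBUG verbosity, a generic message otherwise — comes from the change above):

```python
def format_endpoint_error(endpoint: str, error_text: str, debug: bool) -> str:
    """Return the log line for a failed credential/add-external call.

    The raw JSONSchema error quotes the original object (which may contain
    credentials), so it is only included at DEBUG verbosity.
    """
    if debug:
        return "Error from %s: %s" % (endpoint, error_text)
    return ("Error from %s (rerun with --verbosity DEBUG "
            "to see the full error)" % endpoint)

print(format_endpoint_error("credentials", "'hunter2' is not valid", debug=False))
```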
mildbyte added a commit that referenced this pull request on Dec 17, 2021
Fleshing out the `splitgraph.yml` (aka `repositories.yml`) format that defines a Splitgraph Cloud "project" (datasets, their sources and metadata). Existing users of `repositories.yml` don't need to change anything, though note that `sgr cloud` commands using the YAML format will now default to `splitgraph.yml` unless explicitly set to `repositories.yml`.

New sgr cloud commands (see #582 and #587): these let users manipulate Splitgraph Cloud and ingestion jobs from the CLI:

* `sgr cloud status`: view the status of ingestion jobs in the current project
* `sgr cloud logs`: view job logs
* `sgr cloud upload`: upload a CSV file to Splitgraph Cloud (without using the engine)
* `sgr cloud sync`: trigger a one-off load of a dataset
* `sgr cloud stub`: generate a `splitgraph.yml` file
* `sgr cloud seed`: generate a Splitgraph Cloud project with a `splitgraph.yml`, GitHub Actions, dbt etc.
* `sgr cloud validate`: merge multiple project files and output the result (like `docker-compose config`)
* `sgr cloud download`: download a query result from Splitgraph Cloud as a CSV file, bypassing time/query size limits

repositories.yml/splitgraph.yml format: change various commands that use `repositories.yml` to default to `splitgraph.yml` instead. Allow "mixing in" multiple `.yml` files Docker Compose-style, useful for splitting credentials (and not checking them in) from data settings. Temporary location for the new full documentation on `splitgraph.yml`: https://github.com/splitgraph/splitgraph.com/blob/f7ac524cb5023091832e8bf51b277991c435f241/content/docs/0900_splitgraph-cloud/0500_splitgraph-yml.mdx

Miscellaneous:

* Initial backend support for "transforming" Splitgraph plugins, including dbt (#574)
* Dump scheduled ingestion/transformation jobs with `sgr cloud dump` (#577)
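The "mixing in" behaviour can be illustrated with a small deep-merge sketch. This shows the general docker-compose-style idea (later files override scalars, nested mappings merge recursively), not Splitgraph's actual merge code:

```python
def merge(base: dict, override: dict) -> dict:
    """Recursively merge `override` into `base`, later values winning."""
    result = dict(base)
    for key, value in override.items():
        if isinstance(result.get(key), dict) and isinstance(value, dict):
            result[key] = merge(result[key], value)
        else:
            result[key] = value
    return result

# Hypothetical example: data settings in one file, credentials in another.
data = {"external": {"plugin": "csv",
                     "params": {"url": "https://example.com/data.csv"}}}
creds = {"external": {"credential": "csv_credential"}}
print(merge(data, creds))
```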
New `sgr cloud` commands

These let users manipulate Splitgraph Cloud and ingestion jobs from the CLI:

* `sgr cloud status`: view the status of ingestion jobs in the current project
* `sgr cloud logs`: view job logs
* `sgr cloud csv`: upload a CSV file to Splitgraph Cloud (without using the engine)
* `sgr cloud sync`: trigger a one-off load of a dataset
* `sgr cloud stub`: generate a `splitgraph.yml` file
* `sgr cloud seed`: generate a Splitgraph Cloud project with a `splitgraph.yml`, GitHub Actions, dbt etc.
* `sgr cloud validate`: merge multiple project files and output the result (like `docker-compose config`)

`splitgraph.yml`

Default various commands that use `repositories.yml` to `splitgraph.yml` instead. Allow "mixing in" multiple `.yml` files Docker Compose-style (mostly useful for keeping credentials separate from data settings). Wrote some documentation on the new format, GitHub Actions workflow reference-style (a header for every field with its full path in the YAML). It temporarily lives here while we can't easily deploy the docs site: https://github.com/splitgraph/splitgraph.com/blob/f7ac524cb5023091832e8bf51b277991c435f241/content/docs/0900_splitgraph-cloud/0500_splitgraph-yml.mdx
Sample project generation
`sgr cloud seed` generates a sample Splitgraph Cloud project from a base64-encoded "seed" (e.g. `eyJuYW1lc3BhY2UiOiJtaWxkYnl0ZSIsInBsdWdpbnMiOlsicG9zdGdyZXNfZmR3Iiwic25vd2ZsYWtlIl0sImluY2x1ZGVfZGJ0Ijp0cnVlfQo=`). This is mostly for our marketing website, which will let people "check out" with a Splitgraph Cloud project that contains their chosen data sources + a dbt transformation. Interested CLI users can still use it by encoding a JSON as base64, e.g. `{"namespace":"mildbyte","plugins":["postgres_fdw","snowflake"],"include_dbt":true}`, and passing it to `sgr cloud seed`.
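Producing such a seed by hand is just compact JSON plus base64; a stdlib sketch (the trailing newline matches the example seed above):

```python
import base64
import json

seed = {"namespace": "mildbyte",
        "plugins": ["postgres_fdw", "snowflake"],
        "include_dbt": True}

# Compact separators plus a trailing newline reproduce the example string.
encoded = base64.b64encode(
    (json.dumps(seed, separators=(",", ":")) + "\n").encode()
).decode()
print(encoded)

# Decoding the seed recovers the original object.
decoded = json.loads(base64.b64decode(encoded))
```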
Miscellaneous
`sgr cloud sync`/`sgr cloud load` (pass `--initial-private` to create the repo as private if it doesn't yet exist)