Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(dbt-sync): Migrate legacy endpoints to fix virtual dataset creation #232

Merged
merged 5 commits into from
Aug 2, 2023

Conversation

Vitor-Avila
Copy link
Contributor

This PR updates the _run_query function to use the new API endpoint by default, but maintains backwards compatibility in case the new endpoint is not available (returns 404) (for older Superset versions).

By default, the virtual dataset creation would include running the query, but this PR also updates the default behavior since now the dataset endpoint supports virtual dataset creation directly (without columns information). This change was also made with backwards compatibility in mind for older Superset versions.

This PR fixes #231 and #219.

Comment on lines 567 to +578
if "sql" not in kwargs:
return self.create_resource("dataset", **kwargs)

# Check if the dataset creation supports sql directly
not_legacy = self.validate_key_in_resource_schema(
"dataset",
"sql",
keys=["add_columns"],
)
not_legacy = not_legacy["add_columns"]
if not_legacy:
return self.create_resource("dataset", **kwargs)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought about doing the validation first, and then change line 567 to if "sql" not in kwargs or not_legacy so that we would only have a single return, but then I realized most commonly the integration syncs physical datasets, and this change would always execute validate_key_in_resource_schema which is another API request. With that in mind, I decided to have a second block for this, so that we only call this other endpoint if needed (trying to not increase the dataset sync duration).

@@ -84,7 +84,7 @@ def dbt_core( # pylint: disable=too-many-arguments, too-many-locals
profiles: Optional[str] = None,
exposures: Optional[str] = None,
import_db: bool = False,
disallow_edits: bool = True,
disallow_edits: bool = False,
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I noticed this was set to True here even tough the default state is False, so I updated only for consistency but it doesn't really change the result. It should be totally fine to revert it to True if we want to.

Comment on lines +106 to +114
if "MANAGER_URL" not in ctx.obj and disallow_edits:
warnings.warn(
(
"The managed externally feature was only introduced in Superset v1.5."
"Make sure you are running a compatible version."
),
category=UserWarning,
stacklevel=2,
)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thought about only including is_managed_externally to the update payloads in case --disallow-edits flag was used, but then I realized that if a user wanted to trigger a new sync without this flag to set it to False, that wouldn't work. So I only added this warning which in addition to the error faced for legacy instances should be pretty straightforward (the error would actually happens even if --disallow-edits isn't included).

Copy link
Member

@betodealmeida betodealmeida left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is amazing! I left a few suggestions.

src/preset_cli/api/clients/superset.py Outdated Show resolved Hide resolved
src/preset_cli/api/clients/superset.py Show resolved Hide resolved
src/preset_cli/api/clients/superset.py Show resolved Hide resolved
Vitor-Avila and others added 2 commits August 1, 2023 23:12
Co-authored-by: Beto Dealmeida <roberto@dealmeida.net>
@Vitor-Avila Vitor-Avila merged commit d05d01b into main Aug 2, 2023
4 checks passed
@Vitor-Avila Vitor-Avila deleted the endpoint_migrations branch August 2, 2023 13:00
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

dbt sync is failing to create virtual datasets for Preset Cloud
2 participants