
[MAINTENANCE] Fluent Datasources - don't register "private" Datasource classes #7124

Merged (12 commits) on Feb 13, 2023


@Kilo59 (Member) commented Feb 12, 2023

Changes proposed in this pull request:

  • Don't register "private" Datasource classes (e.g. _PandasDatasource).
  • Make invoke schema --sync more transparent about when and why schemas are (or are not) updated.
    • Start saving/committing Datasource schemas in addition to DataAsset schemas.

Definition of Done

  • My code follows the Great Expectations style guide
  • I have performed a self-review of my own code
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • I have added unit tests where applicable and made sure that new and existing tests are passing.
  • I have run any local integration tests and made sure that nothing is broken.

netlify bot commented Feb 12, 2023

Deploy Preview for niobium-lead-7998 ready!

  • 🔨 Latest commit: 881ac0c
  • 🔍 Latest deploy log: https://app.netlify.com/sites/niobium-lead-7998/deploys/63ea4fdf30d7e700083a4946
  • 😎 Deploy Preview: https://deploy-preview-7124--niobium-lead-7998.netlify.app


Kilo59 self-assigned this Feb 12, 2023
Kilo59 added the zep (Zero Entry Pool work) label Feb 12, 2023
Comment on lines -551 to +548
- print(f" {name} - {schema_path.name} schema updated")
+ print(f"🔃 {name} - {schema_path.name} schema updated")
Member Author: Changed the symbol here so it's easier to tell at a glance whether a schema has changed.

Comment on lines -541 to -543
- if issubclass(model, Datasource):
-     print(f"🙈 {name} - is a Datasource; skipping")
-     continue
Member Author: Stop skipping Datasource schema generation; Datasource schemas are now generated along with DataAsset schemas.
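A minimal sketch of the sync behavior described in this PR, assuming each model's JSON schema is already available as a dict (in Great Expectations these come from pydantic models). The function name sync_schemas, the directory layout, and the ✅ "unchanged" message are hypothetical; only the 🔃 "schema updated" marker comes from the diff. Datasource schemas are written alongside DataAsset schemas instead of being skipped.

```python
import json
import pathlib
import tempfile
from typing import Dict, List


def sync_schemas(schemas: Dict[str, dict], schema_dir: pathlib.Path) -> List[str]:
    """Write each schema to <schema_dir>/<name>.json only when its content changed.

    Hypothetical helper: `schemas` maps a model name (Datasource or DataAsset)
    to its JSON-schema dict. Returns the names whose files were (re)written.
    """
    schema_dir.mkdir(parents=True, exist_ok=True)
    updated: List[str] = []
    for name, schema in sorted(schemas.items()):
        schema_path = schema_dir / f"{name}.json"
        new_text = json.dumps(schema, indent=4, sort_keys=True) + "\n"
        old_text = schema_path.read_text() if schema_path.exists() else None
        if new_text == old_text:
            # Hypothetical marker for "nothing to do", so skipped files are visible too.
            print(f"✅ {name} - {schema_path.name} unchanged")
            continue
        schema_path.write_text(new_text)
        updated.append(name)
        # A distinct symbol makes changed schemas easy to spot at a glance.
        print(f"🔃 {name} - {schema_path.name} schema updated")
    return updated


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        out = pathlib.Path(tmp)
        schemas = {
            "PandasDatasource": {"title": "PandasDatasource", "type": "object"},
            "CSVAsset": {"title": "CSVAsset", "type": "object"},
        }
        print(sync_schemas(schemas, out))  # first run writes both files
        print(sync_schemas(schemas, out))  # second run finds nothing to update
```

Comparing the serialized text rather than file timestamps keeps the sync idempotent: rerunning it without model changes reports every schema as unchanged.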

Kilo59 marked this pull request as ready for review February 12, 2023 22:45
@@ -80,7 +79,6 @@ class _PandasDatasource(Datasource):
     asset_types: ClassVar[List[Type[DataAsset]]] = list(_ASSET_MODELS.values())

     # instance attributes
-    type: str = pydantic.Field("_pandas")
Member Author (Feb 12, 2023): This field serves no purpose because this class is not meant to be instantiated. It was only defined to prevent a crash during Metadatasource.__new__ registration. Now that Metadatasource skips registration for underscore-prefixed ("private") Datasource classes, we can remove it.

Kilo59 requested a review from a team February 12, 2023 22:48
Kilo59 changed the title [MAINTENANCE] Fluent Datasources - QOL updates to [MAINTENANCE] Fluent Datasources - don't register "private" Datasource classes Feb 12, 2023
@@ -49,7 +48,6 @@ class _SparkDatasource(Datasource):
     asset_types: ClassVar[List[Type[DataAsset]]] = [CSVSparkAsset]

     # instance attributes
-    type: str = pydantic.Field("_spark")
Contributor: @Kilo59 I may need some help next week figuring out the right way to create SparkFilesystemDatasource, SparkS3Datasource, and so on, with respect to both the type field and schema file naming and generation. Thank you.

Member Author: I think the type values should simply be 'spark_file' (or 'spark_filesystem') and 'spark_s3'.

@@ -22,10 +22,16 @@
"additionalProperties": {
"$ref": "#/definitions/CSVSparkAsset"
}
},
Contributor: @Kilo59 I may need some help next week figuring out the right way to create SparkFilesystemDatasource, SparkS3Datasource, and so on, with respect to both the type field and schema file naming and generation. Most likely, this particular file will need a different name (e.g., SparkFilesystemDatasource.json). Thank you.

@alexsherstinsky (Contributor) left a comment: LGTM (please also see my inline comments). Thank you!

Kilo59 enabled auto-merge (squash) February 13, 2023 14:56
Kilo59 merged commit 6c36b37 into develop Feb 13, 2023
Kilo59 deleted the m/great-1662/zep-qol branch February 13, 2023 15:51
Labels: core-team, platform, zep (Zero Entry Pool work)
Projects: none
Linked issues: none
Participants: 3