[MAINTENANCE] Fluent Datasources - don't register "private" Datasource classes #7124

Kilo59 · 2023-02-12T19:05:44Z

Changes proposed in this pull request:

- don't register private Datasource classes (e.g. `_PandasDatasource`)
- improve `invoke schema --sync` to be more transparent about when and why schemas are updated (or not updated)
- start saving/committing Datasource schemas in addition to DataAsset schemas

Definition of Done

My code follows the Great Expectations style guide
I have performed a self-review of my own code
I have commented my code, particularly in hard-to-understand areas
I have made corresponding changes to the documentation
I have added unit tests where applicable and made sure that new and existing tests are passing.
I have run any local integration tests and made sure that nothing is broken.

netlify · 2023-02-12T19:05:48Z

✅ Deploy Preview for niobium-lead-7998 ready!

Name	Link
🔨 Latest commit	`881ac0c`
🔍 Latest deploy log	https://app.netlify.com/sites/niobium-lead-7998/deploys/63ea4fdf30d7e700083a4946
😎 Deploy Preview	https://deploy-preview-7124--niobium-lead-7998.netlify.app

Kilo59 · 2023-02-12T19:10:50Z

tasks.py

-                print(f"✅  {name} - {schema_path.name} schema updated")
+                print(f"🔃  {name} - {schema_path.name} schema updated")


Changed the symbol here so it's easier to tell at a glance whether a schema has changed.
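For readers unfamiliar with the sync task, here is a minimal sketch of the kind of reporting this change is aiming for. It is not the actual `tasks.py` code: the function name `sync_schema`, its parameters, the status strings, and the "unchanged" and "out of date" messages are all assumptions made for illustration. Only the 🔃 "schema updated" message comes from the diff above.

```python
import json
from pathlib import Path
from typing import Tuple


def sync_schema(name: str, schema_path: Path, schema: dict, sync: bool) -> Tuple[str, str]:
    """Compare a freshly generated schema with the file on disk.

    Hypothetical helper: returns (status, message) so the caller can print
    the message. Status is "unchanged", "updated", or "out-of-date".
    """
    new_text = json.dumps(schema, indent=4, sort_keys=True) + "\n"
    old_text = schema_path.read_text() if schema_path.exists() else None

    if new_text == old_text:
        # Nothing to do: the committed schema already matches the model.
        return "unchanged", f"✅  {name} - {schema_path.name} unchanged"
    if sync:
        # Distinct symbol so an actual write stands out from the unchanged case.
        schema_path.write_text(new_text)
        return "updated", f"🔃  {name} - {schema_path.name} schema updated"
    # Without --sync the file is left alone and the mismatch is only reported.
    return "out-of-date", f"🚨  {name} - {schema_path.name} out of date"
```

Using a different symbol per outcome means a run over many models can be scanned quickly for the few lines where a file was actually rewritten.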

Kilo59 · 2023-02-12T22:42:11Z

tasks.py

-                if issubclass(model, Datasource):
-                    print(f"🙈  {name} - is a Datasource; skipping")
-                    continue


Stop skipping Datasource schema generation

Kilo59 · 2023-02-12T22:47:56Z

great_expectations/experimental/datasources/pandas_datasource.py

@@ -80,7 +79,6 @@ class _PandasDatasource(Datasource):
    asset_types: ClassVar[List[Type[DataAsset]]] = list(_ASSET_MODELS.values())

    # instance attributes
-    type: str = pydantic.Field("_pandas")


This field serves no purpose because this class is not meant to be instantiated. It was only defined to prevent a crash during Metadatasource.__new__ registration. Now that Metadatasource skips registration for underscore-prefixed (`_`) Datasource classes, we can remove it.

alexsherstinsky · 2023-02-12T23:15:07Z

great_expectations/experimental/datasources/spark_datasource.py

@@ -49,7 +48,6 @@ class _SparkDatasource(Datasource):
    asset_types: ClassVar[List[Type[DataAsset]]] = [CSVSparkAsset]

    # instance attributes
-    type: str = pydantic.Field("_spark")


@Kilo59 I may need some help next week figuring out the right way to create SparkFilesystemDatasource, SparkS3Datasource, and so on in terms of the type field as well as the Schema file naming and generation. Thank you.

I would think this should just be 'spark_file' or 'spark_filesystem' and 'spark_s3'.

alexsherstinsky · 2023-02-12T23:15:48Z

great_expectations/experimental/datasources/schemas/SparkDatasource.json

@@ -22,10 +22,16 @@
            "additionalProperties": {
                "$ref": "#/definitions/CSVSparkAsset"
            }
+        },


@Kilo59 I may need some help next week figuring out the right way to create SparkFilesystemDatasource, SparkS3Datasource, and so on in terms of the type field as well as the Schema file naming and generation. Most likely, this particular file will need a different name (e.g., SparkFilesystemDatasource.json). Thank you.

alexsherstinsky

LGTM (in addition, please see the comments). Thank you!

Kilo59 added 7 commits February 12, 2023 13:47

don't register private _Datasource classes

d67c6c7

change symbol for updated schema

73570a1

don't skip Datasource schemas

9ad339d

SparkDatasource.json changes

045cdae

commit PandasFileSystemDatasource.json

1037a1a

remove xfail exceptions for test_schemas

bf07919

TODO before merge

ef44f91

github-actions bot added core-team platform labels Feb 12, 2023

Kilo59 self-assigned this Feb 12, 2023

Kilo59 added the zep Zero Entry Pool work label Feb 12, 2023

Kilo59 commented Feb 12, 2023

Kilo59 added 3 commits February 12, 2023 14:56

better pandas ds & asset identifying logic

9b5c955

better message about wrong pandas version

a146cac

exit 0 after schema sync

5fd2c4a

Kilo59 commented Feb 12, 2023

import sorting

7cb98c6

Kilo59 marked this pull request as ready for review February 12, 2023 22:45

Kilo59 commented Feb 12, 2023

Kilo59 requested a review from a team February 12, 2023 22:48

Kilo59 changed the title ~~[MAINTENANCE] Fluent Datasources - QOL updates~~ [MAINTENANCE] Fluent Datasources - don't register "private" Datasource classes Feb 12, 2023

alexsherstinsky reviewed Feb 12, 2023

alexsherstinsky approved these changes Feb 12, 2023

Kilo59 requested review from billdirks and NathanFarmer February 13, 2023 00:27

NathanFarmer approved these changes Feb 13, 2023

Kilo59 enabled auto-merge (squash) February 13, 2023 14:56

Merge branch 'develop' into m/great-1662/zep-qol

881ac0c

Kilo59 merged commit 6c36b37 into develop Feb 13, 2023

Kilo59 deleted the m/great-1662/zep-qol branch February 13, 2023 15:51
