
[FEATURE] Add sqlite datasource #6657

Merged
merged 8 commits into from
Dec 29, 2022

Conversation

Contributor

@billdirks billdirks commented Dec 28, 2022

There are two substantive changes in this PR:

  1. We make the Datasource class generic over the data asset. This lets us use properties of the particular data asset in the datasource without casts or type-ignore comments.
  2. We add the new sqlite datasource.
  • My code follows the Great Expectations style guide
  • I have performed a self-review of my own code
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • I have added unit tests where applicable and made sure that new and existing tests are passing.
  • I have run any local integration tests and made sure that nothing is broken.

@netlify netlify bot commented Dec 28, 2022

Deploy Preview for niobium-lead-7998 ready!

🔨 Latest commit: 62be266
🔍 Latest deploy log: https://app.netlify.com/sites/niobium-lead-7998/deploys/63ae1cb75801a200091ef950
😎 Deploy Preview: https://deploy-preview-6657--niobium-lead-7998.netlify.app


assert isinstance(data_asset, TableAsset), "data_asset must be a TableAsset"
assert isinstance(
    data_asset.datasource.execution_engine, SqlAlchemyExecutionEngine
)

def _query_for_year_and_month(
Contributor Author

The strategy here is to take the code that used to be in param_defaults and put it here. We parameterize the query_datetime_range Callable, which lets us break out the sqlite-specific code.
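To make this concrete, here is a minimal sketch of the callable-parameterization pattern described above. The names query_datetime_range and param_defaults come from the discussion, but the signatures and bodies are hypothetical illustrations, not the actual PR code:

```python
from typing import Callable, Dict, List

# Hypothetical sketch: the dialect-specific datetime-range query is passed in
# as a callable, so the generic parameter-defaults logic stays dialect-agnostic.
def param_defaults(
    query_datetime_range: Callable[[str], List[Dict[str, int]]],
    table_name: str,
) -> Dict[str, List[int]]:
    rows = query_datetime_range(table_name)
    return {
        "year": sorted({row["year"] for row in rows}),
        "month": sorted({row["month"] for row in rows}),
    }

# A sqlite-flavored callable can then be supplied separately; in the real code
# this would run sqlite-specific SQL, here it returns canned rows.
def sqlite_query_datetime_range(table_name: str) -> List[Dict[str, int]]:
    return [{"year": 2022, "month": 11}, {"year": 2022, "month": 12}]
```

The generic code never imports anything sqlite-specific; only the injected callable knows the dialect.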

column_splitter: Optional[SqliteYearMonthSplitter] = None # type: ignore[assignment]

# This asset type will support a variety of splitters
def add_year_and_month_splitter(
Contributor Author

I made the new SqliteTableAsset so I could have this helper function add the correct splitter.
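A minimal sketch of this helper pattern, using plain dataclasses instead of the pydantic models in the actual PR; the names mirror the diff context above, but the implementation is hypothetical:

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Stand-in for the sqlite-specific splitter referenced in the diff.
@dataclass
class SqliteYearMonthSplitter:
    column_name: str
    param_names: List[str] = field(default_factory=lambda: ["year", "month"])

# Hypothetical asset subclass: the helper installs the splitter type that is
# correct for this asset and returns self so calls can be chained.
@dataclass
class SqliteTableAsset:
    name: str
    column_splitter: Optional[SqliteYearMonthSplitter] = None

    def add_year_and_month_splitter(self, column_name: str) -> "SqliteTableAsset":
        self.column_splitter = SqliteYearMonthSplitter(column_name=column_name)
        return self
```

Because the helper lives on the subclass, callers get the right splitter without passing a splitter class around.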

default_factory=lambda: ["year", "month"]
)

def param_defaults(self, table_asset: TableAsset) -> Dict[str, List]:
Contributor Author

I tried making the table_asset input parameter more specific (i.e., SqliteTableAsset) using generics, but ran into some problems using pydantic and generics together. I decided not to resolve the issue right now.

order_by=order_by or [], # type: ignore[arg-type] # coerce list[str]
# see TableAsset._parse_order_by_sorter()
)
# TODO (kilo59): custom init for `DataAsset` to accept datasource in constructor?
Contributor Author

This was basically lifted from the sql datasource. I'm not sure if I should keep or delete this comment.

We may want to use the setter for the datasource instead of accessing _datasource directly. I didn't do that now because typechecking on assignment led to defining more methods on each class. I can do this in a followup PR.

Member
@Kilo59 Kilo59 Dec 29, 2022

If the comments are still on the parent method then I think we could delete them here.

Contributor Author

Done.

Comment on lines +109 to +114
# We should make an assertion about the execution_engine earlier. Right now it is assigned to
# after construction. We may be able to use a hook in a property setter.
from great_expectations.execution_engine import SqlAlchemyExecutionEngine

assert isinstance(
    table_asset.datasource.execution_engine, SqlAlchemyExecutionEngine
)
Member

Can this be a custom pydantic validator on the SQLDatasource?

Contributor Author

Because this happens on assignment, I think the way to use pydantic validators would be to turn on validate_assignment in the model config, which would enable it for all fields. I'm not sure we want to do that. I do think it would be better to do this check in the property setter and at the point of use, like here. Right now we set it via the private variable _datasource, so some cleanup is needed. I'll do this in a followup PR.
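Here is a sketch of the property-setter alternative the author describes, with simplified stand-in classes rather than the actual Great Expectations types. The check runs once at assignment time, so use sites no longer need their own asserts:

```python
from typing import Optional

class SqlAlchemyExecutionEngine:  # stand-in for the real engine class
    pass

class Datasource:  # stand-in: real datasources configure their engine
    def __init__(self) -> None:
        self.execution_engine = SqlAlchemyExecutionEngine()

class TableAsset:
    def __init__(self) -> None:
        self._datasource: Optional[Datasource] = None

    @property
    def datasource(self) -> Datasource:
        if self._datasource is None:
            raise ValueError("datasource has not been assigned")
        return self._datasource

    @datasource.setter
    def datasource(self, value: Datasource) -> None:
        # Validate at assignment instead of at every use site; this is the
        # targeted alternative to model-wide validate_assignment.
        if not isinstance(value.execution_engine, SqlAlchemyExecutionEngine):
            raise TypeError("execution_engine must be a SqlAlchemyExecutionEngine")
        self._datasource = value
```

The trade-off mentioned in the comment is visible here: each class that holds a datasource needs its own property pair, which is the extra method-definition cost of the setter approach.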

# right side of the operator determines the type name
# left side enforces the names on instance creation
type: Literal["sqlite"] = "sqlite" # type: ignore[assignment]
connection_string: str
Member
@Kilo59 Kilo59 Dec 29, 2022

I think we should consider adding some constraints to this connection_string field.

It could be as simple as a custom validator that checks that the connection_string .startswith('sqlite').

Or it could be a pydantic constrained string type (with a regex).
A benefit of using the constrained type is that the regex will show up in the JSON schema for the model.
https://docs.pydantic.dev/usage/types/#constrained-types
https://docs.pydantic.dev/usage/types/#arguments-to-constr

https://docs.sqlalchemy.org/en/14/dialects/sqlite.html#

Contributor Author

Added constr and some tests.
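For illustration, a stdlib sketch of the kind of constraint being discussed. The actual PR uses pydantic's constr with a regex; the same check expressed with re looks like this, and the exact pattern here is an assumption based on SQLAlchemy's sqlite URL forms:

```python
import re

# Hypothetical pattern: matches "sqlite://..." and dialect-qualified forms
# like "sqlite+pysqlite://...". The real PR's regex may differ.
SQLITE_URL_PATTERN = re.compile(r"^sqlite(\+pysqlite)?://")

def validate_sqlite_connection_string(connection_string: str) -> str:
    # Reject anything that is not a sqlite-style SQLAlchemy URL.
    if not SQLITE_URL_PATTERN.match(connection_string):
        raise ValueError(f"Not a sqlite connection string: {connection_string!r}")
    return connection_string
```

With pydantic's constrained type the same regex would also be emitted into the model's JSON schema, which is the benefit called out above.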

Comment on lines +80 to +82
class Datasource(
ExperimentalBaseModel, Generic[DataAssetType], metaclass=MetaDatasource
):
Member

So now the type-system knows that a PostgresDatasource contains TableAssets and not a general/non-specific DataAsset type?
🙌

@billdirks billdirks enabled auto-merge (squash) December 29, 2022 22:22
@billdirks billdirks changed the title Add sqlite datasource [FEATURE] Add sqlite datasource Dec 29, 2022
@billdirks billdirks enabled auto-merge (squash) December 29, 2022 22:41
@billdirks billdirks merged commit 9ff03a9 into develop Dec 29, 2022
@billdirks billdirks deleted the add-sqlite-datasource branch December 29, 2022 23:54
Shinnnyshinshin pushed a commit that referenced this pull request Dec 30, 2022
…eat-expectations/great_expectations into f/dx-106/pandas-named-indices

* 'f/dx-106/pandas-named-indices' of https://github.com/great-expectations/great_expectations:
  [FEATURE] Add sqlite datasource (#6657)
  [DOCS] doc-356 importable prerequisite box with correct theme applied and default entries (#6666)
  [DOCS] DOC-400 remove an outdated video link (#6667)
  [FEATURE] Added new expectation: ExpectYesterdayCountComparedToAvgEquivalentDaysOfWeek… (#6622)
  [FEATURE] ZEP - load from shared config (#6655)
  [MAINTENANCE] Add redshift/snowflake casting for sample data on expect_column_values_to_be_of_type (#6659)
  [MAINTENANCE] Add separate tests for exact stdev for redshift and snowflake (#6658)
  [MAINTENANCE] Make only column y data_alt different in col_pair_equal tests (#6661)
Shinnnyshinshin pushed a commit that referenced this pull request Dec 30, 2022
…lag-for-returning-query

* develop:
  [MAINTENANCE] Only spark v2 for special test in not_be_in_set (#6669)
  [BUGFIX] Avoid key collisions between Rule-Based Profiler terminal literals and MetricConfiguration value keys (#6675)
  [MAINTENANCE] Make only column y data_alt different in col_pair_in_set tests (#6670)
  [MAINTENANCE] Add redshift/snowflake casting for sample data on expect_column_values_to_be_in_type_list (#6668)
  [FEATURE] Add sqlite datasource (#6657)
  [DOCS] doc-356 importable prerequisite box with correct theme applied and default entries (#6666)
  [DOCS] DOC-400 remove an outdated video link (#6667)
  [FEATURE] Added new expectation: ExpectYesterdayCountComparedToAvgEquivalentDaysOfWeek… (#6622)
  [FEATURE] ZEP - load from shared config (#6655)
  [MAINTENANCE] Add redshift/snowflake casting for sample data on expect_column_values_to_be_of_type (#6659)
  [MAINTENANCE] Add separate tests for exact stdev for redshift and snowflake (#6658)
  [MAINTENANCE] Make only column y data_alt different in col_pair_equal tests (#6661)
2 participants