Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(datasets) Move default_condition at table level instead of dataset level #536

Merged
merged 10 commits into from Oct 22, 2019

Conversation

fpacifici
Copy link
Contributor

The default_condition method in a dataset is used (today) to define some structural conditions that have to be applied to all the query of that dataset to provide a consistent data model (like excluding tombstoned records by adding a deleted=0 condition automatically to all events query).

This is fine if a dataset coincides with a table, but when a dataset groups several tables that can be joined together and where queries can be processed by changing the tables we are querying, this integrity constraints cannot be expressed at dataset level anymore.
I believe they belong to the tables we are querying (more in general to RelationalSource) so that only the relevant conditions are applied after query processing.

This PR
-moves the default_condition logic into a mandatory_conditions method on RelationalSource.
-It moves events and groups logic default_condition logic into this new method
-leaves a default_condition at dataset in place. This may still be used for higher level default conditions we want to apply to the dataset as a whole.

Test plan:

  • test added for the default condition to the event dataset.

@fpacifici fpacifici requested a review from a team October 19, 2019 01:31
# TODO: This will be replaced as soon as expressions won't be strings
# thus we will be able to easily add an alias to a column in an
# expression.
(qualified_column('record_deleted', self.GROUPS_ALIAS), '=', 0)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Soon accessing a column in an expression will look like something like :

cols = expression.get_columns()
for c in cols:
      c.add_alias(alias)
      # (speculation, I still have to think through the syntax, but this is to explain the concept)

Right now, being the expression either a string or a sequence, extracting columns and adding the alias is quite complex, thus think it is not worth building all the expression editing methods for strings when the structure is going to change very soon.

@@ -115,6 +122,11 @@ def get_columns(self) -> QualifiedColumnSet:
column_sets = {alias: table.get_columns() for alias, table in tables.items()}
return QualifiedColumnSet(column_sets)

def get_mandatory_conditions(self) -> Sequence[Condition]:
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Still considering whether it would make sense to have this method only in classes that extend TableSource instead of any RelationalSource. Having it at the top makes things a little cleaner when applying these conditions.

Copy link
Contributor

@tkaemming tkaemming left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good to me structurally, but I think it could use some typing/interface changes.

snuba/datasets/schemas/tables.py Outdated Show resolved Hide resolved
snuba/datasets/schemas/tables.py Outdated Show resolved Hide resolved
snuba/datasets/schemas/tables.py Outdated Show resolved Hide resolved
snuba/datasets/schemas/tables.py Outdated Show resolved Hide resolved
snuba/datasets/schemas/tables.py Outdated Show resolved Hide resolved
def __init__(self,
table_name: str,
columns: ColumnSet,
mandatory_conditions: Optional[Sequence[Condition]],
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this also have a default of None? Is this just because of the parameter order? I guess this relates to the comment below a bit too…

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

indeed this is just for parameters order. In order to be consistent with the order of the parent class this cannot have a default value, unless I impose that every parameter after columns is not positional.
In the end it seems better to me having to provide a parameter that has a trivial default value than breaking the consistency with the parent class.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense, though I think that this might be an indication that TableJoinNode should actually take a TableSource as a parameter, rather than subclassing it.

snuba/datasets/schemas/join.py Outdated Show resolved Hide resolved
tests/query/test_organization_extension.py Outdated Show resolved Hide resolved
@fpacifici fpacifici merged commit 91b4286 into master Oct 22, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants