feat(datasets) Move default_condition at table level instead of dataset level #536

fpacifici · 2019-10-19T01:31:44Z

Today, a dataset's default_condition method defines structural conditions that must be applied to every query on that dataset to keep the data model consistent (for example, excluding tombstoned records by automatically adding a deleted=0 condition to every events query).

This is fine when a dataset coincides with a single table. But when a dataset groups several tables that can be joined together, and query processing can change which tables are actually queried, these integrity constraints can no longer be expressed at the dataset level.
I believe they belong to the tables being queried (more generally, to the RelationalSource), so that only the relevant conditions are applied after query processing.

This PR:
- moves the default_condition logic into a mandatory_conditions method on RelationalSource.
- moves the events and groups default_condition logic into this new method.
- leaves default_condition in place on the dataset. It may still be used for higher-level default conditions that apply to the dataset as a whole.
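
To make the idea above concrete, here is a minimal sketch of conditions attached to the source being queried rather than to the dataset. It is not the code in this PR: the `Condition` tuple shape, the constructor signature, and the `apply_mandatory_conditions` helper are assumptions for illustration; only `RelationalSource`, `TableSource`, and `get_mandatory_conditions` are names that appear in the PR.

```python
from typing import List, Sequence, Tuple, Union

# Assumed condition shape for this sketch: (column, operator, literal).
Condition = Tuple[str, str, Union[int, str]]


class RelationalSource:
    """Anything a query can select from (a table, a join, ...)."""

    def get_mandatory_conditions(self) -> Sequence[Condition]:
        # By default a source imposes no structural conditions.
        return []


class TableSource(RelationalSource):
    def __init__(
        self, table_name: str, mandatory_conditions: Sequence[Condition] = ()
    ) -> None:
        self.table_name = table_name
        self._mandatory_conditions = list(mandatory_conditions)

    def get_mandatory_conditions(self) -> Sequence[Condition]:
        return self._mandatory_conditions


def apply_mandatory_conditions(
    source: RelationalSource, conditions: Sequence[Condition]
) -> List[Condition]:
    """Hypothetical helper: append the source's conditions to the user's."""
    return list(conditions) + list(source.get_mandatory_conditions())


# The tombstone filter now travels with the table, not with the dataset.
events = TableSource("events", [("deleted", "=", 0)])
final_conditions = apply_mandatory_conditions(events, [("project_id", "=", 1)])
```

Because the conditions are read from whichever source the query ends up targeting after processing, a query that is rewritten to hit a different table picks up that table's conditions and nothing else.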

Test plan:

Added a test for the default condition on the events dataset.

fpacifici · 2019-10-19T01:34:08Z

snuba/datasets/groups.py

+                    # TODO: This will be replaced as soon as expressions won't be strings
+                    # thus we will be able to easily add an alias to a column in an
+                    # expression.
+                    (qualified_column('record_deleted', self.GROUPS_ALIAS), '=', 0)


Soon, accessing a column in an expression will look something like:

cols = expression.get_columns()
for c in cols:
    c.add_alias(alias)
# (speculation: I still have to think through the syntax, but this explains the concept)

Right now, since an expression is either a string or a sequence, extracting columns and adding the alias is quite complex, so I think it is not worth building all the expression-editing methods for strings when the structure is going to change very soon.
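
As a rough illustration of that concept (not the eventual design), the sketch below models a condition as a small object tree so that columns can be found and aliased without parsing strings. The `Column` and `BinaryCondition` classes, the `format` methods, and the `"groups"` alias value are hypothetical; only `get_columns` and `add_alias` come from the comment above.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Column:
    name: str
    alias: Optional[str] = None

    def add_alias(self, alias: str) -> None:
        self.alias = alias

    def format(self) -> str:
        # Qualify the column with its table alias when one is set.
        return f"{self.alias}.{self.name}" if self.alias else self.name


@dataclass
class BinaryCondition:
    column: Column
    operator: str
    literal: int

    def get_columns(self) -> List[Column]:
        return [self.column]

    def format(self) -> str:
        return f"{self.column.format()} {self.operator} {self.literal}"


# Aliasing becomes a walk over the columns instead of string surgery.
condition = BinaryCondition(Column("record_deleted"), "=", 0)
for c in condition.get_columns():
    c.add_alias("groups")

formatted = condition.format()  # "groups.record_deleted = 0"
```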

fpacifici · 2019-10-19T01:39:02Z

snuba/datasets/schemas/join.py

@@ -115,6 +122,11 @@ def get_columns(self) -> QualifiedColumnSet:
        column_sets = {alias: table.get_columns() for alias, table in tables.items()}
        return QualifiedColumnSet(column_sets)

+    def get_mandatory_conditions(self) -> Sequence[Condition]:


Still considering whether it would make sense to have this method only in classes that extend TableSource instead of any RelationalSource. Having it at the top makes things a little cleaner when applying these conditions.
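
One way to picture the trade-off, as a sketch under stated assumptions rather than the merged implementation: if every RelationalSource exposes the method, a join can simply collect the conditions of its tables, and the caller never needs to check which kind of source it holds. The `JoinedSource` class, the `Condition` shape, and the assumption that each table's conditions are already alias-qualified are all hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

# Assumed condition shape for this sketch: (column, operator, literal).
Condition = Tuple[str, str, int]


class RelationalSource:
    def get_mandatory_conditions(self) -> Sequence[Condition]:
        raise NotImplementedError


class TableSource(RelationalSource):
    def __init__(self, table_name: str, mandatory_conditions: Sequence[Condition]) -> None:
        self.table_name = table_name
        self._mandatory_conditions = list(mandatory_conditions)

    def get_mandatory_conditions(self) -> Sequence[Condition]:
        return self._mandatory_conditions


class JoinedSource(RelationalSource):
    """Hypothetical join: maps an alias to each joined table."""

    def __init__(self, tables: Dict[str, TableSource]) -> None:
        self.tables = tables

    def get_mandatory_conditions(self) -> Sequence[Condition]:
        # Assumes each table's conditions are already alias-qualified.
        collected: List[Condition] = []
        for table in self.tables.values():
            collected.extend(table.get_mandatory_conditions())
        return collected


join = JoinedSource(
    {
        "events": TableSource("events", [("events.deleted", "=", 0)]),
        "groups": TableSource("groups", [("groups.record_deleted", "=", 0)]),
    }
)
# The caller treats a join exactly like a single table.
join_conditions = list(join.get_mandatory_conditions())
```

If the method lived only on TableSource, the code that applies conditions would need to know how to walk a join itself.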

tkaemming

This looks good to me structurally, but I think it could use some typing/interface changes.

tkaemming · 2019-10-22T20:43:44Z

snuba/datasets/schemas/join.py

+    def __init__(self,
+        table_name: str,
+        columns: ColumnSet,
+        mandatory_conditions: Optional[Sequence[Condition]],


Should this also have a default of None? Is this just because of the parameter order? I guess this relates to the comment below a bit too…

fpacifici · 2019-10-22

Indeed, this is just because of the parameter order. To stay consistent with the order in the parent class, this parameter cannot have a default value unless I require every parameter after columns to be keyword-only.
In the end, it seems better to me to require a parameter that has a trivial default value than to break consistency with the parent class.

tkaemming · 2019-10-23

Makes sense, though I think this might be an indication that TableJoinNode should actually take a TableSource as a parameter, rather than subclassing it.
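
A minimal sketch of the composition alternative suggested here, to show why it sidesteps the parameter-order problem. This is illustrative only: the `alias` parameter, the simplified column type, and the table name are assumptions, and the real classes have more responsibilities than shown.

```python
from typing import Optional, Sequence, Tuple

# Assumed condition shape for this sketch: (column, operator, literal).
Condition = Tuple[str, str, int]


class TableSource:
    def __init__(
        self,
        table_name: str,
        columns: Sequence[str],  # stand-in for ColumnSet
        mandatory_conditions: Optional[Sequence[Condition]] = None,
    ) -> None:
        self.table_name = table_name
        self.columns = list(columns)
        self._mandatory_conditions = list(mandatory_conditions or [])

    def get_mandatory_conditions(self) -> Sequence[Condition]:
        return self._mandatory_conditions


class TableJoinNode:
    """Wraps a TableSource instead of subclassing it."""

    def __init__(self, table: TableSource, alias: str) -> None:
        # The node has its own short signature, so TableSource is free
        # to give mandatory_conditions a default without ordering issues.
        self.table = table
        self.alias = alias

    def get_mandatory_conditions(self) -> Sequence[Condition]:
        return self.table.get_mandatory_conditions()


# Hypothetical table name and alias, for illustration only.
node = TableJoinNode(TableSource("groups_table", ["id", "record_deleted"]), alias="groups")
```

With subclassing, any parameter the child adds has to come after the parent's parameters, which is what forces `mandatory_conditions` to be passed explicitly in the snippet under review.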

fpacifici added 5 commits October 18, 2019 13:30

Move condition type into its own module

b36ebaa

Merge branch 'master' into feat/moveDefaultConditions

4117d2d

Merge branch 'master' into feat/moveDefaultConditions

9a9f25e

Add mandatory conditions to RelationalSource

751a11e

Replace default_onditions with mandatory conditions

15b38af

fpacifici requested a review from a team October 19, 2019 01:31

Move a comment in the right file

9147f55

tkaemming suggested changes Oct 21, 2019

tkaemming reviewed Oct 21, 2019

fpacifici added 3 commits October 22, 2019 11:05

Merge branch 'master' into feat/moveDefaultConditions

1551ad2

Make mandatory_condition field optional

afb9d74

Add a default value

13a2c76

fpacifici requested a review from tkaemming October 22, 2019 20:35

tkaemming approved these changes Oct 22, 2019

Nits

10c0836

fpacifici merged commit 91b4286 into master Oct 22, 2019
