Backend filtering refactor #837

dmos62 · 2021-11-22T14:11:16Z

Fixes #385, fixes #846

Sets up the backend infrastructure for filtering.

Technical details

Uses our fork of sqlalchemy-filters.

Checklist

My pull request has a descriptive title (not a vague title like Update index.md).
My pull request targets the master branch of the repository.
My commit messages follow best practices.
My code follows the established code style of the repository.
I added tests for the changes I made (if applicable).
I added or updated documentation (if applicable).
I tried running the project locally and verified that there are no
visible errors.

My TODO

Developer Certificate of Origin

Developer Certificate of Origin
Version 1.1

Copyright (C) 2004, 2006 The Linux Foundation and its contributors.
1 Letterman Drive
Suite D4700
San Francisco, CA, 94129

Everyone is permitted to copy and distribute verbatim copies of this
license document, but changing it is not allowed.


Developer's Certificate of Origin 1.1

By making a contribution to this project, I certify that:

(a) The contribution was created in whole or in part by me and I
    have the right to submit it under the open source license
    indicated in the file; or

(b) The contribution is based upon previous work that, to the best
    of my knowledge, is covered under an appropriate open source
    license and I have the right under that license to submit that
    work with modifications, whether created in whole or in part
    by me, under the same open source license (unless I am
    permitted to submit under a different license), as indicated
    in the file; or

(c) The contribution was provided directly to me by some other
    person who certified (a), (b) or (c) and I have not modified
    it.

(d) I understand and agree that this project and the contribution
    are public and that a record of the contribution (including all
    personal information I submit with it, including my sign-off) is
    maintained indefinitely and may be redistributed consistent with
    this project or the open source license(s) involved.

kgodey · 2021-11-23T22:23:52Z

@dmos62 I see that you're still working off of a fork. You have write access to the repo now, so I recommend switching to a branch instead.

dmos62 · 2021-11-24T17:58:32Z

At the moment possible predicates are exposed in the REST API like this:

http://localhost:8000/api/v0/databases/1/types/:

[
    {
        "identifier": "boolean",
        "name": "Boolean",
        "db_types": [
            "BOOLEAN"
        ],
        "filters": [
            {
                "superType": "leaf",
                "type": "equal",
                "parameterCount": "single",
                "parameterMathesarType": "boolean"
            },
            {
                "superType": "leaf",
                "type": "greater",
                "parameterCount": "single",
                "parameterMathesarType": "boolean"
            },
            {
                "superType": "leaf",
                "type": "greater_or_equal",
                "parameterCount": "single",
                "parameterMathesarType": "boolean"
            },
            {
                "superType": "leaf",
                "type": "lesser",
                "parameterCount": "single",
                "parameterMathesarType": "boolean"
            },
            {
                "superType": "leaf",
                "type": "lesser_or_equal",
                "parameterCount": "single",
                "parameterMathesarType": "boolean"
            },
            {
                "superType": "leaf",
                "type": "empty",
                "parameterCount": "none"
            },
            {
                "superType": "leaf",
                "type": "in",
                "parameterCount": "multi",
                "parameterMathesarType": "boolean"
            },
            {
                "superType": "branch",
                "type": "not",
                "parameterCount": "single"
            },
            {
                "superType": "branch",
                "type": "and",
                "parameterCount": "multi"
            },
            {
                "superType": "branch",
                "type": "or",
                "parameterCount": "multi"
            }
        ]
    },
    ...
]
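
As a rough illustration of how a client might consume this response, the sketch below groups one type's filters by their superType. The embedded payload is an abbreviated copy of the output above, and the helper name filters_by_super_type is hypothetical; it is not part of the API or of this PR.

```python
import json
from collections import defaultdict

# Abbreviated copy of the response shown above (only three filters kept).
SAMPLE_RESPONSE = """
[
    {
        "identifier": "boolean",
        "name": "Boolean",
        "db_types": ["BOOLEAN"],
        "filters": [
            {"superType": "leaf", "type": "equal",
             "parameterCount": "single", "parameterMathesarType": "boolean"},
            {"superType": "leaf", "type": "empty", "parameterCount": "none"},
            {"superType": "branch", "type": "and", "parameterCount": "multi"}
        ]
    }
]
"""


def filters_by_super_type(ma_type):
    """Group the filter names of one type entry by superType (hypothetical helper)."""
    grouped = defaultdict(list)
    for filter_spec in ma_type["filters"]:
        grouped[filter_spec["superType"]].append(filter_spec["type"])
    return dict(grouped)


types = json.loads(SAMPLE_RESPONSE)
boolean_filters = filters_by_super_type(types[0])
print(boolean_filters)
```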

kgodey · 2021-11-24T23:07:44Z

Couple of quick comments:

I don't think we should mix snake case and camel case in key names. I think we should stick with snake case since that's what the rest of the API uses. (This also applies to function names).
I'm not sure what superType means.

dmos62 · 2021-11-25T11:56:01Z

@kgodey thanks for noticing the casing conflict. I'm using the tree abstraction for predicates. The empty predicate's superType is leaf, because it's always a leaf node on the predicate tree (has height zero). branch predicates are never leaves (have height that's non-zero), like and, or or not. Example composite predicate:

and(not(empty(field1)), equal(field2, value))

and and not will always have other predicates within them (they're branches), while empty and equal never have predicates within them (they're leaves).
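
To make the height argument concrete, here is a minimal sketch of the composite predicate above as a tree. The Leaf and Branch classes and the height function are illustrative stand-ins, not the classes from this PR.

```python
from dataclasses import dataclass
from typing import Any, Tuple


@dataclass(frozen=True)
class Leaf:
    """A predicate with no predicates inside it, e.g. empty or equal."""
    name: str
    column: str
    parameters: Tuple[Any, ...] = ()


@dataclass(frozen=True)
class Branch:
    """A predicate whose parameters are other predicates, e.g. and, or, not."""
    name: str
    children: Tuple[Any, ...]


def height(predicate):
    """Leaves have height zero; a branch is one taller than its tallest child."""
    if isinstance(predicate, Leaf):
        return 0
    return 1 + max(height(child) for child in predicate.children)


# and(not(empty(field1)), equal(field2, value))
tree = Branch("and", (
    Branch("not", (Leaf("empty", "field1"),)),
    Leaf("equal", "field2", ("value",)),
))
print(height(tree))
```

In this sketch the leaves sit at height zero and every branch has a non-zero height, which is the distinction superType is meant to expose.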

kgodey · 2021-11-26T17:37:54Z

I'm using the tree abstraction for predicates. The empty predicate's superType is leaf, because it's always a leaf node on the predicate tree (has height zero). branch predicates are never leaves (have height that's non-zero), like and, or or not.

I figured that out. I meant specifically that the "super type" nomenclature is a little confusing: if I were just paying attention to the key names, it would seem like it's a superset of the "type" key somehow (which it's not). Is there a more obvious name for it? Or are you using some standard set of names derived from something else?

kgodey · 2021-11-26T18:36:20Z

db/filters/base.py

+
+def not_empty(l): return len(l) > 0
+
+def assertPredicateCorrect(predicate):


@dmos62 we use snake case for Python functions and variables too. We only use camel case (capitalized) for class names.

Fixed it. Thanks for pointing that out.

dmos62 · 2021-11-26T18:56:59Z

superType is a superset in that every Predicate subclass (every type in other words) is also a subclass of one of the superTypes. Or do you mean something else?

I'm open to suggestions. Calling it a parent type would have a similar meaning. That's pretty much talking about the underlying class/mixin hierarchy. We could also have nomenclature that talks about predicate names (e.g. empty, greater, and) and positions in predicate trees (leaf or branch).

kgodey · 2021-11-26T19:03:02Z

Calling it a parent type would have a similar meaning. That's pretty much talking about the underlying class/mixin hierarchy.

API users will probably not know or care about how it's implemented; I think the nomenclature should prioritize API readability.

We could also have nomenclature that talks about predicate names (e.g. empty, greater, and) and positions in predicate trees (leaf or branch).

I like this. How about name instead of type and position instead of super_type?

dmos62 · 2021-12-09T14:28:38Z

Since @mathemancer is still making changes on the dependency PR (#862), I'll hold off on merging it into this one.

kgodey

I'll review this in more detail later.

db/filters/base.py

silentninja · 2021-12-20T12:25:08Z

Have we decided on using dataclasses and typing in our codebase?

mathemancer

First pass review. I'll be more precise once this is more stabilized. However, I have some broader comments to make. Overall, I really like the tree concept. I do think there's a false dichotomy introduced; there are branches and leaves, but you can't assume that an "and" is a branch (since it could be an "and" between two boolean columns). I.e., the dichotomy isn't between types exactly.

My biggest concern is the specificity. I really think we need to take a try-and-see-what-happens approach to some of these things, rather than trying to check everything beforehand. Let the DB tell you if a given proposition makes no sense, and handle the error. This will be more flexible, and avoid having to define things in multiple places. In the long run, I really think it'll be easier to maintain that way. For example, avoiding specifying branch vs. leaf is more flexible for composition. I acknowledge that it would sometimes run into issues, but that can be handled by really good feedback and errors.
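
A minimal sketch of that "let the DB tell you" approach follows. It uses the standard-library sqlite3 module purely for illustration (the project itself uses SQLAlchemy), and the try_filter helper, table name, and column names are all hypothetical.

```python
import sqlite3


def try_filter(connection, table, where_clause, params=()):
    """Run the filter and report the database's own error instead of pre-validating.

    NOTE: the table name and WHERE clause are interpolated directly for brevity;
    a real implementation would build the query with a query builder.
    """
    query = f"SELECT * FROM {table} WHERE {where_clause}"
    try:
        return connection.execute(query, params).fetchall(), None
    except sqlite3.Error as error:
        return None, f"Filter rejected by the database: {error}"


connection = sqlite3.connect(":memory:")
connection.execute("CREATE TABLE items (id INTEGER, label TEXT)")
connection.executemany("INSERT INTO items VALUES (?, ?)", [(1, "a"), (2, "b")])

# A sensible filter succeeds.
rows, error = try_filter(connection, "items", "id > ?", (1,))
print(rows, error)

# A filter on a column that does not exist is rejected by the database itself.
rows, error = try_filter(connection, "items", "missing_column > ?", (1,))
print(rows, error)
```

The trade-off in this style is that the quality of the feedback depends on how well the database error is translated for the API user.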

Final note for this round: we need to come to some kind of team-wide agreement about type hints (and by implication dataclasses). I'm ambivalent on these issues, but I suspect I'm the only one.

db/filters/operations/deserialize.py

db/filters/base.py

kgodey · 2021-12-21T00:25:08Z

Have we decided on using dataclasses and typing in our codebase?

Final note for this round: we need to come to some kind of team-wide agreement about type hints (and by implication dataclasses). I'm ambivalent on these issues, but I suspect I'm the only one.

I think we should take the discussion about dataclasses and type hints to a GitHub discussion so as to not clutter up this PR. @dmos62, I think it would make sense for you to start this.

kgodey · 2021-12-21T00:45:35Z

@dmos62 This is a large PR. I think it would help me review the code if you could do a write up of the changes. Topics I think would be useful:

Explaining the code structure and why you chose it + any alternatives you considered.
Extensibility - how to add new filters and new data types
The benefits of dataclasses in this particular application

I'm having a hard time grokking the code because it doesn't seem Pythonic somehow. That's probably not useful but I can't articulate any more specifics, I'm hoping reading through your explanation will help me either understand the code or articulate why it's hard for me to understand.

mathesar/api/serializers/records.py

dmos62 · 2021-12-21T11:59:25Z

Writeup

I'm collapsing this write-up; please see its copy-pasted (and possibly updated) version on the initial post of the new thread for this PR.

Collapsed

Features

Note that I may use the terms filter specification and predicate interchangeably.

Some of the things this new predicate data structure does:

Declares (predicate) correctness to a high degree
- An illegal predicate cannot be instantiated; an instantiated predicate is always legal
  - Notable exception
    - A predicate referencing nonexistent columns can be instantiated
      - Referenced column existence is checked right before applying the filter specification to a query
Declares what predicates the frontend can use under what circumstances and how
- It can tell if a predicate is compatible with some column type based on its properties (like comparability, compatibility with LIKE, compatibility with URI-type-specific SQL functions)
- It can tell the frontend how many parameters a predicate takes (e.g. empty, equal and in take different numbers of parameters) and what options it accepts (e.g. whether starts_with should be case-insensitive)
- It can tell the frontend how to compose predicates (leaf vs. branch predicates): parameters on branches are other predicates
It supports SQL functions
- You can use SQL functions to, for example, destructure a URI and filter based on that
  - Though this extension is/will be in a newer PR
  - Was not possible with sqlalchemy-filters

What it does not (currently) do:

Does not support using other columns as parameters
- Can't do {equal: {column: x, parameter: {column: y}}}
- This is an oversight
- Does not seem difficult to implement if/when there's interest

Technical details

I organized the relevant pieces in the predicate data structure into a mixin hierarchy and also used this PR as a testbed for frozen dataclasses. Some of my objectives with the basic structure were:

Immutability
- Used properties where I could
  - Didn't use properties where an SQLAlchemy filter is returned, since its mutability is uncertain
Many small classes
- Use the mixin/type system to capture information
  - A class that is a predicate that takes one parameter and relies on LIKE will directly or indirectly mix in ReliesOnLike, SingleParameter, Leaf and Predicate
- Some logic I offloaded to, and hence centralized in, static methods: correctness is declared in db/filters/base::assert_predicate_correct, and what the JSON filter specification looks like is declared in db/filters/operations/deserialize
  - Has the drawback that the control flow in these methods can seem daunting, since it walks itself through the mixin hierarchy
    - These methods are not complicated, but big: each branch is simple; it's just that there are many of them
    - Might fan this logic out into the mixin definitions
Used parameter where a singular parameter is expected and parameters where a sequence is expected
- I have reservations about this
  - It makes the specifics of single/multi-parameter requirements obvious
  - Procedural instantiation is more verbose (like when testing), since you have to change the argument name depending on circumstance
    - But that can be solved with an auxiliary constructor that's only for utilities
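
The ideas above (small mixin classes, immutability, correctness checked at instantiation) can be sketched roughly as follows. This is an illustration under assumptions, not the PR's code: the StartsWith class, the position and parameter_count attributes, and the validation rule are hypothetical, and plain class attributes stand in for the PR's static helper.

```python
from dataclasses import dataclass, FrozenInstanceError


class Predicate:
    """Root of the hypothetical hierarchy."""


class Leaf(Predicate):
    position = "leaf"


class SingleParameter:
    parameter_count = "single"


class ReliesOnLike:
    """Marker mixin: the predicate is only valid for LIKE-compatible columns."""


@dataclass(frozen=True)
class StartsWith(ReliesOnLike, SingleParameter, Leaf):
    column: str
    parameter: str

    def __post_init__(self):
        # Correctness check at construction time: an illegal predicate
        # never becomes an instance.
        if not isinstance(self.parameter, str):
            raise TypeError("starts_with expects a string parameter")


predicate = StartsWith(column="title", parameter="Math")
print(predicate.position, predicate.parameter_count)
print(issubclass(StartsWith, ReliesOnLike))

try:
    predicate.parameter = "other"
except FrozenInstanceError:
    print("predicates are immutable")
```

Here the mixins carry the information (position in the tree, parameter count, reliance on LIKE), while the frozen dataclass supplies the constructor and immutability.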

How to extend with new predicate

Introduce the appropriate class; for example, Greater; this includes defining the new Predicate's properties through mixins: ReliesOnComparability, SingleParameter, Leaf in this case (note that mixin order matters, see Python docs);

@frozen_dataclass
class Greater(ReliesOnComparability, SingleParameter, Leaf):
    type: LeafPredicateType = static(LeafPredicateType.GREATER)
    name: str = static("Greater")

    def to_sa_filter(self):
        return column(self.column) > self.parameter

Introduce the predicate type enum: LeafPredicateType.GREATER in this case; it's what the JSON filter spec will use to identify a predicate;
Tell it how to generate an equivalent SQLAlchemy filter by implementing the abstract Predicate::to_sa_filter (as seen above);
Update correctness definition (db/filters/base::assert_predicate_correct), if needed; currently this involves finding the spot in the method's control flow tree that corresponds to this new predicate, adding a new type check, etc.
Add new predicate to the all_predicates set;
Update mathesar.database.types::is_ma_type_supported_by_predicate, if needed; this would involve declaring what types have the properties the new predicate depends on: in this PR it's:

def _is_ma_type_comparable(ma_type: MathesarTypeIdentifier) -> bool:
    return ma_type in comparable_mathesar_types

def is_ma_type_supported_by_predicate(ma_type: MathesarTypeIdentifier, predicate: Type[Predicate]):
    if relies_on_comparability(predicate):
        return _is_ma_type_comparable(ma_type)
    else:
        return True

Notice that relies_on_comparability is an auxiliary function returning true when a predicate is a subclass of ReliesOnComparability.
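
For illustration, such an auxiliary function could look roughly like the sketch below. The class bodies are empty stand-ins and the exact implementation in the PR may differ; only the names Predicate, ReliesOnComparability, Greater and relies_on_comparability come from the write-up.

```python
from typing import Type


class Predicate:
    """Stand-in for the PR's Predicate base class."""


class ReliesOnComparability:
    """Stand-in marker mixin."""


class Greater(ReliesOnComparability, Predicate):
    pass


class Empty(Predicate):
    pass


def relies_on_comparability(predicate: Type[Predicate]) -> bool:
    # Operates on the class itself, not on an instance.
    return issubclass(predicate, ReliesOnComparability)


print(relies_on_comparability(Greater), relies_on_comparability(Empty))
```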

`dataclasses` and `typing`

As Kriti suggested, I'll start a dedicated discussion on the use of dataclasses and typing.

But, to summarize:

dataclasses eliminated custom boilerplate; I got immutability, __post_init__ and defaults without writing a single constructor; I think this is great since it's essentially standardized boilerplate;
typing got me partial typing, which is great:
- in combination with my LSP server/client (pyright): catches a lot of mistakes and conflicts
- in that I can annotate when I think it's useful and not when I don't (partial typing)
- for readability: for example, the above function is_ma_type_supported_by_predicate operates on uninstantiated predicate classes, not instances, so to express that I can just say predicate: Type[Predicate] instead of predicate_subclass
- doesn't have a cost; another developer can just omit the type or write Any if they prefer

dmos62 added 9 commits November 22, 2021 16:03

Intro dataclass hierarchy for modeling filtering

f714909

Also includes implementation for converting to SA filter spec.

Rename primitive to leaf and operator to branch

6bdb935

Rename subject to branch

90710c3

Introduce BranchType and ParameterType

b51cf6b

Refactor to not use mixins

1819c0b

Apparently dataclass defaults don't carry over from mixins.

Implement remaining predicates

639c14b

Revert to using mixins

5025f87

I was wrong. Apparently I just got the order of mixin application wrong.

Ammend fauxStatic comment

906cbd1

Rename fauxStatic to static

efd3dc2

There's nothing faux about it.

kgodey added the status: draft label Nov 23, 2021

dmos62 added 2 commits November 24, 2021 19:51

Rename parameterType to Count; other minor changes

aa9239e

Implement exposing filtering options through REST API

828814a

dmos62 added 8 commits November 25, 2021 14:32

Dead code

58a9432

switch to snake_case

7705bae

Make name more specific

7379d8c

Impl. parsing our custom filter spec to Predicate

ee1f01a

Rename Leaf.field to Leaf.column

c18e0fe

Move serialization and deserialization routines

74c6626

Rename function to something more specific

78bf19b

Add basic serialization test

311456e

Implement Predicate parameter constraints

201837f

kgodey reviewed Nov 26, 2021

Change filter api nomenclature

54e143a

Intro. rudimentary test that duplicate_only param is being routed pro…

f5f8edc

…perly

dmos62 added the status: draft label Dec 9, 2021

dmos62 mentioned this pull request Dec 10, 2021

Backend text filtering #879

Closed

7 tasks

kgodey marked this pull request as draft December 10, 2021 22:13

kgodey reviewed Dec 10, 2021

db/filters/base.py

kgodey mentioned this pull request Dec 12, 2021

Range grouping 1 #862

Merged

7 tasks

dmos62 added 4 commits December 18, 2021 11:55

Factor out sqlalchemy-filters

7033be3

Backport error catching from newer PRs

f8ac768

Improve exception message

f84dbab

Implement referenced column existance check

ed7998d

kgodey modified the milestones: [06] Working with Tables, [07] Initial Data Types Dec 19, 2021

dmos62 added 2 commits December 20, 2021 11:46

Remove redundant Predicate mixins

405a6ea

Quick clean up

80055ab

dmos62 added 3 commits December 20, 2021 18:27

Merge remote-tracking branch 'origin/range_grouping' into backend-fil…

8f05a9f

…tering-numbers

Fix circular dependency caused by dead import

b2e78e8

Dead imports

2f0b63c

mathemancer reviewed Dec 20, 2021

db/filters/operations/deserialize.py

db/filters/base.py

db/filters/base.py

db/filters/base.py

db/filters/base.py

kgodey reviewed Dec 21, 2021

mathesar/api/serializers/records.py

kgodey reviewed Dec 21, 2021

mathesar/api/serializers/records.py

dmos62 changed the base branch from master to range_grouping December 21, 2021 08:01

mathemancer deleted the branch mathesar-foundation:range_grouping December 22, 2021 11:03

mathemancer closed this Dec 22, 2021

dmos62 mentioned this pull request Dec 22, 2021

Backend filtering refactor #921

Closed

7 tasks

dmos62 removed this from the [07] Initial Data Types milestone Dec 22, 2021
