Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactoring Query objects #62

Closed
eugene-eeo opened this issue Jun 5, 2015 · 10 comments
Labels
Milestone

Comments

@eugene-eeo
Copy link
Contributor

@eugene-eeo eugene-eeo commented Jun 5, 2015

This is a proposal for a much simpler Query model that can be used, which essentially enables us to do the following, even without any advanced metaclass or inheritance magic:

User = Query()
db.insert({'metadata': {'id': 1}})
db.find(User.metadata.id == 1)

Which is very SQLAlchemy-like and what one would expect, and I would say even more intuitive than the current style/model. The following is a simple implementation that only handles the very basic:

import operator

class Query(object):
    def __init__(self, function=None):
        self.function = function or (lambda k: k)

    def __getattr__(self, attr):
        return Query(operator.itemgetter(attr))

    def __eq__(self, value):
        return Query(lambda k: self.validate(k) == value)

    def validate(self, value):
        return self.function(value)

The advantages over the current style is perhaps that this is much more intuitive and simple. We don't need to have a new class for every special case. It is also simpler to generate the intermediate query expressions so that they can be cached, instead of the rather primitive or hacky string representations that we are having at the moment, which is error prone and needs us to worry about Unicode etc.

@msiemens

This comment has been minimized.

Copy link
Owner

@msiemens msiemens commented Jun 5, 2015

Having a single class which handles all the query syntax would definitely look better than the current implementation. However I'm still not sure how the __hash__ method would look like. Have you already some ideas on that?

@eugene-eeo

This comment has been minimized.

Copy link
Contributor Author

@eugene-eeo eugene-eeo commented Jun 6, 2015

Probably via the generation of some namedtuples, e.g.

BinExpr = namedtuple('BinExpr', 'operator,a,b')
User.id == 1 # => BinExpr('==', 'id', 1)
(User.id == 1) & (User.age >= 10) #=> BinExpr('and', BinExpr('==', 'id', 1), BinExpr('>=', 'age', 10))

Advantage over strings is once again, that it is less fragile and can lend itself well to really nested queries.

Using this kind of generation may provide some base for some really nice optimisations via meta-compilation (which is essentially compiling a single function which handles the query without having to resort to multiple function calls, i.e., does the "big picture") in the future.

@msiemens

This comment has been minimized.

Copy link
Owner

@msiemens msiemens commented Jun 6, 2015

That's a great idea! The big question now is whether we should continue to support the current syntax (where('User').has('metadata).has('id') == 1). I'd propably prefer to see it gone, but I'm not sure how others feel about that.

@eugene-eeo

This comment has been minimized.

Copy link
Contributor Author

@eugene-eeo eugene-eeo commented Jun 6, 2015

Make deprecation warnings first, then remove it in a 3.x release. Although I'd imagine that you probably need to do some serious counselling persuading in order for the new syntax to "stick". In the meantime we would probably need to migrate the current caching system to the new tuple-based one.

@danielmartins

This comment has been minimized.

Copy link

@danielmartins danielmartins commented Aug 12, 2015

👍

@msiemens msiemens added this to the v3.0 milestone Aug 17, 2015
msiemens added a commit that referenced this issue Sep 17, 2015
@msiemens

This comment has been minimized.

Copy link
Owner

@msiemens msiemens commented Sep 17, 2015

I've now finally found time to work on the new Query model. An implementation can be found in a211459. Notable changes include:

  • queries.py is now only 190 lines of code (before: around 620 lines). We only declare 2 classes (before: 8). Also the whole implemenation is more straightforward. A clear win!
  • where('...').contains('...') has been renamed to where('...').search('...') to indicate that it works on regexes
  • where('foo') is an alias for Query().foo
  • where('foo').has('bar') is replaced by either where('foo').bar or Query().foo.bar
    • In case the key is not a valid Python identifier, array notation can be used: where('a.b.c') is now Query()['a.b.c']
  • Checking for the existence of a key has to be done explicitely: where('foo').exists()

While using Query() may seem odd at first, it's quite elegant when used as a variable like this:

User = Query()
db.find(User.name == 'John Doe')

What do you guys think about the implementation?

@msiemens msiemens referenced this issue Sep 17, 2015
Merged
9 of 9 tasks complete
@eugene-eeo

This comment has been minimized.

Copy link
Contributor Author

@eugene-eeo eugene-eeo commented Sep 18, 2015

According to the hashval attribute, the object being compared to has to be hashable. I would much prefer it if some nodes were generated instead of just hashing the right value.

@msiemens

This comment has been minimized.

Copy link
Owner

@msiemens msiemens commented Sep 18, 2015

@eugene-eeo Can you explain in what way nodes should be generated?

@eugene-eeo

This comment has been minimized.

Copy link
Contributor Author

@eugene-eeo eugene-eeo commented Sep 18, 2015

Probably via the generation of some namedtuples, e.g.

 BinExpr = namedtuple('BinExpr', 'operator,a,b')
 User.id == 1 # => BinExpr('==', 'id', 1)
 (User.id == 1) & (User.age >= 10) #=> BinExpr('and', BinExpr('==', 'id', 1), BinExpr('>=', 'age', 10))

Advantage over strings is once again, that it is less fragile and can lend itself well to really nested queries.

Using this kind of generation may provide some base for some really nice optimisations via meta-compilation (which is essentially compiling a single function which handles the query without having to resort to multiple function calls, i.e., does the "big picture") in the future.

@msiemens

This comment has been minimized.

Copy link
Owner

@msiemens msiemens commented Sep 18, 2015

That's what I'm doing now. Only I'm using normal tuples instead of namedtuples.

@msiemens msiemens closed this in #69 Nov 13, 2015
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
3 participants
You can’t perform that action at this time.