-
Notifications
You must be signed in to change notification settings - Fork 7
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
initial commit of MongoDB backend #2
Conversation
f2cd190
to
d154f74
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Approving not to hold up merge.
Will retroactively ask more questions as things progress!
Great work thus far!
if attr in ["connection", "database"]: | ||
assert not self.connected | ||
self._connect() | ||
return getattr(self, attr) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why would we want connection
or database
raise an attribute error when the class has established a connection?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you may have read it wrong. AttributeError is raised for any attribute besides connection and database. The purpose of this code is the connect to the database whenever connection
or database
are first accessed. This implementation is copied from Django MongoDB Engine and is different from other SQL backends. It may need to be reworked in the future or there might be good reasons for keeping it this way.
if not any(k in ["save", "delete", "update"] for k in self.operation_flags): | ||
# Flags apply to all operations. | ||
flags = self.operation_flags | ||
self.operation_flags = {"save": flags, "delete": flags, "update": flags} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
iiuc, this will completely override ignore any flags for operations outside of save
, delete
, update
. If these are the only three options available, that's fine, but if there are other operations that get flags and need to be set, is it okay to do that here? For instance is {"OPERATIONS": {'read': flags}} was set, this would be ignored.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This code was copied uncritically from Django MongoDB Engine. See documentation: https://django-mongodb-engine.readthedocs.io/en/latest/reference/settings.html#acknowledged-operations. If it looks deficient, we can omit it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TL;DR:
- We'll leave it for now, but can you leave a comment under the function to re-evaluate? Let it be a bonafide JIRA ticket for ease of tracking.
Got it. I don't know the best strategy here yet, so we'll pencil it for now. I think propagating the same flags to each write operation is fine, it's really the fact that we can propagate other flags and would want to keep them. The docs you linked (and thank you for linking them) show me that this was really meant for just save/delete/update operations at first.
django_mongodb/base.py
Outdated
"document_class": dict, | ||
"tz_aware": False, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"document_class": dict, | |
"tz_aware": False, |
These are the default connection options, so I'd say unless the options
provide changes we can remove this part.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will remove.
if user and password and not self.database.authenticate(user, password): | ||
raise ImproperlyConfigured("Invalid username or password.") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
NIT: Since mongodb usually takes a URI and aforementioned URI generally has the user & password already provided, I'm wondering if having these pieces of information are even worthwhile because any secure cloud connection needs the username and password already in the URI.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay. This code is copied from Django MongoDB Engine. We can discuss it more later. Django doesn't normally accept a URL for its configuration, but there's a third-party package to allow it: https://pypi.org/project/dj-database-url/
def cursor(self): | ||
return Cursor() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Keeping this comment here just as a note for later.
Need to understand the differences between a Django Cursor and a MongoDB Cursor.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This was a quick hack to prevent the need to override some Django internals where a "nodb" cursor is created but (for MongoDB purposes) not used. It may or may not stay around long-term.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm trying to find more information on cursors in django, any good documentation reference?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's up to the database driver (psycopg, mysqlclient, etc.) to implement, but the interface is described in PEP 249.
44e490b
to
2be3bc0
Compare
|
||
def __getattr__(self, attr): | ||
if attr in ["connection", "database"]: | ||
assert not self.connected |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not getting the point. I think it should be 'if' instead of 'assert'. Why should it fail if I call 'self.connection' twice?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
__getattr__()
is only called the first time when the attribute doesn't exist.
from .query import MongoQuery, safe_call | ||
|
||
|
||
class SQLCompiler(compiler.SQLCompiler): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think It would be good to rename this class, I know that some of the methods are public interface so we can't override, execute_sql is one of the thing that We cannot rename, can we?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Jib made the same comment shortly after you. Here is my reply in case you missed that thread: These names are names are set by convention (hardcoded) in Django's code, e.g. https://github.com/django/django/blob/85c154da2f07485a1cdc4d886eee4c1a1ef56137/django/db/models/sql/query.py#L226
django_mongodb/compiler.py
Outdated
Queries requiring JOINs (many-to-many relations, abstract model bases, | ||
or model spanning filtering), using DISTINCT (QuerySet.distinct()) or | ||
using the SQL-specific QuerySet.extra() aren't supported. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should make sure there's explicit documentation on the queries we will NOT be supporting. We do have an equivalent for JOIN behavior, but I think that should be deferred to later development.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added a few things to the README's" Known issues and limitations" section. In time it'll become clearer what we can and cannot support.
from django.utils.translation import gettext_lazy as _ | ||
|
||
|
||
class MongoAutoField(AutoField): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So is this automatically generating an ObjectId? Or is it remaining empty UNTIL the database write occurs and an ObjectId is returned?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The latter
return getattr(collection, method)(criteria, update_spec, **options).matched_count | ||
|
||
|
||
class SQLAggregateCompiler(SQLCompiler): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It does not support annotations ($group
or group by
in SQL), does it?
If It so, are we going to support it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we'll try to support at least some simple aggregations, but I think it'll be complicated. I haven't looked into the details yet.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, aggregate operations are pretty interesting and need a full design evaluation. I'd say we defer this and focus on the aforementioned compilers for now. Leaving this class makes sense.
@classmethod | ||
def settings_to_cmd_args_env(cls, settings_dict, parameters): | ||
raise NotImplementedError |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's raise an issue for this ticket as well. Not sure to what extent this will get used, but I do see this as the proxy for making any command line calls.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Last section to look at is query.py but I've left more comments. Let me know what you think. Great work from what I've seen!
def execute_sql( | ||
self, result_type=MULTI, chunked_fetch=False, chunk_size=GET_ITERATOR_CHUNK_SIZE | ||
): | ||
self.pre_sql_setup() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Read into this step. This preps the query to have:
SELECT
WHERE
HAVING
QUALIFY
ORDERBY
GROUPBY
and order it if available.
I think this function could be more useful down the line as of right now, I don't think it contributes to any functionality.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Django's SQL compilers have a function with the same name that each compiler calls. This backend overrides it and each custom compiler similarly calls it.
django_mongodb/compiler.py
Outdated
converters = self.connection.ops.get_db_converters(col) + field.get_db_converters( | ||
self.connection | ||
) | ||
for converter in converters: | ||
value = converter(value, col, self.connection) | ||
result.append(value) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do you know if these get_db_converter
calls are mutually exclusive? Or is it that it should always convert field.get_db_converters
as an override to the connection.ops.get_db_converters
.
Noting to see if there would be a performance win by making this an if statement that calls convert if one OR the other is present rather than a for loop.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It was on my mind to optimize this a bit more. Take a look at commit "fetch database converters once per query rather than for each result." To answer your question, there could be both types of converters. The logic comes from django.db.models.sql.SQLCompiler.get_converters()
.
django_mongodb/compiler.py
Outdated
if len(doc) == 1: | ||
# insert with empty model | ||
doc.clear() | ||
else: | ||
raise DatabaseError("Can't save entity with _id set to None") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In this, the case where someone does:
{_id: None}
will upload to a collection as {}
--> {_id: ObjectId}
Should we not just raise the DatabaseError
regardless?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll also note that _id
doesn't have to be the exclusive primary key in every collection, but that's a fix to put in later. This should probably have some class specific assignment such as self.pk or self.index
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Comment one: This code seems to have been obsoleted since its introduction. The test now passes without it.
Comment two: Django doesn't support composite primary keys.
django_mongodb/compiler.py
Outdated
if returning_fields: | ||
return [collection.insert_one(doc, **options).inserted_id] | ||
|
||
collection.insert_one(doc, **options) | ||
return [] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if returning_fields: | |
return [collection.insert_one(doc, **options).inserted_id] | |
collection.insert_one(doc, **options) | |
return [] | |
inserted_id = collection.insert_one(doc, **options).inserted_id | |
return [inserted_id] if returning_fields else [] |
return getattr(collection, method)(criteria, update_spec, **options).matched_count | ||
|
||
|
||
class SQLAggregateCompiler(SQLCompiler): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, aggregate operations are pretty interesting and need a full design evaluation. I'd say we defer this and focus on the aforementioned compilers for now. Leaving this class makes sense.
if field.unique: | ||
multi = False |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Outside of this, is there any way to disambiguate updating a unique field? I would have done so by specifying the id, but I can't tell where I'd put the find vs. the update part in the build_query
step. I'll come back to this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks to me like something like Model.objects.filter(id=1).update(name="foo")
will use update_many
. We could certainly try to take the filtering fields into account but is the complication worth it?
Actually, I'm not certain this current logic that uses field.unique
is appropriate.
Under the current logic, if name
is unique, then Model.objects.update(name="foo")
will update just one (random) document. In SQL-land that QuerySet tries to update all rows but fails and raises IntegrityError
due to violation of the unique constraint. Created #14 for this bug.
django_mongodb/operations.py
Outdated
return value | ||
if lookup in ("in", "range", "year"): | ||
return [ | ||
self._prep_lookup_value(subvalue, field, field_kind, db_type, lookup) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would actually expect this to call a mixture of self.prep_lookup_value
rather than self._prep_lookup_value
just to capture all the possible field-level conversions
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
DatabaseOperations.prep_lookup_value()
is called with value
as a single value or a list of values. _prep_lookup_value()
processes each value.
prep_lookup_value()
isn't a standard Django method. It's what's left of Mongo Engine's DatabaseOperations._value_from_db(). I started out copying the entire thing but later found much of it was unnecessary. It may be expanded or obsoleted as we continue to triage the Django test suite.
django_mongodb/query.py
Outdated
def safe_call(func): | ||
@wraps(func) | ||
def wrapper(*args, **kwargs): | ||
try: | ||
return func(*args, **kwargs) | ||
except DuplicateKeyError as e: | ||
raise IntegrityError from e | ||
except PyMongoError as e: | ||
raise DatabaseError from e | ||
|
||
return wrapper |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Curious why the naming of safe_call
is the purpose just to communicate that it's raising an error that won't crash the backend server?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The name is copied from the original Mongo Engine but it's not a good one. I will rename to wrap_database_errors
which is the same name used for a similar purpose in SQL backends.
"lte": lambda val: {"$lte": val}, | ||
"in": lambda val: {"$in": val}, | ||
"range": lambda val: {"$gte": val[0], "$lte": val[1]}, | ||
"isnull": lambda val: None if val else {"$ne": None}, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should this be {"$eq": None}
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As far as I tested, the queries {'field': None}
and {'field': {'$eq': None}}
give the same results. Is that your concern?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, I misread the output. Yes, that's fine.
So showing {"$ne": None}
probably occurs when the query evaluation is something like...
Models.object.filter(field__isnull=False)
Something to note:
I know in MongoDB the query for { bar: None }
Will surface
{
_id: 1234,
team: "bar"
},
{
_id: 1235,
team: "baz",
bar: None
}
in its results. In the relational world, this isn't an issue because all fields exist; this is doing the same thing, functionally, that its relational counterpart will do, but the field doesn't exist which has historically been a problem for some ODMs that expect the field. No changes needed, just wanted to note the logical behavior.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Last of the comments!
The last function that looks like it needs clarification is MongoQuery._decode_child
as it works with a lot of assumptions about the data structure coming through
django_mongodb/query.py
Outdated
if lookup_type in ("month", "day"): | ||
raise DatabaseError("MongoDB does not support month/day queries.") | ||
if self._negated and lookup_type == "range": | ||
raise DatabaseError("Negated range lookups are not supported.") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we have these be part of the lambda functions? The lambda result is raising the DatabaseError, etc?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
month/day is dead code. In modern versions of Django, an error is raised elsewhere for these lookups (and will be fixed in #9).
And I'm not sure why the restriction about range queries was added. Perhaps MongoDB has added support since 2011.
IntegerModel.objects.exclude(integer__range=(2, 4))
translates to {'integer': {'$not': {'$gte': 2, '$lte': 4}}}
which seems to work just fine.
if self._negated and lookup_type in NEGATED_OPERATORS_MAP: | ||
op_func = NEGATED_OPERATORS_MAP[lookup_type] | ||
already_negated = True | ||
else: | ||
op_func = OPERATORS_MAP[lookup_type] | ||
if self._negated: | ||
already_negated = False |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Checking here for understanding:
If we're negating an operation -- using NOT
-- this will check that the lookup_type is supported and then say we've run a negation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
correct
cursor.limit(int(self.query.high_mark - self.query.low_mark)) | ||
return cursor | ||
|
||
def add_filters(self, filters, query=None): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should cut a task to break up and simplify this function. It's far too lengthy and confusing, especially as we get into the negation logic. It isn't encapsulated well, so it gets hard to remember what was happening previously.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
created #13
django_mongodb/query.py
Outdated
if not isinstance(existing, dict): | ||
if not self._negated: | ||
# {'a': o1} + {'a': o2} --> {'a': {'$all': [o1, o2]}} | ||
assert not isinstance(lookup, dict) | ||
subquery[column] = {"$all": [existing, lookup]} | ||
else: | ||
# {'a': o1} + {'a': {'$not': o2}} --> {'a': {'$all': [o1], '$nin': [o2]}} | ||
if already_negated: | ||
assert lookup.keys() == ["$ne"] | ||
lookup = lookup["$ne"] | ||
assert not isinstance(lookup, dict) | ||
subquery[column] = {"$all": [existing], "$nin": [lookup]} | ||
else: | ||
not_ = existing.pop("$not", None) | ||
if not_: | ||
assert not existing | ||
if isinstance(lookup, dict): | ||
assert lookup.keys() == ["$ne"] | ||
lookup = lookup.values()[0] | ||
assert not isinstance(lookup, dict), (not_, lookup) | ||
if self._negated: | ||
# {'not': {'a': o1}} + {'a': {'not': o2}} --> {'a': {'nin': [o1, o2]}} | ||
subquery[column] = {"$nin": [not_, lookup]} | ||
else: | ||
# {'not': {'a': o1}} + {'a': o2} --> | ||
# {'a': {'nin': [o1], 'all': [o2]}} | ||
subquery[column] = {"$nin": [not_], "$all": [lookup]} | ||
else: | ||
if isinstance(lookup, dict): | ||
if "$ne" in lookup: | ||
if "$nin" in existing: | ||
# {'$nin': [o1, o2]} + {'$ne': o3} --> {'$nin': [o1, o2, o3]} | ||
assert "$ne" not in existing | ||
existing["$nin"].append(lookup["$ne"]) | ||
elif "$ne" in existing: | ||
# {'$ne': o1} + {'$ne': o2} --> {'$nin': [o1, o2]} | ||
existing["$nin"] = [existing.pop("$ne"), lookup["$ne"]] | ||
else: | ||
existing.update(lookup) | ||
else: | ||
if "$in" in lookup and "$in" in existing: | ||
# {'$in': o1} + {'$in': o2} --> {'$in': o1 union o2} | ||
existing["$in"] = list(set(lookup["$in"] + existing["$in"])) | ||
else: | ||
# {'$gt': o1} + {'$lt': o2} --> {'$gt': o1, '$lt': o2} | ||
assert all(key not in existing for key in lookup), [ | ||
lookup, | ||
existing, | ||
] | ||
existing.update(lookup) | ||
else: | ||
key = "$nin" if self._negated else "$all" | ||
existing.setdefault(key, []).append(lookup) | ||
|
||
if filters.connector == OR and subquery: | ||
or_conditions.append(subquery) | ||
|
||
if filters.negated: | ||
self._negated = not self._negated |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall, I think the logic here works, but I think we should heavily consider refactoring this for readability.
- Placing it in its own function with a clear statement of function
- Adding docstring to explain what it's doing (combining queries and handling negation logic)
django_mongodb/query.py
Outdated
# Remove percent signs added by Field.get_db_prep_lookup() for LIKE | ||
# queries. | ||
if lookup_type in ("startswith", "istartswith"): | ||
value = value[:-1] | ||
elif lookup_type in ("endswith", "iendswith"): | ||
value = value[1:] | ||
elif lookup_type in ("contains", "icontains"): | ||
value = value[1:-1] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should probably still check if there's a percent sign there. That or could we subclass Field
to remove this layer?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
While it wouldn't be much extra computation to be defensive, I can't imagine a way it could break, especially in a way that wouldn't be caught by the test suite.
The comment is actually out of date. The percent signs are now added by PatternLookup.process_rhs(). I don't think we can monkey patch that without breaking usage of this backend alongside a SQL backend (and the same comment applies if the logic was in Field.get_db_prep_lookup()
).
69e5f46
to
8ce323c
Compare
🎉 thanks @timgraham! |
No description provided.