initial commit of MongoDB backend #2

timgraham · 2024-04-22T13:27:54Z

No description provided.

Jibola

Approving not to hold up merge.
Will retroactively ask more questions as things progress!

Great work thus far!

Jibola · 2024-04-26T16:53:44Z

django_mongodb/base.py

+        if attr in ["connection", "database"]:
+            assert not self.connected
+            self._connect()
+            return getattr(self, attr)


Why would we want connection or database raise an attribute error when the class has established a connection?

I think you may have read it wrong. AttributeError is raised for any attribute besides connection and database. The purpose of this code is the connect to the database whenever connection or database are first accessed. This implementation is copied from Django MongoDB Engine and is different from other SQL backends. It may need to be reworked in the future or there might be good reasons for keeping it this way.

Jibola · 2024-04-26T17:13:24Z

django_mongodb/base.py

+        if not any(k in ["save", "delete", "update"] for k in self.operation_flags):
+            # Flags apply to all operations.
+            flags = self.operation_flags
+            self.operation_flags = {"save": flags, "delete": flags, "update": flags}


iiuc, this will completely override ignore any flags for operations outside of save, delete, update. If these are the only three options available, that's fine, but if there are other operations that get flags and need to be set, is it okay to do that here? For instance is {"OPERATIONS": {'read': flags}} was set, this would be ignored.

This code was copied uncritically from Django MongoDB Engine. See documentation: https://django-mongodb-engine.readthedocs.io/en/latest/reference/settings.html#acknowledged-operations. If it looks deficient, we can omit it.

TL;DR:

We'll leave it for now, but can you leave a comment under the function to re-evaluate? Let it be a bonafide JIRA ticket for ease of tracking.

Got it. I don't know the best strategy here yet, so we'll pencil it for now. I think propagating the same flags to each write operation is fine, it's really the fact that we can propagate other flags and would want to keep them. The docs you linked (and thank you for linking them) show me that this was really meant for just save/delete/update operations at first.

Jibola · 2024-04-26T17:17:09Z

django_mongodb/base.py

+            "document_class": dict,
+            "tz_aware": False,


Suggested change

"document_class": dict,

"tz_aware": False,

These are the default connection options, so I'd say unless the options provide changes we can remove this part.

Will remove.

Jibola · 2024-04-26T17:19:50Z

django_mongodb/base.py

+        if user and password and not self.database.authenticate(user, password):
+            raise ImproperlyConfigured("Invalid username or password.")


NIT: Since mongodb usually takes a URI and aforementioned URI generally has the user & password already provided, I'm wondering if having these pieces of information are even worthwhile because any secure cloud connection needs the username and password already in the URI.

Okay. This code is copied from Django MongoDB Engine. We can discuss it more later. Django doesn't normally accept a URL for its configuration, but there's a third-party package to allow it: https://pypi.org/project/dj-database-url/

Jibola · 2024-04-26T17:22:15Z

django_mongodb/base.py

+    def cursor(self):
+        return Cursor()


Keeping this comment here just as a note for later.
Need to understand the differences between a Django Cursor and a MongoDB Cursor.

This was a quick hack to prevent the need to override some Django internals where a "nodb" cursor is created but (for MongoDB purposes) not used. It may or may not stay around long-term.

I'm trying to find more information on cursors in django, any good documentation reference?

It's up to the database driver (psycopg, mysqlclient, etc.) to implement, but the interface is described in PEP 249.

WaVEV · 2024-04-28T21:10:10Z

django_mongodb/base.py

+
+    def __getattr__(self, attr):
+        if attr in ["connection", "database"]:
+            assert not self.connected


I am not getting the point. I think it should be 'if' instead of 'assert'. Why should it fail if I call 'self.connection' twice?

__getattr__() is only called the first time when the attribute doesn't exist.

WaVEV · 2024-04-29T16:45:19Z

django_mongodb/compiler.py

+from .query import MongoQuery, safe_call
+
+
+class SQLCompiler(compiler.SQLCompiler):


I think It would be good to rename this class, I know that some of the methods are public interface so we can't override, execute_sql is one of the thing that We cannot rename, can we?

Jib made the same comment shortly after you. Here is my reply in case you missed that thread: These names are names are set by convention (hardcoded) in Django's code, e.g. https://github.com/django/django/blob/85c154da2f07485a1cdc4d886eee4c1a1ef56137/django/db/models/sql/query.py#L226

django_mongodb/compiler.py

Jibola · 2024-04-29T16:07:26Z

django_mongodb/compiler.py

+        Queries requiring JOINs (many-to-many relations, abstract model bases,
+        or model spanning filtering), using DISTINCT (QuerySet.distinct()) or
+        using the SQL-specific QuerySet.extra() aren't supported.


We should make sure there's explicit documentation on the queries we will NOT be supporting. We do have an equivalent for JOIN behavior, but I think that should be deferred to later development.

I added a few things to the README's" Known issues and limitations" section. In time it'll become clearer what we can and cannot support.

Jibola · 2024-04-29T16:28:31Z

django_mongodb/fields.py

+from django.utils.translation import gettext_lazy as _
+
+
+class MongoAutoField(AutoField):


So is this automatically generating an ObjectId? Or is it remaining empty UNTIL the database write occurs and an ObjectId is returned?

django_mongodb/features.py

WaVEV · 2024-04-29T17:28:40Z

django_mongodb/compiler.py

+        return getattr(collection, method)(criteria, update_spec, **options).matched_count
+
+
+class SQLAggregateCompiler(SQLCompiler):


It does not support annotations ($group or group by in SQL), does it?
If It so, are we going to support it?

I think we'll try to support at least some simple aggregations, but I think it'll be complicated. I haven't looked into the details yet.

Yeah, aggregate operations are pretty interesting and need a full design evaluation. I'd say we defer this and focus on the aforementioned compilers for now. Leaving this class makes sense.

Jibola · 2024-04-30T18:45:28Z

django_mongodb/client.py

+    @classmethod
+    def settings_to_cmd_args_env(cls, settings_dict, parameters):
+        raise NotImplementedError


Let's raise an issue for this ticket as well. Not sure to what extent this will get used, but I do see this as the proxy for making any command line calls.

It is #3. During our last call, I suggested this as a good starter task for @WaVEV.

Jibola

Last section to look at is query.py but I've left more comments. Let me know what you think. Great work from what I've seen!

Jibola · 2024-05-01T13:34:09Z

django_mongodb/compiler.py

+    def execute_sql(
+        self, result_type=MULTI, chunked_fetch=False, chunk_size=GET_ITERATOR_CHUNK_SIZE
+    ):
+        self.pre_sql_setup()


Read into this step. This preps the query to have:
SELECT
WHERE
HAVING
QUALIFY
ORDERBY
GROUPBY

and order it if available.

I think this function could be more useful down the line as of right now, I don't think it contributes to any functionality.

Django's SQL compilers have a function with the same name that each compiler calls. This backend overrides it and each custom compiler similarly calls it.

Jibola · 2024-05-01T14:12:41Z

django_mongodb/compiler.py

+                converters = self.connection.ops.get_db_converters(col) + field.get_db_converters(
+                    self.connection
+                )
+                for converter in converters:
+                    value = converter(value, col, self.connection)
+            result.append(value)


do you know if these get_db_converter calls are mutually exclusive? Or is it that it should always convert field.get_db_converters as an override to the connection.ops.get_db_converters.

Noting to see if there would be a performance win by making this an if statement that calls convert if one OR the other is present rather than a for loop.

It was on my mind to optimize this a bit more. Take a look at commit "fetch database converters once per query rather than for each result." To answer your question, there could be both types of converters. The logic comes from django.db.models.sql.SQLCompiler.get_converters().

Jibola · 2024-05-01T14:42:46Z

django_mongodb/compiler.py

+                if len(doc) == 1:
+                    # insert with empty model
+                    doc.clear()
+                else:
+                    raise DatabaseError("Can't save entity with _id set to None")


In this, the case where someone does:
{_id: None} will upload to a collection as {} --> {_id: ObjectId}

Should we not just raise the DatabaseError regardless?

I'll also note that _id doesn't have to be the exclusive primary key in every collection, but that's a fix to put in later. This should probably have some class specific assignment such as self.pk or self.index

Comment one: This code seems to have been obsoleted since its introduction. The test now passes without it.

Comment two: Django doesn't support composite primary keys.

Jibola · 2024-05-01T14:48:32Z

django_mongodb/compiler.py

+        if returning_fields:
+            return [collection.insert_one(doc, **options).inserted_id]
+
+        collection.insert_one(doc, **options)
+        return []


Suggested change

if returning_fields:

return [collection.insert_one(doc, **options).inserted_id]

collection.insert_one(doc, **options)

return []

inserted_id = collection.insert_one(doc, **options).inserted_id

return [inserted_id] if returning_fields else []

Jibola · 2024-05-01T15:00:07Z

django_mongodb/compiler.py

+        return getattr(collection, method)(criteria, update_spec, **options).matched_count
+
+
+class SQLAggregateCompiler(SQLCompiler):


Yeah, aggregate operations are pretty interesting and need a full design evaluation. I'd say we defer this and focus on the aforementioned compilers for now. Leaving this class makes sense.

Jibola · 2024-05-01T15:02:13Z

django_mongodb/compiler.py

+            if field.unique:
+                multi = False


Outside of this, is there any way to disambiguate updating a unique field? I would have done so by specifying the id, but I can't tell where I'd put the find vs. the update part in the build_query step. I'll come back to this.

It looks to me like something like Model.objects.filter(id=1).update(name="foo") will use update_many. We could certainly try to take the filtering fields into account but is the complication worth it?

Actually, I'm not certain this current logic that uses field.unique is appropriate.
Under the current logic, if name is unique, then Model.objects.update(name="foo") will update just one (random) document. In SQL-land that QuerySet tries to update all rows but fails and raises IntegrityError due to violation of the unique constraint. Created #14 for this bug.

Jibola · 2024-05-01T18:00:06Z

django_mongodb/operations.py

+            return value
+        if lookup in ("in", "range", "year"):
+            return [
+                self._prep_lookup_value(subvalue, field, field_kind, db_type, lookup)


I would actually expect this to call a mixture of self.prep_lookup_value rather than self._prep_lookup_value just to capture all the possible field-level conversions

DatabaseOperations.prep_lookup_value() is called with value as a single value or a list of values. _prep_lookup_value() processes each value.

prep_lookup_value() isn't a standard Django method. It's what's left of Mongo Engine's DatabaseOperations._value_from_db(). I started out copying the entire thing but later found much of it was unnecessary. It may be expanded or obsoleted as we continue to triage the Django test suite.

Jibola · 2024-05-01T18:43:37Z

django_mongodb/query.py

+def safe_call(func):
+    @wraps(func)
+    def wrapper(*args, **kwargs):
+        try:
+            return func(*args, **kwargs)
+        except DuplicateKeyError as e:
+            raise IntegrityError from e
+        except PyMongoError as e:
+            raise DatabaseError from e
+
+    return wrapper


Curious why the naming of safe_call is the purpose just to communicate that it's raising an error that won't crash the backend server?

The name is copied from the original Mongo Engine but it's not a good one. I will rename to wrap_database_errors which is the same name used for a similar purpose in SQL backends.

Jibola · 2024-05-01T19:28:06Z

django_mongodb/query.py

+    "lte": lambda val: {"$lte": val},
+    "in": lambda val: {"$in": val},
+    "range": lambda val: {"$gte": val[0], "$lte": val[1]},
+    "isnull": lambda val: None if val else {"$ne": None},


should this be {"$eq": None}?

As far as I tested, the queries {'field': None} and {'field': {'$eq': None}} give the same results. Is that your concern?

Oh, I misread the output. Yes, that's fine.

So showing {"$ne": None} probably occurs when the query evaluation is something like...
Models.object.filter(field__isnull=False)

Something to note:
I know in MongoDB the query for { bar: None }
Will surface

{ _id: 1234, team: "bar" }, { _id: 1235, team: "baz", bar: None }

in its results. In the relational world, this isn't an issue because all fields exist; this is doing the same thing, functionally, that its relational counterpart will do, but the field doesn't exist which has historically been a problem for some ODMs that expect the field. No changes needed, just wanted to note the logical behavior.

Jibola

Last of the comments!

The last function that looks like it needs clarification is MongoQuery._decode_child as it works with a lot of assumptions about the data structure coming through

Jibola · 2024-05-02T17:43:31Z

django_mongodb/query.py

+            if lookup_type in ("month", "day"):
+                raise DatabaseError("MongoDB does not support month/day queries.")
+            if self._negated and lookup_type == "range":
+                raise DatabaseError("Negated range lookups are not supported.")


Could we have these be part of the lambda functions? The lambda result is raising the DatabaseError, etc?

month/day is dead code. In modern versions of Django, an error is raised elsewhere for these lookups (and will be fixed in #9).

And I'm not sure why the restriction about range queries was added. Perhaps MongoDB has added support since 2011.
IntegerModel.objects.exclude(integer__range=(2, 4)) translates to {'integer': {'$not': {'$gte': 2, '$lte': 4}}} which seems to work just fine.

Jibola · 2024-05-02T17:50:58Z

django_mongodb/query.py

+            if self._negated and lookup_type in NEGATED_OPERATORS_MAP:
+                op_func = NEGATED_OPERATORS_MAP[lookup_type]
+                already_negated = True
+            else:
+                op_func = OPERATORS_MAP[lookup_type]
+                if self._negated:
+                    already_negated = False


Checking here for understanding:
If we're negating an operation -- using NOT-- this will check that the lookup_type is supported and then say we've run a negation.

Jibola · 2024-05-02T17:53:42Z

django_mongodb/query.py

+            cursor.limit(int(self.query.high_mark - self.query.low_mark))
+        return cursor
+
+    def add_filters(self, filters, query=None):


We should cut a task to break up and simplify this function. It's far too lengthy and confusing, especially as we get into the negation logic. It isn't encapsulated well, so it gets hard to remember what was happening previously.

created #13

Jibola · 2024-05-02T19:10:24Z

django_mongodb/query.py

+            if not isinstance(existing, dict):
+                if not self._negated:
+                    # {'a': o1} + {'a': o2} --> {'a': {'$all': [o1, o2]}}
+                    assert not isinstance(lookup, dict)
+                    subquery[column] = {"$all": [existing, lookup]}
+                else:
+                    # {'a': o1} + {'a': {'$not': o2}} --> {'a': {'$all': [o1], '$nin': [o2]}}
+                    if already_negated:
+                        assert lookup.keys() == ["$ne"]
+                        lookup = lookup["$ne"]
+                    assert not isinstance(lookup, dict)
+                    subquery[column] = {"$all": [existing], "$nin": [lookup]}
+            else:
+                not_ = existing.pop("$not", None)
+                if not_:
+                    assert not existing
+                    if isinstance(lookup, dict):
+                        assert lookup.keys() == ["$ne"]
+                        lookup = lookup.values()[0]
+                    assert not isinstance(lookup, dict), (not_, lookup)
+                    if self._negated:
+                        # {'not': {'a': o1}} + {'a': {'not': o2}} --> {'a': {'nin': [o1, o2]}}
+                        subquery[column] = {"$nin": [not_, lookup]}
+                    else:
+                        # {'not': {'a': o1}} + {'a': o2} -->
+                        #     {'a': {'nin': [o1], 'all': [o2]}}
+                        subquery[column] = {"$nin": [not_], "$all": [lookup]}
+                else:
+                    if isinstance(lookup, dict):
+                        if "$ne" in lookup:
+                            if "$nin" in existing:
+                                # {'$nin': [o1, o2]} + {'$ne': o3} --> {'$nin': [o1, o2, o3]}
+                                assert "$ne" not in existing
+                                existing["$nin"].append(lookup["$ne"])
+                            elif "$ne" in existing:
+                                # {'$ne': o1} + {'$ne': o2} --> {'$nin': [o1, o2]}
+                                existing["$nin"] = [existing.pop("$ne"), lookup["$ne"]]
+                            else:
+                                existing.update(lookup)
+                        else:
+                            if "$in" in lookup and "$in" in existing:
+                                # {'$in': o1} + {'$in': o2} --> {'$in': o1 union o2}
+                                existing["$in"] = list(set(lookup["$in"] + existing["$in"]))
+                            else:
+                                # {'$gt': o1} + {'$lt': o2} --> {'$gt': o1, '$lt': o2}
+                                assert all(key not in existing for key in lookup), [
+                                    lookup,
+                                    existing,
+                                ]
+                                existing.update(lookup)
+                    else:
+                        key = "$nin" if self._negated else "$all"
+                        existing.setdefault(key, []).append(lookup)
+
+                if filters.connector == OR and subquery:
+                    or_conditions.append(subquery)
+
+        if filters.negated:
+            self._negated = not self._negated


Overall, I think the logic here works, but I think we should heavily consider refactoring this for readability.

Placing it in its own function with a clear statement of function

Adding docstring to explain what it's doing (combining queries and handling negation logic)

Jibola · 2024-05-02T19:14:44Z

django_mongodb/query.py

+        # Remove percent signs added by Field.get_db_prep_lookup() for LIKE
+        # queries.
+        if lookup_type in ("startswith", "istartswith"):
+            value = value[:-1]
+        elif lookup_type in ("endswith", "iendswith"):
+            value = value[1:]
+        elif lookup_type in ("contains", "icontains"):
+            value = value[1:-1]


This should probably still check if there's a percent sign there. That or could we subclass Field to remove this layer?

While it wouldn't be much extra computation to be defensive, I can't imagine a way it could break, especially in a way that wouldn't be caught by the test suite.

The comment is actually out of date. The percent signs are now added by PatternLookup.process_rhs(). I don't think we can monkey patch that without breaking usage of this backend alongside a SQL backend (and the same comment applies if the logic was in Field.get_db_prep_lookup()).

blink1073 · 2024-05-07T00:48:42Z

🎉 thanks @timgraham!

timgraham force-pushed the init branch from 36a9098 to dd0b6b1 Compare April 23, 2024 15:11

timgraham marked this pull request as ready for review April 23, 2024 16:28

timgraham force-pushed the init branch 2 times, most recently from f2cd190 to d154f74 Compare April 25, 2024 15:03

blink1073 mentioned this pull request Apr 26, 2024

Add THIRD-PARTY-NOTICES #5

Closed

Jibola approved these changes Apr 26, 2024

View reviewed changes

timgraham force-pushed the init branch 5 times, most recently from 44e490b to 2be3bc0 Compare April 28, 2024 01:34

WaVEV reviewed Apr 28, 2024

View reviewed changes

timgraham force-pushed the init branch from 2be3bc0 to 4d2e558 Compare April 29, 2024 13:08

WaVEV reviewed Apr 29, 2024

View reviewed changes

timgraham mentioned this pull request Apr 29, 2024

add support for write concern #6

Open

Jibola reviewed Apr 29, 2024

View reviewed changes

WaVEV reviewed Apr 29, 2024

View reviewed changes

timgraham force-pushed the init branch from 4d2e558 to e6bd6c2 Compare April 30, 2024 01:28

Jibola reviewed Apr 30, 2024

View reviewed changes

Jibola reviewed May 1, 2024

View reviewed changes

Jibola reviewed May 2, 2024

View reviewed changes

timgraham force-pushed the init branch 6 times, most recently from 69e5f46 to 8ce323c Compare May 7, 2024 00:17

timgraham added 3 commits May 6, 2024 20:22

initial commit of MongoDB backend

6a86eb1

fix escaping of special characters in text lookups

01a2ed1

fix QuerySet.update() with related model instance

2e0d21a

timgraham added 10 commits May 6, 2024 20:22

raise error for pattern lookups on UUIDField

35c0871

fix BinaryField encoding

34e6e64

fix overflow on IntegerField lookups

f99017c

add skips for all model_fields tests

28f2a34

made QuerySet.exists() return a boolean

50782dd

fix deferral of foreign key fields

e646e2b

add skips for all defer and defer regress tests

0b6ec2e

make DatabaseCreation._destroy_test_db() clear the test database

0106ad1

fetch database converters once per query rather than for each result

09da727

fix isnull lookup

28f52b1

timgraham force-pushed the init branch from 8ce323c to 28f52b1 Compare May 7, 2024 00:23

timgraham merged commit 28f52b1 into mongodb-labs:main May 7, 2024
3 checks passed

timgraham deleted the init branch May 7, 2024 00:36

		if user and password and not self.database.authenticate(user, password):
		raise ImproperlyConfigured("Invalid username or password.")

		from .query import MongoQuery, safe_call


		class SQLCompiler(compiler.SQLCompiler):

		from django.utils.translation import gettext_lazy as _


		class MongoAutoField(AutoField):

		return getattr(collection, method)(criteria, update_spec, **options).matched_count


		class SQLAggregateCompiler(SQLCompiler):

initial commit of MongoDB backend #2

initial commit of MongoDB backend #2

Conversation

timgraham commented Apr 22, 2024

Jibola left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

WaVEV Apr 29, 2024 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Jibola left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

timgraham May 5, 2024 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Jibola May 6, 2024 • edited Loading

Choose a reason for hiding this comment

Jibola left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

blink1073 commented May 7, 2024

WaVEV Apr 29, 2024 •

edited

Loading

timgraham May 5, 2024 •

edited

Loading

Jibola May 6, 2024 •

edited

Loading