DRIVERS-2917 - Standardized Performance Testing of ODMs and Integrations #366

Draft: wants to merge 15 commits into main

Conversation

@NoahStapp commented Aug 14, 2025

Purpose

This is the first draft of the Django implementation of the MongoDB ODM Benchmarking suite. It contains a small suite of benchmark tests designed to measure django-mongodb-backend performance across data sizes and structures for what we expect to be common user operations.

This is NOT intended to be a comprehensive test suite for every operation, only the most common and widely applicable. This is also not intended to be Django specific: each of these tests must be implementable across all of our ODMs, up to the features supported by each library.

Structure

The benchmarks are contained within a separate Django application within tests/performance/perftest. This application exists only to execute the benchmarking suite.

Running Locally

After starting a local MongoDB server, the tests can be run with python manage.py test from the tests/performance directory. The full suite is expected to take approximately 10 minutes when run locally. For faster benchmarks, pass FASTBENCH=1 as an environment variable to tests/performance/perftest/tests.py.

Review Items

  • Do the Django operations being performed conform to our expectations of how django-mongodb-backend should be used?
  • Are any operations missing that we expect to be frequently performed by users?
  • Is the scope and breadth of the benchmarks sufficient for coverage without being cumbersome to implement across potentially dozens of libraries, first and third-party?
  • Are the data sizes and model structures being used representative of real-world situations?

@aclark4life (Collaborator) commented Aug 18, 2025

In general, awesome!

Re:

Structure

The benchmarks are contained within a separate Django application within tests/performance/perftest. This application exists only to execute the benchmarking suite.

Can this be integrated with existing tests such that some established testing conventions are followed? For example,

  • tests/<testsuite>_ - contains test suites for various features
  • performance - Django project for perf testing that contains perf testing app?

IOW, if the performance directory "contained a test suite for testing performance", then we could follow the convention of adding the _ at the end so the Django tests could run the performance tests. Since performance appears to be a standalone Django project, I'm not sure what the best thing to do is (especially given the need to be able to run without the Django test runner). I wonder if a top-level perf_testing project that contains the perftest app may be better than including the perf tests in tests/. Or maybe a top-level perftest/perftest, project and app respectively. Also, not sure if it helps, but apps don't have to be in projects …

@NoahStapp (Author)

> (quoting @aclark4life's comment above)

Running the performance tests alongside the Django tests would add too much runtime to be practical. Putting the perftest project somewhere not in the tests directory is fine with me, but having every test suite be within tests makes more sense from a consistency standpoint.

Is there a way to independently run the perftest app's tests without a project? If we can do away with needing a full (if minimal) Django project here at all, that would be ideal, but I couldn't find a way to do so with how the other tests here are set up.

@timgraham (Collaborator)

I'm inclined to agree that the performance tests should be in their own directory, performance_tests or test_performance? They're not meant to be run like the other tests, so intermingling them seems confusing (and requires workarounds in the test runner, as you added).

Probably integrating with Django's runtests.py isn't the way to go, but I'll think about it. At least the settings file you added could be minimized. I could do that without much effort. I have a few comments from a cursory glance and will dig in more later.
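
For what it's worth, a minimal sketch of the kind of project-less setup being discussed, configuring settings in code and invoking the test runner directly; the engine string, database name, and app label below are assumptions, not this PR's actual configuration:

# run_perftest.py -- hypothetical standalone runner, not part of this PR
import django
from django.conf import settings

settings.configure(
    INSTALLED_APPS=["perftest"],  # assumes the benchmark app is importable as "perftest"
    DATABASES={
        "default": {
            "ENGINE": "django_mongodb_backend",  # assumed engine path for django-mongodb-backend
            "NAME": "perftest",  # assumed database name; a local MongoDB server is expected
        }
    },
)
django.setup()

from django.test.runner import DiscoverRunner

# Discover and run only the benchmark tests, exiting non-zero on failure.
failures = DiscoverRunner(verbosity=1).run_tests(["perftest"])
raise SystemExit(bool(failures))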

Comment on lines 191 to 199
class TestSmallFlatDocCreation(SmallFlatDocTest, TestCase):
def do_task(self):
for doc in self.documents:
model = SmallFlatModel(**doc)
model.save()

def after(self):
SmallFlatModel.objects.all().delete()

Collaborator:

I'm a little uncertain how this whole thing fits together. On the one hand, you inherit django.test.TestCase, which has its own setUp/tearDown strategy; on the other hand, you seem to implement your own scheme with an after() method. Some classes use after(), others tearDown()... is it intentional? (edit: perhaps answered by the benchmark spec.)

@NoahStapp (Author):

The ODM benchmark spec (draft PR here: mongodb/specifications#1828) is helpful here, but the short answer is yes: we need the ability to do setup or teardown both before and after every iteration of a test, as well as before and after the entire set of iterations.
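
To illustrate that shape: do_task() and after() match this PR, while the other hook names and the timing loop below are illustrative assumptions rather than the exact harness:

import time

class BenchmarkBase:
    """Sketch of a harness with hooks around the whole run and around each iteration."""

    iterations = 10

    def set_up(self):  # once, before any iteration runs
        pass

    def before(self):  # before each iteration
        pass

    def do_task(self):  # the timed work; subclasses override this
        raise NotImplementedError

    def after(self):  # after each iteration
        pass

    def tear_down(self):  # once, after the final iteration
        pass

    def run(self):
        self.set_up()
        timings = []
        for _ in range(self.iterations):
            self.before()
            start = time.monotonic()
            self.do_task()
            timings.append(time.monotonic() - start)
            self.after()
        self.tear_down()
        return timings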

class TestSmallFlatDocCreation(SmallFlatDocTest, TestCase):
def do_task(self):
for doc in self.documents:
model = SmallFlatModel(**doc)
Collaborator:

Typically for this pattern you would use Model.objects.create(), which does the same thing without the need to call save().

@NoahStapp (Author):

Would the typical Django user be aware of that pattern? It's important to align our benchmark code with how the average user would expect to use the backend.

@Jibola (Contributor) Aug 19, 2025:

Yep, that's a common pattern for users: calling Model.objects.create().
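
For illustration, a minimal sketch of the equivalence being discussed (the import path is an assumption):

from perftest.models import SmallFlatModel  # assumed import path

doc = {"field1": "abc", "field2": "def"}

# As written in the benchmark today: instantiate, then save().
model = SmallFlatModel(**doc)
model.save()

# The common shorthand: create() constructs and saves in one call.
model = SmallFlatModel.objects.create(**doc)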

Comment on lines 1 to 13
# Copyright 2025-present MongoDB, Inc.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
Collaborator:

This is the same license as the rest of the repo. Not sure if we're supposed to put this boilerplate on all our files?

@NoahStapp (Author):

@Jibola do we have a different policy for integrations such as this? PyMongo has this license in every single file.

@Jibola (Contributor) Aug 19, 2025:

Since this performance test is within the repo and there's no outside dependency, it's fine to remove it.

Comment on lines +1 to +12
{
"field1": "kj9$mxz#p2qw8r*vn4@h7c&u1s",
"field2": "x3@9zf#mk7w$qp2n8v*r6j4h&c1u5s0a",
"field3": "p9#m2x$k7z@w3q8v*n4r&j6h1c5u0s",
"field4": "z8@x3m#k9w$p2q7v*r4n6j&h1c5u0s",
"field5": "m7#k9x$z3w@p8q2v*n4r6j&h1c5u0s",
"field6": "k6$x9m#z7w3p@q8v2n*r4j6h&c1u5s0",
"field7": "x5@m9k#z6w$p3q7v*n2r8j&h4c1u0s",
"field8": "m4#x8k$z9w6p@q3v*n7r2j&h5c1u0s",
"field9": "k3$m7x#z8w9p@q6v*n2r4j&h1c5u0s",
"field10": "x2@k6m#z7w8p$q9v*n3r1j&h4c5u0s",
"field11": "m1#x5k$z6w7p@q8v*n9r2j&h3c4u0s",
Collaborator:

The "Django-ey" solution is to use fixtures, but I don't insist on it. It might be more trouble than it's worth, especially if Django's TestCase machinery isn't actually running (see other comment).

@NoahStapp (Author):

The intended workflow here is that the tests will clone the specifications repo and extract these datasets for each run of the suite. The datasets are used across all of the implementing ODMs, so using a Django-specific feature like a fixture seems out of scope here (despite looking nice and clean).
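
Roughly, the loading step then becomes plain JSON reading of the cloned spec data; the path and helper below are illustrative assumptions, not this PR's exact code:

import json
from pathlib import Path

# Assumed checkout location of the cloned specifications repo and its datasets;
# the real paths used by the suite may differ.
DATA_DIR = Path("specifications/source/benchmarking/data")

def load_dataset(name):
    """Read one benchmark dataset (a JSON document) from the cloned spec repo."""
    with open(DATA_DIR / f"{name}.json") as f:
        return json.load(f)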


"""Tests for the MongoDB ODM Performance Benchmark Spec.
See https://github.com/mongodb/specifications/blob/master/source/benchmarking/odm-benchmarking.md

Comment on lines 215 to 216
def after(self):
SmallFlatModel.objects.all().delete()
Collaborator:

I'm a little uncertain how this whole thing fits together. On the one hand, you inherit django.test.TestCase, which has its own setUp/tearDown strategy; on the other hand, you seem to implement your own scheme with an after() method. Reading the spec, it looks like some after() methods should be tearDown(). In this case, if you delete the objects after the first iteration, future iterations are actually creating new models rather than updating existing ones. Check your other tests too!

@NoahStapp (Author):

Good catch! Definitely a bug here. I'll have to modify the benchmark to ensure that each iteration of updated_value is unique so the database actually performs an update operation.
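
Something like the following sketch, where a per-iteration counter keeps the written value unique (the class, hook, and field names mirror this PR's style but are assumptions):

class TestSmallFlatDocUpdate(SmallFlatDocTest, TestCase):
    def setUp(self):
        super().setUp()
        self.iteration = 0

    def before(self):
        self.iteration += 1

    def do_task(self):
        # A unique value per iteration ensures MongoDB performs a real update
        # rather than a no-op write of the same data.
        updated_value = f"updated-{self.iteration}"
        SmallFlatModel.objects.all().update(field1=updated_value)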

Comment on lines 73 to 83
include_expansions_in_env:
- requester
- revision_order_id
- project_id
- version_id
- build_variant
- parsed_order_id
- task_name
- task_id
- execution
- is_mainline
Contributor:

Evergreen nit: for consistency, are these able to be an array?

Comment on lines +7 to +9
# Install django-mongodb-backend
/opt/python/3.10/bin/python3 -m venv venv
. venv/bin/activate
Contributor:

Nit: while I'd rather not have a new dependency on drivers-evergreen-tools, to future-proof the binary usage you could add in the find_python function.

@NoahStapp (Author):

I copied this over from the existing run-tests.sh script. Let's update them both in a separate ticket.

"end": int(end_time.timestamp()),
"elapsed": elapsed_secs,
}
report = {"failures": 0, "results": [results]}
Contributor:

If we expect no failures, what's the point of providing the key? Is it just required to be passed?

@NoahStapp (Author):

I believe so, I'll remove it and see.

Comment on lines +160 to +161
# Always omit the performance benchmarking suite.
if x.name != "gis_tests_" and x.name != "performance"
Contributor:

💯

@@ -0,0 +1 @@
{"field1":"miNVpaKW","field2":"CS5VwrwN","field3":"Oq5Csk1w","field4":"ZPm57dhu","field5":"gxUpzIjg","field6":"Smo9whci","field7":"TW34kfzq","field8":55336395,"field9":41992681,"field10":72188733,"field11":46660880,"field12":3527055,"field13":74094448}
Contributor:

Can this get formatted?

@NoahStapp (Author):

This file won't be in the repo. It'll live entirely in the specifications repo and be cloned for each run of the benchmark. It's included here for ease of review.

name = self.__class__.__name__[4:]
median = self.percentile(50)
megabytes_per_sec = self.data_size / median / 1000000
print( # noqa: T201
Contributor:

Can we use LOGGER here?

@NoahStapp (Author):

logging.info doesn't play nicely with the tests here. For perf tests, I don't think it's worth the effort to fix.

Comment on lines 223 to 226
for doc in self.documents:
model = SmallFlatModel(**doc)
model.save()
self.ids.append(model.id)
Contributor:

Since creation isn't part of benchmark time, can we place this in a bulk_create or a threadpool to reduce the time consumption?

@NoahStapp (Author):

Using bulk_create wherever possible makes sense.
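
That is, something like the following for the non-timed setup (a sketch; whether primary keys are populated by bulk_create depends on the backend, so that part is hedged in the comment):

# Build all model instances in memory, then insert them in one round trip.
models = [SmallFlatModel(**doc) for doc in self.documents]
created = SmallFlatModel.objects.bulk_create(models)
# Assumes the backend returns primary keys from bulk_create; if it doesn't,
# fall back to querying the ids after the insert.
self.ids = [m.id for m in created]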

for doc in self.documents:
model = SmallFlatModel(**doc)
model.save()
self.models = list(SmallFlatModel.objects.all())
Contributor:

self.models.append(model) should suffice, especially if you use Model.objects.create(). That removes the need for an extra database call.
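
That is, a sketch of the suggestion:

# Build the list while creating, avoiding the follow-up query.
self.models = [SmallFlatModel.objects.create(**doc) for doc in self.documents]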

@NoahStapp requested review from Jibola and timgraham on August 20, 2025 at 13:51