Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Custom field accessor database performance #3185

Closed
paravoid opened this issue May 14, 2019 · 3 comments
Closed

Custom field accessor database performance #3185

paravoid opened this issue May 14, 2019 · 3 comments
Labels
status: accepted This issue has been accepted for implementation type: feature Introduction of new functionality to the application

Comments

@paravoid
Copy link
Contributor

Environment

  • Python version: 3.5.3
  • NetBox version: 2.5.12

Use Case

One of the Netbox reports that we've written, needs to basically look at multiple Devices, and for each of them check various fields against another database for consistency. Some of them are Netbox's built-in fields, some of them are custom fields. Basically, semi-realistic example:

def test_devices_match(self):
    devices = Device.objects.all()  # or filter(...)
    for device in devices:
        other = fetch_from_other_data_source(device.serial)

        if device.asset_tag != other["asset_tag"]:
            self.log_failure(device, "Asset tag does not match")

        if device.cf()["ticket"] != other["ticket"]:
            self.log_failure(device, "Ticket does not match")

Observed behavior

Netbox issues hundreds of database queries, making this inefficient, slow and tough on the database. The .cf() accessor seems to be the culprit, as it's making a lot of queries for each of the ContentTypes etc., for each of those Device objects.

Proposed behavior

We've worked around that, with this piece of code:

devices = devices.prefetch_related("custom_field_values__field")
[...]
    netbox_ticket = None
    for cfv in device.custom_field_values.all():
        if cfv.field.name == "ticket":
            netbox_ticket = cfv.value
            break
[...]

This reduces the hotpath queries to just 1 big one, making this code path about 20-30x more efficient.

I think this could be generalized, and potentially replace the .cf() accessor.

I'm unaware of the history of .cf(), and it is a complicated piece of code, so I'm a bit hesitant diving in and submitting a PR there. Would love some guidance :)

@jeremystretch
Copy link
Member

Related to #3190

@jeremystretch
Copy link
Member

First, I should point out that cf was recently changed from a method to a property in #3190. Just something to be aware of with the upcoming v2.5.13 release.

There are a few opportunities for improvement here. First, we have this line in get_custom_fields():

fields = CustomField.objects.filter(obj_type=content_type)

Here, we're retrieving all applicable custom fields for the current instance. (This is needed to ensure we also represent fields which have no value assigned.) This line is called each time cf is accessed: not ideal. We should take a look at caching these similar to how ContentTypes are cached, however this will necessitate introducing a new manager for custom field models. This is a large enough undertaking to warrant its own issue.

Further down, we have this line:

values = CustomFieldValue.objects.filter(obj_type=content_type, obj_id=self.pk).select_related('field')

This forfeits any benefit from having pre-fetched values on the parent queryset. We should be referencing self.custom_field_values.all() instead.

And finally there's the cf property itself, which is currently calling get_custom_fields() on each access:

return {field.name: value for field, value in self.get_custom_fields().items()}

This means that accessing e.g. obj.cf.foo and obj.cf.bar will result in two separate queries. Instead, we can can cache the results from the first call to get_custom_fields() on the instance, and use that for successive references.

@jeremystretch
Copy link
Member

I've pushed 823257c to address the last two points above, and opened #3226 for the first one.

@jeremystretch jeremystretch added status: accepted This issue has been accepted for implementation type: feature Introduction of new functionality to the application labels May 29, 2019
@lock lock bot locked as resolved and limited conversation to collaborators Jan 16, 2020
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
status: accepted This issue has been accepted for implementation type: feature Introduction of new functionality to the application
Projects
None yet
Development

No branches or pull requests

2 participants