address issues with NetBox database and Logstash's NetBox cache #259

mmguero · 2023-09-12T18:05:48Z

I'm not going to hold up the v23.09.0 release for this, but this is an issue I just discovered that needs to be addressed:

The Logstash Ruby script in charge of enrichment from netbox uses some LRU caches to avoid frequent repetitive API lookups of items.

However, the netbox database is not static: things can be added (not so much of an issue, since a cache miss will just cause the thing to be looked up) but also deleted and changed.

So Logstash could be going along and have information in its cache that is now invalid because the underlying database has been changed.
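To make the failure mode concrete, here is a minimal sketch in plain Ruby. It is not the Malcolm filter script: `TinyLRU`, the `netbox` hash standing in for the NetBox database, and the device names are all hypothetical. It only shows that a cache with no expiry keeps serving a value after the underlying record has changed.

```ruby
# Minimal LRU cache with no expiry, to illustrate the staleness problem.
# (Hypothetical class; not the cache used by the real script.)
class TinyLRU
  def initialize(max_size)
    @max_size = max_size
    @store = {} # Ruby hashes preserve insertion order
  end

  # Return the cached value, or compute it with the block on a miss.
  def getset(key)
    if @store.key?(key)
      value = @store.delete(key) # re-insert to mark as most recently used
      @store[key] = value
      return value
    end
    value = yield
    @store[key] = value
    @store.shift if @store.size > @max_size # evict least recently used
    value
  end
end

# Hypothetical stand-in for the NetBox database.
netbox = { '10.0.0.5' => 'plc-01' }
cache = TinyLRU.new(128)

first = cache.getset('10.0.0.5') { netbox['10.0.0.5'] }

# The record is changed in NetBox (e.g. through the UI)...
netbox['10.0.0.5'] = 'hmi-02'

# ...but the cache never looks again, so the old value is served.
second = cache.getset('10.0.0.5') { netbox['10.0.0.5'] }

puts "first=#{first} second=#{second} actual=#{netbox['10.0.0.5']}"
```

A newly added record is not a problem in this sketch (it is simply a miss), which matches the observation above that changes and deletions are the troublesome cases.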

I don't think there's a good way to "trigger" Logstash when changes are written to NetBox (externally to auto-population, like through the UI or through ./scripts/netbox-restore). But we need to consider how to handle this.

At the very least, a complete wipe/restore of the database via ./scripts/netbox-restore needs to trigger a complete Logstash restart (or, if we want to be more subtle about it, somehow notify logstash to clear all of its caches).

This needs some careful thought to figure out how to deal with it. On one hand, we know NetBox is using PostgreSQL and Redis to do its own database and caching: maybe we simply don't need to cache in Logstash at all and it won't be too expensive to just call the API every time? I don't know.


mmguero · 2023-10-26T19:40:13Z

After looking a little closer at this, I think that's the approach I'm going to take. Since netbox is already caching things I think we're adding complication without a lot of gain by doing our own caching on top of that.


mmguero · 2023-10-26T21:23:42Z

Working on it, but as of mmguero-dev/Malcolm@cab66bd it's broken (seems to be a concurrency issue). Just as a reminder to myself...



mmguero · 2023-10-27T13:44:34Z

The good news is I've removed our extra layer of caching and it does seem to be more consistent now. The bad news is the CPU load increases quite a bit, especially when doing autopopulation, due to the increased API load. I need to do some benchmarks to compare, and also compare when autopopulation is turned on vs. not.

mmguero · 2023-10-27T15:49:52Z

Okay, on further analysis I think we're actually going to have to come to a middle ground: still do the caching, but with two changes: 1) decrease the TTL significantly (from 600 seconds down to maybe 30 or 60 seconds) and 2) make ALL of the caches in the file TTL caches, rather than leaving the site/role/device type caches as plain LRU.

I think removing the caching completely will just make it too slow.

…olab#259); restore caching for performance reasons, but decrease TTL significantly and allow it to be specified via environment variable
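The middle ground described above (every cache expires entries, with a short TTL read from the environment) could be sketched roughly as follows. This is a self-contained illustration under stated assumptions, not the code from the commit: `TinyTTLCache` is a hypothetical class, `EXAMPLE_NETBOX_CACHE_TTL` is a made-up variable name, and the 30-second default is just one of the values floated in the comment above. The clock is injectable only so the expiry behavior can be shown without sleeping.

```ruby
# Minimal thread-safe TTL cache: an entry is served for ttl_seconds after it
# was stored, then looked up again. (Hypothetical class for illustration.)
class TinyTTLCache
  def initialize(ttl_seconds, clock: -> { Process.clock_gettime(Process::CLOCK_MONOTONIC) })
    @ttl = ttl_seconds
    @clock = clock
    @store = {}
    @lock = Mutex.new
  end

  # Return a still-fresh cached value, or call the block and cache its result.
  # Holding the lock during the lookup keeps the sketch simple but serializes
  # concurrent misses.
  def getset(key)
    @lock.synchronize do
      entry = @store[key]
      now = @clock.call
      return entry[:value] if entry && now - entry[:stored_at] < @ttl

      value = yield
      @store[key] = { value: value, stored_at: now }
      value
    end
  end

  # Drop everything, e.g. after the backing database is wiped and restored.
  def clear
    @lock.synchronize { @store.clear }
  end
end

# Assumed variable name and default; the real setting may differ.
ttl = Integer(ENV.fetch('EXAMPLE_NETBOX_CACHE_TTL', '30'))

fake_now = 0.0
cache = TinyTTLCache.new(ttl, clock: -> { fake_now })
lookups = 0
fetch = -> { lookups += 1; "role-v#{lookups}" } # stand-in for an API call

a = cache.getset(:role) { fetch.call } # miss: performs the lookup
fake_now += ttl - 1
b = cache.getset(:role) { fetch.call } # still within the TTL: cached
fake_now += 2
c = cache.getset(:role) { fetch.call } # expired: performs the lookup again

puts "a=#{a} b=#{b} c=#{c} lookups=#{lookups}"
```

With a TTL like this, stale data is bounded to at most one TTL window after a change in NetBox, at the cost of one extra API call per key per window.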

mmguero · 2023-10-27T17:19:35Z

Closing as per my last comment and commit.

Some rough benchmarks (the last number is milliseconds in the filter):

With no caching, autopopulate on:

ruby_netbox_enrich_destination_ip_segment;87489;87489;3017294
ruby_netbox_enrich_source_ip_segment;96610;96610;3411570
ruby_netbox_enrich_source_ip_device;96610;96610;10968202
ruby_netbox_enrich_destination_ip_device;87489;87489;11361163

With no caching, autopopulate off:

ruby_netbox_enrich_destination_ip_segment;85755;85755;169437
ruby_netbox_enrich_source_ip_segment;94305;94305;222455
ruby_netbox_enrich_destination_ip_device;85755;85755;958852
ruby_netbox_enrich_source_ip_device;94305;94305;1607698

Although this is rough (the event counts aren't exactly the same between runs), you can see that without the caching it's like 10x+ worse.
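As a quick check of that estimate, the per-filter ratio between the two result sets can be computed directly from the last column. The sketch below only does arithmetic on the numbers listed above (with the common `ruby_netbox_enrich_` prefix dropped from the names); it ignores the small difference in event counts between the runs, so the ratios are approximate.

```ruby
# Filter name => total milliseconds in the filter, copied from the two result
# sets above (first set, then second set).
first_run = {
  'destination_ip_segment' => 3_017_294,
  'source_ip_segment'      => 3_411_570,
  'source_ip_device'       => 10_968_202,
  'destination_ip_device'  => 11_361_163
}
second_run = {
  'destination_ip_segment' => 169_437,
  'source_ip_segment'      => 222_455,
  'destination_ip_device'  => 958_852,
  'source_ip_device'       => 1_607_698
}

# Ratio of first-set time to second-set time for each filter.
ratios = first_run.to_h do |name, ms|
  [name, (ms.to_f / second_run.fetch(name)).round(1)]
end

ratios.each { |name, ratio| puts format('%-24s %5.1fx', name, ratio) }
```

The ratios come out between roughly 7x and 18x depending on the filter, which is in line with the rough "10x+" figure.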

mmguero added the bug, logstash, and netbox labels Sep 12, 2023

mmguero added this to the v23.10.0 milestone Sep 12, 2023

mmguero modified the milestones: v23.10.0, v23.11.0 Oct 23, 2023

mmguero self-assigned this Oct 26, 2023

mmguero added a commit to mmguero-dev/Malcolm that referenced this issue Oct 26, 2023

address issues with NetBox database and Logstash's NetBox cache (idah…

cab66bd

…olab#259); work in progress, almost certainly broken in this state

mmguero added a commit to mmguero-dev/Malcolm that referenced this issue Oct 27, 2023

work in pgoress for address issues with NetBox database and Logstash'…

8831476

…s NetBox cache (idaholab#259)

mmguero added a commit to mmguero-dev/Malcolm that referenced this issue Oct 27, 2023

work in pgoress for address issues with NetBox database and Logstash'…

20a89d5

…s NetBox cache, should fix locking issues (idaholab#259)

mmguero closed this as completed Oct 27, 2023

This was referenced Dec 4, 2023

Malcolm v23.12.0 #307

Merged

Malcolm v23.12.0 cisagov/Malcolm#290

Merged
