Inventory Optimisation: Querying for services and interfaces in bulk rather than by device #143

Closed
DouglasHeriot opened this issue Mar 25, 2020 · 6 comments

Comments

@DouglasHeriot
Contributor

ISSUE TYPE
  • Feature Idea
SUMMARY

Right now when the inventory plugin fetches the services and interfaces associated with devices and VMs, it does a separate request for each device's services and interfaces. In some circumstances it could be more efficient to just query for all interfaces and all services (similar to how everything else is queried) and then match them up with each device once the whole list has been downloaded.

This method would be more efficient in the case where the entire contents of Netbox is wanted in the Ansible inventory. The downside would be when query_filters are in place that limit which devices are required - there would be no way to filter the services or interfaces by device.

Right now I've got 2000 devices in Netbox, less than 1000 services defined, and 10,000 interfaces. And this is less than 10% of our current infrastructure - we're only getting started putting stuff into Netbox. This currently requires 4000 HTTP requests to get the services and interfaces of each device, when it could be achieved with just a couple of batch requests. I haven't even looked at what sort of server CPU and database load this creates.

Alternatively - should I make a request with the Netbox project to include returning interfaces and services as part of the main /dcim/devices/ query?

EXPECTED RESULTS
Fetching: https://netbox/api/ipam/services/?limit=0
ACTUAL RESULTS
Fetching: https://netbox/api/ipam/services/?device=HIL-01
Fetching: https://netbox/api/ipam/services/?device=HIL-02
Fetching: https://netbox/api/ipam/services/?device=HIL-03
Fetching: https://netbox/api/ipam/services/?device=HIL-04
Fetching: https://netbox/api/ipam/services/?device=HIL-05
Fetching: https://netbox/api/ipam/services/?device=HIL-06
Fetching: https://netbox/api/ipam/services/?device=HIL-07
Fetching: https://netbox/api/ipam/services/?device=HIL-08
...
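
A minimal sketch of the "match them up" step from the summary, assuming each service record in a bulk response carries a nested `device` object with an `id`. The record shape, the sample data and the function name `group_services_by_device` are illustrative assumptions, not the plugin's actual code.

```python
from collections import defaultdict


def group_services_by_device(services):
    """Index a flat list of service records by the ID of their parent device."""
    by_device = defaultdict(list)
    for service in services:
        device = service.get("device")
        if device is None:
            # Assumption: records with no device (e.g. VM services) are skipped here.
            continue
        by_device[device["id"]].append(service)
    return dict(by_device)


# Hypothetical records standing in for the result of one bulk request.
services = [
    {"name": "ssh", "device": {"id": 1, "name": "HIL-01"}},
    {"name": "http", "device": {"id": 1, "name": "HIL-01"}},
    {"name": "ssh", "device": {"id": 2, "name": "HIL-02"}},
    {"name": "dns", "device": None},
]
grouped = group_services_by_device(services)
```

With an index like this, looking up one device's services is a dictionary access rather than an HTTP request.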
@FragmentedPacket
Contributor

Could you provide any numbers for this? Possibly print the time before and after the fetches, or maybe time a complete run with each of the two options?

Thinking about this quickly, maybe add if/else to either fetch all or per device depending on the query_filters? If no query filters, and we want to fetch those, then it would make sense to just fetch everything with the limit=0
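
The if/else idea could be sketched roughly like this. The function name `plan_service_urls` and its parameters are hypothetical; only the two URL shapes come from the logs in this issue.

```python
from urllib.parse import quote


def plan_service_urls(api_endpoint, device_names, query_filters):
    """Return the service URLs to fetch: one bulk URL, or one URL per device."""
    base = api_endpoint.rstrip("/") + "/api/ipam/services/"
    if not query_filters:
        # No filters: every device is wanted, so a single bulk request is enough.
        return [base + "?limit=0"]
    # Filters in place: only ask about the devices that passed the filters.
    return [base + "?device=" + quote(name) for name in device_names]


bulk = plan_service_urls("https://netbox", ["HIL-01", "HIL-02"], [])
per_device = plan_service_urls("https://netbox", ["HIL-01", "HIL-02"], ["site=hills"])
```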

I don't have a big data set to test the change against on whether or not it's worth it to implement it.

I'll look for more discussion on this and the PR when it's put in. Don't have a strong opinion on it, but would like to see if there is a performance improvement.

@DouglasHeriot
Contributor Author

DouglasHeriot commented Mar 26, 2020

Here's some more specific numbers in our setup. Ansible is being run from a VM on-prem, and Netbox is hosted in EC2. We have a ping of just 2ms from Ansible to Netbox.

  • 2236 Devices
  • 4 services (total - yeah, we haven't really filled that out yet)
  • 10403 Interfaces
  • 3283 IP Addresses

Summary

Interfaces  Services  Time   HTTP Requests
Yes         Yes       7m14s  6787
No          Yes       3m28s  2274
No          No        17s    12

Interfaces and Services enabled

$ time ansible-inventory -vvv --inventory inventory/netbox.yml --graph 2>&1 | tee inventory.txt
...
real    7m14.441s
user    0m28.269s
sys     0m4.395s

6787 HTTP requests, counted from the self.display.v("Fetching: " + url) call in _fetch_information. I removed the equivalent call from get_resource_list to avoid duplicate log lines.
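
For anyone wanting to reproduce the count, a small sketch that tallies the "Fetching: " lines in captured verbose output. The helper name `count_fetches` and the sample lines are made up for illustration.

```python
def count_fetches(lines):
    """Count log lines that record an HTTP fetch."""
    return sum(1 for line in lines if "Fetching: " in line)


# Hypothetical excerpt of verbose output.
sample_log = [
    "Fetching: https://netbox/api/ipam/services/?device=HIL-01",
    "Fetching: https://netbox/api/ipam/services/?device=HIL-02",
    "@all:",
    "  |--@ungrouped:",
]
fetch_count = count_fetches(sample_log)
```

Against the captured run, the same function can be given the open file handle for inventory.txt instead of the sample list.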

That's a 7 minute overhead for running any Ansible playbook.

Interfaces disabled, Services enabled by default

real    3m28.690s
user    0m9.222s
sys     0m1.342s

2274 HTTP requests

Neither interfaces nor services

I had to comment out the services group extractor "services": self.extract_services,

real    0m17.436s
user    0m2.181s
sys     0m0.166s

12 HTTP requests

Caching?

Caching does help significantly - but not in all use cases. We plan to set up webhooks from Netbox to trigger a CI server to run the Ansible playbook automatically. Caching won't be helpful here as you're always looking to get the latest data from Netbox. Caching would only help in the development situation where you're running playbooks multiple times against the same set of inventory.

Thoughts

Both interfaces and services have a large impact on performance, which limits where this inventory plugin can be used. Right now there's not even an option to disable services - I'll be adding one so we can continue to use this as we did before services were added.

I like the idea of giving users a choice of which approach to take: query per device, or just fetch all interfaces/services/ip-addresses. I'm pretty sure that for most use cases, fetching everything in a couple of large requests is going to be quicker than making hundreds or thousands of smaller ones.

I'll consider adding more data into the test deployment scripts, but not sure what the performance of Travis CI is like or if it's worth risking slowing that down too much.

@FragmentedPacket
Contributor

Great! Thanks for those tests. This improvement will definitely be welcomed. I agree with the choice of toggling bulk GET or individual calls to allow some flexibility. Looking forward to the PR!

DouglasHeriot added a commit to hillsong/ansible_modules that referenced this issue Apr 3, 2020
Querying every single device's services as a separate HTTP request can be very slow. Allow users to disable this (similar to interfaces) if it is not required.
DouglasHeriot added a commit to hillsong/ansible_modules that referenced this issue Apr 6, 2020
DouglasHeriot added a commit to hillsong/ansible_modules that referenced this issue Apr 7, 2020
…unity#143)

In most cases this isn't too bad, but for the interfaces and services extractors it would have been resulting in twice as many HTTP requests?
DouglasHeriot added a commit to hillsong/ansible_modules that referenced this issue Apr 14, 2020
…unity#143)

In most cases this isn't too bad, but for the interfaces and services extractors it would have been resulting in twice as many HTTP requests?
DouglasHeriot added a commit to hillsong/ansible_modules that referenced this issue Apr 15, 2020
…unity#143)

In most cases this isn't too bad, but for the interfaces and services extractors it would have been resulting in twice as many HTTP requests?
DouglasHeriot added a commit to hillsong/ansible_modules that referenced this issue Apr 21, 2020
…unity#143)

In most cases this isn't too bad, but for the interfaces and services extractors it would have been resulting in twice as many HTTP requests?
@FragmentedPacket
Contributor

@DouglasHeriot Do you think it would be worth it to do a fetch of all interfaces, ip addresses, etc. as well when those options are specified? Just stuff all these optimizations into the PR for this.

@DouglasHeriot
Contributor Author

@FragmentedPacket which options are you thinking of?

I'm planning for this to introduce an option fetch_all that would determine whether the interfaces and services options both fetch all, or per-device.

I would be curious about different people's use-cases for this - I guess the default should be to fetch all, but I wonder if anyone will ever query such a small portion of their database that it makes sense to turn it off?

@FragmentedPacket
Contributor

Pretty much just those at this point; interfaces, services, ip addresses. I think those are the main ones that have to be fetched outside of the normal device/VM lookups.

I don't use the inventory plugin at this point so my input isn't use-case driven or anything.

DouglasHeriot added a commit to hillsong/ansible_modules that referenced this issue May 7, 2020
… interfaces/services.

A side effect is it resolves netbox-community#142 fetching services for VMs

Includes starting to better support virtual chassis - should only take the master device and not the children. Some work on this started in ansible/ansible#60642
DouglasHeriot added a commit to hillsong/ansible_modules that referenced this issue May 7, 2020
… interfaces/services.

A side effect is it resolves netbox-community#142 fetching services for VMs

Includes starting to better support virtual chassis - should only take the master device and not the children. Some work on this started in ansible/ansible#60642

See the TODO comments for work still to be done before being ready to merge.
DouglasHeriot added a commit to hillsong/ansible_modules that referenced this issue May 8, 2020
… interfaces/services.

A side effect is it resolves netbox-community#142 fetching services for VMs

Includes better support for virtual chassis - only take the master device and not the children. Some of this from ansible/ansible#60642

See the TODO comments for work still to be done before being ready to merge.
DouglasHeriot added a commit to hillsong/ansible_modules that referenced this issue May 11, 2020
* Allow tuning for largest size permitted by your webserver, for optimum performance reducing number of required HTTP requests.

* Handle exceptions from within threads and raise on the main thread after being joined. This ensures that any HTTP errors from refresh_ methods will stop execution of the rest of the plugin. For example, you'll notice if you receive an HTTP 414 URI Too Long.

Added unit tests for these things.
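
As a rough illustration of the thread error handling described in this commit message (not the code from the commit itself), worker exceptions can be collected and re-raised on the calling thread once every thread has been joined. `run_in_threads` and the two `refresh_*` stand-ins are hypothetical names.

```python
import threading


def run_in_threads(tasks):
    """Run each callable in its own thread, then re-raise the first failure."""
    errors = []
    lock = threading.Lock()

    def wrapper(task):
        try:
            task()
        except Exception as exc:
            # Capture the exception; it is re-raised by the caller after join().
            with lock:
                errors.append(exc)

    threads = [threading.Thread(target=wrapper, args=(task,)) for task in tasks]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    if errors:
        raise errors[0]


def refresh_ok():
    pass


def refresh_fails():
    # Stand-in for a refresh_ method that hits an HTTP error.
    raise RuntimeError("HTTP 414 URI Too Long")


try:
    run_in_threads([refresh_ok, refresh_fails])
    outcome = "completed"
except RuntimeError as exc:
    outcome = str(exc)
```

Without the capture-and-re-raise step, an exception inside a worker thread would only end that thread, and the caller would carry on with incomplete data.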
DouglasHeriot added a commit to hillsong/ansible_modules that referenced this issue May 11, 2020
I found in my install I was getting HTTP 400 errors with 8000 length. 4000 length works. May depend on web server, CDN, etc.
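
One way to respect a tunable maximum URL length is to pack filter values greedily into as few URLs as will fit. This is a sketch under assumptions: `chunk_ids_by_url_length` is a hypothetical helper, `device_id` is used as an example filter parameter, and the base URL is assumed to have no query string of its own.

```python
def chunk_ids_by_url_length(base_url, param, ids, max_length=4000):
    """Greedily pack IDs into query URLs that stay within max_length characters."""
    urls = []
    current = base_url
    for item in ids:
        separator = "?" if current == base_url else "&"
        candidate = current + separator + param + "=" + str(item)
        if len(candidate) > max_length and current != base_url:
            # Adding this ID would overflow: close the URL and start a new one.
            urls.append(current)
            candidate = base_url + "?" + param + "=" + str(item)
        current = candidate
    if current != base_url:
        urls.append(current)
    return urls


base = "https://netbox/api/ipam/services/"
short_urls = chunk_ids_by_url_length(base, "device_id", range(1000), max_length=200)
default_urls = chunk_ids_by_url_length(base, "device_id", range(1000))
```

A lower limit means more requests, so the value is a trade-off between staying under what the web server accepts and keeping the request count down.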