metrics: reduce memory footprint by over 60% #1251
Without these changes the ingest process will not run on the production machine even with 10GB of RAM. This drops the required memory from around 12GB to 4.5GB. The small batches also reduce the influxdb process overhead during writing. Interestingly, as noted in the commit message above, one of the improvements was the way I originally approached this, but I changed it to match the influxdb interface. :( Additionally, the need to call
This avoids needing enough memory to hold all parsed request element trees at once. Instead, only one page of requests is loaded at a time and its memory is freed after processing. The end result is memory consumption reduced by just over 20% (a current Factory run drops by around 2.5GB).
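A minimal sketch of this page-at-a-time approach, assuming a hypothetical `load_page` helper that parses one page of requests (the name and paging scheme are illustrative, not the PR's actual code):

```python
def iter_requests(total_pages, load_page):
    """Yield requests one page at a time so that only one page's
    parsed element trees are resident in memory at any moment.

    `load_page(n)` is assumed to parse and return the list of
    requests on page `n`.
    """
    for page in range(total_pages):
        requests = load_page(page)  # parse only this page's trees
        for request in requests:
            yield request
        # `requests` is rebound on the next iteration, so the
        # previous page's trees become garbage-collectable here.
```

Callers iterate over `iter_requests(...)` instead of a fully materialized list, which is what bounds peak memory to a single page.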
This saves around 400MB per 10,000 requests. Using a named tuple was the original approach for exactly this reason, but the influxdb interface requires dicts and it seemed silly to spend time converting them. Additionally, the influxdb client already does batching. Unfortunately, with the amount of data processed for Factory, which will continue to grow, this approach is necessary. The final dict structures are buffered up to ~1000 entries before being written and released. Another benefit of the batching is that influxdb does not allocate memory for the entire incoming data set at once.
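The buffer-and-flush pattern can be sketched as below; `write_points` stands in for the influxdb client's write call, and the function name and `batch_size` default are illustrative assumptions, not the PR's exact code:

```python
def write_batched(points, write_points, batch_size=1000):
    """Buffer point dicts and hand them to `write_points` in chunks
    of at most `batch_size`, releasing each chunk after writing so
    memory stays bounded regardless of total point count.

    Returns the total number of points written.
    """
    batch = []
    written = 0
    for point in points:
        batch.append(point)
        if len(batch) >= batch_size:
            write_points(batch)   # e.g. influxdb client write call
            written += len(batch)
            batch = []            # release the buffered dicts
    if batch:                     # flush the final partial batch
        write_points(batch)
        written += len(batch)
    return written
```

Feeding this a generator (rather than a pre-built list) is what keeps only ~1000 dicts alive at a time on both the producer and the writer side.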