This repository has been archived by the owner on Jul 31, 2024. It is now read-only.
Add a global locate sample rate, with optional rate controller #1426
Merged
jwhitlock force-pushed the global-sample-rate-1398 branch 3 times, most recently from ce0f735 to 41dfacd (December 3, 2020 19:04)
Change sample rate calculation to use float math and random.random() instead of random.randint(). Use mocks to test the algorithm directly rather than statistically.
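A minimal sketch of the float-based sampling check and a mock-driven test. It assumes a hypothetical `should_sample()` helper that takes the sample rate as a percentage from 0.0 to 100.0; the name and signature are illustrative, not the project's code.

```python
import random
from unittest import mock


def should_sample(rate_percent: float) -> bool:
    """Decide whether to keep one observation (hypothetical helper).

    random.random() returns a float in [0.0, 1.0), so comparing it with
    rate_percent / 100.0 keeps that fraction of observations on average.
    Float math also handles fractional rates such as 12.5%, while an
    integer draw like random.randint(1, 100) only gives 1% steps.
    """
    if rate_percent >= 100.0:
        return True  # Keep everything without drawing a random number.
    if rate_percent <= 0.0:
        return False
    return random.random() < rate_percent / 100.0


def test_should_sample_threshold():
    # Pin random.random() so the comparison is tested directly,
    # instead of counting outcomes over many random draws.
    with mock.patch("random.random", return_value=0.2499):
        assert should_sample(25.0)
    with mock.patch("random.random", return_value=0.25):
        assert not should_sample(25.0)
    # A rate of 100% must not consult the random number generator at all.
    with mock.patch("random.random", side_effect=AssertionError("called")):
        assert should_sample(100.0)


test_should_sample_threshold()
```

Patching `random.random` pins each draw, so the test checks the threshold comparison itself rather than relying on statistical bounds that can occasionally fail.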
The global locate sample rate will allow reducing sampling when backend processing is overloaded.
These queues have not generated any metrics in the last year, probably because the names are queue_export_* rather than export_queue_*.
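One way a naming mismatch can silently produce no metrics, sketched under the assumption that the monitor selects queues by glob pattern. The matching mechanism and the queue names are hypothetical; only the `queue_export_*` and `export_queue_*` prefixes come from the commit message.

```python
from fnmatch import fnmatch

# Hypothetical export queue names that follow the queue_export_* convention.
export_queues = ["queue_export_internal", "queue_export_backup"]


def matching_queues(names, pattern):
    """Return the queue names that a glob-style pattern would select."""
    return [name for name in names if fnmatch(name, pattern)]


# A pattern with the prefix reversed matches nothing, so no sizes are reported.
old_matches = matching_queues(export_queues, "export_queue_*")
# A pattern using the actual prefix matches every export queue.
new_matches = matching_queues(export_queues, "queue_export_*")
print(old_matches, new_matches)
```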
jwhitlock force-pushed the global-sample-rate-1398 branch from 0ca825f to 964c343 (December 8, 2020 21:43)
jwhitlock changed the title from "WIP: Add a dynamically controlled global locate sample rate" to "Add a global locate sample rate, with optional rate controller" (Dec 8, 2020)
Add tags for queue_type (task or data) and data_type (various) to the queue entries. This will make it easier to filter and aggregate data in Grafana. Tests now have a list of the expected queues and their tags. This highlighted some missing queues in the documentation.
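A rough sketch of the new tags and of the backlog sum. The `queue_type` and `data_type` tag names and the `bluetooth`/`cell`/`wifi` backlog types come from this PR; the `QueueSize` record, the statsd-style `key:value` tag format, the queue names, and the `datamap` data type are assumptions for illustration.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional

# Data types whose queues count toward the station data backlog.
BACKLOG_DATA_TYPES = frozenset({"bluetooth", "cell", "wifi"})


@dataclass
class QueueSize:
    """Hypothetical record of one measured queue."""

    name: str
    queue_type: str  # "data" or "task"
    data_type: Optional[str]  # set for data queues, None for task queues
    size: int


def queue_tags(queue: QueueSize) -> List[str]:
    """Build statsd-style "key:value" tags for the queue size metric."""
    tags = [f"queue:{queue.name}", f"queue_type:{queue.queue_type}"]
    if queue.data_type is not None:
        tags.append(f"data_type:{queue.data_type}")
    return tags


def station_backlog(queues: Iterable[QueueSize]) -> int:
    """Sum the sizes of the data queues that hold station observations."""
    return sum(
        q.size
        for q in queues
        if q.queue_type == "data" and q.data_type in BACKLOG_DATA_TYPES
    )


queues = [
    QueueSize("celery_default", "task", None, 3),
    QueueSize("update_wifi_0", "data", "wifi", 1200),
    QueueSize("update_cell_gsm", "data", "cell", 300),
    QueueSize("update_datamap_ne", "data", "datamap", 50),
]
for q in queues:
    print(q.size, queue_tags(q))
print("backlog:", station_backlog(queues))
```

Filtering on `queue_type:data` or a single `data_type` is what makes per-type aggregation straightforward in a dashboard.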
The station data backlog is computed while measuring the queue sizes, and the rate controller parameters are read from Redis. If the rate controller is enabled, the controller is initialized, and its previous internal state (if any) is loaded. A new global locate sample rate is determined and written to Redis, along with the rate controller state. Since the rate controller parameters will be set manually, they are validated, and the rate controller is turned off if they are invalid. The API also uses new validation when reading the global sample rate, which makes it a little safer; the rate defaults to 100%.
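A sketch of what validating the manually set Redis values could look like. The key names (`global_locate_sample_rate`, `rate_controller_enabled`, `rate_controller_target`) come from this PR; the helper names are hypothetical, and treating out-of-range values as invalid is an assumption about what the validation checks.

```python
from typing import Optional, Union

# redis-py returns bytes unless the client was created with decode_responses=True.
RedisValue = Optional[Union[bytes, str]]


def _as_text(raw: RedisValue) -> Optional[str]:
    """Decode bytes from Redis; pass str and None through unchanged."""
    if isinstance(raw, bytes):
        return raw.decode("utf-8", errors="replace")
    return raw


def read_sample_rate(raw: RedisValue) -> float:
    """Parse global_locate_sample_rate, defaulting to 100.0 (keep everything)."""
    text = _as_text(raw)
    if text is None:
        return 100.0
    try:
        rate = float(text)
    except ValueError:
        return 100.0
    # Out-of-range values fall back to the default; NaN fails this check too.
    if not 0.0 <= rate <= 100.0:
        return 100.0
    return rate


def read_rate_controller_target(enabled: RedisValue, target: RedisValue) -> Optional[int]:
    """Return the target backlog if rate control is enabled and valid, else None."""
    if _as_text(enabled) != "1":
        return None  # "0" or unset disables the controller
    text = _as_text(target)
    if text is None:
        return None
    try:
        value = int(text)
    except ValueError:
        return None
    return value if value > 0 else None
```

Returning `None` from `read_rate_controller_target()` stands for "rate control is off", matching the turn-off-when-invalid behavior described in this commit.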
Instead of auto-disabling rate control, set the PID parameters to reasonable settings.
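A sketch of filling in default gains instead of disabling the controller. The defaults (Kp = 8, Ki = 0, Kd = 0) and the `rate_controller_kp`/`_ki`/`_kd` key names come from the PR description; the `read_gains()` helper, substituting each gain separately at read time, and rejecting negative or non-finite values are assumptions.

```python
import math
from typing import Dict, Mapping, Optional, Union

# Default gains from the PR description: a proportional-only controller.
DEFAULT_GAINS = {"kp": 8.0, "ki": 0.0, "kd": 0.0}


def read_gains(settings: Mapping[str, Optional[Union[bytes, str]]]) -> Dict[str, float]:
    """Return PID gains, substituting the default for each missing or invalid one."""
    gains = {}
    for name, default in DEFAULT_GAINS.items():
        raw = settings.get(f"rate_controller_{name}")
        if isinstance(raw, bytes):
            raw = raw.decode("utf-8", errors="replace")
        try:
            value = float(raw)
        except (TypeError, ValueError):  # TypeError covers a missing key (None)
            value = default
        # Negative, infinite, or NaN gains also fall back to the default.
        if not math.isfinite(value) or value < 0:
            value = default
        gains[name] = value
    return gains
```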
jwhitlock force-pushed the global-sample-rate-1398 branch from b8af23f to 39c5df3 (December 9, 2020 00:13)
I'm a little bothered by the way the rate jumps around when under rate control. This could be solved by a filter on the rate output (some suggest a second PID controller on the output). However, it is more important to get some real-world data from using this code, so stopping here. Locally, I ran:
The rate controller can't be tested outside of simulations (in private docs and private repositories, since it uses production data for simulated traffic), but I was able to manually test:
Let's get some real-world usage data.
For issue #1398, add a global sample rate for locate observations. Automatically adjust it, with the goal of processing as many observations as possible without accumulating a growing backlog (and without manual intervention).
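To illustrate the control goal, here is a proportional-only controller in plain Python rather than the simple-pid library the PR uses. It assumes the setpoint is `rate_controller_target`, the measurement is the station data backlog, and the output is clamped to 0-100%; `proportional_rate()` is a hypothetical name, and the real wiring may differ.

```python
def proportional_rate(backlog: int, target: int, kp: float = 8.0) -> float:
    """Return a sample rate (percent) from a proportional-only controller.

    Hypothetical helper: the output is kp * (target - backlog), the same
    form as a PID controller with Ki = Kd = 0, clamped to 0-100%.
    """
    error = target - backlog  # positive while the backlog is under target
    return max(0.0, min(100.0, kp * error))


# Hypothetical target of 1000 queued observations.
for backlog in (500, 990, 995, 1000, 1200):
    print(backlog, proportional_rate(backlog, target=1000))
```

Under these assumptions, the rate stays at 100% while the backlog is well under the target and falls to 0% as the backlog reaches it; the rate only changes within 100 / Kp observations of the target, so a larger Kp narrows that range.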
To make the data queues easier to monitor, the `queue` metric has two new tags:

- `queue_type`: `data` or `task`
- `data_type`: For data queues, the type of data. `bluetooth`, `cell`, and (most of all) `wifi` contribute to the backlog calculation.

`DataQueue` and derived classes store the `data_type`, passed as a new initialization variable.

The `queue` metric task `monitor_queue_size` is now `monitor_queue_size_and_rate_control`, and runs the rate controller if enabled. The rate controller is implemented by a PID controller provided by the simple-pid library. Simulation showed that only proportional control was needed, so it is more of a P controller.

This PR adds some new Redis keys:
- `global_locate_sample_rate`: Read in the web app, and assumed to be 100.0 (100%) if unset. Set in the task app if the rate controller is enabled.
- `rate_controller_target`: The target maximum data queue size, as an integer. Must be set by an administrator to enable the rate controller.
- `rate_controller_enabled`: Read in the task app; `1` to enable the rate controller, `0` or unset to disable it. Set by an administrator.
- `rate_controller_kp`, `rate_controller_ki`, `rate_controller_kd`: Kp, Ki, and Kd, the proportional, integral, and derivative gain terms. They are set to defaults (8, 0, and 0) when the rate controller is enabled, and can be adjusted by an administrator.
- `rate_controller_state`: The internal state of the PID controller, as a JSON-encoded string, used to reload it when the task runs.

There are new metrics as well:
- `rate_control.locate`: The current value of `global_locate_sample_rate`, or 100.0 if unset.
- `rate_control.locate.kp`, `rate_control.locate.ki`, `rate_control.locate.kd`: The current values of the PID gains Kp, Ki, and Kd.
- `rate_control.locate.pterm`, `rate_control.locate.iterm`, `rate_control.locate.dterm`: The internal components of the PID controller, for debugging and adjusting the PID gains.

There are new documents for rate control, as well as updates to the metrics docs.
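The `rate_controller_state` key and the `pterm`/`iterm`/`dterm` metrics can be illustrated with a small PID controller whose memory round-trips through JSON between task runs. This is a stand-in written in the style of simple-pid (error = setpoint - measurement, integral clamped to the output limits, derivative on the measurement), not simple-pid's API or the project's stored format; `RatePid` and `PidState` are hypothetical names.

```python
import json
from dataclasses import asdict, dataclass
from typing import Optional, Tuple


@dataclass
class PidState:
    """Controller memory carried between task runs (hypothetical layout)."""

    integral: float = 0.0
    last_input: Optional[float] = None


class RatePid:
    """Minimal PID controller in the style of simple-pid (not its API)."""

    def __init__(self, kp: float, ki: float, kd: float, setpoint: float,
                 limits: Tuple[float, float] = (0.0, 100.0),
                 state: Optional[PidState] = None):
        self.kp, self.ki, self.kd = kp, ki, kd
        self.setpoint = setpoint
        self.limits = limits
        self.state = state if state is not None else PidState()
        # Last (pterm, iterm, dterm), e.g. for rate_control.locate.* metrics.
        self.components: Tuple[float, float, float] = (0.0, 0.0, 0.0)

    def _clamp(self, value: float) -> float:
        low, high = self.limits
        return max(low, min(high, value))

    def update(self, measurement: float, dt: float = 1.0) -> float:
        """Return the new output (sample rate) for the measured backlog."""
        error = self.setpoint - measurement
        pterm = self.kp * error
        # Accumulate the integral term, clamped to the limits to limit windup.
        self.state.integral = self._clamp(self.state.integral + self.ki * error * dt)
        # Derivative on measurement; no derivative on the first run.
        if self.state.last_input is None:
            dterm = 0.0
        else:
            dterm = -self.kd * (measurement - self.state.last_input) / dt
        self.state.last_input = measurement
        self.components = (pterm, self.state.integral, dterm)
        return self._clamp(pterm + self.state.integral + dterm)

    def dump_state(self) -> str:
        """Serialize the controller memory as a JSON string."""
        return json.dumps(asdict(self.state))

    @staticmethod
    def load_state(text: Optional[str]) -> PidState:
        """Rebuild controller memory from JSON, or start fresh if unset."""
        return PidState(**json.loads(text)) if text else PidState()


# One task run: compute a rate, then save the state for the next run.
pid = RatePid(kp=8.0, ki=0.0, kd=0.0, setpoint=1000)
rate = pid.update(995)
saved = pid.dump_state()
next_pid = RatePid(kp=8.0, ki=0.0, kd=0.0, setpoint=1000,
                   state=RatePid.load_state(saved))
print(rate, pid.components, saved)
```

Persisting only the integral and the previous input is enough for this sketch to resume between runs; with Ki = Kd = 0, as in the defaults, the saved state has no effect on the output.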