[FEATURE] AutoRange #38706

Open
wants to merge 11 commits into
base: master
from

3 participants
@vdaviot
commented Feb 11, 2019

From engine
This pull request is related to:

  • cli: for the docker-compose.yml parsing and the $ docker container stats format
  • swarmkit: for the API update

What

AutoRange is a feature that helps find and apply the optimal limits for a service. It is an update to the docker collector and requires swarm mode to be enabled.

Why

This collector extension was designed as a way to monitor and predict the optimal limits for a service. The goal was to find the point where a service can still function properly while saving as many resources as possible. It was written to answer the question:
How can we optimize the number of services running on our infrastructure without losing quality of service?

How

The logic behind the feature can be described in three steps:
- First, we collect the metrics and apply some transformations to them to generate two values. These values represent a "box" around the actual consumption.
- Then, we turn these values into time series, using some of the key data collected previously to weight our operations. The amplitude of change between successive values is monitored to decide when to stop measuring.
- Finally, we obtain refined values that we apply as limits to the service. The data are then kept in a reduced form to limit memory usage.
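
The three steps above could be sketched roughly as follows. This is a minimal illustration and not the code in this PR: the margin used to build the "box", the exponential weighting, and the stop tolerance are all assumptions chosen for the example, and every name (`box_around`, `AutoRangeSketch`, `observe`, `limits`) is hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


def box_around(samples: List[float], margin: float = 0.10) -> Tuple[float, float]:
    """Step 1: derive a (low, high) "box" around the observed consumption.

    Assumption: the box is the min/max of the samples widened by `margin`.
    """
    low, high = min(samples), max(samples)
    return low * (1.0 - margin), high * (1.0 + margin)


@dataclass
class AutoRangeSketch:
    alpha: float = 0.3       # assumed weight given to each new box
    tolerance: float = 0.02  # assumed relative change below which we stop
    low: Optional[float] = None
    high: Optional[float] = None
    done: bool = False

    def observe(self, samples: List[float]) -> bool:
        """Step 2: fold a new box into the smoothed bounds and check the
        amplitude of change. Returns True once both bounds have settled."""
        new_low, new_high = box_around(samples)
        if self.low is None or self.high is None:
            # First window: nothing to compare against yet.
            self.low, self.high = new_low, new_high
            return False
        next_low = (1 - self.alpha) * self.low + self.alpha * new_low
        next_high = (1 - self.alpha) * self.high + self.alpha * new_high
        change = max(
            abs(next_low - self.low) / max(self.low, 1e-9),
            abs(next_high - self.high) / max(self.high, 1e-9),
        )
        # Only the two smoothed bounds are kept, not the raw samples.
        self.low, self.high = next_low, next_high
        self.done = change < self.tolerance
        return self.done

    def limits(self) -> Tuple[int, int]:
        """Step 3: the refined values that would be applied as limits."""
        if self.low is None or self.high is None:
            raise ValueError("no samples observed yet")
        return round(self.low), round(self.high)
```

Feeding successive windows of samples to `observe` until it returns True, then reading `limits()`, mirrors the collect, refine, apply sequence. Keeping only two smoothed numbers per resource is one possible reading of "kept in a reduced form".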

Usage

The functionality is declared by adding the autorange key to a service in docker-compose.yml.
The mechanism is available for cpu% and memory, with or without base values.
Both forms are shown below: first without base values, then with them.

services:
  myservice:
    autorange:
      memory:
      cpu%:

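The available keys are:
- min (in bytes for memory; the cpu% example below uses percentage values)
- max (in bytes for memory; the cpu% example below uses percentage values)
- threshold% (only for memory; represents a safety margin that will be refined by the algorithm)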

services:
  myservice:
    autorange:
      memory:
        min: "110000"
        max: "120000"
        threshold%: "10"
      cpu%:
        min: "60"
        max: "70"
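
To make the key rules above concrete, here is a small sketch that validates an already-parsed autorange mapping. It is not the parser from the cli change: `validate_autorange` and its error messages are hypothetical, and the min <= max check is an assumption rather than documented behavior.

```python
from typing import Dict, Optional

# Keys accepted per resource, as listed in the description.
ALLOWED_KEYS = {
    "memory": {"min", "max", "threshold%"},
    "cpu%": {"min", "max"},  # threshold% is documented for memory only
}


def validate_autorange(autorange: Dict[str, Optional[Dict[str, str]]]) -> None:
    """Raise ValueError if the mapping breaks the documented key rules."""
    for resource, values in autorange.items():
        if resource not in ALLOWED_KEYS:
            raise ValueError(f"unknown resource: {resource}")
        values = values or {}  # an empty `memory:` entry means no base values
        unknown = set(values) - ALLOWED_KEYS[resource]
        if unknown:
            raise ValueError(f"{resource}: unsupported keys {sorted(unknown)}")
        # Values are quoted strings in the compose file, so convert first.
        numbers = {key: int(raw) for key, raw in values.items()}
        if "min" in numbers and "max" in numbers and numbers["min"] > numbers["max"]:
            # Assumed rule: a min above max is treated as a mistake.
            raise ValueError(f"{resource}: min is greater than max")


# Both forms from the examples pass without raising.
validate_autorange({"memory": None, "cpu%": None})
validate_autorange({
    "memory": {"min": "110000", "max": "120000", "threshold%": "10"},
    "cpu%": {"min": "60", "max": "70"},
})
```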

This functionality is deployed with $ docker stack deploy --compose-file=/your/compose/file, followed by $ docker container stats --format autorange. The --format option is not required, but it shows the predicted values. Running $ docker container stats is mandatory because it starts the collector, and the collector needs to keep running to accumulate and predict values. If the $ docker container stats screen is exited, the mechanism is paused but the accumulated data are not lost.

Improvements

Things that could be improved:
- Compatibility with docker-compose, by removing the need for swarm labels.
- Avoid having to run $ docker container stats to start the collector, by getting stats directly from the API.
- Implement a trigger to restart the mechanism, using the old values as a base to further refine the predictions, in case of a change in behavior.
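
As a rough idea of what reading stats directly from the API could involve, the sketch below derives a CPU percentage and a memory figure from one JSON payload of the Engine API container stats endpoint. The field names and the CPU formula follow the Engine API documentation. The helper names (`cpu_percent`, `used_memory`) are hypothetical, the memory figure subtracts `stats.cache` only when that field is present, and fetching the payload itself is left out.

```python
from typing import Any, Dict


def cpu_percent(stats: Dict[str, Any]) -> float:
    """CPU usage in percent, computed from the current and previous samples."""
    cpu, precpu = stats["cpu_stats"], stats["precpu_stats"]
    cpu_delta = cpu["cpu_usage"]["total_usage"] - precpu["cpu_usage"]["total_usage"]
    system_delta = cpu["system_cpu_usage"] - precpu["system_cpu_usage"]
    if system_delta <= 0 or cpu_delta < 0:
        return 0.0  # first sample or counter reset: nothing meaningful yet
    online_cpus = cpu.get("online_cpus") or len(cpu["cpu_usage"].get("percpu_usage") or []) or 1
    return (cpu_delta / system_delta) * online_cpus * 100.0


def used_memory(stats: Dict[str, Any]) -> int:
    """Memory usage in bytes, excluding page cache when it is reported."""
    memory = stats["memory_stats"]
    return memory["usage"] - memory.get("stats", {}).get("cache", 0)


# Hand-written payload fragment, for illustration only.
sample = {
    "cpu_stats": {
        "cpu_usage": {"total_usage": 700},
        "system_cpu_usage": 2000,
        "online_cpus": 2,
    },
    "precpu_stats": {
        "cpu_usage": {"total_usage": 200},
        "system_cpu_usage": 1000,
    },
    "memory_stats": {"usage": 115000, "stats": {"cache": 5000}},
}
print(cpu_percent(sample), used_memory(sample))
```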

I'm open to any suggestions on how to refine the code or the feature to better fit the docker codebase.

Cute animal

doggo

vdaviot added some commits Feb 7, 2019

autorange patch
Signed-off-by: Valentin Daviot <valentin.daviot@alterway.fr>
updated vendor.conf
Signed-off-by: Valentin Daviot <valentin.daviot@alterway.fr>

vdaviot added some commits Feb 12, 2019

code passed through gofmt, goimports, gosimple, golint, unconvert, ineffassign

Signed-off-by: Valentin Daviot <valentin.daviot@alterway.fr>
goimports removing used imports
Signed-off-by: Valentin Daviot <valentin.daviot@alterway.fr>
fixed conversion / deadcode
Signed-off-by: Valentin Daviot <valentin.daviot@alterway.fr>
added tests for autorange, refactored to be more readable and testable.
Signed-off-by: Valentin Daviot <valentin.daviot@alterway.fr>
@codecov

commented Feb 18, 2019

Codecov Report

Merging #38706 into master will decrease coverage by 0.13%.
The diff coverage is 31.03%.

@@            Coverage Diff             @@
##           master   #38706      +/-   ##
==========================================
- Coverage   36.55%   36.42%   -0.14%     
==========================================
  Files         610      613       +3     
  Lines       45395    45869     +474     
==========================================
+ Hits        16596    16706     +110     
- Misses      26507    26865     +358     
- Partials     2292     2298       +6
added test and fixed somes potential bugs
Signed-off-by: Valentin Daviot <valentin.daviot@alterway.fr>

vdaviot added some commits Apr 3, 2019

Fixed a problem when initial value weren't given, it attributed 1 by default and made the whole algorithm fail, fixed the threshold assignation too

Signed-off-by: Valentin Daviot <valentin.daviot@alterway.fr>
refactor done, need testing
Signed-off-by: Valentin Daviot <valentin.daviot@alterway.fr>
added timer to apply the limit, more testing on edge case needed.
Signed-off-by: Valentin Daviot <valentin.daviot@alterway.fr>
refactor stable version
Signed-off-by: Valentin Daviot <valentin.daviot@alterway.fr>