[FEATURE] AutoRange #38706

Open: wants to merge 12 commits into master

vdaviot commented Feb 11, 2019

From engine
This pull request is related to:

  • cli: for the docker-compose.yml parsing and the $ docker container stats format
  • swarmkit: for the API update

What

AutoRange is a feature that helps find and apply the optimal resource limits for a service. It is an update to the Docker collector and requires swarm mode to be enabled.

Why

This collector extension was designed to monitor a service and predict its optimal limits. The goal was to find the point where the service still functions properly while saving as many resources as possible. It was written to answer the question:
How can we optimize the number of services running on our infrastructure without losing quality of service?

How

The logic behind the feature can be described in 3 points:
- First, we collect the metrics and apply some transformations to them to generate two values. Those values represent a "box" around the actual consumption.
- Then, we turn these values into time series, using some of the key data collected previously to weight our operations. The amplitude of change between values is monitored to decide when to stop measuring.
- Finally, we obtain refined values that we apply as limits to the service. The data are then kept in a reduced form to limit memory usage.

Usage

The functionality is declared by adding the autorange key to the docker-compose.yml.
The mechanism is available for cpu% and memory, with or without base values.
Below is an example of both.

services:
  myservice:
    autorange:
      memory:
      cpu%:

The available keys are:
- min (in bytes for memory, in percent for cpu%)
- max (in bytes for memory, in percent for cpu%)
- threshold% (memory only; a safety margin that is refined by the algorithm)

services:
  myservice:
    autorange:
      memory:
          min: "110000"
          max: "120000"
          threshold%: "10"
      cpu%:
          min: "60"
          max: "70"
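
Because the compose values are quoted strings, they have to be converted and checked before use. The sketch below assumes a hypothetical parseLimit helper and Limit type (not part of the cli or swarmkit APIs); the checks it applies (min not greater than max, threshold% accepted only for memory and kept below 100) are illustrative assumptions rather than rules confirmed by this pull request.

```go
package main

import (
	"fmt"
	"strconv"
)

// Limit is a hypothetical parsed form of one autorange entry.
// Min and Max are bytes for memory and percent for cpu%.
type Limit struct {
	Min, Max       float64
	HasMin, HasMax bool
	ThresholdPct   float64
}

// parseLimit converts the quoted values from the compose file into numbers
// and applies a few sanity checks (assumed, not taken from the PR).
func parseLimit(resource string, raw map[string]string) (Limit, error) {
	var l Limit
	for key, s := range raw {
		v, err := strconv.ParseFloat(s, 64)
		if err != nil {
			return Limit{}, fmt.Errorf("%s.%s: %v", resource, key, err)
		}
		switch key {
		case "min":
			l.Min, l.HasMin = v, true
		case "max":
			l.Max, l.HasMax = v, true
		case "threshold%":
			if resource != "memory" {
				return Limit{}, fmt.Errorf("%s: threshold%% is only supported for memory", resource)
			}
			if v < 0 || v >= 100 {
				return Limit{}, fmt.Errorf("%s: threshold%% must be in [0, 100)", resource)
			}
			l.ThresholdPct = v
		default:
			return Limit{}, fmt.Errorf("%s: unknown key %q", resource, key)
		}
	}
	if l.HasMin && l.HasMax && l.Min > l.Max {
		return Limit{}, fmt.Errorf("%s: min (%g) is greater than max (%g)", resource, l.Min, l.Max)
	}
	return l, nil
}

func main() {
	mem, err := parseLimit("memory", map[string]string{"min": "110000", "max": "120000", "threshold%": "10"})
	fmt.Println(mem, err)
	_, err = parseLimit("cpu%", map[string]string{"min": "70", "max": "60"})
	fmt.Println(err)
}
```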

Deploy the service with $ docker stack deploy --compose-file=/your/compose/file <stack-name>, then run $ docker container stats --format autorange (the --format flag is optional but displays the predicted values). Running $ docker container stats is mandatory because it starts the collector, which must keep running to accumulate data and predict values. If you leave the $ docker container stats screen, the mechanism is paused and the accumulated data are not lost.
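
The pause behavior only requires the accumulated state to live outside the sampling loop. A minimal sketch, assuming a hypothetical Collector type fed by a channel of samples, where closing the channel stands in for leaving the $ docker container stats screen:

```go
package main

import "fmt"

// Collector is a hypothetical accumulator. Its state lives outside the
// sampling loop, so stopping the loop pauses collection without losing data.
type Collector struct {
	count int
	sum   float64
	peak  float64
}

// Run consumes samples until the channel is closed, then returns.
func (c *Collector) Run(samples <-chan float64) {
	for s := range samples {
		c.count++
		c.sum += s
		if s > c.peak {
			c.peak = s
		}
	}
}

// Mean returns the average of every sample seen across all sessions.
func (c *Collector) Mean() float64 {
	if c.count == 0 {
		return 0
	}
	return c.sum / float64(c.count)
}

// feed returns a closed, pre-filled channel simulating one stats session.
func feed(values ...float64) <-chan float64 {
	ch := make(chan float64, len(values))
	for _, v := range values {
		ch <- v
	}
	close(ch)
	return ch
}

func main() {
	c := &Collector{}
	c.Run(feed(100, 120, 110)) // first session, then the stats screen is left
	c.Run(feed(130, 90))       // a later session resumes with the same state
	fmt.Printf("samples=%d mean=%.1f peak=%.1f\n", c.count, c.Mean(), c.peak)
}
```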

Improvements

Things that could be improved:
- Compatibility with docker-compose, by removing the need for swarm labels.
- Avoid having to run $ docker container stats to start the collector, by getting stats directly from the API.
- Implement a trigger that restarts the mechanism when the service's behavior changes, using the old values as a base to further refine the predictions.

I'm open to any suggestions on how to refine the code or the feature to better fit Docker's design.

Cute animal

doggo

valentin.daviot added 2 commits February 8, 2019 17:34
Signed-off-by: Valentin Daviot <valentin.daviot@alterway.fr>
Valentin Daviot added 4 commits February 12, 2019 17:37
…effassign

Signed-off-by: Valentin Daviot <valentin.daviot@alterway.fr>
codecov bot commented Feb 18, 2019

Codecov Report

Merging #38706 into master will decrease coverage by 0.13%.
The diff coverage is 31.03%.

@@            Coverage Diff             @@
##           master   #38706      +/-   ##
==========================================
- Coverage   36.55%   36.42%   -0.14%     
==========================================
  Files         610      613       +3     
  Lines       45395    45869     +474     
==========================================
+ Hits        16596    16706     +110     
- Misses      26507    26865     +358     
- Partials     2292     2298       +6

Signed-off-by: Valentin Daviot <valentin.daviot@alterway.fr>
Valentin Daviot added 5 commits April 3, 2019 11:03
…default and made the whole algorithm fail, fixed the threshold assignation too

Signed-off-by: Valentin Daviot <valentin.daviot@alterway.fr>