Negative cpu rate with measures aggregation #1044

berndbausch · 2019-08-02T14:51:06Z

I am running two CPU-intensive instances on OpenStack Stein. The CPU rate measures of the individual instances look correct, but the CPU rate measures of the aggregation of the two instances are often negative. To my mind this is impossible: CPU time is a cumulative measure, so its rate can never be negative.
Even when they are not negative, the rate CPU measures don't seem to be correlated with the non-rate CPU measures.
I am sure I am missing something.

Which version of Gnocchi are you using

4.3.1.dev38
Installed by Stein Devstack.

How to reproduce your problem

On a stable/stein Devstack, I configure this default archive policy in gnocchi_resources.yaml:

  - name: ceilometer-medium-rate
    aggregation_methods:
      - mean
      - rate:mean
    back_window: 0
    definition:
      - granularity: 1 minute
        timespan: 7 days
      - granularity: 1 hour
        timespan: 365 days

Ceilometer adds this policy to Gnocchi, as expected:

$ gnocchi archive-policy list
| ceilometer-medium-rate |           0 | - points: 10080, granularity: 0:01:00, timespan: 7 days, 0:00:00      | rate:mean, mean                 |

and gnocchi metric list confirms that all metrics use the ceilometer-medium-rate policy.

I run two CPU-intensive instances:

openstack server create --property metering.server_group=myapp ... cpu-user1
openstack server create --property metering.server_group=myapp ... cpu-user2

What is the result that you get

$ gnocchi measures aggregation --query server_group=myapp --resource-type instance --aggregation mean --metric cpu
+---------------------------+-------------+---------------+
| timestamp                 | granularity |         value |
+---------------------------+-------------+---------------+
| 2019-08-02T15:13:00+09:00 |        60.0 | 19995000000.0 |
| 2019-08-02T15:14:00+09:00 |        60.0 | 46495000000.0 |
| 2019-08-02T15:15:00+09:00 |        60.0 | 62710000000.0 |
| 2019-08-02T15:16:00+09:00 |        60.0 | 87570000000.0 |
| 2019-08-02T15:17:00+09:00 |        60.0 |   1.16445e+11 |
| 2019-08-02T15:18:00+09:00 |        60.0 |   1.40075e+11 |
| 2019-08-02T15:19:00+09:00 |        60.0 |    1.4856e+11 |

$ gnocchi measures aggregation --query server_group=myapp --resource-type instance --aggregation rate:mean --metric cpu
+---------------------------+-------------+----------------+
| timestamp                 | granularity |          value |
+---------------------------+-------------+----------------+
| 2019-08-02T15:15:00+09:00 |        60.0 | -10285000000.0 |
| 2019-08-02T15:16:00+09:00 |        60.0 |   8645000000.0 |
| 2019-08-02T15:17:00+09:00 |        60.0 |   4015000000.0 |
| 2019-08-02T15:18:00+09:00 |        60.0 |  -5245000000.0 |
| 2019-08-02T15:19:00+09:00 |        60.0 | -15145000000.0 |

What is the result that you expected

Positive rate values. Also, the difference between the 15:16 and 15:15 CPU measures is 24860000000, but the rate reported for 15:16 is 8645000000.
Perhaps I misunderstand the meaning of "rate" in this case. What I want is the CPU utilization of an instance, no matter if measured in percent or in nanoseconds.

chungg · 2019-08-02T20:45:28Z

gnocchi measures aggregation is actually the deprecated command i believe. there is a gnocchi aggregates command which has a DSL of sorts and allows you to specify more complex queries.

that said, i believe what is happening is this: because you don't specify the --reaggregation field, it ends up computing the rate of rates, or the acceleration of the timeseries. if you add --reaggregation mean it should give you what you expect. if you use the aggregates command you can divide by the granularity (in nanoseconds) to get a percentage.
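A minimal Python sketch of the "rate of rates" effect described above. It is a simplified model, not Gnocchi's code: the sample values and the helper names `rate` and `mean_across` are made up for illustration, and it assumes that when --reaggregation is omitted the rate is applied a second time on top of the stored rate:mean series.

```python
def rate(series):
    """Difference between consecutive points, as a rate:* aggregate would store."""
    return [b - a for a, b in zip(series, series[1:])]


def mean_across(*series):
    """Point-wise mean of several aligned series (reaggregation with mean)."""
    return [sum(points) / len(points) for points in zip(*series)]


# Hypothetical cumulative CPU time of two instances, one point per minute.
cpu_user1 = [0, 50, 90, 150, 180]
cpu_user2 = [0, 40, 100, 120, 170]

# Per-instance rates of a cumulative counter are never negative.
rate_user1 = rate(cpu_user1)  # [50, 40, 60, 30]
rate_user2 = rate(cpu_user2)  # [40, 60, 20, 50]

# With --reaggregation mean: average the per-instance rates.
expected = mean_across(rate_user1, rate_user2)

# Without it (assumed behaviour): the rate is taken again over the rates.
rate_of_rates = rate(expected)

print(expected)       # [45.0, 50.0, 40.0, 40.0]
print(rate_of_rates)  # [5.0, -10.0, 0.0]
```

The second result goes negative whenever the average CPU consumption drops from one minute to the next, even though every underlying counter only grows.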

berndbausch · 2019-08-03T01:05:28Z

Many thanks Gord. Wow your answer was fast. Yes, by adding --reaggregation mean to my command, I get the expected figures. So for now my problem is solved, though I don't pretend to understand why reaggregation is needed, what reaggregation is in the first place, and what I am doing there. I have to study a bit more.

Just in case you are interested in helping a total newcomer to the world of measurement and statistics:
Is there a "time series for dummies" book somewhere, or anything where the terms "aggregation" and "reaggregation" are defined for dummies? An intro how to use Gnocchi in the context of OpenStack?

Also, it would be great if the Gnocchi client documentation mentioned which commands are deprecated. I would not mind helping out with this kind of work, if I knew what precisely is in fact deprecated:)

Thanks again, you removed a roadblock.

berndbausch · 2019-08-03T02:05:35Z

I closed this issue because I have a solution, but here is an improvement.

This non-deprecated command provides the same result as the deprecated gnocchi measures aggregation:

gnocchi aggregates --resource-type instance   \
                   "(aggregate rate:mean (metric cpu mean))"    \
                   "server_group=myapp"

This is great, since I can now apply simple arithmetic to turn the nanosecond results into percentages:

gnocchi aggregates --resource-type instance  \
                   "(* ( / (aggregate rate:mean (metric cpu mean)) 60000000000.0) 100)"  \
                   "server_group=myapp"
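The arithmetic in that expression, sketched in Python for clarity. The function name `cpu_percent` and the sample values are illustrative assumptions; the only thing taken from the thread is the formula itself: divide the CPU nanoseconds consumed in one period by the length of the period in nanoseconds (60000000000 for a 1 minute granularity), then multiply by 100.

```python
NS_PER_SECOND = 1_000_000_000


def cpu_percent(rate_ns, granularity_s):
    """Convert CPU nanoseconds consumed during one period into a percentage."""
    return rate_ns / (granularity_s * NS_PER_SECOND) * 100


# 30 s of CPU time consumed during a 60 s period is 50% of one CPU.
print(cpu_percent(30_000_000_000, 60))  # 50.0
```

The divisor has to match the granularity of the series being queried, so the 1 hour series of this policy would need 3600 seconds instead of 60.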

chungg · 2019-08-03T16:16:47Z

i would welcome changes to the docs so feel free to contribute to https://github.com/gnocchixyz/gnocchi/tree/master/doc/source or https://github.com/gnocchixyz/python-gnocchiclient/tree/master/doc/source. unfortunately, it seems the publishing of the docs is not working, so even the docs online do not reflect what is in the repository :(

yes, i can see how it is ambiguous. it is because gnocchi stores aggregates as its base and not raw datapoints. in your archive policy you have rate:mean and mean at granularity: 0:01:00, which means gnocchi is actually storing two timeseries, one for each. so when you make your query, you first need to specify the aggregate you told gnocchi to store. then, because you are dynamically aggregating on a metric across multiple resources, you specify how to combine them, and that is the reaggregation.

so in the example:

gnocchi aggregates --resource-type instance   \
                   "(aggregate rate:mean (metric cpu mean))"    \
                   "server_group=myapp"

you are selecting all the stored cpu mean metrics for instances with server_group=myapp (which returns many series) and then you're telling gnocchi to aggregate those series into one by computing the mean rate across them.

(aggregate mean (metric cpu rate:mean)) would also work for you (and would be more accurate). it gets the rate:mean metrics and computes the mean across them to return one timeseries.

as a last example, (aggregate mean (metric cpu max)) would not work for you, because cpu max is not an aggregate stored according to your policy. alternatively, (aggregate max (metric cpu mean)) will work and will return the max of all your cpu mean metrics
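The distinction between the stored aggregate and the reaggregation can be modelled in a few lines of Python. This is only a toy model under stated assumptions: the `STORED` dictionary, the sample values and the functions `metric` and `aggregate` are hypothetical stand-ins that mimic the shape of the expressions above, not Gnocchi's implementation. Only the two simple cases (mean and max across resources) are modelled.

```python
# Hypothetical stand-in for what Gnocchi stores: one series per
# (resource, metric, aggregation method). Only "mean" and "rate:mean"
# exist, mirroring the ceilometer-medium-rate policy. Values are made up.
STORED = {
    ("cpu-user1", "cpu", "mean"): [50, 90, 150],
    ("cpu-user1", "cpu", "rate:mean"): [50, 40, 60],
    ("cpu-user2", "cpu", "mean"): [40, 100, 120],
    ("cpu-user2", "cpu", "rate:mean"): [40, 60, 20],
}
RESOURCES = ["cpu-user1", "cpu-user2"]


def metric(name, aggregation):
    """Model of (metric <name> <aggregation>): one stored series per resource."""
    try:
        return [STORED[(res, name, aggregation)] for res in RESOURCES]
    except KeyError:
        raise ValueError(f"{aggregation!r} is not stored by the archive policy") from None


def aggregate(method, series_list):
    """Model of (aggregate <method> ...): combine aligned series point by point."""
    combine = {"mean": lambda pts: sum(pts) / len(pts), "max": max}[method]
    return [combine(points) for points in zip(*series_list)]


print(aggregate("mean", metric("cpu", "rate:mean")))  # [45.0, 50.0, 40.0]
print(aggregate("max", metric("cpu", "mean")))        # [50, 100, 150]
```

In this model, `metric("cpu", "max")` raises an error because no max series was ever stored, which corresponds to the (aggregate mean (metric cpu max)) case that would not work.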

berndbausch · 2019-08-04T01:22:31Z

Whenever I leave a comment here I learn something new. Another lightbulb moment. Thanks Gord!

giorgiove · 2020-01-29T14:51:35Z

Hi, just a quick one if I may.
How do you create an aodh alarm that tracks CPU util based on the output of this command?

gnocchi aggregates --resource-type instance  \
                   "(* ( / (aggregate rate:mean (metric cpu mean)) 60000000000.0) 100)"  \
                   "server_group=myapp"

I'm sure I'm missing something, but the documentation is rather concise, so to speak.
Thanks

chungg · 2020-01-29T19:35:34Z

disclaimer: i don't remember much about aodh and didn't know much to begin with but you may want to look at gnocchi_aggregation_by_resources_threshold alarm type.

that said, you'll probably get better feedback from the openstack community... if not, that's probably a good sign to find an alternative to aodh.

ohryhorov · 2020-04-28T05:44:12Z

> Hi, just a quick one if I may.
> How do you create an aodh alarm that tracks CPU util based on the output of this command?
>
> gnocchi aggregates --resource-type instance  \
>                    "(* ( / (aggregate rate:mean (metric cpu mean)) 60000000000.0) 100)"  \
>                    "server_group=myapp"
>
> I'm sure I'm missing something, but the documentation is rather concise, so to speak.
> Thanks

Hello,
Have you managed to define an aodh alarm based on a calculated metric?
Of course, one could define a threshold in nanoseconds on the cpu metric, but that is definitely not convenient.

zhenjiangma · 2020-11-25T07:41:32Z

Hey, I just ran into a problem.
I want to get cpu_util using the statements above, but my command doesn't work.
This command works well:

openstack metric aggregates '(metric cpu rate:mean)' id=e319d4e6-67fb-4398-be0f-3c6790b50eec

However, this one fails:

openstack metric aggregates '(* (/ (aggregate rate:mean (metric cpu mean)) 30000000000) 100)' id=e319d4e6-67fb-4398-be0f-3c6790b50eec

It reports "Invalid input: '*' operation invalid for dictionary value @ data[u'operations'] (HTTP 400)", and I don't know why. Could you help me?

unlenen · 2021-01-26T11:27:47Z

Hi,
I have the same problem. As @berndbausch explained, I can compute the CPU usage of an instance, but I could not find a way to feed this aggregation into aodh. I need an alarm when instance CPU is higher than 90%, for auto scaling. Could you help me with this?

My Test:

Server : 8 cores, 8 GB RAM
Test command : stress-ng --cpu 8 --cpu-load 100
Granularity : 300
Calculation : gnocchi aggregates '(* (/ (aggregate rate:mean (metric cpu mean)) 300000000000) 100)' id=<server_id>   (the divisor is granularity * 1000000000)
Response : 800, which means every CPU runs at 100%
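The 800 in this test can be reproduced, and normalised per core, with a short Python sketch. The function name `cpu_util_percent` and its `vcpus` parameter are assumptions introduced for illustration; the thread itself only gives the unnormalised formula.

```python
NS_PER_SECOND = 1_000_000_000


def cpu_util_percent(rate_ns, granularity_s, vcpus=1):
    """Percentage of available CPU time used during one period.

    With vcpus=1 this is the formula from the thread; dividing by the
    number of vCPUs rescales the result to a 0-100 range.
    """
    total = rate_ns / (granularity_s * NS_PER_SECOND) * 100
    return total / vcpus


# 8 vCPUs fully busy for a 300 s period consume 8 * 300 s of CPU time.
busy_ns = 8 * 300 * NS_PER_SECOND

print(cpu_util_percent(busy_ns, 300))           # 800.0
print(cpu_util_percent(busy_ns, 300, vcpus=8))  # 100.0
```

An alarm threshold of 90% per core on this server would therefore correspond to 720 in the unnormalised output.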

Thanks @berndbausch, this helped me understand what gnocchi is 🥇

zhenjiangma · 2021-01-26T11:35:06Z

Sorry, I don't use aodh, so I don't know how to do it. Maybe you can look it up in the aodh docs.

unlenen · 2021-01-26T11:40:31Z

So is it possible to create a measure or metric from an aggregate, so that we can extend its usage?

GizemElove · 2021-01-26T13:21:58Z

Hi, I have a similar issue as well.
I tried the query below to retrieve CPU utilization, and it worked:

gnocchi aggregates '(* (/ (aggregate rate:mean (metric cpu mean)) 300000000000) 100)' id=...

However, I'm using OpenStack Tacker and I need to trigger automatic scaling of my VNF group when CPU utilization is greater than 80%. I tried to create an alarm with aodh, but I cannot use the query in this form.
Any help with this would be appreciated.

paramite · 2021-06-30T13:53:54Z

Unfortunately, looking at the Aodh code [1], there is no Gnocchi-based alarm evaluator that calls self._gnocchi_client.aggregates ([2][3]). There are only alarm types that call self._gnocchi_client.metric.aggregation, which does not support "operations". We would need to implement a new Aodh alarm type for aggregates.

[1] https://github.com/openstack/aodh/blob/stable/train/aodh/evaluator/gnocchi.py
[2] https://github.com/gnocchixyz/python-gnocchiclient/blob/master/gnocchiclient/v1/aggregates.py
[3] https://github.com/gnocchixyz/python-gnocchiclient/blob/master/gnocchiclient/v1/aggregates_cli.py#L49

manuvakery1 · 2022-01-28T09:35:56Z

> Hi, I have the same problem. As @berndbausch explained, I can compute the CPU usage of an instance, but I could not find a way to feed this aggregation into aodh. I need an alarm when instance CPU is higher than 90%, for auto scaling. Could you help me with this?
>
> My Test:
>
> Server : 8 cores, 8 GB RAM
> Test command : stress-ng --cpu 8 --cpu-load 100
> Granularity : 300
> Calculation : gnocchi aggregates '(* (/ (aggregate rate:mean (metric cpu mean)) 300000000000) 100)' id=<server_id>   (the divisor is granularity * 1000000000)
> Response : 800, which means every CPU runs at 100%
>
> Thanks @berndbausch, this helped me understand what gnocchi is 🥇

@unlenen have you managed to do this?

unlenen · 2022-01-28T14:08:14Z

Check this code patch in aodh. You need to restart aodh-evaluator after applying the code changes.

https://review.opendev.org/c/openstack/aodh/+/786880

Edit: You may need to upgrade gnocchi-client on the host where aodh is installed. Be sure that the gnocchi-client version is higher than 7.0.6. I also want to mention that this code only helps when your ceilometer notification interval is the same as the heat template interval.

manuvakery1 · 2022-02-14T12:15:26Z

> Check this code patch in aodh. You need to restart aodh-evaluator after applying the code changes.
>
> https://review.opendev.org/c/openstack/aodh/+/786880
>
> Edit: You may need to upgrade gnocchi-client on the host where aodh is installed. Be sure that the gnocchi-client version is higher than 7.0.6. I also want to mention that this code only helps when your ceilometer notification interval is the same as the heat template interval.

@unlenen ok .. thanks .. I will try this

tobias-urdin · 2022-03-12T12:38:46Z

Aodh will get Dynamic Aggregates API support with [1], although there is a small issue in Gnocchi [2] (hopefully fixed soon).

[1] https://review.opendev.org/c/openstack/aodh/+/829870
[2] #1202

manuvakery1 · 2022-03-24T07:03:01Z

@tobias-urdin can you please provide a sample alarm using the dynamic aggregate?

berndbausch closed this as completed Aug 3, 2019

ghost mentioned this issue Nov 2, 2020

gnocchi CPU time is supposed accumulated nano time, but when VM stopped, the accumulation seems starting from 0 again, which end up cpu_util appear native by '(* ( / (aggregate rate:mean (metric cpu mean)) 60000000000.0) 100)' #1079

Closed
