Metrics API > Host Nodes and Grids #1929

daniel-bytes · 2017-03-03T18:25:10Z

API endpoints to expose metrics for nodes and grids.

SpComb · 2017-03-06T09:48:10Z

server/db/migrations/19_create_host_node_stats_created_at_index.rb

@@ -0,0 +1,5 @@
+class CreateHostNodeStatsCreatedAtIndex < Mongodb::Migration
+  def self.up
+    HostNodeStat.create_indexes


Is this for #1908?

Kind of... I added the created_at index to the model definition for #1908, but it wasn't necessary at that point as we weren't using it yet. I still need to verify whether that index is enough for the aggregations being done in this PR, or whether it needs to be modified (i.e. made into a compound index).

Migration number 19 is already used in #1874, so prepare to change (depending on which one got merged first)

maybe migration numbering should be changed to something like "201703081345_foofoo.rb"
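
For illustration, a timestamp-prefixed name like the one suggested above could be generated roughly as follows. This is only a sketch: the helper name `migration_filename` is hypothetical and is not part of the Kontena codebase, and the prefix format (UTC, YYYYMMDDHHMM) is inferred from the example name in the comment.

```ruby
# Hypothetical helper: builds a migration filename with a UTC timestamp
# prefix (YYYYMMDDHHMM) so that two branches are unlikely to collide
# on the same sequential number.
def migration_filename(name, time = Time.now)
  "#{time.getutc.strftime('%Y%m%d%H%M')}_#{name}.rb"
end

puts migration_filename('foofoo', Time.utc(2017, 3, 8, 13, 45))
# prints "201703081345_foofoo.rb"
```

Migrations named this way still sort in creation order, which is the property the sequential numbers were providing.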

Updating node stats API to better handle from/to parameters when only one is specified.

nevalla

Good job. I added a couple of comments regarding Ruby coding style. I would say let's forget the average calculation at this point. It would make things a little simpler now, since we don't actually use the averages. My suggestion for the response JSON is:

{
  "stats": [
    {
      "timestamp": "2017-03-07T17:00:00.000+00:00",
      "data_points": 1,
      "cpu_usage_percent": 3.11,
      "memory_used_bytes": 1384435712.0,
      "memory_used_percent": 0.66,
      "filesystem_used_bytes": 36423213056.0,
      "filesystem_total_bytes": 67371577344.0,
      "filesystem_used_percent": 0.54,
      "network_in_bytes_per_second": 27.0,
      "network_out_bytes_per_second": 16.0
    },
    ...
  ]
}

Then it would be more aligned with other responses. I would use stats as the root-level key, since that is what the user is requesting.

nevalla · 2017-03-08T08:58:23Z

agent/lib/kontena/workers/node_info_worker.rb

+        total / stats.size.to_f
+      end
+
+      return {


No explicit return needed here.

I know... I actually only used it because it looked so ugly to me starting the line with just the open parens

nevalla · 2017-03-08T08:59:07Z

agent/lib/kontena/workers/node_info_worker.rb

+      }
+    end
+
+    def get_network_interface()


No parentheses needed if no params

nevalla · 2017-03-08T09:10:51Z

server/db/migrations/19_create_host_node_stats_created_at_index.rb

@@ -0,0 +1,5 @@
+class CreateHostNodeStatsCreatedAtIndex < Mongodb::Migration
+  def self.up
+    HostNodeStat.create_indexes


Migration number 19 is already used in #1874, so prepare to change (depending on which one got merged first)

nevalla · 2017-03-08T09:36:49Z

server/app/routes/v1/nodes_api.rb

+
+          # GET /v1/nodes/:grid/:node/stats
+          r.on 'stats' do
+            default_seconds = 60 * 60


You can also use 1.hour.to_i, which is more human-readable (ActiveSupport provides that functionality). Actually, you don't even need to cast it to an integer; 1.hour should work.

SpComb · 2017-03-08T09:56:58Z

agent/lib/kontena/workers/node_info_worker.rb

+        end
+      end
+
+      averages = all.transpose.map do |stats|


If I understood this correctly, these would be total percentages over all CPUs?

total_system_percentage, total_user_percentage, total_idle_percentage = all.transpose.map do ...

BTW re summing up CPU utilization %, there are two schools of thought. This kind of averaging is like Windows, where two busy-looping processes on a four-core machine will show as 50% usage. Generally, in Linux (like what top etc. do), two busy-looping processes/threads on an N-core machine will show as 200% usage.

I personally prefer the Linux style... You don't need to know how many CPUs your machine has to see if something is bottlenecking on a single CPU.

Haha maybe too many years of Windows has corrupted my mind. :)

I find it confusing because then 25% means something very different on my 8 core laptop than it does on a 1 core micro cloud VM. Maybe we should return # of cores back to the cloud UI?

@nevalla what do you think?
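
To make the two conventions discussed above concrete, here is a small sketch. The per-core numbers are made up for illustration (a four-core machine with two busy-looping threads); nothing here is taken from the PR's code.

```ruby
# Busy percentage per core on a hypothetical four-core machine where
# two cores are fully busy and two are idle.
per_core_busy = [100.0, 100.0, 0.0, 0.0]

# "Windows style": average over all cores, so the maximum is always 100%.
averaged = per_core_busy.sum / per_core_busy.size

# "Linux style" (as shown by top): sum over all cores, so the maximum
# is 100% times the number of cores.
summed = per_core_busy.sum

puts "averaged: #{averaged}% (across #{per_core_busy.size} cores)"
puts "summed:   #{summed}%"
```

Returning the core count alongside either value, as suggested in the thread, would let a client convert between the two.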

SpComb · 2017-03-08T10:00:08Z

agent/lib/kontena/workers/node_info_worker.rb

+    # @param [Array<Vmstat::Cpu>] current_cpu_stats
+    # @return [Hash]
+    def calculate_average_cpu(prev_cpu_stats, current_cpu_stats)
+      all = prev_cpu_stats.zip(current_cpu_stats).map do |prev, current|


I think this needs some comments to understand :) I guess the all will be shaped like:

stats_per_cpu = [
  [cpu0_system_percentage, cpu0_user_percentage, cpu0_idle_percentage],
  ...
  [cpuN_system_percentage, cpuN_user_percentage, cpuN_idle_percentage],
]

And then the all.transpose will look like:

cpus_per_stat = [
  [cpu0_system_percentage, ..., cpuN_system_percentage],
  [cpu0_user_percentage, ..., cpuN_user_percentage],
  [cpu0_idle_percentage, ..., cpuN_idle_percentage],
]

Yes, that's correct. I actually found someone online doing something similar w/ VmStat data, that's how I came to this. Pretty cool once you wrap your head around it.
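
The zip/transpose shape described in this thread can be sketched as below. This is not the PR's implementation: `CpuSample` is a hypothetical stand-in for `Vmstat::Cpu` (assuming only cumulative system, user and idle tick counters matter), and `average_cpu` is an illustrative name.

```ruby
# Hypothetical stand-in for Vmstat::Cpu: cumulative tick counters per core.
CpuSample = Struct.new(:system, :user, :idle)

def average_cpu(prev_stats, current_stats)
  # One row per CPU: [system %, user %, idle %] over the sampling interval.
  stats_per_cpu = prev_stats.zip(current_stats).map do |prev, current|
    deltas = [
      current.system - prev.system,
      current.user - prev.user,
      current.idle - prev.idle
    ]
    total = deltas.sum.to_f
    next [0.0, 0.0, 0.0] if total.zero?
    deltas.map { |ticks| ticks * 100.0 / total }
  end

  # Transpose to one row per stat, then average each row across CPUs.
  system_avg, user_avg, idle_avg = stats_per_cpu.transpose.map do |values|
    values.sum / values.size.to_f
  end

  { system: system_avg, user: user_avg, idle: idle_avg }
end

prev = [CpuSample.new(0, 0, 0), CpuSample.new(0, 0, 0)]
curr = [CpuSample.new(10, 40, 50), CpuSample.new(0, 20, 80)]
result = average_cpu(prev, curr)
puts result.inspect
```

With the made-up samples above, the two cores show 10/40/50 and 0/20/80 percent, so the averages come out to 5% system, 30% user and 65% idle.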

SpComb · 2017-03-08T10:02:27Z

👍 for the CPU averages, the agent is the correct place to calculate those

daniel-bytes · 2017-03-08T14:08:54Z

agent/lib/kontena/workers/node_info_worker.rb

+      }
+    end
+
+    def get_network_interface()


@SpComb @nevalla: Any thoughts on this algorithm for getting the network interface to monitor? I'm just grabbing the ethernet adapter with the most outgoing traffic. I wasn't sure if I should instead sum the I/O of all ethernet (or non-loopback) adapters.
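
The two options raised in this question can be compared with a short sketch. The `Interface` struct and the byte counters are hypothetical illustration data, not the data structures the agent actually uses.

```ruby
# Hypothetical interface record: name, loopback flag, and byte counters.
Interface = Struct.new(:name, :loopback, :in_bytes, :out_bytes)

interfaces = [
  Interface.new('lo',   true,  9_000, 9_000),
  Interface.new('eth0', false, 5_000, 1_200),
  Interface.new('eth1', false,   300, 2_500)
]

candidates = interfaces.reject(&:loopback)

# Option 1: monitor only the interface with the most outgoing traffic.
busiest = candidates.max_by(&:out_bytes)

# Option 2: sum traffic across all non-loopback interfaces.
totals = {
  in_bytes: candidates.sum(&:in_bytes),
  out_bytes: candidates.sum(&:out_bytes)
}

puts "busiest: #{busiest.name} in=#{busiest.in_bytes} out=#{busiest.out_bytes}"
puts "totals:  in=#{totals[:in_bytes]} out=#{totals[:out_bytes]}"
```

In this made-up data, option 1 picks eth1 and so misses most of the incoming traffic, which arrives on eth0; summing avoids that at the cost of hiding which interface is busy.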

jakolehm

NodeInfoWorker is getting too big and has too many responsibilities.. we should split it into multiple actors (maybe separate PR?)

* Hide bintray credentials from the `rake` sh output by reading them from a `~/.netrc` file * Change the bintray upload to also publish the packages * Update the revoked bintray credentials * Fixed the build scripts to use `set -ue` to check that all secrets are set correctly in travis

daniel-bytes · 2017-03-09T22:04:06Z

Branch updated with fixes requested from code review. Also adding stats API endpoint for grids (aggregate of node stats for all nodes on a grid). Removed WIP status.

daniel-bytes · 2017-03-09T22:36:23Z

Closing this WIP, going to open a new PR

daniel-bytes self-assigned this Mar 3, 2017

daniel-bytes added agent enhancement server labels Mar 3, 2017

SpComb reviewed Mar 6, 2017

daniel-bytes force-pushed the feature/metrics_hostnode branch 2 times, most recently from cafa75c to 96060e9 Compare March 7, 2017 15:48

jakolehm requested a review from nevalla March 7, 2017 16:42

jakolehm added this to the 1.2.0 milestone Mar 7, 2017

Daniel Battaglia added 9 commits March 7, 2017 12:55

Adding code to gather host node CPU averages

96f97e5

HostNodeStats metrics aggregation class w/ unit tests.

97f0f40

Adding network stats to host node stats. Adding initial API.

11bbc50

Additional fixes for host node network statistics gathering.

ef9ec96

Adding network info to mongodb collection writes for host_node_stats.

b5aff13

Initial working version of API call

f23b2b6

Refactoring HostNodeStatsMetrics to provide metrics by node or grid.

82ad868

Updating node stats API to better handle from/to parameters when only one is specified.

Updating hostnode metrics spec to include query by grid id

9aa876f

Updating API

2058f31

daniel-bytes force-pushed the feature/metrics_hostnode branch from 96060e9 to 2058f31 Compare March 7, 2017 17:55

Daniel Battaglia added 4 commits March 7, 2017 13:03

Adding timestamps from/to back to metrics results.

5d1188e

Fixing spec issue

9dc72af

Specs added for node stats API.

ea483b8

Rolling back grid level stats.

0beb82e

nevalla suggested changes Mar 8, 2017

SpComb reviewed Mar 8, 2017

daniel-bytes commented Mar 8, 2017

Code review changes.

fc2a4ac

jakolehm reviewed Mar 9, 2017

Fix ubuntu packages to also support docker-{ce,ee}, fix docker-engine…

d1b308b

… dependencies (#1950) * agent server: fix ubuntu xenial to also support docker-ce | docker-ee * fix ubuntu trusty packages to support docker-engine, docker-ce, docker-ee * bump docker-engine dep to 1.10

SpComb and others added 17 commits March 9, 2017 14:32

Adding code to gather host node CPU averages

b43f468

HostNodeStats metrics aggregation class w/ unit tests.

e4cedc1

Adding network stats to host node stats. Adding initial API.

de6742f

Additional fixes for host node network statistics gathering.

d53812d

Adding network info to mongodb collection writes for host_node_stats.

c67007b

Initial working version of API call

4f055ff

Refactoring HostNodeStatsMetrics to provide metrics by node or grid.

83a52e3

Updating node stats API to better handle from/to parameters when only one is specified.

Updating hostnode metrics spec to include query by grid id

a865874

Updating API

c57c1bc

Adding timestamps from/to back to metrics results.

318a5ff

Fixing spec issue

a347505

Specs added for node stats API.

a26a0f5

Rolling back grid level stats.

7dc1453

Code review changes.

bead472

Merge branch 'feature/metrics_hostnode' of https://github.com/kontena…

ba6cda9

…/kontena into feature/metrics_hostnode

Grid stats api.

cbdf5ef

Some refactoring of node stats api classes.

daniel-bytes changed the title ~~[WIP] Metrics API > Host Node~~ Metrics API > Host Nodes and Grids Mar 9, 2017

daniel-bytes closed this Mar 9, 2017
