New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Metrics API > Host Nodes and Grids #1929
Conversation
@@ -0,0 +1,5 @@ | |||
class CreateHostNodeStatsCreatedAtIndex < Mongodb::Migration | |||
def self.up | |||
HostNodeStat.create_indexes |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this for #1908?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Kind of... I added the created_at index to the model definition for #1908, but it wasn't necessary at that point as we weren't using it yet. I still need to verify if that index is enough for the aggregations being done in this PR, or if it needs to be modified (ie making it a compound index).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Migration number 19 is already used in #1874, so prepare to change (depending on which one got merged first)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe migration numbering should be changed to something like "201703081345_foofoo.rb"
cafa75c
to
96060e9
Compare
Updating node stats API to better handle from/to parameters when only one is specified.
96060e9
to
2058f31
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good job. I added couple of comments regarding Ruby coding style. I would say that let's forget average calculation at this point. It would make things a little bit simpler now when we don't actually use those. My suggestion to response json is:
{
"stats": [
{
"timestamp":"2017-03-07T17:00:00.000+00:00",
"data_points":1,
"cpu_usage_percent":3.11,
"memory_used_bytes":1384435712.0,
"memory_used_percent":0.66,
"filesystem_used_bytes":36423213056.0,
"filesystem_total_bytes":67371577344.0,
"filesystem_used_percent":0.54,
"network_in_bytes_per_second":27.0,
"network_out_bytes_per_second":16.0
},
...
]
}
Then it would be more aligned to other responses. I would use stats
as root level key since that is what user is requesting.
total / stats.size.to_f | ||
end | ||
|
||
return { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No explicit return
needed here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I know... I actually only used it because it looked so ugly to me starting the line with just the open parens
} | ||
end | ||
|
||
def get_network_interface() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No parentheses needed if no params
@@ -0,0 +1,5 @@ | |||
class CreateHostNodeStatsCreatedAtIndex < Mongodb::Migration | |||
def self.up | |||
HostNodeStat.create_indexes |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Migration number 19 is already used in #1874, so prepare to change (depending on which one got merged first)
server/app/routes/v1/nodes_api.rb
Outdated
|
||
# GET /v1/nodes/:grid/:node/stats | ||
r.on 'stats' do | ||
default_seconds = 60 * 60 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can use also 1.hour.to_i
which is more human readable (ActiveSupport provides that functionality). Actually you don't even need to cast it to integer to - 1.hour
should work.
end | ||
end | ||
|
||
averages = all.transpose.map do |stats| |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If I understood this correctly, these would be total percentages over all CPUs?
total_system_percentage, total_user_percentage, total_idle_percentage = all.transpose.map do ...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
BTW re summing up CPU utilization %, there's two schools of thought.. This kind of averaging is like Windows, where two busy-looping processes on a four-core machine will be 50% usage. Generally, in Linux (like what top
etc do), two busy-looping processes/threads on an N-core machine will show as 200% usage.
I personally prefer the Linux style... You don't need to know how many CPUs your machine has to see if something is bottlenecking on a single CPU .
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Haha maybe too many years of Windows has corrupted my mind. :)
I find it confusing because then 25% means something very different on my 8 core laptop than it does on a 1 core micro cloud VM. Maybe we should return # of cores back to the cloud UI?
@nevalla what do you think?
# @param [Array<Vmstat::Cpu>] current_cpu_stats | ||
# @return [Hash] | ||
def calculate_average_cpu(prev_cpu_stats, current_cpu_stats) | ||
all = prev_cpu_stats.zip(current_cpu_stats).map do |prev, current| |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this needs some comments to understand :) I guess the all
will be shaped like:
stats_per_cpu = [
[cpu0_system_percentage, cpu0_user_percentage, cpu0_idle_percentage],
...
[cpuN_system_percentage, cpuN_user_percentage, cpuN_idle_percentage],
]
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And then the all.transpose
will look like
cpus_per_stat = [
[cpu0_system_percentage, ..., cpuN_system_percentage],
[cpu0_user_percentage, ..., cpuN_user_percentage],
[cpu0_idle_percentage, ..., cpuN_idle_percentage],
]
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes thats correct. I actually found someone online doing something similar w/ VmStat data, that's how I came to this. Pretty cool once you wrap your head around it.
👍 for the CPU averages, the agent is the correct place to calculate those |
} | ||
end | ||
|
||
def get_network_interface() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
NodeInfoWorker
is getting too big and has too many responsibilities.. we should split it into multiple actors (maybe separate PR?)
… dependencies (#1950) * agent server: fix ubuntu xenial to also support docker-ce | docker-ee * fix ubuntu trusty packages to support docker-engine, docker-ce, docker-ee * bump docker-engine dep to 1.10
* Hide bintray credentials from the `rake` sh output by reading them from a `~/.netrc` file * Change the bintray upload to also publish the packages * Update the revoked bintray credentials * Fixed the build scripts to use `set -ue` to check that all secrets are set correctly in travis
Updating node stats API to better handle from/to parameters when only one is specified.
…/kontena into feature/metrics_hostnode
Some refactoring of node stats api classes.
Branch updated with fixes requested from code review. Also adding stats API endpoint for grids (aggregate of node stats for all nodes on a grid). Removed WIP status. |
Closing this WIP, going to open a new PR |
API endpoints to expose metrics for nodes and grids.