Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Metrics API > Host Nodes and Grids #1929

Closed
wants to merge 32 commits into from
Closed

Conversation

daniel-bytes
Copy link
Contributor

@daniel-bytes daniel-bytes commented Mar 3, 2017

API endpoints to expose metrics for nodes and grids.

@@ -0,0 +1,5 @@
class CreateHostNodeStatsCreatedAtIndex < Mongodb::Migration
def self.up
HostNodeStat.create_indexes
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this for #1908?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Kind of... I added the created_at index to the model definition for #1908, but it wasn't necessary at that point as we weren't using it yet. I still need to verify if that index is enough for the aggregations being done in this PR, or if it needs to be modified (ie making it a compound index).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Migration number 19 is already used in #1874, so prepare to change (depending on which one got merged first)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe migration numbering should be changed to something like "201703081345_foofoo.rb"

@daniel-bytes daniel-bytes force-pushed the feature/metrics_hostnode branch 2 times, most recently from cafa75c to 96060e9 Compare March 7, 2017 15:48
@jakolehm jakolehm requested a review from nevalla March 7, 2017 16:42
@jakolehm jakolehm added this to the 1.2.0 milestone Mar 7, 2017
Copy link
Contributor

@nevalla nevalla left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good job. I added couple of comments regarding Ruby coding style. I would say that let's forget average calculation at this point. It would make things a little bit simpler now when we don't actually use those. My suggestion to response json is:

{
   "stats": [
       {  
         "timestamp":"2017-03-07T17:00:00.000+00:00",
         "data_points":1,
         "cpu_usage_percent":3.11,
         "memory_used_bytes":1384435712.0,
         "memory_used_percent":0.66,
         "filesystem_used_bytes":36423213056.0,
         "filesystem_total_bytes":67371577344.0,
         "filesystem_used_percent":0.54,
         "network_in_bytes_per_second":27.0,
         "network_out_bytes_per_second":16.0
      },
      ...
   ]
}

Then it would be more aligned to other responses. I would use stats as root level key since that is what user is requesting.

total / stats.size.to_f
end

return {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No explicit return needed here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know... I actually only used it because it looked so ugly to me starting the line with just the open parens

}
end

def get_network_interface()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No parentheses needed if no params

@@ -0,0 +1,5 @@
class CreateHostNodeStatsCreatedAtIndex < Mongodb::Migration
def self.up
HostNodeStat.create_indexes
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Migration number 19 is already used in #1874, so prepare to change (depending on which one got merged first)


# GET /v1/nodes/:grid/:node/stats
r.on 'stats' do
default_seconds = 60 * 60
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can use also 1.hour.to_i which is more human readable (ActiveSupport provides that functionality). Actually you don't even need to cast it to integer to - 1.hour should work.

end
end

averages = all.transpose.map do |stats|
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I understood this correctly, these would be total percentages over all CPUs?

total_system_percentage, total_user_percentage, total_idle_percentage = all.transpose.map do ...

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

BTW re summing up CPU utilization %, there's two schools of thought.. This kind of averaging is like Windows, where two busy-looping processes on a four-core machine will be 50% usage. Generally, in Linux (like what top etc do), two busy-looping processes/threads on an N-core machine will show as 200% usage.

I personally prefer the Linux style... You don't need to know how many CPUs your machine has to see if something is bottlenecking on a single CPU .

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Haha maybe too many years of Windows has corrupted my mind. :)

I find it confusing because then 25% means something very different on my 8 core laptop than it does on a 1 core micro cloud VM. Maybe we should return # of cores back to the cloud UI?

@nevalla what do you think?

# @param [Array<Vmstat::Cpu>] current_cpu_stats
# @return [Hash]
def calculate_average_cpu(prev_cpu_stats, current_cpu_stats)
all = prev_cpu_stats.zip(current_cpu_stats).map do |prev, current|
Copy link
Contributor

@SpComb SpComb Mar 8, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this needs some comments to understand :) I guess the all will be shaped like:

stats_per_cpu = [
  [cpu0_system_percentage, cpu0_user_percentage, cpu0_idle_percentage],
  ...
  [cpuN_system_percentage, cpuN_user_percentage, cpuN_idle_percentage],
]

Copy link
Contributor

@SpComb SpComb Mar 8, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And then the all.transposewill look like

cpus_per_stat = [
  [cpu0_system_percentage, ..., cpuN_system_percentage],
  [cpu0_user_percentage, ..., cpuN_user_percentage],
  [cpu0_idle_percentage, ..., cpuN_idle_percentage],
]

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes thats correct. I actually found someone online doing something similar w/ VmStat data, that's how I came to this. Pretty cool once you wrap your head around it.

@SpComb
Copy link
Contributor

SpComb commented Mar 8, 2017

👍 for the CPU averages, the agent is the correct place to calculate those

}
end

def get_network_interface()
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@SpComb @nevalla : Any thoughts on this algorithm for getting the network interface to monitor? I'm just grabbing the ethernet adapter with the most outgoing traffic. Wasn't sure if I should instead get a sum of all ethernet (or non-loopback) adapters I/O.

Copy link
Contributor

@jakolehm jakolehm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

NodeInfoWorker is getting too big and has too many responsibilities.. we should split it into multiple actors (maybe separate PR?)

… dependencies (#1950)

* agent server: fix ubuntu xenial to also support docker-ce | docker-ee

* fix ubuntu trusty packages to support docker-engine, docker-ce, docker-ee

* bump docker-engine dep to 1.10
@daniel-bytes daniel-bytes changed the title [WIP] Metrics API > Host Node Metrics API > Host Nodes and Grids Mar 9, 2017
@daniel-bytes
Copy link
Contributor Author

Branch updated with fixes requested from code review. Also adding stats API endpoint for grids (aggregate of node stats for all nodes on a grid). Removed WIP status.

@daniel-bytes
Copy link
Contributor Author

Closing this WIP, going to open a new PR

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants