HostNode statistics are missing timestamps #1908
Conversation
…t call. Adding MessageHandler#on_node_stat unit test.
Adding support for Mongoid timestamps when saving container stats. Refactoring on_container_stat to use a configurable batch size
This will probably conflict with #1855 (just a note, it can be resolved after pr is merged).
@@ -188,13 +194,17 @@ def on_node_stat(grid, data)
node = grid.host_nodes.find_by(node_id: data['id'])
return unless node
data = fixnums_to_float(data)
time = data['time'] ? Time.parse(data['time']) : Time.now.utc
We don't actually send time from the agent... should we? /cc @nevalla
Agreed, it looks like both the agent host and container stats are missing any time field:
- https://github.com/kontena/kontena/blob/fix/hostnode_statistics_missing_timestamp/agent/lib/kontena/workers/node_info_worker.rb#L126
- https://github.com/kontena/kontena/blob/fix/hostnode_statistics_missing_timestamp/agent/lib/kontena/workers/stats_worker.rb#L109
The agent events may be queued up for long periods of time if there's e.g. websocket connection issues.
IMO either also fix the agent side as part of this PR, or file a separate issue for that.
I agree, it would be more reliable if the agent sends the time. This could also be a separate PR.
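A minimal sketch of the agent-side fix discussed above (the method name and payload shape here are illustrative, not the actual agent code): stamping each stats event when it is generated means a delayed websocket delivery no longer skews the recorded time.

```ruby
require 'time'

# Hypothetical helper: attach a UTC ISO 8601 timestamp to a stats event
# at generation time, before it is queued for websocket delivery.
def stamp_event(event)
  event.merge('time' => Time.now.utc.iso8601)
end

stamp_event({ 'id' => 'node-1', 'load' => { '1m' => 0.5 } })
```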
Had to fix several merge conflicts as a result of refactoring done in PR #1855.
…imestamps passed in
@jakolehm @nevalla @SpComb I have a question: Since these data points are immutable facts, would it make more sense to only store the created_at timestamp in the database? It looks like mongoid provides this. Considering we will probably store a lot of these statistics, it could end up saving a decent amount of duplicated data. Proposal:
@daniel-bytes yes, that makes sense.
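The proposal above could look roughly like this (model definition sketch only, class name illustrative): Mongoid's Timestamps::Created concern adds just a created_at field, avoiding a redundant updated_at on documents that are never modified after insert.

```ruby
# Sketch, assuming a Mongoid stats model: include only the Created
# timestamp concern so each immutable stat document gets created_at alone.
class HostNodeStat
  include Mongoid::Document
  include Mongoid::Timestamps::Created
end
```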
Updated PR to handle timestamp values being sent from the agent. Added unit tests on agent side.
describe '#publish_node_stats' do
it 'sends stats via rpc' do
expect(rpc_client).to receive(:notification).once do |key, msg|
You should be able to compose rspec matchers to do this, without having to use a block:
expect(rpc_client).to receive(:notification).once.with('/nodes/stats', [Hash])
Or even better for this spec:
expect(rpc_client).to receive(:notification).once.with('/nodes/stats', [hash_including(time: String)])
Awesome, let me take a look. I was having issues with hash_including, but I was probably doing it wrong.
@@ -107,23 +107,148 @@
#puts bm
total_time = Time.now.to_f - start_time
expect(grid.container_logs.count).to eq(1_000)
expect(total_time < 0.5).to be_truthy
expect(total_time).to be < 0.5
👍 for much more useful rspec errors with this kind of expectation
end
end
describe '#on_container_health' do
it 'updates health status' do
describe '#save' do
Did this group of #save specs get duplicated during the merge?
Yes, good call. I think there is some duplication going on in this spec file. The merge got pretty nasty...
@@ -7,6 +7,11 @@ class NodeHandler

def initialize(grid)
@grid = grid
@db_session = ContainerLog.collection.session.with(
The ContainerLog looks a bit odd here, but I suppose it doesn't really make any difference. Maybe use HostNode instead, though?
Yes, that makes sense. Let me fix this.
Fixed duplicated unit test after bad merge.
Has this been tested to verify that the agent timestamps in the stats entries make it all the way to the server MongoDB intact?
E2E tests on my local environment looked good.
Fixes #1905
In MessageHandler#on_container_stat and MessageHandler#on_node_stat, Mongoid timestamp values are now set manually, either by parsing the incoming data['time'] value or by defaulting to Time.now.utc.
Refactored MessageHandler#on_container_stat buffering to use a configurable buffer size.
Additional unit tests added to verify timestamp fields, buffer size refactor code, and on_container_stat calls (previously on_container_stat did not have any unit tests).
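The timestamp fallback described above can be sketched as a small pure-Ruby helper (the method name is illustrative; the actual handler sets this inline): prefer the timestamp the agent sent, and fall back to the server clock when it is absent.

```ruby
require 'time'

# Illustrative helper: parse the agent-supplied 'time' field when present,
# otherwise default to the server's current UTC time.
def resolve_stat_time(data)
  data['time'] ? Time.parse(data['time']) : Time.now.utc
end

resolve_stat_time({ 'time' => '2017-02-01T12:00:00Z' })  # agent-supplied time
resolve_stat_time({})                                    # server-side fallback
```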