Add linux conntrack metrics to host metrics receiver #3370

Closed


emilgelman

Description:
Added support for collecting conntrack metrics (Linux only) using the host metrics receiver. The metrics are taken from https://github.com/shirou/gopsutil, which the receiver already uses.

@emilgelman emilgelman requested a review from a team as a code owner June 6, 2021 08:03
@emilgelman emilgelman requested a review from dmitryax June 6, 2021 08:03
@emilgelman
Author

@mx-psi
Reopened in a new PR to fix the EasyCLA issues.
Please review.

Member

@mx-psi mx-psi left a comment

I left some comments :)

receiver/hostmetricsreceiver/README.md (outdated, resolved)
receiver/hostmetricsreceiver/README.md (outdated, resolved)
}

// newNetworkScraper creates a set of Network related metrics
func newNetworkScraper(_ context.Context, cfg *Config) (*scraper, error) {
- scraper := &scraper{config: cfg, bootTime: host.BootTime, ioCounters: net.IOCounters, connections: net.Connections}
+ scraper := &scraper{config: cfg, bootTime: host.BootTime, ioCounters: net.IOCounters, connections: net.Connections, conntrack: net.FilterCounters}
Member

Should we use FilterCountersWithContext instead, so that we can pass the context in case we want to have cancellation at some point?

Not super convinced that we should do this here (IOCounters also has IOCountersWithContext and we are not using it), so maybe you can just open an issue and we can deal with it in a separate PR.

receiver/hostmetricsreceiver/metadata.yaml (outdated, resolved)
receiver/hostmetricsreceiver/metadata.yaml (outdated, resolved)
@@ -293,6 +293,22 @@ metrics:
monotonic: false
labels: [network.protocol, network.state]

system.network.conntrack.count:
Member

I am not knowledgeable enough about macOS or Windows to know whether those operating systems have equivalents to conntrack. It might be a good idea to check this and make the name more general in case we want to add support for other platforms later.

A good way to get feedback on this is to open a PR on the https://github.com/open-telemetry/opentelemetry-specification repository so that other people can chime in.

@mx-psi
Member

mx-psi commented Jun 16, 2021

Unit tests are failing with

--- FAIL: TestGatherMetrics_EndToEnd (0.17s)
    hostmetrics_receiver_test.go:177: 
        	Error Trace:	hostmetrics_receiver_test.go:177
        	            				hostmetrics_receiver_test.go:152
        	            				asm_amd64.s:1371
        	Error:      	Not equal: 
        	            	expected: 26
        	            	actual  : 24
        	Test:       	TestGatherMetrics_EndToEnd
    hostmetrics_receiver_test.go:179: 
        	Error Trace:	hostmetrics_receiver_test.go:179
        	            				hostmetrics_receiver_test.go:152
        	            				asm_amd64.s:1371
        	Error:      	map[string]struct {}{"system.cpu.load_average.15m":struct {}{}, "system.cpu.load_average.1m":struct {}{}, "system.cpu.load_average.5m":struct {}{}, "system.cpu.time":struct {}{}, "system.disk.io":struct {}{}, "system.disk.io_time":struct {}{}, "system.disk.merged":struct {}{}, "system.disk.operation_time":struct {}{}, "system.disk.operations":struct {}{}, "system.disk.pending_operations":struct {}{}, "system.disk.weighted_io_time":struct {}{}, "system.filesystem.inodes.usage":struct {}{}, "system.filesystem.usage":struct {}{}, "system.memory.usage":struct {}{}, "system.network.connections":struct {}{}, "system.network.dropped":struct {}{}, "system.network.errors":struct {}{}, "system.network.io":struct {}{}, "system.network.packets":struct {}{}, "system.paging.faults":struct {}{}, "system.paging.operations":struct {}{}, "system.paging.usage":struct {}{}, "system.processes.count":struct {}{}, "system.processes.created":struct {}{}} does not contain "system.conntrack.count"
        	Test:       	TestGatherMetrics_EndToEnd
    hostmetrics_receiver_test.go:179: 
        	Error Trace:	hostmetrics_receiver_test.go:179
        	            				hostmetrics_receiver_test.go:152
        	            				asm_amd64.s:1371
        	Error:      	map[string]struct {}{"system.cpu.load_average.15m":struct {}{}, "system.cpu.load_average.1m":struct {}{}, "system.cpu.load_average.5m":struct {}{}, "system.cpu.time":struct {}{}, "system.disk.io":struct {}{}, "system.disk.io_time":struct {}{}, "system.disk.merged":struct {}{}, "system.disk.operation_time":struct {}{}, "system.disk.operations":struct {}{}, "system.disk.pending_operations":struct {}{}, "system.disk.weighted_io_time":struct {}{}, "system.filesystem.inodes.usage":struct {}{}, "system.filesystem.usage":struct {}{}, "system.memory.usage":struct {}{}, "system.network.connections":struct {}{}, "system.network.dropped":struct {}{}, "system.network.errors":struct {}{}, "system.network.io":struct {}{}, "system.network.packets":struct {}{}, "system.paging.faults":struct {}{}, "system.paging.operations":struct {}{}, "system.paging.usage":struct {}{}, "system.processes.count":struct {}{}, "system.processes.created":struct {}{}} does not contain "system.conntrack.max"
        	Test:       	TestGatherMetrics_EndToEnd
FAIL
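
The mismatch above (26 expected, 24 produced) is what a test sees when its expected set always includes the two conntrack metrics but the host does not expose them. A minimal sketch of one way to build the expected set conditionally follows; the helper `expectedMetrics` and the `withConntrack` flag are hypothetical and not part of the receiver's test code, and the two metric names are copied from the failure output.

```go
package main

import "fmt"

// conntrackMetrics are the two names the failing assertions looked for.
var conntrackMetrics = []string{"system.conntrack.count", "system.conntrack.max"}

// expectedMetrics returns the set of metric names a test should require.
// The conntrack names are added only when the host can report them.
func expectedMetrics(base []string, withConntrack bool) map[string]struct{} {
	names := make(map[string]struct{}, len(base)+len(conntrackMetrics))
	for _, n := range base {
		names[n] = struct{}{}
	}
	if withConntrack {
		for _, n := range conntrackMetrics {
			names[n] = struct{}{}
		}
	}
	return names
}

func main() {
	// A shortened base list; the real test covers many more metrics.
	base := []string{"system.network.connections", "system.network.io"}

	fmt.Println(len(expectedMetrics(base, false))) // 2
	fmt.Println(len(expectedMetrics(base, true)))  // 4
}
```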

@emilgelman
Author

@mx-psi First of all, thank you for the review.
I've also opened a PR in the specification repo:
open-telemetry/opentelemetry-specification#1761

I'm still trying to understand why the e2e test fails; running the same Docker image with this branch seems to work fine.

@emilgelman
Author

emilgelman commented Jun 16, 2021

@mx-psi Please review again.
I wasn't able to validate the new conntrack metrics in TestGatherMetrics_EndToEnd.
I tried reproducing the CI flow using cimg/go:1.16, and it works locally. However, the netfilter files seem to be missing when the test runs in CI:
https://app.circleci.com/pipelines/github/open-telemetry/opentelemetry-collector/8159/workflows/4cca0725-3807-4123-821d-17a5faa63eb9/jobs/88360

        	Error:      	Received unexpected error:
        	            	open /proc/sys/net/netfilter/nf_conntrack_count: no such file or directory

Although they exist:

➜  ~ docker run  cimg/go:1.16 ls -la  /proc/sys/net/netfilter/nf_conntrack_count

-r--r--r-- 1 root root 0 Jun 16 20:08 /proc/sys/net/netfilter/nf_conntrack_count
➜  ~ docker run  cimg/go:1.16 ls -la  /proc/sys/net/netfilter/nf_conntrack_max

-rw-r--r-- 1 root root 0 Jun 16 20:08 /proc/sys/net/netfilter/nf_conntrack_max

I believe this happens because the nf_conntrack kernel module isn't loaded on the CircleCI builder host.

Member

@mx-psi mx-psi left a comment

Looks good, just one documentation comment! I think it's fine not to have the conntrack end-to-end test (but maintainers may disagree with me).

receiver/hostmetricsreceiver/README.md (outdated, resolved)
Member

@mx-psi mx-psi left a comment

Looks good to me! An approver or maintainer should now have a look. Remember that the focus is on getting to GA, so please be patient. If this doesn't get attention, feel free to raise it on the CNCF Slack.

@github-actions
Contributor

This PR was marked stale due to lack of activity. It will be closed in 7 days.

@github-actions github-actions bot added the Stale label Jun 30, 2021
@mx-psi
Member

mx-psi commented Jun 30, 2021

Not stale, but I don't have permission to remove the label (I hope this comment is enough to mark it as active).

@github-actions
Contributor

This PR was marked stale due to lack of activity. It will be closed in 7 days.

@github-actions github-actions bot added the Stale label Jul 12, 2021
@albertvaka

Hi! I'm commenting to remove the stale label. mx-psi is on vacation but will be reviewing this when he's back :)

@github-actions github-actions bot removed the Stale label Jul 16, 2021
@mx-psi
Member

mx-psi commented Jul 27, 2021

@emilgelman can you update the PR with the new Number types as was done in #3710?

Contributor

@codeboten codeboten left a comment

The code looks good. I added some suggestions to change the code from IntSum to Sum, as per @mx-psi's recommendation; please take a look.


func assertNetworkConntrackMetricValid(t *testing.T, metric pdata.Metric, descriptor pdata.Metric) {
internal.AssertDescriptorEqual(t, descriptor, metric)
assert.Equal(t, 1, metric.IntSum().DataPoints().Len())
Contributor

Suggested change
- assert.Equal(t, 1, metric.IntSum().DataPoints().Len())
+ assert.Equal(t, 1, metric.Sum().DataPoints().Len())

func initializeNetworkConntrackMetric(metric pdata.Metric, metricIntf metadata.MetricIntf, now pdata.Timestamp, value int64) {
metricIntf.Init(metric)

idps := metric.IntSum().DataPoints()
Contributor

Suggested change
- idps := metric.IntSum().DataPoints()
+ idps := metric.Sum().DataPoints()

Comment on lines +54 to +56
func initializeNetworkConntrackDataPoint(dataPoint pdata.IntDataPoint, now pdata.Timestamp, value int64) {
dataPoint.SetTimestamp(now)
dataPoint.SetValue(value)
Contributor

Suggested change
- func initializeNetworkConntrackDataPoint(dataPoint pdata.IntDataPoint, now pdata.Timestamp, value int64) {
- dataPoint.SetTimestamp(now)
- dataPoint.SetValue(value)
+ func initializeNetworkConntrackDataPoint(dataPoint pdata.NumberDataPoint, now pdata.Timestamp, value int64) {
+ dataPoint.SetTimestamp(now)
+ dataPoint.SetIntVal(value)

description: Number of currently allocated flow entries.
unit: "{flow entries}"
data:
type: int sum
Contributor

Suggested change
- type: int sum
+ type: sum

description: Size of connection tracking table.
unit: "{flow entries}"
data:
type: int sum
Contributor

Suggested change
- type: int sum
+ type: sum

@@ -59,6 +59,8 @@ type metricStruct struct {
ProcessDiskIo MetricIntf
Contributor

This will need to be regenerated after the suggestions to change from IntSum to Sum are merged.

@github-actions
Contributor

This PR was marked stale due to lack of activity. It will be closed in 7 days.

@github-actions
Contributor

This PR was marked stale due to lack of activity. It will be closed in 7 days.

@bogdandrutu
Member

@emilgelman sorry, unfortunately this needs to be moved to contrib, since we moved the hostmetricsreceiver there.


6 participants