Add referenced memory metric #2495

Merged: 1 commit, Apr 22, 2020

katarzyna-z
Collaborator

see: https://github.com/brendangregg/wss#wsspl-referenced-page-flag

  • Introduce map of collectors in containerData
  • Introduce map of managers in Manager

Signed-off-by: Katarzyna Kujawa katarzyna.kujawa@intel.com

The working set size indicates how much memory a container needs to keep working. It is an intrusive metric: collecting it can influence the kernel's page reclaim policy and add latency (this is noted in the documentation). The WSS metric is disabled by default.

@katarzyna-z
Collaborator Author

I'm working on a fix for the data race reported by the tests:

==================
WARNING: DATA RACE
Read at 0x00c0004d2db8 by goroutine 50:
  encoding/json.sliceEncoder.encode()
      /usr/local/go/src/reflect/value.go:1071 +0x137
  encoding/json.sliceEncoder.encode-fm()
      /usr/local/go/src/encoding/json/encode.go:760 +0x7b
  encoding/json.structEncoder.encode()
      /usr/local/go/src/encoding/json/encode.go:664 +0x40d
  encoding/json.structEncoder.encode-fm()
      /usr/local/go/src/encoding/json/encode.go:635 +0xa0
  encoding/json.ptrEncoder.encode()
      /usr/local/go/src/encoding/json/encode.go:810 +0xfc
  encoding/json.ptrEncoder.encode-fm()
      /usr/local/go/src/encoding/json/encode.go:805 +0x7b
  encoding/json.arrayEncoder.encode()
      /usr/local/go/src/encoding/json/encode.go:791 +0xe3
  encoding/json.arrayEncoder.encode-fm()
      /usr/local/go/src/encoding/json/encode.go:784 +0x7b
  encoding/json.sliceEncoder.encode()
      /usr/local/go/src/encoding/json/encode.go:765 +0xda
  encoding/json.sliceEncoder.encode-fm()
      /usr/local/go/src/encoding/json/encode.go:760 +0x7b
  encoding/json.structEncoder.encode()
      /usr/local/go/src/encoding/json/encode.go:664 +0x40d
  encoding/json.structEncoder.encode-fm()
      /usr/local/go/src/encoding/json/encode.go:635 +0xa0
  encoding/json.mapEncoder.encode()
      /usr/local/go/src/encoding/json/encode.go:706 +0x36e
  encoding/json.mapEncoder.encode-fm()
      /usr/local/go/src/encoding/json/encode.go:682 +0x7b
  encoding/json.(*encodeState).reflectValue()
      /usr/local/go/src/encoding/json/encode.go:337 +0x93
  encoding/json.(*encodeState).marshal()
      /usr/local/go/src/encoding/json/encode.go:309 +0xcc
  encoding/json.Marshal()
      /usr/local/go/src/encoding/json/encode.go:161 +0x73
  github.com/google/cadvisor/cmd/internal/api.writeResult()
      /go/src/github.com/google/cadvisor/cmd/internal/api/handler.go:126 +0x5d
  github.com/google/cadvisor/cmd/internal/api.(*version1_2).HandleRequest()
      /go/src/github.com/google/cadvisor/cmd/internal/api/versions.go:236 +0x570
  github.com/google/cadvisor/cmd/internal/api.(*version1_3).HandleRequest()
      /go/src/github.com/google/cadvisor/cmd/internal/api/versions.go:272 +0x104
  github.com/google/cadvisor/cmd/internal/api.handleRequest()
      /go/src/github.com/google/cadvisor/cmd/internal/api/handler.go:121 +0x766
  github.com/google/cadvisor/cmd/internal/api.RegisterHandlers.func1()
      /go/src/github.com/google/cadvisor/cmd/internal/api/handler.go:51 +0x75
  net/http.HandlerFunc.ServeHTTP()
      /usr/local/go/src/net/http/server.go:2007 +0x51
  net/http.(*ServeMux).ServeHTTP()
      /usr/local/go/src/net/http/server.go:2387 +0x288
  net/http.(*ServeMux).ServeHTTP()
      /usr/local/go/src/net/http/server.go:2387 +0x288
  net/http.serverHandler.ServeHTTP()
      /usr/local/go/src/net/http/server.go:2802 +0xce
  net/http.(*conn).serve()
      /usr/local/go/src/net/http/server.go:1890 +0x837

Previous write at 0x00c0004d2db8 by goroutine 61:
  github.com/google/cadvisor/perf.(*collector).UpdateStats()
      /go/src/github.com/google/cadvisor/perf/collector_libpfm.go:102 +0x95a
  github.com/google/cadvisor/manager.(*containerData).updateStats()
      /go/src/github.com/google/cadvisor/manager/container.go:658 +0x514
  github.com/google/cadvisor/manager.(*containerData).housekeepingTick()
      /go/src/github.com/google/cadvisor/manager/container.go:535 +0x23a
  github.com/google/cadvisor/manager.(*containerData).housekeeping()
      /go/src/github.com/google/cadvisor/manager/container.go:483 +0x2ee

Goroutine 50 (running) created at:
  net/http.(*Server).Serve()
      /usr/local/go/src/net/http/server.go:2928 +0x5b5
  net/http.(*Server).ListenAndServe()
      /usr/local/go/src/net/http/server.go:2825 +0x102
  main.main()
      /usr/local/go/src/net/http/server.go:3081 +0x10e9

Goroutine 61 (running) created at:
  github.com/google/cadvisor/manager.(*containerData).Start()
      /go/src/github.com/google/cadvisor/manager/container.go:108 +0x4c
  github.com/google/cadvisor/manager.(*manager).createContainerLocked()
      /go/src/github.com/google/cadvisor/manager/manager.go:1030 +0xa4e
  github.com/google/cadvisor/manager.(*manager).createContainer()
      /go/src/github.com/google/cadvisor/manager/manager.go:926 +0xbd
  github.com/google/cadvisor/manager.(*manager).watchForNewContainers.func1()
      /go/src/github.com/google/cadvisor/manager/manager.go:1182 +0x41a
==================
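
The trace above shows an HTTP handler goroutine marshaling a slice to JSON while the housekeeping goroutine writes to it. One common way to avoid this class of race is to guard the slice with a mutex and hand readers a copy. The sketch below illustrates that pattern only; `statsHolder` and `perfStat` are hypothetical names, and this is not necessarily the fix that went into cAdvisor.

```go
package main

import (
	"encoding/json"
	"fmt"
	"sync"
)

// perfStat is a hypothetical stand-in for a per-event statistic.
type perfStat struct {
	Name  string `json:"name"`
	Value uint64 `json:"value"`
}

// statsHolder (hypothetical) guards a slice shared by one writer and many readers.
type statsHolder struct {
	mu    sync.RWMutex
	stats []perfStat
}

// Update stores a private copy of s under the write lock.
func (h *statsHolder) Update(s []perfStat) {
	h.mu.Lock()
	defer h.mu.Unlock()
	h.stats = append([]perfStat(nil), s...)
}

// Snapshot returns a copy, so callers can marshal it without holding the lock.
func (h *statsHolder) Snapshot() []perfStat {
	h.mu.RLock()
	defer h.mu.RUnlock()
	return append([]perfStat(nil), h.stats...)
}

func main() {
	h := &statsHolder{}
	var wg sync.WaitGroup
	wg.Add(2)
	go func() { // writer, playing the role of a housekeeping tick
		defer wg.Done()
		for i := uint64(0); i < 100; i++ {
			h.Update([]perfStat{{Name: "instructions", Value: i}})
		}
	}()
	go func() { // reader, playing the role of an API handler
		defer wg.Done()
		for i := 0; i < 100; i++ {
			if _, err := json.Marshal(h.Snapshot()); err != nil {
				panic(err)
			}
		}
	}()
	wg.Wait()
	out, err := json.Marshal(h.Snapshot())
	if err != nil {
		panic(err)
	}
	fmt.Println(string(out))
}
```

Running a program like this with `go run -race` should report no race, because every access to the shared slice happens under the lock and the marshaled data is a private copy.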

@katarzyna-z katarzyna-z changed the title Add working set size metric WIP: Add working set size metric Apr 16, 2020
Collaborator

@dashpole dashpole left a comment

I know it is still WIP, but I have a couple high-level questions.

@@ -899,6 +899,9 @@ type ContainerStats struct {

// Statistics originating from perf events
PerfStats []PerfStat `json:"perf_stats,omitempty"`

// Working set size
Wss uint64 `json:"wss,omitempty"`
Collaborator

Should we nest this under MemoryStats?

Collaborator Author

All memory stats come from the memory cgroup, so I don't think WSS fits among them.

@katarzyna-z katarzyna-z force-pushed the kk-wss-metric branch 3 times, most recently from 7a7d5bb to f25f2fd Compare April 20, 2020 13:32
@katarzyna-z katarzyna-z changed the title WIP: Add working set size metric Add working set size metric Apr 20, 2020
@katarzyna-z
Collaborator Author

@dashpole It's now ready for review.

cmd/cadvisor.go Outdated
@@ -87,6 +87,7 @@ var (
container.ProcessSchedulerMetrics: struct{}{},
container.ProcessMetrics: struct{}{},
container.HugetlbUsageMetrics: struct{}{},
container.ReferencedMetric: struct{}{},
Collaborator

WDYT about ReferencedMemoryMetrics instead? Just Referenced seems a little generic...

Collaborator

and similarly referenced_memory for the flag string

} else {
stats.Referenced, err = referencedBytesStat(pids, h.cycles, *referencedResetInterval)
if err != nil {
klog.V(4).Infof("Unable to get working set size: %v", err)
Collaborator

s/working set size/referenced memory

return referencedKBytes * 1024, nil
}

func getReferencedBytes(pids []int) (uint64, error) {
Collaborator

getReferencedKBytes?

smapsFilePath := fmt.Sprintf(smapsFilePathPattern, pid)
smapsContent, err := ioutil.ReadFile(smapsFilePath)
if err != nil {
klog.V(3).Infof("Cannot read %s file, err: %s", smapsFilePath, err)
Collaborator

For logging that is expected in some scenarios, let's lower it to V(5).
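
The snippets in this review read `/proc/<pid>/smaps` and return a value in kilobytes that is later multiplied by 1024. As a rough illustration of that step, the sketch below sums the `Referenced:` fields of smaps-formatted text. The function name `sumReferencedKB` and the sample input are made up for this example, and the PR's actual parsing code may differ.

```go
package main

import (
	"bufio"
	"fmt"
	"strconv"
	"strings"
)

// sumReferencedKB (hypothetical helper) adds up every "Referenced:" field
// found in smaps-formatted text. smaps reports these values in kB.
func sumReferencedKB(smaps string) (uint64, error) {
	var total uint64
	sc := bufio.NewScanner(strings.NewReader(smaps))
	for sc.Scan() {
		fields := strings.Fields(sc.Text())
		// Expected line shape: "Referenced:   <n> kB"
		if len(fields) < 2 || fields[0] != "Referenced:" {
			continue
		}
		kb, err := strconv.ParseUint(fields[1], 10, 64)
		if err != nil {
			return 0, fmt.Errorf("parsing %q: %w", sc.Text(), err)
		}
		total += kb
	}
	return total, sc.Err()
}

func main() {
	// Made-up sample covering two mappings; real input would be the
	// content of /proc/<pid>/smaps.
	sample := "Size:        132 kB\nRss:           8 kB\nReferenced:    8 kB\n" +
		"Size:          4 kB\nRss:           4 kB\nReferenced:    4 kB\n"
	kb, err := sumReferencedKB(sample)
	if err != nil {
		panic(err)
	}
	fmt.Printf("referenced: %d kB (%d bytes)\n", kb, kb*1024)
}
```

Keeping the unit in the function name, as suggested in the review, makes the later conversion to bytes harder to get wrong.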

return referencedKBytes, nil
}

func clearReferencedBytes(pids []int, cycles uint64, resetInterval uint64) error {
Collaborator

Just to make sure I understand, if clearReferencedBytes errors, we will wait another resetInterval cycles before attempting to reset again, right? If so, is that better than always trying to clear after an error?

Collaborator Author

You're right: if clearReferencedBytes returns an error, we wait another resetInterval. However, an error occurs only when there is a serious problem in the system (a failure writing to an existing file or closing a previously opened file), so I don't think it is a good idea to force clearing after an error.
The user can easily tell whether the referenced bytes were cleared by looking at the referenced bytes value.

Since your last review I've introduced an option to switch off clearing of referenced bytes (by setting referenced_reset_interval to 0). It's documented here. During experiments, setting 0 is more user-friendly than setting a very long reset interval when you want to observe referenced bytes over a longer period.
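
The reset behavior discussed in this thread can be sketched as a small decision function. This is an illustration under assumptions, not the merged code: the name `shouldClear` is hypothetical, and clearing on every cycle that is a multiple of the interval (including cycle 0) is a guess at the scheduling. Only the rule that an interval of 0 disables clearing comes from the comment above.

```go
package main

import "fmt"

// shouldClear (hypothetical) reports whether referenced bits should be
// cleared on the given housekeeping cycle. A resetInterval of 0 disables
// clearing entirely. Clearing on multiples of the interval is an assumption.
func shouldClear(cycles, resetInterval uint64) bool {
	if resetInterval == 0 {
		return false
	}
	return cycles%resetInterval == 0
}

func main() {
	for cycle := uint64(0); cycle <= 6; cycle++ {
		fmt.Printf("cycle %d: interval=3 clear=%t, interval=0 clear=%t\n",
			cycle, shouldClear(cycle, 3), shouldClear(cycle, 0))
	}
}
```

With this shape, a failed clear is simply retried at the next multiple of the interval, which matches the "wait another resetInterval" behavior described above.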

@katarzyna-z katarzyna-z force-pushed the kk-wss-metric branch 3 times, most recently from 82dc9bd to cc869de Compare April 22, 2020 11:58
Collaborator

@dashpole dashpole left a comment

lgtm

@dashpole dashpole merged commit a7d2254 into google:master Apr 22, 2020
@dashpole dashpole changed the title Add working set size metric Add referenced memory metric May 29, 2020
JensErat added a commit to mercedes-benz/cadvisor that referenced this pull request Dec 17, 2020
google#2495 introduced parsing of smaps memory metrics but exposes only referenced memory. As proposed in google#2634, other metrics in smaps are also of interest, specifically LazyFree, and the remaining ones seem generally useful as well.

This commit replaces the referenced memory metric with a generic smaps series of metrics. Runtime costs in cadvisor are not really affected, as all the smaps parsing already happens anyway. The cardinality per container grows from 1 to 19 metrics, though. I think this is acceptable: referenced memory scraping was disabled by default, and so is smaps scraping.

Resolves google#2634.

Signed-off-by: Jens Erat <jens.erat@daimler.com>