CoreOS not reporting disk size. #2380

Closed
NeilW opened this issue Oct 20, 2015 · 15 comments
Labels: area/agent (Issues that deal with the Rancher Agent), kind/bug (Issues that are defects reported by users or that we know have reached a real release)

Comments

@NeilW

NeilW commented Oct 20, 2015

The GUI isn't reporting the disk size of a CoreOS server.

screen shot 2015-10-20 at 17 59 57

@deniseschannon

Can you please provide your Rancher server version? Click on the cow icon in the upper left corner.

@NeilW
Author

NeilW commented Oct 20, 2015

screen shot 2015-10-20 at 18 06 34

@NeilW
Author

NeilW commented Oct 20, 2015

Stack trace on seg fault from the agent:

I1020 17:09:09.973662 05705 manager.go:127] cAdvisor running in container: "/system.slice/docker-524bc8e7624fc0304d41bb01f8737d90f8a898a613cd11ce5c11ce72e13e3777.scope"
I1020 17:09:09.974317 05705 fs.go:93] Filesystem partitions: map[/dev/vda9:{mountpoint:/ major:254 minor:9} /dev/vda3:{mountpoint:/usr major:254 minor:3} /dev/vda6:{mountpoint:/usr/share/oem major:254 minor:6}]
fatal error: unexpected signal during runtime execution
[signal 0xb code=0x1 addr=0x0 pc=0x0]

runtime stack:
runtime.gothrow(0xb9f4d0, 0x2a)
    /usr/src/go/src/runtime/panic.go:503 +0x8e
runtime.sigpanic()
    /usr/src/go/src/runtime/sigpanic_unix.go:14 +0x5e

goroutine 14 [syscall, locked to thread]:
runtime.cgocall_errno(0x401930, 0xc208018cd0, 0x0)
    /usr/src/go/src/runtime/cgocall.go:130 +0xf5 fp=0xc208018c90 sp=0xc208018c68
net._C2func_getaddrinfo(0x1f41e50, 0x0, 0xc208018dc8, 0xc208018d18, 0xc200000000, 0x0, 0x0)
    /usr/src/go/src/net/:26 +0x55 fp=0xc208018cd0 sp=0xc208018c90
net.cgoLookupIPCNAME(0xc20801e7a0, 0x18, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0)
    /usr/src/go/src/net/cgo_unix.go:96 +0x1c5 fp=0xc208018e00 sp=0xc208018cd0
net.cgoLookupIP(0xc20801e7a0, 0x18, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0)
    /usr/src/go/src/net/cgo_unix.go:148 +0x65 fp=0xc208018e58 sp=0xc208018e00
net.lookupIP(0xc20801e7a0, 0x18, 0x0, 0x0, 0x0, 0x0, 0x0)
    /usr/src/go/src/net/lookup_unix.go:64 +0x5f fp=0xc208018ea0 sp=0xc208018e58
net.func·026(0x0, 0x0, 0x0, 0x0)
    /usr/src/go/src/net/lookup.go:79 +0x55 fp=0xc208018f08 sp=0xc208018ea0
net.(*singleflight).doCall(0x1112970, 0xc20800aa80, 0xc20801e7a0, 0x18, 0xc2080d8330)
    /usr/src/go/src/net/singleflight.go:91 +0x2f fp=0xc208018fb8 sp=0xc208018f08
runtime.goexit()
    /usr/src/go/src/runtime/asm_amd64.s:2232 +0x1 fp=0xc208018fc0 sp=0xc208018fb8
created by net.(*singleflight).DoChan
    /usr/src/go/src/net/singleflight.go:84 +0x42b

goroutine 1 [select]:
net/http.(*Transport).getConn(0xc2080641b0, 0xc2080348f0, 0x0, 0xb60550, 0x4, 0xc20801e7a0, 0x1b, 0x0, 0x0, 0x0)
    /usr/src/go/src/net/http/transport.go:525 +0x608
net/http.(*Transport).RoundTrip(0xc2080641b0, 0xc2080348f0, 0xa, 0x0, 0x0)
    /usr/src/go/src/net/http/transport.go:228 +0x4d4
google.golang.org/cloud/internal.(*UATransport).RoundTrip(0xc20802b180, 0xc2080348f0, 0xc20801e700, 0x0, 0x0)
    /source/build/src/github.com/google/cadvisor/Godeps/_workspace/src/google.golang.org/cloud/internal/cloud.go:47 +0x109
net/http.send(0xc2080340d0, 0x7fd0a8cddb28, 0xc20802b180, 0x1f, 0x0, 0x0)
    /usr/src/go/src/net/http/client.go:219 +0x4fc
net/http.(*Client).send(0x1113420, 0xc2080340d0, 0x1f, 0x0, 0x0)
    /usr/src/go/src/net/http/client.go:142 +0x15b
net/http.(*Client).doFollowingRedirects(0x1113420, 0xc2080340d0, 0xc08450, 0x0, 0x0, 0x0)
    /usr/src/go/src/net/http/client.go:367 +0xb25
net/http.(*Client).Get(0x1113420, 0xb60550, 0x1f, 0x25, 0x0, 0x0)
    /usr/src/go/src/net/http/client.go:299 +0xba
github.com/GoogleCloudPlatform/gcloud-golang/compute/metadata.OnGCE(0x0)
    /source/build/src/github.com/google/cadvisor/Godeps/_workspace/src/github.com/GoogleCloudPlatform/gcloud-golang/compute/metadata/metadata.go:126 +0xa6
github.com/google/cadvisor/utils/cloudinfo.onGCE(0xe)
    /source/build/src/github.com/google/cadvisor/utils/cloudinfo/gce.go:25 +0x1f
github.com/google/cadvisor/utils/cloudinfo.detectCloudProvider(0x0, 0x0)
    /source/build/src/github.com/google/cadvisor/utils/cloudinfo/cloudinfo.go:52 +0x31
github.com/google/cadvisor/utils/cloudinfo.NewRealCloudInfo(0x0, 0x0)
    /source/build/src/github.com/google/cadvisor/utils/cloudinfo/cloudinfo.go:34 +0x31
github.com/google/cadvisor/manager.getMachineInfo(0x7fd0a8cde1d8, 0x1122fb8, 0x7fd0a8cde348, 0xc2080d8c00, 0x11135c0, 0x0, 0x0)
    /source/build/src/github.com/google/cadvisor/manager/machine.go:277 +0x82e
github.com/google/cadvisor/manager.New(0xc20806e880, 0x7fd0a8cde1d8, 0x1122fb8, 0x0, 0x0, 0x0, 0x0)
    /source/build/src/github.com/google/cadvisor/manager/manager.go:151 +0x5d0
main.main()
    /source/build/src/github.com/google/cadvisor/cadvisor.go:69 +0x454

goroutine 5 [chan receive]:
github.com/golang/glog.(*loggingT).flushDaemon(0x1113980)
    /source/build/src/github.com/google/cadvisor/Godeps/_workspace/src/github.com/golang/glog/glog.go:839 +0x78
created by github.com/golang/glog.init·1
    /source/build/src/github.com/google/cadvisor/Godeps/_workspace/src/github.com/golang/glog/glog.go:406 +0x2a7

goroutine 17 [syscall, locked to thread]:
runtime.goexit()
    /usr/src/go/src/runtime/asm_amd64.s:2232 +0x1

goroutine 11 [syscall]:
os/signal.loop()
    /usr/src/go/src/os/signal/signal_unix.go:21 +0x1f
created by os/signal.init·1
    /usr/src/go/src/os/signal/signal_unix.go:27 +0x35

goroutine 13 [select]:
net.lookupIPDeadline(0xc20801e7a0, 0x18, 0xecdb86c36, 0x2b51724b, 0x11135c0, 0x0, 0x0, 0x0, 0x0, 0x0)
    /usr/src/go/src/net/lookup.go:82 +0x6cb
net.resolveInternetAddr(0xace950, 0x3, 0xc20801e7a0, 0x1b, 0xecdb86c36, 0x2b51724b, 0x11135c0, 0x0, 0x0, 0x0, ...)
    /usr/src/go/src/net/ipsock.go:285 +0x49b
net.resolveAddr(0xabff70, 0x4, 0xace950, 0x3, 0xc20801e7a0, 0x1b, 0xecdb86c36, 0x2b51724b, 0x11135c0, 0x0, ...)
    /usr/src/go/src/net/dial.go:110 +0x378
net.(*Dialer).Dial(0xc208032bc0, 0xace950, 0x3, 0xc20801e7a0, 0x1b, 0x0, 0x0, 0x0, 0x0)
    /usr/src/go/src/net/dial.go:158 +0xf6
net.*Dialer.Dial·fm(0xace950, 0x3, 0xc20801e7a0, 0x1b, 0x0, 0x0, 0x0, 0x0)
    /usr/src/go/src/net/http/transport.go:38 +0x79
net/http.(*Transport).dial(0xc2080641b0, 0xace950, 0x3, 0xc20801e7a0, 0x1b, 0x0, 0x0, 0x0, 0x0)
    /usr/src/go/src/net/http/transport.go:479 +0x84
net/http.(*Transport).dialConn(0xc2080641b0, 0x0, 0xb60550, 0x4, 0xc20801e7a0, 0x1b, 0x0, 0x0, 0x0)
    /usr/src/go/src/net/http/transport.go:564 +0x1678
net/http.func·019()
    /usr/src/go/src/net/http/transport.go:520 +0x42
created by net/http.(*Transport).getConn
    /usr/src/go/src/net/http/transport.go:522 +0x335

@NeilW
Author

NeilW commented Oct 20, 2015

agent is v0.8.2

@vincent99
Contributor

That trace is cadvisor crashing. We just package it into the agent. Is this repeatable? Do you have beta or stable CoreOS hosts you can try? (835.1.0 is alpha)

We may just need to update the packaged version (@cloudnautique?), though there's no obvious "fix CoreOS" in the changelog.

@NeilW
Author

NeilW commented Oct 20, 2015

screen shot 2015-10-20 at 18 59 08

Same error in the agent.

@vincent99 vincent99 added kind/bug Issues that are defects reported by users or that we know have reached a real release area/agent Issues that deal with the Rancher Agent labels Oct 20, 2015
@deniseschannon deniseschannon added this to the Milestone 10/28/2015 milestone Oct 22, 2015
@hugomarisco

Same thing here.

@will-chan will-chan modified the milestones: Milestone 10/28/2015, Milestone 11/4/2015 Oct 29, 2015
@davidpenn

I am able to get the dashboard to report stats by running cadvisor in its own container:

screen shot 2015-11-06 at 17 26 10

cadvisor:
  restart: on-failure:5
  labels:
    io.rancher.scheduler.global: 'true'
  tty: true
  command:
  - -listen_ip
  - 127.0.0.1
  - -port
  - '9344'
  - -docker_root
  - /var/lib/docker
  image: google/cadvisor
  volumes:
  - /:/rootfs:ro
  - /var/run:/var/run:rw
  - /sys:/sys:ro
  - /var/lib/docker/:/var/lib/docker:ro
  stdin_open: true
  net: host

However, every couple of days I have to destroy and restart the rancher-agent on every host, because it is writing 1GB of log data a day to /var/lib/docker/containers/uuid/uuid-json.log

docker stop rancher-agent
docker rm rancher-agent rancher-agent-state
docker start (original rancher/agent id)
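
To see which container logs are actually growing before recreating the agent, something like the following might help. It is only a sketch: it assumes the json-file log location mentioned above (/var/lib/docker/containers/<id>/<id>-json.log), and the helper name `largest_json_logs` is made up for illustration.

```python
import glob
import os


def largest_json_logs(docker_root="/var/lib/docker", top=5):
    """Return (size_in_bytes, path) pairs for the largest container JSON logs.

    Assumes the json-file layout <docker_root>/containers/<id>/<id>-json.log.
    """
    pattern = os.path.join(docker_root, "containers", "*", "*-json.log")
    sizes = []
    for path in glob.glob(pattern):
        try:
            sizes.append((os.path.getsize(path), path))
        except OSError:
            # The container may have been removed between listing and stat.
            continue
    # Largest files first.
    sizes.sort(reverse=True)
    return sizes[:top]


if __name__ == "__main__":
    for size, path in largest_json_logs():
        print(f"{size / 1024 / 1024:8.1f} MiB  {path}")
```

Reading /var/lib/docker usually needs root, so this would typically be run with sudo on the host.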

@cloudnautique
Contributor

We merged rancher/cattle#1013 which should address this. Our initial build process wasn't compiling the static binary fully. Should be in the next release.

@davidpenn

Great!! I just pulled down https://github.com/rancher/cadvisor-package/releases/download/v0.19.3/cadvisor.tar.gz to the agent/instance and replaced the cadvisor inside the container and everything looks to be working correctly!! No more crashing and live stats are still showing in the dashboard

@ibuildthecloud
Contributor

@davidpenn Thanks for testing, I'm glad to hear it's working.

@sangeethah
Contributor

Tested with build from master - Nov 12

rancher-server is running on an Ubuntu VM.
Added CoreOS host.
I am able to see the disk space being reported for the host, but it is reported incorrectly; that is tracked in #2661.
Also able to see the container and host stats.

@clemenko

What version of rancher/agent is this fixed in?

@cloudnautique
Contributor

@clemenko this gets updated via rancher/server. And actually, I thought this issue was about no stats whatsoever. There is a bug on the main display where it adds the device mapper pool to the disk and reports the size incorrectly. It's accurate on the host detail page, because it splits the devices out.
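
For anyone following along, the difference between the two views can be illustrated roughly as below. This is a sketch, not Rancher code: `summarize` is a hypothetical helper, the input is assumed to be a list of entries with `device` and `capacity` fields, and the device names and sizes in the sample are invented.

```python
def summarize(filesystems):
    """Return (naive_total, per_device) for a list of {'device', 'capacity'} dicts.

    naive_total adds every device together, which is how a single "disk"
    figure can end up including the device mapper pool.
    per_device keeps each device separate, like a per-device detail view.
    """
    per_device = {fs["device"]: fs["capacity"] for fs in filesystems}
    naive_total = sum(per_device.values())
    return naive_total, per_device


if __name__ == "__main__":
    GIB = 1024 ** 3
    # Invented sample values, for illustration only.
    sample = [
        {"device": "/dev/vda9", "capacity": 20 * GIB},
        {"device": "docker-thin-pool", "capacity": 100 * GIB},
    ]
    total, per_device = summarize(sample)
    print("single figure:", total // GIB, "GiB")
    for device, capacity in per_device.items():
        print(f"{device}: {capacity // GIB} GiB")
```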

@cloudnautique cloudnautique reopened this Nov 12, 2015
@cloudnautique
Contributor

Bug is tracked in #2661
