Issues with Docker Redeploys #76
Thanks for moving this, @lagalbra. I was under the impression this repo was only for instances where docker was deployed inside a container. For clarification, this is the latest OMSAgent running on a CentOS 7 VM on Azure, with a deploy of the OMS Agent. Things work smoothly until we redeploy our docker instances on the VM; then we start seeing the above errors, which look like the agent is trying to stat the old bind mount locations that no longer exist. I know we can exclude /var/lib/docker/ from filesystem monitoring, but I was expecting the agent to be a bit more intelligent when containers are removed. Please let me know if we set something up incorrectly or if I need to provide more information!
Digging around, this error seems to match the code here in the PAL software: I'm wondering if overlay needs to be added to the excludes somewhere. However, manually making a basic overlay mount doesn't produce the statvfs errors when it's unmounted, only the warning about overlay not being recognized. I was able to reproduce this on a new CentOS image with a fresh OMS deploy, just running a basic hello world container. Restarting omid.service looks to make this go away, so I'm not sure whether something needs to be made aware to update this when a container is removed. This only seems to trigger when logical disk performance counters are enabled, and only when the file system is a docker overlay FS mount.
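For anyone trying to reproduce this, a quick way to see which overlay mounts the kernel currently reports is to filter /proc/mounts by filesystem type. This is only a rough sketch and not how the SCX provider enumerates disks; the function names and the sample container ID below are made up for illustration.

```python
def overlay_mount_points(mounts_text):
    """Return mount points whose filesystem type is 'overlay'.

    Expects text in /proc/mounts format:
    device mountpoint fstype options dump pass
    """
    points = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) >= 3 and fields[2] == "overlay":
            points.append(fields[1])
    return points


def read_proc_mounts(path="/proc/mounts"):
    """Read the kernel's current mount table (Linux only)."""
    with open(path) as handle:
        return handle.read()


# Hypothetical sample; a real container ID would be a long hex string.
SAMPLE = (
    "/dev/sda1 / xfs rw,relatime 0 0\n"
    "overlay /var/lib/docker/overlay/abc123/merged overlay rw,relatime 0 0\n"
)
```

Calling overlay_mount_points(read_proc_mounts()) before and after removing a container should show the merged path disappearing from the mount table, even while the agent keeps logging statvfs errors for it.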
Running a test based on the old removal tasks for disks I was able to clean this up w/o restarting omid.service:
Raised microsoft/SCXcore#89 with scx provider. Closing this here.
Re-raised from microsoft/OMS-Agent-for-Linux#579
Really hoping someone can point me in the right direction here. Every time we redeploy our docker instances, our SCX log (/var/opt/microsoft/scx/log/scx.log) begins to fill very rapidly with the following messages:
2017-09-29T15:02:39,523Z Error [scx.core.common.pal.system.disk.statisticallogicaldiskinstance:############] statvfs() failed for /var/lib/docker/overlay/######################/merged; errno = 2
Running systemctl restart omsagent##### seems to take care of this, but we were expecting the agent to be aware when a container went away and to stop trying to stat the directory it used to be mounted to.
So far we have tried removing and reinstalling the OMS bundle. Is there something else we are doing wrong here?
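To illustrate the behavior we were expecting: errno = 2 is ENOENT, meaning the path no longer exists, so a monitor could treat that result as "mount is gone" and drop the entry instead of logging it on every pass. The sketch below is my own illustration of that idea in Python, not the SCX provider's code; prune_stale_mounts is a hypothetical helper name.

```python
import errno
import os


def prune_stale_mounts(mount_points):
    """Split mount points into (live, stale) lists.

    A path is considered stale when statvfs() fails with ENOENT
    (errno 2), i.e. the directory has been removed. Any other
    error is re-raised so it is not silently hidden.
    """
    live, stale = [], []
    for path in mount_points:
        try:
            os.statvfs(path)
        except OSError as exc:
            if exc.errno == errno.ENOENT:
                stale.append(path)
                continue
            raise
        live.append(path)
    return live, stale
```

The tracked list would then be replaced with the live entries on each collection cycle, so a removed container's merged directory is only reported once.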
@kevi5702