Issues with Docker Redeploys #76

Closed
lagalbra opened this issue Sep 29, 2017 · 5 comments

@lagalbra

Re-raised from microsoft/OMS-Agent-for-Linux#579

Really hoping someone can point me in the right direction here. Every time we redeploy our Docker instances, our SCX log (/var/opt/microsoft/scx/log/scx.log) begins to fill very rapidly with the following messages:

2017-09-29T15:02:39,523Z Error [scx.core.common.pal.system.disk.statisticallogicaldiskinstance:############] statvfs() failed for /var/lib/docker/overlay/######################/merged; errno = 2

Running systemctl restart omsagent##### seems to take care of this, but we were expecting the agent to notice when a container goes away and to stop trying to stat the directory it used to be mounted on.
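To get a sense of how many distinct stale mount points are behind the noise, the log can be summarized per path. The following is a minimal sketch, assuming every relevant entry follows the format of the single line quoted above; the function names are hypothetical and other scx.log message formats are not handled.

```python
import re
from collections import Counter

# Pattern modeled on the one log line quoted in this issue (assumption:
# all statvfs failures are logged in exactly this shape).
STATVFS_FAILURE = re.compile(
    r"statvfs\(\) failed for (?P<path>\S+); errno = (?P<errno>\d+)"
)


def count_statvfs_failures(lines):
    """Return a Counter mapping (path, errno) to its number of occurrences."""
    counts = Counter()
    for line in lines:
        match = STATVFS_FAILURE.search(line)
        if match:
            key = (match.group("path"), int(match.group("errno")))
            counts[key] += 1
    return counts


def summarize_log(log_path="/var/opt/microsoft/scx/log/scx.log"):
    """Print one line per failing path, most frequent first."""
    with open(log_path, errors="replace") as log:
        counts = count_statvfs_failures(log)
    for (path, err), n in counts.most_common():
        print(f"{n:6d}  errno={err}  {path}")
```

Calling summarize_log() on the affected host should list each overlay path that is still being polled, which makes it easier to tell whether the errors come from one removed container or from many.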

So far we have tried removing and reinstalling the OMS bundle. Is there something else we are doing wrong here?

@lagalbra
Author

@samisms @keikhara How do you currently handle Docker instances being removed from and added to the host where the OMSAgent is installed?

@kevi5702

kevi5702 commented Sep 29, 2017

Thanks for moving this, @lagalbra. I was under the impression this repo was only for instances where Docker was deployed inside a container.

For clarification, this is the latest OMSAgent deployed on a CentOS 7 VM on Azure. Things work smoothly until we redeploy our Docker instances on the VM; then we start seeing the above errors, which look like the agent is trying to stat the old bind mount locations that no longer exist.
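The errno = 2 in the log is ENOENT ("No such file or directory"), which is what statvfs() reports once the path is gone. The sketch below reproduces that outside the agent, using a temporary directory as a stand-in for a container's merged mount point; it is an illustration only and does not involve Docker or the SCX provider.

```python
import errno
import os
import tempfile


def statvfs_errno(path):
    """Return 0 if os.statvfs(path) succeeds, else the errno it failed with."""
    try:
        os.statvfs(path)
    except OSError as exc:
        return exc.errno
    return 0


def demo():
    """Stat a directory before and after it is removed."""
    # The temporary directory stands in for .../overlay/<id>/merged.
    mount_point = tempfile.mkdtemp()
    before = statvfs_errno(mount_point)
    os.rmdir(mount_point)  # simulate the container being removed
    after = statvfs_errno(mount_point)
    return before, after


if __name__ == "__main__":
    before, after = demo()
    print("before removal:", before)
    print("after removal: ", after, errno.errorcode.get(after))
```

Any process that keeps a cached list of mount points and polls them without re-checking that they exist would hit the same failure on every polling cycle.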

I was curious whether we set something up incorrectly. I know we can exclude /var/lib/docker/ from filesystem monitoring, but I was expecting the agent to be a bit more intelligent when containers are removed. Please let me know if we don't have something set up correctly or if I need to provide more information!

@kevi5702

kevi5702 commented Oct 2, 2017

Digging around, this error seems to match the code here in the PAL source:

https://github.com/Microsoft/pal/blob/master/source/code/scxsystemlib/disk/statisticallogicaldiskinstance.cpp#L269

I'm wondering if overlay needs to be added to the excludes somewhere. However, manually creating a basic overlay mount doesn't produce the statvfs errors when it's unmounted, only the warning about overlay not being recognized.

I was able to reproduce this on a new CentOS image with a fresh OMS deploy, just by running a basic hello world container.

Restarting omid.service appears to make this go away, so I'm not sure whether something needs to be notified to update the disk list when a container is removed.

This only seems to trigger when logical disk performance counters are enabled, and only when the file system is a Docker overlay FS mount.

@kevi5702

kevi5702 commented Oct 2, 2017

Running a test based on the old removal tasks for disks, I was able to clean this up without restarting omid.service:

[root@dockercentos log]# /opt/omi/bin/omicli iv root/scx { SCX_FileSystem } RemoveByName { Name /var/lib/docker/overlay/######/merged }
instance of SCX_FileSystem
{
    Caption=File system information
    Description=Information about a logical unit of secondary storage
    [Key] Name=/var/lib/docker/overlay/######/merged
    [Key] CSCreationClassName=SCX_ComputerSystem
    [Key] CSName=dockercentos
    [Key] CreationClassName=SCX_FileSystem
    Root=/var/lib/docker/overlay/######/merged
    BlockSize=0
    FileSystemSize=0
    AvailableSpace=0
    ReadOnly=false
    EncryptionMethod=Unknown
    CompressionMethod=Unknown
    CaseSensitive=true
    CasePreserved=true
    MaxFileNameLength=0
    FileSystemType=overlay
    PersistenceType=0
    IsOnline=false
}
instance of RemoveByName
{
    ReturnValue=true
}
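If there are many stale overlay paths, the same invocation could be generated per path rather than typed by hand. This is a rough sketch under stated assumptions: the omicli path and argument order are copied from the transcript above, the helper names are hypothetical, and "stale" is taken to mean simply that the directory no longer exists. It defaults to a dry run, since the transcript only shows RemoveByName working in one test.

```python
import os
import subprocess

# Path as it appears in the transcript above.
OMICLI = "/opt/omi/bin/omicli"


def build_remove_command(mount_path):
    """Build the argv for the RemoveByName invocation shown above."""
    return [
        OMICLI, "iv", "root/scx",
        "{", "SCX_FileSystem", "}",
        "RemoveByName",
        "{", "Name", mount_path, "}",
    ]


def stale_paths(candidate_paths):
    """Keep only the paths that no longer exist on the host (assumption)."""
    return [p for p in candidate_paths if not os.path.exists(p)]


def remove_stale(candidate_paths, dry_run=True):
    """Print one command per stale path; run it only if dry_run is False."""
    commands = [build_remove_command(p) for p in stale_paths(candidate_paths)]
    for argv in commands:
        print(" ".join(argv))
        if not dry_run:
            subprocess.run(argv, check=True)
    return commands
```

The candidate paths would have to come from somewhere else, for example the failing paths reported in scx.log. Passing the braces as separate list items mirrors how the shell splits the command line in the transcript.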

@samisms
Contributor

samisms commented Oct 3, 2017

Raised microsoft/SCXcore#89 with scx provider. Closing this here.

@samisms samisms closed this as completed Oct 3, 2017