accounting txt file not updated when corrupt VMs expire #49

Closed
timf opened this Issue May 12, 2011 · 1 comment


@timf
Nimbus member

The accounting file is out of sync with the system. When corrupt VMs expire, there is no corresponding REMOVED line. Reported by John Ouellette, thank you.

@oldpatricka
Nimbus member

After looking through John's logs, here's what I think happened here:

A VM starts:

2011-05-06 15:49:12,139 INFO  defaults.CreationManagerImpl
[ServiceThread-24,successPrint:1432] [NIMBUS-EVENT][id-12958]:

WORKSPACE INSTANCE CREATED:
    - Name: 'http://wst4'
    - Start time:                May 6, 2011 3:49:12 PM
    - Shutdown time:             May 8, 2011 3:49:12 PM
    - Resource termination time: May 8, 2011 3:51:12 PM
    - Creator: /C=CA/O=Grid/OU=hia.nrc.ca/CN=John Ouellette
    - ID: 12958, VMM: proc5-28.nope

Then, at 2011-05-06 16:22:34,791, a nimbus-full-reset happens.

Next, Nimbus gets a notification from the worker node about that VM,
but since the service has been reset, it no longer knows anything about
it:

2011-05-06 16:22:40,258 WARN  site.NotificationPoll
[Timer-1,oneNotification:113] received workspace-control notification
about unknown id 12958

So you never see a REMOVED entry for that VM in the accounting file.
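
To check a service log for other VMs caught by the same sequence, something like the following rough sketch might help. It assumes the log contains entries shaped like the two excerpts above; the function name `orphaned_ids` and both patterns are mine and are not part of Nimbus, so adjust them if your log layout differs.

```python
import re

# Patterns are inferred from the two log excerpts in this comment (assumption:
# other entries of the same kind look the same). \s+ tolerates line wrapping.
CREATED = re.compile(r"\[NIMBUS-EVENT\]\[id-(\d+)\]:\s*WORKSPACE INSTANCE CREATED")
UNKNOWN = re.compile(r"notification\s+about\s+unknown\s+id\s+(\d+)")


def orphaned_ids(log_text):
    """Return sorted IDs that appear both as created and as 'unknown id'.

    Hypothetical helper: it only inspects the service log text, not the
    accounting file, and it does not check the order of the two events.
    """
    created = {int(m.group(1)) for m in CREATED.finditer(log_text)}
    unknown = {int(m.group(1)) for m in UNKNOWN.finditer(log_text)}
    return sorted(created & unknown)
```

Any ID it returns is a candidate for a missing REMOVED line; it still needs to be compared against the accounting file by hand.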

@labisso closed this Jun 9, 2011