Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Timeouts (Minion did not return) since 2014.1.7 #14343

Closed
Whissi opened this issue Jul 18, 2014 · 56 comments
Closed

Timeouts (Minion did not return) since 2014.1.7 #14343

Whissi opened this issue Jul 18, 2014 · 56 comments
Labels
Bug broken, incorrect, or confusing behavior cannot-reproduce cannot be replicated with info/context provided Core relates to code central or existential to Salt P4 Priority 4 severity-medium 3rd level, incorrect or bad functionality, confusing and lacks a work around
Milestone

Comments

@Whissi
Copy link
Contributor

Whissi commented Jul 18, 2014

Hi,

when there was no Salt activity for ~30-60 minutes, the next Salt command (for example salt '*' --show-timeout cmd.run 'uname -a' will timeout for most minions.

I need to run this command 2-3x times... after 1-2 minutes all minions are back and now I can run my desired command without any timeouts.

This didn't happen with 2014.1.5.

# salt --versions-report
           Salt: 2014.1.7
         Python: 2.7.7 (default, Jul  6 2014, 23:14:10)
         Jinja2: 2.7.3
       M2Crypto: 0.22
 msgpack-python: 0.4.2
   msgpack-pure: Not Installed
       pycrypto: 2.6.1
         PyYAML: 3.11
          PyZMQ: 14.3.1
            ZMQ: 4.0.4
@basepi
Copy link
Contributor

basepi commented Jul 18, 2014

Very strange! Thanks for the report, we'll investigate this.

@basepi basepi added this to the Approved milestone Jul 18, 2014
@basepi
Copy link
Contributor

basepi commented Jul 18, 2014

When you run the commands, does the job show up in the minion log? Wondering if it's a loss of connection, or just a problem with the return data, or what...

@sigwinch28
Copy link

Experiencing the same issue. CentOS 7.

# salt --versions-report
           Salt: 2014.1.7
         Python: 2.7.5 (default, Jun 17 2014, 18:11:42)
         Jinja2: 2.7.2
       M2Crypto: 0.21.1
 msgpack-python: 0.1.13
   msgpack-pure: Not Installed
       pycrypto: 2.6.1
         PyYAML: 3.10
          PyZMQ: 14.3.1
            ZMQ: 3.2.4

No indications in minion logs apart from:

SaltReqTimeoutError: Waited 60 seconds

@basepi
Copy link
Contributor

basepi commented Jul 21, 2014

@JoeHazzers Did you also just start seeing it on 2014.1.7, or have you seen it previously?

@Whissi
Copy link
Contributor Author

Whissi commented Jul 21, 2014

It is not easy to reproduce the problem. Currently it is gone:

After posting this issue I enabled debug mode for master and 2 minions.

Also, we rebooted every server in our network over the weekend (kernel upgrade).

Since we rebooted every system we haven't seen this problem anymore. So it looks like a missing reboot or something like that. Or maybe running in debug mode doesn't trigger the bug?

@basepi
Copy link
Contributor

basepi commented Jul 21, 2014

I doubt debug log level affects this bug, but who knows? Keep us posted if you find a way to reproduce this issue, or if you get those logs that I mentioned.

@sigwinch28
Copy link

@basepi I have only seen this issue on 2014.1.7
However, like @Whissi after leaving it for a day or two and a system reboot, it now works fine. I can no longer reproduce the issue.

@basepi
Copy link
Contributor

basepi commented Jul 21, 2014

Very strange. I'll keep this open for now, keep us posted!

@Whissi
Copy link
Contributor Author

Whissi commented Jul 21, 2014

Now it happened again. No software was updated so I am 100% sure that there was no pending reboot.

But as said, debug log level was turned off. So nothing I can show you. I'll turn on logging again, maybe this will only happen when the master was running for some hours...

@basepi
Copy link
Contributor

basepi commented Jul 21, 2014

Curses. =) It happens. If you don't mind keeping debug logging on for a time, that would be helpful.

@Whissi
Copy link
Contributor Author

Whissi commented Jul 22, 2014

OK, here we go.I was able to reproduce the issue in a reduced test network:

Seems to be an authentication problem (Failed to authenticate message) which will happen sometimes.

# salt '*' --show-timeout cmd.run 'date'
srv5.example.intern:
    Tue Jul 22 16:39:17 CEST 2014
srv4.example.intern:
    Tue Jul 22 16:39:17 CEST 2014
srv2.example.intern:
    Tue Jul 22 16:39:17 CEST 2014
srv7.example.intern:
    Tue Jul 22 16:39:17 CEST 2014
srv6.example.intern:
    Tue Jul 22 16:39:17 CEST 2014
srv3.example.intern:
    Minion did not return
srv8.example.intern:
    Minion did not return

master debug log:

2014-07-22 16:15:52,608 [salt.fileserver                             ][DEBUG   ] Updating fileserver cache
2014-07-22 16:15:52,611 [salt.fileserver                             ][DEBUG   ] diff_mtime_map: the maps are the same
2014-07-22 16:15:52,611 [salt.utils.verify                           ][DEBUG   ] This salt-master instance has accepted 7 minion keys.
2014-07-22 16:16:52,677 [salt.fileserver                             ][DEBUG   ] Updating fileserver cache
2014-07-22 16:16:52,680 [salt.fileserver                             ][DEBUG   ] diff_mtime_map: the maps are the same
2014-07-22 16:16:52,680 [salt.utils.verify                           ][DEBUG   ] This salt-master instance has accepted 7 minion keys.
2014-07-22 16:17:52,746 [salt.fileserver                             ][DEBUG   ] Updating fileserver cache
2014-07-22 16:17:52,749 [salt.fileserver                             ][DEBUG   ] diff_mtime_map: the maps are the same
2014-07-22 16:17:52,749 [salt.utils.verify                           ][DEBUG   ] This salt-master instance has accepted 7 minion keys.
2014-07-22 16:18:52,815 [salt.fileserver                             ][DEBUG   ] Updating fileserver cache
2014-07-22 16:18:52,817 [salt.fileserver                             ][DEBUG   ] diff_mtime_map: the maps are the same
2014-07-22 16:18:52,818 [salt.utils.verify                           ][DEBUG   ] This salt-master instance has accepted 7 minion keys.
2014-07-22 16:19:52,883 [salt.fileserver                             ][DEBUG   ] Updating fileserver cache
2014-07-22 16:19:52,885 [salt.fileserver                             ][DEBUG   ] diff_mtime_map: the maps are the same
2014-07-22 16:19:52,886 [salt.utils.verify                           ][DEBUG   ] This salt-master instance has accepted 7 minion keys.
2014-07-22 16:20:05,690 [salt.master                                 ][INFO    ] Clear payload received with command _auth
2014-07-22 16:20:05,690 [salt.master                                 ][INFO    ] Authentication request from srv4.example.intern
2014-07-22 16:20:05,690 [salt.master                                 ][INFO    ] Authentication accepted from srv4.example.intern
2014-07-22 16:20:05,907 [salt.master                                 ][INFO    ] Clear payload received with command _auth
2014-07-22 16:20:05,907 [salt.master                                 ][INFO    ] Authentication request from srv4.example.intern
2014-07-22 16:20:05,907 [salt.master                                 ][INFO    ] Authentication accepted from srv4.example.intern
2014-07-22 16:20:06,096 [salt.master                                 ][INFO    ] AES payload received with command _mine
2014-07-22 16:20:08,293 [salt.master                                 ][INFO    ] Clear payload received with command _auth
2014-07-22 16:20:08,293 [salt.master                                 ][INFO    ] Authentication request from srv7.example.intern
2014-07-22 16:20:08,293 [salt.master                                 ][INFO    ] Authentication accepted from srv7.example.intern
2014-07-22 16:20:08,617 [salt.master                                 ][INFO    ] Clear payload received with command _auth
2014-07-22 16:20:08,617 [salt.master                                 ][INFO    ] Authentication request from srv7.example.intern
2014-07-22 16:20:08,618 [salt.master                                 ][INFO    ] Authentication accepted from srv7.example.intern
2014-07-22 16:20:08,877 [salt.master                                 ][INFO    ] AES payload received with command _mine
2014-07-22 16:20:52,944 [salt.fileserver                             ][DEBUG   ] Updating fileserver cache
2014-07-22 16:20:52,947 [salt.fileserver                             ][DEBUG   ] diff_mtime_map: the maps are the same
2014-07-22 16:20:52,947 [salt.utils.verify                           ][DEBUG   ] This salt-master instance has accepted 7 minion keys.
2014-07-22 16:21:53,004 [salt.fileserver                             ][DEBUG   ] Updating fileserver cache
2014-07-22 16:21:53,007 [salt.fileserver                             ][DEBUG   ] diff_mtime_map: the maps are the same
2014-07-22 16:21:53,007 [salt.utils.verify                           ][DEBUG   ] This salt-master instance has accepted 7 minion keys.
2014-07-22 16:22:53,073 [salt.fileserver                             ][DEBUG   ] Updating fileserver cache
2014-07-22 16:22:53,075 [salt.fileserver                             ][DEBUG   ] diff_mtime_map: the maps are the same
2014-07-22 16:22:53,076 [salt.utils.verify                           ][DEBUG   ] This salt-master instance has accepted 7 minion keys.
2014-07-22 16:23:53,141 [salt.fileserver                             ][DEBUG   ] Updating fileserver cache
2014-07-22 16:23:53,144 [salt.fileserver                             ][DEBUG   ] diff_mtime_map: the maps are the same
2014-07-22 16:23:53,144 [salt.utils.verify                           ][DEBUG   ] This salt-master instance has accepted 7 minion keys.
2014-07-22 16:24:53,209 [salt.fileserver                             ][DEBUG   ] Updating fileserver cache
2014-07-22 16:24:53,212 [salt.fileserver                             ][DEBUG   ] diff_mtime_map: the maps are the same
2014-07-22 16:24:53,213 [salt.utils.verify                           ][DEBUG   ] This salt-master instance has accepted 7 minion keys.
2014-07-22 16:25:53,278 [salt.fileserver                             ][DEBUG   ] Updating fileserver cache
2014-07-22 16:25:53,281 [salt.fileserver                             ][DEBUG   ] diff_mtime_map: the maps are the same
2014-07-22 16:25:53,281 [salt.utils.verify                           ][DEBUG   ] This salt-master instance has accepted 7 minion keys.
2014-07-22 16:26:53,347 [salt.fileserver                             ][DEBUG   ] Updating fileserver cache
2014-07-22 16:26:53,350 [salt.fileserver                             ][DEBUG   ] diff_mtime_map: the maps are the same
2014-07-22 16:26:53,350 [salt.utils.verify                           ][DEBUG   ] This salt-master instance has accepted 7 minion keys.
2014-07-22 16:27:53,415 [salt.fileserver                             ][DEBUG   ] Updating fileserver cache
2014-07-22 16:27:53,418 [salt.fileserver                             ][DEBUG   ] diff_mtime_map: the maps are the same
2014-07-22 16:27:53,418 [salt.utils.verify                           ][DEBUG   ] This salt-master instance has accepted 7 minion keys.
2014-07-22 16:28:53,484 [salt.fileserver                             ][DEBUG   ] Updating fileserver cache
2014-07-22 16:28:53,487 [salt.fileserver                             ][DEBUG   ] diff_mtime_map: the maps are the same
2014-07-22 16:28:53,487 [salt.utils.verify                           ][DEBUG   ] This salt-master instance has accepted 7 minion keys.
2014-07-22 16:29:53,553 [salt.fileserver                             ][DEBUG   ] Updating fileserver cache
2014-07-22 16:29:53,556 [salt.fileserver                             ][DEBUG   ] diff_mtime_map: the maps are the same
2014-07-22 16:29:53,556 [salt.utils.verify                           ][DEBUG   ] This salt-master instance has accepted 7 minion keys.
2014-07-22 16:30:08,595 [salt.master                                 ][INFO    ] Clear payload received with command _auth
2014-07-22 16:30:08,595 [salt.master                                 ][INFO    ] Authentication request from srv6.example.intern
2014-07-22 16:30:08,596 [salt.master                                 ][INFO    ] Authentication accepted from srv6.example.intern
2014-07-22 16:30:08,812 [salt.master                                 ][INFO    ] Clear payload received with command _auth
2014-07-22 16:30:08,812 [salt.master                                 ][INFO    ] Authentication request from srv6.example.intern
2014-07-22 16:30:08,813 [salt.master                                 ][INFO    ] Authentication accepted from srv6.example.intern
2014-07-22 16:30:09,001 [salt.master                                 ][INFO    ] AES payload received with command _mine
2014-07-22 16:30:44,313 [salt.master                                 ][INFO    ] Clear payload received with command _auth
2014-07-22 16:30:44,313 [salt.master                                 ][INFO    ] Authentication request from srv5.example.intern
2014-07-22 16:30:44,313 [salt.master                                 ][INFO    ] Authentication accepted from srv5.example.intern
2014-07-22 16:30:44,529 [salt.master                                 ][INFO    ] Clear payload received with command _auth
2014-07-22 16:30:44,529 [salt.master                                 ][INFO    ] Authentication request from srv5.example.intern
2014-07-22 16:30:44,529 [salt.master                                 ][INFO    ] Authentication accepted from srv5.example.intern
2014-07-22 16:30:44,718 [salt.master                                 ][INFO    ] AES payload received with command _mine
2014-07-22 16:30:53,622 [salt.fileserver                             ][DEBUG   ] Updating fileserver cache
2014-07-22 16:30:53,625 [salt.fileserver                             ][DEBUG   ] diff_mtime_map: the maps are the same
2014-07-22 16:30:53,625 [salt.utils.verify                           ][DEBUG   ] This salt-master instance has accepted 7 minion keys.
2014-07-22 16:31:48,021 [salt.master                                 ][INFO    ] Clear payload received with command _auth
2014-07-22 16:31:48,021 [salt.master                                 ][INFO    ] Authentication request from srv3.example.intern
2014-07-22 16:31:48,022 [salt.master                                 ][INFO    ] Authentication accepted from srv3.example.intern
2014-07-22 16:31:48,333 [salt.master                                 ][INFO    ] Clear payload received with command _auth
2014-07-22 16:31:48,333 [salt.master                                 ][INFO    ] Authentication request from srv3.example.intern
2014-07-22 16:31:48,334 [salt.master                                 ][INFO    ] Authentication accepted from srv3.example.intern
2014-07-22 16:31:48,586 [salt.master                                 ][INFO    ] AES payload received with command _mine
2014-07-22 16:31:53,690 [salt.fileserver                             ][DEBUG   ] Updating fileserver cache
2014-07-22 16:31:53,693 [salt.fileserver                             ][DEBUG   ] diff_mtime_map: the maps are the same
2014-07-22 16:31:53,694 [salt.utils.verify                           ][DEBUG   ] This salt-master instance has accepted 7 minion keys.
2014-07-22 16:32:53,759 [salt.fileserver                             ][DEBUG   ] Updating fileserver cache
2014-07-22 16:32:53,762 [salt.fileserver                             ][DEBUG   ] diff_mtime_map: the maps are the same
2014-07-22 16:32:53,762 [salt.utils.verify                           ][DEBUG   ] This salt-master instance has accepted 7 minion keys.
2014-07-22 16:33:53,828 [salt.fileserver                             ][DEBUG   ] Updating fileserver cache
2014-07-22 16:33:53,830 [salt.fileserver                             ][DEBUG   ] diff_mtime_map: the maps are the same
2014-07-22 16:33:53,831 [salt.utils.verify                           ][DEBUG   ] This salt-master instance has accepted 7 minion keys.
2014-07-22 16:34:53,866 [salt.fileserver                             ][DEBUG   ] Updating fileserver cache
2014-07-22 16:34:53,869 [salt.fileserver                             ][DEBUG   ] diff_mtime_map: the maps are the same
2014-07-22 16:34:53,869 [salt.utils.verify                           ][DEBUG   ] This salt-master instance has accepted 7 minion keys.
2014-07-22 16:34:57,158 [salt.master                                 ][INFO    ] Clear payload received with command _auth
2014-07-22 16:34:57,158 [salt.master                                 ][INFO    ] Authentication request from srv2.example.intern
2014-07-22 16:34:57,159 [salt.master                                 ][INFO    ] Authentication accepted from srv2.example.intern
2014-07-22 16:34:57,374 [salt.master                                 ][INFO    ] Clear payload received with command _auth
2014-07-22 16:34:57,375 [salt.master                                 ][INFO    ] Authentication request from srv2.example.intern
2014-07-22 16:34:57,375 [salt.master                                 ][INFO    ] Authentication accepted from srv2.example.intern
2014-07-22 16:34:57,564 [salt.master                                 ][INFO    ] AES payload received with command _mine
2014-07-22 16:35:53,934 [salt.fileserver                             ][DEBUG   ] Updating fileserver cache
2014-07-22 16:35:53,937 [salt.fileserver                             ][DEBUG   ] diff_mtime_map: the maps are the same
2014-07-22 16:35:53,938 [salt.utils.verify                           ][DEBUG   ] This salt-master instance has accepted 7 minion keys.
2014-07-22 16:36:54,003 [salt.fileserver                             ][DEBUG   ] Updating fileserver cache
2014-07-22 16:36:54,006 [salt.fileserver                             ][DEBUG   ] diff_mtime_map: the maps are the same
2014-07-22 16:36:54,006 [salt.utils.verify                           ][DEBUG   ] This salt-master instance has accepted 7 minion keys.
2014-07-22 16:37:54,072 [salt.fileserver                             ][DEBUG   ] Updating fileserver cache
2014-07-22 16:37:54,074 [salt.fileserver                             ][DEBUG   ] diff_mtime_map: the maps are the same
2014-07-22 16:37:54,075 [salt.utils.verify                           ][DEBUG   ] This salt-master instance has accepted 7 minion keys.
2014-07-22 16:38:54,138 [salt.fileserver                             ][DEBUG   ] Updating fileserver cache
2014-07-22 16:38:54,141 [salt.fileserver                             ][DEBUG   ] diff_mtime_map: the maps are the same
2014-07-22 16:38:54,142 [salt.utils.verify                           ][DEBUG   ] This salt-master instance has accepted 7 minion keys.
2014-07-22 16:39:17,125 [salt.master                                 ][INFO    ] Clear payload received with command publish
2014-07-22 16:39:17,126 [salt.master                                 ][INFO    ] User root Published command cmd.run with jid 20140722163917125243
2014-07-22 16:39:17,126 [salt.master                                 ][DEBUG   ] Published command details {'tgt_type': 'glob', 'jid': '20140722163917125243', 'tgt': '*', 'ret': '', 'user': 'root', 'arg': ['date'], 'fun': 'cmd.run'}
2014-07-22 16:39:17,143 [salt.master                                 ][INFO    ] AES payload received with command _return
2014-07-22 16:39:17,144 [salt.master                                 ][INFO    ] Got return from srv5.example.intern for job 20140722163917125243
2014-07-22 16:39:17,145 [salt.master                                 ][INFO    ] AES payload received with command _return
2014-07-22 16:39:17,145 [salt.master                                 ][INFO    ] Got return from srv4.example.intern for job 20140722163917125243
2014-07-22 16:39:17,146 [salt.master                                 ][INFO    ] AES payload received with command _return
2014-07-22 16:39:17,146 [salt.master                                 ][INFO    ] Got return from srv2.example.intern for job 20140722163917125243
2014-07-22 16:39:17,169 [salt.master                                 ][INFO    ] AES payload received with command _return
2014-07-22 16:39:17,169 [salt.master                                 ][INFO    ] Got return from srv7.example.intern for job 20140722163917125243
2014-07-22 16:39:17,170 [salt.master                                 ][INFO    ] AES payload received with command _return
2014-07-22 16:39:17,170 [salt.master                                 ][INFO    ] Got return from srv6.example.intern for job 20140722163917125243
2014-07-22 16:39:22,135 [salt.master                                 ][INFO    ] Clear payload received with command publish
2014-07-22 16:39:22,136 [salt.master                                 ][INFO    ] User root Published command saltutil.find_job with jid 20140722163922135841
2014-07-22 16:39:22,137 [salt.master                                 ][DEBUG   ] Published command details {'tgt_type': 'glob', 'jid': '20140722163922135841', 'tgt': '*', 'ret': '', 'user': 'root', 'arg': ['20140722163917125243'], 'fun': 'saltutil.find_job'}
2014-07-22 16:39:22,144 [salt.master                                 ][INFO    ] AES payload received with command _return
2014-07-22 16:39:22,144 [salt.master                                 ][INFO    ] AES payload received with command _return
2014-07-22 16:39:22,145 [salt.master                                 ][INFO    ] Got return from srv2.example.intern for job 20140722163922135841
2014-07-22 16:39:22,145 [salt.master                                 ][INFO    ] Got return from srv5.example.intern for job 20140722163922135841
2014-07-22 16:39:22,145 [salt.master                                 ][INFO    ] AES payload received with command _return
2014-07-22 16:39:22,145 [salt.master                                 ][INFO    ] Got return from srv4.example.intern for job 20140722163922135841
2014-07-22 16:39:22,145 [salt.master                                 ][INFO    ] AES payload received with command _return
2014-07-22 16:39:22,146 [salt.master                                 ][INFO    ] Got return from srv6.example.intern for job 20140722163922135841
2014-07-22 16:39:22,162 [salt.master                                 ][INFO    ] AES payload received with command _return
2014-07-22 16:39:22,162 [salt.master                                 ][INFO    ] Got return from srv7.example.intern for job 20140722163922135841
2014-07-22 16:39:27,200 [salt.master                                 ][INFO    ] Clear payload received with command _auth
2014-07-22 16:39:27,200 [salt.master                                 ][INFO    ] Authentication request from srv3.example.intern
2014-07-22 16:39:27,201 [salt.master                                 ][INFO    ] Authentication accepted from srv3.example.intern
2014-07-22 16:39:27,481 [salt.master                                 ][INFO    ] AES payload received with command _return
2014-07-22 16:39:27,481 [salt.master                                 ][INFO    ] Got return from srv3.example.intern for job 20140722163917125243
2014-07-22 16:39:27,483 [salt.master                                 ][INFO    ] AES payload received with command _return
2014-07-22 16:39:27,483 [salt.master                                 ][INFO    ] Got return from srv3.example.intern for job 20140722163922135841
2014-07-22 16:39:54,179 [salt.fileserver                             ][DEBUG   ] Updating fileserver cache
2014-07-22 16:39:54,182 [salt.fileserver                             ][DEBUG   ] diff_mtime_map: the maps are the same
2014-07-22 16:39:54,182 [salt.utils.verify                           ][DEBUG   ] This salt-master instance has accepted 7 minion keys.
2014-07-22 16:40:02,253 [salt.master                                 ][INFO    ] Clear payload received with command _auth
2014-07-22 16:40:02,253 [salt.master                                 ][INFO    ] Authentication request from srv8.example.intern
2014-07-22 16:40:02,254 [salt.master                                 ][INFO    ] Authentication accepted from srv8.example.intern
2014-07-22 16:40:02,580 [salt.master                                 ][INFO    ] AES payload received with command _return
2014-07-22 16:40:02,580 [salt.master                                 ][INFO    ] Got return from srv8.example.intern for job 20140722163917125243
2014-07-22 16:40:02,584 [salt.master                                 ][INFO    ] AES payload received with command _return
2014-07-22 16:40:02,585 [salt.master                                 ][INFO    ] Got return from srv8.example.intern for job 20140722163922135841
2014-07-22 16:40:54,248 [salt.fileserver                             ][DEBUG   ] Updating fileserver cache
2014-07-22 16:40:54,251 [salt.fileserver                             ][DEBUG   ] diff_mtime_map: the maps are the same
2014-07-22 16:40:54,251 [salt.utils.verify                           ][DEBUG   ] This salt-master instance has accepted 7 minion keys.
2014-07-22 16:41:54,317 [salt.fileserver                             ][DEBUG   ] Updating fileserver cache
2014-07-22 16:41:54,320 [salt.fileserver                             ][DEBUG   ] diff_mtime_map: the maps are the same
2014-07-22 16:41:54,320 [salt.utils.verify                           ][DEBUG   ] This salt-master instance has accepted 7 minion keys.
2014-07-22 16:42:54,386 [salt.fileserver                             ][DEBUG   ] Updating fileserver cache
2014-07-22 16:42:54,389 [salt.fileserver                             ][DEBUG   ] diff_mtime_map: the maps are the same
2014-07-22 16:42:54,389 [salt.utils.verify                           ][DEBUG   ] This salt-master instance has accepted 7 minion keys.
2014-07-22 16:43:54,454 [salt.fileserver                             ][DEBUG   ] Updating fileserver cache
2014-07-22 16:43:54,457 [salt.fileserver                             ][DEBUG   ] diff_mtime_map: the maps are the same
2014-07-22 16:43:54,458 [salt.utils.verify                           ][DEBUG   ] This salt-master instance has accepted 7 minion keys.

srv3 minion debug log:

2014-07-22 14:31:42,426 [salt             ][INFO    ] Setting up the Salt Minion "srv3.example.intern"
2014-07-22 14:31:42,431 [salt.utils.process][DEBUG   ] Created pidfile: /run/salt-minion.pid
2014-07-22 14:31:42,432 [salt.config      ][DEBUG   ] Reading configuration from /etc/salt/minion
2014-07-22 14:31:42,509 [salt.config      ][DEBUG   ] Including configuration from '/etc/salt/minion.d/myMinion.conf'
2014-07-22 14:31:42,509 [salt.config      ][DEBUG   ] Reading configuration from /etc/salt/minion.d/myMinion.conf
2014-07-22 14:31:42,783 [salt.minion                              ][DEBUG   ] Attempting to authenticate with the Salt Master at 10.82.0.1
2014-07-22 14:31:42,792 [salt.crypt                               ][DEBUG   ] Loaded minion key: /etc/salt/pki/minion/minion.pem
2014-07-22 14:31:42,851 [salt.crypt                               ][DEBUG   ] Loaded minion key: /etc/salt/pki/minion/minion.pem
2014-07-22 14:31:42,994 [salt.crypt                               ][DEBUG   ] Decrypting the current master AES key
2014-07-22 14:31:42,994 [salt.crypt                               ][DEBUG   ] Loaded minion key: /etc/salt/pki/minion/minion.pem
2014-07-22 14:31:43,109 [salt.minion                              ][INFO    ] Authentication with master successful!
2014-07-22 14:31:43,110 [salt.crypt                               ][DEBUG   ] Loaded minion key: /etc/salt/pki/minion/minion.pem
2014-07-22 14:31:43,249 [salt.crypt                               ][DEBUG   ] Decrypting the current master AES key
2014-07-22 14:31:43,249 [salt.crypt                               ][DEBUG   ] Loaded minion key: /etc/salt/pki/minion/minion.pem
2014-07-22 14:31:43,384 [salt.crypt                               ][DEBUG   ] Loaded minion key: /etc/salt/pki/minion/minion.pem
2014-07-22 14:31:43,443 [salt.config                              ][DEBUG   ] Reading configuration from /etc/salt/minion
2014-07-22 14:31:43,521 [salt.config                              ][DEBUG   ] Including configuration from '/etc/salt/minion.d/myMinion.conf'
2014-07-22 14:31:43,521 [salt.config                              ][DEBUG   ] Reading configuration from /etc/salt/minion.d/myMinion.conf
2014-07-22 14:31:47,494 [salt.loader                              ][DEBUG   ] Loaded localemod as virtual locale
2014-07-22 14:31:47,495 [salt.loader                              ][DEBUG   ] Loaded groupadd as virtual group
2014-07-22 14:31:47,497 [salt.loader                              ][DEBUG   ] Loaded ebuild as virtual pkg
2014-07-22 14:31:47,501 [salt.loader                              ][DEBUG   ] Loaded parted as virtual partition
2014-07-22 14:31:47,503 [salt.loader                              ][DEBUG   ] Loaded linux_sysctl as virtual sysctl
2014-07-22 14:31:47,523 [salt.loader                              ][DEBUG   ] Loaded sysmod as virtual sys
2014-07-22 14:31:47,530 [salt.loader                              ][DEBUG   ] Loaded zcbuildout as virtual buildout
2014-07-22 14:31:47,532 [salt.loader                              ][DEBUG   ] Loaded djangomod as virtual django
2014-07-22 14:31:47,533 [salt.loader                              ][DEBUG   ] Loaded useradd as virtual user
2014-07-22 14:31:47,535 [salt.loader                              ][DEBUG   ] Loaded gentoo_service as virtual service
2014-07-22 14:31:47,548 [salt.loader                              ][DEBUG   ] Loaded virtualenv_mod as virtual virtualenv
2014-07-22 14:31:47,550 [salt.loader                              ][DEBUG   ] Loaded linux_acl as virtual acl
2014-07-22 14:31:47,551 [salt.loader                              ][DEBUG   ] Loaded cmdmod as virtual cmd
2014-07-22 14:31:47,562 [salt.loader                              ][DEBUG   ] Loaded gentoolkitmod as virtual gentoolkit
2014-07-22 14:31:47,645 [salt.loader                                 ][DEBUG   ] Loaded syslog_return as virtual syslog
2014-07-22 14:31:47,646 [salt.loader                                 ][DEBUG   ] Loaded couchdb_return as virtual couchdb
2014-07-22 14:31:47,646 [salt.loader                                 ][DEBUG   ] Loaded carbon_return as virtual carbon
2014-07-22 14:31:47,647 [salt.minion                                 ][DEBUG   ] I am srv3.example.intern and I am not supposed to start any proxies.
2014-07-22 14:31:47,647 [salt.minion                                 ][INFO    ] Minion is starting as user 'root'
2014-07-22 14:31:47,647 [salt.minion                                 ][DEBUG   ] Minion 'srv3.example.intern' trying to tune in
2014-07-22 14:31:47,648 [salt.minion                                 ][DEBUG   ] Minion PUB socket URI: ipc:///var/run/salt/minion/minion_event_a9a4e31ca4566be0615108ef7b1891be_pub.ipc
2014-07-22 14:31:47,648 [salt.minion                                 ][DEBUG   ] Minion PULL socket URI: ipc:///var/run/salt/minion/minion_event_a9a4e31ca4566be0615108ef7b1891be_pull.ipc
2014-07-22 14:31:47,649 [salt.minion                                 ][DEBUG   ] Generated random reconnect delay between '1000ms' and '11000ms' (7372)
2014-07-22 14:31:47,649 [salt.minion                                 ][DEBUG   ] Setting zmq_reconnect_ivl to '7372ms'
2014-07-22 14:31:47,649 [salt.minion                                 ][DEBUG   ] Setting zmq_reconnect_ivl_max to '11000ms'
2014-07-22 14:31:48,157 [salt.utils.schedule                         ][DEBUG   ] Running scheduled job: __mine_interval
2014-07-22 14:31:48,157 [salt.utils.schedule                         ][DEBUG   ] schedule: This job was scheduled with jid_include, adding to cache (jid_include defaults to True)
2014-07-22 14:31:48,158 [salt.utils.schedule                         ][DEBUG   ] schedule: This job was scheduled with a max number of 2
2014-07-22 14:31:48,175 [salt.utils.schedule                         ][DEBUG   ] schedule.handle_func: adding this job to the jobcache with data {'fun': 'mine.update', 'jid': '20140722143148168090', 'pid': 4603, 'id': 'srv3.example.intern'}
2014-07-22 14:31:48,178 [salt.crypt                                  ][DEBUG   ] Loaded minion key: /etc/salt/pki/minion/minion.pem
2014-07-22 14:31:48,318 [salt.crypt                                  ][DEBUG   ] Decrypting the current master AES key
2014-07-22 14:31:48,319 [salt.crypt                                  ][DEBUG   ] Loaded minion key: /etc/salt/pki/minion/minion.pem
2014-07-22 14:31:48,435 [salt.crypt                                  ][DEBUG   ] Loaded minion key: /etc/salt/pki/minion/minion.pem
2014-07-22 14:31:48,493 [salt.crypt                                  ][DEBUG   ] Loaded minion key: /etc/salt/pki/minion/minion.pem
2014-07-22 14:31:48,634 [salt.crypt                                  ][DEBUG   ] Decrypting the current master AES key
2014-07-22 14:31:48,634 [salt.crypt                                  ][DEBUG   ] Loaded minion key: /etc/salt/pki/minion/minion.pem
2014-07-22 14:33:36,976 [salt.minion                                 ][INFO    ] User root Executing command cmd.run with jid 20140722143336972835
2014-07-22 14:33:36,976 [salt.minion                                 ][DEBUG   ] Command details {'tgt_type': 'glob', 'jid': '20140722143336972835', 'tgt': '*', 'ret': '', 'user': 'root', 'arg': ['grep mgadrmfb /var/log/dmesg'], 'fun': 'cmd.run'}
2014-07-22 14:33:36,996 [salt.loaded.int.module.cmdmod               ][INFO    ] Executing command 'grep mgadrmfb /var/log/dmesg' in directory '/'
2014-07-22 14:33:37,007 [salt.loaded.int.module.cmdmod               ][INFO    ] output: [    4.376982] fbcon: mgadrmfb (fb0) is primary device
[    4.695993] mgag200 0000:03:00.0: fb0: mgadrmfb frame buffer device
2014-07-22 14:33:37,007 [salt.minion                                 ][INFO    ] Returning information for job: 20140722143336972835
2014-07-22 15:31:48,932 [salt.utils.schedule                         ][DEBUG   ] Running scheduled job: __mine_interval
2014-07-22 15:31:48,932 [salt.utils.schedule                         ][DEBUG   ] schedule: This job was scheduled with jid_include, adding to cache (jid_include defaults to True)
2014-07-22 15:31:48,933 [salt.utils.schedule                         ][DEBUG   ] schedule: This job was scheduled with a max number of 2
2014-07-22 15:31:48,945 [salt.utils.schedule                         ][DEBUG   ] schedule.handle_func: adding this job to the jobcache with data {'fun': 'mine.update', 'jid': '20140722153148937826', 'pid': 5110, 'id': 'srv3.example.intern'}
2014-07-22 15:31:48,948 [salt.crypt                                  ][DEBUG   ] Loaded minion key: /etc/salt/pki/minion/minion.pem
2014-07-22 15:31:49,107 [salt.crypt                                  ][DEBUG   ] Decrypting the current master AES key
2014-07-22 15:31:49,107 [salt.crypt                                  ][DEBUG   ] Loaded minion key: /etc/salt/pki/minion/minion.pem
2014-07-22 15:31:49,223 [salt.crypt                                  ][DEBUG   ] Loaded minion key: /etc/salt/pki/minion/minion.pem
2014-07-22 15:31:49,282 [salt.crypt                                  ][DEBUG   ] Loaded minion key: /etc/salt/pki/minion/minion.pem
2014-07-22 15:31:49,420 [salt.crypt                                  ][DEBUG   ] Decrypting the current master AES key
2014-07-22 15:31:49,420 [salt.crypt                                  ][DEBUG   ] Loaded minion key: /etc/salt/pki/minion/minion.pem
2014-07-22 16:31:48,002 [salt.utils.schedule                         ][DEBUG   ] Running scheduled job: __mine_interval
2014-07-22 16:31:48,003 [salt.utils.schedule                         ][DEBUG   ] schedule: This job was scheduled with jid_include, adding to cache (jid_include defaults to True)
2014-07-22 16:31:48,003 [salt.utils.schedule                         ][DEBUG   ] schedule: This job was scheduled with a max number of 2
2014-07-22 16:31:48,015 [salt.utils.schedule                         ][DEBUG   ] schedule.handle_func: adding this job to the jobcache with data {'fun': 'mine.update', 'jid': '20140722163148008141', 'pid': 5382, 'id': 'srv3.example.intern'}
2014-07-22 16:31:48,018 [salt.crypt                                  ][DEBUG   ] Loaded minion key: /etc/salt/pki/minion/minion.pem
2014-07-22 16:31:48,157 [salt.crypt                                  ][DEBUG   ] Decrypting the current master AES key
2014-07-22 16:31:48,158 [salt.crypt                                  ][DEBUG   ] Loaded minion key: /etc/salt/pki/minion/minion.pem
2014-07-22 16:31:48,273 [salt.crypt                                  ][DEBUG   ] Loaded minion key: /etc/salt/pki/minion/minion.pem
2014-07-22 16:31:48,332 [salt.crypt                                  ][DEBUG   ] Loaded minion key: /etc/salt/pki/minion/minion.pem
2014-07-22 16:31:48,469 [salt.crypt                                  ][DEBUG   ] Decrypting the current master AES key
2014-07-22 16:31:48,470 [salt.crypt                                  ][DEBUG   ] Loaded minion key: /etc/salt/pki/minion/minion.pem
2014-07-22 16:39:17,128 [salt.crypt                                  ][DEBUG   ] Failed to authenticate message
2014-07-22 16:39:17,129 [salt.minion                                 ][DEBUG   ] Waiting 10 seconds to re-authenticate
2014-07-22 16:39:27,139 [salt.minion                                 ][DEBUG   ] Attempting to authenticate with the Salt Master at 10.82.0.1
2014-07-22 16:39:27,140 [salt.crypt                                  ][DEBUG   ] Loaded minion key: /etc/salt/pki/minion/minion.pem
2014-07-22 16:39:27,199 [salt.crypt                                  ][DEBUG   ] Loaded minion key: /etc/salt/pki/minion/minion.pem
2014-07-22 16:39:27,337 [salt.crypt                                  ][DEBUG   ] Decrypting the current master AES key
2014-07-22 16:39:27,337 [salt.crypt                                  ][DEBUG   ] Loaded minion key: /etc/salt/pki/minion/minion.pem
2014-07-22 16:39:27,453 [salt.minion                                 ][INFO    ] Authentication with master successful!
2014-07-22 16:39:27,453 [salt.minion                                 ][INFO    ] User root Executing command cmd.run with jid 20140722163917125243
2014-07-22 16:39:27,453 [salt.minion                                 ][DEBUG   ] Command details {'tgt_type': 'glob', 'jid': '20140722163917125243', 'tgt': '*', 'ret': '', 'user': 'root', 'arg': ['date'], 'fun': 'cmd.run'}
2014-07-22 16:39:27,465 [salt.minion                                 ][INFO    ] User root Executing command saltutil.find_job with jid 20140722163922135841
2014-07-22 16:39:27,465 [salt.minion                                 ][DEBUG   ] Command details {'tgt_type': 'glob', 'jid': '20140722163922135841', 'tgt': '*', 'ret': '', 'user': 'root', 'arg': ['20140722163917125243'], 'fun': 'saltutil.find_job'}
2014-07-22 16:39:27,468 [salt.loaded.int.module.cmdmod               ][INFO    ] Executing command 'date' in directory '/'
2014-07-22 16:39:27,478 [salt.loaded.int.module.cmdmod               ][INFO    ] output: Tue Jul 22 16:39:27 CEST 2014
2014-07-22 16:39:27,479 [salt.minion                                 ][INFO    ] Returning information for job: 20140722163917125243
2014-07-22 16:39:27,481 [salt.minion                                 ][INFO    ] Returning information for job: 20140722163922135841

srv8 minion debug log:

2014-07-22 14:45:45,407 [salt             ][INFO    ] Setting up the Salt Minion "srv8.example.intern"
2014-07-22 14:45:45,420 [salt.utils.process][DEBUG   ] Created pidfile: /run/salt-minion.pid
2014-07-22 14:45:45,421 [salt.config      ][DEBUG   ] Reading configuration from /etc/salt/minion
2014-07-22 14:45:45,484 [salt.config      ][DEBUG   ] Including configuration from '/etc/salt/minion.d/myMinion.conf'
2014-07-22 14:45:45,484 [salt.config      ][DEBUG   ] Reading configuration from /etc/salt/minion.d/myMinion.conf
2014-07-22 14:45:45,702 [salt.minion                              ][DEBUG   ] Attempting to authenticate with the Salt Master at 10.82.0.1
2014-07-22 14:45:45,716 [salt.crypt                               ][DEBUG   ] Loaded minion key: /etc/salt/pki/minion/minion.pem
2014-07-22 14:45:45,790 [salt.crypt                               ][DEBUG   ] Loaded minion key: /etc/salt/pki/minion/minion.pem
2014-07-22 14:45:45,944 [salt.crypt                               ][DEBUG   ] Decrypting the current master AES key
2014-07-22 14:45:45,944 [salt.crypt                               ][DEBUG   ] Loaded minion key: /etc/salt/pki/minion/minion.pem
2014-07-22 14:45:46,082 [salt.minion                              ][INFO    ] Authentication with master successful!
2014-07-22 14:45:46,083 [salt.crypt                               ][DEBUG   ] Loaded minion key: /etc/salt/pki/minion/minion.pem
2014-07-22 14:45:46,224 [salt.crypt                               ][DEBUG   ] Decrypting the current master AES key
2014-07-22 14:45:46,224 [salt.crypt                               ][DEBUG   ] Loaded minion key: /etc/salt/pki/minion/minion.pem
2014-07-22 14:45:46,384 [salt.crypt                               ][DEBUG   ] Loaded minion key: /etc/salt/pki/minion/minion.pem
2014-07-22 14:45:46,454 [salt.config                              ][DEBUG   ] Reading configuration from /etc/salt/minion
2014-07-22 14:45:46,516 [salt.config                              ][DEBUG   ] Including configuration from '/etc/salt/minion.d/myMinion.conf'
2014-07-22 14:45:46,517 [salt.config                              ][DEBUG   ] Reading configuration from /etc/salt/minion.d/myMinion.conf
2014-07-22 14:45:49,874 [salt.loader                              ][DEBUG   ] Loaded localemod as virtual locale
2014-07-22 14:45:49,875 [salt.loader                              ][DEBUG   ] Loaded groupadd as virtual group
2014-07-22 14:45:49,877 [salt.loader                              ][DEBUG   ] Loaded ebuild as virtual pkg
2014-07-22 14:45:49,881 [salt.loader                              ][DEBUG   ] Loaded linux_sysctl as virtual sysctl
2014-07-22 14:45:49,887 [salt.loader                              ][DEBUG   ] Loaded parted as virtual partition
2014-07-22 14:45:49,894 [salt.loader                              ][DEBUG   ] Loaded linux_acl as virtual acl
2014-07-22 14:45:49,903 [salt.loader                              ][DEBUG   ] Loaded sysmod as virtual sys
2014-07-22 14:45:49,910 [salt.loader                              ][DEBUG   ] Loaded zcbuildout as virtual buildout
2014-07-22 14:45:49,912 [salt.loader                              ][DEBUG   ] Loaded useradd as virtual user
2014-07-22 14:45:49,913 [salt.loader                              ][DEBUG   ] Loaded gentoo_service as virtual service
2014-07-22 14:45:49,925 [salt.loader                              ][DEBUG   ] Loaded virtualenv_mod as virtual virtualenv
2014-07-22 14:45:49,925 [salt.loader                              ][DEBUG   ] Loaded djangomod as virtual django
2014-07-22 14:45:49,930 [salt.loader                              ][DEBUG   ] Loaded cmdmod as virtual cmd
2014-07-22 14:45:49,935 [salt.loader                              ][DEBUG   ] Loaded gentoolkitmod as virtual gentoolkit
2014-07-22 14:45:49,936 [salt.loader                              ][DEBUG   ] Loaded linux_lvm as virtual lvm
2014-07-22 14:45:49,992 [salt.loader                                 ][DEBUG   ] Loaded syslog_return as virtual syslog
2014-07-22 14:45:49,992 [salt.loader                                 ][DEBUG   ] Loaded couchdb_return as virtual couchdb
2014-07-22 14:45:49,993 [salt.loader                                 ][DEBUG   ] Loaded carbon_return as virtual carbon
2014-07-22 14:45:49,993 [salt.minion                                 ][DEBUG   ] I am srv8.example.intern and I am not supposed to start any proxies.
2014-07-22 14:45:49,993 [salt.minion                                 ][INFO    ] Minion is starting as user 'root'
2014-07-22 14:45:49,993 [salt.minion                                 ][DEBUG   ] Minion 'srv8.example.intern' trying to tune in
2014-07-22 14:45:49,994 [salt.minion                                 ][DEBUG   ] Minion PUB socket URI: ipc:///var/run/salt/minion/minion_event_a71ed27d3670f6381b102a69f27141a8_pub.ipc
2014-07-22 14:45:49,994 [salt.minion                                 ][DEBUG   ] Minion PULL socket URI: ipc:///var/run/salt/minion/minion_event_a71ed27d3670f6381b102a69f27141a8_pull.ipc
2014-07-22 14:45:49,994 [salt.minion                                 ][DEBUG   ] Generated random reconnect delay between '1000ms' and '11000ms' (8725)
2014-07-22 14:45:49,994 [salt.minion                                 ][DEBUG   ] Setting zmq_reconnect_ivl to '8725ms'
2014-07-22 14:45:49,995 [salt.minion                                 ][DEBUG   ] Setting zmq_reconnect_ivl_max to '11000ms'
2014-07-22 14:45:50,507 [salt.utils.schedule                         ][DEBUG   ] Running scheduled job: __mine_interval
2014-07-22 14:45:50,508 [salt.utils.schedule                         ][DEBUG   ] schedule: This job was scheduled with jid_include, adding to cache (jid_include defaults to True)
2014-07-22 14:45:50,517 [salt.utils.schedule                         ][DEBUG   ] schedule: This job was scheduled with a max number of 2
2014-07-22 14:45:50,545 [salt.utils.schedule                         ][DEBUG   ] schedule.handle_func: adding this job to the jobcache with data {'fun': 'mine.update', 'jid': '20140722144550533394', 'pid': 4272, 'id': 'srv8.example.intern'}
2014-07-22 14:45:50,548 [salt.crypt                                  ][DEBUG   ] Loaded minion key: /etc/salt/pki/minion/minion.pem
2014-07-22 14:45:50,693 [salt.crypt                                  ][DEBUG   ] Decrypting the current master AES key
2014-07-22 14:45:50,694 [salt.crypt                                  ][DEBUG   ] Loaded minion key: /etc/salt/pki/minion/minion.pem
2014-07-22 14:45:50,832 [salt.crypt                                  ][DEBUG   ] Loaded minion key: /etc/salt/pki/minion/minion.pem
2014-07-22 14:45:50,903 [salt.crypt                                  ][DEBUG   ] Loaded minion key: /etc/salt/pki/minion/minion.pem
2014-07-22 14:45:51,047 [salt.crypt                                  ][DEBUG   ] Decrypting the current master AES key
2014-07-22 14:45:51,048 [salt.crypt                                  ][DEBUG   ] Loaded minion key: /etc/salt/pki/minion/minion.pem
2014-07-22 15:45:50,443 [salt.utils.schedule                         ][DEBUG   ] Running scheduled job: __mine_interval
2014-07-22 15:45:50,443 [salt.utils.schedule                         ][DEBUG   ] schedule: This job was scheduled with jid_include, adding to cache (jid_include defaults to True)
2014-07-22 15:45:50,443 [salt.utils.schedule                         ][DEBUG   ] schedule: This job was scheduled with a max number of 2
2014-07-22 15:45:50,461 [salt.utils.schedule                         ][DEBUG   ] schedule.handle_func: adding this job to the jobcache with data {'fun': 'mine.update', 'jid': '20140722154550449998', 'pid': 4503, 'id': 'srv8.example.intern'}
2014-07-22 15:45:50,464 [salt.crypt                                  ][DEBUG   ] Loaded minion key: /etc/salt/pki/minion/minion.pem
2014-07-22 15:45:50,614 [salt.crypt                                  ][DEBUG   ] Decrypting the current master AES key
2014-07-22 15:45:50,614 [salt.crypt                                  ][DEBUG   ] Loaded minion key: /etc/salt/pki/minion/minion.pem
2014-07-22 15:45:50,753 [salt.crypt                                  ][DEBUG   ] Loaded minion key: /etc/salt/pki/minion/minion.pem
2014-07-22 15:45:50,824 [salt.crypt                                  ][DEBUG   ] Loaded minion key: /etc/salt/pki/minion/minion.pem
2014-07-22 15:45:50,965 [salt.crypt                                  ][DEBUG   ] Decrypting the current master AES key
2014-07-22 15:45:50,965 [salt.crypt                                  ][DEBUG   ] Loaded minion key: /etc/salt/pki/minion/minion.pem
2014-07-22 16:39:17,126 [salt.crypt                                  ][DEBUG   ] Failed to authenticate message
2014-07-22 16:39:17,126 [salt.minion                                 ][DEBUG   ] Waiting 45 seconds to re-authenticate
2014-07-22 16:40:02,172 [salt.minion                                 ][DEBUG   ] Attempting to authenticate with the Salt Master at 10.82.0.1
2014-07-22 16:40:02,172 [salt.crypt                                  ][DEBUG   ] Loaded minion key: /etc/salt/pki/minion/minion.pem
2014-07-22 16:40:02,243 [salt.crypt                                  ][DEBUG   ] Loaded minion key: /etc/salt/pki/minion/minion.pem
2014-07-22 16:40:02,387 [salt.crypt                                  ][DEBUG   ] Decrypting the current master AES key
2014-07-22 16:40:02,388 [salt.crypt                                  ][DEBUG   ] Loaded minion key: /etc/salt/pki/minion/minion.pem
2014-07-22 16:40:02,526 [salt.minion                                 ][INFO    ] Authentication with master successful!
2014-07-22 16:40:02,527 [salt.minion                                 ][INFO    ] User root Executing command cmd.run with jid 20140722163917125243
2014-07-22 16:40:02,527 [salt.minion                                 ][DEBUG   ] Command details {'tgt_type': 'glob', 'jid': '20140722163917125243', 'tgt': '*', 'ret': '', 'user': 'root', 'arg': ['date'], 'fun': 'cmd.run'}
2014-07-22 16:40:02,545 [salt.minion                                 ][INFO    ] User root Executing command saltutil.find_job with jid 20140722163922135841
2014-07-22 16:40:02,546 [salt.minion                                 ][DEBUG   ] Command details {'tgt_type': 'glob', 'jid': '20140722163922135841', 'tgt': '*', 'ret': '', 'user': 'root', 'arg': ['20140722163917125243'], 'fun': 'saltutil.find_job'}
2014-07-22 16:40:02,553 [salt.loaded.int.module.cmdmod               ][INFO    ] Executing command 'date' in directory '/'
2014-07-22 16:40:02,570 [salt.loaded.int.module.cmdmod               ][INFO    ] output: Tue Jul 22 16:40:02 CEST 2014
2014-07-22 16:40:02,575 [salt.minion                                 ][INFO    ] Returning information for job: 20140722163917125243
2014-07-22 16:40:02,580 [salt.minion                                 ][INFO    ] Returning information for job: 20140722163922135841

Because the re-authentication time is different per client I think that's why it feels like it takes 1-2 minutes until all minions are back like stated in #14343 (comment)

Now the question is why authentication will fail sometimes. I am wondering that the master doesn't log any details regarding the failed authentication.

@basepi
Copy link
Contributor

basepi commented Jul 23, 2014

Well, interestingly enough, in both cases the failed authentication was before the actual command was received by master, so it doesn't seem like that's actually the issue. The command is received and returned quickly.

@Whissi
Copy link
Contributor Author

Whissi commented Jul 23, 2014

Here's another minion log:

2014-07-23 20:27:44,410 [salt.minion                                 ][INFO    ] Returning information for job: 20140723202605636921
2014-07-23 20:44:48,515 [salt.crypt                                  ][DEBUG   ] Failed to authenticate message
2014-07-23 20:44:48,515 [salt.minion                                 ][DEBUG   ] Waiting 12 seconds to re-authenticate
2014-07-23 20:45:00,527 [salt.minion                                 ][DEBUG   ] Attempting to authenticate with the Salt Master at 10.82.0.1
2014-07-23 20:45:00,528 [salt.crypt                                  ][DEBUG   ] Loaded minion key: /etc/salt/pki/minion/minion.pem
2014-07-23 20:45:00,598 [salt.crypt                                  ][DEBUG   ] Loaded minion key: /etc/salt/pki/minion/minion.pem
2014-07-23 20:45:00,747 [salt.crypt                                  ][DEBUG   ] Decrypting the current master AES key
2014-07-23 20:45:00,747 [salt.crypt                                  ][DEBUG   ] Loaded minion key: /etc/salt/pki/minion/minion.pem
2014-07-23 20:45:00,886 [salt.minion                                 ][INFO    ] Authentication with master successful!
2014-07-23 20:45:00,886 [salt.minion                                 ][INFO    ] User root Executing command cmd.run with jid 20140723204448511760
2014-07-23 20:45:00,886 [salt.minion                                 ][DEBUG   ] Command details {'tgt_type': 'glob', 'jid': '20140723204448511760', 'tgt': '*', 'ret': '', 'user': 'root', 'arg': ['date'], 'fun': 'cmd.run'}
2014-07-23 20:45:00,903 [salt.minion                                 ][INFO    ] User root Executing command saltutil.find_job with jid 20140723204453520459
2014-07-23 20:45:00,903 [salt.minion                                 ][DEBUG   ] Command details {'tgt_type': 'glob', 'jid': '20140723204453520459', 'tgt': '*', 'ret': '', 'user': 'root', 'arg': ['20140723204448511760'], 'fun': 'saltutil.find_job'}
2014-07-23 20:45:00,914 [salt.loaded.int.module.cmdmod               ][INFO    ] Executing command 'date' in directory '/'
2014-07-23 20:45:00,933 [salt.loaded.int.module.cmdmod               ][INFO    ] output: Wed Jul 23 20:45:00 CEST 2014
2014-07-23 20:45:00,933 [salt.minion                                 ][INFO    ] Returning information for job: 20140723204448511760
2014-07-23 20:45:00,934 [salt.minion                                 ][INFO    ] Returning information for job: 20140723204453520459

@basepi
Copy link
Contributor

basepi commented Jul 23, 2014

Weird! For now I would recommend setting up a cron job with just a basic test.ping call every few minutes to keep it from timing out. This one's definitely strange, but we haven't been able to reproduce it yet!

@steverweber
Copy link
Contributor

FYI the current develop version has a keepalive patch that might help keep the minion online.

the minion will ping the master every 10min... if the master don't respond the minion will restart.
ping_interval: 10

The above might get you by for now but the root issue should still be investigated.

@cachedout
Copy link
Contributor

I tend to wonder if this might be related to the ZeroMQ connection caching that was added in that time frame.

(cc: @jacksontj for his thoughts.)

@pwaller
Copy link
Contributor

pwaller commented Aug 6, 2014

+1 I'm also getting this since 2014.7.1. It makes salt almost impossible to use.

@jacksontj
Copy link
Contributor

Actually looking at the logs it doesn't seem to be related to my zmq caching since it happens so close to when the event happens (and before the command is run)-- and the channel re-use stuff is on rep/recv (the return side of things). The failure that you are seeing looks like the master AES key has rotated in that time. If you look at srv8 minion's log you can see the authentication failure message in the log at the exact time that the job was published. The only 2 places that print that exact message are in crypt.py (https://github.com/saltstack/salt/blob/develop/salt/crypt.py#L671) which both mean that either the AES key was bad or the pub job was malformed-- i'd bet on the AES key rotating. With that you are kicking in the acceptance_wait_time back-off (https://github.com/saltstack/salt/blob/develop/salt/minion.py#L874) which makes the re-auth much less aggressive. This was changed back to higher numbers (50s+) in this release, although I know for a fact that it has been set significantly lower in previous releases. This would cause your returns to take significantly longer to come back because the minion is taking significantly longer to get them (time of pub + time to re-auth + job time).

Now, the reason that this effects some minions and not all is that the re-auth delay is randomized, so it makes sense that "some" minions "don't" return, becuase they got a longer re-auth timer.

If your environment is relatively small you can tune down the acceptance_wait_time to have the minions re-auth faster, an maybe disable AES key rotation if you have it enabled. The ping every N seconds will help because it will force the minion to re-auth sooner (instead of waiting for a job) but it doesn't solve the problem. Basically its a tradeoff of minion availability and master load (for the re-auth).

If you want to play around with it more you can probably re-produce this by doing a salt-key -d on some specific minion key as this forces an AES key rotation, or otherwise forcing an AES key rotation

@pwaller
Copy link
Contributor

pwaller commented Aug 6, 2014

@jacksontj, is what you just suggested consistent with running test.ping on all minions repeatedly and having it fail after ~12 seconds on the same set of minions in succession many times? > 5?

@steverweber
Copy link
Contributor

please ckeck your system logs on the minon to be sure nothing strange correlates with the minion down time.

/var/log/kern.log
/var/log/syslog

this might sound a little silly, but could you log pings to the master...

crontab -e
@reboot ping masterhostname | perl -nle 'BEGIN {$|++} print scalar(localtime), " ", $_' > /var/log/pingtest

im not sure if the above command will work... if not dont worry about it :)

@jacksontj
Copy link
Contributor

@pwaller Yes, the re-auth is a blocking call within the minion's main process. So, until it is able to re-auth no new jobs will be pulled in. Depending on the HWMs it may or may not process those jobs after the auth is finished.

@drawsmcgraw
Copy link
Contributor

Just adding a +1 for me. Version 2014.1.7. Seems to be "several minutes" (2-10) before the minions are responsive again.

@JustinCarmony
Copy link
Contributor

+1 for me as well, we've added a cronjob to test.ping every 5 minutes, but a fix in salt for this would be awesome. :)

@jacksontj
Copy link
Contributor

I've just submitted a pull req to log the aes key rotation on the master
side (#15603). You can either pull
that commit in and spin a new build or watch the directory for the file
that is made (the .dfn file).

Key rotation in salt happens for a few different reasons:

  • Master restart: on startup the AES key is generated, so if the master
    is restarted a new key is made
  • Minion key revocation: since all the publish jobs are encrypted with
    the AES key, once a minion is removed from the cluster the AES key has to
    be rotated to ensure the privacy of published messages going forward
  • Scheduled key rotation: there is a master config option
    (publish_session) which will configure the master to rotate the key every N
    seconds (I can't seem to find docs on this-- so it may just be undocumented
    all together)

Key rotation doesn't really "kill" the connection, it just means that the
next pub job the minion gets will be encrypted using a different key--
meaning the minions needs to re-auth. The test.ping workaround will just
make you hit that same re-auth sooner so that when you run a command its
less likely that you have minions stuck in the re-auth state.

On Mon, Sep 8, 2014 at 12:29 PM, Thomas D. notifications@github.com wrote:

test.ping is the current workaround, yes.

Is there a way to make key rotation visible (add the key rotation event to
the debug log)? I am not yet convinced that this is the root cause. In
other words: If it is the key rotation which kills the connection, why
don't we see a time out when using the test.ping command every 5 minutes?
Or are keys only rotating when not in use for x minutes? Why is test.ping
a working workaround?


Reply to this email directly or view it on GitHub
#14343 (comment).

@Rucknar
Copy link

Rucknar commented Sep 9, 2014

Thanks for the info @jacksontj. I've not heard of publish_session before this ticket. Just had a look and by default it's set to 'publish_session': 86400,. I presume this is seconds? I wonder if were falsely triggering the rotation unintentionally, of interest, we also use a reauth_delay to avoid overloading our master.

@jacksontj
Copy link
Contributor

Yea, that is in seconds-- so that would mean every 24h. You will also
trigger a key rotation on every salt-key delete (-d or -D), not only every
"publish_session" seconds.

We also use the reauth_delay stuff in our infrastructure, but the more
we've been using it (at least in its current implementation) the more
problems it seems to be causing ;)

On Tue, Sep 9, 2014 at 12:05 AM, superted666 notifications@github.com
wrote:

Thanks for the info @jacksontj https://github.com/jacksontj. I've not
heard of publish_session before this ticket. Just had a look and by default
it's set to 'publish_session': 86400,. I presume this is seconds? I
wonder if were falsely triggering the rotation unintentionally, of
interest, we also use a reauth_delay to avoid overloading our master.


Reply to this email directly or view it on GitHub
#14343 (comment).

@Whissi
Copy link
Contributor Author

Whissi commented Sep 11, 2014

OK, I applied #15603 to salt-2014.1.10 and deactivated the 5min test.ping cronjob used as workaround.

For a short moment I thought the problem is gone because everything was working for ~20 hours. No timeout.

But then it happens again. And thanks to the new log output I saw the reason:

2014-09-10 01:02:24,099 [salt.master                                 ][DEBUG   ] New master AES key found by pid 3448
2014-09-10 01:02:24,453 [salt.master                                 ][DEBUG   ] New master AES key found by pid 3441
2014-09-10 01:02:24,730 [salt.master                                 ][DEBUG   ] New master AES key found by pid 3447
2014-09-10 01:11:44,017 [salt.master                                 ][DEBUG   ] New master AES key found by pid 3444
2014-09-10 01:11:44,233 [salt.master                                 ][DEBUG   ] New master AES key found by pid 3440

The first salt command after this was logged run into a timeout. All other following requests succeeded until

2014-09-11 01:02:24,848 [salt.master                                 ][DEBUG   ] New master AES key found by pid 3444
2014-09-11 01:02:25,200 [salt.master                                 ][DEBUG   ] New master AES key found by pid 3440
2014-09-11 01:02:25,477 [salt.master                                 ][DEBUG   ] New master AES key found by pid 3448
2014-09-11 01:11:44,474 [salt.master                                 ][DEBUG   ] New master AES key found by pid 3441
2014-09-11 01:11:44,690 [salt.master                                 ][DEBUG   ] New master AES key found by pid 3447

To be clear:

  1. Keys will rotate at 01:02:00 (not sure why exactly at this time, but that doesn't matter at the moment).
  2. The next command after the master has rotated its key, will fail. It doesn't matter if you issue the next command at 01:15:00 or 23:55:00... the next command will always fail.

From my understanding (please correct me if I am wrong) this is happening because the master doesn't notify the minions "Hey, I have rotated my keys... check your authentication...".
So when the first command with the new key gets published all the subscribed minions will notice "Mhh... cannot decrypt... let me skip this message and re-authenticate!".

This could be improved if

  • The master would notify all minions about a changed key. Because the master knows all the minions which should react on test.ping this shouldn't be a problem. Also, the master could send this notification in batches to keep the load under control (the thing why we introduced the dynamic re-authentication delay).
    • Maybe the master must keep the old key until all minions got notified... but this shouldn't be a problem if the master will only use the old key for such notifications. Also, when the master knows that all minions were notified, the master could remove the old key.
  • When the minion will notice that it cannot decrypt the current message, the minion could keep the message in queue, start re-authentication and restart processing the queue.

@jacksontj
Copy link
Contributor

The notification of minions should be relatively easy (i'd think the
reactor would be a great place for it). As for keeping the old key around
until the new ones-- thats kinda hard for a couple reasons.

  1. The reason that the key is rotation (usually) is that we've revoked
    access to a minion.So, if we allow the old one, then we can't verify that
    the revoked minion won't have access to the data
  2. Publications are encrypted using the same symmetric key, so we end up
    with a variant of problem Enable regex on the salt cli #1

As for the minion cannot decrypt thing, it actually does exactly what you
describe (re-auth, then start processing where it left off). The issue is 2
part:

  1. Minion backoff timers start out REALLY high-- which is maybe good for
    large installs, but just slow for small ones
  2. Minion timeouts don't drop the auth request from the master, so by
    trying, disconnecting, and reconnecting your minion now has 2 auth requests
    in the master queue-- the only reason this ends up helping is due to the
    increased backoff-- which is more latency!

I've looked into solving this issue a bit, but zeromq makes solving a few
of these problems difficult (primarily the backoff problem). We should
probably change the minion backoff thing to just sleep instead of
disconnecting, sleeping, and reconnecting. That would help master load in
the auth storming, but still doesn't help with the latency of the minion
re-connecting (which is hard to work around).

On Wed, Sep 10, 2014 at 6:07 PM, Thomas D. notifications@github.com wrote:

OK, I applied #15603 #15603 to
salt-2014.1.10 and deactivated the 5min test.ping cronjob used as
workaround.

For a short moment I thought the problem is gone because everything was
working for ~20 hours. No timeout.

But then it happens again. And thanks to the new log output I saw the
reason:

2014-09-10 01:02:24,099 [salt.master ][DEBUG ] New master AES key found by pid 3448
2014-09-10 01:02:24,453 [salt.master ][DEBUG ] New master AES key found by pid 3441
2014-09-10 01:02:24,730 [salt.master ][DEBUG ] New master AES key found by pid 3447
2014-09-10 01:11:44,017 [salt.master ][DEBUG ] New master AES key found by pid 3444
2014-09-10 01:11:44,233 [salt.master ][DEBUG ] New master AES key found by pid 3440

The first salt command after this was logged run into a timeout. All other
following requests succeeded until

2014-09-11 01:02:24,848 [salt.master ][DEBUG ] New master AES key found by pid 3444
2014-09-11 01:02:25,200 [salt.master ][DEBUG ] New master AES key found by pid 3440
2014-09-11 01:02:25,477 [salt.master ][DEBUG ] New master AES key found by pid 3448
2014-09-11 01:11:44,474 [salt.master ][DEBUG ] New master AES key found by pid 3441
2014-09-11 01:11:44,690 [salt.master ][DEBUG ] New master AES key found by pid 3447

To be clear:

  1. Keys will rotate at 01:02:00 (not sure why exactly at this time,
    but that doesn't matter at the moment).
  2. The next command after the master has rotated its key, will fail.
    It doesn't matter if you issue the next command at 01:15:00 or 23:55:00...
    the next command will always fail.

From my understanding (please correct me if I am wrong) this is happening
because the master doesn't notify the minions "Hey, I have rotated my
keys... check your authentication..."
.
So when the first command with the new key gets published all the
subscribed minions will notice "Mhh... cannot decrypt... let me skip this
message and re-authenticate!".

This could be improved if

The master would notify all minions about a changed key. Because the
master knows all the minions which should react on test.ping this
shouldn't be a problem. Also, the master could send this notification in
batches to keep the load under control (the thing why we introduced the
dynamic re-authentication delay).
- Maybe the master must keep the old key until all minions got
notified... but this shouldn't be a problem if the master will only use the
old key for such notifications. Also, when the master knows that all
minions were notified, the master could remove the old key.
-

When the minion will notice that it cannot decrypt the current
message, the minion could keep the message in queue, start
re-authentication and restart processing the queue.


Reply to this email directly or view it on GitHub
#14343 (comment).

@cachedout
Copy link
Contributor

@jacksontj Regarding AES key rotation, there's some work over in #15722.

@Whissi
Copy link
Contributor Author

Whissi commented Sep 12, 2014

As for the minion cannot decrypt thing, it actually does exactly what you
describe (re-auth, then start processing where it left off).

Are you sure that this is happening? I mean, if this is actual happening, shouldn't I see the following work:

  1. Wait for a key change (in my case, wait for 01:15)
  2. Run salt '*' cmd.run 'touch /i-was-here'

The command will timeout - no minion will return (because that's the first command after the key was rotated).

But if you are right and the minion will detect a key change, re-authenticate and resume work, than /i-was-here should have a last-modified time after 01:15+backoff_timer_value... I am not sure about the current backoff_timer value, but if I will check each minion at 02:15 (60 minutes after I run the touch command) I should see recently touched file /i-was-here, not?

If this is not working (my feeling) /i-was-here should not exist/shouldn't be modified on any minion... because the job (touching /i-was-here) was lost.

I'll test that in 3 hours... :)

@jacksontj
Copy link
Contributor

Guess I should be clear there too, for some reason there are different
amounts of retries allowed at varying points of salt. Usually its at least
2 tries that are allowed.

And yes, for your test from what I expect to happen is something like

  • Minion gets command it can't decrypt
  • Minion re-auths
  • Minion then executes that job
  • Minion then returns data from the job

If the back-off time is relatively long its very likely that return will be
after the CLI finished waiting (esp since the master CLI timeout is 5s by
default).

On Fri, Sep 12, 2014 at 1:14 PM, Thomas D. notifications@github.com wrote:

As for the minion cannot decrypt thing, it actually does exactly what you
describe (re-auth, then start processing where it left off).

Are you sure that this is happening? I mean, if this is actual happening,
shouldn't I see the following work:

  1. Wait for a key change (in my case, wait for 01:15)
  2. Run salt '*' cmd.run 'touch /i-was-here'

The command will timeout - no minion will return (because that's the first
command after the key was rotated).

But if you are right and the minion will detect a key change,
re-authenticate and resume work, than /i-was-here should have a
last-modified time after 01:15+backoff_timer_value... I am not sure about
the current backoff_timer value, but if I will check each minion at 02:15
(60 minutes after I run the touch command) I should see recently touched
file /i-was-here, not?

If this is not working (my feeling) /i-was-here should not
exist/shouldn't be modified on any minion... because the job (touching
/i-was-here) was lost.

I'll test that in 3 hours... :)


Reply to this email directly or view it on GitHub
#14343 (comment).

@Whissi
Copy link
Contributor Author

Whissi commented Sep 14, 2014

@jacksontj OK, what you said is correct:

master ~ # grep -i found /var/log/salt/master
2014-09-14 01:02:25,236 [salt.master                                 ][DEBUG   ] New master AES key found by pid 3440
2014-09-14 01:02:25,588 [salt.master                                 ][DEBUG   ] New master AES key found by pid 3448
2014-09-14 01:02:25,865 [salt.master                                 ][DEBUG   ] New master AES key found by pid 3441
2014-09-14 01:11:44,367 [salt.master                                 ][DEBUG   ] New master AES key found by pid 3447
2014-09-14 01:11:44,590 [salt.master                                 ][DEBUG   ] New master AES key found by pid 3444


master ~ # date
Sun Sep 14 01:22:14 CEST 2014
master ~ # salt '*' --show-timeout cmd.run 'date > /salt.test && echo "Done!" || echo "Failed! :("'
srv2.example.intern:
    Minion did not return
srv3.example.intern:
    Minion did not return
srv4.example.intern:
    Minion did not return
srv5.example.intern:
    Minion did not return
srv6.example.intern:
    Minion did not return
srv7.example.intern:
    Minion did not return
srv8.example.intern:
    Minion did not return


master ~ # salt-run jobs.print_job 20140914012218141211
'20140914012218141211':
  Arguments:
  - date > /salt.test && echo "Done!" || echo "Failed! :("
  Function: cmd.run
  Result:
    srv2.example.intern: Done!
    srv3.example.intern: Done!
    srv4.example.intern: Done!
    srv5.example.intern: Done!
    srv6.example.intern: Done!
    srv7.example.intern: Done!
    srv8.example.intern: Done!
  StartTime: 2014, Sep 14 01:22:18.141211
  Target: '*'
  Target-type: glob
  User: root


master ~ # salt '*' --show-timeout cmd.run 'cat /salt.test'
srv2.example.intern:
    Sun Sep 14 01:22:50 CEST 2014
srv5.example.intern:
    Sun Sep 14 01:22:49 CEST 2014
srv6.example.intern:
    Sun Sep 14 01:23:08 CEST 2014
srv4.example.intern:
    Sun Sep 14 01:23:13 CEST 2014
srv3.example.intern:
    Sun Sep 14 01:23:04 CEST 2014
srv7.example.intern:
    Sun Sep 14 01:22:44 CEST 2014
srv8.example.intern:
    Sun Sep 14 01:22:35 CEST 2014

I don't understand the find_job return value salt will fire after gather_job_timeout. In this case:

# salt-run jobs.print_job 20140914012223152825
'20140914012223152825':
  Arguments:
  - '20140914012218141211'
  Function: saltutil.find_job
  Result:
    srv2.example.intern: {}
    srv3.example.intern: {}
    srv4.example.intern: {}
    srv5.example.intern: {}
    srv6.example.intern:
      arg:
      - date > /salt.test && echo "Done!" || echo "Failed! :("
      fun: cmd.run
      jid: '20140914012218141211'
      pid: 36977
      ret: ''
      tgt: '*'
      tgt_type: glob
      user: root
    srv7.example.intern: {}
    srv8.example.intern:
      arg:
      - date > /salt.test && echo "Done!" || echo "Failed! :("
      fun: cmd.run
      jid: '20140914012218141211'
      pid: 19054
      ret: ''
      tgt: '*'
      tgt_type: glob
      user: root
  StartTime: 2014, Sep 14 01:22:23.152825
  Target: '*'
  Target-type: glob
  User: root

From my understanding minion srv2, srv3, srv4, srv5 and srv7 are saying

No, I have no information regarding jid 20140914012218141211

but srv6 and srv8 are running this job. So I don't understand why these two minions are also listed with Minion did not return status.

Using the reactor system sounds nice. But we need a way to tell the CLI that the job isn't lost. Well, like shown, the job isn't lost, but when you are using the remote execution stuff and get such a response you will probably re-send the command. But this could be a problem if the command shouldn't/cannot run twice for example.

@donatello
Copy link

👍 I am also affected by this problem. I have about a 100 minions distributed across many DCs. The problem happens quite often, as our servers scale up and scale down un-attended. The salt master loses connectivity to many minions and ends up disrupting other orchestrations when the AES key is rotated due to a minion being deleted.

Are there any stop gap measures that I can use, while this issue is being resolved?

@basepi
Copy link
Contributor

basepi commented Nov 4, 2014

On 2014.7.0, if you set rotate_aes_key: False in your master config and restart your master, it will stop rotating the aes key on key deletion/rejection.

This does allow minions with those keys to continue to eavesdrop on publishes, but they can't interact with the master anymore.

@kennycoder
Copy link

Same issues here... running Debian 7. I did setup a basic cron job to run every 5 minutes but it still doesn't help. Makes it quite unusable.

@kelchm
Copy link

kelchm commented Nov 28, 2014

I've been dealing with this issue for the past few weeks without much success in finding a solution. Running a test.ping cron every few minutes seemed to help, but really didn't fix the issue at all. Similarly, I saw no improvement after configuring rotate_aes_key: False.

Last night out of desperation I tried completely disabling the salt reactor configuration I had in place on my master. I've had zero issues with minions dropping or not returning since making this change.

@basepi
Copy link
Contributor

basepi commented Dec 1, 2014

@kelchm Just to verify, are you on 2014.7.0? Actually, the output of salt --versions-report would be helpful. This is potentially good information if we can find a pattern.

@basepi
Copy link
Contributor

basepi commented Dec 1, 2014

@kelchm We would also be interested to see your reactor config and sls files.

@jacksontj
Copy link
Contributor

@kelchm AFAIK there should be nothing inherent in the reactor that would cause AES key rotation (and hence your problem). My best guess would be that one of your reactor events causes a key rotation (deleting a salt-minion key maybe?) But the reactor config and sls would help us nail that down.

@rallytime rallytime added severity-medium 3rd level, incorrect or bad functionality, confusing and lacks a work around Core relates to code central or existential to Salt P4 Priority 4 and removed severity-low 4th level, cosemtic problems, work around exists labels Jul 14, 2015
@frogunder
Copy link
Contributor

@Whissi - Is this still a problem for you?

@Whissi
Copy link
Contributor Author

Whissi commented Dec 3, 2015

Not sure, we are still pinging all minions every 5 minutes to prevent this problem. I'll stop the cronjob next week to see if this is still happening with salt-2015.8.3 and update the issue report.

@xyoun
Copy link
Contributor

xyoun commented Dec 3, 2015

@Whissi it seems that 2015.5.3 and upper has resolved this problem.
I have 500+ servers run at once, not one timeout.

@basepi
Copy link
Contributor

basepi commented Dec 3, 2015

@xyoun That's great news! Hopefully that's the case with @Whissi as well!

@frogunder
Copy link
Contributor

@Whissi - Did you have time to update? Did it fix it?

@Whissi
Copy link
Contributor Author

Whissi commented Dec 17, 2015

After running salt-2015.8.3 over one week I haven't seen this issue again. But this can be a false positive because we currently have at least one salt event per 30min. But let consider this as fixed for now. Thanks!

@basepi
Copy link
Contributor

basepi commented Dec 17, 2015

Thanks for the update @Whissi

@basepi basepi closed this as completed Dec 17, 2015
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Bug broken, incorrect, or confusing behavior cannot-reproduce cannot be replicated with info/context provided Core relates to code central or existential to Salt P4 Priority 4 severity-medium 3rd level, incorrect or bad functionality, confusing and lacks a work around
Projects
None yet
Development

No branches or pull requests