Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

2015.5.0: Masterless sync_all hangs at MinionEvent PULL #22796

Closed
obestwalter opened this issue Apr 17, 2015 · 30 comments
Closed

2015.5.0: Masterless sync_all hangs at MinionEvent PULL #22796

obestwalter opened this issue Apr 17, 2015 · 30 comments
Labels
Bug broken, incorrect, or confusing behavior P1 Priority 1 Platform Relates to OS, containers, platform-based utilities like FS, system based apps severity-medium 3rd level, incorrect or bad functionality, confusing and lacks a work around Windows
Milestone

Comments

@obestwalter
Copy link
Contributor

I have a similar problem like in the closed issue #8058.

I can reproduce the problem reliably in a vagrant box with Windows 2008 R2 x64 and 2015.2.0rc2. It is hanging on SaltEvent PUB socket URI.

versions:

PS D:> salt-call --versions-report
Salt: 2015.2.0rc2
Python: 2.7.8 (default, Jun 30 2014, 16:08:48) [MSC v.1500 64 bit (AMD64)]
Jinja2: 2.7.3
M2Crypto: 0.21.1
msgpack-python: 0.4.5
msgpack-pure: Not Installed
pycrypto: 2.6
libnacl: Not Installed
PyYAML: 3.11
ioflo: Not Installed
PyZMQ: 14.5.0
RAET: Not Installed
ZMQ: 4.0.5
Mako: Not Installed

minion config:

file_client: local
file_roots:
    base:
      - c:\\vagrant
log_level: debug

salt-call --local saltutil.sync_all

[DEBUG   ] Local cache dir: 'c:\\salt\\var\\cache\\salt\\minion\\files\\base\\_utils'
[DEBUG   ] LazyLoaded event.fire
[DEBUG   ] SaltEvent PUB socket URI: ipc://c:\salt\var\run\salt\minion\minion_event_4c08c87836_pub.ipc
[DEBUG   ] SaltEvent PULL socket URI: ipc://c:\salt\var\run\salt\minion\minion_event_4c08c87836_pull.ipc

... and it hangs

@obestwalter obestwalter changed the title 2015.2.0rc2: Masterless highstate hangs (still or again) at MinionEvent PULL 2015.2.0rc2: Masterless sync_all hangs at MinionEvent PULL Apr 17, 2015
@rallytime
Copy link
Contributor

Thanks for the report @obestwalter.

@rallytime rallytime added Bug broken, incorrect, or confusing behavior severity-medium 3rd level, incorrect or bad functionality, confusing and lacks a work around Windows P3 Priority 3 labels Apr 17, 2015
@rallytime rallytime added this to the Approved milestone Apr 17, 2015
@obestwalter obestwalter changed the title 2015.2.0rc2: Masterless sync_all hangs at MinionEvent PULL 2015.5.0: Masterless sync_all hangs at MinionEvent PULL May 13, 2015
@obestwalter
Copy link
Contributor Author

I just had a look at the Priority labels - P3 means half will see the bug half will not. As priority levels are relative to their functional area I would say this is clearly P1 as 100% of people who try to use the minion masterless on windows will see the bug - or am I missing something?

I think this is especially significant because it completely breaks windows with vagrant + salt provisioner (in cooperation with a bag of bugs that vagrant has regarding to salt provisioning atm).

@rallytime rallytime added P1 Priority 1 and removed P3 Priority 3 labels May 14, 2015
@rallytime
Copy link
Contributor

All fair points. I've made the label changes as needed.

@obestwalter
Copy link
Contributor Author

Thanks Nicole, I am on a bit of a crusade atm to get salt and vagrant on windows working and then keep it that way :)

@rallytime
Copy link
Contributor

ping @UtahDave and @twangboy

@jfindlay jfindlay added the Platform Relates to OS, containers, platform-based utilities like FS, system based apps label May 26, 2015
@twangboy
Copy link
Contributor

twangboy commented Jun 3, 2015

@obestwalter Could you send me a copy of your vagrantfile so I can reproduce?

@obestwalter
Copy link
Contributor Author

It's not specific to vagrant. I would say running a masterless minion on windows (at least 2008 r2) is always hanging. Correct me if I am wrong.

@twangboy
Copy link
Contributor

twangboy commented Jun 3, 2015

Oh, I thought this was specific to vagrant... I've been getting a vagrant environment all set up to troubleshoot this...

@obestwalter
Copy link
Contributor Author

I just detailed the way I reproduced the referenced closed bug - sorry if this was confusing.

Speaking of vagrant though: I'd be absolutely thrilled if this bug would be squashed soon, because it's one of the bugs that renders salt unusable in combination with vagrant on Windows.

@twangboy
Copy link
Contributor

twangboy commented Jun 3, 2015

This command works fine for me. Are there some unique permission on C:\vagrant? I used C:\salt\roots. I ran it from the head of 2015.5 and it worked, so I thought it was an issue with 2015.2rc2, so I spun up a vm and installed 2015.2rc2 and set up the environment. It worked there too. Are you running salt-call from an Administrator cmd prompt?

@twangboy
Copy link
Contributor

twangboy commented Jun 3, 2015

Vagrant might be relevant here because c:\vagrant is the default share for vagrant, right? Sorry, I'm not the king of vagrant.

@obestwalter
Copy link
Contributor Author

C:/vagrant is automatcally mapped by vagrant as a shared folder with full read/write access.

It surprises me that it works for you with any setup though ... I had this marked as non vagrant specific and very reliably broken - so maybe there is something specific about all the ways I was testing this, that triggers the problem. Could you post your Vagrantfile here, so I can verify a setup that actually works and work my way towards the problem?

When you run it on a non vagrant setup masterless with local file roots everything is fine with the latest release?

@twangboy
Copy link
Contributor

twangboy commented Jun 4, 2015

I haven't done anything with my Vagrantfile. It's just the default one that get's created when you do a vagrant init. I added the path to the box is all. Here's what I have in my vagrant file, minus all the commented stuff:

Vagrant.configure(2) do |config|
  config.vm.box = "mwrock/Windows2012R2"
  config.vm.provider "hyperv"
  config.vm.communicator = "winrm"
end

Everything else is commented out. I'm also using hyperv instead of virtualbox, it seems to be more responsive on windows for me. I finally got my C:\Vagrant folder to work and I tried moving the file_roots there. It still worked for me.
Here's my minion conf (minus comments):

ipc_mode: tcp
root_dir: c:\salt
pki_dir: /conf/pki/minion
file_client: local
file_roots:
  base:
    - C:\vagrant
multiprocessing: False
log_level: debug

Here's my output from the saltutil.sync_all command (picking up where yours left off):

[DEBUG   ] LazyLoaded event.fire
[DEBUG   ] SaltEvent PUB socket URI: tcp://127.0.0.1:4510
[DEBUG   ] SaltEvent PULL socket URI: tcp://127.0.0.1:4511
[DEBUG   ] Sending event - data = {'_stamp': '2015-06-04T15:59:49.443000'}
[DEBUG   ] Sending event - data = {'_stamp': '2015-06-04T15:59:50.447000'}
[DEBUG   ] LazyLoaded nested.output
local:
    ----------
    grains:
    modules:
    outputters:
    renderers:
    returners:
    states:
    utils:

Versions report:

C:\salt>salt-call --versions-report
           Salt: 2015.2.0rc2
         Python: 2.7.8 (default, Jun 30 2014, 16:08:48) [MSC v.1500 64 bit (AMD6
4)]
         Jinja2: 2.7.3
       M2Crypto: 0.21.1
 msgpack-python: 0.4.5
   msgpack-pure: Not Installed
       pycrypto: 2.6
        libnacl: Not Installed
         PyYAML: 3.11
          ioflo: Not Installed
          PyZMQ: 14.5.0
           RAET: Not Installed
            ZMQ: 4.0.5
           Mako: Not Installed

@obestwalter
Copy link
Contributor Author

obestwalter commented Jun 4, 2015

First thing jumping at me is the ipc mode explicitely set to tcp. I am
pretty sure I did not touch that, so whatever the default is, is what I am
using. Might that be the problem already? I know nothing yet about this
setting and it's relevance, though.

On Thu, 4 Jun 2015 18:59 Shane Lee notifications@github.com wrote:

I haven't done anything with my Vagrantfile. It's just the default one
that get's created when you do a vagrant init. I added the path to the box
is all. Here's what I have in my vagrant file, minus all the commented
stuff:

Vagrant.configure(2) do |config|
config.vm.box = "mwrock/Windows2012R2"
config.vm.provider "hyperv"
config.vm.communicator = "winrm"
end

Everything else is commented out. I'm also using hyperv instead of
virtualbox, it's seems to be more responsive on windows for me. I finally
got my C:\Vagrant folder to work and I tried moving the file_roots there.
It still worked for me.
Here's my minion conf (minus comments):

ipc_mode: tcp
root_dir: c:\salt
pki_dir: /conf/pki/minion
file_client: local
file_roots:
base:
- C:\vagrant
multiprocessing: False
log_level: debug


Reply to this email directly or view it on GitHub
#22796 (comment).

@lorengordon
Copy link
Contributor

The setting ipc_mode: tcp is part of the minion config packaged with the Windows installer. So if you're using the Windows installer, that should be what you're getting by default.

It defaults to ipc_mode: ipc in the generic config. If you're installing from source, or pip, then you may be pulling down this file, instead, when setting up the minion config:

@obestwalter
Copy link
Contributor Author

Interesting. Does that mean it is broken by design on windows if you unknowingly use the real minion default for this setting on windows, like I did? Will investigate that next week and report back

@lorengordon
Copy link
Contributor

Eh, I don't know if I'd go as far as to say that it's broken by design. I would expect the history developed the other way around...that the Windows-specific conf file exists to fix issues that were identified with the Windows client. There are lots of ipc_mode+Windows hits if you search the issues.

That said, I do think having a wholly separate conf file is a rather static and fragile way to manage the Windows-specific settings. It would maybe be nice to keep a dictionary of Windows-specific settings that are required to avoid breakage on the platform, and for the minion to merge those settings with the default config to override the default settings. Probably want to log a warning of settings that are overridden this way, so admins have a clue that the conf file may not be the final word. Perhaps with a switch to control the override so that devs can live on the edge and test whether new code fixes the breakage.

@lorengordon
Copy link
Contributor

On second thought, that's overly complicated. Just default to the platform-specific configuration settings internally, and use the conf file to override the defaults. This is basically how it already works, just need to implement the platform-specific logic (some may already be there, now that I think about it) and document what they are in the conf file. It would perhaps be nice to log if the conf file contains settings that differ from the known 'safe' settings.

@obestwalter
Copy link
Contributor Author

Just default to the platform-specific configuration settings internally,
and use the conf file to override the defaults.

That's spot on Loren!

@lorengordon
Copy link
Contributor

@obestwalter, I was poking around the source this morning, and much of the logic for platform-specific default values is already in place. The Windows-specific conf file might have been the wrong trouble-shooting pathway...

Defaults are set in defaults.py, which pulls some values from syspaths.py. From my read, the only Windows-specific value that is set in the Windows-specific conf but not in some platform-specific default logic is multiprocessing: False. So, perhaps focus your testing around that setting? If that makes no difference, then we may be looking down the wrong pathway here.

@twangboy
Copy link
Contributor

twangboy commented Jun 9, 2015

@obestwalter Did you try changing to ipc_mode: tcp?

@obestwalter
Copy link
Contributor Author

@twangboy Not yet. Today I built a public box for the vagrant powered version of your salt-windows-dev repo. Any chance to get this merged, when it's done (hopefully tomorrow)? See: vmware-archive/salt-windows-dev#12

@twangboy
Copy link
Contributor

twangboy commented Jun 9, 2015

What needs to be merged? Just the Vagrantfile?

@obestwalter
Copy link
Contributor Author

What needs to be merged? Just the Vagrantfile?

@twangboy I am glad you're open for this approach. Let's move this discussion to the assciated issue

@obestwalter
Copy link
Contributor Author

The immediate issue is resolved and had to do with proper defaults not being set. This is resolved with #24749 and released since 2015.5.3

@PatOShea
Copy link
Contributor

Wow. thanks @obestwalter @twangboy. I'm going to have vagrant's documentation updated for the plugin so no one else runs into this issue.

@obestwalter
Copy link
Contributor Author

@PatOShea - which part of the Vagrant docs would need updating? In my view the problem here was entirely on the side of Salt, as proper defaults weren't loaded on windows. This problem should be fixed now on the Salt side and you can use a minimal minion config and it should work.

@PatOShea
Copy link
Contributor

PatOShea commented Oct 1, 2015

I agree if one uses a minimal configuration you should be good. You'll
need that config value, if you try to copy a "custom" minion file over on
provisioning. ran into this recently and was stick until I found this bug.
On Oct 1, 2015 5:48 AM, "Oliver Bestwalter" notifications@github.com
wrote:

@PatOShea https://github.com/PatOShea - which part of the Vagrant docs
would need updating? In my view the problem here was entirely on the side
of Salt, as proper defaults weren't loaded on windows. This problem should
be fixed now on the Salt side and you can use a minimal minion config and
it should work.


Reply to this email directly or view it on GitHub
#22796 (comment).

@obestwalter
Copy link
Contributor Author

@PatOShea o.k. could you then please mention this issue in the vagrant bug, so we can see the connection?

@PatOShea
Copy link
Contributor

PatOShea commented Oct 1, 2015

Definitely. I will try to make the pull request this afternoon.
On Oct 1, 2015 5:59 AM, "Oliver Bestwalter" notifications@github.com
wrote:

@PatOShea https://github.com/PatOShea o.k. could you then please
mention this issue in the vagrant bug, so we can see the connection?


Reply to this email directly or view it on GitHub
#22796 (comment).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Bug broken, incorrect, or confusing behavior P1 Priority 1 Platform Relates to OS, containers, platform-based utilities like FS, system based apps severity-medium 3rd level, incorrect or bad functionality, confusing and lacks a work around Windows
Projects
None yet
Development

No branches or pull requests

6 participants