Skip to content

Conserve channels#13155

Merged
thatch45 merged 7 commits into
saltstack:developfrom
jacksontj:conserve_channels
Jun 6, 2014
Merged

Conserve channels#13155
thatch45 merged 7 commits into
saltstack:developfrom
jacksontj:conserve_channels

Conversation

@jacksontj
Copy link
Copy Markdown
Contributor

Fileclient creates many MANY channels back to the master. Each one of these is a new tcp session-- which over links with some latency adds up pretty quick.

@ghost
Copy link
Copy Markdown

ghost commented Jun 1, 2014

Test PASSed.
Refer to this link for build results: http://jenkins.saltstack.com/job/salt-pr-build/4766/

Comment thread salt/minion.py Outdated
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left over?

@ghost
Copy link
Copy Markdown

ghost commented Jun 1, 2014

Test PASSed.
Refer to this link for build results: http://jenkins.saltstack.com/job/salt-pr-build/4770/

@jacksontj
Copy link
Copy Markdown
Contributor Author

@s0undt3ch Yes that was ;) Just cleaned that out

@thatch45
Copy link
Copy Markdown
Contributor

thatch45 commented Jun 4, 2014

This wont work, it is a problem with ZeroMQ, I have tried this before, it will work for most of the time but then you start to get dark connection failures
Seriously @jacksontj, this is on the list of reasons why we are working on RAET, it works very well right now if you want to try it out.

@jacksontj
Copy link
Copy Markdown
Contributor Author

I tested this a lot locally and it works fine. Is the issue with timed out connections or somesuch? I'd like to figure out why it fails and fix it. I don't really like the mentality that we can't touch zmq since we are working on something else. Expecially since they are completely different-- RAET is non-gauranteed delivery over UDP.

@techdragon
Copy link
Copy Markdown
Contributor

@thatch45 do you have any in git-history work regarding the dark connection failures you mentioned? I'd love to take a look at this, since I'm with @jacksontj regarding the need to continue work on zmq. Not being happy with the RAET design (no delivery guarantees) has lead me to tinker with the idea of writing my own transport layer using nanomsg (which was made even more possible by having RAET as a second blueprint for how to build a transport), since I'm in a position to contribute code I'll do what i can to assist in maintaining the alternatives, ZeroMQ, SSH, etc, or I'll build my own and contribute it back up.

@jacksontj
Copy link
Copy Markdown
Contributor Author

I spent some time this morning trying to break our zmq channels, and turns out its pretty easy! I've just added 2 commits which cover all the cases I found where we break a zmq connection-- the one on the master side makes it REALLY easy to DoS a master :/

@ghost
Copy link
Copy Markdown

ghost commented Jun 5, 2014

Test Failed.

If the failures are unrelated to your code, don't stress, a core developer will know these apart.
In the future, if possible, please branch off a passing commit to avoid false positives.
Refer to this link for build results: http://jenkins.saltstack.com/job/salt-pr-build/4886/

@thatch45
Copy link
Copy Markdown
Contributor

thatch45 commented Jun 5, 2014

@techdragon We have added delivery guarantees to RAET. We are taking RAET very seriously and expect it to hold up.

Sounds good @jacksontj, I am looking over it

@ghost
Copy link
Copy Markdown

ghost commented Jun 5, 2014

Test Failed.

If the failures are unrelated to your code, don't stress, a core developer will know these apart.
In the future, if possible, please branch off a passing commit to avoid false positives.
Refer to this link for build results: http://jenkins.saltstack.com/job/salt-pr-build/4902/

@thatch45
Copy link
Copy Markdown
Contributor

thatch45 commented Jun 5, 2014

Looks like the build flunked out, I will run another one

@thatch45 thatch45 closed this Jun 5, 2014
@thatch45 thatch45 reopened this Jun 5, 2014
@thatch45
Copy link
Copy Markdown
Contributor

thatch45 commented Jun 5, 2014

@techdragon, what makes you think that RAET does not have delivery guarantees? It has been specifically designed to have them, perhaps you were working with it before they were added in, they have only been in for about 4 weeks.

@thatch45
Copy link
Copy Markdown
Contributor

thatch45 commented Jun 5, 2014

We will have docs up soon explaining delivery guarantees in RAET and how it works, we do not not miss information about RAET getting out!

@ghost
Copy link
Copy Markdown

ghost commented Jun 5, 2014

Test Failed.

If the failures are unrelated to your code, don't stress, a core developer will know these apart.
In the future, if possible, please branch off a passing commit to avoid false positives.
Refer to this link for build results: http://jenkins.saltstack.com/job/salt-pr-build/4904/

@jacksontj
Copy link
Copy Markdown
Contributor Author

@thatch45 Looks like the build failed again, any ideas why its not finishing? The jenkins output is.... cryptic ;)

@jacksontj
Copy link
Copy Markdown
Contributor Author

@thatch45 k, looks like when my editor barfed on my code earlier today i missed one :)

@ghost
Copy link
Copy Markdown

ghost commented Jun 6, 2014

Test Failed.

If the failures are unrelated to your code, don't stress, a core developer will know these apart.
In the future, if possible, please branch off a passing commit to avoid false positives.
Refer to this link for build results: http://jenkins.saltstack.com/job/salt-pr-build/4920/

@techdragon
Copy link
Copy Markdown
Contributor

@thatch45 - I haven't had time to catch up on about the last 8 weeks of RAET work, so the guarantees are definitely new to me, good to hear. Looking forward to seeing the documentation on them 👍

@ghost
Copy link
Copy Markdown

ghost commented Jun 6, 2014

Test Failed.

If the failures are unrelated to your code, don't stress, a core developer will know these apart.
In the future, if possible, please branch off a passing commit to avoid false positives.
Refer to this link for build results: http://jenkins.saltstack.com/job/salt-pr-build/4922/

jacksontj added 7 commits June 5, 2014 23:00
Today if you create a client that does

```
import zmq

context = zmq.Context()
socket = None
while True:
    del socket
    socket = context.socket(zmq.REQ)
    socket.connect('tcp://127.0.0.1:4506')
    socket.send('foo')
```

Each asymetric call will cause a master process to go defunct. This patch will catch those exceptions, log them, and rebind the worker zmq socket.
… master is too busy and we leave the socket in a state where we've done a send but not a recv.
@jacksontj
Copy link
Copy Markdown
Contributor Author

Here is a rebase on a newer develop, the test output didn't make a lot of sense... so hopefully it was something else :)

@ghost
Copy link
Copy Markdown

ghost commented Jun 6, 2014

Test Failed.

If the failures are unrelated to your code, don't stress, a core developer will know these apart.
In the future, if possible, please branch off a passing commit to avoid false positives.
Refer to this link for build results: http://jenkins.saltstack.com/job/salt-pr-build/4929/

@thatch45
Copy link
Copy Markdown
Contributor

thatch45 commented Jun 6, 2014

Thanks @techdragon, RAET is not quite ready for general consumption but we are VERY close to the initial beta of salt with RAET.

ok @jacksontj, we will give this a spin!

thatch45 added a commit that referenced this pull request Jun 6, 2014
@thatch45 thatch45 merged commit 6dfbeb3 into saltstack:develop Jun 6, 2014
@jacksontj
Copy link
Copy Markdown
Contributor Author

@thatch45 Ok :) Not sure why the tests were "failing" since if you look at the results it says "Test Result (no failures)". Also looks like someone changed the pylint checks-- since there are LOTS of errors all of a sudden ;)

@jacksontj jacksontj mentioned this pull request Jun 6, 2014
@s0undt3ch
Copy link
Copy Markdown
Contributor

@jacksontj when these lint error bumps occurs, you can usually blame me 😄 Sorry about the distraction.

cachedout pushed a commit to cachedout/salt that referenced this pull request Jun 9, 2014
According to saltstack#13328, the minion was stacktracing on state but
operating normally after that.

I believe this problem may have originated in the channel-reuse
changes in saltstack#13155. It appears that the intention there may have
been to move the crypt functionality into the transport, which
would make it unecessary in the minion method itself. (Please
correct me if I'm wrong, of course. I'm making an assumption here.)

At any rate, this removes that duplicate functionality, thus
eliminating the stack trace.
jacksontj added a commit to jacksontj/salt that referenced this pull request Jun 11, 2014
…annel object.

This is an improvement over saltstack#10332 in that you don't have to worry about someone creating a ZeroMQChannel then forking (thereby breaking the thread-pid rules). This way once it moves it will create a new sreq as needed.

This is possible now that saltstack#13155 is merged
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants