Conserve channels #13155
|
Test PASSed. |
|
@s0undt3ch Yes that was ;) Just cleaned that out |
|
This won't work; it is a problem with ZeroMQ. I have tried this before: it will work most of the time, but then you start to get dark connection failures. |
|
I tested this a lot locally and it works fine. Is the issue with timed-out connections or some such? I'd like to figure out why it fails and fix it. I don't really like the mentality that we can't touch zmq since we are working on something else, especially since they are completely different: RAET is non-guaranteed delivery over UDP. |
|
@thatch45 do you have any work in git history regarding the dark connection failures you mentioned? I'd love to take a look at this, since I'm with @jacksontj regarding the need to continue work on zmq. Not being happy with the RAET design (no delivery guarantees) has led me to tinker with the idea of writing my own transport layer using nanomsg (which was made even more possible by having RAET as a second blueprint for how to build a transport). Since I'm in a position to contribute code, I'll do what I can to assist in maintaining the alternatives (ZeroMQ, SSH, etc.), or I'll build my own and contribute it back up. |
|
I spent some time this morning trying to break our zmq channels, and it turns out it's pretty easy! I've just added 2 commits which cover all the cases I found where we break a zmq connection. The one on the master side makes it REALLY easy to DoS a master :/ |
|
Test Failed. If the failures are unrelated to your code, don't stress, a core developer will know these apart. |
|
@techdragon We have added delivery guarantees to RAET. We are taking RAET very seriously and expect it to hold up. Sounds good @jacksontj, I am looking over it |
|
Looks like the build flunked out, I will run another one |
|
@techdragon, what makes you think that RAET does not have delivery guarantees? It has been specifically designed to have them, perhaps you were working with it before they were added in, they have only been in for about 4 weeks. |
|
We will have docs up soon explaining delivery guarantees in RAET and how they work; we do not want misinformation about RAET getting out! |
|
@thatch45 Looks like the build failed again, any ideas why it's not finishing? The Jenkins output is... cryptic ;) |
|
@thatch45 k, looks like when my editor barfed on my code earlier today I missed one :) |
|
@thatch45 - I haven't had time to catch up on about the last 8 weeks of RAET work, so the guarantees are definitely new to me, good to hear. Looking forward to seeing the documentation on them 👍 |
Today if you create a client that does
```
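import zmq

context = zmq.Context()
socket = None
while True:
    # Drop the previous socket without ever calling recv() on it
    del socket
    socket = context.socket(zmq.REQ)
    socket.connect('tcp://127.0.0.1:4506')
    # Send a request and never read the reply
    socket.send(b'foo')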
```
Each asymmetric call will cause a master process to go defunct. This patch will catch those exceptions, log them, and rebind the worker zmq socket.
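The catch, log, and rebind idea can be sketched without any ZeroMQ specifics. This is only an illustration under stated assumptions, not the code in this patch: `serve`, `make_socket`, and `handle` are hypothetical names, and `make_socket()` is assumed to return any object with `recv()`, `send()`, and `close()`. With pyzmq one would presumably build a `zmq.REP` socket in the factory and catch `zmq.ZMQError` rather than a bare `Exception`.

```python
import logging

log = logging.getLogger(__name__)


def serve(make_socket, handle, max_requests):
    """Serve up to max_requests messages, rebuilding the socket on errors.

    make_socket and handle are hypothetical callables: make_socket() returns
    an object with recv(), send(), and close(); handle(msg) builds the reply.
    Returns how many times the socket had to be rebuilt.
    """
    socket = make_socket()
    rebuilds = 0
    for _ in range(max_requests):
        try:
            msg = socket.recv()
            socket.send(handle(msg))
        except Exception:
            # A misbehaving client must not kill the worker: log the error,
            # then start over with a fresh socket so the recv/send state
            # machine is reset.
            log.exception("worker socket error, rebuilding socket")
            socket.close()
            socket = make_socket()
            rebuilds += 1
    socket.close()
    return rebuilds
```

The point of rebuilding rather than retrying on the same socket is that a request/reply socket that failed mid-exchange may be left in a state where neither `recv()` nor `send()` is valid.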
… master is too busy and we leave the socket in a state where we've done a send but not a recv.
…"undo" apparently I missed this one ;)
|
Here is a rebase on a newer develop, the test output didn't make a lot of sense... so hopefully it was something else :) |
|
Thanks @techdragon, RAET is not quite ready for general consumption but we are VERY close to the initial beta of salt with RAET. ok @jacksontj, we will give this a spin! |
|
@thatch45 Ok :) Not sure why the tests were "failing", since if you look at the results it says "Test Result (no failures)". Also looks like someone changed the pylint checks, since there are LOTS of errors all of a sudden ;) |
|
@jacksontj when these lint error bumps occur, you can usually blame me 😄 Sorry about the distraction. |
According to saltstack#13328, the minion was stacktracing on state but operating normally after that. I believe this problem may have originated in the channel-reuse changes in saltstack#13155. It appears that the intention there may have been to move the crypt functionality into the transport, which would make it unnecessary in the minion method itself. (Please correct me if I'm wrong, of course. I'm making an assumption here.) At any rate, this removes that duplicate functionality, thus eliminating the stack trace.
…annel object. This is an improvement over saltstack#10332 in that you don't have to worry about someone creating a ZeroMQChannel and then forking (thereby breaking the thread-pid rules). This way, once it moves, it will create a new sreq as needed. This is possible now that saltstack#13155 is merged.
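A rough sketch of the "create a new sreq as needed" behaviour, assuming the channel remembers which process and thread built its request object. `ForkSafeChannel`, `make_sreq`, and `get_key` are hypothetical names for illustration and are not taken from the Salt code base.

```python
import os
import threading


def _current_owner():
    # Identify the calling context by process id and thread id.
    return (os.getpid(), threading.get_ident())


class ForkSafeChannel:
    """Lazily build a request object and rebuild it when the owner changes.

    make_sreq is a hypothetical factory standing in for whatever creates the
    underlying ZeroMQ request object.
    """

    def __init__(self, make_sreq, get_key=_current_owner):
        self._make_sreq = make_sreq
        self._get_key = get_key  # injectable so the behaviour can be tested
        self._sreq = None
        self._owner = None

    @property
    def sreq(self):
        key = self._get_key()
        if self._sreq is None or self._owner != key:
            # First use, or we are now in a different process/thread than
            # the one that built the object: build a fresh one instead of
            # reusing a socket that belongs to someone else.
            self._sreq = self._make_sreq()
            self._owner = key
        return self._sreq
```

Because the check happens on every access, a channel created before a fork does not need any explicit reset in the child.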
Fileclient creates many, MANY channels back to the master. Each one of these is a new TCP session, which over links with some latency adds up pretty quickly.
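One way to picture the saving is a small cache that hands out a single channel per master address instead of building one per request. This is a simplified sketch, not the implementation in this PR; `ChannelCache` and `make_channel` are hypothetical names.

```python
class ChannelCache:
    """Reuse one channel per master URI instead of opening a new one each time.

    make_channel is a hypothetical factory that opens a channel (and therefore
    a TCP session) to the given master URI.
    """

    def __init__(self, make_channel):
        self._make_channel = make_channel
        self._channels = {}
        self.created = 0  # how many real channels were opened

    def get(self, master_uri):
        channel = self._channels.get(master_uri)
        if channel is None:
            # Only pay the connection setup cost the first time we see a URI.
            channel = self._make_channel(master_uri)
            self._channels[master_uri] = channel
            self.created += 1
        return channel
```

With a cache like this, a hundred file requests to the same master would share one session, so the connection setup latency is paid once rather than a hundred times.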