How to re-establish connection? #49

Open
jaromrax opened this issue Sep 11, 2020 · 9 comments

Comments

@jaromrax

jaromrax commented Sep 11, 2020

Hello,
I came across this project a long time ago, but I have only now tried it.

However, I found that if imagehub (the receiver) is restarted, the sender stalls.
It is probably waiting for the 'OK' response. Is there a way to time out the sender?

Pointing to an IP from the client and shooting an image is very convenient, but the interruption means a kill and restart on client...

thank you
jaro

@jeffbass
Owner

Hi jaro @jaromrax,

Thanks for your question. One disadvantage of the ZMQ REQ/REP messaging pattern is that the sender needs to restart if the imagehub (receiver) is restarted. This is expected ZMQ behavior and is mentioned in the ZMQ documentation. I use the REQ/REP pattern in my own production systems and REQ/REP does require a timeout watch method in the sender program. I mention one sender timeout technique using the Linux SIGALRM signal in this imageZMQ FAQ here. I mention a couple of others below.

There are other ways a REP timeout watcher could be implemented; some “more robust REQ/REP patterns” are discussed in the ZMQ documentation here.
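One of the patterns in the ZMQ guide polls the REQ socket for a reply and, on timeout, closes the socket and opens a new one. The sketch below is a simplified variant of that idea written against plain pyzmq rather than ImageSender. It opens a fresh socket for every attempt, and the function name `request_with_retries` and its parameters are illustrative assumptions, not part of imageZMQ.

```python
import zmq


def request_with_retries(endpoint, payload, timeout_ms=2500, retries=3, context=None):
    """Send payload on a REQ socket and wait for the REP.

    Returns the reply bytes, or None if every attempt timed out.
    Hypothetical helper: a fresh REQ socket is used per attempt, because a
    REQ socket that never got its reply cannot be used to send again.
    """
    context = context or zmq.Context.instance()
    for _ in range(retries):
        socket = context.socket(zmq.REQ)
        socket.setsockopt(zmq.LINGER, 0)  # drop unsent messages on close
        socket.connect(endpoint)
        try:
            socket.send(payload)
            # poll() returns 0 if nothing arrived within timeout_ms
            if socket.poll(timeout_ms, zmq.POLLIN):
                return socket.recv()
        finally:
            socket.close()
    return None
```

Because this relies on polling rather than signals, it should also work inside a thread and on Windows. A caller would treat a `None` result as "hub not answering" and decide whether to keep retrying or restart.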

In my own imagenode programs, I restart the sender program when the imagehub program is restarted. I use 2 different timeout techniques, both of which are in my imagenode GitHub repository: 1) using a try/except block for each REQ that sets a timeout using signal.SIGALRM; this code uses a Patience class and is in the main imagenode branch here, lines 38-45; and 2) appending the precise time of each REQ sent and each REP received to a deque and using a separate timeout watching method running in a thread. This is in a test imagenode stall_watcher branch here, lines 262-293 & lines 317-350. I am developing the alternative to my Patience class for 2 reasons: 1) signal.SIGALRM does not work in Python threads and 2) signal.SIGALRM does not exist on Windows, which was pointed out in this imagenode issue by a Windows user. My own production systems have been running for over 2 years, and I have found that watching for REP timeouts is needed because of network and power glitches more often than for imagehub restarts. Either way, a timeout watcher is definitely needed.
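As a rough illustration of technique 1), here is a minimal sketch of a SIGALRM-based timeout wrapped in a context manager. It is not the Patience class from imagenode; the names `reply_timeout` and `ReplyTimeout` are made up for this example. It assumes a Unix system and must be used from the main thread.

```python
import signal
from contextlib import contextmanager


class ReplyTimeout(Exception):
    """Raised when the guarded block runs longer than allowed."""


@contextmanager
def reply_timeout(seconds):
    """Raise ReplyTimeout if the with-block takes longer than `seconds`.

    Hypothetical helper: Unix only, main thread only.
    """
    def _on_alarm(signum, frame):
        raise ReplyTimeout("no reply within %s seconds" % seconds)

    previous = signal.signal(signal.SIGALRM, _on_alarm)
    signal.setitimer(signal.ITIMER_REAL, seconds)  # arm the timer
    try:
        yield
    finally:
        signal.setitimer(signal.ITIMER_REAL, 0)  # cancel any pending alarm
        signal.signal(signal.SIGALRM, previous)  # restore the old handler


# Possible call site (assumed, shown as a comment so the block stays standalone):
#     try:
#         with reply_timeout(5):
#             reply = sender.send_image(name, image)  # blocks until the REP arrives
#     except ReplyTimeout:
#         ...  # close and recreate the sender, or exit so a supervisor restarts it
```

Cancelling the timer in the `finally` clause matters: without it, an alarm armed for one request could fire during a later one.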

One of the imageZMQ users (Pat Ryan @youngsoul) developed a different technique for a REP timeout watcher and ImageSender restart. I mention it in the Useful Forks section of the imageZMQ README.rst. The direct link to Pat’s code is here. Pat’s code uses signal.SIGALRM.

Note that you can close and restart the ImageSender (as Pat Ryan does) or you can restart the image sender program (as my own imagenode program does).

Thanks for your question. It is a good one, so I will put together a few examples of the above and add them to the imageZMQ examples folder in the next week or so. Please feel free to ask follow up questions as comments in this issue if you need more details before then.

Jeff

@jaromrax
Author

Thank you very much for the very detailed and kind response. I am in the middle of trying @youngsoul's solution now; I found it before the references you pointed to. Do you think there are any functional differences between the two approaches?
Thanks
Jaro

@jeffbass
Owner

Hi Jaro,
I haven't used Pat's solution yet, so I don't have specific feedback about it. I would love to hear your thoughts after you try it. I am currently leaning toward method 2) that I mentioned above. I am a couple of weeks into testing it and it has minimal effect on latency and throughput. It puts the "watching for a REP after REQ" task in a separate Python thread, which fits well with my own imagenode project.
Jeff

@jaromrax
Author

Dear Jeff,
after some tweaking, I realized that I must remove the release of SIG in def timeout. After this, I can start and stop the receiver and the sender arbitrarily. However, I have seen a crash when starting one of the programs, twice.
Jaro

@jeffbass
Owner

Hi Jaro,

One thing you might want to try is to set the zmq.LINGER option after each start or restart of the sender:

        sender = imagezmq.ImageSender(connect_to=hub_address)
        sender.zmq_socket.setsockopt(zmq.LINGER, 0)  # prevents ZMQ error on exit

That helped eliminate some restart errors for me.
Jeff

@jaromrax
Author

Dear Jeff,
I was trying to combine an older but great flask webcam example (by Adrian Rosebrock?) with the SIGALRM version of imagezmq, and I have hit the barrier you mentioned: while the imagezmq REQ/REP now works reliably, it works only in the main thread... The option 2) you mentioned may solve this...

@jeffbass
Owner

jeffbass commented Oct 1, 2020

Hi Jaro,
I am testing my option 2 now and have been for about a month. It is working well on a dozen Raspberry Pis. Give it a try and let me know how it works for you. Note that the option 2 code is in the stall_watcher branch of imagenode, not the master branch. When I have completed my testing, I will merge it into the master branch. Good luck!
Jeff

@jaromrax
Author

Dear Jeff, sorry for not coming back for so long. That sounds great. I just tried to look at the branches (it has been a long time since October), but I see no branches other than master in imagenode, nor in imagezmq. Thank you

@jeffbass
Owner

Hi Jaro @jaromrax,
I've merged the branches that I referred to in previous comments. Here are 2 code examples that may help you:

  1. imageZMQ timeout & restart example program: timeout_req_ImageSender.py
  2. My "option 2" discussed above is now merged as an option in imagenode. The REP_watcher() runs in a separate thread and watches for an excessively long time of "REQ sent and no REP received". The optional REP_watcher send method send_jpg_frame_REP_watcher() appends send and receive times to a deque; see lines 271-367 in imagenode's imaging.py. You can learn more about the REP_watcher option in the imagenode YAML settings docs here.
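
To give a feel for the "option 2" idea without reading the imagenode source, here is a minimal standalone sketch. It is not the imagenode implementation: the class name `RepWatcher`, its methods, and the `on_stall` callback are assumptions made for this example. The sending code records each REQ and REP in a deque, and a background thread reports a stall when the latest event is a REQ that has been waiting too long.

```python
import threading
import time
from collections import deque


class RepWatcher:
    """Background watcher that reports a REQ left without a REP for too long.

    Hypothetical sketch; names and structure are not taken from imagenode.
    """

    def __init__(self, patience_s, on_stall, check_every_s=0.05):
        self.patience_s = patience_s        # max seconds to wait for a REP
        self.on_stall = on_stall            # callback invoked once on a stall
        self.check_every_s = check_every_s  # how often the thread checks
        self.events = deque(maxlen=1)       # latest ("REQ" | "REP", timestamp)
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._watch, daemon=True)

    def start(self):
        self._thread.start()

    def stop(self):
        self._stop.set()
        self._thread.join()

    def req_sent(self):
        self.events.append(("REQ", time.monotonic()))

    def rep_received(self):
        self.events.append(("REP", time.monotonic()))

    def _watch(self):
        # wait() returns False on timeout, True once stop() has been called
        while not self._stop.wait(self.check_every_s):
            if not self.events:
                continue
            kind, when = self.events[-1]
            if kind == "REQ" and time.monotonic() - when > self.patience_s:
                self.on_stall()
                return
```

The sender would call `req_sent()` just before each blocking send and `rep_received()` right after it returns. What `on_stall` does is up to the application; setting a flag or exiting so that a service manager restarts the program are two possibilities.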

I hope this helps. Let me know if you have other questions.
Jeff
