
ZMQ sends and threads conflict #162

Closed
GuillaumeMercier opened this issue May 27, 2024 · 17 comments

@GuillaumeMercier (Collaborator) commented May 27, 2024

ZMQ doesn't seem to like threads (I saw some stuff about that in the documentation but can't find it right now).
Right now, all threads share the same context and thus use the same socket to communicate with the server, which
might be an issue. Or maybe ZMQ sockets don't like bursts of messages.
The error is the following:

[1,0]<stderr>: [quo-vadis error at (qvi-rmi.cc::qvi_zerr_msg::101)] zmq_msg_send() failed with errno=156384763 (Unknown error 156384763)
[1,0]<stderr>: [quo-vadis error at (qvi-rmi.cc::qvi_zerr_msg::101)] zmq_msg_send() truncated with errno=156384763 (Unknown error 156384763)

Adding some delay in the example (with a call to sleep) fixes the issue; adding a lock doesn't fix anything, so I will investigate the burst of messages. I already tried options for the ZMQ sockets, but without positive results.

See PR #163
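
(For context on the threading issue above: ZMQ contexts are documented as thread-safe, but individual sockets are not, and errno 156384763 appears to be ZMQ_HAUSNUMERO + 51, i.e., EFSM, the error a REQ socket raises when its strict send/recv alternation is violated, which is exactly what interleaved sends from several threads on one shared socket produce. Below is a minimal sketch of the documented-safe pattern, one socket per thread over a shared context; the endpoint is a placeholder and a REP server is assumed to be listening there.)

#include <zmq.h>
#include <thread>
#include <vector>

int main() {
    // One context can safely be shared by every thread.
    void *ctx = zmq_ctx_new();

    std::vector<std::thread> workers;
    for (int i = 0; i < 4; ++i) {
        workers.emplace_back([ctx] {
            // Each thread creates and uses its own socket; ZMQ sockets
            // must not be shared between threads.
            void *sock = zmq_socket(ctx, ZMQ_REQ);
            zmq_connect(sock, "tcp://127.0.0.1:5555"); // placeholder endpoint
            zmq_send(sock, "ping", 4, 0);
            char buf[16];
            zmq_recv(sock, buf, sizeof(buf), 0);
            zmq_close(sock);
        });
    }
    for (auto &t : workers) t.join();
    zmq_ctx_destroy(ctx);
    return 0;
}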

@samuelkgutierrez (Member)

I'll take a look, @GuillaumeMercier. Thank you.

@samuelkgutierrez (Member)

I have a fix that does it in a brute-force way, but let me see if I can come up with a nicer solution.

@GuillaumeMercier (Collaborator, Author)

And what is the current fix? I'm curious.

@samuelkgutierrez (Member)

Having a context mutex managed by a lock_guard at the interface boundary. It's a little heavy-handed, so I think I can do better.
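
(Presumably something along these lines; a minimal sketch with illustrative stand-in names, not the actual quo-vadis internals. The context owns a mutex, and every public entry point takes it with a std::lock_guard, serializing all traffic over the shared ZMQ socket.)

#include <mutex>

struct context {
    std::mutex mtx;
    int state = 0; // stand-in for the real RMI client state
};

int api_call(context *ctx) {
    // Taken at the interface boundary; released on every return path.
    std::lock_guard<std::mutex> guard(ctx->mtx);
    ctx->state += 1; // stand-in for the zmq_msg_send()/recv round trip
    return 0;
}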

@GuillaumeMercier (Collaborator, Author)

But I think I tried this and it didn't work.

@samuelkgutierrez (Member)

I'm not sure how you implemented it, but mine seems to do the trick.

@GuillaumeMercier (Collaborator, Author)

struct qv_context_s {
    qvi_rmi_client_t *rmi = nullptr;
    qvi_zgroup_t *zgroup = nullptr;
    qvi_bind_stack_t *bind_stack = nullptr;
    // Must be initialized before first use, e.g. with PTHREAD_MUTEX_INITIALIZER.
    pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
};

@GuillaumeMercier (Collaborator, Author)

Ok, I'm puzzled.

@GuillaumeMercier (Collaborator, Author)

I guess it has to do with when/where you do the lock/unlock.

@GuillaumeMercier (Collaborator, Author)

I put the lock/unlock around the call to qv_bind_push in qv_thread_routine, as sketched below.
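
(A sketch of that placement, with hypothetical stand-ins for the quo-vadis types and calls; this is the attempt being described, not the eventual fix.)

#include <pthread.h>

// Hypothetical stand-ins for qv_context_s and qv_bind_push.
struct thread_ctx { pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER; };
static int qv_bind_push_stub(thread_ctx *) { return 0; }

void *thread_routine(void *arg) {
    thread_ctx *ctx = static_cast<thread_ctx *>(arg);
    // Only the bind-push call is guarded; the rest runs unlocked.
    pthread_mutex_lock(&ctx->lock);
    qv_bind_push_stub(ctx); // stand-in for the real qv_bind_push call
    pthread_mutex_unlock(&ctx->lock);
    return nullptr;
}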

@samuelkgutierrez (Member)

@GuillaumeMercier your issues should be fixed by #164. I've also pushed a new branch named thread-bug-work that has the fixes needed to build your code. When you are ready, please issue a pull request so I can merge your work into master.

@samuelkgutierrez (Member)

@GuillaumeMercier things look much better since merging #189. For what it's worth, there were other issues outside the RMI and threading that impacted this one (e.g., unreliable binding after a split). One particularly nasty bug to track down was stale task data being passed to hwloc. These data are now gathered at the time the call is performed, so the relevant task details update appropriately when threads are introduced. Closing for now, but please re-open if needed.
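
(To illustrate the gathered-at-call-time point: a minimal hwloc sketch, not the quo-vadis code. It queries the calling thread's binding when asked, instead of reusing a cpuset captured earlier, which goes stale once new threads re-bind.)

#include <hwloc.h>

hwloc_cpuset_t current_thread_binding(hwloc_topology_t topo) {
    hwloc_cpuset_t set = hwloc_bitmap_alloc();
    // Read the calling thread's binding right now, at call time.
    hwloc_get_cpubind(topo, set, HWLOC_CPUBIND_THREAD);
    return set; // caller frees with hwloc_bitmap_free()
}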

@GuillaumeMercier (Collaborator, Author)

@samuelkgutierrez: so, did you get rid of the lock?

@GuillaumeMercier (Collaborator, Author)

I'll update my repo and run tests ASAP.

@samuelkgutierrez (Member)

Yes, I got rid of the lock. Let me push a minor update before you test.

@GuillaumeMercier (Collaborator, Author)

No sweat; I won't run tests today, but maybe tomorrow.

@samuelkgutierrez (Member)

The minor change is in. Thank you for testing! Fingers crossed.
