New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
libflux: store fully decoded message in flux_msg_t #3701
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
After a quick skim (since this is WIP), I have a couple quick comments inline and one general one:
Suggestion: dispense with the internal "proto block accessors", and instead just add a couple of functions that can be used as helpers by msg_to_zmsg()
and zmsg_to_msg()
, e.g.
static int proto_encode (void *buf, int len, const flux_msg_t *msg);
static int proto_decode (const void *buf, int len, flux_msg_t *msg);
I'd suggest adding another generic union name to the two u32's like aux1
and aux2
and, in those functions, just encode/decode from/to those instead of picking a union member based on the message type.
src/common/libccan/ccan/str/str.h
Outdated
* streq - Are two strings equal? | ||
* ccanstreq - Are two strings equal? | ||
* @a: first string |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I noticed that before but I figured it would just be easier to do stuff like:
#include <czmq.h>
#ifdef streq
#undef streq
#endif
#include "src/common/libccan/ccan/str/str.h"
in places where czmq.h and str.h are both used?
If that doesn't feel right, then it seems like just conditionally defining streq in str.h would be better than a rename, since the macros are equivalently defined.
src/common/libflux/message.c
Outdated
uint8_t magic; | ||
uint8_t version; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
magic and version could probably be left out of the struct and just added/checked when the message is encoded/decoded?
src/common/libflux/message.c
Outdated
struct route_id *r = NULL; | ||
if (!(r = calloc (1, sizeof (*r)))) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Don't need to set r to NULL right before you set it to the result of calloc()
. Let's not make gratuitous initialization a thing.
src/common/libflux/message.c
Outdated
if (!(r->id = malloc (id_len + 1))) | ||
return NULL; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
strndup?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Or make it one malloc for the struct plus the string and use set id = r + 1
?
9ce45cd
to
d692715
Compare
re-pushed, did some more cleanup per some comments above. I suddenly recalled the:
trick for the route id. Wondering if there are other places in Could this be a potential merge-able candidate? Or should we consider making more progress on #3617 within the PR? |
Oh heh,
I'd be tempted to suggest that we try to get these functions out of the public API in this PR: /* Send message to zeromq socket.
* Returns 0 on success, -1 on failure with errno set.
*/
int flux_msg_sendzsock (void *dest, const flux_msg_t *msg);
int flux_msg_sendzsock_ex (void *dest, const flux_msg_t *msg, bool nonblock);
/* Receive a message from zeromq socket.
* Returns message on success, NULL on failure with errno set.
*/
flux_msg_t *flux_msg_recvzsock (void *dest); Then the PR could be re-titled with release notes in mind, reflecting the outwardly visible change, like:
To accomplish that i think it means we would need to move the |
Just ran a 40K job throughput test a couple times and it seems like these changes are somewhat helpful?
compared with e.g. master
|
Yeah, I just recalled it suddenly. I guess it was the right context for remembering it since
Sounds good. What about the encode/decode functions? I feel I can "hand" encode data in the encode functions (probably will for performance), but feel decode will still need zmsg (I lazily used zmsg in encode b/c of the latter case).
Nice, I hadn't run a benchmark yet b/c I was waiting to do more work (possibly remove some additional to/from zmsg/msg). |
That would mean that message.c would be czmq free (!) but as soon as one opens the broker, the local connector would pull it in. So progress I guess :-) |
I'm fuzzy on how message parts are delimited when czmq encodes the message to a buffer. |
ugh, one builder hit a lot of
not sure what subtlety the compiler is seeing, but i guess should initialize everywhere. |
I can't remember the details (or if I spleunked far enough into the CCAN list implementation to find them) but the need to initialize that variable seems to be a property of the CCAN list_for_each_* macros. |
d692715
to
9060cee
Compare
I made an attempt to move the czmq-requiring functions from I'm now thinking that a utility library might be best to share between
in the helper lib ( and
As an aside, during this experiment I also split |
I think it would be a good step to try to do whatever refactoring is necessary to eliminate
What's accomplished by splitting the struct? I think it is fine if the whole thing is exposed in |
Having second thoughts about suggesting that all message encode/decode be pushed to librouter. Maybe we should keep message.[ch] a complete implementation of RFC 3. That is, perform encoding in there down to the level of frames. The encoding that would go in librouter in support of sendfd/sendzmq would translate an opaque list of frames to the underlying wire protocol. That would mean adding some interfaces to message.[ch] for the frame based encoding/decoding. We could reinvent zframe for this, but I would think it might be a little nicer to use something like an array of iovecs, similar to readv(2), writev(2). Any thoughts? I'm wondering if we ought to make some time to chat about this in a coffee meeting to get on the same page? |
9060cee
to
16ee565
Compare
re-pushed w/ my experimental
which worked out pretty well. We can eliminate This was mostly done before @garlick comments above as I experimented to see how this would work.
I believe
agreed
Will definitely need a Aside, do we have a code style pattern for "reverse iterator", like "foo_next_rev ()"? Or perhaps keeping track if the user called (Edit: it appears
The
Its simply for abstraction purposes, limiting access to only the fields necessary. If we can eliminate some of the "common functions" like you said above, then the benefit may not be there anymore. |
Hmmm, the idea is intriguing. I think the only thing is I don't know if it's a big win at this point. I don't know the zsock code that well so I don't know how much work with would be to change a lot of the zsock code. Basically:
So if we implemented an internal I imagine you were thinking we could do something like:
and The value could be there, but I don't think I know the zsock code well enough to make a judgement. |
zmsg_t
to store protocol data in flux_msg_t
e729ae3
to
5748c6e
Compare
zmsg_t
to store protocol data in flux_msg_t
zmsg_t
to store protocol data in flux_msg_t
Per offline discussion, we've decided that this PR with just the remove of Re-pushed with what I believe is my final cleanup. Some cleanup tradeoffs I made: Because of the fact field values are easily accessible in the
should be removed and replaced with direct access to the field in question. i.e. just read Generally speaking I left the calls in, b/c its good abstraction and who knows how things could change in the future. The two exceptions I made were to access
which of course just reading In this PR there is a I actually have a branch that removed zmsg needs from We also discussed adding We also discussed re-formatting A small throughput performance test: Master branch
and this branch
ran several times, but the results are largely the same. There's a small incremental gain on throughput performance. The most important number is probably the one at the end, how long everything took to run. |
4cc8177
to
fe5055a
Compare
re-pushed, fixing up some compiler warnings a builder found |
fe5055a
to
f373cf9
Compare
Is this ready for another review? Suggestion: change title to
|
zmsg_t
to store protocol data in flux_msg_t
@garlick yup, ready for review |
looking through code coverage, there are alot of new paths being missed due to changes. I'm going to re-org the tests and add a bunch more coverage. But I think the main |
f373cf9
to
76d74c2
Compare
re-pushed, adding some additional tests. I am going to split off all of the cleanup commits into a new PR, b/c they are starting to add up after a test/message.c re-org of code (this is PR #3745) |
7885739
to
04b5f52
Compare
04b5f52
to
8bada62
Compare
goto nomem; | ||
p += n; | ||
} | ||
if (zmsg_to_msg (msg, zmsg) < 0) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe this is an intentional baby step, but there seems to be a disparity between flux_msg_encode()
which is zmsg/zframe free, and flux_msg_decode()
which is not. Is there a good reason not to go all the way and convert the decode path as well?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I actually have a branch that did remove zmsg
use from flux_msg_decode()
, using a zlist
instead (b/c that's all I'm really using zmsg
for in flux_msg_decode()
, just as a list) and I created a zlist_to_msg()
kinda function.
But zmsg_to_msg()
is still needed for flux_msg_recvzsock()
. And a (good) common function between zlist_to_msg()
and zmsg_to_msg()
was annoying enough I just abandoned the zlist_to_msg()
until the longer term plan was established.
We could go ahead and add zlist_to_msg()
here b/c I think we're going to end up having two functions in the end anyways (one in libflux
and one in librouter
).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the interest of keeping things moving forward, maybe we should leave as is. This is a big change and there will be opportunities to refine it later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not spotting any problems that we aren't confident we can resolve a few steps down the road. As noted earlier, just making the accessors more direct has a positive performance impact, so let's just go full speed ahead.
8bada62
to
418902f
Compare
@garlick thanks, i'll set MWP |
bionic builder seems to have been aborted while running the shell input sharness test. Restarted. |
Problem: The flux_msg_t data structure stores all internal data within a zmsg_t data structure. This leads to libflux always requiring libczmq. Solution: Remove use of zmsg_t when storing data internally within flux_msg_t, instead using lists, memory allocations, etc. Adjust all API functions accordingly. In the few functions that still require zmsg, convert flux_msg_t to/from zmsg_t as needed. This leads libflux to still require libczmq for the time being, but decreases the footprint of functions that need to be migrated out of libflux-core.
Problem: Many corner cases were missed due to code changes from the removal of zmsg internally in flux_msg_t. Solution: Add a large number of new corner case tests to cover missed paths and missed corner cases. Consolidate tests in check_cornercase() when appropriate.
Problem: With fully decoded routes in flux_msg_t now, test messages specifying a "delim" no longer make sense. Solution: Update tests to be more clear that there is no longer a delimeter.
hmmm centos7 builder failed with a ton of errors ... investigating. unsure if issue with rebase i did. Edit: gonna just rebase & retry, something may have gone really wrong w/ that builder. 25 test fails over 6 resource related sharness test files. |
418902f
to
2fb0d86
Compare
Codecov Report
@@ Coverage Diff @@
## master #3701 +/- ##
==========================================
+ Coverage 83.21% 83.27% +0.05%
==========================================
Files 333 333
Lines 50763 50793 +30
==========================================
+ Hits 42244 42298 +54
+ Misses 8519 8495 -24
|
Per discussion in #3617, this is part 1 to remove the
libczmq
dependency onlibflux-core
. In this code we rework theflux_msg_t
internals to store data in its own native data structure rather than usingzmsg_t
.Just wanted to get some initial thoughts.
Should we consider this a mergable PR? Or would we want to complete more of #3617 in one PR? It is a lot of code changes.
The only places that
zmsg_t
andzframe_t
remain are influx_msg_encode_size()
,flux_msg_encode()
,flux_msg_decode()
,flux_msg_sendzsock()
, andflux_msg_recvzsock()
. We have somemsg_to_zmsg()
andzmsg_to_msg()
internal functions to handle the transition for the time being. These functions will eventually be migrated out oflibflux-core
. Tolibrouter
? Or a helper lib? This is perhaps still TBD.Note that both
ccan
andczmq
define astreq()
macro ... ugh. So decided to change the macro inccan
for the time being. We can (hopefully) revert this commit at a later time when things are no longer coupled.Should the format of
flux_msg_fprint()
change? With the removal of internalzmsg_t
storage, we are no longer dependent on formatting output likezframe_fprint()
. We can output whatever we like. This can be a follow on PR of course.