Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

libflux: update flux msg route functions #3746

Merged
merged 6 commits into from Jul 13, 2021

Conversation

chu11
Copy link
Member

@chu11 chu11 commented Jun 28, 2021

I'll just post what is in the primary commit message here:

Problem: The functions flux_msg_get_route_first() and
flux_msg_get_route_last() return route IDs via an allocated string
that must be freed by the user.  This API made sense previously
when route IDs were stored internally in a zframe.  However, now
that the route IDs are stored in decoded data structures, we could
return the route IDs directly.

Solution: Return internally stored route IDs via a const char *
instead of an allocated string.  Adjust all callers accordingly.
use 'const char *' instead of 'char *' in several places where
more appropriate.

If you're wondering why don't I put this change in one of the other libflux refactoring PRs, its b/c flux-sched needs updating too, so figure a separate PR would be best.

I had some further design thoughts too which I'll post in #3617.

@chu11 chu11 force-pushed the issue3617_flux_msg_routes_const branch 6 times, most recently from 51a856e to 8809998 Compare June 29, 2021 04:03
@garlick
Copy link
Member

garlick commented Jun 29, 2021

If we're going to change the API anyway, maybe we ought to update it to look more like other similar interfaces in flux:

  • Use a prefix of flux_msg_route_
  • Accessors return the route or NULL rather than int success and double pointer paramter
  • make some simple functions return void

So maybe

void flux_msg_route_enable (flux_msg_t *msg); 
void flux_msg_route_clear (flux_msg_t *msg);
int flux_msg_route_push (flux_msg_t *msg, const char *id);
char *flux_msg_route_pop (flux_msg_t *msg);
const char *flux_msg_route_first (flux_msg_t *msg);
const char *flux_msg_route_last (flux_msg_t *msg);
int flux_msg_route_count (flux_msg_t *msg);

// wondering if this is needed externally?
char *flux_msg_route_string (const flux_msg_t *msg);

// wondering if this is worth it, given it is easier now to access routes and compare?
bool flux_msg_route_match_first (const flux_msg_t *msg1, const flux_msg_t *msg2);

@chu11
Copy link
Member Author

chu11 commented Jun 29, 2021

@garlick sounds like a good idea. I assume your removal of const flux_msg_t * in some of the above is accidental? Or intentional to perhaps include a next/prev in the future? Will have to see how the current users of the API may/may not break.

@garlick
Copy link
Member

garlick commented Jun 29, 2021

Sorry, yup that was an accident.

@chu11
Copy link
Member Author

chu11 commented Jun 29, 2021

const char *flux_msg_route_first (flux_msg_t *msg);
const char *flux_msg_route_last (flux_msg_t *msg);

I tried to make this conversion in an earlier prototyping attempt, but reverted it. Unlike some other functions we have, a return of a NULL id is sort of ok, indicating there are no routes. So NULL can't be used as an indicator of error. So I kept the style the same.

What's the opinion on the NULL means no routes or error style? When I made the conversion earlier I did add something like:

if (!(route = flux_msg_get_route_first (msg))) {
    errno = EPROTO;
    goto error;
}

in a few places as a consequence. I felt it was too much of a change, but that's just me of course.

@chu11
Copy link
Member Author

chu11 commented Jun 29, 2021

void flux_msg_route_enable (flux_msg_t *msg);
void flux_msg_route_clear (flux_msg_t *msg);

given the rest of the msg API doesn't return void, I think it'd be better to return int on error? e.g. these funcs call flux_msg_set_flags() internally.

@garlick
Copy link
Member

garlick commented Jun 29, 2021

given the rest of the msg API doesn't return void, I think it'd be better to return int on error? e.g. these funcs call flux_msg_set_flags() internally.

It seems like "invalid flag" (which can't happen) or "null message" (which by the time you're setting flags, you probably know you have a good message) are the main possible errors. Since it is painfully repetitious to check for errors from these functions, it seems to me like these could just not crash on a NULL msg and be OK.

I felt the same way about the route accessors. Return NULL for "no route" when there is no route, or if there is an error. By the time you're accessing the routes, you probably know you have a good message.

I might be missing some subtle error case though. (But if there is one we should decide if it's worth it).

@chu11
Copy link
Member Author

chu11 commented Jun 30, 2021

// wondering if this is needed externally?
char *flux_msg_route_string (const flux_msg_t *msg);

The only user outside of libflux/message is broker/ping. I don't think we can remove this function until we have a way for the user to iterate the route frames.

// wondering if this is worth it, given it is easier now to access routes and compare?
bool flux_msg_route_match_first (const flux_msg_t *msg1, const flux_msg_t *msg2);

w/ the newer libflux/disconnect library, the use has plummeted to 3 locations (one of which is disconnect).

Ehhh ... it's sort of borderline. My lazy immediate thought is to keep the function, otherwise we'd have to do:

const char *s1 = flux_msg_route_first (msg1);
const char *s2 = flux_msg_route_first (msg2);
if (!s1 || !s2)
   return false;
if (strcmp (s1, s2))
  return false;
return true;

in 3 places :-|

@chu11 chu11 force-pushed the issue3617_flux_msg_routes_const branch from 8809998 to 9ca1a0a Compare June 30, 2021 15:26
@chu11 chu11 changed the title [WIP] libflux: return const ids in flux_msg_get_route_first/last() [WIP] libflux: update flux msg route functions Jun 30, 2021
@chu11
Copy link
Member Author

chu11 commented Jun 30, 2021

re-pushed with many cleanups per discussion above and a few extra pre-cleanups (borderline could peel them off into a new PR, but it's not quite there I think).

Note that the commits:

    libflux: return routes via const in msg API
    libflux/message: returns const routes directly

could be squashed, but I let them be separate for easier review for now.

Since there's no way to know if a return of NULL is an error or empty route, in some cases I set errno = EPROTO in the error path when using the new flux_msg_route_first/last().

Returning void for flux_msg_route_enable() had an interesting side effect in librouter. I had to break up a few big if statements.

@@ -51,13 +51,17 @@ static void notify_disconnect (struct service *ss, const char *uuid)
 
     if (!(msg = flux_request_encode ("disconnect", NULL))
         || flux_msg_set_noresponse (msg) < 0
-        || flux_msg_enable_route (msg) < 0
-        || flux_msg_set_cred (msg, ss->cred) < 0
-        || flux_msg_push_route (msg, uuid) < 0
-        || flux_requeue (ss->h, msg, FLUX_RQ_TAIL) < 0) {
-        if (ss->verbose)
-            log_msg ("error notifying server of %.5s disconnect", uuid);
-    }
+        || flux_msg_set_cred (msg, ss->cred) < 0)
+        goto error;
+    flux_msg_enable_route (msg);
+    if (flux_msg_push_route (msg, uuid) < 0
+        || flux_requeue (ss->h, msg, FLUX_RQ_TAIL) < 0)
+        goto error;
+    flux_msg_decref (msg);
+    return;
+error:
+    if (ss->verbose)
+        log_msg ("error notifying server of %.5s disconnect", uuid);
     flux_msg_decref (msg);
 }

I could have done some trickery to avoid this, but figured I the average person might be confused if I did || (flux_msg_route_enable (), false). Even I'd look at that in a month and wonder what the hell I did.

@garlick
Copy link
Member

garlick commented Jun 30, 2021

Maybe push_route() should internally set the route flag?

@chu11
Copy link
Member Author

chu11 commented Jun 30, 2021

Maybe push_route() should internally set the route flag?

Doh, i thought about that last night and just forgot to try it. I'll give it a shot.

@garlick
Copy link
Member

garlick commented Jun 30, 2021

Were you amenable to the rename suggestion above? E.g. use flux_msg_route_ prefix on these functions?

Also would it be OK to have route_pop() just return NULL on both "no route" and error? Again, it's one of those situations where this doesn't get called until ample error checking has already been performed. This function has a relatively small number of users.

@chu11
Copy link
Member Author

chu11 commented Jun 30, 2021

Were you amenable to the rename suggestion above? E.g. use flux_msg_route_ prefix on these functions?

yup, it worked out fine.

Also would it be OK to have route_pop() just return NULL on both "no route" and error?

We can, but the semantics of that function are different. B/c we're popping a route frame, we're destroying the internal copy. So we sort of always have to strdup() and return the route id. But a significant amount of the time (5/7 current uses), we don't even need the route id. So we'd be strdup-ing & free-ing for nothing. And I think a fair amount of those pop-ings are in pretty important paths.

But the inconsistent API style can be annoying. Perhaps we just need another function, flux_msg_route_peek() or flux_msg_route_top()? Or a flux_msg_route_del_top()?

@garlick
Copy link
Member

garlick commented Jun 30, 2021

Perhaps we just need another function, flux_msg_route_peek() or flux_msg_route_top()? Or a flux_msg_route_del_top()?

I think peek/top would be the same as "first", wouldn't it?

Maybe flux_msg_route_delete() should just delete the top route?
Edit: er the "first" route.

@garlick
Copy link
Member

garlick commented Jun 30, 2021

Did you already push the name changes? I wasn't seeing them in my copy.

Edit: oh I just saw that "push_route" and "pop_route" wasn't renamed. I see the other functions were renamed.

@chu11
Copy link
Member Author

chu11 commented Jun 30, 2021

Edit: oh I just saw that "push_route" and "pop_route" wasn't renamed. I see the other functions were renamed.

Doh! Sorry, I missed those. I was probably processing the removal of get in those and forgot to the flip the back part.

@garlick
Copy link
Member

garlick commented Jun 30, 2021

I'm thinking maybe we ought to bump this to the next release and try to tag soon. That ok @chu11?

@chu11
Copy link
Member Author

chu11 commented Jun 30, 2021

@garlick that's fine, i had assumed this was going to be for the next release already. Minimally it requires flux-sched to be changed too.

@chu11
Copy link
Member Author

chu11 commented Jun 30, 2021

Maybe push_route() should internally set the route flag?

Did a prototype of this change. It's fine I guess, but I dislike the inconsistency (sometimes calling enable_route, sometimes not). But leads to an obvious question, do we even need flux_msg_route_enable() anymore?

I imagine it was needed in the old zmsg_t days b/c you needed to know if you had a delimiter on the zmsg list. But with the new decoded implementation, is it needed? Presumably a user should just be able to push routes to a message without any "enabling" of it.

Edit: oh wait, does a zmsg have to have a delimeter on it? even if no routes are added to it?
Edit2: i'm thinking yes, the delimeter is needed

@garlick
Copy link
Member

garlick commented Jul 1, 2021

I think you do need to add a delimiter with no routes before sending a request from a ROUTER socket. The socket then pushes the identity on before the message is sent to a peer. So there are cases where you want the delimiter but not routes.

Reading through the ZMQ_ROUTER section of zmq_socket(3) I'm now questioning whether the delimiter is necessary! It may just be a hold over from ZMQ_REQ / ZMQ_REP which nobody uses.

@chu11 chu11 force-pushed the issue3617_flux_msg_routes_const branch from 0360a19 to 469e720 Compare July 7, 2021 23:31
@chu11
Copy link
Member Author

chu11 commented Jul 7, 2021

re-pushed per comments above

  • flux_msg_route_delete() renamed to flux_msg_route_delete_last()

  • added flux_msg_route_disable() which does that flux_msg_route_clear() did before. flux_msg_route_clear() now only does a clear but not a disable.

  • remove flux_msg_route_pop(). Please note how I adjusted things in librouter.

  • flux_msg_route_first() and flux_msg_route_last() no longer set errno.

  • some cleanups as a result of above (e.g. flux_msg_route_match_first() now uses flux_msg_route_first())

  • re-ordered some commits, since the order sort of made no sense given above changes and the commit messages had to be changed.

Edit (7/8) - added one random cleanup commit

@chu11 chu11 force-pushed the issue3617_flux_msg_routes_const branch 2 times, most recently from 506322f to ceb2816 Compare July 8, 2021 23:47
Copy link
Member

@garlick garlick left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alright this is looking good. I'm fine if this goes in when you are ready. Does the sched PR need any updating?

@chu11
Copy link
Member Author

chu11 commented Jul 9, 2021

@garlick Yup, the flux-sched PR is all ready here: flux-framework/flux-sched#852

I can't set MWP here b/c of the flux-sched breakage. Apologies, I can't quite remember the workaround so one can manually merge this.

(I'll rebase so it'll be ready for manual merge).

@grondo
Copy link
Contributor

grondo commented Jul 9, 2021

To manually merge we'll have to remove branch protections, merge, then re-establish branch protections.

I can help with that when you are ready.

@chu11 chu11 force-pushed the issue3617_flux_msg_routes_const branch from ceb2816 to f6d3215 Compare July 9, 2021 03:53
@chu11
Copy link
Member Author

chu11 commented Jul 9, 2021

@grondo, thanks Just re-based, so once it passes we can do a manual merge, then i'll kick the flux-sched side to re-do its tests.

@chu11
Copy link
Member Author

chu11 commented Jul 9, 2021

code coverage builder timed out, restarted builders

@chu11 chu11 force-pushed the issue3617_flux_msg_routes_const branch from f6d3215 to 0f71ade Compare July 9, 2021 23:55
@chu11
Copy link
Member Author

chu11 commented Jul 9, 2021

since this wasn't yet merged, I snuck one more cleanup in. Should be reviewed before merging. The cleanup is in commit:

commit 0b19862b51d920a7984b5fae56d45e904f75afe5
Author: Albert Chu <chu11@llnl.gov>
Date:   Fri Jul 9 11:31:41 2021 -0700

    libflux: do not reuse zframes when sending msg

@chu11 chu11 force-pushed the issue3617_flux_msg_routes_const branch from 0f71ade to b307ed2 Compare July 10, 2021 18:13
@chu11
Copy link
Member Author

chu11 commented Jul 10, 2021

sorry, i snuck in one more cleanup

commit 516f7d5918dc3e8d25fe936b605aa24d46fdb401
Author: Albert Chu <chu11@llnl.gov>
Date:   Sat Jul 10 10:59:38 2021 -0700

    libflux: fix const declaration style
    
    Problem: A pointer was declared "uint8_t const *", where as it is
    far more common in flux to use "const uint8_t *".

@chu11
Copy link
Member Author

chu11 commented Jul 12, 2021

found yet another minor thing to cleanup, so I went ahead and peeled all those cleanups onto #3771, so that should go in first.

Problem: The functions flux_msg_get_route_first() and
flux_msg_get_route_last() return route IDs via an allocated string
that must be freed by the user.  This API made sense previously
when route IDs were stored internally in a zframe.  However, now
that the route IDs are stored in decoded data structures, we could
return the route IDs directly.  In addition, returning this field
as an in-and-out parameter is not consistent to most of the code
in flux-core.

Solution: Return internally stored route IDs via a const char *
return parameter instead of an in-and-out paramter.

Update callers accordingly.
Problem: The flux_msg_pop_route() function returns an id via an
in-and-out parameter that requires a memory allocation/free.  With
flux_msg_get_route_last() now returning a `const char *`, users can
avoid the memory allocation/free if there was a way to remove the
last route on the message when necessary.

Solution: Provide a new function flux_msg_delete_route_last() that will
behave similarly to flux_msg_pop_route() but not allocate memory
or require the user to pop a route off the message.

Convert any callers that call flux_msg_pop_route() with a NULL
'id' parameter to use flux_msg_delete_route_last() instead.

Add unit tests.
Problem: flux_msg_pop_route() allocates memory to return the
route id that is being "popped".  Under most circumstances
users can call flux_msg_get_route_last() and flux_msg_delete_route_last()
to get the same effect and not allocate memory.

Solution: Remove flux_msg_pop_route().

Update callers accordingly.
Problem: flux_msg_enable_route() and flux_msg_clear_route() return
an int on success/error.  For such simple functions, most of flux-core
returns void for these types of functions.

Solution: Have flux_msg_enable_route() and flux_msg_clear_route()
return void.  Update all callers appropriately.
Problem: libflux currently supports flux_msg_enable_route() and
flux_msg_clear_route(), which seem imbalanced.  Hypothethically, there
should be a flux_msg_disable_route().

Solution: Add a new function flux_msg_disable_route() that behaves
identically to flux_msg_clear_route() before this commit.  Change the
behavior of flux_msg_clear_route() to only clear internally stored route
ids, not to disable routing on a message.

Adjust all callers accordingly.  In locations that previously called
flux_msg_clear_route() and flux_msg_enable_route() in series, only
call flux_msg_clear_route().
Problem: The naming of the flux msg route functions was inconsistent,
with some prefixing the action with "get" while others not.  Some stating
the "action" before "route" and others not.

Solution: Make the function names consistent.  Specifically, the following
functions have been renamed.

flux_msg_enable_route -> flux_msg_route_enable
flux_msg_disable_route -> flux_msg_route_disable
flux_msg_clear_route -> flux_msg_route_clear
flux_msg_push_route -> flux_msg_route_push
flux_msg_delete_route_last -> flux_msg_route_delete_last
flux_msg_get_route_first -> flux_msg_route_first
flux_msg_get_route_last -> flux_msg_route_last
flux_msg_get_route_count -> flux_msg_route_count
flux_msg_get_route_string -> flux_msg_route_string
flux_msg_match_route_first -> flux_msg_route_match_first

Some internal functions were renamed as well for consistency.

Update all callers appropriately.
@chu11 chu11 force-pushed the issue3617_flux_msg_routes_const branch from f669ed6 to bb42869 Compare July 12, 2021 22:23
@chu11
Copy link
Member Author

chu11 commented Jul 12, 2021

rebased. None of the "snuck in" commits are here anymore, just the ones that @garlick reviewed.

@codecov
Copy link

codecov bot commented Jul 13, 2021

Codecov Report

Merging #3746 (bb42869) into master (0bd8341) will increase coverage by 0.02%.
The diff coverage is 88.59%.

@@            Coverage Diff             @@
##           master    #3746      +/-   ##
==========================================
+ Coverage   83.29%   83.31%   +0.02%     
==========================================
  Files         342      342              
  Lines       50982    50897      -85     
==========================================
- Hits        42464    42405      -59     
+ Misses       8518     8492      -26     
Impacted Files Coverage Δ
src/common/libflux/handle.c 84.98% <ø> (ø)
src/common/libterminus/terminus.c 85.47% <33.33%> (-0.13%) ⬇️
src/modules/barrier/barrier.c 77.34% <50.00%> (-1.89%) ⬇️
src/broker/broker.c 76.85% <63.63%> (-0.60%) ⬇️
src/common/libterminus/pty.c 87.83% <75.00%> (-0.23%) ⬇️
src/modules/job-manager/alloc.c 81.05% <75.00%> (-0.05%) ⬇️
src/common/librouter/usock_service.c 73.87% <85.71%> (-0.91%) ⬇️
src/broker/module.c 76.45% <87.50%> (-1.60%) ⬇️
src/broker/overlay.c 89.66% <92.30%> (+0.42%) ⬆️
src/broker/log.c 79.32% <100.00%> (-0.44%) ⬇️
... and 26 more

@grondo
Copy link
Contributor

grondo commented Jul 13, 2021

Ok, I'm going to remove branch protections, merge, and then reset branch protections.

@grondo grondo merged commit 298463a into flux-framework:master Jul 13, 2021
@chu11 chu11 deleted the issue3617_flux_msg_routes_const branch July 16, 2021 22:06
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants