logging needs filtering/reduction #58

Closed
garlick opened this issue Oct 15, 2014 · 4 comments

@garlick
Member

garlick commented Oct 15, 2014

We used to have a log module that flux_log () routed messages through. When we reworked the project repository for public release, this module was dropped and logging was supported natively in the cmbd broker. The new logging implementation unconditionally forwards all log entries to rank 0, where they are disposed of according to this option:

-L,--logdest DEST            Log to DEST, can  be syslog, stderr, or file

We lost three useful features when this happened:

  • logging could be filtered at the source
  • duplicate entries could be squashed on their way towards rank 0
  • each rank had a circular buffer of (filtered) log entries that it could spew forth on receipt of a fault event. The idea was that log data could be captured but only be injected into the network when some problem that might require additional context for debugging was detected

We should consider how to extend the very simple logging in the broker with a module that can do reductions, debug logging, and filtering.
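
To make the first and third lost features concrete, here is a minimal sketch of a per-rank circular buffer that filters at the source and only emits its contents when asked (for example, on receipt of a fault event). It is not the old log module's code. The names `log_ring`, `ring_append`, `ring_dump`, and the `RING_SIZE` capacity are all hypothetical, and the only thing borrowed from a real API is the syslog severity numbering from `<syslog.h>`.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <syslog.h>   /* LOG_ERR, LOG_INFO, LOG_DEBUG severity values */

#define RING_SIZE 4   /* hypothetical capacity, kept tiny for illustration */
#define MSG_MAX   128

/* Hypothetical per-rank buffer of filtered log entries. */
struct log_ring {
    int  max_level;                  /* entries less severe than this are dropped */
    int  levels[RING_SIZE];
    char msgs[RING_SIZE][MSG_MAX];
    unsigned long total;             /* number of entries ever stored */
};

static void ring_init (struct log_ring *r, int max_level)
{
    memset (r, 0, sizeof (*r));
    r->max_level = max_level;
}

/* Store an entry locally, overwriting the oldest one when full.
 * Returns 1 if stored, 0 if filtered out at the source. */
static int ring_append (struct log_ring *r, int level, const char *msg)
{
    assert (r != NULL && msg != NULL);
    if (level > r->max_level)        /* syslog: larger number = less severe */
        return 0;
    unsigned long slot = r->total % RING_SIZE;
    r->levels[slot] = level;
    snprintf (r->msgs[slot], MSG_MAX, "%s", msg);
    r->total++;
    return 1;
}

/* Emit the buffered entries, oldest first. Returns the number emitted. */
static int ring_dump (const struct log_ring *r, FILE *out)
{
    unsigned long count = r->total < RING_SIZE ? r->total : RING_SIZE;
    for (unsigned long i = r->total - count; i < r->total; i++) {
        unsigned long slot = i % RING_SIZE;
        fprintf (out, "<%d> %s\n", r->levels[slot], r->msgs[slot]);
    }
    return (int)count;
}
```

In this sketch nothing leaves the rank until `ring_dump ()` is called, which is the point of the design: the data is captured cheaply and only injected into the network when extra context is wanted.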

@grondo
Contributor

grondo commented Oct 15, 2014

Here's an idea that might not make sense:

We might need something similar for a "simple rsh" implementation to handle stdout/err.
Imagine if the flux rsh RANKS COMMAND... frontend worked something like:

  • generate a unique ID for the current run
  • subscribe to "log" stream for ID
  • send rsh.execute or similar command with JSON description of command+environment
  • stderr/out "log" messages would be copied back to stderr/out of flux rsh command -- other log messages could be optionally displayed based on --verbose. Collapsed lines could optionally be expanded by flux rsh
  • exit code(s) could come back as CMB replies, or perhaps specially formatted log messages

Does this make any sense? Maybe it doesn't make sense to derive the flux rsh protocol from the log implementation, and we should instead think of a lower-level abstraction from which both the rsh and log services are derived?

@garlick
Member Author

garlick commented Oct 15, 2014

Could we just use the existing logging interface on the rshd end, e.g.

flux_log_set_facility (h, "rsh-%d", rsh_jobid);
flux_log (h, LOG_INFO, "%s", stdout_line);
flux_log (h, LOG_ERR, "%s", stderr_line);

Then we would just need a way for the rsh end to subscribe to messages sent to that facility. Are we OK with presuming that stdio will be consumed on rank 0? If so maybe part of the log design could be an ipc:// socket that all logs are published to, with PUB-SUB topic string derived from the facility. Then rsh could connect to the socket and subscribe to its particular rsh_jobid.

The flux-snoop utility works with a "snoop socket" in pretty much this way now.
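
A rough sketch of how a topic string could be derived from the facility and matched on the subscriber side is shown below. It assumes ZeroMQ-style PUB-SUB semantics, where a subscription matches every topic that begins with it. The `log.` prefix and the helpers `log_topic` and `topic_matches` are hypothetical and do not come from flux-core.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Build a topic such as "log.rsh-42" from a facility name.
 * The "log." prefix is a hypothetical convention for this sketch.
 * Returns 0 on success, -1 if the buffer is too small. */
static int log_topic (char *buf, size_t len, const char *facility)
{
    assert (buf != NULL && facility != NULL);
    int n = snprintf (buf, len, "log.%s", facility);
    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/* PUB-SUB style match: a subscription matches any topic it is a prefix of. */
static int topic_matches (const char *subscription, const char *topic)
{
    return strncmp (subscription, topic, strlen (subscription)) == 0;
}
```

One consequence of prefix matching is worth noting: a subscription to `log.rsh-4` would also receive `log.rsh-42`. If that matters, the topic could end with a delimiter after the job id so that each subscription is unambiguous.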

@garlick garlick added this to the Simple remote shell sprint milestone Oct 16, 2014
@garlick garlick removed this from the Simple remote shell sprint milestone Apr 20, 2015
@garlick
Member Author

garlick commented Aug 10, 2015

With the "reduction handle" improvements in PR #298, I was thinking perhaps this issue should be revisited. Since TIMEDWAIT is the obvious "flush policy" for compressing identical log messages, and flux_reduce_t requires the flux reactor for installing internal timer watchers, the fact that the broker still uses zloop is an impediment.

I've opened #320 to remind us to get off zloop in the broker.
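
For illustration only, the idea of compressing identical log messages with a timed flush can be sketched without any flux API. The code below does not use flux_reduce_t or the reactor. The `squash` struct and its functions are hypothetical, and the caller passes the current time explicitly, standing in for what a timer watcher would provide.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define MSG_MAX 128

/* Hypothetical batch of identical messages awaiting a timed flush. */
struct squash {
    char   msg[MSG_MAX];   /* message currently being batched */
    int    count;          /* identical copies seen so far, 0 if empty */
    double deadline;       /* time at which the batch must be flushed */
    double timeout;        /* flush window in seconds (assumed value) */
    int    emitted;        /* number of lines forwarded upstream */
};

/* Forward the current batch as a single line and reset it. */
static void squash_flush (struct squash *s, FILE *out)
{
    if (s->count == 0)
        return;
    if (s->count > 1)
        fprintf (out, "%s (repeated %d times)\n", s->msg, s->count);
    else
        fprintf (out, "%s\n", s->msg);
    s->emitted++;
    s->count = 0;
}

/* What a timer callback would do: flush once the deadline has passed. */
static void squash_tick (struct squash *s, double now, FILE *out)
{
    if (s->count > 0 && now >= s->deadline)
        squash_flush (s, out);
}

/* Add a message: identical ones are counted, a different one flushes first. */
static void squash_append (struct squash *s, double now,
                           const char *msg, FILE *out)
{
    assert (s != NULL && msg != NULL);
    squash_tick (s, now, out);
    if (s->count > 0 && strcmp (s->msg, msg) != 0)
        squash_flush (s, out);
    if (s->count == 0) {             /* start a new batch and arm the deadline */
        snprintf (s->msg, MSG_MAX, "%s", msg);
        s->deadline = now + s->timeout;
    }
    s->count++;
}
```

The deadline is set when a batch starts rather than being extended on every duplicate, so a steady stream of identical messages still produces one line per window instead of being held back indefinitely.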

@garlick
Member Author

garlick commented Dec 28, 2016

#320 is no longer a blocker, but this feels to me a bit like premature optimization. Furthermore, it is a fairly obvious possibility, so I don't think we need an issue to remind us. Closing.

@garlick garlick closed this as completed Dec 28, 2016
grondo added a commit to grondo/flux-core that referenced this issue Dec 12, 2019