Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Logging cleanup #691

Merged
merged 14 commits into from Jun 14, 2016

Conversation

Projects
None yet
3 participants
@garlick
Copy link
Member

garlick commented Jun 10, 2016

This is some minor clean-up to logging functions.

The prototypish libutil/log.c is stripped down to the bare minimum and functions were renamed to have a log_ prefix. I stopped short of converting all the err() and err_exit() users, so left the old names still working for those.

Adopt some RFC 5424 logging conventions in flux_log(), including ISO 8601 date strings, procid, and appname.

@garlick garlick added the review label Jun 10, 2016

@coveralls

This comment has been minimized.

Copy link

coveralls commented Jun 10, 2016

Coverage Status

Coverage increased (+0.07%) to 75.119% when pulling 12c4b58 on garlick:logging_cleanup into 2bde1f8 on flux-framework:master.

@coveralls

This comment has been minimized.

Copy link

coveralls commented Jun 10, 2016

Coverage Status

Coverage increased (+0.08%) to 75.123% when pulling 12c4b58 on garlick:logging_cleanup into 2bde1f8 on flux-framework:master.

@garlick garlick force-pushed the garlick:logging_cleanup branch from 12c4b58 to 769eff3 Jun 12, 2016

@garlick

This comment has been minimized.

Copy link
Member Author

garlick commented Jun 12, 2016

Just forced a push of this PR with some additional work converting the logging internals to use RFC 5242 log messages. I think this will be a bit lighter on the heap than the JSON based stuff it replaces.

@coveralls

This comment has been minimized.

Copy link

coveralls commented Jun 12, 2016

Coverage Status

Coverage increased (+0.1%) to 75.113% when pulling 769eff3 on garlick:logging_cleanup into ed680c8 on flux-framework:master.

@coveralls

This comment has been minimized.

Copy link

coveralls commented Jun 13, 2016

Coverage Status

Coverage increased (+0.1%) to 75.126% when pulling e15452f on garlick:logging_cleanup into ed680c8 on flux-framework:master.

@coveralls

This comment has been minimized.

Copy link

coveralls commented Jun 13, 2016

Coverage Status

Coverage increased (+0.1%) to 75.114% when pulling 4170d08 on garlick:logging_cleanup into ed680c8 on flux-framework:master.

@grondo

This comment has been minimized.

Copy link
Contributor

grondo commented Jun 13, 2016

This looks great!

One comment, should we transition to clock_gettime instead of gettimeofday since POSIX documentation of gettimeofday() says "Applications should use the clock_gettime() function instead of the obsolescent gettimeofday() function."

Also, should wallclock_get_zulu be included in stdlog.h since it is for getting the "standard log" timestamp? Or were you thinking we could expand wallclock.h with other wall-clock time related convenience functions?

@garlick

This comment has been minimized.

Copy link
Member Author

garlick commented Jun 13, 2016

Ah I didn't know that about gettimeofday! I will fix that.

Or were you thinking we could expand wallclock.h with other wall-clock time related convenience functions?

I just vaguely thought wallclock_get_zulu() might be useful somewhere besides the logging subsystem. My understanding is that the linker pulls in entire .o's when it finds an unresolved symbol in a static library. If that's still the case, and stdlog.o and wallclock.o might have different users, then maybe there is an advantage to keeping them as separate libutil.a objects?

@coveralls

This comment has been minimized.

Copy link

coveralls commented Jun 13, 2016

Coverage Status

Coverage increased (+0.1%) to 75.094% when pulling 1b0985f on garlick:logging_cleanup into ed680c8 on flux-framework:master.

@garlick

This comment has been minimized.

Copy link
Member Author

garlick commented Jun 13, 2016

One other long standing bit of (mindless) cleanup I should probably add to this PR is to add a flux_strerror() function that maps errno to a string, both POSIX values and zeromq values.

There are many places in our code (a quick grep says around 300) where strerror() is called to decode errors that might include errnos in the zeromq space. I'll plan to replace those with flux_strerror() or if appropriate, flux_log_error() or FLUX_LOG_ERROR().

@grondo

This comment has been minimized.

Copy link
Contributor

grondo commented Jun 13, 2016

@SteVwonder's question in this morning's meeting reminded me that it would be nice if there was a log-level tunable at which log messages are discarded instead of being added to the dmesg buffer. The specific issue I'd like to solve is error messages being quickly dropped from the buffer due to lots of debug messages (e.g. when running a lot of jobs).

@garlick

This comment has been minimized.

Copy link
Member Author

garlick commented Jun 13, 2016

Reasonable - I'm on it.

@garlick garlick force-pushed the garlick:logging_cleanup branch from 1b0985f to 7d9e3a6 Jun 13, 2016

@coveralls

This comment has been minimized.

Copy link

coveralls commented Jun 13, 2016

Coverage Status

Coverage increased (+0.06%) to 75.162% when pulling 7d9e3a6 on garlick:logging_cleanup into d290da3 on flux-framework:master.

@garlick

This comment has been minimized.

Copy link
Member Author

garlick commented Jun 13, 2016

OK, I rebased on current master, added the requested log-level attribute, and finished converting all the err() and err_exit() to log_err() and log_err_exit().

Then I changed internal logging in wreck from log_ to wlog_ and broker log buffer class from log_ to logbuf_.

@coveralls

This comment has been minimized.

Copy link

coveralls commented Jun 13, 2016

Coverage Status

Coverage increased (+0.1%) to 75.206% when pulling ef2f4a4 on garlick:logging_cleanup into d290da3 on flux-framework:master.

@garlick

This comment has been minimized.

Copy link
Member Author

garlick commented Jun 13, 2016

OK, one more change here: move the oom() macro out of log.h into its own header. There is no reason to have it be dependent on log_errn_exit() - it can call fprintf(stderr) and exit(1).

@coveralls

This comment has been minimized.

Copy link

coveralls commented Jun 13, 2016

Coverage Status

Coverage increased (+0.06%) to 75.161% when pulling 95853c1 on garlick:logging_cleanup into d290da3 on flux-framework:master.

@garlick garlick force-pushed the garlick:logging_cleanup branch from 95853c1 to 866e387 Jun 13, 2016

@coveralls

This comment has been minimized.

Copy link

coveralls commented Jun 13, 2016

Coverage Status

Coverage increased (+0.1%) to 75.2% when pulling 866e387 on garlick:logging_cleanup into d290da3 on flux-framework:master.

@garlick

This comment has been minimized.

Copy link
Member Author

garlick commented Jun 13, 2016

I'm all done poking at this now, I think.

shortjson.h: add Jadd_str_len()
Add function to allow an unterminated string to be added
to a JSON object.

@garlick garlick force-pushed the garlick:logging_cleanup branch from 866e387 to 04218b0 Jun 13, 2016

@garlick

This comment has been minimized.

Copy link
Member Author

garlick commented Jun 13, 2016

Rebased again after faketime fix landed on master.

@coveralls

This comment has been minimized.

Copy link

coveralls commented Jun 13, 2016

Coverage Status

Coverage increased (+0.2%) to 75.192% when pulling 04218b0 on garlick:logging_cleanup into d2c9b48 on flux-framework:master.

@grondo

This comment has been minimized.

Copy link
Contributor

grondo commented Jun 14, 2016

This looks good to me. Hate to ask this, but want to squash some of the intermediate commits, like whitespace and clang error fixups?

garlick added some commits Jun 10, 2016

libutil/wallclock: add wallclock_get_zulu()
Add function to get an ISO 8601 representation of wall clock,
with subsecond precision (whatever gettimeofday() gives us),
conformant with RFC 5424 syslog time stamps.

Include a test for RFC 5424 conformance.
libflux/flog: use RFC 5424 log format
Modify log protocol and interfaces to use RFC 5424 (new syslog)
encoding instead of custom JSON.

flux-dmesg(1) now shows appname (formerly "facility") and
severity separated by a space instead of a period.
Dates displayed by flux-dmesg are now human-readable
ISO 8601 GMT, rather than epoch-based seconds.

flux-logger(1) now accepts [--severity SEVERITY] and
[--appname APPNAME].  --priority FAC.SEV is no longer accepted.

Update the API to (for the most part) carry encoded
log messages, not broken out fields.  This should be more
efficient and less heap-instensive without the extra encode/decodes.

Truncate log messages that exceed 2K bytes (including header).

Update tests to reflect API and log format changes.

garlick added some commits Jun 12, 2016

libutil/stdlog: add RFC 5424 message parser
Add utils for generating and parsing RFC 5424 syslog
messages.

Add unit test for above.
libutil/log: rename msg() to log_msg()
The short names of these functions tend to be a problem,
particularly as "msg" is a tempting variable name for a flux_msg_t.

Rename msg() to log_msg().
Rename msg_exit() to log_msg_exit().
libutil/log: rename errn() to log_errn()
Rename errn() to log_errn().
Rename errn_exit() to log_errn_exit().
libutil/log: add log_err()
Add log_err(), an alias for err().
Add log_err_exit(), an alias for err_exit().
libflux/flog: add flux_strerror()
Error numbers might be POSIX or zeromq space so using
strerror() or %m directly might not always work.

Convert many direct calls to strerror (errno) to
flux_strerror (errno), or use flux_log_error() or
FLUX_LOG_ERROR() where appropriate.
broker/log: add log-level attribute
Add the 'log-level' broker attribute (default 7==debug) which
controls which messages are appended to the local ring buffer.

Update flux-broker-attributes(7).
libutil/log: drop err() and err_exit()
Convert from err() to log_err(), and err_exit() to log_err_exit().
modules/wrexecd: use wlog_ for internal log functions
Avoid clash with libutil/log which uses log_ prefix.
broker/log: use logbuf_ for internal funcs
Avoid clash with libutil/log which uses log_ prefix.
libutil/oom.h: split out of log.h
Split the oom() macro out of log.h and make it independent
of the log_ functions.

Update users.

@garlick garlick force-pushed the garlick:logging_cleanup branch from 04218b0 to f328268 Jun 14, 2016

@garlick

This comment has been minimized.

Copy link
Member Author

garlick commented Jun 14, 2016

There, how's that?

@coveralls

This comment has been minimized.

Copy link

coveralls commented Jun 14, 2016

Coverage Status

Coverage increased (+0.1%) to 75.169% when pulling f328268 on garlick:logging_cleanup into d2c9b48 on flux-framework:master.

@garlick

This comment has been minimized.

Copy link
Member Author

garlick commented Jun 14, 2016

Hmm, travis (coverage) got stuck here - I'll restart

PASS: t0015-cron.t
ok 1 - libfaketime works
ok 2 - load cron module
ok 3 - flux-cron tab works for set minute
not ok 4 - flux-cron tab works for any day midnight
#   
#       id=$(echo "0 0 * * * flux event pub t.cron.complete" | flux_cron tab) &&
#       next=$(date +%s --date="tomorrow 00:00") &&
#       cron_entry_check ${id} type datetime &&
#       cron_entry_check ${id} stopped false &&
#       cron_entry_check ${id} typedata.next_wakeup ${next} &&
#       ${event_trace} t.cron t.cron.complete \
#           $set_faketime tomorrow 00:00 &&
#       cron_entry_check ${id} stats.count 1 &&
#       flux cron delete ${id}
#   
No output has been received in the last 10 minutes, this potentially indicates a stalled build or something wrong with the build itself.
The build has been terminated
@grondo

This comment has been minimized.

Copy link
Contributor

grondo commented Jun 14, 2016

Ah, bummer. I hope those new cron tests aren't unreliable. :-(

This looks great now, ok to merge?

@garlick

This comment has been minimized.

Copy link
Member Author

garlick commented Jun 14, 2016

Yep!

@grondo grondo merged commit 87495ac into flux-framework:master Jun 14, 2016

2 checks passed

continuous-integration/travis-ci/pr The Travis CI build passed
Details
coverage/coveralls Coverage increased (+0.1%) to 75.169%
Details

@grondo grondo removed the review label Jun 14, 2016

@garlick garlick deleted the garlick:logging_cleanup branch Jun 14, 2016

@garlick garlick referenced this pull request Aug 2, 2016

Closed

0.4.0 release notes #739

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.