Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

clean up LOG_INFO output, log wireup, rc1, rc3 times #769

Merged
merged 9 commits into from Aug 15, 2016

Conversation

Projects
None yet
4 participants
@garlick
Copy link
Member

garlick commented Aug 15, 2016

The goal of this PR is to get useful timings and other log information out of a large session launched with

srun flux start -o,-Slog-stderr-level=6 [COMMAND]

First, some messages, especially those emitted from every rank, were changed from LOG_INFO to LOG_DEBUG. Next, elapsed time was added to the log messages for wireup, rc1, and rc3 completion.
Finally, the "last heard from" messages from idle peers are suppressed during wireup and the wireup progress reporting interval was increased from 1s to 10s. Net result is something like this:

$ ./flux start -s512 -o,-Slog-stderlevel=6 flux getattr size
2016-08-15T19:28:43.195512Z broker.info[0]: nodeset: [0-511] (complete) 0.6s
2016-08-15T19:28:43.195571Z broker.info[0]: Run level 1 starting
2016-08-15T19:28:46.642082Z cron.info[0]: synchronizing cron tasks to event hb
2016-08-15T19:28:50.545477Z broker.info[0]: Run level 1 Exited (rc=0) 7.3s
2016-08-15T19:28:50.545502Z broker.info[0]: Run level 2 starting
512
2016-08-15T19:28:50.561784Z broker.info[0]: Run level 2 Exited (rc=0) 0.0s
2016-08-15T19:28:50.562846Z broker.info[0]: Run level 3 starting
2016-08-15T19:28:50.568524Z broker.info[0]: shutdown in 2.000s: run level 2 Exited
2016-08-15T19:28:51.178605Z broker.info[0]: Run level 3 Exited (rc=0) 0.6s

garlick added some commits Aug 15, 2016

broker: log insmod/rmmod at LOG_DEBUG
Since the number of insmod/rmmod log messages is proportional to
the size of the session, reduce the logging level to LOG_DEBUG
so that LOG_INFO can be reserved for less voluminous log messages.
broker: add elapsed time to hello completion
Log the elapsed time with the LOG_INFO "hello" completion message.
This is the best measure we have of wireup time.
modules/wreck/job: LOG_INFO -> LOG_DEBUG
Emit debug messages at LOG_DEBUG level not LOG_INFO.
modules/content-sqlite: log shutdown at LOG_DEBUG
Change the content-sqlite shutdown log message when instance is
terminating to be more informative and at LOG_DEBUG level.
broker/runlevel: add struct for each runlevel
Prepare for additional state to be stored with each runlevel.
broker/runlevel: log elapsed time for runlevels
Add elapsed time for LOG_INFO messages at runlevel termination.
We have the log timestamps, but this makes it easier to spot.
broker/overlay: suppress wireup idle warnings
The LOG_CRIT messages warning about peers "last heard from"
can be voluminous during wireup on a large session.
Suppress this warning until wireup comletes.  We already
have an overall wireup timeout, so the case where wireup
fails is handled, and these warnings probably aren't that useful.

@garlick garlick added the review label Aug 15, 2016

@coveralls

This comment has been minimized.

Copy link

coveralls commented Aug 15, 2016

Coverage Status

Coverage increased (+0.01%) to 75.058% when pulling 9a6ccdf on garlick:bootstrap_times into 8c793e1 on flux-framework:master.

@codecov-io

This comment has been minimized.

Copy link

codecov-io commented Aug 15, 2016

Current coverage is 74.78% (diff: 86.48%)

Merging #769 into master will increase coverage by 0.06%

@@             master       #769   diff @@
==========================================
  Files           145        145          
  Lines         25014      25022     +8   
  Methods           0          0          
  Messages          0          0          
  Branches          0          0          
==========================================
+ Hits          18692      18713    +21   
+ Misses         6322       6309    -13   
  Partials          0          0          
Diff Coverage File Path
••••••• 78% src/broker/runlevel.c
•••••••• 85% src/broker/overlay.c
••••••••• 92% src/broker/broker.c
•••••••••• 100% src/modules/wreck/job.c
•••••••••• 100% src/modules/content-sqlite/content-sqlite.c

Powered by Codecov. Last update 8c793e1...a4484a1

@grondo

This comment has been minimized.

Copy link
Contributor

grondo commented Aug 15, 2016

Oh, sorry about all the wreck noise.

This looks great is it ready?

On another note, I wonder if by writing some info to kvs we could get support for something like systemd-analyze. (Maybe that would be a cool "stunt" or hackathon type project)

@garlick

This comment has been minimized.

Copy link
Member Author

garlick commented Aug 15, 2016

No worries - I forgot I wanted to add pmi timing broken out per our CTS-1 milestone goals so let me add that, then this will be ready. Good ideal on systemd-analyze idea - if you have more thoughts maybe they could go in a "help wanted" issue?

@coveralls

This comment has been minimized.

Copy link

coveralls commented Aug 15, 2016

Coverage Status

Coverage increased (+0.06%) to 75.104% when pulling a4484a1 on garlick:bootstrap_times into 8c793e1 on flux-framework:master.

@grondo

This comment has been minimized.

Copy link
Contributor

grondo commented Aug 15, 2016

Ok, ready?

@garlick

This comment has been minimized.

Copy link
Member Author

garlick commented Aug 15, 2016

Sure!

@grondo

This comment has been minimized.

Copy link
Contributor

grondo commented Aug 15, 2016

Oh, one typo in that last commit message (transposed characters) that you might as well fix.

broker: add pmi timing
Log seconds spent bootstrapping with PMI on rank 0.
Since there is a barrier in there before the "read phase",
this is a close approximation of the time spent globally,
although rank 0 may exit before others have completed the
"read phase".

@garlick garlick force-pushed the garlick:bootstrap_times branch from a4484a1 to c0c7d09 Aug 15, 2016

@grondo grondo merged commit b8249b4 into flux-framework:master Aug 15, 2016

1 check was pending

continuous-integration/travis-ci/pr The Travis CI build is in progress
Details

@grondo grondo removed the review label Aug 15, 2016

@coveralls

This comment has been minimized.

Copy link

coveralls commented Aug 15, 2016

Coverage Status

Coverage increased (+0.03%) to 75.073% when pulling c0c7d09 on garlick:bootstrap_times into 8c793e1 on flux-framework:master.

@garlick garlick deleted the garlick:bootstrap_times branch Aug 15, 2016

@garlick garlick referenced this pull request Oct 26, 2016

Closed

0.5.0 release notes #879

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.