Resume and reboot modes #74

snim2 · 2015-10-27T01:08:14Z

This PR adds two new command-line switches to krun: --resume and --reboot.
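As a rough illustration of how the two switches could be wired up, here is a minimal argparse sketch. Only the flag names `--resume` and `--reboot`, and the rule that `--reboot` is disallowed without `--resume` (commit db88dcf in this PR), come from the thread; the parser layout, help strings, and function names are assumptions, not krun's actual code.

```python
import argparse


def build_parser():
    # Hypothetical parser: only the --resume and --reboot switch names
    # come from this PR, everything else is illustrative.
    parser = argparse.ArgumentParser(prog="krun")
    parser.add_argument("--resume", action="store_true",
                        help="resume an interrupted benchmark run")
    parser.add_argument("--reboot", action="store_true",
                        help="reboot after each execution")
    parser.add_argument("config", help="path to a .krun config file")
    return parser


def parse_args(argv):
    parser = build_parser()
    args = parser.parse_args(argv)
    # The PR disallows --reboot without --resume. Assumed rationale:
    # after a reboot krun must pick up the existing results.
    if args.reboot and not args.resume:
        parser.error("--reboot requires --resume")
    return args
```

Calling `parse_args(["--reboot", "example.krun"])` exits with an argparse usage error, while adding `--resume` is accepted.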

In resume mode, krun looks for an existing set of results. If one is found, krun first checks that the current platform is (approximately) the same as the platform detailed in the results file. If this check passes, the schedule is built and executions which have already been run are removed from the job queue. Old results are added to the current job scheduler, which means the JSON results file can be dumped (rather than appended to), as before.
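The resume steps described above can be sketched roughly as follows. This is not krun's implementation: the function names, the job-key strings, and the shape of the results dictionary are all hypothetical. The `cpuinfo` key is likewise an assumed name, chosen because a commit in this PR ignores cpuinfo when comparing audits.

```python
def audits_match(old_audit, new_audit, ignore=("cpuinfo",)):
    """Approximate platform check: compare audits, skipping noisy keys.

    The audit layout (a flat dict) and the "cpuinfo" key are assumptions.
    """
    keys = (set(old_audit) | set(new_audit)) - set(ignore)
    return all(old_audit.get(k) == new_audit.get(k) for k in keys)


def resume_queue(schedule, old_results):
    """Drop executions that already have results.

    schedule: list of job keys, one entry per execution (hypothetical).
    old_results: dict mapping job key -> list of completed executions.
    """
    done = dict((key, len(execs)) for key, execs in old_results.items())
    remaining = []
    for key in schedule:
        if done.get(key, 0) > 0:
            done[key] -= 1          # this execution was already run
        else:
            remaining.append(key)   # still needs to be run
    return remaining
```

The filtering works per execution, so a benchmark with one of its two executions already recorded keeps exactly one entry in the queue.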

In reboot mode, every time an execution finishes, krun runs a reboot command which is defined with the platform definition. (Actually, krun currently just prints the command out to ease testing; this needs to be fixed before merging.)
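A minimal sketch of the per-boot cycle, under stated assumptions: every name here (`run_one_and_reboot`, `announce`, the callables passed in) is hypothetical. The default runner only prints the command, mirroring the testing behaviour described above, and the empty-queue guard reflects the later commit "Don't reboot if exec queue is empty."

```python
import sys


def announce(cmd):
    # Stand-in for actually rebooting: just print the command.
    sys.stdout.write("would run: %s\n" % cmd)


def run_one_and_reboot(queue, run_execution, dump_results, reboot_cmd,
                       runner=announce):
    """Run a single execution, save its result, then request a reboot.

    Returns True if a reboot was requested, False if the schedule is done.
    All arguments are illustrative stand-ins, not krun's real objects.
    """
    if not queue:
        return False            # nothing left: do not reboot again
    job = queue.pop(0)
    result = run_execution(job)
    dump_results(job, result)   # persist before the machine goes down
    if not queue:
        return False            # that was the last execution
    runner(reboot_cmd)
    return True
```

Saving results before invoking the runner matters: after the reboot, resume mode can only skip executions whose results reached disk.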

JSON-related code has been refactored into krun/util.py. A very basic test suite has been added to krun/tests/. The documentation in examples/README.md has been updated.

Different platforms have different conventions for starting a program on boot. An example rc.local file has been added to etc/.

Fixes #41
Fixes #54

snim2 · 2015-10-27T01:10:23Z

That Travis build failed because bz2 files cannot be used as context managers in Python 2.6. Is Python 2.6 needed?
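For reference, if Python 2.6 support were wanted, one possible workaround is to wrap the file object in `contextlib.closing`, which only needs a `close()` method. The helper names and the results layout below are hypothetical and are not taken from krun/util.py.

```python
import bz2
import json
from contextlib import closing


def dump_results(results, path):
    # closing() calls .close() on exit, so this works even where
    # BZ2File itself is not a context manager.
    with closing(bz2.BZ2File(path, "w")) as fh:
        fh.write(json.dumps(results).encode("utf-8"))


def load_results(path):
    with closing(bz2.BZ2File(path, "r")) as fh:
        return json.loads(fh.read().decode("utf-8"))
```

The explicit encode and decode steps keep the same code usable on Python 3, where `BZ2File` reads and writes bytes.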

ltratt · 2015-10-27T09:05:17Z

I think we can safely force the use of Python 2.7, on the assumption that it will be installed everywhere that we want to run.

vext01 · 2015-10-27T09:49:56Z

The design sounds sane. Now I will inspect the code.

vext01 · 2015-10-27T09:51:16Z

etc/rc.local

+# This script is executed at the end of each multiuser runlevel.
+#
+
+nohup sudo -H -u krun python krun.py --resume --reboot /home/krun/krun/examples/example.krun


Do we really want nohup? That means all krun output is going to disk twice.

I'm not sure, but the script needs to exit 0 to work with the init framework. (I need to test this later today.)

Usually nohup will write stdout to nohup.out, you see.

vext01 · 2015-10-27T09:52:26Z

examples/README.md

@@ -86,6 +86,30 @@ $ PYTHONPATH=../ ../krun.py example.krun
 You should see a log scroll past, and results will be stored in the file:
 `../krun/examples/example_results.json.bz2`.

+## Running in reboot and resume modes
+
+krun can resume an interrupted benchmark by passing in the `--resume` flag:


Say something about the granularity of this feature? I.e. executions.

snim2 · 2015-10-28T10:11:31Z

This now works on my machine, with some caveats:

- I haven't found a way to restart krun after boot whilst running as a non-root user. When I start krun on the command line I have to give a password to sudo. However, I think this is an issue with how the environment is set up rather than krun.
- When I tried running this on bencher5 I got an error about cpufrequtils not being installed, but apt-get believes the package is there.

ltratt · 2015-10-28T10:14:03Z

If you use sudo in rc.local you should be able to do sudo -u krun <path to krun> without requiring a password (most, though perhaps not all, sudo installs allow root to call sudo without a password).

snim2 · 2015-10-28T10:21:01Z

Yes, but if I run krun from the command line as a non-root user it asks me for a sudo password. I don't think it is doing a sudo -u krun when it asks, because krun works whether or not I have a user called krun.

ltratt · 2015-10-28T10:29:42Z

/etc/rc.local is run as root, so sudo -u krun in that file almost certainly won't require a password. [It is possible someone's set a really silly sudo config that requires a password from root, but I haven't seen such a setup yet.]

vext01 · 2015-10-30T11:28:28Z

krun.py

+        if len(self) == 0:
+            debug("krun started with an empty queue of jobs")
+
+        if not resume:


Logic here is wrong?

vext01 · 2015-11-02T11:44:33Z

Sarah, if you are happy with my last commit, I think we can merge this.

Note however that we should address #83 and #84.

…ally starts Krun on boot. Improvements to error messages: if the output file does not exist, don't tell the user it isn't a regular file. Only wait for network when --started-by-init. --dry-run now simulates time.sleep.

snim2 added the enhancement label Oct 27, 2015

snim2 assigned vext01 Oct 27, 2015

snim2 added this to the Ready to publish milestone Oct 27, 2015

vext01 reviewed Oct 27, 2015

snim2 mentioned this pull request Oct 27, 2015

DOCUMENT KRUN! #57

Closed

vext01 reviewed Oct 27, 2015

This was referenced Oct 27, 2015

Minor refactorings of commonly used dictionaries #75

Closed

Run benchmarks on a fresh install. #56

Closed

Sarah Mount added 12 commits October 30, 2015 10:37

Resume mode checks platform integrity.

c2b26e5

This is a basic check that benchmarking has been resumed on "the same" platform that the benchmark was started on.

Implement resume mode.

52f2483

Resume mode removes jobs from the schedule that have already been executed and adds old data to the set of results.

Dry runs and debug levels now CLI options. Fixed an existing bug in dry runs.

7be48fd

Log files appended to in resume mode.

ed47933

Log name either based on current time (ordinary run) or mtime of config file (resume mode).

Ignore cpuinfo when comparing audits.

bb0d4cf

Information provided in audits differs between platforms.

Disallow --reboot without --resume

db88dcf

Resume mode does not send emails after every execution.

18e5c4d

ETA emails are sent. Fixed existing error in documentation.

Reboot mode waits for network to come up after reboot.

4a73de6

Documented reboot and resume modes.

0a93d88

Document --dryrun and --debug

bcbbb28

Example /etc/rc.local for reboot mode.

c09e385

Appends logs to /var/log/rc.local.log. Linux only.

Add clean target to example benchmark Makefile.

476fa3b
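The log-naming rule from commit ed47933 (current time for an ordinary run, mtime of the config file in resume mode) could look roughly like the sketch below. The function name and the filename pattern are assumptions; only the time-versus-mtime choice comes from the commit message.

```python
import os
import time


def log_filename(config_path, resume=False):
    """Pick a log name that stays the same across resumed runs.

    Hypothetical helper: the "<config>_<timestamp>.log" pattern is
    illustrative, not krun's actual naming scheme.
    """
    base = os.path.splitext(os.path.basename(config_path))[0]
    if resume:
        stamp = os.path.getmtime(config_path)   # identical on every resume
    else:
        stamp = time.time()                     # fresh log for a new run
    tstr = time.strftime("%Y%m%d_%H%M%S", time.localtime(stamp))
    return "%s_%s.log" % (base, tstr)
```

Because the mtime does not change between boots (as long as the config file is left untouched), each resumed run resolves to the same file, which is what allows the log to be appended to.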

snim2 force-pushed the resume-mode branch from d8163f6 to 476fa3b Compare October 30, 2015 10:48

vext01 reviewed Oct 30, 2015

vext01 force-pushed the resume-mode branch from 56253f1 to aa46031 Compare October 30, 2015 15:57

vext01 and others added 10 commits November 2, 2015 13:18

Add --started-by-init to example rc.local.linux

d6593b3

Document --started-by-init. Use the Krun name correctly.

271e688

Don't reboot if exec queue is empty.

d24c3e6

Basic tests for krun.env.

38e68b7

Mocks for platform and mailer objects.

5d04e3d

Extend tests for krun.util.

7a2541c

Refactor krun. Add tests for the scheduler and time estimate formatter. These show that a --reboot makes progress through the schedule, and should help prevent an infinite reboot loop.

59a7a93

Add dependencies to travis install. Track test coverage in travis.

62a0715

Improvements to logging output.

903fe78

snim2 force-pushed the resume-mode branch from 9478a79 to 903fe78 Compare November 2, 2015 13:22

vext01 added a commit that referenced this pull request Nov 2, 2015

Merge pull request #74 from softdevteam/resume-mode

1f09531

Resume and reboot modes

vext01 merged commit 1f09531 into master Nov 2, 2015

snim2 deleted the resume-mode branch November 2, 2015 13:27
