Easier session resume #41

vext01 · 2015-09-28T09:23:34Z

When a benchmark session crashes, or when the system loses power, we often need to re-run only the executions that did not complete.

Currently this is quite a manual process. The experimenter has to work out which executions still need to be run, run them, then merge the new data into the existing data by hand.

It would be better if krun could help us do this.

I imagine something like the following:

user invokes krun
krun realises a results file already exists
krun says: "Partial results detected. Looks like we need to run 4 executions of benchmark3 and 4 executions of benchmark4. OK?"
If user says yes, krun appends results to existing results.

This need not be an interactive process: there could be a -check-results and a -resume option on the CLI.
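To make the check step concrete, here is a minimal sketch of how the outstanding work could be computed from a partial results file. It assumes a hypothetical layout in which the configuration maps each benchmark key to the number of executions wanted, and the results map each key to a list with one entry per completed execution. The function names and the layout are illustrative assumptions, not krun's actual API or file format.

```python
def outstanding_executions(wanted, results):
    """Return {key: executions still to run}, omitting completed keys.

    wanted:  hypothetical mapping of benchmark key -> executions required
    results: hypothetical mapping of benchmark key -> list of completed
             executions, as loaded from a partial results file
    """
    todo = {}
    for key, n_wanted in wanted.items():
        n_done = len(results.get(key, []))
        if n_done < n_wanted:
            todo[key] = n_wanted - n_done
    return todo


def describe(todo):
    """Build the kind of summary a check-only run might print."""
    if not todo:
        return "results complete, nothing to run"
    parts = ["%d executions of %s" % (n, key) for key, n in sorted(todo.items())]
    return "partial results detected, need to run: " + ", ".join(parts)


if __name__ == "__main__":
    wanted = {"benchmark1": 4, "benchmark2": 4, "benchmark3": 4, "benchmark4": 4}
    partial = {"benchmark1": [[0.1]] * 4, "benchmark2": [[0.2]] * 4}
    print(describe(outstanding_executions(wanted, partial)))
```

A check-only option would stop after printing the summary, while a resume option would go on to run whatever `outstanding_executions` returns.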

ltratt · 2015-10-20T18:44:08Z

Is this implicitly the same as reboot mode? If so, should we merge it with that issue?

vext01 · 2015-10-21T08:39:42Z

No, it's separate. The session may crash in non-reboot mode, and we need a way to continue without over-writing results.

ltratt · 2015-10-21T08:42:34Z

I should have said: is the code which picks up where we left off implicitly the same between the two modes?

vext01 · 2015-10-21T08:49:14Z

There will be sharing, yes.

snim2 · 2015-10-22T11:27:49Z

@vext01 this issue is still unassigned. Do you want me to take this one along with #54?

vext01 · 2015-10-22T11:59:53Z

Sure. Go ahead.

vext01 · 2015-10-22T12:00:39Z

I imagine we need to flesh out the working of this feature too. When I raised this, I did not have the reboot mechanism in mind.

snim2 · 2015-10-22T12:08:20Z

OK, well your initial issue here suggests that krun starts by looking for a results file and, if one exists, checks it against the current configuration to see whether the results are complete.

If the results are not complete, and -resume has been passed on the CLI, then the next execution can be started.

Each execution then needs to append to the JSON file, rather than writing the file out at the end of all the executions.

Also, there is no need for a file in /var/log or similar, to say which VM-benchmark pair was last executed, as the same information can be found in the results file.

In this scheme, #54 just needs a script in rc.local that starts krun with -resume switched on. So, to run in reboot mode, a -reboot switch would be passed in, the script in rc.local would need to exist, and krun would call /sbin/reboot after each execution.

Does that make sense?

vext01 · 2015-10-22T12:09:41Z

Yes, and I think this all sounds sensible. Go for it.

snim2 · 2015-10-22T12:12:47Z

Edited that slightly.

vext01 · 2015-10-22T12:14:40Z

This is separate, but I would also like a way to re-run a subset of experiments.

Using the scheme you outline above, you can re-run stuff by deleting entries from the json bz2 file. This is quite fiddly as you have to uncompress and then locate the right lines to remove in a results file potentially with tens of thousands of lines, so I wonder if we could write a tiny tool to remove results by key. Thoughts?
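A minimal sketch of what such a tool could look like, assuming (hypothetically) that the results file is a single bz2-compressed JSON object whose top-level keys identify benchmarks. The function name `remove_results` and the file layout are assumptions for illustration, not part of krun.

```python
import bz2
import json
import os


def remove_results(path, keys):
    """Delete the given keys from a bz2-compressed JSON results file.

    Assumes the file holds one JSON object keyed by benchmark.
    Returns the list of keys that were actually present and removed.
    """
    with bz2.open(path, "rt", encoding="utf-8") as fh:
        results = json.load(fh)

    removed = []
    for key in keys:
        if key in results:
            del results[key]
            removed.append(key)

    # Write to a temporary file, then rename over the original, so a
    # crash part-way through cannot leave a half-written results file.
    tmp = path + ".tmp"
    with bz2.open(tmp, "wt", encoding="utf-8") as fh:
        json.dump(results, fh)
    os.replace(tmp, path)
    return removed
```

With resume in place, re-running a subset would then be a matter of removing the relevant keys and invoking krun again, since the removed entries would show up as missing work.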

vext01 · 2015-10-22T12:15:40Z

Note also that krun currently dumps a bz2-compressed JSON file after each execution, not just once at the end. This is to protect against crashing.
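For illustration only, the dump-after-each-execution pattern combined with resuming might look like the sketch below. This is not krun's code: `run_one` is a hypothetical callable that performs one execution and returns JSON-serialisable measurements, and the results layout (benchmark key mapped to a list of executions) is assumed.

```python
import bz2
import json
import os


def dump_results(path, results):
    """Write results as bz2-compressed JSON, replacing any previous file."""
    tmp = path + ".tmp"
    with bz2.open(tmp, "wt", encoding="utf-8") as fh:
        json.dump(results, fh)
    os.replace(tmp, path)  # rename over the old file only once fully written


def load_results(path):
    """Load earlier results, or start empty if no file exists yet."""
    if not os.path.exists(path):
        return {}
    with bz2.open(path, "rt", encoding="utf-8") as fh:
        return json.load(fh)


def run_session(path, jobs, run_one):
    """Run each (key, n_executions) job, checkpointing after every execution.

    Executions already present in the results file are skipped, so calling
    this again after a crash continues where the last session stopped.
    """
    results = load_results(path)
    for key, n_wanted in jobs:
        done = results.setdefault(key, [])
        while len(done) < n_wanted:
            done.append(run_one(key))
            dump_results(path, results)
    return results
```

Because the file is rewritten after every execution, the most that a crash or power loss can cost is the execution that was in flight.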

vext01 · 2015-10-22T12:17:18Z

WRT: Does krun need to know if it's in reboot mode? I suggest adding a boolean config option REBOOT_MODE.

snim2 · 2015-10-22T15:04:38Z

OK, I think it would be helpful for testing this to add the examples/ directory back in, with a config that conforms to the current structure. Does that sound sensible?

vext01 · 2015-10-22T15:14:21Z

What would you test?

snim2 · 2015-10-22T15:18:36Z

For now, I just mean that I need something to try out by hand.

There are probably other things that Travis could test, but I think that would be a separate PR / issue.

snim2 added low priority (nice to have) enhancement labels Oct 16, 2015

vext01 mentioned this issue Oct 22, 2015

Reboot after each execution. #54

Closed

snim2 self-assigned this Oct 22, 2015

snim2 mentioned this issue Oct 22, 2015

Add basic travis config #69

Merged

This was referenced Oct 26, 2015

Updated examples directory #71

Merged

Provide a way to re-run a subset of experiments #73

Closed

snim2 added high priority and removed low priority (nice to have) labels Oct 26, 2015

snim2 mentioned this issue Oct 27, 2015

Resume and reboot modes #74

Merged

vext01 closed this as completed in #74 Nov 2, 2015
