Easier session resume #41

Closed
vext01 opened this issue Sep 28, 2015 · 16 comments · Fixed by #74

Comments

@vext01
Member

vext01 commented Sep 28, 2015

When a benchmark session crashes, or when the system loses power, we often need to re-run only the executions that did not complete.

Currently this is quite a manual process. The experimenter has to manually determine which executions still need to be run, run them, then manually merge the new data into the existing data.

It would be better if krun could help us do this.

I imagine something like the following:

  • user invokes krun
  • krun realises a results file already exists
  • krun says: partial results detected. Looks like we need to run 4 executions of benchmark3 and 4 executions of benchmark4, OK?
  • If user says yes, krun appends results to existing results.

This need not be an interactive process; there could be a -check-results and a -resume option on the CLI.
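A minimal sketch of the "partial results detected" check described above, in Python. The data layouts are assumptions made for illustration, not krun's actual formats: `config` is taken to map a benchmark key to the number of executions wanted, and `results` to map a key to a list with one entry per completed execution. The function name `outstanding_executions` is hypothetical.

```python
def outstanding_executions(config, results):
    """Return {key: executions still to run} for incomplete benchmarks.

    Hypothetical layouts (assumptions for this sketch):
      config:  {benchmark key: number of executions wanted}
      results: {benchmark key: [one entry per completed execution]}
    """
    todo = {}
    for key, wanted in config.items():
        done = len(results.get(key, []))
        if done < wanted:
            todo[key] = wanted - done
    return todo


if __name__ == "__main__":
    config = {"benchmark1": 4, "benchmark2": 4, "benchmark3": 4, "benchmark4": 4}
    results = {"benchmark1": [1.0] * 4, "benchmark2": [1.1] * 4}
    # prints {'benchmark3': 4, 'benchmark4': 4}
    print(outstanding_executions(config, results))
```

A check-only option could print this mapping and exit, while a resume option could feed it to the scheduler; both are left out of the sketch.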

@ltratt
Member

ltratt commented Oct 20, 2015

Is this implicitly the same as reboot mode? If so, should we merge it with that issue?

@vext01
Member Author

vext01 commented Oct 21, 2015

No, it's separate. The session may crash in non-reboot mode, and we need a way to continue without over-writing results.

@ltratt
Member

ltratt commented Oct 21, 2015

I should have said: is the code which picks up where we left off implicitly the same between the two modes?

@vext01
Member Author

vext01 commented Oct 21, 2015

There will be sharing, yes.

@snim2
Collaborator

snim2 commented Oct 22, 2015

@vext01 this issue is still unassigned. Do you want me to take this one along with #54?

@vext01
Member Author

vext01 commented Oct 22, 2015

Sure. Go ahead.

@snim2 snim2 self-assigned this Oct 22, 2015
@vext01
Member Author

vext01 commented Oct 22, 2015

I imagine we need to flesh out how this feature works too. When I raised this, I did not have the reboot mechanism in mind.

@snim2
Collaborator

snim2 commented Oct 22, 2015

OK, well your initial issue here suggests that krun starts by looking for a results file and, if one exists, checks it against the current configuration to see if the results are complete.

If the results are not complete, and -resume has been passed as a CLI option, then the next execution can be started.

Each execution then needs to append to the JSON file, rather than writing the file out at the end of all the executions.

Also, there is no need for a file in /var/log or similar, to say which VM-benchmark pair was last executed, as the same information can be found in the results file.

In this scheme, #54 just needs a script in rc.local that starts krun with -resume switched on. So, to run in reboot mode, a -reboot switch would be passed in, the script in rc.local would need to exist, and krun would call /sbin/reboot after each execution.

Does that make sense?
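A rough sketch of the control flow proposed above, in Python. All names here (`run_session`, `run_one`, `save`, `reboot`) are hypothetical and injected as callables so the loop can be tried by hand without touching a real machine; in reboot mode the `reboot` callable would be whatever ends up invoking /sbin/reboot, and the script in rc.local would restart the session with resume enabled.

```python
def run_session(todo, results, run_one, save, reboot=None):
    """Run outstanding executions, persisting the results after each one.

    todo:    list of benchmark keys, one entry per execution still to run
    results: {key: [per-execution results]}, mutated in place
    run_one: callable(key) -> result of a single execution
    save:    callable(results) that writes the results file
    reboot:  optional callable; if given, it is called once the first
             execution has been saved and the loop stops, on the
             assumption that the next boot resumes the session
    """
    for key in todo:
        results.setdefault(key, []).append(run_one(key))
        save(results)  # saved before any reboot, so nothing is lost
        if reboot is not None:
            reboot()
            return
```

Because the results are saved before the reboot is requested, the results file alone records how far the session got, which matches the point above that no extra state file in /var/log is needed.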

@vext01
Member Author

vext01 commented Oct 22, 2015

Yes, and I think this all sounds sensible. Go for it.

@snim2
Collaborator

snim2 commented Oct 22, 2015

Edited that slightly.

@vext01
Member Author

vext01 commented Oct 22, 2015

This is separate, but I would also like a way to re-run a subset of experiments.

Using the scheme you outline above, you can re-run experiments by deleting entries from the bz2-compressed JSON file. This is quite fiddly, as you have to decompress the file and then locate the right lines to remove in a results file with potentially tens of thousands of lines, so I wonder if we could write a tiny tool to remove results by key. Thoughts?
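As an illustration of what such a tool might look like, here is a small Python 3 sketch. It assumes the results file is a single bz2-compressed JSON object whose top-level keys identify results; that layout, and the function name `remove_results`, are assumptions rather than krun's actual format or API.

```python
import bz2
import json


def remove_results(path, keys):
    """Delete the given top-level keys from a bz2-compressed JSON file.

    Assumes the file holds one JSON object keyed by result identifier
    (a hypothetical layout). Returns the keys that were actually removed.
    """
    with bz2.open(path, "rt", encoding="utf-8") as f:
        results = json.load(f)

    removed = []
    for key in keys:
        if key in results:
            del results[key]
            removed.append(key)

    # Rewrite the whole file; only the selected entries are gone.
    with bz2.open(path, "wt", encoding="utf-8") as f:
        json.dump(results, f)
    return removed
```

With a resume scheme in place, removing a key this way would make the corresponding executions look outstanding, so the next resumed session would re-run them.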

@vext01
Member Author

vext01 commented Oct 22, 2015

Note also that krun currently dumps a bz2-compressed JSON file after each execution, not just once at the end. This is to protect against crashing.
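For reference, one common way to make a per-execution dump robust against a crash in the middle of the write is to write to a temporary file and then rename it over the old one. The Python 3 sketch below shows that technique; it is not a description of how krun writes its file, and `dump_results` is a hypothetical name.

```python
import bz2
import json
import os
import tempfile


def dump_results(results, path):
    """Write results as bz2-compressed JSON, replacing `path` atomically."""
    data = bz2.compress(json.dumps(results).encode("utf-8"))
    directory = os.path.dirname(os.path.abspath(path))
    # The temporary file lives in the same directory so that the final
    # rename stays on one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
            f.flush()
            os.fsync(f.fileno())  # push the bytes to disk before renaming
        os.replace(tmp_path, path)
    except BaseException:
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
```

If the process dies before `os.replace`, the previous results file is left untouched, so a later resume would still see every execution that completed before the interrupted one.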

@vext01
Member Author

vext01 commented Oct 22, 2015

WRT: Does krun need to know if it's in reboot mode? I suggest adding a boolean config option REBOOT_MODE.

@snim2
Collaborator

snim2 commented Oct 22, 2015

OK, I think it would be helpful for testing this to add the examples/ directory back in, with a config that conforms to the current structure. Does that sound sensible?

@vext01
Member Author

vext01 commented Oct 22, 2015

What would you test?

@snim2
Collaborator

snim2 commented Oct 22, 2015

For now, I just mean that I need something to try out by hand.

I'm sure there are other things that Travis could potentially test, but I think that would be a separate PR / issue.
