Task checkpointing #3

Open
modora opened this issue Dec 19, 2017 · 3 comments

Comments

@modora
Owner

modora commented Dec 19, 2017

The checkpoint module allows a long-running task to resume execution after it exits, whether through an interruption or an exception. The module must integrate easily with existing scripts.

The implementation will be as follows:

  1. Create a context manager. Integration into scripts requires plugin developers to use the syntax

         with checkpoint():  # checkpoint 1
             ...  # do stuff

         with checkpoint():  # checkpoint 2
             ...  # do more stuff

         cleanup()  # remove all checkpoints from disk

  2. The __enter__ method checks whether it should create a checkpoint, load a checkpoint, or skip the block.
  3. When creating a checkpoint, all variables in the local namespace are saved to disk during the __enter__ method.
  4. When loading a checkpoint, the module scans the disk to check whether a checkpoint already exists. The block is skipped if the checkpoint sessions do not match (a checkpoint counter suffices); otherwise, the checkpoint is loaded.
  5. If a checkpoint is loaded, the stored variables are unpacked into the namespace and the block is executed.
  6. At the end of the script, all checkpoints should be cleaned up.
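The decision rule in the steps above is left open, so here is one possible reading, sketched as an assumption rather than the module's actual logic. Checkpoints are numbered by a counter, and because each one is saved on entry, the newest file on disk marks the block in which the previous run stopped: earlier blocks are skipped, that block is loaded and re-run, and later blocks create fresh checkpoints. The `checkpoint_<n>.pkl` naming and the helpers `decide` and `latest_checkpoint` are hypothetical.

```python
import os
import tempfile
from enum import Enum
from typing import Optional


class Action(Enum):
    CREATE = "create"  # no usable state on disk yet: save locals, then run the block
    LOAD = "load"      # newest checkpoint on disk: restore its locals, then run the block
    SKIP = "skip"      # a later checkpoint exists: this block already finished


def decide(counter: int, latest_saved: Optional[int]) -> Action:
    """Choose what checkpoint number `counter` should do in __enter__."""
    if latest_saved is None or counter > latest_saved:
        return Action.CREATE
    if counter == latest_saved:
        return Action.LOAD
    return Action.SKIP


def latest_checkpoint(directory: str) -> Optional[int]:
    """Return the highest n among files named 'checkpoint_<n>.pkl' (hypothetical), or None."""
    numbers = []
    for name in os.listdir(directory):
        if name.startswith("checkpoint_") and name.endswith(".pkl"):
            stem = name[len("checkpoint_"):-len(".pkl")]
            if stem.isdigit():
                numbers.append(int(stem))
    return max(numbers) if numbers else None


# Simulate a rerun after the script died inside the second of three blocks.
with tempfile.TemporaryDirectory() as tmp:
    for n in (1, 2):
        open(os.path.join(tmp, f"checkpoint_{n}.pkl"), "wb").close()
    latest = latest_checkpoint(tmp)

actions = [decide(n, latest) for n in (1, 2, 3)]
print([a.value for a in actions])  # ['skip', 'load', 'create']
```

On a fresh run there are no files, so every checkpoint creates.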

The checkpoints saved to disk must be encoded so that each checkpoint is easily distinguishable. At the moment, the following metadata will be considered:

Note: if the script is still under development, the above metadata may change.

Each checkpoint could be cleaned up in its __exit__ method, removing the need for a dedicated cleanup function. However, code that is not enclosed in any checkpoint may still raise an exception; keeping the checkpoints on disk until cleanup() is called, as in the template above, lets you restore the last saved state.
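To illustrate why cleanup is kept out of __exit__, here is a minimal sketch (an assumption, not the module's code): __exit__ leaves each checkpoint file on disk, so an exception raised between blocks still leaves the last state recoverable, and a module-level cleanup() deletes everything once the script finishes. The explicit `state` argument is a hypothetical stand-in for the frame inspection the real module uses, and CHECKPOINT_DIR is a placeholder location.

```python
import glob
import os
import pickle
import tempfile

# Hypothetical checkpoint directory for this sketch.
CHECKPOINT_DIR = tempfile.mkdtemp()


class checkpoint:
    """Minimal stand-in: saves an explicit `state` dict instead of inspecting frames."""

    _counter = 0  # shared by every checkpoint in the script

    def __init__(self, state=None):
        checkpoint._counter += 1
        self.path = os.path.join(CHECKPOINT_DIR, f"checkpoint_{checkpoint._counter}.pkl")
        self.state = dict(state or {})

    def __enter__(self):
        with open(self.path, "wb") as f:
            pickle.dump(self.state, f)
        return self

    def __exit__(self, exc_type, exc, tb):
        # The file is deliberately left on disk so that a later failure outside
        # any checkpoint can still resume from here.
        return False  # never suppress exceptions


def cleanup():
    """Remove all checkpoints from disk; call once at the very end of the script."""
    for path in glob.glob(os.path.join(CHECKPOINT_DIR, "checkpoint_*.pkl")):
        os.remove(path)


try:
    with checkpoint({"x": 1}):
        pass
    raise RuntimeError("failure outside any checkpoint")
except RuntimeError:
    pass

print(sorted(os.listdir(CHECKPOINT_DIR)))  # ['checkpoint_1.pkl']
cleanup()
print(os.listdir(CHECKPOINT_DIR))  # []
```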

@modora
Owner Author

modora commented Dec 26, 2017

Task checkpointing is mostly done, though the implementation is admittedly hacky. Implementing it as a context manager required some unconventional techniques.

For one, to distinguish between two checkpoints within the same scope, each checkpoint has to be able to recognize that it is distinct from the others. This was done by looking at the filename and line number where the checkpoint was used. Finding out where the checkpoint was used requires the checkpoint to look down the call stack; for more assurance, it looks all the way down to the main module. For the db encoding, a hash function is used to generate a unique filename.
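A sketch of that call-site identification, assuming SHA-1 as the hash (the comment above doesn't name one): walk from the caller's frame down to the outermost frame with inspect.currentframe() and f_back, join every filename:line pair, and hash the result into a filename. `call_site_key` and the `checkpoint_<digest>.pkl` format are hypothetical.

```python
import hashlib
import inspect


def call_site_key(skip: int = 1) -> str:
    """Build a checkpoint filename from every (filename, line) pair on the stack.

    `skip` frames are stepped over first (1 = start at this function's caller).
    """
    frame = inspect.currentframe()
    try:
        for _ in range(skip):
            frame = frame.f_back
        parts = []
        # Walk all the way down to the outermost frame (the main module).
        while frame is not None:
            parts.append(f"{frame.f_code.co_filename}:{frame.f_lineno}")
            frame = frame.f_back
        digest = hashlib.sha1("|".join(parts).encode("utf-8")).hexdigest()
        return f"checkpoint_{digest[:16]}.pkl"
    finally:
        del frame  # avoid keeping frames alive through a reference cycle


def two_checkpoints():
    first = call_site_key()
    second = call_site_key()
    return first, second


a, b = two_checkpoints()
print(a != b)  # True: same scope, different line numbers

same = [call_site_key() for _ in range(2)]
print(same[0] == same[1])  # True: identical call site
```

Because the whole stack is hashed, the same line reached through a different caller gets a different key.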

Additionally, the checkpoint needs to save the namespace of the scope in which it was called. As before, it has to look down the stack to the parent frame, this time reading the locals of that frame. Saving the namespace only requires reading the locals, while loading the namespace requires modifying the locals of the parent frame.
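A minimal sketch of saving and loading the parent frame's namespace, under stated assumptions: it uses the CPython-specific sys._getframe (equivalent to inspect.currentframe().f_back here), and the filtering rule (skip dunders, modules, functions, classes and anything unpicklable) is a hypothetical choice. Writing back only takes effect where f_locals is writable: at module level it is the globals dict, and from Python 3.13 (PEP 667) a function's existing locals can be written too; on older versions writes inside a function are discarded.

```python
import inspect
import pickle
import sys


def _is_variable(name, value):
    """Skip dunders, modules, functions and classes; keep plain data."""
    if name.startswith("__"):
        return False
    return not (inspect.ismodule(value) or inspect.isroutine(value) or inspect.isclass(value))


def save_namespace(depth: int = 1) -> bytes:
    """Pickle the picklable variables of the frame `depth` levels above this one."""
    frame = sys._getframe(depth)
    saved = {}
    for name, value in list(frame.f_locals.items()):
        if not _is_variable(name, value):
            continue
        try:
            pickle.dumps(value)
        except Exception:  # e.g. open files or locks
            continue
        saved[name] = value
    return pickle.dumps(saved)


def load_namespace(blob: bytes, depth: int = 1) -> None:
    """Write saved variables back into the frame `depth` levels above this one."""
    frame = sys._getframe(depth)
    for name, value in pickle.loads(blob).items():
        frame.f_locals[name] = value


def demo():
    total, label = 42, "run-1"
    return pickle.loads(save_namespace())


print(demo())  # {'total': 42, 'label': 'run-1'}

x = 1
blob = save_namespace()  # module level: x (and any other plain globals) are saved
x = 99
load_namespace(blob)
print(x)  # 1
```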

Lastly, when a checkpoint is loaded, the with-block has to be skipped (since that's the point of a checkpoint). PEP 377 specifically addresses this request but was rejected. There is a known hack, but my tests have shown that it only works in Python 2. Currently, the implementation requires users to write their code in the form

with checkpoint() as cp:
    cp.skip_with()   # skip this block if we can load the checkpoint

    # actual code here
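How cp.skip_with() is implemented isn't shown here. One technique that does work in Python 3, sketched as an assumption: skip_with() raises a private exception when the checkpoint can be loaded, and __exit__ returns True for that exception only, which suppresses it so execution resumes after the with-block. The `loadable` flag is a hypothetical stand-in for the real "can we load this checkpoint?" check.

```python
class SkipWith(Exception):
    """Raised by skip_with() to abandon the rest of the with-block."""


class checkpoint:
    def __init__(self, loadable: bool):
        # Hypothetical stand-in for scanning the disk for a matching checkpoint.
        self.loadable = loadable

    def __enter__(self):
        return self

    def skip_with(self):
        if self.loadable:
            raise SkipWith()

    def __exit__(self, exc_type, exc, tb):
        # Returning True suppresses only SkipWith; any other exception propagates.
        return exc_type is not None and issubclass(exc_type, SkipWith)


ran = []
for loadable in (True, False):
    with checkpoint(loadable) as cp:
        cp.skip_with()
        ran.append(loadable)
print(ran)  # [False]
```

This is also why cp.skip_with() has to be the first statement: anything placed before it still runs.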

@modora
Owner Author

modora commented Dec 26, 2017

There is a minor issue with variables that are created outside of with-blocks; call these runtime variables. Loading a checkpoint will overwrite the runtime variables with their historic values. This can be mitigated by tracking whether each variable is a runtime variable or a static variable (a variable defined within a with-block). This bookkeeping can be done within the checkpoint class and must persist across all the checkpoints in the scope where they are called. We would then have to inspect the function of the parent frame; fortunately, this is possible... with more hacks.
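The runtime/static bookkeeping could look roughly like this hypothetical sketch, which works on plain dicts so the frame hacks stay out of the way: names that are new or rebound between entering and leaving a with-block are marked static, and on load only static variables are restored. `VariableTracker` is an invented name, and the identity check is an assumption with a known gap: in-place mutation, or rebinding to the identical object, is not detected.

```python
from typing import Any, Dict, Set


class VariableTracker:
    """Hypothetical bookkeeping: names bound inside any with-block are 'static'."""

    def __init__(self) -> None:
        self.static: Set[str] = set()
        self._before: Dict[str, Any] = {}

    def enter(self, namespace: Dict[str, Any]) -> None:
        self._before = dict(namespace)  # shallow snapshot taken in __enter__

    def exit(self, namespace: Dict[str, Any]) -> None:
        # New names, or names rebound to a different object, count as static.
        for name, value in namespace.items():
            if name not in self._before or self._before[name] is not value:
                self.static.add(name)

    def restorable(self, saved: Dict[str, Any]) -> Dict[str, Any]:
        """Keep only static variables; runtime variables keep their current values."""
        return {k: v for k, v in saved.items() if k in self.static}


tracker = VariableTracker()
ns = {"config_path": "run.cfg"}   # runtime variable, set outside any block
tracker.enter(ns)
ns["model"] = "trained"           # static variable, set inside a with-block
tracker.exit(ns)

saved = {"config_path": "old.cfg", "model": "trained"}
ns["config_path"] = "new.cfg"     # changed at runtime since the checkpoint
ns.update(tracker.restorable(saved))
print(ns)  # {'config_path': 'new.cfg', 'model': 'trained'}
```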

@modora
Owner Author

modora commented Dec 26, 2017

For anyone wondering, these hacks mostly use the inspect module.
