<a href="https://colab.research.google.com/github/jchen6727/batchtk/blob/development/examples/colab_driveless/batchtk0.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

**Tutorial 0**

**Note 0** This tutorial will perform an install of the `batchtk` package within a temporary Google Colab notebook, then demonstrate using the `Dispatcher`, `Runner` and `Submit` classes to run a single job.


In [None]:
#jupyter 0
!pip install batchtk
import site
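# add the install location to the current runtime (this path assumes a Python 3.10 Colab runtime; adjust it if the runtime differs)
site.addsitedir('/usr/local/lib/python3.10/dist-packages')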

**Note 1.0** Any `!<command>` is executed in a new standalone shell (equivalent to running `<command>` in a shell script, or in a newly opened terminal that is then closed). The commands above install the `batchtk` package with `pip`, then add the path of the installed package to the current Jupyter runtime.
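
As a rough illustration of the standalone-shell behaviour described in Note 1.0, the sketch below uses Python's `subprocess` module to start two separate shells. The variable name `BATCHTK_DEMO_VAR` is made up for this example, and the sketch assumes a POSIX `sh` is available (as it is on Colab). A variable exported in the first shell is not visible in the second, because each shell exits when its command finishes.

```python
import subprocess

# Each call starts a fresh shell, much like each `!<command>` line does.
subprocess.run("export BATCHTK_DEMO_VAR=1", shell=True, check=True)

# A second shell does not see the variable exported by the first one;
# the ${VAR:-unset} expansion prints "unset" when VAR is not defined.
result = subprocess.run(
    'echo "${BATCHTK_DEMO_VAR:-unset}"',
    shell=True, check=True, capture_output=True, text=True,
)
second_shell_value = result.stdout.strip()
print(second_shell_value)
```

This is why state that must persist (such as an import path) is set from Python in the notebook rather than through a `!` command.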

**Note 1.1** The `batchtk` package was designed with SOLID OOP principles in mind. Rather than using arguments to steer the control flow of a monolithic function, it splits responsibilities across multiple classes, so that only the code relevant to a given task is executed.
The `base classes` which other classes inherit from include:

**Runner** which implements functionality for parsing provided `arguments` into any arbitrary script's namespace and communicating with the **Dispatcher**

**Dispatcher** which implements functionality for updating the **Submit** with `arguments`, calling **Submit** functions to execute the appropriate arbitrary script containing the **Runner** class, and monitoring the execution of the arbitrary script

**Submit** which implements functionality for serializing the `arguments` to pass to the **Runner**, updating the **Template** with the `arguments` and communication protocols for use between the **Dispatcher** and **Runner**, and starting the arbitrary script containing the **Runner** class

**Template** which implements functionality for generating and formatting arbitrary string templates for **Submit**

Other classes `extend and inherit` these `base classes`, allowing for more complex interactions, for instance, by adding custom scripting, support for custom communication protocols (`stdio`, `filesystem`, `socket`), etc.

Let's look at these classes, including the `base classes` and custom `extended classes`

In [None]:
#jupyter 1
from batchtk.runtk import Runner, SocketRunner, FileRunner, INETDispatcher, SHDispatcher, Template, Submit, SHSubmitSOCK
sockrunner = SocketRunner() # SocketRunner inherits Runner and extends it with functionality for communicating through socket.socket functions
filerunner = FileRunner()   # FileRunner inherits Runner and extends it with functionality for communicating through file I/O
dispatcher = INETDispatcher(project_path = "/content", submit = SHSubmitSOCK()) # INETDispatcher inherits Dispatcher and extends it with functionality for communicating through socket.socket functions over INET (TCP); any Dispatcher requires a Submit instance
template = Template("""{sh}.sh, {foo}.run, {bar}.out, {baz}.sgl, {sockname}\necho 'hello'""") # Template is constructed from a string
submit = Submit(submit_template=template, script_template=template) # Submit requires both a submit and script template (both are Template instances)
socksh = SHSubmitSOCK() # SHSubmitSOCK is a custom class inheriting Submit which uses .sh scripts to execute code and establishes handling for socket.socket communications

**Note 2** Using the `help()` function on any of these classes shows both the base functionality and the extended functionality (call it on the imported class name, not the instance). Alternatively, the `dir()` function can be called on the created instance to see its `attributes` and `methods`.

In [None]:
#jupyter 2
print("-----help(SocketRunner)-----")
help(SocketRunner)
print("\n-----help(INETDispatcher)-----\n")
help(INETDispatcher)
print("\n-----dir(socksh)-----")
dir(socksh)
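
To separate inherited functionality from extended functionality without reading the full `help()` output, one option is to compare `dir()` listings and inspect the method resolution order. The sketch below uses two made-up classes, `BaseRunner` and `SocketLikeRunner`, as stand-ins; they are not part of `batchtk`, and the same comparison can be tried on the imported classes.

```python
import inspect

class BaseRunner:
    """Hypothetical stand-in for a base class."""
    def run(self):
        return "running"

class SocketLikeRunner(BaseRunner):
    """Hypothetical stand-in for an extended class."""
    def connect(self):
        return "connected"

# The method resolution order lists the class followed by its ancestors.
mro_names = [cls.__name__ for cls in inspect.getmro(SocketLikeRunner)]

# Names present on the extended class but absent from the base class.
added = sorted(set(dir(SocketLikeRunner)) - set(dir(BaseRunner)))

print(mro_names)
print(added)
```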

**Note 3** Let's look closer at the custom submit `SHSubmitSOCK` by printing the generated instance.

Note that it uses the base `__repr__` from `Submit` to handle print statements (verify this with `help()`). The output shows the command executed by the Submit to call the arbitrary script (`submit:`), the script file that will be run (`script:`), the path where the script file will be written (`path:`), the files/communication addresses which allow the `dispatcher` and `runner` instances to communicate (`handles:`) and the keyword arguments that must be filled (`kwargs:`).

Special arguments (`project_path`, `output_path`, `label` and `env`) are, by default, handled automatically by the **Dispatcher** and **Submit** prior to job creation. `env` specifically will be filled with the serialized arguments for the **Runner**. `sockname` is a unique argument that allows for establishing communication between specialized **Dispatcher** and **Runner** scripts; it is handled by the **Dispatcher** automatically. `command` is updated to our preference (i.e. to some variant of `mpiexec -np 4 nrniv...` or some other call).

**Note 3.1**
Notice the use of `nohup` and the redirection of `stderr` and `stdout` to the `{output_path}/{label}.run` file, which prevents the `{command}` from blocking.

In [None]:
#jupyter 3
print(socksh)
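
The `kwargs:` entry in the printed output lists the fields that still need values. As a minimal sketch of how such a list could be derived, assuming the templates follow Python's `str.format` brace syntax (as the `{label}` style fields suggest), the standard library's `string.Formatter` can enumerate the named fields. The helper `placeholders` and the string `demo_script` are illustrative only and do not reflect how `batchtk` implements this.

```python
from string import Formatter

def placeholders(template):
    """Return the named fields of a str.format-style template, in order."""
    names = []
    for _, field_name, _, _ in Formatter().parse(template):
        # field_name is None for trailing literal text
        if field_name and field_name not in names:
            names.append(field_name)
    return names

# Illustrative script text, loosely modelled on the fields discussed above.
demo_script = "cd {project_path}\n{env}\n{command} > {output_path}/{label}.run 2>&1"
fields = placeholders(demo_script)
print(fields)
```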

**Note 4.0** Note that in the case of `socksh`, beyond the `runtk.SUBMIT` and `runtk.STDOUT` handles implemented in other `Submit` classes, it also includes a handle specific for socket based communication (`runtk.SOCKET`).

**Note 4.1**
Now let's create a custom `Submit` class that can handle executing arbitrary scripts in a `Google Colab` environment. In this custom submission class, we can have some arbitrary `FOO`, `BAR`, `BAZ` values passed to the environment, to be defined by our `Dispatcher`. We can also have it provide a `process ID` (`pid`) which we can capture and return as a `job_id`.

By inheriting from `Submit`, we preserve the original functionality and relevant interfaces, and then extend them with our own. For instance, printing the instance below uses the `__repr__` method inherited from `Submit`, which displays the relevant strings for the submit command, the script, the path of the script, the handles and the keyword arguments.

In [None]:
#jupyter 4
class GCSubmit(Submit):
  def __init__(self):
    # creates a Submit with the templates we define
    super().__init__(
        submit_template = Template("sh {output_path}/{label}.sh"),
        script_template = Template("""\
#!/bin/bash
cd {project_path}
export FOO={foovalue}
export BAR={barvalue}
export BAZ={bazvalue}
{env}
nohup python /content/runner.py > {output_path}/{label}.run 2>&1 &
pid=$!
echo $pid >&1
"""
        )
    )
  def submit_job(self):
    # using this submit_job, we can add some handling of stdout, job failure (i.e. if stdout does not return an integer value as expected),
    # extending the functionality of Submit with this exception handling.
    proc = super().submit_job()
    try:
      self.job_id = int(proc.stdout)
    except Exception as e:
      raise Exception("{}\nJob submission failed:\n{}\n{}\n{}\n{}".format(e, self.submit, self.script, proc.stdout, proc.stderr)) from e
    return self.job_id

gcs = GCSubmit()
print("-----print(gcs)-----")
print(gcs) # inherited functionality from the base Submit class
print("\n-----dir(gcs)-----\n")
dir(gcs)
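
The script template above backgrounds the job, reads its process ID from `$!` and echoes it, and `submit_job` then converts that text to an integer. The standalone sketch below reproduces only that pattern with the standard library, using a short `sleep` in place of the real job. It assumes a POSIX `sh` and does not call any `batchtk` code.

```python
import subprocess

# Background a short-lived process and echo its pid. Redirecting the
# background process's output means it does not hold the captured pipes
# open, so the shell returns immediately.
script = "sleep 0.2 > /dev/null 2>&1 &\necho $!"
proc = subprocess.run(["sh", "-c", script], capture_output=True, text=True)

try:
    # int() tolerates the trailing newline written by echo
    job_id = int(proc.stdout)
except ValueError as e:
    raise RuntimeError(
        "submission failed: {!r} {!r}".format(proc.stdout, proc.stderr)
    ) from e

print(job_id)
```

If the script printed anything other than a bare integer, the conversion would fail, which is the situation the exception handling in `GCSubmit.submit_job` is meant to report.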


**Note 5** Before supplying the `gcs` instance to the `SHDispatcher()` constructor, we can permanently update the templates by calling the `update_templates()` method. Every job subsequently created by the `dispatcher` instance will share this updated value.

In [None]:
#jupyter 5
gcs.update_templates(foovalue='"A"') #update the template instance (this will permanently update the template)
print(gcs.templates.script) #print the template script which contains the 'export FOO={foovalue}'
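
Filling one field while leaving the others in place can be pictured as a partial string format. The sketch below is one way to do this in plain Python with `str.format_map`; the names `KeepMissing` and `partial_fill` are hypothetical, and this is not necessarily how `update_templates()` works internally.

```python
class KeepMissing(dict):
    """Leave unknown placeholders untouched instead of raising KeyError."""
    def __missing__(self, key):
        return "{" + key + "}"

def partial_fill(template, **values):
    # format_map consults __missing__ for any field not supplied in values
    return template.format_map(KeepMissing(values))

demo = "export FOO={foovalue}\nexport BAR={barvalue}"
step1 = partial_fill(demo, foovalue='"A"')   # barvalue is left as a field
step2 = partial_fill(step1, barvalue='"B"')  # now every field is filled
print(step1)
print(step2)
```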

**Note 6** Now we can pass the custom submit instance to our **SHDispatcher**, which extends the base **Dispatcher** with support for `shell` scripts (`bash`, `powershell`, `z shell`, ...).

**Note 6.1** Additionally, note the other arguments passed to `SHDispatcher`: the `project_path`, which specifies the directory of input files; the `output_path`, which specifies where the files generated by the dispatcher instance and the shell script should be written; and the `gid`, a unique identifier for the dispatcher which acts as a label.

In [None]:
#jupyter 6
dispatcher = SHDispatcher(project_path='/content', output_path='./batch', submit=gcs, gid='example')
print(dispatcher.submit) # prints the dispatcher.submit

**Note 7** To pass arguments to the **Runner** script, we will call `update_env` from the dispatcher. The argument is a dictionary of `key:value` pairs.


In [None]:
#jupyter 7
dispatcher.update_env({'strvalue': '1',
                       'intvalue': 2,
                       'fltvalue': 3.0})
print(dispatcher.submit)

**Note 8** Upon job creation through the `.create_job()` method, the `{env}`, `{project_path}`, `{output_path}` and `{label}` fields are filled by the dispatcher based on the arguments passed to it. We also pass the method `keyword=value` pairs for `barvalue` and `bazvalue` to fill in the remaining fields of the script template (`foovalue` was already set with `update_templates()`).

The `{env}` field will be replaced with a custom `serialization` (in this case, exported string values) that can then be deserialized by the **Runner** in the `runner.py` script. Now let's create a job and review the job submission.

In [None]:
#jupyter 8
dispatcher.create_job(barvalue='"B"',
                      bazvalue='"C"')
print(dispatcher.submit) # see the new submit
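
To make the idea of serializing arguments into exported string values more concrete, here is a toy round trip. The encoding used (a `DEMOARG` prefix with a `type:key=value` payload) is invented for this sketch and is not the format `batchtk` writes into `{env}`; the printed submit above shows the real one.

```python
def serialize(args, prefix="DEMOARG"):
    """Turn a dict into `export` lines, one variable per argument."""
    lines = []
    for i, (key, value) in enumerate(args.items()):
        type_name = type(value).__name__  # 'str', 'int' or 'float'
        lines.append('export {}{}="{}:{}={}"'.format(prefix, i, type_name, key, value))
    return "\n".join(lines)

def deserialize(environ, prefix="DEMOARG"):
    """Rebuild the dict from environment-style name/value pairs."""
    casts = {"str": str, "int": int, "float": float}
    args = {}
    for name, payload in environ.items():
        if name.startswith(prefix):
            type_name, assignment = payload.split(":", 1)
            key, raw = assignment.split("=", 1)
            args[key] = casts[type_name](raw)
    return args

args = {"strvalue": "1", "intvalue": 2, "fltvalue": 3.0}
exports = serialize(args)
print(exports)

# Emulate what a shell would place in the environment after the exports.
fake_environ = {}
for line in exports.splitlines():
    name, value = line[len("export "):].split("=", 1)
    fake_environ[name] = value.strip('"')

restored = deserialize(fake_environ)
print(restored)
```

Carrying the type alongside each value is what lets the receiving script recover `2` as an integer and `3.0` as a float rather than as strings. The toy encoding does not handle values that themselves contain double quotes.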

**Note 9** Let's download and check a basic `runner.py` using the `Runner` class.

In [None]:
#jupyter 9
!curl https://raw.githubusercontent.com/jchen6727/batchtk/development/examples/colab/basic_runner.py > /content/runner.py
!cat /content/runner.py

**Note 10** Notice that the `runner.py` script automatically captures the `arguments` passed in `{env}` in the `runner.mappings` attribute as a dictionary of `key:value` pairs. We'll have it print the job ID with `os.getpid()` and then print the `arguments` passed to it, which we will be able to see in `/content/example.run` after submitting the job via the `dispatcher.submit_job()` function.

In [None]:
#jupyter 10
dispatcher.submit_job()
dispatcher.job_id # prints the job_id, should match the printed pid from the runner.py script

**Note 11** Finally, let's evaluate the output of the `job` created by the dispatcher instance. From the contents of the shell script, we see that the `stdout` (and `stderr`) of the job is captured in a filename stored in one of the dispatcher `handles` (`runtk.STDOUT`).

In [None]:
#jupyter 11
from batchtk import runtk #retrieve runtk constants
print("contents of {}:\n".format(dispatcher.handles[runtk.STDOUT])) #see the handle
!cat {dispatcher.handles[runtk.STDOUT]} #print the contents of the handle
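
Because the job runs in the background, the output file may not exist yet (or may still be empty) when the cell above runs. One way to guard against that is to poll for the file before reading it. The helper `wait_for_output` below is a hypothetical convenience written for this tutorial, not a `batchtk` function, and the temporary file only stands in for the real `.run` file.

```python
import os
import tempfile
import time
from pathlib import Path

def wait_for_output(path, timeout=10.0, interval=0.1):
    """Poll until `path` exists and is non-empty, then return its text."""
    deadline = time.monotonic() + timeout
    target = Path(path)
    while time.monotonic() < deadline:
        if target.exists() and target.stat().st_size > 0:
            return target.read_text()
        time.sleep(interval)
    raise TimeoutError("no output in {} after {} s".format(path, timeout))

# Demonstration with a temporary file standing in for the job's output file.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "example.run")
    Path(demo_path).write_text("hello\n")
    demo_text = wait_for_output(demo_path, timeout=1.0)

print(demo_text)
```

In the notebook, the same helper could be called as `wait_for_output(dispatcher.handles[runtk.STDOUT])`. A non-empty file only shows that the job has started writing, not that it has finished.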