<a href="https://colab.research.google.com/github/jchen6727/colab/blob/master/j4/pubtk0.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

**Tutorial 0**

**Note 0** This tutorial sets up a persistent virtual environment within Google Drive, installs the development version of the `pubtk` Python package into that environment, and then demonstrates its use.


**Note 0.1**
To mount Google Drive and set up a virtual environment, we'll need two tools. The first is the `google.colab.drive.mount()` function, which is part of the Python packages preinstalled in Google Colab (it exists in every session). The second is `virtualenv`, which is not preinstalled and must either be installed per session through pip or be maintained within a separate persistent virtual environment.

**Note 0.2**
If you have already created the virtual environment and the tutorial development environment in a prior session, you can skip cells `jupyter 1` through `jupyter 4` after executing `jupyter 0`.

In [1]:
#jupyter 0
from google.colab import drive # use drive.mount() to link Google Drive to
drive.mount('/content/drive')  # the google colab session

Mounted at /content/drive
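
Whether cells `jupyter 1` through `jupyter 4` can be skipped can also be checked from Python once the drive is mounted. The sketch below is one possible check, not part of `pubtk`: the helper name `venv_ready` is hypothetical, and it only tests that the venv interpreter and the cloned repository exist at the paths used in this tutorial, not that the editable install succeeded.

```python
import os

def venv_ready(venv_dir, repo_dir):
    """Return True when the venv interpreter and the cloned repository both exist."""
    python_bin = os.path.join(venv_dir, "bin", "python")
    return os.path.isfile(python_bin) and os.path.isdir(repo_dir)

# Paths taken from the cells in this tutorial.
VENV_DIR = "/content/drive/MyDrive/venv"
REPO_DIR = "/content/drive/MyDrive/dev/pubtk"

if venv_ready(VENV_DIR, REPO_DIR):
    print("environment found: jupyter 1 through jupyter 4 can be skipped")
else:
    print("environment missing: run jupyter 1 through jupyter 4")
```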


**Note 1** Each `!<command>` line is executed in a new standalone shell (equivalent to running `<command>` within a newly opened terminal). The first line below installs `virtualenv` through pip; the second uses `virtualenv` to create a Python environment that is stored persistently on the linked Google Drive.

In [None]:
#jupyter 1
!pip install virtualenv                  # install virtualenv to create a
!virtualenv /content/drive/MyDrive/venv  # python environment within linked Drive

**Note 2** In Google Colab, because each `!<command>` is executed in a new standalone shell, any state changes it makes (such as `!cd`) are lost when the line completes. One workaround is to use `;` to chain multiple commands within one `!<command>`, together with `;\`, which escapes the newline so that the chain can span several lines. Keep in mind that shell commands are sensitive to spacing.

In [None]:
#jupyter 2
!cd /content/drive/MyDrive
!pwd                           # prints /content
!cd /content/drive/MyDrive;\
pwd;\
echo $(pwd)                    # prints /content/drive/MyDrive, twice
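
The same behavior can be reproduced outside the notebook with `subprocess`, which may make the reason clearer: every call starts its own shell, so a `cd` in one call cannot affect the next. This is a minimal sketch assuming a POSIX shell; the helper name `shell_output` is made up for illustration, and a temporary directory stands in for `/content/drive/MyDrive`.

```python
import os
import shlex
import subprocess
import tempfile

def shell_output(command):
    """Run `command` in a fresh shell and return its stdout without the newline."""
    result = subprocess.run(command, shell=True, capture_output=True,
                            text=True, check=True)
    return result.stdout.strip()

start = os.path.realpath(os.getcwd())
target = os.path.realpath(tempfile.gettempdir())
quoted = shlex.quote(target)

shell_output(f"cd {quoted}")                   # the change dies with this shell
separate = shell_output("pwd -P")              # a new shell starts from `start`
chained = shell_output(f"cd {quoted}; pwd -P") # one shell, so the change is visible

print("separate calls:", separate)
print("chained call:  ", chained)
```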

**Note 3** One way of using our virtualenv is to source the activation script before any `python` or `pip` call within the same `!<command>`; another is to specify the full path to the binary.

In [None]:
#jupyter 3
!source /content/drive/MyDrive/venv/bin/activate #doesn't do anything so...
!which python #the python command still points to the default, instead use...
!source /content/drive/MyDrive/venv/bin/activate;which python
#which will call the venv python, or use a static path for python/pip etc.
!which /content/drive/MyDrive/venv/bin/python #venv python
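
Calling an interpreter by its full path also works from Python code, and asking that interpreter for `sys.prefix` is one way to confirm which environment it belongs to. In the sketch below the helper `interpreter_prefix` is a hypothetical name; `sys.executable` is used only so the example runs anywhere, and in this tutorial the argument would be `/content/drive/MyDrive/venv/bin/python`.

```python
import subprocess
import sys

def interpreter_prefix(python_path):
    """Ask the interpreter at `python_path` which environment it belongs to."""
    result = subprocess.run(
        [python_path, "-c", "import sys; print(sys.prefix)"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()

# sys.executable stands in for the venv interpreter path used in this tutorial.
print(interpreter_prefix(sys.executable))
```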

**Note 4** Now that we are familiar with `!<command>` and `virtualenv`, we will create our workspace for experimenting with the `pubtk` package.

In [None]:
#jupyter 4
!mkdir -p /content/drive/MyDrive/dev
!git clone --depth 1 https://github.com/jchen6727/pubtk.git /content/drive/MyDrive/dev/pubtk
!/content/drive/MyDrive/venv/bin/pip install -e /content/drive/MyDrive/dev/pubtk

**Note 5** After establishing the virtual environment, we need to link its packages to the current Google Colab Jupyter notebook session. This can be done by adding the virtual environment's `site-packages` directory to our search path with `site.addsitedir()`, after which we are free to import packages from the virtual environment. The `python3.10` component of the path must match the Python version that created the virtual environment.

In [None]:
#jupyter 5
import site
site.addsitedir('/content/drive/MyDrive/venv/lib/python3.10/site-packages')
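
Because the `python3.10` directory name depends on the interpreter that created the virtual environment, the path can be discovered instead of hard-coded. The following is a sketch under that assumption; `find_site_packages` is a hypothetical helper, and it relies on the usual `lib/python*/site-packages` layout of a virtualenv on Linux.

```python
import glob
import os
import site

def find_site_packages(venv_dir):
    """Return every lib/python*/site-packages directory inside `venv_dir`."""
    pattern = os.path.join(venv_dir, "lib", "python*", "site-packages")
    return sorted(glob.glob(pattern))

# Register whatever is found; nothing happens if the venv does not exist.
for path in find_site_packages("/content/drive/MyDrive/venv"):
    site.addsitedir(path)
```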

**Note 6** The `pubtk` package was designed with SOLID OOP principles in mind. Rather than using arguments to alter the control flow of a monolithic function, it uses multiple classes so that the relevant code is selected dynamically.
The `base classes` which other classes inherit from include:

**Runner** which implements functionality for parsing provided `arguments` into any arbitrary script's namespace and communicating with the **Dispatcher**

**Dispatcher** which implements functionality for updating the **Submit** with `arguments`, calling **Submit** functions to execute the appropriate arbitrary script containing the **Runner** class, and monitoring the execution of the arbitrary script

**Submit** which implements functionality for serializing the `arguments` to pass to the **Runner**, updating the **Template** with the `arguments` and communication protocols for use between the **Dispatcher** and **Runner**, and starting the arbitrary script containing the **Runner** class

**Template** which implements functionality for generating and formatting arbitrary string templates for **Submit**

Other classes `extend and inherit` these `base classes`, allowing for more complex interactions, for instance, by adding custom scripting, support for custom communication protocols (`stdio`, `filesystem`, `socket`), etc.
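
The inherit-and-extend pattern described above can be illustrated with toy classes. These are not `pubtk` classes and do not reflect its implementation; `ToyRunner` and `ToyStreamRunner` are hypothetical names, and the sketch only shows how a subclass supplies its own communication method while reusing everything else from the base class.

```python
import io

class ToyRunner:
    """Toy base class (hypothetical, not pubtk's Runner)."""

    def __init__(self, mappings=None):
        self.mappings = dict(mappings or {})

    def report(self):
        # Shared by every subclass: format the stored arguments.
        return ", ".join(f"{key}={value}"
                         for key, value in sorted(self.mappings.items()))

    def send(self, stream):
        # The communication protocol is left to subclasses.
        raise NotImplementedError

class ToyStreamRunner(ToyRunner):
    """Extends the base class with one concrete way to communicate."""

    def send(self, stream):
        stream.write(self.report() + "\n")

buffer = io.StringIO()
ToyStreamRunner({"strvalue": "1", "intvalue": 2}).send(buffer)
print(buffer.getvalue(), end="")
```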

Let's look at these classes, including the `base classes` and custom `extended classes`

In [None]:
#jupyter 6
from pubtk.runtk import Runner, SocketRunner, FileRunner, INET_Dispatcher, SH_Dispatcher, Template, Submit, ZSHSubmitSOCK
sockrunner = SocketRunner() # SocketRunner inherits Runner and extends it with functionality for communicating through socket.socket functions
filerunner = FileRunner()   # FileRunner inherits Runner and extends it with functionality for communicating through file I/O
dispatcher = INET_Dispatcher(submit = ZSHSubmitSOCK()) # INET_Dispatcher requires a Submit instance; it inherits Dispatcher and extends it with functionality for communicating through socket.socket functions over the INET (TCP) protocol
template = Template("""{foo}, {bar}, {baz}""") # Template requires a string to call
submit = Submit(submit_template=template, script_template=template) #Submit requires both a submit and script template (both are Template instances)
zsh = ZSHSubmitSOCK() # ZSHSubmitSOCK is a custom class inheriting Submit; it uses .zsh scripts to execute code and sets up handling for socket.socket communications

**Note 7** Calling the `help()` function on any of these classes shows both the base functionality and the extended functionality.

In [None]:
#jupyter 7
help(Submit)
dir(Submit)
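
`help()` and `dir()` list everything at once; to separate what a class defines itself from what it inherits, the standard `inspect` module can be used. This is a small sketch with a hypothetical helper `split_members`, demonstrated on toy classes so that it runs without `pubtk`.

```python
import inspect

def split_members(cls):
    """Return (own, inherited) public attribute names of `cls`, each sorted."""
    own = sorted(name for name in vars(cls) if not name.startswith("_"))
    inherited = sorted(
        name for name, _ in inspect.getmembers(cls)
        if not name.startswith("_") and name not in vars(cls)
    )
    return own, inherited

class Base:
    def create_job(self): ...
    def submit_job(self): ...

class Extended(Base):
    def submit_job(self): ...   # overrides the base method
    def poll(self): ...         # new functionality

own, inherited = split_members(Extended)
print("defined here:", own)
print("inherited:   ", inherited)
```

In the notebook, the same helper could be applied to `ZSHSubmitSOCK` to see which members it adds on top of `Submit`.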

**Note 8** Let's look closer at the custom submit `ZSHSubmitSOCK` by printing the generated instance.

Note that it uses the base `__repr__` from `Submit` to handle print statements (verify this with `help()`). The output shows the command that the Submit executes to call the arbitrary script, the script file that will be run, and the contents of that script file.

Special arguments (`cwd`, `label` and `env`) are, by default, handled automatically by the **Dispatcher** and **Submit** prior to job creation. `env` specifically will be filled with the serialized arguments for the **Runner**. `sockname` is a unique argument that establishes communication between specialized **Dispatcher** and **Runner** scripts; it is handled by the **Dispatcher** automatically. `command` is something we update to our preference (e.g. to some variant of `mpiexec -np 4 nrniv...` or some other call).

**Note 8.1**
Notice the use of `nohup`, which prevents blocking by the `{command}`.

In [None]:
#jupyter 8
print(zsh)

**Note 9**
Now let's create a custom submission class that can execute arbitrary scripts in a Google Colab environment. We know that the command that calls the Python script must use the full path to the interpreter in our `virtualenv`, and that the script must also be referenced by its full path from the `/content` root. Additionally, suppose we want some arbitrary `FOO`, `BAR` and `BAZ` values, defined by our `Dispatcher`, to be passed to the environment, and we want the script to print a `process ID` (`pid`) to `stdout`, which we then capture and return as a `job_id`.

Again, by inheriting from `Submit`, we preserve the original functionality and relevant interfaces, and then extend them with our own.

In [None]:
#jupyter 9
class GCSubmit(Submit):
  def __init__(self):
    # creates a Submit with the templates we define
    super().__init__(
        submit_template = Template("sh {cwd}/{label}.sh"),
        script_template = Template("""\
#!/bin/bash
export FOO={foovalue}
export BAR={barvalue}
export BAZ={bazvalue}
{env}
nohup /content/drive/MyDrive/venv/bin/python /content/drive/MyDrive/dev/runner.py > {cwd}/{label}.run &
pid=$!
echo $pid >&1
"""
        )
    )
  def submit_job(self):
    # using this submit_job, we can add some handling of stdout, job failure (i.e. if stdout does not return an integer value as expected),
    # extending the functionality of Submit with this exception handling.
    proc = super().submit_job()
    try:
      self.job_id = int(proc.stdout)
    except Exception as e:
      raise Exception("{}\nJob submission failed:\n{}\n{}\n{}\n{}".format(e, self.submit, self.script, proc.stdout, proc.stderr)) from e
    return self.job_id

gcs = GCSubmit()
print(gcs) # inherited functionality from the base Submit class
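
The `int(proc.stdout)` step in `submit_job` is where a failed submission surfaces, since a shell error message will not parse as an integer. The parsing can be isolated and exercised on its own; `parse_job_id` below is a hypothetical helper written for illustration, not a `pubtk` function, and it assumes the script prints nothing but the process ID.

```python
def parse_job_id(stdout):
    """Convert captured stdout (str or bytes) into a positive integer job ID."""
    text = stdout.decode() if isinstance(stdout, bytes) else stdout
    text = text.strip()
    # Accept plain ASCII digits only, so shell error text is rejected.
    if not (text.isascii() and text.isdigit()):
        raise ValueError(f"expected a process ID on stdout, got {text!r}")
    return int(text)

print(parse_job_id("1234\n"))
try:
    parse_job_id("sh: 1: example.sh: not found\n")
except ValueError as error:
    print("submission failed:", error)
```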

**Note 10** Now we can pass the custom submission to our **SH_Dispatcher**, which extends the base **Dispatcher** with support for `shell` scripts (`bash`, `powershell`, `z shell`, ...).

In [None]:
#jupyter 10
dispatcher = SH_Dispatcher(cwd='/content', submit=gcs, gid='example')
print(dispatcher.submit) # prints the dispatcher.submit

**Note 11** To pass arguments to the **Runner** script, we call `update_env` on the dispatcher. Its argument is a dictionary of `key:value` pairs. Additionally, we can set the arbitrary `FOO`, `BAR` and `BAZ` values through `dispatcher.submit.update_templates`.

In [None]:
#jupyter 11
dispatcher.update_env({'strvalue': '1',
                       'intvalue': 2,
                       'fltvalue': 3.0})
dispatcher.submit.update_templates(foovalue='"A"',
                                   barvalue='"B"',
                                   bazvalue='"C"')
print(dispatcher.submit)
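
Filling some placeholders now and leaving others (`{env}`, `{cwd}`, `{label}`) for later can be done in plain Python with `str.format_map` and a dictionary that returns unknown keys unchanged. This is a sketch of that general technique, not a description of how `pubtk` implements `update_templates`; `KeepMissing` and `partial_fill` are hypothetical names, and the approach assumes the filled-in values contain no braces of their own.

```python
class KeepMissing(dict):
    """Dictionary that leaves unknown placeholders in the template untouched."""

    def __missing__(self, key):
        return "{" + key + "}"

def partial_fill(template, **values):
    """Replace only the placeholders named in `values`."""
    return template.format_map(KeepMissing(values))

script = "export FOO={foovalue}\n{env}\nsh {cwd}/{label}.sh"

# First pass: only the user-defined value is filled.
step1 = partial_fill(script, foovalue='"A"')
# Second pass: the remaining placeholders are filled at job creation.
step2 = partial_fill(step1, env="export X=1", cwd="/content", label="example")

print(step1)
print(step2)
```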

**Note 12** Upon job creation, the `{env}`, `{cwd}` and `{label}` are filled.

The `{env}` placeholder will be replaced with a custom `serialization` (in this case, exported string values) that can then be deserialized by the **Runner** in the `runner.py` script.

In [None]:
#jupyter 12
dispatcher.create_job()
print(dispatcher.submit) # see the new submit
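
To make the idea of serializing arguments into exported strings concrete, here is one possible scheme. It is an assumption for illustration only: the `ARG_` prefix, the JSON encoding, and the `serialize`/`deserialize` helpers are all hypothetical, and the format `pubtk` actually writes into `{env}` is whatever the printed submit above shows.

```python
import json
import shlex

PREFIX = "ARG_"  # hypothetical marker for variables meant for the runner

def serialize(arguments):
    """Turn a dict into shell `export` lines, one JSON-encoded value per line."""
    return "\n".join(
        f"export {PREFIX}{key}={shlex.quote(json.dumps(value))}"
        for key, value in arguments.items()
    )

def deserialize(environ):
    """Recover the dict, with original types, from an environment mapping."""
    return {
        key[len(PREFIX):]: json.loads(value)
        for key, value in environ.items()
        if key.startswith(PREFIX)
    }

arguments = {"strvalue": "1", "intvalue": 2, "fltvalue": 3.0}
print(serialize(arguments))

# Simulated environment, as the runner script would see it after the exports.
environment = {"HOME": "/root", "ARG_strvalue": '"1"',
               "ARG_intvalue": "2", "ARG_fltvalue": "3.0"}
print(deserialize(environment))
```

Encoding each value as JSON is what lets the string `'1'` and the integer `2` keep their types across the environment, which only carries strings.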

**Note 13** Let's download and check a basic `runner.py` using the `Runner` class.

In [2]:
#jupyter 13
!rm -f /content/drive/MyDrive/dev/runner.py
!curl https://raw.githubusercontent.com/jchen6727/colab/master/j4/basic_runner.py > /content/drive/MyDrive/dev/runner.py
!cat /content/drive/MyDrive/dev/runner.py

  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100    14  100    14    0     0     50      0 --:--:-- --:--:-- --:--:--    51
404: Not Found

**Note 14** Notice that the `runner.py` script automatically captures the `arguments` passed in `{env}` in the `runner.mappings` attribute as a dictionary of `key:value` pairs. We'll have it print its process ID with `os.getpid()` and then print the `arguments` passed to it. Both will be visible in `/content/example.run` after submitting the job via the `dispatcher.submit_job()` function.

In [None]:
#jupyter 14
dispatcher.submit_job()
dispatcher.job_id # prints the job_id, should match the printed pid from the runner.py script
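
Since the returned `job_id` is an operating-system process ID, the standard library can be used to check whether the job is still running. The sketch below assumes a POSIX system such as the Colab runtime; `is_running` is a hypothetical helper, and signal `0` performs the existence check without delivering a signal to the process.

```python
import os

def is_running(pid):
    """Return True if a process with this ID currently exists (POSIX only)."""
    try:
        os.kill(pid, 0)          # signal 0: check only, nothing is delivered
    except ProcessLookupError:
        return False             # no such process
    except PermissionError:
        return True              # exists, but belongs to another user
    return True

# The current process is used here so the example runs anywhere;
# in the notebook the argument would be dispatcher.job_id.
print(is_running(os.getpid()))
```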