# Implement new pyiron jobs

Author: [jan-janssen](https://jan-janssen.com)

While the first two tutorials covered the use of existing simulation codes such as LAMMPS in pyiron, and how multiple calculations can be combined in a workflow using `GenericMaster` jobs like `Murnaghan` to calculate energy-volume curves, the implementation of new simulation codes and utilities is another strength of pyiron. To emphasise the application of pyiron beyond atomistic simulations, the pyiron project was split into two parts: `pyiron_atomistics` for the atomistic simulation codes and utilities, and `pyiron_base` for the general job management, which can be used to define your own job classes.

In the following, `pyiron_base` is used to demonstrate the implementation of two example jobs: the first calls an external executable and the second calls a Python function and represents it as a pyiron job object. For each of the two categories there are several slightly different ways to define pyiron job objects.

In analogy to the previous tutorials we enable the autocompletion for multiple levels using:

In [1]:
%config IPCompleter.evaluation='unsafe'

## External Executable
While an increasing number of simulation codes provide Python bindings so they can be integrated in a Python function call, there are still a number of external executables one might want to integrate in a given workflow. For the example here the executable is the shell command:
```
cat input > output
```
It copies the content of the input file to the output file. This command could be replaced with any other call to an external executable.
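
To see what the toy executable does outside of pyiron, the following sketch runs the same shell command in a temporary directory using only the Python standard library. The helper name `run_toy_executable` is made up for illustration, and a POSIX shell that provides `cat` is assumed. It is not meant to describe how pyiron invokes executables internally.

```python
import subprocess
import tempfile
from pathlib import Path


def run_toy_executable(content: str) -> str:
    """Hypothetical helper: write `input`, run the shell command, read `output`."""
    with tempfile.TemporaryDirectory() as working_directory:
        # Create the input file the shell command expects.
        Path(working_directory, "input").write_text(content)
        # Run the command inside the temporary working directory.
        subprocess.run(
            "cat input > output",
            shell=True,
            cwd=working_directory,
            check=True,
        )
        # The output file should now hold a copy of the input file.
        return Path(working_directory, "output").read_text()


copied = run_toy_executable("input_energy 100\n")
print(copied)
```

The same pattern of writing an input file, running a command in a working directory and reading an output file is what the job classes below organise.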

### GenericJob
The initial [pyiron paper](https://doi.org/10.1016/j.commatsci.2018.07.043) already introduced the implementation of custom pyiron job objects in Appendix C. Here the code is slightly modified to be compatible with recent versions of `pyiron_base`.

The `GenericJob` class as well as the `GenericParameters` class can both be imported from `pyiron_base`. In addition, the `os` module from the standard library is imported to define the path to the input and output files.

In [2]:
import os
from pyiron_base import GenericJob, GenericParameters

The job class `ToyJob` in this example is derived from the `GenericJob` class and defines four functions:
* `write_input()` writes the input file to the working directory, based on the input stored in the `ToyJob` class, before the external executable is called.
* `collect_output()` reads the output from the working directory and stores it in the HDF5 file of the job object.
* `to_hdf()` stores the current instance of the `ToyJob` class in the HDF5 file.
* `from_hdf()` reloads a previously stored instance of the `ToyJob` class from the HDF5 file.

In [3]:
class ToyJob(GenericJob):
    def __init__(self, project, job_name):
        super(ToyJob, self).__init__(project, job_name)
        self.input = ToyInput()
        self.executable = "cat input > output"

    def write_input(self):
        self.input.write_file(
            file_name="input",
            cwd=self.working_directory,
        )

    def collect_output(self):
        file = os.path.join(self.working_directory, "output")
        with open(file) as f:
            line = f.readlines()[0]
            energy = float(line.split()[1])
        with self.project_hdf5.open("output/generic") as h5out:
            h5out["energy_tot"] = energy

    def to_hdf(self, hdf=None, group_name=None):
        super(ToyJob, self).to_hdf(
            hdf=hdf,
            group_name=group_name
        )
        with self.project_hdf5.open("input") as h5in:
            self.input.to_hdf(h5in)

    def from_hdf(self, hdf=None, group_name=None):
        super(ToyJob, self).from_hdf(
            hdf=hdf,
            group_name=group_name,
        )
        with self.project_hdf5.open("input") as h5in:
            self.input.from_hdf(h5in)

In addition to the `ToyJob` class, we also define the `ToyInput` class derived from `GenericParameters`. This class is primarily used to define the default parameters of the `ToyJob` class and simplifies the writing of the input files in the `write_input()` function.

In [4]:
class ToyInput(GenericParameters):
    def __init__(self, input_file_name=None):
        super(ToyInput, self).__init__(
            input_file_name=input_file_name,
            table_name="input"
        )
    
    def load_default(self):
        self.load_string("input_energy 100")

Following the definition of the two classes `ToyJob` and `ToyInput` we can use the `ToyJob` class in analogy to all the other pyiron job classes. To demonstrate the functionality of the `ToyJob` class we do a simple test calculation. 

In [5]:
from pyiron_base import Project

In [6]:
pr = Project('test')

In [7]:
pr.remove_jobs(recursive=True, silently=True)

0it [00:00, ?it/s]

After the `Project` class is imported and a new project is created that does not contain any previous calculations, the `ToyJob` class can be passed to the `create_job()` function to create a corresponding pyiron job object. In addition, the `job_name` is provided to distinguish multiple jobs of the same type.

In [8]:
job = pr.create_job(ToyJob, job_name="toy")

By using the `GenericParameters` class as a basis for `ToyInput`, the input of the `ToyJob` is rendered as a pandas DataFrame and can be modified just like the LAMMPS input in the previous examples:

In [9]:
job.input

|   | Parameter | Value | Comment |
|---|-----------|-------|---------|
| 0 | input_energy | 100 | |


After defining the input, the job object is executed using the `run()` function and the output can be inspected using the `job.content` property, again in analogy to the LAMMPS job class in the previous tutorials.

In [10]:
job.run()

The job toy was saved and received the ID: 1


In [11]:
job.content['output/generic/energy_tot']

100.0
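
The value `100.0` follows from the parsing step in `collect_output()`: the first line of the output file is split on whitespace and the second token is converted to a float. The sketch below isolates that step. It assumes the output file contains the line `input_energy 100`, which is what the default input suggests, and the function name `parse_energy` is hypothetical.

```python
def parse_energy(lines):
    """Hypothetical helper mirroring the parsing logic in collect_output()."""
    first_line = lines[0]            # e.g. "input_energy 100\n"
    tokens = first_line.split()      # ["input_energy", "100"]
    return float(tokens[1])          # second token holds the value


# Assumed file content, based on the default input "input_energy 100".
energy = parse_energy(["input_energy 100\n"])
print(energy)
```

Keeping the parsing in a small function like this makes it possible to test it without running a job.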

### Template Job
One limitation of using the `GenericJob` class is the requirement for the user to define the `to_hdf()` and `from_hdf()` functions which handle the serialization and deserialization of the job class in the HDF5 files. As many jobs just use the same kind of standard input and output structures represented as a dictionary, the `TemplateJob` class defines more extensive defaults, so the user no longer has to define the `to_hdf()` and `from_hdf()` functions. Apart from this the `TemplateJob` class is based on the `GenericJob` class and behaves exactly the same. 

In [12]:
from pyiron_base import TemplateJob

In [13]:
class ToyTemplateJob(TemplateJob):
    def __init__(self, project, job_name):
        super(ToyTemplateJob, self).__init__(project, job_name)
        self.executable = "cat input > output"

    def write_input(self):
        with open(os.path.join(self.working_directory, "input"), "w") as f:
            # Use the input of this job instance rather than a global `job` variable.
            f.write(str(self.input.energy))

    def collect_output(self):
        with open(os.path.join(self.working_directory, "output"), "r") as f:
            self.output.energy = f.read()
        self.to_hdf()

It is important to add the `to_hdf()` function call at the end of the `collect_output()` function to store the output after it was parsed, otherwise the output is lost. The usage in a project follows in analogy to the `GenericJob` class example above. 

Starting by defining a `Project` and removing the jobs in this project:

In [14]:
pr = Project('testtemp')

In [15]:
pr.remove_jobs(recursive=True, silently=True)

0it [00:00, ?it/s]

The `ToyTemplateJob` class can again be used to create pyiron job objects using the `create_job()` function of the project object. The input is assigned to the `job.input` property and afterwards the job is executed by calling the `run()` function. 

In [16]:
job = pr.create_job(ToyTemplateJob, job_name="toytemp")

In [17]:
job.input.energy = 100

In [18]:
job.run()

The job toytemp was saved and received the ID: 2


After the successful execution the output is available in the `job.output` property just like it was set in the `collect_output()` function above.

In [19]:
job.output.energy

'100'
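
Note that the output is the string `'100'` rather than a number, because `f.read()` returns the file content as text. If a numeric value is preferred, the text could be converted before it is assigned to the output. The following is a minimal sketch of such a conversion; the function name `parse_output` is an assumption and not part of pyiron.

```python
def parse_output(text: str) -> float:
    """Hypothetical helper: convert the raw file content to a float."""
    # strip() removes surrounding whitespace such as a trailing newline.
    return float(text.strip())


energy = parse_output("100")
print(energy)
```

Inside `collect_output()` the result of such a helper could be assigned to `self.output.energy` in place of the raw string.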

### For testing
While defining job classes based on the `GenericJob` or `TemplateJob` class is usually preferable for frequently used job types, pyiron also provides a simplified interface to define pyiron jobs dynamically at run time. Again, a `write_input()` and a `collect_output()` function are required to interface the pyiron framework with the external executable.

The `write_input()` function must accept exactly the parameters `input_dict` and `working_directory`, and the `collect_output()` function must accept only `working_directory` and return a dictionary with the corresponding outputs. Any other function signature will not work.

In [20]:
def write_input(input_dict, working_directory):
    with open(os.path.join(working_directory, "input"), "w") as f:
        f.write(str(input_dict["energy"]))

In [21]:
def collect_output(working_directory):
    with open(os.path.join(working_directory, "output"), "r") as f:
        return {
            "energy": f.read()
        }
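
Because both functions depend only on a dictionary and a directory path, they can be tried out without pyiron before they are turned into a job class. The sketch below repeats the two functions and chains them around the shell command in a temporary directory. The helper `dry_run` is hypothetical and only mimics the sequence of writing the input, running the executable and collecting the output; a POSIX shell with `cat` is assumed.

```python
import os
import subprocess
import tempfile


def write_input(input_dict, working_directory):
    # Same signature as in the text: only input_dict and working_directory.
    with open(os.path.join(working_directory, "input"), "w") as f:
        f.write(str(input_dict["energy"]))


def collect_output(working_directory):
    # Same signature as in the text: only working_directory, returns a dict.
    with open(os.path.join(working_directory, "output"), "r") as f:
        return {"energy": f.read()}


def dry_run(input_dict, executable_str):
    # Hypothetical helper, not part of pyiron: write, execute, collect.
    with tempfile.TemporaryDirectory() as working_directory:
        write_input(input_dict, working_directory)
        subprocess.run(
            executable_str, shell=True, cwd=working_directory, check=True
        )
        return collect_output(working_directory)


result = dry_run({"energy": 100.0}, "cat input > output")
print(result)  # {'energy': '100.0'}
```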

In analogy to the two previous examples, a new project is defined and the existing jobs in this project are removed:

In [22]:
pr = Project('template')

In [23]:
pr.remove_jobs(recursive=True, silently=True)

0it [00:00, ?it/s]

The process of creating a pyiron job class from the `write_input()` function and the `collect_output()` function is hidden in the `create_job_class()` function. This function takes both the `write_input()` function and the `collect_output()` function as an input in addition to a `class_name` to access the pyiron job class later on, a dictionary with default parameters `default_input_dict` and the external executable as string `executable_str`. 

In [24]:
pr.create_job_class(
    class_name="ToyJob",
    write_input_funct=write_input,
    collect_output_funct=collect_output,
    default_input_dict={  # Default Parameter 
        "energy": 100.0, 
    },
    executable_str="cat input > output",
)

After the pyiron job class is created using the `create_job_class()` function, the corresponding pyiron job class instances can be created using the `pr.create.job.<job class>()` function, by just providing the job name as an input parameter, in analogy to the LAMMPS class in the previous tutorials.

In [25]:
job = pr.create.job.ToyJob(job_name="toy")

Again the input is assigned to the pyiron job class using the input property, afterwards the job is executed by calling the `run()` function and finally the output can be inspected using the output property. 

In [26]:
job.input

In [27]:
job.run()

The job toy was saved and received the ID: 3


In [28]:
job.output

## Python Function
An alternative to interfacing with external executables is integrating Python functions directly in pyiron by representing them as pyiron job objects. This is demonstrated here with a simple multiplication function. The multiplication function is just an example; this approach is more commonly useful for functions with a long run time, on the order of several minutes or even hours.

In [29]:
def calculation(value):
    return 2 * value

### Python Template Job
Following the same idea as the `TemplateJob` the `PythonTemplateJob` provides a predefined pyiron job type to simplify the implementation of new job classes. It can again be imported directly from the `pyiron_base` module. 

In [30]:
from pyiron_base import PythonTemplateJob

The `ToyPythonJob` class derived from `PythonTemplateJob` primarily needs two functions: the `__init__()` function to set the default input, and the `run_static()` function, which is executed when `run()` is called on the corresponding job object. `run_static()` calls the Python function, stores its output in the pyiron job object, sets the status of the pyiron job object to finished and finally serializes the pyiron job object in an HDF5 file.

In [31]:
class ToyPythonJob(PythonTemplateJob):
    def __init__(self, project, job_name):
        super().__init__(project, job_name) 
        self.input.value = None

    def run_static(self):
        self.output.result = calculation(self.input.value)
        self.status.finished = True
        self.to_hdf()

The user can afterwards interact with the `ToyPythonJob` class just like any other pyiron job class. Again a new project is created and then the pyiron job object is defined in this project, the input is set using the `job.input` property and afterwards the job is executed using the `run()` function.

In [32]:
pr = Project('template')

In [33]:
pr.remove_jobs(recursive=True, silently=True)

  0%|          | 0/1 [00:00<?, ?it/s]

In [34]:
job = pr.create_job(ToyPythonJob, job_name="toy")

In [35]:
job.input.value = 5

In [36]:
job.run()

The job toy was saved and received the ID: 3


In [37]:
job.output

In [38]:
job.output.result

10
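
Since `ToyPythonJob` initialises `self.input.value` to `None`, running the job without setting the input would pass `None` to `calculation()`, and `2 * None` raises a `TypeError`. One possible way to fail earlier with a clearer message is a small guard around the function call. The sketch below is an illustration only; the name `checked_calculation` and the error message are assumptions.

```python
def calculation(value):
    return 2 * value


def checked_calculation(value):
    # Hypothetical guard: fail early with a clear message if the input is unset.
    if value is None:
        raise ValueError("input value must be set before the job is run")
    return calculation(value)


result = checked_calculation(5)
print(result)

try:
    checked_calculation(None)
except ValueError as error:
    # Keep the message so it can be inspected or logged.
    message = str(error)
    print(message)
```

A guard like this could be called from `run_static()` in place of calling `calculation()` directly.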

### Function Wrapper
For testing Python functions which might only be used in a given notebook, pyiron also provides the `wrap_python_function()` function, in analogy to the `create_job_class()` function introduced above. The Python function is simply passed to `wrap_python_function()`, which then returns a job object the user can interact with.

In [39]:
def calculation(value):
    return 2 * value

In [40]:
pr = Project('wrapper')

In [41]:
pr.remove_jobs(recursive=True, silently=True)

0it [00:00, ?it/s]

In [42]:
job_calc = pr.wrap_python_function(calculation)

In [43]:
job_calc.input.value = 100

In [44]:
job_calc.run()

The job calculation_dddbae7c4e7af8da1929d3cde1cc8929 was saved and received the ID: 4


In [45]:
job_calc.output

### Job decorator
Finally, the process of calling `wrap_python_function()` can be simplified further with the `@job` decorator. This decorator can be applied to any Python function and creates the corresponding job class.

In [46]:
from pyiron_base import job

In [47]:
@job
def calculation(value):
    return 2 * value

In [48]:
pr = Project('decorator')

In [49]:
pr.remove_jobs(recursive=True, silently=True)

0it [00:00, ?it/s]

In [50]:
result = calculation(value=5, pyiron_project=pr)

In [51]:
result.pull()

The job calculation_62803b3985079eaedcea77395de6cecb was saved and received the ID: 5


10
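
In the cells above, the log line about the job being saved appears only after `pull()` is called, which suggests that the decorated function returns a placeholder and the calculation runs on demand. The general pattern can be sketched in plain Python as follows. This is not pyiron's implementation: the names `DelayedCall` and `delayed_job` are hypothetical, and the sketch does no job management or storage.

```python
import functools


class DelayedCall:
    """Hypothetical placeholder that runs the function only when pulled."""

    def __init__(self, func, kwargs):
        self._func = func
        self._kwargs = kwargs
        self._result = None
        self._done = False

    def pull(self):
        # Execute the function on the first call and cache the result.
        if not self._done:
            self._result = self._func(**self._kwargs)
            self._done = True
        return self._result


def delayed_job(func):
    """Hypothetical decorator: calling the function returns a DelayedCall."""

    @functools.wraps(func)
    def wrapper(**kwargs):
        return DelayedCall(func, kwargs)

    return wrapper


@delayed_job
def calculation(value):
    return 2 * value


result = calculation(value=5)  # nothing is computed yet
value = result.pull()          # the function runs here
print(value)
```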

## Summary 
In summary, while pyiron was initially developed for atomistic simulations, `pyiron_base` now provides a number of ways to integrate additional executables and Python functions as pyiron job objects.