# Job

A `hyrun.job.Job` is a list of tasks (each represented by a `hyset.v2.RunSettings` instance) and, optionally, a list of output objects (each represented by a `hyrun.Result`), together with further information about the job:

- `hash`: hash identifying the job within the workflow and in the database
- `db_id`: index of the job in the database
- `job_id`: the submission ID that identifies the job on the cluster
- `status`: status of the job (a `str`, as defined by Slurm)
- `job_script`: `hytools.file.File` instance defining the commands that are sent to the scheduler for execution
- `metadata`: a dictionary containing further attributes that are not used by `hyrun`, e.g., wall-time
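The fields listed above can be pictured as a plain dataclass. The following is only a minimal sketch for orientation: `JobSketch` is a hypothetical stand-in, not the actual `hyrun.job.Job` class, and it uses simple built-in types where `hyrun` uses `RunSettings`, `Result` and `File` objects.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class JobSketch:
    """Hypothetical stand-in mirroring the documented fields (not hyrun's Job)."""

    tasks: list = field(default_factory=list)  # one entry per task
    outputs: list = field(default_factory=list)  # optional output objects
    hash: Optional[str] = None  # identifies the job in workflow and database
    db_id: Optional[int] = None  # index of the job in the database
    job_id: Optional[int] = None  # submission ID on the cluster
    status: Optional[str] = None  # scheduler status string
    job_script: Optional[str] = None  # a File instance in hyrun; a str here
    metadata: dict = field(default_factory=dict)  # free-form attributes


sketch = JobSketch(tasks=[{"print_level": "critical"}],
                   metadata={"name": "myjob"})
print(sketch.job_id, sketch.metadata)
```

As in the real class, everything except the tasks and the metadata starts out unset and is filled in later.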



In [1]:
from hyrun import Job

Jobs are initialized from a dict or from one or more `hyset.v2.RunSettings` instances, which represent the tasks of the job.
Most of the other parameters, such as `job_script` and `hash`, are constructed during `hyrun.init()`.
The `job_id` is available once the job has been submitted.
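How `hyrun` derives the hash is not described here. As an illustration of the general idea of a content-based identifier, the sketch below computes a SHA-256 digest over a canonical JSON dump. The function `content_hash` and the choice of hashed fields are assumptions made for this example, not the `hyrun` implementation.

```python
import hashlib
import json


def content_hash(payload):
    """Return a SHA-256 hex digest of a JSON-serializable payload."""
    # Sorting the keys makes the digest independent of dict ordering.
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


a = content_hash({"job_script": "print(1)",
                  "tasks": [{"print_level": "critical"}]})
b = content_hash({"tasks": [{"print_level": "critical"}],
                  "job_script": "print(1)"})
print(a == b, len(a))
```

Two jobs with the same content get the same digest, which is what allows a hash to identify a job across the workflow and the database.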

In [2]:
from hyset.v2 import RunSettings
job = Job(tasks=RunSettings(database={'database_name':'mydb'},
                            print_level='critical'),
          metadata={'name': 'myjob'})
print(job)

Job(job_id=None, db_id=None, job_script=None, status=None, hash=None, metadata={'name': 'myjob'}, tasks=[RunSettings(output_file=None, stdout_file=File(name='stdout.out', content=None, path=PosixPath('stdout.out'), host=None, folder=PosixPath('.')), stderr_file=File(name='stderr.out', content=None, path=PosixPath('stderr.out'), host=None, folder=PosixPath('.')), stdin_file=None, files_to_write=[], files_for_restarting=[], files_to_rename=[], files_to_parse=[], files_to_remove=[], files_to_zip=[], files_to_tar=[], files_to_send=[], files_not_to_transfer=[], data_files=[], tar_all_files=False, zip_all_files=False, use_relative_paths=None, overwrite_files=False, conda_env=None, conda_launcher=[], container_image=None, container_executable=None, container_mounts=None, container_type=None, container_launcher=[], container_args=[], work_dir_container=None, work_dir_local=PosixPath('/Users/tilmann/Documents/Documents/work/hylleraas/hyrun/docs/src/user_guide'), scratch_dir_local=PosixPath('/Us

## Multiple tasks

A job can have multiple tasks, i.e. calculations performed consecutively within the same job script:


In [3]:
rs0 = RunSettings(connection={'host': 'host0'}, print_level='error')
job = Job(tasks=[rs0 for _ in range(3)])

hyset_1 - ERROR : Memory per CPU not set in compute settings.


> **Note:** All tasks in a job must have identical settings for:
>
> 1. connection
> 2. scheduler
> 3. database

In [4]:
rs1 = RunSettings(connection={'host': 'host1'})
try:
    job = Job(tasks=[rs0, rs1])
except ValueError:
    print('tasks have different parameters')
else:
    print('all ok')


hyset_1 - ERROR : Memory per CPU not set in compute settings.
tasks have different parameters
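A check of this kind can be sketched with plain dictionaries. This is a hypothetical illustration of the rule, not the validation code used by `hyrun`: the function `check_shared_settings`, its `keys` parameter and the dict-based tasks are all assumptions.

```python
def check_shared_settings(tasks, keys=("connection", "scheduler", "database")):
    """Raise ValueError if any task differs from the first in one of `keys`."""
    if not tasks:
        return
    first = tasks[0]
    for index, task in enumerate(tasks[1:], start=1):
        for key in keys:
            if task.get(key) != first.get(key):
                raise ValueError(
                    f"task {index} differs from task 0 in '{key}'")


task0 = {"connection": {"host": "host0"}, "scheduler": None, "database": None}
task1 = {"connection": {"host": "host1"}, "scheduler": None, "database": None}

check_shared_settings([task0, task0, task0])  # identical settings: no error
try:
    check_shared_settings([task0, task1])
except ValueError as err:
    print(err)
```

Settings outside the three shared groups are not compared, so tasks remain free to differ in everything else.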


## Database integration

Jobs are the central objects in `hyrun`. The following example shows what typically happens to a job during a run:

In [5]:
from hyset.v2 import RunSettings
from hytools.file import File
from hytools.logger import get_logger
from hydb import get_database

def hyrun_internal(job):
    """Mimic the steps that hyrun performs internally during a run."""
    # generate job_script and hash
    job.job_script = File(name='job_script',
                          content='import sys\nprint(sys.executable)')
    job.set_hash()
    # add job to the database defined in the first task's run settings
    db = get_database(job.tasks[0].database.database_name)
    # db.db.truncate()
    db_id = db.insert_one(job, immutable={'hash': job.hash})
    if db_id:
        job.db_id = db_id
    # assert db.insert_one(job, immutable={'hash': job.hash}) is None
    # update job
    job.job_id = 137
    job.status = 'running'
    # update db entry
    db.update_one(entry=job, db_id=job.db_id)

    return db, job

job = Job(tasks=RunSettings(database={'database_name':'mydb'},
                            print_level='critical'),
          metadata={'name': 'myjob'})
    




The job is stored as a `dict` in the database and can be accessed via `job.db_id` or by searching for an attribute:

In [6]:
db, job_updated = hyrun_internal(job)

print(db[job_updated.db_id].get('status'))

entry = db.search_one(hash=job.hash)
print(isinstance(entry, dict))

print(entry.get('status'))





running
True
running
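The insert, update and search pattern used above can be imitated with a small in-memory store. The sketch below is not the `hydb` API: `TinyStore` is hypothetical, and it assumes that the `immutable` argument of `insert_one` means "do not insert if an entry with these values already exists, and return `None` instead".

```python
class TinyStore:
    """Hypothetical in-memory store; not the hydb API."""

    def __init__(self):
        self._entries = {}
        self._next_id = 1

    def insert_one(self, entry, immutable=None):
        """Insert `entry` unless an entry matches all `immutable` items."""
        immutable = immutable or {}
        for existing in self._entries.values():
            if immutable and all(existing.get(key) == value
                                 for key, value in immutable.items()):
                return None  # duplicate: leave the store unchanged
        db_id = self._next_id
        self._entries[db_id] = dict(entry)
        self._next_id += 1
        return db_id

    def update_one(self, entry, db_id):
        """Overwrite the entry stored under `db_id`."""
        self._entries[db_id] = dict(entry)

    def search_one(self, **query):
        """Return the first entry matching all query items, or None."""
        for existing in self._entries.values():
            if all(existing.get(key) == value
                   for key, value in query.items()):
                return existing
        return None

    def __getitem__(self, db_id):
        return self._entries[db_id]


store = TinyStore()
record = {"hash": "abc", "status": None}
first_id = store.insert_one(record, immutable={"hash": "abc"})
second_id = store.insert_one(record, immutable={"hash": "abc"})
record["status"] = "running"
store.update_one(record, db_id=first_id)
print(first_id, second_id, store.search_one(hash="abc")["status"])
```

Under this assumption, re-running the same job does not create a second database entry, while status changes are written to the existing one.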


Finally, the entire `hyrun.job.Job` object can be reconstructed from the database entry:

In [7]:
# resolve the objects in the db entry
job = db.search_one(hash=job.hash, resolve=True)
print('\n-- Job (updated) --\n', job)

print(type(job))

hyset_1 - ERROR : Memory per CPU not set in compute settings.

-- Job (updated) --
 Job(job_id=137, db_id=1, job_script=File(name='job_script', content='import sys\nprint(sys.executable)', path=None, host=None, folder=None), status='running', hash='bc86a255ba412df2ee9b4039ec716865d09dc76d1517e79ae814e85fab898279', metadata={'name': 'myjob'}, tasks=[RunSettings(output_file=None, stdout_file=File(name='stdout.out', content=None, path=PosixPath('stdout.out'), host=None, folder=PosixPath('.')), stderr_file=File(name='stderr.out', content=None, path=PosixPath('stderr.out'), host=None, folder=PosixPath('.')), stdin_file=None, files_to_write=[], files_for_restarting=[], files_to_rename=[], files_to_parse=[], files_to_remove=[], files_to_zip=[], files_to_tar=[], files_to_send=[], files_not_to_transfer=[], data_files=[], tar_all_files=False, zip_all_files=False, use_relative_paths=None, overwrite_files=False, conda_env=None, conda_launcher=[], container_image=None, container_executable=None, co