## Create Example Jobs

To begin with, create a variety of AMS jobs with different settings, engines and calculation types.

In [1]:
from scm.plams import from_smiles, AMSJob, PlamsError, Settings, Molecule, Atom
from scm.libbase import UnifiedChemicalSystem as ChemicalSystem
from scm.input_classes.drivers import AMS
from scm.input_classes.engines import DFTB
from scm.utils.conversions import plams_molecule_to_chemsys


def example_job_dftb(smiles, task, use_chemsys=False):
    # Generate molecule from smiles
    mol = from_smiles(smiles)
    if use_chemsys:
        mol = plams_molecule_to_chemsys(mol)

    # Set up calculation settings using PISA
    sett = Settings()
    sett.runscript.nproc = 1
    driver = AMS()
    driver.Task = task
    driver.Engine = DFTB()
    sett.input = driver
    return AMSJob(molecule=mol, settings=sett)


def example_job_adf(smiles, task, basis, gga=None, use_chemsys=False):
    # Generate molecule from smiles
    mol = from_smiles(smiles)
    if use_chemsys:
        mol = plams_molecule_to_chemsys(mol)

    # Set up calculation settings using standard settings
    sett = Settings()
    sett.runscript.nproc = 1
    sett.input.AMS.Task = task
    sett.input.ADF.Basis.Type = basis
    if gga:
        sett.input.ADF.XC.GGA = gga
    return AMSJob(molecule=mol, settings=sett)


def example_job_neb(iterations, use_chemsys=False):
    # Set up molecules
    main_molecule = Molecule()
    main_molecule.add_atom(Atom(symbol="C", coords=(0, 0, 0)))
    main_molecule.add_atom(Atom(symbol="N", coords=(1.18, 0, 0)))
    main_molecule.add_atom(Atom(symbol="H", coords=(2.196, 0, 0)))
    final_molecule = main_molecule.copy()
    final_molecule.atoms[1].x = 1.163
    final_molecule.atoms[2].x = -1.078

    mol = {"": main_molecule, "final": final_molecule}

    if use_chemsys:
        mol = {k: plams_molecule_to_chemsys(v) for k, v in mol.items()}

    # Set up calculation settings
    sett = Settings()
    sett.runscript.nproc = 1
    sett.input.ams.Task = "NEB"
    sett.input.ams.NEB.Images = 9
    sett.input.ams.NEB.Iterations = iterations
    sett.input.DFTB  # accessing the key creates an empty DFTB engine block

    return AMSJob(molecule=mol, settings=sett)

Now, run a selection of them.

In [2]:
from scm.plams import config, JobRunner

config.default_jobrunner = JobRunner(parallel=True, maxthreads=8)

smiles = ["CC", "C", "O", "CO"]
tasks = ["SinglePoint", "GeometryOptimization"]
jobs = []
for i, s in enumerate(smiles):
    for t in tasks:
        # Alternate between PLAMS molecules and chemical systems for the DFTB jobs
        job_dftb = example_job_dftb(s, t, use_chemsys=bool(i % 2))
        job_adf1 = example_job_adf(s, t, "DZ", use_chemsys=True)
        job_adf2 = example_job_adf(s, t, "TZP", "PBE")
        jobs += [job_dftb, job_adf1, job_adf2]

job_neb1 = example_job_neb(10)
job_neb2 = example_job_neb(100, use_chemsys=True)
jobs += [job_neb1, job_neb2]

for j in jobs:
    j.run()

[07.02|13:52:49] JOB plamsjob STARTED
[07.02|13:52:49] JOB plamsjob STARTED
[07.02|13:52:49] JOB plamsjob STARTED
[07.02|13:52:49] JOB plamsjob STARTED
[07.02|13:52:49] JOB plamsjob STARTED
[07.02|13:52:49] Renaming job plamsjob to plamsjob.002
[07.02|13:52:49] JOB plamsjob RUNNING
[07.02|13:52:49] JOB plamsjob STARTED
[07.02|13:52:49] Renaming job plamsjob to plamsjob.003
[07.02|13:52:49] JOB plamsjob STARTED
[07.02|13:52:49] JOB plamsjob STARTED
[07.02|13:52:49] Renaming job plamsjob to plamsjob.004
[07.02|13:52:49] JOB plamsjob STARTED
[07.02|13:52:49] Renaming job plamsjob to plamsjob.005
[07.02|13:52:49] Renaming job plamsjob to plamsjob.006
[07.02|13:52:49] Renaming job plamsjob to plamsjob.007
[07.02|13:52:49] Renaming job plamsjob to plamsjob.008
[07.02|13:52:49] JOB plamsjob.002 RUNNING
[07.02|13:52:49] JOB plamsjob.004 RUNNING
[07.02|13:52:49] JOB plamsjob.003 RUNNING
[07.02|13:52:49] JOB plamsjob.007 RUNNING
[07.02|13:52:49] JOB plamsjob.005 RUNNING
[07.02|13:52:49] JOB plam

## Job Analysis

### Adding and Loading Jobs

Jobs can be added by passing job objects directly, or loaded from their paths on disk.

In [3]:
from scm.plams import JobAnalysis

In [11]:
ja = JobAnalysis(jobs=jobs[:10], paths=[j.path for j in jobs[10:-2]])

Jobs can also be added or removed after initialization.

In [12]:
ja.add_job(jobs[-2]).load_job(jobs[-1].path)

| Path | Name | OK | Check | ErrorMsg |
|------|------|----|-------|----------|
| /Users/ormrodmorley/Documents/code/ams/amshome/scripting/scm/plams/examples/JobAnalysis/plams_workdir/plamsjob | plamsjob | True | True | |
| /Users/ormrodmorley/Documents/code/ams/amshome/scripting/scm/plams/examples/JobAnalysis/plams_workdir/plamsjob.002 | plamsjob.002 | True | True | |
| /Users/ormrodmorley/Documents/code/ams/amshome/scripting/scm/plams/examples/JobAnalysis/plams_workdir/plamsjob.003 | plamsjob.003 | True | True | |
| /Users/ormrodmorley/Documents/code/ams/amshome/scripting/scm/plams/examples/JobAnalysis/plams_workdir/plamsjob.004 | plamsjob.004 | True | True | |
| /Users/ormrodmorley/Documents/code/ams/amshome/scripting/scm/plams/examples/JobAnalysis/plams_workdir/plamsjob.005 | plamsjob.005 | True | True | |
| /Users/ormrodmorley/Documents/code/ams/amshome/scripting/scm/plams/examples/JobAnalysis/plams_workdir/plamsjob.006 | plamsjob.006 | True | True | |
| /Users/ormrodmorley/Documents/code/ams/amshome/scripting/scm/plams/examples/JobAnalysis/plams_workdir/plamsjob.007 | plamsjob.007 | True | True | |
| /Users/ormrodmorley/Documents/code/ams/amshome/scripting/scm/plams/examples/JobAnalysis/plams_workdir/plamsjob.008 | plamsjob.008 | True | True | |
| /Users/ormrodmorley/Documents/code/ams/amshome/scripting/scm/plams/examples/JobAnalysis/plams_workdir/plamsjob.009 | plamsjob.009 | True | True | |
| /Users/ormrodmorley/Documents/code/ams/amshome/scripting/scm/plams/examples/JobAnalysis/plams_workdir/plamsjob.010 | plamsjob.010 | True | True | |


### Adding and Removing Fields

A range of common standard fields can be added or removed with dedicated methods.
Custom fields can also be added or removed, by defining a field key, value accessor and optional arguments like display name and value formatting.

In [13]:
(ja
 .remove_path_field()
 .add_formula_field()
 .add_smiles_field()
 .add_cpu_time_field()
 .add_sys_time_field()
 .add_settings_input_fields()
 .add_field("Energy", lambda j: j.results.get_energy(unit="kJ/mol"), display_name="Energy [kJ/mol]", fmt=".2f")
 .display_table(max_rows=5)
)

| Name         | OK    | Check | ErrorMsg                          | Formula           | Smiles            | CPUTime   | SysTime  | InputAmsTask         | InputAdfBasisType | InputAdfXcGga | InputAmsNebImages | InputAmsNebIterations | Energy [kJ/mol] |
|--------------|-------|-------|-----------------------------------|-------------------|-------------------|-----------|----------|----------------------|-------------------|---------------|-------------------|-----------------------|-----------------|
| plamsjob     | True  | True  | None                              | C2H6              | CC                | 0.195655  | 0.073372 | SinglePoint          | None              | None          | None              | None                  | -19594.01       |
| plamsjob.002 | True  | True  | None                              | C2H6              | CC                | 4.124700  | 0.405045 | SinglePoint          | DZ                | None          | None              | None                  | -3973.29        |
| ...          | ...   | ...   | ...                               | ...               | ...               | ...       | ...      | ...                  | ...               | ...           | ...               | ...                   | ...             |
| plamsjob.024 | True  | True  | None                              | CH4O              | CO                | 20.020822 | 1.745231 | GeometryOptimization | TZP               | PBE           | None              | None                  | -2900.38        |
| plamsjob.025 | False | False | NEB optimization did NOT converge | : CHN, final: CHN | : C=N, final: C#N | 0.518947  | 0.076174 | NEB                  | None              | None          | 9                 | 10                    | None            |
| plamsjob.026 | True  | True  | None                              | : CHN, final: CHN | : C=N, final: C#N | 1.145788  | 0.219883 | NEB                  | None              | None          | 9                 | 100                   | -14936.54       |
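
The pieces of a custom field (a key, a value accessor, and an optional display name and format) can be illustrated without AMS. The following is a minimal plain-Python sketch, not the `JobAnalysis` implementation: `Field`, `render` and the toy job dictionaries are hypothetical names introduced only for illustration. Mapping a failing accessor to `None` is an assumption made here, chosen to resemble the `None` energy shown for the non-converged NEB job.

```python
from dataclasses import dataclass
from typing import Any, Callable, Optional


@dataclass
class Field:
    """Hypothetical stand-in for one analysis column."""

    key: str
    accessor: Callable[[dict], Any]
    display_name: Optional[str] = None
    fmt: Optional[str] = None

    def render(self, job: dict) -> str:
        # Evaluate the accessor; in this sketch any failure yields None
        try:
            value = self.accessor(job)
        except Exception:
            value = None
        # Only apply the format spec when there is a value to format
        if value is None or self.fmt is None:
            return str(value)
        return format(value, self.fmt)


# Toy "jobs": plain dictionaries instead of AMSJob objects
toy_jobs = [{"name": "a", "energy": -19594.0123}, {"name": "b"}]

energy = Field("Energy", lambda j: j["energy"], display_name="Energy [kJ/mol]", fmt=".2f")
rendered = [energy.render(j) for j in toy_jobs]
print(energy.display_name, rendered)
```

Keeping the key separate from the display name means the raw data can be addressed by a stable identifier while the table header stays human readable.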

In addition to the fluent syntax, both dictionary and dot syntaxes are also supported for adding and removing fields.

In [14]:
import numpy as np

ja["AtomType"] = lambda j: [at.symbol for at in j.results.get_main_molecule()]
ja.Charge = lambda j: j.results.get_charges()
ja.AtomCoords = lambda j: [np.array(at.coords) for at in j.results.get_main_molecule()]

del ja["Check"]
del ja.SysTime

ja.display_table(max_rows=5, max_col_width=30)

| Name         | OK    | ErrorMsg                          | Formula           | Smiles            | CPUTime   | InputAmsTask         | InputAdfBasisType | InputAdfXcGga | InputAmsNebImages | InputAmsNebIterations | Energy [kJ/mol] | AtomType                          | Charge                            | AtomCoords                        |
|--------------|-------|-----------------------------------|-------------------|-------------------|-----------|----------------------|-------------------|---------------|-------------------|-----------------------|-----------------|-----------------------------------|-----------------------------------|-----------------------------------|
| plamsjob     | True  | None                              | C2H6              | CC                | 0.195655  | SinglePoint          | None              | None          | None              | None                  | -19594.01       | ['C', 'C', 'H', 'H', 'H', 'H',... | [-0.07293185 -0.07372966  0.02... | [array([-0.74763668,  0.041837... |
| plamsjob.002 | True  | None                              | C2H6              | CC                | 4.124700  | SinglePoint          | DZ                | None          | None              | None                  | -3973.29        | ['C', 'C', 'H', 'H', 'H', 'H',... | [-0.83243445 -0.83187828  0.27... | [array([-0.74763668,  0.041837... |
| ...          | ...   | ...                               | ...               | ...               | ...       | ...                  | ...               | ...           | ...               | ...                   | ...             | ...                               | ...                               | ...                               |
| plamsjob.024 | True  | None                              | CH4O              | CO                | 20.020822 | GeometryOptimization | TZP               | PBE           | None              | None                  | -2900.38        | ['C', 'O', 'H', 'H', 'H', 'H']    | [ 0.58673094 -0.60299606 -0.10... | [array([-0.36298962, -0.021487... |
| plamsjob.025 | False | NEB optimization did NOT conve... | : CHN, final: CHN | : C=N, final: C#N | 0.518947  | NEB                  | None              | None          | 9                 | 10                    | None            | ['C', 'N', 'H']                   | None                              | [array([0.45314036, 0.20055487... |
| plamsjob.026 | True  | None                              | : CHN, final: CHN | : C=N, final: C#N | 1.145788  | NEB                  | None              | None          | 9                 | 100                   | -14936.54       | ['C', 'N', 'H']                   | [-0.00713901 -0.21113018  0.21... | [array([0.56299762, 0.20528523... |

### Processing Data

Once an initial analysis has been created, the data can be further processed, depending on the use case.
For example, to inspect the difference between failed and successful jobs, jobs can be filtered down and irrelevant fields removed.

In [15]:
ja_neb = (ja
          .copy()
          .filter_jobs(lambda data: data["InputAmsTask"] == "NEB")
          .remove_field("AtomCoords")
          .remove_uniform_fields(ignore_empty=True))

ja_neb.display_table()

| Name         | OK    | CPUTime  | InputAmsNebIterations |
|--------------|-------|----------|-----------------------|
| plamsjob.025 | False | 0.518947 | 10                    |
| plamsjob.026 | True  | 1.145788 | 100                   |
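
To make the effect of removing uniform fields more concrete, here is a small sketch on a dict-of-lists table. It is an illustration under assumptions, not the PLAMS code: `drop_uniform_fields` and `is_empty` are hypothetical helpers, and the reading of `ignore_empty` (skip `None` and empty containers when comparing values) is inferred from the output above, where the energy column disappears even though one job has no energy.

```python
def is_empty(value):
    """Treat None and empty containers as empty."""
    return value is None or (hasattr(value, "__len__") and len(value) == 0)


def drop_uniform_fields(table, ignore_empty=False):
    """Return a copy of a dict-of-lists table without columns whose values are all equal."""
    kept = {}
    for key, values in table.items():
        # Optionally leave empty values out of the comparison
        considered = [v for v in values if not (ignore_empty and is_empty(v))]
        is_uniform = all(v == considered[0] for v in considered)
        if not is_uniform:
            kept[key] = values
    return kept


# Values taken from the two NEB rows shown earlier
neb = {
    "Name": ["plamsjob.025", "plamsjob.026"],
    "OK": [False, True],
    "InputAmsTask": ["NEB", "NEB"],
    "InputAmsNebImages": [9, 9],
    "InputAmsNebIterations": [10, 100],
    "Energy": [None, -14936.54],
}

print(list(drop_uniform_fields(neb, ignore_empty=True)))
print(list(drop_uniform_fields(neb, ignore_empty=False)))
```

With `ignore_empty=True` the energy column counts as uniform because only one non-empty value remains. Without it, the `None` makes the column non-uniform and it is kept.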

Another use case may be to analyze the results from one or more jobs.
For this, it can be useful to utilize the `expand` functionality to convert job(s) to multiple rows.
During this process, fields selected for expansion will have their values extracted into individual rows, whilst other fields have their values duplicated.

In [27]:
ja_adf = (ja
          .copy()
          .filter_jobs(lambda data: data["InputAmsTask"] == "GeometryOptimization" and data["InputAdfBasisType"] is not None and data["Smiles"] == "O")
          .expand_field("AtomType")
          .expand_field("Charge")
          .expand_field("AtomCoords")
          .remove_uniform_fields())

ja_adf.display_table()

| Name         | CPUTime  | InputAdfBasisType | InputAdfXcGga | Energy [kJ/mol] | AtomType | Charge              | AtomCoords                                        |
|--------------|----------|-------------------|---------------|-----------------|----------|---------------------|---------------------------------------------------|
| plamsjob.017 | 2.957275 | DZ                | None          | -1316.30        | O        | -0.8416865250737331 | [-2.17062120e-04  3.82347777e-01  0.00000000e+00] |
| plamsjob.017 | 2.957275 | DZ                | None          | -1316.30        | H        | 0.42084716070260286 | [-0.81250923 -0.19167629  0.        ]             |
| plamsjob.017 | 2.957275 | DZ                | None          | -1316.30        | H        | 0.4208393643711281  | [ 0.8127263  -0.19067148  0.        ]             |
| plamsjob.018 | 4.603972 | TZP               | PBE           | -1363.77        | O        | -0.6739805275850443 | [-2.46726007e-04  4.01580956e-01  0.00000000e+00] |
| plamsjob.018 | 4.603972 | TZP               | PBE           | -1363.77        | H        | 0.33698188085180536 | [-0.76455997 -0.2012764   0.        ]             |
| plamsjob.018 | 4.603972 | TZP               | PBE           | -1363.77        | H        | 0.33699864673323343 | [ 0.76480669 -0.20030455  0.        ]             |
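
The row duplication performed during expansion can be sketched in plain Python. This is a simplified illustration rather than the `JobAnalysis` implementation: `expand_fields` is a hypothetical helper, the charges are rounded from the table above, and raising an error on mismatched lengths is a choice made for this sketch only.

```python
def expand_fields(rows, keys):
    """Turn each row into one row per element of the list-valued fields in `keys`."""
    expanded = []
    for row in rows:
        lengths = {len(row[k]) for k in keys}
        if len(lengths) != 1:
            raise ValueError("expanded fields must have the same length within a row")
        for i in range(lengths.pop()):
            new_row = dict(row)  # values of non-expanded fields are duplicated
            for k in keys:
                new_row[k] = row[k][i]  # expanded fields contribute one element per row
            expanded.append(new_row)
    return expanded


rows = [
    {"Name": "plamsjob.017", "Basis": "DZ", "AtomType": ["O", "H", "H"], "Charge": [-0.8417, 0.4208, 0.4208]},
    {"Name": "plamsjob.018", "Basis": "TZP", "AtomType": ["O", "H", "H"], "Charge": [-0.6740, 0.3370, 0.3370]},
]

flat = expand_fields(rows, ["AtomType", "Charge"])
for r in flat:
    print(r)
```

Each job contributes three rows here, one per atom, while `Name` and `Basis` are repeated on every row.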

For more nested values, the depth of expansion can also be selected to further flatten the data.

In [29]:
(ja_adf
 .add_field("Coord", lambda j: [("x", "y", "z") for _ in j.results.get_main_molecule()], expansion_depth=2)
 .expand_field("AtomCoords", depth=2)
 .display_table())

| Name         | CPUTime  | InputAdfBasisType | InputAdfXcGga | Energy [kJ/mol] | AtomType | Charge              | AtomCoords              | Coord |
|--------------|----------|-------------------|---------------|-----------------|----------|---------------------|-------------------------|-------|
| plamsjob.017 | 2.957275 | DZ                | None          | -1316.30        | O        | -0.8416865250737331 | -0.00021706211955194217 | x     |
| plamsjob.017 | 2.957275 | DZ                | None          | -1316.30        | O        | -0.8416865250737331 | 0.38234777653349844     | y     |
| plamsjob.017 | 2.957275 | DZ                | None          | -1316.30        | O        | -0.8416865250737331 | 0.0                     | z     |
| plamsjob.017 | 2.957275 | DZ                | None          | -1316.30        | H        | 0.42084716070260286 | -0.8125092343354401     | x     |
| plamsjob.017 | 2.957275 | DZ                | None          | -1316.30        | H        | 0.42084716070260286 | -0.19167629390344054    | y     |
| plamsjob.017 | 2.957275 | DZ                | None          | -1316.30        | H        | 0.42084716070260286 | 0.0                     | z     |
| plamsjob.017 | 2.957275 | DZ                | None          | -1316.30        | H        | 0.4208393643711281  | 0.8127262964549918      | x     |
| plamsjob.017 | 2.957275 | DZ                | None          | -1316.30        | H        | 0.4208393643711281  | -0.19067148263005784    | y     |
| plamsjob.017 | 2.957275 | DZ                | None          | -1316.30        | H        | 0.4208393643711281  | 0.0                     | z     |
| plamsjob.018 | 4.603972 | TZP               | PBE           | -1363.77        | O        | -0.6739805275850443 | -0.00024672600727009935 | x     |
| plamsjob.018 | 4.603972 | TZP               | PBE           | -1363.77        | O        | -0.6739805275850443 | 0.40158095623473306     | y     |
| plamsjob.018 | 4.603972 | TZP               | PBE           | -1363.77        | O        | -0.6739805275850443 | 0.0                     | z     |
| plamsjob.018 | 4.603972 | TZP               | PBE           | -1363.77        | H        | 0.33698188085180536 | -0.7645599672263915     | x     |
| plamsjob.018 | 4.603972 | TZP               | PBE           | -1363.77        | H        | 0.33698188085180536 | -0.2012764045590436     | y     |
| plamsjob.018 | 4.603972 | TZP               | PBE           | -1363.77        | H        | 0.33698188085180536 | 0.0                     | z     |
| plamsjob.018 | 4.603972 | TZP               | PBE           | -1363.77        | H        | 0.33699864673323343 | 0.7648066932336616      | x     |
| plamsjob.018 | 4.603972 | TZP               | PBE           | -1363.77        | H        | 0.33699864673323343 | -0.20030455167568945    | y     |
| plamsjob.018 | 4.603972 | TZP               | PBE           | -1363.77        | H        | 0.33699864673323343 | 0.0                     | z     |

Expansion can be undone with the corresponding `collapse` method. 

Fields can also be further filtered, modified or reordered to customize the analysis.

In [42]:
(ja_adf
 .collapse_field("AtomCoords")
 .collapse_field("Coord")
 .filter_fields(lambda vals: all([not isinstance(v, list) for v in vals]))  # remove arrays
 .remove_name_field()
 .format_field("CPUTime", ".2f")
 .format_field("Charge", ".4f")
 .rename_field("InputAdfBasisType", "Basis")
 .reorder_fields(["AtomType", "Charge", "Energy"])
 .display_table())

| AtomType | Charge  | Energy [kJ/mol] | CPUTime | Basis | InputAdfXcGga |
|----------|---------|-----------------|---------|-------|---------------|
| O        | -0.8417 | -1316.30        | 2.96    | DZ    | None          |
| H        | 0.4208  | -1316.30        | 2.96    | DZ    | None          |
| H        | 0.4208  | -1316.30        | 2.96    | DZ    | None          |
| O        | -0.6740 | -1363.77        | 4.60    | TZP   | PBE           |
| H        | 0.3370  | -1363.77        | 4.60    | TZP   | PBE           |
| H        | 0.3370  | -1363.77        | 4.60    | TZP   | PBE           |
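
As a rough picture of what collapsing involves, the sketch below merges consecutive rows that agree on every field except the collapsed one. It is one possible approach under stated assumptions and is not taken from PLAMS: `collapse_field` here is a hypothetical free function, and it would also merge genuinely distinct neighbouring rows that happen to share all other values.

```python
def collapse_field(rows, key):
    """Merge consecutive rows that differ only in `key`, gathering its values into a list."""
    collapsed = []  # list of (other_fields, gathered_values) pairs
    for row in rows:
        rest = {k: v for k, v in row.items() if k != key}
        if collapsed and collapsed[-1][0] == rest:
            # Same remaining fields as the previous group: extend that group
            collapsed[-1][1].append(row[key])
        else:
            collapsed.append((rest, [row[key]]))
    return [{**rest, key: values} for rest, values in collapsed]


expanded_rows = [
    {"AtomType": "O", "Coord": "x"},
    {"AtomType": "O", "Coord": "y"},
    {"AtomType": "O", "Coord": "z"},
    {"AtomType": "H", "Coord": "x"},
    {"AtomType": "H", "Coord": "y"},
    {"AtomType": "H", "Coord": "z"},
]

merged = collapse_field(expanded_rows, "Coord")
print(merged)
```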

### Extracting Analysis Data

Analysis data can be extracted in a variety of ways.

As demonstrated above, a visual representation of the table can be generated using the `to_table` method (or `display_table` in a notebook).
The format can be markdown, html or rst. The resulting table uses the specified display names and formatting.

In [45]:
print(ja_adf.to_table(fmt="rst"))

+----------+---------+-----------------+---------+-------+---------------+
| AtomType | Charge  | Energy [kJ/mol] | CPUTime | Basis | InputAdfXcGga |
| O        | -0.8417 | -1316.30        | 2.96    | DZ    | None          |
+----------+---------+-----------------+---------+-------+---------------+
| H        | 0.4208  | -1316.30        | 2.96    | DZ    | None          |
+----------+---------+-----------------+---------+-------+---------------+
| H        | 0.4208  | -1316.30        | 2.96    | DZ    | None          |
+----------+---------+-----------------+---------+-------+---------------+
| O        | -0.6740 | -1363.77        | 4.60    | TZP   | PBE           |
+----------+---------+-----------------+---------+-------+---------------+
| H        | 0.3370  | -1363.77        | 4.60    | TZP   | PBE           |
+----------+---------+-----------------+---------+-------+---------------+
| H        | 0.3370  | -1363.77        | 4.60    | TZP   | PBE           |
+----------+---------+-----------------+---------+-------+---------------+

Alternatively, raw data can be retrieved via the `get_analysis` method, which returns a dictionary of analysis keys to values.

In [48]:
print(ja_adf.get_analysis())

{'AtomType': ['O', 'H', 'H', 'O', 'H', 'H'], 'Charge': [-0.8416865250737331, 0.42084716070260286, 0.4208393643711281, -0.6739805275850443, 0.33698188085180536, 0.33699864673323343], 'Energy': [-1316.2997406426532, -1316.2997406426532, -1316.2997406426532, -1363.766294275197, -1363.766294275197, -1363.766294275197], 'CPUTime': [2.957275, 2.957275, 2.957275, 4.603972, 4.603972, 4.603972], 'InputAdfBasisType': ['DZ', 'DZ', 'DZ', 'TZP', 'TZP', 'TZP'], 'InputAdfXcGga': [None, None, None, 'PBE', 'PBE', 'PBE']}
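
Because the result is a plain dictionary of equal-length lists, it can be reshaped with the standard library alone. The sketch below uses a subset of the values printed above, hard-coded so that it runs without AMS, and transposes the columns into one dictionary per row. The variable names are illustrative.

```python
# Subset of the dictionary printed above
analysis = {
    "AtomType": ["O", "H", "H", "O", "H", "H"],
    "InputAdfBasisType": ["DZ", "DZ", "DZ", "TZP", "TZP", "TZP"],
    "Charge": [
        -0.8416865250737331,
        0.42084716070260286,
        0.4208393643711281,
        -0.6739805275850443,
        0.33698188085180536,
        0.33699864673323343,
    ],
}

# Transpose columns into rows: one dictionary per table row
rows = [dict(zip(analysis, values)) for values in zip(*analysis.values())]

# Example query: the oxygen charge for each basis set
oxygen_charge = {r["InputAdfBasisType"]: r["Charge"] for r in rows if r["AtomType"] == "O"}
print(oxygen_charge)
```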


Data can also be easily written to a csv file using `to_csv_file`, to be exported to another program.

In [52]:
csv_name = "./tmp.csv"
ja_adf.to_csv_file(csv_name)

with open(csv_name) as csv:
    print(csv.read())

AtomType,Charge,Energy,CPUTime,InputAdfBasisType,InputAdfXcGga
O,-0.8416865250737331,-1316.2997406426532,2.957275,DZ,
H,0.42084716070260286,-1316.2997406426532,2.957275,DZ,
H,0.4208393643711281,-1316.2997406426532,2.957275,DZ,
O,-0.6739805275850443,-1363.766294275197,4.603972,TZP,PBE
H,0.33698188085180536,-1363.766294275197,4.603972,TZP,PBE
H,0.33699864673323343,-1363.766294275197,4.603972,TZP,PBE
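
As a sketch of how another program might consume this file, the example below parses the same CSV text with Python's standard `csv` module and sums the atomic charges per basis set. The text is embedded as a string so that the example runs on its own. Note that raw keys and unformatted values are written, and that the `None` entries appear as empty cells.

```python
import csv
import io

# CSV text copied from the output above
csv_text = """AtomType,Charge,Energy,CPUTime,InputAdfBasisType,InputAdfXcGga
O,-0.8416865250737331,-1316.2997406426532,2.957275,DZ,
H,0.42084716070260286,-1316.2997406426532,2.957275,DZ,
H,0.4208393643711281,-1316.2997406426532,2.957275,DZ,
O,-0.6739805275850443,-1363.766294275197,4.603972,TZP,PBE
H,0.33698188085180536,-1363.766294275197,4.603972,TZP,PBE
H,0.33699864673323343,-1363.766294275197,4.603972,TZP,PBE
"""

rows = list(csv.DictReader(io.StringIO(csv_text)))

# Every value is read back as a string, so numbers need an explicit conversion
total_charge = {}
for row in rows:
    basis = row["InputAdfBasisType"]
    total_charge[basis] = total_charge.get(basis, 0.0) + float(row["Charge"])

for basis, charge in total_charge.items():
    print(f"{basis}: total charge = {charge:.6f}")
```

For these neutral water molecules the charges sum to zero within floating-point error, which makes a quick sanity check on the exported data.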



Finally, for more complex data analysis, the results can be converted to a [pandas](https://pandas.pydata.org) dataframe. This is recommended for more involved data manipulations. Pandas can be installed using amspackages, with the command: `"${AMSBIN}/amspackages" install pandas`.

In [55]:
try:
    import pandas

    df = ja_adf.to_dataframe()
    print(df)

except ImportError:
    print("Pandas not available. Install it with amspackages to run this example: '${AMSBIN}/amspackages install pandas'")

  AtomType    Charge       Energy   CPUTime InputAdfBasisType InputAdfXcGga
0        O -0.841687 -1316.299741  2.957275                DZ          None
1        H  0.420847 -1316.299741  2.957275                DZ          None
2        H  0.420839 -1316.299741  2.957275                DZ          None
3        O -0.673981 -1363.766294  4.603972               TZP           PBE
4        H  0.336982 -1363.766294  4.603972               TZP           PBE
5        H  0.336999 -1363.766294  4.603972               TZP           PBE


### Additional Analysis Methods

The `JobAnalysis` class has some additional built-in methods to aid with job analysis.

For example, the `get_timeline` and `display_timeline` methods show pictorially when jobs started, how long they took to run and what their status is.

This can be useful for visualising the dependencies of jobs.

In [59]:
ja.display_timeline(fmt="html")

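| JobName | ↓2025-02-07 13:52:48 | ↓2025-02-07 13:53:03 | ↓2025-02-07 13:53:18 | ↓2025-02-07 13:53:32 | ↓2025-02-07 13:53:47 | WaitDuration | RunDuration | TotalDuration |
|---------|----------------------|----------------------|----------------------|----------------------|----------------------|--------------|-------------|---------------|
| plamsjob | =*> | | | | | 0s | 0s | 1s |
| plamsjob.002 | ==========> | | | | | 0s | 6s | 7s |
| plamsjob.003 | ==================== | => | | | | 0s | 15s | 16s |
| plamsjob.004 | ==> | | | | | 0s | 1s | 1s |
| plamsjob.005 | ==================== | ==================== | =====> | | | 0s | 32s | 33s |
| plamsjob.006 | ==================== | ==================== | ==================== | ===================* | > | 0s | 58s | 58s |
| plamsjob.007 | =*> | | | | | 0s | 0s | 1s |
| plamsjob.008 | =======> | | | | | 0s | 4s | 5s |
| plamsjob.009 | --=========> | | | | | 0s | 7s | 8s |
| plamsjob.010 | ..=> | | | | | 1s | 0s | 2s |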
