# Working with `ichor` files

The `ichor.core` package provides classes for reading (and writing) the input and output files of several computational chemistry programs (Gaussian, AIMAll, DLPOLY, etc.).

In [1]:
# all available classes (see ichor.core.files.__init__ file where __all__ is implemented)
from ichor.core import files

files.__all__

['INT',
 'INTs',
 'AIM',
 'GJF',
 'WFN',
 'GaussianOut',
 'Trajectory',
 'DlpolyHistory',
 'DlPolyField',
 'DlPolyConfig',
 'DlPolyControl',
 'DlPolyFFLUX',
 'DlPolyIQAEnergies',
 'DlPolyIQAForces',
 'FFLUXDirectory',
 'PandoraInput',
 'PointDirectory',
 'PointsDirectory',
 'XYZ',
 'Mol2',
 'PySCFDirectory',
 'MorfiDirectory',
 'PandoraDirectory',
 'ABINT']

## General implementation of file classes

File classes subclass from `ichor.core.files.file.ReadFile` and/or `ichor.core.files.file.WriteFile`.

When subclassing from `ReadFile`, you must define a `_read_file` method, which specifies how the new file is read and parsed. Ichor reads files lazily: a file is only read when an attribute of the file instance is accessed, **not** when the file instance is created.

For example:

In [2]:
from ichor.core.files import GJF

# make instance of GJF, file has not been read yet
gjf_instance = GJF("../../../example_files/example_gjf.gjf")

# then you can access attributes
# in the background, the lazy file reading will check the value of the attribute
# and read the file if necessary
atoms = gjf_instance.atoms

print(atoms)

N1       1.30610788    -29.77550072     -0.39451506
H2       0.88322943    -29.08071028     -1.14190493
H3       1.46749713    -29.22282070      0.46703669
H4       2.11921902    -30.18852549     -0.75438182


Lazy reading works through a special `None` equivalent, `FileContents` (an instance of `FileContentsType`, which is essentially the same as `NoneType`). This allows us to check whether an attribute is `FileContents`. If it is, the file is read first and the attribute is then accessed again. The implementation is found in `ReadFile.__getattribute__`, which is called every time an attribute is accessed.
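The mechanism can be illustrated with a minimal, standalone sketch. It does not use ichor and is not ichor's actual implementation: `LazyFile`, `read_count` and `first_word` are hypothetical names, the "file" is just a string held in memory, and the sentinel is a simplified stand-in for the real `FileContents`.

```python
class FileContentsType:
    """Falsy singleton meaning 'not read from the file yet' (simplified stand-in)."""

    _instance = None

    def __new__(cls):
        if cls._instance is None:
            cls._instance = super().__new__(cls)
        return cls._instance

    def __bool__(self):
        return False

    def __repr__(self):
        return "FileContents"


FileContents = FileContentsType()


class LazyFile:
    """Hypothetical file class whose contents are parsed on first attribute access."""

    def __init__(self, text):
        self._text = text  # stands in for the file on disk
        self.read_count = 0  # counts how often parsing happened
        self.first_word = FileContents  # to be filled in by _read_file

    def _read_file(self):
        self.read_count += 1
        self.first_word = self._text.split()[0]

    def __getattribute__(self, name):
        value = object.__getattribute__(self, name)
        if value is FileContents:
            # the attribute has not been populated yet: parse, then look it up again
            object.__getattribute__(self, "_read_file")()
            value = object.__getattribute__(self, name)
        return value


lazy = LazyFile("N1 1.30610788 -29.77550072 -0.39451506")
print(lazy.read_count)  # nothing has been parsed at this point
print(lazy.first_word)  # this access triggers _read_file
print(lazy.read_count)
```

Because the sentinel is falsy, expressions such as `value or default` treat "not read yet" the same way they would treat `None`.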

Subclassing from `WriteFile` means a `_write_file` method must be implemented in the sub-class, which defines the format of the file that needs to be written out.

**Note that you can then call the `write` method directly (it is implemented in `WriteFile`), which in turn calls `_write_file`.**
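The relationship between `write` and `_write_file` is a template-method pattern, which the following standalone sketch illustrates. `WriteFileSketch` and `XYZSketch` are hypothetical classes written for this example; ichor's real `WriteFile.write` may take different arguments and do more work before delegating.

```python
import tempfile
from abc import ABC, abstractmethod
from pathlib import Path


class WriteFileSketch(ABC):
    """Hypothetical base class: the public write() delegates to _write_file()."""

    def __init__(self, path):
        self.path = Path(path)

    def write(self, path=None):
        # fall back to the path given at construction time
        target = self.path if path is None else Path(path)
        self._write_file(target)
        return target

    @abstractmethod
    def _write_file(self, path):
        """Subclasses define the on-disk format here."""


class XYZSketch(WriteFileSketch):
    """Hypothetical subclass that only knows how to format an xyz file."""

    def __init__(self, path, atoms):
        super().__init__(path)
        self.atoms = atoms  # list of (element, x, y, z) tuples

    def _write_file(self, path):
        with open(path, "w") as f:
            # atom count, then an empty comment line, then one line per atom
            f.write(f"{len(self.atoms)}\n\n")
            for element, x, y, z in self.atoms:
                f.write(f"{element} {x:12.8f} {y:12.8f} {z:12.8f}\n")


with tempfile.TemporaryDirectory() as tmp:
    xyz = XYZSketch(Path(tmp) / "example.xyz", [("N", 1.30610788, -29.77550072, -0.39451506)])
    written_path = xyz.write()  # the user calls write(), never _write_file()
    print(written_path.read_text())
```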

# Example of an ORCA input file definition

Below is an example implementation for an ORCA input file. The implementation for the ORCA output file follows the same pattern, but it needs additional attributes for the outputs of the calculation. Use the code below (or look at other implementations in ichor) as a reference when defining new file formats.

There are several important things to note for file reading:

1. You must subclass from `ReadFile`, `WriteFile`, or both. This will allow you to use ichor's lazy file reading and other functionality implemented for files.
2. The `HasAtoms` class means that the `atoms` attribute is present in the class. This `atoms` attribute is an instance of `ichor.core.atoms.Atoms` class, which implements all the functionalities for calculating features, defining ALF, etc. If the atomic coordinates are found in the file somewhere, make sure to subclass from `HasAtoms` as well to utilize this functionality.
3. In the `__init__` signature, default every parameter that can be read from the file (or defined by the user) to `None`. Then, inside `__init__`, use definitions like `self.charge: int = charge or FileContents`. This lets you either give the ORCA job options yourself (in which case they are used regardless of what an existing ORCA input file contains) or leave them as `None`, in which case the options defined in the existing ORCA input file are read. If the ORCA input file is not present on disk, sensible default options are written to the file.
4. You must define `filetype`, which returns the extension of the file. This is used to determine the file type in other parts of the ichor code.
5. Use the `_set_write_defaults_if_needed` method to set the defaults that are used when writing a new file.
6. Implement the `_read_file` and `_write_file` methods, which read and write the file respectively. Which of them you need depends on whether the file is intended to be read only, written only, or both. Note that the `ichor.core.files.file.File` class implements the `read` and `write` methods, which can be called directly for reading and writing files. You will rarely need to call `read` yourself, as accessing any attribute that holds information from the file reads the file automatically.

Note that you must provide the molecular structure yourself, i.e. you must set the `atoms` attribute to an `ichor.core.atoms.Atoms` instance that contains the molecular structure. Otherwise the file cannot be written to disk (and it would not be a valid ORCA input file anyway, because ORCA needs an input structure).
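The precedence described in point 3 (user-supplied value, then value from the file, then write default) can be sketched without ichor as follows. The `resolve` helper and the `_Unset` class are hypothetical and exist only for this illustration; they chain the same `or` expressions that appear in `__init__`, `_read_file` and `_set_write_defaults_if_needed` in the example class.

```python
class _Unset:
    """Hypothetical falsy sentinel standing in for ichor's FileContents."""

    def __bool__(self):
        return False

    def __repr__(self):
        return "FileContents"


FileContents = _Unset()


def resolve(user_value, file_value, default):
    """Chain the three `or` steps used for one attribute."""
    value = user_value or FileContents  # step 1: as in __init__
    value = value or file_value  # step 2: as in _read_file
    return value or default  # step 3: as in _set_write_defaults_if_needed


# the user gave a multiplicity, so the value in the file is ignored
print(resolve(user_value=3, file_value=1, default=1))

# nothing given by the user, so the value parsed from the file is used
print(resolve(user_value=None, file_value=2, default=1))

# nothing given and nothing read (no file on disk), so the default applies
print(resolve(user_value=None, file_value=FileContents, default=1))

# caveat of the `or` idiom: a falsy user value such as 0 counts as "not given"
print(resolve(user_value=0, file_value=-1, default=0))
```

The last call shows a consequence of relying on truthiness in this sketch: an explicit `0` is indistinguishable from `None`, so a value found in the file takes precedence over it.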

In [None]:
from pathlib import Path
from typing import Dict, List, Optional, Union

from ichor.core.atoms import Atom, Atoms
from ichor.core.common.functools import classproperty
from ichor.core.common.pairwise import pairwise

# from enum import Enum
from ichor.core.files.file import File, FileContents, ReadFile, WriteFile
from ichor.core.files.file_data import HasAtoms


class OrcaInput(ReadFile, WriteFile, File, HasAtoms):
    """

    Wraps around an ORCA input file that is used as input to ORCA.

    :param path: A string or Path to the ORCA input file. If the file does not exist,
        then there is no file to be read, so the user has to provide the file contents. If
        no contents/options are given by the user, they are written as the default values in the
        ``write`` method.
    :param main_input: A list of strings which are commands beginning with !
    :param charge: The charge of the system
    :param spin_multiplicity: The spin multiplicity of the system
    :param atoms: An Atoms instance that contains the molecular structure
    :param input_blocks: A dictionary consisting of keys: The option,
        and values: A list containing even number of elements. The option
        is going to be written out with a %, followed by the specifications
        that the user gives for the option

    .. note::
        There is no checking of what the inputs are, so it is up to the user to make sure
        that the inputs are correct.

    .. note::
        Gaussian uses a different b3lyp version (https://sites.google.com/site/orcainputlibrary/dft-calculations)
        so use b3lyp/g (this is the Gaussian implementation) instead of b3lyp

    References:
        https://sites.google.com/site/orcainputlibrary/home
        https://www.cup.uni-muenchen.de/oc/zipse/teaching/computational-chemistry-2/topics/a-typical-orca-output-file/
        https://www.orcasoftware.de/tutorials_orca/first_steps/input_output.html
        https://www.afs.enea.it/software/orca/orca_manual_4_2_1.pdf (note this is for version 4, not 5)
        version 5 manual, needs login:
        available in https://orcaforum.kofo.mpg.de/app.php/dlext/?view=detail&df_id=186
        https://orcaforum.kofo.mpg.de/viewtopic.php?f=8&t=7470&p=32102&hilit=atomic+force#p32102
    """

    def __init__(
        self,
        path: Union[Path, str],
        main_input: Optional[List[str]] = None,
        charge: Optional[int] = None,
        spin_multiplicity: Optional[int] = None,
        atoms: Optional[Atoms] = None,
        **input_blocks: Dict[str, Union[str, List[str]]],
    ):
        File.__init__(self, path)

        # the main input contains lines starting with !
        self.main_input: List[str] = main_input or FileContents

        self.charge: int = charge or FileContents
        self.spin_multiplicity: int = spin_multiplicity or FileContents
        self.atoms = atoms or FileContents

        # any other input blocks specified by %
        self.input_blocks = input_blocks or FileContents

    @classproperty
    def filetype(self) -> str:
        return ".orcainput"

    def _read_file(self):

        with open(self.path, "r") as f:
            # assume the first lines start with !
            # the next lines are optional commands with %
            # and finally the inputs

            line = next(f)

            main_input = []
            # these are lines that contain things like method and basis set
            # since these can be on multiple lines
            while line.strip().startswith("!"):
                line = line.lower()
                line = line.strip().replace("!", "")
                line_splits = line.split()
                for s in line_splits:
                    main_input.append(s)
                line = next(f)

            input_blocks = {}
            # while we are not at the geometries
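            while not line.strip().startswith("*"):
                # skip blank lines between the main input / blocks and the geometry
                if not line.strip():
                    line = next(f)
                    continue
                line = line.lower()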
                one_option = []
                # now read in optional input with %
                while "end" not in line.strip():
                    one_option.append(line)
                    line = next(f)
                    line = line.lower()
                line = next(f)

                option_name, *other_options = (
                    one_option[0].strip().replace("%", "").split()
                )
                input_blocks[option_name] = other_options
                for other_lines in one_option[1:]:
                    input_blocks[option_name] += other_lines.strip().split()

            charge, spin_multiplicity = map(int, line.strip().split()[-2:])
            line = next(f)
            atoms = Atoms()
            while line.strip() and line.strip() != "*":
                atoms.append(Atom(*line.split()))
                try:
                    line = next(f)
                except StopIteration:
                    break

        self.main_input = self.main_input or main_input
        self.input_blocks = self.input_blocks or input_blocks
        self.charge = self.charge or charge
        self.spin_multiplicity = self.spin_multiplicity or spin_multiplicity
        self.atoms = self.atoms or atoms

    def _set_write_defaults_if_needed(self):
        """Set default values for attributes if bool(self.attribute) evaluates to False.
        So if an attribute is still FileContents, an empty string, an empty list, etc.,
        then default values will be used."""

        # aim for wfn file
        # nousesym to not use symmetry
        # normalprint for printing out to the outputfile
        # engrad calculate energy and (analytical) gradient
        self.main_input = self.main_input or [
            "b3lyp/g",
            "6-31+g(d,p)",
            "nousesym",
            "aim",
            "normalprint",
            "engrad",
        ]

        # default to no input blocks, but keep any that were given or read
        self.input_blocks = self.input_blocks or {}

        self.charge = self.charge or 0
        self.spin_multiplicity = self.spin_multiplicity or 1

    def _check_values_before_writing(self):
        """Basic checks done prior to writing file.

        .. note:: Not everything written to file can be checked for, so
        there is still the need for a user to check out what is being written.
        """

        if len(self.atoms) == 0:
            raise ValueError("There are no atoms to write to orca input file.")

    def _write_file(self, path: Path, *args, **kwargs):
        fmtstr = "12.8f"

        with open(path, "w") as f:
            for m in self.main_input:
                f.write(f"!{m}\n")
            for k, vals in self.input_blocks.items():
                f.write(f"%{k}\n")
                for v in pairwise(vals):
                    f.write(f"    {v[0]} {v[1]}\n")
                # terminate each block on its own line so that consecutive
                # blocks and the geometry line do not run together
                f.write("end\n")
            f.write(f"* xyz {self.charge} {self.spin_multiplicity}\n")
            for atom in self.atoms:
                f.write(
                    f"{atom.type} {atom.x:{fmtstr}} {atom.y:{fmtstr}} {atom.z:{fmtstr}}\n"
                )
            f.write("*")
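To see roughly what file layout a writer like this produces, here is a standalone sketch that builds the same kind of text without ichor. `render_orca_input` is a hypothetical helper, atoms are plain tuples rather than `Atom` instances, and the pairing of block values with `zip` is an assumption about how `pairwise` groups an even-length list. Each `end` is placed on its own line, and the geometry is illustrative only.

```python
from typing import Dict, List, Sequence, Tuple

AtomRow = Tuple[str, float, float, float]  # hypothetical stand-in for an ichor Atom


def render_orca_input(
    main_input: Sequence[str],
    input_blocks: Dict[str, List[str]],
    charge: int,
    spin_multiplicity: int,
    atoms: Sequence[AtomRow],
) -> str:
    """Return ORCA-input-style text: ! keywords, % blocks, then an xyz geometry."""
    lines = [f"!{keyword}" for keyword in main_input]
    for name, values in input_blocks.items():
        lines.append(f"%{name}")
        # assumption: values are consecutive key/value pairs, e.g. ["nprocs", "4"]
        for key, value in zip(values[::2], values[1::2]):
            lines.append(f"    {key} {value}")
        lines.append("end")
    lines.append(f"* xyz {charge} {spin_multiplicity}")
    for element, x, y, z in atoms:
        lines.append(f"{element} {x:12.8f} {y:12.8f} {z:12.8f}")
    lines.append("*")
    return "\n".join(lines)


text = render_orca_input(
    main_input=["b3lyp/g", "6-31+g(d,p)", "engrad"],
    input_blocks={"pal": ["nprocs", "4"]},
    charge=0,
    spin_multiplicity=1,
    atoms=[("O", 0.0, 0.0, 0.0), ("H", 0.0, 0.757, 0.587), ("H", 0.0, -0.757, 0.587)],
)
print(text)
```

Building the text as a list of lines and joining once keeps the line-termination logic in a single place, which makes mistakes such as a missing newline after `end` easier to spot.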