In [1]:
class Atom:
    def __init__(self, label, Z, x, y, z):
        self.label = label
        self.Z = Z
        self.x = x
        self.y = y
        self.z = z

class Molecule:
    def __init__(self, atoms, charge, nbasis, max_nc):
        # atoms will be a list of Atom objects
        self.atoms = atoms
        self.charge = charge
        self.nbasis = nbasis
        self.max_nc = max_nc

The two classes above, Atom and Molecule, provide the basic data structures used to store the information read from the input files.

The Atom class represents a single atomic center in the molecule:  
-label: the chemical symbol of the atom (e.g., "H", "Li").  
-Z: the nuclear charge, expressed as an integer.  
-x, y, z: the Cartesian coordinates (in Ångström) of the atom in 3D space.  

The Molecule class represents the complete molecular system described by the input file:  
-atoms: a Python list of Atom objects, in the same order they appear in the input file.  
-charge: the total molecular charge.  
-nbasis: the total number of basis functions used for the SCF calculation.  
-max_nc: the maximum number of primitive Gaussian functions contracted into any single basis function.
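As a minimal sketch of how the two classes fit together, the snippet below builds an H2 molecule by hand. The coordinates and the nbasis and max_nc values are illustrative assumptions, not values taken from a real input file.

```python
class Atom:
    def __init__(self, label, Z, x, y, z):
        self.label = label
        self.Z = Z
        self.x = x
        self.y = y
        self.z = z

class Molecule:
    def __init__(self, atoms, charge, nbasis, max_nc):
        self.atoms = atoms
        self.charge = charge
        self.nbasis = nbasis
        self.max_nc = max_nc

# Hypothetical H2 geometry (values chosen only for illustration)
h1 = Atom("H", 1, 0.0, 0.0, 0.0)
h2 = Atom("H", 1, 0.0, 0.0, 0.7408)
mol = Molecule([h1, h2], charge=0, nbasis=2, max_nc=3)

# Sum of nuclear charges minus the molecular charge gives the electron count
n_electrons = sum(atom.Z for atom in mol.atoms) - mol.charge
print(n_electrons)  # 2
```

Because Molecule stores the Atom objects in a plain list, per-atom quantities can be obtained with ordinary loops or comprehensions, as in the electron count above.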

In [None]:
def _find_index(lines, pattern):
    """
    Searches the list 'lines' for the first line that starts with 'pattern',
    ignoring uppercase/lowercase differences.
    """
    pattern_lower = pattern.lower()

    for i, line in enumerate(lines):
        if line.lower().startswith(pattern_lower):
            return i

    raise ValueError("No line was found starting with: " + pattern)

The function _find_index locates specific header lines inside the Hartree–Fock input files. These files are organized in human-readable blocks, each introduced by a header line (e.g., "number of atoms", "Overall charge").

pattern_lower = pattern.lower()  
Converts the search pattern to lowercase. This allows the function to ignore differences in uppercase/lowercase when matching.

for i, line in enumerate(lines):  
Iterates over the list of lines, keeping track of both the line content (line) and its position (i).

if line.lower().startswith(pattern_lower):  
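If the line begins with the pattern, the function returns its index immediately, so only the first match is reported. If the loop finishes without finding a match, a ValueError is raised.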

In [None]:
def read_basic_input(path):
    with open(path, "r") as f:
        lines = []
        for line in f:
            stripped = line.strip()
            if stripped:
                lines.append(stripped)

This block is responsible for reading the file and cleaning the raw text lines.

with open(path, "r") as f:  
Opens the file located at path in read mode ("r").

lines = []  
Initializes an empty list that will store all the non-empty lines.

for line in f:  
Iterates over every line in the file object f.

stripped = line.strip()  
Removes leading and trailing whitespace (spaces, tabs, newline characters) from the line.

if stripped:  
This condition is True only if the line is not empty after stripping.

lines.append(stripped)  
Adds the cleaned, non-empty line to the lines list.

Result:
At the end of this block, lines is a list of strings in which all empty lines have been removed and no line has leading or trailing whitespace.
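The cleaning step can be tried without a real file. In this sketch, io.StringIO stands in for the opened file object, and raw_text is a hypothetical fragment containing blank lines and stray spaces.

```python
import io

# Hypothetical raw content: one blank line, one whitespace-only line, padded values
raw_text = "Number of atoms\n  2  \n\n   \nOverall charge\n0\n"
f = io.StringIO(raw_text)  # behaves like a file opened in text mode

lines = []
for line in f:
    stripped = line.strip()
    if stripped:  # an empty string is falsy, so blank lines are skipped
        lines.append(stripped)

print(lines)  # ['Number of atoms', '2', 'Overall charge', '0']
```

Dropping blank lines at this stage is what lets the later steps assume that a value sits on the line directly after its header.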

In [None]:
    idx_na = _find_index(lines, "number of atoms")
    natoms = int(lines[idx_na + 1])

This block extracts the number of atoms from the input.

idx_na = _find_index(lines, "number of atoms")  
Calls the function _find_index to search for the line that starts with "number of atoms". The function returns the index of that line in the lines list.

natoms = int(lines[idx_na + 1])  
The number of atoms is on the line immediately after the header, so we read lines[idx_na + 1] and convert it to an integer using int(...).

In [None]:
    idx_atoms_header = _find_index(lines, "Atom labels")
    first_atom_line = idx_atoms_header + 1

Now we locate the start of the atomic coordinates block.

idx_atoms_header = _find_index(lines, "Atom labels")
Finds the line that starts with "Atom labels".
Again, the search is case-insensitive and returns the index of that header line.

first_atom_line = idx_atoms_header + 1
The first atom is defined on the line immediately after this header, so we add 1 to move to the first data line.

In [None]:
    atoms = []
    for j in range(natoms):
        parts = lines[first_atom_line + j].split()
        label = parts[0]
        Z = int(parts[1])
        x = float(parts[2])
        y = float(parts[3])
        z = float(parts[4])
        atom = Atom(label, Z, x, y, z)
        atoms.append(atom)

This loop reads each atom line and creates Atom objects.

atoms = []  
Initializes an empty list that will store the Atom instances.

for j in range(natoms):
Loops over each atom index from 0 to *(natoms - 1)*. There should be exactly *natoms* consecutive lines describing atoms.

parts = lines[first_atom_line + j].split()  
Takes the j-th atom line and splits it into separate components using whitespace as the separator.  
For example, a line like:  
"H 1 0.00000000 0.00000000 0.74080000"
becomes:  
["H", "1", "0.00000000", "0.00000000", "0.74080000"].  

label = parts[0]  
The first element is the chemical symbol.

Z = int(parts[1])  
The second element is the nuclear charge, converted to an integer. 

x = float(parts[2]), y = float(parts[3]), z = float(parts[4])  
The remaining elements are the Cartesian coordinates of the atom, converted to floating-point numbers.

atom = Atom(label, Z, x, y, z)  
Creates an *Atom* object using the previously defined *Atom* class.

atoms.append(atom)  
Adds the newly created *Atom* to the local atoms list, which is later passed to *Molecule*.

At the end of this loop, *atoms* is a list containing one *Atom* object for each atom defined in the input file.
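The per-line parsing can be isolated in a small function for experimentation. This is a sketch, not part of the original code: parse_atom_line is a hypothetical helper, and the check on the number of fields is an added assumption about how malformed lines might be handled.

```python
def parse_atom_line(text):
    """Hypothetical helper: split one atom line into (label, Z, x, y, z)."""
    parts = text.split()
    if len(parts) < 5:
        # Assumed behaviour: reject lines that lack one of the five fields
        raise ValueError("Expected label, Z, x, y, z but got: " + text)
    label = parts[0]
    Z = int(parts[1])
    x = float(parts[2])
    y = float(parts[3])
    z = float(parts[4])
    return label, Z, x, y, z

fields = parse_atom_line("H 1 0.00000000 0.00000000 0.74080000")
print(fields)  # ('H', 1, 0.0, 0.0, 0.7408)
```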

In [None]:
    idx_charge = _find_index(lines, "Overall charge")
    charge = int(lines[idx_charge + 1])

This block reads the total molecular charge.

idx_charge = _find_index(lines, "Overall charge")  
Finds the line that starts with "Overall charge".

charge = int(lines[idx_charge + 1])  
The value of the charge appears on the line immediately after the header. It is converted from string to integer.

In [None]:
    idx_nb = _find_index(lines, "Number of basis funcs")
    nbasis = int(lines[idx_nb + 1])

This block reads the total number of basis functions.

idx_nb = _find_index(lines, "Number of basis funcs")  
Locates the header for the number of basis functions.

nbasis = int(lines[idx_nb + 1])  
Reads the line immediately after the header, converts it to an integer, and stores it in nbasis.

In [None]:
    idx_maxnc = _find_index(lines, "Maximum number of primitives")
    max_nc = int(lines[idx_maxnc + 1])

This block reads the maximum number of primitives in any contracted basis function.

idx_maxnc = _find_index(lines, "Maximum number of primitives")  
Finds the header that marks this value.

max_nc = int(lines[idx_maxnc + 1])  
Reads the next line, converts it to an integer, and stores it in max_nc.
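The number of atoms, the charge, the number of basis functions, and the maximum number of primitives are all read with the same two steps: find the header, then convert the next line to an integer. One possible way to express that pattern once is sketched below. The helper _read_int_after is hypothetical and does not appear in the original code, and sample_lines is made-up data.

```python
def _find_index(lines, pattern):
    pattern_lower = pattern.lower()
    for i, line in enumerate(lines):
        if line.lower().startswith(pattern_lower):
            return i
    raise ValueError("No line was found starting with: " + pattern)

def _read_int_after(lines, header):
    """Hypothetical helper: return the integer on the line following 'header'."""
    return int(lines[_find_index(lines, header) + 1])

# Made-up cleaned lines using the header strings searched for in the original code
sample_lines = [
    "Number of atoms", "2",
    "Overall charge", "0",
    "Number of basis funcs", "2",
    "Maximum number of primitives", "3",
]

natoms = _read_int_after(sample_lines, "number of atoms")
charge = _read_int_after(sample_lines, "Overall charge")
nbasis = _read_int_after(sample_lines, "Number of basis funcs")
max_nc = _read_int_after(sample_lines, "Maximum number of primitives")
print(natoms, charge, nbasis, max_nc)  # 2 0 2 3
```

Whether such a helper is worthwhile is a matter of taste; the explicit version in the cells above keeps every step visible.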

In [None]:
    mol = Molecule(atoms, charge, nbasis, max_nc)
    return mol

Finally, the function gathers all the data into a single *Molecule* object.

mol = Molecule(atoms, charge, nbasis, max_nc)  
Creates an instance of the *Molecule* class using the list of *Atom* objects (atoms), the total charge (charge), the number of basis functions (nbasis), and the maximum number of primitives (max_nc).

return mol
Returns the constructed *Molecule*.
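Assembled from the fragments above, the complete function might look like the following sketch, shown together with a small end-to-end run. The sample file is hypothetical: only the header strings that the code searches for come from the original, while the text after "Atom labels" and all numeric values are assumptions made for illustration.

```python
import os
import tempfile

class Atom:
    def __init__(self, label, Z, x, y, z):
        self.label, self.Z = label, Z
        self.x, self.y, self.z = x, y, z

class Molecule:
    def __init__(self, atoms, charge, nbasis, max_nc):
        self.atoms, self.charge = atoms, charge
        self.nbasis, self.max_nc = nbasis, max_nc

def _find_index(lines, pattern):
    pattern_lower = pattern.lower()
    for i, line in enumerate(lines):
        if line.lower().startswith(pattern_lower):
            return i
    raise ValueError("No line was found starting with: " + pattern)

def read_basic_input(path):
    # Read the file, keeping only non-empty lines with whitespace stripped
    with open(path, "r") as f:
        lines = []
        for line in f:
            stripped = line.strip()
            if stripped:
                lines.append(stripped)

    natoms = int(lines[_find_index(lines, "number of atoms") + 1])

    # Atom lines start right after the "Atom labels" header
    first_atom_line = _find_index(lines, "Atom labels") + 1
    atoms = []
    for j in range(natoms):
        parts = lines[first_atom_line + j].split()
        atoms.append(Atom(parts[0], int(parts[1]),
                          float(parts[2]), float(parts[3]), float(parts[4])))

    charge = int(lines[_find_index(lines, "Overall charge") + 1])
    nbasis = int(lines[_find_index(lines, "Number of basis funcs") + 1])
    max_nc = int(lines[_find_index(lines, "Maximum number of primitives") + 1])
    return Molecule(atoms, charge, nbasis, max_nc)

# Hypothetical input file content (layout and values are assumptions)
sample = """Number of atoms
2

Atom labels, atomic number, coords (Angstrom)
H 1 0.0 0.0 0.0
H 1 0.0 0.0 0.7408

Overall charge
0

Number of basis funcs
2

Maximum number of primitives
3
"""

# Write the sample to a temporary file, parse it, then clean up
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write(sample)
    tmp_path = tmp.name

mol = read_basic_input(tmp_path)
os.remove(tmp_path)

print(len(mol.atoms), mol.charge, mol.nbasis, mol.max_nc)  # 2 0 2 3
```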