# Preparing *ab-initio* data

## Displacements using CFOUR

To create displaced geometries in `.xyz` format, we will follow these steps:

1. **Geometry Optimization and Frequency Calculation**:
   - Use **CFOUR** to optimize the molecular geometry and compute harmonic frequencies.
   - Extract the results from the `output` file generated by the harmonic frequency calculation.

2. **Generating Displacements**:
   - Utilize the `cfour_displacements.py` script located in the `Tools` directory to process the output and generate the displaced geometries in `.xyz` format.

### Configuring and Running the `cfour_displacements.py` Script

To generate displaced geometries using the `cfour_displacements.py` script, configure the following variables within the script:

```python
############## Variables to Change ###############

inputfile = ""  # Output file from CFOUR frequency calculation
natoms = 0      # Number of atoms in the molecule
is_linear = False  # Whether the molecule is linear (True or False)

min_displacement = 0.0  # Minimum displacement in dimensionless units
max_displacement = 0.0  # Maximum displacement in dimensionless units
step_displacement = 0.0  # Step size for displacement in dimensionless units

outputfile = ""  # Output directory for the displaced geometries
##################################################
```

After configuring the variables, execute the script using the following command:
```bash
python3 cfour_displacements.py
```
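
As a quick sanity check on `natoms` and `is_linear`, the number of vibrational normal modes follows the standard 3N-6 rule (3N-5 for linear molecules). The sketch below is not part of `cfour_displacements.py`: both function names are hypothetical, and the grid count assumes that the script displaces along every vibrational mode and includes both end points of the displacement range.

```python
def count_normal_modes(natoms: int, is_linear: bool) -> int:
    """Return the number of vibrational modes: 3N-5 (linear) or 3N-6 (non-linear)."""
    if natoms < 2:
        raise ValueError("a molecule needs at least two atoms to vibrate")
    return 3 * natoms - (5 if is_linear else 6)


def count_grid_points(min_displacement: float, max_displacement: float,
                      step_displacement: float) -> int:
    """Return the number of grid points from min to max, end points included.

    Assumption: the grid is min, min + step, ..., max. The actual script may
    build its grid differently.
    """
    if step_displacement <= 0:
        raise ValueError("step_displacement must be positive")
    if max_displacement < min_displacement:
        raise ValueError("max_displacement must not be smaller than min_displacement")
    # round() guards against floating-point noise such as 3.9999999
    return int(round((max_displacement - min_displacement) / step_displacement)) + 1


if __name__ == "__main__":
    modes = count_normal_modes(natoms=3, is_linear=False)
    points = count_grid_points(-1.0, 1.0, 0.5)
    print(f"{modes} modes x {points} grid points per mode")
```

The mode count is also the value to use for `nmodes` when looping over the displaced geometries later on.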

### Displaced Geometries
Upon executing the `cfour_displacements.py` script, a directory will be created containing all displaced geometries in `.xyz` format. The naming convention for these files is as follows:

```
geo_v{normal_mode}_{displacement}.xyz
```

- **`normal_mode`**: Indicates the vibrational normal mode.
- **`displacement`**: Specifies the displacement value in dimensionless units, including the sign for negative displacements.

For example, `geo_v1_-01.xyz` is a negative displacement along normal mode 1, and `geo_v1_01.xyz` is the corresponding positive displacement.
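
When post-processing many files, it can help to recover the mode and displacement label from a file name. The following is a minimal sketch based only on the naming convention above. The function name `parse_geometry_name` is hypothetical, and it assumes the displacement label is a (possibly signed) integer as in the two examples; how that label maps to the dimensionless displacement depends on the settings used in `cfour_displacements.py`.

```python
import re
from typing import Tuple

# Pattern assumed from the convention geo_v{normal_mode}_{displacement}.xyz
_GEO_PATTERN = re.compile(r"^geo_v(\d+)_(-?\d+)\.xyz$")


def parse_geometry_name(filename: str) -> Tuple[int, int]:
    """Return (normal_mode, displacement_label) parsed from a geometry file name."""
    match = _GEO_PATTERN.match(filename)
    if match is None:
        raise ValueError(f"unexpected geometry file name: {filename!r}")
    # int() drops the zero padding and keeps the sign
    return int(match.group(1)), int(match.group(2))


if __name__ == "__main__":
    for name in ("geo_v1_-01.xyz", "geo_v1_01.xyz"):
        print(name, parse_geometry_name(name))
```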

## Preparing ADCC Input Files

### Running ADCC Calculations
To process these geometry files and run ADCC calculations, you can use the provided Python script `run_adc.py`. The script takes a single geometry file as its command-line argument, runs a restricted Hartree-Fock (RHF) calculation with pyscf, computes core-excited states at the CVS-ADC(3) level with adcc, and writes the SCF orbitals to a `.molden` file named after the input geometry.

#### run_adc.py
```python
import sys
import adcc
import pyscf
from pyscf.tools import molden

# Create the molden filename
def change_extension(filename):
    if filename.endswith('.xyz'):
        return filename[:-4] + '.molden'
    return filename

def main():
    # Run SCF in pyscf
    mol = pyscf.gto.M(
        atom=sys.argv[1],  # Take the geometry from the displaced folder
        basis="cc-pvdz"
    )

    scfres = pyscf.scf.RHF(mol)
    scfres.conv_tol = 1e-13
    scfres.kernel()

    adcc.set_n_threads(8)
    hf = adcc.ReferenceState(scfres)
    print(adcc.LazyMp(hf).energy(level=3))  # Print MP3 energy; for better precision, CCSD(T) could be used
    state = adcc.cvs_adc3(scfres, n_singlets=10, core_orbitals=4)
    print(state.describe())  # Print the data obtained
    print(state.describe_amplitudes())  # Print the transitions

    new_filename = change_extension(sys.argv[1])
    # Creating the .molden file
    with open(new_filename, 'w') as f1:
        molden.header(mol, f1)
        molden.orbital_coeff(mol, f1, scfres.mo_coeff, ene=scfres.mo_energy, occ=scfres.mo_occ)

if __name__ == "__main__":
    main()
```

### Submitting Jobs to the Cluster
To run the `run_adc.py` script on a cluster, you can use the following sbatch script. Replace the placeholders (X) with appropriate values for your system.

#### run_sbatch_adcc.sh
```bash
#!/bin/bash
#SBATCH --partition=X
#SBATCH --nodes=X
#SBATCH --ntasks-per-node=X
#SBATCH --time=XX:XX:XX
#SBATCH --mem=Xgb

new_filename="${1%.xyz}.out"  # Output file: geometry name with the .xyz extension replaced by .out
python3 run_adc.py "$1" > "$new_filename"
```

### Running Multiple Calculations
To efficiently run calculations for all displaced geometries, you can use the following bash script. This script iterates over all normal modes and displacements, submitting a separate job for each geometry file. Ensure that the filename formatting matches the output of `cfour_displacements.py`.

#### run_everything.sh
```bash
#!/bin/bash
nmodes=x
min_disp=x
max_disp=x
step=x
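# Brace expansion ({1..$nmodes}) does not expand variables, so use a C-style loop.
for ((nmode=1; nmode<=nmodes; nmode++)); do
    # Bash arithmetic is integer-only: min_disp, max_disp, and step must be integers.
    for ((disp=$min_disp; disp<=$max_disp; disp+=$step)); do
        if [ "$disp" -lt 0 ]; then
            # %03d counts the minus sign in the width, e.g. -1 -> "-01"
            inpgeo=$(printf "geo_v%d_%03d.xyz" "$nmode" "$disp")
        else
            inpgeo=$(printf "geo_v%d_%02d.xyz" "$nmode" "$disp")
        fi
        sbatch run_sbatch_adcc.sh "$inpgeo"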
    done
done
```

**Note**: Adjust the values of `nmodes`, `min_disp`, `max_disp`, and `step` to match your specific calculation parameters. Also, verify that the filename formatting in the script (e.g., `geo_v%d_%03d.xyz` for negative displacements and `geo_v%d_%02d.xyz` for positive displacements) matches the naming convention used by `cfour_displacements.py`. For example, this script assumes filenames like `geo_v1_-01.xyz` and `geo_v1_01.xyz`.
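
Before submitting a large batch, it may be worth checking that every expected geometry file actually exists. The sketch below mirrors the `printf` formats discussed in the note. The function names are hypothetical, the displacement labels are assumed to be integers, and nothing here is part of the provided tools.

```python
from pathlib import Path
from typing import List


def expected_geometry_names(nmodes: int, min_disp: int, max_disp: int,
                            step: int) -> List[str]:
    """List the file names that the submission loop would try to submit."""
    names = []
    for mode in range(1, nmodes + 1):
        for disp in range(min_disp, max_disp + 1, step):
            # Same padding as the bash printf: "%03d" for negatives, "%02d" otherwise
            label = "%03d" % disp if disp < 0 else "%02d" % disp
            names.append(f"geo_v{mode}_{label}.xyz")
    return names


def missing_geometries(directory: str, names: List[str]) -> List[str]:
    """Return the names that are not present as files in the given directory."""
    root = Path(directory)
    return [name for name in names if not (root / name).is_file()]


if __name__ == "__main__":
    # Illustrative values only; use the settings of your own calculation.
    names = expected_geometry_names(nmodes=3, min_disp=-2, max_disp=2, step=1)
    for name in missing_geometries(".", names):
        print("missing:", name)
```

An empty result means the loop parameters and the file names on disk agree.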

### Extracting Data from ADC Output Files

1. **Configure and Run the Extraction Script**:
   - Open `Tools/extract_adcc.py` in a text editor.
   - Modify the configuration parameters as needed.
   - Run the script:
     ```bash
     python3 extract_adcc.py
     ```
   - This creates the data files `gs_v{mode}.dat` and `ce{state}_v{mode}.dat`. Each file consists of two columns: the displacement and the energy.
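
The resulting two-column files can be read back with NumPy for plotting or fitting. This is a minimal sketch, assuming the columns are whitespace-separated numbers with no header line; the function name `load_energy_curve` is hypothetical, and `gs_v1.dat` in the usage example is only an illustrative file name.

```python
import sys

import numpy as np


def load_energy_curve(path):
    """Load a two-column (displacement, energy) file, sorted by displacement."""
    # ndmin=2 keeps a 2D shape even if the file holds a single row
    data = np.loadtxt(path, ndmin=2)
    if data.shape[1] != 2:
        raise ValueError(f"{path}: expected 2 columns, found {data.shape[1]}")
    order = np.argsort(data[:, 0])
    return data[order, 0], data[order, 1]


if __name__ == "__main__":
    # Usage: python3 this_script.py gs_v1.dat
    displacements, energies = load_energy_curve(sys.argv[1])
    for q, e in zip(displacements, energies):
        print(f"{q:10.4f} {e:20.10f}")
```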
    