# PFAS Radicals: A Quantum Chemistry Perspective  
---
In this lab exercise, you will:  
  
>  
>**Module 1**  
>  
>1. Model the equilibrium geometry and IR frequencies of the CH3 radical and compare to the CF3 radical  
>2. Develop a Python function to parse data from ORCA output files  
>
>**Module 2**  
>  
>3. Scan the X-C-X bond angle from 120 degrees (trigonal planar) to 109.5 degrees (tetrahedral) and compare results between the CH3 radical and the CF3 radical  
>4. Develop a Python function to write and edit ORCA input files  
>
>**Module 3**  
>  
>5. Use Python to analyze results from your ORCA calculations  
>  
  
This module will cover items **3** and **4**.

## Module 2: Coordinate Scans  
---  
By now, you should have seen that the CH3 radical and the CF3 radical do not have the same symmetry. To explore this further, you will perform a relaxed coordinate scan along the X-C-X bond angle.  
  
## Coordinate Scans - The Basics  
Scanning bond lengths, bond angles, or dihedral angles is a common technique in computational chemistry. It is often used to locate transition states or estimate barriers to rotation, and most quantum chemistry packages implement some form of automated coordinate scan. Scans can be **rigid** or **relaxed**. A **rigid** scan changes only the requested bond length, angle, or dihedral at each step and holds every other coordinate fixed. A **relaxed** scan also steps the scanned coordinate, but at each step it lets all other coordinates "relax" to the lowest-energy structure consistent with the constraint. We will use relaxed scans so that the C-X bond lengths can change to minimize the energy of each structure.  

For your system, you will use an angle constraint to fix the X-C-X bond angle at the value you choose. Rather than using ORCA's built-in coordinate scan, we will generate a separate input file for each angle in the scan. This will exercise your matrix algebra and Python skills, and it removes the need for a new output file parsing function (ORCA coordinate scan output files use a different format). For most everyday uses, ORCA's built-in coordinate scan works well, and it runs faster because it automatically uses the previous step's wavefunction as the starting guess.
  
To add a geometric constraint, add the following block to your input file:  
  
```
%geom Constraints
    { A * 0 * C }
  end
end
```
  
This constraint holds constant (C) every angle (A) that has atom 0 as its vertex; atom 0 should be your carbon atom, since ORCA numbers atoms starting from 0. Running an **opt freq** calculation with this constraint lets the C-X bond lengths change to minimize the energy of the system while keeping the X-C-X bond angles fixed, so the bond angle of the starting geometry will also be the bond angle of the optimized geometry.  
To perform a bond angle scan, you will need to write a Python function that can generate starting geometries with any bond angle you select.  
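For this four-atom system, the wildcard constraint can be unpacked by hand. The sketch below lists the angles it covers, assuming the wildcard expands to every pair of other atoms around the vertex atom (our reading of the constraint, not ORCA's own code). `angles_at_vertex()` is a hypothetical helper introduced only for illustration:

```python
from itertools import combinations

def angles_at_vertex(vertex: int, n_atoms: int) -> list:
    """Hypothetical helper: (end, vertex, end) index triples for every angle
    centered on `vertex`, i.e. what a wildcard such as { A * 0 * C } refers to."""
    others = [i for i in range(n_atoms) if i != vertex]
    # each unordered pair of the remaining atoms defines one angle at the vertex
    return [(a, vertex, b) for a, b in combinations(others, 2)]

# carbon is atom 0, the three heteroatoms are atoms 1, 2 and 3
print(angles_at_vertex(0, 4))  # [(1, 0, 2), (1, 0, 3), (2, 0, 3)]
```

These are exactly the three X-C-X angles, which is why a single wildcard line is enough for the CH3 and CF3 radicals.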

## Creating starting geometries  
You will write a Python function that takes an X-C-X bond angle and returns the coordinates of each atom.  
You know that:  
* Your system contains 4 atoms  
* The equilibrium C-X bond lengths are known (from Module 1)  
* The C-X bonds are 120 degrees apart when projected onto the plane formed by the three X atoms, regardless of the X-C-X bond angle  
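The last point is what makes the construction in `generate_geometry()` work: the heteroatoms are placed on a unit circle in the xy plane and the carbon sits on the z axis, so the projected C-X bonds keep the same 120 degree spacing however far above the plane the carbon is. A quick check of the three in-plane positions used in `generate_geometry()` (the helper name `azimuths_deg()` is ours, for illustration):

```python
import numpy as np

# the same unit-circle heteroatom positions used in generate_geometry(), xy components only
in_plane = [np.array([1.0, 0.0]),
            np.array([-np.cos(np.deg2rad(60)), np.sin(np.deg2rad(60))]),
            np.array([-np.cos(np.deg2rad(60)), -np.sin(np.deg2rad(60))])]

def azimuths_deg(points) -> np.ndarray:
    """Azimuthal angle of each point in the xy plane, in degrees in [0, 360)."""
    return np.array([np.rad2deg(np.arctan2(p[1], p[0])) % 360 for p in points])

angles = np.sort(azimuths_deg(in_plane))
print(angles)           # approximately 0, 120, 240
print(np.diff(angles))  # approximately 120, 120
```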
  
As in the previous module, you will need to complete the missing custom functions `calculate_z_displacement()` and `scale_bond_lengths()` used in the following code.  
\*Note: the helper function `check_bond_angles()` has been written for you.

```python
def generate_geometry(bond_angle_deg : float, bond_length : float) -> list:

    # generate geometry using simple scaled diagram
    z_displacement = calculate_z_displacement(bond_angle_deg)
    assert z_displacement >= 0, "The z displacement is either nan or less than zero. Check that you are not taking the square root of a negative number."

    # place heteroatoms 120 degrees apart in the xy plane
    heteroatom_xyz_positions = [np.array([1, 0, 0]), np.array([-np.cos(np.deg2rad(60)), np.sin(np.deg2rad(60)), 0]), \
                                np.array([-np.cos(np.deg2rad(60)), -np.sin(np.deg2rad(60)), 0])]
    # place the carbon atom at the calculated z displacement
    carbon_xyz_position = np.array([0, 0, z_displacement])

    # adjust bond lengths to the equilibrium lengths
    heteroatom_xyz_positions = scale_bond_lengths(bond_length, carbon_xyz_position, heteroatom_xyz_positions)
    assert False not in check_bond_angles(bond_angle_deg, heteroatom_xyz_positions), \
        "The actual angle is not equal to the input angle. Check the z displacement calculation and ensure you are not changing the angles when scaling bond lengths."

    return heteroatom_xyz_positions
```
  
You may find the figure below provides a useful starting point:  

<img src="../images/scan_geometry.png" alt="geometry" style="width: 50%;"/>


##### Write your functions below  
Fill in the code with the needed elements. You should perform some hand calculations to test that it is working properly. Make sure to run the cells before proceeding.

In [None]:
import numpy as np

#### START YOUR CODE HERE ####
def calculate_z_displacement(bond_angle_deg : float) -> float:
    bond_angle_rad = np.deg2rad(bond_angle_deg)
    z_displacement =  # write this line
    return z_displacement
#### END YOUR CODE HERE ####

#### START YOUR CODE HERE ####
def scale_bond_lengths(bond_length : float, carbon_xyz_position : np.ndarray, heteroatom_xyz_position_list : list) -> list:
    new_bond_vectors = []
    for heteroatom_position in heteroatom_xyz_position_list:
        bond_vector = # write this line
        bond_unit_vector =  # write this line
        new_bond_vectors.append(bond_unit_vector * bond_length)
    return new_bond_vectors
#### END YOUR CODE HERE ####
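One way to do the hand checks suggested above is to measure the geometry directly. The sketch below uses a hypothetical helper, `describe_geometry()`, that reports every C-X length and every X-C-X angle; it assumes the vectors are C-to-X vectors with the carbon at the origin, which is how the input files later in this notebook place the carbon. It is checked here against a reference case whose answer is known: three directions taken from a regular tetrahedron, where every angle is arccos(-1/3), about 109.47 degrees.

```python
import numpy as np
from itertools import combinations

def describe_geometry(bond_vectors) -> tuple:
    """Hypothetical helper: return (C-X bond lengths, pairwise X-C-X angles in degrees).

    `bond_vectors` are C->X vectors, i.e. heteroatom positions with the carbon at the origin.
    """
    vectors = [np.asarray(v, dtype=float) for v in bond_vectors]
    lengths = [float(np.linalg.norm(v)) for v in vectors]
    angles = []
    for u, v in combinations(vectors, 2):
        cos_theta = np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))
        # clip guards against round-off pushing cos(theta) just outside [-1, 1]
        angles.append(float(np.rad2deg(np.arccos(np.clip(cos_theta, -1.0, 1.0)))))
    return lengths, angles

# reference case: tetrahedral directions scaled to a bond length of 1.31
tetra = [np.array(v) / np.sqrt(3) * 1.31 for v in ([1, 1, 1], [1, -1, -1], [-1, 1, -1])]
lengths, angles = describe_geometry(tetra)
print(lengths)  # three values equal to 1.31 (up to round-off)
print(angles)   # three values close to 109.47
```

Once your functions are complete and the next cell has been run, `describe_geometry(generate_geometry(109.5, 1.31))` should report three lengths of 1.31 and three angles close to 109.5 degrees.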

Run, but do not edit this code:

In [1]:
from math import isclose

def generate_geometry(bond_angle_deg : float, bond_length : float) -> list:

    # generate geometry using simple scaled diagram
    z_displacement = calculate_z_displacement(bond_angle_deg)
    assert z_displacement >= 0, "The z displacement is either nan or less than zero. Check that you are not taking the square root of a negative number."

    # place heteroatoms 120 degrees apart in the xy plane
    heteroatom_xyz_positions = [np.array([1, 0, 0]), np.array([-np.cos(np.deg2rad(60)), np.sin(np.deg2rad(60)), 0]), \
                                np.array([-np.cos(np.deg2rad(60)), -np.sin(np.deg2rad(60)), 0])]
    # place the carbon atom at the calculated z displacement
    carbon_xyz_position = np.array([0, 0, z_displacement])

    # adjust bond lengths to the equilibrium lengths
    heteroatom_xyz_positions = scale_bond_lengths(bond_length, carbon_xyz_position, heteroatom_xyz_positions)
    assert False not in check_bond_angles(bond_angle_deg, heteroatom_xyz_positions), \
        "The actual angle is not equal to the input angle. Check the z displacement calculation and ensure you are not changing the angles when scaling bond lengths."

    return heteroatom_xyz_positions

# helper function to check if the input bond angle is the same as our calculated bond angle
def check_bond_angles(bond_angle_deg : float, bond_vectors : list) -> list:
    n = len(bond_vectors)
    bond_vectors = bond_vectors + bond_vectors
    results = []
    for bond_index in range(n):
        dot_product = np.dot(bond_vectors[bond_index], bond_vectors[bond_index + 1])
        norms = np.linalg.norm(bond_vectors[bond_index]) * np.linalg.norm(bond_vectors[bond_index + 1])
        actual_bond_angle = np.rad2deg(np.arccos(dot_product / norms))
        results.append(isclose(bond_angle_deg, actual_bond_angle, abs_tol=4))
    return results
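`check_bond_angles()` compares each bond vector with the next one and wraps around at the end; concatenating the list with itself (`bond_vectors + bond_vectors`) is what supplies the wrap-around neighbor. This relies on `bond_vectors` being a Python list: with a NumPy array, `+` would add element-wise instead of concatenating. Note also that `abs_tol=4` accepts any angle within 4 degrees of the target, so the check catches gross errors rather than small ones. A small illustration (the helper `cyclic_pairs()` is ours, not part of the notebook):

```python
import numpy as np

def cyclic_pairs(n: int) -> list:
    """Index pairs (i, (i + 1) % n): the neighbors check_bond_angles() compares."""
    return [(i, (i + 1) % n) for i in range(n)]

print(cyclic_pairs(3))  # [(0, 1), (1, 2), (2, 0)]

# list + list concatenates, which is what check_bond_angles() relies on ...
vectors = [np.array([1.0, 0.0, 0.0]), np.array([0.0, 1.0, 0.0])]
print(len(vectors + vectors))  # 4
# ... whereas array + array adds element-wise and keeps the original shape
print((np.array(vectors) + np.array(vectors)).shape)  # (2, 3)
```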

## Writing input files  
Now that we can calculate the xyz coordinates of our atoms for any bond angle, we need to write an input file for each bond angle.  
  
We can start by making the text for the input file using the following code (do not edit this code):  

In [None]:
def make_input_file_text(heteroatom : str, bond_angle_deg : float, bond_length : float) -> str:

    # generate coordinates with your custom function
    heteroatom_xyz_positions = generate_geometry(bond_angle_deg, bond_length)

    # format coordinates
    formatted_lines = []
    for heteroatom_coordinates in heteroatom_xyz_positions:
        atom_line = heteroatom + " " + " ".join(([ "{:0.10f}".format(coordinate) for coordinate in heteroatom_coordinates]))
        formatted_lines.append(atom_line)
    heteroatom_section = "\n".join(formatted_lines)

    angle = "{:0.1f}".format(bond_angle_deg)

    input_file_text = (f"""
    ! UKS TightSCF wB97x-D3 def2-TZVPD xyzfile opt freq

    # C{heteroatom}3 radical {angle} degrees fixed

    %geom Constraints
            {{ A * 0 * C }}
        end
    end
    * xyz 0 2
    C 0.00 0.00 0.00
    {heteroatom_section}
    *
    """).strip()
    
    input_file_text = format_text_indentation(input_file_text)

    return input_file_text


# helper function to make the output files look pretty
def format_text_indentation(text : str) -> str:
    newlines= []
    for line in text.split("\n"):
        if line[:4] == ' '*4 :
            newlines.append(line[4:])
        else:
            newlines.append(line)
    return "\n".join(newlines)
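Why a custom helper rather than the standard library's `textwrap.dedent()`? After `.strip()`, the first line of the template has no indentation, and the interpolated `heteroatom_section` lines after the first carry none either, so the lines no longer share a common indent. `textwrap.dedent()` only removes whitespace common to every non-blank line, so it would likely leave this text unchanged. A small illustration with a simplified stand-in for the generated text (`strip_four_spaces()` mirrors the idea of `format_text_indentation()`):

```python
import textwrap

def strip_four_spaces(text: str) -> str:
    """Same idea as format_text_indentation(): drop exactly 4 leading spaces where present."""
    return "\n".join(line[4:] if line.startswith(" " * 4) else line for line in text.split("\n"))

# first line already stripped, template lines indented by 4 spaces, and an
# interpolated multi-line value whose second line has no indentation
text = "* xyz 0 2\n    C 0.00 0.00 0.00\n    F 1.0 0.0 0.0\nF 0.0 1.0 0.0\n    *"

print(textwrap.dedent(text) == text)  # True: no common indent, so nothing is removed
print(strip_four_spaces(text))
```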

You may notice that the input file text looks a little different from the examples shown earlier. For one, the geometry constraints section uses doubled curly braces (`{{ }}`). Inside a Python f-string, a doubled brace produces a single literal brace, so Python does not try to substitute the text inside as it does with `{heteroatom}` and `{angle}`; only single braces appear in the final text. Try generating an input file below:

In [None]:
print(make_input_file_text("F", 109.5, 1.31))
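The brace escaping used in `make_input_file_text()` can be seen in isolation. This sketch contrasts a doubled-brace literal with a single-brace placeholder, and shows what happens if the constraint were written with single braces:

```python
heteroatom = "F"

# doubled braces are emitted as single literal braces, with no substitution
constraint_line = f"{{ A * 0 * C }}"
# single braces are replaced by the value of the expression inside them
comment_line = f"# C{heteroatom}3 radical"

print(constraint_line)  # { A * 0 * C }
print(comment_line)     # # CF3 radical

# with single braces, Python tries to evaluate `A * 0 * C` as code
try:
    f"{ A * 0 * C }"
except NameError as err:
    print("NameError:", err)
```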

Now that we have the input file text, we need to write it to a file. The function below does this; files are written to the directory where you run this code. Run this cell, but do not edit the code.

In [None]:
def write_input_file(heteroatom : str, bond_angle_deg : float, bond_length : float):

    # create a meaningful filename
    filename = f"C{heteroatom}3_radical_{bond_angle_deg}_degrees.inp"
    
    # make the input file text
    input_file_text = make_input_file_text(heteroatom, bond_angle_deg, bond_length)

    with open(filename, "w") as f:
        f.write(input_file_text)

Test that the function works in the cell below:

In [None]:
# make an input file for the CH3 radical at a bond angle of 100 degrees here:
write_input_file()

A new file should have appeared in your current directory. 

We can now write an input file for any bond angle we want!  
There is one final step before we can start submitting calculations: automatically generating the files for your chosen scan.  

##### Write your function below  
You will write a function that, like `write_input_file()`, takes the heteroatom and bond length. Instead of a single bond angle, it takes the minimum and maximum angles you want to scan over and the step size.  

In [None]:
#### START YOUR CODE HERE ####
def write_coordinate_scan_input_files(heteroatom, bond_length, low : float, high : float, step : float):
    # look up the documentation for np.arange() - this will be useful
    scan_angles = # write this line
    for angle in scan_angles:
        write_input_file(heteroatom, angle, bond_length)
#### END YOUR CODE HERE ####
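When reading the `np.arange()` documentation, note that the stop value is excluded, so a scan that should end exactly at its maximum angle needs care. The NumPy documentation also suggests `np.linspace()` for non-integer steps, since it takes a number of points instead of a step and includes the endpoint by default. A sketch on generic numbers (not your scan values):

```python
import numpy as np

step = 0.25
# np.arange(start, stop, step) stops *before* stop: 0, 0.25, 0.5, 0.75
excluded = np.arange(0.0, 1.0, step)
# one common idiom to include the endpoint: extend stop by half a step
included = np.arange(0.0, 1.0 + step / 2, step)
# np.linspace takes a number of points and includes the endpoint by default
spaced = np.linspace(0.0, 1.0, 5)

print(excluded)
print(included)
print(spaced)
```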

### Putting it all together  
---
Assuming you have correctly implemented the custom functions and run the cells above, you should be able to generate the ORCA input files for all of the jobs in your scan. 
  
**Generate the input files using the cell below and submit your calculations to ORCA.** When your calculations are complete, copy the output files to a common folder and continue with this notebook. Your instructor may have guidance on which angles to scan and an appropriate step size.

In [None]:
# generate your orca input files here
write_coordinate_scan_input_files("F", )
write_coordinate_scan_input_files("H", )

With your output files in a common folder, you can parse the data from each file using the `parse_outfile()` function you wrote in the previous module. Rather than applying this function manually to each file, we can write a helper function that loops through all of the output files in the folder and writes the data to a single file. Run the cells below to continue, but do not edit the code.

In [10]:
import os
import sys; sys.path.insert(0, '..') # required for relative path import into a notebook
from utilities import module1_functions as m1 # import the function you wrote from the previous module
import pandas as pd

def parse_outfiles_from_folder(folderpath : str) -> list:
    folder_data = []
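    files = sorted(os.listdir(folderpath))
    # keep only files whose names end in .out (ORCA output files)
    outfiles = [name for name in files if name.endswith(".out")]
    if not outfiles:
        raise FileNotFoundError(f"No .out files found in {folderpath}")
    for file in outfiles:
        print(f"NOW PARSING {file}")
        filepath = os.path.join(folderpath, file)
        data_names, data = m1.parse_outfile(filepath) # use the parsing function from the previous module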
        file_data = [file] + data
        folder_data.append(file_data)
    print("PARSING COMPLETE")
    folder_data = [['filename', *data_names]] + folder_data
    return folder_data

def write_data_to_csv(folder_data : list):
    dataframe = pd.DataFrame(folder_data[1:], columns=folder_data[0])
    dataframe.sort_values(by=['heteroatom', 'bond_angle[deg]'], inplace=True)
    dataframe.to_csv("./summary_data.csv", index=False)

To use the helper functions above, run the code below. If the folder containing your output files is in the same directory as this notebook, replace "outfiles/" with that folder's name; otherwise, enter the full path to the folder.

In [None]:
data = parse_outfiles_from_folder("./outfiles/")
write_data_to_csv(data)

This should create a new file in your current directory called "summary_data.csv" that contains all of the data parsed from every file in the scan.
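To confirm the summary file looks right before moving on, you can read it back with pandas and split it by heteroatom. This is a minimal sketch, assuming the CSV has the `heteroatom` and `bond_angle[deg]` columns that `write_data_to_csv()` sorts on; `load_scan_summary()` is a hypothetical helper, and the dictionary keys will be whatever heteroatom labels `parse_outfile()` wrote:

```python
import pandas as pd

def load_scan_summary(csv_path: str = "./summary_data.csv") -> dict:
    """Hypothetical helper: read the scan summary and return one DataFrame
    per heteroatom, each sorted by bond angle."""
    summary = pd.read_csv(csv_path)
    return {
        atom: group.sort_values("bond_angle[deg]").reset_index(drop=True)
        for atom, group in summary.groupby("heteroatom")
    }

# usage, after summary_data.csv has been written:
# scans = load_scan_summary()
# for atom, frame in scans.items():
#     print(atom, len(frame), "points")
```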

#### Congratulations!  
You have completed module 2. Please proceed to [module 3](./module3_analysis.ipynb).