# Ground State Energy Estimation for Photosynthesis with Quantum Circuits

Enhancing our understanding of artificial photosynthesis could help address the ever-growing energy demand and the challenges of climate change. Artificial photosynthesis aims to replicate and optimize the natural process of converting sunlight, water, and carbon dioxide into energy-rich fuels, resulting in a more sustainable and carbon-neutral energy cycle.

Artificial photosynthesis is a multi-step process that begins with the absorption of sunlight, leading to charge separation and the oxidation of water ($H_2O$), which produces oxygen ($O_2$), protons ($H^+$), and electrons ($e^-$).

The electrons and protons extracted in the previous step are used for $CO_2$ reduction to facilitate the production of fuels. This involves reducing $CO_2$ into CO, hydrocarbons such as methane, or carbohydrates such as glucose.

Water oxidation reaction:
$$ 2H_2O \rightarrow O_2 + 4(H^+) + 4e^- $$

$CO_2$ reduction reactions:

\begin{gather*}
\text{Reduction I}:   CO_2 + 2(H^+) + 2e^- \rightarrow CO + H_2O \\
\text{Reduction II}:  CO_2 + 6(H^+) + 6e^- \rightarrow CH_3OH + H_2O \\
\text{Reduction III}: CO_2 + 8(H^+) + 8e^- \rightarrow CH_4 +2H_2O
\end{gather*}

In this notebook, we illustrate the process of catalyst down-selection for the water oxidation reaction. The ultimate goal is to estimate the ground-state electronic energy of each transition state in order to calculate the activation energy barriers along the reaction pathway. The activation energy is crucial for determining the rate constant of the reaction through the Arrhenius law.
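
For reference, the Arrhenius law can be written as follows. The pre-exponential factor $A$, the gas constant $R$, and the temperature $T$ are standard symbols that are assumed here and do not appear elsewhere in this notebook; $E_a$ denotes the activation energy:

$$ k = A \exp\left(-\frac{E_a}{RT}\right) $$

Because the rate constant $k$ depends exponentially on $E_a$, small errors in the estimated energies are amplified in the predicted rate. At room temperature, an error of about 1 kcal/mol (roughly 1.6 millihartree) in $E_a$ changes $k$ by a factor of about five.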

There are five main steps to estimating the ground state energy of the reaction:

1. Generate the electronic hamiltonian in the second-quantized form (also known as molecular hamiltonian) for each state of the reaction pathway. Using a second-quantized formulation results in a simpler hamiltonian as it's easier to impose particle symmetry. Additionally, it's much easier to prepare the initial states on a quantum computer. However, recent work indicates that there may be advantages in gate complexity using a first quantization formulation [[1]](https://arxiv.org/abs/2105.12767). 
2. Prepare an initial state that has sufficient overlap with the true ground state, which boosts the success probability of phase estimation for the reactive system. One way to achieve this is through a linear combination of the Hartree-Fock (HF) state and selected configuration interaction singles (CIS) states. Initializing in this fashion should align the initial state with a sufficiently low-energy subspace of the quantum system. The product states involved can be prepared by applying local unitary rotations, yielding a state such as |0..011..1>.
3. Map the Fermionic operators to qubit operators, which yields a Hamiltonian represented in the Pauli basis. This can be achieved through methods such as the Jordan-Wigner, Bravyi-Kitaev, or parity encodings. For this notebook, we use the Jordan-Wigner transformation, which is handled by openfermion.
4. Perform Ground State Energy Estimation on the mapped qubit hamiltonian on a quantum computer to estimate the ground state energies of the transition states along the reaction pathway.
5. Compute activation energies between the transition states. 
   
Here, we present an illustration of the workflow above for the example of water oxidation catalyzed by $Co_2O_9H_{12}$ [[2]](https://doi.org/10.1021/jp511805x).

In [1]:
import re
import sys
import time
import cirq
import numpy as np
from dataclasses import dataclass
from openfermionpyscf import run_pyscf
from openfermion.chem import MolecularData
from pyLIQTR.PhaseEstimation.pe import PhaseEstimation
from openfermion.ops.representations import InteractionOperator
from qca.utils.utils import extract_number, gen_resource_estimate, GSEEMetaData
from qca.utils.algo_utils import gsee_resource_estimation



# Hamiltonian Generation
First, we will define the functions necessary to grab the charge, multiplicity, and number of atoms for some molecule within a desired pathway that was specified from a catalyst of interest. Once we have such information, we can then use it to construct a molecular hamiltonian along the reaction pathway of interest. 

Our input is a file in the XYZ format, which is used for describing molecular geometries. Each geometry in the file begins with a line giving the number of atoms, followed by a comment line that carries the molecule's charge and multiplicity and, optionally, a name for the geometry. When that name contains a number, it is read as the geometry's order along the pathway. Recall the following definitions:
- The charge is an integer giving the total molecular charge.
- The multiplicity is an integer giving the spin multiplicity, $2S+1$, which is the number of possible orientations of the spin angular momentum for a given total spin quantum number $S$.

This is followed by one line per atom containing the element's symbol and its Cartesian coordinates, separated by spaces, tabs, or commas. This notebook ships with an XYZ file that describes water oxidation using $Co_2O_9H_{12}$ as a catalyst. Such files can be found in the data/ directory of this repository. The functions defined below accept any pathway, provided the user specifies the path to the XYZ file of interest.
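
To make the assumed layout concrete, here is a minimal, self-contained sketch that parses a single frame. The sample text, the `step 14` label, and the helper name `parse_frame` are hypothetical and written only for illustration; they are not taken from the data file shipped with the notebook. The charge pattern accepts an optional minus sign so that anions are handled.

```python
import re

# Hypothetical two-atom frame (a hydroxide ion), used only to illustrate the layout.
SAMPLE_XYZ = """\
2
charge = -1, multiplicity = 1, step 14
O 0.000 0.000 0.000
H 0.000 0.000 0.970
"""

def parse_frame(text: str) -> dict:
    """Parse one XYZ frame: atom count, comment line, then one line per atom."""
    lines = text.strip().splitlines()
    n_atoms = int(lines[0].split()[0])
    header = lines[1]
    charge = re.search(r"charge\s*=\s*(-?\d+)", header)
    multiplicity = re.search(r"multiplicity\s*=\s*(\d+)", header)
    atoms = []
    for line in lines[2:2 + n_atoms]:
        # Treat commas as whitespace so both separators are accepted.
        symbol, x, y, z = line.replace(",", " ").split()[:4]
        atoms.append((symbol, (float(x), float(y), float(z))))
    return {
        "n_atoms": n_atoms,
        "charge": int(charge.group(1)) if charge else 0,
        "multiplicity": int(multiplicity.group(1)) if multiplicity else 0,
        "atoms": atoms,
    }

frame = parse_frame(SAMPLE_XYZ)
print(frame["charge"], frame["multiplicity"], frame["atoms"][1])
```

The functions in the following cells do the same job for files containing many frames and additionally filter the frames by their order along the pathway.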

In [2]:
t_init = time.perf_counter()
def grab_line_info(current_line:str):
    multiplicity = 0
    charge = 0
    multiplicity_match = re.search(r"multiplicity\s*=\s*(\d+)", current_line)
    if multiplicity_match:
        multiplicity = int(multiplicity_match.group(1))
    # Allow an optional minus sign so that negatively charged species are parsed correctly.
    charge_match = re.search(r"charge\s*=\s*(-?\d+)", current_line)
    if charge_match:
        charge = int(charge_match.group(1))
    return multiplicity, charge

def grab_pathway_info(data: list[str], nat:int, current_line:str, coord_pathways:list, current_idx:int):
    coords_list = []
    multiplicity, charge = grab_line_info(current_line)
    coords_list.append([nat, charge, multiplicity])
    for point in range(nat):
        data_point = data[current_idx+1+point].split()
        aty = data_point[0]
        xyz = [float(data_point[i]) for i in range(1,4)]
        coords_list.append([aty, xyz])
    coord_pathways.append(coords_list)

In [3]:
# Given some xyz file and a pathway of interest, grab the information of interest
def load_pathway(fname:str, pathway:list[int]=None) -> list:
    with open(fname, 'r') as f:
        coordinates_pathway = []
        data = f.readlines()
        data_length = len(data)
        idx = 0
        while idx < data_length:
            line = data[idx]
            if 'charge' in line or 'multiplicity' in line:
                geo_name = ''
                if len(line.split(',')) > 2:
                    geo_name = line.split(',')[2]
                nat = int(data[idx-1].split()[0])
                if geo_name and pathway:
                    order = extract_number(geo_name)
                    if order and order in pathway:
                        grab_pathway_info(data, nat, line, coordinates_pathway, idx)
                else:
                    grab_pathway_info(data, nat, line, coordinates_pathway, idx)
                idx += nat + 2
            else:
                idx += 1
    return coordinates_pathway

We then define the parameters for generating the electronic hamiltonian along a reaction pathway. The Python-based Simulations of Chemistry Framework (PySCF) is an open-source collection of electronic structure modules, which we access through an openfermion plugin called openfermionpyscf. This plugin is what generates the molecular hamiltonian.

The calculation parameters are used to indicate whether we want to perform a specific calculation. They are as follows:
- run_scf: boolean flag to indicate running an SCF calculation
  - Self-consistent field methods to describe many-body problems
  
- run_mp2: boolean flag to indicate running an MP2 calculation
  - Second-order Møller–Plesset perturbation theory, a post-Hartree-Fock method that adds electron correlation effects by means of second-order Rayleigh–Schrödinger perturbation theory to describe many-body problems
  
- run_cisd: boolean flag to indicate running a CISD calculation
  - A post-Hartree-Fock linear variational method for solving many-body problems
  
- run_ccsd: boolean flag to indicate running a CCSD calculation
  - A post-Hartree-Fock numerical technique for describing many-body problems
  
- run_fci: boolean flag to indicate running an FCI calculation
  - A linear variational approach to provide solutions to the time-independent, non-relativistic Schrödinger equation

Additionally, we need to choose the basis set for our molecular hamiltonian. Several basis sets are available, but to minimize computational cost we choose 'sto-3g', a common minimal basis set that is the cheapest to compute.

The active space in a molecule refers to a subset of orbitals that are considered to be energetically important in describing the electronic structure and properties of the molecule. In the selection of an active space, the key principle is that all strongly correlated orbitals must be identified and included. Given its selection, we can effectively reduce the number of configurations in the wavefunction expansion, thus, reducing the computational complexity of the molecular hamiltonian alongside its corresponding qubit hamiltonian. 

Because this notebook is an example use case, we introduce a variable, `active_space_frac`, to shrink the active space of the molecular hamiltonian. We set it to ten, which keeps roughly one tenth of the occupied orbitals and one tenth of the unoccupied orbitals, namely those closest to the boundary between the occupied and unoccupied orbitals.
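
As a rough illustration of this reduction, the sketch below reproduces the index arithmetic that the Hamiltonian-generation cell applies further down. The helper name `active_space_window` is hypothetical; the electron and orbital counts are the ones printed for the first state of the pathway.

```python
def active_space_window(n_electrons: int, n_orbitals: int, frac: int) -> tuple[int, int]:
    """Return the (start, stop) orbital indices of the active space."""
    nocc = n_electrons // 2          # doubly occupied spatial orbitals
    nvir = n_orbitals - nocc         # unoccupied (virtual) spatial orbitals
    start = nocc - nocc // frac      # keep the highest 1/frac of the occupied orbitals
    stop = nocc + nvir // frac       # keep the lowest 1/frac of the virtual orbitals
    return start, stop

start, stop = active_space_window(n_electrons=148, n_orbitals=100, frac=10)
n_active_orbitals = stop - start
n_qubits = 2 * n_active_orbitals     # two spin orbitals per spatial orbital
print(start, stop, n_active_orbitals, n_qubits)  # 67 76 9 18
```

A system with 100 spatial orbitals (200 qubits) is thereby reduced to 9 active orbitals, or 18 qubits. Orbitals below the window are treated as frozen and fully occupied.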

For ease of use for running the notebook, the pathway provided for water oxidation via a $Co_2O_9H_{12}$ catalyst results in a simple and minimal solution. There are other pathways that one can choose from, but the pathway specified is sufficient for our example use case and has the benefit of fast compilation. 


In [4]:
molecular_hamiltonians = []
pathway = [1, 14, 15, 16, 24, 25, 26, 27]
# water oxidation via the Co2O9H12 catalyst.
coordinates_pathway = load_pathway('../data/water_oxidation_Co2O9H12.xyz', pathway=pathway)

# Set calculation parameters.
run_scf = True
run_mp2 = False
run_cisd = False
run_ccsd = False
run_fci = False

# Set molecule parameters.
basis = 'sto-3g'
active_space_frac = 10

In [5]:
@dataclass
class molecular_info:
    """Class for keeping track of information for a given state in the molecular orbital basis"""
    occupied_qubits: int
    unoccupied_qubits: int
    initial_state: list[int]
    hf_energy:float
    molecular_hamiltonian: InteractionOperator

In [6]:
def generate_electronic_hamiltonians(coordinates_pathway:list) -> list:
    molecular_hamiltonians = []
    for idx, coords in enumerate(coordinates_pathway):
        t_coord_start = time.perf_counter()
        _, charge, multi = [int(coords[0][j]) for j in range(3)]

        # set molecular geometry in pyscf format
        geometry = []
        for coord in coords[1:]:
            atom = (coord[0], tuple(coord[1]))
            geometry.append(atom)
        
        molecule = MolecularData(geometry=geometry,
                                 basis=basis,
                                 multiplicity=multi,
                                 charge=charge,
                                 description='catalyst')
        t0 = time.perf_counter()
        molecule = run_pyscf(molecule,
                             run_scf=run_scf,
                             run_mp2=run_mp2,
                             run_cisd=run_cisd,
                             run_ccsd=run_ccsd,
                             run_fci=run_fci)
        t1 = time.perf_counter()

        print(f'Time to perform a HF calculation on molecule {idx} : {t1-t0}')
        print(f'Number of orbitals          : {molecule.n_orbitals}')
        print(f'Number of electrons         : {molecule.n_electrons}')

        print(f'Number of qubits            : {molecule.n_qubits}')
        print(f'Hartree-Fock energy         : {molecule.hf_energy}')
        sys.stdout.flush()

        nocc = molecule.n_electrons // 2
        nvir = molecule.n_orbitals - nocc

        percent_occupied = nocc/molecule.n_orbitals
        percent_unoccupied = nvir/molecule.n_orbitals

        print(f'Number of unoccupied Molecular orbitals are: {nvir}')
        print(f'Number of occupied Molecular orbitals are: {nocc}')
        sys.stdout.flush()

        # get molecular Hamiltonian
        active_space_start =  nocc - nocc // active_space_frac # start index of active space
        active_space_stop = nocc + nvir // active_space_frac   # end index of active space

        print(f'active_space start : {active_space_start}')
        print(f'active_space stop  : {active_space_stop}')
        sys.stdout.flush()

        molecular_hamiltonian = molecule.get_molecular_hamiltonian(
            occupied_indices=range(active_space_start),
            active_indices=range(active_space_start, active_space_stop)
        )
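        # Approximate the occupied/unoccupied split of the active-space qubits by scaling
        # the full-system orbital fractions; this need not equal the exact number of
        # occupied spin orbitals inside the active window.
        molecular_occupied = round(percent_occupied*molecular_hamiltonian.n_qubits)
        molecular_unoccupied = round(percent_unoccupied*molecular_hamiltonian.n_qubits)
        initial_state = [0]*molecular_unoccupied + [1]*molecular_occupied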
       
        
        print(f'In the Molecular Orbital Basis: we have {molecular_hamiltonian.n_qubits} qubits')
        print(f'In the Molecular Orbital Basis: we have {molecular_occupied} qubits occupied')
        print(f'In the Molecular Orbital Basis: we have {molecular_unoccupied} qubits unoccupied')
        
        # shifted by HF energy
        molecular_hamiltonian -= molecule.hf_energy
        mi = molecular_info(occupied_qubits=molecular_occupied,
                            unoccupied_qubits=molecular_unoccupied,
                            initial_state=initial_state,
                            hf_energy=molecule.hf_energy,
                            molecular_hamiltonian=molecular_hamiltonian)
        molecular_hamiltonians.append(mi)
        t_coord_end = time.perf_counter()
        print(f'Time to generate a molecular hamiltonian for molecule {idx} : {t_coord_end-t_coord_start}\n')
    return molecular_hamiltonians


if coordinates_pathway:
    molecular_hamiltonians = generate_electronic_hamiltonians(coordinates_pathway)

Time to perform a HF calculation on molecule 0 : 44.94567812496098
Number of orbitals          : 100
Number of electrons         : 148
Number of qubits            : 200
Hartree-Fock energy         : -3479.3603932694523
Number of unoccupied Molecular orbitals are: 26
Number of occupied Molecular orbitals are: 74
active_space start : 67
active_space stop  : 76
In the Molecular Orbital Basis: we have 18 qubits
In the Molecular Orbital Basis: we have 13 qubits occupied
In the Molecular Orbital Basis: we have 5 qubits unoccupied
Time to generate a molecular hamiltonian for molecule 0 : 44.95591650001006

Time to perform a HF calculation on molecule 1 : 59.59993424999993
Number of orbitals          : 99
Number of electrons         : 147
Number of qubits            : 198
Hartree-Fock energy         : -3478.738701637257
Number of unoccupied Molecular orbitals are: 26
Number of occupied Molecular orbitals are: 73
active_space start : 66
active_space stop  : 75
In the Molecular Orbital Basis: we

# Initial State Preparation
Next, we need a circuit that prepares an initial estimate of the ground state for each state along the reaction pathway, with a high degree of overlap with the actual ground state. Here, a Hartree-Fock (HF) calculation provides a good initial approximation. We obtain the initial state for each state along the reaction pathway as follows:

1. When generating a state's molecular hamiltonian, we perform an HF calculation to obtain the state's canonical orbitals.
2. We use these canonical orbitals to transform the Fermionic Hamiltonian from the atomic orbital basis to the molecular orbital basis.
3. In the molecular orbital representation, the HF state is |00..011..1>, where the numbers of 0s and 1s equal the numbers of unoccupied and occupied molecular orbitals, `nvir` and `nocc`, respectively. To prepare this state on a quantum computer, we only need to apply bit-flip (X) gates to the `nocc` occupied qubits, which results in a depth-1 circuit of X gates.
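
The bookkeeping behind step 3 can be sketched in plain Python, without any quantum library. The helper names `hf_occupation` and `hf_bit_flip_targets` are hypothetical, and the ordering (unoccupied qubits first, occupied qubits last) is assumed from the |00..011..1> notation above. The qubit counts are those printed for the first state of the pathway.

```python
def hf_occupation(n_qubits: int, n_unoccupied: int) -> list[int]:
    """Occupation list in the |0..01..1> ordering: zeros first, then ones."""
    return [0] * n_unoccupied + [1] * (n_qubits - n_unoccupied)

def hf_bit_flip_targets(n_qubits: int, n_unoccupied: int) -> list[int]:
    """Indices of the qubits that need an X gate to prepare the HF state from |0...0>."""
    return list(range(n_unoccupied, n_qubits))

state = hf_occupation(n_qubits=18, n_unoccupied=5)
targets = hf_bit_flip_targets(n_qubits=18, n_unoccupied=5)
print(state)
print(targets)
```

Every X gate acts on a different qubit, so all of them can be applied in a single layer, which is why the circuit has depth 1.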

For visualization, the next cell shows the circuit that prepares the initial state, alongside its resource estimates.

In [7]:
def prepare_initial_state(mi: molecular_info) -> cirq.Circuit:
    circuit = cirq.Circuit()
    bit_flips = [cirq.X(cirq.LineQubit(qb)) for qb in range(mi.molecular_hamiltonian.n_qubits-1,
                                                            mi.unoccupied_qubits-1,
                                                            -1)]
    circuit.append(bit_flips)
    return circuit

# grab an intermediate stage for constructing the circuit for preparing the initial state
intermediate_idx = len(molecular_hamiltonians)//2
state_prep_circ = prepare_initial_state(molecular_hamiltonians[intermediate_idx])
state_prep_re = gen_resource_estimate(state_prep_circ, is_extrapolated=False)
print(state_prep_re)
print(state_prep_circ)

{'num_qubits': 13, 't_count': 0, 'circuit_depth': 1, 'gate_count': 13, 't_depth': 0, 'clifford_count': 13}
5: ────X───

6: ────X───

7: ────X───

8: ────X───

9: ────X───

10: ───X───

11: ───X───

12: ───X───

13: ───X───

14: ───X───

15: ───X───

16: ───X───

17: ───X───


# Ground State Energy Estimation (GSEE)
Once the initial state is prepared, we can perform Quantum Phase Estimation (QPE) to estimate the ground state energy of each molecular hamiltonian along the reaction pathway. The initial state used is the Hartree-Fock state of each molecular hamiltonian.

For this example, we choose an energy precision of 1 millihartree, which requires up to 10 bits of precision. We use a second-order Suzuki-Trotter decomposition to approximate the time evolution. To keep the problem simple, we use a short evolution time and a single Trotter step, which comes at the cost of a higher error. Scaling arguments are used to determine the final resources, since generating the full circuit for a large number of Trotter steps with many bits of precision is costly and increases the compilation time: the circuit depth scales linearly with the number of Trotter steps and as a power of two in the number of bits of precision.
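
The scaling argument can be made concrete with a short sketch. It assumes textbook QPE, in which the k-th precision bit controls $U^{2^k}$ and each power is built by repeating a single Trotterized $U$. Whether the resource estimation routine used below counts operations in exactly this way is not stated in this notebook, and the function names are hypothetical.

```python
def controlled_unitary_applications(bits_precision: int) -> int:
    """Number of controlled-U applications when bit k controls U^(2^k)."""
    return sum(2**k for k in range(bits_precision))

def total_trotter_steps(bits_precision: int, trotter_steps: int) -> int:
    """Trotter steps over the whole QPE circuit: linear in steps, exponential in bits."""
    return trotter_steps * controlled_unitary_applications(bits_precision)

bits = 10
phase_resolution = 2**-bits  # smallest phase increment resolvable with `bits` bits

print(controlled_unitary_applications(bits))  # 1023
print(total_trotter_steps(bits, trotter_steps=1))
print(phase_resolution)
```

Under these assumptions, compiling one Trotter step and multiplying its cost by $2^{10} - 1 = 1023$ gives the cost of the full 10-bit circuit, which avoids generating the full circuit explicitly.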

An alternative approach to choosing these parameters is to use openfermion to determine the number of Trotter steps needed to estimate the ground state accurately, which can result in better GSEE circuits.

Note that a recent pyLIQTR release performs QPE with Quantum Signal Processing (QSP) as a subroutine. This could improve the resource estimates, but it has yet to be explored here.

Additionally, we take a quarter of the Hartree-Fock energy (in magnitude) as an estimate of the minimum energy, which is used to calculate the phase offset needed when estimating the ground state energy of a given stage of the reaction pathway. This is handled by the `grab_molecular_phase_offset` function.

Though not shown explicitly here, when we pass the molecular hamiltonian as an argument to the circuit generation routine, pyLIQTR performs a Jordan-Wigner transformation on it. This operation maps the fermionic operators to qubit operators, which allows us to apply quantum algorithms to the hamiltonian.
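
For reference, the standard Jordan-Wigner mapping sends the fermionic creation and annihilation operators on mode $j$ to Pauli strings as shown below. Here $|1\rangle$ is taken to mean "occupied"; the mode ordering and sign conventions used internally by the libraries may differ from this textbook form.

\begin{gather*}
a_j^\dagger \mapsto \frac{1}{2}\left(X_j - iY_j\right) \prod_{k<j} Z_k \\
a_j \mapsto \frac{1}{2}\left(X_j + iY_j\right) \prod_{k<j} Z_k \\
a_j^\dagger a_j \mapsto \frac{1}{2}\left(I - Z_j\right)
\end{gather*}

Each qubit stores the occupation of one spin orbital, and the string of $Z$ operators preserves the fermionic anticommutation relations.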

Once a circuit is generated to estimate the ground state energy for each molecular hamiltonian, we translate the circuit to a fault-tolerant gate set, i.e., the Clifford + T gate set, to obtain its resource estimates. The resource estimates for each molecular hamiltonian are written to a JSON file.

In [8]:
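def grab_molecular_phase_offset(hf_energy: float):
    # Use a quarter of |E_HF| as the lower bound of the energy window.
    E_min = -abs(0.25 * hf_energy)
    E_max = 0
    omega = E_max - E_min  # width of the energy window
    t = 2*np.pi/omega      # time for which omega * t spans a full 2*pi phase range
    # With E_max = 0, the returned offset evaluates to 0.
    return E_max * t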

In [10]:
trotter_order = 2
trotter_steps = 1
bits_precision = 10
ev_time = 1

gse_args = {
    'trotterize' : True,
    'ev_time'    : ev_time,
    'trot_ord'   : trotter_order,
    'trot_num'   : trotter_steps
}
resource_estimates = []

for idx, molecular_hamiltonian_info in enumerate(molecular_hamiltonians):
    molecular_hamiltonian = molecular_hamiltonian_info.molecular_hamiltonian
    molecular_hf_energy = molecular_hamiltonian_info.hf_energy
    
    n_qubits = molecular_hamiltonian.n_qubits
    gse_args['mol_ham'] = molecular_hamiltonian
    phase_offset = grab_molecular_phase_offset(molecular_hf_energy)
    init_state = molecular_hamiltonian_info.initial_state


    #TODO: Figure out Phase offset in metadata
    molecular_metadata = GSEEMetaData(
        id = time.time_ns(),
        name=f'Co2O9H12_{idx}',
        category='scientific',
        size=f'{n_qubits} qubits',
        task='Ground State Energy Estimation', 
        bits_precision=bits_precision,
        evolution_time=ev_time,
        nsteps=trotter_steps,
        trotter_order=trotter_order,
    )

    t0 = time.perf_counter()
    molecular_gse = gsee_resource_estimation(
        outdir='GSE/Quantum_Chemistry/',
        nsteps=trotter_steps,
        gsee_args=gse_args,
        init_state=init_state,
        precision_order=1,
        bits_precision=bits_precision,
        phase_offset=phase_offset,
        circuit_name=f'Co2O9H12_{idx}',
        metadata=molecular_metadata,
        write_circuits=True
    )
    t1 = time.perf_counter()
    print(f'Time to estimate Co2O9H12_step({idx}): {t1-t0}')

Time to generate circuit for GSEE: 0.0001236249809153378 seconds
   Time to decompose high level HPowGate circuit: 0.00017012498574331403 seconds 
   Time to transform decomposed HPowGate circuit to Clifford+T: 0.00043766602175310254 seconds
   Time to decompose high level IdentityGate circuit: 1.8667022231966257e-05 seconds 
   Time to transform decomposed IdentityGate circuit to Clifford+T: 2.2082997020334005e-05 seconds
   Time to decompose high level _PauliX circuit: 3.9832957554608583e-05 seconds 
   Time to transform decomposed _PauliX circuit to Clifford+T: 2.216600114479661e-05 seconds
   Time to decompose high level PhaseOffset circuit: 8.141703438013792e-05 seconds 
   Time to transform decomposed PhaseOffset circuit to Clifford+T: 0.00010900001507252455 seconds
   Time to decompose high level Trotter_Unitary circuit: 33.39688883302733 seconds 
   Time to transform decomposed Trotter_Unitary circuit to Clifford+T: 191.66754699999 seconds
   Time to decompose high level Measur

At this point, we have generated a GSEE circuit for each stage in the reaction pathway and collected its resource estimates. Once the ground state energies have been obtained through QPE, we can calculate the activation energy of the reaction. The activation energy is essential for assessing the feasibility and kinetics of a reaction, and it directly influences the efficiency of a catalyst. By accurately computing the activation energy for each step of the reaction pathway, we can predict the rate at which the reaction proceeds under different conditions.

The activation energy is calculated by:
$$E_b = E_{\text{transition state}} - E_{\text{reactants}}$$
This can be evaluated by taking the maximum energy among the intermediate states of the reaction pathway and subtracting the energy of the reactant state. The result is the activation energy for the specified reaction pathway.
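
A minimal sketch of this last step is given below. The energies are hypothetical placeholder values in hartree, not results produced by this notebook, and the function names are assumptions made for illustration. The sketch also assumes that the first entry of the list is the reactant state. The Boltzmann factor is the exponential term of the Arrhenius law mentioned in the introduction.

```python
import math

HARTREE_TO_KJ_PER_MOL = 2625.4996   # 1 hartree expressed in kJ/mol
GAS_CONSTANT = 8.314462618e-3       # kJ / (mol K)

def activation_energy(pathway_energies: list[float]) -> float:
    """Highest energy after the reactant state minus the reactant energy (hartree)."""
    reactant = pathway_energies[0]
    return max(pathway_energies[1:]) - reactant

def boltzmann_factor(barrier_hartree: float, temperature_kelvin: float = 298.15) -> float:
    """exp(-E_b / RT), the exponential term of the Arrhenius law."""
    barrier_kj = barrier_hartree * HARTREE_TO_KJ_PER_MOL
    return math.exp(-barrier_kj / (GAS_CONSTANT * temperature_kelvin))

# Hypothetical ground state energies along a pathway (hartree); placeholders only.
energies = [-10.000, -9.980, -9.995, -9.970, -10.020]

barrier = activation_energy(energies)
print(f"E_b = {barrier:.3f} hartree")
print(f"exp(-E_b/RT) at 298.15 K = {boltzmann_factor(barrier):.3e}")
```

Only energy differences enter the barrier, so a constant shift applied to every state of the pathway cancels out. A shift that differs from state to state, such as each state's own Hartree-Fock energy, has to be added back before the energies are compared.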

In [10]:
t_end = time.perf_counter()
print(f'Total time to run through this notebook: {t_end-t_init}')

Total time to run through this notebook: 4212.176184124999


### References
[1] Fault-Tolerant Quantum Simulations of Chemistry in First Quantization - Yuan Su, Dominic W. Berry, Nathan Wiebe, Nicholas Rubin, and Ryan Babbush - https://arxiv.org/abs/2105.12767

[2] Reaction Pathways for Water Oxidation to Molecular Oxygen Mediated by Model Cobalt Oxide Dimer and Cubane Catalysts - Amendra Fernando and Christine M. Aikens - The Journal of Physical Chemistry C 2015 119 (20), 11072-11085 DOI: 10.1021/jp511805x