# APDFT Analysis

Predictions can use

- **QC**: HF, CCSD, or CCSD(T) energy differences computed directly for the species of interest (the target),
- **Quantum Alchemy**: HF, CCSD, or CCSD(T) energy differences obtained by changing the nuclear charge of other systems that have the same number of electrons as the target,
- **APDFT*n***: Taylor series approximation of order *n* to the APDFT potential energy surface.

In [1]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from matplotlib import colors

from qa_tools.utils import hartree_to_ev, all_atom_systems
from qa_tools.data import prepare_dfs
# Import the functions that the cells below actually call.
from qa_tools.prediction import energy_change_charge_qa_atom, energy_change_charge_qc_atom
from qa_tools.analysis import error_change_charge_qats_atoms

json_path = '../../json-data/atom-pyscf.qa-data.posthf.json'
df_qc, df_qats = prepare_dfs(json_path, get_CBS=False)

## APDFT prediction errors

There is some intrinsic error in modeling a target system (e.g., N atom) by changing the nuclear charge of a reference system (e.g., C<sup>&ndash;</sup>) while using the reference's basis set.
The following cell computes this error, which represents the best performance APDFT can achieve without fortuitous error cancellation.

In [2]:
system_label = 'n'
delta_charge = 1
target_initial_charge = 0  # Initial charge of the system.
basis_set = 'aug-cc-pV5Z'  # cc-pV5Z, aug-cc-pVTZ, aug-cc-pVQZ, aug-cc-pV5Z, CBS-aug

use_ts = False  # Use finite differences with Taylor series for APDFT predictions.
change_signs = False  # Multiply all predictions by negative one (e.g., for electron affinities)

ie_qc_prediction = energy_change_charge_qc_atom(
    df_qc, system_label, delta_charge,
    target_initial_charge=target_initial_charge,
    change_signs=change_signs, basis_set=basis_set
)
ie_qats_predictions = energy_change_charge_qa_atom(
    df_qc, df_qats, system_label, delta_charge,
    target_initial_charge=target_initial_charge,
    change_signs=change_signs, basis_set=basis_set, use_ts=use_ts
)

ie_qc_prediction = hartree_to_ev(ie_qc_prediction)
ie_qats_predictions = {key:hartree_to_ev(value) for (key,value) in ie_qats_predictions.items()}
ie_qats_errors = {key:value-ie_qc_prediction for (key,value) in ie_qats_predictions.items()}

print(f'PySCF prediction of IE for {system_label}: {ie_qc_prediction:.3f} eV\n')
print(f'APDFT prediction errors in eV:')
print(pd.DataFrame(ie_qats_errors, index=[f'APDFT']))

PySCF prediction of IE for n: 14.541 eV

APDFT prediction errors in eV:
              b         c         o
APDFT -0.053009 -0.010871  0.001612
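
The post-processing in the cell above (unit conversion followed by subtraction of the QC value) can be sketched in isolation. The snippet below is a minimal illustration, assuming that `hartree_to_ev` is a plain multiplication by the CODATA Hartree-to-eV factor. The function `to_ev`, the reference labels, and the energies are hypothetical stand-ins, not values from the data set.

```python
# CODATA 2018 value of one Hartree in electronvolts.
HARTREE_TO_EV = 27.211386245988


def to_ev(energy_hartree):
    """Convert an energy from Hartree to eV (hypothetical stand-in for hartree_to_ev)."""
    return energy_hartree * HARTREE_TO_EV


# Hypothetical predictions in Hartree, chosen only for illustration.
qc_prediction = 0.5340
qa_predictions = {'ref_a': 0.5335, 'ref_b': 0.5341}

# Errors are defined as (alchemical prediction) - (QC prediction), in eV.
qc_ev = to_ev(qc_prediction)
qa_errors = {ref: to_ev(e) - qc_ev for ref, e in qa_predictions.items()}
print(qa_errors)
```

With this sign convention, a negative error means the alchemical prediction underestimates the QC value.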


## APDFT*n* prediction errors

Now, we can look at approximating the APDFT prediction by using a Taylor series centered on $\Delta Z = 0$.
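
For reference, an order-*n* truncation of such a series has the standard form below. The symbol $E^{(n)}$ for the truncated prediction is notation chosen here for illustration; the derivatives are evaluated at the reference system.

$$
E^{(n)}(\Delta Z) = \sum_{k=0}^{n} \frac{(\Delta Z)^k}{k!} \left. \frac{\partial^k E}{\partial (\Delta Z)^k} \right|_{\Delta Z = 0}
$$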

In [3]:
system_label = 'n'
delta_charge = 1
target_initial_charge = 0  # Initial charge of the system.
basis_set = 'aug-cc-pV5Z'  # cc-pV5Z, aug-cc-pVTZ, aug-cc-pVQZ, aug-cc-pV5Z, CBS-aug

use_ts = True  # Use finite differences with Taylor series for APDFT predictions.
change_signs = False  # Multiply all predictions by negative one (e.g., for electron affinities)

ie_qc_prediction = energy_change_charge_qc_atom(
    df_qc, system_label, delta_charge,
    target_initial_charge=target_initial_charge,
    change_signs=change_signs, basis_set=basis_set
)
ie_qats_predictions = energy_change_charge_qa_atom(
    df_qc, df_qats, system_label, delta_charge,
    target_initial_charge=target_initial_charge,
    change_signs=change_signs, basis_set=basis_set, use_ts=use_ts
)

ie_qc_prediction = hartree_to_ev(ie_qc_prediction)
ie_qats_predictions = {key:hartree_to_ev(value) for (key,value) in ie_qats_predictions.items()}
ie_qats_errors = {key:value-ie_qc_prediction for (key,value) in ie_qats_predictions.items()}

print(f'PySCF prediction of IE for {system_label}: {ie_qc_prediction:.3f} eV\n')
print(f'APDFTn prediction errors in eV:')
print(pd.DataFrame(ie_qats_errors, index=[f'QATS-{n}' for n in range(5)]))

PySCF prediction of IE for n: 14.541 eV

APDFTn prediction errors in eV:
                b          c          o
QATS-0 -17.880868 -13.288335  20.598492
QATS-1 -14.107617  -4.001721  -3.519578
QATS-2 -14.026723   0.459099  -0.030341
QATS-3   9.209186  -0.214531  -0.005005
QATS-4  45.454693  -0.470333   0.000159
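
To make the idea of a truncated Taylor series with finite-difference derivatives concrete, here is a toy sketch that does not use `qa_tools`. It assumes a one-electron ion, whose ground-state energy is exactly $-Z^2/2$ Hartree, and predicts the energy of He<sup>+</sup> from H. All function names are hypothetical, and the stencil is a generic central difference rather than the scheme used in the data set.

```python
import math


def energy(z):
    """Ground-state energy (Hartree) of a one-electron ion with nuclear charge z."""
    return -0.5 * z**2


def central_derivative(f, x, order, h=0.05):
    """Estimate the order-th derivative of f at x with a central finite difference."""
    total = 0.0
    for i in range(order + 1):
        coeff = (-1) ** i * math.comb(order, i)
        total += coeff * f(x + (order / 2 - i) * h)
    return total / h**order


def taylor_prediction(f, x_ref, delta, max_order):
    """Return Taylor-series predictions of f(x_ref + delta) for orders 0..max_order."""
    predictions = []
    running = 0.0
    for k in range(max_order + 1):
        running += central_derivative(f, x_ref, k) * delta**k / math.factorial(k)
        predictions.append(running)
    return predictions


# Reference: H (Z = 1). Target: He+ (Z = 2), so delta Z = +1.
predictions = taylor_prediction(energy, 1.0, 1.0, 4)
exact = energy(2.0)
for n, p in enumerate(predictions):
    print(f'order {n}: prediction {p:.6f} Ha, error {p - exact:+.6f} Ha')
```

Because this model energy is quadratic in $Z$, the series is exact from second order onward. Real atoms have no such guarantee, as the `b` column above shows.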


### Specifying lambda values

We can also restrict which lambda values are included. For example, we could consider only lambda values of ±1.

In [4]:
system_label = 'n'
delta_charge = 1
target_initial_charge = 0  # Initial charge of the system.
basis_set = 'aug-cc-pV5Z'  # cc-pV5Z, aug-cc-pVTZ, aug-cc-pVQZ, aug-cc-pV5Z, CBS-aug

considered_lambdas = [-1, 1]

use_ts = True  # Use finite differences with Taylor series for APDFT predictions.
change_signs = False  # Multiply all predictions by negative one (e.g., for electron affinities)

ie_qc_prediction = energy_change_charge_qc_atom(
    df_qc, system_label, delta_charge,
    target_initial_charge=target_initial_charge,
    change_signs=change_signs, basis_set=basis_set
)
ie_qats_predictions = energy_change_charge_qa_atom(
    df_qc, df_qats, system_label, delta_charge,
    target_initial_charge=target_initial_charge,
    change_signs=change_signs, basis_set=basis_set, use_ts=use_ts,
    considered_lambdas=considered_lambdas
)

ie_qc_prediction = hartree_to_ev(ie_qc_prediction)
ie_qats_predictions = {key:hartree_to_ev(value) for (key,value) in ie_qats_predictions.items()}
ie_qats_errors = {key:value-ie_qc_prediction for (key,value) in ie_qats_predictions.items()}

print(f'PySCF prediction of IE for {system_label}: {ie_qc_prediction:.3f} eV\n')
print(f'APDFTn prediction errors in eV:')
print(pd.DataFrame(ie_qats_errors, index=[f'QATS-{n}' for n in range(5)]))

PySCF prediction of IE for n: 14.541 eV

APDFTn prediction errors in eV:
                c          o
QATS-0 -13.288335  20.598492
QATS-1  -4.001721  -3.519578
QATS-2   0.459099  -0.030341
QATS-3  -0.214531  -0.005005
QATS-4  -0.470333   0.000159
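
The effect of `considered_lambdas` on the set of references can be sketched as follows. This is an illustration only: the helper `select_references` is hypothetical, and the sign convention (lambda as target minus reference nuclear charge) is an assumption that does not affect the symmetric ±1 case.

```python
# Atomic numbers of the elements that appear in the outputs above.
ATOMIC_NUMBERS = {'b': 5, 'c': 6, 'n': 7, 'o': 8}


def select_references(target, references, considered_lambdas):
    """Keep references whose nuclear-charge change to the target is allowed.

    Lambda is taken here as Z_target - Z_reference (an assumed sign convention).
    """
    z_target = ATOMIC_NUMBERS[target]
    selected = {}
    for ref in references:
        lam = z_target - ATOMIC_NUMBERS[ref]
        if lam in considered_lambdas:
            selected[ref] = lam
    return selected


print(select_references('n', ['b', 'c', 'o'], [-1, 1]))
# {'c': 1, 'o': -1}
```

Boron differs from nitrogen by two units of nuclear charge, which is consistent with the `b` column being absent from the output above.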


## APDFT*n* errors with respect to APDFT

Alternatively, we can compute the difference between APDFT*n* (predictions with Taylor series) and APDFT.

In [5]:
system_label = 'n'
delta_charge = 1
target_initial_charge = 0  # Initial charge of the system.
basis_set = 'aug-cc-pV5Z'  # cc-pV5Z, aug-cc-pVTZ, aug-cc-pVQZ, aug-cc-pV5Z, CBS-aug

return_qats_vs_qa = True  # Returns APDFTn - APDFT instead of energy predictions.

use_ts = True  # Use finite differences with Taylor series for APDFT predictions.
change_signs = False  # Multiply all predictions by negative one (e.g., for electron affinities)

ie_qats_predictions = energy_change_charge_qa_atom(
    df_qc, df_qats, system_label, delta_charge,
    target_initial_charge=target_initial_charge,
    change_signs=change_signs, basis_set=basis_set, use_ts=use_ts,
    return_qats_vs_qa=return_qats_vs_qa
)

ie_qats_predictions = {key:hartree_to_ev(value) for (key,value) in ie_qats_predictions.items()}

print(f'Differences between APDFTn and APDFT in eV:')
print(pd.DataFrame(ie_qats_predictions, index=[f'QATS-{n}' for n in range(5)]))

Differences between APDFTn and APDFT in eV:
                b          c          o
QATS-0 -17.827859 -13.277465  20.596880
QATS-1 -14.054608  -3.990850  -3.521190
QATS-2 -13.973714   0.469970  -0.031953
QATS-3   9.262195  -0.203660  -0.006618
QATS-4  45.507701  -0.459463  -0.001454
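
The three sets of printed numbers should be related by a simple decomposition: the APDFT*n* error with respect to QC equals the Taylor-series truncation error (APDFT*n* minus APDFT) plus the intrinsic APDFT error (APDFT minus QC). The sketch below checks this using values copied from the outputs above; the variable names are chosen here for illustration.

```python
# APDFT - QC errors (eV), copied from the first APDFT output.
qa_vs_qc = {'b': -0.053009, 'c': -0.010871, 'o': 0.001612}

# APDFTn - QC errors (eV) for orders 0-4, copied from the APDFTn output.
qats_vs_qc = {
    'b': [-17.880868, -14.107617, -14.026723, 9.209186, 45.454693],
    'c': [-13.288335, -4.001721, 0.459099, -0.214531, -0.470333],
    'o': [20.598492, -3.519578, -0.030341, -0.005005, 0.000159],
}

# APDFTn - APDFT differences (eV) for orders 0-4, copied from the output directly above.
qats_vs_qa = {
    'b': [-17.827859, -14.054608, -13.973714, 9.262195, 45.507701],
    'c': [-13.277465, -3.990850, 0.469970, -0.203660, -0.459463],
    'o': [20.596880, -3.521190, -0.031953, -0.006618, -0.001454],
}

# Check (APDFTn - QC) == (APDFTn - APDFT) + (APDFT - QC) for every entry.
max_residual = 0.0
for ref in qa_vs_qc:
    for total, truncation in zip(qats_vs_qc[ref], qats_vs_qa[ref]):
        residual = total - (truncation + qa_vs_qc[ref])
        max_residual = max(max_residual, abs(residual))

print(f'Largest residual: {max_residual:.1e} eV')
```

Any residual comes only from the six-decimal rounding of the printed values.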


## Overall statistics

We can also compute the mean absolute error (MAE), root mean squared error (RMSE), and maximum absolute error over all systems.

In [6]:
all_systems = all_atom_systems[1:]
basis_set = 'aug-cc-pV5Z'
target_initial_charge = 0

use_ts = True
return_qats_vs_qa = False
considered_lambdas = [-1, 1]

delta_charge = 1
change_signs = False
max_qats_order = 4
ignore_one_row = True


for i in range(len(all_systems)):
    sys_error = error_change_charge_qats_atoms(
        df_qc, df_qats, all_systems[i], delta_charge, change_signs=change_signs, 
        basis_set=basis_set, target_initial_charge=target_initial_charge,
        use_ts=use_ts, ignore_one_row=ignore_one_row,
        return_qats_vs_qa=return_qats_vs_qa,
        considered_lambdas=considered_lambdas
    )
    if i == 0:
        all_error = sys_error
    else:
        all_error = pd.concat(
            [all_error, sys_error], axis=1
        )

if use_ts or return_qats_vs_qa:
    # MAE
    for n in range(0, max_qats_order+1):
        qatsn_errors = all_error.iloc[n].values
        qatsn_mae = np.mean(np.abs(qatsn_errors))
        print(f'QATS-{n} MAE: {qatsn_mae:.4f} eV')

    # RMSE
    print()
    for n in range(0, max_qats_order+1):
        qatsn_errors = all_error.iloc[n].values
        qatsn_rmse = np.sqrt(np.mean((qatsn_errors)**2))
        print(f'QATS-{n} RMSE: {qatsn_rmse:.4f} eV')
    
    # Max
    print()
    for n in range(0, max_qats_order+1):
        qatsn_errors = all_error.iloc[n].values
        qatsn_max = np.max(np.abs(qatsn_errors))
        print(f'QATS-{n} Max Abs.: {qatsn_max:.4f} eV')
else:
    # MAE
    qatsn_errors = all_error.iloc[0].values
    qatsn_mae = np.mean(np.abs(qatsn_errors))
    print(f'Quantum alchemy MAE: {qatsn_mae:.4f} eV')

    # RMSE
    print()
    qatsn_rmse = np.sqrt(np.mean((qatsn_errors)**2))
    print(f'Quantum alchemy RMSE: {qatsn_rmse:.4f} eV')

    # Max
    print()
    qatsn_max = np.max(np.abs(qatsn_errors))
    print(f'Quantum alchemy Max Abs.: {qatsn_max:.4f} eV')


QATS-0 MAE: 14.5811 eV
QATS-1 MAE: 3.6138 eV
QATS-2 MAE: 0.3958 eV
QATS-3 MAE: 1.0075 eV
QATS-4 MAE: 2.7050 eV

QATS-0 RMSE: 16.6775 eV
QATS-1 RMSE: 4.5460 eV
QATS-2 RMSE: 0.6555 eV
QATS-3 RMSE: 2.7752 eV
QATS-4 RMSE: 6.5716 eV

QATS-0 Max Abs.: 50.5211 eV
QATS-1 Max Abs.: 13.7508 eV
QATS-2 Max Abs.: 2.7449 eV
QATS-3 Max Abs.: 12.9765 eV
QATS-4 Max Abs.: 25.9311 eV
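
The three statistics can also be computed for all orders at once with vectorized NumPy calls. The sketch below is a small stand-alone illustration that uses only the nitrogen errors for the `c` and `o` references copied from the `considered_lambdas` output above, so its numbers cover one target rather than the full set of systems.

```python
import numpy as np

# Rows are Taylor-series orders 0-4; columns are the c and o references for N.
errors = np.array([
    [-13.288335, 20.598492],
    [-4.001721, -3.519578],
    [0.459099, -0.030341],
    [-0.214531, -0.005005],
    [-0.470333, 0.000159],
])

# Reduce over the reference axis to get one statistic per order.
mae = np.mean(np.abs(errors), axis=1)
rmse = np.sqrt(np.mean(errors**2, axis=1))
max_abs = np.max(np.abs(errors), axis=1)

for n, (m, r, x) in enumerate(zip(mae, rmse, max_abs)):
    print(f'QATS-{n}: MAE {m:.4f} eV, RMSE {r:.4f} eV, Max Abs. {x:.4f} eV')
```

For any set of errors, MAE ≤ RMSE ≤ maximum absolute error, which gives a quick sanity check on the printed statistics.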
