# Choice of $\lambda$-values
The number and distribution of $\lambda$-values influence the error in the calculation of atomic energies. This notebook investigates how much the ML-error changes if 
- the number of $\lambda$-values is changed (adding/removing densities at certain $\lambda$-values)

The code for the generation of the data can be found in:

alchemy_tools.test_impact_lambda: calculates the atomic energies with one $\lambda$-value (of 0.2, 0.4, 0.6, 0.8) left out at a time. The data is stored in /home/misa/APDFT/prototyping/atomic_energies/results/slice_ve38/compound/no_*ve*.txt (where the omitted $\lambda = \frac{ve}{38}$)

crossvalidate.choose_different_lambdas: generates crossvalidated learning curves using the atomic energies created by test_impact_lambda; the hyperparameters are the same as for the data where all $\lambda$-values are used
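
As a rough illustration of the leave-one-out procedure, the sketch below integrates a toy function over the grid $\lambda = \frac{ve}{38}$ with the trapezoidal rule, once with all values and once with each interior value removed. The integrand `lam**2` and the helper `trapezoid` are assumptions made for this example; they are not the densities or the routines used in `alchemy_tools`.

```python
import numpy as np

def trapezoid(y, x):
    """Composite trapezoidal rule on a possibly non-uniform grid."""
    y = np.asarray(y, dtype=float)
    x = np.asarray(x, dtype=float)
    return float(np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(x)))

# lambda grid as described above: lambda = ve/38
ve = np.array([0, 8, 15, 23, 30, 38])
lam = ve / 38

# toy integrand standing in for a lambda-dependent quantity (NOT real data)
f = lam**2

# integral with all lambda-values
full = trapezoid(f, lam)

# integral with one interior lambda-value left out at a time
leave_one_out = {}
for skip in (8, 15, 23, 30):
    keep = ve != skip
    leave_one_out[skip] = trapezoid(f[keep], lam[keep])
    print(f"no_{skip}: change in integral = {leave_one_out[skip] - full:+.5f}")
```

For a real density the size of the change depends on how strongly the integrand curves near the omitted $\lambda$-value, so the toy numbers only indicate the mechanism.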


## ML-Error if one $\lambda$-value is left out
The plot shows learning curves with one $\lambda$-value left out, together with the learning curve where all $\lambda$-values are used. ~The error is in the same range for all curves, which shows that an insufficient number of $\lambda$-values is not the main contributor to the error. (In principle, using all $\lambda$-values should give the lowest error, but this is not the case for every training set size. The reason for this behaviour is probably that the number of crossvalidation samples (=10) is not large enough.)~

In [80]:
import matplotlib
matplotlib.use('Qt5Agg')
from matplotlib import pyplot as plt
plt.rcParams.update({'font.size': 22})
import numpy as np

In [81]:
# load learning curves

base_path = '/home/misa/APDFT/prototyping/atomic_energies/results/analyse_learning/'
l_curves_data = ['learning_curves.tab', 'no_8.tab', 'no_15.tab', 'no_23.tab', 'no_30.tab']

l_curves = np.empty((5,10,3))

for idx, lc in enumerate(l_curves_data):
    l_curves[idx] = np.loadtxt(base_path+lc)


In [82]:
# plot results
# raw strings so that '\l' is not treated as an (invalid) escape sequence
labels = ['all', r'no $\lambda_{0.2}$', r'no $\lambda_{0.4}$', r'no $\lambda_{0.6}$', r'no $\lambda_{0.8}$']

fig, ax = plt.subplots(1,1)

for idx, l in enumerate(labels):
    ax.plot(l_curves[idx][:,0], l_curves[idx][:,1], '-o', label = l)

ax.set_xscale('log')
ax.set_yscale('log')

ax.set_xlabel('Training set size')
ax.set_ylabel('Mean error (Ha)')

ax.legend()

<matplotlib.legend.Legend at 0x7f39dba7d2b0>

## Change in atomisation energy if one $\lambda$-value is left out
The difference between the integrals where all $\lambda$-values are used and where $\lambda \approx 0.8$ is left out is $\approx 0.05$ Ha. This value is on the order of the minimum error that we obtain for our learning curves (0.02 Ha). This suggests that the number of $\lambda$-values is insufficient.

In [4]:
import sys
sys.path.insert(0, '/home/misa/APDFT/prototyping/atomic_energies')
import qml_interface as qi
import numpy as np

import matplotlib
matplotlib.use('Qt5Agg')
from matplotlib import pyplot as plt
plt.rcParams.update({'font.size': 22})

In [5]:
p_all = '/home/misa/APDFT/prototyping/atomic_energies/results/slice_ve38/finished_abs'
paths_all = qi.wrapper_alch_data(p_all)

p_no30 = '/home/misa/APDFT/prototyping/atomic_energies/results/slice_ve38/paths_no_30'
paths_no30 = qi.wrapper_alch_data(p_no30)


In [6]:
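# verify that both lists are in the same order
# (str.rstrip strips a set of characters, not a suffix, so the file name is sliced off instead;
# assumes every path ends with the respective file name)
a=[el[:-len('no_30.txt')] for el in paths_no30]
b=[el[:-len('atomic_energies.txt')] for el in paths_all]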

for idx in range(len(a)):
    assert a[idx]==b[idx]

In [7]:
data_all, msize_all = qi.load_alchemy_data(paths_all)
data_no_30, msize_no30 = qi.load_alchemy_data(paths_no30)

In [83]:
diff = []
for idx in range(len(data_all)):
    diff.extend(data_all[idx][:,5] - data_no_30[idx][:,5])
diff = np.array(diff)
mean_diff = np.abs(diff).mean()

In [84]:
fig, ax = plt.subplots(1,2)
mean=[]
std = []
idx_per_charge = qi.partition_idx_by_charge(data_all, range(len(data_all)))
for i in range(len(idx_per_charge)):
    ax[0].scatter(range(len(idx_per_charge[i][1][0])),diff[idx_per_charge[i][1]], label = 'Z = {}'.format(idx_per_charge[i][0]))
#     ax[1].bar(i, np.abs(diff[idx_per_charge[i][1]]).mean()  )
    mean.append(np.abs(diff[idx_per_charge[i][1]]).mean())
    std.append(np.abs(diff[idx_per_charge[i][1]]).std())
    
ax[0].set_xlabel('atom ID')
ax[0].set_ylabel(r'$\Delta E_{atomic}$ (Ha)')
ax[0].legend()

mean.append(np.abs(diff).mean())
std.append(np.abs(diff).std())

prop_cycle = plt.rcParams['axes.prop_cycle']
colors = prop_cycle.by_key()['color']
ax[1].bar(range(len(idx_per_charge)+1),mean, yerr=std, tick_label=['Z = 1', 'Z = 6', 'Z = 7', 'Z = 8', 'all'], color = colors[0:len(idx_per_charge)+1])
ax[1].set_ylabel(r'$\overline{\Delta E}_{atomic}$ (Ha)')

Text(0, 0.5, '$\\overline{\\Delta E}_{atomic}$ (Ha)')

## Convergence of the integral $\int d\lambda$
We calculate the integral for different numbers of $\lambda$-values and plot how the integral changes if more values are added.
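
The idea of such a convergence check can be sketched on a toy problem: the integral is evaluated on successively finer $\lambda$-grids and the deviation from the known result is tracked. The integrand `toy_integrand`, the grid sizes and the helper `trapezoid` are hypothetical choices for this illustration and have nothing to do with the actual densities.

```python
import numpy as np

def trapezoid(y, x):
    """Composite trapezoidal rule on a possibly non-uniform grid."""
    y = np.asarray(y, dtype=float)
    x = np.asarray(x, dtype=float)
    return float(np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(x)))

def toy_integrand(lam):
    # hypothetical smooth stand-in for a lambda-dependent quantity
    return np.exp(-3.0 * lam)

# analytic value of the integral of exp(-3*lam) from 0 to 1
exact = (1.0 - np.exp(-3.0)) / 3.0

n_points = [3, 5, 9, 17, 33]
estimates = []
for n in n_points:
    lam = np.linspace(0.0, 1.0, n)
    estimates.append(trapezoid(toy_integrand(lam), lam))

# absolute deviation from the analytic result for each grid
errors = np.abs(np.array(estimates) - exact)
for n, est, err in zip(n_points, estimates, errors):
    print(f"{n:3d} lambda-values: integral = {est:.6f}, error = {err:.2e}")
```

In the real calculation no analytic reference is available, so the change between successive grids (or relative to the finest grid) would take the place of `errors`.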

In [13]:
import sys
sys.path.insert(0, '/home/misa/APDFT/prototyping/atomic_energies')

import alchemy_tools as at
import glob


In [109]:
def get_paths(directory):
    # collect cube file paths: the ve_00 reference first, then the sorted files in <directory>/cube-files
    paths_cubes = ['/home/misa/APDFT/prototyping/atomic_energies/results/slice_ve38/ueg/ve_00.cube']
    paths2 = glob.glob(directory+'/cube-files/*')
    paths2.sort()
    paths_cubes.extend(paths2)
    return(paths_cubes)
    

### $\Delta E_{atomic} = E(\text{the original 6}) - E(\text{all} \lambda)$

In [108]:
directories = ['/home/misa/APDFT/prototyping/atomic_energies/results/slice_ve38/dsgdb9nsd_003712',
 '/home/misa/APDFT/prototyping/atomic_energies/results/slice_ve38/dsgdb9nsd_003886',
 '/home/misa/APDFT/prototyping/atomic_energies/results/slice_ve38/dsgdb9nsd_001212']

In [110]:
diff_en = []
diff_alch = []

for d in directories:
    paths_cubes = get_paths(d)
    # all lambda values
    lam_vals, densities, nuclei, gpts = at.load_cube_data(paths_cubes)
    av_dens = at.integrate_lambda_density(densities, lam_vals, method='trapz')
    atomic_energies, alch_pots = at.calculate_atomic_energies(av_dens, nuclei, gpts)
    
    # the original 6
    old_densities = []
    old_lam_vals = []

    for i in range(len(lam_vals)):
        # compare floats with a tolerance instead of exact equality
        if np.isclose(lam_vals[i], np.array([0, 8, 15, 23, 30, 38])/38).any():
            old_lam_vals.append(lam_vals[i])
            old_densities.append(densities[i])
            
    av_dens_old = at.integrate_lambda_density(old_densities, old_lam_vals, method='trapz')
    atomic_energies_old, alch_pots_old = at.calculate_atomic_energies(av_dens_old, nuclei, gpts)
    
    diff_en.append(atomic_energies_old-atomic_energies)
    diff_alch.append(alch_pots_old - alch_pots)

In [111]:
diff_en

[array([ 0.00959746,  0.01364483, -0.02457503,  0.0033201 ,  0.01778172,
        -0.02563588, -0.01011724,  0.01298866,  0.00245047,  0.02051527,
         0.01231401,  0.02036384,  0.01007422,  0.01840382,  0.01525411]),
 array([-0.03078194,  0.02812559, -0.02890459, -0.11024495,  0.02720261,
        -0.04471018, -0.08504432,  0.01274703,  0.01292123,  0.00984363]),
 array([ 0.00198757, -0.00104597, -0.15993516, -0.08758155,  0.03307746,
         0.01495659, -0.02957475,  0.01221636,  0.00955504,  0.00863894,
         0.01659848,  0.016055  ])]

In [122]:
for i in range(3):
    print(np.abs(diff_en[i]).mean())

0.014469110157744975
0.03905260699764899
0.03260190620883215


In [112]:
diff_alch

[array([ 0.00159958,  0.00227414, -0.00409584,  0.00055335,  0.00296362,
        -0.00366227, -0.00144532,  0.01298866,  0.00245047,  0.02051527,
         0.01231401,  0.02036384,  0.01007422,  0.01840382,  0.01525411]),
 array([-0.00439742,  0.0046876 , -0.00412923, -0.01378062,  0.00453377,
        -0.00638717, -0.01063054,  0.01274703,  0.01292123,  0.00984363]),
 array([ 0.00033126, -0.00017433, -0.0199919 , -0.01094769,  0.00551291,
         0.00249276, -0.00422496,  0.01221636,  0.00955504,  0.00863894,
         0.01659848,  0.016055  ])]

In [123]:
for i in range(3):
    print(np.abs(diff_alch[i]).mean())

0.008597234344947325
0.00840582285041629
0.008894970509579622


## $ \frac{d}{d \lambda} \rho$

In [96]:
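def get_derivatives(lam_vals, densities):
    """Finite-difference derivative of the density between neighbouring lambda-values."""
    derivatives = []
    for idx in range(1, len(densities)):
        d_dens_lam = (densities[idx] - densities[idx-1])/(lam_vals[idx]-lam_vals[idx-1])
        derivatives.append(d_dens_lam)
    
    derivatives = np.array(derivatives)
    return(derivatives)

def sum_gradients(derivatives):
    """Sum of the absolute derivative over all grid points, one value per lambda-interval."""
    gradients = []
    for idx in range(len(derivatives)):
        # local name differs from the function name to avoid shadowing it
        abs_sum = np.abs(derivatives[idx]).sum()
        gradients.append(abs_sum)
    return(np.array(gradients))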
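
Written out, the two helpers above appear to compute a finite difference between neighbouring $\lambda$-values and its summed magnitude. The symbols $\rho_i$ and $S_i$ are introduced here only for illustration; the sum runs over the grid points $\mathbf{r}$ without a volume element, as in the code.

$$
\left.\frac{\partial \rho}{\partial \lambda}\right|_{i} \approx \frac{\rho_{i}(\mathbf{r}) - \rho_{i-1}(\mathbf{r})}{\lambda_{i} - \lambda_{i-1}}, \qquad
S_i = \sum_{\mathbf{r}} \left| \frac{\rho_{i}(\mathbf{r}) - \rho_{i-1}(\mathbf{r})}{\lambda_{i} - \lambda_{i-1}} \right|, \qquad i = 1, \dots, N-1
$$

For $N$ densities this yields $N-1$ values, one per $\lambda$-interval.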

In [97]:
derivatives = get_derivatives(lam_vals, densities)

In [98]:
gradients_sum = sum_gradients(derivatives)

In [99]:
gradients_sum

array([ 82.88549193,  82.67519794,  86.16749275,  90.10449409,
        97.79197916,  94.6862974 ,  91.19729829, 107.21152492,
        99.57975493, 103.35557286,  92.4498429 ,  89.35113404,
        89.90858836,  89.17835445,  94.97777587,  94.29996967,
        89.23052011,  89.7250754 ,  83.68187086,  68.40971137])

In [100]:
old_derivatives = get_derivatives(old_lam_vals, old_densities)
old_gradients_sum = sum_gradients(old_derivatives)

In [101]:
old_gradients_sum

array([80.05959928, 80.92979938, 77.8198165 , 83.75354686, 77.22147903])

## Example plot for projected densities at different $\lambda$-values

In [7]:
import glob
import sys
sys.path.insert(0, '/home/misa/APDFT/prototyping/atomic_energies')
from parse_cube_files import CUBE
import numpy as np

import matplotlib
matplotlib.use('Qt5Agg')
from matplotlib import pyplot as plt
plt.rcParams.update({'font.size': 22})

In [21]:
p = '/home/misa/APDFT/prototyping/atomic_energies/results/slice_ve38/dsgdb9nsd_001212/cube-files'
paths = glob.glob(p+'/*.cube')
paths.sort()

cubes = []
for path in paths:
    cubes.append(CUBE(path))

In [41]:
float(paths[0].split('/')[-1].split('.')[0].split('_')[1])

4.0
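
The number parsed above is the `ve` index of the cube file. With the relation $\lambda = \frac{ve}{38}$ stated at the top of this notebook it can be converted to a $\lambda$-value. A minimal sketch, assuming file names of the form `ve_<number>.cube`; the helper `lambda_from_path` and the example path are hypothetical:

```python
import os

def lambda_from_path(path, total_ve=38):
    """Parse '<prefix>_<ve>.cube' and return (ve, lambda = ve/total_ve)."""
    # file name without directory and extension, e.g. 've_04'
    stem = os.path.splitext(os.path.basename(path))[0]
    ve = float(stem.split('_')[1])
    return ve, ve / total_ve

# hypothetical example path, not a file from this project
ve, lam = lambda_from_path('/some/dir/cube-files/ve_04.cube')
print(ve, lam)
```

Using `os.path.basename` and `os.path.splitext` keeps the parsing independent of dots or underscores in the directory names.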