# Choice of $\lambda$-values
The number and distribution of $\lambda$-values influence the error in the calculation of atomic energies. This notebook investigates how much the ML-error changes if 
- the number of $\lambda$-values is changed (adding/removing densities at certain $\lambda$-values)

The code for the generation of the data can be found in:

alchemy_tools.test_impact_lambda: calculates the atomic energies with one $\lambda$-value (0.2, 0.4, 0.6 or 0.8) left out at a time. The data is stored in /home/misa/APDFT/prototyping/atomic_energies/results/slice_ve38/compound/no_*ve*.txt (where the neglected $\lambda = \frac{ve}{38}$).

crossvalidate.choose_different_lambdas: generates cross-validated learning curves from the atomic energies created by test_impact_lambda; the hyperparameters are the same as for the data where all $\lambda$-values are used.
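
The relation $\lambda = \frac{ve}{38}$ can be checked with a few lines of Python. This is a minimal sketch, assuming that the integers in the file names loaded further below (no_8, no_15, no_23, no_30) are the `ve` values of the neglected densities. The names `TOTAL_VE` and `ve_left_out` are introduced here for illustration only.

```python
# Assumption: 38 is the number of valence electrons at lambda = 1
# (taken from the directory name slice_ve38).
TOTAL_VE = 38

# Assumption: these are the ve values encoded in the file names no_<ve>.
ve_left_out = [8, 15, 23, 30]

# lambda = ve / 38 for every neglected density
lambdas = {ve: ve / TOTAL_VE for ve in ve_left_out}

for ve, lam in lambdas.items():
    print(f"no_{ve}: lambda = {lam:.3f}, nominal value {round(lam, 1)}")
```

Under these assumptions the four files correspond to the nominal values 0.2, 0.4, 0.6 and 0.8 only after rounding, because `ve` has to be an integer.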


## ML-Error if one $\lambda$-value is left out
The plot shows the learning curves obtained when one $\lambda$-value is left out, together with the learning curve where all $\lambda$-values are used. The errors of all curves lie in the same range, which indicates that an insufficient number of $\lambda$-values is not the main contributor to the error. In principle, using all $\lambda$-values should give the lowest error, but this is not the case for every training set size. The reason is probably that the number of cross-validation samples (= 10) is not large enough.
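
Whether two learning curves really differ can be judged by comparing the difference of their means with the standard error over the cross-validation samples, which shrinks only as $1/\sqrt{n}$ (about $0.32\,\sigma$ for $n = 10$). The following is a minimal sketch, assuming the per-sample errors are available as arrays of shape (n_samples, n_training_sizes). The function names and this array layout are hypothetical and not part of crossvalidate, and the numbers are synthetic placeholders, not results.

```python
import numpy as np


def standard_error(errors):
    """Standard error of the mean over the cross-validation samples (axis 0)."""
    errors = np.asarray(errors, dtype=float)
    return errors.std(axis=0, ddof=1) / np.sqrt(errors.shape[0])


def curves_distinguishable(errors_a, errors_b):
    """True where the mean errors differ by more than the combined standard error."""
    a = np.asarray(errors_a, dtype=float)
    b = np.asarray(errors_b, dtype=float)
    difference = np.abs(a.mean(axis=0) - b.mean(axis=0))
    combined = np.sqrt(standard_error(a) ** 2 + standard_error(b) ** 2)
    return difference > combined


# Synthetic placeholder data: 3 cross-validation samples, 2 training set sizes
errors_all = np.array([[1.00, 0.50], [1.20, 0.52], [0.80, 0.48]])
errors_one_left_out = np.array([[1.05, 0.70], [1.25, 0.72], [0.85, 0.68]])

result = curves_distinguishable(errors_all, errors_one_left_out)
print(result)
```

If the curves are not distinguishable at a given training set size, their ordering there carries no information about the influence of the neglected $\lambda$-value.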

In [1]:
import matplotlib
matplotlib.use('Qt5Agg')
from matplotlib import pyplot as plt
plt.rcParams.update({'font.size': 22})
import numpy as np

In [4]:
# load learning curves

base_path = '/home/misa/APDFT/prototyping/atomic_energies/results/analyse_learning/'
l_curves_data = ['learning_curves.tab', 'no_8.tab', 'no_15.tab', 'no_23.tab', 'no_30.tab']

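# one (10, 3) table per learning curve
# column 0: training set size, column 1: mean error (see plot below)
l_curves = np.empty((len(l_curves_data), 10, 3))

for idx, lc in enumerate(l_curves_data):
    l_curves[idx] = np.loadtxt(base_path + lc)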


In [19]:
# plot results
# raw strings keep the LaTeX backslashes from being read as escape sequences
labels = ['all', r'no $\lambda_{0.2}$', r'no $\lambda_{0.4}$', r'no $\lambda_{0.6}$', r'no $\lambda_{0.8}$']

fig, ax = plt.subplots(1,1)

for curve, label in zip(l_curves, labels):
    ax.plot(curve[:, 0], curve[:, 1], '-o', label=label)

ax.set_xscale('log')
ax.set_yscale('log')

ax.set_xlabel('Training set size')
ax.set_ylabel('Mean error (Ha)')

ax.legend()
