`CoulombMatrix(permutation="sorted_l2")` is not symmetric #89

juntyr · 2022-06-20T10:52:25Z

When using the dscribe Python library, v1.2.1, Coulomb matrices generated using the sorted_l2 permutation are no longer symmetric matrices. I compared dscribe's result with a manual implementation of the sorting based on doi:10.1002/qua.24954 to check that the algorithm in general works.

This manual implementation matches

dscribe/dscribe/descriptors/matrixdescriptor.py

Lines 143 to 159 in 79a1393

    
               def sort(self, matrix): 
        
                   """Sorts the given matrix by using the L2 norm. 
        
                   Args: 
        
                       matrix(np.ndarray): The matrix to sort. 
        
                   Returns: 
        
                       np.ndarray: The sorted matrix. 
        
                   """ 
        
                   # Sort the atoms such that the norms of the rows are in descending 
        
                   # order 
        
                   norms = np.linalg.norm(matrix, axis=1) 
        
                   sorted_indices = np.argsort(norms, axis=0)[::-1] 
        
                   sorted_matrix = matrix[sorted_indices] 
        
                   sorted_matrix = sorted_matrix[:, sorted_indices] 
        
                   return sorted_matrix

Therefore, I'd guess that something might be wrong with the C++ implementation in

dscribe/dscribe/ext/cm.cpp

Lines 111 to 136 in 79a1393

    
           void CoulombMatrix::sort(Ref<MatrixXd> matrix, bool noise) 
        
           { 
        
               // Calculate row norms 
        
               VectorXd norms = matrix.rowwise().norm(); 
        
               // Introduce noise with norm as mean and sigma as standard deviation. 
        
               if (noise) { 
        
                   for (int i = 0; i < norms.size(); ++i) { 
        
                       normal_distribution<double> distribution(norms(i), this->sigma); 
        
                       norms(i) = distribution(this->generator); 
        
                   } 
        
               } 
        
               // Calculate row permutations that order the matrix. 
        
               int n_atoms = matrix.rows(); 
        
               VectorXi indices(n_atoms); 
        
               iota(indices.begin(), indices.end(), 0); 
        
               std::sort(indices.data(), indices.data() + indices.size(), [&](int i1, int i2) { return norms(i1) > norms(i2); } ); 
        
               PermutationMatrix<Dynamic> P(indices); 
        
               // Sort matrix in place. Notice that we sort both rows and columns. This 
        
               // way the interpretation of the matrix as pairwise interaction of atoms is 
        
               // still valid. 
        
               matrix = matrix * P; 
        
               matrix = P * matrix; 
        
           }

Below is an example to reproduce the issue:

import ase
import numpy as np

from io import StringIO

from ase.io import read
from dscribe.descriptors import CoulombMatrix


# Manual implementation of the CM matrix generation
def generate_coulomb_matrix(n_atoms_max, atoms):
    cm = np.zeros((n_atoms_max, n_atoms_max))

    Z = atoms.get_atomic_numbers()
    xyz = np.array(atoms.positions)

    for r in range(len(xyz)):
        for c in range(len(xyz)):
            if r == c:
                cm[r][c] = 0.5 * Z[r] ** 2.4
            elif r < c:
                cm[r][c] = Z[r] * Z[c] / np.linalg.norm(xyz[r] - xyz[c])
            else:
                cm[r][c] = cm[c][r]

    return cm

# Manual implementation of the L2 sort for a CM
def sort_cm_l2(cm):
    """
    [Coulomb matrices are] not invariant to reindexing [their] atoms.
    One remedy is to sort Coulomb matrices by descending row norm by
    simultaneously permuting rows and columns accordingly. This
    ensures invariance against nuclear permutations, but introduces
    discontinuities where the sorting changes.

    Rupp, M. (2015) Machine learning for quantum mechanics in a nutshell.
    International Journal of Quantum Chemistry.
    115 (16) pp.1058–1073. doi:10.1002/qua.24954.
    """

    row_norms = np.linalg.norm(cm, axis=1)
    sorted_row_indices = np.argsort(row_norms)[::-1]

    sorted_cm = cm[sorted_row_indices]
    sorted_cm = sorted_cm[:, sorted_row_indices]

    return sorted_cm


# Example xyz file for the pyridine molecule
# https://en.wikipedia.org/w/index.php?title=XYZ_file_format&oldid=1002382601#Example
xyz = """11

C       -0.180226841      0.360945118     -1.120304970
C       -0.180226841      1.559292118     -0.407860970
C       -0.180226841      1.503191118      0.986935030
N       -0.180226841      0.360945118      1.29018350
C       -0.180226841     -0.781300882      0.986935030
C       -0.180226841     -0.837401882     -0.407860970
H       -0.180226841      0.360945118     -2.206546970
H       -0.180226841      2.517950118     -0.917077970
H       -0.180226841      2.421289118      1.572099030
H       -0.180226841     -1.699398882      1.572099030
H       -0.180226841     -1.796059882     -0.917077970
"""


atoms = read(StringIO(xyz), format="xyz")


cm_none = CoulombMatrix(n_atoms_max=11, permutation="none", flatten=False)
cm_l2 = CoulombMatrix(n_atoms_max=11, permutation="sorted_l2", flatten=False)


dscribe_cm_none = cm_none.create(atoms).round(1)
manual_cm_none = generate_coulomb_matrix(11, atoms).round(1)

# Succeeds, i.e. the CM matrix is generated as expected
assert np.array_equal(dscribe_cm_none, manual_cm_none)


dscribe_cm_l2 = cm_l2.create(atoms).round(1)
manual_cm_l2 = sort_cm_l2(generate_coulomb_matrix(11, atoms)).round(1)

# Fails, i.e. the L2-sorted CM matrix is NOT generated as expected
assert np.array_equal(dscribe_cm_l2, manual_cm_l2)


dscribe_cm_norm = np.linalg.norm(dscribe_cm_l2, axis=0).round(1)
dscribe_cm_norm = np.linalg.norm(manual_cm_l2, axis=0).round(1)

# Succeeds, i.e. the L2-sorted CM matrix is sorted by each row's L2 norm
assert np.array_equal(dscribe_cm_none, manual_cm_none)


# Succeeds, i.e. the manually sorted CM remains symmetric
assert np.array_equal(manual_cm_none, manual_cm_none.T)


# Fails, i.e. dscribe's L2-sorted CM is no longer symmetric
assert np.array_equal(dscribe_cm_l2, dscribe_cm_l2.T)

The text was updated successfully, but these errors were encountered:

lauri-codes · 2022-06-27T18:45:34Z

Hi @MomoLangenstein,

Thanks for the report. I need to check what is going on. If this is the case, we definitely need to fix it and introduce a better test that also checks the symmetricity.

lauri-codes · 2022-07-20T05:55:07Z

Hi @MomoLangenstein,

Indeed there was an issue where the rows and columns of the matrix were not being permuted as expected, causing non-symmetric matrices. There is now a fix in the branch v1.2.2, which will be made into a release in the near future.

It would be great if you can verify this fix as well. I have added a new symmetricity test into our test suite and your examples now also pass after changing the sorting on the python side slightly like this:

sorted_row_indices = np.argsort(-row_norms)

The sorting in your manual implementation needs to be exactly the same as in our C++ implementation due to ambiguity in sorting by row norm. In several cases the norms of the matrix rows are identical and in order to get the exact same results, the sorting logic needs to be identical in your example and in our C++ implementation.

juntyr · 2022-07-25T13:35:35Z

Hi @lauri-codes,

Thank you for working out the fix for this issue! I can confirm that my small test now works on the v1.2.2 branch. I just wanted to clarify your proposed change to the manual Python implementation. If I do use

sorted_row_indices = np.argsort(-row_norms)

instead of

sorted_row_indices = np.argsort(row_norms)[::-1]

I get exactly the same results with my manual implementation as with dscribe. Are both methods semantically correct, and only differ in their handling of how to order rows/columns with equivalent norms?

lauri-codes · 2022-07-27T13:12:20Z

Hi,

Yes, the sorting simply affects the way that rows with equal norm are handled. If there are no equal norms, then the output is completely independent of the sorting algorithm (see e.g. the kind attribute in the numpy docs) and whether you do np.argsort(-row_norms) or np.argsort(row_norms)[::-1].

The small differences only start kicking in when there are equal norms. I think np.argsort(-row_norms) and np.argsort(row_norms)[::-1] should not be equal in these cases, e.g.:

import numpy as np

a = [
    [2, 1],
    [1, 2],
]

row_norms = np.linalg.norm(a, axis=1)
indices1 = np.argsort(row_norms)[::-1]
print(indices1) # [1, 0]
indices2 = np.argsort(-row_norms)
print(indices2) # [0, 1]

In the end it does not really matter which way you sort the rows with equal norms, just as long as you are consistent throughout your data.

lauri-codes · 2022-08-01T19:15:49Z

Now fixed in v1.2.2.

lauri-codes closed this as completed Aug 1, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

`CoulombMatrix(permutation="sorted_l2")` is not symmetric #89

`CoulombMatrix(permutation="sorted_l2")` is not symmetric #89

juntyr commented Jun 20, 2022

lauri-codes commented Jun 27, 2022

lauri-codes commented Jul 20, 2022

juntyr commented Jul 25, 2022

lauri-codes commented Jul 27, 2022

lauri-codes commented Aug 1, 2022

CoulombMatrix(permutation="sorted_l2") is not symmetric #89

CoulombMatrix(permutation="sorted_l2") is not symmetric #89

Comments

juntyr commented Jun 20, 2022

lauri-codes commented Jun 27, 2022

lauri-codes commented Jul 20, 2022

juntyr commented Jul 25, 2022

lauri-codes commented Jul 27, 2022

lauri-codes commented Aug 1, 2022

`CoulombMatrix(permutation="sorted_l2")` is not symmetric #89

`CoulombMatrix(permutation="sorted_l2")` is not symmetric #89