get_voronoi_neighbors exhausts memory #84

Closed
mhangaard opened this issue Feb 13, 2019 · 7 comments
@mhangaard
Contributor

This seems to happen for high aspect ratio unit cells with many atoms, when the call is made to scipy.spatial.Voronoi. Can something be done to either:

  • Estimate memory usage before running and raise MemoryError if the call would be too intensive?
  • Break the problem down into something less memory intensive?

The current behavior risks exhausting memory and causing a freeze.
@jboes
Collaborator

jboes commented Mar 9, 2019

Thanks for letting me know. Can you provide an example unit cell and usage? I'll look into it.

@mhangaard
Contributor Author

mhangaard commented Mar 12, 2019

Here is my example:

import ase.db
from catkit.gen.utils.connectivity import get_voronoi_neighbors

# Connect the ase-db.
c = ase.db.connect('../input/some_li_atoms.db')
s = c.select()

# Connect to output.
c_cm = ase.db.connect('../input/some_atoms_with_cm_2.db')

N = 0
for row in s:
    # Get atoms.
    atoms = row.toatoms()
    # Get connectivity.
    cm = get_voronoi_neighbors(atoms)
    # Write to ase db.
    atoms.connectivity = cm
    c_cm.write(atoms, key_value_pairs=row.key_value_pairs,
               data={'connectivity': cm})
    N += 1
    if N % 100 == 0:
        print(N)
print('pulled {} structures from db'.format(N))

I'll send you the data directly.

@jboes
Collaborator

jboes commented Mar 12, 2019

Ok, it looks like this is caused by too many atoms being passed to qhull. The automatic expansion of the cell from catkit.gen.utils.expand_cell is either not intelligent enough, or there is simply no way to guarantee correct bond identification for these "needle-like" structures.

Possible solutions:

  • If it's a bulk structure, standardization of the cell will make it more orthogonal and thus yield a proper number of atoms surrounding every atom in the cell.
  • Manually specify the number of unit cell repetitions. I've exposed the padding kwarg, which is passed to the catkit.gen.utils.expand_cell function, allowing the user to set their own repetitions of the cell (see the sketch after this list). Only recommended for expert users, as this could result in incorrect connectivity.
  • There may be a more intelligent way to choose a smaller padding that still guarantees correct bonding. I can look into this option further.
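
For the second option, usage would look roughly like the sketch below. The exact keyword handling follows the description in the bullet above (padding forwarded to catkit.gen.utils.expand_cell), so treat this as an assumption rather than a documented signature:

from ase.build import bulk

from catkit.gen.utils.connectivity import get_voronoi_neighbors

# A small test structure; any Atoms object from the database works the same.
atoms = bulk('Li', 'bcc', a=3.51)

# padding fixes the number of unit cell repetitions along each lattice
# vector instead of letting expand_cell choose them automatically.
# Expert option only: too small a padding can return incorrect connectivity.
cm = get_voronoi_neighbors(atoms, padding=[2, 2, 2])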

Currently, there is a warning that provides some instruction if the repeated cell returned is quite large. This will give the user some information on how to proceed until I can determine whether a better padding solution exists. #85

@mhangaard
Contributor Author

mhangaard commented Mar 20, 2019

Applying the get_standardized_cell transformation to the atoms helps a lot, but does not solve the problem in all cases.

I am still getting index arrays from expand_cell that are over 100k in length.

import ase.db
from catkit.gen.utils.connectivity import get_voronoi_neighbors
from catkit.gen.symmetry import get_standardized_cell
import warnings


# Connect the ase-db.
c = ase.db.connect('../input/some_li_atoms.db')
s = c.select()

# Connect to output.
c_cm = ase.db.connect('../input/some_atoms_with_cm.db')

N = 0
for row in s:
    # Get atoms.
    atoms = row.toatoms()

    # Conventional standard cell.
    atoms = get_standardized_cell(atoms, primitive=False)
    if len(atoms) != int(row.natoms):
        warnings.warn(str(row.natoms) + ' != ' + str(len(atoms)))

    # Get connectivity.
    cm = get_voronoi_neighbors(atoms)

    # Write to ase db.
    c_cm.write(atoms, key_value_pairs=row.key_value_pairs,
               data={'connectivity': cm})

    # Print progress.
    N += 1
    if N % 100 == 0:
        print(N)
print('pulled {} structures from db'.format(N))

If I also add the following check after standardizing the cell, most of my structures make it through and the most unreasonable ones are skipped.

    # Check the size of the expanded cell before attempting the Voronoi call.
    # expand_cell is the same catkit.gen.utils function mentioned above.
    index, coords, offsets = expand_cell(atoms)
    if len(index) > 30000:
        warnings.warn(str(len(index)))
        continue

but the limit should either be user input or be estimated from the available memory with psutil. I just don't know the relation between len(index) and the required memory; it's a property of qhull.
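
A guard along those lines might look like the sketch below; the per-point byte estimate is just a placeholder, since the real qhull scaling is unknown, and the 50% threshold is arbitrary (expand_cell, get_voronoi_neighbors, and atoms as in the loop above):

import warnings

import psutil

# Placeholder heuristic: assume qhull needs roughly this many bytes per
# point in the expanded cell. The true relation between len(index) and
# memory is a property of qhull and is not known here.
BYTES_PER_POINT = 10000

# Compare the expanded cell against the memory actually available right now.
index, coords, offsets = expand_cell(atoms)
estimated = len(index) * BYTES_PER_POINT
available = psutil.virtual_memory().available

if estimated > 0.5 * available:
    # Skip structures whose expanded cell would likely exhaust memory.
    warnings.warn('skipping: ~{} bytes needed, {} available'.format(
        estimated, available))
else:
    cm = get_voronoi_neighbors(atoms)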

@jboes
Collaborator

jboes commented Mar 20, 2019

Having a user-defined memory flag would be a really ugly feature. If most other nearest-neighbor functions from other programs work for these structures, there must be a simpler solution that involves making the algorithm more efficient. I'll look into this in the very near future.

mhangaard changed the title from "get_voronoi_neighbors spends enormous memory" to "get_voronoi_neighbors exhausts memory" on Mar 21, 2019
@jboes
Collaborator

jboes commented Mar 22, 2019

Progress so far:

Memory profiling on the bulk structures provided. Values reported for structures requiring over 5 MiB:

Primitive form:
MiB memory used: [ 41.9 58.2 175.4]
Structure ID: [190 12 101]

Standard form:
MiB memory used: [ 9.9 10.9 11.1 12.1 13.6 14.8 16.2 20. 28.6]
Structure ID: [185 100 207 222 179 218 149 163 190]

Taking the primitive form helps more often than the standard form, but when the primitive form does make it worse, it's really bad. I wouldn't expect 30 MiB to crash any modern system, but it more than likely doesn't need to be this expensive.

This function gets called by every structure generator, and sometimes multiple times, so its efficiency is important.

The worst structure (it shows up in both cases above) is 190:

(attached image: sample-190)

The automatically assigned padding for the extended cell is [7, 7, 7], but [1, 1, 1] produces the same connectivity, so the current automatically assigned padding is excessive.
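
One way to check that for a given structure (here, structure 190 from the provided data), again assuming the padding kwarg described earlier, is to compare the two connectivity matrices directly:

import numpy as np

from catkit.gen.utils.connectivity import get_voronoi_neighbors

# Compare the automatically chosen padding with the minimal one.
cm_auto = get_voronoi_neighbors(atoms, padding=[7, 7, 7])
cm_minimal = get_voronoi_neighbors(atoms, padding=[1, 1, 1])

# For structure 190 both paddings give the same matrix, which is what
# makes the automatic [7, 7, 7] choice excessive.
print(np.array_equal(cm_auto, cm_minimal))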

@jboes
Collaborator

jboes commented Mar 22, 2019

Ok, a user-defined cutoff radius is now used by default to determine the number of expansions of the cell, similar to ASE. The default cutoff is 5 angstroms, which is probably more than sufficient for any structure for which a Voronoi method is likely to produce a reasonable representation of the connectivity matrix.

This gives the same connectivity matrix for all provided examples and reduces the memory requirement to less than 0.1 MiB in all cases. #86
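
Roughly, the idea can be sketched as follows; this mirrors the cutoff-based expansion described above rather than the exact implementation in #86, and the function name and default value are illustrative only:

import numpy as np


def repetitions_from_cutoff(atoms, cutoff=5.0):
    """Choose cell repetitions so the expansion reaches `cutoff` angstroms
    beyond the original cell in every direction.

    This mirrors the cutoff idea described above, not the exact CatKit code.
    """
    cell = np.asarray(atoms.cell)
    volume = abs(np.linalg.det(cell))
    repeats = []
    for i in range(3):
        # Perpendicular height of lattice vector i over the plane spanned
        # by the other two vectors: V / |a_j x a_k|.
        cross = np.cross(cell[(i + 1) % 3], cell[(i + 2) % 3])
        height = volume / np.linalg.norm(cross)
        repeats.append(int(np.ceil(cutoff / height)))
    return repeats

For a needle-like cell this gives more repeats along the short directions and only one along the long axis, which is what keeps the number of points passed to qhull, and hence the memory use, small.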

jboes closed this as completed Apr 18, 2019