
Optimized Dynamical Matrix command #1314

Merged
merged 14 commits into lammps:master on Feb 28, 2019

Conversation

@charlessievers
Collaborator

commented Feb 2, 2019

Purpose

Important Note by @akohlmey: The third_order command has been removed from this PR and will be added in a later PR, since it still needs work, while dynamical_matrix seems to work as expected, is valgrind clean and ready to be merged (and a nice complement to fix phonon).

I added two commands to LAMMPS. One calculates the dynamical matrix and one calculates the third order equivalent of the dynamical matrix (excluding division by mass). These commands will be useful to anyone who would like to calculate these matrices using the numerous force fields that are implemented in LAMMPS.

closes #1304

Author(s)

Charlie Sievers
UC Davis PhD Candidate
Donadio Lab

Backward Compatibility

Backward compatible with LAMMPS versions up to two years old.

Implementation Notes

Implemented for make mpi.

The matrices are calculated using a finite difference method, and the calculation has been MPI-parallelized.
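For reference, the quantity being computed is the mass-weighted second derivative of the potential energy, approximated by central differences (a textbook form; the sign convention and displacement bookkeeping in the actual code may differ):

    D_{i\alpha,j\beta} = \frac{1}{\sqrt{m_i m_j}} \frac{\partial^2 U}{\partial u_{i\alpha}\,\partial u_{j\beta}}
                       \approx -\frac{F_{j\beta}(u_{i\alpha}+\Delta) - F_{j\beta}(u_{i\alpha}-\Delta)}{2\Delta\,\sqrt{m_i m_j}}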

Post Submission Checklist

  • The feature or features in this pull request is complete
  • Suitable new documentation files and/or updates to the existing docs are included
  • One or more example input decks are included
  • The source code follows the LAMMPS formatting guidelines

Further Information, Files, and Links

Within the directory lammps/examples/USER/phonon/ there are examples and manuals which describe how to use the implemented commands.

Information on dynamical matrices: http://faculty.virginia.edu/esfarjani/UVA/Teaching_files/phonons.pdf

@charlessievers charlessievers changed the title Optimized (but not working) Dynamical Matrix command) Optimized Dynamical Matrix command Feb 2, 2019

@charlessievers

Collaborator Author

commented Feb 2, 2019

I guess I am too stubborn. Is there a flag to tell if an atom is a ghost or local? I am double counting atoms when I take into consideration the ghosts for local_idx.

@charlessievers

Collaborator Author

commented Feb 2, 2019

An atom is local if its index is smaller than atom->nlocal.

Great, that is what I did on my outdated code. I am glad to know that it is always true.

@charlessievers

Collaborator Author

commented Feb 2, 2019

Alright, I made it work. I will admit that it is better in every possible way. I still need to implement the group functionality, rewrite the third order code, and update the examples. Thank you for pushing me and improving my product.

@akohlmey
Member

left a comment

I've looked over this version of the code and made some comments and suggestions for changes.

double **f = atom->f;

//initialize dynmat to all zeros
for (int i=0; i < dynlen; i++)

akohlmey Feb 3, 2019

Member

dynlen is of type bigint, so i and j should be bigint here as well.
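A minimal sketch of the suggested fix:

    // loop indices promoted to bigint so they match dynlen and cannot overflow
    for (bigint i = 0; i < dynlen; i++)
      for (bigint j = 0; j < dynlen; j++)
        dynmat[i][j] = 0.0;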


if (comm->me == 0 && screen) fprintf(screen,"Calculating Dynamical Matrix...\n");

for (int i=1; i<=natoms; i++){

akohlmey Feb 3, 2019

Member

natoms is of type bigint, so i should be bigint here as well.

for (int j=0; j < dynlen; j++)
dynmat[i][j] = 0.;

energy_force(0);

akohlmey Feb 3, 2019

Member

why call energy_force() here?

charlessievers Feb 3, 2019

Author Collaborator

Great question. I believe that it was for testing purposes initially. In any case, it is outdated.


// compute all forces
force_clear();
external_force_clear = 0;

akohlmey Feb 3, 2019

Member

The steps below are essentially what is in energy_force(), are they not?
I would measure the duration of the force computation, as that can be used as an estimate of how long it will take to complete the command (wall time of one force computation * 6 * dynlen * dynlen, if i am not mistaken).

}


memory->create(final_dynmat,int(dynlen),int(dynlen),"dynamic_matrix_buffer:buf");

akohlmey Feb 3, 2019

Member

This limits dynlen to be a 32-bit integer only. Now, it is going to be quite slow to compute the dynamical matrix for a very large system anyway, but there should still be a test at the beginning checking that dynlen isn't larger than MAXSMALLINT. Then dynlen could also be made an int and the loop indices adjusted correspondingly.
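A sketch of the suggested guard (the error message and the variable name dynlen_int are illustrative):

    // refuse to run if 3*natoms does not fit into a 32-bit int;
    // afterwards dynlen can be safely downcast for allocation and loops
    if (dynlen > MAXSMALLINT)
      error->all(FLERR,"Dynamical matrix is too large to compute with this command");
    int dynlen_int = static_cast<int>(dynlen);
    memory->create(final_dynmat,dynlen_int,dynlen_int,"dynamic_matrix_buffer:buf");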

int scaleflag;
int me;
bigint dynlen;
double **dynmat;

akohlmey Feb 3, 2019

Member

dynmat does not have to be a class member, and it does not have to be a full matrix but only a 3 x dynlen matrix, since the matrix is constructed 3 rows at a time.

charlessievers Feb 3, 2019

Author Collaborator

Right. I will move it to local scope.

neighbor->delay = 1;
neighbor->ago = 0;
neighbor->ndanger = 0;

akohlmey Feb 3, 2019

Member

Here is an idea for how you can add group support (a rough sketch follows below the list):

  • count the number of atoms in the group and store it as a class member. That number replaces atom->natoms to determine the per-atom loop count in the computation (nelem)
  • allocate a vector idxmap[nelem] and a vector localmap[atom->nmax] on every MPI rank and zero them
  • on each MPI rank, loop from 0 to atom->nlocal (exclusive) and, if the loop index i is a member of the group, add atom->tag[i] to localmap[]
  • on MPI rank zero, copy the non-zero elements of localmap[] to idxmap[]
  • now loop over all MPI ranks and have them send their localmap to rank 0, which appends the non-zero elements to idxmap[]
  • now sort idxmap[] (if desired) and broadcast it to all MPI ranks. Use the content of idxmap instead of the loop from 1 to atom->natoms (inclusive) for the loops in the dynamical matrix calculation.
  • when outputting the dynamical matrix, also provide this list, so that the matrix elements can be identified by atom id and direction.
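A rough C++ sketch of this scheme (variable names such as nmine, localmap, and idxmap are illustrative only and not taken from the PR; the actual implementation may gather on rank 0 and broadcast instead of using an all-gather):

    bigint gcount_big = group->count(igroup);   // involves an MPI collective, so store it once
    int gcount = static_cast<int>(gcount_big);
    int nlocal = atom->nlocal;
    int *mask = atom->mask;
    tagint *tag = atom->tag;

    // per-rank list of tags of owned atoms that belong to the group
    tagint *localmap;
    memory->create(localmap, nlocal ? nlocal : 1, "dynmat:localmap");
    int nmine = 0;
    for (int i = 0; i < nlocal; i++)
      if (mask[i] & groupbit) localmap[nmine++] = tag[i];

    // collect the per-rank counts, compute displacements, gather all tags
    int nprocs = comm->nprocs;
    int *counts = new int[nprocs];
    int *displs = new int[nprocs];
    MPI_Allgather(&nmine, 1, MPI_INT, counts, 1, MPI_INT, world);
    displs[0] = 0;
    for (int p = 1; p < nprocs; p++) displs[p] = displs[p-1] + counts[p-1];

    // idxmap[k] is the atom ID belonging to the k-th row block of the dynamical matrix
    tagint *idxmap;
    memory->create(idxmap, gcount, "dynmat:idxmap");
    MPI_Allgatherv(localmap, nmine, MPI_LMP_TAGINT,
                   idxmap, counts, displs, MPI_LMP_TAGINT, world);
    std::sort(idxmap, idxmap + gcount);         // needs #include <algorithm>

    delete[] counts;
    delete[] displs;
    memory->destroy(localmap);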

charlessievers Feb 3, 2019

Author Collaborator

I believe I already did this exact solution.

akohlmey Feb 3, 2019

Member

nope. my suggestion will reduce the amount of memory used; yours will always require storage of dimension natoms.

return negative gradient for nextra_global dof in fextra
------------------------------------------------------------------------- */

void DynamicalMatrix::energy_force(int resetflag)

akohlmey Feb 3, 2019

Member

calling this function energy_force() is a misnomer, since you are setting eflag and vflag to zero, and thus do not tally energy or virial contributions.

charlessievers Feb 3, 2019

Author Collaborator

I just wanted to make it look similar to other class methods. Would you have me change other classes?

akohlmey Feb 3, 2019

Member

Since this is an "internal" function, its name only has to make sense within this class.

if (local_idx >= 0){
for (int alpha=0; alpha<3; alpha++){
displace_atom(local_idx, alpha, 1);
energy_force(0);

akohlmey Feb 3, 2019

Member

energy_force() must be called by all MPI ranks, not only those owning the displaced atom.

charlessievers Feb 3, 2019

Author Collaborator

I have fallen for this trap enough times. I have fixed this.
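A minimal sketch of the corrected pattern (assuming the third argument of displace_atom() is a step multiplier, as the excerpt above suggests):

    for (int alpha = 0; alpha < 3; alpha++) {
      if (local_idx >= 0) displace_atom(local_idx, alpha, 1);   // only the owning rank moves the atom
      energy_force(0);                                          // but every rank takes part in the force call
      // ... accumulate +delta forces ...
      if (local_idx >= 0) displace_atom(local_idx, alpha, -2);  // move to -delta
      energy_force(0);
      // ... accumulate -delta forces and form the finite difference ...
      if (local_idx >= 0) displace_atom(local_idx, alpha, 1);   // restore the original position
    }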

dynmat[(i-1)*3+alpha][(j-1)*3+beta] *= conversion;
}
}
}

akohlmey Feb 3, 2019

Member

If dynmat is changed to be a (local) 3 x dynlen matrix, then it must be reduced here and then added to the global dynamical matrix. This would reduce the memory consumption for large systems significantly.

charlessievers Feb 3, 2019

Author Collaborator

I see. Memory consumption > MPI overhead.

akohlmey Feb 3, 2019

Member

yes. you can reduce memory use even more by outputting the dynamical matrix here directly. in LAMMPS we try to distribute data as much as possible, and thus we try to avoid allocations of size atom->natoms (but that is not always possible). atom->natoms times a constant factor is similar, but atom->natoms * atom->natoms is a much bigger problem, as memory use now grows quadratically with the number of atoms and cannot be reduced by parallelization.
...and if quadratic size cannot be avoided, then it should at least be only on one MPI rank (i.e. use MPI_Reduce() instead of MPI_Allreduce()), i.e. on the one writing the file.

The whole matrix would really only be needed, e.g. if you include some postprocessing, and even then, with additional effort, that could be distributed.

But then again, any system of a size where this would matter is probably going to take such a long time to compute that nobody would want to do it in the first place. Hence my suggestion to run the force computation once to record the time it takes for one force compute and then write out an estimate of the total time it would take to compute the matrix.
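A sketch combining the two suggestions, i.e. a local 3 x dynlen block reduced onto the writing rank only (names such as dynmat_block and dynlen_int are illustrative):

    // each rank accumulates its contribution to the 3 rows belonging to atom i
    double **dynmat_block;                         // 3 x dynlen, local accumulator
    memory->create(dynmat_block, 3, dynlen_int, "dynmat:block");
    // ... fill dynmat_block for the current atom ...

    // sum the block onto rank 0 only; rank 0 then appends it to the output file
    if (comm->me == 0)
      MPI_Reduce(MPI_IN_PLACE, dynmat_block[0], 3*dynlen_int, MPI_DOUBLE, MPI_SUM, 0, world);
    else
      MPI_Reduce(dynmat_block[0], nullptr, 3*dynlen_int, MPI_DOUBLE, MPI_SUM, 0, world);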

charlessievers Feb 3, 2019

Author Collaborator

I should definitely do MPI_Reduce instead of all reduce. I wrote that back when I was learning MPI...

But then again, any system of a size where this would matter is probably going to take such a long time to compute that nobody would want to do it in the first place. Hence my suggestion to run the force computation once to record the time it takes for one force compute and then write out an estimate of the total time it would take to compute the matrix.

Should I make a check_comp_time method?

akohlmey Feb 3, 2019

Member

That seems a bit overkill to me. i would just leave the one call to the force computation in the setup() method (that seems the best place for it) and measure its time.

if you then use the following code (adapted from src/run.cpp):

    timer->init();
    timer->barrier_start();
    calculateMatrix();
    timer->barrier_stop();

If you keep the timer calls in the force routine, as they were in verlet.cpp or similar, then you have a nice breakdown of how the time is spent on the different force computations, in the summary output from the Finish class.

The call in the setup function could in the future also be used to get per MPI rank timer information to be used for load balancing, although that is something that can be done outside the command using a run 0 and a balance command. There are lots of additional 'gimmicks' that can be added later, a timeout (for when running with batch systems), a restart feature (for very large systems) etc.
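A minimal sketch of the run-time estimate idea (the 6*nelem evaluation count and the variable names are assumptions, not code from the PR):

    // time a single force evaluation during setup ...
    MPI_Barrier(world);
    double t0 = MPI_Wtime();
    energy_force(0);
    double t_force = MPI_Wtime() - t0;

    // ... and report a rough wall-time estimate for the full matrix
    // (two displacements in three directions for each of the nelem atoms)
    if (comm->me == 0 && screen)
      fprintf(screen,"Estimated time to compute the dynamical matrix: %g seconds\n",
              6.0*nelem*t_force);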

@akohlmey

Member

commented Feb 3, 2019

Alright, I made it work. I will admit that it is better in every possible way. I still need to implement the group functionality, rewrite the third order code, and update the examples. Thank you for pushing me and improving my product.

I understand how you feel. I have been in the same situation many times myself, and it took quite a while to manage the resulting frustration. Later I learned to appreciate that somebody saying "no" to my contributions doesn't mean "no" to myself, but that there is value in the rejection because it leads to an overall better effort. In an open source effort like LAMMPS, it is particularly difficult to say "no" to a submission (unless it is complete and obvious garbage), since the well-being of the project depends on contributions, and there should be a reward for people giving their changes/additions back to the project (instead of keeping them to themselves).

//find number of local atoms in the group (final_gid)
for (int i=1; i<=natoms; i++){
local_idx = atom->map(i);
if (mask[local_idx] & groupbit && local_idx < nlocal)

akohlmey Feb 3, 2019

Member

This does not account for local_idx < 0 for atoms that are not owned and will thus have illegal memory access. Try:

if ((local_idx >= 0) && (local_idx < nlocal) && mask[local_idx] & groupbit) 

//combine subgroup maps into total temporary groupmap
MPI_Allgatherv(sub_groupmap,gid,MPI_INT,temp_groupmap,recv,displs,MPI_INT,world);
std::sort(temp_groupmap,temp_groupmap+group->count(igroup));

akohlmey Feb 3, 2019

Member

You have to add #include <algorithm> if you want to use std::sort(). As an alternative, you could also use our internal c-style sort from the mergesort.h header.

double conv_distance;
double conv_mass;
double del;
int igroup,groupbit;

akohlmey Feb 3, 2019

Member

you should also store the result of group->count(igroup) here, since the count function incurs an MPI collective operation.

casievers added some commits Feb 5, 2019

@akohlmey

Member

commented Feb 7, 2019

@charlessievers please put a note here on github when you would like me to do the next pass of reviewing your contribution. This would then also include compiling the code and running the provided examples with an instrumented binary and/or using valgrind's memcheck tool.

@charlessievers

Collaborator Author

commented Feb 8, 2019

@akohlmey that would be great. I am currently making three more examples. I should be done with them by tomorrow.

casievers and others added some commits Feb 12, 2019

@charlessievers charlessievers requested a review from rbberger as a code owner Feb 28, 2019

requested changes are included

@akohlmey

Member

commented Feb 28, 2019

@charlessievers i looked over your latest changes and made some corrections, and now it appears that the dynamical_matrix command is ready to be merged, but the third_order command is not. I converted the docs for the former and integrated them properly into the manual, but didn't do it for third_order (those should be merged into the file added for dynamical_matrix). I fixed a major issue with third_order causing segmentation faults, but the output between serial and parallel runs differs a lot, and i cannot tell currently if this is to be expected or not. For dynamical_matrix, one can see that there are only minor numerical differences, which are within the scale of what is expected from finite differencing and summing up forces in a different order.

Please let me know ASAP how you want to proceed. I am planning to make a patch release later today, and this could be included, or it would have to wait (possibly 3-4 more weeks) until the next patch, once it is completely functional.

@charlessievers

Collaborator Author

commented Feb 28, 2019

I would like to merge the dynamical matrix command. I will keep working on the third order. How would you like me to proceed?

@akohlmey

Member

commented Feb 28, 2019

I would like to merge the dynamical matrix command. I will keep working on the third order. How would you like me to proceed?

i'll just delete the third order code from this branch and ask other LAMMPS developers to have a look and approve the merge. I will also delete the examples using ASE (for now). They should be accompanied by instructions about how to get and set up ASE and how to run those inputs. I know a little bit about this, so I know how to deal with it, but other LAMMPS users will be confused. I am not opposing it (in fact, we would very much welcome a small tutorial addition to the LAMMPS manual describing how to use LAMMPS through ASE), but we want to avoid including anything that is not properly documented.

after the merge (and patch release) you can then create a new branch, re-add the content that i removed, and submit a new pull request, and we keep working on that until it is ready to be merged.

@charlessievers

Collaborator Author

commented Feb 28, 2019

I would like to merge the dynamical matrix command. I will keep working on the third order. How would you like me to proceed?

i'll just delete the third order code from this branch and ask other LAMMPS developers to have a look and approve the merge. I will also delete the examples using ASE (for now). They should be accompanied by instructions about how to get and set up ASE and how to run those inputs. I know a little bit about this, so I know how to deal with it, but other LAMMPS users will be confused. I am not opposing it (in fact, we would very much welcome a small tutorial addition to the LAMMPS manual describing how to use LAMMPS through ASE), but we want to avoid including anything that is not properly documented.

after the merge (and patch release) you can then create a new branch, re-add the content that i removed, and submit a new pull request, and we keep working on that until it is ready to be merged.

Perfect, will do!

@akohlmey

Member

commented Feb 28, 2019

Perfect, will do!

great. please make sure you pull the branch now, so you have my changes and bugfixes also for third_order and the documentation integration.

@charlessievers

Collaborator Author

commented Feb 28, 2019

great. please make sure you pull the branch now, so you have my changes and bugfixes also for third_order and the documentation integration.

Just pulled.

@akohlmey

Member

commented Feb 28, 2019

great. please make sure you pull the branch now, so you have my changes and bugfixes also for third_order and the documentation integration.

Just pulled.

one more thing, can you give me a "permanent" e-mail address, that i can add to the readme in case people (or the LAMMPS developers) need to contact you about your contribution?
institutional e-mails often expire after people graduate or find a new job, and then it can be difficult to get hold of people.

thanks,
axel.

@charlessievers

Collaborator Author

commented Feb 28, 2019

one more thing, can you give me a "permanent" e-mail address, that i can add to the readme in case people (or the LAMMPS developers) need to contact you about your contribution?
institutional e-mails often expire after people graduate or find a new job, and then it can be difficult to get hold of people.

thanks,
axel.

charliesievers@cox.net

Best,
Charlie

@akohlmey akohlmey requested review from sjplimp, stanmoore1 and athomps Feb 28, 2019

akohlmey added some commits Feb 28, 2019

@akohlmey akohlmey merged commit 14e6c12 into lammps:master Feb 28, 2019

@athomps
Contributor

left a comment

It looks like this has already been merged by @akohlmey but I have a couple of comments anyway.

  1. Well done @charlessievers! This is a great addition, something that has been talked about for years.
  2. What about providing an option to report instead the Hessian d2U/dR_j^2 in units of energy/distance^2?
  3. It would be better if you could avoid hard-coding your own unit conversion factors and instead use built-in LAMMPS constants, e.g. force->mvv2e.
  4. Why is the dynamical matrix always in units of 10 J/mol/A^2/g/mol? It would be more LAMMPSific if the units changed with unit_style.

@athomps

Contributor

commented Mar 1, 2019

I just looked at this again. I see that it uses O(N) memory by writing one row at a time. This is very good. Also, because it does not use a neighbor list, it should work for any potential, including things like EAM, kspace, maybe even QEQ, all of which are non-trivial. Well done @charlessievers and @akohlmey

@athomps

Contributor

commented Mar 1, 2019

For systems that are bigger than the interaction cutoff, the number of non-zero entries in each row is roughly constant i.e. the matrix sparsity ~ 1/L^3. This could be handled optionally by only writing non-zero entries in each row, preceded by a list of column indices for that row. Sparseness could also be exploited in the computation, by applying a distance check, or neighbor check, but it is not easy to ensure correctness for all potentials.
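A hypothetical sketch of such a sparse row writer (the file format, function name, and tolerance are illustrative; nothing like this exists in the PR):

    #include <cmath>
    #include <cstdio>
    #include <vector>

    // write one row as: row index, number of non-zeros, column indices, values
    static void write_sparse_row(FILE *fp, int row, const double *vals,
                                 int ncols, double tol)
    {
      std::vector<int> cols;
      for (int j = 0; j < ncols; j++)
        if (std::fabs(vals[j]) > tol) cols.push_back(j);
      fprintf(fp, "%d %d", row, (int) cols.size());
      for (int j : cols) fprintf(fp, " %d", j);
      for (int j : cols) fprintf(fp, " %15.8e", vals[j]);
      fprintf(fp, "\n");
    }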

@martok

Collaborator

commented Mar 6, 2019

Very good addition, and just in time for when I needed it 👍

The command does have a few issues, some of which are easy to fix. Should I open a PR, or would you prefer doing it all in one batch with the third_order part? @charlessievers @akohlmey

@akohlmey

Member

commented Mar 6, 2019

@martok i generally prefer doing things in an incremental fashion. Besides, if you post a pull request we can either merge it to the master branch directly or @charlessievers can merge it first into his development branch depending on how much time he has available to test and integrate changes and what seems easier to do. My time is going to be somewhat limited for the next couple of weeks due to other projects.

@martok

Collaborator

commented Mar 6, 2019

Makes sense!
Don't worry about merge time, I just want to put this somewhere public to avoid creating wildly different branches.

@martok martok referenced this pull request Mar 6, 2019
@charlessievers

Collaborator Author

commented Mar 6, 2019

Hello everyone,

Thank you for your comments and support. Sorry I have not been responding; I gave a departmental talk on thermal devices this week.

@athomps

What about providing an option to report instead the Hessian d2U/dR_j^2 in units of energy/distance^2?

A Hessian option is a great idea; I will go ahead and implement that when I get some time!

It would be better if you could avoid hard-coding your own unit conversion factors and instead use built-in LAMMPS constants, e.g. force->mvv2e.

Hard-coded unit conversions are only set in the ESKM style. I too would prefer to avoid them.

Why is the dynamical matrix always in units of 10 J/mol/A^2/g/mol? It would be more LAMMPSific if the units changed with unit_style.

The dynamical matrix only uses these units in the ESKM style. The square root of the eigenvalues divided by 2pi gives you THz with said units. Also, ESKM -- an in-house lattice dynamics code -- uses dynamical matrices with said units.
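As a quick unit check: an eigenvalue lambda in these units carries 10 J mol^-1 A^-2 (g/mol)^-1 = 10^24 s^-2, so

    \nu = \frac{\sqrt{\lambda}}{2\pi}, \qquad
    \sqrt{10^{24}\,\mathrm{s^{-2}}} = 10^{12}\,\mathrm{rad\,s^{-1}}
    \;\Rightarrow\; \nu\ \text{in THz}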

Sparseness could also be exploited in the computation, by applying a distance check, or neighbor check, but it is not easy to ensure correctness for all potentials.

Great idea! Some brainstorming will definitely be required.

@martok

The command does have a few issues, some of which are easy to fix.

I just learned how to program this year, so there are bound to be some issues. I appreciate your time and patience to improve the command.

Best,
Charlie
