
Optimized Dynamical Matrix command #1314

Merged

14 commits merged into lammps:master on Feb 28, 2019

Conversation

charlessievers
Collaborator

@charlessievers charlessievers commented Feb 2, 2019

Purpose

Important Note by @akohlmey: The third_order command has been removed from this PR and will be added in a later PR, since it still needs work, while dynamical_matrix seems to work as expected, is valgrind clean and ready to be merged (and a nice complement to fix phonon).

I added two commands to LAMMPS: one that calculates the dynamical matrix and one that calculates the third-order equivalent of the dynamical matrix (excluding the division by mass). These commands will be useful to anyone who wants to calculate these matrices using the numerous force fields implemented in LAMMPS.

closes #1304

Author(s)

Charlie Sievers
UC Davis PhD Candidate
Donadio Lab

Backward Compatibility

Backwards compatible up to two years.

Implementation Notes

Implemented for make mpi.

The matrices are calculated using a finite difference method, and the computation has been MPI-parallelized.
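For reference, a sketch of the central-difference expression such an approach typically evaluates (standard lattice-dynamics convention, not quoted verbatim from this PR): each atom i is displaced by ±δ along each Cartesian direction α, and the resulting change in the force on every atom j is mass-weighted,

D_{i\alpha,j\beta} = -\,\frac{F_{j\beta}(R_{i\alpha}+\delta) - F_{j\beta}(R_{i\alpha}-\delta)}{2\,\delta\,\sqrt{m_i m_j}}

The third-order command would evaluate the analogous higher-order force differences, just without the mass weighting.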

Post Submission Checklist

  • The feature or features in this pull request is complete
  • Suitable new documentation files and/or updates to the existing docs are included
  • One or more example input decks are included
  • The source code follows the LAMMPS formatting guidelines

Further Information, Files, and Links

Within the directory lammps/examples/USER/phonon/ there are examples and manuals which describe how to use the implemented commands.

Information on dynamical matrices: http://faculty.virginia.edu/esfarjani/UVA/Teaching_files/phonons.pdf

@charlessievers charlessievers changed the title from "Optimized (but not working) Dynamical Matrix command)" to "Optimized Dynamical Matrix command" on Feb 2, 2019
@charlessievers
Collaborator Author

charlessievers commented Feb 2, 2019

I guess I am too stubborn. Is there a flag to tell if an atom is a ghost or local? I am double counting atoms when I take into consideration the ghosts for local_idx.

@akohlmey
Member

akohlmey commented Feb 2, 2019 via email

@charlessievers
Collaborator Author

An atom is local if its index is smaller than atom->nlocal.

Great, that is what I did on my outdated code. I am glad to know that it is always true.

@charlessievers
Collaborator Author

Alright, I made it work. I will admit that it is better in every possible way. I still need to implement the group functionality, rewrite the third order code, and update the examples. Thank you for pushing me and improving my product.

Member

@akohlmey akohlmey left a comment

I've looked over this version of the code and made some comments and suggestions for changes.

double **f = atom->f;

//initialize dynmat to all zeros
for (int i=0; i < dynlen; i++)
Member

dynlen is of type bigint, so i and j here should be bigint as well.
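I.e., something along these lines (illustrative only):

for (bigint i = 0; i < dynlen; i++)
  for (bigint j = 0; j < dynlen; j++)
    dynmat[i][j] = 0.0;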


if (comm->me == 0 && screen) fprintf(screen,"Calculating Dynamical Matrix...\n");

for (int i=1; i<=natoms; i++){
Member

natoms is of type bigint, so i here should be bigint as well.

for (int j=0; j < dynlen; j++)
dynmat[i][j] = 0.;

energy_force(0);
Member

why call energy_force() here?

Collaborator Author

Great question. I believe that it was there for testing purposes initially. In any case, it is outdated.


// compute all forces
force_clear();
external_force_clear = 0;
Member

The steps below are essentially what is in energy_force(), are they not?
I would measure the duration of the force computation, as that can be used as an estimate for how long it will take to complete the command (wall time of one force computation * 6 * natoms, i.e. two displacements along each of the three directions for every atom, if I am not mistaken).
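A minimal sketch of such an estimate, assuming a central difference needs two force calls per degree of freedom (i.e. 6*natoms force evaluations in total); the variable names and message wording are illustrative:

double t0 = MPI_Wtime();
energy_force(0);                          // one full force evaluation
double t_force = MPI_Wtime() - t0;
// two displacements (+del and -del) along x, y, and z for every atom
double estimate = t_force * 6.0 * atom->natoms;
if (comm->me == 0 && screen)
  fprintf(screen,"Estimated time to compute the dynamical matrix: %g seconds\n",estimate);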

}


memory->create(final_dynmat,int(dynlen),int(dynlen),"dynamic_matrix_buffer:buf");
Member

This limits dynlen to 32-bit integers. Now, it is going to be quite slow to compute the dynamical matrix for a very large system anyway, but there should be a test at the beginning checking that dynlen isn't larger than MAXSMALLINT. Then dynlen could also be made an int, and the loop indices changed correspondingly.
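A sketch of such a check (the error message wording is illustrative), placed before any allocation that uses int-sized dimensions:

if (dynlen > MAXSMALLINT)
  error->all(FLERR,"Dynamical matrix is too large to be stored");
int n = (int) dynlen;          // safe after the check above
memory->create(final_dynmat,n,n,"dynamic_matrix_buffer:buf");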

int scaleflag;
int me;
bigint dynlen;
double **dynmat;
Member

dynmat does not have to be a class member, and it does not have to be a full matrix, but only a 3 x dynlen matrix, since the full matrix is constructed 3 rows at a time.

Collaborator Author

@charlessievers charlessievers Feb 3, 2019

Right. I will move it into local scope.

neighbor->delay = 1;
neighbor->ago = 0;
neighbor->ndanger = 0;

Member

Here is an idea for how you can add group support (a rough sketch follows the list):

  • count the number of atoms in the group and store it as a class member. That number replaces atom->natoms to determine the per-atom loop count in the computation (nelem)
  • allocate a vector idxmap[nelem] and a vector localmap[atom->nmax] on every MPI rank and zero them
  • on each MPI rank do a loop from 0 to atom->nlocal (exclusive) and, if the loop index i is a member of the group, add atom->tag[i] to localmap[]
  • on MPI rank zero, copy the non-zero elements of localmap[] to idxmap[]
  • now loop over all MPI ranks and have them send their localmap to rank 0, which appends the non-zero elements to idxmap[]
  • now sort idxmap[] (if desired) and broadcast it to all MPI ranks. Use the content of idxmap instead of the loop from 1 to atom->natoms (inclusive) for the loops in the dynamical matrix calculation.
  • when outputting the dynamical matrix, also provide this list, so that the matrix elements can be identified by atom id and direction.
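A rough sketch of that recipe (names like localmap and idxmap are illustrative; the usual LAMMPS command-class members atom, comm, world, and groupbit are assumed, and for brevity the list is gathered on all ranks with MPI_Allgatherv instead of gathering on rank 0 and broadcasting):

#include <vector>
#include <algorithm>

// 1. collect the tags of the group atoms owned by this rank
std::vector<tagint> localmap;
for (int i = 0; i < atom->nlocal; i++)
  if (atom->mask[i] & groupbit) localmap.push_back(atom->tag[i]);

// 2. gather the per-rank counts and tags so every rank sees the full list
int nmine = (int) localmap.size();
std::vector<int> counts(comm->nprocs), displs(comm->nprocs,0);
MPI_Allgather(&nmine,1,MPI_INT,counts.data(),1,MPI_INT,world);
for (int p = 1; p < comm->nprocs; p++) displs[p] = displs[p-1] + counts[p-1];
int nelem = displs[comm->nprocs-1] + counts[comm->nprocs-1];

std::vector<tagint> idxmap(nelem);
MPI_Allgatherv(localmap.data(),nmine,MPI_LMP_TAGINT,
               idxmap.data(),counts.data(),displs.data(),MPI_LMP_TAGINT,world);

// 3. sort so all ranks loop over the group atoms in the same (tag) order
std::sort(idxmap.begin(),idxmap.end());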

Collaborator Author

I believe I already implemented exactly this solution.

Member

Nope. My suggestion will reduce the amount of memory used; yours will always require storage of dimension natoms.

return negative gradient for nextra_global dof in fextra
------------------------------------------------------------------------- */

void DynamicalMatrix::energy_force(int resetflag)
Member

calling this function energy_force() is a misnomer, since you are setting eflag and vflag to zero, and thus do not tally energy or virial contributions.

Collaborator Author

I just wanted to make it look similar to other class methods. Would you have me change other classes?

Member

Since this is an "internal" function, its name only has to make sense within this class.

if (local_idx >= 0){
for (int alpha=0; alpha<3; alpha++){
displace_atom(local_idx, alpha, 1);
energy_force(0);
Member

energy_force() must be called by all MPI ranks, not only those owning the displaced atom.

Collaborator Author

I have fallen for this trap enough times. I have fixed this.

dynmat[(i-1)*3+alpha][(j-1)*3+beta] *= conversion;
}
}
}
Member

If dynmat is changed to be a (local) 3 x dynlen matrix, then it must be reduced here and then added to the global dynamical matrix. This would reduce the memory consumption for large systems significantly.

Collaborator Author

I see. Memory consumption > MPI overhead.

Member

Yes. You can reduce memory use even more by outputting the dynamical matrix here directly. In LAMMPS we try to distribute data as much as possible, and thus we try to avoid allocations of size atom->natoms (but that is not always possible). atom->natoms with a constant factor is similar, but atom->natoms * atom->natoms is a much bigger problem, as memory use now grows quadratically with the number of atoms and cannot be reduced by parallelization.
...and if quadratic size cannot be avoided, then at least the full matrix should exist only on one MPI rank, i.e. the one writing the file (use MPI_Reduce() instead of MPI_Allreduce()).
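A minimal sketch of that pattern, assuming dynmat has been shrunk to the 3 x dynlen block for the currently displaced atom and that dynlen fits into an int (write_rows() is a hypothetical output helper, not part of this PR):

double **final_rows = nullptr;
if (comm->me == 0) memory->create(final_rows,3,(int)dynlen,"dynmat:final_rows");
// combine the per-rank partial force differences on rank 0 only
MPI_Reduce(dynmat[0], comm->me == 0 ? final_rows[0] : nullptr,
           3*(int)dynlen, MPI_DOUBLE, MPI_SUM, 0, world);
if (comm->me == 0) {
  write_rows(final_rows);      // write these three rows to the file, then move on
  memory->destroy(final_rows);
}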

The whole matrix would really only be needed if you include some postprocessing, and even then, with additional effort, that could be distributed.

But then again, any system large enough for this to matter would probably take so long to compute that nobody would want to do it in the first place. Hence my suggestion to run the force computation once, record the time one force computation takes, and then write out an estimate of the total time it would take to compute the whole matrix.

Collaborator Author

I should definitely do MPI_Reduce instead of all reduce. I wrote that back when I was learning MPI...

But then again, any system large enough for this to matter would probably take so long to compute that nobody would want to do it in the first place. Hence my suggestion to run the force computation once, record the time one force computation takes, and then write out an estimate of the total time it would take to compute the whole matrix.

Should I make a check_comp_time method?

Member

That seems a bit overkill to me. I would just leave the one call to the force computation in the setup() method (that seems the best place for it) and measure its time.

You could then use the following code (adapted from src/run.cpp):

timer->init();
timer->barrier_start();
calculateMatrix();
timer->barrier_stop();

If you keep the timer calls in the force routine, as they were in verlet.cpp or similar, then you have a nice breakdown of how the time is spent on the different force computations, in the summary output from the Finish class.

The call in the setup function could in the future also be used to get per-MPI-rank timer information for load balancing, although that is something that can be done outside the command using a run 0 and a balance command. There are lots of additional 'gimmicks' that can be added later: a timeout (for when running with batch systems), a restart feature (for very large systems), etc.

@akohlmey
Member

akohlmey commented Feb 3, 2019

Alright, I made it work. I will admit that it is better in every possible way. I still need to implement the group functionality, rewrite the third order code, and update the examples. Thank you for pushing me and improving my product.

I understand how you feel. I have been in the same situation many times myself, and it took quite a while to manage the resulting frustration. Later I learned to appreciate that somebody saying "no" to my contributions doesn't mean "no" to me, but that there is value in the rejection because it leads to an overall better result. In an open-source effort like LAMMPS, it is particularly difficult to say "no" to a submission (unless it is complete and obvious garbage), since the well-being of the project depends on contributions, and there should be a reward for people giving their changes/additions back to the project (instead of keeping them to themselves).

//find number of local atoms in the group (final_gid)
for (int i=1; i<=natoms; i++){
local_idx = atom->map(i);
if (mask[local_idx] & groupbit && local_idx < nlocal)
Member

This does not account for local_idx < 0 for atoms that are not owned by this rank, and will thus cause illegal memory access. Try:

if ((local_idx >= 0) && (local_idx < nlocal) && mask[local_idx] & groupbit) 


//combine subgroup maps into total temporary groupmap
MPI_Allgatherv(sub_groupmap,gid,MPI_INT,temp_groupmap,recv,displs,MPI_INT,world);
std::sort(temp_groupmap,temp_groupmap+group->count(igroup));
Member

You have to add #include <algorithm> if you want to use std::sort(). As an alternative, you could also use our internal C-style sort from the mergesort.h header.
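For example, near the other includes at the top of the source file (presumably dynamical_matrix.cpp):

#include <algorithm>    // for std::sort()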

double conv_distance;
double conv_mass;
double del;
int igroup,groupbit;
Member

You should also store the result of group->count(igroup) here, since the count function incurs an MPI collective operation.
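E.g. something like this, where gcount is an illustrative name:

bigint gcount;                 // cached result of group->count(igroup)
// ... set once, e.g. during setup():
gcount = group->count(igroup);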

@akohlmey
Member

akohlmey commented Feb 7, 2019

@charlessievers please put a note here on GitHub when you would like me to do the next pass of reviewing your contribution. This would then also include compiling the code and running the provided examples with an instrumented binary and/or using valgrind's memcheck tool.

@charlessievers
Collaborator Author

@akohlmey that would be great. I am currently making three more examples. I should be done with them by tomorrow.

@akohlmey akohlmey dismissed their stale review February 28, 2019 19:47

requested changes are included

@akohlmey
Member

@charlessievers I looked over your latest changes and made some corrections, and it now appears that the dynamical_matrix command is ready to be merged, but the third_order command is not. I converted the docs for the former and integrated them properly into the manual, but didn't do it for third_order (those should be merged into the file added for dynamical_matrix). I fixed a major issue with third_order causing segmentation faults, but the output between serial and parallel runs differs a lot, and I currently cannot tell whether this is to be expected or not. For dynamical_matrix, one can see that there are only minor numerical differences, within the scale of what is expected from finite differences and from summing up forces in a different order.

Please let me know ASAP how you want to proceed. I am planning to make a patch release later today, and this could either be included now or would have to wait (possibly 3-4 more weeks) until the next patch, when it is completely functional.

@charlessievers
Collaborator Author

I would like to merge the dynamical matrix command. I will keep working on the third order command. How would you like me to proceed?

@akohlmey
Member

I would like to merge the dynamical matrix command. I will keep working on the third order command. How would you like me to proceed?

I'll just delete the third order code from this branch and ask other LAMMPS developers to have a look and approve the merge. I will also delete the examples using ASE (for now). They should be accompanied by instructions about how to get and set up ASE and how to run those inputs. I know a little bit about this, so I know how to deal with it, but other LAMMPS users will be confused. I am not opposed to it (in fact, we would very much welcome a small tutorial addition to the LAMMPS manual describing how to use LAMMPS through ASE), but we want to avoid including anything that is not properly documented.

After the merge (and patch release) you can then create a new branch, re-add the content that I removed, and submit a new pull request, and we will keep working on that until it is ready to be merged.

@charlessievers
Collaborator Author


Perfect, will do!

@akohlmey
Member

Perfect, will do!

Great. Please make sure you pull the branch now, so that you also have my changes and bugfixes for third_order and the documentation integration.

@charlessievers
Collaborator Author

Great. Please make sure you pull the branch now, so that you also have my changes and bugfixes for third_order and the documentation integration.

Just pulled.

@akohlmey
Member


One more thing: can you give me a "permanent" e-mail address that I can add to the README, in case people (or the LAMMPS developers) need to contact you about your contribution?
Institutional e-mails often expire after people graduate or find a new job, and then it can be difficult to get hold of people.

thanks,
axel.

@charlessievers
Collaborator Author


charliesievers@cox.net

Best,
Charlie

@akohlmey akohlmey merged commit 14e6c12 into lammps:master Feb 28, 2019
Contributor

@athomps athomps left a comment

It looks like this has already been merged by @akohlmey but I have a couple of comments anyway.

  1. Well done @charlessievers! This is a great addition, something that has been talked about for years.
  2. What about providing an option to report instead the Hessian d2U/dR_j^2 in units of energy/distance^2?
  3. It would be better if you could avoid hard-coding your own unit conversion factors and instead use built-in LAMMPS constants, e.g. force->mvv2e.
  4. Why is the dynamical matrix always in units of 10 J/mol/A^2/(g/mol)? It would be more LAMMPSific if the units changed with unit_style.

@akohlmey
Member

akohlmey commented Mar 1, 2019 via email

@athomps
Contributor

athomps commented Mar 1, 2019

I just looked at this again. I see that it uses O(N) memory by writing one row at a time. This is very good. Also, because it does not use a neighbor list, it should work for any potential, including things like EAM, kspace, maybe even QEQ, all of which are non-trivial. Well done @charlessievers and @akohlmey

@athomps
Contributor

athomps commented Mar 1, 2019

For systems that are bigger than the interaction cutoff, the number of non-zero entries in each row is roughly constant, i.e. the matrix sparsity scales as ~1/L^3. This could be handled optionally by only writing the non-zero entries in each row, preceded by a list of column indices for that row. Sparseness could also be exploited in the computation, by applying a distance check, or neighbor check, but it is not easy to ensure correctness for all potentials.
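A sketch of that optional sparse-row output (row, fp, and the exact file format are illustrative, not taken from this PR): for each row, write the number of non-zeros, then the column indices, then the values.

#include <cstdio>
#include <vector>

std::vector<bigint> cols;
std::vector<double> vals;
for (bigint j = 0; j < dynlen; j++)
  if (row[j] != 0.0) { cols.push_back(j); vals.push_back(row[j]); }

fprintf(fp,"%d",(int) cols.size());
for (std::size_t k = 0; k < cols.size(); k++) fprintf(fp," " BIGINT_FORMAT,cols[k]);
for (std::size_t k = 0; k < vals.size(); k++) fprintf(fp," %g",vals[k]);
fprintf(fp,"\n");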

@martok
Collaborator

martok commented Mar 6, 2019

Very good addition, and just in time for when I needed it 👍

The command does have a few issues, some of which are easy to fix. Should I open a PR, or would you prefer doing it all in one batch with the third_order part? @charlessievers @akohlmey

@akohlmey
Member

akohlmey commented Mar 6, 2019

@martok I generally prefer doing things in an incremental fashion. Besides, if you post a pull request, we can either merge it into the master branch directly or @charlessievers can merge it first into his development branch, depending on how much time he has available to test and integrate changes and on what seems easier to do. My time is going to be somewhat limited for the next couple of weeks due to other projects.

@martok
Collaborator

martok commented Mar 6, 2019

Makes sense!
Don't worry about merge time, I just want to put this somewhere public to avoid creating wildly different branches.

@martok martok mentioned this pull request Mar 6, 2019
@charlessievers
Collaborator Author

charlessievers commented Mar 6, 2019

Hello everyone,

Thank you for your comments and support. Sorry I have not been responding; I gave a departmental talk on thermal devices this week.

@athomps

What about providing an option to report instead the Hessian d2U/dR_j^2 in units of energy/distance^2?

A Hessian option is a great idea; I will go ahead and implement that when I get some time!
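For reference, the standard relation involved (textbook convention, not specific to this PR): the mass-weighted dynamical matrix and the Hessian of the potential energy differ only by a factor of 1/sqrt(m_i m_j),

D_{i\alpha,j\beta} = \frac{1}{\sqrt{m_i m_j}}\,\frac{\partial^2 U}{\partial R_{i\alpha}\,\partial R_{j\beta}}

so a Hessian option in units of energy/distance^2 would essentially skip the mass weighting.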

It would be better if you could avoid hard-coding your own unit conversion factors and instead use built-in LAMMPS constants, e.g. force->mvv2e.

Hard-coded unit conversions are only set in the ESKM style. I too would prefer to avoid them.

Why is the dynamical matrix always in units of 10 J/mol/A^2/(g/mol)? It would be more LAMMPSific if the units changed with unit_style.

The dynamical matrix only uses these units in the ESKM style. The square root of the eigenvalues divided by 2*pi gives you THz in said units. Also, ESKM -- an in-house lattice dynamics code -- uses dynamical matrices with these units.
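In other words, restating the sentence above as a formula, with λ an eigenvalue of the ESKM-style dynamical matrix in those units, the corresponding frequency is

\nu\,[\mathrm{THz}] = \frac{\sqrt{\lambda}}{2\pi}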

Sparseness could also be exploited in the computation, by applying a distance check, or neighbor check, but it is not easy to ensure correctness for all potentials.

Great idea! Some brainstorming will definitely be required.

@martok

The command does have a few issues, some of which are easy to fix.

I just learned how to program this year, so there are bound to be some issues. I appreciate your time and patience in improving the command.

Best,
Charlie
