Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Molecular dynamics job #134

Merged
merged 45 commits into from Jul 15, 2022
Merged

Molecular dynamics job #134

merged 45 commits into from Jul 15, 2022

Conversation

mjwen
Copy link
Member

@mjwen mjwen commented May 5, 2022

Add basic MD job, based on

  • MPMDSet in pymatgen
  • MDFlow in atomate1

Copy link
Member

@utf utf left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey @mjwen, thanks for the PR. I realise this is a WIP but here are some initial comments.

``{"my_file:txt": "contents of the file"}``.
"""

# TODO expose basic setting of INCAR parameters to user
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please don't put these controls in the job maker, instead put them in the input set generator to be consistent with the other input sets.

Also, I'd advise against exposing the basic incar settings as those are easily overriden using user_incar_settings. Instead you could add a several "presets", specified by a string, e.g., "nvt", "npt", "long" or whatever you think is useful

# use VASP default ENCUT
update["ENCUT"] = None

if defaults["ISPIN"] == 1:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure what defaults is here?

Instead of this, you might want to set auto_ispin: bool = True in this dataclass. See the docstring for that setting here:

And this is an example of where I've turned it on:

auto_ispin: bool = True


# generate transmuter job
job = MDMaker(
transformations=["SupercellTransformation"],
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess you don't need these options?

@mjwen
Copy link
Member Author

mjwen commented May 5, 2022

thanks for the suggestions @utf ! The npt, nvt, ... are great suggestions; will definitely do it.

I've got another questions for you. For MD run, I believe a user would want all ionic steps being parsed and stored.

  • seems TaskDocument.from_directory() will only parse the last step. What's a good way to processed? Modifying TaskDocument or creating a new class (probably subclassing) like TaskDocument?
  • All the ionic steps are potentially a large amount of data, so should we store them in the extra output store by default?

@utf
Copy link
Member

utf commented May 5, 2022

I think the full ionic information is parsed. For each ionic step you get the energy, forces, and structure - do you need anything else? E.g., it is here:

ionic_steps: List[Dict[str, Any]] = Field(

You're right that this will likely be huge for an MD simulation. One option would be to parse a pymatgen Trajectory object instead and store this in the data store (this shouldn't be enabled by default though).

My suggestion would be:

  1. Add a store_trajectory option to Calculation.from_vasp_files which parses the Trajectory, adds it to the list of vasp_objects and sets ionic_steps in the CalculationOutput to None.
  2. Note that atomate2 already includes an enum for trajectories
  3. Add the pymatgen Trajectory class to the list of objects that are automatically parsed into the data store in all vasp jobs:
    _DATA_OBJECTS = [
  4. In the MDMaker, override the make function and enable the store_trajectory option. I did a similar thing to enable storing band structures here:
    if "parse_dos" not in self.task_document_kwargs:
    # parse DOS only for uniform band structure
    self.task_document_kwargs["parse_dos"] = mode == "uniform"
    if "parse_bandstructure" not in self.task_document_kwargs:
    self.task_document_kwargs["parse_bandstructure"] = mode
    # copy previous inputs
    if "additional_vasp_files" not in self.copy_vasp_kwargs:
    self.copy_vasp_kwargs["additional_vasp_files"] = ("CHGCAR",)
    return super().make.original(self, structure, prev_vasp_dir)
    The original part of return super().make.original(self, structure, prev_vasp_dir) is essential otherwise the overriden function will return a job rather than actually running vasp.

Hope that makes sense, I can clarify further if not.

@utf
Copy link
Member

utf commented May 5, 2022

Also, did you see this page in the documentation on how to write VASP tests?

https://materialsproject.github.io/atomate2/dev/vasp_tests.html

@mjwen
Copy link
Member Author

mjwen commented May 6, 2022

I think the full ionic information is parsed. For each ionic step you get the energy, forces, and structure - do you need anything else?

I did a quick run of the MD job but did not find the ionic_steps in the output. After a closer look, it seems custodian changes IBRION=0 to IBRION=1 and then somehow the ionic_steps are now shown (not sure whether this is the reason). Anyway, can you point me to the place to config the behavior of custodian?

Besides energy, forces, and structure at each ionic step, I may want other properties, like stress (if ISIF gives it) and partial charges (via bader analysis?) in the long run.

I agree that pymatgen Trajectory may be a good option to store the data. I think I get the idea of how to make it happen and will let you know if I need more help.

@utf
Copy link
Member

utf commented May 6, 2022

I did a quick run of the MD job but did not find the ionic_steps in the output. After a closer look, it seems custodian changes IBRION=0 to IBRION=1 and then somehow the ionic_steps are now shown (not sure whether this is the reason). Anyway, can you point me to the place to config the behavior of custodian?

There isn't any logic in the task document for storing ionic_steps depending on the value of IBRION. It should always be stored. Maybe you weren't looking in the right place. They should be in TaskDocument.calcs_reversed[0].output.ionic_steps.

And I've no idea about the custodian changing IBRION! You may have to look in the source code or ask on the issues page.

@utf
Copy link
Member

utf commented May 6, 2022

Besides energy, forces, and structure at each ionic step, I may want other properties, like stress (if ISIF gives it) and partial charges (via bader analysis?) in the long run.

Good point! I double checked and these are the keys that are stored for each ionic step:

  • e_fr_energy
  • e_wo_entrp
  • e_0_energy
  • forces
  • stress
  • electronic_steps
  • structure

So that should cover most things.

Bader might be a little more difficult because you'd have to continuously run Bader as the calculation is progressing as the CHGCAR at the end only has the data for the last step.

@utf
Copy link
Member

utf commented May 6, 2022

To make this clearer, I added IonicStep and ElectronicStep schemas in abecd04 and c3c5a0b.

@mjwen
Copy link
Member Author

mjwen commented May 6, 2022

Bader might be a little more difficult because you'd have to continuously run Bader as the calculation is progressing as the CHGCAR at the end only has the data for the last step.

Yeah, it seems a second run is needed to get Bader for each ionic step. For now, I do not need it and let's leave it for a second.

To make this clearer, I added IonicStep and ElectronicStep schemas in abecd04 and c3c5a0b.

Nice, this is great!

Now that we have these nice schemas, instead of using pymartgen Trajectory, I think maybe we want to create a atomate2 schema for trajectory. Simply,

class Trajectory(BaseModel):

    ionic_steps: List[IonicStep] = Field(None, description="...") 

will do it.

This will make it cleaner that we simply store ionic_steps in a different store than using a pymatgen Trajectory object. Also, the pymatgen Trajectory has lots of additional stuff and may not be the cleanest object to store the data.

@utf
Copy link
Member

utf commented May 10, 2022

Unfortunately, creating a Trajectory BaseModel won't work. This is because only MSONable classes can be stored in the data store. All pydantic models will get converted to a dict (with no additional metadata) so it is impossible for jobflow to recognise them and store them in the data store.

@codecov
Copy link

codecov bot commented May 10, 2022

Codecov Report

Merging #134 (5c3d39c) into main (6047862) will increase coverage by 0.51%.
The diff coverage is 91.93%.

@@            Coverage Diff             @@
##             main     #134      +/-   ##
==========================================
+ Coverage   72.99%   73.51%   +0.51%     
==========================================
  Files          49       49              
  Lines        4229     4349     +120     
  Branches      672      686      +14     
==========================================
+ Hits         3087     3197     +110     
- Misses        973      980       +7     
- Partials      169      172       +3     
Impacted Files Coverage Δ
src/atomate2/vasp/schemas/calc_types/_generate.py 0.00% <ø> (ø)
src/atomate2/vasp/sets/core.py 83.22% <77.77%> (-1.10%) ⬇️
src/atomate2/vasp/schemas/calculation.py 85.66% <80.00%> (-0.30%) ⬇️
src/atomate2/vasp/schemas/task.py 90.45% <83.33%> (-0.42%) ⬇️
src/atomate2/vasp/jobs/base.py 97.82% <100.00%> (+0.04%) ⬆️
src/atomate2/vasp/jobs/core.py 87.93% <100.00%> (+0.77%) ⬆️
src/atomate2/vasp/schemas/calc_types/enums.py 100.00% <100.00%> (ø)
src/atomate2/vasp/schemas/calc_types/utils.py 71.18% <100.00%> (+1.01%) ⬆️

@mjwen
Copy link
Member Author

mjwen commented May 10, 2022

I've come up with a way to deal with this. But before proceeding, I'd like to hear what you think.

I've created a Trajectory BaseModel, and added it as a field trajectory of CalculationOutput. Its purpose is to replace ionic_steps in CalculationOutput. By default, CalculationOutput.trajectory is None and everything is the same as before. If instead we set store_trajectory = True in CalculationOutput.from_vasp_outputs(), ionic_steps will be set to None, and trajectory will be set to store the ionic steps. In addition, the trajectory will be stored in additional data store because trajectory is added to _DATA_OBJECTS_.

This is basically what we've discussed before, with the only difference being that we use a Trajectory BaseModel instead of a pmg Trajectory class. The good side is that we store the same ionic_steps, just in different places. The ugly side is that now we have ionic_steps and trajectory in CalculationOutput for the same info. If you think it is OK, how about not using trajectory at all and let ionic_steps be stored in the additional store by default? Any downside of doing this?

This is the major issue for now; and of course there are other TODOs and the tests I need to do before you want to review it.

@mkhorton
Copy link
Member

This will make it cleaner that we simply store ionic_steps in a different store than using a pymatgen Trajectory object. Also, the pymatgen Trajectory has lots of additional stuff and may materialsproject/pymatgen#2515 to store the data.

Just adding a note to say I'm keeping an eye on the pymatgen PR -- I am absolutely ok with merging in edits to this to make it cleaner or more useful for your workflows. As far as I know, current uses of this class are minimal (we do use it in the MP API to return the Trajectory associated with a geometry optimization, but nothing downstream uses this and it is constructed on-the-fly from the relevant task doc, so changes would be ok).

@mjwen
Copy link
Member Author

mjwen commented May 18, 2022

@utf what's your thoughts on this?

@utf
Copy link
Member

utf commented Jun 21, 2022

Hi @mjwen, Sorry for the delay on this. I think I like the Trajectory class because then it can get stored in VASP objects, which is a obvious place to find the large objects. Is there a specific reason why you don't think the Trajectory class would be appropriate?

@mjwen
Copy link
Member Author

mjwen commented Jun 21, 2022

Two reasons I did not want to do it:

  • The current pymatgen Trajectory class stores info and has methods that we do not need, and some other stuff we need are missing. For backward compatibility, I am not sure how much we can change it. Given Matt's comment, I believe this is not a big issue now.
  • If we want to make it to serve our purpose---storing all ionic steps info, we need to replicate a lot of schema already existing in atomate2. More specifically, we need to implement/copy IonicStep and its fields (e.g. ElectronicStep) to pymatgen. These schemas will all become attributes of the pymatgen Trajectory class. First, it feels redundant, and second second it may not be easy to maintain: each time we make changes we'll have to change both codes.

But if we want it this way, we can definite do it---there is not any technical issue. And the benefit is obvious: the CalculationOutput model will be cleaner; it does not need to have a trajectory field any more.

@mjwen mjwen changed the title [WIP] add molecular dynamics job Molecular dynamics job Jul 4, 2022
@mjwen
Copy link
Member Author

mjwen commented Jul 4, 2022

@utf Ready for review.

In addition to adding the MDSetGenerator and MDMaker, major addition/changes include:

  • a pymatgen Trajectory is used to store ionic steps information for an MD job by default; no change for other jobs
  • to determine the status of an MD job, check whether the number of ionic steps is equal to NSW in INCAR, instead of checking the convergence of the last electronic and ionic steps
  • do not check max_force for an MD job

Copy link
Member

@utf utf left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @mjwen, thanks very much for this PR and for your work on the Trajectory!

Here are some minor comments.

src/atomate2/vasp/schemas/calculation.py Outdated Show resolved Hide resolved
@@ -187,27 +190,47 @@ class OutputSummary(BaseModel):
)

@classmethod
def from_vasp_calc_doc(cls, calc_doc: Calculation) -> "OutputSummary":
def from_vasp_calc_doc_and_trajectory(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you leave this name as it was before?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done.

"""
Create a summary of VASP calculation outputs from a VASP calculation document.
Create a summary of VASP calculation outputs from a VASP calculation document
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The first part of a docstring should fit on one line. I think you can remove the part about trajectory. It is obvious from the rest of the docstring!

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done.

# LREAL: True -> AUTO
updates.update(
{
"ENCUT": None, # None to use VASP default
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this a good idea? Seems like it could result in very low ENCUTS. Was this taken from the old workflow?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it is taken from pmg MPMDSet.

I agree with you that this may not be good. The MVLNPTMDSet uses a slightly larger value, 1.5*ENCUT, but still may not be ideal since the ENCUT changes from material to material. I prefer to set it to a const value, but smaller than that in the BaseVaspSet.yaml, like 520.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I think 520 is good.

"NSW": self.nsteps,
"POTIM": self.time_step,
"ISPIN": 2 if self.spin_polarized else 1,
"EDIFF_PER_ATOM": 0.00001,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We have moved away from EDIFF_PER_ATOM in atomate2. It is reasonable to set a lower EDIFF here than the default but ultimately, I worry EDIFF_PER_ATOM could result in very large EDIFFS for large supercells.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

agreed. EDIFF_PER_ATOM could be really bad for large cells, especially in MD. Removed it. I believe the default EDIFF=1e-5 in BaseVaspSet.yaml is good enough for MD runs.

"KBLOCK": 100,
"PREC": "Normal",
"LDAU": False,
"ADDGRID": True,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We have also removed ADDGRID from most VASP calculations based on the VASP developers advice.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done.

Copy link
Member

@utf utf left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @mjwen, this is nearly there I think. I just spotted a few more edits.

# LREAL: True -> AUTO
updates.update(
{
"ENCUT": None, # None to use VASP default
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I think 520 is good.

end_temp: float = 300
nsteps: int = 1000
time_step: int = 2
spin_polarized: bool = False
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you remove this parameter and set auto_ispin: bool = True. This means, if you're starting from a previous calculation, and that one has no magnetic moments, then ISPIN will be disabled automatically.

If you aren't starting from a previous calculation it is quite dangerous to set spin polarised to False by default.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

"TEEND": self.end_temp,
"NSW": self.nsteps,
"POTIM": self.time_step,
"ISPIN": 2 if self.spin_polarized else 1,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove this and see note about auto_ispin above.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

"BMIX": 1,
"MAXMIX": 20,
"NELM": 500,
"NSIM": 4,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the default so can be removed. Same with a few others. So you can remove:

  • NSIM
  • LSCALU
  • BMIX
  • NBLOCK

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done. I also removed LREAL=AUTO and ISMEAR=0. For LREAL, the default setting in the yaml files have AUTO. For ISMEAR, I believer it is reasonable to follow the default yaml files.

"NBLOCK": 1,
"KBLOCK": 100,
"PREC": "Normal",
"LDAU": False,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is DFT+U disabled? Shouldn't we just use the default U values from pymatgen? In that case just remove this tag and they will be applied automatically!

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From a code hygiene perspective, @munrojm and I were discussing recently that there's a big advantage to having all calculations map cleanly to a pymatgen input set, since this greatly simplifies the task validation logic. If it's at all possible to inherit the settings from a specific input set, rather than copying manually, I think this would be very useful.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mkhorton Just to make sure I understand, by simplifies the task validation logic, are you referring to the emmet building process or something else? And can you expand a bit on this...

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, @mkhorton could you clarify what you mean by "inherit the settings from a specific input set, rather than copying manually"?

@davidwaroquiers
Copy link
Contributor

Hello @mjwen and @utf , this is really nice work. Just my 2 cents, discussing with @gpetretto, one thing that may often happen in a MD job is that you need to continue it (you most probably cannot do e.g. 50000 steps in the timelimit). From what we have seen, it is not clear whether the velocities are passed down when you restart. Actually, we think that they are not and the MD job would restart with random or 0 velocities. What do you think ?
Thanks,
Best

@mkhorton
Copy link
Member

This is a timely comment @davidwaroquiers; I'm actually at a workshop on checkpoint/restart today... I think we're at a point where we need to establish an ideal pattern in atomate2 for how best handle checkpointed states and/or continuation jobs, @utf and I have had some offline discussions about this. Perhaps better to discuss in a separate issue?

I haven't reviewed this PR in detail, does it already have logic for continuation jobs included?

@mjwen
Copy link
Member Author

mjwen commented Jul 13, 2022

@mkhorton No, this PR does not include any logic to continue a job. I vaguely remember you guys were discussing the need to continue a job; @jmmshn was also participating? It would be great to have a specific issue where we can discuss this in greater detail.

@mjwen
Copy link
Member Author

mjwen commented Jul 14, 2022

@utf Besides removing some default params, I've also removed nph ensemble as an option, which is not that common and I don't think MP folks have a good sense on what good parameters should be for it.

@utf
Copy link
Member

utf commented Jul 14, 2022

Hi @mkhorton @davidwaroquiers @mjwen. Agreed that we need a proper discussion about restarting. This is something @jmmshn has opinions about too.

I have some ideas for how we could do it but each with their own tradeoffs. I'd be interested to hear if there were any good ideas from your workshop @mkhorton.

I will create a separate issue to discuss this further.

"POTIM": self.time_step,
"LCHARG": False,
"LPLANE": False,
"LWAVE": True,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this write the wavecar after each step? Could this be very expensive due to IO time of writing supercell wave functions?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed. Also removed LPLANE, which has nothing to do with an MD run and depends on the type of machine.

@davidwaroquiers
Copy link
Contributor

Hi @mkhorton @davidwaroquiers @mjwen. Agreed that we need a proper discussion about restarting. This is something @jmmshn has opinions about too.

I have some ideas for how we could do it but each with their own tradeoffs. I'd be interested to hear if there were any good ideas from your workshop @mkhorton.

I will create a separate issue to discuss this further.

Hi @utf, for abinit (PR #81 ), we have restart capabilities so we would indeed be very interested to discuss this and provide some of our ideas.

@utf
Copy link
Member

utf commented Jul 15, 2022

Great, thanks so much for this @mjwen!

@utf utf merged commit 73b9b3e into materialsproject:main Jul 15, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants