Molecular dynamics job #134

mjwen · 2022-05-05T22:37:28Z

Add basic MD job, based on

MPMDSet in pymatgen
MDFlow in atomate1

utf

Hey @mjwen, thanks for the PR. I realise this is a WIP but here are some initial comments.

utf · 2022-05-05T23:00:12Z

src/atomate2/vasp/jobs/core.py

+        ``{"my_file:txt": "contents of the file"}``.
+    """
+
+    # TODO expose basic setting of INCAR parameters to user


Please don't put these controls in the job maker, instead put them in the input set generator to be consistent with the other input sets.

Also, I'd advise against exposing the basic incar settings as those are easily overriden using user_incar_settings. Instead you could add a several "presets", specified by a string, e.g., "nvt", "npt", "long" or whatever you think is useful

utf · 2022-05-05T23:02:38Z

src/atomate2/vasp/sets/core.py

+        # use VASP default ENCUT
+        update["ENCUT"] = None
+
+        if defaults["ISPIN"] == 1:


Not sure what defaults is here?

Instead of this, you might want to set auto_ispin: bool = True in this dataclass. See the docstring for that setting here:

atomate2/src/atomate2/vasp/sets/base.py

Line 257 in 6445c2b

auto_ispin

And this is an example of where I've turned it on:

atomate2/src/atomate2/vasp/sets/core.py

Line 528 in 6445c2b

auto_ispin: bool = True

utf · 2022-05-05T23:02:54Z

tests/vasp/jobs/test_core.py

+
+    # generate transmuter job
+    job = MDMaker(
+        transformations=["SupercellTransformation"],


I guess you don't need these options?

mjwen · 2022-05-05T23:25:00Z

thanks for the suggestions @utf ! The npt, nvt, ... are great suggestions; will definitely do it.

I've got another questions for you. For MD run, I believe a user would want all ionic steps being parsed and stored.

seems TaskDocument.from_directory() will only parse the last step. What's a good way to processed? Modifying TaskDocument or creating a new class (probably subclassing) like TaskDocument?
All the ionic steps are potentially a large amount of data, so should we store them in the extra output store by default?

utf · 2022-05-05T23:45:42Z

I think the full ionic information is parsed. For each ionic step you get the energy, forces, and structure - do you need anything else? E.g., it is here:

atomate2/src/atomate2/vasp/schemas/calculation.py

Line 334 in 39c0508

ionic_steps: List[Dict[str, Any]] = Field(

You're right that this will likely be huge for an MD simulation. One option would be to parse a pymatgen Trajectory object instead and store this in the data store (this shouldn't be enabled by default though).

My suggestion would be:

Add a store_trajectory option to Calculation.from_vasp_files which parses the Trajectory, adds it to the list of vasp_objects and sets ionic_steps in the CalculationOutput to None.
Note that atomate2 already includes an enum for trajectories
Add the pymatgen Trajectory class to the list of objects that are automatically parsed into the data store in all vasp jobs:

atomate2/src/atomate2/vasp/jobs/base.py

Line 28 in 39c0508

_DATA_OBJECTS = [

In the MDMaker, override the make function and enable the store_trajectory option. I did a similar thing to enable storing band structures here:

atomate2/src/atomate2/vasp/jobs/core.py

Lines 204 to 215 in 39c0508

    
           if "parse_dos" not in self.task_document_kwargs: 
        
               # parse DOS only for uniform band structure 
        
               self.task_document_kwargs["parse_dos"] = mode == "uniform" 
        
           if "parse_bandstructure" not in self.task_document_kwargs: 
        
               self.task_document_kwargs["parse_bandstructure"] = mode 
        
           # copy previous inputs 
        
           if "additional_vasp_files" not in self.copy_vasp_kwargs: 
        
               self.copy_vasp_kwargs["additional_vasp_files"] = ("CHGCAR",) 
        
           return super().make.original(self, structure, prev_vasp_dir)

The original part of return super().make.original(self, structure, prev_vasp_dir) is essential otherwise the overriden function will return a job rather than actually running vasp.

Hope that makes sense, I can clarify further if not.

utf · 2022-05-05T23:51:17Z

Also, did you see this page in the documentation on how to write VASP tests?

https://materialsproject.github.io/atomate2/dev/vasp_tests.html

mjwen · 2022-05-06T01:18:26Z

I think the full ionic information is parsed. For each ionic step you get the energy, forces, and structure - do you need anything else?

I did a quick run of the MD job but did not find the ionic_steps in the output. After a closer look, it seems custodian changes IBRION=0 to IBRION=1 and then somehow the ionic_steps are now shown (not sure whether this is the reason). Anyway, can you point me to the place to config the behavior of custodian?

Besides energy, forces, and structure at each ionic step, I may want other properties, like stress (if ISIF gives it) and partial charges (via bader analysis?) in the long run.

I agree that pymatgen Trajectory may be a good option to store the data. I think I get the idea of how to make it happen and will let you know if I need more help.

utf · 2022-05-06T14:44:38Z

I did a quick run of the MD job but did not find the ionic_steps in the output. After a closer look, it seems custodian changes IBRION=0 to IBRION=1 and then somehow the ionic_steps are now shown (not sure whether this is the reason). Anyway, can you point me to the place to config the behavior of custodian?

There isn't any logic in the task document for storing ionic_steps depending on the value of IBRION. It should always be stored. Maybe you weren't looking in the right place. They should be in TaskDocument.calcs_reversed[0].output.ionic_steps.

And I've no idea about the custodian changing IBRION! You may have to look in the source code or ask on the issues page.

utf · 2022-05-06T14:53:53Z

Besides energy, forces, and structure at each ionic step, I may want other properties, like stress (if ISIF gives it) and partial charges (via bader analysis?) in the long run.

Good point! I double checked and these are the keys that are stored for each ionic step:

e_fr_energy
e_wo_entrp
e_0_energy
forces
stress
electronic_steps
structure

So that should cover most things.

Bader might be a little more difficult because you'd have to continuously run Bader as the calculation is progressing as the CHGCAR at the end only has the data for the last step.

utf · 2022-05-06T15:36:42Z

To make this clearer, I added IonicStep and ElectronicStep schemas in abecd04 and c3c5a0b.

mjwen · 2022-05-06T18:07:02Z

Bader might be a little more difficult because you'd have to continuously run Bader as the calculation is progressing as the CHGCAR at the end only has the data for the last step.

Yeah, it seems a second run is needed to get Bader for each ionic step. For now, I do not need it and let's leave it for a second.

To make this clearer, I added IonicStep and ElectronicStep schemas in abecd04 and c3c5a0b.

Nice, this is great!

Now that we have these nice schemas, instead of using pymartgen Trajectory, I think maybe we want to create a atomate2 schema for trajectory. Simply,

class Trajectory(BaseModel):

    ionic_steps: List[IonicStep] = Field(None, description="...")

will do it.

This will make it cleaner that we simply store ionic_steps in a different store than using a pymatgen Trajectory object. Also, the pymatgen Trajectory has lots of additional stuff and may not be the cleanest object to store the data.

utf · 2022-05-10T19:38:04Z

Unfortunately, creating a Trajectory BaseModel won't work. This is because only MSONable classes can be stored in the data store. All pydantic models will get converted to a dict (with no additional metadata) so it is impossible for jobflow to recognise them and store them in the data store.

codecov · 2022-05-10T22:24:34Z

Codecov Report

Merging #134 (5c3d39c) into main (6047862) will increase coverage by 0.51%.
The diff coverage is 91.93%.

@@            Coverage Diff             @@
##             main     #134      +/-   ##
==========================================
+ Coverage   72.99%   73.51%   +0.51%     
==========================================
  Files          49       49              
  Lines        4229     4349     +120     
  Branches      672      686      +14     
==========================================
+ Hits         3087     3197     +110     
- Misses        973      980       +7     
- Partials      169      172       +3

Impacted Files	Coverage Δ
src/atomate2/vasp/schemas/calc_types/_generate.py	`0.00% <ø> (ø)`
src/atomate2/vasp/sets/core.py	`83.22% <77.77%> (-1.10%)`	⬇️
src/atomate2/vasp/schemas/calculation.py	`85.66% <80.00%> (-0.30%)`	⬇️
src/atomate2/vasp/schemas/task.py	`90.45% <83.33%> (-0.42%)`	⬇️
src/atomate2/vasp/jobs/base.py	`97.82% <100.00%> (+0.04%)`	⬆️
src/atomate2/vasp/jobs/core.py	`87.93% <100.00%> (+0.77%)`	⬆️
src/atomate2/vasp/schemas/calc_types/enums.py	`100.00% <100.00%> (ø)`
src/atomate2/vasp/schemas/calc_types/utils.py	`71.18% <100.00%> (+1.01%)`	⬆️

mjwen · 2022-05-10T22:36:13Z

I've come up with a way to deal with this. But before proceeding, I'd like to hear what you think.

I've created a Trajectory BaseModel, and added it as a field trajectory of CalculationOutput. Its purpose is to replace ionic_steps in CalculationOutput. By default, CalculationOutput.trajectory is None and everything is the same as before. If instead we set store_trajectory = True in CalculationOutput.from_vasp_outputs(), ionic_steps will be set to None, and trajectory will be set to store the ionic steps. In addition, the trajectory will be stored in additional data store because trajectory is added to _DATA_OBJECTS_.

This is basically what we've discussed before, with the only difference being that we use a Trajectory BaseModel instead of a pmg Trajectory class. The good side is that we store the same ionic_steps, just in different places. The ugly side is that now we have ionic_steps and trajectory in CalculationOutput for the same info. If you think it is OK, how about not using trajectory at all and let ionic_steps be stored in the additional store by default? Any downside of doing this?

This is the major issue for now; and of course there are other TODOs and the tests I need to do before you want to review it.

mkhorton · 2022-05-10T23:45:16Z

This will make it cleaner that we simply store ionic_steps in a different store than using a pymatgen Trajectory object. Also, the pymatgen Trajectory has lots of additional stuff and may materialsproject/pymatgen#2515 to store the data.

Just adding a note to say I'm keeping an eye on the pymatgen PR -- I am absolutely ok with merging in edits to this to make it cleaner or more useful for your workflows. As far as I know, current uses of this class are minimal (we do use it in the MP API to return the Trajectory associated with a geometry optimization, but nothing downstream uses this and it is constructed on-the-fly from the relevant task doc, so changes would be ok).

mjwen · 2022-05-18T17:33:55Z

@utf what's your thoughts on this?

utf · 2022-06-21T16:31:13Z

Hi @mjwen, Sorry for the delay on this. I think I like the Trajectory class because then it can get stored in VASP objects, which is a obvious place to find the large objects. Is there a specific reason why you don't think the Trajectory class would be appropriate?

mjwen · 2022-06-21T17:28:56Z

Two reasons I did not want to do it:

The current pymatgen Trajectory class stores info and has methods that we do not need, and some other stuff we need are missing. For backward compatibility, I am not sure how much we can change it. Given Matt's comment, I believe this is not a big issue now.
If we want to make it to serve our purpose---storing all ionic steps info, we need to replicate a lot of schema already existing in atomate2. More specifically, we need to implement/copy IonicStep and its fields (e.g. ElectronicStep) to pymatgen. These schemas will all become attributes of the pymatgen Trajectory class. First, it feels redundant, and second second it may not be easy to maintain: each time we make changes we'll have to change both codes.

But if we want it this way, we can definite do it---there is not any technical issue. And the benefit is obvious: the CalculationOutput model will be cleaner; it does not need to have a trajectory field any more.

mjwen · 2022-07-04T17:05:14Z

@utf Ready for review.

In addition to adding the MDSetGenerator and MDMaker, major addition/changes include:

a pymatgen Trajectory is used to store ionic steps information for an MD job by default; no change for other jobs
to determine the status of an MD job, check whether the number of ionic steps is equal to NSW in INCAR, instead of checking the convergence of the last electronic and ionic steps
do not check max_force for an MD job

utf

Hi @mjwen, thanks very much for this PR and for your work on the Trajectory!

Here are some minor comments.

src/atomate2/vasp/schemas/calculation.py

utf · 2022-07-07T12:21:52Z

src/atomate2/vasp/schemas/task.py

@@ -187,27 +190,47 @@ class OutputSummary(BaseModel):
    )

    @classmethod
-    def from_vasp_calc_doc(cls, calc_doc: Calculation) -> "OutputSummary":
+    def from_vasp_calc_doc_and_trajectory(


Can you leave this name as it was before?

utf · 2022-07-07T12:22:33Z

src/atomate2/vasp/schemas/task.py

        """
-        Create a summary of VASP calculation outputs from a VASP calculation document.
+        Create a summary of VASP calculation outputs from a VASP calculation document


The first part of a docstring should fit on one line. I think you can remove the part about trajectory. It is obvious from the rest of the docstring!

utf · 2022-07-07T12:23:29Z

src/atomate2/vasp/sets/core.py

+        # LREAL: True -> AUTO
+        updates.update(
+            {
+                "ENCUT": None,  # None to use VASP default


Is this a good idea? Seems like it could result in very low ENCUTS. Was this taken from the old workflow?

Yes, it is taken from pmg MPMDSet.

I agree with you that this may not be good. The MVLNPTMDSet uses a slightly larger value, 1.5*ENCUT, but still may not be ideal since the ENCUT changes from material to material. I prefer to set it to a const value, but smaller than that in the BaseVaspSet.yaml, like 520.

Yeah, I think 520 is good.

utf · 2022-07-07T12:24:31Z

src/atomate2/vasp/sets/core.py

+                "NSW": self.nsteps,
+                "POTIM": self.time_step,
+                "ISPIN": 2 if self.spin_polarized else 1,
+                "EDIFF_PER_ATOM": 0.00001,


We have moved away from EDIFF_PER_ATOM in atomate2. It is reasonable to set a lower EDIFF here than the default but ultimately, I worry EDIFF_PER_ATOM could result in very large EDIFFS for large supercells.

agreed. EDIFF_PER_ATOM could be really bad for large cells, especially in MD. Removed it. I believe the default EDIFF=1e-5 in BaseVaspSet.yaml is good enough for MD runs.

utf · 2022-07-07T12:25:11Z

src/atomate2/vasp/sets/core.py

+                "KBLOCK": 100,
+                "PREC": "Normal",
+                "LDAU": False,
+                "ADDGRID": True,


We have also removed ADDGRID from most VASP calculations based on the VASP developers advice.

…r large MD calculations

utf

Thanks @mjwen, this is nearly there I think. I just spotted a few more edits.

utf · 2022-07-09T10:09:10Z

src/atomate2/vasp/sets/core.py

+        # LREAL: True -> AUTO
+        updates.update(
+            {
+                "ENCUT": None,  # None to use VASP default


Yeah, I think 520 is good.

utf · 2022-07-09T10:10:36Z

src/atomate2/vasp/sets/core.py

+    end_temp: float = 300
+    nsteps: int = 1000
+    time_step: int = 2
+    spin_polarized: bool = False


Can you remove this parameter and set auto_ispin: bool = True. This means, if you're starting from a previous calculation, and that one has no magnetic moments, then ISPIN will be disabled automatically.

If you aren't starting from a previous calculation it is quite dangerous to set spin polarised to False by default.

utf · 2022-07-09T10:11:04Z

src/atomate2/vasp/sets/core.py

+                "TEEND": self.end_temp,
+                "NSW": self.nsteps,
+                "POTIM": self.time_step,
+                "ISPIN": 2 if self.spin_polarized else 1,


Remove this and see note about auto_ispin above.

utf · 2022-07-09T10:11:46Z

src/atomate2/vasp/sets/core.py

+                "BMIX": 1,
+                "MAXMIX": 20,
+                "NELM": 500,
+                "NSIM": 4,


This is the default so can be removed. Same with a few others. So you can remove:

NSIM

LSCALU

BMIX

NBLOCK

done. I also removed LREAL=AUTO and ISMEAR=0. For LREAL, the default setting in the yaml files have AUTO. For ISMEAR, I believer it is reasonable to follow the default yaml files.

utf · 2022-07-09T10:16:50Z

src/atomate2/vasp/sets/core.py

+                "NBLOCK": 1,
+                "KBLOCK": 100,
+                "PREC": "Normal",
+                "LDAU": False,


Why is DFT+U disabled? Shouldn't we just use the default U values from pymatgen? In that case just remove this tag and they will be applied automatically!

From a code hygiene perspective, @munrojm and I were discussing recently that there's a big advantage to having all calculations map cleanly to a pymatgen input set, since this greatly simplifies the task validation logic. If it's at all possible to inherit the settings from a specific input set, rather than copying manually, I think this would be very useful.

@mkhorton Just to make sure I understand, by simplifies the task validation logic, are you referring to the emmet building process or something else? And can you expand a bit on this...

Also, @mkhorton could you clarify what you mean by "inherit the settings from a specific input set, rather than copying manually"?

davidwaroquiers · 2022-07-13T14:33:12Z

Hello @mjwen and @utf , this is really nice work. Just my 2 cents, discussing with @gpetretto, one thing that may often happen in a MD job is that you need to continue it (you most probably cannot do e.g. 50000 steps in the timelimit). From what we have seen, it is not clear whether the velocities are passed down when you restart. Actually, we think that they are not and the MD job would restart with random or 0 velocities. What do you think ?
Thanks,
Best

mkhorton · 2022-07-13T16:53:34Z

This is a timely comment @davidwaroquiers; I'm actually at a workshop on checkpoint/restart today... I think we're at a point where we need to establish an ideal pattern in atomate2 for how best handle checkpointed states and/or continuation jobs, @utf and I have had some offline discussions about this. Perhaps better to discuss in a separate issue?

I haven't reviewed this PR in detail, does it already have logic for continuation jobs included?

mjwen · 2022-07-13T17:59:05Z

@mkhorton No, this PR does not include any logic to continue a job. I vaguely remember you guys were discussing the need to continue a job; @jmmshn was also participating? It would be great to have a specific issue where we can discuss this in greater detail.

mjwen · 2022-07-14T06:15:35Z

@utf Besides removing some default params, I've also removed nph ensemble as an option, which is not that common and I don't think MP folks have a good sense on what good parameters should be for it.

utf · 2022-07-14T09:29:23Z

Hi @mkhorton @davidwaroquiers @mjwen. Agreed that we need a proper discussion about restarting. This is something @jmmshn has opinions about too.

I have some ideas for how we could do it but each with their own tradeoffs. I'd be interested to hear if there were any good ideas from your workshop @mkhorton.

I will create a separate issue to discuss this further.

utf · 2022-07-14T09:33:22Z

src/atomate2/vasp/sets/core.py

+                "POTIM": self.time_step,
+                "LCHARG": False,
+                "LPLANE": False,
+                "LWAVE": True,


Does this write the wavecar after each step? Could this be very expensive due to IO time of writing supercell wave functions?

Removed. Also removed LPLANE, which has nothing to do with an MD run and depends on the type of machine.

davidwaroquiers · 2022-07-14T13:19:51Z

Hi @mkhorton @davidwaroquiers @mjwen. Agreed that we need a proper discussion about restarting. This is something @jmmshn has opinions about too.

I have some ideas for how we could do it but each with their own tradeoffs. I'd be interested to hear if there were any good ideas from your workshop @mkhorton.

I will create a separate issue to discuss this further.

Hi @utf, for abinit (PR #81 ), we have restart capabilities so we would indeed be very interested to discuss this and provide some of our ideas.

utf · 2022-07-15T09:15:01Z

Great, thanks so much for this @mjwen!

mjwen added 3 commits May 5, 2022 14:42

Add MDSetGenerator based on pymatgen MPMDSet

bbe9b26

Add MDMaker job

a44480d

Attempt to add test

4a677f5

utf reviewed May 5, 2022

View reviewed changes

Allow different ensembles for MDSetGenerator, default to nvt

ff79d52

mjwen added 4 commits May 6, 2022 11:31

Add checking user incar setting

b4352a4

Merge branch 'main' into md

b552ca3

Set custodian handler for MD run

37e469c

Correct custodian handler for MD run

d5b7a1f

mjwen added 6 commits May 10, 2022 13:42

Store trajectory as in additional data store

6edfdaa

Add MD to CalcType

47a2101

Add MD to get calc type function

cf6ece7

Fix OutputSummary that uses info from ionic steps

7b02f67

Remove incomplete test

6883d97

Fix annotation

57e164e

mjwen added 2 commits July 4, 2022 09:18

Merge branch 'main' into md

137af78

Remove force POTIM=0.5 for H containing structure

b7f4e89

mjwen changed the title ~~[WIP] add molecular dynamics job~~ Molecular dynamics job Jul 4, 2022

Minor tweak docs

84fa505

utf reviewed Jul 7, 2022

View reviewed changes

mjwen added 8 commits July 8, 2022 14:15

Change store_ionic_steps to store_trajectory

00d7dbd

Restore func name to from_vasp_calc_doc

6bc888d

Make trajectory arg optional

b9709c0

Remove redundant docstring

b2a6c5d

Remove EDIFF_PER_ATOM, which could lead to large EDIFF, especially fo…

0288471

…r large MD calculations

Fix MDMaker to store trajectory by default

8adf37c

Regenerate enums.py

02accd2

Merge branch 'main' into md

07df203

utf reviewed Jul 9, 2022

View reviewed changes

mjwen added 3 commits July 13, 2022 22:31

Set ENCUT=520 and remove params that's the default in vasp

bde7f1c

Add more default params for npt and remove support of nph

b9b8754

Check incar settings

37becfb

utf reviewed Jul 14, 2022

View reviewed changes

jmmshn mentioned this pull request Jul 14, 2022

Standardized approach for job checkpoint/restart/continuation #156

Open

Remove setting LAVE and LPLANE

5c3d39c

utf merged commit 73b9b3e into materialsproject:main Jul 15, 2022

chiang-yuan mentioned this pull request Feb 22, 2024

[WIP] Support general ASE Calculator for general MLIP MD simulations #738

Closed

10 tasks

Molecular dynamics job #134

Molecular dynamics job #134

Conversation

mjwen commented May 5, 2022

utf left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

mjwen commented May 5, 2022

utf commented May 5, 2022 • edited

utf commented May 5, 2022

mjwen commented May 6, 2022

utf commented May 6, 2022 • edited

utf commented May 6, 2022

utf commented May 6, 2022

mjwen commented May 6, 2022

utf commented May 10, 2022

codecov bot commented May 10, 2022 • edited

Codecov Report

mjwen commented May 10, 2022

mkhorton commented May 10, 2022

mjwen commented May 18, 2022

utf commented Jun 21, 2022

mjwen commented Jun 21, 2022 • edited

mjwen commented Jul 4, 2022

utf left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

utf left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

davidwaroquiers commented Jul 13, 2022

mkhorton commented Jul 13, 2022

mjwen commented Jul 13, 2022

mjwen commented Jul 14, 2022

utf commented Jul 14, 2022

Choose a reason for hiding this comment

Choose a reason for hiding this comment

davidwaroquiers commented Jul 14, 2022

utf commented Jul 15, 2022

utf commented May 5, 2022 •

edited

utf commented May 6, 2022 •

edited

codecov bot commented May 10, 2022 •

edited

mjwen commented Jun 21, 2022 •

edited