CombinedData.as_dict/from_dict and other methods overwrite; CombinedData.as_lammpsdata deepcopied #2191

htz1992213 · 2021-07-03T01:19:59Z

Summary

Include a summary of major changes in bullet points:

Add as_dict, from_dict for CombinedData to overwrite superclass methods
Overwrite structure, disassemble, from_ff_and_topologie, from_structur methods for CombinedData
Rewrite as_lammpsdata method for CombinedData so that each attribute is deep copied
Update docs
Join string frags
Added tests

htz1992213 · 2021-07-03T01:22:54Z

coveralls · 2021-07-06T19:38:05Z

Coverage decreased (-0.6%) to 83.143% when pulling cab0331 on htz1992213:master into 40817cc on materialsproject:master.

htz1992213 · 2021-07-07T23:18:08Z

Hey @mkhorton, I have added tests for the new methods.

htz1992213 · 2021-07-07T23:19:04Z

The serialization of the CombinedData class should work as expected with the update @rkingsbury

rkingsbury · 2021-07-08T16:22:41Z

The serialization of the CombinedData class should work as expected with the update @rkingsbury

Thanks @htz1992213 ! This looks like a great update that will really improve robustness and clarity of CombinedData.

rkingsbury · 2021-07-08T16:24:18Z

pymatgen/io/lammps/data.py

@@ -351,7 +351,7 @@ def get_string(self, distance=6, velocity=8, charge=4, hybrid=True):
            velocity (int): No. of significant figures to output for
                velocities. Default to 8.
            charge (int): No. of significant figures to output for
-                charges. Default to 3.
+                charges. Default to 4.


If there is no significant penalty in terms of size or performance, I suggest we default to 5 decimal places here. The only reason I say that is that for the TIP4P-FB water model I'm using, 5 decimals are required in order to keep the water molecule neutral. Obviously it's not a big deal to set this via kwarg if there are strong reasons to keep only 4 decimals as the default.

The previous default was 3, changed to 4 as a safe buffer. Would you prefer to change it to even more? I am actually working on a charge writer class in mdgo.forcefield, it can auto-detect significant digits so that users don't have to specify that explicitly to make it right.

rkingsbury · 2021-07-08T16:26:21Z

Longer-term, I still think we should work on getting DataFrame supported in monty so that we can do away with custom as_dict and from_dict methods (and this comment applies throughout pymatgen). But for now this is a good solution. CombinedData has been inheriting too many methods from LammpsData that don't make sense, so I'm glad you added these as well as the AttributeError where appropriate.

mkhorton · 2021-07-08T17:02:33Z

I think we should prefer the monty solution over the custom as_dict/from_dict here actually; let's try and get the monty solution first, and fall back to this if that's impractical? I think it would be an equivalent number of lines of code.

I'm trying to actively discourage custom as_dict, from_dict methods wherever possible, they're creating a lot of technical debt.

rkingsbury · 2021-07-08T17:27:46Z

I think we should prefer the monty solution over the custom as_dict/from_dict here actually; let's try and get the monty solution first, and fall back to this if that's impractical? I think it would be an equivalent number of lines of code.

I'm trying to actively discourage custom as_dict, from_dict methods wherever possible, they're creating a lot of technical debt.

@mkhorton I agree with you, however this PR fixes several pretty serious flaws in CombinedData that are affecting our workflows, and brings it into line with LammpsData which also has custom as_dict / from_dict. I'd encourage you to consider merging this as a stopgap solution until we get monty patched, which may take more time. Once that happens, we can remove as_dict / from_dict from all the Lammps I/O classes at once.

mkhorton · 2021-07-08T18:14:44Z

Unfortunately "stopgap" solutions are how we accrue technical debt, because we forget to go back and remove them. I am reluctant to merge this when there has been a clear alternative suggested, with specific code proposed, and that could be implemented on a comparable timeframe.

mkhorton · 2021-07-08T18:19:06Z

Note that this comment is for the as_dict, from_dict specifically, not for the remainder of the PR which is fine to merge.

htz1992213 · 2021-07-08T19:08:35Z

Unfortunately "stopgap" solutions are how we accrue technical debt, because we forget to go back and remove them. I am reluctant to merge this when there has been a clear alternative suggested, with specific code proposed, and that could be implemented on a comparable timeframe.

@mkhorton I think modifying monty is beyond my expertise because it involves making test cases that I may not familiar. I don't think this is a stopgap because it is for sure built on the latest dependencies of pymatgen. But I would be happy to modify both LammpsData/CombinedData immediately once the patch to monty is in place. I am quite actively maintaining this submodule so no worries about forgetting to go back here. And I think the urgent is to make sure the serialization of CombinedData works.

mkhorton · 2021-07-08T19:34:52Z

@mkhorton I think modifying monty is beyond my expertise because it involves making test cases that I may not familiar.

Ok, that's not a problem -- I'm happy to propose this change as a PR to monty myself but it'll take me a few days until I can get to it.

htz1992213 · 2021-08-25T08:58:14Z

Hey @mkhorton @rkingsbury, Shyue merged the Monty update, so I was able to deprecate the as_dict and from_dict methods for LammpsData and CombinedData. The MontyEncoder/Decoder works fine here. One thing I want to add is that, the pandas.DataFrame.to_dict method will convert the int type "index" of the df to str type. This won't cause a fundamental error here because LammpsData will just output that column as str anyway. But I think in general it is something that could be fixed in Monty.

rkingsbury · 2021-08-25T14:22:10Z

Thanks @htz1992213 ! Good catch re: the index type. I don't have a well-informed opinion of whether that type conversion is a problem or not.

mkhorton · 2021-08-25T16:46:03Z

Thanks @htz1992213! I'll go ahead and merge.

The index conversion is interesting to know -- this is done by pandas JSON export itself rather than monty, I believe?

htz1992213 · 2021-08-25T20:44:44Z

@rkingsbury @mkhorton Yes I think pandas export JSON by itself. But I think that default behavior can be changed as described in this answer

mkhorton · 2021-08-25T21:24:12Z

@htz1992213 ah, I see. If we make this change it would be a breaking change since it would change the serialization format, would that be ok?

htz1992213 · 2021-08-25T21:32:22Z

@mkhorton I made the tests in this PR to only check if the actual values in dfs are the same regardless of the type. So it should be ok updating monty according to the link. I don't think it will break the things here.

mkhorton · 2021-08-25T21:36:53Z

I was asking more if it would cause any problems for you for your own work, if you've already been using the functionality, otherwise I will submit another PR to monty ASAP.

htz1992213 · 2021-08-25T21:41:24Z

@mkhorton Oh I see. No, I don't think that is a problem for me. I was just discussing if it is a better practice to maintain the int type of the index for serializing a df in Monty.

mkhorton · 2021-08-25T22:04:31Z

I don't think it's a big issue, but it does seem best practice for serialization/deserialization to retain type information, and since it's a simple change I think we should do that before it sees widespread usage.

htz1992213 · 2021-08-26T18:00:19Z

@mkhorton Yeah I totally agree, a simple fix would be great!

rewrite super class mathods

eca5cbd

htz1992213 mentioned this pull request Jul 3, 2021

CombinedData serialization with monty is broken #2188

Open

Merge branch 'materialsproject:master' into master

a7f076f

add tests

4d73219

htz1992213 changed the title ~~[WIP] CombinedData.as_dict/from_dict and other methods overwrite; CombinedData.as_lammpsdata deepcopied~~ CombinedData.as_dict/from_dict and other methods overwrite; CombinedData.as_lammpsdata deepcopied Jul 7, 2021

rkingsbury reviewed Jul 8, 2021

View reviewed changes

mkhorton mentioned this pull request Jul 8, 2021

Add pandas DataFrame serialization materialsvirtuallab/monty#280

Merged

htz1992213 and others added 2 commits August 24, 2021 15:22

Merge branch 'materialsproject:master' into master

0997df4

deprecate from_dict as_dict

cab0331

mkhorton merged commit a68137e into materialsproject:master Aug 25, 2021

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

CombinedData.as_dict/from_dict and other methods overwrite; CombinedData.as_lammpsdata deepcopied #2191

CombinedData.as_dict/from_dict and other methods overwrite; CombinedData.as_lammpsdata deepcopied #2191

htz1992213 commented Jul 3, 2021 •

edited

htz1992213 commented Jul 3, 2021

coveralls commented Jul 6, 2021 •

edited

htz1992213 commented Jul 7, 2021

htz1992213 commented Jul 7, 2021

rkingsbury commented Jul 8, 2021

rkingsbury Jul 8, 2021

htz1992213 Jul 8, 2021

rkingsbury commented Jul 8, 2021

mkhorton commented Jul 8, 2021

rkingsbury commented Jul 8, 2021

mkhorton commented Jul 8, 2021 •

edited

mkhorton commented Jul 8, 2021

htz1992213 commented Jul 8, 2021

mkhorton commented Jul 8, 2021

htz1992213 commented Aug 25, 2021

rkingsbury commented Aug 25, 2021

mkhorton commented Aug 25, 2021

htz1992213 commented Aug 25, 2021 •

edited

mkhorton commented Aug 25, 2021

htz1992213 commented Aug 25, 2021

mkhorton commented Aug 25, 2021

htz1992213 commented Aug 25, 2021

mkhorton commented Aug 25, 2021

htz1992213 commented Aug 26, 2021

CombinedData.as_dict/from_dict and other methods overwrite; CombinedData.as_lammpsdata deepcopied #2191

CombinedData.as_dict/from_dict and other methods overwrite; CombinedData.as_lammpsdata deepcopied #2191

Conversation

htz1992213 commented Jul 3, 2021 • edited

Summary

htz1992213 commented Jul 3, 2021

coveralls commented Jul 6, 2021 • edited

htz1992213 commented Jul 7, 2021

htz1992213 commented Jul 7, 2021

rkingsbury commented Jul 8, 2021

rkingsbury Jul 8, 2021

Choose a reason for hiding this comment

htz1992213 Jul 8, 2021

Choose a reason for hiding this comment

rkingsbury commented Jul 8, 2021

mkhorton commented Jul 8, 2021

rkingsbury commented Jul 8, 2021

mkhorton commented Jul 8, 2021 • edited

mkhorton commented Jul 8, 2021

htz1992213 commented Jul 8, 2021

mkhorton commented Jul 8, 2021

htz1992213 commented Aug 25, 2021

rkingsbury commented Aug 25, 2021

mkhorton commented Aug 25, 2021

htz1992213 commented Aug 25, 2021 • edited

mkhorton commented Aug 25, 2021

htz1992213 commented Aug 25, 2021

mkhorton commented Aug 25, 2021

htz1992213 commented Aug 25, 2021

mkhorton commented Aug 25, 2021

htz1992213 commented Aug 26, 2021

htz1992213 commented Jul 3, 2021 •

edited

coveralls commented Jul 6, 2021 •

edited

mkhorton commented Jul 8, 2021 •

edited

htz1992213 commented Aug 25, 2021 •

edited