CombinedData.as_dict/from_dict and other methods overwrite; CombinedData.as_lammpsdata deepcopied #2191
Fixes #2188 |
Hey @mkhorton, I have added tests for the new methods. |
The serialization of the |
Thanks @htz1992213 ! This looks like a great update that will really improve robustness and clarity of |
@@ -351,7 +351,7 @@ def get_string(self, distance=6, velocity=8, charge=4, hybrid=True):
             velocity (int): No. of significant figures to output for
                 velocities. Default to 8.
             charge (int): No. of significant figures to output for
-                charges. Default to 3.
+                charges. Default to 4.
If there is no significant penalty in terms of size or performance, I suggest we default to 5 decimal places here. The only reason I say that is that for the TIP4P-FB water model I'm using, 5 decimals are required in order to keep the water molecule neutral. Obviously it's not a big deal to set this via kwarg if there are strong reasons to keep only 4 decimals as the default.
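To make the neutrality concern concrete, here is a minimal sketch of how rounding each partial charge can leave a residual net charge on one water molecule. The charge values are assumed for illustration (check them against your own force field file), the helper name is hypothetical, and the sketch treats the setting as a number of decimal places, as the comment above does, even though the docstring describes it as significant figures.

```python
from decimal import Decimal

# Assumed TIP4P-FB partial charges, for illustration only.
Q_H = Decimal("0.52587")   # each hydrogen site
Q_M = Decimal("-1.05174")  # the massless M site


def net_charge_after_rounding(decimals):
    """Net charge of one molecule (2 H sites + 1 M site) after rounding each charge."""
    quantum = Decimal(1).scaleb(-decimals)  # e.g. 0.0001 for decimals=4
    return 2 * Q_H.quantize(quantum) + Q_M.quantize(quantum)


for decimals in (4, 5):
    print(decimals, net_charge_after_rounding(decimals))
```

With these assumed values, four decimal places leave a small residual charge per molecule, while five reproduce the inputs exactly.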
The previous default was 3; I changed it to 4 as a safe buffer. Would you prefer to raise it further? I am actually working on a charge writer class in mdgo.forcefield that can auto-detect significant digits, so users don't have to specify that explicitly to get it right.
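One way such auto-detection could work is to find the smallest number of decimal places that reproduces every input charge exactly. The sketch below is a hypothetical helper, not the mdgo implementation, and it assumes the charges are available as strings (for example, as read from a force field file).

```python
from decimal import Decimal


def decimals_needed(charges):
    """Smallest number of decimal places that represents every charge string exactly."""
    # normalize() drops trailing zeros, so "-0.8340" counts as 3 places, not 4.
    return max(
        max(-Decimal(c).normalize().as_tuple().exponent, 0) for c in charges
    )


print(decimals_needed(["0.52587", "-1.05174", "0.0"]))
```

An empty list raises ValueError here; a real writer would need a fallback default for that case.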
Longer-term, I still think we should work on getting |
I think we should prefer the monty solution over the custom I'm trying to actively discourage custom |
@mkhorton I agree with you, however this PR fixes several pretty serious flaws in |
Unfortunately "stopgap" solutions are how we accrue technical debt, because we forget to go back and remove them. I am reluctant to merge this when there has been a clear alternative suggested, with specific code proposed, and that could be implemented on a comparable timeframe. |
Note that this comment is for the |
@mkhorton I think modifying |
Ok, that's not a problem -- I'm happy to propose this change as a PR to monty myself but it'll take me a few days until I can get to it. |
Hey @mkhorton @rkingsbury, Shyue merged the Monty update, so I was able to deprecate the as_dict and from_dict methods for LammpsData and CombinedData. The MontyEncoder/Decoder works fine here. One thing I want to add is that the pandas.DataFrame.to_dict method will convert the int-type "index" of the df to str. This won't cause a fundamental error here because LammpsData will just output that column as str anyway. But I think in general it is something that could be fixed in Monty.
Thanks @htz1992213 ! Good catch re: the index type. I don't have a well-informed opinion of whether that type conversion is a problem or not. |
Thanks @htz1992213! I'll go ahead and merge. The index conversion is interesting to know -- this is done by pandas JSON export itself rather than monty, I believe? |
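The index conversion discussed here can be reproduced without pandas or monty, because JSON object keys are always strings. A minimal sketch, using a plain dict in the column-oriented layout {column: {index: value}} that DataFrame.to_dict() produces by default (the column names and values are made up):

```python
import json

# Column-oriented layout with integer index labels as dict keys.
frame = {"type": {1: 1, 2: 2}, "q": {1: 0.5, 2: -1.0}}

# Any JSON round trip turns the integer keys into strings.
restored = json.loads(json.dumps(frame))

print(list(frame["type"]))     # integer index labels
print(list(restored["type"]))  # string index labels after the round trip
```

The values survive unchanged; only the labels used as object keys change type.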
@rkingsbury @mkhorton Yes, I think pandas exports the JSON by itself. But I think that default behavior can be changed, as described in this answer
@htz1992213 ah, I see. If we make this change it would be a breaking change since it would change the serialization format, would that be ok? |
@mkhorton I made the tests in this PR to only check if the actual values in dfs are the same regardless of the type. So it should be ok updating monty according to the link. I don't think it will break the things here. |
I was asking more if it would cause any problems for you for your own work, if you've already been using the functionality, otherwise I will submit another PR to monty ASAP. |
@mkhorton Oh I see. No, I don't think that is a problem for me. I was just discussing if it is a better practice to maintain the int type of the index for serializing a df in Monty. |
I don't think it's a big issue, but it does seem best practice for serialization/deserialization to retain type information, and since it's a simple change I think we should do that before it sees widespread usage. |
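One way to retain the index type, sketched here under the assumption that the frame is a plain {column: {index: value}} dict, is to store the index labels as JSON values rather than as object keys. This mirrors the idea behind the "split" orientation in pandas; it is not necessarily the change that was made to monty, and both helper names are hypothetical.

```python
import json


def to_split(frame):
    """Convert {column: {index: value}} to a layout that stores index labels as values."""
    columns = list(frame)
    index = list(next(iter(frame.values()))) if frame else []
    data = [[frame[c][i] for c in columns] for i in index]
    return {"columns": columns, "index": index, "data": data}


def from_split(payload):
    """Rebuild the {column: {index: value}} layout from the split payload."""
    return {
        c: {i: row[j] for i, row in zip(payload["index"], payload["data"])}
        for j, c in enumerate(payload["columns"])
    }


frame = {"type": {1: 1, 2: 2}, "q": {1: 0.5, 2: -1.0}}
restored = from_split(json.loads(json.dumps(to_split(frame))))
print(restored == frame)
```

Because the labels travel inside a JSON array, integers come back as integers after decoding.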
@mkhorton Yeah I totally agree, a simple fix would be great! |
Summary
Include a summary of major changes in bullet points:
- as_dict, from_dict for CombinedData to overwrite superclass methods
- structure, disassemble, from_ff_and_topologies, from_structure methods for CombinedData
- as_lammpsdata method for CombinedData so that each attribute is deep copied
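The pattern described in these bullets can be sketched with stand-in classes. Everything below is hypothetical: Base and Combined are not the pymatgen classes, and raising NotImplementedError in the overridden method is an assumption about one possible behavior, not what this PR necessarily does.

```python
import copy


class Base:
    """Hypothetical stand-in for the superclass."""

    def __init__(self, masses):
        self.masses = masses

    def structure(self):
        return f"structure with {len(self.masses)} masses"


class Combined(Base):
    """Hypothetical stand-in for the subclass."""

    def structure(self):
        # Assumed behavior: the inherited method is not meaningful here.
        raise NotImplementedError("structure is not defined for combined data")

    def as_base(self):
        # Deep copy each attribute so the new object shares no mutable state.
        return Base(copy.deepcopy(self.masses))


combined = Combined([{"label": "O", "mass": 15.999}])
plain = combined.as_base()
plain.masses[0]["mass"] = 0.0       # mutate the copy
print(combined.masses[0]["mass"])   # the original is unaffected
```

Without the deep copy, both objects would hold the same list of dicts, and editing one would silently change the other.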