speed up bader_caller and chargemol_caller#3192
speed up bader_caller and chargemol_caller#3192janosh merged 23 commits intomaterialsproject:masterfrom
bader_caller and chargemol_caller#3192Conversation
…ile.ScratchdDr` context manager
|
Thank you so much for doing this! It has been on my backlog for a very long time. |
|
You don't need to add it in this PR (unless you're feeling ambitious!), but it's likely that an analogous fix will solve similar issues with the pymatgen/pymatgen/command_line/chargemol_caller.py Lines 181 to 189 in 0b7607a |
|
@arosen93 thanks for pointing out chagemol as well! It actually requires little modification so I guess I can try to solve them both. |
|
@chiang-yuan: I would certainly be very grateful if you did!!! 🙏 |
|
Chargemol needs to see all the input files in the same folder so the way around here is using symbolic link to point to large files . I also port the error message by bader and chagmol back to python RuntimeError. I have used |
bader_callerbader_caller and chargemol_caller
|
Looks like |
|
I found the problem. The code was reading compressed test files but somehow didn't raise error on Perlmutter. Modify the code to handle compressed files and decompress them to temp directory if needed. Not sure if this is the ideal solution since this falls back to copying data to temp folder again (but only for compressed file and folder, the uncompressed files are still read at their original place). This will also affect when we want to compress all the test files. #2994 |
janosh
left a comment
There was a problem hiding this comment.
@chiang-yuan Thanks for taking this on! Very excited about fully reviving the BaderAnalysis in pymatgen. Left a few questions.
|
|
||
| import numpy as np | ||
| from monty.io import zopen | ||
| from monty.tempfile import ScratchDir |
There was a problem hiding this comment.
I actually don't understand why the files need to be copied at all? Does bader modify them in place? Seems unlikely and even if, they should provide an option to disable that?
There was a problem hiding this comment.
They don't! So what I did here is creating a temporary scratchdir to store the output files from bader. However, bader cannot read compressed file so at the bottom we need to decompress file if it is compressed and store it temporarily somewhere (I just put it in ScratchDir as well)
| os.symlink(self._aeccar0path, "./AECCAR0") | ||
| os.symlink(self._aeccar2path, "./AECCAR2") | ||
| except OSError as e: | ||
| print(f"Error creating symbolic link: {e}") |
There was a problem hiding this comment.
Using os.symlink here seems strange. Why not read the files from where they are?
There was a problem hiding this comment.
chargemol is different. It seems like here is no input arguments to indicate where the input files (e.g. CHGCAR AECCAR0) stored, so we need to all the input file in ScratchDir. To avoid copying, just use symbolic link can "pretend" there are in the same directory but actually they are just the shortcut linked to the file in the original place.
| if os.path.exists("CHGREF"): | ||
| os.remove("CHGREF") | ||
| if os.path.exists(os.path.join(test_dir, "CHGREF")): | ||
| os.remove(os.path.join(test_dir, "CHGREF")) |
There was a problem hiding this comment.
Would probably be best to use the PymatgenTest.tmp_path fixture so we don't have to worry about manually cleaning up test-generated files.
There was a problem hiding this comment.
Is PymatgenTest.tmp_path already implemented? If yes, I need some time to look into how to refactor this part. If not, I need more time 😂
There was a problem hiding this comment.
Yes, right here
pymatgen/pymatgen/util/testing.py
Lines 50 to 54 in 71b7aec
There was a problem hiding this comment.
Thanks I am looking into it!
| __date__ = "Sep 23, 2011" | ||
|
|
||
|
|
||
| def decompress_file_to_path(fin_path, fout_path=None): |
There was a problem hiding this comment.
Isn't this quite similar to monty.io.zopen?
There was a problem hiding this comment.
Actually is more similar to decompress_file in monty but the function won't return the path and delete the original compressed file. Created a helper function there as a fix
There was a problem hiding this comment.
I assume @shyuep wouldn't mind modifying monty.shutil.decompress_file, esp. since the doc string is currently wrong: arg compression doesn't actually exist.
We can make a PR to monty to have the function return the path.
There was a problem hiding this comment.
@chiang-yuan Let's delete this function and use the changes in materialyzeai/monty#536 instead.
@shyuep Can we have a monty release whenever we do the next pymatgen release?
|
Sorry, looks like this was waiting on me. I missed that. Just fixed the merge conflict. I'll submit the PR to @chiang-yuan Any additional tests we might want to add? |
| __date__ = "Sep 23, 2011" | ||
|
|
||
|
|
||
| def decompress_file_to_path(fin_path, fout_path=None): |
There was a problem hiding this comment.
@chiang-yuan Let's delete this function and use the changes in materialyzeai/monty#536 instead.
@shyuep Can we have a monty release whenever we do the next pymatgen release?
Codecov ReportPatch coverage:
Additional details and impacted files@@ Coverage Diff @@
## master #3192 +/- ##
==========================================
- Coverage 74.57% 74.01% -0.57%
==========================================
Files 230 230
Lines 69494 69496 +2
Branches 16166 16163 -3
==========================================
- Hits 51824 51436 -388
- Misses 14595 15019 +424
+ Partials 3075 3041 -34
☔ View full report in Codecov by Sentry. |
|
@janosh I made two modifications, 1) delete The |
E assert 6.395428 == 6.6136782 ± 1.0e-03 E comparison failed E Obtained: 6.395428 E Expected: 6.6136782 ± 1.0e-03 /home/runner/work/pymatgen/pymatgen/tests/command_line/test_bader_caller.py:37: AssertionError
… compressed version which can't happen twice in same directory
E assert 6.613678 == 6.395428 ± 1.0e-03 and fix mypy
|
Thanks @chiang-yuan! 👍 this is a big improvement. required a lot of tweaking but i'm glad we got it merged. |
|
Also offering thanks for this :) |
In the original implementation, CHGCAR is copied into temporary folder by
monty.zopen. Since CHGCAR is usually very large, this makes bader analysis calling from pymatgen extremely slow.this might close #2487
monty.tempfiie.ScratchdDrcontext manageros.symlinkto link larges files forchargemol_callerChecklist
ruff.mypy.