-
Notifications
You must be signed in to change notification settings - Fork 895
speed up bader_caller
and chargemol_caller
#3192
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
…ile.ScratchdDr` context manager
Thank you so much for doing this! It has been on my backlog for a very long time. |
You don't need to add it in this PR (unless you're feeling ambitious!), but it's likely that an analogous fix will solve similar issues with the pymatgen/pymatgen/command_line/chargemol_caller.py Lines 181 to 189 in 0b7607a
|
@arosen93 thanks for pointing out chagemol as well! It actually requires little modification so I guess I can try to solve them both. |
@chiang-yuan: I would certainly be very grateful if you did!!! 🙏 |
Chargemol needs to see all the input files in the same folder so the way around here is using symbolic link to point to large files . I also port the error message by bader and chagmol back to python RuntimeError. I have used |
bader_caller
bader_caller
and chargemol_caller
Looks like |
I found the problem. The code was reading compressed test files but somehow didn't raise error on Perlmutter. Modify the code to handle compressed files and decompress them to temp directory if needed. Not sure if this is the ideal solution since this falls back to copying data to temp folder again (but only for compressed file and folder, the uncompressed files are still read at their original place). This will also affect when we want to compress all the test files. #2994 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@chiang-yuan Thanks for taking this on! Very excited about fully reviving the BaderAnalysis
in pymatgen
. Left a few questions.
import subprocess | ||
import warnings | ||
from glob import glob | ||
from shutil import which | ||
from tempfile import TemporaryDirectory | ||
|
||
import numpy as np | ||
from monty.io import zopen | ||
from monty.tempfile import ScratchDir |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I actually don't understand why the files need to be copied at all? Does bader
modify them in place? Seems unlikely and even if, they should provide an option to disable that?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
They don't! So what I did here is creating a temporary scratchdir to store the output files from bader. However, bader cannot read compressed file so at the bottom we need to decompress file if it is compressed and store it temporarily somewhere (I just put it in ScratchDir as well)
os.symlink(self._aeccar0path, "./AECCAR0") | ||
os.symlink(self._aeccar2path, "./AECCAR2") | ||
except OSError as e: | ||
print(f"Error creating symbolic link: {e}") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Using os.symlink
here seems strange. Why not read the files from where they are?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
chargemol
is different. It seems like here is no input arguments to indicate where the input files (e.g. CHGCAR AECCAR0) stored, so we need to all the input file in ScratchDir. To avoid copying, just use symbolic link can "pretend" there are in the same directory but actually they are just the shortcut linked to the file in the original place.
@@ -78,6 +78,8 @@ def test_from_path(self): | |||
assert np.allclose(charge, charge0) | |||
if os.path.exists("CHGREF"): | |||
os.remove("CHGREF") | |||
if os.path.exists(os.path.join(test_dir, "CHGREF")): | |||
os.remove(os.path.join(test_dir, "CHGREF")) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would probably be best to use the PymatgenTest.tmp_path
fixture so we don't have to worry about manually cleaning up test-generated files.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is PymatgenTest.tmp_path
already implemented? If yes, I need some time to look into how to refactor this part. If not, I need more time 😂
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, right here
pymatgen/pymatgen/util/testing.py
Lines 50 to 54 in 71b7aec
@pytest.fixture(autouse=True) # make all tests run a in a temporary directory accessible via self.tmp_path | |
def _tmp_dir(self, tmp_path: Path, monkeypatch: pytest.MonkeyPatch) -> None: | |
# https://pytest.org/en/latest/how-to/unittest.html#using-autouse-fixtures-and-accessing-other-fixtures | |
monkeypatch.chdir(tmp_path) # change to pytest-provided temporary directory | |
self.tmp_path = tmp_path |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks I am looking into it!
pymatgen/util/io_utils.py
Outdated
@@ -17,6 +17,29 @@ | |||
__date__ = "Sep 23, 2011" | |||
|
|||
|
|||
def decompress_file_to_path(fin_path, fout_path=None): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Isn't this quite similar to monty.io.zopen
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually is more similar to decompress_file
in monty
but the function won't return the path and delete the original compressed file. Created a helper function there as a fix
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I assume @shyuep wouldn't mind modifying monty.shutil.decompress_file
, esp. since the doc string is currently wrong: arg compression
doesn't actually exist.
We can make a PR to monty to have the function return the path.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@chiang-yuan Let's delete this function and use the changes in materialsvirtuallab/monty#536 instead.
@shyuep Can we have a monty
release whenever we do the next pymatgen
release?
Sorry, looks like this was waiting on me. I missed that. Just fixed the merge conflict. I'll submit the PR to @chiang-yuan Any additional tests we might want to add? |
pymatgen/util/io_utils.py
Outdated
@@ -17,6 +17,29 @@ | |||
__date__ = "Sep 23, 2011" | |||
|
|||
|
|||
def decompress_file_to_path(fin_path, fout_path=None): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@chiang-yuan Let's delete this function and use the changes in materialsvirtuallab/monty#536 instead.
@shyuep Can we have a monty
release whenever we do the next pymatgen
release?
Codecov ReportPatch coverage:
Additional details and impacted files@@ Coverage Diff @@
## master #3192 +/- ##
==========================================
- Coverage 74.57% 74.01% -0.57%
==========================================
Files 230 230
Lines 69494 69496 +2
Branches 16166 16163 -3
==========================================
- Hits 51824 51436 -388
- Misses 14595 15019 +424
+ Partials 3075 3041 -34
☔ View full report in Codecov by Sentry. |
@janosh I made two modifications, 1) delete The |
E assert 6.395428 == 6.6136782 ± 1.0e-03 E comparison failed E Obtained: 6.395428 E Expected: 6.6136782 ± 1.0e-03 /home/runner/work/pymatgen/pymatgen/tests/command_line/test_bader_caller.py:37: AssertionError
… compressed version which can't happen twice in same directory
E assert 6.613678 == 6.395428 ± 1.0e-03 and fix mypy
Thanks @chiang-yuan! 👍 this is a big improvement. required a lot of tweaking but i'm glad we got it merged. |
Also offering thanks for this :) |
In the original implementation, CHGCAR is copied into temporary folder by
monty.zopen
. Since CHGCAR is usually very large, this makes bader analysis calling from pymatgen extremely slow.this might close #2487
monty.tempfiie.ScratchdDr
context manageros.symlink
to link larges files forchargemol_caller
Checklist
ruff
.mypy
.