Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Optionally zip files at the end of jobs #414

Merged
merged 5 commits into from
Aug 31, 2023
Merged

Conversation

gpetretto
Copy link
Contributor

Summary

Implementation for the option to tune the zip of files at the end of (vasp) Jobs, as discussed in #361.
3 options are available: zipping all the files, zipping only code specific files or no zipping at all.

For vasp I gathered inputs and outputs from the links in #361, and added a few more that were not mentioned there.

TODO

  • tests
  • adding the option for other codes

@utf
Copy link
Member

utf commented Aug 14, 2023

Thanks @gpetretto. I think this is a great change, but can you use the existing gzip functionality in FileClient?

@utf
Copy link
Member

utf commented Aug 14, 2023

Actually, one does not typically use the file client directly but instead the helper functions in atomate2/common/files.py.

We already have a gunzip files function in there. But could you add a gzip_files function with a consistent set of arguments

Edit: we already have a gzip_files function!

def gzip_files(

@gpetretto
Copy link
Contributor Author

Thanks for pointing it out. Since it was originally using the monty gzip_dir I did not realize there was already an alternative in atomate2. Sorry.

I tried to simply switch to gzip_files, but I incurred in an issue and I am not sure if this should be considered the expected behavior or a bug. I tried to pass my list of files to zip, but due to this part of the code

else:
# no matches, only add the original file to be dealt with later
files.append(Path(file))

the execution fails, as it still adds the files in the include_files list, even if they don't exist.
I am not sure in which case it would be useful to have the file in the list of filtered files, even when it does not exist. May I remove that else condition in find_and_filter_files? Otherwise I will need to perform the check with glob before calling gzip_files.

@utf
Copy link
Member

utf commented Aug 14, 2023

I believe you can just set allow_missing to True?

@gpetretto
Copy link
Contributor Author

Sorry, I missed that again.

I switched to the gzip_files function, but now I have a doubt about how to implement the tests. For what I have seen all the test output files present in the tests are already zipped and I did not find a way to get the unzipped files in the output folder. Is there any way of doing that? Otherwise I can add a test folder with a minimal set of files already unzipped.

@utf
Copy link
Member

utf commented Aug 15, 2023

Otherwise I can add a test folder with a minimal set of files already unzipped.

I think this would be fine. Might be best to just touch the files, as their contents don't actually matter.

@codecov
Copy link

codecov bot commented Aug 15, 2023

Codecov Report

Merging #414 (6d3ad25) into main (18d014a) will decrease coverage by 0.08%.
Report is 25 commits behind head on main.
The diff coverage is 85.71%.

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #414      +/-   ##
==========================================
- Coverage   65.71%   65.64%   -0.08%     
==========================================
  Files          78       78              
  Lines        7462     7495      +33     
  Branches      964      975      +11     
==========================================
+ Hits         4904     4920      +16     
- Misses       2255     2270      +15     
- Partials      303      305       +2     
Files Changed Coverage Δ
src/atomate2/cp2k/jobs/base.py 89.39% <62.50%> (-3.94%) ⬇️
src/atomate2/common/files.py 80.19% <80.00%> (+5.19%) ⬆️
src/atomate2/lobster/jobs.py 94.44% <100.00%> (+0.32%) ⬆️
src/atomate2/settings.py 95.74% <100.00%> (+0.29%) ⬆️
src/atomate2/vasp/jobs/base.py 96.55% <100.00%> (+0.18%) ⬆️

... and 4 files with indirect coverage changes

@gpetretto
Copy link
Contributor Author

I think this would be fine. Might be best to just touch the files, as their contents don't actually matter.

I don't think I can do that while making a standard test, since the mocked execution still parses the outputs. Is there already a way to bypass the parsing? A minimal working test seems to need at least real vasprun.xml, OUTCAR and CONTCAR files.

Otherwise an overall easier way would be to move this portion of the code:

# gzip folder
if SETTINGS.VASP_ZIP_FILES == "atomate":
gzip_files(include_files=_FILES_TO_ZIP, allow_missing=True, force=True)
elif SETTINGS.VASP_ZIP_FILES:
gzip_files(force=True)

to a zip_vasp_files function and test it on its own.

@utf
Copy link
Member

utf commented Aug 18, 2023

I see. I'm not sure of the best way to do it. Unless we add some additional logic to the tests for specifically checking the gzip status.

@gpetretto
Copy link
Contributor Author

I tried to move the logic to a common function and test only its behavior.
I also added the same option for Lobster and CP2K.
For Lobster I followed @JaGeo's suggestion, but maybe it is better if she checks this.
For CP2K I only added cp2k.in and cp2k.out in the list of files to zip. However, since I have little knowledge of CP2K and I don't know if there are other large files that need to be zipped in the folder, I left the default of the setting CP2K_ZIP_FILES to True. Is there anyone that should be pinged and could provide a better list of files for CP2K?

@utf
Copy link
Member

utf commented Aug 31, 2023

I will merge this for now. Thanks very much @gpetretto

@utf utf merged commit 71f05e4 into materialsproject:main Aug 31, 2023
7 checks passed
@utf utf added the enhancement Improvements to existing features label Sep 1, 2023
@gpetretto gpetretto mentioned this pull request Sep 1, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement Improvements to existing features
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants