Bugs writing both normal and tar output #157
Having used the
|
Strange, it works for me... Which versions of scipy / xarray are you using? I'm using Python 3.6 on a GFDL workstation:
|
Hmm, I have identical versions. It may not be a scipy issue; it could be the tar I/O itself. Looking further into it now... If you have your Mac handy, I'd be curious whether you can replicate it there |
Ahh, it has to do with how we are calling tar with the "--delete" flag (see https://github.com/spencerahill/aospy/blob/develop/aospy/calc.py#L687). I'm guessing that either (1) the "--delete" flag isn't supported by all tar versions, and/or (2) the behavior differs on actual tape archives, as at GFDL, versus more standard filesystems. It's good to rewrite this anyway; I don't like that we're using subprocess.call. It seems like we should be able to accomplish what we want entirely through the builtin tarfile package |
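One way to probe hypothesis (1) before shelling out to tar is to check which tar is installed. The sketch below is a heuristic, not something aospy currently does: it assumes that GNU tar, which documents "--delete", identifies itself with "GNU tar" in its "--version" output, and it treats anything else (for example BSD tar, the default on macOS) as lacking "--delete". The name tar_supports_delete is hypothetical.

```python
import shutil
import subprocess


def tar_supports_delete(tar_cmd="tar"):
    """Heuristically decide whether `tar_cmd` supports --delete.

    Assumption: only GNU tar is treated as supporting --delete, and it is
    recognized by the string "GNU tar" in the output of `tar --version`.
    """
    path = shutil.which(tar_cmd)
    if path is None:
        return False  # no such executable on PATH
    try:
        result = subprocess.run(
            [path, "--version"],
            stdout=subprocess.PIPE,
            stderr=subprocess.PIPE,
            universal_newlines=True,
        )
    except OSError:
        return False  # executable exists but could not be run
    return result.returncode == 0 and "GNU tar" in result.stdout


if __name__ == "__main__":
    print("tar supports --delete:", tar_supports_delete())
```

A check like this could be used to warn the user up front rather than letting the later tar call fail.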
Will return to this after finishing #155 |
Good find! So it seems my MWE and this are separate issues. We may still need to use |
That's fine by me |
I can confirm that the issue is that "--delete" isn't supported by every tar version. I installed a newer GNU tar via
Also, from what I can gather, the only alternative to the '--delete' flag is to copy the entire tar archive except for that file, overwrite the old tar archive with the new one, and then add the updated netCDF file to the new archive. Since our tar archives can get pretty large, this seems undesirable to me. Maybe we should add a test for this, in which we submit the calculations twice. Another option is to just add a warning to the docs that this may happen and that, if it does, you should update your tar version. Still thinking through all this. |
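For reference, the copy-based alternative described above might look roughly like this with the builtin tarfile module. This is a sketch under assumptions: replace_member is a hypothetical helper (not part of aospy), the archive is uncompressed, and the member is matched by its exact archive name. As noted above, the cost scales with the size of the whole archive, because every other member is rewritten.

```python
import os
import shutil
import tarfile
import tempfile


def replace_member(tar_path, arcname, new_file):
    """Rewrite `tar_path` so it holds exactly one copy of `arcname`,
    taken from `new_file`; all other members are copied unchanged.

    Hypothetical helper, not aospy's API.
    """
    tar_dir = os.path.dirname(os.path.abspath(tar_path))
    # Build the new archive next to the old one so os.replace stays on one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=tar_dir, suffix=".tar")
    os.close(fd)
    try:
        with tarfile.open(tar_path, "r") as src, tarfile.open(tmp_path, "w") as dst:
            for member in src.getmembers():
                if member.name == arcname:
                    continue  # drop every existing copy of the target
                fileobj = src.extractfile(member) if member.isfile() else None
                dst.addfile(member, fileobj)
            dst.add(new_file, arcname=arcname)
        shutil.copymode(tar_path, tmp_path)  # keep the original archive's permissions
        os.replace(tmp_path, tar_path)  # swap the rewritten archive into place
    except BaseException:
        os.remove(tmp_path)  # clean up the partial archive on any failure
        raise
```

For example, replace_member('output.tar', 'result.nc', 'result.nc') (hypothetical file names) would leave a single, updated result.nc in output.tar.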
This is sort of handled by test_submit_mult_calcs, since it runs twice, once with and once without parallelize. But because we're calling via ... So if we stay with subprocess, we need to actually capture stderr and use it to raise an error if the write fails. |
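If we do stay with subprocess, a minimal sketch of surfacing tar's error might look like the following. It assumes Python 3.5+ (for subprocess.run) and that a tar lacking "--delete" exits with a non-zero status; run_checked and delete_from_tar are hypothetical names, not existing aospy functions.

```python
import subprocess


def run_checked(cmd):
    """Run `cmd` (a list of arguments) and raise if it exits non-zero.

    Unlike subprocess.call, which only returns the exit code, this
    captures stderr so the tool's own error message reaches the user.
    """
    result = subprocess.run(
        cmd,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,
    )
    if result.returncode != 0:
        raise RuntimeError(
            "command {!r} failed with exit code {}: {}".format(
                cmd, result.returncode, result.stderr.strip()
            )
        )
    return result.stdout


def delete_from_tar(tar_path, arcname):
    """Hypothetical wrapper: remove `arcname` from `tar_path` via GNU tar.

    With a tar that rejects --delete, this raises instead of the failure
    going unnoticed.
    """
    run_checked(["tar", "--delete", "--file", tar_path, arcname])
```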
After looking into this more, it would be messy enough that I don't want to go this route. So I'm leaning towards the following:
The only remaining question I have is whether to catch the tar error ourselves and, if so, what to do with it. @spencerkclark, let me know your thoughts on this when you get a chance. Right now I'm leaning towards just doing steps 1 and 2, but I'm receptive to other arguments. |
@spencerahill can you give me a minimal working example to show that one cannot overwrite files using the ...

```python
import os
import tarfile

_FILE_ROOT = 'tar-tests/raw-files/'
_TAR_ROOT = 'tar-tests/'

# Make sure the working directories exist before writing into them
os.makedirs(_FILE_ROOT, exist_ok=True)


def create_file(name, contents):
    """Write `contents` to a text file under _FILE_ROOT."""
    with open(os.path.join(_FILE_ROOT, name), 'w') as f:
        f.write(contents)


create_file('a.txt', 'test a')
create_file('b.txt', 'test b')

# Create a tar file with a.txt and b.txt
with tarfile.open(os.path.join(_TAR_ROOT, 'test.tar'), 'a') as tar:
    tar.add(os.path.join(_FILE_ROOT, 'a.txt'), arcname='a.txt')
    tar.add(os.path.join(_FILE_ROOT, 'b.txt'), arcname='b.txt')

# Overwrite b.txt
create_file('b.txt', 'test c')

# Update the b.txt entry of the tar file
with tarfile.open(os.path.join(_TAR_ROOT, 'test.tar'), 'a') as tar:
    tar.add(os.path.join(_FILE_ROOT, 'b.txt'), arcname='b.txt')
```

Is this something weird to do with the archive filesystem at GFDL? |
The problem is that "overwriting" a file within a .tar archive just adds a new copy of it; the old one remains:
When there are multiple copies of the same file, tar always uses the newer (most recently appended) one. I don't think this is a good idea for aospy, because I quite commonly redo calculations I've done before. Over time, this would lead to a huge number of extra, obsolete files in the .tar archive, potentially making the archive very large.
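The duplication can be reproduced with the builtin tarfile module alone. This minimal sketch writes in-memory contents (via a hypothetical add_bytes helper) instead of files on disk, but otherwise mirrors the earlier example: appending b.txt a second time leaves two b.txt members, and tarfile's getmember() returns the last one.

```python
import io
import os
import tarfile
import tempfile


def add_bytes(tar, arcname, data):
    """Hypothetical helper: append `data` to an open archive as `arcname`."""
    info = tarfile.TarInfo(name=arcname)
    info.size = len(data)
    tar.addfile(info, io.BytesIO(data))


tar_path = os.path.join(tempfile.mkdtemp(), "test.tar")

# Mode 'a' creates the archive if it does not exist yet.
with tarfile.open(tar_path, "a") as tar:
    add_bytes(tar, "a.txt", b"test a")
    add_bytes(tar, "b.txt", b"test b")

# "Overwrite" b.txt: append mode adds a second b.txt member.
with tarfile.open(tar_path, "a") as tar:
    add_bytes(tar, "b.txt", b"test c")

with tarfile.open(tar_path, "r") as tar:
    names = tar.getnames()
    # getmember() returns the last occurrence of a duplicated name.
    latest = tar.extractfile(tar.getmember("b.txt")).read()

print(names)   # ['a.txt', 'b.txt', 'b.txt']
print(latest)  # b'test c'
```

The old "test b" bytes are still stored in the archive, which is exactly the growth problem described above.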
Of course I hope I'm wrong and was overthinking this! |
This is the route I'll take; trying to catch and handle the error seems like overkill for now. Working on it now. |
I agree; this seems like the best way to go.
Cf. #155 (comment) and subsequent comments in that thread.
Most relevantly: