Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bugs writing both normal and tar output #157

Closed
spencerahill opened this issue Mar 24, 2017 · 14 comments
Closed

Bugs writing both normal and tar output #157

spencerahill opened this issue Mar 24, 2017 · 14 comments

Comments

@spencerahill
Copy link
Owner

C.f. #155 (comment) and subsequent comments in that thread.

Most relevantly:

workaround is to use the engine='scipy' keyword argument in the call to to_netcdf

@spencerahill
Copy link
Owner Author

Having used the engine='scipy' fix locally, I'm still getting the following error when I try to submit calculations for a second time:

INFO:root:Writing desired gridded outputs to disk.
tar: Option --delete is not supported
Usage:
  List:    tar -tf <archive-filename>
  Extract: tar -xf <archive-filename>
  Create:  tar -cf <archive-filename> [filenames...]
  Help:    tar --help

@spencerahill spencerahill changed the title OSError coming from upstream xarray.to_netcdf Bugs writing both normal and tar output Mar 24, 2017
@spencerkclark
Copy link
Collaborator

Strange, it works for me...which versions of scipy / xarray are you using?

I'm using Python 3.6 on a GFDL workstation:

In [1]: import xarray as xr

In [2]: import scipy

In [3]: xr.__version__
Out[3]: '0.9.1'

In [4]: scipy.__version__
Out[4]: '0.18.1

@spencerahill
Copy link
Owner Author

Hmm, I have identical versions. It may not be a scipy issue, it could be the tar IO itself. Looking further into now...

If you have your mac handy, would be curious if you can replicate there

@spencerahill
Copy link
Owner Author

Ahh it has to do with how we are calling tar with the "--delete" flag: https://github.com/spencerahill/aospy/blob/develop/aospy/calc.py#L687

I'm guessing that either (1) the "--delete" keyword probably isn't universal to tar versions, and/or (2) the behavior is different on actual tape archives as at GFDL vs. more standard filesystems.

Good to rewrite this anyways; I don't like that we're using subprocess.call, seems like we should be able to accomplish what we want entirely through the builtin tarfile package

@spencerahill
Copy link
Owner Author

Will return to this after finishing #155

@spencerkclark
Copy link
Collaborator

Ahh it has to do with how we are calling tar with the "--delete" flag

Good find! So it seems my MWE and this are separate issues. We may still need to use engine='scipy' when overwriting regional calculations; I'll do some more testing to confirm.

@spencerahill
Copy link
Owner Author

We may still need to use engine='scipy' when overwriting regional calculations

That's fine by me

@spencerahill
Copy link
Owner Author

I can confirm that the issue is that "--delete" isn't supported by every tar version. I installed a newer GNU tar via homebrew and this goes away. Version info:

$ /usr/bin/tar --version
bsdtar 2.8.3 - libarchive 2.8.3

[tar-output-bugfix ?]shill@vpn-248-64:~/Dropbox/py/aospy/examples$ which tar
/usr/local/opt/gnu-tar/libexec/gnubin/tar

[tar-output-bugfix ?]shill@vpn-248-64:~/Dropbox/py/aospy/examples$ tar --version
tar (GNU tar) 1.29
Copyright (C) 2015 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>.
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.

Written by John Gilmore and Jay Fenlason.

Also, from what I can gather the only alternative I can find to using the '--delete' flag is to copy the entire tar archive, except for that file, overwriting the old tar archive with the new one, and then adding the updated netCDF file to the new tar archive. Since our tar archives can get pretty large, this seems undesirable to me.

Maybe we should add a test for this, wherein we submit the calculations two times.

One option is to just have a warning in the docs that this may happen and, if so, you should update your tar version. Still thinking through this all.

@spencerahill
Copy link
Owner Author

Maybe we should add a test for this, wherein we submit the calculations two times.

This is sort-of handled by test_submit_mult_calcs, since it runs twice, once with and without parallelize. But because we're calling via subprocess.call and not catching the stdout/stderr, these errors don't get caught.

So if we stay w/ subprocess.open, we need to actually catch the stderr and use it to raise if the write fails.

@spencerahill
Copy link
Owner Author

alternative I can find to using the '--delete' flag is to copy the entire tar archive, except for that file, overwriting the old tar archive with the new one, and then adding the updated netCDF file to the new tar archive

Looking more into this, this would be messy enough that I don't want to go this route. So I'm leaning towards the following:

  1. Note in the docs the potential for this error (especially if you're on a Mac), encouragement to fix by updating tar (via homebrew), or, if they can't/don't want to update, turn off the tar writing step.
  2. Actually implement making the tar output optional. It is within the Calc.compute and subsequent calls, but not in any way that's directly available to the user. This simply entails adding an e.g. write_to_tar kwarg to CalcInterface that ultimately gets fed to compute.

The only remaining question I have is whether to catch the tar error ourselves, and if so what to do with it. subprocess.call actually returns the exit code, so we could easily catch non-zero values. Or we could substitute with subprocess.check_call, which raises a CalledProcessError. But then what do we do? Maybe turn off tar writes for the rest of that calc? Replace/extend tar's error message with a more informative one of our own? But is that worth it? If we do that, we'd have to write tests to catch it, which could be the hardest part.

@spencerkclark let me know your thoughts on this when you get a chance. Right now I'm leaning towards just doing steps 1 and 2 for now, but I'm receptive to other arguments.

@spencerkclark
Copy link
Collaborator

@spencerahill can you give me a minimal working example to show that one cannot overwrite files using the 'a' option? I'm having trouble reproducing that behavior on my Mac:

import os
import tarfile

_FILE_ROOT = 'tar-tests/raw-files/'
_TAR_ROOT = 'tar-tests/'

def create_file(name, contents):
    with open(os.path.join(_FILE_ROOT, name), 'w') as f:
        f.write(contents)

create_file('a.txt', 'test a')
create_file('b.txt', 'test b')

# Create a tar file with a.txt and b.txt
with tarfile.open(os.path.join(_TAR_ROOT, 'test.tar'), 'a') as tar:
    tar.add(os.path.join(_FILE_ROOT, 'a.txt'), arcname='a.txt')
    tar.add(os.path.join(_FILE_ROOT, 'b.txt'), arcname='b.txt')

# Overwrite b.txt
create_file('b.txt', 'test c')

# Update the b.txt entry of the tar file
with tarfile.open(os.path.join(_TAR_ROOT, 'test.tar'), 'a') as tar:
    tar.add(os.path.join(_FILE_ROOT, 'b.txt'), arcname='b.txt')

Is this something weird to do with the archive filesystem at GFDL?

@spencerahill
Copy link
Owner Author

spencerahill commented Mar 27, 2017

The problem is that "overwriting" a file within a .tar archive just adds a new copy of it; the old one remains:

shill@vpn-248-64:~$ echo "123" > file1.txt
shill@vpn-248-64:~$ echo "456" > file2.txt
shill@vpn-248-64:~$ tar -cvf data.tar file*.txt
file1.txt
file2.txt
shill@vpn-248-64:~$ tar -tvf data.tar
-rw-r--r-- shill/staff       4 2017-03-26 22:24 file1.txt
-rw-r--r-- shill/staff       4 2017-03-26 22:24 file2.txt
shill@vpn-248-64:~$ echo "new 456" > file2.txt
shill@vpn-248-64:~$ tar --append --file=data.tar file2.txt
shill@vpn-248-64:~$ tar -tvf data.tar
-rw-r--r-- shill/staff       4 2017-03-26 22:24 file1.txt
-rw-r--r-- shill/staff       4 2017-03-26 22:24 file2.txt
-rw-r--r-- shill/staff       8 2017-03-26 22:26 file2.txt

tar uses the timestamps to always grab the newer version if there are multiple copies of the same file.

I don't think this is a good idea for aospy, because it's quite common that I re-do calculations that I've done before. Over time, this would lead to a huge number of extra, obsolete files in the .tar archive and potentially the .tar archive becoming really large.

I haven't tried this using Python's tarfile, however. The same occurs when using Python's tarfile module rather than doing it directly from the command line. From what I could gather in googling on this, this is inherent to the way tar works and actually dates all the way back to its original purpose for I/O on physical tape archives (Tape ARchives -> tar).

Of course I hope I'm wrong and was overthinking this!

@spencerahill
Copy link
Owner Author

  1. Note in the docs the potential for this error (especially if you're on a Mac), encouragement to fix by updating tar (via homebrew), or, if they can't/don't want to update, turn off the tar writing step.
  1. Actually implement making the tar output optional. It is within the Calc.compute and subsequent calls, but not in any way that's directly available to the user. This simply entails adding an e.g. write_to_tar kwarg to CalcInterface that ultimately gets fed to compute.

This is the route I'll go on this; trying to catch and handle the error seems like overkill for now. Working on this now.

@spencerkclark
Copy link
Collaborator

I agree; this seem like the best way to go.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants