Currently, when a file is downloaded using get(), a host prefix is added to the downloaded file's name, but only when using multiple hosts. This also means your code always has to re-check the name of the downloaded file. I found this both unnecessary and error-prone, and therefore suggest:
get(remote, local, host_prefix=True)
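As a rough sketch of the proposed flag, assuming host_prefix=True keeps the current behavior (tag the name only when running against multiple hosts) and host_prefix=False leaves the name alone. local_name is a hypothetical helper, not part of Fabric, and the filename.txt.<hostname> form is the host-suffix naming referred to later in this thread:

```python
def local_name(local, host, multiple_hosts, host_prefix=True):
    """Hypothetical helper: decide the final local filename for one host.

    Assumption: host_prefix=True keeps the existing behavior (tag the file
    with the host only on multi-host runs); host_prefix=False never tags it.
    """
    if host_prefix and multiple_hosts:
        # Host-suffix naming, e.g. filename.txt.<hostname>
        return "%s.%s" % (local, host)
    # Single host, or tagging explicitly disabled: use the name as given.
    return local
```

With host_prefix=False the caller always knows the resulting name, which is the point of the proposal.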
Here's what I (Jeff) see as the primary use cases for get that we need to consider in our solution, along with what I think the default local results should be. Note: trailing slashes are used for readability and are not required, and all local paths are relative for brevity. Multi-file examples show Erich's current host-suffix-based implementation first, with a host-directory-based one in parentheses.
There are other, more extreme use cases that should probably be left to the user to deal with, such as multiple files with identical basenames, or multiple runs of the same task in a row, both of which will clobber files to varying extents. Exposing a parameterized string is probably a good way to handle this, as per Max's comment and other supporting commentary on IRC.
Originally submitted by **** (jmu) on 2009-11-04 at 01:49pm EST
Closed as Done on 2011-02-15 at 01:06am EST
Max Arnold (LwarX) posted:
I think the host prefix should be completely optional and customizable. Probably this can be done with string interpolation:
To do this, Fabric could populate a special dictionary with several variables for each get run ('filename', 'host', etc.).
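A minimal sketch of that idea, assuming only the two variables named above ('host' and 'filename'). interpolate_local_path is a hypothetical helper, and remote paths are treated as POSIX paths:

```python
import posixpath


def interpolate_local_path(template, remote_path, host):
    """Hypothetical helper: expand a user-supplied template for one get run.

    Only 'host' and 'filename' are modeled; any other variable would be an
    addition to what was proposed.
    """
    variables = {
        "host": host,
        # Remote paths use forward slashes regardless of the local OS.
        "filename": posixpath.basename(remote_path),
    }
    return template % variables
```

Because the template decides whether the host appears at all, a plain '%(filename)s' drops the host tagging entirely, while '%(host)s-%(filename)s' turns /var/log/syslog on web1 into web1-syslog.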
on 2010-08-16 at 03:21am EDT
Jeff Forcier (bitprophet) posted:
Marking this for 1.0; it should be considered along with #140 and friends. Note that the implementation Erich wrote for #140 does not address this problem, other than making sure that the existing behavior (create per-host files if running on more than one host) applies cleanly to folders as well as files.
Thus we still have to figure out how this problem is best solved, and make sure we preserve the folder/recursive behavior too.
on 2010-09-17 at 03:29pm EDT
So speaking specifically about this ticket, I'm seeing the following solutions:
Out of all these, I actually think the first one is possibly the best, all things considered; unless somebody has objections I haven't thought of yet (quite possible). Having recursive behavior definitely complicates things (insofar as you will sometimes end up with local files named <hostname>/filename.txt and other times filename.txt.<hostname>) but I don't see any easy way around that part of things.
on 2010-09-17 at 04:17pm EDT
Erich Heine (sophacles) posted:
I think that in all cases, a directory should be made for each host in the target directory.
so with the fabfile:
$ fab getfiles
It is simple, consistent and easily scriptable for later stuff.
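Erich's fabfile and its output didn't survive here, but the layout he describes might be sketched like this. per_host_path is hypothetical, and it assumes the per-host directory is named after the host string as-is:

```python
import os
import posixpath


def per_host_path(target_dir, host, remote_path):
    """Hypothetical helper: always <target_dir>/<host>/<basename>,
    whether the task runs against one host or many."""
    return os.path.join(target_dir, host, posixpath.basename(remote_path))
```

Looping over ["web1", "web2"] with "/etc/hosts" gives downloads/web1/hosts and downloads/web2/hosts, so the same naming rule holds for single- and multi-host runs.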
I also think some customization should be allowed, in the form of the directory name, with a way to look up a representation on a per-host basis. This last bit, though, could probably wait until host objects are done.
on 2010-09-17 at 05:33pm EDT
I do like the "just always use per-host directories" approach as it could simplify things (basically a cleaner version of option 1 above).
I also wonder if it might not be even better to make the "target" dir/filename optional, and default to using the remote file path. E.g. connecting to fabfile.org as jforcier and calling get('TODO.txt') would result in a local file/folder structure of $CWD/fabfile.org/home/jforcier/TODO.txt. And get('/var/www/.htaccess') would result in $CWD/fabfile.org/var/www/.htaccess. Etc.
I'd like to think this approach would solve any/all collision problems, at least across the dimensions of multiple hosts or multiple files with the same basename (e.g. get('/home/jforcier/.bashrc') ; get('/home/deploy/.bashrc') would result in file overwriting under any other implementation).
It introduces one minor extra complexity on top of the overall "what's my local filename?" issue that all solutions have, namely the implicit remote $HOME in relative remote paths. However I think this is small enough that the benefits of this approach outweigh it.
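The mapping described above could look roughly like this. mirrored_local_path is a hypothetical helper, and it assumes the remote $HOME is already known (passed in as remote_home) rather than queried from the server:

```python
import os
import posixpath


def mirrored_local_path(host, remote_path, remote_home, local_root="."):
    """Hypothetical helper: map a remote path onto
    <local_root>/<host>/<absolute remote path>.

    Relative remote paths are resolved against remote_home, which stands in
    for the implicit remote $HOME.
    """
    if not posixpath.isabs(remote_path):
        remote_path = posixpath.join(remote_home, remote_path)
    remote_path = posixpath.normpath(remote_path)
    # Drop the leading "/" so os.path.join doesn't discard local_root/host.
    parts = remote_path.lstrip("/").split("/")
    return os.path.join(local_root, host, *parts)
```

Under this scheme get('TODO.txt') as jforcier lands in ./fabfile.org/home/jforcier/TODO.txt, and the two .bashrc files from the example above map to different local paths instead of overwriting each other.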
Unfortunately, we'd almost definitely still have to allow for users manually specifying the target local path, which means we still have to handle all the messy file-vs-directory crap anyways. Ugh. Will have to think on this some more, or maybe we ought to approach this from the use-case angle.
on 2010-09-19 at 01:41pm EDT
After updating the description and chatting on IRC, I think we should use a combined approach that focuses on the format-string solution.
We should use the following components:
This format string applies to each resulting individual file; it doesn't matter whether recursion is used. This threw me for a conceptual loop earlier on; I didn't see how this could work in tandem with the recursive behavior. If you just think of remote_dir/subdir/ as expanding into a list of, e.g., ['remote_dir/subdir/file1.txt', 'remote_dir/subdir/subdir2/file2.txt', ...], then it makes sense.
The default format string should be %(host)s/%(filename)s, so that single-host and multi-host runs share the same default. We should also return a list of the final file paths; a nested structure is too complicated, but we might as well return something instead of nothing.
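A sketch of applying the format string per file, assuming the remote directory has already been expanded into a flat list of file paths (the remote walk itself isn't shown) and using only the 'host' and 'filename' variables. expand_local_paths and DEFAULT_FORMAT are hypothetical names:

```python
import posixpath

# Proposed default: same layout whether one host or many.
DEFAULT_FORMAT = "%(host)s/%(filename)s"


def expand_local_paths(host, remote_files, fmt=DEFAULT_FORMAT):
    """Hypothetical helper: apply fmt to each individual remote file and
    return the list of resulting local paths."""
    results = []
    for remote in remote_files:
        variables = {"host": host, "filename": posixpath.basename(remote)}
        results.append(fmt % variables)
    return results
```

Note that with %(filename)s alone, files from different remote subdirectories end up side by side under the host directory.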
Note that this means the only required argument would now be the source/remote filename.
The only problem I see with this approach is that of backwards compatibility, and/or users simply giving a non-format string in the 2nd argument position. The only solution to that I can see is to use the following heuristics:
on 2010-09-20 at 04:55pm EDT
I think the bullet points above are too much effort given the tradeoff. Instead, I think this is what I'll do:
The benefits of this are A) it's less overall work, since we're simply dropping one old chunk of code and adding one unrelated new one, and B) it's highly backwards compatible for all the more common use cases (including, to be honest, the way we use it in our test suite).
on 2011-01-19 at 01:21pm EST
OK, I think I'm pretty much done. All the older tests (as in, the ones put in prior to this ticket's work) still pass, except for a few that no longer apply and were removed; some were mutated to fit the new behavior; and another one or two were added.
Merging into the put-get integration branch and continuing...
on 2011-01-21 at 04:15pm EST
Found a case not tested which is, naturally, broken -- using format strings plus recursion. Derp.
Think I may end up merging sftp.py's get/get_dir split, but have to look at it a bit more.
on 2011-01-23 at 11:01am EST
The core problem with this edge case (applying format strings with recursive=True) seems to be that it requires different behavior from what has already been implemented regarding how local_path is treated in the simple (no format strings) case:
get('/path/to/remote_directory/*', 'local_existing_dir', recursive=False)
However! When getting recursively, the expectation is that remote_path and its subdirectories will be created locally, mirroring the remote end. Thus, given a remote filesystem containing remote_directory/subdir/file1, calling get('remote_directory', 'local_directory', recursive=True) would not write out local_directory/file1, but local_directory/remote_directory/subdir/file1 instead. So get() needs to tack the remote directory hierarchy onto the end of local_path.
In the above bullet-point cases, this is not a problem, because we're only getting the "leaf" file: even if one calls get('big/long/hierarchy/file.txt', 'local_dir'), we still only write out local_dir/file.txt. This isn't feasible for the recursive case -- flattening just won't work -- thus the recreation of the remote hierarchy.
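The contrast between the two cases can be sketched as two hypothetical helpers: one flattens to the leaf file, the other recreates the hierarchy (including the remote directory's own name, as in the example above) under local_path. Neither is Fabric's actual code:

```python
import os
import posixpath


def flat_target(remote_file, local_dir):
    # Non-recursive: only the leaf file is written into local_dir.
    return os.path.join(local_dir, posixpath.basename(remote_file))


def recursive_target(remote_root, remote_file, local_dir):
    """Recursive: recreate remote_root itself plus everything below it
    underneath local_dir."""
    parent = posixpath.dirname(remote_root.rstrip("/")) or "."
    rel = posixpath.relpath(remote_file, parent)  # remote_directory/subdir/file1
    return os.path.join(local_dir, *rel.split("/"))
```

flat_target('big/long/hierarchy/file.txt', 'local_dir') yields local_dir/file.txt, while recursive_target('remote_directory', 'remote_directory/subdir/file1', 'local_directory') yields local_directory/remote_directory/subdir/file1.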
So! What happens if local_path contains format strings such as %(dirname)s or %(path)s? The simple cases work fine, because they will still map to a single local file path. But in recursion, appending the remote directory structure would result in local paths like %(dirname)s/remote_directory/subdir/file1. Expand this and it becomes remote_directory/subdir/remote_directory/subdir/file1 -- which is incorrect.
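The doubling can be reproduced with a standalone sketch. naive_recursive_target is hypothetical, models only %(dirname)s and %(path)s, and is not Fabric's actual code:

```python
import posixpath


def naive_recursive_target(local_path, remote_root, remote_file):
    """Hypothetical reproduction: append the remote hierarchy the way the
    plain recursive case does, then expand any format strings."""
    parent = posixpath.dirname(remote_root.rstrip("/")) or "."
    rel = posixpath.relpath(remote_file, parent)  # remote_directory/subdir/file1
    with_hierarchy = posixpath.join(local_path, rel)
    # If local_path already used %(dirname)s or %(path)s, the hierarchy
    # now appears twice once the template is expanded.
    return with_hierarchy % {
        "dirname": posixpath.dirname(rel),
        "path": rel,
    }
```

With a plain 'local_directory' this gives the expected local_directory/remote_directory/subdir/file1, but with '%(dirname)s' it gives remote_directory/subdir/remote_directory/subdir/file1.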
I don't have a ready answer for this offhand, but it took a while just to get a grasp on why this was such a problem, so I wanted to write it down here. Will have to think a bit. Suggestions welcome.
on 2011-02-13 at 10:10pm EST
on 2011-02-13 at 10:29pm EST
Went with the above, plus a handful of other changes I noticed while implementing/testing.
Most notably, I made it behave like scp, in that %(dirname)s and %(path)s reflect the relative remote path instead of the absolute one. I.e. get('/var/log/*', '%(path)s', recursive=True) was previously creating $CWD/var/log/apache2/access.log and so forth; it should cut things off after /var/log and create $CWD/apache2/access.log etc.
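One way to compute the scp-like relative path is sketched below, assuming the cut point is the last directory before the first path component containing a glob character (or the parent of a plain directory, as with scp -r). glob_base and scp_style_path are hypothetical helpers, not Fabric's implementation:

```python
import posixpath

GLOB_CHARS = set("*?[")


def glob_base(remote_pattern):
    """Longest leading directory whose components contain no glob characters."""
    base = []
    for part in remote_pattern.split("/")[:-1]:
        if GLOB_CHARS & set(part):
            break
        base.append(part)
    joined = "/".join(base)
    if joined:
        return joined
    return "/" if remote_pattern.startswith("/") else "."


def scp_style_path(remote_pattern, remote_file):
    # What %(path)s would hold: remote_file relative to the pattern's base.
    return posixpath.relpath(remote_file, glob_base(remote_pattern))
```

For the example above, glob_base('/var/log/*') is /var/log, so /var/log/apache2/access.log becomes apache2/access.log and '%(path)s' expands to that relative path under $CWD.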
After all this, the code is pretty messy and the docstring is absolutely gargantuan; I'm unhappy with how this has all turned out. However, it's definitely not worth holding off the 1.0 release to make this "perfect". And barring any more unfound bugs, it's much more powerful/flexible than the 0.9 incarnation, while (I think) still almost as usable in the base case, and almost entirely backwards compatible to boot.
Marking this done for reals this time, can reopen if any prerelease sprints/testing finds more bugs.
on 2011-02-15 at 01:06am EST