Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Group.get needs to allow parameterized local filenames to avoid overwriting #1868

Closed
cooperlees opened this issue Sep 5, 2018 · 9 comments
Closed

Comments

@cooperlees
Copy link

Hi,

I'd like to be able to .get() on a ThreadingGroup and have the hostname/ip/somethingunique added to the dst filename so I can get the same file from multiple hosts.

Sorry if I missed it, but I can not see a clean way to move to:

ThreadingGroup(*hosts, **DEFAULT_FAB_KWARGS).get(self._src, self._dst)

from

        ...
        with fab.settings(fab.hide("running"), warn_only=True):
            fab.execute(self._get, hosts=hosts)

    @fab.parallel
    def _get(self):
        fab.get(self._src, self._dst + "." + fab.env.host)

Please merge or propose workaround if they exist! Thanks.

@bitprophet
Copy link
Member

I'm hoping this is in the upgrade doc because yes, it's a clear missing feature from v1 :)

EDIT: it is not in that doc! I'll add a line item and I think this may be the "TODO" ticket for this particular aspect of the feature so I'll tweak the desc.

@bitprophet bitprophet changed the title ThreadingGroup + get() with host specific dst filename Group.get needs to allow parameterized local filenames to avoid overwriting Sep 6, 2018
@bitprophet
Copy link
Member

We'll probably want to honor how v1 did this (which I was just in the middle of reinventing myself before I edited this comment...sigh) which is tl;dr an interpolated string config value, handed a bunch of values for parameterization (remote basename and pathname, connection object params, etc).

Main diff would be that we should hand in just a connection key/val by itself instead of manually teasing out useful Connection obj attributes ourselves as individual keys (i.e. no "{host}", but instead, "{connection.host}").

@bitprophet bitprophet added this to the p3 milestone Sep 6, 2018
@cooperlees
Copy link
Author

Thanks. I checked the document multiple times for this feature and couldn't find it anywhere. Many thanks for acknowledging and updating the PR + docs.

I'm not fussed, long as there is a nice way with the new API to use it I'll be super happy. Also happy to try and help, but prob getting up to speed might take a bit of time.

@bitprophet
Copy link
Member

Linking this to #1810 (Group.get/put) where it obviously becomes pretty necessary. Will see what shakes out. I'd prefer to solve it generically but may end up having to solve it for Group specifically first.

@bitprophet
Copy link
Member

bitprophet commented Jan 4, 2021

Thinking out loud. v1's behavior:

  • interpolation is a mix of what in v2 would be Connection properties (host string, as a single mashed up thing) and remote file properties (dirname, basename and full path)
    • I do not recall offhand how much of the latter is trivially available in Paramiko and how much is being done on Fabric's end
  • local path defaults to <host-string>/<remote-path>, so eg saying get('/var/log/foo.log') on a host downloads to remotehostname/var/log/foo.log locally
    • This is true even for a single connection, i.e., it is more complex than one might assume for simple cases, in order to not change behavior for complex ones (or eg to overwrite the same file N times, which would always be irritating/surprising)

My gut says that for an initial port to v2, I want to do a MVP and not have it take a bunch of extra time to be 100%, if we can just add the remainder later on in compatible fashion. We also want to take the opportunity to revisit the above decisions.

  • Currently, our Transfer.get defaults to the equivalent of "just the remote filename in local cwd", so we cannot reasonably change that without breaking current compat.
    • Though we could at least make Group.get (from Implement put/get for Group #1810) do it by default, as that is new.
    • Long term (eg 3.x) it seems reasonable to make even Connection/Transfer.get do the "default to including remote hostname in local path" thing?
      • but I guess we'll see, if nobody clamours for it during the lifetime of 2.x...maybe it was an overthinking?
    • in the ideal, we want Group to not be adding a ton of special new behavior vs Connection; users who want to eschew Group for their own iteration/concurrency wrapped around Connection should not have to reimplement a bunch of Group behaviors
  • Giving the interpolation the connection instance seems obvious, one can just do eg "{host}/foo.log" for hostname
    • Working in remote path bits is likely just an implementation detail one way or another
    • Arguably we want to be zen of python and make the user explicitly say "{connection.host}/{file.basename}" or whatnot, but I'm not sure that's worth the hassle for the average use case...? will see how it plays out in tests
  • I am not a huge fan of how the original leaned on host strings as that meant the local directories would differ in format depending on how specific the connection was (eg remotehost vs user@remotehost vs remotehost-port vs etc)
    • But on the flip side, always having the normalized version is annoyingly verbose
    • I suspect defaulting to just the host is plenty sufficient; it is by far the primary differentiator in reality, and anybody with offspec needs (eg hitting "the same host" on different ports for reasons) can just supply an explicit local path argument with the interpolation they need.
  • It's not clear to me how valuable the remote basename/pathname/fullpath bits are in reality, so if they require a bunch of fabric-level code vs already being trivially accessible from the Paramiko sftp objects, I'm very likely to make that the next iteration instead
    • or not at all - my gut says most use cases will be happy with per-connection directories or partial filenames, plus the remote basename
    • that said, since we clearly already have access to the base filename, having that part available seems a useful middle ground
  • The currently described behavior of Transfer.get says it doesn't autocreate directories on your behalf. This was a reaction to the burden of convenience/magic behavior in v1.
    • However it doesn't square well with what we want for common interpolation use cases here, ie we want it to be easy for users to supply a local path of "{host}/{filename}", or conversely, requiring them to premake those per host directories themselves would be very irritating friction
    • So this may be one area where Group legitimately wants to have "extra" behavior ("I manage a pile of Connections for you, including things like making per-host directories for downloading files") versus Connection itself.
    • OTOH that is an odd split (Connection doing interpolation of strings but only Group doing intermediate directory credation) and may be surprising/confusing.
    • It could also be argued that this is one case where changing our minds is semi backwards compatible (what used to be an error unless you did manual labor, now just works).
      • Counter-counterpoint: this just moves the needle back towards "we do a ton of magic shit for the user and whoopsie-doodle, now we have a very bug-prone implementation because of that".

@bitprophet
Copy link
Member

bitprophet commented Jan 4, 2021

That's a lot of words. What is the shakeout?

  • Clear first step: interpolate some subset of connection attributes & remote file attributes, in Connection
  • Clear second step: Group.get adds a default local path value containing Connection.host (relying on current 'if you gave a dir, the file goes inside it' behavior for the file part)
  • Unclear third step: auto-mkdir?
    • Tension between user friction (making your own directories is 💩 ) and "more code means more bugs" (autocreating directories leads to more possibilities for bugs/errors the user cannot work around) - but I suspect convenience wins here.
      • Future option for a "don't do that for me" default-True kwarg addition, eg users suffering some weird mkdir bug could do get(remote, local, mkdir=False) - but hopefully I am being paranoid about this.
    • Also tension between doing that magic in Connection vs in Group - I suspect having Connection do it is cleaner despite being very technically backwards incompatible.
  • Later fourth step: if the 'pathy' bits of interpolation require Fabric-level code, make them an additive improvement later. Only support what is easy, today.

@bitprophet
Copy link
Member

bitprophet commented Jan 18, 2021

Implemented the above. There's no useful pathy bits in Paramiko and we were already applying posixpath.basename within Transfer, so I just extended that to give dirname and basename for the time being. Between those and the Connection bits, should be a pretty big upgrade for v2 / be about on par with v1.

I did also do auto-mkdir using os.makedirs. It's not ideal because we cannot easily track whether that call DID anything and therefore we cannot easily undo it if the transfer fails; so there will sometimes be extra directories lying around in those cases. Not worth fussing over for now.

@bitprophet
Copy link
Member

Will be out in 2.6, ideally today.

@bitprophet
Copy link
Member

Note for completion: ended up having to s/os.makedirs/pathib.Path.mkdir/ as the former was just too limited.

bitprophet added a commit that referenced this issue Jan 19, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants