Allow any buffer type to be written to SFTPFile (1.17) #971
Fixes paramiko#967 Also adds test coverage for writing various types to BufferedFile which required some small changes to the test LoopbackFile subclass. Change against the 1.17 branch.
Fixes paramiko#968 Changes the behaviour of the underlying asbytes helper to pass along unknown types. Most callers already handle this by passing the bytes along to a file or socket-like object which will raise TypeError anyway. Adds test coverage through the Transport implementation. Change against the 1.17 branch.
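A minimal sketch of the pass-through behaviour described above, for illustration only. `asbytes_sketch` and `try_write` are hypothetical names, not paramiko's code, and `io.BytesIO` stands in for the file or socket-like object that ultimately receives the data.

```python
import io


def asbytes_sketch(s):
    """Hypothetical stand-in for the helper; not paramiko's actual code."""
    if isinstance(s, bytes):
        return s
    if isinstance(s, str):
        # Text is still encoded as UTF-8, for compatibility.
        return s.encode("utf-8")
    # Anything else is passed along untouched for the sink to judge.
    return s


def try_write(value):
    """Write one value to a fresh BytesIO; return None if the sink rejects it."""
    sink = io.BytesIO()
    try:
        sink.write(asbytes_sketch(value))
    except TypeError:
        return None
    return sink.getvalue()


buffer_result = try_write(memoryview(b"payload"))  # buffer types are accepted
int_result = try_write(12345)                      # rejected by the sink itself
```

The point of the sketch is that the helper no longer needs to reject unknown types, since the sink raises `TypeError` for anything that is not bytes-like.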
Fixes paramiko#967 paramiko#968 Rollup of earlier branches proposed as paramiko#969 and paramiko#970 with additional fix inside sftp_client. Includes new tests for SFTPFile usage. Change against the 1.17 branch.
paramiko/file.py
data = b(data)
if isinstance(data, text_type):
    # GZ 2017-05-25: Accepting text on a binary stream unconditionally
    # cooercing to utf-8 seems questionable, but compatibility reasons?
Feel like I recall other tickets about this (re: preserving truly binary data instead of passing through Unicode), but unable to find them at a glance. ¯\_(ツ)_/¯
At present, this is effectively what `b()` was doing, so...not really much worse?
Truly binary data would not be `text_type` (`str` for py3, `unicode` for py2), it would be bytes or something like that, so it should be passed through fine. If someone calls `write()` with a unicode type, this kicks in, turning unicode to utf-8 bytes. This seems like the best thing to do if someone passes unicode here.
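As a rough illustration of that check (a sketch, not the patch itself): `encode_if_text` is a hypothetical name, and `text_type` is hard-coded here to its Python 3 value on the assumption that the compat layer maps it to `unicode` on Python 2.

```python
text_type = str  # Python 3 value; assumed to be `unicode` on Python 2


def encode_if_text(data):
    """Hypothetical helper: only text is touched, binary data is left alone."""
    if isinstance(data, text_type):
        return data.encode("utf-8")
    return data


binary = b"\xff\xfe\x00"                # not valid UTF-8, and never decoded
encoded = encode_if_text(u"na\u00efve")  # text becomes UTF-8 bytes
passed = encode_if_text(binary)          # the very same object comes back
```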
Well, on Python 2, binary data well might be `str`, thus the confusion :D but yea, not really that worried, just making a note of it.
It's confusing - you're thinking of `string_types`. `text_type` is just `unicode` on python2
Yup, I sure am.
... I also typoed my first comment, was "`str` for py2, `unicode` for py2", first py2 should have been py3, I just corrected it ... that couldn't have helped ...
There's an important distinction from `b()` in this change, in that it allows all other types through. This is a tradeoff: callers get duck typing, but internal library interfaces can't rely on having a bytes type. For `BufferedFile` we get sane type handling when writing to `BytesIO` later, so just passing along is best.
The reason I added this comment is that while handling here makes sense on existing branches for compatibility reasons, I'm not sure it does for master. It might be better to just raise if given unicode, and add an encoding parameter/encode step to higher level interfaces.
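One way the stricter alternative floated for master could look, purely as a sketch: `StrictBinaryWriter` and `write_text` are hypothetical names invented for illustration, and nothing here reflects an interface paramiko actually has.

```python
import io


class StrictBinaryWriter:
    """Hypothetical binary stream wrapper that refuses text outright."""

    def __init__(self, raw):
        self._raw = raw

    def write(self, data):
        if isinstance(data, str):
            raise TypeError("text is not accepted on a binary stream; encode it first")
        # Other types are passed along; the underlying stream validates them.
        return self._raw.write(data)


def write_text(writer, text, encoding="utf-8"):
    """Hypothetical higher-level helper that makes the encode step explicit."""
    return writer.write(text.encode(encoding))


raw = io.BytesIO()
writer = StrictBinaryWriter(raw)
writer.write(b"header:")
write_text(writer, u"caf\u00e9", encoding="latin-1")
```

With this shape the caller chooses the encoding, instead of the low-level stream assuming UTF-8.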
elif isinstance(item, SFTPAttributes):
    item._pack(msg)
else:
    raise Exception('unknown type for %r type %r' % (item, type(item)))
I'd love a code comment here noting explicitly how "it was not any of the other expected types, so we assume it's something that can be `asbytes()`'d into a packet-level string type, and hand it to `add_string()` which does so" (assuming I'm interpreting that intent correctly!)
That's reasonable, I'll add a comment along those lines.
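A toy sketch of the fallback branch with the kind of comment requested above. `ToyMessage` and `pack_items` are hypothetical stand-ins written for this example; they do not reproduce paramiko's `Message` class or the real packing code in sftp.

```python
import struct


class ToyMessage:
    """Hypothetical miniature message builder, not paramiko's Message."""

    def __init__(self):
        self._chunks = []

    def add_int(self, n):
        self._chunks.append(struct.pack(">I", n))

    def add_string(self, s):
        if isinstance(s, str):
            s = s.encode("utf-8")
        data = bytes(memoryview(s))  # raises TypeError for non-buffer objects
        # Length-prefixed byte string.
        self._chunks.append(struct.pack(">I", len(data)) + data)

    def asbytes(self):
        return b"".join(self._chunks)


def pack_items(msg, items):
    for item in items:
        if isinstance(item, int):
            msg.add_int(item)
        else:
            # Not any of the other expected types, so assume it is something
            # that can be turned into a packet-level string, and hand it to
            # add_string(), which does the conversion (or raises TypeError).
            msg.add_string(item)
    return msg


packed = pack_items(ToyMessage(), [7, b"ab", bytearray(b"c"), "d"]).asbytes()
```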
paramiko/common.py
if isinstance(s, bytes_types):
    return s
if isinstance(s, text_type):
    # GZ 2017-05-25: Accept text and encode as utf-8 for compatibilty only.
compatibilty -> compatibility
what is "GZ"? probably don't need signature/date for this comment
Was curious about that too. Was wondering if some sort of joke around the submitter's Github username (`bz2`) or vice versa :D
'GZ' is indeed a person id for me, which I agree is confusing given the nick I use on this site.
I like id/date as markers with comments for review on tricky things that may need discussion. Am happy to move to docstring, strip or delete the comment for landing.
I'm fine with them being comments instead of docstrings, it's just that we're not used to the "initials: date" prefix formatting. Stripping them out is probably the way to go, for consistency's sake. Git blame is a thing :D
Appreciate the rigorous ticket + PR setup! Though in the interests of corralling conversation/changelog entries/etc, I've closed all but this one.
First, thanks for cleaning this up - sounds at a glance that most of the broken API promises re: buffer/socket type objects, happened in the big Python 3 shakeup (release 1.13). (The gift that keeps on giving!)
Second, it's probably a non-issue since it doesn't touch the crypto bits much, but I wonder how this will work out when merged up into the 2.x line - something to be aware of. Given the current state of branches, I may only release this in the 1.18+ or even 2.2+ lines, depending. Super appreciate the port to 1.17+ either way though - options are nice.
Third, I'm currently going through and reviewing / leaving occasional line notes on the patchset. Will report back here with anything major, otherwise once you have a chance to respond to those I'll look at merging.
EDIT: all done with that, ball back in your court!
Tagging as both bug and feature since I'm torn on whether this qualifies as one of those "bugfix, but potentially widely instability-causing" changes that like to come out in feature releases.
Thanks for the review! I've responded to the inline comments, and will push up a change with the suggested comment tweaks shortly. I tried to write the change to minimise conflicts, so it should apply cleanly-ish to all later branches. That said, there is some fixing up that would be nice to do around 2/3 compat and tests after this lands on master, which I'd be happy to propose as well.
Thanks to bitprophet and ploxiln.
Added a possible changelog entry in f0124d9 - feel free to use/rewrite as desired.
Thanks! The 1.17 merge to 2.0 looked pretty clean so I didn't propose a separate branch for that series, but just double checked your cherrypick and it all looks good to me.