Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow any buffer type to be written to SFTPFile (1.17) #971

Merged
merged 8 commits into from Jun 9, 2017

Conversation

bz2
Copy link
Contributor

@bz2 bz2 commented May 26, 2017

Fixes #967 #968

Rollup of earlier branches proposed as #969 and #970 with additional fix inside sftp_client.

Includes new tests for SFTPFile usage.

bz2 added 7 commits May 25, 2017 20:35
Fixes paramiko#967

Also adds test coverage for writing various types to
BufferedFile which required some small changes to the test
LoopbackFile subclass.

Change against the 1.17 branch.
Fixes paramiko#968

Changes the behaviour of the underlying asbytes helper to
pass along unknown types. Most callers already handle this
by passing the bytes along to a file or socket-like object
which will raise TypeError anyway.

Adds test coverage through the Transport implementation.

Change against the 1.17 branch.
Fixes paramiko#967 paramiko#968

Rollup of earlier branches proposed as paramiko#969 and paramiko#970 with
additional fix inside sftp_client.

Includes new tests for SFTPFile usage.

Change against the 1.17 branch.
paramiko/file.py Outdated
data = b(data)
if isinstance(data, text_type):
# GZ 2017-05-25: Accepting text on a binary stream unconditionally
# cooercing to utf-8 seems questionable, but compatibility reasons?
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Feel like I recall other tickets about this (re: preserving truly binary data instead of passing through Unicode), but unable to find them at a glance. ¯\_(ツ)_/¯ At present, this is effectively what b() was doing, so...not really much worse?

Copy link
Contributor

@ploxiln ploxiln May 31, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Truly binary data would not be text_type (str for py3, unicode for py2), it would be bytes or something like that, so it should be passed through fine. If someone calls write() with a unicode type, this kicks in, turning unicode to utf-8 bytes. This seems like the best thing to do if someone passes unicode here.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, on Python 2, binary data well might be str, thus the confusion :D but yea, not really that worried, just making a note of it.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's confusing - you're thinking of string_types. text_type is just unicode on python2

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yup, I sure am.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

... I also typoed my first comment, was "str for py2, unicode for py2", first py2 should have been py3, I just corrected it ... that couldn't have helped ...

Copy link
Contributor Author

@bz2 bz2 May 31, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's an important distinction from b() in this change, in that it allows all other types through. This is a tradeoff, callers get duck typing, but internal library interfaces can't rely on having a bytes type. For BufferedFile we get sane type handing when writing to BytesIO later so just passing along is best.

The reason I added this comment is that while handling here makes sense on existing branches for compatibility reasons, I'm not sure it does for master. It might be better to just raise if given unicode, and add an encoding parameter/encode step to higher level interfaces.

elif isinstance(item, SFTPAttributes):
item._pack(msg)
else:
raise Exception('unknown type for %r type %r' % (item, type(item)))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd love a code comment here noting explicitly how "it was not any of the other expected types, so we assume it's something that can be asbytes()'d into a packet-level string type, and hand it to add_string() which does so" (assuming I'm interpreting that intent correctly!)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's reasonable, I'll add a comment along those lines.

if isinstance(s, bytes_types):
return s
if isinstance(s, text_type):
# GZ 2017-05-25: Accept text and encode as utf-8 for compatibilty only.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

compatibilty -> compatibility

what is "GZ"? probably don't need signature/date for this comment

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Was curious about that too. Was wondering if some sort of joke around the submitter's Github username (bz2) or vice versa :D

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

'GZ' is indeed a person id for me, which I agree is confusing given the nick I use on this site.

I like id/date as markers with comments for review on tricky things that may need discussion. Am happy to move to docstring, strip or delete the comment for landing.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm fine with them being comments instead of docstrings, it's just that we're not used to the "initials: date" prefix formatting. Stripping them out is probably the way to go, for consistency's sake. Git blame is a thing :D

@bitprophet
Copy link
Member

bitprophet commented May 31, 2017

Appreciate the rigorous ticket + PR setup! Though in the interests of corralling conversation/changelog entries/etc, I've closed all but this one.

First, thanks for cleaning this up - sounds at a glance that most of the broken API promises re: buffer/socket type objects, happened in the big Python 3 shakeup (release 1.13). (The gift that keeps on giving!)

Second, it's probably a non-issue since it doesn't touch the crypto bits much, but I wonder how this will work out when merged up into the 2.x line - something to be aware of. Given the current state of branches, I may only release this in the 1.18+ or even 2.2+ lines, depending. Super appreciate the port to 1.17+ either way though - options are nice.

Third, I'm currently going through and reviewing / leaving occasional line notes on the patchset. Will report back here with anything major, otherwise once you have a chance to respond to those I'll look at merging. EDIT: all done with that, ball back in your court!

@bitprophet
Copy link
Member

bitprophet commented May 31, 2017

Tagging as both bug and feature since I'm torn on whether this qualifies as one of those "bugfix, but potentially widely instability-causing" changes that like to come out in feature releases.

@bz2
Copy link
Contributor Author

bz2 commented May 31, 2017

Thanks for the review! I've responded to the inline comments, and will push up a change with the suggested comment tweaks shortly.

I tried to write the change to minimise conflicts, so it should apply cleanly-ish to all later branches. That said, there is some fixing up that would be nice to do around 2/3 compat and tests after this lands on master, which I'd be happy to propose as well.

Thanks to bitprophet and ploxiln.
bz2 added a commit to bz2/paramiko that referenced this pull request Jun 1, 2017
@bz2
Copy link
Contributor Author

bz2 commented Jun 1, 2017

Added a possible changelog entry in f0124d9 - feel free to use/rewrite as desired.

@bitprophet
Copy link
Member

  • Directly merged into 1.17 then up to 1.18
  • Couldn't find an alternate 2.0 version of the patchset so attempted to manually cherry-pick the whole range to the 2.0 branch (included some minor style/flake8 tweaks since the 2.0 lines got flake8'd recently)
  • tests still passing on both interpreter families so hopefully I didn't screw it up too badly
  • merged up to master

bitprophet added a commit that referenced this pull request Jun 9, 2017
@bitprophet bitprophet merged commit e25e1aa into paramiko:1.17 Jun 9, 2017
bitprophet added a commit that referenced this pull request Jun 9, 2017
bitprophet added a commit that referenced this pull request Jun 9, 2017
@bz2
Copy link
Contributor Author

bz2 commented Jun 9, 2017

Thanks!

The 1.17 merge to 2.0 looked pretty clean so I didn't propose a separate branch for that series, but just double checked your cherrypick and it all looks good to me.

dkhapun pushed a commit to cyberx-labs/paramiko that referenced this pull request Jun 7, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants