Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

Already on GitHub? Sign in to your account

Update lib/sequel/plugins/string_stripper.rb #593

Closed
wants to merge 3 commits into
from

Conversation

Projects
None yet
2 participants

Wardrop commented Dec 12, 2012

Wrapped the call to #strip in a rescue to prevent exceptions being raised when dealing with blob data and other non-UTF-8 compatible strings.

@Wardrop Wardrop Update lib/sequel/plugins/string_stripper.rb
Wrapped the call to call to #strip method in a rescue to prevent issues with strings representing blob data and other non-UTF-8 strings.
01a4e4e
Owner

jeremyevans commented Dec 12, 2012

Thanks for the patch. I'm not generally a fan of blind rescues, but in this particular case, I think it makes sense. However, the behavior should be specced, so can you please add a commit with a spec?

Wardrop commented Dec 12, 2012

Ok, spec added to patch branch. Note, I haven't actually run the specs as I'm not setup to run them, but I essentially ran the spec in irb with the patch on and off to confirm the spec will work. Note, if you're not too keen on the blind rescue, you can tighten it by only catching "ArgumentError" errors, but then I'm not sure of other edge cases that may invoke some other error type.

Owner

jeremyevans commented Dec 12, 2012

The spec doesn't appear to fail over here even without the patch to the plugin. I could get it to fail by doing @o.name = " \xef".force_encoding('UTF-8'), but then Sequel is basically hiding an encoding error that should be raised.

What problems are you actually seeing, and are you sure those problems are not general encoding problems in your app? Calling String#strip on a non-UTF8 string should not raise an error in the first place, so I'm having second thoughts about Sequel just swallowing such errors.

Wardrop commented Dec 13, 2012

I get the error when inserting images into blob columns of a MSSQL database. The value I'm setting is the result of a call to File#read, which is a ASCII-8BIT string, as reported by String#encoding, but which seems to be implicitly converted to a UTF-8 string when calling #strip on it. I imagine anyone trying to set binary data will hit this using the string stripper plugin.

Perhaps my attempts at synthesising this condition are not the best. In looking further into this, I've noticed that Ruby can be run with different default encodings, for which you can refer to the constant __ENCODING__ to find out what your default is. In IRB, this is UTF-8, but for most files, this is US-ASCII unless you set the following at the top of the file:

#encoding: utf-8

Perhaps adding that to the top of string_stripper_spec.rb woud be acceptable. It would be at least be a reliable way to generate invalid UTF-8 strings. Anything between \x80 and \xFF should do that.

Owner

jeremyevans commented Dec 13, 2012

I agree that blobs should not be stripped. Changing the condition to v.is_a?(String) && !v.is_a?(SQL::Blob) should solve the binary issue.

I don't think the encoding argument is valid. If you have string where the actual encoding doesn't match the marked encoding (e.g. ASCII-8BIT encoding in a UTF-8 string), I think Sequel is right to raise an error.

Thanks for your help in notifying me about this issue. I'll work on a patch and try to have it committed sometime tomorrow.

Wardrop commented Dec 13, 2012

Thanks Jeremy. I admire your dedication in maintaining Sequel, and especially your appreciation for quality and doing things properly. Admirable.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment