Encode diff literals according to the actual output encoding. #220

Merged
merged 7 commits into from Mar 16, 2013

Conversation

Projects
None yet
3 participants
Owner

JonRowe commented Mar 13, 2013

This is a partial fix for Issue #219, it forces the internal literals to use a matching encoding to one of the diffed encoding, reducing the chance of an encoding error being raised by RSpec. It doens't solve this for all, for example using mismatched encoded strings will still cause this error, but in such cases it won't be RSpec at fault but the user supplying differently encoded strings to comparison. It would be possible to rescue the encoding exception elsewhere in a matcher but that shouldn't be done in the diff IMO.

Fix an issue with using ascii encoding internally
Also this describes some of the expected behaviour for differently
encoded strings.
Owner

myronmarston commented Mar 14, 2013

Thanks for taking a stab at this, @JonRowe. I wonder if we don't even need to bother with constructing a regex there--isn't string.split(/\n/) the same as string.split("\n") anyway?

For reference we solved a similar issue recently in a46e7aa (see #201 for the source issue) -- there might be something there you can borrow or we can unify somehow.

Owner

JonRowe commented Mar 14, 2013

I believe so, but I didn't want to change the implementation unnecessarily, in case there was a reason a regular expression was used over the string.

Owner

myronmarston commented Mar 14, 2013

I believe so, but I didn't want to change the implementation unnecessarily, in case there was a reason a regular expression was used over the string.

If it mattered, I would expect a spec to fail if you change it to a string. We're generally pretty good about adding test cases for such things.

I'd prefer to see it changed to the string version.

Owner

JonRowe commented Mar 14, 2013

Done. I've also attempted to fix the Travis build (it all passes on OS X)

lib/rspec/expectations/differ.rb
+ else
+ string
+ end
+ end
@myronmarston

myronmarston Mar 14, 2013

Owner

Given the fact that the presence of absence of the String#encoding will not change at run time (the user is either running 1.9+ or they're not...), I think it's preferrable to move the conditional out a level so that it is evaluated at file load time:

if String.method_defined?(:encoding)
  def matching_encoding(string, source)
    string.encode(source.encoding)
  end
else
  def matching_encoding(string, source)
    string
  end
end

I haven't bench marked this but I believe this will be a bit faster at runtime by evaluating the conditional once (at file load time) rather than N times (each time matching_encoding is called).

On a side note, I have a strong preference for spaces in an arg list--string, source rather than string,source. Any reason you chose not to include the space?

@JonRowe

JonRowe Mar 14, 2013

Owner

I normally do too, especially in method definitions, so I don't know why I didn't here.

spec/rspec/expectations/differ_spec.rb
@@ -1,4 +1,5 @@
require 'spec_helper'
+#encoding: utf-8
@myronmarston

myronmarston Mar 14, 2013

Owner

I think the magic comment has to be the first line in the file. I suspect this is why travis is failing.

@JonRowe

JonRowe Mar 14, 2013

Owner

I actually thought it was, my bad

lib/rspec/expectations/differ.rb
@@ -37,11 +37,11 @@ def diff_as_string(data_new, data_old)
end
ensure
oldhunk = hunk
- output << "\n"
+ output << matching_encoding("\n",oldhunk)
@myronmarston

myronmarston Mar 14, 2013

Owner

Please put spaces between passed args (both here and in the matching_encoding call below).

spec/rspec/expectations/differ_spec.rb
+ @expected="Tu avec carté {count} itém has".encode('UTF-16LE')
+ @actual="Tu avec carte {count} item has"
+ expect { subject }.to raise_error Encoding::CompatibilityError
+ end
@myronmarston

myronmarston Mar 14, 2013

Owner

I was about to merge, but then noticed this...it seems odd to me to say "it copes with differently encoded strings" when the example shows that that causes an error to be raised (which, to me, is not coping).

Rather than raising this error, could it return a message like this?

Could not produce a diff because the encoding of the actual string (UTF-8) differs from the encoding of the expected string (UTF-16LE)

IMO, diffs aren't essential to the failure output. They're nice-to-haves. I don't think the differ should ever raise an error, particularly because that would silence the real expectation failure the user got.

@JonRowe

JonRowe Mar 14, 2013

Owner

Like I indicated in the PR, I felt that was something the matcher should do, rather than the diff tool, but I'll defer to you.

spec/rspec/expectations/differ_spec.rb
+ @expected="Tu avec carté {count} itém has".encode('UTF-16LE')
+ @actual="Tu avec carté {count} itém has".encode('UTF-16LE')
+ expect(subject).to eql("")
+ end
@myronmarston

myronmarston Mar 14, 2013

Owner

It would be nice to have an example where the encoded strings are different and actually produce a diff.

As it stands, the specs could pass with a simple guard conditional like return '' if non_ascii?(data_old) && non_ascii?(data_new).

@JonRowe

JonRowe Mar 14, 2013

Owner

Good catch, it actually highlights that the diff-lcs library itself can't handle encoded strings, same problem with ascii regular-expressions..

Owner

myronmarston commented Mar 14, 2013

Thanks for fixing up the travis build! I was about to hit merge (because your fix really does look good) when I noticed a couple small things that I think should be fixed. I feel kinda bad about giving you a second round of code review comments -- not sure why I didn't notice this stuff the first time around. Let me know what you think about those.

Owner

JonRowe commented Mar 14, 2013

I actually need to go patch diff-lcs too, because adding an actual diff broke it all over again, hence the pending test with an actual diff. For now what it does is it masks the errors with a "cannot produce diff message"

@JonRowe JonRowe referenced this pull request in halostatue/diff-lcs Mar 14, 2013

Merged

Fix some encoding issues within Diff::LCS::Hunk #15

+ it 'copes with encoded strings' do
+ @expected="Tu avec carté {count} itém has".encode('UTF-16LE')
+ @actual="Tu avec carte {count} item has".encode('UTF-16LE')
+ expect(subject).to eql 'Could not produce a diff because of the encoding of the string (UTF-16LE)'
@myronmarston

myronmarston Mar 15, 2013

Owner

I'm confused...diff-lcs can't produce diffs for non-UTF-8 strings at all?

@myronmarston

myronmarston Mar 15, 2013

Owner

Nevermind, I saw your explanation above (sorry for looking at things out-of-order).

@JonRowe

JonRowe Mar 15, 2013

Owner

Yeah, I'm waiting on that patch to be merged, we can leave this until then if you'd like rather than a pending test case...

myronmarston added a commit that referenced this pull request Mar 16, 2013

Merge pull request #220 from JonRowe/fix_internal_encoding_error
Encode diff literals according to the actual output encoding.

@myronmarston myronmarston merged commit 4131bb6 into rspec:master Mar 16, 2013

1 check passed

default The Travis build passed
Details
Owner

myronmarston commented Mar 16, 2013

Thanks, @JonRowe! Your contribution is much appreciated!

myronmarston added a commit that referenced this pull request Mar 16, 2013

Owner

myronmarston commented Mar 16, 2013

BTW, on future PRs, please add a Changelog entry (e.g. cf84fea), as that saves me the extra work of writing and pushing that after merging. (No worries if you forget or whatever, of course!)

Contributor

halostatue commented Mar 30, 2013

@JonRowe & @myronmarston: Just for what it's worth, the underlying issue that was pointed out for diff-lcs 1.2.1 has been fixed based on work that Jon did. I'll be pushing diff-lcs 1.2.2 shortly.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment