Force syck to prevent UTF-8 breakage #109

Merged
merged 1 commit into from Jun 23, 2013

Conversation

Projects
None yet
4 participants
Collaborator

urticadioica commented Mar 5, 2013

Fixes #105.

Owner

igrigorik commented Mar 8, 2013

@urticadioica any thoughts on @jdelkins suggestion?

Collaborator

urticadioica commented Mar 8, 2013

Yeah, sorry for the delay. Part of the problem is that I know very little Ruby.

What I know is that my patch fixes the bug on Ruby 1.9 for me, and it leaves 1.8 untouched. @jdelkins' suggestion, first off, should say after require 'yaml', not before. It does fix the bug in 1.9, but unfortunately it crashes in 1.8.

I don't know why my patch seems to work on my box and not @jdelkins', but if that's truly the case, this patch doesn't do much good. If anyone who knows Ruby better than I do knows the best practice for falling back to syck (that still works in 1.8), use that instead.

Collaborator

urticadioica commented Mar 8, 2013

Double post. Sorry.

@jdelkins: Just to confirm my patch doesn't work for you (and tell me why, cause I'm curious), could you run a few lines in irb, and tell me what they return? (I was trying to think of the easy way to do this earlier, but like I said, I'm clumsy at Ruby.)

require 'syck'
require 'yaml'
YAML

Here's mine. If your patch works but not mine, this should yield a different result.

% irb1.9.3
irb(main):001:0> require 'syck'
=> true
irb(main):002:0> require 'yaml'
=> true
irb(main):003:0> YAML
=> Syck

jdelkins commented Mar 8, 2013

Here is the result:

1.9.3p385 :001 > require 'syck'
 => true
1.9.3p385 :002 > require 'yaml'
 => true
1.9.3p385 :003 > YAML
 => Syck

So, same result as you in irb, and yet still vimgolf challenges with non-ascii characters fail in my case. While I don't know what's happening, I can only speculate that somehow, psych is being loaded before it gets to the point where you are requiring syck in your patch. I think the following solution should work, however, for both 1.8 and 1.9.

...
require 'yaml'
module YAML
  if YAML.const_defined?(:ENGINE)
    ENGINE.yamler = 'syck'
  end
end
....

This seems to be harmless for 1.8 (which doesn't define YAML::ENGINE) and 1.9, where it forces the use of syck.

Collaborator

urticadioica commented May 6, 2013

Poking around with the yaml and json forms of all the challenges, I've recently discovered there are three yaml-related bugs. Each one of them is causing challenge files to be miswritten for at least some people.

  1. The UTF-8 bug I was trying to fix above. Users of Ruby 1.9 get defaulted to psych as their yaml engine, and it misinterprets the way multibyte UTF-8 characters are written in the yaml we get from the vimgolf server.
  2. When there's a line with nothing but one or more spaces, syck misinterprets it and writes no spaces at all. This turns out to be the cause of the whitespace issues some people have had in Get rid of html tags. There are also about four other challenges affected to a lesser degree. Simply switching to syck or psych won't work because of 1 and 2: you fix one bug, you cause the other.
  3. Files with DOS newlines are handled strangely. There are two different ways yaml represents the input/output strings: an indented, line-by-line style, and a quoted style. You can see both in http://vimgolf.com/challenges/50513cc930c82d000200000a.yaml. In the line-by-line style, carriage returns are inserted literally (if they're present at all), and both syck and psych ignore them at the end of the line. In the quoted style, they're shown by \r, and written to the file as shown. That's why in Count the random spaces!, Windows players have to switch their line breaks from \r\r\n to \r\n (just deleting a character), while Unix players have to switch from \r\n to \n (changing the autodetected DOS style to Unix). I never reported this when the challenge went up, because I figured the creator had just blown it. Turns out he did nothing wrong, but a yaml bug ruined his challenge.

Bug 3, the yaml engines ignoring carriage returns, turns out to be the only thing that's (usually) protected vimgolf from newline incompatibilities.

I wrote out the input/output files for all the challenges with syck, psych, and json, to make sure I'd found all the bugs. The best results I found came from the json parser, after stripping all the carriage returns. Too bad that would mean an extra dependency for earlier Rubies. Maybe we'll find a better solution...

Collaborator

urticadioica commented May 23, 2013

I gave it another shot. This time I switched file writing to JSON and stripped carriage returns. According to my tests, this should write every challenge file on the site correctly, which appears to be impossible with the YAML the server's giving us.

I depended on json_pure, so the installation issues that forced the switch from JSON in the first place (#59) don't apply.

@igrigorik igrigorik merged commit 01bc619 into igrigorik:master Jun 23, 2013

Collaborator

urticadioica commented Jun 23, 2013

Thanks for merging, but it's too bad it didn't make it into 0.4.3. :( Challenges broke again when I upgraded to the gem.

Owner

igrigorik commented Jun 23, 2013

@urticadioica doesn't mean we can't rev to 0.4.4. :-) ... In fact, I'd love to drop YAML support entirely (for security reasons).

@timvisher I believe your emacs code depends on yaml? Would it be a big deal to update to JSON if I deprecate the yaml endpoint?

Collaborator

timvisher commented Jun 24, 2013

On Sun, Jun 23, 2013 at 3:12 PM, Ilya Grigorik notifications@github.com wrote:

@timvisher I believe your emacs code depends on yaml? Would it be a big deal to update to JSON if I deprecate the yaml endpoint?

I'd prefer it. :) I'd need to update the mode, of course, but it would
be far better because there are already json parsers available in
elisp.

Owner

igrigorik commented Jun 24, 2013

Great, in which case.. I'll go ahead with the JSON route.

Collaborator

timvisher commented Jun 25, 2013

On Mon, Jun 24, 2013 at 2:26 PM, Ilya Grigorik notifications@github.com wrote:

Great, in which case.. I'll go ahead with the JSON route.

You could always go with EDN. ;)

Owner

igrigorik commented Jun 29, 2013

0.4.4 is live -- JSON only download / uploads.

@urticadioica urticadioica deleted the urticadioica:patch-1 branch Mar 5, 2014

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment