JRuby isn't escaping strings correctly #870

@nevans

I've encountered a weird issue that I don't know how to debug. This is using JRuby 10.0.2.0, the version installed by ruby-build via rbenv install.

There's a rake task in ruby/net-imap that generates some code (mostly regular expressions and hashes) from RFC tables. First it parses the RFC and writes the data into a JSON file. Then it reads that JSON file and generates the code. It was broken into two parts so rake can conditionally run the generator only if the JSON or the generator itself has changed.

Anyway, it's not generating the JSON correctly, and I can't figure out why. To replicate:

From a clone of https://github.com/ruby/net-imap, run bundle exec rake saslprep_rb.

To dig deeper, run irb, and from that irb console, do the following:

load "./rakelib/string_prep_tables_generator.rb"
generator = StringPrepTablesGenerator.new
rfc_path  = generator.rfc_filename
rfc_text  = File.read rfc_path
parsed    = generator.send(:parse_rfc_text, rfc_text)

The parsed variable now holds the data that will be written to disk as JSON and then reread in a later step. A couple of strings toward the bottom show the issue:

irb(main):052> d_1 = parsed["titles"]["D.1"]
=> "Characters with bidirectional property \"R\" or \"AL\""
irb(main):053> d_1 == d_1.dump.undump
=> true
irb(main):054> d_1.bytes == d_1.dump.undump.bytes
=> true
irb(main):055> d_1.encoding == d_1.dump.undump.encoding
=> true
irb(main):060> puts JSON.generate d_1.dump.undump
"Characters with bidirectional property \"R\" or \"AL\""
=> nil
irb(main):061> puts JSON.generate d_1
"Characters with bidirectional property "R" or "AL\""
=> nil

Same with:

irb(main):065> d_2 = parsed["titles"]["D.2"]
=> "Characters with bidirectional property \"L\""
irb(main):067> puts JSON.generate d_2.dump.undump
"Characters with bidirectional property \"L\""
=> nil
irb(main):068> puts JSON.generate d_2
"Characters with bidirectional property "L\""
=> nil

What's going on with the quoting there?! What's different about the string and its dump.undump counterpart?
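One way to narrow this down might be to compare more of the Ruby-visible properties of the two strings and to check whether the generated JSON parses back to the original. The following is only a sketch: string_facts and json_round_trips? are hypothetical helper names, not part of net-imap or the json gem, and the literal stands in for parsed["titles"]["D.1"] from the session above. Differences internal to the runtime would not show up in these checks.

```ruby
require "json"

# Hypothetical helper: collect Ruby-visible properties of a string
# so two strings can be compared side by side.
def string_facts(str)
  {
    encoding:       str.encoding,
    bytesize:       str.bytesize,
    length:         str.length,
    ascii_only:     str.ascii_only?,
    valid_encoding: str.valid_encoding?,
  }
end

# Hypothetical helper: true when the JSON generated for str parses
# back to an equal string, false when it differs or is not valid JSON.
def json_round_trips?(str)
  JSON.parse(JSON.generate(str)) == str
rescue JSON::ParserError
  false
end

# Stand-in literal (an assumption); in the irb session above this
# would be parsed["titles"]["D.1"].
d_1  = 'Characters with bidirectional property "R" or "AL"'
copy = d_1.dump.undump

p string_facts(d_1) == string_facts(copy)
p json_round_trips?(d_1)
p json_round_trips?(copy)
```

On a runtime that is not affected, all three lines should print true. If json_round_trips? returns false for the original string but true for its dump.undump copy, that would confirm the problem sits in how the original string object is handled rather than in its bytes or encoding.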
