Identifier-keyed tables must always use raw or always use encoded identifiers #3697
With the move to M17N in Ruby 1.9, it became possible for the names of variables, constants, etc. to use arbitrary encodings.
In JRuby, we have never fully supported this because all our identifier-keyed tables (method table, constant table, etc.) are keyed by a Java String, and traditionally that was a properly decoded string. This works fine when all identifiers are in the same encoding, but breaks if different encodings are used, since we lose the original bytes and encoding when decoding to a UTF-16 String.
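To make the loss concrete, here is a minimal sketch in plain Java. It is not JRuby code: the class name `DecodedKeyCollision`, the table, and the sample identifier are all hypothetical. It takes the identifier "é" as it might arrive from an ISO-8859-1 source and from a UTF-8 source, decodes both, and uses the results as keys in one table.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

public class DecodedKeyCollision {
    // Hypothetical identifier "é" as raw bytes in two different source encodings.
    static final byte[] LATIN1_BYTES = {(byte) 0xE9};
    static final byte[] UTF8_BYTES = {(byte) 0xC3, (byte) 0xA9};

    // The raw byte sequences are different...
    static boolean rawBytesDiffer() {
        return !Arrays.equals(LATIN1_BYTES, UTF8_BYTES);
    }

    // ...but once decoded to UTF-16 Java Strings they are indistinguishable.
    static boolean decodedKeysCollide() {
        String fromLatin1 = new String(LATIN1_BYTES, StandardCharsets.ISO_8859_1);
        String fromUtf8 = new String(UTF8_BYTES, StandardCharsets.UTF_8);
        return fromLatin1.equals(fromUtf8);
    }

    // A stand-in for an identifier-keyed table that uses decoded Strings as keys.
    static Map<String, String> buildTable() {
        Map<String, String> table = new HashMap<>();
        table.put(new String(LATIN1_BYTES, StandardCharsets.ISO_8859_1), "from ISO-8859-1 source");
        table.put(new String(UTF8_BYTES, StandardCharsets.UTF_8), "from UTF-8 source");
        return table;
    }

    public static void main(String[] args) {
        System.out.println("raw bytes differ: " + rawBytesDiffer());
        System.out.println("decoded keys collide: " + decodedKeysCollide());
        // The second put overwrote the first, so only one entry remains.
        System.out.println("table entries: " + buildTable().size());
    }
}
```

After decoding, nothing in the key records which encoding the identifier came from, so the two definitions share a single table entry.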
To support this better, we attempted to represent our identifiers the way MRI represents its IDs: as the raw bytes of whatever parsed identifier came in. This allows a symbol to be referenced uniquely from just its raw bytes, provided the symbol is still alive. What we didn't do is propagate raw bytes throughout all identifier-related APIs; only some of them actually use the raw string, while others still use fully decoded strings as characters.
If we wish to fix this, we can't do it part way. Mixing raw and decoded identifiers across APIs leads to conflicts that are hard or impossible to resolve.
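A small sketch of how mixing the two conventions can go wrong. It rests on an assumption made only for illustration: that a "raw" identifier is held as a Java String with one char per byte (an ISO-8859-1 decoding of the bytes). The class `MixedKeyLookup` and its method table are hypothetical and are not JRuby APIs.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

public class MixedKeyLookup {
    // Hypothetical identifier "é" as UTF-8 bytes.
    static final byte[] UTF8_BYTES = {(byte) 0xC3, (byte) 0xA9};

    // Assumed "raw" convention: one char per byte, so the bytes stay recoverable.
    static String rawKey(byte[] bytes) {
        return new String(bytes, StandardCharsets.ISO_8859_1);
    }

    // "Decoded" convention: the bytes interpreted in their real encoding.
    static String decodedKey(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    // One API stores under the raw key, another looks up with the decoded key.
    static String lookupAcrossConventions() {
        Map<String, String> methodTable = new HashMap<>();
        methodTable.put(rawKey(UTF8_BYTES), "method body");
        return methodTable.get(decodedKey(UTF8_BYTES));
    }

    // The raw key round-trips back to the original bytes.
    static boolean rawKeyRoundTrips() {
        byte[] back = rawKey(UTF8_BYTES).getBytes(StandardCharsets.ISO_8859_1);
        return Arrays.equals(back, UTF8_BYTES);
    }

    public static void main(String[] args) {
        System.out.println("raw key length: " + rawKey(UTF8_BYTES).length());
        System.out.println("decoded key length: " + decodedKey(UTF8_BYTES).length());
        System.out.println("lookup result: " + lookupAcrossConventions());
        System.out.println("raw key round-trips: " + rawKeyRoundTrips());
    }
}
```

Both keys are ordinary Java Strings, so the type system cannot tell them apart, yet the lookup misses because the two Strings have different contents. This is why a table has to commit to one convention for every caller.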
There are two paths forward, as I see them:
Neither approach is really great.