Identifier-keyed tables must always use raw or always use encoded identifiers #3697
headius added a commit that referenced this issue Feb 24, 2016
This works properly, but because it uses a "raw" string the resulting error message is mangled when MBC are present. See #3697.
Likely to be fixed by identifier work by me and @enebo for 9.2.
Largely fixed by @enebo's symbol work.
With the move to M17N in Ruby 1.9, it became possible to name variables, constants, etc. in arbitrary encodings.
In JRuby, we have never fully supported this because all our identifier-keyed tables (method table, constant table, etc.) are keyed by a Java String, and traditionally that was a properly decoded string. This works fine when all identifiers share the same encoding, but breaks if different encodings are used, since we lose the original bytes and encoding when going to a UTF-16 String.
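A minimal sketch of that collision, using a plain `HashMap` as a stand-in for an identifier-keyed table. The class and method names here are hypothetical and are not JRuby APIs; the only assumption is that identifiers are decoded to a Java String before being used as keys.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

// Hypothetical demo class; not part of JRuby.
class DecodedKeyCollision {
    // Decode identifier bytes into a Java (UTF-16) String, discarding the source encoding.
    static String decodedKey(byte[] bytes, Charset encoding) {
        return new String(bytes, encoding);
    }

    // Build a toy "constant table" keyed by decoded strings.
    static Map<String, String> buildTable() {
        byte[] utf8Name = {(byte) 0xC3, (byte) 0xA9};   // "é" encoded as UTF-8
        byte[] latin1Name = {(byte) 0xE9};               // "é" encoded as ISO-8859-1

        Map<String, String> table = new HashMap<>();
        table.put(decodedKey(utf8Name, StandardCharsets.UTF_8), "from UTF-8 source");
        // Same decoded characters, so this overwrites the entry above.
        table.put(decodedKey(latin1Name, StandardCharsets.ISO_8859_1), "from ISO-8859-1 source");
        return table;
    }

    public static void main(String[] args) {
        Map<String, String> table = buildTable();
        System.out.println(table.size());        // prints 1
        System.out.println(table.get("\u00E9")); // prints "from ISO-8859-1 source"
    }
}
```

Two identifiers that Ruby would treat as distinct end up sharing a single table entry, because the key no longer records which bytes or encoding it came from.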
In order to support this better, we attempted to represent our identifiers the way MRI represents its IDs: as the raw bytes of whatever parsed identifier came in. This allows a symbol to be referenced uniquely from just its raw bytes, provided the symbol is still alive. What we didn't do is propagate raw bytes throughout all identifier-related APIs; only some of them actually use the raw string, while others still use fully-decoded strings as characters.
If we wish to fix this, we can't do it part way: a partial conversion leads to API conflicts that are hard or impossible to resolve.
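A rough illustration of the kind of conflict a half-converted API produces. It assumes a "raw" string is built by widening each byte to one char (which ISO-8859-1 decoding does); the class and method names are hypothetical and not taken from JRuby.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

// Hypothetical demo class; not part of JRuby.
class MixedKeyLookup {
    // Assumed "raw" form: one char per byte, so the original bytes are preserved.
    static String rawKey(byte[] bytes) {
        return new String(bytes, StandardCharsets.ISO_8859_1);
    }

    // Fully decoded form: characters only, original bytes and encoding are gone.
    static String decodedKey(byte[] bytes, Charset encoding) {
        return new String(bytes, encoding);
    }

    // One API stores under the raw key, another looks up with the decoded key.
    static boolean mixedLookupSucceeds() {
        byte[] utf8Name = {(byte) 0xC3, (byte) 0xA9}; // "é" encoded as UTF-8
        Map<String, String> methodTable = new HashMap<>();
        methodTable.put(rawKey(utf8Name), "method body");
        return methodTable.containsKey(decodedKey(utf8Name, StandardCharsets.UTF_8));
    }

    // Raw keys do keep differently encoded identifiers apart.
    static boolean rawKeysDistinct() {
        byte[] utf8Name = {(byte) 0xC3, (byte) 0xA9};
        byte[] latin1Name = {(byte) 0xE9};
        return !rawKey(utf8Name).equals(rawKey(latin1Name));
    }

    public static void main(String[] args) {
        System.out.println(mixedLookupSucceeds()); // prints false
        System.out.println(rawKeysDistinct());     // prints true
    }
}
```

Raw keys solve the uniqueness problem on their own, but any caller still passing a decoded string misses the entry, which is why every table and every API touching it has to agree on one form.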
There are two paths forward, as I see them:
new_ids branch. This is a very large effort and may need to wait until a "JRuby 10k" given the wide-reaching API breakage that will result. It may never be feasible.
Neither approach is really great.