So the branch is a bit misnamed at this point but I will summarize this a tiny bit (have talked quite a lot to @headius about this already).
All identifiers are now made as RubySymbols, with the exception of annotations (currently assumed to be 7-bit ASCII; this can be changed later). RubySymbols have a bytelist plus an 8859_1 identifier string. The parser and IR now use Symbol. The rest of the runtime continues to use the j.l.String as the identifier/intern string. New vocabulary:
RubySymbol.idString() is how you get an id for what you are looking up.
Error reporting has new static str(), names(), and ids() methods for making properly encoded Error strings. This is not 100% complete, but many, many errors have been converted.
This starts changing all defined methods to use the raw (8859_1) string as the key in the method table. Symbol and StringLiteral both now properly use ByteList and do not muck with String.
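To make the "raw (8859_1) string" idea concrete, here is a minimal sketch in plain Java. It is not JRuby code: `RawIdSketch`, `toRawId`, `toBytes`, and `display` are hypothetical names, and the `HashMap` only stands in for a method table. The point is that ISO-8859-1 maps every byte to exactly one char, so a raw string can serve as a lossless key and be decoded to a displayable String only when needed.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper, not a JRuby class: illustrates the raw id idea only.
class RawIdSketch {
    // Map each byte to the char with the same value (0x00-0xFF); this is lossless.
    static String toRawId(byte[] bytes) {
        return new String(bytes, StandardCharsets.ISO_8859_1);
    }

    // Recover the exact original bytes from a raw id.
    static byte[] toBytes(String rawId) {
        return rawId.getBytes(StandardCharsets.ISO_8859_1);
    }

    // Decode on demand to a displayable String using the name's real encoding.
    static String display(String rawId, Charset encoding) {
        return new String(toBytes(rawId), encoding);
    }

    public static void main(String[] args) {
        // U+3042 (hiragana A) is three bytes in UTF-8.
        byte[] name = "\u3042".getBytes(StandardCharsets.UTF_8);
        String rawId = toRawId(name);

        // Stand-in for a method table keyed by the raw id.
        Map<String, String> methodTable = new HashMap<>();
        methodTable.put(rawId, "method body placeholder");

        System.out.println(rawId.length());                          // one char per byte
        System.out.println(methodTable.containsKey(toRawId(name)));  // lookup by raw id
        System.out.println(display(rawId, StandardCharsets.UTF_8).equals("\u3042"));
    }
}
```

The raw id is not meant for display: printing it directly would show three Latin-1 characters rather than the original name, which is why the decoding step needs to know the encoding.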
…l name. Guard against it since it is a normal value.
…859_1 or a raw string. So I made toString() decode to the actual charset based on the encoding. This may need another field if we discover toString() on RubySymbol is called all the time, but let's roll with decoding on demand for now.
```text
jruby -e 'p method(def foo(あ); end).parameters'
[[:req, :あ]]
```

to work. Up until now this would confuse our impl and we might see an error like:

```text
ArgumentError: invalid byte sequence in US-ASCII
  inspect at org/jruby/RubySymbol.java:283
  inspect at org/jruby/RubySymbol.java:268
  inspect at org/jruby/RubyArray.java:1659
  inspect at org/jruby/RubyArray.java:1659
        p at org/jruby/RubyKernel.java:492
   <main> at -e:1
```

Pure Ruby code should now be giving properly encoded values back. Java (e.g. native) methods mostly should not matter since they do not return names, but if they do return names then they are all assumed to be UTF-8 encoded data.
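As an analogy for the failure mode above (this is not the actual JRuby code path), the following plain Java sketch shows what happens when the bytes of a non-ASCII name are decoded strictly as US-ASCII. `StrictDecodeSketch` and `strictDecode` are hypothetical names; the decoder calls are standard `java.nio.charset` API.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not a JRuby class.
class StrictDecodeSketch {
    // Returns the decoded text, or null if the bytes are invalid in the given charset.
    static String strictDecode(byte[] bytes, Charset charset) {
        try {
            return charset.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(bytes))
                    .toString();
        } catch (CharacterCodingException e) {
            return null; // the Java-side analogue of "invalid byte sequence"
        }
    }

    public static void main(String[] args) {
        // U+3042 (hiragana A), the parameter name from the example, as UTF-8 bytes.
        byte[] name = "\u3042".getBytes(StandardCharsets.UTF_8);

        // Labeled with the wrong encoding, the bytes cannot be decoded.
        System.out.println(strictDecode(name, StandardCharsets.US_ASCII) == null);

        // Labeled with the right encoding, the name comes back intact.
        System.out.println("\u3042".equals(strictDecode(name, StandardCharsets.UTF_8)));
    }
}
```

The bytes are the same in both calls; only the encoding attached to them differs, which is the information the symbol has to carry along.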
…field name but name changed from a String to a ByteList, which made the new frame read/write code find nothing.
…nkle in our strategy. Errors are generated from the search deep in Module but it is a raw String. This is somewhat ok for most strings because they just happen to be UTF-8, so they display correctly. This is a big issue though, since we want proper error messages.
… day would come but I was hoping I could cheat with 8859_1 strings enough to merge for 9.2. Unfortunately, the lack of encoding through the method methods means we cannot accurately throw properly encoded error messages, since we have long lost the encoding of the method.

With that said, most people will not notice this change. All the String methods for methods still exist. These methods will assume all data is UTF-8. This could be a dicey gamble, but since UTF-8 is the default for Ruby it likely will not impact anyone. Also, since UTF-8 is 7-bit clean for ASCII, we should not have an issue there.

Some API signatures have changed. This means 9.2 might end up causing some issues if you are doing something very low-level with our code base (although changes up to this point have already broken some APIs within parsing and IR, and that is unavoidable). The method reader/writer maps needed their return type changed from <String, DynamicMethod> to <ByteList, DynamicMethod>. It was an option to make a new name for the methods which return this, but this is so far down in our APIs I think we should run with it. Also ProfiledMethod has changed from String to ByteList (e.g. getName() -> ByteList from String). Ultimately ProfiledMethod needs to be properly encoded to be able to report the method in a properly encoded way...so there you have it.

The next phase is to change the interpreters and the JIT to stop going ByteList -> String -> ByteList. I did not do that part here since this is pretty large already.
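The UTF-8 assumption for the remaining String-based methods can be illustrated with a small plain Java sketch. `Utf8AssumptionSketch` and `assumeUtf8` are hypothetical names, not JRuby API. It shows why ASCII names are safe under the assumption and what the gamble looks like for a name whose bytes are in some other encoding.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical helper, not a JRuby class.
class Utf8AssumptionSketch {
    // What a String-based API does if it assumes every name is UTF-8.
    static String assumeUtf8(byte[] nameBytes) {
        return new String(nameBytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // ASCII names have identical bytes in US-ASCII and UTF-8, so they are safe.
        byte[] ascii = "foo".getBytes(StandardCharsets.US_ASCII);
        System.out.println(Arrays.equals(ascii, "foo".getBytes(StandardCharsets.UTF_8)));
        System.out.println(assumeUtf8(ascii).equals("foo"));

        // A name encoded as ISO-8859-1 (U+00E9 is the single byte 0xE9) is not
        // valid UTF-8, so the assumption garbles it: Java substitutes U+FFFD.
        byte[] latin1 = "caf\u00e9".getBytes(StandardCharsets.ISO_8859_1);
        System.out.println(assumeUtf8(latin1).equals("caf\u00e9"));
    }
}
```

Only the last case loses information, and it requires source written in a non-UTF-8, non-ASCII encoding, which matches the expectation that few users will be affected.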
…one more ByteList path from IR Operands.
…evert typeAsString change.
…red as FIXMEs like others but I am leaving some with bytelist_love to give a pedigree to when they were made.
consumers to proper method instead of toString. This also addressed some more m17n errors we had.