Currently the tokenizer only accepts ASCII characters, so reading a field with a non-ASCII name throws an exception with the message 'Invalid subfield path'.
A unit test that reproduces the issue is available at rui-mo@8dc1ea4. To make this test pass, a temporary change was made to the 'isUnquotedSubscriptCharacter' function by removing the 'isalnum' check.
To fix this issue, shall we support UTF-8 characters in the Tokenizer, for example by replacing 'isalnum' with 'u_isalnum' from the ICU library?
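To illustrate the idea, here is a minimal, self-contained sketch rather than the actual Tokenizer code. It contrasts a byte-wise 'isalnum' check with a check that first decodes UTF-8 into code points. The names 'decodeUtf8', 'isUnquotedSubscriptCodePoint', 'acceptsAsciiOnly' and 'acceptsUtf8' are hypothetical, and allowing '_' alongside alphanumerics is an assumption. To keep the sketch free of external dependencies, every non-ASCII code point is accepted; that branch is where a call such as ICU's 'u_isalnum' could go.

```cpp
#include <cctype>
#include <cstddef>
#include <string_view>

// Hypothetical helper: decodes one UTF-8 code point starting at `pos` and
// advances `pos` past it. Returns false on truncated or malformed input.
// Simplified: it does not reject overlong encodings or surrogates.
bool decodeUtf8(std::string_view s, std::size_t& pos, char32_t& cp) {
  if (pos >= s.size()) {
    return false;
  }
  const auto b0 = static_cast<unsigned char>(s[pos]);
  std::size_t len = 0;
  if (b0 < 0x80) {
    cp = b0;
    len = 1;
  } else if ((b0 & 0xE0) == 0xC0) {
    cp = b0 & 0x1F;
    len = 2;
  } else if ((b0 & 0xF0) == 0xE0) {
    cp = b0 & 0x0F;
    len = 3;
  } else if ((b0 & 0xF8) == 0xF0) {
    cp = b0 & 0x07;
    len = 4;
  } else {
    return false; // stray continuation byte or invalid lead byte
  }
  if (pos + len > s.size()) {
    return false; // sequence is cut off
  }
  for (std::size_t i = 1; i < len; ++i) {
    const auto b = static_cast<unsigned char>(s[pos + i]);
    if ((b & 0xC0) != 0x80) {
      return false; // not a continuation byte
    }
    cp = (cp << 6) | (b & 0x3F);
  }
  pos += len;
  return true;
}

// Hypothetical code-point level predicate. Assumption: '_' and ASCII
// alphanumerics are allowed in an unquoted subscript.
bool isUnquotedSubscriptCodePoint(char32_t cp) {
  if (cp < 0x80) {
    return cp == '_' || std::isalnum(static_cast<int>(cp)) != 0;
  }
  // Placeholder: accepts any non-ASCII code point. With ICU available,
  // this could instead be: return u_isalnum(static_cast<UChar32>(cp));
  return true;
}

// Byte-wise check: in the default "C" locale, every byte of a multi-byte
// UTF-8 sequence fails isalnum, so non-ASCII names are rejected.
bool acceptsAsciiOnly(std::string_view name) {
  for (char c : name) {
    const auto uc = static_cast<unsigned char>(c);
    if (uc != '_' && std::isalnum(uc) == 0) {
      return false;
    }
  }
  return !name.empty();
}

// UTF-8 aware check: decode first, then classify whole code points.
bool acceptsUtf8(std::string_view name) {
  std::size_t pos = 0;
  while (pos < name.size()) {
    char32_t cp = 0;
    if (!decodeUtf8(name, pos, cp) || !isUnquotedSubscriptCodePoint(cp)) {
      return false;
    }
  }
  return !name.empty();
}
```

With this sketch, a name such as "caf\xC3\xA9" (the UTF-8 bytes of "café") is rejected by 'acceptsAsciiOnly' and accepted by 'acceptsUtf8'. The point is that classification has to happen on decoded code points rather than on individual bytes, whichever classifier is eventually used.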