-
Notifications
You must be signed in to change notification settings - Fork 11
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Beginning of "tinystr" optimization #8
Conversation
This commit adds a compact string representation, but doesn't wire them up. Part of the plan for projectfluent#7
Changes the Locale and parser to use TinyStr for language.
I'm not asking for this to be merged just yet. There are some unused code warnings that will go away. A few questions. I see that the full generality of bcp-47 is not supported, for example language can only be 2-3 alpha characters, so "i-enochian" would be failed, similarly for registered language subtag. Supporting these uncommon values is the reason it's TinyStr8, but I can change this. This implementation uses "fake SIMD". I was able to get that to work well, but @SimonSapin has had success getting LLVM to auto-vectorize in rust-lang/rust#59283 . That code would probably be clearer, and a hair more efficient, but I wasn't able to get auto-vectorization to work for bytewise operations in a u64. If this approach generally looks good, I'll keep going. |
Yes, we're aiming for Unicode Locale Identifier, not Language Tag. Please, consult http://unicode.org/reports/tr35/#BCP_47_Conformance for differences. |
Ah, that's super helpful, thanks. I was using bcp-47 as my normative spec. I'll adjust. |
As of this commit, this PR moves |
@raphlinus - can we move this to |
Sure, I'm not attached. Feel free to take my draft whichever direction you like, and if you need my attention on something, just let me know exactly what you'd like. |
closing in favor of zbraniecki/unic-locale#7 |
This patch implements a "tiny string" datatype, very efficient but limited to bounded lengths, and starts with the integration into the Locale datatype. As of this PR, it's just the language subtag, but I wanted to upload it to get feedback before proceeding further.
Progress towards #7