Releases: simdutf/simdutf
Version 2.0.5
What's Changed
Full Changelog: v2.0.4...v2.0.5
Version 2.0.4
What's Changed
- Better code generation for UTF-8 to UTF-16 routine under GCC and LLVM (icelake kernel) by @lemire in #183
New Contributors
Full Changelog: v2.0.3...v2.0.4
Version 2.0.3
Version 2.0.2
This is a second patch release for version 2.0. It fixes a potential buffer overflow in the westmere kernel when transcoding from UTF-16 to UTF-8.
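The notes do not say how the overflow arose. As general background, a caller transcoding UTF-16 to UTF-8 has to size the output buffer correctly. Each UTF-16 code unit produces at most 3 UTF-8 bytes, and a surrogate pair (two units) produces 4. The sketch below computes the exact size under that rule. `required_utf8_bytes` and `utf8_upper_bound` are hypothetical names used for illustration, not simdutf's API.

```cpp
#include <cstddef>

// Hypothetical helper: exact number of UTF-8 bytes needed for a UTF-16 string
// (native endianness). An unpaired surrogate is counted as 3 bytes, which is
// also enough room for a U+FFFD replacement character.
std::size_t required_utf8_bytes(const char16_t* in, std::size_t len) {
    std::size_t bytes = 0;
    for (std::size_t i = 0; i < len; ++i) {
        char16_t c = in[i];
        if (c < 0x80) {
            bytes += 1;  // ASCII
        } else if (c < 0x800) {
            bytes += 2;
        } else if (c >= 0xD800 && c <= 0xDBFF && i + 1 < len &&
                   in[i + 1] >= 0xDC00 && in[i + 1] <= 0xDFFF) {
            bytes += 4;  // high + low surrogate -> one supplementary code point
            ++i;         // consume the low surrogate too
        } else {
            bytes += 3;  // rest of the BMP (or an unpaired surrogate)
        }
    }
    return bytes;
}

// Hypothetical helper: a bound that needs no pre-scan, since no UTF-16 code unit
// ever expands to more than 3 UTF-8 bytes.
constexpr std::size_t utf8_upper_bound(std::size_t utf16_units) {
    return 3 * utf16_units;
}
```

For `u"a\u20AC\U0001F600"` (4 code units), the exact size is 8 bytes while the bound gives 12. The bound avoids a pre-pass over the input at the cost of some unused space.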
Full Changelog: v2.0.1...v2.0.2
Version 2.0.1
What's Changed
Full Changelog: v2.0.0...v2.0.1
Version 2.0.0
What's Changed
Most text today is represented using the Unicode standard. The simdutf library seeks to provide high-performance Unicode functions for C++ programmers. Version 2.0 introduces a richer API with support for the most popular Unicode formats (UTF-32, UTF-16BE, UTF-16LE and UTF-8). Users can transcode between these formats and validate the inputs as needed. When desired, we also return a structure describing any failure, including the nature and location of the error.
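As an illustration of what such failure information can look like, here is a scalar sketch of a UTF-8 validator that reports an error category and the byte offset where the offending sequence starts. The names `Utf8Error`, `ValidationResult` and `validate_utf8_report` are hypothetical; simdutf's own result type and error codes may differ, and its kernels use SIMD instructions rather than this byte-by-byte loop.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical error categories, chosen for this sketch only.
enum class Utf8Error { Success, HeaderBits, TooShort, TooLong, Overlong, TooLarge, Surrogate };

// On failure, `position` is the index of the first byte of the bad sequence;
// on success it equals the input length.
struct ValidationResult {
    Utf8Error error;
    std::size_t position;
};

ValidationResult validate_utf8_report(const char* buf, std::size_t len) {
    const auto* s = reinterpret_cast<const unsigned char*>(buf);
    std::size_t i = 0;
    while (i < len) {
        unsigned char b = s[i];
        if (b < 0x80) { ++i; continue; }  // ASCII byte
        std::size_t n;     // expected sequence length
        std::uint32_t cp;  // code point being decoded
        if ((b & 0xE0) == 0xC0) { n = 2; cp = b & 0x1F; }
        else if ((b & 0xF0) == 0xE0) { n = 3; cp = b & 0x0F; }
        else if ((b & 0xF8) == 0xF0) { n = 4; cp = b & 0x07; }
        else if ((b & 0xC0) == 0x80) { return {Utf8Error::TooLong, i}; }  // stray continuation byte
        else { return {Utf8Error::HeaderBits, i}; }                      // 0xF8..0xFF
        if (i + n > len) return {Utf8Error::TooShort, i};  // truncated at end of input
        for (std::size_t k = 1; k < n; ++k) {
            if ((s[i + k] & 0xC0) != 0x80) return {Utf8Error::TooShort, i};
            cp = (cp << 6) | (s[i + k] & 0x3F);
        }
        // Smallest code point that legitimately needs n bytes.
        static const std::uint32_t min_cp[5] = {0, 0, 0x80, 0x800, 0x10000};
        if (cp < min_cp[n]) return {Utf8Error::Overlong, i};
        if (cp > 0x10FFFF) return {Utf8Error::TooLarge, i};
        if (cp >= 0xD800 && cp <= 0xDFFF) return {Utf8Error::Surrogate, i};
        i += n;
    }
    return {Utf8Error::Success, len};
}
```

For example, `"ab\xC0\xAF"` is rejected as `Overlong` at position 2, because `0xC0 0xAF` spends two bytes on U+002F, which fits in one.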
For advanced x64 processors, we introduce a whole new AVX-512 kernel, which includes novel algorithms by @WojciechMula and @clausecker. It can be twice as fast as the previous kernels, reaching speeds close to 5 GB/s on non-trivial Unicode inputs. The library relies on runtime dispatching: if your processor supports the new kernel, it is used automatically. Currently supported processors include Ice Lake, Rocket Lake, and Zen 4.
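Runtime dispatching can be sketched as: detect CPU features once, pick a kernel, and cache the choice. This minimal sketch assumes GCC or Clang on x86 for `__builtin_cpu_supports`; the feature names checked, and the `Kernel` and `active_kernel` names, are illustrative assumptions rather than simdutf's actual detection logic. All entries point at the same scalar function so that the example runs on any machine.

```cpp
#include <cstddef>
#include <string_view>

// Stand-in kernel. A real library would compile separate AVX-512 and AVX2
// versions; here every entry uses this scalar loop, since only the selection
// logic is being illustrated.
static std::size_t count_ascii_scalar(const char* p, std::size_t n) {
    std::size_t c = 0;
    for (std::size_t i = 0; i < n; ++i) c += static_cast<unsigned char>(p[i]) < 0x80;
    return c;
}

struct Kernel {
    std::string_view name;
    std::size_t (*count_ascii)(const char*, std::size_t);
};

static Kernel detect_best_kernel() {
#if (defined(__GNUC__) || defined(__clang__)) && (defined(__x86_64__) || defined(__i386__))
    __builtin_cpu_init();  // make sure CPU feature data is populated
    if (__builtin_cpu_supports("avx512bw")) return {"avx512", count_ascii_scalar};  // would be an AVX-512 kernel
    if (__builtin_cpu_supports("avx2")) return {"avx2", count_ascii_scalar};        // would be an AVX2 kernel
#endif
    return {"fallback", count_ascii_scalar};  // portable path for other CPUs/compilers
}

// Detection runs on first use only; later calls reuse the cached choice.
const Kernel& active_kernel() {
    static const Kernel k = detect_best_kernel();
    return k;
}
```

A function-local `static` is initialized exactly once and thread-safely since C++11, so every later call costs one indirect call through `count_ascii`.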
On an Ice Lake processor, we get the following speeds with the Arabic-Lipsum.utf8.txt test file:
Library | UTF-8 to UTF-16 speed (GB/s) |
---|---|
simdutf (AVX-512) | 4.6 |
simdutf (AVX2) | 2.3 |
ICU | 1.4 |
iconv | 0.7 |
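A speed figure like those in the table is bytes processed divided by elapsed time. The sketch below assumes the speed counts input bytes and that 1 GB means 10^9 bytes, since the notes state neither; `gigabytes_per_second` and `best_throughput` are hypothetical helpers, not simdutf's benchmark code.

```cpp
#include <chrono>
#include <cstddef>

// Throughput in GB/s, taking 1 GB = 1e9 bytes (an assumption).
double gigabytes_per_second(std::size_t bytes, double seconds) {
    return static_cast<double>(bytes) / seconds / 1e9;
}

// Runs `fn(data, len)` `repeat` times and keeps the best throughput, which
// filters out one-off noise such as cold caches. `fn` must return std::size_t.
template <typename Fn>
double best_throughput(Fn fn, const char* data, std::size_t len, int repeat = 10) {
    double best = 0.0;
    for (int r = 0; r < repeat; ++r) {
        auto start = std::chrono::steady_clock::now();
        volatile std::size_t sink = fn(data, len);  // keep the call from being optimized away
        (void)sink;
        std::chrono::duration<double> elapsed = std::chrono::steady_clock::now() - start;
        if (elapsed.count() > 0) {
            double gbs = gigabytes_per_second(len, elapsed.count());
            if (gbs > best) best = gbs;
        }
    }
    return best;
}
```

Under these assumptions, 4.6 GB/s means a 1 MB (10^6 byte) input is processed in roughly 217 microseconds.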
Major changes
- AVX-512 kernel for Ice Lake / Zen 4 processors by @WojciechMula and @clausecker in #174
- Support for UTF-32 and UTF-16BE, and transcoding between UTF-32, UTF-16BE, UTF-16LE and UTF-8, by @NicolasJiaxin, @clausecker and others
- ASCII validation by @NicolasJiaxin in #110
- One-pass encoding autodetection by @NicolasJiaxin in #134
- Some functions now return a struct indicating success and length, by @NicolasJiaxin in #157
- iconv-like tool (sut) by @NicolasJiaxin in #160
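ASCII validation reduces to checking that no byte has its high bit set. A portable way to illustrate the idea without SIMD intrinsics is the word-at-a-time (SWAR) trick, which ORs eight bytes at a time and tests all eight high bits with one mask. This is a sketch of the general technique, not simdutf's implementation; `is_ascii` is a hypothetical name.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Returns true if every byte is < 0x80. Bytes are OR-ed into an accumulator
// with no early exit, so the loop has no data-dependent branches.
bool is_ascii(const char* buf, std::size_t len) {
    std::uint64_t acc = 0;
    std::size_t i = 0;
    for (; i + 8 <= len; i += 8) {
        std::uint64_t word;
        std::memcpy(&word, buf + i, 8);  // memcpy avoids unaligned-access UB
        acc |= word;
    }
    for (; i < len; ++i) acc |= static_cast<unsigned char>(buf[i]);  // leftover tail bytes
    // The mask has bit 7 set in every byte lane, so the test is endianness-independent.
    return (acc & 0x8080808080808080ULL) == 0;
}
```

Scanning the whole input without branching keeps the loop simple and fast on mostly-ASCII text; a variant that must report where the first non-ASCII byte sits would exit early instead.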
Performance
Bug fixes
- Fix valid_utf8_to_utf16.h producing invalid UTF-16 (issue 111) by @lemire in #119
- Fix buffer overrun on aarch64 by @wx257osn2 in #171
- Fix some typos by @striezel in #139
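The notes do not describe that bug itself. For context, a "valid" conversion routine trusts its input and skips validation, so its correctness rests entirely on the decoding and surrogate-pair arithmetic. Here is a scalar sketch of that arithmetic, assuming well-formed UTF-8 input and native-endian output; `utf8_to_utf16_trusted` is a hypothetical name, not the code in valid_utf8_to_utf16.h.

```cpp
#include <cstddef>
#include <cstdint>

// Converts UTF-8 that is ASSUMED valid to UTF-16; no error checks are made.
// `out` needs room for `len` units (UTF-16 never needs more units than the
// UTF-8 input has bytes). Returns the number of char16_t units written.
std::size_t utf8_to_utf16_trusted(const char* buf, std::size_t len, char16_t* out) {
    const auto* s = reinterpret_cast<const unsigned char*>(buf);
    char16_t* start = out;
    std::size_t i = 0;
    while (i < len) {
        unsigned char b = s[i];
        std::uint32_t cp;
        if (b < 0x80) {
            cp = b; i += 1;
        } else if (b < 0xE0) {
            cp = (b & 0x1Fu) << 6 | (s[i + 1] & 0x3Fu); i += 2;
        } else if (b < 0xF0) {
            cp = (b & 0x0Fu) << 12 | (s[i + 1] & 0x3Fu) << 6 | (s[i + 2] & 0x3Fu); i += 3;
        } else {
            cp = (b & 0x07u) << 18 | (s[i + 1] & 0x3Fu) << 12 |
                 (s[i + 2] & 0x3Fu) << 6 | (s[i + 3] & 0x3Fu);
            i += 4;
        }
        if (cp < 0x10000) {
            *out++ = static_cast<char16_t>(cp);  // BMP: one unit
        } else {
            cp -= 0x10000;  // 20 bits remain, split 10/10 across the pair
            *out++ = static_cast<char16_t>(0xD800 + (cp >> 10));   // high surrogate
            *out++ = static_cast<char16_t>(0xDC00 + (cp & 0x3FF)); // low surrogate
        }
    }
    return static_cast<std::size_t>(out - start);
}
```

Feeding it `a€😀` (8 bytes) yields 0x0061, 0x20AC, 0xD83D, 0xDE00: U+1F600 minus 0x10000 is 0xF600, whose top 10 bits (0x3D) go into the high surrogate and whose low 10 bits (0x200) go into the low surrogate.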
Testing
- Fuzzer for buffer overflow by @NicolasJiaxin in #163
- Update actions/checkout in GitHub Actions to v3 by @striezel in #138
Building
Benchmarking
- Added iconv to the benchmarks by @lemire in #164
- We use simpler performance counters because, on Graviton 2 (AWS), only two counters can be accessed at a time, by @lemire in #123
New Contributors
- @striezel made their first contribution in #139
- @danlark1 made their first contribution in #145
- @ThePhD made their first contribution in #149
- @wx257osn2 made their first contribution in #171
Full Changelog: v1.0.1...v2.0.0
Version 1.0.1
Minor fixes.
Version 1.0.0
Initial release.