Showing with 21 additions and 8 deletions.
  1. +1 −1 CMakeLists.txt
  2. +9 −5 README.md
  3. +3 −1 source/utf8/unchecked.h
  4. +5 −0 tests/test_unchecked_api.h
  5. +3 −1 utf8cppConfig.cmake.in
2 changes: 1 addition & 1 deletion CMakeLists.txt
@@ -1,5 +1,5 @@
cmake_minimum_required (VERSION 3.0.2...3.27)
-project (utf8cpp VERSION 3.2.4 LANGUAGES CXX)
+project (utf8cpp VERSION 3.2.5 LANGUAGES CXX)

if(CMAKE_SOURCE_DIR STREQUAL CMAKE_CURRENT_SOURCE_DIR)
set(IS_ROOT_PROJECT ON)
14 changes: 9 additions & 5 deletions README.md
@@ -155,13 +155,17 @@ The library was designed to be:
#### Alternatives
-In case you want to look into other means of working with UTF-8 strings from C++, here is the list of solutions I am aware of:
+Here is an article I was made aware of only recently: [The Wonderfully Terrible World of C and C++ Encoding APIs (with Some Rust)](https://thephd.dev/the-c-c++-rust-string-text-encoding-api-landscape), by JeanHeyd Meneide. In the article, this library is compared with:
-1. [ICU Library](http://icu.sourceforge.net/). It is very powerful, complete, feature-rich, mature, and widely used. Also big, intrusive, non-generic, and doesn't play well with the Standard Library. I definitely recommend looking at ICU even if you don't plan to use it.
-2. C++11 language and library features. Still far from complete, and not easy to use.
-3. [Glib::ustring](http://www.gtkmm.org/gtkmm2/docs/tutorial/html/ch03s04.html). A class specifically made to work with UTF-8 strings, and also feel like `std::string`. If you prefer to have yet another string class in your code, it may be worth a look. Be aware of the licensing issues, though.
-4. Platform dependent solutions: Windows and POSIX have functions to convert strings from one encoding to another. That is only a subset of what my library offers, but if that is all you need it may be good enough.
+- [simdutf](https://github.com/simdutf/simdutf)
+- [iconv](https://www.gnu.org/software/libiconv/)
+- [boost.text](https://github.com/tzlaine/text)
+- [ICU](https://unicode-org.github.io/icu/userguide/conversion/converters.html)
+- [encoding_rs](https://github.com/hsivonen/encoding_rs)
+- [Windows API functions for converting text between encodings](https://learn.microsoft.com/en-us/windows/win32/api/stringapiset/)
+- [ztd.text](https://github.com/soasis/text/)
+The article presents author's view of the quality of the API design, but also some speed benchmarks.
## Reference
4 changes: 3 additions & 1 deletion source/utf8/unchecked.h
@@ -155,7 +155,9 @@ namespace utf8
{
while (start != end) {
uint32_t cp = utf8::internal::mask16(*start++);
-// Take care of surrogate pairs first
+if (start == end)
+return result;
+// Take care of surrogate pairs first
if (utf8::internal::is_lead_surrogate(cp)) {
uint32_t trail_surrogate = utf8::internal::mask16(*start++);
cp = (cp << 10) + trail_surrogate + internal::SURROGATE_OFFSET;
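
Review note: as the hunk reads here, the `start == end` check sits before the lead-surrogate test, so it would return after reading any final code unit, not only after a lone lead surrogate. The block below is a minimal standalone sketch of a narrower guard that checks for the end of input only once a lead surrogate has been seen. It is not the library's code: `sketch_utf16to8`, `append_utf8`, and `sketch_self_check` are hypothetical names, and like the unchecked API it assumes otherwise well-formed input (the trail unit is not validated).

```cpp
#include <cassert>
#include <cstdint>
#include <iterator>
#include <string>

// Hypothetical helper: encode one code point as UTF-8, without validation.
template <typename OutIt>
OutIt append_utf8(std::uint32_t cp, OutIt out) {
    if (cp < 0x80) {
        *out++ = static_cast<char>(cp);
    } else if (cp < 0x800) {
        *out++ = static_cast<char>((cp >> 6) | 0xC0);
        *out++ = static_cast<char>((cp & 0x3F) | 0x80);
    } else if (cp < 0x10000) {
        *out++ = static_cast<char>((cp >> 12) | 0xE0);
        *out++ = static_cast<char>(((cp >> 6) & 0x3F) | 0x80);
        *out++ = static_cast<char>((cp & 0x3F) | 0x80);
    } else {
        *out++ = static_cast<char>((cp >> 18) | 0xF0);
        *out++ = static_cast<char>(((cp >> 12) & 0x3F) | 0x80);
        *out++ = static_cast<char>(((cp >> 6) & 0x3F) | 0x80);
        *out++ = static_cast<char>((cp & 0x3F) | 0x80);
    }
    return out;
}

// Hypothetical sketch: UTF-16 to UTF-8 with the end-of-input check
// placed inside the lead-surrogate branch.
template <typename InIt, typename OutIt>
OutIt sketch_utf16to8(InIt start, InIt end, OutIt out) {
    while (start != end) {
        std::uint32_t cp = static_cast<std::uint16_t>(*start++);
        if (cp >= 0xD800u && cp <= 0xDBFFu) {  // lead surrogate
            if (start == end)
                return out;  // lone lead surrogate at the end: stop, do not read past it
            // Assumption: the next unit is a valid trail surrogate (not checked).
            std::uint32_t trail = static_cast<std::uint16_t>(*start++);
            cp = 0x10000u + ((cp - 0xD800u) << 10) + (trail - 0xDC00u);
        }
        out = append_utf8(cp, out);
    }
    return out;
}

// Small usage example: a final BMP unit is still written,
// and a lone lead surrogate produces no output.
inline void sketch_self_check() {
    unsigned short abc[] = {0x61, 0x62, 0x63};
    std::string out;
    sketch_utf16to8(abc, abc + 3, std::back_inserter(out));
    assert(out == "abc");

    unsigned short lone[] = {0xd800};
    out.clear();
    sketch_utf16to8(lone, lone + 1, std::back_inserter(out));
    assert(out.empty());
}
```

With the guard in this position, a test could assert on the converted output (for example, that a three-unit ASCII input yields three bytes) in addition to checking that a lone lead surrogate does not crash.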
5 changes: 5 additions & 0 deletions tests/test_unchecked_api.h
@@ -137,6 +137,11 @@ TEST(UnCheckedAPITests, test_utf16to8)
string utf8result;
utf8::unchecked::utf16to8(utf16string, utf16string + 5, back_inserter(utf8result));
EXPECT_EQ (utf8result.size(), 10);
+
+utf8result.clear();
+unsigned short highsurrogateonly[] = {0xd800};
+utf8::unchecked::utf16to8(highsurrogateonly, highsurrogateonly + 1, back_inserter(utf8result));
+EXPECT_TRUE(true); // we didn't crash
}

TEST(UnCheckedAPITests, test_utf8to16)
4 changes: 3 additions & 1 deletion utf8cppConfig.cmake.in
@@ -3,4 +3,6 @@
include("${CMAKE_CURRENT_LIST_DIR}/utf8cppTargets.cmake")
check_required_components( "utf8cpp" )

-add_library(utf8::cpp ALIAS utf8cpp)
+if(NOT TARGET utf8::cpp)
+add_library(utf8::cpp ALIAS utf8cpp)
+endif()