Constant string and UNICODE strings
Suppose you have a class or structure that has a string field. You can store it as a raw pointer, for example:
struct foo {
const char *foo_str;
};
This works only while the memory block the pointer refers to is still valid. Otherwise the behavior is undefined.
#include <iostream>
struct foo_t {
    const char *foo_str;
};
static void print_foo(const foo_t& foo) {
    std::cout << foo.foo_str << std::endl;
}
int main(int argc, const char** argv) {
    foo_t foo;
    {
        char hello_word_str[] = "Hello world!";
        // just store the address of the stack array
        foo.foo_str = hello_word_str;
        // ok, the array is still alive on the stack
        print_foo(foo);
    }
    // undefined behavior: hello_word_str is out of scope, so foo.foo_str dangles
    print_foo(foo);
    return 0;
}
The classic C++ approach is to use std::string instead.
#include <iostream>
#include <string>
struct foo_t {
    std::string foo_str;
};
void print_foo(const foo_t& foo);
int main(int argc, const char** argv) {
    // constructs an empty string; most implementations do not allocate yet
    foo_t foo;
    {
        char hello_word_str[] = "Hello world!";
        // deep copies the stack array into storage owned by foo_str
        // (a heap block, unless the implementation keeps short strings in an internal buffer)
        foo.foo_str = hello_word_str;
        // ok, foo_str owns its own copy of the characters
        print_foo(foo);
    }
    // ok, foo_str owns its own copy, which does not depend on the stack array
    print_foo(foo);
    return 0;
}
void print_foo(const foo_t& foo) {
    // copy construction makes another deep copy of the character array,
    // and allocates another memory block when the string is too long for an internal buffer;
    // this is not a case for copy elision, and no other optimization is guaranteed
    std::string message = foo.foo_str;
    std::cout << message << std::endl;
}
As you can see, this approach gives worse performance and, at the same time, uses more memory. This happens because std::string is designed to be mutable. Each std::string must therefore own its character array, so a copy has to allocate another memory block and deep copy the original array; otherwise application behavior would be undefined.
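The deep copy described above can be observed directly. The following is a minimal sketch, assuming a C++11-conforming std::string (one that does not use copy on write); the helper names copy_owns_separate_buffer and check_string_copies are hypothetical and exist only for illustration.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: true when a copy of `original` owns a buffer of its own.
bool copy_owns_separate_buffer(const std::string& original) {
    std::string copy = original;            // deep copy of the characters
    return copy.data() != original.data();  // two objects, two buffers
}

// Hypothetical usage check: short and long strings are both copied.
void check_string_copies() {
    assert(copy_owns_separate_buffer("Hello world!"));
    assert(copy_owns_separate_buffer(std::string(1000, 'x')));
}
```

Calling check_string_copies() from main exercises both cases. Whether the copy lives on the heap or in an internal small-string buffer depends on the standard library, but in either case the characters are duplicated.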
How to improve it?
For example, you can store the string in a std::shared_ptr or boost::shared_array, and you may also need to put this smart pointer into a std::weak_ptr. This is not really practical, and it costs extra memory: shared_ptr needs a reference counter (i.e. a std::atomic_size_t) and a pointer to it next to the pointer to the data, and a std::weak_ptr adds one more such pointer.
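As a rough illustration of the shared_ptr approach, here is a minimal sketch. The names shared_foo_t, make_shared_foo and check_shared_foo are hypothetical and are not part of any library discussed here.

```cpp
#include <cassert>
#include <memory>
#include <string>

// Hypothetical variant of foo_t that shares one immutable string.
struct shared_foo_t {
    std::shared_ptr<const std::string> foo_str;
};

// Allocates the string once; later copies of shared_foo_t share it.
shared_foo_t make_shared_foo(const char* text) {
    return shared_foo_t{ std::make_shared<std::string>(text) };
}

// Hypothetical usage check.
void check_shared_foo() {
    shared_foo_t a = make_shared_foo("Hello world!");
    shared_foo_t b = a;  // shallow copy: only the counter is incremented
    assert(a.foo_str.get() == b.foo_str.get());
    assert(a.foo_str.use_count() == 2);

    std::weak_ptr<const std::string> observer = a.foo_str;
    assert(observer.use_count() == 2);  // weak references are not counted as owners
    assert(!observer.expired());
}
```

Copies are cheap here, but every handle carries a pointer to the control block in addition to the pointer to the data, which is the memory overhead mentioned above.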
Another way is io::const_string. io::const_string is actually a smart pointer similar to boost::intrusive_ptr, with an embedded atomic reference counting strategy. io::const_string is designed to be immutable, so its copy constructor simply increases the reference count (shallow copy) rather than deep copying the original character array. Let's use const_string in our previous example:
#include <iostream>
// the library header that declares io::const_string must be included as well
struct foo_t {
    io::const_string foo_str;
};
void print_foo(const foo_t& foo);
int main(int argc, const char** argv) {
    // this will construct an empty const_string, with a nullptr character array
    // even so, foo.foo_str.data() returns "" rather than nullptr
    foo_t foo;
    {
        char hello_word_str[] = "Hello world!";
        // this will deep copy stack memory into a new heap memory block
        foo.foo_str = io::const_string(hello_word_str);
        // ok, character array in heap, or "" when out of memory
        print_foo(foo);
    }
    // ok, character array in heap (not the stack), or "" when out of memory
    print_foo(foo);
    return 0;
}
void print_foo(const foo_t& foo) {
    // this will simply increase the reference count of the character array
    io::const_string message = foo.foo_str;
    std::cout << message << std::endl;
}
Another benefit of const_string: unlike std::string, the const_string constructor never throws, even in an out of memory situation. This allows you to use const_string with compiler RTTI and exceptions turned off, without custom error handling through custom new and terminate handlers. If you'd like to check that a const_string was successfully constructed, you can always call the empty() method. This method checks whether the underlying memory array is nullptr, which is the defined result of new (std::nothrow) char[size] or malloc in the out of memory state. At the same time, the io::const_string constructor will call the std::new_handler in case of out of memory.
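To make the two ideas above concrete (a shallow copy through an embedded atomic counter, and a constructor that reports failure through empty() instead of throwing), here is a minimal sketch. It is not the io::const_string implementation: the class toy_const_string, its use_count() member and the memory layout are assumptions made for illustration, and the sketch has no small string optimization and does not call the new handler.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cstring>
#include <new>
#include <utility>

// Hypothetical immutable string with an embedded (intrusive) reference counter.
class toy_const_string {
public:
    toy_const_string() noexcept : block_(nullptr) {}

    explicit toy_const_string(const char* str) noexcept : block_(nullptr) {
        if (str == nullptr || *str == '\0') return;
        const std::size_t len = std::strlen(str);
        // One allocation holds the counter followed by the characters.
        void* raw = std::malloc(sizeof(header) + len + 1);
        if (raw == nullptr) return;  // out of memory: stay empty, do not throw
        block_ = new (raw) header;   // counter starts at 1
        std::memcpy(chars(), str, len + 1);
    }

    // Shallow copy: share the block and increment the counter.
    toy_const_string(const toy_const_string& other) noexcept : block_(other.block_) {
        if (block_) block_->refs.fetch_add(1, std::memory_order_relaxed);
    }

    toy_const_string& operator=(const toy_const_string& other) noexcept {
        toy_const_string tmp(other);
        std::swap(block_, tmp.block_);  // tmp releases the old block
        return *this;
    }

    ~toy_const_string() { release(); }

    bool empty() const noexcept { return block_ == nullptr; }
    const char* data() const noexcept { return block_ ? chars() : ""; }
    std::size_t use_count() const noexcept { return block_ ? block_->refs.load() : 0; }

private:
    struct header { std::atomic<std::size_t> refs{1}; };

    char* chars() const noexcept { return reinterpret_cast<char*>(block_ + 1); }

    void release() noexcept {
        if (block_ && block_->refs.fetch_sub(1, std::memory_order_acq_rel) == 1) {
            block_->~header();
            std::free(block_);
        }
    }

    header* block_;
};

// Hypothetical usage check.
void check_toy_const_string() {
    toy_const_string a("Hello world!");
    toy_const_string b = a;          // no character data is copied
    assert(a.data() == b.data());
    assert(a.use_count() == 2u);
    toy_const_string none;
    assert(none.empty() && std::strcmp(none.data(), "") == 0);
}
```

Because the counter lives in the same block as the characters, each handle is a single pointer, which is the main difference from the std::shared_ptr approach.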
It is expected that you will store UTF-8 Unicode characters (or ASCII/Latin1) inside const_string. When you need a two or four byte Unicode representation, you can convert a const_string into a mutable std::basic_string using the convert_to_u16() or convert_to_u32() member functions, or the convert_to_ucs member function for the system wchar_t Unicode.
NOTE: On GNU/Linux wchar_t is 32 bits long and stores UTF-32LE or UTF-32BE depending on the CPU endianness, whereas on Windows wchar_t is always two bytes (UTF-16LE), no matter whether you are using the MS VC++ or the MinGW[64] compiler.
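The kind of conversion involved can be sketched with the standard library alone. The functions below are hypothetical stand-ins written for illustration; they are not the library's convert_to_u16(), convert_to_u32() or convert_to_ucs, and they assume well-formed UTF-8 input (no validation is performed).

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Decodes UTF-8 into UTF-32. Assumes well-formed input.
std::u32string utf8_to_u32(const std::string& utf8) {
    std::u32string out;
    std::size_t i = 0;
    while (i < utf8.size()) {
        const unsigned char lead = static_cast<unsigned char>(utf8[i]);
        char32_t cp;
        std::size_t extra;  // number of continuation bytes that follow
        if (lead < 0x80)      { cp = lead;        extra = 0; }
        else if (lead < 0xE0) { cp = lead & 0x1F; extra = 1; }
        else if (lead < 0xF0) { cp = lead & 0x0F; extra = 2; }
        else                  { cp = lead & 0x07; extra = 3; }
        ++i;
        for (std::size_t k = 0; k < extra && i < utf8.size(); ++k, ++i) {
            cp = (cp << 6) | (static_cast<unsigned char>(utf8[i]) & 0x3F);
        }
        out.push_back(cp);
    }
    return out;
}

// Encodes UTF-32 as UTF-16, using surrogate pairs above U+FFFF.
std::u16string u32_to_u16(const std::u32string& u32) {
    std::u16string out;
    for (char32_t cp : u32) {
        if (cp < 0x10000) {
            out.push_back(static_cast<char16_t>(cp));
        } else {
            cp -= 0x10000;
            out.push_back(static_cast<char16_t>(0xD800 + (cp >> 10)));
            out.push_back(static_cast<char16_t>(0xDC00 + (cp & 0x3FF)));
        }
    }
    return out;
}

// Picks UTF-16 or UTF-32 code units depending on the width of wchar_t.
std::wstring utf8_to_wide(const std::string& utf8) {
    const std::u32string u32 = utf8_to_u32(utf8);
    if (sizeof(wchar_t) == 2) {
        const std::u16string u16 = u32_to_u16(u32);
        return std::wstring(u16.begin(), u16.end());
    }
    return std::wstring(u32.begin(), u32.end());
}

// Hypothetical usage check: 'h', U+00E9 and U+1F600.
void check_conversions() {
    const std::string utf8 = "h\xC3\xA9\xF0\x9F\x98\x80";
    const std::u32string u32 = utf8_to_u32(utf8);
    assert(u32.size() == 3 && u32[1] == 0xE9 && u32[2] == 0x1F600);
    const std::u16string u16 = u32_to_u16(u32);
    assert(u16.size() == 4 && u16[2] == 0xD83D && u16[3] == 0xDE00);
    assert(utf8_to_wide(utf8).size() == (sizeof(wchar_t) == 2 ? 4u : 3u));
}
```

The last function shows why the wchar_t width matters: the same UTF-8 input yields a different number of code units on a platform with a 16-bit wchar_t than on one with a 32-bit wchar_t.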
To reduce memory usage and to improve general performance by minimizing atomic reference counting operations, const_string uses the small string optimization (SSO) technique in addition to copy on write. SSO is applied to strings up to 14 characters long on 64-bit targets and up to 7 characters long on 32-bit targets.