GitHub - maxdz-gmbh/mdz_string: Very lightweight and portable ANSI C 89/90 compliant library for handling single-byte (ASCII/ANSI) strings and Unicode strings.

September 2024 NOTE: This repo is obsolete. Please use the following repos for processing single-byte (ASCII/ANSI) strings instead:

mdz_ansi_alg for string processing algorithms
mdz_ansi_16 for short attached strings
mdz_ansi_dyn for long dynamically-allocated strings

Repos with Unicode string functions (for UTF-8, UTF-16, UTF-32, wchar_t) will follow soon.

NOTE: All 0.x releases are kind of "alpha-versions" without expectations of API backward-compatibility.

mdz_string Overview

Wiki: mdz_string Wiki
file: "mdz_string.h"

Please take a look at the "mdz_string.h" file or the mdz_string Wiki site for detailed function descriptions.

mdz_string is a very lightweight and versatile C library for handling single-byte (ASCII/ANSI) strings and Unicode strings, developed by maxdz Software GmbH. The library supports ASCII/ANSI, UTF-8, UTF-16, UTF-32 and wchar_t strings. The source code is highly portable and conforms to the ANSI C 89/90 standard. Builds for Win32/Win64, Linux, FreeBSD, Android and macOS are available.

Only shared/dynamically loaded libraries (.so and .dll files with import libraries) are available for evaluation testing purposes. Static libraries/source code are covered by our commercial licenses.

Linux binaries are built against Linux Kernel 2.6.18 and thus should be compatible with Debian (from ver. 4.0), Ubuntu (from ver. 8.04) and Fedora (from ver. 9).

FreeBSD binaries - may be used from FreeBSD ver. 7.0

Win32 binaries are built using Visual Studio Platform Toolset "v90", thus are compatible with Windows versions from Windows 2000.
Win64 binaries are built using Visual Studio Platform Toolset "v100", thus are compatible with Windows versions from Windows XP.

Android x86/armeabi-v7a binaries - may be used from Android API level 16 ("Jelly Bean" ver. 4.1.x)
Android x86_64/arm64-v8a binaries - may be used from Android API level 21 ("Lollipop" ver. 5.0)

macOS binaries - x86_64, from MacOS X v10.6.0

mdz_string Advantages

1. High portability: the whole code conforms to the ANSI C 89/90 standard. The multithreading/asynchronous part is POSIX-compatible (under UNIX/Linux).

2. Few dependencies: mdz_string functions depend only on the standard C library memory-management/access functions. The multithreading part depends on the POSIX pthreads API (under UNIX/Linux) and on the old process control/synchronization API (available since Windows 2000). This means you can use the library in your code without any further dependencies beyond the standard platform libraries/APIs.

3. Fast: our single-byte (ASCII/ANSI) strings are very fast for operations like searching, insertion and deletion, especially on very large strings (hundreds of megabytes or gigabytes).

4. Flexibility: nearly all functions of our single-byte (ASCII/ANSI) strings take not only a "left position" but also a "right position" parameter to limit the processed area from the right. The library also contains more string functions than the corresponding STL, boost or glib analogs.

5. Extended error-checking: all functions preserve an internal error code that identifies the problem. You can use strict error-checking (every preserved error code must be MDZ_ERROR_NONE) or "relaxed" checking, where only a returned mdz_false indicates an error.

6. Extended control: strings perform only explicit operations. For example, when an "insert" function is called with the auto-reservation flag set to mdz_false, it returns an error if the string does not have enough capacity. No implicit reservations are made.

7. Attached usage: strings do not have to use dynamically allocated memory, which may not be available on your embedded system (or malloc()/free() may be forbidden in your safety-critical software). Just attach the string/data to your statically allocated memory and use the full string functionality.

8. Cache-friendly: the controlling and data parts can be kept together in memory using an "embedded part".

9. Unicode support: UTF-8, UTF-16, UTF-32 are supported.

10. wchar_t support: wchar_t strings are also supported, with 2-byte and 4-byte wchar_t characters.

11. Endianness-aware strings: wchar, utf16 and utf32 strings are endianness-aware and can therefore be used to produce and manipulate strings with a pre-defined endianness even if the endianness of the host differs.

12. Unicode "surrogate-pairs" awareness: 2-byte Unicode strings correctly process/distinguish "surrogate-pairs" as 1 Unicode symbol.

13. Asynchronous execution: almost all functions of single-byte (ASCII/ANSI) strings and insert functions can be executed asynchronously.
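To make points 5 to 7 more concrete, here is a minimal, self-contained sketch of a string attached to caller-owned memory that never grows implicitly and preserves an error code after each call. All names (AttachedStr, str_attach, str_insert, STR_ERROR_*) are hypothetical and chosen for illustration only; this is not the mdz_string API, whose real declarations are in the headers.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical names throughout: this is NOT the mdz_string API. */
typedef enum { STR_ERROR_NONE = 0, STR_ERROR_PARAM, STR_ERROR_CAPACITY } StrError;

typedef struct
{
  char*    data;      /* caller-owned buffer, e.g. a static array */
  size_t   size;      /* bytes in use, excluding the terminating 0 */
  size_t   capacity;  /* usable bytes, excluding the terminating 0 */
  StrError lastError; /* preserved error code of the most recent call */
} AttachedStr;

/* Attach the string to caller-provided memory; nothing is allocated. */
int str_attach(AttachedStr* s, char* buffer, size_t bufferSize)
{
  if (s == NULL) return 0;
  s->data = NULL;
  s->size = 0;
  s->capacity = 0;
  if (buffer == NULL || bufferSize == 0) { s->lastError = STR_ERROR_PARAM; return 0; }
  s->data = buffer;
  s->capacity = bufferSize - 1;
  s->data[0] = '\0';
  s->lastError = STR_ERROR_NONE;
  return 1;
}

/* Insert count bytes at position pos. The buffer is never grown: if the
   bytes do not fit, the call fails and the string is left unchanged. */
int str_insert(AttachedStr* s, size_t pos, const char* items, size_t count)
{
  if (s == NULL) return 0;
  if (s->data == NULL || items == NULL || count == 0 || pos > s->size)
  {
    s->lastError = STR_ERROR_PARAM;
    return 0;
  }
  if (count > s->capacity - s->size) { s->lastError = STR_ERROR_CAPACITY; return 0; }
  memmove(s->data + pos + count, s->data + pos, s->size - pos + 1); /* moves the 0 too */
  memcpy(s->data + pos, items, count);
  s->size += count;
  s->lastError = STR_ERROR_NONE;
  return 1;
}

/* Usage: a string living entirely in a static 8-byte buffer. */
void demo_attached(void)
{
  static char buffer[8];
  AttachedStr s;
  int ok;

  ok = str_attach(&s, buffer, sizeof buffer);  /* capacity is 7 */
  assert(ok);
  ok = str_insert(&s, 0, "cde", 3);            /* "cde" */
  assert(ok && s.lastError == STR_ERROR_NONE);
  ok = str_insert(&s, 0, "b", 1);              /* "bcde" */
  assert(ok && strcmp(s.data, "bcde") == 0);
  ok = str_insert(&s, s.size, "fghi", 4);      /* 4 + 4 > 7: rejected */
  assert(!ok && s.lastError == STR_ERROR_CAPACITY);
  assert(strcmp(s.data, "bcde") == 0);         /* string is unchanged */
}
```

In this sketch, "relaxed" checking corresponds to testing only the returned value, while strict checking would additionally compare lastError against STR_ERROR_NONE after every call.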

Performance Comparison

Performance comparison tables for mdz_ansi_find() and mdz_ansi_firstOf() give an idea of the overall performance of the mdz_ansi library on different platforms compared to STL and the standard C library. Modern implementations of STL and the standard C library are pretty fast, using optimized versions of memory-access functions.

mdz_ansi_find Test

The following tests are executed:

Test "1/100M": Find 1 byte at the end of a 100M bytes long string
Test "5/100M": Find a 5 bytes long string at the end of a 100M bytes long string
Test "10/100M": Find a 10 bytes long string at the end of a 100M bytes long string
Test "100/100M": Find a 100 bytes long string at the end of a 100M bytes long string
Test "1K/100M": Find a 1K bytes long string at the end of a 100M bytes long string
Test "500K/1M": Find a 500K bytes long string at the end of a 1M bytes long string
Test "100M-100/100M": Find a "100M minus 100" bytes long string at the end of a 100M bytes long string
Test "100M/100M": Find a 100M bytes long string in a 100M bytes long string

For Windows 10 (64-bit) on Intel i5-6600 @ 3.30GHz (4 cores/4 threads)

Monotone test: the "long string" and the "string to find" are both filled with '1's; the last position of both strings holds '2'.
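The monotone input described above is easy to reproduce. The following sketch builds both strings and times a single strstr() search. The helper names make_monotone and run_monotone_case are illustrative assumptions, and the portable clock() function (which reports CPU time) stands in for the QueryPerformanceCounter() used for the published numbers, so results are not directly comparable with the tables.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

/* Allocate a zero-terminated string of length len: all '1', last byte '2'.
   Returns NULL for len == 0 or on allocation failure. Caller frees. */
char* make_monotone(size_t len)
{
  char* s;
  if (len == 0) return NULL;
  s = (char*)malloc(len + 1);
  if (s == NULL) return NULL;
  memset(s, '1', len - 1);
  s[len - 1] = '2';
  s[len] = '\0';
  return s;
}

/* Run one "needleLen/hayLen" case with strstr(). Returns the match offset,
   or (size_t)-1 if nothing was found or memory could not be allocated.
   The elapsed CPU time in microseconds is stored in *pMicros if non-NULL. */
size_t run_monotone_case(size_t hayLen, size_t needleLen, double* pMicros)
{
  char* hay;
  char* needle;
  const char* hit;
  clock_t start, stop;
  size_t offset = (size_t)-1;

  assert(needleLen > 0 && needleLen <= hayLen);
  hay = make_monotone(hayLen);
  needle = make_monotone(needleLen);
  if (hay != NULL && needle != NULL)
  {
    start = clock();
    hit = strstr(hay, needle);
    stop = clock();
    if (hit != NULL) offset = (size_t)(hit - hay);
    if (pMicros != NULL)
      *pMicros = 1e6 * (double)(stop - start) / (double)CLOCKS_PER_SEC;
  }
  free(hay);
  free(needle);
  return offset;
}
```

Because '2' occurs only in the last position, the only possible match ends at the end of the long string, so the expected offset is always hayLen minus needleLen. A call such as run_monotone_case(100000000, 5, &micros) would correspond to the "5/100M" case, memory permitting.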

monotone = MDZ_FIND_MONOTONE method
clib = MDZ_FIND_CLIB method (standard C library)
brute = MDZ_FIND_BRUTE method
bmh = MDZ_FIND_BMH method

VC++ toolset v140 (32-bit)
(all numbers are in microseconds measured using QueryPerformanceCounter() in main execution thread)

Test	mdz_ansi, monotone	mdz_ansi, clib	mdz_ansi, brute	mdz_ansi, bmh	std::string.find()	clib (strstr())
1/100M	70,351				162,681	70,579
5/100M	407,668	460,052	3,045,869	781,655	3,381,061	482,075
10/100M	1,334,782	707,712	4,394,022	780,128	4,206,329	731,395
100/100M	1,333,516	10,914,646	15,779,350	781,370	15,652,407	11,253,026
1K/100M	1,332,838	70,179,989	139,398,637	781,439	139,808,212	75,808,535
500K/1M	13,202	166,409,422	323,375,345	9,411	324,276,637	178,302,908
100M-100/100M	1,262,919	10,884,012	14,182,350	1,066,737	14,150,110	10,383,086
100M/100M	117,970				144,573	114,565

MinGW/gcc toolset (32-bit)
(all numbers are in microseconds measured using QueryPerformanceCounter() in main execution thread)

Test	mdz_ansi, monotone	mdz_ansi, clib	mdz_ansi, brute	mdz_ansi, bmh
1/100M	148,067
5/100M	534,070	1,599,882	6,825,862	784,326
10/100M	551,404	3,635,378	7,898,385	783,832
100/100M	550,701	32,447,796	20,451,496	786,006
1K/100M	551,213	348,052,489	117,762,194	784,335
500K/1M	7,851	814,620,053	246,574,213	6,263
100M-100/100M	997,729	33,028,357	11,705,985	456,680
100M/100M	328,564
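The "bmh" column presumably refers to the Boyer-Moore-Horspool algorithm; that reading is an assumption, since the README does not spell out the abbreviation. For orientation, here is a minimal textbook sketch of Boyer-Moore-Horspool over byte ranges. It is not the library's implementation, and the function name bmh_find is hypothetical.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Boyer-Moore-Horspool search over byte ranges (no zero terminator needed).
   Returns the offset of the first match, or (size_t)-1 if there is none. */
size_t bmh_find(const unsigned char* hay, size_t hayLen,
                const unsigned char* needle, size_t needleLen)
{
  size_t shift[UCHAR_MAX + 1];
  size_t i, pos;

  assert(hay != NULL && needle != NULL);
  if (needleLen == 0 || needleLen > hayLen) return (size_t)-1;

  /* Bad-character table: how far the window may move when its last byte
     is a given value. Bytes absent from the needle allow a full jump. */
  for (i = 0; i <= UCHAR_MAX; ++i) shift[i] = needleLen;
  for (i = 0; i + 1 < needleLen; ++i) shift[needle[i]] = needleLen - 1 - i;

  pos = 0;
  while (pos <= hayLen - needleLen)
  {
    /* Compare the window with the needle from right to left. */
    size_t j = needleLen;
    while (j > 0 && hay[pos + j - 1] == needle[j - 1]) --j;
    if (j == 0) return pos;
    pos += shift[hay[pos + needleLen - 1]];
  }
  return (size_t)-1;
}
```

On the monotone input, this sketch rejects every non-matching window after a single comparison, because the window ends in '1' while the needle ends in '2'; the work per window therefore does not grow with the needle length.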

mdz_ansi_firstOf Test

The following tests are executed:

Test "1/100M": Find first of 1 byte at the end of a 100M bytes long string
Test "5/100M": Find first of 5 bytes at the end of a 100M bytes long string
Test "20/100M": Find first of 20 bytes at the end of a 100M bytes long string
Test "50/100M": Find first of 50 bytes at the end of a 100M bytes long string
Test "100/100M": Find first of 100 bytes at the end of a 100M bytes long string
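A "first of" search returns the position of the first byte that belongs to a given set of bytes. One common way to make its cost independent of the set size is a 256-entry membership table, sketched below. This is a generic technique shown for illustration; whether mdz_ansi_firstOf() works this way is not stated in this README, and the name first_of is hypothetical.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <string.h>

/* Return the offset of the first byte of hay that occurs in set,
   or (size_t)-1 if no byte of hay is in the set. */
size_t first_of(const unsigned char* hay, size_t hayLen,
                const unsigned char* set, size_t setLen)
{
  unsigned char inSet[UCHAR_MAX + 1];
  size_t i;

  assert(hay != NULL && set != NULL);

  /* Mark every byte value of the set once: O(setLen). */
  memset(inSet, 0, sizeof inSet);
  for (i = 0; i < setLen; ++i) inSet[set[i]] = 1;

  /* One table lookup per byte of hay: O(hayLen). */
  for (i = 0; i < hayLen; ++i)
    if (inSet[hay[i]]) return i;

  return (size_t)-1;
}
```

For zero-terminated inputs with a match, the result equals that of the standard strcspn(); the difference is that strcspn() returns the string length when nothing is found, while this sketch returns (size_t)-1.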

For Windows 10 (64-bit) on Intel i5-6600 @ 3.30GHz (4 cores/4 threads)

VC++ toolset v140 (32-bit)
(all numbers are in microseconds measured using QueryPerformanceCounter() in main execution thread)

Test	mdz_ansi	std::string.find_first_of()	clib (strcspn())
1/100M	70,078	163,666	2,085,714
5/100M	370,204	3,719,660	2,077,677
20/100M	369,162	5,714,212	2,076,031
50/100M	368,994	10,965,401	2,078,038
100/100M	369,360	18,727,283	2,076,740

MinGW/gcc toolset (32-bit)
(all numbers are in microseconds measured using QueryPerformanceCounter() in main execution thread)

Test	mdz_ansi
1/100M	153,511
5/100M	278,387
20/100M	276,389
50/100M	275,956
100/100M	277,709

mdz_string Usage

mdz_string is implemented with strict checking of input parameters. This means that mdz_false or some other error indication is returned if one or more input parameters are invalid, even if such invalidity does not lead to an inconsistency (for example, adding or removing 0 items).

Test license generation: to get a free test license, please go to our shop page (maxdz Shop) and register an account. After registration you can obtain free 30-day test licenses for our products using the "Obtain for free" button. The test license data should be passed to the mdz_string_init() call for library initialization.

NOTE: All 0.x releases are kind of "beta-versions" and can be used 1) only with test-license (during test period of 30 days, with necessity to re-generate license for the next 30 days test period) and 2) without expectations of interface backward-compatibility.

Code Example

mdz_string_init() with license information should be called for library initialization before any subsequent calls:

#include <mdz_string.h>

int main(int argc, char* argv[])
{
  /* mdz_string library initialization using test info retrieved after license generation (see "Test license generation" above) */
  
  mdz_bool bRet = mdz_string_init("<first-name-hash>", "<last-name-hash>", "<email-hash>", "<license-hash>");
  ...

  mdz_string_uninit(); /* call for un-initialization of library */
  
  return 0;  
}

After library initialization, call mdz_utf8_create() to create a utf8 string. Every create call needs a matching mdz_utf8_destroy() call, otherwise the memory allocated for the string remains occupied:

#include <mdz_string.h>
#include <mdz_utf8.h>

int main(int argc, char* argv[])
{
  mdz_bool bRet = mdz_string_init("<first-name-hash>", "<last-name-hash>", "<email-hash>", "<license-hash>");
  
  /* create pUtf8 */
  
  mdz_Utf8* pUtf8 = mdz_utf8_create(0); /* create utf8-string */
  ...
  ...
  /* use pUtf8 */
  ...
  ...
  /* destroy pUtf8 */
  
  mdz_utf8_destroy(&pUtf8); /* after this pUtf8 should be NULL */
  
  mdz_string_uninit();
  ...
}

Use the mdz_Utf8* pointer for subsequent library calls:

#include <mdz_string.h>
#include <mdz_utf8.h>

int main(int argc, char* argv[])
{
  mdz_bool bRet = mdz_string_init("<first-name-hash>", "<last-name-hash>", "<email-hash>", "<license-hash>");
  
  mdz_Utf8* pUtf8 = mdz_utf8_create(0); /* create utf8-string */

  /* reserve memory for 5 elements */
  
  bRet = mdz_utf8_reserve(pUtf8, 5);
  
  /* insert 'b' in front position with auto-reservation if necessary */
  
  bRet = mdz_utf8_insertAnsi(pUtf8, 0, "b", 1, mdz_true); /* "b" after this call */
  
  /* append "cde" to the string with auto-reservation if necessary */
  
  bRet = mdz_utf8_insert(pUtf8, (size_t) -1, "cde", 3, mdz_true); /* "bcde" after this call */
  
  ...
  
  mdz_utf8_destroy(&pUtf8);
  
  mdz_string_uninit();
  ...
}

mdz_ansi Overview

Wiki: mdz_ansi Wiki
file: "mdz_ansi.h"

Please take a look at the "mdz_ansi.h" file or the mdz_ansi Wiki site for detailed function descriptions.

mdz_wchar Overview

Wiki: mdz_wchar Wiki
file: "mdz_wchar.h"

Please take a look at the "mdz_wchar.h" file or the mdz_wchar Wiki site for detailed function descriptions.

mdz_utf8 Overview

Wiki: mdz_utf8 Wiki
file: "mdz_utf8.h"

Please take a look at the "mdz_utf8.h" file or the mdz_utf8 Wiki site for detailed function descriptions.

mdz_utf16 Overview

Wiki: mdz_utf16 Wiki
file: "mdz_utf16.h"

Please take a look at the "mdz_utf16.h" file or the mdz_utf16 Wiki site for detailed function descriptions.
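As background for the surrogate-pair and endianness awareness listed under Advantages, the sketch below counts Unicode symbols in a UTF-16 byte buffer whose byte order is given explicitly rather than taken from the host. The names U16Endian, u16_read and u16_count_symbols are hypothetical and are not part of mdz_utf16.h; counting an unpaired surrogate as one symbol is a simplification chosen for this sketch.

```c
#include <assert.h>
#include <stddef.h>

typedef enum { U16_LITTLE_ENDIAN, U16_BIG_ENDIAN } U16Endian;

/* Read one 16-bit unit from two bytes in the given byte order,
   independent of the endianness of the host. */
static unsigned int u16_read(const unsigned char* p, U16Endian e)
{
  return (e == U16_LITTLE_ENDIAN)
    ? ((unsigned int)p[0] | ((unsigned int)p[1] << 8))
    : ((unsigned int)p[1] | ((unsigned int)p[0] << 8));
}

/* Count Unicode symbols in unitCount 16-bit units. A high surrogate
   (0xD800..0xDBFF) followed by a low surrogate (0xDC00..0xDFFF) counts
   as one symbol. An unpaired surrogate is counted as one symbol too. */
size_t u16_count_symbols(const unsigned char* bytes, size_t unitCount, U16Endian e)
{
  size_t i = 0, symbols = 0;

  assert(bytes != NULL || unitCount == 0);
  while (i < unitCount)
  {
    unsigned int unit = u16_read(bytes + 2 * i, e);
    if (unit >= 0xD800u && unit <= 0xDBFFu && i + 1 < unitCount)
    {
      unsigned int next = u16_read(bytes + 2 * (i + 1), e);
      if (next >= 0xDC00u && next <= 0xDFFFu) ++i; /* skip the low surrogate */
    }
    ++i;
    ++symbols;
  }
  return symbols;
}
```

For example, the letter 'a' followed by U+1F600 (encoded as the pair 0xD83D 0xDE00) occupies three 16-bit units but is counted as two symbols, provided the buffer is read with the byte order it was written in.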

mdz_utf32 Overview

Wiki: mdz_utf32 Wiki
file: "mdz_utf32.h"

Please take a look at the "mdz_utf32.h" file or the mdz_utf32 Wiki site for detailed function descriptions.

Licensing info

Use of the mdz_string library is regulated by the license agreement in LICENSE.txt.

Basically, private non-commercial "test" usage is unrestricted. Commercial usage of the library (incl. its source code) is regulated by the corresponding license agreement.
