lib: mpack: add new 'mpack' MessagePack (lite) library
Fluent Bit uses the MessagePack data serialization format for its internal
representation of records. To pack and unpack data we use the
'msgpack-c' library. Although the library is solid and performs well,
there are certain scenarios where we need something more
lightweight.

Many stages of the data pipeline need to count the actual number of
records in a chunk: filters can alter that number, and the internal
metrics routines need to perform the same calculation. I could not find
a way to perform 'zero-copy' unpacking of data just to count the
records; a zone must always be allocated and the data copied every
time, which is unnecessary for our use case.

'mpack' is a lightweight MessagePack library whose API does not need a
buffer for certain operations and offers better performance; some local
tests show a gain of 20% in speed.

This patch integrates 'mpack'; further patches will wrap functions
on top of it, specifically in the flb_mp.c file.
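
To make the idea of allocation-free counting concrete, here is a minimal
sketch. It is a hand-rolled walk over the MessagePack wire format, not the
'mpack' API and not the code planned for flb_mp.c; the names read_be,
skip_object and count_records are hypothetical and exist only for this
illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Read an n-byte big-endian unsigned integer (n is 1, 2 or 4). */
static uint32_t read_be(const uint8_t *p, size_t n)
{
    uint32_t v = 0;
    while (n-- > 0) {
        v = (v << 8) | *p++;
    }
    return v;
}

/*
 * Skip one complete MessagePack object starting at buf[*off], including
 * nested children, without allocating or copying. Returns 0 on success,
 * -1 if the data is truncated or uses the reserved byte 0xc1.
 */
static int skip_object(const uint8_t *buf, size_t len, size_t *off)
{
    uint64_t pending = 1;       /* objects still to be skipped */

    assert(buf != NULL && off != NULL);
    while (pending > 0) {
        size_t hdr = 0;         /* size of an explicit length field, if any */
        uint64_t n = 0;         /* payload bytes that follow the header */
        uint64_t extra = 0;     /* ext types carry one extra type byte */
        int kind = 0;           /* 0 = bytes, 1 = array, 2 = map */
        uint8_t b;

        if (*off >= len) return -1;
        b = buf[(*off)++];
        pending--;

        if (b <= 0x7f || b >= 0xe0 || b == 0xc0 || b == 0xc2 || b == 0xc3) {
            /* fixint, nil, false, true: nothing follows */
        }
        else if (b <= 0x8f) pending += 2 * (uint64_t) (b & 0x0f); /* fixmap */
        else if (b <= 0x9f) pending += b & 0x0f;                  /* fixarray */
        else if (b <= 0xbf) n = b & 0x1f;                         /* fixstr */
        else if (b == 0xc1) return -1;                            /* reserved */
        else if (b <= 0xc6) hdr = (size_t) 1 << (b - 0xc4);       /* bin */
        else if (b <= 0xc9) { hdr = (size_t) 1 << (b - 0xc7); extra = 1; } /* ext */
        else if (b == 0xca) n = 4;                                /* float32 */
        else if (b == 0xcb) n = 8;                                /* float64 */
        else if (b <= 0xcf) n = (uint64_t) 1 << (b - 0xcc);       /* uint */
        else if (b <= 0xd3) n = (uint64_t) 1 << (b - 0xd0);       /* int */
        else if (b <= 0xd8) n = 1 + ((uint64_t) 1 << (b - 0xd4)); /* fixext */
        else if (b <= 0xdb) hdr = (size_t) 1 << (b - 0xd9);       /* str */
        else if (b <= 0xdd) { hdr = (b == 0xdc) ? 2 : 4; kind = 1; } /* array */
        else { hdr = (b == 0xde) ? 2 : 4; kind = 2; }             /* map */

        if (hdr > 0) {
            uint32_t v;
            if (len - *off < hdr) return -1;
            v = read_be(buf + *off, hdr);
            *off += hdr;
            if (kind == 1) pending += v;
            else if (kind == 2) pending += 2 * (uint64_t) v;
            else n = (uint64_t) v + extra;
        }
        if (len - *off < n) return -1;
        *off += (size_t) n;
    }
    return 0;
}

/* Count top-level objects (records) in a chunk; -1 on malformed data. */
static long count_records(const void *data, size_t len)
{
    const uint8_t *buf = data;
    size_t off = 0;
    long count = 0;

    while (off < len) {
        if (skip_object(buf, len, &off) != 0) return -1;
        count++;
    }
    return count;
}
```

The walk only advances an offset over the caller's buffer, so counting
needs no zone and no copy.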

refs:

- https://github.com/ludocode/mpack

Signed-off-by: Eduardo Silva <eduardo@treasure-data.com>
edsiper committed Jul 15, 2019
1 parent f48b326 commit 3886c22
Showing 14 changed files with 14,570 additions and 0 deletions.
3 changes: 3 additions & 0 deletions CMakeLists.txt
@@ -269,6 +269,9 @@ option(MSGPACK_BUILD_TESTS OFF)
option(MSGPACK_BUILD_EXAMPLES OFF)
add_subdirectory(${FLB_PATH_LIB_MSGPACK} EXCLUDE_FROM_ALL)

# MPack
add_subdirectory(${FLB_PATH_LIB_MPACK} EXCLUDE_FROM_ALL)

# Chunk I/O
FLB_OPTION(CIO_LIB_STATIC ON)
FLB_OPTION(CIO_LIB_SHARED OFF)
1 change: 1 addition & 0 deletions cmake/headers.cmake
@@ -13,6 +13,7 @@ include_directories(
${FLB_PATH_ROOT_SOURCE}/${FLB_PATH_LIB_MONKEY}/include
${FLB_PATH_ROOT_SOURCE}/${FLB_PATH_LIB_MBEDTLS}/include
${FLB_PATH_ROOT_SOURCE}/${FLB_PATH_LIB_SQLITE}
${FLB_PATH_ROOT_SOURCE}/${FLB_PATH_LIB_MPACK}/src
)

# On Windows, the core uses libevent
1 change: 1 addition & 0 deletions cmake/libraries.cmake
@@ -9,3 +9,4 @@ set(FLB_PATH_LIB_JSMN "lib/jsmn")
set(FLB_PATH_LIB_MBEDTLS "lib/mbedtls-2.16.1")
set(FLB_PATH_LIB_SQLITE "lib/sqlite-amalgamation-3240000")
set(FLB_PATH_LIB_ONIGMO "lib/onigmo")
set(FLB_PATH_LIB_MPACK "lib/mpack-amalgamation-1.0")
9 changes: 9 additions & 0 deletions lib/mpack-amalgamation-1.0/AUTHORS.md
@@ -0,0 +1,9 @@
| Author | Profile |
| :------------------------------ | :----------------------------------------- |
| Nicholas Fraser | https://github.com/ludocode |
| Jerry Jacobs | https://github.com/xor-gate |
| Rik van der Heijden | https://github.com/rikvdh |
| Chris Heijdens | https://github.com/chris-heijdens |
| Jean-Louis Fuchs | https://github.com/ganwell |
| Christopher Field | https://github.com/volks73 |
| 喜欢兰花山丘 | https://github.com/wangzhione |
151 changes: 151 additions & 0 deletions lib/mpack-amalgamation-1.0/CHANGELOG.md
@@ -0,0 +1,151 @@
MPack v1.0
----------

A number of breaking API changes have been made for the 1.0 release. Please take note of these changes when upgrading.

Breaking Changes:

- The Node API now separates tree initialization from parsing. After calling one of the `mpack_tree_init()` functions, you must explicitly call `mpack_tree_parse()` before accessing any nodes.

- The configuration file `mpack-config.h` is now optional, and requires `MPACK_HAS_CONFIG` in order to be included. This means you must define `MPACK_HAS_CONFIG` when upgrading or your config file will be ignored!

- Extension types are now disabled by default. You must define `MPACK_EXTENSIONS` to use them.

- `mpack_tag_t` is now considered an opaque type to prevent future breakage when changing its layout. Compatibility is maintained for this release, but this may change in future releases.

New Features:

- The Node API can now parse multiple messages from a data source. `mpack_tree_parse()` can be called repeatedly to parse each message.

- The Node API can now parse messages indefinitely from a continuous stream. A tree can be initialized with `mpack_tree_init_stream()` to receive a callback for more data.

- The Node API can now parse messages incrementally from a non-blocking stream. Call `mpack_tree_try_parse()` with a non-blocking read function to start and resume parsing. It will return true when a complete message has become available.

- The stdio helpers now allow reading from a `FILE*`. `_init_file()` functions have been renamed to `_init_filename()`. (The old names will continue to work for a few more versions.)

- The Node API now returns a node of "missing" type instead of "nil" type for optional map lookups. This allows the caller to tell the difference between a key having value nil and a missing key.

- The writer now supports a v4 compatibility mode. Call `mpack_writer_set_version(writer, mpack_version_v4);` to encode without using the `raw8`, `bin` and `ext` types. (This requires `MPACK_COMPATIBILITY`.)

- The timestamp type has been implemented. A timestamp is a signed number of nanoseconds since the Unix epoch (1970-01-01T00:00:00Z). (This requires `MPACK_EXTENSIONS`.)

Bug Fixes and Other Changes:

- Fixed an allocation bug when closing a growable writer without having written anything (#58).

- The reader's skip function is no longer ignored under `MPACK_OPTIMIZE_FOR_SIZE`.

MPack v0.8.2
------------

Changes:

- Fixed incorrect element tracking in `mpack_write_tag()`
- Added type-generic writer functions `mpack_write()` and `mpack_write_kv()`
- Added `mpack_write_object_bytes()` to insert pre-encoded MessagePack into a larger message
- Enabled strings in all builds by default
- Fixed unit test errors under `-ffast-math`
- Fixed some compiler warnings

MPack v0.8.1
------------

Changes:

- Fixed some compiler warnings
- Added various performance improvements
- Improved documentation

MPack v0.8
----------

Changes:

- Added `mpack_peek_tag()`
- Added reader helper functions to [expect re-ordered map keys](http://ludocode.github.io/mpack/md_docs_expect.html)
- [Improved documentation](http://ludocode.github.io/mpack/) and added [Pages](http://ludocode.github.io/mpack/pages.html)
- Made node key lookups check for duplicate keys
- Added various UTF-8 checking functions for reader and nodes
- Added support for compiling as C in recent versions of Visual Studio
- Removed `mpack_expect_str_alloc()` and `mpack_expect_utf8_alloc()`
- Fixed miscellaneous bugs and improved performance

MPack v0.7.1
------------

Changes:

- Removed `mpack_reader_destroy_cancel()` and `mpack_writer_destroy_cancel()`. You must now flag an error (such as `mpack_error_data`) in order to cancel reading.
- Added many code size optimizations. `MPACK_OPTIMIZE_FOR_SIZE` is no longer experimental.
- Improved and reorganized [Writer documentation](http://ludocode.github.io/mpack/group__writer.html)
- Made writer flag `mpack_error_too_big` instead of `mpack_error_io` if writing too much data without a flush callback
- Added optional `skip` callback and optimized `mpack_discard()`
- Fixed various compiler and code analysis warnings
- Optimized speed and memory usage

MPack v0.7
----------

Changes:

- Fixed various bugs in UTF-8 checking, error handler callbacks, out-of-memory and I/O errors, debug print functions and more
- Added many missing Tag and Expect functions such as `mpack_tag_ext()`, `mpack_expect_int_range()` and `mpack_expect_utf8()`
- Added extensive unit tests

MPack v0.6
----------

Changes:

- `setjmp`/`longjmp` support has been replaced by error callbacks. You can safely `longjmp` or throw C++ exceptions out of error callbacks. Be aware of local variable invalidation rules regarding `setjmp` if you use it. See the [documentation for `mpack_reader_error_t`](http://ludocode.github.io/mpack/mpack-reader_8h.html) and issue #19 for more details.
- All `inline` functions in the MPack API are no longer `static`. A single non-`inline` definition of each `inline` function is emitted, so they behave like normal functions with external linkage.
- Configuration options can now be pre-defined before including `mpack-config.h`, so you can customize MPack by defining these in your build system rather than editing the configuration file.

MPack v0.5.1
------------

Changes:

- Fixed compile errors in debug print function
- Fixed C++11 warnings

MPack v0.5
----------

Changes:

- `mpack_node_t` is now a handle, so it should be passed by value, not by pointer. Porting to the new version should be as simple as replacing `mpack_node_t*` with `mpack_node_t` in your code.
- Various other minor API changes have been made.
- Major performance improvements were made across all aspects of MPack.

MPack v0.4
----------

Changes:

- Added `mpack_writer_init_growable()` to write to a growable buffer
- Converted tree parser to support node pool and pages. The Node API no longer requires an allocator.
- Added Xcode unit test project, included projects in release package
- Fixed various bugs

MPack v0.3
----------

Changes:

- Changed default config and test suite to use `DEBUG` and `_DEBUG` (instead of `NDEBUG`)
- Added Visual Studio project for running unit tests
- Fixed various bugs

MPack v0.2
----------

Changes:

- Added teardown callbacks to reader, writer and tree
- Simplified API for working with files (`mpack_file_tree_t` is now internal)

MPack v0.1
----------

Initial release.
6 changes: 6 additions & 0 deletions lib/mpack-amalgamation-1.0/CMakeLists.txt
@@ -0,0 +1,6 @@
set(src
src/mpack/mpack.c
)

add_definitions(-DMPACK_EXTENSIONS=1)
add_library(mpack-static STATIC ${src})
22 changes: 22 additions & 0 deletions lib/mpack-amalgamation-1.0/LICENSE
@@ -0,0 +1,22 @@
The MIT License (MIT)

Copyright (c) 2015-2018 Nicholas Fraser

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.

146 changes: 146 additions & 0 deletions lib/mpack-amalgamation-1.0/README.md
@@ -0,0 +1,146 @@
## Introduction

MPack is a C implementation of an encoder and decoder for the [MessagePack](http://msgpack.org/) serialization format. It is:

* Simple and easy to use
* Secure against untrusted data
* Lightweight, suitable for embedded
* [Extensively documented](http://ludocode.github.io/mpack/)
* [Extremely fast](https://github.com/ludocode/schemaless-benchmarks#speed---desktop-pc)

The core of MPack contains a buffered reader and writer, and a tree-style parser that decodes into a tree of dynamically typed nodes. Helper functions can be enabled to read values of expected type, to work with files, to allocate strings automatically, to check UTF-8 encoding, and more.

The MPack code is small enough to be embedded directly into your codebase. Simply download the [amalgamation package](https://github.com/ludocode/mpack/releases) and add `mpack.h` and `mpack.c` to your project.

The MPack featureset can be customized at compile-time to set which features, components and debug checks are compiled, and what dependencies are available.

## Build Status

MPack is beta software under development.

[travis-home]: https://travis-ci.org/
[travis-mpack]: https://travis-ci.org/ludocode/mpack/branches
[appveyor-home]: https://ci.appveyor.com/
[appveyor-mpack-master]: https://ci.appveyor.com/project/ludocode/mpack/branch/master
[appveyor-mpack-develop]: https://ci.appveyor.com/project/ludocode/mpack/branch/develop
[coveralls-home]: https://coveralls.io/
[coveralls-mpack-master]: https://coveralls.io/github/ludocode/mpack?branch=master
[coveralls-mpack-develop]: https://coveralls.io/github/ludocode/mpack?branch=develop

<!-- we use some deprecated HTML attributes here to get these stupid badges to line up properly -->
| [Travis-CI][travis-home] | [AppVeyor][appveyor-home] | [Coveralls.io][coveralls-home] |
| :-------: | :----------: | :----------: |
| [<img src="https://travis-ci.org/ludocode/mpack.svg?branch=develop" alt="Build Status" align="top" vspace="4">][travis-mpack] | [<img src="https://ci.appveyor.com/api/projects/status/tux06aefpqq83k30/branch/develop?svg=true" alt="Build Status" align="top" vspace="4">][appveyor-mpack-develop] | [<img src="https://coveralls.io/repos/ludocode/mpack/badge.svg?branch=develop&service=github" alt="Build Status" align="top" vspace="4">][coveralls-mpack-develop] |

## The Node API

The Node API parses a chunk of MessagePack data into an immutable tree of dynamically-typed nodes. A series of helper functions can be used to extract data of specific types from each node.

```C
// parse a file into a node tree
mpack_tree_t tree;
mpack_tree_init_filename(&tree, "homepage-example.mp", 0);
mpack_tree_parse(&tree);
mpack_node_t root = mpack_tree_root(&tree);

// extract the example data on the msgpack homepage
bool compact = mpack_node_bool(mpack_node_map_cstr(root, "compact"));
int schema = mpack_node_i32(mpack_node_map_cstr(root, "schema"));

// clean up and check for errors
if (mpack_tree_destroy(&tree) != mpack_ok) {
    fprintf(stderr, "An error occurred decoding the data!\n");
    return;
}
```

Note that no additional error handling is needed in the above code. If the file is missing or corrupt, if map keys are missing or if nodes are not in the expected types, special "nil" nodes and false/zero values are returned and the tree is placed in an error state. An error check is only needed before using the data.

The above example decodes into allocated pages of nodes. A fixed node pool can be provided to the parser instead in memory-constrained environments. For maximum performance and minimal memory usage, the [Expect API](docs/expect.md) can be used to parse data of a predefined schema.

## The Write API

The Write API encodes structured data to MessagePack.

```C
// encode to memory buffer
char* data;
size_t size;
mpack_writer_t writer;
mpack_writer_init_growable(&writer, &data, &size);

// write the example on the msgpack homepage
mpack_start_map(&writer, 2);
mpack_write_cstr(&writer, "compact");
mpack_write_bool(&writer, true);
mpack_write_cstr(&writer, "schema");
mpack_write_uint(&writer, 0);
mpack_finish_map(&writer);

// finish writing
if (mpack_writer_destroy(&writer) != mpack_ok) {
    fprintf(stderr, "An error occurred encoding the data!\n");
    return;
}

// use the data
do_something_with_data(data, size);
free(data);
```

In the above example, we encode to a growable memory buffer. The writer can instead write to a pre-allocated or stack-allocated buffer, avoiding the need for memory allocation. The writer can also be provided with a flush function (such as a file or socket write function) to call when the buffer is full or when writing is done.

If any error occurs, the writer is placed in an error state. The writer will flag an error if too much data is written, if the wrong number of elements are written, if the data could not be flushed, etc. No additional error handling is needed in the above code; any subsequent writes are ignored when the writer is in an error state, so you don't need to check every write for errors.

Note in particular that in debug mode, the `mpack_finish_map()` call above ensures that two key/value pairs were actually written as claimed, something that other MessagePack C/C++ libraries may not do.
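
The two ideas described above, a sticky error state and compound size tracking, can be illustrated with a small standalone sketch. This is not MPack's implementation: the `sk_` names and types are hypothetical, the writer only targets a fixed caller-provided buffer, and it supports just one open map with the `fixmap`, `fixstr`, `bool` and positive `fixint` encodings.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef enum { SK_OK = 0, SK_ERROR_TOO_BIG, SK_ERROR_BUG } sk_error_t;

typedef struct {
    uint8_t *buf;       /* caller-provided storage, e.g. on the stack */
    size_t cap;         /* capacity of buf in bytes */
    size_t used;        /* bytes written so far */
    bool in_map;        /* a map is currently open (one level only) */
    size_t owed;        /* keys and values still owed to the open map */
    sk_error_t error;   /* sticky: the first error wins */
} sk_writer_t;

static void sk_init(sk_writer_t *w, uint8_t *buf, size_t cap)
{
    assert(w != NULL && buf != NULL);
    memset(w, 0, sizeof(*w));
    w->buf = buf;
    w->cap = cap;
}

/* Record an error unless one is already set. */
static void sk_flag(sk_writer_t *w, sk_error_t e)
{
    if (w->error == SK_OK) w->error = e;
}

/* Append raw bytes; a no-op once the writer is in an error state. */
static void sk_put(sk_writer_t *w, const void *p, size_t n)
{
    if (w->error != SK_OK) return;
    if (w->cap - w->used < n) { sk_flag(w, SK_ERROR_TOO_BIG); return; }
    memcpy(w->buf + w->used, p, n);
    w->used += n;
}

/* Write a one-byte element head and charge it to the open map, if any. */
static void sk_elem(sk_writer_t *w, uint8_t head)
{
    if (w->in_map) {
        if (w->owed == 0) sk_flag(w, SK_ERROR_BUG);  /* more than announced */
        else w->owed--;
    }
    sk_put(w, &head, 1);
}

static void sk_start_map(sk_writer_t *w, uint8_t pairs)   /* fixmap: 0..15 */
{
    uint8_t head = (uint8_t) (0x80 | pairs);
    if (pairs > 15 || w->in_map) { sk_flag(w, SK_ERROR_BUG); return; }
    sk_put(w, &head, 1);
    w->in_map = true;
    w->owed = 2 * (size_t) pairs;
}

static void sk_finish_map(sk_writer_t *w)
{
    if (!w->in_map || w->owed != 0) sk_flag(w, SK_ERROR_BUG);  /* too few */
    w->in_map = false;
}

static void sk_write_bool(sk_writer_t *w, bool v)
{
    sk_elem(w, v ? 0xc3 : 0xc2);
}

static void sk_write_fixint(sk_writer_t *w, uint8_t v)    /* 0..127 only */
{
    if (v > 0x7f) { sk_flag(w, SK_ERROR_BUG); return; }
    sk_elem(w, v);
}

static void sk_write_cstr(sk_writer_t *w, const char *s)  /* fixstr: <= 31 */
{
    size_t n = strlen(s);
    if (n > 31) { sk_flag(w, SK_ERROR_BUG); return; }
    sk_elem(w, (uint8_t) (0xa0 | n));
    sk_put(w, s, n);
}
```

With this sketch a caller can issue the same sequence of writes as in the example above against a stack buffer and inspect `error` once at the end. An undersized buffer leaves the writer in `SK_ERROR_TOO_BIG`, and closing a map with fewer elements than announced leaves it in `SK_ERROR_BUG`.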

## Comparison With Other Parsers

MPack is rich in features while maintaining very high performance and a small code footprint. Here's a short feature table comparing it to other C parsers:

[mpack]: https://github.com/ludocode/mpack
[msgpack-c]: https://github.com/msgpack/msgpack-c
[cmp]: https://github.com/camgunz/cmp

|                                     | [MPack][mpack]<br>(v0.8) | [msgpack-c][msgpack-c]<br>(v1.3.0) | [CMP][cmp]<br>(v14) |
|:------------------------------------|:---:|:---:|:---:|
| No libc requirement                 | ✓   |     | ✓   |
| Growable memory writer              | ✓   | ✓   |     |
| File I/O helpers                    | ✓   | ✓   |     |
| Tree parser                         | ✓   | ✓   |     |
| Propagating errors                  | ✓   |     | ✓   |
| Compound size tracking              | ✓   |     |     |
| Incremental parser                  | ✓   |     | ✓   |
| Incremental range/match helpers     | ✓   |     |     |
| Tree stream parser                  |     | ✓   |     |
| UTF-8 verification                  | ✓   |     |     |

A larger feature comparison table is available [here](docs/features.md) which includes descriptions of the various entries in the table.

[This benchmarking suite](https://github.com/ludocode/schemaless-benchmarks) compares the performance of MPack to other implementations of schemaless serialization formats. MPack outperforms all JSON and MessagePack libraries, and in some tests MPack is several times faster than [RapidJSON](https://github.com/miloyip/rapidjson) for equivalent data.

## Why Not Just Use JSON?

Conceptually, MessagePack stores data similarly to JSON: they are both composed of simple values such as numbers and strings, stored hierarchically in maps and arrays. So why not just use JSON instead? The main reason is that JSON is designed to be human-readable, so it is not as efficient as a binary serialization format:

- Compound types such as strings, maps and arrays are delimited, so appropriate storage cannot be allocated upfront. The whole object must be parsed to determine its size.

- Strings are not stored in their native encoding. Special characters such as quotes and backslashes must be escaped when written and converted back when read.

- Numbers are particularly inefficient (especially when parsing back floats), making JSON inappropriate as a base format for structured data that contains lots of numbers.

- Binary data is not supported by JSON at all. Small binary blobs such as icons and thumbnails need to be Base64 encoded or passed out-of-band.

The above issues greatly increase the complexity of the decoder. Full-featured JSON decoders are quite large, and minimal decoders tend to leave out such features as string unescaping and float parsing, instead leaving these up to the user or platform. This can lead to hard-to-find platform-specific and locale-specific bugs, as well as a greater potential for security vulnerabilities. This also significantly decreases performance, making JSON unattractive for use in applications such as mobile games.
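
As a rough illustration of the first two points in the list, the sketch below contrasts how a decoder learns the size of a string in each format. It is a simplified standalone example, not MPack code: the function names are hypothetical, the MessagePack side handles only the `fixstr` encoding, and the JSON side does not validate escape sequences.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * MessagePack fixstr: the length sits in the low 5 bits of the first byte,
 * so storage can be sized before any payload byte is read.
 * Returns 0 and stores the length in *out, or -1 if p is not a complete fixstr.
 */
static int msgpack_fixstr_len(const uint8_t *p, size_t avail, size_t *out)
{
    assert(p != NULL && out != NULL);
    if (avail < 1 || (p[0] & 0xe0) != 0xa0) return -1;
    *out = p[0] & 0x1f;
    return (avail - 1 < *out) ? -1 : 0;
}

/*
 * JSON string: the decoder has to scan for the closing quote and step over
 * backslash escapes. *out receives the number of raw (still escaped) bytes
 * between the quotes; unescaping would be a further pass.
 */
static int json_string_span(const char *p, size_t avail, size_t *out)
{
    size_t i;

    assert(p != NULL && out != NULL);
    if (avail < 2 || p[0] != '"') return -1;
    for (i = 1; i < avail; i++) {
        if (p[i] == '\\') { i++; continue; }      /* skip the escaped byte */
        if (p[i] == '"') { *out = i - 1; return 0; }
    }
    return -1;                                     /* unterminated string */
}
```

The MessagePack helper does constant work regardless of the string length, while the JSON helper touches every byte and still leaves the escaped form to be converted.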

While the space inefficiencies of JSON can be partially mitigated through minification and compression, the performance inefficiencies cannot. More importantly, if you are minifying and compressing the data, then why use a human-readable format in the first place?

## Running the Unit Tests

The MPack build process does not build MPack into a library; it is used to build and run the unit tests. You do not need to build MPack or the unit testing suite to use MPack.

On Linux, the test suite uses SCons and requires Valgrind, and can be run in the repository or in the amalgamation package. Run `scons` to build and run the test suite in full debug configuration.

On Windows, there is a Visual Studio solution, and on OS X, there is an Xcode project for building and running the test suite.

You can also build and run the test suite in all supported configurations, which is what the continuous integration server will build and run. If you are on 64-bit, you will need support for cross-compiling to 32-bit, and running 32-bit binaries with 64-bit Valgrind. On Ubuntu, you'll need `libc6-dbg:i386`. On Arch you'll need `gcc-multilib` or `lib32-clang`, and `valgrind-multilib`. Use `scons all=1 -j16` (or some appropriate thread count) to build and run all tests.