The purpose of Chronicle Wire is to combine a number of concerns in a consistent manner:
-
Application configuration. (Using YAML)
-
Data serialization (YAML, binary YAML, JSON, Raw binary data, CSV)
-
Accessing off heap memory in a thread safe manner. (Bind to shared off heap memory)
-
High performance data exchange via binary formats. (Only include as much meta data as you need)
- Design
- Support
- Text Formats
- Binary Formats
- Using Wire
- Binding to a field value
- Compression Options
- Bytes options
- Uses
- Similar projects
- Design notes.
- Schema evolution.
- Zero copy.
- Random Access.
- Safe against malicious input.
- Reflection / generic algorithms.
- Initialization order.
- Unknown field retention?
- Object-maximumLimit RPC system.
- Schema language
- Usable as mutable state
- Padding takes space on the wire.
- Unset fields take space on the wire?
- Pointers take space on the wire.
- Platform support
Chronicle Wire uses Chronicle Bytes for bytes manipulation and Chronicle Core for low level JVM access.
Often you want to use these interchangeably.
-
Configuration includes aliased type information. This supports easy extension through adding new classes/versions and cross platform through type aliasing.
-
By supporting types, a configuration file can bootstrap itself. You control how the configuration file is decoded. engine.yaml
-
To send the configuration of a server to a client or visa-versa.
-
To store the configuration of a data store in it’s header.
-
In configuration, to be able to create any object or component.
-
Save a configuration after you have changed it.
-
To be able to share data in memory between processes in a thread safe manner.
Chronicle Wire supports a separation of describing what data you want to store and retrieve and how it should be rendered/parsed. Wire handles a variety of formatting options for a wide range of formats.
A key aim of Wire is to support schema changes. It should make reasonable attempts to handle:
-
optional fields,
-
fields in a different order,
-
fields the consumer doesn’t expect. Optionally parsing them or ignoring them,
-
more or less data than expected (in field-less formats),
-
reading a different type to the one written,
-
updating fixed length fields, atomically where possible via a "bound" data structure.
It should also be as efficient as possible in the case where any or all of these are true:
-
fields are in the order expected.
-
fields are the type expected.
-
fields names/numbers are not used.
-
self describing types are not needed.
-
random access of data values is supported.
Wire is designed to make it easy to convert from one wire format to another. e.g. you can use fixed width binary data in memory for performance and variable width or text over the network. Different TCP connection could use different formats.
Wire also supports hybrid wire formats. e.g. you can have one format embedded in another.
The text formats include * YAML (a subset of mapping structures included) * JSON (super set to support serialization) * CSV (super set to support serialization) * XML (planned) * FIX (proposed)
Options include:
-
field names (e.g. JSON) or field numbers (e.g. FIX)
-
optional fields with default values that can be dropped.
-
zero copy access to fields. (planned)
-
thread safe operations in text. (planned)
To support wire format discovery, the first byte should be in the ASCII range, adding an ASCII whitespace if needed.
The binary formats include:
-
binary YAML.
-
delta compressing Binary YAML. (Chronicle-Wire-Enterprise)
-
typed data without fields.
-
raw untyped fieldless data.
-
BSON (Binary JSon). (planned)
Options for Binary format:
-
field names or field numbers.
-
variable width.
-
optional fields with a default value can be dropped.
-
fixed width data with zero copy support.
-
thread safe operations.
Note: Wire supports debug/transparent combinations like self describing data with zero copy support.
To support wire format discovery, the first bytes should have the top bit set.
First you need to have a buffer to write to. This can be a byte[], a ByteBuffer, off heap memory, or even an address and length you have obtained from some other library.
// Bytes which wraps a ByteBuffer which is resized as needed. Bytes<ByteBuffer> bytes = Bytes.elasticByteBuffer();
Now you can choose which format you are using. As the wire formats are themselves unbuffered, you can use them with the same buffer, but in general using one wire format is easier.
Wire wire = new TextWire(bytes); // or WireType wireType = WireType.TEXT; Wire wireB = wireType.apply(bytes); // or Bytes<ByteBuffer> bytes2 = Bytes.elasticByteBuffer(); Wire wire2 = new BinaryWire(bytes2); // or Bytes<ByteBuffer> bytes3 = Bytes.elasticByteBuffer(); Wire wire3 = new RawWire(bytes3);
So now you can write to the wire with a simple document.
wire.write(() -> "message").text("Hello World") .write(() -> "number").int64(1234567890L) .write(() -> "code").asEnum(TimeUnit.SECONDS) .write(() -> "price").float64(10.50); System.out.println(bytes);
prints
message: Hello World number: 1234567890 code: SECONDS price: 10.5
// the same code as for text wire wire2.write(() -> "message").text("Hello World") .write(() -> "number").int64(1234567890L) .write(() -> "code").asEnum(TimeUnit.SECONDS) .write(() -> "price").float64(10.50); System.out.println(bytes2.toHexString());
prints
00000000 C7 6D 65 73 73 61 67 65 EB 48 65 6C 6C 6F 20 57 ·message ·Hello W 00000010 6F 72 6C 64 C6 6E 75 6D 62 65 72 A3 D2 02 96 49 orld·num ber····I 00000020 C4 63 6F 64 65 E7 53 45 43 4F 4E 44 53 C5 70 72 ·code·SE CONDS·pr 00000030 69 63 65 90 00 00 28 41 ice···(A
Using the RawWire strips away all the meta data to reduce the size of the message, and improve speed. The down side is that we cannot easily see what the message contains.
// the same code as for text wire wire3.write(() -> "message").text("Hello World") .write(() -> "number").int64(1234567890L) .write(() -> "code").asEnum(TimeUnit.SECONDS) .write(() -> "price").float64(10.50); System.out.println(bytes3.toHexString());
prints in RawWire
00000000 0B 48 65 6C 6C 6F 20 57 6F 72 6C 64 D2 02 96 49 ·Hello W orld···I 00000010 00 00 00 00 07 53 45 43 4F 4E 44 53 00 00 00 00 ·····SEC ONDS···· 00000020 00 00 25 40 ··%@
For more examples see Examples Chapter1
While serialized data can be updated by replacing a whole record, this might not be the most efficient option, nor thread safe. Wire offers the ability to bind a reference to a fixed value of a field and perform atomic operations on that field such as volatile read/write and compare-and-swap.
// field to cache the location and object used to reference a field. private LongValueReference counter = null; // find the field and bind an approritae wrapper for the wire format. wire.read(COUNTER).int64(counter, x -> counter = x); // thread safe across processes on the same machine. long id = counter.getAndAdd(1);
Other types such as 32 bit integer values and an array of 64-bit integer values are supported.
Wire is built on top of the Bytes library, however Bytes in turn can wrap
-
ByteBuffer - heap and direct
-
byte\[\] (via ByteBuffer)
-
raw memory addresses.
Wire will be used for:
-
file headers.
-
TCP connection headers where the optimal Wire format actually used can be negotiated.
-
message/excerpt contents.
-
the next version of Chronicle Queue.
-
the API for marshalling generated data types.
Simple Binary Encoding is designed to do what it says. It’s simple, it’s binary and it supports C++ and Java. It is designed to be a more efficient replacement for FIX. It is not limited to FIX protocols and can be easily extended by updating an XML schema.
XML, when it first started, didn’t use XML for it’s own schema files, and it’s not insignificant that SBE doesn’t use SBE for it’s schema either. This is because it is not trying to be human readable. It has XML which, though standard, isn’t designed to be particularly human readable either. Peter Lawrey thinks it’s a limitation that it doesn’t naturally lend itself to a human readable form.
The encoding SBE uses is similar to binary, with field numbers and fixed width types. SBE assumes the field types, which can be more compact than Wire’s most similar option (though not as compact as others).
SBE has support for schema changes provided the type of a field doesn’t change.
Message Pack is a packed binary wire format which also supports JSON for human readability and compatibility. It has many similarities to the binary (and JSON) formats of this library. c.f. Wire is designed to be human readable first, based on YAML, and has a range of options to make it more efficient. The most extreme being fixed position binary.
Msgpack has support for embedded binary, whereas Wire has support for comments and hints to improve rendering for human consumption.
The documentation looks well thought out, and it is worth emulating.
Feature |
Wire Text |
Wire Binary |
Protobuf |
Cap’n Proto |
SBE |
FlatBuffers |
Schema evolution |
yes |
yes |
yes |
yes |
caveats |
yes |
Zero-copy |
yes |
yes |
no |
yes |
yes |
yes |
Random-access reads |
yes |
yes |
no |
yes |
no |
yes |
Random-access writes |
yes |
yes |
no |
? |
no |
? |
Safe against malicious input |
yes |
yes |
yes |
yes |
yes |
opt-in / upfront |
Reflection / generic algorithms |
yes |
yes |
yes |
yes |
yes |
yes |
Initialization order |
any |
any |
any |
any |
preorder |
bottom-up |
Unknown field retention |
yes |
yes |
yes |
yes |
no |
no |
Object-capability RPC system |
yes |
yes |
no |
yes |
no |
no |
Schema language |
no |
no |
custom |
custom |
XML |
custom |
Usable as mutable state |
yes |
yes |
yes |
no |
no |
no |
Padding takes space on wire? |
optional |
optional |
no |
optional |
yes |
yes |
Unset fields take space on wire? |
optional |
optional |
no |
yes |
yes |
no |
Pointers take space on wire? |
no |
no |
no |
yes |
no |
yes |
C++ |
planned |
planned |
yes |
yes (C++11)* |
yes |
yes |
Java |
Java 8 |
Java 8 |
yes |
yes* |
yes |
yes |
C# |
yes |
yes |
yes |
yes* |
yes |
yes* |
Go |
no |
no |
yes |
yes |
no |
yes* |
Other languages |
no |
no |
6+ |
others* |
no |
no |
Authors' preferred use case |
distributed computing |
financial / trading |
distributed computing |
platforms / sandboxing |
financial / trading |
games |
Note
|
The Binary YAML format can be automatically converted to YAML without any knowledge of the schema as the messages are self describing. |
Note
|
You can parse all the expected fields (if any) and then parse any remaining fields. As YAML supports object field "names" or keys, these could be Strings or even Object as keys and values. |
Note: It not clear what padding which doesn’t take up space on the wire means.
See https://capnproto.org/news/2014-06-17-capnproto-flatbuffers-sbe.html for a comparison to other encoders.
Wire optionally supports:
-
field name changes,
-
field order changes,
-
capturing or ignoring unexpected fields,
-
setting of fields to the default, if not available,
-
raw messages can be longer or shorter than expected.
The more flexibility, the larger the overhead in terms of CPU and memory. Wire allows you to dynamically pick the optimal configuration and convert between these options.
Wire supports zero copy random access to fields and direct copy from in memory to the network. It also support translation from one wire format to another e.g. switching between fixed length data and variable length data.
You can access a random field in memory e.g. in 2 TB file, page in/pull into CPU cache, only the data relating to you read or write.
format | access style |
---|---|
fixed length binary |
random access without parsing first |
variable length binary |
random access with partial parsing. i.e. you can skip large portions |
fixed length text |
random access with parsing |
variable length text |
no random access |
Wire References are relative to the start of the data contained, to allow loading in an arbitrary point in memory.
Wire has built in tiers of bounds checks to prevent accidental read/writing corrupting the data. It is not complete enough for a security review.
Wire supports generic reading and writing of an arbitrary stream. This can be used in combination with predetermined fields. e.g. you can read the fields you know about and ask it to provide the fields you didn’t. You can also give generic field names like keys to a map as YAML does.
Wire can handle unknown information like lengths by using padding. It will go back and fill in any data which it wasn't aware of as it was writing the data. e.g. when it writes an object it doesn't know how long it is going to be so it adds padding at the start. Once the object has been written it goes back and overwrites the length. It can also hand cases where the length was more than needed- known as packing.
Wire can handle reading data it didn’t expect interspersed with data it did expect. Rather than specify the expected field name, a StringBuilder is provided.
Note: there are times when you want to skip/copy an entire field or message without reading any more of it. This is also supported.
Wire supports references based on a name, number or UUID. This is useful when including a reference to an object the reader should look up via other means.
-
A common case, if when you have a proxy to a remote object and you want to pass or return this in an RPC call.
Wire’s schema is not externalised from the code, however it is planned to use YAML in a format it can parse.
Wire supports storing an application’s internal state. This will not allow it to grow or shrink. You can’t free any of it without copying the pieces you need and discarding the original copy.
The Wire format chosen determines if there is any padding on the wire. If you copy the in memory data directly, it’s format doesn’t change. If you want to drop padding you can copy the message to a wire format without padding. You can decide whether the original padding is to be preserved or not if turned back into a format with padding.
We could look at supporting Cap’n’Proto’s zero byte removal compression.
Wire supports fields with and without optional fields and automatic means of removing them. It doesn’t support automatically adding them back in, as information has been lost.
Wire doesn’t have pointer but it does have content lengths which are a useful hint for random access and robustness, but these are optional.