[Java,C#,C++] Support field access order checking #948

ZachBray · 2023-06-21T15:05:50Z

The problem

The encoders and decoders that SBE generates in Java (and some other languages) require developers to follow a strict contract (that is not enforced by the type system):

Developers must encode/decode repeating groups and variable-length data fields in the order in which they appear in the schema.
Developers must explicitly encode/decode/skip all present groups and variable-length data, i.e., no implicit skipping.
Developers must call next() before encoding/decoding each repeating group element.

When encoding, failure to follow the contract can produce invalid messages that do not conform to the format described in the associated SBE schema.

When decoding, failure to follow the contract can result in the misinterpretation of valid messages.
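
To make the failure mode concrete, here is a minimal sketch of a toy flyweight that, like the generated codecs, writes and reads length-prefixed data at a moving `limit`. It is not SBE output: `ToyChatCodec` and all of its methods are hypothetical and exist only to show that swapping two variable-length fields goes unnoticed without extra checks.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Toy stand-in for a generated codec (hypothetical, NOT real SBE output).
// Variable-length fields are written/read wherever `limit` currently points.
final class ToyChatCodec
{
    private final ByteBuffer buffer = ByteBuffer.allocate(64);
    private int limit;

    ToyChatCodec rewind()
    {
        limit = 0;
        return this;
    }

    // Both "fields" share one implementation: only the call order decides
    // which bytes come first in the buffer.
    ToyChatCodec subject(final String value) { return putVarData(value); }

    ToyChatCodec body(final String value) { return putVarData(value); }

    String subject() { return getVarData(); }

    String body() { return getVarData(); }

    private ToyChatCodec putVarData(final String value)
    {
        final byte[] bytes = value.getBytes(StandardCharsets.US_ASCII);
        buffer.put(limit, (byte)bytes.length); // 1-byte length prefix
        for (int i = 0; i < bytes.length; i++)
        {
            buffer.put(limit + 1 + i, bytes[i]);
        }
        limit += 1 + bytes.length; // advance past what was just written
        return this;
    }

    private String getVarData()
    {
        final int length = buffer.get(limit) & 0xFF;
        final byte[] bytes = new byte[length];
        for (int i = 0; i < length; i++)
        {
            bytes[i] = buffer.get(limit + 1 + i);
        }
        limit += 1 + length; // advance past what was just read
        return new String(bytes, StandardCharsets.US_ASCII);
    }

    public static void main(final String[] args)
    {
        final ToyChatCodec codec = new ToyChatCodec();
        // Encode in the wrong order: body first, then subject.
        codec.rewind().body("About 1 ft tall and furry.").subject("Missing cat");
        // Decode in schema order: nothing signals that the fields are swapped.
        codec.rewind();
        System.out.println("subject = " + codec.subject());
        System.out.println("body    = " + codec.body());
    }
}
```

In this toy, the decoder reads the body text as the subject and vice versa, with no error raised.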

The goal

The goals of this PR are:

to support checking that the contract above is followed (at runtime) in non-production environments
to minimise false positives
- w.r.t. the classification: "this code safely uses encoders and decoders"
- i.e., minimise cases where checks pass but the code is broken
to minimise false negatives
- i.e., minimise cases where checks fail but the code works

(Preferring Goal 2 to Goal 3 when there is a choice.)

This solution

This PR adds runtime checks based on a state machine that models interactions with encoders/decoders/codecs as transitions.

New system properties

This PR adds three new system properties:

sbe.generate.precedence.checks controls whether to generate runtime checks at build time
sbe.enable.precedence.checks controls whether to run checks at runtime
- defaults to not(agrona.disable.bounds.checks)
sbe.cpp.disable.implicit.copying, which we'll discuss later

Changes in generated code

Car example

flags	diff
`-Dsbe.generate.precedence.checks=true -Dsbe.cpp.disable.implicit.copying=true`	diff vs master
`-Dsbe.generate.precedence.checks=false -Dsbe.cpp.disable.implicit.copying=false`	diff vs master
`-Dsbe.generate.precedence.checks=false -Dsbe.cpp.disable.implicit.copying=true`	diff vs master

Java changes commentary

(When sbe.generate.precedence.checks is true at build time.)

Let's consider an example SBE schema fragment:

    <sbe:message name="SendChatMessage" id="99">
        <field name="chatId" id="1" type="int64"/>
        <data name="subject" id="2" type="varDataEncoding"/>
        <data name="body" id="3" type="varDataEncoding"/>
    </sbe:message>

Each message encoder and decoder now has a static boolean field SBE_ENABLE_PRECEDENCE_CHECKS that controls whether runtime checks will be performed.

private static final boolean ENABLE_BOUNDS_CHECKS = !Boolean.getBoolean("agrona.disable.bounds.checks");

private static final boolean SBE_ENABLE_PRECEDENCE_CHECKS = Boolean.parseBoolean(System.getProperty(
        "sbe.enable.precedence.checks",
        Boolean.toString(ENABLE_BOUNDS_CHECKS)));

Each message encoder and decoder also has a codecState field that represents the state of the codec according to our model. The state is an int rather than an enum because benchmarks show better performance with an int when runtime checks are enabled.

    private int codecState = CodecStates.NOT_WRAPPED;

Each field accessor now verifies that the field access is allowed and transitions the encoder's or decoder's state.

    public SendChatMessageEncoder body(final CharSequence value)
    {
        final int length = null == value ? 0 : value.length();
        if (length > 254)
        {
            throw new IllegalStateException("length > maxValue for type: " + length);
        }

+       if (SBE_ENABLE_PRECEDENCE_CHECKS)
+       {
+           onBodyAccessed();
+       }

        final int headerLength = 1;
        final int limit = parentMessage.limit();
        parentMessage.limit(limit + headerLength + length);
        buffer.putByte(limit, (byte)length);
        buffer.putStringWithoutLengthAscii(limit + headerLength, value);

        return this;
    }

Verifying and transitioning the state machine is done within onBodyAccessed() rather than inline because benchmarks show performance improvements when running with runtime checks disabled.

    private void onBodyAccessed()
    {
        switch (codecState())
        {
            case CodecStates.V0_SUBJECT_DONE:
                codecState(CodecStates.V0_BODY_DONE);
                break;
            default:
                throw new IllegalStateException("Illegal field access order. " +
                    "Cannot access field \"body\" in state: " + CodecStates.name(codecState()) +
                    ". Expected one of these transitions: [" + CodecStates.transitions(codecState()) +
                    "]. Please see the diagram in the Javadoc of the inner class #CodecStates.");
        }
    }

The states of the encoder or decoder, along with some lookup tables for constructing error messages, are generated within an inner class:

    /**
     * The states in which an encoder/decoder/codec can live.
     *
     * <p>The state machine diagram below, encoded in the dot language, describes
     * the valid state transitions according to the order in which fields may be
     * accessed safely. Tools such as PlantUML and Graphviz can render it.
     *
     * <pre>{@code
     *   digraph G {
     *       NOT_WRAPPED -> V0_BLOCK [label="  wrap(version=0)  "];
     *       V0_BLOCK -> V0_BLOCK [label="  chatId(?)  "];
     *       V0_BLOCK -> V0_SUBJECT_DONE [label="  subject(?)  "];
     *       V0_SUBJECT_DONE -> V0_BODY_DONE [label="  body(?)  "];
     *   }
     * }</pre>
     */
    private static class CodecStates
    {
        private static final int NOT_WRAPPED = 0;
        private static final int V0_BLOCK = 1;
        private static final int V0_SUBJECT_DONE = 2;
        private static final int V0_BODY_DONE = 3;

        private static final String[] STATE_NAME_LOOKUP =
        {
            "NOT_WRAPPED",
            "V0_BLOCK",
            "V0_SUBJECT_DONE",
            "V0_BODY_DONE",
        };

        private static final String[] STATE_TRANSITIONS_LOOKUP =
        {
            "\"wrap(version=0)\"",
            "\"chatId(?)\", \"subject(?)\"",
            "\"body(?)\"",
            "",
        };

        private static String name(final int state)
        {
            return STATE_NAME_LOOKUP[state];
        }

        private static String transitions(final int state)
        {
            return STATE_TRANSITIONS_LOOKUP[state];
        }
    }

The dot-language diagram can be rendered using online or IDE-based tools.
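
As an illustration of how little is needed to produce such a diagram, here is a minimal sketch that builds dot text from a list of transitions. `ToyDotWriter` and `Transition` are hypothetical names and this is not the code in sbe-tool; it only mirrors the shape of the Javadoc diagram shown earlier.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of sbe-tool: builds dot text from transitions.
final class ToyDotWriter
{
    // One edge of the state machine: from --label--> to.
    static final class Transition
    {
        final String from;
        final String to;
        final String label;

        Transition(final String from, final String to, final String label)
        {
            this.from = from;
            this.to = to;
            this.label = label;
        }
    }

    static String toDot(final List<Transition> transitions)
    {
        final StringBuilder sb = new StringBuilder("digraph G {\n");
        for (final Transition t : transitions)
        {
            sb.append("    ").append(t.from).append(" -> ").append(t.to)
                .append(" [label=\"  ").append(t.label).append("  \"];\n");
        }
        return sb.append("}\n").toString();
    }

    public static void main(final String[] args)
    {
        final List<Transition> transitions = new ArrayList<>();
        transitions.add(new Transition("NOT_WRAPPED", "V0_BLOCK", "wrap(version=0)"));
        transitions.add(new Transition("V0_BLOCK", "V0_BLOCK", "chatId(?)"));
        transitions.add(new Transition("V0_BLOCK", "V0_SUBJECT_DONE", "subject(?)"));
        transitions.add(new Transition("V0_SUBJECT_DONE", "V0_BODY_DONE", "body(?)"));
        System.out.print(toDot(transitions));
    }
}
```

The printed text can be saved to a file and rendered with Graphviz, e.g., `dot -Tpng states.dot -o states.png` (the file names are placeholders).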

Error messages

In the case of breaking the developer contract, the generated code produces an exception that contains:

the action against the encoder or decoder that failed
the current state of the encoder or decoder
the expected actions that can safely be performed against the encoder or decoder in its current state

Continuing our example, if we accidentally switch body and subject:

        final SendChatMessageEncoder encoder = new SendChatMessageEncoder()
            .wrapAndApplyHeader(buffer, OFFSET, messageHeaderEncoder);

        encoder.chatId(1)
            .body("About 1 ft tall and furry.")
            .subject("Missing cat");

Then we get the following error message:

Illegal field access order.
Cannot access field "body" in state: V0_BLOCK.
Expected one of these transitions: ["chatId(?)", "subject(?)"].
Please see the diagram in the Javadoc of the inner class #CodecStates.
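
The behaviour can be reproduced without generating any codecs. The following is a hand-written sketch that mirrors the state machine for SendChatMessage; `ToyOrderCheckedEncoder` is a hypothetical class that tracks state only (it writes no bytes), and its message omits the final Javadoc hint.

```java
// Hand-written toy that mirrors the generated checks for SendChatMessage.
// Hypothetical: it tracks state only and does not touch a buffer.
final class ToyOrderCheckedEncoder
{
    private static final int NOT_WRAPPED = 0;
    private static final int V0_BLOCK = 1;
    private static final int V0_SUBJECT_DONE = 2;
    private static final int V0_BODY_DONE = 3;

    private static final String[] STATE_NAME_LOOKUP =
        {"NOT_WRAPPED", "V0_BLOCK", "V0_SUBJECT_DONE", "V0_BODY_DONE"};
    private static final String[] STATE_TRANSITIONS_LOOKUP =
        {"\"wrap(version=0)\"", "\"chatId(?)\", \"subject(?)\"", "\"body(?)\"", ""};

    private int codecState = NOT_WRAPPED;

    ToyOrderCheckedEncoder wrap()
    {
        codecState = V0_BLOCK;
        return this;
    }

    ToyOrderCheckedEncoder chatId(final long value)
    {
        return transition("chatId", V0_BLOCK, V0_BLOCK);
    }

    ToyOrderCheckedEncoder subject(final String value)
    {
        return transition("subject", V0_BLOCK, V0_SUBJECT_DONE);
    }

    ToyOrderCheckedEncoder body(final String value)
    {
        return transition("body", V0_SUBJECT_DONE, V0_BODY_DONE);
    }

    // Allow the access only from `expected`, then move to `next`.
    private ToyOrderCheckedEncoder transition(final String field, final int expected, final int next)
    {
        if (codecState != expected)
        {
            throw new IllegalStateException("Illegal field access order. " +
                "Cannot access field \"" + field + "\" in state: " + STATE_NAME_LOOKUP[codecState] +
                ". Expected one of these transitions: [" + STATE_TRANSITIONS_LOOKUP[codecState] + "].");
        }
        codecState = next;
        return this;
    }

    public static void main(final String[] args)
    {
        try
        {
            new ToyOrderCheckedEncoder().wrap()
                .chatId(1)
                .body("About 1 ft tall and furry.")
                .subject("Missing cat");
        }
        catch (final IllegalStateException ex)
        {
            System.out.println(ex.getMessage());
        }
    }
}
```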

Modelling notes:

Decoding different versions

The SBE specification says the following:

If the received version is less than the decoder’s version (that is, the producer’s encoder is older than the consumer’s decoder), then only the fields of the older version may be parsed. This information is available through metadata as “sinceVersion” attribute of a field. If sinceVersion is greater than received schema version, then the field is not available. How a decoder signals an application that a field is unavailable is an implementation detail. One strategy is for an application to provide a default value for unavailable fields.

Java decoders provide a default value when a field is absent in the received version, but C# codecs do not. Instead, C# codecs rely on the developer querying presence properties, e.g., bool IsBodyPresent { get; }. To support languages like C#, it seems sensible to model that absent fields may be skipped. Therefore, we model each version as a separate subgraph of the state machine. To allow the behaviour of decoders like Java's that return defaults, we perform the state machine transition only when a default is not returned.
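
A minimal sketch of this modelling follows, assuming a hypothetical variant of the schema in which `body` was added in version 1. `ToyVersionedDecoder`, its state names, and the placeholder return values are all illustrative; no buffer is read.

```java
// Toy decoder model (hypothetical): "body" is assumed to have sinceVersion=1.
final class ToyVersionedDecoder
{
    private static final int NOT_WRAPPED = 0;
    private static final int V0_BLOCK = 1;
    private static final int V0_SUBJECT_DONE = 2;
    private static final int V1_BLOCK = 3;
    private static final int V1_SUBJECT_DONE = 4;
    private static final int V1_BODY_DONE = 5;

    private static final int BODY_SINCE_VERSION = 1;

    private int actingVersion;
    private int codecState = NOT_WRAPPED;

    ToyVersionedDecoder wrap(final int actingVersion)
    {
        this.actingVersion = actingVersion;
        // Branch into the subgraph for the version being decoded.
        codecState = actingVersion == 0 ? V0_BLOCK : V1_BLOCK;
        return this;
    }

    int codecState()
    {
        return codecState;
    }

    String subject()
    {
        switch (codecState)
        {
            case V0_BLOCK:
                codecState = V0_SUBJECT_DONE;
                break;
            case V1_BLOCK:
                codecState = V1_SUBJECT_DONE;
                break;
            default:
                throw new IllegalStateException("Cannot access field \"subject\" in state: " + codecState);
        }
        return "<subject>"; // placeholder for the decoded value
    }

    String body()
    {
        // Version check first: an absent field returns a default and
        // leaves the state machine untouched.
        if (actingVersion < BODY_SINCE_VERSION)
        {
            return "";
        }
        if (codecState != V1_SUBJECT_DONE)
        {
            throw new IllegalStateException("Cannot access field \"body\" in state: " + codecState);
        }
        codecState = V1_BODY_DONE;
        return "<body>"; // placeholder for the decoded value
    }
}
```

With `wrap(0)`, calling `body()` at any point returns the default and does not disturb later calls; with `wrap(1)`, calling `body()` before `subject()` throws.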

Restrictions to block field access

Our current model has some false negatives (Goal 3).
In particular, we do not model the ability to access top-level block fields or group block fields after a subsequent group or variable-length data field has been accessed.

I believe eliminating the false negatives around group block access would result in an explosion in state space where the number of states would be exponential w.r.t. the number of repeating groups. I reverted a commit where I attempted to achieve this in a naive manner. The commit message discusses some of the problems at length.

Eliminating the false negatives around top-level block access is easier. We just special case these fields in the code generator.
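
One plausible shape for that special case is sketched below: the top-level block accessor checks only that the codec has been wrapped and performs no transition. This is an assumption about the form of the check, not the generator's actual output, and `ToyLenientBlockEncoder` is a hypothetical class.

```java
// Hypothetical sketch: top-level block fields are allowed in any wrapped state.
final class ToyLenientBlockEncoder
{
    private static final int NOT_WRAPPED = 0;
    private static final int V0_BLOCK = 1;
    private static final int V0_SUBJECT_DONE = 2;

    private int codecState = NOT_WRAPPED;

    ToyLenientBlockEncoder wrap()
    {
        codecState = V0_BLOCK;
        return this;
    }

    // Block field: safe once wrapped, so no ordering constraint and no transition.
    ToyLenientBlockEncoder chatId(final long value)
    {
        if (codecState == NOT_WRAPPED)
        {
            throw new IllegalStateException("Cannot access field \"chatId\" in state: NOT_WRAPPED");
        }
        return this;
    }

    // Variable-length field: still subject to strict ordering.
    ToyLenientBlockEncoder subject(final String value)
    {
        if (codecState != V0_BLOCK)
        {
            throw new IllegalStateException("Cannot access field \"subject\" in state: " + codecState);
        }
        codecState = V0_SUBJECT_DONE;
        return this;
    }
}
```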

Representing repeated groups

To provide better error messages and to avoid unrelated fields interrogating fields private to group encoders/decoders, we model groups using several states that differentiate between empty/before-next/many-elements-remaining/last-element.
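
The four kinds of group state can be sketched as follows. `ToyGroupStates` and its state names are hypothetical, and an enum is used here only for readability (the generated code uses ints, as noted above).

```java
// Hypothetical model of one repeating group with a single block field `x`.
final class ToyGroupStates
{
    enum State { EMPTY, BEFORE_NEXT, ELEMENTS_REMAINING, LAST_ELEMENT }

    private State state = State.EMPTY;
    private int remaining;

    // Setting the count decides which branch of the state machine is taken.
    ToyGroupStates count(final int count)
    {
        remaining = count;
        state = count == 0 ? State.EMPTY : State.BEFORE_NEXT;
        return this;
    }

    ToyGroupStates next()
    {
        if (state == State.EMPTY || state == State.LAST_ELEMENT)
        {
            throw new IllegalStateException("Cannot call next() in state: " + state);
        }
        remaining--;
        state = remaining == 0 ? State.LAST_ELEMENT : State.ELEMENTS_REMAINING;
        return this;
    }

    // Block fields are only positioned over valid memory after next().
    ToyGroupStates x(final int value)
    {
        if (state == State.EMPTY || state == State.BEFORE_NEXT)
        {
            throw new IllegalStateException("Cannot access field \"x\" in state: " + state);
        }
        return this;
    }

    State state()
    {
        return state;
    }
}
```

For example, `count(2).next().x(1).next().x(2)` passes, whereas `count(2).x(1)` fails in the before-next state and a third `next()` fails in the last-element state.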

Questions:

Why are methods like MessageDecoder#GroupDecoder#wrap public? I cannot think of a sensible way to model these interactions. I expect people usually wrap implicitly via .myGroupCount(N) in encoders or .myGroup() in decoders.

Other changes

i. I've added some documentation to the Wiki in a separate patch.

ii. The semantics of copying flyweights seem non-obvious to me, and the presence of implicit copy constructors and copy assignment operators permits one to write (probably) less-efficient code (but I haven't measured it, and the additional work might be optimised away). For example:

MyMessage::MyGroup grp = myMessage.myGroupCount(1); // calls copy constructor

and

MyMessage::MyOuterGroup::MyInnerGroup &inner = myOuterGroup.myInnerGroupCount(1);

// ...

inner = myOuterGroup.next().myInnerGroupCount(2); // calls copy assignment operator

vs.

MyMessage::MyGroup &grp = myMessage.myGroupCount(1);

and

MyMessage::MyOuterGroup::MyInnerGroup &inner1 = myOuterGroup.myInnerGroupCount(1);

// ...

MyMessage::MyOuterGroup::MyInnerGroup &inner2 = myOuterGroup.next().myInnerGroupCount(2);

Therefore, I've explicitly deleted the copy constructor and copy assignment operator from flyweights when sbe.cpp.disable.implicit.copying is set to true.

sbe-tool/src/test/java/uk/co/real_logic/sbe/FieldAccessOrderCheckTest.java

sbe-tool/src/test/resources/field-order-check-schema.xml

ZachBray · 2023-06-22T07:12:09Z

I plan to make a slight improvement today. As of the current revision (4eb8fd0), we do not expose any mechanism to check that a message is fully encoded, but we easily could.

I plan to generate a method, e.g., void checkFullyEncoded(), that will throw an exception that provides similar context to our existing errors in the case where the encoder is not in a terminal state.

It would be possible to do something similar for decoders, but it would rely on having numGroups and numVarDataFields in the message header.

ZachBray · 2023-06-22T08:52:57Z

As of c67e20a, we generate a void checkEncodingIsComplete() method on encoders when the sbe.generate.access.order.checks property is enabled at build time.
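
A minimal sketch of what such a completeness check might look like for the SendChatMessage example: the encoder is complete only when it is in the terminal state. `ToyCompletenessEncoder` and the wording of its error message are hypothetical, not the generated output.

```java
// Hypothetical sketch of a terminal-state check for SendChatMessage.
final class ToyCompletenessEncoder
{
    private static final int NOT_WRAPPED = 0;
    private static final int V0_BLOCK = 1;
    private static final int V0_SUBJECT_DONE = 2;
    private static final int V0_BODY_DONE = 3; // the only terminal state

    private static final String[] STATE_NAME_LOOKUP =
        {"NOT_WRAPPED", "V0_BLOCK", "V0_SUBJECT_DONE", "V0_BODY_DONE"};
    private static final String[] STATE_TRANSITIONS_LOOKUP =
        {"\"wrap(version=0)\"", "\"chatId(?)\", \"subject(?)\"", "\"body(?)\"", ""};

    private int codecState = NOT_WRAPPED;

    ToyCompletenessEncoder wrap()
    {
        codecState = V0_BLOCK;
        return this;
    }

    ToyCompletenessEncoder subject(final String value)
    {
        return transition("subject", V0_BLOCK, V0_SUBJECT_DONE);
    }

    ToyCompletenessEncoder body(final String value)
    {
        return transition("body", V0_SUBJECT_DONE, V0_BODY_DONE);
    }

    // Throws unless every group and variable-length field has been encoded.
    void checkEncodingIsComplete()
    {
        if (codecState != V0_BODY_DONE)
        {
            throw new IllegalStateException("Not fully encoded, current state: " +
                STATE_NAME_LOOKUP[codecState] + ", allowed transitions: " +
                STATE_TRANSITIONS_LOOKUP[codecState]);
        }
    }

    private ToyCompletenessEncoder transition(final String field, final int expected, final int next)
    {
        if (codecState != expected)
        {
            throw new IllegalStateException(
                "Cannot access field \"" + field + "\" in state: " + STATE_NAME_LOOKUP[codecState]);
        }
        codecState = next;
        return this;
    }
}
```

Calling `checkEncodingIsComplete()` after `wrap().subject(...)` throws and names the missing `body(?)` transition; after `body(...)` it returns normally.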

SBE does not support arbitrary access to fields when encoding and decoding. In particular, accessing groups and variable length fields advances a `limit`, which marks where the start of the next group/variable length field is expected. We aim to introduce some checking (in debug mode - when bounds checks are enabled) that verifies field access order is consistent enough with the order defined in the schema to encode or decode a valid message.

Also, enable some tests around field order checking.

Outline of the approach:

- Using a FSM, model the states of an encoder during encoding.
- Hopefully, this model will be language independent.
- Generate code to perform state transitions upon field accesses.
- And to throw when transitions are not in the model.

Assumptions:

- JIT compilation will remove `if (DEBUG_MODE) { ... }` blocks when the static boolean constant, `DEBUG_MODE`, is `false`.

Questions:

- Is the approach reasonable?
- Is the newly disallowed behaviour reasonable?
- For example, it prevents encoding block fields after a group or variable length field has been encoded.

Further work:

- Decode path
- More tests (including tests for decoding old messages where some fields may not be present)

Use `-Dsbe.print.state.machine=$NAME_OF_MSG`.

This change builds upon the work to check field encoding order. It is a slightly more-complex problem than with encoding, as when decoding it is necessary to deal with multiple versions of encoded data. To handle this, we fuse together the state machines for decoding each possible version and branch on "decoder.wrap(...)".

Further work:

- More tests
- Better error messages

"New" decoders (in Java at least) read null-like values for fields that are not present in "old" messages. There are no restrictions on the order in which these "new" fields are accessed when decoding "old" data. In Java, a version check returns a null-like value when a field is not present in the acting version. This behaviour matches the wiki documentation. C#, currently, has different behaviour: the application code needs to interrogate a boolean property, e.g., `IsFooInActingVersion`, before accessing the data property, e.g., `Foo`. The property itself assumes that the field is present. It will attempt to read it from the buffer and possibly advance a `limit` field. In this commit, I have moved the state transition beneath the existing version check in Java decoders. This change allows these fields to be accessed when decoding "old" messages without changing the state of decoding, which seems appropriate as there are no side-effects on internal state, e.g., the `limit` field. Also, in this commit, I have expanded the test coverage to include the getter and setter variants for fixed-size arrays and variable-length data.

Also, use an enum to represent codec states rather than a state number and generate (dot-based) state machine diagram in Javadoc for states enum.

Also, fix some spots where I had forgotten to move the state transition below the version check in decoders.

There is still work to do in this area. In particular:

1. States should be more descriptive.
2. Errors should contain full field paths.
3. Group error messages should be clearer, e.g., "cannot write group element before calling next()".

…ely, in some cases where it is not).

This commit makes the field order checks more lenient. The aim is to allow the application code to access fields in the root/top-level block or blocks inside groups whenever it is safe. However, the approach is broken and does not work in general.

To (try to) achieve this aim, we rely on the fact that once a block is positioned correctly, it is safe to access during the remainder of encoding/decoding. For example, the following code is safe:

```
encoder.wrap(...);
encoder.blockField1(42);
encoder.varData("abc");
encoder.blockField2(43);
```

As soon as `wrap(...)` is called, the top-level block will be positioned over a valid region for the remainder of encoding.

Groups are slightly trickier. The following code is _safe_:

```
encoder.wrap(...);
Foo.BarEncoder bar = encoder.groupBarCount(2);
bar.next()
    .x(1).y(2)
    .next()
    .x(3).y(4);
```

But the following code is _unsafe_:

```
encoder.wrap(...);
Foo.BarEncoder bar = encoder.groupBarCount(2);
bar.x(1).y(2)
    .next()
    .x(3).y(4);
```

And the following code is _unsafe_ too:

```
encoder.wrap(...);
Foo.BarEncoder bar = encoder.groupBarCount(0);
encoder.baz("abc");
bar.x(1).y(2);
```

The `offset` for the group encoder is only set when `next()` is called. Therefore, calling `x(1)` and `y(2)` will likely overwrite some data elsewhere in the buffer (probably near offset 0).

We can still rely on the fact that once `next()` has been called - even if it is called multiple times, or too many times (causing it to throw) - the group's block fields will be safe to encode/decode.

Thinking about this in terms of our state machine that models codec state and field accesses, we can put this another way. There is at least one transition that represents the `next()` method call. The states (transitively) reachable from these transitions can safely access the associated group block.

As already mentioned, despite the passing tests, the approach in this commit doesn't always work.
I had thought it would be possible to assign a number to each state as we find them using DFS and use this number as a shortcut to determining whether a block access was safe. For example, in each block field we would check:

```
if (currentState.number() >= firstStateAfterNextIsCalled.number())
{
    // ALLOW
}
else
{
    // DENY
}
```

However, (even ignoring the cycles due to repeating groups and the use of DFS) this solution makes the following assumptions:

1. There is a single state, `firstStateAfterNextIsCalled`, such that all subsequent transitively reachable states are safe.
2. It is possible to number the states in such a way that `numberOfState(b) >= numberOfState(a)` implies `b` is (transitively) reachable from `a`.

Let's consider the following pseudo-schema w.r.t. our assumptions above:

```
group a {
    x: int32
}
group b {
    y: int32
}
```

Can we set `x` if we've called `BEncoder#next`? Sometimes. The following code is invalid.

```
AEncoder a = encoder.aCount(0);
BEncoder b = encoder.bCount(1).next();
b.y(2);
a.x(1); // Whoops! We overwrote some data somewhere.
```

But this code is valid.

```
AEncoder a = encoder.aCount(1).next();
BEncoder b = encoder.bCount(1).next();
b.y(2);
a.x(1);
```

Therefore, multiple transitions must exist for the `BEncoder#next` call with different exit states, which breaks Assumption 1.

Assumption 2 is also problematic. As alluded to above, we must have a bifurcation of states depending on whether a group has elements or is empty. With arbitrary bifurcations, it is not possible to number the states in such a way that Assumption 2 holds. Consider the following state diagram:

```
          state N
          /     \
     empty       non-empty
        /         \
   state X       state Y
```

Depending on whether a group is empty, e.g., `aCount(0)`, or not, e.g., `aCount(1)`, we transition to either `state X` or `state Y` from `state N`.
For Assumption 2 to hold, we need `numberOf(state X) >= numberOf(state N)` (reachable) and `numberOf(state Y) >= numberOf(state N)` (reachable) and `numberOf(state X) < numberOf(state Y)` (unreachable) and `numberOf(state Y) < numberOf(state X)` (unreachable). Clearly, this is impossible.

Prior to this commit, we avoided such problems by enforcing a more-strict ordering of field accesses. With this more-strict ordering, setting fields in a group block in later states was not allowed. Therefore, the state machine didn't have to split into mutually exclusive subgraphs of reachable states depending on whether or not the group is empty.

For example, let's consider a pseudo-schema with two groups.

```
group a {
    x: int32
}
group b {
    y: int32
}
```

Once the application code calls `encoder.bCount(2).next()`, our state machine will enter the `V0_B_N_BLOCK` state regardless of whether previously the application code called `encoder.aCount(0)` or `encoder.aCount(1).next()`. In `V0_B_N_BLOCK` the application code is not allowed to call `AEncoder#x`; therefore, it is safe, i.e., it has no false positives. However, it is also too strict, i.e., it has false negatives, e.g., the `encoder.aCount(1).next()` case.

How likely are these false negatives to come up in the "wild"?

If we think false negatives will often come up, is it possible to remove them whilst still using a state machine?

We could consider generating more states to represent the reachable states under each fork of group emptiness. The number of states would grow exponentially w.r.t. the number of top-level groups, but let's ignore that for the moment and just consider whether or not it is possible.

Continuing the example above with two groups, when the application code calls `encoder.bCount(2).next()`, the state it enters would depend on whether it had previously called `encoder.aCount(0)` or `encoder.aCount(n)...next()` for some value `n > 0`.
To safely model this example, we'd need at least the following states:

1. has no `a` elements
2. has no `a` elements, and has no `b` elements
3. has `a` elements, but has not called `AEncoder#next`
4. has `a` elements, and has called `AEncoder#next`
5. has `a` elements, has called `AEncoder#next`, has `b` elements, but has not called `BEncoder#next`
6. has `a` elements, has called `AEncoder#next`, has `b` elements, and has called `BEncoder#next`
7. has no `a` elements, has `b` elements, but has not called `BEncoder#next`
8. has no `a` elements, has `b` elements, and has called `BEncoder#next`

A trickier case is with nested groups:

```
group foo {
    x: int32
    group bar {
        y: int32
    }
    r: string
    s: string
}
```

When is `Bar#y` safe to call? I believe it is safe to call in states where `FooEncoder#next` and `BarEncoder#next` have been called in-order more recently than `BarEncoder#count(0)`.

Note, as `bar` is a group within `foo`, the `BarEncoder#next` and `BarEncoder#count` methods may be called repeatedly during the encoding of a single message.

Implementing a more-permissive safety check, using this model, would entail checking that the current state of the encoder/decoder is in a state where this condition holds.

To safely model this example, we'd need (at least) the following states:

1. has no `foo` elements
2. has `foo` elements, but has not called `FooEncoder#next`
3. has `foo` elements, and has called `FooEncoder#next`
4. has `foo` elements, has called `FooEncoder#next`, and has no `bar` elements
5. has `foo` elements, has called `FooEncoder#next`, has no `bar` elements, and has encoded `r`
6. has `foo` elements, has called `FooEncoder#next`, has no `bar` elements, and has encoded `s`
7. has `foo` elements, has called `FooEncoder#next`, has `bar` elements, but has not called `BarEncoder#next`
8. has `foo` elements, has called `FooEncoder#next`, has `bar` elements, has not called `BarEncoder#next`, and has encoded `r`
9. has `foo` elements, has called `FooEncoder#next`, has `bar` elements, has not called `BarEncoder#next`, and has encoded `s`
10. has `foo` elements, has called `FooEncoder#next`, has `bar` elements, and has called `BarEncoder#next`
11. has `foo` elements, has called `FooEncoder#next`, has `bar` elements, and has called `BarEncoder#next`, and has encoded `r`
12. has `foo` elements, has called `FooEncoder#next`, has `bar` elements, and has called `BarEncoder#next`, and has encoded `s`

```
FooEncoder foo = encoder.fooCount(3); // State 2
foo.next(); // State 3
foo.barCount(0); // State 4
foo.r("abc"); // State 5
foo.s("def"); // State 6
foo.next(); // State 4
BarEncoder bar = foo.barCount(1); // State 7
foo.r("abc"); // State 8
bar.next(); // State 11
bar.y(42); // State 11
foo.s("def"); // State 12
foo.next(); // State X?
bar.y(43); // Weird but allowed.
foo.barCount(0); // State Y?
// State X cannot be State 10, as State 10 should not allow _re-encoding_ of the bar group count
```

OK. We didn't have enough states. In the case where we "wrap back around", we need to model whether `bar` block access is allowed (State 13 vs State 3):

13. has `foo` elements, has called `FooEncoder#next`, has called `BarEncoder#next` for last iteration, but hasn't set `bar` group size in this iteration.

An even trickier case is with doubly-nested groups:

```
group foo {
    x: int32
    group bar {
        y: int32
        group baz {
            z: int32
        }
    }
}
```

When is `Baz#z` safe to call? I believe it is safe to call in states where `FooEncoder#next`, `BarEncoder#next` and `BazEncoder#next` have been called in-order more recently than `BarEncoder#count(0)` or `BazEncoder#count(0)`. Encoding this in states is likely to result in more cases like State 13 above.

Generalising: a group block field may be accessed only when all outer groups' and its own `#next` methods have been called in-order more recently than any outer groups' or its own `#count` method with a `0` argument.
Can we model "more-recently" exactly, i.e., with no false positives and no false negatives, using a state machine?

Another way of looking at the state machines we generate is as parsers for a language, where our language, based on an SBE schema, comprises valid "strings" of calls to an encoder/decoder. Is this language a regular language? If not, representing it using a FSM will not work. Cue Pumping Lemma?

Next steps:

- Add more tests to show where this approach breaks.
- Revert the changes and see if it is acceptable that some valid uses of codecs will be prevented when enabling field order checking. I.e., it will give false negatives when testing the hypothesis "this encoding/decoding is valid".
- Personally, I think false negatives are better than false positives. (False positives are possible as of this commit.) Having said that, it will discourage use of these checks in existing systems where valid code already exists that does not pass the checks.

(When testing the hypothesis "this decoding/encoding code is valid".) These tests show that there are problems with the approach in the last commit.

…fortunately, in some cases where it is not)." This reverts commit d9eb33d.

To reduce the number of false negatives the field access order checker generates, in this commit, we treat the top-level block fields differently. We know that once the top-level encoder is wrapped, these fields are always safe to access. As a result, we were able to enable several tests that previously failed.

Introduces the system property:

```
sbe.enable.access.order.checks
```

Some JIT compilers are able to eliminate dead code within branches of if-statements where the conditional is a static expression. Some JIT compilers perform inlining to some degree and use heuristics, e.g., method size. It isn't clear whether code size metrics will be calculated after dead code elimination is performed.

In this commit, I've moved the code that performs state transitions when access order checks are enabled into separate methods that the encoders/decoders call upon field accesses, e.g., the getter for a field `xyz` might look like this:

```
int xyz()
{
    if (ENABLED_ACCESS_ORDER_CHECKS)
    {
        onXyzAccessed();
    }
    // ...
}
```

This keeps the initial size of `xyz` smaller than when the code of `onXyzAccessed()` is "inlined" by the code generator.

There is a lot of noise on my ThinkPad P14s, but benchmarks do show improvements after this change.

Before:

```
$ java -jar -Dagrona.disable.bounds.checks=true -Dsbe.enable.access.order.checks=false ./sbe-benchmarks/build/libs/sbe-benchmarks.jar ".*Car.*" -bm thrpt -f 2 -i 3 -wi 2
# JMH version: 1.36
# VM version: JDK 1.8.0_302, OpenJDK 64-Bit Server VM, 25.302-b08
# VM invoker: /home/zach/.asdf/installs/java/zulu-8.56.0.21/jre/bin/java
# VM options: -Dagrona.disable.bounds.checks=true -Dsbe.enable.access.order.checks=false
...
Benchmark                 Mode  Cnt         Score         Error  Units
CarBenchmark.testDecode  thrpt    6  18051405.792 ±  444296.898  ops/s
CarBenchmark.testEncode  thrpt    6  11964827.418 ±  221996.747  ops/s
```

After:

```
$ java -jar -Dagrona.disable.bounds.checks=true -Dsbe.enable.access.order.checks=false ./sbe-benchmarks/build/libs/sbe-benchmarks.jar ".*Car.*" -bm thrpt -f 2 -i 3 -wi 2
# JMH version: 1.36
# VM version: JDK 1.8.0_302, OpenJDK 64-Bit Server VM, 25.302-b08
# VM invoker: /home/zach/.asdf/installs/java/zulu-8.56.0.21/jre/bin/java
# VM options: -Dagrona.disable.bounds.checks=true -Dsbe.enable.access.order.checks=false
...
Benchmark                 Mode  Cnt         Score         Error  Units
CarBenchmark.testDecode  thrpt    6  18132922.415 ± 1964596.044  ops/s
CarBenchmark.testEncode  thrpt    6  20018131.180 ±  671367.070  ops/s
```

Baseline (d2ec8f8):

```
$ java -jar -Dagrona.disable.bounds.checks=true -Dsbe.enable.access.order.checks=false ./sbe-benchmarks/build/libs/sbe-benchmarks.jar ".*Car.*" -bm thrpt -f 2 -i 3 -wi 2
# JMH version: 1.36
# VM version: JDK 1.8.0_302, OpenJDK 64-Bit Server VM, 25.302-b08
# VM invoker: /home/zach/.asdf/installs/java/zulu-8.56.0.21/jre/bin/java
# VM options: -Dagrona.disable.bounds.checks=true -Dsbe.enable.access.order.checks=false
...
Benchmark                 Mode  Cnt         Score         Error  Units
CarBenchmark.testDecode  thrpt    6  19270366.357 ± 2618336.027  ops/s
CarBenchmark.testEncode  thrpt    6  22519718.535 ±  794169.180  ops/s
```

Removes extraneous space in Java output when access order checks are disabled. Removes unnecessary access modifiers in C++ when access order checks are disabled.

kieranelby · 2023-10-02T22:09:08Z

Hi, really good to see the access checks being added, definitely think this will help eliminate some rather scary bugs!

I was a bit surprised that the checks weren't doing anything in C# until I realised that, despite generating them, I also had to #define ENABLE_ACCESS_ORDER_CHECKS for the preprocessor, which is a bit unusual. But I guess the performance rationale makes sense, and it is documented on the Safe-Flyweight-Usage page.

For what it's worth I tried out some obvious encoding blunders from C# (group after data, fewer items in group than I said, too many Next() calls, data fields in wrong order), and all got caught, the exception messages were pretty helpful 👍

ZachBray · 2023-10-04T23:20:56Z

Thanks for the feedback, @kieranelby. It was good to meet you all in 3D yesterday.

IIRC, the other adjustments to make to this PR - i.e., non-DTO-related requests - were:

To rename ENABLE_ACCESS_ORDER_CHECKS to avoid the homograph and to improve namespacing. Martin has suggested SBE_ENABLE_PRECEDENCE_CHECKS or SBE_ENABLE_SEQUENCING_CHECKS.
To consider using a symbol/macro/system_property scoped to each schema. Would a global "flag" also be useful? If we had both, which should take precedence?
To notify the user if generation is disabled but the runtime flag is enabled.

In situations where generation is - intentionally - partially enabled, i.e., across some but not all schemas, and we use a global flag, I worry this could be annoying. But perhaps that scenario is unlikely?
To consider only disabling specifiers, e.g., noexcept, based on the runtime flag rather than at code generation time. The only potential downside I can see here is that a method that interacts with a codec and is marked as noexcept would ~~compile~~ work when the runtime checks flag is disabled but ~~fail to compile when the flag is enabled, but that will probably be an easy fix for the consumer~~ throw a runtime exception when the flag is enabled (EDITED). I'll experiment with this further.

The aim of this change is to avoid naming collisions with other flags, particularly in financial domains where "order" is used frequently.

kieranelby · 2023-10-20T15:31:59Z

Hi @ZachBray sorry for the delay in getting back, one use case that occurred to us:

Some teams package up a client library for talking to their system.

They might want to include in the library two versions of the generated code: myawesomeprotocol_unsafe (no checks) and myawesomeprotocol_checked (access check), but the schema (myawesomeprotocol) would be the same.

Maybe to support this it would be good to use the output namespace in the name of the property that controls whether checks are enabled?

Cheers, Kieran

ZachBray · 2023-10-23T10:59:42Z

Hi @kieranelby,

> They might want to include in the library two versions of the generated code: myawesomeprotocol_unsafe (no checks) and myawesomeprotocol_checked (access check), but the schema (myawesomeprotocol) would be the same.

They wouldn't generate once (with checks) and use flags to enable/disable at runtime?

> Maybe to support this it would be good to use the output namespace in the name of the property that controls whether checks are enabled?

Do I understand correctly that you are suggesting the enabled/disabled flag should be based on a fully qualified schema name? If so, I think this aligns with Point 2. Would you be happy with only namespaced flags? I suspect it will be simpler than having namespaced flags and a global flag.

Happy to have a call to discuss, if it is easier.

Thanks, Zach.

kieranelby · 2023-10-23T11:04:20Z

Hi @ZachBray , I thought disabling/enabling at runtime only works in java - for C++ and C# it is build time? Yes agree perhaps the global flag not needed if we have per-namespace flags.

ZachBray · 2023-10-23T15:44:19Z

> I thought disabling/enabling at runtime only works in java - for C++ and C# it is build time?

@kieranelby, sorry, yes, you're right.

> Yes agree perhaps the global flag not needed if we have per-namespace flags.

I'll make the change.

mjpt777 · 2023-11-07T16:16:07Z

> To rename ENABLE_ACCESS_ORDER_CHECKS to avoid the homograph and to improve namespacing. Martin has suggested SBE_ENABLE_PRECEDENCE_CHECKS or SBE_ENABLE_SEQUENCING_CHECKS.

I'm leaning to SBE_ENABLE_PRECEDENCE_CHECKS when looking at examples.

ZachBray · 2023-11-10T18:15:48Z

@mjpt777, I've already made the change to SBE_ENABLE_SEQUENCING_CHECKS, but happy to s/SBE_ENABLE_SEQUENCING_CHECKS/SBE_ENABLE_PRECEDENCE_CHECKS/.

IIRC, you had some reservations about using flags named per-schema, i.e., around @kieranelby's request, is that still the case?

mjpt777 · 2023-11-10T18:28:40Z

> @mjpt777, I've already made the change to SBE_ENABLE_SEQUENCING_CHECKS, but happy to s/SBE_ENABLE_SEQUENCING_CHECKS/SBE_ENABLE_PRECEDENCE_CHECKS/.

Either is fine. I have a mild preference for "precedence" but happy to go with whatever you prefer.

> IIRC, you had some reservations about using flags named per-schema, i.e., around @kieranelby's request, is that still the case?

I think I'm okay with this now. Per schema flags are fine.

Some users would like the ability to enable precedence checks on some schemas but not others.

Initially, I considered generating flags or system properties with the schema namespace; however, this forces the more-complex scheme on all users. It also might get unwieldy with long namespaces. It also assumes a one-to-one correspondence between namespaces and schemas.

Instead, I have introduced two new system properties that can be applied at code generation time:

- `sbe.precedence.checks.flagName`, which controls the symbol/macro used to enable precedence checks at build time in the generated C#/C++ code. It defaults to `"SBE_ENABLE_PRECEDENCE_CHECKS"`.
- `sbe.precedence.checks.propName`, which controls the property name used to enable precedence checks at runtime in the generated Java code. It defaults to `"sbe.enable.precedence.checks"`.

Now, users can run SbeTool with different values for these system properties to generate code with different enablement flags.

Note above that the defaults are:

- `SBE_ENABLE_PRECEDENCE_CHECKS` rather than `SBE_ENABLE_SEQUENCING_CHECKS`, and
- `sbe.enable.precedence.checks` rather than `sbe.enable.sequencing.checks`.

These changes and a change to another system property:

- `sbe.generate.sequencing.checks` -> `sbe.generate.precedence.checks`

address some feedback from Martin on the associated PR real-logic#948.
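To illustrate the Java side of the configurable property name described above, here is a minimal sketch of how a runtime flag could be resolved. It is not the code SbeTool emits: the class `PrecedenceFlagSketch`, the method `checksEnabled`, and the property name `myawesomeprotocol.checked.enable.precedence.checks` are hypothetical. The fallback to the Agrona bounds-check setting follows the default stated in the PR description.

```java
public final class PrecedenceFlagSketch
{
    // Default runtime property name, as stated in the commit message.
    public static final String DEFAULT_PROP_NAME = "sbe.enable.precedence.checks";

    /**
     * Hypothetical helper: decides whether checks are on for codecs generated with propName.
     * If the property is unset, checks are on unless Agrona bounds checks are disabled.
     */
    public static boolean checksEnabled(final String propName)
    {
        final String value = System.getProperty(propName);
        if (value == null)
        {
            return !Boolean.getBoolean("agrona.disable.bounds.checks");
        }

        return Boolean.parseBoolean(value);
    }

    public static void main(final String[] args)
    {
        // Bounds checks disabled, so codecs using the default name fall back to "off".
        System.setProperty("agrona.disable.bounds.checks", "true");

        // Assumed custom name chosen at generation time for a "checked" build of a schema.
        final String customName = "myawesomeprotocol.checked.enable.precedence.checks";
        System.setProperty(customName, "true");

        System.out.println("custom:  " + checksEnabled(customName));
        System.out.println("default: " + checksEnabled(DEFAULT_PROP_NAME));
    }
}
```

Generated code would presumably evaluate such a flag once and cache it in a constant rather than reading the property on every access; the method form here only keeps the sketch easy to exercise.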

ZachBray · 2023-11-13T17:15:32Z

@mjpt777 and @kieranelby, the per-schema precedence checking flags should now be addressed by this commit: 08367e0

Ultimately, I opted to give the user control over the flags generated by SbeTool, rather than embedding the namespace name automatically. Hopefully, this keeps it simple for most users. Please let me know if you have any reservations about the approach.

I have updated the docs to account for the change in my Wiki fork.

kieranelby · 2023-11-13T20:28:37Z

Hi @ZachBray

Yes that's a good idea to allow advanced use-cases to customise sbe.precedence.checks.flagName while still keeping it simple for the normal case 👍

Cheers,
Kieran


Also, addresses Mike's feedback around the names of those system properties. In this commit, I've introduced a factory-like class: `PrecedenceChecks`. It creates models for message field access when the checks are enabled. It also provides a convenient place to group together some properties affecting the generation of field precedence checks. I have also renamed `AccessOrderModel` to `FieldPrecedenceModel` in this commit, as it aligns better with the system property names.

Previously, we only gave the (quite ambiguous) name of the inner class in the error message. Now, we give a qualified name that includes the codec/encoder/decoder class name (for C++, C#, and Java).

In this commit, I've removed the `PrecedenceChecks#Configuration` class in favour of a static method within `TargetCodeGeneratorLoader`. Unfortunately, C# behaves differently to all the other code generators. To avoid duplicating the default configuration for precedence checks in the `CSharp` class, I've made the method publicly visible, which is a bit ugly. It has also introduced a cyclic dependency between the code generators, e.g., `JavaGenerator`, and `TargetCodeGeneratorLoader`, as in order to preserve the existing public signatures I have kept the old constructors around and supplied a default `PrecedenceChecks`.

Move defaults out of property evaluation and into context.

mikeb01 reviewed Jun 21, 2023


sbe-tool/src/test/java/uk/co/real_logic/sbe/FieldAccessOrderCheckTest.java (outdated, resolved)

mikeb01 reviewed Jun 21, 2023


sbe-tool/src/test/resources/field-order-check-schema.xml (resolved)

ZachBray changed the title ~~[Java] Support field access order checking~~ [Java,CSharp] Support field access order checking Jul 5, 2023

ZachBray force-pushed the feature/field-order-check branch 3 times, most recently from 7c1f56a to b3f809d Compare July 5, 2023 11:37

ZachBray force-pushed the feature/field-order-check branch from b3f809d to aa136ce Compare August 18, 2023 16:44

ZachBray changed the title ~~[Java,CSharp] Support field access order checking~~ [Java,C#,C++] Support field access order checking Aug 18, 2023

ZachBray force-pushed the feature/field-order-check branch 3 times, most recently from 0365f16 to d33e54b Compare August 22, 2023 15:15

ZachBray added 17 commits September 20, 2023 12:47

[Java] Support printing field order access state machine.

9f9de4a

Use `-Dsbe.print.state.machine=$NAME_OF_MSG`.

[Java] Generate better error messages.

60b4803

Also, use an enum to represent codec states rather than a state number and generate (dot-based) state machine diagram in Javadoc for states enum.
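The idea of representing codec states with an enum and rejecting illegal transitions can be sketched as follows. This is a toy model, not the generated code: the class `CodecStateSketch`, its state names, and its error text are hypothetical, and it tracks the remaining group elements with a counter, which the real state machine may model differently. It covers a message with one repeating group followed by one variable-length field.

```java
public final class CodecStateSketch
{
    /** Hypothetical states for a message with one repeating group followed by one var-data field. */
    public enum State
    {
        BLOCK, GROUP_PENDING, GROUP_DONE, VAR_DATA_DONE
    }

    private State state = State.BLOCK;
    private int remaining;

    public State state()
    {
        return state;
    }

    /** Declares the number of group elements; only legal straight after the block fields. */
    public void groupCount(final int count)
    {
        check(state == State.BLOCK, "groupCount");
        remaining = count;
        state = count == 0 ? State.GROUP_DONE : State.GROUP_PENDING;
    }

    /** Advances to the next group element; illegal once all declared elements were visited. */
    public void next()
    {
        check(state == State.GROUP_PENDING, "next");
        if (--remaining == 0)
        {
            state = State.GROUP_DONE;
        }
    }

    /** Accesses the var-data field; illegal until every declared group element was visited. */
    public void varData()
    {
        check(state == State.GROUP_DONE, "varData");
        state = State.VAR_DATA_DONE;
    }

    private void check(final boolean allowed, final String access)
    {
        if (!allowed)
        {
            throw new IllegalStateException(
                "Illegal access: cannot call " + access + "() in state " + state);
        }
    }

    public static void main(final String[] args)
    {
        final CodecStateSketch codec = new CodecStateSketch();
        codec.groupCount(2);
        codec.next();
        try
        {
            codec.varData(); // only one of the two declared elements was visited
        }
        catch (final IllegalStateException ex)
        {
            System.out.println(ex.getMessage());
        }
    }
}
```

Using an enum means the error message can name the state the codec was in, which is easier to relate to a schema than a bare state number.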

[Java] Test field order checks with fixed-length strings.

e06462a

Also, fix some spots where I had forgotten to move the state transition below the version check in decoders.

[Java] Test field checks do not affect toString.

3d05f78

[Java] Fix field order checks to work with decodedLength().

3b8d557

[Java] Fix field access order in tests and re-enable.

f879284

[Java] Add (disabled) failing tests for false positives.

3e77828

(When testing the hypothesis "this decoding/encoding code is valid".) These tests show that there are problems with the approach in the last commit.

Revert "[Java] Permit block field access whenever it is safe (and, un…

46bd3d3

…fortunately, in some cases where it is not)." This reverts commit d9eb33d.

[Java] Support turning on access checks independently of bounds checks.

d9621e6

Introduces the system property: `sbe.enable.access.order.checks`

ZachBray force-pushed the feature/field-order-check branch from 5fb5a18 to cef3e1f Compare September 20, 2023 12:03

[Java, C++] Tidy up.

dab9ae2

Removes extraneous space in Java output when access order checks are disabled. Removes unnecessary access modifiers in C++ when access order checks are disabled.

ZachBray mentioned this pull request Oct 7, 2023

[C#] - Invalid generated code for string field. #955

Closed

[Java, C#, C++] Rename flags to enable field access order checking.

69d13b6

The aim of this change is to avoid naming collisions with other flags, particularly in financial domains where "order" is used frequently.

ZachBray force-pushed the feature/field-order-check branch from 558df22 to 69d13b6 Compare October 20, 2023 13:37

ZachBray force-pushed the feature/field-order-check branch from 11afc92 to 08367e0 Compare November 13, 2023 16:57

mikeb01 requested changes Nov 28, 2023


ZachBray added 2 commits December 4, 2023 18:20

Address Mike's feedback: provide more specific diagram location.

1efe4f2

Previously, we only gave the (quite ambiguous) name of the inner class in the error message. Now, we give a qualified name that includes the codec/encoder/decoder class name (for C++, C#, and Java).

ZachBray requested a review from mikeb01 December 4, 2023 18:54

mikeb01 approved these changes Dec 4, 2023


ZachBray added 3 commits December 5, 2023 09:44

Validate PrecedenceChecks#Context in conclude.

0eb1266

Avoid cyclic dependency between classes.

8796e9e

Move defaults out of property evaluation and into context.

ZachBray force-pushed the feature/field-order-check branch from a7733e1 to 8796e9e Compare December 6, 2023 11:17

ZachBray merged commit a259c7d into real-logic:master Dec 6, 2023
31 checks passed

ZachBray mentioned this pull request Dec 6, 2023

Runtime checks of variable-length calls #924

Closed
