Fuzzing and hardening #693

koivunej · 2021-03-19T17:00:45Z

This builds upon #690. Some of the fixes found have already landed with #695. On top of those earlier fixes, this adds:

- feature strict for interledger-packet, which accepts only varuints that have no leading zeroes
- use of interledger-packet/strict in at least btp and stream
- var_octet_bytes can now only be usize wide, which should probably be specced lower to match reality
  - handling u64::MAX length payloads would really need a streaming approach
- the u64::MAX case became apparent in ccp, which preallocated for the declared number of items and so made huge allocations under fuzzing; ccp and others now mostly start from Vec::new, which may cost some performance
- fuzzing in 4 crates

interledger-packet/strict is useful for roundtrip fuzzing, where input is first parsed and then converted back into bytes. When interledger-packet/strict is not enabled, any additional leading zeros are accepted for varuints. Accepting them but not outputting them when converting the packet back to bytes creates a lot of false panics when fuzzing. strict shouldn't be enabled in any production usage because, for example, the Java implementation did not appear to have such strict checks on parsed BigIntegers.
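To illustrate why leading zeros break roundtripping, here is a minimal sketch. It assumes a toy encoding (one length octet followed by that many big-endian bytes) and hypothetical function names; it is not the interledger-packet API, and the `strict` argument stands in for the Cargo feature.

```rust
/// Toy var-uint decoder (hypothetical, not the crate's API): one length octet
/// followed by exactly that many big-endian bytes, at most 8.
fn decode_var_uint(input: &[u8], strict: bool) -> Option<u64> {
    let (&len, rest) = input.split_first()?;
    let len = len as usize;
    if len == 0 || len > 8 || rest.len() != len {
        return None;
    }
    // In strict mode a leading zero octet is only allowed for the value 0 itself.
    if strict && len > 1 && rest[0] == 0 {
        return None;
    }
    Some(rest.iter().fold(0u64, |acc, &b| (acc << 8) | u64::from(b)))
}

/// Toy encoder: always emits the shortest form, so padded input never comes back.
fn encode_var_uint(value: u64) -> Vec<u8> {
    let bytes = value.to_be_bytes();
    // Skip leading zero bytes but keep at least one byte for the value 0.
    let skip = bytes.iter().take_while(|&&b| b == 0).count().min(7);
    let mut out = vec![(8 - skip) as u8];
    out.extend_from_slice(&bytes[skip..]);
    out
}
```

With this sketch, the padded input `[0x02, 0x00, 0x01]` decodes to 1 in lenient mode but re-encodes as `[0x01, 0x01]`, so a roundtrip assertion fails even though nothing is wrong with the parser. In strict mode the same input is rejected up front and never reaches the comparison.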

Most of this work came out of #680 where, while updating interledger-stream, I ran into the bad length check. Fixing the length check right away broke a lot of existing test cases, which got me to do differential fuzzing between the two versions. Differential fuzzing of course runs into all of the issues previously found with roundtrip fuzzing. ~~I will probably drop the differential fuzzing before making this ready for review.~~ Dropped already.

Per crate fuzzing targets and status:

packet: address is good, packet roundtrip added late, immediate roundtrip failure(s), #[ignore] tests
btp: most fuzzed, looking quite good
stream: roundtrip added late, immediate roundtrip failure(s), #[ignore] tests
ccp: roundtripping, might be good
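The roundtrip property these targets check can be sketched roughly as follows. This is only an illustration: `roundtrips`, `parse_u32` and `serialize_u32` are hypothetical stand-ins, and the actual targets exercise the crates' own packet types rather than this toy codec.

```rust
/// Returns true when `data` is either rejected by `parse` or survives
/// parse -> serialize unchanged. A fuzz target would assert on this result.
fn roundtrips<T>(
    data: &[u8],
    parse: impl Fn(&[u8]) -> Option<T>,
    serialize: impl Fn(&T) -> Vec<u8>,
) -> bool {
    match parse(data) {
        // Accepted input must come back byte-for-byte identical.
        Some(parsed) => serialize(&parsed) == data,
        // Rejected input is fine: the parser only must not panic.
        None => true,
    }
}

/// Stand-in codec (hypothetical): exactly four big-endian bytes.
fn parse_u32(data: &[u8]) -> Option<u32> {
    if data.len() == 4 {
        Some(u32::from_be_bytes([data[0], data[1], data[2], data[3]]))
    } else {
        None
    }
}

fn serialize_u32(value: &u32) -> Vec<u8> {
    value.to_be_bytes().to_vec()
}
```

Any parser that accepts more than one byte sequence for the same value fails this check, which is the kind of immediate roundtrip failure listed for packet and stream.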

~~85% of added lines are caused by 4 added Cargo.lock files.~~

koivunej · 2021-03-23T14:55:33Z

Build failed with the latest checkpoint-y commit but that is ok as work continues. I've removed the btp "differential fuzzing" in 68260fe. The timestamp parsing turned out to be quite lenient while fuzzing btp, so that'll be the next thing, but I'm ready to call this good to go now that the ccp huge preallocations are fixed. I'll probably complete this work by creating a few issues on:

more fuzzing
fixing the preallocation somehow in ccp and others which currently call Vec::new()
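One possible direction for the preallocation item is to bound the declared item count by what the remaining input could actually hold before allocating. The sketch below assumes a hypothetical fixed item size and function name; it is not taken from the ccp code.

```rust
/// Hypothetical fixed-size item, only for illustration.
const ITEM_LEN: usize = 8;

/// Reads `count` items from `payload`, refusing counts the payload cannot hold.
fn read_items(count: u64, mut payload: &[u8]) -> Option<Vec<[u8; ITEM_LEN]>> {
    // The declared count is untrusted: a few input bytes can claim u64::MAX items.
    let max_possible = (payload.len() / ITEM_LEN) as u64;
    if count > max_possible {
        return None;
    }
    // Safe cast: count <= max_possible, which was derived from a usize.
    let count = count as usize;
    // The capacity is now bounded by the input length, so preallocating is cheap.
    let mut items = Vec::with_capacity(count);
    for _ in 0..count {
        let (head, tail) = payload.split_at(ITEM_LEN);
        let mut item = [0u8; ITEM_LEN];
        item.copy_from_slice(head);
        items.push(item);
        payload = tail;
    }
    Some(items)
}
```

This keeps the preallocation for honest inputs while making the u64::MAX case an ordinary parse error. For variable-size items, the same idea would use the smallest possible item size as the divisor.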

koivunej · 2021-03-26T15:04:52Z

Hardening doesn't seem to be going too great, with test-md failures. Let's see if I can decipher these.

crates/interledger-btp/fuzz/Cargo.toml

crates/interledger-btp/src/packet.rs

crates/interledger-ccp/src/packet.rs

koivunej · 2021-03-26T16:18:30Z

It would appear that the "xrp-settlement" test failed. From the artifact it'd appear that everything went smoothly for alice, but in node-bob-settlement-engine.log:

$ less node-bob-settlement-engine.log
2021-01-29T09:58:02.213Z settlement-xrp Generated new XRP testnet account: address=rnGBkZRibgCZ6p3rVHHNVj2VmudXe3nedk secret=snscPxz9TQoDXMSN3fBekxaMk1Xt8
2021-01-29T09:58:02.723Z settlement-core Started settlement engine server
2021-01-29T09:58:12.944Z settlement-xrp Received incoming XRP payment: xrp=0.0005 account=5f3024b6-38d7-46aa-b32a-79cda329e30a txHash=19A0316F6C859266EDE0E2E4981A2AFEEB3DEE30622FF75EBF8F5AB4E1464182
2021-01-29T09:58:12.947Z settlement-core Notifying connector to credit settlement: amountToCredit=0.0005 account=5f3024b6-38d7-46aa-b32a-79cda329e30a settlementId=19A0316F6C859266EDE0E2E4981A2AFEEB3DEE30622FF75EBF8F5AB4E1464182
2021-01-29T09:58:12.955Z settlement-core Error: Connector failed to process settlement: amountToCredit=0.0005 account=5f3024b6-38d7-46aa-b32a-79cda329e30a settlementId=19A0316F6C859266EDE0E2E4981A2AFEEB3DEE30622FF75EBF8F5AB4E1464182
2021-01-29T09:58:12.955Z settlement-core Connector credited incoming settlement: leftover=0.0005 credited=0 amountToCredit=0.0005 account=5f3024b6-38d7-46aa-b32a-79cda329e30a settlementId=19A0316F6C859266EDE0E2E4981A2AFEEB3DEE30622FF75EBF8F5AB4E1464182
[ioredis] Unhandled error event: Error: connect ECONNREFUSED 127.0.0.1:6382
    at TCPConnectWrap.afterConnect [as oncomplete] (net.js:1141:16)
[ioredis] Unhandled error event: Error: connect ECONNREFUSED 127.0.0.1:6382
    at TCPConnectWrap.afterConnect [as oncomplete] (net.js:1141:16)
[ioredis] Unhandled error event: Error: connect ECONNREFUSED 127.0.0.1:6382
    at TCPConnectWrap.afterConnect [as oncomplete] (net.js:1141:16)

127.0.0.1:6382 is the redis assigned to bob's settlement engine
- in the scripts the redis instances are shut down first, which explains these ECONNREFUSED errors

The whole execution log from the failed run:

Testing "xrp-settlement" on source mode. [4/4]
rm: cannot remove '/home/runner/.interledger/bin/*': No such file or directory
Stopping Interledger nodes...
Building interledger.rs... (This may take a couple of minutes)
    Finished dev [unoptimized + debuginfo] target(s) in 0.18s

Starting Redis instances...done
OK
OK
OK
OK

Starting settlement engines...


Starting Interledger nodes...


Failed to spin up settlement engine. Check out your configuration and log files.
Error running markdown file: ./README.md (parsed bash script /tmp/tmp.IqLUsVs1xo)
Waiting for Interledger.rs nodes to start up..............Some tests failed. [3/4]

I do wonder how come the Waiting for Interledger.rs nodes to start up line ends up after Failed to spin up settlement engine. Also, Failed to spin up settlement engine comes right before exit 1. I cannot see any subshell magic in the lib or run-md-test.sh. Maybe the parser? Nope. The parser seems to just leave the necessary parts in, and the result is executed with bash -O expand_aliases "$TMP_SCRIPT" where $TMP_SCRIPT is a temp file.

Could it be that the formatting issues are caused by the strange output where \n is used to prefix messages? Itching to get rid of these and just use echo and variable substitution instead.

Need to get these running locally. I guess I have to spin up a vm. The scripts don't really look like they clean up after themselves... So this could even be interference from a previous test, but probably not, as these have been executing well.

koivunej · 2021-03-26T18:45:11Z

Ok I guess the test is just flaky? Good three times (2 here, 1 in first step of bisection) in a row, next one will be the third. Separating the CI stuff to a new PR.

crates/interledger-btp/src/packet.rs

koivunej · 2021-03-29T16:43:27Z

crates/interledger-packet/Cargo.toml

+[features]
+# used when fuzzing; accepts only roundtripping input
+strict = []
+


After debugging some docker builds today it occurred to me that those builds are done with --all-features. Need to check whether that applies to all crates in the workspace, or whether building ilp-node with --all-features just toggles on the features of ilp-node.

This is not enabled with the default binary build at least in rust 1.51, did not check others.

Enabling this would require adding strict feature to ilp-node, which would need to enable interledger/strict, strict for interledger, which would again need to enable interledger-packet/strict and others.

koivunej · 2021-03-30T09:43:08Z

Did what is hopefully the final rebase and made sure that strict is not enabled for the default cargo build -p ilp-node --all-features builds. Enabling it would be in the spirit of the OER notes (rfc 0030), but if someone is trying it against other implementations, those might not be compatible.

pradovic

LGTM mod 2 simple comments/questions

crates/interledger-btp/src/packet.rs

crates/interledger-stream/src/lib.rs

koivunej · 2021-04-02T16:10:31Z

test failure which #701 would fix. Let's see what the test-md failure is. It was the usual (see #693 (comment)), now we should have packet capture. But the failure doesn't repeat today...

with fuzzing the roundtripping of parsed packets is the only tool at the moment. for strict roundtripping we need to deny some varuint patterns with leading zeroes which might be actually needed for the protocol.

Signed-off-by: Joonas Koivunen <joonas.koivunen@gmail.com>

koivunej · 2021-04-06T07:36:06Z

I'll merge this today evening or tomorrow after re-running it to see if there's some time-specific issue which could reproduce.

target, should never panic:

* packet: Packet::try_from
* address: Address::try_from

originally tried out fuzzing between the versions when doing the bytes05 upgrade, but as that is quite troublesome to keep up to date I've removed it.

Signed-off-by: Joonas Koivunen <joonas.koivunen@gmail.com>