Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fuzzing and hardening #693

Merged
merged 27 commits into from
Apr 6, 2021
Merged

Fuzzing and hardening #693

merged 27 commits into from
Apr 6, 2021

Conversation

koivunej
Copy link
Collaborator

@koivunej koivunej commented Mar 19, 2021

This builds upon #690. Some found fixes have already landed with #695. It adds on top of any fixes in earlier already:

  • feature strict to interledger-packet for accepting only varuints which have no leading zeroes
  • use of that interledger-packet/strict in btp, stream at least
  • var_octet_bytes can now only be usize wide, which should probably be specced lower to match reality
    • handling u64::MAX length payloads would really need a streaming approach
  • the u64::MAX case became apparent in ccp which preallocated the amount of items thus in fuzzing created huge allocations; now ccp and others mostly Vec::new which does have a chance of perf hit
  • fuzzing in 4 crates

interledger-packet/strict is useful for roundtrip fuzzing where input is first parsed and then converted back into bytes. When interledger-packet/strict is not enabled any additional leading zeros are accepted for varuints. Accepting them but not outputting them when converting the packet back to bytes creates a lot of false panics when fuzzing. strict shouldn't be enabled in any production usage because it did look like the java implementation for example didn't have such strict checks on parsed BigIntegers.

Most of this work came out of #680 where at updating interledger-stream I ran into the bad length check. Fixing the length check right away failed a lot of existing test cases, which got me to do differential fuzzing between two versions. Differential fuzzing of course runs into all of the previously found issues with roundtrip fuzzing. I will probably drop the differential fuzzing before making this ready for review. Dropped already.

Per crate fuzzing targets and status:

  • packet: address is good, packet roundtrip added late, immediate roundtrip failure(s), #[ignore] tests
  • btp: most fuzzed, looking quite good
  • stream: roundtrip added late, immediate roundtrip failure(s), #[ignore] tests
  • ccp: roundtripping, might be good

85% of added lines are caused by 4 added Cargo.lock files.

@koivunej
Copy link
Collaborator Author

Build failed with the latest checkpoint-y commit but that is ok as work continues. I've removed the btp "differential fuzzing" in 68260fe. The timestamp parsing was found quite lenient while fuzzing the btp, so that'll be the next thing but I'm ready to call this good to go with the ccp huge preallocations fixed. I'll probably complete this work by creating a few issues on:

  • more fuzzing
  • fixing the preallocation somehow in ccp and others which currently call Vec::new()

@koivunej koivunej marked this pull request as ready for review March 26, 2021 12:42
@koivunej koivunej requested a review from pradovic March 26, 2021 12:42
@koivunej
Copy link
Collaborator Author

Hardening doesn't seem to great with test-md failures. Lets see if I can decipher these.

@koivunej
Copy link
Collaborator Author

koivunej commented Mar 26, 2021

It would appear that the "xrp-settlement" failed. From the artifact it'd appear that everything went smooth for alice but in node-bob-settlement-engine.log:

$ less node-bob-settlement-engine.log
2021-01-29T09:58:02.213Z settlement-xrp Generated new XRP testnet account: address=rnGBkZRibgCZ6p3rVHHNVj2VmudXe3nedk secret=snscPxz9TQoDXMSN3fBekxaMk1Xt8
2021-01-29T09:58:02.723Z settlement-core Started settlement engine server
2021-01-29T09:58:12.944Z settlement-xrp Received incoming XRP payment: xrp=0.0005 account=5f3024b6-38d7-46aa-b32a-79cda329e30a txHash=19A0316F6C859266EDE0E2E4981A2AFEEB3DEE30622FF75EBF8F5AB4E1464182
2021-01-29T09:58:12.947Z settlement-core Notifying connector to credit settlement: amountToCredit=0.0005 account=5f3024b6-38d7-46aa-b32a-79cda329e30a settlementId=19A0316F6C859266EDE0E2E4981A2AFEEB3DEE30622FF75EBF8F5AB4E1464182
2021-01-29T09:58:12.955Z settlement-core Error: Connector failed to process settlement: amountToCredit=0.0005 account=5f3024b6-38d7-46aa-b32a-79cda329e30a settlementId=19A0316F6C859266EDE0E2E4981A2AFEEB3DEE30622FF75EBF8F5AB4E1464182
2021-01-29T09:58:12.955Z settlement-core Connector credited incoming settlement: leftover=0.0005 credited=0 amountToCredit=0.0005 account=5f3024b6-38d7-46aa-b32a-79cda329e30a settlementId=19A0316F6C859266EDE0E2E4981A2AFEEB3DEE30622FF75EBF8F5AB4E1464182
[ioredis] Unhandled error event: Error: connect ECONNREFUSED 127.0.0.1:6382
    at TCPConnectWrap.afterConnect [as oncomplete] (net.js:1141:16)
[ioredis] Unhandled error event: Error: connect ECONNREFUSED 127.0.0.1:6382
    at TCPConnectWrap.afterConnect [as oncomplete] (net.js:1141:16)
[ioredis] Unhandled error event: Error: connect ECONNREFUSED 127.0.0.1:6382
    at TCPConnectWrap.afterConnect [as oncomplete] (net.js:1141:16)
  • 127.0.0.1:6382 is the redis assigned to bob's settlement engine
    • in scripts the redises are shut down first so it explains these ECONNREFUSED

The whole execution log from the log failure:

Testing "xrp-settlement" on source mode. [4/4]
rm: cannot remove '/home/runner/.interledger/bin/*': No such file or directory
Stopping Interledger nodes...
Building interledger.rs... (This may take a couple of minutes)
    Finished dev [unoptimized + debuginfo] target(s) in 0.18s

Starting Redis instances...done
OK
OK
OK
OK

Starting settlement engines...


Starting Interledger nodes...


Failed to spin up settlement engine. Check out your configuration and log files.
Error running markdown file: ./README.md (parsed bash script /tmp/tmp.IqLUsVs1xo)
Waiting for Interledger.rs nodes to start up..............Some tests failed. [3/4]

I do wonder how come the line preceding Waiting for Interledger.rs nodes to start up is after Failed to spin up settlement engine. Also the failed to spin up settlement engine comes right before exit 1. Cannot see any subshell magic in the lib or run-md-test.sh. Maybe the parser? Nope. The parser seems to just leave the necessary parts in and it is executed with bash -O expand_aliases "$TMP_SCRIPT" where $TMP_SCRIPT is tempfile.

Could be that the formatting issues are because of the strange output where \n is used to prefix messages? Itching to get rid of these and just echo and variable substition these.

Need to get these running locally. I guess I have to spin up a vm. The scripts don't really look like they cleanup after themselves... So this could even be interference from previous test but probably not as these have been executing well.

@koivunej

This comment has been minimized.

@koivunej
Copy link
Collaborator Author

koivunej commented Mar 26, 2021

Ok I guess the test is just flaky? Good three times (2 here, 1 in first step of bisection) in row, next one will be the third. Separating the CI stuff to a new PR.

Comment on lines +10 to +13
[features]
# used when fuzzing; accepts only roundtripping input
strict = []

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After debugging some docker builds today it occured to me that those builds are done with --all-features. Need to check if that applies to all crates in the workspace, or does building ilp-node with --all-features just toggle on features of the ilp-node.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not enabled with the default binary build at least in rust 1.51, did not check others.

Enabling this would require adding strict feature to ilp-node, which would need to enable interledger/strict, strict for interledger, which would again need to enable interledger-packet/strict and others.

@koivunej
Copy link
Collaborator Author

Did the hopefully the final rebasing, made sure that the strict is not enabled for the default cargo build -p ilp-node --all-features builds -- enabling it would be in the spriti of OER notes (rfc 0030) but if someone is trying it against other implementations those might not be compatible.

Copy link
Collaborator

@pradovic pradovic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM mod 2 simple comments/questions

crates/interledger-btp/src/packet.rs Show resolved Hide resolved
crates/interledger-stream/src/lib.rs Outdated Show resolved Hide resolved
@koivunej
Copy link
Collaborator Author

koivunej commented Apr 2, 2021

test failure which #701 would fix. Lets see what's the test-md failure. It was the usual (see #693 (comment)), now we should have packet capture. But the failure doesn't repeat today...

with fuzzing the roundtripping of parsed packets is the only tool at the
moment. for strict roundtripping we need to deny some varuint patterns
with leading zeroes which might be actually needed for the protocol.

Signed-off-by: Joonas Koivunen <joonas.koivunen@gmail.com>
@koivunej
Copy link
Collaborator Author

koivunej commented Apr 6, 2021

I'll merge this today evening or tomorrow after re-running it to see if there's some time-specific issue which could reproduce.

target, should never panic:

* packet: Packet::try_from
* address: Address::try_from

originally tried out fuzzing between the versions when doing the bytes05
upgrade, but as that is quite troublesome to keep up to date I've
removed it.

Signed-off-by: Joonas Koivunen <joonas.koivunen@gmail.com>
Signed-off-by: Joonas Koivunen <joonas.koivunen@gmail.com>
Signed-off-by: Joonas Koivunen <joonas.koivunen@gmail.com>
since BtpPacket is not public, we add a new #[cfg(fuzzing)] pub mod with
our required utilities.

* roundtrip_btppacket: checks that input matches output

also introduces a "strict" feature for roundtrip fuzzing and more test
cases.

Signed-off-by: Joonas Koivunen <joonas.koivunen@gmail.com>
roundtrip fuzzing as in if something is parsed correctly, expect it to
produce the same bytes when converted back to bytes.

Signed-off-by: Joonas Koivunen <joonas.koivunen@gmail.com>
Signed-off-by: Joonas Koivunen <joonas.koivunen@gmail.com>
Signed-off-by: Joonas Koivunen <joonas.koivunen@gmail.com>
koivunej and others added 19 commits April 6, 2021 11:29
Signed-off-by: Joonas Koivunen <joonas.koivunen@gmail.com>
Signed-off-by: Joonas Koivunen <joonas.koivunen@gmail.com>
Signed-off-by: Joonas Koivunen <joonas.koivunen@gmail.com>
not yet fixed, a bit more like a chrono issue.

Signed-off-by: Joonas Koivunen <joonas.koivunen@gmail.com>
it only supports the lengths as in the oer rfc. reading a shorter
version will output the same shorter version back, which is helpful with
roundtrip testing.

the type itself is only ever handled by logging so this wasn't critical
for anything else than fuzzing.

Co-authored-by: Joonas Koivunen <joonas.koivunen@gmail.com>
Signed-off-by: Joonas Koivunen <joonas.koivunen@gmail.com>
Signed-off-by: Joonas Koivunen <joonas.koivunen@gmail.com>
Signed-off-by: Joonas Koivunen <joonas.koivunen@gmail.com>
asserting a boolean before unwrapping is wasteful, and hides the actual
case.

Signed-off-by: Joonas Koivunen <joonas.koivunen@gmail.com>
Signed-off-by: Joonas Koivunen <joonas.koivunen@gmail.com>
Signed-off-by: Joonas Koivunen <joonas.koivunen@gmail.com>
this is discussed more in the comments.

Signed-off-by: Joonas Koivunen <joonas.koivunen@gmail.com>
turns out there still are issues.

Signed-off-by: Joonas Koivunen <joonas.koivunen@gmail.com>
Signed-off-by: Joonas Koivunen <joonas.koivunen@gmail.com>
Signed-off-by: Joonas Koivunen <joonas.koivunen@gmail.com>
Signed-off-by: Joonas Koivunen <joonas.koivunen@gmail.com>
Signed-off-by: Joonas Koivunen <joonas.koivunen@gmail.com>
this is moved from a FIXME in fuzzing target, will be put up for
good-first-issue once PR is merged.

Signed-off-by: Joonas Koivunen <joonas.koivunen@gmail.com>
Signed-off-by: Joonas Koivunen <joonas.koivunen@gmail.com>
cannot see any point in storing the large files; a high churn rate is
expected since they are for libraries.

Signed-off-by: Joonas Koivunen <joonas.koivunen@gmail.com>
@koivunej
Copy link
Collaborator Author

koivunej commented Apr 6, 2021

Dropped the Cargo.lock files in the latest push, I cannot see any point in keeping them.

@koivunej
Copy link
Collaborator Author

koivunej commented Apr 6, 2021

Test failure is "normal flaky", lets hope for a test-md failure instead. "Normal flaky" error was:

thread 'three_nodes::three_nodes' panicked at 'error binding to 127.0.0.1:40675: error creating server listener: Address already in use (os error 98)', /home/runner/.cargo/registry/src/github.com-1ecc6299db9ec823/warp-0.2.5/src/server.rs:213:27

This is because the tests do not use :0 but instead hope racing to listen and close a port. This failure happens surprisingly rarely.

@koivunej
Copy link
Collaborator Author

koivunej commented Apr 6, 2021

Now two passes after the "normal flaky." Lets try a third.

@koivunej
Copy link
Collaborator Author

koivunej commented Apr 6, 2021

Issue has been created, seems that we'll need to find another PR to trigger the occasional test-md failure.

@koivunej koivunej merged commit 285f423 into interledger:master Apr 6, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants