Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: server reliability improvements #136

Open
wants to merge 12 commits into
base: master
from

Conversation

@kincaidoneil
Copy link
Member

kincaidoneil commented Mar 14, 2020

Improvements:

  • Safely closing the server. Waits for all incoming ILP Prepare packets currently being handled to be returned + 100ms before disconnecting the plugin and resolving Server.close(). Also added a test that attempts to create this race condition.
  • Callback/custom logic before fulfilling packets. Enables a shouldFulfill callback function to be passed to the connection or server, which will be called last before fulfilling an incoming ILP Prepare with money. This enables applications to perform async operations to account for the incoming packet in their own system, and/or choose to fulfill or reject any individual packet. A couple minor questions:
    • I changed totalReceived so it no longer throws if there's an overflow, since that's the only case where the packet wouldn't actually be fulfilled if shouldFulfill resolved successfully. Is this behavior okay?
    • To uniquely track each packet, the connectionId (last segment of ILP address, including the connection tag) and sequence number are passed to the callback. Should the connection tag be passed to the callback instead?

Also: fixes vulnerabilities, and updates the logging so the same connection ID is used between the client & server, by using the token in the destination ILP address.

ensure end() doesn't resolve until all pending packets are processed and returned
- consumers may need to perform external accounting logic before a packet is fulfilled, or decide if any packet should be fulfilled or not
- add `shouldFulfill` callback to a connection that enables async operations
- if the callback resolves, the packet is fulfilled; if the callback rejects, the packet is rejected
- designed to guarantee the packet will be returned immediately after the callback resolves
src/connection.ts Outdated Show resolved Hide resolved
src/connection.ts Outdated Show resolved Hide resolved
@@ -1586,16 +1610,7 @@ export class Connection extends EventEmitter {
private bumpIdle (): void { this.lastActive = new Date() }

private addTotalReceived (value: Long): void {
const result = checkedAdd(this._totalReceived, value)
if (result.overflow) {

This comment has been minimized.

Copy link
@sharafian

sharafian Mar 16, 2020

Contributor

why is this check being removed? It was here because we're representing the numbers as Longs and STREAM technically could overflow that if enough is received on a connection

This comment has been minimized.

Copy link
@kincaidoneil

kincaidoneil Mar 16, 2020

Author Member

That's the problem. After shouldFulfill resolves and the packet is accounted for as fulfilled, if totalReceived overflows and throws, then the packet will be rejected.

And, even if I check that totalReceived doesn't overflow before calling shouldFulfill, other packets could be received in the interim that then cause it to overflow.

What's the best solution here? Should I add "holds" on the total received to prevent this?

This comment has been minimized.

Copy link
@sentientwaffle

sentientwaffle Mar 19, 2020

Contributor

After shouldFulfill resolves and the packet is accounted for as fulfilled, if totalReceived overflows and throws, then the packet will be rejected.

That sounds to me like the right behavior. Overflowing totalReceived is a rare edge case, but we still need to handle it.

Maybe the problem is that shouldFulfill has two jobs. The second -- handling fulfilled packets (as opposed to those that may be fulfilled) could be passed to by a separate callback or event. That is assuming it is a necessary feature -- I'm not clear why this is needed over "money" events.

This comment has been minimized.

Copy link
@kincaidoneil

kincaidoneil Mar 26, 2020

Author Member

I changed it back so it rejects the packet on the overflow.

I'm still not sure how I feel about this, because while it seems bad to credit the money in an external system, fulfill the packet, but incorrectly reflect that in totalReceived, it might be worse to credit the money in the external system, reject the packet, and correctly reflect totalReceived. (Since for this use case, the external system is probably the source of truth, rather than totalReceived).

That said -- it's definitely an edge case and a deployment should probably be configured to prevent fulfilling anywhere close to such a large amount.

src/server.ts Outdated Show resolved Hide resolved
src/connection.ts Outdated Show resolved Hide resolved
src/connection.ts Outdated Show resolved Hide resolved
test/connection.test.ts Show resolved Hide resolved
test/index.test.ts Show resolved Hide resolved
- wait 1s before fulfilling to test that server safely handles in-flight requests
- data handler throws error after plugin is disconnected
src/connection.ts Outdated Show resolved Hide resolved
test/index.test.ts Show resolved Hide resolved
updated flow:
1. stop handling new requests
2. wait for pending requests to complete
3. gracefully close the connection
src/connection.ts Outdated Show resolved Hide resolved
@@ -1586,16 +1610,7 @@ export class Connection extends EventEmitter {
private bumpIdle (): void { this.lastActive = new Date() }

private addTotalReceived (value: Long): void {
const result = checkedAdd(this._totalReceived, value)
if (result.overflow) {

This comment has been minimized.

Copy link
@sentientwaffle

sentientwaffle Mar 19, 2020

Contributor

After shouldFulfill resolves and the packet is accounted for as fulfilled, if totalReceived overflows and throws, then the packet will be rejected.

That sounds to me like the right behavior. Overflowing totalReceived is a rare edge case, but we still need to handle it.

Maybe the problem is that shouldFulfill has two jobs. The second -- handling fulfilled packets (as opposed to those that may be fulfilled) could be passed to by a separate callback or event. That is assuming it is a necessary feature -- I'm not clear why this is needed over "money" events.

@@ -59,7 +59,8 @@ export class ServerConnectionPool {
const conn = await Connection.build({
...this.connectionOpts,
sharedSecret,
connectionTag
connectionTag,
connectionId: id

This comment has been minimized.

Copy link
@wilsonianb

wilsonianb Mar 24, 2020

If this is for logging, what do you think about only using the token minus connection tag (and receipt nonce & secret) as the connectionId?

This comment has been minimized.

Copy link
@kincaidoneil

kincaidoneil Mar 26, 2020

Author Member

Changed this to only use the first 8 characters of the connection ID so the logs don't get muddled

@@ -87,9 +102,27 @@ export class Server extends EventEmitter {
* End all connections and disconnect the plugin
*/
async close (): Promise<void> {
// Stop handling new requests, and return T99 while the connection is closing.
// If an F02 unreachable was returned on new packets: clients would immediately destroy the connection
// If an F99 was returned on on new packets: clients would retry with no backoff

This comment has been minimized.

Copy link
@wilsonianb

wilsonianb Mar 24, 2020

Suggested change
// If an F99 was returned on on new packets: clients would retry with no backoff
// If an F99 was returned on new packets: clients would retry with no backoff

This comment has been minimized.

Copy link
@wilsonianb

wilsonianb Mar 24, 2020

T99 makes sense to me, but I'm unclear why clients would retry with F99.
ILP spec says:

Final errors indicate that the payment is invalid and should not be retried unless the details are changed.

This comment has been minimized.

Copy link
@kincaidoneil

kincaidoneil Mar 25, 2020

Author Member

All STREAM sender implementations retry on F99 and F08 errors, but no other final errors. F99s might be returned e.g. if the packet didn't meet the exchange rate or it was unfulfillable

@codecov-io

This comment has been minimized.

Copy link

codecov-io commented Mar 26, 2020

Codecov Report

Merging #136 into master will decrease coverage by 2.80%.
The diff coverage is 83.33%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #136      +/-   ##
==========================================
- Coverage   91.26%   88.46%   -2.81%     
==========================================
  Files          14       14              
  Lines        1500     1725     +225     
  Branches      108      269     +161     
==========================================
+ Hits         1369     1526     +157     
- Misses        112      117       +5     
- Partials       19       82      +63     
Impacted Files Coverage Δ
src/index.ts 80.55% <ø> (-4.74%) ⬇️
src/pool.ts 100.00% <ø> (ø)
src/connection.ts 87.09% <66.66%> (-2.91%) ⬇️
src/server.ts 85.71% <100.00%> (-1.79%) ⬇️
src/util/rational.ts 75.00% <0.00%> (-6.64%) ⬇️
src/stream.ts 87.45% <0.00%> (-4.41%) ⬇️
src/util/congestion.ts 80.95% <0.00%> (-4.05%) ⬇️
... and 6 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update d2c6b14...7cfa8be. Read the comment docs.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Linked issues

Successfully merging this pull request may close these issues.

None yet

5 participants
You can’t perform that action at this time.