Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

grpc-js: Unref backoff timer in subchannel #2586

Conversation

murgatroid99
Copy link
Member

This was making the tests taking a long time to exit after finishing. Our general principle is that grpc should keep the process open if and only if there is a pending request or other surface API event (e.g. channel state change). If a call is not assigned to a connection, the channel has a timer that keeps the process open, and if the call is assigned to a connection, the corresponding transport keeps the connection open. Either way, the subchannel is never responsible for that, so it should unref any timers or other handles that it owns.

@murgatroid99 murgatroid99 merged commit 7e53472 into grpc:@grpc/grpc-js@1.9.x Sep 26, 2023
4 of 5 checks passed
@silentroach
Copy link

silentroach commented Dec 1, 2023

We found that in 1.9.4 with this unref if you write a script that makes a gRPC call and displays the output, the script might terminate (successfully) before the actual execution of the request, because it no longer waits for this timer.

❯ GRPC_VERBOSITY=debug GRPC_TRACE=all ts-node -T examples/suggest.ts
D 2023-12-01T10:54:56.473Z | index | Loading @grpc/grpc-js version 1.9.4
D 2023-12-01T10:54:56.736Z | resolving_load_balancer | dns:somedomain:443 IDLE -> IDLE
D 2023-12-01T10:54:56.736Z | connectivity_state | (1) dns:somedomain:443 IDLE -> IDLE
D 2023-12-01T10:54:56.736Z | dns_resolver | Resolver constructed for target dns:somedomain:443
D 2023-12-01T10:54:56.738Z | channel | (1) dns:somedomain:443 Channel constructed with options {
  "grpc.keepalive_time_ms": 30000,
  "grpc.keepalive_permit_without_calls": 1
}
D 2023-12-01T10:54:56.745Z | channel_stacktrace | (1) Channel constructed
    some stack trace
D 2023-12-01T10:54:56.746Z | channel | (1) dns:somedomain:443 createResolvingCall [0] method="/some.v1.Service/Suggestions", deadline=Infinity
D 2023-12-01T10:54:56.746Z | resolving_call | [0] Created
D 2023-12-01T10:54:56.746Z | resolving_call | [0] Deadline: Infinity
D 2023-12-01T10:54:56.747Z | resolving_call | [0] start called
D 2023-12-01T10:54:56.747Z | dns_resolver | Looking up DNS hostname somedomain
D 2023-12-01T10:54:56.749Z | resolving_load_balancer | dns:somedomain:443 IDLE -> CONNECTING
D 2023-12-01T10:54:56.749Z | connectivity_state | (1) dns:somedomain:443 IDLE -> CONNECTING
D 2023-12-01T10:54:56.749Z | channel | (1) dns:somedomain:443 callRefTimer.ref | configSelectionQueue.length=1 pickQueue.length=0
D 2023-12-01T10:54:56.749Z | resolving_call | [0] startRead called
D 2023-12-01T10:54:56.750Z | resolving_call | [0] write() called with message of length 62
D 2023-12-01T10:54:56.750Z | resolving_call | [0] halfClose called
D 2023-12-01T10:54:56.752Z | dns_resolver | Resolved addresses for target dns:somedomain:443: [some.ip.address:443]
D 2023-12-01T10:54:56.752Z | subchannel | (2) some.ip.address:443 Subchannel constructed with options {
  "grpc.keepalive_time_ms": 30000,
  "grpc.keepalive_permit_without_calls": 1
}
D 2023-12-01T10:54:56.752Z | subchannel_refcount | (2) some.ip.address:443 refcount 0 -> 1
D 2023-12-01T10:54:56.752Z | subchannel_refcount | (2) some.ip.address:443 refcount 1 -> 2
D 2023-12-01T10:54:56.752Z | pick_first | Start connecting to subchannel with address some.ip.address:443
D 2023-12-01T10:54:56.753Z | pick_first | IDLE -> CONNECTING
D 2023-12-01T10:54:56.753Z | resolving_load_balancer | dns:somedomain:443 CONNECTING -> CONNECTING
D 2023-12-01T10:54:56.753Z | channel | (1) dns:somedomain:443 callRefTimer.unref | configSelectionQueue.length=1 pickQueue.length=0
D 2023-12-01T10:54:56.753Z | connectivity_state | (1) dns:somedomain:443 CONNECTING -> CONNECTING
D 2023-12-01T10:54:56.753Z | subchannel | (2) some.ip.address:443 IDLE -> CONNECTING
D 2023-12-01T10:54:56.753Z | transport | dns:somedomain:443 creating HTTP/2 session to some.ip.address:443
D 2023-12-01T10:54:56.755Z | channel | (1) dns:somedomain:443 createRetryingCall [1] method="/some.v1.Service/Suggestions"
D 2023-12-01T10:54:56.755Z | resolving_call | [0] Created child [1]
D 2023-12-01T10:54:56.755Z | retrying_call | [1] start called
D 2023-12-01T10:54:56.755Z | channel | (1) dns:somedomain:443 createLoadBalancingCall [2] method="/some.v1.Service/Suggestions"
D 2023-12-01T10:54:56.755Z | retrying_call | [1] Created child call [2] for attempt 1
D 2023-12-01T10:54:56.755Z | load_balancing_call | [2] start called
D 2023-12-01T10:54:56.755Z | load_balancing_call | [2] Pick called
D 2023-12-01T10:54:56.755Z | load_balancing_call | [2] Pick result: QUEUE subchannel: null status: undefined undefined
D 2023-12-01T10:54:56.755Z | channel | (1) dns:somedomain:443 callRefTimer.ref | configSelectionQueue.length=0 pickQueue.length=1
D 2023-12-01T10:54:56.755Z | retrying_call | [1] startRead called
D 2023-12-01T10:54:56.755Z | load_balancing_call | [2] startRead called
D 2023-12-01T10:54:56.755Z | retrying_call | [1] write() called with message of length 67
D 2023-12-01T10:54:56.756Z | load_balancing_call | [2] write() called with message of length 67
D 2023-12-01T10:54:56.756Z | retrying_call | [1] halfClose called
D 2023-12-01T10:54:56.827Z | subchannel_refcount | (2) some.ip.address:443 refcount 2 -> 3
D 2023-12-01T10:54:56.827Z | subchannel_refcount | (2) some.ip.address:443 refcount 3 -> 2
D 2023-12-01T10:54:56.827Z | pick_first | CONNECTING -> CONNECTING
D 2023-12-01T10:54:56.827Z | resolving_load_balancer | dns:somedomain:443 CONNECTING -> CONNECTING
D 2023-12-01T10:54:56.827Z | channel | (1) dns:somedomain:443 callRefTimer.unref | configSelectionQueue.length=0 pickQueue.length=0
D 2023-12-01T10:54:56.827Z | load_balancing_call | [2] Pick called
D 2023-12-01T10:54:56.827Z | load_balancing_call | [2] Pick result: QUEUE subchannel: null status: undefined undefined
D 2023-12-01T10:54:56.827Z | channel | (1) dns:somedomain:443 callRefTimer.ref | configSelectionQueue.length=0 pickQueue.length=1
D 2023-12-01T10:54:56.827Z | connectivity_state | (1) dns:somedomain:443 CONNECTING -> CONNECTING
D 2023-12-01T10:54:56.827Z | channel | (1) dns:somedomain:443 callRefTimer.unref | configSelectionQueue.length=0 pickQueue.length=1

---
here process is finished

but if we start some timer manually before calling some method with something like

setTimeout(() => {
  console.log('timed out');
}, 5000);

then everything is ok because our process will wait for our timer

@murgatroid99
Copy link
Member Author

@silentroach Please test this with the latest version (1.9.12) and then if the problem persists, file an issue.

@silentroach
Copy link

Yep, the same problem in 1.9.12, will file an issue, thank you

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants