grpc servers hanging with a "connection attempt timed out before receiving SETTINGS frame" error #36256
Comments
We also ran py-spy dump and py-spy record on the process while it was in this bad state. All the worker threads were idle (all in the middle of doing networking things like posting to a requests session, which is not a surprise since our work is pretty I/O bound), and when we ran a
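For reference, the kind of py-spy invocations described above look roughly like this; the PID is a placeholder and the flags are just the common ones, not necessarily the exact commands that were run:

    # print a one-off stack dump of every thread in the stuck process
    py-spy dump --pid <PID>
    # sample the process for 60 seconds and write a flame graph
    py-spy record --pid <PID> --duration 60 -o profile.svg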
One stuck thread was also in the middle of opening its own channel to another grpc service (that's expected in the code that we're running in the server API), although I have no particular reason to believe that's significant or caused the issue. The rest were all doing requests connection work.
Actually a bit of overlap with #36098 here potentially, depending on what they meant by “freeze”, since one of the threads that we dumped was trying to create a grpc client. Could also be a coincidence.
Just coming out of debugging a stalling Apache Beam pipeline running on Google Cloud Dataflow and nailing it down to a dependency bump from grpcio 1.60.0 to 1.62.1. Profiling revealed all the walltime was spent past this wait https://github.com/grpc/grpc/blob/v1.62.1/src/python/grpcio/grpc/_channel.py#L959 ending up with what looks like deadlocking on a Python threading lock. So something might have happened between these two versions.
@DerRidda thanks for corroborating. We have some early evidence that rolling back just to 1.62.0 may be sufficient to resolve the problem, but not conclusive yet.
Scratch that - we are still reproducing on 1.62.0 - trying 1.60.0 next. The first report we received of this issue hanging processes was also on February 9th, about a week after 1.60.1 was released. |
Also not actually fixed on my end after all. Might be a combination of dependencies evolving around grpc. Will try more next week. |
@gibsondan Judging from the Apache Beam SDK, it might actually be grpcio 1.59.3 that I have running as the known-unaffected version. See: apache/beam#30867 (comment) I can't test this now, but I strongly recommend starting there; it might be that all of 1.6x is affected by whatever this issue is.
The user I've been working with hasn't seen the issue happen since downgrading to 1.60.0, and it was happening pretty reliably before. Still monitoring to see if it pops up again.
Hi all, based on the error message, it appears there might be an issue with either the transport or internet connection. To determine if this is a regression, we'll need more logs. Please enable those env vars to collect logs from gRPC core:
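The specific variables requested didn't survive in this copy of the thread; the usual pair for collecting gRPC core logs looks like this (the trace targets listed here are only an example and can be adjusted):

    GRPC_VERBOSITY=debug
    GRPC_TRACE=http,tcp,connectivity_state,call_error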
I'm not in a position to run that since I don't have a local reproduction myself, but I will recommend it the next time somebody reports this. Maybe you are able to do that, @DerRidda?
So yeah, I managed to beat my application into shape with a mixture of dropping the protobuf and grpcio versions to known good ones, actually older versions than that. I am pretty sure 1.60.0 is also fine, as is 1.59.3, but that alone wasn't it for me. I used a big hammer and fixed probably more dependencies into place than needed:
    grpcio = "1.59.3"
    protobuf = "4.25.1"
    googleapis-common-protos = "1.61.0"
    google-cloud-core = "2.3.3"
    google-api-core = "2.14.0"
    google-api-python-client = "2.109.0"
The top two might be most interesting for a generic use case outside of Apache Beam on Dataflow or the GCP ecosystem in general, @gibsondan. Could be related to #36247?
Summary: While we don't have a conclusive answer to the sporadic reports of grpc server hangs, evidence is mounting to support a pin:
- At least one user who was reliably hitting the hang in grpc/grpc#36256 reported it going away after downgrading to 1.60.0 (I went one version lower here, to 1.59.3, because I didn't want to pin in the middle of a minor version)
- The comments on that issue from other users also report earlier versions helping (although they also reported needing to downgrade some other dependencies too - would want more data points to support that before we add more pins)
Test Plan: BK
I have a fairly reliable repro, and some details in apache/beam#30867 (comment)
@tvalentyn Thanks for the excellent debugging!
I can probably give you access to a VM with a stuck process.
so far I was not able to repro in a simpler setup |
particularly:
looks like we might need to have debug symbols for cygrpc.cpython-38-x86_64-linux-gnu.so to get more info |
also, i think the process i looked at got "unstuck" after 90 min or so. |
Will try to rebuild grpc dependency with debug symbols as follows
and supply it in a custom container for the dataflow job.
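The exact build steps weren't preserved above; a plausible sketch of forcing a from-source grpcio build with debug symbols (assuming a pip-based image build; the version and flags here are illustrative) would be:

    # force a source build of grpcio and keep debug info in the extension
    GRPC_PYTHON_BUILD_WITH_CYTHON=1 CFLAGS="-g -O0" \
        pip install --no-binary grpcio grpcio==1.62.1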
I cannot reproduce the error when I use the build of gRPC from sources. I tried setting
Could we make resolution of this issue a blocker for release v1.63.0?
The C++ team needs this release to unblock GCS; is there any particular reason you want to block the release? From the context it looks like it's not a regression introduced in v1.63.0 and can be temporarily resolved by pinning to a lower version.
Mostly to not have to add an upper bound on grpcio, since that can sometimes make dependency resolution complicated down the road. OK, we'll add an upper bound for now then.
I would love to know what version would work as an upper bound if we have any leads on that front.
I can repro apache/beam#30927 on grpcio==1.60.0 and cannot repro on 1.59.x, so I'll set the bound to "grpcio<1.60.0"
Scratch that, I can still repro on
As @DerRidda mentions in #36256 (comment), this might involve multiple dependencies, or some other dependency may be at play in this regression.
Ok, the issue i have been investigating so far appears to match: googleapis/python-bigtable#949, which will be fixed in the upcoming release of grpcio. Any of the following mitigations help:
I've also been getting the same error in my clients with
However, not all clients receive the error - only around 30% of them. I've also been using gRPC 1.62.1 in Python 3.11. I've tried changing the version of grpcio for the server and client; I also get this error in versions 1.56.0, 1.58.0, 1.63.0rc1 and 1.63.0rc2. UPDATE: I wasn't properly closing the gRPC server, and that was leaving a process using the same port, causing the error. Make sure you are properly closing/killing all processes related to gRPC in Python...
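Since the root cause in that case was an orphaned server process holding the port, a minimal sketch of a clean start/stop cycle for a gRPC Python server might look like this (the port and grace period are arbitrary examples, and servicer registration is omitted):

    from concurrent import futures
    import grpc

    server = grpc.server(futures.ThreadPoolExecutor(max_workers=10))
    server.add_insecure_port("[::]:50051")  # example port
    # servicers would be registered here before start()
    server.start()
    try:
        server.wait_for_termination()
    except KeyboardInterrupt:
        pass
    finally:
        # allow in-flight RPCs up to 5 seconds to finish, then release the port
        server.stop(grace=5).wait()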
We have backported the fix to v1.62.2; updating to this version or above should fix the issue. Please open a new issue if you still have issues using v1.62.2 or later versions.
Thank you for the fix!
What version of gRPC and what language are you using?
1.62.1, python
What operating system (Linux, Windows,...) and version?
Linux
What runtime / compiler are you using (e.g. python version or version of gcc)
python 3.11
What did you do?
We run a project that is built on grpc and involves running a grpc server.
In the last few months, several different users have reported an issue that always has the same commonalities: the grpc server process is still running and its threads still appear to be ready to serve requests when inspected via py-spy, but calls to the server fail with the error "connection attempt timed out before receiving SETTINGS frame". This is different than the error message I'm used to seeing when a gRPC server is totally inaccessible or is down - I'm more accustomed to an error message like "Failed to connect to remote host: Connection refused".
Unfortunately I do not have a simple or reliable repro for this, but I'm wondering if you all have any recommendations for additional debugging flags we could add or more information that would be helpful to get to the bottom of what might be going on here - or if this error message clearly indicates that we are hitting some timeout with a value that we could tune. Thanks in advance for any guidance you can provide.
What did you expect to see?
A running grpc server
What did you see instead?
A "hanging" grpc server that returns "connection attempt timed out before receiving SETTINGS frame"
Anything else we should know about your project / environment?
If it's helpful context, the way we initialize our grpc server can be found here - we pass a ThreadPoolExecutor into a new grpc server object: https://github.com/dagster-io/dagster/blob/master/python_modules/dagster/dagster/_grpc/server.py#L1184-L1192
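For readers who don't want to follow the link, the pattern described is the standard one of handing an explicit ThreadPoolExecutor to grpc.server(); the sketch below is generic and not the dagster code itself (the worker count and server options are made-up examples):

    from concurrent import futures
    import grpc

    # the executor bounds how many RPCs the server handles concurrently
    executor = futures.ThreadPoolExecutor(max_workers=16)
    server = grpc.server(
        executor,
        options=[("grpc.max_receive_message_length", 50 * 1024 * 1024)],
    )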