[BUG] CHIP Error 0x000000C1: Endpoint pool full on system with many IP links #27007

Open
agners opened this issue Jun 1, 2023 · 10 comments
Comments

@agners
Contributor

agners commented Jun 1, 2023

Reproduction steps

Running the Matter SDK via the Python library (built from the v1.1.0.1 tag) on a Linux system with many network interface links (e.g. the veth* links that Docker creates) can cause the following error at startup:

2023-06-01 14:43:40 core-matter-server chip.ZCL[127] INFO Using ZAP configuration...
Traceback (most recent call last):
  File "/usr/local/bin/matter-server", line 8, in <module>
    sys.exit(main())
             ^^^^^^
  File "/usr/local/lib/python3.11/site-packages/matter_server/server/__main__.py", line 79, in main
    server = MatterServer(
             ^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/matter_server/server/server.py", line 74, in __init__
    self.stack = MatterStack(self)
                 ^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/matter_server/server/stack.py", line 32, in __init__
    self._chip_stack = ChipStack(
                       ^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/chip/ChipStack.py", line 64, in wrapper
    instance[0] = cls(*args, **kwargs)
                  ^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/chip/ChipStack.py", line 270, in __init__
    res.raise_on_error()
  File "/usr/local/lib/python3.11/site-packages/chip/native/__init__.py", line 67, in raise_on_error
    raise self.to_exception()
chip.exceptions.ChipStackError: src/system/SystemLayerImplSelect.cpp:268: CHIP Error 0x000000C1: Endpoint pool full

The problem seems to appear at around 30 links. The build configuration uses chip_mdns=minimal and sets no special chip_minmdns_default_policy, so I assume the default policy applies.

chip-tool seems to crash similarly:

[1685624264.414533][570:570] CHIP:ZCL: Using ZAP configuration...

Thread 1 "chip-tool" received signal SIGSEGV, Segmentation fault.
0x000055555608feab in chip::System::LayerImplSelect::StopWatchingSocket (this=0x5555561f6720 <chip::DeviceLayer::SystemLayerImpl()::gSystemLayerImpl>, tokenInOut=0x555556f00740) at ../../examples/chip-tool/third_party/connectedhomeip/src/system/SystemLayerImplSelect.cpp:397
397     ../../examples/chip-tool/third_party/connectedhomeip/src/system/SystemLayerImplSelect.cpp: No such file or directory.

Build configuration is pretty much default:

scripts/build/build_examples.py --target ${EXAMPLE_PREFIX}chip-tool \
                                --pregen-dir ./zzz_pregenerated \
                                build

Bug prevalence

Always, once there are enough links

GitHub hash of the SDK that was being used

v1.1.0.1

Platform

python

Platform Version(s)

No response

Anything else?

No response

@agners
Contributor Author

agners commented Jun 1, 2023

I am assuming that increasing the size of mSocketWatchPool would help to support more links. However, I wonder whether there is a better strategy, e.g. ignoring certain types of links?

40: veth3df6f44@if39: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master hassio state UP mode DEFAULT group default 
    link/ether 32:33:ca:f4:28:a6 brd ff:ff:ff:ff:ff:ff link-netnsid 13

Also, on the same platform, the Matter SDK built from v1.0.0.1 does not seem to suffer from this problem (or presumably it takes more links before it becomes a problem).
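
To check how many links a system has and how many of them are veth-like, something along these lines can help. This is only a sketch: the helper name summarize_links and the choice of "veth" as the prefix to ignore are my assumptions for illustration and are not part of the SDK. It uses Python's standard socket.if_nameindex() to list interface names.

```python
import socket


def summarize_links(names, ignored_prefixes=("veth",)):
    """Split interface names into (kept, ignored) by name prefix.

    The prefix list is a hypothetical filter, not an SDK policy.
    """
    kept = [n for n in names if not n.startswith(ignored_prefixes)]
    ignored = [n for n in names if n.startswith(ignored_prefixes)]
    return kept, ignored


if __name__ == "__main__":
    # socket.if_nameindex() returns (index, name) pairs for each interface.
    names = [name for _, name in socket.if_nameindex()]
    kept, ignored = summarize_links(names)
    print(f"{len(names)} links total, {len(ignored)} veth-like, {len(kept)} remaining")
```

If the total is around 30 or more and most of the links are veth-like, the system is probably in the range where the error above shows up.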

@bzbarsky-apple
Contributor

@agners When building, what do you set INET_CONFIG_NUM_UDP_ENDPOINTS to? Or do you just use the default value?

@bzbarsky-apple
Contributor

And perhaps we should just convert mSocketWatchPool from being an array to actually being a pool, which would make it effectively infinite-sized on Linux.
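
As a toy model of that difference (this is not the SDK code; FixedSlots, GrowablePool and PoolFullError are hypothetical names used only for illustration), a fixed-size array fails once every slot is taken, while a pool that allocates on demand keeps going:

```python
class PoolFullError(Exception):
    """Stands in for the 'Endpoint pool full' error in this toy model."""


class FixedSlots:
    """Fixed-capacity array of watch slots, sized once at construction."""

    def __init__(self, capacity):
        self._slots = [None] * capacity

    def allocate(self, fd):
        # Take the first free slot; fail when none is left.
        for index, slot in enumerate(self._slots):
            if slot is None:
                self._slots[index] = fd
                return index
        raise PoolFullError("no free slot")

    def release(self, index):
        self._slots[index] = None


class GrowablePool:
    """Pool that allocates entries on demand, limited only by memory."""

    def __init__(self):
        self._watches = {}
        self._next_token = 0

    def allocate(self, fd):
        token = self._next_token
        self._next_token += 1
        self._watches[token] = fd
        return token

    def release(self, token):
        del self._watches[token]

    def __len__(self):
        return len(self._watches)
```

With FixedSlots(2), the third allocate() call raises PoolFullError, whereas GrowablePool accepts one watch per link no matter how many links exist.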

@agners
Contributor Author

agners commented Jun 1, 2023

> @agners When building, what do you set INET_CONFIG_NUM_UDP_ENDPOINTS to? Or do you just use the default value?

I don't set anything in particular, so the build system should use the default.

@bzbarsky-apple
Contributor

> I don't set anything in particular, so the build system should use the default.

Alright. Well, as a stopgap you could set a larger config value there...

@agners
Contributor Author

agners commented Jun 1, 2023

Yes, I had that in mind as well.

Currently I am testing with libnl, which seems to help as well, at least in my (somewhat synthetic) test environment. I am not sure, though, whether that helps in the cases our users have.

@luke-ingle

Hi, just chiming in because I've experienced a similar issue using chip-tool, but I don't seem to get any error messages about it.

I pulled from GitHub today and built according to the documentation. However, this also affects previous builds from anything pushed on Thursday last week.

I try running the pairing command and get a segfault just after the ZAP output.
chip_output_segfault.log

I've checked my network interfaces and realised (for various reasons) I had 50 different ones. I deleted the majority of them, and then chip-tool worked fine.

The annoying thing is that there was no indication that this was the problem. (Unless I've missed something obvious in the output?)

I've fixed my end, but just thought I'd let you know others have experienced a similar issue.

@agners
Contributor Author

agners commented Jul 4, 2023

> I try running the pairing command and get a segfault just after the ZAP output.
> chip_output_segfault.log

Hm, interesting, probably missing error handling. A gdb backtrace of that would be useful (start with gdb --args out/host/chip-tool ..., then use run, and after the segfault use the backtrace command).

@bzbarsky-apple
Contributor

@luke-ingle A separate issue with a stack showing where the crash happened would be extremely helpful.

@luke-ingle

@bzbarsky-apple

New issue raised
#27915


3 participants