[BUG] Unable to pair M5 board / nordic using chip tool #28139

Closed
phonnakasturi-apple opened this issue Jul 20, 2023 · 2 comments · Fixed by #28143
Labels
bug Something isn't working darwin needs triage

Comments

@phonnakasturi-apple

Reproduction steps

Unable to pair the M5 board or the Nordic board with the chip-tool app at SHA 02aec55.

Running into the following error:

[1689887332195] [88055:18418667] [BLE] CancelBleIncompleteConnection() failed, err = src/ble/BleLayer.cpp:372: CHIP Error 0x00000003: Incorrect state
[1689887332195] [88055:18418667] [DL] System Layer shutdown
[1689887332195] [88055:18418667] [TOO] Run command failure: src/protocols/secure_channel/PASESession.cpp:255: CHIP Error 0x00000032: Timeout
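
For triaging logs like the ones above, the `CHIP Error` lines can be pulled out mechanically. The following is a minimal sketch, assuming every error line follows the same `[timestamp] [pid:tid] [module] ... file:line: CHIP Error 0x...: text` layout as the three lines quoted here. The function name `parse_chip_errors` and the field names are hypothetical and are not part of the SDK or chip-tool.

```python
import re

# Assumed line layout, taken from the log excerpt in this issue:
# [ts] [pid:tid] [MODULE] <context> <file>:<line>: CHIP Error 0xNNNNNNNN: <text>
CHIP_ERROR_RE = re.compile(
    r"\[(?P<timestamp>\d+)\]\s+"
    r"\[(?P<pid>\d+):(?P<tid>\d+)\]\s+"
    r"\[(?P<module>\w+)\]\s+"
    r"(?P<context>.*?)"
    r"(?P<file>[\w./-]+):(?P<line>\d+): "
    r"CHIP Error (?P<code>0x[0-9A-Fa-f]{8}): (?P<text>.*)$"
)


def parse_chip_errors(log_text):
    """Return one dict per log line that carries a 'CHIP Error' code."""
    errors = []
    for raw in log_text.splitlines():
        match = CHIP_ERROR_RE.search(raw.strip())
        if match is None:
            continue  # lines without an error code (e.g. "System Layer shutdown")
        errors.append({
            "module": match.group("module"),
            "context": match.group("context").strip(),
            "file": match.group("file"),
            "line": int(match.group("line")),
            "code": int(match.group("code"), 16),
            "text": match.group("text").strip(),
        })
    return errors


# The three lines quoted in the report.
SAMPLE_LOG = """\
[1689887332195] [88055:18418667] [BLE] CancelBleIncompleteConnection() failed, err = src/ble/BleLayer.cpp:372: CHIP Error 0x00000003: Incorrect state
[1689887332195] [88055:18418667] [DL] System Layer shutdown
[1689887332195] [88055:18418667] [TOO] Run command failure: src/protocols/secure_channel/PASESession.cpp:255: CHIP Error 0x00000032: Timeout
"""

if __name__ == "__main__":
    for err in parse_chip_errors(SAMPLE_LOG):
        print(f"{err['module']}: {err['file']}:{err['line']} -> {err['text']}")
```

Run against the excerpt, this keeps the `[BLE]` and `[TOO]` lines and skips the `[DL]` shutdown line, which makes it easier to see that the PASE timeout is the command failure and the BLE "Incorrect state" error is reported at the same timestamp.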

Steps to Reproduce:

  1. cd ~/connectedhomeip
  2. git reset --hard 02aec55
  3. git clean -Xdf
  4. git submodule update --init
  5. export PKG_CONFIG_PATH="/opt/homebrew/opt/openssl@3/lib/pkgconfig"
  6. source scripts/bootstrap.sh
  7. source scripts/activate.sh
  8. export ZAP_INSTALL_PATH=~/Downloads/zap-mac/
  9. gn gen out/debug --args='chip_mdns="platform"'
  10. ninja -C out/debug
  11. Put either the Nordic board (Button1) or the M5 board in pairing mode
  12. ./out/debug/chip-tool pairing ble-thread 1 hex:0E08000063D80BB10000000300001935060004001FFFC0020814A93EB3E9AD4C340708FD16B6B48DCB407305101F1F13546EB4E68625B43BCB5C720419030F4D79486F6D653137303932373930380102ACF50410067BCAF2D4450681A7CCC3062855C9530C0402A0F7F8 20202021 3840 --paa-trust-store-path ./credentials/development/paa-root-certs

Bug prevalence

always

GitHub hash of the SDK that was being used

02aec55

Platform

darwin

Platform Version(s)

No response

Anything else?

Attaching the logs from nordic + chip tool + M5 board:

M5logs.txt
nordicboard_logs.txt
chip-tool_logs.txt

Note:

  1. The same failure was previously observed with SHA c7d9a11
  2. The last SHA at which chip-tool pairing worked was 5ff1818
  3. According to git bisect, the issue appears to have been introduced by SHA eb8131e, "Stop generating BridgeGlobalStructs.h for dynamic-bridge-app" (#27020)
@phonnakasturi-apple phonnakasturi-apple added bug Something isn't working needs triage labels Jul 20, 2023
@bzbarsky-apple
Contributor

Git bisect says:

ad5403e8e537612c2d2f903895642445d45d1e08 is the first bad commit
commit ad5403e8e537612c2d2f903895642445d45d1e08
Author: alwaysonketo <43084048+cuizelin99@users.noreply.github.com>
Date:   Mon Jun 19 09:59:05 2023 -0700

    Cancel incomplete BLE connection when CloseAllBleConnections() is called (#27304)
    
    When CloseAllBleConnections() is called, any ongoing attempt to establish new BLE connection should be cancelled because it is no longer needed for a new BLE connection to establish. CancelConnection() of the connection delegate should be called to cancel any ongoing new BLE connection so that platform- specific BLEManager can do cleanup. This is needed because we are encountering a problem that BLEManager not doing clean up when establishing PASE session times out, causing problem in the BT layer. Since BLEManager is platform-specific, it doesn't have knowledge of when PASE times out. But BLEManager needs to do clean up whenever CloseAllBleConnections() is called.

so this is caused by #27304.

I am observing two things that fail:

  1. On m5stack, there are threading assertion failures:
0x401381b3: chip::Platform::Internal::AssertChipStackLockedByCurrentThread(char const*, int) at connectedhomeip/examples/all-clusters-app/esp32/build/esp-idf/chip/../../../../../../config/esp32/third_party/connectedhomeip/src/lib/support/CodeUtils.h:508
 (inlined by) chipDie at connectedhomeip/examples/all-clusters-app/esp32/build/esp-idf/chip/../../../../../../config/esp32/third_party/connectedhomeip/src/lib/support/CodeUtils.h:518
 (inlined by) chip::Platform::Internal::AssertChipStackLockedByCurrentThread(char const*, int) at connectedhomeip/examples/all-clusters-app/esp32/build/esp-idf/chip/../../../../../../config/esp32/third_party/connectedhomeip/src/platform/LockTracker.cpp:36

0x401391b1: chip::System::LayerImplFreeRTOS::CancelTimer(void (*)(chip::System::Layer*, void*), void*) at connectedhomeip/examples/all-clusters-app/esp32/build/esp-idf/chip/../../../../../../config/esp32/third_party/connectedhomeip/src/system/SystemLayerImplFreeRTOS.cpp:81

0x4015382d: chip::Ble::BLEEndPoint::StopReceiveConnectionTimer() at connectedhomeip/examples/all-clusters-app/esp32/build/esp-idf/chip/../../../../../../config/esp32/third_party/connectedhomeip/src/ble/BLEEndPoint.cpp:1487

0x40153ad4: chip::Ble::BLEEndPoint::DoClose(unsigned char, chip::ChipError) at connectedhomeip/examples/all-clusters-app/esp32/build/esp-idf/chip/../../../../../../config/esp32/third_party/connectedhomeip/src/ble/BLEEndPoint.cpp:331

0x40130cd1: chip::Ble::BleLayer::HandleConnectionError(unsigned short, chip::ChipError) at connectedhomeip/examples/all-clusters-app/esp32/build/esp-idf/chip/../../../../../../config/esp32/third_party/connectedhomeip/src/ble/BleLayer.cpp:745

0x401402f3: chip::DeviceLayer::Internal::BLEManagerImpl::HandleGAPDisconnect(ble_gap_event*) at connectedhomeip/examples/all-clusters-app/esp32/build/esp-idf/chip/../../../../../../config/esp32/third_party/connectedhomeip/src/platform/ESP32/nimble/BLEManagerImpl.cpp:1287
  2. On Mac, there is a CancelBleIncompleteConnection call that happens during PASE setup, like so:
  * frame #0: 0x0000000101bf916c chip-tool`chip::Ble::BleLayer::CancelBleIncompleteConnection(this=0x0000000102beb5a0) at BleLayer.cpp:372:5
    frame #1: 0x0000000101bf8cd4 chip-tool`chip::Ble::BleLayer::CloseAllBleConnections(this=0x0000000102beb5a0) at BleLayer.cpp:312:22
    frame #2: 0x0000000102260b80 chip-tool`chip::Controller::DeviceCommissioner::ReleaseCommissioneeDevice(this=0x0000000108d11900, device=0x0000000107915e80) at CHIPDeviceController.cpp:560:35
    frame #3: 0x0000000102265d64 chip-tool`chip::Controller::DeviceCommissioner::OnDiscoveredDeviceOverBleSuccess(appState=0x0000000108d11900, connObj=0x000000010bf953c0) at CHIPDeviceController.cpp:795:15

which cancels the connection the PASE session is on.

On Linux, BLEManagerImpl::CancelConnection (called by CancelBleIncompleteConnection) is a no-op, so things "work" there, kind of by accident.
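
The platform difference described above can be illustrated with a deliberately simplified toy model. This is a sketch under stated assumptions, not the SDK's code: every class and function name below (`ToyBleLayer`, `NoOpDelegate`, `CancellingDelegate`, `commission`) is hypothetical, and the only behavior modeled is "closing all BLE connections also asks the platform delegate to cancel the in-progress connection".

```python
class NoOpDelegate:
    """Hypothetical platform whose cancel hook does nothing (as described for Linux)."""

    def cancel_connection(self, layer):
        pass  # nothing is torn down, so the link survives


class CancellingDelegate:
    """Hypothetical platform whose cancel hook really drops the link (as observed on Mac)."""

    def cancel_connection(self, layer):
        layer.link_up = False


class ToyBleLayer:
    """Toy stand-in for the BLE layer; not the real chip::Ble::BleLayer."""

    def __init__(self, delegate):
        self.delegate = delegate
        self.link_up = False

    def connect(self):
        self.link_up = True

    def close_all_ble_connections(self):
        # Models the behavior added by #27304: closing all connections
        # also cancels whatever connection is still being set up.
        self.delegate.cancel_connection(self)


def commission(layer):
    """Model the call order in the Mac stack trace: connect, then release the old device."""
    layer.connect()                    # BLE discovery succeeded
    layer.close_all_ble_connections()  # triggered while PASE setup is starting
    return "PASE established" if layer.link_up else "PASE timeout"


if __name__ == "__main__":
    print("no-op cancel:", commission(ToyBleLayer(NoOpDelegate())))
    print("real cancel: ", commission(ToyBleLayer(CancellingDelegate())))
```

In this model the same call sequence succeeds or times out depending only on whether the delegate's cancel hook does anything, which is the sense in which Linux "works" by accident.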

Given this, and the lack of response on #27607, I think we should revert #27304.

@bzbarsky-apple
Contributor

Filed #28142 on the esp32 issue.

bzbarsky-apple added a commit to bzbarsky-apple/connectedhomeip that referenced this issue Jul 21, 2023
…) is called (project-chip#27304)"

This reverts commit ad5403e.

There are two issues:

* project-chip#27607 which has had no
  response for weeks.
* project-chip#28139 which breaks
  commissioning on Mac, and would break it on Linux if it implemented BLE
  connection cancellation.

We need to sort out why CHIPDeviceController is canceling BLE connections when
starting PASE over BLE (!).
bzbarsky-apple added a commit that referenced this issue Jul 24, 2023
…) is called (#27304)" (#28143)

This reverts commit ad5403e.

There are two issues:

* #27607 which has had no
  response for weeks.
* #28139 which breaks
  commissioning on Mac, and would break it on Linux if it implemented BLE
  connection cancellation.

We need to sort out why CHIPDeviceController is canceling BLE connections when
starting PASE over BLE (!).