Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

NimBLE - immediate disconnect when BLE key store fills and device changes address (IDFGH-3586) #5530

Closed
bfriedkin opened this issue Jul 2, 2020 · 38 comments
Labels
Status: Done Issue is done internally

Comments

@bfriedkin
Copy link

Environment

  • Development Kit: Moddable Two (ESP32)
  • Kit version: ESP32 r6
  • Module or chip used: ESP32-WROOM-32D
  • IDF version: v3.3.2
  • Build System: CMake?
  • Compiler version: xtensa-esp32-elf-gcc (crosstool-NG crosstool-ng-1.22.0-80-g6c4433a) 5.2.0
  • Operating System: macOS
  • Using an IDE?: No
  • Power Supply: USB

Problem Description

Using the bleprph NimBLE example app configured for secure connections, MITM, bonding and MAX_BONDS = 3, the device immediately disconnects with an iPhone running LightBlue after turning Bluetooth off and on the iPhone. The iPhone maintains the bond when toggling Bluetooth off and on, but the iPhone does change its BT address. This seems to be an iPhone BLE behavior. I believe that the failure might be related to the config store being full. When this happens, there are already three bond entries saved to NVS.

Expected Behavior

The device should detect that the peer address has changes, call the store status callback to make room, and proceed without disconnecting.

Actual Behavior

The ESP32 disconnects and the iPhone is unable to establish a connection until the iPhone deletes the bond.

Steps to reproduce

  1. Build and run the bleprph app with the options described above
  2. Run LightBlue on iPhone and browse into one of the services to read a characteristic
  3. The pairing dialog displays on iPhone. Accept the passkey.
  4. In the console for the device build, type "key Y"
  5. Back out of the connection in LightBlue so that the device starts advertising again
  6. On the iPhone, navigate the Settings -> Bluetooth and turn Bluetooth off and on again. Do not remove "nimble-belprph" from the MY DEVICES list
  7. Repeat step 2 above

Code to reproduce this issue

The bleprph sample app without any changes.

Debug Logs

When I launch I see that three bonds are already stored:

D (1305) NIMBLE_NVS: ble_store_config_cccds restored 3 bonds

When I run without turning Bluetooth off on the iPhone, the log shows a successful connection with bonding:

mtu update event; conn_handle=0 cid=4 mtu=256
subscribe event; conn_handle=0 attr_handle=8 reason=1 prevn=0 curn=0 previ=0 curi=1
I (369071) NimBLE_BLE_PRPH: PASSKEY_ACTION_EVENT started 

I (369081) NimBLE_BLE_PRPH: Passkey on device's display: 943244
I (369081) NimBLE_BLE_PRPH: Accept or reject the passkey through console in this format -> key Y or key N
I (374771) You entered: key Y
I (374771) NimBLE_BLE_PRPH: ble_sm_inject_io result: 0

encryption change event; status=0 handle=0 our_ota_addr_type=0 our_ota_addr=24:6f:28:19:d2:52 our_id_addr_type=0 our_id_addr=24:6f:28:19:d2:52 peer_ota_addr_type=1 peer_ota_addr=54:00:de:56:50:12 peer_id_addr_type=1 peer_id_addr=54:00:de:56:50:12 conn_itvl=24 conn_latency=0 supervision_timeout=72 encrypted=1 authenticated=1 bonded=1

subscribe event; conn_handle=0 attr_handle=8 reason=2 prevn=0 curn=0 previ=1 curi=0
disconnect; reason=531 handle=0 our_ota_addr_type=0 our_ota_addr=24:6f:28:19:d2:52 our_id_addr_type=0 our_id_addr=24:6f:28:19:d2:52 peer_ota_addr_type=1 peer_ota_addr=54:00:de:56:50:12 peer_id_addr_type=1 peer_id_addr=54:00:de:56:50:12 conn_itvl=24 conn_latency=0 supervision_timeout=72 encrypted=1 authenticated=1 bonded=1

After I turn Bluetooth off and on, a subsequent connection shows the failure. Note that the peer_id_addr has changed:

GAP procedure initiated: advertise; disc_mode=2 adv_channel_map=0 own_addr_type=0 adv_filter_policy=0 adv_itvl_min=0 adv_itvl_max=0
connection established; status=0 handle=0 our_ota_addr_type=0 our_ota_addr=24:6f:28:19:d2:52 our_id_addr_type=0 our_id_addr=24:6f:28:19:d2:52 peer_ota_addr_type=1 peer_ota_addr=41:24:f7:d6:bd:10 peer_id_addr_type=1 peer_id_addr=41:24:f7:d6:bd:10 conn_itvl=24 conn_latency=0 supervision_timeout=72 encrypted=0 authenticated=0 bonded=0

disconnect; reason=531 handle=0 our_ota_addr_type=0 our_ota_addr=24:6f:28:19:d2:52 our_id_addr_type=0 our_id_addr=24:6f:28:19:d2:52 peer_ota_addr_type=1 peer_ota_addr=41:24:f7:d6:bd:10 peer_id_addr_type=1 peer_id_addr=41:24:f7:d6:bd:10 conn_itvl=24 conn_latency=0 supervision_timeout=72 encrypted=0 authenticated=0 bonded=0

I've attached my sdkconfig file to this case.

At this point we are looking for a workaround. Ideally it would be helpful to know in advance at the app level when this failure might trigger so we can hedge against any issues by deleting all bonds or something similar.

Thank you!

Regards,
Brian

@github-actions github-actions bot changed the title NimBLE - immediate disconnect when BLE key store fills and device changes address NimBLE - immediate disconnect when BLE key store fills and device changes address (IDFGH-3586) Jul 2, 2020
@Alvin1Zhang
Copy link
Collaborator

Thanks for reporting, we will look into. Thanks.

@prasad-alatkar
Copy link
Contributor

Hi @bfriedkin , iPhone/Android devices use RPA addresses by default, which change periodically or after Bluetooth on-off. Now once the iPhone changes address, then it's treated as a new device and hence you are observing pairing event for each Bluetooth on-off cycle. However the RPA address can be resolved using IRKs (Identity Resolving keys). For that you may need to add, BLE_SM_PAIR_KEY_DIST_ID in ble_hs_cfg.sm_our_key_dist & ble_hs_cfg.sm_their_key_dist. I am attaching a reference patch based on bleprph example, please try out the patch and let me know if the issue gets resolved : bleprph_IRK_distribute.txt. If you still face the issue please provide complete debug log. You can enable debug log option by make menuconfig ---> Component config ---> Log output ---> Default log verbosity to Debug.

@bfriedkin
Copy link
Author

@prasad-alatkar -

Thank you for the quick reply and patch. Unfortunately the patch seems to make the issue trigger right away. Perhaps that provides a clue? Here are my latest steps to reproduce:

  • Erase ESP32 flash
  • Build bleprph app with patch
  • Reset iPhone network settings
  • Build and run the bleprph app
  • Launch LightBlue, connect to the nimble-bleprph app and read the random number generator characteristic

That triggers the pairing sequence which succeeds. But then on my iPhone:

  • In LightBlue back out of the service view and disconnect from the peripheral
  • In LightBlue and try to connect once again to the nimble-bleprph.

At this point LightBlue immediately displays a Disconnect alert. In the LightBlue log the error is "Peer removed pairing information". The attached irk-log.txt file contains the log.

I then reverted the patch and tried the same. Without the patch, I am able to back out of the service view, reconnect with the peripheral and read the characteristic. After the one-time bonding, each subsequent time I reconnect and read the characteristic there is no pairing dialog. This is what I would expect. Unfortunately if I then toggle Bluetooth off and on, I experience the same disconnect failure in LightBlue. The attached no-irk-log.txtfile contains the log.

I've also attached the bleprph app I used in the first trial. Note that I do have LE Secure Connections enabled. In the original bleprph app the key_dist settings were not set when ble_hs_cfg.sm_sc is set to 1.

Looking forward to next steps.

Regards,
Brian

no-irk-log.txt
irk-log.txt
bleprph.zip

@prasad-alatkar
Copy link
Contributor

Hi @bfriedkin Can you please confirm if IDF commit: 04e3cf4 is present in your git repo ? If not Can you please pull in latest changes from release/v3.3 branch ? The reason for this is PR!7 has already fixed similar issue. If this doesn't resolve your issue, Can you please try applying this small patch on NimBLE submodule i.e. on $IDF_PATH/components/nimble/nimble and provide me the debug logs ? :
NimBLE_RPA_not_resolved_Debug_logs.txt

In the original bleprph app the key_dist settings were not set when ble_hs_cfg.sm_sc is set to 1.

Thank you for pointing out, I will fix this.

@bfriedkin
Copy link
Author

@prasad-alatkar -

Thank you. I will confirm the commit and try the patches today.

Regarding LE Secure Connections, should I still be running with the prior bleprph_IRK_distribute.txt patch that sets the key_dist config fields?

Regards,
Brian

@bfriedkin
Copy link
Author

@prasad-alatkar -

I merged commit 04e3cf4 into ESP-IDF v3.3.2. I also applied your NimBLE_RPA_not_resolved_Debug_logs.txt patch. Your patch didn't originally build, but I was able to fix the problems and run the bleprph app. I have the bleprph app configured to use the bleprph_IRK_distribute.txt patch.

The good news is that PR!7 does seem to fix both the disconnect problem and subsequent failures in the ble key store. I was able to toggle Bluetooth off/on on my iPhone and the device reconnected without requiring another passkey entry. I was able to read the encrypted random number characteristic in LightBlue.

The bad news is that with this patch applied, there is a new problem: Failure reading encrypted characteristic. Steps to reproduce:

  • Launch bleprph on ESP32
  • Clear Bluetooth settings on iPhone and launch LightBlue
  • Connect and pair with bleprph
  • Disconnect from bleprph and remove bleprph bond from iPhone
  • Return back to LightBlue, connect to bleprph, pair and read the encrypted random number characteristic

At this point the iPhone fails to read the characteristic. The IDF log at this point shows the following:

ble_hs_hci_evt_acl_process(): conn_handle=0 pb=2 len=7 data=0x03 0x00 0x04 0x00 0x0a 0x0c 0x00 
rxed att command: read req; conn=0 handle=0x000c
looking up peer sec; peer_addr_type=0 peer_addr=0x50 0x83 0xd5 0x1a 0xab 0x84  
txed att command: error rsp; conn=0 req_op=10 handle=0x000c error_code=5
host tx hci data; handle=0 length=9
ble_hs_hci_acl_tx(): 0x00 0x00 0x09 0x00 0x05 0x00 0x04 0x00 0x01 0x0a 0x0c 0x00 0x05 
Number of Completed Packets: num_handles=1
handle:0 pkts:1

I've attached the complete log showing the successful re-pair and reconnects. The failure above can be found at the end of the log.

Regards,
Brian

rpa-not-resolved.txt

@h2zero
Copy link
Contributor

h2zero commented Jul 4, 2020

In your log just before the error:

encryption change event; status=1035 handle=0 <...>  encrypted=0 authenticated=0 bonded=0

status=1035 means: BLE_SM_ERR_DHKEY Indicates to the remote device that the DHKey Check value received doesn’t match the one calculated by the local device.

Which happened because you deleted the bond from your phone, which would trigger the repeat pairing event on the peripheral and in the example it deletes the bond locally.

After this you need to reconnect to re-pair the devices and it should work. Would be nice if that wasn't necessary but it seems to be at this point.

@bfriedkin
Copy link
Author

@h2zero -

Thank you for helping here and good catch on the encryption error. What you describe in terms of the DHKey Check value makes sense and is consistent with the error message I see in the LightBlue log on the phone. Unfortunately I still bump into problems even after trying to reconnect and re-pair from the phone.

I was also able to confirm that the BLE_GAP_EVENT_REPEAT_PAIRING event is triggered before the failure. In the attached log I added a trace in the event handler code:

ESP_LOGI(tag, "Got BLE_GAP_EVENT_REPEAT_PAIRING event");

That trace appears in the attached NimBLE log.

I spent a little more time today trying to further isolate when the failure happens. With the 04e3cf4 commit merged into ESP-IDF 3.3.2, I found that the encryption failure always occurs after you remove the bond on the phone and toggle Bluetooth off and on. I believe this is because the iPhone changes it's BT address.

In today's testing I started fresh, erasing the ESP32 flash and did the following:

  1. Read encrypted characteristic in LightBlue
  2. Disconnect from bleprph, remove bond on phone and read encrypted characteristic again
  3. Disconnect from bleprph, remove bond on phone, turn Bluetooth off/on on phone and read encrypted characteristic again

The third try always fails with the BLE_SM_ERR_DHKEY error.

I then built the same service on top of the ESP32/Bluedroid and repeated the same steps above. No failure was reproduced. When running on top of Bluedroid, I notice that the pairing dialog displays on the iPhone before I try to read the characteristic, as opposed to when I read the characteristic using NimBLE. I have seen this type of behavior before and assume it is unrelated to the issue at hand.

Both the NimBLE and Bluedroid tests were run on the same ESP32 hardware connected to the same iPhone. I therefore believe the problem lies in the NimBLE implementation. A mobile client should not have to reconnect and/or delete bonds for this scenario to succeed.

I've attached both NimBLE and Bluedroid logs for reference. The Bluedroid log is a little thin... I can increase the verbosity if that would help at all.

@prasad-alatkar - I am hoping this additional information helps to better isolate the failure. Crossing my fingers for a fix...

Regards,
Brian

nimble-three-tries.txt
bluedroid-three-tries.txt

@h2zero
Copy link
Contributor

h2zero commented Jul 6, 2020

I can confirm and repro this, but also confirm that after the failure to pair/read, disconnecting and reconnecting will allow pairing to succeed. I'm unsure as to why this happens but it may be something that needs to be addressed upstream, @prasad-alatkar would know more about that though.

When running on top of Bluedroid, I notice that the pairing dialog displays on the iPhone before I try to read the characteristic, as opposed to when I read the characteristic using NimBLE.

This is how it works with NimBLE and is more inline with the BLE core spec as the database is allowed to be inspected but values are what is protected. Just thought I'd share that while I'm posting. If you want to have similar behavior to bluedroid you can call ble_gap_security_initiate() in the connect event.

@prasad-alatkar
Copy link
Contributor

Hi @bfriedkin Thank you for putting detailed description of the issue. Can you please try the attached patch and let me know if it resolves the issue ? Patch: NimBLE_RPA_fix_identity_change.txt

The patch needs to be applied on NimBLE submodule ($IDF_PATH/components/nimble/nimble). As already discussed above, the BLE_SM_ERR_DHKEY failure comes after deleting the existing bond so as to continue with pairing request received from peer. Since we have resolved identity of peer device (RPA address) , we change peer address to its identity address. Now once we delete the bond because of repeated pairing request, we are still using the "changed" peer address and that is the reason for this issue. The attached patch fixes this bahavior.

@bfriedkin
Copy link
Author

@prasad-alatkar -

Thank you for your continued focus here. I tried the additional patch. Unfortunately that leads to a crash when I reconnect the iPhone after deleting the bond. It looks like the crash occurs when the app calls ble_store_util_delete_peer() from the BLE_GAP_EVENT_REPEAT_PAIRING event handler. Here is the call stack:

 Replace Identity addr with random addr received at start of the connection
Guru Meditation Error: Core  0 panic'ed (StoreProhibited). Exception was unhandled.
Core 0 register dump:
PC      : 0x400f5597  PS      : 0x00060b30  A0      : 0x800f5818  A1      : 0x3ffc7100  
0x400f5597: ble_rpa_replace_id_with_rand_addr at /Users/brianfriedkin/esp32/esp-idf-3.3.2/components/nimble/nimble/nimble/host/src/ble_hs_resolv.c:204

A2      : 0x3ffc7130  A3      : 0x3ffc2fb1  A4      : 0x3ffc2f9c  A5      : 0x3ffc2fa4  
A6      : 0x00000006  A7      : 0x00000000  A8      : 0x00000000  A9      : 0x3ffc70e0  
A10     : 0x00000000  A11     : 0x00000000  A12     : 0x00000006  A13     : 0x3ffc7107  
A14     : 0x00000084  A15     : 0x00000084  SAR     : 0x00000018  EXCCAUSE: 0x0000001d  
EXCVADDR: 0x00000007  LBEG    : 0x4000c28c  LEND    : 0x4000c296  LCOUNT  : 0x00000000  

ELF file SHA256: 3441d1da177dd7e8523373318e5b7c370af6a13068134c3a9998dacf354a1e68

Backtrace: 0x400f5597:0x3ffc7100 0x400f5815:0x3ffc7130 0x400f9daa:0x3ffc7160 0x400dc76b:0x3ffc71a0 0x400eeda9:0x3ffc7220 0x400eef33:0x3ffc7240 0x400f02c5:0x3ffc7270 0x400f724c:0x3ffc72d0 0x400f7842:0x3ffc7350 0x400f7c66:0x3ffc7390 0x400f4b37:0x3ffc73d0 0x400f2c39:0x3ffc7400 0x400f2c4b:0x3ffc7420 0x400fa0ee:0x3ffc7440 0x400dc422:0x3ffc7460 0x4008fe22:0x3ffc7480
0x400f5597: ble_rpa_replace_id_with_rand_addr at /Users/brianfriedkin/esp32/esp-idf-3.3.2/components/nimble/nimble/nimble/host/src/ble_hs_resolv.c:204

0x400f5815: ble_hs_resolv_list_rmv at /Users/brianfriedkin/esp32/esp-idf-3.3.2/components/nimble/nimble/nimble/host/src/ble_hs_resolv.c:619

0x400f9daa: ble_store_util_delete_peer at /Users/brianfriedkin/esp32/esp-idf-3.3.2/components/nimble/nimble/nimble/host/src/ble_store_util.c:239

0x400dc76b: bleprph_gap_event at /Users/brianfriedkin/esp32/esp-idf-3.3.2/examples/bluetooth/nimble/bleprph/main/main.c:235

0x400eeda9: ble_gap_call_event_cb at /Users/brianfriedkin/esp32/esp-idf-3.3.2/components/nimble/nimble/nimble/host/src/ble_gap.c:4893

0x400eef33: ble_gap_call_conn_event_cb at /Users/brianfriedkin/esp32/esp-idf-3.3.2/components/nimble/nimble/nimble/host/src/ble_gap.c:4893

0x400f02c5: ble_gap_repeat_pairing_event at /Users/brianfriedkin/esp32/esp-idf-3.3.2/components/nimble/nimble/nimble/host/src/ble_gap.c:4893

0x400f724c: ble_sm_chk_repeat_pairing at /Users/brianfriedkin/esp32/esp-idf-3.3.2/components/nimble/nimble/nimble/host/src/ble_sm.c:2561

0x400f7842: ble_sm_pair_req_rx at /Users/brianfriedkin/esp32/esp-idf-3.3.2/components/nimble/nimble/nimble/host/src/ble_sm.c:2561

0x400f7c66: ble_sm_rx at /Users/brianfriedkin/esp32/esp-idf-3.3.2/components/nimble/nimble/nimble/host/src/ble_sm.c:2561

0x400f4b37: ble_hs_hci_evt_acl_process at /Users/brianfriedkin/esp32/esp-idf-3.3.2/components/nimble/nimble/nimble/host/src/ble_hs_hci_evt.c:886

0x400f2c39: ble_hs_process_rx_data_queue at /Users/brianfriedkin/esp32/esp-idf-3.3.2/components/nimble/nimble/porting/npl/freertos/include/nimble/nimble_npl_os.h:109

0x400f2c4b: ble_hs_event_rx_data at /Users/brianfriedkin/esp32/esp-idf-3.3.2/components/nimble/nimble/porting/npl/freertos/include/nimble/nimble_npl_os.h:109

0x400fa0ee: ble_npl_event_run at /Users/brianfriedkin/esp32/esp-idf-3.3.2/components/nimble/nimble/porting/nimble/src/nimble_port.c:132
 (inlined by) nimble_port_run at /Users/brianfriedkin/esp32/esp-idf-3.3.2/components/nimble/nimble/porting/nimble/src/nimble_port.c:82

0x400dc422: bleprph_host_task at /Users/brianfriedkin/esp32/esp-idf-3.3.2/examples/bluetooth/nimble/bleprph/main/main.c:337

0x4008fe22: vPortTaskWrapper at /Users/brianfriedkin/esp32/esp-idf-3.3.2/components/freertos/port.c:403

I've also attached the full log here as well as my copy of the bleprph app.

Regards,
Brian

NimBLE_RPA_fix_identity_change.log

bleprph.zip

@prasad-alatkar
Copy link
Contributor

@bfriedkin I am attaching modified patch to try, please try it out. I am a little afraid that this patch may not help you overcome the issue, but the crash. Please let me know your observations after trying out this patch, If the issue still persists I may need to try with iPhone instead of Android phone. Patch to try out:
NimBLE_RPA_fix_identity_change_2.txt

@bfriedkin
Copy link
Author

@prasad-alatkar -

Today I built with the following:

I ran the following tests:

  1. Read encrypted characteristic in LightBlue
  2. Disconnect from bleprph, remove bond on phone and read encrypted characteristic again
  3. Disconnect from bleprph, remove bond on phone, turn Bluetooth off/on on phone and read encrypted characteristic again
  4. Disconnect from bleprph, turn Bluetooth off/on on phone and read encrypted characteristic again

Tests 1-3 succeeded. Unfortunately test 4 failed. Turning Bluetooth off and back on on the iPhone but keeping the bond to the ESP32 device triggered the immediate disconnect failure.

I've attached the log here. Note that I tried step 4 two times before terminating the device app and copying the log.

Good idea to try with an iPhone. Hoping you can do that and more easily reproduce/resolve the issue. This issue will need to be resolved on both iOS and Android anyway...

Regards,
Brian

NimBLE_RPA_fix_identity_change_2.log

@h2zero
Copy link
Contributor

h2zero commented Jul 8, 2020

I can confirm the issue with the patch on my iPhone as described above.

I made a small addition to the patch that seems to have resolved it.
@bfriedkin could you give this a try?

NimBLE_RPA_fix_identity_change_h2zero.txt

Edit: wrong file 😢

@bfriedkin
Copy link
Author

@h2zero -

Thank you for your continued support here. I removed the NimBLE_RPA_fix_identity_change_2.txt and applied the NimBLE_RPA_fix_identity_change_h2zero.txt patch. Unfortunately that didn't resolve issue (4) above. The connection fails on the phone. The LightBlue log shows Error: Peer removed pairing information.

I've attached the log corresponding to tests (1)-(4) above.

Regards,
Brian

NimBLE_RPA_fix_identity_change_h2zero.log

@h2zero
Copy link
Contributor

h2zero commented Jul 8, 2020

@bfriedkin You're welcome, I'd like to see this fixed as well.

In your last 2 logs I see disconnect; reason=19 which is BLE_HS_ETIMEOUT_HCI : HCI request timed out; controller unresponsive. It seems the controller is not responding after sending it the LE Long Term Key Requested Negative Reply command.

I just realized you're on idf v3.3.2, I tested that patch on v4.0.1.

Thanks for testing and logging 😄

@bfriedkin
Copy link
Author

Hello -

Just to underscore what @h2zero pointed out, we need a fix for ESP-IDF v3.3.2. We have products built on that SDK version that we cannot currently migrate to v4.x. Furthermore, v3.3.2 is a Long Term Support release that will be supported until March 2022.

@h2zero - I appreciate you independently confirming the failures. Lots of moving parts here with the steps to reproduce and multiple patches.

@prasad-alatkar - Hoping today's discussion helps to get us closer to a solution.

Regards,
Brian

@prasad-alatkar
Copy link
Contributor

Hi @bfriedkin , I was able to reproduce the issue with iPhone, however I do see the issue is not iPhone specific, iPhone just disconnects continuously whereas Android device deletes the saved bond and re-pairs again. There have been lots of patches and trials here, so I will try to put conclusive solution here:

  1. We need BLE_SM_PAIR_KEY_DIST_ID enabled in their & our key distribution. This is achievable from bleprph_IRK_distribute.txt patch shared earlier.
  2. I am attaching NimBLE submodule patch here (0001-NimBLE_RPA_fix_identity_change_repairing_failure.txt):
    0001-NimBLE_RPA_fix_identity_change_repairing_failure.txt , please disregard all other previous patches.
  3. IDF commit 04e3cf4 should be present in your code repo.

Regarding

We have products built on that SDK version that we cannot currently migrate to v4.x. Furthermore, v3.3.2 is a Long Term Support release that will be supported until March 2022.

No need to migrate to v4.x, for that matter v4.x will also need this fix. The fix will be available in next release/v3.3.x so you may need to upgrade to next 3.3.x release or cherry-pick the relevant commits. Please let me know if it resolves your issue.

Regards,
Prasad

@bfriedkin
Copy link
Author

@prasad-alatkar -

I followed your steps (1) - (3) above, but I applied the 0001-NimBLE_RPA_fix_identity_change_repairing_failure.txt patch after moving the files from IDF commit 04e3cf4 into the nimble component. I started from a fresh clone of v3.3.2:

git clone -b v3.3.2 --recursive https://github.com/espressif/esp-idf.git

Unfortunately I see the same failure on iPhone. The connection fails on the iPhone when the bond is saved on the iPhone but Bluetooth is toggled off and then back on.

I've attached a diff of my nimble component directory that shows the changed files. I am running the bleprph_IRK_distribute.txt patch in the bleprph app. I am running these tests on a 2020 iPhone SE running iOS 13.5.1. Please let me know if I missed any steps. In these tests I did not erase the flash before starting.

I've attached my copy of the bleprph example app and the log from this set of tests. Towards the end of the log I see a few potential failures:

LE LTK Req. handle=0 rand=00 encdiv=0
looking up our sec; peer_addr_type=1 peer_addr=0x9c 0x40 0xb7 0xe9 0x84 0x42  ediv=0x00 rand=0x0 
ble_hs_hci_cmd_send: ogf=0x08 ocf=0x001b len=2
0x1b 0x20 0x02 0x00 0x00 
Command complete: cmd_pkts=4 ogf=0x8 ocf=0x1b status=0 handle=0
disconnect; reason=19 handle=0 our_ota_addr_type=0 our_ota_addr=24:6f:28:19:d2:52 our_id_addr_type=0 our_id_addr=24:6f:28:19:d2:52 peer_ota_addr_type=1 peer_ota_addr=42:84:e9:b7:40:9c peer_id_addr_type=1 peer_id_addr=42:84:e9:b7:40:9c conn_itvl=24 conn_latency=0 supervision_timeout=72 encrypted=0 authenticated=0 bonded=0

error setting advertisement data; rc=22
Resetting state; reason=19
looking up peer sec; 
failed to configure restored IRK
looking up peer sec; 
looking up peer sec; 
Device Address: 24:6f:28:19:d2:52
ble_hs_hci_cmd_send: ogf=0x08 ocf=0x0007 len=0
0x07 0x20 0x00 
Command complete: cmd_pkts=5 ogf=0x8 ocf=0x7 status=0 

Regards,
Brian

nimble.patch.zip
bleprph.zip

@prasad-alatkar
Copy link
Contributor

Hi @bfriedkin , I have confirmed the issue gets resolved with the patch for release/v3.3.2 and release/v3.3. I would recommend you to follow below mentioned steps to be on the same page:

  1. git checkout v3.3.2
  2. git submodule update --init
  3. Make the changes to example code from bleprph_IRK_distribute.txt patch i.e. git apply bleprph_IRK_distribute.patch
  4. Go to components/nimble/nimble and:
    • git checkout nimble-1.1.0-idf-v3.3
    • git pull
    • git apply 0001-NimBLE_RPA_fix_identity_change_repairing_failure.patch, this will bring nimble submodule pointer to a9239c6c
  5. Go to bleprph example:
    • make -j8 erase_flash flash monitor
  6. Repeat the test steps 1 to 4 you mentioned earlier here: NimBLE - immediate disconnect when BLE key store fills and device changes address (IDFGH-3586) #5530 (comment), you should have your issue resolved !!

@bfriedkin
Copy link
Author

bfriedkin commented Jul 12, 2020

@prasad-alatkar -

I have mostly good news. :-) Following you steps above I was able to successfully repeat the test steps 1 to 4 on my iPhone without any failures! Apparently I previously didn't apply the nimble-1.1.0-idf-v3.3 patch correctly. Thank you for your patience. The detailed steps really helped.

Unfortunately when I tested our BLE peripheral application on Android (not the bleprph example app), I am getting a reproducible crash related to the re-pairing operation. Our application has more services and characteristics, though that doesn't seem particularly relevant here. Steps to reproduce:

  1. Reset all network settings on Android device
  2. Erase device flash
  3. Launch LightBlue
  4. Open peripheral
  5. Read encrypted characteristic, accepting all pairing dialogs on the Android device
  6. Back out and close connection
  7. Navigate to Android Settings -> Bluetooth
  8. In the Paired devices section, "Forget" the ESP32 paired device
  9. Repeat steps 3 to 5

At this point the ESP32 app asserts and drops into GDB. Here is the relevant part of the log output:

 Replace Identity addr with random addr received at start of the connection
assertion "ble_hs_locked_by_cur_task()" failed: file "/Users/brianfriedkin/esp32/esp-idf/components/nimble/nimble/nimble/host/src/ble_hs_conn.c", line 303, function: ble_hs_conn_find_by_addr
[android-remove-bond-crash.log](https://github.com/espressif/esp-idf/files/4907801/android-remove-bond-crash.log)

abort() was called at PC 0x400d355b on core 0
0x400d355b: __assert_func at /Users/ivan/e/newlib_xtensa-2.2.0-bin/newlib_xtensa-2.2.0/xtensa-esp32-elf/newlib/libc/stdlib/../../../.././newlib/libc/stdlib/assert.c:63 (discriminator 8)

ELF file SHA256: 3e39da5ef1e1ec6667cac80ad346f7261f3f60bc1febcbbb6cfc0720d738f213

Backtrace: 0x4008ca24:0x3ffd0610 0x4008cc4d:0x3ffd0630 0x400d355b:0x3ffd0650 0x40114bba:0x3ffd0680 0x40111fe7:0x3ffd06a0 0x40112271:0x3ffd06d0 0x40105a7a:0x3ffd0700 0x400e2b0d:0x3ffd0740 0x4010d468:0x3ffd0790 0x4010d5f3:0x3ffd07b0 0x4010eae9:0x3ffd07e0 0x401067fc:0x3ffd0840 0x40106f16:0x3ffd08c0 0x401073ea:0x3ffd0900 0x40109248:0x3ffd0940 0x40108791:0x3ffd0970 0x401087a3:0x3ffd0990 0x40113bb6:0x3ffd09b0 0x400e4cb3:0x3ffd09d0 0x4008cf96:0x3ffd09f0
0x4008ca24: invoke_abort at /Users/brianfriedkin/esp32/esp-idf/components/esp32/panic.c:715

0x4008cc4d: abort at /Users/brianfriedkin/esp32/esp-idf/components/esp32/panic.c:715

0x400d355b: __assert_func at /Users/ivan/e/newlib_xtensa-2.2.0-bin/newlib_xtensa-2.2.0/xtensa-esp32-elf/newlib/libc/stdlib/../../../.././newlib/libc/stdlib/assert.c:63 (discriminator 8)

0x40114bba: ble_hs_conn_find_by_addr at /Users/brianfriedkin/esp32/esp-idf/components/nimble/nimble/nimble/host/src/ble_hs_conn.c:93
0x40111fe7: ble_rpa_replace_id_with_rand_addr at /Users/brianfriedkin/esp32/esp-idf/components/nimble/nimble/nimble/host/src/ble_hs_resolv.c:200
0x40112271: ble_hs_resolv_list_rmv at /Users/brianfriedkin/esp32/esp-idf/components/nimble/nimble/nimble/host/src/ble_hs_resolv.c:617
0x40105a7a: ble_store_util_delete_peer at /Users/brianfriedkin/esp32/esp-idf/components/nimble/nimble/nimble/host/src/ble_store_util.c:239
0x400e2b0d: nimble_gap_event at /Users/brianfriedkin/Projects/moddable/modules/network/ble/nimble/modBLEServer.c:739
0x4010d468: ble_gap_call_event_cb at /Users/brianfriedkin/esp32/esp-idf/components/nimble/nimble/nimble/host/src/ble_gap.c:4632
0x4010d5f3: ble_gap_call_conn_event_cb at /Users/brianfriedkin/esp32/esp-idf/components/nimble/nimble/nimble/host/src/ble_gap.c:4632
0x4010eae9: ble_gap_repeat_pairing_event at /Users/brianfriedkin/esp32/esp-idf/components/nimble/nimble/nimble/host/src/ble_gap.c:4632
0x401067fc: ble_sm_chk_repeat_pairing at /Users/brianfriedkin/esp32/esp-idf/components/nimble/nimble/nimble/host/src/ble_sm.c:196
0x40106f16: ble_sm_pair_req_rx at /Users/brianfriedkin/esp32/esp-idf/components/nimble/nimble/nimble/host/src/ble_sm.c:1861
0x401073ea: ble_sm_rx at /Users/brianfriedkin/esp32/esp-idf/components/nimble/nimble/nimble/host/src/ble_sm.c:2648
0x40109248: ble_hs_hci_evt_acl_process at /Users/brianfriedkin/esp32/esp-idf/components/nimble/nimble/nimble/host/src/ble_hs_hci_evt.c:883
0x40108791: ble_hs_process_rx_data_queue at /Users/brianfriedkin/esp32/esp-idf/components/nimble/nimble/porting/npl/freertos/include/nimble/nimble_npl_os.h:109
0x401087a3: ble_hs_event_rx_data at /Users/brianfriedkin/esp32/esp-idf/components/nimble/nimble/porting/npl/freertos/include/nimble/nimble_npl_os.h:109

Our repeat pairing event handler case looks like this:

		case BLE_GAP_EVENT_REPEAT_PAIRING:
			// delete old bond and accept new link
			if (0 == ble_gap_conn_find(event->repeat_pairing.conn_handle, &desc))
				ble_store_util_delete_peer(&desc.peer_id_addr);
			return BLE_GAP_REPEAT_PAIRING_RETRY;

I've attached the complete log here. Hoping this is an easy fix. Perhaps there is a missing ble_hs_lock() and ble_hs_unlock() set of calls. Assuming that is the case, I would appreciate having a new patch to replace the 0001-NimBLE_RPA_fix_identity_change_repairing_failure.patch.

Regards,
Brian

android-remove-bond-crash.log

@prasad-alatkar
Copy link
Contributor

@bfriedkin yes, you are right regarding ble_hs_lock. I am attaching reworked patch which should resolve the issue.
0002-NimBLE_RPA_fix_identity_change_repairing_failure.txt

@bfriedkin
Copy link
Author

@prasad-alatkar -

The 0002-NimBLE_RPA_fix_identity_change_repairing_failure.txt patch along with the nimble-1.1.0-idf-v3.3 "patch" seems to resolve the issue for us. I tested on both iOS and Android using the BLE peripheral setup from one of our products. Nice work. Glad we were able to get this handled together.

Because we have products that require this fix, I'd appreciate any information you can provide related to how and when these patches will be publicly available. Will these fixes be back-ported to ESP-IDF v3.3.2 or do we need to wait for the next v3.3.x release? Can I be notified in either case?

Regards,
Brian

@prasad-alatkar
Copy link
Contributor

Hi @bfriedkin , glad that the issue is resolved. So the fix will be first merged into master branch and subsequently it will be backported till release/v3.3 branch. The fix will be made available in the next v3.3.x release, I don't think it will be backported to release/v3.3.2(cc: @Alvin1Zhang ). I will notify you here once the fix is merged. Just FYI, the details of esp-idf releases are shared here: https://github.com/espressif/esp-idf/releases

@bfriedkin
Copy link
Author

@prasad-alatkar -

Unfortunately we've bumped into another issue that needs to be resolved. It seems possibly related to these bonding fixes, so I'm reporting the issue here.

The problem is that when the CCCD store (NVS on ESP32) fills up, the functions that purges the store, starting from the Storage Status callback, eventually fails. This is triggered by the client subscribing for notifications, when the client/server are bonded and the CCCD store is full. It can be a little challenging to reproduce, but here are the basic steps:

  1. Erase ESP32 flash
  2. Run the attached bleprph_cccd app
  3. Connect to the peripheral from a mobile client
  4. Pair/bond with the peripheral by browsing the encrypted characteristic
  5. Open the static characteristic and subscribe for notifications
  6. Back out of the connection, toggle Bluetooth off/on and/or remove the bond on the mobile client
  7. Repeat steps (3) to (6) above.

I am using LightBlue as the client. The bleprph_cccd app is a lightly modified version of the bleprph app that simply adds the notify property to the static characteristic. The app sdkconfig specifies CONFIG_NIMBLE_MAX_CCCDS=3 to help trigger the problem more quickly. Basically, to trigger the failure, you want to fill the CCCD store by having the phone change its address and/or creating new bonds. After that happens the problem triggers fairly quickly. The bug leads to a failure in our product app when the client isn't able to successfully register for notifications.

I've attached a log that shows the problem. To help pinpoint the failure, I've also added log output to a few of the NimBLE modules, which I've also attached here. Recall I am running ESP-IDF v3.3.2 with the patches provided.

Looking at the log, the first failure occurs here at line 901:

subscribe event; conn_handle=0 attr_handle=8 reason=1 prevn=0 curn=0 previ=0 curi=1
error persisting cccd; too many entries (3)
In status cb, BLE_STORE_OBJ_TYPE_CCCD 
in ble_gap_unpair_oldest_except
In ble_store_util_bonded_peers, max_peers = 3
looking up our sec; 
looking up our sec; 
too many peers, i = 1, num_peers = 1
In status cb, ble_gap_unpair_oldest_except result: 6

We are using the stock ble_store_util_status_rr() callback function. Notice the error 6 returned from the ble_gap_unpair_oldest_except() function.

At line 1447 we see a case that succeeds:

subscribe event; conn_handle=0 attr_handle=8 reason=1 prevn=0 curn=0 previ=0 curi=1
error persisting cccd; too many entries (3)
In status cb, BLE_STORE_OBJ_TYPE_CCCD 
in ble_gap_unpair_oldest_except
In ble_store_util_bonded_peers, max_peers = 3
looking up our sec; 
looking up our sec; 
ble_gap_unpair
...
D (203136) NIMBLE_NVS: Deleting CCCD, nvs idx = 1
...
In status cb, ble_gap_unpair_oldest_except result: 0
...
D (203316) NIMBLE_NVS: Persisting CCCD value in NVS...

Finally at line 1588 there is another failure, but with a different trigger:

subscribe event; conn_handle=0 attr_handle=14 reason=1 prevn=0 curn=1 previ=0 curi=0
error persisting cccd; too many entries (3)
In status cb, BLE_STORE_OBJ_TYPE_CCCD 
in ble_gap_unpair_oldest_except
In ble_store_util_bonded_peers, max_peers = 3
looking up our sec; 
num_peers == 0
In status cb, ble_gap_unpair_oldest_except result: 5

I hope these notes, log and test app help you more quickly isolate and resolve the problem.

esp-cccd.zip

Regards,
Brian

@prasad-alatkar
Copy link
Contributor

prasad-alatkar commented Jul 20, 2020

Hi @bfriedkin, unfortunately I could not reproduce the issue. Either I am missing something here or we are not at same NimBLE pointer. Can you please confirm if your NimBLE submodule pointer is updated to nimble-1.1.0-idf-v3.3. Below are steps to follow to bring NimBLE submodule to latest:

  1. Go to components/nimble/nimble
  2. git checkout nimble-1.1.0-idf-v3.3
  3. git pull

Now nimble submodule pointer should point to a9239c6c or you can simply try to reproduce the issue on release/v3.3 (with patch 0002-NimBLE_RPA_fix_identity_change_repairing_failure.txt applied). Basically I want to make sure that the NimBLE submodule has these commits: 0d46b272, ca299e70 present. If you are already on latest NimBLE submodule pointer, Can you please provide more details on sequence of steps to follow to reproduce the issue ?

@bfriedkin
Copy link
Author

@prasad-alatkar -

Sorry to hear you are having trouble reproducing the issue.

Git seems to think that I am pointing at the correct submodule:

Brians-MacBook-Air:esp-idf brianfriedkin$ cd components/nimble/nimble/
Brians-MacBook-Air:nimble brianfriedkin$ git log
commit a9239c6ca5e8f6dc72792ac716a8fe3ffad08b1e (HEAD -> nimble-1.1.0-idf-v3.3, origin/nimble-1.1.0-idf-v3.3)
Merge: 5364a96f 69bdb87c
Author: Hrishikesh Dhayagude <hrishi@espressif.com>
Date:   Wed May 20 17:43:25 2020 +0800

In that same log, I see the two commits you reference above:

commit 0d46b2724e80591af7d09fcbff8230cff6109c64
Author: Prasad Alatkar <prasad.alatkar@espressif.com>
Date:   Tue Apr 28 23:40:40 2020 +0530

    NimBLE : Change peer_addr type to PUBLIC in case of Host based privacy
    
    - When Identity of peer is resolved and peer address is RPA then NimBLE stack
      changes the peer_addr type to PUBLIC_ID(0x02); in Host based privacy it
      should not be done as it can result in mismatch while comparing stored
      addresses especially in case of CCCD storage.

commit ca299e7065ba9d11f5a9aed41b0adf3a9c65daf9
Author: Prasad Alatkar <prasad.alatkar@espressif.com>
Date:   Tue Apr 28 23:28:38 2020 +0530

    NimBLE store: Fix bug in NVS while retrieving matching index in get_nvs_db_attribute

I am running with the 0002-NimBLE_RPA_fix_identity_change_repairing_failure.txt patch applied in addition to the bleprph_IRK_distribute.txt patch applied to the previously attached bleprph_cccd app in the esp-cccd.zip archive. Basically I am running the same set of patches you recommended leading up to my comment on 16 July. Again I am running the patches against ESP-IDF v3.3.2.

While problem is difficult to reproduce, it does occur. There are two different failures, which I've already documented above. Filling up the CCCD store along with the iPhone changing its address (RPA) seem to trigger the failure. I spent some time today trying to find a simple set of steps. While I wasn't able to find a simple way to reproduce the failure, I did reproduce the failure by doing the following in the previously attached bleprph_cccd app. I am using LightBlue as the mobile client on iOS:

  1. Build bleprph_cccd app: make erase_flash flash monitor
  2. Open LightBlue, open brian-bleprph and browse and read the encrypted characteristic 5c3a659e-897e-45e1-b016-007107c96df6. Pair and bond.
  3. Back out one level
  4. Browse the static characteristic 5c3a659e-897e-45e1-b016-007107c96df7 and Listen for Notifications
  5. Back out two levels to disconnect from the peripheral
  6. Navigate to iPhone Settings -> Bluetooth and Forget brian-bleprph
  7. Open LightBlue, open brian-bleprph and browse and read the encrypted characteristic 5c3a659e-897e-45e1-b016-007107c96df6. Pair and bond.
  8. Back out one level
  9. Browse the static characteristic 5c3a659e-897e-45e1-b016-007107c96df7 and Listen for Notifications
  10. Back out two levels to disconnect from the peripheral
  11. Navigate to iPhone Settings -> Bluetooth and toggle Bluetooth off/on
  12. Open LightBlue, browse the static characteristic 5c3a659e-897e-45e1-b016-007107c96df7 and Listen for Notifications
  13. Back out two levels to disconnect from the peripheral
  14. Navigate to iPhone Settings -> Bluetooth and Forget brian-bleprph and toggle Bluetooth off/on
  15. Open LightBlue, open brian-bleprph and browse and read the encrypted characteristic 5c3a659e-897e-45e1-b016-007107c96df6. Pair and bond.
  16. Back out one level
  17. Browse the static characteristic 5c3a659e-897e-45e1-b016-007107c96df7 and Listen for Notifications
  18. Back out two levels to disconnect from the peripheral
  19. Open LightBlue, open brian-bleprph and browse the static characteristic 5c3a659e-897e-45e1-b016-007107c96df7 and Listen for Notifications

(Note at this point two of the three CCCD slots are filled)

  1. Back out two levels to disconnect from the peripheral
  2. Navigate to iPhone Settings -> Bluetooth and Forget brian-bleprph
  3. Open LightBlue, open brian-bleprph and browse and read the encrypted characteristic 5c3a659e-897e-45e1-b016-007107c96df6. Pair and bond.
  4. Back out one level
  5. Browse the static characteristic 5c3a659e-897e-45e1-b016-007107c96df7 and Listen for Notifications
  6. Back out two levels to disconnect from the peripheral
  7. Navigate to iPhone Settings -> Bluetooth and Forget brian-bleprph and toggle Bluetooth off/on
  8. Open LightBlue, open brian-bleprph and browse and read the encrypted characteristic 5c3a659e-897e-45e1-b016-007107c96df6. Pair and bond.
  9. Back out one level
  10. Browse the static characteristic 5c3a659e-897e-45e1-b016-007107c96df7 and Listen for Notifications
  11. Navigate to iPhone Settings -> Bluetooth and Forget brian-bleprph
  12. Open LightBlue, open brian-bleprph and browse and read the encrypted characteristic 5c3a659e-897e-45e1-b016-007107c96df6. Pair and bond.
  13. Back out one level
  14. Browse the static characteristic 5c3a659e-897e-45e1-b016-007107c96df7 and Listen for Notifications
  15. Back out one level
  16. Browse and read the encrypted characteristic 5c3a659e-897e-45e1-b016-007107c96df6
  17. Back out two levels to disconnect from the peripheral
  18. Navigate to iPhone Settings -> Bluetooth and toggle Bluetooth off/on
  19. Open LightBlue, open brian-bleprph and browse and read the encrypted characteristic 5c3a659e-897e-45e1-b016-007107c96df6. Pair and bond.
  20. Browse the static characteristic 5c3a659e-897e-45e1-b016-007107c96df7 and Listen for Notifications

At this point, at line 3649 in the attached cccd-failure2.log file, you can see one of the failures:

subscribe event; conn_handle=0 attr_handle=14 reason=1 prevn=0 curn=1 previ=0 curi=0
error persisting cccd; too many entries (3)
In status cb, BLE_STORE_OBJ_TYPE_CCCD 
in ble_gap_unpair_oldest_except
In ble_store_util_bonded_peers, max_peers = 3
looking up our sec; 
looking up our sec; 
too many peers, i = 1, num_peers = 1
In status cb, ble_gap_unpair_oldest_except result: 6
txed att command: error rsp; conn=0 req_op=18 handle=0x000f error_code=6
host tx hci data; handle=0 length=9
ble_hs_hci_acl_tx(): 0x00 0x00 0x09 0x00 0x05 0x00 0x04 0x00 0x01 0x12 0x0f 0x00 0x06 
Number of Completed Packets: num_handles=1
handle:0 pkts:1

Also at this point in LightBlue, the Listen for Notifications text does not change to Stop Listening, presumably due to the error response.

This is one of the two errors I pointed out previously. The logs don't lie. :-) Certainly the problem is difficult to reproduce. I'm hoping given the additional details you can find a quicker way to reproduce the failure.

If you are still having trouble, I would recommend taking a look at the prior logs and modified sources I provided in my previous post. You can easily see where things are failing in the code.

Regards,
Brian

cccd-failure2.txt.zip

@h2zero
Copy link
Contributor

h2zero commented Jul 21, 2020

I haven't tested to confirm this but I was looking through the log and I can see the client takes up all 3 cccd storage slots and they are not all deleted with unpairing.

It happens like this:

  1. Connect > bond > characteristic subscribe > cccd stored > disconnect
  2. Connect > service changed subscribe > cccd stored > characteristic subscribe > cccd stored >disconnect
  3. Delete bond on phone > connect > service changed subscribe > cccd store fails (full) >
    ble_gap_unpair_oldest_except fails with result: 6, because the only bond is the current peer > disconnect
  4. Connect > bond again > deletes old bond from esp and only 1 cccd store > characteristic subscribe >store cccd (now full again) > disconnect
  5. Change phone BLE address > connect > service changed subscribe > cccd store (fails) >
    ble_gap_unpair_oldest_except succeeds (should not happen, maybe called with ota address?) > store cccd (full again) >
    subscibe characteristic > ble_gap_unpair_oldest_except fails with result: 5 because the bond was incorrectly removed.

@prasad-alatkar
Copy link
Contributor

Hi @bfriedkin , extremely sorry to make you go through pain of listing all possible steps, couldn't help as I was not able to reproduce the issue despite trying multiple sequence of steps. After following the steps you mentioned, I was able to reproduce the issue with "iPhone + LightBlue" combination. I mostly have good news here, please try attached patch
0003-NimBLE_RPA_fix_identity_change_repairing_failure.txt , to be applied on NimBLE submodule. This patch also includes changes from previous patch, so applying this patch alone would do. This issue occurs when connection address (ble_hs_conn) is not updated with the changed peer address in "LE connection complete event" along with RPA stuff. Because of this changed/wrong "peer address" there is appearance of extra CCCD in NVS. Most probably the issue should be resolved with the patch, but if you still face the issue then please provide me log with this additional patch :
CCCD_failure_RPA_log.txt

@bfriedkin
Copy link
Author

@prasad-alatkar -

Glad to hear that you were ultimately able to reproduce a problem! Thank you for the patch and instructions.

Based on your analysis thus far, do you anticipate the latest fixes address both of the error cases @h2zero and I outlined above?

Unfortunately I might not be able to give this new patch a try until Thursday (U.S. time). Apologies for the delay. I'll let you know then how our testing goes. If things still fail I'll get you a log built against the CCCD_failure_RPA_log.txt patch.

Regards,
Brian

@prasad-alatkar
Copy link
Contributor

Hi @bfriedkin, @h2zero has not pointed out any other/different issue, rather has tried to summarized the issue. Yes, He is right regarding this comment: "should not happen, maybe called with ota address". This is exactly what I try to fix with the latest patch. So I believe the latest fix addresses the issue, however I will wait for confirmation from your end for the same.

dhrishi pushed a commit to espressif/esp-nimble that referenced this issue Jul 22, 2020
geeksville added a commit to meshtastic/esp-nimble that referenced this issue Jul 23, 2020
@geeksville
Copy link

@prasad-alatkar btw - if it helps, I can confirm that 0003 patch worked nicely to fix RPA android clients connecting to my open-source project.

dhrishi pushed a commit to espressif/esp-nimble that referenced this issue Jul 23, 2020
dhrishi pushed a commit to espressif/esp-nimble that referenced this issue Jul 23, 2020
@bfriedkin
Copy link
Author

bfriedkin commented Jul 23, 2020

@prasad-alatkar -

I am having trouble applying the patch on top of the updated NimBLE submodule. I am applying these changes to ESP IDF v3.3.2 cloned today:

Brians-MacBook-Air:esp32 brianfriedkin$ git clone -b v3.3.2 --recursive https://github.com/espressif/esp-idf.git
Brians-MacBook-Air:esp32 brianfriedkin$ cd esp-idf
Brians-MacBook-Air:esp-idf brianfriedkin$ cd components/nimble/nimble/
Brians-MacBook-Air:nimble brianfriedkin$ git checkout nimble-1.1.0-idf-v3.3
Previous HEAD position was ecb1e9e5... NimBLE: Add Host based privacy (RPA) support (v3.3)
Branch nimble-1.1.0-idf-v3.3 set up to track remote branch nimble-1.1.0-idf-v3.3 from origin.
Switched to a new branch 'nimble-1.1.0-idf-v3.3'
Brians-MacBook-Air:nimble brianfriedkin$ git pull
Already up-to-date.
Brians-MacBook-Air:nimble brianfriedkin$ git apply 0003-NimBLE_RPA_fix_identity_change_repairing_failure.txt 
error: patch failed: nimble/host/src/ble_hs_resolv.c:178
error: nimble/host/src/ble_hs_resolv.c: patch does not apply
error: patch failed: nimble/host/src/ble_hs_resolv_priv.h:53
error: nimble/host/src/ble_hs_resolv_priv.h: patch does not apply

What am I missing?

Thanks,
Brian

@prasad-alatkar
Copy link
Contributor

@geeksville Thank you for confirming the fix independently. @bfriedkin, looks like you pulled in latest NimBLE submodule, so fix is already present :) Can you please confirm if NimBLE submodule pointer is this: e7f0c530 ? If that is the case, you may not require to apply the patch.

@bfriedkin
Copy link
Author

@prasad-alatkar -

This is what I see before updating to nimble-1.1.0-idf-v3.3 :

Brians-MacBook-Air:nimble brianfriedkin$ git log
commit ecb1e9e5e3efb830d0b04dc3382be7b1012fdeaa (HEAD)
Author: Prasad Alatkar <prasad.alatkar@espressif.com>
Date:   Thu Mar 5 20:18:40 2020 +0800

    NimBLE: Add Host based privacy (RPA) support (v3.3)

commit c876cbcbc2b1fcae257d8f921083eebf0964eef5
Author: Prasad Alatkar <prasad.alatkar@espressif.com>
Date:   Mon Mar 2 16:29:45 2020 +0530

    nimble/host: Fix check for valid static random address

commit 4229791707c707a03309e33b0a54d1f1ac980e84
Author: Prasad Alatkar <prasad.alatkar@espressif.com>
Date:   Tue Feb 4 12:10:58 2020 +0530

And then after updating to nimble-1.1.0-idf-v3.3 :

Brians-MacBook-Air:nimble brianfriedkin$ git log
commit e7f0c530b6816e156292163f7d07b7375292a1af (HEAD -> nimble-1.1.0-idf-v3.3, origin/nimble-1.1.0-idf-v3.3)
Merge: a9239c6c 4f3e5e92
Author: Hrishikesh Dhayagude <hrishi@espressif.com>
Date:   Thu Jul 23 13:28:05 2020 +0800

    Merge branch 'nimble/repeated_pairing_rpa_fix_v3.3' into 'nimble-1.1.0-idf-v3.3'
    
    NimBLE: Add NRPA, Fix repeated pairing request failure in RPA feature and minor fix in `os_mempool.c` (Backport nimble-1.1.0-idf-v3.3)
    
    See merge request espressif/esp-nimble!51

commit 4f3e5e923103d04c7849fec97f0a075b79f88dfc
Author: Prasad Alatkar <prasad.alatkar@espressif.com>
Date:   Tue Jul 14 12:07:10 2020 +0530

    nimble/porting: Explicitly NULL terminate potential non NULL terminated string buffer
    
    os_mempool.c: Using strncpy, buffer will not be NULL terminated if size of
    source string is same as buffer size. Explicitly added NULL character at the
    end.

commit 549dd34723ef7bf6764dc9d59953310e1f19a2c5
Author: Prasad Alatkar <prasad.alatkar@espressif.com>
Date:   Tue Jul 14 10:55:17 2020 +0530

    NimBLE: Fix repeated pairing request failure in RPA feature
    
    Fixes https://github.com/espressif/esp-idf/issues/5530

Do I need to add anything else? Please let me know. From what I can tell it looks like this is all I need.

Thanks,
Brian

@prasad-alatkar
Copy link
Contributor

Yes @bfriedkin , you don't need to do anything else. The fix is already present.

@bfriedkin
Copy link
Author

@prasad-alatkar -

I did some testing against the latest fix using LightBlue on iOS, Android and macOS. No problems found. :-) Thank you!

We didn't get a chance to test the products today. Hoping to do that tomorrow. Sorry for the delay.

I do have a few questions regarding the fixes. The app updates will be rolled out as a product update built against ESP-IDF v3.3.2 and the fixes. The products were previously built against Bluedroid. The ESP32 side will likely have bonds stored by Bluedroid and similarly the mobile client apps will have the associated bonds. Will the ESP32 device ignore the Bluedroid bond because the NimBLE store is different? Will the bond stored by the mobile client(s) be effectively ignored and replaced by a new bond after re-pairing completes?

Thanks,
Brian

@prasad-alatkar
Copy link
Contributor

Hi @bfriedkin

Will the ESP32 device ignore the Bluedroid bond because the NimBLE store is different?

Yes.

Will the bond stored by the mobile client(s) be effectively ignored and replaced by a new bond after re-pairing completes?

This totally depends on design of mobile client, I have observed few Android phones deleting the old bond if the peer has deleted the bond. However in case of iPhone it may not be the case. So to answer your question, the mobile client may require to delete the previous bond and reinitialize pairing again. However this will be only one time activity.

espressif-bot pushed a commit that referenced this issue Aug 1, 2020
… (Backport v4.0)

- Merges espressif/esp-nimble#12
- Fixes repeated pairing failure in RPA feature

Closes #5530
projectgus pushed a commit that referenced this issue Aug 7, 2020
… (Backport v4.2)

- Merges espressif/esp-nimble#12
- Fixes repeated pairing failure in RPA feature

Closes #5530
espressif-bot pushed a commit that referenced this issue Sep 10, 2020
… (Backport v3.3)

- Merges espressif/esp-nimble#12
- Fixes repeated pairing failure in RPA feature

Closes #5530
espressif-bot pushed a commit that referenced this issue Sep 10, 2020
… (Backport v4.1)

- Merges espressif/esp-nimble#12
- Fixes repeated pairing failure in RPA feature

Closes #5530
dhrishi pushed a commit to espressif/esp-nimble that referenced this issue Oct 29, 2020
dhrishi pushed a commit to espressif/esp-nimble that referenced this issue Jan 18, 2021
dhrishi pushed a commit to espressif/esp-nimble that referenced this issue Apr 7, 2021
@espressif-bot espressif-bot added the Status: Done Issue is done internally label May 31, 2021
espressif-bot pushed a commit to espressif/esp-nimble that referenced this issue Mar 2, 2022
espressif-bot pushed a commit to espressif/esp-nimble that referenced this issue Feb 17, 2023
espressif-bot pushed a commit to espressif/esp-nimble that referenced this issue Mar 26, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Status: Done Issue is done internally
Projects
None yet
Development

No branches or pull requests

6 participants