ESP32 bug: user is forced to repair when they should not be #266

Closed
geeksville opened this issue Jul 8, 2020 · 10 comments · Fixed by #296
Labels
bug Something isn't working

Comments

@geeksville
Member

It seems the device sometimes forgets its Bluetooth pairing state. The bug is definitely on the device side and independent of the GATT services used. I suspect the ESP32 Bluetooth stack's persistent NVS data is somehow being discarded or corrupted. Next step: print the entire NVS contents just before we try to start Bluetooth and see whether it changes.

@geeksville
Member Author

This issue has been mentioned on Meshtastic. There might be relevant details there:

https://meshtastic.discourse.group/t/bluetooth-connection/152/7

@geeksville geeksville added the bug Something isn't working label Jul 8, 2020
@geeksville geeksville self-assigned this Jul 8, 2020
@geeksville geeksville added this to To do in Meshtastic work via automation Jul 8, 2020
@geeksville geeksville moved this from To do to In progress in Meshtastic work Jul 8, 2020
@geeksville geeksville changed the title Sometimes user is forced to repair when they should not be ESP32 bug: user is forced to repair when they should not be Jul 11, 2020
@geeksville
Member Author

This issue has been mentioned on Meshtastic. There might be relevant details there:

https://meshtastic.discourse.group/t/alpha-tester-thread-please-try-device-code-0-7-11/298/185

@geeksville
Member Author

possibly related: espressif/esp-idf#3968

@geeksville
Member Author

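yeah - this portion of esp32-hal-misc.c seems bad (if nvs_flash_init() reports no free pages, it erases the whole NVS partition, which would presumably take any stored pairing data with it):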

    esp_err_t err = nvs_flash_init();
    if(err == ESP_ERR_NVS_NO_FREE_PAGES){
        const esp_partition_t* partition = esp_partition_find_first(ESP_PARTITION_TYPE_DATA, ESP_PARTITION_SUBTYPE_DATA_NVS, NULL);
        if (partition != NULL) {
            err = esp_partition_erase_range(partition, 0, partition->size);
            if(!err){
                err = nvs_flash_init();
            } else {
                log_e("Failed to format the broken NVS partition!");
            }
        }
    }

@geeksville
Member Author

geeksville commented Jul 17, 2020

misc notes to self:
node 0xb4 has 202 entries allocated.
node 0x38 has 188 entries allocated.
ble security overview: https://www.kynetics.com/docs/2018/BLE_Pairing_and_bonding/

ok - NVS is not filling up, but this looks plausible:
espressif/esp-idf#5530 (comment) (and related thread) https://www.esp32.com/viewtopic.php?t=13049

alas - not related (it was in nimble). though it did make me realize that the idf 3.3 in the arduino build is pretty old. Looking at commits.

Meshtastic work automation moved this from In progress to Done Jul 18, 2020
@geeksville geeksville reopened this Jul 19, 2020
Meshtastic work automation moved this from Done to In progress Jul 19, 2020
@geeksville
Member Author

alas - just had a failure. will now try logging all NVS writes.

@geeksville
Member Author

geeksville commented Jul 19, 2020

misc notes to self:

after a failure node 0x38 has 178 used entries. Therefore it must have done a GC on the NVS.
esptool.py read_flash 0x9000 0x5000 node38-june19.bin
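
A rough host-side sketch for sanity-checking a dump like the one above, so entry counts can be compared between a good and a failed node. It is not taken from this project. The layout constants (4096-byte pages, a 32-byte page header followed by a 32-byte bitmap holding 2 state bits for each of 126 entries) are my reading of the ESP-IDF NVS documentation and should be treated as assumptions to verify against the IDF version in use. The helper name `nvs_dump_count_entries` is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed NVS on-flash layout; verify against your ESP-IDF version. */
#define NVS_PAGE_SIZE        4096u
#define NVS_HEADER_SIZE      32u   /* page header precedes the entry-state bitmap */
#define NVS_ENTRIES_PER_PAGE 126u  /* 2 state bits per entry in a 32-byte bitmap */

#define NVS_ENTRY_EMPTY   0x3u     /* 0b11: never written */
#define NVS_ENTRY_WRITTEN 0x2u     /* 0b10: holds live data */
#define NVS_ENTRY_ERASED  0x0u     /* 0b00: deleted, space reclaimed by a later GC */

typedef struct {
    size_t written;
    size_t erased;
    size_t empty;
} nvs_dump_counts_t;

/* Hypothetical helper: tally entry states over every whole page in buf.
 * The 0b01 state is not counted in any bucket. */
nvs_dump_counts_t nvs_dump_count_entries(const uint8_t *buf, size_t len)
{
    nvs_dump_counts_t counts = {0, 0, 0};
    assert(buf != NULL || len == 0);

    for (size_t page = 0; page + NVS_PAGE_SIZE <= len; page += NVS_PAGE_SIZE) {
        const uint8_t *bitmap = buf + page + NVS_HEADER_SIZE;
        for (size_t i = 0; i < NVS_ENTRIES_PER_PAGE; i++) {
            /* Four entries per byte, lowest-numbered entry in the low bits. */
            unsigned state = (bitmap[i / 4] >> ((i % 4) * 2)) & 0x3u;
            if (state == NVS_ENTRY_WRITTEN) {
                counts.written++;
            } else if (state == NVS_ENTRY_ERASED) {
                counts.erased++;
            } else if (state == NVS_ENTRY_EMPTY) {
                counts.empty++;
            }
        }
    }
    return counts;
}
```

If the page-size assumption holds, the 0x5000-byte dump is five pages. Load the file into a buffer and pass it to `nvs_dump_count_entries`; a drop in `written` together with a rise in `erased` or `empty` between two dumps would be consistent with a GC pass. The totals will not necessarily match what the firmware reports as used entries.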

good description of how RPA addressing works https://www.electronicdesign.com/technologies/communications/article/21801870/ble-v42-creating-faster-more-secure-powerefficient-designspart-2 (which seems related to the problem)
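
For reference when reading connection logs, a small sketch of how a BLE random address can be classified from its two most significant bits (0b01 marks a resolvable private address). The enum and function names are hypothetical. Which array index holds the most significant byte depends on the stack's byte order, so the helper takes that byte directly. It applies only to addresses of the random type, not public ones.

```c
#include <assert.h>
#include <stdint.h>

/* Sub-types of a BLE random device address, keyed by its top two bits. */
typedef enum {
    BLE_RANDOM_NON_RESOLVABLE = 0, /* 0b00: non-resolvable private address */
    BLE_RANDOM_RESOLVABLE     = 1, /* 0b01: resolvable private address (RPA) */
    BLE_RANDOM_RESERVED       = 2, /* 0b10: reserved */
    BLE_RANDOM_STATIC         = 3  /* 0b11: static random address */
} ble_random_addr_kind_t;

/* Hypothetical helper: msb is the most significant byte of the 48-bit address. */
ble_random_addr_kind_t ble_classify_random_addr(uint8_t msb)
{
    ble_random_addr_kind_t kind = (ble_random_addr_kind_t)(msb >> 6);
    assert(kind <= BLE_RANDOM_STATIC);
    return kind;
}
```

A peer that shows up with a different address on each reconnect, all of them classifying as `BLE_RANDOM_RESOLVABLE`, is using RPAs, and the device can only recognize it as bonded if it resolves the address with the stored identity key.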

Someone encountered the same problem but never followed up: espressif/esp-idf#3968

ooh super interesting! espressif/esp-idf#1811 (comment) MITM_BOND might be the fix! Alas no.

After a failure, when the phone tries to connect it just looks like the normal auth flow?

Setting RTC 1595275827 secs
Read RTC time as 1595275827 (cur millis 12233) valid=1
GPS fix type 0
[D][BLEDevice.cpp:102] gattServerEventHandler(): gattServerEventHandler [esp_gatt_if: 3] ... Unknown
[D][BLEServer.cpp:368] onConnect(): BLEServerCallbacks
[D][BLEServer.cpp:369] onConnect(): BLEServerCallbacks
[D][BLEServer.cpp:370] onConnect(): BLEServerCallbacks
[D][BLEDevice.cpp:571] getAdvertising(): get advertising
[D][BLEAdvertising.cpp:193] start(): - advertising service: 6ba1b218-15a8-461f-9fa8-5dcae273eafd
[D][BLEDevice.cpp:571] getAdvertising(): get advertising
[D][BLEAdvertising.cpp:495] handleGAPEvent(): handleGAPEvent [event no: 0]
[D][BLEDevice.cpp:571] getAdvertising(): get advertising
[D][BLEAdvertising.cpp:495] handleGAPEvent(): handleGAPEvent [event no: 1]
[D][BLEDevice.cpp:571] getAdvertising(): get advertising
[D][BLEAdvertising.cpp:495] handleGAPEvent(): handleGAPEvent [event no: 6]
[I][BLEDevice.cpp:237] gapEventHandler(): ESP_GAP_BLE_PASSKEY_NOTIF_EVT
[I][BLEDevice.cpp:239] gapEventHandler(): passKey = 369823
onPassKeyNotify 369823
Trigger powerFSM 7
Transition powerFSM transition=Bluetooth pairing, from=ON to=ON
[D][BLEDevice.cpp:571] getAdvertising(): get advertising
[D][BLEAdvertising.cpp:495] handleGAPEvent(): handleGAPEvent [event no: 11]
showing bluetooth screen
GPS fix type 0
GPS fix type 0
GPS fix type 0
[I][BLEDevice.cpp:253] gapEventHandler(): ESP_GAP_BLE_AUTH_CMPL_EVT
phone authenticate failed 99
[D][BLEDevice.cpp:571] getAdvertising(): get advertising
[D][BLEAdvertising.cpp:495] handleGAPEvent(): handleGAPEvent [event no: 8]
showing standard frames
[D][BLEDevice.cpp:102] gattServerEventHandler(): gattServerEventHandler [esp_gatt_if: 3] ... Unknown
processUBX: counter hit MAX_PAYLOAD_SIZE
GPS fix type 0
GPS fix type 0

geeksville added a commit to geeksville/Meshtastic-esp32 that referenced this issue Jul 19, 2020
geeksville added a commit to geeksville/Meshtastic-esp32 that referenced this issue Jul 19, 2020
…#266

(and it is good / more secure anyways - the old code was just
based on the example docs)
@geeksville
Member Author

This issue has been mentioned on Meshtastic. There might be relevant details there:

https://meshtastic.discourse.group/t/im-away-from-forum-for-a-couple-of-days-fixing-esp32-re-pairing/776/3

@geeksville
Member Author

geeksville commented Jul 20, 2020

ooh - someone with the same problem: https://www.bountysource.com/issues/61707647-whitelisting-still-does-not-allow-connections-idfgh-322. The bluedroid support for RPA is broken on ESP32 and they don't seem to be fixing it. Instead people are moving to Nimble.

Which is a bummer, because I wouldn't want to make such a big change to the project at this point. But it seems unavoidable (because I'm not going to be fixing bluedroid ;-). So I'll need to switch to Nimble (like the commercial devs seem to be doing). It will provide a few benefits:

  • This bug can be fixed. (Because both iOS and Android now require RPA)
  • SUBSTANTIAL savings in IRAM usage (very useful for turning wifi back on at the same time)
  • Can use the same bluetooth stack on NRF52 AND ESP32 (so testing coverage is improved and maintenance costs go down)
  • Will save LOTS of flash space on NRF52 because no need for softdevice with this stack.
  • Support BLE-Mesh if we want to use it someday

@geeksville
Member Author

This issue has been mentioned on Meshtastic. There might be relevant details there:

https://meshtastic.discourse.group/t/an-update-on-the-esp-32-loss-of-pairing-bug/776/4

geeksville added a commit to meshtastic/arduino-esp32 that referenced this issue Jul 21, 2020
…EDROID_ENABLED

Which is really what should have been tested.  This allows use of
the Arduino layer with the newer Nimble stack for those that don't want
to use Bluedroid.

In support of meshtastic/firmware#266
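
A minimal illustration of the guard change this commit message describes, not the actual patch. It assumes the macros involved are `CONFIG_BT_ENABLED` and `CONFIG_BLUEDROID_ENABLED` as they would appear in sdkconfig.h, and it simulates a build where Bluetooth is on but Bluedroid is not selected (for example because NimBLE is used). The two function names are hypothetical.

```c
#include <stdbool.h>

/* Stand-ins for sdkconfig.h: Bluetooth is enabled, Bluedroid is not selected.
 * CONFIG_BLUEDROID_ENABLED is intentionally left undefined. */
#define CONFIG_BT_ENABLED 1

/* Old-style guard: Bluedroid-specific code is compiled whenever any
 * Bluetooth support is enabled, even if Bluedroid itself is absent. */
bool bluedroid_code_built_old_guard(void)
{
#if defined(CONFIG_BT_ENABLED)
    return true;
#else
    return false;
#endif
}

/* New-style guard: Bluedroid-specific code is compiled only when the
 * Bluedroid host stack is actually part of the build. */
bool bluedroid_code_built_new_guard(void)
{
#if defined(CONFIG_BLUEDROID_ENABLED)
    return true;
#else
    return false;
#endif
}
```

With this configuration the old guard would still pull in the Bluedroid-dependent wrappers (and fail to build against a NimBLE-only IDF), while the new guard leaves them out.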
geeksville added a commit to meshtastic/esp-nimble that referenced this issue Jul 23, 2020
geeksville added a commit to meshtastic/esp-idf that referenced this issue Jul 23, 2020
geeksville added a commit to geeksville/Meshtastic-esp32 that referenced this issue Jul 23, 2020
geeksville added a commit to meshtastic/arduino-esp32 that referenced this issue Jul 23, 2020
…EDROID_ENABLED

Which is really what should have been tested.  This allows use of
the Arduino layer with the newer Nimble stack for those that don't want
to use Bluedroid.

In support of meshtastic/firmware#266
geeksville added a commit to meshtastic/esp32-arduino-lib-builder that referenced this issue Jul 24, 2020
Meshtastic patched version esp-idf commit #e7f316d5a4eb64ca52d40575cb20815d456a9c4f
used.

In support of: meshtastic/firmware#266
geeksville added a commit to meshtastic/arduino-esp32 that referenced this issue Jul 24, 2020
Meshtastic patched version esp-idf commit #e7f316d5a4eb64ca52d40575cb20815d456a9c4f
used.

In support of: meshtastic/firmware#266
geeksville added a commit to geeksville/Meshtastic-esp32 that referenced this issue Jul 24, 2020
    Meshtastic patched version esp-idf commit #e7f316d5a4eb64ca52d40575cb20815d456a9c4f
    used.

    In support of: meshtastic#266
Meshtastic work automation moved this from In progress to Done Jul 24, 2020
me-no-dev pushed a commit to espressif/arduino-esp32 that referenced this issue Nov 6, 2020
…EDROID_ENABLED

Which is really what should have been tested.  This allows use of
the Arduino layer with the newer Nimble stack for those that don't want
to use Bluedroid.

In support of meshtastic/firmware#266
me-no-dev added a commit to espressif/arduino-esp32 that referenced this issue Nov 6, 2020
…BLED (#4497)

* Change check for CONFIG_BT_ENABLE to really be a check for CONFIG_BLUEDROID_ENABLED

Which is really what should have been tested.  This allows use of
the Arduino layer with the newer Nimble stack for those that don't want
to use Bluedroid.

In support of meshtastic/firmware#266

* Change check for CONFIG_BT_ENABLE to really be a check for CONFIG_BLUEDROID_ENABLED

Which is really what should have been tested.  This allows use of
the Arduino layer with the newer Nimble stack for those that don't want
to use Bluedroid.

In support of meshtastic/firmware#266

* wifi prov changes

* merge fixes

Co-authored-by: geeksville <kevinh@geeksville.com>
me-no-dev added a commit to espressif/arduino-esp32 that referenced this issue Mar 31, 2021
…BLED (#4497)

* Change check for CONFIG_BT_ENABLE to really be a check for CONFIG_BLUEDROID_ENABLED

Which is really what should have been tested.  This allows use of
the Arduino layer with the newer Nimble stack for those that don't want
to use Bluedroid.

In support of meshtastic/firmware#266

* Change check for CONFIG_BT_ENABLE to really be a check for CONFIG_BLUEDROID_ENABLED

Which is really what should have been tested.  This allows use of
the Arduino layer with the newer Nimble stack for those that don't want
to use Bluedroid.

In support of meshtastic/firmware#266

* wifi prov changes

* merge fixes

Co-authored-by: geeksville <kevinh@geeksville.com>