[rtl872x] fix multiple BLE issues #2710
hal/src/rtl872x/ble_hal.cpp
Outdated
struct BleGapCache {
    bool isAdv;
    bool isScan;
    bool isconn;
Nit: isConn
hal/src/rtl872x/ble_hal.cpp
Outdated
@@ -1585,13 +1646,14 @@ int BleGap::startScanning(hal_ble_on_scan_result_cb_t callback, void* context) {
     SCOPE_GUARD ({
         if (isScanning_) {
-            const int LE_SCAN_STOP_RETRIES = 10;
+            const int LE_SCAN_STOP_RETRIES = 1;
Remove the loop entirely if it is no longer required? I think when we ran into this previously a few retries did help, but it was always very difficult to reproduce reliably without logging enabled.
I can't reproduce this. But if you did observe that the retries help, let's keep it as-is, at the cost of a 100 ms delay.
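For readers following the retry discussion, here is a minimal sketch of a bounded retry loop. It is not the code from this PR: `retryBounded`, `attempt`, and `pause` are hypothetical names, and the length of the pause between attempts is left to the caller because the thread does not state it.

```cpp
#include <functional>

// Hypothetical helper, not part of the HAL: calls `attempt` up to
// `maxRetries` times and calls `pause` between failed attempts.
// Returns the 1-based number of the attempt that succeeded, or 0 if
// every attempt failed.
int retryBounded(int maxRetries,
                 const std::function<bool()>& attempt,
                 const std::function<void()>& pause) {
    for (int i = 1; i <= maxRetries; ++i) {
        if (attempt()) {
            return i;
        }
        // No pause after the final attempt; there is nothing left to wait for.
        if (i < maxRetries) {
            pause();
        }
    }
    return 0;
}
```

With `maxRetries` set to 1 the loop degenerates into a single attempt with no pause, which is why removing the loop and lowering the retry count to 1 behave the same in this sketch.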
@@ -1616,8 +1681,13 @@ int BleGap::startScanning(hal_ble_on_scan_result_cb_t callback, void* context) {
     CHECK_RTL(le_scan_start());
     isScanning_ = true;
     // GAP_SCAN_STATE_SCANNING may be propagated immediately following the GAP_SCAN_STATE_START
-    if (waitState(BleGapDevState().scan(GAP_SCAN_STATE_SCANNING))) {
-        return SYSTEM_ERROR_TIMEOUT;
+    if (waitState(BleGapDevState().scan(GAP_SCAN_STATE_START), 10, true) && waitState(BleGapDevState().scan(GAP_SCAN_STATE_SCANNING), 10, true)) {
10ms sounds like a very tight condition?
That's by design. I just use waitState() to do a quick check before the long wait.
Is it critical to call BleGapDevState().scan(GAP_SCAN_STATE_START) before BleGapDevState().scan(GAP_SCAN_STATE_SCANNING)?
That seems to be the only real change here?
Like you mention, we still call waitState(BleGapDevState().scan(GAP_SCAN_STATE_SCANNING)) for the full wait duration after this.
Is it critical to call BleGapDevState().scan(GAP_SCAN_STATE_START) before BleGapDevState().scan(GAP_SCAN_STATE_SCANNING)?
Nope. Just treat it as a quick check (force poll is set to true) with effectively no wait (the timeout is only 10 ms). GAP_SCAN_STATE_START and GAP_SCAN_STATE_SCANNING are sometimes notified in the wrong order.
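The "quick check before a long wait" idea can be sketched as follows. This is an illustration under stated assumptions, not the HAL's waitState(): `pollUntil` and `waitQuickThenLong` are hypothetical names, and they return true on success, whereas in the diff a non-zero waitState() result appears to mean a timeout.

```cpp
#include <chrono>
#include <functional>
#include <thread>

// Hypothetical stand-in for a state wait: polls `reached` until it returns
// true or `timeout` elapses. Returns true on success, false on timeout.
bool pollUntil(const std::function<bool()>& reached,
               std::chrono::milliseconds timeout,
               std::chrono::milliseconds interval = std::chrono::milliseconds(1)) {
    const auto deadline = std::chrono::steady_clock::now() + timeout;
    do {
        if (reached()) {
            return true;
        }
        std::this_thread::sleep_for(interval);
    } while (std::chrono::steady_clock::now() < deadline);
    // One last look in case the state changed during the final sleep.
    return reached();
}

// Quick check first; fall back to the full wait only if the quick check misses.
bool waitQuickThenLong(const std::function<bool()>& reached,
                       std::chrono::milliseconds quick,
                       std::chrono::milliseconds full) {
    if (pollUntil(reached, quick)) {
        return true;
    }
    return pollUntil(reached, full);
}
```

Because the predicate is polled rather than consumed from an ordered event stream, it does not matter whether the two state notifications arrive in the expected order; the check only asks whether the target state has been observed by the time it looks.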
I have some questions about general behavior.
I do see these errors in the app somewhat frequently:
0016113700 [app] INFO: start scan
0016113721 [app] INFO: got advert
0016113747 [app] INFO: got advert
0016113825 [app] INFO: got advert
0016113849 [app] INFO: got advert
0016114006 [app] INFO: got advert
0016114134 [app] INFO: got advert
0016114213 [hal.ble] TRACE: Failed to stop scanning, resetting stack
0016114216 [hal.ble] TRACE: Going to stop the stack...
0016114227 [app] INFO: got advert
0016114243 [app] INFO: got advert
0016114248 [app] INFO: got advert
0016114587 [net.lwip_rltk] INFO: promisc_deinit TODO
0016114589 [hal] INFO: WiFi off
0016114607 [hal] INFO: rltk_wlan_set_netif_info: 0, 94:94:4a:04:29:f8
0016114944 [hal] INFO: WiFi on
0016115277 [hal.ble] TRACE: Restore advertising state
0016115285 [app] INFO: scan took too long: 1584ms
0016115287 [app] INFO: end scan
The app continues to scan / advertise so I assume this is expected?
I also see this pattern frequently:
0016229913 [app] INFO: start scan
0016230041 [app] INFO: got advert
0016230141 [app] INFO: got advert
0016230352 [app] INFO: got advert
0016230353 [app] INFO: got advert
0016230437 [app] INFO: scan took too long: 523ms
0016230439 [app] INFO: end scan
Is the 50ms BLE scan timeout per beacon?
I see the time difference between some of the individual scan results is >50ms, is this expected?
@@ -1005,19 +1030,28 @@ void BleGap::bleCommandThread(void *context) {
     BleGap* gap = (BleGap*)context;
     while (true) {
         uint8_t command;
-        if (!os_queue_take(gap->cmdQueue_, &command, CONCURRENT_WAIT_FOREVER, nullptr)) {
+        if (!os_queue_peek(gap->cmdQueue_, &command, CONCURRENT_WAIT_FOREVER, nullptr)) {
What is the effective difference between using peek on the queue rather than take?
In int BleGap::stop(bool restore) you'll see this code snippet:
if (isQueueAvailable()) {
LOCAL_DEBUG(" >>>>>>>>>>>> s_bleMutex.unlock() <<<<<<<<<<<<<<");
s_bleMutex.unlock();
}
This fixes a deadlock in the BLE driver. Because peek leaves the command in the queue, isQueueAvailable() tells us whether there is a command that might be trying to acquire the BLE lock and still needs to be processed, so we can release the lock accordingly.
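To illustrate the difference, here is a minimal single-threaded model of a command queue. It is only a sketch: `CommandQueue` and `hasPending()` are hypothetical stand-ins (the implementation of isQueueAvailable() is not shown in this thread), and the real os_queue_* calls are blocking and thread-safe, which this model ignores.

```cpp
#include <cstdint>
#include <deque>

// Hypothetical, single-threaded model of a command queue.
class CommandQueue {
public:
    void put(uint8_t cmd) { items_.push_back(cmd); }

    // Copies the front command into `out` and removes it from the queue.
    bool take(uint8_t& out) {
        if (items_.empty()) {
            return false;
        }
        out = items_.front();
        items_.pop_front();
        return true;
    }

    // Copies the front command into `out` but leaves it queued.
    bool peek(uint8_t& out) const {
        if (items_.empty()) {
            return false;
        }
        out = items_.front();
        return true;
    }

    // True while at least one command is still queued.
    bool hasPending() const { return !items_.empty(); }

private:
    std::deque<uint8_t> items_;
};
```

In this model, a worker that reads with peek() keeps the queue non-empty while it handles the command, so another caller can see that work is still in flight; a worker that reads with take() empties the queue immediately and hides that fact.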
Yes.
The scan timeout can be changed in user app through |
Problem
Solution
Steps to Test
Build and run the attached user application.
Example App
References
N/A
Completeness