Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

sntp module causes crash, apparently from the esp-idf framework itself (IDFGH-11754) #12854

Closed
3 tasks done
dannybackx opened this issue Dec 21, 2023 · 12 comments
Closed
3 tasks done
Assignees
Labels
Resolution: NA Issue resolution is unavailable Status: Done Issue is done internally Type: Bug bugs in IDF

Comments

@dannybackx
Copy link

Answers checklist.

  • I have read the documentation ESP-IDF Programming Guide and the issue is not addressed there.
  • I have updated my IDF branch (master or release) to the latest version and checked that the issue is present there.
  • I have searched the issue tracker for a similar issue and not found a similar issue.

IDF version.

5.2 beta 1

Espressif SoC revision.

esp32s3

Operating System used.

Linux

How did you build your project?

Command line with idf.py

If you are using Windows, please specify command line type.

None

Development Kit.

ZX3D95CE01S-TR-4848

Power Supply used.

External 5V

What is the expected behavior?

I admit I've been bitten by my own code calling an SNTP function that deliberately causes a crash, but now I see the framework itself can get there as well.

Either the sntp module shouldn't crash when it's not called in the right order (be nice about that), or the calling order from the framework should be fixed.

See logs included, network hiccup can crash sntp.

What is the actual behavior?

E (15:40:40.691) Network: Failed to connect to SSID RoeselUBackx (reason 200 BEACON_TIMEOUT)
I (15:40:40.691) Network: onWifiDisconnect: retry 1
I (15:40:43.119) esp_netif_handlers: wifi ip: 192.168.0.212, mask: 255.255.255.0, gw: 192.168.0.1
I (15:40:43.120) Network: onStaGotIP: call module App
I (15:40:43.124) App: Network connected, ip 192.168.0.212

assert failed: sntp_setoperatingmode /IDF/components/lwip/lwip/src/apps/sntp/sntp.c:730 (Operating mode must not be set while SNTP client is running)


Backtrace: 0x40379f9e:0x3fccae10 0x403873f9:0x3fccae30 0x4038f91d:0x3fccae50 0x420664f5:0x3fccaf70 0x42079c45:0x3fccaf90 0x420f4d53:0x3fccafc0 0x42078862:0x3fccafe0 0x42065f45:0x3fccb000
0x40379f9e: panic_abort at /home/danny/src/github/esp32/esp-idf-v5.2-beta1/components/esp_system/panic.c:472

0x403873f9: esp_system_abort at /home/danny/src/github/esp32/esp-idf-v5.2-beta1/components/esp_system/port/esp_system_chip.c:93

0x4038f91d: __assert_func at /home/danny/src/github/esp32/esp-idf-v5.2-beta1/components/newlib/assert.c:81

0x420664f5: sntp_setoperatingmode at /home/danny/src/github/esp32/esp-idf-v5.2-beta1/components/lwip/lwip/src/apps/sntp/sntp.c:730 (discriminator 1)

0x42079c45: sntp_init_api at /home/danny/src/github/esp32/esp-idf-v5.2-beta1/components/esp_netif/lwip/esp_netif_sntp.c:51 (discriminator 2)

0x420f4d53: tcpip_exec_api at /home/danny/src/github/esp32/esp-idf-v5.2-beta1/components/esp_netif/lwip/esp_netif_lwip.c:681

0x42078862: esp_netif_api_cb at /home/danny/src/github/esp32/esp-idf-v5.2-beta1/components/esp_netif/lwip/esp_netif_lwip.c:208

0x42065f45: tcpip_thread_handle_msg at /home/danny/src/github/esp32/esp-idf-v5.2-beta1/components/lwip/lwip/src/api/tcpip.c:202
 (inlined by) tcpip_thread at /home/danny/src/github/esp32/esp-idf-v5.2-beta1/components/lwip/lwip/src/api/tcpip.c:148


Steps to reproduce.

  1. Step
  2. Step
  3. Step
    ...

Debug Logs.

No response

More Information.

No response

@dannybackx dannybackx added the Type: Bug bugs in IDF label Dec 21, 2023
@espressif-bot espressif-bot added the Status: Opened Issue is new label Dec 21, 2023
@github-actions github-actions bot changed the title sntp module causes crash, apparently from the esp-idf framework itself sntp module causes crash, apparently from the esp-idf framework itself (IDFGH-11754) Dec 21, 2023
@AxelLin
Copy link
Contributor

AxelLin commented Dec 22, 2023

assert failed: sntp_setoperatingmode /IDF/components/lwip/lwip/src/apps/sntp/sntp.c:730 (Operating mode must not be set while SNTP client is running)

esp_sntp_setoperatingmode() needs to be called before esp_sntp_init().

@dannybackx
Copy link
Author

Yes, and I am respecting that in my code.

But then some network instability occurs and it turns out the framework has a call to sntp_setoperatingmode hidden in some code of its own, and it creates havoc.

@david-cermak
Copy link
Collaborator

@dannybackx Is it possible that you called esp_netif_sntp_init() multiple times (without destroying it)?
If so, I'll add a quick fix checking the state and returning ESP_ERR_INVALID_STATE if already initialized (agree that the API should be more resilient to such issues)

if not, could you please provide more info about your startup code and SNTP initialization? some steps to reproduce?

@dannybackx
Copy link
Author

svn+ssh://dannybackx@svn.code.sf.net/p/esp32-s3-86-box-project-base/code/trunk/alarm/main/Network.cpp and ... App.cpp

@dannybackx
Copy link
Author

I believe it's possible to get into a scenario like you suggest from my code only.

However, I don't believe that's what happened.

If you look at the call stack on my original post, you'll see it's the framework that calls esp_netif_sntp_init again, causing a crash, not my code.

@david-cermak
Copy link
Collaborator

I believe it's possible to get into a scenario

A quick way to find out is to add this

diff --git a/components/esp_netif/lwip/esp_netif_sntp.c b/components/esp_netif/lwip/esp_netif_sntp.c
@@ -90,6 +90,9 @@ void esp_netif_sntp_renew_servers(void *handler_args, esp_event_base_t base, int
 
 esp_err_t esp_netif_sntp_init(const esp_sntp_config_t * config)
 {
+    if (s_storage) {
+        return ESP_OK; // ESP_ERR_INVALID_STATE
+    }
     esp_err_t ret = ESP_OK;

and check if you're able to reproduce it.

If you look at the call stack

This is telling me that esp_netif_api_cb() was called from the TCP/IP thread, likely triggered by:

ESP_GOTO_ON_ERROR(esp_netif_tcpip_exec(sntp_init_api, (void*)config), err, TAG, "Failed initialize SNTP service");

(where we ask the TCP/IP thread to run it in esp_netif_sntp_init())

@dannybackx
Copy link
Author

I think I found a way to remove the problem from my code (added a call to esp_netif_sntp_deinit() just after esp_sntp_stop() in the disconnect handler).

But as you also said above

If so, I'll add a quick fix checking the state and returning ESP_ERR_INVALID_STATE if already initialized (agree that the API should be more resilient to such issues)

making the IDF code more robust is what I'm really asking for.

@espressif-bot espressif-bot added Status: In Progress Work is in progress and removed Status: Opened Issue is new labels Dec 22, 2023
@dannybackx
Copy link
Author

At first sight what I wrote above (more consistency in my code) doesn't fix it. Didn't try the esp_netif_sntp_init patch you suggested yet.

E (14:12:35.492) Network: Failed to connect to SSID RoeselUBackx (reason 200 BEACON_TIMEOUT)
I (14:12:35.492) Network: onWifiDisconnect: retry 1
I (14:12:35.496) Peers: Exception on fd 55
I (14:12:35.500) Peers: TcpServerTask: try to reconnect to wall
I (14:12:35.507) Peers: TcpAddPeer: peer wall
E (14:12:35.512) Peers: connect tcp:wall/23456 failed, errno 118 Host is unreachable
I (14:12:35.520) Peers: close(55)
I (14:12:37.921) esp_netif_handlers: wifi ip: 192.168.0.212, mask: 255.255.255.0, gw: 192.168.0.1
I (14:12:37.922) Network: onStaGotIP: call module App
I (14:12:37.926) App: Network connected, ip 192.168.0.212

assert failed: sntp_setoperatingmode /IDF/components/lwip/lwip/src/apps/sntp/sntp.c:730 (Operating mode must not be set while SNTP client is running)


Backtrace: 0x40379f9e:0x3fccae10 0x403873f9:0x3fccae30 0x4038f91d:0x3fccae50 0x42066375:0x3fccaf70 0x42079ac5:0x3fccaf90 0x420f4bd3:0x3fccafc0 0x420786e2:0x3fccafe0 0x42065dc5:0x3fccb000
0x40379f9e: panic_abort at /home/danny/src/github/esp32/esp-idf-v5.2-beta1/components/esp_system/panic.c:472

0x403873f9: esp_system_abort at /home/danny/src/github/esp32/esp-idf-v5.2-beta1/components/esp_system/port/esp_system_chip.c:93

0x4038f91d: __assert_func at /home/danny/src/github/esp32/esp-idf-v5.2-beta1/components/newlib/assert.c:81

0x42066375: sntp_setoperatingmode at /home/danny/src/github/esp32/esp-idf-v5.2-beta1/components/lwip/lwip/src/apps/sntp/sntp.c:730 (discriminator 1)

0x42079ac5: sntp_init_api at /home/danny/src/github/esp32/esp-idf-v5.2-beta1/components/esp_netif/lwip/esp_netif_sntp.c:51 (discriminator 2)

0x420f4bd3: tcpip_exec_api at /home/danny/src/github/esp32/esp-idf-v5.2-beta1/components/esp_netif/lwip/esp_netif_lwip.c:681

0x420786e2: esp_netif_api_cb at /home/danny/src/github/esp32/esp-idf-v5.2-beta1/components/esp_netif/lwip/esp_netif_lwip.c:208

0x42065dc5: tcpip_thread_handle_msg at /home/danny/src/github/esp32/esp-idf-v5.2-beta1/components/lwip/lwip/src/api/tcpip.c:202
 (inlined by) tcpip_thread at /home/danny/src/github/esp32/esp-idf-v5.2-beta1/components/lwip/lwip/src/api/tcpip.c:148


Also I added a

void Network::onStaLostIP(void *arg, esp_event_base_t event_base, int32_t event_id, void *event_data)
{
  ESP_LOGE(network->tag, "%s", __FUNCTION__);
}
...
  ESP_ERROR_CHECK(esp_event_handler_register(IP_EVENT, IP_EVENT_STA_LOST_IP, &onStaLostIP, NULL));

but that message doesn't appear so this event didn't occur.

@caco3
Copy link

caco3 commented Dec 28, 2023

I also have an application which crashes within esp_sntp_setoperatingmode().
I was using ESP IDF v5.0.2 without issues.
But now with ESP IDF v5.1.2 it always crashes.

Traceback:

0x4012a7e4: sys_mbox_post at .platformio/packages/framework-espidf/components/lwip/port/freertos/sys_arch.c:239
0x4011c8d1: tcpip_callback at .platformio/packages/framework-espidf/components/lwip/lwip/src/api/tcpip.c:324
0x4011a784: esp_sntp_setoperatingmode at .platformio/packages/framework-espidf/components/lwip/apps/sntp/sntp.c:168
0x400df9b5: setupTime() at .platformio/packages/framework-espidf/components/lwip/include/apps/esp_sntp.h:239
 (inlined by) setupTime() at ./code/components/jomjol_time_sntp/time_sntp.cpp:239
0x400d53f0: app_main at ./code/main/main.cpp:339
0x400911f5: main_task at .platformio/packages/framework-espidf/components/freertos/app_startup.c:208

Code:

bool setupTime() {
        esp_sntp_setoperatingmode(SNTP_OPMODE_POLL); 
        esp_sntp_setservername(0, timeServer.c_str()); 
        esp_sntp_init(); 
}

@david-cermak
Copy link
Collaborator

@dannybackx I'm attaching the fix for your reference, but it's just about preventing from calling it more times.
What I'd suggest in your codebase would be to init just once, but disable the default startup:

    esp_sntp_config_t config = ESP_NETIF_SNTP_DEFAULT_CONFIG(..);
    config.start = false;

And then call esp_netif_sntp_start() after connecting and esp_netif_sntp_stop() on disconnection.

@caco3 The traceback suggests that you're trying to initialize SNTP before LWIP was up and running. (sntp_setoperatingmode() should be called from the TCPIP context, It was possible on v5.0, but from v5.1 we're trying to run it safely from LWIP thread -- maybe a check with an error message would help here)

@caco3
Copy link

caco3 commented Jan 8, 2024

@david-cermak Thanks, we got it working now.

@alutov
Copy link

alutov commented Jan 9, 2024

I call "esp_netif_init();" before first sntp call.

@espressif-bot espressif-bot added Status: Reviewing Issue is being reviewed and removed Status: In Progress Work is in progress labels Jan 15, 2024
@espressif-bot espressif-bot added Status: Done Issue is done internally Resolution: NA Issue resolution is unavailable and removed Status: Reviewing Issue is being reviewed labels Feb 7, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Resolution: NA Issue resolution is unavailable Status: Done Issue is done internally Type: Bug bugs in IDF
Projects
None yet
Development

No branches or pull requests

6 participants