Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[TW#16675] During WiFi activity, I2C starts to time out and then the timer crashes #1312

Closed
chanibal opened this issue Nov 24, 2017 · 9 comments

Comments

@chanibal
Copy link
Contributor

Sometimes when I2C write happens as the same time as a WiFi TCP or UDP send, the I2C starts to time out (error 263) and after some struggle (about 20 repeats of error 263), there is a crash in the timer thread.

What is happening when the crash occurs:

  • There is an update of a I2C 1306SSD screen, the update is very slow, usually you almost can't see the refresh rate but this time it takes seconds, with a timeout error every few lines.
  • The screen is being updated because the AP just lost one of it's connections, the reason for the STA disconnecting is unknown and it is not intended. Checking for disconnections is a timer that runs every second and tests if a packet has been received in the last 10 seconds from that device. Screen update is executed from timer context.

How does the timer update the screen?
There are two timers - an idle timer that is reset everytime an notification is received and a timeout timer.
The idle timer resumes the display task with vTaskResume. Then the display task takesa mutex, renders the screen and gives the mutex back. Then the display task suspends itself with vTaskSuspend.
The notification timer executes every second and checks for devices that stopped responding. If found, it resets the idle timer, takes the mutex, renders the screen (here are the most of the timeouts) and gives the mutex back. This is usually a fast operation, but with timeouts it takes long seconds - and I'm still doing this from the freertos timer, so perhaps this is an issue?

Some things in the background:

  • there is some light UDP send/receive (about 1 packet per second)
  • there are 0 to 2 open TCP sockets, at most one incoming and one outgoing
  • WiFi is in APSTA mode, but only AP is used at time of crash, up to 3 STAs
  • UART1 is sending constantly (25 bytes 100 times a second, invoked from a freertos timer)

I2C constantly times out all the time, but a retry works. Only when it happens next to WiFi, this crashes completely.

The I2C transfer is completed before the crash (full screen updated).

.../esp-idf/components/freertos/timers.c:670 (prvProcessReceivedCommands)- assert failed!
abort() was called at PC 0x4008a6d7 on core 0
0x4008a6d7: prvProcessReceivedCommands at .../esp-idf/components/freertos/timers.c:839


Backtrace: 0x4008c078:0x3ffd0f70 0x4008c16b:0x3ffd0f90 0x4008a6d7:0x3ffd0fb0 0x4008a737:0x3ffd0ff0
0x4008c078: invoke_abort at .../esp-idf/components/esp32/panic.c:553

0x4008c16b: abort at .../esp-idf/components/esp32/panic.c:553

0x4008a6d7: prvProcessReceivedCommands at .../esp-idf/components/freertos/timers.c:839

0x4008a737: prvTimerTask at .../esp-idf/components/freertos/timers.c:839
===============================================================
==================== ESP32 CORE DUMP START ====================

================== CURRENT THREAD REGISTERS ===================
pc             0x4008c078       0x4008c078 
lbeg           0x40098d20       1074367776
lend           0x40098d4e       1074367822
lcount         0x0      0
sar            0x5      5
ps             0x60f20  397088
threadptr      
br             
scompare1      
acclo          
acchi          
m0             
m1             
m2             
m3             
expstate       
f64r_lo        
f64r_hi        
f64s           
fcr            
fsr            
a0             0x4008c16e       1074315630
a1             0x3ffd0f70       1073549168
a2             0x2f     47
a3             0x2f     47
a4             0xc      12
a5             0xffffffff       -1
a6             0x0      0
a7             0xfffffffe       -2
a8             0x0      0
a9             0x3ffd0f50       1073549136
a10            0x0      0
a11            0x3ffd0f01       1073549057
a12            0x3ffd0eaf       1073548975
a13            0x30     48
a14            0x0      0
a15            0x3ffd0eb9       1073548985

==================== CURRENT THREAD STACK =====================
#0  0x4008c078 in invoke_abort () at .../esp-idf/components/esp32/panic.c:139
#1  0x4008c16e in abort () at .../esp-idf/components/esp32/panic.c:148
#2  0x4008a6da in prvProcessReceivedCommands () at .../esp-idf/components/freertos/timers.c:670
#3  0x4008a73a in prvTimerTask (pvParameters=0x0) at .../esp-idf/components/freertos/timers.c:445

======================== THREADS INFO =========================
  Id   Target Id         Frame
  15   process 14        0x40089e01 in xQueueGenericReceive (xQueue=0x3ffd3660, pvBuffer=0x3ffc7914 , xTicksToWait=4294967295, xJustPeeking=0) at .../esp-idf/components/freertos/queue.c:1588
  14   process 13        0x40089e01 in xQueueGenericReceive (xQueue=0x3ffeb6dc, pvBuffer=0x3ffec610, xTicksToWait=4294967295, xJustPeeking=0) at .../esp-idf/components/freertos/queue.c:1588
  13   process 12        0x40089e01 in xQueueGenericReceive (xQueue=0x3ffea4d8, pvBuffer=0x3ffeaf50, xTicksToWait=4294967295, xJustPeeking=0) at .../esp-idf/components/freertos/queue.c:1588
  12   process 11        0x400842c2 in esp_crosscore_int_send_yield (core_id=0) at .../esp-idf/components/esp32/crosscore_int.c:112
  11   process 10        0x40089e01 in xQueueGenericReceive (xQueue=0x3ffe49b4, pvBuffer=0x3ffe5fb0, xTicksToWait=4294967295, xJustPeeking=0) at .../esp-idf/components/freertos/queue.c:1588
  10   process 9         0x40089e01 in xQueueGenericReceive (xQueue=0x3ffedd9c, pvBuffer=0x3fff1cd0, xTicksToWait=4294967295, xJustPeeking=0) at .../esp-idf/components/freertos/queue.c:1588
  9    process 8         0x40089e01 in xQueueGenericReceive (xQueue=0x3ffd5270, pvBuffer=0x3ffd6810, xTicksToWait=4294967295, xJustPeeking=0) at .../esp-idf/components/freertos/queue.c:1588
  8    process 7         0x40089e01 in xQueueGenericReceive (xQueue=0x3ffeb604, pvBuffer=0x3ffcf370, xTicksToWait=4294967295, xJustPeeking=0) at .../esp-idf/components/freertos/queue.c:1588
  7    process 6         0x40089e01 in xQueueGenericReceive (xQueue=0x3ffd37b4, pvBuffer=0x0, xTicksToWait=4294967295, xJustPeeking=0) at .../esp-idf/components/freertos/queue.c:1588
  6    process 5         0x40089e01 in xQueueGenericReceive (xQueue=0x3ffd3810, pvBuffer=0x0, xTicksToWait=4294967295, xJustPeeking=0) at .../esp-idf/components/freertos/queue.c:1588
  5    process 4         0x40089e01 in xQueueGenericReceive (xQueue=0x3ffd5080, pvBuffer=0x3ffd2d80, xTicksToWait=10, xJustPeeking=0) at .../esp-idf/components/freertos/queue.c:1588
  4    process 3         xEventGroupWaitBits (xEventGroup=0x3ffd46d0, uxBitsToWaitFor=3, xClearOnExit=0, xWaitForAllBits=0, xTicksToWait=100) at .../esp-idf/components/freertos/event_groups.c:384
  3    process 2         0x400d26f6 in esp_vApplicationIdleHook () at .../esp-idf/components/esp32/freertos_hooks.c:62
  2    process 1         0x400d26f6 in esp_vApplicationIdleHook () at .../esp-idf/components/esp32/freertos_hooks.c:62
* 1           0x4008c078 in invoke_abort () at .../esp-idf/components/esp32/panic.c:139

======================= ALL MEMORY REGIONS ========================
Name   Address   Size   Attrs
.rtc.text 0x400c0000 0x0 RW
.iram0.vectors 0x40080000 0x400 R XA
.iram0.text 0x40080400 0x193f0 R XA
.dram0.data 0x3ffc0000 0x3e4c RW A
.flash.rodata 0x3f400020 0x2e6f0 RW A
.flash.text 0x400d0018 0x9561c R XA
.coredump.tasks 0x3ffd1094 0x16c RW
.coredump.tasks 0x3ffd0eb0 0x1dc RW
.coredump.tasks 0x3ffcff20 0x16c RW
.coredump.tasks 0x3ffcfdb0 0x168 RW
.coredump.tasks 0x3ffcf9ac 0x16c RW
.coredump.tasks 0x3ffcf840 0x164 RW
.coredump.tasks 0x3ffccfd0 0x16c RW
.coredump.tasks 0x3ffd1cc0 0x550 RW
.coredump.tasks 0x3ffcd15c 0x16c RW
.coredump.tasks 0x3ffd2c30 0x1e4 RW
.coredump.tasks 0x3ffcce50 0x16c RW
.coredump.tasks 0x3ffcccc0 0x188 RW
.coredump.tasks 0x3ffcc8dc 0x16c RW
.coredump.tasks 0x3ffcc750 0x184 RW
.coredump.tasks 0x3ffcf438 0x16c RW
.coredump.tasks 0x3ffcf080 0x3b0 RW
.coredump.tasks 0x3ffd2e1c 0x16c RW
.coredump.tasks 0x3ffd6710 0x1b8 RW
.coredump.tasks 0x3fff1f28 0x16c RW
.coredump.tasks 0x3fff1ba0 0x380 RW
.coredump.tasks 0x3ffd2f8c 0x16c RW
.coredump.tasks 0x3ffe5ea0 0x1ac RW
.coredump.tasks 0x3ffcc320 0x16c RW
.coredump.tasks 0x3ffcc170 0x1a8 RW
.coredump.tasks 0x3ffd30fc 0x16c RW
.coredump.tasks 0x3ffeae50 0x19c RW
.coredump.tasks 0x3ffec820 0x16c RW
.coredump.tasks 0x3ffec4e0 0x338 RW
.coredump.tasks 0x3ffed978 0x16c RW
.coredump.tasks 0x3ffed7f0 0x180 RW

===================== ESP32 CORE DUMP END =====================
===============================================================

This might be related to #1133 where I had the same issue with I2C, but back then, it wasn't fatal.

Could someone please tell me what can I do about the 263 error? This seems to be the root cause and it is in esp-idf at least from #1133 where a partial fix was applied.

Also, can I do such non-instant things like I2C writes (or UDP socket writes, or UART writes) from a FreeRTOS timer? Is it possible that something will break because of this, or is it safe as long as I don't overflow the stack?

ESP-IDF v3.0-dev-1057-g4048db35

@ammaree
Copy link

ammaree commented Nov 24, 2017 via email

@FayeY
Copy link
Collaborator

FayeY commented Nov 27, 2017

Hi @chanibal , did the above suggestion help you resolve the issue?

@FayeY FayeY changed the title During WiFi activity, I2C starts to time out and then the timer crashes [TW#16675] During WiFi activity, I2C starts to time out and then the timer crashes Nov 27, 2017
@chanibal
Copy link
Contributor Author

@ammaree, was afraid of that. Will have to make separate tasks for some things (too bad, I'm very low on memory :/)

@FayeY did not yet verify, will inform on progress.

But aside of I2C, can I send UDP or write to UART from the timer task? What is the realistic slowdown I can have in the timer that will not cause issues?

Also, what might be the cause of the I2C timeout? Bad hardware? Issues with ESP32 implementation? As for my implementation, I first implemented the screen on my own, and then used a library that did that for me (u8g2 with NKolban's esp32 abstraction layer) - both had this issue.

@ammaree
Copy link

ammaree commented Nov 27, 2017 via email

@chanibal
Copy link
Contributor Author

chanibal commented Nov 28, 2017

@FayeY it looks like doing I2C in a separate worker task fixed the issues (or at least it didn't crash over night). (edit: still hangs, just not with a crash)

Still, this isn't the first time I have issues with I2C timeouts, and this just degraded to has errors, but non fatal. Are there any other reports of I2C timeouts after 01befe4? Might this be only me? Any chance you guys at Espressif might be able to check this on real hardware? It's a fairly popular part - the most standard $4 0.96" I2C OLED.

@chanibal
Copy link
Contributor Author

chanibal commented Nov 28, 2017

This issue just happened despite not using a timer anymore. This time, ESP32 just hung after lots of timeouts after manually disconnecting in STA mode. Is repeatable.

Currently all access to the I2C is on a separate task that receives commands via a queue.

The timeout is always happening in https://github.com/nkolban/esp32-snippets/blob/1e58455c6d4c56124ac12c96c77bc33093017df0/hardware/displays/U8G2/u8g2_esp32_hal.c#L173

@chanibal
Copy link
Contributor Author

Verified that what was crashing at the start, with the timer was an auto reloading timer that was not able to enqueue another timer event because the queue was full (because I2C didn't make it in time). Still not the main issue, though.

@koobest
Copy link
Contributor

koobest commented Nov 29, 2017

Hi, @chanibal. we are working on a similar problem. we have modified the I2C drievr,and will update the IDF after the test passes.

igrr pushed a commit that referenced this issue Dec 18, 2017
Reported from github:
#1312
#1193

Issues:
1. We used to use event group in the driver, which would cause:
    a. longer operation time since the event group are based on FreeRTOS timer.
    b. Operation fails if the timer queue is not long enough.
2. There might be some issue with event group, we will still try to provide a small test code in other branch.

modification:
1. use queue instead of event-bit for internal commands
2. use queue overwrite for cmd_done event
@FayeY
Copy link
Collaborator

FayeY commented Dec 21, 2017

Hi @chanibal , please have a try with the latest IDF, and feel free to reopen it if your problem is still unsolved.

@FayeY FayeY closed this as completed Dec 21, 2017
igrr pushed a commit that referenced this issue Dec 29, 2017
Reported from github:
#1312
#1193

Issues:
1. We used to use event group in the driver, which would cause:
    a. longer operation time since the event group are based on FreeRTOS timer.
    b. Operation fails if the timer queue is not long enough.
2. There might be some issue with event group, we will still try to provide a small test code in other branch.

modification:
1. use queue instead of event-bit for internal commands
2. use queue overwrite for cmd_done event
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants