New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[TW#27458] power_save example does not maintain an association #2711

Open
tve opened this Issue Nov 18, 2018 · 21 comments

Comments

Projects
None yet
5 participants
@tve
Copy link

tve commented Nov 18, 2018

Environment

  • Development Kit: Adafruit Huzzah32 or nodemcu
  • Kit version (for WroverKit/PicoKit/DevKitC): [v1|v2|v3|v4]
  • Module or chip used: ESP32-WROOM-32
  • IDF version (run git describe --tags to find it): v3.1.1 and master @960c240
  • Build System: Make
  • Compiler version (run xtensa-esp32-elf-gcc --version to find it):
    xtensa-esp32-elf-gcc (crosstool-NG crosstool-ng-1.22.0-80-g6c4433a) 5.2.0
  • Operating System: linux
  • Power Supply: usb

Problem Description

I am trying to get the wifi/power_save example to work but what happens is that at boot-up it associates with my AP but 5 seconds later looses the beacons and then cannot reconnect. I see the same behavior:

  • on my AP as well as if I use my Android phone hotspot AP
  • using the adafruit huzzah32 and a nodemcu, both have the esp-wroom-32
  • using a clean checkout of the esp-idf v3.1.1 tag as well as a clean checkout of master
  • the simple_wifi example does not exhibit this problem

Expected Behavior

  • the module should maintain an association or at least reconnect

Actual Behavior

After 5 seconds the wifi disconnects due to reason 200 (lost beacons) and then reconnect attempts fail due to reason 201.

Steps to reproduce

  1. clean checkout of esp-idf v3.1.1 and recursive submodule init
  2. copy wifi power_save example into working dir
  3. change wifi event disconnected log line to ESP_LOGI(TAG, "SYSTEM_EVENT_STA_DISCONNECTED: %d", event->event_info.disconnected.reason);
  4. make menuconfig and update: ssid, password, serial loader device
  5. make flash monitor

Debug Logs

I (735) power_save: SYSTEM_EVENT_STA_START
I (735) wifi: Set ps type: 1

I (855) wifi: n:1 0, o:1 0, ap:255 255, sta:1 0, prof:1
I (2825) wifi: state: init -> auth (b0)
I (2825) wifi: state: auth -> assoc (0)
I (2835) wifi: state: assoc -> run (10)
I (3145) wifi: connected with tve-home, channel 1
I (3145) wifi: pm start, type: 1

I (9185) wifi: bcn_timout,ap_probe_send_start
I (11685) wifi: ap_probe_send over, resett wifi status to disassoc
I (11685) wifi: state: run -> init (100)
I (11685) wifi: pm stop, total sleep time: 51069 us / 8527784 us

I (11685) wifi: n:1 0, o:1 0, ap:255 255, sta:1 0, prof:1
I (11695) power_save: SYSTEM_EVENT_STA_DISCONNECTED: 200
I (14115) power_save: SYSTEM_EVENT_STA_DISCONNECTED: 201
I (16535) power_save: SYSTEM_EVENT_STA_DISCONNECTED: 201
I (18945) power_save: SYSTEM_EVENT_STA_DISCONNECTED: 201

After this is continues indefinitely with reason 201.

@tve

This comment has been minimized.

Copy link
Author

tve commented Nov 18, 2018

I did some more tests using esp-idf master . If I disable light sleep in the power_save example the module does maintain its association:

> diff -u /home/src/esp32/esp-idf/examples/wifi/power_save/main/power_save.c main/power_save.c
--- /home/src/esp32/esp-idf/examples/wifi/power_save/main/power_save.c  2018-11-16 17:38:21.32848406
5 -0800
+++ main/power_save.c   2018-11-18 10:58:44.309225040 -0800
@@ -53,7 +53,7 @@
                ip4addr_ntoa(&event->event_info.got_ip.ip_info.ip));
         break;
     case SYSTEM_EVENT_STA_DISCONNECTED:
-       ESP_LOGI(TAG, "SYSTEM_EVENT_STA_DISCONNECTED");
+       ESP_LOGI(TAG, "SYSTEM_EVENT_STA_DISCONNECTED: %d", event->event_info.disconnected.reason);
        ESP_ERROR_CHECK(esp_wifi_connect());
        break;
     default:
@@ -103,7 +103,7 @@
             .max_freq_mhz = CONFIG_EXAMPLE_MAX_CPU_FREQ_MHZ,
             .min_freq_mhz = CONFIG_EXAMPLE_MIN_CPU_FREQ_MHZ,
 #if CONFIG_FREERTOS_USE_TICKLESS_IDLE
-            .light_sleep_enable = true
+            //.light_sleep_enable = true
 #endif
     };
     ESP_ERROR_CHECK( esp_pm_configure(&pm_config) );

Log output:

I (402) cpu_start: Pro cpu start user code
I (421) cpu_start: Starting scheduler on PRO CPU.
I (0) cpu_start: Starting scheduler on APP CPU.
I (590) pm_esp32: Frequency switching config: CPU_MAX: 80, APB_MAX: 80, APB_MIN: 10, Light sleep: DISABLED
I (600) wifi: wifi driver task: 3ffc098c, prio:23, stack:3584, core=0
I (600) wifi: wifi firmware version: a85ca5e
I (600) wifi: config NVS flash: enabled
I (610) wifi: config nano formating: disabled
I (610) system_api: Base MAC address is not set, read default base MAC address from BLK0 of EFUSE
I (620) system_api: Base MAC address is not set, read default base MAC address from BLK0 of EFUSE
I (670) wifi: Init dynamic tx buffer num: 32
I (670) wifi: Init data frame dynamic rx buffer num: 32
I (670) wifi: Init management frame dynamic rx buffer num: 32
I (680) wifi: Init static rx buffer size: 1600
I (680) wifi: Init static rx buffer num: 10
I (680) wifi: Init dynamic rx buffer num: 32
I (800) phy: phy_version: 4000, b6198fa, Sep  3 2018, 15:11:06, 0, 0
I (800) wifi: mode : sta (30:ae:a4:23:86:64)
I (800) power_save: esp_wifi_set_ps().
I (800) power_save: SYSTEM_EVENT_STA_START
I (810) wifi: Set ps type: 1

I (930) wifi: n:1 0, o:1 0, ap:255 255, sta:1 0, prof:1
I (2890) wifi: state: init -> auth (b0)
I (2900) wifi: state: auth -> assoc (0)
I (2900) wifi: state: assoc -> run (10)
I (3160) wifi: connected with tve-home, channel 1
I (3200) wifi: pm start, type: 1

I (4600) event: sta ip: 192.168.0.130, mask: 255.255.255.0, gw: 192.168.0.1
I (4600) power_save: SYSTEM_EVENT_STA_GOT_IP
I (4600) power_save: got ip:192.168.0.130

But as I was writing up this post I noticed that it crashed (several minutes after the boot):

Guru Meditation Error: Core  0 panic'ed (StoreProhibited). Exception was unhandled.
Core 0 register dump:
PC      : 0x4011b0ff  PS      : 0x00060730  A0      : 0x8011b2ab  A1      : 0x3ffbddd0
0x4011b0ff: sys_check_timeouts at /home/src/esp32/esp-idf/components/lwip/lwip/src/core/timeouts.c:3
81

A2      : 0x0000005a  A3      : 0x3ffb6678  A4      : 0x00000000  A5      : 0x0000493b
A6      : 0x3ffba1e8  A7      : 0x00000011  A8      : 0x0000005a  A9      : 0x0002dc62
A10     : 0x3ffb6674  A11     : 0x3ffb9d24  A12     : 0x3ffb9de0  A13     : 0x00000000
A14     : 0x00000001  A15     : 0x00000001  SAR     : 0x00000019  EXCCAUSE: 0x0000001d
EXCVADDR: 0x0000003d  LBEG    : 0x4000c2e0  LEND    : 0x4000c2f6  LCOUNT  : 0xffffffff

Backtrace: 0x4011b0ff:0x3ffbddd0 0x4011b2a8:0x3ffbddf0 0x40119c64:0x3ffbde10 0x4008c7f9:0x3ffbde40
0x4011b0ff: sys_check_timeouts at /home/src/esp32/esp-idf/components/lwip/lwip/src/core/timeouts.c:3
81
0x4011b2a8: sys_timeouts_mbox_fetch at /home/src/esp32/esp-idf/components/lwip/lwip/src/core/timeout
s.c:433
0x40119c64: tcpip_thread at /home/src/esp32/esp-idf/components/lwip/lwip/src/api/tcpip.c:483
0x4008c7f9: vPortTaskWrapper at /home/src/esp32/esp-idf/components/freertos/port.c:403

Rebooting...
ets Jun  8 2016 00:22:57

rst:0xc (SW_CPU_RESET),boot:0x13 (SPI_FAST_FLASH_BOOT)
configsip: 0, SPIWP:0xee

I tried the same with v3.1.1 and it behaves the same as far as I can tell.

@tve

This comment has been minimized.

Copy link
Author

tve commented Nov 18, 2018

I let the last test run and it panic'ed a few minutes later. This is esp-idf master running the wifi/power_save example with pm_config.list_sleep_enable = false, i.e. the diff from my previous comment.

It connected fine to the AP and just sat there, but a few minutes later it crashed (I don't have the exact amount of time elapsed)

Guru Meditation Error: Core  0 panic'ed (IllegalInstruction). Exception was unhandled.
Core 0 register dump:
PC      : 0x401030c0  PS      : 0x00060b30  A0      : 0x800d1d36  A1      : 0x3ffaf9b0
0x401030c0: lmac_stop_hw_txq at ??:?

A2      : 0x00000000  A3      : 0x00000000  A4      : 0x3ffb3d8c  A5      : 0x3ffb93d0
A6      : 0x00000003  A7      : 0x00060e23  A8      : 0x3ffb1adc  A9      : 0x3ffaf990
A10     : 0x00000008  A11     : 0x00000000  A12     : 0x0000000a  A13     : 0x00000001
A14     : 0x0d96bf9a  A15     : 0x00000000  SAR     : 0x00000000  EXCCAUSE: 0x00000000
EXCVADDR: 0x00000000  LBEG    : 0x4000c2e0  LEND    : 0x4000c2f6  LCOUNT  : 0xffffffff

Backtrace: 0x401030c0:0x3ffaf9b0 0x400d1d33:0x3ffaf9d0 0x400d1d8f:0x3ffaf9f0 0x4008c7f9:0x3ffafa10
0x401030c0: lmac_stop_hw_txq at ??:?
0x400d1d33: timer_process_alarm at /home/src/esp32/esp-idf/components/esp32/esp_timer.c:412
0x400d1d8f: timer_task at /home/src/esp32/esp-idf/components/esp32/esp_timer.c:412
0x4008c7f9: vPortTaskWrapper at /home/src/esp32/esp-idf/components/freertos/port.c:403

Rebooting...
ets Jun  8 2016 00:22:57

rst:0xc (SW_CPU_RESET),boot:0x13 (SPI_FAST_FLASH_BOOT)

And as I'm writing this comment it crashed again in the same place with a slightly different backtrace and what is worse, it sits in an endless reboot loop:

Guru Meditation Error: Core  0 panic'ed (IllegalInstruction). Exception was unhandled.
Core 0 register dump:
PC      : 0x401030b8  PS      : 0x00060130  A0      : 0x800d1d92  A1      : 0x3ffaf9d0
0x401030b8: lmac_stop_hw_txq at ??:?

A2      : 0x3ffc51b8  A3      : 0x00000000  A4      : 0x3ffaea94  A5      : 0x00000000
A6      : 0x3ffaea4c  A7      : 0x00000000  A8      : 0x800d1d36  A9      : 0x3ffaf9b0
A10     : 0x00000000  A11     : 0x00000000  A12     : 0x3ffb3d8c  A13     : 0x3ffb93d0
A14     : 0x00000003  A15     : 0x00060e23  SAR     : 0x00000000  EXCCAUSE: 0x00000000
EXCVADDR: 0x00000000  LBEG    : 0x4000c2e0  LEND    : 0x4000c2f6  LCOUNT  : 0xffffffff

Backtrace: 0x401030b8:0x3ffaf9d0 0x400d1d8f:0x3ffaf9f0 0x4008c7f9:0x3ffafa10
0x401030b8: lmac_stop_hw_txq at ??:?
0x400d1d8f: timer_task at /home/src/esp32/esp-idf/components/esp32/esp_timer.c:412
0x4008c7f9: vPortTaskWrapper at /home/src/esp32/esp-idf/components/freertos/port.c:403

Rebooting...
ets Jun  8 2016 00:22:57

rst:0xc (SW_CPU_RESET),boot:0x13 (SPI_FAST_FLASH_BOOT)
flash read err, 1000
ets_main.c 371
ets Jun  8 2016 00:22:57

rst:0x10 (RTCWDT_RTC_RESET),boot:0x13 (SPI_FAST_FLASH_BOOT)
flash read err, 1000
ets_main.c 371
ets Jun  8 2016 00:22:57

After this it's in an infinite loop of RTCWDT_RTC_RESET.

I switched to a different AP that is open (no WPA2) and I see similar crashes:

Guru Meditation Error: Core  0 panic'ed (StoreProhibited). Exception was unhandled.
Core 0 register dump:
PC      : 0x4010307f  PS      : 0x00060330  A0      : 0x800d1d32  A1      : 0x3ffaf9b0
0x4010307f: lmac_stop_hw_txq at ??:?

A2      : 0x00000000  A3      : 0x00000000  A4      : 0x3ffb3d8c  A5      : 0x3ffb93d0
A6      : 0x00000003  A7      : 0x00060e23  A8      : 0x3ffb12bc  A9      : 0x3ffaf990
A10     : 0x00000008  A11     : 0x00000000  A12     : 0x0000000a  A13     : 0x00000001
A14     : 0x0956517b  A15     : 0x00000000  SAR     : 0x00000000  EXCCAUSE: 0x0000001d
EXCVADDR: 0x000003fc  LBEG    : 0x4000c2e0  LEND    : 0x4000c2f6  LCOUNT  : 0xffffffff

Backtrace: 0x4010307f:0x3ffaf9b0 0x400d1d2f:0x3ffaf9d0 0x400d1d8b:0x3ffaf9f0 0x4008c7f9:0x3ffafa10
0x4010307f: lmac_stop_hw_txq at ??:?
0x400d1d2f: timer_process_alarm at /home/src/esp32/esp-idf/components/esp32/esp_timer.c:412
0x400d1d8b: timer_task at /home/src/esp32/esp-idf/components/esp32/esp_timer.c:412
0x4008c7f9: vPortTaskWrapper at /home/src/esp32/esp-idf/components/freertos/port.c:403


Rebooting...

This time the reboot worked.

Do you have an elf of a rock-solid power_save example that runs on an esp-wroom-32 module that I could try?

NB: I can be found on the arduino-esp32 gitter as @tve in case you want to ask some quick questions. I'm on USA west coast time.

@Alvin1Zhang Alvin1Zhang changed the title power_save example does not maintain an association [TW#27458] power_save example does not maintain an association Nov 19, 2018

@tve

This comment has been minimized.

Copy link
Author

tve commented Nov 25, 2018

I spent some time using a wifi monitor to capture packets "in the air", What I observe is that a few seconds after entering light sleep the esp32 starts sending probe requests. The AP responds with many probe responses but none of them are ACKed. Obviously the AP also sends its normal beacons, so the esp32 should not send probe requests to start with. What it looks like to me is that the RX section does not wake-up as it should to receive the beacons or the probe responses.

@liuzfesp

This comment has been minimized.

Copy link
Collaborator

liuzfesp commented Nov 27, 2018

HI @tve, we have debugged the beacon timeout issue in light sleep for more than 1 weeks and make some progress, however we still not sure we have found the root cause.

Need you help to check whether the beacon timeout issue is fixed on your side with following change, the rtc_sleep_init() is defined in $IDF_PATH/components/soc/esp32/rtc_sleep.c :

void rtc_sleep_init(rtc_sleep_config_t cfg)
{
...
REG_SET_FIELD(RTC_CNTL_REG, RTC_CNTL_DIG_DBIAS_WAK, cfg.dig_dbias_wak);
REG_SET_FIELD(RTC_CNTL_REG, RTC_CNTL_DIG_DBIAS_SLP, cfg.dig_dbias_slp);

// Add this line in rtc_sleep_init()
**REG_SET_FIELD(RTC_CNTL_OPTIONS0_REG, RTC_CNTL_BIAS_I2C_FORCE_PU, 1);**

}

For the crash issue, I just can't reproduce it, could you help to describe the detailed reproduce steps, or send your code to me then I can reproduce it myself.

@tve

This comment has been minimized.

Copy link
Author

tve commented Nov 27, 2018

No luck...
The change I applied is:

diff -u $IDF_PATH/components/soc/esp32/rtc_sleep.c.orig $IDF_PATH/components/soc/esp32/rtc_sleep.c
--- /home/src/esp32/esp-idf/components/soc/esp32/rtc_sleep.c.orig       2018-11-26 21:07:29.864398692 -0800
+++ /home/src/esp32/esp-idf/components/soc/esp32/rtc_sleep.c    2018-11-26 21:06:55.411843242 -0800
@@ -211,6 +211,7 @@
     REG_SET_FIELD(RTC_CNTL_REG, RTC_CNTL_DBIAS_WAK, cfg.rtc_dbias_wak);
     REG_SET_FIELD(RTC_CNTL_REG, RTC_CNTL_DIG_DBIAS_WAK, cfg.dig_dbias_wak);
     REG_SET_FIELD(RTC_CNTL_REG, RTC_CNTL_DIG_DBIAS_SLP, cfg.dig_dbias_slp);
+    REG_SET_FIELD(RTC_CNTL_OPTIONS0_REG, RTC_CNTL_BIAS_I2C_FORCE_PU, 1);
 }

 void rtc_sleep_set_wakeup_time(uint64_t t)

The result is unchanged:

I (594) pm_esp32: Frequency switching config: CPU_MAX: 80, APB_MAX: 80, APB_MIN: 10, Light sleep: EN
ABLED
I (604) wifi: wifi driver task: 3ffc09d4, prio:23, stack:3584, core=0
I (604) wifi: wifi firmware version: abbd416
I (604) wifi: config NVS flash: enabled
I (614) wifi: config nano formating: disabled
I (614) system_api: Base MAC address is not set, read default base MAC address from BLK0 of EFUSE
I (624) system_api: Base MAC address is not set, read default base MAC address from BLK0 of EFUSE
I (684) wifi: Init dynamic tx buffer num: 32
I (684) wifi: Init data frame dynamic rx buffer num: 32
I (684) wifi: Init management frame dynamic rx buffer num: 32
I (684) wifi: Init static rx buffer size: 1600
I (684) wifi: Init static rx buffer num: 10
I (694) wifi: Init dynamic rx buffer num: 32
I (814) phy: phy_version: 4000, b6198fa, Sep  3 2018, 15:11:06, 0, 0
I (814) wifi: mode : sta (30:ae:a4:23:86:64)
I (814) power_save: esp_wifi_set_ps().
I (814) power_save: SYSTEM_EVENT_STA_START
I (824) wifi: Set ps type: 1

I (944) wifi: n:6 0, o:1 0, ap:255 255, sta:6 0, prof:1
I (944) wifi: state: init -> auth (b0)
I (954) wifi: state: auth -> assoc (0)
I (964) wifi: state: assoc -> run (10)
I (964) wifi: connected with test, channel 6
I (974) wifi: pm start, type: 1

I (1614) event: sta ip: 192.168.0.130, mask: 255.255.255.0, gw: 192.168.0.1
I (1614) power_save: SYSTEM_EVENT_STA_GOT_IP
I (1614) power_save: got ip:192.168.0.130

Guru Meditation Error: Core  0 panic'ed (IllegalInstruction). Exception was unhandled.
Core 0 register dump:
PC      : 0x400d1da9  PS      : 0x00060130  A0      : 0x8008fb0c  A1      : 0x3ffafa00
0x400d1da9: timer_task at /home/src/esp32/esp-idf/components/esp32/esp_timer.c:412

A2      : 0x00000000  A3      : 0x00000000  A4      : 0x00060023  A5      : 0x3ffafaa8
A6      : 0x00000000  A7      : 0x00000001  A8      : 0x800d1d97  A9      : 0x3ffaf9c0
A10     : 0x00000001  A11     : 0x00000000  A12     : 0x3ffaea98  A13     : 0x00000000
A14     : 0x3ffaea50  A15     : 0x00000000  SAR     : 0x00000000  EXCCAUSE: 0x00000000
EXCVADDR: 0x00000000  LBEG    : 0x4000c2e0  LEND    : 0x4000c2f6  LCOUNT  : 0xffffffff

Backtrace: 0x400d1da9:0x3ffafa00 0x4008fb09:0x3ffafa20
0x400d1da9: timer_task at /home/src/esp32/esp-idf/components/esp32/esp_timer.c:412

0x4008fb09: vPortTaskWrapper at /home/src/esp32/esp-idf/components/freertos/port.c:403


Rebooting...
ets Jun  8 2016 00:22:57

rst:0xc (SW_CPU_RESET),boot:0x13 (SPI_FAST_FLASH_BOOT)

I'm attaching my ELF file:
power_save.elf.zip

@liuzfesp

This comment has been minimized.

Copy link
Collaborator

liuzfesp commented Nov 27, 2018

HI tve, do you add your private code into the power save example?

@tve

This comment has been minimized.

Copy link
Author

tve commented Nov 27, 2018

I changed the SSID to "test", no password.
But please hold, I have created an automated build using travis-ci.org and will make this available to you. This way it is built cleanly on ubuntu and not dependent on my desktop (running arch linux). That will make it more reproducible. I have it working but need to document, I will have that tomorrow.

@liuzfesp

This comment has been minimized.

Copy link
Collaborator

liuzfesp commented Nov 27, 2018

Thanks tve. It will be appreciated if you can make this issue more reproducible. I can reproduce this issue in our lab, however it needs at least more than 12 hours, sometimes it need more than 1 day.

@tve

This comment has been minimized.

Copy link
Author

tve commented Nov 28, 2018

OK, I now have travis builds working. I forked the esp-idf repo and only added .travis.yml, see https://github.com/tve/esp-idf/blob/master/.travis.yml
This results in a build, the latest being https://travis-ci.org/tve/esp-idf/builds/460639618, you can see the build log there
The script uploads an archive with just the files to flash an esp32 to AWS S3: https://s3.amazonaws.com/s3.voneicken.com/esp-idf/esp-idf-power_save-master-19.tgz
It also uploads a (big!) archive of the entire esp-idf tree after the build: https://s3.amazonaws.com/s3.voneicken.com/esp-idf/esp-idf-master-19.tgz

Here is my console log in downloading, flashing and running this to reproduce the issue:

> mkdir esp-idf-master-19
> cd esp-idf-master-19
> curl -L https://s3.amazonaws.com/s3.voneicken.com/esp-idf/esp-idf-power_save-master-19.tgz | tar zxf -
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 1952k  100 1952k    0     0   486k      0  0:00:04  0:00:04 --:--:--  486k
> ls
bootloader/  flash.cmd  partitions_singleapp.bin  power_save.bin  power_save.elf*  power_save.map
> cat flash.cmd
Toolchain path: /home/travis/xtensa-esp32-elf/bin/xtensa-esp32-elf-gcc
Toolchain version: crosstool-ng-1.22.0-80-g6c4433a
Compiler version: 5.2.0
Python requirements from /home/travis/build/tve/esp-idf/requirements.txt are satisfied.
--flash_mode dio --flash_freq 40m --flash_size detect 0x1000 bootloader/bootloader.bin 0x10000 power
_save.bin 0x8000 partitions_singleapp.bin
> python2 /home/src/esp32/esp-idf/components/esptool_py/esptool/esptool.py --chip esp32 --port /dev/tt
yUSB1 --baud 115200 --before default_reset --after hard_reset write_flash -z --flash_mode dio --flas
h_freq 40m --flash_size detect 0x1000 bootloader/bootloader.bin 0x10000 power_save.bin 0x8000 partit
ions_singleapp.bin
esptool.py v2.6-beta1
Serial port /dev/ttyUSB1
Connecting....
Chip is ESP32D0WDQ6 (revision 1)
Features: WiFi, BT, Dual Core, Coding Scheme None
MAC: 30:ae:a4:23:86:64
Uploading stub...
Running stub...
Stub running...
Configuring flash size...
Auto-detected Flash size: 4MB
Flash params set to 0x0220
Compressed 23008 bytes to 13637...
Wrote 23008 bytes (13637 compressed) at 0x00001000 in 1.2 seconds (effective 153.3 kbit/s)...
Hash of data verified.
Compressed 586624 bytes to 358587...
Wrote 586624 bytes (358587 compressed) at 0x00010000 in 31.6 seconds (effective 148.4 kbit/s)...
Hash of data verified.
Compressed 3072 bytes to 103...
Wrote 3072 bytes (103 compressed) at 0x00008000 in 0.0 seconds (effective 1854.4 kbit/s)...
Hash of data verified.

Leaving...
Hard resetting via RTS pin...


> miniterm2.py --rts 0 --dtr 0 /dev/ttyUSB1 115200
--- forcing DTR inactive
--- forcing RTS inactive
--- Miniterm on /dev/ttyUSB1  115200,8,N,1 ---
--- Quit: Ctrl+] | Menu: Ctrl+T | Help: Ctrl+T followed by Ctrl+H ---
ets Jun  8 2016 00:22:57

rst:0x1 (POWERON_RESET),boot:0x13 (SPI_FAST_FLASH_BOOT)

...

␛[0;32mI (813) power_save: esp_wifi_set_ps().␛[0m
␛[0;32mI (813) power_save: SYSTEM_EVENT_STA_START␛[0m
I (823) wifi: Set ps type: 1

I (943) wifi: n:6 0, o:1 0, ap:255 255, sta:6 0, prof:1
I (943) wifi: state: init -> auth (b0)
I (953) wifi: state: auth -> assoc (0)
I (963) wifi: state: assoc -> run (10)
I (963) wifi: connected with test, channel 6
I (973) wifi: pm start, type: 1

␛[0;32mI (1613) event: sta ip: 192.168.0.130, mask: 255.255.255.0, gw: 192.168.0.1␛[0m
␛[0;32mI (1613) power_save: SYSTEM_EVENT_STA_GOT_IP␛[0m
␛[0;32mI (1613) power_save: got ip:192.168.0.130
␛[0m
I (9113) wifi: bcn_timout,ap_probe_send_start
I (11623) wifi: ap_probe_send over, resett wifi status to disassoc
I (11623) wifi: state: run -> init (100)
I (11623) wifi: pm stop, total sleep time: 1775283 us / 10606058 us

I (11623) wifi: n:6 0, o:6 0, ap:255 255, sta:6 0, prof:1
␛[0;32mI (11633) power_save: SYSTEM_EVENT_STA_DISCONNECTED␛[0m
␛[0;32mI (14053) power_save: SYSTEM_EVENT_STA_DISCONNECTED␛[0m
␛[0;32mI (16463) power_save: SYSTEM_EVENT_STA_DISCONNECTED␛[0m

Continued in the next comment...

@tve

This comment has been minimized.

Copy link
Author

tve commented Nov 28, 2018

I then created a new branch tw27458-fix and applied the change:

> git diff
diff --git a/components/soc/esp32/rtc_sleep.c b/components/soc/esp32/rtc_sleep.c
index 6b16aa288..41c02c687 100644
--- a/components/soc/esp32/rtc_sleep.c
+++ b/components/soc/esp32/rtc_sleep.c
@@ -211,6 +211,8 @@ void rtc_sleep_init(rtc_sleep_config_t cfg)
     REG_SET_FIELD(RTC_CNTL_REG, RTC_CNTL_DBIAS_WAK, cfg.rtc_dbias_wak);
     REG_SET_FIELD(RTC_CNTL_REG, RTC_CNTL_DIG_DBIAS_WAK, cfg.dig_dbias_wak);
     REG_SET_FIELD(RTC_CNTL_REG, RTC_CNTL_DIG_DBIAS_SLP, cfg.dig_dbias_slp);
+    // TW27458 fix
+    REG_SET_FIELD(RTC_CNTL_OPTIONS0_REG, RTC_CNTL_BIAS_I2C_FORCE_PU, 1);
 }

 void rtc_sleep_set_wakeup_time(uint64_t t)

The travis builds are (the first one is the full 280MB tree, second is just the image):

I'm still working through what I'm observing... The Adafruit Huzzah32 fails with the fix. I'm also testing a NodeMCU ESP32S and so far I can't reproduce the problem using the unmodified IDF. I'm trying to make sure it's not a power supply or similar issue...

Something else noticed looking at the logs of my AP, which is using hostapd. What I see is:
initial association:

+rtl871x_get_sta_wpaie, 30:ae:a4:7c:9b:c4 is sta's address
wlan0: STA 30:ae:a4:7c:9b:c4 IEEE 802.11: associated
wlan0: STA 30:ae:a4:7c:9b:c4 RADIUS: starting accounting session 5BFCBB37-00000023

and then a little later:

wlan0: STA 30:ae:a4:7c:9b:c4 IEEE 802.11: disassociated
wlan0: AP-STA-DISCONNECTED 30:ae:a4:7c:9b:c4
+rtl871x_sta_remove_ops, 30:ae:a4:7c:9b:c4 is sta address removed

and a little later the esp32 goes through its beacon time-out and reconnection:

I (297849) wifi: bcn_timout,ap_probe_send_start
I (297859) wifi: state: run -> init (7c0)
I (297859) wifi: pm stop, total sleep time: 54231628 us / 56086980 us

I (297859) wifi: n:6 0, o:6 0, ap:255 255, sta:6 0, prof:1
␛[0;32mI (297859) power_save: SYSTEM_EVENT_STA_DISCONNECTED␛[0m
I (297989) wifi: n:6 0, o:6 0, ap:255 255, sta:6 0, prof:1
I (297989) wifi: state: init -> auth (b0)
I (297999) wifi: state: auth -> assoc (0)
I (298009) wifi: state: assoc -> run (10)
I (298009) wifi: connected with test, channel 6
I (298059) wifi: pm start, type: 1

␛[0;32mI (298789) event: sta ip: 192.168.0.95, mask: 255.255.255.0, gw: 192.168.0.1␛[0m
␛[0;32mI (298789) power_save: SYSTEM_EVENT_STA_GOT_IP␛[0m
␛[0;32mI (298789) power_save: got ip:192.168.0.95

and the AP reports as expected:

+rtl871x_get_sta_wpaie, 30:ae:a4:7c:9b:c4 is sta's address
wlan0: STA 30:ae:a4:7c:9b:c4 IEEE 802.11: associated
wlan0: STA 30:ae:a4:7c:9b:c4 RADIUS: starting accounting session 5BFCBB37-00000024

So the whole thing gets triggered by an association timeout on the AP side as far as I can tell. (The above was captured with the NodeMCU running the un-modified IDF).

NB: if you need some help with Travis, let me know. I don't know whether you're familiar with this type of continuous integration service.

@tve

This comment has been minimized.

Copy link
Author

tve commented Nov 29, 2018

I have been running many more tests. Unfortunately I don't know what to say... I have 3 esp32 boards:

On a NodeMCU with esp-wroom-32 module I cannot reproduce the SYSTEM_EVENT_STA_DISCONNECTED loop problem at all.

On a TTGO w/LoRa board that has an esp32 chip I cannot reproduce the problem either.

On an Adafruit Huzzah32 with esp-wroom-32 I have lots of problems. I can reproduce the SYSTEM_EVENT_STA_DISCONNECTED loop problem easily. But if I try the version with the fix, after associating it gets a Guru Meditation Error, then restarts one or two times, then goes into an infinite loop of

rst:0x10 (RTCWDT_RTC_RESET),boot:0x13 (SPI_FAST_FLASH_BOOT)
flash read err, 1000
ets_main.c 371
ets Jun  8 2016 00:22:57

I tried different USB cables, ports, power supplies, LiPo. No change. I checked the strapping (expected to be correct on an Adafruit board which I'm not connecting to anything but USB). I am wondering whether there is some HW problem on this board/module. I think I will order another one to compare.

:-(

@liuzfesp

This comment has been minimized.

Copy link
Collaborator

liuzfesp commented Nov 29, 2018

Hi @tve, we are now debugging this issue.

@igrr

This comment has been minimized.

Copy link
Member

igrr commented Nov 29, 2018

@tve Could you please attach a photo of the module? Interested in what is engraved on the metal shield. It might help to identify the module version.

@tve

This comment has been minimized.

Copy link
Author

tve commented Nov 29, 2018

Haha, yes I can, it'll look a little odd, though. I'm also stubborn/tenacious so I kept fiddling...

First, the NodeMCU board has been running the original unmodified IDF power_save. I wanted to know whether it reproduces the issue. It sort of did:

I (15049738) wifi: n:6 0, o:6 0, ap:255 255, sta:6 0, prof:1
I (15049738) wifi: state: init -> auth (b0)
I (15050738) wifi: state: auth -> init (200)
I (15050738) wifi: n:6 0, o:6 0, ap:255 255, sta:6 0, prof:1
␛[0;32mI (15050748) power_save: SYSTEM_EVENT_STA_DISCONNECTED␛[0m
␛[0;32mI (15053158) power_save: SYSTEM_EVENT_STA_DISCONNECTED␛[0m
␛[0;32mI (15055578) power_save: SYSTEM_EVENT_STA_DISCONNECTED␛[0m
␛[0;32mI (15057998) power_save: SYSTEM_EVENT_STA_DISCONNECTED␛[0m
␛[0;32mI (15060408) power_save: SYSTEM_EVENT_STA_DISCONNECTED␛[0m
␛[0;32mI (15062828) power_save: SYSTEM_EVENT_STA_DISCONNECTED␛[0m
␛[0;32mI (15065248) power_save: SYSTEM_EVENT_STA_DISCONNECTED␛[0m
I (15066218) wifi: n:6 0, o:6 0, ap:255 255, sta:6 0, prof:1
I (15066218) wifi: state: init -> auth (b0)
I (15067218) wifi: state: auth -> init (200)
I (15067218) wifi: n:6 0, o:6 0, ap:255 255, sta:6 0, prof:1

I.e. it "recovered" on its own. This type of sequence has happened a few times over the past few hours.

About the Adafruit Huzzah32, I decided to take the shield off and touch up the solder on the flash chip. That didn't help, so I decided to replace the flash chip. But I used the wrong tools and butchered it, lifting two pads in the process. The result is an "artistic" replacement with a Winbond 25Q128 (I didn't have a 25Q32 handy). Here's a photo of that, and please don't laugh too hard :-) :-) You can make out the shield and the module PCB markings.

20181128_205523_sml

Now the interesting part: I flashed the image with the fix and after ~10 minutes it has not shown any signs of problems. So it seems likely that the problem was with the flash chip (a Gigadevice).

I'm planning to let it run for another half hour and then flash the unmodified version to see whether the problem reproduces then. I will report back...

Update: guess what, as soon as I posted the comment it went through some disconnects, which look OK to me (assuming disconnects should happen at all):

I (1048) wifi: pm start, type: 1

␛[0;32mI (2198) event: sta ip: 192.168.0.130, mask: 255.255.255.0, gw: 192.168.0.1␛[0m
␛[0;32mI (2198) power_save: SYSTEM_EVENT_STA_GOT_IP␛[0m
␛[0;32mI (2198) power_save: got ip:192.168.0.130
␛[0m
I (543998) wifi: state: run -> init (7c0)
I (544008) wifi: pm stop, total sleep time: 515375651 us / 551959958 us

I (544008) wifi: n:6 0, o:6 0, ap:255 255, sta:6 0, prof:1
␛[0;32mI (544008) power_save: SYSTEM_EVENT_STA_DISCONNECTED␛[0m
I (544138) wifi: n:6 0, o:6 0, ap:255 255, sta:6 0, prof:1
I (544138) wifi: state: init -> auth (b0)
I (544138) wifi: state: auth -> assoc (0)
I (544148) wifi: state: assoc -> run (10)
I (544148) wifi: connected with test, channel 6
I (544158) wifi: pm start, type: 1

␛[0;32mI (544788) event: sta ip: 192.168.0.130, mask: 255.255.255.0, gw: 192.168.0.1␛[0m
␛[0;32mI (544788) power_save: SYSTEM_EVENT_STA_GOT_IP␛[0m
␛[0;32mI (544788) power_save: got ip:192.168.0.130

Again, this is the Huzzah32 with the TW27458 1-line fix

@igrr

This comment has been minimized.

Copy link
Member

igrr commented Nov 29, 2018

Okay, so to summarize there might be two unrelated problems here: first is the unrecoverable Wi-Fi disconnect, second is the crash with subsequent RTC_WDT_RESETs. The second problem seems to go away when replacing the Gigadevice part with a Winbond one.

@tve

This comment has been minimized.

Copy link
Author

tve commented Nov 29, 2018

Correct.

The Huzzah32 ran for an hour with the fix and while it showed individual disconnections they immediately recovered.
I now have it running for 10 minutes on the original IDF without fix and it has gone through a 400 second stretch of SYSTEM_EVENT_STA_DISCONNECTED and then managed to get an IP address again. 90 seconds later it disconnected again and that is ongoing. So I can definitely reproduce the issue.

@tve

This comment has been minimized.

Copy link
Author

tve commented Nov 29, 2018

FYI, sniffing the wifi while the Huzzah32 is stuck in repeated SYSTEM_EVENT_STA_DISCONNECTED shows that it is sending probe requests and the AP is responding but the esp32 does not ACK them. I can provide pcap files if it helps.

screenshot from 2018-11-28 22-43-32

@TemcoHeng

This comment has been minimized.

Copy link

TemcoHeng commented Dec 4, 2018

Hi, @tve Do you still have a problem on lightsleep example of wifi disconnected ?

try add this line in the file rtc_sleep.c on the function rtc_sleep_init()

void rtc_sleep_init(rtc_sleep_config_t cfg)
{
....
REG_SET_FIELD(RTC_CNTL_REG, RTC_CNTL_DBIAS_SLP, cfg.rtc_dbias_slp);
REG_SET_FIELD(RTC_CNTL_REG, RTC_CNTL_DBIAS_WAK, cfg.rtc_dbias_wak);
REG_SET_FIELD(RTC_CNTL_REG, RTC_CNTL_DIG_DBIAS_WAK, cfg.dig_dbias_wak);
REG_SET_FIELD(RTC_CNTL_REG, RTC_CNTL_DIG_DBIAS_SLP, cfg.dig_dbias_slp);
// Add this line in the end of rtc_sleep_init()
REG_SET_FIELD(RTC_CNTL_OPTIONS0_REG, RTC_CNTL_BIAS_I2C_FORCE_PU, 1);

}

igrr added a commit that referenced this issue Dec 10, 2018

soc/rtc: reset BBPLL configuration after enabling it
A workaround to reset BBPLL configuration after light sleep. Fixes the
issue that Wi-Fi can not receive packets after waking up from light
sleep.

Ref. #2711
@TimXia

This comment has been minimized.

Copy link
Contributor

TimXia commented Dec 11, 2018

@tve Could you please provide the brand and android version of your phone? So that we can reproduce the WiFi disconnection here.

@tve

This comment has been minimized.

Copy link
Author

tve commented Dec 11, 2018

Phone is Samsung Galaxy S6 running Android 5.0.1 (sigh!).

I can also reproduce the problem running an access point using an ODROID C1+ with Ubuntu 16.04 kernel 3.10, hostapd v0.8.x_rtw_r7475.20130812_beta (that has RealTek drivers), and a USB-Wifi "Edimax Technology Co., Ltd EW-7811Un 802.11n Wireless Adapter [Realtek RTL8188CUS]". The command-line I use is:

sudo /home/sw/create_ap/create_ap -m bridge -c 6 --ieee80211n --country US --ht_capab HT20 wlan0 eth0 test

I can see whether it's reproducible on an rPi-0w, if that's easy for you to try?

igrr added a commit that referenced this issue Dec 21, 2018

soc/rtc: reset another BBPLL related register
Follow-up to b21ffc8: an additional register needs to be reset.

Ref. #2711

igrr added a commit that referenced this issue Dec 29, 2018

soc/rtc: reset BBPLL configuration after enabling it
A workaround to reset BBPLL configuration after light sleep. Fixes the
issue that Wi-Fi can not receive packets after waking up from light
sleep.

Ref. #2711

igrr added a commit that referenced this issue Dec 29, 2018

soc/rtc: reset another BBPLL related register
Follow-up to b21ffc8: an additional register needs to be reset.

Ref. #2711

igrr added a commit that referenced this issue Jan 9, 2019

soc/rtc: reset BBPLL configuration after enabling it
A workaround to reset BBPLL configuration after light sleep. Fixes the
issue that Wi-Fi can not receive packets after waking up from light
sleep.

Ref. #2711
@TimXia

This comment has been minimized.

Copy link
Contributor

TimXia commented Feb 1, 2019

@tve We have fixed this issue on IDF. Please update IDF to check it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment