esp8266: Crashes on concurrent WebREPL access #2537

Closed
fisehara opened this issue Oct 19, 2016 · 11 comments

@fisehara

fisehara commented Oct 19, 2016

Hi, I want to update the scripts on my ESP8266 running MicroPython over WiFi via the WebREPL while it is executing a script.
I don't want to use a wire just to update a script.

System

MicroPython v1.8.5 on 2016-10-18; ESP module with ESP8266

What I've observed

micropython program

import webrepl
webrepl.start(password='')
while True:
    print('1')

Host

Sending a file test.py via webrepl
(dd if=/dev/zero of=test.py bs=1K count=1) or (dd if=/dev/zero of=test.py bs=512 count=1)
./webrepl_cli.py test.py 192.168.4.1:/test.py

Two different outcomes on the ESP (eventually):

dupterm: EOF received, deactivating
dupterm: Exception in write() method, deactivating: OSError: [Errno 9] EBADF
Fatal exception 28(LoadProhibitedCause):
epc1=0x40245962, epc2=0x00000000, epc3=0x00000000, excvaddr=0x00000000, depc=0x00000000

Fatal exception 9(LoadStoreAlignmentCause):
epc1=0x40253ad0, epc2=0x00000000, epc3=0x00000000, excvaddr=0x3ffff757, depc=0x00000000

ets Jan 8 2013,rst cause:2, boot mode:(3,0)

@pfalcon
Contributor

pfalcon commented Oct 20, 2016

If you're interested to have progress with this sooner rather than later, please build the latest master (requires latest esp-open-sdk), and see if the issue still persists. If it does, please attach firmware.bin, firmware.elf, firmware.map files for investigation. Thanks.

@fisehara
Author

Hi, with the current SDK and the latest MicroPython source I still get the same issue.

Scenario 1:

import webrepl
webrepl.start(password='')

No other activity on the ESP.
=> The WebREPL file transfer doesn't fail (a hundred transfers without a failure).

Scenario 2:

Same as 1, but now with:
while True:
    1+1

=> The WebREPL file transfer fails on the second or third transfer.

ESP serial output:

Fatal exception 9(LoadStoreAlignmentCause):
epc1=0x40254f14, epc2=0x00000000, epc3=0x00000000, excvaddr=0x3ffff757, depc=0x00000000

ets Jan 8 2013,rst cause:2, boot mode:(3,0)

load 0x40100000, len 32020, room 16
tail 4
chksum 0x9e
load 0x3ffe8000, len 1084, room 4
tail 8
chksum 0x54
load 0x3ffe8440, len 3000, room 0
tail 8
chksum 0x2e
csum 0x2e
[unreadable serial output]
#4 ets_task(40100164, 3, 3fff6388, 4)
could not open file 'main.py' for reading

MicroPython v1.8.5-18-g84679e0 on 2016-10-20; ESP module with ESP8266

Firmware

LINK build/firmware.elf
text data bss dec hex filename
536312 1084 56352 593748 90f54 build/firmware.elf
Create build/firmware-combined.bin
esptool.py v1.2-dev
('flash ', 36144)
('padding ', 720)
('irom0text', 501292)
('total ', 538156)
('md5 ', '95fda393ad3b989cba315bfeaecb4659')

micropython_esp8266_firmware.tar.gz

@pfalcon
Contributor

pfalcon commented Oct 20, 2016

Thanks for detailed steps to reproduce. Well, first of all, concurrent webrepl connections aren't supported - you can have only one connection. I guess it's time to actually add check for that.

Otherwise, the exception above looks pretty weird: it's for an unaligned access to a stack variable, something which should never happen. It would be nice to understand the ins and outs of that, but the general issue would be trying to run code concurrently that was not intended for that.
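
As a rough illustration of the alignment point: excvaddr in the register dump is the address of the access that faulted, and it is 0x3ffff757 in every LoadStoreAlignmentCause dump posted in this thread. The sketch below parses one of those dump lines and checks that address against a 4-byte boundary. The helper names parse_registers and is_aligned are made up for this example, and the 4-byte size assumes a 32-bit load or store.

```python
import re

# Register dump line copied from one of the crash reports in this thread.
DUMP = ("epc1=0x40254f14, epc2=0x00000000, epc3=0x00000000, "
        "excvaddr=0x3ffff757, depc=0x00000000")


def parse_registers(line):
    """Return {name: int} for every name=0x... pair in a dump line."""
    pairs = re.findall(r"(\w+)=0x([0-9a-fA-F]+)", line)
    return {name: int(value, 16) for name, value in pairs}


def is_aligned(address, size=4):
    """True if address is a multiple of size (assumed 4 for a 32-bit access)."""
    return address % size == 0


regs = parse_registers(DUMP)
print(hex(regs["excvaddr"]), is_aligned(regs["excvaddr"]))
```

The address ends in 0x57, so it is odd and cannot be the target of an aligned 16-bit or 32-bit access.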

pfalcon changed the title from "esp8266: webREPL crashed when loaded" to "esp8266: Crashes on concurrent WebREPL access" on Oct 20, 2016
@fisehara
Author

Hi pfalcon,

The WebREPL connections aren't parallel. They happen sequentially, and the error occurs after the second or third transfer.

@pfalcon
Contributor

pfalcon commented Oct 20, 2016

Ok, so can you please provide exact steps required to reproduce the issue, as 1-2-3... ?

@pfalcon
Contributor

pfalcon commented Oct 20, 2016

"I guess it's time to actually add check for that."

Now done in: 3f251ef

@fisehara, to avoid any confusion, please retry with this revision.

@fisehara
Author

@pfalcon I'm sorry for the delay, and that the change didn't solve the problem. Only the number of transfers before the crash has increased. After sending the same file to the ESP for the tenth time, the ESP crashes with:

Fatal exception 9(LoadStoreAlignmentCause):
epc1=0x4025dc4c, epc2=0x00000000, epc3=0x00000000, excvaddr=0x3ffff757, depc=0x00000000

ets Jan 8 2013,rst cause:2, boot mode:(3,6)

load 0x40100000, len 32020, room 16
tail 4
chksum 0x8d
load 0x3ffe8000, len 1084, room 4
tail 8
chksum 0x8d
load 0x3ffe8440, len 3000, room 0
tail 8
chksum 0xf7
csum 0xf7
[unreadable serial output]
#4 ets_task(40100164, 3, 3fff6388, 4)
could not open file 'main.py' for reading

MicroPython v1.8.5-42-gb78144c on 2016-10-24; ESP module with ESP8266
Type "help()" for more information.

@pfalcon
Contributor

pfalcon commented Oct 24, 2016

Thanks. I'd need exact list of steps to reproduce it:

  1. ...
  2. ...
  3. ...

, to avoid any confusion. Thanks.

@fisehara
Author

  1. Power On ESP

  2. connect to the micropython access point via wifi
    Output on ESP:
    MicroPython v1.8.5-42-gb78144c on 2016-10-24; ESP module with ESP8266
    Type "help()" for more information.
    add 1
    aid 1
    station: b4:b6:76:96:2f:a8 join, AID = 1

  3. on ESP:

    import webrepl
    webrepl.start(password='')
    WebREPL daemon started on ws://192.168.4.1:8266
    Started webrepl in normal mode
    while True:
    ... 1+1
    ...
    ...
    2
    2
    2
    2
    2

  4. On host (connected to ESP via wifi) in shell
    4.1 dd if=/dev/zero of=test.data bs=4096 count=1
    4.2 ./webrepl_cli.py test.data 192.168.4.1:/test.data
    dd if=/dev/zero of=test.data bs=4096 count=1
    #1+0 records in
    #1+0 records out
    #4096 bytes (4.1 kB, 4.0 KiB) copied, 0.000237389 s, 17.3 MB/s
    ./webrepl_cli.py test.data 192.168.4.1:/test.data
    #put 192.168.4.1 8266
    #test.data -> /test.data
    #Password:
    #Remote WebREPL version: (1, 8, 5)
    #Sent 4096 of 4096 bytes
    ./webrepl_cli.py test.data 192.168.4.1:/test.data
    #put 192.168.4.1 8266
    #test.data -> /test.data
    #Password:

  5. crash on ESP
    ...
    2
    2
    2
    2
    2
    Fatal exception 9(LoadStoreAlignmentCause):
    epc1=0x4025dc4c, epc2=0x00000000, epc3=0x00000000, excvaddr=0x3ffff757, depc=0x00000000

The number of WebREPL transfers before the crash is not fixed.

Sometimes it breaks after the second transfer, sometimes only after the eighth. So I can't give you a deterministic number of steps, sorry :-(
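
Since the failing transfer number varies, step 4.2 could be repeated from a small host-side script instead of by hand. The following is only a sketch: transfers_until_failure and EXAMPLE_COMMAND are names invented here, the 30-second timeout is an arbitrary guess for detecting a hung transfer, and it assumes that webrepl_cli.py exits with a non-zero status when a transfer fails. The password prompt shown in the transcript above would still have to be answered for each run.

```python
import subprocess
import sys

# Assumed invocation mirroring step 4.2; adjust the script path and address.
EXAMPLE_COMMAND = [sys.executable, "webrepl_cli.py",
                   "test.data", "192.168.4.1:/test.data"]


def transfers_until_failure(command, max_attempts=20, timeout_s=30):
    """Run command repeatedly.

    Returns the 1-based number of the first attempt that exits non-zero
    or exceeds timeout_s, or None if every attempt succeeds.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            result = subprocess.run(command, timeout=timeout_s)
        except subprocess.TimeoutExpired:
            return attempt          # treat a hang as a failed transfer
        if result.returncode != 0:
            return attempt
    return None
```

Calling transfers_until_failure(EXAMPLE_COMMAND) while the loop from step 3 is running on the board would then report on which attempt the transfer first failed.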

@pfalcon
Contributor

pfalcon commented May 14, 2017

Does this still happen? There were no similar reports from anybody else. Please retest.

@dpgeorge
Member

I looked into this issue. I could reproduce it with v1.9.4 of MicroPython. The steps were as described above, and I was able to add a delay to the printing loop like this:

>>> while 1:
...     1+1
...     time.sleep_ms(100)

That would still lead to a crash. But note that that code must be typed in at the normal REPL (and not paste mode!) to reproduce the issue.

The bug turned out to be random values being written to the stack: MP_STATE_VM(dupterm_arr_obj) was not reset to its original values in mp_uos_dupterm_tx_strn() after an exception in the stream's write method, which is raised when webrepl_cli.py disconnects abruptly. The board would crash on the second round because the data was left corrupt by the first round closing abruptly.

The fix is to make sure that dupterm_arr_obj is reset to original values even after an exception. At commit 5042d98 the fix is:

--- a/extmod/uos_dupterm.c
+++ b/extmod/uos_dupterm.c
@@ -94,13 +94,13 @@ void mp_uos_dupterm_tx_strn(const char *str, size_t len) {
         if (MP_STATE_VM(dupterm_objs[idx]) == MP_OBJ_NULL) {
             continue;
         }
+        mp_obj_array_t *arr = MP_OBJ_TO_PTR(MP_STATE_VM(dupterm_arr_obj));
+        void *org_items = arr->items;
         nlr_buf_t nlr;
         if (nlr_push(&nlr) == 0) {
             mp_obj_t write_m[3];
             mp_load_method(MP_STATE_VM(dupterm_objs[idx]), MP_QSTR_write, write_m);
 
-            mp_obj_array_t *arr = MP_OBJ_TO_PTR(MP_STATE_VM(dupterm_arr_obj));
-            void *org_items = arr->items;
             arr->items = (void*)str;
             arr->len = len;
             write_m[2] = MP_STATE_VM(dupterm_arr_obj);
@@ -110,6 +110,8 @@ void mp_uos_dupterm_tx_strn(const char *str, size_t len) {
             arr->len = 1;
             nlr_pop();
         } else {
+            arr->items = org_items;
+            arr->len = 1;
             mp_uos_deactivate(idx, "dupterm: Exception in write() method, deactivating: ", nlr.ret_val);
         }
     }
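
The save-and-restore pattern in the diff above can be modelled in a few lines of Python. This is a simplified illustration, not MicroPython code: SharedArray, failing_write, tx_buggy and tx_fixed are hypothetical stand-ins, and Python cannot reproduce the stale pointer itself, only the shared object being left in its temporary state.

```python
class SharedArray:
    """Stand-in for the shared dupterm_arr_obj: a reusable view onto a buffer."""

    def __init__(self):
        self.items = b"\x00"   # original one-byte backing store
        self.len = 1


def failing_write(arr):
    """Simulates a stream whose write() raises after an abrupt disconnect."""
    raise OSError(9, "EBADF")


def tx_buggy(arr, data, write):
    """Before the fix: state is restored only when write() succeeds."""
    org_items = arr.items
    try:
        arr.items, arr.len = data, len(data)
        write(arr)
        arr.items, arr.len = org_items, 1
    except OSError:
        pass                   # arr still refers to the caller's data


def tx_fixed(arr, data, write):
    """After the fix: state is restored on the exception path as well."""
    org_items = arr.items
    try:
        arr.items, arr.len = data, len(data)
        write(arr)
    except OSError:
        pass
    finally:
        arr.items, arr.len = org_items, 1


buggy, fixed = SharedArray(), SharedArray()
tx_buggy(buggy, b"hello", failing_write)
tx_fixed(fixed, b"hello", failing_write)
print(buggy.len, fixed.len)
```

After the failed write, the buggy variant leaves the shared object describing the caller's five-byte buffer, while the fixed variant puts the original one-byte view back. The C fix restores the fields explicitly in the else branch; finally is used here as the nearest Python equivalent.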

But it's no longer an issue in the latest MicroPython because commit 0359064 removed the need for the dupterm_arr_obj.

In summary: the bug was there but it has been fixed already by 0359064 (so v1.10 is the oldest release with the fix).
