extmod: regex crash due to stack overflow #2451

Closed
B-Laurent opened this issue Sep 23, 2016 · 7 comments

@B-Laurent

Hello,

The following code crashes the board, but it works fine with the Unix port of MicroPython:

import ure

txtmsg = """client connected from ('192.168.0.44', 33714)
GET /?rouge=255&vert=255&bleu=255&time=15 HTTP/1.1
Host: 192.168.0.102
User-Agent: Mozilla/5.0 (X11; Fedora; Linux x86_64; rv:42.0) Gecko/20100101 Firefox/42.0
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Language: fr,fr-FR;q=0.8,en-US;q=0.5,en;q=0.3
Accept-Encoding: gzip, deflate
Referer: http://192.168.0.102/
Connection: keep-alive"""

print(txtmsg)

rouge = ure.search('rouge=(.+)&vert', txtmsg)
print('rouge type: ',type(rouge), rouge.group(1))

vert = ure.search('vert=(.+)&bleu', txtmsg)
print('vert type: ',type(vert), vert.group(1))

bleu = ure.search('bleu=(.+)&time', txtmsg)
print('bleu type: ',type(bleu), bleu.group(1))

time = ure.search('time=(.+)\ HTTP', txtmsg)
print('time type: ',type(time), time.group(1))

The board is a Chinese NodeMCU:

>>> import port_diag
FlashROM:
Flash ID: 1640e0 (Vendor: e0 Device: 4016)
Flash bootloader data:
Byte @2: 00
Byte @3: 40 (Flash size: 4MB Flash freq: 40MHZ)
Firmware checksum:
size: 530228
md5: bc440e0ab9bf219b447b76721de745b3
True

And the MicroPython version is:
MicroPython v1.8.4-38-g34e0198-dirty on 2016-09-20; ESP module with ESP8266

Other tests can be found here:
http://forum.micropython.org/viewtopic.php?f=16&t=2416

LAurent_B

@pfalcon
Contributor

pfalcon commented Sep 23, 2016

Please try to provide a minimal example that reproduces the problem, and also elaborate on what "crash the board" means, e.g. provide the exact output from the module when running the minimal example. This will allow the issue to be investigated sooner rather than later.

@deshipu
Contributor

deshipu commented Sep 23, 2016

I can reproduce it with a slightly shorter program. However, shortening the string makes it stop crashing, so I suppose the length has something to do with it. Interestingly, it doesn't crash right away: I still get the prompt, and then it crashes after a second or so.

MicroPython v1.8.4-55-ge2240d4 on 2016-09-23; ESP module with ESP8266
Type "help()" for more information.
>>> import ure
>>> txtmsg = """client connected from ('192.168.0.44', 33714)
... GET /?rouge=255&vert=255&bleu=255&time=15 HTTP/1.1
... Host: 192.168.0.102
... User-Agent: Mozilla/5.0 (X11; Fedora; Linux x86_64; rv:42.0) Gecko/20100101 Firefox/42.0
... Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
... Accept-Language: fr,fr-FR;q=0.8,en-US;q=0.5,en;q=0.3
... Accept-Encoding: gzip, deflate
... Referer: http://192.168.0.102/
... Connection: keep-alive"""
>>> rouge = ure.search('rouge=(.+)&vert', txtmsg)
>>> 
 ets Jan  8 2013,rst cause:4, boot mode:(3,6)

wdt reset
load 0x40100000, len 30924, room 16 
tail 12
chksum 0xeb
ho 0 tail 12 room 4
load 0x3ffe8000, len 1064, room 12 
tail 12
chksum 0x4a
ho 0 tail 12 room 4
load 0x3ffe8430, len 3000, room 12 
tail 12
chksum 0x55
csum 0x55
ets_task(40100390, 3, 3fff6300, 4)
could not open file 'main.py' for reading

MicroPython v1.8.4-55-ge2240d4 on 2016-09-23; ESP module with ESP8266
Type "help()" for more information.
>>> 

@B-Laurent
Author

Sorry for the missing information. Here are the results of several different tries.

First evaluation: running the script as main.py

  • The board is erased and flashed.
  • The script is copied onto the board with WebREPL:
>>> import os 
>>> os.listdir()
['boot.py', 'port_config.py', 'main.py']
>>> chg_A3:-180

>>> import main.py
client connected from ('192.168.0.44', 33714)
GET /?rouge=255&vert=255&bleu=255&time=15 HTTP/1.1
Host: 192.168.0.102
User-Agent: Mozilla/5.0 (X11; Fedora; Linux x86_64; rv:42.0) Gecko/20100101 Firefox/42.0
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Language: fr,fr-FR;q=0.8,en-US;q=0.5,en;q=0.3
Accept-Encoding: gzip, deflate
Referer: http://192.168.0.102/
Connection: keep-alive

 ets Jan  8 2013,rst cause:4, boot mode:(3,7)

wdt reset
load 0x40100000, len 30924, room 16 
tail 12
chksum 0x82
ho 0 tail 12 room 4
load 0x3ffe8000, len 1064, room 12 
tail 12
chksum 0xbd
ho 0 tail 12 room 4
load 0x3ffe8430, len 3000, room 12 
tail 12
chksum 0xa2
csum 0xa2
ets_task(40100390, 3, 3fff6300, 4)
WebREPL daemon started on ws://192.168.4.1:8266
Started webrepl in normal mode
client connected from ('192.168.0.44', 33714)
GET /?rouge=255&vert=255&bleu=255&time=15 HTTP/1.1
Host: 192.168.0.102
User-Agent: Mozilla/5.0 (X11; Fedora; Linux x86_64; rv:42.0) Gecko/20100101 Firefox/42.0
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Language: fr,fr-FR;q=0.8,en-US;q=0.5,en;q=0.3
Accept-Encoding: gzip, deflate
Referer: http://192.168.0.102/
Connection: keep-alive
rouge type:  <class 'match'> 255
vert type:  <class 'match'> 255
bleu type:  <class 'match'> 255
time type:  <class 'match'> 15

MicroPython v1.8.4-38-g34e0198-dirty on 2016-09-24; ESP module with ESP8266
Type "help()" for more information.
>>> 

After the reboot, the code is executed but the prompt is "frozen".

Second evaluation:

  • The board is erased and flashed with the same firmware.
  • The script is executed line by line in the REPL.
>>> import ure
>>> txtmsg = """client connected from ('192.168.0.44', 33714)
... GET /?rouge=255&vert=255&bleu=255&time=15 HTTP/1.1
... Host: 192.168.0.102
... User-Agent: Mozilla/5.0 (X11; Fedora; Linux x86_64; rv:42.0) Gecko/20100101 Firefox/42.0
... Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
... Accept-Language: fr,fr-FR;q=0.8,en-US;q=0.5,en;q=0.3
... Accept-Encoding: gzip, deflate
... Referer: http://192.168.0.102/
... Connection: keep-alive"""

>>> print (txtmsg)
client connected from ('192.168.0.44', 33714)
GET /?rouge=255&vert=255&bleu=255&time=15 HTTP/1.1
Host: 192.168.0.102
User-Agent: Mozilla/5.0 (X11; Fedora; Linux x86_64; rv:42.0) Gecko/20100101 Firefox/42.0
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Language: fr,fr-FR;q=0.8,en-US;q=0.5,en;q=0.3
Accept-Encoding: gzip, deflate
Referer: http://192.168.0.102/
Connection: keep-alive
>>> chg_A3:-180
rouge = ure.search('rouge=(.+)&vert', txtmsg)

In this case the prompt is "frozen".

Third evaluation:

  • The board is erased and flashed with the same firmware.
  • The script is executed line by line in the REPL.
  • The text is shortened.
>>> import ure
>>> 
>>> txtmsg = """client connected from ('192.168.0.44', 33714)
... GET /?rouge=255&vert=255&bleu=255&time=15 HTTP/1.1"""
>>> 
>>> print(txtmsg)
client connected from ('192.168.0.44', 33714)
GET /?rouge=255&vert=255&bleu=255&time=15 HTTP/1.1
>>> 
>>> rouge = ure.search('rouge=(.+)&vert', txtmsg)
>>> print('rouge type: ',type(rouge), rouge.group(1))
rouge type:  <class 'match'> 255
>>> vert = ure.search('vert=(.+)&bleu', txtmsg)
>>> print('vert type: ',type(vert), vert.group(1))
vert type:  <class 'match'> 255
>>> bleu = ure.search('bleu=(.+)&time', txtmsg)
>>> print('bleu type: ',type(bleu), bleu.group(1))
bleu type:  <class 'match'> 255
>>> time = ure.search('time=(.+)\ HTTP', txtmsg)
>>> print('time type: ',type(time), time.group(1))
time type:  <class 'match'> 15

In this case there is no problem.

I hope this is better, and thanks for your help.
Let me know if more information is needed.

@dpgeorge
Member

Confirmed. It's a stack overflow in re1.5/recursiveloop.c:recursiveloop.

On 64-bit Unix, the recursiveloop call for the above example uses about 20k of stack. This is because there is a lot of text following the match, and the regex engine tries to match against it in search of a longer match than just "255" (the .+ is greedy). If you try a string of similar length but with the "rouge=255&vert" part at the end of the string, it doesn't crash.

@pfalcon this seems like a pretty common thing to use re for, so it would be nice to make it work with minimal stack usage, but it's not immediately obvious how to do that.
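To illustrate why the amount of text after the match matters, here is a toy recursive backtracking matcher in plain Python for the pattern shape prefix(.+)suffix. It is not re1.5's recursiveloop; it is only a sketch under the assumption that each character consumed by the greedy .+ costs one level of recursion, and that '.' matches any character (newlines included, in this toy). The names greedy_dot_plus and search_with_depth are hypothetical.

```python
def greedy_dot_plus(text, pos, suffix, depth, stats):
    """Match '.+' at text[pos:] greedily, followed by the literal `suffix`.

    Returns the end index of the '.+' group, or -1 if there is no match.
    Each character consumed by '.+' costs one more level of recursion,
    like a backtracking engine that recurses on the "take one more" branch.
    """
    stats["max_depth"] = max(stats["max_depth"], depth)
    if pos >= len(text):
        return -1  # '.' needs at least one character
    pos += 1  # '.' consumes one character (any character in this toy)
    # Greedy: first try to extend the group (this recursion is what eats stack)...
    end = greedy_dot_plus(text, pos, suffix, depth + 1, stats)
    if end != -1:
        return end
    # ...and only when that fails, backtrack and try the suffix at this position.
    return pos if text.startswith(suffix, pos) else -1


def search_with_depth(text, prefix, suffix):
    """Toy equivalent of search(prefix + '(.+)' + suffix); returns (group, max_depth)."""
    stats = {"max_depth": 0}
    start = text.find(prefix)
    while start != -1:
        gstart = start + len(prefix)
        end = greedy_dot_plus(text, gstart, suffix, 1, stats)
        if end != -1:
            return text[gstart:end], stats["max_depth"]
        start = text.find(prefix, start + 1)
    return None, stats["max_depth"]


request_line = "GET /?rouge=255&vert=255 HTTP/1.1"
trailer = "\nUser-Agent: " + "x" * 300  # lots of text after the match

print(search_with_depth(request_line + trailer, "rouge=", "&vert"))  # deep recursion
print(search_with_depth(trailer + request_line, "rouge=", "&vert"))  # shallow recursion
```

Both calls capture "255", but the first recurses once per character all the way to the end of the string (a few hundred levels) before backtracking to "&vert", while the second only recurses a couple of dozen levels. That matches the observation that moving "rouge=255&vert" to the end of the string avoids the crash.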

@dpgeorge dpgeorge added the bug label Sep 27, 2016
@dpgeorge
Member

Although it would harm performance, it might be worth putting a call to mp_stack_check() in recursiveloop to catch any other (more complex) errors like this.
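The idea of a stack check is that the recursive routine verifies, on every call, that it still has room, and raises an error instead of overflowing. The sketch below mimics that in plain Python with a depth counter and a hypothetical MAX_DEPTH budget; the real mp_stack_check() compares actual C stack usage against a limit, and the matcher here is a toy prefix(.+)suffix backtracker, not the re1.5 code. The names _greedy_dot_plus and checked_search are hypothetical.

```python
# Hypothetical recursion budget. mp_stack_check() measures real C stack usage
# against a limit; this sketch counts recursion depth instead.
MAX_DEPTH = 100


def _greedy_dot_plus(text, pos, suffix, depth):
    """Toy backtracker for '.+' followed by a literal suffix; returns group end or -1."""
    # The check sits at the top of every recursive call, like a stack check
    # placed at the entry of the recursive matching function.
    if depth > MAX_DEPTH:
        raise RuntimeError("maximum recursion depth exceeded")
    if pos >= len(text):
        return -1  # '.' needs at least one character
    pos += 1  # consume one character with '.'
    end = _greedy_dot_plus(text, pos, suffix, depth + 1)  # greedy: try to extend first
    if end != -1:
        return end
    return pos if text.startswith(suffix, pos) else -1  # backtrack: try the suffix here


def checked_search(text, prefix, suffix):
    """Toy equivalent of search(prefix + '(.+)' + suffix).group(1), or None."""
    start = text.find(prefix)
    while start != -1:
        gstart = start + len(prefix)
        end = _greedy_dot_plus(text, gstart, suffix, 1)
        if end != -1:
            return text[gstart:end]
        start = text.find(prefix, start + 1)
    return None


line = "GET /?rouge=255&vert=255 HTTP/1.1"
print(checked_search(line, "rouge=", "&vert"))  # prints 255
try:
    checked_search(line + "\n" + "x" * 200, "rouge=", "&vert")
except RuntimeError as e:
    print("error:", e)  # prints error: maximum recursion depth exceeded
```

The performance cost mentioned above is visible here: the check runs on every recursive call, even for inputs that would never come close to the limit.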

@dpgeorge dpgeorge changed the title from "regex crach esp8266." to "extmod: regex crash due to stack overflow" Sep 27, 2016
@pfalcon
Contributor

pfalcon commented Sep 27, 2016

@dpgeorge: Sure, such a need was always anticipated; it's just that re1.5 is an independent library, so such a stack check needs to be added in a reusable/configurable way, which requires some consideration. Now that there are actual reports, I'll look into it.

But then it will just error out instead of crashing; it still won't work. So my suggestion to people experiencing this is to use a shorter subject string (in this case, don't match against the whole HTTP request header, but match it line by line).
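Following that suggestion, a minimal sketch that first picks out the request line and only runs the original patterns from the report against it. It assumes the query string is always on a line starting with "GET "; the helper names request_line, parse_colours and PATTERNS are hypothetical, and the import fallback assumes the module is named ure on the board (as in this report) and re elsewhere.

```python
try:
    import ure as re  # module name on MicroPython at the time of this issue
except ImportError:
    import re  # CPython (and later MicroPython releases)

# Same patterns as in the original report: name -> pattern with one capture group.
PATTERNS = {
    "rouge": "rouge=(.+)&vert",
    "vert": "vert=(.+)&bleu",
    "bleu": "bleu=(.+)&time",
    "time": "time=(.+) HTTP",
}


def request_line(request):
    """Return the HTTP request line (e.g. 'GET /?... HTTP/1.1'), or '' if absent."""
    for line in request.split("\n"):
        if line.startswith("GET "):
            return line
    return ""


def parse_colours(request):
    # Search only the short request line instead of the whole request, so the
    # greedy '.+' has very little trailing text to backtrack over.
    line = request_line(request)
    values = {}
    for name, pattern in PATTERNS.items():
        m = re.search(pattern, line)
        if m:
            values[name] = m.group(1)
    return values
```

Real HTTP requests end lines with "\r\n"; splitting on "\n" leaves a trailing "\r" on each line, which does not affect these particular patterns.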

@dpgeorge
Member

Stack checking in the regex engine was added in aba1f91.

I can confirm that on esp8266 the original code snippet above now raises "RuntimeError: maximum recursion depth exceeded".
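Since the overflow now surfaces as an exception, application code can catch it. A minimal sketch, assuming the wanted match fits on a single line so that retrying the search line by line is a valid fallback; search_first_group is a hypothetical helper, and the import fallback assumes the module is named ure on the board and re elsewhere.

```python
try:
    import ure as re  # MicroPython at the time of this issue
except ImportError:
    import re  # CPython and later MicroPython releases


def search_first_group(pattern, text):
    """Return group(1) of the first match of `pattern` in `text`, or None.

    With the stack check in place, a too-deep regex search raises RuntimeError
    instead of resetting the board. As a fallback (assuming the wanted match
    fits on one line), retry the search on each line separately.
    """
    try:
        m = re.search(pattern, text)
    except RuntimeError:
        m = None
        for line in text.split("\n"):
            m = re.search(pattern, line)
            if m:
                break
    return m.group(1) if m else None
```

On CPython the fallback path is never taken for this input; it only matters on builds where the search hits the recursion limit.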
