Creating too many qstr's leads to large memory use #2280

Open
peterhinch opened this Issue Jul 31, 2016 · 6 comments

@peterhinch
Contributor

This happens on the unix port and also on the Pyboard and ESP8266. Pasted at the REPL:

import pickle, gc, micropython
gc.collect()
micropython.mem_info()
count = 0
def bar():
    global count
    s = pickle.dumps(('result', str(count)))
    count += 1
    return pickle.loads(s)
for _ in range(1000):
    a = bar()
gc.collect()
micropython.mem_info()

This produces the following output, with the GC used figure climbing from 4064 to 21248 bytes:

mem: total=10303, current=3127, peak=4908
stack: 4992 out of 80000
GC: total: 2072832, used: 4064, free: 2068768
 No. of 1-blocks: 70, 2-blocks: 10, max blk sz: 9
12997
mem: total=2260180, current=376780, peak=378293
stack: 4992 out of 80000
GC: total: 2072832, used: 21248, free: 2051584
 No. of 1-blocks: 74, 2-blocks: 10, max blk sz: 161
>>> 
@peterhinch
Contributor
peterhinch commented Aug 1, 2016

I've narrowed this down a little further. Replacing pickle with json in the above code removes the leak. The following is the minimal test case I've found that still leaks; I suspect the exec statement.

import gc, micropython
gc.collect()
micropython.mem_info()
count = 0
def bar():
    global count
    count += 1
    s = repr(('result', str(count))) # pickle.dumps
    d = {} # pickle.loads.
    exec("v=" + s, d) #  Leak seems to be here
for _ in range(1000):
    bar()
gc.collect()
micropython.mem_info()

It only leaks if s changes on each iteration; the counter is necessary. While looking for a workaround I tried the following, which may offer a clue to the cause:

import gc, micropython
gc.collect()
micropython.mem_info()
count = 0
def bar():
    global count
    count += 1
    s = repr(('result', str(count))) # pickle.dumps
    d = {}
    bytecode = compile("v=" + s, '<string>', 'exec') # leak here
#    exec(bytecode, d)
for _ in range(1000):
    bar()
gc.collect()
micropython.mem_info()

Commenting out the exec statement made no difference, so the leak occurs in the compile call. Using eval instead also leaks:

import gc, micropython
gc.collect()
micropython.mem_info()
count = 0
def bar():
    global count
    count += 1
    s = repr(('result', str(count))) # pickle.dumps
    return eval(s)
for _ in range(1000):
    a = bar()
gc.collect()
micropython.mem_info()

This produced:

mem: total=5613, current=2007, peak=4483
stack: 4992 out of 80000
GC: total: 2072832, used: 3008, free: 2069824
 No. of 1-blocks: 51, 2-blocks: 10, max blk sz: 8
8996
mem: total=2143665, current=255899, peak=257448
stack: 4992 out of 80000
GC: total: 2072832, used: 20352, free: 2052480
 No. of 1-blocks: 56, 2-blocks: 9, max blk sz: 161
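The two pairs of reports can be reduced to per-call growth figures. This snippet is only arithmetic on the numbers pasted above (the dictionary layout and names are illustrative, not part of any MicroPython API). It shows that GC used grew by roughly 17 bytes per call in both the pickle run and the eval run, while the mem: current figure grew by a few hundred bytes per call.

```python
# Numbers copied from the mem_info() reports above: (before, after) 1000 calls.
# The dictionary layout and names are illustrative only.
CALLS = 1000
reports = {
    "pickle": {"gc_used": (4064, 21248), "mem_current": (3127, 376780)},
    "eval": {"gc_used": (3008, 20352), "mem_current": (2007, 255899)},
}

growth = {}
for run, fields in reports.items():
    for field, (before, after) in fields.items():
        # Total growth over the whole loop, then averaged per call.
        growth[(run, field)] = after - before
        print("%s %s: +%d bytes, %.1f bytes per call"
              % (run, field, after - before, (after - before) / CALLS))
```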
@dpgeorge
Contributor
dpgeorge commented Aug 4, 2016

Confirmed. The increased memory usage is caused by string interning. The string you are evaluating is something like v=('result', '123'), with the 123 changing on each iteration. The '123' is a small string that the parser interns (stores as a qstr) when it parses the code you give to eval/exec/compile. Interned strings are never freed, so as the iterations go on, the strings interned by previous runs accumulate and eventually you run out of RAM.
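The mechanism can be modelled in a few lines of ordinary Python. This is a toy sketch, not MicroPython's C implementation: `InternPool`, `toy_parse` and the `SHORT_LIMIT` length cutoff are hypothetical stand-ins for the qstr pool and the parser. The only behaviour carried over from the explanation is that short string literals are interned and entries are never removed. It reproduces the observation that the pool only grows when the evaluated text changes from call to call.

```python
SHORT_LIMIT = 10  # hypothetical cutoff: only literals shorter than this are interned


class InternPool:
    """Toy append-only intern table; entries are added but never removed."""

    def __init__(self):
        self._table = {}  # text -> the single shared copy of that text

    def intern(self, text):
        # Reuse the stored copy if there is one, otherwise keep this one forever.
        return self._table.setdefault(text, text)

    def __len__(self):
        return len(self._table)


def toy_parse(source, pool):
    """Hypothetical parser stand-in: intern each short single-quoted literal."""
    # Splitting on quotes puts every literal at an odd index (escapes not handled).
    for literal in source.split("'")[1::2]:
        if len(literal) < SHORT_LIMIT:
            pool.intern(literal)


changing = InternPool()
for count in range(1000):
    toy_parse("v=" + repr(('result', str(count))), changing)

constant = InternPool()
for _ in range(1000):
    toy_parse("v=" + repr(('result', '123')), constant)

print(len(changing))  # 1001: 'result' plus one entry per distinct count
print(len(constant))  # 2: 'result' and '123'
```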

A work-around for the above script is to replace str(count) with count, since then it's not a string but an integer. But that doesn't help the general case.
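A sketch of the work-around, written so that it runs on CPython as well as MicroPython. `bar_int` applies the suggestion of embedding the integer rather than `str(count)`. `bar_json` is an alternative based on Peter's earlier observation that serialising with json instead of pickle did not leak; it avoids evaluating source text altogether, but json returns the tuple as a list. Both function names are hypothetical, and older MicroPython builds may provide the module as `ujson` rather than `json`.

```python
import json

count = 0


def bar_int():
    """Work-around: put the integer itself in the evaluated text."""
    global count
    count += 1
    s = repr(('result', count))  # e.g. "('result', 1)": the number is not a string literal
    return eval(s)


def bar_json():
    """Alternative: round-trip the data through json instead of eval/exec."""
    global count
    count += 1
    s = json.dumps(('result', str(count)))
    return json.loads(s)  # json has no tuple type, so this comes back as a list


print(bar_int())   # ('result', 1)
print(bar_json())  # ['result', '2']
```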

@deshipu
Contributor
deshipu commented Aug 4, 2016

I wonder if string interning could be limited to just string literals that appear in the programs? Would that make sense?

@dpgeorge
Contributor
dpgeorge commented Aug 4, 2016

> I wonder if string interning could be limited to just string literals that appear in the programs?

As far as the parser is concerned, the input from a file is equivalent to the input from exec. So there would need to be extra logic to distinguish these.

@deshipu
Contributor
deshipu commented Aug 4, 2016

Ah, of course you are right: in this case it is actually a string literal being evaluated. I missed that, sorry.

@peterhinch
Contributor

Thanks for that. I'll see whether I can apply the workaround in my application.

pfalcon changed the title from "unix memory leak" to "Creating too many qstr's leads to large memory use" Aug 9, 2016