wrong results when calling function twice #9

fabianp opened this Issue May 15, 2012 · 19 comments

Owner

fabianp commented May 15, 2012

Running test/test_func.py gives:


Line #    Mem usage  Increment   Line Contents
==============================================
     2                           @profile
     3                           def test_1():
     4      7.73 MB    0.00 MB       # .. will be called twice ..
     5      7.73 MB    0.00 MB       a = 2.
     6      7.73 MB    0.00 MB       b = 3
     7      7.61 MB   -0.12 MB       c = {}
     8      7.80 MB    0.19 MB       for i in range(1000):
     9      7.80 MB    0.00 MB           c[i] = 2
    10      7.80 MB    0.00 MB       c[0] = 2.

but the correct results should be the same as when calling the function once:


Line #    Mem usage  Increment   Line Contents
==============================================
     2                           @profile
     3                           def test_1():
     4      7.57 MB    0.00 MB       # .. will be called twice ..
     5      7.59 MB    0.02 MB       a = 2.
     6      7.59 MB    0.00 MB       b = 3
     7      7.59 MB    0.00 MB       c = {}
     8      7.73 MB    0.14 MB       for i in range(1000):
     9      7.73 MB    0.00 MB           c[i] = 2
    10      7.73 MB    0.00 MB       c[0] = 2.
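
For reference, the profiled function (its body is visible in the line contents above), with a driver that calls it twice - roughly how test/test_func.py is set up, though the actual test harness may differ:

from memory_profiler import profile

@profile
def test_1():
    # .. will be called twice ..
    a = 2.
    b = 3
    c = {}
    for i in range(1000):
        c[i] = 2
    c[0] = 2.

if __name__ == '__main__':
    test_1()
    test_1()  # the second run is where the shifted/negative increments show up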

I can confirm this behaviour. However, I can't see how to solve this: either we just use the first run, or we show both runs? Showing only the second run is probably not the best option, though...

sjagoe commented May 15, 2013

There are cases where I wanted to see something like average (or total) increase in memory caused by a function that is called many times. I ended up hacking the code to dump each call's output to a file, and then manually trawled the file for the information I was looking for.

I think making it work for multiple calls would make memory_profiler far more useful. Two options to consider:

  • Show the total change per line (along with number of times the line was executed)
  • Show the average change per line, again along with the number of times it was executed.

For averaging, use the number of times a line was executed rather than the number of times the function was called.
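
Roughly the kind of bookkeeping I have in mind (just a sketch; the class and method names here are made up, not memory_profiler's API):

from collections import defaultdict

class LineStats:
    # Accumulate per-line memory increments over repeated executions.
    def __init__(self):
        # lineno -> [total increment in MiB, number of executions]
        self._stats = defaultdict(lambda: [0.0, 0])

    def record(self, lineno, increment_mib):
        entry = self._stats[lineno]
        entry[0] += increment_mib
        entry[1] += 1

    def report(self):
        for lineno, (total, count) in sorted(self._stats.items()):
            avg = total / count
            print("line %4d: total %8.2f MiB, avg %8.2f MiB over %d executions"
                  % (lineno, total, avg, count))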

bmu commented Oct 11, 2013

I was proposing memory_profiler on this stackoverflow question, and this issue came up.

Do you have any idea to what extent the results may be influenced, and when this bug will be fixed?

@bmu, the thing you have to remember about memory_profiler is that it just reads the /proc/#procid#/mem file before and after the operation, so you can't actually guarantee that it's measuring only the memory consumption of your command. Maybe Python does something in the background at the same time, which will influence the memory reading...
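
For illustration, the measurement boils down to something like this (a minimal sketch using psutil rather than reading /proc directly; the helper name is mine, not memory_profiler's):

import os
import psutil

def rss_mib():
    # resident set size of the current process, in MiB
    return psutil.Process(os.getpid()).memory_info().rss / 2. ** 20

before = rss_mib()
c = dict((i, 2) for i in range(1000))   # the operation being measured
after = rss_mib()
print("increment: %.2f MiB" % (after - before))

Anything else the interpreter allocates or frees between the two readings ends up in that difference.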

Owner

fabianp commented Oct 11, 2013

I'm with @mraspaud on this. At least for this example, the results seem to stabilize:

In [1]: import numpy as np

In [2]: %load_ext memory_profiler

In [3]: for i in range(20):
   ...:     %memit np.zeros(1e7)
   ...: 
maximum of 1: 28.097656 MiB per loop
maximum of 1: 28.156250 MiB per loop
maximum of 1: 28.156250 MiB per loop
maximum of 1: 28.156250 MiB per loop
maximum of 1: 28.156250 MiB per loop
maximum of 1: 28.156250 MiB per loop
maximum of 1: 28.156250 MiB per loop
maximum of 1: 28.156250 MiB per loop
maximum of 1: 28.156250 MiB per loop
maximum of 1: 28.156250 MiB per loop
maximum of 1: 28.156250 MiB per loop
maximum of 1: 28.156250 MiB per loop
maximum of 1: 28.156250 MiB per loop
maximum of 1: 28.156250 MiB per loop
maximum of 1: 28.156250 MiB per loop
maximum of 1: 28.156250 MiB per loop
maximum of 1: 28.156250 MiB per loop
maximum of 1: 28.156250 MiB per loop
maximum of 1: 28.156250 MiB per loop
maximum of 1: 28.156250 MiB per loop

suggesting that either the results given by the OS are not 100% precise, or that there is something going on in the background (garbage collector?), or that the behavior of malloc can vary between runs. And here I am way out of my comfort zone and probably saying stupid stuff, so if anyone could shed light on this it would be of great help :-)

Contributor

ianozsvald commented Oct 24, 2013

Hi chaps. I have a related problem. In my case, using %memit, I'm not seeing the size of temporary objects; I'm seeing the size of the Python process. I was expecting to see the delta in the Python RSS size when memit is run (and if no new RAM is allocated, then the result would be 0). I'm trying the same example as Fabian.

IPython 1.1.0 -- An enhanced Interactive Python.
...
In [1]: import numpy as np
In [2]: %load_ext memory_profiler
# Calling memit by itself reports the size of RSS (resident) RAM for this process 
#(this agrees with using "$ ps" at the command line)
In [3]: %memit
maximum of 1: 26.140625 MiB per loop

# try 1 million elements
In [5]: %memit np.ones(1e6)
maximum of 1: 26.199219 MiB per loop
In [6]: %memit np.ones(1e6)
maximum of 1: 26.238281 MiB per loop
In [7]: %memit np.ones(1e6)
maximum of 1: 33.851562 MiB per loop
In [8]: %memit np.ones(1e6)
... # stabilises at 33.851562MB which is the same RSS size for this process using "$ ps"
# presumably RAM isn't being returned to the OS, so we're not really seeing any 
#allocation size for our larger object?

# now try the same with 10 million elements (1e7) - note: no difference from the 1e6 case!
In [11]: %memit np.ones(1e7)
maximum of 1: 33.851562 MiB per loop
In [12]: %memit np.ones(1e7)
maximum of 1: 33.851562 MiB per loop
In [13]: %memit np.ones(1e7)
maximum of 1: 33.859375 MiB per loop
... # it stabilises at this figure

If I then exit and restart a fresh IPython:

In [3]: %memit np.ones(1e7)
maximum of 1: 26.292969 MiB per loop
In [4]: %memit np.ones(1e7)
maximum of 1: 26.355469 MiB per loop
... # it stabilises with this value

I'm using the current pypi version (0.29), 64bit Ubuntu, latest IPython in a clean virtualenv (IPython, memory_profiler, psutil, nothing else).

Looking at memory_profiler.py:memory_usage it looks as though the option for a tuple (used by memit) just records the current process' RSS usage, rather than generating a delta before and after stmt is run through exec. Am I reading the code correctly? I'm looking at this section: if isinstance(proc, (list, tuple)):

Shouldn't memit's behaviour be to show the delta, not the process' RSS usage?

This doesn't help figure out the memory usage of small objects (it looks on my system like the RAM is not returned to the OS for the 1e7 example above).

Trying a larger example with np.ones(1e8): if I use $ ps in a loop in the shell, I can see that the Python process grows to 800MB, then drops back to 28MB. memit reports 717MB allocated, yet at its peak the process grew by 782MB. Maybe this discrepancy comes from the use of a timer that's observing snapshots?

If I try %memit x=np.ones(1e8) so the result is retained then memit reports 784MB and the RSS stays at 810MB (as seen by ps).

I think what I expected to see with %memit was the delta before and after the statement, rather than the RSS size of the process. Does anyone else agree that this is a sensible measurement to provide? It is late and maybe I've read the code incorrectly; do say if I'm being dense.

I'm trying to use memit in my book to explain how to understand object allocation in Python and it turns out to be far trickier than expected :-)

Owner

fabianp commented Oct 25, 2013

Hi Ian,

I've changed the output of %memit to also report the delta:

In [3]: %memit np.ones((1e8))
peak memory: 640.60 MiB, increment: 612.11 MiB

I've coupled it with a strategy to decrease the sampling interval for fast functions, which should make it more robust. Please check it out and tell me what you think.
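
Conceptually the new %memit does something like the following (a simplified sketch, not the actual implementation, which organises the sampling differently and adapts the interval):

import os
import threading
import time

import psutil

def peak_and_increment(func, interval=0.001):
    # Run func while sampling the process RSS; return (peak, increment) in MiB.
    proc = psutil.Process(os.getpid())
    baseline = proc.memory_info().rss / 2. ** 20
    samples = [baseline]
    done = threading.Event()

    def sampler():
        while not done.is_set():
            samples.append(proc.memory_info().rss / 2. ** 20)
            time.sleep(interval)

    t = threading.Thread(target=sampler)
    t.start()
    try:
        func()
    finally:
        done.set()
        t.join()
    peak = max(samples)
    return peak, peak - baseline

import numpy as np
print(peak_and_increment(lambda: np.ones(int(1e8))))  # (peak_MiB, increment_MiB)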

Contributor

ianozsvald commented Oct 25, 2013

Excellent update! Two examples below. There is a slight variance, but I can see the average result (and -r does that for us). Very cool fix :-)

In [56]: %memit [0]*int(1e7)
peak memory: 111.13 MiB, increment: 76.07 MiB

In [57]: %memit [0]*int(1e7)
peak memory: 111.12 MiB, increment: 76.05 MiB

In [58]: %memit [0]*int(1e8)
peak memory: 797.76 MiB, increment: 762.70 MiB

In [59]: %memit [0]*int(1e8)
peak memory: 798.00 MiB, increment: 762.94 MiB

Owner

fabianp commented Oct 25, 2013

Great, thanks to you for reporting the problem.

jamespjh commented Jun 3, 2014

For me, using Python 2.7.5, IPython 2.1.0, on OSX 10.9, with memory_profiler 0.31, the example quoted by @fabianp above now gives zero memory for all but the first output:

In [1]: %load_ext memory_profiler
In [2]: import numpy
In [3]: for i in range(20):
   ...:     %memit numpy.zeros(1e7)
   ...:
peak memory: 103.51 MiB, increment: 76.35 MiB
peak memory: 103.52 MiB, increment: 0.00 MiB
peak memory: 103.52 MiB, increment: 0.00 MiB
peak memory: 103.52 MiB, increment: 0.00 MiB
peak memory: 103.53 MiB, increment: 0.01 MiB
peak memory: 103.53 MiB, increment: 0.00 MiB
peak memory: 103.53 MiB, increment: 0.00 MiB
...

I thought this might be due to things getting left behind by IPython, or memory being kept alive in numpy, but even a full %reset still leaves it at zero:

In [1]: %load_ext memory_profiler
In [2]: import numpy
In [3]: %memit numpy.zeros(1e4)
peak memory: 27.66 MiB, increment: 0.22 MiB
In [4]: %memit numpy.zeros(1e4)
peak memory: 27.74 MiB, increment: 0.07 MiB
In [5]: %memit numpy.zeros(1e4)
peak memory: 27.74 MiB, increment: 0.00 MiB
In [6]: %memit numpy.zeros(1e4)
peak memory: 27.74 MiB, increment: 0.00 MiB
In [7]: %memit numpy.zeros(1e5)
peak memory: 28.51 MiB, increment: 0.77 MiB
In [8]: %memit numpy.zeros(1e5)
peak memory: 28.51 MiB, increment: 0.00 MiB
In [9]: %memit numpy.zeros(1e5)
peak memory: 28.51 MiB, increment: 0.00 MiB
In [10]: %reset
In [11]: import numpy
In [12]: %memit numpy.zeros(1e5)
peak memory: 28.86 MiB, increment: 0.00 MiB
In [13]: %memit numpy.zeros(1e4)
peak memory: 28.86 MiB, increment: 0.00 MiB

Any thoughts?

(I'm hoping to use %memit while teaching to illustrate why iterators are good, but my numpy examples are just showing zero memory.)

Contributor

ianozsvald commented Jun 4, 2014

Memory usage may not shrink; that's under the control of the OS, not Python. Try using a much larger allocation - when I was teaching PhDs using memory_profiler last week I was using np.ones(1e8) for a consistent 745MB increase.
Note that using np.zeros often shows no memory allocation - I don't know why this is (you can access the data block); I wondered if it might be a lazy allocation that memit didn't observe. Hence I use np.ones - that works as expected.
Remind your students not to allocate small structures, or memit won't see the memory change, and to quit their ipython session and restart with fresh memory to verify their observations.
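
If you want per-call numbers outside IPython (e.g. in an exercise script), memory_usage can take a (callable, args, kwargs) tuple - a quick sketch, so check the signature of the version you have installed:

import numpy as np
from memory_profiler import memory_usage

def make_big_array():
    return np.ones(int(1e8))   # 1e8 float64 values, ~763 MiB

# memory samples (in MiB) of this process while make_big_array runs
samples = memory_usage((make_big_array, (), {}), interval=0.01)
print("baseline %.1f MiB, peak %.1f MiB, increment %.1f MiB"
      % (samples[0], max(samples), max(samples) - samples[0]))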

Owner

fabianp commented Jun 4, 2014

Thanks @ianozsvald, those are really interesting observations! I will put them in the FAQ if you don't mind :-)

mraspaud commented Jun 4, 2014

Isn't it the case, @jamespjh, that in your example the garbage collector is freeing the memory between each call, since the arrays are not assigned to a variable?

Here's what I get when assigning (from In [10] below):

In [1]: %load_ext memory_profiler

In [2]: import numpy

In [3]: %memit numpy.zeros(1e4)
/home/a001673/.local/bin/ipython:1: DeprecationWarning: using a non-integer number instead of an integer will result in an error in the future
  #!/usr/bin/python
peak memory: 26.25 MiB, increment: 0.16 MiB

In [4]: %memit numpy.zeros(1e4)
peak memory: 26.27 MiB, increment: 0.00 MiB

In [5]: %memit numpy.zeros(1e4)
peak memory: 26.27 MiB, increment: 0.00 MiB

In [6]: %memit numpy.zeros(1e4)
peak memory: 26.28 MiB, increment: 0.01 MiB

In [7]: %memit numpy.zeros(1e4)
peak memory: 26.28 MiB, increment: 0.00 MiB

In [8]: %memit numpy.zeros(1e4)
peak memory: 26.29 MiB, increment: 0.00 MiB

In [9]: %memit numpy.zeros(1e4)
peak memory: 26.30 MiB, increment: 0.00 MiB

In [10]: %memit a=numpy.zeros(1e4)
peak memory: 26.38 MiB, increment: 0.09 MiB

In [11]: %memit b=numpy.zeros(1e4)
peak memory: 26.46 MiB, increment: 0.07 MiB

In [12]: %memit c=numpy.zeros(1e4)
peak memory: 26.54 MiB, increment: 0.08 MiB

Can somebody elaborate on how peak memory is different from increment in a one-liner like:

%memit np.ones(1e7)
peak memory: 99.16 MiB, increment: 70.30 MiB

Owner

fabianp commented Aug 4, 2015

Hi,

note that memory_profiler returns the total memory used by the process, which includes the interpreter and the libraries loaded. For example, the IPython interpreter by itself already uses around 30 MiB:

In [1]: %load_ext memory_profiler

In [2]: import numpy as np

In [3]: %memit
peak memory: 35.33 MiB, increment: 0.30 MiB

Hence peak memory would be the size of np.ones(1e7) plus all the memory taken by the interpreter: ~70 + ~30 = ~100 MiB.
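
The array size itself is easy to check:

import numpy as np

a = np.ones(int(1e7))       # 10 million float64 values
print(a.nbytes)             # 80000000 bytes
print(a.nbytes / 2. ** 20)  # ~76.3 MiB, roughly the increment reported above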

@fabianp, thanks a lot for the explanation!

One more question, if I may?
I've seen some guesses above as to why:

%memit np.zeros()

does not report meaningful results. E.g.:

%memit np.zeros(1e6)
peak memory: 35.86 MiB, increment: 6.96 MiB
%memit np.zeros(1e7)
peak memory: 28.93 MiB, increment: 0.00 MiB
%memit np.zeros(1e8)
peak memory: 28.93 MiB, increment: 0.00 MiB
%memit np.zeros(1e9)
peak memory: 28.93 MiB, increment: 0.00 MiB
%memit np.zeros(1e10)
MemoryError: 

So, is there a conclusive explanation of why np.zeros() is so special from the perspective of %memit?

Contributor

ianozsvald commented Aug 4, 2015

That's due to lazy memory allocation on Linux (and maybe Mac?): https://stackoverflow.com/questions/27574881/why-does-numpy-zeros-takes-up-little-space
I asked the same question recently: https://twitter.com/ianozsvald/status/609337474565939200 (see answers in the comments)

@ianozsvald, can you explain, please, why "lazy memory allocation" treats np.zeros() and np.ones() differently:

%memit np.zeros(1e7)
peak memory: 28.93 MiB, increment: 0.00 MiB
np.zeros(1e7).nbytes
80000000

%memit np.ones(1e7)
peak memory: 97.19 MiB, increment: 68.26 MiB
np.ones(1e7).nbytes
80000000

Contributor

ianozsvald commented Aug 4, 2015

@sbushmanov When you calloc the memory, Linux won't touch it (since you've asked for 0s) until you touch the array (e.g. by assigning to it, and maybe(?) by reading from it). If you use ones, it'll write the 1s immediately. This is just the way the OS is written. I guess there's a performance benefit for some non-math cases; for us with numpy it is just a bit odd, and you only notice it if you're doing this kind of profiling.
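
You can see the effect directly by touching the pages (a quick sketch with psutil; exact numbers will vary, and this is Linux behaviour):

import os

import numpy as np
import psutil

def rss_mib():
    return psutil.Process(os.getpid()).memory_info().rss / 2. ** 20

base = rss_mib()

z = np.zeros(int(1e7))   # calloc'd: pages not yet backed by physical RAM
print("after zeros:    +%.1f MiB" % (rss_mib() - base))   # typically ~0 MiB

z[:] = 1                 # writing every page forces the OS to commit the memory
print("after touching: +%.1f MiB" % (rss_mib() - base))   # ~76 MiB

o = np.ones(int(1e7))    # written immediately, so it shows up straight away
print("after ones:     +%.1f MiB" % (rss_mib() - base))   # roughly another ~76 MiB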
