Potential memory leak #1014

Closed · chongchenmath opened this issue May 31, 2017 · 24 comments

@chongchenmath

There seems to be a potential memory leak somewhere in odl. When running the code from @davlars' repo to make reconstructions on simulated 3D data, the used RAM increases continuously. Trying a TV reconstruction with the Chambolle-Pock solver, we (@aringh and I) get the following memory usage over the iterations:

iter = 0
CPU usage: [11.6, 12.5, 49.4, 11.5, 9.2, 9.2, 56.5, 8.8]
RAM usage: svmem(total=67514511360, available=27549761536, percent=59.2, used=39263768576, free=20183449600, active=46021120000, inactive=513998848, buffers=47173632, cached=8020119552, shared=38600704)
SWAP usage: sswap(total=4294963200, used=2080911360, free=2214051840, percent=48.5, sin=28048973824, sout=45718224896)

[...]

iter = 100
CPU usage: [19.8, 19.1, 17.1, 31.8, 13.6, 14.0, 12.8, 75.2]
RAM usage: svmem(total=67514511360, available=25842774016, percent=61.7, used=40964677632, free=18139529216, active=48008773632, inactive=556429312, buffers=60911616, cached=8349392896, shared=39292928)
SWAP usage: sswap(total=4294963200, used=2048077824, free=2246885376, percent=47.7, sin=28079247360, sout=45718224896)

[...]

iter = 500
CPU usage: [16.5, 19.1, 25.1, 45.3, 14.7, 14.1, 13.9, 61.1]
RAM usage: svmem(total=67514511360, available=18392444928, percent=72.8, used=48325439488, free=11604901888, active=50951569408, inactive=3880685568, buffers=491245568, cached=7092924416, shared=43077632)
SWAP usage: sswap(total=4294963200, used=1929170944, free=2365792256, percent=44.9, sin=28226437120, sout=45718732800)

As can be seen, the percentage of used RAM increases from 59.2 to 72.8 over the iterations.

We do not yet know where the leak is, and it needs further investigation. Can anyone suggest a good way to do it?

The code used to produce the above printouts is the following (note, however, that as-is it takes several hours to run 500 iterations):

import odl
import numpy as np
import os
import adutils
import sys
import psutil

saveReco = True
savePath = '/home/chchen/dev/ad-skull-reconstruction/reco/'


# Create a text file and write everything both in terminal and in file
if not os.path.exists(savePath):
    os.makedirs(savePath)

output_filename = 'Terminal_output_TV.txt'

class Logger(object):
    def __init__(self, filename="Default.log"):
        self.terminal = sys.stdout
        self.log = open(filename, "a")
 
    def write(self, message):
        self.terminal.write(message)
        self.log.write(message)
 
    def flush(self):
        self.terminal.flush()
        self.log.flush()

sys.stdout = Logger(savePath + output_filename)


# Discretization
space = adutils.get_discretization()

# Forward operator (in the form of a broadcast operator)
A = adutils.get_ray_trafo(space)

# Data
rhs = adutils.get_data(A)

# Construct operators and functionals
gradient = odl.Gradient(space)

# Column vector of operators
op = odl.BroadcastOperator(A, gradient)

Anorm = odl.power_method_opnorm(A, maxiter=2)
Dnorm = odl.power_method_opnorm(gradient,
                                xstart=odl.phantom.white_noise(gradient.domain),
                                maxiter=10)

# Estimated operator norm, add 10 percent
op_norm = 1.1 * np.sqrt(len(A.operators)*(Anorm**2) + Dnorm**2)

print('Norm of the product space operator: {}'.format(op_norm))

lamb = 0.005 

# l2-squared data matching
l2_norm = odl.solvers.L2NormSquared(A.range).translated(rhs)

# Isotropic TV-regularization i.e. the l1-norm
l1_norm = lamb * odl.solvers.L1Norm(gradient.range)

# Combine functionals
f = odl.solvers.SeparableSum(l2_norm, l1_norm)

# Set g functional to zero
g = odl.solvers.ZeroFunctional(op.domain)

# Acceleration parameter
gamma = 0.4

# Step size for the proximal operator for the primal variable x
tau = 1.0 / op_norm

# Step size for the proximal operator for the dual variable y
sigma = 1.0 / op_norm  # 1.0 / (op_norm ** 2 * tau)

# Use initial guess
x = A.domain.zero()

# Callback for the solver that prints both the iteration number and CPU/RAM usage
class CallbackMycallback(odl.solvers.util.callback.SolverCallback):

    def __init__(self):
        self.print_iter_callback = odl.solvers.CallbackPrintIteration()

    def __call__(self, x):
        self.print_iter_callback(x)
        print('CPU usage: {}'.format(psutil.cpu_percent(percpu=True)))
        print('RAM usage: {}'.format(psutil.virtual_memory()))
        print('SWAP usage: {}'.format(psutil.swap_memory()))

    def reset(self):
        self.print_iter_callback.reset()

callback_print_iter = CallbackMycallback()

# Run the solver such that the last iterate is saved (saveReco = True) or not (saveReco = False)
niter = 1000
odl.solvers.chambolle_pock_solver(x, f, g, op, tau=tau, sigma=sigma,
                                  niter=niter, gamma=gamma,
                                  callback=callback_print_iter)

# Close the log-file and set standard output to terminal
sys.stdout.log.close()
sys.stdout = sys.stdout.terminal
@adler-j

adler-j commented May 31, 2017

We do not yet know where the leak is, and it needs further investigation. Can anyone suggest a good way to do it?

I know that there is a tool (that I currently cannot find, since Google is blocked in China) that allows you to list all objects currently in Python memory. It also has functionality to only list those objects that are "new" since some checkpoint.

You could use this to see what new stuff is being allocated in each iteration and check whether anything looks wrong there.
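
For reference, a rough version of that checkpoint-diff idea can be done with only the standard library, by counting live objects per type with gc and comparing snapshots. This is just a sketch; a dedicated tool (pympler, as identified later in this thread) gives nicer summaries:

import gc
from collections import Counter

def type_counts():
    # Count all objects currently tracked by the garbage collector, per type
    return Counter(type(obj).__name__ for obj in gc.get_objects())

baseline = type_counts()

# ... run one or more solver iterations here ...

# Show only the object types whose count grew since the checkpoint
diff = type_counts() - baseline
for name, count in diff.most_common(10):
    print('{:+6d}  {}'.format(count, name))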

@adler-j

adler-j commented May 31, 2017

Btw, that callback is quite useful, would you consider making a pull request with it?

@adler-j

adler-j commented May 31, 2017

Finally, is this specific to this example, or could you try reproducing it with something that does not take hours to run?

Anyway, this obviously looks quite bad and is something we need to fix.

@chongchenmath

In the iteration loop of chambolle_pock_solver, the lines proximal_dual(sigma)(dual_tmp, out=y) and proximal_primal(tau)(primal_tmp, out=x) are executed at each iteration, so new proximal operators are created every time. My guess is that this consumes new memory at each iteration without releasing the old.
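
For what it is worth, this pattern can be tested in isolation. The sketch below mimics it with a throwaway DummyProx class (a made-up placeholder, not an ODL class) and watches the process RSS with psutil; if the per-iteration temporaries were retained, the reported memory would grow steadily over the loop.

import os

import numpy as np
import psutil

class DummyProx(object):
    """Stand-in for the operator returned by a proximal factory call."""

    def __init__(self, sigma):
        self.sigma = sigma
        self.workspace = np.empty(10 ** 6)  # some per-operator allocation

    def __call__(self, x, out=None):
        return x if out is None else out

proc = psutil.Process(os.getpid())
x = np.zeros(10 ** 6)
for i in range(500):
    DummyProx(0.1)(x, out=x)  # new operator created and discarded each iteration
    if i % 100 == 0:
        print('RSS: {:.1f} MB'.format(proc.memory_info().rss / 1e6))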

@adler-j

adler-j commented May 31, 2017

That is likely not the issue, since Python garbage-collects things like that. The issue is somewhere deeper.

Basically, there are two ways that we could leak memory (that I know of):

  • Circular dependencies in classes with a __del__ method, e.g. by doing
import psutil
import numpy as np

class MyList(list):
    def __init__(self, *args, **kwargs):
        # Allocate something to cause memory growth
        self.large_array = np.random.rand(50000)
        list.__init__(self, *args, **kwargs)

    def __del__(self):
        pass

n = 10000
for i in range(n):
    x = MyList()
    x.append(x)  # create a circular reference: the list contains itself
    if i % 1000 == 0:
        print('RAM usage: {}%'.format(psutil.virtual_memory().percent))

which gives

RAM usage: 32.1%
RAM usage: 35.3%
RAM usage: 38.5%
RAM usage: 41.6%
RAM usage: 44.8%
RAM usage: 47.9%
RAM usage: 51.1%
RAM usage: 54.2%
RAM usage: 57.4%
RAM usage: 60.6%

However, this only occurs with Python < 3.4, since PEP 442 fixes it in 3.4, giving something like:

RAM usage: 35.5%
RAM usage: 35.7%
RAM usage: 36.4%
RAM usage: 35.8%
RAM usage: 36.5%
RAM usage: 36.0%
RAM usage: 36.7%
RAM usage: 36.1%
RAM usage: 35.6%
RAM usage: 36.3%

Interestingly, we only implement __del__ in two places, AstraCudaProjectorImpl and AstraCudaBackProjectorImpl. Perhaps we somehow manage to create a circular dependency with them? I cannot find one myself, however.

  • The other option is that we're leaking some external resource, in this case that would be ASTRA. We could for example forget to call astra.delete somewhere.

So anyway, my debug steps for you would be:

  1. Try to find the package I mentioned above and see where the leaked memory is going.
  2. Try running your code with Python >= 3.4 and see if it helps (a stdlib check for the uncollectable-cycle scenario is sketched below).
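
Regarding the first scenario: on Python < 3.4, objects caught in such cycles end up in gc.garbage after a collection, so that mechanism can be checked directly with the standard library (a minimal sketch):

import gc

# On Python < 3.4, objects in reference cycles that define __del__ are not
# freed; after a collection they are parked in gc.garbage instead.
gc.set_debug(gc.DEBUG_UNCOLLECTABLE)

# ... run a few solver iterations here ...

gc.collect()
if gc.garbage:
    print('Uncollectable objects: {}'.format(len(gc.garbage)))
    for obj in gc.garbage[:10]:
        print(type(obj))
else:
    print('No uncollectable cycles found.')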

@kohr-h

kohr-h commented May 31, 2017

Btw, that callback is quite useful, would you consider making a pull request with it?

This is definitely useful. @chongchenmath, would you make a separate pull request with just this callback, giving it a name like CallbackPrintUsedResources or something similar? Thanks!

@chongchenmath

chongchenmath commented May 31, 2017

You're welcome! This was input by @aringh; I think he will do it then.

@chongchenmath

We ran the code with Python 3.5.3, so it is probably not a Python version problem.

@adler-j

adler-j commented Jun 1, 2017

Frankly, there is a large risk that this has to do with ASTRA then. Anyway, we need statistics on new allocations to get further with this.

@aringh

aringh commented Jun 2, 2017

Yes, of course I can do a PR with the callback.

When it comes to the real issue:

  • I will try to run it on a small tomography example that does not take as much time to run
  • I will try to look up the tool you mentioned, @adler-j

but it will have to wait until next week ;-)

@adler-j

adler-j commented Jun 3, 2017

No problem! But I think this has to do with 3D somehow, given that we did 20 000 000 projections in https://github.com/adler-j/learned_gradient_tomography without any noticeable memory leaks.

@adler-j

adler-j commented Jun 8, 2017

Any update on this rather interesting issue?

@aringh

aringh commented Jun 8, 2017

I am stuck with other things right now... but if I make sufficient progress there I hope to be able to have a look at this late this afternoon.

@aringh

aringh commented Jun 8, 2017

The tool I found is called pympler. Was this the one you meant, @adler-j?

@adler-j

adler-j commented Jun 8, 2017

That sure looks like it. By looking at diffs between iterates you should be able to see what has been created. You can do that by writing a simple callback.
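
A minimal sketch of such a callback, using pympler's SummaryTracker and mirroring the callback pattern from the original script (the class name CallbackMemoryDiff is made up):

import odl
from pympler import tracker

class CallbackMemoryDiff(odl.solvers.util.callback.SolverCallback):

    """Print a summary of objects allocated since the previous iterate."""

    def __init__(self):
        self.tracker = tracker.SummaryTracker()

    def __call__(self, x):
        # Prints a (type, count diff, total size diff) table since the last call
        self.tracker.print_diff()

    def reset(self):
        self.tracker = tracker.SummaryTracker()

It would be passed as callback= to chambolle_pock_solver in the same way as the existing callback.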

@aringh

aringh commented Jun 8, 2017

I have started to look at it. All code, output and findings can be found on this branch; my summary is in the notes.txt file.

Conclusion so far: I still have no idea.

@adler-j

adler-j commented Jun 9, 2017

VERY nice investigative work, and interesting indeed that it seems to "stagnate" at some point (indicating that this may have something to do with garbage collection).

Some suggestions:

  • Check if any individual component is causing this, e.g. possible suspects:
    • creating proximals; specifically, class 'odl.solvers.nonsmooth.proximal_operators.proximal_cconj_l1.<locals>.ProximalCConjL1' looks suspicious
    • calling the ray transform and its adjoint (e.g. change the solver to landweber)
  • The thing with 130 sounds scarily similar to my optimization with lru_list_elem, since I think the LRU cache keeps at most 100 elements. BUT! The issue existed even before that (I only just added that thing 3 days ago, while this issue was reported 9 days ago). One option would be to disable the lru cache (here) and re-run the tests to see what happens.
  • Further, it seems like @chongchenmath's runs do not stagnate (indeed, it seems like they very steadily allocate about 18314633 bytes of memory per iteration). Can you make sure to EXACTLY replicate what he does (including the terminal redirection tricks etc.)? Also, perhaps the number 18314633 bytes is a hint as to what is happening.

P.S. I'm quite sure this type of work should not be on a branch (since it will never be merged back, given the huge log files).

@aringh

aringh commented Jun 16, 2017

Current update: when running from odl commit 3fd52ac (as reported in issue davlars/ad-skull-reconstruction#35), a TV reconstruction on the AD skull data set crashes after 4 iterations due to running out of memory. However, since the code had run successfully before, we checked how it was run on @chongchenmath's account. There it runs from odl commit 7824dd2. From that commit I can run the exact same code on my account, and it does not crash after 4 iterations.

Conclusion 1: something has happened between commit 7824dd2 and 3fd52ac that makes the latter allocate much more memory in fewer iterations.

@aringh

aringh commented Jun 16, 2017

Update on the previous comment: in fact I see similar behavior to what was originally reported. Memory use increases in the same way during the first 66 iterations, after which I terminated the code in order to run it with pympler and see what is actually growing in memory.

P.S. Output files and notes are updated on the branch.

@adler-j

adler-j commented Jun 17, 2017

Did you try this:

One option would be to disable the lru cache (here) and re-run the tests to see what happens.

That is a likely culprit.

@aringh

aringh commented Jun 19, 2017

I have not tried it. But in order to reproduce the exact same behavior as first reported, I need to run from commit 7824dd2 (potentially somewhat later, but at least before commit 3fd52ac), which is from February. Hence this is long before the lru things were added.

I will pause this debugging for now: I think I need some help from @adler-j in person in order to continue this in a meaningful way :-)

@adler-j

adler-j commented Jun 19, 2017

I will pause this debugging for now: I think I need some help from @adler-j in person in order to continue this in a meaningful way :-)

No problem, I'll be back next week. Great job so far!

@adler-j

adler-j commented Oct 16, 2017

This has basically been solved in the ASTRA branch by this commit: astra-toolbox/astra-toolbox@aa32503

We're waiting for a conda release.

@adler-j

adler-j commented Oct 16, 2017

Closing this issue here for now. The solution is to build ASTRA locally.

adler-j closed this as completed Oct 16, 2017