
The memory the osim env takes increases linearly with the number of steps taken, even if the env is reset several times in between. #10

Closed
syllogismos opened this Issue Apr 7, 2017 · 29 comments


syllogismos commented Apr 7, 2017

The memory the Python process uses increases linearly with the number of steps taken by GaitEnv. Is it supposed to be like that?

s = 0
while s < 50000:
    o = env.reset()
    d = False
    while not d:
        o, r, d, i = env.step(env.action_space.sample())
        s += 1

This small script uses more than 500 MB on my computer (Python 2.7, OS X).

And while training, the env takes more than 20 GB of memory for about 1 million steps. Am I doing something wrong? There are cases where my training script stopped because of memory errors after training for a day on my little 8 GB computer.

ViktorM commented Apr 7, 2017

I have the same problem.

Member

chrisdembia commented Apr 7, 2017

@kidzik I'd be happy to look through the code with you to identify memory leaks.

kidzik self-assigned this Apr 7, 2017

Collaborator

kidzik commented Apr 7, 2017

Thanks, I will look into that in the next few days.

Collaborator

kidzik commented Apr 28, 2017

@syllogismos @ViktorM I'm having trouble reproducing the error. Can you tell me more about your environment (OS, Python version)?
Can you please try this script https://github.com/stanfordnmbl/osim-rl/blob/master/tests/test.memory.py with guppy? Thanks!

ViktorM commented May 2, 2017

OS: Ubuntu 16.04, Python 2.7, your standard conda environment. I'll only be able to try the script in 2-3 days. The usual scenario for the error is simply starting training of the Gait env. It gets killed somewhere between 400K and 500K iterations, rarely after 500K. That takes roughly 16 hours of training or more.

alexis-jacq commented Jul 8, 2017

I have the same issue. It takes ~1h for 1000 steps, without any learning.
Is there a way to speed up the environment's computations? (e.g. using a GPU)

Collaborator

kidzik commented Jul 8, 2017

I suppose you mean 1000 episodes? Or just 1000 steps?
The environment is indeed fairly difficult and time-consuming, but some tricks are possible: you can run simulations on multiple cores and use an algorithm that can leverage parallel runs, or you can run short episodes at first.

This difficulty is part of the problem -- finding an algorithm which can learn quickly in such a complex system is a goal in its own right.
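The multi-core trick could be sketched as follows. This is an illustrative sketch only: ToyEnv, rollout, and parallel_returns are hypothetical names, and a real setup would construct osim's RunEnv inside each worker instead of the toy stand-in.

```python
import multiprocessing as mp
import random

class ToyEnv:
    """Hypothetical stand-in for osim's RunEnv; each worker builds its own copy."""
    def reset(self):
        self.t = 0
        return [0.0]
    def step(self, action):
        self.t += 1
        return [0.0], random.random(), self.t >= 5, {}

def rollout(seed):
    """Run one full episode inside a worker process, return the episode return."""
    random.seed(seed)
    env = ToyEnv()                     # real code would use RunEnv(visualize=False)
    env.reset()
    total, done = 0.0, False
    while not done:
        _, reward, done, _ = env.step(None)
        total += reward
    return total

def parallel_returns(n_episodes, n_workers=4):
    """Collect episode returns from several simulator processes in parallel."""
    with mp.Pool(processes=n_workers) as pool:
        return pool.map(rollout, range(n_episodes))

if __name__ == "__main__":
    returns = parallel_returns(8)
    print(len(returns))
```

An algorithm that can consume such batched rollouts (e.g. evolution strategies or a parallel policy-gradient variant) can then make use of all cores.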

alexis-jacq commented Jul 9, 2017

Yes, parallel computing is working, but I only have 4 cores. Let's move to the EPFL servers...

Contributor

ctmakro commented Jul 12, 2017

I wrote a script which starts a RunEnv in a separate process and communicates with it. I recreate that process every 50 episodes to limit its memory consumption. Before that, the training script consumed ~10 GB of memory and crashed hard.
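The process-restart workaround described above could look roughly like this. It is a hypothetical sketch, not ctmakro's actual script: env_worker, RestartingEnv, and the toy integer "simulator" are illustrative stand-ins for a child process hosting a real RunEnv. The key idea is that killing the child returns all of its (leaked) memory to the OS.

```python
import multiprocessing as mp

def env_worker(conn):
    """Child process: owns the (leaky) simulator and serves step/reset requests."""
    state = 0                       # stand-in for a real RunEnv instance
    while True:
        cmd, arg = conn.recv()
        if cmd == "reset":
            state = 0
            conn.send(state)
        elif cmd == "step":
            state += 1
            conn.send((state, 0.0, state >= 3, {}))   # obs, reward, done, info
        elif cmd == "close":
            conn.close()
            return

class RestartingEnv:
    """Proxy that recreates its worker process every `restart_every` episodes."""
    def __init__(self, restart_every=50):
        self.restart_every = restart_every
        self.episodes = 0
        self._spawn()

    def _spawn(self):
        self.conn, child = mp.Pipe()
        self.proc = mp.Process(target=env_worker, args=(child,))
        self.proc.start()

    def reset(self):
        if self.episodes and self.episodes % self.restart_every == 0:
            self.close()
            self._spawn()           # fresh process => leaked memory is reclaimed
        self.episodes += 1
        self.conn.send(("reset", None))
        return self.conn.recv()

    def step(self, action):
        self.conn.send(("step", action))
        return self.conn.recv()

    def close(self):
        self.conn.send(("close", None))
        self.proc.join()
```

The cost is the model-loading time on every restart, which is why restarting only every ~50 episodes, as described above, is a reasonable trade-off.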

Contributor

AdamStelmaszczyk commented Aug 14, 2017

The memory leak still exists. I'm aware that one possible workaround is killing and restarting the leaking process, but maybe let's fix the leak :) It's painful for newcomers, with AWS (t2.micro has only 1 GB RAM) or multiprocessing.

My gut feeling is that it happens during Python's interaction with OpenSim's C++ code. I saw similar things happen with the Cap'n Proto Python API: you had to call some function in Python in a particular way to prevent a memleak. My guess here is that osim-rl calls some function in OpenSim, e.g. next_frame(), and OpenSim allocates memory to return the result. It then expects some other function to be called, which never is. Or there is simply a leak during next_frame() execution, entirely inside OpenSim's C++ code, but that seems less likely to me; they would probably have spotted it before, and they may even auto-test the code with valgrind or similar in CI.

Or maybe the Python wrappers generated with SWIG are problematic...

Collaborator

kidzik commented Aug 14, 2017

Good point, thanks for the insight. It seems to be dependent on the environment. Can you give us the details about your environment and a script for reproducing the error?

Contributor

AdamStelmaszczyk commented Aug 14, 2017

Linux x86_64 4.8.0-53. Conda env as in osim-rl README, with Python 3, tensorflow and psutil:

conda create -n py3 -c kidzik opensim git tensorflow psutil python=3.6
source activate py3
pip install git+https://github.com/stanfordnmbl/osim-rl.git

With Python 2 it's also leaking.

Minimal working example, memleak.py:

from datetime import datetime
from os import getpid

from osim.env.run import RunEnv
from psutil import Process


def time():
    return datetime.now().strftime('%H:%M:%S')


def memory_used():
    process = Process(getpid())
    return process.memory_info().rss  # https://pythonhosted.org/psutil/#psutil.Process.memory_info


env = RunEnv(visualize=False)
env.reset()
step = 0
episode = 0
while True:
    observation, reward, done, info = env.step(env.action_space.sample())
    step += 1
    if done:
        episode += 1
        env.reset()
        print("%s Episode %s, steps %s, memory %s" % (time(), episode, step, memory_used()))
        step = 0

Output:

(py3) adam@adam-ThinkPad-T520 ~/Desktop/running/stuff $ python memleak.py
Updating Model file from 30000 to latest format...
Loaded model gait9dof18musc_Thelen_BigSpheres.osim from file /home/adam/miniconda3/envs/py3/lib/python3.5/site-packages/osim/env/../models/gait9dof18musc.osim
20:56:22 Episode 1, steps 120, memory 133820416
20:56:31 Episode 2, steps 117, memory 134230016
20:56:40 Episode 3, steps 120, memory 135041024
20:56:49 Episode 4, steps 118, memory 135041024
20:56:57 Episode 5, steps 116, memory 136663040
20:57:05 Episode 6, steps 117, memory 136663040
20:57:14 Episode 7, steps 117, memory 136663040
20:57:23 Episode 8, steps 122, memory 136663040
20:57:32 Episode 9, steps 117, memory 139636736
20:57:40 Episode 10, steps 118, memory 139636736
20:57:48 Episode 11, steps 119, memory 139636736
20:57:57 Episode 12, steps 117, memory 139636736
20:58:06 Episode 13, steps 118, memory 139636736
20:58:14 Episode 14, steps 119, memory 139636736
20:58:23 Episode 15, steps 120, memory 139636736
20:58:32 Episode 16, steps 119, memory 139636736
20:58:40 Episode 17, steps 117, memory 139636736
20:58:49 Episode 18, steps 118, memory 145854464
20:58:57 Episode 19, steps 116, memory 145854464
20:59:06 Episode 20, steps 117, memory 145854464
20:59:14 Episode 21, steps 119, memory 145854464
20:59:23 Episode 22, steps 118, memory 145854464
20:59:31 Episode 23, steps 119, memory 145854464
20:59:40 Episode 24, steps 119, memory 145854464
20:59:48 Episode 25, steps 116, memory 145854464
20:59:57 Episode 26, steps 117, memory 145854464
21:00:05 Episode 27, steps 119, memory 145854464
21:00:14 Episode 28, steps 118, memory 145854464
21:00:23 Episode 29, steps 119, memory 145854464
21:00:32 Episode 30, steps 118, memory 145854464
21:00:40 Episode 31, steps 118, memory 145854464
21:00:49 Episode 32, steps 118, memory 145854464
21:00:58 Episode 33, steps 117, memory 145854464
21:01:06 Episode 34, steps 117, memory 145854464
21:01:14 Episode 35, steps 119, memory 158560256
21:01:23 Episode 36, steps 117, memory 158560256
21:01:32 Episode 37, steps 118, memory 158560256
21:01:40 Episode 38, steps 117, memory 158560256
21:01:50 Episode 39, steps 121, memory 158560256
21:01:58 Episode 40, steps 118, memory 158560256
21:02:07 Episode 41, steps 117, memory 158560256
21:02:16 Episode 42, steps 117, memory 158560256
21:02:27 Episode 43, steps 119, memory 158560256
21:02:37 Episode 44, steps 117, memory 158560256
21:02:46 Episode 45, steps 118, memory 158560256
21:02:55 Episode 46, steps 118, memory 158560256
21:03:03 Episode 47, steps 118, memory 158560256
21:03:13 Episode 48, steps 122, memory 158560256
21:03:22 Episode 49, steps 117, memory 158560256
21:03:31 Episode 50, steps 117, memory 158560256
(...)

Memory usage always grew. I'm aware that this short excerpt doesn't demonstrate a memleak by itself, as such a pattern can happen in a Python app without one: there may be allocations that the GC later frees. However, here the memory is never freed. Wait a couple of hours and you should see GBs of memory eaten :)

I noticed that if the env.step(env.action_space.sample()) line is replaced with done = True, the used memory doesn't increase.

So, calls to env.reset() are fine; calls to env.step() are causing the leak.

Contributor

AdamStelmaszczyk commented Aug 14, 2017

If this line is replaced with pass, then the used memory doesn't increase:

manager.integrate(self.osim_model.state)
Contributor

AdamStelmaszczyk commented Aug 14, 2017

I just searched for "leak" in opensim-core repo... one search hit:

{
    double* xValues = new double[2];
    xValues[0] = -1.0;
    xValues[1] = 1.0;
    return xValues; // possible memory leak
}

This looks like a bug, though I don't know whether it's part of our leak... It's in XYFunctionInterface::getXValues.

A bit lower there's a similar leak in getYValues.

Issues with word "leak" in OpenSim.

Ok, so what I wrote above:

Or, there is simply a leak during next_frame() execution, totally inside OpenSim C++ code, but that seem to me less likely, they would probably spot it before or maybe even they auto test the code with valgrind or similar in CI.

Now it seems to me that the memory leaks are simply in OpenSim's C++ code.

I have troubles with reproducing the error. Can you tell me more about your environment (os, python version)?

@kidzik could you please tell us what env and OS you are using?

Collaborator

kidzik commented Aug 14, 2017

Thank you very very much! I will check it on my machine and discuss it with the OpenSim team.

Contributor

AdamStelmaszczyk commented Aug 21, 2017

@kidzik Any news?

Collaborator

kidzik commented Aug 22, 2017

I'm really having trouble reproducing the error. Your code on my machine outputs:

(opensim-rl) lukasz@lukasz-MOBILIZE-NMBL:~/workspace/osim-rl/tests$ python test.leak.py 
Updating Model file from 30000 to latest format...
Loaded model gait9dof18musc_Thelen_BigSpheres.osim from file /home/lukasz/anaconda2/envs/opensim-rl/lib/python2.7/site-packages/osim/env/../models/gait9dof18musc.osim
17:33:15 Episode 1, steps 120, memory 128196608
17:33:20 Episode 2, steps 117, memory 128196608
17:33:24 Episode 3, steps 121, memory 128196608
17:33:29 Episode 4, steps 118, memory 128196608
17:33:33 Episode 5, steps 116, memory 128196608
17:33:37 Episode 6, steps 116, memory 128196608
17:33:42 Episode 7, steps 117, memory 128196608
17:33:47 Episode 8, steps 120, memory 128196608
17:33:51 Episode 9, steps 118, memory 128196608
17:33:56 Episode 10, steps 119, memory 128196608
17:34:00 Episode 11, steps 119, memory 128196608
17:34:05 Episode 12, steps 118, memory 128196608
17:34:10 Episode 13, steps 118, memory 128196608
17:34:14 Episode 14, steps 119, memory 128196608
17:34:19 Episode 15, steps 118, memory 128196608
17:34:23 Episode 16, steps 119, memory 128196608
17:34:28 Episode 17, steps 116, memory 128196608
17:34:32 Episode 18, steps 117, memory 128196608
17:34:37 Episode 19, steps 117, memory 128196608
17:34:41 Episode 20, steps 118, memory 128196608
17:34:46 Episode 21, steps 120, memory 128196608
17:34:50 Episode 22, steps 118, memory 128196608
17:34:55 Episode 23, steps 119, memory 128196608
17:34:59 Episode 24, steps 118, memory 128196608
17:35:04 Episode 25, steps 116, memory 128196608
17:35:08 Episode 26, steps 117, memory 128196608
17:35:13 Episode 27, steps 121, memory 128196608
17:35:17 Episode 28, steps 118, memory 128196608
17:35:22 Episode 29, steps 119, memory 128196608
17:35:26 Episode 30, steps 119, memory 128196608
17:35:31 Episode 31, steps 119, memory 128196608
17:35:36 Episode 32, steps 118, memory 128196608
17:35:40 Episode 33, steps 117, memory 128196608
17:35:44 Episode 34, steps 117, memory 128196608
17:35:49 Episode 35, steps 119, memory 128196608
17:35:54 Episode 36, steps 119, memory 128196608
17:35:58 Episode 37, steps 118, memory 128196608
17:36:03 Episode 38, steps 118, memory 128196608
17:36:07 Episode 39, steps 118, memory 128196608
17:36:12 Episode 40, steps 120, memory 128196608
17:36:16 Episode 41, steps 117, memory 128196608
17:36:21 Episode 42, steps 116, memory 128196608
17:36:25 Episode 43, steps 118, memory 128196608
17:36:30 Episode 44, steps 117, memory 128196608
17:36:34 Episode 45, steps 118, memory 128196608
17:36:39 Episode 46, steps 119, memory 128196608
17:36:43 Episode 47, steps 117, memory 128196608
17:36:48 Episode 48, steps 119, memory 128196608
17:36:52 Episode 49, steps 118, memory 128196608
17:36:57 Episode 50, steps 118, memory 128196608
...

and it's still running stably after at least 100 episodes.

Here is my environment:

(opensim-rl) lukasz@lukasz-MOBILIZE-NMBL:~$ conda list
# packages in environment at /home/lukasz/anaconda2/envs/opensim-rl:
#
certifi                   2017.4.17                 <pip>
chardet                   3.0.4                     <pip>
freeglut                  3.0.0                         4    kidzik
funcsigs                  1.0.2                    py27_0  
gym                       0.9.2                     <pip>
h5py                      2.7.0               np113py27_0  
hdf5                      1.8.17                        1  
idna                      2.5                       <pip>
keras                     2.0.2                    py27_0  
keras-rl                  0.3.0                     <pip>
libgcc                    5.2.0                         0  
libgfortran               3.0.0                         1  
libgpuarray               0.6.4                         0  
libprotobuf               3.2.0                         0  
mako                      1.0.6                    py27_0  
markupsafe                0.23                     py27_2  
mkl                       2017.0.1                      0  
mkl-service               1.1.2                    py27_3  
mock                      2.0.0                    py27_0  
nose                      1.3.7                    py27_1  
numpy                     1.13.0                   py27_0  
numpy                     1.13.1                    <pip>
openblas                  0.2.19                        0    kidzik
opensim                   4.0.0                   py27_12    kidzik
openssl                   1.0.2l                        0  
osim-rl                   1.4.1                     <pip>
pbr                       1.10.0                   py27_0  
pip                       9.0.1                    py27_1  
protobuf                  3.2.0                    py27_0  
psutil                    5.2.2                     <pip>
pyglet                    1.2.4                     <pip>
pygpu                     0.6.4                    py27_1  
python                    2.7.13                        0  
pyyaml                    3.12                     py27_0  
readline                  6.2                           2  
requests                  2.18.2                    <pip>
scipy                     0.19.0              np113py27_0  
setuptools                27.2.0                   py27_0  
six                       1.10.0                   py27_0  
six                       1.10.0                    <pip>
sqlite                    3.13.0                        0  
tabulate                  0.7.7                     <pip>
tensorflow                1.1.0                    py27_0    conda-forge
theano                    0.9.0                    py27_0  
tk                        8.5.18                        0  
urllib3                   1.22                      <pip>
werkzeug                  0.12.2                   py27_0  
wheel                     0.29.0                   py27_0  
yaml                      0.1.6                         0  
zlib                      1.2.8                         3  

Ubuntu 16.04 and my kernel is 4.4.0-72-generic.

It's clearly not proof that things are OK -- it just shows that it'll be harder to fix... I will try to reproduce it on AWS.

@AdamStelmaszczyk


AdamStelmaszczyk Aug 22, 2017

Contributor

That sheds some light.

So, with Python 2 and 3 and kernel 4.8.0-53, there is a memory leak.

But with Python 2 and kernel 4.4.0-72 you don't experience it. However, @ViktorM, with these two (Python and kernel version) being the same, does experience leaks.

So it looks like there is some other difference between these environments that matters.

If you are not experiencing leaks, then the leaks in the C++ code of OpenSim apparently aren't on the execution path.

So the open question is: what's the source of the leak? If all of the above is true, it must be something that differs between your env and @ViktorM's. Any ideas what it may be?

Could you please try with the same env as me, with Python 3.6?


@AdamStelmaszczyk


AdamStelmaszczyk Aug 22, 2017

Contributor

I checked on AWS; it takes just 5 minutes with the prepared AMI (Python 2, kernel 4.4.0), following these instructions, plus conda install psutil and copying memleak.py.

No memory leak.

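The memleak.py script itself isn't reproduced in this thread, so the following is only a sketch of what such a harness might look like. To keep it runnable without OpenSim installed, the osim env is replaced by a hypothetical DummyEnv that retains state on purpose, and the stdlib resource module is used for the memory readout instead of psutil (the thread's script used psutil and reported RSS in bytes; ru_maxrss reports peak RSS in kB on Linux):

```python
import resource


class DummyEnv(object):
    """Hypothetical stand-in for osim-rl's GaitEnv; it keeps a reference
    to every step's buffer so the memory readout visibly grows."""

    def __init__(self):
        self._retained = []

    def reset(self):
        return [0.0]

    def step(self, action):
        # Simulated leak: hold on to a fresh buffer every step.
        self._retained.append([0.0] * 10000)
        done = len(self._retained) % 100 == 0  # 100-step episodes
        return [0.0], 0.0, done, {}


def max_rss_kb():
    """Peak resident set size of this process (kB, Linux semantics)."""
    return resource.getrusage(resource.RUSAGE_SELF).ru_maxrss


def run(env, episodes=5):
    """Run a few episodes and record peak RSS after each one."""
    samples = []
    for ep in range(episodes):
        env.reset()
        done, steps = False, 0
        while not done:
            _, _, done, _ = env.step(None)
            steps += 1
        samples.append(max_rss_kb())
        print("Episode %d, steps %d, max RSS %d kB" % (ep, steps, samples[-1]))
    return samples


if __name__ == "__main__":
    run(DummyEnv())
```

With a leak-free env the per-episode readings flatten out after warm-up; a steady linear climb, as reported in this issue, is the signature being hunted here.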

@AdamStelmaszczyk


AdamStelmaszczyk Aug 22, 2017

Contributor

On one server with kernel 4.9.30 there is a memory leak with Python 3.6, but not with Python 2.7.

Maybe the new OpenSim bindings for Python 3.6 are leaking...


@AdamStelmaszczyk


AdamStelmaszczyk Aug 22, 2017

Contributor

It's easy to reproduce the memory leak with Python 3.6 on AWS with your AMI.

Follow these instructions, then:

conda create -n py3 -c kidzik opensim git tensorflow psutil python=3.6
source activate py3
pip install git+https://github.com/stanfordnmbl/osim-rl.git

Copy and run memleak.py.

You can also run valgrind --leak-check=full python memleak.py and Ctrl+C it after e.g. episode 3.

This was my output for py3 and py2.

Python 3 has lots of leaks involving _PyEval_EvalCodeWithName and _PyEval_EvalFrameDefault; Python 2 has none.


@kidzik


kidzik Aug 22, 2017

Collaborator

One update on my side in Python 2.7:
10:55:07 Episode 14451, steps 116, memory 128729088
so if there is a leak there, it's rather marginal.

Indeed, in Python 3.6 I get the leak. As you mentioned, this could indicate that the actual problem is in the bindings and not in this package.

For the challenge, we can't guarantee full support for Python 3.6 since it's very recent in OpenSim. In particular, my conda build for Python 3.6 includes some commits between the original package and the Python 3.6 update in OpenSim.

I will investigate this further with the OpenSim team. Thanks for all the updates; it seems we've learned that 2.7 might be OK now.


@AdamStelmaszczyk


AdamStelmaszczyk Aug 23, 2017

Contributor

I installed the Python 2.7 OpenSim bindings following this, with SWIG 3.0.10 and g++ 6.3.0.

Now it's leaking.

With the opensim from the kidzik channel there's no leak.

With what command did you compile the opensim that is in the kidzik channel (for Python 2)? Which g++ and SWIG versions?


@chrisdembia


chrisdembia Aug 23, 2017

Member

Now it's leaking.

This supports the idea that the memory leak was introduced in commits to opensim-core between when the python 2.7 and the python 3 opensim packages were created (based on discussion with @kidzik).


@AdamStelmaszczyk


AdamStelmaszczyk Aug 23, 2017

Contributor

Good idea, might be.

@kidzik Do you know which opensim-core commit was built and stored in the kidzik channel (for Python 2)?


@kidzik


kidzik Aug 23, 2017

Collaborator

Sure, it's compiled with this conda recipe https://github.com/opensim-org/conda-opensim/blob/master/opensim/meta.yaml, which means it's this commit in opensim-core: 399c8d57a779dd5dde2916192f8b92bfc959e269. I don't know which GCC was used, though.


@AdamStelmaszczyk


AdamStelmaszczyk Aug 23, 2017

Contributor

Cheers, I can confirm that with opensim-org/opensim-core@399c8d5 and Python 2 there's no leak.

To find which commit introduced it, git bisect is ideal:

(py2) astelma@chuck:~/opensim-core$ git bisect start
(py2) astelma@chuck:~/opensim-core$ git bisect bad
(py2) astelma@chuck:~/opensim-core$ git bisect good 399c8d57a779dd5dde2916192f8b92bfc959e269
Bisecting: 306 revisions left to test after this (roughly 8 steps)
[3c12ca81228b93c8198aa44d09803bce5e8aa021] Merge branch 'master' into tgcs2017

There are 306 commits, but with binary search one needs ~8 steps.

I will leave this for a volunteer as a good git bisect exercise.

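Each bisect step needs the same leak check, so the search could also be automated with `git bisect run`, which executes a script at every step and uses its exit code to mark the commit good (0) or bad (non-zero). A sketch of the classification half of such a script follows; the script name check_leak.py, the 1.5x growth threshold, and the command-line sampling are illustrative assumptions, not something from this thread:

```python
import sys


def is_leaking(rss_samples, factor=1.5):
    """Flag a leak when resident memory at the end of the run exceeds
    the initial baseline by more than `factor`."""
    if len(rss_samples) < 2:
        return False
    return rss_samples[-1] > rss_samples[0] * factor


if __name__ == "__main__":
    # In a real check_leak.py the samples would come from running a few
    # episodes of the rebuilt env; here they arrive on the command line.
    samples = [int(x) for x in sys.argv[1:]]
    # Exit status drives `git bisect run python check_leak.py`:
    # 0 marks the commit good (no leak), 1 marks it bad (leak).
    sys.exit(1 if is_leaking(samples) else 0)
```

With the per-commit rebuild wired in, `git bisect run` would then walk the ~8 steps without manual intervention.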

@kidzik


kidzik Aug 23, 2017

Collaborator

Good point, thanks, Adam! With @chrisdembia we were thinking that for the challenge we may just stick to opensim-org/opensim-core@399c8d5 and cherry-pick the python3 commits -- this should help us avoid the memory leaks and make the python3 and python2 environments more comparable.


kidzik added a commit that referenced this issue Sep 9, 2017

@kidzik


kidzik May 13, 2018

Collaborator

Now the new, fixed builds for Python 3.6 are available on all platforms, so this error shouldn't happen anymore.

