
error when running rlgames_train #2

Open
fanshi14 opened this issue Jun 6, 2022 · 7 comments

Comments

@fanshi14

fanshi14 commented Jun 6, 2022

Hi all,

Thanks for your awesome contribution.

When I try to run the demo, both rlgames_train and rlgames_train_mt fail with the same problem:
$ PYTHON_PATH scripts/rlgames_train_mt.py task=Ant headless=True
/home/USER/.local/share/ov/pkg/isaac_sim-2022.1.0/python.sh: line 46: 60079 Segmentation fault (core dumped) $python_exe "$@" $args
There was an error running python

I also tried building a conda environment, but the same segmentation fault occurs.

By the way, random_policy works fine on my side.
I have also run PYTHON_PATH -m pip install -e .

Desktop: Ubuntu 18.04 + RTX 3080 GPU

I'd appreciate any hints.
Fan
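A segfault reported by python.sh does not say where in the Python code the crash happened. One way to get more detail, assuming the crash happens inside the Python process itself, is the standard-library faulthandler module, which prints the Python traceback of all threads when the process receives a fatal signal such as SIGSEGV. A minimal sketch; placing it at the very top of scripts/rlgames_train.py, before the other imports, is a suggestion and not something the repo does:

```python
import faulthandler
import sys

# Print the Python-level traceback of every thread to stderr if the process is
# killed by SIGSEGV, SIGFPE, SIGABRT, SIGBUS or SIGILL. Enabling it before the
# other imports (an assumption about placement) also covers crashes during import.
faulthandler.enable(file=sys.stderr, all_threads=True)
```

Without editing any file, setting the environment variable PYTHONFAULTHANDLER=1 before running PYTHON_PATH should have the same effect, assuming python.sh does not clear the environment before starting the interpreter.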

@gavrielstate
Contributor

Without more details, it's a bit hard to diagnose the problem. It's possible that you've run out of memory: the default parameters are tuned for systems with 64 GB of RAM, and Isaac Sim is currently somewhat memory-hungry at startup.

Can you try running with num_envs=1024 instead?
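To check whether the memory explanation could apply, the machine's total RAM can be compared with the 64 GB figure quoted above before launching. The sketch below is a hypothetical helper, not part of OmniIsaacGymEnvs: it reads physical memory through os.sysconf (Linux/POSIX only), and the threshold and the num_envs=1024 override are simply the values mentioned in this comment.

```python
import os
from typing import Optional

TUNED_RAM_GIB = 64       # RAM the default parameters are said to be tuned for
REDUCED_NUM_ENVS = 1024  # value suggested in this thread for smaller machines
SLACK = 0.9              # a 64 GB machine reports a bit less than 64 GiB total


def total_ram_gib() -> float:
    """Total physical memory in GiB, via POSIX sysconf (works on Linux)."""
    page_size = os.sysconf("SC_PAGE_SIZE")
    num_pages = os.sysconf("SC_PHYS_PAGES")
    return page_size * num_pages / 1024 ** 3


def num_envs_override(ram_gib: float) -> Optional[str]:
    """Return a command-line override to try when RAM is clearly below the tuned amount."""
    if ram_gib < TUNED_RAM_GIB * SLACK:
        return f"num_envs={REDUCED_NUM_ENVS}"
    return None


if __name__ == "__main__":
    ram = total_ram_gib()
    override = num_envs_override(ram)
    print(f"Total RAM: {ram:.1f} GiB")
    if override:
        print(f"Consider appending '{override}' to the rlgames_train.py command.")
```

The slack factor exists because the kernel and firmware reserve some memory, so a machine with 64 GB installed would otherwise be flagged as too small.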

@jinseokbae

jinseokbae commented Jun 8, 2022

Hi,
I also can't be sure what exact situation @fanshi14 faced, but I got a similar error log containing a Segmentation fault message when launching the training code.
In my case, it was caused by setting an invalid value for the num_envs argument, which violated assert(self.batch_size % self.minibatch_size == 0) in rl_games/common/a2c_common.py.
So check this if you manually changed num_envs in the configuration file.
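The constraint behind that assertion can be checked before launching. The sketch below assumes, as in rl_games' A2C base class, that the rollout batch is batch_size = horizon_length × num_actors × num_agents, with num_actors equal to num_envs. The horizon_length and minibatch_size values in the example are illustrative, not taken from any task config in this repo.

```python
from typing import List


def batch_size(num_envs: int, horizon_length: int, num_agents: int = 1) -> int:
    """Rollout batch size, assuming rl_games uses horizon_length * num_actors * num_agents."""
    return horizon_length * num_envs * num_agents


def minibatch_ok(num_envs: int, horizon_length: int, minibatch_size: int,
                 num_agents: int = 1) -> bool:
    """True if batch_size % minibatch_size == 0, i.e. the rl_games assertion would pass."""
    return batch_size(num_envs, horizon_length, num_agents) % minibatch_size == 0


def valid_num_envs(candidates: List[int], horizon_length: int,
                   minibatch_size: int) -> List[int]:
    """Keep only the num_envs values that satisfy the divisibility check."""
    return [n for n in candidates if minibatch_ok(n, horizon_length, minibatch_size)]


# Illustrative numbers only: horizon_length=16, minibatch_size=16384.
print(valid_num_envs([512, 1000, 1024, 2048, 4096], horizon_length=16, minibatch_size=16384))
# prints [1024, 2048, 4096]
```

The same check shows why reducing num_envs alone can trip the assertion: if minibatch_size is left unchanged, it may no longer divide the smaller batch, so it usually has to be scaled down together with num_envs.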

@fanshi14
Author

fanshi14 commented Jun 12, 2022

Thank you @gavrielstate @jinseokbae for the kind advice!

I solved the problem on my PC; it was related to:

  • import omni.client in path_utils.py
  • the rl-games installation (I created an issue in rl-games)

When path_utils is imported in rlgames_train.py, my PC outputs:
  File "/home/USER/codes/OmniIsaacGymEnvs/omniisaacgymenvs/utils/config_utils/path_utils.py", line 32, in <module>
    import omni.client
  File "/home/USER/.local/share/ov/pkg/isaac_sim-2022.1.0/kit/extscore/omni.client/omni/client/__init__.py", line 21, in <module>
    from ._omniclient import *
ImportError: /home/USER/.local/share/ov/pkg/isaac_sim-2022.1.0/kit/extscore/omni.usd.libs/bin/libjs.so: undefined symbol: _ZN32pxrInternal_v0_20__pxrReserved__18Tf_PostErrorHelperERKNS_13TfCallContextENS_16TfDiagnosticTypeEPKcz

The undefined-symbol error from omni.client may be similar to discussion1 and discussion2; I suspect it is caused by the import order.

My temporary workaround is to comment out the omni.client-related code in path_utils.py, since those functions are not used by the current training scripts.

Maybe this is specific to my desktop, but I wanted to let you know in case it is a bug.
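Instead of commenting the code out, the import can be made optional, so that path_utils.py still loads when omni.client fails and only the code that needs it reports an error. A minimal sketch, assuming nothing else in the module uses omni.client at import time; omni_client and require_omni_client are hypothetical names, not part of the repo:

```python
from typing import Optional

_OMNI_CLIENT_ERROR: Optional[ImportError] = None

try:
    # Fails with ImportError (undefined symbol) on the affected setups,
    # or with ModuleNotFoundError when Isaac Sim is not on the path at all.
    import omni.client as omni_client
except ImportError as exc:
    omni_client = None
    _OMNI_CLIENT_ERROR = exc


def require_omni_client():
    """Return omni.client, or raise a clear error that keeps the original ImportError as its cause."""
    if omni_client is None:
        raise RuntimeError(
            "omni.client could not be imported; checkpoint helpers that need it are unavailable"
        ) from _OMNI_CLIENT_ERROR
    return omni_client
```

Functions that really need omni.client would call require_omni_client() first, so a broken import only matters when such a function is actually used.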

@cyzhu-hiter

cyzhu-hiter commented Jun 15, 2022

Hi all,

I'm also hitting the same bug that @fanshi14 posted. I tried to fix it by simply reinstalling sim and gym, but that didn't work.

So I switched to another PC and built the environment from scratch. It works!

chay@chay-MS-7C94:~/Downloads/OmniIsaacGymEnvs/omniisaacgymenvs$ PYTHON_PATH scripts/rlgames_train.py task=Cartpole
Traceback (most recent call last):
  File "scripts/rlgames_train.py", line 34, in <module>
    from omniisaacgymenvs.utils.config_utils.path_utils import retrieve_checkpoint_path
  File "/home/chay/Downloads/OmniIsaacGymEnvs/omniisaacgymenvs/utils/config_utils/path_utils.py", line 32, in <module>
    import omni.client
  File "/home/chay/.local/share/ov/pkg/isaac_sim-2022.1.0/kit/extscore/omni.client/omni/client/__init__.py", line 21, in <module>
    from ._omniclient import *
ImportError: /home/chay/.local/share/ov/pkg/isaac_sim-2022.1.0/kit/extscore/omni.usd.libs/bin/libjs.so: undefined symbol: _ZN32pxrInternal_v0_20__pxrReserved__18Tf_PostErrorHelperERKNS_13TfCallContextENS_16TfDiagnosticTypeEPKcz
There was an error running python

Desktop: Ubuntu 18.04 + RTX 3070Ti

I was using the old version of gym and switched to the new one, while sim stayed at version 2022.1.0. Could that be causing this problem?

@kellyguo11
Collaborator

Thanks for bringing up the issue with the omni.client import. It appears we may have a bad ordering of imports happening.

If you are not planning to load a pre-trained checkpoint, it is safe to comment out the import for now.
Alternatively, the import should be moved so that it happens after VecEnvRLGames is initialized in rlgames_train.py.

We will fix this for the next release.
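One way to follow that advice without restructuring rlgames_train.py is to defer the import until a function actually needs it, so that importing path_utils early no longer loads omni.client. A minimal sketch; lazy_module and get_omni_client are hypothetical names, and the sketch relies only on importlib.import_module, which returns the cached module from sys.modules on later calls:

```python
import importlib
from types import ModuleType
from typing import Callable


def lazy_module(name: str) -> Callable[[], ModuleType]:
    """Return a loader that imports `name` only when it is called.

    Defining the loader imports nothing; importlib.import_module caches the
    module in sys.modules, so repeated calls are cheap.
    """
    def load() -> ModuleType:
        return importlib.import_module(name)
    return load


# Hypothetical placement in path_utils.py: importing the module no longer touches omni.client.
get_omni_client = lazy_module("omni.client")

# Checkpoint helpers would call get_omni_client() inside their bodies, which in
# rlgames_train.py would only run after VecEnvRLGames (and the simulation app) exist.
```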

@hpf9017

hpf9017 commented Jul 13, 2022

@kellyguo11 When will the next version be released? Thanks.

@kellyguo11
Collaborator

We are targeting August for the next release.

6 participants