[BugFix]: update linux setup_env to include libosmesa6 and libgl1-mesa-glx #335
I think you should be using yum instead
else
# Software rendering requires GLX and OSMesa.
apt update
apt install -y libgl1-mesa-glx libosmesa6
You should be using yum and not apt IIRC
Let's keep functorch installation in install.sh
PRIVATE_MUJOCO_GL=glfw
else
# Software rendering requires GLX and OSMesa.
yum install -y mesa-libOSMesa-devel.x86_64 mesa-libGL-devel.x86_64 mesa-libGLU-devel.x86_64
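A minimal sketch of how the setup script could avoid hard-coding one package manager, given the apt versus yum question raised above. The function name install_gl_deps_cmd is hypothetical and not part of this PR; the package lists are the ones quoted in the two snippets, and whether they are equivalent on every image is an assumption.

```shell
# Hypothetical helper (not from the PR): print the install command that matches
# whichever package manager is available on PATH.
install_gl_deps_cmd() {
    if command -v yum >/dev/null 2>&1; then
        # RPM-based images, as suggested in the review comments.
        echo "yum install -y mesa-libOSMesa-devel.x86_64 mesa-libGL-devel.x86_64 mesa-libGLU-devel.x86_64"
    elif command -v apt-get >/dev/null 2>&1; then
        # Debian-based images, matching the original apt snippet.
        echo "apt-get update && apt-get install -y libgl1-mesa-glx libosmesa6"
    else
        echo "unsupported"
        return 1
    fi
}
```

Printing the command rather than running it keeps the sketch side-effect free; a real setup script would run it with something like eval "$(install_gl_deps_cmd)".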
Do we still need to install osmesa via conda after that?
Also, do you think we could make it work with MUJOCO_GL=egl? It should speed up some tests by 2 orders of magnitude or so...
I can def update and see what happens :)
It seems that now egl is failing.
Sometimes it works when you ssh into the CircleCI instance (see the issue I opened on MuJoCo, linked in the issue description).
They suggested that, if that is the case, we should compare the environment variables with and without ssh.
I don't know if you have the credentials to ssh on those machines though...
That being said, if tests pass with osmesa now I'm already more than happy! Solving it for egl is really more like going the extra mile
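One way the suggested comparison could be done, as a rough sketch: snapshot the environment once in the plain CI job and once in the ssh session, then diff the two files. The helper names dump_env and render_vars are hypothetical, and the list of variables in render_vars is only an assumption about what is likely to matter for rendering.

```shell
# Hypothetical helper (not from the PR): write the current environment,
# sorted by name, to the file given as $1 so two snapshots can be diffed.
dump_env() {
    env | sort > "$1"
}

# Hypothetical helper: print only the rendering-related variables in a snapshot.
# The variable list is an assumption, not something confirmed in this thread.
render_vars() {
    grep -E '^(MUJOCO_GL|PYOPENGL_PLATFORM|DISPLAY|LD_LIBRARY_PATH)=' "$1" || true
}

# Usage (paths are placeholders):
#   dump_env /tmp/env_ci.txt     # inside the plain CI job
#   dump_env /tmp/env_ssh.txt    # inside the ssh session
#   diff /tmp/env_ci.txt /tmp/env_ssh.txt
```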
haha okay I will revert --> I probably don't have ssh access since this is my first week...
The cpu tests are passing just fine -- but none of the gpu tests passed. Any suggestion?
That's the core of this issue:
rendering for dm_control is tested with GPUs. It used to work, but now that the osmesa we installed via conda is not accessible, we must find a workaround.
I would suggest trying to ssh onto the machine: click on the test that fails and then ssh with this
It's very likely that you won't have the credentials to do that tho...
I don't have the write permission to rerun the job with SSH...
I remember in the bug report you mentioned that when logging through SSH the bug disappears. Is it still true in the current PR? Could you help trigger it and test it? TY!
@tongbaojia
I have done a bit of debugging.
With 93fb291, running run_tests.sh as is did not succeed. After a few attempts, I realized that installing
pip3 install pyrender
pip3 install pyopengl --upgrade
in that order worked! You must make sure that
export MUJOCO_GL=egl
export PYOPENGL_PLATFORM=egl
are set (they should be properly set in the conda env).
The tests seem to run fine with that, but it's in ssh (might break in pure headless).
Hope that helps!
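A small preflight check along these lines could make a missing variable fail loudly before the tests start. This is only a sketch: check_gl_env is a hypothetical name, and the expected value egl is taken from the exports quoted above.

```shell
# Hypothetical helper (not from the PR): return non-zero unless both rendering
# variables are set to "egl" in the current shell.
check_gl_env() {
    status=0
    for name in MUJOCO_GL PYOPENGL_PLATFORM; do
        # Indirect lookup of the variable whose name is stored in $name.
        eval "value=\${$name:-}"
        if [ "$value" != "egl" ]; then
            echo "$name is '$value', expected 'egl'" >&2
            status=1
        fi
    done
    return "$status"
}
```

Calling check_gl_env near the top of run_tests.sh, after conda activate, would stop the run early if the env is misconfigured.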
Here's the version of run_tests.sh that I managed to run:
#!/usr/bin/env bash
set -e
eval "$(./conda/bin/conda shell.bash hook)"
conda activate ./env
pip3 install pyrender
pip3 install pyopengl --upgrade
export MUJOCO_GL=egl
export PYOPENGL_PLATFORM=egl
export PYTORCH_TEST_WITH_SLOW='1'
python -m torch.utils.collect_env
# Avoid error: "fatal: unsafe repository"
git config --global --add safe.directory '*'
root_dir="$(git rev-parse --show-toplevel)"
env_dir="${root_dir}/env"
lib_dir="${env_dir}/lib"
# solves ImportError: /lib64/libstdc++.so.6: version `GLIBCXX_3.4.21' not found
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$lib_dir
export MKL_THREADING_LAYER=GNU
pytest test/smoke_test.py -v --durations 20
pytest test/smoke_test_deps.py -v --durations 20
python3 test/test_libs.py # those are the tests that should break
pytest --instafail -v --durations 20
I have added 5 lines of code for debugging. The packages should be installed elsewhere (in setup_env.sh) and the env variables should be set in the conda env. test/test_libs.py is already run by pytest, so we can remove that line.
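For setting the variables in the conda env rather than in run_tests.sh, one possible approach is an activation hook: conda sources every .sh file under etc/conda/activate.d in the env prefix on conda activate. The sketch below assumes the ./env prefix used in the script above; the function name write_gl_activate_hook and the file name mujoco_gl.sh are hypothetical.

```shell
# Hypothetical helper (not from the PR): write an activation hook into the conda
# env prefix given as $1 so the rendering variables are exported on activation.
write_gl_activate_hook() {
    hook_dir="$1/etc/conda/activate.d"
    mkdir -p "$hook_dir"
    # The file name is arbitrary; conda sources every *.sh in activate.d.
    printf 'export MUJOCO_GL=egl\nexport PYOPENGL_PLATFORM=egl\n' > "$hook_dir/mujoco_gl.sh"
}

# Usage, e.g. from setup_env.sh after the env has been created:
#   write_gl_activate_hook ./env
```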
Let's use egl not osmesa
Closing as of #339
Description
Trying to fix the bug in #329.
Motivation and Context
The PR should fix the rendering of dm_control.
Types of changes
What types of changes does your code introduce? Remove all that do not apply:
Checklist
Go over all the following points, and put an x in all the boxes that apply. If you are unsure about any of these, don't hesitate to ask. We are here to help!