Skip to content

Conversation

tongbaojia
Copy link
Contributor

Description

Trying to fix the bug in #329.

Motivation and Context

The PR should fix the rendering of dm_control.

  • I have raised an issue to propose this change (required for new features and bug fixes)

Types of changes

What types of changes does your code introduce? Remove all that do not apply:

  • Bug fix (non-breaking change which fixes an issue)

Checklist

Go over all the following points, and put an x in all the boxes that apply.
If you are unsure about any of these, don't hesitate to ask. We are here to help!

  • I have read the CONTRIBUTION guide (required)
  • My change requires a change to the documentation.
  • I have updated the tests accordingly (required for a bug fix or a new feature).
  • I have updated the documentation accordingly.

@facebook-github-bot facebook-github-bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Aug 3, 2022
Copy link
Collaborator

@vmoens vmoens left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you should be using yum instead

else
# Software rendering requires GLX and OSMesa.
apt update
apt install -y libgl1-mesa-glx libosmesa6
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You should be using yum and not apt IIRC

Copy link
Collaborator

@vmoens vmoens left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's keep functorch installation in install.sh

@vmoens vmoens added bug Something isn't working CI Has to do with CI setup (e.g. wheels & builds, tests...) labels Aug 3, 2022
PRIVATE_MUJOCO_GL=glfw
else
# Software rendering requires GLX and OSMesa.
yum install -y mesa-libOSMesa-devel.x86_64 mesa-libGL-devel.x86_64 mesa-libGLU-devel.x86_64
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we still need to install osmesa via conda after that?
Also, do you think we could make it work with MUJOCO_GL=egl? It should speed up some tests by 2 orders of magnitude or so...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can def update and see what happens :)

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems that now egl is failing.
Sometimes it works when you ssh on the CircleCI instance (see my opened issue on Mujoco, link in the issue description).
They suggested that if it is the case, we should look at the env variable with and without ssh.
I don't know if you have the credentials to ssh on those machines though...
That being said, if tests pass with osmesa now I'm already more than happy! Solving it for egl is really more like going the extra mile

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

haha okay I will revert --> I probably don't have ssh access since this is my first week...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The cpu tests are passing just fine -- but none of the gpu tests passed. Any suggestion?

Copy link
Collaborator

@vmoens vmoens Aug 4, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's the core of this issue:
rendering for dm_control is tested with GPUs. It used to work, but now that the osmesa we installed via conda is not accessible, we must find a workaround.
I would suggest trying to ssh onto the machine: click on the tests that fails and then ssh with this
image

It's very likely that you won't have the credentials to do that tho...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't have the write permission to rerun the job with SSH...
I remember in the bug report you mentioned that when logging through SSH the bug disappears. Is it still true in the current PR? Could you help trigger it and test it? TY!

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@tongbaojia
I have done a bit of debugging.
With 93fb291, running the run_tests.sh as is did not succeed. After a few attemps, I realized that installing

pip3 install pyrender
pip3 install pyopengl --upgrade

in that order worked! You must make sure that

export MUJOCO_GL=egl
export PYOPENGL_PLATFORM=egl

(they should be properly set in the conda env)

The tests seem to run fine with that, but it's in ssh (might break in pure headless).

Hope that helps!

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here's the script of run_tests.sh that i managed to run:

#!/usr/bin/env bash

set -e

eval "$(./conda/bin/conda shell.bash hook)"
conda activate ./env

pip3 install pyrender
pip3 install pyopengl --upgrade

export MUJOCO_GL=egl
export PYOPENGL_PLATFORM=egl

export PYTORCH_TEST_WITH_SLOW='1'
python -m torch.utils.collect_env
# Avoid error: "fatal: unsafe repository"
git config --global --add safe.directory '*'

root_dir="$(git rev-parse --show-toplevel)"
env_dir="${root_dir}/env"
lib_dir="${env_dir}/lib"

# solves ImportError: /lib64/libstdc++.so.6: version `GLIBCXX_3.4.21' not found
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$lib_dir
export MKL_THREADING_LAYER=GNU

pytest test/smoke_test.py -v --durations 20
pytest test/smoke_test_deps.py -v --durations 20
python3 test/test_libs.py  # those are the tests that should break
pytest --instafail -v --durations 20

I have added 5 lines of code, for debugging. The packages should be installed elsewhere (in setup_env.sh) and the env variable should be set in the conda env. The test/test_lib.py is already run by pytest so we can remove it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@vmoens I tired to just use the run_tests.sh in a8e52ae. Unfortunately it doesn't seem to work in headless mode.

Could it be the ssh process changed some display env variables?

Copy link
Collaborator

@vmoens vmoens left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's use egl not osmesa

@vmoens
Copy link
Collaborator

vmoens commented Aug 6, 2022

Closing as of #339

@vmoens vmoens closed this Aug 6, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working CI Has to do with CI setup (e.g. wheels & builds, tests...) CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants