
I can run the program but the output video isn't visible #2

Closed
QiuJunning opened this issue Aug 30, 2021 · 27 comments

Comments

@QiuJunning

I installed ManiSkill and ManiSkill-Learn according to the README and ran the example:

python -m tools.run_rl configs/bc/mani_skill_point_cloud_transformer.py \
--gpu-ids=3 --cfg-options "env_cfg.env_name=OpenCabinetDrawer_1045_link_0-v0" \
"eval_cfg.save_video=True" "eval_cfg.num=1" "eval_cfg.use_log=True" \
--work-dir=./test/OpenCabinetDrawer_1045_link_0-v0_pcd \
--resume-from=./example_mani_skill_data/OpenCabinetDrawer_1045_link_0-v0_PN_Transformer.ckpt --evaluation

The program runs, but the test video looks all black:

(screenshot: the output video frame, entirely black)

Hope to get your help, thank you!
The program log is as follows:

INFO - 2021-08-30 09:39:12,600 - utils - Note: detected 72 virtual cores but NumExpr set to maximum of 64, check "NUMEXPR_MAX_THREADS" environment variable.
INFO - 2021-08-30 09:39:12,600 - utils - Note: NumExpr detected 72 cores but "NUMEXPR_MAX_THREADS" not set, so enforcing safe limit of 8.
Size of image in the rendered video (160, 400, 3)
/bin/sh: 1: /home/qjn/miniconda/envs/mani_skill/bin/nvcc: not found
OpenCabinetDrawer_1045_link_0-v0 - INFO - 2021-08-30 09:39:23 - Environment info:

sys.platform: linux
Python: 3.8.10 (default, Jun 4 2021, 15:09:15) [GCC 7.5.0]
CUDA available: True
GPU 0,1,2,5: Quadro RTX 8000
GPU 3,4: NVIDIA GeForce RTX 2080 Ti
CUDA_HOME: /home/qjn/miniconda/envs/mani_skill
NVCC:
Num of GPUs: 6
GCC: gcc (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0
PyTorch: 1.8.0+cu111
PyTorch compiling details: PyTorch built with:

  • GCC 7.3
  • C++ Version: 201402
  • Intel(R) oneAPI Math Kernel Library Version 2021.3-Product Build 20210617 for Intel(R) 64 architecture applications
  • Intel(R) MKL-DNN v1.7.0 (Git Hash 7aed236906b1f7a05c0917e5257a1af05e9ff683)
  • OpenMP 201511 (a.k.a. OpenMP 4.5)
  • NNPACK is enabled
  • CPU capability usage: AVX2
  • CUDA Runtime 11.1
  • NVCC architecture flags: -gencode;arch=compute_37,code=sm_37;-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=sm_75;-gencode;arch=compute_80,code=sm_80;-gencode;arch=compute_86,code=sm_86
  • CuDNN 8.0.5
  • Magma 2.5.2
  • Build settings: BLAS_INFO=mkl, BUILD_TYPE=Release, CUDA_VERSION=11.1, CUDNN_VERSION=8.0.5, CXX_COMPILER=/opt/rh/devtoolset-7/root/usr/bin/c++, CXX_FLAGS= -Wno-deprecated -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -fopenmp -DNDEBUG -DUSE_KINETO -DUSE_FBGEMM -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -O2 -fPIC -Wno-narrowing -Wall -Wextra -Werror=return-type -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-sign-compare -Wno-unused-parameter -Wno-unused-variable -Wno-unused-function -Wno-unused-result -Wno-unused-local-typedefs -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -Wno-stringop-overflow -Wno-psabi -Wno-error=pedantic -Wno-error=redundant-decls -Wno-error=old-style-cast -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Wno-stringop-overflow, LAPACK_INFO=mkl, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, TORCH_VERSION=1.8.0, USE_CUDA=ON, USE_CUDNN=ON, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=ON, USE_MKLDNN=ON, USE_MPI=OFF, USE_NCCL=ON, USE_NNPACK=ON, USE_OPENMP=ON,

TorchVision: 0.9.0+cu111
OpenCV: 4.5.3
mani_skill_learn: 1.0.0

OpenCabinetDrawer_1045_link_0-v0 - INFO - 2021-08-30 09:39:23 - Config:
log_level = 'INFO'
stack_frame = 1
num_heads = 4
agent = dict(
type='BC',
batch_size=128,
policy_cfg=dict(
type='ContinuousPolicy',
policy_head_cfg=dict(type='DeterministicHead', noise_std=1e-05),
nn_cfg=dict(
type='PointNetWithInstanceInfoV0',
stack_frame=1,
num_objs='num_objs',
pcd_pn_cfg=dict(
type='PointNetV0',
conv_cfg=dict(
type='ConvMLP',
norm_cfg=None,
mlp_spec=['agent_shape + pcd_xyz_rgb_channel', 256, 256],
bias='auto',
inactivated_output=True,
conv_init_cfg=dict(type='xavier_init', gain=1, bias=0)),
mlp_cfg=dict(
type='LinearMLP',
norm_cfg=None,
mlp_spec=[256, 256, 256],
bias='auto',
inactivated_output=True,
linear_init_cfg=dict(type='xavier_init', gain=1, bias=0)),
subtract_mean_coords=True,
max_mean_mix_aggregation=True),
state_mlp_cfg=dict(
type='LinearMLP',
norm_cfg=None,
mlp_spec=['agent_shape', 256, 256],
bias='auto',
inactivated_output=True,
linear_init_cfg=dict(type='xavier_init', gain=1, bias=0)),
transformer_cfg=dict(
type='TransformerEncoder',
block_cfg=dict(
attention_cfg=dict(
type='MultiHeadSelfAttention',
embed_dim=256,
num_heads=4,
latent_dim=32,
dropout=0.1),
mlp_cfg=dict(
type='LinearMLP',
norm_cfg=None,
mlp_spec=[256, 1024, 256],
bias='auto',
inactivated_output=True,
linear_init_cfg=dict(
type='xavier_init', gain=1, bias=0)),
dropout=0.1),
pooling_cfg=dict(embed_dim=256, num_heads=4, latent_dim=32),
mlp_cfg=None,
num_blocks=6),
final_mlp_cfg=dict(
type='LinearMLP',
norm_cfg=None,
mlp_spec=[256, 256, 'action_shape'],
bias='auto',
inactivated_output=True,
linear_init_cfg=dict(type='xavier_init', gain=1, bias=0))),
optim_cfg=dict(type='Adam', lr=0.0003, weight_decay=5e-06)))
eval_cfg = dict(
type='Evaluation',
num=1,
num_procs=1,
use_hidden_state=False,
start_state=None,
save_traj=True,
save_video=True,
use_log=True,
env_cfg=dict(
type='gym',
unwrapped=False,
stack_frame=1,
obs_mode='pointcloud',
reward_type='dense',
env_name='OpenCabinetDrawer_1045_link_0-v0'))
train_mfrl_cfg = dict(
on_policy=False,
total_steps=50000,
warm_steps=0,
n_steps=0,
n_updates=500,
n_eval=50000,
n_checkpoint=50000,
init_replay_buffers=
'./example_mani_skill_data/OpenCabinetDrawer_1045_link_0-v0_pcd.h5')
env_cfg = dict(
type='gym',
unwrapped=False,
stack_frame=1,
obs_mode='pointcloud',
reward_type='dense',
env_name='OpenCabinetDrawer_1045_link_0-v0')
replay_cfg = dict(type='ReplayMemory', capacity=1000000)
work_dir = './test/OpenCabinetDrawer_1045_link_0-v0_pcd/BC'
resume_from = './example_mani_skill_data/OpenCabinetDrawer_1045_link_0-v0_PN_Transformer.ckpt'

OpenCabinetDrawer_1045_link_0-v0 - INFO - 2021-08-30 09:39:23 - Set random seed to None
OpenCabinetDrawer_1045_link_0-v0 - INFO - 2021-08-30 09:39:24 - State shape:{'pointcloud': {'rgb': (1200, 3), 'xyz': (1200, 3), 'seg': (1200, 3)}, 'state': 38}, action shape:Box(-1.0, 1.0, (13,), float32)
OpenCabinetDrawer_1045_link_0-v0 - INFO - 2021-08-30 09:39:24 - We do not use distributed training, but we support data parallel in torch
OpenCabinetDrawer_1045_link_0-v0 - INFO - 2021-08-30 09:39:24 - Save trajectory at ./test/OpenCabinetDrawer_1045_link_0-v0_pcd/BC/test/trajectory.h5.
OpenCabinetDrawer_1045_link_0-v0 - INFO - 2021-08-30 09:39:24 - Begin to evaluate
OpenCabinetDrawer_1045_link_0-v0 - INFO - 2021-08-30 09:39:39 - Episode 0: Length 200 Reward: -2865.0203219550845
OpenCabinetDrawer_1045_link_0-v0 - INFO - 2021-08-30 09:39:40 - memory:5.53G gpu_mem_ratio:3.5% gpu_mem:1.65G gpu_mem_this:0.00G gpu_util:4%
OpenCabinetDrawer_1045_link_0-v0 - INFO - 2021-08-30 09:39:40 - Num of trails: 1.00, Length: 200.00+/-0.00, Reward: -2865.02+/-0.00, Success or Early Stop Rate: 0.00

@fbxiang

fbxiang commented Aug 30, 2021

I would first try switching to another video player with better codec support (e.g. VLC). Since the video is generated, the code is probably fine.

@lz1oceani
Collaborator

I have double-checked: the generated video can be opened on Ubuntu 20.04. If you still cannot open the video with VLC, you can try ffmpeg -i input.mp4 output.xxx to convert the video to a format your computer supports.

@QiuJunning
Author

> I would first try switching to another video player with better codec support (e.g. VLC). Since the video is generated, the code is probably fine.

Thank you for your reply. I tried to open the video with VLC, but it still didn't work

@fbxiang

fbxiang commented Sep 1, 2021

One other thing to try is to replace the video writer with an image writer, to verify whether the images themselves are generated correctly.
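A minimal sketch of that check, assuming numpy frames of the logged (160, 400, 3) size; write_ppm is a hypothetical helper, not part of ManiSkill. Binary PPM needs no video codec at all, so any viewer (or just the file bytes) settles whether the frames are black:

```python
import os
import tempfile

import numpy as np

def write_ppm(path, frame):
    """Save an HxWx3 uint8 array as a binary PPM image; PPM needs no codec."""
    h, w, _ = frame.shape
    with open(path, "wb") as f:
        f.write(f"P6 {w} {h} 255\n".encode("ascii"))
        f.write(frame.tobytes())

# Stand-in for the rendered frames (the log reports (160, 400, 3) images)
out_dir = tempfile.mkdtemp()
frames = [np.zeros((160, 400, 3), dtype=np.uint8) for _ in range(3)]
for i, frame in enumerate(frames):
    write_ppm(os.path.join(out_dir, f"frame_{i}.ppm"), frame)
```

In the evaluation loop, calling write_ppm per frame in place of the video writer separates rendering bugs from encoding bugs.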

@QiuJunning
Author

QiuJunning commented Sep 1, 2021

> I have double-checked: the generated video can be opened on Ubuntu 20.04. If you still cannot open the video with VLC, you can try ffmpeg -i input.mp4 output.xxx to convert the video to a format your computer supports.

Thanks for your advice. I tried VLC but it didn't work. My computer opens other MP4 files normally, and converting to other common formats (e.g. .avi) does not work either. I want to know whether my output from the "Evaluation on Simple Pretrained Models" you provided is correct. Thanks!

@lz1oceani
Collaborator

The output seems to be correct. You can use this code to check every image in the video.

import cv2
import os.path as osp

filename = "xxx.mp4"
video = cv2.VideoCapture(filename)
video_dir = osp.dirname(filename)
count = 0
success = True
while success:
    success, image = video.read()
    if success:
        cv2.imwrite(osp.join(video_dir, f"frame_{count}.jpg"), image)
        count += 1
print(count)

@QiuJunning
Author

> The output seems to be correct. You can use this code to check every image in the video. […]

Thanks for the code you provided. The check shows that every video yields a count of 200, and each frame is a 400×160 all-black picture of 1.59 KB. One frame is shown below:
(attached image: frame_145, an all-black frame)
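Identical tiny all-black frames point at the renderer rather than the codec. A quick numpy-only sanity check (a sketch, not ManiSkill code) that a decoded frame really is black:

```python
import numpy as np

def is_black_frame(frame, tol=1):
    """True when every channel value is at or below tol, i.e. visually black."""
    return int(np.asarray(frame).max()) <= tol

# Frames of the reported 400x160 size: one black, one mid-gray
black = np.zeros((160, 400, 3), dtype=np.uint8)
lit = np.full((160, 400, 3), 128, dtype=np.uint8)
print(is_black_frame(black), is_black_frame(lit))  # True False
```

Running this over the frames extracted by the snippet above distinguishes "codec can't decode" (garbage pixels) from "renderer produced zeros" (max value 0).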

@lz1oceani
Collaborator

We found a bug when rendering with multiple GPUs. Can you update the ManiSkill repo, install a new sapien from https://ucsdcloud-my.sharepoint.com/:u:/g/personal/z6ling_ucsd_edu/EVWaOUz0Cw5MgHIY06H9PxEBcaD5cLUK1VvnhyTibMMGmQ?e=aeNWtm, and rerun the scripts? You do not need to set CUDA_VISIBLE_DEVICES=0 when running the script.

@QiuJunning
Author

> We found a bug when rendering with multiple GPUs. Can you update the ManiSkill repo, install a new sapien, and rerun the scripts? You do not need to set CUDA_VISIBLE_DEVICES=0 when running the script.

I have updated the ManiSkill repo and installed the new sapien. Whether I set CUDA_VISIBLE_DEVICES=0 or not, it still doesn't work.

@lz1oceani
Collaborator

OK. I think you can try the following code to see if the env can render images correctly.

import gym
import mani_skill.env  # registers the ManiSkill environments with gym

env = gym.make('OpenCabinetDrawer-v0')
x = env.render('color_image')['world']['rgb']
print(x)

Or use the following code to view the env with the UI.

import gym
import mani_skill.env  # registers the ManiSkill environments with gym

env = gym.make('OpenCabinetDrawer-v0')
while True:
    env.render('human')

@QiuJunning
Author

> OK. I think you can try the following code to see if the env can render images correctly. […]

I have run the code. The output of x is an all-zero array, so the env can't render images correctly.

@lz1oceani
Collaborator

OK. Can you open the UI and see what happens?

@lz1oceani
Collaborator

Can you provide the version of your Nvidia driver?

@QiuJunning
Author

> Can you provide the version of your Nvidia driver?

OK, my Nvidia driver is 470.57.02.
Probably because I use a server, there is an error when using env.render('human'):
RuntimeError: Create window failed: context is not created with present support.
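That error is expected on a headless server: env.render('human') needs a display server. A small sketch of guarding on DISPLAY before choosing on-screen rendering (pick_render_mode is illustrative, and the mode names 'human'/'color_image' are taken from this thread, not a documented ManiSkill API):

```python
def pick_render_mode(env_vars):
    """On-screen rendering needs an X11/Wayland display; otherwise go offscreen."""
    if env_vars.get("DISPLAY") or env_vars.get("WAYLAND_DISPLAY"):
        return "human"
    return "color_image"

print(pick_render_mode({}))                 # color_image (headless server)
print(pick_render_mode({"DISPLAY": ":0"}))  # human
```

Passing a dict instead of reading os.environ directly keeps the check testable; in practice you would call pick_render_mode(os.environ).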

@lz1oceani
Collaborator

lz1oceani commented Sep 5, 2021

Can you run python -m sapien.example.offscreen first to double-check the sapien renderer? ManiSkill uses cupy to get the rendered image, which may cause bugs. The expected output image is a red box.

@QiuJunning
Author

> Can you run python -m sapien.example.offscreen first to double-check the sapien renderer? ManiSkill uses cupy to get the rendered image, which may cause bugs. The expected output image is a red box.

OK, the output image does not appear, and my results are as follows:
[2021-09-05 12:54:34.519] [svulkan2] [error] GLFW error: X11: The DISPLAY environment variable is missing
[2021-09-05 12:54:34.519] [svulkan2] [warning] Continue without GLFW.
[2021-09-05 12:54:35.200] [SAPIEN] [warning] Mass or inertia contains very small number, this is not allowed. Mass will be set to 1e-6 and inertia will be set to 1e-8 for stability. Actor:

@lz1oceani
Collaborator

Is there any image like output.png under the running path?

@lz1oceani
Collaborator

By the way, is it possible to try ManiSkill on another available machine?

@QiuJunning
Author

> Is there any image like output.png under the running path?

Oh, sorry, I have found the output picture. This is my output:
(attached image: output.png)

@lz1oceani
Collaborator

Can you run pip freeze | grep sapien to check the sapien version again?

@QiuJunning
Author

> Can you run pip freeze | grep sapien to check the sapien version again?

This is my output:
sapien @ file:///home/qjn/ManiSkill/ManiSkill-Learn/sapien-1.1.1-cp38-cp38-manylinux2014_x86_64.whl

@lz1oceani
Collaborator

lz1oceani commented Sep 5, 2021

OK. I guess the problem may come from cupy; I am asking my teammates to provide a new ManiSkill branch without cupy.

@QiuJunning
Author

> OK. I guess the problem may come from cupy; I am asking my teammates to provide a new ManiSkill branch without cupy.

Thanks! Looking forward to your branch.

@lz1oceani
Collaborator

lz1oceani commented Sep 7, 2021

You can pull the current ManiSkill branch and run the code with the environment variable NO_CUPY=1, like this:

NO_CUPY=1 python xxxxx

Then you can check whether cupy is what fails to load the image.
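The flag has to be in the environment before mani_skill is imported, since such switches are typically read at import time. A hedged sketch of the pattern (cupy_disabled is illustrative, not ManiSkill's actual internal check):

```python
import os

# Set the flag at the very top of the script, before importing mani_skill,
# so any import-time check sees it (equivalent to NO_CUPY=1 on the shell line).
os.environ.setdefault("NO_CUPY", "1")

def cupy_disabled(env):
    """Illustrative: how an import-time NO_CUPY switch is typically read."""
    return env.get("NO_CUPY") == "1"

print(cupy_disabled({"NO_CUPY": "1"}), cupy_disabled({}))  # True False
```

Setting the variable inside Python is convenient for notebooks, where prefixing the launch command is not an option.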

@lz1oceani
Collaborator

Also, you can try https://github.com/haosulab/ManiSkill-Learn/blob/main/scripts/docker/build_docker.sh to build a docker image to run the program.

@QiuJunning
Author

> You can pull the current ManiSkill branch and run the code with the environment variable NO_CUPY=1, like this:
>
> NO_CUPY=1 python xxxxx
>
> Then you can check whether cupy is what fails to load the image.

Thank you very much. It worked!

@lz1oceani
Collaborator

Hi Qiu721, thanks for reporting this issue. I will close it now.
