Unexpected bus error encountered in worker. #136

dixonhsiao · 2020-03-04T06:03:04Z

I tried to run inference with SLOWFAST_8x8_R50 on a small amount of Kinetics-400 test data (only 5 videos) on a Google Cloud Compute Engine machine with 8 K80 GPUs (12 GB of GPU memory each), but it fails with an unexpected error:
ERROR: Unexpected bus error encountered in worker. This might be caused by insufficient shared memory (shm).

Can someone help with this error? I simply don't know what went wrong...

haooooooqi · 2020-03-04T08:31:18Z

Hi
Thanks for playing with PySF!
I am not sure what the cause of your issue is; it looks pretty strange to me :P
Could you try running inference with a batch size of 1 and let me know what you see?

Thanks,
Haoqi

dixonhsiao · 2020-03-04T09:29:02Z

I changed BATCH_SIZE to 1 (under the TEST section in SLOWFAST_8x8_R50.yaml) and it shows:

dixonhsiao · 2020-03-04T09:34:56Z

By the way, I was running inference on only 5 videos, which are listed in a test.csv file. So my question is: how exactly do I run inference on a single video with SLOWFAST_8x8_R50, and is it possible to run on only one GPU?
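For the single-video case, one option is to make test.csv list just that one video. The sketch below assumes the Kinetics loader reads one `path label` pair per line, separated by a space, with paths resolved relative to DATA.PATH_PREFIX; please check this against your PySlowFast version. The helper `write_test_csv`, the file name `some_video.mp4`, and the placeholder label 0 are all hypothetical.

```python
import os
import tempfile
from typing import Iterable, Tuple


def write_test_csv(data_dir: str, entries: Iterable[Tuple[str, int]]) -> str:
    """Write <data_dir>/test.csv with one 'video_path label' line per entry.

    Assumption: the Kinetics loader expects a space-separated path and
    integer label on each line.
    """
    os.makedirs(data_dir, exist_ok=True)
    csv_path = os.path.join(data_dir, "test.csv")
    with open(csv_path, "w") as f:
        for video_path, label in entries:
            f.write(f"{video_path} {label}\n")
    return csv_path


# Demo in a throwaway directory; point data_dir at DATA.PATH_TO_DATA_DIR for a real run.
demo_dir = tempfile.mkdtemp()
csv_path = write_test_csv(demo_dir, [("some_video.mp4", 0)])  # hypothetical video, placeholder label
print(open(csv_path).read())
```

With a single entry, the test loader should only see that one video; the reported accuracy is meaningless with a placeholder label, but the predictions are still produced.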

dixonhsiao · 2020-03-04T09:38:56Z

Here is the command I am running:

```shell
python3 tools/run_net.py \
  --cfg /home/SlowFast/configs/Kinetics/c2/SLOWFAST_8x8_R50.yaml \
  DATA.PATH_TO_DATA_DIR /home/ActivityNet/Crawler/Kinetics/kinetics-400_test/test/ \
  DATA.PATH_PREFIX /home/ActivityNet/Crawler/Kinetics/kinetics-400_test/test/ \
  TEST.CHECKPOINT_FILE_PATH /home/SlowFast/models/SLOWFAST_8x8_R50.pkl \
  TRAIN.ENABLE False \
  TEST.CHECKPOINT_TYPE caffe2
```
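If you script these runs from Python, a small helper like the one below reproduces that command as an argument list. It relies only on the pattern the command already uses: `--cfg <yaml>` followed by `KEY VALUE` override pairs. The name `build_run_net_cmd` is hypothetical and not part of PySlowFast.

```python
import shlex
from typing import Dict, List


def build_run_net_cmd(cfg_path: str, overrides: Dict[str, str]) -> List[str]:
    """Build a run_net.py argv: --cfg <yaml> followed by KEY VALUE override pairs."""
    cmd = ["python3", "tools/run_net.py", "--cfg", cfg_path]
    for key, value in overrides.items():  # dicts keep insertion order (Python 3.7+)
        cmd.extend([key, str(value)])
    return cmd


data_dir = "/home/ActivityNet/Crawler/Kinetics/kinetics-400_test/test/"
cmd = build_run_net_cmd(
    "/home/SlowFast/configs/Kinetics/c2/SLOWFAST_8x8_R50.yaml",
    {
        "DATA.PATH_TO_DATA_DIR": data_dir,
        "DATA.PATH_PREFIX": data_dir,
        "TEST.CHECKPOINT_FILE_PATH": "/home/SlowFast/models/SLOWFAST_8x8_R50.pkl",
        "TRAIN.ENABLE": "False",
        "TEST.CHECKPOINT_TYPE": "caffe2",
    },
)
print(shlex.join(cmd))  # shell-quoted command line (shlex.join needs Python 3.8+)
# To launch it from the SlowFast repo root:
# import subprocess; subprocess.run(cmd, check=True)
```

Adding more entries to the dictionary appends further overrides, so other config values can be changed per run without editing the YAML file.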

haooooooqi · 2020-03-04T09:41:47Z

When you use a batch size of 1, could you also use a single GPU (NUM_GPUS: 1)?
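A plausible reason the single-GPU setting matters (an assumption about PySlowFast's loader, not something confirmed in this thread): the test loader appears to split TEST.BATCH_SIZE across GPUs as `int(TEST.BATCH_SIZE / max(1, NUM_GPUS))`, so a batch size of 1 on 8 GPUs would round down to 0 clips per GPU. The hypothetical helper below only mimics that arithmetic as a sanity check before launching a run.

```python
def per_gpu_batch_size(test_batch_size: int, num_gpus: int) -> int:
    """Per-GPU test batch size, assuming TEST.BATCH_SIZE is split evenly across GPUs."""
    # Assumed to mirror int(TEST.BATCH_SIZE / max(1, NUM_GPUS)) in the data loader.
    per_gpu = int(test_batch_size / max(1, num_gpus))
    if per_gpu < 1:
        raise ValueError(
            f"TEST.BATCH_SIZE={test_batch_size} is too small for NUM_GPUS={num_gpus}; "
            "use NUM_GPUS 1 or a batch size of at least NUM_GPUS."
        )
    return per_gpu


print(per_gpu_batch_size(1, 1))  # single GPU, batch size 1
print(per_gpu_batch_size(8, 8))  # 8 GPUs, one clip each
```

Calling `per_gpu_batch_size(1, 8)` raises, which matches the suggestion to drop to NUM_GPUS 1 when testing with a batch size of 1.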
Windows!!! :P That explains a lot xD (j/k)

dixonhsiao · 2020-03-04T10:33:53Z

It worked!! Thank you!! But it seems that NUM_WORKERS can only be set to 0, 1, or 2, not 3 or above, in my setup... Anyway, one final question: how can I extract the embedding (the output of the next-to-last layer) instead of the predictions? Thank you~

haooooooqi · 2020-03-08T01:09:22Z

This is weird: setting NUM_WORKERS to 2 or above goes through the same logic, so it should not make a difference to whether the code runs.
If you want to extract the embedding, you might need something like

```python
x = self.res_stages(x)
# Record the embedding
self.embedding = x
output = self.head(x)
```

in the model's forward method. Then you can get the embedding with `embedding = model.embedding`. Hope that helps :P
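The same pattern, stripped down to plain Python so it runs without PyTorch: `ToyVideoModel`, `toy_stages`, and `toy_head` are hypothetical stand-ins for the backbone stages and the classification head, not PySlowFast classes. The point is only that forward() stashes the head's input before returning the predictions.

```python
from typing import Callable, List, Optional

Vector = List[float]


class ToyVideoModel:
    """Hypothetical model: backbone stages followed by a classification head."""

    def __init__(self, res_stages: Callable[[Vector], Vector], head: Callable[[Vector], Vector]) -> None:
        self.res_stages = res_stages
        self.head = head
        self.embedding: Optional[Vector] = None  # filled in by forward()

    def forward(self, x: Vector) -> Vector:
        x = self.res_stages(x)
        # Record the embedding (the head's input) before computing predictions.
        self.embedding = x
        return self.head(x)


def toy_stages(x: Vector) -> Vector:
    return [2.0 * v for v in x]  # stand-in for the backbone stages


def toy_head(x: Vector) -> Vector:
    return [sum(x)]  # stand-in for pooling plus the final projection


model = ToyVideoModel(toy_stages, toy_head)
preds = model.forward([1.0, 2.0, 3.0])
embedding = model.embedding
print(preds, embedding)
```

In a real PyTorch model you would likely store `x.detach()` and run inference under `torch.no_grad()`. If the model is wrapped in DistributedDataParallel, as multi-GPU runs typically are, the attribute lives on `model.module` rather than on `model`.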

erdongchendou · 2020-03-11T03:13:29Z

> I tried to run SLOWFAST_8x8_R50 to inference on a small amount of kinetics-400 test data (like only 5 videos), on a google cloud compute engine machine with 8 K80 GPUs (each has 12GB gpu memory). but it seems that it cannot be run due to some unexpected error:
> ERROR: Unexpected bus error encountered in worker. This might be caused by insufficient shared memory (shm).
>
> Can someone help with this error ? I simply don't know what went wrong .....

I suspect you are using a Docker container for inference. If so, you need to increase the shared-memory size when you create the container, making it as large as you can afford, e.g. `--shm-size 256G`.
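To check how much shared memory a container actually has, you can query the filesystem mounted at /dev/shm (Docker's default is 64 MB unless `--shm-size` is given). PyTorch DataLoader worker processes hand batches back through shared memory, so a small /dev/shm may also explain why only low NUM_WORKERS values worked; that link is a guess, not confirmed in this thread. `shm_free_gib` is a hypothetical helper built on `shutil.disk_usage`.

```python
import os
import shutil


def shm_free_gib(path: str = "/dev/shm") -> float:
    """Return the free space, in GiB, of the filesystem mounted at `path`."""
    usage = shutil.disk_usage(path)
    return usage.free / 2**30


if os.path.isdir("/dev/shm"):  # present on Linux hosts and containers
    print(f"/dev/shm free: {shm_free_gib():.2f} GiB")
```

Running this inside the container before and after recreating it with a larger `--shm-size` should show whether the new setting took effect.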

haooooooqi added the "question" label (Further information is requested) on Mar 4, 2020

haooooooqi closed this as completed Mar 8, 2020
