
Unexpected bus error encountered in worker. #136

Closed
dixonhsiao opened this issue Mar 4, 2020 · 8 comments
Labels
question Further information is requested

Comments

@dixonhsiao

I tried to run inference with SLOWFAST_8x8_R50 on a small amount of Kinetics-400 test data (only about 5 videos) on a Google Cloud Compute Engine machine with 8 K80 GPUs (12 GB of GPU memory each), but it fails with an unexpected error:
ERROR: Unexpected bus error encountered in worker. This might be caused by insufficient shared memory (shm).

Can someone help with this error? I simply don't know what went wrong...

@haooooooqi
Contributor

Hi
Thanks for playing with PySF!
I'm not sure what is causing your issue; it looks pretty strange to me :P
Could you try running inference with a batch size of 1 and let me know what you see?

Thanks,
Haoqi

@haooooooqi haooooooqi added the question Further information is requested label Mar 4, 2020
@dixonhsiao
Author

I changed BATCH_SIZE to 1 (under the TEST section in SLOWFAST_8x8_R50.yaml) and it shows:
[screenshot not preserved]

@dixonhsiao
Author

By the way, I am running inference on only 5 videos, which are listed in a test.csv file. So my question is: how exactly do I run inference on only 1 video with SLOWFAST_8x8_R50, and is it possible to run on a single GPU?
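For the single-video case, one option is a test.csv with a single row. This is a minimal sketch assuming the space-separated `path_to_video label` row format described in PySlowFast's DATASET.md, with DATA.PATH_TO_DATA_DIR pointing at the directory that holds test.csv and DATA.PATH_PREFIX prepended to each path at load time. `write_single_video_csv` is a hypothetical helper, not part of PySlowFast.

```python
from pathlib import Path


def write_single_video_csv(video_path: str, label: int, out_dir: str) -> Path:
    """Hypothetical helper: write a one-row test.csv for inference on one video.

    Assumes the row format "<path_to_video> <label>" separated by a single
    space, so the video path itself must not contain whitespace.
    """
    if any(ch.isspace() for ch in video_path):
        raise ValueError("video path must not contain whitespace: " + repr(video_path))
    out_path = Path(out_dir) / "test.csv"
    out_path.parent.mkdir(parents=True, exist_ok=True)
    # One video, one label, newline-terminated.
    out_path.write_text(f"{video_path} {label}\n")
    return out_path
```

Calling it with the video's file name (relative to DATA.PATH_PREFIX) and its class index produces the file; the label only matters for the accuracy that is reported, not for the predictions themselves.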

@dixonhsiao
Author

Here is my command for running this:
python3 tools/run_net.py --cfg /home/SlowFast/configs/Kinetics/c2/SLOWFAST_8x8_R50.yaml DATA.PATH_TO_DATA_DIR /home/ActivityNet/Crawler/Kinetics/kinetics-400_test/test/ DATA.PATH_PREFIX /home/ActivityNet/Crawler/Kinetics/kinetics-400_test/test/ TEST.CHECKPOINT_FILE_PATH /home/SlowFast/models/SLOWFAST_8x8_R50.pkl TRAIN.ENABLE False TEST.CHECKPOINT_TYPE caffe2

@haooooooqi
Contributor

haooooooqi commented Mar 4, 2020

When using a batch size of 1, could you also use 1 GPU (NUM_GPUS: 1)?
Windows!!! :P That explains a lot xD (j/k)
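A minimal sketch that rebuilds the command above with the suggested single-GPU overrides. It assumes, as the original command already does, that tools/run_net.py accepts trailing `KEY VALUE` config overrides, and it assumes TEST.BATCH_SIZE must be divisible by NUM_GPUS (each GPU gets BATCH_SIZE / NUM_GPUS clips). `build_test_command` is a hypothetical helper; DATA_LOADER.NUM_WORKERS is only appended when explicitly given.

```python
import shlex
from typing import List, Optional


def build_test_command(cfg: str, checkpoint: str, data_dir: str,
                       num_gpus: int = 1, batch_size: int = 1,
                       num_workers: Optional[int] = None) -> List[str]:
    """Hypothetical helper: argv for a caffe2-checkpoint test run of tools/run_net.py."""
    # Assumed constraint: the test batch is split evenly across the GPUs.
    if batch_size % num_gpus != 0:
        raise ValueError(
            f"TEST.BATCH_SIZE={batch_size} is not divisible by NUM_GPUS={num_gpus}")
    argv = [
        "python3", "tools/run_net.py", "--cfg", cfg,
        "DATA.PATH_TO_DATA_DIR", data_dir,
        "DATA.PATH_PREFIX", data_dir,
        "TEST.CHECKPOINT_FILE_PATH", checkpoint,
        "TRAIN.ENABLE", "False",
        "TEST.CHECKPOINT_TYPE", "caffe2",
        "NUM_GPUS", str(num_gpus),
        "TEST.BATCH_SIZE", str(batch_size),
    ]
    if num_workers is not None:
        argv += ["DATA_LOADER.NUM_WORKERS", str(num_workers)]
    return argv


if __name__ == "__main__":
    cmd = build_test_command(
        "/home/SlowFast/configs/Kinetics/c2/SLOWFAST_8x8_R50.yaml",
        "/home/SlowFast/models/SLOWFAST_8x8_R50.pkl",
        "/home/ActivityNet/Crawler/Kinetics/kinetics-400_test/test/",
    )
    # Print a shell-safe command line; alternatively pass cmd to subprocess.run.
    print(shlex.join(cmd))
```

Under the divisibility assumption, batch size 1 only works with NUM_GPUS 1, so the helper refuses combinations such as 8 GPUs with batch size 1 before launching anything.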

@dixonhsiao
Author

It worked!! Thank you!! But it seems that NUM_WORKERS can only be set to 0, 1, or 2, not 3 or above, in my setup... Anyway, one final question: how can I extract the embedding (the next-to-last layer's output) instead of the predictions? Thank you~

@haooooooqi
Contributor

haooooooqi commented Mar 8, 2020

This is weird. Any NUM_WORKERS value of 2 or above goes through the same logic, so 3 should not behave differently from 2.
If you want to extract the embedding, you might need to have something like

x = self.res_stages(x)
# Record the embedding
self.embedding = x
output = self.head(x)

in the model's forward method. Then you can get the embedding with embedding = model.embedding. Hope that helps :P
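A runnable sketch of that suggestion on a toy model. `ToyVideoNet`, its layer sizes, and the `res_stages`/`head` attribute names are hypothetical stand-ins; the real SlowFast model uses its own stage names and a head that pools several pathways. The second half swaps in a different technique, a PyTorch forward hook on the head, which captures the same tensor without editing `forward`.

```python
import torch
import torch.nn as nn


class ToyVideoNet(nn.Module):
    """Hypothetical stand-in for a SlowFast-style model: backbone + classifier head."""

    def __init__(self, in_dim: int = 16, emb_dim: int = 8, num_classes: int = 400):
        super().__init__()
        self.res_stages = nn.Sequential(nn.Linear(in_dim, emb_dim), nn.ReLU())
        self.head = nn.Linear(emb_dim, num_classes)
        self.embedding = None

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = self.res_stages(x)
        # Record the embedding (the head's input) as suggested above.
        self.embedding = x.detach()
        return self.head(x)


model = ToyVideoNet().eval()
clip = torch.randn(2, 16)  # placeholder input: batch of 2 feature vectors
with torch.no_grad():
    logits = model(clip)
embedding = model.embedding

# Alternative: capture the head's input with a forward hook instead.
captured = {}


def save_head_input(module, inputs, output):
    # inputs is the tuple of positional arguments passed to module.forward
    captured["embedding"] = inputs[0].detach()


handle = model.head.register_forward_hook(save_head_input)
with torch.no_grad():
    model(clip)
handle.remove()  # detach the hook once the features are collected
```

The hook variant is convenient when the model code should stay untouched, for example when loading a checkpoint into the unmodified model class.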

@erdongchendou


I suspect you are running inference inside a Docker container. If so, you need to increase the shared memory size when you create the container, as large as your machine allows, e.g. --shm-size 256G.
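Docker's default /dev/shm is only 64 MB, and PyTorch DataLoader worker processes pass batches back to the main process through shared memory, so more workers need more of it. A minimal sketch for checking the mount before choosing DATA_LOADER.NUM_WORKERS; the 2 GiB threshold and the fall-back-to-0 rule in `safe_num_workers` are hypothetical heuristics, not PySlowFast behavior (with 0 workers, data is loaded in the main process).

```python
import os
import shutil


def shm_report(path: str = "/dev/shm") -> dict:
    """Total and free space, in GiB, of the shared-memory mount at `path`."""
    usage = shutil.disk_usage(path)
    gib = 2 ** 30
    return {"path": path, "total_gib": usage.total / gib, "free_gib": usage.free / gib}


def safe_num_workers(requested: int, path: str = "/dev/shm",
                     min_free_gib: float = 2.0) -> int:
    """Hypothetical heuristic: use 0 workers when the shm mount looks too small."""
    if not os.path.exists(path):
        return requested  # no shm mount to inspect (e.g. not Linux)
    if shm_report(path)["free_gib"] < min_free_gib:
        return 0  # load in the main process; no shared-memory hand-off
    return requested


if __name__ == "__main__":
    if os.path.exists("/dev/shm"):
        print(shm_report())
    print("suggested NUM_WORKERS:", safe_num_workers(8))
```

Inside a container started with the default settings, the report would typically show a total of well under 1 GiB, which is a hint to recreate the container with a larger --shm-size.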
