This repository has been archived by the owner on Oct 31, 2023. It is now read-only.

polybeast is slower than monobeast #10

Closed
Da-Capo opened this issue Dec 30, 2019 · 2 comments

Comments

@Da-Capo

Da-Capo commented Dec 30, 2019

I built the CUDA Docker container as described, and tested monobeast and polybeast with almost the same parameters:

python -m torchbeast.monobeast \
     --env PongNoFrameskip-v4 \
     --num_actors 64 \
     --total_steps 30000000 \
     --learning_rate 0.0004 \
     --epsilon 0.01 \
     --entropy_cost 0.01 \
     --batch_size 4 \
     --unroll_length 80 \
     --num_buffers 60 \
     --num_threads 4 \
     --xpid example

python -m torchbeast.polybeast \
     --env PongNoFrameskip-v4 \
     --num_actors 64 \
     --total_steps 30000000 \
     --learning_rate 0.0004 \
     --epsilon 0.01 \
     --entropy_cost 0.01 \
     --batch_size 4 \
     --unroll_length 80 \
     --xpid example

I got the result that polybeast is slower than monobeast:
monobeast runs at about 10000 SPS.
polybeast runs at about 3000 SPS.
I have checked the GPU and it works fine. monobeast used 100% of every CPU core, but polybeast used only 50% of every CPU core.
How can I speed up polybeast?

@heiner
Contributor

heiner commented Jan 6, 2020

Hey Da-Capo, thanks for your report.

To start with, a batch size of 4 isn't very large. The reason your CPUs are less busy with polybeast is that the actor forward passes ("inference") happen on the GPU in that case. Options include (see the example command after the list):

  • Increase batch size
  • Use different GPUs for inference and learning
  • Potentially increase the number of parallel inference and learner threads
  • You can also play around with the unroll length
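As an illustration, here is one possible polybeast invocation reflecting these suggestions. The flags --num_learner_threads, --num_inference_threads, --learner_device and --actor_device are assumptions about the torchbeast CLI and may be named differently in your version; check python -m torchbeast.polybeast --help before copying them.

# Sketch of a tuned polybeast run: larger batch, explicit thread counts,
# and inference pinned to a different GPU than the learner (flag names assumed).
python -m torchbeast.polybeast \
     --env PongNoFrameskip-v4 \
     --num_actors 64 \
     --total_steps 30000000 \
     --learning_rate 0.0004 \
     --epsilon 0.01 \
     --entropy_cost 0.01 \
     --batch_size 32 \
     --unroll_length 80 \
     --num_learner_threads 2 \
     --num_inference_threads 2 \
     --learner_device cuda:0 \
     --actor_device cuda:1 \
     --xpid example

A larger batch size keeps the GPU busier per learner step, and putting inference and learning on separate devices stops them from contending for the same GPU.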

@Da-Capo Da-Capo closed this as completed Mar 20, 2020
@MachengShen

I'm having a similar issue on an Ubuntu machine with 32 CPU cores and 4 V100 GPUs. With monobeast, which uses only 1 GPU and full CPU power, the frame rate is ~5000 SPS; with polybeast I set batch_size=16 and num_inference/learner_threads=8, but the frame rate is only ~300 SPS and only 2 GPUs are active. Were you able to speed up polybeast? Can you share some insight with me? Thanks!
