Doesn't use both threads on each core when running on CPU #27

Open
interfect opened this issue Jun 29, 2022 · 3 comments

Comments


interfect commented Jun 29, 2022

I'm using a machine with an 8 core, 16 thread CPU, and plenty of CPU memory, but no GPU compatible with current ML toolkits; ROCm dropped support for my hardware a while ago. So I want to run this on CPU as efficiently as I can.

Unfortunately, it seems like I'm only getting about half of the performance I think I ought to be getting out of the CPU backend.

When I run time python image_from_text.py --text='alien life' --seed=7 --no-torch, it only manages to use one thread on each two-thread core, peaking at about 500% CPU in htop, and reports:

real	1m9.092s
user	5m16.416s
sys	0m10.788s

When I run time python image_from_text.py --text='alien life' --seed=7 --torch, I managed to catch it at more like 700% CPU. It runs a bit faster, but still doesn't seem to be using my CPU fully:

real	0m51.015s
user	4m54.248s
sys	0m14.651s

I also get this different and much more terrifying image; I figured the same seed would produce the same result with both engines, but I was wrong.
[image attached: generated]

Anyway, I would expect CPU usage to be closer to 1600%, and user + sys times to be more like 16x the real time, if I were actually using both threads on each of the 8 physical cores at full tilt.

Is there something about the backend that causes it to use only one thread per physical core, rather than one per hardware thread? Is that something I can change?
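For what it's worth, here is a minimal sketch of one thing to check. It assumes the backend's BLAS/OpenMP layer honors the standard thread-count environment variables (an assumption, not something this repo documents): math libraries like MKL often default to one thread per physical core, and these variables can raise that cap if set before the math libraries are imported:

```python
import os

# Assumption: the CPU backend's BLAS/OpenMP layer reads these variables.
# Many BLAS builds default to one thread per *physical* core, while
# os.cpu_count() returns logical (hardware) threads, e.g. 16 on an
# 8-core/16-thread CPU.
logical_threads = os.cpu_count()
os.environ.setdefault("OMP_NUM_THREADS", str(logical_threads))
os.environ.setdefault("MKL_NUM_THREADS", str(logical_threads))
print(f"capping math libraries at {logical_threads} threads")
```

These must be set before numpy/torch are imported, or on the shell command line, e.g. OMP_NUM_THREADS=16 python image_from_text.py ...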

@interfect (Contributor, Author)

I tried running two runs at the same time, took about the performance hit I would expect from having work on the sibling thread of each core, and saw my CPU get fully utilized. I don't think using both threads in each core would really double performance, but it would increase it by a fair bit.

time python image_from_text.py --text='alien life' --seed=8 --torch running alone:

real	0m59.319s
user	5m41.542s
sys	0m14.994s

And with another run also running at the same time:

real	1m36.393s
user	9m33.206s
sys	0m17.905s

User time is counted the same whether or not something on the sibling thread of your core is competing for its execution units, so look at the real times. 59 seconds / 96 seconds ≈ 61%, so each of two simultaneous runs proceeds at about 61% of the speed of a single run alone. If one run could use all of that compute, it would be running at about 120% speed, or about 20% faster than it runs now.
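The arithmetic above, as a quick sanity check (numbers taken from the timings in this comment, rounded to whole seconds):

```python
# Real wall-clock times from the runs above.
alone = 59    # single run by itself (0m59s)
shared = 96   # each run, when two run concurrently (1m36s)

per_run_speed = alone / shared    # fraction of solo speed each run achieves
combined = 2 * per_run_speed      # total throughput with both runs going
print(f"{per_run_speed:.0%} per run, {combined:.0%} combined")
# prints "61% per run, 123% combined"
```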


kuprel commented Jun 29, 2022

I merged your pull request that adds the line torch.set_num_threads(os.cpu_count()), but it still doesn't seem to be using all the threads on my M1 MacBook. I agree there should be a way to better utilize the CPU. I'm working on converting the model to CoreML so that it can use the Neural Engine.

@interfect (Contributor, Author)

Probably what we want is a --threads option that lets the user specify the thread count, plus a default detector that picks the right value on both Linux and M1 Macs.
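A minimal sketch of what that option could look like (the flag name, default, and wiring are assumptions on my part, not the project's current CLI):

```python
import argparse
import os

# Hypothetical --threads flag; defaults to one thread per hardware thread.
parser = argparse.ArgumentParser()
parser.add_argument(
    "--threads",
    type=int,
    default=os.cpu_count(),
    help="number of CPU threads to use (default: all hardware threads)",
)

# Example invocation; in the real script this would be parser.parse_args().
args = parser.parse_args(["--threads", "16"])
print(args.threads)  # prints 16
# torch.set_num_threads(args.threads) would then be called before inference
# (torch import omitted here).
```

The default via os.cpu_count() gives all logical threads on Linux; whether that is also the right default on an M1 (with its mix of performance and efficiency cores) is exactly the open question.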
