Doesn't use both threads on each core when running on CPU #27

Open
interfect opened this issue Jun 29, 2022 · 3 comments

Comments


interfect commented Jun 29, 2022

I'm using a machine with an 8 core, 16 thread CPU, and plenty of CPU memory, but no GPU compatible with current ML toolkits; ROCm dropped support for my hardware a while ago. So I want to run this on CPU as efficiently as I can.

Unfortunately, it seems like I'm only getting about half of the performance I think I ought to be getting out of the CPU backend.

When I run time python image_from_text.py --text='alien life' --seed=7 --no-torch, it only manages to use one thread on each two-thread core, peaking at about 500% CPU in htop, and reports:

real	1m9.092s
user	5m16.416s
sys	0m10.788s

When I run time python image_from_text.py --text='alien life' --seed=7 --torch, I managed to catch it at more like 700% CPU. It runs a bit faster, but still doesn't seem to be using my CPU fully:

real	0m51.015s
user	4m54.248s
sys	0m14.651s

I also get this different and much more terrifying image; I figured the same seed would produce the same result with both engines, but I was wrong.
[image attached: generated]

Anyway, I would expect CPU usage to be closer to 1600%, and user + sys times to be more like 16x the real time, if I were actually using both threads on each of the 8 physical cores at full tilt.

Is there something about the backend that causes it to use only one thread per physical core, rather than one per hardware thread? Is that something I can change?
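For what it's worth, here is a minimal sketch of one thing to check. It assumes the backend's BLAS/OpenMP layer honors the standard thread-count environment variables (an assumption, not something this repo documents): math libraries like MKL often default to one thread per physical core, and these variables can raise that cap if set before the math libraries are imported:

```python
import os

# Assumption: the CPU backend's BLAS/OpenMP layer reads these variables.
# Many BLAS builds default to one thread per *physical* core, while
# os.cpu_count() returns logical (hardware) threads, e.g. 16 on an
# 8-core/16-thread CPU.
logical_threads = os.cpu_count()
os.environ.setdefault("OMP_NUM_THREADS", str(logical_threads))
os.environ.setdefault("MKL_NUM_THREADS", str(logical_threads))
print(f"capping math libraries at {logical_threads} threads")
```

These must be set before numpy/torch are imported, or on the shell command line, e.g. OMP_NUM_THREADS=16 python image_from_text.py ...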

@interfect (Contributor, Author)

I tried running two runs at the same time, took about the performance hit I would expect from having work on the sibling thread of each core, and saw my CPU get fully utilized. I don't think using both threads in each core would really double performance, but it would increase it by a fair bit.

time python image_from_text.py --text='alien life' --seed=8 --torch running alone:

real	0m59.319s
user	5m41.542s
sys	0m14.994s

And with another run also running at the same time:

real	1m36.393s
user	9m33.206s
sys	0m17.905s

User time is counted the same whether or not something on the sibling thread of your core is competing for its execution units, so look at the real times. 59 seconds / 96 seconds ≈ 61%, so each of two simultaneous runs proceeds at about 61% of the speed of a single run alone. If one run could use all of that compute, it would be running at about 120% speed, or about 20% faster than it runs now.
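The arithmetic above, as a quick sanity check (numbers taken from the timings in this comment, rounded to whole seconds):

```python
# Real wall-clock times from the runs above.
alone = 59    # single run by itself (0m59s)
shared = 96   # each run, when two run concurrently (1m36s)

per_run_speed = alone / shared    # fraction of solo speed each run achieves
combined = 2 * per_run_speed      # total throughput with both runs going
print(f"{per_run_speed:.0%} per run, {combined:.0%} combined")
# prints "61% per run, 123% combined"
```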


kuprel commented Jun 29, 2022

I merged your pull request that adds the line torch.set_num_threads(os.cpu_count()), but it still doesn't seem to be using all the threads on my M1 MacBook. I agree there should be a way to better utilize the CPU. I'm working on converting the model to CoreML so that it can use the Neural Engine.

@interfect (Contributor, Author)

Probably what we want is a --threads option that lets the user specify the thread count, plus a default detector that picks the right value on both Linux and M1 Macs.
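A minimal sketch of what that option could look like (the flag name, default, and wiring are assumptions on my part, not the project's current CLI):

```python
import argparse
import os

# Hypothetical --threads flag; defaults to one thread per hardware thread.
parser = argparse.ArgumentParser()
parser.add_argument(
    "--threads",
    type=int,
    default=os.cpu_count(),
    help="number of CPU threads to use (default: all hardware threads)",
)

# Example invocation; in the real script this would be parser.parse_args().
args = parser.parse_args(["--threads", "16"])
print(args.threads)  # prints 16
# torch.set_num_threads(args.threads) would then be called before inference
# (torch import omitted here).
```

The default via os.cpu_count() gives all logical threads on Linux; whether that is also the right default on an M1 (with its mix of performance and efficiency cores) is exactly the open question.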
