Doesn't use both threads on each core when running on CPU #27
Comments
I tried running two runs at the same time, and I got about the performance hit I would expect from having something on the other thread in a core, and I saw my CPU get fully utilized. I don't think using both threads in each core would really double performance, but it would increase it by a fair bit.
And with another run also running at the same time:
User time is counted the same whether or not something on the other thread of your core is competing for its execution resources, so look at the real (wall-clock) times. 59 seconds / (60 seconds + 36 seconds) = 61%, so if I run two runs simultaneously, each runs at about 61% of the speed of a single run alone. If a single run could use all that compute, it would be running at about 120% speed, or about 20% faster than it runs now.
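The estimate in this comment can be checked with a short calculation. This is only a sketch of the arithmetic; the function name `smt_speedup` is made up for illustration, and the inputs are the wall-clock times quoted above (59 seconds alone, 60 + 36 seconds when two runs share the cores).

```python
def smt_speedup(solo_seconds: float, paired_seconds: float) -> tuple[float, float]:
    """Estimate throughput when two runs share the same cores.

    Returns (speed of each paired run relative to a solo run,
             combined throughput of both runs relative to a solo run).
    """
    per_run_speed = solo_seconds / paired_seconds
    combined_throughput = 2 * per_run_speed
    return per_run_speed, combined_throughput


# Wall-clock ("real") times quoted in the comment above.
per_run, combined = smt_speedup(59, 60 + 36)
print(f"each paired run: {per_run:.0%} of solo speed")
print(f"both runs together: {combined:.0%} of solo throughput")
```

The combined figure is the upper bound a single run could reach if it kept both hardware threads of every core busy as effectively as two independent runs do.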
I merged your pull request that adds the line |
Probably what we want is a |
I'm using a machine with an 8 core, 16 thread CPU, and plenty of CPU memory, but no GPU compatible with current ML toolkits; ROCm dropped support for my hardware a while ago. So I want to run this on CPU as efficiently as I can.
Unfortunately, it seems like I am only getting about 80% of the performance I think I ought to be getting out of the CPU backend.
When I run
time python image_from_text.py --text='alien life' --seed=7 --no-torch
it seems to only be able to use one thread on each two-thread core, gets up to about 500% CPU in htop, and reports:
When I run
time python image_from_text.py --text='alien life' --seed=7 --torch
I did manage to catch it at more like 700% CPU. It runs a bit faster but still doesn't seem to be fully using my CPU:
I also get this different and much more terrifying image; I figured the same seed would produce the same result with both engines, but I was wrong.
Anyway, I would expect CPU usage to be closer to 1600%, and user + sys times to be more like 16x real times, if I was actually managing to use both threads on each of the 8 physical cores at full tilt.
Is there something about the backend that is causing it to only try to use one thread per physical core, and not one thread per hardware thread? Is that something that I can change?
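One way to experiment with the question above is to request one worker thread per hardware thread before the numeric libraries load. The sketch below is an assumption-laden illustration, not this project's configuration: the helper names are hypothetical, and whether the backend honors `OMP_NUM_THREADS` or `MKL_NUM_THREADS` depends on how its math libraries were built.

```python
import os


def logical_cpu_count() -> int:
    """Number of logical CPUs (hardware threads) the OS reports, e.g. 16 on an 8-core SMT CPU."""
    return os.cpu_count() or 1


def request_threads(n_threads: int) -> dict:
    """Set common thread-count environment variables.

    This only has an effect if it runs before the libraries that read
    these variables (OpenMP- or MKL-backed ones) are imported.
    """
    settings = {
        "OMP_NUM_THREADS": str(n_threads),
        "MKL_NUM_THREADS": str(n_threads),
    }
    os.environ.update(settings)
    return settings


if __name__ == "__main__":
    n = logical_cpu_count()
    print("logical CPUs:", n)
    print("requested:", request_threads(n))
```

For the PyTorch path, `torch.set_num_threads(n)` is the runtime call for the intra-op thread count, and `torch.get_num_threads()` reports the current value. Comparing real times with the count set to 8 and to 16 would show whether the second hardware thread per core actually helps.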