Performance reduced when running on multi-core ( which is by default ) #159
Comments
Thanks for the issue! Make sure it satisfies this checklist. My human colleagues will appreciate it! Here is what to expect next, and if anyone wants to comment, keep these things in mind. |
The compiler already should be taking advantage of the cores you have. Here are some things that may be relevant:
That said, I need more information to improve things in a directed way. CPU may not actually be the core issue. Maybe a bunch more RAM is used in one case? And maybe GC is causing the CPU usage as a result? Maybe transferring data between all the different cores is costly, so maybe there's some way to get information on that? So here's the plan. When 0.19 alpha is out, please test this again and see if the issue persists. In the meantime, if you find any additional information, let me know about it in an organized way. (I.e. work through it on slack, and prefer one coherent and concise comment over ten scattered comments as you learn more.) Again, the compiler is designed to use as many cores as possible already, so something weird must be going on. Ultimately, I want the compiler to be super fast, so thanks for reporting this and helping make sense of this! |
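One way to check the "maybe GC is causing the CPU usage" hypothesis in a Haskell program is to read the runtime's own statistics. The sketch below is a standalone illustration, not part of elm-make: the workload is a placeholder and `gcShare` is a hypothetical helper name. It assumes GHC 8.2 or later (for `GHC.Stats.getRTSStats`) and that the program is run with `+RTS -T`, which may require linking with `-rtsopts`.

```haskell
module Main where

import GHC.Stats (RTSStats (..), getRTSStats, getRTSStatsEnabled)

-- | Fraction of elapsed time spent in garbage collection, given the
-- GC and mutator (non-GC) elapsed times in nanoseconds.
-- (Hypothetical helper for this sketch.)
gcShare :: Integer -> Integer -> Double
gcShare gcNs mutatorNs
  | total <= 0 = 0
  | otherwise = fromIntegral gcNs / fromIntegral total
  where
    total = gcNs + mutatorNs

main :: IO ()
main = do
  enabled <- getRTSStatsEnabled
  if not enabled
    then putStrLn "RTS stats are disabled; rerun with: +RTS -T"
    else do
      -- Placeholder workload; a real test would run the code under study.
      let workload = sum (map (`mod` 7) [1 .. 2000000 :: Integer])
      workload `seq` return ()
      stats <- getRTSStats
      let gcNs = fromIntegral (gc_elapsed_ns stats)
          mutNs = fromIntegral (mutator_elapsed_ns stats)
      putStrLn ("collections:    " ++ show (gcs stats))
      putStrLn ("max live bytes: " ++ show (max_live_bytes stats))
      putStrLn ("GC share:       " ++ show (gcShare gcNs mutNs))
```

Comparing the GC share between a single-core run and a multi-core run of the same program would show whether collection time, rather than the actual work, grows with the core count.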
@evancz Thanks for the response.
The
When I run
I'm not a Haskell programmer, so I don't know where to look. Let me know what tests you'd like me to run, and I'm happy to help with that. |
Cool, yeah, the If you want to look into it more now, please ask @eeue56 for help on slack. We can proceed without Haskell stuff. For example, knowing about memory usage during compilation can help. Maybe it uses 10mb on your laptop and 100mb on your PC. That would be interesting and helpful to know. Knowing about cache misses may help as well. That kind of thing. I suspect I'll need to get on a machine like yours to test things out, but exploring these things may be helpful or personally interesting nonetheless! |
As you can see in the screenshots, there's plenty of memory available. In theory, how much would every extra core speed up the compile time? |
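As a rough way to frame the "in theory" question, Amdahl's law gives an upper bound on speedup when only part of the work can run in parallel. The sketch below is an idealized model, not a measurement of the compiler: the parallel fraction of 0.8 is an arbitrary assumption and `amdahlSpeedup` is a hypothetical helper name.

```haskell
module Main where

-- | Amdahl's law: the ideal speedup on n cores when a fraction p of the
-- work (0 <= p <= 1) is perfectly parallel and the rest is serial.
amdahlSpeedup :: Double -> Int -> Double
amdahlSpeedup p n = 1 / ((1 - p) + p / fromIntegral n)

main :: IO ()
main = do
  -- Assumed parallel fraction, for illustration only.
  let p = 0.8
  mapM_
    (\n -> putStrLn (show n ++ " cores: " ++ show (amdahlSpeedup p n)))
    [1, 2, 4, 8, 16]
```

The model never predicts a slowdown from adding cores, so a build that gets slower with more cores points to costs the model leaves out, such as coordination between cores or garbage collection.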
Regarding cache misses
|
We took a look through this on Slack. tl;dr:
Discussed with @AntouanK, it's currently at a "liveable" state for them (decent elm-make times). So we will take a look again after 0.19 is released and try to dig in a bit better then.
Some numbers from my Chromebook:
With two cores enabled,
With a single core enabled,
|
Just for comparison on a beefier CPU ( again on the
With 16 cores enabled ( default )
With a single core enabled,
|
Okay, after thinking about it more, I think it makes sense to have a meta issue to try to track the problem. It looks like it's related to some general problem that we haven't been able to pin down yet. I'm not sure where the meta issue should live, so I'm just going to leave it for now. |
Could it be related to this many-core Haskell runtime issue? It might be worth trying running |
I'm encountering the same problem. I'm using Ubuntu 16.04.2 LTS 64-bit on a Dell XPS15 (16GB RAM, i7-7700HQ CPU @ 2.80GHz × 8). Here's the output of
|
I dove deeper into this. It turns out GHC has some, ahem, conservative default garbage collection settings for parallelized workloads. (Example: the nursery gets a whole megabyte!) Simon Marlow and others have looked into this and have tuned the default settings for the next release of GHC 8. Fortunately, we can tune things right now using flags in
tl;dr
Try adding
What this does
Bonus perf boost for Linux
For Linux builds (not sure if this is a no-op or causes problems on non-Linux systems), Simon Marlow suggests adding the
Further Reading
I have read the documentation for the
If we remove
Longer Version
I put the long version of what I learned into a gist. |
Thank you @rtfeldman for looking into this. If I remember correctly, compiling elm-make is not a one-line thing. |
So, I ran almost all the combinations suggested with different values. I've put the most useful raw numbers into this gist, benchmarking against elm-css on a Thinkpad T470s.
Conclusion: the best option is
Make sure to use
tl;dr
|
@eeue56 How many cores does that machine have? |
It is this CPU: https://ark.intel.com/products/97466/Intel-Core-i7-7600U-Processor-4M-Cache-up-to-3_90-GHz So, in terms of this, 4 cores (2 real cores, 2 more from HT). If you look at the gist I made, it has all the combinations that had any form of noticeable impact - in order to build:
If you have other questions, this discussion is better held in #elm-dev on Slack. |
@AntouanK if you do build those, can you save the different binaries you make so we can post them somewhere for others to try? A higher sample size will give us more confidence! |
Thanks @eeue56 |
@rtfeldman Seems like there should be an easy script for this.
so should we just make a script instead? |
@AntouanK uploading the binaries would be fantastic! I think that would definitely be the easiest for others. 😄 Since
Among these, I think it'd be most useful to try
Saving binaries for anything you do there would be great as well! |
@rtfeldman With my poor scripting skills, I got a loop going. So far, seems like the flag doesn't make a big difference.
|
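A benchmarking loop like the one described above could be sketched in Haskell as follows. This is an illustration only, not the script used in the thread: the command string is a placeholder to be replaced with the real build command, `timeCommand` and `mean` are hypothetical helper names, and it assumes the `process` package that ships with GHC plus `GHC.Clock` from base 4.11 or later.

```haskell
module Main where

import Control.Monad (replicateM)
import GHC.Clock (getMonotonicTime)
import System.Process (callCommand)

-- | Arithmetic mean of a list of timings; 0 for an empty list.
mean :: [Double] -> Double
mean [] = 0
mean xs = sum xs / fromIntegral (length xs)

-- | Run a shell command once and return the wall-clock seconds it took.
-- callCommand throws an exception if the command exits with a failure code.
timeCommand :: String -> IO Double
timeCommand cmd = do
  start <- getMonotonicTime
  callCommand cmd
  end <- getMonotonicTime
  return (end - start)

main :: IO ()
main = do
  -- Placeholder command (a Unix no-op); substitute the build to measure.
  let cmd = "true"
      runs = 5
  times <- replicateM runs (timeCommand cmd)
  mapM_ print times
  putStrLn ("mean seconds: " ++ show (mean times))
```

Running the same harness once per binary or flag combination keeps the comparison consistent, and several runs per configuration reduce the noise from a single measurement.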
Added results for
|
Hm, this is very strange. As I recall @eeue56 was seeing like 2-3 second build times on
I wonder what the reason is for the discrepancy. 🤔 |
That's with I tried it again. |
I did some more digging around in the docs, and found some more potential answers as to why single-core is outperforming multicore.
|
As an aside, I'm curious if this is why macOS programs are doing better: maybe GHC thinks they all have 1 core. I ran this on my MacBook Pro (2 physical cores + 2 HT):
module Main where

import Control.Concurrent

main :: IO ()
main = do
  capabilities <- getNumCapabilities
  putStrLn $ show capabilities
It printed Curious what others see when running this. |
Nope, that's not it. I also ran it on nixOS with 8 physical cores (Ryzen) and it also printed |
@rtfeldman, I get the same on my 8-core AMD under nixOS. Perhaps it's set to 1 by default everywhere, unless overridden with
|
@jmitchell yep, I just confirmed that - it prints out whatever |
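The difference between what the machine has and what the runtime uses can be seen by printing both values. The sketch below is an illustration, not code from elm-make: the cap of 4 and the helper `clampCaps` are hypothetical. It assumes the program is linked with `-threaded`, since the non-threaded runtime stays at one capability.

```haskell
module Main where

import Control.Concurrent (getNumCapabilities, setNumCapabilities)
import GHC.Conc (getNumProcessors)

-- | Pick a capability count: at least 1, at most the given limit.
-- (Hypothetical helper; the limit is an arbitrary choice for this sketch.)
clampCaps :: Int -> Int -> Int
clampCaps limit procs = max 1 (min limit procs)

main :: IO ()
main = do
  -- Processors the operating system reports.
  procs <- getNumProcessors
  -- Capabilities the runtime is currently using.
  before <- getNumCapabilities
  putStrLn ("processors:          " ++ show procs)
  putStrLn ("capabilities before: " ++ show before)
  -- Change the capability count from inside the program.
  setNumCapabilities (clampCaps 4 procs)
  after <- getNumCapabilities
  putStrLn ("capabilities after:  " ++ show after)
```

`getNumCapabilities` reports the runtime's setting, while `getNumProcessors` reports the hardware, which is why the earlier snippet alone cannot tell how many cores a machine has.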
This is even more confusing then. Why would |
Ah, because |
Ahh, that's why! I learned from Brian McKenna on Twitter that if you pass Then I set up a test on Travis to see if maybe |
There is also the chance that the cause for the stackoverflow issue #164 is related, if the configured values are relative to the total amount of available memory on a machine, too. |
For future posterity: building https://downloads.haskell.org/~ghc/7.8.4/docs/html/users_guide/runtime-control.html |
@zwilias Any chance to get this flag in by default? (Any implications?) Otherwise a workaround would have to rely on an elm-make fork. |
I want to make elm/compiler#1473 the canonical issue for this. It probably makes sense to summarize the things that need to happen into a meta issue though. I will coordinate with @eeue56 and @zwilias to get a list of TODO items that should be in that. |
Hi.
Noticed this issue when I was working in parallel on the same project, on my PC and my MacBook Pro.
For some weird reason, a 3-year-old MBP was compiling faster than a brand-new 16-core PC.
For example:
MBP:
PC:
After the tip from @eeue56 on the Elm Slack channel to use "sysconfcpus", I saw a huge boost. On the Ryzen PC with Linux, with one core (so with "sysconfcpus -n 1"), I can run "make build" in ~10.4 seconds!
( on the mac "sysconfcpus -n 1" makes no difference )
So, how come the same process is ~50% slower when running on 16 cores than when running on one?
Is there anything I can do to make the compiler take advantage of the multiple cores?
Thanks.