
stack ghc painfully slow #1671

Closed
ezrosent opened this issue Feb 5, 2017 · 46 comments
@ezrosent

ezrosent commented Feb 5, 2017

  • A brief description
    Managing haskell projects with the stack tool is unusable due to how slow it is.

  • Expected results
    (from a laptop running ubuntu 16.04)

time stack ghc -- --version
The Glorious Glasgow Haskell Compilation System, version 8.0.1

real    0m0.124s                                                                                                                                           
user    0m0.092s                                                                                                                                               
sys     0m0.036s
  • Actual results (with terminal output if applicable)
    On desktop running WSL
time stack ghc -- --version
The Glorious Glasgow Haskell Compilation System, version 8.0.1

real    0m50.520s
user    0m0.172s
sys     1m40.547s
  • Your Windows build number
    15025
  • Steps / All commands required to reproduce the error from a brand new installation
    After installation, stack needs to pull in a version of GHC. This should do the trick.
stack setup
stack upgrade --install-ghc
time stack ghc -- --version
  • Strace of the failing command
    The attached strace output includes a few long (multi-second) waits on FUTEX_WAIT, as well as one on mmap.
  • Required packages and commands to install
    Install stack with the standard instructions

stack_ghc_strace.txt

@benhillis
Member

This is on our backlog but is unlikely to make the Creators Update. I know we're planning on looking at this soon though.

For some context, I've looked at what causes this slowdown. For some reason stack has mapped a mind-bogglingly huge region of memory (I'm talking dozens of terabytes). When we fork, we walk the entire address range to set up the new process's state. We have a design that should vastly speed this up, but we're approaching the "pencils down" date for the Creators Update.

@ezrosent
Author

ezrosent commented Feb 6, 2017

Gotcha, thanks for the context!

@therealkenc
Collaborator

Terabytes. That's awesome. Can't wait to see it in Resource Monitor.

@benhillis
Member

I assume they're doing it to manage their own heap. It's a big "MAP_NORESERVE" region which Linux seems to intelligently handle since "allocate all the things" seems to be a common paradigm.

@therealkenc
Collaborator

therealkenc commented Feb 7, 2017

For what it's worth, this seems to be the related discussion over at GHC ticket #9706. Quoth:

BTW, I found that I could mmap 100 TB with PROT_NONE (or even PROT_READ) and MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED with no measurable delay, and I can start 10000 such processes at once, so there doesn't seem to be any significant cost to setting up the page table mappings (at least on my system). Not sure why that is, exactly. The VSZ column in ps looks quite funny of course :)

So by my math that's 100×10¹² × 10⁴ = 10¹⁸ ≈ 2⁶⁰, which gets you in just under the wire. Or something.

@benhillis
Member

Adding @stehufntdev because he's been looking into this as well.

@mckennapsean

I have encountered a similar bug with plain ghc and pandoc. They are really slow just to invoke and print version info. I can confirm on a slow-ring Insider build and the Windows Preview v.15.15014 (using the free VM).

I found this slowdown by installing ghc v.8.0.2 (anything v.7.10 and below was fast) or by installing pandoc v.1.18 & above. See directions to install ghc or install pandoc for testing. If needed, I can provide a simple set of commands to reproduce.

They both run similarly slow/delayed for me on both systems, but I have not seen reports from other *nix users seeing similar slowdowns, so I am guessing this is WSL related.

@sukhmel

sukhmel commented Feb 20, 2017

This does not require stack to reproduce; the GHC compiler alone is enough. I'm experiencing the same dreadfully slow compiler performance. Besides, programs compiled with 8.0.x are slow too.

  1. wget http://downloads.haskell.org/~ghc/8.0.2/ghc-8.0.2-x86_64-deb8-linux.tar.xz
  2. tar -xJf ghc-8.0.2-x86_64-deb8-linux.tar.xz
  3. cd ghc-8.0.2
  4. ./configure --prefix=/tmp/ghc
  5. make install
  6. time /tmp/ghc/bin/ghc -e 'putStrLn ""'

@RyanGlScott

@benhillis: I believe you've bumped into GHC 8.0's new block-structured heap for 64-bit platforms. From the GHC 8.0.1 release notes:

We have a shiny new two-step memory allocator for 64-bit platforms (see Trac #9706). In addition to simplifying the runtime system’s implementation this may significantly improve garbage collector performance. Note, however, that Haskell processes will have an apparent virtual memory footprint of a terabyte or so. Don’t worry though, most of this amount is merely mapped but uncommitted address space which is not backed by physical memory.

@benhillis
Member

@RyanGlScott - I suspect you are right. We need to modify the way our memory manager keeps track of uncommitted pages.

I'd be very curious to see some performance measurements on how much better their allocator performs versus raw mmap / munmap calls.

@therealkenc
Collaborator

I was going to quip about that curiosity too, but stuck with "awesome" instead. So you benchmark 8.0 and find out it is some percent faster than 7.0. Or just as fast, but simpler. Either way you end up demonstrating not much in the exercise. The Haskell guys seem okay with a hello-world app asking for a terabyte of virtual memory. The Chakra guys seem okay with asking for 32GB to print hello, and if you are going to do that, [expletive], why not ask for a TB. I am still academically interested in how they arrived at 32GB. Why not 64GB or 128GB? Certainly not because "that would be crazy".

It's working code. Smart people thought it was a good idea. Shrug. What you gonna do except sigh and re-work the memory manager.

@RyanGlScott

FWIW, Golang also does something similar by reserving a contiguous chunk of 512 GB of memory (see this comment).

I'm certainly not qualified enough to say how they came up with that number, other than that it's a power of two and—to use their words—"512 GB (MaxMem) should be big enough for now".

@pechersky

pechersky commented Apr 19, 2017

I have a workaround in the meantime, based on the discussion in https://ghc.haskell.org/trac/ghc/ticket/13304. It involves compiling your own GHC which does not utilize the large address space allocation, and then using that as the GHC for your stack builds. My workaround relies on then supplying that GHC as the common GHC for your projects. In my example, I will recompile GHC 8.0.2 using whatever ghc you already have on your system. I will also make sure that Cabal is installed using this GHC -- otherwise, installing other packages will fall under the same problem of slowness. I suggest cleaning your ~/.stack and other stack directories to make sure you don't have any GHC lying around with the large-allocation functionality.

To fix, in the bash environment, I ran

# install necessary prereqs if not there
sudo apt-get install ghc happy alex
cd
git clone -b ghc-8.0.2-release --recursive git://git.haskell.org/ghc.git ghc-8.0.2
cd ghc-8.0.2
./boot
./configure --disable-large-address-space #can set --prefix=... here
make -j8 #-j(number-of-threads)
sudo make install
sudo ln -s /usr/local/bin/ghc ~/.local/bin/ghc-8.0.2 #or wherever your prefix put the binaries
# link the rest of the binaries, like runghc, ghci, etc
# this is to make sure the "system-ghc" is properly called
echo "system-ghc: true" >> ~/.stack/config.yaml
cd
# optional Cabal and cabal-install reinstallation to conform to new ghc
stack install Cabal
stack install cabal-install

Now you can do your stack install and stack build in your projects, using the specially compiled GHC.

You can monitor the VIRT usage with something like top or htop. Try stack exec ghci and monitor VIRT before and after.

@sgraf812

Don't you also have to recompile stack for this?

@pechersky

@sgraf812 In my use cases, I have not had to recompile stack. If I understand correctly, stack itself never builds anything, just calls the appropriate ghc to do so, through the project-level or system-level ghc (ghci, ghc-through-cabal, etc). This issue only appears during builds, so as long as the ghc that stack uses is fine, stack itself should be fine. Monitoring the path of the ghc binary using htop during a build step might help diagnose what ghc is being used if you still see the 1TB VIRT allocs.

@TerrorJack

@pechersky This issue affects not only GHC 8, but also anything compiled with it (stack, pandoc, etc). The official binaries provided by stack developers happen to run fine because the latest release version is built with lts-6.25 and uses ghc-7.10.3.

@pechersky

@TerrorJack Thank you for clarifying that for me. My workaround fixes the "stack ghc is slow" issue, as well as @sukhmel's MWE. I did rebuild Cabal in my workflow. Regarding pandoc, I would defer to the example in their docs at http://pandoc.org/installing.html#quick-stack-method. AFAIK stack just delegates builds to ghc, so as long as pandoc etc. are installed via stack install after supplying the fixed GHC, I think you should be fine. You could also rebuild stack from source.

@TerrorJack

@pechersky Also, the stack install Cabal step is not necessary. I'm working with GHC HEAD, and directly installed ghc to ~/.stack/programs/.... (using the --prefix= flag); compiling Haskell projects with stack then works out of the box. I guess regular GHC releases should work the same.

@pechersky

@TerrorJack The stack install Cabal outside of a project was in case someone wanted to use stack solver, which falls back to Cabal to inspect the .cabal file, calculate the build plan, and so on.

@TerrorJack

@pechersky stack solver uses cabal-install (by invoking it and parsing the output). So in fact we need stack install cabal-install (or to install cabal-install by some other means).

@pechersky

I have updated the code above to include your suggestion, @TerrorJack. According to https://docs.haskellstack.org/en/stable/faq/#what-is-the-relationship-between-stack-and-cabal, both the lib (... Cabal) and the executable (... cabal-install) are used. To be on the safe side, one could (re)install both.

@benhillis
Member

@Roman2K - Thanks for the information. I'm glad that it's much more usable for you, but we still do have a long way to go. We're looking into ways to improve base NTFS speed to help bring Windows filesystem performance more in line with Linux.

@therealkenc
Collaborator

therealkenc commented Aug 1, 2017

I have anecdotally found the same kind of problem with tar, which is (I think) separate from the huge-memory-allocation slowness. When I untar a large tarball (say 10GB) in a Linux VM, it returns almost immediately, because I have 20GB of RAM assigned to the VM and it all ends up in cache at near-memcpy() speed. With WSL it seems to rate-limit on writes to disk. I did not report it because I don't untar large files that often, and limiting on writes to disk is hard to prove these days without low-level instrumentation (ugh, effort). But from the blinkenlights it looks like that is what's happening. It doesn't seem to be a CPU-limited thing, like the inefficient stat() calls in the git slowness complaints, say.

[edit] Another data point is that sync never seems to do anything in WSL. With the same 10GB untar in a VM, sync takes a measurable amount of time to flush the cache.

@jstarks
Member

jstarks commented Dec 19, 2017

We have improved mmap performance further in insider build 17063. I believe this makes stack ghc bearable to use now :).

@AaronFriel

Thank you! I can attest to the significant improvement.

@cemerick

Another anecdote: I opted into 17074 hoping to get acceptable working conditions, but stack setup took exactly an hour to complete (even with windows defender temporarily disabled). For comparison, stack setup under cmd.exe starting from scratch took ~4 minutes. I'll give working with the result a shot, but it doesn't look promising.

Sorry for the negative report; keep doing great work, you'll get there. ❤️

@hvr

hvr commented Feb 14, 2018

In the hopes this may be useful to somebody here: I've set up a GHC PPA optimised for WSL (i.e. built with --disable-large-address-space) over at

https://launchpad.net/~hvr/+archive/ubuntu/ghc-wsl

It should merely be a matter of

sudo add-apt-repository ppa:hvr/ghc-wsl
sudo apt-get update
sudo apt-get install ghc-8.2.2-prof cabal-install-head

and then simply prepending /opt/ghc/bin/ to your $PATH env-var.

@cboudereau

I would like to use VS Code on Windows + WSL + stack ghci, but due to this problem it is really slow.

I will check if recompiling a custom ghc without large address space allocation is better or not. Thanks @pechersky !

@therealkenc
Collaborator

nb: GHC is "still slow" (as it were) but this was deemed fixed in insider builds back in July 2017, and finally made its way into the April Update.

simnalamburt added a commit to simnalamburt/.dotfiles that referenced this issue Dec 18, 2019
1.  Use cascadia code instead of Consolas.

2.  All Windows versions before Windows Build 16215 are now EOL, so "use
    Windows 16215 or later" is now redundant.

    Reference: https://support.microsoft.com/en-us/help/13853/windows-lifecycle-fact-sheet

3.  Use https://github.com/microsoft/terminal instead of wsltty.

4.  `unsetopt BG_NICE` is not necessary anymore.

5.  microsoft/WSL#1671 is fixed now.

6.  Add comments to the steps which seem unnecessary