Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Memory usage during compilation #24

Closed
donatello opened this issue Jan 30, 2018 · 21 comments
Closed

Memory usage during compilation #24

donatello opened this issue Jan 30, 2018 · 21 comments

Comments

@donatello
Copy link

When I use either this package or wai-app-static to embed a file around 25MB in size (an executable to be served over http), I see that compilation takes over 3-4 GB of RAM and sometimes crashes my laptop.

Is this expected and can anything be done to make it more efficient? It seems that such high memory usage should not actually be needed during compliation.

@mgsloan
Copy link
Contributor

mgsloan commented Jan 30, 2018

This is probably a GHC issue. Ideally, Template Haskell should let you directly use a ByteArray or similar. Currently, it is run through a [Word8]. I'm surprised the memory usage would be that high, though.

@donatello
Copy link
Author

Here is a somewhat minimal demonstration of the issue - https://github.com/donatello/file-embed-exp

Should I report this as an issue with ghc?

@mgsloan
Copy link
Contributor

mgsloan commented Jan 30, 2018

Sure, I think it'd be reasonable. Unfortunately, base does not have any type like Bytestring or Text for TH to use. It'd make sense to have something that could use ByteArray# (probably wrapped in a datatype), and something that could use (Ptr Word8, Int) or something like that.

@donatello
Copy link
Author

I've opened an issue here - https://ghc.haskell.org/trac/ghc/ticket/14741

@steshaw
Copy link

steshaw commented Aug 9, 2018

@donatello Would be good to see a solution via GHC but in the meantime, maybe file-embed's inject system would suit your purpose? The closest thing I can find to a tutorial on it is here.

@hsyl20
Copy link
Contributor

hsyl20 commented Jan 15, 2019

I have documented an alternative approach that could be used to fix this issue: https://hsyl20.fr/home/posts/2019-01-15-fast-file-embedding-with-ghc.html

@MaxGabriel
Copy link

@hsyl20 Am I understanding the implications of your work correctly, that every time we use fileEmbed, even on the smallest of files, it costs over a second of compilation time? The github links in your post 404 so I wasn't sure what exactly was being tested.

@hsyl20
Copy link
Contributor

hsyl20 commented Mar 7, 2019

The links in my blog post should be fixed soon (i have merged some repos...). The test script was: https://github.com/haskus/packages/blob/ef893138620e8cea1b9ecb2983f5e38b997bd387/haskus-binary/bench.sh
The timings have been obtained with time stack exec -- ghc -v0 bench/BenchEmbed.hs -fforce-recomp -O and on my setup it took over a second even for the smallest test (but it may not be due to file-embed, my computer is quite slow).

Anyway I have made some patchs that have been integrated into GHC:

Thanks to them file-embed should be much faster in the next GHC release (when the native code generator is used, not LLVM).

I also have this merge request that I hope will be merged soon: https://gitlab.haskell.org/ghc/ghc/merge_requests/141
Once it is merged, file-embed could be patched to use the new Bytes primitive as can be seen here:
https://gitlab.haskell.org/ghc/ghc/merge_requests/141/diffs#e0300ef07066368947e579d6342b69d6d88f881c_0_18

Then embedding large files will only consists in reading them into a ByteString and passing the ByteString pointer to GHC via TH (file-embed side) and writing them back into another for inclusion into the binary (GHC side): no transformation into [Word8] anymore \o/.

@MaxGabriel
Copy link

That’s wonderful! Thank you for those patches @hysl20!

As far as the time to do a single file embed on a small file, thankfully it isn’t 1 second or anything. I removed about 20 from our codebase and saw a 1 second compile time reduction, and that could be just noise.

@hsyl20
Copy link
Contributor

hsyl20 commented Mar 8, 2019

The MR has been merged: https://gitlab.haskell.org/ghc/ghc/commit/224a6b864c6aa0d851fcbf79469e5702b1116dbc
Starting with GHC 8.10 (I think) we will be able to use the new approach in TH.

@saurabhnanda
Copy link

I was pointed to this package in a reply to a question I asked on Reddit.

I'm not too sure if embedding a web-app's static assets (HTML, CSS, JS) directly into the binary is a good idea. I am not aware of any other language/framework doing that, but I may be wrong.

In view of this issue (extremely high memory usage during compilation), would you recommend this approach? (We have 100s of static assets being using in our production app).

@saurabhnanda
Copy link

saurabhnanda commented May 8, 2020

Also, this got me thinking, can file embedding be achieved with a pre-processor? Will GHC choke if there is a module with very large strings, eg.

module StaticAssets where

index_js :: ByteString
index_js = "...." -- about 800kb of compressed/minified JS

-- and about 50-60 such top-level declarations

@hsyl20
Copy link
Contributor

hsyl20 commented May 9, 2020

file-embed should be patched as I proposed in [1] now that TH provides Bytes literals [2]. In my tests last year [1] I found that the impact on compilation is then minimal (with the native code generator).

@saurabhnanda using another pre-processor to generate large strings would likely be more costly. With the current approach we bypass the parser and with the approach in [1] we would also bypass TH to Haskell AST string conversion.

[1] https://gitlab.haskell.org/ghc/ghc/issues/14741#note_167616
[2] https://hackage.haskell.org/package/template-haskell-2.16.0.0/docs/Language-Haskell-TH-Lib.html#v:bytesPrimL

@saurabhnanda
Copy link

@hsyl20 is this the complete patch? If yes, I can try this out in the odd-jobs job-queue lbrary and even use this in production and report back.

< module Data.FileEmbed
---
> module FileEmbed
49,53c49
< #if MIN_VERSION_template_haskell(2,5,0)
<     , Lit (StringL, StringPrimL, IntegerL)
< #else
<     , Lit (StringL, IntegerL)
< #endif
---
>     , Lit (..)
60a57
> import Language.Haskell.TH
65a63
> import qualified Data.ByteString.Internal as B
154c152,156
< #if MIN_VERSION_template_haskell(2, 8, 0)
---
> #if MIN_VERSION_template_haskell(2, 15, 0)
>       `AppE` LitE (bytesPrimL (
>                 let B.PS ptr off sz = bs
>                 in  mkBytes ptr (fromIntegral off) (fromIntegral sz))))
> #elif MIN_VERSION_template_haskell(2, 8, 0)

However, we're using GHC 8.4.x in production and the core GHC patch has been applied ONLY to 8.10.x, right? What would it take to back-port the core GHC patch?

@hsyl20
Copy link
Contributor

hsyl20 commented May 10, 2020

is this the complete patch?

Yes. Except that:

> #if MIN_VERSION_template_haskell(2, 15, 0)

should be:

> #if MIN_VERSION_template_haskell(2, 16, 0)

However, we're using GHC 8.4.x in production and the core GHC patch has been applied ONLY to 8.10.x, right? What would it take to back-port the core GHC patch?

It shouldn't be too hard. You have to backport some parts of the following patches:

@snoyberg
Copy link
Owner

Is someone interested in submitting a PR for the changes here?

hsyl20 added a commit to hsyl20/file-embed that referenced this issue May 12, 2020
@hsyl20
Copy link
Contributor

hsyl20 commented May 12, 2020

Is someone interested in submitting a PR for the changes here?

I'll do it

hsyl20 added a commit to hsyl20/file-embed that referenced this issue May 12, 2020
hsyl20 added a commit to hsyl20/file-embed that referenced this issue May 12, 2020
@hsyl20
Copy link
Contributor

hsyl20 commented May 12, 2020

Done in #36

snoyberg added a commit that referenced this issue May 12, 2020
@qrilka
Copy link

qrilka commented Aug 7, 2020

Didn't #36 resolve this issue?

@tschuchortdev
Copy link

@hsyl20 @snoyberg What's the status on this? Can this library be used now with large files without incurring a significant performance hit since #36 has been merged or should a different approach be used?

@snoyberg
Copy link
Owner

I haven't tested this out myself, so I can't vouch one way or the other. But it looks like it should be fixed with #36. I'm going to close the issue (which I should have done over a year ago), if anyone encounters issues please ping again.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

9 participants