Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Memory usage during compilation #24

Open
donatello opened this issue Jan 30, 2018 · 19 comments
Open

Memory usage during compilation #24

donatello opened this issue Jan 30, 2018 · 19 comments

Comments

@donatello
Copy link

@donatello donatello commented Jan 30, 2018

When I use either this package or wai-app-static to embed a file around 25MB in size (an executable to be served over http), I see that compilation takes over 3-4 GB of RAM and sometimes crashes my laptop.

Is this expected and can anything be done to make it more efficient? It seems that such high memory usage should not actually be needed during compliation.

@mgsloan
Copy link
Contributor

@mgsloan mgsloan commented Jan 30, 2018

This is probably a GHC issue. Ideally, Template Haskell should let you directly use a ByteArray or similar. Currently, it is run through a [Word8]. I'm surprised the memory usage would be that high, though.

@donatello
Copy link
Author

@donatello donatello commented Jan 30, 2018

Here is a somewhat minimal demonstration of the issue - https://github.com/donatello/file-embed-exp

Should I report this as an issue with ghc?

@mgsloan
Copy link
Contributor

@mgsloan mgsloan commented Jan 30, 2018

Sure, I think it'd be reasonable. Unfortunately, base does not have any type like Bytestring or Text for TH to use. It'd make sense to have something that could use ByteArray# (probably wrapped in a datatype), and something that could use (Ptr Word8, Int) or something like that.

@donatello
Copy link
Author

@donatello donatello commented Jan 30, 2018

I've opened an issue here - https://ghc.haskell.org/trac/ghc/ticket/14741

@steshaw
Copy link

@steshaw steshaw commented Aug 9, 2018

@donatello Would be good to see a solution via GHC but in the meantime, maybe file-embed's inject system would suit your purpose? The closest thing I can find to a tutorial on it is here.

@hsyl20
Copy link
Contributor

@hsyl20 hsyl20 commented Jan 15, 2019

I have documented an alternative approach that could be used to fix this issue: https://hsyl20.fr/home/posts/2019-01-15-fast-file-embedding-with-ghc.html

@MaxGabriel
Copy link

@MaxGabriel MaxGabriel commented Mar 6, 2019

@hsyl20 Am I understanding the implications of your work correctly, that every time we use fileEmbed, even on the smallest of files, it costs over a second of compilation time? The github links in your post 404 so I wasn't sure what exactly was being tested.

@hsyl20
Copy link
Contributor

@hsyl20 hsyl20 commented Mar 7, 2019

The links in my blog post should be fixed soon (i have merged some repos...). The test script was: https://github.com/haskus/packages/blob/ef893138620e8cea1b9ecb2983f5e38b997bd387/haskus-binary/bench.sh
The timings have been obtained with time stack exec -- ghc -v0 bench/BenchEmbed.hs -fforce-recomp -O and on my setup it took over a second even for the smallest test (but it may not be due to file-embed, my computer is quite slow).

Anyway I have made some patchs that have been integrated into GHC:

Thanks to them file-embed should be much faster in the next GHC release (when the native code generator is used, not LLVM).

I also have this merge request that I hope will be merged soon: https://gitlab.haskell.org/ghc/ghc/merge_requests/141
Once it is merged, file-embed could be patched to use the new Bytes primitive as can be seen here:
https://gitlab.haskell.org/ghc/ghc/merge_requests/141/diffs#e0300ef07066368947e579d6342b69d6d88f881c_0_18

Then embedding large files will only consists in reading them into a ByteString and passing the ByteString pointer to GHC via TH (file-embed side) and writing them back into another for inclusion into the binary (GHC side): no transformation into [Word8] anymore \o/.

@MaxGabriel
Copy link

@MaxGabriel MaxGabriel commented Mar 7, 2019

That’s wonderful! Thank you for those patches @hysl20!

As far as the time to do a single file embed on a small file, thankfully it isn’t 1 second or anything. I removed about 20 from our codebase and saw a 1 second compile time reduction, and that could be just noise.

@hsyl20
Copy link
Contributor

@hsyl20 hsyl20 commented Mar 8, 2019

The MR has been merged: https://gitlab.haskell.org/ghc/ghc/commit/224a6b864c6aa0d851fcbf79469e5702b1116dbc
Starting with GHC 8.10 (I think) we will be able to use the new approach in TH.

@saurabhnanda
Copy link

@saurabhnanda saurabhnanda commented May 8, 2020

I was pointed to this package in a reply to a question I asked on Reddit.

I'm not too sure if embedding a web-app's static assets (HTML, CSS, JS) directly into the binary is a good idea. I am not aware of any other language/framework doing that, but I may be wrong.

In view of this issue (extremely high memory usage during compilation), would you recommend this approach? (We have 100s of static assets being using in our production app).

@saurabhnanda
Copy link

@saurabhnanda saurabhnanda commented May 8, 2020

Also, this got me thinking, can file embedding be achieved with a pre-processor? Will GHC choke if there is a module with very large strings, eg.

module StaticAssets where

index_js :: ByteString
index_js = "...." -- about 800kb of compressed/minified JS

-- and about 50-60 such top-level declarations
@hsyl20
Copy link
Contributor

@hsyl20 hsyl20 commented May 9, 2020

file-embed should be patched as I proposed in [1] now that TH provides Bytes literals [2]. In my tests last year [1] I found that the impact on compilation is then minimal (with the native code generator).

@saurabhnanda using another pre-processor to generate large strings would likely be more costly. With the current approach we bypass the parser and with the approach in [1] we would also bypass TH to Haskell AST string conversion.

[1] https://gitlab.haskell.org/ghc/ghc/issues/14741#note_167616
[2] https://hackage.haskell.org/package/template-haskell-2.16.0.0/docs/Language-Haskell-TH-Lib.html#v:bytesPrimL

@saurabhnanda
Copy link

@saurabhnanda saurabhnanda commented May 10, 2020

@hsyl20 is this the complete patch? If yes, I can try this out in the odd-jobs job-queue lbrary and even use this in production and report back.

< module Data.FileEmbed
---
> module FileEmbed
49,53c49
< #if MIN_VERSION_template_haskell(2,5,0)
<     , Lit (StringL, StringPrimL, IntegerL)
< #else
<     , Lit (StringL, IntegerL)
< #endif
---
>     , Lit (..)
60a57
> import Language.Haskell.TH
65a63
> import qualified Data.ByteString.Internal as B
154c152,156
< #if MIN_VERSION_template_haskell(2, 8, 0)
---
> #if MIN_VERSION_template_haskell(2, 15, 0)
>       `AppE` LitE (bytesPrimL (
>                 let B.PS ptr off sz = bs
>                 in  mkBytes ptr (fromIntegral off) (fromIntegral sz))))
> #elif MIN_VERSION_template_haskell(2, 8, 0)

However, we're using GHC 8.4.x in production and the core GHC patch has been applied ONLY to 8.10.x, right? What would it take to back-port the core GHC patch?

@hsyl20
Copy link
Contributor

@hsyl20 hsyl20 commented May 10, 2020

is this the complete patch?

Yes. Except that:

> #if MIN_VERSION_template_haskell(2, 15, 0)

should be:

> #if MIN_VERSION_template_haskell(2, 16, 0)

However, we're using GHC 8.4.x in production and the core GHC patch has been applied ONLY to 8.10.x, right? What would it take to back-port the core GHC patch?

It shouldn't be too hard. You have to backport some parts of the following patches:

@snoyberg
Copy link
Owner

@snoyberg snoyberg commented May 12, 2020

Is someone interested in submitting a PR for the changes here?

hsyl20 added a commit to hsyl20/file-embed that referenced this issue May 12, 2020
@hsyl20
Copy link
Contributor

@hsyl20 hsyl20 commented May 12, 2020

Is someone interested in submitting a PR for the changes here?

I'll do it

hsyl20 added a commit to hsyl20/file-embed that referenced this issue May 12, 2020
hsyl20 added a commit to hsyl20/file-embed that referenced this issue May 12, 2020
@hsyl20
Copy link
Contributor

@hsyl20 hsyl20 commented May 12, 2020

Done in #36

snoyberg added a commit that referenced this issue May 12, 2020
Use TH 2.16 Bytes literal (#24)
@qrilka
Copy link

@qrilka qrilka commented Aug 7, 2020

Didn't #36 resolve this issue?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Linked pull requests

Successfully merging a pull request may close this issue.

None yet
8 participants