Out of memory on a big query #82

Open
Raveline opened this issue Dec 23, 2018 · 11 comments
@Raveline
Collaborator

I'm trying to compile a query returning 37 fields, using 14 joins, a single where clause, a group by on 33 fields, and an order by on 4 fields. Sadly, I get "unable to commit 745537536 bytes of memory" when GHC is trying to compile the module containing the query. (I cannot post the query for IP reasons, sorry.)

Do you have any idea what I could do to help the compiler with this?

@echatav
Contributor

echatav commented Dec 24, 2018

Oh no :-( that’s not good. I tried Googling the error message but nothing useful came out. You could try putting the query alone in its own module or somehow giving GHC more memory (swap space?) to work with. Squeal uses type level lists which are quite inefficient when calculating Join and Has and the rest. At runtime all that inefficiency should completely go away but compile time is a different story. If I had an equivalent example I could investigate more thoroughly.

@Raveline
Collaborator Author

Putting the query alone in its own module doesn't seem to make much of a difference. I still have to check whether splitting it into several functions helps (though it probably shouldn't!).
However, a colleague with more experience in GHC suggested I add a pragma on the file containing the query:

{-# OPTIONS_GHC -fno-specialise -fno-full-laziness  #-}

It still consumes 3.5 GB but that's already way more manageable.

@Raveline Raveline assigned Raveline and unassigned Raveline Dec 24, 2018
@adfretlink

Some more information about this, thanks to the remarkable investigative work done by @haitlahcen.
There are two issues at hand:

  • One with Stack and its use of dump-hi files (a non-binary version of GHC's hi), leading to very big files being printed out, with very high memory usage.
  • One with GHC systematically unfolding types, which takes a lot of memory for the types we use in Squeal.

The current workaround, rather than -fno-specialise and -fno-full-laziness, is to use -fomit-interface-pragmas, but @haitlahcen is doing his best to solve the issues in both Stack and GHC. See his issue here for more information: https://ghc.haskell.org/trac/ghc/ticket/8095#comment:58.

For current Squeal users with problematic compilation time and memory usage, -fomit-interface-pragmas is probably the best current solution.
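
For anyone unsure where the flag goes, here is a minimal sketch of setting it per module, assuming the heavy query lives in a module of its own. The module body is a placeholder, not real Squeal code. As far as the GHC documentation describes it, the flag keeps unfoldings and other inessential information out of the generated interface file, so importing modules see only the exported types.

```haskell
-- Applies only to this module; other modules keep their normal settings.
{-# OPTIONS_GHC -fomit-interface-pragmas #-}
module Main where

-- Placeholder standing in for the module that holds the expensive query.
main :: IO ()
main = putStrLn "module compiled with -fomit-interface-pragmas"
```

The same flag can also be passed project-wide through the build tool's GHC options, at the cost of losing cross-module inlining everywhere.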

@echatav
Contributor

echatav commented Jan 25, 2019

Wow! Thanks so much @adfretlink and @haitlahcen ! This is great. Sorry Squeal stresses GHC out so much.

@haitlahcen

Hey! I've opened an issue for stack as well

@ilyakooo0
Contributor

ilyakooo0 commented Oct 16, 2019

Manually unrolling recursive type families should radically improve compile time and memory usage.
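
To make the idea concrete, here is a small sketch of what unrolling a recursive type family can look like. Length and LengthUnrolled are toy families written for this illustration, not Squeal's own, and whether the unrolled form actually saves time or memory in Squeal would need measuring. The naive version takes one reduction step per list element; the unrolled one consumes four elements per step.

```haskell
{-# LANGUAGE DataKinds #-}
{-# LANGUAGE PolyKinds #-}
{-# LANGUAGE TypeFamilies #-}
{-# LANGUAGE TypeOperators #-}
{-# LANGUAGE UndecidableInstances #-}
module Main where

import Data.Proxy (Proxy (..))
import GHC.TypeLits (Nat, natVal, type (+))

-- Naive recursion: one reduction step per element of the list.
type family Length (xs :: [k]) :: Nat where
  Length '[] = 0
  Length (x ': xs) = 1 + Length xs

-- Manually unrolled: short lists are answered directly, longer lists
-- are consumed four elements at a time.
type family LengthUnrolled (xs :: [k]) :: Nat where
  LengthUnrolled '[] = 0
  LengthUnrolled '[a] = 1
  LengthUnrolled '[a, b] = 2
  LengthUnrolled '[a, b, c] = 3
  LengthUnrolled (a ': b ': c ': d ': xs) = 4 + LengthUnrolled xs

-- Hypothetical column list, only here to exercise both families.
type Columns = '[Int, Bool, Char, (), Double, Integer]

main :: IO ()
main = do
  -- Both families agree on the result; only the number of steps differs.
  print (natVal (Proxy :: Proxy (Length Columns)))
  print (natVal (Proxy :: Proxy (LengthUnrolled Columns)))
```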

Might open a PR today.

@adfretlink

Small update on this topic: we've just squashed our migrations, redefining our Schema as if it were the initial one. We had around 30 migrations on top of it. Compilation time for the project went from 40 minutes to 7! So there's at least a lead as to the "main culprit" of compilation cost.

@echatav
Contributor

echatav commented Nov 14, 2019

Pretty interesting. I wonder what would happen with aggressive use of partial type signatures. If all intermediate schemas are wild-carded _, and only the initial and final schemas are explicitly typed, I wonder if that would help both from a compilation efficiency perspective and a code cleanliness perspective...
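
A rough sketch of what that could look like, using GHC's PartialTypeSignatures extension. Step, createTable and andThen are toy stand-ins invented for this example (schemas are just lists of table names) and are not Squeal's API. Note that wildcards are accepted in value signatures but not in type synonym declarations, so the intermediate schema is left out of the signatures rather than out of a `type` definition.

```haskell
{-# LANGUAGE DataKinds #-}
{-# LANGUAGE KindSignatures #-}
{-# LANGUAGE PartialTypeSignatures #-}
{-# LANGUAGE TypeOperators #-}
{-# OPTIONS_GHC -Wno-partial-type-signatures #-}
module Main where

import Data.Proxy (Proxy (..))
import GHC.TypeLits (KnownSymbol, Symbol, symbolVal)

-- Toy stand-in for a schema-indexed definition (hypothetical, not Squeal).
newtype Step (from :: [Symbol]) (to :: [Symbol]) = Step [String]

-- Adding a table extends whatever schema came before it.
createTable :: KnownSymbol name => Proxy name -> Step db (name ': db)
createTable p = Step ["CREATE TABLE " ++ symbolVal p ++ " ()"]

-- Steps compose only when the schemas line up.
andThen :: Step a b -> Step b c -> Step a c
andThen (Step xs) (Step ys) = Step (xs ++ ys)

-- The intermediate schema is a wildcard; GHC infers it.
addUsers :: Step '[] _
addUsers = createTable (Proxy :: Proxy "users")

addEmails :: Step _ '["emails", "users"]
addEmails = createTable (Proxy :: Proxy "emails")

-- Only the initial and final schemas are written out in full.
migration :: Step '[] '["emails", "users"]
migration = addUsers `andThen` addEmails

statements :: Step a b -> [String]
statements (Step xs) = xs

main :: IO ()
main = mapM_ putStrLn (statements migration)
```

This only shows that the types can be inferred; it says nothing about whether compile time improves.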

@adfretlink

How would we do this?

Something like:

type Base = -- some schema

type AddATable = Create "myTable" ('Table SomeTable) _

type FinalMig = Alter "myTable" ('Table SomeTableV2) AddATable

But how would GHC ultimately be able to work out the order of the migrations?

@echatav
Contributor

echatav commented Nov 22, 2019

The way I do it in my projects is I have a directory structure like

Schema.hs
Schema/V0.hs
Schema/V1.hs
Schema/V2.hs
..

where each V{n}.hs has a SchemasType called DB (or Schemas) and for n > 0

setup :: Definition V{n-1}.DB DB
teardown :: Definition DB V{n-1}.DB
migration :: Migration Definition V{n-1}.DB DB

and Schema.hs has a

migrations :: AlignedList (Migration Definition) V0.DB V{max}.DB
migrations = V1.migration :>> .. :>> V{max}.migration :>> Done

and re-exports V{max}.DB.
And every other module imports the DB from Schema.hs.
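
For readers who haven't seen the pattern, here is a self-contained sketch of how such a chain type-checks. The AlignedList below is a local toy definition with the same shape as the constructors used above, the Version kind and the Migration newtype are hypothetical stand-ins for the per-version DB types, and everything sits in one module for brevity instead of Schema/V{n}.hs files.

```haskell
{-# LANGUAGE DataKinds #-}
{-# LANGUAGE GADTs #-}
{-# LANGUAGE KindSignatures #-}
{-# LANGUAGE PolyKinds #-}
module Main where

import Data.Kind (Type)

-- Hypothetical stand-ins for V0.DB, V1.DB and V2.DB.
data Version = V0 | V1 | V2

-- A list whose neighbouring elements must agree on their middle index.
data AlignedList (p :: k -> k -> Type) (x0 :: k) (x1 :: k) where
  Done :: AlignedList p x x
  (:>>) :: p x0 x1 -> AlignedList p x1 x2 -> AlignedList p x0 x2
infixr 7 :>>

-- Toy migration that only carries a name.
newtype Migration (from :: Version) (to :: Version) = Migration String

v1Migration :: Migration 'V0 'V1
v1Migration = Migration "V1"

v2Migration :: Migration 'V1 'V2
v2Migration = Migration "V2"

-- Swapping the two steps would be rejected by the type checker.
migrations :: AlignedList Migration 'V0 'V2
migrations = v1Migration :>> v2Migration :>> Done

-- Walk the chain in order and collect the migration names.
names :: AlignedList Migration a b -> [String]
names Done = []
names (Migration n :>> rest) = n : names rest

main :: IO ()
main = mapM_ putStrLn (names migrations)
```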

Now, we shouldn't need to define any of the intermediate DBs between V0.DB and V{max}.DB because they should all be inferable and nowhere else referenced. I don't know if that would speed up or slow down or have no effect on compilation time, but it would cut down on some redundancy. I haven't settled on best practice for migrations over time yet. I read this review of Beam's migration system which was pretty negative. Some of the critiques might apply to Squeal as well. I'm a little worried that migrations in Squeal are redundant and cause compilation time issues.

gasi added a commit to zoomhub/zoomhub that referenced this issue Feb 28, 2021
Attempt to work around OOM.

See: morphismtech/squeal#82
@gasi
Contributor

gasi commented Feb 28, 2021

@adfretlink Thank you for documenting the workaround using {-# OPTIONS_GHC -fomit-interface-pragmas #-}. I had to combine it with globally disabling optimizations via stack build --ghc-options='-O0' to get the build to pass on the CircleCI free tier (4 GB of RAM) without running out of memory.

In case anyone needs a repro, here’s a PR on my open source project that exhibits this problem:
zoomhub/zoomhub#158

@echatav Thanks for documenting how you organize your schema migrations. I ended up doing something similar on my own but it’s nice to see it being validated: https://github.com/zoomhub/zoomhub/tree/69f420ee9f2d6b88392cfa2657948e1c2c74db30/src/ZoomHub/Storage/PostgreSQL/Schema

@gasi gasi mentioned this issue Mar 11, 2021