The Haskell performance checklist
You have a Haskell program that's not performing how you'd like. Use this list to check that you've done the usual steps to performance nirvana:
Are you compiling it?
Running code in GHCi's interpreter will always be much slower than compiling it to a binary.
Make sure you're compiling your code with ghc rather than running it in the interpreter.
Are you compiling with -Wall?
GHC warns about type defaults and missing type signatures:
- If you let GHC default integers, it will choose Integer. Integer is arbitrary-precision and can be around 10x slower than Int, so make sure you explicitly choose your types.
- You should have explicit type signatures so that you don't miss something obvious in the types that is slow.
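As a minimal sketch of the defaulting problem (the names defaultedTotal and explicitTotal are made up for illustration), the first definition below leaves its numeric type open, so GHC defaults it to Integer and -Wall should report that through -Wtype-defaults. The second pins the type to Int.

```haskell
module Main (main) where

-- No annotation on the numbers: the element type is ambiguous, so GHC
-- defaults it to Integer. With -Wall this triggers a type-defaults warning.
defaultedTotal :: String
defaultedTotal = show (sum [1 .. 1000])

-- Explicit machine-sized Int: nothing is defaulted, nothing is warned about.
explicitTotal :: Int
explicitTotal = sum [1 .. 1000]

main :: IO ()
main = do
  putStrLn defaultedTotal
  print explicitTotal
```

Both lines print the same number; the difference is only in which numeric type does the work.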
Are you compiling with -O or above?
By default GHC does not optimize your programs. Cabal and Stack enable this in the build process. If you're calling ghc directly, don't forget to add -O2 for serious, non-dangerous optimizations.
Have you run your code with the profiler?
Profiling is the standard way to find out, for each expression in your program:
- How many times does it run?
- How much does it allocate?
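If you want named entries in the profile for particular expressions, one option is a manual cost centre. The sketch below is only illustrative: sumOfSquares and countEvens are hypothetical workloads. It assumes you build with ghc's -prof flag (optionally -fprof-auto) and run the binary with +RTS -p to get a .prof report. Without -prof the SCC pragmas are simply ignored.

```haskell
module Main (main) where

import Data.List (foldl')

-- Hypothetical workload, annotated so it shows up under its own name
-- in the profiling report.
sumOfSquares :: Int -> Int
sumOfSquares n =
  {-# SCC "sumOfSquares" #-} (foldl' (\acc x -> acc + x * x) 0 [1 .. n])

-- A second hypothetical workload with its own cost centre.
countEvens :: Int -> Int
countEvens n =
  {-# SCC "countEvens" #-} (length (filter even [1 .. n]))

main :: IO ()
main = do
  print (sumOfSquares 1000)
  print (countEvens 1000)
```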
Resources on profiling:
Did you try weighing your operations?
Check that your operations aren't allocating more than you'd expect.
Allocation in a garbage-collected heap is often claimed to be "fast", but not allocating is always faster.
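For a rough idea of what "weighing" an operation can look like, here is a sketch that uses only base: it reads the per-thread allocation counter from System.Mem before and after an action. The helper allocatedBy and the operation reversedLength are made-up names, and this is not a substitute for a dedicated tool; treat the number as approximate.

```haskell
module Main (main) where

import Control.Exception (evaluate)
import Data.Int (Int64)
import System.Mem (getAllocationCounter)

-- | Approximate bytes allocated by the current thread while running an action.
-- The counter counts down as the thread allocates, so before - after is the
-- amount allocated in between.
allocatedBy :: IO a -> IO Int64
allocatedBy action = do
  before <- getAllocationCounter
  _ <- action
  after <- getAllocationCounter
  pure (before - after)

-- Hypothetical operation under test: builds a reversed list and counts it.
reversedLength :: Int -> Int
reversedLength n = length (reverse [1 .. n])

main :: IO ()
main = do
  bytes <- allocatedBy (evaluate (reversedLength 100000))
  putStrLn ("allocated roughly " ++ show bytes ++ " bytes")
```

The action is forced with evaluate so that the work actually happens between the two counter reads instead of being deferred by laziness.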
Have you checked for stack space leaks?
Most space leaks result in an excess use of stack. If you look for the part of the program that results in the largest stack usage, that is the most likely space leak, and the one that should be investigated first.
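A classic example of this kind of leak is a lazy left fold, sketched below with the made-up names leakySum and strictSum. The lazy version can build a long chain of suspended additions that needs stack to evaluate at the end. One way to surface such leaks, which is an assumption here and not something spelled out above, is to link with -rtsopts and run with a small stack limit such as +RTS -K1K so the leaky path fails early.

```haskell
module Main (main) where

import Data.List (foldl')

-- Lazy left fold: without optimization this builds a chain of (+) thunks
-- as long as the list, and evaluating that chain consumes stack.
-- (With -O, GHC's strictness analysis often repairs this particular case.)
leakySum :: [Int] -> Int
leakySum = foldl (+) 0

-- Strict left fold: the accumulator is forced at every step, so the
-- stack usage stays constant.
strictSum :: [Int] -> Int
strictSum = foldl' (+) 0

main :: IO ()
main = do
  print (leakySum [1 .. 100000])
  print (strictSum [1 .. 100000])
```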
Resource on stack space leak:
Have you set up an isolated benchmark?
Benchmarking is a tricky business to get right, especially when timing things at a smaller scale. Haskell is lucky to have a very good benchmarking package. If you are asking someone for help, you are helping them by providing benchmarks, and they are likely to ask for them.
Do it right and use Criterion.
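A minimal Criterion benchmark might look like the sketch below. It assumes the criterion package is available to your build, and fib is just a stand-in for whatever function you actually want to measure.

```haskell
module Main (main) where

import Criterion.Main (bench, bgroup, defaultMain, whnf)

-- Stand-in workload: deliberately naive Fibonacci.
fib :: Int -> Integer
fib n = if n < 2 then toInteger n else fib (n - 1) + fib (n - 2)

main :: IO ()
main =
  defaultMain
    [ bgroup
        "fib"
        [ -- whnf applies fib to its argument on every run and forces the
          -- result to weak head normal form.
          bench "10" (whnf fib 10)
        , bench "20" (whnf fib 20)
        ]
    ]
```

Passing the function and its argument separately to whnf matters: it stops GHC from computing the result once and sharing it across iterations.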
Resources on Criterion:
Have you looked at strictness of your function arguments?
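Accumulating loops are the usual place where argument strictness matters: a lazy accumulator can pile up unevaluated thunks. As a sketch, assuming the BangPatterns extension and a made-up mean function, forcing the accumulators at each step looks like this:

```haskell
{-# LANGUAGE BangPatterns #-}
module Main (main) where

-- Mean of a list in a single pass. The bang patterns force both
-- accumulators on every iteration, so no chain of thunks builds up.
mean :: [Double] -> Double
mean = go 0 0
  where
    go :: Double -> Int -> [Double] -> Double
    go !total !count [] =
      if count == 0 then 0 else total / fromIntegral count
    go !total !count (x : xs) =
      go (total + x) (count + 1) xs

main :: IO ()
main = print (mean [1 .. 100])
```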
Are you using the right data structure?
This GitHub organization provides comparative benchmarks against a few types of data structures. You can often use this to determine which data structure is best for your problem:
- sets - for set-like things
- dictionaries - dictionaries, hashmaps, maps, etc.
- sequences - lists, vectors/arrays, sequences, etc.
Tip: Lists are almost always the wrong data structure. But sometimes they are the right one.
See also HaskellWiki on data structures.
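To make the tip about lists concrete, here is a small sketch comparing a lookup in an association list with a lookup in Data.Map.Strict from the containers package. The data set and function names are hypothetical; the point is only that the list lookup walks the list while the map lookup is logarithmic.

```haskell
module Main (main) where

import qualified Data.Map.Strict as Map

-- Hypothetical data set: numbers paired with their squares.
pairs :: [(Int, Int)]
pairs = [(k, k * k) | k <- [1 .. 1000]]

-- Association list: lookup walks the list, so it is O(n) per query.
lookupInList :: Int -> Maybe Int
lookupInList k = lookup k pairs

-- Balanced tree map: O(log n) per query once the map is built.
squares :: Map.Map Int Int
squares = Map.fromList pairs

lookupInMap :: Int -> Maybe Int
lookupInMap k = Map.lookup k squares

main :: IO ()
main = do
  print (lookupInList 999)
  print (lookupInMap 999)
```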
Are your data types strict and/or unpacked?
By default, the fields of Haskell data types are lazy and boxed. Making them strict can often (not always) give them more predictable performance, and unpacked strict fields (such as an Int) do not require pointer indirection.
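As an illustration, the two hypothetical point types below differ only in their field annotations. Note that GHC only honours UNPACK when optimization is on, so this is a sketch of the declaration style, not a guarantee about the resulting layout in every build.

```haskell
module Main (main) where

-- Lazy, boxed fields: each field is a pointer to a possibly unevaluated Double.
data LazyPoint = LazyPoint Double Double

-- Strict, unpacked fields: with optimization on, the two Doubles are stored
-- directly in the constructor, with no pointer to follow.
data StrictPoint = StrictPoint {-# UNPACK #-} !Double {-# UNPACK #-} !Double

lazyNormSq :: LazyPoint -> Double
lazyNormSq (LazyPoint x y) = x * x + y * y

strictNormSq :: StrictPoint -> Double
strictNormSq (StrictPoint x y) = x * x + y * y

main :: IO ()
main = do
  print (lazyNormSq (LazyPoint 3 4))
  print (strictNormSq (StrictPoint 3 4))
```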
Resources on data type strictness:
Did you check your code isn't too polymorphic?
Code which is type-class-polymorphic, such as
genericLength :: Num n => [a] -> n
has to accept an additional dictionary argument for which class instance you want to use for Num. That can make things slower.
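Two common ways to avoid paying for the dictionary are to ask GHC for a specialised copy with a SPECIALIZE pragma, or to write a monomorphic function in the first place. The sketch below shows both; sumSquares and lengthInt are made-up names for illustration.

```haskell
module Main (main) where

import Data.List (foldl', genericLength)

-- Overloaded: callers pass a Num dictionary unless GHC specialises the call.
sumSquares :: Num a => [a] -> a
sumSquares = foldl' (\acc x -> acc + x * x) 0
-- Ask GHC to also generate a dictionary-free copy at type Int.
{-# SPECIALIZE sumSquares :: [Int] -> Int #-}

-- Monomorphic alternative to genericLength: no dictionary at all.
lengthInt :: [a] -> Int
lengthInt = length

main :: IO ()
main = do
  print (sumSquares [1 .. 10 :: Int])
  print (genericLength "hello" :: Integer)
  print (lengthInt "hello")
```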
Resources on overloading:
Do you have an explicit export list?
This is a suggestion from the HaskellWiki, but I believe it's based on out of date information about how GHC does inlining. It's left here for interested parties, however.
Have you looked at the Core?
GHC compiles Haskell down to a small intermediate language, Core, which is close to the code that actually gets generated before assembly. This is where many optimization passes take place.
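One way to get at the Core for a single module is to put GHC's dump flags in an OPTIONS_GHC pragma, as in the sketch below. The flags -ddump-simpl, -dsuppress-all and -ddump-to-file are real GHC flags; the sumTo function is just a hypothetical loop worth inspecting. After compiling, look for the dump file that GHC writes next to your build output.

```haskell
{-# OPTIONS_GHC -O2 -ddump-simpl -dsuppress-all -ddump-to-file #-}
module Main (main) where

-- A small accumulating loop. In the dumped Core you would hope to see
-- it compiled to a tight loop over unboxed integers, though the exact
-- output depends on your GHC version.
sumTo :: Int -> Int
sumTo n = go 0 1
  where
    go :: Int -> Int -> Int
    go acc i
      | i > n = acc
      | otherwise = go (acc + i) (i + 1)

main :: IO ()
main = print (sumTo 100)
```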
Resources on core:
Have you considered unboxed arrays/strefs/etc?
An array with boxed elements, such as Data.Vector.Vector a, means each element is a pointer to the value, instead of containing the values directly.
Use an unboxed vector where you can (integers and atomic types like that) to avoid the pointer indirection. Its contents are stored contiguously, so they may be read from the CPU cache rather than main memory.
Likewise, a mutable container like STRef holds a pointer rather than the value. Use URef for an unboxed version.
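The same boxed versus unboxed distinction can be sketched with the array package that ships with GHC, used here in place of vector purely so the example has no extra dependencies (Data.Vector.Unboxed offers the same idea for vectors). The names boxed, unboxed and sumUnboxed are made up for illustration.

```haskell
module Main (main) where

import Data.Array.Unboxed

-- Boxed array: every element is a pointer to a (possibly lazy) Int.
boxed :: Array Int Int
boxed = listArray (0, 9) [1 .. 10]

-- Unboxed array: the Ints are stored directly, side by side.
unboxed :: UArray Int Int
unboxed = listArray (0, 9) [1 .. 10]

-- Sum by index, forcing the accumulator at each step.
sumUnboxed :: UArray Int Int -> Int
sumUnboxed arr = go lo 0
  where
    (lo, hi) = bounds arr
    go :: Int -> Int -> Int
    go i acc
      | i > hi = acc
      | otherwise = go (i + 1) $! acc + arr ! i

main :: IO ()
main = do
  print (sum (elems boxed))
  print (sumUnboxed unboxed)
```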
Are you using Text or ByteString instead of String?
The String type is slow for these reasons:
- It's a linked list, meaning access is linear.
- It's not a packed representation, so each character is a separate structure with a pointer to the next. Traversing it means chasing pointers through main memory.
- It allocates a lot more memory than packed representations.
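As a small sketch of what switching looks like, the example below uses Data.Text with the OverloadedStrings extension so that string literals become Text directly. The greeting value is made up; ByteString follows the same pattern when you are dealing with raw bytes.

```haskell
{-# LANGUAGE OverloadedStrings #-}
module Main (main) where

import qualified Data.Text as T
import qualified Data.Text.IO as TIO

-- With OverloadedStrings this literal is a packed Text, not a [Char].
greeting :: T.Text
greeting = "hello haskell performance"

main :: IO ()
main = do
  TIO.putStrLn (T.toUpper greeting)
  print (T.length greeting)
  print (length (T.words greeting))
```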
Resources on string types: