NorfairKing/haskell-dangerous-functions

Haskell's Dangerous Functions

What does dangerous mean?

Dangerous could mean any of these:

  • Partial: can throw exceptions in pure code
  • Unsafe: can cause segfaults
  • Has unexpected performance characteristics
  • Doesn't do what you want
  • Doesn't do what you think it does

How to forbid these dangerous functions in your codebase

  1. Copy the hlint.yaml file in this repository to .hlint.yaml within your repository

    cat /path/to/haskell-dangerous-functions/hlint.yaml >> /path/to/your/project/.hlint.yaml
    
  2. Run hlint on your code. Make sure to require new changes to be hlint-clean. You can use hlint --default to generate a settings file ignoring all the hints currently outstanding. You can use pre-commit hooks to forbid committing non-hlint-clean changes.

  3. Whenever you want to make an exception, and use a forbidden function anyway, use the ignore key to add an exception to the .hlint.yaml file.

FAQ

  • It seems pretty silly that these functions still exist, and that their dangers are not well documented.

    I know! See the relevant discussion on the GHC issue tracker.

  • Surely everyone knows about these?

    Maybe, but I certainly didn't, until they caused real production issues.

Contributing

WANTED: Evidence of the danger in these functions. If you can showcase a public incident with real-world consequences that happened because of one of these functions, I would love to refer to it in this document!

If you know about another dangerous function that should be avoided, feel free to submit a PR! Please include:

  • an hlint config to forbid the function in hlint.yaml.
  • a section in this document with:
    • Why the function is dangerous
    • A reproducible way of showing that it is dangerous.
    • An alternative to the dangerous function

It might be that the function you have in mind is not dangerous but still weird. In that case you can add it to the Haskell WAT list.

Overview of the dangerous functions

forkIO

TL;DR: Using forkIO is VERY hard to get right; use the async library instead.

The main issue is that when threads spawned using forkIO throw an exception, this exception is not rethrown in the thread that spawned that thread.

As an example, suppose we forkIO a server and something goes wrong. The main thread will not notice that anything went wrong. The only indication that an exception was thrown will be that something is printed on stderr.

$ cat test.hs
#!/usr/bin/env stack
-- stack --resolver lts-15.15 script

{-# LANGUAGE NumericUnderscores #-}

import Control.Concurrent

main :: IO ()
main = do
  putStrLn "Starting our 'server'."
  forkIO $ do
    putStrLn "Serving..."
    threadDelay 1_000_000
    putStrLn "Oh no, about to crash!"
    threadDelay 1_000_000
    putStrLn "Aaaargh"
    undefined
  threadDelay 5_000_000
  putStrLn "Still running, even though we crashed"
  threadDelay 5_000_000
  putStrLn "Ok that's enough of that, stopping here."

Which outputs:

$ ./test.hs
Starting our 'server'.
Serving...
Oh no, about to crash!
Aaaargh
test.hs: Prelude.undefined
CallStack (from HasCallStack):
  error, called at libraries/base/GHC/Err.hs:80:14 in base:GHC.Err
  undefined, called at /home/syd/test/test.hs:17:5 in main:Main
Still running, even though we crashed
Ok that's enough of that, stopping here.

Instead, we can use concurrently_ from the async package:

$ cat test.hs
#!/usr/bin/env stack
-- stack --resolver lts-15.15 script

{-# LANGUAGE NumericUnderscores #-}

import Control.Concurrent
import Control.Concurrent.Async

main :: IO ()
main = do
  putStrLn "Starting our 'server'."
  let runServer = do
        putStrLn "Serving..."
        threadDelay 1_000_000
        putStrLn "Oh no, about to crash!"
        threadDelay 1_000_000
        putStrLn "Aaaargh"
        undefined
  let mainThread = do
        threadDelay 5_000_000
        putStrLn "Still running, even though we crashed"
        threadDelay 5_000_000
        putStrLn "Ok that's enough of that, stopping here."
  concurrently_ runServer mainThread

to output:

$ ./test.hs
Starting our 'server'.
Serving...
Oh no, about to crash!
Aaaargh
test.hs: Prelude.undefined
CallStack (from HasCallStack):
  error, called at libraries/base/GHC/Err.hs:80:14 in base:GHC.Err
  undefined, called at /home/syd/test.hs:18:9 in main:Main

forkProcess

Mostly impossible to get right. You probably want to be using the async library instead.

If you think "I know what I'm doing" then you're probably still wrong. Rethink what you're doing entirely.

See also https://www.reddit.com/r/haskell/comments/jsap9r/how_dangerous_is_forkprocess/

Partial functions

head

Throws an exception in pure code when the input is an empty list.

Prelude> head []
*** Exception: Prelude.head: empty list

Use listToMaybe instead.

Applies to Data.Text.head as well
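
A minimal sketch of the listToMaybe replacement, using only base. The names safeHead and headOr are hypothetical helpers chosen for illustration; they are not part of this repository or of base.

```haskell
import Data.Maybe (listToMaybe)

-- | Total replacement for 'head': Nothing on an empty list.
-- 'safeHead' is a hypothetical name chosen for this sketch.
safeHead :: [a] -> Maybe a
safeHead = listToMaybe

-- | The same idea with an explicit case-match and a caller-supplied default.
-- 'headOr' is also a hypothetical name.
headOr :: a -> [a] -> a
headOr def xs = case xs of
  [] -> def
  (x : _) -> x
```

Either way the empty case becomes visible in the type or at the call site instead of surfacing as a runtime exception.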

tail

Throws an exception in pure code when the input is an empty list.

Prelude> tail []
*** Exception: Prelude.tail: empty list

Use drop 1 or a case-match instead.

Applies to Data.Text.tail as well

init

Throws an exception in pure code when the input is an empty list.

Prelude> init []
*** Exception: Prelude.init: empty list

Use a case-match on the reverse of the list instead, but keep in mind that it uses linear time in the length of the list. Use a different data structure if that is an issue for you. Since base-4.19 you can also employ unsnoc.

Applies to Data.Text.init as well

last

Throws an exception in pure code when the input is an empty list.

Prelude> last []
*** Exception: Prelude.last: empty list

Use listToMaybe . reverse instead, but keep in mind that it uses linear time in the length of the list. Use a different data structure if that is an issue for you. Since base-4.19 you can also employ unsnoc.

Applies to Data.Text.last as well
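
For versions of base older than 4.19, a total replacement for both init and last can be sketched as follows. The names unsnocMay, initMay and lastMay are hypothetical, and this version is strict in the whole list, so it still takes linear time and does not work on infinite lists.

```haskell
-- | Split a list into everything but the last element, and the last element.
-- Returns Nothing for the empty list. Hypothetical helper for this sketch.
unsnocMay :: [a] -> Maybe ([a], a)
unsnocMay = foldr step Nothing
  where
    -- The rightmost element is seen first and becomes the "last" element.
    step x Nothing = Just ([], x)
    -- Every other element is prepended to the initial segment.
    step x (Just (ys, y)) = Just (x : ys, y)

-- | Total replacement for 'init'.
initMay :: [a] -> Maybe [a]
initMay = fmap fst . unsnocMay

-- | Total replacement for 'last'.
lastMay :: [a] -> Maybe a
lastMay = fmap snd . unsnocMay
```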

!!

Throws an exception in pure code when the index is out of bounds.

Prelude> [1, 2, 3] !! 3
*** Exception: Prelude.!!: index too large

It also allows negative indices, for which it also throws.

Prelude> [1,2,3] !! (-1)
*** Exception: Prelude.!!: negative index

The right way to index is to not use a list, because list indexing takes O(n) time, even if you find a safe way to do it. If you really need to deal with list indexing (you don't), then you can use a combination of take and drop or (since base-4.19) '!?'.
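
If list indexing really cannot be avoided, a total version can be sketched with drop and listToMaybe (a small variation on the take/drop suggestion). The name atMay is hypothetical, and the lookup is still O(n) in the index.

```haskell
import Data.Maybe (listToMaybe)

-- | Total list indexing: Nothing for negative or out-of-range indices.
-- 'atMay' is a hypothetical name for this sketch.
atMay :: [a] -> Int -> Maybe a
atMay xs i
  | i < 0 = Nothing
  | otherwise = listToMaybe (drop i xs)
```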

fromJust

Throws an exception in pure code when the input is Nothing.

Prelude Data.Maybe> fromJust Nothing
*** Exception: Maybe.fromJust: Nothing
CallStack (from HasCallStack):
  error, called at libraries/base/Data/Maybe.hs:148:21 in base:Data.Maybe
  fromJust, called at <interactive>:11:1 in interactive:Ghci1

Use a case-match instead.
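
As an illustration of the case-match, here is a small sketch in which the Nothing branch is handled explicitly. The function describePort and its messages are made up for this example.

```haskell
-- | Describe an optional port number without ever calling 'fromJust'.
-- 'describePort' is a hypothetical example function.
describePort :: Maybe Int -> String
describePort mPort = case mPort of
  Nothing -> "no port configured"
  Just port -> "port " ++ show port
```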

read

There are multiple reasons not to use read. The most obvious one is that it is partial. It throws an exception in pure code whenever the input cannot be parsed (and doesn't even give a helpful parse error):

Prelude> read "a" :: Int
*** Exception: Prelude.read: no parse

You can use readMaybe to get around this issue, HOWEVER:

The second reason not to use read is that it operates on String.

read :: Read a => String -> a

If you are doing any parsing, you should be using a more appropriate data type to parse (Text or ByteString).

The third reason is that read comes from the Read type class, which has no well-defined semantics. In an ideal case, read and show would be inverses but this is just not the reality. See UTCTime as an example.
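
For completeness, this is roughly what the readMaybe workaround looks like. It only addresses the partiality; it still operates on String and still relies on the Read class, so the second and third objections stand. The function parsePort and its error message are hypothetical.

```haskell
import Text.Read (readMaybe)

-- | Parse a port number, reporting failure as a value instead of throwing.
-- 'parsePort' is a hypothetical example function.
parsePort :: String -> Either String Int
parsePort s = case readMaybe s of
  Nothing -> Left ("not a number: " ++ show s)
  Just n -> Right n
```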

toEnum

The toEnum :: Enum a => Int -> a function is partial whenever the Enumerable type a is smaller than Int:

Prelude> toEnum 5 :: Bool
*** Exception: Prelude.Enum.Bool.toEnum: bad argument
Prelude Data.Word> toEnum 300 :: Word8
*** Exception: Enum.toEnum{Word8}: tag (300) is outside of bounds (0,255)
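
One way to get a total variant is to check the bounds first. This is only a sketch: the name toEnumMay is hypothetical, and it assumes that fromEnum of both minBound and maxBound fits in an Int, which holds for Bool, Ordering or Word8 but not for types like Word.

```haskell
{-# LANGUAGE ScopedTypeVariables #-}

-- | Total variant of 'toEnum' for bounded types whose bounds fit in an Int.
-- 'toEnumMay' is a hypothetical name for this sketch.
toEnumMay :: forall a. (Bounded a, Enum a) => Int -> Maybe a
toEnumMay i
  | i < fromEnum (minBound :: a) = Nothing
  | i > fromEnum (maxBound :: a) = Nothing
  | otherwise = Just (toEnum i)
```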

succ and pred

These are partial, on purpose. According to the docs:

The calls succ maxBound and pred minBound should result in a runtime error.

Prelude Data.Word> succ 255 :: Word8
*** Exception: Enum.succ{Word8}: tried to take `succ' of maxBound
Prelude Data.Word> pred 0 :: Word8
*** Exception: Enum.pred{Word8}: tried to take `pred' of minBound

Use something like succMay (https://hackage.haskell.org/package/safe-0.3.19/docs/Safe.html#v:succMay).

Functions involving division

Prelude> quot 1 0
*** Exception: divide by zero
Prelude> minBound `quot` (-1) :: Int
*** Exception: arithmetic overflow
Prelude> div 1 0
*** Exception: divide by zero
Prelude> minBound `div` (-1) :: Int
*** Exception: arithmetic overflow
Prelude> rem 1 0
*** Exception: divide by zero
Prelude> mod 1 0
*** Exception: divide by zero
Prelude> divMod 1 0
*** Exception: divide by zero
Prelude> quotRem 1 0
*** Exception: divide by zero

Whenever you consider using division, really ask yourself whether you need division. For example, you can (almost always) replace a `div` 2 <= b by a <= 2 * b. (If you're worried about overflow, then use a bigger type.)

If your use-case has a fixed (non-0) literal denominator, like a `div` 2, and you have already considered using something other than division, then your case constitutes an acceptable exception.

Note that integer division may not be what you want in the first place anyway:

Prelude> 5 `div` 2
2 -- Not 2.5

See also https://github.com/NorfairKing/haskell-WAT#num-int
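
When the denominator is not a fixed literal, both failure modes shown above (division by zero and overflow on minBound divided by -1) can be ruled out up front. A minimal sketch for Int, with the hypothetical name safeDiv:

```haskell
-- | Total integer division on Int.
-- Returns Nothing for a zero denominator and for the one overflowing case.
-- 'safeDiv' is a hypothetical name for this sketch.
safeDiv :: Int -> Int -> Maybe Int
safeDiv _ 0 = Nothing
safeDiv a b
  | a == minBound && b == (-1) = Nothing
  | otherwise = Just (a `div` b)
```

The result is still rounded towards negative infinity, so the caveat about integer division not being what you want applies unchanged.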

minimum and maximum

These functions throw an exception in pure code whenever the input is empty:

Prelude> minimum []
*** Exception: Prelude.minimum: empty list
Prelude> maximum []
*** Exception: Prelude.maximum: empty list
Prelude> minimum Nothing
*** Exception: minimum: empty structure
Prelude> minimum (Left "wut")
*** Exception: minimum: empty structure
Prelude Data.Functor.Const> minimum (Const 5 :: Const Int ())
*** Exception: minimum: empty structure

The same goes for minimumBy and maximumBy.

You can use minimumMay from the safe package (or a case-match on the sort-ed version of your list, if you don't want an extra dependency).

Applies to Data.Text.maximum and Data.Text.minimum as well
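
If you would rather not depend on the safe package, the same behaviour can be sketched in a few lines of base. These definitions are illustrative stand-ins, not the safe package's implementation; they check for emptiness before calling the partial function.

```haskell
-- | Total variant of 'minimum' for any Foldable container.
minimumMay :: (Foldable t, Ord a) => t a -> Maybe a
minimumMay xs
  | null xs = Nothing
  | otherwise = Just (minimum xs)

-- | Total variant of 'maximum' for any Foldable container.
maximumMay :: (Foldable t, Ord a) => t a -> Maybe a
maximumMay xs
  | null xs = Nothing
  | otherwise = Just (maximum xs)
```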

Data.Text.Encoding.decodeUtf8

Throws on invalid UTF-8 data. Use Data.Text.Encoding.decodeUtf8' instead.

Functions that throw exceptions in pure code on purpose

throw

Purposely throws an exception in pure code.

Prelude Control.Exception> throw $ ErrorCall "here be a problem"
*** Exception: here be a problem

Don't throw from pure code, use throwIO instead.
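
A sketch of what that separation can look like: the pure function reports failure as a value, and only the IO wrapper turns it into an exception with throwIO. The functions validateAge and validateAgeIO are hypothetical examples.

```haskell
import Control.Exception (ErrorCall (..), throwIO)

-- | Pure validation reports failure as a value, not as an exception.
-- 'validateAge' is a hypothetical example function.
validateAge :: Int -> Either String Int
validateAge n
  | n < 0 = Left ("negative age: " ++ show n)
  | otherwise = Right n

-- | Only the IO layer turns the failure into an exception, via 'throwIO'.
validateAgeIO :: Int -> IO Int
validateAgeIO n = case validateAge n of
  Left err -> throwIO (ErrorCall err)
  Right age -> pure age
```

Because throwIO is sequenced like any other IO action, the exception is raised at a predictable point rather than whenever a lazy value happens to be forced.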

undefined

Purposely fails, with a particularly unhelpful error message.

Prelude> undefined
*** Exception: Prelude.undefined
CallStack (from HasCallStack):
  error, called at libraries/base/GHC/Err.hs:80:14 in base:GHC.Err
  undefined, called at <interactive>:1:1 in interactive:Ghci1

Deal with errors appropriately instead.

Also see error below.

error

Purposely fails, with an only slightly less unhelpful error message than undefined.

Prelude> error "here be a problem"
*** Exception: here be a problem
CallStack (from HasCallStack):
  error, called at <interactive>:4:1 in interactive:Ghci1

Deal with errors appropriately instead.

If you're really very extra sure that a certain case will never happen, bubble up the error to the IO part of your code and then use throwIO or die.

Functions that do unexpected things

realToFrac

This function goes through Rational:

-- | general coercion to fractional types
realToFrac :: (Real a, Fractional b) => a -> b
realToFrac = fromRational . toRational

Rational does not have all the values that a Real like Double might have, so things will go wrong in ways that you don't expect:

Prelude> realToFrac (0 / 0 :: Double) :: Double
-Infinity

Avoid general coercion functions and anything to do with Double in particular.

See also https://github.com/NorfairKing/haskell-WAT#real-double

%: Rational values

The % function is used to construct rational values:

data Ratio a = !a :% !a  deriving Eq

Prelude Data.Int Data.Ratio> 1 % 12 :: Ratio Int8
1 % 12

There are constraints on the two values in Rational values:

Recall (from the docs): "The numerator and denominator have no common factor and the denominator is positive."

When using fixed-size underlying types, you can end up with invalid Ratio values using Num functions:

Prelude Data.Int Data.Ratio> let r = 1 % 12 :: Ratio Int8
Prelude Data.Int Data.Ratio> r - r
0 % (-1)
Prelude Data.Int Data.Ratio> r + r
3 % (-14)
Prelude Data.Int Data.Ratio> r * r
1 % (-112)

When using arbitrarily-sized underlying types, you can end up with arbitrary runtime:

(1 % 100)^10^10^10 :: Rational -- Prepare a way to kill this before you try it out.

Ratio values create issues for any underlying type, so avoid them. Consider whether you really need any rational values. If you really do, and you have a clear maximum value, consider using fixed-point values. If that does not fit your use-case, consider using Double with all its caveats.
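
As one possible reading of "fixed-point values", base ships Data.Fixed. The sketch below assumes two decimal places are enough for the use-case; the Price alias and total function are hypothetical. Note that Data.Fixed is backed by Integer, so it does not overflow, but division silently truncates to the chosen resolution.

```haskell
import Data.Fixed (Centi)
import Data.List (foldl')

-- | A price with exactly two decimal places, stored as a count of hundredths.
-- 'Price' is a hypothetical alias for this sketch.
type Price = Centi

-- | Add up prices with a strict left fold.
total :: [Price] -> Price
total = foldl' (+) 0
```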

fromIntegral and fromInteger

fromIntegral has no constraints on the size of the output type, so that output type could be smaller than the input type. In such a case, it performs silent truncation:

> fromIntegral (300 :: Word) :: Word8
44

Similarly for fromInteger:

> fromInteger 300 :: Word8
44

fromIntegral has also had some very nasty bugs that involved the function behaving differently (even partially) depending on optimisation levels. See GHC #20066 and GHC #19345.

Avoid general coercion functions but write specific ones instead, as long as the type of the result is bigger than the type of the input.

word32ToWord64 :: Word32 -> Word64
word32ToWord64 = fromIntegral -- Safe because Word64 is bigger than Word32

Prefer to use functions with non-parametric types and/or functions that fail loudly.

I was also pointed to the finitary package but I haven't used it yet.
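
As one example of a narrowing conversion that reports failure instead of truncating, base provides toIntegralSized in Data.Bits. The wrapper name wordToWord8 is hypothetical; it follows the advice above of giving each conversion a specific, non-parametric type.

```haskell
import Data.Bits (toIntegralSized)
import Data.Word (Word8)

-- | Narrowing conversion that returns Nothing when the value does not fit,
-- instead of silently wrapping around.
-- 'wordToWord8' is a hypothetical name for this sketch.
wordToWord8 :: Word -> Maybe Word8
wordToWord8 = toIntegralSized
```

With this, the input 300 from the truncation example yields Nothing rather than 44.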

toEnum

The toEnum function suffers from the following problem on top of being partial.

Some instances of Enum use "the next constructor" as the next element while others use an n+1 variant:

Prelude> toEnum 5 :: Double
5.0
Prelude Data.Fixed> toEnum 5 :: Micro
0.000005

Depending on what you expected, one of those doesn't do what you think it does.

fromEnum

From the docs:

It is implementation-dependent what fromEnum returns when applied to a value that is too large to fit in an Int.

For example, some Integer values that do not fit in an Int will be mapped to 0, while others will be mapped all over the place:

Prelude> fromEnum (2^66 :: Integer) -- To 0
0
Prelude> fromEnum (2^65 :: Integer) -- To 0
0
Prelude> fromEnum (2^64 :: Integer) -- To 0
0
Prelude> fromEnum (2^64 -1 :: Integer) -- To -1 ?!
-1
Prelude> fromEnum (2^63 :: Integer) -- To -2^63 ?!
-9223372036854775808

This is because fromEnum :: Integer -> Int is implemented using integerToInt which treats big integers and small integers differently.

succ and pred

These suffer from the same problem as toEnum (see above) on top of being partial.

Prelude> succ 5 :: Double
6.0
Prelude Data.Fixed> succ 5 :: Micro
5.000001
Prelude> pred 0 :: Word
*** Exception: Enum.pred{Word}: tried to take `pred' of minBound
Prelude Data.Ord Data.Int> succ (127 :: Int8)
*** Exception: Enum.succ{Int8}: tried to take `succ' of maxBound
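
The partiality at the bounds can be avoided by checking against maxBound and minBound first. The definitions below are a sketch in the spirit of succMay from the safe package, not a copy of its source, and they do nothing about the "next constructor" versus n+1 ambiguity.

```haskell
-- | Total variant of 'succ': Nothing at 'maxBound'.
succMay :: (Eq a, Bounded a, Enum a) => a -> Maybe a
succMay x
  | x == maxBound = Nothing
  | otherwise = Just (succ x)

-- | Total variant of 'pred': Nothing at 'minBound'.
predMay :: (Eq a, Bounded a, Enum a) => a -> Maybe a
predMay x
  | x == minBound = Nothing
  | otherwise = Just (pred x)
```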

fromString on ByteString

When converting to ByteString, fromString silently truncates to the bottom eight bits, turning your string into garbage.

> print "⚠"
"\9888"
> print (fromString "⚠" :: ByteString)
"\160"

The enumFromTo-related functions

These also suffer from the same problem as toEnum (see above)

Prelude> succ 5 :: Int
6
Prelude Data.Fixed> succ 5 :: Micro
5.000001

Functions related to String-based IO

Input

  • System.IO.getChar
  • System.IO.getLine
  • System.IO.getContents
  • System.IO.interact
  • System.IO.readIO
  • System.IO.readLn
  • System.IO.readFile

These behave differently depending on env vars, and actually fail on non-text data in files:

Prelude> readFile "example.dat"
*** Exception: Test/A.txt: hGetContents: invalid argument (invalid byte sequence) "\226\8364

See also this blogpost.

Use ByteString-based input and then use Data.Text.Encoding.decodeUtf8' if necessary. (But not Data.Text.Encoding.decodeUtf8, see above.)

Output

  • System.IO.putChar
  • System.IO.putStr
  • System.IO.putStrLn
  • System.IO.print
  • System.IO.writeFile
  • System.IO.appendFile

These behave differently depending on env vars:

$ ghci
Prelude> putStrLn "\973"
ύ

but

$ export LANG=C
$ export LC_ALL=C
$ ghci
Prelude> putStrLn "\973"
?

Use ByteString-based output, on encoded Text values or directly on bytestrings instead.

writeFile caused a real-world outage for @tomjaguarpaw on 2021-09-24.

See also this blogpost.

Functions related to Text-based IO

  • Data.Text.IO.readFile
  • Data.Text.IO.Lazy.readFile

These have the same issues as readFile.

See also this blogpost.

Since text-2.1 one can replace Data.Text.IO with Data.Text.IO.Utf8.

Functions with unexpected performance characteristics

nub

O(n^2). Use ordNub instead.

Trail of destruction: https://gitlab.haskell.org/ghc/ghc/-/issues/8173#note_236901

foldl and foldMap

Lazy. Use foldl' and foldMap' instead.

See this excellent explanation.
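
A short sketch of the strict variants in use. Both functions come from Data.Foldable in base (foldMap' requires base-4.13 or newer); the wrapper names sumStrict and totalStrict are hypothetical.

```haskell
import Data.Foldable (foldMap', foldl')
import Data.Monoid (Sum (..))

-- | Sum with a strict left fold: the accumulator is forced at every step,
-- so no chain of thunks builds up.
sumStrict :: [Int] -> Int
sumStrict = foldl' (+) 0

-- | The same computation expressed with the strict 'foldMap''.
totalStrict :: [Int] -> Int
totalStrict = getSum . foldMap' Sum
```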

sum and product

These use a lazy accumulator, but this is fixed as of GHC 9.0.1.

genericLength

genericLength consumes O(n) stack space when returning a strict numeric type. Lazy numeric types (e.g. data Nat = Z | S Nat) are very rare in practice so genericLength is probably not what you want.

Confusing functions

These functions are a bad idea for no other reason than readability. If there is a bug that involves these functions, it will be really easy to read over them.

unless

unless is defined as follows:

unless p s        =  if p then pure () else s

This is really confusing in practice. Use when with not instead.

either

Either takes two functions as arguments, one for the Left case and one for the Right case. Which comes first? I don't know either, just use a case-match instead.
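
For comparison, here is the case-match version, in which each branch names its constructor. The function describeResult is a made-up example.

```haskell
-- | Render a result, spelling out which branch handles which constructor.
-- 'describeResult' is a hypothetical example function.
describeResult :: Either String Int -> String
describeResult result = case result of
  Left err -> "failed: " ++ err
  Right n -> "got " ++ show n
```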

Modules or packages to avoid

These are debatable, but requiring a good justification for using them is a good default.

Lens

The lens package is full of abstract nonsense and obscure operators. There are good reasons (in exceptional cases) for using it, like in cursor, for example, but it should be avoided in general.

It also has an ENORMOUS dependency footprint.

If you need to use a dependency that uses lenses without the lens dependency, you can use microlens to stick with the (relatively) simple parts of lenses. If you need to use a dependency that uses lens, go ahead and use lens, but stick with view (^.) and set (.~).

Extensions to avoid

These are also debatable and low priority compared to the rest in this document, but still interesting to consider

DeriveAnyClass

Just use

data MyExample
instance MyClass MyExample

instead of

data MyExample
  deriving MyClass

GHC (rather puzzlingly) gives the recommendation to turn on DeriveAnyClass even when that would lead to code that throws an exception in pure code at runtime. As a result, banning this extension brings potentially great upside: preventing a runtime exception, as well as reducing confusion, for the cost of writing a separate line for an instance if you know what you are doing.

See also this great explainer video by Tweag.

TupleSections

This lets you add {-# LANGUAGE TupleSections #-} and potential confusion to write (, v) instead of \a -> (a, v).

Whenever you feel the need to use TupleSections, you probably want to be using a data type instead of tuples.

DuplicateRecordFields

To keep things simple, use prefix-named record fields like this:

data Template = Template { templateName :: Text, templateContents :: Text }

instead of this

{-# LANGUAGE DuplicateRecordFields #-}
data Template = Template { name :: Text, contents :: Text }

It may be more typing but it makes code a lot more readable.

If you are concerned about not being able to auto-derive aeson's ToJSON and FromJSON instances anymore, you shouldn't be. You can still do that using something like aeson-casing. It's also dangerous to have serialisation output depend on the naming of things in your code, so be sure to test your serialisation with both property tests via genvalidity-sydtest-aeson and golden tests via sydtest-aeson.

NamedFieldPuns

Introduces unnecessary syntax.

For this example:

data C = C { a :: Int }

just use this:

f c = foo (a c)

instead of this:

f (C {a}) = foo a

If you're using this, you either know what you're doing - in which case you should know better than to use this - or you don't - in which case you definitely shouldn't use it. Keep your code simple and just use record field selectors instead.

This extension often goes hand in hand with lens usage, which should also be discouraged, see above.

Unsafe functions

unsafePerformIO

Before you use this function, first read its documentation carefully. If you've done so (and I know you haven't, you liar) and still want to use it, read the following section first.

When you use unsafePerformIO, you pinkie-promise that the code in the IO a that you provide is definitely always 100% pure, you swear. If this is not the case, all sorts of assumptions don't work anymore. For example, if the code that you want to execute in unsafePerformIO is not evaluated, then the IO is never executed:

Prelude> fmap (const 'o') $ putStrLn "hi"
hi
'o'
Prelude System.IO.Unsafe> const 'o' $ unsafePerformIO $ putStrLn "hi"
'o'

Another issue is that pure code can be inlined whereas IO-based code cannot. When you pinkie-promise that your code is "morally" pure, you also promise that inlining it will not cause trouble. This is not true in general:

Prelude System.IO.Unsafe> let a = unsafePerformIO $ putStrLn "hi"
Prelude System.IO.Unsafe> a
hi
()
Prelude System.IO.Unsafe> a
()

Lastly, this function is also not type-safe, as you can see here:

$ cat file.hs
import Data.IORef
import System.IO.Unsafe

test :: IORef [a]
test = unsafePerformIO $ newIORef []

main = do
  writeIORef test [42]
  bang <- readIORef test
  print $ map (\f -> f 5 6) (bang :: [Int -> Int -> Int])
$ runhaskell file.hs
[file.hs: internal error: stg_ap_pp_ret
    (GHC version 8.8.4 for x86_64_unknown_linux)
    Please report this as a GHC bug:  https://www.haskell.org/ghc/reportabug
[1]    13949 abort (core dumped)  runhaskell file.hs

unsafeDupablePerformIO

Like unsafePerformIO but is even less safe.

unsafeInterleaveIO

Used to define lazy IO, which should be avoided.

unsafeFixIO

Unsafe version of fixIO.

Deprecated

return

Use pure instead. See https://gitlab.haskell.org/ghc/ghc/-/wikis/proposal/monad-of-no-return
