Assorted documentation fixes #248

Merged
merged 8 commits into from
Jul 17, 2020
18 changes: 9 additions & 9 deletions Data/ByteString.hs
@@ -543,7 +543,7 @@ foldr' k v (PS fp off len) =
{-# INLINE foldr' #-}

-- | 'foldl1' is a variant of 'foldl' that has no starting value
-- argument, and thus must be applied to non-empty 'ByteStrings'.
-- argument, and thus must be applied to non-empty 'ByteString's.
-- An exception will be thrown in the case of an empty ByteString.
foldl1 :: (Word8 -> Word8 -> Word8) -> ByteString -> Word8
foldl1 f ps
@@ -892,7 +892,7 @@ break p ps = case findIndexOrEnd p ps of n -> (unsafeTake n ps, unsafeDrop n ps)
-- of the specified byte. It is more efficient than 'break' as it is
-- implemented with @memchr(3)@. I.e.
--
-- > break (=='c') "abcd" == breakByte 'c' "abcd"
-- > break (==99) "abcd" == breakByte 99 "abcd" -- fromEnum 'c' == 99
--
breakByte :: Word8 -> ByteString -> (ByteString, ByteString)
breakByte c p = case elemIndex c p of
@@ -917,7 +917,7 @@ span p ps = break (not . p) ps
-- occurence of a byte other than its argument. It is more efficient
-- than 'span (==)'
--
-- > span (=='c') "abcd" == spanByte 'c' "abcd"
-- > span (==99) "abcd" == spanByte 99 "abcd" -- fromEnum 'c' == 99
--
spanByte :: Word8 -> ByteString -> (ByteString, ByteString)
spanByte c ps@(PS x s l) =
@@ -968,8 +968,8 @@ spanEnd p ps = splitAt (findFromEndUntil (not.p) ps) ps
-- The resulting components do not contain the separators. Two adjacent
-- separators result in an empty component in the output. eg.
--
-- > splitWith (=='a') "aabbaca" == ["","","bb","c",""]
-- > splitWith (=='a') [] == []
-- > splitWith (==97) "aabbaca" == ["","","bb","c",""] -- fromEnum 'a' == 97
-- > splitWith (==97) [] == []
--
splitWith :: (Word8 -> Bool) -> ByteString -> [ByteString]
splitWith _pred (PS _ _ 0) = []
@@ -1000,17 +1000,17 @@ splitWith pred_ (PS fp off len) = splitWith0 pred# off len fp
-- | /O(n)/ Break a 'ByteString' into pieces separated by the byte
-- argument, consuming the delimiter. I.e.
--
-- > split '\n' "a\nb\nd\ne" == ["a","b","d","e"]
-- > split 'a' "aXaXaXa" == ["","X","X","X",""]
-- > split 'x' "x" == ["",""]
-- > split 10 "a\nb\nd\ne" == ["a","b","d","e"] -- fromEnum '\n' == 10
-- > split 97 "aXaXaXa" == ["","X","X","X",""] -- fromEnum 'a' == 97
-- > split 120 "x" == ["",""] -- fromEnum 'x' == 120
--
-- and
--
-- > intercalate [c] . split c == id
-- > split == splitWith . (==)
--
-- As for all splitting functions in this library, this function does
-- not copy the substrings, it just constructs new 'ByteStrings' that
-- not copy the substrings, it just constructs new 'ByteString's that
-- are slices of the original.
--
split :: Word8 -> ByteString -> [ByteString]
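The updated Data.ByteString examples pass Word8 values (10 for '\n', 97 for 'a', 99 for 'c') because these functions work on bytes rather than Chars. A minimal sketch that exercises several of those documented equations with the public Data.ByteString API, assuming the OverloadedStrings extension for the literals; byte is a hypothetical helper, not a library function:

```haskell
{-# LANGUAGE OverloadedStrings #-}
import qualified Data.ByteString as B
import Data.Word (Word8)

-- Hypothetical helper: the byte value of an ASCII Char (byte 'a' == 97).
byte :: Char -> Word8
byte = fromIntegral . fromEnum

main :: IO ()
main = do
  -- split consumes the delimiter byte; adjacent delimiters give empty pieces.
  print (B.split 10 "a\nb\nd\ne")        -- ["a","b","d","e"]
  print (B.split 97 "aXaXaXa")           -- ["","X","X","X",""]
  print (B.splitWith (== 97) "aabbaca")  -- ["","","bb","c",""]
  -- break/span with a byte predicate, as in the breakByte/spanByte equations.
  print (B.break (== 99) "abcd")         -- ("ab","cd")
  print (B.span (== 99) "abcd")          -- ("","abcd")
  -- intercalate [c] . split c == id
  let s = "aXaXaXa"
  print (B.intercalate (B.singleton (byte 'a')) (B.split (byte 'a') s) == s) -- True
```

Using break and span with (== 99) rather than breakByte/spanByte keeps the sketch independent of whether those byte-specialised functions are exported by the installed bytestring version.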
15 changes: 8 additions & 7 deletions Data/ByteString/Builder.hs
@@ -67,12 +67,12 @@ infixr 4 \<\>
@

CSV is a character-based representation of tables. For maximal modularity,
we could first render 'Table's as 'String's and then encode this 'String'
we could first render @Table@s as 'String's and then encode this 'String'
using some Unicode character encoding. However, this sacrifices performance
due to the intermediate 'String' representation being built and thrown away
right afterwards. We get rid of this intermediate 'String' representation by
fixing the character encoding to UTF-8 and using 'Builder's to convert
'Table's directly to UTF-8 encoded CSV tables represented as lazy
@Table@s directly to UTF-8 encoded CSV tables represented as lazy
'L.ByteString's.

@
@@ -105,10 +105,10 @@ Note that the ASCII encoding is a subset of the UTF-8 encoding,
Using 'intDec' is more efficient than @'stringUtf8' . 'show'@,
as it avoids constructing an intermediate 'String'.
Avoiding this intermediate data structure significantly improves
performance because encoding 'Cell's is the core operation
performance because encoding @Cell@s is the core operation
for rendering CSV-tables.
See "Data.ByteString.Builder.Prim" for further
information on how to improve the performance of 'renderString'.
information on how to improve the performance of @renderString@.

We demonstrate our UTF-8 CSV encoding function on the following table.

@@ -149,14 +149,14 @@ Looking again at the definitions above,
we see that we took care to avoid intermediate data structures,
as otherwise we would sacrifice performance.
For example,
the following (arguably simpler) definition of 'renderRow' is about 20% slower.
the following (arguably simpler) definition of @renderRow@ is about 20% slower.

>renderRow :: Row -> Builder
>renderRow = mconcat . intersperse (charUtf8 ',') . map renderCell

Similarly, using /O(n)/ concatentations like '++' or the equivalent 'S.concat'
operations on strict and lazy 'L.ByteString's should be avoided.
The following definition of 'renderString' is also about 20% slower.
The following definition of @renderString@ is also about 20% slower.

>renderString :: String -> Builder
>renderString cs = charUtf8 $ "\"" ++ concatMap escape cs ++ "\""
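The CSV discussion above refers to definitions (Cell, Row, Table, renderRow, renderString) that the diff does not show. A minimal, self-contained sketch of such a renderer, assuming those names and a two-constructor Cell type; renderRow pairs each separator with the following cell instead of using intersperse, and renderString escapes character by character instead of using (++) and concatMap:

```haskell
import Data.ByteString.Builder (Builder, charUtf8, intDec, toLazyByteString)
import qualified Data.ByteString.Lazy.Char8 as L

-- Assumed table model: a cell is either a string or an integer.
data Cell = StringC String | IntC Int
type Row = [Cell]
type Table = [Row]

renderTable :: Table -> Builder
renderTable rs = mconcat [renderRow r <> charUtf8 '\n' | r <- rs]

-- Emit each ',' together with the cell that follows it
-- (the intersperse-based variant is reported above as slower).
renderRow :: Row -> Builder
renderRow [] = mempty
renderRow (c : cs) = renderCell c <> mconcat [charUtf8 ',' <> renderCell c' | c' <- cs]

renderCell :: Cell -> Builder
renderCell (StringC cs) = renderString cs
renderCell (IntC i) = intDec i -- no intermediate String, unlike stringUtf8 . show

-- Quote the string and backslash-escape '\\' and '"'.
renderString :: String -> Builder
renderString cs = charUtf8 '"' <> foldMap escape cs <> charUtf8 '"'
  where
    escape '\\' = charUtf8 '\\' <> charUtf8 '\\'
    escape '"' = charUtf8 '\\' <> charUtf8 '"'
    escape c = charUtf8 c

main :: IO ()
main = L.putStr (toLazyByteString (renderTable table))
  where
    table = [[StringC "hello", IntC 42], [StringC "say \"hi\"", IntC (-1)]]
```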
@@ -291,7 +291,8 @@ toLazyByteString = toLazyByteStringWith
-- enough buffer.
--
-- It is recommended that the 'Handle' is set to binary and
-- 'BlockBuffering' mode. See 'hSetBinaryMode' and 'hSetBuffering'.
-- 'System.IO.BlockBuffering' mode. See 'System.IO.hSetBinaryMode' and
-- 'System.IO.hSetBuffering'.
--
-- This function is more efficient than @hPut . 'toLazyByteString'@ because in
-- many cases no buffer allocation has to be done. Moreover, the results of
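Following the recommendation above, a sketch that puts a Handle into binary mode with block buffering before writing a Builder with hPutBuilder. The output path "numbers.txt" and the numbers Builder are illustrative assumptions; the file is read back only to confirm it holds the bytes the Builder describes.

```haskell
import Data.ByteString.Builder (Builder, charUtf8, hPutBuilder, intDec, toLazyByteString)
import qualified Data.ByteString.Lazy.Char8 as L
import System.IO (BufferMode (BlockBuffering), IOMode (WriteMode), hSetBinaryMode, hSetBuffering, withFile)

-- Illustrative payload: the numbers 1 to 5, one per line.
numbers :: Builder
numbers = mconcat [intDec i <> charUtf8 '\n' | i <- [1 .. 5 :: Int]]

main :: IO ()
main = do
  -- "numbers.txt" is a hypothetical output path.
  withFile "numbers.txt" WriteMode $ \h -> do
    hSetBinaryMode h True                    -- no newline or encoding translation
    hSetBuffering h (BlockBuffering Nothing) -- default-sized block buffer
    hPutBuilder h numbers                    -- can fill the Handle's own buffer
  -- Read the file back and compare with the Builder's bytes.
  written <- L.readFile "numbers.txt"
  print (written == toLazyByteString numbers) -- True
```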
2 changes: 1 addition & 1 deletion Data/ByteString/Builder/Extra.hs
@@ -81,7 +81,7 @@ import Foreign
-- * an IO action for writing the Builder's data into a user-supplied memory
-- buffer.
--
-- * a pre-existing chunks of data represented by a strict 'ByteString'
-- * a pre-existing chunks of data represented by a strict 'S.ByteString'
--
-- While this is rather low level, it provides you with full flexibility in
-- how the data is written out.
20 changes: 11 additions & 9 deletions Data/ByteString/Builder/Internal.hs
@@ -443,7 +443,8 @@ flush = builder step
--
-- 'Put's are a generalization of 'Builder's. The typical use case is the
-- implementation of an encoding that might fail (e.g., an interface to the
-- 'zlib' compression library or the conversion from Base64 encoded data to
-- <https://hackage.haskell.org/package/zlib zlib>
-- compression library or the conversion from Base64 encoded data to
-- 8-bit data). For a 'Builder', the only way to handle and report such a
-- failure is ignore it or call 'error'. In contrast, 'Put' actions are
-- expressive enough to allow reportng and handling such a failure in a pure
@@ -705,7 +706,7 @@ hPut h p = do
updateBufR op'
return $ fillHandle minSize nextStep
-- 'fillHandle' will flush the buffer (provided there is
-- really less than 'minSize' space left) before executing
-- really less than @minSize@ space left) before executing
-- the 'nextStep'.

insertChunkH op' bs nextStep = do
@@ -809,7 +810,8 @@ putToLazyByteStringWith strategy k p =
-- Raw memory
-------------

-- | Ensure that there are at least 'n' free bytes for the following 'Builder'.
-- | @'ensureFree' n@ ensures that there are at least @n@ free bytes
-- for the following 'Builder'.
{-# INLINE ensureFree #-}
ensureFree :: Int -> Builder
ensureFree minFree =
@@ -1013,9 +1015,9 @@ customStrategy
:: (Maybe (Buffer, Int) -> IO Buffer)
-- ^ Buffer allocation function. If 'Nothing' is given, then a new first
-- buffer should be allocated. If @'Just' (oldBuf, minSize)@ is given,
-- then a buffer with minimal size 'minSize' must be returned. The
-- strategy may reuse the 'oldBuffer', if it can guarantee that this
-- referentially transparent and 'oldBuffer' is large enough.
-- then a buffer with minimal size @minSize@ must be returned. The
-- strategy may reuse the @oldBuf@, if it can guarantee that this
-- referentially transparent and @oldBuf@ is large enough.
-> Int
-- ^ Default buffer size.
-> (Int -> Int -> Bool)
@@ -1067,7 +1069,7 @@ safeStrategy firstSize bufSize =
--
-- This function is inlined despite its heavy code-size to allow fusing with
-- the allocation strategy. For example, the default 'Builder' execution
-- function 'toLazyByteString' is defined as follows.
-- function 'Data.ByteString.Builder.toLazyByteString' is defined as follows.
--
-- @
-- {-\# NOINLINE toLazyByteString \#-}
@@ -1077,8 +1079,8 @@ safeStrategy firstSize bufSize =
--
-- where @L.empty@ is the zero-length lazy 'L.ByteString'.
--
-- In most cases, the parameters used by 'toLazyByteString' give good
-- performance. A sub-performing case of 'toLazyByteString' is executing short
-- In most cases, the parameters used by 'Data.ByteString.Builder.toLazyByteString' give good
-- performance. A sub-performing case of 'Data.ByteString.Builder.toLazyByteString' is executing short
-- (<128 bytes) 'Builder's. In this case, the allocation overhead for the first
-- 4kb buffer and the trimming cost dominate the cost of executing the
-- 'Builder'. You can avoid this problem using
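The paragraph above concerns short Builders, where the default 4kb first buffer and the trimming cost dominate. One way to request a smaller first buffer is toLazyByteStringWith with an AllocationStrategy from Data.ByteString.Builder.Extra. The sketch below uses safeStrategy with a 128-byte first buffer; that size and the name toLazyByteStringShort are illustrative assumptions, not tuned or library-provided values.

```haskell
import Data.ByteString.Builder (Builder, string7, toLazyByteString)
import Data.ByteString.Builder.Extra (safeStrategy, smallChunkSize, toLazyByteStringWith)
import qualified Data.ByteString.Lazy as L

-- Hypothetical runner for Builders expected to produce little output.
-- First buffer: 128 bytes; later buffers: smallChunkSize bytes.
-- L.empty is the lazy ByteString appended after the Builder's output.
toLazyByteStringShort :: Builder -> L.ByteString
toLazyByteStringShort = toLazyByteStringWith (safeStrategy 128 smallChunkSize) L.empty

main :: IO ()
main = do
  let b = string7 "short"
  -- Same bytes as the default runner; only buffer allocation differs.
  print (toLazyByteStringShort b == toLazyByteString b) -- True
```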
48 changes: 21 additions & 27 deletions Data/ByteString/Builder/Prim.hs
@@ -100,24 +100,24 @@ import Data.ByteString.Builder.Prim

renderString :: String -\> Builder
renderString cs =
B.charUtf8 \'\"\' \<\> E.'encodeListWithB' escape cs \<\> B.charUtf8 \'\"\'
B.charUtf8 \'\"\' \<\> 'P.primMapListBounded' escape cs \<\> B.charUtf8 \'\"\'
where
escape :: E.'BoundedPrim' Char
escape :: 'P.BoundedPrim' Char
escape =
'condB' (== \'\\\\\') (fixed2 (\'\\\\\', \'\\\\\')) $
'condB' (== \'\\\"\') (fixed2 (\'\\\\\', \'\\\"\')) $
E.'charUtf8'
'charUtf8'
&#160;
{&#45;\# INLINE fixed2 \#&#45;}
fixed2 x = 'liftFixedToBounded' $ const x '>$<' E.'char7' '>*<' E.'char7'
fixed2 x = 'P.liftFixedToBounded' $ const x '>$<' 'P.char7' '>*<' 'P.char7'
@

The code should be mostly self-explanatory. The slightly awkward syntax is
because the combinators are written such that the size-bound of the resulting
'BoundedPrim' can be computed at compile time. We also explicitly inline the
'fixed2' primitive, which encodes a fixed tuple of characters, to ensure that
@fixed2@ primitive, which encodes a fixed tuple of characters, to ensure that
the bound computation happens at compile time. When encoding the following list
of 'String's, the optimized implementation of 'renderString' is two times
of 'String's, the optimized implementation of @renderString@ is two times
faster.

@
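For reference, a self-contained version of the renderString example above with every qualified name spelled out. It assumes a bytestring version whose Data.ByteString.Builder.Prim exports primMapListBounded, condB, liftFixedToBounded, char7, charUtf8, (>$<) and (>*<), as the updated documentation does.

```haskell
import Data.ByteString.Builder (Builder, charUtf8, toLazyByteString)
import qualified Data.ByteString.Builder.Prim as P
import Data.ByteString.Builder.Prim ((>$<), (>*<))
import qualified Data.ByteString.Lazy.Char8 as L

-- Quote a String, escaping '\\' and '"' with a single BoundedPrim per Char.
renderString :: String -> Builder
renderString cs = charUtf8 '"' <> P.primMapListBounded escape cs <> charUtf8 '"'
  where
    escape :: P.BoundedPrim Char
    escape =
      P.condB (== '\\') (fixed2 ('\\', '\\')) $
      P.condB (== '"') (fixed2 ('\\', '"')) $
      P.charUtf8 -- everything else: plain UTF-8

    -- Two fixed ASCII characters; inlined so the size bound is a constant.
    {-# INLINE fixed2 #-}
    fixed2 :: (Char, Char) -> P.BoundedPrim a
    fixed2 x = P.liftFixedToBounded $ const x >$< P.char7 >*< P.char7

main :: IO ()
main = L.putStrLn (toLazyByteString (renderString "say \"hi\""))
```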
@@ -140,23 +140,23 @@ exploits that the escaped character with the maximal Unicode codepoint is \'>\'.

@
{&#45;\# INLINE charUtf8HtmlEscaped \#&#45;}
charUtf8HtmlEscaped :: E.BoundedPrim Char
charUtf8HtmlEscaped :: 'BoundedPrim' Char
charUtf8HtmlEscaped =
'condB' (> \'\>\' ) E.'charUtf8' $
'condB' (> \'\>\' ) 'charUtf8' $
'condB' (== \'\<\' ) (fixed4 (\'&\',(\'l\',(\'t\',\';\')))) $ -- &lt;
'condB' (== \'\>\' ) (fixed4 (\'&\',(\'g\',(\'t\',\';\')))) $ -- &gt;
'condB' (== \'&\' ) (fixed5 (\'&\',(\'a\',(\'m\',(\'p\',\';\'))))) $ -- &amp;
'condB' (== \'\"\' ) (fixed5 (\'&\',(\'\#\',(\'3\',(\'4\',\';\'))))) $ -- &\#34;
'condB' (== \'\\\'\') (fixed5 (\'&\',(\'\#\',(\'3\',(\'9\',\';\'))))) $ -- &\#39;
('liftFixedToBounded' E.'char7') -- fallback for 'Char's smaller than \'\>\'
('liftFixedToBounded' 'char7') -- fallback for 'Char's smaller than \'\>\'
where
{&#45;\# INLINE fixed4 \#&#45;}
fixed4 x = 'liftFixedToBounded' $ const x '>$<'
E.char7 '>*<' E.char7 '>*<' E.char7 '>*<' E.char7
char7 '>*<' char7 '>*<' char7 '>*<' char7
&#160;
{&#45;\# INLINE fixed5 \#&#45;}
fixed5 x = 'liftFixedToBounded' $ const x '>$<'
E.char7 '>*<' E.char7 '>*<' E.char7 '>*<' E.char7 '>*<' E.char7
char7 '>*<' char7 '>*<' char7 '>*<' char7 '>*<' char7
@

This module currently does not expose functions that require the special
@@ -301,7 +301,7 @@ corresponding functions in future releases of this library.
-- >
-- > renderString :: String -> Builder
-- > renderString cs =
-- > charUtf8 '"' <> encodeListWithB escapedUtf8 cs <> charUtf8 '"'
-- > charUtf8 '"' <> primMapListBounded escapedUtf8 cs <> charUtf8 '"'
-- > where
-- > escapedUtf8 :: BoundedPrim Char
-- > escapedUtf8 =
@@ -377,7 +377,8 @@ module Data.ByteString.Builder.Prim (
, FixedPrim

-- ** Combinators
-- | The combinators for 'FixedPrim's are implemented such that the 'size'
-- | The combinators for 'FixedPrim's are implemented such that the
-- 'Data.ByteString.Builder.Prim.size'
-- of the resulting 'FixedPrim' is computed at compile time.
--
-- The '(>*<)' and '(>$<)' pairing and mapping operators can be used
@@ -390,20 +391,20 @@ module Data.ByteString.Builder.Prim (
-- for constructing 'Builder's from 'FixedPrim's. The fused variants of
-- this function are provided because they allow for more efficient
-- implementations. Our compilers are just not smart enough yet; and for some
-- of the employed optimizations (see the code of 'encodeByteStringWithF')
-- of the employed optimizations (see the code of 'primMapByteStringFixed')
-- they will very likely never be.
--
-- Note that functions marked with \"/Heavy inlining./\" are forced to be
-- inlined because they must be specialized for concrete encodings,
-- but are rather heavy in terms of code size. We recommend to define a
-- top-level function for every concrete instantiation of such a function in
-- order to share its code. A typical example is the function
-- 'byteStringHex' from "Data.ByteString.Builder.ASCII", which is
-- implemented as follows.
-- 'Data.ByteString.Builder.byteStringHex' from "Data.ByteString.Builder.ASCII",
-- which is implemented as follows.
--
-- @
-- byteStringHex :: S.ByteString -> Builder
-- byteStringHex = 'encodeByteStringWithF' 'word8HexFixed'
-- byteStringHex = 'primMapByteStringFixed' 'word8HexFixed'
-- @
--
, primFixed
@@ -505,10 +506,10 @@ primUnfoldrFixed = primUnfoldrBounded . toB
-- copying it to the buffer to be filled.
--
-- > mapToBuilder :: (Word8 -> Word8) -> S.ByteString -> Builder
-- > mapToBuilder f = encodeByteStringWithF (contramapF f word8)
-- > mapToBuilder f = primMapByteStringFixed (contramapF f word8)
--
-- We can also use it to hex-encode a strict 'S.ByteString' as shown by the
-- 'byteStringHex' example above.
-- 'Data.ByteString.Builder.ASCII.byteStringHex' example above.
{-# INLINE primMapByteStringFixed #-}
primMapByteStringFixed :: FixedPrim Word8 -> (S.ByteString -> Builder)
primMapByteStringFixed = primMapByteStringBounded . toB
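A small sketch of the two uses of primMapByteStringFixed described above: hex encoding and mapping a byte function while copying. The names hexLower and mapToBuilder are illustrative; the mapping uses (>$<), the mapping operator exported by Data.ByteString.Builder.Prim.

```haskell
import qualified Data.ByteString as S
import Data.ByteString.Builder (Builder, byteStringHex, toLazyByteString)
import qualified Data.ByteString.Builder.Prim as P
import Data.ByteString.Builder.Prim ((>$<))
import qualified Data.ByteString.Lazy.Char8 as L
import Data.Word (Word8)

-- Lower-case hex encoding, two characters per input byte.
hexLower :: S.ByteString -> Builder
hexLower = P.primMapByteStringFixed P.word8HexFixed

-- Apply a byte transformation while copying into the output buffer,
-- without building an intermediate strict ByteString.
mapToBuilder :: (Word8 -> Word8) -> S.ByteString -> Builder
mapToBuilder f = P.primMapByteStringFixed (f >$< P.word8)

main :: IO ()
main = do
  let bs = S.pack [0, 171, 255]
  L.putStrLn (toLazyByteString (hexLower bs)) -- 00abff
  print (toLazyByteString (hexLower bs) == toLazyByteString (byteStringHex bs)) -- True
```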
@@ -575,13 +576,7 @@ primBounded w x =
-- TODO: The same rules for 'putBuilder (..) >> putBuilder (..)'

-- | Create a 'Builder' that encodes a list of values consecutively using a
-- 'BoundedPrim' for each element. This function is more efficient than the
-- canonical
--
-- > filter p =
-- > B.toLazyByteString .
-- > E.encodeLazyByteStringWithF (E.ifF p E.word8) E.emptyF)
-- >
Contributor Author


I removed this because it has nothing to do with lists. A similar, up-to-date example is already in the comment of primMapLazyByteStringBounded.

-- 'BoundedPrim' for each element. This function is more efficient than
--
-- > mconcat . map (primBounded w)
--
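The revised comment states that primMapListBounded is more efficient than mconcat . map (primBounded w). Both produce the same bytes; this sketch checks that for UTF-8 encoding of a String. The choice of P.charUtf8 as the primitive and the sample string are illustrative, and the sketch does not measure performance.

```haskell
import Data.ByteString.Builder (Builder, stringUtf8, toLazyByteString)
import qualified Data.ByteString.Builder.Prim as P

-- One Builder that encodes the whole list with the same primitive.
fused :: String -> Builder
fused = P.primMapListBounded P.charUtf8

-- One Builder per element, concatenated afterwards.
unfused :: String -> Builder
unfused = mconcat . map (P.primBounded P.charUtf8)

main :: IO ()
main = do
  let s = "h\233llo, \955"
  print (toLazyByteString (fused s) == toLazyByteString (unfused s))    -- True
  print (toLazyByteString (fused s) == toLazyByteString (stringUtf8 s)) -- True
```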
@@ -742,4 +737,3 @@ encodeCharUtf8 f1 f2 f3 f4 c = case ord c of
x4 = fromIntegral $ (x .&. 0x3F) + 0x80
in f4 x1 x2 x3 x4


9 changes: 5 additions & 4 deletions Data/ByteString/Builder/Prim/Internal.hs
@@ -95,9 +95,10 @@ infixl 4 >$<
-- We can use it for example to prepend and/or append fixed values to an
-- primitive.
--
-- > import Data.ByteString.Builder.Prim as P
-- >showEncoding ((\x -> ('\'', (x, '\''))) >$< fixed3) 'x' = "'x'"
-- > where
-- > fixed3 = char7 >*< char7 >*< char7
-- > fixed3 = P.char7 >*< P.char7 >*< P.char7
--
-- Note that the rather verbose syntax for composition stems from the
-- requirement to be able to compute the size / size bound at compile time.
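A runnable variant of the (>$<) example above, using the public primFixed and toLazyByteString functions in place of showEncoding. The name quoted is illustrative; it wraps an ASCII character in single quotes with one FixedPrim of size 3.

```haskell
import Data.ByteString.Builder (toLazyByteString)
import qualified Data.ByteString.Builder.Prim as P
import Data.ByteString.Builder.Prim ((>$<), (>*<))
import qualified Data.ByteString.Lazy.Char8 as L

-- (>*<) pairs three char7 primitives into a FixedPrim (Char, (Char, Char));
-- (>$<) first maps the input Char to that nested tuple.
quoted :: P.FixedPrim Char
quoted = (\x -> ('\'', (x, '\''))) >$< fixed3
  where
    fixed3 = P.char7 >*< P.char7 >*< P.char7

main :: IO ()
main = L.putStrLn (toLazyByteString (P.primFixed quoted 'x')) -- 'x'
```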
@@ -178,7 +179,7 @@ pairF (FP l1 io1) (FP l2 io2) =
-- | Change a primitives such that it first applies a function to the value
-- to be encoded.
--
-- Note that primitives are 'Contrafunctors'
-- Note that primitives are 'Contravariant'
-- <http://hackage.haskell.org/package/contravariant>. Hence, the following
-- laws hold.
--
@@ -247,7 +248,7 @@ runB (BP _ io) = io
-- | Change a 'BoundedPrim' such that it first applies a function to the
-- value to be encoded.
--
-- Note that 'BoundedPrim's are 'Contrafunctors'
-- Note that 'BoundedPrim's are 'Contravariant'
-- <http://hackage.haskell.org/package/contravariant>. Hence, the following
-- laws hold.
--
@@ -290,7 +291,7 @@ eitherB (BP b1 io1) (BP b2 io2) =
-- Unicode codepoints above 127 as follows.
--
-- @
--charASCIIDrop = 'condB' (< \'\\128\') ('fromF' 'char7') 'emptyB'
--charASCIIDrop = 'condB' (< \'\\128\') ('liftFixedToBounded' 'Data.ByteString.Builder.Prim.char7') 'emptyB'
-- @
{-# INLINE CONLIKE condB #-}
condB :: (a -> Bool) -> BoundedPrim a -> BoundedPrim a -> BoundedPrim a
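The charASCIIDrop example in the condB documentation can be made runnable with the public Data.ByteString.Builder.Prim exports. Characters at or above code point 128 are dropped rather than encoded; the sample input is illustrative.

```haskell
import Data.ByteString.Builder (toLazyByteString)
import qualified Data.ByteString.Builder.Prim as P
import qualified Data.ByteString.Lazy.Char8 as L

-- Encode ASCII characters with char7; emit nothing for any other character.
charASCIIDrop :: P.BoundedPrim Char
charASCIIDrop = P.condB (< '\128') (P.liftFixedToBounded P.char7) P.emptyB

main :: IO ()
main = L.putStrLn (toLazyByteString (P.primMapListBounded charASCIIDrop "caf\233 na\239ve")) -- caf nave
```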
8 changes: 5 additions & 3 deletions Data/ByteString/Char8.hs
@@ -370,7 +370,7 @@ foldr' f = B.foldr' (\c a -> f (w2c c) a)
{-# INLINE foldr' #-}

-- | 'foldl1' is a variant of 'foldl' that has no starting value
-- argument, and thus must be applied to non-empty 'ByteStrings'.
-- argument, and thus must be applied to non-empty 'ByteString's.
foldl1 :: (Char -> Char -> Char) -> ByteString -> Char
foldl1 f ps = w2c (B.foldl1 (\x y -> c2w (f (w2c x) (w2c y))) ps)
{-# INLINE foldl1 #-}
@@ -601,7 +601,7 @@ breakEnd f = B.breakEnd (f . w2c)
-- > split == splitWith . (==)
--
-- As for all splitting functions in this library, this function does
-- not copy the substrings, it just constructs new 'ByteStrings' that
-- not copy the substrings, it just constructs new 'ByteString's that
-- are slices of the original.
--
split :: Char -> ByteString -> [ByteString]
@@ -867,7 +867,9 @@ lastnonspace ptr n
-}

-- | 'lines' breaks a ByteString up into a list of ByteStrings at
-- newline Chars. The resulting strings do not contain newlines.
-- newline Chars (@'\\n'@). The resulting strings do not contain newlines.
--
-- Note that it __does not__ regard CR (@'\\r'@) as a newline character.
--
lines :: ByteString -> [ByteString]
lines ps
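As the new note on lines says, a carriage return is not treated as a line terminator, so CRLF input leaves a trailing '\r' on each line. A sketch of one way to handle that with the public Char8 API; linesCRLF is a hypothetical helper, not part of bytestring.

```haskell
import qualified Data.ByteString.Char8 as BC

-- Hypothetical helper: like BC.lines, but also drops a single trailing '\r'
-- from each line, so "\r\n" line endings are handled as well.
linesCRLF :: BC.ByteString -> [BC.ByteString]
linesCRLF = map dropCR . BC.lines
  where
    dropCR l
      | not (BC.null l) && BC.last l == '\r' = BC.init l
      | otherwise = l

main :: IO ()
main = do
  let input = BC.pack "one\r\ntwo\nthree\r\n"
  print (BC.lines input)  -- ["one\r","two","three\r"]
  print (linesCRLF input) -- ["one","two","three"]
```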