Compute length at compile time for literal strings #191

Merged
merged 9 commits into from
Aug 25, 2020
3 changes: 3 additions & 0 deletions Changelog.md
@@ -8,6 +8,9 @@
* Add `IsList` instances
* Deprecate `Data.ByteString.Lazy.Builder`
* Add `partition` to `Data.ByteString.Char8` and `Data.ByteString.Lazy.Char8`
* Add `unsafePackLiteral` to `Data.ByteString.Internal`. Where possible,
use known-key variant of C `strlen` from `GHC.CString` that supports
constant folding.
Member

What does "known-key" mean here?

I think it would also be nice if the changelog entry was more explicit about the potential benefits for users.

Contributor Author

It's a term used in GHC that means that the compiler knows about the identifier and consequently can do sophisticated rewrites with it. Ascending levels of magic:

  • Ordinary function (no magic)
  • Known-key function (implementation is in library space in base but GHC's built-in rewrite rules can look for the function)
  • Primop (implementation is provided by the compiler itself, and GHC's built-in rewrite rules can look for the function)

I can beef up the changelog entry.
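To make the "known-key" point concrete, here is a minimal sketch, assuming GHC 9.0 or later, where `GHC.CString` exports `cstringLength#`. The name `literalLength` is hypothetical and only for illustration. Because the argument is an `Addr#` literal, the compiler has the opportunity to fold the call into a constant when optimizing, although the result is the same either way.

```haskell
{-# LANGUAGE MagicHash #-}

-- Assumes GHC >= 9.0 (ghc-prim exporting cstringLength#).
import GHC.CString (cstringLength#)
import GHC.Exts (Int (I#))

-- Hypothetical name, for illustration only.
-- The "..."# syntax is a NUL-terminated Addr# literal; cstringLength#
-- counts the bytes before the terminator, like C strlen.
literalLength :: Int
literalLength = I# (cstringLength# "known-key"#)
```

Loading this in GHCi and evaluating `literalLength` should give the byte count of the literal.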

Contributor Author

I've improved this by adding a link to the GHC wiki entry on known-key things and by clarifying that string literals desugared to `ByteString` via `OverloadedStrings` get faster. Let me know if there's anything else.
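For readers unfamiliar with the user-facing side, the following is a small sketch of the kind of code that benefits. The names `greeting` and `greetingLength` are hypothetical. With `OverloadedStrings`, the literal goes through `fromString`, which is the path the `packChars` rewrite rule in this PR targets.

```haskell
{-# LANGUAGE OverloadedStrings #-}

import qualified Data.ByteString as B

-- Hypothetical names, for illustration only.
-- The literal is desugared through the IsString instance for ByteString.
greeting :: B.ByteString
greeting = "hello"

-- The length is stored in the ByteString, so this is a field read.
greetingLength :: Int
greetingLength = B.length greeting
```

Whether the length is actually computed at compile time depends on the GHC version and optimization level; the observable value does not change.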


0.10.10.1 – June 2020

1 change: 1 addition & 0 deletions Data/ByteString.hs
@@ -755,6 +755,7 @@ scanr1 f ps
--
-- This implementation uses @memset(3)@
replicate :: Int -> Word8 -> ByteString
{-# INLINE replicate #-}
replicate w c
| w <= 0 = empty
| otherwise = unsafeCreate w $ \ptr ->
35 changes: 31 additions & 4 deletions Data/ByteString/Internal.hs
@@ -34,7 +34,7 @@ module Data.ByteString.Internal (
packChars, packUptoLenChars, unsafePackLenChars,
unpackBytes, unpackAppendBytesLazy, unpackAppendBytesStrict,
unpackChars, unpackAppendCharsLazy, unpackAppendCharsStrict,
unsafePackAddress,
unsafePackAddress, unsafePackLiteral,

-- * Low level imperative construction
create, -- :: Int -> (Ptr Word8 -> IO ()) -> IO ByteString
@@ -136,9 +136,16 @@ import GHC.IO (IO(IO),unsafeDupablePerformIO)
import GHC.IOBase (IO(IO),RawBuffer,unsafeDupablePerformIO)
#endif

import GHC.ForeignPtr (ForeignPtr(ForeignPtr)
,newForeignPtr_, mallocPlainForeignPtrBytes)
import GHC.ForeignPtr (ForeignPtr(ForeignPtr), mallocPlainForeignPtrBytes)

#if __GLASGOW_HASKELL__ >= 811
import GHC.CString (cstringLength#)
import GHC.Exts (Int(I#))
import GHC.ForeignPtr (ForeignPtrContents(FinalPtr))
#else
import GHC.ForeignPtr (newForeignPtr_)
import GHC.Ptr (Ptr(..), castPtr)
#endif

-- CFILES stuff is Hugs only
{-# CFILES cbits/fpstring.c #-}
@@ -195,6 +202,7 @@ instance IsList ByteString where
#endif

instance IsString ByteString where
{-# INLINE fromString #-}
fromString = packChars

instance Data ByteString where
@@ -216,7 +224,7 @@ packChars cs = unsafePackLenChars (List.length cs) cs

{-# RULES
"ByteString packChars/packAddress" forall s .
packChars (unpackCString# s) = accursedUnutterablePerformIO (unsafePackAddress s)
packChars (unpackCString# s) = unsafePackLiteral s
#-}
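The rewrite-rule pattern in this hunk can be sketched outside the library. The following is an illustrative toy, not the library's code: `packVia` and `sample` are hypothetical names, and it assumes a `bytestring` version that exports `unsafePackLiteral` from `Data.ByteString.Internal`. The rule matches a `String` literal (which GHC desugars to `unpackCString#` applied to an `Addr#`) and skips the intermediate list. The rule only fires with optimization enabled, and the result is the same whether or not it fires.

```haskell
{-# LANGUAGE MagicHash #-}

import qualified Data.ByteString as B
import qualified Data.ByteString.Char8 as BC
import Data.ByteString.Internal (unsafePackLiteral)
import GHC.CString (unpackCString#)

-- Hypothetical wrapper, for illustration only.
-- Inlining is delayed until phase 1 so the rule below has a chance to match.
packVia :: String -> B.ByteString
packVia = BC.pack
{-# NOINLINE [1] packVia #-}

-- When the argument is an ASCII string literal, build the ByteString
-- directly from the literal's address instead of packing a list.
{-# RULES
"packVia/literal" forall s .
    packVia (unpackCString# s) = unsafePackLiteral s
  #-}

sample :: B.ByteString
sample = packVia "abc"
```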

unsafePackLenBytes :: Int -> [Word8] -> ByteString
@@ -257,14 +265,33 @@ unsafePackLenChars len cs0 =
--
unsafePackAddress :: Addr# -> IO ByteString
unsafePackAddress addr# = do
#if __GLASGOW_HASKELL__ >= 811
return (PS (ForeignPtr addr# FinalPtr) 0 (I# (cstringLength# addr#)))
#else
p <- newForeignPtr_ (castPtr cstr)
l <- c_strlen cstr
return $ PS p 0 (fromIntegral l)
where
cstr :: CString
cstr = Ptr addr#
#endif
{-# INLINE unsafePackAddress #-}

-- | See 'unsafePackAddress'. This function has similar behavior. Prefer
-- this function when the address is known to be an @Addr#@ literal. In
-- that context, there is no need for the sequencing guarantees that 'IO'
-- provides. On GHC 8.12 and up, this function uses the @FinalPtr@ data
-- constructor for @ForeignPtrContents@.
unsafePackLiteral :: Addr# -> ByteString
unsafePackLiteral addr# =
#if __GLASGOW_HASKELL__ >= 811
PS (ForeignPtr addr# FinalPtr) 0 (I# (cstringLength# addr#))
#else
let len = accursedUnutterablePerformIO (c_strlen (Ptr addr#))
in PS (accursedUnutterablePerformIO (newForeignPtr_ (Ptr addr#))) 0 (fromIntegral len)
#endif
{-# INLINE unsafePackLiteral #-}
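As a usage sketch for the new function, assuming a `bytestring` release that includes this change, `unsafePackLiteral` can be applied directly to an `Addr#` literal. The names `banner` and `bannerLength` are hypothetical. As with `unsafePackAddress`, the literal must not contain embedded NUL bytes, since the length is found with a `strlen`-style scan.

```haskell
{-# LANGUAGE MagicHash #-}

import qualified Data.ByteString as B
import Data.ByteString.Internal (unsafePackLiteral)

-- Hypothetical names, for illustration only.
-- The ByteString points at the literal's static storage; no copy is made.
banner :: B.ByteString
banner = unsafePackLiteral "hello, world"#

bannerLength :: Int
bannerLength = B.length banner
```

Unlike `unsafePackAddress`, the result is a pure value, so no `IO` or `unsafePerformIO`-style wrapper is needed at the call site.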


packUptoLenBytes :: Int -> [Word8] -> (ByteString, [Word8])
packUptoLenBytes len xs0 =