Fix the rewrite rule that optimizes packing of string literals #1

Merged
merged 3 commits into from Mar 5, 2013

4 participants

@parcs

The description is in the commit message.

@parcs parcs Fix the rewrite rule that optimizes packing of string literals
This patch addresses two issues:

1) The particular rule was guarded under the wrong CPP conditional

   #if !defined(__GLASGOW_HASKELL__)
   ...
   #endif

   so GHC never even knew about it!

2) The rule didn't apply to ByteStrings that were implicitly packed
   via OverloadedStrings.

So now the RULE is properly guarded and applied earlier (or later,
depending how you look at it). Specifically, the rule changed from

  forall s.
  pack      (unpackCString# s) = inlinePerfomIO (unsafePackAddress s)
  -- defined in Data.ByteString.Char8

to

  forall s.
  packChars (unpackCString# s) = inlinePerfomIO (unsafePackAddress s)
  -- defined in Data.ByteString.Internal

To achieve this, the definition of 'unsafePackAddress' had to be
moved from the .Unsafe module to the .Internal module.
794c345
@bos
Haskell member

Why didn't you change this into simply a re-export of unsafePackAddress from D.B.Internal? There's surely no need to have the same function appear twice with two different names.

Good point. I'll update the patch.

@tibbe tibbe commented on the diff Mar 2, 2013
Data/ByteString/Internal.hs
@@ -226,6 +230,15 @@ packBytes ws = unsafePackLenBytes (List.length ws) ws
packChars :: [Char] -> ByteString
packChars cs = unsafePackLenChars (List.length cs) cs
+#if defined(__GLASGOW_HASKELL__)
+{-# INLINE [0] packChars #-}
@tibbe
Haskell member
tibbe added a note Mar 2, 2013

Won't this also affect non-bytestring users of packChars, as rules are exported unconditionally? Perhaps this rule should actually go in base.

@parcs
parcs added a note Mar 2, 2013

I'm not sure what you mean by non-bytestring users of packChars. Can you explain?

@tibbe
Haskell member
tibbe added a note Mar 5, 2013

I was confused. Please ignore!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
@tibbe
Haskell member

Thanks for the patch. Could you give a small example where this works. Perhaps you could also elaborate on how inlining works with OverloadedStrings (e.g. by giving an example how OverloadedStrings results in fromString ... etc. and talk about how the inlining proceeds.)

@parcs

This patch addresses the first issue that Simom M pointed out here: http://hackage.haskell.org/trac/ghc/ticket/5218#comment:3

For example, as of current bytestring, this simple module has to pack its ByteString literals from [Char]s at runtime:

{-# LANGUAGE OverloadedStrings #-}
module Test where
import Data.ByteString.Char8 (ByteString, pack)

test1 = pack "hello"

-- uses OverloadedStrings
test2 = "world" :: ByteString

You can confirm this by compiling the module with -O2 -ddump-simpl -ddump-rule-firings -ddump-rule-rewrites -ddump-inlinings. Notice that no bytestring-specific rules fire, and, in the end, both test1 and test2 are defined in terms of unsafePackLenChars :: Int -> [Char] -> ByteString:

Inlining done: base:Data.String.fromString{v 030} [gid[ClassOp]]
Inlining done:
    bytestring-0.10.0.2:Data.ByteString.Internal.$fIsStringByteString{v riL} [gid[DFunId(nt)]]
Inlining done:
    bytestring-0.10.0.2:Data.ByteString.Internal.packChars{v r3E} [gid]
...
Inlining done:
    bytestring-0.10.0.2:Data.ByteString.Char8.pack{v r5w} [gid]
Inlining done:
    bytestring-0.10.0.2:Data.ByteString.Internal.packChars{v r3E} [gid]
...

==================== Tidy Core ====================
Test.test1 :: Data.ByteString.Internal.ByteString
[GblId,
 Str=DmdType,
 Unf=Unf{Src=<vanilla>, TopLvl=True, Arity=0, Value=False,
         ConLike=False, WorkFree=False, Expandable=False,
         Guidance=IF_ARGS [] 30 0}]
Test.test1 =
  Data.ByteString.Internal.unsafePackLenChars
    Test.test3 Test.test1_cs

Test.test2 :: Data.ByteString.Internal.ByteString
[GblId,
 Str=DmdType,
 Unf=Unf{Src=<vanilla>, TopLvl=True, Arity=0, Value=False,
         ConLike=False, WorkFree=False, Expandable=False,
         Guidance=IF_ARGS [] 30 0}]
Test.test2 =
  Data.ByteString.Internal.unsafePackLenChars
    Test.test4 Test.test2_cs

This is because the relevant rule was incorrectly guarded by CPP and so GHC never knew about it. But even if the rule was in scope, it would only apply to test1 = pack "hello" since the rule referred to D.B.C8.pack, whereas fromString refers to D.B.I.packChars:

pack = packChars
...
#if !defined(__GLASGOW_HASKELL__)
{-# RULES
"ByteString pack/packAddress" forall s .
  pack(unpackCString# s) = inlinePerformIO (unsafePackAddress s)
#-}
#endif
...
instance IsString ByteString where
    fromString = packChars

With the patch applied, the rule that optimizes ByteString constructon can now get fired because 1) it is actually in scope! and 2) the rule refers to packChars instead of pack. The compiler output looks like this:

Inlining done: Data.String.fromString
Inlining done: Data.ByteString.Internal.$fIsStringByteString
...
Rule fired
    Rule: ByteString packChars/packAddress
    Before: Data.ByteString.Internal.packChars
              (GHC.CString.unpackCString# "world"#)
    After:  (\ (s_amV [Occ=Once] :: GHC.Prim.Addr#) ->
               Data.ByteString.Internal.inlinePerformIO
                 @ Data.ByteString.Internal.ByteString
                 (Data.ByteString.Internal.unsafePackAddr s_amV))
              "world"#
    Cont:   Stop[ArgCtxt False] Data.ByteString.Internal.ByteString
...
Inlining done: Data.ByteString.Char8.pack
...
Rule fired
    Rule: ByteString packChars/packAddress
    Before: Data.ByteString.Internal.packChars
              (GHC.CString.unpackCString# "hello"#)
    After:  (\ (s_amV [Occ=Once] :: GHC.Prim.Addr#) ->
               Data.ByteString.Internal.inlinePerformIO
                 @ Data.ByteString.Internal.ByteString
                 (Data.ByteString.Internal.unsafePackAddr s_amV))
              "hello"#
    Cont:   Stop[ArgCtxt False] Data.ByteString.Internal.ByteString

==================== Tidy Core ====================

Test.test1 :: Data.ByteString.Internal.ByteString
[GblId,
 Str=DmdType,
 Unf=Unf{Src=<vanilla>, TopLvl=True, Arity=0, Value=False,
         ConLike=False, WorkFree=False, Expandable=False,
         Guidance=IF_ARGS [] 103 50}]
Test.test1 =
  case GHC.Prim.newMutVar#
         @ (GHC.ForeignPtr.Finalizers, [GHC.Types.IO ()])
         @ GHC.Prim.RealWorld
         GHC.ForeignPtr.mallocForeignPtr3
         GHC.Prim.realWorld#
  of _ { (# ipv_a1ef, ipv1_a1eg #) ->
  let {
    addr#_ana :: GHC.Prim.Addr#
    [LclId, Str=DmdType]
    addr#_ana = "hello"# } in
...
  }
  }

Test.test2 :: Data.ByteString.Internal.ByteString
[GblId,
 Str=DmdType,
 Unf=Unf{Src=<vanilla>, TopLvl=True, Arity=0, Value=False,
         ConLike=False, WorkFree=False, Expandable=False,
         Guidance=IF_ARGS [] 103 50}]
Test.test2 =
  case GHC.Prim.newMutVar#
         @ (GHC.ForeignPtr.Finalizers, [GHC.Types.IO ()])
         @ GHC.Prim.RealWorld
         GHC.ForeignPtr.mallocForeignPtr3
         GHC.Prim.realWorld#
  of _ { (# ipv_a1ef, ipv1_a1eg #) ->
  let {
    addr#_ana :: GHC.Prim.Addr#
    [LclId, Str=DmdType]
    addr#_ana = "world"# } in
...
  }
  }

So ByteString literals (OverloadedStrings or not) are now efficiently built from statically-allocated Addr#s, as was originally intended.

@parcs parcs Move D.B.Unsafe.unsafePackAddr to D.B.Internal
But continue to export it from D.B.Unsafe
3a42bb1
@gregorycollins gregorycollins and 1 other commented on an outdated diff Mar 2, 2013
Data/ByteString/Internal.hs
@@ -32,6 +32,7 @@ module Data.ByteString.Internal (
packChars, packUptoLenChars, unsafePackLenChars,
unpackBytes, unpackAppendBytesLazy, unpackAppendBytesStrict,
unpackChars, unpackAppendCharsLazy, unpackAppendCharsStrict,
+ unsafePackAddress,
@gregorycollins
Haskell member

This function wasn't exported before, does it need to be exported now?

Also, needs an #ifdef __GLASGOW_HASKELL__.

@parcs
parcs added a note Mar 2, 2013

unsafePackAddress was moved from D.B.Unsafe to D.B.Internal so that it could be in scope for when the packChars rule is defined, without resorting to orphan rules or orphan instances. To maintain backwards-compatability, the function remains exported by D.B.Unsafe.

I will fix the ifdef issue.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
@tibbe tibbe merged commit efe48cd into haskell:master Mar 5, 2013
@dcoutts
Haskell member

Note that we still have the problem that the rewritten version gives a different answer.

In particular, it fails for strings with embedded '\0' and will give different results for unicode strings. For unicode strings, I think the rewritten version will produce a bytestring with utf8 encoding, and for pack it will truncate the top bits of the Chars.

Probably what we need is to have two versions of pack, one for Word8 and then have the Char version do a utf8 encoding.

@parcs

In particular, it fails for strings with embedded '\0' and will give different results for unicode strings. For unicode strings, I think the rewritten version will produce a bytestring with utf8 encoding, and for pack it will truncate the top bits of the Chars.

Hmm, the rule doesn't fire for unicode strings either, from what I see.

@tibbe
Haskell member

Doesn't the Char8 module say that it's specifically non-Unicode?

@tibbe tibbe commented on the diff Mar 5, 2013
Data/ByteString/Internal.hs
@@ -112,6 +115,8 @@ import Data.Generics (Data(..), mkNorepType)
#ifdef __GLASGOW_HASKELL__
import GHC.Base (realWorld#,unsafeChr)
+import GHC.CString (unpackCString#)
@tibbe
Haskell member
tibbe added a note Mar 5, 2013

This fails on GHC 7.0.4 with:

Data/ByteString/Internal.hs:118:8:
    Could not find module `GHC.CString':
      Use -v to see a list of the files searched for.

Perhaps unpackCString# was defined elsewhere before then.

@parcs
parcs added a note Mar 5, 2013

Looks like it was previously defined in GHC.Base.

@tibbe
Haskell member
tibbe added a note Mar 6, 2013

Fixed.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment