byteArrayToByteString, and byteStringToByteArray? #186

Open
chessai opened this issue Oct 9, 2019 · 4 comments · Fixed by #317


@chessai
Member

chessai commented Oct 9, 2019

I find myself using the following more often now:

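-- Zero-copy, but only valid when the ByteArray is pinned: the ByteString
-- keeps a raw address into the array's payload.
unsafeByteArrayToByteString :: ByteArray -> ByteString
unsafeByteArrayToByteString (ByteArray b#) =
  let addr# = byteArrayContents# b#
      fp = ForeignPtr addr# (PlainPtr (unsafeCoerce# b#))
      len = I# (sizeofByteArray# b#)
  in PS fp 0 len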

byteArrayToByteString :: ByteArray -> ByteString
byteArrayToByteString b@(ByteArray b#)
  | isTrue# (isByteArrayPinned# b#) = unsafeByteArrayToByteString b
  | otherwise = runST $ do
      let len = sizeofByteArray b
      marr@(MutableByteArray marr#) <- newPinnedByteArray len
      copyByteArray marr 0 b 0 len
      let addr# = byteArrayContents# (unsafeCoerce# marr#)
      let fp = ForeignPtr addr# (PlainPtr (unsafeCoerce# marr#))
      pure (PS fp 0 len)

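byteStringToByteArray :: ByteString -> ByteArray
byteStringToByteArray (PS fp off len) =
  -- unsafeDupablePerformIO (from System.IO.Unsafe) is used because the
  -- action allocates; accursedUnutterablePerformIO may alias freshly
  -- allocated buffers.
  unsafeDupablePerformIO $
    withForeignPtr fp $ \ptr -> do
      marr <- newByteArray len
      copyPtrToMutableByteArray (ptr `plusPtr` off) marr 0 len
      unsafeFreezeByteArray marr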

Is it possible that these could be added somewhere? I work with ByteArray a lot more than I do with ByteString, but there are places where ByteString is necessary because of other APIs (aeson, binary, etc.).
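
The pinned check in byteArrayToByteString matters because a ByteString stores a raw address into its buffer, while the garbage collector is free to move an unpinned ByteArray. A minimal sketch of the distinction, assuming the primitive package's isByteArrayPinned is available (pinnedExample and unpinnedExample are hypothetical names used only for illustration):

```haskell
import Control.Monad.ST (runST)
import Data.Primitive.ByteArray
  ( ByteArray, isByteArrayPinned, newByteArray, newPinnedByteArray
  , unsafeFreezeByteArray )

-- A small array allocated on the ordinary (moving) heap.
-- Its contents are uninitialised; only the pinnedness is inspected.
unpinnedExample :: ByteArray
unpinnedExample = runST (newByteArray 16 >>= unsafeFreezeByteArray)

-- An array the garbage collector is not allowed to move.
pinnedExample :: ByteArray
pinnedExample = runST (newPinnedByteArray 16 >>= unsafeFreezeByteArray)

main :: IO ()
main = do
  print (isByteArrayPinned pinnedExample)
  print (isByteArrayPinned unpinnedExample)
```

Only the first kind of array could be wrapped without copying; the second has to go through the copying branch.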

@hvr
Member

hvr commented Dec 19, 2019

Isn't this basically the conversion to/from ShortByteString? If I understand it correctly, ByteArray is the same as

data ShortByteString = SBS ByteArray#

(Slightly related: before ShortByteString existed, I used http://hackage.haskell.org/package/bytestring-plain-0.1.0.2/docs/Data-ByteString-Plain.html, which had a different cost model. Since ShortByteString was introduced I almost always use that instead, probably in the very use cases where you seem to prefer ByteArray.)

@Bodigrim
Contributor

Bodigrim commented Oct 7, 2020

I think @hvr is right: one can use fromShort / toShort.

import Data.ByteString.Short.Internal (ShortByteString(..), fromShort, toShort)

byteArrayToByteString :: ByteArray -> ByteString
byteArrayToByteString (ByteArray b#) = fromShort (SBS b#)

byteStringToByteArray :: ByteString -> ByteArray
byteStringToByteArray bs = case toShort bs of SBS b# -> ByteArray b#
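
Note that fromShort and toShort each copy the bytes, so this route trades performance for simplicity. A self-contained round-trip sketch of the two conversions, assuming the bytestring and primitive packages (roundTrips is a hypothetical helper added for the check):

```haskell
{-# LANGUAGE MagicHash #-}

import Data.ByteString (ByteString)
import qualified Data.ByteString as BS
import Data.ByteString.Short.Internal (ShortByteString(..), fromShort, toShort)
import Data.Primitive.ByteArray (ByteArray(..), sizeofByteArray)

-- Copies the array's bytes into a freshly allocated ByteString.
byteArrayToByteString :: ByteArray -> ByteString
byteArrayToByteString (ByteArray b#) = fromShort (SBS b#)

-- Copies the ByteString's bytes into a freshly allocated ByteArray.
byteStringToByteArray :: ByteString -> ByteArray
byteStringToByteArray bs = case toShort bs of SBS b# -> ByteArray b#

-- Hypothetical helper: converting there and back preserves the bytes.
roundTrips :: ByteString -> Bool
roundTrips bs = byteArrayToByteString (byteStringToByteArray bs) == bs

main :: IO ()
main = do
  let bs = BS.pack [0 .. 255]
  print (sizeofByteArray (byteStringToByteArray bs) == BS.length bs)
  print (roundTrips bs)
```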

@chessai
Member Author

chessai commented Apr 11, 2023

Revisiting this, I believe it is possible to do this without copying, if the ForeignPtr allows it. Some example code:

{-# language MagicHash, TypeApplications, UnboxedTuples #-}

import Data.ByteString.Internal (ByteString(..))
import Data.ByteString.Short (ShortByteString(..), toShort)
import GHC.ForeignPtr (ForeignPtr(..), ForeignPtrContents(..))
import GHC.Exts
  ( MutableByteArray#, ByteArray#, Addr#, Int(..), RealWorld,
    mutableByteArrayContents#, eqAddr#, isTrue#, runRW#, unsafeFreezeByteArray#,
    getSizeofMutableByteArray#, (==#)
  )
import Data.Primitive.ByteArray(ByteArray(..))

import qualified Data.ByteString as BS
import qualified Data.List as List
import qualified Data.Char as Char
import Data.Word (Word8)
import GHC.Exts (toList)

byteStringToByteArray :: ByteString -> ByteArray
byteStringToByteArray b@(BS (ForeignPtr addr# contents) (I# len#)) = case contents of
  MallocPtr marr# _ -> ByteArray (go marr# addr#)
  PlainPtr marr# -> ByteArray (go marr# addr#)
  _ -> ByteArray (byteStringToByteArrayCopy b)

  where
    go :: MutableByteArray# RealWorld -> Addr# -> ByteArray#
    go marr# addr# =
      let marrAddr# = mutableByteArrayContents# marr#
      in if isTrue# (eqAddr# addr# marrAddr#)
         then runRW#
                (\s0 -> case getSizeofMutableByteArray# marr# s0 of
                  (# s1, marrLen# #) ->
                    if isTrue# (marrLen# ==# len#)
                    then case unsafeFreezeByteArray# marr# s1 of
                           (# _, r #) -> r
                    else byteStringToByteArrayCopy b
                )
         else byteStringToByteArrayCopy b

    byteStringToByteArrayCopy :: ByteString -> ByteArray#
    byteStringToByteArrayCopy s = case toShort s of { SBS arr -> arr; }

main :: IO ()
main = do
  let b = BS.pack [0x20..0x7e]
  let r = byteStringToByteArray b
  
  let byteToChar byte = Char.chr (fromIntegral @Word8 @Int byte)
  let bChars = List.map byteToChar (BS.unpack b)
  let rChars = List.map byteToChar (toList r)
  
  print (bChars == rChars)

This can just return a ShortByteString instead if ByteString doesn't want to concern itself with lifted ByteArray.
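
Whether the zero-copy path can apply depends on which ForeignPtrContents constructor backs the ByteString: only PlainPtr and MallocPtr carry a MutableByteArray#. Even then, the code above copies unless the ByteString starts at the array's base address and spans its full length, so a slice would still be copied. A minimal sketch for inspecting the backing store, assuming bytestring 0.11 or later for the BS pattern (backingStore is a hypothetical helper, not part of any library):

```haskell
import qualified Data.ByteString as BS
import Data.ByteString.Internal (ByteString(..))
import GHC.ForeignPtr (ForeignPtr(..), ForeignPtrContents(..))

-- Hypothetical helper: name the kind of storage behind a ByteString.
backingStore :: ByteString -> String
backingStore (BS (ForeignPtr _ contents) _) = case contents of
  PlainPtr _        -> "PlainPtr"         -- GHC-managed array, no finalizers
  MallocPtr _ _     -> "MallocPtr"        -- GHC-managed array with finalizers
  PlainForeignPtr _ -> "PlainForeignPtr"  -- foreign memory, no array to reuse
  _                 -> "other"            -- any constructor not listed above

main :: IO ()
main = putStrLn (backingStore (BS.pack [0x20 .. 0x7e]))
```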

@chessai chessai reopened this Apr 11, 2023
@clyring
Member

clyring commented Apr 11, 2023

#547 does some similar stuff specifically to better support IO operations. As discussed there, I have serious reservations about the idea because of its surprising interaction with compact regions.

But the implementation has several tricky details to get right, and I suppose it is a rather fundamental operation. Perhaps a version of it can live in Data.ByteString.Unsafe? Feel free to open a pull request.
