New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add PrimArray, Arr class #64

Open
wants to merge 26 commits into
base: master
from

Conversation

Projects
None yet
6 participants
@winterland1989

winterland1989 commented Aug 15, 2017

This is the PrimArray patch as @bgamari requested, This patch also offer polymorphric Arr class which unify boxed and unboxed array interface. Please review carefully because there may be bugs ; )

I haven't consider much compatibility issue when i'm doing this, so please add whatever CPP needed to make it work on older GHCs. I have added CPPs and CI is happy now.

I'm planning to make another patch which bring foldr/build fusion to these arrays, please don't cut a new release before that is done. I believe fusion belongs to high level vector package now, so this is the full patch already, please review and merge(if there's no objections) ASAP.

@winterland1989

This comment has been minimized.

winterland1989 commented Aug 15, 2017

After reading #63 , i think it's probable a good idea to keep pinning status in reszied array, ideas?

@winterland1989

This comment has been minimized.

winterland1989 commented Aug 15, 2017

OK, all build errors are fixed. There should be no problem with older GHC now.

@andrewthad

This comment has been minimized.

Collaborator

andrewthad commented Aug 15, 2017

Small life tip: if you change your ~/.gitignore to have this section:

[user]
    email = winterland1989@gmail.com
    name = John Doe

Then on github, your commits will be attributed to your user instead of being attributed to "handong", who is unknown to github.

@andrewthad

This comment has been minimized.

Collaborator

andrewthad commented Aug 15, 2017

I've put up a PR against your branch fixing minor documentation typos: winterland1989#1

Merge pull request #1 from andrewthad/prim_array
fix minor typos in documentation
@winterland1989

This comment has been minimized.

winterland1989 commented Aug 15, 2017

Thanks!

@andrewthad

This comment has been minimized.

Collaborator

andrewthad commented Aug 15, 2017

I dislike the current Eq and Ord definitions for PrimArray. I would prefer to see Eq actually use the Eq constraint on the element type, as I do here. This way, if someone wanted to use PrimArray of numbers whose equality uses modular arithmetic, it would still work (or really any notion of equality that allows structurally different things to be equal).

Also, for very different reasons, I do not like the overlapping instance for Ord. Overlapping instances mess up instance resolution sometimes. I'll try to come up with an example of this.

I would prefer that fast versions of equality and of comparison be provided as functions but that they not be used to attempt to optimize the Eq and Ord typeclass instances. Providing eqPrimArrayBytewise and comparePrimArrayBytewise would be my preference.

Description : Bounded checked boxed and unboxed arrays
Copyright : (c) Winter, 2017
License : BSD3
Maintainer : libraries@haskell.org, drkoster@qq.com

This comment has been minimized.

@cartazio

cartazio Aug 15, 2017

Member

who's dr koster?

This comment has been minimized.

@winterland1989
check :: String -> Bool -> a -> a
check _ True x = x
check errMsg False _ = throw (IndexOutOfBounds $ "Data.Array.Checked." ++ errMsg)
{-# INLINE check #-}

This comment has been minimized.

@cartazio

cartazio Aug 15, 2017

Member

for this module, does INLINE pragmas make sense? theres definitely no fusion going on , and to quote @thoughtpolice on the topic, if ghc isn't marking them inlinable when they're small enough, thats a ghc bug ..

This comment has been minimized.

@winterland1989

winterland1989 Aug 16, 2017

I'm not sure because I really need this function inlined to reduce the overhead of bound check.

This comment has been minimized.

@cartazio

cartazio Aug 16, 2017

Member

oh, i meant the inlines everywhere, valid forthis one i guess ;)

-- | A typeclass to unify box & unboxed, mutable & immutable array operations.
--
-- Most of these functions simply wrap their primitive counterpart. If there are no primitive ones,
-- the method is polyfilled using other operations to get the same semantics.

This comment has been minimized.

@cartazio

cartazio Aug 15, 2017

Member

i dont like the word polyfilled, can we use something like "emulated"?

This comment has been minimized.

@cartazio

cartazio Aug 15, 2017

Member

also should this interface also support (when applicable) arrays of unlifted pointer values via ArrayArray?

This comment has been minimized.

@winterland1989

winterland1989 Aug 16, 2017

This interface support ArrayArray , though the newArr for ArrayArray is more unsafe, I should add more doc about this.

This comment has been minimized.

@winterland1989

This comment has been minimized.

@cartazio

cartazio Aug 16, 2017

Member

@winterland1989 my point about the term poly filled still stands right?

This comment has been minimized.

@cartazio

cartazio Aug 17, 2017

Member

@winterland1989 can we change polyfilled to emulated with?

-- One exception is that 'shrinkMutableArr' only performs resizing on 'PrimArray' because
-- the RTS only supports shrinking byte arrays. On other array types, 'shrinkMutableArr' will do nothing.
--
class Arr (marr :: * -> * -> *) (arr :: * -> * ) a | arr -> marr, marr -> arr where

This comment has been minimized.

@cartazio

cartazio Aug 15, 2017

Member

My immediate concern when seeing this interface is that this doesn't allow the same interface to work For ArrayArray too! Then we can't have a nice interface for arrays of unlifted values!

This comment has been minimized.

@winterland1989

winterland1989 Aug 16, 2017

Can you give a reason why that's the case?

This comment has been minimized.

@cartazio

cartazio Aug 16, 2017

Member

i'm thinking we could have, at least on recent ghc, something like
marr :: * -> TYPE r -> * , and arr :: TYPE r -> *, which is like having representation genericity, but in a static / allowed way

@@ -137,6 +141,22 @@ instance PrimUnlifted (MV.MutVar s a) where
toArrayArray# (MV.MutVar mv#) = unsafeCoerce# mv#
fromArrayArray# aa# = MV.MutVar (unsafeCoerce# aa#)
instance PrimUnlifted (MVar a) where
toArrayArray# (MVar mv#) = unsafeCoerce# mv#

This comment has been minimized.

@cartazio

cartazio Aug 15, 2017

Member

Wait what. This doesn't seem safe .. could you explain this mapping to me?

This comment has been minimized.

@cartazio

cartazio Aug 15, 2017

Member

and all the associated other unsafe coerces too ..

This comment has been minimized.

@andrewthad

andrewthad Aug 15, 2017

Collaborator

To my understanding, ArrayArray# can hold any values whose type is of kind TYPE UnliftedRep. This seems safe to me, but it would be nice to have that confirmed by someone else. @bgamari Can you comment on this?

This comment has been minimized.

@winterland1989

winterland1989 Aug 16, 2017

Yes, I think it's safe too, and I'm using this instance in my libuv binding without problems. The newly added instances are really not differ from MutVar , they are unlifted boxed pointers just like ArrayArray.

This comment has been minimized.

@winterland1989

winterland1989 Aug 17, 2017

@bgamari , though i'm pretty sure this works, I think we need your confirm here too.

This comment has been minimized.

@cartazio

cartazio Aug 17, 2017

Member

i'm not sure i'm comfortable with this code... this is pretty ... dangerous to use, more so than most
https://ghc.haskell.org/trac/ghc/wiki/Commentary/Rts/Storage/HeapObjects documents some of the representation info, i'm not sure if we can in good consciousness expose that in an modulre that doesn't have Unsafe in the name...

i guess i'm also wondering what we'd use those for?
in some ways it sort of feels like this is a less safe version of the api that the structs lib accomplishes, though @ekmett and @glguy can probably opine on this piece perhaps

This comment has been minimized.

@dolio

dolio Aug 18, 2017

Contributor

If you look at my ArrayArray# wrapping stuff, you'll see the same thing. The primop API is that ArrayArray# stores values of type ArrayArray#, and you unsafeCoerce between that and the type you actually want. I agree it's not a great API, but it's what we have right now.

In the future, maybe there will be an array type that is polymorphic over unlifted types that could be more safe at the primop level. But right now the API I wrote is the safe wrapper around the unsafe API.

This comment has been minimized.

@winterland1989

winterland1989 Aug 19, 2017

@dolio Yes, and i extend your trick to many more unlifted boxed type wrapper, such as IORef/STRef and MVar/TVar, i think @cartazio didn't get this at first, maybe your explain will make him feel ensured.

@andrewthad

This comment has been minimized.

Collaborator

andrewthad commented Aug 15, 2017

The current definition of shrinkMutableArr for boxed arrays is not good. Even though it's documented, it's semantics are totally wrong. The type signature it has been given precludes a sensible implementation. My vote would be to just remove this method from the typeclass since resizeMutableArr can be used to accomplish the same thing.

@winterland1989

This comment has been minimized.

winterland1989 commented Aug 16, 2017

@andrewthad I'm agreed that the equal instances is surprising if someone provide a different eq instance for the element. It will not work unless it bit wise equal like all the current primitive type. I will make the change.

As for the shrinking interface, I don't it is a trouble because ghc never guarantee a shrinking on primitive array is successful, it should be used to optimize GC of an array: when you are building array with double allocating strategy, you may end with an array with unused element in the tail. You don't want next GC copy them so you want to shrinking the closure size. But if that is not feasible, we should return the closure as it is, rather than allocate new one and copy. In short words, it's different from resize, and the type of resizeMutableByteArray# reflect that fact.

I think we can open a issue to ask ghc developers to add shrink for other array types, but this interface should stay IMO.

@winterland1989

This comment has been minimized.

winterland1989 commented Aug 16, 2017

I opened a ghc ticket for asking shrink primops for other array type here: https://ghc.haskell.org/trac/ghc/ticket/14124

But again, i'm defending that current shrinkArr API is quite sensible.

@winterland1989

This comment has been minimized.

winterland1989 commented Aug 16, 2017

Also, for very different reasons, I do not like the overlapping instance for Ord. Overlapping instances mess up instance resolution sometimes. I'll try to come up with an example of this.

@andrewthad I think i need to be convinced this do have a problem before change PrimArray Word8 's Ord instance, because memcmp is much faster than manually loop throught the word8 array, and i really want this OVERLAPPING instance so that my vector package will not slow down. I will even go further and add OVERLAPPING Eq instances for all the built-in primitive type because they're safe to be compared byte-wise.

AFAICT, PrimArray Word8 is strictly more specific than PrimArray a, and according to GHC manual ghc should be able to choose the right instance.

@andrewthad

This comment has been minimized.

Collaborator

andrewthad commented Aug 17, 2017

I've PRed a change to [inline the Eq rules]. To me, it has three benefits:

  • more readable
  • bytewise equality only declared once (instead of once per type)
  • you don't have to worry about whatever it was with cpphs that concerned you
@andrewthad

This comment has been minimized.

Collaborator

andrewthad commented Aug 17, 2017

The earlier conversation is burried a little higher in this PR's history, but I want to say that I'm opposed to making Arr take levity polymorphic arrays, ie marr :: * -> TYPE r -> *. We would have to change the Prim typeclass to make this usable. It's not totally to me what the effect on end-user error messages would be.

@cartazio

added some more prodding and comments

@@ -51,6 +51,7 @@ install:
$HOME/.cabal/packages/hackage.haskell.org/00-index.tar;
fi
- travis_retry cabal update -v
- cabal install cpphs

This comment has been minimized.

@cartazio

cartazio Aug 17, 2017

Member

wait, why do we need cpphs to test this?

This comment has been minimized.

@winterland1989

winterland1989 Aug 18, 2017

It's for the old eq specialize rules, i have updated to use @andrewthad 's solution now, so this's not needed.

This comment has been minimized.

@dolio

dolio Sep 25, 2017

Contributor

If this isn't needed anymore, can we remove it? :)

Ghc-Options: -O2 -Wall
Build-tools: cpphs >= 1.19
Ghc-Options: -O2 -Wall -pgmP cpphs -optP --cpp

This comment has been minimized.

@cartazio

cartazio Aug 17, 2017

Member

why do we need cpphs?

where
go !i !siz | i >= siz = True
| otherwise = indexPrimArray paA i == indexPrimArray paB i && go (i+1) siz

This comment has been minimized.

@cartazio

cartazio Aug 17, 2017

Member

do we have any tests for making sure specialization happens if we look at the compiler verbose output?

This comment has been minimized.

@winterland1989

winterland1989 Aug 18, 2017

I have read the core before, but haven't got a test yet. Built-in primitive types should not be handled by this code, there're rules to rewrite the loop into memcmp.

isTrue# (sameMutableByteArray# (unsafeCoerce# ba1#) (unsafeCoerce# ba2#))
--------------------------------------------------------------------------------

This comment has been minimized.

@cartazio

cartazio Aug 17, 2017

Member

lest I forget, we can actually construct analogous prefetch operations for boxed arrays (lifted and unlifted flavors both), using the availaable prefetch operations

This comment has been minimized.

@winterland1989

winterland1989 Aug 18, 2017

how to do so? by unsafeCoerce# the Array# into ByteArray# ? I'm not sure if this is sensible. let's leave it to future optimizations.

This comment has been minimized.

@cartazio

cartazio Apr 3, 2018

Member

i'll revisit this after i do a proper code review

@@ -137,6 +141,22 @@ instance PrimUnlifted (MV.MutVar s a) where
toArrayArray# (MV.MutVar mv#) = unsafeCoerce# mv#
fromArrayArray# aa# = MV.MutVar (unsafeCoerce# aa#)
instance PrimUnlifted (MVar a) where
toArrayArray# (MVar mv#) = unsafeCoerce# mv#

This comment has been minimized.

@cartazio

cartazio Aug 17, 2017

Member

i'm not sure i'm comfortable with this code... this is pretty ... dangerous to use, more so than most
https://ghc.haskell.org/trac/ghc/wiki/Commentary/Rts/Storage/HeapObjects documents some of the representation info, i'm not sure if we can in good consciousness expose that in an modulre that doesn't have Unsafe in the name...

i guess i'm also wondering what we'd use those for?
in some ways it sort of feels like this is a less safe version of the api that the structs lib accomplishes, though @ekmett and @glguy can probably opine on this piece perhaps

@cartazio

This comment has been minimized.

Member

cartazio commented Aug 17, 2017

@andrewthad why cant the class support both lifted and unlifted element arrays for the boxed flavors? granted, this is supposed to be helper tools for higher level libraries :)

@andrewthad

This comment has been minimized.

Collaborator

andrewthad commented Aug 17, 2017

@cartazio Ah, now I see what you're talking about. Yeah, I think that would be an alright small place to try that out.

@cartazio

This comment has been minimized.

Member

cartazio commented Aug 17, 2017

@andrewthad indeed, and it doesnt' hurt that it would also be even more generic than just that, i think?

@winterland1989 @andrewthad i almost feel like this code review would be easier if we just did a long email thread back and forth with explicit diffs off a commit on master? idk.

@dolio

This comment has been minimized.

Contributor

dolio commented Aug 18, 2017

Speaking of levity polymorphism, I kind of wonder off and on how much of this package will be rendered obsolete by it. Wrapping all the primitive array/reference types becomes less necessary, because the boxed, unlifted kind gains most of the advantages of * that our wrappers are for. And the type safe ArrayArray# wrapper becomes unnecessary, because you can just have a type-parameterized array of unlifted things.

I guess the byte array wrapper still makes sense, and some convenience classes for working with unlifted things can live here. But it might radically alter the package (at least, after we get to a point where versions of GHC without levity polymorphism are considered unworthy of supporting).

@winterland1989

This comment has been minimized.

winterland1989 commented Aug 18, 2017

@cartazio @dolio I think the levity polymorphism version may be left to future version, since levity polymorphism is not really happening in base, we don't have to rush for that. Is that OK?

@winterland1989

This comment has been minimized.

winterland1989 commented Aug 18, 2017

@cartazio I can make a patch mail to libraries@haskell.org, but this patch is rather large and contains several commits, do i have to rebase then? I think github's code review tools is quite nice, but if you prefer email, i can generate one.

And i still can understand your concern with newly added PrimUnlifted instances, they works just like how old MutVar works, i need this because in one of my project i use an array of MVar, i want to cut the memory overhead.

In fact the absence of IORef/STRef instances is very surprising, people don't use MutVar much AFAICT.

@cartazio

This comment has been minimized.

Member

cartazio commented Aug 18, 2017

@winterland1989

This comment has been minimized.

winterland1989 commented Aug 18, 2017

I'm talking UnliftedArray (MVar a) which actually store MVar# s. the use case is here: https://github.com/winterland1989/stdio/blob/master/System/IO/UV/Manager.hs#L67

In short words: I need an array of MVar to block the thread doing read/write. and after read/write finish i unblock them by index.

@cartazio

This comment has been minimized.

Member

cartazio commented Aug 18, 2017

@andrewthad

This comment has been minimized.

Collaborator

andrewthad commented Aug 18, 2017

@cartazio the unsafe coerce is still needed in GHC 8.2. To do what you're describing, we would need a primitive like this:

writeArrayArrayUnlifted# :: forall (a :: TYPE UnliftedRep). MutableArrayArray# s -> Int# -> a -> State# s -> State# s

Such a function could replace writeArrayArray#, writeByteArrayArray#, writeMutableByteArrayArray#. However, we don't have a primitive that does this, even in 8.2.

@winterland1989

This comment has been minimized.

winterland1989 commented Aug 19, 2017

@cartazio It's irrelevant to levity polymorphism , the trick's idea is that ArrayArray# is an unlifted boxed type, which MutVar#, MVar# .. can be cast into and from without runtime danger. The unsafeCoerce# bit is doing the type cast, not kind.

@winterland1989

This comment has been minimized.

winterland1989 commented Aug 22, 2017

@cartazio I wrote a blog post about the unlifted array stuff: http://winterland.me/2017/08/18/an%20unified%20array%20interface/ . I hope that can clear your concern about my extension to PrimUnlifted.

BTW, this patch is ready, please consider merging if no other problems.

@cartazio

This comment has been minimized.

Member

cartazio commented Aug 22, 2017

@@ -18,5 +19,6 @@ void hsprimitive_memset_Float (HsFloat *, ptrdiff_t, size_t, HsFloat);
void hsprimitive_memset_Double (HsDouble *, ptrdiff_t, size_t, HsDouble);
void hsprimitive_memset_Char (HsChar *, ptrdiff_t, size_t, HsChar);
int hsprimitive_is_byte_array_pinned(void* p);

This comment has been minimized.

@bgamari

bgamari Sep 8, 2017

Contributor

The question of why we aren't simply using the primop deserves a comment.

c_is_byte_array_pinned :: ByteArray# -> CInt
foreign import ccall unsafe "hsprimitive_is_byte_array_pinned"
c_is_mutable_byte_array_pinned :: MutableByteArray# s -> CInt

This comment has been minimized.

@bgamari

bgamari Sep 8, 2017

Contributor

We should really use the primop whenever possible. Afterall, the C version may break with future GHC releases.

This comment has been minimized.

@winterland1989

winterland1989 Sep 9, 2017

Yes, you're right. I've add CPPs to use primops where possible.

@dolio

This comment has been minimized.

Contributor

dolio commented Sep 25, 2017

Sorry, I've been a bit distracted lately. I'm going to look through this a bit closer in preparation for merging (also see what conflicts need to be solved; probably nothing major, so I can maybe do it myself, without needing you to rebase).

@winterland1989

This comment has been minimized.

winterland1989 commented Sep 25, 2017

@dolio Thanks, if need anything please let me know.

-- Most of these functions simply wrap their primitive counterpart. If there are no primitive ones,
-- the method is emulated using other operations to get the same semantics.
--
class Arr (marr :: * -> * -> *) (arr :: * -> * ) a | arr -> marr, marr -> arr where

This comment has been minimized.

@treeowl

treeowl Mar 20, 2018

Member

I don't understand why everything is lumped in one class. I think it makes more sense to have three:

class MutArr (marr :: * -> * -> *) a where
  newArr :: ...
  newArrWith :: ...
  readArr :: ...
  writeArr :: ...
  setArr :: ...
  copyMutableArr :: ...
  moveArr :: ...
  cloneMutableArr :: ...
  sameMutableArr :: ...
  sizeofMutableArr :: ...
  shrinkMutableArr :: ...
  resizeMutableArr :: ...

class ImmutArr (arr :: * -> *) a where
  indexArr# :: arr a -> Int -> (# a #)
  -- Leave indexArr and indexArrM out of the class as they can easily be implemented
  -- as functions given indexArr#; going the other way is a good bit harder for humans
  -- and optimizers.

  -- cloneArr strikes me as a bit fishy. What's its story?
  cloneArr :: ...
  sameArr :: ...
  sizeofArr :: ...

class (ImmutArr arr a, MutArr marr a) => Arr marr arr a | arr -> marr, marr -> arr where
  freezeArr :: ...
  unsafeFreezeArr :: ...
  thawArr :: ...
  unsafeThawArr :: ...
  copyArr :: ...
LT ->
let !d = s2 - s1
!s2l = s2 + l
go !i | i >= s2l = return ()

This comment has been minimized.

@treeowl

treeowl Mar 20, 2018

Member

Use when or unless to avoid writing out return () branches.

@treeowl

Very much apart from everything else, this pull request is very large. Lots of new functionality. I suspect it might be better to break it up. One of the big potential advantages of adding a class for basic array access is that we might be able to share a lot of code between Array and SmallArray, so their many instances can look like

instance Functor SmallArray where
  fmap = fmapArray

where fmapArray looks something like

fmapArray :: (ImmutableArray ary a, Arr mary ary b) => (a -> b) -> ary a -> ary b
{-# RULES "eqPrimArray/Addr" eqPrimArray = memcmpPrimArray (undefined :: Addr) #-}
{-# RULES "eqPrimArray/Ptr" eqPrimArray = memcmpPrimArray (undefined :: Ptr a) #-}
{-# RULES "eqPrimArray/FunPtr" eqPrimArray = memcmpPrimArray (undefined :: FunPtr a) #-}

This comment has been minimized.

@treeowl

treeowl Mar 20, 2018

Member

Ugh... Wouldn't it be better to use a class for eqPrimArray? Rules are fragile, and require user involvement, and so on. It's not great. Separately, passing undefined is rather old-fashioned. Use a proxy type instead.

@andrewthad andrewthad referenced this pull request Mar 23, 2018

Merged

Implement PrimArray #106

@cartazio

This comment has been minimized.

Member

cartazio commented Apr 3, 2018

i need to revisit this PR and review it properly

@andrewthad

This comment has been minimized.

Collaborator

andrewthad commented Apr 4, 2018

Currently, #106 implements just the PrimArray part of this PR. The other main parts of this are:

  1. A typeclass unifying the interface to arrays
  2. A (slower) bounds-checking interface to array that can easily be switched in while debugging
  3. Function for checking if byte array is pinned
  4. Prefetching for byte arrays
  5. Additional PrimUnlifted instances

Some of these have been addressed by recent PRs. I suspect that, at this point, it may be best to break pieces off of this PR into other PRs. This is especially important for typeclass that unifies the interface to arrays, since there's some bikeshedding possible there.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment