Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add PrimArray, Arr class #64

Closed
wants to merge 26 commits into from
Closed

Conversation

@winterland1989
Copy link

@winterland1989 winterland1989 commented Aug 15, 2017

This is the PrimArray patch as @bgamari requested, This patch also offer polymorphric Arr class which unify boxed and unboxed array interface. Please review carefully because there may be bugs ; )

I haven't consider much compatibility issue when i'm doing this, so please add whatever CPP needed to make it work on older GHCs. I have added CPPs and CI is happy now.

I'm planning to make another patch which bring foldr/build fusion to these arrays, please don't cut a new release before that is done. I believe fusion belongs to high level vector package now, so this is the full patch already, please review and merge(if there's no objections) ASAP.

handong added 7 commits Aug 15, 2017
handong
handong
handong
handong
handong
@winterland1989
Copy link
Author

@winterland1989 winterland1989 commented Aug 15, 2017

After reading #63 , i think it's probable a good idea to keep pinning status in reszied array, ideas?

handong added 3 commits Aug 15, 2017
handong
handong
handong
@winterland1989
Copy link
Author

@winterland1989 winterland1989 commented Aug 15, 2017

OK, all build errors are fixed. There should be no problem with older GHC now.

handong
@andrewthad
Copy link
Contributor

@andrewthad andrewthad commented Aug 15, 2017

Small life tip: if you change your ~/.gitignore to have this section:

[user]
    email = winterland1989@gmail.com
    name = John Doe

Then on github, your commits will be attributed to your user instead of being attributed to "handong", who is unknown to github.

@andrewthad
Copy link
Contributor

@andrewthad andrewthad commented Aug 15, 2017

I've put up a PR against your branch fixing minor documentation typos: winterland1989#1

fix minor typos in documentation
@winterland1989
Copy link
Author

@winterland1989 winterland1989 commented Aug 15, 2017

Thanks!

@andrewthad
Copy link
Contributor

@andrewthad andrewthad commented Aug 15, 2017

I dislike the current Eq and Ord definitions for PrimArray. I would prefer to see Eq actually use the Eq constraint on the element type, as I do here. This way, if someone wanted to use PrimArray of numbers whose equality uses modular arithmetic, it would still work (or really any notion of equality that allows structurally different things to be equal).

Also, for very different reasons, I do not like the overlapping instance for Ord. Overlapping instances mess up instance resolution sometimes. I'll try to come up with an example of this.

I would prefer that fast versions of equality and of comparison be provided as functions but that they not be used to attempt to optimize the Eq and Ord typeclass instances. Providing eqPrimArrayBytewise and comparePrimArrayBytewise would be my preference.

Description : Bounded checked boxed and unboxed arrays
Copyright : (c) Winter, 2017
License : BSD3
Maintainer : libraries@haskell.org, drkoster@qq.com

This comment has been minimized.

@cartazio

cartazio Aug 15, 2017
Contributor

who's dr koster?

This comment has been minimized.

@winterland1989

winterland1989 Aug 16, 2017
Author

It's my email.

check :: String -> Bool -> a -> a
check _ True x = x
check errMsg False _ = throw (IndexOutOfBounds $ "Data.Array.Checked." ++ errMsg)
{-# INLINE check #-}

This comment has been minimized.

@cartazio

cartazio Aug 15, 2017
Contributor

for this module, does INLINE pragmas make sense? theres definitely no fusion going on , and to quote @thoughtpolice on the topic, if ghc isn't marking them inlinable when they're small enough, thats a ghc bug ..

This comment has been minimized.

@winterland1989

winterland1989 Aug 16, 2017
Author

I'm not sure because I really need this function inlined to reduce the overhead of bound check.

This comment has been minimized.

@cartazio

cartazio Aug 16, 2017
Contributor

oh, i meant the inlines everywhere, valid forthis one i guess ;)

-- | A typeclass to unify box & unboxed, mutable & immutable array operations.
--
-- Most of these functions simply wrap their primitive counterpart. If there are no primitive ones,
-- the method is polyfilled using other operations to get the same semantics.

This comment has been minimized.

@cartazio

cartazio Aug 15, 2017
Contributor

i dont like the word polyfilled, can we use something like "emulated"?

This comment has been minimized.

@cartazio

cartazio Aug 15, 2017
Contributor

also should this interface also support (when applicable) arrays of unlifted pointer values via ArrayArray?

This comment has been minimized.

@winterland1989

winterland1989 Aug 16, 2017
Author

This interface support ArrayArray , though the newArr for ArrayArray is more unsafe, I should add more doc about this.

This comment has been minimized.

@winterland1989

winterland1989 Aug 16, 2017
Author

Docs added.

This comment has been minimized.

@cartazio

cartazio Aug 16, 2017
Contributor

@winterland1989 my point about the term poly filled still stands right?

This comment has been minimized.

@cartazio

cartazio Aug 17, 2017
Contributor

@winterland1989 can we change polyfilled to emulated with?

-- One exception is that 'shrinkMutableArr' only performs resizing on 'PrimArray' because
-- the RTS only supports shrinking byte arrays. On other array types, 'shrinkMutableArr' will do nothing.
--
class Arr (marr :: * -> * -> *) (arr :: * -> * ) a | arr -> marr, marr -> arr where

This comment has been minimized.

@cartazio

cartazio Aug 15, 2017
Contributor

My immediate concern when seeing this interface is that this doesn't allow the same interface to work For ArrayArray too! Then we can't have a nice interface for arrays of unlifted values!

This comment has been minimized.

@winterland1989

winterland1989 Aug 16, 2017
Author

Can you give a reason why that's the case?

This comment has been minimized.

@cartazio

cartazio Aug 16, 2017
Contributor

i'm thinking we could have, at least on recent ghc, something like
marr :: * -> TYPE r -> * , and arr :: TYPE r -> *, which is like having representation genericity, but in a static / allowed way

@@ -137,6 +141,22 @@ instance PrimUnlifted (MV.MutVar s a) where
toArrayArray# (MV.MutVar mv#) = unsafeCoerce# mv#
fromArrayArray# aa# = MV.MutVar (unsafeCoerce# aa#)

instance PrimUnlifted (MVar a) where
toArrayArray# (MVar mv#) = unsafeCoerce# mv#

This comment has been minimized.

@cartazio

cartazio Aug 15, 2017
Contributor

Wait what. This doesn't seem safe .. could you explain this mapping to me?

This comment has been minimized.

@cartazio

cartazio Aug 15, 2017
Contributor

and all the associated other unsafe coerces too ..

This comment has been minimized.

@andrewthad

andrewthad Aug 15, 2017
Contributor

To my understanding, ArrayArray# can hold any values whose type is of kind TYPE UnliftedRep. This seems safe to me, but it would be nice to have that confirmed by someone else. @bgamari Can you comment on this?

This comment has been minimized.

@winterland1989

winterland1989 Aug 16, 2017
Author

Yes, I think it's safe too, and I'm using this instance in my libuv binding without problems. The newly added instances are really not differ from MutVar , they are unlifted boxed pointers just like ArrayArray.

This comment has been minimized.

@winterland1989

winterland1989 Aug 17, 2017
Author

@bgamari , though i'm pretty sure this works, I think we need your confirm here too.

This comment has been minimized.

@cartazio

cartazio Aug 17, 2017
Contributor

i'm not sure i'm comfortable with this code... this is pretty ... dangerous to use, more so than most
https://ghc.haskell.org/trac/ghc/wiki/Commentary/Rts/Storage/HeapObjects documents some of the representation info, i'm not sure if we can in good consciousness expose that in an modulre that doesn't have Unsafe in the name...

i guess i'm also wondering what we'd use those for?
in some ways it sort of feels like this is a less safe version of the api that the structs lib accomplishes, though @ekmett and @glguy can probably opine on this piece perhaps

This comment has been minimized.

@dolio

dolio Aug 18, 2017
Contributor

If you look at my ArrayArray# wrapping stuff, you'll see the same thing. The primop API is that ArrayArray# stores values of type ArrayArray#, and you unsafeCoerce between that and the type you actually want. I agree it's not a great API, but it's what we have right now.

In the future, maybe there will be an array type that is polymorphic over unlifted types that could be more safe at the primop level. But right now the API I wrote is the safe wrapper around the unsafe API.

This comment has been minimized.

@winterland1989

winterland1989 Aug 19, 2017
Author

@dolio Yes, and i extend your trick to many more unlifted boxed type wrapper, such as IORef/STRef and MVar/TVar, i think @cartazio didn't get this at first, maybe your explain will make him feel ensured.

@andrewthad
Copy link
Contributor

@andrewthad andrewthad commented Aug 15, 2017

The current definition of shrinkMutableArr for boxed arrays is not good. Even though it's documented, it's semantics are totally wrong. The type signature it has been given precludes a sensible implementation. My vote would be to just remove this method from the typeclass since resizeMutableArr can be used to accomplish the same thing.

@winterland1989
Copy link
Author

@winterland1989 winterland1989 commented Aug 16, 2017

@andrewthad I'm agreed that the equal instances is surprising if someone provide a different eq instance for the element. It will not work unless it bit wise equal like all the current primitive type. I will make the change.

As for the shrinking interface, I don't it is a trouble because ghc never guarantee a shrinking on primitive array is successful, it should be used to optimize GC of an array: when you are building array with double allocating strategy, you may end with an array with unused element in the tail. You don't want next GC copy them so you want to shrinking the closure size. But if that is not feasible, we should return the closure as it is, rather than allocate new one and copy. In short words, it's different from resize, and the type of resizeMutableByteArray# reflect that fact.

I think we can open a issue to ask ghc developers to add shrink for other array types, but this interface should stay IMO.

@winterland1989
Copy link
Author

@winterland1989 winterland1989 commented Aug 16, 2017

I opened a ghc ticket for asking shrink primops for other array type here: https://ghc.haskell.org/trac/ghc/ticket/14124

But again, i'm defending that current shrinkArr API is quite sensible.

@winterland1989
Copy link
Author

@winterland1989 winterland1989 commented Aug 16, 2017

Also, for very different reasons, I do not like the overlapping instance for Ord. Overlapping instances mess up instance resolution sometimes. I'll try to come up with an example of this.

@andrewthad I think i need to be convinced this do have a problem before change PrimArray Word8 's Ord instance, because memcmp is much faster than manually loop throught the word8 array, and i really want this OVERLAPPING instance so that my vector package will not slow down. I will even go further and add OVERLAPPING Eq instances for all the built-in primitive type because they're safe to be compared byte-wise.

AFAICT, PrimArray Word8 is strictly more specific than PrimArray a, and according to GHC manual ghc should be able to choose the right instance.

@cartazio
Copy link
Contributor

@cartazio cartazio commented Aug 17, 2017

@andrewthad why cant the class support both lifted and unlifted element arrays for the boxed flavors? granted, this is supposed to be helper tools for higher level libraries :)

@andrewthad
Copy link
Contributor

@andrewthad andrewthad commented Aug 17, 2017

@cartazio Ah, now I see what you're talking about. Yeah, I think that would be an alright small place to try that out.

@cartazio
Copy link
Contributor

@cartazio cartazio commented Aug 17, 2017

@andrewthad indeed, and it doesnt' hurt that it would also be even more generic than just that, i think?

@winterland1989 @andrewthad i almost feel like this code review would be easier if we just did a long email thread back and forth with explicit diffs off a commit on master? idk.

@dolio
Copy link
Contributor

@dolio dolio commented Aug 18, 2017

Speaking of levity polymorphism, I kind of wonder off and on how much of this package will be rendered obsolete by it. Wrapping all the primitive array/reference types becomes less necessary, because the boxed, unlifted kind gains most of the advantages of * that our wrappers are for. And the type safe ArrayArray# wrapper becomes unnecessary, because you can just have a type-parameterized array of unlifted things.

I guess the byte array wrapper still makes sense, and some convenience classes for working with unlifted things can live here. But it might radically alter the package (at least, after we get to a point where versions of GHC without levity polymorphism are considered unworthy of supporting).

winterland1989 and others added 3 commits Aug 18, 2017
inline Eq rules instead of using CPP
handong
handong
@winterland1989
Copy link
Author

@winterland1989 winterland1989 commented Aug 18, 2017

@cartazio @dolio I think the levity polymorphism version may be left to future version, since levity polymorphism is not really happening in base, we don't have to rush for that. Is that OK?

@winterland1989
Copy link
Author

@winterland1989 winterland1989 commented Aug 18, 2017

@cartazio I can make a patch mail to libraries@haskell.org, but this patch is rather large and contains several commits, do i have to rebase then? I think github's code review tools is quite nice, but if you prefer email, i can generate one.

And i still can understand your concern with newly added PrimUnlifted instances, they works just like how old MutVar works, i need this because in one of my project i use an array of MVar, i want to cut the memory overhead.

In fact the absence of IORef/STRef instances is very surprising, people don't use MutVar much AFAICT.

@cartazio
Copy link
Contributor

@cartazio cartazio commented Aug 18, 2017

@winterland1989
Copy link
Author

@winterland1989 winterland1989 commented Aug 18, 2017

I'm talking UnliftedArray (MVar a) which actually store MVar# s. the use case is here: https://github.com/winterland1989/stdio/blob/master/System/IO/UV/Manager.hs#L67

In short words: I need an array of MVar to block the thread doing read/write. and after read/write finish i unblock them by index.

@cartazio
Copy link
Contributor

@cartazio cartazio commented Aug 18, 2017

@andrewthad
Copy link
Contributor

@andrewthad andrewthad commented Aug 18, 2017

@cartazio the unsafe coerce is still needed in GHC 8.2. To do what you're describing, we would need a primitive like this:

writeArrayArrayUnlifted# :: forall (a :: TYPE UnliftedRep). MutableArrayArray# s -> Int# -> a -> State# s -> State# s

Such a function could replace writeArrayArray#, writeByteArrayArray#, writeMutableByteArrayArray#. However, we don't have a primitive that does this, even in 8.2.

@winterland1989
Copy link
Author

@winterland1989 winterland1989 commented Aug 19, 2017

@cartazio It's irrelevant to levity polymorphism , the trick's idea is that ArrayArray# is an unlifted boxed type, which MutVar#, MVar# .. can be cast into and from without runtime danger. The unsafeCoerce# bit is doing the type cast, not kind.

@winterland1989
Copy link
Author

@winterland1989 winterland1989 commented Aug 22, 2017

@cartazio I wrote a blog post about the unlifted array stuff: http://winterland.me/2017/08/18/an%20unified%20array%20interface/ . I hope that can clear your concern about my extension to PrimUnlifted.

BTW, this patch is ready, please consider merging if no other problems.

@cartazio
Copy link
Contributor

@cartazio cartazio commented Aug 22, 2017

@@ -18,5 +19,6 @@ void hsprimitive_memset_Float (HsFloat *, ptrdiff_t, size_t, HsFloat);
void hsprimitive_memset_Double (HsDouble *, ptrdiff_t, size_t, HsDouble);
void hsprimitive_memset_Char (HsChar *, ptrdiff_t, size_t, HsChar);

int hsprimitive_is_byte_array_pinned(void* p);

This comment has been minimized.

@bgamari

bgamari Sep 8, 2017
Contributor

The question of why we aren't simply using the primop deserves a comment.

c_is_byte_array_pinned :: ByteArray# -> CInt

foreign import ccall unsafe "hsprimitive_is_byte_array_pinned"
c_is_mutable_byte_array_pinned :: MutableByteArray# s -> CInt

This comment has been minimized.

@bgamari

bgamari Sep 8, 2017
Contributor

We should really use the primop whenever possible. Afterall, the C version may break with future GHC releases.

This comment has been minimized.

@winterland1989

winterland1989 Sep 9, 2017
Author

Yes, you're right. I've add CPPs to use primops where possible.

@dolio
Copy link
Contributor

@dolio dolio commented Sep 25, 2017

Sorry, I've been a bit distracted lately. I'm going to look through this a bit closer in preparation for merging (also see what conflicts need to be solved; probably nothing major, so I can maybe do it myself, without needing you to rebase).

@winterland1989
Copy link
Author

@winterland1989 winterland1989 commented Sep 25, 2017

@dolio Thanks, if need anything please let me know.

-- Most of these functions simply wrap their primitive counterpart. If there are no primitive ones,
-- the method is emulated using other operations to get the same semantics.
--
class Arr (marr :: * -> * -> *) (arr :: * -> * ) a | arr -> marr, marr -> arr where

This comment has been minimized.

@treeowl

treeowl Mar 20, 2018
Contributor

I don't understand why everything is lumped in one class. I think it makes more sense to have three:

class MutArr (marr :: * -> * -> *) a where
  newArr :: ...
  newArrWith :: ...
  readArr :: ...
  writeArr :: ...
  setArr :: ...
  copyMutableArr :: ...
  moveArr :: ...
  cloneMutableArr :: ...
  sameMutableArr :: ...
  sizeofMutableArr :: ...
  shrinkMutableArr :: ...
  resizeMutableArr :: ...

class ImmutArr (arr :: * -> *) a where
  indexArr# :: arr a -> Int -> (# a #)
  -- Leave indexArr and indexArrM out of the class as they can easily be implemented
  -- as functions given indexArr#; going the other way is a good bit harder for humans
  -- and optimizers.

  -- cloneArr strikes me as a bit fishy. What's its story?
  cloneArr :: ...
  sameArr :: ...
  sizeofArr :: ...

class (ImmutArr arr a, MutArr marr a) => Arr marr arr a | arr -> marr, marr -> arr where
  freezeArr :: ...
  unsafeFreezeArr :: ...
  thawArr :: ...
  unsafeThawArr :: ...
  copyArr :: ...
LT ->
let !d = s2 - s1
!s2l = s2 + l
go !i | i >= s2l = return ()

This comment has been minimized.

@treeowl

treeowl Mar 20, 2018
Contributor

Use when or unless to avoid writing out return () branches.

Copy link
Contributor

@treeowl treeowl left a comment

Very much apart from everything else, this pull request is very large. Lots of new functionality. I suspect it might be better to break it up. One of the big potential advantages of adding a class for basic array access is that we might be able to share a lot of code between Array and SmallArray, so their many instances can look like

instance Functor SmallArray where
  fmap = fmapArray

where fmapArray looks something like

fmapArray :: (ImmutableArray ary a, Arr mary ary b) => (a -> b) -> ary a -> ary b

{-# RULES "eqPrimArray/Addr" eqPrimArray = memcmpPrimArray (undefined :: Addr) #-}
{-# RULES "eqPrimArray/Ptr" eqPrimArray = memcmpPrimArray (undefined :: Ptr a) #-}
{-# RULES "eqPrimArray/FunPtr" eqPrimArray = memcmpPrimArray (undefined :: FunPtr a) #-}

This comment has been minimized.

@treeowl

treeowl Mar 20, 2018
Contributor

Ugh... Wouldn't it be better to use a class for eqPrimArray? Rules are fragile, and require user involvement, and so on. It's not great. Separately, passing undefined is rather old-fashioned. Use a proxy type instead.

@andrewthad andrewthad mentioned this pull request Mar 23, 2018
@cartazio
Copy link
Contributor

@cartazio cartazio commented Apr 3, 2018

i need to revisit this PR and review it properly

@andrewthad
Copy link
Contributor

@andrewthad andrewthad commented Apr 4, 2018

Currently, #106 implements just the PrimArray part of this PR. The other main parts of this are:

  1. A typeclass unifying the interface to arrays
  2. A (slower) bounds-checking interface to array that can easily be switched in while debugging
  3. Function for checking if byte array is pinned
  4. Prefetching for byte arrays
  5. Additional PrimUnlifted instances

Some of these have been addressed by recent PRs. I suspect that, at this point, it may be best to break pieces off of this PR into other PRs. This is especially important for typeclass that unifies the interface to arrays, since there's some bikeshedding possible there.

@chessai
Copy link
Member

@chessai chessai commented Feb 29, 2020

After reading this over, I much prefer the API defined in contiguous: https://hackage.haskell.org/package/contiguous

Maybe others can chime in here.

@chessai
Copy link
Member

@chessai chessai commented Feb 29, 2020

Also, the PrimArray changes have already happened. I'm referring specifically to the Arr typeclass.

@andrewthad
Copy link
Contributor

@andrewthad andrewthad commented Jun 4, 2020

The PrimArray type has been added to primitive, and this PR has bitrotted in the last three years. I've going to close this, but if anyone wants to discuss adding an Arr typeclass to primitive, open an new issue to discuss this.

@andrewthad andrewthad closed this Jun 4, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Linked issues

Successfully merging this pull request may close these issues.

None yet

7 participants
You can’t perform that action at this time.