Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add strict vector #488

Merged
merged 20 commits into from
Apr 14, 2024
Merged

Add strict vector #488

merged 20 commits into from
Apr 14, 2024

Conversation

Shimuuar
Copy link
Contributor

Fixes #483 and #380

Implementation follows discussion in issues above and virtually identical to one in strict-containers. Only difference fromArray and unsafeFromArraySlice now force all elements of returned vector to WHNF. It changes complexity to O(n) so to compensate lazyFromArray and unsafeLazyFromArraySlice are added, they retain O(1) complexity but user has to deal with strictness.

This is draft since there's still some work to do with documentation.

  1. We need to decide to prefix to use in examples. V. is used by lazy vector, VS by storable. One option
  2. I think it would be better to switch examples in generic vector to strict vector. It's better default data structure
  3. It's worth making a new release soon after this PR is merged, so I think I should just slap @since 0.13.2.0 on each function in Strict modules

Also

  • Missing LICENSE file for vector-bench-papi
  • Small cleanup in test suite

This is essentially a copy of lazy boxed vector with
 - MVector.basicUnsafeWrite
 - MVector.basicUnsafeReplicate
 - Vector.elemseq
are made strict

Tests are copy of lazy vector's tests as well
fromArray reduces all elements of array to WHNF but that changes complexity from
O(1) to O(n). To compensate we provide lazy variant which pass all
responsibility to user
Strict vectors are generally better default
@Shimuuar Shimuuar marked this pull request as ready for review March 22, 2024 18:34
@Shimuuar
Copy link
Contributor Author

At this point PR is pretty much ready

@Bodigrim
Copy link
Contributor

@infinity0 could you possibly review?

@Shimuuar
Copy link
Contributor Author

Basically it's same as version from strict-containers. Only differences are strict fromArray & fromMutableArray, added lazyFromArray and haddocks.

I also switched examples in D.V.Generic to strict vectors but that's purely documentation change

Copy link
Contributor

@Bodigrim Bodigrim left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Awesome!

vector/src/Data/Vector/Strict/Mutable.hs Outdated Show resolved Hide resolved


-- | Boxed vectors, supporting efficient slicing.
data Vector a = Vector {-# UNPACK #-} !Int
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could it be a newtype over lazy boxed vectors? I imagine it would save us some instance definitions.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That should be possible. Probably most importantly for cheap conversion between strict and lazy vectors.

But we should be careful with instances. Anything that produces vectors should be implemented in terms of operations over strict vectors.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's worth to employ newtypes if possible, to demonstrate clearly that it is the same data structure just with different instances.

type role MVector nominal representational

-- | Mutable boxed vectors keyed on the monad they live in ('IO' or @'ST' s@).
data MVector s a = MVector { _offset :: {-# UNPACK #-} !Int
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same question: could this be a newtype over lazy boxed MVector?

Shimuuar and others added 7 commits March 24, 2024 00:31
Co-authored-by: ˌbodʲɪˈɡrʲim <andrew.lelechenko@gmail.com>
This is done over @Bodigrim's suggestion. This allowed to drop quite bit of code
However we can't derive Eq & Ord since DerivingStrategies are
only available since 8.2
It seems cabal decided on plan with deepseq<1.4.3
@Shimuuar
Copy link
Contributor Author

@Bodigrim switching to a newtype worked out quite well and allowed to drop a lot of duplicated code. It doesn't work all that well for instances since DerivingStrategies are only available since 8.2. So it's not possible to instruct GHC to derive say Ord.

I also added function for conversion between strict and lazy vectors.

-- of holding any Haskell value). It is possible to create vector
-- which contain bottom elements, either by using mutable interfaces
-- (see "Data.Vector.Strict.Mutable") or functions that don't preserve
-- strictness ('lazyFromArray').

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the use case for this? Just seems like a potential source of bugs and/or making existing bugs harder to find. If somebody sees this strict type they are going to assume it doesn't contain bottom elements, but this design seems to suggest otherwise.

IMO a type that is advertised as "strict" should have no possible external (user) code path that can lead it to containing non-WHNF elements, even if it's internally implemented using lazy structures.

Copy link

@infinity0 infinity0 Mar 26, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

they retain O(1) complexity but user has to deal with strictness.

I don't think this is a good use-case, if the user selects a strict data type they should be aware that there will be potential performance penalties for certain operations, we shouldn't expose a non-strict operation for them to "regain performance", they should instead just pick the lazy type.

edit: If a strict struct has to be passed around a bunch of different libraries, any one of them could make a mistake making this trade-off decision, and it will be a massive pain to track down the culprit.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On one hand it's one of many unsafe functions where all responsibility on maintaining invariants is passed to the caller. So it seems that it would be in a good company.

However here invariant is "all elements are in WHNF" And it's known to be rather tricky to maintain. And even worse it's difficult to check and invariant violations could be invisible when all test look fine and then it fail on prod due to space leak.

So I think it would be better to drop these functions and if someone really need these it could be discussed.

These functions require user to maintain invariant that all element of vector is
in WHNF. And it's well known that this insvariant is difficult to enforce and
its violations are usually invisible.

Such spec leaks may be invisibe during testing only to cause pproblem in
production. So it's better to remove them unless good use case for them appears
@Shimuuar
Copy link
Contributor Author

I removed function that don't enforce strictness: lazyFromLazy, etc.

@@ -23,11 +23,9 @@
-- Immutable strict boxed vectors (that is, polymorphic arrays capable
-- of holding any Haskell value). It is possible to create vector
-- which contain bottom elements, either by using mutable interfaces

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can drop "either" here now, also might be good to clarify the vectors that might contain bottom elements are the original data type, not this strict type.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I reworded module comment in order to make it more clear

@Shimuuar
Copy link
Contributor Author

Shimuuar commented Apr 8, 2024

I'm going to merge this PR on next weekend if there are no objections

Copy link
Contributor

@lehins lehins left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I skimmed through the PR. Looks like it is just more of the same as regular lazy vector. I looked at all the entry points where lazy values could enter into the strict vector and it looks like they are all covered.

So, no objections from me.

@Shimuuar Shimuuar merged commit 219b33f into haskell:master Apr 14, 2024
22 checks passed
@Shimuuar Shimuuar deleted the strict-vector branch April 14, 2024 16:21
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add strict boxed vectors
4 participants