Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Migrate unsafe functions into their own Unsafe modules #361

Closed
lehins opened this issue Jan 18, 2021 · 13 comments
Closed

Migrate unsafe functions into their own Unsafe modules #361

lehins opened this issue Jan 18, 2021 · 13 comments
Labels

Comments

@lehins
Copy link
Contributor

lehins commented Jan 18, 2021

This is a proposal and hasn't been set in stone yet. Please feel free to comment your opinion on the matter in this ticket.

Prelude: It has been requested many times that we provide access to all constructors from vector package. Giving access to those constructors allows violation of many invariants and hence is unsafe. Quote from #357:

Decide what is best way to expose constructors for vectors. It have been requested numerous times: #245, #343, #171, #63 (tangentially related), #49. It's of course totally unsafe. Still it's very useful to have on occasion

The best approach to tackle this is to create special Unsafe modules and export such constructors form there. This also spawned a discussion that it would make sense to migrate all of the of the unsafe functions to such modules (eg. unsafeIndex, unsafeRead, etc.), in order to provide a clear boundary between safe and dangerous functionality, which can have different qualified imports in the user land.

Doing such migration of unsafe functionality is worthwhile in a long run, but immediate problem is that it will cause a lot of breakage. Therefore it would have to be done in a few stages across couple of major releases.

  1. Next minor release vector-0.12.2.0 would be adding these modules with all of the unsafe functions exported from there:
module Data.Vector.Unsafe
module Data.Vector.Generic.Unsafe
module Data.Vector.Primitive.Unsafe
module Data.Vector.Storable.Unsafe
module Data.Vector.Unboxed.Unsafe
  1. Next major release vector-0.13, would continue to export all unsafe function from the Unsafe modules and their current locations, but latter ones would get DEPRECATED pragma added to them. This way such change will be visible to all the users
  2. The following major release vector-0.14 would have those unsafe functions removed from their current locations, but only if such release happens at least a year after vector-0.13 that deprecated unsafe functions.

Other relevant points to discuss:

  1. Do we stretch out the deprecation timeline over a longer period of time (eg. announce in vector-0.13, deprecated in vector-0.14 and remove in vector-0.15) It will take a couple of years.
  2. Do we also provide Unsafe modules for Bundle?
  3. Do we have separate Unsafe modules for Mutable vs immutable. My opinion is that such separation is not needed, since there aren't any name clashes of top of my head. And if there are any they can be mitigated by renaming, since this change is additive.
  4. Should we bother with monomorphic unsafe exports or instead only export Generic unsafe functions?
  5. Is the whole thing really worth it?
@lehins lehins mentioned this issue Jan 18, 2021
6 tasks
@Shimuuar
Copy link
Contributor

  1. Deprecation will likely involve CLC and libraries mailing list. And it'll be discussed there to death. I'd prefer longer timeline. Things seems to work out for now so there's no urgency. 0.13 will be first major release in 4 years. Even deprecation warnings on code which worked perfectly for years will raise complaints. It's probably better to let people prepare

  2. Is it used anywhere except in vector? To me it looks like as de facto internal API of vector

  3. If names clashes aren't a problem then it's better to go with single module. 15 modules is a lot, 20 — way too many

  4. On second thought. We'll need to have modules for each vector type in order to export constructor.

  5. If we decide to export constructors we need modules to export them from. Unsafe modules follws from that naturally If we don't we'll need to decide how to expose vector's representation

@Bodigrim
Copy link
Contributor

My experience with bytestring is that even much more modest breaking changes encounter a huge resistance of the ecosystem. I'm certainly in favor of a longer timeline.

Changes of such magnitude deserve a wider community discussion and CLC involvement. I suggest sending to libraries mail list an RFC with a disclaimer "Discussion is taking place in vector issue tracker, replies to this email are not necessarily monitored by maintaners".

@lehins
Copy link
Contributor Author

lehins commented Jan 18, 2021

I also try to avoid breaking changes as much as I can, but this ones in my opinion are justified. I totally support as wide discussion on this issue as possible, so naturally CLC and libraries list are a must. I personally prefer to stay away from writing emails whenever I can, so maybe one of you @Shimuuar or @Bodigrim could kick off this discussion on the appropriate channels? it would be awesome if you could 😉

Ok, so a proposed high level timeline:

  1. vector-0.13.x addition of new location with duplicate re-exports and notes in haddock about future plans. In other words, no breakage.
  2. couple years later new version with addition of deprecation pragma
  3. after another few years removal of deprecated functions

The most important question right now we need to answer is: Will this restructure happen or not? If the answer is no, then we can close the ticket and get on with our lives. On the other hand if the answer is yes, then we can slowly work out the gory details over time, because the first stage is additive and non-breaking so there is really no rush on this. However, there is nothing wrong with discussing those gory details now. Who knows, someone might come up with real blockers.

@Shimuuar
Copy link
Contributor

I compiled list of unsafe API for mutable vector so we can understand what we're getting into. And we have quite a bit of unsafe APIs. I didn't include Vector/MVector type classes since we have to export them anyway.

Pure vectors

  1. Indexing: unsafeIndex. unsafeHead, unsafeLast, unsafeIndexM. unsafeHeadM, unsafeLastM,
  2. Slices: unsafeSlice, unsafeTail, unsafeTake, unsafeDrop
  3. Bulk update unsafeUpd, unsafeUpdate, unsafeUpdate_, unsafeBackpermute
  4. Freezing/thawing unsafeFreeze, unsafeThaw, unsafeCopy
  5. Storable vectors have their own unsafe functions: unsafeFromForeignPtr, unsafeFromForeignPtr0, unsafeToForeignPtr, unsafeToForeignPtr0, unsafeWith

Mutable vectors

Unsafe = "not memory safe"

  1. Slices: unsafeSlice, unsafeInit, unsafeTail, unsafeTake, unsafeDrop
  2. Indexing: unsafeRead, unsafeWrite, unsafeModify, unsafeSwap, unsafeExchange`
  3. Copying: unsafeCopy, unsafeMove
  4. Most likely anything that writes to mutable vector from stream

Unsafe = "doesn't zero-out memory"

  1. unsafeNew, unsafeGrow, unsafeGrowFront

Constructors

  • Boxed vectors. There're functions to convert to/from underlying Array. It should cover most of use cases but there could be some cases when direct access to underlying buffer is needed
  • Storable vectors: unsafeFromForeignPtr/unsafeToForeignPtr gives isomorphism to the constuctor
  • Primitive vectors: constructor is exported
  • Unboxed vectors: all constructors which are exported from Data.Vector.Unboxed.Base which is not documented in haddocks. (I didn't even knew that constructors are accessible before!) But notable we don't have safe API for working with unboxed representation. There's no good way to write unzipFoo :: Vector (Foo a) → (Vector Int, Vector a) for data Foo a = Foo Int a

Discussion

We have a lot of unsafe APIs. I think it makes perfect sense to move most of them into Unsafe modules. We'll still need to export Vector/MVector type classes and functions for working with pointers for storable vectors.

@Bodigrim
Copy link
Contributor

The most important question right now we need to answer is: Will this restructure happen or not?

I do not feel motivated enough. Not that I doubt technical merits of separating safe from unsafe, but given my experience with bytestring-0.11 and our shared experience with random-1.2, even minor breaking changes are extremely difficult to push forward. Spending years chasing outdated packages is not what I wish to spend my life on.

I'm fine with either exposing constructors from existing modules, or not exposing them at all (I'm happy to unsafeCoerce through). This is magnitudes less work than the proposed restructuring.

@lehins
Copy link
Contributor Author

lehins commented Jan 31, 2021

@Shimuuar thank you for compiling this list, it's very helpful.

@Bodigrim breakage is ok, but only if it is worth it, if in the end we will have a better and safer API.

Let me give it a try and give you some motivation. Recently I was digging through older versions of vector and noticed that vector-0.9.x had .Safe suffix modules that were the inverted versions of what we are trying to do here, which I assumed where motivated by Safe Haskell that didn't really take off. Too bad we no longer have the old issue tracker that was prior to github, but I managed to find the related discussion on libraries mailing list. And guess what, almost a decade ago the exact same thing was proposed by Johan Tibel Responses to that proposal were quite positive. So, safe modules where removed, but the only reason Unsafe modules weren't added because Roman didn't want to do the work manually and instead wanted a generative solution. I have to say, I don't like when code is being generated for me, because it is much harder to reason about it and even harder for new contributors to understand it. I hate seeing all this CPP in vector and especially primitive packages. I really wish Roman did go through with that suggestion, this way we would not be discussing it now.

Considering that this is a recurring proposal, I do think it is worth doing it. Also, it is the right thing to do. @Bodigrim if you personally don't feel motivated to do the work, that's totally fine, no one will be asking you to do anything about it. But if you believe that the proposal is actually the wrong thing to do, then can you please elaborate on why.

Core packages like bytestring, random, vector, text, etc. serve as examples of how APIs should be designed in Haskell. It only makes sense that they actually live up to their name of core packages an provide a worthy example. @Bodigrim I realize that you got some push back on changes you made in new version of bytestring, but I want to really thank you for all that work, because in the end we got a better library. There will always be push back, because there are people who hate changes. But without changes it is impossible tot improve.

PS. Honestly, I am more scared about breaking changes that adjust semantics of existing functions (like the issue we discussing about maximumBy #362) vs changes that result in a compile error.

@gksato
Copy link
Contributor

gksato commented Feb 4, 2021

As a random user, I'm completely happy about having Safe modules, whether it is the current modules or it gets the name .Safe. I use Haskell for Codeforces, a competitive programming website, and they configure Haskell Safe and won't fix that.

@Bodigrim
Copy link
Contributor

Bodigrim commented Feb 8, 2021

@gksato I don't think we'll be able to mark vector modules as {-# LANGUAGE Safe #-}, because primitive package is not Safe Haskell.

@lehins I agree that this restructuring has technical merits. But it is not clear to me that it is worth breaking every other package: things like unsafeIndex are extremely widespread. This is a question for a wider community and CLC. (Asking for feedback is an exausting business, I know). If they approve the proposed changes, I will not oppose their implemention - and actually will try to help with :)

@gksato
Copy link
Contributor

gksato commented Feb 9, 2021

@Bodigrim I know. Sorry, I just want them to be TrustWorthy. I've been using Rust lately and I was confused about the terminology. By the way, if my memory is correct, the package base had a TrustWorthy module Control.Monad.ST.Safe a little ago. Did it have a wrong name? Now I get it; I should've said safe, not Safe. This also contributed to my confusion.

@lehins
Copy link
Contributor Author

lehins commented Feb 9, 2021

@Bodigrim, as @gksato pointed out all of the safe modules would have to be marked Trustworthy, instead of Safe, but that change would be very far away, because it does require removal of all unsafe functions from the safe interface.

things like unsafeIndex are extremely widespread.

The initial change would not introduce any breakage at all, so that is not that scary, but further down the road it would require a little bit of work in order to fix compatibility, but the change is minimal: namely adding a new import whenever unsafe functions are being used.

This is a question for a wider community and CLC

No doubt about it. But before we even raise this question to the broader audience I want to make sure all maintainers are in favor if this change.

it is not clear to me that it is worth breaking every other package

I do not want to waste my time on trying to convince others if I got no support from co-maintainers, if we are not all on board with this then we can just close this ticket and forget about it.

@Bodigrim
Copy link
Contributor

To be honest, I deem that Safe Haskell failed to deliver any tangible benefit. The amount of packages, which can be marked as Safe, is extremely limited, and Trustworthy is nothing else than unsafeCoerce Safe and is appropriate only for code automatically generated from a formally proved source (e. g., Agda).

No sizeable real-world application can afford limiting its dependencies to Safe only. Thus from my perspective exporting unsafeIndex from D.V.Unsafe does not buy us much: such import cannot be prohibited in an automatic and verifiable way (unless client code is marked as Safe, which is exceedingly rare), and does not make things easier for a human reviewer: unsafeIndex already bears in its name that it is unsafe, not much point to exaggerate this again in a module name.

I think there are two approaches, which have better value/breakage ratio:

  • We can copy a safe interface into a separate package vector-safe. This way clients can declare that they depend on this package only, and GHC would prohibit them from using unsafeIndex, because it is simply not there. This is a non-breaking change.

  • We can rename all unsafe functions and constructors to reflect this fact. E. g., D.V.P.Vector = UnsafeVector !Int !Int !ByteArray. This way it is easy for humans to track that an unsafe stuff is going on. This is a breaking change, but of a much smaller scale than proposed above.

@Shimuuar
Copy link
Contributor

Safe I think is lost cause. It never saw significant adoption and I'm not sure that API that is not unsafe is really safe. I mean that there're no bugs that compromise memory safety etc. Most important thing I believe is to make sure that it's not possible to use unsafe API by accident.

When it comes to constructors unboxed vectors are worst offenders. There're 2 constructors for each type. Curiously mutable module exports them and immutable doesn't (likely by accident). We also could do deprecation for constructors using pattern synonyms: rename constructor and add depreated pattern synonym with old name.

Creating separate vector-safe seems like a lot of work with little benefit. We could add *.Safe modules in vector as well.

@lehins
Copy link
Contributor Author

lehins commented Mar 1, 2021

Looking at the pushback at the maintainer level I am inclined to stop pursuing this split and leave things as they are: safe+unsafe living together in harmony.

It seems thought that we all at least agree that:

  • Safe Haskell is not important enough to even worry about.
  • We do need to export constructors, because there is demand for it.

Considering my suggestion didn't work, I let you guys decide what you wanna do about exporting unsafe constructors.

Here is my response to your comments:

We could add *.Safe modules in vector as well

Let's not go that route. It has already been tried and people were against it (see pre vector-0.9.1 versions)

Creating separate vector-safe seems like a lot of work with little benefit

Totally agree, it is not worth it. Moreover, I think safe API should be the default and unsafe to be opt-in, not the other way around. But it doesn't look like others share my concerns, so I am not gonna try to promote this ideology any further.

No sizeable real-world application can afford limiting its dependencies to Safe only

Completely disagree on this one. There are plenty of sizable projects that I am personally acquainted with that only use a small subset of safe function from vector package. Also, what is more important? Couple sizeable projects or thousands small ones?

Closing this ticket as a "won't fix". I am sure someone else in the future will revive this conversation once more, maybe then the outcome will be different ;)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

4 participants