Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Int vs Natural, overflow, and type safety #121

Open
mniip opened this issue May 18, 2023 · 3 comments
Open

Int vs Natural, overflow, and type safety #121

mniip opened this issue May 18, 2023 · 3 comments

Comments

@mniip
Copy link

mniip commented May 18, 2023

This package uses KnownNat whenever it is necessary to pass length information via types. A KnownNat is backed by a Natural, an algebraic datatype with two constructors, which in general cannot be unboxed (unless the simplifier is certain which constructor it is, which is rare). On the other hand Vector stores its length in an Int, and so at all interfaces we have to convert between Natural and Int. Both passing Natural to functions, and having them convert to/from Int incurs a performance penalty.

Worse yet, the conversions are lossy. fromIntegral :: Natural -> Int will silently truncate the higher bits, leaving you with a sized vector whose underlying unsized vector doesn't match its type:

> Data.Vector.Sized.replicate @0x10000000000000000 ()
Vector []

Needless to say this breaks invariants of many functions. Worse yet, the package exports a way to access the reverse conversion from Int to Natural to KnownNat:

> Data.Vector.Sized.knownLength' (Data.Vector.Sized.replicate @0x10000000000000000 ()) GHC.TypeNats.natVal
0

knownLength' uses unsafeCoerce under the hood, which in this case is actually unsafe. It generates an instance dictionary for KnownNat 0x10000000000000000, but populates it with the natural 0 instead. This incoherent dictionary breaks type safety, as can be observed via Typeable or simply sameNat:

> Just r = Data.Vector.Sized.knownLength' (Data.Vector.Sized.replicate @0x10000000000000000 ()) (GHC.TypeNats.sameNat @0 Proxy)
> :t r
r :: 0 :~: 18446744073709551616
> r
Refl

With a little type family you can construct an unsafeCoerce.

As a quick and dirty workaround, consider using fromEnum :: Natural -> Int which will at least error if the argument is out of bounds (this behavior doesn't seem consistent with e.g. fromEnum @Integer). You will have to migrate from the legacy GHC.TypeLits api that uses Integer (btw incurring additional Natural<->Integer conversions), to the new GHC.TypeNats (GHC 8.2+) that uses Natural.

As a more long term solution, over at https://github.com/mniip/finite-typelits/tree/integral I've been experimenting with an extension to Finite that would allow the underlying type to be a different type other than Integer, possibly Int. I've quickly run into essentially all the same issues, and the way the API is currently shaping to be is that an instance dictionary for a "known Int n" can only be solved if n <= maxBound @Int. This would ensure invariants of things like Vector or an Int-based Finite are satisfied, and reifying an Int back into a dictionary remains type-safe.

A challenge there is that much like having to prove KnownNat (n + m) one would have to prove n + m <= maxBound, except this time the constraint might not actually hold. However with the way Vector is using KnownNat this seems like a non-issue: if all places where the vector's underlying length is taken from KnownNat are annotated with the inequality, it should be impossible to construct a vector whose underlying length has overflowed, because the components from which the vector is constructed will need to have size just under maxBound, which should be impossible due to address space limitations. concatMap would be at risk if it weren't using size-unaware concat under the hood.

I don't know how the use-cases of vector-sized would fare with such inequalities sprinkled about, and would love to hear alternative suggestions.

@kozross
Copy link
Collaborator

kozross commented Jun 9, 2023

@mniip: To be honest, I'd rather move to TypeNats anyway: the only reason we didn't were backcompat concerns, which I no longer have. I like the idea of using something like your integral branch: how close are you to a working solution? Efficiency of representation, and added safety, are two things I would really like to have.

@mniip
Copy link
Author

mniip commented Jun 11, 2023

I believe the branch is in a working state, with a testsuite and all, but I'm wondering if the provided interface is usable.

@mniip
Copy link
Author

mniip commented Apr 7, 2024

@kozross finite-typelits 0.2.0.0 is on hackage and has the features I alluded to above.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants