Add HasField instances for tuples to allow tuple-indexing #143

Closed
BinderDavid opened this issue Mar 7, 2023 · 19 comments
Labels
dormant Hibernated by proposer or committee

Comments

@BinderDavid

TL;DR

Add instances of the following kind to base and make them available via the Prelude:

instance HasField "_1" (a1,a2,a3) a1 where
  getField (x,_,_) = x

instance HasField "_2" (a1,a2,a3) a2 where
  getField (_,x,_) = x

instance HasField "_3" (a1,a2,a3) a3 where
  getField (_,_,x) = x

In order to allow access to tuple elements using record dot syntax like this:

(True, "hello", 42)._1 == True
(True, "hello", 42)._2 == "hello"
(True, "hello", 42)._3 == 42
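For anyone who wants to try this locally, here is a minimal self-contained sketch of the above. The set of language pragmas is my assumption about what is needed, the instances are declared locally as orphans purely for illustration, and the names `triple`, `firstComponent`, `secondComponent` and `thirdComponent` are made up for this example.

```haskell
{-# LANGUAGE DataKinds #-}
{-# LANGUAGE FlexibleInstances #-}
{-# LANGUAGE MultiParamTypeClasses #-}
{-# LANGUAGE OverloadedRecordDot #-}

import GHC.Records (HasField (..))

-- Orphan instances, declared here only so the sketch is self-contained.
instance HasField "_1" (a1, a2, a3) a1 where
  getField (x, _, _) = x

instance HasField "_2" (a1, a2, a3) a2 where
  getField (_, x, _) = x

instance HasField "_3" (a1, a2, a3) a3 where
  getField (_, _, x) = x

-- Hypothetical example value; the annotation avoids numeric defaulting.
triple :: (Bool, String, Int)
triple = (True, "hello", 42)

-- Record dot syntax elaborates each selection to a getField call.
firstComponent :: Bool
firstComponent = triple._1

secondComponent :: String
secondComponent = triple._2

thirdComponent :: Int
thirdComponent = triple._3
```

Loading the file in GHCi and evaluating the three bindings should be enough to check the behaviour.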

Motivation

Accessing elements of tuples other than 2-tuples is currently cumbersome, since the Prelude defines the fst and snd functions only for 2-tuples. We have to pattern match on n-tuples explicitly in order to access their elements. Other languages provide some form of tuple indexing, which makes n-tuples much more convenient. With the OverloadedRecordDot extension, we can use this newly available mechanism to enable convenient tuple indexing as well.

What about another name for the field?

In an ideal world we could use the syntax (True, "hello", 42).2 without the underscore. This is, for example, the syntax that Rust uses for tuple indexing: Rust by Example. As far as I can tell, this doesn't currently work since this expression cannot be parsed. If we want this syntax instead, then the parser/lexer would have to be changed, and this couldn't be a CLC proposal but would have to be turned into a GHC proposal.

What about Lens / Optics

Both the Control.Lens.Tuple and the Data.Tuple.Optics modules provide accessors using the same naming scheme.
With both Lens and optics you can use the syntax (1,2) ^. _1.

I think using the Lens/Optics libraries is not a proper solution to the basic usability problem of tuples outlined above.
I just want to make it easier to access fields in a tuple, not use a big library to permit abstraction over access into nested data structures which requires additional imports and dependencies. If these instances are exported in the Prelude, then I never have to add any imports, and can just use the record dot syntax (if I have OverloadedRecordDot enabled). Record dot syntax is also much more newcomer friendly. If we do want to use the full expressive power of Lens/Optics, then this proposal actually provides an easier onboarding ramp, since the Lens and Optics libraries use the same names for tuple fields. The HasField instances and the Lens/Optics accessors can also peacefully coexist in the same codebase.

Can I try it out?

Yes, you can add a dependency on my prototype implementation, tuple-fields, on GitHub.

What about Solo?

For consistency reasons, Solo should also get its instance:

instance HasField "_1" (Solo a) a where
  getField (Solo x) = x

Should instances for all n-tuples be provided?

Currently the largest tuples are 62-tuples. In https://github.com/BinderDavid/tuple-fields/blob/main/src/Data/Tuple/Fields.hs I added instances for all tuples. The Lens and Optics libraries don't support field access for all these tuples. I guess the reason might be excessive compilation time? Adding all instances would be the more consistent choice, but I don't know the details of how that interacts with the complexity of instance search.
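To get a feeling for the scale, here is a rough sketch that counts the instances and renders their source text. It assumes one instance per component of every n-tuple starting at 2-tuples (Solo is not counted), and the helper names `instanceCount` and `renderInstance` are hypothetical. It is not necessarily how the linked module was produced.

```haskell
import Data.List (intercalate)

-- Number of instances needed to cover every component of all n-tuples
-- with 2 <= n <= maxSize (one instance per component).
instanceCount :: Int -> Int
instanceCount maxSize = sum [2 .. maxSize]

-- Source text of the instance selecting component i of an n-tuple,
-- in the same layout as the instances shown in this proposal.
renderInstance :: Int -> Int -> String
renderInstance n i =
  unlines
    [ "instance HasField \"_" ++ show i ++ "\" (" ++ tyVars ++ ") a" ++ show i ++ " where"
    , "  getField (" ++ pats ++ ") = x"
    ]
  where
    tyVars = intercalate "," ["a" ++ show k | k <- [1 .. n]]
    pats = intercalate "," [if k == i then "x" else "_" | k <- [1 .. n]]
```

Under these assumptions, instanceCount 62 is 1952, while instanceCount 16 is 135 and instanceCount 7 is 27, which is why the choice of cutoff matters for compile time.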

What about unboxed tuples?

Adding the corresponding instances for unboxed tuples is currently not possible.
The following instance:

instance HasField "_1" (# a1,a2,a3 #) a1 where
  getField (# x,_,_ #) = x

currently results in the following error:

Main.hs:17:24: error: [GHC-83865]
     Couldn't match a lifted type with an unlifted type
      Expected kind ‘*’,
        but (# a1, a2, a3 #) has kind TYPE
                                           (TupleRep [LiftedRep, LiftedRep, LiftedRep])
     In the second argument of HasField, namely (# a1, a2, a3 #)
      In the instance declaration for HasField "_1" (# a1, a2, a3 #) a1
   |
17 | instance HasField "_1" (# a1,a2,a3 #) a1 where
   |                        ^^^^^^^^^^^^^^

This is because the HasField typeclass is currently not representation-polymorphic enough.
Making the typeclass more polymorphic is under discussion (see https://gitlab.haskell.org/ghc/ghc/-/issues/22156 and https://github.com/adamgundry/ghc-proposals/blob/hasfield-redesign/proposals/0000-hasfield-redesign.rst#recap-planned-changes-to-hasfield). Since adding these instances is currently not possible, they are out of scope for this proposal.

Backwards Incompatibility

I don't currently see how this could break backwards compatibility in any way. Afaik, as soon as the OverloadedRecordDot extension is enabled, the lexing rules around the . symbol change so that foo.bar without whitespace can only be used for field access. So there should be no conflicts with existing uses of the tuple accessors from the Lens or Optics libraries.
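A small sketch of that coexistence, without depending on lens or optics: the `_1` accessor and the `^.` operator below are hand-rolled stand-ins that only mimic the shape of the real library functions, and `pair`, `viaLens` and `viaDot` are hypothetical names. The value-level `_1` and the type-level label "_1" live in different namespaces, so both can appear in one module.

```haskell
{-# LANGUAGE DataKinds #-}
{-# LANGUAGE FlexibleInstances #-}
{-# LANGUAGE MultiParamTypeClasses #-}
{-# LANGUAGE OverloadedRecordDot #-}

import Data.Functor.Const (Const (..))
import GHC.Records (HasField (..))

-- Field instance for the first component of a pair (local orphan).
instance HasField "_1" (a, b) a where
  getField (x, _) = x

-- Stand-in for a lens-style accessor; NOT the real library definition.
_1 :: Functor f => (a -> f a') -> (a, b) -> f (a', b)
_1 f (x, y) = fmap (\x' -> (x', y)) (f x)

-- Stand-in for a lens-style view operator.
infixl 8 ^.
(^.) :: s -> ((a -> Const a a) -> s -> Const a s) -> a
s ^. l = getConst (l Const s)

pair :: (Int, String)
pair = (1, "one")

-- With surrounding whitespace, ^. is an ordinary operator applied to _1.
viaLens :: Int
viaLens = pair ^. _1

-- Without whitespace, ._1 is a field selection via HasField.
viaDot :: Int
viaDot = pair._1
```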

@mixphix
Collaborator

mixphix commented Mar 7, 2023

Thanks for taking the time to develop this idea into its own package!

@nomeata

nomeata commented Mar 7, 2023

https://github.com/ghc-proposals/ghc-proposals/blob/master/proposals/0170-unrestricted-overloadedlabels.rst allows numbers as labels (#1). I do not know if that will imply the use of 1 as a field name, as suggested here.

@BinderDavid
Author

BinderDavid commented Mar 7, 2023

Thanks for taking the time to develop this idea into its own package!

No problem. I just don't want to upload it to Hackage, since publishing a library which only consists of orphan instances is probably bad for my Karma :) Orphan instances are one reason why I think it shouldn't be a library in the long run; the other is that it fixes a small syntactic inconvenience in Haskell, and adding a dependency and an import is the comparatively bigger inconvenience.

https://github.com/ghc-proposals/ghc-proposals/blob/master/proposals/0170-unrestricted-overloadedlabels.rst allows numbers as labels (#1). I do not know if that will imply the use of 1 as a field name, as suggested here.

Thanks, I hadn't seen this proposal. I could have phrased the passage better by saying that my personal preference is to use x.1 instead of x._1, but one can equally make the point that it would be less confusing if the syntactic choice is consistent with the Lens and Optics library.

@chshersh
Member

chshersh commented Mar 7, 2023

Thanks for writing such a thoughtful proposal and putting effort into it!

Generally, I agree with the idea of making tuples more ergonomic. I myself experience the pain of switching from (a, b) to (a, b, c) and needing to refactor lots of code. Having such a polymorphic interface would help with code migration and backwards compatibility.
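To illustrate the migration point, here is a hedged sketch of a helper that is written against the field constraint rather than a concrete tuple type, so it keeps working when a pair grows into a triple. The instances are local orphans for illustration, and `bumpFirst`, `fromPair` and `fromTriple` are hypothetical names.

```haskell
{-# LANGUAGE DataKinds #-}
{-# LANGUAGE FlexibleContexts #-}
{-# LANGUAGE FlexibleInstances #-}
{-# LANGUAGE MultiParamTypeClasses #-}
{-# LANGUAGE OverloadedRecordDot #-}

import GHC.Records (HasField (..))

instance HasField "_1" (a1, a2) a1 where
  getField (x, _) = x

instance HasField "_1" (a1, a2, a3) a1 where
  getField (x, _, _) = x

-- Works for any type whose "_1" field is an Int, whatever its arity.
bumpFirst :: HasField "_1" r Int => r -> Int
bumpFirst t = t._1 + 1

fromPair :: Int
fromPair = bumpFirst (1 :: Int, 'x')

fromTriple :: Int
fromTriple = bumpFirst (1 :: Int, 'x', True)
```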

There's a general sentiment that you shouldn't rely on tuples too much, and you're encouraged to introduce custom records instead. And I agree with it. But in some cases, it's not that easy. I can imagine providing some generic tuple interface for FFI or e.g. like the one from postgresql-simple.


I don't have an opinion on x._1 vs x.1 yet. I don't think we need to implement it for all tuples up to size 62 if compilation times are a concern; we could just limit it to what other popular libraries do (btw, @BinderDavid, could you provide numbers on how long it takes to compile your package from scratch?).

I see only one problem though. Currently, all tuple data types are implemented in the GHC.Tuple module in the ghc-prim package, while HasField is in base. So orphan instances are unavoidable unless we want to move some code around. EDIT: Instances can be put in the GHC.Records module, so no orphan instances in base.

I'm personally not against orphan instances but I don't know the best place to put them. Data.Tuple could be one place but it feels wrong to reexport GHC.*-specific stuff from an established module, especially in the context of recent discussions of internal GHC API.

Similarly, I wouldn't mind moving GHC.Records to ghc-prim and reexporting it from base. So I would like to hear opinions from other CLC members on this.

@BinderDavid
Author

(btw, @BinderDavid, could you provide numbers on how long it takes to compile your package from scratch?).

Sure. Here is my system configuration:

$ lscpu
Architecture:            x86_64
  CPU op-mode(s):        32-bit, 64-bit
  Address sizes:         39 bits physical, 48 bits virtual
  Byte Order:            Little Endian
CPU(s):                  8
  On-line CPU(s) list:   0-7
Vendor ID:               GenuineIntel
  Model name:            Intel(R) Core(TM) i7-4790 CPU @ 3.60GHz
$ ghc --version
The Glorious Glasgow Haskell Compilation System, version 9.4.2

And here are the timings for all instances up to 62-tuples:

$ time ghc src/Data/Tuple/Fields.hs 
[1 of 1] Compiling Data.Tuple.Fields ( src/Data/Tuple/Fields.hs, src/Data/Tuple/Fields.o )

real    0m6,918s
user    0m6,676s
sys     0m0,253s

If I remove all instances above 16-tuples, I get the following times:

$ time ghc src/Data/Tuple/Fields.hs 
[1 of 1] Compiling Data.Tuple.Fields ( src/Data/Tuple/Fields.hs, src/Data/Tuple/Fields.o )

real    0m0,290s
user    0m0,255s
sys     0m0,028s

@parsonsmatt

That compilation hit shouldn't be visible to end users if this is in base, though it will make base take longer to compile. I'm neutral on how many tuples we should give these instances to - for consistency, 62 makes sense; for discouraging overuse of tuples, 8 makes sense.

I'm a fan of virtual fields like this - I think it's one of the better uses of the feature.

I'm in favor of _1 as the field label name - it keeps consistency with Haskell's general name convention for variables.

I'm ambivalent about where the instances go. Orphans are regrettably common with some of the core datatypes, but as long as this is exposed in Data.Tuple and the Prelude, we should be mostly fine - anyone importing tuples from ghc-prim is performing Advanced Tricks and should know what they are doing.

@jvanbruegge

Am I missing something or why can't the instances be declared right next to the HasField class? That would not be an orphan instance then

@Jashweii

Jashweii commented Mar 7, 2023

I don't think this goes far enough to justify it. It is less convenient than lens' _1 etc., and you can even achieve . with RebindableSyntax, setting getField as ^. and setField as %~ (giving you even more from lens over this). It is also inconvenient for hlist et al., as you are getting the type-level string "_1", not the natural number 1.
Edit: It might not be that directly convenient for lens since you need a class to map the field phantom parameter to a lens class, but you can do something similarly easy with generic-lens

FWIW you could provide these instances via GHC.Generics to Generically, then use DerivingVia both for tuples and as an opt-in for users for their own types. But you still have the downside of a string that presumably needs the _ extracting then the tail parsing before you can recur on it as a natural.
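As a rough illustration of the string-versus-natural point: the sketch below indexes the instances by a type-level natural instead of the string "_1". It assumes that the class stays poly-kinded in its first parameter and that GHC accepts such instances for tuples. These labels cannot be reached through record dot syntax, only through an explicit getField with a type application, and `byIndex` is a hypothetical name.

```haskell
{-# LANGUAGE DataKinds #-}
{-# LANGUAGE FlexibleInstances #-}
{-# LANGUAGE MultiParamTypeClasses #-}
{-# LANGUAGE TypeApplications #-}

import GHC.Records (HasField (..))

-- Labels are type-level naturals here, so no string parsing is needed
-- to recover the index. Local orphan instances, for illustration only.
instance HasField 1 (a1, a2, a3) a1 where
  getField (x, _, _) = x

instance HasField 2 (a1, a2, a3) a2 where
  getField (_, x, _) = x

instance HasField 3 (a1, a2, a3) a3 where
  getField (_, _, x) = x

-- Selection has to be spelled out with a type application.
byIndex :: String
byIndex = getField @2 (True, "hello", 42 :: Int)
```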

@mixphix
Collaborator

mixphix commented Mar 7, 2023

base is not a convenience package. It is a package providing the absolute fundamentals for writing a program. It's in the name. It is not ideal that every library and application has to depend on base directly.

A "base package" or "core library" should provide:

  • Types and functions essential for program construction: Bool, Char, tuples, linked lists, Text, Maybe, Either, IO
  • Simple abstractions with common functionality: Monoid, Functor/Applicative/Monad, Foldable, Traversable

A "base package" or "core library" should not provide:

  • The kitchen sink
  • 6000 lines of orphan instances

@adamgundry
Member

https://github.com/ghc-proposals/ghc-proposals/blob/master/proposals/0170-unrestricted-overloadedlabels.rst allows numbers as labels (#1). I do not know if that will imply the use of 1 as a field name, as suggested here.

It doesn't. Changing OverloadedRecordDot to permit numbers as field names would perhaps not be unreasonable, given that it would be consistent with the OverloadedLabels changes, but it hasn't been proposed yet.

Am I missing something or why can't the instances be declared right next to the HasField class? That would not be an orphan instance then

Agreed, I think the instances can be defined as non-orphan in GHC.Records, if there is consensus that adding them is a good idea.

@Bodigrim
Collaborator

Bodigrim commented Mar 7, 2023

Should instances for all n-tuples be provided?

Haskell Report says "The Prelude and libraries define tuple functions such as zip for tuples up to a size of 7", and I think this is a reasonable limit.

I just don't want to upload it to Hackage, since publishing a library which only consists of orphan instances is probably bad for my Karma :)

I'm not particularly worried about orphans here, a single dedicated library with instances can work fairly well, even in the long run.

@Bodigrim
Collaborator

I think this is an interesting idea, but I don't have any experience with HasField yet, so it is difficult for me to evaluate the proposal. The general feeling is that this stuff is fairly new and experimental?.. I'd rather wait for a release or two to get it more solid before committing to it in base.

I would actually suggest to upload tuple-fields to Hackage and promote it using usual community channels. If it gets traction, we'll be in much stronger position to justify its inclusion in base.

@hasufell
Member

A "base package" or "core library" should provide:

Just a note that the CLC is (still) not aligned on what base should or shouldn't provide.

@georgefst

Changing OverloadedRecordDot to permit numbers as field names would perhaps not be unreasonable, given that it would be consistent with the OverloadedLabels changes, but it hasn't been proposed yet.

If there's no particular reason not to permit them, I really think this should be explored, even if it means this proposal (or a modified version) takes longer to come to fruition. x.1 is fairly objectively aesthetically superior to x._1.

@Bodigrim
Collaborator

@BinderDavid how would you like to proceed? I see that you've uploaded https://hackage.haskell.org/package/tuple-fields, so my suggestion would be to announce it widely, gather feedback, gain adoption and return to this proposal in a couple of months or so, hibernating it in the meantime. But as the proposer you do have the right to pursue a CLC decision as is.

@Bodigrim
Collaborator

Current design is

class HasField x r a | x r -> a where
  getField :: r -> a

But the accepted GHC proposal 158 suggests changing it to

class HasField (x :: k) r a | x r -> a where
  hasField :: r -> (a -> r, a)

getField :: forall x r a . HasField x r a => r -> a
getField = snd . hasField @x

And then a fresh GHC proposal 583 changes it yet again to

class HasField x r a | x r -> a where
  getField :: r -> a

class SetField x r a | x r -> a where
  modifyField :: (a -> a) -> r -> r

IMO this is strong evidence that the design of class HasField is very much in flux and is not yet ready to be codified in base. My recommendation is to hibernate the proposal.
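To show what such a change would mean for the tuple instances, here is a sketch of an instance written against the proposal 158 shape. The class is redeclared locally under the hypothetical name `HasFieldV158` so that it does not clash with GHC.Records, and `getFieldV158`, `setFieldV158`, `original` and `updated` are likewise made up. Every instance body would have to be rewritten in this style.

```haskell
{-# LANGUAGE AllowAmbiguousTypes #-}
{-# LANGUAGE DataKinds #-}
{-# LANGUAGE FlexibleInstances #-}
{-# LANGUAGE FunctionalDependencies #-}
{-# LANGUAGE KindSignatures #-}
{-# LANGUAGE ScopedTypeVariables #-}
{-# LANGUAGE TypeApplications #-}

import GHC.TypeLits (Symbol)

-- Local copy of the combined getter/setter design (hypothetical name).
class HasFieldV158 (x :: Symbol) r a | x r -> a where
  hasFieldV158 :: r -> (a -> r, a)

-- The instance must return both a setter and the current value.
instance HasFieldV158 "_1" (a1, a2, a3) a1 where
  hasFieldV158 (x, y, z) = (\x' -> (x', y, z), x)

getFieldV158 :: forall x r a. HasFieldV158 x r a => r -> a
getFieldV158 = snd . hasFieldV158 @x

setFieldV158 :: forall x r a. HasFieldV158 x r a => r -> a -> r
setFieldV158 = fst . hasFieldV158 @x

original :: (Bool, String, Int)
original = (True, "hello", 42)

-- Replace the first component, leaving the others untouched.
updated :: (Bool, String, Int)
updated = setFieldV158 @"_1" original False
```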

@BinderDavid please suggest how you would like to proceed within two weeks, otherwise I'll mark the proposal as dormant.

@BinderDavid
Author

Current design is

class HasField x r a | x r -> a where
  getField :: r -> a

But the accepted GHC proposal 158 suggests changing it to

class HasField (x :: k) r a | x r -> a where
  hasField :: r -> (a -> r, a)

getField :: forall x r a . HasField x r a => r -> a
getField = snd . hasField @x

And then a fresh GHC proposal 583 changes it yet again to

class HasField x r a | x r -> a where
  getField :: r -> a

class SetField x r a | x r -> a where
  modifyField :: (a -> a) -> r -> r

IMO this is strong evidence that the design of class HasField is very much in flux and is not yet ready to be codified in base. My recommendation is to hibernate the proposal.

My impression is that the first proposal of record dot syntax envisioned one typeclass HasField with a method hasField :: r -> (a -> r, a) for both getting and setting fields. This design was accepted, but an implementation of that design was never merged. Instead, the original proposal was modified in ghc-proposals/ghc-proposals#405 to split the typeclass into two separate typeclasses HasField for getting the content of a field, and SetField for updating the contents of record fields. It was my impression that the design of the getting part, i.e. HasField, is somewhat final, except that it might become more representation polymorphic in the future. I have not followed the latest development of the SetField typeclass, where there are obvious complications due to polymorphic record updates which don't exist for simple field access.
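A sketch of the setting half under the split design, to show where the complication comes from. The class is a local stand-in with the hypothetical name `SetFieldSketch`, copying the method shape quoted above, and `setFieldSketch`, `shout` and `replaced` are made-up names. Because the method has type (a -> a) -> r -> r, the component cannot change its type, which is exactly the polymorphic-update limitation that plain field access does not have.

```haskell
{-# LANGUAGE AllowAmbiguousTypes #-}
{-# LANGUAGE DataKinds #-}
{-# LANGUAGE FlexibleInstances #-}
{-# LANGUAGE FunctionalDependencies #-}
{-# LANGUAGE KindSignatures #-}
{-# LANGUAGE ScopedTypeVariables #-}
{-# LANGUAGE TypeApplications #-}

import Data.Char (toUpper)
import GHC.TypeLits (Symbol)

-- Local stand-in for the update class (hypothetical name).
class SetFieldSketch (x :: Symbol) r a | x r -> a where
  modifyFieldSketch :: (a -> a) -> r -> r

-- Type-preserving update of the second component of a triple.
instance SetFieldSketch "_2" (a1, a2, a3) a2 where
  modifyFieldSketch f (x, y, z) = (x, f y, z)

-- Setting is modification with a constant function.
setFieldSketch :: forall x r a. SetFieldSketch x r a => a -> r -> r
setFieldSketch v = modifyFieldSketch @x (const v)

shout :: (Bool, String, Int)
shout = modifyFieldSketch @"_2" (map toUpper) (True, "hello", 42)

replaced :: (Bool, String, Int)
replaced = setFieldSketch @"_2" "bye" (True, "hello", 42)
```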

W.r.t. this proposal: I have uploaded the package tuple-fields to Hackage and will keep it up to date with any future changes to the HasField typeclass. So I am also happy to hibernate this proposal for a while. I still remain convinced that providing these instances in base should be explored for the following reasons:

  • They can be provided without much effort as non-orphan instances (and only base can provide them as non-orphans).
  • They greatly improve the usability of tuple-destructuring where it is mostly used: small helper functions in let-bindings and where-clauses, since it doesn't even require the addition of an import at the beginning of the file.

I think we will see how prevalent the use of OverloadedRecordDot will be in the future. I personally like the feature a lot, and providing the instances for tuples would complement its design.

@chshersh
Member

I like the feature but I think it's too early to include this change in base.

I wouldn't want base to turn into the playground for experimental features. I'd like it to be stable and adopt well-known practices, so they can be encouraged further downstream.

Besides, since the design of relevant GHC extensions is still experimental, I'm not too comfortable with increasing the burden of GHC developers to update hundreds of lines of orphan instances when HasField or company changes.

@Bodigrim
Collaborator

Thanks @BinderDavid, hibernating.
