.. author:: Neil Mitchell and Shayne Fletcher
.. date-accepted:: 2020-05-03, amended 2021-03-09
.. ticket-url:: https://gitlab.haskell.org/ghc/ghc/-/issues/18599
.. implemented:: 9.2
Contents
- 1. Motivation
- 2. Proposed Change Specification
- 3. Examples
- 4. Effect and Interactions
- 5. Costs and Drawbacks
- 6. Alternatives to this proposal
- 7. Alternatives within this proposal
- 7.1 Should the extensions imply
NoFieldSelectors
or another extension? - 7.2 Should a syntax be provided for modification?
- 7.3 Should there be update sections?
- 7.4 Should pattern matching be extended?
- 7.5 Will whitespace sensitivity become worse?
- 7.6 Should a new update syntax be added?
- 7.7 Why two extensions and not just one?
- 7.8 Why not make
RecordDotSyntax
part of this proposal?
- 7.1 Should the extensions imply
- 8. Unresolved issues
- 9. Implementation Plan
Records in Haskell are widely recognised as
being under-powered, with duplicate field names being particularly
troublesome. We propose new language extensions
OverloadedRecordDot
and OverloadedRecordUpdate
that provide
syntactic sugar to make the features introduced in the HasField
proposal
more accessible, improving the user experience.
In almost every programming language we write a.b
to mean the b
field of the a
record expression. In Haskell that becomes b a
,
and even then, only works if there is only one b
in scope. Haskell
programmers have struggled with this weakness, variously putting each
record in a separate module and using qualified imports, or prefixing
record fields with the type name. We propose bringing a.b
to
Haskell, which works regardless of how many b
fields are in scope.
Here’s a simple example of what is on offer:
{-# LANGUAGE OverloadedRecordDot, OverloadedRecordUpdate #-}
data Company = Company {name :: String, owner :: Person}
data Person = Person {name :: String, age :: Int}
display :: Company -> String
display c = c.name ++ " is run by " ++ c.owner.name
nameAfterOwner :: Company -> Company
nameAfterOwner c = c{name = c.owner.name ++ "'s Company"}
We declare two records both having name
as a field label. The user
may then write c.name
and c.owner.name
to access those fields.
We can also write c{name = x}
as a record update, which works even
though name
is no longer unique. Under the hood, we make use of
getField
and setField
from the HasField proposal.
An implementation of this proposal has been battle tested and hardened over two years in the enterprise environment as part of Digital Asset’s DAML smart contract language (a Haskell derivative utilizing GHC in its implementation), and also in a Haskell preprocessor and a GHC plugin. When initially considering Haskell as a basis for DAML, the inadequacy of records was considered the most severe problem, and without devising the scheme presented here, wouldn’t be using Haskell. The feature enjoys universal popularity with users.
For the specification we focus on the changes to the parsing rules, and the desugaring, with the belief the type checking and renamer changes required are an unambiguous consequences of those.
This change adds new language extensions OverloadedRecordDot
and
OverloadedRecordUpdate
.
If OverloadedRecordDot
is on:
- The expression
.lbl
meansgetField @"lbl"
; - The expression
e.lbl
meansgetField @"lbl" e
.
If OverloadedRecordDot
is not on, these expressions are parsed as
uses of the function (.)
.
If OverloadedRecordUpdate
is on:
- The expression
e{lbl = val}
meanssetField @"lbl" val
; - Update expressions with qualified labels like
r{M.x = val}
are disallowed.
If OverloadedRecordUpdate
is not on, e{lbl = val}
means just
what it does in Haskell98.
If OverloadedRecordDot
and OverloadedRecordUpdate
are both on
the expression e{lbl₁.lbl₂ = val}
means e{lbl₁ = (e.lbl₁){lbl2 =
val}}
otherwise the expression e{lbl₁.lbl₂ = val}
is illegal.
In the event the language extensions OverloadedRecordDot
and
OverloadedRecordUpdate
are enabled:
Expression | Equivalent |
---|---|
(.lbl) |
(\e -> e.lbl) |
(.lbl₁.lbl₂) |
(\e -> e.lbl₁.lbl₂) |
e.lbl |
getField @"lbl" e |
e.lbl₁.lbl₂ |
(e.lbl₁).lbl₂ |
e{lbl = val} |
setField @"lbl" e val |
e{lbl₁.lbl₂ = val} |
e{lbl₁ = (e.lbl₁){lbl₂ = val}} |
e.lbl₁{lbl₂ = val} |
(e.lbl₁){lbl₂ = val} |
e{lbl₁ = val₁}.val₂ |
(e{lbl₁ = val₁}).val₂ |
e{lbl₁} |
e{lbl₁ = lbl₁} [Note: requires NamedFieldUpdates ] |
e{lbl₁.lbl₂} |
e{lbl₁.lbl₂ = lbl₂} [Note: requires NamedFieldUpdates ] |
- Updating nested fields.
e{lbl = val}
is the syntax of a standard H98 record update. It’s the nested form introduced by this proposal that is new :e{lbl1.lbl2 = val}
. However, in the eventOverloadedRecordUpdate
is in effect, note thate{lbl = val}
desugars tosetField @"lbl" e val
]. - Punning. With
NamedFieldPuns
, the forme { x, y }
meanse { x=x, y=y }
. WithOverloadedRecordUpdate
this behaviour is extended to nested updates:e { a.b.c, x.y }
meanse { a.b.c=c, x.y=y }
. Note the variable that is referred to implicitly (herec
andy
) is the last chunk of the field to update. Soc
is the last chunk ofa.b.c
, andy
is the last chunk ofx.y
.
M.N.x
looks ambiguous. It could mean:
(M.N).x
that is, select thex
field from the (presumably nullary) data constructorM.N
, or- The qualifed name
M.N.x
, meaning thex
imported fromM.N
.
The ambiguity is resolved in favor of M.N.x
as a qualified name.
If the other interpretation is desired you can still write (M.N).x
We propose that .
“bind more tightly” than function application
thus, f r.a.b
parses as f (r.a.b)
.
Expression | Interpretation |
---|---|
f r.x |
means f (r.x) |
f r .x |
is illegal |
f (g r).x |
f ((g r).x) |
f (g r) .x |
is illegal |
f M.n.x |
means f (M.n.x) (that is, f (getField @"x" M.n) ) |
f M.N.x |
means f (M.N.x) (M.N.x is a qualified name, not a record field selection) |
We propose that dot notation isn’t available for fields whose names are
operator symbols (for example, +
, .+.
and so on).
[Note : For fields whose names are operator symbols, one can still
write getField
expressions (e.g. getField @".+." r
)].
For what follows, we use these informal definitions:
- A field selector is an expression like
.a
or.a.b
; - A field selection is an expression like
r.a
or(f x).a.b
; - A field update is an expression like
r{a = 12}
orr{a.b = "foo"}
; - A punned field update is an expression like
r{a}
orr{a.b}
(here it is understood thatb
is a variable bound in the environment of the expression and only valid syntax if theNamedFieldPuns
language extension is in effect).
The prototype implements the parsing scheme presented here. More information about the prototype is available in this section.
A new token case ITproj Bool
is introduced. When the
OverloadedRecordDot
extension is enabled occurences of operator
.
not as part of a qualified name are classified using the
whitespace sensitive operator mechanism from this (accepted) GHC
proposal.
The rules are:
Occurence | Token | Means | Example |
---|---|---|---|
prefix | ITproj True |
field selector | .x |
tight infix | ITproj False |
field selection | r.x |
suffix | ITdot |
function composition | f. g |
loose infix | ITdot |
function composition | f . g |
No ITproj
tokens will ever be issued if OverloadedRecordDot
is
not enabled.
The Haskell grammar is extended with the following productions. We use these notations:
Symbol | Occurence |
---|---|
.ᴾ | prefix |
.ᵀ | tight-infix |
2.3.2.1
[Field]
field → varid
2.3.2.2
[Field to update]
fieldToUpdate → fieldToUpdate .ᵀ field | field
2.3.2.3
[Field selectors]
aexp → ( projection )
projection → .ᴾ field | projection .ᵀ field
2.3.2.4
[Field selection]
fexp → fexp .ᵀ field
2.3.2.5
[Field update]
fbind → field .ᵀ fieldToUpdate = exp
fbind → field .ᵀ fieldToUpdate
This is a record type with functions describing a study Class
(Oh!
Pascal, 2nd ed. Cooper & Clancy, 1985).
data Grade = A | B | C | D | E | F
data Quarter = Fall | Winter | Spring
data Status = Passed | Failed | Incomplete | Withdrawn
data Taken =
Taken { year :: Int
, term :: Quarter
}
data Class =
Class { hours :: Int
, units :: Int
, grade :: Grade
, result :: Status
, taken :: Taken
}
getResult :: Class -> Status
getResult c = c.result -- get
setResult :: Class -> Status -> Class
setResult c r = c{result = r} -- update
setYearTaken :: Class -> Int -> Class
setYearTaken c y = c{taken.year = y} -- nested update
getResults :: [Class] -> [Status]
getResults = map (.result) -- selector
getTerms :: [Class] -> [Quarter]
getTerms = map (.taken.term) -- nested selector
Further examples accompany the prototype and yet more (as tests) are available in the examples directory of this repository. Those tests include infix applications, polymorphic data types, interoperation with other extensions and more.
Polymorphic updates: When enabled, this extension takes the
a{b=c}
syntax and uses it to mean setField
. The biggest
difference a user is likely to experience is that the resulting type of
a{b=c}
is the same as the type a
- you cannot change the type
of the record by updating its fields. The removal of polymorphism is
considered essential to preserve decent type inference, and is the only
option supported by the HasField proposal.
Anyone wishing to use polymorphic updates can write
let Foo{..} = a in Foo{polyField=[], ..}
instead.
Higher-rank fields: It is impossible to express HasField
instances for data types such as
data T = MkT { foo :: forall a . a -> a}
, which means they can’t
have this syntax available. Users can still write their own selector
functions using record puns if required. There is a possibility that
with future types of impredicativity such getField
expressions could
be solved specially by the compiler.
Lenses and a.b syntax: The a.b
syntax is commonly used in
conjunction with the lens
library, e.g. expr^.field1.field2
.
Treating a.b
without spaces as a record projection would break such
code. The alternatives would be to use a library with a different lens
composition operator (e.g. optics
), introduce an alias in lens
for .
(perhaps %
), write such expressions with spaces, or not
enable this extension when also using lenses. While unfortunate, we
consider that people who are heavy users of lens don’t feel the problems
of inadequate records as strongly, so the problems are lessened. In
addition, it has been discussed
(e.g. here),
that this proposal is complimentary to lens and can actually benefit
lens users (as with NoFieldSelectors
one can use the same field
names for everything: dot notation, lens-y getting, lens-y modification,
record updates, Show/Generic
).
Rebindable syntax: When RebindableSyntax
is enabled the
getField
and setField
functions are those in scope, rather than
those in GHC.Records
. The .
function (as used in the a.b.c
desugaring) remains the Prelude
version (we see the .
as a
syntactic shortcut for an explicit lambda, and believe that whether the
implementation uses literal .
or a lambda is an internal detail).
Enabled extensions: The extensions do not imply enabling/disabling
any other extensions. It is often likely to be used in conjunction
with either the NoFieldSelectors
extension orDuplicateRecordFields
.
The implementation of this proposal adds code to the compiler, but not a huge amount. Our prototype shows the essence of the parsing changes, which is the most complex part.
If this proposal becomes widely used then it is likely that all Haskell
users would have to learn that a.b
is a record field selection.
Fortunately, given how popular this syntax is elsewhere, that is
unlikely to surprise new users.
This proposal advocates a different style of writing Haskell records,
which is distinct from the existing style. As such, it may lead to the
bifurcation of Haskell styles, with some people preferring the lens
approach, and some people preferring the syntax presented here. That is
no doubt unfortunate, but hard to avoid - a.b
really is ubiquitous
in programming languages. We consider that any solution to the records
problem must cause some level of divergence, but note that this
mechanism (as distinct from some proposals) localises that divergence in
the implementation of a module - users of the module will not know
whether its internals used this extension or not.
The use of a.b
with no spaces on either side can make it harder to
write expressions that span multiple lines. To split over two lines it
is possible to use the &
function from Base
or do either of:
(myexpression.field1.field2.field3
).field4.field5
let temp = myexpression.field1.field2.field3
in temp.field4.field5
Instead of this proposal, we could do any of the following:
- Using the lens library. While lenses help
both with accessors and overloaded names (e.g.
makeFields
), one still needs to use one of the techniques mentioned below (or similar) to work around the problem of duplicate name selectors. In addition, lens-based syntax is more verbose, e.g.f $ record ^. field
instead of possiblef record.field
. More importantly, while the concept of lenses is very powerful, that power can be complex to use, and for many projects that complexity is undesirable. In many ways lenses let you abstract over record fields, but Haskell has neglected the “unabstracted” case of concrete fields. Moreover, as it has been previously mentioned, this proposal is orthogonal to lens and can actually benefit lens users. - The DuplicateRecordFields extension is designed to solve similar problems. We evaluated this extension as the basis for DAML, but found it lacking. The rules about what types must be inferred by what point are cumbersome and tricky to work with, requiring a clear understanding of at what stage a type is inferred by the compiler.
- Some style guidelines mandate that each record should be in a
separate module. That works, but then requires qualified modules to
access fields - e.g.
Person.name (Company.owner c)
. Forcing the structure of the module system to follow the records also makes circular dependencies vastly more likely, leading to complications such as boot files that are ideally avoided. - Some style guidelines suggest prefixing each record field with the
type name, e.g.
personName (companyOwner c)
. While it works, it isn’t pleasant, and many libraries then abbreviate the types to lead to code such asprsnName (coOwner c)
, which can increase confusion. - There is a GHC plugin and
preprocessor
that both implement much of this proposal. While both have seen light
use, their ergonomics are not ideal. The preprocessor struggles to
give good location information given the necessary expansion of
substrings. The plugin cannot support the full proposal and leads to
error messages mentioning
getField
. Suggesting either a preprocessor or plugin to beginners is not an adequate answer. One of the huge benefits to thea.b
style in other languages is support for completion in IDE’s, which is quite hard to give for something not actually in the language. - Continue to vent about records on social media.
All these approaches are currently used, and represent the “status quo”, where Haskell records are considered not fit for purpose.
Typically the extensions will be used in conjunction with
NoFieldSelectors
, but DuplicateRecordFields
would work too. Of
those two, DuplicateRecordFields
complicates GHC, while
NoFieldSelectors
conceptually simplifies it, so we prefer to bias
the eventual outcome. However, there are lots of balls in the air, and
enabling the extensions should ideally not break normal code, so
we leave everything distinct (after being convinced).
Earlier versions of this proposal contained a modify field syntax of the
form a{field * 2}
. While appealing, there is a lot of syntactic
debate, with variously a{field <- (*2)}
, a{field * = 2}
and
others being proposed. None of these syntax variations are immediately
clear to someone not familiar with this proposal. To be conservative, we
leave this feature out.
There are no update sections. Should ({a=})
, ({a=b})
or
(.lbl=)
be an update section? While nice, we leave this feature out.
We do not extend pattern matching, although it would be possible for
P{foo.bar=Just x}
to be defined.
We’re not aware of qualified modules giving any problems, but it’s adding whitespace sensitivity in one more place.
One suggestion is that record updates remain as normal, but
a { .foo = 1 }
be used to indicate the new forms of updates. While
possible, we believe that option leads to a confusing result, with two
forms of update both of which fail in different corner cases. Instead,
we recommend use of C{foo}
as a pattern (with -XNamedFieldPuns
)
to extract fields if necessary.
Things we could have done instead:
- Add two extensions, as proposed here.
- Pro: flexibility for people who want type-changing update, but would still like dot-notation. Breaking back on type-changing update, like
OverloadedRecordUpdate
does, has proved to be controversial, and we don’t want it to hold back the integration of this proposal in GHC. - Pro: orthogonal things are controlled by separate flags.
- Con: each has to be documented separately: two flags with one paragraph each, instead of one flag with two paragraphs. (The implementation cost is zero: it's only a question of which flag to test.)
- Add a single extension (
OverloadedRecordFields
, say) to do whatOverloadedRecordDot
andOverloadedRecordUpdate
do in this proposal.
- Pro: only one extension.
- Con: some users might want dot-notation, but not want to give up type-changing update.
- Make this modification a no-op, doing nothing. Instead adopt precisely the previous proposal. Use
RecordDotSyntax
as the extension, covering both record dot and update. However, we should then be prepared to change whatRecordDotSyntax
means later. In particular, it is very likely that we’ll wantRecordDotSyntax
to implyNoFieldSelectors
.
- Pro: only one extension
- Con: changing the meaning of an extension will break programs.
- Use
RecordDotSyntax
, just as in the original proposal, but addNoFieldSelectors
immediately
- Con: it’s too early to standardize this, we’re not really sure that it’s what we want (e.g. we may want
DuplicatRecordFields
instead).
NB: the difference between (2) and (3) is tiny: only whether we have OverloadedRecordFields
now and RecordDotSyntax
later; or RecordDotSyntax
now and <something else> later.
We think RecordDotSyntax
will enable these extensions plus some
extension that allows multiple field names, e.g. NoFieldSelectors
.
Which final extension that is has not yet been determined.
None.
To gain confidence these changes integrate as expected a prototype was produced that parses and desugars forms directly in the parser. For confirmation, we do not view desugaring in the parser as the correct implementation choice, but it provides a simple mechanism to pin down the changes without going as far as adding additional AST nodes or type checker rules. The prototype was rich enough to “do the right thing”. Update July 2021: More tests are now available in the GHC tree, e.g. RecordDotSyntax1.hs.
If accepted, the proposal authors would be delighted to provide an implementation. Implementation depends on the implementation of the HasField proposal.