Use GenIdent for anonymous instances #4096

rhendric · 2021-05-29T02:47:37Z

This fixes the limitation of the CoreFn renamer which prevented it from
renaming top-level GenIdents. As a consequence, we can now give unnamed
instances more idiomatic names and still guarantee that they will be
unique in their module.

Description of the change

With compliments to @JordanMartinez, follow-up from one of my suggestions on #4085.

Checklist:

Added the change to the changelog's "Unreleased" section with a reference to this PR (e.g. "- Made a change (#0000)")
Added myself to CONTRIBUTORS.md (if this is my first contribution)
Linked any existing issues or proposals that this pull request should close
Updated or added relevant documentation
Added a test for the contribution (if applicable)

JordanMartinez

I'm just writing out my understanding of what this PR does, so that you can confirm/correct my understanding (and anyone following along can be brought up to speed sooner).

First, I'll cover some context / background info on the Ident type, the type class instance compiler-generated names and the Renamer code. Then, I'll describe the goal you're trying to achieve. Finally, I'll explain what you changed to achieve that goal.

The Ident type has three values:

Ident - an identifier provided by the developer
GenIdent - an identifier generated by the compiler
UnusedIdent - this is irrelevant for this PR. It's used in typechecking. Since it doesn't store any text value, we can't "rename" it anyway.

Identifiers can be thought of as 'names' of things (e.g. a function, a value, a let binding). Below, every variation of fooX (where X is a number) is an identifier:

module Module where

-- type signature omitted
-- but think `foo1 :: String -> Value`
foo1 foo2 = do
  let foo3 = --
  foo4 foo2
  where
  foo4 _ = "x"

Via #4085, type class instance whose names are generated currently use the Ident constructor when they should use the GenIdent constructor as it better reflects the source of their name. Such instances are currently named $_ClassNameTypeName_4 for the following reasons:

the leading $ ensures it does not clash with other purescript identifiers because no such identifiers can use $ in their name.
the first _ makes it easier to read the name because the $ character distracts the eye; however, it is not necessary for the name.
the final _ is also not necessary, but does make things a bit nicer to read due to the unique identifier at the end.
the 4 is a unique identifier that ensures similar instance names do not clash due to generating a name and then truncating it to 25 characters.

The Renamer is responsible for ensuring two or more identifiers don't clash within a given scope. For example, given the below code:

module Main where

import ImportedModule as ImportedModule

-- top-level member
foo a {- a1 -} =
  let
    -- let binding
    a {- a2 -} = "hello"
  in a <> " world"

-- or perhaps this?
bar a {- a1 -} = a <> ImportedModule.a {- a2 -}

the "a near {- a1 -}" will have a different identifier in the outputted JavaScript (e.g. a1) than the "a near {- a2 -}" despite both having the same name in the source code.

The Renamer removes clashes by adding an integer to the "base name" (e.g. a1, a2, a3, etc. for identifiers named a) every time it finds another clash.

The Renamer code BEFORE this PR works by doing the following:

Extract a module's declarations via renameInModules
Create the top-level scope by getting all identifiers in the module via findDeclIdents
Use a top-down approach to recurse through each layer in the AST and rename identifiers based on what's in scope at that point via renameInDecl, renameInValue, renameInLiteral, and renameInBinder. The "scope" of the current scope is determined by the binding group (I think).
1. Renames both the name and the value in a single pass (this is changed in this PR)
  1. top-level identifiers are not renamed (this is important)
  2. all other identifiers are renamed
Returns a new version of the module with a new version of the list of declarations with updated names

Type class instances will always be top-level identifiers in a module. However, the Renamer doesn't currently rename top-level members. If top-level members were renamed, then the module's exports also need to be renamed (so that exported identifiers still work when imported into other modules). To rebuild a module's exports, one also needs to incorporate the module's FFI (if it exists) in case such members are also exported.

Thus, here's the goal of this PR

Make generated type class instances use the GenIdent constructor rather than the Ident constructor to accurately reflect the source of their name.
Remove the leading $ character from the instance name by updating the Renamer to ensure names don't clash by renaming top-level GenIdent constructors (if needed).

This is how the PR achieves the goal. This PR...

Changes the final name generated by the compiler for a type class instance from $_ClassNameTypeName_4 to classNameTypeName4
Updates generated type class instance names to use GenIdent rather than Ident
Changes the Renamer to include top-level GenIdents when renaming and returns an updated version of the module's exports (new) in addition to its decls (current).
Changes the Rename to first rename names in one pass and then rename values in a second pass (whereas current approach renames both names and values in the same pass).

The Renamer code AFTER this PR works by doing the following:

Extract a module's exports, FFI, and declarations via renameInModules
Create the top-level scope by combining 1) only the Ident identifiers via findDeclIdents and 2) the FFI identifiers in the module
Uses State's Applicative (which executes sequentially and not in parallel) to do the following:
1. Use a top-down approach to recurse through each layer in the AST and rename identifiers based on what's in scope at that point via renameInDecl, renameInValue, renameInLiteral, and renameInBinder. The "scope" of the current scope is determined by the binding group (I think).
  1. Renames the names and then the values
    1. top-level GenIdents are renamed while top-level Idents are not renamed
    2. all other identifiers are renamed
2. Lookup what the new name of an identifier is (if not renamed, the original name will be returned) and create a new version of the export with the new identifier.
3. Return the updated declaration and its corresponding export
Returns a new version of the module with
1. a new version of the declarations with updated names
2. a new version of the exports with updated names

It seems the various other changes (e.g. renameDecl, etc.) are to get the types to line up and other changes (e.g. updateName) are minor refactoring.

JordanMartinez · 2021-05-29T16:21:48Z

lib/purescript-cst/src/Language/PureScript/CST/Convert.hs

      genName :: Text.Text
-      genName = "$_" <> Text.take 25 (className <> typeArgs) <> "_"
+      genName = Text.take 25 (className <> typeArgs)


Why is the final _ character being removed? Is this just preference?

I'm not for or against this as I think the larger readability problems are addressed in this PR. To me, this seems more like a 'stylistic' change than anything else.

It is cosmetic, yes. But note that the renamer only appends numbers to the end of a GenIdent if necessary to disambiguate. So with the _ character, instances were getting named classNameArg_, which is why I removed it.

Ah, makes sense.

JordanMartinez · 2021-05-29T16:26:10Z

src/Language/PureScript/Renamer.hs

+-- whereas in a Let declarations are renamed if their name shadows another in
+-- the current scope. In order to prevent inner bindings earlier in the list
+-- from shadowing bindings later in the list, first all of the declarations
+-- are considered for renaming, and then all of the values are recursed into.


In order to prevent inner bindings earlier in the list from shadowing bindings later in the list, first all of the declarations are considered for renaming, and then all of the values are recursed into.

Could you clarify this point further because I'm not familiar with the code outside of this file?

Is this in case something like this occurs?

foo :: String -> String foo a = -- does it use top-level `a` -- or foo's `a`? a <> "something" a :: String a = "text value

Yes, that's exactly what I'm talking about there. The a in foo is an inner binding that shadows the top-level a. The renamer should rename the a in foo to a1. I think the CoreImp optimizer expects the code it receives to have no shadowing, so it's not just a style thing (even though in JavaScript shadowing works just fine).

Gotcha. Thanks for clarifying!

rhendric · 2021-05-30T20:50:56Z

Your analysis is quite correct!

One thing that might be clarifying to add (at least, it briefly confused me just now when I read your analysis and thought I had made things too complicated) is why renameInDecls does two passes at every level in addition to the findDeclIdents call at the top level, which acts as sort of a zeroth mini-pass.

At the top level, we effectively do three passes because we want regular Idents to have first dibs on their preferred name, then GenIdents, then bindings in inner scopes. If we didn't do the findDeclIdents pass first, then an earlier GenIdent could possibly steal a name from the top-level scope that a later Ident would want (which would be very bad if we export that name). If we didn't do the decl-rename pass (which at the top level just means renaming GenIdents), then a reference to that GenIdent appearing deep in a value somewhere might get visited before the GenIdent has its new name in scope, which could maybe cause it to get a different final name than the thing it references.

We could do three passes at every level—I'm still considering that, actually—but the importance of letting Idents have first dibs over GenIdents is much less when nothing is being exported.

(Please ignore the fixup commit I momentarily pushed and then retracted; as I said, I was briefly confused, and thought that two passes everywhere would work. I forgot that Idents need to have priority over GenIdents at least at the top level.)

JordanMartinez · 2021-05-30T23:21:32Z

We could do three passes at every level—I'm still considering that, actually—but the importance of letting Idents have first dibs over GenIdents is much less when nothing is being exported.

Sounds like the 3-pass approach is necessary for top-level members and safer (but possibly not necessary?) for non-top-level members. Is that correct? If we do the 3-pass approach, does that slightly slow down the purs compile command?

rhendric · 2021-05-31T00:28:22Z

Using three passes below the top level doesn't get us any more safety. What it would get us is:

Code simplicity: we would treat all declarations the same, which might make this code easier to read.
Somewhat more predictable generated code in the presence of a pseudo-collision between a GenIdent and an Ident. I suspect this isn't possible to trigger right now; most if not all of the GenIdents I've seen prior to this PR are function parameters and not let-bindings, and instances have to be top-level. But suppose some future language feature resulted in some CoreFn that looked like this (in pseudo-PS):
```
let <GenIdent "x" 42> = foo ...
    x = bar ...
in <GenIdent "x" 42> + x
```
where x is a user-supplied proper Ident, and <GenIdent "x" 42> is an identifier that came from some desugaring somehow.

With two passes, this code could end up being generated:
```
var x = foo(...);
var x1 = bar(...);
return x + x1;
```
With three passes, this is what you'd get always:
```
var x1 = foo(...);
var x = bar(...);
return x1 + x;
```
The idea is that a name chosen by the programmer has priority over a variable produced by the compiler. A programmer using a debugger might not appreciate the variable x meaning something other than what they used it for. It's still a safe transformation either way, just potentially a surprising one to anyone reading the generated code or otherwise looking under the hood.

But as I said, I don't know of any way to trigger this sort of conflict below the top level right now, so this is a pretty theoretical benefit.

JordanMartinez · 2021-05-31T11:49:02Z

Simple code is always better than complex code if no other factors are at play.

The idea is that a name chosen by the programmer has priority over a variable produced by the compiler. A programmer using a debugger might not appreciate the variable x meaning something other than what they used it for. It's still a safe transformation either way, just potentially a surprising one to anyone reading the generated code or otherwise looking under the hood.

I think this is actually worth doing even if nothing triggers it now. If a programmer has said that some name is x, then it should be x in the compiled output. AFAIK, a GenIdent means the compiler generated it whereas Ident means the developer defined it. I think the decision to make GenIdents work around Idents rather than making later bindings work around earlier bindings better abides with the principle of readable debuggable code.

rhendric · 2021-06-01T16:36:45Z

New commit has a few changes:

I've implemented three passes everywhere. I think the code simplicity gains ended up being minimal, but positive.
I overlooked a few large mistakes that actually would have broken unnamed instances rather severely. 🤦 First, the renamer needs to rename variables that are qualified GenIdents now, because anything at the top level of a module gets qualified. Second, externs files weren't downstream of the renaming phase, so GenIdents there weren't getting renamed. Now there is some extra logic in externs generation that reuses the renaming map generated by the renamer.
I've added a bunch of tests for various cases that broke while I was experimenting. These are all basically copies of existing tests, but with some instance names removed.

rhendric · 2021-06-01T16:41:28Z

tests/purs/failing/OverlapAcrossModulesUnnamedInstance.out

+    OverlapAcrossModules.X.cX
+    OverlapAcrossModules.$cXY0


These error messages are not great. Neither of these instances has a programmer-provided name, and while at least the $ in the second instance suggests this, the first instance has nothing to indicate that the programmer shouldn't go literally searching for an instance named cX. But even in the second case, $cXY0 is not even the final name that will appear in the generated code!

I think the ideal thing here would be to describe these instances with something like this:

Suggested change

OverlapAcrossModules.X.cX

OverlapAcrossModules.$cXY0

instance in module OverlapAcrossModules.X with type forall y. C X y

instance in module OverlapAcrossModules with type C X Y

I'd like to address this in a later PR.

Sounds good!

src/Language/PureScript/Renamer.hs

JordanMartinez

AFAICT, this PR looks good. Could another core member look this over and provide their feedback?

tests/purs/failing/OverlappingUnnamedInstances.purs

JordanMartinez · 2021-06-18T13:50:20Z

I've fixed the merge conflict. Can we get another review here?

hdgarrood

Looks good!

hdgarrood · 2021-06-20T11:58:45Z

lib/purescript-cst/src/Language/PureScript/CST/Convert.hs


      className :: Text.Text
-      className = N.runProperName $ qualName cls
+      className = foldMap (uncurry Text.cons . first toLower)
+                . Text.uncons


Bit of a nit and I know that this style isn't universal in the code base already, but for new code I think it's good practice to avoid having indentation depend on the length of an identifier. In this case, if we wanted to rename className, we'd also have to touch every line in its implementation to realign the code.

Oh, I'm completely on board with that.

Maybe we should start a wiki page with our current/evolving notions of good style? I frequently find myself waffling between a style that is more idiomatic/traditional Haskell versus one which has practical benefits like this, and not having a dominant style in the code itself doesn't help. (Another example: equational declaration style versus lambda case—I think the latter is usually superior where applicable, again because renaming the function touches fewer lines.)

I think that would be good 👍

Rather than a wiki page, should it be included in the main repo?

Another thing we could add to that is to use record types whenever something has 3+ args.

I'm thinking there isn't much advantage to versioning the standard along with the code—particularly while we're hammering out what the standard is, that could generate a lot of noise in the commit history. Maybe if the wiki page stabilizes, it makes sense to add it then, and then gatekeep it with the same PR workflow we use for code. But for starting out, I think we want something that we can feel more fluid about rewriting.

Using a wiki page initially and then merging it into the repo when it stabilizes sounds like a good tradeoff.

hdgarrood · 2021-06-20T11:59:20Z

src/Language/PureScript/Externs.hs

-moduleToExternsFile :: Module -> Environment -> ExternsFile
-moduleToExternsFile (Module _ _ _ _ Nothing) _ = internalError "moduleToExternsFile: module exports were not elaborated"
-moduleToExternsFile (Module ss _ mn ds (Just exps)) env = ExternsFile{..}
+moduleToExternsFile :: Module -> Environment -> M.Map Ident Ident -> ExternsFile


Do you think it would be good to add a comment here which clarifies what the new Map argument is for? Even if it's just a pointer to the rename step of desugaring.

hdgarrood · 2021-06-20T12:32:39Z

tests/purs/failing/OverlappingUnnamedInstances.out

+  [33m               [0m
+  The following instances were found:
+
+    Main.$test1


This probably shouldn't happen in this PR, but now that instance names are optional we should probably include source spans in this list so that people can find the problematic instances a little more easily.

This fixes the limitation of the CoreFn renamer which prevented it from renaming top-level GenIdents. As a consequence, we can now give unnamed instances more idiomatic names and still guarantee that they will be unique in their module.

…ass (#4099) This refactors the work done in #4085. It was not merged immediately due to the work done in #4096 that would produce a merge conflict.

hdgarrood · 2021-06-24T13:48:45Z

After a bug report on FP Slack in which someone ran into a case where an instance name had a $13 suffix in the module where it was defined but some code was trying to access the same instance with a $12 suffix, I am wondering if the renamer was deliberately leaving top level declarations out before, and whether we might want to backtrack here because this limits our ability to perform cutoff in incremental rebuilds, since the answer to question of “has the public interface of the module changed” is much more likely to become “in a way no, but we have to treat it as if it has, because of renaming.”

hdgarrood · 2021-06-24T13:49:46Z

#3724

rhendric · 2021-06-24T14:16:58Z

If I understand the issue correctly, isn't it substantially worse without GenIdent than with?

Prior to this PR, generated instance names always had a numeric suffix, and it depended on the state of the supply monad, which means any change that pulls one more or fewer number from the supply at any earlier point during compilation would affect the naming.

Post this PR, generated instance names only change if there is a top-level name conflict, which is rare to begin with, and much less likely to change mid-development. (It can happen, but any name generation strategy runs some risk of deoptimizing incremental builds.)

hdgarrood · 2021-06-24T14:26:49Z

Right, ok, that makes sense, and I agree that this is a substantial improvement. I wasn’t objecting to this PR specifically, rather the approach of using statefulness in top level identifiers at all - I’m wondering whether instance names ought to be more deterministic than they are even after this PR, i.e. whether we should work out an algorithm which guarantees uniqueness without having to depend on statefulness / what else is inside the module. Of course readability will suffer that way, but I think reliability of incremental builds ought to win over readability when they are in conflict. But maybe this is enough of an edge case to not really be necessary; perhaps we should just get this change shipped instead.

rhendric · 2021-06-24T14:40:28Z

If we gave anonymous instances unreadable names—I assume you're thinking something like a robust hash of the fully qualified class name and argument types—perhaps we could restore some readability by also generating friendly-named local vars inside the module and at any import sites? Then downstream modules, if not recompiled, would still function correctly, even though they'd potentially have the ‘wrong’ internal friendly name sometimes.

I'm not crazy about that approach; seems like a fair whack of complexity to take on for an edge case in a compile-time performance feature. But we could pivot into it in the future if this ends up presenting a practical issue for cutoff?

rhendric · 2021-06-24T14:42:33Z

(Oh, and isn't it just a bug that incremental builds weren't triggered for downstream modules when $12 became $13? Is that tracked somewhere?)

rhendric · 2021-06-24T14:47:01Z

proceeds to seriously consider implementing the idea he just described as not being crazy about

JordanMartinez · 2021-06-24T15:16:50Z

Of course readability will suffer that way, but I think reliability of incremental builds ought to win over readability when they are in conflict.

But if one really wanted readable instance names, they could always provide it themselves, right?

hdgarrood · 2021-06-24T15:17:44Z

It might be a bug in incremental compilation, but I think it’s more likely that it’s an instance of #3323 - “i recompiled a downstream module on its own using the ide server and didn’t think the upstream module would need recompiling because the module interface ought not to have changed,” which is arguably not a bug, but somewhat problematic if people can’t reliably guess when a module interface has changed.

f-f · 2021-06-24T21:04:36Z

Why not generate names that contain both a readable name and a hash? In this way they won't become too unreadable.

rhendric force-pushed the rhendric/use-genident-for-anonymous-instances branch 3 times, most recently from d078f0e to 2b9009c Compare May 29, 2021 04:15

JordanMartinez reviewed May 29, 2021

View reviewed changes

rhendric force-pushed the rhendric/use-genident-for-anonymous-instances branch from a974eb4 to 2b9009c Compare May 30, 2021 20:37

rhendric commented Jun 1, 2021

View reviewed changes

JordanMartinez reviewed Jun 2, 2021

View reviewed changes

src/Language/PureScript/Renamer.hs Show resolved Hide resolved

JordanMartinez approved these changes Jun 2, 2021

View reviewed changes

JordanMartinez mentioned this pull request Jun 2, 2021

Desugar type class instance names in type class desugaring pass #4099

Merged

5 tasks

triallax reviewed Jun 3, 2021

View reviewed changes

tests/purs/failing/OverlappingUnnamedInstances.purs Outdated Show resolved Hide resolved

rhendric force-pushed the rhendric/use-genident-for-anonymous-instances branch from d818ca6 to 6957dab Compare June 5, 2021 19:10

hdgarrood approved these changes Jun 20, 2021

View reviewed changes

rhendric added 2 commits June 20, 2021 17:27

Use GenIdent for anonymous instances

37cf06a

This fixes the limitation of the CoreFn renamer which prevented it from renaming top-level GenIdents. As a consequence, we can now give unnamed instances more idiomatic names and still guarantee that they will be unique in their module.

fixup! Use GenIdent for anonymous instances

c801acd

rhendric force-pushed the rhendric/use-genident-for-anonymous-instances branch from 92ddb51 to c801acd Compare June 20, 2021 21:28

rhendric merged commit 936d8d2 into purescript:master Jun 20, 2021

rhendric deleted the rhendric/use-genident-for-anonymous-instances branch June 20, 2021 22:07

This was referenced Jun 22, 2021

Remove generated names from errors about instances #4118

Merged

Float compiler-synthesized function applications #3915

Merged

rhendric mentioned this pull request Jun 26, 2021

Update unnamed instances changelog entry #4122

Merged

5 tasks

rhendric added a commit to rhendric/purescript that referenced this pull request Jul 3, 2021

Use anteaters to clean up kitchen (purescript#4096)

0612bf2

rhendric added a commit to rhendric/purescript that referenced this pull request Jul 3, 2021

Use anteaters to clean up kitchen (purescript#4096)

2d22eac

rhendric added a commit to rhendric/purescript that referenced this pull request Jul 3, 2021

Use anteaters to clean up kitchen (purescript#4096)

e4da1fa

rhendric added a commit to rhendric/purescript that referenced this pull request Jul 6, 2021

Use anteaters to clean up kitchen (purescript#4096)

824107f

rhendric added a commit to rhendric/purescript that referenced this pull request Jul 11, 2021

Use anteaters to clean up kitchen (purescript#4096)

870aac1

rhendric added a commit to rhendric/purescript that referenced this pull request Jul 11, 2021

Use anteaters to clean up kitchen (purescript#4096)

19018cc

JordanMartinez mentioned this pull request Jan 18, 2022

Update version to v0.15.0 working-group-purescript-es/purescript#4

Closed

Use GenIdent for anonymous instances #4096

Use GenIdent for anonymous instances #4096

Conversation

rhendric commented May 29, 2021 • edited Loading

JordanMartinez left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

rhendric commented May 30, 2021

JordanMartinez commented May 30, 2021

rhendric commented May 31, 2021

JordanMartinez commented May 31, 2021

rhendric commented Jun 1, 2021

Choose a reason for hiding this comment

Choose a reason for hiding this comment

JordanMartinez left a comment

Choose a reason for hiding this comment

JordanMartinez commented Jun 18, 2021

hdgarrood left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

hdgarrood commented Jun 24, 2021 • edited Loading

hdgarrood commented Jun 24, 2021

rhendric commented Jun 24, 2021

hdgarrood commented Jun 24, 2021

rhendric commented Jun 24, 2021

rhendric commented Jun 24, 2021

rhendric commented Jun 24, 2021

JordanMartinez commented Jun 24, 2021

hdgarrood commented Jun 24, 2021

f-f commented Jun 24, 2021

rhendric commented May 29, 2021 •

edited

Loading

hdgarrood commented Jun 24, 2021 •

edited

Loading