Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[compiler] add newtype for bound names in IR #14547

Merged
merged 1 commit into from
May 16, 2024
Merged

Conversation

patrick-schultz
Copy link
Collaborator

@patrick-schultz patrick-schultz commented May 10, 2024

This PR introduces a new type Name for representing bound variables in the IR, replacing String. For now, it is just an AnyVal wrapper around String, but in the future I would like to take advantage of the new type. For example, I'd like to:

  • change equality of Name from string comparison to comparing object identity with eq. That way freshName becomes just new Name(), with stronger guarantees that the new name doesn't occur anywhere in the current IR, without needing to maintain global state as we do now.
  • get rid of NormalizeNames, instead enforcing the global uniqueness of names as a basic invariant of the IR (typecheck could also check this invariant)
  • keep a string in the Name, but no longer require it to be unique. Instead it's just a suggestion for how to show the name in printouts, adding a uniqueifying suffix as needed. With NormalizeNames gone, this would let us preserve meaningful variable names further in the lowering pipeline.
  • possibly keep other state in the Name, for example to allow a more efficient implementation of environments, similar to the mark state on BaseIR

This is obviously a large change, but there are only a few conceptual pieces (appologies for not managing to separate these out):

  • attempt to minimize the number of locations in which the Name constructor is called, to make future refactorings easier
  • add freshName(), which just wraps genUID(), returning a Name
  • convert IR construction to use the convenience methods in ir.package, which take scala lambdas to represent blocks with bound variables, instead of manually creating new variable names
  • replace uses of the magic constant variable names (row, va, sa, g, global) with constants (TableIR.{rowName, globalName}, MatrixIR.{rowName, colName, entryName, globalName})
  • the above changes modified the names we use for bound variables in many places. That shouldn't matter, but it cought a couple bugs where it did.
    • NormalizeNames optionally allows the IR to contain free variables. But it didn't do anything to ensure the newly generated variable names are distinct from any contained free variables. Thus it was possible to rename a bound variable to mistakenly capture a contained free variable. I've fixed that.
    • SimplifySuite compared simplified IR with the pre-constructed expected IR, carefully controlling the genUID global state to make simplify generate exactly the names expected. I've replaced that by just comparing with the expected IR using a alpha-equivalence comparison.

Copy link
Collaborator Author

This stack of pull requests is managed by Graphite. Learn more about stacking.

Join @patrick-schultz and the rest of your teammates on Graphite Graphite

@patrick-schultz patrick-schultz changed the title add newtype for bound names [compiler] add newtype for bound names in IR May 10, 2024
@patrick-schultz patrick-schultz force-pushed the variable-name-newtype branch 3 times, most recently from 38496f3 to e682651 Compare May 14, 2024 15:03
@patrick-schultz patrick-schultz marked this pull request as ready for review May 14, 2024 19:58
Copy link
Collaborator Author

cc @ehigham

Copy link
Collaborator

@ehigham ehigham left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is heroic.

Comment on lines +9 to +11
case class Name(str: String) extends AnyVal {
override def toString: String = str
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Think you can do this in one line :)

Suggested change
case class Name(str: String) extends AnyVal {
override def toString: String = str
}
case class Name(override val toString: String) extends AnyVal

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like your idea of using object equality rather than string equality.
Keeping the string, however, might allow us to better keep track of where a name came from through lowering.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Absolutely! From the description:

keep a string in the Name, but no longer require it to be unique. Instead it's just a suggestion for how to show the name in printouts, adding a uniqueifying suffix as needed. With NormalizeNames gone, this would let us preserve meaningful variable names further in the lowering pipeline.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Think you can do this in one line :)

I could... but I'd rather not commit to this choice of toString permanently.

@hail-ci-robot hail-ci-robot merged commit 8a3c60d into main May 16, 2024
2 checks passed
@hail-ci-robot hail-ci-robot deleted the variable-name-newtype branch May 16, 2024 00:52
hail-ci-robot pushed a commit that referenced this pull request Jun 28, 2024
…4579)

I believe #14547 introduced a bug that broke IR function deserialization
in QoB by changing `value_parameter_names` from `Array[String]` to
`Array[Name]`. This fixes the issue by pushing the introduction of the
`Name` wrapper object to after deserialization. Another option is to
incorporate the `{ str: String }` structure of `Name` into the python ->
scala payload, but I'm not sure I see a use case for that and we can
always do that later (there is no issue of backwards compatibility with
this communication between python and scala).

My main concern here is that clearly this isn't tested. I'd appreciate
guidance on the current advised practice for testing this behavior.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants