Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SIP-52 - Binary APIs #58

Merged
merged 17 commits into from
Oct 23, 2023
Merged
Changes from 3 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
212 changes: 212 additions & 0 deletions content/binary-api.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,212 @@
---
layout: sip
permalink: /sips/:title.html
stage: pre-sip
status: submitted
title: SIP-52 - Binary APIs
---

**By: Author Nicolas Stucki**

## History

| Date | Version |
|---------------|--------------------|
| Feb 27 2022 | Initial Draft |

## Summary

This proposal introduces the `@binaryAPI` and `@binaryAPIAccessor` annotations on term definitions. The purpose of binary APIs is to have publicly accessible definitions in generated bytecode for definitions that are private or protected.


## Motivation

### Provide a sound way to refer to private members in inline definitions

Currently, the compiler automatically generates accessors for references to private members in inline definitions. This scheme interacts poorly with binary compatibility. It causes the following three unsoundness in the system:
* Changing any definition from private to public is a binary incompatible change
* Changing the implementation of an inline definition can be a binary incompatible change
* Removing final from a class is a binary incompatible change

You can find more details in https://github.com/lampepfl/dotty/issues/16983
julienrf marked this conversation as resolved.
Show resolved Hide resolved
nicolasstucki marked this conversation as resolved.
Show resolved Hide resolved

### Avoid duplication of inline accessors

Ideally, private definitions should have a maximum of one inline accessor, which is not the case now.
When an inline method accesses a private/protected definition that is defined outside of its class, we generate an inline in the class of the inline method. This implies that accessors might be duplicated if a private/protected definition is accessed from different classes.

### Removing deprecated APIs

There is no precise mechanism to remove a deprecated method from a library without causing binary incompatibilities. We should have a straightforward way to indicate that a method is no longer publicly available but still available in the generated code for binary compatibility.
Copy link

@JD557 JD557 May 11, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this an actual problem?

Based on the Semantic Versioning spec and a discussion with clarifications, my understanding is that the semver way would require a MAJOR version bump to remove a deprecated method.

Binary incompatibility of different major versions seems like a reasonable expectation.

There are obviously other version schemes other than semver, but I imagine the handling of deprecated APIs is similar (e.g. I think PVP is even more extreme, where both deprecating and removing the API would require a major bump)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, this is an actual problem because the Scala ecosystem, and therefore use of SemVer, is rooted in binary and TASTy compatibility. You want to be able to make source breaking changes while retaining binary and TASTy compatibility, and that should stay a MINOR version bump. Perhaps you are mitigating the source breakage with extension methods or new overloads instead.

(If breaking source compat required a MAJOR version bump in Scala, every new version of a library with a new public class or method would be a MAJOR version, and we don't want that for obvious reasons.)


```diff
- @deprecated(...) def myOldAPI: T = ...
+ private[C] def myOldAPI: T = ...
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be private[pkg] ? The behavior of private[C] is to generate a name-mangled definition C$$myOldAPI to avoid clashes in case a subclass defines its own myOldAPI.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure in which cases private[C] generates a mangled name. I tried the following and non of the names get mangled.

class C:
  private[C] def f = 1
object C:
  private[C] def f = 1
trait B:
  private[B] def f = 1
class A extends B

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do get the mangling for private[this] val in traits.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried the following and non of the names get mangled.

Ah, it looks like the mangling only happens in Scala 2, not Scala 3. So that's still an issue for cross-compiling codebases.

```
Comment on lines +41 to +48

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe this ties-in well with the discussion in lightbend-labs/mima#724. To summarize:

  1. MiMa currently ignores changes to private[package] APIs, because the assumption is that these are internal to the library. This improves the signal-to-noise when making such internal changes.
  2. As noted here, package[private] is often used for removing a public API without breaking bincompat. But, due to (1), MiMa will no longer check these now-package-private APIs for binary-compatibility, which is dangerous. The workaround is to check MiMa against every previous artifact, but this has its own costs/risks.
  3. By adding an explicit annotation to denote public APIs, MiMa would be able to distinguish between internal APIs and once-public-now-removed APIs that must otherwise maintain binary-compatibility.

So, I hope that integration with MiMa will be considered as part of this proposal :)

Big 👍

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, this proposal was partially designed to address that MiMa problem.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added the link lightbend-labs/mima#724 to the SIP document.


Related to discussion in https://github.com/lightbend/mima/discussions/724
nicolasstucki marked this conversation as resolved.
Show resolved Hide resolved

## Proposed solution

### High-level overview

This proposal introduces 2 the `@binaryAPI` and `@binaryAPIAccessor` annotations, and changes adds a migration path to inline methods.
nicolasstucki marked this conversation as resolved.
Show resolved Hide resolved
julienrf marked this conversation as resolved.
Show resolved Hide resolved

#### `@binaryAPI` annotation

A binary API is a definition that is annotated with `@binaryAPI` or overrides a definition annotated with `@binaryAPI`.
This annotation can be placed on `def`, `val`, `lazy val`, `var`, `object`, and `given` definitions.
A binary API will be publicly available in the bytecode.

This annotation cannot be used on `private`/`private[this]` definitions.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If this is because those get mangled, then it should also be disallowed on private[C] definitions.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mangled names are not an issue. The problem with these is that they compile to private methods in the bytecode. The names of those methods are not mangled, and therefore a class and its parent could have two unrelated private definitions with the same name. If we make them public in the bytecode, they would clash.


Removing this annotation from a non-public definition is a binary incompatible change.

Example:

~~~ scala
class C {
@binaryAPI private[C] def packagePrivateAPI: Int = ...
@binaryAPI protected def protectedAPI: Int = ...
@binaryAPI def publicAPI: Int = ... // warn: `@binaryAPI` has no effect on public definitions
}
~~~
will generate the following bytecode signatures
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since we're talking about bytecode it would be good to actually specify what we're generating at the bytecode level exactly. I assume that public in the code example below maps to ACC_PUBLIC in https://docs.oracle.com/javase/specs/jvms/se8/html/jvms-4.html, but we might also want to use ACC_SYNTHETIC so that these definitions stay hidden from javac and java IDEs.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure if adding ACC_SYNTHETIC is always a good idea.

  • @binaryAPI def publicAPI: Int = ... should not be marked as ACC_SYNTHETIC because it is truly public.
  • Would @binaryAPI protected def publicAPI: Int = ... overridable in JAVA if it has ACC_SYNTHETIC?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would @binaryapi protected def publicAPI: Int = ... overridable in JAVA if it has ACC_SYNTHETIC?

Yeah if you want the method to be overridable then it can't be ACC_SYNTHETIC.

~~~ java
public class C {
public C();
public int packagePrivateAPI();
public int protectedAPI();
public int publicAPI();
}
~~~

#### `@binaryAPIAccessor` annotation

A binary API with accessor is a definition that is annotated with `@binaryAPIAccessor`.
This annotation can be placed on `def`, `val`, `lazy val`, `var`, `object`, and `given` definitions.
The annotated definition will get a public accessor.

This can be used to access `private`/`private[this]` definitions within inline definitions.
Copy link
Member

@smarter smarter Mar 11, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about access to inner private classes?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same question for primary and secondary private constructors.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about access to inner private classes?

We currently do not allow them in inline methods. Not sure if we could use @binaryAPI to make the inner class public in the bytecode. I will investigate this possibility.

Same question for primary and secondary private constructors.

Same for the constructors. We would require @binaryAPI and not @binaryAPIAccessor in this case as well.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

19165b1 adresses the question on constructors. scala/scala3#16992 Has been updated to reflect this addition.


Example:
~~~ scala
class C {
@binaryAPIAccessor private def privateAPI: Int = ...
@binaryAPIAccessor def publicAPI: Int = ...
}
~~~
will generate the following bytecode signatures
~~~ java
public class C {
public C();
private int privateAPI();
public int publicAPI();
public final int C$$inline$privateAPI();
public final int C$$inline$publicAPI();
}
~~~

Note that the change from `private[this]` to package private, protected or public is a binary compatible change.
Removing this annotation is a binary incompatible change.

#### Binary API and inlining

If there is a reference to a binary API in an inline method we can use the definition without needing an inline accessor.

Example 3:
~~~ scala
class C {
@binaryAPI protected def a: Int = ...
protected def b: Int = ...
inline def foo: Int = a + b
}
~~~
before inlining the compiler will generate the accessors for inlined definitions
~~~ scala
class C {
@binaryAPI protected def a: Int = ...
protected def b: Int = ...
final def C$inline$b: Int = ...
inline def foo: Int = a + C$inline$b
}
~~~

Note that if the inlined member is `a` would be private, we would generate the accessor `C$inline$a`, which happens to be binary compatible with the automatically generated one.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand this sentence (also I think there is a typo: "is a" should be "a"?), could an example be provided?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated the example

This is only a tiny mitigation of binary compatibility issues compared with all the different ways accessors can be generated.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you elaborate on this?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated the example


### Specification

We must add `binaryAPI` and `binaryAPIAccessor` to the standard library.

```scala
package scala.annotation

final class binaryAPI extends scala.annotation.StaticAnnotation
final class binaryAPIAccessor extends scala.annotation.StaticAnnotation
```

#### `@binaryAPI` annotation

* Only valid on `def`, `val`, `lazy val`, `var`, `object`, and `given`.
* TASTy will contain references to non-public definitions that are out of scope but `@binaryAPI`. TASTy already allows those references.
* Annotated definition will be public in the generated bytecode. Definitions should be made public as early as possible in the compiler phases, as this can remove the need to create other accessors. It should be done after we check the accessibility of references.
nicolasstucki marked this conversation as resolved.
Show resolved Hide resolved


#### `@binaryAPIAccessor` annotation

* Only valid on `def`, `val`, `lazy val`, `var`, `object`, and `given`.
* An public accessor will be generated for the annotated definition. This accessor will be named `<fullClassName>$$inline$<definitionName>`.
nicolasstucki marked this conversation as resolved.
Show resolved Hide resolved

#### Inline

* Inlining will not require the generation of an inline accessor for binary APIs.
* Inlining will not require the generation of a new inline accessor, it will use the binary API accessors.
* The user will be warned if a new inline accessor is automatically generated.
The message will suggest `@binaryAPI` or `@binaryAPIAccessor` and how to fix potential incompatibilities.
Copy link
Member

@smarter smarter Mar 11, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will the message suggest using either of them? It seems we should always suggest using the latter, because it would break the ABI of existing code called from inline defs if we add the former.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Both can break the binary API.

  • @binaryAPI migration will always break binary compatibility.
  • Only if the is private/private[this] need @binaryAPIAccessor. These can break binary compatibility if the class is final.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code like this

class A:
  private[foo] val foo: Int = 1
  inline def inlined = foo // error

would show an error message like this

-- Error: inline-unstable-accessors.scala:3:4 ------------------------------------
10 |    valBinaryAPI + // error
   |    ^^^^^^^^^^^^^
   |    Generated unstable inline accessor for value foo defined in class A.
   |
   |    Annotate foo with `@binaryAPI` to make it accessible.
   |
   |    Adding @binaryAPI may break binary compatibility if a previous version of this
   |    library was compiled with Scala 3.0-3.3, Binary compatibility should be checked
   |    using MiMa. To keep binary you can add the following accessor to class A:
   |      @binaryAPI private[A] def foo$A$$inline$foo: Int = this.foo

More examples can be found in https://github.com/lampepfl/dotty/pull/16992/files#diff-d12e949663e52759a532bf54a827260b0513390bfcdebddf5ff1c1bd04919aaa

In a future version, these will become an error.

### Compatibility

The introduction of the `@binaryAPI` and `@binaryAPIAccessor` do not introduce any binary incompatibility.

Using references to `@binaryAPI` and `@binaryAPIAccessor` in inline code can cause binary incompatibilities. These incompatibilities are equivalent to the ones that can occur due to the unsoundness we want to fix. When migrating to binary APIs, the compiler will show the implementation of accessors that the users need to add to keep binary compatibility with pre-binaryAPI code.

A definition can be both `@binaryAPI` and `@binaryAPIAccessor`. This would be used to indicate that the definition used to be private, but now we want to publish it as public. The definition would become public, and the accessor would be generated for binary compatibility.
Copy link
Member

@lrytz lrytz Mar 24, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the compiler will show the implementation of accessors that the users need to add to keep binary compatibility

...

the accessor would be generated for binary compatibility

IIUC, currently accessors are generated next to the inline method, whereas @binaryAPIAccessor wil generate an accessor next to the annotated private method, so that migration cannot be made in a binary compatbile manner - right?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

currently accessors are generated next to the inline method

This is the case if the inline method is in a different class from the accessed member.

On the other hand, if the inline method is in the class the accessor will be placed in the class that defines the accessed members, similar to @binaryAPIAccessor. In some cases, these accessors are the same and we avoid binary incompatibilities in the migration from auto-generated to @binaryAPIAccessor (example 1).

@binaryAPIAccessor wil generate an accessor next to the annotated private method, so that migration cannot be made in a binary compatbile manner?

In general, the migration can't be made binary-compatible. We chose to reuse the accessor name <fullClassName>$$inline$<definitionName> to be binary compatible with some auto-generated accessors. The accessors that are binary compatible are the ones in a non-final class that are generated by an inline method in that class (example 1).

Example 1 (compatible):

class A(x: Int):
  inline def y = x + 1 // inline def y = A$$inline$x + 1
  // def A$$inline$x = x // auto-generated

migrates to

class A(@binaryAPIAccessor x: Int):
  inline def y = x + 1 // inline def y = A$$inline$x + 1
  // def A$$inline$x = x // generated by @binaryAPIAccessor

Example 2 (incompatible):

final class A(x: Int):
  inline def y = x + 1 // inline def y = inline$x + 1
  // def inline$x = x // auto-generated

migrates to

class A(@binaryAPIAccessor x: Int):
  inline def y = x + 1 // inline def y = A$$inline$x + 1
  def inline$x = x // user needs to add this. Migration warning shows tells the users shows this code
  // def A$$inline$x = x // generated by @binaryAPIAccessor

Example 3 (incompatible):

package p
final class A(private[p] x: Int)
package p
inline def f(a: A): Int = a.x // inline def f(a: A): Int = A$$inline$x(a) 
// def A$$inline$x(a: A) = a.x // auto-generated

migrates to

package p
final class A(@binaryAPI private[p] x: Int)
package p
inline def f(a: A): Int = a.x // inline def f(a: A): Int = a.x
def A$$inline$x(a: A) = a.x // user needs to add this. Migration warning shows tells the users shows this code

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In general, the migration can't be made binary-compatible

In that case, it's maybe not worth introducing both @binaryAPI and @binaryAPIAccessor? Is it always possible to start using @binaryAPI and maintain binary compatibility by manually adding the accessors that were previously generated by the compiler?

Copy link
Contributor Author

@nicolasstucki nicolasstucki Apr 12, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The issue with that approach is that users would need to change all their private/private[this] members that are accessed from inline methods to protected/private[Cls]/pribate[pack]. This can cause source incompatibilities and unintentionally leak private members beyond the scope of inlining.

Copy link
Member

@lrytz lrytz Apr 12, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand, I'm sorry. Where am I wrong?

  • Currently, if a private method is accessed in an inline def, the compiler generates an accessor to the private method
  • The user can annotate the method @binaryAPI (but keep it private in source), which makes it public in bytecode, and the accessor is no longer generated
  • To maintain binary compatibility, the user can write the accessor that was previously generated into the source code

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The user can annotate the method @binaryapi (but keep it private in source), which makes it public in bytecode, and the accessor is no longer generated

@binaryAPI cannot be placed on private/private[this] APIs. This is because that change could be a source or binary incompatible change. For example:

class A:
  @binaryAPI private val x: Int = 1 // should not be allowed
  def f: Int = x
class B extends A:
  val x: Int = 2

If we make x public in the bytecode, we would accidentally make B.x override A.x which would change the semantics of a private definition. Or we would have to flag them as overrides of private members, which implies that we would have to rename anything that would clash with a @binaryAPI private. In this case we would not be able to use the name x in B anymore.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, right, thanks. Was name-mangling considered?

"Only add @binaryAPIAccessor" could also be an option. The concern

The drawback is that we would add code size and runtime overhead to all uses of this feature

is maybe minor? Accessors are very common in Scala, why would these ones be problematic?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"Only add @binaryAPIAccessor" could also be an option.

If we do that we would not cover 2 of the motivations: Removing deprecated APIs and No way to inline reference to private constructors. Those use cases could be supported later with the addition of @binaryAPI.

If we add @binaryAPI later there is also a migration concern. In a first step users would add @binaryAPIAccessor to all the definitions, but then when @binaryAPI they might also want to add that one to access the member directly. This would force them to have both @binaryAPIAccessor and @binaryAPI. If we add both at the same time, then they would be able to only use @binaryAPI.

Ah, right, thanks. Was name-mangling considered?

I assume with @binaryAPIAccessor of private members. The issue is that we would need to mangle the names of private members but not public/protected members. This implies that if the user changes a private member into a public one the accessor would not be generated anymore. This is the exact same issue we have right now but triggered in a different way.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the details. I don't want to be annoying and keep discussing details here, the discussion should move elsewhere?

I think introducing two annotations is bad. Users will have to pick one or the other, they have to know how they both work and how using them affects binary compatibility.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do not see a scenario where a user would need to think about which annotation to choose. In the scenario of Removing deprecated APIs they can only use @binaryAPI. In all other scenarios with inlining, the users would be told which annotation they need to use.


### Other concerns

* Tools that analyze inlined TASTy code might need to know about `@binaryAPI`. For example [MiMa](https://github.com/lightbend/mima/) and [TASTy MiMa](https://github.com/scalacenter/tasty-mima).

### Open questions

#### Question 1
Should `@binaryAPIAccessor` accessors be named `<fullClassName>$$<definitionName>`? This encoding would match the names of `trait` accessor generated for private definition. We could use a single accessor instead of two. This would introduce an extra binary incompatibility with pre-binaryAPI code.

#### Question 2
```scala
class A:
@binaryAPIAccessor protected def protectedDef: Int = ...
class B extends A:
override protected def protectedDef: Int = ...
inline def inlinedDef: Int =
// Should this use the accessor of generated for `A.protectedDef`? Or should we warn that `protectedDef` should be a `@binaryAPI`
protectedDef
```

## Alternatives

Having alternatives is not a strict requirement for a proposal, but having at least one with carefully exposed pros and cons gives much more weight to the proposal as a whole. -->

### Only add `@binaryAPI`
This would simplify the system and the user interaction with this feature. The drawback is that we could not access `private[this]` definitions in inline code. Users would need to use `private[C]` instead, which could cause name clashes.

### Only add `@binaryAPIAccessor`
This would simplify the system and the user interaction with this feature. The drawback is that we would add code size and runtime overhead to all uses of this feature. It would not solve the [Removing deprecated APIs](#removing-deprecated-apis) motivation.

## Related work

* Proof of concept: https://github.com/lampepfl/dotty/pull/16992
nicolasstucki marked this conversation as resolved.
Show resolved Hide resolved
* Initial discussions: https://github.com/lampepfl/dotty/issues/16983
nicolasstucki marked this conversation as resolved.
Show resolved Hide resolved

<!-- ## FAQ -->