Add support for more types #51

Closed

mcimadamore
Collaborator

@mcimadamore mcimadamore commented Apr 15, 2024

This PR adds support for type-variables and wildcard type arguments in the code model JavaType's hierarchy.

This allows the code model to reflect the source types much more accurately, as we no longer need to erase the source type at the first sign of a non-denotable type. Instead, we can use a modified (see below) version of the Types::upwards function (type projection) to compute the closest denotable upper bound to the type found in the source code. This means that the type associated with every op in the model is a (denotable) supertype of the type in the javac AST. The fact that this type is denotable has three important consequences:

  • the type can be expressed in the source code (in case the code model needs to be lifted back into Java source)
  • the type must be expressible in the syntax of bytecode signature attributes (this is important e.g. for the local variable type attribute)
  • the type can be resolved to its runtime counterpart in j.l.r.Type (not implemented in this PR), as explained below

Some parser changes were required to support this, so that we can serialize and deserialize the new types accordingly.

A new method has been added to JavaType, namely JavaType::erasure, which computes the erasure of a JavaType. This might come in handy when lowering the model into bytecode. Since supporting erasure is crucial, the modelling of types has been carefully chosen so as to facilitate that operation as much as possible: that is why, for example, TypeVariableRef contains the "principal" type-variable bound (so that we can define erasure for type-variables in a straightforward fashion, as the erasure of that principal bound).
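A minimal sketch of how erasure could be defined over such a hierarchy. The names used here (SketchType, ClassT, ArrayT, TypeVarT) are hypothetical stand-ins and not the JavaType API added by this PR; the sketch only illustrates why keeping the principal bound inside the type-variable makes its erasure a one-liner.

```java
import java.util.List;

// Hypothetical, simplified stand-ins for code model types (not the real JavaType API).
interface SketchType {
    SketchType erasure();
}

// A class type with optional type arguments, e.g. java.util.List<X>.
record ClassT(String name, List<SketchType> typeArguments) implements SketchType {
    public SketchType erasure() {
        // Erasing a class type drops all of its type arguments.
        return new ClassT(name, List.of());
    }
}

// An array type erases to an array of the erased component type.
record ArrayT(SketchType component) implements SketchType {
    public SketchType erasure() {
        return new ArrayT(component.erasure());
    }
}

// A type-variable that stores its principal bound.
record TypeVarT(String name, SketchType bound) implements SketchType {
    public SketchType erasure() {
        // The erasure of a type-variable is the erasure of its principal bound.
        return bound.erasure();
    }
}

class ErasureDemo {
    public static void main(String[] args) {
        SketchType comparableOfString = new ClassT("java.lang.Comparable",
                List.of(new ClassT("java.lang.String", List.of())));
        SketchType x = new TypeVarT("X", comparableOfString);
        System.out.println(x.erasure());
        System.out.println(new ArrayT(x).erasure());
    }
}
```

Wildcards are left out on purpose: in this simplified picture they only occur as type arguments, and those are dropped when a class type is erased.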

Denotable projections

The code model type associated with an op result is computed by applying a modified version of Types::upwards, that is, the function that implements type projections as specified in JLS 4.10.5. The original projection algorithm is designed to leave intersection types in place. While this is handy, as it maximizes the applicability of the type inferred for local variables declared with var, it is not suitable for the code model, where we'd like to get to a denotable type in the end (jshell has a similar problem, which was addressed in a more ad-hoc way).

It is generally possible to project an intersection type using only one of its bounds, e.g.

List<A & B>

Is projected to:

List<? extends A>

There are, however, problems when projecting intersection types that are on the right of some lower-bounded wildcard - e.g.

List<? super A & B>

In this case, projecting to List<? super A> is not valid, as List<? super A> is not a supertype of List<? super A & B>. For this reason, in these cases we have to fall back to an unbounded wildcard List<?>.
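The two cases above can be sketched on a toy model of type arguments. This is not javac's Types::upwards and every name below (Arg, Named, Intersection, Extends, Super, Unbounded, DenotableProjection) is hypothetical; it only mirrors the rule described here: keep one bound under an upper-bounded position, and give up to ? under a lower-bounded one.

```java
import java.util.List;

// Toy model of type arguments; hypothetical names, not the javac implementation.
interface Arg {}

record Named(String name) implements Arg {
    @Override public String toString() { return name; }
}

// A & B: a non-denotable intersection type.
record Intersection(List<Named> bounds) implements Arg {}

record Extends(Arg bound) implements Arg {
    @Override public String toString() { return "? extends " + bound; }
}

record Super(Arg bound) implements Arg {
    @Override public String toString() { return "? super " + bound; }
}

record Unbounded() implements Arg {
    @Override public String toString() { return "?"; }
}

final class DenotableProjection {
    // Projects a single type argument to a denotable one.
    static Arg project(Arg arg) {
        if (arg instanceof Intersection inter) {
            // List<A & B> is a subtype of List<? extends A>: keep the first bound only.
            return new Extends(inter.bounds().get(0));
        }
        if (arg instanceof Extends e && e.bound() instanceof Intersection inner) {
            // ? extends A & B is contained in ? extends A.
            return new Extends(inner.bounds().get(0));
        }
        if (arg instanceof Super s && s.bound() instanceof Intersection) {
            // ? super A & B is NOT contained in ? super A, so fall back to '?'.
            return new Unbounded();
        }
        return arg; // already denotable in this toy model
    }

    static String projectList(Arg arg) {
        return "List<" + project(arg) + ">";
    }

    public static void main(String[] args) {
        Arg aAndB = new Intersection(List.of(new Named("A"), new Named("B")));
        System.out.println(projectList(aAndB));
        System.out.println(projectList(new Super(aAndB)));
    }
}
```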

Runtime resolution

Support for runtime resolution of elements in the JavaType hierarchy is possible, as there is a subtype of j.l.r.Type for each of the subtypes in JavaType. The main problem is being able to resolve type-variables: in the current modelling, type-variable types only have a name, and names can be ambiguous. That is, it could be possible for a type-variable with the same name to be defined at different levels in the source code:

class Foo<X> { //1
    <X> void test() { ... } // 2
}

To allow for better disambiguation we need to add ownership information to the TypeVariableRef class. This could point to either another JavaType (if the type-variable is a class type-variable), or to a MethodRef in case the type-variable is defined in a method. In this PR I didn't want to tackle the problem of modelling this additional information (that will come in a follow-up PR). Once the proper ownership info is in place, we might add code to enable runtime resolution of JavaTypes.
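For comparison, java.lang.reflect already records this kind of ownership: each TypeVariable exposes the class or method that declared it through getGenericDeclaration(). The sketch below uses only standard reflection calls; the TypeVarOwners helper class is a hypothetical name introduced for illustration.

```java
import java.lang.reflect.Method;
import java.lang.reflect.TypeVariable;

class Foo<X> {               // 1: class type-variable X
    <X> void test() { }      // 2: method type-variable X, shadows 1
}

// Hypothetical helper showing that the two X's differ only by their owner.
final class TypeVarOwners {
    static TypeVariable<?> classX() {
        return Foo.class.getTypeParameters()[0];
    }

    static TypeVariable<?> methodX() {
        try {
            Method test = Foo.class.getDeclaredMethod("test");
            return test.getTypeParameters()[0];
        } catch (NoSuchMethodException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Same name, different generic declaration (a Class vs. a Method).
        System.out.println(classX().getName() + " declared by " + classX().getGenericDeclaration());
        System.out.println(methodX().getName() + " declared by " + methodX().getGenericDeclaration());
    }
}
```

A name-only type-variable in the code model cannot be mapped back to one of these two objects, which is why owner information is needed before runtime resolution can work.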

Update

After some consideration, I have also added support for ownership info in type-variables. A type variable reference can now have a parent method or class (the source element which declared the type-variable). In the former case, a MethodRef is used, in the latter case a JavaType is used. The string representation for type-variables is a tad convoluted. For class type-variables:

#Foo::X

While for method type-variables:

#Foo::bar()Baz::Z

The parser has been adjusted accordingly.
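As a rough illustration of the two string forms, the sketch below classifies an identifier by counting its :: separators. TypeVarId is a hypothetical helper, not the parser code in this PR, and it assumes the owner part itself never contains ::.

```java
// Hypothetical helper: splits "#Owner::Name" (class type-variable) or
// "#Owner::method()Ret::Name" (method type-variable) into owner and name.
record TypeVarId(String owner, String name, boolean methodOwned) {

    static TypeVarId parse(String identifier) {
        if (!identifier.startsWith("#")) {
            throw new IllegalArgumentException("Not a type-variable identifier: " + identifier);
        }
        // Assumption: "::" only ever appears as a separator, never inside the owner.
        String[] parts = identifier.substring(1).split("::");
        return switch (parts.length) {
            // #Foo::X -> class Foo declares X
            case 2 -> new TypeVarId(parts[0], parts[1], false);
            // #Foo::bar()Baz::Z -> method Foo::bar()Baz declares Z
            case 3 -> new TypeVarId(parts[0] + "::" + parts[1], parts[2], true);
            default -> throw new IllegalArgumentException("Bad type-variable identifier: " + identifier);
        };
    }

    public static void main(String[] args) {
        System.out.println(parse("#Foo::X"));
        System.out.println(parse("#Foo::bar()Baz::Z"));
    }
}
```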


Progress

  • Change must not contain extraneous whitespace

Reviewers

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.org/babylon.git pull/51/head:pull/51
$ git checkout pull/51

Update a local copy of the PR:
$ git checkout pull/51
$ git pull https://git.openjdk.org/babylon.git pull/51/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 51

View PR using the GUI difftool:
$ git pr show -t 51

Using diff file

Download this PR as a diff file:
https://git.openjdk.org/babylon/pull/51.diff

Webrev

Link to Webrev Comment

@bridgekeeper

bridgekeeper bot commented Apr 15, 2024

👋 Welcome back mcimadamore! A progress list of the required criteria for merging this PR into code-reflection will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

@openjdk

openjdk bot commented Apr 15, 2024

@mcimadamore This change now passes all automated pre-integration checks.

ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details.

After integration, the commit message for the final commit will be:

Add support for more types

Reviewed-by: psandoz

You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed.

At the time when this comment was updated there had been 11 new commits pushed to the code-reflection branch:

As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details.

➡️ To integrate this PR with the above commit message to the code-reflection branch, type /integrate in a new comment.

@openjdk

openjdk bot commented Apr 15, 2024

@mcimadamore this pull request can not be integrated into code-reflection due to one or more merge conflicts. To resolve these merge conflicts and update this pull request you can run the following commands in the local repository for your personal fork:

git checkout projections
git fetch https://git.openjdk.org/babylon.git code-reflection
git merge FETCH_HEAD
# resolve conflicts and follow the instructions given by git merge
git commit -m "Merge code-reflection"
git push

@openjdk openjdk bot added the merge-conflict Pull request has merge conflict with target branch label Apr 15, 2024
- Type quotedReturnType = new ClassType(null,
-         com.sun.tools.javac.util.List.of(quotedOpType), syms.quotedType.tsym);
- MethodType mtype = new MethodType(nil, quotedReturnType, nil, syms.methodClass);
+ MethodType mtype = new MethodType(nil, syms.quotedType, nil, syms.methodClass);
Collaborator Author

This code seemed to try to parameterize the Quoted type, which is no longer a generic type. This was causing a crash in the logic for computing the set of captured variables of a given type (types::captures).

This change is what caused the fixes in the two reflect/code tests, as the tests were also expecting a parameterized Quoted type.

*/
public final class TypeVarRef implements JavaType {

// @@@: how do we encode tvar owner?
Collaborator Author

@mcimadamore mcimadamore Apr 15, 2024

As the comment indicates, ideally a type-variable reference should also point to its owner (a type or a method). I'm not 100% sure how to encode that in the TypeElement structure (see also the toplevel PR summary).

Collaborator Author

This is now handled as part of 52fc6e9

@openjdk openjdk bot removed the merge-conflict Pull request has merge conflict with target branch label Apr 15, 2024
Tweak type projection to eliminate intersections/unions
Tweak tests
Add erasure method
@mcimadamore mcimadamore changed the title Add support for non-denotable types Add support for more types Apr 23, 2024
@mcimadamore mcimadamore marked this pull request as ready for review April 23, 2024 13:08
@openjdk openjdk bot added ready Pull request is ready to be integrated rfr Pull request is ready for review labels Apr 23, 2024
@mlbridge

mlbridge bot commented Apr 23, 2024

Webrevs

@@ -2236,22 +2240,15 @@ FieldRef symbolToFieldRef(Symbol s, Type site) {
// @@@ Made Gen::binaryQualifier public, duplicate logic?
// Ensure correct qualifying class is used in the reference, see JLS 13.1
// https://docs.oracle.com/javase/specs/jls/se20/html/jls-13.html#jls-13.1
- return symbolToFieldRef(gen.binaryQualifier(s, types.erasure(site)));
+ return symbolToErasedFieldRef(gen.binaryQualifier(s, types.erasure(site)));
Collaborator Author

I realized there was an issue here, as the field reference was not using erased types, and so it was incompatible with the binary qualifier used in codegen

@openjdk openjdk bot removed ready Pull request is ready to be integrated rfr Pull request is ready for review labels Apr 25, 2024
while (l.acceptIf(Tokens.TokenKind.DOT)) {
identifier.append(Tokens.TokenKind.DOT.name);
t = l.accept(Tokens.TokenKind.IDENTIFIER);
identifier.append(t.name());
}

if (l.token().kind == TokenKind.COLCOL && isTypeVar) {
Collaborator Author

If we see #Foo::Bar we might be seeing two things:

  • a type-variable Bar in class Foo
  • a method type-variable in some method Foo.Bar

So, we need to disambiguate based on what follows. E.g. if after Bar we see (, then we know we're in the method case.

t = l.accept(TokenKind.IDENTIFIER); // type-var or method name
identifier.append(t.name());
if (l.token().kind == TokenKind.LPAREN) {
FunctionType functionType = parseMethodType(l);
Collaborator Author

Note that here we parse, then we throw away, as the type definition only wants a string-based identifier, so we'll need to reparse the identifier string again in the type factory.

if (typeArguments.size() != 1) {
throw new IllegalArgumentException("Bad type-variable bounds: " + tree);
}
String[] parts = identifier.split("::");
Collaborator Author

And this is the duplicate parsing logic (although here we already know if it's a method or a class type-variable based on the number of ::)

Member

I am wondering if instead we can check for #, and the parser's job is to dumbly accumulate all valid characters (selected tokens and identifiers) up to but not including the < token. We could even check if there is a quoted string for the type identifier.

Note the special code for arrays in the parser was added only to avoid updating many tests.
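A minimal sketch of that "accumulate up to <" idea, operating on a plain string rather than on the real token stream. RawTypeDesc is a hypothetical name, and the sketch assumes the identifier part contains no < of its own (which would not hold if, say, a method owner mentioned a parameterized type).

```java
// Hypothetical helper: everything before the first '<' is the opaque identifier,
// the rest (if any) is the type-argument list, left unparsed here.
record RawTypeDesc(String identifier, String arguments) {

    static RawTypeDesc split(String desc) {
        int lt = desc.indexOf('<');
        return lt < 0
                ? new RawTypeDesc(desc, "")
                : new RawTypeDesc(desc.substring(0, lt), desc.substring(lt));
    }

    // With '#' as the marker, the type factory no longer needs to look for "::".
    boolean isTypeVariable() {
        return identifier.startsWith("#");
    }

    public static void main(String[] args) {
        System.out.println(split("#Foo::bar()Baz::Z<java.lang.Object>"));
        System.out.println(split("java.util.List<java.lang.String>"));
    }
}
```

The identifier is then interpreted once, in the type factory, instead of being parsed in the generic parser and reparsed later.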

Collaborator Author

I can take a look

Collaborator Author

Uploaded a new iteration with this simplification (which looks much nicer than what I had):

8ad6110

Note that if we wanted a truly general "quoting" mechanism we'd need both a prefix and a suffix token. Otherwise one can only use quotes if there's some nested type-definition with <>. Your idea of using just strings (e.g. surrounded with ") seems a powerful one (and more robust in the long run), because it would make the desc parsing logic a lot less opinionated (e.g. we wouldn't even need to special case qualified identifiers).

Member

That's much simpler. We can iterate further afterwards if need be. I believe you can now replace identifier.contains("::") with identifier.startsWith("#")?

@openjdk openjdk bot added ready Pull request is ready to be integrated rfr Pull request is ready for review labels Apr 25, 2024
@mcimadamore
Collaborator Author

From a modelling perspective, it would be cleaner to have a TypeArgument interface that is not a sub-interface of JavaType. Then ClassType, ArrayType, WildcardType and TypeVariableRef can implement that interface. This would allow us to state clearly in the API that the type arguments of a ClassType must be of type TypeArgument, and that WildcardType is not really a type.
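A rough sketch of what that split could look like. All names below carry a Sketch suffix to make clear they are hypothetical and not the types in this PR; the point is that a wildcard is a type argument without being a type, while a primitive is a type without being a type argument.

```java
import java.util.List;

// Hypothetical modelling sketch: two unrelated interfaces instead of one hierarchy.
interface JavaTypeSketch {}
interface TypeArgumentSketch {}

// A type, but never usable as a type argument.
record PrimitiveSketch(String name) implements JavaTypeSketch {}

// Reference types are both types and valid type arguments.
record ClassSketch(String name, List<TypeArgumentSketch> typeArguments)
        implements JavaTypeSketch, TypeArgumentSketch {}

record ArraySketch(JavaTypeSketch component)
        implements JavaTypeSketch, TypeArgumentSketch {}

record TypeVarSketch(String name)
        implements JavaTypeSketch, TypeArgumentSketch {}

// A type argument that is not a type.
record WildcardSketch(String kind, JavaTypeSketch bound) implements TypeArgumentSketch {}

class TypeArgumentDemo {
    public static void main(String[] args) {
        // List<? extends A>: the API now states that arguments are TypeArgumentSketch.
        ClassSketch a = new ClassSketch("A", List.of());
        ClassSketch list = new ClassSketch("java.util.List",
                List.of(new WildcardSketch("extends", a)));
        System.out.println(list);
        // new ClassSketch("java.util.List", List.of(new PrimitiveSketch("int")))
        // would not compile, since a primitive is not a TypeArgumentSketch.
    }
}
```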

But doing this in the current world is painful: all types have a uniform structure (identifier + list of type elements), which pushes us towards modelling wildcards using proper types (otherwise parsing becomes very convoluted).

To be honest, with the recent changes to DescParser to parse additional types (esp. type-variables) it seems to me that the distinction between "generic parsing" and "Java-specific parsing" has been lost somewhat (e.g. DescParser has special code which needs to be ready for the specific needs of java types).

@PaulSandoz
Member

PaulSandoz commented Apr 25, 2024

I really like the core principle of projecting upwards to a (or the nearest?) denotable supertype. It really simplifies things and is generally easy to grasp (e.g., the set of Java types expressible in the code model is almost the same as the set of types one can express in source code), even if the actual details can be hard to understand.

I agree with you on having a clearer distinction for modeling type arguments; it may be useful to have a top-level Java type'ish interface covering Java type and Java type argument. This seems possible, the Java type factory can create whatever instances it wants based off the type identifier information e.g.,

            if (identifier.equals("+") || identifier.equals("-")) {
                // wildcard type
                BoundKind kind = identifier.equals("+") ?
                        BoundKind.EXTENDS : BoundKind.SUPER;
                return JavaTypeArgument.wildcard(kind, typeArguments.get(0));
            }

?

@mcimadamore
Collaborator Author

This seems possible, the Java type factory can create whatever instances it wants based off the type identifier information e.g.,

Yes, TypeDefinition is identifier plus List<TypeDefinition>. So we have some flexibility in there. I was assuming we wanted 1-1 relationship between TypeDefinition and JavaTypes, but that doesn't need to be the case.

@PaulSandoz
Member

Yes, TypeDefinition is identifier plus List<TypeDefinition>. So we have some flexibility in there. I was assuming we wanted 1-1 relationship between TypeDefinition and JavaTypes, but that doesn't need to be the case.

Right, this enables us to serialize and parse non-Java-based code models with some non-Java-like type descriptions.

@openjdk openjdk bot added merge-conflict Pull request has merge conflict with target branch and removed ready Pull request is ready to be integrated labels Apr 26, 2024
@openjdk

openjdk bot commented Apr 26, 2024

@mcimadamore Please do not rebase or force-push to an active PR as it invalidates existing review comments. Note for future reference, the bots always squash all changes into a single commit automatically as part of the integration. See OpenJDK Developers’ Guide for more information.

@openjdk openjdk bot added ready Pull request is ready to be integrated and removed merge-conflict Pull request has merge conflict with target branch labels Apr 26, 2024
@mcimadamore
Collaborator Author

I gave this a try, but I don't think we should pursue this, at least not as part of this patch. Here's some code I put together:

https://github.com/mcimadamore/babylon/compare/projections...mcimadamore:babylon:java_type_argument?expand=1

I think I got the parser working, but then we're greeted with a death-by-a-thousand-cuts situation where most classes use JavaType to mean "type argument" (and they use that to call the JavaType.type factory for parameterized types). If we tweak that factory to take JavaType.Argument instead (as I did in that branch), then several calls to the factory start failing, and we need to add casts instead. The situation is not helped by the fact that the JavaType factories are not always sharp (e.g. the factory type returns just JavaType).

This is aggravated by the fact that there's no type to say "a JavaType that is also a type argument". As a result, JavaType casts too wide a net (because of primitive types), but TypeArgument is too sharp, as it contains stuff (wildcards) that are not JavaType.

Overall it wasn't clear to me that doing this refactoring would be beneficial, especially as part of a PR that is already relatively big - given that the refactoring doesn't seem the "slam dunk" we were looking for.

@mcimadamore
Collaborator Author

/integrate

@openjdk

openjdk bot commented Apr 26, 2024

Going to push as commit 6713aca.
Since your change was applied there have been 11 commits pushed to the code-reflection branch:

Your commit was automatically rebased without conflicts.

@openjdk openjdk bot added the integrated Pull request has been integrated label Apr 26, 2024
@openjdk openjdk bot closed this Apr 26, 2024
@openjdk openjdk bot removed ready Pull request is ready to be integrated rfr Pull request is ready for review labels Apr 26, 2024
@openjdk

openjdk bot commented Apr 26, 2024

@mcimadamore Pushed as commit 6713aca.

💡 You may see a message that your pull request was closed with unmerged commits. This can be safely ignored.
