Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add JavaAPICompletenessChecker script to find methods that are missing from the Java API #713

Merged
merged 2 commits into from
Jul 21, 2013

Conversation

JoshRosen
Copy link
Member

This pull request adds a script for automatically identifying methods in Spark's Scala API that need to be ported to its Java API. The script, JavaAPICompletenessChecker, works by enumerating the public methods in the Scala API, converting the method signatures to their Java equivalents using a set of rewrite rules, and searching the Java API for those methods. For each missing method, the tool prints its expected Java signature.

This technique isn't perfect, but it does a pretty good job of finding missing methods or methods with signatures that are inconsistent with the Java API conventions. For example, check out the list of methods missing from the 0.7.3 Java API.

In order to complete the Java API in both 0.7 and master, I'm submitting this pull request against branch-0.7. My plan is to first add the missing methods in 0.7, then cherry-pick those commits into master and add the missing methods from master.

I created a new tools subproject to hold this tool. I considered putting it in examples, but I thought that would be confusing. It would be possible to store this tool in its own Scala / Maven project that adds Spark as a dependency, but including it in Spark proper makes it easier to iteratively re-run the tool as you port methods to the Java API.

One caveat about this tool: for Java API methods that are overloaded on spark.java.api.function.* types, the tool only reports the non-specialized methods; implementors will still need to implement the additional methods for PairFunction, DoubleFunction, etc. Similarly, it doesn't catch methods that need to be re-implemented in JavaDoubleRDD, JavaPairRDD, and JavaRDD, so it won't catch mistakes like implementing same-result-type methods like map in JavaRDD but forgetting to implement them elsewhere. Both of these limitations are fixable, but they'd add a lot of complexity and I think the current code is still useful as-is (I may fix this later, but I'm a bit tired of working on this for now).

This is used to find methods in the Scala API that
need to be ported to the Java API.  To use it:

  ./run spark.tools.JavaAPICompletenessChecker
@AmplabJenkins
Copy link

Thank you for submitting this pull request. Unfortunately, the automated tests for this request have failed.
Refer to this link for build results: http://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/227/

@mateiz
Copy link
Member

mateiz commented Jul 18, 2013

This is pretty cool! The only awkward thing is the long list of methods excluded by name, but I guess we have to do that for now, unless we want to mark those methods with some kind of annotation. In Scala 2.10 it might be possible to use Scala's reflection facilities to determine the original (Scala-level) visibility of a method.

@JoshRosen
Copy link
Member Author

I agree that the list of excluded methods is a bit awkward.

I tried marking those methods with an @ExcludedFromJavaAPI annotation, but that's only a partial solution: annotating a private[spark] val myValue won't work as expected, since we can only apply the annotation to the val while we need the annotation to apply to the compiler-generated accessor method myValue().

Scala 2.10 reflection probably solves this problem. I asked on StackOverflow to see if there were any easy ways to identify private[mypackage] methods in Scala 2.9: it looks like the best approach involves parsing these binary @scala.reflect.ScalaSignature annotations generated by the Scala compiler, but that seemed too complicated.

The list of excluded methods is annoying, but I don't think it's a major maintenance burden: if a method is missing from this exclusion list, the worst thing that happens is that we get a few false-positives in the list of missing methods.

@mateiz
Copy link
Member

mateiz commented Jul 18, 2013

Okay, makes sense. One other question: why is spark.api.java.JavaRDD<T>.filter(spark.api.java.function.Function<T, java.lang.Object>) on the list? Is it just misunderstanding what the return value of the filter function should be?

@JoshRosen
Copy link
Member Author

Ah, maybe that's a type-erasure problem. javap gives this for RDD.filter():

  public spark.RDD<T> filter(scala.Function1<T, java.lang.Object>);
    Signature: (Lscala/Function1;)Lspark/RDD;

@mateiz
Copy link
Member

mateiz commented Jul 20, 2013

Jenkins, please retest

@mateiz
Copy link
Member

mateiz commented Jul 20, 2013

Jenkins, retest this please

@AmplabJenkins
Copy link

Thank you for submitting this pull request. Unfortunately, the automated tests for this request have failed.
Refer to this link for build results: http://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/236/

@mateiz
Copy link
Member

mateiz commented Jul 21, 2013

So according to https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/236/consoleFull, this PR somehow causes the compilation to fail on Jenkins (you can see "error Compilation failed"). Any thought on why?

@JoshRosen
Copy link
Member Author

Ah, it looks like I accidentally included the code that used the @ExcludeFromJavaAPI annotation (which I decided not to use). I forgot to include ExcludeFromJavaAPI.java, causing this build failure. It should be fixed now, though.

Jenkins, retest this please

@AmplabJenkins
Copy link

Thank you for your pull request. All automated tests for this request have passed.
Refer to this link for build results: http://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/238/

@mateiz
Copy link
Member

mateiz commented Jul 21, 2013

Okay, thanks! Going to merge this in then.

mateiz added a commit that referenced this pull request Jul 21, 2013
Add JavaAPICompletenessChecker script to find methods that are missing from the Java API
@mateiz mateiz merged commit 4d56ec4 into mesos:branch-0.7 Jul 21, 2013
@mateiz
Copy link
Member

mateiz commented Jul 22, 2013

Added this to master as well.

@JoshRosen JoshRosen deleted the java-api-completeness-checker branch July 23, 2013 00:18
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants