Add JavaAPICompletenessChecker script to find methods that are missing from the Java API #713
This is used to find methods in the Scala API that need to be ported to the Java API. To use it: ./run spark.tools.JavaAPICompletenessChecker
Thank you for submitting this pull request. Unfortunately, the automated tests for this request have failed.
This is pretty cool! The only awkward thing is the long list of methods excluded by name, but I guess we have to do that for now, unless we want to mark those methods with some kind of annotation. In Scala 2.10 it might be possible to use Scala's reflection facilities to determine the original (Scala-level) visibility of a method.
I agree that the list of excluded methods is a bit awkward. I tried marking those methods with an Scala 2.10 reflection probably solves this problem. I asked on StackOverflow to see if there were any easy ways to identify The list of excluded methods is annoying, but I don't think it's a major maintenance burden: if a method is missing from this exclusion list, the worst thing that happens is that we get a few false-positives in the list of missing methods. |
Okay, makes sense. One other question: why is |
Ah, maybe that's a type-erasure problem.
Jenkins, please retest
Jenkins, retest this please
Thank you for submitting this pull request. Unfortunately, the automated tests for this request have failed.
So according to https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/236/consoleFull, this PR somehow causes the compilation to fail on Jenkins (you can see "error Compilation failed"). Any thoughts on why?
Ah, it looks like I accidentally included the code that used the Jenkins, retest this please |
Thank you for your pull request. All automated tests for this request have passed.
Okay, thanks! Going to merge this in then.
Added this to master as well.
This pull request adds a script for automatically identifying methods in Spark's Scala API that need to be ported to its Java API. The script, JavaAPICompletenessChecker, works by enumerating the public methods in the Scala API, converting the method signatures to their Java equivalents using a set of rewrite rules, and searching the Java API for those methods. For each missing method, the tool prints its expected Java signature.
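As a rough illustration of the enumerate, rewrite, and search steps described above, here is a minimal sketch using plain Java reflection. It is not the actual JavaAPICompletenessChecker code: the classes ScalaSideRDD and JavaSideRDD, the single rewrite rule, and the helper names are all hypothetical stand-ins chosen for the example.

```java
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.StringJoiner;

class CompletenessSketch {
    // Hypothetical stand-in for a class in the Scala API.
    static class ScalaSideRDD {
        public ScalaSideRDD cache() { return this; }
        public long count() { return 0L; }
        public ScalaSideRDD union(ScalaSideRDD other) { return this; }
    }

    // Hypothetical stand-in for its Java wrapper; union has not been ported yet.
    static class JavaSideRDD {
        public JavaSideRDD cache() { return this; }
        public long count() { return 0L; }
    }

    // Assumed rewrite rules: source type name -> type name expected in the wrapper API.
    static final Map<String, String> REWRITES = new HashMap<>();
    static {
        REWRITES.put("ScalaSideRDD", "JavaSideRDD");
    }

    static String typeName(Class<?> type, boolean applyRewrites) {
        String name = type.getSimpleName();
        return applyRewrites ? REWRITES.getOrDefault(name, name) : name;
    }

    // Renders a method as "ReturnType name(ParamType, ...)".
    static String signature(Method m, boolean applyRewrites) {
        StringJoiner params = new StringJoiner(", ", "(", ")");
        for (Class<?> p : m.getParameterTypes()) {
            params.add(typeName(p, applyRewrites));
        }
        return typeName(m.getReturnType(), applyRewrites) + " " + m.getName() + params;
    }

    static boolean isPublicApi(Method m) {
        return Modifier.isPublic(m.getModifiers()) && !m.isSynthetic();
    }

    // Returns the expected wrapper signatures that the target class does not declare.
    static List<String> missing(Class<?> source, Class<?> target) {
        Set<String> present = new HashSet<>();
        for (Method m : target.getDeclaredMethods()) {
            if (isPublicApi(m)) {
                present.add(signature(m, false));
            }
        }
        List<String> result = new ArrayList<>();
        for (Method m : source.getDeclaredMethods()) {
            if (!isPublicApi(m)) {
                continue;
            }
            String expected = signature(m, true);
            if (!present.contains(expected)) {
                result.add(expected);
            }
        }
        Collections.sort(result);
        return result;
    }

    public static void main(String[] args) {
        for (String s : missing(ScalaSideRDD.class, JavaSideRDD.class)) {
            System.out.println(s);
        }
    }
}
```

Running main on these toy classes prints `JavaSideRDD union(JavaSideRDD)`, the one signature that the wrapper lacks. The sketch compares erased types only, so it ignores generics entirely.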
This technique isn't perfect, but it does a pretty good job of finding missing methods or methods with signatures that are inconsistent with the Java API conventions. For example, check out the list of methods missing from the 0.7.3 Java API.
In order to complete the Java API in both 0.7 and master, I'm submitting this pull request against branch-0.7. My plan is to first add the missing methods in 0.7, then cherry-pick those commits into master and add the missing methods from master.
I created a new tools subproject to hold this tool. I considered putting it in examples, but I thought that would be confusing. It would be possible to store this tool in its own Scala / Maven project that adds Spark as a dependency, but including it in Spark proper makes it easier to iteratively re-run the tool as you port methods to the Java API.
One caveat about this tool: for Java API methods that are overloaded on spark.java.api.function.* types, the tool only reports the non-specialized methods; implementors will still need to implement the additional methods for PairFunction, DoubleFunction, etc. Similarly, it doesn't catch methods that need to be re-implemented in JavaDoubleRDD, JavaPairRDD, and JavaRDD, so it won't catch mistakes like implementing same-result-type methods like map in JavaRDD but forgetting to implement them elsewhere. Both of these limitations are fixable, but they'd add a lot of complexity and I think the current code is still useful as-is (I may fix this later, but I'm a bit tired of working on this for now).
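To make the first caveat concrete, here is a small sketch of the kind of manual check an implementor might run for the specialized overloads that the tool does not report. The interfaces below are hypothetical stand-ins that only borrow the names Function, DoubleFunction, and PairFunction; they are not the real spark.java.api.function classes, and PartialWrapper and missingSpecializations are invented for illustration.

```java
import java.lang.reflect.Method;
import java.util.Arrays;
import java.util.Set;
import java.util.TreeSet;

class OverloadSketch {
    // Hypothetical stand-ins for the function types a Java API method is overloaded on.
    interface Function<T, R> { R call(T t); }
    interface DoubleFunction<T> { double call(T t); }
    interface PairFunction<T, K, V> { Object[] call(T t); }

    // Hypothetical wrapper where only the non-specialized overload of map was ported.
    static class PartialWrapper<T> {
        public <R> PartialWrapper<R> map(Function<T, R> f) {
            return new PartialWrapper<R>();
        }
    }

    // Collects the parameter type names of the one-argument overloads of `name`.
    static Set<String> overloadKinds(Class<?> cls, String name) {
        Set<String> kinds = new TreeSet<>();
        for (Method m : cls.getDeclaredMethods()) {
            if (m.getName().equals(name) && m.getParameterCount() == 1) {
                kinds.add(m.getParameterTypes()[0].getSimpleName());
            }
        }
        return kinds;
    }

    // Lists the function types for which `name` has no overload yet.
    static Set<String> missingSpecializations(Class<?> cls, String name) {
        Set<String> expected = new TreeSet<>(
            Arrays.asList("Function", "DoubleFunction", "PairFunction"));
        expected.removeAll(overloadKinds(cls, name));
        return expected;
    }

    public static void main(String[] args) {
        System.out.println(missingSpecializations(PartialWrapper.class, "map"));
    }
}
```

For the toy PartialWrapper this prints `[DoubleFunction, PairFunction]`, the two specialized overloads still to be written by hand.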