Fix Javadoc (Unidoc) generation issue in PR 24434 #5

HyukjinKwon · 2019-04-23T13:55:57Z

What changes were proposed in this pull request?

This PR proposes to fix the javadoc generation issue in PR apache#24434.

From what I observed, Unidoc does not handle traits properly. The root cause is the 4 errors below:

[info] Generating /home/jenkins/workspace/SparkPullRequestBuilder/target/javaunidoc/org/apache/spark/streaming/rdd/MapWithStateRDDSuite.html...
[error] /home/jenkins/workspace/SparkPullRequestBuilder/core/target/java/org/apache/spark/SparkFunSuite.java:35: error: reference not found
[error]    * Note: this method doesn't support {@link BeforeAndAfter}. You must use {@link BeforeAndAfterEach} to
[error]                                               ^
[error] /home/jenkins/workspace/SparkPullRequestBuilder/core/target/java/org/apache/spark/SparkFunSuite.java:35: error: reference not found
[error]    * Note: this method doesn't support {@link BeforeAndAfter}. You must use {@link BeforeAndAfterEach} to
[error]                                                                                    ^
[error] /home/jenkins/workspace/SparkPullRequestBuilder/core/target/java/org/apache/spark/SparkFunSuite.java:44: error: reference not found
[error]    * Note: this method doesn't support {@link BeforeAndAfter}. You must use {@link BeforeAndAfterEach} to
[error]                                               ^
[error] /home/jenkins/workspace/SparkPullRequestBuilder/core/target/java/org/apache/spark/SparkFunSuite.java:44: error: reference not found
[error]    * Note: this method doesn't support {@link BeforeAndAfter}. You must use {@link BeforeAndAfterEach} to
[error]                                                                                    ^
...
[info] 4 errors

Unidoc (via javadoc with SBT) reports warnings as errors whenever at least one error is found. See SPARK-20840. I tried to fix this a few times before, but it wasn't easy.

Therefore, this PR works around the 4 occurrences of the Unidoc link bug by turning each link into a plain code block. This has been the usual workaround for this problem.
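To make the workaround concrete: in Javadoc terms, a `{@link X}` tag has to resolve to a documented class, while a code-formatted name such as `{@code X}` is printed as-is and never resolved, so it cannot trigger "reference not found". The sketch below only illustrates that textual change. The class `LinkToCode` and its helper `linksToCode` are hypothetical names, the PR itself edited the doc comments by hand, and the exact markup genjavadoc emits for a code block is an assumption here.

```java
import java.util.regex.Pattern;

// Hypothetical illustration only: the PR changed the doc comments manually.
class LinkToCode {
    // Matches a simple inline link tag such as "{@link BeforeAndAfter}"
    // (tags that carry a separate label are deliberately not matched).
    private static final Pattern LINK =
        Pattern.compile("\\{@link\\s+([^}\\s]+)\\}");

    // Rewrites each simple link tag to the equivalent code tag,
    // which javadoc prints verbatim instead of resolving.
    static String linksToCode(String docComment) {
        return LINK.matcher(docComment).replaceAll("{@code $1}");
    }

    public static void main(String[] args) {
        String before =
            " * Note: this method doesn't support {@link BeforeAndAfter}.";
        // Prints the same line with {@code BeforeAndAfter} in place of the link.
        System.out.println(linksToCode(before));
    }
}
```

The trade-off is that the rendered docs lose the hyperlink to the referenced trait, but the name still appears in code font and the Unidoc build no longer fails on it.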

How was this patch tested?

Jenkins SBT build command:

./build/sbt  -Phadoop-2.7 -Pkubernetes -Phive-thriftserver -Pkinesis-asl -Pyarn -Pspark-ganglia-lgpl -Phive -Pmesos clean test:package streaming-kinesis-asl-assembly/assembly
./build/sbt  -Phadoop-2.7 -Pkubernetes -Phive-thriftserver -Pkinesis-asl -Pyarn -Pspark-ganglia-lgpl -Phive -Pmesos unidoc

srowen · 2019-04-23T14:04:36Z

Oh crazy, OK. Yeah I know we actually have a ton of doc-related errors given how genjavadoc works. That's a good explanation. OK, I will also try updating genjavadoc to see if it fixes some of that.

HyukjinKwon · 2019-04-24T06:27:36Z

Looks like upgrading fixed some other cases but not this :( .. Let's get this in.

## What changes were proposed in this pull request?

This PR aims at improving the way physical plans are explained in Spark.

Currently, the explain output for a physical plan may look very cluttered, and each operator's string representation can be very wide and wrap around in the display, making it a little hard to follow. This especially happens when explaining a query that 1) operates on wide tables or 2) has complex expressions, etc.

This PR attempts to split the output into two sections. In the header section, we display the basic operator tree with a number associated with each operator. In this section, we strictly control what we output for each operator. In the footer section, each operator is verbosely displayed. Based on the feedback from Maryann, the uncorrelated subqueries (SubqueryExecs) are not included in the main plan. They are printed separately after the main plan and can be correlated by the originating expression id from its parent plan.

To illustrate, here is a simple plan displayed in the old vs. new way.

Example query1:
```
EXPLAIN SELECT key, Max(val) FROM explain_temp1 WHERE key > 0 GROUP BY key HAVING max(val) > 0
```

Old:
```
*(2) Project [key#2, max(val)#15]
+- *(2) Filter (isnotnull(max(val#3)#18) AND (max(val#3)#18 > 0))
   +- *(2) HashAggregate(keys=[key#2], functions=[max(val#3)], output=[key#2, max(val)#15, max(val#3)#18])
      +- Exchange hashpartitioning(key#2, 200)
         +- *(1) HashAggregate(keys=[key#2], functions=[partial_max(val#3)], output=[key#2, max#21])
            +- *(1) Project [key#2, val#3]
               +- *(1) Filter (isnotnull(key#2) AND (key#2 > 0))
                  +- *(1) FileScan parquet default.explain_temp1[key#2,val#3] Batched: true, DataFilters: [isnotnull(key#2), (key#2 > 0)], Format: Parquet, Location: InMemoryFileIndex[file:/user/hive/warehouse/explain_temp1], PartitionFilters: [], PushedFilters: [IsNotNull(key), GreaterThan(key,0)], ReadSchema: struct<key:int,val:int>
```

New:
```
Project (8)
+- Filter (7)
   +- HashAggregate (6)
      +- Exchange (5)
         +- HashAggregate (4)
            +- Project (3)
               +- Filter (2)
                  +- Scan parquet default.explain_temp1 (1)

(1) Scan parquet default.explain_temp1 [codegen id : 1]
Output: [key#2, val#3]

(2) Filter [codegen id : 1]
Input : [key#2, val#3]
Condition : (isnotnull(key#2) AND (key#2 > 0))

(3) Project [codegen id : 1]
Output : [key#2, val#3]
Input : [key#2, val#3]

(4) HashAggregate [codegen id : 1]
Input: [key#2, val#3]

(5) Exchange
Input: [key#2, max#11]

(6) HashAggregate [codegen id : 2]
Input: [key#2, max#11]

(7) Filter [codegen id : 2]
Input : [key#2, max(val)#5, max(val#3)#8]
Condition : (isnotnull(max(val#3)#8) AND (max(val#3)#8 > 0))

(8) Project [codegen id : 2]
Output : [key#2, max(val)#5]
Input : [key#2, max(val)#5, max(val#3)#8]
```

Example Query2 (subquery):
```
SELECT * FROM explain_temp1
WHERE KEY = (SELECT Max(KEY) FROM explain_temp2
             WHERE KEY = (SELECT Max(KEY) FROM explain_temp3 WHERE val > 0)
               AND val = 2)
  AND val > 3
```

Old:
```
*(1) Project [key#2, val#3]
+- *(1) Filter (((isnotnull(KEY#2) AND isnotnull(val#3)) AND (KEY#2 = Subquery scalar-subquery#39)) AND (val#3 > 3))
   :  +- Subquery scalar-subquery#39
   :     +- *(2) HashAggregate(keys=[], functions=[max(KEY#26)], output=[max(KEY)#45])
   :        +- Exchange SinglePartition
   :           +- *(1) HashAggregate(keys=[], functions=[partial_max(KEY#26)], output=[max#47])
   :              +- *(1) Project [key#26]
   :                 +- *(1) Filter (((isnotnull(KEY#26) AND isnotnull(val#27)) AND (KEY#26 = Subquery scalar-subquery#38)) AND (val#27 = 2))
   :                    :  +- Subquery scalar-subquery#38
   :                    :     +- *(2) HashAggregate(keys=[], functions=[max(KEY#28)], output=[max(KEY)#43])
   :                    :        +- Exchange SinglePartition
   :                    :           +- *(1) HashAggregate(keys=[], functions=[partial_max(KEY#28)], output=[max#49])
   :                    :              +- *(1) Project [key#28]
   :                    :                 +- *(1) Filter (isnotnull(val#29) AND (val#29 > 0))
   :                    :                    +- *(1) FileScan parquet default.explain_temp3[key#28,val#29] Batched: true, DataFilters: [isnotnull(val#29), (val#29 > 0)], Format: Parquet, Location: InMemoryFileIndex[file:/user/hive/warehouse/explain_temp3], PartitionFilters: [], PushedFilters: [IsNotNull(val), GreaterThan(val,0)], ReadSchema: struct<key:int,val:int>
   :                    +- *(1) FileScan parquet default.explain_temp2[key#26,val#27] Batched: true, DataFilters: [isnotnull(key#26), isnotnull(val#27), (val#27 = 2)], Format: Parquet, Location: InMemoryFileIndex[file:/user/hive/warehouse/explain_temp2], PartitionFilters: [], PushedFilters: [IsNotNull(key), IsNotNull(val), EqualTo(val,2)], ReadSchema: struct<key:int,val:int>
   +- *(1) FileScan parquet default.explain_temp1[key#2,val#3] Batched: true, DataFilters: [isnotnull(key#2), isnotnull(val#3), (val#3 > 3)], Format: Parquet, Location: InMemoryFileIndex[file:/user/hive/warehouse/explain_temp1], PartitionFilters: [], PushedFilters: [IsNotNull(key), IsNotNull(val), GreaterThan(val,3)], ReadSchema: struct<key:int,val:int>
```

New:
```
Project (3)
+- Filter (2)
   +- Scan parquet default.explain_temp1 (1)

(1) Scan parquet default.explain_temp1 [codegen id : 1]
Output: [key#2, val#3]

(2) Filter [codegen id : 1]
Input : [key#2, val#3]
Condition : (((isnotnull(KEY#2) AND isnotnull(val#3)) AND (KEY#2 = Subquery scalar-subquery#23)) AND (val#3 > 3))

(3) Project [codegen id : 1]
Output : [key#2, val#3]
Input : [key#2, val#3]

===== Subqueries =====

Subquery:1 Hosting operator id = 2 Hosting Expression = Subquery scalar-subquery#23
HashAggregate (9)
+- Exchange (8)
   +- HashAggregate (7)
      +- Project (6)
         +- Filter (5)
            +- Scan parquet default.explain_temp2 (4)

(4) Scan parquet default.explain_temp2 [codegen id : 1]
Output: [key#26, val#27]

(5) Filter [codegen id : 1]
Input : [key#26, val#27]
Condition : (((isnotnull(KEY#26) AND isnotnull(val#27)) AND (KEY#26 = Subquery scalar-subquery#22)) AND (val#27 = 2))

(6) Project [codegen id : 1]
Output : [key#26]
Input : [key#26, val#27]

(7) HashAggregate [codegen id : 1]
Input: [key#26]

(8) Exchange
Input: [max#35]

(9) HashAggregate [codegen id : 2]
Input: [max#35]

Subquery:2 Hosting operator id = 5 Hosting Expression = Subquery scalar-subquery#22
HashAggregate (15)
+- Exchange (14)
   +- HashAggregate (13)
      +- Project (12)
         +- Filter (11)
            +- Scan parquet default.explain_temp3 (10)

(10) Scan parquet default.explain_temp3 [codegen id : 1]
Output: [key#28, val#29]

(11) Filter [codegen id : 1]
Input : [key#28, val#29]
Condition : (isnotnull(val#29) AND (val#29 > 0))

(12) Project [codegen id : 1]
Output : [key#28]
Input : [key#28, val#29]

(13) HashAggregate [codegen id : 1]
Input: [key#28]

(14) Exchange
Input: [max#37]

(15) HashAggregate [codegen id : 2]
Input: [max#37]
```

Note: I opened this PR as a WIP to start getting feedback. I will be on vacation starting tomorrow and will not be able to incorporate the feedback immediately. I will start working on it as soon as I can.

Also, this PR currently provides a basic infrastructure for the explain enhancement. The details about individual operators will be implemented in follow-up PRs.

## How was this patch tested?

Added a new test `explain.sql` that tests basic scenarios. Need to add more tests.

Closes apache#24759 from dilipbiswal/explain_feature.

Authored-by: Dilip Biswal <dbiswal@us.ibm.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>

Fix Javadoc generation issue in PR 24434

be7199c

HyukjinKwon mentioned this pull request Apr 23, 2019

[SPARK-27460][FOLLOW-UP][TESTS] Fix flaky tests apache/spark#24434

Closed

srowen mentioned this pull request Apr 23, 2019

[MINOR][BUILD] Update genjavadoc to 0.13 apache/spark#24443

Closed

gatorsmile merged commit e5a5991 into gatorsmile:fixFlakyTest Apr 24, 2019

HyukjinKwon deleted the fix-javadoc-issue branch March 3, 2020 01:18
