Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Issue #113: Grow the code for relocatables, and do fixup, and relocate #114

Closed
wants to merge 1 commit into from

Conversation

HeartSaVioR
Copy link
Contributor

There's a bug discovered in #113 which padding is applied multiple times due to growing in relocatable. Once there's any relocatable growing the code, padding is re-applied without rolling back padding previously added. In some circumstance, applying multiple padding adds more bytes than padding between TABLESWITCH OP code and default parameter, and breaks bytecode.

Looking into codebase, I found the following:

  • Growing only happens in Branch
  • Once it adds spaces, expanded is marked as true, and no grow will be happen

Based on this logic, we may be able to simplify the logic of fixUpAndRelocate to sequential steps, grow code -> do fixUp -> relocate.

This patch adds new UT to reproduce #113 - it fails on latest master branch, and passes with this PR.

@HeartSaVioR
Copy link
Contributor Author

HeartSaVioR commented Mar 6, 2020

I'm not familiar with Janino hence I had to put lots of assumptions while fixing the code.

    /**
     * Fixes up all of the offsets and relocate() all relocatables.
     */
    public void
    fixUpAndRelocate() {


        // We do this in a loop to allow relocatables to adjust the size
        // of things in the byte stream.  It is extremely unlikely, but possible
        // that a late relocatable will grow the size of the bytecode, and require
        // an earlier relocatable to switch from 32K mode to 64K mode branching
        do {
            this.fixUp();
        } while (!this.relocate());
    }

Is there any chance for the PR breaking the comment? I'm not 100% sure but there might be the case if fixUp adds space between source and destination of relocatable which wasn't wide but becomes wide. I guess it should be extremely unlikely even if it's valid, but if this is a valid case, it's going to be complicated - padding should be corrected via new offset after growing.

aunkrig added a commit that referenced this pull request Mar 8, 2020
@aunkrig
Copy link
Member

aunkrig commented Mar 8, 2020

Hey Lim,

this is brilliant work!! The test case reproduces the problem, the fix works, and the program structure is even cleaner now. Congrats!! If all bug reports were so easy to process...

I merged your proposed changes exactly as they were.

CU Arno

@HeartSaVioR
Copy link
Contributor Author

Thanks for reviewing and merging!

@HeartSaVioR HeartSaVioR deleted the ISSUE-113 branch March 8, 2020 20:41
HeartSaVioR added a commit to HeartSaVioR/janino that referenced this pull request Mar 10, 2020
aunkrig added a commit that referenced this pull request Mar 16, 2020
…(or one time release for 3.0.16)

Backported PR #114 (the fix for issue #113) into the "3.0.x" branch.
dongjoon-hyun pushed a commit to apache/spark that referenced this pull request Mar 22, 2020
### What changes were proposed in this pull request?

This PR(SPARK-31101) proposes to upgrade Janino to 3.0.16 which is released recently.

* Merged pull request janino-compiler/janino#114 "Grow the code for relocatables, and do fixup, and relocate".

Please see the commit log.
- https://github.com/janino-compiler/janino/commits/3.0.16

You can see the changelog from the link: http://janino-compiler.github.io/janino/changelog.html / though release note for Janino 3.0.16 is actually incorrect.

### Why are the changes needed?

We got some report on failure on user's query which Janino throws error on compiling generated code. The issue is here: janino-compiler/janino#113 It contains the information of generated code, symptom (error), and analysis of the bug, so please refer the link for more details.
Janino 3.0.16 contains the PR janino-compiler/janino#114 which would enable Janino to succeed to compile user's query properly.

### Does this PR introduce any user-facing change?

No.

### How was this patch tested?

Existing UTs.

Closes #27932 from HeartSaVioR/SPARK-31101-janino-3.0.16.

Authored-by: Jungtaek Lim (HeartSaVioR) <kabhwan.opensource@gmail.com>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
HeartSaVioR added a commit to HeartSaVioR/spark that referenced this pull request Mar 24, 2020
### What changes were proposed in this pull request?

This PR(SPARK-31101) proposes to upgrade Janino to 3.0.16 which is released recently.

* Merged pull request janino-compiler/janino#114 "Grow the code for relocatables, and do fixup, and relocate".

Please see the commit log.
- https://github.com/janino-compiler/janino/commits/3.0.16

You can see the changelog from the link: http://janino-compiler.github.io/janino/changelog.html / though release note for Janino 3.0.16 is actually incorrect.

### Why are the changes needed?

We got some report on failure on user's query which Janino throws error on compiling generated code. The issue is here: janino-compiler/janino#113 It contains the information of generated code, symptom (error), and analysis of the bug, so please refer the link for more details.
Janino 3.0.16 contains the PR janino-compiler/janino#114 which would enable Janino to succeed to compile user's query properly.

### Does this PR introduce any user-facing change?

No.

### How was this patch tested?

Existing UTs.

Closes apache#27932 from HeartSaVioR/SPARK-31101-janino-3.0.16.

Authored-by: Jungtaek Lim (HeartSaVioR) <kabhwan.opensource@gmail.com>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
HeartSaVioR added a commit to HeartSaVioR/spark that referenced this pull request Mar 24, 2020
This PR(SPARK-31101) proposes to upgrade Janino to 3.0.16 which is released recently.

* Merged pull request janino-compiler/janino#114 "Grow the code for relocatables, and do fixup, and relocate".

Please see the commit log.
- https://github.com/janino-compiler/janino/commits/3.0.16

You can see the changelog from the link: http://janino-compiler.github.io/janino/changelog.html / though release note for Janino 3.0.16 is actually incorrect.

We got some report on failure on user's query which Janino throws error on compiling generated code. The issue is here: janino-compiler/janino#113 It contains the information of generated code, symptom (error), and analysis of the bug, so please refer the link for more details.
Janino 3.0.16 contains the PR janino-compiler/janino#114 which would enable Janino to succeed to compile user's query properly.

No.

Existing UTs.

Closes apache#27932 from HeartSaVioR/SPARK-31101-janino-3.0.16.

Authored-by: Jungtaek Lim (HeartSaVioR) <kabhwan.opensource@gmail.com>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
dongjoon-hyun pushed a commit to apache/spark that referenced this pull request Mar 25, 2020
### What changes were proposed in this pull request?

This PR(SPARK-31101) proposes to upgrade Janino to 3.0.16 which is released recently.

* Merged pull request janino-compiler/janino#114 "Grow the code for relocatables, and do fixup, and relocate".

Please see the commit log.
- https://github.com/janino-compiler/janino/commits/3.0.16

You can see the changelog from the link: http://janino-compiler.github.io/janino/changelog.html / though release note for Janino 3.0.16 is actually incorrect.

### Why are the changes needed?

We got some report on failure on user's query which Janino throws error on compiling generated code. The issue is here: janino-compiler/janino#113 It contains the information of generated code, symptom (error), and analysis of the bug, so please refer the link for more details.
Janino 3.0.16 contains the PR janino-compiler/janino#114 which would enable Janino to succeed to compile user's query properly.

### Does this PR introduce any user-facing change?

No.

### How was this patch tested?

Existing UTs.

Below test code fails on branch-3.0 and passes with this patch.

```
  /**
   * NOTE: The test code tries to control the size of for/switch statement in expand_doConsume,
   * as well as the overall size of expand_doConsume, so that the query triggers known Janino
   * bug - janino-compiler/janino#113.
   *
   * The expected exception message from Janino when we use switch statement for "ExpandExec":
   * - "Operand stack inconsistent at offset xxx: Previous size 1, now 0"
   * which will not happen when we use if-else-if statement for "ExpandExec".
   *
   * "The number of fields" and "The number of distinct aggregation functions" are the major
   * factors to increase the size of generated code: while these values should be large enough
   * to trigger the Janino bug, these values should not also too big; otherwise one of below
   * exceptions might be thrown:
   * - "expand_doConsume would be beyond 64KB"
   * - "java.lang.ClassFormatError: Too many arguments in method signature in class file"
   */
  test("SPARK-31115 Lots of columns and distinct aggregations shouldn't break code generation") {
    withSQLConf(
      (SQLConf.WHOLESTAGE_CODEGEN_ENABLED.key, "true"),
      (SQLConf.WHOLESTAGE_MAX_NUM_FIELDS.key, "10000"),
      (SQLConf.CODEGEN_FALLBACK.key, "false"),
      (SQLConf.CODEGEN_LOGGING_MAX_LINES.key, "-1")
    ) {
      var df = Seq(("1", "2", 1), ("1", "2", 2), ("2", "3", 3), ("2", "3", 4)).toDF("a", "b", "c")

      // The value is tested under commit "e807118eef9e0214170ff62c828524d237bd58e3":
      // the query fails with switch statement, whereas it passes with if-else statement.
      // Note that the value depends on the Spark logic as well - different Spark versions may
      // require different value to ensure the test failing with switch statement.
      val numNewFields = 100

      df = df.withColumns(
        (1 to numNewFields).map { idx => s"a$idx" },
        (1 to numNewFields).map { idx =>
          when(col("c").mod(lit(2)).===(lit(0)), lit(idx)).otherwise(col("c"))
        }
      )

      val aggExprs: Array[Column] = Range(1, numNewFields).map { idx =>
        if (idx % 2 == 0) {
          coalesce(countDistinct(s"a$idx"), lit(0))
        } else {
          coalesce(count(s"a$idx"), lit(0))
        }
      }.toArray

      val aggDf = df
        .groupBy("a", "b")
        .agg(aggExprs.head, aggExprs.tail: _*)

      // We are only interested in whether the code compilation fails or not, so skipping
      // verification on outputs.
      aggDf.collect()
    }
  }
```

Closes #27996 from HeartSaVioR/SPARK-31101-branch-3.0.

Authored-by: Jungtaek Lim (HeartSaVioR) <kabhwan.opensource@gmail.com>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
dongjoon-hyun pushed a commit to apache/spark that referenced this pull request Mar 30, 2020
### What changes were proposed in this pull request?

This PR(SPARK-31101) proposes to upgrade Janino to 3.0.16 which is released recently.

* Merged pull request janino-compiler/janino#114 "Grow the code for relocatables, and do fixup, and relocate".

Please see the commit log.
- https://github.com/janino-compiler/janino/commits/3.0.16

You can see the changelog from the link: http://janino-compiler.github.io/janino/changelog.html / though release note for Janino 3.0.16 is actually incorrect.

### Why are the changes needed?

We got some report on failure on user's query which Janino throws error on compiling generated code. The issue is here: janino-compiler/janino#113 It contains the information of generated code, symptom (error), and analysis of the bug, so please refer the link for more details.
Janino 3.0.16 contains the PR janino-compiler/janino#114 which would enable Janino to succeed to compile user's query properly.

### Does this PR introduce any user-facing change?

No.

### How was this patch tested?

Existing UTs.

Below test code fails on branch-2.4 and passes with this patch.

(Note that there seems to be the case where another UT affects this UT to not fail - adding this to SQLQuerySuite won't fail this UT, but adding this to DateFunctionsSuite will fail this UT, and if you run this UT solely in SQLQuerySuite via `build/sbt "sql/testOnly *.SQLQuerySuite -- -z SPARK-31115"` then it fails.)

```
  /**
   * NOTE: The test code tries to control the size of for/switch statement in expand_doConsume,
   * as well as the overall size of expand_doConsume, so that the query triggers known Janino
   * bug - janino-compiler/janino#113.
   *
   * The expected exception message from Janino when we use switch statement for "ExpandExec":
   * - "Operand stack inconsistent at offset xxx: Previous size 1, now 0"
   * which will not happen when we use if-else-if statement for "ExpandExec".
   *
   * "The number of fields" and "The number of distinct aggregation functions" are the major
   * factors to increase the size of generated code: while these values should be large enough
   * to trigger the Janino bug, these values should not also too big; otherwise one of below
   * exceptions might be thrown:
   * - "expand_doConsume would be beyond 64KB"
   * - "java.lang.ClassFormatError: Too many arguments in method signature in class file"
   */
  test("SPARK-31115 Lots of columns and distinct aggregations shouldn't break code generation") {
    withSQLConf(
      (SQLConf.WHOLESTAGE_CODEGEN_ENABLED.key, "true"),
      (SQLConf.WHOLESTAGE_MAX_NUM_FIELDS.key, "10000"),
      (SQLConf.CODEGEN_FALLBACK.key, "false"),
      (SQLConf.CODEGEN_LOGGING_MAX_LINES.key, "-1")
    ) {
      var df = Seq(("1", "2", 1), ("1", "2", 2), ("2", "3", 3), ("2", "3", 4)).toDF("a", "b", "c")

      // The value is tested under commit "244405fe57d7737d81c34ba9e8917df6285889eb":
      // the query fails with switch statement, whereas it passes with if-else statement.
      // Note that the value depends on the Spark logic as well - different Spark versions may
      // require different value to ensure the test failing with switch statement.
      val numNewFields = 100

      df = df.withColumns(
        (1 to numNewFields).map { idx => s"a$idx" },
        (1 to numNewFields).map { idx =>
          when(col("c").mod(lit(2)).===(lit(0)), lit(idx)).otherwise(col("c"))
        }
      )

      val aggExprs: Array[Column] = Range(1, numNewFields).map { idx =>
        if (idx % 2 == 0) {
          coalesce(countDistinct(s"a$idx"), lit(0))
        } else {
          coalesce(count(s"a$idx"), lit(0))
        }
      }.toArray

      val aggDf = df
        .groupBy("a", "b")
        .agg(aggExprs.head, aggExprs.tail: _*)

      // We are only interested in whether the code compilation fails or not, so skipping
      // verification on outputs.
      aggDf.collect()
    }
  }
```

Closes #27997 from HeartSaVioR/SPARK-31101-branch-2.4.

Authored-by: Jungtaek Lim (HeartSaVioR) <kabhwan.opensource@gmail.com>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
sjincho pushed a commit to sjincho/spark that referenced this pull request Apr 15, 2020
### What changes were proposed in this pull request?

This PR(SPARK-31101) proposes to upgrade Janino to 3.0.16 which is released recently.

* Merged pull request janino-compiler/janino#114 "Grow the code for relocatables, and do fixup, and relocate".

Please see the commit log.
- https://github.com/janino-compiler/janino/commits/3.0.16

You can see the changelog from the link: http://janino-compiler.github.io/janino/changelog.html / though release note for Janino 3.0.16 is actually incorrect.

### Why are the changes needed?

We got some report on failure on user's query which Janino throws error on compiling generated code. The issue is here: janino-compiler/janino#113 It contains the information of generated code, symptom (error), and analysis of the bug, so please refer the link for more details.
Janino 3.0.16 contains the PR janino-compiler/janino#114 which would enable Janino to succeed to compile user's query properly.

### Does this PR introduce any user-facing change?

No.

### How was this patch tested?

Existing UTs.

Closes apache#27932 from HeartSaVioR/SPARK-31101-janino-3.0.16.

Authored-by: Jungtaek Lim (HeartSaVioR) <kabhwan.opensource@gmail.com>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
dongjoon-hyun pushed a commit to apache/spark that referenced this pull request May 29, 2020
### What changes were proposed in this pull request?

This PR proposes to upgrade Janino to 3.1.2 which is released recently.

Major changes were done for refactoring, as well as there're lots of "no commit message". Belows are the pairs of (commit title, commit) which seem to deal with some bugs or specific improvements (not purposed to refactor) after 3.0.15.

* Issue #119: Guarantee executing popOperand() in popUninitializedVariableOperand() via moving popOperand() out of "assert"
* Issue #116: replace operand to final target type if boxing conversion and widening reference conversion happen together
* Merged pull request `#114` "Grow the code for relocatables, and do fixup, and relocate".
  * janino-compiler/janino@367c58e
* issue `#107`: Janino requires "org.codehaus.commons.compiler.io", but commons-compiler does not export this package
  * janino-compiler/janino@f7d9959
* Throw an NYI CompileException when a static interface method is invoked.
  * janino-compiler/janino@efd3884
* Fixed the promotion of the array access index expression (see JLS7 15.13 Array Access Expressions)
  * janino-compiler/janino@32fdb5f
* Issue `#104`: ClassLoaderIClassLoader 's ClassNotFoundException handle mechanism enhancement
  * janino-compiler/janino@6e8a97d

You can see the changelog from the link: http://janino-compiler.github.io/janino/changelog.html

### Why are the changes needed?

We got some report on failure on user's query which Janino throws error on compiling generated code. The issue is here: janino-compiler/janino#113 It contains the information of generated code, symptom (error), and analysis of the bug, so please refer the link for more details.
Janino 3.1.1 contains the PR janino-compiler/janino#114 which would enable Janino to succeed to compile user's query properly. I've also fixed a couple of more bugs as 3.1.1 made Spark UTs fail - hence we need to upgrade to 3.1.2.

Furthermore, from my testing, janino-compiler/janino#90 (which Josh Rosen filed before) seems to be also resolved in 3.1.2 as well.

Looks like Janino is maintained by one person and there's no even version branches and releases/tags so we can't expect Janino maintainer to release a new bugfix version - hence have to try out new minor version.

### Does this PR introduce any user-facing change?

No.

### How was this patch tested?

Existing UTs.

Closes #27860 from HeartSaVioR/SPARK-31101.

Authored-by: Jungtaek Lim (HeartSaVioR) <kabhwan.opensource@gmail.com>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants