Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Reduce the identifier length in generated code for nested columns #537

Open
wants to merge 1 commit into
base: master
from

Conversation

4 participants
@vkorukanti
Copy link

vkorukanti commented Mar 25, 2019

Currently the type signature is used as name which could be long if the
type is a nested type. We recently hit the limit of identifier length in
ASM [1] on one of the wide tables which has nested column deferences in
project/filter.

Change the name to a short name based on the Java object type of the Type,
as the instruction definition is based on the Java object type. This reduces
the size of the code generated too.

Bytecode for the test query used in unittests:
Before fix: https://gist.github.com/vkorukanti/64bfab1842c192fdb21f12f54006def3#file-before_fix
After fix: https://gist.github.com/vkorukanti/64bfab1842c192fdb21f12f54006def3#file-after-fix

[1] https://gitlab.ow2.org/asm/asm/blob/master/asm/src/main/java/org/objectweb/asm/ByteVector.java#L246

@cla-bot cla-bot bot added the cla-signed label Mar 25, 2019

@vkorukanti vkorukanti force-pushed the vkorukanti:compiler_error branch from 25ab76e to 5554575 Mar 25, 2019

@vkorukanti vkorukanti changed the title WIP: Reduce the identifier length in generated code for nested columns Reduce the identifier length in generated code for nested columns Mar 26, 2019

@@ -621,6 +621,35 @@ public void testBinaryOperatorsString()
Futures.allAsList(futures).get();
}

@Test
public void testNestedColumnFilter()

This comment has been minimized.

Copy link
@findepi

findepi Mar 26, 2019

Member

Make sure this test fails without the changes to SqlTypeBytecodeExpression.java

This comment has been minimized.

Copy link
@vkorukanti

vkorukanti Mar 26, 2019

Author

Sure @findepi. Setting FunctionAssertions.TEST_ROW_NUMBER_OF_FIELDS to a large number (eg. 4000) reproes the issue. I didn't include it because FunctionAssertions is used in lot of tests, not sure of the implications of the having such large schema on test runtimes. Let me see if I can separate out the wide row into a separate test data variable.

This comment has been minimized.

Copy link
@vkorukanti

vkorukanti Mar 27, 2019

Author

@findepi Updated commit to repro the issue in test case. I tried to find a better way to simplify the test using CAST( ROW(1.0, 2.0, .... n) AS ROW(c1 DOUBLE, c2 DOUBLE,... cn DOUBLE)). This reproes the issue, but hits another limit (generated method for CAST is too big). I ran the TestExpressionComplier and TestRowOperators with and without the large row data and I only see increase of ms in test runtime. Hope this is ok. Let me know if there is a simpler way to test this.

@vkorukanti vkorukanti force-pushed the vkorukanti:compiler_error branch from 5554575 to 28e4279 Mar 27, 2019

Reduce the identifier length in generated code for nested columns
Currently the type signature is used as name which could be long if the
type is a nested type. We recently hit the limit of identifier length in
ASM [1] on one of the wide tables which has nested column deferences in
project/filter.

Change the name to a short name based on the Java object type of the `Type`,
as the instruction definition is based on the Java object type. This reduces
the size of the code generated too.

Bytecode for the test query used in unittests:
Before fix: https://gist.github.com/vkorukanti/64bfab1842c192fdb21f12f54006def3#file-before_fix
After fix: https://gist.github.com/vkorukanti/64bfab1842c192fdb21f12f54006def3#file-after-fix

[1] https://gitlab.ow2.org/asm/asm/blob/master/asm/src/main/java/org/objectweb/asm/ByteVector.java#L246

address review comments:
1. Use the existing logic up to some size limit (maybe 20 bytes), and after that, `use type.getTypeSignature().getBase()`
2. Increase the width of the nested column to repro the issue in test
3. Revert change to SqlTypeByteCodeExpression.formatOneLine

@vkorukanti vkorukanti force-pushed the vkorukanti:compiler_error branch from 28e4279 to 12dca14 Apr 5, 2019

name = type.getTypeSignature().getBase();
}

return name;

This comment has been minimized.

Copy link
@dain

dain Apr 9, 2019

Member

I think we need the regex replace on the base name also. Maybe this:

        String name = type.getTypeSignature().toString();
        if (name.length() > 20) {
            // Use type base to reduce the identifier size in generated code
            name = type.getTypeSignature().getBase();
        }
        return name.replaceAll("\\W+", "_");
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.