Test error on the main branch #53

Closed
xuzhangtian opened this issue Apr 20, 2023 · 2 comments
Labels
bug Something isn't working

Comments

@xuzhangtian

@Slf4j
public class SimpleTest extends AbstractBasicTest {

    @Before
    public void createTable() {
        createMysqlUser();
        createOdsUser();
    }

    @Test
    public void concat() {
        String sql = "insert into mysql_user " +
                "select id, birthday, concat(first_name, last_name) as full_name " +
                "from ods_user";
        List<Result> actualList = context.parseFieldLineage(sql);
        log.info("Lineage Result: ");
        actualList.forEach(e -> log.info(e.toString()));
    }

    protected void createMysqlUser() {
        context.execute("DROP TABLE IF EXISTS mysql_user ");
        context.execute("CREATE TABLE IF NOT EXISTS mysql_user (" +
                "       id                        BIGINT           ," +
                "       birthday                  TIMESTAMP(3)     ," +
                "       full_name                 STRING            " +
                ") WITH ( " +
                "       'connector' = 'jdbc'                 ," +
                "       'url'       = 'jdbc:mysql://127.0.0.1:3306/demo?useSSL=false&characterEncoding=UTF-8'," +
                "       'username'  = 'root'                 ," +
                "       'password'  = 'xxx'          ," +
                "       'table-name'= 'mysql_user' " +
                ")"
        );
    }

    protected void createOdsUser() {
        context.execute("DROP TABLE IF EXISTS ods_user ");
        context.execute("CREATE TABLE IF NOT EXISTS ods_user (" +
                "       id                        BIGINT           ," +
                "       birthday                  TIMESTAMP(3)     ," +
                "       first_name                STRING           ," +
                "       last_name                 STRING           ," +
                "       company_name              STRING           " +
                ") WITH ( " +
                "       'connector' = 'jdbc'                 ," +
                "       'url'       = 'jdbc:mysql://127.0.0.1:3306/demo?useSSL=false&characterEncoding=UTF-8'," +
                "       'username'  = 'root'                 ," +
                "       'password'  = 'xxx'          ," +
                "       'table-name'= 'ods_user' " +
                ")"
        );
    }
}

If I remove the company_name field from ods_user, the lineage is parsed successfully. Is there something wrong with the way I wrote this?

@xuzhangtian
Author

In PushProjectIntoTableSourceScanRule.onMatch(RelOptRuleCall call), when the number of source table fields differs from the number of input fields, the following is called:

        final TableSourceTable newSource =
                sourceTable.copy(
                        newTableSource,
                        newRowType,
                        getExtraDigests(abilitySpecs),
                        abilitySpecs.toArray(new SourceAbilitySpec[0]));

As a result, the list returned by TableSourceTable.getQualifiedName() contains one extra value, "project=[id, birthday, first_name, last_name]".

    private Set<String> optimizeSourceColumnSet(Set<RelColumnOrigin> inputSet) {
        Set<String> catalogSet = new HashSet<>();
        Set<String> databaseSet = new HashSet<>();
        Set<String> tableSet = new HashSet<>();
        Set<List<String>> qualifiedSet = new LinkedHashSet<>();
        for (RelColumnOrigin rco : inputSet) {
            RelOptTable originTable = rco.getOriginTable();
            List<String> qualifiedName = originTable.getQualifiedName();

            // catalog,database,table,field
            List<String> qualifiedList = new ArrayList<>(qualifiedName);
            catalogSet.add(qualifiedName.get(0));
            databaseSet.add(qualifiedName.get(1));
            tableSet.add(qualifiedName.get(2));

            String field = rco.getTransform() != null ? rco.getTransform() :
                    originTable.getRowType().getFieldNames().get(rco.getOriginColumnOrdinal());
           
            // Should this be changed to qualifiedList.add(3, field)?
            qualifiedList.add(field);
            
            qualifiedSet.add(qualifiedList);
        }
        if (catalogSet.size() == 1 && databaseSet.size() == 1 && tableSet.size() == 1) {
            return optimizeName(qualifiedSet, e -> e.get(3));
        } else if (catalogSet.size() == 1 && databaseSet.size() == 1) {
            return optimizeName(qualifiedSet, e -> String.join(DELIMITER, e.subList(2, 4)));
        } else if (catalogSet.size() == 1) {
            return optimizeName(qualifiedSet, e -> String.join(DELIMITER, e.subList(1, 4)));
        } else {
            return optimizeName(qualifiedSet, e -> String.join(DELIMITER, e));
        }
    }
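
To make the symptom concrete, here is a minimal, self-contained sketch of the list handling in optimizeSourceColumnSet, with no Flink or Calcite dependency. The catalog and database names are assumed placeholders, the class and method names are hypothetical, and only the `project=[...]` entry comes from the report above. It compares the current `add(field)`, the proposed `add(3, field)`, and a hypothetical third variant that truncates the name to three parts first.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class QualifiedNameDemo {

    /** Mirrors the current code: the field is appended at the end of the list. */
    static List<String> appendField(List<String> qualifiedName, String field) {
        List<String> qualifiedList = new ArrayList<>(qualifiedName);
        qualifiedList.add(field);
        return qualifiedList;
    }

    /** The change asked about in the comment: insert the field at index 3. */
    static List<String> insertAtIndex3(List<String> qualifiedName, String field) {
        List<String> qualifiedList = new ArrayList<>(qualifiedName);
        qualifiedList.add(3, field);
        return qualifiedList;
    }

    /** Hypothetical alternative: keep only catalog, database, table, then append the field. */
    static List<String> truncateThenAppend(List<String> qualifiedName, String field) {
        List<String> qualifiedList = new ArrayList<>(qualifiedName.subList(0, 3));
        qualifiedList.add(field);
        return qualifiedList;
    }

    public static void main(String[] args) {
        // Assumed catalog/database names; the fourth entry is the extra value from the report.
        List<String> pruned = Arrays.asList(
                "default_catalog", "default_database", "ods_user",
                "project=[id, birthday, first_name, last_name]");

        System.out.println(appendField(pruned, "first_name"));
        System.out.println(insertAtIndex3(pruned, "first_name"));
        System.out.println(truncateThenAppend(pruned, "first_name"));
    }
}
```

With `add(field)`, index 3 holds the project digest, so `e.get(3)` does not return the field. With `add(3, field)`, the field lands at index 3 and the `get(3)` and `subList(..., 4)` branches would see it, but the digest remains in the list at index 4 and would still appear in the final `String.join(DELIMITER, e)` branch. Whether truncating is appropriate in the real code depends on what else getQualifiedName() can return, which this sketch does not cover.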

@HamaWhiteGG
Owner

@xuzhangtian Very good question with a detailed test case, many thanks.

This is indeed a bug in parsing the transform. It was caused mainly by the wrong order in which the transform's substitution variables were parsed.
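
For readers unfamiliar with this kind of substitution, the following is a generic sketch and not the project's actual fix. It assumes the transform is a string with `$N` placeholders (for example `CONCAT($2, $3)`, an assumed value) and that each placeholder should be resolved by its ordinal rather than by the iteration order of a set. The class name, method name, and sample data are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TransformSubstitutionDemo {

    // Matches placeholders such as $2 or $10; the digits are captured as group 1.
    private static final Pattern PLACEHOLDER = Pattern.compile("\\$(\\d+)");

    /** Replaces each $N in a single pass, looking the column up by its ordinal N. */
    static String substitute(String transform, Map<Integer, String> columnsByOrdinal) {
        Matcher matcher = PLACEHOLDER.matcher(transform);
        StringBuffer result = new StringBuffer();
        while (matcher.find()) {
            int ordinal = Integer.parseInt(matcher.group(1));
            String column = columnsByOrdinal.get(ordinal);
            // Unknown placeholders are left untouched rather than guessed.
            String replacement = column != null ? column : matcher.group();
            matcher.appendReplacement(result, Matcher.quoteReplacement(replacement));
        }
        matcher.appendTail(result);
        return result.toString();
    }

    public static void main(String[] args) {
        // Assumed ordinals for the ods_user columns used in the test case.
        Map<Integer, String> columns = new HashMap<>();
        columns.put(2, "first_name");
        columns.put(3, "last_name");
        System.out.println(substitute("CONCAT($2, $3)", columns));
    }
}
```

Resolving by ordinal in one pass means the result does not depend on the order in which the source columns happen to be iterated, and a placeholder such as `$10` is never partially rewritten by an earlier replacement of `$1`.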

I have added your use case to the project source code, and it now passes the unit test.

Test case: SimpleTest.java
[Screenshot, 2023-04-28 16:44:28]

You can run it directly, or run SuiteTest to see all the test results.
[Screenshot, 2023-04-28 16:47:38]

The core change for this bug fix is the addition of the RelMdColumnOrigins.buildSourceColumnMap method.
[Screenshot, 2023-04-28 16:51:52]

HamaWhiteGG added the bug label on Apr 29, 2023