[Rust] add substrait for flink and be compatible for other engines #454

Merged: 7 commits merged into lakesoul-io:main on Apr 7, 2024

mag1c1an1
Contributor

add flink expression to substrait

add more functions

add more tests

add base schema for namedscan, substrait type to arrow type

compatibility

switch to java8

@@ -235,6 +235,7 @@ public DynamicTableSource copy() {
lsts.projectedFields = this.projectedFields;
lsts.remainingPartitions = this.remainingPartitions;
lsts.filter = this.filter;
lsts.filter = this.filter;
Contributor

duplicate code

@@ -52,15 +55,18 @@ public LakeSoulSource(TableId tableId,
List<String> pkColumns,
Map<String, String> optionParams,
@Nullable List<Map<String, String>> remainingPartitions,
@Nullable FilterPredicate filter) {
@Nullable FilterPredicate filterStr,
Contributor

improper name 'filterStr'

@@ -129,7 +133,11 @@ private void initializeReader() throws IOException {
}

if (filter != null) {
Contributor

Will two kinds of filter cause conflict?

Contributor Author

No, filterStr is always null in the code above. It is only used to debug differences between the resulting datafusion::expr of the two kinds of filters.
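
A guard along these lines could make the "at most one filter is set" assumption explicit instead of implicit. This is only a sketch: `FilterSelection`, `Path`, and `choose` are hypothetical names that do not appear in the PR, and both filters are treated as opaque objects.

```java
// Hypothetical illustration; none of these names come from the PR.
final class FilterSelection {
    /** Which of the two pushdown paths a reader would take. */
    enum Path { NONE, LEGACY_PREDICATE, SUBSTRAIT_PLAN }

    /**
     * Picks a single filter path. Only the presence of each argument
     * matters here, so both are typed as Object.
     */
    static Path choose(Object legacyPredicate, Object substraitPlan) {
        if (legacyPredicate != null && substraitPlan != null) {
            // Fail fast rather than silently applying both filters.
            throw new IllegalStateException("both filter kinds set; expected at most one");
        }
        if (substraitPlan != null) {
            return Path.SUBSTRAIT_PLAN;
        }
        if (legacyPredicate != null) {
            return Path.LEGACY_PREDICATE;
        }
        return Path.NONE;
    }
}
```

With a check like this, a future change that sets both filters would surface as an exception at reader initialization rather than as a subtle difference in results.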

return Tuple2.of(SupportsFilterPushDown.Result.of(accepted, remaining), planToProto(filter));
}

static Schema toArrowSchema(String tableSchema) {
Contributor

No diff from Schema.fromJSON

import java.util.stream.Collectors;
import java.util.stream.Stream;

public class SubstraitUtil {
Contributor

It may be better to move this to lakesoul-io-java.

*
* @param plan Filter{}
*/
public void addFilterProto(Plan plan) {
Contributor

Refer to com.dmetasoul.lakesoul.meta.jnr.NativeMetadataJavaClient#executeInsert

@@ -255,20 +255,36 @@ pub async fn prune_filter_and_execute(
df: DataFrame,
request_schema: SchemaRef,
filter_str: Vec<String>,
Contributor

use Vec as input directly

@mag1c1an1
Contributor Author

rebase from main, please review the dependency

Contributor

@Ceng23333 left a comment

Please provide some flink test cases.

@@ -73,7 +74,8 @@ public class LakeSoulOneSplitRecordsReader implements RecordsWithSplitIds<RowDat
// arrow batch -> row, with requested schema
private ArrowReader curArrowReaderRequestedSchema;

private final FilterPredicate filter;
private final FilterPredicate _filterPredicate;
Contributor

Why use a leading underscore in the variable name?

return Tuple2.of(SupportsFilterPushDown.Result.of(accepted, remaining), planToProto(filter));
}

public static Expression doTransform(ResolvedExpression flinkExpression, Schema arrow_schema) {
Contributor

arrow_schema should use CamelCase name in Java

}
return ExpressionCreator.binary(nullable, b);
}
case TINYINT:
Contributor

Use the integer type with the exact bit width
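
To illustrate the exact-width suggestion: Substrait defines separate 8, 16, 32, and 64-bit signed integer types, so a TINYINT literal need not be widened to a larger type. The sketch below is a minimal, self-contained illustration of narrowing with range checks. `ExactWidthLiterals`, `IntKind`, `bitWidth`, and `narrow` are hypothetical names, and the code deliberately avoids the Substrait and Flink APIs used in the PR.

```java
// Hypothetical helper for illustration; not code from this PR.
final class ExactWidthLiterals {
    /** SQL integer types, narrowest to widest. */
    enum IntKind { TINYINT, SMALLINT, INTEGER, BIGINT }

    /** Bit width implied by each SQL integer type. */
    static int bitWidth(IntKind kind) {
        switch (kind) {
            case TINYINT:  return 8;
            case SMALLINT: return 16;
            case INTEGER:  return 32;
            case BIGINT:   return 64;
            default: throw new IllegalArgumentException("unknown kind: " + kind);
        }
    }

    /** Boxes the value at its exact width, or throws if it does not fit. */
    static Number narrow(IntKind kind, long value) {
        switch (kind) {
            case TINYINT:
                if (value < Byte.MIN_VALUE || value > Byte.MAX_VALUE) {
                    throw new ArithmeticException("out of range for TINYINT: " + value);
                }
                return Byte.valueOf((byte) value);
            case SMALLINT:
                if (value < Short.MIN_VALUE || value > Short.MAX_VALUE) {
                    throw new ArithmeticException("out of range for SMALLINT: " + value);
                }
                return Short.valueOf((short) value);
            case INTEGER:
                // Math.toIntExact throws ArithmeticException on overflow.
                return Integer.valueOf(Math.toIntExact(value));
            case BIGINT:
                return Long.valueOf(value);
            default:
                throw new IllegalArgumentException("unknown kind: " + kind);
        }
    }
}
```

The returned boxed type (`Byte`, `Short`, `Integer`, `Long`) can then drive which literal constructor is called, so the literal's type matches the column's type on the native side.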

}
return ExpressionCreator.fp64(nullable, d);
}
case DATE: {
Contributor

Any unit test for date/timestamp case?
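
As a starting point for such tests, the expected values can be computed independently of the conversion code. The sketch below assumes the usual encodings, a date as days since 1970-01-01 and a timestamp as microseconds since the epoch. `TemporalLiteralChecks` and its methods are hypothetical names, not part of the PR.

```java
import java.time.Instant;
import java.time.LocalDate;
import java.time.temporal.ChronoUnit;

// Hypothetical helper for computing expected literal values in tests.
final class TemporalLiteralChecks {
    /** Days since 1970-01-01, narrowed to int with an overflow check. */
    static int daysSinceEpoch(LocalDate date) {
        return Math.toIntExact(date.toEpochDay());
    }

    /** Microseconds since 1970-01-01T00:00:00Z. */
    static long microsSinceEpoch(Instant instant) {
        return ChronoUnit.MICROS.between(Instant.EPOCH, instant);
    }
}
```

A test could then compare the literal produced by the Flink-to-Substrait conversion against these values for a few fixed inputs, for example the epoch second 1612176000 that appears in the commented-out query in this PR.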

Signed-off-by: mag1c1an1 <mag1cian@icloud.com>

add flink expression to substrait

Signed-off-by: mag1c1an1 <mag1cian@icloud.com>

add more functions

Signed-off-by: mag1c1an1 <mag1cian@icloud.com>

add more tests

Signed-off-by: mag1c1an1 <mag1cian@icloud.com>

add base schema for namedscan, substrait type to arrow type

Signed-off-by: mag1c1an1 <mag1cian@icloud.com>

compatibility

Signed-off-by: mag1c1an1 <mag1cian@icloud.com>

switch to java8

Signed-off-by: mag1c1an1 <mag1cian@icloud.com>

before apply cargo fix

Signed-off-by: mag1c1an1 <mag1cian@icloud.com>

cargo clippy && cargo fmt

Signed-off-by: mag1c1an1 <mag1cian@icloud.com>

fix ci

Signed-off-by: mag1c1an1 <mag1cian@icloud.com>

rebase

Signed-off-by: mag1c1an1 <mag1cian@icloud.com>

refactor

Signed-off-by: mag1c1an1 <mag1cian@icloud.com>
import java.util.stream.Stream;

public class SubstraitUtil {
public static final SimpleExtension.ExtensionCollection Se;
Contributor

Replace 'Se' with recognizable name.


public class SubstraitUtil {
public static final SimpleExtension.ExtensionCollection Se;
public static final SubstraitBuilder Builder;
Contributor

Rename static constants with UPPERCASE_UNDERSCORE format
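
For reference, the naming conventions raised in this review could look like the sketch below. The nested `ExtensionCollection` and `Builder` classes are placeholders standing in for the real types, and `EXTENSIONS`, `BUILDER`, and `describe` are only suggested names, not the ones the PR ended up with.

```java
// Illustration of the naming conventions only; placeholder types throughout.
final class NamingSketch {
    /** Placeholder for the real extension collection type. */
    static final class ExtensionCollection { }

    /** Placeholder for the real builder type. */
    static final class Builder {
        final ExtensionCollection extensions;

        Builder(ExtensionCollection extensions) {
            this.extensions = extensions;
        }
    }

    // Static constants: UPPERCASE_UNDERSCORE with a recognizable name.
    static final ExtensionCollection EXTENSIONS = new ExtensionCollection();
    static final Builder BUILDER = new Builder(EXTENSIONS);

    // Parameters and locals: lower camelCase (arrowSchema, not arrow_schema).
    static String describe(String arrowSchema) {
        return "schema=" + arrowSchema;
    }
}
```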

@@ -466,6 +466,8 @@ SPDX-License-Identifier: Apache-2.0
<include>com.google.code.gson:gson</include>
<include>dev.failsafe:failsafe</include>
<include>com.google.protobuf:protobuf-java</include>
<!--substrait-->
Contributor

Try removing org.apache.parquet:parquet-column from pom

createLakeSoulSourceTableWithDateType(createTableEnv);
// not supported
// String testSql = "select * from type_info where modifyTime=TO_TIMESTAMP_LTZ(1612176000,0)";
String testSql = "select * from type_info " +
Contributor

Mark unsupported data types with a comment.


public class SubstraitTest extends AbstractTestBase {

private final String BATCH_TYPE = "batch";
Contributor

Add tests which filter on hash column/range column

Signed-off-by: mag1c1an1 <mag1cian@icloud.com>
@xuchen-plus xuchen-plus merged commit 499de72 into lakesoul-io:main Apr 7, 2024
17 checks passed
3 participants