[GLUTEN-1662][VL] feat: Support InsertIntoHiveDirCommand in velox parquet write #1663

Merged (6 commits) on May 30, 2023

@JkSelf JkSelf commented May 17, 2023

What changes were proposed in this pull request?

This patch adds support for InsertIntoHiveDirCommand in the Velox backend, for Parquet writes only. When the target format is not Parquet, execution falls back to vanilla Spark.
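
The fallback rule described above can be sketched as a small decision function. This is only an illustration: `WriteFallbackSketch`, `WritePath`, and `chooseWritePath` are hypothetical names and are not Gluten's actual classes or methods; the sketch assumes the decision depends solely on the format named in the `STORED AS` clause.

```scala
// Hypothetical model of the fallback rule described in this PR.
// None of these names come from Gluten; they are for illustration only.
object WriteFallbackSketch {
  sealed trait WritePath
  case object NativeVeloxWrite extends WritePath  // offloaded to the Velox Parquet writer
  case object VanillaSparkWrite extends WritePath // unchanged Spark write path

  // Pick the write path from the format named in the STORED AS clause.
  // The comparison ignores case because SQL keywords are case-insensitive.
  def chooseWritePath(storedAs: String): WritePath =
    if (storedAs.trim.equalsIgnoreCase("parquet")) NativeVeloxWrite
    else VanillaSparkWrite

  def main(args: Array[String]): Unit = {
    Seq("parquet", "PARQUET", "orc", "textfile").foreach { fmt =>
      println(s"$fmt -> ${chooseWritePath(fmt)}")
    }
  }
}
```

Under this assumption, only `STORED AS parquet` takes the native path; every other format keeps the existing Spark behavior.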

(Fixes: #1662)

related: #1636

How was this patch tested?

spark.sql("""insert overwrite directory 's3a://tpch/etl' STORED AS parquet
  select l_returnflag, l_linestatus,
    sum(l_quantity) as sum_qty,
    sum(l_extendedprice) as sum_base_price,
    sum(l_extendedprice * (1 - l_discount)) as sum_disc_price,
    sum(l_extendedprice * (1 - l_discount) * (1 + l_tax)) as sum_charge,
    avg(l_quantity) as avg_qty,
    avg(l_extendedprice) as avg_price,
    avg(l_discount) as avg_disc,
    count(*) as count_order
  from lineitem
  where l_shipdate <= '1998-12-01'
  group by l_returnflag, l_linestatus
  order by l_returnflag, l_linestatus""").show

spark.hadoop.fs.s3a.impl           org.apache.hadoop.fs.s3a.S3AFileSystem
spark.hadoop.fs.s3a.aws.credentials.provider org.apache.hadoop.fs.s3a.SimpleAWSCredentialsProvider

#spark.sql.sources.outputCommitterClass=com.netflix.bdp.s3.S3DirectoryOutputCommitter
spark.hadoop.mapreduce.fileoutputcommitter.algorithm.version 2
spark.hadoop.fs.s3a.committer.name directory
spark.hadoop.fs.s3a.committer.magic.enabled false
spark.hadoop.fs.s3a.committer.staging.conflict-mode replace
spark.hadoop.fs.s3a.committer.staging.unique-filenames true
spark.hadoop.fs.s3a.committer.staging.abort.pending.uploads true
spark.hadoop.mapreduce.outputcommitter.factory.scheme.s3a org.apache.hadoop.fs.s3a.commit.S3ACommitterFactory
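
One way to sanity-check the output of the `insert overwrite directory` statement in this description is to read the directory back in the same spark-shell session. This is a sketch only: it assumes the `spark` session that spark-shell provides and the `s3a://tpch/etl` path used in the statement; the PR does not say how the written files were verified.

```scala
// Assumes the `spark` SparkSession from spark-shell, the S3A settings listed
// above, and that the insert overwrite directory statement has completed.
val written = spark.read.parquet("s3a://tpch/etl")

// The query selects ten expressions, so ten columns are expected here.
println(s"column count: ${written.columns.length}")
written.printSchema()

// Display the aggregated rows that were written.
written.show()
```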

@github-actions

Run Gluten Clickhouse CI

@github-actions

#1662

@JkSelf changed the title from "[GLUTEN-1662][VL] feat: WIP Support InsertIntoHiveDirCommand in parquet write" to "[GLUTEN-1662][VL] feat: Support InsertIntoHiveDirCommand in velox parquet write" on May 24, 2023

@zhouyuan

Note: this patch overrides the HiveFormat class from Spark.

@zhouyuan zhouyuan left a comment

👍

Successfully merging this pull request may close these issues.

[VL] Support InsertIntoHiveDirCommand in parquet write