
Does this Blaze optimization have any requirements on the operating system version? Does it only support CentOS and Ubuntu, or can other operating systems, such as Anolis OS, be used? #434

Closed
bigmancomeon opened this issue Apr 2, 2024 · 4 comments


@bigmancomeon commented Apr 2, 2024

I use Anolis OS, which is similar to RHEL. Can Blaze work on it?
[screenshot attached]

Spark version: 3.3.3
Common Spark conf for both runs: num-executors 15, executor-cores 2, driver-memory 2g

Spark conf with Blaze:
spark.executor.memory 5g
spark.executor.memoryOverhead 3072
spark.blaze.memoryFraction 0.7
spark.blaze.enable.caseconvert.functions true
spark.blaze.enable.smjInequalityJoin false
spark.blaze.enable.bhjFallbacksToSmj false

Spark conf without Blaze:
spark.executor.memory 6g
spark.executor.memoryOverhead 2048

Each Spark job uses 1 + 2×15 = 31 vcores and 8×15 + 2 = 122 GB of memory (each executor takes 8 GB in both setups: 5 GB + 3072 MB overhead with Blaze, 6 GB + 2048 MB overhead without).
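For reference, a sketch of how such a run could be submitted from the command line. The jar path and q5.sql are hypothetical placeholders, and the extension/shuffle-manager class names follow the Blaze README of that period; verify them against your release.

```bash
# Sketch only: submit a TPC-DS query with the Blaze confs above on YARN.
# /path/to/blaze-engine.jar and q5.sql are placeholders, and the two class
# names below are taken from the Blaze README (they may differ per release).
spark-sql \
  --master yarn \
  --num-executors 15 \
  --executor-cores 2 \
  --driver-memory 2g \
  --jars /path/to/blaze-engine.jar \
  --conf spark.executor.memory=5g \
  --conf spark.executor.memoryOverhead=3072 \
  --conf spark.blaze.memoryFraction=0.7 \
  --conf spark.blaze.enable.caseconvert.functions=true \
  --conf spark.blaze.enable.smjInequalityJoin=false \
  --conf spark.blaze.enable.bhjFallbacksToSmj=false \
  --conf spark.sql.extensions=org.apache.spark.sql.blaze.BlazeSparkSessionExtension \
  --conf spark.shuffle.manager=org.apache.spark.sql.execution.blaze.shuffle.BlazeShuffleManager \
  -f q5.sql
```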

[screenshot: query run times with and without Blaze]

I find there is not much difference between Spark SQL jobs with and without Blaze. I generated 1 TB of text data using TPC-DS and converted it into Parquet. Running Spark SQL queries such as q3, q4, q5, and q16 (shown in the picture) against the TPC-DS Parquet data, and comparing job speeds with and without the Blaze plugin, only q4 is faster, and even there it is nowhere near the several-times speedup of the official q4 benchmark; the other queries show no significant difference.
Even for the same Spark SQL job, with or without Blaze, running it three times gives a different runtime each time.
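One way to make the three-run comparison less noisy (a sketch; q5.sql is a placeholder for the query file) is to time each variant several times and compare medians rather than single runs:

```bash
# Time three consecutive runs of the same query; repeat once with the Blaze
# confs and once without, then compare the medians of the wall-clock times.
for run in 1 2 3; do
  echo "run $run"
  /usr/bin/time -p spark-sql --master yarn -f q5.sql > /dev/null
done
```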

@richox (Collaborator) commented Apr 3, 2024

Could you check the execution plan graph of q05 to confirm that all operators are running natively?
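For example, one way to inspect the plan from the spark-sql CLI (a sketch; q5.sql is a placeholder for a file containing a single query):

```bash
# Print the physical plan. With Blaze enabled, natively executed operators
# typically show up with a Native prefix (e.g. NativeParquetScan), while
# anything handed back to Spark sits behind a columnar-to-row boundary.
# Exact node names vary by Blaze version.
spark-sql --master yarn -e "EXPLAIN FORMATTED $(cat q5.sql)"
```

The same plan is shown as a graph on the SQL tab of the Spark UI for the running application.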

@bigmancomeon (Author)

> Could you check the execution plan graph of q05 to confirm that all operators are running natively?

[screenshot: YARN logs]
How do I check whether all operators are running natively? The picture shows the YARN logs of one Spark application. Is that OK?

@richox (Collaborator) commented Apr 8, 2024

> > Could you check the execution plan graph of q05 to confirm that all operators are running natively?
>
> [screenshot: YARN logs]
> How do I check whether all operators are running natively? The picture shows the YARN logs of one Spark application. Is that OK?

I see. Currently we have no support for InsertIntoHadoopFsRelationCommand (generated by INSERT INTO / INSERT OVERWRITE DIRECTORY); the insert operator falls back to Spark with a C2R (columnar-to-row) conversion, which slows down the execution when the written data is large.
The other operators look good. You can also check the detailed metrics of each operator and compare them with Spark's to see whether Blaze speeds up the execution.
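As a sketch of how to isolate this fallback (file and output paths are hypothetical), time the bare SELECT against the same query wrapped in a directory write:

```bash
# The bare SELECT can run natively (where supported); wrapping the same query
# in INSERT OVERWRITE DIRECTORY triggers InsertIntoHadoopFsRelationCommand,
# which falls back to Spark via C2R. q5_select.sql and /tmp/q5_out are
# placeholders.
spark-sql --master yarn -f q5_select.sql
spark-sql --master yarn \
  -e "INSERT OVERWRITE DIRECTORY '/tmp/q5_out' USING parquet $(cat q5_select.sql)"
```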

@richox (Collaborator) commented Jul 4, 2024

You can try the latest release; it should achieve better performance with the default Spark configuration.

richox closed this as completed on Jul 4, 2024