Does Blaze have any operating system requirements? Does it only support CentOS and Ubuntu, or can other operating systems such as Anolis OS be used? #434

bigmancomeon · 2024-04-02T10:17:06Z

I use Anolis OS, which is a RHEL-like system. Can Blaze work on it?

I am using Spark 3.3.3.
Common Spark settings: num-executors 15, executor-cores 2, driver-memory 2g

Spark conf with Blaze:
spark.executor.memory 5g
spark.executor.memoryOverhead 3072
spark.blaze.memoryFraction 0.7
spark.blaze.enable.caseconvert.functions true
spark.blaze.enable.smjInequalityJoin false
spark.blaze.enable.bhjFallbacksToSmj false

Spark conf without Blaze:
spark.executor.memory 6g
spark.executor.memoryOverhead 2048

Each Spark job uses 1 + 2 × 15 = 31 vcores and 8 × 15 + 2 = 122 GB of memory.
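The totals above can be reproduced with a short calculation. This is a minimal sketch: the function name `job_resources` is hypothetical, the single driver core is an assumption (it is the Spark default, and the thread does not state it), and driver memory overhead is left out because the figures quoted above leave it out too.

```python
# Resource totals for one Spark job, using the settings quoted in this comment.
NUM_EXECUTORS = 15
EXECUTOR_CORES = 2
DRIVER_CORES = 1        # assumption: default single driver core
DRIVER_MEMORY_GB = 2


def job_resources(executor_memory_gb, overhead_mb):
    """Return (vcores, memory in GB) requested by one job."""
    # Each executor container holds heap memory plus memoryOverhead.
    per_executor_gb = executor_memory_gb + overhead_mb / 1024
    vcores = DRIVER_CORES + EXECUTOR_CORES * NUM_EXECUTORS
    memory_gb = per_executor_gb * NUM_EXECUTORS + DRIVER_MEMORY_GB
    return vcores, memory_gb


with_blaze = job_resources(5, 3072)      # 5g heap + 3072 MB overhead
without_blaze = job_resources(6, 2048)   # 6g heap + 2048 MB overhead
print(with_blaze, without_blaze)
```

Both configurations request the same 8 GB per executor, so the comparison is made on equal resources; only the split between heap and overhead differs.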

I find that there is not much difference between Spark SQL jobs with and without Blaze.
I generated 1 TB of text data with TPC-DS and then converted it to Parquet format.
Using the TPC-DS Parquet data, I ran spark-sql queries such as query3, query4, query5, and query16 (see the picture).
Comparing Spark job speeds with and without the Blaze plugin, only query4 is faster, and even there the speedup is not the several-fold difference shown in the official query4 test. The other queries show no significant difference.
Even for the same spark-sql job, with or without Blaze, the elapsed time differs across three runs.
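Since the elapsed time varies from run to run, one way to judge whether a difference is real is to compare the gap between mean run times against the spread of the runs. The sketch below is only a rough heuristic, not a statistical test. The timings are hypothetical placeholders, not measurements from this issue, and the function names are made up for illustration.

```python
import statistics


def summarize(times_s):
    """Return (mean, sample standard deviation) of run times in seconds."""
    return statistics.mean(times_s), statistics.stdev(times_s)


def speedup_is_meaningful(baseline_s, candidate_s):
    """True when the gap between means exceeds the combined run-to-run spread."""
    base_mean, base_sd = summarize(baseline_s)
    cand_mean, cand_sd = summarize(candidate_s)
    return (base_mean - cand_mean) > (base_sd + cand_sd)


# Hypothetical wall-clock times in seconds, three runs each.
vanilla = [100.0, 110.0, 105.0]
blaze = [98.0, 109.0, 101.0]
print(speedup_is_meaningful(vanilla, blaze))
```

With only three runs per configuration the spread estimate is itself noisy, so more repetitions give a more trustworthy comparison.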

richox · 2024-04-03T04:34:53Z

could you check the execution plan graph of q05 to confirm that all operators were running on native?

bigmancomeon · 2024-04-07T01:47:21Z

> could you check the execution plan graph of q05 to confirm that all operators were running on native?

How can I check whether all operators are running natively? The picture shows the YARN logs of one Spark application job. Is that OK?

richox · 2024-04-08T06:37:23Z

> > could you check the execution plan graph of q05 to confirm that all operators were running on native?
>
> How to check all operators were running on native? the picture is yarn logs of one spark application job, is it ok?

I see. We currently do not support InsertIntoHadoopFsRelationCommand (generated by insert into/overwrite directory); the insert operator falls back to Spark with C2R, which slows down execution if the written data is large.
The other operators look good. You can check the detailed metrics of each operator and compare them with Spark to see whether Blaze speeds up the execution.
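For anyone who wants to do the same check on plan text (for example, the output of `EXPLAIN`), a rough sketch follows. It assumes that operators running natively are labeled with a `Native` prefix in the plan, which is an assumption about naming and not something confirmed in this thread. The sample plan is a hypothetical illustration, not output from the job above, and the helper names are made up.

```python
import re

# Tree-drawing characters, an optional whole-stage-codegen marker such as
# "*(2)", and an optional leading "Execute" keyword before the operator name.
_PREFIX = re.compile(r"^[\s:+\-|]*(?:\*\(\d+\)\s*)?(?:Execute\s+)?")
_NAME = re.compile(r"[A-Za-z_]\w*")


def operator_names(plan_text):
    """Extract the first identifier on each plan line as the operator name."""
    names = []
    for line in plan_text.splitlines():
        rest = _PREFIX.sub("", line, count=1)
        match = _NAME.match(rest)
        if match:  # header lines such as "== Physical Plan ==" are skipped
            names.append(match.group(0))
    return names


def non_native_operators(plan_text, native_prefix="Native"):
    """Operators whose name lacks the assumed native prefix."""
    return [n for n in operator_names(plan_text) if not n.startswith(native_prefix)]


# Hypothetical plan text for illustration only.
sample_plan = """== Physical Plan ==
Execute InsertIntoHadoopFsRelationCommand
+- NativeProject
   +- NativeFilter
      +- NativeParquetScan
"""
print(non_native_operators(sample_plan))
```

Any name this reports is a candidate for having fallen back to Spark and is worth inspecting in the operator metrics.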

richox · 2024-07-04T09:43:39Z

You can try the latest release; it should achieve better performance on the default Spark configuration.

richox closed this as completed Jul 4, 2024
