Does the Blaze optimization have any operating-system requirements, i.e. does it only support CentOS and Ubuntu? Can other operating systems be used, such as Anolis OS?
#434
Closed
bigmancomeon opened this issue
Apr 2, 2024
· 4 comments
I use Anolis OS, which is similar to a RHEL system. Can Blaze work on it?
Spark version: 3.3.3
Common Spark conf: num-executors 15, executor-cores 2, driver-memory 2g
Spark conf with Blaze:
spark.executor.memory 5g
spark.executor.memoryOverhead 3072
spark.blaze.memoryFraction 0.7
spark.blaze.enable.caseconvert.functions true
spark.blaze.enable.smjInequalityJoin false
spark.blaze.enable.bhjFallbacksToSmj false
Spark conf without Blaze:
spark.executor.memory 6g
spark.executor.memoryOverhead 2048
Each Spark job uses 1 + 2×15 = 31 vcores and 8×15 + 2 = 122 GB of memory (8 GB per executor = executor memory plus overhead, which is the same in both configurations).
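The resource totals above can be reproduced with a quick calculation (a sketch using the figures from the conf listings above; the 1-vcore driver is a typical YARN default, not something stated in the conf):

```python
# Resource totals for the Blaze run described above.
num_executors = 15
executor_cores = 2
driver_cores = 1            # YARN typically allocates 1 vcore for the driver
executor_mem_gb = 5 + 3     # spark.executor.memory (5g) + memoryOverhead (3072 MB)
driver_mem_gb = 2           # driver-memory 2g

total_vcores = driver_cores + executor_cores * num_executors
total_mem_gb = executor_mem_gb * num_executors + driver_mem_gb

print(total_vcores, total_mem_gb)  # → 31 122
```

The non-Blaze run lands on the same totals, since 6 GB + 2048 MB overhead is also 8 GB per executor.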
I find there is not much difference between Spark SQL jobs with and without Blaze.
I generated 1 TB of text data using TPC-DS and then converted it into Parquet format.
Using the TPC-DS Parquet data, I ran Spark SQL queries such as query3, query4, query5, and query16 (shown in the picture), comparing job speeds with and without the Blaze plugin. Only query4 is faster, but compared to the official query4 results there is not a several-fold speed difference, and there is no significant difference in the other queries.
Even running the same Spark SQL job three times, with or without Blaze, the elapsed time differs on each run.
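Given the run-to-run variance described above, it helps to compare the mean and spread of several runs rather than single samples. A minimal sketch (the timings below are made-up placeholders, not measured values):

```python
import statistics

def summarize(times_s):
    """Return (mean, sample stdev) of a list of wall-clock times in seconds."""
    return statistics.mean(times_s), statistics.stdev(times_s)

# Hypothetical elapsed times for three runs of the same query.
with_blaze = [41.0, 44.5, 39.8]
without_blaze = [52.3, 50.1, 55.0]

mean_b, sd_b = summarize(with_blaze)
mean_n, sd_n = summarize(without_blaze)
print(f"blaze: {mean_b:.1f}s ±{sd_b:.1f}  vanilla: {mean_n:.1f}s ±{sd_n:.1f}")
```

If the difference between the two means is within the run-to-run standard deviation, the comparison is not conclusive.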
Could you check the execution plan graph of q05 to confirm that all operators were running natively?
How do I check that all operators are running natively? The picture shows the YARN logs of one Spark application; is that OK?
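One way to check is to run `EXPLAIN` on the query (or open the SQL tab of the Spark web UI) and look at the operator names in the plan: Blaze-converted operators carry a `Native` prefix, so any plan node without it is still executed by vanilla Spark. A rough helper for scanning a plan dump (a sketch; the `Native` prefix convention and the operator names in the sample plan are assumptions for illustration):

```python
import re

def non_native_operators(plan_text):
    """Return operator names in an EXPLAIN dump that lack the 'Native' prefix.

    Assumes plan lines look like '+- OperatorName ...' or '*(1) OperatorName'.
    """
    ops = re.findall(r"(?:\+-|:-|\*\(\d+\))\s*([A-Za-z]+)", plan_text)
    return [op for op in ops if not op.startswith("Native")]

# Hypothetical plan fragment where a sort fell back to Spark.
plan = """
+- Sort
+- NativeSortMergeJoin
+- NativeParquetScan
"""
print(non_native_operators(plan))  # → ['Sort']
```

An empty result would suggest the whole plan ran natively; any names in the list are the fallback points worth investigating.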
I see. Currently we have no support for InsertIntoHadoopFsRelationCommand (generated by INSERT INTO / INSERT OVERWRITE DIRECTORY); the insert operator falls back to Spark with a C2R conversion, which slows down execution if the written data is too large.
The other operators look good. You can also check the detailed metrics of each operator and compare them against Spark's to see whether Blaze speeds up the execution.