I find blaze has no acceleration effect? why #426

Closed · bigmancomeon opened this issue Mar 29, 2024 · 11 comments

@bigmancomeon

Spark version: 3.3.3

This is the Spark conf with Blaze:
spark.executor.memory 5g
spark.executor.memoryOverhead 3072
spark.blaze.memoryFraction 0.7
spark.blaze.enable.caseconvert.functions true
spark.blaze.enable.smjInequalityJoin false
spark.blaze.enable.bhjFallbacksToSmj false

This is the Spark conf without Blaze:
spark.executor.memory 6g
spark.executor.memoryOverhead 2048

driver-memory 4G
num-executors 6
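
For reference, a rough sketch of how these confs could be combined with Blaze's extension and shuffle-manager settings on the command line (the two class names are the ones quoted later in this thread; the master URL and layout are placeholders, not the exact command used):

spark-sql --master yarn \
  --driver-memory 4G --num-executors 6 \
  --conf spark.executor.memory=5g \
  --conf spark.executor.memoryOverhead=3072 \
  --conf spark.blaze.memoryFraction=0.7 \
  --conf spark.sql.extensions=org.apache.spark.sql.blaze.BlazeSparkSessionExtension \
  --conf spark.shuffle.manager=org.apache.spark.sql.execution.blaze.shuffle.BlazeShuffleManager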

I find there is not much difference between a Spark SQL job with and without Blaze. Using 100 GB of TPC-DS Parquet data and running spark-sql queries such as query4, query5, query16, and query17 (shown in the screenshot below), the speed difference between running with and without Blaze is not significant. Even the same spark-sql job, with or without Blaze, takes a different amount of time on each of three runs.
[screenshot: query runtimes with and without Blaze]

@richox
Collaborator

richox commented Mar 29, 2024

Did you run the benchmark on an isolated cluster? It looks like the SQL time varies greatly; maybe it's a resource issue?

@bigmancomeon
Author

Thanks, that may be the reason. On the other hand, there is a LIMIT 100 at the end of each query's SQL. Could that also be a reason for the unstable running time? LIMIT 100 only takes 100 rows from the final result, so the 100 rows returned might be different on every run.

@richox
Collaborator

richox commented Apr 1, 2024

I suggest testing with a larger dataset on an isolated cluster to get a stable benchmark result. When the dataset is too small, most of the time is spent on the driver side and the performance is not stable.

@bigmancomeon
Author

Thanks, maybe you are right. Right now I use the TPC-DS tool to generate 100 GB of text data and then convert it to Parquet format. The Spark resources are 3 executors × 8 GB plus a 4 GB driver. As shown in the screenshot below, query4 was run three times; the longest run took 993 s and the shortest 802 s, so the timing is not stable. Next I will do as you suggested.
[screenshot: query4 runtimes across three runs]
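
For reference, a minimal sketch of the text-to-Parquet conversion step (assuming the generated text files are already registered as external tables in a database named tpcds_text; all database and table names here are placeholders):

-- Hypothetical example for one table; repeat per TPC-DS table.
CREATE DATABASE IF NOT EXISTS tpcds_parquet;
CREATE TABLE tpcds_parquet.store_sales
USING parquet
AS SELECT * FROM tpcds_text.store_sales;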

@bigmancomeon
Author

By the way, what were the num-executors and executor-cores values for the Spark jobs in the following official benchmark?

https://github.com/kwai/blaze/blob/master/benchmark-results/20240202.md

@richox
Collaborator

richox commented Apr 1, 2024

We use spark.executor.cores 5, the same setting as our production conf.

@bigmancomeon
Author

spark.executor.cores is 5, but what is the number of executors?

@bigmancomeon
Author

I use 15 executors × 2 cores + 1 driver core = 31 CPUs, and 15 × 8 GB executor memory + 2 GB driver = 122 GB of memory, to run the Spark jobs. The resources are sufficient. Each job is run three times, but the running time is still different each time, and the speed with and without Blaze is almost the same. Does this plugin really work?

@MrFireChow

I built an environment with Spark 3.3.3 and Blaze 2.0.8 and ran some tests on 100 GB of TPC-DS data; however, I did not see any benefit compared to not using Blaze either. This is my launch command:
spark-sql --master spark://xxxx:xxxx \
  --conf spark.sql.extensions=org.apache.spark.sql.blaze.BlazeSparkSessionExtension \
  --conf spark.shuffle.manager=org.apache.spark.sql.execution.blaze.shuffle.BlazeShuffleManager \
  --conf spark.blaze.enable.smjInequalityJoin=true

@MrFireChow

By the way, the plan tree shows the plugin is indeed taking effect: the plans are converted to native plans, but the query time does not decrease.
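
For reference, one way to check that conversion from the spark-sql prompt (a sketch; the table name is a placeholder and the exact labels Blaze prints for its native operators depend on the Blaze version):

-- Hypothetical check: print the physical plan and look for Blaze's native
-- operators in place of the usual FileScan / Exchange / SortMergeJoin nodes.
EXPLAIN FORMATTED
SELECT ss_item_sk, count(*) FROM tpcds_parquet.store_sales GROUP BY ss_item_sk LIMIT 100;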

@richox
Collaborator

richox commented Jun 19, 2024

It's likely related to some hard-coded configurations. For example, shuffle compression is fixed to zstd in Blaze, while Spark uses lz4 by default, so in a low-IO-latency environment Blaze spends more time on compression and the overall performance drops.
We are working on a new version that should work with Spark's default compression. You can benchmark TPC-H on this branch: https://github.com/kwai/blaze/tree/3.0.0-preview1 (not complete yet; it still has some bugs on TPC-DS).
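
For reference, the Spark-side half of that comparison is the standard codec setting below (a sketch of vanilla Spark behavior; per the comment above, the 2.x Blaze native shuffle does not honor it):

# Vanilla Spark compresses shuffle/spill data with lz4 unless this is overridden
# (accepted values include lz4, snappy, zstd).
spark.io.compression.codec  lz4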

richox closed this as completed on Jun 19, 2024