For a query that computes a distinct count over 10 billion rows,
such as the one below:
```sql
SELECT *
FROM
  (SELECT fdate as a_ds,
          count(distinct(userid)) as index_0_7366
   FROM
     (SELECT t1.fdate as fdate,
             t1.userid as userid
      FROM
        (SELECT t0.fdate as fdate,
                t0.userid as userid
         FROM test.test_table t0
         WHERE t0.fdate>=20210401 AND t0.fdate<20210531 ) t1) a
   GROUP BY a_ds) t_ret
order by a_ds desc LIMIT 5001
```
Both engines ran on the same hardware (83 nodes, each with 48 cores / 96 processors and 256 GB of memory):
Presto (0.242): 22 seconds
Impala (3.4, with multithreading): 28 seconds
Although Presto runs faster than Impala, it consumes far more CPU than Impala does.
Is this a disadvantage of Java (Presto) compared with C++ (Impala)?
CPU utilization of one Presto host (50%+)
CPU utilization of the Impala cluster, one line per machine (15%+)
I captured thread stacks while the query was running and tallied the top 10 classes (taken from the first frame of each runnable thread), in the format: class full name ----- occurrence count in thread stacks.
PS: The high CPU usage has nothing to do with Alluxio, because the runnable Alluxio threads are sitting in epollWait:
alluxio.shaded.client.io.netty.channel.epoll.Native.epollWait0(Native.java:-2)-----241
alluxio.shaded.client.io.netty.channel.epoll.Native.epollWait(Native.java:-2)-----141
Impala uses code generation to speed up execution. Why doesn't Presto?
Setting "use_mark_distinct=false" makes no difference, and a query with a single distinct aggregate is optimized into a group-by query anyway.
The problem is similar to this issue: #13015
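To make the single-distinct rewrite mentioned in this comment concrete, here is a minimal sketch of the logical equivalence. It is not Presto's planner: SQLite (via Python's `sqlite3`) stands in as the engine, and the table contents are made-up sample rows. It only shows that a `count(distinct userid)` grouped by `fdate` returns the same result as de-duplicating on `(fdate, userid)` first and then counting.

```python
import sqlite3

# In-memory SQLite database standing in for the real engine (an assumption;
# the issue itself is about Presto and Impala).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE test_table (fdate INTEGER, userid TEXT)")

# Made-up sample rows, including a duplicate and a NULL userid.
conn.executemany(
    "INSERT INTO test_table VALUES (?, ?)",
    [
        (20210401, "u1"),
        (20210401, "u1"),
        (20210401, "u2"),
        (20210402, "u1"),
        (20210402, None),
    ],
)

# Form 1: distinct aggregate, as written in the issue's query.
DISTINCT_FORM = """
SELECT fdate, count(DISTINCT userid)
FROM test_table
GROUP BY fdate
ORDER BY fdate
"""

# Form 2: de-duplicate with an inner GROUP BY, then count the non-NULL keys.
GROUP_BY_FORM = """
SELECT fdate, count(userid)
FROM (SELECT fdate, userid FROM test_table GROUP BY fdate, userid) AS deduped
GROUP BY fdate
ORDER BY fdate
"""

distinct_rows = conn.execute(DISTINCT_FORM).fetchall()
group_by_rows = conn.execute(GROUP_BY_FORM).fetchall()
print(distinct_rows == group_by_rows)
```

In the second form the work happens in the inner group-by over `(fdate, userid)`, which fits the thread-stack counts in this issue, where a group-by hash class shows up among the busiest classes.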
Top 10 classes from the captured thread stacks (first frame of each runnable thread):
class full name ----- occurrence count in thread stacks
alluxio.shaded.client.io.netty.channel.epoll.Native-----382
com.facebook.presto.operator.MultiChannelGroupByHash-----59
io.airlift.slice.Slices-----31
sun.nio.ch.EPoll-----20
com.facebook.presto.common.block.AbstractVariableWidthBlock-----13
io.airlift.slice.DynamicSliceOutput-----10
com.facebook.presto.common.type.AbstractLongType-----9
com.facebook.airlift.http.client.jetty.BufferingResponseListener-----7
com.facebook.presto.common.block.VariableWidthBlock-----6
sun.management.ThreadImpl-----5
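For anyone who wants to reproduce a tally like the one above, here is a rough sketch of one way to do it from a `jstack`-style thread dump. It is not the script used for this issue. The function name `top_runnable_classes`, the regular expression, and the `SAMPLE` dump are all illustrative assumptions.

```python
import re
from collections import Counter

# Matches a stack frame such as "at sun.nio.ch.EPoll.wait(Native Method)".
# The optional prefix tolerates module-qualified frames from newer JDKs
# (for example "java.base@11/..."). Group 1 is the fully qualified class name.
FRAME = re.compile(r"at\s+(?:[\w.@-]+/)?([\w.$]+)\.[\w$<>]+\(")


def top_runnable_classes(dump, n=10):
    """Count the class of the top frame of every RUNNABLE thread in a dump."""
    counts = Counter()
    lines = dump.splitlines()
    for i, line in enumerate(lines):
        if "java.lang.Thread.State: RUNNABLE" not in line:
            continue
        # The first frame after the state line is the top of the stack.
        for frame in lines[i + 1:]:
            frame = frame.strip()
            if not frame:
                break  # blank line: end of this thread's stack
            match = FRAME.match(frame)
            if match:
                counts[match.group(1)] += 1
                break
    return counts.most_common(n)


# Tiny made-up dump, only to show the expected input shape.
SAMPLE = """
"t1" #1 prio=5 runnable
   java.lang.Thread.State: RUNNABLE
    at sun.nio.ch.EPoll.wait(Native Method)
    at sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:1)

"t2" #2 prio=5 waiting on condition
   java.lang.Thread.State: WAITING (parking)
    at sun.misc.Unsafe.park(Native Method)

"t3" #3 prio=5 runnable
   java.lang.Thread.State: RUNNABLE
    at com.example.Foo.bar(Foo.java:10)

"t4" #4 prio=5 runnable
   java.lang.Thread.State: RUNNABLE
    at sun.nio.ch.EPoll.wait(Native Method)
"""

for class_name, count in top_runnable_classes(SAMPLE):
    print(f"{class_name}-----{count}")
```

A thread that the JVM reports as RUNNABLE while blocked in a native call such as epoll wait is not necessarily burning CPU, so counts like these are only a hint about where time goes.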