MultiChannelGroupByHash wastes too much CPU compared with Impala #16443

Open
dengweisysu opened this issue Jul 20, 2021 · 2 comments

dengweisysu commented Jul 20, 2021

This concerns a query that counts distinct values over 10 billion rows, like the one below:
`
SELECT *
FROM (
    SELECT fdate AS a_ds,
           count(distinct(userid)) AS index_0_7366
    FROM (
        SELECT t1.fdate AS fdate,
               t1.userid AS userid
        FROM (
            SELECT t0.fdate AS fdate,
                   t0.userid AS userid
            FROM test.test_table t0
            WHERE t0.fdate >= 20210401 AND t0.fdate < 20210531
        ) t1
    ) a
    GROUP BY a_ds
) t_ret
ORDER BY a_ds DESC
LIMIT 5001

`
Both engines ran on the same hardware (83 nodes, each with 48 cores (96 processors) and 256 GB of memory):

  • Presto (0.242): takes 22 seconds
  • Impala (3.4, multi-threaded): takes 28 seconds

Although Presto runs faster than Impala, it uses far more CPU.
Is this a disadvantage of Java (Presto) compared with C++ (Impala)?

CPU utilization on one Presto host (50%+):
image

CPU utilization across the Impala cluster, one line per machine (15%+):
image
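
As a rough way to compare the two runs, the wall-clock times and utilization figures above can be combined into an approximate CPU cost per host. The sketch below is only a back-of-envelope estimate. It assumes that the utilization readings (50% and 15%, both reported as lower bounds) held for the whole query, that the single Presto host shown is representative, and that utilization is measured against all 96 processors. The function name is made up for illustration.

```python
def cpu_seconds_per_host(wall_seconds, utilization, processors=96):
    """Approximate busy processor-seconds on one host during the query.

    Assumes utilization is an average over all processors for the whole run.
    """
    return wall_seconds * utilization * processors


# Figures reported in this issue: Presto 22 s at 50%+, Impala 28 s at 15%+.
presto = cpu_seconds_per_host(22, 0.50)
impala = cpu_seconds_per_host(28, 0.15)
ratio = presto / impala

print(f"Presto ~{presto:.0f}, Impala ~{impala:.0f}, ratio ~{ratio:.1f}x")
```

Under these assumptions Presto spends roughly 2.6 times as much CPU per host as Impala for the same query, even though it finishes sooner.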

I captured thread stacks while the query was running and took the top 10 classes (from the first frame of each runnable thread):
fully qualified class name ----- occurrence count in the thread stacks

alluxio.shaded.client.io.netty.channel.epoll.Native-----382
com.facebook.presto.operator.MultiChannelGroupByHash-----59
io.airlift.slice.Slices-----31
sun.nio.ch.EPoll-----20
com.facebook.presto.common.block.AbstractVariableWidthBlock-----13
io.airlift.slice.DynamicSliceOutput-----10
com.facebook.presto.common.type.AbstractLongType-----9
com.facebook.airlift.http.client.jetty.BufferingResponseListener-----7
com.facebook.presto.common.block.VariableWidthBlock-----6
sun.management.ThreadImpl-----5

PS: The high CPU usage has nothing to do with Alluxio, because the runnable Alluxio threads are sitting in epollWait:
alluxio.shaded.client.io.netty.channel.epoll.Native.epollWait0(Native.java:-2)-----241
alluxio.shaded.client.io.netty.channel.epoll.Native.epollWait(Native.java:-2)-----141
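
The counting described above (class of the first frame of each runnable thread) can be reproduced with a small script. This is a minimal sketch, not the tool used in this issue: it assumes a jstack-style text dump in which each thread has a `java.lang.Thread.State:` line followed by `at ...` frame lines, and the sample dump, class names, and function name are all hypothetical.

```python
import re
from collections import Counter

# Matches a frame line such as "at pkg.Class.method(File.java:12)"
# and captures the fully qualified class name.
FRAME = re.compile(r"^\s*at\s+([\w.$]+)\.[\w$<>]+\(")


def top_frame_classes(dump):
    """Count the class of the top frame of every RUNNABLE thread."""
    counts = Counter()
    want_frame = False
    for line in dump.splitlines():
        if "java.lang.Thread.State:" in line:
            # Start looking for a frame only if the thread is RUNNABLE.
            want_frame = "RUNNABLE" in line
        elif want_frame:
            match = FRAME.match(line)
            if match:
                counts[match.group(1)] += 1
                want_frame = False  # only the first frame counts
            elif not line.strip():
                want_frame = False  # thread had no frames
    return counts


# Synthetic dump with made-up classes, for illustration only.
SAMPLE_DUMP = """
"worker-1" #11 runnable
   java.lang.Thread.State: RUNNABLE
        at example.HashOperator.put(HashOperator.java:10)
        at example.Driver.run(Driver.java:20)

"worker-2" #12 runnable
   java.lang.Thread.State: RUNNABLE
        at example.HashOperator.rehash(HashOperator.java:30)

"io-1" #13 runnable
   java.lang.Thread.State: RUNNABLE
        at example.net.Poller.wait0(Native Method)

"idle-1" #14 waiting
   java.lang.Thread.State: WAITING (parking)
        at example.Parker.park(Native Method)
"""

for cls, n in top_frame_classes(SAMPLE_DUMP).most_common():
    print(f"{cls}-----{n}")
```

Threads blocked in a native poll call are still reported as RUNNABLE, so they appear in the count even when they are idle. That is why the Alluxio epoll frames have to be discounted by hand.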

Impala uses code generation to speed up execution. Why doesn't Presto?

@kaikalur
Contributor

We need a more reproducible test. Also, Presto has the mark_distinct operator for count distinct. See if turning that off makes any difference.

@dengweisysu
Author

Setting "use_mark_distinct=false" makes no difference. Also, a query with a single distinct aggregate is optimized into a group-by query.
The problem is similar to issue #13015.
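
To illustrate what "optimized into a group-by query" means, the sketch below runs both forms on a tiny in-memory SQLite table and checks that they agree. It only shows the general rewrite (group by the key and the distinct column first, then count per key); it is not Presto's actual plan. The table contents, the TEXT type for userid, and the function name are assumptions.

```python
import sqlite3


def distinct_counts(rows):
    """Return (direct, rewritten) results for count(DISTINCT userid) per fdate."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE test_table (fdate INTEGER, userid TEXT)")
    conn.executemany("INSERT INTO test_table VALUES (?, ?)", rows)

    # Direct form, as written in the original query.
    direct = conn.execute(
        "SELECT fdate, count(DISTINCT userid) FROM test_table "
        "GROUP BY fdate ORDER BY fdate DESC"
    ).fetchall()

    # Rewritten form: deduplicate with an inner GROUP BY on both columns,
    # then count the surviving rows per fdate.
    rewritten = conn.execute(
        "SELECT fdate, count(userid) FROM "
        "(SELECT fdate, userid FROM test_table GROUP BY fdate, userid) g "
        "GROUP BY fdate ORDER BY fdate DESC"
    ).fetchall()

    conn.close()
    return direct, rewritten


# Made-up rows: two distinct users on the first day, one on the second.
rows = [
    (20210401, "u1"),
    (20210401, "u1"),
    (20210401, "u2"),
    (20210402, "u1"),
]
direct, rewritten = distinct_counts(rows)
print(direct)
print(rewritten)
```

In the rewritten form the inner grouping hashes on two columns (fdate and userid), which would be consistent with the hash-table class for multi-column grouping showing up in the thread stacks.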
