[presto 0.206] Exception while running TPCH queries with 500GB data. #11232

Open
ajantha-bhat opened this issue Aug 9, 2018 · 7 comments

@ajantha-bhat
Contributor

Hi, I am running the TPC-H queries on 500 GB of data on a 3-node cluster (each node has 150 GB of query memory and 48 CPU cores).
I am using my own CarbonData connector with Presto.

Presto version: 0.206

5 of the 22 queries fail with the exception below.
Is anyone familiar with this call stack? What is the workaround?

java.lang.IllegalArgumentException: Too large (897278064 expected elements with load factor 0.75)
at it.unimi.dsi.fastutil.HashCommon.arraySize(HashCommon.java:160)
at com.facebook.presto.operator.PagesHash.<init>(PagesHash.java:63)
at com.facebook.presto.operator.JoinHashSupplier.<init>(JoinHashSupplier.java:70)
at com.facebook.presto.operator.PagesIndex.createLookupSourceSupplier(PagesIndex.java:512)
at com.facebook.presto.operator.HashBuilderOperator.buildLookupSource(HashBuilderOperator.java:589)
at com.facebook.presto.operator.HashBuilderOperator.finishInput(HashBuilderOperator.java:486)
at com.facebook.presto.operator.HashBuilderOperator.finish(HashBuilderOperator.java:442)
at com.facebook.presto.operator.Driver.processInternal(Driver.java:393)
at com.facebook.presto.operator.Driver.lambda$processFor$8(Driver.java:282)
at com.facebook.presto.operator.Driver.tryWithLock(Driver.java:672)
at com.facebook.presto.operator.Driver.processFor(Driver.java:276)
at com.facebook.presto.execution.SqlTaskExecution$DriverSplitRunner.processFor(SqlTaskExecution.java:973)
at com.facebook.presto.execution.executor.PrioritizedSplitRunner.process(PrioritizedSplitRunner.java:162)
at com.facebook.presto.execution.executor.TaskExecutor$TaskRunner.run(TaskExecutor.java:477)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
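The top frame is fastutil's `HashCommon.arraySize`, which `PagesHash` calls to size its open-addressing table. As far as I know, it rounds `ceil(expected / loadFactor)` up to the next power of two and rejects anything above 2^30 slots. The sketch below is a reconstruction of that check for illustration (not fastutil's source, and it divides in `double` rather than `float`), applied to the count from this exception. It is worth checking against the fastutil version bundled with your Presto build.

```java
// Sketch of the sizing check behind "Too large (... expected elements with load factor 0.75)".
// Reconstructed from fastutil's HashCommon.arraySize for illustration; not fastutil's own source.
public class ArraySizeCheck {
    // Largest open-addressing table the check accepts: 2^30 slots.
    static final long MAX_SLOTS = 1L << 30;

    // Smallest power of two that is >= x (returns 1 for x <= 1).
    static long nextPowerOfTwo(long x) {
        if (x <= 1) {
            return 1;
        }
        return Long.highestOneBit(x - 1) << 1;
    }

    // Table size for `expected` elements: next power of two >= ceil(expected / loadFactor),
    // at least 2. Throws when the result would exceed 2^30 slots.
    static long arraySize(long expected, float loadFactor) {
        long s = Math.max(2, nextPowerOfTwo((long) Math.ceil(expected / (double) loadFactor)));
        if (s > MAX_SLOTS) {
            throw new IllegalArgumentException("Too large (" + expected
                    + " expected elements with load factor " + loadFactor + ")");
        }
        return s;
    }

    // Largest element count that still passes: floor(2^30 * loadFactor).
    static long maxElements(float loadFactor) {
        return (long) Math.floor(MAX_SLOTS * (double) loadFactor);
    }

    public static void main(String[] args) {
        long reported = 897_278_064L; // element count from the exception in this issue
        System.out.println("max elements at load factor 0.75: " + maxElements(0.75f));
        System.out.println("slots needed: " + nextPowerOfTwo((long) Math.ceil(reported / 0.75)));
        try {
            arraySize(reported, 0.75f);
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Under this reading, a single hash table tops out at 805,306,368 rows (2^30 × 0.75). The reported 897,278,064 rows would need 2^31 slots, so the check throws with the same message as in the stack trace, regardless of how much query memory the node has.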

@sopel39
Contributor

sopel39 commented Aug 10, 2018

Could you post the EXPLAIN output for the query?
It seems that there are too many rows for each HashBuilderOperator instance. You could try increasing the number of nodes or the task concurrency.
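A rough way to see why more nodes or more concurrency can help, and when it cannot: the sketch below assumes the build side of a partitioned join is hash-partitioned over `nodes * task_concurrency` hash-builder instances. The helper names and the 3,000,000,000-row / 900,000,000-row inputs are hypothetical, not taken from this query. The 805,306,368 limit is 2^30 × 0.75, the largest count that fastutil's `HashCommon.arraySize` accepts at the load factor shown in the exception.

```java
// Hypothetical back-of-the-envelope estimate; assumes build rows are hash-partitioned
// over nodes * task_concurrency hash-builder instances.
public class BuildSideEstimate {
    // floor(2^30 * 0.75): the largest element count that passes fastutil's
    // HashCommon.arraySize check at load factor 0.75 (the check in the stack trace).
    static final long MAX_ROWS_PER_HASH = 805_306_368L;

    // Rows per instance if the build side spreads evenly (ceiling division).
    static long rowsPerInstance(long buildRows, int nodes, int taskConcurrency) {
        long partitions = (long) nodes * taskConcurrency;
        return (buildRows + partitions - 1) / partitions;
    }

    // Approximate rows on the instance that receives a "hot" join key: its even share of
    // the other rows plus every row of that key, since equal keys hash to the same instance.
    static long rowsPerInstanceWithHotKey(long buildRows, long hotKeyRows,
                                          int nodes, int taskConcurrency) {
        return rowsPerInstance(buildRows - hotKeyRows, nodes, taskConcurrency) + hotKeyRows;
    }

    static boolean fits(long rows) {
        return rows <= MAX_ROWS_PER_HASH;
    }

    public static void main(String[] args) {
        long buildRows = 3_000_000_000L; // hypothetical build-side row count
        for (int concurrency : new int[] {16, 64}) {
            long rows = rowsPerInstance(buildRows, 3, concurrency);
            System.out.println("3 nodes x " + concurrency + ": " + rows
                    + " rows per instance, fits=" + fits(rows));
        }
        long skewed = rowsPerInstanceWithHotKey(buildRows, 900_000_000L, 3, 64);
        System.out.println("3 nodes x 64 with a 900M-row key: " + skewed
                + " rows, fits=" + fits(skewed));
    }
}
```

With an even spread, raising concurrency from 16 to 64 cuts per-instance rows by 4×. Rows sharing one join-key value always land on the same instance, though, so if a single key carries more rows than the limit, adding nodes or concurrency alone would not make that hash table fit.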

@ajantha-bhat
Contributor Author

ajantha-bhat commented Aug 10, 2018 via email

@sopel39
Contributor

sopel39 commented Aug 10, 2018

For instance: `set session task_concurrency=64`

@ajantha-bhat
Contributor Author

I set task.concurrency = 64. With this, the query failed after 3 minutes instead of within a minute.

Below is the EXPLAIN output you asked for.

presto:tpchcarbon_default> explain select n_name, sum(l_extendedprice * (1 - l_discount)) as revenue from customer, orders, lineitem, supplier, nation, region where c_custkey = o_custkey and l_orderkey = o_orderkey and l_suppkey = s_suppkey and c_nationkey = s_nationkey and s_nationkey = n_nationkey and n_regionkey = r_regionkey and r_name = 'ASIA' and o_orderdate >= date('1994-01-01') and o_orderdate < date('1995-01-01') group by n_name order by revenue desc;
Query Plan

  • Output[n_name, revenue] => [n_name:varchar, sum:double]
    revenue := sum
    • RemoteMerge[sum DESC_NULLS_LAST] => [n_name:varchar, sum:double]
      • LocalMerge[sum DESC_NULLS_LAST] => [n_name:varchar, sum:double]
        • PartialSort[sum DESC_NULLS_LAST] => [n_name:varchar, sum:double]
          • RemoteExchange[REPARTITION] => n_name:varchar, sum:double
            • Project[] => [n_name:varchar, sum:double]
              • Aggregate(FINAL)[n_name][$hashvalue] => [n_name:varchar, $hashvalue:bigint, sum:double]
                sum := "sum"("sum_8")
                • LocalExchange[HASH][$hashvalue] ("n_name") => n_name:varchar, sum_8:double, $hashvalue:bigint
                  • RemoteExchange[REPARTITION][$hashvalue_9] => n_name:varchar, sum_8:double, $hashvalue_9:bigint
                    • Aggregate(PARTIAL)[n_name][$hashvalue_35] => [n_name:varchar, $hashvalue_35:bigint, sum_8:double]
                      sum_8 := "sum"("expr")
                    • Project[] => [n_name:varchar, expr:double, $hashvalue_35:bigint]
                      expr := ("l_extendedprice" * (1E0 - "l_discount"))
                      $hashvalue_35 := "combine_hash"(bigint '0', COALESCE("$operator$hash_code"("n_name"), 0))
                    • InnerJoin[("n_regionkey" = "r_regionkey")][$hashvalue_10, $hashvalue_32] => [l_extendedprice:double, l_discount:double, n_name:varchar]
                      Distribution: PARTITIONED
                    • RemoteExchange[REPARTITION][$hashvalue_10] => l_extendedprice:double, l_discount:double, n_regionkey:varchar, n_name:varchar, $hashvalue_10:bigint
                    • Project[] => [l_extendedprice:double, l_discount:double, n_regionkey:varchar, n_name:varchar, $hashvalue_31:bigint]
                      $hashvalue_31 := "combine_hash"(bigint '0', COALESCE("$operator$hash_code"("n_regionkey"), 0))
                    • InnerJoin[("c_nationkey" = "n_nationkey")][$hashvalue_11, $hashvalue_28] => [l_extendedprice:double, l_discount:double, n_regionkey:varchar, n_name:varchar]
                      Distribution: PARTITIONED
                    • RemoteExchange[REPARTITION][$hashvalue_11] => c_nationkey:varchar, l_extendedprice:double, l_discount:double, $hashvalue_11:bigint
                      Cost: {rows: ? (?), cpu: ?, memory: ?, network: ?}
                    • Project[] => [c_nationkey:varchar, l_extendedprice:double, l_discount:double, $hashvalue_27:bigint]
                      Cost: {rows: ? (?), cpu: ?, memory: ?, network: ?}
                      $hashvalue_27 := "combine_hash"(bigint '0', COALESCE("$operator$hash_code"("c_nationkey"), 0))
                    • InnerJoin[("l_suppkey" = "s_suppkey") AND ("c_nationkey" = "s_nationkey")][$hashvalue_12, $hashvalue_24] => [c_nationkey:varchar, l_extendedprice:double, l_discount:d
                      Distribution: PARTITIONED
                      Cost: {rows: ? (?), cpu: ?, memory: ?, network: ?}
                    • RemoteExchange[REPARTITION][$hashvalue_12] => c_nationkey:varchar, l_extendedprice:double, l_suppkey:varchar, l_discount:double, $hashvalue_12:bigint
                    • Project[] => [c_nationkey:varchar, l_extendedprice:double, l_suppkey:varchar, l_discount:double, $hashvalue_23:bigint]
                      $hashvalue_23 := "combine_hash"("combine_hash"(bigint '0', COALESCE("$operator$hash_code"("l_suppkey"), 0)), COALESCE("$operator$hash_code"("c_nationkey
                    • InnerJoin[("o_orderkey" = "l_orderkey")][$hashvalue_13, $hashvalue_20] => [c_nationkey:varchar, l_extendedprice:double, l_suppkey:varchar, l_discount:doub
                      Distribution: PARTITIONED
                    • RemoteExchange[REPARTITION][$hashvalue_13] => c_nationkey:varchar, o_orderkey:integer, $hashvalue_13:bigint
                    • Project[] => [c_nationkey:varchar, o_orderkey:integer, $hashvalue_19:bigint]
                      $hashvalue_19 := "combine_hash"(bigint '0', COALESCE("$operator$hash_code"("o_orderkey"), 0))
                    • InnerJoin[("c_custkey" = "o_custkey")][$hashvalue_14, $hashvalue_16] => [c_nationkey:varchar, o_orderkey:integer]
                      Distribution: PARTITIONED
                    • RemoteExchange[REPARTITION][$hashvalue_14] => c_custkey:varchar, c_nationkey:varchar, $hashvalue_14:bigint
                      Cost: {rows: ? (?), cpu: ?, memory: 0.00, network: ?}
                    • ScanProject[table = carbondata:carbondata:tpchcarbon_default.customer, originalConstraint = true] => [c_nationkey:varchar, c_custkey:v
                      Cost: {rows: ? (?), cpu: ?, memory: 0.00, network: 0.00}/{rows: ? (?), cpu: ?, memory: 0.00, network: 0.00}
                      $hashvalue_15 := "combine_hash"(bigint '0', COALESCE("$operator$hash_code"("c_custkey"), 0))
                      LAYOUT: CarbondataTableLayoutHandle{table=carbondata:tpchcarbon_default.customer, constraint=com.facebook.presto.spi.predicate.T
                      c_custkey := CarbondataColumnHandle{connectorId=carbondata, columnName=c_custkey, columnType=varchar, ordinalPosition=2}
                      c_nationkey := CarbondataColumnHandle{connectorId=carbondata, columnName=c_nationkey, columnType=varchar, ordinalPosition=1}
                    • LocalExchange[HASH][$hashvalue_16] ("o_custkey") => o_custkey:varchar, o_orderkey:integer, $hashvalue_16:bigint
                      Cost: {rows: ? (?), cpu: ?, memory: 0.00, network: ?}
                    • RemoteExchange[REPARTITION][$hashvalue_17] => o_custkey:varchar, o_orderkey:integer, $hashvalue_17:bigint
                      Cost: {rows: ? (?), cpu: ?, memory: 0.00, network: ?}
                    • ScanFilterProject[table = carbondata:carbondata:tpchcarbon_default.orders, originalConstraint = (("o_orderdate" >= DATE '1994-01-0
                      Cost: {rows: ? (?), cpu: ?, memory: 0.00, network: 0.00}/{rows: ? (?), cpu: ?, memory: 0.00, network: 0.00}/{rows: ? (?), cp
                      $hashvalue_18 := "combine_hash"(bigint '0', COALESCE("$operator$hash_code"("o_custkey"), 0))
                      LAYOUT: CarbondataTableLayoutHandle{table=carbondata:tpchcarbon_default.orders, constraint=com.facebook.presto.spi.predicate
                      o_orderdate := CarbondataColumnHandle{connectorId=carbondata, columnName=o_orderdate, columnType=date, ordinalPosition=0}
                      :: [[1994-01-01, 1995-01-01)]
                      o_custkey := CarbondataColumnHandle{connectorId=carbondata, columnName=o_custkey, columnType=varchar, ordinalPosition=4}
                      o_orderkey := CarbondataColumnHandle{connectorId=carbondata, columnName=o_orderkey, columnType=integer, ordinalPosition=3}
                    • LocalExchange[HASH][$hashvalue_20] ("l_orderkey") => l_orderkey:integer, l_extendedprice:double, l_suppkey:varchar, l_discount:double, $hashvalue_20:b
                      Cost: {rows: ? (?), cpu: ?, memory: 0.00, network: ?}
                    • RemoteExchange[REPARTITION][$hashvalue_21] => l_orderkey:integer, l_extendedprice:double, l_suppkey:varchar, l_discount:double, $hashvalue_21:bigi
                      Cost: {rows: ? (?), cpu: ?, memory: 0.00, network: ?}
                    • ScanProject[table = carbondata:carbondata:tpchcarbon_default.lineitem, originalConstraint = true] => [l_orderkey:integer, l_suppkey:varchar, l
                      Cost: {rows: ? (?), cpu: ?, memory: 0.00, network: 0.00}/{rows: ? (?), cpu: ?, memory: 0.00, network: 0.00}
                      $hashvalue_22 := "combine_hash"(bigint '0', COALESCE("$operator$hash_code"("l_orderkey"), 0))
                      LAYOUT: CarbondataTableLayoutHandle{table=carbondata:tpchcarbon_default.lineitem, constraint=com.facebook.presto.spi.predicate.TupleDoma
                      l_orderkey := CarbondataColumnHandle{connectorId=carbondata, columnName=l_orderkey, columnType=integer, ordinalPosition=5}
                      l_extendedprice := CarbondataColumnHandle{connectorId=carbondata, columnName=l_extendedprice, columnType=double, ordinalPosition=10}
                      l_suppkey := CarbondataColumnHandle{connectorId=carbondata, columnName=l_suppkey, columnType=varchar, ordinalPosition=7}
                      l_discount := CarbondataColumnHandle{connectorId=carbondata, columnName=l_discount, columnType=double, ordinalPosition=11}
                    • LocalExchange[HASH][$hashvalue_24] ("s_suppkey", "s_nationkey") => s_nationkey:varchar, s_suppkey:varchar, $hashvalue_24:bigint
                      Cost: {rows: ? (?), cpu: ?, memory: 0.00, network: ?}
                    • RemoteExchange[REPARTITION][$hashvalue_25] => s_nationkey:varchar, s_suppkey:varchar, $hashvalue_25:bigint
                      Cost: {rows: ? (?), cpu: ?, memory: 0.00, network: ?}
                    • ScanProject[table = carbondata:carbondata:tpchcarbon_default.supplier, originalConstraint = true] => [s_suppkey:varchar, s_nationkey:varchar, $hashvalue_2
                      Cost: {rows: ? (?), cpu: ?, memory: 0.00, network: 0.00}/{rows: ? (?), cpu: ?, memory: 0.00, network: 0.00}
                      $hashvalue_26 := "combine_hash"("combine_hash"(bigint '0', COALESCE("$operator$hash_code"("s_suppkey"), 0)), COALESCE("$operator$hash_code"("s_natio
                      LAYOUT: CarbondataTableLayoutHandle{table=carbondata:tpchcarbon_default.supplier, constraint=com.facebook.presto.spi.predicate.TupleDomain@1f}
                      s_nationkey := CarbondataColumnHandle{connectorId=carbondata, columnName=s_nationkey, columnType=varchar, ordinalPosition=4}
                      s_suppkey := CarbondataColumnHandle{connectorId=carbondata, columnName=s_suppkey, columnType=varchar, ordinalPosition=1}
                    • LocalExchange[HASH][$hashvalue_28] ("n_nationkey") => n_nationkey:varchar, n_regionkey:varchar, n_name:varchar, $hashvalue_28:bigint
                      Cost: {rows: ? (?), cpu: ?, memory: 0.00, network: ?}
                    • RemoteExchange[REPARTITION][$hashvalue_29] => n_nationkey:varchar, n_regionkey:varchar, n_name:varchar, $hashvalue_29:bigint
                      Cost: {rows: ? (?), cpu: ?, memory: 0.00, network: ?}
                    • ScanProject[table = carbondata:carbondata:tpchcarbon_default.nation, originalConstraint = true] => [n_name:varchar, n_nationkey:varchar, n_regionkey:varchar, $hashval
                      Cost: {rows: ? (?), cpu: ?, memory: 0.00, network: 0.00}/{rows: ? (?), cpu: ?, memory: 0.00, network: 0.00}
                      $hashvalue_30 := "combine_hash"(bigint '0', COALESCE("$operator$hash_code"("n_nationkey"), 0))
                      LAYOUT: CarbondataTableLayoutHandle{table=carbondata:tpchcarbon_default.nation, constraint=com.facebook.presto.spi.predicate.TupleDomain@1f}
                      n_nationkey := CarbondataColumnHandle{connectorId=carbondata, columnName=n_nationkey, columnType=varchar, ordinalPosition=1}
                      n_regionkey := CarbondataColumnHandle{connectorId=carbondata, columnName=n_regionkey, columnType=varchar, ordinalPosition=2}
                      n_name := CarbondataColumnHandle{connectorId=carbondata, columnName=n_name, columnType=varchar, ordinalPosition=0}
                    • LocalExchange[HASH][$hashvalue_32] ("r_regionkey") => r_regionkey:varchar, $hashvalue_32:bigint
                      Cost: {rows: ? (?), cpu: ?, memory: 0.00, network: ?}
                    • RemoteExchange[REPARTITION][$hashvalue_33] => r_regionkey:varchar, $hashvalue_33:bigint
                      Cost: {rows: ? (?), cpu: ?, memory: 0.00, network: ?}
                    • ScanFilterProject[table = carbondata:carbondata:tpchcarbon_default.region, originalConstraint = ("r_name" = CAST('ASIA' AS varchar)), filterPredicate = ("r_name" = CAST('ASIA' AS
                      Cost: {rows: ? (?), cpu: ?, memory: 0.00, network: 0.00}/{rows: ? (?), cpu: ?, memory: 0.00, network: 0.00}/{rows: ? (?), cpu: ?, memory: 0.00, network: 0.00}
                      $hashvalue_34 := "combine_hash"(bigint '0', COALESCE("$operator$hash_code"("r_regionkey"), 0))
                      LAYOUT: CarbondataTableLayoutHandle{table=carbondata:tpchcarbon_default.region, constraint=com.facebook.presto.spi.predicate.TupleDomain@1f5f59fb}
                      r_regionkey := CarbondataColumnHandle{connectorId=carbondata, columnName=r_regionkey, columnType=varchar, ordinalPosition=1}
                      r_name := CarbondataColumnHandle{connectorId=carbondata, columnName=r_name, columnType=varchar, ordinalPosition=0}
                      :: [[ASIA]]

(1 row)

@ajantha-bhat
Contributor Author

ajantha-bhat commented Aug 10, 2018 via email

@LeonBein

We hit the same error with 1 TB of TPC-H data.
Were you able to fix the issue?
Also, #11563 and #3005 seem to describe the same issue, but neither proposes a concrete solution (at least nothing that worked for us).

@zhengxingmao
Contributor

+1
I ran TPC-DS at 10 TB on 8 worker nodes and 1 coordinator with 128 GiB of memory and 16 cores, using Trino 360, and hit the same error again.
Are there any effective solutions?
