tweaking big sort #6042

Open
jangorecki opened this issue Sep 6, 2016 · 10 comments

Comments

@jangorecki

jangorecki commented Sep 6, 2016

I have a table of 5e9 rows. I read in the FAQ that it is better to avoid ORDER BY, but is there anything I could try to overcome the current query failure?

presto:benchmark> desc x;
 Column |  Type  | Comment 
--------+--------+---------
 key    | bigint |         
 x2     | bigint |         
(2 rows)

presto:benchmark> SELECT COUNT(*) FROM (SELECT * FROM x ORDER BY KEY) t;
Query 20160905_223346_00088_bivq2 [FAILED] i[2.18B 13.5G 62M] o[2.18B 13.5G 62M] splits[170/597/80]
Query 20160905_223346_00088_bivq2, FAILED, 9 nodes
Splits: 847 total, 80 done (9.45%)
3:43 [2.18B rows, 13.5GB] [9.74M rows/s, 62MB/s]

Query 20160905_223346_00088_bivq2 failed: 2147483639

My memory settings:

query.max-memory=1600GB
query.max-memory-per-node=200GB
resources.reserved-system-memory=24GB

Presto version: v0.150

@electrum
Contributor

electrum commented Sep 6, 2016 via email

@jangorecki
Author

jangorecki commented Sep 6, 2016

I need to materialise the results of the sort query, or get as close to materialising them as I can. I want to measure the time of sorting only (or as close to "only" as possible), not interfaces, JDBC drivers, or the speed of printing results to the console. That is why the query is wrapped in count(*): for a fair comparison. According to my tests, the optimizer still performs the sort in this case, which is exactly what I need.

@cawallin
Member

@jangorecki Do you still have any questions regarding sorting in Presto, or can we close this issue?

@jangorecki
Author

@cawallin I am still interested in measuring the time of ORDER BY in Presto. If the optimizer now skips sorting when the outer query is a COUNT, then I would expect a query hint that at least gives the option to measure ORDER BY timing. Otherwise Presto will be marked in benchmark reports as not capable of measuring this operation.

@cawallin
Member

cawallin commented Apr 3, 2017

You can try inserting into the blackhole connector, which is like piping the results to /dev/null. For example:
create table blackhole.default.foo as select l_orderkey from lineitem order by l_orderkey;
That will add a little overhead, but the impact should be negligible.
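
Applied to the table from the original report, the same idea might look like the sketch below. It assumes a catalog named blackhole is already configured on the cluster (a catalog properties file with connector.name=blackhole); the table name sort_sink is hypothetical. Whether the planner keeps the sort can differ between versions, so running EXPLAIN first and looking for a sort or order-by step in the plan is a reasonable check.

```sql
-- Sketch only: assumes a "blackhole" catalog is configured on the cluster.
-- "sort_sink" is a hypothetical table name chosen for illustration.

-- Optional check: inspect the plan and confirm it still contains a sort step.
EXPLAIN
CREATE TABLE blackhole.default.sort_sink AS
SELECT key, x2
FROM x
ORDER BY key;

-- The timed statement: rows are sorted, then discarded by the blackhole writer,
-- so no result set is streamed back to the client.
CREATE TABLE blackhole.default.sort_sink AS
SELECT key, x2
FROM x
ORDER BY key;
```

Using a fresh table name for each run avoids a "table already exists" failure when the benchmark is repeated.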

@electrum
Contributor

electrum commented Apr 3, 2017 via email

@stale

stale bot commented Apr 3, 2019

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs.

@stale stale bot added the stale label Apr 3, 2019
@jangorecki
Author

I think it should not be closed: it is stale, but still not resolved.

@stale stale bot removed the stale label Apr 3, 2019
@stale

stale bot commented Jun 22, 2021

This issue has been automatically marked as stale because it has not had any activity in the last 2 years. If you feel that this issue is important, just comment and the stale tag will be removed; otherwise it will be closed in 7 days. This is an attempt to ensure that our open issues remain valuable and relevant so that we can keep track of what needs to be done and prioritize the right things.

@stale stale bot added the stale label Jun 22, 2021
@jangorecki
Author

Much the same as 2 years ago.

@stale stale bot removed the stale label Jun 22, 2021