Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Exception: Query would require a scan without a limit on table(s) #19

Closed
neko940709 opened this issue May 31, 2017 · 7 comments
Closed

Comments

@neko940709
Copy link

When I execute a query in the TPC-H dataset, the mapd throws this exception, what's the meaning of this exception?

@asuhan
Copy link
Contributor

asuhan commented May 31, 2017

@neko940709 Thanks for reporting, which query? It usually means failure to optimize away an intermediate projection. We don't want to let it run anyway because it'd take a very long time / use a lot of memory.

The larger point: we're aware that we need to work towards TPC-H compatibility. Over the next few months, we plan to get most / all of the queries working. Several pieces of work related to this goal are already in progress.

@neko940709
Copy link
Author

neko940709 commented May 31, 2017

The query is like this:
select l_orderkey, sum(l_extendedprice * (1 - l_discount)) as revenue, o_orderdate, o_shippriority from lineitem, orders, customer where l_orderkey = o_orderkey and c_custkey = o_custkey and c_mktsegment = 'BUILDING' group by l_orderkey, o_orderdate, o_shippriority order by revenue desc, o_orderdate;

The query executes on V2.17.2 TPC-H benchmark which I used to generate 10GB dataset. This query is also a provided sample and I simplify it. Actually, I have met this exception several times.
The first time is doing the following query:
select * from newdata;
The NEWDATA table consists of 3 columns, all the columns are INTEGER type. The size of Newdata table is about 2.5GB. If you said it's a failure of the intermediate projection, I DONT understand why this simple query will also throw exception?
This is the log message:
I0524 23:10:21.887410 132126 MapDHandler.cpp:2295] select * from newdata; I0524 23:10:21.888273 132126 Calcite.cpp:196] User mapd catalog mapd sql 'select * from newdata; I0524 23:10:21.897106 132126 Calcite.cpp:221] Time marshalling in JNI 0 (ms), Time in Java Calcite 9 (ms) E0524 23:10:22.045779 132126 MapDHandler.cpp:2331] Exception: Query would require a scan without a limit on table(s): newdata

@neko940709
Copy link
Author

More, there is other exception I've got during the TPC-H test.Some of the sql will make the mapd_server crash. Here is an example:
select s_acctbal, s_name, n_name, p_partkey, p_mfgr, s_address, s_phone, s_comment from part, supplier, partsupp, nation, region where p_partkey = ps_partkey and s_suppkey = ps_suppkey and p_size = 30 and p_type like '%SMALL' and s_nationkey = n_nationkey and n_regionkey = r_regionkey and r_name = 'ASIA' order by s_acctbal desc, n_name, s_name, p_partkey;
This query throws exception like this:
Check failed: static_cast<size_t>(curr_idx) < target_count_ (24 vs. 24)

@neko940709
Copy link
Author

Apart from the TPC-H data, I ran the queries you give on your official website. The query was executed on the FLIGHT data in 2.6GB size. The query is the following:
select count(*) from flights_2008_7M where origin_name='Lambert-St Louis International' and dest_name = 'Lincoln Municipal;
And the exception is:
The complexity of matching the regular expression exceeded predefined bounds. Try refactoring the regular expression to make each choice made by the state machine unambiguous. This exception is thrown to prevent "eternal" matches that take an indefinite period time to locate.
How should I modify the sql ?

@asuhan
Copy link
Contributor

asuhan commented May 31, 2017

I'll answer the questions one by one.

  1. select * from newdata tries to select everything from a big table. It's a projection without a "reasonable" limit (I think we ask it to be at most 10M). If you want to see a sample of the data, you can specify a limit and it'll work. Allowing this type of query without a limit will need us to implement pagination. I believe we'll end up doing it (for the sake of least surprise principle), but we haven't found an use-case which wouldn't be served by specifying a reasonable limit.
  2. It's not an exception, it's a crash. We'll look into it very soon.
  3. You shouldn't modify the SQL, we'll fix it. Your query misses the end quote for the second literal string, should be'Lincoln Municipal', not 'Lincoln Municipal. We'll check / fix it on our website, thanks for the catch.

@pwentrys
Copy link

Just wanted to say I strongly agree with you on your first point, @asuhan. I would argue that even if you come up with a need case, it wouldn't be the best (going by fastest) solution. If you're doing this with that much data, the person with the use-case might need to adjust other parts in their system's flow. I view any select * from foo statement to be poor practice (lack of future-proofing) and "unreasonable" because you just opened the door for many errors that will be done by others working with the system.

@randyzwitch
Copy link
Contributor

Closing this issue for now, but please feel free to start another discussion at https://community.mapd.com/ if this is still a problem.

fexolm pushed a commit to fexolm/omniscidb that referenced this issue Apr 6, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants