
Indicate progress only if limit is applied #618

Merged
martint merged 1 commit into prestosql:master from martint:memory-limit Apr 12, 2019

Conversation

6 participants
@martint (Member) commented Apr 11, 2019

The code was always returning an indication that it could benefit
from the limit pushdown even if that wasn't the case. This could
cause the optimizer to loop indefinitely in some cases.

@cla-bot added the cla-signed label Apr 11, 2019

@dain approved these changes Apr 11, 2019

@@ -356,10 +356,12 @@ public ConnectorTableProperties getTableProperties(ConnectorSession session, Con
 {
     MemoryTableHandle table = (MemoryTableHandle) handle;

-    if (!table.getLimit().isPresent() || limit < table.getLimit().getAsLong()) {
-        table = new MemoryTableHandle(table.getId(), OptionalLong.of(limit));
+    if (table.getLimit().isPresent() && table.getLimit().getAsLong() <= limit) {
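
For context, a sketch of what the patched method roughly looks like once this guard is in place (reconstructed from the visible diff; the exact surrounding code in MemoryMetadata may differ):

```java
@Override
public Optional<LimitApplicationResult<ConnectorTableHandle>> applyLimit(ConnectorSession session, ConnectorTableHandle handle, long limit)
{
    MemoryTableHandle table = (MemoryTableHandle) handle;

    // An equal or smaller limit is already applied: report "no progress" so the
    // optimizer does not keep re-firing the pushdown against this table handle.
    if (table.getLimit().isPresent() && table.getLimit().getAsLong() <= limit) {
        return Optional.empty();
    }

    table = new MemoryTableHandle(table.getId(), OptionalLong.of(limit));

    // 'true' means the connector fully guarantees the limit, so the engine may
    // remove the Limit node above the scan.
    return Optional.of(new LimitApplicationResult<>(table, true));
}
```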

@kokosing (Member) commented Apr 11, 2019

It is strange that when the given limit is supported we return Optional.empty(), just like in the case where the limit is not supported.

It is not a practical case, but when you have subsequent limits, only the first of them would be pushed down to the connector. However, this example gets more practical in the case of predicate pushdown.

IMO we should handle that in optimizer (engine) code, instead of requiring all connectors to have an if like that.

@martint (Author, Member) commented Apr 11, 2019

> It is strange that when the given limit is supported we return Optional.empty(), just like in the case where the limit is not supported.

Optional.empty() just means "applying the provided limit has no effect (so don't try to do it again)". I don't think we need to distinguish between supported and not supported. It makes for awkward implementations that never return Optional.empty() and instead indicate no progress with a special flag in the object they return.

In the long term (when connectors can provide custom Rule instances), this will be replaced by a Rule. Connectors would indicate they support limit pushdown by providing a rule, and the rule would return Rule.Result.empty() to indicate it had no effect or couldn't be applied in that particular case.

> It is not a practical case, but when you have subsequent limits, only the first of them would be pushed down to the connector. However, this example gets more practical in the case of predicate pushdown.

Subsequent limits will be pushed if they are smaller than the current limit. If, for some reason, you end up with a larger limit on top of a table scan that has a smaller limit applied, the right way to fix this is for the engine to eliminate the Limit after determining that the max number of rows produced by the TableScan is going to be smaller, similar to how we do it in #441.

> IMO we should handle that in optimizer (engine) code, instead of requiring all connectors to have an if like that.

How do you suggest we do that? There has to be a way for the connector to signal that "applying the provided limit has no effect (so don't try to do it again)", or the optimizer will loop forever. An alternative is to introduce machinery like we have for Rule (patterns, etc.), but that would be a lot of work and would probably require duplicating a lot of code due to limitations of what can go in the SPI.
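
To make the contract concrete, here is a toy, self-contained illustration (not the actual optimizer code): the engine effectively iterates the pushdown to a fixed point, so a connector that never reports "no progress" keeps the loop spinning forever.

```java
import java.util.OptionalLong;

public class LimitPushdownLoop
{
    // A well-behaved "connector": once an equal-or-smaller limit is applied,
    // it reports no progress by returning an empty result.
    static OptionalLong applyLimit(OptionalLong currentLimit, long limit)
    {
        if (currentLimit.isPresent() && currentLimit.getAsLong() <= limit) {
            return OptionalLong.empty(); // "applying the limit has no effect"
        }
        return OptionalLong.of(limit);
    }

    public static void main(String[] args)
    {
        OptionalLong state = OptionalLong.empty();
        OptionalLong result;
        // The optimizer's fixed-point iteration, in miniature. If applyLimit always
        // returned a non-empty result, this loop would never terminate.
        while ((result = applyLimit(state, 10)).isPresent()) {
            state = result;
        }
        System.out.println("converged with limit " + state);
    }
}
```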

@sopel39 (Member) commented Apr 11, 2019

> How do you suggest we do that?

I think the rule could get the table property before and after applyLimit. Then it could derive that the returned limit is the same as before. However, currently there is no such thing as a LimitProperty, so it might just not be worth it.
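
A rough fragment sketching that idea, assuming a hypothetical limit getter on ConnectorTableProperties that does not exist today (getLimit() below is made up for illustration):

```java
// Engine-side helper (hypothetical): decide whether applyLimit actually changed anything
// by comparing a limit property of the table before and after the call.
private static boolean madeProgress(
        ConnectorMetadata metadata,
        ConnectorSession session,
        ConnectorTableHandle before,
        ConnectorTableHandle after)
{
    OptionalLong limitBefore = metadata.getTableProperties(session, before).getLimit(); // hypothetical getter
    OptionalLong limitAfter = metadata.getTableProperties(session, after).getLimit();   // hypothetical getter
    return !limitAfter.equals(limitBefore);
}
```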

@electrum (Member) commented Apr 12, 2019

I agree that it would be nice if the engine could handle this; otherwise it's more difficult and error-prone to write connectors. But it sounds like we don't have a feasible way to do that.

@kokosing (Member) commented Apr 12, 2019

LimitProperty sounds good to me.

Consider an example where the connector knows a lot about its tables, things like table cardinality and partitioning. For a table with a lower cardinality than the limit, the connector would need to return Optional.empty() (saying no progress) on the first rule application, but to the engine it looks like the connector is not capable of pushing down the limit. For LIMIT this is not a big deal, but for partitioning or other SQL fragments it might affect query latency.

In my opinion, if we are going to push down some property to a connector, we need a way for the connector to say what the actual properties are. Then there is no need to even call applyLimit when the required properties are already satisfied by the actual properties.

An alternative is to introduce machinery like we have for Rule (patterns, etc),

We do not need such machinery. We can handle this manually in Rule code, just like we do for patterns that are too complicated for pattern matching.
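
For illustration, a sketch of the cardinality scenario described above (the handle class and its accessors are hypothetical): a connector that knows the table's row count has to report "no progress" even on the very first call when the table is already smaller than the limit, which the engine cannot tell apart from "limit pushdown not supported".

```java
@Override
public Optional<LimitApplicationResult<ConnectorTableHandle>> applyLimit(ConnectorSession session, ConnectorTableHandle handle, long limit)
{
    MyTableHandle table = (MyTableHandle) handle; // hypothetical handle class

    // The table is already smaller than the requested limit: applying it would
    // have no effect, so the connector must answer "no progress".
    if (table.getKnownRowCount().isPresent() && table.getKnownRowCount().getAsLong() <= limit) {
        return Optional.empty();
    }

    return Optional.of(new LimitApplicationResult<>(table.withLimit(limit), true)); // withLimit is hypothetical
}
```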

@findepi (Member) commented Apr 12, 2019

> I agree that it would be nice if the engine could handle this[...]. But it sounds like we don't have a feasible way to do that.

Let's have a test. It might not catch all of the cases, but it still may catch some.

@martint (Author, Member) commented Apr 12, 2019

LimitProperty, or rather "max row count" or "max cardinality", would work for the case where the limit is guaranteed by the connector. However, if the connector can't guarantee a limit, such as when pushing the limit to a parallel database, it wouldn't be able to describe the "max cardinality" of the derived table, so the engine would have no way of knowing whether it's safe to call applyLimit again.

The other downside of relying on that property is that it requires connectors to implement two seemingly disconnected APIs.
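
As an illustration of the non-guaranteed case (the sharded handle class and its accessors are hypothetical; the result shape follows the SPI as used in the diff above): a connector pushing the limit to a parallel database can only enforce it per shard, so it cannot promise a "max cardinality" for the whole scan.

```java
@Override
public Optional<LimitApplicationResult<ConnectorTableHandle>> applyLimit(ConnectorSession session, ConnectorTableHandle handle, long limit)
{
    ShardedTableHandle table = (ShardedTableHandle) handle; // hypothetical handle class

    if (table.getLimit().isPresent() && table.getLimit().getAsLong() <= limit) {
        return Optional.empty();
    }

    // Each shard applies LIMIT independently, so up to (shards * limit) rows may still
    // be produced: the limit is pushed down but not guaranteed, and the engine keeps
    // its own Limit node on top.
    return Optional.of(new LimitApplicationResult<>(table.withLimit(limit), false)); // withLimit is hypothetical
}
```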

@martint (Author, Member) commented Apr 12, 2019

> Let's have a test. It might not catch all of the cases, but it still may catch some.

There's already a test that would catch this if connectors implemented it wrong: AbstractTestIntegrationSmokeTest.testLimit().
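
For connector implementors, the kind of assertion such a smoke test performs looks roughly like this (a sketch assuming the test framework's computeActual helper and the TPC-H orders table; not the actual testLimit() body):

```java
@Test
public void testLimitPushdownSketch()
{
    // If applyLimit mishandles the "no progress" contract, planning may not converge;
    // a correct implementation returns exactly 10 rows here.
    MaterializedResult result = computeActual("SELECT orderkey FROM orders LIMIT 10");
    assertEquals(result.getRowCount(), 10);
}
```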

@findepi (Member) commented Apr 11, 2019

> This could cause the optimizer to loop indefinitely in some cases.

Would it be possible to have a test?
Especially if this is the connector's responsibility, having a test case in AbstractTestIntegrationSmokeTest could help implementors.

@martint (Author, Member) commented Apr 11, 2019

> Would it be possible to have a test?
> Especially if this is the connector's responsibility, having a test case in AbstractTestIntegrationSmokeTest could help implementors.

It doesn't happen currently for this connector, because the engine eliminates the Limit due to the connector saying it's guaranteed. However, the engine is not required to do that, so it's a potential source of future breakage. The current implementation does not satisfy the contract of the interface.

@martint force-pushed the martint:memory-limit branch from 3efb024 to d98763e Apr 11, 2019

Indicate progress only if limit is applied
The code was always returning an indication that it could benefit
from the limit pushdown even if that wasn't the case. This could
cause the optimizer to loop indefinitely in some cases.

@martint force-pushed the martint:memory-limit branch from d98763e to f27fddb Apr 12, 2019

@martint closed this Apr 12, 2019

@martint deleted the martint:memory-limit branch Apr 12, 2019

@martint merged commit f27fddb into prestosql:master Apr 12, 2019

1 check passed: verification/cla-signed