Query on IMap doesnt use Index when running locally #12351

Closed
vertex-github opened this issue Feb 14, 2018 · 13 comments

Comments

@vertex-github commented Feb 14, 2018

HZ 3.9.2. I query an IMap with map.values( predicate ), where the 'predicate' is IndexAware. The code at QueryPartitionOperation.java:50 only leverages an index when running off-heap. Why don't we use an index when running on-heap as well?

@jerrinot (Contributor) commented Feb 15, 2018

The QueryPartitionOperation should be called if and only if there was a topology change while a query was running. In that case there is a fallback to per-partition predicate evaluation. However, on-heap storage uses a single global index (per map and member), so it cannot be used to evaluate a predicate on a single partition only.

See QueryOperation for the default logic.

Long story short: if you see QueryPartitionOperation in a stable cluster, then there is some other issue.

@vertex-github (Author) commented Feb 15, 2018

@jerrinot (Contributor) commented Feb 19, 2018

@vertex-github: can you please share the reproducer?

@mmedenjak (Contributor) commented Feb 21, 2018

@vertex-github any update on this?

@taburet self-assigned this Feb 27, 2018

@taburet (Contributor) commented Feb 27, 2018

@vertex-github I ran some basic query-index tests and was unable to reproduce the issue. Probably your setup is more complex and that's why I'm unable to reproduce it. Could you please share the reproducer?

@vertex-github (Author) commented Feb 28, 2018

Just run a query on a map, on-heap, with an IndexAwarePredicate class via an EntryProcessor calling map.values( myIndexAwarePredicate ). The IndexAwarePredicate methods isIndexed and getIndex are not invoked.

@vertex-github (Author) commented Feb 28, 2018

Stacktrace:

Breakpoint reached
                  at com.hazelcast.map.impl.query.QueryPartitionOperation.run(QueryPartitionOperation.java:55)
                  at com.hazelcast.spi.impl.operationservice.impl.OperationRunnerImpl.run(OperationRunnerImpl.java:194)
                  at com.hazelcast.spi.impl.operationexecutor.impl.OperationExecutorImpl.run(OperationExecutorImpl.java:406)
                  at com.hazelcast.spi.impl.operationexecutor.impl.OperationExecutorImpl.runOrExecute(OperationExecutorImpl.java:433)
                  at com.hazelcast.spi.impl.operationservice.impl.Invocation.doInvokeLocal(Invocation.java:569)
                  at com.hazelcast.spi.impl.operationservice.impl.Invocation.doInvoke(Invocation.java:554)
                  at com.hazelcast.spi.impl.operationservice.impl.Invocation.invoke0(Invocation.java:513)
                  at com.hazelcast.spi.impl.operationservice.impl.Invocation.invoke(Invocation.java:207)
                  at com.hazelcast.spi.impl.operationservice.impl.OperationServiceImpl.invokeOnPartition(OperationServiceImpl.java:310)
                  at com.hazelcast.map.impl.query.QueryDispatcher.dispatchPartitionScanQueryOnOwnerMemberOnPartitionThread(QueryDispatcher.java:136)
                  at com.hazelcast.map.impl.query.MapQueryEngineImpl.runQueryOnGivenPartition(MapQueryEngineImpl.java:139)
                  at com.hazelcast.map.impl.query.MapQueryEngineImpl.execute(MapQueryEngineImpl.java:92)
                  at com.hazelcast.map.impl.proxy.MapProxyImpl.executePredicate(MapProxyImpl.java:671)
                  at com.hazelcast.map.impl.proxy.MapProxyImpl.values(MapProxyImpl.java:655)
                  at com.vertex.XYZ.serialization.FlatPerfMain$EP2.process(FlatPerfMain.java:535)
                  at com.hazelcast.map.impl.operation.EntryOperator.process(EntryOperator.java:335)
                  at com.hazelcast.map.impl.operation.EntryOperator.operateOnKeyValueInternal(EntryOperator.java:194)
                  at com.hazelcast.map.impl.operation.EntryOperator.operateOnKey(EntryOperator.java:179)
                  at com.hazelcast.map.impl.operation.EntryOperation.runVanilla(EntryOperation.java:395)
                  at com.hazelcast.map.impl.operation.EntryOperation.run(EntryOperation.java:177)
                  at com.hazelcast.spi.impl.operationservice.impl.OperationRunnerImpl.run(OperationRunnerImpl.java:194)
                  at com.hazelcast.spi.impl.operationexecutor.impl.OperationThread.process(OperationThread.java:120)
                  at com.hazelcast.spi.impl.operationexecutor.impl.OperationThread.run(OperationThread.java:100)

@jerrinot (Contributor) commented Feb 28, 2018

@vertex-github: just to double-check: does your predicate extend, or is it wrapped in, PartitionPredicate?

@vertex-github (Author) commented Feb 28, 2018

@jerrinot (Contributor) commented Feb 28, 2018

Right. So non-HD indexes are not partition-specific. You can't ask an index: "give me entries matching THIS and only for a partition THAT". The index is not capable of doing this.

@vertex-github (Author) commented Feb 28, 2018

@jerrinot (Contributor) commented Feb 28, 2018

Indexes spanning multiple partitions have some benefits. Imagine an index as a hashmap where the key is the indexed attribute and the value is a collection of all matching entries. Queries such as indexedAttribute = "foo" are just a single map lookup: indexedAttributeIndex.get("foo"). No matter how many partitions or operation threads you have, it's constant in time.
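
The hashmap analogy can be sketched in plain Java. This is only an illustration, not Hazelcast's actual index implementation: the class `GlobalIndexSketch` and its `put`/`get` methods are hypothetical names, and entries are represented by their keys as strings.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical model of a non-HD index: one structure per map and member,
// shared by every partition stored on that member.
class GlobalIndexSketch {
    // indexed attribute value -> keys of all entries having that value
    private final Map<String, Set<String>> entriesByAttribute = new HashMap<>();

    // Record that the entry with this key has the given attribute value.
    public void put(String entryKey, String attributeValue) {
        entriesByAttribute
                .computeIfAbsent(attributeValue, v -> new LinkedHashSet<>())
                .add(entryKey);
    }

    // Equality query: a single hash lookup, independent of the partition count.
    public Set<String> get(String attributeValue) {
        return entriesByAttribute.getOrDefault(attributeValue, Collections.emptySet());
    }

    public static void main(String[] args) {
        GlobalIndexSketch index = new GlobalIndexSketch();
        index.put("k1", "foo");
        index.put("k2", "bar");
        index.put("k3", "foo");
        System.out.println(index.get("foo")); // prints [k1, k3]
    }
}
```

Note that nothing in this structure records which partition an entry belongs to, which is why a lookup returns matches from every partition on the member.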

Obviously, one of the downsides shows up when you are interested only in entries from a specific partition: the index will always give you matching entries from all partitions (stored on that member). So it's not used for queries wrapped inside the PartitionPredicate. However, this is not a common case, so it's an acceptable trade-off.

There is an optimization opportunity right here: if we know a query wrapped inside a PartitionPredicate has really good selectivity, then it might make sense to still hit the "global" index and then filter the results down to the partition the user is interested in. This is currently not implemented, but when you have a custom predicate you can still do this manually.
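
A rough sketch of that manual approach, in plain Java rather than against the Hazelcast predicate API. Everything here is hypothetical: `PartitionFilterSketch`, `partitionIdOf`, and the fixed `PARTITION_COUNT` are stand-ins, and the modulo-based partition mapping is an assumption made only to keep the example self-contained (Hazelcast derives the partition from the key in its own way).

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model: look up a member-wide index first, then keep only
// the matches that belong to the partition of interest.
class PartitionFilterSketch {
    static final int PARTITION_COUNT = 4; // assumed value for illustration

    // Stand-in for the real key-to-partition mapping (assumption).
    static int partitionIdOf(int key) {
        return Math.floorMod(key, PARTITION_COUNT);
    }

    // indexed attribute value -> keys of all matching entries on this member
    private final Map<String, List<Integer>> globalIndex = new HashMap<>();

    public void put(int key, String attributeValue) {
        globalIndex.computeIfAbsent(attributeValue, v -> new ArrayList<>()).add(key);
    }

    // Index lookup followed by a post-filter on the partition id.
    public List<Integer> query(String attributeValue, int partitionId) {
        List<Integer> result = new ArrayList<>();
        for (Integer key : globalIndex.getOrDefault(attributeValue, Collections.emptyList())) {
            if (partitionIdOf(key) == partitionId) {
                result.add(key);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        PartitionFilterSketch sketch = new PartitionFilterSketch();
        for (int key = 1; key <= 8; key++) {
            sketch.put(key, key % 2 == 0 ? "foo" : "bar");
        }
        System.out.println(sketch.query("foo", 2)); // prints [2, 6]
    }
}
```

The post-filter costs time proportional to the number of index matches, so this only pays off when the predicate is selective, as the comment above points out.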

HD indexes are backed by a different data structure. The key difference is that the structure backing HD indexes is not thread-safe, so partitioning them was a logical choice. We take advantage of this when running a query wrapped inside the PartitionPredicate, as the index can then give us exactly the results for a given partition. The downside is that when you are running a query spanning all partitions (the common case), Hazelcast does lookups into multiple instances of the index-backing structure and then composes the results. It's O(n), where n is the partition count. There is a bit of parallelism, so it's not as bad in practice.
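
For contrast, a partitioned index can be modeled the same way. Again this is only a sketch under stated assumptions, not the HD index data structure: `PartitionedIndexSketch` is a hypothetical class, each per-partition index is a plain `HashMap`, the modulo partition mapping is a stand-in, and the lookups run sequentially (the parallelism mentioned above is not modeled).

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of a partitioned index: one small index per partition.
class PartitionedIndexSketch {
    private final int partitionCount;
    private final List<Map<String, List<Integer>>> indexes = new ArrayList<>();

    public PartitionedIndexSketch(int partitionCount) {
        this.partitionCount = partitionCount;
        for (int i = 0; i < partitionCount; i++) {
            indexes.add(new HashMap<>());
        }
    }

    // Stand-in for the real key-to-partition mapping (assumption).
    private int partitionIdOf(int key) {
        return Math.floorMod(key, partitionCount);
    }

    public void put(int key, String attributeValue) {
        indexes.get(partitionIdOf(key))
                .computeIfAbsent(attributeValue, v -> new ArrayList<>())
                .add(key);
    }

    // Partition-scoped query: exactly one index instance is consulted.
    public List<Integer> queryPartition(String attributeValue, int partitionId) {
        return indexes.get(partitionId).getOrDefault(attributeValue, Collections.emptyList());
    }

    // Query over all partitions: one lookup per partition, results composed.
    public List<Integer> queryAll(String attributeValue) {
        List<Integer> result = new ArrayList<>();
        for (Map<String, List<Integer>> index : indexes) {
            result.addAll(index.getOrDefault(attributeValue, Collections.emptyList()));
        }
        return result;
    }

    public static void main(String[] args) {
        PartitionedIndexSketch sketch = new PartitionedIndexSketch(4);
        for (int key = 1; key <= 8; key++) {
            sketch.put(key, key % 2 == 0 ? "foo" : "bar");
        }
        System.out.println(sketch.queryPartition("foo", 2)); // prints [2, 6]
        System.out.println(sketch.queryAll("foo").size());   // prints 4
    }
}
```

Here `queryPartition` touches a single structure, while `queryAll` performs as many lookups as there are partitions, which is the O(n) cost described above.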

@jerrinot (Contributor) commented Feb 28, 2018

based on ☝️ closing as "not a bug"
