Create option on @Index to allow for propagation to subclasses #367

bzikarsky · 2017-05-21T06:44:43Z

In case of more complex class hierarchies it'd be nice to have an index (or primary-constraint) propagate to all subclasses.

Current Behavior

Currently the @Index annotation only triggers index-/constraint-creation on the entityType/label it's defined on. If there a subclasses, they don't get the index. This is especially problematic when the primary key gets redefined with primary = true. When loadAll(type) (or similiar methods) is used to load objects of a subclass, that subclasses type is used on the query. But the primary-key is not present as an index and slows down the query immensely.

Expected Behavior / Possible Solution

Let's assume an artificial argument propagateToSubclasses=[true|false] on @Index. A simple class-hierarchy can then be defined like this:

@NodeEntity
abstract class Animal {

    @Index(unique = true, primary = true, propagateToSubclasses = true)
    private String qualifiedName;

}

@NodeEntity
class Cat extends Animal {

    @Property
    private int independence;
}


@NodeEntity
class Dog extends Animal {

    @Property
    private int loyalty;
}

This would add the constraint (and implicit index) to :Animal and additionally to :Dog and :Cat

Alternative solutions

Use an attribute `inheritIndices` on `@NodeEntity` instead of `propagateToSubclasses`

This moves the flag to inherit the indices of the parent class to the subclass. Pro: Matches the location of the extends-definition. Con: Has to be defined explicitly on all subclasses.

Don't use an annotation attribute but do this automatically

Title says it all. One could discuss to only automatically inherit the custom primary index/constraint. But that is even more "magic".

Probably not a good idea, since it changes the behavior and should probably be classified as a breaking change.

Context

I have project which persists a traceability model (about 150M nodes) in neo4j. The entities are structured hierarchically (e.g. TestResult -> Artifact -> Entity) and some basic properties are enforced at higher level (e.g. every Entity needs to have a qualifiedName). The problem arises when those are indexed properties (such as qualifiedName which is the primary key for those nodes). Any query specific to a subclass (e.g. a TestResult with a specific id/qualifiedName) takes a long time, even if it's as simple as this: MATCH (n:TestResult) WHERE n.qualifiedName.

One can argue that the query should uses Artifact in its type-clause, but this is neither user-freindly nor supported by OGM.

I am willing to contribute on this matter. 😃

Your Environment

OGM Version used: 2.1.2
Java Version used: 1.8.0_131 (x64)
Neo4J Version used: 3.2.0
Bolt Driver Version used (if applicable): 2.1.2
Operating System and Version: Win 10 Pro x64, Build 15063
Link to your project: n/a (closed source, proprietary)

The text was updated successfully, but these errors were encountered:

bzikarsky · 2017-05-23T14:20:33Z

It'd be nice to get an assessment if this is a feature you approve. Then I'd put some time into a working implementation.

Currently we are working around this "issue" by having a custom helper, which checks the class hierarchy and adds missing indices by itself. But it feels kind of dirty.

frant-hartm · 2017-05-23T14:33:04Z

Hi,

only 1 primary index per class hierarchy is supported - it is used in MERGE statements - if there were multiple primary indexes we wouldn't know which to use.

As for non-primary indexes/constraints:
Propagating indexes down the hierarchy doesn't seem like a very good idea - it would lead to slower write times and data would take up more space, but we will think about it as a opt-in feature.

Solution that should be as fast in terms of querying is to provide more labels in the query - the label of the type you are specifically querying and the label containing primary index.

E.g. for Dog the query would be

MATCH (n:Dog:Animal) WHERE n.primaryKey=....

and neo4j should pick the right label index if it exists.

bzikarsky · 2017-05-23T14:51:30Z

Maybe this should be spun off into 2 features:

a) Opt-in index-inheritance
b) Support for class hierarchies in the find*-methods -- The queries built by the delegates always use MetaData/Session.entityType(class) which only returns the specific class' label. Therefor those queries don't hit the primary keys. The delegates need to be extended in such a way that they built a chain of labels until they hit the label containing the primary-key definition.

DwayneSmurdon · 2017-09-22T05:04:28Z

I could really use this feature right now, but I am not sure it is a good long term choice. I think option (a) above borders on SDN getting back into managing indices. An OGM arguably shouldn't manage indices, although I think another library to help with this (Java based) would be great.

However, an OGM could (should) intelligently use indexes that are available, which I think option (b) implies. That said, it has to make an assumption that all Dogs are Animals. Just because the Java class hierarchy states that this assumption is true, it doesn't mean it holds true for the actual state of the database and could therefore miss (i.e. there is already a (Dog{type:'labrador'}) but for some reason not an (Animal{type:'labrador'}) a lookup. This could have serious consequences, as my Labradors might start to breed. :)

However, perhaps like @bzikarsky I am constructing a strictly typed db of reference data. I could use this feature. It makes me wonder what type of conveniences and optimizations could be made if there were some contract made that my neo4j state adheres to my Java state hierarchy. For example, I actually want the 'name' property index for any and all labels. I would love to be able to specify that once. I could also accept that I could have delayed indexing turned on for Super or Sub classes so that loading can be less affected because of multiple indices.

@frant-hartm is right that we can just query on Animal:Dog or presumably Dog:Animal, but that is less than optimal and easily forgotten. But still, perhaps this is the best practice. Would a middle ground be to at least emit a warning of some kind during compilation? "This lookup does not appear to use an index." Maybe even emitting a 'score' based on the Query Plan. This might get us Java devs to consider query plans more -- I rarely look at them and to be honest I have not mastered parsing them.

meistermeier · 2018-11-07T11:38:13Z

Indexes will now created on the "bottom most" label/class. #514
There is probably a revisit when we aim for version 4.0

frant-hartm added the new feature label May 22, 2017

frant-hartm mentioned this issue Nov 25, 2017

AutoIndex label NPE during initialization and index lookup #437

Closed

frant-hartm mentioned this issue Jan 17, 2018

Load/count/delete queries should use all labels from hierarchy #461

Closed

meistermeier closed this as completed Nov 7, 2018

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Create option on @Index to allow for propagation to subclasses #367

Create option on @Index to allow for propagation to subclasses #367

bzikarsky commented May 21, 2017

bzikarsky commented May 23, 2017

frant-hartm commented May 23, 2017

bzikarsky commented May 23, 2017

DwayneSmurdon commented Sep 22, 2017

meistermeier commented Nov 7, 2018

Create option on @Index to allow for propagation to subclasses #367

Create option on @Index to allow for propagation to subclasses #367

Comments

bzikarsky commented May 21, 2017

Current Behavior

Expected Behavior / Possible Solution

Alternative solutions

Use an attribute inheritIndices on @NodeEntity instead of propagateToSubclasses

Don't use an annotation attribute but do this automatically

Context

Your Environment

bzikarsky commented May 23, 2017

frant-hartm commented May 23, 2017

bzikarsky commented May 23, 2017

DwayneSmurdon commented Sep 22, 2017

meistermeier commented Nov 7, 2018

Use an attribute `inheritIndices` on `@NodeEntity` instead of `propagateToSubclasses`