Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Create option on @Index to allow for propagation to subclasses #367

Closed
bzikarsky opened this issue May 21, 2017 · 5 comments
Closed

Create option on @Index to allow for propagation to subclasses #367

bzikarsky opened this issue May 21, 2017 · 5 comments

Comments

@bzikarsky
Copy link

In case of more complex class hierarchies it'd be nice to have an index (or primary-constraint) propagate to all subclasses.

Current Behavior

Currently the @Index annotation only triggers index-/constraint-creation on the entityType/label it's defined on. If there a subclasses, they don't get the index. This is especially problematic when the primary key gets redefined with primary = true. When loadAll(type) (or similiar methods) is used to load objects of a subclass, that subclasses type is used on the query. But the primary-key is not present as an index and slows down the query immensely.

Expected Behavior / Possible Solution

Let's assume an artificial argument propagateToSubclasses=[true|false] on @Index. A simple class-hierarchy can then be defined like this:

@NodeEntity
abstract class Animal {

    @Index(unique = true, primary = true, propagateToSubclasses = true)
    private String qualifiedName;

}

@NodeEntity
class Cat extends Animal {

    @Property
    private int independence;
}


@NodeEntity
class Dog extends Animal {

    @Property
    private int loyalty;
}

This would add the constraint (and implicit index) to :Animal and additionally to :Dog and :Cat

Alternative solutions

Use an attribute inheritIndices on @NodeEntity instead of propagateToSubclasses

This moves the flag to inherit the indices of the parent class to the subclass. Pro: Matches the location of the extends-definition. Con: Has to be defined explicitly on all subclasses.

Don't use an annotation attribute but do this automatically

Title says it all. One could discuss to only automatically inherit the custom primary index/constraint. But that is even more "magic".

Probably not a good idea, since it changes the behavior and should probably be classified as a breaking change.

Context

I have project which persists a traceability model (about 150M nodes) in neo4j. The entities are structured hierarchically (e.g. TestResult -> Artifact -> Entity) and some basic properties are enforced at higher level (e.g. every Entity needs to have a qualifiedName). The problem arises when those are indexed properties (such as qualifiedName which is the primary key for those nodes). Any query specific to a subclass (e.g. a TestResult with a specific id/qualifiedName) takes a long time, even if it's as simple as this: MATCH (n:TestResult) WHERE n.qualifiedName.

One can argue that the query should uses Artifact in its type-clause, but this is neither user-freindly nor supported by OGM.

I am willing to contribute on this matter. 😃

Your Environment

  • OGM Version used: 2.1.2
  • Java Version used: 1.8.0_131 (x64)
  • Neo4J Version used: 3.2.0
  • Bolt Driver Version used (if applicable): 2.1.2
  • Operating System and Version: Win 10 Pro x64, Build 15063
  • Link to your project: n/a (closed source, proprietary)
@bzikarsky
Copy link
Author

It'd be nice to get an assessment if this is a feature you approve. Then I'd put some time into a working implementation.

Currently we are working around this "issue" by having a custom helper, which checks the class hierarchy and adds missing indices by itself. But it feels kind of dirty.

@frant-hartm
Copy link
Contributor

Hi,

only 1 primary index per class hierarchy is supported - it is used in MERGE statements - if there were multiple primary indexes we wouldn't know which to use.

As for non-primary indexes/constraints:
Propagating indexes down the hierarchy doesn't seem like a very good idea - it would lead to slower write times and data would take up more space, but we will think about it as a opt-in feature.

Solution that should be as fast in terms of querying is to provide more labels in the query - the label of the type you are specifically querying and the label containing primary index.

E.g. for Dog the query would be

MATCH (n:Dog:Animal) WHERE n.primaryKey=....

and neo4j should pick the right label index if it exists.

@bzikarsky
Copy link
Author

Maybe this should be spun off into 2 features:

a) Opt-in index-inheritance
b) Support for class hierarchies in the find*-methods -- The queries built by the delegates always use MetaData/Session.entityType(class) which only returns the specific class' label. Therefor those queries don't hit the primary keys. The delegates need to be extended in such a way that they built a chain of labels until they hit the label containing the primary-key definition.

@DwayneSmurdon
Copy link

I could really use this feature right now, but I am not sure it is a good long term choice. I think option (a) above borders on SDN getting back into managing indices. An OGM arguably shouldn't manage indices, although I think another library to help with this (Java based) would be great.

However, an OGM could (should) intelligently use indexes that are available, which I think option (b) implies. That said, it has to make an assumption that all Dogs are Animals. Just because the Java class hierarchy states that this assumption is true, it doesn't mean it holds true for the actual state of the database and could therefore miss (i.e. there is already a (Dog{type:'labrador'}) but for some reason not an (Animal{type:'labrador'}) a lookup. This could have serious consequences, as my Labradors might start to breed. :)

However, perhaps like @bzikarsky I am constructing a strictly typed db of reference data. I could use this feature. It makes me wonder what type of conveniences and optimizations could be made if there were some contract made that my neo4j state adheres to my Java state hierarchy. For example, I actually want the 'name' property index for any and all labels. I would love to be able to specify that once. I could also accept that I could have delayed indexing turned on for Super or Sub classes so that loading can be less affected because of multiple indices.

@frant-hartm is right that we can just query on Animal:Dog or presumably Dog:Animal, but that is less than optimal and easily forgotten. But still, perhaps this is the best practice. Would a middle ground be to at least emit a warning of some kind during compilation? "This lookup does not appear to use an index." Maybe even emitting a 'score' based on the Query Plan. This might get us Java devs to consider query plans more -- I rarely look at them and to be honest I have not mastered parsing them.

@meistermeier
Copy link
Collaborator

Indexes will now created on the "bottom most" label/class. #514
There is probably a revisit when we aim for version 4.0

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants