How to Use Indices
HyperGraphDB allows you to index atoms by their attributes. Internally, various indices are maintained around the basic organizational layout of the hypergraph data. For example, because every atom X has an associated incidence set holding all links pointing to it, the set of those links is readily available and can be efficiently intersected with other incidence sets. But to quickly retrieve a set of atoms based on their values, you need to explicitly create an index.
At the lowest level, indices are just key-value tables that the storage layer manages. There are also bi-directional indices, where the set of keys matching a given value can be retrieved. Some type implementations work directly with the storage layer to maintain internal indices that are normally hidden from the user. Such internal indices are of no concern to us here. Suffice it to mention that, given a unique name, you can create an index using the HGStore API and then put whatever you want in it, as long as you can translate your data to and from byte buffers.
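As a rough mental model of the paragraph above, a bi-directional index can be pictured as two lookup tables over raw bytes, one per direction. The sketch below is an in-memory toy under that assumption: ToyBidirectionalIndex and all of its methods are hypothetical names invented for illustration, and nothing here uses or mirrors the actual HGStore API.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Toy in-memory model of a bidirectional index over raw bytes.
// Hypothetical illustration only: it neither uses nor mirrors the HGStore API.
final class ToyBidirectionalIndex {
    // byte[] has identity-based equals/hashCode, so entries are wrapped in
    // ByteBuffer, which compares by content.
    private final Map<ByteBuffer, Set<ByteBuffer>> byKey = new HashMap<>();
    private final Map<ByteBuffer, Set<ByteBuffer>> byValue = new HashMap<>();

    // Record the pair in both directions.
    void addEntry(byte[] key, byte[] value) {
        ByteBuffer k = ByteBuffer.wrap(key.clone());
        ByteBuffer v = ByteBuffer.wrap(value.clone());
        byKey.computeIfAbsent(k, x -> new LinkedHashSet<>()).add(v);
        byValue.computeIfAbsent(v, x -> new LinkedHashSet<>()).add(k);
    }

    // Forward lookup: all values stored under a key.
    List<byte[]> find(byte[] key) {
        return copyOut(byKey.get(ByteBuffer.wrap(key)));
    }

    // Reverse lookup: all keys that map to a value.
    List<byte[]> findByValue(byte[] value) {
        return copyOut(byValue.get(ByteBuffer.wrap(value)));
    }

    private static List<byte[]> copyOut(Set<ByteBuffer> hits) {
        List<byte[]> out = new ArrayList<>();
        if (hits != null) {
            for (ByteBuffer b : hits) out.add(b.array().clone());
        }
        return out;
    }

    // The caller is responsible for translating data to and from bytes.
    static byte[] toBytes(String s) { return s.getBytes(StandardCharsets.UTF_8); }
    static String fromBytes(byte[] b) { return new String(b, StandardCharsets.UTF_8); }

    public static void main(String[] args) {
        ToyBidirectionalIndex idx = new ToyBidirectionalIndex();
        idx.addEntry(toBytes("red"), toBytes("apple"));
        idx.addEntry(toBytes("red"), toBytes("cherry"));
        idx.addEntry(toBytes("green"), toBytes("apple"));
        for (byte[] v : idx.find(toBytes("red"))) {
            System.out.println("red -> " + fromBytes(v));
        }
        for (byte[] k : idx.findByValue(toBytes("apple"))) {
            System.out.println(fromBytes(k) + " <- apple");
        }
    }
}
```

A real storage layer would persist these tables and keep keys sorted; the toy only shows why the same entry has to be reachable from both the key side and the value side.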
Indexing at the level of atoms is supported by an HGIndexManager that is associated with every
HyperGraph instance. Every time an atom is added, removed or replaced, the
HyperGraph will trigger an event with its
HGIndexManager to update all relevant indices.
Indices themselves are created by registering indexers, which are implementations of the HGIndexer class, with the index manager. An
HGIndexer is essentially responsible for creating a key given an atom. It is always associated with a specific atom type. So indices are always type-based. Moreover, sub-types are automatically indexed when an index is registered for a super-type.
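To make the division of labor concrete, here is a minimal sketch of the idea, assuming a much simplified world in which atoms are plain Java objects. ToyIndexing, Indexer and Manager are hypothetical names for this illustration and are not the HyperGraphDB classes: the indexer derives a key from an atom of its type, and the manager routes add and remove events to every indexer whose type matches, including sub-types.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.Function;

// Toy model of type-based indexing; hypothetical names, not HyperGraphDB classes.
final class ToyIndexing {
    // An indexer is tied to one atom type and derives a key from an atom.
    static final class Indexer<A> {
        final Class<A> type;
        final Function<A, Object> keyOf;
        final Map<Object, Set<Object>> table = new HashMap<>();

        Indexer(Class<A> type, Function<A, Object> keyOf) {
            this.type = type;
            this.keyOf = keyOf;
        }

        Set<Object> lookup(Object key) {
            return table.getOrDefault(key, Collections.emptySet());
        }
    }

    // The manager is notified of every add/remove and updates matching indexers.
    static final class Manager {
        private final List<Indexer<?>> indexers = new ArrayList<>();

        void register(Indexer<?> indexer) { indexers.add(indexer); }

        void atomAdded(Object atom) { indexers.forEach(ix -> update(ix, atom, true)); }

        void atomRemoved(Object atom) { indexers.forEach(ix -> update(ix, atom, false)); }

        private static <A> void update(Indexer<A> ix, Object atom, boolean add) {
            // isInstance also accepts sub-types, so they are indexed automatically.
            if (!ix.type.isInstance(atom)) return;
            Object key = ix.keyOf.apply(ix.type.cast(atom));
            if (add) {
                ix.table.computeIfAbsent(key, k -> new LinkedHashSet<>()).add(atom);
            } else {
                Set<Object> bucket = ix.table.get(key);
                if (bucket != null) bucket.remove(atom);
            }
        }
    }

    static class User {
        final String email;
        User(String email) { this.email = email; }
    }

    static final class Admin extends User {
        Admin(String email) { super(email); }
    }

    public static void main(String[] args) {
        Indexer<User> byEmail = new Indexer<>(User.class, u -> u.email);
        Manager manager = new Manager();
        manager.register(byEmail);

        manager.atomAdded(new User("a@example.org"));
        manager.atomAdded(new Admin("root@example.org")); // sub-type of User
        manager.atomAdded("not a user");                  // ignored by this indexer

        System.out.println(byEmail.lookup("root@example.org").size());
    }
}
```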
In practice, the two most frequently used
HGIndexer implementations are ByPartIndexer and ByTargetIndexer. The
ByPartIndexer lets you create an index based on some atom property. For example, if you have a
SiteUser Java bean with a bean property called email, you can index that property like this:
HGHandle siteUserType = graph.getTypeSystem().getTypeHandle(SiteUser.class);
graph.getIndexManager().register(new ByPartIndexer(siteUserType, "email"));
Now, when you query for site users by email (e.g.
hg.and(hg.type(SiteUser.class), hg.eq("email", "firstname.lastname@example.org"))), the index will be used.
ByTargetIndexer lets you index links by targets at specific positions. Take the predefined
HGSubsumes link as an example which links something general (target at position 0) to something specific (target at position 1). You can index all
HGSubsumes atoms by their second target like this:
graph.getIndexManager().register(new ByTargetIndexer(graph.getTypeSystem().getTypeHandle(HGSubsumes.class), 1));
Note that indexing by link targets is only useful when doing queries on ordered links. Otherwise, for unordered links the implicit indexing by incidence sets suffices. To take advantage of the index above, you would write a query like this:
List<HGHandle> L = hg.findAll(graph, hg.and(hg.type(HGSubsumes.class), hg.orderedLink(hg.anyHandle(), someHandle)));
Note the use of
hg.anyHandle() at position 0 of the ordered link condition. It is important to be explicit about the exact form of an ordered link in your query. Otherwise, the query system will not be able to associate the provided value (
someHandle in the example above) with an available index.
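One way to picture why the position matters is a toy index that is keyed on the target at one fixed position and can only be consulted when the query pattern pins that same position. This is a simplified sketch and not a description of how the HyperGraphDB query system is implemented: ToyTargetIndex is a hypothetical name, handles are plain strings, and null plays the role of "any handle".

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

// Toy model of an index over ordered links, keyed by the target at one position.
// Hypothetical illustration: handles are strings and null means "any handle".
final class ToyTargetIndex {
    private final int position;
    private final Map<String, List<List<String>>> byTarget = new HashMap<>();

    ToyTargetIndex(int position) { this.position = position; }

    void add(List<String> link) {
        byTarget.computeIfAbsent(link.get(position), k -> new ArrayList<>()).add(link);
    }

    // Returns Optional.empty() when the pattern does not pin the indexed
    // position, meaning the index is unusable and a scan would be needed.
    Optional<List<List<String>>> lookup(List<String> pattern) {
        if (pattern.size() <= position || pattern.get(position) == null) {
            return Optional.empty();
        }
        List<List<String>> hits = new ArrayList<>();
        List<List<String>> bucket =
            byTarget.getOrDefault(pattern.get(position), new ArrayList<>());
        for (List<String> link : bucket) {
            if (matches(link, pattern)) hits.add(link);
        }
        return Optional.of(hits);
    }

    // A link matches when it has the same arity and agrees on every pinned slot.
    private static boolean matches(List<String> link, List<String> pattern) {
        if (link.size() != pattern.size()) return false;
        for (int i = 0; i < pattern.size(); i++) {
            String wanted = pattern.get(i);
            if (wanted != null && !wanted.equals(link.get(i))) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        ToyTargetIndex bySecondTarget = new ToyTargetIndex(1);
        bySecondTarget.add(Arrays.asList("animal", "dog"));
        bySecondTarget.add(Arrays.asList("pet", "dog"));
        bySecondTarget.add(Arrays.asList("animal", "cat"));

        // Position 0 is explicitly "any", position 1 is pinned: index applies.
        System.out.println(bySecondTarget.lookup(Arrays.asList(null, "dog")));
        // Position 1 is not pinned: the index cannot be used.
        System.out.println(bySecondTarget.lookup(Arrays.asList("animal", null)));
    }
}
```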
When such indexers are registered with the system, an automatic indexing process is triggered the next time the database is opened. If you want to force the indexing to happen right now, call the following API:
If you have existing atoms of the type specified in the indexer, they will all be added to the index, and this can take some time. Indexers can also be removed by calling
HGIndexManager.unregister. Removing an indexer doesn't take much time.
HGIndexer instances are stored as HGDB atoms. For instance, one can list all by-value-part indices with the following query:
List<ByPartIndexer> byPartIndexers = hg.getAll(graph, hg.type(ByPartIndexer.class));
That said, removing an
HGIndexer atom without going through the
HGIndexManager.unregister method would be a bad idea because the underlying storage won't be cleaned up.