In this final chapter we are offering a smörgåsbord of tips and tricks which might become useful as you dive deeper and deeper into Hibernate Search.
The SearchFactory
object keeps track of the underlying Lucene resources for Hibernate Search. It is
a convenient way to access Lucene natively. The SearchFactory
can be accessed from a
FullTextSession:
SearchFactory
FullTextSession fullTextSession = Search.getFullTextSession(regularSession);
SearchFactory searchFactory = fullTextSession.getSearchFactory();
The interface SearchIntegrator
gives access to advanced functionality and internals of Hibernate Search;
these are typically only useful to integrate Hibernate Search with other libraries.
These are called SPI - Service Provide Interface - to better differentiate them from normal APIs.
You can recognize them by noticing an spi
element in the package name, such as in org.hibernate.search.spi.SearchIntegrator
.
You can access the SearchIntegrator
SPI using the SearchFactory (Accessing the SearchFactory
),
or by extracting the instance from the Hibernate native SessionFactory
or the JPA EntityManagerFactory
as in the following examples:
SearchIntegrator
from a SearchFactory
SearchIntegrator si = searchFactory.unwrap(SearchIntegrator.class);
SearchIntegrator
from a SessionFactory
SearchIntegrator si = org.hibernate.search.orm.spi.SearchIntegratorHelper.extractFromSessionFactory( sf );
SearchIntegrator
from an EntityManagerFactory
SearchIntegrator si = org.hibernate.search.orm.spi.SearchIntegratorHelper.extractFromEntityManagerFactory( emf );
Queries in Lucene are executed on an IndexReader
. Hibernate Search caches index readers to maximize
performance and implements other strategies to retrieve updated IndexReaders in order to minimize IO
operations. Your code can access these cached resources, but you have to follow some "good citizen"
rules.
IndexReader
IndexReader reader = searchFactory.getIndexReaderAccessor().open(Order.class);
try {
//perform read-only operations on the reader
}
finally {
searchFactory.getIndexReaderAccessor().close(reader);
}
In this example the SearchFactory figures out which indexes are needed to query this entity. Using
the configured ReaderProvider (described in [search-architecture-readerstrategy]) on each index,
it returns a compound IndexReader
on top of all involved indexes. Because this IndexReader
is
shared amongst several clients, you must adhere to the following rules:
-
Never call
indexReader.close()
, but always callreaderProvider.closeReader(reader)
, using a finally block. -
Don’t use this
IndexReader
for modification operations: it’s a read-only instance, you would get an exception.
Aside from those rules, you can use the IndexReader
freely, especially to do native Lucene queries.
Using this shared IndexReaders will be more efficient than by opening one directly from - for
example - the filesystem.
As an alternative to the method open(Class… types)
you can use open(String… indexNames)
in this case you pass in one or more index names; using this strategy you can also select a subset
of the indexes for any indexed type if sharding is used.
IndexReader
by index namesIndexReader reader = searchFactory
.getIndexReaderAccessor()
.open("Products.1", "Products.3");
A Directory is the most common abstraction used by Lucene to represent the index storage; Hibernate Search doesn’t interact directly with a Lucene Directory but abstracts these interactions via an IndexManager: an index does not necessarily need to be implemented by a Directory.
If you are certain that your index is represented as a Directory and need to access it, you can get
a reference to the Directory via the IndexManager. You will have to cast the IndexManager instance
to a DirectoryBasedIndexManager and then use getDirectoryProvider().getDirectory()
to get a
reference to the underlying Directory. This is not recommended, if you need low level access to the
index using Lucene APIs we suggest to see Using an IndexReader
instead.
In some cases it can be useful to split (shard) the data into several Lucene indexes. There are two main use use cases:
-
A single index is so big that index update times are slowing the application down. In this case static sharding can be used to split the data into a pre-defined number of shards.
-
Data is naturally segmented by customer, region, language or other application parameter and the index should be split according to these segments. This is a use case for dynamic sharding.
Tip
|
By default sharding is not enabled. |
To enable static sharding set the hibernate.search.<indexName>.sharding_strategy.nbr_of_shards property as seen in Enabling index sharding.
hibernate.search.[default|<indexName>].sharding_strategy.nbr_of_shards = 5
The default sharding strategy which gets enabled by setting this property, splits the data according to the hash value of the document id (generated by the FieldBridge). This ensures a fairly balanced sharding. You can replace the default strategy by implementing a custom IndexShardingStrategy. To use your custom strategy you have to set the hibernate.search.[default|<indexName>].sharding_strategy property to the fully qualified class name of your custom IndexShardingStrategy.
hibernate.search.[default|<indexName>].sharding_strategy = my.custom.RandomShardingStrategy
Dynamic sharding allows you to manage the shards yourself and even create new shards on the fly. To do so you need to implement the interface ShardIdentifierProvider and set the hibernate.search.[default|<indexName>].sharding_strategy property to the fully qualified name of this class. Note that instead of implementing the interface directly, you should rather derive your implementation from org.hibernate.search.store.ShardIdentifierProviderTemplate which provides a basic implementation. Let’s look at Custom ShardIdentifierProvider for an example.
public static class AnimalShardIdentifierProvider extends ShardIdentifierProviderTemplate {
@Override
public String getShardIdentifier(Class<?> entityType, Serializable id,
String idAsString, Document document) {
if (entityType.equals(Animal.class)) {
String typeValue = document.getField("type").stringValue();
addShard(typeValue);
return typeValue;
}
throw new RuntimeException("Animal expected but found " + entityType);
}
@Override
protected Set<String> loadInitialShardNames(Properties properties, BuildContext buildContext) {
ServiceManager serviceManager = buildContext.getServiceManager();
SessionFactory sessionFactory = serviceManager.requestService(
HibernateSessionFactoryService.class).getSessionFactory();
Session session = sessionFactory.openSession();
try {
Criteria initialShardsCriteria = session.createCriteria(Animal.class);
initialShardsCriteria.setProjection(Projections.distinct(Property.forName("type")));
List<String> initialTypes = initialShardsCriteria.list();
return new HashSet<String>(initialTypes);
}
finally {
session.close();
}
}
}
The are several things happening in AnimalShardIdentifierProvider
. First off its purpose is to
create one shard per animal type (e.g. mammal, insect, etc.). It does so by inspecting the class
type and the Lucene document passed to the getShardIdentifier()
method. It extracts the type field
from the document and uses it as shard name. getShardIdentifier()
is called for every addition to
the index and a new shard will be created with every new animal type encountered. The base class
ShardIdentifierProviderTemplate
maintains a set with all known shards to which any identifier must
be added by calling addShard()
.
It is important to understand that Hibernate Search cannot know which shards already exist when the
application starts. When using ShardIdentifierProviderTemplate
as base class of a
ShardIdentifierProvider
implementation, the initial set of shard identifiers must be returned by the
loadInitialShardNames()
method. How this is done will depend on the use case. However, a common case
in combination with Hibernate ORM is that the initial shard set is defined by the distinct
values of a given database column. Custom ShardIdentifierProvider shows how to handle
such a case. AnimalShardIdentifierProvider
makes in its loadInitialShardNames()
implementation use
of a service called HibernateSessionFactoryService
(see also Using external services) which is
available within an ORM environment. It allows to request a Hibernate SessionFactory
instance which
can be used to run a Criteria query in order to determine the initial set of shard identifiers.
Last but not least, the ShardIdentifierProvider
also allows for optimizing searches by selecting
which shard to run a query against. By activating a filter (see [query-filter-shard]), a sharding
strategy can select a subset of the shards used to answer a query (getShardIdentifiersForQuery()
,
not shown in the example) and thus speed up the query execution.
Important
|
This ShardIdentifierProvider is considered experimental. We might need to apply some changes to the defined method signatures to accommodate for unforeseen use cases. Please provide feedback if you have ideas, or just to let us know how you’re using this API. |
It is technically possible to store the information of more than one entity into a single Lucene index. There are two ways to accomplish this:
Warning
|
Support for indexing multiple entity types in the same index is deprecated and is going to be removed in Hibernate Search 6. |
-
Configuring the underlying directory providers to point to the same physical index directory. In practice, you set the property
hibernate.search.[fully qualified entity name].indexName
to the same value. As an example, let’s use the same index (directory) for theFurniture
andAnimal
entities. We just setindexName
for both entities to "Animal". Both entities will then be stored in the Animal directory:
hibernate.search.org.hibernate.search.test.shards.Furniture.indexName = Animal hibernate.search.org.hibernate.search.test.shards.Animal.indexName = Animal
-
Setting the @Indexed annotation’s index attribute of the entities you want to merge to the same value. If we again wanted all Furniture instances to be indexed in the Animal index along with all instances of Animal we would specify @Indexed(index="Animal") on both Animal and Furniture classes.
Note
|
This is only presented here so that you know the option is available. There is really not much benefit in sharing indexes. |
A Service
in Hibernate Search is a class implementing the interface
org.hibernate.search.engine.service.spi.Service
and providing a default no-arg constructor.
Theoretically that’s all that is needed to request a given service type from the Hibernate Search
ServiceManager
. In practice you want probably want to add some service life cycle methods
(implement Startable
and Stoppable
) as well as actual methods providing some functionality.
Hibernate Search uses the service approach to decouple different components of the system. Let’s have a closer look at services and how they are used.
Many of of the pluggable contracts of Hibernate Search can use services. Services are accessible via
the BuildContext
interface as in the following example.
public CustomDirectoryProvider implements DirectoryProvider<RAMDirectory> {
private ServiceManager serviceManager;
private ClassLoaderService classLoaderService;
public void initialize(
String directoryProviderName,
Properties properties,
BuildContext context) {
//get a reference to the ServiceManager
this.serviceManager = context.getServiceManager();
}
public void start() {
//get the current ClassLoaderService
classLoaderService = serviceManager.requestService(ClassLoaderService.class);
}
public RAMDirectory getDirectory() {
//use the ClassLoaderService
}
public stop() {
//make sure to release all services
serviceManager.releaseService(ClassLoaderService.class);
}
}
When you request a service, an instance of the requested service type is returned to you.
Make sure release the service via ServiceManager.releaseService
once you don’t need it
anymore. Note that the service can be released in the DirectoryProvider.stop
method if
the DirectoryProvider
uses the service during its lifetime or could be released right away
if the service is only needed during initialization time.
To implement a service, you need to create an interface which identifies it and extends
org.hibernate.search.engine.service.spi.Service
. You can then add additional methods to your service
interface as needed.
Naturally you will also need to provide an implementation of your service interface. This
implementation must have a public no-arg constructor. Optionally your service can also
implement the life cycle methods org.hibernate.search.engine.service.spi.Startable
and/or org.hibernate.search.engine.service.spi.Stoppable
. These methods will be called by the
ServiceManager
when the service is created respectively the last reference to a requested service
is released.
Services are retrieved from the ServiceManager.requestService
using the Class
object of the
interface you define as a key.
To transparently discover services Hibernate Search uses the Java ServiceLoader mechanism. This means
you need to add a service file to your jar under /META-INF/services/
named after the fully qualified
classname of your service interface. The content of the file contains the fully qualified
classname of your service implementation.
/META-INF/services/org.infinispan.hibernate.search.spi.CacheManagerService
org.infinispan.hibernate.search.impl.DefaultCacheManagerService
Note
|
Hibernate Search only supports a single service implementation of a given service. There is no mechanism to select between multiple versions of a service. It is an error to have multiple jars defining each a different implementation for the same service. If you want to override the implementation of a already existing service at runtime you will need to look at Provided services. |
Important
|
Provided services are usually used by frameworks integrating with Hibernate Search and not by library users themselves. |
As an alternative to manages services, a service can be provided by the environment bootstrapping
Hibernate Search. For example, Infinispan which uses Hibernate Search as its internal search engine,
passes the CacheContainer
to Hibernate Search.
In this case, the CacheContainer
instance is not managed by Hibernate Search and the start/stop
methods defined by optional Stoppable
and Startable
interfaces will be ignored.
A Service implementation which is only used as a Provided Service doesn’t need to have a public constructor taking no arguments.
Note
|
Provided services have priority over managed services. If a provided service is registered with the
same |
The provided services are passed to Hibernate Search via the SearchConfiguration
interface: as
implementor of method getProvidedServices
you can return a Map
of all services you need to
provide.
Note
|
When implementing a custom |
Lucene allows the user to customize its scoring formula by extending org.apache.lucene.search.similarities.Similarity. The abstract methods defined in this class match the factors of the following formula calculating the score of query q for document d:
score(q,d) = coord(q,d) · queryNorm(q) · ∑ ~t in q~ ( tf(t in d) · idf(t) 2 · t.getBoost() · norm(t,d) )
Factor | Description |
---|---|
tf(t ind) |
Term frequency factor for the term (t) in the document (d). |
idf(t) |
Inverse document frequency of the term. |
coord(q,d) |
Score factor based on how many of the query terms are found in the specified document. |
queryNorm(q) |
Normalizing factor used to make scores between queries comparable. |
t.getBoost() |
Field boost. |
norm(t,d) |
Encapsulates a few (indexing time) boost and length factors. |
It is beyond the scope of this manual to explain this formula in more detail. Please refer to Similarity’s Javadocs for more information.
Hibernate Search provides two ways to modify Lucene’s similarity calculation.
First you can set the default similarity by specifying the fully specified classname of your Similarity implementation using the property hibernate.search.similarity. The default value is org.apache.lucene.search.similarities.ClassicSimilarity.
Secondly, you can override the similarity used for a specific index by setting the similarity
property for this index (see [search-configuration-directory] for more information about index
configuration):
hibernate.search.[default|<indexname>].similarity = my.custom.Similarity
As an example, let’s assume it is not important how often a term appears in a document. Documents
with a single occurrence of the term should be scored the same as documents with multiple
occurrences. In this case your custom implementation of the method tf(float freq)
should return 1.0.
Note
|
When two entities share the same index they must declare the same Similarity implementation. |
The term multi-tenancy in general is applied to software development to indicate an architecture in which a single running instance of an application simultaneously serves multiple clients (tenants). Isolating information (data, customizations, etc) pertaining to the various tenants is a particular challenge in these systems. This includes the data owned by each tenant stored in the database. You will find more details on how to enable multi-tenancy with Hibernate in the Hibernate ORM developer’s guide.
Hibernate Search supports multi-tenancy on top of Hibernate ORM, it stores the tenant identifier in the document and automatically filters the query results.
The FullTextSession
will be bound to the specific tenant ("client-A" in the example)
and the mass indexer will only index the entities associated to that tenant identifier.
Session session = getSessionFactory()
.withOptions()
.tenantIdentifier( "client-A" )
.openSession();
FullTextSession session = Search.getFullTextSession( session );
The use of a tenant identifier will have the following effects:
-
Every document saved or updated in the index will have an additional field
__HSearch_TenantId
containing the tenant identifier. -
Every search will be filtered using the tenant identifier.
-
The MassIndexer (see [search-batchindex-massindexer]) will only affect the currently selected tenant.
Note that not using a tenant will return all the matching results for all the tenants in the index.