Let’s start with the most basic configuration question: how do I enable Hibernate Search?
The good news is that Hibernate Search is enabled out of the box when detected on the classpath by Hibernate ORM. If, for some reason, you need to disable it, set hibernate.search.autoregister_listeners to false. Note that there is no performance penalty when the listeners are enabled but no entities are annotated as indexed.
By default, every time an object is inserted, updated or deleted through Hibernate, Hibernate Search updates the corresponding Lucene index. It is sometimes desirable to disable this feature, either because your index is read-only or because index updates are done in batches (see [search-batchindex]).
To disable event based indexing, set
hibernate.search.indexing_strategy = manual
Note
In most cases, the JMS backend provides the best of both worlds: a lightweight event-based system keeps track of all changes in the system, and the heavyweight indexing process is done by a separate process or machine.
The role of the index manager component is described in [search-architecture]. Hibernate Search provides two possible implementations for this interface to choose from.
- directory-based: the default implementation, which uses the Lucene Directory abstraction to manage index files.
- near-real-time: avoids flushing writes to disk at each commit. This index manager is also Directory based, but additionally makes use of Lucene’s NRT functionality.
To select an alternative, specify the property:
hibernate.search.[default|<indexname>].indexmanager = near-real-time
The default IndexManager implementation. This is the one mostly referred to in this documentation. It is highly configurable and allows you to select different settings for the reader strategy, back ends and directory providers. Refer to Directory configuration, Worker configuration and Reader strategy configuration for more details.
The NRTIndexManager is an extension of the default IndexManager, leveraging the Lucene NRT (Near Real Time) features for extremely low-latency index writes. As a trade-off it requires a non-clustered and non-shared index. In other words, it will ignore configuration settings for alternative back ends other than lucene and will acquire exclusive write locks on the Directory.
To achieve these low-latency writes, the IndexWriter will not flush every change to disk. Queries will be allowed to read updated state from the unflushed index writer buffers; the downside of this strategy is that if the application crashes or the IndexWriter is otherwise killed, you’ll have to rebuild the indexes, as some updates might be lost.
Because of these downsides, and because a master node in a cluster can be configured for good performance as well, the NRT configuration is only recommended for non-clustered websites with a limited amount of data.
It is also possible to configure a custom IndexManager implementation by specifying the fully qualified class name of your custom implementation. This implementation must have a no-argument constructor:
hibernate.search.[default|<indexname>].indexmanager = my.corp.myapp.CustomIndexManager
Tip
|
Your custom index manager implementation doesn’t need to use the same components as the default implementations. For example, you can delegate to a remote indexing service which doesn’t expose a Directory interface. |
As we have seen in Configuring the IndexManager, the default index manager uses Lucene’s notion of a Directory to store the index files. The Directory implementation can be customized, and Lucene comes bundled with a file system and an in-memory implementation. DirectoryProvider is the Hibernate Search abstraction around a Lucene Directory and handles the configuration and the initialization of the underlying Lucene resources. List of built-in DirectoryProvider shows the directory providers available in Hibernate Search together with their corresponding options.
To configure your DirectoryProvider you have to understand that each indexed entity is associated with a Lucene index (except in the case where multiple entities share the same index, see [section-sharing-indexes]). The name of the index is given by the index property of the @Indexed annotation. If the index property is not specified, the fully qualified name of the indexed class will be used as the name (recommended).
Knowing the index name, you can configure the directory provider and any additional options by using the prefix hibernate.search.<indexname>. The name default (hibernate.search.default) is reserved and can be used to define properties which apply to all indexes.
Configuring directory providers shows how hibernate.search.default.directory_provider is used to set the default directory provider to be the filesystem one. hibernate.search.default.indexBase then sets the default base directory for the indexes. As a result, the index for the entity Status is created in /usr/lucene/indexes/org.hibernate.example.Status.
The index for the Rule entity, however, uses an in-memory directory, because the default directory provider for this entity is overridden by the property hibernate.search.Rules.directory_provider.
Finally, the Action entity uses a custom directory provider CustomDirectoryProvider specified via hibernate.search.Actions.directory_provider.
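package org.hibernate.example;
import org.hibernate.search.annotations.Indexed;
// No index attribute: the index name is the fully qualified class name, org.hibernate.example.Status
@Indexed
public class Status { ... }
// Explicit index name "Rules", configured via hibernate.search.Rules.*
@Indexed(index="Rules")
public class Rule { ... }
// Explicit index name "Actions", configured via hibernate.search.Actions.*
@Indexed(index="Actions")
public class Action { ... }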
hibernate.search.default.directory_provider = filesystem
hibernate.search.default.indexBase = /usr/lucene/indexes
hibernate.search.Rules.directory_provider = local-heap
hibernate.search.Actions.directory_provider = com.acme.hibernate.CustomDirectoryProvider
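The example can be made concrete with a sketch that resolves these settings with plain java.util.Properties. It shows the fallback from hibernate.search.<indexname>.* to hibernate.search.default.*, and the <indexBase>/<indexName> path used by the filesystem provider. DirectoryConfigSketch is a hypothetical helper, not part of Hibernate Search. It only models the lookup rule described in this section and does not create any index.

```java
import java.util.Properties;

// Hypothetical helper: models the documented lookup rule, it is not Hibernate Search code.
final class DirectoryConfigSketch {

    // Index name when @Indexed has no "index" attribute: the fully qualified class name.
    static String indexName(String fullyQualifiedClassName, String indexAttribute) {
        return (indexAttribute == null || indexAttribute.isEmpty())
                ? fullyQualifiedClassName : indexAttribute;
    }

    // Look up hibernate.search.<indexname>.<key>, falling back to hibernate.search.default.<key>.
    static String lookup(Properties cfg, String indexName, String key) {
        String specific = cfg.getProperty("hibernate.search." + indexName + "." + key);
        return specific != null ? specific : cfg.getProperty("hibernate.search.default." + key);
    }

    // For the filesystem provider the index lives in <indexBase>/<indexName>.
    static String filesystemPath(Properties cfg, String indexName) {
        return lookup(cfg, indexName, "indexBase") + "/" + indexName;
    }

    // The same properties as in the example above.
    static Properties exampleConfig() {
        Properties cfg = new Properties();
        cfg.setProperty("hibernate.search.default.directory_provider", "filesystem");
        cfg.setProperty("hibernate.search.default.indexBase", "/usr/lucene/indexes");
        cfg.setProperty("hibernate.search.Rules.directory_provider", "local-heap");
        cfg.setProperty("hibernate.search.Actions.directory_provider",
                "com.acme.hibernate.CustomDirectoryProvider");
        return cfg;
    }

    public static void main(String[] args) {
        Properties cfg = exampleConfig();
        String status = indexName("org.hibernate.example.Status", null);
        String rules = indexName("org.hibernate.example.Rule", "Rules");
        System.out.println(status + ": " + lookup(cfg, status, "directory_provider")
                + " at " + filesystemPath(cfg, status));
        System.out.println(rules + ": " + lookup(cfg, rules, "directory_provider"));
    }
}
```

Running main prints "org.hibernate.example.Status: filesystem at /usr/lucene/indexes/org.hibernate.example.Status" followed by "Rules: local-heap". This matches the behaviour described for the Status and Rule entities.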
Tip
Using the described configuration scheme you can easily define common rules like the directory provider and base directory, and override those defaults later on a per-index basis.
Name and description | Properties |
---|---|
local-heap: Directory using the local JVM heap. Local heap directories and all contained indexes are lost when the JVM shuts down. This option is only provided for use in testing configurations with small (trivial) indexes and low concurrency, where it could slightly improve performance. In setups requiring larger indexes and/or high concurrency, a file system based directory (see below) will achieve better performance. The directory will be uniquely identified (in the same deployment unit) by the | none |
filesystem: File system based directory. The directory used will be <indexBase>/<indexName> | |
filesystem-master: File system based directory. Like The recommended value for the refresh period is (at least) 50% higher than the time to copy the information (default 3600 seconds, i.e. 60 minutes). Note that the copy is based on an incremental copy mechanism reducing the average copy time. DirectoryProvider typically used on the master node in a JMS back end cluster. | |
filesystem-slave: File system based directory. Like The recommended value for the refresh period is (at least) 50% higher than the time to copy the information (default 3600 seconds, i.e. 60 minutes). Note that the copy is based on an incremental copy mechanism reducing the average copy time. If a copy is still in progress when the refresh period elapses, the second copy operation will be skipped. DirectoryProvider typically used on slave nodes using a JMS back end. | |
infinispan: Infinispan based directory. Use it to store the index in a distributed grid, making index changes visible to all elements of the cluster very quickly. Also see Infinispan Directory configuration for additional requirements and configuration settings. Infinispan needs a global configuration and additional dependencies; the settings defined here apply to each different index. | |
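The filesystem-master and filesystem-slave rows recommend a refresh period at least 50% higher than the copy time. That rule reduces to a one-line calculation. In the sketch below, RefreshPeriodSketch is a hypothetical helper. The copy time is an input you measure yourself. It is not a value this helper obtains from Hibernate Search.

```java
// Hypothetical helper: turns a measured copy time into a refresh period (seconds)
// that is at least 50% higher, as recommended for filesystem-master / filesystem-slave.
final class RefreshPeriodSketch {

    static long minimumRefreshSeconds(long copySeconds) {
        if (copySeconds < 0) {
            throw new IllegalArgumentException("copy time must not be negative");
        }
        // ceil(copySeconds * 1.5) computed with integer arithmetic
        return (copySeconds * 3 + 1) / 2;
    }

    public static void main(String[] args) {
        long measuredCopySeconds = 1200; // e.g. a copy that takes 20 minutes
        System.out.println("hibernate.search.default.refresh = "
                + minimumRefreshSeconds(measuredCopySeconds));
    }
}
```

With a 20-minute copy this prints "hibernate.search.default.refresh = 1800", the half-hour value used in the master/slave examples later in this chapter.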
Tip
|
If the built-in directory providers do not fit your needs, you can write your own directory provider
by implementing the |
Infinispan is a distributed, scalable, cloud-friendly data grid platform, which Hibernate Search can use to store the Lucene index. In this case your application can benefit from Infinispan’s distribution capabilities, making index updates available on all nodes with short latency.
This section describes how to configure Hibernate Search to use an Infinispan Lucene Directory.
When using an Infinispan Directory the index is stored in memory and shared across multiple nodes. It is considered a single directory distributed across all participating nodes: if a node updates the index, all other nodes are updated as well. Updates on one node can be immediately searched for in the whole cluster.
The default configuration replicates all data which defines the index across all nodes, thus consuming a significant amount of memory but providing the best query performance. For large indexes it’s suggested to enable data distribution, so that each piece of information is replicated to a subset of all cluster members. The distribution option will reduce the amount of memory required for each node but is less efficient as it will cause high network usage among the nodes.
It is also possible to offload part or most of the information to a CacheStore, such as a plain filesystem, Amazon S3, Cassandra, MongoDB or standard relational databases. You can configure it to have a CacheStore on each node or have a single centralized one shared by each node.
A popular choice is to use a replicated index aiming to keep the whole index in memory, combined with a CacheStore as a safety valve in case the index gets larger than expected.
See the Infinispan documentation for all Infinispan configuration options.
To use the Infinispan directory via Maven, add the following dependencies:
<dependency>
<groupId>org.hibernate</groupId>
<artifactId>hibernate-search-orm</artifactId>
<version>{hibernateSearchVersion}</version>
</dependency>
<dependency>
<groupId>org.infinispan</groupId>
<artifactId>infinispan-directory-provider</artifactId>
<version>{infinispanVersion}</version>
</dependency>
Important
|
This dependency changed in Hibernate Search version 5.2. Previously the DirectoryProvider was provided by the Hibernate Search project and had Maven coordinates 'org.hibernate:hibernate-search-infinispan', but the Infinispan team is now maintaining this extension point so since this version please use the Maven definition as in the previous example. The version printed above was the latest known compatible at the time of publishing this Hibernate Search version: it’s possible that more recently improved versions of Infinispan have been published which are compatible with this same Hibernate Search version. |
Even when using an Infinispan directory it’s still recommended to use the JMS Master/Slave or JGroups backend, because in Infinispan all nodes will share the same index and it is likely that IndexWriter instances being active on different nodes will try to acquire the lock on the same index. So instead of sending updates directly to the index, send them to a JMS queue or JGroups channel and have a single node apply all changes on behalf of all other nodes.
Configuring a non-default backend is not a requirement but a performance optimization, as locks are enabled to ensure that only a single node writes at a time.
To configure a JMS slave, only the backend must be replaced and the directory provider must be set to infinispan. Set the same directory provider on the master, and the nodes will connect without the need to set up the copy job across nodes. Using the JGroups backend is very similar: just combine the backend configuration with the infinispan directory provider.
The simplest configuration only requires enabling the infinispan directory provider:
hibernate.search.[default|<indexname>].directory_provider = infinispan
That’s all that is needed to get a cluster-replicated index, but the default configuration does not enable any form of permanent persistence for the index; to enable such a feature, an Infinispan configuration file should be provided.
To use Infinispan, Hibernate Search requires a CacheManager; it can look up and reuse an existing CacheManager via JNDI, or start and manage a new one. In the latter case Hibernate Search will start and stop it (closing occurs when the Hibernate SessionFactory is closed).
To use an existing CacheManager via JNDI (optional parameter):
hibernate.search.infinispan.cachemanager_jndiname = [jndiname]
To start a new CacheManager from a configuration file (optional parameter):
hibernate.search.infinispan.configuration_resourcename = [infinispan configuration filename]
If both parameters are defined, JNDI will have priority. If neither is defined, Hibernate Search will use the default Infinispan configuration included in infinispan-directory-provider.jar. This configuration should work fine in most cases but does not store the index in a persistent cache store.
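The precedence between the two optional parameters can be summarized in a few lines. In the sketch below, CacheManagerSourceSketch and its Source enum are hypothetical names. Only the two property keys come from this section. The sketch treats a parameter as defined when the property is present at all.

```java
import java.util.Properties;

// Hypothetical helper summarizing which CacheManager source the documented precedence selects.
final class CacheManagerSourceSketch {

    enum Source { JNDI, CONFIGURATION_FILE, BUILT_IN_DEFAULT }

    static final String JNDI_KEY = "hibernate.search.infinispan.cachemanager_jndiname";
    static final String FILE_KEY = "hibernate.search.infinispan.configuration_resourcename";

    static Source resolve(Properties cfg) {
        if (cfg.getProperty(JNDI_KEY) != null) {
            return Source.JNDI;               // JNDI has priority when both are defined
        }
        if (cfg.getProperty(FILE_KEY) != null) {
            return Source.CONFIGURATION_FILE; // a new CacheManager started from this file
        }
        return Source.BUILT_IN_DEFAULT;       // default configuration from the jar
    }

    public static void main(String[] args) {
        Properties cfg = new Properties();
        cfg.setProperty("hibernate.search.default.directory_provider", "infinispan");
        cfg.setProperty(FILE_KEY, "my-infinispan.xml"); // hypothetical file name
        System.out.println(resolve(cfg));
    }
}
```

Here main prints CONFIGURATION_FILE. Adding a JNDI name to the same properties would switch the result to JNDI.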
As mentioned in List of built-in DirectoryProvider, each index makes use of three caches, so three different caches should be configured as shown in the default-hibernatesearch-infinispan.xml provided in the infinispan-directory-provider.jar. Several indexes can share the same caches.
Infinispan relies on JGroups for its networking functionality, so unless you are using Infinispan on a single node, an Infinispan configuration file will refer to a JGroups configuration file. This coupling is not always practical and we provide a property to override the used JGroups configuration file:
hibernate.search.infinispan.configuration.transport_override_resourcename = jgroups-ec2.xml
This allows you to switch just the JGroups configuration while keeping the rest of the Infinispan configuration.
The file jgroups-ec2.xml used in the example above is one of the several JGroups configurations included in Infinispan. It is a good starting point to run on Amazon EC2 networks. For more details and examples see usage of pre-configured JGroups stacks in the Infinispan configuration guide.
It is possible to refine how Hibernate Search interacts with Lucene through the worker configuration. There exist several architectural components and possible extension points. Let’s have a closer look.
First there is a Worker. An implementation of the Worker interface is responsible for receiving all entity changes, queuing them by context and applying them once a context ends. The most intuitive context, especially in connection with ORM, is the transaction. For this reason Hibernate Search will by default use the TransactionalWorker to scope all changes per transaction. One can, however, imagine a scenario where the context depends, for example, on the number of entity changes or some other application (lifecycle) events. For this reason the Worker implementation is configurable as shown in Scope configuration.
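The idea of queuing changes per context and applying them once the context ends can be shown in a small, self-contained sketch. ScopedWorkQueueSketch and its methods are hypothetical names, not the Worker or TransactionalWorker API. The work items are plain strings rather than real index operations. Discarding the queue on rollback is a choice made by this sketch, not behaviour documented in this section.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Conceptual sketch of context-scoped queuing; not Hibernate Search code.
final class ScopedWorkQueueSketch<W> {

    private final List<W> pending = new ArrayList<>();
    private final Consumer<List<W>> applier;

    ScopedWorkQueueSketch(Consumer<List<W>> applier) {
        this.applier = applier;
    }

    // Called for every entity change made inside the current context.
    void enqueue(W work) {
        pending.add(work);
    }

    // Context ended successfully (e.g. transaction committed): apply all queued work as one batch.
    void contextCompleted() {
        List<W> batch = new ArrayList<>(pending);
        pending.clear();
        if (!batch.isEmpty()) {
            applier.accept(batch);
        }
    }

    // Context rolled back: this sketch simply drops the queued work.
    void contextRolledBack() {
        pending.clear();
    }

    public static void main(String[] args) {
        ScopedWorkQueueSketch<String> queue =
                new ScopedWorkQueueSketch<>(batch -> System.out.println("indexing " + batch));
        queue.enqueue("add Status#1");
        queue.enqueue("update Rule#7");
        queue.contextCompleted(); // prints: indexing [add Status#1, update Rule#7]
    }
}
```

Nothing reaches the applier until the context ends. This is what lets a transaction-scoped worker batch all changes of one transaction together.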
Property | Description |
---|---|
hibernate.search.worker.scope | The fully qualified class name of the Worker implementation to use. If this property is not set, empty or |
hibernate.search.default.worker.* | All configuration properties prefixed with |
hibernate.search.worker.enlist_in_transaction | Defaults to |
Once a context ends, it is time to prepare and apply the index changes. This can be done synchronously or asynchronously from within a new thread. Synchronous updates have the advantage that the index is at all times in sync with the database. Asynchronous updates, on the other hand, can help to minimize the user response time. The drawback is potential discrepancies between the database and index states. Let’s look at the configuration options shown in Execution configuration.
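Conceptually, the two execution modes differ only in which thread applies the changes. In Hibernate Search they are selected with the worker.execution property (values sync, the default, and async). The sketch below illustrates the difference with a plain ExecutorService. ExecutionModeSketch is a hypothetical class, not the Hibernate Search implementation, and the "index changes" are arbitrary Runnables.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Conceptual sketch of synchronous vs asynchronous execution; not Hibernate Search code.
final class ExecutionModeSketch implements AutoCloseable {

    private final ExecutorService executor; // null in synchronous mode

    ExecutionModeSketch(boolean async) {
        this.executor = async ? Executors.newSingleThreadExecutor() : null;
    }

    void apply(Runnable indexChanges) {
        if (executor == null) {
            indexChanges.run();             // caller waits: index is up to date when this returns
        } else {
            executor.execute(indexChanges); // caller returns at once: index may briefly lag
        }
    }

    @Override
    public void close() {
        if (executor == null) {
            return;
        }
        executor.shutdown(); // already submitted work still runs
        try {
            if (!executor.awaitTermination(10, TimeUnit.SECONDS)) {
                executor.shutdownNow();
            }
        } catch (InterruptedException e) {
            executor.shutdownNow();
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        try (ExecutionModeSketch sync = new ExecutionModeSketch(false);
             ExecutionModeSketch async = new ExecutionModeSketch(true)) {
            sync.apply(() -> System.out.println("sync: applied before apply() returned"));
            async.apply(() -> System.out.println("async: applied on "
                    + Thread.currentThread().getName()));
            System.out.println("async: apply() has already returned");
        }
    }
}
```

The relative order of the last two lines printed by main is not deterministic. That is precisely the window in which database and index can disagree.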
Note
|
The following options can be different on each index; in fact they need the indexName prefix or use
|
Property | Description |
---|---|
hibernate.search.<indexName>.worker.execution | |
So far all work is done within the same Virtual Machine (VM), no matter which execution mode. The total amount of work has not changed for the single VM. Luckily there is a better approach, namely delegation. It is possible to send the indexing work to a different server by configuring hibernate.search.default.worker.backend - see Backend configuration. Again this option can be configured differently for each index.
Property | Description |
---|---|
hibernate.search.<indexName>.worker.backend | You can also specify the fully qualified name of a class implementing Please note that instances of |
Property | Description |
---|---|
hibernate.search.<indexName>.worker.jms.connection_factory | Mandatory for the JMS back end. Defines the JNDI name to look up the JMS connection factory from. |
hibernate.search.<indexName>.worker.jms.queue | Mandatory for the JMS back end. Defines the JNDI name to look up the JMS queue from. The queue will be used to post work messages. |
hibernate.search.<indexName>.worker.jms.login | Optional for the JMS slaves. Use it when your queue requires login credentials to define your login. |
hibernate.search.<indexName>.worker.jms.password | Optional for the JMS slaves. Use it when your queue requires login credentials to define your password. |
Since these components use JNDI, don’t forget to configure the Hibernate ORM properties for the initial context lookup.
Property | Description |
---|---|
hibernate.jndi.class | Name of the javax.naming.InitialContext implementation class to use |
hibernate.jndi.url | URL of the JNDI InitialContext connection |
See also the JNDI configuration in Hibernate ORM.
Warning
As you probably noticed, some of the shown properties are correlated, which means that not all combinations of property values make sense. In fact you can end up with a non-functional configuration. This is especially true if you provide your own implementations of some of the shown interfaces. Make sure to study the existing code before you write your own Worker or BackendQueueProcessor implementation.
This section describes in greater detail how to configure the Master/Slave Hibernate Search architecture.
JMS back end configuration.
Every index update operation is sent to a JMS queue. Index querying operations are executed on a local index copy.
### slave configuration
## DirectoryProvider
# (remote) master location
hibernate.search.default.sourceBase = /mnt/mastervolume/lucenedirs/mastercopy
# local copy location
hibernate.search.default.indexBase = /Users/prod/lucenedirs
# refresh every half hour
hibernate.search.default.refresh = 1800
# appropriate directory provider
hibernate.search.default.directory_provider = filesystem-slave
## Backend configuration
hibernate.search.default.worker.backend = jms
hibernate.search.default.worker.jms.connection_factory = /ConnectionFactory
hibernate.search.default.worker.jms.queue = queue/hibernatesearch
#optionally authentication credentials:
hibernate.search.default.worker.jms.login = myname
hibernate.search.default.worker.jms.password = wonttellyou
#optional jndi configuration (check your JMS provider for more information)
## Enqueue indexing tasks within an XA transaction with the database (optional)
hibernate.search.worker.enlist_in_transaction = true
The enlist_in_transaction option can be enabled if you need strict guarantees that indexing work is stored in the queue within the same transaction as the database changes; however, this requires both the RDBMS datasource and the JMS queue to be XA enabled.
Make sure to use an XA JMS queue and that your database supports XA, as we are talking about coordinated transactional systems.
The default for enlist_in_transaction is false, as it is often desirable not to have the database transaction fail in case there are issues with indexing.
It is possible to apply compensating operations to the index by implementing a custom ErrorHandler (see Exception handling), or simply re-synchronize the whole index state by starting the MassIndexer (see [search-batchindex-massindexer]).
Tip
|
A file system local copy is recommended for faster search results. |
Every index update operation is taken from a JMS queue and executed. The master index is copied on a regular basis.
### master configuration
## DirectoryProvider
# (remote) master location where information is copied to
hibernate.search.default.sourceBase = /mnt/mastervolume/lucenedirs/mastercopy
# local master location
hibernate.search.default.indexBase = /Users/prod/lucenedirs
# refresh every half hour
hibernate.search.default.refresh = 1800
# appropriate directory provider
hibernate.search.default.directory_provider = filesystem-master
## Backend configuration
#The backend is not set: use the default one which is 'local'
Tip
|
It is recommended that the refresh period be higher than the expected copy time; if a copy operation is still being performed when the next refresh triggers, the second refresh is skipped: it’s safe to set this value low even when the copy time is not known. |
In addition to the Hibernate Search framework configuration, a Message Driven Bean has to be written and set up to process the indexing work queue through JMS.
@MessageDriven(activationConfig = {
@ActivationConfigProperty(propertyName="destinationType",
propertyValue="javax.jms.Queue"),
@ActivationConfigProperty(propertyName="destination",
propertyValue="queue/hibernatesearch")
} )
public class MDBSearchController extends AbstractJMSHibernateSearchController
implements MessageListener {
@PersistenceContext EntityManager em;
@Override
protected SearchIntegrator getSearchIntegrator() {
FullTextEntityManager fullTextEntityManager = Search.getFullTextEntityManager(em);
return fullTextEntityManager.getSearchFactory().unwrap(SearchIntegrator.class);
}
}
This example inherits from the abstract JMS controller class available in the Hibernate Search source code and implements a Java EE MDB. This implementation is given as an example and can be adjusted to make use of non Java EE Message Driven Beans.
Essentially, what you need to do is to connect the specific JMS Queue with the SearchFactory instance of the EntityManager.
As an advanced alternative, you can implement your own logic by not extending AbstractJMSHibernateSearchController but rather using it as an implementation example.
This section describes how to configure the JGroups Master/Slave back end. The master and slave roles are similar to what is illustrated in JMS Master/Slave back end, only a different backend (hibernate.search.default.worker.backend) needs to be set.
A specific backend can be configured to act either as a slave using jgroupsSlave, as a master using jgroupsMaster, or to automatically switch between the roles as needed by using jgroups.
Note
|
Either you specify a single |
All backends configured to use JGroups share the same channel. The JGroups JChannel is the main communication link across all nodes participating in the same cluster group; since it is convenient to have just one channel shared across all backends, the Channel configuration properties are not defined on a per-worker section but are defined globally. See JGroups channel configuration.
Table JGroups backend configuration properties contains all configuration options which can be set independently on each index backend. These apply to all three variants of the backend: jgroupsSlave, jgroupsMaster and jgroups. It is very unlikely that you need to change any of these from their defaults.
Property | Description |
---|---|
hibernate.search.<indexName>.jgroups.block_waiting_ack | Set to either |
hibernate.search.<indexName>.jgroups.messages_timeout | The timeout of waiting for a single command to be acknowledged and executed when |
hibernate.search.<indexName>.jgroups.delegate_backend | The master node receiving indexing operations forwards them to a standard backend to be performed. Defaults to |
Every index update operation is sent through a JGroups channel to the master node. Index querying operations are executed on a local index copy. Enabling the JGroups worker only makes sure the index operations are sent to the master; you still have to synchronize the indexes by configuring an appropriate directory (see the filesystem-master, filesystem-slave or infinispan options in Directory configuration).
### slave configuration
hibernate.search.default.worker.backend = jgroupsSlave
Every index update operation is taken from a JGroups channel and executed. The master index is copied on a regular basis.
### master configuration
hibernate.search.default.worker.backend = jgroupsMaster
Important
|
This feature is considered experimental. In particular during a re-election process there is a small window of time in which indexing requests could be lost. |
In this mode the different nodes will autonomously elect a master node. When a master fails, a new node is elected automatically.
When setting this backend, it is expected that all Hibernate Search instances in the same cluster use the same backend for each specific index: this configuration is an alternative to the static jgroupsMaster and jgroupsSlave approach, so make sure not to mix them.
To synchronize the indexes in this configuration, avoid the filesystem-master and filesystem-slave directory providers, as their behaviour cannot be switched dynamically. Use the Infinispan Directory instead, which has no need for different configurations on each instance and allows dynamic switching of writers; see also Infinispan Directory configuration.
### automatic configuration
hibernate.search.default.worker.backend = jgroups
Tip
|
Should you use The dynamic |
Configuring the JGroups channel essentially entails specifying the transport in terms of a network protocol stack. To configure the JGroups transport, point the configuration property hibernate.search.services.jgroups.configurationFile to a JGroups configuration file; this can be either a file path or a Java resource name.
Tip
|
If no property is explicitly specified it is assumed that the JGroups default configuration file
|
The default cluster name is Hibernate Search Cluster, which can be configured as seen in JGroups cluster name configuration.
hibernate.search.services.jgroups.clusterName = My-Custom-Cluster-Id
The cluster name is what identifies a group: by changing the name you can run different clusters in the same network in isolation.
For programmatic configurations, one additional option is available to configure the JGroups channel: pass an existing channel instance to Hibernate Search directly using the property hibernate.search.services.jgroups.providedChannel, as shown in the following example.
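import java.util.HashMap;
import java.util.Map;
import javax.persistence.EntityManagerFactory;
import javax.persistence.Persistence;
import org.hibernate.search.backend.impl.jgroups.JGroupsChannelProvider;
import org.jgroups.JChannel;
// an existing, already configured channel
JChannel channel = ...
// the map must accept non-String values, since the channel instance itself is passed
Map<String,Object> properties = new HashMap<>(1);
properties.put( JGroupsChannelProvider.CHANNEL_INJECT, channel );
EntityManagerFactory emf = Persistence.createEntityManagerFactory( "userPU", properties );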
The different reader strategies are described in [search-architecture-readerstrategy]. Out of the box strategies are:
- shared: share index readers across several queries. This strategy is very efficient.
- not-shared: create an index reader for each individual query. Very simple implementation.
- async: only open a new index reader periodically. This is the most efficient implementation, but queries might return out-of-date values.
The default reader strategy is shared.
You can pick the reader strategy by changing the .reader.strategy configuration property, scoped to the "default" index or to a specific index.
For example:
hibernate.search.[default|<indexname>].reader.strategy = async
hibernate.search.[default|<indexname>].reader.async_refresh_period_ms = 8000
Adding the above properties switches to the async strategy and configures it to refresh the index reader every 8 seconds.
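The trade-off of the async strategy (reuse a reader, reopen it only after the refresh period, accept slightly stale results) can be illustrated with a small time-based cache. PeriodicReaderSketch is a hypothetical, generic class, not a ReaderProvider implementation. It checks the period lazily when a reader is requested, and makes no claim about how Hibernate Search actually schedules its refresh. The clock is injectable so the behaviour can be tested deterministically.

```java
import java.util.function.LongSupplier;
import java.util.function.Supplier;

// Conceptual sketch of periodic reader refresh; not Hibernate Search code.
final class PeriodicReaderSketch<R> {

    private final Supplier<R> opener;     // opens a fresh reader
    private final long refreshPeriodMs;   // cf. reader.async_refresh_period_ms
    private final LongSupplier clockMs;   // current time in milliseconds
    private R current;
    private long openedAt;

    PeriodicReaderSketch(Supplier<R> opener, long refreshPeriodMs, LongSupplier clockMs) {
        this.opener = opener;
        this.refreshPeriodMs = refreshPeriodMs;
        this.clockMs = clockMs;
    }

    synchronized R reader() {
        long now = clockMs.getAsLong();
        if (current == null || now - openedAt >= refreshPeriodMs) {
            current = opener.get(); // a real implementation would also release the old reader
            openedAt = now;
        }
        return current; // may reflect index state up to refreshPeriodMs old
    }

    public static void main(String[] args) {
        long[] now = {0};
        int[] opened = {0};
        PeriodicReaderSketch<String> readers =
                new PeriodicReaderSketch<>(() -> "reader#" + (++opened[0]), 8000, () -> now[0]);
        System.out.println(readers.reader()); // reader#1
        now[0] = 5000;
        System.out.println(readers.reader()); // still reader#1
        now[0] = 8000;
        System.out.println(readers.reader()); // reader#2
    }
}
```

Between refreshes every query shares one reader, which is why this strategy is cheap. Any index change made in that window is invisible until the next reopen.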
Alternatively, you can use a custom implementation of org.hibernate.search.indexes.spi.ReaderProvider:
hibernate.search.[default|<indexname>].reader.strategy = my.corp.myapp.CustomReaderProvider
where my.corp.myapp.CustomReaderProvider is the custom strategy implementation.
When using clustering features, Hibernate Search needs to find an implementation of the SerializationProvider service on the classpath.
An implementation of the service based on Apache Avro can be found using the following GAV coordinates:
org.hibernate:hibernate-search-serialization-avro:{hibernateSearchVersion}
You can add the coordinates to your pom file or download all the required dependencies and add them to your classpath. Hibernate Search will find the service implementation without any additional configuration.
Alternatively, you can create a custom service implementation:
package example.provider.serializer;
import org.hibernate.search.indexes.serialization.spi.Deserializer;
import org.hibernate.search.indexes.serialization.spi.SerializationProvider;
import org.hibernate.search.indexes.serialization.spi.Serializer;
public class ExampleOfSerializationProvider implements SerializationProvider {
@Override
public Serializer getSerializer() {
Serializer serializer = ...
return serializer;
}
@Override
public Deserializer getDeserializer() {
Deserializer deserializer = ...
return deserializer;
}
}
Hibernate Search uses the Java ServiceLoader mechanism to transparently discover services. In this case, add the following file to your classpath, containing the fully qualified name of your implementation:
/META-INF/services/org.hibernate.search.indexes.serialization.spi.SerializationProvider
example.provider.serializer.ExampleOfSerializationProvider
You will find more details about services in the section [section-services].
Hibernate Search allows you to configure how exceptions are handled during the indexing process. If no configuration is provided then exceptions are logged to the log output by default. It is possible to explicitly declare the exception logging mechanism as seen below:
hibernate.search.error_handler = log
The default exception handling occurs for both synchronous and asynchronous indexing. Hibernate Search provides an easy mechanism to override the default error handling implementation.
In order to provide your own implementation you must implement the ErrorHandler interface, which provides the handle(ErrorContext context) method. ErrorContext provides a reference to the primary LuceneWork instance, the underlying exception and any subsequent LuceneWork instances that could not be processed due to the primary exception.
public interface ErrorContext {
List<LuceneWork> getFailingOperations();
LuceneWork getOperationAtFault();
Throwable getThrowable();
boolean hasErrors();
}
To register this error handler with Hibernate Search you must declare the fully qualified classname of your ErrorHandler implementation in the configuration properties:
hibernate.search.error_handler = CustomerErrorHandler
Alternatively, an ErrorHandler instance may be passed via the configuration value map used when bootstrapping Hibernate Search programmatically.
Even though Hibernate Search will try to shield you as much as possible from Lucene specifics, there are several Lucene specifics which can be directly configured, either for performance reasons or for satisfying a specific use case. The following sections discuss these configuration options.
Hibernate Search allows you to tune the Lucene indexing performance by specifying a set of parameters which are passed through to the underlying Lucene IndexWriter, such as mergeFactor, maxMergeDocs and maxBufferedDocs. You can specify these parameters either as default values applying to all indexes, on a per-index basis, or even per shard.
There are several low-level IndexWriter settings which can be tuned for different use cases. These parameters are grouped by the indexwriter keyword:
hibernate.search.[default|<indexname>].indexwriter.<parameter_name>
If no value is set for an indexwriter parameter in a specific shard configuration, Hibernate Search will look at the index section, then at the default section.
hibernate.search.Animals.2.indexwriter.max_merge_docs = 10
hibernate.search.Animals.2.indexwriter.merge_factor = 20
hibernate.search.Animals.2.indexwriter.max_buffered_docs = default
hibernate.search.default.indexwriter.max_merge_docs = 100
hibernate.search.default.indexwriter.ram_buffer_size = 64
The configuration in Example performance option configuration will result in these settings being applied on the second shard of the Animals index:
- max_merge_docs = 10
- merge_factor = 20
- ram_buffer_size = 64MB
- max_buffered_docs = Lucene default
All other values will use the defaults defined in Lucene.
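The lookup order described above (shard section, then index section, then default section) can be modelled with plain java.util.Properties. The class IndexWriterSettingResolver is hypothetical, a sketch of the documented precedence rather than Hibernate Search's own implementation. It also assumes, based on our reading of the Animals example, that the literal value default explicitly selects Lucene's own default. Run against the example configuration it yields the four shard settings listed above.

```java
import java.util.Optional;
import java.util.Properties;

// Hypothetical helper modelling the documented lookup order for
// hibernate.search.<scope>.indexwriter.<parameter> settings.
final class IndexWriterSettingResolver {
    private static final String PREFIX = "hibernate.search.";

    private final Properties props;

    IndexWriterSettingResolver(Properties props) {
        this.props = props;
    }

    /**
     * Returns the configured value, or Optional.empty() when Lucene's own
     * default applies (nothing configured, or the literal value "default").
     */
    Optional<String> resolve(String indexName, Integer shardId, String parameter) {
        String suffix = ".indexwriter." + parameter;
        // Most specific scope first: shard, then index, then the default section.
        String[] scopes = shardId == null
                ? new String[] { indexName, "default" }
                : new String[] { indexName + "." + shardId, indexName, "default" };
        for (String scope : scopes) {
            String value = props.getProperty(PREFIX + scope + suffix);
            if (value != null) {
                value = value.trim();
                // Assumption: "default" stops the lookup and selects Lucene's default.
                return "default".equals(value) ? Optional.empty() : Optional.of(value);
            }
        }
        return Optional.empty(); // not configured anywhere: Lucene's default
    }
}
```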
The default for all values is to leave them at Lucene’s own default. For this reason, the values listed in List of indexing performance and behavior properties depend on the version of Lucene you are using. The values shown are relative to version 2.4. For more information about Lucene indexing performance, please refer to the Lucene documentation.
Property | Description | Default Value |
---|---|---|
hibernate.search.[default|<indexname>].exclusive_index_use |
Set to |
|
hibernate.search.[default|<indexname>].max_queue_length |
Each index has a separate "pipeline" which contains the updates to be applied to the index.
When this queue is full adding more operations to the queue becomes a blocking operation. Configuring
this setting doesn’t make much sense unless the |
|
hibernate.search.[default|<indexname>].index_flush_interval |
The interval in milliseconds between flushes
of write operations to the index storage. Ignored unless |
|
hibernate.search.[default|<indexname>].indexwriter.max_buffered_delete_terms |
Determines the minimal number of delete terms required before the buffered in-memory delete terms are applied and flushed. If there are documents buffered in memory at the time, they are merged and a new segment is created. |
Disabled (flushes by RAM usage) |
hibernate.search.[default|<indexname>].indexwriter.max_buffered_docs |
Controls the number of documents buffered in memory during indexing. The bigger the value, the more RAM is consumed. |
Disabled (flushes by RAM usage) |
hibernate.search.[default|<indexname>].indexwriter.max_merge_docs |
Defines the largest number of documents allowed in a segment. Smaller values perform better on frequently changing indexes, larger values provide better search performance if the index does not change often. |
Unlimited (Integer.MAX_VALUE) |
hibernate.search.[default|<indexname>].indexwriter.merge_factor |
Controls segment merge frequency and size. Determines how often segment indexes are merged when insertion occurs. With smaller values, less RAM is used while indexing, and searches on unoptimized indexes are faster, but indexing speed is slower. With larger values, more RAM is used during indexing, and while searches on unoptimized indexes are slower, indexing is faster. Thus larger values (> 10) are best for batch index creation, and smaller values (< 10) for indexes that are interactively maintained. The value must not be lower than 2. |
10 |
hibernate.search.[default|<indexname>].indexwriter.merge_min_size |
Controls segment merge frequency and size. Segments smaller than this size (in MB) are always
considered for the next segment merge operation.
Setting this too large might result in expensive merge operations, even though they are less frequent.
See also |
0 MB (actually ~1K) |
hibernate.search.[default|<indexname>].indexwriter.merge_max_size |
Controls segment merge frequency and size. Segments larger than this size (in MB) are never merged
in bigger segments. This helps reduce memory requirements and avoids some merging operations at the
cost of optimal search speed. When optimizing an index this value is ignored.
See also |
Unlimited |
hibernate.search.[default|<indexname>].indexwriter.merge_max_optimize_size |
Controls segment merge frequency and size. Segments larger than this size (in MB) are not merged
in bigger segments even when optimizing the index (see |
Unlimited |
hibernate.search.[default|<indexname>].indexwriter.merge_calibrate_by_deletes |
Controls segment merge frequency and size. Set to |
|
hibernate.search.[default|<indexname>].indexwriter.ram_buffer_size |
Controls the amount of RAM in MB dedicated to document buffers. When used together with max_buffered_docs, a flush occurs for whichever event happens first. Generally, for faster indexing performance it’s best to flush by RAM usage instead of document count and use as large a RAM buffer as you can. |
16 MB |
hibernate.search.enable_dirty_check |
Not all entity changes require an update of the Lucene index. If none of the updated entity
properties (dirty properties) are indexed, Hibernate Search will skip the re-indexing work.
Disable this option if you use a custom |
true |
hibernate.search.[default|<indexname>].indexwriter.infostream |
Enable low level trace information about Lucene’s internal components. Will cause significant performance degradation: should only be used for troubleshooting purposes. |
false |
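The ram_buffer_size entry above says that, when max_buffered_docs is also set, a flush happens for whichever limit is reached first. The toy model below makes that rule concrete by counting flushes for a stream of documents of known in-memory size. It is a deliberately simplified, hypothetical sketch (FlushModel is not a Lucene or Hibernate Search class): real Lucene RAM accounting also covers buffered deletes and per-thread buffers, and in this model a non-positive limit simply disables that trigger.

```java
// Hypothetical toy model of "flush on whichever limit is hit first".
final class FlushModel {
    /**
     * Counts flushes for documents with the given in-memory sizes (bytes).
     * A non-positive ramBufferMb or maxBufferedDocs disables that trigger.
     */
    static int countFlushes(long[] docSizesBytes, double ramBufferMb, int maxBufferedDocs) {
        long ramLimitBytes = (long) (ramBufferMb * 1024 * 1024);
        long bufferedBytes = 0;
        int bufferedDocs = 0;
        int flushes = 0;
        for (long size : docSizesBytes) {
            bufferedBytes += size;
            bufferedDocs++;
            boolean ramFull = ramBufferMb > 0 && bufferedBytes >= ramLimitBytes;
            boolean countFull = maxBufferedDocs > 0 && bufferedDocs >= maxBufferedDocs;
            if (ramFull || countFull) { // whichever happens first
                flushes++;
                bufferedBytes = 0;
                bufferedDocs = 0;
            }
        }
        if (bufferedDocs > 0) {
            flushes++; // remaining documents are flushed at commit time
        }
        return flushes;
    }
}
```

With 100 documents of 1 MB each, a 16 MB buffer and max_buffered_docs = 10, the document count triggers first (10 flushes); with 4 MB documents the RAM limit triggers first (25 flushes).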
Tip
|
When your architecture permits it, always keep
|
Tip
|
To tune the indexing speed it might be useful to time the object loading from the database in isolation
from the writes to the index. To achieve this, set hibernate.search.[default|<indexname>].worker.backend to blackhole. The recommended approach is to focus first on optimizing the object loading by enabling the |
Warning
|
The |
The options merge_max_size, merge_max_optimize_size and merge_calibrate_by_deletes give you control over the maximum size of the segments being created, but you need to understand how they affect file sizes. If you need to hard limit the size, consider that merging a segment is about adding it together with another existing segment to form a larger one, so you might want to set the max_size for merge operations to less than half of your hard limit. Also, segments might initially be generated larger than your expected size, at creation time, before they are ever merged. A segment is never created much larger than ram_buffer_size, but the threshold is checked as an estimate.
Example:
# to be fairly confident no files grow above 15MB, use:
hibernate.search.default.indexwriter.ram_buffer_size = 10
hibernate.search.default.indexwriter.merge_max_optimize_size = 7
hibernate.search.default.indexwriter.merge_max_size = 7
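The rule of thumb above can be turned into a quick sanity check for a planned configuration. The class SegmentSizeCheck is hypothetical and encodes only the heuristics stated in this section: merge size limits below half of the hard limit, and ram_buffer_size below the hard limit. Because Lucene checks the RAM threshold only as an estimate, passing this check is not a guarantee.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sanity check encoding the rules of thumb from this section.
final class SegmentSizeCheck {
    static List<String> check(double hardLimitMb, double ramBufferSizeMb,
                              double mergeMaxSizeMb, double mergeMaxOptimizeSizeMb) {
        List<String> warnings = new ArrayList<>();
        double half = hardLimitMb / 2;
        // A merge combines segments, so each merged input should stay below half the limit.
        if (mergeMaxSizeMb >= half) {
            warnings.add("merge_max_size should be below " + half + " MB");
        }
        if (mergeMaxOptimizeSizeMb >= half) {
            warnings.add("merge_max_optimize_size should be below " + half + " MB");
        }
        // Newly flushed segments can grow to roughly ram_buffer_size before any merge.
        if (ramBufferSizeMb >= hardLimitMb) {
            warnings.add("ram_buffer_size should be well below " + hardLimitMb + " MB");
        }
        return warnings;
    }
}
```

For the 15MB example, check(15, 10, 7, 7) returns no warnings, while raising merge_max_size to 8 would exceed half of the limit.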
Tip
|
When using the Infinispan Directory to cluster indexes make sure that your segments are smaller than
the |
Apache Lucene allows you to log a very detailed trace of its internals using a feature called "infostream". To access these details, Hibernate Search can be configured to capture this internal trace from Apache Lucene and redirect it to your logger:
- Enable TRACE level logging for the category org.hibernate.search.backend.lucene.infostream
- Activate the feature on the index you want to inspect:
hibernate.search.[default|<indexname>].indexwriter.infostream=true
Keep in mind that this feature has a performance cost: although most logger frameworks allow the TRACE level to be reconfigured at runtime, enabling the infostream property will slow you down even if the logger is disabled.
Lucene Directory implementations have default locking strategies which generally work well enough for most cases, but it’s possible to specify for each index managed by Hibernate Search a specific LockFactory you want to use. This is generally not needed, but could be useful.
Some of these locking strategies require a filesystem-level lock. They may be used with the local-heap directory provider, but in this case the indexBase configuration option (usually not needed when using a local-heap directory provider) must be specified to point to a filesystem location where the lock marker files will be stored.
To select a locking factory, set the hibernate.search.<index>.locking_strategy option to one of simple, native, single or none. Alternatively, set it to the fully qualified name of an implementation of org.hibernate.search.store.LockFactoryProvider.
name | Class | Description |
---|---|---|
simple |
org.apache.lucene.store.SimpleFSLockFactory |
Safe implementation based on Java’s File API, it marks the usage of the index by creating a marker file. If for some reason you had to kill your application, you will need to remove this file before restarting it. |
native |
org.apache.lucene.store.NativeFSLockFactory |
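As does simple, this implementation marks the usage of the index with a lock file, but it relies on native OS file locks, so the operating system releases the lock even if the JVM terminates abruptly. This implementation has known problems on NFS; avoid it on network shares.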
|
single |
org.apache.lucene.store.SingleInstanceLockFactory |
This LockFactory doesn’t use a file marker but is a Java object lock held in memory; therefore it’s possible to use it only when you are sure the index is not going to be shared by any other process. This is the default implementation for the |
none |
org.apache.lucene.store.NoLockFactory |
Changes to this index are not coordinated by any lock; test your application carefully and make sure you know what it means. |
Configuration example:
hibernate.search.default.locking_strategy = simple
hibernate.search.Animals.locking_strategy = native
hibernate.search.Books.locking_strategy = org.custom.components.MyLockingFactory
The Infinispan Directory uses a custom implementation; it’s still possible to override it but make sure you understand how that will work, especially with clustered indexes.
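The configuration example above can be read as a small lookup: an index-specific value wins, otherwise the default section applies, and a built-in name maps to the Lucene class in the table while any other value is taken as a custom LockFactoryProvider class name. The sketch below is hypothetical (LockingStrategyResolver is not part of Hibernate Search) and assumes the same index-then-default fallback used by other [default|<indexname>] properties.

```java
import java.util.Map;
import java.util.Properties;

// Hypothetical helper mirroring the locking_strategy table and example above.
final class LockingStrategyResolver {
    // Built-in strategy names and the Lucene classes listed in the table.
    private static final Map<String, String> BUILT_IN = Map.of(
            "simple", "org.apache.lucene.store.SimpleFSLockFactory",
            "native", "org.apache.lucene.store.NativeFSLockFactory",
            "single", "org.apache.lucene.store.SingleInstanceLockFactory",
            "none", "org.apache.lucene.store.NoLockFactory");

    /**
     * Returns the Lucene LockFactory class for a built-in name, the custom
     * LockFactoryProvider class name otherwise, or null if nothing is set
     * (the directory provider's own default then applies).
     */
    static String resolve(Properties props, String indexName) {
        String value = props.getProperty("hibernate.search." + indexName + ".locking_strategy");
        if (value == null) {
            // Assumption: fall back to the default section like other index properties.
            value = props.getProperty("hibernate.search.default.locking_strategy");
        }
        if (value == null) {
            return null;
        }
        value = value.trim();
        return BUILT_IN.getOrDefault(value, value);
    }
}
```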
While Hibernate Search strives to offer a backwards compatible API making it easy to port your application to newer versions, it still delegates to Apache Lucene to handle the index writing and searching. This creates a dependency on the Lucene index format. The Lucene developers of course attempt to keep a stable index format, but sometimes a change in the format cannot be avoided. In those cases you either have to re-index all your data or use an index upgrade tool. Sometimes Lucene is also able to read the old format, so you don’t need to take specific actions (besides making a backup of your index).
While an index format incompatibility is a rare event, it is more common for Lucene’s Analyzer implementations to change their behavior slightly. This can lead to a poor recall score, possibly missing many hits from the results.
Hibernate Search exposes a configuration property hibernate.search.lucene_version which instructs the analyzers and other Lucene classes to conform to their behavior as defined in an (older) specific version of Lucene. See also org.apache.lucene.util.Version contained in lucene-core.jar. Depending on the specific version of Lucene you’re using, you might have different options available. When this option is not specified, Hibernate Search will instruct Lucene to use the default version, which is usually the best option for new projects. Still, it’s recommended to define the version you’re using explicitly in the configuration, so that when you happen to upgrade Lucene the analyzers will not change behavior. You can then choose to update this value at a later time, for example when you have the chance to rebuild the index from scratch.
hibernate.search.lucene_version = LUCENE_47
This option is global for the configured SearchFactory and affects all Lucene APIs having such a parameter, as this should be applied consistently. So if you also use Lucene directly, bypassing Hibernate Search, make sure to apply the same value there too.
After looking at all these different configuration options, it is time to have a look at an API which allows you to programmatically access parts of the configuration. Via the metadata API you can determine the indexed types and also how they are mapped (see [search-mapping]) to the index structure. The entry point into this API is the SearchFactory. It offers two methods, namely getIndexedTypes() and getIndexedTypeDescriptor(Class<?>). The former returns a set of all indexed types, whereas the latter allows you to retrieve a so-called IndexedTypeDescriptor for a given type. This descriptor allows you to determine whether the type is indexed at all and, if so, whether the index is for example sharded or not (see [advanced-features-sharding]). It also allows you to determine the static boost of the type (see [section-boost-annotation]) as well as its dynamic boost strategy (see [section-dynamic-boost]). Most importantly, however, you get information about the indexed properties and generated Lucene Document fields. These are exposed via PropertyDescriptors and FieldDescriptors respectively. The easiest way to get to know the API is to explore it via the IDE or its javadocs.
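As a starting point for exploring the descriptors, the sketch below walks a type descriptor and lists each indexed property together with the Lucene fields it produces. To stay compilable without Hibernate Search, it declares simplified, hypothetical stand-ins that mirror only the accessors used here (isIndexed, isSharded, getStaticBoost, getIndexedProperties, getIndexedFields, getName); verify the exact signatures against the javadoc of your version. MetadataReport is a hypothetical class name.

```java
import java.util.Set;
import java.util.TreeSet;

// Hypothetical, simplified stand-ins for the metadata API descriptors.
interface FieldDescriptor {
    String getName();
}

interface PropertyDescriptor {
    String getName();
    Set<FieldDescriptor> getIndexedFields();
}

interface IndexedTypeDescriptor {
    boolean isIndexed();
    boolean isSharded();
    float getStaticBoost();
    Set<PropertyDescriptor> getIndexedProperties();
}

final class MetadataReport {
    /** One-line summary of the type-level information. */
    static String summarize(IndexedTypeDescriptor type) {
        if (!type.isIndexed()) {
            return "not indexed";
        }
        return "sharded=" + type.isSharded()
                + ", staticBoost=" + type.getStaticBoost()
                + ", properties=" + type.getIndexedProperties().size();
    }

    /** "property -> field" pairs, sorted so the output is stable. */
    static Set<String> describeFields(IndexedTypeDescriptor type) {
        Set<String> lines = new TreeSet<>();
        if (!type.isIndexed()) {
            return lines;
        }
        for (PropertyDescriptor property : type.getIndexedProperties()) {
            for (FieldDescriptor field : property.getIndexedFields()) {
                lines.add(property.getName() + " -> " + field.getName());
            }
        }
        return lines;
    }
}
```

With Hibernate Search on the classpath, the descriptor would instead come from searchFactory.getIndexedTypeDescriptor(...) for one of the classes returned by getIndexedTypes().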
Note
|
All descriptor instances of the metadata API are read only. They do not allow you to change any runtime configuration. |
Hibernate Search is included in the WildFly application server, and since WildFly 10 the module is automatically activated (added to the classpath of your deployment) if you are using Hibernate ORM and have any indexed entities.
Alternatively, you can opt to use a different version of the module by downloading and unzipping a different moduleset and setting the wildfly.jpa.hibernate.search.module property in your persistence.xml.
The modules system in WildFly allows you to safely run multiple versions of Hibernate ORM and Hibernate Search in parallel, but if you download an alternative version, make sure the Hibernate Search version you choose is compatible with the Hibernate ORM version you choose.
Warning
|
This version of Hibernate Search The modules distributed by Hibernate Search WildFly includes an older version of Hibernate ORM, so you will need to upgrade this dependency as well. The Hibernate ORM / WildFly update instructions can be found here. Not least, as the same guide explains you might need to exclude the Javassist version. |
The activation of the Hibernate Search modules in WildFly is automatic, provided you have at least one entity annotated with org.hibernate.search.annotations.Indexed.
You can control this behavior of the JPA deployer explicitly; for example, to make sure Hibernate Search and Apache Lucene classes are available to your application even though you haven’t annotated any entity, set the following property in your persistence.xml:
wildfly.jpa.hibernate.search.module=org.hibernate.search.orm:main
You can download the latest Hibernate Search provided module and install it. This is often the best approach as you will benefit from all the latest improvements of Hibernate Search. Because of the modular design in WildFly, these additional modules can coexist with the embedded modules and won’t affect any other application, unless you explicitly reconfigure it to use the newer module.
You can download the latest pre-packaged Hibernate Search modules from Sourceforge. As a convenience these zip files are also distributed as Maven artifacts: org.hibernate:hibernate-search-modules-{hibernateSearchVersion}-wildfly-11-dist:zip.
Unpack the modules in your WildFly modules directory: this will create modules for Hibernate Search and Apache Lucene.
The Hibernate Search modules are:
- org.hibernate.search.orm, for users of Hibernate Search with Hibernate; this will transitively include Hibernate ORM.
- org.hibernate.search.engine, for projects depending on the internal indexing engine that don’t require other dependencies on Hibernate.
- org.hibernate.search.backend-jms, in case you want to use the JMS backend described in JMS Architecture.
Next, you will need to make sure the JPA deployer of WildFly provides you with the version you have chosen, instead of the default version bundled with the application server. Set the following property in your persistence.xml:
wildfly.jpa.hibernate.search.module=org.hibernate.search.orm:{hibernateSearchVersion}
See also the WildFly JPA configuration.
More information about the modules configuration in WildFly can be found in the Class Loading in WildFly wiki.
Tip
|
Modular classloading is a feature of JBoss EAP 7 as well, but if you are using JBoss EAP, you’re reading the wrong version of the user guide! JBoss EAP subscriptions include official support for Hibernate Search and come with a different edition of this guide specifically tailored for EAP users. |
If you are updating the version of Hibernate Search in WildFly as described in the previous paragraph, you might need to update Infinispan as well. The process is very similar: download the modules from Infinispan project downloads, picking a compatible version, and decompress the modules into the modules directory of your WildFly installation.
Hibernate Search version {hibernateSearchVersion} was compiled and tested with Infinispan version {infinispanVersion}; generally a more recent version of either project is expected to be backwards compatible for cross-project integration purposes as long as they have the same "major.minor" family version.
For example, for a version of Hibernate Search depending on Infinispan 8.2.4.Final, it should be safe to upgrade Infinispan to 8.2.6.Final, but an upgrade to 8.3.0.Final might not work.
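The "same major.minor family, not older" rule of thumb can be expressed as a small check, for instance in a build script that validates the Infinispan modules you are about to install. FamilyVersionCheck is a hypothetical helper. It assumes versions of the form major.minor.micro.Qualifier (such as 8.2.4.Final) and, like the guidance above, can only flag likely problems; it cannot prove compatibility.

```java
// Hypothetical check for the "same major.minor family" rule of thumb.
final class FamilyVersionCheck {
    /** Parses the first three numeric components of e.g. "8.2.4.Final". */
    private static int[] numbers(String version) {
        String[] parts = version.trim().split("\\.");
        int[] result = new int[3];
        for (int i = 0; i < 3; i++) {
            // Missing components count as 0; the qualifier (4th part) is ignored.
            result[i] = i < parts.length ? Integer.parseInt(parts[i]) : 0;
        }
        return result;
    }

    /** True if candidate shares major.minor with testedWith and is not older. */
    static boolean likelyCompatible(String testedWith, String candidate) {
        int[] tested = numbers(testedWith);
        int[] cand = numbers(candidate);
        return tested[0] == cand[0] && tested[1] == cand[1] && cand[2] >= tested[2];
    }
}
```

For the example above, likelyCompatible("8.2.4.Final", "8.2.6.Final") is true, while likelyCompatible("8.2.4.Final", "8.3.0.Final") is false.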