<?xml version="1.0" encoding="UTF-8"?>
<!--
  ~ Hibernate, Relational Persistence for Idiomatic Java
  ~
  ~  Copyright (c) 2010, Red Hat, Inc. and/or its affiliates or third-party contributors as
  ~  indicated by the @author tags or express copyright attribution
  ~  statements applied by the authors.  All third-party contributions are
  ~  distributed under license by Red Hat, Inc.
  ~
  ~  This copyrighted material is made available to anyone wishing to use, modify,
  ~  copy, or redistribute it subject to the terms and conditions of the GNU
  ~  Lesser General Public License, as published by the Free Software Foundation.
  ~
  ~  This program is distributed in the hope that it will be useful,
  ~  but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
  ~  or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU Lesser General Public License
  ~  for more details.
  ~
  ~  You should have received a copy of the GNU Lesser General Public License
  ~  along with this distribution; if not, write to:
  ~  Free Software Foundation, Inc.
  ~  51 Franklin Street, Fifth Floor
  ~  Boston, MA  02110-1301  USA
  -->
<!DOCTYPE book PUBLIC "-//OASIS//DTD DocBook XML V4.5//EN"
"http://www.oasis-open.org/docbook/xml/4.5/docbookx.dtd" [
<!ENTITY % BOOK_ENTITIES SYSTEM "../hsearch.ent">
%BOOK_ENTITIES;
]>
<chapter id="search-configuration">
  <title>Configuration</title>

  <section id="search-configuration-event" revision="2">
    <title>Enabling Hibernate Search and automatic indexing</title>

    <para>Let's start with the most basic configuration question: how do I
    enable Hibernate Search?</para>

    <section>
      <title>Enabling Hibernate Search</title>

      <para>The good news is that Hibernate Search is enabled out of the box
      when detected on the classpath by Hibernate Core. If, for some reason,
      you need to disable it, set
      <literal>hibernate.search.autoregister_listeners</literal> to
      <constant>false</constant>. Note that there is no performance penalty
      when the listeners are enabled but no entities are annotated as
      indexed.</para>
    </section>

    <section>
      <title>Automatic indexing</title>

      <para>By default, every time an object is inserted, updated or deleted
      through Hibernate, Hibernate Search updates the corresponding Lucene
      index. It is sometimes desirable to disable this feature, either because
      your index is read-only or because index updates are done in batches
      (see <xref linkend="search-batchindex"/>).</para>

      <para>To disable event-based indexing, set</para>

      <programlisting>hibernate.search.indexing_strategy = manual</programlisting>

      <note>
        <para>In most cases, the JMS backend provides the best of both
        worlds: a lightweight event-based system keeps track of all changes
        in the system, and the heavyweight indexing process is done by a
        separate process or machine.</para>
      </note>
    </section>
  </section>

  <section id="configuration-indexmanager">
    <title>Configuring the <classname>IndexManager</classname></title>

    <para>The role of the index manager component is described in <xref
    linkend="search-architecture"/>. Hibernate Search provides two possible
    implementations for this interface to choose from.</para>

    <itemizedlist>
      <listitem>
        <para><literal>directory-based</literal>: the default implementation
        which uses the Lucene <classname>Directory</classname> abstraction to
        manage index files.</para>
      </listitem>

      <listitem>
        <para><literal>near-real-time</literal>: avoids flushing writes to
        disk at each commit. This index manager is also
        <classname>Directory</classname> based, but additionally makes use
        of Lucene's NRT (Near Real Time) functionality.</para>
      </listitem>
    </itemizedlist>

    <para>To select an alternative, specify the property:</para>

    <programlisting>hibernate.search.[default|&lt;indexname&gt;].indexmanager = near-real-time</programlisting>

    <section>
      <title><literal>directory-based</literal></title>

      <para>The default <classname>IndexManager</classname> implementation.
      This is the one mostly referred to in this documentation. It is highly
      configurable and allows you to select different settings for the reader
      strategy, back ends and directory providers. Refer to <xref
      linkend="search-configuration-directory"/>, <xref
      linkend="configuration-worker"/> and <xref
      linkend="configuration-reader-strategy"/> for more details.</para>
    </section>

    <section>
      <title><literal>near-real-time</literal></title>

      <para>The <classname>NRTIndexManager</classname> is an extension of the
      default <classname>IndexManager</classname>, leveraging the Lucene NRT
      (Near Real Time) features for extremely low latency index writes. As a
      trade-off, it requires a non-clustered and non-shared index. In other
      words, it will ignore configuration settings for alternative back ends
      other than <literal>lucene</literal> and will acquire exclusive write
      locks on the <classname>Directory</classname>.</para>

      <para>To achieve these low latency writes, the
      <classname>IndexWriter</classname> will not flush every change to disk.
      Queries will be allowed to read updated state from the unflushed index
      writer buffers; the downside of this strategy is that if the application
      crashes or the <classname>IndexWriter</classname> is otherwise killed
      you'll have to rebuild the indexes as some updates might be lost.</para>

      <para>Because of these downsides, and because a master node in a
      cluster can be configured for good performance as well, the NRT
      configuration is only recommended for non-clustered websites with a
      limited amount of data.</para>
    </section>

    <section>
      <title>Custom</title>

      <para>It is also possible to configure a custom
      <classname>IndexManager</classname> implementation by specifying the
      fully qualified class name of your custom implementation. This
      implementation must have a no-argument constructor:<programlisting>hibernate.search.[default|&lt;indexname&gt;].indexmanager = my.corp.myapp.CustomIndexManager</programlisting></para>

      <tip>
        <para>Your custom index manager implementation doesn't need to use the
        same components as the default implementations. For example, you can
        delegate to a remote indexing service which doesn't expose a
        <classname>Directory</classname> interface.</para>
      </tip>
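
      <para>The following is a minimal sketch of what the property lookup and
      the no-argument constructor requirement amount to. The class
      <classname>IndexManagerSelector</classname> is hypothetical and not
      part of Hibernate Search: it returns a plain label for the two built-in
      implementations and only illustrates the lookup order (index-specific
      setting, then <literal>default</literal>, then
      <literal>directory-based</literal>) and the reflective instantiation of
      a custom class.</para>

      <programlisting language="JAVA" role="JAVA">import java.util.Properties;

// Hypothetical helper, not part of Hibernate Search: it only illustrates how
// an indexmanager setting could be resolved and turned into an instance.
class IndexManagerSelector {

    // Index-specific setting first, then hibernate.search.default.indexmanager,
    // then the built-in default "directory-based".
    static String selectedValue(Properties props, String indexName) {
        String value = props.getProperty("hibernate.search." + indexName + ".indexmanager");
        if (value == null) {
            value = props.getProperty("hibernate.search.default.indexmanager", "directory-based");
        }
        return value.trim();
    }

    // Built-in short names are returned as labels; any other value is treated
    // as a fully qualified class name and created through its no-argument
    // constructor, which is why custom implementations must declare one.
    static Object instantiate(String value) throws ReflectiveOperationException {
        if (value.equals("directory-based") || value.equals("near-real-time")) {
            return value;
        }
        return Class.forName(value).getDeclaredConstructor().newInstance();
    }
}</programlisting>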
    </section>
  </section>

  <section id="search-configuration-directory" revision="1">
    <title>Directory configuration</title>

    <para>As we have seen in <xref linkend="configuration-indexmanager"/>, the
    default index manager uses Lucene's notion of a
    <classname>Directory</classname> to store the index files. The
    <classname>Directory</classname> implementation can be customized and
    Lucene comes bundled with a file system and an in-memory implementation.
    <classname>DirectoryProvider</classname> is the Hibernate Search
    abstraction around a Lucene <classname>Directory</classname> and handles
    the configuration and the initialization of the underlying Lucene
    resources. <xref linkend="directory-provider-table"/> shows the list of
    the directory providers available in Hibernate Search together with their
    corresponding options.</para>

    <para>To configure your <classname>DirectoryProvider</classname> you have
    to understand that each indexed entity is associated with a Lucene index
    (except when multiple entities share the same index, see <xref
    linkend="section-sharing-indexes"/>). The name of the index is given by
    the <constant>index</constant> property of the
    <classname>@Indexed</classname> annotation. If the
    <constant>index</constant> property is not specified, the fully qualified
    name of the indexed class is used as the index name (recommended).</para>

    <para>Knowing the index name, you can configure the directory provider and
    any additional options by using the prefix
    <constant>hibernate.search.</constant><replaceable>&lt;indexname&gt;</replaceable>.
    The name <constant>default</constant>
    (<constant>hibernate.search.default</constant>) is reserved and can be
    used to define properties which apply to all indexes. <xref
    linkend="example-configuring-directory-providers"/> shows how
    <constant>hibernate.search.default.directory_provider</constant> is used
    to set the default directory provider to be the filesystem one.
    <constant>hibernate.search.default.indexBase</constant> then sets the
    default base directory for the indexes. As a result, the index for the
    entity <classname>Status</classname> is created in
    <filename>/usr/lucene/indexes/org.hibernate.example.Status</filename>.</para>

    <para>The index for the <classname>Rule</classname> entity, however, uses
    an in-memory directory, because the default directory provider for this
    entity is overridden by the property
    <constant>hibernate.search.Rules.directory_provider</constant>.</para>

    <para>Finally, the <classname>Action</classname> entity uses a custom
    directory provider <classname>CustomDirectoryProvider</classname>
    specified via
    <constant>hibernate.search.Actions.directory_provider</constant>.</para>

    <example>
      <title>Specifying the index name</title>

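      <programlisting language="JAVA" role="JAVA">package org.hibernate.example;

import org.hibernate.search.annotations.Indexed;

// Each class would normally live in its own source file.

// no index name given: the index is named org.hibernate.example.Status
@Indexed
public class Status { /* ... */ }

// index named "Rules"
@Indexed(index="Rules")
public class Rule { /* ... */ }

// index named "Actions"
@Indexed(index="Actions")
public class Action { /* ... */ }</programlisting>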
    </example>

    <example id="example-configuring-directory-providers">
      <title>Configuring directory providers</title>

      <programlisting>hibernate.search.default.directory_provider = filesystem
hibernate.search.default.indexBase = /usr/lucene/indexes
hibernate.search.Rules.directory_provider = ram
hibernate.search.Actions.directory_provider = com.acme.hibernate.CustomDirectoryProvider</programlisting>
    </example>
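
    <para>The naming and override rules above can be summarized in a short
    sketch. <classname>DirectorySettings</classname> and its methods are
    hypothetical names used only for illustration; the sketch does not model
    the <literal>indexName</literal> override listed in <xref
    linkend="directory-provider-table"/>.</para>

    <programlisting language="JAVA" role="JAVA">import java.util.Properties;

// Hypothetical illustration of the naming and override rules described above;
// it is not Hibernate Search's own resolution code.
class DirectorySettings {

    // @Indexed.index if set, otherwise the fully qualified entity class name.
    static String indexName(String indexedIndex, String entityClassName) {
        if (indexedIndex == null || indexedIndex.isEmpty()) {
            return entityClassName;
        }
        return indexedIndex;
    }

    // hibernate.search.[indexName].[option] wins over
    // hibernate.search.default.[option]; null when neither is set.
    static String option(Properties props, String indexName, String option) {
        String specific = props.getProperty("hibernate.search." + indexName + "." + option);
        return specific != null
                ? specific
                : props.getProperty("hibernate.search.default." + option);
    }

    // The filesystem provider stores the index in indexBase/indexName.
    static String filesystemPath(Properties props, String indexName) {
        return option(props, indexName, "indexBase") + "/" + indexName;
    }
}</programlisting>

    <para>With the properties from the configuration example,
    <literal>option(props, "Rules", "directory_provider")</literal> returns
    <literal>ram</literal>, while the <classname>Status</classname> index
    falls back to <literal>filesystem</literal> and resolves to
    <filename>/usr/lucene/indexes/org.hibernate.example.Status</filename>.</para>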

    <tip>
      <para>Using the described configuration scheme you can easily define
      common rules like the directory provider and base directory, and
      override those defaults later on on a per index basis.</para>
    </tip>

    <table id="directory-provider-table">
      <title>List of built-in <classname>DirectoryProvider</classname></title>

      <tgroup cols="2">
        <thead>
          <row>
            <entry align="center">Name and description</entry>

            <entry align="center">Properties</entry>
          </row>
        </thead>

        <tbody>
          <row>
            <entry><property>ram</property>: Memory-based directory. The
            directory is uniquely identified (within the same deployment
            unit) by the <literal>@Indexed.index</literal> element.</entry>

            <entry>none</entry>
          </row>

          <row>
            <entry><property>filesystem</property>: File-system-based
            directory. The directory used will be
            &lt;indexBase&gt;/&lt;indexName&gt;</entry>

            <entry><para><literal>indexBase</literal>: base
            directory</para><para><literal>indexName</literal>: override
            @Indexed.index (useful for sharded
            indexes)</para><para><literal>locking_strategy</literal>:
            optional, see <xref
            linkend="search-configuration-directory-lockfactories"/>
            </para><para><literal>filesystem_access_type</literal>: determines
            the exact type of <classname>FSDirectory</classname>
            implementation used by this
            <classname>DirectoryProvider</classname>. Allowed values are
            <literal>auto</literal> (the default value, selects
            <classname>NIOFSDirectory</classname> on non-Windows systems,
            <classname>SimpleFSDirectory</classname> on Windows),
            <literal>simple</literal>
            (<classname>SimpleFSDirectory</classname>), <literal>nio</literal>
            (<classname>NIOFSDirectory</classname>), <literal>mmap</literal>
            (<classname>MMapDirectory</classname>). Make sure to refer to the
            Javadocs of these <classname>Directory</classname> implementations
            before changing this setting. Even though
            <classname>NIOFSDirectory</classname> or
            <classname>MMapDirectory</classname> can bring substantial
            performance boosts, they also have known limitations.</para></entry>
          </row>

          <row>
            <entry><para><property>filesystem-master</property>: File-system-based
            directory. Like <literal>filesystem</literal>, but it also copies
            the index to a source directory (aka copy directory) on a
            regular basis.</para><para>The refresh period (default 3600
            seconds, i.e. 60 minutes) should be at least 50% higher than the
            time needed to copy the information.</para><para>Note
            that the copy is based on an incremental copy mechanism, reducing
            the average copy time.</para><para>This DirectoryProvider is
            typically used on the master node in a JMS back end
            cluster.</para><para>The optimal
            <literal>buffer_size_on_copy</literal> depends on your
            operating system and available RAM; most people reported good
            results using values between 16 and 64MB.</para></entry>

            <entry><para><literal>indexBase</literal>: base
            directory</para><para><literal>indexName</literal>: override
            @Indexed.index (useful for sharded
            indexes)</para><para><literal>sourceBase</literal>: source (copy)
            base directory.</para><para><literal>source</literal>: source
            directory suffix (defaults to <literal>@Indexed.index</literal>).
            The actual source directory is
            <filename>&lt;sourceBase&gt;/&lt;source&gt;</filename>
            </para><para><literal>refresh</literal>: refresh period in seconds
            (the copy will take place every <constant>refresh</constant>
            seconds). If a copy is still in progress when the following
            <constant>refresh</constant> period elapses, the second copy
            operation will be
            skipped.</para><para><literal>buffer_size_on_copy</literal>: the
            number of megabytes to move in a single low level copy
            instruction; defaults to
            16MB.</para><para><literal>locking_strategy</literal>: optional,
            see <xref linkend="search-configuration-directory-lockfactories"/>
            </para><para><literal>filesystem_access_type</literal>: determines
            the exact type of <classname>FSDirectory</classname>
            implementation used by this
            <classname>DirectoryProvider</classname>. Allowed values are
            <literal>auto</literal> (the default value, selects
            <classname>NIOFSDirectory</classname> on non-Windows systems,
            <classname>SimpleFSDirectory</classname> on Windows),
            <literal>simple</literal>
            (<classname>SimpleFSDirectory</classname>), <literal>nio</literal>
            (<classname>NIOFSDirectory</classname>), <literal>mmap</literal>
            (<classname>MMapDirectory</classname>). Make sure to refer to the
            Javadocs of these <classname>Directory</classname> implementations
            before changing this setting. Even though
            <classname>NIOFSDirectory</classname> or
            <classname>MMapDirectory</classname> can bring substantial
            performance boosts, they also have known limitations.</para></entry>
          </row>

          <row>
            <entry><para><property>filesystem-slave</property>: File-system-based
            directory. Like <literal>filesystem</literal>, but retrieves
            a master version (source) on a regular basis. To avoid locking and
            inconsistent search results, two local copies are
            kept.</para><para>The refresh period (default 3600 seconds, i.e.
            60 minutes) should be at least 50% higher than the time needed to
            copy the information.</para><para>Note that the
            copy is based on an incremental copy mechanism, reducing the
            average copy time. If a copy is still in progress when the
            <constant>refresh</constant> period elapses, the second copy
            operation will be skipped.</para><para>This DirectoryProvider is
            typically used on slave nodes using a JMS back end.</para><para>The
            optimal <literal>buffer_size_on_copy</literal> depends on your
            operating system and available RAM; most people reported good
            results using values between 16 and 64MB.</para></entry>

            <entry><para><literal>indexBase</literal>: Base
            directory</para><para><literal>indexName</literal>: override
            @Indexed.index (useful for sharded
            indexes)</para><para><literal>sourceBase</literal>: Source (copy)
            base directory.</para><para><literal>source</literal>: Source
            directory suffix (defaults to <literal>@Indexed.index</literal>).
            The actual source directory is
            <filename>&lt;sourceBase&gt;/&lt;source&gt;</filename>
            </para><para><literal>refresh</literal>: refresh period in seconds
            (the copy will take place every <constant>refresh</constant>
            seconds).</para><para><literal>buffer_size_on_copy</literal>: The
            number of megabytes to move in a single low level copy
            instruction; defaults to
            16MB.</para><para><literal>locking_strategy</literal>: optional,
            see <xref linkend="search-configuration-directory-lockfactories"/>
            </para><para><literal>retry_marker_lookup</literal>: optional,
            defaults to 0. Defines how many times the marker files are looked
            up in the source directory before failing, waiting 5 seconds
            between attempts.</para><para><literal>retry_initialize_period</literal>:
            optional, set an integer value in seconds to enable the retry
            initialize feature: if the slave cannot find the master index, it
            keeps retrying in the background until the index is found, without
            preventing the application from starting. Full-text queries
            performed before the index is initialized are not blocked but
            return empty results. When the option is not enabled, or is
            explicitly set to zero, the slave fails with an exception instead
            of scheduling a retry timer. To prevent the application from
            starting without a valid index while still controlling an
            initialization timeout, see <literal>retry_marker_lookup</literal>
            instead.</para><para><literal>filesystem_access_type</literal>:
            determines the exact type of
            <classname>FSDirectory</classname> implementation used by this
            <classname>DirectoryProvider</classname>. Allowed values are
            <literal>auto</literal> (the default value, selects
            <classname>NIOFSDirectory</classname> on non-Windows systems,
            <classname>SimpleFSDirectory</classname> on Windows),
            <literal>simple</literal>
            (<classname>SimpleFSDirectory</classname>), <literal>nio</literal>
            (<classname>NIOFSDirectory</classname>), <literal>mmap</literal>
            (<classname>MMapDirectory</classname>). Make sure to refer to the
            Javadocs of these <classname>Directory</classname> implementations
            before changing this setting. Even though
            <classname>NIOFSDirectory</classname> or
            <classname>MMapDirectory</classname> can bring substantial
            performance boosts, they also have known limitations.</para></entry>
          </row>

          <row>
            <entry><para><property>infinispan</property>: Infinispan-based
            directory. Use it to store the index in a distributed grid, making
            index changes visible to all nodes of the cluster very quickly.
            Also see <xref linkend="infinispan-directories"/> for additional
            requirements and configuration settings. Infinispan needs a global
            configuration and additional dependencies; the settings defined
            here apply to each index individually.</para></entry>

            <entry><para><literal>locking_cachename</literal>: name of the
            Infinispan cache used to store
            locks.</para><para><literal>data_cachename</literal>: name of the
            Infinispan cache used to store the largest data chunks; this
            area will contain the largest objects, so use replication if you
            have enough memory, otherwise switch to distribution.</para>
            <para><literal>metadata_cachename</literal>: name of the
            Infinispan cache used to store the metadata relating to the
            index; this data is rather small and read very often, so it is
            recommended to set up this cache using replication.</para>
            <para><literal>chunk_size</literal>: large index files are split
            into smaller chunks; you might want to set this to the highest
            value your network handles efficiently. Network tuning might also
            be useful.</para></entry>
          </row>
        </tbody>
      </tgroup>
    </table>
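
    <para>The refresh rules shared by <literal>filesystem-master</literal>
    and <literal>filesystem-slave</literal> (a copy every
    <constant>refresh</constant> seconds, a tick skipped while the previous
    copy is still running, a period at least 50% longer than one copy) can be
    pictured with <literal>java.util.concurrent</literal>. This is an
    assumption-based illustration, not the providers' actual code:
    <classname>PeriodicCopy</classname> is a hypothetical class, and the copy
    job itself is left to the caller.</para>

    <programlisting language="JAVA" role="JAVA">import java.util.concurrent.Executor;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch of the refresh behavior described for the
// filesystem-master and filesystem-slave providers; not their actual code.
class PeriodicCopy implements Runnable {

    private final AtomicBoolean copyInProgress = new AtomicBoolean(false);
    private final AtomicInteger skippedTicks = new AtomicInteger();
    private final Runnable copyJob;      // the (incremental) copy itself
    private final Executor copyExecutor; // runs copies off the timer thread

    PeriodicCopy(Runnable copyJob, Executor copyExecutor) {
        this.copyJob = copyJob;
        this.copyExecutor = copyExecutor;
    }

    // Invoked on every timer tick, i.e. every refresh seconds.
    @Override
    public void run() {
        if (!copyInProgress.compareAndSet(false, true)) {
            skippedTicks.incrementAndGet(); // previous copy still running: skip
            return;
        }
        try {
            copyExecutor.execute(() -> {
                try {
                    copyJob.run();
                } finally {
                    copyInProgress.set(false);
                }
            });
        } catch (RuntimeException e) {
            copyInProgress.set(false); // the copy was never started
            throw e;
        }
    }

    int skippedTickCount() {
        return skippedTicks.get();
    }

    // Starts a single timer thread that calls run() every refreshSeconds.
    ScheduledExecutorService start(long refreshSeconds) {
        ScheduledExecutorService timer = Executors.newSingleThreadScheduledExecutor();
        timer.scheduleAtFixedRate(this, refreshSeconds, refreshSeconds, TimeUnit.SECONDS);
        return timer;
    }

    // Rule of thumb from the table: refresh at least 50% longer than one copy.
    static long recommendedMinimumRefresh(long copySeconds) {
        return (long) Math.ceil(copySeconds * 1.5);
    }
}</programlisting>

    <para>The timer thread only decides whether to start a copy; the copy
    runs on a separate executor, so a slow copy shows up as skipped ticks
    rather than as a backlog of queued copies.</para>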

    <tip>
      <para>If the built-in directory providers do not fit your needs, you can
      write your own directory provider by implementing the
      <classname>org.hibernate.search.store.DirectoryProvider</classname> interface.
      In this case, pass the fully qualified class name of your provider into
      the <literal>directory_provider</literal> property. You can pass any
      additional properties using the prefix
      <constant>hibernate.search.</constant><replaceable>&lt;indexname&gt;</replaceable>.</para>
    </tip>
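
    <para>As a rough illustration of the prefix convention, the sketch below
    collects every property under
    <constant>hibernate.search.</constant><replaceable>&lt;indexname&gt;</replaceable><constant>.</constant>
    and strips the prefix, giving the kind of per-index view a custom
    provider works with. <classname>IndexScopedProperties</classname> and the
    option name <literal>my_option</literal> are hypothetical, and values
    inherited from <literal>hibernate.search.default</literal> are
    deliberately not merged in.</para>

    <programlisting language="JAVA" role="JAVA">import java.util.Properties;

// Hypothetical helper: collects the options of one index with the
// "hibernate.search.[indexname]." prefix removed. Values inherited from
// hibernate.search.default are deliberately not merged in here.
class IndexScopedProperties {

    static Properties forIndex(Properties all, String indexName) {
        // The trailing dot keeps "Actions" from matching "ActionsArchive".
        String prefix = "hibernate.search." + indexName + ".";
        Properties scoped = new Properties();
        for (String key : all.stringPropertyNames()) {
            if (key.startsWith(prefix)) {
                scoped.setProperty(key.substring(prefix.length()), all.getProperty(key));
            }
        }
        return scoped;
    }
}</programlisting>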

    <section id="infinispan-directories" revision="2">
      <title role="bold">Infinispan Directory configuration</title>

      <para>Infinispan is a distributed, scalable, highly available data grid
      platform which supports autodiscovery of peer nodes. Using Infinispan
      and Hibernate Search in combination, it is possible to store the Lucene
      index in a distributed environment where index updates are quickly
      available on all nodes.</para>

      <para>This section describes in greater detail how to configure
      Hibernate Search to use an Infinispan Lucene Directory.</para>

      <para>When using an Infinispan Directory the index is stored in memory
      and shared across multiple nodes. It is considered a single directory
      distributed across all participating nodes. If a node updates the index,
      all other nodes are updated as well. Updates on one node can be
      immediately searched for in the whole cluster.</para>

      <para>The default configuration replicates all data defining the index
      across all nodes, thus consuming a significant amount of memory. For
      large indexes it's suggested to enable data distribution, so that each
      piece of information is replicated to a subset of all cluster
      members.</para>

      <para>It is also possible to offload part or most of the information to
      a <literal>CacheStore</literal>, such as a plain filesystem, Amazon S3,
      Cassandra, Berkeley DB or standard relational databases. You can
      configure it to have a <literal>CacheStore</literal> on each node or
      have a single centralized one shared by all nodes.</para>

      <para>See the <ulink
      url="https://docs.jboss.org/author/display/ISPN/Home"> Infinispan
      documentation</ulink> for all Infinispan configuration options.</para>

      <section>
        <title>Requirements</title>

        <para>To use the Infinispan directory via Maven, add the following
        dependencies:</para>

        <example>
          <title>Maven dependencies for Hibernate Search</title>

          <programlisting language="XML" role="XML">&lt;dependency&gt;
   &lt;groupId&gt;org.hibernate&lt;/groupId&gt;
   &lt;artifactId&gt;hibernate-search&lt;/artifactId&gt;
   &lt;version&gt;&version;&lt;/version&gt;
&lt;/dependency&gt;
&lt;dependency&gt;
   &lt;groupId&gt;org.hibernate&lt;/groupId&gt;
   &lt;artifactId&gt;hibernate-search-infinispan&lt;/artifactId&gt;
   &lt;version&gt;&version;&lt;/version&gt;
&lt;/dependency&gt;</programlisting>
        </example>

        <para>For non-Maven users, add
        <literal>hibernate-search-infinispan.jar</literal>,
        <literal>infinispan-lucene-directory.jar</literal> and
        <literal>infinispan-core.jar</literal> to your application classpath.
        These last two jars are distributed by <ulink
        url="http://www.jboss.org/infinispan/downloads">Infinispan</ulink>.</para>
      </section>

      <section>
        <title>Architecture</title>

        <para>Even when using an Infinispan directory, it's still recommended
        to use the JMS Master/Slave or JGroups backend: in Infinispan all
        nodes share the same index, and <classname>IndexWriter</classname>s
        active on different nodes are likely to compete for the lock on the
        same index. So instead of sending updates directly to the index, send
        them to a JMS queue or JGroups channel and have a single node apply
        all changes on behalf of all other nodes.</para>

        <para>Configuring a non-default backend is not a requirement but a
        performance optimization: since the index lock allows only a single
        node to write at a time, routing all changes through one node avoids
        contention on that lock.</para>

        <para>To configure a JMS slave, replace only the backend and set the
        directory provider to <literal>infinispan</literal>; set the same
        directory provider on the master. The nodes then share the index
        without the need to set up the copy job across nodes. Using the
        JGroups backend is very similar: just combine the backend
        configuration with the <literal>infinispan</literal> directory
        provider.</para>
      </section>

      <section>
        <title>Infinispan Configuration</title>

        <para>The simplest configuration only requires enabling the
        directory provider:</para>

        <programlisting>hibernate.search.[default|&lt;indexname&gt;].directory_provider = infinispan</programlisting>

        <para>That is all that is needed to get a cluster-replicated index,
        but the default configuration does not enable any form of permanent
        persistence for the index; to enable it, provide an Infinispan
        configuration file.</para>

        <para>To use Infinispan, Hibernate Search requires a
        <classname>CacheManager</classname>; it can look up and reuse an
        existing <classname>CacheManager</classname> via JNDI, or start and
        manage a new one. In the latter case, Hibernate Search starts and
        stops it (closing occurs when the Hibernate
        <classname>SessionFactory</classname> is closed).</para>

        <para>To use an existing <classname>CacheManager</classname> via JNDI
        (optional parameter):</para>

        <programlisting>hibernate.search.infinispan.cachemanager_jndiname = [jndiname]</programlisting>

        <para>To start a new <classname>CacheManager</classname> from a
        configuration file (optional parameter):</para>

        <programlisting>hibernate.search.infinispan.configuration_resourcename = [infinispan configuration filename]</programlisting>

        <para>If both parameters are defined, JNDI takes priority. If neither
        is defined, Hibernate Search uses the default Infinispan
        configuration included in
        <literal>hibernate-search-infinispan.jar</literal>. This configuration
        should work fine in most cases but does not store the index in a
        persistent cache store.</para>
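
        <para>The precedence between the two optional parameters, and who is
        responsible for stopping the <classname>CacheManager</classname>, can
        be summarized in a small sketch. <classname>CacheManagerSource</classname>
        and <classname>CacheManagerSelection</classname> are hypothetical
        names, and treating blank values like missing ones is an assumption
        of the sketch.</para>

        <programlisting language="JAVA" role="JAVA">import java.util.Properties;

// Hypothetical names illustrating the precedence rules described above.
enum CacheManagerSource {
    JNDI,               // existing CacheManager looked up and reused
    CONFIGURATION_FILE, // new CacheManager started from the given resource
    BUILT_IN_DEFAULT;   // new CacheManager from the configuration bundled in
                        // hibernate-search-infinispan.jar (no persistent store)

    // Only a CacheManager started by Hibernate Search is also stopped by it,
    // when the SessionFactory is closed.
    boolean managedByHibernateSearch() {
        return this != JNDI;
    }
}

class CacheManagerSelection {

    static CacheManagerSource choose(Properties props) {
        if (isSet(props.getProperty("hibernate.search.infinispan.cachemanager_jndiname"))) {
            return CacheManagerSource.JNDI; // JNDI wins when both are defined
        }
        if (isSet(props.getProperty("hibernate.search.infinispan.configuration_resourcename"))) {
            return CacheManagerSource.CONFIGURATION_FILE;
        }
        return CacheManagerSource.BUILT_IN_DEFAULT;
    }

    // Assumption of this sketch: blank values are treated like missing ones.
    private static boolean isSet(String value) {
        return !(value == null || value.trim().isEmpty());
    }
}</programlisting>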

        <para>As mentioned in <xref linkend="directory-provider-table"/>, each
        index makes use of three caches, so three different caches should be
        configured as shown in the
        <literal>default-hibernatesearch-infinispan.xml</literal> provided in
        the <literal>hibernate-search-infinispan.jar</literal>. Several
        indexes can share the same caches.</para>
      </section>
    </section>
  </section>

  <section>
    <title id="configuration-worker">Worker configuration</title>

    <para>It is possible to refine how Hibernate Search interacts with Lucene
    through the worker configuration. There exist several architectural
    components and possible extension points. Let's have a closer look.</para>

    <para>First there is a <classname>Worker</classname>. An implementation of
    the <classname>Worker</classname> interface is responsible for receiving
    all entity changes, queuing them by context and applying them once a
    context ends. The most intuitive context, especially in connection with
    ORM, is the transaction. For this reason Hibernate Search uses the
    <classname>TransactionalWorker</classname> by default to scope all
    changes per transaction. One can, however, imagine a scenario where the
    context depends, for example, on the number of entity changes or on some
    other application (lifecycle) events. For this reason the
    <classname>Worker</classname> implementation is configurable as shown in
    <xref linkend="table-worker-configuration"/>.</para>

    <table id="table-worker-configuration">
      <title>Scope configuration</title>

      <tgroup cols="2">
        <tbody>
          <row>
            <entry><emphasis role="bold">Property</emphasis></entry>

            <entry><emphasis role="bold">Description</emphasis></entry>
          </row>

          <row>
            <entry><property>hibernate.search.worker.scope</property></entry>

            <entry>The fully qualified class name of the
            <classname>Worker</classname> implementation to use. If this
            property is not set, is empty, or is set to
            <literal>transaction</literal>, the default
            <classname>TransactionalWorker</classname> is used.</entry>
          </row>

          <row>
            <entry><property>hibernate.search.worker.*</property></entry>

            <entry>All configuration properties prefixed with
            <literal>hibernate.search.worker</literal> are passed to the
            Worker during initialization. This allows adding custom,
            worker-specific parameters.</entry>
          </row>
        </tbody>
      </tgroup>
    </table>
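
    <para>To make the role of a <classname>Worker</classname> more concrete,
    the hypothetical class below buffers changes per context and hands each
    context's changes to an applier as one batch when that context ends.
    <classname>ContextScopedQueue</classname>, its type parameters and the
    <classname>Consumer</classname>-based applier are illustrative
    assumptions, as is the <literal>discard</literal> method for a context
    that is rolled back; the class does not implement Hibernate Search's
    <classname>Worker</classname> interface.</para>

    <programlisting language="JAVA" role="JAVA">import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

// Hypothetical stand-in for a Worker, for illustration only: it does not
// implement Hibernate Search's Worker interface.
class ContextScopedQueue&lt;C, W&gt; {

    // Pending changes, grouped by context (e.g. by transaction).
    private final Map&lt;C, List&lt;W&gt;&gt; pending = new HashMap&lt;&gt;();
    // Receives the changes of one context as a batch (e.g. to update the index).
    private final Consumer&lt;List&lt;W&gt;&gt; applier;

    ContextScopedQueue(Consumer&lt;List&lt;W&gt;&gt; applier) {
        this.applier = applier;
    }

    // Records a change; nothing is applied yet.
    synchronized void enqueue(C context, W work) {
        pending.computeIfAbsent(context, c -&gt; new ArrayList&lt;&gt;()).add(work);
    }

    // The context ended successfully: apply its changes in one batch.
    void end(C context) {
        List&lt;W&gt; batch;
        synchronized (this) {
            batch = pending.remove(context);
        }
        if (batch != null) {
            applier.accept(batch);
        }
    }

    // Assumption for this sketch: a context that is rolled back is dropped.
    synchronized void discard(C context) {
        pending.remove(context);
    }
}</programlisting>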

    <para>Once a context ends, it is time to prepare and apply the index
    changes. This can be done synchronously or asynchronously from within a
    new thread. Synchronous updates have the advantage that the index is at
    all times in sync with the database. Asynchronous updates, on the other
    hand, can help to minimize the user response time. The drawback is
    potential discrepancies between database and index states. Let's look at
    the configuration options shown in <xref
    linkend="table-work-execution-configuration"/>.</para>

    <note>
      <para>The following options can be different on each index; in fact
      they must be prefixed with the index name, or with
      <literal>default</literal> to set the default value for all
      indexes.</para>
    </note>

    <table id="table-work-execution-configuration">
      <title>Execution configuration</title>

      <tgroup cols="2">
        <tbody>
          <row>
            <entry><emphasis role="bold">Property</emphasis></entry>

            <entry><emphasis role="bold">Description</emphasis></entry>
          </row>

          <row>
            <entry><property>hibernate.search.&lt;indexName&gt;.​worker.execution</property></entry>

            <entry><para><literal>sync</literal>: synchronous execution
            (default)</para><para><literal>async</literal>: asynchronous
            execution</para></entry>
          </row>

          <row>
            <entry><property>hibernate.search.&lt;indexName&gt;.​worker.thread_pool.size</property></entry>

            <entry>The backend can apply updates from the same transaction
            context (or batch) in parallel, using a thread pool. The default
            value is 1. You can experiment with larger values if you have many
            operations per transaction.</entry>
          </row>

          <row>
            <entry><property>hibernate.search.&lt;indexName&gt;.​worker.buffer_queue.max</property></entry>

            <entry>Defines the maximum number of work items that can be
            queued when the thread pool is starved. Useful only for
            asynchronous execution. Defaults to infinite. If the limit is
            reached, the work is done by the main thread.</entry>
          </row>
        </tbody>
      </tgroup>
    </table>
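    <para>The combination of <literal>worker.thread_pool.size</literal> and
    <literal>worker.buffer_queue.max</literal> resembles a standard
    <literal>java.util.concurrent</literal> pattern: a fixed-size pool fed by
    a bounded queue, where work that no longer fits is run by the submitting
    thread. Here is a minimal sketch of that pattern, assuming
    <classname>ThreadPoolExecutor.CallerRunsPolicy</classname> as the
    overflow rule; the <classname>IndexingExecutors</classname> factory is
    hypothetical and this is not the actual Hibernate Search backend
    code.</para>

    <programlisting language="JAVA" role="JAVA"><![CDATA[import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Hypothetical factory modelling thread_pool.size and buffer_queue.max.
final class IndexingExecutors {

    private IndexingExecutors() {
    }

    // threadPoolSize: number of worker threads (thread_pool.size, default 1).
    // bufferQueueMax: maximum number of queued work items (buffer_queue.max);
    //                 in this sketch a value <= 0 stands for "unbounded".
    static ThreadPoolExecutor create(int threadPoolSize, int bufferQueueMax) {
        BlockingQueue<Runnable> queue = bufferQueueMax > 0
                ? new ArrayBlockingQueue<Runnable>(bufferQueueMax)
                : new LinkedBlockingQueue<Runnable>();
        // When all threads are busy and the queue is full, CallerRunsPolicy
        // runs the task on the thread that submitted it.
        return new ThreadPoolExecutor(threadPoolSize, threadPoolSize,
                60L, TimeUnit.SECONDS, queue,
                new ThreadPoolExecutor.CallerRunsPolicy());
    }
}]]></programlisting>

    <para>With one thread and a queue of one, a third task submitted while
    the first is still running and the second is waiting in the queue is
    executed directly by the submitting thread, which mirrors "the work is
    done by the main thread".</para>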

    <para>So far all work is done within the same Virtual Machine (VM), no
    matter which execution mode is used; the total amount of work for that
    single VM has not changed. Luckily there is a better approach, namely
    delegation. It is possible to send the indexing work to a different
    server by configuring
    <property>hibernate.search.&lt;indexName&gt;.worker.backend</property>
    (see <xref linkend="table-backend-configuration"/>). Again, this option
    can be configured differently for each index.</para>

    <table id="table-backend-configuration">
      <title>Backend configuration</title>

      <tgroup cols="2">
        <tbody>
          <row>
            <entry><emphasis role="bold">Property</emphasis></entry>

            <entry><emphasis role="bold">Description</emphasis></entry>
          </row>

          <row>
            <entry><property>hibernate.search.&lt;indexName&gt;.​worker.backend</property></entry>

            <entry><para><literal>lucene</literal>: The default backend which
            runs index updates in the same VM. Also used when the property is
            undefined or empty.</para><para><literal>jms</literal>: JMS
            backend. Index updates are sent to a JMS queue to be processed by
            an indexing master. See <xref
            linkend="table-jms-backend-configuration"/> for additional
            configuration options and <xref linkend="jms-backend"/> for a more
            detailed description of this
            setup.</para><para><literal>jgroupsMaster</literal> or
            <literal>jgroupsSlave</literal>: Backend using <ulink
            url="http://www.jgroups.org/">JGroups</ulink> as communication
            layer. See <xref linkend="jgroups-backend"/> for a more detailed
            description of this
            setup.</para><para><literal>blackhole</literal>: Mainly a
            test/developer setting which ignores all indexing
            work.</para><para>You can also specify the fully qualified name of
            a class implementing <classname>BackendQueueProcessor</classname>.
            This way you can implement your own communication layer. The
            implementation is responsible for returning a
            <classname>Runnable</classname> instance which on execution will
            process the index work.</para></entry>
          </row>
        </tbody>
      </tgroup>
    </table>

    <table id="table-jms-backend-configuration">
      <title>JMS backend configuration</title>

      <tgroup cols="2">
        <tbody>
          <row>
            <entry><emphasis role="bold">Property</emphasis></entry>

            <entry><emphasis role="bold">Description</emphasis></entry>
          </row>

          <row>
            <entry><property>hibernate.search.&lt;indexName&gt;.​worker.jndi.*</property></entry>

            <entry>Defines the JNDI properties used to create the
            InitialContext (if needed). JNDI is only used by the JMS back
            end.</entry>
          </row>

          <row>
            <entry><property>hibernate.search.&lt;indexName&gt;.​worker.jms.connection_factory</property></entry>

            <entry>Mandatory for the JMS back end. Defines the JNDI name under
            which the JMS connection factory is looked up
            (<literal>/ConnectionFactory</literal> by default in JBoss
            AS).</entry>
          </row>

          <row>
            <entry><property>hibernate.search.&lt;indexName&gt;.​worker.jms.queue</property></entry>

            <entry>Mandatory for the JMS back end. Defines the JNDI name under
            which the JMS queue is looked up. The queue is used to post work
            messages.</entry>
          </row>

          <row>
            <entry><property>hibernate.search.&lt;indexName&gt;.​worker.jms.login</property></entry>

            <entry>Optional for the JMS slaves. Defines the login to use when
            your queue requires credentials.</entry>
          </row>

          <row>
            <entry><property>hibernate.search.&lt;indexName&gt;.​worker.jms.password</property></entry>

            <entry>Optional for the JMS slaves. Defines the password to use
            when your queue requires credentials.</entry>
          </row>
        </tbody>
      </tgroup>
    </table>

    <warning>
      <para>As you probably noticed, some of the properties shown are
      correlated, which means that not all combinations of property values
      make sense. In fact, you can end up with a non-functional
      configuration. This is especially true if you provide your own
      implementations of some of the interfaces shown. Make sure to study the
      existing code before you write your own <classname>Worker</classname> or
      <classname>BackendQueueProcessor</classname> implementation.</para>
    </warning>

    <section id="jms-backend">
      <title>JMS Master/Slave back end</title>

      <para>This section describes in greater detail how to configure the
      Master/Slave Hibernate Search architecture.</para>

      <mediaobject>
        <imageobject role="html">
          <imagedata align="center" fileref="jms-backend.png" format="PNG"/>
        </imageobject>

        <imageobject role="fo">
          <imagedata align="center" depth="" fileref="jms-backend.png"
                     format="PNG" scalefit="1" width="12cm"/>
        </imageobject>

        <caption><para>JMS back end configuration.</para></caption>
      </mediaobject>

      <section>
        <title>Slave nodes</title>

        <para>Every index update operation is sent to a JMS queue. Index
        querying operations are executed on a local index copy.</para>

        <example>
          <title>JMS Slave configuration</title>

          <programlisting>### slave configuration

## DirectoryProvider
# (remote) master location
hibernate.search.default.sourceBase = /mnt/mastervolume/lucenedirs/mastercopy

# local copy location
hibernate.search.default.indexBase = /Users/prod/lucenedirs

# refresh every half hour
hibernate.search.default.refresh = 1800

# appropriate directory provider
hibernate.search.default.directory_provider = filesystem-slave

## Backend configuration
hibernate.search.default.worker.backend = jms
hibernate.search.default.worker.jms.connection_factory = /ConnectionFactory
hibernate.search.default.worker.jms.queue = queue/hibernatesearch
# optional authentication credentials:
hibernate.search.default.worker.jms.login = myname
hibernate.search.default.worker.jms.password = wonttellyou
# optional JNDI configuration through hibernate.search.default.worker.jndi.*
# (check your JMS provider for more information)

## Optional asynchronous execution strategy
# hibernate.search.default.worker.execution = async
# hibernate.search.default.worker.thread_pool.size = 2
# hibernate.search.default.worker.buffer_queue.max = 50</programlisting>
        </example>

        <tip>
          <para>A file system local copy is recommended for faster search
          results.</para>
        </tip>
      </section>

      <section>
        <title>Master node</title>

        <para>Every index update operation is taken from a JMS queue and
        executed. The master index is copied on a regular basis.</para>

        <example>
          <title>JMS Master configuration</title>

          <programlisting>### master configuration

## DirectoryProvider
# (remote) master location where information is copied to
hibernate.search.default.sourceBase = /mnt/mastervolume/lucenedirs/mastercopy

# local master location
hibernate.search.default.indexBase = /Users/prod/lucenedirs

# refresh every half hour
hibernate.search.default.refresh = 1800

# appropriate directory provider
hibernate.search.default.directory_provider = filesystem-master

## Backend configuration
#Backend is the default lucene one</programlisting>
        </example>

        <tip>
          <para>It is recommended that the refresh period be higher than the
          expected copy time. If a copy operation is still in progress when
          the next refresh triggers, that refresh is skipped, so it is safe to
          set this value low even when the copy time is not known.</para>
        </tip>
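        <para>The skip rule described in the tip can be modelled with an
        <classname>AtomicBoolean</classname> guard. This is a hedged sketch of
        the idea only: the <classname>SkippingCopyTask</classname> class is
        hypothetical and is not the <literal>filesystem-master</literal>
        directory provider's actual implementation.</para>

        <programlisting language="JAVA" role="JAVA"><![CDATA[import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical periodic copy task: a refresh that fires while the previous
// copy is still running is skipped rather than queued or run concurrently.
final class SkippingCopyTask implements Runnable {

    private final AtomicBoolean copyInProgress = new AtomicBoolean(false);
    private final AtomicInteger skippedRefreshes = new AtomicInteger();
    private final Runnable copyOperation;

    SkippingCopyTask(Runnable copyOperation) {
        this.copyOperation = copyOperation;
    }

    @Override
    public void run() {
        // compareAndSet lets exactly one caller start a copy; a trigger that
        // arrives while a copy is running just records the skip and returns.
        if (!copyInProgress.compareAndSet(false, true)) {
            skippedRefreshes.incrementAndGet();
            return;
        }
        try {
            copyOperation.run();
        } finally {
            copyInProgress.set(false);
        }
    }

    int skippedRefreshes() {
        return skippedRefreshes.get();
    }
}]]></programlisting>

        <para>A <classname>ScheduledExecutorService</classname> never runs
        overlapping executions of the same periodic task on its own, so a
        guard like this matters when the timer only triggers the copy and the
        copy itself runs on another thread.</para>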

        <para>In addition to the Hibernate Search framework configuration, a
        Message Driven Bean has to be written and set up to process the index
        work queue through JMS.</para>

        <example>
          <title>Message Driven Bean processing the indexing queue</title>

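          <programlisting language="JAVA" role="JAVA">import javax.ejb.ActivationConfigProperty;
import javax.ejb.MessageDriven;
import javax.jms.MessageListener;
import javax.persistence.EntityManager;
import javax.persistence.PersistenceContext;

import org.hibernate.Session;
import org.hibernate.search.backend.impl.jms.AbstractJMSHibernateSearchController;

@MessageDriven(activationConfig = {
      @ActivationConfigProperty(propertyName="destinationType",
                                propertyValue="javax.jms.Queue"),
      @ActivationConfigProperty(propertyName="destination",
                                propertyValue="queue/hibernatesearch"),
      @ActivationConfigProperty(propertyName="DLQMaxResent", propertyValue="1")
   } )
public class MDBSearchController extends AbstractJMSHibernateSearchController
                                 implements MessageListener {
    @PersistenceContext EntityManager em;

    //method retrieving the appropriate session
    protected Session getSession() {
        return (Session) em.getDelegate();
    }

    //potentially close the session opened in #getSession(), not needed here
    protected void cleanSessionIfNeeded(Session session) {
        // the container manages the EntityManager, so there is nothing to close
    }
}</programlisting>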
        </example>

        <para>This example inherits from the abstract JMS controller class
        available in the Hibernate Search source code and implements a Java EE
        MDB. This implementation is given as an example and can be adjusted to
        make use of non-Java EE Message Driven Beans. For more information
        about the <methodname>getSession()</methodname> and
        <methodname>cleanSessionIfNeeded()</methodname> methods, please check
        the <classname>AbstractJMSHibernateSearchController</classname>
        javadoc.</para>
      </section>
    </section>

    <section id="jgroups-backend">
      <title>JGroups Master/Slave back end</title>

      <para>This section describes how to configure the JGroups Master/Slave
      back end. The configuration examples illustrated in <xref
      linkend="jms-backend"/> also apply here; only a different backend
      (<constant>hibernate.search.&lt;indexName&gt;.worker.backend</constant>)
      needs to be set.</para>

      <para>All backends configured to use JGroups share the same Channel. The
      JGroups <classname>JChannel</classname> is the main communication link
      across all nodes participating in the same cluster group; since it is
      convenient and more efficient to have just one channel shared across all
      backends, the Channel configuration properties are not defined on a
      per-worker section but globally. See <xref
      linkend="jgroups-channel-configuration"/>.</para>

      <section>
        <title>Slave nodes</title>

        <para>Every index update operation is sent through a JGroups channel
        to the master node. Index querying operations are executed on a local
        index copy. Enabling the JGroups worker only makes sure the index
        operations are sent to the master; you still have to keep the index
        copies synchronized by configuring an appropriate directory (see the
        <literal>filesystem-master</literal>,
        <literal>filesystem-slave</literal> or <literal>infinispan</literal>
        options in <xref linkend="search-configuration-directory"/>).</para>

        <example>
          <title>JGroups Slave configuration</title>

          <programlisting>### slave configuration
hibernate.search.default.worker.backend = jgroupsSlave</programlisting>
        </example>
      </section>

      <section>
        <title>Master node</title>

        <para>Every index update operation is taken from a JGroups channel and
        executed. The master index is copied on a regular basis.</para>

        <example>
          <title>JGroups Master configuration</title>

          <programlisting>### master configuration
hibernate.search.default.worker.backend = jgroupsMaster</programlisting>
        </example>
      </section>

      <section id="jgroups-channel-configuration">
        <title>JGroups channel configuration</title>

        <para>Configuring the JGroups channel essentially entails specifying
        the transport in terms of a network protocol stack. To configure the
        JGroups transport, point the configuration property
        <constant>hibernate.search.services.jgroups.configurationFile</constant>
        to a JGroups configuration file; this can be either a file path or a
        Java resource name.</para>

        <tip>
          <para>If this property is not specified, the JGroups default
          configuration file <literal>flush-udp.xml</literal> is used. This
          example configuration is known to work in most scenarios, with the
          notable exception of Amazon AWS; refer to the <ulink
          url="http://www.jgroups.org/manual-3.x/html/">JGroups
          manual</ulink> for more examples and protocol configuration
          details.</para>
        </tip>

        <para>The default channel name is <literal>Hibernate Search
        Cluster</literal>; it can be changed as shown in <xref
        linkend="example-jgroups-channel-name"/>.</para>

        <example id="example-jgroups-channel-name">
          <title>JGroups channel name configuration</title>

          <programlisting>hibernate.search.services.jgroups.clusterName = My-Custom-Cluster-Id</programlisting>
        </example>

        <section>
          <title>JGroups channel instance injection</title>

          <para>For programmatic configurations, one additional option is
          available to configure the JGroups channel: pass an existing
          channel instance to Hibernate Search directly using the property
          <literal>hibernate.search.services.jgroups.providedChannel</literal>,
          as shown in the following example.</para>

          <programlisting language="JAVA" role="JAVA">import java.util.HashMap;
import java.util.Map;
import javax.persistence.EntityManagerFactory;
import javax.persistence.Persistence;
import org.hibernate.search.backend.impl.jgroups.JGroupsChannelProvider;
import org.jgroups.JChannel;

JChannel channel = ...
// the value is a JChannel instance, so the map cannot be typed Map&lt;String,String&gt;
Map&lt;String,Object&gt; properties = new HashMap&lt;String,Object&gt;(1);
properties.put( JGroupsChannelProvider.CHANNEL_INJECT, channel );
EntityManagerFactory emf = Persistence.createEntityManagerFactory( "userPU", properties );</programlisting>
        </section>
      </section>
    </section>
  </section>

  <section id="configuration-reader-strategy">
    <title>Reader strategy configuration</title>

    <para>The different reader strategies are described in <xref
    linkend="search-architecture-readerstrategy"/>. Out of the box strategies
    are:</para>

    <itemizedlist>
      <listitem>
        <para><literal>shared</literal>: share index readers across several
        queries. This strategy is the most efficient.</para>
      </listitem>

      <listitem>
        <para><literal>not-shared</literal>: create an index reader for each
        individual query.</para>
      </listitem>
    </itemizedlist>

    <para>The default reader strategy is <literal>shared</literal>. This can
    be adjusted:</para>

    <programlisting>hibernate.search.[default|&lt;indexname&gt;].reader.strategy = not-shared</programlisting>

    <para>Adding this property switches to the <literal>not-shared</literal>
    strategy.</para>

    <para>Or if you have a custom reader strategy:</para>

    <programlisting>hibernate.search.[default|&lt;indexname&gt;].reader.strategy = my.corp.myapp.CustomReaderProvider</programlisting>

    <para>where <classname>my.corp.myapp.CustomReaderProvider</classname> is
    the custom strategy implementation.</para>
  </section>

  <section id="lucene-indexing-performance" revision="3">
    <title>Tuning Lucene indexing performance</title>

    <para>Hibernate Search allows you to tune the Lucene indexing performance
    by specifying a set of parameters which are passed through to the
    underlying Lucene <literal>IndexWriter</literal>, such as
    <literal>mergeFactor</literal>, <literal>maxMergeDocs</literal> and
    <literal>maxBufferedDocs</literal>. You can specify these parameters
    either as default values applying for all indexes, on a per index basis,
    or even per shard.</para>

    <para>There are several low level <literal>IndexWriter</literal> settings
    which can be tuned for different use cases. These parameters are grouped
    by the <literal>indexwriter</literal> keyword: <programlisting>hibernate.search.[default|&lt;indexname&gt;].indexwriter.&lt;parameter_name&gt;</programlisting></para>

    <para>If no value is set for an <literal>indexwriter</literal> parameter
    in a specific shard configuration, Hibernate Search will look at the index
    section, then at the default section.<example
        id="example-performamce-option-configuration">
        <title>Example performance option configuration</title>

        <programlisting>hibernate.search.Animals.2.indexwriter.max_merge_docs = 10
hibernate.search.Animals.2.indexwriter.merge_factor = 20
hibernate.search.Animals.2.indexwriter.term_index_interval = default
hibernate.search.default.indexwriter.max_merge_docs = 100
hibernate.search.default.indexwriter.ram_buffer_size = 64</programlisting>
      </example> The configuration in <xref
    linkend="example-performamce-option-configuration"/> will result in these
    settings being applied to the second shard of the
    <literal>Animals</literal> index:</para>

    <itemizedlist>
      <listitem>
        <para><literal>max_merge_docs</literal> = 10</para>
      </listitem>

      <listitem>
        <para><literal>merge_factor</literal> = 20</para>
      </listitem>

      <listitem>
        <para><literal>ram_buffer_size</literal> = 64MB</para>
      </listitem>

      <listitem>
        <para><literal>term_index_interval</literal> = Lucene default</para>
      </listitem>
    </itemizedlist>

    <para>All other values will use the defaults defined in Lucene.</para>
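    <para>To make the lookup order concrete (shard section, then index
    section, then <literal>default</literal>), the hypothetical
    <classname>IndexWriterSettings</classname> helper below resolves a
    parameter from a <classname>Properties</classname> object. It is a sketch,
    not the Hibernate Search implementation, and it assumes that the literal
    value <literal>default</literal> means "use Lucene's own default",
    represented here as an empty <classname>Optional</classname>.</para>

    <programlisting language="JAVA" role="JAVA"><![CDATA[import java.util.Optional;
import java.util.Properties;

// Hypothetical resolver for indexwriter parameters; not Hibernate Search code.
final class IndexWriterSettings {

    private static final String PREFIX = "hibernate.search.";

    private IndexWriterSettings() {
    }

    // Looks the parameter up in the shard section, then the index section,
    // then the "default" section. An empty result means "use Lucene's default".
    static Optional<String> resolve(Properties cfg, String indexName, int shard, String parameter) {
        String suffix = ".indexwriter." + parameter;
        String[] candidateKeys = {
            PREFIX + indexName + "." + shard + suffix,
            PREFIX + indexName + suffix,
            PREFIX + "default" + suffix
        };
        for (String key : candidateKeys) {
            String value = cfg.getProperty(key);
            if (value == null) {
                continue; // not set at this level, try the next one
            }
            value = value.trim();
            // Assumption: the literal "default" selects Lucene's own default.
            if ("default".equals(value)) {
                return Optional.empty();
            }
            return Optional.of(value);
        }
        return Optional.empty();
    }
}]]></programlisting>

    <para>Loaded with the five properties from the example configuration, it
    resolves <literal>max_merge_docs</literal> to 10,
    <literal>merge_factor</literal> to 20 and
    <literal>ram_buffer_size</literal> to 64 for shard 2 of
    <literal>Animals</literal>, returns an empty result for
    <literal>term_index_interval</literal>, and falls back to the default
    section's value of 100 for <literal>max_merge_docs</literal> on any other
    shard.</para>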

    <para>The default for all values is to leave them at Lucene's own default.
    For this reason, the default values listed in <xref
    linkend="table-performance-parameters"/> depend on the version of Lucene
    you are using; the values shown are relative to version
    <literal>2.4</literal>. For more information about Lucene indexing
    performance, please refer to the Lucene documentation.</para>

    <note>
      <para>Previous versions of Hibernate Search had the notion of
      <literal>batch</literal> and <literal>transaction</literal> properties.
      This is no longer the case, as the backend always performs work using
      the same settings.</para>
    </note>

    <table id="table-performance-parameters">
      <title>List of indexing performance and behavior properties</title>

      <tgroup cols="3">
        <thead>
          <row>
            <entry align="center">Property</entry>

            <entry align="center">Description</entry>

            <entry align="center">Default Value</entry>
          </row>
        </thead>

        <tbody>
          <row>
            <entry>
              <property>hibernate.search.​[default|&lt;indexname&gt;].​exclusive_index_use</property>
            </entry>

            <entry>
              <para>Set to <literal>true</literal> when no other process will
              need to write to the same index. This will enable Hibernate
              Search to work in exclusive mode on the index and improve
              performance when writing changes to the index.</para>
            </entry>

            <entry><literal>true</literal> (improved performance, releases
            locks only at shutdown)</entry>
          </row>

          <row>
            <entry>
              <property>hibernate.search.​[default|&lt;indexname&gt;].​max_queue_length</property>
            </entry>

            <entry>
              <para>Each index has a separate "pipeline" which contains the
              updates to be applied to the index. When this queue is full,
              adding more operations to the queue becomes a blocking
              operation. Configuring this setting doesn't make much sense
              unless <literal>worker.execution</literal> is configured as
              <literal>async</literal>.</para>
            </entry>

            <entry>
              <literal>1000</literal>
            </entry>
          </row>

          <row>
            <entry>
              <property>hibernate.search.​[default|&lt;indexname&gt;].​indexwriter.max_buffered_delete_terms</property>
            </entry>

            <entry>
              <para>Determines the minimal number of delete terms required
              before the buffered in-memory delete terms are applied and
              flushed. If there are documents buffered in memory at the time,
              they are merged and a new segment is created.</para>
            </entry>

            <entry>Disabled (flushes by RAM usage)</entry>
          </row>

          <row>
            <entry>
              <property>hibernate.search.​[default|&lt;indexname&gt;].​indexwriter.max_buffered_docs</property>
            </entry>

            <entry>
              <para>Controls the number of documents buffered in memory during
              indexing. The bigger the value, the more RAM is consumed.</para>
            </entry>

            <entry>Disabled (flushes by RAM usage)</entry>
          </row>

          <row>
            <entry>
              <property>hibernate.search.​[default|&lt;indexname&gt;].​indexwriter.max_merge_docs</property>
            </entry>

            <entry>
              <para>Defines the largest number of documents allowed in a
              segment. Smaller values perform better on frequently changing
              indexes, larger values provide better search performance if the
              index does not change often.</para>
            </entry>

            <entry>Unlimited (Integer.MAX_VALUE)</entry>
          </row>

          <row>
            <entry>
              <property>hibernate.search.​[default|&lt;indexname&gt;].​indexwriter.merge_factor</property>
            </entry>

            <entry>
              <para>Controls segment merge frequency and size.</para>

              <para>Determines how often segment indexes are merged when
              insertion occurs. With smaller values, less RAM is used while
              indexing, and searches on unoptimized indexes are faster, but
              indexing speed is slower. With larger values, more RAM is used
              during indexing, and while searches on unoptimized indexes are
              slower, indexing is faster. Thus larger values (&gt; 10) are
              best for batch index creation, and smaller values (&lt; 10) for
              indexes that are interactively maintained. The value must not be
              lower than 2.</para>
            </entry>

            <entry>10</entry>
          </row>

          <row>
            <entry>
              <property>hibernate.search.​[default|&lt;indexname&gt;].​indexwriter.merge_min_size</property>
            </entry>

            <entry>
              <para>Controls segment merge frequency and size.</para>

              <para>Segments smaller than this size (in MB) are always
              considered for the next segment merge operation.</para>

              <para>Setting this too large might result in expensive merge
              operations, even though they are less frequent.</para>

              <para>See also
              <classname>org.apache.lucene.index.LogDocMergePolicy</classname>.
              <literal>minMergeSize</literal>.</para>
            </entry>

            <entry>0 MB (actually ~1K)</entry>
          </row>

          <row>
            <entry>
              <property>hibernate.search.​[default|&lt;indexname&gt;].​indexwriter.merge_max_size</property>
            </entry>

            <entry>
              <para>Controls segment merge frequency and size.</para>

              <para>Segments larger than this size (in MB) are never merged
              into bigger segments.</para>

              <para>This helps reduce memory requirements and avoids some
              merging operations at the cost of optimal search speed. When
              optimizing an index this value is ignored.</para>

              <para>See also
              <classname>org.apache.lucene.index.LogDocMergePolicy</classname>.
              <literal>maxMergeSize</literal>.</para>
            </entry>

            <entry>Unlimited</entry>
          </row>

          <row>
            <entry>
              <property>hibernate.search.​[default|&lt;indexname&gt;].​indexwriter.merge_max_optimize_size</property>
            </entry>

            <entry>
              <para>Controls segment merge frequency and size.</para>

              <para>Segments larger than this size (in MB) are not merged into
              bigger segments even when optimizing the index (see the
              <literal>merge_max_size</literal> setting as well).</para>

              <para>Applied to
              <classname>org.apache.lucene.index.LogDocMergePolicy</classname>.
              <literal>maxMergeSizeForOptimize</literal>.</para>
            </entry>

            <entry>Unlimited</entry>
          </row>

          <row>
            <entry>
              <property>hibernate.search.​[default|&lt;indexname&gt;].​indexwriter.merge_calibrate_by_deletes</property>
            </entry>

            <entry>
              <para>Controls segment merge frequency and size.</para>

              <para>Set to <literal>false</literal> to not take deleted
              documents into account when the merge policy estimates segment
              sizes.</para>

              <para>Applied to
              <classname>org.apache.lucene.index.LogMergePolicy</classname>.
              <literal>calibrateSizeByDeletes</literal>.</para>
            </entry>

            <entry>
              <literal>true</literal>
            </entry>
          </row>

          <row>
            <entry>
              <property>hibernate.search.​[default|&lt;indexname&gt;].​indexwriter.ram_buffer_size</property>
            </entry>

            <entry>
              <para>Controls the amount of RAM in MB dedicated to document
              buffers. When used together with
              <literal>max_buffered_docs</literal>, a flush occurs for
              whichever event happens first.</para>

              <para>Generally for faster indexing performance it's best to
              flush by RAM usage instead of document count and use as large a
              RAM buffer as you can.</para>
            </entry>

            <entry>16 MB</entry>
          </row>

          <row>
            <entry>
              <property>hibernate.search.​[default|&lt;indexname&gt;].​indexwriter.term_index_interval</property>
            </entry>

            <entry>
              <para>Expert: Set the interval between indexed terms.</para>

              <para>Large values cause less memory to be used by IndexReader,
              but slow random-access to terms. Small values cause more memory
              to be used by an IndexReader, and speed random-access to terms.
              See Lucene documentation for more details.</para>
            </entry>

            <entry>128</entry>
          </row>

          <row>
            <entry>
              <property>hibernate.search.​[default|&lt;indexname&gt;].​indexwriter.use_compound_file</property>
            </entry>

            <entry><para>The advantage of using the compound file format is
            that fewer file descriptors are used. The disadvantage is that
            indexing takes more time and temporary disk space. You can set
            this parameter to <literal>false</literal> in an attempt to
            improve the indexing time, but you could run out of file
            descriptors if <literal>merge_factor</literal> is also
            large.</para><para>Boolean parameter, use
            "<literal>true</literal>" or "<literal>false</literal>". The
            default value for this option is
            <literal>true</literal>.</para></entry>

            <entry>true</entry>
          </row>

          <row>
            <entry>
              <property>hibernate.search.​enable_dirty_check</property>
            </entry>

            <entry>
              <para>Not all entity changes require an update of the Lucene
              index. If none of the updated entity properties (dirty
              properties) are indexed, Hibernate Search will skip the
              re-indexing work.</para>

              <para>Disable this option if you use custom
              <literal>FieldBridge</literal>s which need to be invoked at each
              update event (even though the property for which the field
              bridge is configured has not changed).</para>

              <para>This optimization will not be applied on classes using a
              <literal>@ClassBridge</literal> or a
              <literal>@DynamicBoost</literal>.</para>

              <para>Boolean parameter, use "<literal>true</literal>" or
              "<literal>false</literal>". The default value for this option is
              <literal>true</literal>.</para>
            </entry>

            <entry>true</entry>
          </row>
        </tbody>
      </tgroup>
    </table>
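    <para>The <literal>ram_buffer_size</literal> entry states that, when it is
    combined with <literal>max_buffered_docs</literal>, a flush occurs for
    whichever event happens first. That rule can be written as a small
    predicate. The <classname>FlushPolicy</classname> class below is a
    hypothetical, simplified model and not Lucene's actual flush logic; it
    treats a non-positive limit as "disabled".</para>

    <programlisting language="JAVA" role="JAVA"><![CDATA[// Hypothetical, simplified model of the flush trigger; not Lucene's code.
// A non-positive limit means "disabled" for that trigger.
final class FlushPolicy {

    private final int maxBufferedDocs;
    private final double ramBufferSizeMb;

    FlushPolicy(int maxBufferedDocs, double ramBufferSizeMb) {
        // With both triggers disabled, buffered documents would never be flushed.
        if (maxBufferedDocs <= 0 && ramBufferSizeMb <= 0) {
            throw new IllegalArgumentException(
                    "at least one of max_buffered_docs and ram_buffer_size must be enabled");
        }
        this.maxBufferedDocs = maxBufferedDocs;
        this.ramBufferSizeMb = ramBufferSizeMb;
    }

    // Flush as soon as either enabled limit is reached, whichever comes first.
    boolean shouldFlush(int bufferedDocs, double usedRamMb) {
        boolean docLimitReached = maxBufferedDocs > 0 && bufferedDocs >= maxBufferedDocs;
        boolean ramLimitReached = ramBufferSizeMb > 0 && usedRamMb >= ramBufferSizeMb;
        return docLimitReached || ramLimitReached;
    }
}]]></programlisting>

    <para>With the defaults from the table (<literal>max_buffered_docs</literal>
    disabled, <literal>ram_buffer_size</literal> at 16 MB), the model is
    <literal>new FlushPolicy(-1, 16)</literal>: the document count never
    triggers a flush, and only RAM usage does.</para>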

    <tip>
      <para>When your architecture permits it, always keep
      <literal>hibernate.search.default.exclusive_index_use=true</literal> as
      it greatly improves efficiency in index writing. This is the default
      since Hibernate Search version 4.</para>
    </tip>

    <tip>
      <para>To tune the indexing speed it might be useful to time the object
      loading from the database in isolation from the writes to the index. To
      achieve this, set <literal>blackhole</literal> as the worker backend and
      start your indexing routines. This backend does not disable Hibernate
      Search: it will still generate the needed changesets to the index, but
      will discard them instead of flushing them to the index. In contrast to
      setting <literal>hibernate.search.indexing_strategy</literal> to
      <literal>manual</literal>, using <literal>blackhole</literal> will
      possibly load more data from the database, because associated entities
      are re-indexed as well.</para>

      <programlisting>hibernate.search.[default|&lt;indexname&gt;].worker.backend = blackhole</programlisting>

      <para>The recommended approach is to focus first on optimizing the
      object loading, and then use the timings you achieve as a baseline to
      tune the indexing process.</para>
    </tip>

    <warning>
      <para>The <literal>blackhole</literal> backend is not meant to be used
      in production, only as a tool to identify indexing bottlenecks.</para>
    </warning>
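
    <para>If you bootstrap Hibernate Search from a
    <classname>java.util.Properties</classname> object, you can derive the
    profiling configuration from your regular one. The following is a minimal
    sketch under that assumption, and the class and method names are
    hypothetical. It copies the explicitly set entries and then sets the
    <literal>worker.backend</literal> key for the chosen scope
    (<literal>default</literal> or an index name). The original configuration
    is left untouched, so the blackhole setting cannot leak into
    production.</para>

    <programlisting language="JAVA" role="JAVA">import java.util.Properties;

/**
 * Hypothetical helper: derives a profiling configuration that discards
 * index changes through the blackhole backend. Not for production use.
 */
public final class BlackholeProfiling {

    private BlackholeProfiling() {
    }

    /**
     * @param base  the regular configuration, which is not modified
     * @param scope "default" or the name of a single index
     */
    public static Properties withBlackholeBackend(Properties base, String scope) {
        Properties copy = new Properties();
        // copies explicitly set entries only (not nested Properties defaults)
        copy.putAll(base);
        copy.setProperty("hibernate.search." + scope + ".worker.backend", "blackhole");
        return copy;
    }
}</programlisting>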

    <section id="lucene-segment-size" revision="1">
      <title>Control segment size</title>

      <para>The options <literal>merge_max_size</literal>,
      <literal>merge_max_optimize_size</literal> and
      <literal>merge_calibrate_by_deletes</literal> give you control over the
      maximum size of the segments being created, but you need to understand
      how they affect file sizes. If you need a hard limit on the size,
      remember that a merge combines a segment with another existing segment
      to form a larger one. You might therefore want to set
      <literal>merge_max_size</literal> and
      <literal>merge_max_optimize_size</literal> to less than half of your
      hard limit. A newly created segment, before it has ever been merged,
      might also be larger than expected. A segment is never created much
      larger than <literal>ram_buffer_size</literal>, but this threshold is
      checked as an estimate.</para>

      <para>Example:</para>

      <programlisting># To be fairly confident no files grow above 15MB, use (sizes in MB):
hibernate.search.default.indexwriter.ram_buffer_size = 10
hibernate.search.default.indexwriter.merge_max_optimize_size = 7
hibernate.search.default.indexwriter.merge_max_size = 7</programlisting>
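
      <para>The rule of thumb behind this example can be sketched in Java. The
      helper below is hypothetical and not part of Hibernate Search. It returns
      the largest whole number of megabytes strictly below half of the hard
      limit, so <literal>mergeMaxSizeFor(15)</literal> gives 7, matching the
      values above. It also builds the corresponding
      <literal>indexwriter</literal> properties for the default index
      scope.</para>

      <programlisting language="JAVA" role="JAVA">import java.util.Properties;

/**
 * Hypothetical helper applying the sizing rule of thumb: a merge combines
 * segments, so merge limits stay below half of the hard limit.
 * All sizes are in MB, like the indexwriter options.
 */
public final class SegmentSizing {

    private SegmentSizing() {
    }

    /** Largest whole number of MB strictly below half of the hard limit. */
    public static int mergeMaxSizeFor(int hardLimitMb) {
        int mergeMax = (hardLimitMb - 1) / 2;
        if (mergeMax >= 1) {
            return mergeMax;
        }
        throw new IllegalArgumentException("Hard limit too small: " + hardLimitMb + "MB");
    }

    /** Builds the indexwriter settings for the default index scope. */
    public static Properties indexWriterSettings(int hardLimitMb, int ramBufferSizeMb) {
        String mergeMax = String.valueOf(mergeMaxSizeFor(hardLimitMb));
        Properties props = new Properties();
        props.setProperty("hibernate.search.default.indexwriter.ram_buffer_size",
                String.valueOf(ramBufferSizeMb));
        props.setProperty("hibernate.search.default.indexwriter.merge_max_optimize_size",
                mergeMax);
        props.setProperty("hibernate.search.default.indexwriter.merge_max_size", mergeMax);
        return props;
    }
}</programlisting>

      <para>The helper passes <literal>ram_buffer_size</literal> through
      unchanged. Choosing it still takes judgement, because a freshly created
      segment can slightly exceed that estimate.</para>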
    </section>

    <tip>
      <para>When using the Infinispan Directory to cluster indexes make sure
      that your segments are smaller than the <literal>chunk_size</literal> so
      that you avoid fragmenting segments in the grid. Note that the
      <literal>chunk_size</literal> of the Infinispan Directory is expressed
      in bytes, while the index tuning options are in MB.</para>
    </tip>
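
    <para>Because the two settings use different units, a small conversion
    helps avoid mistakes when comparing them. The sketch below is hypothetical
    and assumes 1 MB = 1024 * 1024 bytes, which is how Lucene interprets its
    megabyte-based writer settings. It checks whether a segment of a given
    size in MB is smaller than a <literal>chunk_size</literal> given in
    bytes.</para>

    <programlisting language="JAVA" role="JAVA">/**
 * Hypothetical helper comparing segment sizes (MB) with the Infinispan
 * Directory chunk_size (bytes). Assumes 1 MB = 1024 * 1024 bytes.
 */
public final class ChunkSizeCheck {

    private static final long BYTES_PER_MB = 1024L * 1024L;

    private ChunkSizeCheck() {
    }

    public static long megabytesToBytes(int megabytes) {
        return megabytes * BYTES_PER_MB;
    }

    /** True when a segment of the given size is smaller than one chunk. */
    public static boolean fitsInOneChunk(int segmentSizeMb, long chunkSizeBytes) {
        return chunkSizeBytes > megabytesToBytes(segmentSizeMb);
    }
}</programlisting>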
  </section>

  <section id="search-configuration-directory-lockfactories" revision="1">
    <title>LockFactory configuration</title>

    <para>Lucene <classname>Directory</classname>s have default locking
    strategies that generally work well enough for most cases, but you can
    specify, for each index managed by Hibernate Search, a specific
    <classname>LockFactory</classname> to use. This is generally not needed,
    but can be useful.</para>

    <para>Some of these locking strategies require a filesystem-level lock
    and may be used even on RAM-based indexes. This combination is valid, but
    in this case you must specify the <literal>indexBase</literal>
    configuration option, which is usually needed only for filesystem-based
    <classname>Directory</classname> instances. It must point to a
    filesystem location where the lock marker files can be stored.</para>
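
    <para>As an illustration, the following minimal sketch builds the
    properties for a RAM-based index guarded by the <literal>simple</literal>
    (marker file) strategy. It assumes configuration through
    <classname>java.util.Properties</classname>. The class name is
    hypothetical, and the lock directory is a caller-supplied
    placeholder.</para>

    <programlisting language="JAVA" role="JAVA">import java.util.Properties;

/**
 * Hypothetical helper: RAM-based index with a filesystem-level lock.
 * The simple strategy writes a marker file, so indexBase must point to a
 * directory where that file can be created.
 */
public final class RamIndexWithFileLock {

    private RamIndexWithFileLock() {
    }

    public static Properties configure(String lockDirectory) {
        Properties props = new Properties();
        props.setProperty("hibernate.search.default.directory_provider", "ram");
        props.setProperty("hibernate.search.default.locking_strategy", "simple");
        // only used here to locate the lock marker files
        props.setProperty("hibernate.search.default.indexBase", lockDirectory);
        return props;
    }
}</programlisting>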

    <para>To select a locking factory, set the
    <literal>hibernate.search.&lt;index&gt;.locking_strategy</literal> option
    to one of <literal>simple</literal>, <literal>native</literal>,
    <literal>single</literal> or <literal>none</literal>. Alternatively set it
    to the fully qualified name of an implementation of
    <literal>org.hibernate.search.store.LockFactoryProvider</literal>.</para>

    <table id="search-configuration-directory-lockfactories-table">
      <title>List of available LockFactory implementations</title>

      <tgroup cols="3">
        <thead>
          <row>
            <entry align="center">Name</entry>

            <entry align="center">Class</entry>

            <entry align="center">Description</entry>
          </row>
        </thead>

        <tbody>
          <row>
            <entry><property>simple</property></entry>

            <entry>org.apache.lucene.store.​SimpleFSLockFactory</entry>

            <entry><para>Safe implementation based on Java's File API; it
            marks the usage of the index by creating a marker file.</para>
            <para>If for some reason you had to kill your application, you
            will need to remove this file before restarting it.</para></entry>
          </row>

          <row>
            <entry><property>native</property></entry>

            <entry>org.apache.lucene.store.​NativeFSLockFactory</entry>

            <entry><para>Like <literal>simple</literal>, this also marks the
            usage of the index by creating a marker file, but it uses native
            OS file locks, so the locks are cleaned up even if the JVM is
            terminated.</para> <para>This implementation has known problems
            on NFS; avoid it on network shares.</para>
            <para><literal>native</literal> is the default implementation for
            the <literal>filesystem</literal>,
            <literal>filesystem-master</literal> and
            <literal>filesystem-slave</literal> directory
            providers.</para></entry>
          </row>

          <row>
            <entry><property>single</property></entry>

            <entry>org.apache.lucene.store.​SingleInstanceLockFactory</entry>

            <entry><para>This LockFactory doesn't use a marker file. It is a
            Java object lock held in memory, so use it only when you are sure
            the index is not going to be shared by any other
            process.</para> <para>This is the default implementation for
            the <literal>ram</literal> directory provider.</para></entry>
          </row>

          <row>
            <entry><property>none</property></entry>

            <entry>org.apache.lucene.store.​NoLockFactory</entry>

            <entry><para>Changes to this index are not coordinated by any
            lock; test your application carefully and make sure you understand
            the consequences.</para></entry>
          </row>
        </tbody>
      </tgroup>
    </table>

    <para>Configuration example:</para>

    <programlisting>hibernate.search.default.locking_strategy = simple
hibernate.search.Animals.locking_strategy = native
hibernate.search.Books.locking_strategy = org.custom.components.MyLockingFactory</programlisting>
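
    <para>In the configuration example above, an index-specific setting
    overrides the <literal>default</literal> one. The following minimal
    sketch shows that lookup order. The helper is hypothetical and is not
    Hibernate Search's own resolution code. With the example values,
    <literal>Animals</literal> resolves to <literal>native</literal>,
    <literal>Books</literal> to the custom class, and any other index to
    <literal>simple</literal>.</para>

    <programlisting language="JAVA" role="JAVA">import java.util.Properties;

/**
 * Hypothetical helper mirroring the naming scheme of the locking_strategy
 * option: an index-specific key wins over the default one.
 */
public final class LockingStrategyLookup {

    private LockingStrategyLookup() {
    }

    public static String lockingStrategyFor(Properties props, String indexName) {
        String specific = props.getProperty("hibernate.search." + indexName + ".locking_strategy");
        if (specific != null) {
            return specific;
        }
        // may be null: the directory provider's own default then applies
        return props.getProperty("hibernate.search.default.locking_strategy");
    }
}</programlisting>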

    <para>The Infinispan Directory uses a custom LockFactory implementation.
    You can still override it, but make sure you understand how that will
    work, especially with clustered indexes.</para>
  </section>

  <section>
    <title>Exception Handling Configuration</title>

    <para>Hibernate Search allows you to configure how exceptions are handled
    during the indexing process. If no configuration is provided, exceptions
    are logged by default. You can also declare this logging mechanism
    explicitly:</para>

    <programlisting>hibernate.search.error_handler = log</programlisting>

    <para>The default exception handling occurs for both synchronous and
    asynchronous indexing. Hibernate Search provides an easy mechanism to
    override the default error handling implementation.</para>

    <para>To provide your own implementation, implement the
    <classname>ErrorHandler</classname> interface, which declares the
    <code>handle(ErrorContext context)</code> method.
    <code>ErrorContext</code> provides a reference to the primary
    <code>LuceneWork</code> instance, the underlying exception and any
    subsequent <code>LuceneWork</code> instances that could not be processed
    because of the primary exception.</para>

    <programlisting language="JAVA" role="JAVA">public interface ErrorContext {
   List&lt;LuceneWork&gt; getFailingOperations();
   LuceneWork getOperationAtFault();
   Throwable getThrowable();
   boolean hasErrors();
}</programlisting>
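
    <para>The following sketch shows what a custom handler might do with the
    information in an <code>ErrorContext</code>: build a report instead of
    only logging. To stay runnable without Hibernate Search on the classpath,
    it declares a hypothetical, simplified stand-in for
    <code>ErrorContext</code>. In that stand-in, operations are plain
    <classname>String</classname>s in an array instead of a list of
    <code>LuceneWork</code>. A real handler would implement
    <classname>ErrorHandler</classname> and receive the actual
    <code>ErrorContext</code>. The methods are synchronized as a precaution,
    because the handler may be called from background threads when indexing
    is asynchronous.</para>

    <programlisting language="JAVA" role="JAVA">/**
 * Hypothetical, simplified stand-in for the ErrorContext interface above,
 * so that this sketch compiles without Hibernate Search.
 */
interface SimplifiedErrorContext {
    String[] getFailingOperations();
    String getOperationAtFault();
    Throwable getThrowable();
    boolean hasErrors();
}

/** Sketch of a custom handler that accumulates a failure report. */
class ReportingErrorHandler {

    private final StringBuilder report = new StringBuilder();

    // Plays the role of ErrorHandler.handle(ErrorContext)
    public synchronized void handle(SimplifiedErrorContext context) {
        if (!context.hasErrors()) {
            return;
        }
        Throwable cause = context.getThrowable();
        String reason = cause == null ? "unknown cause" : cause.getMessage();
        report.append("Failed: ").append(context.getOperationAtFault())
              .append(" (").append(reason).append(")\n");
        for (String skipped : context.getFailingOperations()) {
            report.append("Not processed: ").append(skipped).append('\n');
        }
    }

    public synchronized String getReport() {
        return report.toString();
    }
}</programlisting>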

    <para>To register this error handler with Hibernate Search you must
    declare the fully qualified classname of your
    <classname>ErrorHandler</classname> implementation in the configuration
    properties:</para>

    <programlisting>hibernate.search.error_handler = org.custom.components.CustomErrorHandler</programlisting>
  </section>

  <section>
    <title>Index format compatibility</title>

    <para>While Hibernate Search strives to offer a backward-compatible API
    to make it easy to port your application to newer versions, it delegates
    to Apache Lucene to handle the index writing and searching. The Lucene
    developers also attempt to keep a stable index format, but sometimes an
    update in the index format cannot be avoided. In those rare cases you
    either have to reindex all your data or use an index upgrade tool.
    Sometimes Lucene is able to read the old format, so you don't need to
    take any specific action (besides making a backup of your index).</para>

    <para>While an index format incompatibility is an exceptional event, it
    is more common for Analyzer implementations to change behaviour slightly
    when Lucene is upgraded. This can lead to a poor recall score, possibly
    missing many hits from the results.</para>

    <para>Hibernate Search exposes a configuration property,
    <literal>hibernate.search.lucene_version</literal>, which instructs the
    Analyzers and other Lucene classes to conform to their behaviour as
    defined in a specific (older) version of Lucene. See also
    <classname>org.apache.lucene.util.Version</classname>, contained in
    lucene-core.jar; the available options depend on the specific version
    of Lucene you're using. When this option is not specified, Hibernate
    Search instructs Lucene to use the default of its current version, which
    is usually the best option for new projects. Still, it's recommended to
    define the version you're using explicitly in the configuration, so that
    the Analyzers don't change behaviour when you upgrade Lucene. You can
    then choose to update this value later, for example when you have the
    chance to rebuild the index from scratch.</para>

    <example>
      <title>Force Analyzers to be compatible with a Lucene 3.0 created
      index</title>

      <programlisting>hibernate.search.lucene_version = LUCENE_30</programlisting>
    </example>

    <para>This option is global for the configured
    <classname>SearchFactory</classname> and affects all Lucene APIs that
    accept such a parameter, because it should be applied consistently. If
    you also use Lucene directly, bypassing Hibernate Search, make sure to
    apply the same value there too.</para>
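
    <para>One way to keep both sides consistent is to read the value from the
    same configuration used to bootstrap Hibernate Search, and to fail fast
    when it is missing so that it is always pinned explicitly. The helper
    below is a hypothetical sketch, and it assumes the configuration is held
    in a <classname>java.util.Properties</classname> object.</para>

    <programlisting language="JAVA" role="JAVA">import java.util.Properties;

/**
 * Hypothetical helper: single source of truth for the pinned Lucene
 * version, shared by Hibernate Search and any direct Lucene usage.
 */
public final class LuceneVersionSetting {

    public static final String KEY = "hibernate.search.lucene_version";

    private LuceneVersionSetting() {
    }

    /** Returns the pinned version name, e.g. "LUCENE_30". */
    public static String requirePinnedVersion(Properties props) {
        String value = props.getProperty(KEY);
        if (value == null || value.trim().isEmpty()) {
            throw new IllegalStateException(KEY + " is not set; pin it explicitly");
        }
        return value.trim();
    }
}</programlisting>

    <para>In the Lucene 3.x and 4.x lines,
    <classname>org.apache.lucene.util.Version</classname> is an enum, so code
    calling Lucene directly could turn the returned name into a constant with
    <code>Version.valueOf(...)</code>.</para>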
  </section>

  <section id="search-configuration-deploy-on-AS7">
    <title>How to package Hibernate Search applications for JBoss AS 7.2</title>

    <para>Provided you're deploying on JBoss AS 7.2.x or JBoss EAP 6.1, there is an additional way to add the Hibernate Search dependencies to your application: installing them as server modules.</para>

    <para>In JBoss AS 7, class loading is based on modules that have to define explicit dependencies on other modules.
    Modules allow sharing the same artifacts across multiple applications, giving you smaller and quicker deployments.</para>

    <para>More details about modules are described in <ulink url="https://docs.jboss.org/author/display/AS72/Class+Loading+in+AS7">Class Loading in AS7</ulink>.</para>

    <para>You can download the pre-packaged Hibernate Search modules from:</para>
    <itemizedlist>
      <listitem>
        <para><ulink url="https://downloads.sourceforge.net/project/hibernate/hibernate-search/&version;/hibernate-search-modules-&version;-jbossas-72-dist.zip">Sourceforge</ulink></para>
      </listitem>
      <listitem>
        <para>Maven: <ulink url="https://repository.jboss.org/nexus/index.html#nexus-search;gav~org.hibernate~hibernate-search-modules~~~">org.hibernate:hibernate-search-modules-&version;-jbossas-72-dist:zip</ulink></para>
      </listitem>
    </itemizedlist>

    <para>Unpack the modules in your JBoss AS <literal>modules</literal> directory: this will create modules for Hibernate Search, Apache Lucene and some useful Solr libraries. The Hibernate Search modules are:</para>

    <itemizedlist>
      <listitem>
        <para><emphasis>org.hibernate.search.orm:main</emphasis>, for users of Hibernate Search with Hibernate; this will transitively include Hibernate ORM.</para>
      </listitem>
      <listitem>
        <para><emphasis>org.hibernate.search.engine:main</emphasis>, for projects depending on the internal indexing engine that don't require other dependencies on Hibernate.</para>
      </listitem>
    </itemizedlist>

    <para>There are two ways to include the dependencies in your project:</para>

    <variablelist>
      <varlistentry>
        <term>Using the manifest</term>
        <listitem>
          <para>Add this entry to the MANIFEST.MF in your archive:</para>
          <programlisting>Dependencies: org.hibernate.search.orm services</programlisting>
        </listitem>
      </varlistentry>
      <varlistentry>
        <term>Using jboss-deployment-structure.xml</term>
        <listitem>
          <para>This is a proprietary JBoss AS descriptor. Add a WEB-INF/jboss-deployment-structure.xml file to your archive with the following content:</para>
          <programlisting language="XML" role="XML">&lt;jboss-deployment-structure&gt;
    &lt;deployment&gt;
        &lt;dependencies&gt;
            &lt;module name="org.hibernate.search.orm" services="export" /&gt;
        &lt;/dependencies&gt;
    &lt;/deployment&gt;
&lt;/jboss-deployment-structure&gt;</programlisting>
          <para>More information about the descriptor can be found in the <ulink url="https://docs.jboss.org/author/display/AS72/Class+Loading+in+AS7">JBoss AS 7 documentation</ulink>.</para>
        </listitem>
      </varlistentry>
    </variablelist>
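
    <para>If your archive is assembled by your own tooling rather than a build
    plugin, you can generate the manifest entry above with the JDK's
    <classname>java.util.jar.Manifest</classname> class. This is a minimal
    sketch: the class name is hypothetical, and only the attribute text comes
    from this section. Note that <code>Manifest.write</code> only emits main
    attributes when a version header is present, which is why
    <literal>Manifest-Version</literal> is set explicitly.</para>

    <programlisting language="JAVA" role="JAVA">import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.jar.Attributes;
import java.util.jar.Manifest;

/**
 * Hypothetical helper producing MANIFEST.MF text that declares the
 * Hibernate Search module dependency, e.g. for a scripted packaging step.
 */
public final class SearchModuleManifest {

    private SearchModuleManifest() {
    }

    public static String manifestText() {
        Manifest manifest = new Manifest();
        Attributes main = manifest.getMainAttributes();
        // without a version header, Manifest.write() skips all main attributes
        main.put(Attributes.Name.MANIFEST_VERSION, "1.0");
        main.putValue("Dependencies", "org.hibernate.search.orm services");
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try {
            manifest.write(out);
        } catch (IOException e) {
            // not expected with an in-memory stream, but write() declares it
            throw new UncheckedIOException(e);
        }
        return new String(out.toByteArray(), StandardCharsets.UTF_8);
    }
}</programlisting>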
  </section>
</chapter>