<?xml version="1.0" encoding="UTF-8"?>
<!--
~ Hibernate, Relational Persistence for Idiomatic Java
~
~ Copyright (c) 2010, Red Hat, Inc. and/or its affiliates or third-party contributors as
~ indicated by the @author tags or express copyright attribution
~ statements applied by the authors. All third-party contributions are
~ distributed under license by Red Hat, Inc.
~
~ This copyrighted material is made available to anyone wishing to use, modify,
~ copy, or redistribute it subject to the terms and conditions of the GNU
~ Lesser General Public License, as published by the Free Software Foundation.
~
~ This program is distributed in the hope that it will be useful,
~ but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
~ or FITNESS FOR A PARTICULAR PURPOSE. See the GNU Lesser General Public License
~ for more details.
~
~ You should have received a copy of the GNU Lesser General Public License
~ along with this distribution; if not, write to:
~ Free Software Foundation, Inc.
~ 51 Franklin Street, Fifth Floor
~ Boston, MA 02110-1301 USA
-->
<!DOCTYPE book PUBLIC "-//OASIS//DTD DocBook XML V4.5//EN"
"http://www.oasis-open.org/docbook/xml/4.5/docbookx.dtd" [
<!ENTITY % BOOK_ENTITIES SYSTEM "../hsearch.ent">
%BOOK_ENTITIES;
]>
<chapter id="manual-index-changes">
<title>Manual index changes</title>
<para>As Hibernate Core applies changes to the database, Hibernate Search
detects these changes and updates the index automatically (unless the
EventListeners are disabled). Sometimes changes are made to the database
without using Hibernate, for example when a backup is restored or your data
is otherwise affected; for these cases Hibernate Search exposes the Manual
Index APIs to explicitly update or remove a single entity from the index,
rebuild the index for the whole database, or remove all references to a
specific type.</para>
<para>All these methods affect the Lucene index only; no changes are applied
to the database.</para>
<section>
<title>Adding instances to the index</title>
<para>Using <classname>FullTextSession</classname>.<methodname>index(T
entity)</methodname> you can directly add a specific object instance to the
index. If this entity was already indexed, its index entry is updated
instead. Changes to the index are only applied at transaction
commit.</para>
<example>
<title>Indexing an entity via <methodname>FullTextSession.index(T
entity)</methodname></title>
<programlisting language="JAVA" role="JAVA">FullTextSession fullTextSession = Search.getFullTextSession(session);
Transaction tx = fullTextSession.beginTransaction();
Object customer = fullTextSession.load( Customer.class, 8 );
<emphasis role="bold">fullTextSession.index(customer);</emphasis>
tx.commit(); //index only updated at commit time</programlisting>
</example>
<para>In case you want to add all instances for a type, or for all indexed
types, the recommended approach is to use a
<classname>MassIndexer</classname>: see <xref
linkend="search-batchindex-massindexer"/> for more details.</para>
<para>The method <methodname>FullTextSession.index(T
entity)</methodname> is considered an explicit indexing operation, so any
registered <classname>EntityIndexingInterceptor</classname> won't be applied
in this case. For more information on <classname>EntityIndexingInterceptor</classname>
see <xref linkend="search-mapping-indexinginterceptor"/>.</para>
</section>
<section>
<title>Deleting instances from the index</title>
<para>It is equally possible to remove an entity or all entities of a
given type from a Lucene index without the need to physically remove them
from the database. This operation is named purging and is also done
through the <classname>FullTextSession</classname>.</para>
<example>
<title>Purging a specific instance of an entity from the index</title>
<programlisting language="JAVA" role="JAVA">FullTextSession fullTextSession = Search.getFullTextSession(session);
Transaction tx = fullTextSession.beginTransaction();
for (Customer customer : customers) {
    <emphasis role="bold">fullTextSession.purge( Customer.class, customer.getId() );</emphasis>
}
tx.commit(); //index is updated at commit time</programlisting>
</example>
<para>Purging will remove the entity with the given id from the Lucene
index but will not touch the database.</para>
<para>If you need to remove all entities of a given type, you can use the
<methodname>purgeAll</methodname> method. This operation removes all
entities of the type passed as a parameter as well as all its
subtypes.</para>
<example>
<title>Purging all instances of an entity from the index</title>
<programlisting language="JAVA" role="JAVA">FullTextSession fullTextSession = Search.getFullTextSession(session);
Transaction tx = fullTextSession.beginTransaction();
<emphasis role="bold">fullTextSession.purgeAll( Customer.class );</emphasis>
//optionally optimize the index
//fullTextSession.getSearchFactory().optimize( Customer.class );
tx.commit(); //index changes are applied at commit time </programlisting>
</example>
<para>As in the previous example, it is suggested to optimize the index
after many purge operations to actually free the used space.</para>
<para>As with <methodname>FullTextSession.index(T
entity)</methodname>, <methodname>purge</methodname> and
<methodname>purgeAll</methodname> are considered explicit indexing
operations: any registered <classname>EntityIndexingInterceptor</classname>
won't be applied. For more information on <classname>EntityIndexingInterceptor</classname>
see <xref linkend="search-mapping-indexinginterceptor"/>.</para>
<note>
<para>Methods <methodname>index</methodname>,
<methodname>purge</methodname> and <methodname>purgeAll</methodname> are
available on <classname>FullTextEntityManager</classname> as
well.</para>
</note>
<note>
<para>All manual indexing methods (<methodname>index</methodname>,
<methodname>purge</methodname> and <methodname>purgeAll</methodname>)
only affect the index, not the database. Nevertheless, they are
transactional: they won't be applied until the transaction is
successfully committed or you call
<methodname>flushToIndexes</methodname>.</para>
</note>
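<para>For JPA users, the same operations might look like the sketch below.
It is an illustration under stated assumptions rather than a definitive
recipe: it assumes an open <classname>EntityManager</classname> named
<literal>entityManager</literal> using resource-local transactions, and it
reuses the <classname>Customer</classname> entity from the examples above.
The identifiers 8 and 9 are arbitrary.</para>
<example>
<title>Manual indexing via <classname>FullTextEntityManager</classname> (illustrative sketch)</title>
<programlisting language="JAVA" role="JAVA">//Assumption: entityManager is an open javax.persistence.EntityManager
//Search here is org.hibernate.search.jpa.Search
FullTextEntityManager fullTextEntityManager =
    Search.getFullTextEntityManager(entityManager);
fullTextEntityManager.getTransaction().begin();
Customer customer = fullTextEntityManager.find( Customer.class, 8 );
fullTextEntityManager.index( customer ); //add or update one instance
fullTextEntityManager.purge( Customer.class, 9 ); //remove one instance by id
fullTextEntityManager.getTransaction().commit(); //index changes are applied at commit time</programlisting>
</example>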
</section>
<section id="search-batchindex">
<title>Rebuilding the whole index</title>
<para>If you change the entity mapping to the index, chances are that the
whole index needs to be updated. For example, if you decide to index an
existing field using a different analyzer, you'll need to rebuild the index
for the affected types. Also, if the database is replaced (for example
restored from a backup or imported from a legacy system), you'll want to be
able to rebuild the index from existing data. Hibernate Search provides two
main strategies to choose from:</para>
<itemizedlist>
<listitem>
<para>Using
<classname>FullTextSession</classname>.<methodname>flushToIndexes()</methodname>
periodically, while using
<classname>FullTextSession</classname>.<methodname>index()</methodname>
on all entities.</para>
</listitem>
<listitem>
<para>Use a <classname>MassIndexer</classname>.</para>
</listitem>
</itemizedlist>
<section id="search-batchindex-flushtoindexes">
<title>Using flushToIndexes()</title>
<para>This strategy consists of removing the existing index and then
adding all entities back to the index using
<classname>FullTextSession</classname>.<methodname>purgeAll()</methodname>
and
<classname>FullTextSession</classname>.<methodname>index()</methodname>.
There are, however, some memory and efficiency constraints. For maximum
efficiency Hibernate Search batches index operations and executes them
at commit time. If you expect to index a lot of data you need to be
careful about memory consumption, since all documents are kept in a queue
until the transaction commits. You can potentially face an
<classname>OutOfMemoryError</classname> if you don't empty the queue
periodically; to do this, use
<methodname>fullTextSession.flushToIndexes()</methodname>. Every time
<methodname>fullTextSession.flushToIndexes()</methodname> is called (or
the transaction is committed), the batch queue is processed and all
index changes are applied. Be aware that, once flushed, the changes cannot
be rolled back.</para>
<example>
<title>Index rebuilding using index() and flushToIndexes()</title>
<programlisting language="JAVA" role="JAVA">fullTextSession.setFlushMode(FlushMode.MANUAL);
fullTextSession.setCacheMode(CacheMode.IGNORE);
transaction = fullTextSession.beginTransaction();
//Scrollable results will avoid loading too many objects in memory
ScrollableResults results = fullTextSession.createCriteria( Email.class )
    .setFetchSize(BATCH_SIZE)
    .scroll( ScrollMode.FORWARD_ONLY );
int index = 0;
while( results.next() ) {
    index++;
    fullTextSession.index( results.get(0) ); //index each element
    if (index % BATCH_SIZE == 0) {
        fullTextSession.flushToIndexes(); //apply changes to indexes
        fullTextSession.clear(); //free memory since the queue is processed
    }
}
transaction.commit();</programlisting>
</example>
<para>Try to use a batch size that guarantees that your application will
not run out of memory: with a bigger batch size, objects are fetched
faster from the database but more memory is needed.</para>
</section>
<section id="search-batchindex-massindexer">
<title>Using a MassIndexer</title>
<para>Hibernate Search's <classname>MassIndexer</classname> uses several
parallel threads to rebuild the index; you can optionally select which
entities need to be reloaded or have it reindex all entities. This
approach is optimized for best performance but requires putting the
application in maintenance mode: querying the index is not
recommended while a MassIndexer is busy.</para>
<example>
<title>Index rebuilding using a MassIndexer</title>
<programlisting language="JAVA" role="JAVA">fullTextSession.createIndexer().startAndWait();</programlisting>
</example>
<para>This will rebuild the index, deleting it and then reloading all
entities from the database. Although it's simple to use, some tweaking
is recommended to speed up the process: several parameters are
configurable.</para>
<warning>
<para>During the progress of a MassIndexer the content of the index is
undefined! If a query is performed while the MassIndexer is working
most likely some results will be missing.</para>
</warning>
<example>
<title>Using a tuned MassIndexer</title>
<programlisting language="JAVA" role="JAVA">fullTextSession
    .createIndexer( User.class )
    .batchSizeToLoadObjects( 25 )
    .cacheMode( CacheMode.NORMAL )
    .threadsToLoadObjects( 5 )
    .idFetchSize( 150 )
    .threadsForSubsequentFetching( 20 )
    .progressMonitor( monitor ) //a MassIndexerProgressMonitor implementation
    .startAndWait();</programlisting>
</example>
<para>This will rebuild the index of all User instances (and subtypes),
and will create 5 parallel threads to load the User instances using
batches of 25 objects per query; these loaded User instances are then
pipelined to 20 parallel threads to load the attached lazy collections
of User containing some information needed for the index. The number of
threads working on actual index writing is defined by the backend
configuration of each index. See the option
<literal>worker.thread_pool.size</literal> in <xref
linkend="table-work-execution-configuration"/>.</para>
<para>It is recommended to leave cacheMode set to
<literal>CacheMode.IGNORE</literal> (the default), as in most reindexing
situations the cache will be a useless additional overhead. Depending on
your data, enabling another <literal>CacheMode</literal> might be useful:
it can increase performance if the main entity relates
to enum-like data included in the index.</para>
<tip>
<para>The "sweet spot" of number of threads to achieve best
performance is highly dependent on your overall architecture, database
design and even data values. To find out the best number of threads
for your application it is recommended to use a profiler: all internal
thread groups have meaningful names to be easily identified with most
tools.</para>
</tip>
<note>
<para>The MassIndexer was designed for speed and is unaware of
transactions, so there is no need to begin or commit one. Also,
because it is not transactional, it is not recommended to let users use
the system during its processing: it is unlikely people will be
able to find results, and the system load might be too high
anyway.</para>
</note>
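<para>Note that <methodname>startAndWait()</methodname> declares
<classname>InterruptedException</classname>, so the calling code has to
handle it. The following is a minimal sketch, assuming
<literal>fullTextSession</literal> is an open
<classname>FullTextSession</classname> as in the examples above. How to
react to an interruption is application-specific; restoring the interrupt
flag, as shown here, is only one option.</para>
<example>
<title>Handling interruption of a blocking MassIndexer run (illustrative sketch)</title>
<programlisting language="JAVA" role="JAVA">try {
    //blocks the calling thread until reindexing has completed
    fullTextSession.createIndexer().startAndWait();
}
catch (InterruptedException e) {
    //restore the interrupt flag so that callers further up can react to it
    Thread.currentThread().interrupt();
}</programlisting>
</example>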
</section>
<section>
<title>Useful parameters for batch indexing</title>
<para>Other parameters which affect indexing time and memory consumption
are:</para>
<itemizedlist>
<listitem>
<para><literal>hibernate.search.[default|&lt;indexname&gt;].exclusive_index_use</literal></para>
</listitem>
<listitem>
<para><literal>hibernate.search.[default|&lt;indexname&gt;].indexwriter.max_buffered_docs</literal></para>
</listitem>
<listitem>
<para><literal>hibernate.search.[default|&lt;indexname&gt;].indexwriter.max_merge_docs</literal></para>
</listitem>
<listitem>
<para><literal>hibernate.search.[default|&lt;indexname&gt;].indexwriter.merge_factor</literal></para>
</listitem>
<listitem>
<para><literal>hibernate.search.[default|&lt;indexname&gt;].indexwriter.merge_min_size</literal></para>
</listitem>
<listitem>
<para><literal>hibernate.search.[default|&lt;indexname&gt;].indexwriter.merge_max_size</literal></para>
</listitem>
<listitem>
<para><literal>hibernate.search.[default|&lt;indexname&gt;].indexwriter.merge_max_optimize_size</literal></para>
</listitem>
<listitem>
<para><literal>hibernate.search.[default|&lt;indexname&gt;].indexwriter.merge_calibrate_by_deletes</literal></para>
</listitem>
<listitem>
<para><literal>hibernate.search.[default|&lt;indexname&gt;].indexwriter.ram_buffer_size</literal></para>
</listitem>
<listitem>
<para><literal>hibernate.search.[default|&lt;indexname&gt;].indexwriter.term_index_interval</literal></para>
</listitem>
</itemizedlist>
<para>Previous versions also had a <literal>max_field_length</literal>
parameter, but it was removed from Lucene; it's possible to obtain a similar
effect by using a <classname>LimitTokenCountAnalyzer</classname>.</para>
<para>All <literal>.indexwriter</literal> parameters are Lucene specific
and Hibernate Search is just passing these parameters through - see <xref
linkend="lucene-indexing-performance"/> for more details.</para>
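<para>As a hedged illustration, some of the parameters listed above could be
set programmatically as sketched below. The sketch assumes a native Hibernate
bootstrap through <classname>org.hibernate.cfg.Configuration</classname>; the
same keys can equally be placed in your usual configuration file. All values
are arbitrary placeholders, not recommendations.</para>
<example>
<title>Setting batch indexing parameters (illustrative sketch)</title>
<programlisting language="JAVA" role="JAVA">//Assumption: the application is configured via org.hibernate.cfg.Configuration
Configuration configuration = new Configuration().configure();
//placeholder values only: measure before settling on real ones
configuration.setProperty( "hibernate.search.default.exclusive_index_use", "true" );
configuration.setProperty( "hibernate.search.default.indexwriter.ram_buffer_size", "64" );
configuration.setProperty( "hibernate.search.default.indexwriter.merge_factor", "20" );</programlisting>
</example>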
<para>The <classname>MassIndexer</classname> uses a forward only
scrollable result to iterate on the primary keys to be loaded, but MySQL's
JDBC driver will load all values in memory; to avoid this "optimisation"
set <literal>idFetchSize</literal> to
<literal>Integer.MIN_VALUE</literal>.</para>
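<para>A minimal sketch of this setting, assuming
<literal>fullTextSession</literal> is an open
<classname>FullTextSession</classname> backed by a MySQL database and that
the call site handles the <classname>InterruptedException</classname>
declared by <methodname>startAndWait()</methodname>:</para>
<example>
<title>MassIndexer with idFetchSize set for MySQL (illustrative sketch)</title>
<programlisting language="JAVA" role="JAVA">fullTextSession
    .createIndexer()
    .idFetchSize( Integer.MIN_VALUE ) //avoid loading all primary keys in memory
    .startAndWait();</programlisting>
</example>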
</section>
</section>
</chapter>