<?xml version="1.0" encoding="UTF-8"?>
<!--
~ Hibernate, Relational Persistence for Idiomatic Java
~
~ Copyright (c) 2010, Red Hat, Inc. and/or its affiliates or third-party contributors as
~ indicated by the @author tags or express copyright attribution
~ statements applied by the authors. All third-party contributions are
~ distributed under license by Red Hat, Inc.
~
~ This copyrighted material is made available to anyone wishing to use, modify,
~ copy, or redistribute it subject to the terms and conditions of the GNU
~ Lesser General Public License, as published by the Free Software Foundation.
~
~ This program is distributed in the hope that it will be useful,
~ but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
~ or FITNESS FOR A PARTICULAR PURPOSE. See the GNU Lesser General Public License
~ for more details.
~
~ You should have received a copy of the GNU Lesser General Public License
~ along with this distribution; if not, write to:
~ Free Software Foundation, Inc.
~ 51 Franklin Street, Fifth Floor
~ Boston, MA 02110-1301 USA
-->
<!DOCTYPE book PUBLIC "-//OASIS//DTD DocBook XML V4.5//EN"
"http://www.oasis-open.org/docbook/xml/4.5/docbookx.dtd" [
<!ENTITY % BOOK_ENTITIES SYSTEM "../hsearch.ent">
%BOOK_ENTITIES;
]>
<chapter id="manual-index-changes">
<title>Manual index changes</title>
<para>As Hibernate Core applies changes to the database, Hibernate Search
detects these changes and updates the index automatically (unless the
event listeners are disabled). Sometimes, however, changes are made to the
database without using Hibernate, for example when a backup is restored or
the data is otherwise modified directly; for these cases Hibernate Search
exposes the manual index APIs to explicitly update or remove a single
entity from the index, rebuild the index for the whole database, or remove
all references to a specific type.</para>
<para>All these methods affect the Lucene index only; no changes are applied
to the database.</para>
<section>
<title>Adding instances to the index</title>
<para>Using <classname>FullTextSession</classname>.<methodname>index(T
entity)</methodname> you can directly add or update a specific object
instance in the index. If this entity was already indexed, its index
entry will be updated. Changes to the index are only applied at transaction
commit.</para>
<example>
<title>Indexing an entity via <methodname>FullTextSession.index(T
entity)</methodname></title>
<programlisting language="JAVA" role="JAVA">FullTextSession fullTextSession = Search.getFullTextSession(session);
Transaction tx = fullTextSession.beginTransaction();
Object customer = fullTextSession.load( Customer.class, 8 );
<emphasis role="bold">fullTextSession.index(customer);</emphasis>
tx.commit(); //index only updated at commit time</programlisting>
</example>
<para>If you want to add all instances of a type, or of all indexed
types, the recommended approach is to use a
<classname>MassIndexer</classname>: see <xref
linkend="search-batchindex-massindexer"/> for more details.</para>
<para>The method <methodname>FullTextSession.index(T
entity)</methodname> is considered an explicit indexing operation, so any
registered <classname>EntityIndexingInterceptor</classname> won't be applied
in this case. For more information on <classname>EntityIndexingInterceptor</classname>
see <xref linkend="search-mapping-indexinginterceptor"/>.</para>
</section>
<section>
<title>Deleting instances from the index</title>
<para>It is equally possible to remove an entity or all entities of a
given type from a Lucene index without the need to physically remove them
from the database. This operation is named purging and is also done
through the <classname>FullTextSession</classname>.</para>
<example>
<title>Purging a specific instance of an entity from the index</title>
<programlisting language="JAVA" role="JAVA">FullTextSession fullTextSession = Search.getFullTextSession(session);
Transaction tx = fullTextSession.beginTransaction();
for (Customer customer : customers) {
<emphasis role="bold">fullTextSession.purge( Customer.class, customer.getId() );</emphasis>
}
tx.commit(); //index is updated at commit time</programlisting>
</example>
<para>Purging will remove the entity with the given id from the Lucene
index but will not touch the database.</para>
<para>If you need to remove all entities of a given type, you can use the
<methodname>purgeAll</methodname> method. This operation removes all
entities of the type passed as a parameter as well as all its
subtypes.</para>
<example>
<title>Purging all instances of an entity from the index</title>
<programlisting language="JAVA" role="JAVA">FullTextSession fullTextSession = Search.getFullTextSession(session);
Transaction tx = fullTextSession.beginTransaction();
<emphasis role="bold">fullTextSession.purgeAll( Customer.class );</emphasis>
//optionally optimize the index
//fullTextSession.getSearchFactory().optimize( Customer.class );
tx.commit(); //index changes are applied at commit time </programlisting>
</example>
<para>As shown in the previous example, it is advisable to optimize the
index after many purge operations to actually reclaim the used space.</para>
<para>As with <methodname>FullTextSession.index(T
entity)</methodname>, <methodname>purge</methodname> and
<methodname>purgeAll</methodname> are also considered explicit indexing
operations: any registered <classname>EntityIndexingInterceptor</classname>
won't be applied. For more information on <classname>EntityIndexingInterceptor</classname>
see <xref linkend="search-mapping-indexinginterceptor"/>.</para>
<note>
<para>Methods <methodname>index</methodname>,
<methodname>purge</methodname> and <methodname>purgeAll</methodname> are
available on <classname>FullTextEntityManager</classname> as
well.</para>
</note>
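<para>For example, when using the JPA APIs the same purge operation can be
performed through a <classname>FullTextEntityManager</classname>. The
following sketch assumes a resource-local transaction and a
<classname>Customer</classname> entity as in the previous examples:</para>
<example>
<title>Purging an entity via <classname>FullTextEntityManager</classname></title>
<programlisting language="JAVA" role="JAVA">FullTextEntityManager fullTextEntityManager = Search.getFullTextEntityManager(entityManager);
fullTextEntityManager.getTransaction().begin();
fullTextEntityManager.purge( Customer.class, 8 );
fullTextEntityManager.getTransaction().commit(); //index updated at commit time</programlisting>
</example>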
<note>
<para>All manual indexing methods (<methodname>index</methodname>,
<methodname>purge</methodname> and <methodname>purgeAll</methodname>)
only affect the index, not the database; nevertheless they are
transactional and as such won't be applied until the transaction is
successfully committed, unless you make use of
<methodname>flushToIndexes</methodname>.</para>
</note>
</section>
<section id="search-batchindex">
<title>Rebuilding the whole index</title>
<para>If you change the way an entity is mapped to the index, chances are
that the whole index needs to be updated; for example, if you decide to
index an existing field using a different analyzer you'll need to rebuild
the index for the affected types. Also, if the database is replaced (for
example restored from a backup or imported from a legacy system) you'll
want to be able to rebuild the index from existing data. Hibernate Search
provides two main strategies to choose from:</para>
<itemizedlist>
<listitem>
<para>Using
<classname>FullTextSession</classname>.<methodname>flushToIndexes()</methodname>
periodically, while using
<classname>FullTextSession</classname>.<methodname>index()</methodname>
on all entities.</para>
</listitem>
<listitem>
<para>Use a <classname>MassIndexer</classname>.</para>
</listitem>
</itemizedlist>
<section id="search-batchindex-flushtoindexes">
<title>Using flushToIndexes()</title>
<para>This strategy consists of removing the existing index and then
adding all entities back to the index using
<classname>FullTextSession</classname>.<methodname>purgeAll()</methodname>
and
<classname>FullTextSession</classname>.<methodname>index()</methodname>;
however, there are some memory and efficiency constraints. For maximum
efficiency Hibernate Search batches index operations and executes them
at commit time. If you expect to index a lot of data you need to be
careful about memory consumption, since all documents are kept in a queue
until the transaction commit. You can potentially face an
<classname>OutOfMemoryError</classname> if you don't empty the queue
periodically: to do this you can use
<methodname>fullTextSession.flushToIndexes()</methodname>. Every time
<methodname>fullTextSession.flushToIndexes()</methodname> is called (or
when the transaction is committed), the batch queue is processed, applying
all index changes. Be aware that, once flushed, the changes cannot be
rolled back.</para>
<example>
<title>Index rebuilding using index() and flushToIndexes()</title>
<programlisting language="JAVA" role="JAVA">fullTextSession.setFlushMode(FlushMode.MANUAL);
fullTextSession.setCacheMode(CacheMode.IGNORE);
transaction = fullTextSession.beginTransaction();
//Scrollable results will avoid loading too many objects in memory
ScrollableResults results = fullTextSession.createCriteria( Email.class )
.setFetchSize(BATCH_SIZE)
.scroll( ScrollMode.FORWARD_ONLY );
int index = 0;
while( results.next() ) {
index++;
fullTextSession.index( results.get(0) ); //index each element
if (index % BATCH_SIZE == 0) {
fullTextSession.flushToIndexes(); //apply changes to indexes
fullTextSession.clear(); //free memory since the queue is processed
}
}
transaction.commit();</programlisting>
</example>
<para>Try to use a batch size that guarantees that your application will
not run out of memory: with a bigger batch size objects are fetched
faster from the database, but more memory is needed.</para>
</section>
<section id="search-batchindex-massindexer">
<title>Using a MassIndexer</title>
<para>Hibernate Search's <classname>MassIndexer</classname> uses several
parallel threads to rebuild the index; you can optionally select which
entities need to be reloaded or have it reindex all entities. This
approach is optimized for best performance but requires setting the
application in maintenance mode: querying the index is not
recommended while a MassIndexer is busy.</para>
<example>
<title>Index rebuilding using a MassIndexer</title>
<programlisting language="JAVA" role="JAVA">fullTextSession.createIndexer().startAndWait();</programlisting>
</example>
<para>This will rebuild the index, deleting it and then reloading all
entities from the database. Although it's simple to use, some tweaking
is recommended to speed up the process: several parameters are
configurable.</para>
<warning>
<para>While the MassIndexer is running the content of the index is
undefined! If a query is performed while the MassIndexer is working,
most likely some results will be missing.</para>
</warning>
<example>
<title>Using a tuned MassIndexer</title>
<programlisting language="JAVA" role="JAVA">fullTextSession
.createIndexer( User.class )
.batchSizeToLoadObjects( 25 )
.cacheMode( CacheMode.NORMAL )
.threadsToLoadObjects( 5 )
.idFetchSize( 150 )
.threadsForSubsequentFetching( 20 )
.progressMonitor( monitor ) //a MassIndexerProgressMonitor implementation
.startAndWait();</programlisting>
</example>
<para>This will rebuild the index of all User instances (and subtypes),
creating 5 parallel threads to load the User instances in batches
of 25 objects per query; these loaded User instances are then
pipelined to 20 parallel threads, which load the attached lazy collections
of User containing information needed for the index. The number of
threads working on actual index writing is defined by the backend
configuration of each index. See the option
<literal>worker.thread_pool.size</literal> in <xref
linkend="table-work-execution-configuration"/>.</para>
<para>It is recommended to leave cacheMode set to
<literal>CacheMode.IGNORE</literal> (the default), as in most reindexing
situations the cache will be useless additional overhead; enabling
another <literal>CacheMode</literal> might however increase performance,
depending on your data, for example if the main entity relates to
enum-like data included in the index.</para>
<tip>
<para>The "sweet spot" for the number of threads to achieve best
performance is highly dependent on your overall architecture, database
design and even data values. To find the best number of threads
for your application it is recommended to use a profiler: all internal
thread groups have meaningful names so they can be easily identified
with most tools.</para>
</tip>
<note>
<para>The MassIndexer was designed for speed and is unaware of
transactions, so there is no need to begin a transaction or commit
one. Also, because it is not transactional, it is not recommended to let
users use the system during its processing, as it is unlikely people will
be able to find results and the system load might be too high
anyway.</para>
</note>
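<para>If blocking the calling thread is not desirable, the same rebuild can
be launched asynchronously with <methodname>start()</methodname>, which
returns a <classname>Future</classname>. A minimal sketch:</para>
<example>
<title>Starting a MassIndexer asynchronously</title>
<programlisting language="JAVA" role="JAVA">Future&lt;?&gt; indexingFuture = fullTextSession
.createIndexer( User.class )
.start(); //returns immediately
//... perform other work, then optionally wait for completion:
indexingFuture.get();</programlisting>
</example>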
</section>
<para>Other parameters which affect indexing time and memory consumption
are:</para>
<itemizedlist>
<listitem>
<literal>hibernate.search.[default|&lt;indexname&gt;].exclusive_index_use</literal>
</listitem>
<listitem>
<literal>hibernate.search.[default|&lt;indexname&gt;].indexwriter.max_buffered_docs</literal>
</listitem>
<listitem>
<literal>hibernate.search.[default|&lt;indexname&gt;].indexwriter.max_merge_docs</literal>
</listitem>
<listitem>
<literal>hibernate.search.[default|&lt;indexname&gt;].indexwriter.merge_factor</literal>
</listitem>
<listitem>
<literal>hibernate.search.[default|&lt;indexname&gt;].indexwriter.merge_min_size</literal>
</listitem>
<listitem>
<literal>hibernate.search.[default|&lt;indexname&gt;].indexwriter.merge_max_size</literal>
</listitem>
<listitem>
<literal>hibernate.search.[default|&lt;indexname&gt;].indexwriter.merge_max_optimize_size</literal>
</listitem>
<listitem>
<literal>hibernate.search.[default|&lt;indexname&gt;].indexwriter.merge_calibrate_by_deletes</literal>
</listitem>
<listitem>
<literal>hibernate.search.[default|&lt;indexname&gt;].indexwriter.ram_buffer_size</literal>
</listitem>
<listitem>
<literal>hibernate.search.[default|&lt;indexname&gt;].indexwriter.term_index_interval</literal>
</listitem>
</itemizedlist>
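<para>For example, these properties could be set in your Hibernate
configuration as follows; the values shown are purely illustrative and
should be tuned for your environment:</para>
<example>
<title>Example indexing performance properties</title>
<programlisting>hibernate.search.default.exclusive_index_use = true
hibernate.search.default.indexwriter.max_buffered_docs = 1000
hibernate.search.default.indexwriter.ram_buffer_size = 64</programlisting>
</example>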
<para>Previous versions also had a <literal>max_field_length</literal>
parameter, but this was removed from Lucene; it's possible to obtain a
similar effect by using a
<classname>LimitTokenCountAnalyzer</classname>.</para>
<para>All <literal>.indexwriter</literal> parameters are Lucene specific
and Hibernate Search is just passing these parameters through - see <xref
linkend="lucene-indexing-performance"/> for more details.</para>
<para>The <classname>MassIndexer</classname> uses a forward-only
scrollable result to iterate on the primary keys to be loaded, but MySQL's
JDBC driver loads all values in memory by default; to avoid this
"optimisation" set <literal>idFetchSize</literal> to
<literal>Integer.MIN_VALUE</literal>.</para>
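<para>For example, when indexing against MySQL (where
<literal>Integer.MIN_VALUE</literal> is the JDBC driver's convention to
enable result streaming):</para>
<example>
<title>Setting <literal>idFetchSize</literal> for MySQL</title>
<programlisting language="JAVA" role="JAVA">fullTextSession
.createIndexer( User.class )
.idFetchSize( Integer.MIN_VALUE ) //stream primary keys instead of buffering them
.startAndWait();</programlisting>
</example>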
</section>
</chapter>