<?xml version="1.0" encoding="UTF-8"?>
<!--
~ Hibernate, Relational Persistence for Idiomatic Java
~
~ Copyright (c) 2010, Red Hat, Inc. and/or its affiliates or third-party contributors as
~ indicated by the @author tags or express copyright attribution
~ statements applied by the authors. All third-party contributions are
~ distributed under license by Red Hat, Inc.
~
~ This copyrighted material is made available to anyone wishing to use, modify,
~ copy, or redistribute it subject to the terms and conditions of the GNU
~ Lesser General Public License, as published by the Free Software Foundation.
~
~ This program is distributed in the hope that it will be useful,
~ but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
~ or FITNESS FOR A PARTICULAR PURPOSE. See the GNU Lesser General Public License
~ for more details.
~
~ You should have received a copy of the GNU Lesser General Public License
~ along with this distribution; if not, write to:
~ Free Software Foundation, Inc.
~ 51 Franklin Street, Fifth Floor
~ Boston, MA 02110-1301 USA
-->
<!DOCTYPE book PUBLIC "-//OASIS//DTD DocBook XML V4.5//EN"
"http://www.oasis-open.org/docbook/xml/4.5/docbookx.dtd" [
<!ENTITY % BOOK_ENTITIES SYSTEM "../hsearch.ent">
%BOOK_ENTITIES;
]>
<chapter id="search-configuration">
<title>Configuration</title>
<section id="search-configuration-event" revision="2">
<title>Enabling Hibernate Search and automatic indexing</title>
<para>Let's start with the most basic configuration question - how do I
enable Hibernate Search?</para>
<section>
<title>Enabling Hibernate Search</title>
<para>The good news is that Hibernate Search is enabled out of the box
when detected on the classpath by Hibernate Core. If, for some reason,
you need to disable it, set
<literal>hibernate.search.autoregister_listeners</literal> to
<constant>false</constant>. Note that there is no performance penalty
when the listeners are enabled but no entities are annotated as
indexed.</para>
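<para>As a minimal sketch, the property named above could be set in your
Hibernate configuration properties as follows to switch the listeners
off:</para>
<programlisting>hibernate.search.autoregister_listeners = false</programlisting>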
</section>
<section>
<title>Automatic indexing</title>
<para>By default, every time an object is inserted, updated or deleted
through Hibernate, Hibernate Search updates the corresponding Lucene index.
It is sometimes desirable to disable this feature, either because your index
is read-only or because index updates are done in batches (see <xref
linkend="search-batchindex"/>).</para>
<para>To disable event based indexing, set</para>
<programlisting>hibernate.search.indexing_strategy = manual</programlisting>
<note>
<para>In most cases, the JMS backend provides the best of both worlds: a
lightweight event based system keeps track of all changes in the
system, and the heavyweight indexing process is done by a separate
process or machine.</para>
</note>
</section>
</section>
<section id="configuration-indexmanager">
<title>Configuring the <classname>IndexManager</classname></title>
<para>The role of the index manager component is described in <xref
linkend="search-architecture"/>. Hibernate Search provides two possible
implementations for this interface to choose from.</para>
<itemizedlist>
<listitem>
<para><literal>directory-based</literal>: the default implementation
which uses the Lucene <classname>Directory</classname> abstraction to
manage index files.</para>
</listitem>
<listitem>
<para><literal>near-real-time</literal>: avoids flushing writes to disk
at each commit. This index manager is also
<classname>Directory</classname> based, but additionally makes use of
Lucene's NRT functionality.</para>
</listitem>
</itemizedlist>
<para>To select an alternative you specify the property:</para>
<programlisting>hibernate.search.[default|&lt;indexname&gt;].indexmanager = near-real-time</programlisting>
<section>
<title><literal>directory-based</literal></title>
<para>The default <classname>IndexManager</classname> implementation.
This is the one mostly referred to in this documentation. It is highly
configurable and allows you to select different settings for the reader
strategy, back ends and directory providers. Refer to <xref
linkend="search-configuration-directory"/>, <xref
linkend="configuration-worker"/> and <xref
linkend="configuration-reader-strategy"/> for more details.</para>
</section>
<section>
<title><literal>near-real-time</literal></title>
<para>The <classname>NRTIndexManager</classname> is an extension of the
default <classname>IndexManager</classname>, leveraging the Lucene NRT
(Near Real Time) features for extremely low latency index writes. As a
tradeoff it requires a non-clustered and non-shared index. In other
words, it will ignore configuration settings for alternative back ends
other than <literal>lucene</literal> and will acquire exclusive write
locks on the <classname>Directory</classname>.</para>
<para>To achieve these low latency writes, the
<classname>IndexWriter</classname> will not flush every change to disk.
Queries will be allowed to read updated state from the unflushed index
writer buffers; the downside of this strategy is that if the application
crashes or the <classname>IndexWriter</classname> is otherwise killed
you'll have to rebuild the indexes as some updates might be lost.</para>
<para>Because of these downsides, and because a master node in a cluster
can be configured for good performance as well, the NRT configuration is
only recommended for non-clustered websites with a limited amount of
data.</para>
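<para>As an illustrative sketch, the near-real-time index manager can be
enabled for a single index while all other indexes keep the default
implementation. The index name <literal>Books</literal> is a
hypothetical example and not part of this documentation:</para>
<programlisting># all indexes use the default, Directory based index manager
hibernate.search.default.indexmanager = directory-based
# only the (hypothetical) Books index uses Lucene NRT
hibernate.search.Books.indexmanager = near-real-time</programlisting>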
</section>
<section>
<title>Custom</title>
<para>It is also possible to configure a custom
<classname>IndexManager</classname> implementation by specifying the
fully qualified class name of your custom implementation. This
implementation must have a no-argument constructor:<programlisting>hibernate.search.[default|&lt;indexname&gt;].indexmanager = my.corp.myapp.CustomIndexManager</programlisting></para>
<tip>
<para>Your custom index manager implementation doesn't need to use the
same components as the default implementations. For example, you can
delegate to a remote indexing service which doesn't expose a
<classname>Directory</classname> interface.</para>
</tip>
</section>
</section>
<section id="search-configuration-directory" revision="1">
<title>Directory configuration</title>
<para>As we have seen in <xref linkend="configuration-indexmanager"/> the
default index manager uses Lucene's notion of a
<classname>Directory</classname> to store the index files. The
<classname>Directory</classname> implementation can be customized and
Lucene comes bundled with a file system and an in-memory implementation.
<classname>DirectoryProvider</classname> is the Hibernate Search
abstraction around a Lucene <classname>Directory</classname> and handles
the configuration and the initialization of the underlying Lucene
resources. <xref linkend="directory-provider-table"/> shows the list of
the directory providers available in Hibernate Search together with their
corresponding options.</para>
<para>To configure your <classname>DirectoryProvider</classname> you have
to understand that each indexed entity is associated with a Lucene index
(except in the case where multiple entities share the same index, see <xref
linkend="section-sharing-indexes"/>). The name of the index is given by
the <constant>index</constant> property of the
<classname>@Indexed</classname> annotation. If the
<constant>index</constant> property is not specified, the fully qualified
name of the indexed class will be used as the name (recommended).</para>
<para>Knowing the index name, you can configure the directory provider and
any additional options by using the prefix
<constant>hibernate.search.</constant><replaceable>&lt;indexname&gt;</replaceable>.
The name <constant>default</constant>
(<constant>hibernate.search.default</constant>) is reserved and can be
used to define properties which apply to all indexes. <xref
linkend="example-configuring-directory-providers"/> shows how
<constant>hibernate.search.default.directory_provider</constant> is used
to set the default directory provider to be the filesystem one.
<constant>hibernate.search.default.indexBase</constant> then sets the
default base directory for the indexes. As a result the index for the
entity <classname>Status</classname> is created in
<filename>/usr/lucene/indexes/org.hibernate.example.Status</filename>.</para>
<para>The index for the <classname>Rule</classname> entity, however, is
using an in-memory directory, because the default directory provider for
this entity is overridden by the property
<constant>hibernate.search.Rules.directory_provider</constant>.</para>
<para>Finally the <classname>Action</classname> entity uses a custom
directory provider <classname>CustomDirectoryProvider</classname>
specified via
<constant>hibernate.search.Actions.directory_provider</constant>.</para>
<example>
<title>Specifying the index name</title>
<programlisting language="JAVA" role="JAVA">package org.hibernate.example;
@Indexed
public class Status { ... }
@Indexed(index="Rules")
public class Rule { ... }
@Indexed(index="Actions")
public class Action { ... }</programlisting>
</example>
<example id="example-configuring-directory-providers">
<title>Configuring directory providers</title>
<programlisting>hibernate.search.default.directory_provider = filesystem
hibernate.search.default.indexBase = /usr/lucene/indexes
hibernate.search.Rules.directory_provider = ram
hibernate.search.Actions.directory_provider = com.acme.hibernate.CustomDirectoryProvider</programlisting>
</example>
<tip>
<para>Using the described configuration scheme you can easily define
common rules like the directory provider and base directory, and
override those defaults later on a per-index basis.</para>
</tip>
<table id="directory-provider-table">
<title>List of built-in <classname>DirectoryProvider</classname></title>
<tgroup cols="2">
<thead>
<row>
<entry align="center">Name and description</entry>
<entry align="center">Properties</entry>
</row>
</thead>
<tbody>
<row>
<entry><property>ram</property>: Memory based directory, the
directory will be uniquely identified (in the same deployment
unit) by the <literal>@Indexed.index</literal> element</entry>
<entry>none</entry>
</row>
<row>
<entry><property>filesystem</property>: File system based
directory. The directory used will be
&lt;indexBase&gt;/&lt;indexName&gt;</entry>
<entry><para><literal>indexBase</literal> : base
directory</para><para><literal>indexName</literal>: override
@Indexed.index (useful for sharded
indexes)</para><para><literal>locking_strategy</literal> :
optional, see <xref
linkend="search-configuration-directory-lockfactories"/>
</para><para><literal>filesystem_access_type</literal>: determines
the exact type of <classname>FSDirectory</classname>
implementation used by this
<classname>DirectoryProvider</classname>. Allowed values are
<literal>auto</literal> (the default value, selects
<classname>NIOFSDirectory</classname> on non-Windows systems,
<classname>SimpleFSDirectory</classname> on Windows),
<literal>simple</literal>
(<classname>SimpleFSDirectory</classname>), <literal>nio</literal>
(<classname>NIOFSDirectory</classname>), <literal>mmap</literal>
(<classname>MMapDirectory</classname>). Make sure to refer to the
Javadocs of these <classname>Directory</classname> implementations
before changing this setting. Even though
<classname>NIOFSDirectory</classname> or
<classname>MMapDirectory</classname> can bring substantial
performance boosts, they also have their issues.</para></entry>
</row>
<row>
<entry><para><property>filesystem-master</property>: File system
based directory. Like <literal>filesystem</literal>, but it also
copies the index to a source directory (aka copy directory) on a
regular basis.</para><para>The recommended value for the refresh
period is (at least) 50% higher than the time to copy the
information (default 3600 seconds - 60 minutes).</para><para>Note
that the copy is based on an incremental copy mechanism reducing
the average copy time.</para><para>DirectoryProvider typically
used on the master node in a JMS back end cluster.</para><para>The
<literal>buffer_size_on_copy</literal> optimum depends on your
operating system and available RAM; most people reported good
results using values between 16 and 64MB.</para></entry>
<entry><para><literal>indexBase</literal>: base
directory</para><para><literal>indexName</literal>: override
@Indexed.index (useful for sharded
indexes)</para><para><literal>sourceBase</literal>: source (copy)
base directory.</para><para><literal>source</literal>: source
directory suffix (defaults to <literal>@Indexed.index</literal>).
The actual source directory name is
<filename>&lt;sourceBase&gt;/&lt;source&gt;</filename>
</para><para><literal>refresh</literal>: refresh period in seconds
(the copy will take place every <constant>refresh</constant>
seconds). If a copy is still in progress when the following
<constant>refresh</constant> period elapses, the second copy
operation will be
skipped.</para><para><literal>buffer_size_on_copy</literal>: The
number of megabytes to move in a single low level copy
instruction; defaults to
16MB.</para><para><literal>locking_strategy</literal> : optional,
see <xref linkend="search-configuration-directory-lockfactories"/>
</para><para><literal>filesystem_access_type</literal>: determines
the exact type of <classname>FSDirectory</classname>
implementation used by this
<classname>DirectoryProvider</classname>. Allowed values are
<literal>auto</literal> (the default value, selects
<classname>NIOFSDirectory</classname> on non-Windows systems,
<classname>SimpleFSDirectory</classname> on Windows),
<literal>simple</literal>
(<classname>SimpleFSDirectory</classname>), <literal>nio</literal>
(<classname>NIOFSDirectory</classname>), <literal>mmap</literal>
(<classname>MMapDirectory</classname>). Make sure to refer to the
Javadocs of these <classname>Directory</classname> implementations
before changing this setting. Even though
<classname>NIOFSDirectory</classname> or
<classname>MMapDirectory</classname> can bring substantial
performance boosts, they also have their issues.</para></entry>
</row>
<row>
<entry><para><property>filesystem-slave</property>: File system
based directory. Like <literal>filesystem</literal>, but retrieves
a master version (source) on a regular basis. To avoid locking and
inconsistent search results, 2 local copies are
kept.</para><para>The recommended value for the refresh period is
(at least) 50% higher than the time to copy the information
(default 3600 seconds - 60 minutes).</para><para>Note that the
copy is based on an incremental copy mechanism reducing the
average copy time. If a copy is still in progress when the
<constant>refresh</constant> period elapses, the second copy
operation will be skipped.</para><para>DirectoryProvider typically
used on slave nodes using a JMS back end.</para><para>The
<literal>buffer_size_on_copy</literal> optimum depends on your
operating system and available RAM; most people reported good
results using values between 16 and 64MB.</para></entry>
<entry><para><literal>indexBase</literal>: base
directory</para><para><literal>indexName</literal>: override
@Indexed.index (useful for sharded
indexes)</para><para><literal>sourceBase</literal>: source (copy)
base directory.</para><para><literal>source</literal>: source
directory suffix (defaults to <literal>@Indexed.index</literal>).
The actual source directory name is
<filename>&lt;sourceBase&gt;/&lt;source&gt;</filename>
</para><para><literal>refresh</literal>: refresh period in seconds
(the copy will take place every <constant>refresh</constant>
seconds).</para><para><literal>buffer_size_on_copy</literal>: The
number of megabytes to move in a single low level copy
instruction; defaults to
16MB.</para><para><literal>locking_strategy</literal> : optional,
see <xref linkend="search-configuration-directory-lockfactories"/>
</para><para><literal>retry_marker_lookup</literal> : optional,
defaults to 0. Defines how many times to look for the marker files
in the source directory before failing, waiting 5 seconds between
tries.</para><para><literal>retry_initialize_period</literal>
: optional, set an integer value in seconds to enable the retry
initialize feature: if the slave can't find the master index, it
will keep retrying in the background until the index is found, without
preventing the application from starting. Full-text queries performed
before the index is initialized are not blocked but will return empty
results. When the option is not enabled, or is explicitly set to
zero, initialization fails with an exception instead of scheduling a
retry timer. To prevent the application from starting without a valid
index but still control an initialization timeout, see
<literal>retry_marker_lookup</literal>
instead.</para><para><literal>filesystem_access_type</literal>:
determines the exact type of
<classname>FSDirectory</classname> implementation used by this
<classname>DirectoryProvider</classname>. Allowed values are
<literal>auto</literal> (the default value, selects
<classname>NIOFSDirectory</classname> on non-Windows systems,
<classname>SimpleFSDirectory</classname> on Windows),
<literal>simple</literal>
(<classname>SimpleFSDirectory</classname>), <literal>nio</literal>
(<classname>NIOFSDirectory</classname>), <literal>mmap</literal>
(<classname>MMapDirectory</classname>). Make sure to refer to the
Javadocs of these <classname>Directory</classname> implementations
before changing this setting. Even though
<classname>NIOFSDirectory</classname> or
<classname>MMapDirectory</classname> can bring substantial
performance boosts, they also have their issues.</para></entry>
</row>
<row>
<entry><para><property>infinispan</property>: Infinispan based
directory. Use it to store the index in a distributed grid, making
index changes visible to all elements of the cluster very quickly.
Also see <xref linkend="infinispan-directories"/> for additional
requirements and configuration settings. Infinispan needs a global
configuration and additional dependencies; the settings defined
here apply to each different index.</para></entry>
<entry><para><literal>locking_cachename</literal>: name of the
Infinispan cache to use to store
locks.</para><para><literal>data_cachename</literal> : name of the
Infinispan cache to use to store the largest data chunks; this
area will contain the largest objects, so use replication if you have
enough memory, otherwise switch to distribution.</para>
<para><literal>metadata_cachename</literal>: name of the
Infinispan cache to use to store the metadata relating to the
index; this data is rather small and read very often, so it's
recommended to set up this cache using replication.</para>
<para><literal>chunk_size</literal>: large files of the index are
split into smaller chunks; you might want to set the highest value
efficiently handled by your network. Network tuning might be
useful.</para></entry>
</row>
</tbody>
</tgroup>
</table>
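<para>To illustrate how the master and slave providers from <xref
linkend="directory-provider-table"/> fit together, here is a minimal
sketch of a master node configuration. The paths, the refresh period and
the buffer size are assumptions chosen for the example; only the property
names come from the table above:</para>
<programlisting># master node: writes to a local index and copies it to the shared source directory
hibernate.search.default.directory_provider = filesystem-master
hibernate.search.default.indexBase = /usr/lucene/indexes
hibernate.search.default.sourceBase = /mnt/shared/lucene/indexes
hibernate.search.default.refresh = 1800
hibernate.search.default.buffer_size_on_copy = 32</programlisting>
<para>A matching slave node would then point at the same (assumed) shared
source directory and copy from it at the same interval:</para>
<programlisting># slave node: periodically copies the index from the shared source directory
hibernate.search.default.directory_provider = filesystem-slave
hibernate.search.default.indexBase = /usr/lucene/indexes
hibernate.search.default.sourceBase = /mnt/shared/lucene/indexes
hibernate.search.default.refresh = 1800
# look for the marker files up to 3 times before failing
hibernate.search.default.retry_marker_lookup = 3</programlisting>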
<tip>
<para>If the built-in directory providers do not fit your needs, you can
write your own directory provider by implementing the
<classname>org.hibernate.search.store.DirectoryProvider</classname> interface.
In this case, pass the fully qualified class name of your provider into
the <literal>directory_provider</literal> property. You can pass any
additional properties using the prefix
<constant>hibernate.search.</constant><replaceable>&lt;indexname&gt;</replaceable>.</para>
</tip>
<section id="infinispan-directories" revision="2">
<title role="bold">Infinispan Directory configuration</title>
<para>Infinispan is a distributed, scalable, highly available data grid
platform which supports autodiscovery of peer nodes. Using Infinispan
and Hibernate Search in combination, it is possible to store the Lucene
index in a distributed environment where index updates are quickly
available on all nodes.</para>
<para>This section describes in greater detail how to configure
Hibernate Search to use an Infinispan Lucene Directory.</para>
<para>When using an Infinispan Directory the index is stored in memory
and shared across multiple nodes. It is considered a single directory
distributed across all participating nodes. If a node updates the index,
all other nodes are updated as well. Updates on one node can be
immediately searched for in the whole cluster.</para>
<para>The default configuration replicates all data defining the index
across all nodes, thus consuming a significant amount of memory. For
large indexes it's suggested to enable data distribution, so that each
piece of information is replicated to a subset of all cluster
members.</para>
<para>It is also possible to offload part or most of the information to a
<literal>CacheStore</literal>, such as plain filesystem, Amazon S3,
Cassandra, Berkeley DB or standard relational databases. You can
configure it to have a <literal>CacheStore</literal> on each node or
have a single centralized one shared by each node.</para>
<para>See the <ulink
url="https://docs.jboss.org/author/display/ISPN/Home"> Infinispan
documentation</ulink> for all Infinispan configuration options.</para>
<section>
<title>Requirements</title>
<para>To use the Infinispan directory via Maven, add the following
dependencies:</para>
<example>
<title>Maven dependencies for Hibernate Search</title>
<programlisting language="XML" role="XML">&lt;dependency&gt;
   &lt;groupId&gt;org.hibernate&lt;/groupId&gt;
   &lt;artifactId&gt;hibernate-search&lt;/artifactId&gt;
   &lt;version&gt;&version;&lt;/version&gt;
&lt;/dependency&gt;
&lt;dependency&gt;
   &lt;groupId&gt;org.hibernate&lt;/groupId&gt;
   &lt;artifactId&gt;hibernate-search-infinispan&lt;/artifactId&gt;
   &lt;version&gt;&version;&lt;/version&gt;
&lt;/dependency&gt;</programlisting>
</example>
<para>For non-Maven users, add
<literal>hibernate-search-infinispan.jar</literal>,
<literal>infinispan-lucene-directory.jar</literal> and
<literal>infinispan-core.jar</literal> to your application classpath.
These last two jars are distributed by <ulink
url="http://www.jboss.org/infinispan/downloads">Infinispan</ulink>.</para>
</section>
<section>
<title>Architecture</title>
<para>Even when using an Infinispan directory it's still recommended
to use the JMS Master/Slave or JGroups backend, because in Infinispan
all nodes will share the same index and it is likely that
<classname>IndexWriter</classname>s being active on different nodes
will try to acquire the lock on the same index. So instead of sending
updates directly to the index, send them to a JMS queue or JGroups
channel and have a single node apply all changes on behalf of all
other nodes.</para>
<para>Configuring a non-default backend is not a requirement but a
performance optimization as locks are enabled to have a single node
writing.</para>
<para>To configure a JMS slave, only the backend must be replaced and the
directory provider must be set to <literal>infinispan</literal>. Set
the same directory provider on the master; the nodes will then connect
without the need to set up a copy job across nodes. Using the JGroups
backend is very similar - just combine the backend configuration with the
<literal>infinispan</literal> directory provider.</para>
</section>
<section>
<title>Infinispan Configuration</title>
<para>The simplest configuration only requires enabling the Infinispan
directory provider:</para>
<programlisting>hibernate.search.[default|&lt;indexname&gt;].directory_provider = infinispan</programlisting>
<para>That's all that is needed to get a cluster-replicated index, but
the default configuration does not enable any form of permanent
persistence for the index; to enable such a feature an Infinispan
configuration file should be provided.</para>
<para>To use Infinispan, Hibernate Search requires a
<classname>CacheManager</classname>; it can look up and reuse an
existing <classname>CacheManager</classname> via JNDI, or start and
manage a new one. In the latter case Hibernate Search will start and
stop it (closing occurs when the Hibernate
<classname>SessionFactory</classname> is closed).</para>
<para>To use an existing <classname>CacheManager</classname> via JNDI
(optional parameter):</para>
<programlisting>hibernate.search.infinispan.cachemanager_jndiname = [jndiname]</programlisting>
<para>To start a new <classname>CacheManager</classname> from a
configuration file (optional parameter):</para>
<programlisting>hibernate.search.infinispan.configuration_resourcename = [infinispan configuration filename]</programlisting>
<para>If both parameters are defined, JNDI will have priority. If none
of these is defined, Hibernate Search will use the default Infinispan
configuration included in
<literal>hibernate-search-infinispan.jar</literal>. This configuration
should work fine in most cases but does not store the index in a
persistent cache store.</para>
<para>As mentioned in <xref linkend="directory-provider-table"/>, each
index makes use of three caches, so three different caches should be
configured as shown in the
<literal>default-hibernatesearch-infinispan.xml</literal> provided in
the <literal>hibernate-search-infinispan.jar</literal>. Several
indexes can share the same caches.</para>
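<para>Putting these settings together, a configuration that starts a new
<classname>CacheManager</classname> from a custom file and names the three
caches explicitly might look like the following sketch. The file name and
the three cache names are hypothetical; the caches would have to be
defined under exactly these names in the referenced Infinispan
configuration file:</para>
<programlisting>hibernate.search.default.directory_provider = infinispan
# hypothetical Infinispan configuration file on the classpath
hibernate.search.infinispan.configuration_resourcename = my-infinispan.xml
# hypothetical cache names, defined in my-infinispan.xml
hibernate.search.default.locking_cachename = MyIndexLocking
hibernate.search.default.data_cachename = MyIndexData
hibernate.search.default.metadata_cachename = MyIndexMetadata</programlisting>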
</section>
</section>
</section>
<section>
<title id="configuration-worker">Worker configuration</title>
<para>It is possible to refine how Hibernate Search interacts with Lucene
through the worker configuration. There exist several architectural
components and possible extension points. Let's have a closer look.</para>
<para>First there is a <classname>Worker</classname>. An implementation of
the <classname>Worker</classname> interface is responsible for receiving
all entity changes, queuing them by context and applying them once a
context ends. The most intuitive context, especially in connection with
ORM, is the transaction. For this reason Hibernate Search by default
uses the <classname>TransactionalWorker</classname> to scope all changes
per transaction. One can, however, imagine a scenario where the context
depends for example on the number of entity changes or some other
application (lifecycle) events. For this reason the
<classname>Worker</classname> implementation is configurable as shown in
<xref linkend="table-worker-configuration"/>.</para>
<table id="table-worker-configuration">
<title>Scope configuration</title>
<tgroup cols="2">
<tbody>
<row>
<entry><emphasis role="bold">Property</emphasis></entry>
<entry><emphasis role="bold">Description</emphasis></entry>
</row>
<row>
<entry><property>hibernate.search.worker.scope</property></entry>
<entry>The fully qualified class name of the
<classname>Worker</classname> implementation to use. If this
property is not set, empty or <literal>transaction</literal> the
default <classname>TransactionalWorker</classname> is
used.</entry>
</row>
<row>
<entry><property>hibernate.search.worker.*</property></entry>
<entry>All configuration properties prefixed with
<literal>hibernate.search.worker</literal> are passed to the
Worker during initialization. This allows adding custom, worker
specific parameters.</entry>
</row>
</tbody>
</tgroup>
</table>
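<para>As a sketch of how these two properties combine, a custom worker
could be registered together with a worker specific parameter. Both the
class name <classname>com.acme.search.BatchCountingWorker</classname> and
the <literal>batch_threshold</literal> parameter are hypothetical and only
serve as an illustration:</para>
<programlisting># hypothetical custom Worker implementation
hibernate.search.worker.scope = com.acme.search.BatchCountingWorker
# hypothetical custom parameter, passed to the Worker during initialization
hibernate.search.worker.batch_threshold = 100</programlisting>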
<para>Once a context ends it is time to prepare and apply the index
changes. This can be done synchronously or asynchronously from within a
new thread. Synchronous updates have the advantage that the index is at
all times in sync with the database. Asynchronous updates, on the other
hand, can help to minimize the user response time. The drawback is
potential discrepancies between database and index states. Let's look at
the configuration options shown in <xref
linkend="table-work-execution-configuration"/>.</para>
<note>
<para>The following options can be set differently for each index. They
must be prefixed either with the index name or with
<literal>default</literal>, which sets the default value for all
indexes.</para>
</note>
<table id="table-work-execution-configuration">
<title>Execution configuration</title>
<tgroup cols="2">
<tbody>
<row>
<entry><emphasis role="bold">Property</emphasis></entry>
<entry><emphasis role="bold">Description</emphasis></entry>
</row>
<row>
<entry><property>hibernate.search.&lt;indexName&gt;.worker.execution</property></entry>
<entry><para><literal>sync</literal>: synchronous execution
(default)</para><para><literal>async</literal>: asynchronous
execution</para></entry>
</row>
<row>
<entry><property>hibernate.search.&lt;indexName&gt;.worker.thread_pool.size</property></entry>
<entry>The backend can apply updates from the same transaction
context (or batch) in parallel, using a threadpool. The default
value is 1. You can experiment with larger values if you have many
operations per transaction.</entry>
</row>
<row>
<entry><property>hibernate.search.&lt;indexName&gt;.worker.buffer_queue.max</property></entry>
<entry>Defines the maximum number of work items that can be queued when
the thread pool is starved. Useful only for asynchronous execution.
Defaults to unbounded. If the limit is reached, the work is done by the
main thread.</entry>
</row>
</tbody>
</tgroup>
</table>
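<para>As a minimal sketch of how these per-index options combine, the
following fragment keeps synchronous execution as the default and switches
a single index to asynchronous execution. The index name
<literal>Books</literal> is hypothetical, and the values are illustrative
rather than recommended.</para>
<example>
<title>Per-index execution configuration (illustrative)</title>
<programlisting># default for all indexes: synchronous execution
hibernate.search.default.worker.execution = sync
# override for one index; "Books" is a hypothetical index name
hibernate.search.Books.worker.execution = async
hibernate.search.Books.worker.thread_pool.size = 2
# only meaningful for asynchronous execution
hibernate.search.Books.worker.buffer_queue.max = 50</programlisting>
</example>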
<para>So far all work is done within the same Virtual Machine (VM),
regardless of the execution mode, so the total amount of work for that
single VM has not changed. Luckily there is a better approach, namely
delegation. It is possible to send the indexing work to a different server
by configuring <literal>hibernate.search.worker.backend</literal> (see <xref
linkend="table-backend-configuration"/>). Again, this option can be
configured differently for each index.</para>
<table id="table-backend-configuration">
<title>Backend configuration</title>
<tgroup cols="2">
<tbody>
<row>
<entry><emphasis role="bold">Property</emphasis></entry>
<entry><emphasis role="bold">Description</emphasis></entry>
</row>
<row>
<entry><property>hibernate.search.&lt;indexName&gt;.worker.backend</property></entry>
<entry><para><literal>lucene</literal>: The default backend which
runs index updates in the same VM. Also used when the property is
undefined or empty.</para><para><literal>jms</literal>: JMS
backend. Index updates are sent to a JMS queue to be processed by
an indexing master. See <xref
linkend="table-jms-backend-configuration"/> for additional
configuration options and <xref linkend="jms-backend"/> for a more
detailed description of this
setup.</para><para><literal>jgroupsMaster</literal> or
<literal>jgroupsSlave</literal>: Backend using <ulink
url="http://www.jgroups.org/">JGroups</ulink> as communication
layer. See <xref linkend="jgroups-backend"/> for a more detailed
description of this
setup.</para><para><literal>blackhole</literal>: Mainly a
test/developer setting which ignores all indexing
work.</para><para>You can also specify the fully qualified name of
a class implementing <classname>BackendQueueProcessor</classname>.
This way you can implement your own communication layer. The
implementation is responsible for returning a
<classname>Runnable</classname> instance which, on execution, will
process the index work.</para></entry>
</row>
</tbody>
</tgroup>
</table>
<table id="table-jms-backend-configuration">
<title>JMS backend configuration</title>
<tgroup cols="2">
<tbody>
<row>
<entry><emphasis role="bold">Property</emphasis></entry>
<entry><emphasis role="bold">Description</emphasis></entry>
</row>
<row>
<entry><property>hibernate.search.&lt;indexName&gt;.worker.jndi.*</property></entry>
<entry>Defines the JNDI properties used to create the InitialContext
(if needed). JNDI is only used by the JMS back end.</entry>
</row>
<row>
<entry><property>hibernate.search.&lt;indexName&gt;.worker.jms.connection_factory</property></entry>
<entry>Mandatory for the JMS back end. Defines the JNDI name used to
look up the JMS connection factory
(<literal>/ConnectionFactory</literal> by default in JBoss
AS).</entry>
</row>
<row>
<entry><property>hibernate.search.&lt;indexName&gt;.worker.jms.queue</property></entry>
<entry>Mandatory for the JMS back end. Defines the JNDI name used to
look up the JMS queue. The queue will be used to post work
messages.</entry>
</row>
<row>
<entry><property>hibernate.search.&lt;indexName&gt;.worker.jms.login</property></entry>
<entry>Optional for the JMS slaves. Defines the login to use when your
queue requires login credentials.</entry>
</row>
<row>
<entry><property>hibernate.search.&lt;indexName&gt;.worker.jms.password</property></entry>
<entry>Optional for the JMS slaves. Defines the password to use when your
queue requires login credentials.</entry>
</row>
</tbody>
</tgroup>
</table>
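<para>Because the backend can be chosen per index, the two tables above
can be combined as in the following sketch, which keeps the default
<literal>lucene</literal> backend and delegates a single index to JMS. The
index name <literal>Books</literal> is hypothetical; the JNDI names are the
ones used in the slave example of <xref linkend="jms-backend"/> and may
differ in your environment.</para>
<example>
<title>Selecting the JMS backend for a single index (illustrative)</title>
<programlisting># all indexes use the in-VM Lucene backend unless overridden
hibernate.search.default.worker.backend = lucene
# "Books" is a hypothetical index name; its updates are posted to a JMS queue
hibernate.search.Books.worker.backend = jms
# both properties are mandatory for the JMS back end
hibernate.search.Books.worker.jms.connection_factory = /ConnectionFactory
hibernate.search.Books.worker.jms.queue = queue/hibernatesearch</programlisting>
</example>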
<warning>
<para>As you probably noticed, some of the shown properties are
correlated, which means that not all combinations of property values make
sense. In fact, you can end up with a non-functional configuration. This
is especially true if you provide your own implementations of some of the
shown interfaces. Make sure to study the
existing code before you write your own <classname>Worker</classname> or
<classname>BackendQueueProcessor</classname> implementation.</para>
</warning>
<section id="jms-backend">
<title>JMS Master/Slave back end</title>
<para>This section describes in greater detail how to configure the
Master/Slave Hibernate Search architecture.</para>
<mediaobject>
<imageobject role="html">
<imagedata align="center" fileref="jms-backend.png" format="PNG"/>
</imageobject>
<imageobject role="fo">
<imagedata align="center" depth="" fileref="jms-backend.png"
format="PNG" scalefit="1" width="12cm"/>
</imageobject>
<caption><para>JMS back end configuration.</para></caption>
</mediaobject>
<section>
<title>Slave nodes</title>
<para>Every index update operation is sent to a JMS queue. Index
querying operations are executed on a local index copy.</para>
<example>
<title>JMS Slave configuration</title>
<programlisting>### slave configuration
## DirectoryProvider
# (remote) master location
hibernate.search.default.sourceBase = /mnt/mastervolume/lucenedirs/mastercopy
# local copy location
hibernate.search.default.indexBase = /Users/prod/lucenedirs
# refresh every half hour
hibernate.search.default.refresh = 1800
# appropriate directory provider
hibernate.search.default.directory_provider = filesystem-slave
## Backend configuration
hibernate.search.default.worker.backend = jms
hibernate.search.default.worker.jms.connection_factory = /ConnectionFactory
hibernate.search.default.worker.jms.queue = queue/hibernatesearch
# optional authentication credentials:
hibernate.search.default.worker.jms.login = myname
hibernate.search.default.worker.jms.password = wonttellyou
#optional jndi configuration (check your JMS provider for more information)
## Optional asynchronous execution strategy
# hibernate.search.default.worker.execution = async
# hibernate.search.default.worker.thread_pool.size = 2
# hibernate.search.default.worker.buffer_queue.max = 50</programlisting>
</example>
<tip>
<para>A file system local copy is recommended for faster search
results.</para>
</tip>
</section>
<section>
<title>Master node</title>
<para>Every index update operation is taken from a JMS queue and
executed. The master index is copied on a regular basis.</para>
<example>
<title>JMS Master configuration</title>
<programlisting>### master configuration
## DirectoryProvider
# (remote) master location where information is copied to
hibernate.search.default.sourceBase = /mnt/mastervolume/lucenedirs/mastercopy
# local master location
hibernate.search.default.indexBase = /Users/prod/lucenedirs
# refresh every half hour
hibernate.search.default.refresh = 1800
# appropriate directory provider
hibernate.search.default.directory_provider = filesystem-master
## Backend configuration
#Backend is the default lucene one</programlisting>
</example>
<tip>
<para>It is recommended that the refresh period be longer than the
expected copy time. If a copy operation is still in progress when the next
refresh triggers, that refresh is skipped, so it is safe to set this value
low even when the copy time is not known.</para>
</tip>
<para>In addition to the Hibernate Search framework configuration, a
Message Driven Bean has to be written and set up to process the queue of
index work received through JMS.</para>
<example>
<title>Message Driven Bean processing the indexing queue</title>
<programlisting language="JAVA" role="JAVA">@MessageDriven(activationConfig = {
@ActivationConfigProperty(propertyName="destinationType",
propertyValue="javax.jms.Queue"),
@ActivationConfigProperty(propertyName="destination",
propertyValue="queue/hibernatesearch"),
@ActivationConfigProperty(propertyName="DLQMaxResent", propertyValue="1")
} )
public class MDBSearchController extends AbstractJMSHibernateSearchController
implements MessageListener {
@PersistenceContext EntityManager em;
//method retrieving the appropriate session
protected Session getSession() {
return (Session) em.getDelegate();
}
//potentially close the session opened in #getSession(), not needed here
protected void cleanSessionIfNeeded(Session session) {
}
}</programlisting>
</example>
<para>This example inherits from the abstract JMS controller class
available in the Hibernate Search source code and implements a Java EE
MDB. This implementation is given as an example and can be adjusted to
make use of non-Java EE Message Driven Beans. For more information
about the <methodname>getSession()</methodname> and
<methodname>cleanSessionIfNeeded()</methodname> methods, please check
the javadoc of
<classname>AbstractJMSHibernateSearchController</classname>.</para>
</section>
</section>
<section id="jgroups-backend">
<title>JGroups Master/Slave back end</title>
<para>This section describes how to configure the JGroups Master/Slave
back end. The configuration examples illustrated in <xref
linkend="jms-backend"/> also apply here; only a different backend
(<constant>hibernate.search.worker.backend</constant>) needs to be
set.</para>
<para>All backends configured to use JGroups share the same Channel. The
JGroups <classname>JChannel</classname> is the main communication link
across all nodes participating in the same cluster group. Since it is
convenient and more efficient to have just one channel shared across all
backends, the Channel configuration properties are not defined per worker
but globally. See <xref
linkend="jgroups-channel-configuration"/>.</para>
<section>
<title>Slave nodes</title>
<para>Every index update operation is sent through a JGroups channel
to the master node. Index querying operations are executed on a local
index copy. Enabling the JGroups worker only makes sure the index
operations are sent to the master; you still have to synchronize the
index copies by configuring an appropriate directory provider (see the
<literal>filesystem-master</literal>,
<literal>filesystem-slave</literal> or <literal>infinispan</literal>
options in <xref linkend="search-configuration-directory"/>).</para>
<example>
<title>JGroups Slave configuration</title>
<programlisting>### slave configuration
hibernate.search.default.worker.backend = jgroupsSlave</programlisting>
</example>
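<para>Since the backend setting alone does not replicate the index, a
complete slave configuration would typically also declare a directory
provider. The following sketch assumes the file system based setup and
reuses the example paths and refresh period from <xref
linkend="jms-backend"/>; adjust them to your environment.</para>
<example>
<title>JGroups Slave configuration with a directory provider
(illustrative)</title>
<programlisting>### slave configuration (sketch)
## Backend configuration
hibernate.search.default.worker.backend = jgroupsSlave
## DirectoryProvider, same settings as for a JMS slave
# (remote) master location
hibernate.search.default.sourceBase = /mnt/mastervolume/lucenedirs/mastercopy
# local copy location
hibernate.search.default.indexBase = /Users/prod/lucenedirs
# refresh every half hour
hibernate.search.default.refresh = 1800
hibernate.search.default.directory_provider = filesystem-slave</programlisting>
</example>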
</section>
<section>
<title>Master node</title>
<para>Every index update operation is taken from a JGroups channel and
executed. The master index is copied on a regular basis.</para>
<example>
<title>JGroups Master configuration</title>
<programlisting>### master configuration
hibernate.search.default.worker.backend = jgroupsMaster</programlisting>
</example>
</section>
<section id="jgroups-channel-configuration">
<title>JGroups channel configuration</title>
<para>Configuring the JGroups channel essentially entails specifying
the transport in terms of a network protocol stack. To configure the
JGroups transport, point the configuration property
<constant>hibernate.search.services.jgroups.configurationFile</constant>
to a JGroups configuration file; this can be either a file path or a
Java resource name.</para>
<tip>
<para>If this property is not specified, the JGroups default
configuration file <literal>flush-udp.xml</literal>
is used. This example configuration is known to work in most
scenarios, with the notable exception of Amazon AWS; refer to the
<ulink url="http://www.jgroups.org/manual-3.x/html/">JGroups
manual</ulink> for more examples and protocol configuration
details.</para>
</tip>
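<para>For illustration, the property could be set as follows. The file
name <literal>my-jgroups-stack.xml</literal> is hypothetical and stands
for a JGroups protocol stack file that you provide yourself, either on the
file system or as a Java resource.</para>
<example>
<title>JGroups transport configuration (illustrative)</title>
<programlisting># hypothetical file name; a file path or a Java resource name is accepted
hibernate.search.services.jgroups.configurationFile = my-jgroups-stack.xml</programlisting>
</example>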
<para>The default channel name is <literal>Hibernate Search
Cluster</literal>, which can be changed as shown in <xref
linkend="example-jgroups-channel-name"/>.</para>
<example id="example-jgroups-channel-name">
<title>JGroups channel name configuration</title>
<programlisting>hibernate.search.services.jgroups.clusterName = My-Custom-Cluster-Id</programlisting>
</example>