-
Notifications
You must be signed in to change notification settings - Fork 242
/
advanced-features.xml
624 lines (518 loc) · 25.7 KB
/
advanced-features.xml
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
571
572
573
574
575
576
577
578
579
580
581
582
583
584
585
586
587
588
589
590
591
592
593
594
595
596
597
598
599
600
601
602
603
604
605
606
607
608
609
610
611
612
613
614
615
616
617
618
619
620
621
622
623
624
<?xml version="1.0" encoding="UTF-8"?>
<!--
~ Hibernate, Relational Persistence for Idiomatic Java
~
~ Copyright (c) 2010, Red Hat, Inc. and/or its affiliates or third-party contributors as
~ indicated by the @author tags or express copyright attribution
~ statements applied by the authors. All third-party contributions are
~ distributed under license by Red Hat, Inc.
~
~ This copyrighted material is made available to anyone wishing to use, modify,
~ copy, or redistribute it subject to the terms and conditions of the GNU
~ Lesser General Public License, as published by the Free Software Foundation.
~
~ This program is distributed in the hope that it will be useful,
~ but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
~ or FITNESS FOR A PARTICULAR PURPOSE. See the GNU Lesser General Public License
~ for more details.
~
~ You should have received a copy of the GNU Lesser General Public License
~ along with this distribution; if not, write to:
~ Free Software Foundation, Inc.
~ 51 Franklin Street, Fifth Floor
~ Boston, MA 02110-1301 USA
-->
<!DOCTYPE book PUBLIC "-//OASIS//DTD DocBook XML V4.5//EN"
"http://www.oasis-open.org/docbook/xml/4.5/docbookx.dtd" [
<!ENTITY % BOOK_ENTITIES SYSTEM "../hsearch.ent">
%BOOK_ENTITIES;
]>
<chapter id="search-lucene-native">
<title>Advanced features</title>
<para>In this final chapter we are offering a smorgasbord of tips and tricks
which might become useful as you dive deeper and deeper into Hibernate
Search.</para>
<section>
<title>Accessing the SearchFactory</title>
<para>The <classname>SearchFactory</classname> object keeps track of the
underlying Lucene resources for Hibernate Search. It is a convenient way
to access Lucene natively. The <literal>SearchFactory</literal> can be
accessed from a <classname>FullTextSession</classname>:</para>
<example>
<title>Accessing the <classname>SearchFactory</classname></title>
<programlisting language="JAVA" role="JAVA">FullTextSession fullTextSession = Search.getFullTextSession(regularSession);
SearchFactory searchFactory = fullTextSession.getSearchFactory();</programlisting>
</example>
</section>
<section id="IndexReaders">
<title>Using an IndexReader</title>
<para>Queries in Lucene are executed on an
<classname>IndexReader</classname>. Hibernate Search caches index readers
to maximize performance and implements other strategies to retrieve
updated <classname>IndexReader</classname>s in order to minimize IO
operations. Your code can access these cached resources, but you have to
follow some "good citizen" rules.</para>
<example>
<title>Accessing an <classname>IndexReader</classname></title>
<programlisting language="JAVA" role="JAVA">IndexReader reader = searchFactory.getIndexReaderAccessor().open(Order.class);
try {
//perform read-only operations on the reader
}
finally {
searchFactory.getIndexReaderAccessor().close(reader);
} </programlisting>
</example>
<para>In this example the <classname>SearchFactory</classname> figures out
which indexes are needed to query this entity. Using the configured
<classname>ReaderProvider</classname> (described in <xref
linkend="search-architecture-readerstrategy"/>) on each index, it returns
a compound <literal>IndexReader</literal> on top of all involved indexes.
Because this <classname>IndexReader</classname> is shared amongst several
clients, you must adhere to the following rules:</para>
<itemizedlist>
<listitem>
<para>Never call indexReader.close(), but always call
readerProvider.closeReader(reader), using a finally block.</para>
</listitem>
<listitem>
<para>Don't use this <classname>IndexReader</classname> for
modification operations: it's a readonly
<classname>IndexReader</classname>, you would get an
exception).</para>
</listitem>
</itemizedlist>
<para>Aside from those rules, you can use the
<classname>IndexReader</classname> freely, especially to do native Lucene
queries. Using this shared <classname>IndexReader</classname>s will be
more efficient than by opening one directly from - for example - the
filesystem.</para>
<para>As an alternative to the method <methodname>open(Class...
types)</methodname> you can use <methodname>open(String...
indexNames)</methodname>; in this case you pass in one or more index
names; using this strategy you can also select a subset of the indexes for
any indexed type if sharding is used.</para>
<example>
<title>Accessing an <classname>IndexReader by index
names</classname></title>
<programlisting language="JAVA" role="JAVA">IndexReader reader = searchFactory
.getIndexReaderAccessor()
.open("Products.1", "Products.3");</programlisting>
</example>
</section>
<section>
<title>Accessing a Lucene Directory</title>
<para>A <classname>Directory</classname> is the most common abstraction
used by Lucene to represent the index storage; Hibernate Search doesn't
interact directly with a Lucene <classname>Directory</classname> but
abstracts these interactions via an <classname>IndexManager</classname>:
an index does not necessarily need to be implemented by a
<classname>Directory</classname>.</para>
<para>If you are certain that your index is represented as a
<classname>Directory</classname> and need to access it, you can get a
reference to the <classname>Directory</classname> via the
<classname>IndexManager</classname>. You will have to cast the
<classname>IndexManager</classname> instance to a
<classname>DirectoryBasedIndexManager</classname> and then use
<literal>getDirectoryProvider().getDirectory()</literal> to get a
reference to the underlying <classname>Directory</classname>. This is not
recommended, if you need low level access to the index using Lucene APIs
we suggest to see <xref linkend="IndexReaders"/> instead.</para>
</section>
<section id="advanced-features-sharding">
<title>Sharding indexes</title>
<para>In some cases it can be useful to split (shard) the data into
several Lucene indexes. There are two main use use cases:</para>
<itemizedlist>
<listitem>
<para>A single index is so big that index update times are slowing the
application down. In this case static sharding can be used to split
the data into a pre-defined number of shards.</para>
</listitem>
<listitem>
<para>Data is naturally segmented by customer, region, language or
other application parameter and the index should be split according to
these segments. This is a use case for dynamic sharding.</para>
</listitem>
</itemizedlist>
<tip>
<para>By default sharding is not enabled.</para>
</tip>
<section>
<title id="advanced-features-static-sharding">Static sharding</title>
<para>To enable static sharding set the
<constant>hibernate.search.<indexName>.sharding_strategy.nbr_of_shards</constant>
property as seen in <xref linkend="example-index-sharding"/>.</para>
<example id="example-index-sharding">
<title>Enabling index sharding</title>
<programlisting>hibernate.search.[default|<indexName>].sharding_strategy.nbr_of_shards = 5</programlisting>
</example>
<para>The default sharding strategy which gets enabled by setting this
property, splits the data according to the hash value of the document id
(generated by the <classname>FieldBridge</classname>). This ensures a
fairly balanced sharding. You can replace the default strategy by
implementing a custom <classname>IndexShardingStrategy</classname>. To
use your custom strategy you have to set the
<constant>hibernate.search.[default|<indexName>].sharding_strategy</constant>
property to the fully qualified class name of your custom
<classname>IndexShardingStrategy</classname>.</para>
<example id="example-index-sharding-strategy">
<title>Registering a custom IndexShardingStrategy</title>
<programlisting>hibernate.search.[default|<indexName>].sharding_strategy = my.custom.RandomShardingStrategy</programlisting>
</example>
</section>
<section id="advanced-features-dynamic-sharding">
<title>Dynamic sharding</title>
<para>Dynamic sharding allows you to manage the shards yourself and even
create new shards on the fly. To do so you need to implement the
interface <classname>ShardIdentifierProvider</classname> and set the
<constant>hibernate.search.[default|<indexName>].sharding_strategy</constant>
property to the fully qualified name of this class. Note that instead of
implementing the interface directly, you should rather derive your
implementation from
<classname>org.hibernate.search.store.ShardIdentifierProviderTemplate</classname>
which provides a basic implementation. Let's look at <xref
linkend="example-custom-shard-identifier-provider"/> for an
example.<example id="example-custom-shard-identifier-provider">
<title>Custom <classname>ShardidentiferProvider</classname></title>
<programlisting>public static class AnimalShardIdentifierProvider extends ShardIdentifierProviderTemplate {
@Override
public String getShardIdentifier(Class<?> entityType, Serializable id,
String idAsString, Document document) {
if ( entityType.equals(Animal.class) ) {
String type = document.getFieldable("type").stringValue();
addShard(type);
return type;
}
throw new RuntimeException("Animal expected but found " + entityType);
}
@Override
protected Set<String> loadInitialShardNames(Properties properties, BuildContext buildContext) {
ServiceManager serviceManager = buildContext.getServiceManager();
SessionFactory sessionFactory = serviceManager.requestService(
HibernateSessionFactoryServiceProvider.class, buildContext);
Session session = sessionFactory.openSession();
try {
Criteria initialShardsCriteria = session.createCriteria(Animal.class);
initialShardsCriteria.setProjection( Projections.distinct(Property.forName("type")));
@SuppressWarnings("unchecked")
List<String> initialTypes = initialShardsCriteria.list();
return new HashSet<String>(initialTypes);
}
finally {
session.close();
}
}
}</programlisting>
</example>The are several things happening in
<classname>AnimalShardIdentifierProvider</classname>. First off its
purpose is to create one shard per animal type (e.g. mammal, insect,
etc.). It does so by inspecting the class type and the Lucene document
passed to the <methodname>getShardIdentifier()</methodname> method. It
extracts the <constant>type</constant> field from the document and uses
it as shard name. <methodname>getShardIdentifier()</methodname> is
called for every addition to the index and a new shard will be created
with every new animal type encountered. The base class
<classname>ShardIdentifierProviderTemplate</classname> maintains a set
with all known shards to which any identifier must be added by calling
<methodname>addShard()</methodname>.</para>
<para>It is important to understand that Hibernate Search cannot know which
shards already exist when the application starts. When using
<classname>ShardIdentifierProviderTemplate</classname> as base class of
a <classname>ShardIdentifierProvider</classname> implementation, the
initial set of shard identifiers must be returned by the
<methodname>loadInitialShardNames()</methodname> method. How this is
done will depend on the use case. However, a common case in combination
with Hibernate ORM is that the initial shard set is defined by the the
distinct values of a given database column. <xref
linkend="example-custom-shard-identifier-provider"/> shows how to handle
such a case. <classname>AnimalShardIdentifierProvider</classname> makes
in its <methodname>loadInitialShardNames()</methodname> implementation
use of a service called
<classname>HibernateSessionFactoryServiceProvider</classname> (see also
<xref linkend="section-services"/>) which is available within an ORM
environment. It allows to request a Hibernate
<classname>SessionFactory</classname> instance which can be used to run
a <classname>Criteria</classname> query in order to determine the
initial set of shard identifers.</para>
<para>Last but not least, the
<classname>ShardIdentifierProvider</classname> also allows for
optimizing searches by selecting which shard to run a query against. By
activating a filter (see <xref linkend="query-filter-shard"/>), a
sharding strategy can select a subset of the shards used to answer a
query (<classname>getShardIdentifiersForQuery()</classname>, not shown
in the example) and thus speed up the query execution.</para>
</section>
<important>
<para>This <classname>ShardIdentifierProvider</classname> is considered
experimental. We might need to apply some changes to the defined method
signatures to accomodate for unforeseen use cases. Please provide
feedback if you have ideas, or just to let us know how you're using
this API.</para>
</important>
</section>
<section id="section-sharing-indexes">
<title>Sharing indexes</title>
<para>It is technically possible to store the information of more than one
entity into a single Lucene index. There are two ways to accomplish
this:</para>
<itemizedlist>
<listitem>
<para>Configuring the underlying directory providers to point to the
same physical index directory. In practice, you set the property
<literal>hibernate.search.[fully qualified entity
name].indexName</literal> to the same value. As an example let’s use
the same index (directory) for the <classname>Furniture</classname>
and <classname>Animal</classname> entity. We just set
<literal>indexName</literal> for both entities to for example
“Animal”. Both entities will then be stored in the Animal
directory.</para>
<para><programlisting>hibernate.search.org.hibernate.search.test.shards.Furniture.indexName = Animal
hibernate.search.org.hibernate.search.test.shards.Animal.indexName = Animal</programlisting></para>
</listitem>
<listitem>
<para>Setting the <code>@Indexed</code> annotation’s
<methodname>index</methodname> attribute of the entities you want to
merge to the same value. If we again wanted all
<classname>Furniture</classname> instances to be indexed in the
<classname>Animal</classname> index along with all instances of
<classname>Animal</classname> we would specify
<code>@Indexed(index="Animal")</code> on both
<classname>Animal</classname> and <classname>Furniture</classname>
classes.<note>
<para>This is only presented here so that you know the option is
available. There is really not much benefit in sharing
indexes.</para>
</note></para>
</listitem>
</itemizedlist>
</section>
<section id="section-services">
<title>Using external services</title>
<para>Any of the pluggable contracts we have seen so far allows for the
injection of a service. The most notable example being the
<classname>DirectoryProvider</classname>. The full list is:</para>
<itemizedlist>
<listitem>
<para><classname>DirectoryProvider</classname></para>
</listitem>
<listitem>
<para><classname>ReaderProvider</classname></para>
</listitem>
<listitem>
<para><classname>OptimizerStrategy</classname></para>
</listitem>
<listitem>
<para><classname>BackendQueueProcessor</classname></para>
</listitem>
<listitem>
<para><classname>Worker</classname></para>
</listitem>
<listitem>
<para><classname>ErrorHandler</classname></para>
</listitem>
<listitem>
<para><classname>MassIndexerProgressMonitor</classname></para>
</listitem>
</itemizedlist>
<para>Some of these components need to access a service which is either
available in the environment or whose lifecycle is bound to the
<classname>SearchFactory</classname>. Sometimes, you even want the same
service to be shared amongst several instances of these contract. One
example is the ability the share an Infinispan cache instance between
several directory providers running in different <literal>JVM</literal>s
to store the various indexes using the same underlying infrastructure;
this provides real-time replication of indexes across nodes.</para>
<section>
<title>Exposing a service</title>
<para>To expose a service, you need to implement
<classname>org.hibernate.search.spi.ServiceProvider<T></classname>.
<classname>T</classname> is the type of the service you want to use.
Services are retrieved by components via their
<classname>ServiceProvider</classname> class implementation.</para>
<section>
<title>Managed services</title>
<para>If your service ought to be started when Hibernate Search starts
and stopped when Hibernate Search stops, you can use a managed
service. Make sure to properly implement the
<methodname>start</methodname> and <methodname>stop</methodname>
methods of <classname>ServiceProvider</classname>. When the service is
requested, the <methodname>getService</methodname> method is
called.</para>
<example>
<title>Example of ServiceProvider implementation</title>
<programlisting language="JAVA" role="JAVA">public class CacheServiceProvider implements ServiceProvider<Cache> {
private CacheManager manager;
public void start(Properties properties) {
//read configuration
manager = new CacheManager(properties);
}
public Cache getService() {
return manager.getCache(DEFAULT);
}
void stop() {
manager.close();
}
}</programlisting>
</example>
<note>
<para>The <classname>ServiceProvider</classname> implementation must
have a no-arg constructor.</para>
</note>
<para>To be transparently discoverable, such service should have an
accompanying
<filename>META-INF/services/org.hibernate.search.spi.ServiceProvider</filename>
whose content list the (various) service provider
implementation(s).</para>
<example>
<title>Content of
META-INF/services/org.hibernate.search.spi.ServiceProvider</title>
<programlisting>com.acme.infra.hibernate.CacheServiceProvider</programlisting>
</example>
</section>
<section>
<title>Provided services</title>
<para>Alternatively, the service can be provided by the environment
bootstrapping Hibernate Search. For example, Infinispan which uses
Hibernate Search as its internal search engine can pass the
<classname>CacheContainer</classname> to Hibernate Search. In this
case, the <classname>CacheContainer</classname> instance is not
managed by Hibernate Search and the
<methodname>start</methodname>/<methodname>stop</methodname> methods
of its corresponding service provider will not be used.</para>
<note>
<para>Provided services have priority over managed services. If a
provider service is registered with the same
<classname>ServiceProvider</classname> class as a managed service,
the provided service will be used.</para>
</note>
<para>The provided services are passed to Hibernate Search via the
<classname>SearchConfiguration</classname> interface
(<methodname>getProvidedServices</methodname>).</para>
<important>
<para>Provided services are used by frameworks controlling the
lifecycle of Hibernate Search and not by traditional users.</para>
</important>
<para>If, as a user, you want to retrieve a service instance from the
environment, use registry services like JNDI and look the service up
in the provider.</para>
</section>
</section>
<section>
<title>Using a service</title>
<para>Many of of the pluggable contracts of Hibernate Search can use
services. Services are accessible via the
<classname>BuildContext</classname> interface.</para>
<example>
<title>Example of a directory provider using a cache service</title>
<programlisting language="JAVA" role="JAVA">public CustomDirectoryProvider implements DirectoryProvider<RAMDirectory> {
private BuildContext context;
public void initialize(
String directoryProviderName,
Properties properties,
BuildContext context) {
//initialize
this.context = context;
}
public void start() {
Cache cache = context.requestService(CacheServiceProvider.class);
//use cache
}
public RAMDirectory getDirectory() {
// use cache
}
public stop() {
//stop services
context.releaseService(CacheServiceProvider.class);
}
}</programlisting>
</example>
<para>When you request a service, an instance of the service is served
to you. Make sure to then release the service. This is fundamental. Note
that the service can be released in the
<methodname>DirectoryProvider.stop</methodname> method if the
<classname>DirectoryProvider</classname> uses the service during its
lifetime or could be released right away of the service is simply used
at initialization time.</para>
</section>
</section>
<section>
<title>Customizing Lucene's scoring formula</title>
<para>Lucene allows the user to customize its scoring formula by extending
<classname>org.apache.lucene.search.Similarity</classname>. The abstract
methods defined in this class match the factors of the following formula
calculating the score of query q for document d:</para>
<para><emphasis role="bold">score(q,d) = coord(q,d) · queryNorm(q) · ∑
<subscript>t in q</subscript> ( tf(t in d) · idf(t)
<superscript>2</superscript> · t.getBoost() · norm(t,d) )
</emphasis></para>
<para><informaltable align="left" width="">
<tgroup cols="2">
<thead>
<row>
<entry align="center">Factor</entry>
<entry align="center">Description</entry>
</row>
</thead>
<tbody>
<row>
<entry align="left">tf(t ind)</entry>
<entry>Term frequency factor for the term (t) in the document
(d).</entry>
</row>
<row>
<entry align="left">idf(t)</entry>
<entry>Inverse document frequency of the term.</entry>
</row>
<row>
<entry align="left">coord(q,d)</entry>
<entry>Score factor based on how many of the query terms are
found in the specified document.</entry>
</row>
<row>
<entry align="left">queryNorm(q)</entry>
<entry>Normalizing factor used to make scores between queries
comparable.</entry>
</row>
<row>
<entry align="left">t.getBoost()</entry>
<entry>Field boost.</entry>
</row>
<row>
<entry align="left">norm(t,d)</entry>
<entry>Encapsulates a few (indexing time) boost and length
factors.</entry>
</row>
</tbody>
</tgroup>
</informaltable>It is beyond the scope of this manual to explain this
formula in more detail. Please refer to
<classname>Similarity</classname>'s Javadocs for more information.</para>
<para>Hibernate Search provides two ways to modify Lucene's similarity
calculation.</para>
<para>First you can set the default similarity by specifying the fully
specified classname of your <classname>Similarity</classname>
implementation using the property
<constant>hibernate.search.similarity</constant>. The default value is
<classname>org.apache.lucene.search.DefaultSimilarity</classname>.</para>
<para>Secondly, you can override the similarity used for a specific index
by setting the <literal>similarity</literal> property for this index (see
<xref linkend="search-configuration-directory"/> for more information
about index configuration):</para>
<programlisting>hibernate.search.[default|<indexname>].similarity = my.custom.Similarity</programlisting>
<para>As an example, let's assume it is not important how often a term
appears in a document. Documents with a single occurrence of the term
should be scored the same as documents with multiple occurrences. In this
case your custom implementation of the method <methodname>tf(float freq)
</methodname> should return 1.0.</para>
<note>
<para>When two entities share the same index they must declare the same
<classname>Similarity</classname> implementation.</para>
</note>
<note>
<para>The use of <classname>@Similarity</classname> which was used to
configure the similarity on a class level is deprecated since Hibernate
Search 4.4. Instead of using the annotation use the configuration
property.</para>
</note>
</section>
</chapter>