<html>
<head>
<title>Fedora Generic Search Service</title>
<link rel="stylesheet" type="text/css" href="css/docstyle.css"/>
<link rel="stylesheet" type="text/css" href="../css/docstyle.css"/>
<style type="text/css">
.toc {
background: #CCCCCC;
}
.toc p {
margin-left: 20px;
line-height: 30px;
}
.toc dt {
margin-left: 20px;
line-height: 30px;
}
ul {
list-style: square outside none;
padding-top: 6px;
}
ul ul {
list-style: disc outside none;
padding-top: 6px;
margin-bottom: 10px;
}
li.MsoNormal {
mso-style-parent:"";
margin-bottom:.0001pt;
font-size:12.0pt;
font-family:"Times New Roman";
margin-left:0in; margin-right:0in; margin-top:0in
}
</style>
</head>
<body>
<div id="header">
<a href="" id="logo"></a>
<div id="title">
<h1>Fedora Generic Search Service Version 2.4.2</h1>
<h2>compatible with Fedora Version 3.5</h2>
</div>
</div>
<div>
<p>This is the one-and-only documentation page for the Fedora Generic Search Service,
abbreviated fedoragsearch or GSearch.
</p>
<p>You, the reader, are presumably responsible for or involved in making your digital contents in Fedora
searchable for your end-users. GSearch makes this task relatively easy.
</p>
<p>GSearch comes with three plugins for top-class open-source search engines, Apache Lucene, Apache Solr, and Zebra.
</p>
<p>Your choice of search engine plugin depends on circumstances: </p>
<ul>
<li>If you are a single developer or a small team, you may prefer the easiest route, which is the Lucene plugin.
</li>
<li>If you want to keep all options open, choose the Solr plugin, which requires you to know and do much more.
</li>
<li>The Zebra plugin is the choice that nobody has taken so far, because it comes from a culture different from that of Fedora.
</li>
</ul>
<p>The choice is made by configuration.
</p>
</div>
<p></p>
<div class="toc">
<p><b>Table of Contents</b></p>
<dl>
<dt><a href="#DEMONSTRATION">I. DEMONSTRATION</a></dt>
<dt> <a href="#owndemo">See a demo at your own site, almost out-of-the-box</a></dt>
<dt><a href="#OVERALLDESCRIPTION">II. OVERALL DESCRIPTION</a></dt>
<dt> <a href="#majorfeatures">Major features</a></dt>
<dt> <a href="#updateIndex">More on the updateIndex operation</a></dt>
<dt> <a href="#engines">Search engine plugins</a></dt>
<dt><a href="#CONFIGURATION">III. CONFIGURATION</a></dt>
<dt> <a href="#realapp">Install and configure for your application</a></dt>
<dt> <a href="#config">Create the configuration files</a></dt>
<dt> <a href="#indexingstylesheet">Generate indexing stylesheet from example foxml files</a></dt>
<dt> <a href="#basicproperties">Edit and use the basic property values</a></dt>
<dt> <a href="#gfauto">Configuring GSearch and Fedora for automatic updates</a></dt>
<dt> <a href="#multilingual">Multilingual configuration</a></dt>
<dt><a href="#FURTHERUSAGE">IV. FURTHER USAGE</a></dt>
<dt> <a href="#extraction">Full-text and metadata extraction from datastreams using Apache Tika</a></dt>
<dt> <a href="#endusersearch">Customizable end-user search client</a></dt>
<dt> <a href="#searchresfilt">Search result filtering</a></dt>
<dt> <a href="#configman">Management of GSearch configurations in Fedora objects</a></dt>
<dt> <a href="#objectsnnindex">Many-to-many relationship between Fedora objects and index documents</a></dt>
<dt> <a href="#reposnnindex">Many-to-many relationship between repositories and indexes</a></dt>
<dt> <a href="#embeddedqueries">Embedded queries</a></dt>
<dt> <a href="#source">Building from source</a></dt>
<dt><a href="#HISTORY">V. HISTORY</a></dt>
<dt> <a href="#new241">New features in version 2.4.1</a></dt>
<dt> <a href="#new24">New features in version 2.4</a></dt>
<dt> <a href="#new23">New features in version 2.3</a></dt>
<dt> <a href="#new22">New features in version 2.2</a></dt>
<dt> <a href="#new211">New features in version 2.1.1</a></dt>
<dt> <a href="#new21">New features in version 2.1</a></dt>
<dt> <a href="#new20">New features in version 2.0</a></dt>
<dt> <a href="#background">Background</a></dt>
</dl>
</div>
<!--
<div>
<a name="DTUdemo"><h2>See a demo at DTU</h2></a>
<p>This documentation page is also visible at
<a href="http://miranth.cvt.dk/fedoragsearch" target="DTUdemo">the DTU demo site</a>.
</p>
<p>The demo uses a Fedora 3.5 repository, where the set of Fedora demo objects has been ingested
and indexed by GSearch. You can view it through the GSearch administrator interface,
which has 5 pages. Step through it in this sequence (login as gsearchGuest:gsearchGuestPass):</p>
<ul>
<li>
<a href="http://miranth.cvt.dk/fedoragsearch/rest?operation=getRepositoryInfo" target="DTUdemo">The Repository Info page</a>.
</li>
<li>
<a href="http://miranth.cvt.dk/fedoragsearch/rest?operation=getIndexInfo" target="DTUdemo">The Index Info page</a>.
</li>
<li>
<a href="http://miranth.cvt.dk/fedoragsearch/rest?operation=updateIndex" target="DTUdemo">The Update Index page</a>.
</li>
<li>
<a href="http://miranth.cvt.dk/fedoragsearch/rest?operation=browseIndex" target="DTUdemo">The Browse Index page</a>.
</li>
<li>
<a href="http://miranth.cvt.dk/fedoragsearch/rest?operation=gfindObjects" target="DTUdemo">The Search page</a>.
</li>
</ul>
<p>You may press buttons and enter queries, but you will notice that you are not authorized to do the updateIndex actions.
</p>
</div>
-->
<div>
<a name="DEMONSTRATION"><h1>I. DEMONSTRATION</h1></a>
</div>
<div>
<a name="owndemo"><h2>See a demo at your own site, almost out-of-the-box</h2></a>
<p>Perform these steps:</p>
<ul>
<li>Create a Fedora 3.5 installation by quick install. The only piece of custom configuration needed is setting
the value of the param enabled to true for the Messaging module in fedora.fcfg:
<pre>&lt;module role="org.fcrepo.server.messaging.Messaging" class="org.fcrepo.server.messaging.MessagingModule"&gt;
&lt;comment&gt;Fedora's Java Messaging Service (JMS) Module&lt;/comment&gt;
&lt;param name="enabled" value="true"/&gt;</pre>
</li>
<li>Download fedoragsearch.war
from either <a href="http://www.cvt.dk/fedoragsearch">the DTU prerelease site</a>,
or from <a href="https://wiki.duraspace.org/display/FCSVCS/Fedora+Framework+Services">the official Duraspace site</a>.
Alternatively, you may <a href="#source">build fedoragsearch.war from source</a>.
</li>
<li>Copy fedoragsearch.war into the tomcat webapps directory of your Fedora installation.
Tomcat will unpack it, if it is running, or else when you start it.
</li>
<li>Create a GSearch administrator in fedora-users.xml:
<pre>&lt;user name="fgsAdmin" password="fgsAdminPassword"&gt;
&lt;attribute name="fedoraRole"&gt;
&lt;value&gt;administrator&lt;/value&gt;
&lt;/attribute&gt;
&lt;/user&gt;</pre>
Note that only users named 'fedoraAdmin' or 'fgsTester', or with names starting with 'fgsAdmin',
are authorized to perform updateIndex actions.
</li>
<li>Create the set of configuration files.
All you need to do is edit a few of the property values
in the file webapps/fedoragsearch/FgsConfig/fgsconfig-basic.properties, including passwords,
and run
<pre>> ant -f fgsconfig-basic.xml</pre>
The ant script finishes by writing the fgsconfigFinal files
to the classpath location that you have chosen,
so you need to run it with permission to write there.
</li>
<li>Restart tomcat.
</li>
<li>Now this documentation page is visible at
<a href="#" target="owndemo">your own demo site</a>,
and the admin pages are visible here:
<ul>
<li>
<a href="rest?operation=updateIndex" target="owndemorest">The Update Index page</a>.
</li>
<li>
<a href="rest?operation=browseIndex" target="owndemorest">The Browse Index page</a>.
</li>
<li>
<a href="rest?operation=gfindObjects" target="owndemorest">The Search page</a>.
</li>
<li>
<a href="rest?operation=getRepositoryInfo" target="owndemorest">The Repository Info page</a>.
</li>
<li>
<a href="rest?operation=getIndexInfo" target="owndemorest">The Index Info page</a>.
</li>
</ul>
</li>
<li>There is a
<a href="rest?operation=gfindObjects&restXslt=enduserSearchToHtml" target="owndemorest">customizable end-user search client page</a>.
See the section on <a href="#endusersearch">Customizable end-user search client</a>.
</li>
<li>Ingest the Fedora 3.5 demo objects. There are 41 of them, of which 20 are data objects and will be indexed.
Then view the admin pages.
</li>
</ul>
</div>
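<div>
<p>The naming rule for updateIndex authorization quoted in the steps above can be sketched as a small predicate.
This is only an illustration of the rule as stated in the text, not GSearch's actual authorization code:
the function name <code>may_update_index</code> is hypothetical, and case-sensitive matching is an assumption.</p>

```python
# User names taken from the text above; the helper itself is hypothetical.
EXACT_NAMES = {"fedoraAdmin", "fgsTester"}
ADMIN_PREFIX = "fgsAdmin"


def may_update_index(user_name):
    """Return True if the user name matches the naming rule quoted above.

    Assumption: the comparison is case-sensitive.
    """
    return user_name in EXACT_NAMES or user_name.startswith(ADMIN_PREFIX)


if __name__ == "__main__":
    for name in ("fedoraAdmin", "fgsAdmin", "fgsAdminBackup", "fgsTester", "gsearchGuest"):
        print(name, may_update_index(name))
```

<p>Under this reading, a user such as fgsAdminBackup would be authorized, while any other name would need to be one of the two exact names.</p>
</div>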
<div>
<a name="OVERALLDESCRIPTION"><h1>II. OVERALL DESCRIPTION</h1></a>
</div>
<div>
<a name="majorfeatures"><h2>Major features</h2></a>
<p>The service has the following major features:</p>
<ul>
<li>Indexing of Fedora FOXML records,
including the text contents of datastreams
and the results of disseminator calls.</li>
<li>Search in the index.</li>
<li>Plugin of selected search engines,
so far
<a href="http://lucene.apache.org/">Lucene</a>,
<a href="http://lucene.apache.org/solr">Solr</a> and
<a href="http://www.indexdata.dk/zebra/">Zebra</a>.</li>
</ul>
<p>You are encouraged to share problems and experience with the
Fedora community by sending mail to
<a href="mailto:fedora-commons-users@lists.sourceforge.net">fedora-commons-users</a>, or to
<a href="mailto:cwilper@duraspace.org">Chris Wilper</a>, or to
<a href="mailto:gsp@dtic.dtu.dk">Gert Schmeltz Pedersen</a>.</p>
<p>The following figure gives a first overview
for a developer who will use GSearch in a Fedora application:</p>
<p><img src="images/fgs-model.png"/></p>
<p>The figure shows:</p>
<ul>
<li>A REST client, running in a user's browser, which
may combine accesses to Fedora and to GSearch.</li>
<li>A SOAP client, running anywhere, may do the same.</li>
<li>The Search Service implements a generic set of operations:
<ul>
<li><b>updateIndex</b> - indexing the contents of the Fedora repository.</li>
<li><b>gfindObjects</b> - search similar to Fedora findObjects and to the SRW/SRU operation <b>searchRetrieve</b>.</li>
<li><b>browseIndex</b> - browsing terms in a given index, similar to the SRW/SRU operation <b>scan</b>.</li>
<li><b>getRepositoryInfo</b> - describing the properties of a repository.</li>
<li><b>getIndexInfo</b> - describing the properties of an index.</li>
</ul>
</li>
<li>Engine specific implementations of the operations will receive
client requests, communicate with the engine indexer and search server,
and return the responses in the appropriate form to the clients.</li>
</ul>
<p>GSearch may run in a separate
web server and may index more than one Fedora repository,
and it may update more than one index in parallel.
</p>
<p>XSLT stylesheets are part of the configuration of GSearch,
and XSLT transformations play an essential role in the workflow:
</p>
<p><img src="images/fgs-arch.png"/></p>
<ul>
<li>
All engine specific operations return
an engine specific xml answer, which is transformed
by an engine-specific xslt stylesheet into result page xml.
For a SOAP request this is the answer.
For a REST request this is transformed to an html answer.
There may be any number of xslt stylesheets to select from;
the default ones are selected in the properties file.
Selecting a copy stylesheet passes
an answer through untransformed. An alternative result page format
is <a href="http://opensearch.a9.com/">OpenSearch</a>,
which is an RSS2.0 extension.
</li>
<li>Parameters allow clients
to select repository, index, and xslt stylesheets by name.
In a real application, these values may be determined
by the developer in the code,
or by the administrator in the properties file.
</li>
</ul>
</div>
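<div>
<p>As a rough sketch of how a REST client might address the operations listed above, the following helper builds request URLs.
The parameter names <code>operation</code> and <code>restXslt</code> appear in the links on this page;
the parameter name <code>query</code>, the helper name <code>gsearch_rest_url</code>, and the host and port are assumptions for illustration
and should be checked against the admin pages of your installation.</p>

```python
from urllib.parse import urlencode


def gsearch_rest_url(base_url, operation, **params):
    """Build a GSearch REST URL; parameters that are None or empty are left out."""
    query = {"operation": operation}
    query.update({k: v for k, v in params.items() if v not in (None, "")})
    return base_url + "?" + urlencode(query)


if __name__ == "__main__":
    # Assumed endpoint of a local installation.
    base = "http://localhost:8080/fedoragsearch/rest"
    print(gsearch_rest_url(base, "getIndexInfo"))
    # 'query' is an assumed parameter name; 'restXslt' selects a stylesheet by name.
    print(gsearch_rest_url(base, "gfindObjects", query="image",
                           restXslt="enduserSearchToHtml"))
```

<p>Using <code>urlencode</code> keeps special characters in parameter values correctly escaped, which matters for queries containing spaces or colons.</p>
</div>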
<div>
<a name="updateIndex"><h2>More on the updateIndex operation</h2></a>
<p><img src="images/fgs-arch-indexing.png"/></p>
<ul>
<li>Objects in the Fedora repository are exported
in FOXML format, transformed into an appropriate
document format by the indexing stylesheet, and
indexed by the engine in question. The XML datastreams
are indexed as decided in the stylesheet.
</li>
<li>The following updateIndex actions are available:
<ul>
<li><b>createEmpty</b> - creating or emptying the index.
For a new index, you have to run createEmpty once, before
you can run the other actions.</li>
<li><b>fromFoxmlFiles ( filePath )</b> - indexing FOXML records;
filePath may be null, in which case the configured
Fedora Object Directory is used, so that the whole
of the Fedora repository is indexed.</li>
<li><b>fromPid ( PID )</b> - indexing one FOXML record,
as exported by Fedora API-M; if a previous
index document with the same PID exists, it is deleted first.
This is the incremental update operation that is called after
every Fedora API-M operation that modifies a Fedora object,
if <a href="#gfauto">GSearch and Fedora are configured for automatic updates</a>.</li>
<li><b>deletePid ( PID )</b> - deleting one index document,
called by automatic updates after a Fedora purgeObject.</li>
</ul>
</li>
</ul>
</div>
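<div>
<p>The semantics of the updateIndex actions described above can be illustrated with a toy in-memory model.
This is a sketch under stated assumptions, not the GSearch implementation: the class <code>ToyIndex</code> and its method names are hypothetical,
and a plain dictionary stands in for the search engine index.</p>

```python
class ToyIndex:
    """In-memory stand-in for a search index, keyed by PID (hypothetical model)."""

    def __init__(self):
        # None means that createEmpty has not been run yet.
        self.docs = None

    def create_empty(self):
        """createEmpty: create the index, or empty an existing one."""
        self.docs = {}

    def _require_index(self):
        if self.docs is None:
            raise RuntimeError("run createEmpty once before other actions")

    def from_pid(self, pid, document):
        """fromPid: delete any previous document with this PID, then index."""
        self._require_index()
        self.docs.pop(pid, None)
        self.docs[pid] = document

    def delete_pid(self, pid):
        """deletePid: remove one index document; report whether it existed."""
        self._require_index()
        existed = pid in self.docs
        self.docs.pop(pid, None)
        return existed


if __name__ == "__main__":
    index = ToyIndex()
    index.create_empty()
    index.from_pid("demo:5", "first version")
    index.from_pid("demo:5", "second version")  # replaces, does not duplicate
    print(len(index.docs), index.docs["demo:5"])
    print(index.delete_pid("demo:5"))
```

<p>The point of the model is that fromPid is idempotent per PID, which is what makes it safe to call after every modification of an object.</p>
</div>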
<div>
<a name="engines"><h2>Search engine plugins</h2></a>
<h4><a href="http://lucene.apache.org/">Lucene</a></h4>
<p>The Lucene plugin comes in fedoragsearch.war as the java package dk.defxws.fgslucene
together with the Apache Lucene java libraries.</p>
<p>The Lucene plugin is configured during
<a href="#basicproperties">Edit and use the basic property values</a> below,
resulting in the set of GSearch configuration files.</p>
<p>Lucene has very rich functionality, and this plugin
lets you configure many of its options; all other options
keep their default values.
As a Java programmer, you may
have ideas for further exploitation, which you can realize
by implementing an enhanced version of the plugin.
Please share such ideas and implementations with the Fedora community.</p>
<h4><a href="http://lucene.apache.org/solr">Solr</a></h4>
<p>The Solr server is downloaded, installed and configured as described at the Solr web site.</p>
<p>The Solr server uses the Lucene java libraries for indexing and search.</p>
<p>The Solr plugin comes in fedoragsearch.war as the java package dk.defxws.fgssolr</p>
<p>The Solr plugin is configured during
<a href="#basicproperties">Edit and use the basic property values</a> below,
resulting in the set of GSearch configuration files.</p>
<p>The Solr plugin has dependencies on the configuration of the Solr server.
You should begin with the schema.xml file provided by GSearch in
FgsConfig/FgsConfigIndexTemplate/Solr/conf/schema-3.6.0-for-fgs-2.4.2.xml .
It has a few modifications aimed at the Fedora demo objects.
You should also consider the autoCommit element in solrconfig.xml .
Besides, you need to go through all the Solr conf files
and make sure they match the index documents generated by your GSearch indexing stylesheet.</p>
<p>This plugin indexes documents via the HTTP POST interface of Solr.
Searches may be performed via the Solr native HTTP GET to the Solr server
and via gfindObjects, which accesses the Lucene index directly.</p>
<p>Solr functionality does not include browsing; however, the plugin
offers this via the browseIndex operation,
which also accesses the Lucene index directly.</p>
<p>If you run Islandora</p>
<h4><a href="http://www.indexdata.dk/zebra/">Zebra</a></h4>
<p>The Zebra plugin comes in fedoragsearch.war as the java package dk.defxws.fgszebra .</p>
<p>The Zebra plugin is selected by configuration, as
shown in FgsConfig/FgsConfigIndexTemplate/Zebra/zebraconfig, which includes a README file
explaining how to get, install, and configure Zebra.</p>
</div>
<div>
<a name="CONFIGURATION"><h1>III. CONFIGURATION</h1></a>
</div>
<div>
<a name="realapp"><h2>Install and configure for your application</h2></a>
<p>Perform these steps:</p>
<ul>
<li>Download fedoragsearch.war as above and copy it to a tomcat or similar web server.
It does not need to be the web server running Fedora itself.
You may rename the .war file, before you copy it
into the webapps directory, in order to give it another webapp name.
</li>
<li>Set the value of the param enabled to true for the Messaging module
in fedora.fcfg as <a href="#owndemo">for the demo at your own site</a>.
</li>
<li>Now this documentation page is visible at
<a href="#">your own site</a>,
and
<a href="rest?operation=updateIndex">the admin pages are here</a>.
</li>
<li>The SOAP service operations are deployed with the .war file, and
<a href="services/FgsOperations?wsdl">the .wsdl file is available here.</a>
</li>
<li>The choice of search engine is made with the fgsindex.operationsImpl property
in your index.properties file, as set in the file fgsconfig-basic.properties (see below).
If you choose Solr or Zebra, you have to install and start the respective server.
</li>
<li><a name="config"><h3>Create the configuration files</h3></a>
<ul>
<li>
If you <b>migrate from GSearch 2.2 or 2.3 to 2.4.*</b>,
you simply reuse the configuration files you have.
The only things you must do from 2.2 are
rename the root directory of the configuration files
from 'config' to 'fgsconfigFinal',
and if you use sortType AUTO in index.properties (explicitly or by default),
change to STRING (because AUTO is deprecated in Lucene 3.*).
You may want to add new properties introduced in 2.3 and 2.4.*.
If you kept the configuration files within tomcat in the default classpath,
you may want to move them outside, see below.
</li>
<li>
If you <b>start with GSearch 2.4.*</b>, creating the configuration files is much simpler than before. Here are the two basic parts:
<ul>
<li>
<a name="indexingstylesheet"><h3>Generate indexing stylesheet from example foxml files</h3></a>
<ul>
<li>Copy the directory webapps/fedoragsearch/FgsConfig to a location outside tomcat.
</li>
<li>Go to this location.
</li>
<li>Put one or more example foxml files in FgsConfig/indexingXsltGenerator/foxml .
They must end with newline.
If you want to index managed xml datastreams, insert an example inline,
see the example in the test file FgsConfig/indexingXsltGenerator/foxml/test_fgs23.xml.
</li>
<li>At FgsConfig run <pre>>ant generateIndexingXslt</pre>
</li>
<li>Now you have
<pre>FgsConfig/FgsConfigIndexTemplate/Lucene/foxmlToLuceneGenerated.xslt</pre>
and <pre>FgsConfig/FgsConfigIndexTemplate/Solr/foxmlToSolrGenerated.xslt</pre>
You may use them as they are, or copy them to another name and edit the copies;
there are probably many index fields that you do not want.
Put the name of the stylesheet into the basic property file
in order to use it at indexing time.
</li>
<li>There are foxmlToLucene.xslt and foxmlToSolr.xslt files, useful for the Fedora demo objects,
that you may use for customizing instead of generating from foxml files.
</li>
</ul>
</li>
<li>
<a name="basicproperties"><h3>Edit and use the basic property values</h3></a>
You edit a basic property file and run an ant script with it.
This will insert your property values into your copy of a set of template configuration files,
providing the final set of configuration files.
These may be edited, if you want to select among more than the basic configuration options.
Here are the basic steps in more detail:
<ul>
<li>Edit the file FgsConfig/fgsconfig-basic.properties
</li>
<li>Run with privilege to write to the final config location,
that you stated in fgsconfig-basic.properties:<pre>
> ant -f fgsconfig-basic.xml</pre>
</li>
<li>This has used the property values in fgsconfig-basic.properties
and inserted them into the copies of the template config files,
that now make up the final config files, which have been copied
to the final config location.
</li>
<li>The location of the final config files must be in the tomcat classpath,
so that GSearch can find them at startup.
By default webapps/fedoragsearch/WEB-INF/classes is in the tomcat classpath.
Alternatively, you may add another classpath location to tomcat
in catalina.properties in the line starting with <pre>shared.loader=</pre>
and state that location in fgsconfig-basic.properties.
Make sure that there is only one 'fgsconfigFinal'-directory
and one log4j.xml file in the classpath.
</li>
<li>You should read through the final config files.
You may edit all the properties of the final config files.
If you do edit them, and they are within tomcat,
be sure to keep a copy outside tomcat.
The reason is that if you put a new fedoragsearch.war into tomcat webapps,
tomcat will delete the existing unpacked fedoragsearch directory,
including your edited final config files.
</li>
</ul>
</li>
</ul>
</li>
</ul>
</li>
<br/>
<li>
The default webapp configuration in
.../webapps/fedoragsearch/WEB-INF/web.xml
enforces authorization based on fedora-users.xml.
Only users named 'fedoraAdmin' or 'fgsTester',
or with names starting with 'fgsAdmin',
are then authorized to perform updateIndex actions.
If you do not want to enforce authorization,
copy the file web_withoutAuthN.xml onto web.xml.
Then even updateIndex actions are unprotected.
</li>
<li>
Then you may restart fedoragsearch and call http://&lt;HOSTPORT&gt;/fedoragsearch/rest in order to index and search.
The name "rest" may be reconfigured in
.../webapps/fedoragsearch/WEB-INF/web.xml
</li>
<li>
Try the command line client. Change directory to
<pre>.../webapps/fedoragsearch/client/</pre>
make the file runRESTClient.sh executable, and run
<pre>sh runRESTClient.sh</pre>
to get the usage instructions.
</li>
<li>For your real applications, you may provide alternative stylesheets
in webapps/fedoragsearch/WEB-INF/classes/config/rest
and set their names in webapps/fedoragsearch/WEB-INF/classes/config/fedoragsearch.properties.
</li>
<li>
Inspect the Lucene index with <a href="http://code.google.com/p/luke/">Luke</a>.
Note that Luke cannot open an empty Lucene index.
</li>
</ul>
</div>
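<div>
<p>The step "Edit and use the basic property values" above inserts property values into copies of template configuration files.
The general idea can be sketched as follows. This is not the ant script shipped with GSearch:
the <code>@key@</code> token syntax, the property key <code>example.indexName</code>, and both function names are assumptions made only for this illustration.</p>

```python
import re


def parse_properties(text):
    """Parse simple key=value lines; lines starting with '#' or '!' are comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line[0] in "#!":
            continue
        key, sep, value = line.partition("=")
        if sep:
            props[key.strip()] = value.strip()
    return props


def fill_template(template, props):
    """Replace each @key@ token that has a value; unknown tokens stay unchanged."""
    def substitute(match):
        return props.get(match.group(1), match.group(0))
    return re.sub(r"@([A-Za-z0-9_.]+)@", substitute, template)


if __name__ == "__main__":
    # Hypothetical property and template content, for illustration only.
    basic = "# basic values\nexample.indexName = DemoIndex\n"
    template = "indexName = @example.indexName@\nother = @not.defined@\n"
    print(fill_template(template, parse_properties(basic)))
```

<p>Leaving unknown tokens untouched makes missing property values easy to spot when you read through the final config files.</p>
</div>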
<div>
<a name="gfauto"><h2>Configuring GSearch and Fedora for automatic updates</h2></a>
<p>
By default, GSearch is configured for automatic updates through Fedora notifications.
To understand or modify this,
see the property fedoragsearch.updaternames in fedoragsearch.properties,
and updater.properties in config/updater/FgsUpdaters.
</p>
<p>
By default, Fedora is NOT configured for automatic updates through notifications to GSearch.
</p>
<p>
In order to configure Fedora for automatic updates through notifications to GSearch,
set the value of the param <code>enabled</code> to <code>true</code> for the Messaging module in <code>fedora.fcfg</code>:
<pre>&lt;module role="org.fcrepo.server.messaging.Messaging" class="org.fcrepo.server.messaging.MessagingModule"&gt;
&lt;comment&gt;Fedora's Java Messaging Service (JMS) Module&lt;/comment&gt;
&lt;param name="enabled" value="true"/&gt;</pre>
</p>
<p>As a deprecated alternative to updates via messaging,
it is possible to configure Fedora to
send a signal via REST to GSearch, when objects are added, modified,
and purged. Do NOT enable both alternatives.
<br/>
To enable REST-based updates, edit your <code>fedora.fcfg</code> file
and change the class of the <code>fedora.server.storage.DOManager</code>
module to <code>org.fcrepo.server.storage.GSearchDOManager</code>.
Then populate the following module parameters as needed:
</p>
<ul>
<li> <code>gSearchRESTURL</code> - The REST endpoint for
GSearch, for example, http://localhost:8080/fedoragsearch/rest</li>
<li> <code>gSearchUsername</code> - If GSearch is protected by
authentication, this is the username that Fedora should use to
authenticate.</li>
<li> <code>gSearchPassword</code> - The password for the above
user, if applicable</li>
</ul>
</div>
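<div>
<p>Before relying on automatic updates, it can help to check that the Messaging module is really enabled in <code>fedora.fcfg</code>.
The following is a minimal sketch using only the module role and param name shown above.
The function names and the <code>server</code> root element of the sample are assumptions; a real fedora.fcfg contains many more modules and may use an XML namespace, which the sketch ignores.</p>

```python
import xml.etree.ElementTree as ET

MESSAGING_ROLE = "org.fcrepo.server.messaging.Messaging"

# Reduced sample for illustration; the root element name is an assumption.
SAMPLE = """<server>
  <module role="org.fcrepo.server.messaging.Messaging"
          class="org.fcrepo.server.messaging.MessagingModule">
    <comment>Fedora's Java Messaging Service (JMS) Module</comment>
    <param name="enabled" value="true"/>
  </module>
</server>"""


def local_name(tag):
    """Strip an optional '{namespace}' prefix from an element tag."""
    return tag.rsplit("}", 1)[-1]


def messaging_enabled(fcfg_xml):
    """Return True if the Messaging module has the param enabled set to true."""
    root = ET.fromstring(fcfg_xml)
    for module in root.iter():
        if local_name(module.tag) != "module" or module.get("role") != MESSAGING_ROLE:
            continue
        for param in module:
            if local_name(param.tag) == "param" and param.get("name") == "enabled":
                return param.get("value") == "true"
    return False


if __name__ == "__main__":
    print(messaging_enabled(SAMPLE))
```

<p>To check a real installation, read the contents of your fedora.fcfg file and pass them to <code>messaging_enabled</code>.</p>
</div>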
<div>
<a name="multilingual"><h2>Multilingual configuration</h2></a>
<p>Add the attribute
<pre>URIEncoding="UTF-8"</pre>
to the Connector element in .../tomcat/conf/server.xml
and in .../tomcat/conf/server_fedoraTemplate.xml in order to search for
special characters like the Spanish "ñ",
"í" etc. (thanks to Luis Zorita).</p>
</div>
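<div>
<p>Why the URIEncoding attribute matters can be seen from how a character such as "ñ" travels in a URL.
The sketch below only uses the Python standard library and does not involve tomcat itself;
it shows that the same character has different percent-encoded forms in UTF-8 and ISO-8859-1,
so a server that decodes with a different charset than the client used will see the wrong search term.</p>

```python
from urllib.parse import quote, unquote

term = "ñ"

# Percent-encoded form when the client encodes the query as UTF-8.
utf8_form = quote(term)
# Percent-encoded form when ISO-8859-1 is used instead.
latin1_form = quote(term, encoding="latin-1")

print(utf8_form)    # %C3%B1
print(latin1_form)  # %F1

# Decoding UTF-8 bytes as ISO-8859-1 yields two wrong characters.
garbled = unquote(utf8_form, encoding="latin-1")
print(garbled)      # Ã±
```

<p>With both sides agreeing on UTF-8, the term arrives at the search service unchanged.</p>
</div>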
<div>
<a name="FURTHERUSAGE"><h1>IV. FURTHER USAGE</h1></a>
</div>
<div>
<a name="extraction"><h2>Full-text and metadata extraction from datastreams using Apache Tika</h2></a>
<ul>
<li>Tika has a default maximum length of 100000 characters when extracting text from documents.
This can be configured in GSearch in the property fedoragsearch.writeLimit in fedoragsearch.properties.
Setting it to -1 will remove the length restriction.
However, very long documents may take too much time during indexing,
so the default or another fixed limit may be sensible.
Characters in the document beyond the writeLimit are ignored for indexing,
and a warning is written to the log.
</li>
</ul>
<table border="1" cellpadding="8">
<tr><th align="left" colspan="2">Parameters for getDatastreamFromTika, getDatastreamTextFromTika, and getDatastreamMetadataFromTika</th></tr>
<tr><td>indexFieldTagName</td><td>either "IndexField" (with the Lucene plugin) or "field" (with the Solr plugin)</td></tr>
<tr><td>textIndexField<br/> (not used with getDatastreamMetadataFromTika)</td><td>fieldSpec for the text index field, null or empty if not to be generated</td></tr>
<tr><td>indexfieldnamePrefix<br/> (not used with getDatastreamTextFromTika)</td><td>optional or empty, prefixed to the metadata index field names</td></tr>
<tr><td>selectedFields<br/> (not used with getDatastreamTextFromTika)</td><td>comma-separated list of metadata fieldSpecs, if empty then all fields are included with default params</td></tr>
<tr><td>fieldSpec</td><td>metadataFieldName [ '=' indexFieldName] [ '/' [index] [ '/' [store] [ '/' [termVector] [ '/' [boost]]]]]</td></tr>
<tr><td>- metadataFieldName</td><td>must be exactly as extracted by Tika from the document.
You may see the available names, if you log in debug mode
and look for "METADATA name=" under "fullDsId=" in the log, when "getFromTika" was called during updateIndex</td></tr>
<tr><td>- indexFieldName</td><td>is used as the generated index field name.
If not given, GSearch uses metadataFieldName after replacement of the characters ' ', ':', '/', '=', '(', ')' with '_'</td></tr>
<tr><td colspan="2">- the following parameters are used with Lucene (with Solr these values are specified in schema.xml)</td></tr>
<tr><td>- index</td><td>[ 'TOKENIZED' | 'UN_TOKENIZED' ]<br/> # first alternative is default</td></tr>
<tr><td>- store</td><td>[ 'YES' | 'NO' ]<br/> # first alternative is default</td></tr>
<tr><td>- termVector</td><td>[ 'YES' | 'NO' ]<br/> # first alternative is default</td></tr>
<tr><td>- boost</td><td>&lt;decimal number&gt;<br/> # '1.0' is default</td></tr>
</table>
</div>
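<div>
<p>The fieldSpec syntax in the table above can be read as follows. This parser is a sketch of that grammar and of the documented defaults,
not the code GSearch uses. The function names are hypothetical, and the sketch makes one simplifying assumption:
the metadataFieldName itself contains no '=' or '/' character.</p>

```python
import re

# Defaults as listed in the table above.
DEFAULTS = {"index": "TOKENIZED", "store": "YES", "termVector": "YES", "boost": "1.0"}
OPTION_ORDER = ("index", "store", "termVector", "boost")


def default_index_field_name(metadata_field_name):
    """Replace the characters ' ', ':', '/', '=', '(' and ')' with '_'."""
    return re.sub(r"[ :/=()]", "_", metadata_field_name)


def parse_field_spec(spec):
    """Split 'name[=indexName][/index[/store[/termVector[/boost]]]]' into a dict.

    Assumption: the metadata field name contains neither '=' nor '/'.
    """
    parts = spec.split("/")
    if len(parts) > 1 + len(OPTION_ORDER):
        raise ValueError("too many '/'-separated parts: " + spec)
    name, _, index_name = parts[0].partition("=")
    result = {
        "metadataFieldName": name,
        "indexFieldName": index_name or default_index_field_name(name),
    }
    result.update(DEFAULTS)
    for key, value in zip(OPTION_ORDER, parts[1:]):
        if value:  # an empty slot keeps the default
            result[key] = value
    return result


if __name__ == "__main__":
    print(parse_field_spec("dc:title=title/UN_TOKENIZED//NO/2.5"))
    print(parse_field_spec("xmp:Creator Tool"))
```

<p>An empty slot between two '/' characters keeps the default for that position, which is how the optional brackets in the grammar are interpreted here.</p>
</div>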
<div>
<a name="endusersearch"><h2>Customizable end-user search client</h2></a>
<p>The download contains the following files in webapps/fedoragsearch/ that you may customize:</p>
<ul>
<li>WEB-INF/classes/&lt;configName&gt;/rest/enduserSearchToHtml.xslt (basic page generator)</li>
<li>WEB-INF/classes/&lt;configName&gt;/rest/fgseuBrowseTermsToHtml.xslt (browseIndex by ajax call)</li>
<li>WEB-INF/classes/&lt;configName&gt;/rest/fgseuFacetTermsToHtml.xslt (Solr facet search by ajax call)</li>
<li>WEB-INF/classes/&lt;configName&gt;/rest/fgseuSearchResultToHtml.xslt (gfindObjects search by ajax call)</li>
<li>WEB-INF/classes/&lt;configName&gt;/rest/fieldsUnique.xml (see below)</li>
<li>css/fgseu.css</li>
<li>js/enduserSearch.js</li>
</ul>
<p>The file fieldsUnique.xml is found in FgsConfig/indexingXsltGenerator/generatedFiles.
It has one element per index field generated from your example foxml files.
You may add, modify, and delete index field elements
to suit the needs of your end-user search client.</p>
<p>From the admin pages, <a href="rest?operation=gfindObjects&restXslt=enduserSearchToHtml">this is the end-user search client page</a>.
</p>
</div>
<div>
<a name="searchresfilt"><h2>Search result filtering</h2></a>
<p>Search result filtering
will show only those search hits that the user is actually permitted to read.
Three solutions have been investigated, demonstrated,
and <a href="http://dorsdl2.cvt.dk/dorsdl2-10-pedersen.ppt">presented here</a>.
The demonstration is also included with the GSearch distribution
in .../WEB-INF/classes/configDemoSearchResultFiltering/ .
In brief, the three solutions are:</p>
<ul>
<li><b>Post-search filtering</b>, which requires a request to the XACML mechanism for each hit,
and the total number of permitted hits is only known at the end,
a costly procedure especially when few hits are permitted out of a large number.
</li>
<li><b>In-search filtering</b>, which requires additional index fields and query rewriting,
that is, a logical partitioning of the index.
</li>
<li><b>Pre-search filtering</b>, which requires a physical partitioning of the index
and selection of the pertinent index at query time.
</li>
</ul>
<p>Both in-search and pre-search filtering face the challenge
of exact correspondence between the filtering mechanism and the XACML policies.
</p>
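<p>The difference between post-search and in-search filtering can be sketched as follows.
This is a minimal illustration, not the GSearch implementation: the callback is_permitted stands in for a request to the XACML mechanism,
and the index field name access.role is a hypothetical example of an additional index field.</p>

```python
from typing import Callable, Iterable, List


def post_search_filter(hits: Iterable[str],
                       is_permitted: Callable[[str], bool]) -> List[str]:
    """Post-search filtering: one access check per hit."""
    permitted = []
    for pid in hits:
        # Assumption: in a real setup this is a request to the XACML mechanism.
        if is_permitted(pid):
            permitted.append(pid)
    # Only now is the total number of permitted hits known.
    return permitted


def in_search_rewrite(query: str, field: str, allowed: List[str]) -> str:
    """In-search filtering: AND the user query with an access clause."""
    if not allowed:
        raise ValueError("at least one allowed value is required")
    clause = " OR ".join(f"{field}:{value}" for value in allowed)
    return f"({query}) AND ({clause})"


hits = ["demo:1", "demo:2", "demo:3"]
readable = {"demo:1", "demo:3"}
print(post_search_filter(hits, lambda pid: pid in readable))
# "access.role" is a hypothetical field name, used only for illustration.
print(in_search_rewrite("dc.creator:apache", "access.role", ["tester", "public"]))
```

<p>The first function makes one check per hit, which is where the cost of post-search filtering comes from.
The second leaves the selection to the index, but it only works if the extra field mirrors the XACML policies exactly.</p>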
<p>For your own use, in fedoragsearch.properties, select
the preferred searchResultFilteringType and set searchResultFilteringModule
to a class that you write yourself, either as a subclass of the demo class
dk.defxws.fedoragsearch.server.SearchResultFilteringDemoImpl
or as an implementation of the interface
dk.defxws.fedoragsearch.server.SearchResultFiltering .
</p>
</div>
<div>
<a name="configman"><h2>Management of GSearch configurations in Fedora objects</h2></a>
<ul>
<li>This is based on an idea by Adam Soroka. The current implementation may or may not be a "solution" for the needs described.</li>
<li>The "solution" is to create Fedora objects that hold the current configuration files as datastreams.
In this way, they can be managed with Fedora tools, and they can be part of RELS-EXT and RELS-INT relationships and XACML policy controls.</li>
<li>The "solution" provides one action
<pre>http://.../fedoragsearch/rest?operation=configure&configureAction=setFgsConfigObjects</pre>
that copies the fgsconfigFinal files into datastreams of a Fedora object,
where they can be modified (and further datastreams can even be created),
and one action
<pre>http://.../fedoragsearch/rest?operation=configure&configureAction=getFgsConfigObjects</pre>
that copies the datastreams into the fgsconfigFinal files,
where the modifications will take effect immediately. </li>
</ul>
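<p>The two actions above differ only in the configureAction parameter.
A small sketch of how a script might build the two URLs is shown here; the base URL is a hypothetical placeholder, since the host part is elided above,
and the function name configure_url is introduced only for this illustration.</p>

```python
from urllib.parse import urlencode

# Only the two actions named in this section are accepted by this sketch.
ACTIONS = ("setFgsConfigObjects", "getFgsConfigObjects")


def configure_url(base_url: str, configure_action: str) -> str:
    """Build a GSearch REST URL for operation=configure."""
    if configure_action not in ACTIONS:
        raise ValueError(f"unexpected configureAction: {configure_action}")
    params = {"operation": "configure", "configureAction": configure_action}
    return f"{base_url.rstrip('/')}/rest?{urlencode(params)}"


# Hypothetical base URL; replace it with the location of your own installation.
BASE = "http://example.org:8080/fedoragsearch"
print(configure_url(BASE, "setFgsConfigObjects"))  # files -> datastreams
print(configure_url(BASE, "getFgsConfigObjects"))  # datastreams -> files
```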
</div>
<div>
<a name="objectsnnindex"><h2>Many-to-many relationship between Fedora objects and index documents</h2></a>
<ul>
<li>GSearch now allows more than one index document per Fedora object. Their ids are formed as <PID>'$'<suffix>, where the suffix typically is a datastream id.</li>
<li>The opposite, an index document with values from more than one Fedora object, is possible by the use of the document() function of XSLT.</li>
<li>A demonstration is included with the GSearch distribution
in .../WEB-INF/classes/configDemoIndexPerDS_fgs24_1019/ .</li>
</ul>
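<p>The id scheme described above can be sketched as a pair of helper functions.
The function names are hypothetical, and the sketch assumes that a PID never contains the '$' character
and that a document for the object as a whole uses the plain PID as its id.</p>

```python
from typing import Optional, Tuple

SEPARATOR = "$"


def make_doc_id(pid: str, suffix: Optional[str] = None) -> str:
    """Form an index document id from a PID and an optional suffix."""
    # Assumption: without a suffix, the plain PID is the document id.
    return pid if suffix is None else f"{pid}{SEPARATOR}{suffix}"


def split_doc_id(doc_id: str) -> Tuple[str, Optional[str]]:
    """Split an index document id back into PID and suffix."""
    # Assumption: the PID itself never contains '$', so the first one separates.
    pid, sep, suffix = doc_id.partition(SEPARATOR)
    return pid, (suffix if sep else None)


doc_id = make_doc_id("demo:SmileyStuff", "DC")
print(doc_id)
print(split_doc_id(doc_id))
```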
</div>
<div>
<a name="reposnnindex"><h2>Many-to-many relationship between repositories and indexes</h2></a>
<ul>
<li>A typical application using GSearch will index one repository in one index.
However, you can index
many repositories in one or more indexes in parallel, as shown in the image below.</li>
<li>The GSearch download has an example of one repository to three indexes
in .../WEB-INF/classes/configDemoSearchResultFiltering/ .</li>
<li>In general, you configure each repository and each index in the set of configuration files,
and list them in fedoragsearch.properties.</li>
</ul>
<p><img src="images/fgs-manytomany.png"/></p>
</div>
<div>
<a name="embeddedqueries"><h2>Embedded queries</h2></a>
<p>This is a mechanism that allows you to embed risearch queries in Lucene or Solr queries, and vice versa.</p>
<p>This provides interaction with the Resource Index, both when you index and when you search.</p>
<p>It compensates for the lack of joins in bibliographic query languages like those of Lucene and Solr,
and for the lack of text search functionality in logic languages like the risearch query languages.</p>
<p>The full potential of this mechanism still has to be explored and realized.</p>
<p>These preliminary examples show some of the potential:</p>
<ul>
<li>RISEARCH:
<a href="rest?operation=gfindObjects&restXslt=copyXml&query=(::RISEARCH::type=tuples%26lang=itql%26format=Sparql%26query=select+$obj1+from+%3C%23ri%3E+where+$obj1+%3Cinfo:fedora/fedora-system:def/relations-external%23isMemberOf%3E+%3Cinfo:fedora/demo:SmileyStuff%3E::RISEARCH::)">an itql search</a>
</li>
<li>GSEARCH:
<a href="rest?operation=gfindObjects&restXslt=copyXml&query=(::GSEARCH::operation=gfindObjects%26restXslt=copyXml%26query=dc.creator:apache::GSEARCH::)">a GSearch search</a>
</li>
<li>GSEARCH with RISEARCH:
<a href="rest?operation=gfindObjects&restXslt=copyXml&query=smiley+and+(::RISEARCH::xsltName/risearchToGsearch?type=tuples%26lang=itql%26format=Sparql%26query=select+$obj1+from+%3C%23ri%3E+where+$obj1+%3Cinfo:fedora/fedora-system:def/relations-external%23isMemberOf%3E+%3Cinfo:fedora/demo:SmileyStuff%3E::RISEARCH::)">a GSearch search with embedded itql</a>
</li>
<li>RISEARCH with GSEARCH:
<a href="rest?operation=gfindObjects&restXslt=copyXml&query=(::RISEARCH::type=tuples%26lang=itql%26format=Sparql%26query=select+$obj1+from+%3C%23ri%3E+where+$obj1+%3Cinfo:fedora/fedora-system:def/relations-external%23isMemberOf%3E+%3Cinfo:fedora/demo:SmileyStuff%3E+or+(::GSEARCH::xsltName/gsearchToRisearch?operation=gfindObjects%26restXslt=copyXml%26query=dc.creator:apache::GSEARCH::)::RISEARCH::)">an itql search with embedded GSearch</a>
</li>
<li>SOLR:
<a href="rest?operation=gfindObjects&restXslt=copyXml&query=(::SOLR::facet=true%26facet.field=dc.creator%26fl=dc.creator%26q=dc.creator:apache::SOLR::)">a Solr facet search</a>
</li>
</ul>
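<p>The simpler links above follow one pattern: the embedded request is wrapped in (::KIND:: ... ::KIND::)
and its own &amp; separators are percent-encoded as %26 so that they survive inside the outer query parameter.
The sketch below rebuilds the first two links under that reading; the helper names and the set of unescaped characters are assumptions inferred from the examples,
and the variants that use xsltName/... are not covered.</p>

```python
from urllib.parse import quote_plus

# Characters left unescaped in the example links (an inference, not a spec).
SAFE = "():=$/"


def embed(kind: str, inner: str) -> str:
    """Wrap an inner request in the (::KIND:: ... ::KIND::) markers."""
    return f"(::{kind}::{inner}::{kind}::)"


def gfind_url(query: str) -> str:
    """Build a gfindObjects URL whose query may hold an embedded request."""
    return ("rest?operation=gfindObjects&restXslt=copyXml&query="
            + quote_plus(query, safe=SAFE))


itql = ("select $obj1 from <#ri> where $obj1 "
        "<info:fedora/fedora-system:def/relations-external#isMemberOf> "
        "<info:fedora/demo:SmileyStuff>")
ri_inner = f"type=tuples&lang=itql&format=Sparql&query={itql}"
ri_url = gfind_url(embed("RISEARCH", ri_inner))

gs_inner = "operation=gfindObjects&restXslt=copyXml&query=dc.creator:apache"
gs_url = gfind_url(embed("GSEARCH", gs_inner))

print(ri_url)
print(gs_url)
```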
</div>
<div>
<a name="source"><h2>Building from source</h2></a>
<p>Get the source from GitHub:</p>
<pre>
git clone https://github.com/fcrepo/gsearch.git
</pre>
<p>To build fedoragsearch.war in FgsBuild/fromsource for normal installation:</p>
<pre>
cd FedoraGenericSearch
ant buildfromsource
</pre>
<p>To build fedoragsearch.war in FgsBuild/localtest for local testing:</p>
<pre>
cd FedoraGenericSearch
ant -Dlocal.PROTOCOL=<protocol> -Dlocal.HOSTPORT=<hostport> -Dlocal.FEDORA_HOME=<location> -Dlocal.SOLR_HOME=<location> -Dlocal.SOLR_SERVER=<url> buildforlocaltest
</pre>
<p>The fedoragsearch.war for local testing contains a set of configurations
that are used by the test operations below.
You may want to run the test operations if you are customizing the GSearch code.
</p>
<p>To run tests in tomcat at <protocol>://<hostport>/fedoragsearch ,
install a Fedora repository with demo objects and with MessagingModule enabled,
and create a test user in fedora-users.xml :
</p>
<pre>
<user name="fgsTester" password="fgsTesterPassword">
<attribute name="fedoraRole">
<value>tester</value>
</attribute>
</user>
</pre>
<p>Test operations on the lucene plugin:
</p>
<pre>
ant junit-lucene
ant junit-testsonlucene
ant junit-fgs23
ant junit-lucene-fgs24_1010
ant junit-lucene-fgs24_1019
ant junit-lucene-fgs242_1076
ant junit-lucene-fgs242_1083
</pre>
<p>Test operations on the solr plugin, after startup of the solr server:
</p>
<pre>
ant junit-solr
ant junit-solr-fgs24_1010
</pre>
<p>Test operations on the zebra plugin, after you install, configure, and start up the zebra server:
</p>
<pre>
see $FEDORA_HOME/tomcat/webapps/fedoragsearch/WEB-INF/classes/configDemoOnZebra/index/DemoOnZebra/zebraconfig/README
ant junit-zebra
</pre>
</div>
<div>
<a name="HISTORY"><h1>V. HISTORY</h1></a>
</div>
<div>
<a name="new242"><h2>New features in version 2.4.2</h2></a>
<ul>
<li>Enhanced example URIResolverImpl (<a href="https://jira.duraspace.org/browse/FCREPO-1083">FCREPO-1083</a>)
<ul>
<li>Enhanced dk.defxws.fedoragsearch.server.URIResolverImpl to handle URIs other than those of a Fedora repository.
This issue was initiated by a pull request from sarowe at GitHub; thank you.
This class may be set in index.properties for fgsindex.uriResolver .
A developer may implement
<a href="http://xerces.apache.org/xerces2-j/javadocs/api/javax/xml/transform/URIResolver.html">javax.xml.transform.URIResolver</a>
and set it instead.
</li>
</ul>
</li>
<li>Compatibility with Lucene 3.6, Solr 3.6, Axis 1.4, and Tika 1.1
(<a href="https://jira.duraspace.org/browse/FCREPO-1074">FCREPO-1074</a>)
(<a href="https://jira.duraspace.org/browse/FCREPO-1082">FCREPO-1082</a>)
</li>
<li>A function to return a datastream as an XML tree (<a href="https://jira.duraspace.org/browse/FCREPO-1078">FCREPO-1078</a>)
<ul>
<li>Implemented as getDatastreamXML() in dk.defxws.fedoragsearch.server.GenericOperationsImpl .
<br/>Used as in FgsConfig/test_fgs23/foxmlToLuceneWithNotInline.xslt :<br/>
<code>exts:getDatastreamXML($PID, $REPOSNAME, $DSID, $FEDORASOAP, $FEDORAUSER, $FEDORAPASS, $TRUSTSTOREPATH, $TRUSTSTOREPASS)</code>
</li>
</ul>
</li>
<li>New property: fgsindex.lowercaseExpandedTerms (<a href="https://jira.duraspace.org/browse/FCREPO-1076">FCREPO-1076</a>)
<ul>
<li>Adds fgsindex.lowercaseExpandedTerms to index.properties for the lucene plugin.
Default is true, but if set to false, then the
<a href="http://lucene.apache.org/core/3_6_0/api/all/org/apache/lucene/queryParser/QueryParser.html#setLowercaseExpandedTerms%28boolean%29">
query processor</a> will not change a query with wildcards to lowercase.
Lowercasing is a problem for queries on UN_TOKENIZED fields.
</li>
</ul>
</li>
</ul>
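<p>The effect of fgsindex.lowercaseExpandedTerms can be illustrated with a toy simulation.
This is not Lucene code: it assumes only that an untokenized field stores each value verbatim, case included,
and that wildcard matching is case-sensitive. The function name and sample values are made up for the illustration.</p>

```python
from fnmatch import fnmatchcase
from typing import List


def wildcard_search(terms: List[str], pattern: str,
                    lowercase_expanded_terms: bool = True) -> List[str]:
    """Match a wildcard pattern against verbatim (untokenized) terms."""
    if lowercase_expanded_terms:
        # Mimics the default: the wildcard query is lowercased first.
        pattern = pattern.lower()
    # Case-sensitive matching against the stored values.
    return [term for term in terms if fnmatchcase(term, pattern)]


untokenized = ["demo:SmileyStuff", "demo:SmileyPens"]
print(wildcard_search(untokenized, "demo:Smiley*"))         # default setting
print(wildcard_search(untokenized, "demo:Smiley*", False))  # set to false
```

<p>With the default, the lowercased pattern no longer matches the mixed-case stored values; with the property set to false, it does.</p>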
<a name="new241"><h2>New features in version 2.4.1</h2></a>
<ul>
<li>Improvement of the control over the length of datastreams in Apache Tika (<a href="https://jira.duraspace.org/browse/FCREPO-1049">FCREPO-1049</a>)
<ul>
<li>Tika has a default maximum length of 100000 characters when extracting text from documents.
This can now be configured in GSearch in the property fedoragsearch.writeLimit in fedoragsearch.properties.
Setting it to -1 will remove the length restriction.
However, very long documents may take too much time during indexing,
so the default length or another fixed length may be sensible.
Characters in the document beyond the writeLimit are ignored in indexing,
and a warning is logged.
</li>
</ul>
</li>
</ul>
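<p>The behaviour of fedoragsearch.writeLimit described above can be sketched as follows.
This is an illustration of the rule only, not the Tika or GSearch code; the function name is hypothetical,
and rejecting negative values other than -1 is a choice made for this sketch.</p>

```python
import logging

logging.basicConfig(level=logging.WARNING)
log = logging.getLogger("writeLimitSketch")

DEFAULT_WRITE_LIMIT = 100000  # Tika's default maximum number of characters
NO_LIMIT = -1                 # -1 removes the length restriction


def apply_write_limit(text: str, write_limit: int = DEFAULT_WRITE_LIMIT) -> str:
    """Return the part of the extracted text that would be indexed."""
    if write_limit == NO_LIMIT:
        return text
    if write_limit < 0:
        # Assumption of this sketch: other negative values are rejected.
        raise ValueError("writeLimit must be -1 or a non-negative number")
    if len(text) > write_limit:
        log.warning("text of %d characters cut at writeLimit=%d",
                    len(text), write_limit)
        return text[:write_limit]
    return text


long_text = "x" * 150000
print(len(apply_write_limit(long_text)))            # default limit
print(len(apply_write_limit(long_text, NO_LIMIT)))  # no limit
```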
<a name="new24"><h2>New features in version 2.4</h2></a>
<ul>
<li>Compatibility with Lucene 3.5 and Solr 3.5 (<a href="https://jira.duraspace.org/browse/FCREPO-1005">FCREPO-1005</a>)</li>
<li>Useful end-user search page generation from indexing stylesheet (<a href="https://jira.duraspace.org/browse/FCREPO-1006">FCREPO-1006</a>)
<ul>
<li>See the section on <a href="#endusersearch">Customizable end-user search client</a>.</li>
</ul>
</li>
<li>Performance measurements and possible improvements (<a href="https://jira.duraspace.org/browse/FCREPO-1007">FCREPO-1007</a>).
Measurements were taken using Apache JMeter on a production-quality platform, giving some insight into the performance implications of various choices.
Download <a href="https://github.com/fcrepo/gsearch/blob/master/FedoraGenericSearch/src/performance/PerformanceMeasurementsforFedoraGSearch2.3.pdf">the report from GitHub</a>.
Morten Sørensen, DTU Library, is co-developer and co-author on this.
</li>
<li>Filtering of search results by access constraints (<a href="https://jira.duraspace.org/browse/FCREPO-1008">FCREPO-1008</a>)
<ul>
<li><a href="http://pubs.or08.ecs.soton.ac.uk/104/">Based on work presented at OR2008</a>.</li>
<li>Problem: Search results contain hits that the user does not have the access rights to see</li>
<li>Solution: Extend access rights to search results by filtering</li>
<li>Thanks to Swithun Crowe for providing a real life example</li>
<li>See the section on <a href="#searchresfilt">Search result filtering</a>.</li>
</ul>
</li>
<li>Interaction with the Resource Index (<a href="https://jira.duraspace.org/browse/FCREPO-1009">FCREPO-1009</a>)
<ul>
<li>See the section on <a href="#embeddedqueries">Embedded queries</a>.</li>
</ul>
</li>
<li>Use of Apache Tika for full-text and metadata extraction (<a href="https://jira.duraspace.org/browse/FCREPO-1010">FCREPO-1010</a>)
<ul>
<li><a href="http://tika.apache.org/">The Apache Tika™ toolkit</a> extracts text and metadata from documents,
if the format is detectable by AutoDetectParser in Tika.</li>
<li>In addition to the text extraction with PDFBox, GSearch now provides the following text and metadata extraction functions:
<ul>
<li>getDatastreamTextFromTika: retrieves the text only</li>
<li>getDatastreamMetadataFromTika: retrieves metadata only, also for non-text datastreams like images</li>
<li>getDatastreamFromTika: retrieves both text and metadata</li>
</ul>
</li>
<li>See the section on <a href="#extraction">Full-text and metadata extraction from datastreams</a>.</li>
<li>Thanks to Adam Soroka for the suggestion and the review.</li>
</ul>
</li>
<li>Management of GSearch configurations in Fedora objects (<a href="https://jira.duraspace.org/browse/FCREPO-1018">FCREPO-1018</a>)
<ul>
<li>See the section on <a href="#configman">Management of GSearch configurations in Fedora objects</a>.</li>
</ul>
</li>
<li>Exploration of complex GSearch use cases (<a href="https://jira.duraspace.org/browse/FCREPO-1019">FCREPO-1019</a>)
<ul>
<li>Jonathan Green states: "... the index may not always share a 1 to 1 relationship with objects in fedora."</li>
<li>GSearch now allows more than one index document per Fedora object. Their ids are formed as <PID>'$'<suffix>, where the suffix in the test case is a datastream id.</li>
<li>The opposite, an index document with values from more than one Fedora object, is possible by the use of the document() function of XSLT.</li>
<li>See the section on <a href="#objectsnnindex">Many-to-many relationship between Fedora objects and index documents</a>.</li>
<li>A typical application using GSearch will index one repository in one index.
However, you can index
many repositories in one or more indexes in parallel,
see the section on <a href="#reposnnindex">Many-to-many relationship between repositories and indexes</a>.</li>
</ul>
</li>
</ul>
You may also <a href="https://jira.duraspace.org/secure/IssueNavigator.jspa?mode=hide&requestId=10311">see the complete list of issues for GSearch.</a>
<a name="new23"><h2>New features in version 2.3</h2></a>
<ul>
<li>Fedora 3.5 compatibility
<ul>
<li>Indexing of managed XML datastreams, shown with a test object
</li>
</ul>
</li>
<li>Lucene 3.4 compatibility</li>
<li>Solr 3.4 compatibility</li>
<li>Zebra 2.0 compatibility</li>
<li>PDFBox 1.6 compatibility</li>
<li>Simplified configuration with two main parts:
<ul>
<li>Indexing stylesheet generated from example foxml files, requiring less xslt experience
</li>
<li>Basic properties specified in simple property file, instead of in ant script
</li>
</ul>
</li>
<li>Selection of xslt processor, xalan or saxon, see fedoragsearch.properties</li>
</ul>
You may also <a href="https://jira.duraspace.org/secure/IssueNavigator.jspa?mode=hide&requestId=10305">see the complete list of issues for GSearch 2.3.</a>
<a name="new22"><h2>New features in version 2.2</h2></a>
<ul>
<li>Fedora 3.1 compatibility</li>
<li>Lucene 2.4.0 compatibility</li>
<li>Solr 1.3.0 compatibility</li>
<li>For the lucene plugin: Search result filtering by access constraints, as defined by XACML policies,
in order to show only those search hits that the user is actually permitted to read.
<a href="#searchresfilt">Read more ...</a>.
</li>
</ul>
<a name="new211"><h2>New features in version 2.1.1</h2></a>
<ul>
<li>Fedora 3.0 compatibility</li>
</ul>
<a name="new21"><h2>New features in version 2.1</h2></a>
<ul>
<li>Fedora 3.0b2 compatibility</li>
<li>Added an update listener which uses the Fedora Messaging Client to listen for
updates being performed through API-M. These update messages contain the information
needed to perform index updates, thereby keeping GSearch up-to-date with the Fedora
repository.</li>
<li>Enhanced the sortFields parameter to gfindObjects for Lucene,
sorting search results by a custom Comparator class,
see the index.properties file in configTestOnLucene and
the test class dk.defxws.fedoragsearch.test.ComparatorSourceTest.</li>
<li>Enhanced the fromFoxmlFiles action of updateIndex for Lucene,
so that indexing is attempted for all files,
even if one or more fail,
in which case log messages are given.
Previously, a single failure aborted the whole action.</li>
</ul>
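<p>The changed behaviour of fromFoxmlFiles, continuing after a failure instead of aborting, follows a common pattern that can be sketched like this.
The names index_all and index_one are hypothetical; index_one stands in for whatever indexes a single foxml file.</p>

```python
import logging
from typing import Callable, Iterable, List, Tuple

logging.basicConfig(level=logging.WARNING)
log = logging.getLogger("fromFoxmlFilesSketch")


def index_all(paths: Iterable[str],
              index_one: Callable[[str], None]) -> Tuple[List[str], List[str]]:
    """Try every file; log failures and continue instead of aborting."""
    indexed, failed = [], []
    for path in paths:
        try:
            index_one(path)
            indexed.append(path)
        except Exception as exc:
            # A failing file is logged and skipped; the loop goes on.
            log.warning("could not index %s: %s", path, exc)
            failed.append(path)
    return indexed, failed


def fake_index_one(path: str) -> None:
    # Stand-in indexer that fails for one made-up file name.
    if path == "bad.xml":
        raise ValueError("broken foxml")


indexed, failed = index_all(["a.xml", "bad.xml", "b.xml"], fake_index_one)
print(indexed, failed)
```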
<a name="new20"><h2>New features in version 2.0</h2></a>
<ul>
<li>Added a plugin for the Apache Solr search server.</li>
<li>Added easier configuration, so that you need only edit one file
with property values, then run it with ant.</li>
<li>Updated to Lucene version 2.3.0.</li>
<li>Added params to indexing in the format:
<pre>...&indexDocXslt=[xslt-name][(paramname1=value1[,paramname2=value2[,...]])]</pre>
Use the parameters at indexing time by putting an xsl:param statement in the
indexing xslt stylesheet, like this:
<pre><xsl:param name="someparamname" select="defaultvalue"/></pre></li>
<li>Added optimize options for Lucene indexing:<br/>
<pre>fgsindex.mergeFactor and fgsindex.maxBufferedDocs</pre>
will affect performance, see the index.properties file in configTestOnLucene.
Also added
<pre>...?operation=updateIndex&action=optimize</pre>