Documentation
Gert Schmeltz Pedersen committed May 24, 2012
1 parent 3ae4afb commit 4799b3c
Showing 1 changed file with 113 additions and 99 deletions.
212 changes: 113 additions & 99 deletions FedoraGenericSearch/src/html/fedoragsearch-doc.html
@@ -38,7 +38,7 @@
<div id="header">
<a href="" id="logo"></a>
<div id="title">
<h1>Fedora Generic Search Service Version 2.4.1</h1>
<h1>Fedora Generic Search Service Version 2.4.2</h1>
<h2>compatible with Fedora Version 3.5</h2>
</div>
</div>
Expand Down Expand Up @@ -80,8 +80,7 @@ <h2>compatible with Fedora Version 3.5</h2>
<dt>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<a href="#config">Create the configuration files</a></dt>
<dt>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<a href="#indexingstylesheet">Generate indexing stylesheet from example foxml files</a></dt>
<dt>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<a href="#basicproperties">Edit and use the basic property values</a></dt>
<dt>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<a href="#gauto">Configuring GSearch for automatic updates</a></dt>
<dt>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<a href="#fauto">Configuring Fedora for automatic updates</a></dt>
<dt>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<a href="#gfauto">Configuring GSearch and Fedora for automatic updates</a></dt>
<dt>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<a href="#multilingual">Multilingual configuration</a></dt>
<dt><a href="#FURTHERUSAGE">IV. FURTHER USAGE</a></dt>
<dt>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<a href="#extraction">Full-text and metadata extraction from datastreams using Apache Tika</a></dt>
@@ -309,9 +308,11 @@ <h2>compatible with Fedora Version 3.5</h2>
<li><b>fromPid ( PID )</b> - indexing one FOXML record,
as exported by Fedora API-M; in case a previous
index document with the same PID exists, it is first deleted.
This is the incremental update operation that shall be called after
all of Fedora's API-M operations that modify a FedoraObject.</li>
<li><b>deletePid ( PID )</b> - deleting one index document.</li>
This is the incremental update operation that is called after
all of Fedora's API-M operations that modify a FedoraObject,
if <a href="#gfauto">GSearch and Fedora are configured for automatic updates</a>.</li>
<li><b>deletePid ( PID )</b> - deleting one index document,
called by automatic updates after a Fedora purgeObject.</li>
</ul>
</li>
</ul>
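<p>A minimal sketch of how a client might address the two incremental operations above through the REST interface.
The path <code>rest?operation=updateIndex</code> appears in this documentation; the base URL and the query parameter names
<code>action</code> and <code>value</code> are assumptions made for illustration and should be checked against your own installation.
The class and method names are hypothetical.</p>

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

public class UpdateIndexUrls {

    // Builds a REST URL for an incremental update action such as fromPid or deletePid.
    // ASSUMPTION: the parameter names "action" and "value" are illustrative only.
    public static String updateUrl(String baseUrl, String action, String pid) {
        try {
            return baseUrl + "/rest?operation=updateIndex"
                    + "&action=" + URLEncoder.encode(action, "UTF-8")
                    + "&value=" + URLEncoder.encode(pid, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is a mandatory charset on every Java platform.
            throw new IllegalStateException("UTF-8 not available", e);
        }
    }

    public static void main(String[] args) {
        // Hypothetical webapp location; adjust to your own tomcat.
        String base = "http://localhost:8080/fedoragsearch";
        System.out.println(updateUrl(base, "fromPid", "demo:5"));
        System.out.println(updateUrl(base, "deletePid", "demo:5"));
    }
}
```

<p>The PID is URL-encoded because it contains a colon. With automatic updates configured, these calls are normally not needed by hand.</p>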
@@ -323,23 +324,38 @@ <h2>compatible with Fedora Version 3.5</h2>
<h4><a href="http://lucene.apache.org/">Lucene</a></h4>
<p>The Lucene plugin comes in fedoragsearch.war as the java package dk.defxws.fgslucene
together with the Apache Lucene java libraries.</p>
<p>The Lucene plugin is used by configuration as explained below.</p>
<p>The Lucene plugin is configured during
<a href="#basicproperties">Edit and use the basic property values</a> below,
resulting in the set of GSearch configuration files.</p>
<p>Lucene has a very rich functionality, and this plugin
exploits a small fraction of it. As a java programmer, you may
allows you to configure many of its options, while all the other options
are used with their default values.
As a java programmer, you may
have ideas for further exploitation, which you may realize
by implementing an enhanced version of the plugin.
Please, share such ideas and implementations with the Fedora community.</p>

<h4><a href="http://lucene.apache.org/solr">Solr</a></h4>
<p>The Solr server is downloaded, installed and configured as described at the Solr web site.</p>
<p>The Solr plugin comes in fedoragsearch.war as the java package dk.defxws.fgssolr.</p>
<p>The Solr plugin is used by configuration as explained below.
It has dependencies on the configuration of the Solr server.</p>
<p>The Solr server uses the Lucene java libraries for indexing and search.</p>
<p>The Solr plugin comes in fedoragsearch.war as the java package dk.defxws.fgssolr</p>
<p>The Solr plugin is configured during
<a href="#basicproperties">Edit and use the basic property values</a> below,
resulting in the set of GSearch configuration files.</p>
<p>The Solr plugin has dependencies on the configuration of the Solr server.
You should begin with the schema.xml file provided by GSearch in
FgsConfig/FgsConfigIndexTemplate/Solr/conf/schema-3.6.0-for-fgs-2.4.2.xml .
It has a few modifications aimed at the Fedora demo objects.
You should also consider the autoCommit element in solrconfig.xml .
Besides, you need to go through all the Solr conf files
and make sure they match the index documents generated by your GSearch indexing stylesheet.</p>
<p>This plugin indexes documents via the HTTP POST interface of Solr.
Searches may be performed via the Solr native HTTP GET to the Solr server
and via gfindObjects, which accesses the Lucene index directly.
Solr functionality does not include browsing, however, this is offered
by the plugin via the browseIndex operation.</p>
and via gfindObjects, which accesses the Lucene index directly.</p>
<p>Solr functionality does not include browsing; however, this is offered
by the plugin via the browseIndex operation,
which also accesses the Lucene index directly.</p>
<p>If you run Islandora</p>

<h4><a href="http://www.indexdata.dk/zebra/">Zebra</a></h4>
<p>The Zebra plugin comes in fedoragsearch.war as the java package dk.defxws.fgszebra .</p>
@@ -362,12 +378,13 @@ <h4><a href="http://www.indexdata.dk/zebra/">Zebra</a></h4>
You may rename the .war file, before you copy it
into the webapps directory, in order to give it another webapp name.
</li>
<li>Set the value of the param enabled to true for the Messaging module in fedora.fcfg as above.
<li>Set the value of the param enabled to true for the Messaging module
in fedora.fcfg as <a href="#owndemo">for the demo at your own site</a>.
</li>
<li>Now this documentation page is visible at
<a href="#" target="owndemo">your own site</a>.
<a href="#">your own site</a>,
and
<a href="rest?operation=updateIndex" target="owndemo">the admin pages here</a>.
<a href="rest?operation=updateIndex">the admin pages are here</a>.
</li>
<li>The SOAP service operations are deployed with the .war file, and
<a href="services/FgsOperations?wsdl">the .wsdl file is available here.</a>
@@ -379,15 +396,19 @@ <h4><a href="http://www.indexdata.dk/zebra/">Zebra</a></h4>
<li><a name="config"><h3>Create the configuration files</h3></a>
<ul>
<li>
If you <b>migrate from GSearch 2.2 or 2.3 to 2.4</b>, you simply reuse the configuration files you have.
The only things you must do from 2.2 are rename the root directory of the configuration files
from 'config' to 'fgsconfigFinal', and if you use sortType AUTO in index.properties (explicitly or by default), change to STRING.
You may want to add the new properties.
If you <b>migrate from GSearch 2.2 or 2.3 to 2.4.*</b>,
you simply reuse the configuration files you have.
When migrating from 2.2, the only required changes are to
rename the root directory of the configuration files
from 'config' to 'fgsconfigFinal',
and, if you use sortType AUTO in index.properties (explicitly or by default),
to change it to STRING (because AUTO is deprecated in Lucene 3.*).
You may want to add the new properties introduced in 2.3 and 2.4.*.
If you kept the configuration files within tomcat in the default classpath,
you may want to move them outside, see below.
</li>
<li>
If you <b>start with GSearch 2.4</b>, creating the configuration files is much simpler than before. Here are the two basic parts:
If you <b>start with GSearch 2.4.*</b>, creating the configuration files is much simpler than before. Here are the two basic parts:
<ul>
<li>
<a name="indexingstylesheet"><h3>Generate indexing stylesheet from example foxml files</h3></a>
@@ -399,7 +420,7 @@ <h4><a href="http://www.indexdata.dk/zebra/">Zebra</a></h4>
<li>Put one or more example foxml files in FgsConfig/indexingXsltGenerator/foxml .
Each file must end with a newline.
If you want to index managed xml datastreams, insert an example inline,
see the example in the test file test_fgs23.xml.
see the example in the test file FgsConfig/indexingXsltGenerator/foxml/test_fgs23.xml.
</li>
<li>At FgsConfig run <pre>>ant generateIndexingXslt</pre>
</li>
@@ -494,80 +515,30 @@ <h4><a href="http://www.indexdata.dk/zebra/">Zebra</a></h4>
</div>

<div>
<a name="gauto"><h2>Configuring GSearch for automatic updates</h2></a>
<a name="gfauto"><h2>Configuring GSearch and Fedora for automatic updates</h2></a>

<p>
As of version 2.1, GSearch has the ability to listen to update messages
provided by Fedora. These messages are sent via JMS, so a JMS provider
must be available (a JMS provider is included with Fedora 3.0). In order
to configure the update listener, open updater.properties and set the
following property values. These values will most likely be the same
as those specified in your Fedora configuration.</p>
<ul>
<li>
<strong>java.naming.factory.initial</strong>
<ul>
<li>Default: org.apache.activemq.jndi.ActiveMQInitialContextFactory</li>
<li>Specifies the JNDI initial context which will be used to look up JMS administered objects.</li>
</ul>
</li>
<li>
<strong>java.naming.provider.url</strong>
<ul>
<li>Default: tcp://localhost:61616</li>
<li>Specifies the address at which a connection can be made to the messaging provider.</li>
<li>The update listener will attempt to connect to the messaging provider at this address on server startup,
so make sure that your provider is running and available, otherwise you will see a connection error.</li>
</ul>
</li>
<li>
<strong>connection.factory.name</strong>
<ul>
<li>Default: ConnectionFactory</li>
<li>Specifies the JNDI name of the ConnectionFactory object needed to create a connection to the JMS
provider.</li>
</ul>
</li>
<li>
<strong>topic.fedoraAPIM</strong>
<ul>
<li>Default: fedora.apim.update</li>
<li>Specifies the topic on which to listen for updates.</li>
</ul>
</li>
<li>
<strong>client.id</strong>
<ul>
<li>Default: fedoragsearch0</li>
<li>The identifier of the GSearch client. If you have more than one instance of GSearch running
they must have different client identifiers.</li>
</ul>
</li>
</ul>
<p>
If you decide not to use the automatic updates feature in GSearch, you'll need to open fedoragsearch.properties
and remove (or comment out) the line specifying fedoragsearch.updaternames. This will disable the update
listener.
</p>
</div>

<div>
<a name="fauto"><h2>Configuring Fedora for automatic updates</h2></a>

<p>
Fedora 3.0 added the ability to send a message whenever a change is made to the
content of the repository (through API-M.) This messaging capability must be
enabled and configured to work properly. See the Fedora documentation for
instructions on configuring messaging.
</p>
<p>
As an alternative to updates via messaging, it is possible to configure Fedora to
send a signal via REST to GSearch when objects are added, modified,
and purged. Using messaging is the preferred method for automatic updates, and this
technique, while still available, should be considered deprecated. It is not recommended
to use both the update listener and REST-based updates.
</p>
<p>
<p>
By default, GSearch is configured for automatic updates through Fedora notifications.
To understand or modify this configuration,
see the property fedoragsearch.updaternames in fedoragsearch.properties,
and updater.properties in config/updater/FgsUpdaters.
</p>
<p>
By default, Fedora is NOT configured for automatic updates through notifications to GSearch.
</p>
<p>
In order to configure Fedora for automatic updates through notifications to GSearch,
set the value of the param <code>enabled</code> to <code>true</code> for the Messaging module in <code>fedora.fcfg</code>:
<pre>&lt;module role="org.fcrepo.server.messaging.Messaging" class="org.fcrepo.server.messaging.MessagingModule"&gt;
&lt;comment&gt;Fedora's Java Messaging Service (JMS) Module&lt;/comment&gt;
&lt;param name="enabled" value="true"/&gt;</pre>
</li>
</p>
<p>As a deprecated alternative to updates via messaging,
it is possible to configure Fedora to
send a signal via REST to GSearch whenever objects are added, modified,
or purged. Do NOT enable both alternatives.
<br/>
To enable REST-based updates, edit your <code>fedora.fcfg</code> file
and change the class of the <code>fedora.server.storage.DOManager</code>
module to <code>org.fcrepo.server.storage.GSearchDOManager</code>.
@@ -768,7 +739,7 @@ <h4><a href="http://www.indexdata.dk/zebra/">Zebra</a></h4>
<pre>
git clone https://github.com/fcrepo/gsearch.git
</pre>
<p>To build fedoragsearch.war in FgsBuild/fromsource:</p>
<p>To build fedoragsearch.war in FgsBuild/fromsource for normal installation:</p>
<pre>
cd FedoraGenericSearch
ant buildfromsource
@@ -778,6 +749,10 @@ <h4><a href="http://www.indexdata.dk/zebra/">Zebra</a></h4>
cd FedoraGenericSearch
ant -Dlocal.PROTOCOL=&lt;protocol&gt; -Dlocal.HOSTPORT=&lt;hostport&gt; -Dlocal.FEDORA_HOME=&lt;location&gt; -Dlocal.SOLR_HOME=&lt;location&gt; -Dlocal.SOLR_SERVER=&lt;url&gt; buildforlocaltest
</pre>
<p>The fedoragsearch.war for local testing contains a set of configurations
that are used by the test operations below.
You may want to run the test operations if you are customizing the GSearch code.
</p>
<p>To run tests in tomcat at &lt;protocol&gt;://&lt;hostport&gt;/fedoragsearch
install a Fedora repository with demo objects with MessagingModule enabled,
and create a test user in fedora-users.xml :
@@ -797,6 +772,8 @@ <h4><a href="http://www.indexdata.dk/zebra/">Zebra</a></h4>
ant junit-fgs23
ant junit-lucene-fgs24_1010
ant junit-lucene-fgs24_1019
ant junit-lucene-fgs242_1076
ant junit-lucene-fgs242_1083
</pre>
<p>Test operations on the solr plugin, after startup of the solr server:
</p>
@@ -817,6 +794,43 @@ <h4><a href="http://www.indexdata.dk/zebra/">Zebra</a></h4>
</div>

<div>
<a name="new242"><h2>New features in version 2.4.2</h2></a>
<ul>
<li>Enhanced example URIResolverImpl (<a href="https://jira.duraspace.org/browse/FCREPO-1083">FCREPO-1083</a>)
<ul>
<li>Enhanced dk.defxws.fedoragsearch.server.URIResolverImpl to handle URIs other than those of a Fedora repository.
This issue was initiated by a pull request from sarowe at github, thank you.
This class may be set in index.properties for fgsindex.uriResolver .
A developer may implement
<a href="http://xerces.apache.org/xerces2-j/javadocs/api/javax/xml/transform/URIResolver.html">javax.xml.transform.URIResolver</a>
and set it instead.
</li>
</ul>
</li>
<li>Compatibility with Lucene 3.6, Solr 3.6, Axis 1.4, and Tika 1.1
(<a href="https://jira.duraspace.org/browse/FCREPO-1074">FCREPO-1074</a>)
(<a href="https://jira.duraspace.org/browse/FCREPO-1082">FCREPO-1082</a>)
</li>
<li>A function to return a datastream as an XML tree (<a href="https://jira.duraspace.org/browse/FCREPO-1078">FCREPO-1078</a>)
<ul>
<li>Implemented as getDatastreamXML() in dk.defxws.fedoragsearch.server.GenericOperationsImpl .
<br/>Used as in FgsConfig/test_fgs23/foxmlToLuceneWithNotInline.xslt :<br/>
<code>exts:getDatastreamXML($PID, $REPOSNAME, $DSID, $FEDORASOAP, $FEDORAUSER, $FEDORAPASS, $TRUSTSTOREPATH, $TRUSTSTOREPASS)</code>
</li>
</ul>
</li>
<li>New property: fgsindex.lowercaseExpandedTerms (<a href="https://jira.duraspace.org/browse/FCREPO-1076">FCREPO-1076</a>)
<ul>
<li>Adds fgsindex.lowercaseExpandedTerms to index.properties for the lucene plugin.
The default is true; if set to false, the
<a href="http://lucene.apache.org/core/3_6_0/api/all/org/apache/lucene/queryParser/QueryParser.html#setLowercaseExpandedTerms%28boolean%29">
query parser</a> will not lowercase queries that contain wildcards.
Lowercasing such queries is a problem for searches on UN_TOKENIZED fields.
</li>
</ul>
</li>
</ul>
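<p>As a rough illustration of the custom resolver mentioned under FCREPO-1083, the sketch below implements
<code>javax.xml.transform.URIResolver</code> by resolving a relative href against the base URI and returning a <code>StreamSource</code>.
The class name <code>ExampleUriResolver</code> and the helper <code>systemIdFor</code> are hypothetical; this is not the code of
<code>URIResolverImpl</code>, and authentication and any Fedora-specific handling are left out.</p>

```java
import java.net.URI;
import java.net.URISyntaxException;
import javax.xml.transform.Source;
import javax.xml.transform.TransformerException;
import javax.xml.transform.URIResolver;
import javax.xml.transform.stream.StreamSource;

public class ExampleUriResolver implements URIResolver {

    // Called by the XSLT processor for document(), xsl:import and xsl:include.
    @Override
    public Source resolve(String href, String base) throws TransformerException {
        try {
            URI target = new URI(href);
            // Relative hrefs are resolved against the base URI, if one is given.
            if (!target.isAbsolute() && base != null && !base.isEmpty()) {
                target = new URI(base).resolve(target);
            }
            // The content is only fetched when the processor reads the source.
            return new StreamSource(target.toString());
        } catch (URISyntaxException e) {
            throw new TransformerException(
                    "Cannot resolve href '" + href + "' against base '" + base + "'", e);
        }
    }

    // Hypothetical convenience helper: returns the resolved system id as a string.
    public static String systemIdFor(String href, String base) {
        try {
            return new ExampleUriResolver().resolve(href, base).getSystemId();
        } catch (TransformerException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(systemIdFor("b.xml", "http://example.org/a/x.xslt"));
    }
}
```

<p>According to the notes above, such a class would be named in index.properties for fgsindex.uriResolver.</p>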

<a name="new241"><h2>New features in version 2.4.1</h2></a>
<ul>
<li>Improvement of the control over the length of datastreams in Apache Tika (<a href="https://jira.duraspace.org/browse/FCREPO-1049">FCREPO-1049</a>)
@@ -1016,8 +1030,6 @@ <h4><a href="http://www.indexdata.dk/zebra/">Zebra</a></h4>
<p>The Fedora Generic Search Service, abbreviated GSearch, is part of the
<a href="https://wiki.duraspace.org/display/FCSVCS/Fedora+Framework+Services">
Fedora Service Framework</a>. </p>
<p>The primary feature of GSearch is that it makes it easy
to make your digital contents in Fedora searchable for yourself and your end-users.</p>
<p>GSearch was developed by
<a href="mailto:gsp@dtic.dtu.dk">Gert Schmeltz Pedersen</a>
at the Technical University of Denmark,
@@ -1042,6 +1054,7 @@ <h4><a href="http://www.indexdata.dk/zebra/">Zebra</a></h4>
Lasse Aagren,
Leire Urcelay,
Luis Zorita,
Mark Hall,
Matt Zumwalt,
Matthias Razum,
Michael Appleby,
@@ -1057,6 +1070,7 @@ <h4><a href="http://www.indexdata.dk/zebra/">Zebra</a></h4>
Ryan E. Scherle,
Sam Liberman,
Scott Hammel,
Serhiy Polyakov,
Shunde Zhang,
Simon Lamb,
Stephen Bayliss,