FCREPO-1009 Interaction with the Resource Index
Gert Schmeltz Pedersen committed Dec 21, 2011
1 parent f2c5a2b commit 2df2f86
Showing 7 changed files with 183 additions and 77 deletions.
51 changes: 46 additions & 5 deletions FedoraGenericSearch/src/html/fedoragsearch-doc.html
@@ -90,6 +90,7 @@ <h2>compatible with Fedora Version 3.5</h2>
<dt>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<a href="#configman">Management of GSearch configurations in Fedora objects</a></dt>
<dt>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<a href="#objectsnnindex">Many-to-many relationship between Fedora objects and index documents</a></dt>
<dt>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<a href="#reposnnindex">Many-to-many relationship between repositories and indexes</a></dt>
<dt>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<a href="#embeddedqueries">Embedded queries</a></dt>
<dt>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<a href="#source">Building from source</a></dt>
<dt><a href="#HISTORY">V. HISTORY</a></dt>
<dt>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<a href="#new24">New features in version 2.4</a></dt>
@@ -624,13 +625,13 @@ <h4><a href="http://www.indexdata.dk/zebra/">Zebra</a></h4>

<p>The download contains the following files in webapps/fedoragsearch/ that you may customize:</p>
<ul>
<li>WEB-INF/classes/&lt;configName&gt;/rest/enduserSearchToHtml.xslt (basic page generator)</li>
<li>WEB-INF/classes/&lt;configName&gt;/rest/fgseuBrowseTermsToHtml.xslt (browseIndex by ajax call)</li>
<li>WEB-INF/classes/&lt;configName&gt;/rest/fgseuFacetTermsToHtml.xslt (Solr facet search by ajax call)</li>
<li>WEB-INF/classes/&lt;configName&gt;/rest/fgseuSearchResultToHtml.xslt (gfindObjects search by ajax call)</li>
<li>WEB-INF/classes/&lt;configName&gt;/rest/fieldsUnique.xml (see below)</li>
<li>css/fgseu.css</li>
<li>js/enduserSearch.js</li>
<li>WEB-INF/classes/&lt;configName&gt;/rest/enduserSearchToHtml.xslt</li>
<li>WEB-INF/classes/&lt;configName&gt;/rest/fgseuBrowseTermsToHtml.xslt</li>
<li>WEB-INF/classes/&lt;configName&gt;/rest/fgseuFacetTermsToHtml.xslt</li>
<li>WEB-INF/classes/&lt;configName&gt;/rest/fgseuSearchResultToHtml.xslt</li>
<li>WEB-INF/classes/&lt;configName&gt;/rest/fieldsUnique.xml</li>
<p>The file fieldsUnique.xml is found in FgsConfig/indexingXsltGenerator/generatedFiles.
It has one element per index field generated from your example foxml files.
You may add, modify, and delete index field elements
@@ -715,6 +716,42 @@ <h4><a href="http://www.indexdata.dk/zebra/">Zebra</a></h4>
<p><img src="images/fgs-manytomany.png"/></p>
</div>

<div>
<a name="embeddedqueries"><h2>Embedded queries</h2></a>
<p>This mechanism lets you embed risearch queries in Lucene or Solr queries, and vice versa.</p>
<p>It provides interaction with the Resource Index, both when you index and when you search.</p>
<p>It compensates for the lack of joins in bibliographic query languages such as those of Lucene and Solr,
and for the lack of text search functionality in logic languages such as the risearch query languages.</p>
<p>The full potential of this mechanism has yet to be explored and realized.</p>
<p>These preliminary examples show some of the potential:
<ul>
<li>RISEARCH:
<a href="rest?operation=gfindObjects&restXslt=copyXml&query=(::RISEARCH::type=tuples%26lang=itql%26format=Sparql%26query=select+$obj1+from+%3C%23ri%3E+where+$obj1+%3Cinfo:fedora/fedora-system:def/relations-external%23isMemberOf%3E+%3Cinfo:fedora/demo:SmileyStuff%3E::RISEARCH::)">an itql search</a>
</li>
</ul>
<ul>
<li>GSEARCH:
<a href="rest?operation=gfindObjects&restXslt=copyXml&query=(::GSEARCH::operation=gfindObjects%26restXslt=copyXml%26query=dc.creator:apache::GSEARCH::)">a GSearch search</a>
</li>
</ul>
<ul>
<li>GSEARCH with RISEARCH:
<a href="rest?operation=gfindObjects&restXslt=copyXml&query=smiley+and+(::RISEARCH::xsltName/risearchToGsearch?type=tuples%26lang=itql%26format=Sparql%26query=select+$obj1+from+%3C%23ri%3E+where+$obj1+%3Cinfo:fedora/fedora-system:def/relations-external%23isMemberOf%3E+%3Cinfo:fedora/demo:SmileyStuff%3E::RISEARCH::)">a GSearch search with embedded itql</a>
</li>
</ul>
<ul>
<li>RISEARCH with GSEARCH:
<a href="rest?operation=gfindObjects&restXslt=copyXml&query=(::RISEARCH::type=tuples%26lang=itql%26format=Sparql%26query=select+$obj1+from+%3C%23ri%3E+where+$obj1+%3Cinfo:fedora/fedora-system:def/relations-external%23isMemberOf%3E+%3Cinfo:fedora/demo:SmileyStuff%3E+and+(::GSEARCH::xsltName/gsearchToRisearch?operation=gfindObjects%26restXslt=copyXml%26query=dc.creator:apache::GSEARCH::)::RISEARCH::)">an itql search with embedded GSearch</a>
</li>
</ul>
<ul>
<li>SOLR:
<a href="rest?operation=gfindObjects&restXslt=copyXml&query=(::SOLR::facet=true%26facet.field=dc.creator%26fl=dc.creator%26q=dc.creator:apache::SOLR::)">a Solr facet search</a>
</li>
</ul>
</p>
</div>
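Review note: each example link above wraps the embedded query in a matching pair of `::TYPE::` markers inside parentheses. A minimal Java sketch of pulling out that span follows, for orientation only. `EmbeddedQuerySketch` and `extract` are hypothetical names, and this is not the code in `handleEmbeddedQueries`, which this diff shows only in part.

```java
final class EmbeddedQuerySketch {

    /**
     * Returns the text between the first pair of "::TYPE::" markers,
     * or null if the query contains no complete pair.
     * Hypothetical helper, not part of GSearch.
     */
    static String extract(String query, String embedType) {
        String marker = "::" + embedType + "::";
        int start = query.indexOf(marker);
        if (start < 0) {
            return null; // no opening marker
        }
        int from = start + marker.length();
        int end = query.indexOf(marker, from);
        if (end < 0) {
            return null; // opening marker without a closing one
        }
        return query.substring(from, end);
    }

    public static void main(String[] args) {
        String q = "smiley and (::RISEARCH::type=tuples&lang=itql::RISEARCH::)";
        System.out.println(extract(q, "RISEARCH"));
    }
}
```

For the nested "RISEARCH with GSEARCH" link, extracting `GSEARCH` yields the inner query, which suggests the inner query has to be resolved before the outer one is sent; the order actually used is not visible in this diff.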

<div>
<a name="source"><h2>Building from source</h2></a>
<p>Get the source from github:</p>
@@ -782,6 +819,7 @@ <h4><a href="http://www.indexdata.dk/zebra/">Zebra</a></h4>
</ul>
</li>
<li>Performance measurements and possibly improvements (<a href="https://jira.duraspace.org/browse/FCREPO-1007">FCREPO-1007</a>).
Measurements were taken using Apache JMeter on a production-quality platform, giving some insight into the performance implications of various choices.
Download <a href="https://github.com/fcrepo/gsearch/blob/master/FedoraGenericSearch/src/performance/PerformanceMeasurementsforFedoraGSearch2.3.pdf">the report from github</a>.
Morten S&#248;rensen, DTU Library, is co-developer and co-author on this.
</li>
@@ -795,6 +833,9 @@ <h4><a href="http://www.indexdata.dk/zebra/">Zebra</a></h4>
</ul>
</li>
<li>Interaction with the Resource Index (<a href="https://jira.duraspace.org/browse/FCREPO-1009">FCREPO-1009</a>)
<ul>
<li>See the section on <a href="#embeddedqueries">Embedded queries</a>.</li>
</ul>
</li>
<li>Use of Apache Tika for full-text and metadata extraction (<a href="https://jira.duraspace.org/browse/FCREPO-1010">FCREPO-1010</a>)
<ul>
@@ -539,17 +539,20 @@ private void checkConfig() throws ConfigException {
analyzerClassName = defaultAnalyzer;
}
checkAnalyzerClass(indexName, analyzerClassName);
StringTokenizer configFieldAnalyzers = new StringTokenizer(getFieldAnalyzers(indexName));
while (configFieldAnalyzers.hasMoreElements()) {
String fieldAnalyzer = configFieldAnalyzers.nextToken();
int i = fieldAnalyzer.indexOf("::");
if (i<0) {
errors.append("\n*** "+configName+"/index/"+indexName+" fgsindex.fieldAnalyzer="+fieldAnalyzer+ " missing '::'");
} else {
analyzerClassName = fieldAnalyzer.substring(i+2).trim();
checkAnalyzerClass(indexName, analyzerClassName);
}
}
String configFieldAnalyzers = getFieldAnalyzers(indexName);
if (configFieldAnalyzers != null && configFieldAnalyzers.length()>0) {
StringTokenizer stFieldAnalyzers = new StringTokenizer(configFieldAnalyzers);
while (stFieldAnalyzers.hasMoreElements()) {
String fieldAnalyzer = stFieldAnalyzers.nextToken();
int i = fieldAnalyzer.indexOf("::");
if (i<0) {
errors.append("\n*** "+configName+"/index/"+indexName+" fgsindex.fieldAnalyzer="+fieldAnalyzer+ " missing '::'");
} else {
analyzerClassName = fieldAnalyzer.substring(i+2).trim();
checkAnalyzerClass(indexName, analyzerClassName);
}
}
}
}
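Review note: the guard added above matters because `new StringTokenizer(null)` throws a `NullPointerException`. Below is a minimal sketch of the same `field::analyzerClass` parsing as a standalone method. The class name, the returned map, and the use of `IllegalArgumentException` in place of `ConfigException` are assumptions made to keep it self-contained.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringTokenizer;

final class FieldAnalyzersSketch {

    /**
     * Parses a whitespace-separated list of "field::analyzerClass" tokens.
     * A null or empty value yields an empty map instead of an exception.
     */
    static Map<String, String> parse(String configFieldAnalyzers) {
        Map<String, String> result = new LinkedHashMap<>();
        // new StringTokenizer(null) would throw NullPointerException.
        if (configFieldAnalyzers == null || configFieldAnalyzers.isEmpty()) {
            return result;
        }
        StringTokenizer st = new StringTokenizer(configFieldAnalyzers);
        while (st.hasMoreTokens()) {
            String token = st.nextToken();
            int i = token.indexOf("::");
            if (i < 0) {
                throw new IllegalArgumentException(
                        "fieldAnalyzer=" + token + " missing '::'");
            }
            // Field name before "::", analyzer class name after it.
            result.put(token.substring(0, i), token.substring(i + 2).trim());
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(parse(
                "dc.title::org.apache.lucene.analysis.standard.StandardAnalyzer"));
    }
}
```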

// Add untokenizedFields property for lucene
@@ -1188,12 +1191,14 @@ public GenericOperationsImpl getOperationsImpl(String fgsUserNameParam, String i

private String insertSystemProperties(String propertyValue) {
String result = propertyValue;
while (result.indexOf("${") > -1) {
if (logger.isDebugEnabled())
logger.debug("insertSystemProperties propertyValue="+result);
while (result != null && result.indexOf("${") > -1) {
if (logger.isDebugEnabled())
logger.debug("propertyValue="+result);
logger.debug("insertSystemProperties propertyValue="+result);
result = insertSystemProperty(result);
if (logger.isDebugEnabled())
logger.debug("propertyValue="+result);
logger.debug("insertSystemProperties propertyValue="+result);
}
return result;
}
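Review note: `insertSystemProperty` itself is not part of this diff, so the following is only a sketch of what a null-safe `${name}` substitution loop could look like. It assumes placeholders name JVM system properties and that unknown names are left untouched; whether the real helper behaves this way is not visible here. The class name is hypothetical.

```java
final class SystemPropertySketch {

    /**
     * Replaces each "${name}" with System.getProperty(name).
     * Null input is returned as null; unknown or unterminated
     * placeholders are left as they are (an assumption of this sketch).
     */
    static String insertSystemProperties(String propertyValue) {
        String result = propertyValue;
        int from = 0;
        while (result != null) {
            int start = result.indexOf("${", from);
            if (start < 0) {
                break; // no further placeholders
            }
            int end = result.indexOf('}', start + 2);
            if (end < 0) {
                break; // unterminated placeholder
            }
            String name = result.substring(start + 2, end);
            String value = System.getProperty(name);
            if (value == null) {
                from = end + 1; // skip it so the loop always terminates
            } else {
                result = result.substring(0, start) + value + result.substring(end + 1);
                from = start + value.length();
            }
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(insertSystemProperties("${user.dir}/index"));
    }
}
```

Advancing the search position past an unresolved placeholder is a deliberate choice in this sketch: a loop that only tests `indexOf("${") > -1` relies on every pass removing a placeholder.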
@@ -256,7 +256,7 @@ private String handleEmbeddedQueries(String embedType, String query)
}
String decodedEmbeddedQuery = "";
try {
decodedEmbeddedQuery = URLDecoder.decode(embeddedQuery, "UTF-8");
decodedEmbeddedQuery = URLDecoder.decode(newQueryPart, "UTF-8");
} catch (UnsupportedEncodingException e) {
throw new GenericSearchException("handleEmbeddedQueries decode exception="+e.toString());
}
@@ -287,7 +287,7 @@ private String processEmbeddedQuery(String embedType, String embeddedQuery)
// secondPart = "?" + secondPart;
} else {
firstPart = embeddedQuery.substring(0, i);
secondPart = embeddedQuery.substring(i);
secondPart = embeddedQuery.substring(i+1);
}
String embeddedRepositoryName = config.getRepositoryName(null);
String embeddedIndexName = config.getIndexName(null);
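Review note: the `i+1` change drops the `?` from `secondPart`, which fits the later `baseUrl+"?"+secondPart` concatenation; keeping it would presumably have produced `??` in the URL. Below is a minimal sketch of the split. It assumes (the other branch is not fully visible in this hunk) that a query without `?` has an empty first part. Names are hypothetical.

```java
final class SplitSketch {

    /**
     * Splits "xsltName/foo?params" at the first '?'.
     * Returns {firstPart, secondPart}; the '?' belongs to neither part.
     */
    static String[] split(String embeddedQuery) {
        int i = embeddedQuery.indexOf('?');
        if (i < 0) {
            // Assumption: with no '?', everything is the parameter string.
            return new String[] {"", embeddedQuery};
        }
        return new String[] {
            embeddedQuery.substring(0, i),
            embeddedQuery.substring(i + 1) // i + 1 skips the '?' itself
        };
    }

    public static void main(String[] args) {
        String[] parts = split("xsltName/risearchToGsearch?type=tuples&lang=itql");
        System.out.println(parts[0] + " | " + parts[1]);
    }
}
```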
@@ -318,39 +318,31 @@ private String processEmbeddedQuery(String embedType, String embeddedQuery)
if ("GSEARCH".equals(embedType)) {
try {
baseUrl = getBaseURL(config.getSoapBase())+"/rest";
userPassword = config.getSoapUser()+":"+config.getSoapPass();
} catch (Exception e) {
throw new GenericSearchException("processEmbeddedQuery getBaseURL exception=\n"+e.toString());
throw new GenericSearchException("processEmbeddedQuery GSEARCH getBaseURL exception=\n"+e.toString());
}
userPassword = config.getSoapUser()+":"+config.getSoapPass();
// secondPart = encodeQuery(secondPart, "query");
} else if ("RISEARCH".equals(embedType)) {
baseUrl = config.getFedoraSoap(embeddedRepositoryName);
try {
baseUrl = getBaseURL(config.getFedoraSoap(embeddedRepositoryName))+"/risearch";
} catch (Exception e) {
throw new GenericSearchException("processEmbeddedQuery RISEARCH getBaseURL exception=\n"+e.toString());
}
userPassword = config.getFedoraUser(embeddedRepositoryName)+":"+config.getFedoraPass(embeddedRepositoryName);
secondPart = encodeQuery(secondPart, "query");
} else if ("SOLR".equals(embedType)) {
try {
baseUrl = config.getIndexBase(embeddedIndexName)+"/select";
baseUrl = config.getIndexBase(embeddedIndexName);
} catch (Exception e) {
throw new GenericSearchException("processEmbeddedQuery embeddedIndexName="+embeddedIndexName+" hasnoSolrserver exception=\n"+e.toString());
}
userPassword = config.getSoapUser()+":"+config.getSoapPass();
String queryContents = "";
i = secondPart.indexOf("q=");
j = -1;
if (i > -1) {
j = secondPart.indexOf("&", i+2);
if (j == -1) {
j = secondPart.length();
}
queryContents = secondPart.substring(i+2, j);
}
if (i == -1 || queryContents.length() == 0) {
throw new GenericSearchException("processEmbeddedQuery: No query contents found?"+" finalQuery=\n"+secondPart);
throw new GenericSearchException("processEmbeddedQuery SOLR embeddedIndexName="+embeddedIndexName+" hasnoSolrserver exception=\n"+e.toString());
}
try {
queryContents = URLEncoder.encode(queryContents, "UTF-8");
} catch (UnsupportedEncodingException e) {
throw new GenericSearchException(e.toString());
if (baseUrl == null) {
throw new GenericSearchException("processEmbeddedQuery SOLR embeddedIndexName="+embeddedIndexName+" hasnoSolrserver baseUrl=null");
}
secondPart = secondPart.substring(0, i+2) + queryContents + secondPart.substring(j);
baseUrl += "/select";
userPassword = config.getSoapUser()+":"+config.getSoapPass();
secondPart = encodeQuery(secondPart, "q");
}
String urlString = baseUrl+"?"+secondPart;
if (logger.isDebugEnabled())
@@ -401,26 +393,50 @@ private String processEmbeddedQuery(String embedType, String embeddedQuery)
new StreamSource(content),
config.getURIResolver(embeddedIndexName),
params);
// i = resultXml.indexOf("<?xml");
// if (i>-1) {
// j = resultXml.indexOf("?>", i);
// if (j > -1) {
// resultXml.delete(0, j+2);
// }
// }
String newQueryPart = resultXml.toString();
i = resultXml.indexOf("newQueryPart>");
String findString = "result:newQueryPart xmlns:result=\"http://www.w3.org/2001/sw/DataAccess/rf1/result\">";
i = resultXml.indexOf(findString);
if (i>-1) {
j = resultXml.indexOf("</newQueryPart", i);
j = resultXml.indexOf("</result:newQueryPart", i);
if (j > -1) {
newQueryPart = resultXml.substring(i, j);
newQueryPart = resultXml.substring(i+findString.length(), j);
}
}
if (logger.isDebugEnabled())
logger.debug("processEmbeddedQuery newQueryPart=\n"+newQueryPart);
return newQueryPart;
}
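Review note: the extraction above now skips the opening tag by using `findString.length()`, so the returned text no longer starts with the tag remainder. Below is a standalone sketch of that logic, with hypothetical class and constant names. Like the code in the diff, it depends on the serializer writing the namespace declaration in exactly this form; an XML parser would be a sturdier alternative.

```java
final class NewQueryPartSketch {

    static final String OPEN =
        "result:newQueryPart xmlns:result=\"http://www.w3.org/2001/sw/DataAccess/rf1/result\">";
    static final String CLOSE = "</result:newQueryPart";

    /**
     * Returns the text inside result:newQueryPart, or the whole input
     * when the element is not found (the same fallback as in the diff).
     */
    static String extract(String resultXml) {
        int i = resultXml.indexOf(OPEN);
        if (i > -1) {
            int j = resultXml.indexOf(CLOSE, i);
            if (j > -1) {
                // Start after the opening tag, stop before the closing tag.
                return resultXml.substring(i + OPEN.length(), j);
            }
        }
        return resultXml;
    }

    public static void main(String[] args) {
        String xml = "<?xml version=\"1.0\" encoding=\"UTF-8\"?><" + OPEN
                + "(PID:\"demo:1\")</result:newQueryPart>";
        System.out.println(extract(xml));
    }
}
```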

public String encodeQuery(String secondPart, String queryName)
throws GenericSearchException {
if (logger.isDebugEnabled())
logger.debug("encodeQuery" + " queryName="+queryName + " secondPart="+secondPart);
String queryContents = "";
int i = secondPart.indexOf(queryName+"=");
int j = -1;
if (i > -1) {
j = secondPart.indexOf("&", i+queryName.length());
if (j == -1) {
j = secondPart.length();
}
queryContents = secondPart.substring(i+1+queryName.length(), j);
}
if (i == -1 || queryContents.length() == 0) {
throw new GenericSearchException("processEmbeddedQuery: No query contents found?"+" finalQuery=\n"+secondPart);
}
if (logger.isDebugEnabled())
logger.debug("encodeQuery" + " queryContents="+queryContents);
try {
queryContents = URLEncoder.encode(queryContents, "UTF-8");
} catch (UnsupportedEncodingException e) {
throw new GenericSearchException(e.toString());
}
String result = secondPart.substring(0, i+1+queryName.length()) + queryContents + secondPart.substring(j);
if (logger.isDebugEnabled())
logger.debug("encodeQuery" + " secondPart="+result);
return result;
}
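Review note: `secondPart.indexOf(queryName+"=")` also matches inside a longer parameter name, so with `queryName` set to `q` a string beginning with `fq=` would have the wrong value encoded. Below is a sketch of a stricter variant that accepts the name only at the start of the string or directly after `&`. The class and method names are hypothetical, and `IllegalArgumentException` stands in for `GenericSearchException`.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

final class EncodeParamSketch {

    /** URL-encodes the value of one named parameter in "a=1&b=2" style text. */
    static String encodeParam(String params, String name) {
        String key = name + "=";
        int valueStart;
        if (params.startsWith(key)) {
            valueStart = key.length();
        } else {
            int amp = params.indexOf("&" + key);
            if (amp < 0) {
                throw new IllegalArgumentException("no parameter " + name + " in " + params);
            }
            valueStart = amp + 1 + key.length();
        }
        int valueEnd = params.indexOf('&', valueStart);
        if (valueEnd < 0) {
            valueEnd = params.length();
        }
        String value = params.substring(valueStart, valueEnd);
        if (value.isEmpty()) {
            throw new IllegalArgumentException("empty parameter " + name + " in " + params);
        }
        try {
            return params.substring(0, valueStart)
                    + URLEncoder.encode(value, "UTF-8")
                    + params.substring(valueEnd);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always available
        }
    }

    public static void main(String[] args) {
        System.out.println(encodeParam("fq=x&q=dc.creator:apache stuff", "q"));
    }
}
```

Both this sketch and the method in the diff end the value at the next `&`, so a value that itself contains an unescaped `&` would be cut short.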

public String browseIndex(
String startTerm,
int termPageSize,
2 changes: 1 addition & 1 deletion FgsConfig/FgsConfigIndexTemplate/Lucene/index.properties
@@ -14,7 +14,7 @@ fgsindex.defaultGetIndexInfoResultXslt = copyXml
fgsindex.indexDir = INDEXDIR

fgsindex.analyzer = org.apache.lucene.analysis.standard.StandardAnalyzer
fgsindex.fieldAnalyzers = dc.title::org.apache.lucene.analysis.KeywordAnalyzer dc.creator::org.apache.lucene.analysis.standard.StandardAnalyzer
fgsindex.fieldAnalyzers = dc.title::org.apache.lucene.analysis.standard.StandardAnalyzer dc.creator::org.apache.lucene.analysis.standard.StandardAnalyzer
# used to index and query TOKENIZED index fields
# for UN_TOKENIZED index fields see fgsindex.untokenizedFields further down

22 changes: 22 additions & 0 deletions FgsConfig/FgsConfigTemplate/rest/gsearchToRisearch.xslt
@@ -0,0 +1,22 @@
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
xmlns:result="http://www.w3.org/2001/sw/DataAccess/rf1/result"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

<xsl:output method="xml" indent="yes" encoding="UTF-8"/>

<xsl:template match="/">
<result:newQueryPart>
<xsl:text>(</xsl:text>
<xsl:for-each select="//object">
<xsl:if test="position()>1"> or </xsl:if>
<xsl:text>$obj1 &lt;mulgara:is&gt;</xsl:text>
<xsl:text>&lt;info:fedora/</xsl:text><xsl:value-of select="field[@name='PID']"/><xsl:text>&gt;</xsl:text>
</xsl:for-each>
<xsl:text>)</xsl:text>
</result:newQueryPart>
</xsl:template>

<xsl:template match="text()"/>

</xsl:stylesheet>
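Review note: to make the shape of this stylesheet's output concrete, here is a small Java sketch that builds the same text from a list of PIDs. It mirrors the templates above rather than running them, and the class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.List;

final class GsearchToRisearchSketch {

    /** Builds "($obj1 <mulgara:is><info:fedora/PID> or ...)" from PIDs. */
    static String toItqlConstraint(List<String> pids) {
        StringBuilder sb = new StringBuilder("(");
        for (int n = 0; n < pids.size(); n++) {
            if (n > 0) {
                sb.append(" or "); // same separator as the xsl:if above
            }
            sb.append("$obj1 <mulgara:is>")
              .append("<info:fedora/").append(pids.get(n)).append('>');
        }
        return sb.append(')').toString();
    }

    public static void main(String[] args) {
        System.out.println(toItqlConstraint(Arrays.asList("demo:1", "demo:2")));
    }
}
```

With no matching objects the result is `()`, which is worth keeping in mind for the embedding query.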
22 changes: 22 additions & 0 deletions FgsConfig/FgsConfigTemplate/rest/risearchToGsearch.xslt
@@ -0,0 +1,22 @@
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
xmlns:result="http://www.w3.org/2001/sw/DataAccess/rf1/result"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

<xsl:output method="xml" indent="yes" encoding="UTF-8"/>

<xsl:template match="/">
<result:newQueryPart>
<xsl:text>(</xsl:text>
<xsl:for-each select="//result:result">
<xsl:if test="position()>1"> or </xsl:if>
<xsl:text>PID:</xsl:text>
<xsl:text>"</xsl:text><xsl:value-of select="substring-after(result:obj1/@uri, '/')"/><xsl:text>"</xsl:text>
</xsl:for-each>
<xsl:text>)</xsl:text>
</result:newQueryPart>
</xsl:template>

<xsl:template match="text()"/>

</xsl:stylesheet>
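Review note: the following Java sketch mirrors what this stylesheet emits for a list of `obj1` URIs; it does not run the XSLT, and its names are hypothetical. One point to check against the target parser: Lucene's classic query syntax documents boolean operators in upper case, so whether the lower-case `or` acts as an operator depends on parser and analyzer settings that this diff does not show.

```java
import java.util.Arrays;
import java.util.List;

final class RisearchToGsearchSketch {

    /** Builds (PID:"..." or PID:"...") from URIs such as info:fedora/demo:1. */
    static String toPidClause(List<String> objectUris) {
        StringBuilder sb = new StringBuilder("(");
        for (int n = 0; n < objectUris.size(); n++) {
            if (n > 0) {
                sb.append(" or ");
            }
            String uri = objectUris.get(n);
            // Same as substring-after(uri, '/'): text after the first '/',
            // or the empty string when there is no '/'.
            int slash = uri.indexOf('/');
            String pid = slash < 0 ? "" : uri.substring(slash + 1);
            sb.append("PID:\"").append(pid).append('"');
        }
        return sb.append(')').toString();
    }

    public static void main(String[] args) {
        System.out.println(toPidClause(
                Arrays.asList("info:fedora/demo:1", "info:fedora/demo:2")));
    }
}
```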
42 changes: 21 additions & 21 deletions FgsLucene/src/java/dk/defxws/fgslucene/OperationsImpl.java
@@ -561,27 +561,27 @@ public Analyzer getQueryAnalyzer(String indexName)
}
if (logger.isDebugEnabled())
logger.debug("getQueryAnalyzer configFieldAnalyzers=" + configFieldAnalyzers);
if (configFieldAnalyzers == null)
configFieldAnalyzers = "";
StringTokenizer stConfigFieldAnalyzers = new StringTokenizer(configFieldAnalyzers);
while (stConfigFieldAnalyzers.hasMoreElements()) {
String fieldAnalyzer = stConfigFieldAnalyzers.nextToken();
if (logger.isDebugEnabled())
logger.debug("getQueryAnalyzer fieldAnalyzer=" + fieldAnalyzer);
int i = fieldAnalyzer.indexOf("::");
if (i<0) {
throw new ConfigException("getQueryAnalyzer fgsindex.fieldAnalyzer="+fieldAnalyzer+ " missing '::'");
}
String fieldName = "-";
String analyzerClassName = "-";
try {
fieldName = fieldAnalyzer.substring(0, i);
analyzerClassName = fieldAnalyzer.substring(i+2);
fieldAnalyzers.put(fieldName, getAnalyzer(indexName, analyzerClassName));
} catch (Exception e) {
throw new ConfigException("getQueryAnalyzer getAnalyzer fieldName="+fieldName+" analyzerClassName="+analyzerClassName+" :\n", e);
}
}
if (configFieldAnalyzers != null && configFieldAnalyzers.length()>0) {
StringTokenizer stConfigFieldAnalyzers = new StringTokenizer(configFieldAnalyzers);
while (stConfigFieldAnalyzers.hasMoreElements()) {
String fieldAnalyzer = stConfigFieldAnalyzers.nextToken();
if (logger.isDebugEnabled())
logger.debug("getQueryAnalyzer fieldAnalyzer=" + fieldAnalyzer);
int i = fieldAnalyzer.indexOf("::");
if (i<0) {
throw new ConfigException("getQueryAnalyzer fgsindex.fieldAnalyzer="+fieldAnalyzer+ " missing '::'");
}
String fieldName = "-";
String analyzerClassName = "-";
try {
fieldName = fieldAnalyzer.substring(0, i);
analyzerClassName = fieldAnalyzer.substring(i+2);
fieldAnalyzers.put(fieldName, getAnalyzer(indexName, analyzerClassName));
} catch (Exception e) {
throw new ConfigException("getQueryAnalyzer getAnalyzer fieldName="+fieldName+" analyzerClassName="+analyzerClassName+" :\n", e);
}
}
}
StringTokenizer untokenizedFields = new StringTokenizer(config.getUntokenizedFields(indexName));
while (untokenizedFields.hasMoreElements()) {
String fieldName = untokenizedFields.nextToken();