Skip to content

pamulapati/rosette-nameindex-clouderasearch

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

13 Commits
 
 
 
 
 
 

Repository files navigation

Testing Basis Technology Rosette Software with Cloudera Search

Introduction

This document describes how to integrate and test Basis Technology’s Rosette®’s name indexing and matching technology with Cloudera Search.

Compatibility

To get started, you will need a [CDH cluster with Cloudera Search/Solr] (https://www.cloudera.com/documentation/enterprise/latest/topics/installation_installation.html) services running on [Java SDK 1.8 or later](https://www.cloudera.com/documentation/enterprise/latest/topics/cdh_ig_jdk_installation.html).

Installation

Install Rosette on all the Solr nodes of the CDH cluster. To do this, you’ll need to unzip the SDK and Documentation files to the same directory (which we will call BT_ROOT) and then copy the license file to the BT_ROOT/rlp/rlp/licenses subdirectory.

You must include a Java property setting that points to the root of a Rosette SDK. You can achieve this in Cloudera Manager, by searching “Java Configuration Options for Solr Server” and appending the existing value with -Dbt.root=<BT_ROOT> (e.g. -Dbt.root=/usr/bt/rlp). Click “Save Changes” and restart your Solr services.

Alt text

You can use the project made available in this git to get you started by pulling the namesearch folder to /root/namesearch on a node in the cluster

Alt text export PROJECT_HOME=/root/namesearch

vi conf/schema.xml and observe

vi conf/solrconfig.xml and verify that the [[BT_ROOT]] matches to where you unzipped Rosette SDK

<lib path="[[BT_ROOT]]/lib/jvm/bt-adm-model-2.1.2.jar"/>
<lib path="[[BT_ROOT]]/lib/jvm/btcommon-api-36.0.2.jar"/>
<lib path="[[BT_ROOT]]/lib/jvm/btcommon-api-jackson-36.0.2.jar"/>
<lib path="[[BT_ROOT]]/lib/jvm/btcommon-lib-37.0.1.jar"/>
<lib path="[[BT_ROOT]]/lib/jvm/btrlpnc.jar"/>
<lib path="[[BT_ROOT]]/lib/jvm/bt-rni-solr[[xxx]]-plugins.jar"/>
<lib path="[[BT_ROOT]]/lib/jvm/bt-rni-solrj[[xxx]]--client.jar"/>
<lib path="[[BT_ROOT]]/lib/jvm/icu4j-55.1.jar"/>
<lib path="[[BT_ROOT]]/lib/jvm/log4j-1.2.15.jar"/>
<lib path="[[BT_ROOT]]/lib/jvm/slf4j-api-1.6.4.jar"/>
<lib path="[[BT_ROOT]]/lib/jvm/slf4j-jdk14-1.6.4.jar"/>
<lib path="[[BT_ROOT]]/lib/jvm/slf4j-log4j12-1.6.4.jar"/>

Running the tests

Create namesearch insteance directory in SolrCloud
sudo solrctl --zk [ZK ensemble] instancedir --create namesearch $PROJECT_HOME Example: sudo solrctl --zk solrtest-1.vpc.cloudera.com:2181,solrtest-2.vpc.cloudera.com:2181,solrtest-3.vpc.cloudera.com:2181/solr instancedir --create namesearch $PROJECT_HOME

Create namesearch collection in SolrCloud
sudo solrctl --zk [ZK ensemble] collection --create namesearch -s 3 -r 1

Example sudo solrctl --zk solrtest-1.vpc.cloudera.com:2181,solrtest-2.vpc.cloudera.com:2181,solrtest-3.vpc.cloudera.com:2181/solr collection --create namesearch -s 3 -r 1

You have name search application ready.

Index a single document with following values into your Solr cluster:
Notice that the Primary Name in this test document is "Robert Smith".

"primaryName": "Robert Smith",
"dob": "1995-12-31T23:59:59Z",
"profile": "Lorem ipsum dolor sit amet, consectetuer adipiscing elit. Aenean commodo ligula eget dolor. Aenean massa."

Viewing Test Results

Query for primaryName:bob
http://hostname:8983/solr/namesearch_shard3_replica1/select?q=primaryName%3Abob&wt=json&indent=true
Notice that the Primary Name used in search is "bob", As you can see in the results below Cloudera Search found the document with "Robert Smith" using Basis Technology’s Rosette® to perform fuzzy name indexing and matching plugin.

Alt text

About

No description, website, or topics provided.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published