manaswinivedula edited this page Jun 26, 2020 · 7 revisions

Apache Solr and Apache Lucene

Task 1

1. First, generate a local instance directory for music2 and edit its schema.xml file:

solrctl instancedir --generate /tmp/music2

gedit /tmp/music2/conf/schema.xml

2. The following fields were added to schema.xml:

<field name="id" type="string" indexed="true" stored="true" required="true" multiValued="false" />

<field name="asin" type="string" indexed="true" stored="true" multiValued="false" />

<field name="reviewerName" type="string" indexed="true" stored="true" multiValued="false" />

<field name="helpful" type="string" indexed="true" stored="true" multiValued="false" />

<field name="reviewText" type="string" indexed="true" stored="true" multiValued="false" />

<field name="overall" type="string" indexed="true" stored="true" multiValued="false" />

<field name="summary" type="string" indexed="true" stored="true" multiValued="false" />

<field name="unixReviewTime" type="string" indexed="true" stored="true" multiValued="false" />

<field name="ReviewTime" type="string" indexed="true" stored="true" multiValued="false" />

3. After saving schema.xml, upload the instance directory to ZooKeeper under the name music2 and create a collection named music2:

solrctl instancedir --create music2 /tmp/music2

solrctl collection --create music2

4. Copy the data from the CSV file and paste it into the Documents screen of the Solr Admin UI to index it into music2.
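Before pasting, it can help to confirm that the CSV header uses exactly the field names declared in schema.xml, because Solr rejects a document that contains a field the schema does not declare (unless a matching dynamicField exists). As an alternative to the Admin UI, the same CSV can be POSTed to the collection's /update handler with a CSV content type. The sketch below does both under stated assumptions: Solr is reachable at http://localhost:8983/solr (a hypothetical host and port), the field list is copied by hand from step 2, and the helper names are hypothetical. It only builds the HTTP request; it does not send it.

```python
import csv
import io
import urllib.request
import xml.etree.ElementTree as ET

# The <field> declarations from step 2, wrapped in a root element so they parse.
SCHEMA_SNIPPET = """<fields>
  <field name="id" type="string" indexed="true" stored="true" required="true" multiValued="false" />
  <field name="asin" type="string" indexed="true" stored="true" multiValued="false" />
  <field name="reviewerName" type="string" indexed="true" stored="true" multiValued="false" />
  <field name="helpful" type="string" indexed="true" stored="true" multiValued="false" />
  <field name="reviewText" type="string" indexed="true" stored="true" multiValued="false" />
  <field name="overall" type="string" indexed="true" stored="true" multiValued="false" />
  <field name="summary" type="string" indexed="true" stored="true" multiValued="false" />
  <field name="unixReviewTime" type="string" indexed="true" stored="true" multiValued="false" />
  <field name="ReviewTime" type="string" indexed="true" stored="true" multiValued="false" />
</fields>"""


def schema_fields(xml_text):
    """Return (all declared field names, names marked required="true")."""
    root = ET.fromstring(xml_text)
    names, required = set(), set()
    for field in root.iter("field"):
        names.add(field.get("name"))
        if field.get("required") == "true":
            required.add(field.get("name"))
    return names, required


def check_csv_header(csv_text, xml_text):
    """Return (CSV columns unknown to the schema, required fields missing from the CSV)."""
    header = next(csv.reader(io.StringIO(csv_text)))
    names, required = schema_fields(xml_text)
    unknown = [col for col in header if col not in names]
    missing = sorted(required - set(header))
    return unknown, missing


def build_csv_update(csv_text, collection, base_url="http://localhost:8983/solr"):
    """Build (but do not send) a POST of the CSV to the collection's /update handler."""
    url = f"{base_url}/{collection}/update?commit=true"
    return urllib.request.Request(
        url,
        data=csv_text.encode("utf-8"),
        headers={"Content-Type": "application/csv"},
        method="POST",
    )


# Hypothetical sample rows, for illustration only.
sample = "id,asin,reviewerName,overall\n1,B000,Corbin,5.0\n"
print(check_csv_header(sample, SCHEMA_SNIPPET))  # ([], [])
request = build_csv_update(sample, "music2")
```

Against a running Solr, passing the request to urllib.request.urlopen would send it; to check the full schema rather than the snippet, the text of /tmp/music2/conf/schema.xml can be passed to check_csv_header instead.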

Queries:

Word Matching:

1. Search for reviews whose reviewerName matches "Corbin":

reviewerName:"Corbin"
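Because reviewerName is declared with type="string" in step 2, Solr indexes each name as a single untokenized term, so this query only matches names that are exactly "Corbin" (same case). The sketch below contrasts that with a tokenized field; text_field_match is a hypothetical, rough stand-in for an analyzer such as text_general (lower-casing plus word splitting), not Solr's actual analysis chain.

```python
import re


def string_field_match(value, term):
    """`string` field: the whole value is one term, compared exactly (case-sensitive)."""
    return value == term


def text_field_match(value, term):
    """Rough approximation of a tokenized field such as text_general:
    lower-case the text, split it into words, and look for the query word."""
    tokens = re.findall(r"\w+", value.lower())
    return term.lower() in tokens


# Hypothetical reviewerName values, for illustration only.
names = ["Corbin", "corbin", "Corbin Smith", "J. Corbin"]
print([n for n in names if string_field_match(n, "Corbin")])  # ['Corbin']
print([n for n in names if text_field_match(n, "Corbin")])
# ['Corbin', 'corbin', 'Corbin Smith', 'J. Corbin']
```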

Wildcard Matching:

2. Get all records whose summary ends with the word "cables" (the leading * matches any prefix):

summary:*cables
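On a string field the wildcard pattern is matched against the entire stored value, so summary:*cables returns summaries that end with "cables"; a summary that merely contains the word somewhere else would need *cables* instead. The sketch below approximates that matching rule with Python's fnmatchcase. It illustrates the rule only; Lucene itself compiles the pattern into an automaton over the indexed terms, and wildcard_match is a hypothetical helper.

```python
from fnmatch import fnmatchcase


def wildcard_match(value, pattern):
    """Approximate a Lucene wildcard query on a `string` field: `*` matches any run
    of characters, `?` matches one character, and the pattern must cover the whole
    case-sensitive value. Note: fnmatch also treats [...] as a character class,
    which Lucene wildcards do not, so avoid brackets in patterns passed here."""
    return fnmatchcase(value, pattern)


# Hypothetical summary values, for illustration only.
summaries = ["Great cables", "Cables", "cables are fine", "Good audio cables"]
print([s for s in summaries if wildcard_match(s, "*cables")])
# ['Great cables', 'Good audio cables']
```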

Range Searches:

3. Search for records whose overall rating is between 3 and 4 (inclusive):

overall:[3 TO 4]
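Because overall is also declared as type="string", a range such as [3 TO 4] is evaluated by comparing strings character by character, not numerically. If the CSV stores ratings as 3.0, 4.0 and so on (an assumption about the data), "3.0" falls inside the range but "4.0" sorts after "4" and is left out. The sketch contrasts the two comparisons; a numeric field type would avoid the problem (the exact type name depends on the Solr version's schema).

```python
def in_string_range(value, low, high):
    """Inclusive [low TO high] on a `string` field: plain lexicographic comparison."""
    return low <= value <= high


def in_numeric_range(value, low, high):
    """Inclusive [low TO high] on a numeric field: compare the parsed numbers."""
    return float(low) <= float(value) <= float(high)


# Hypothetical 'overall' values as they might appear in the CSV.
ratings = ["2.0", "3.0", "3.5", "4", "4.0", "5.0"]
print([r for r in ratings if in_string_range(r, "3", "4")])   # ['3.0', '3.5', '4']
print([r for r in ratings if in_numeric_range(r, "3", "4")])  # ['3.0', '3.5', '4', '4.0']
```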

Fuzzy Search:

4. Fetch all records whose reviewerName is at least 50% similar to "Matt", sorted in ascending order:

reviewerName:Matt~0.5
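In Lucene 4 and later, fuzzy matching is based on edit distance (at most 2 edits), and the classic query parser converts a legacy fractional value such as 0.5 into a maximum number of edits from the term's length (in Lucene this conversion is FuzzyQuery.floatToEdits): (1 − 0.5) × 4 characters = 2 edits for "Matt". The sketch below reproduces that conversion and checks candidates with plain Levenshtein distance. Two simplifications to keep in mind: Lucene's FuzzyQuery also counts an adjacent transposition as one edit by default, which this sketch ignores, and the helper names are hypothetical.

```python
def float_to_edits(similarity, term):
    """Convert a fuzzy value into a maximum edit distance: values >= 1 are edit
    counts, 0 means exact, fractions scale with term length; capped at 2."""
    if similarity >= 1:
        return min(int(similarity), 2)
    if similarity == 0:
        return 0
    return min(int((1 - similarity) * len(term)), 2)


def levenshtein(a, b):
    """Classic dynamic-programming edit distance (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,               # delete ca
                            curr[j - 1] + 1,           # insert cb
                            prev[j - 1] + (ca != cb)))  # substitute ca -> cb
        prev = curr
    return prev[-1]


max_edits = float_to_edits(0.5, "Matt")
print(max_edits)  # 2
# Hypothetical reviewerName values, for illustration only.
names = ["Matt", "Mat", "Mark", "Matthew", "matt"]
print([n for n in names if levenshtein(n, "Matt") <= max_edits])
# ['Matt', 'Mat', 'Mark', 'matt']
```

Note that "matt" matches only because one substitution (M to m) fits within the budget; the string field itself is not lower-cased.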

Proximity Search:

5. Get all records whose summary contains the words "cable" and "excellent" within five positions of each other:

summary:"Cable Excellent"~5
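A sloppy phrase query such as "Cable Excellent"~5 matches when the two words can be brought into the quoted order with at most 5 position moves. For two terms, up to 5 other words may sit between "cable" and "excellent"; the reversed order is also accepted but costs 2 more than the same gap in order. This requires a tokenized field with positions (for example text_general); on the string field declared in step 2 the whole summary is a single term. The sketch below checks the two-term case; the tokenizer is a rough lower-case word splitter standing in for Solr's analysis chain, and within_slop is a hypothetical helper.

```python
import re


def within_slop(text, first, second, slop):
    """Two-term sloppy phrase check, e.g. "first second"~slop.

    Each term's position is shifted by its offset in the phrase (0 for the first
    word, 1 for the second); the gap between shifted positions must not exceed
    slop. In order with k words between costs k; reversed with k between costs k + 2.
    """
    tokens = re.findall(r"\w+", text.lower())  # rough stand-in for text_general
    firsts = [i for i, t in enumerate(tokens) if t == first.lower()]
    seconds = [i for i, t in enumerate(tokens) if t == second.lower()]
    return any(abs((j - 1) - i) <= slop for i in firsts for j in seconds)


# Hypothetical summaries, for illustration only.
print(within_slop("Excellent quality cable", "Cable", "Excellent", 5))          # True (reversed, cost 3)
print(within_slop("Cable works and looks excellent", "Cable", "Excellent", 5))  # True (in order, cost 3)
print(within_slop("cable a b c d e f excellent", "Cable", "Excellent", 5))      # False (cost 6)
```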

Task 2

1. First, generate a local instance directory for booknew and edit its schema.xml file. After saving schema.xml, upload the instance directory to ZooKeeper under the name booknew and create a collection named booknew:

solrctl instancedir --generate /tmp/booknew

gedit /tmp/booknew/conf/schema.xml

solrctl instancedir --create booknew /tmp/booknew

solrctl collection --create booknew

2. The following fields were added to schema.xml:

<field name="series_t" type="text_general" indexed="true" stored="true"/>

<field name="sequence_i" type="text_general" indexed="true" stored="true"/>

<field name="genre_s" type="text_general" indexed="true" stored="true"/>

3. Copy the data from the CSV file and paste it into the Documents screen of the Solr Admin UI to index it into booknew.

Queries:

1. Fetch all records in the fantasy genre:

genre_s:fantasy

2. Get the records that are out of stock and priced at 6 dollars or more:

price:[6 TO *] AND inStock:false

3. Get the records whose series_t contains the words "song" and "fire" within 10 positions of each other:

series_t:"song fire"~10

4. Get the records priced between 5 and 7 dollars (inclusive) whose author name includes "Scott":

price:[5 TO 7] AND author:*scott

5. Get the records that are in stock and have sequence number 1:

inStock:true AND sequence_i:1
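Besides the query box in the Solr Admin UI, the same q strings can be sent to the collection's /select handler over HTTP. The sketch below only builds the URLs; it assumes Solr listens at http://localhost:8983/solr (a hypothetical host and port), select_url is a hypothetical helper, and urlencode takes care of escaping characters such as :, [, *, quotes and spaces.

```python
from urllib.parse import urlencode

BASE_URL = "http://localhost:8983/solr"  # assumption: default local Solr address


def select_url(collection, query, **params):
    """Build a /select URL for a Lucene-syntax query; extra params (e.g. rows=10)
    are appended unchanged."""
    return f"{BASE_URL}/{collection}/select?" + urlencode({"q": query, **params})


task2_queries = [
    "genre_s:fantasy",
    "price:[6 TO *] AND inStock:false",
    'series_t:"song fire"~10',
    "price:[5 TO 7] AND author:*scott",
    "inStock:true AND sequence_i:1",
]
for q in task2_queries:
    print(select_url("booknew", q, rows=10))
```

Adding wt=json as an extra parameter asks Solr for a JSON response, which is convenient when the results are read by a script rather than a person.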

