# Motivating example

We're going to define some content so that we can query it.

In [1]:
import lucene

from org.apache.lucene.analysis.standard import StandardAnalyzer
from org.apache.lucene.index import IndexWriter, IndexWriterConfig
from org.apache.lucene.store import RAMDirectory
from org.apache.lucene.document import Document, Field, TextField

from org.apache.lucene.index import DirectoryReader
from org.apache.lucene.queryparser.classic import QueryParser
from org.apache.lucene.search import IndexSearcher

In [2]:
env = lucene.initVM()

Let's take a look at the contents we're going to store (or *index*: used as a verb, it means storing something in an index).

In [3]:
contents = [
  "Humpty Dumpty sat on a wall,",
  "Humpty Dumpty had a great fall.",
  "All the king's horses and all the king's men",
  "Couldn't put Humpty together again."
]

### Index setup (just ignore it for now)

In [4]:
analyzer = StandardAnalyzer()
index_writer = IndexWriter(
    RAMDirectory(),
    IndexWriterConfig(analyzer)
)

Now we're going to actually add the documents to the index.

In [5]:
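for content in contents:
    # One Lucene document per line of the rhyme.
    doc = Document()
    # Analyzed text field named "content"; Store.YES keeps the original string retrievable.
    doc.add(TextField("content", content, Field.Store.YES))
    index_writer.addDocument(doc)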

In [6]:
index_writer.commit()

7
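To give a rough feeling for what "indexing" means, here is a toy sketch in plain Python. It is not how Lucene stores things internally; `toy_tokenize` and `build_toy_index` are made-up names, and the tokenizer (lowercase, keep letters and apostrophes) is only a crude stand-in for what an analyzer does. The idea it shows is a mapping from each term to the documents that contain it.

```python
import re
from collections import defaultdict

contents = [
    "Humpty Dumpty sat on a wall,",
    "Humpty Dumpty had a great fall.",
    "All the king's horses and all the king's men",
    "Couldn't put Humpty together again.",
]

def toy_tokenize(text):
    # Hypothetical stand-in for an analyzer: lowercase, keep runs of letters/apostrophes.
    return re.findall(r"[a-z']+", text.lower())

def build_toy_index(docs):
    # Map each term to the set of document ids (list positions) containing it.
    index = defaultdict(set)
    for doc_id, text in enumerate(docs):
        for term in toy_tokenize(text):
            index[term].add(doc_id)
    return index

toy_index = build_toy_index(contents)
print(sorted(toy_index["humpty"]))
print(sorted(toy_index["dumpty"]))
```

Looking up a term in such a mapping is a dictionary access, which is why searching an index is much cheaper than scanning every document.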

Search setup. Ignore it for now.

In [7]:
index_searcher = IndexSearcher(
  DirectoryReader.open(
    index_writer.getDirectory()))
query_parser = QueryParser("content", analyzer)

Setting up the query. Set `query_string` to whatever you wish to search for.

In [8]:
query_string = "humpty dumpty"
query = query_parser.parse(query_string)
max_results = 5

top_docs = index_searcher.search(query, max_results)

Print the query results, sorted by score.

In [10]:
print("Total hits: " + str(top_docs.totalHits))

for score_doc in top_docs.scoreDocs:
    doc = index_searcher.doc(score_doc.doc)
    print(str(score_doc.score) + ": " + doc.getField('content').stringValue())

Total hits: 3
1.1433706283569336: Humpty Dumpty sat on a wall,
1.0308873653411865: Humpty Dumpty had a great fall.
0.35024189949035645: Couldn't put Humpty together again.
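Note that the third hit contains only "Humpty". The classic `QueryParser` combines terms with OR by default, so a document matches if it contains at least one of the query terms. The sketch below mimics just that matching rule in plain Python; `toy_tokenize` and `toy_or_match` are hypothetical helpers, the tokenizer is a crude approximation of an analyzer, and no scoring is attempted.

```python
import re

contents = [
    "Humpty Dumpty sat on a wall,",
    "Humpty Dumpty had a great fall.",
    "All the king's horses and all the king's men",
    "Couldn't put Humpty together again.",
]

def toy_tokenize(text):
    # Hypothetical stand-in for an analyzer: lowercase, keep runs of letters/apostrophes.
    return re.findall(r"[a-z']+", text.lower())

def toy_or_match(query_string, docs):
    # A document matches when it shares at least one term with the query (OR semantics).
    query_terms = set(toy_tokenize(query_string))
    return [text for text in docs if query_terms & set(toy_tokenize(text))]

matches = toy_or_match("humpty dumpty", contents)
print(len(matches))
for text in matches:
    print(text)
```

This accounts for the hit count only; it says nothing about how the scores are computed.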


What just happened? What are these numbers?

We're going to look at this in the following notebooks. 