This is an appendix to the article 検索クエリパーサー自作入門 ("Introduction to Building Your Own Search Query Parser"), published on the LegalOn Technologies Engineering Blog.
The goal is to learn how to create a search query parser through implementation examples.
- Install Python (^3.11).
- Have the Docker daemon running.
- Install Visual Studio Code and the ANTLR4 grammar syntax support extension.
.vscode -- Settings for ANTLR4 grammar syntax support
elasticsearch -- Elasticsearch for local use
src -- Source of sample programs
tests -- Unit tests
Makefile -- Define simplified commands
The query syntax we will define is simple:
- Conjunctive query
  A search with A AND B returns only documents that contain both A and B.
- Disjunctive query
  A search with A OR B returns documents that contain either A or B.
- Negative query
  A search with NOT A returns only documents that do not contain A.
- Compound query
  A query that combines multiple conjunctive, disjunctive, and negative queries.
  - Operator priority
    Operators are evaluated in order of precedence: NOT first, then AND, then OR. Queries enclosed in parentheses are evaluated first.
    - NOT A OR B AND C is evaluated as ((NOT A) OR (B AND C)).
    - (NOT A OR B) AND C is evaluated as (((NOT A) OR B) AND C).
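To make the effect of operator priority concrete, here is a minimal sketch that evaluates the two example queries against a tiny in-memory corpus. The corpus `DOCS` and the helpers `matches` and `search` are hypothetical names invented for this illustration and are not part of the sample programs; the tree shape (`operator`, `children`, `value`) mirrors the parser output shown later in this README.

```python
# Hypothetical mini-corpus: document id -> set of keywords it contains.
DOCS = {
    1: {"a", "b"},
    2: {"a", "b", "c"},
    3: {"b"},
    4: {"c"},
}


def matches(node, words):
    """Return True if a document containing `words` satisfies the query tree."""
    if "value" in node:
        return node["value"] in words
    results = [matches(child, words) for child in node["children"]]
    if node["operator"] == "AND":
        return all(results)
    if node["operator"] == "OR":
        return any(results)
    if node["operator"] == "NOT":
        return not results[0]
    raise ValueError(f"unknown operator: {node['operator']!r}")


def search(node):
    """Return the sorted ids of all documents that satisfy the query tree."""
    return sorted(doc_id for doc_id, words in DOCS.items() if matches(node, words))


A, B, C = ({"value": v} for v in "abc")

# NOT A OR B AND C, grouped as ((NOT A) OR (B AND C))
default_grouping = {"operator": "OR", "children": [
    {"operator": "NOT", "children": [A]},
    {"operator": "AND", "children": [B, C]},
]}

# (NOT A OR B) AND C, grouped as (((NOT A) OR B) AND C)
parenthesized = {"operator": "AND", "children": [
    {"operator": "OR", "children": [{"operator": "NOT", "children": [A]}, B]},
    C,
]}

print(search(default_grouping))  # [2, 3, 4]
print(search(parenthesized))     # [2, 4]
```

Document 3 contains only b, so it satisfies NOT A and is returned by the first query, but the parenthesized query additionally requires C and therefore drops it.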
The query syntax in this example implementation is defined in the BNF below.
This grammar is one example; the same syntax can be expressed in other ways.
<or_operator>  ::= OR
<and_operator> ::= AND
<not_operator> ::= NOT
<expr>         ::= <term> | <expr> <or_operator> <term>
<term>         ::= <factor> | <term> <and_operator> <factor>
<factor>       ::= <keyword> | <not_operator> <keyword>
<keyword>      ::= (<expr>) | <alphabets>
<alphabets>    ::= a|b|c|d|e|f|g|h|i|j|k|l|m|n|o|p|q|r|s|t|u|v|w|x|y|z
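As an illustration of how this grammar maps to code, the following is a minimal hand-written recursive descent sketch. It is not the repository's implementation, which relies on ANTLR4. Two assumptions are made here: the left-recursive rules for `<expr>` and `<term>` are rewritten as loops, and `<alphabets>` is extended to whole lowercase words so that keywords such as apple can be parsed. The names `tokenize`, `Parser`, and `parse` are hypothetical.

```python
import re

# One token: an operator, a parenthesis, or a lowercase word (assumed keyword form).
TOKEN = re.compile(r"\s*(AND|OR|NOT|\(|\)|[a-z]+)")


def tokenize(query):
    """Split a query string into a list of tokens."""
    query = query.rstrip()
    tokens, pos = [], 0
    while pos < len(query):
        m = TOKEN.match(query, pos)
        if not m:
            raise ValueError(f"unexpected character near position {pos}")
        tokens.append(m.group(1))
        pos = m.end()
    return tokens


class Parser:
    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def take(self):
        token = self.peek()
        if token is None:
            raise ValueError("unexpected end of query")
        self.pos += 1
        return token

    def expr(self):
        # <expr> ::= <term> | <expr> <or_operator> <term>
        node = self.term()
        while self.peek() == "OR":
            self.take()
            node = {"operator": "OR", "children": [node, self.term()]}
        return node

    def term(self):
        # <term> ::= <factor> | <term> <and_operator> <factor>
        node = self.factor()
        while self.peek() == "AND":
            self.take()
            node = {"operator": "AND", "children": [node, self.factor()]}
        return node

    def factor(self):
        # <factor> ::= <keyword> | <not_operator> <keyword>
        if self.peek() == "NOT":
            self.take()
            return {"operator": "NOT", "children": [self.keyword()]}
        return self.keyword()

    def keyword(self):
        # <keyword> ::= (<expr>) | <alphabets>
        token = self.take()
        if token == "(":
            node = self.expr()
            if self.take() != ")":
                raise ValueError("expected ')'")
            return node
        if token in ("AND", "OR", "NOT", ")"):
            raise ValueError(f"unexpected token {token!r}")
        return {"value": token}


def parse(query):
    """Parse a query string into a nested dict tree."""
    parser = Parser(tokenize(query))
    tree = parser.expr()
    if parser.peek() is not None:
        raise ValueError(f"unexpected token {parser.peek()!r}")
    return tree


print(parse("NOT a OR b AND c"))
```

Because `term` is called from inside `expr`, and `factor` from inside `term`, AND binds tighter than OR and NOT binds tighter than AND, which matches the precedence described above.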
- Install poetry
make install-poetry
If the poetry executable was installed at ~/.local/bin/poetry, add that directory (~/.local/bin) to your PATH.
- Set up
make setup
- Run Elasticsearch
make compose-up
- Index sample documents
make index-documents
- Run Driver1.py (Using QueryParser)
make run-driver1
Output:
{'operator': 'AND', 'children': [{'operator': 'OR', 'children': [{'value': 'apple'}, {'value': 'orange'}]}, {'operator': 'NOT', 'children': [{'value': 'banana'}]}]}
- Run Driver2.py (Using QueryBuilder)
make run-driver2
Output:
{
"query": {
"bool": {
"must": [
{
"bool": {
"should": [
{
"match": {
"fruit": {
"query": "apple"
}
}
},
{
"match": {
"fruit": {
"query": "orange"
}
}
}
],
"minimum_should_match": 1
}
},
{
"bool": {
"must_not": [
{
"match": {
"fruit": {
"query": "banana"
}
}
}
]
}
}
]
}
}
}
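The mapping from the parsed tree to the Elasticsearch request body above can be sketched as a small recursive function. This is an illustration inferred from the two outputs shown in this README, not the repository's QueryBuilder; the function names `build` and `build_request` and the default field name argument are assumptions.

```python
import json


def build(node, field="fruit"):
    """Translate one node of the parsed tree into an Elasticsearch query clause."""
    if "value" in node:
        return {"match": {field: {"query": node["value"]}}}
    clauses = [build(child, field) for child in node["children"]]
    operator = node["operator"]
    if operator == "AND":
        return {"bool": {"must": clauses}}
    if operator == "OR":
        # minimum_should_match=1 makes at least one alternative mandatory.
        return {"bool": {"should": clauses, "minimum_should_match": 1}}
    if operator == "NOT":
        return {"bool": {"must_not": clauses}}
    raise ValueError(f"unknown operator: {operator!r}")


def build_request(tree, field="fruit"):
    """Wrap the translated tree in a search request body."""
    return {"query": build(tree, field)}


# The tree printed by Driver1.py.
tree = {"operator": "AND", "children": [
    {"operator": "OR", "children": [{"value": "apple"}, {"value": "orange"}]},
    {"operator": "NOT", "children": [{"value": "banana"}]},
]}

print(json.dumps(build_request(tree), indent=2))
```

Each operator maps to one clause type of the bool query: AND to must, OR to should, and NOT to must_not, so nested trees become nested bool queries.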
- Run Driver3.py (Using Searcher)
make run-driver3
Output:
{
"took": 10,
"timed_out": false,
"_shards": {
"total": 1,
"successful": 1,
"skipped": 0,
"failed": 0
},
"hits": {
"total": {
"value": 2,
"relation": "eq"
},
"max_score": 0.9808291,
"hits": [
{
"_index": "fruit_basket",
"_id": "1",
"_score": 0.9808291,
"_source": {
"fruit": "orange"
}
},
{
"_index": "fruit_basket",
"_id": "2",
"_score": 0.9808291,
"_source": {
"fruit": "apple"
}
}
]
}
}
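For readers who want to issue such a search by hand, here is a minimal sketch using only the Python standard library. It is not the repository's Searcher. It assumes the local Elasticsearch is reachable at http://localhost:9200 without authentication, which may not match your setup; the index name fruit_basket is taken from the response above, and the helper names `make_search_request`, `search`, and `fruits_from` are hypothetical.

```python
import json
import urllib.request


def make_search_request(body, index="fruit_basket", base_url="http://localhost:9200"):
    """Build (but do not send) a POST request to the index's _search endpoint."""
    return urllib.request.Request(
        url=f"{base_url}/{index}/_search",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def search(body, **kwargs):
    """Send the request and return the decoded JSON response (needs a running server)."""
    with urllib.request.urlopen(make_search_request(body, **kwargs)) as response:
        return json.load(response)


def fruits_from(response):
    """Extract the fruit field of every hit, in ranking order."""
    return [hit["_source"]["fruit"] for hit in response["hits"]["hits"]]


# Example with a trimmed copy of the response shown above (no server needed).
sample = {"hits": {"hits": [
    {"_id": "1", "_source": {"fruit": "orange"}},
    {"_id": "2", "_source": {"fruit": "apple"}},
]}}
print(fruits_from(sample))  # ['orange', 'apple']
```

Separating request construction from sending keeps the part that needs a live server small, so the rest can be checked without Elasticsearch running.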
- Run the tests for Searcher.py
make test
Licensed under either of
- Apache License, Version 2.0 (LICENSE-APACHE)
- MIT license (LICENSE-MIT)
at your option.