Merge pull request #63 from jurismarches/ag-naming
Naming - work on adding name to queries and visualizing matching explanations
yolocodefrench committed Jan 6, 2021
2 parents 48869be + 6ec12bb commit 681ef10
Showing 22 changed files with 2,046 additions and 519 deletions.
7 changes: 6 additions & 1 deletion .gitignore
@@ -4,7 +4,11 @@ __pycache__/
*$py.class

# PLY
luqum/parser.out
parser.out
parsetab.py

# coverage
cover/

# C extensions
*.so
@@ -43,6 +47,7 @@ htmlcov/
.coverage
.coverage.*
.cache
.venv
nosetests.xml
coverage.xml
*,cover
24 changes: 24 additions & 0 deletions CHANGELOG.rst
@@ -7,6 +7,30 @@ and this project tries to adhere to `Semantic Versioning`_.
.. _`Keep a Changelog`: http://keepachangelog.com/en/1.0.0/
.. _`Semantic Versioning`: http://semver.org/spec/v2.0.0.html

Rolling
=======

Changed
-------

- completely reworked the naming module and the `auto_name` function, as they were not practical as is.

Added
-----

- added tools to build visual explanations of why a request matches a result
  (leveraging `elasticsearch named queries`__).
- added a visitor and a transformer that track the path to each element while visiting the tree.

__ https://www.elastic.co/guide/en/elasticsearch/reference/current/search-request-body.html#request-body-search-queries-and-filters

Fixed
-----

- fixed the handling of names when transforming a luqum tree into Elasticsearch queries,
  and added integration tests.


0.10.0 - 2020-09-22
===================

19 changes: 12 additions & 7 deletions docs/source/api.rst
@@ -40,6 +40,18 @@ luqum.elasticsearch
   :member-order: bysource


Naming and explaining matches
==============================


luqum.naming
------------

.. automodule:: luqum.naming
   :members:
   :member-order: bysource


Utilities
==========

@@ -52,13 +64,6 @@ luqum.visitor: Manipulating trees
   :member-order: bysource


luqum.naming: Naming query parts
---------------------------------

.. automodule:: luqum.naming
   :members:
   :member-order: bysource

luqum.auto_head_tail: Automatic addition of spaces
--------------------------------------------------

63 changes: 37 additions & 26 deletions docs/source/quick_start.rst
@@ -3,6 +3,7 @@ Quick start

>>> from unittest import TestCase
>>> t = TestCase()
>>> t.maxDiff = None


.. _tutorial-parsing:
@@ -339,8 +340,8 @@ We can pretty print it::
.. _`elasticsearch_dsl`: https://pypi.python.org/pypi/elasticsearch-dsl
.. _`Elasticsearch queries DSL`: https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl.html

Named Queries
--------------
Named Queries: explaining a match
---------------------------------

.. py:currentmodule:: luqum.naming
@@ -355,7 +356,18 @@ Say we have a query::
We can use :py:func:`auto_name` to automatically add names::

    >>> from luqum.naming import auto_name
    >>> auto_name(tree)
    >>> names = auto_name(tree)

``names`` is a dict associating each name with a path in the luqum tree.
For example, the first name "a" is associated with the element ``foo~2``,
and we can easily retrieve it with small utilities for navigating the tree::

    >>> from luqum.naming import element_from_path, element_from_name
    >>> element_from_name(tree, "a", names)
    Fuzzy(Word('foo'), 2)
    >>> element_from_path(tree, (0, 0))
    Word('foo')
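To make paths concrete: a path is a sequence of child positions read from the root. The sketch below resolves one on a hypothetical ``Node`` class, assuming only that each node keeps its children in order. The tree is loosely modeled on the query above (the node for the parenthesized group is an assumption), and the sketch does not claim to mirror luqum's own :py:func:`element_from_path`.

```python
from dataclasses import dataclass, field
from typing import List, Sequence


@dataclass
class Node:
    """Hypothetical tree node: a label and its ordered children."""
    label: str
    children: List["Node"] = field(default_factory=list)


def node_from_path(root: Node, path: Sequence[int]) -> Node:
    """Follow child indexes from the root; an empty path returns the root itself."""
    node = root
    for index in path:
        node = node.children[index]  # raises IndexError if the path does not exist
    return node


# Loosely modeled on "foo~2 OR (bar AND baz)"; the Group node is an assumption
tree = Node("OR", [
    Node("Fuzzy", [Node("foo")]),
    Node("Group", [Node("AND", [Node("bar"), Node("baz")])]),
])

print(node_from_path(tree, (0, 0)).label)     # foo
print(node_from_path(tree, (1, 0, 1)).label)  # baz
```

Because the empty path designates the root, a child's path is simply its parent's path plus one index.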


The generated Elasticsearch queries use those names
when building the query (see `elastic named queries`__)::
@@ -364,35 +376,34 @@ when building the query (see `elastic named queries`__)::
    >>> t.assertDictEqual(
    ... es_query,
    ... {'bool': {'should': [
    ... {'fuzzy': {'text': {'_name': '0_0', 'fuzziness': 2.0, 'value': 'foo'}}},
    ... {'fuzzy': {'text': {'fuzziness': 2.0, 'value': 'foo', '_name': 'a'}}},
    ... {'bool': {'must': [
    ... {'match': {'text': {
    ... '_name': '0_1_0',
    ... 'query': 'bar',
    ... 'zero_terms_query': 'all'}}},
    ... {'match': {'text': {
    ... '_name': '0_1_1',
    ... 'query': 'baz',
    ... 'zero_terms_query': 'all'}}}
    ... {'match': {'text': {'query': 'bar', 'zero_terms_query': 'all', '_name': 'c'}}},
    ... {'match': {'text': {'query': 'baz', 'zero_terms_query': 'all', '_name': 'd'}}}
    ... ]}}
    ... ]}}
    ... )

If you run this query on Elasticsearch, then for each matching document
it will return the names of the query parts that the document matched (in ``matched_queries``).
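On the client side, those names come back per hit. A minimal sketch, assuming the usual search response layout (``hits.hits``, each hit with an ``_id`` and an optional ``matched_queries`` entry); ``matched_names_by_hit`` is a hypothetical helper and the response is hand-written for illustration, not real output.

```python
from typing import Any, Dict, List


def matched_names_by_hit(response: Dict[str, Any]) -> Dict[str, List[str]]:
    """Map each hit ``_id`` to the query names Elasticsearch reported for it."""
    result = {}
    for hit in response.get("hits", {}).get("hits", []):
        # the key may be missing when no named query matched; list() also keeps
        # only the names if a {name: score} mapping is returned instead of a list
        result[hit["_id"]] = list(hit.get("matched_queries", []))
    return result


# Hand-written response fragment, reduced to the keys used here
response = {
    "hits": {
        "hits": [
            {"_id": "1", "matched_queries": ["b", "c"]},
            {"_id": "2", "matched_queries": ["a"]},
            {"_id": "3"},
        ]
    }
}
print(matched_names_by_hit(response))  # {'1': ['b', 'c'], '2': ['a'], '3': []}
```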

To display it to the user, we can find out which name refers to which part of the query,
using :py:func:`name_index` and :py:func:`extract`::

    >>> from luqum.naming import name_index, extract
    >>> index = name_index(tree)
    >>> index["0_1_0"] # for each name, associate start index and length
    (10, 3)
    >>> extract(expr, "0_1_0", index)
    'bar'
    >>> extract(expr, "0_1", index)
    'bar AND baz'
Imagine Elasticsearch reported that a document matched the queries named 'b' and 'c'::

    >>> matched_queries = ['b', 'c']

To display this to the user, there are two steps:
first, identify every matching element using :py:class:`MatchingPropagator`::

    >>> from luqum.naming import MatchingPropagator, matching_from_names
    >>> propagate_matching = MatchingPropagator()
    >>> paths_ok, paths_ko = propagate_matching(tree, *matching_from_names(matched_queries, names))
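To give an idea of what propagation means, here is a simplified sketch, not :py:class:`MatchingPropagator`'s actual algorithm: a leaf matches when its path was reported, an ``AND`` matches when all its children do, an ``OR`` when any does. The tuple-based tree, the names ``a``, ``c``, ``d`` and the ``propagate`` function are hypothetical, and operators such as ``NOT`` are ignored.

```python
from typing import Set, Tuple, Union

Path = Tuple[int, ...]
Tree = Union[str, Tuple[str, list]]  # leaf string, or ("AND" | "OR", [children])


def propagate(node: Tree, path: Path, matched: Set[Path],
              ok: Set[Path], ko: Set[Path]) -> bool:
    """Decide whether ``node`` matches and record its path in ``ok`` or ``ko``."""
    if isinstance(node, str):
        result = path in matched
    else:
        op, children = node
        # a list (not a generator fed to all/any) so every child gets a verdict
        results = [propagate(child, path + (i,), matched, ok, ko)
                   for i, child in enumerate(children)]
        result = all(results) if op == "AND" else any(results)
    (ok if result else ko).add(path)
    return result


tree = ("OR", ["foo~2", ("AND", ["bar", "baz"])])
names = {"a": (0,), "c": (1, 0), "d": (1, 1)}  # hypothetical name -> path mapping
matched = {names[n] for n in ["a", "c"]}        # names reported by Elasticsearch

paths_ok: Set[Path] = set()
paths_ko: Set[Path] = set()
propagate(tree, (), matched, paths_ok, paths_ko)
print(sorted(paths_ok))  # [(), (0,), (1, 0)]
print(sorted(paths_ko))  # [(1,), (1, 1)]
```

Here the ``OR`` matches through ``foo~2`` even though the ``AND`` fails, since ``baz`` was not reported.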

Then, use :py:class:`HTMLMarker` to render it as HTML (you could also write your own marker)::

    >>> from luqum.naming import HTMLMarker
    >>> mark_html = HTMLMarker() # some parameters can be customized, see its documentation
    >>> mark_html(tree, paths_ok, paths_ko)
    '<span class="ok"><span class="ko">foo~2 </span>OR (<span class="ko"><span class="ok">bar </span>AND baz</span>)</span>'
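A marker then turns those verdicts into markup. The sketch below, on a hypothetical tuple tree rather than luqum's, wraps every part in a ``<span>`` whose class is ``ok`` or ``ko``; unlike a real marker working on the parsed query, it regenerates spacing and parentheses instead of reusing the original text. ``mark`` is a hypothetical name.

```python
from html import escape
from typing import Set, Tuple, Union

Path = Tuple[int, ...]
Tree = Union[str, Tuple[str, list]]  # leaf string, or ("AND" | "OR", [children])


def mark(node: Tree, path: Path, ok: Set[Path]) -> str:
    """Render ``node`` as text, wrapping each part in a span of class ok or ko."""
    if isinstance(node, str):
        text = escape(node)  # leaves may contain characters such as < or &
    else:
        op, children = node
        text = f" {op} ".join(
            mark(child, path + (i,), ok) for i, child in enumerate(children)
        )
        if path:  # simplification: parenthesize every operation except the root
            text = f"({text})"
    css = "ok" if path in ok else "ko"
    return f'<span class="{css}">{text}</span>'


tree = ("OR", ["foo~2", ("AND", ["bar", "baz"])])
paths_ok = {(), (0,), (1, 0)}  # e.g. the output of a propagation step
print(mark(tree, (), paths_ok))
```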


__ https://www.elastic.co/guide/en/elasticsearch/reference/current/search-request-named-queries-and-filters.html
__ https://www.elastic.co/guide/en/elasticsearch/reference/current/search-request-body.html#request-body-search-queries-and-filters
95 changes: 95 additions & 0 deletions luqum/elasticsearch/nested.py
@@ -0,0 +1,95 @@
"""If you have a query with a nested query containing operations,
Elasticsearch named queries won't report which inner parts matched.
This is a problem if you rely heavily on named queries inside nested queries.
"""


def get_first_name(query):
    if isinstance(query, dict):
        if "_name" in query:
            return query["_name"]
        elif "bool" in query:
            # do not go down bool
            return None
        else:
            children = query.values()
    elif isinstance(query, list):
        children = query
    else:
        return None
    iter_candidates = (get_first_name(child) for child in children)
    candidates = [candidate for candidate in iter_candidates if candidate is not None]
    return candidates[0] if candidates else None


def extract_nested_queries(query, query_nester=None):
    """Given a query,
    extract all queries that are under a nested query and boolean operations,
    returning an atomic nested version of each of them.
    Each of those nested queries also takes the name of the nearest inner named query.
    This is useful because Elasticsearch won't go down explaining why a nested query is matching.

    :param dict query: elasticsearch query to analyze
    :param callable query_nester: the function called to nest sub-queries; leave it as default
    :return list: queries that you should run to get all matches

    .. note:: because we re-nest parts of bool queries, results might not be accurate
        for::

            {"bool": {"must": [
                {"nested": {"path": "a", "query": {"match": {"x": "y"}}}},
                {"nested": {"path": "a", "query": {"match": {"x": "z"}}}}
            ]}}

        is not the same as::

            {"nested": {"path": "a", "query": {"bool": {"must": [
                {"match": {"x": "y"}}, {"match": {"x": "z"}}
            ]}}}}

        if ``a.x`` is multivalued.
        The first would match ``{"a": [{"x": "y"}, {"x": "z"}]}``,
        while the second would only match if a single element of ``a`` has an ``x``
        matching both ``"y"`` and ``"z"`` (e.g. ``"y z"`` or ``"z y"``).
    """
    queries = []  # this contains our result
    in_nested = query_nester is not None
    sub_query_nester = query_nester
    if isinstance(query, dict):
        if "nested" in query:
            params = {k: v for k, v in query["nested"].items() if k not in ("query", "name")}

            def sub_query_nester(req, name):
                nested = {"nested": {"query": req, **params}}
                if query_nester is not None:
                    nested = query_nester(nested, name)
                if name is not None:
                    nested["nested"]["_name"] = name
                return nested

        bool_param = {"must", "should", "must_not"} & set(query.keys())
        if bool_param and in_nested:
            # we are in a list of operations in a bool inside a nested,
            # make a query with nested on sub arguments
            op, = bool_param  # must or should or must_not
            # normalize to a list
            sub_queries = query[op] if isinstance(query[op], list) else [query[op]]
            # add nesting
            nested_sub_queries = [
                query_nester(sub_query, get_first_name(sub_query)) for sub_query in sub_queries
            ]
            # those are queries we want to return
            queries.extend(nested_sub_queries)
            # continue processing in each sub query
            # (before nesting, nesting is contained in query_nester)
            children = sub_queries
        else:
            children = query.values()
    elif isinstance(query, list):
        children = query
    else:
        # leaf: end of recursion
        children = []

    # recurse
    for child_query in children:
        queries.extend(
            extract_nested_queries(child_query, query_nester=sub_query_nester)
        )
    return queries
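The caveat in the docstring note can be checked with a pure-Python simulation. This is a sketch under simplifying assumptions: a ``match`` is modeled as an exact value test, and the helper names are hypothetical. Since ``extract_nested_queries`` emits one nested query per clause, each clause is evaluated on its own, like ``bool_of_nested`` below, while the original query behaves like ``nested_of_bool``.

```python
from typing import Any, Dict, List


def has_term(obj: Dict[str, Any], field: str, term: str) -> bool:
    """Simplified 'match': the field value, or one of its values, equals ``term``."""
    value = obj.get(field)
    values = value if isinstance(value, list) else [value]
    return term in values


def bool_of_nested(doc: Dict[str, Any], path: str, field: str, terms: List[str]) -> bool:
    """One nested query per term under bool/must: each term may match a different element."""
    return all(any(has_term(elem, field, t) for elem in doc[path]) for t in terms)


def nested_of_bool(doc: Dict[str, Any], path: str, field: str, terms: List[str]) -> bool:
    """A single nested query wrapping bool/must: one element must match every term."""
    return any(all(has_term(elem, field, t) for t in terms) for elem in doc[path])


doc = {"a": [{"x": "y"}, {"x": "z"}]}
print(bool_of_nested(doc, "a", "x", ["y", "z"]))  # True
print(nested_of_bool(doc, "a", "x", ["y", "z"]))  # False
```

With ``{"a": [{"x": ["y", "z"]}]}`` both forms match, so the discrepancy only appears when the terms are spread over different elements of ``a``.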
14 changes: 11 additions & 3 deletions luqum/elasticsearch/tree.py
@@ -126,7 +126,11 @@ def __init__(self, q, *args, **kwargs):
    def json(self):
        # field:* is transformed to exists query
        if self.q == '*':
            return {"exists": {"field": self.field}}
            query = {"exists": {"field": self.field}}
            name = getattr(self, "_name", None)
            if name is not None:
                query["exists"]["_name"] = name
            return query
        return super().json
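The new ``field:*`` branch can be restated as a standalone function to show the dictionary it builds. ``exists_query``, the field ``author`` and the name ``e1`` are hypothetical; the function only mirrors the patched lines.

```python
from typing import Any, Dict, Optional


def exists_query(field: str, name: Optional[str] = None) -> Dict[str, Any]:
    """Build the query used for ``field:*``, tagging it with ``_name`` when given."""
    query: Dict[str, Any] = {"exists": {"field": field}}
    if name is not None:  # same test as the patch: only None is left out
        query["exists"]["_name"] = name
    return query


print(exists_query("author"))        # {'exists': {'field': 'author'}}
print(exists_query("author", "e1"))  # {'exists': {'field': 'author', '_name': 'e1'}}
```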


@@ -233,10 +237,11 @@ class ENested(AbstractEOperation):
    Take care to remove ENested children
    """

    def __init__(self, nested_path, nested_fields, items, *args, **kwargs):
    def __init__(self, nested_path, nested_fields, items, *args, _name=None, **kwargs):

        self._nested_path = [nested_path]
        self.items = self._exclude_nested_children(items)
        self._name = _name

    @property
    def nested_path(self):
@@ -289,7 +294,10 @@ def _exclude_nested_children(self, subtree):

    @property
    def json(self):
        return {'nested': {'path': self.nested_path, 'query': self.items.json}}
        data = {'nested': {'path': self.nested_path, 'query': self.items.json}}
        if self._name:
            data['nested']['_name'] = self._name
        return data
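With ``_name`` stored on ``ENested``, the nested wrapper can carry its own name next to the names of the queries it wraps. A sketch of the resulting shape, using a hypothetical ``nested_query`` helper that mirrors the patched ``json`` property (the ``author`` path, field and names are illustrative). Note that it tests ``if name:`` like the patch, so an empty name is dropped, whereas the ``exists`` branch tests ``is not None``.

```python
from typing import Any, Dict, Optional


def nested_query(path: str, inner: Dict[str, Any], name: Optional[str] = None) -> Dict[str, Any]:
    """Wrap ``inner`` in a nested query, adding ``_name`` only for a truthy name."""
    data: Dict[str, Any] = {"nested": {"path": path, "query": inner}}
    if name:  # same test as the patch: None and "" are both left out
        data["nested"]["_name"] = name
    return data


inner = {"match": {"author.name": {"query": "tolkien", "_name": "inner"}}}
query = nested_query("author", inner, name="outer")
print(query["nested"]["_name"])                                     # outer
print(query["nested"]["query"]["match"]["author.name"]["_name"])    # inner
print("_name" in nested_query("author", inner, name="")["nested"])  # False
```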


class EShould(EOperation):