Skip to content

Conversation

hkethi002
Copy link
Contributor

@hkethi002 hkethi002 commented Jul 28, 2017

When search is given a list of values for a field in filters, if "null" is included, it will return all containers with those values and those for which the field does not have a value.

Review Checklist

  • Tests were added to cover all code changes
  • Documentation was added / updated
  • Code and tests follow standards in CONTRIBUTING.md

@hkethi002 hkethi002 requested a review from nagem July 28, 2017 19:41
]
}
},
{"exists":{"field":k.split('.')[0]}}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It should be rare that they would reference a top-level key like container_type, but we might want to be a bit more fault tolerant if they do reference a top level key. In that case, this line is unnecessary.

Copy link
Contributor

@nagem nagem Jul 28, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, I see below where you are appending to the should - my mistake. Also this comment is supposed to go with the comment above, but I'm sure you picked that up.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, I think I might have been confusing with my original comment about removing this line - I was assuming it would only be dynamically removed when there is no dot notation in the key (a top level key would have no dot notation). Wouldn't we still need it to prevent the return of documents that would never have the key anyway?

if "null" in v:
try:
v.remove("null")
except AttributeError:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In which cases would this fail? When v is a string, not a list? Mind making that more clear? Maybe checking v's type and acting on that knowledge instead?

@nagem
Copy link
Contributor

nagem commented Jul 28, 2017

Nice work, glad it was possible without too much complication. Mind adding the "missing": "null" key back in to the default term aggregations at the top? And in aggregate_field_values()?

v = None
null_filter = {
'bool': {
'should': [
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since this is the only bool keyword at this level, it works out having this be a should, but shoulds are used mostly to affect the scoring and it is not required that they match (when with more strict bool options like must). I would switch it to a must so if new logic is added here, it has the intention.

Copy link
Contributor

@nagem nagem Jul 28, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reading the docs again it looks like since this is also in a filter context and not a query context, should means "must match at least one".

If the bool query is a filter context or has neither must or filter then at least one of the should queries must match a document for it to match the bool query.

https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-bool-query.html#query-dsl-bool-query

Which doesn't actually make a difference in this context but ...
image

Copy link
Contributor Author

@hkethi002 hkethi002 Jul 28, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought it was needed for when multiple boxes were checked because if a must was used would that not return only docs that both have the field set to a value and not, which would be none of them?

@nagem
Copy link
Contributor

nagem commented Jul 31, 2017

Changes look good, nice work. I'm going to test locally before giving it the final LGTM.

@nagem
Copy link
Contributor

nagem commented Jul 31, 2017

Seems to be something off in the logic:

Request:

{"filters":[{"terms":{"subject.sex":["null"]}}],"return_type":"session","all_data":false}

Result:

uwsgi_1           | 2017-07-31 21:05:31      scitran.api                  base.py   339:ERRO Traceback (most recent call last):
uwsgi_1           |   File "/usr/local/lib/python2.7/dist-packages/webapp2.py", line 570, in dispatch
uwsgi_1           |     return method(*args, **kwargs)
uwsgi_1           |   File "./api/auth/__init__.py", line 63, in check_login
uwsgi_1           |     return handler_method(self, *args, **kwargs)
uwsgi_1           |   File "./api/handlers/dataexplorerhandler.py", line 484, in search
uwsgi_1           |     return_type, filters, search_string = self._parse_request()
uwsgi_1           |   File "./api/handlers/dataexplorerhandler.py", line 340, in _parse_request
uwsgi_1           |     null_filter['bool']['should'][0]['must'].append({'exists': {'field': k.split('.')[0]}})
uwsgi_1           | KeyError: 'must'
uwsgi_1           |  request_id=9c5d87fe-1501535131
uwsgi_1           | [pid: 21|app: 0|req: 13/49] 172.19.0.1 () {50 vars in 1020 bytes} [Mon Jul 31 21:05:31 2017] POST /api/dataexplorer/search?facets=true => generated 41 bytes in 84 msecs (HTTP/1.1 500) 3 headers in 130 bytes (2 switches on core 1)

Using the UI I started a new search and checked the "null" option for subject sex.

@hkethi002
Copy link
Contributor Author

Small typo in parse_request function fixed that was causing that error when I added the {"exists":{"field":k.split('.')[0]}} line back in.

@nagem
Copy link
Contributor

nagem commented Aug 1, 2017

Fix works for me, feel free to merge!

@hkethi002 hkethi002 merged commit 615b948 into master Aug 1, 2017
@hkethi002 hkethi002 deleted the null-search branch August 1, 2017 20:22
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants