Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Elements additions #62

Closed
wants to merge 2 commits into from

Conversation

Panos512
Copy link
Contributor

@Panos512 Panos512 commented Oct 4, 2016

  • Adds MalformedQuery element which is handled on various installations.
  • Adds Wildcard element to support queries containing wildcard characters (* and #).

* Adds a new ast element called MalformedQuery, this will be
  used by invenio_search to interpret malformed queries into
  google-like searches.

Signed-off-by: Panos Paparrigopoulos <panos.paparrigopoulos@cern.ch>
@Panos512 Panos512 changed the title Parsing additions Elements additions Oct 4, 2016
Copy link
Member

@jirikuncar jirikuncar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is breaking syntax change. We need approval from more services. cc @egabancho @tiborsimko

@@ -1,7 +1,8 @@
# -*- coding: utf-8 -*-
#
# This file is part of Invenio-Query-Parser.
# Copyright (C) 2014, 2016 CERN.
#
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not needed

@jirikuncar jirikuncar added this to the v1.0.0 milestone Oct 4, 2016
@jirikuncar
Copy link
Member

We should also check that Elastic query builder works with the new ast element.

@jirikuncar
Copy link
Member

@Panos512 Please test new element with all walkers and contrib/elasticsearch. Thanks

@egabancho
Copy link
Member

In principle I don't see any issue to include this, just a few questions

  • which difference does it make? I meant, with the current implementation the query it generates seems the same ...
  • why also # as wildcard and not only *?
  • what about *foo?

and as @jirikuncar mentioned

Please test new element with all walkers and contrib/elasticsearch

@jacquerie
Copy link

jacquerie commented Oct 13, 2016

  • why also # as wildcard and not only *?

Legacy from SPIRES. The two symbols were equivalent there, and they are still equivalent in the current INSPIRE: https://inspirehep.net/search?ln=en&ln=en&p=find+j+%22Phys.Rev.Lett.%2C105*%22 and https://inspirehep.net/search?ln=en&ln=en&p=find+j+%22Phys.Rev.Lett.%2C105%23%22

@Panos512
Copy link
Contributor Author

  • which difference does it make? I meant, with the current implementation the query it generates > seems the same ...

For our case we treat WildcardQuery differently (https://github.com/inspirehep/inspire-next/blob/master/inspirehep/modules/search/walkers/elasticsearch.py#L94), and we are going to use the MalformedQuery element to show messages to users when they perform malformed queries.

  • what about *foo?

The problem with accepting wildcards everywhere in a word is that there are some particles that contain wildcard characters (e.g. D/sJ*(2632)). So accepting them only in the end of a word is partly solving the problem.

and as @jirikuncar mentioned

Please test new element with all walkers and contrib/elasticsearch

Apart from invenio-query-parser tests, that pass at the moment, do we test elasticsearch query generation somewhere else?

@egabancho
Copy link
Member

The problem with accepting wildcards everywhere in a word is that there are some particles that contain wildcard characters (e.g. D/sJ*(2632)). So accepting them only in the end of a word is partly solving the problem.

IMHO this type of strings should be escaped using quotes to avoid ambiguity, i.e. "D/sJ*(2632)".
We have more cases like this, for example titles containing and or or, and not because of that we stopped using this keywords in the middle of queries.

Apart from invenio-query-parser tests, that pass at the moment, do we test elasticsearch query generation somewhere else?

Test are passing indeed but, you didn't add any test to the walkers to check that your change is not breaking them, which is what we are asking for.
In fact you didn't modify the walkers at all, that's why I wondered before which difference it makes 😉

Maybe if this is so inspire/spires centric you should consider updating contrib/spires instead (just a thought).

@Panos512
Copy link
Contributor Author

IMHO this type of strings should be escaped using quotes to avoid ambiguity, i.e. "D/sJ*(2632)".
We have more cases like this, for example titles containing and or or, and not because of that we stopped using this keywords in the middle of queries.

I'll take a look into this. Indeed it could be done like this and it's the optimal solution for us. But are we sure that the users are going to escape words with special characters? Maybe we could white-list all known particles and ignore special characters in them ( I could resurrect #39 for this ).

Test are passing indeed but, you didn't add any test to the walkers to check that your change is not breaking them, which is what we are asking for.
In fact you didn't modify the walkers at all, that's why I wondered before which difference it makes 😉

That's true, I will update the walkers + tests.

Maybe if this is so inspire/spires centric you should consider updating contrib/spires instead (just a thought).

The thing is that we need this functionality in the invenio-syntax and not only on spires-syntax.

* Adds wildcards (`*`, `#`) support to queries.

Signed-off-by: Panos Paparrigopoulos <panos.paparrigopoulos@cern.ch>
@kaplun
Copy link
Member

kaplun commented Nov 15, 2016

@jirikuncar I think @Panos512 has fullfilled the requests, and thus this PR is in good state to be merged.

@@ -182,3 +186,7 @@ class RegexValue(Leaf):

class EmptyQuery(Leaf):
pass


class MalformedQuery(Leaf):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do you still need MalformedQuery?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ideally yes. This is then needed to implement: inspirehep/inspire-next#1182

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In that case I don't see any test for it.

@@ -80,6 +81,24 @@ def visit(self, node, op):
def visit(self, node):
return node.value

@visitor(WildcardQuery)
def visit(self, node):
def query(keyword):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see tests for this part.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Panos has now left. Can you maybe guide us on concretely what type of tests we would need to write?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

def query(keyword):
fields = self.get_fields_for_keyword(keyword, mode='p')
if len(fields) > 1:
res = Q('bool', should=[
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

return Q('bool', should=[

query=value,
default_field=k,
analyze_wildcard=True) for k in fields])
else:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

- else
+ return Q('query_string',

@jacquerie jacquerie self-assigned this Nov 15, 2016
@lnielsen
Copy link
Member

Closing PR as it's been stale for a long time.

@lnielsen lnielsen closed this Feb 16, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

7 participants