problems with phrase searches #2

Closed
disorder opened this issue Jul 11, 2010 · 11 comments
Comments

@disorder

Boolean keywords are recognized inside phrases. A simple example:
db.search("'not working'")

@johnl
Owner

johnl commented Jul 12, 2010

Have you tried the phrase query in double quotes? The Xapian query parser docs say that phrase searches are done with double quotes:

http://xapian.org/docs/queryparser.html

@disorder
Author

I tried everything. Just try for yourself:

require 'xapian-fu'
include XapianFu
db = XapianDb.new(:dir => 'example.db', :create => true,
                  :store => [:title, :year])
db.search('"not working"')

# RuntimeError: QueryParserError: Syntax: <expression> NOT <expression>

@johnl
Owner

johnl commented Jul 12, 2010

Hi,

I think this is a limitation of the Xapian query parser. You can work around it by turning off boolean keywords:

db.search('"not working"', :boolean => false)

but if you want to do more advanced searches with a phrase and some boolean keywords, I think you'd have to build the query programmatically using the Xapian interface itself. Try asking on the Xapian mailing list about it though; there might be a way to work around it that we can't think of.
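
As a rough sketch of how a caller might decide when to apply that workaround, the helper below checks whether a query string has a boolean operator word inside a double-quoted phrase. The helper names and the keyword list are assumptions made for illustration; they are not part of XapianFu or Xapian, and the list only covers the operator words that appear in this thread.

```ruby
# Hypothetical helper, not part of XapianFu or Xapian.
# Operator words taken from the queries discussed in this thread.
BOOLEAN_KEYWORDS = %w[and or not xor].freeze

# Returns the text of each double-quoted phrase in the query string.
def quoted_phrases(query)
  query.scan(/"([^"]*)"/).flatten
end

# True when any quoted phrase contains a boolean operator word.
def boolean_keyword_in_phrase?(query)
  quoted_phrases(query).any? do |phrase|
    phrase.downcase.split.any? { |word| BOOLEAN_KEYWORDS.include?(word) }
  end
end

query = '"not working"'
use_boolean = !boolean_keyword_in_phrase?(query)
# With xapian-fu loaded, the result could feed the :boolean option:
#   db.search(query, :boolean => use_boolean)
```

This only picks between the two parser modes; it does not make a mixed query (a phrase plus real boolean operators) work.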

One other thing to note though, "not" and "and" are in the English stop words list, so they'll actually be ignored during indexing and in your phrase searches. You'll need to disable stop words, or write your own list if you need to include these.
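
To illustrate the stop word point, here is a simplified model in plain Ruby of what dropping stop words does to the terms of a phrase. The stop list and the `indexable_terms` method are made up for this example; Xapian's real English list is longer and its stopper is not implemented this way.

```ruby
# Illustrative stop list only; not Xapian's actual English list.
STOP_WORDS = %w[a and not the].freeze

# Splits text into lowercase terms and drops any that are stop words,
# roughly what a stopper does to terms before they reach the index.
def indexable_terms(text, stop_words = STOP_WORDS)
  text.downcase.scan(/\w+/).reject { |term| stop_words.include?(term) }
end

indexable_terms('not working')      # => ["working"]
indexable_terms('not working', [])  # => ["not", "working"]
```

With the default list the phrase loses "not" entirely, so a phrase search for "not working" has only "working" left to match. Passing an empty list stands in for disabling stop words.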

@disorder
Author

The Xapian Ruby binding is able to search for phrases; that's what I used to work around this issue. There's something really wrong going on in the QueryParser class.

@johnl
Owner

johnl commented Jul 12, 2010

Hi,

I don't think anything is wrong with XapianFu's QueryParser, it just has different default settings.

The XapianFu QueryParser enables parsing of the boolean subexpressions by default. Xapian's own QueryParser does not. I've just checked and both XapianFu's and Xapian's QueryParsers produce the exact same query with :boolean disabled.

So just do your query in XapianFu with :boolean set to false and it's the same as using the Ruby binding directly.

John.

@disorder
Author

You're confusing phrases, boolean searches and subexpressions now. I just thought I'd report the issue, that is all.

You should compare queries with multiple phrases but I have no idea how you can do that with boolean search turned off.

QueryParser cuts down significantly on Xapian's abilities. A search for "field:term", or one with a subexpression like "title:(mice men)" from the Xapian docs, won't work either. Nor will one that also includes the current issue, such as "title:(something and not "xor")".

@johnl
Owner

johnl commented Jul 12, 2010

Hi disorder,

I really appreciate your report! I just don't know how to fix it - I think it's a limitation of Xapian's query parser - try setting the boolean flag in your direct Ruby binding QueryParser and you'll get the same error! The XapianFu QueryParser does very little - it passes all the work of actually parsing the query to the Xapian QueryParser.

And I'm not confusing phrases, boolean searches and subexpressions - any boolean query in Xapian by definition involves two subexpressions:

http://xapian.org/docs/queryparser.html

John.

@disorder
Author

That is the syntax definition; you can compose arbitrarily complex queries, including nested subexpressions, multiple operators and love/hate symbols.

'example and (subexpression xor ("or" or "not")) +something -else'

Xapian::Query((Zexampl:(pos=1) AND ((Zsometh:(pos=5) AND_MAYBE (Zsubexpress:(pos=2) XOR (or:(pos=3) OR not:(pos=4)))) AND_NOT Zels:(pos=6))))

QueryParser would not understand this at all, in fact it will fail miserably:
QueryParserError: Syntax: OR (RuntimeError)

It would make sense to get rid of all query string mangling and also allow passing Xapian::QueryParser flags directly.
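
One way the flag suggestion could look is sketched below: search options are mapped to parser flag bits and combined with bitwise OR, and the caller can pass extra raw flags straight through. This is not XapianFu's code. The `DemoFlags` constants are stand-ins so the example runs without the xapian gem, and the `:love_hate` and `:boolean_anycase` option names are invented for illustration.

```ruby
# Stand-in constants for illustration; with the xapian gem loaded the real
# values would come from Xapian::QueryParser::FLAG_* instead.
module DemoFlags
  BOOLEAN          = 1
  PHRASE           = 2
  LOVEHATE         = 4
  BOOLEAN_ANY_CASE = 8
end

# Maps option names to flag bits (the last two option names are hypothetical).
OPTION_FLAGS = {
  boolean: DemoFlags::BOOLEAN,
  phrase: DemoFlags::PHRASE,
  love_hate: DemoFlags::LOVEHATE,
  boolean_anycase: DemoFlags::BOOLEAN_ANY_CASE
}.freeze

# ORs together the bits of every enabled option, starting from any raw
# flags the caller passed in directly.
def query_flags(options = {}, extra_flags = 0)
  OPTION_FLAGS.reduce(extra_flags) do |flags, (name, bit)|
    options[name] ? flags | bit : flags
  end
end

flags = query_flags({ boolean: true, phrase: true }, DemoFlags::BOOLEAN_ANY_CASE)
# The combined value would then be handed to the parser, for example
#   qp.parse_query(query_string, flags)
```

Keeping the raw `extra_flags` argument means a wrapper would not need to know about every flag the underlying parser supports.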

@johnl
Owner

johnl commented Jul 21, 2010

Hi Disorder,

sorry for the delay in replying, I didn't get a notification email from github.

XapianFu::QueryParser is just a very thin wrapper for Xapian::QueryParser - any "query string mangling" is done by Xapian, not XapianFu. XapianFu::QueryParser just constructs a Xapian::QueryParser with the stemmer, stopper and fields configured; it does not touch the actual query string (see lines 75 to 101 in lib/xapian_fu/query_parser.rb).

As far as I can tell, any differences you're seeing between XapianFu::QueryParser and Xapian::QueryParser are solely due to differences in the default options.

For more advanced usage though, I could add the ability to pass a constructed Xapian::Query object in to the search method. What do you think about that?

Thanks again for the reports,

John.

@disorder
Author

Hi,

So what would be the equivalent of this code in xapian-fu?

qp = Xapian::QueryParser.new()
stemmer = Xapian::Stem.new("english")
qp.stemmer = stemmer
qp.database = @ro # @ro from xapian-fu
qp.stemming_strategy = Xapian::QueryParser::STEM_SOME
qflags = Xapian::QueryParser::FLAG_DEFAULT|Xapian::QueryParser::FLAG_BOOLEAN_ANY_CASE
query = qp.parse_query(queryString, qflags)

@johnl
Owner

johnl commented Jul 7, 2011

Hi disorder,

you were right - phrase searching was completely broken. It was disabled by default, with no way to enable it. I've no idea what I was doing when I couldn't reproduce this - it was easy to reproduce this time around.

I've now added support for it and it works: you just need to set the :phrase option to true when doing a search (the README has an example).

I've just released a new version of the gem, 1.3, with this in.

@johnl johnl closed this as completed Jul 7, 2011