escaping special characters #191

Closed
anorkus opened this Issue · 8 comments

2 participants

@anorkus

Hi,

I am using couchdb-lucene 0.10.0 with JDK 1.7 and have run into a problem. Using the standard analyzer and querying, for example:
_fti/_design/lucene/search?q=dataset_name:QCD_2MuPEtaFilter_tuneD6T_7TeV-pythia6&limit=20&debug=true

it splits the query on the "-" character. If I escape it with a "\" character (as suggested in the Apache Lucene query syntax), it still splits on "-":
_fti/_design/lucene/search?q=dataset_name:QCD_2MuPEtaFilter_tuneD6T_7TeV\-pythia6&limit=20&debug=true

results in:
{
"analyzer": "class org.apache.lucene.analysis.standard.StandardAnalyzer",
"etag": "1e6d311c56b0eb",
"fetch_duration": 0,
"limit": 1,
"plan": "BooleanQuery(TermQuery(dataset_name:qcd_2mupetafilter_tuned6t_7tev,boost=1.0)TermQuery(dataset_name:pythia6,boost=1.0),boost=1.0)",
"q": "dataset_name:qcd_2mupetafilter_tuned6t_7tev dataset_name:pythia6",
"id": "BPH-Summer11-00009",
"score": 5.513772487640381
}],
"search_duration": 5,
"skip": 0,
"total_rows": 2465
}

P.S. I removed the rows element as it's irrelevant.

Is this normal behaviour, or should I be using a different analyzer for wildcard searches on strings that include minuses, underscores and numbers?

Best regards,
Antanas Norkus
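
For reference, here is a minimal Python sketch of how an escaped query like the one above could be built and percent-encoded before it is sent. The helper names (`escape_term`, `search_url`) are made up for illustration, and the character list assumes the classic Lucene query syntax (newer Lucene versions also reserve "/"). Note that escaping only stops the query parser from treating "-" as an operator; as the plan output above shows, the StandardAnalyzer still tokenizes the term afterwards.

```python
from urllib.parse import quote

# Characters the classic Lucene query parser treats as syntax (assumed list).
# "&&" and "||" are covered by escaping each "&" and "|" individually.
LUCENE_SPECIAL = set('+-&|!(){}[]^"~*?:\\')


def escape_term(text):
    """Backslash-escape query syntax characters inside a single term."""
    return "".join("\\" + ch if ch in LUCENE_SPECIAL else ch for ch in text)


def search_url(field, value, **params):
    """Build the search path from this thread with every parameter percent-encoded."""
    q = f"{field}:{escape_term(value)}"
    pairs = [("q", q)] + [(k, str(v)) for k, v in params.items()]
    query = "&".join(f"{k}={quote(v, safe='')}" for k, v in pairs)
    return f"_fti/_design/lucene/search?{query}"


print(search_url("dataset_name", "QCD_2MuPEtaFilter_tuneD6T_7TeV-pythia6",
                 limit=20, debug="true"))
```

Percent-encoding the backslash (as %5C) avoids depending on how the HTTP client or server treats a raw "\" in the URL.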

@rnewson
Owner
@anorkus

Thanks for the response.
The problem with the keyword analyzer is that we would like to search with wildcards, and it doesn't seem to support that as nicely as the standard analyzer.
If I search for
_fti/_design/lucene/search?q=dataset_name:Bc_EtaPtFilter_7TeV-bcvegpy2&debug=true
it returns the result nicely, but if I search for:
_fti/_design/lucene/search?q=dataset_name:Bc_EtaPtFilter_7TeV-*&debug=true
it returns no rows:
{
"analyzer": "class org.apache.lucene.analysis.KeywordAnalyzer",
"etag": "1ebe77fc3d66ce",
"fetch_duration": 0,
"limit": 25,
"plan": "PrefixQuery(dataset_name:bc_etaptfilter_7tev-,boost=1.0)",
"q": "dataset_name:bc_etaptfilter_7tev-",
"rows": [],
"search_duration": 0,
"skip": 0,
"total_rows": 0
}

P.S. Can you elaborate on your last sentence? Do I understand "The escaping with \ is for querying" correctly? If I query (using the standard analyzer) for:
_fti/_design/lucene/search?q=dataset_name:QCD_2MuPEtaFilter_tuneD6T_7TeV\-pythia6&debug=true

it returns:
{
"analyzer": "class org.apache.lucene.analysis.standard.StandardAnalyzer",
"etag": "1ebf58e344a834",
"fetch_duration": 1,
"limit": 25,
"plan": "BooleanQuery(TermQuery(dataset_name:qcd_2mupetafilter_tuned6t_7tev,boost=1.0)TermQuery(dataset_name:pythia6,boost=1.0),boost=1.0)",
"q": "dataset_name:qcd_2mupetafilter_tuned6t_7tev dataset_name:pythia6",
"search_duration": 1,
"skip": 0,
"total_rows": 2465
}

and then it finds everything whose dataset_name contains "qcd_2mupetafilter_tuned6t_7tev" OR "pythia6".
I need AND semantics. So for now I split by hand on every "-" and add +AND+key:value before querying Lucene, and then most of the results are correct.

Best regards,
Antanas Norkus
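
The manual workaround described above (split on "-" and join the pieces with AND) could be sketched in Python roughly as follows. The function name `and_query` is hypothetical and this is only an illustration of the idea, not code from the thread.

```python
def and_query(field, value, sep="-"):
    """Split value on sep and require every piece to match in the same field."""
    parts = [p for p in value.split(sep) if p]
    return " AND ".join(f"{field}:{p}" for p in parts)


print(and_query("dataset_name", "QCD_2MuPEtaFilter_tuneD6T_7TeV-pythia6"))
# dataset_name:QCD_2MuPEtaFilter_tuneD6T_7TeV AND dataset_name:pythia6
```

The spaces around AND still have to be URL-encoded (as "+" or "%20") when the query is placed in the q parameter. Because this only requires both tokens to be present, not adjacent or in order, it can still match more documents than an exact match on the full name would.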

@rnewson
Owner

I think you need:

[lucene]
lowercaseExpandedTerms=false
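
A toy Python model of why this setting matters, under the assumption that the keyword analyzer indexes the field value verbatim (case preserved) while the query parser lowercases prefix and wildcard terms by default. The function `prefix_matches` is purely illustrative and is not part of couchdb-lucene or Lucene.

```python
def prefix_matches(indexed_terms, prefix, lowercase_expanded_terms=True):
    """Toy prefix query against terms that were indexed without lowercasing."""
    if lowercase_expanded_terms:
        # Mimics the parser lowercasing the prefix before it is matched.
        prefix = prefix.lower()
    return [t for t in indexed_terms if t.startswith(prefix)]


indexed = ["Bc_EtaPtFilter_7TeV-bcvegpy2"]

# Lowercased prefix "bc_etaptfilter_7tev-" cannot match the mixed-case term.
print(prefix_matches(indexed, "Bc_EtaPtFilter_7TeV-"))
# Prefix left as typed, so it matches the verbatim indexed term.
print(prefix_matches(indexed, "Bc_EtaPtFilter_7TeV-", lowercase_expanded_terms=False))
```

This mirrors the earlier plan output, where the prefix appeared lowercased as "bc_etaptfilter_7tev-" and no rows were returned.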

@anorkus

I will test this one. Is there documentation that lists all possible couchdb-lucene configuration parameters?

@rnewson
Owner

Yes, that option is documented in the couchdb-lucene README file at https://github.com/rnewson/couchdb-lucene.

@anorkus

Tested; it worked nicely with the keyword analyzer. I will leave it for now. On Monday I will run a full test, and maybe my supervisor will run some meaningful physics queries.
I saw that the README explains these two fields:
"allowLeadingWildcard" and "lowercaseExpandedTerms". I didn't think that the second one would solve the problem with keyword search plus wildcards.
My last question was whether there are more options that can be tuned in the config file, and where I could find all of them in order to fine-tune couchdb-lucene.

Best regards,
Antanas Norkus

@rnewson
Owner

Yeah, there are more properties ('git grep ini.get' shows them all, but that's obviously not as good as a document). The other settings, which aren't documented in the README or the example .ini file, allow tweaks to the merge factor, whether to use compound files, and how much RAM to use before flushing to disk. All of those options should be considered 'expert mode' settings, though.
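
As a rough alternative to running 'git grep ini.get' by hand, a small Python sketch could collect the quoted key names from a local checkout. The regular expression assumes the calls look like `ini.getSomething("key", ...)`, which is a guess about the source layout, and the key in the sample line below is made up.

```python
import re
from pathlib import Path

# Assumed call shape: ini.get<Type>("some.key", default). Adjust if the source differs.
INI_GET = re.compile(r'ini\.get\w*\(\s*"([^"]+)"')


def find_ini_keys(source):
    """Return the sorted, de-duplicated key names found in one source string."""
    return sorted(set(INI_GET.findall(source)))


def scan_checkout(root):
    """Collect key names from every .java file below root."""
    keys = set()
    for path in Path(root).rglob("*.java"):
        keys.update(find_ini_keys(path.read_text(encoding="utf-8", errors="ignore")))
    return sorted(keys)


# Hypothetical sample line, not taken from the couchdb-lucene source.
print(find_ini_keys('boolean b = ini.getBoolean("section.someOption", false);'))
```

Calling `scan_checkout` on the directory of a cloned repository would then list every key that matches the assumed pattern.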

@anorkus

Ok,

Thanks for pointing out how to find these options ;) but it would still be great to have documentation for all possible options (even the expert-mode ones). The two options mentioned earlier are also "expert" ones, yet they are documented.

I'm closing this non-problematic issue. Thanks for the help ;)

Have a good weekend,
Antanas Norkus

@anorkus anorkus closed this