Unanalyzed Fields #34

Closed
wants to merge 10 commits into
from

Conversation

Projects
None yet
2 participants
Contributor

adamalix commented May 11, 2012

This pull request introduces Unanalyzed fields that have the ability to be used as filter queries that are optionally cached in ElasticSearch.

adamalix and others added some commits May 10, 2012

Initial stab at Term queries and filter caching:
This commit adds the ability to have unanalyzed fields and to also cache filter
queries to speed up queries on Elasticsearch.
TODOs:
- Debug commented out line in testUnanalyzed Unit test (ElasticQueryTest:308)
  `"terms" : { "field" : [ [ "value1, "value2" ] ] } ` queries don't work
  because of a SearchParseException.
- Make SlashemGeoField extend SlashemUnanalyzedStringField
- Decide what to do for Solr queries on this field type (in the case of
  SlashemGeoField).  Do we want two separate field types one for both backends?
  The current plan is to not implement Term queries for Solr because we receive
  no benefit there.
Fix query gen for ES Empty[T] queries.
TODO: We should optimize our tree before generating queries
Changed SlashemGeoField to be unanalyzed:
- SlashemGeoField now extends SlashemUnanalyzedStringField
- Fixed extend() for Term[T] queries because of failing unit tests
@@ -782,28 +788,54 @@ trait SolrSchema[M <: Record[M]] extends SlashemSchema[M] {
}
+/**
+ * A field type for unanalyzed queries. Results in using Term[V] queries.
@holdenk

holdenk May 11, 2012

Contributor

Maybe add available for ES only.

@adamalix

adamalix May 11, 2012

Contributor

Then do we want separate SlashemGeoFields for Solr / ES? This should behave the same for ES / Solr with the current impl.

@holdenk

holdenk May 11, 2012

Contributor

If SlashemGeoFields generate Term[V] queries, then Term[V] queries need to
have query-gen support for Solr.

On Thu, May 10, 2012 at 5:32 PM, Adam Alix <
reply@reply.github.com

wrote:

@@ -782,28 +788,54 @@ trait SolrSchema[M <: Record[M]] extends
SlashemSchema[M] {

}

+/**

  • * A field type for unanalyzed queries. Results in using Term[V]
    queries.

Then do we want separate SlashemGeoFields for Solr / ES? This should
behave the same for ES / Solr with the current impl.


Reply to this email directly or view it on GitHub:
https://github.com/foursquare/slashem/pull/34/files#r806027

Cell : 425-233-8271

@adamalix

adamalix May 11, 2012

Contributor

It does, you're not seeing the latest because of the way this thing does reviews.

@holdenk

holdenk May 11, 2012

Contributor

Fuck github. Sorry for the dumb comment :p

@@ -328,6 +346,21 @@ class ElasticQueryTest extends SpecsMatchers with ScalaCheckMatchers {
Assert.assertEquals(ids2.intersect(idsWithFavNums).length, 3)
}
+ @Test
+ def testTermQueries {
@holdenk

holdenk May 11, 2012

Contributor

Add a test for in with empty list, also add a test for nin

+ escaped match {
+ // hack to fix wrapping the queries in a List()
+ case true => {'"' + escape(query.mkString("")) + '"'}
+ case false => '"' + query.toString + '"'
@holdenk

holdenk May 11, 2012

Contributor

Don't you want to do query.mkString("") here to?
Also add a test case for single element Term[T] query against Solr.

+ case term::Nil => EQueryBuilders.termQuery(fieldName, term).boost(weight)
+ case terms => {
+ val moarTerms = terms.toSeq.map(_.toString)
+ EQueryBuilders.termsQuery(fieldName, moarTerms: _*).boost(weight)
@holdenk

holdenk May 11, 2012

Contributor

Is this going to work with an empty terms? Or do we need to special case it?

+ case term::Nil => EFilterBuilders.termFilter(fieldName, term).cache(cached)
+ case terms => {
+ val moarTerms = terms.toSeq.map(_.toString)
+ EFilterBuilders.termsFilter(fieldName, moarTerms: _*).cache(cached)
@holdenk

holdenk May 11, 2012

Contributor

Is this going to work with an empty terms? Or do we need to special case it?

@@ -843,6 +875,14 @@ trait SlashemField[V, M <: Record[M]] extends OwnedField[M] {
//Slashem field types
class SlashemStringField[T <: Record[T]](owner: T) extends StringField[T](owner, 0) with SlashemField[String, T]
+/**
+ * Field type that can be queried without analyzing whitespace.
@holdenk

holdenk May 11, 2012

Contributor

I'd recommend "Field type that be queried without analyzing. An example of this would be a multi-value field or a whitespace tokenized field where search terms are always for a specific token". What do you think?

Added changes from Pull Request 34 feedback:
- Added Solr tests for Terms queries
- Added more comprehensive docstrings
- More comprehensive Elastic Tests
- Fixed Solr Terms queries
Contributor

adamalix commented May 14, 2012

Deleted branch because it got corrupt. Opening new request.

@adamalix adamalix closed this May 14, 2012

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment