Term queries #36

Merged
merged 6 commits into from about 2 years ago

2 participants

Adam Alix Holden Karau
Adam Alix
Collaborator

This pull request introduces unanalyzed query types and fields. They can also be used as filter queries, which Elasticsearch can optionally cache.

adamalix added some commits
Adam Alix adamalix Initial stab at Term queries and filter caching:
This commit adds the ability to have unanalyzed fields and to also cache filter
queries to speed up queries on Elasticsearch.
TODOs:
- Debug the commented-out line in the testUnanalyzed unit test (ElasticQueryTest:308)
  `"terms" : { "field" : [ [ "value1", "value2" ] ] } ` queries don't work
  because of a SearchParseException.
- Make SlashemGeoField extend SlashemUnanalyzedStringField
- Decide what to do for Solr queries on this field type (in the case of
  SlashemGeoField).  Do we want two separate field types, one for each backend?
  The current plan is to not implement Term queries for Solr because we receive
  no benefit there.
a1ac4bc
Adam Alix adamalix Fixed Terms queries and filters. 75d6c4f
Adam Alix adamalix Tests for Term filter queries. 7a6eeae
Adam Alix adamalix Changed SlashemGeoField to be unanalyzed:
- SlashemGeoField now extends SlashemUnanalyzedStringField
- Fixed extend() for Term[T] queries because of failing unit tests
db1a33a
Adam Alix adamalix Added changes from Pull Request 34 feedback:
- Added Solr tests for Terms queries
- Added more comprehensive docstrings
- More comprehensive Elastic Tests
- Fixed Solr Terms queries
4d33731
src/main/scala/com/foursquare/slashem/Ast.scala
((6 lines not shown))
  413 + *
  414 + * By default, elasticFilter() will always be cached!
  415 + */
  416 + case class Term[T](query: Iterable[T], escaped: Boolean = true, cached: Boolean = true) extends Query[T] {
  417 + // hack for single term queries
  418 + def this(query: T) = this(List(query))
  419 + /** @inheritdoc */
  420 + //def extend() = throw new UnimplementedException("Slashem does not support Term queries Solr")
  421 + def extend(): String = {
  422 + escaped match {
  423 + // hack to fix wrapping the queries in a List()
  424 + case true => {
  425 + val queries = query.map(q => {'"' + escape(q.toString) + '"'})
  426 + queries.mkString(" OR ")
  427 + }
  428 +// case true => {'"' + query.mkString("\" OR \"")
Holden Karau
holdenk added a note

Commented out code

src/main/scala/com/foursquare/slashem/Ast.scala
@@ -7,6 +7,7 @@ import org.elasticsearch.index.query.{FilterBuilder => ElasticFilterBuilder,
7 7 QueryBuilder => ElasticQueryBuilder,
8 8 QueryBuilders => EQueryBuilders,
9 9 QueryStringQueryBuilder}
  10 +import scalaj.collection.Imports._
Holden Karau
holdenk added a note

killme

Holden Karau holdenk merged commit a71fd7d into from
Holden Karau holdenk closed this

Showing 6 unique commits by 1 author.

May 14, 2012
Adam Alix adamalix Killed commented out code and unnecessary import. c0b1673
48 src/main/scala/com/foursquare/slashem/Ast.scala
@@ -407,6 +407,50 @@ object Ast {
407 407 }
408 408 }
409 409
  410 + /**
  411 + * A term query. Used for queries that don't need to be analyzed
  412 + *
  413 + * By default, elasticFilter() will always be cached!
  414 + */
  415 + case class Term[T](query: Iterable[T], escaped: Boolean = true, cached: Boolean = true) extends Query[T] {
  416 + // hack for single term queries
  417 + def this(query: T) = this(List(query))
  418 + /** @inheritdoc */
  419 + //def extend() = throw new UnimplementedException("Slashem does not support Term queries Solr")
  420 + def extend(): String = {
  421 + escaped match {
  422 + // hack to fix wrapping the queries in a List()
  423 + case true => {
  424 + val queries = query.map(q => {'"' + escape(q.toString) + '"'})
  425 + queries.mkString(" OR ")
  426 + }
  427 + case false => '"' + query.mkString(" OR ") + '"'
  428 + }
  429 + }
  430 + /** @inheritdoc */
  431 + def elasticExtend(qf: List[WeightedField], pf: List[PhraseWeightedField], mm: Option[String]): ElasticQueryBuilder = {
  432 + val fieldName = qf.head.fieldName
  433 + val weight = qf.head.weight.toFloat
  434 + query match {
  435 + case term::Nil => EQueryBuilders.termQuery(fieldName, term).boost(weight)
  436 + case terms => {
  437 + val moarTerms = terms.toSeq.map(_.toString)
  438 + EQueryBuilders.termsQuery(fieldName, moarTerms: _*).boost(weight)
  439 + }
  440 + }
  441 + }
  442 + /** @inheritdoc */
  443 + override def elasticFilter(qf: List[WeightedField]): ElasticFilterBuilder = {
  444 + val fieldName = qf.head.fieldName
  445 + query match {
  446 + case term::Nil => EFilterBuilders.termFilter(fieldName, term).cache(cached)
  447 + case terms => {
  448 + val moarTerms = terms.toSeq.map(_.toString)
  449 + EFilterBuilders.termsFilter(fieldName, moarTerms: _*).cache(cached)
  450 + }
  451 + }
  452 + }
  453 + }
410 454
411 455 case class Range[T](q1: Query[T],q2: Query[T]) extends Query[T] {
412 456 /** @inheritdoc */
@@ -487,7 +531,7 @@ object Ast {
487 531 }
488 532
489 533 /**
490   - * Class representing clauses ANDed together
  534 + * Class representing queries ANDed together
491 535 */
492 536 case class And[T](queries: Query[T]*) extends Query[T] {
493 537 /** @inheritdoc */
@@ -507,7 +551,7 @@ object Ast {
507 551 }
508 552 }
509 553 /**
510   - * Case class representing a list of clauses ORed together
  554 + * Case class representing a list of queries ORed together
511 555 */
512 556 case class Or[T](queries: Query[T]*) extends Query[T] {
513 557 /** @inheritdoc */
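
The `query match { case term::Nil => ... }` dispatch in `elasticExtend` and `elasticFilter` above can be sketched in isolation. This is a minimal model, not Slashem code: `TermDispatchSketch` and `kind` are hypothetical names, and the Elasticsearch builders are replaced by plain strings so the block runs on its own.

```scala
// Standalone sketch of the single-term vs. multi-term dispatch used by Term.
// Hypothetical names; the real code calls termQuery/termsQuery builders instead.
object TermDispatchSketch {
  // Returns which kind of Elasticsearch query the match would select.
  def kind[T](query: Iterable[T]): String = query match {
    // `::` is a List constructor, so this case matches only when `query`
    // is a List holding exactly one element.
    case _ :: Nil => "term"
    // Everything else, including an empty List or a one-element Vector,
    // falls through to the multi-value branch.
    case _ => "terms"
  }
}
```

One consequence worth noting: because the pattern is List-specific, a single value wrapped in another `Iterable` (for example a `Vector`) takes the `terms` branch rather than the `term` branch. The secondary constructor `this(List(query))` in the diff always builds a `List`, so single-value calls made through it reach the `term` branch.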
67 src/main/scala/com/foursquare/slashem/Schema.scala
@@ -55,6 +55,12 @@ case class SolrResponseException(code: Int, reason: String, solrName: String, qu
55 55 }
56 56 }
57 57
  58 +case class UnimplementedException(reason: String) extends RuntimeException {
  59 + override def getMessage(): String = {
  60 + "Not implemented: %s".format(reason)
  61 + }
  62 +}
  63 +
58 64 /** The response header. There are normally more fields in the response header we could extract, but
59 65 * we don't at present. */
60 66 case class ResponseHeader @JsonCreator()(@JsonProperty("status")status: Int, @JsonProperty("QTime")QTime: Int)
@@ -433,8 +439,8 @@ trait SolrGeoHash {
433 439 }
434 440 //Default geohash, does nothing.
435 441 object NoopSolrGeoHash extends SolrGeoHash {
436   - def coverString (geoLat: Double, geoLong: Double, radiusInMeters: Int, maxCells: Int ): Seq[String] = List("pleaseUseaRealGeoHash")
437   - def rectCoverString(topRight: (Double, Double), bottomLeft: (Double, Double), maxCells: Int = 0, minLevel: Int = 0, maxLevel: Int = 0): Seq[String] = List("pleaseUseaRealGeoHash")
  442 + def coverString (geoLat: Double, geoLong: Double, radiusInMeters: Int, maxCells: Int ): Seq[String] = List("pleaseUseaRealGeoHash", "thisIsForFunctionalityTests")
  443 + def rectCoverString(topRight: (Double, Double), bottomLeft: (Double, Double), maxCells: Int = 0, minLevel: Int = 0, maxLevel: Int = 0): Seq[String] = List("pleaseUseaRealGeoHash", "thisIsForFunctionalityTests")
438 444 }
439 445
440 446 trait SlashemSchema[M <: Record[M]] extends Record[M] {
@@ -782,28 +788,54 @@ trait SolrSchema[M <: Record[M]] extends SlashemSchema[M] {
782 788
783 789 }
784 790
  791 +/**
  792 + * A field type for unanalyzed queries. Results in using Term[V] queries.
  793 + */
  794 +trait SlashemUnanalyzedField[V, M <: Record[M]] extends SlashemField[V, M] {
  795 + self: Field[V, M] =>
  796 +
  797 + override val unanalyzed = true
  798 +}
785 799
786 800 trait SlashemField[V, M <: Record[M]] extends OwnedField[M] {
787 801 self: Field[V, M] =>
788 802 import Helpers._
789 803
790   - //Note eqs and neqs results in phrase queries!
791   - def eqs(v: V) = Clause[V](self.queryName, Group(Phrase(v)))
792   - def neqs(v: V) = Clause[V](self.queryName, Phrase(v),false)
  804 + // Override this value to produce unanalyzed queries!
  805 + val unanalyzed = false
  806 +
  807 + def produceQuery(v: V): Query[V] = {
  808 + unanalyzed match {
  809 + // use new to use Term's additional non-default constructor
  810 + case true => new Term(v)
  811 + case false => Phrase(v)
  812 + }
  813 + }
  814 +
  815 + def produceGroupedQuery(v: Iterable[V]): Query[V] = {
  816 + unanalyzed match {
  817 + // we don't want to groupWithOr and instead take advantage of "terms" queries
  818 + case true => Term(v)
  819 + case false => groupWithOr(v.map({x: V => produceQuery(x)}))
  820 + }
  821 + }
  822 +
  823 + def eqs(v: V) = Clause[V](self.queryName, Group(produceQuery(v)))
  824 + def neqs(v: V) = Clause[V](self.queryName, produceQuery(v),false)
793 825 //With a boost
794   - def eqs(v: V, b: Float) = Clause[V](self.queryName, Boost(Group(Phrase(v)),b))
795   - def neqs(v: V, b:Float) = Clause[V](self.queryName, Boost(Phrase(v),b),false)
  826 + def eqs(v: V, b: Float) = Clause[V](self.queryName, Boost(Group(produceQuery(v)),b))
  827 + def neqs(v: V, b:Float) = Clause[V](self.queryName, Boost(produceQuery(v),b),false)
796 828
797 829
798 830 //This allows for bag of words style matching.
799 831 def contains(v: V) = Clause[V](self.queryName, Group(BagOfWords(v)))
800 832 def contains(v: V, b: Float) = Clause[V](self.queryName, Boost(Group(BagOfWords(v)),b))
801 833
802   - def in(v: Iterable[V]) = Clause[V](self.queryName, groupWithOr(v.map({x: V => Phrase(x)})))
803   - def nin(v: Iterable[V]) = Clause[V](self.queryName, groupWithOr(v.map({x: V => Phrase(x)})),false)
  834 + def in(v: Iterable[V]) = Clause[V](self.queryName, produceGroupedQuery(v))
  835 + def nin(v: Iterable[V]) = Clause[V](self.queryName, produceGroupedQuery(v),false)
804 836
805   - def in(v: Iterable[V], b: Float) = Clause[V](self.queryName, Boost(groupWithOr(v.map({x: V => Phrase(x)})),b))
806   - def nin(v: Iterable[V], b: Float) = Clause[V](self.queryName, Boost(groupWithOr(v.map({x: V => Phrase(x)})),b),false)
  837 + def in(v: Iterable[V], b: Float) = Clause[V](self.queryName, Boost(produceGroupedQuery(v),b))
  838 + def nin(v: Iterable[V], b: Float) = Clause[V](self.queryName, Boost(produceGroupedQuery(v),b),false)
807 839
808 840 def inRange(v1: V, v2: V) = Clause[V](self.queryName, Group(Range(BagOfWords(v1),BagOfWords(v2))))
809 841 def ninRange(v1: V, v2: V) = Clause[V](self.queryName, Group(Range(BagOfWords(v1),BagOfWords(v2))),false)
@@ -843,6 +875,17 @@ trait SlashemField[V, M <: Record[M]] extends OwnedField[M] {
843 875
844 876 //Slashem field types
845 877 class SlashemStringField[T <: Record[T]](owner: T) extends StringField[T](owner, 0) with SlashemField[String, T]
  878 +/**
  879 + * Field type that can be queried without analyzing.
  880 + *
  881 + * Ex: multi-value field or a whitespace tokenized field where
  882 + * search terms are always for a specific token.
  883 + *
  884 + * @see SlashemStringField
  885 + */
  886 +class SlashemUnanalyzedStringField[T <: Record[T]](owner: T)
  887 + extends StringField[T](owner, 0) with SlashemUnanalyzedField[String, T]
  888 +
846 889 //Allows for querying against the default field in solr. This field doesn't have a name
847 890 class SlashemDefaultStringField[T <: Record[T]](owner: T) extends StringField[T](owner, 0) with SlashemField[String, T] {
848 891 override def name = ""
@@ -951,7 +994,7 @@ class SlashemPointField[T <: Record[T]](owner: T) extends PointField[T](owner) w
951 994 class SlashemBooleanField[T <: Record[T]](owner: T) extends BooleanField[T](owner) with SlashemField[Boolean, T]
952 995 class SlashemDateTimeField[T <: Record[T]](owner: T) extends JodaDateTimeField[T](owner) with SlashemField[DateTime, T]
953 996 //More restrictive type so we can access the geohash
954   -class SlashemGeoField[T <: SlashemSchema[T]](owner: T) extends StringField[T](owner,0) with SlashemField[String, T] {
  997 +class SlashemGeoField[T <: SlashemSchema[T]](owner: T) extends SlashemUnanalyzedStringField[T](owner) {
955 998 def inRadius(geoLat: Double, geoLong: Double, radiusInMeters: Int, maxCells: Int = owner.geohash.maxCells) = {
956 999 val cellIds = owner.geohash.coverString(geoLat, geoLong, radiusInMeters, maxCells = maxCells)
957 1000 //If we have an empty cover we default to everything.
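
The `unanalyzed` flag added to `SlashemField` switches `eqs`/`in` style queries between `Phrase` and `Term`. The sketch below models only that switch, under simplifying assumptions: the query types are reduced to `String` values, `OrGroup` is a hypothetical stand-in for whatever `groupWithOr` returns, and `Field` is not the real Slashem trait.

```scala
// Simplified model of produceQuery / produceGroupedQuery from the diff above.
// All types here are illustrative stand-ins, not the Slashem AST.
object UnanalyzedDispatchSketch {
  sealed trait Query
  case class Phrase(value: String) extends Query
  case class Term(values: List[String]) extends Query
  // Hypothetical stand-in for the result of groupWithOr(...).
  case class OrGroup(queries: List[Query]) extends Query

  class Field(val unanalyzed: Boolean) {
    // Single value: Term for unanalyzed fields, Phrase otherwise.
    def produceQuery(v: String): Query =
      if (unanalyzed) Term(List(v)) else Phrase(v)

    // Many values: one Term carrying all values for unanalyzed fields,
    // otherwise an OR group of per-value queries.
    def produceGroupedQuery(vs: Iterable[String]): Query =
      if (unanalyzed) Term(vs.toList)
      else OrGroup(vs.toList.map(produceQuery))
  }
}
```

The design point this illustrates is that an unanalyzed field hands all values to a single `Term`, which is what lets the Elasticsearch side use one "terms" query instead of an OR of separate clauses.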
36 src/test/scala/com/foursquare/slashem/ElasticQueryTest.scala
@@ -313,20 +313,26 @@ class ElasticQueryTest extends SpecsMatchers with ScalaCheckMatchers {
313 313 def testListFieldIn {
314 314 val response1 = ESimplePanda where (_.favnums in List(2, 3, 4, 5)) fetch()
315 315 val response2 = ESimplePanda where (_.favnums in List(99)) fetch()
  316 + val response3 = ESimplePanda where (_.termsfield in List("termhit", "lol")) fetch()
316 317 Assert.assertEquals(response1.response.results.length, 2)
317 318 Assert.assertEquals(response2.response.results.length, 0)
  319 + Assert.assertEquals(response3.response.results.length, 1)
318 320 }
319 321
320 322 @Test
321 323 def testIntListFieldEmptyIn {
322   - val response = ESimplePanda where (_.favnums in List()) fetch()
323   - Assert.assertEquals(response.response.results.length, 0)
  324 + val response1 = ESimplePanda where (_.favnums in List()) fetch()
  325 + val response2 = ESimplePanda where (_.termsfield in List()) fetch()
  326 + Assert.assertEquals(response1.response.results.length, 0)
  327 + Assert.assertEquals(response2.response.results.length, 0)
324 328 }
325 329
326 330 @Test
327 331 def testIntListFieldEmptyNin {
328   - val response = ESimplePanda where (_.favnums nin List()) fetch()
329   - Assert.assertEquals(response.response.results.length, 8)
  332 + val response1 = ESimplePanda where (_.favnums nin List()) fetch()
  333 + val response2 = ESimplePanda where (_.termsfield nin List()) fetch()
  334 + Assert.assertEquals(response1.response.results.length, 8)
  335 + Assert.assertEquals(response2.response.results.length, 8)
330 336 }
331 337
332 338 @Test
@@ -344,6 +350,26 @@ class ElasticQueryTest extends SpecsMatchers with ScalaCheckMatchers {
344 350 val ids2 = response2.response.oids
345 351 // All three docs with favnums should be returned, none contain 99
346 352 Assert.assertEquals(ids2.intersect(idsWithFavNums).length, 3)
  353 +
  354 + val response3 = ESimplePanda where (_.termsfield nin List("termhit")) fetch()
  355 + val ids3 = response3.response.oids
  356 + // Two of the three docs with favnums should be returned; the doc containing "termhit" is excluded
  357 + Assert.assertEquals(ids3.intersect(idsWithFavNums).length, 2)
  358 + }
  359 +
  360 + @Test
  361 + def testTermQueries {
  362 + val res1 = ESimplePanda where (_.termsfield eqs "termhit") fetch()
  363 + val res2 = ESimplePanda where (_.termsfield in List("randomterm", "termhit")) fetch()
  364 + Assert.assertEquals(res1.response.results.length, 1)
  365 + Assert.assertEquals(res2.response.results.length, 1)
  366 + }
  367 +
  368 + @Test
  369 + def testTermFilters {
  370 + // grab 2 results, filter to 1
  371 + val res1 = ESimplePanda where (_.hugenums contains 1L) filter(_.termsfield in List("termhit", "randomterm")) fetch()
  372 + Assert.assertEquals(res1.response.results.length, 1)
347 373 }
348 374
349 375 @Before
@@ -394,6 +420,7 @@ class ElasticQueryTest extends SpecsMatchers with ScalaCheckMatchers {
394 420 val favnums1 = List(1, 2, 3, 4, 5).asJava
395 421 val favnums2 = List(1, 2, 3, 4, 5).asJava
396 422 val favnums3 = List(6, 7, 8, 9, 10).asJava
  423 + val terms1 = List("termhit", "nohit").asJava
397 424 val nicknames1 = List("jerry", "dawg", "xzibit").asJava
398 425 val nicknames2 = List("xzibit", "alvin").asJava
399 426 val nicknames3 = List("alvin", "nathaniel", "joiner").asJava
@@ -407,6 +434,7 @@ class ElasticQueryTest extends SpecsMatchers with ScalaCheckMatchers {
407 434 .field("favnums", favnums1)
408 435 .field("nicknames", nicknames1)
409 436 .field("hugenums", hugenums1)
  437 + .field("termsfield", terms1)
410 438 .endObject()
411 439 ).execute()
412 440 .actionGet();
1  src/test/scala/com/foursquare/slashem/ElasticTest.scala
@@ -18,6 +18,7 @@ class ESimplePanda extends ElasticSchema[ESimplePanda] {
18 18 object favnums extends SlashemIntListField(this)
19 19 object nicknames extends SlashemStringListField(this)
20 20 object hugenums extends SlashemLongListField(this)
  21 + object termsfield extends SlashemUnanalyzedStringField(this)
21 22 }
22 23
23 24 object ESimpleGeoPanda extends ESimpleGeoPanda with ElasticMeta[ESimpleGeoPanda] {
6 src/test/scala/com/foursquare/slashem/QueryTest.scala
@@ -562,7 +562,7 @@ class QueryTest extends SpecsMatchers with ScalaCheckMatchers {
562 562 "qf" -> "text",
563 563 "qf" -> "ngram_name^0.2",
564 564 "qf" -> "tags^0.01",
565   - "fq" -> "geo_s2_cell_ids:(\"pleaseUseaRealGeoHash\")",
  565 + "fq" -> "geo_s2_cell_ids:(\"pleaseUseaRealGeoHash\" OR \"thisIsForFunctionalityTests\")",
566 566 "tieBreaker" -> "0.2",
567 567 "fl" -> "id,name,userid,mayorid,category_id_0,popularity,decayedPopularity1,lat,lng,checkin_info,score,hasSpecial,address,crossstreet,city,state,zip,country,checkinCount,partitionedPopularity",
568 568 "bq" -> "name:(holden's hobohut)^10.0",
@@ -609,7 +609,7 @@ class QueryTest extends SpecsMatchers with ScalaCheckMatchers {
609 609 "qf" -> "text",
610 610 "qf" -> "ngram_name^0.2",
611 611 "qf" -> "tags^0.01",
612   - "fq" -> "geo_s2_cell_ids:(\"pleaseUseaRealGeoHash\")",
  612 + "fq" -> "geo_s2_cell_ids:(\"pleaseUseaRealGeoHash\" OR \"thisIsForFunctionalityTests\")",
613 613 "tieBreaker" -> "0.2",
614 614 "fl" -> "id,name,userid,mayorid,category_id_0,popularity,decayedPopularity1,lat,lng,checkin_info,score,hasSpecial,address,crossstreet,city,state,zip,country,checkinCount,partitionedPopularity",
615 615 "bq" -> "name:(holden's hobohut)^10.0",
@@ -630,7 +630,7 @@ class QueryTest extends SpecsMatchers with ScalaCheckMatchers {
630 630 "q" -> "(DJ Hixxy)",
631 631 "start" -> "0",
632 632 "rows" -> "10",
633   - "fq" -> "geo_s2_cell_ids:(\"pleaseUseaRealGeoHash\")")
  633 + "fq" -> "geo_s2_cell_ids:(\"pleaseUseaRealGeoHash\" OR \"thisIsForFunctionalityTests\")")
634 634 Assert.assertEquals(Nil, ((qp.toSet &~ expected.toSet)).toList)
635 635 Assert.assertEquals(Nil, (expected.toSet &~ qp.toSet).toList)
636 636 }
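
The quoted, OR-joined values in the updated `fq` expectations above match what `Term.extend()` produces for Solr. A rough standalone sketch of that string building follows. `TermExtendSketch` is a hypothetical name, and its `escape` is an assumed stand-in that only handles backslashes and double quotes; Slashem's real `escape` helper is not shown in this diff and may behave differently.

```scala
// Standalone sketch of the Solr string rendering in Term.extend().
object TermExtendSketch {
  // Assumption: a minimal escaper, not Slashem's actual escape().
  def escape(s: String): String =
    s.replace("\\", "\\\\").replace("\"", "\\\"")

  def extend[T](query: Iterable[T], escaped: Boolean = true): String =
    if (escaped) {
      // Quote and escape each value separately, then join with OR.
      query.map(q => "\"" + escape(q.toString) + "\"").mkString(" OR ")
    } else {
      // Mirrors the diff: the joined string is wrapped in one pair of quotes.
      "\"" + query.mkString(" OR ") + "\""
    }
}
```

Calling `TermExtendSketch.extend(List("pleaseUseaRealGeoHash", "thisIsForFunctionalityTests"))` yields `"pleaseUseaRealGeoHash" OR "thisIsForFunctionalityTests"`. The surrounding `geo_s2_cell_ids:( ... )` in the test expectations comes from other parts of the query rendering that this sketch does not model. Note also that the unescaped branch quotes the whole joined string once rather than each value.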
