Move functions that generate lucene geometries under a utility class #104928

Merged

iverase merged 14 commits into elastic:main from iverase:LuceneGeometriesUtil

Jan 31, 2024

Contributor

iverase commented Jan 30, 2024

The functions that generate Lucene geometries are scattered across different places in the code. This PR moves all of them into a single utility class.


          Move functions that generate Lucene geometries under a utility class

99d3a4d

iverase added :Analytics/Geo >refactoring v8.13.0 labels

iverase requested a review from craigtaverner

January 30, 2024 15:20

iverase marked this pull request as draft

January 30, 2024 15:20

craigtaverner reviewed

server/src/main/java/org/elasticsearch/index/mapper/GeoShapeQueryable.java

+                      final Consumer<ShapeType> checker = relation == ShapeRelation.WITHIN ? t -> {
+                          if (t == ShapeType.LINESTRING) {
+                              // Line geometries and WITHIN relation is not supported by Lucene. Throw an error here
+                              // to have same behavior for runtime fields.

Contributor

craigtaverner Jan 30, 2024

Even if not supported by lucene, I presume in ES|QL that simply means we cannot optimize this case by pushing down to lucene, but could still support it using source values.

Contributor Author

iverase Jan 30, 2024

We cannot support it because of quantization. We are changing the language, not the indexing model.
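For readers following the thread, the checker pattern under discussion can be sketched roughly as below. This is a minimal illustration, not the code from the PR: the nested `ShapeType` and `ShapeRelation` enums are hypothetical stand-ins for the Elasticsearch types, `checkerFor` is an invented helper name, and `IllegalArgumentException` stands in for whatever exception the real code throws.

```java
import java.util.function.Consumer;

class CheckerSketch {
    // Hypothetical stand-ins for the Elasticsearch ShapeType and ShapeRelation enums.
    enum ShapeType { POINT, LINESTRING, POLYGON }
    enum ShapeRelation { INTERSECTS, DISJOINT, WITHIN, CONTAINS }

    // Builds the callback invoked once per shape type found in a geometry.
    // Only the WITHIN relation rejects line geometries; other relations accept everything.
    static Consumer<ShapeType> checkerFor(ShapeRelation relation, String fieldName) {
        if (relation == ShapeRelation.WITHIN) {
            return t -> {
                if (t == ShapeType.LINESTRING) {
                    // Stand-in exception type, chosen only to keep the sketch self-contained.
                    throw new IllegalArgumentException(
                        "Field [" + fieldName + "] does not support " + t + " queries with relation " + relation
                    );
                }
            };
        }
        return t -> {}; // no restriction for the remaining relations
    }

    public static void main(String[] args) {
        // Accepted: lines are fine for INTERSECTS.
        checkerFor(ShapeRelation.INTERSECTS, "location").accept(ShapeType.LINESTRING);
        try {
            // Rejected: lines combined with WITHIN.
            checkerFor(ShapeRelation.WITHIN, "location").accept(ShapeType.LINESTRING);
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Passing the check in as a `Consumer` keeps the conversion code free of query-specific rules, so each caller decides which shape types to reject.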


          add javadocs

68c7157

craigtaverner reviewed

server/src/main/java/org/elasticsearch/index/mapper/GeoShapeQueryable.java Outdated

                       final LatLonGeometry[] luceneGeometries;
                       try {
-                          luceneGeometries = toQuantizeLuceneGeometry(shape, relation);
+                          luceneGeometries = LuceneGeometriesUtil.toLatLonGeometry(geometry, true, checker);

Contributor

craigtaverner Jan 30, 2024

Great that it is possible to call with both quantize and not quantize options.

iverase added 7 commits

January 30, 2024 17:06


          Merge remote-tracking branch 'upstream/main' into LuceneGeometriesUtil

a3143fa


          fix empty geometries

d3e6c9f


          add tests

f2b7710


          Merge remote-tracking branch 'upstream/main' into LuceneGeometriesUtil

db69df6


          more test

12008b4


          typo

66b9999


          finalize testing

3a6bbe2

iverase marked this pull request as ready for review

January 31, 2024 07:31

Collaborator

elasticsearchmachine commented Jan 31, 2024

Pinging @elastic/es-analytical-engine (Team:Analytics)

elasticsearchmachine added the Team:Analytics label

iverase added 3 commits

January 31, 2024 10:13


          iter

af7eed5


          rename

3bf66fc


          rename

b9c7fd4

craigtaverner requested changes

Contributor

craigtaverner left a comment

I think this work is great. Just wanted to remove some unused code, and had an idea for an optimization of not-quantizing double[] lat/lon.

server/src/main/java/org/elasticsearch/common/geo/LuceneGeometriesUtils.java Outdated

+                  /**
+                   * Transform an Elasticsearch {@link Geometry} into a lucene {@link LatLonGeometry}
+                   *
+                   * @param geometry the geometry tio transform

Contributor

craigtaverner Jan 31, 2024

Suggested change

      
-                   * @param geometry the geometry tio transform
+                   * @param geometry the geometry to transform

server/src/main/java/org/elasticsearch/common/geo/LuceneGeometriesUtils.java Outdated

+                          public Void visit(org.elasticsearch.geometry.Polygon polygon) {
+                              checker.accept(ShapeType.POLYGON);
+                              if (polygon.isEmpty() == false) {
+                                  org.apache.lucene.geo.Polygon[] holes = new org.apache.lucene.geo.Polygon[polygon.getNumberOfHoles()];

Contributor

craigtaverner Jan 31, 2024

These holes are never used, and then they are recalculated later in toLatLonPolygon, so I think this code should be deleted.

server/src/main/java/org/elasticsearch/common/geo/LuceneGeometriesUtils.java

+                   * @param checker call for every {@link ShapeType} found in the Geometry. It allows to throw an error if a geometry is
+                   *                not supported.
+                   *
+                   * @return an array of {@link LatLonGeometry}

Contributor

craigtaverner Jan 31, 2024

I notice that it flattens GeometryCollection, MultiPoint, MultiLine and MultiPolygon into a collection of non-multi geometries. I presume this loss of hierarchy should not have consequences in the code we plan to use this in. It seems like it will be fine for the ST_INTERSECTS work, and presumably all other cases.
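The flattening described in this comment can be sketched as a recursive walk that emits only leaf geometries. The types below (`Geometry`, `Point`, `Collection`) are a hypothetical minimal model written for this illustration; they are not the Elasticsearch or Lucene classes, and the real utility uses a visitor rather than `instanceof`.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class FlattenSketch {
    // Hypothetical minimal geometry model, for illustration only.
    interface Geometry {}

    static final class Point implements Geometry {
        final double lat;
        final double lon;

        Point(double lat, double lon) {
            this.lat = lat;
            this.lon = lon;
        }
    }

    // Stands in for GeometryCollection, MultiPoint, MultiLine and MultiPolygon alike.
    static final class Collection implements Geometry {
        final List<Geometry> members;

        Collection(Geometry... members) {
            this.members = Arrays.asList(members);
        }
    }

    // Returns the leaf geometries in encounter order; the nesting is discarded.
    static List<Geometry> flatten(Geometry geometry) {
        List<Geometry> out = new ArrayList<>();
        collect(geometry, out);
        return out;
    }

    private static void collect(Geometry geometry, List<Geometry> out) {
        if (geometry instanceof Collection) {
            for (Geometry member : ((Collection) geometry).members) {
                collect(member, out);
            }
        } else {
            out.add(geometry);
        }
    }
}
```

After flattening, a consumer can no longer tell whether two leaves came from the same multi-geometry, which is the loss of hierarchy the comment refers to.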

server/src/main/java/org/elasticsearch/common/geo/LuceneGeometriesUtils.java Outdated

+                  }
+                  static double[] quantizeLats(double[] lats, DoubleFunction function) {
+                      return Arrays.stream(lats).map(function::apply).toArray();

Contributor

craigtaverner Jan 31, 2024

I presume we could do a small optimization here: if quantize==false, we could skip the entire stream...toArray.

For example, make two classes, one with two methods that simply return the passed in arrays, and another with two methods that do quantization. Then we initialize those at the top of the main method.
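One way this suggestion might look is sketched below, under stated assumptions. The `ArrayMapper` interface, the `select` helper, and the `snap` function are all hypothetical names introduced here; `snap` uses a deliberately coarse 0.25 grid so the effect is visible, whereas the real code would delegate to Lucene's own encoding. The sketch uses one functional interface instead of the two classes proposed in the comment, which is a design substitution rather than what the PR does.

```java
import java.util.Arrays;
import java.util.function.DoubleUnaryOperator;

class QuantizerSketch {
    // Hypothetical strategy: maps a coordinate array to the array handed on to Lucene.
    interface ArrayMapper {
        double[] apply(double[] values);
    }

    // No-op strategy: returns the very same array, with no stream and no copy.
    static final ArrayMapper IDENTITY = values -> values;

    // Quantizing strategy: applies the quantizer to every element and returns a new array.
    static ArrayMapper quantizing(DoubleUnaryOperator quantizer) {
        return values -> Arrays.stream(values).map(quantizer).toArray();
    }

    // Illustrative grid size only; this is not Lucene's resolution.
    static final double GRID = 0.25;

    // Stand-in quantizer: snaps a value down to the nearest multiple of GRID.
    static double snap(double value) {
        return Math.floor(value / GRID) * GRID;
    }

    // Chosen once at the top of the conversion method, then reused for every array.
    static ArrayMapper select(boolean quantize) {
        return quantize ? quantizing(QuantizerSketch::snap) : IDENTITY;
    }

    public static void main(String[] args) {
        double[] lats = { 1.3, -0.1 };
        System.out.println(select(false).apply(lats) == lats); // same instance, nothing copied
        System.out.println(Arrays.toString(select(true).apply(lats)));
    }
}
```

Because the identity strategy hands back the caller's own array, it is only safe when nothing downstream mutates the array or when the receiving constructor makes its own copy; that would need to be checked against the actual call sites.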

server/src/main/java/org/elasticsearch/common/geo/LuceneGeometriesUtils.java Outdated

+                   *                a geometry is not supported.
+                   * @return an array of {@link XYGeometry}
+                   */

Contributor

craigtaverner Jan 31, 2024

Extra blank line

server/src/test/java/org/elasticsearch/common/geo/LuceneGeometriesUtilsTests.java

+                      assertThat(geometry, instanceOf(org.apache.lucene.geo.Polygon.class));
+                      org.apache.lucene.geo.Polygon lalonPolygon = (org.apache.lucene.geo.Polygon) geometry;
+                      assertThat(lalonPolygon.getPolyLons(), equalTo(polygon.getPolygon().getLons()));
+                      assertThat(lalonPolygon.getPolyLats(), equalTo(polygon.getPolygon().getLats()));

Contributor

craigtaverner Jan 31, 2024

Should we make assertions on the holes also?

server/src/test/java/org/elasticsearch/common/geo/LuceneGeometriesUtilsTests.java

+                      assertThat(geometry, instanceOf(org.apache.lucene.geo.XYPolygon.class));
+                      org.apache.lucene.geo.XYPolygon xyPolygon = (org.apache.lucene.geo.XYPolygon) geometry;
+                      assertThat(xyPolygon.getPolyX(), equalTo(LuceneGeometriesUtils.doubleArrayToFloatArray(polygon.getPolygon().getX())));
+                      assertThat(xyPolygon.getPolyY(), equalTo(LuceneGeometriesUtils.doubleArrayToFloatArray(polygon.getPolygon().getY())));

Contributor

craigtaverner Jan 31, 2024

Should we make assertions on the holes also?

...tial/src/main/java/org/elasticsearch/xpack/spatial/index/query/ShapeQueryPointProcessor.java

+                              throw new QueryShardException(context, "Field [" + fieldName + "] does not support " + t + " queries");
+                          }
+                      };
+                      final XYGeometry[] luceneGeometries = LuceneGeometriesUtils.toXYGeometry(geometry, checker);

Contributor

craigtaverner Jan 31, 2024

Wow! This is a dramatic simplification!

...tial/src/main/java/org/elasticsearch/xpack/spatial/index/query/ShapeQueryPointProcessor.java

+                          }
+                      };
+                      final XYGeometry[] luceneGeometries = LuceneGeometriesUtils.toXYGeometry(geometry, checker);
+                      Query query = XYPointField.newGeometryQuery(fieldName, luceneGeometries);

Contributor

craigtaverner Jan 31, 2024

Am I understanding this correctly? The old code would generate one query for one of Circle, Polygon, MultiPolygon and Rectangle, while the new code would generate a single query containing all of those types? Or perhaps GeometryCollection is not supported so we will never see more than one anyway?

...n/spatial/src/main/java/org/elasticsearch/xpack/spatial/index/query/ShapeQueryProcessor.java

+                      } catch (IllegalArgumentException e) {
+                          throw new QueryShardException(context, "Exception creating query on Field [" + fieldName + "] " + e.getMessage(), e);
+                      }
+                      Query query = XYShape.newGeometryQuery(fieldName, relation.getLuceneRelation(), luceneGeometries);

Contributor

craigtaverner Jan 31, 2024

Another nice simplification!


          Code review updates

70a7bb8

* Removed unused code.
* Changed spelling
* Optimized away the no-op version of double[] quantization

craigtaverner approved these changes


          Change visibility

beea23c

iverase merged commit 3821745 into elastic:main

15 checks passed

iverase deleted the LuceneGeometriesUtil branch

January 31, 2024 12:59

benwtrent added a commit that referenced this pull request

* Change release version lookup to an instance method (#104902)

* Upgrade to Lucene 9.9.2 (#104753)

This commit upgrades to Lucene 9.9.2.

* Improve `CANNOT_REBALANCE_CAN_ALLOCATE` explanation (#104904)

Clarify that in this situation there is a rebalancing move that would
improve the cluster balance, but there's some reason why rebalancing is
not happening. Also points at the `can_rebalance_cluster_decisions` as
well as the node-by-node decisions since the action needed could be
described in either place.

* Get from translog fails with large dense_vector (#104700)

This change fixes the engine to apply the current codec when retrieving documents from the translog.
We need to use the same codec as the main index in order to ensure that all the source data is indexable.
The internal codec treats some fields differently than the default one; for instance, dense_vectors are limited to 1024 dimensions.
This PR ensures that these customizations are applied when indexing documents for translog retrieval.

Closes #104639

Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>

* [Connector Secrets] Add delete API endpoint (#104815)

* Add DELETE endpoint for /_connector/_secret/{id}
* Add endpoint to write_connector_secrets cluster privilege

* Merge Aggregations into InternalAggregations (#104896)

This commit merges Aggregations into InternalAggregations in order to remove the unnecessary hierarchy.

* [Profiling] Simplify cost calculation (#104816)

* [Profiling] Add the number of cores to HostMetadata

* Update AWS pricelist (remove cost_factor, add usd_per_hour)

* Switch cost calculations from 'cost_factor' to 'usd_per_hour'

* Remove superfluous CostEntry.toXContent()

* Check for Number type in CostEntry.fromSource()

* Add comment

* Retry get_from_translog during relocations (#104579)

During a promotable relocation, a `get_from_translog` sent by the
unpromotable  shard to handle a real-time get might encounter
`ShardNotFoundException` or  `IndexNotFoundException`. In these cases,
we should retry.

This is just for `GET`. I'll open a second PR for `mGET`.  The relevant
IT is in the  Stateless PR.

Relates ES-5727

* indicating fix for 8.12.1 for int8_hnsw (#104912)

* Removing the assumption from some tests that the request builder's request() method always returns the same object (#104881)

* [DOCS] Adds get setting and update settings asciidoc files to security API index (#104916)

* [DOCS] Adds get setting and update settings asciidoc files to security API index.

* [DOCS] Fixes references in docs.

* Reuse APMMeterService of APMTelemetryProvider (#104906)

* Mute more tests that tend to leak searchhits (#104922)

* ESQL: Fix SearchStats#count(String) to count values not rows (#104891)

SearchStats#count incorrectly counts the number of documents (or rows)
 in which a document appears instead of the actual number of values.
This PR fixes this by looking at the term frequency instead of the doc
 count.

Fix #104795

* Adding request source for cohere (#104926)

* Fixing a broken javadoc comment in ReindexDocumentationIT (#104930)

This fixes a javadoc comment that was broken by #104881

* Fix enabling / disabling of APM agent "recording" in APMAgentSettings (#104324)

* Add `type` parameter support, for sorting, to the Query API Key API (#104625)

This adds support for the `type` parameter, for sorting, to the Query API key API.
The type for an API Key can currently be either `rest` or `cross_cluster`.
This was overlooked in #103695 when support for the `type` parameter
was first introduced only for querying.

* Apply publish plugin to es-opensaml-security-api project (#104933)

* Support `match` for the Query API Key API (#104594)

This adds support for the `match` query type to the Query API key Information API.
Note that since string values associated to API Keys are mapped as `keywords`,
a `match` query with no analyzer parameter is effectively equivalent to a `term` query
for such fields (e.g. `name`, `username`, `realm_name`).

Relates: #101691

* [Connectors API] Relax strict response parsing for get/list operations (#104909)

* Limit concurrent shards per node for ESQL (#104832)

Today, we allow ESQL to execute against an unlimited number of shards 
concurrently on each node. This can lead to cases where we open and hold
too many shards, equivalent to opening too many file descriptors or
using too much memory for FieldInfos in ValuesSourceReaderOperator.

This change limits the number of concurrent shards to 10 per node. This 
number was chosen based on the _search API, which limits it to 5.
Besides the primary reason stated above, this change has other
implications:

We might execute fewer shards for queries with LIMIT only, leading to 
scenarios where we execute only some high-priority shards then stop. 
For now, we don't have a partial reduce at the node level, but if we
introduce one in the future, it might not be as efficient as executing
all shards at the same time.  There are pauses between batches because
batches are executed sequentially one by one.  However, I believe the
performance of queries executing against many shards (after can_match)
is less important than resiliency.

Closes #103666

* [DOCS] Support for nested functions in ES|QL STATS...BY (#104788)

* Document nested expressions for stats

* More docs

* Apply suggestions from review

- count-distinct.asciidoc
  - Content restructured, moving the section about approximate counts to end of doc.

- count.asciidoc
  - Clarified that omitting the `expression` parameter in `COUNT` is equivalent to `COUNT(*)`, which counts the number of rows.

- percentile.asciidoc
  - Moved the note about `PERCENTILE` being approximate and non-deterministic to end of doc.

- stats.asciidoc
  - Clarified the `STATS` command
  -  Added a note indicating that individual `null` values are skipped during aggregation

* Comment out mentioning a buggy behavior

* Update sum with inline function example, update test file

* Fix typo

* Delete line

* Simplify wording

* Fix conflict fix typo

---------

Co-authored-by: Liam Thompson <leemthompo@gmail.com>
Co-authored-by: Liam Thompson <32779855+leemthompo@users.noreply.github.com>

* [ML] Passing input type through to cohere request (#104781)

* Pushing input type through to cohere request

* switching logic to allow request to always override

* Fixing failure

* Removing getModelId calls

* Addressing feedback

* Switching to enumset

* [Transform] Unmute 2 remaining continuous tests: HistogramGroupByIT and TermsGroupByIT (#104898)

* Adding ActionRequestLazyBuilder implementation of RequestBuilder (#104927)

This introduces a second implementation of RequestBuilder (#104778). As opposed
to ActionRequestBuilder, ActionRequestLazyBuilder does not create its request
until the request() method is called, and does not hold onto that request (so each
call to request() gets a new request instance).
This PR also updates BulkRequestBuilder to inherit from ActionRequestLazyBuilder
as an example of its use.

* Update versions to skip after backport to 8.12 (#104953)

* Update/Cleanup references to old tracing.apm.* legacy settings in favor of the telemetry.* settings (#104917)

* Exclude tests that do not work in a mixed cluster scenario (#104935)

* ES|QL: Improve type validation in aggs for UNSIGNED_LONG and better support for VERSION (#104911)

* [Connector API] Make update configuration action non-additive (#104615)

* Save allocating enum values array in two hot spots (#104952)

Our readEnum code instantiates/clones enum value arrays on read.
Normally, this doesn't matter much but the two spots adjusted here are
visibly hot during bulk indexing, causing GBs of allocations during e.g.
the http_logs indexing run.

* ESQL: Correct out-of-range filter pushdowns (#99961)

Fix pushed down filters for binary comparisons that compare a
byte/short/int/long with an out of range value, like
WHERE some_int_field < 1E300.

* [DOCS] Dense vector element type should be float for OpenAI (#104966)

* Fix test assertions (#104963)

* Move functions that generate lucene geometries under a utility class (#104928)

We have functions that generate lucene geometries scattered in different places of the code. This commit moves 
everything under a utility class.

* fixing index versions

---------

Co-authored-by: Simon Cooper <simon.cooper@elastic.co>
Co-authored-by: Chris Hegarty <62058229+ChrisHegarty@users.noreply.github.com>
Co-authored-by: David Turner <david.turner@elastic.co>
Co-authored-by: Jim Ferenczi <jim.ferenczi@elastic.co>
Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>
Co-authored-by: Navarone Feekery <13634519+navarone-feekery@users.noreply.github.com>
Co-authored-by: Ignacio Vera <ivera@apache.org>
Co-authored-by: Tim Rühsen <tim.ruehsen@gmx.de>
Co-authored-by: Pooya Salehi <pxsalehi@users.noreply.github.com>
Co-authored-by: Keith Massey <keith.massey@elastic.co>
Co-authored-by: István Zoltán Szabó <istvan.szabo@elastic.co>
Co-authored-by: Moritz Mack <mmack@apache.org>
Co-authored-by: Costin Leau <costin@users.noreply.github.com>
Co-authored-by: Jonathan Buttner <56361221+jonathan-buttner@users.noreply.github.com>
Co-authored-by: Albert Zaharovits <albert.zaharovits@elastic.co>
Co-authored-by: Mark Vieira <portugee@gmail.com>
Co-authored-by: Jedr Blaszyk <jedrazb@gmail.com>
Co-authored-by: Nhat Nguyen <nhat.nguyen@elastic.co>
Co-authored-by: Abdon Pijpelink <abdon.pijpelink@elastic.co>
Co-authored-by: Liam Thompson <leemthompo@gmail.com>
Co-authored-by: Liam Thompson <32779855+leemthompo@users.noreply.github.com>
Co-authored-by: Przemysław Witek <przemyslaw.witek@elastic.co>
Co-authored-by: Joe Gallo <joe.gallo@elastic.co>
Co-authored-by: Lorenzo Dematté <lorenzo.dematte@elastic.co>
Co-authored-by: Luigi Dell'Aquila <luigi.dellaquila@gmail.com>
Co-authored-by: Armin Braun <me@obrown.io>
Co-authored-by: Alexander Spies <alexander.spies@elastic.co>
Co-authored-by: David Kyle <david.kyle@elastic.co>

Labels

:Analytics/Geo >refactoring Team:Analytics v8.13.0