Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Proposal]: Low Level Implementation Details for Geo Shape Aggregation and future Geo related Features #92

Closed
navneet1v opened this issue Jul 14, 2022 · 6 comments
Assignees
Labels
Design Design Proposal enhancement New feature or request

Comments

@navneet1v
Copy link
Collaborator

navneet1v commented Jul 14, 2022

Introduction

This proposal talks about the low level implementation details for the Geo Shape aggregations. We will be covering what are the components/classes/interfaces required for the Geo Shape aggregations and what should be the right place to put those components.

In whole proposal we will be using components, classes, interfaces terms interchangeably. As we will be focusing on the low level implementation details it is expected that reader is already aware that how indexing and query works on GeoShape in OpenSearch, what is a doc value and how it is used in the aggregation framework.

Goal

Goal of proposal is to answer the following questions.

Note: Module here refer to the different libs, modules folders provided in the OpenSearch core repo.

P0

  1. Which module we should use to write the Geo Shape Aggregation level interfaces? or should we directly go with the Geo-Spatial repo?
  2. Which module we should use to write the Value Source boiler plate code for converting Doc Values back to specific classes for GeoShape, as they have some dependency between classes already present like CoreValueSourceType etc?

P1

  1. What should be the long term vision of Geo team around what is the right place to put the code related to Geo space? The code involves Geo FieldMapper(Geo Point and Geo Shape) used for indexing and querying, Geo Aggregations(both older and newer geo aggregations both on Point and Shape) and Value Source boiler plate code for converting Doc Values back to specific classes.
  2. How we should do the migration of the code mentioned in #1 to the module we decide to move?

Assumptions

  1. The XYPoint and XY shape related features will be added in the Geo-Spatial repo and it will be keep on happening.

Why we need to answer the questions above?

The answer to these questions will help us do the following:

  1. As there will be a lot of common code between the aggregations and we will be abstracting that logic out, hence we need to make sure that we are abstracting them in the right place so that this is not a throw away work.
  2. With the release of XY Point and XY shape in the Geo-Spatial plugin, we need to deicide in long term where we want to put all our Geo related features(in a backward compatible fashion), how we want to provide them to customers.
  3. It will help us provide better estimate on the timelines, efforts for the launch of Geo Shape aggregations and the other features in the future like(GeoLineAggregation, GeoHexAggregations, Aggregations on XY Point and XY Shape).

Current Architecture

As of version 2.1 of OpenSearch the Geo related aggregations(which contains only GeoPoint aggregations) are present in the OpenSearch core repo. There are no aggregations present on the Geo Shape. The Field mapper which init the GeoShape mapper is initialized via the Geo module present in the OpenSearch repo.

We have created a new repo named Geo-Spatial which host GeoJson related features. The XY Point and XY shape indexing and querying functionalities are getting developed and will be released as part of 2.3 release(tentative). The intent of that repo was to provide and encapsulate all the geo-related features via that

New Aggregations which we are building

In the current roadmap we have 4 different aggregations that we will be implementing on the Geo Shape. Below are those aggregations.

  1. GeoBounds
  2. GeoCentroid
  3. GeoTile
  4. GeoHashGrid

To provide more context these aggregations are already implemented for the GeoPoint field. We are also working on planning on implementing the Geo Line Aggregation, GeoHex Gird(customer has already reached out for this) which are already supported in Elastic Search.

Proposed Solution

The solution is to move the Aggregations, Indexing and Query for both Geo Point and GeoShape to the Geo-Spatial plugin provided via Geo-Spatial Repo. As this is breaking changes, hence we are proposing a phased approach, which make sure that we are incrementally progressing to the final goal.

Phase 1

Step1: One by one move aggregation(4 different classes for aggregations + Tests) to the Geo modules folder in the OpenSearch core. The aggregation that we will move will be the one which we are developing over the GeoShape.

Why we need this? Or what is the Pros about it?
The code between the aggregations for GeoPoint and GeoShape is same and we can abstract common logic between GeoPoint and GeoShape rather than duplicating them.

Will there be customer Impact?
No, there will be no customer impact, as these classes are directly build with the OpenSearch core min distribution.

What will happen if we don’t do this?
We either need to copy the same code/logic from GeoPoint Aggregations to GeoShape aggregations or we will be creating more interfaces in the server folder of OpenSearch which will needs to be migrated later on.

Step2: Abstract out the common logic for the GeoShape and GeoPoint aggregation. Allow both the aggregation to depend on this logic.

Why we need this? Or what is the Pros about it?
This will abstract out the common logic and we can reuse the logic across aggregations

Will there be customer Impact?
No, as the aggregations will still be built in min distribution of OpenSearch.

Repeat the above 2 steps for each Aggregation that we are launching for GeoShape. This will make sure that we are moving towards right abstraction.

Step3: Move the FieldMappers, Queries and ValueSource related code(Both GeoShape and GeoPoint), Integration Tests etc. to the Geo module once all the Aggregations have moved to the Geo Module.

Why we need this? Or what is the Pros about it?
This will abstract out the common logic and we can reuse the logic across aggregations

Will there be customer Impact?
No, as the aggregations will still be built in min distribution of OpenSearch.

Phase 2

Step1: Build the proposal to move the Geo Module to Geo-Spatial Repo as a separate plugin or under a GeoSpatial Plugin. Analyze the impact and community feedback around what is the best way to provide the backward compatibly and seamless migration.

Why we need this? or what we want to achieve on this?
This will provide us the insights around what community wants and what extra support will be required from our side for the seamless upgrade.

Step2: Move the actual code from the Geo Module to Geo-Spatial Repo and release it via OpenSearch major version(exact version depends on various factors) release cycle as a breaking change.
On a high level this will be a simple movement of code from Geo Module to Geo-Spatial plugin. The complexity will arise around how customer will migrate. Given that this will be done in major release of OpenSearch(given that it is breaking change), we will use the feedbacks provided in step-1 to give customer all the tools for doing backward compatibility. Example:

  1. We can release a distribution of open search with Geo-spatial plugin already installed.
  2. We can build on top of the Extensions framework which is currently being worked upon OpenSearch.Issue

Below are the pros and cons of the above solution:

Pros:

  1. No change for customers till we go the phase-2 of the solution.
  2. No confusion in customer around from which version they need the GeoSpatial plugin for the geo based aggregations and when they will get the aggregations out of the box from OpenSearch min distribution.
  3. Code remains in 1 single repo always which provides easier debugging, development experience and release experience.

Cons:

  1. Till we have moved the aggregations to the Geo-Spatial repo we need to depend on the OpenSearch Core infrastructure, PR reviewers etc, which can increase the development time, as it is a fast moving repo. Seeing that we have many new aggregations to build this can have large overall impact on development timeline.
  2. Difficulties in back-port as the main and 2.x version of repo has breaking changes as compared to Geo-Spatial repo which has less or no breaking changes.

Alternatives

Alternative 1

Provide Aggregations on GeoShape as a part of Open search min distribution by directly implementing the GeoShape aggregations in the OpenSearch core itself.(Don’t move aggregations to even geo module in the core repo).
Pros:

  1. Code remains in 1 single repo always which provides easier debugging and development experience.
  2. No upgrade required for customers till phase-1.
  3. Customers will start to get out of box aggregations support on geo-shape. Plus they will have proper upgrade paths and SOP when we move the code to plugin.

Cons:

  1. Difficulties in back-port as the main and 2.x version of repo has breaking changes as compared to Geo-Spatial repo which has less or no breaking changes.
  2. We need to depend on the OpenSearch Core infrastructure, PR reviewers etc, which can increase the development time, as it is a fast moving repo. Seeing that we have many new aggregations to build this can have larger impact on development timeline.
  3. There can be potential changes required when we start implementing the aggregations over XY shape and Point.

The cons provided here can be removed working with the OpenSearch team to discuss on the dedicated reviewers and bandwidth for PRs. Also, as we move towards moving the code to geo-module we will have zero to no conflicts as apart from geo-spatial team no one will be working on that in best case. On con #3 as of now we don’t have any plan and customer use-case. Even if we do have that we can still use the common interfaces that we have created in the Geo module.

Overall the solution do have some cons which can create some delays and delayed launches but this is the most customer obsessed way.

Alternative 2

Create the new aggregations for GeoShape directly in the Geo Spatial repo, do all the refactoring required in core OpenSearch repo to avoid code duplication. As a backlog task move the Aggregations(Geo Point) to new repo and keep adding new features on Geo to the Geo-Spatial repo only.

Pros:

  1. Easy Upgrade: As fewer customers will be there who won’t have installed the Geo-Spatial plugin when we move all our aggregations to Geo-Spatial plugin.
  2. Easy to back-port and release the patch versions.
  3. Faster release cycles, reduction in PR review time, merges and lesser merge conflicts etc.

Cons:

  1. Customer Confusion: It can cause confusion to the customer to understand as the aggregations are divided in the plugins and core repo.
  2. We can end up modifying the code in core repo more often than not to support the new features and interfaces that we want to build in Geo-Spatial repo. Even small changes can end up touching more than 1 repo, which will create problems of backward compatibility, testing, developer frustration.
  3. As of now, we have no input from community whether we should move the aggregations to Geo-Spatial repo or keep them in the OpenSearch core.

Alternative 3

Create the new aggregations for GeoShape directly in the Geo Spatial repo, do all the refactoring required in core OpenSearch repo to avoid code duplication. Keep adding new features on the Geo Spatial repo only.
Pros:

  1. No upgrade required and no confusion as customer who wants to use new aggregations just need to install the right plugin.
  2. Easy to back-port and release for the patch versions.
  3. Faster release cycles, reduction in PR review time, merges and lesser merge conflicts etc.

Cons:

  1. We can end up modifying the code in core repo more often to support the new features and interfaces that we want to build in Geo-Spatial repo. Even small changes can end up touching more than 1 repo, which will increase the developer frustration for release.

Feedback Required

The above sections provide Phased Approach as the proposed solution, but we really want to focus on the Alternative-2 as well. It would be great to know community thinks about the Alternative-2 and should we do that instead of phased approach which is long and time consuming.

Appendix

Frequently Asked Question

Q: If after step 1 of phase 2 we found out that community doesn’t want yet another plugin for Geo related indexing and querying then will phase 1 effort be in vain?
No it won’t be in vain as we would have developed the interfaces that can be used in the XY aggregations, we would have carved out the geo related code to module which can evolve of its own, even when it is with the Core repo.

Useful Links:

  1. https://github.com/opensearch-project/OpenSearch/
  2. https://github.com/opensearch-project/geospatial/
  3. [RFC] - Implement Aggregations on Geo Shape Field #84
@navneet1v navneet1v added enhancement New feature or request Design Design Proposal labels Jul 14, 2022
@navneet1v navneet1v self-assigned this Jul 14, 2022
@nknize
Copy link

nknize commented Jul 15, 2022

Some quick initial feedback:

Which module we should use to write the Geo Shape Aggregation level interfaces? or should we directly go with the Geo-Spatial repo?

geo_shape is already in the core as a "primitive" type; as are the geo_* aggregations that work on geo_point only. For this reason, the Shape DocValue based ValuesSource should go into core along side the existing FieldMapper and indexing logic.

Which module we should use to write the Value Source boiler plate code for converting Doc Values back to specific classes for GeoShape, as they have some dependency between classes already present like CoreValueSourceType etc?

I think we can start by doing this in the existing FiedData package? Longer term I'd like to separate all of the core geo logic into a core geo module; but that will likely require pulling some of the aggregation framework into a separate module? (more research needed here)

The solution is to move the Aggregations, Indexing and Query for both Geo Point and GeoShape to the Geo-Spatial plugin provided via Geo-Spatial Repo.

I think we should go with Alternative 1 and keep the current aggregations and geo_ "primitives" (geo_shape and geo_point) in core and introduce the shape counterparts in the geo plugin (along w/ future enhancements like projection support). Refactoring to the geo plugins will be a breaking change for those that don't already have the geo plugin installed. Another idea would be to solicit community feedback.

@navneet1v
Copy link
Collaborator Author

Hi @nknize

I think we should go with Alternative 1 and keep the current aggregations and geo_ "primitives" (geo_shape and geo_point) in core and introduce the shape counterparts in the geo plugin (along w/ future enhancements like projection support). Refactoring to the geo plugins will be a breaking change for those that don't already have the geo plugin installed. Another idea would be to solicit community feedback.

When you that introduce the shape related counterparts in Geo plugin you mean Geo-Spatial repo right? Is this understanding correct?

@nknize
Copy link

nknize commented Jul 15, 2022

Is this understanding correct?

Yes

@navneet1v
Copy link
Collaborator Author

Is this understanding correct?

Yes

Then I guess you mean the alternative 2 not the alternative 1 in your older response. Because in alternative 1 we are keeping everything(new aggregations on the Geo_Shape) in the core itself.

Please let me know if I am missing something.

@navneet1v navneet1v pinned this issue Jul 15, 2022
@navneet1v navneet1v unpinned this issue Jul 15, 2022
@navneet1v
Copy link
Collaborator Author

Hi,
After discussing with @nknize and @vamshin we agreed on the proposed solution and agreed upon pros provided in the proposal. Here are few highlights of the discussion:

  1. We will be moving GeoPoint's aggregation to the GeoModule first to complete the phase-1 for the aggregations.
  2. The new aggregations that we will build on the GeoPoint will still go in the GeoModule. We are planning to create GeoHex Aggregation on Geo Points.
  3. The GeoShape based aggregations will go directly in the GeoModule.
  4. The aggregations on the XY Point and XY shape will be the part of the Geo Spatial Plugin.

Once we have all the aggregation in the GeoModule for the GeoShape and GeoPoint we will evaluate further on the migration plan to move the aggregations from Geo Module to the GeoSpatial repo. This will be the phase-2.

@navneet1v
Copy link
Collaborator Author

Updating the proposal point number 2 on top.
The new aggregations that we are adding will be part of GeoSpatial repo and will not be part of Geo Module folder in OpenSearch. We are calling all the older aggregations as primitive aggregations(Geobounds, geohash, geotile, geocentroid). Hence for geo_shape these should be part of geo module only.

Pros:

  1. Eventually we need to move everything to GeoSpatial so somehow the effort is saved.
  2. Its new aggregation and customers in future will have less breaking change when we move everything to GeoSpatial plugin..

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Design Design Proposal enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

2 participants