
[META] Split and modularize the server monolith #8110

Open · 9 tasks
nknize opened this issue Jun 16, 2023 · 27 comments
Labels: distributed framework, enhancement, Meta, Roadmap:Modular Architecture, WIP

Comments

@nknize (Collaborator) commented Jun 16, 2023

Related modularization discuss issue (more than just JPMS support): #5910
Related JPMS issue: #1588

[WIP] This meta issue tracks the scope for refactoring the monolithic :server module into separable libraries to support JPMS and serverless or cloud native implementation extensions.

Phase 0 - JPMS Support (Eliminate top level split packages, see #1838)

Phase 1 - Decoupling to support serverless and cloud native implementation extensions

  • o.o.env - refactor classes to o.o.core.env in :libs:opensearch-core (dependency of o.o.bootstrap)
  • o.o.common.settings - refactor classes to o.o.core.common.settings in :libs:opensearch-core (dependency of o.o.env)
  • o.o.cluster.metadata - refactor select classes to a new :libs:opensearch-cluster library

Resulting libraries:

:libs:opensearch-common
:libs:opensearch-core
:libs:opensearch-cluster

relates #1838

nknize added the enhancement, Meta, WIP, and untriaged labels on Jun 16, 2023
nknize changed the title from "[META] Refactoring the server monolith" to "[META] Split and modularize the server monolith" on Jun 16, 2023
@nknize (Collaborator, Author) commented Jun 16, 2023

/cc @rishabh6788 this work has largely been advanced by me over the last few months and is reaching some good breakthrough milestones. The PR that teases the Streamable classes (e.g., StreamInput, StreamOutput, Writeable, NamedWriteable) and the OpenSearchException class out of the tightly coupled :server module into separable libraries is up. This unlocks a big portion of :server. The meta issue here seeks to resolve the remaining split packages across server, as outlined in #1838, which you opened a while ago. A separate goal of #5910 is to ensure as many of the existing extensible classes as possible are in the right library and package name so that we can thoughtfully expose or restrict extension points to downstream plugin consumers.

If you have some time I'd love for you to jump in and help review the PRs.

@andrross (Member):

Thanks @nknize! I want to start the discussion on expediting the completion of phase 0 (elimination of packages split across jars) AND backporting it to the 2.x branch. The current state, where there is major structural divergence between main and 2.x, is undesirable because it causes conflicts on backports. Given that plugins also maintain a branch developing against OpenSearch main, I think they are dealing with the same problem. Also, plugins are not binary-compatible across versions anyway: they must at a minimum update version numbers and recompile. I think it is easier for everyone to complete the structural refactoring quickly and do it across both branches (but not near a release, so as not to add extra variables while teams are completing integration for that release).

@reta @dblock What do you think?

@reta (Collaborator) commented Jun 16, 2023

I think it is easier for everyone to complete the structural refactoring quickly and do it across both branches (but not near a release, so as not to add extra variables while teams are completing integration for that release).

@andrross Agree 💯. The elimination of package splits would be a huge win toward getting JPMS done, though for 2.x this is going to be a 100% breaking change (one that goes way beyond @opensearch.internal). Would we selectively keep the structural differences between 2.x and 3.x+?

@nknize (Collaborator, Author) commented Jun 16, 2023

Would we selectively keep the structural differences between 2.x and 3.x+ ?

I think we should look at that on a case by case basis? Let's lean into progress not perfection here so we can iterate quickly.

I also think we shouldn't have FUD around breaking Java signatures and package names in minor releases at this juncture (as long as we don't do it too close to a release). The codebase isn't well defined enough for that (hence @opensearch.internal marked on the majority of the codebase). Once we have JPMS and we start defining exports in module-info.java, we can use that as the mechanism to collectively refine and simplify the Java surface area we want to expose outside core, and guarantee non-breaking changes across minors on those classes. The only things we strictly guarantee as non-breaking at this point are the DSL and index encodings (which are heavily enforced by Lucene anyway). #7998 is a nice next-step utility for quickly and iteratively checking which downstream plugins are affected by a change while we're working on the refactor. It doesn't mean we have to open a PR or even fix the breaking change, but it does enable a developer (like me) to quickly check the impact and include that in a PR description without reviewers having to do it themselves.

@andrross (Member):

I think we should look at that on a case by case basis?

I agree with a case-by-case basis. The value here of getting rid of a blocker to adopting JPMS is clear, and the "breakages" will be super simple to resolve, since it is almost exclusively a matter of fixing imports (and the plugins maintaining a version against main have to resolve these issues anyway), so there's a compelling case for doing the items marked as phase 0 now.

I also agree that it is not practical to be 100% strict on not changing any Java APIs given how tightly coupled the code bases are.

@nknize (Collaborator, Author) commented Jun 16, 2023

Semi-related: I opened an issue for someone to add useful Java annotations we could use to enforce backwards compatibility. Not a requirement for this, but another mechanism that would be helpful once we start carving out the publicly exposed Java classes through module-info.java.

@dblock (Member) commented Jun 20, 2023

@reta @dblock What do you think?

The extensibility project (#2447) proposed that we first make plugins depend on an SDK, reducing the "N plugins : 1 core" dependency tree to "N plugins : 1 SDK : 1 core", thereby avoiding this pain (only the SDK needs to change when core changes). We are at an experimental release for 2.9, but it looks like we're not choosing to wait for that to deliver, and we aren't going to try to port plugins to extensions as a prerequisite. I don't have any solutions or workarounds for what the downstream dependencies are experiencing or will experience; you just can't both refactor and not cause disruption.

Given that, I would keep doing what @nknize is doing, and GA 3.0 as soon as possible to stop the pain of backports to 2.x.

The only other idea I have is to make extensions work as plugins (in-proc) so that we can still cut off the direct dependency on core. Finally, if we go the SDK route, I am not sure what additional benefits core having all this modularity brings other than to make it easier to work inside of it. Here's my original comment to the refactoring proposal along those lines: #5910 (comment).

@rishabh6788 (Contributor):

/cc @rishabh6788 this work has largely been advanced by me over the last few months and is reaching some good breakthrough milestones. The PR that teases the Streamable classes (e.g., StreamInput, StreamOutput, Writeable, NamedWriteable) and the OpenSearchException class out of the tightly coupled :server module into separable libraries is up. This unlocks a big portion of :server. The meta issue here seeks to resolve the remaining split packages across server, as outlined in #1838, which you opened a while ago. A separate goal of #5910 is to ensure as many of the existing extensible classes as possible are in the right library and package name so that we can thoughtfully expose or restrict extension points to downstream plugin consumers.

If you have some time I'd love for you to jump in and help review the PRs.

Tagging the correct Rishabh: @rishabhmaurya

@nknize (Collaborator, Author) commented Jun 20, 2023

...GA 3.0 as soon as possible to stop the pain of backports to 2.x.

+1 to work towards 3.0 GA

Backport pain will always exist within a repo when a namespace changes.

...first make plugins depend on an SDK, reducing the "N plugins : 1 core" dependency tree to "N plugins : 1 SDK : 1 core", thereby avoiding this pain

A big gap in this plan is that the SDK still transitively exports all of the core classes to "downstream" plugins, defeating the strong encapsulation provided by JPMS (which is the primary objective of phase 0). This means core isolation through the SDK is not achieved, and plugins can bypass the SDK and override core logic. This is exacerbated by the SDK itself not providing JPMS support. Opening RFCs to deprecate core mechanisms like ThreadContext is a side effect we can eliminate with JPMS support. The near-term pain (reduced by good DevOps mechanisms like checkCompatibility) is, I think, worth it to achieve the strong encapsulation we need to make the SDK truly isolate the core classes.

These efforts are far better together.

@andrross (Member):

GA 3.0 as soon as possible to stop the pain of backports to 2.x

Major version upgrades are disruptive to all users (including users who are not plugin developers). I have a hard time justifying the disruption to the community at large that a major version release brings on the strength of the refactoring changes here.

I don't have any solutions or workarounds to what the downstream dependencies are experiencing or will experience, you just can't both refactor and not cause disruption.

Assuming that downstream dependencies are maintaining a version against main and against 2.x, then they are already experiencing the disruption, and the way to resolve that disruption is to backport the refactorings and bring the branches in line with each other. Any community plugins (outside of the opensearch-project) that are only maintaining a version against the 2.x branch are blissfully unaware of these structural changes and would be disrupted by the backports. But we don't truly support semver with plugins today anyway because plugins have to make code changes and recompile for all releases. We should still be thoughtful about the "breaking" changes that we backport, but I'm trying to make the case that the changes in phase 0 here (structural that are trivial for a consumer to resolve) are worth it on balance.

@nknize (Collaborator, Author) commented Jun 21, 2023

Once Phase 0 is complete we can selectively export the specific packages we want downstreams to be able to import and extend. Those we don't can be refactored to an .internal namespace that is not exposed. Technically this completely removes the need for OpenSearch-SDK, because the SDK is already defined and implemented through the core (libraries and modules). To make life easier we could just move those exported API classes, along with the existing SDK extension classes, into new .sdk. sub-packages in the existing core libraries. This enables us to finally provide enforceable API backwards compatibility, and the breaking changes will be all but gone (not completely eliminated, because there may be some corner cases). This will also enable us to begin working on actually supporting semver in the public API. It greatly simplifies the extensibility implementation because it reduces the dependency tree further, from "N plugins : 1 SDK : 1 core" to "N plugins : 1 core w/ SDK".
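The selective-export idea above can be sketched in a module descriptor. This is a hedged illustration only: the module name, package names, and export list are hypothetical, not a committed layout.

```java
// Hypothetical module descriptor for :libs:opensearch-core.
// Package names are illustrative, not the committed layout.
module org.opensearch.core {
    // Packages downstreams may import and extend (the exported "SDK" surface):
    exports org.opensearch.core.common.settings;
    exports org.opensearch.core.env;

    // Anything moved to an .internal namespace simply gets no 'exports'
    // clause, so plugins cannot compile against it at all.
}
```

Because only exported packages are visible, backwards compatibility would then only need to be enforced on the exported surface, not the entire codebase.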

Note too that ML Commons, common-utils, and parts of security can move to core libraries, further simplifying cross-plugin dependencies. The one "gotcha" (blocker) is that code implemented in Kotlin should be refactored to Java before bringing it into core, and we should enforce no Kotlin code in the core libraries.

@dblock (Member) commented Jun 21, 2023

A big gap in this plan is that the SDK still transitively exports all of the core classes to "downstream" plugins, defeating the strong encapsulation provided by JPMS (which is the primary objective of phase 0).

This is only the current implementation at build time, not at runtime, to go faster in dev. At runtime extensions do not depend on core transitively.

@dblock (Member) commented Jun 21, 2023

Once Phase 0 is complete we can selectively export the specific packages we want downstreams to be able to import and extend. Those we don't can be refactored to an .internal namespace that is not exposed. Technically this completely removes the need for OpenSearch-SDK, because the SDK is already defined and implemented through the core (libraries and modules).

This is misleading. You still have jar hell. The SDK doesn't just move classes around or annotate them: it introduces a proper r/w API, allows an extension to have compatibility across multiple (major) versions, and allows fully remote extensions on a separate JVM runtime that is not core and brings none of the latter along. You can do semver with an SDK instead of doing semver across the massive core.

@nknize (Collaborator, Author) commented Jun 21, 2023

This is only current implementation for build time...

Build time is exactly where all of the core classes are exposed such that a developer can override core logic.

At runtime extensions do not depend on core transitively.

At runtime the classloader needs the core classes directly (e.g., ClusterState for SDKClusterService); without the runtime dependency you'd get a ClassNotFoundException. This is what modularity prevents.

@nknize (Collaborator, Author) commented Jun 21, 2023

This is misleading. You still have jar hell.

I don't understand what jar hell has to do with this. JPMS is about not exposing packages to downstreams, not about hijacking classes with the same package and class name.

...it introduces a proper r/w API,

A proper r/w API is about strong encapsulation: not allowing downstreams to access classes you don't want them to hijack at runtime. Take ThreadContext right now. Without JPMS, any downstream can import ThreadContext and use its public methods, including stashContext, to hijack the context. This is dangerous (unknowingly or not). Enter JPMS. For downstreams to even see the class in their IDE, the core codebase would have to explicitly declare exports org.opensearch.common.util.concurrent;. Without that, downstreams can't even import the class because the package is not visible (the specific error: "java: package org.opensearch.common.util.concurrent is not visible"). To achieve the same without JPMS would require complete logic refactoring to prevent stashing the context (e.g., creating an Immutable version, etc.). JPMS simplifies this encapsulation that we're essentially trying to duplicate in the SDK.
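To make the ThreadContext example concrete, here is a hedged sketch of the mechanics (the module name and export list are hypothetical):

```java
// Hypothetical module descriptor for the server module. With this
// descriptor in place, a plugin that tries
//     import org.opensearch.common.util.concurrent.ThreadContext;
// fails at compile time with:
//     java: package org.opensearch.common.util.concurrent is not visible
// because that package does not appear in the exports list.
module org.opensearch.server {
    exports org.opensearch.common.settings; // visible to downstreams

    // No 'exports org.opensearch.common.util.concurrent;' clause,
    // so ThreadContext (and stashContext) stays hidden by omission.
}
```

The enforcement happens at both compile time and runtime: the module system refuses the access rather than relying on convention or annotations.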

In the context of an SDK, I encourage anyone confused about how refactoring to support JPMS achieves a proper r/w API with strong encapsulation to read JEP 403 and the Project Jigsaw documentation.

You can do semver with an SDK instead of doing semver along with massive core.

💯 And you don't need a separate opensearch-sdk to achieve this. Nor do you need a separate SDK to achieve separate JVM processes (which alone doesn't solve the encapsulation issue).

@dblock (Member) commented Jun 21, 2023

The goal of refactoring as proposed is to avoid leaking internals into plugins by establishing a r/w API with strong encapsulation. Even when that is achieved, it will be difficult to have a large ecosystem of plugins, for all the reasons that are not solved by refactoring.

If extensions don't depend on core at all, then this benefit is not needed, and thus the refactoring is unnecessary for that purpose (it has other benefits).

I think we should do both, but I believe that extensions give a lot more benefit a lot quicker, and more incrementally with less disruption: plugins can migrate to extensions gradually, extension authors no longer have to deal with core, and they don't need to release with every new minor version of OpenSearch, making developing plugins/extensions a lot cheaper.

@dbwiddis (Member):

At runtime the classloader needs the core classes directly (e.g., ClusterState for SDKClusterService); without the runtime dependency you'd get a ClassNotFoundException. This is what modularity prevents.

Hey @nknize, just for some context: the SDK's use of ClusterState was (and still is) temporary, to have a value to return for SDKClusterService.state(). However, that was identified as a huge performance impact, and we (mostly @joshpalis) have been working to rewrite it all using REST API calls in opensearch-project/opensearch-sdk-java#674. We still have the cluster state request transport call, but I've proposed removing it in opensearch-project/opensearch-sdk-java#767, and we have to figure out how to handle some calls like opensearch-project/opensearch-sdk-java#354 (which I think @joshpalis worked around when addressing issue 674).

TLDR: we're trying to get rid of it, but we're not there yet.

@dbwiddis (Member):

If extensions don't depend on core at all, then this benefit is not needed, and thus the refactoring is unnecessary for that purpose (it has other benefits).

@dblock In an ideal future world (thinking 3.x) this is the goal for extensions. The only commonality is intended to be request/response classes auto-generated from spec files... and probably some needed XContent parsing classes.

@nknize (Collaborator, Author) commented Jun 21, 2023

If extensions don't depend on core at all....

Even with opensearch-sdk, extensions depend on core. You can't remove (or even minimize) that dependency without a refactor; and even then, opensearch-sdk will take a dependency on the minimal refactor and transitively expose it to downstream consumers. That's what phase 1 of this meta issue is about: refactoring critical-path logic away from the dependency chain into base libraries, such that a downstream only takes a dependency on the base library(ies) needed to implement its logic. For example, if I want to write a simple PCA aggregation as a plugin or extension, I have to take a dependency on opensearch-{opensearch-version} to implement my Aggregator, AggregatorFactory, etc. I have to do this even if it's written as an extension using opensearch-sdk. With the refactor into separate libraries, all I have to do is take a dependency on opensearch-core and opensearch-aggregations and implement the aggregation boilerplate logic. With that separable library (with strong encapsulation) solved by JPMS, what value does turning that aggregation into an extension bring? I think enumerating that here with a concrete example will help identify the key mechanisms provided by the opensearch-sdk implementation.
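The dependency change described above can be sketched in a plugin's build file. This is a hedged illustration: the coordinates are illustrative, and a split `opensearch-aggregations` artifact is proposed here, not published.

```gradle
// Hypothetical build.gradle for a PCA aggregation plugin.

// Today: the plugin compiles against the whole server monolith,
// pulling in every core class transitively.
dependencies {
    implementation "org.opensearch:opensearch:${opensearchVersion}"
}

// After Phase 1: only the libraries the aggregation actually needs.
// 'opensearch-aggregations' is a proposed artifact, not an existing one.
dependencies {
    implementation "org.opensearch:opensearch-core:${opensearchVersion}"
    implementation "org.opensearch:opensearch-aggregations:${opensearchVersion}"
}
```

The two `dependencies` blocks represent before/after alternatives, not a single build script.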

@dblock (Member) commented Jun 21, 2023

Even with opensearch-sdk, extensions depend on core.

That is simply incorrect.

You are right that exposing aggregations through the SDK without a transitive dependency will require a refactor in core.

Re: value of extensions I invite you to read https://github.com/opensearch-project/opensearch-sdk-java#introduction.

@nknize (Collaborator, Author) commented Jun 21, 2023

That is simply incorrect.

AnomalyDetection takes a dependency on core. At least its implementation dependency is (marginally) better than opensearch-sdk's api dependency. Loading the core classes needed at runtime means the classes have to be available to the classloader... hence, "dependency". Let's take another example: AnalysisExtension, which has a slew of core dependency classes, many of which rely on other core classes like o.o.env.Environment.

I invite you to read...

I've read it... now we're in discussion mode. The part I've always liked about extensibility is "our goal is to isolate plugin interactions with OpenSearch". This is achieved by running in a separate DiscoveryNode, but let's be clear: this is starting up a new node. So isolation is being achieved via another node. That's not a bad thing (though it takes some bootstrapping); I just want to ensure we don't oversell extensibility as solving the "sdk" isolation conversation. It still takes dependencies on core. I'd prefer we handle this isolation in core itself and keep the "sdk" API components in core, not in opensearch-sdk.

@rishabhmaurya (Contributor) commented Jun 22, 2023

We probably need to address the Lucene split packages in the server module (https://github.com/opensearch-project/OpenSearch/tree/main/server/src/main/java/org/apache/lucene), as they would start failing once the Lucene jars are present on the module path. (Ignore: I see this is part of Phase 0.)

@dblock (Member) commented Jun 28, 2023

Start the SDK conversation in the core by defining the core API through JPMS module-info.java, not a downstream repo.

Nomenclature-wise, we already have opensearch-core, so maybe calling it the "SDK" is more confusing than anything, given that we created https://github.com/opensearch-project/opensearch-sdk-java and called it the "OpenSearch SDK for Java".

andrross added the Roadmap:Modular Architecture label on May 31, 2024