Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[META] [BUG] Extension-to-OpenSearch connections have multiple bugs #782

Closed
dbwiddis opened this issue May 24, 2023 · 6 comments · Fixed by opensearch-project/OpenSearch#8029
Assignees
Labels
bug Something isn't working meta

Comments

@dbwiddis
Copy link
Member

dbwiddis commented May 24, 2023

What is the bug?

This is a consolidation of multiple bugs and attempts to fix (not yet resolved) that are creating various blockers for @scrawfor99 and @cwperks (Security), @ryanbogan and @joshpalis (Integration testing), and @dbwiddis and @vibrantvarun (performance testing).

Root problem: Extensions need the OpenSearch IP and port to create a transport connection. This is defined/available in multiple places:

  1. The "published" address used by OpenSearch when creating a connection (where it says to connect)
  2. The bind address and port used by OpenSearch (where it is listening) (what we need)
  3. The localNode address passed by OpenSearch to SDK in its initialization request, which may or may not match (1) or (2)
  4. The host and port configured in the extension's settings, which may or may not match (1), (2), or (3)

There are some chicken-and-egg issues as well:

  • The extension needs to be running first before OpenSearch so may not even know the correct host at that time
  • The extension needs the address to send transport requests (connection opened during init) and REST requests (at any later time, and may use a different address/port than transport, also unknown before run time).

How can one reproduce the bug?

Attempt to initialize extensions and establish both Transport and REST connections at other than well-known localhost ports.

Do you have any additional context?

Current status:

  1. OpenSearch communicates its local node to the SDK which might be a correct place to call back, but might not
  2. The SDK overwrites the extension settings with this new address and port without checking its validity (Update OpenSearch host and port on SDKRestClient after init #730) except caused new issues ([BUG] Single-node OpenSearch crashes when connecting to an extension on a remote node #761, [BUG] Extension Initialization invalidates opensearch host/port configurations in the extension.yml #769), prompting:
  3. If the local node address is exactly "127.0.0.1" then it doesn't overwrite the settings in the step above (Add default Opensearch host and port if init comes from localhost #763)
  4. However, in the "connect to local node as extension" it's still using the old port from the request, this got missed being overwritten in 763.
  5. However, that connect to local node as extension is also causing duplicate thread context bugs ([BUG] Duplicate Unique ID setting in OpenSearch transport #771)
  6. And all the above fixes don't handle the case where the OpenSearchNode has the correct address but the incorrect port (and the fact that we need different address/ports for Transport and REST.

What needs to be done to fix this:

  1. Consider eliminating the SDK-to-OpenSearch transport connection entirely ([PROPOSAL] Eliminate all Transport calls from Extensions to OpenSearch #767) to get rid of all these problems. Then we just need to communicate the REST address and port.
  2. Consider overriding address but not port
  3. Consider new settings to separate REST and Transport host and port
  4. Consider forcing address/port settings to not be overwritten using some sort of configurable parameter
  5. Consider finding "truth" in a different location (@saratvemulapalli noted here)
  6. Consider attempting multiple connections at all known sources until one works
@dbwiddis dbwiddis added bug Something isn't working untriaged meta labels May 24, 2023
@dbwiddis
Copy link
Member Author

Action:

@dbwiddis
Copy link
Member Author

Next immediate step: leave the SDKClient settings alone as they are needed for REST client and shouldn't have been updated using Transport. I tried to revert #730 but too many changes have happened since. For now, we can simply remove this line:

extensionsRunner.getSdkClient().updateOpenSearchNodeSettings(extensionInitRequest.getSourceNode().getAddress());

This will allow the ExtensionsInitRequestHandler to (try to) make a Transport Connection using the port information provided in the init request, while keeping the REST connections per the Extension settings, giving us the two different connection parameters.

@ryanbogan
Copy link
Member

The fix worked for @scrawfor99.

@dbwiddis
Copy link
Member Author

Reverting the problematic portion of #730 resolves #769. We are effectively maintaining the two different connection parameters:

  • Extension Settings used for ExtensionsRunner initialization contain the REST endpoint
  • The openSearchNode communicated with the initialization request contains (usually) the Transport endpoints. This still doesn't work in a single-node OpenSearch case, however. Removing the need for the openSearchNode at all is proposed in [PROPOSAL] Eliminate all Transport calls from Extensions to OpenSearch #767.

The Extension Settings solution still doesn't resolve the chicken-and-egg situation of needing to know the OpenSearch REST endpoint prior to starting the extension. To resolve this, @owaiskazi19 is working toward a solution:

  • remove the call to ExtensionsManager initialize() in Node.java
  • trigger that initialize call via a REST endpoint. For example, _extensions/discovery or _extensions/initialize or similar.
  • additionally permit adding extensions not present in extensions.yml by passing _extensions/initialize/{name} and including a body which includes the same parameters as are currently in the .yaml file for each extension.

@joshpalis
Copy link
Member

We can update the OpenSearch REST connections during extension initialization using the environment settings passed to the ExtensionsManager from Node.java here . See the http.port setting below :

{
  "client.type": "node",
  "cluster.initial_cluster_manager_nodes": [
    "runTask-0"
  ],
  "cluster.name": "runTask",
  "cluster.routing.allocation.disk.watermark.flood_stage": "1b",
  "cluster.routing.allocation.disk.watermark.high": "1b",
  "cluster.routing.allocation.disk.watermark.low": "1b",
  "discovery.initial_state_timeout": "0s",
  "discovery.seed_hosts": "127.0.0.1:9300",
  "discovery.seed_providers": "file",
  "http.port": "9200",
  "http.type.default": "netty4",
  "indices.breaker.total.use_real_memory": "false",
  "logger.org.opensearch.action.support.master": "DEBUG",
  "logger.org.opensearch.cluster.coordination": "DEBUG",
  "node.attr.shard_indexing_pressure_enabled": "true",
  "node.attr.testattr": "test",
  "node.name": "runTask-0",
  "node.portsfile": "true",
  "path.data": [
    "/local/home/jpalis/repos/integ/OpenSearch/build/testclusters/runTask-0/data"
  ],
  "path.home": "/local/home/jpalis/repos/integ/OpenSearch/distribution/archives/linux-tar/build/install/opensearch-3.0.0-SNAPSHOT",
  "path.logs": "/local/home/jpalis/repos/integ/OpenSearch/build/testclusters/runTask-0/logs",
  "path.repo": [
    "/local/home/jpalis/repos/integ/OpenSearch/build/testclusters/runTask-0/repo"
  ],
  "path.shared_data": "/local/home/jpalis/repos/integ/OpenSearch/build/testclusters/runTask-0/sharedData",
  "script.disable_max_compilations_rate": "true",
  "transport.port": "9300",
  "transport.type.default": "netty4"
}

OpenSearch defaults its http port to 9200 if not otherwise configured within the opensearch.yml via the http.port setting, so we can check if the setting is present and if not we can also default the extension REST clients to use port 9200. Sending this information to the extension as part of the ExtensionInitRequest will allow us to update the SDKRestClient and also OpenSearchAsyncClient after they have been created using the extension settings within createComponents(). Dynamically updating the rest clients during initialization should help in running integration tests for extensions, as the OpenSearch distribution spun on for integration test clusters randomizes its transport/http ports.

@dbwiddis
Copy link
Member Author

We can update the OpenSearch REST connections during extension initialization using the environment settings passed to the ExtensionsManager from Node.java here .

Nice find!

See the http.port setting below

That's great for the REST client, but you didn't mention this:

"transport.port": "9300"

This doesn't solve for all use cases but is definitely helpful in filling in parts of the puzzle.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working meta
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants