Skip to content

Conversation

@gmjehovich
Copy link
Contributor

Summary

Adds validation to the rename_replacement parameter in the snapshot restore API to prevent resource exhaustion.

Changes

  • Added validation in RestoreSnapshotRequest.validate() to limit rename_replacement UTF-8 byte length to MAX_INDEX_NAME_BYTES (255 bytes)
  • Added integration test testRenameReplacementTooLongRejected() to verify the validation works correctly
  • Imported MAX_INDEX_NAME_BYTES from MetadataCreateIndexService to maintain consistency with index name constraints

Testing

  • Integration test confirms validation rejects overly long values
  • Test confirms valid values (≤255 bytes) are still accepted
  • Verified existing restore functionality is not affected

@gmjehovich gmjehovich added Team:Distributed Coordination Meta label for Distributed Coordination team Team:Distributed Indexing Meta label for Distributed Indexing team :Distributed Coordination/Snapshot/Restore Anything directly related to the `_snapshot/*` APIs auto-backport Automatically create backport pull requests when merged v8.19.8 v9.1.8 v9.2.2 and removed Team:Distributed Coordination Meta label for Distributed Coordination team Team:Distributed Indexing Meta label for Distributed Indexing team labels Nov 11, 2025
@gmjehovich gmjehovich marked this pull request as ready for review November 12, 2025 01:04
@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/es-distributed-coordination (Team:Distributed Coordination)

@elasticsearchmachine
Copy link
Collaborator

Hi @gmjehovich, I've created a changelog YAML for you.

@gmjehovich gmjehovich requested a review from a team November 12, 2025 01:13
Copy link
Member

@ywangd ywangd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking good. I have a suggestion about whether we should further restrict building unnecessarily long strings for renamed index names.

Comment on lines 160 to 169
if (renameReplacement.getBytes(StandardCharsets.UTF_8).length > MAX_INDEX_NAME_BYTES) {
validationException = addValidationError(
"rename_replacement UTF-8 byte length ["
+ byteLength
+ "] exceeds maximum allowed length ["
+ MAX_INDEX_NAME_BYTES
+ "] bytes",
validationException
);
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Due to usage of back-references and escapes, the result index name can be shorter than 255 when renameReplacement is longer than 255. So we could be preventing some legal index names in edge cases. I think it might be OK though since such long index names are very rare.

Copy link
Contributor Author

@gmjehovich gmjehovich Nov 12, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, I hadn't considered this, it is a good point.

It seems to me like the easiest way to get around this would be to add a multiplier to make the validation more permissive for back-references/escapes, something like

            if (renameReplacement.getBytes(StandardCharsets.UTF_8).length > MAX_INDEX_NAME_BYTES * 2) {

The multiplier itself is kind of arbitrary, but it provides some breathing room for the cases you brought up. Let me know what you think.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah. We could do something along that line. I'd prefer to be a bit more strict, e.g. allowing an extra buffer of 10 chars.

RestoreSnapshotRequest request = new RestoreSnapshotRequest(TimeValue.THIRTY_SECONDS, "repo", "snapshot");
request.indices("b".repeat(255));
request.renamePattern("b");
request.renameReplacement("1".repeat(10_000_000));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can random the length from 256 to 10_000_000

Comment on lines 1153 to 1160
RestoreSnapshotResponse restoreResponse = client().admin()
.cluster()
.prepareRestoreSnapshot(TEST_REQUEST_TIMEOUT, repoName, snapshotName)
.setIndices(indexName)
.setRenamePattern("b+")
.setRenameReplacement("a".repeat(255))
.setWaitForCompletion(true)
.get();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we test that a restore with rename pattern b fails expectedly, i.e. it does not OOM? It should create a replacement name of "a".repeat (255 * 255) which then fails index name validation.

I think ideally we'd change the replaceAll method call with a manual loop of find and append calls so that it bails out as soon as the string builder is too long, similar to how this safeReplace works. That way we avoid building a long string (up to 65,025) unnecessarily and then abandonned it shortly after. The long name also fills logs. The heap usage is certainly not as bad as, or even close to, the unbounded rename replacement. That said, a restore request could contain thousands of indices and there could be multiple requests running concurrently (up to 50).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've added a test that verifies the 255×255 scenario fails with proper index name validation rather than OOM.

Regarding the manual loop implementation similar to safeReplace, I attempted this approach but ran into issues with accurately estimating sizes for back-references ($1, $2, etc.). The conservative estimation (assuming each group could expand to the full match size) would reject valid cases where capture groups only match portions of the text. These false rejections would likely be edge cases (complex regex patterns with multiple capture groups), but they're still valid operations that would be blocked.

For now, I've kept the simpler validation that prevents the unbounded OOM cases. While the 65KB intermediate strings aren't ideal, they're bounded, and I don't believe they'll cause an OOM error (although I haven't actually tested it with 50 concurrent requests). As you noted, the heap usage is nowhere near the unbounded case this PR fixes.

How much of a concern is the current replaceAll implementation for you? The manual loop would be more efficient, but with the tradeoff that some valid rename patterns may be rejected. Happy to implement it if you think the efficiency gain is worth that tradeoff, or it could be explored more in a follow-up PR.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I attempted this approach but ran into issues with accurately estimating sizes for back-references ($1, $2, etc.).
with the tradeoff that some valid rename patterns may be rejected.

I don't quite follow. I was thinking something like the following change to RestoreService.java

@@ -1074,7 +1075,22 @@ public final class RestoreService implements ClusterStateApplier {
         if (prefix != null) {
             index = index.substring(prefix.length());
         }
-        renamedIndex = index.replaceAll(request.renamePattern(), request.renameReplacement());
+        final var matcher = Pattern.compile(request.renamePattern()).matcher(index);
+        var found = matcher.find();
+        if (found) {
+            final var sb = new StringBuilder();
+            do {
+                matcher.appendReplacement(sb, request.renameReplacement());
+                found = matcher.find();
+            } while (found && sb.length() <= 255);
+            if (sb.length() > 255) {
+                throw new IllegalArgumentException();
+            }
+            matcher.appendTail(sb);
+            renamedIndex = sb.toString();
+        } else {
+            renamedIndex = index;
+        }
         if (prefix != null) {
             renamedIndex = prefix + renamedIndex;
         }

Does it not work and could reject valid cases?

it could be explored more in a follow-up PR

I am OK for it to be a follow-up if you prefer.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ahhh I see what you mean now, I think this avoids the issue I was running into... I was trying to copy the logic in safeReplace too closely, which does an estimation before actually making the string and led to that issue I described. I don't think we need to be that careful here, and this approach you provided should work without rejecting valid cases.

I will do some testing on it tomorrow.

@gmjehovich gmjehovich force-pushed the fix-restore-snapshot-oom branch from 8428e51 to bc4012e Compare November 13, 2025 23:26
Copy link
Member

@ywangd ywangd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks great. I only had one minor comment. Thanks!

Comment on lines 1973 to 1991
private static String safeRenameIndex(String index, String renamePattern, String renameReplacement) {
final var matcher = Pattern.compile(renamePattern).matcher(index);
var found = matcher.find();
if (found) {
final var sb = new StringBuilder();
do {
matcher.appendReplacement(sb, renameReplacement);
found = matcher.find();
} while (found && sb.length() <= MAX_INDEX_NAME_BYTES);

if (sb.length() > MAX_INDEX_NAME_BYTES) {
throw new IllegalArgumentException("index name would exceed " + MAX_INDEX_NAME_BYTES + " bytes after rename");
}
matcher.appendTail(sb);
return sb.toString();
} else {
return index;
}
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be great to have an unit test for this method itself.

Copy link
Member

@ywangd ywangd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Thanks for the iterations!

@gmjehovich gmjehovich merged commit 6b59792 into elastic:main Nov 18, 2025
34 checks passed
@elasticsearchmachine
Copy link
Collaborator

💔 Backport failed

Status Branch Result
9.2
8.19 Commit could not be cherrypicked due to conflicts
9.1 Commit could not be cherrypicked due to conflicts

You can use sqren/backport to manually backport by running backport --upstream elastic/elasticsearch --pr 137859

gmjehovich added a commit to gmjehovich/elasticsearch that referenced this pull request Nov 18, 2025
…store request (elastic#137859)

* Add length validation for rename_replacement parameter in snapshot restore

* [CI] Auto commit changes from spotless

* Update RestoreSnapshotIT.java

* Update docs/changelog/137859.yaml

* Fixup integ testing

# Conflicts:
#	server/src/internalClusterTest/java/org/elasticsearch/snapshots/RestoreSnapshotIT.java

* Fix integration tests

* Add new IT test case for long replacement

* Add buffer to rename_replacement size check; add safeRenameIndex()

* Randomize rename_replacement in testRenameReplacementNameTooLong()

* [CI] Auto commit changes from spotless

* Add unit test for safeRenameIndex()

* fix testRenameReplacementNameTooLong() to use createTestInstance()

* [CI] Auto commit changes from spotless

---------

Co-authored-by: elasticsearchmachine <infra-root+elasticsearchmachine@elastic.co>
elasticsearchmachine pushed a commit that referenced this pull request Nov 19, 2025
…store request (#137859) (#138244)

* Add length validation for rename_replacement parameter in snapshot restore

* [CI] Auto commit changes from spotless

* Update RestoreSnapshotIT.java

* Update docs/changelog/137859.yaml

* Fixup integ testing

# Conflicts:
#	server/src/internalClusterTest/java/org/elasticsearch/snapshots/RestoreSnapshotIT.java

* Fix integration tests

* Add new IT test case for long replacement

* Add buffer to rename_replacement size check; add safeRenameIndex()

* Randomize rename_replacement in testRenameReplacementNameTooLong()

* [CI] Auto commit changes from spotless

* Add unit test for safeRenameIndex()

* fix testRenameReplacementNameTooLong() to use createTestInstance()

* [CI] Auto commit changes from spotless

---------

Co-authored-by: elasticsearchmachine <infra-root+elasticsearchmachine@elastic.co>
@gmjehovich
Copy link
Contributor Author

💚 All backports created successfully

Status Branch Result
8.19
9.1

Note: Successful backport PRs will be merged automatically after passing CI.

Questions ?

Please refer to the Backport tool documentation

gmjehovich added a commit to gmjehovich/elasticsearch that referenced this pull request Nov 19, 2025
…store request (elastic#137859)

* Add length validation for rename_replacement parameter in snapshot restore

* [CI] Auto commit changes from spotless

* Update RestoreSnapshotIT.java

* Update docs/changelog/137859.yaml

* Fixup integ testing

# Conflicts:
#	server/src/internalClusterTest/java/org/elasticsearch/snapshots/RestoreSnapshotIT.java

* Fix integration tests

* Add new IT test case for long replacement

* Add buffer to rename_replacement size check; add safeRenameIndex()

* Randomize rename_replacement in testRenameReplacementNameTooLong()

* [CI] Auto commit changes from spotless

* Add unit test for safeRenameIndex()

* fix testRenameReplacementNameTooLong() to use createTestInstance()

* [CI] Auto commit changes from spotless

---------

Co-authored-by: elasticsearchmachine <infra-root+elasticsearchmachine@elastic.co>
(cherry picked from commit 6b59792)

# Conflicts:
#	server/src/internalClusterTest/java/org/elasticsearch/snapshots/RestoreSnapshotIT.java
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

auto-backport Automatically create backport pull requests when merged backport pending >bug :Distributed Coordination/Snapshot/Restore Anything directly related to the `_snapshot/*` APIs Team:Distributed Coordination Meta label for Distributed Coordination team v8.19.8 v9.1.8 v9.2.2 v9.3.0

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants