[GPU] Support for performance profiling #136021

ldematte · 2025-10-06T12:58:26Z

To better understand the performance characteristics of vector indexing with a GPU, this PR introduces two changes:

- changes to KnnIndexTester (more logging, support for different write buffer sizes as input, support for async-profiler)
- more logging in the GPU codec

elasticsearchmachine · 2025-10-06T12:58:52Z

Pinging @elastic/es-search-relevance (Team:Search Relevance)

x-pack/plugin/gpu/src/main/java/org/elasticsearch/xpack/gpu/codec/ES92GpuHnswVectorsWriter.java

x-pack/plugin/gpu/src/main/java/org/elasticsearch/xpack/gpu/codec/CuVSResourceManager.java

qa/vector/src/main/java/org/elasticsearch/test/knn/KnnIndexer.java

mayya-sharipova · 2025-10-06T13:35:51Z

qa/vector/build.gradle

  if (System.getenv("DO_PROFILING") != null) {
    jvmArgs '-XX:StartFlightRecording=dumponexit=true,maxsize=250M,filename=knn.jfr,settings=profile.jfc'
  }
+  def asyncProfilerPath = System.getProperty("asyncProfiler.path", null)


I am uncertain about this part about async profiler, but I trust your expertise on this.

mayya-sharipova

@ldematte Thanks Lorenzo, makes sense. I've left some comments to address, but otherwise it looks good.


benwtrent · 2025-10-06T17:08:27Z

qa/vector/build.gradle

+  if (asyncProfilerPath != null) {
+    if (OS.current().equals(OS.MAC)) {
+      def asyncProfilerAgent = "${asyncProfilerPath}/lib/libasyncProfiler.dylib"
+      println "Using async-profiler agent ${asyncProfilerAgent}"
+      jvmArgs "-agentpath:${asyncProfilerAgent}=start,event=cpu,interval=10ms,file=${layout.buildDirectory.asFile.get()}/tmp/elasticsearch-0_%t_%p.jfr"
+    } else if (OS.current().equals(OS.LINUX)) {
+      def asyncProfilerAgent = "${asyncProfilerPath}/lib/libasyncProfiler.so"
+      println "Using async-profiler agent ${asyncProfilerAgent}"
+      jvmArgs "-agentpath:${asyncProfilerAgent}=start,event=cpu,interval=10ms,wall=50ms,file=${layout.buildDirectory.asFile.get()}/tmp/elasticsearch-0_%t_%p.jfr"
+    } else {
+      println "Ignoring 'asyncProfiler.path': not available on ${OS.current()}";
+    }
+  }


I am cool with this. However, why don't we add wall to MAC as well?

ldematte · 2025-10-07

I tried it and had to back off. Judging from the error I got and from the async-profiler code, only Linux has an implementation based on perf events, which lets you record CPU time and wall time at the same time. On Mac, the underlying engine is less flexible and less precise, and you can have one or the other, not both.

I'm wondering: maybe I should add an option for that, such as -DasyncProfiler.event=, defaulting to cpu but changeable to wall, so Mac users have this choice?
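The option floated above could look roughly like the following. This is only a sketch of the string-building logic, written in Java rather than in the Gradle script itself. The class `AsyncProfilerArgs`, its method names, and the idea of passing the event as a plain string are hypothetical and not part of the PR; the agent options (`start`, `event`, `interval`, `wall`, `file`) are the ones already used in the diff.

```java
import java.util.Locale;
import java.util.Optional;

/** Hypothetical helper, not part of the PR: builds the -agentpath JVM argument. */
final class AsyncProfilerArgs {

    /** Returns the agent library file name for the given os.name, or empty if unsupported. */
    static Optional<String> agentLibrary(String osName) {
        String os = osName.toLowerCase(Locale.ROOT);
        if (os.contains("mac")) {
            return Optional.of("libasyncProfiler.dylib");
        }
        if (os.contains("linux")) {
            return Optional.of("libasyncProfiler.so");
        }
        return Optional.empty();
    }

    /**
     * Builds the agent argument for the requested event ("cpu" or "wall").
     * Only Linux with event=cpu additionally samples wall-clock time, as in the PR diff.
     */
    static Optional<String> agentArg(String profilerPath, String osName, String event, String outputFile) {
        if (!event.equals("cpu") && !event.equals("wall")) {
            throw new IllegalArgumentException("Unsupported event: " + event);
        }
        boolean linux = osName.toLowerCase(Locale.ROOT).contains("linux");
        return agentLibrary(osName).map(lib -> {
            StringBuilder arg = new StringBuilder("-agentpath:")
                .append(profilerPath).append("/lib/").append(lib)
                .append("=start,event=").append(event)
                .append(",interval=10ms");
            if (linux && event.equals("cpu")) {
                arg.append(",wall=50ms"); // combined cpu + wall sampling, Linux only
            }
            return arg.append(",file=").append(outputFile).toString();
        });
    }

    public static void main(String[] args) {
        // Example: a Mac user choosing wall-clock profiling instead of the cpu default.
        System.out.println(agentArg("/opt/async-profiler", "Mac OS X", "wall", "knn.jfr").orElse("unsupported OS"));
    }
}
```

With this shape the default stays `cpu` on both platforms, and switching to `wall` on Mac replaces the event rather than trying to combine the two.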

benwtrent · 2025-10-06T17:09:25Z

qa/vector/src/main/java/org/elasticsearch/test/knn/KnnIndexer.java

+        iwc.setMaxBufferedDocs(IndexWriterConfig.DISABLE_AUTO_FLUSH);
+        iwc.setRAMBufferSizeMB(writerBufferSizeInMb);


We need to be careful: by default, we should benchmark with ES defaults. Optimizing our benchmarks but not our production code can give a false sense of improvement.

ldematte · 2025-10-07

The value passed to setRAMBufferSizeMB is the same default as before, but now I'm not sure it matches what ES uses. I'll check.
As for the number of docs, I'll check that too. I assumed these two settings were mutually exclusive, but they might be "first met wins" instead.
If that's the case, I'll default to what ES has and add an input option for that too.
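To make the "first met wins" reading concrete, here is a toy model of it. This is not Lucene code: the class `FlushTriggerModel` and its method are hypothetical, and the only thing borrowed from Lucene is the convention that `IndexWriterConfig.DISABLE_AUTO_FLUSH` (-1) switches a trigger off. Whether Lucene behaves this way for the version in use should still be checked against the `IndexWriterConfig` Javadoc, as the comment above says.

```java
/** Toy model of a "whichever limit is hit first" flush policy. NOT Lucene code. */
final class FlushTriggerModel {

    /** Same sentinel value as IndexWriterConfig.DISABLE_AUTO_FLUSH. */
    static final int DISABLE_AUTO_FLUSH = -1;

    /**
     * Returns true if either enabled limit has been reached.
     * A limit equal to DISABLE_AUTO_FLUSH is ignored.
     */
    static boolean shouldFlush(int bufferedDocs, double ramUsedMb, int maxBufferedDocs, double ramBufferSizeMb) {
        boolean docLimitHit = maxBufferedDocs != DISABLE_AUTO_FLUSH && bufferedDocs >= maxBufferedDocs;
        boolean ramLimitHit = ramBufferSizeMb != DISABLE_AUTO_FLUSH && ramUsedMb >= ramBufferSizeMb;
        return docLimitHit || ramLimitHit;
    }

    public static void main(String[] args) {
        // Doc-count trigger disabled, as in the PR: only RAM usage can trigger a flush.
        System.out.println(shouldFlush(1_000_000, 10.0, DISABLE_AUTO_FLUSH, 16.0));
        // Both enabled: the doc-count limit is reached before the RAM limit.
        System.out.println(shouldFlush(1_000, 2.0, 1_000, 16.0));
    }
}
```

Under this model, disabling the doc-count trigger means the write buffer size alone decides segment sizes, which is why the benchmark default matters.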

benwtrent · 2025-10-06T17:13:48Z

qa/vector/src/main/java/org/elasticsearch/test/knn/KnnIndexer.java

            } catch (IOException ioe) {
                throw new UncheckedIOException(ioe);
            }
+            logger.debug("Index thread times: [{}ms] read, [{}ms] add doc", readTime / 1_000_000, docAddTime / 1_000_000);


If we are going to worry about this nuance, we should simply adjust this to return the time each thread spent indexing and sum those up in the results.

Additionally, we would need to separate out the call to:

    ConcurrentMergeScheduler cms = (ConcurrentMergeScheduler) iwc.getMergeScheduler();
    cms.sync();

and maybe have an "exact index time" vs. an "overall index time" or something.

ldematte · 2025-10-07

I added this because I was seeing odd variance on the machine I'm using (on AWS) and wanted to see whether read/write performance was at the root of the issue.
I'm not sure there is value in keeping it, other than looking at the logs and saying "ah, yes, this run has issues with reading, I'll treat it as an outlier" (or not, depending on what you are measuring).
But I do agree that a plain log is not the best tool for this; I'll make the change you suggest. It makes sense and it's not too much of a change.
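The suggested change could be sketched as below: each indexing task returns its own read and add-doc times, the caller sums them, and the wall-clock time for the whole run is reported separately. The class `IndexTimingSketch`, the records, and the `Runnable` stand-ins for reading a vector and calling the index writer are hypothetical. The real KnnIndexer code is not shown in this thread, so this only illustrates the shape of the measurement.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

/** Hypothetical sketch: per-thread timings summed up, plus overall wall-clock time. */
final class IndexTimingSketch {

    record ThreadTimes(long readNanos, long addDocNanos) {}

    record Result(long summedReadMs, long summedAddDocMs, long overallMs) {}

    static Result run(int numThreads, int docsPerThread, Runnable readDoc, Runnable addDoc) {
        ExecutorService pool = Executors.newFixedThreadPool(numThreads);
        long overallStart = System.nanoTime();
        try {
            List<Future<ThreadTimes>> futures = new ArrayList<>();
            for (int t = 0; t < numThreads; t++) {
                futures.add(pool.submit(() -> {
                    long read = 0;
                    long add = 0;
                    for (int i = 0; i < docsPerThread; i++) {
                        long t0 = System.nanoTime();
                        readDoc.run();   // stand-in for reading the next vector
                        long t1 = System.nanoTime();
                        addDoc.run();    // stand-in for adding the document to the writer
                        long t2 = System.nanoTime();
                        read += t1 - t0;
                        add += t2 - t1;
                    }
                    return new ThreadTimes(read, add);
                }));
            }
            long readNanos = 0;
            long addNanos = 0;
            for (Future<ThreadTimes> f : futures) {
                ThreadTimes times = f.get();
                readNanos += times.readNanos();
                addNanos += times.addDocNanos();
            }
            // A merge-scheduler sync would go here: it counts toward the overall
            // time only, not toward the summed per-thread ("exact") times.
            long overallNanos = System.nanoTime() - overallStart;
            return new Result(
                TimeUnit.NANOSECONDS.toMillis(readNanos),
                TimeUnit.NANOSECONDS.toMillis(addNanos),
                TimeUnit.NANOSECONDS.toMillis(overallNanos)
            );
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        } finally {
            pool.shutdown();
        }
    }
}
```

Note that the summed per-thread times can exceed the overall time when several threads run in parallel, so the two figures answer different questions and are best reported side by side.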

ldematte added 3 commits October 6, 2025 11:55

Add write buffer parameter; add log for write time vs doc add time

c20fa4f

Add async profiling; adjust log message and level; adjust gitignore

c5224b8

Logging with times for resource acquisition, flush and merge

1bc1b68

ldematte requested review from mayya-sharipova and ChrisHegarty October 6, 2025 12:58

ldematte added the >non-issue, auto-backport, :Search Relevance/Vectors, v9.2.1, and v9.3.0 labels Oct 6, 2025

elasticsearchmachine added the Team:Search Relevance label Oct 6, 2025

mayya-sharipova reviewed Oct 6, 2025

x-pack/plugin/gpu/src/main/java/org/elasticsearch/xpack/gpu/codec/ES92GpuHnswVectorsWriter.java (outdated)

mayya-sharipova reviewed Oct 6, 2025

x-pack/plugin/gpu/src/main/java/org/elasticsearch/xpack/gpu/codec/CuVSResourceManager.java (outdated)

mayya-sharipova reviewed Oct 6, 2025

qa/vector/src/main/java/org/elasticsearch/test/knn/KnnIndexer.java (outdated)

mayya-sharipova reviewed Oct 6, 2025

mayya-sharipova approved these changes Oct 6, 2025

tteofili approved these changes Oct 6, 2025

ldematte added 3 commits October 6, 2025 15:54

Merge branch 'main' into knn-index-tester-changes

9eee2a6

Fix timing scale

f034d79

Merge branch 'knn-index-tester-changes' of github.com:ldematte/elasti…

5f0f4b2

…csearch into knn-index-tester-changes

benwtrent reviewed Oct 6, 2025


benwtrent mentioned this pull request Oct 6, 2025

Adding asynchronous fetching for DirectIO directory #134803

Merged
