Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Low-overhead profiling #77

Open
rmuir opened this issue Jul 31, 2020 · 15 comments
Open

Low-overhead profiling #77

rmuir opened this issue Jul 31, 2020 · 15 comments

Comments

@rmuir
Copy link
Collaborator

rmuir commented Jul 31, 2020

I think the benchmark results would be more useful if we could see some basic analysis of the code under load. We could imagine summary tables, graphs, flame charts, whatever is needed to knock out the hotspots and keep them under control.

Instead of seeing just that performance dropped or improved, we might be able to reason about why: it could make things better for developers. And maybe, just maybe, we have some giant frogs boiling for years that nobody never noticed yet: you know, easy wins to improve perf that went undetected.

As a start I imagine breaking the problem down into two profiling mechamisms:

@rmuir
Copy link
Collaborator Author

rmuir commented Nov 29, 2020

linux perf usage becomes a lot easier with jdk-16 early access build. We don't need to compile/install any java agent and can just pass JVM flags such as -XX:+UnlockDiagnosticVMOptions -XX:+DumpPerfMapAtExit -XX:+PreserveFramePointer, and run the benchmark with perf record. See https://bugs.openjdk.java.net/browse/JDK-8254723

@mikemccand
Copy link
Owner

And maybe, just maybe, we have some giant frogs boiling for years that nobody never noticed yet

LOL

@mikemccand
Copy link
Owner

I chatted w/ @rmuir and he suggested we start simpler here with Java Flight Recorder (JFR). We can later upgrade to perf. Progress not perfection!

Some notes from our chat:

@mikemccand
Copy link
Owner

I have a start at this! Still iterating on it ... it produces output like this, after all 20 JVM iterations of a benchmark run, when running on wikimediumall:

CPU merged search profile for comp:
PROFILE SUMMARY from 1060297 events (total: 1M)
  tests.profile.mode=cpu
  tests.profile.count=30
  tests.profile.stacksize=1
  tests.profile.linenumbers=false
PERCENT       CPU SAMPLES   STACK
8.32%         88215         org.apache.lucene.util.packed.DirectReader$DirectPackedReader12#get()
8.03%         85123         org.apache.lucene.util.packed.DirectMonotonicReader#get()
6.41%         67966         org.apache.lucene.facet.taxonomy.FastTaxonomyFacetCounts#countAll()
6.14%         65140         org.apache.lucene.codecs.lucene80.Lucene80DocValuesProducer$15#binaryValue()
5.21%         55271         org.apache.lucene.facet.sortedset.SortedSetDocValuesFacetCounts#countOneSegment()
2.88%         30504         org.apache.lucene.facet.taxonomy.IntTaxonomyFacets#increment()
2.86%         30302         java.nio.DirectByteBuffer#get()
2.72%         28800         jdk.internal.misc.Unsafe#convEndian()
2.14%         22713         org.apache.lucene.store.ByteBufferGuard#ensureValid()
2.07%         21976         java.nio.Buffer#checkIndex()
1.81%         19224         java.nio.ByteBuffer#get()
1.56%         16575         java.nio.Buffer#position()
1.45%         15367         org.apache.lucene.util.packed.DirectReader$DirectPackedReader4#get()
1.20%         12770         java.nio.Buffer#checkSegment()
1.20%         12756         org.apache.lucene.store.ByteBufferGuard#getBytes()
1.12%         11868         org.apache.lucene.store.ByteBufferIndexInput#setCurBuf()
1.02%         10787         java.nio.Buffer#nextGetIndex()
1.00%         10633         org.apache.lucene.search.Weight$DefaultBulkScorer#scoreAll()
0.96%         10209         org.apache.lucene.search.PhraseScorer$1#matches()
0.94%         9916          org.apache.lucene.store.ByteBufferIndexInput$SingleBufferImpl#seek()
0.93%         9836          org.apache.lucene.codecs.lucene84.Lucene84PostingsReader$EverythingEnum#nextPosition()
0.83%         8758          org.apache.lucene.index.SingletonSortedSetDocValues#nextOrd()
0.82%         8688          org.apache.lucene.codecs.lucene80.Lucene80NormsProducer$3#longValue()
0.80%         8465          jdk.internal.misc.Unsafe#copyMemory()
0.77%         8188          org.apache.lucene.search.ConjunctionDISI#doNext()
0.74%         7833          org.apache.lucene.codecs.lucene80.Lucene80DocValuesProducer$DenseBinaryDocValues#advance()
0.71%         7481          org.apache.lucene.util.PriorityQueue#add()
0.66%         6981          org.apache.lucene.codecs.lucene84.Lucene84PostingsReader$EverythingEnum#skipPositions()
0.65%         6865          org.apache.lucene.search.spans.TermSpans#nextStartPosition()
0.63%         6682          org.apache.lucene.search.spans.NearSpansOrdered#stretchToOrder()

HEAP merged search profile for comp:
PROFILE SUMMARY from 28436 events (total: 25291M)
  tests.profile.mode=heap
  tests.profile.count=30
  tests.profile.stacksize=1
  tests.profile.linenumbers=false
PERCENT       HEAP SAMPLES  STACK
13.62%        3445M         org.apache.lucene.util.FixedBitSet#<init>()
6.46%         1634M         org.apache.lucene.codecs.blocktree.SegmentTermsEnumFrame#<init>()
6.16%         1557M         java.util.AbstractList#iterator()
5.14%         1299M         org.apache.lucene.util.BytesRef#<init>()
5.12%         1294M         org.apache.lucene.util.ArrayUtil#growExact()
4.15%         1048M         org.apache.lucene.util.fst.ByteSequenceOutputs#read()
4.01%         1014M         org.apache.lucene.queryparser.charstream.FastCharStream#refill()
2.84%         719M          org.apache.lucene.codecs.lucene84.Lucene84PostingsReader$BlockDocsEnum#<init>()
2.78%         702M          org.apache.lucene.search.ExactPhraseMatcher$1$1#getImpacts()
2.37%         598M          org.apache.lucene.search.ExactPhraseMatcher$1#getImpacts()
2.00%         506M          org.apache.lucene.util.DocIdSetBuilder$Buffer#<init>()
1.86%         470M          java.nio.DirectByteBufferR#duplicate()
1.85%         468M          jdk.internal.misc.Unsafe#allocateUninitializedArray()
1.85%         467M          org.apache.lucene.store.DataInput#skipBytes()
1.83%         463M          org.apache.lucene.codecs.blocktree.IntersectTermsEnumFrame#load()
1.55%         392M          org.apache.lucene.codecs.blocktree.SegmentTermsEnum#getFrame()
1.37%         347M          org.apache.lucene.codecs.lucene84.Lucene84PostingsReader$BlockImpactsDocsEnum#<init>()
1.27%         320M          org.apache.lucene.codecs.lucene84.Lucene84PostingsReader#newTermState()
1.23%         312M          java.util.ArrayList#grow()
1.22%         308M          org.apache.lucene.codecs.lucene84.ForUtil#<init>()
1.18%         298M          java.util.ArrayList#iterator()
1.18%         298M          org.apache.lucene.codecs.blocktree.SegmentTermsEnum#<init>()
1.07%         271M          org.apache.lucene.store.ByteBufferIndexInput#buildSlice()
1.07%         270M          org.apache.lucene.codecs.blocktree.IntersectTermsEnumFrame#<init>()
0.93%         236M          java.nio.DirectByteBufferR#asLongBuffer()
0.90%         226M          org.apache.lucene.codecs.lucene84.Lucene84PostingsReader$BlockImpactsPostingsEnum#<init>()
0.89%         224M          org.apache.lucene.store.ByteBufferIndexInput#newCloneInstance()
0.80%         201M          java.util.Arrays#copyOf()
0.78%         196M          java.nio.DirectByteBufferR#slice()
0.76%         191M          org.apache.lucene.codecs.blocktree.SegmentTermsEnum#getArc()

Look at that 2.72% CPU in convEndian and 2.15% in ByteBufferGuard#ensureValid!

mikemccand added a commit that referenced this issue Jan 12, 2021
@mikemccand
Copy link
Owner

I pushed JFR for searching (still need to enable for indexing too), and it ran last night for the first time: https://home.apache.org/~mikemccand/lucenebench/2021.01.12.00.03.16.html

The results are ... curious. E.g. 39% of HEAP samples were for java.lang.Integer#valueOf()!? Are we really boxing that many integers (where!).

6.4% of HEAP was jdk.internal.math.FloatingDecimal#readJavaFormatString()!? I think this is parsing String to float/double, sheesh.

~2% of CPU is spent in java.nio.ByteBuffer#get() -- are we really pulling single bytes so often!?

I will increase reported stack depth so we can get more details :)

@mikemccand
Copy link
Owner

OK I re-ran the *.jfr aggregation, asking for stacksize=10. I think we really must fix our grouping implementation to not box/unbox Integer on every hit!

PERCENT       HEAP SAMPLES  STACK
39.19%        330167M       java.lang.Integer#valueOf()
                              at org.apache.lucene.search.grouping.TermGroupSelector#advanceTo()
                              at org.apache.lucene.search.grouping.SecondPassGroupingCollector#collect()
                              at org.apache.lucene.search.Weight$DefaultBulkScorer#scoreAll()
                              at org.apache.lucene.search.Weight$DefaultBulkScorer#score()
                              at org.apache.lucene.search.BulkScorer#score()
                              at org.apache.lucene.search.IndexSearcher#search()
                              at org.apache.lucene.search.IndexSearcher#search()
                              at perf.SearchTask#go()
                              at perf.TaskThreads$TaskThread#run()
6.38%         53776M        jdk.internal.math.FloatingDecimal#readJavaFormatString()
                              at jdk.internal.math.FloatingDecimal#parseFloat()
                              at java.lang.Float#parseFloat()
                              at perf.VectorDictionary#parseLine()
                              at perf.VectorDictionary#<init>()
                              at perf.TaskParser#<init>()
                              at perf.SearchPerfTest#_main()
                              at perf.SearchPerfTest#main()
4.00%         33743M        org.apache.lucene.util.FixedBitSet#<init>()
                              at org.apache.lucene.util.DocIdSetBuilder#upgradeToBitSet()
                              at org.apache.lucene.util.DocIdSetBuilder#grow()
                              at org.apache.lucene.util.DocIdSetBuilder#add()
                              at org.apache.lucene.search.MultiTermQueryConstantScoreWrapper$1#rewrite()
                              at org.apache.lucene.search.MultiTermQueryConstantScoreWrapper$1#bulkScorer()
                              at org.apache.lucene.search.IndexSearcher#search()
                              at org.apache.lucene.search.IndexSearcher#search()
                              at org.apache.lucene.search.IndexSearcher#search()
                              at org.apache.lucene.search.IndexSearcher#searchAfter()
3.16%         26630M        java.lang.StringLatin1#newString()
                              at java.lang.String#substring()
                              at java.lang.String#split()
                              at java.lang.String#split()
                              at perf.VectorDictionary#parseLine()
                              at perf.VectorDictionary#<init>()
                              at perf.TaskParser#<init>()
                              at perf.SearchPerfTest#_main()
                              at perf.SearchPerfTest#main()
2.39%         20101M        java.util.Arrays#copyOf()
                              at java.util.ArrayList#grow()
                              at java.util.ArrayList#grow()
                              at java.util.ArrayList#add()
                              at java.util.ArrayList#add()
                              at java.lang.String#split()
                              at java.lang.String#split()
                              at perf.VectorDictionary#parseLine()
                              at perf.VectorDictionary#<init>()
                              at perf.TaskParser#<init>()
2.28%         19242M        java.util.Arrays#copyOfRange()
                              at java.lang.StringLatin1#newString()
                              at java.lang.String#substring()
                              at java.lang.String#split()
                              at java.lang.String#split()
                              at perf.VectorDictionary#parseLine()
                              at perf.VectorDictionary#<init>()
                              at perf.TaskParser#<init>()
                              at perf.SearchPerfTest#_main()
                              at perf.SearchPerfTest#main()
1.88%         15839M        org.apache.lucene.util.FixedBitSet#<init>()
                              at org.apache.lucene.util.DocIdSetBuilder#upgradeToBitSet()
                              at org.apache.lucene.util.DocIdSetBuilder#grow()
                              at org.apache.lucene.facet.FacetsCollector#collect()
                              at org.apache.lucene.search.MultiCollector$MultiLeafCollector#collect()
                              at org.apache.lucene.search.Weight$DefaultBulkScorer#scoreAll()
                              at org.apache.lucene.search.Weight$DefaultBulkScorer#score()
                              at org.apache.lucene.search.BulkScorer#score()
                              at org.apache.lucene.search.IndexSearcher#search()
                              at org.apache.lucene.search.IndexSearcher#search()
1.73%         14598M        java.lang.ref.Finalizer#runFinalization()
                              at java.lang.ref.Reference$1#runFinalization()
                              at java.lang.Runtime#runFinalization()
                              at perf.SearchPerfTest#_main()
                              at perf.SearchPerfTest#main()
1.72%         14517M        org.apache.lucene.search.ExactPhraseMatcher$1$1#getImpacts()
                              at org.apache.lucene.search.MaxScoreCache#getMaxScoreForLevel()
                              at org.apache.lucene.search.ImpactsDISI#advanceTarget()
                              at org.apache.lucene.search.ImpactsDISI#advance()
                              at org.apache.lucene.search.ImpactsDISI#nextDoc()
                              at org.apache.lucene.search.Weight$DefaultBulkScorer#scoreAll()
                              at org.apache.lucene.search.Weight$DefaultBulkScorer#score()
                              at org.apache.lucene.search.BulkScorer#score()
                              at org.apache.lucene.search.IndexSearcher#search()
                              at org.apache.lucene.search.IndexSearcher#search()
1.26%         10575M        java.util.AbstractList#listIterator()
                              at java.util.AbstractList$SubList$1#<init>()
                              at java.util.AbstractList$SubList#listIterator()
                              at java.util.AbstractList#listIterator()
                              at java.util.AbstractList$SubList#iterator()
                              at org.apache.lucene.search.DisiPriorityQueue#iterator()
                              at org.apache.lucene.search.WANDScorer#updateMaxScores()
                              at org.apache.lucene.search.WANDScorer#updateMaxScoresIfNecessary()
                              at org.apache.lucene.search.WANDScorer#moveToNextCandidate()
                              at org.apache.lucene.search.WANDScorer$1#advance()
0.90%         7559M         java.lang.String#split()
                              at java.lang.String#split()
                              at perf.VectorDictionary#parseLine()
                              at perf.VectorDictionary#<init>()
                              at perf.TaskParser#<init>()
                              at perf.SearchPerfTest#_main()
                              at perf.SearchPerfTest#main()
0.87%         7371M         java.util.AbstractList#iterator()
                              at org.apache.lucene.search.MaxScoreCache#computeMaxScore()
                              at org.apache.lucene.search.MaxScoreCache#getMaxScoreForLevel()
                              at org.apache.lucene.search.ImpactsDISI#advanceTarget()
                              at org.apache.lucene.search.ImpactsDISI#advance()
                              at org.apache.lucene.search.WANDScorer#advanceTail()
                              at org.apache.lucene.search.WANDScorer#advanceTail()
                              at org.apache.lucene.search.WANDScorer$2#matches()
                              at org.apache.lucene.search.Weight$DefaultBulkScorer#scoreAll()
                              at org.apache.lucene.search.Weight$DefaultBulkScorer#score()
0.85%         7163M         org.apache.lucene.search.ExactPhraseMatcher$1#getImpacts()
                              at org.apache.lucene.search.MaxScoreCache#getMaxScoreForLevel()
                              at org.apache.lucene.search.MaxScoreCache#getSkipLevel()
                              at org.apache.lucene.search.MaxScoreCache#getSkipUpTo()
                              at org.apache.lucene.search.ImpactsDISI#advanceTarget()
                              at org.apache.lucene.search.ImpactsDISI#advance()
                              at org.apache.lucene.search.ImpactsDISI#nextDoc()
                              at org.apache.lucene.search.Weight$DefaultBulkScorer#scoreAll()
                              at org.apache.lucene.search.Weight$DefaultBulkScorer#score()
                              at org.apache.lucene.search.BulkScorer#score()
0.82%         6901M         org.apache.lucene.util.FixedBitSet#<init>()
                              at org.apache.lucene.search.PointRangeQuery$1$4#get()
                              at org.apache.lucene.search.PointRangeQuery$1#scorer()
                              at org.apache.lucene.search.Weight#bulkScorer()
                              at org.apache.lucene.search.IndexSearcher#search()
                              at org.apache.lucene.search.IndexSearcher#search()
                              at org.apache.lucene.search.IndexSearcher#search()
                              at org.apache.lucene.search.IndexSearcher#searchAfter()
                              at org.apache.lucene.search.IndexSearcher#search()
                              at perf.SearchTask#go()
0.78%         6565M         org.apache.lucene.util.FixedBitSet#<init>()
                              at org.apache.lucene.util.DocIdSetBuilder#upgradeToBitSet()
                              at org.apache.lucene.util.DocIdSetBuilder#grow()
                              at org.apache.lucene.search.PointRangeQuery$1$1#grow()
                              at org.apache.lucene.util.bkd.BKDReader#addAll()
                              at org.apache.lucene.util.bkd.BKDReader#intersect()
                              at org.apache.lucene.util.bkd.BKDReader#intersect()
                              at org.apache.lucene.util.bkd.BKDReader#intersect()
                              at org.apache.lucene.util.bkd.BKDReader#intersect()
                              at org.apache.lucene.util.bkd.BKDReader#intersect()
0.75%         6356M         java.lang.StringUTF16#compress()
                              at java.lang.String#<init>()
                              at java.lang.String#<init>()
                              at java.io.BufferedReader#readLine()
                              at java.io.BufferedReader#readLine()
                              at perf.VectorDictionary#<init>()
                              at perf.TaskParser#<init>()
                              at perf.SearchPerfTest#_main()
                              at perf.SearchPerfTest#main()
0.71%         6003M         java.util.AbstractList#iterator()
                              at org.apache.lucene.search.MaxScoreCache#computeMaxScore()
                              at org.apache.lucene.search.MaxScoreCache#getMaxScoreForLevel()
                              at org.apache.lucene.search.ImpactsDISI#getMaxScore()
                              at org.apache.lucene.search.TermScorer#getMaxScore()
                              at org.apache.lucene.search.WANDScorer#updateMaxScores()
                              at org.apache.lucene.search.WANDScorer#updateMaxScoresIfNecessary()
                              at org.apache.lucene.search.WANDScorer#moveToNextCandidate()
                              at org.apache.lucene.search.WANDScorer$1#advance()
                              at org.apache.lucene.search.BlockMaxConjunctionScorer$2#doNext()
0.66%         5520M         java.util.ArrayList#grow()
                              at java.util.ArrayList#grow()
                              at java.util.ArrayList#add()
                              at java.util.ArrayList#add()
                              at org.apache.lucene.search.ExactPhraseMatcher$1$1#getImpacts()
                              at org.apache.lucene.search.MaxScoreCache#getMaxScoreForLevel()
                              at org.apache.lucene.search.ImpactsDISI#advanceTarget()
                              at org.apache.lucene.search.ImpactsDISI#advance()
                              at org.apache.lucene.search.ImpactsDISI#nextDoc()
                              at org.apache.lucene.search.Weight$DefaultBulkScorer#scoreAll()
0.61%         5154M         java.util.concurrent.CopyOnWriteArrayList#iterator()
                              at jdk.jfr.internal.RequestEngine#run_requests()
                              at jdk.jfr.internal.RequestEngine#doPeriodic()
                              at jdk.jfr.internal.PlatformRecorder#periodicTask()
                              at jdk.jfr.internal.PlatformRecorder#lambda$startDiskMonitor$1()
                              at jdk.jfr.internal.PlatformRecorder$$Lambda$44+0x0000000800b74740.1164107853#run()
                              at java.lang.Thread#run()
0.58%         4906M         org.apache.lucene.codecs.lucene84.Lucene84PostingsReader$BlockDocsEnum#<init>()
                              at org.apache.lucene.codecs.lucene84.Lucene84PostingsReader#postings()
                              at org.apache.lucene.codecs.lucene84.Lucene84PostingsReader#impacts()
                              at org.apache.lucene.codecs.blocktree.SegmentTermsEnum#impacts()
                              at org.apache.lucene.search.TermQuery$TermWeight#scorer()
                              at org.apache.lucene.search.Weight#scorerSupplier()
                              at org.apache.lucene.search.BooleanWeight#scorerSupplier()
                              at org.apache.lucene.search.BooleanWeight#scorer()
                              at org.apache.lucene.search.Weight#bulkScorer()
                              at org.apache.lucene.search.BooleanWeight#bulkScorer()
0.54%         4514M         java.util.Arrays#asList()
                              at org.apache.lucene.search.DisiPriorityQueue#iterator()
                              at org.apache.lucene.search.WANDScorer#updateMaxScores()
                              at org.apache.lucene.search.WANDScorer#updateMaxScoresIfNecessary()
                              at org.apache.lucene.search.WANDScorer#moveToNextCandidate()
                              at org.apache.lucene.search.WANDScorer$1#advance()
                              at org.apache.lucene.search.BlockMaxConjunctionScorer$2#doNext()
                              at org.apache.lucene.search.BlockMaxConjunctionScorer$2#advance()
                              at org.apache.lucene.search.BlockMaxConjunctionScorer$2#nextDoc()
                              at org.apache.lucene.search.Weight$DefaultBulkScorer#scoreAll()
0.53%         4494M         java.util.AbstractList#iterator()
                              at org.apache.lucene.search.MaxScoreCache#computeMaxScore()
                              at org.apache.lucene.search.MaxScoreCache#getMaxScoreForLevel()
                              at org.apache.lucene.search.ImpactsDISI#getMaxScore()
                              at org.apache.lucene.search.TermScorer#getMaxScore()
                              at org.apache.lucene.search.BlockMaxConjunctionScorer#getMaxScore()
                              at org.apache.lucene.search.BlockMaxConjunctionScorer$2#moveToNextBlock()
                              at org.apache.lucene.search.BlockMaxConjunctionScorer$2#advanceTarget()
                              at org.apache.lucene.search.BlockMaxConjunctionScorer$2#doNext()
                              at org.apache.lucene.search.BlockMaxConjunctionScorer$2#advance()
0.52%         4416M         java.util.AbstractList#iterator()
                              at org.apache.lucene.search.MaxScoreCache#computeMaxScore()
                              at org.apache.lucene.search.MaxScoreCache#getMaxScoreForLevel()
                              at org.apache.lucene.search.ImpactsDISI#advanceTarget()
                              at org.apache.lucene.search.ImpactsDISI#advance()
                              at org.apache.lucene.search.BlockMaxConjunctionScorer$2#doNext()
                              at org.apache.lucene.search.BlockMaxConjunctionScorer$2#advance()
                              at org.apache.lucene.search.BlockMaxConjunctionScorer$2#nextDoc()
                              at org.apache.lucene.search.Weight$DefaultBulkScorer#scoreAll()
                              at org.apache.lucene.search.Weight$DefaultBulkScorer#score()
0.51%         4295M         perf.VectorDictionary#parseLine()
                              at perf.VectorDictionary#<init>()
                              at perf.TaskParser#<init>()
                              at perf.SearchPerfTest#_main()
                              at perf.SearchPerfTest#main()
0.50%         4241M         jdk.internal.misc.Unsafe#allocateUninitializedArray()
                              at java.lang.StringConcatHelper#newArray()
                              at java.lang.StringConcatHelper#newArrayWithSuffix()
                              at java.lang.invoke.DirectMethodHandle$Holder#invokeStatic()
                              at java.lang.invoke.LambdaForm$MH+0x0000000800bb3040.1462044018#invoke()
                              at java.lang.invoke.LambdaForm$MH+0x0000000800ba6440.1536471117#invoke()
                              at java.lang.invoke.Invokers$Holder#linkToTargetMethod()
                              at org.apache.lucene.store.IndexInput#getFullSliceDescription()
                              at org.apache.lucene.store.ByteBufferIndexInput#buildSlice()
                              at org.apache.lucene.store.ByteBufferIndexInput#slice()
0.50%         4186M         org.apache.lucene.search.ExactPhraseMatcher$1#getImpacts()
                              at org.apache.lucene.search.MaxScoreCache#getMaxScoreForLevel()
                              at org.apache.lucene.search.ImpactsDISI#advanceTarget()
                              at org.apache.lucene.search.ImpactsDISI#advance()
                              at org.apache.lucene.search.ImpactsDISI#nextDoc()
                              at org.apache.lucene.search.Weight$DefaultBulkScorer#scoreAll()
                              at org.apache.lucene.search.Weight$DefaultBulkScorer#score()
                              at org.apache.lucene.search.BulkScorer#score()
                              at org.apache.lucene.search.IndexSearcher#search()
                              at org.apache.lucene.search.IndexSearcher#search()
0.50%         4180M         java.util.concurrent.locks.AbstractQueuedSynchronizer#acquire()
                              at java.util.concurrent.locks.AbstractQueuedSynchronizer#acquireSharedInterruptibly()
                              at java.util.concurrent.CountDownLatch#await()
                              at perf.TaskThreads$TaskThread#run()
0.49%         4099M         org.apache.lucene.util.DocIdSetBuilder$Buffer#<init>()
                              at org.apache.lucene.util.DocIdSetBuilder#addBuffer()
                              at org.apache.lucene.util.DocIdSetBuilder#ensureBufferCapacity()
                              at org.apache.lucene.util.DocIdSetBuilder#grow()
                              at org.apache.lucene.facet.FacetsCollector#collect()
                              at org.apache.lucene.search.MultiCollector$MultiLeafCollector#collect()
                              at org.apache.lucene.search.Weight$DefaultBulkScorer#scoreAll()
                              at org.apache.lucene.search.Weight$DefaultBulkScorer#score()
                              at org.apache.lucene.search.BulkScorer#score()
                              at org.apache.lucene.search.IndexSearcher#search()
0.48%         4072M         perf.SearchPerfTest#_main()
                              at perf.SearchPerfTest#main()
0.44%         3723M         org.apache.lucene.search.ExactPhraseMatcher$1#getImpacts()
                              at org.apache.lucene.search.ImpactsDISI#advanceShallow()
                              at org.apache.lucene.search.ImpactsDISI#advanceTarget()
                              at org.apache.lucene.search.ImpactsDISI#advance()
                              at org.apache.lucene.search.ImpactsDISI#nextDoc()
                              at org.apache.lucene.search.Weight$DefaultBulkScorer#scoreAll()
                              at org.apache.lucene.search.Weight$DefaultBulkScorer#score()
                              at org.apache.lucene.search.BulkScorer#score()
                              at org.apache.lucene.search.IndexSearcher#search()
                              at org.apache.lucene.search.IndexSearcher#search()

@mikemccand
Copy link
Owner

And CPU:

PERCENT       CPU SAMPLES   STACK
3.79%         232348        org.apache.lucene.util.packed.DirectMonotonicReader#get()
                              at org.apache.lucene.codecs.lucene80.Lucene80DocValuesProducer$15#binaryValue()
                              at org.apache.lucene.facet.taxonomy.FastTaxonomyFacetCounts#countAll()
                              at org.apache.lucene.facet.taxonomy.FastTaxonomyFacetCounts#<init>()
                              at perf.SearchTask#go()
                              at perf.TaskThreads$TaskThread#run()
3.55%         217460        org.apache.lucene.codecs.lucene80.Lucene80DocValuesProducer$15#binaryValue()
                              at org.apache.lucene.facet.taxonomy.FastTaxonomyFacetCounts#countAll()
                              at org.apache.lucene.facet.taxonomy.FastTaxonomyFacetCounts#<init>()
                              at perf.SearchTask#go()
                              at perf.TaskThreads$TaskThread#run()
2.70%         165557        org.apache.lucene.util.packed.DirectReader$DirectPackedReader12#get()
                              at org.apache.lucene.util.packed.DirectMonotonicReader#get()
                              at org.apache.lucene.codecs.lucene80.Lucene80DocValuesProducer$15#binaryValue()
                              at org.apache.lucene.facet.taxonomy.FastTaxonomyFacetCounts#countAll()
                              at org.apache.lucene.facet.taxonomy.FastTaxonomyFacetCounts#<init>()
                              at perf.SearchTask#go()
                              at perf.TaskThreads$TaskThread#run()
2.13%         130744        org.apache.lucene.facet.taxonomy.FastTaxonomyFacetCounts#countAll()
                              at org.apache.lucene.facet.taxonomy.FastTaxonomyFacetCounts#<init>()
                              at perf.SearchTask#go()
                              at perf.TaskThreads$TaskThread#run()
1.95%         119568        org.apache.lucene.search.grouping.TermGroupSelector#advanceTo()
                              at org.apache.lucene.search.grouping.SecondPassGroupingCollector#collect()
                              at org.apache.lucene.search.Weight$DefaultBulkScorer#scoreAll()
                              at org.apache.lucene.search.Weight$DefaultBulkScorer#score()
                              at org.apache.lucene.search.BulkScorer#score()
                              at org.apache.lucene.search.IndexSearcher#search()
                              at org.apache.lucene.search.IndexSearcher#search()
                              at perf.SearchTask#go()
                              at perf.TaskThreads$TaskThread#run()
1.74%         106828        org.apache.lucene.codecs.lucene84.Lucene84PostingsReader$BlockImpactsPostingsEnum#advance()
                              at org.apache.lucene.search.ConjunctionDISI#doNext()
                              at org.apache.lucene.search.ConjunctionDISI#nextDoc()
                              at org.apache.lucene.search.Weight$DefaultBulkScorer#scoreAll()
                              at org.apache.lucene.search.Weight$DefaultBulkScorer#score()
                              at org.apache.lucene.search.BulkScorer#score()
                              at org.apache.lucene.search.IndexSearcher#search()
                              at org.apache.lucene.search.IndexSearcher#search()
                              at org.apache.lucene.search.IndexSearcher#search()
                              at org.apache.lucene.search.IndexSearcher#searchAfter()
1.54%         94213         java.nio.ByteBuffer#get()
                              at java.nio.DirectByteBuffer#get()
                              at org.apache.lucene.store.ByteBufferGuard#getBytes()
                              at org.apache.lucene.store.ByteBufferIndexInput#readBytes()
                              at org.apache.lucene.codecs.lucene80.Lucene80DocValuesProducer$15#binaryValue()
                              at org.apache.lucene.facet.taxonomy.FastTaxonomyFacetCounts#countAll()
                              at org.apache.lucene.facet.taxonomy.FastTaxonomyFacetCounts#<init>()
                              at perf.SearchTask#go()
                              at perf.TaskThreads$TaskThread#run()
1.03%         63224         org.apache.lucene.facet.sortedset.SortedSetDocValuesFacetCounts#countOneSegment()
                              at org.apache.lucene.facet.sortedset.SortedSetDocValuesFacetCounts#countAll()
                              at org.apache.lucene.facet.sortedset.SortedSetDocValuesFacetCounts#<init>()
                              at org.apache.lucene.facet.sortedset.SortedSetDocValuesFacetCounts#<init>()
                              at perf.SearchTask#go()
                              at perf.TaskThreads$TaskThread#run()
1.02%         62342         org.apache.lucene.queries.intervals.OrderedIntervalsSource$OrderedIntervalIterator#nextInterval()
                              at org.apache.lucene.queries.intervals.IntervalFilter#nextInterval()
                              at org.apache.lucene.queries.intervals.IntervalScorer$1#matches()
                              at org.apache.lucene.search.Weight$DefaultBulkScorer#scoreAll()
                              at org.apache.lucene.search.Weight$DefaultBulkScorer#score()
                              at org.apache.lucene.search.BulkScorer#score()
                              at org.apache.lucene.search.IndexSearcher#search()
                              at org.apache.lucene.search.IndexSearcher#search()
                              at org.apache.lucene.search.IndexSearcher#search()
                              at org.apache.lucene.search.IndexSearcher#searchAfter()
1.02%         62230         org.apache.lucene.search.PhraseScorer$1#matches()
                              at org.apache.lucene.search.Weight$DefaultBulkScorer#scoreAll()
                              at org.apache.lucene.search.Weight$DefaultBulkScorer#score()
                              at org.apache.lucene.search.BulkScorer#score()
                              at org.apache.lucene.search.IndexSearcher#search()
                              at org.apache.lucene.search.IndexSearcher#search()
                              at org.apache.lucene.search.IndexSearcher#search()
                              at org.apache.lucene.search.IndexSearcher#searchAfter()
                              at org.apache.lucene.search.IndexSearcher#search()
                              at perf.SearchTask#go()
1.01%         61871         org.apache.lucene.index.SingletonSortedSetDocValues#nextDoc()
                              at org.apache.lucene.facet.sortedset.SortedSetDocValuesFacetCounts#countOneSegment()
                              at org.apache.lucene.facet.sortedset.SortedSetDocValuesFacetCounts#countAll()
                              at org.apache.lucene.facet.sortedset.SortedSetDocValuesFacetCounts#<init>()
                              at org.apache.lucene.facet.sortedset.SortedSetDocValuesFacetCounts#<init>()
                              at perf.SearchTask#go()
                              at perf.TaskThreads$TaskThread#run()
0.93%         57187         org.apache.lucene.search.ConjunctionDISI#doNext()
                              at org.apache.lucene.search.ConjunctionDISI#nextDoc()
                              at org.apache.lucene.search.Weight$DefaultBulkScorer#scoreAll()
                              at org.apache.lucene.search.Weight$DefaultBulkScorer#score()
                              at org.apache.lucene.search.BulkScorer#score()
                              at org.apache.lucene.search.IndexSearcher#search()
                              at org.apache.lucene.search.IndexSearcher#search()
                              at org.apache.lucene.search.IndexSearcher#search()
                              at org.apache.lucene.search.IndexSearcher#searchAfter()
                              at org.apache.lucene.search.IndexSearcher#search()
0.82%         50023         org.apache.lucene.search.grouping.TermGroupSelector#advanceTo()
                              at org.apache.lucene.search.grouping.FirstPassGroupingCollector#collect()
                              at org.apache.lucene.search.Weight$DefaultBulkScorer#scoreAll()
                              at org.apache.lucene.search.Weight$DefaultBulkScorer#score()
                              at org.apache.lucene.search.BulkScorer#score()
                              at org.apache.lucene.search.IndexSearcher#search()
                              at org.apache.lucene.search.IndexSearcher#search()
                              at perf.SearchTask#go()
                              at perf.TaskThreads$TaskThread#run()
0.63%         38777         org.apache.lucene.search.Weight$DefaultBulkScorer#scoreAll()
                              at org.apache.lucene.search.Weight$DefaultBulkScorer#score()
                              at org.apache.lucene.search.BulkScorer#score()
                              at org.apache.lucene.search.IndexSearcher#search()
                              at org.apache.lucene.search.IndexSearcher#search()
                              at perf.SearchTask#go()
                              at perf.TaskThreads$TaskThread#run()
0.62%         37725         jdk.internal.misc.Unsafe#convEndian()
                              at jdk.internal.misc.Unsafe#getShortUnaligned()
                              at java.nio.DirectByteBuffer#getShort()
                              at java.nio.DirectByteBuffer#getShort()
                              at org.apache.lucene.store.ByteBufferGuard#getShort()
                              at org.apache.lucene.store.ByteBufferIndexInput$SingleBufferImpl#readShort()
                              at org.apache.lucene.util.packed.DirectReader$DirectPackedReader12#get()
                              at org.apache.lucene.util.packed.DirectMonotonicReader#get()
                              at org.apache.lucene.codecs.lucene80.Lucene80DocValuesProducer$15#binaryValue()
                              at org.apache.lucene.facet.taxonomy.FastTaxonomyFacetCounts#countAll()
0.59%         36204         org.apache.lucene.search.TermScorer#score()
                              at org.apache.lucene.search.ScoreCachingWrappingScorer#score()
                              at org.apache.lucene.search.FieldComparator$RelevanceComparator#compareBottom()
                              at org.apache.lucene.search.grouping.FirstPassGroupingCollector#isCompetitive()
                              at org.apache.lucene.search.grouping.FirstPassGroupingCollector#collect()
                              at org.apache.lucene.search.Weight$DefaultBulkScorer#scoreAll()
                              at org.apache.lucene.search.Weight$DefaultBulkScorer#score()
                              at org.apache.lucene.search.BulkScorer#score()
                              at org.apache.lucene.search.IndexSearcher#search()
                              at org.apache.lucene.search.IndexSearcher#search()
0.56%         34042         org.apache.lucene.codecs.lucene84.Lucene84PostingsReader#findFirstGreater()
                              at org.apache.lucene.codecs.lucene84.Lucene84PostingsReader$BlockImpactsPostingsEnum#advance()
                              at org.apache.lucene.search.ConjunctionDISI#doNext()
                              at org.apache.lucene.search.ConjunctionDISI#nextDoc()
                              at org.apache.lucene.search.Weight$DefaultBulkScorer#scoreAll()
                              at org.apache.lucene.search.Weight$DefaultBulkScorer#score()
                              at org.apache.lucene.search.BulkScorer#score()
                              at org.apache.lucene.search.IndexSearcher#search()
                              at org.apache.lucene.search.IndexSearcher#search()
                              at org.apache.lucene.search.IndexSearcher#search()
0.55%         33870         org.apache.lucene.facet.taxonomy.IntTaxonomyFacets#increment()
                              at org.apache.lucene.facet.taxonomy.IntTaxonomyFacets#increment()
                              at org.apache.lucene.facet.taxonomy.FastTaxonomyFacetCounts#countAll()
                              at org.apache.lucene.facet.taxonomy.FastTaxonomyFacetCounts#<init>()
                              at perf.SearchTask#go()
                              at perf.TaskThreads$TaskThread#run()
0.54%         33294         org.apache.lucene.store.ByteBufferGuard#ensureValid()
                              at org.apache.lucene.store.ByteBufferGuard#getShort()
                              at org.apache.lucene.store.ByteBufferIndexInput$SingleBufferImpl#readShort()
                              at org.apache.lucene.util.packed.DirectReader$DirectPackedReader12#get()
                              at org.apache.lucene.util.packed.DirectMonotonicReader#get()
                              at org.apache.lucene.codecs.lucene80.Lucene80DocValuesProducer$15#binaryValue()
                              at org.apache.lucene.facet.taxonomy.FastTaxonomyFacetCounts#countAll()
                              at org.apache.lucene.facet.taxonomy.FastTaxonomyFacetCounts#<init>()
                              at perf.SearchTask#go()
                              at perf.TaskThreads$TaskThread#run()
0.53%         32211         org.apache.lucene.search.ScoreCachingWrappingScorer#score()
                              at org.apache.lucene.search.FieldComparator$RelevanceComparator#compareBottom()
                              at org.apache.lucene.search.grouping.FirstPassGroupingCollector#isCompetitive()
                              at org.apache.lucene.search.grouping.FirstPassGroupingCollector#collect()
                              at org.apache.lucene.search.Weight$DefaultBulkScorer#scoreAll()
                              at org.apache.lucene.search.Weight$DefaultBulkScorer#score()
                              at org.apache.lucene.search.BulkScorer#score()
                              at org.apache.lucene.search.IndexSearcher#search()
                              at org.apache.lucene.search.IndexSearcher#search()
                              at perf.SearchTask#go()
0.50%         30812         org.apache.lucene.codecs.lucene84.Lucene84PostingsReader$EverythingEnum#skipPositions()
                              at org.apache.lucene.codecs.lucene84.Lucene84PostingsReader$EverythingEnum#nextPosition()
                              at org.apache.lucene.queries.intervals.TermIntervalsSource$1#nextInterval()
                              at org.apache.lucene.queries.intervals.OrderedIntervalsSource$OrderedIntervalIterator#nextInterval()
                              at org.apache.lucene.queries.intervals.IntervalFilter#nextInterval()
                              at org.apache.lucene.queries.intervals.IntervalScorer$1#matches()
                              at org.apache.lucene.search.Weight$DefaultBulkScorer#scoreAll()
                              at org.apache.lucene.search.Weight$DefaultBulkScorer#score()
                              at org.apache.lucene.search.BulkScorer#score()
                              at org.apache.lucene.search.IndexSearcher#search()
0.50%         30709         org.apache.lucene.search.grouping.BlockGroupingCollector#collect()
                              at org.apache.lucene.search.Weight$DefaultBulkScorer#scoreAll()
                              at org.apache.lucene.search.Weight$DefaultBulkScorer#score()
                              at org.apache.lucene.search.BulkScorer#score()
                              at org.apache.lucene.search.IndexSearcher#search()
                              at org.apache.lucene.search.IndexSearcher#search()
                              at perf.SearchTask#go()
                              at perf.TaskThreads$TaskThread#run()
0.49%         30235         org.apache.lucene.search.similarities.BM25Similarity$BM25Scorer#score()
                              at org.apache.lucene.search.LeafSimScorer#score()
                              at org.apache.lucene.search.TermScorer#score()
                              at org.apache.lucene.search.ScoreCachingWrappingScorer#score()
                              at org.apache.lucene.search.FieldComparator$RelevanceComparator#compareBottom()
                              at org.apache.lucene.search.grouping.FirstPassGroupingCollector#isCompetitive()
                              at org.apache.lucene.search.grouping.FirstPassGroupingCollector#collect()
                              at org.apache.lucene.search.Weight$DefaultBulkScorer#scoreAll()
                              at org.apache.lucene.search.Weight$DefaultBulkScorer#score()
                              at org.apache.lucene.search.BulkScorer#score()
0.48%         29554         org.apache.lucene.search.SloppyPhraseMatcher#maxFreq()
                              at org.apache.lucene.search.PhraseScorer$1#matches()
                              at org.apache.lucene.search.Weight$DefaultBulkScorer#scoreAll()
                              at org.apache.lucene.search.Weight$DefaultBulkScorer#score()
                              at org.apache.lucene.search.BulkScorer#score()
                              at org.apache.lucene.search.IndexSearcher#search()
                              at org.apache.lucene.search.IndexSearcher#search()
                              at org.apache.lucene.search.IndexSearcher#search()
                              at org.apache.lucene.search.IndexSearcher#searchAfter()
                              at org.apache.lucene.search.IndexSearcher#search()
0.48%         29119         jdk.internal.misc.Unsafe#convEndian()
                              at jdk.internal.misc.Unsafe#getIntUnaligned()
                              at java.nio.DirectByteBuffer#getInt()
                              at java.nio.DirectByteBuffer#getInt()
                              at org.apache.lucene.store.ByteBufferGuard#getInt()
                              at org.apache.lucene.store.ByteBufferIndexInput$SingleBufferImpl#readInt()
                              at org.apache.lucene.util.packed.DirectReader$DirectPackedReader20#get()
                              at org.apache.lucene.codecs.lucene80.Lucene80DocValuesProducer$21#ordValue()
                              at org.apache.lucene.search.grouping.TermGroupSelector#advanceTo()
                              at org.apache.lucene.search.grouping.SecondPassGroupingCollector#collect()
0.44%         26883         org.apache.lucene.util.packed.DirectReader$DirectPackedReader12#get()
                              at org.apache.lucene.codecs.lucene80.Lucene80DocValuesProducer$21#ordValue()
                              at org.apache.lucene.index.SingletonSortedSetDocValues#nextDoc()
                              at org.apache.lucene.facet.sortedset.SortedSetDocValuesFacetCounts#countOneSegment()
                              at org.apache.lucene.facet.sortedset.SortedSetDocValuesFacetCounts#countAll()
                              at org.apache.lucene.facet.sortedset.SortedSetDocValuesFacetCounts#<init>()
                              at org.apache.lucene.facet.sortedset.SortedSetDocValuesFacetCounts#<init>()
                              at perf.SearchTask#go()
                              at perf.TaskThreads$TaskThread#run()
0.44%         26770         java.nio.Buffer#position()
                              at java.nio.ByteBuffer#position()
                              at java.nio.MappedByteBuffer#position()
                              at java.nio.MappedByteBuffer#position()
                              at org.apache.lucene.store.ByteBufferIndexInput$SingleBufferImpl#seek()
                              at org.apache.lucene.codecs.lucene80.Lucene80DocValuesProducer$15#binaryValue()
                              at org.apache.lucene.facet.taxonomy.FastTaxonomyFacetCounts#countAll()
                              at org.apache.lucene.facet.taxonomy.FastTaxonomyFacetCounts#<init>()
                              at perf.SearchTask#go()
                              at perf.TaskThreads$TaskThread#run()
0.43%         26243         org.apache.lucene.codecs.lucene80.Lucene80NormsProducer$3#longValue()
                              at org.apache.lucene.search.LeafSimScorer#getNormValue()
                              at org.apache.lucene.search.LeafSimScorer#score()
                              at org.apache.lucene.search.TermScorer#score()
                              at org.apache.lucene.search.BlockMaxConjunctionScorer#score()
                              at org.apache.lucene.search.TopScoreDocCollector$SimpleTopScoreDocCollector$1#collect()
                              at org.apache.lucene.search.Weight$DefaultBulkScorer#scoreAll()
                              at org.apache.lucene.search.Weight$DefaultBulkScorer#score()
                              at org.apache.lucene.search.BulkScorer#score()
                              at org.apache.lucene.search.IndexSearcher#search()
0.42%         25962         org.apache.lucene.index.SingletonSortedSetDocValues#nextOrd()
                              at org.apache.lucene.facet.sortedset.SortedSetDocValuesFacetCounts#countOneSegment()
                              at org.apache.lucene.facet.sortedset.SortedSetDocValuesFacetCounts#countAll()
                              at org.apache.lucene.facet.sortedset.SortedSetDocValuesFacetCounts#<init>()
                              at org.apache.lucene.facet.sortedset.SortedSetDocValuesFacetCounts#<init>()
                              at perf.SearchTask#go()
                              at perf.TaskThreads$TaskThread#run()
0.41%         25051         org.apache.lucene.search.Weight$DefaultBulkScorer#scoreAll()
                              at org.apache.lucene.search.Weight$DefaultBulkScorer#score()
                              at org.apache.lucene.search.BulkScorer#score()
                              at org.apache.lucene.search.IndexSearcher#search()
                              at org.apache.lucene.search.IndexSearcher#search()
                              at org.apache.lucene.search.IndexSearcher#search()
                              at org.apache.lucene.search.IndexSearcher#searchAfter()
                              at org.apache.lucene.search.IndexSearcher#search()
                              at perf.SearchTask#go()
                              at perf.TaskThreads$TaskThread#run()

mikemccand added a commit that referenced this issue Jan 12, 2021
… 1 for interactive benchmarks); also print the full command-line executed to aggregate all search JFR results
mikemccand added a commit that referenced this issue Jan 12, 2021
@msokolov
Copy link
Collaborator

msokolov commented Jan 12, 2021 via email

@mikemccand
Copy link
Owner

Good point -- that is a one-time cost, amortized over the life of a service that then handles query traffic.

Can I somehow instruct JFR to NOT profile at startup, and then trigger (from within my Java program, not via external JMC or so) for JFR to start?

@rmuir
Copy link
Collaborator Author

rmuir commented Jan 16, 2021

Mike, there are various ways to do this. For this case, maybe simply filter it out in the analysis? That way you have precise control. You can look at method / stacktrace / whatever to determine whatever logic you want, e.g. ignoring anything coming from perf.TaskParser#<init>() or whatever is appropriate here to remove ghosts.

Similar stuff is already done in unit tests for the gradle epoll thread: https://github.com/apache/lucene-solr/blob/master/buildSrc/src/main/java/org/apache/lucene/gradle/ProfileResults.java#L101

@rmuir
Copy link
Collaborator Author

rmuir commented Jan 16, 2021

Another alternative (untested, based on my understanding): If you can start the thing programmatically in the spawned jvms, then don't pass all the options as -XXStartFlightRecording= instead pass them as -XX:FlightRecorderOptions=.

In order to "start it" you then need to issue a diagnostic command. Usually done with jcmd executable, but i found some snippets here that show how to send the diagnostic with mbean: https://github.com/jmiettinen/jfr/blob/master/src/main/java/fi/eonwe/JFR.java

mikemccand added a commit that referenced this issue Jan 18, 2021
@mikemccand
Copy link
Owner

Last night's benchmarks ran with indexer profiling too:

Profiler for cpu
PROFILE SUMMARY from 471699 events (total: 471699)
  tests.profile.mode=cpu
  tests.profile.count=30
  tests.profile.stacksize=5
  tests.profile.linenumbers=false
PERCENT       CPU SAMPLES   STACK
6.72%         31682         org.apache.lucene.util.ByteBlockPool#setBytesRef()
                              at org.apache.lucene.util.BytesRefHash#equals()
                              at org.apache.lucene.util.BytesRefHash#findHash()
                              at org.apache.lucene.util.BytesRefHash#add()
                              at org.apache.lucene.index.TermsHashPerField#add()
6.63%         31278         org.apache.lucene.util.VectorUtil#dotProduct()
                              at org.apache.lucene.index.VectorValues$SearchStrategy#compare()
                              at org.apache.lucene.util.hnsw.HnswGraph#search()
                              at org.apache.lucene.util.hnsw.HnswGraphBuilder#addGraphNode()
                              at org.apache.lucene.util.hnsw.HnswGraphBuilder#build()
6.19%         29192         org.apache.lucene.analysis.standard.StandardTokenizerImpl#getNextToken()
                              at org.apache.lucene.analysis.standard.StandardTokenizer#incrementToken()
                              at org.apache.lucene.analysis.LowerCaseFilter#incrementToken()
                              at org.apache.lucene.analysis.FilteringTokenFilter#incrementToken()
                              at org.apache.lucene.index.IndexingChain$PerField#invert()
5.22%         24602         org.apache.lucene.index.TermsHashPerField#writeByte()
                              at org.apache.lucene.index.TermsHashPerField#writeVInt()
                              at org.apache.lucene.index.FreqProxTermsWriterPerField#addTerm()
                              at org.apache.lucene.index.TermsHashPerField#positionStreamSlice()
                              at org.apache.lucene.index.TermsHashPerField#add()
4.17%         19684         org.apache.lucene.util.BytesRefHash#findHash()
                              at org.apache.lucene.util.BytesRefHash#add()
                              at org.apache.lucene.index.TermsHashPerField#add()
                              at org.apache.lucene.index.IndexingChain$PerField#invert()
                              at org.apache.lucene.index.IndexingChain#processField()
2.75%         12994         java.io.BufferedOutputStream#write()
                              at org.apache.lucene.store.OutputStreamIndexOutput#writeByte()
                              at org.apache.lucene.store.RateLimitedIndexOutput#writeByte()
                              at org.apache.lucene.store.DataOutput#writeInt()
                              at org.apache.lucene.store.DataOutput#writeLong()
2.70%         12721         java.util.Arrays#binarySearch0()
                              at java.util.Arrays#binarySearch()
                              at org.apache.lucene.codecs.VectorWriter$VectorValuesMerger$MergerRandomAccess#vectorValue()
                              at org.apache.lucene.util.hnsw.HnswGraph#search()
                              at org.apache.lucene.util.hnsw.HnswGraphBuilder#addGraphNode()
2.62%         12344         org.apache.lucene.analysis.standard.StandardAnalyzer#lambda$createComponents$0()
                              at org.apache.lucene.analysis.standard.StandardAnalyzer$$Lambda$229+0x0000000800c25f98.1329879585#accept()
                              at org.apache.lucene.analysis.Analyzer$TokenStreamComponents#setReader()
                              at org.apache.lucene.analysis.Analyzer#tokenStream()
                              at org.apache.lucene.document.Field#tokenStream()
2.42%         11399         sun.nio.ch.FileDispatcherImpl#write0()
                              at sun.nio.ch.FileDispatcherImpl#write()
                              at sun.nio.ch.IOUtil#writeFromNativeBuffer()
                              at sun.nio.ch.IOUtil#write()
                              at sun.nio.ch.FileChannelImpl#write()
2.18%         10286         org.apache.lucene.index.TermsHashPerField#writeByte()
                              at org.apache.lucene.index.TermsHashPerField#writeVInt()
                              at org.apache.lucene.index.FreqProxTermsWriterPerField#writeProx()
                              at org.apache.lucene.index.FreqProxTermsWriterPerField#addTerm()
                              at org.apache.lucene.index.TermsHashPerField#positionStreamSlice()
1.92%         9072          org.apache.lucene.util.LongHeap#push()
                              at org.apache.lucene.util.hnsw.NeighborQueue#add()
                              at org.apache.lucene.util.hnsw.HnswGraph#search()
                              at org.apache.lucene.util.hnsw.HnswGraphBuilder#addGraphNode()
                              at org.apache.lucene.util.hnsw.HnswGraphBuilder#build()
1.92%         9055          org.apache.lucene.index.FreqProxTermsWriterPerField#addTerm()
                              at org.apache.lucene.index.TermsHashPerField#positionStreamSlice()
                              at org.apache.lucene.index.TermsHashPerField#add()
                              at org.apache.lucene.index.IndexingChain$PerField#invert()
                              at org.apache.lucene.index.IndexingChain#processField()
1.65%         7762          java.io.BufferedOutputStream#write()
                              at org.apache.lucene.store.OutputStreamIndexOutput#writeByte()
                              at org.apache.lucene.store.DataOutput#writeInt()
                              at org.apache.lucene.store.DataOutput#writeLong()
                              at org.apache.lucene.codecs.lucene84.ForUtil#encode()
1.57%         7390          org.apache.lucene.index.VectorValues$SearchStrategy#compare()
                              at org.apache.lucene.util.hnsw.HnswGraph#search()
                              at org.apache.lucene.util.hnsw.HnswGraphBuilder#addGraphNode()
                              at org.apache.lucene.util.hnsw.HnswGraphBuilder#build()
                              at org.apache.lucene.codecs.lucene90.Lucene90VectorWriter#writeGraph()
1.56%         7358          org.apache.lucene.util.hnsw.HnswGraph#search()
                              at org.apache.lucene.util.hnsw.HnswGraphBuilder#addGraphNode()
                              at org.apache.lucene.util.hnsw.HnswGraphBuilder#build()
                              at org.apache.lucene.codecs.lucene90.Lucene90VectorWriter#writeGraph()
                              at org.apache.lucene.codecs.lucene90.Lucene90VectorWriter#writeField()

and:

Profiler for heap
PROFILE SUMMARY from 134203 events (total: 493457M)
  tests.profile.mode=heap
  tests.profile.count=30
  tests.profile.stacksize=5
  tests.profile.linenumbers=false
PERCENT       HEAP SAMPLES  STACK
27.11%        133792M       org.apache.lucene.util.fst.BytesStore#writeByte()
                              at org.apache.lucene.util.fst.FST#<init>()
                              at org.apache.lucene.util.fst.FSTCompiler#<init>()
                              at org.apache.lucene.util.fst.FSTCompiler$Builder#build()
                              at org.apache.lucene.codecs.blocktree.BlockTreeTermsWriter$PendingBlock#compileIndex()
9.92%         48968M        perf.LineFileDocs#nextDoc()
                              at perf.IndexThreads$IndexThread#run()
7.29%         35963M        java.lang.String#<init>()
                              at java.lang.String#<init>()
                              at perf.LineFileDocs#nextDoc()
                              at perf.IndexThreads$IndexThread#run()
5.76%         28415M        org.apache.lucene.util.SparseFixedBitSet#insertLong()
                              at org.apache.lucene.util.SparseFixedBitSet#set()
                              at org.apache.lucene.util.hnsw.HnswGraph#search()
                              at org.apache.lucene.util.hnsw.HnswGraphBuilder#addGraphNode()
                              at org.apache.lucene.util.hnsw.HnswGraphBuilder#build()
5.16%         25465M        java.lang.StringUTF16#compress()
                              at java.lang.String#<init>()
                              at java.lang.String#<init>()
                              at perf.LineFileDocs#nextDoc()
                              at perf.IndexThreads$IndexThread#run()
5.08%         25073M        perf.LineFileDocs#readDocs()
                              at perf.LineFileDocs$1#run()
4.45%         21942M        org.apache.lucene.codecs.lucene84.Lucene84PostingsWriter#newTermState()
                              at org.apache.lucene.codecs.lucene84.Lucene84PostingsWriter#newTermState()
                              at org.apache.lucene.codecs.PushPostingsWriterBase#writeTerm()
                              at org.apache.lucene.codecs.blocktree.BlockTreeTermsWriter$TermsWriter#write()
                              at org.apache.lucene.codecs.blocktree.BlockTreeTermsWriter#write()
4.10%         20237M        org.apache.lucene.index.FreqProxTermsWriterPerField$FreqProxPostingsArray#<init>()
                              at org.apache.lucene.index.FreqProxTermsWriterPerField$FreqProxPostingsArray#newInstance()
                              at org.apache.lucene.index.ParallelPostingsArray#grow()
                              at org.apache.lucene.index.TermsHashPerField$PostingsBytesStartArray#grow()
                              at org.apache.lucene.util.BytesRefHash#add()
2.75%         13555M        org.apache.lucene.util.ArrayUtil#growExact()
                              at org.apache.lucene.util.ArrayUtil#grow()
                              at org.apache.lucene.util.LongHeap#push()
                              at org.apache.lucene.util.hnsw.NeighborQueue#add()
                              at org.apache.lucene.util.hnsw.HnswGraph#search()
2.68%         13211M        org.apache.lucene.index.ParallelPostingsArray#<init>()
                              at org.apache.lucene.index.FreqProxTermsWriterPerField$FreqProxPostingsArray#<init>()
                              at org.apache.lucene.index.FreqProxTermsWriterPerField$FreqProxPostingsArray#newInstance()
                              at org.apache.lucene.index.ParallelPostingsArray#grow()
                              at org.apache.lucene.index.TermsHashPerField$PostingsBytesStartArray#grow()
2.02%         9959M         org.apache.lucene.util.ByteBlockPool$DirectTrackingAllocator#getByteBlock()
                              at org.apache.lucene.util.ByteBlockPool#nextBuffer()
                              at org.apache.lucene.util.ByteBlockPool#allocSlice()
                              at org.apache.lucene.index.TermsHashPerField#writeByte()
                              at org.apache.lucene.index.TermsHashPerField#writeVInt()
1.27%         6283M         org.apache.lucene.util.SparseFixedBitSet#insertBlock()
                              at org.apache.lucene.util.SparseFixedBitSet#set()
                              at org.apache.lucene.util.hnsw.HnswGraph#search()
                              at org.apache.lucene.util.hnsw.HnswGraphBuilder#addGraphNode()
                              at org.apache.lucene.util.hnsw.HnswGraphBuilder#build()
1.15%         5667M         org.apache.lucene.util.BytesRef#<init>()
                              at org.apache.lucene.store.ByteBuffersDataOutput#writeString()
                              at org.apache.lucene.codecs.compressing.CompressingStoredFieldsWriter#writeField()
                              at org.apache.lucene.index.StoredFieldsConsumer#writeField()
                              at org.apache.lucene.index.IndexingChain#processField()
1.13%         5567M         org.apache.lucene.codecs.lucene80.Lucene80NormsProducer#getNorms()
                              at org.apache.lucene.codecs.PushPostingsWriterBase#writeTerm()
                              at org.apache.lucene.codecs.blocktree.BlockTreeTermsWriter$TermsWriter#write()
                              at org.apache.lucene.codecs.blocktree.BlockTreeTermsWriter#write()
                              at org.apache.lucene.codecs.perfield.PerFieldPostingsFormat$FieldsWriter#write()
1.02%         5030M         org.apache.lucene.util.SparseFixedBitSet#<init>()
                              at org.apache.lucene.util.hnsw.HnswGraph#search()
                              at org.apache.lucene.util.hnsw.HnswGraphBuilder#addGraphNode()
                              at org.apache.lucene.util.hnsw.HnswGraphBuilder#build()
                              at org.apache.lucene.codecs.lucene90.Lucene90VectorWriter#writeGraph()

It's curious that building FSTs is the top allocator (by sampling)! And, that SparseFixedBitSet.get/insertLong is such a hot spot!

I will fix nightly benchmark to include these profiler outputs in the accessible nightly log.

@gunnarmorling
Copy link

Hi, I just came across this issue, and as you were discussing JFR, I thought the JfrUnit project might be useful here. It lets you run assertions against JFR events, allowing you to identify regressions e.g. in terms of object allocations or I/O triggered by specific operations. Of course, you could also use it to assert custom JFR events (which in turn you could inject via the JMC Agent project, in case there's no suitable events matching your needs).

@mikemccand
Copy link
Owner

There are many interesting counters that you get from perf stat, e.g. running perf stat -dd ls -l:

Performance counter stats for 'ls -l':

              3.23 msec task-clock:u                     #    0.795 CPUs utilized             
                 0      context-switches:u               #    0.000 /sec                      
                 0      cpu-migrations:u                 #    0.000 /sec                      
               133      page-faults:u                    #   41.163 K/sec                     
         1,984,270      cycles:u                         #    0.614 GHz                         (2.09%)
           693,680      stalled-cycles-frontend:u        #   34.96% frontend cycles idle        (79.96%)
         2,309,178      instructions:u                   #    1.16  insn per cycle            
                                                  #    0.30  stalled cycles per insn   
           459,923      branches:u                       #  142.344 M/sec                     
            19,910      branch-misses:u                  #    4.33% of all branches           
           957,775      L1-dcache-loads:u                #  296.427 M/sec                     
            20,642      L1-dcache-load-misses:u          #    2.16% of all L1-dcache accesses   (97.91%)
   <not supported>      LLC-loads:u                                                           
   <not supported>      LLC-load-misses:u                                                     
           647,031      L1-icache-loads:u                #  200.253 M/sec                       (20.04%)
     <not counted>      L1-icache-load-misses:u                                                 (0.00%)
     <not counted>      dTLB-loads:u                                                            (0.00%)
     <not counted>      dTLB-load-misses:u                                                      (0.00%)
     <not counted>      iTLB-loads:u                                                            (0.00%)
     <not counted>      iTLB-load-misses:u                                                      (0.00%)

       0.004062988 seconds time elapsed

       0.000000000 seconds user
       0.003514000 seconds sys

Brendan Gregg has lots of cool examples of how to do this, e.g.: https://www.brendangregg.com/perf.html ... and his Netflix blog posts.

@mikemccand
Copy link
Owner

OK I made a baby step here: if perf seems to exist then we spawn the JVMs using perf stat -dd prefix. It produces fun output like this, which is saved away into nightly logs (so we could back-process in the future) but otherwise not parsed yet:

Performance counter stats for '/usr/lib/jvm/java-21-openjdk/bin/java --add-modules jdk.incubator.vector -Xmx32g -Xms32g -server -XX:+UseParallelGC -Djava.io.tmpdir=/l/tmp -XX:StartFlightRecording=dumponexit=true,maxsize=250M,settings=/l\
/util/src/python/profiling.jfc,filename=/l/logs/bench-search-atoa-base2-0.jfr -XX:+UnlockDiagnosticVMOptions -XX:+DebugNonSafepoints -classpath /l/trunk/lucene/core/build/libs/lucene-core-10.0.0-SNAPSHOT.jar:/l/trunk/lucene/sandbox/build\
/classes/java/main:/l/trunk/lucene/misc/build/classes/java/main:/l/trunk/lucene/facet/build/classes/java/main:/l/trunk/lucene/analysis/common/build/classes/java/main:/l/trunk/lucene/analysis/icu/build/classes/java/main:/l/trunk/lucene/qu\
eryparser/build/classes/java/main:/l/trunk/lucene/grouping/build/classes/java/main:/l/trunk/lucene/suggest/build/classes/java/main:/l/trunk/lucene/highlighter/build/classes/java/main:/l/trunk/lucene/codecs/build/classes/java/main:/l/trun\
k/lucene/queries/build/classes/java/main:/l/trunk/lucene/join/build/classes/java/main:/l/util/lib/HdrHistogram.jar:/l/util/build perf.SearchPerfTest -dirImpl MMapDirectory -indexPath /l/indices/wikimediumall.trunk.facets.taxonomy:Date.ta\
xonomy:Month.taxonomy:DayOfYear.taxonomy:RandomLabel.taxonomy.sortedset:Month.sortedset:DayOfYear.sortedset:RandomLabel.sortedset.Lucene90.Lucene99.dvfields.nd27.625M -facets taxonomy:Date;Date -facets taxonomy:Month;Month -facets taxono\
my:DayOfYear;DayOfYear -facets taxonomy:RandomLabel.taxonomy;RandomLabel -facets sortedset:Month;Month -facets sortedset:DayOfYear;DayOfYear -facets sortedset:RandomLabel.sortedset;RandomLabel -analyzer StandardAnalyzerNoStopWords -taskS\
ource /l/util/3992009.tasks -numConcurrentQueries 6 -taskRepeatCount 200 -field body -tasksPerCat 1 -searchConcurrency 0 -staticSeed -6719702 -seed -2580077 -similarity BM25Similarity -commit multi -hiliteImpl FastVectorHighlighter -log \
/l/logs/atoa.base2.0 -topN 100':

         30,700.97 msec task-clock:u                     #    5.749 CPUs utilized
                 0      context-switches:u               #    0.000 /sec
                 0      cpu-migrations:u                 #    0.000 /sec
            71,689      page-faults:u                    #    2.335 K/sec
   117,794,137,105      cycles:u                         #    3.837 GHz                         (47.51%)
    12,006,779,240      stalled-cycles-frontend:u        #   10.19% frontend cycles idle        (47.57%)
   139,806,618,209      instructions:u                   #    1.19  insn per cycle
                                                  #    0.09  stalled cycles per insn     (47.59%)
    29,683,678,719      branches:u                       #  966.865 M/sec                       (47.55%)
       644,231,829      branch-misses:u                  #    2.17% of all branches             (46.98%)
    61,901,518,938      L1-dcache-loads:u                #    2.016 G/sec                       (45.99%)
     1,067,103,787      L1-dcache-load-misses:u          #    1.72% of all L1-dcache accesses   (45.68%)
   <not supported>      LLC-loads:u
   <not supported>      LLC-load-misses:u
     6,984,642,008      L1-icache-loads:u                #  227.506 M/sec                       (45.96%)
        85,868,921      L1-icache-load-misses:u          #    1.23% of all L1-icache accesses   (46.19%)
       211,352,608      dTLB-loads:u                     #    6.884 M/sec                       (46.31%)
        46,297,044      dTLB-load-misses:u               #   21.91% of all dTLB cache accesses  (46.31%)
        31,526,035      iTLB-loads:u                     #    1.027 M/sec                       (46.72%)
         2,074,199      iTLB-load-misses:u               #    6.58% of all iTLB cache accesses  (47.25%)

       5.340673185 seconds time elapsed

      29.104880000 seconds user
       1.328313000 seconds sys

mikemccand added a commit that referenced this issue Jun 27, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants