8310929: Optimization for Integer.toString #14699

Closed
wants to merge 20 commits into from


wenshao
Contributor

@wenshao wenshao commented Jun 28, 2023

Optimization for:
  • Integer.toString
  • Long.toString
  • StringBuilder#append(int)

Benchmark Result

sh make/devkit/createJMHBundle.sh
bash configure --with-jmh=build/jmh/jars
make test TEST="micro:java.lang.Integers.toString*" 
make test TEST="micro:java.lang.Longs.toString*" 
make test TEST="micro:java.lang.StringBuilders.toStringCharWithInt8"
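The percentage in parentheses after each updated score appears to be computed as baseline score / updated score - 1; the reported entries match that formula to within rounding. It therefore expresses a throughput gain, not a reduction in time: +143.80% means roughly 2.44 times as fast, or about 59% less time per operation. A small sketch of that arithmetic (the class and method names are illustrative, not part of the PR):

```java
import java.util.Locale;

final class Speedup {
    // Percent change in the convention used by the benchmark tables in this PR:
    // (baseline / updated - 1) * 100. Positive means the updated build is faster.
    static double percent(double baselineScore, double updatedScore) {
        return (baselineScore / updatedScore - 1.0) * 100.0;
    }

    public static void main(String[] args) {
        // Integers.toStringBig on aliyun_ecs_c8i.xlarge: 6.825 -> 6.002 us/op
        System.out.println(String.format(Locale.ROOT, "%+.2f%%", percent(6.825, 6.002))); // +13.71%
        // Integers.toStringTiny on the M1 Pro: 2.512 -> 2.685 us/op (a regression)
        System.out.println(String.format(Locale.ROOT, "%+.2f%%", percent(2.512, 2.685))); // -6.44%
    }
}
```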

1. aliyun_ecs_c8i.xlarge

  • cpu : Intel Xeon Sapphire Rapids (x64)
-Benchmark               (size)  Mode  Cnt  Score   Error  Units (baseline)
-Integers.toStringBig       500  avgt   15  6.825 ± 0.023  us/op
-Integers.toStringSmall     500  avgt   15  4.823 ± 0.023  us/op
-Integers.toStringTiny      500  avgt   15  3.878 ± 0.101  us/op

+Benchmark               (size)  Mode  Cnt  Score   Error  Units (PR Update 04 f4aa1989)
+Integers.toStringBig       500  avgt   15  6.002 ± 0.054  us/op (+13.71%)
+Integers.toStringSmall     500  avgt   15  4.025 ± 0.020  us/op (+19.82%)
+Integers.toStringTiny      500  avgt   15  3.874 ± 0.067  us/op (+0.10%)

-Benchmark            (size)  Mode  Cnt  Score   Error  Units (baseline)
-Longs.toStringBig       500  avgt   15  9.224 ± 0.021  us/op
-Longs.toStringSmall     500  avgt   15  4.621 ± 0.087  us/op

+Benchmark            (size)  Mode  Cnt  Score   Error  Units (PR Update 04 f4aa1989)
+Longs.toStringBig       500  avgt   15  7.483 ± 0.018  us/op (+23.26%)
+Longs.toStringSmall     500  avgt   15  4.020 ± 0.016  us/op (+14.95%)

-Benchmark                           Mode  Cnt     Score    Error  Units (baseline)
-StringBuilders.toStringCharWithInt8 avgt   15    89.327 ±  0.733  ns/op

+Benchmark                            Mode  Cnt   Score   Error  Units (PR Update 04 f4aa1989)
+StringBuilders.toStringCharWithInt8  avgt   15  36.639 ± 0.422  ns/op (+143.80%)

2. aliyun_ecs_c8a.xlarge

  • cpu : AMD EPYC Genoa (x64)
-Benchmark               (size)  Mode  Cnt  Score   Error  Units (baseline)
-Integers.toStringBig       500  avgt   15  6.753 ± 0.007  us/op
-Integers.toStringSmall     500  avgt   15  4.470 ± 0.005  us/op
-Integers.toStringTiny      500  avgt   15  2.764 ± 0.020  us/op

+Benchmark               (size)  Mode  Cnt  Score   Error  Units (PR Update 04 f4aa1989)
+Integers.toStringBig       500  avgt   15  5.036 ± 0.005  us/op (+34.09%)
+Integers.toStringSmall     500  avgt   15  3.491 ± 0.025  us/op (+28.04%)
+Integers.toStringTiny      500  avgt   15  2.627 ± 0.552  us/op (+5.21%)

-Benchmark            (size)  Mode  Cnt   Score   Error  Units (baseline)
-Longs.toStringBig       500  avgt   15  10.087 ± 0.016  us/op
-Longs.toStringSmall     500  avgt   15   4.231 ± 0.068  us/op

+Benchmark            (size)  Mode  Cnt  Score   Error  Units (PR Update 04 f4aa1989)
+Longs.toStringBig       500  avgt   15  8.120 ± 0.010  us/op (+24.22%)
+Longs.toStringSmall     500  avgt   15  3.352 ± 0.006  us/op (+26.22%)

-Benchmark                           Mode  Cnt     Score    Error  Units (baseline)
-StringBuilders.toStringCharWithInt8 avgt   15    94.228 ±  0.150  ns/op

+Benchmark                            Mode  Cnt   Score   Error  Units (PR Update 04 f4aa1989)
+StringBuilders.toStringCharWithInt8  avgt   15  34.766 ± 0.283  ns/op (+171.03%)

3. aliyun_ecs_c8y.xlarge

  • cpu : Aliyun Yitian 710 (aarch64)
-Benchmark               (size)  Mode  Cnt   Score   Error  Units (baseline)
-Integers.toStringBig       500  avgt   15  12.224 ± 0.488  us/op
-Integers.toStringSmall     500  avgt   15   7.243 ± 0.189  us/op
-Integers.toStringTiny      500  avgt   15   6.131 ± 0.158  us/op

+Benchmark               (size)  Mode  Cnt  Score   Error  Units (PR Update 04 f4aa1989)
+Integers.toStringBig       500  avgt   15  9.488 ± 0.404  us/op (+28.83%)
+Integers.toStringSmall     500  avgt   15  6.535 ± 0.342  us/op (+10.83%)
+Integers.toStringTiny      500  avgt   15  6.121 ± 0.266  us/op (+0.16%)

-Benchmark            (size)  Mode  Cnt   Score   Error  Units (baseline)
-Longs.toStringBig       500  avgt   15  15.301 ± 0.523  us/op
-Longs.toStringSmall     500  avgt   15   7.797 ± 0.396  us/op

+Benchmark            (size)  Mode  Cnt   Score   Error  Units (PR Update 04 f4aa1989)
+Longs.toStringBig       500  avgt   15  14.367 ± 0.437  us/op (+6.50%)
+Longs.toStringSmall     500  avgt   15   6.408 ± 0.271  us/op (+21.67%)

-Benchmark                           Mode  Cnt    Score    Error  Units (baseline)
-StringBuilders.toStringCharWithInt8 avgt   15   52.980 ±  0.786  ns/op

+Benchmark                            Mode  Cnt   Score   Error  Units (PR Update 04 f4aa1989)
+StringBuilders.toStringCharWithInt8  avgt   15  41.639 ± 1.412  ns/op (+27.23%)

4. MacBook Pro M1 Pro

-Benchmark               (size)  Mode  Cnt   Score   Error  Units (baseline)
-Integers.toStringBig       500  avgt   15  12.023 ± 0.100  us/op
-Integers.toStringSmall     500  avgt   15   4.631 ± 0.095  us/op
-Integers.toStringTiny      500  avgt   15   2.512 ± 0.036  us/op

+Benchmark               (size)  Mode  Cnt  Score   Error  Units (PR Update 04 f4aa1989)
+Integers.toStringBig       500  avgt   15  5.170 ± 0.055  us/op (+132.55%)
+Integers.toStringSmall     500  avgt   15  3.149 ± 0.019  us/op (+47.06%)
+Integers.toStringTiny      500  avgt   15  2.685 ± 0.025  us/op (-6.44%)

-Benchmark            (size)  Mode  Cnt  Score   Error  Units (baseline)
-Longs.toStringBig       500  avgt   15  8.429 ± 0.154  us/op
-Longs.toStringSmall     500  avgt   15  4.587 ± 0.054  us/op

+Benchmark            (size)  Mode  Cnt  Score   Error  Units (PR Update 04 f4aa1989)
+Longs.toStringBig       500  avgt   15  7.587 ± 0.140  us/op (+11.09%)
+Longs.toStringSmall     500  avgt   15  2.980 ± 0.026  us/op (+53.92%)

-Benchmark                            Mode  Cnt    Score    Error  Units (baseline)
-StringBuilders.toStringCharWithInt8  avgt   15  126.624 ± 58.659  ns/op

+Benchmark                            Mode  Cnt   Score    Error  Units (PR Update 04 f4aa1989)
+StringBuilders.toStringCharWithInt8  avgt   15  41.752 ± 20.337  ns/op (+203.27%)

5. Orange Pi 5 Plus

-Benchmark               (size)  Mode  Cnt   Score   Error  Units (baseline)
-Integers.toStringBig       500  avgt   15  20.895 ± 0.245  us/op
-Integers.toStringSmall     500  avgt   15  13.042 ± 0.187  us/op
-Integers.toStringTiny      500  avgt   15   8.017 ± 0.153  us/op

+Benchmark               (size)  Mode  Cnt   Score   Error  Units (PR Update 04 f4aa1989)
+Integers.toStringBig       500  avgt   15  16.814 ± 0.276  us/op (+24.27%)
+Integers.toStringSmall     500  avgt   15  10.028 ± 0.191  us/op (+30.05%)
+Integers.toStringTiny      500  avgt   15   9.392 ± 1.476  us/op (-14.65%)

-Benchmark            (size)  Mode  Cnt   Score   Error  Units (baseline)
-Longs.toStringBig       500  avgt   15  27.222 ± 0.386  us/op
-Longs.toStringSmall     500  avgt   15  12.366 ± 0.156  us/op

+Benchmark            (size)  Mode  Cnt   Score   Error  Units (PR Update 04 f4aa1989)
+Longs.toStringBig       500  avgt   15  23.967 ± 0.336  us/op (+13.58%)
+Longs.toStringSmall     500  avgt   15   9.787 ± 0.143  us/op (+26.35%)

-Benchmark                            Mode  Cnt   Score   Error  Units (baseline)
-StringBuilders.toStringCharWithInt8  avgt   15  95.690 ± 1.582  ns/op

+Benchmark                            Mode  Cnt   Score   Error  Units (PR Update 04 f4aa1989)
+StringBuilders.toStringCharWithInt8  avgt   15  70.560 ± 1.486  ns/op (+35.61%)

Progress

  • Change must be properly reviewed (1 review required, with at least 1 Reviewer)
  • Change must not contain extraneous whitespace
  • Commit message must refer to an issue

Issue

  • JDK-8310929: Optimization for Integer.toString (Enhancement - P4)

Reviewers

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk.git pull/14699/head:pull/14699
$ git checkout pull/14699

Update a local copy of the PR:
$ git checkout pull/14699
$ git pull https://git.openjdk.org/jdk.git pull/14699/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 14699

View PR using the GUI difftool:
$ git pr show -t 14699

Using diff file

Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/14699.diff

Webrev

Link to Webrev Comment

@bridgekeeper

bridgekeeper bot commented Jun 28, 2023

👋 Welcome back wenshao! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

@openjdk openjdk bot added the rfr Pull request is ready for review label Jun 28, 2023
@openjdk

openjdk bot commented Jun 28, 2023

@wenshao The following label will be automatically applied to this pull request:

  • core-libs

When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing list. If you would like to change these labels, use the /label pull request command.

@openjdk openjdk bot added the core-libs core-libs-dev@openjdk.org label Jun 28, 2023
@mlbridge

mlbridge bot commented Jun 28, 2023

@Glavo
Contributor

Glavo commented Jun 28, 2023

Although this method is an @IntrinsicCandidate, in practice it is only optimized when it appears in a fluent StringBuilder/StringBuffer chain (see the C2 code below), so optimizing its Java implementation is worthwhile.

CallStaticJavaNode* csj = arg->in(0)->as_CallStaticJava();
if (csj->method() != nullptr &&
    csj->method()->intrinsic_id() == vmIntrinsics::_Integer_toString &&
    arg->outcnt() == 1) {
  // _control is the list of StringBuilder calls nodes which
  // will be replaced by new String code after this optimization.
  // Integer::toString() call is not part of StringBuilder calls
  // chain. It could be eliminated only if its result is used
  // only by this SB calls chain.
  // Another limitation: it should be used only once because
  // it is unknown that it is used only by this SB calls chain
  // until all related SB calls nodes are collected.
  assert(arg->unique_out() == cnode, "sanity");
  sc->add_control(csj);
  sc->push_int(csj->in(TypeFunc::Parms));
  continue;
}
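To make Glavo's point concrete, here is a sketch of the two call shapes involved (class and method names are hypothetical). In fluent, the Integer.toString result is used exactly once, by the StringBuilder chain, which matches the conditions checked in the C2 code above (arg->outcnt() == 1); in standalone, the String escapes to the caller, so the Java implementation of Integer.toString does the work. Whether C2 actually applies the transformation at runtime depends on compilation state and flags such as OptimizeStringConcat, so treat this as illustrative only. Since JDK 9, javac compiles "+" string concatenation via invokedynamic (JEP 280) rather than a StringBuilder chain, which is why the chain is written out explicitly here.

```java
final class ConcatShapes {
    // Shape described in the C2 code: the Integer.toString result has exactly one use,
    // and that use is the StringBuilder append chain itself.
    static String fluent(int id) {
        return new StringBuilder().append("id=").append(Integer.toString(id)).toString();
    }

    // Standalone call: the String escapes to the caller, so the Java implementation
    // of Integer.toString (the code this PR changes) produces it.
    static String standalone(int id) {
        return Integer.toString(id);
    }

    public static void main(String[] args) {
        System.out.println(fluent(42));     // id=42
        System.out.println(standalone(-7)); // -7
    }
}
```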

@Glavo
Contributor

Glavo commented Jun 29, 2023

Can we cache results for small integers?

@wenshao
Contributor Author

wenshao commented Jun 29, 2023

Caching the result strings for tiny ints is a good idea and can improve performance, but it is expensive; I am not sure whether it should be added.
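A minimal sketch of the small-integer cache idea, which this PR leaves out (class and method names are hypothetical). The range is an assumption borrowed from the -128..127 range of the Integer.valueOf box cache. The cost is visible here: every cached value is a String object with its own backing array, kept alive as long as the class is loaded, in exchange for skipping the digit arithmetic entirely. A side effect is that callers receive shared String instances rather than fresh ones.

```java
final class SmallIntStrings {
    // Hypothetical cached range, chosen only for illustration.
    static final int LOW = -128;
    static final int HIGH = 127;
    private static final String[] CACHE = new String[HIGH - LOW + 1];

    static {
        // Pay the conversion cost once, up front, for every value in the range.
        for (int v = LOW; v <= HIGH; v++) {
            CACHE[v - LOW] = Integer.toString(v);
        }
    }

    static String cachedToString(int v) {
        if (v >= LOW && v <= HIGH) {
            return CACHE[v - LOW];  // shared instance: no digit arithmetic, no allocation
        }
        return Integer.toString(v); // outside the range: regular conversion
    }
}
```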

@wenshao
Contributor Author

wenshao commented Jun 30, 2023

The benchmark data has been updated; it's ready for review, @rgiulietti.

Comment on lines 527 to 529
charPos -= 2;
UNSAFE.putShortUnaligned(buf, Unsafe.ARRAY_BYTE_BASE_OFFSET + charPos, PACKED_DIGITS[r], false);
Contributor


When switching to Unsafe, getChars should bounds-check the store index inside the loop.

Contributor Author


The value of r is in the range 0-99 and PACKED_DIGITS has length 100, so there is no need for a bounds check here.

Member


I think he means checking charPos to ensure it does not go out of bounds.

Contributor Author


Oh, I see: charPos needs a bounds check. I added an assert, following the implementation of StringUTF16#putChar. Is this safe enough?

Member


I think we may need to look at the surrounding code to ensure that user input cannot cause charPos to go out of bounds. If charPos is never touched by user code, I think an assert suffices (asserts are enabled via -esa, and thus in the jtreg test suite).
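A sketch of the two options under discussion, with hypothetical names. Application code cannot use Unsafe, so plain array stores stand in for the Unsafe store here; note that plain stores also carry the JVM's implicit bounds check, which is exactly what an Unsafe store lacks. The assert variant costs nothing in production but only runs with -ea (or -esa for JDK system classes); the Objects.checkFromIndexSize variant always runs. Both conditions cover the two bytes a two-byte store touches.

```java
import java.util.Objects;

final class TwoByteStore {
    // Trusted-caller style: the assert only runs when assertions are enabled
    // (-ea for application classes, -esa for JDK system classes such as java.lang).
    static void putAsserted(byte[] buf, int charPos, byte first, byte second) {
        assert charPos >= 0 && charPos + 2 <= buf.length : "Trusted caller missed bounds check";
        buf[charPos] = first;
        buf[charPos + 1] = second;
    }

    // Explicit check: always runs and throws IndexOutOfBoundsException
    // before anything is written.
    static void putChecked(byte[] buf, int charPos, byte first, byte second) {
        Objects.checkFromIndexSize(charPos, 2, buf.length);
        buf[charPos] = first;
        buf[charPos + 1] = second;
    }
}
```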

Comment on lines 555 to 527
UNSAFE.putShortUnaligned(
buf,
Unsafe.ARRAY_BYTE_BASE_OFFSET + charPos,
Integer.PACKED_DIGITS[(int)((q * 100) - i)],
false);
Contributor


Add an array bounds check for the character stores.

Contributor Author


I added an assert, following the implementation of StringUTF16#putChar. Is this safe enough?

Member


Can you use a VarHandle here?
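For reference, a sketch of what a VarHandle-based store could look like (names are illustrative). MethodHandles.byteArrayViewVarHandle views a byte[] as an array of shorts in a chosen byte order; the int coordinate is a byte offset, plain set is permitted at unaligned offsets, and every access is bounds-checked, throwing IndexOutOfBoundsException when fewer than two bytes remain. That built-in check is what the explicit-check requests on the Unsafe version are after.

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;
import java.nio.ByteOrder;

final class VarHandleStore {
    // byte[] viewed as little-endian shorts; the coordinate is a byte offset, not a short index.
    private static final VarHandle SHORT_LE =
        MethodHandles.byteArrayViewVarHandle(short[].class, ByteOrder.LITTLE_ENDIAN);

    // Writes the low byte of 'packed' at buf[offset] and the high byte at buf[offset + 1].
    static void putPacked(byte[] buf, int offset, short packed) {
        SHORT_LE.set(buf, offset, packed);
    }

    public static void main(String[] args) {
        byte[] buf = new byte[4];
        short packed = (short) (('2' << 8) | '4'); // low byte '4', high byte '2'
        putPacked(buf, 1, packed);                 // unaligned plain store is allowed
        System.out.println((char) buf[1] + "" + (char) buf[2]); // 42
        try {
            putPacked(buf, 3, packed);             // would need bytes 3 and 4 of a 4-byte array
        } catch (IndexOutOfBoundsException e) {
            System.out.println("out of bounds");
        }
    }
}
```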

Comment on lines 568 to 554
charPos -= 2;
UNSAFE.putShortUnaligned(
buf,
Unsafe.ARRAY_BYTE_BASE_OFFSET + charPos,
Integer.PACKED_DIGITS[(q2 * 100) - i2],
false);
i2 = q2;
buf[--charPos] = Integer.DigitOnes[r];
buf[--charPos] = Integer.DigitTens[r];
}

// We know there are at most two digits left at this point.
buf[--charPos] = Integer.DigitOnes[-i2];
if (i2 < -9) {
buf[--charPos] = Integer.DigitTens[-i2];
charPos -= 2;
UNSAFE.putShortUnaligned(
buf,
Unsafe.ARRAY_BYTE_BASE_OFFSET + charPos,
Integer.PACKED_DIGITS[-i2],
false);
Contributor


Replace the implicit array bounds check with an explicit array index check if using Unsafe.

Contributor Author


I added an assert, following the implementation of StringUTF16#putChar. Is this safe enough?

Comment on lines 1589 to 1573
charPos -= 2;
UNSAFE.putIntUnaligned(
buf,
Unsafe.ARRAY_BYTE_BASE_OFFSET + (charPos << 1),
PACKED_DIGITS_UTF16[-i]);
} else {
putChar(buf, --charPos, '0' - i);
Contributor


Ditto: add an explicit array bounds check when using Unsafe, especially since this method is used outside its source file. This applies here and to the uses of Unsafe below.

Contributor Author


I added an assert, following the implementation of StringUTF16#putChar. Is this safe enough?

@merykitty
Member

It could be worth it to have a cache for small integers to skip the calculations altogether.

@wenshao
Contributor Author

wenshao commented Jul 27, 2023

It could be worth it to have a cache for small integers to skip the calculations altogether.

This PR only optimizes the calculation; caching small values should be a separate PR.

@bridgekeeper

bridgekeeper bot commented Aug 24, 2023

@wenshao This pull request has been inactive for more than 4 weeks and will be automatically closed if another 4 weeks passes without any activity. To avoid this, simply add a new comment to the pull request. Feel free to ask for assistance if you need help with progressing this pull request towards integration!

@wenshao
Contributor Author

wenshao commented Aug 26, 2023

@AlanBateman, can you help me review this PR?

@wenshao
Contributor Author

wenshao commented Aug 31, 2023

@cl4es, can you help me review this PR?

@hjohn

hjohn commented Aug 31, 2023

I'm wondering if a micro benchmark like this is very realistic. It may score well because these helper tables eventually sit in the innermost cache level, but it may be a net negative for larger functions that only convert integers occasionally and almost always take a cache miss when doing so. The tables may also displace other, more useful cache lines.

In my opinion, cache is a limited resource, and having a low-level function use a big chunk of it may be optimal for a microbenchmark but seems unlikely to be optimal overall. If you are only doing mass integer-to-string conversion, I'm sure it will do better. If you are writing out JSON that sometimes contains integers, the integer conversions may now sometimes incur cache misses (in which case a conversion that does not rely on lookup tables would perform better), or the conversion may displace other, more useful cache lines.

@theRealAph
Contributor

I'm wondering if a micro benchmark like this is very realistic.

Exactly so! This is almost the canonical example of the "JMH considered harmful" talk I gave recently.

The subject is a joke! Yes, I love JMH, but be very careful how you use it.

@plokhotnyuk
Contributor

plokhotnyuk commented Aug 31, 2023

@wenshao How about the approach used in James Anhalt's algorithm?

It reduces the number of multiplications (and of store operations, when writing in 4- to 8-byte words) but increases the total number of instructions in the routine.

Member

@cl4es cl4es left a comment


While I share the wariness against lookup tables (and the microbenchmarks that lend them too much credence), I think the approach in this PR is mostly an incremental improvement on an existing design and should be evaluated as such. By packing two lookup tables into one we reduce the opportunity for cache misses. The new lookup table is no larger than the two existing ones, so we're not wasting memory or sacrificing density. Having it in a single table makes it less likely that the arrays will end up in different locations on the heap or in compiled code. I see mostly positives here.

The main drawback is the proliferation of Unsafe usage.

Switching out the lookup table-based algorithm for something clever without a lookup table is laudable, but comes with a new set of challenges and should be done as a follow up. Since these tables can be aggressively constant-folded into the code section by our JITs it might even turn out to be a wash.

If @RogerRiggs is happy with asserts in lieu of explicit bounds checking then I move to approve this.
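To illustrate the point about packing two lookup tables into one, a sketch with a hypothetical class; the two byte[100] tables are modeled on Integer.DigitTens and Integer.DigitOnes. Entry n of the packed short[100] holds the tens digit in its low byte and the ones digit in its high byte, so one little-endian two-byte store writes both digits in order. The payload is 200 bytes either way, which is the sense in which the packed table is no larger than the two it replaces (array headers are ignored here).

```java
final class PackedTables {
    // Two separate tables, one byte per entry (modeled on Integer.DigitTens / DigitOnes).
    static final byte[] DIGIT_TENS = new byte[100];
    static final byte[] DIGIT_ONES = new byte[100];
    // One packed table: both ASCII digits of n (00..99) in a single short.
    static final short[] PACKED_DIGITS = new short[100];

    static {
        for (int n = 0; n < 100; n++) {
            DIGIT_TENS[n] = (byte) ('0' + n / 10);
            DIGIT_ONES[n] = (byte) ('0' + n % 10);
            // Low byte = tens digit, high byte = ones digit (little-endian store order).
            PACKED_DIGITS[n] = (short) ((DIGIT_ONES[n] << 8) | DIGIT_TENS[n]);
        }
    }

    public static void main(String[] args) {
        int separateBytes = DIGIT_TENS.length + DIGIT_ONES.length; // 2 x 100 bytes
        int packedBytes = PACKED_DIGITS.length * Short.BYTES;      // 100 x 2 bytes
        System.out.println(separateBytes + " " + packedBytes);     // 200 200
        System.out.println(Integer.toHexString(PACKED_DIGITS[42])); // 3234 ('2' high, '4' low)
    }
}
```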

@wenshao
Contributor Author

wenshao commented Sep 1, 2023

@wenshao How about the approach used in James Anhalt's algorithm?

It reduces the number of multiplications (and of store operations, when writing in 4- to 8-byte words) but increases the total number of instructions in the routine.

If the method's bytecode size exceeds 325 bytes (FreqInlineSize), it will not be inlined and performance will suffer. This algorithm is clearly larger than 325 bytes.

Different algorithms perform differently on tiny, small and big values.

The first commit of this PR performed better on big values, but I kept the current algorithm with a few changes, and with it performance improves in all scenarios.
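The FreqInlineSize limit mentioned above can be read from a running HotSpot JVM. A sketch using the HotSpot-specific com.sun.management.HotSpotDiagnosticMXBean API (the class and helper names are illustrative; java -XX:+PrintFlagsFinal -version lists the same flags from the command line). FreqInlineSize is the bytecode-size limit for inlining at frequently executed call sites; MaxInlineSize is the smaller limit for other call sites.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.lang.management.ManagementFactory;

final class InlineLimits {
    // Reads the current value of a HotSpot flag; throws IllegalArgumentException if unknown.
    static String flag(String name) {
        HotSpotDiagnosticMXBean bean =
            ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
        return bean.getVMOption(name).getValue();
    }

    public static void main(String[] args) {
        // Bytecode-size limit (in bytes) for inlining at hot call sites.
        System.out.println("FreqInlineSize = " + flag("FreqInlineSize"));
        // Bytecode-size limit (in bytes) for inlining at other call sites.
        System.out.println("MaxInlineSize = " + flag("MaxInlineSize"));
    }
}
```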

@wenshao
Contributor Author

wenshao commented Sep 6, 2023

I'd be more comfortable replacing the use of Unsafe with either the ByteArray functions or VarHandles. Using VarHandles will enable future optimizations, whereas Unsafe is a primitive tool and is brittle.

I agree that using a VarHandle is better, but using a VarHandle in StringLatin1 causes an exception during VM initialization, as follows:

final class StringLatin1 {
    private static final VarHandle SHORT = MethodHandles.byteArrayViewVarHandle(short[].class, ByteOrder.LITTLE_ENDIAN);
}
make images
Building target 'images' in configuration 'macosx-aarch64-server-release'
Compiling up to 3452 files for java.base
Updating support/src.zip
Updating images/sec-bin.zip
Optimizing the exploded image
Error occurred during initialization of VM
java.lang.ExceptionInInitializerError
	at java.lang.invoke.VarHandle.<clinit>(java.base/VarHandle.java:2246)
	at java.lang.invoke.VarHandles.byteArrayViewHandle(java.base/VarHandles.java:258)
	at java.lang.invoke.MethodHandles.byteArrayViewVarHandle(java.base/MethodHandles.java:4553)
	at java.lang.StringLatin1.<clinit>(java.base/StringLatin1.java:84)
	at java.lang.String.equals(java.base/String.java:1863)
	at java.util.ImmutableCollections$Set12.<init>(java.base/ImmutableCollections.java:797)
	at java.util.Set.of(java.base/Set.java:487)
	at jdk.internal.reflect.Reflection.<clinit>(java.base/Reflection.java:58)
	at java.security.AccessController.doPrivileged(java.base/AccessController.java:319)
	at java.lang.reflect.AccessibleObject.<clinit>(java.base/AccessibleObject.java:524)
Caused by: java.lang.NullPointerException
	at java.lang.invoke.MethodHandleStatics.<clinit>(java.base/MethodHandleStatics.java:70)
	at java.lang.invoke.VarHandle.<clinit>(java.base/VarHandle.java:2246)
	at java.lang.invoke.VarHandles.byteArrayViewHandle(java.base/VarHandles.java:258)
	at java.lang.invoke.MethodHandles.byteArrayViewVarHandle(java.base/MethodHandles.java:4553)
	at java.lang.StringLatin1.<clinit>(java.base/StringLatin1.java:84)
	at java.lang.String.equals(java.base/String.java:1863)
	at java.util.ImmutableCollections$Set12.<init>(java.base/ImmutableCollections.java:797)
	at java.util.Set.of(java.base/Set.java:487)
	at jdk.internal.reflect.Reflection.<clinit>(java.base/Reflection.java:58)
	at java.security.AccessController.doPrivileged(java.base/AccessController.java:319)
	at java.lang.reflect.AccessibleObject.<clinit>(java.base/AccessibleObject.java:524)

@merykitty
Member

@wenshao You can use a holder class that will get initialized only when StringLatin1::getChars is called.
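A sketch of the holder-class idiom suggested here, with hypothetical names. The JVM initializes a class lazily, on first active use, so the nested Holder class is only initialized when Holder.SHORT_LE is first read, that is, on the first call to putShort, rather than when the outer class is initialized. In application code the eager form would work fine; the counter exists only to make the deferral observable.

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;
import java.nio.ByteOrder;

final class LazyVarHandle {
    // Counts how many times Holder's static initializer has run (demonstration only).
    static int holderInits;

    // Initialized only when one of its static fields is first accessed.
    private static final class Holder {
        static final VarHandle SHORT_LE =
            MethodHandles.byteArrayViewVarHandle(short[].class, ByteOrder.LITTLE_ENDIAN);

        static {
            holderInits++;
        }
    }

    static void putShort(byte[] buf, int offset, short value) {
        Holder.SHORT_LE.set(buf, offset, value); // first call triggers Holder initialization
    }
}
```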

@wenshao
Contributor Author

wenshao commented Sep 6, 2023

In the test of Integers.toStringBig, using ByteArrayLittleEndian is about 5% slower than using Unsafe.

  • Integer.getChars
    static int getChars(int i, int index, byte[] buf) {
        // Used by trusted callers.  Assumes all necessary bounds checks have been done by the caller.
        int q, r;
        int charPos = index;

        boolean negative = i < 0;
        if (!negative) {
            i = -i;
        }

        // Generate two digits per iteration
        while (i <= -100) {
            q = i / 100;
            r = (q * 100) - i;
            i = q;
            charPos -= 2;
            assert charPos >= 0 && charPos < buf.length : "Trusted caller missed bounds check";
            ByteArrayLittleEndian.setShort(buf, charPos, PACKED_DIGITS[r]);
        }

        // We know there are at most two digits left at this point.
        if (i < -9) {
            charPos -= 2;
            assert charPos >= 0 && charPos < buf.length : "Trusted caller missed bounds check";
            ByteArrayLittleEndian.setShort(buf, charPos, PACKED_DIGITS[-i]);
        } else {
            buf[--charPos] = (byte)('0' - i);
        }

        if (negative) {
            buf[--charPos] = (byte)'-';
        }
        return charPos;
    }
  • jmh test
make test TEST="micro:java.lang.Integers.toStringBig" 
  • result
-Benchmark             (size)  Mode  Cnt  Score   Error  Units (use Unsafe)
-Integers.toStringBig     500  avgt   15  5.152 ± 0.052  us/op

+Benchmark             (size)  Mode  Cnt  Score   Error  Units (use ByteArrayLittleEndian)
+Integers.toStringBig     500  avgt   15  5.386 ± 0.020  us/op (slower 4.5%)

If Unsafe must not be used, I will replace it with ByteArrayLittleEndian.
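To make the getChars algorithm above runnable outside java.base, here is a sketch that swaps in a public little-endian VarHandle for the JDK-internal ByteArrayLittleEndian and builds its own PACKED_DIGITS table; stringSize is modeled on the JDK's Integer.stringSize. The class and helper names are stand-ins for illustration, not the JDK's code. As in getChars, positive inputs are negated (rather than negative ones), so Integer.MIN_VALUE needs no special case.

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;

final class GetCharsSketch {
    // Stand-in for ByteArrayLittleEndian.setShort: the coordinate is a byte offset,
    // and every access is bounds-checked.
    private static final VarHandle SHORT_LE =
        MethodHandles.byteArrayViewVarHandle(short[].class, ByteOrder.LITTLE_ENDIAN);

    // PACKED_DIGITS[n]: the two ASCII digits of n (00..99), tens digit in the low byte.
    private static final short[] PACKED_DIGITS = new short[100];
    static {
        for (int n = 0; n < 100; n++) {
            PACKED_DIGITS[n] = (short) ((('0' + n % 10) << 8) | ('0' + n / 10));
        }
    }

    // Number of chars needed for x, including a leading '-' (modeled on Integer.stringSize).
    static int stringSize(int x) {
        int d = 1;
        if (x >= 0) {
            d = 0;
            x = -x;
        }
        int p = -10;
        for (int i = 1; i < 10; i++) {
            if (x > p) {
                return i + d;
            }
            p = 10 * p;
        }
        return 10 + d;
    }

    // Writes the digits of i backwards, ending just before index; returns the start position.
    static int getChars(int i, int index, byte[] buf) {
        int charPos = index;
        boolean negative = i < 0;
        if (!negative) {
            i = -i;
        }
        // Two digits per iteration, written with one packed two-byte store.
        while (i <= -100) {
            int q = i / 100;
            int r = (q * 100) - i; // the two lowest digits, 0..99
            i = q;
            charPos -= 2;
            SHORT_LE.set(buf, charPos, PACKED_DIGITS[r]);
        }
        // At most two digits remain.
        if (i < -9) {
            charPos -= 2;
            SHORT_LE.set(buf, charPos, PACKED_DIGITS[-i]);
        } else {
            buf[--charPos] = (byte) ('0' - i);
        }
        if (negative) {
            buf[--charPos] = (byte) '-';
        }
        return charPos;
    }

    static String intToString(int i) {
        int size = stringSize(i);
        byte[] buf = new byte[size];
        getChars(i, size, buf);
        return new String(buf, StandardCharsets.ISO_8859_1);
    }
}
```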

@RogerRiggs
Contributor

The bulk of the improvement comes from the algorithmic change; using ByteArrayLittleEndian is more robust.
Thanks for the performance numbers, though hard to compare with the originals.

@cl4es
Member

cl4es commented Sep 6, 2023

Assuming this is the M1 numbers we're still looking at a ~120% improvement. And if there's some systemic overhead to using ByteArrayLittleEndian compared to Unsafe then that's something we might improve over time. Rewiring this to use ByteArrayLittleEndian may be the way to go for now.

@openjdk openjdk bot removed the sponsor Pull request is ready to be sponsored label Sep 6, 2023
@wenshao
Contributor Author

wenshao commented Sep 6, 2023

The performance test results of the latest version (PR Update 20 c0f42a7c) are as follows:

1. aliyun_ecs_c8i.xlarge

  • cpu : Intel Xeon Sapphire Rapids (x64)
-Benchmark               (size)  Mode  Cnt  Score   Error  Units (baseline)
-Integers.toStringBig       500  avgt   15  6.800 ± 0.022  us/op
-Integers.toStringSmall     500  avgt   15  4.792 ± 0.021  us/op
-Integers.toStringTiny      500  avgt   15  3.757 ± 0.081  us/op

+Benchmark               (size)  Mode  Cnt  Score   Error  Units (PR Update 20 c0f42a7c)
+Integers.toStringBig       500  avgt   15  5.894 ± 0.046  us/op (+15.37%)
+Integers.toStringSmall     500  avgt   15  4.027 ± 0.012  us/op (+18.99%)
+Integers.toStringTiny      500  avgt   15  3.491 ± 0.090  us/op (+7.61%)

-Benchmark            (size)  Mode  Cnt  Score   Error  Units (baseline)
-Longs.toStringBig       500  avgt   15  9.213 ± 0.019  us/op
-Longs.toStringSmall     500  avgt   15  4.550 ± 0.016  us/op

+Benchmark            (size)  Mode  Cnt  Score   Error  Units (PR Update 20 c0f42a7c)
+Longs.toStringBig       500  avgt   15  7.507 ± 0.011  us/op (+22.72%)
+Longs.toStringSmall     500  avgt   15  3.967 ± 0.021  us/op (+14.69%)

-Benchmark                            Mode  Cnt   Score   Error  Units (baseline)
-StringBuilders.toStringCharWithInt8  avgt   15  89.187 ± 0.236  ns/op

+Benchmark                            Mode  Cnt   Score   Error  Units (PR Update 20 c0f42a7c)
+StringBuilders.toStringCharWithInt8  avgt   15  36.125 ± 0.309  ns/op (+146.88%)

2. aliyun_ecs_c8y.xlarge

  • cpu : Aliyun Yitian 710 (aarch64)
-Benchmark               (size)  Mode  Cnt   Score   Error  Units (baseline)
-Integers.toStringBig       500  avgt   15  11.649 ± 0.011  us/op
-Integers.toStringSmall     500  avgt   15   6.985 ± 0.018  us/op
-Integers.toStringTiny      500  avgt   15   5.972 ± 0.013  us/op

+Benchmark               (size)  Mode  Cnt  Score   Error  Units (PR Update 20 c0f42a7c)
+Integers.toStringBig       500  avgt   15  8.957 ± 0.026  us/op (+30.05%)
+Integers.toStringSmall     500  avgt   15  6.136 ± 0.018  us/op (+13.83%)
+Integers.toStringTiny      500  avgt   15  5.753 ± 0.026  us/op (+3.80%)

-Benchmark            (size)  Mode  Cnt   Score   Error  Units (baseline)
-Longs.toStringBig       500  avgt   15  14.568 ± 0.021  us/op
-Longs.toStringSmall     500  avgt   15   7.250 ± 0.023  us/op

+Benchmark            (size)  Mode  Cnt   Score   Error  Units (PR Update 20 c0f42a7c)
+Longs.toStringBig       500  avgt   15  13.401 ± 0.012  us/op (+8.70%)
+Longs.toStringSmall     500  avgt   15   6.031 ± 0.018  us/op (+20.21%)

-Benchmark                            Mode  Cnt   Score   Error  Units (baseline)
-StringBuilders.toStringCharWithInt8  avgt   15  52.484 ± 0.534  ns/op

+Benchmark                            Mode  Cnt   Score   Error  Units (PR Update 20 c0f42a7c)
+StringBuilders.toStringCharWithInt8  avgt   15  40.410 ± 0.348  ns/op (+29.87%)

3. MacBook Pro M1 Pro

-Benchmark               (size)  Mode  Cnt   Score   Error  Units (baseline)
-Integers.toStringBig       500  avgt   15  18.483 ± 2.771  us/op
-Integers.toStringSmall     500  avgt   15   4.435 ± 0.067  us/op
-Integers.toStringTiny      500  avgt   15   2.382 ± 0.063  us/op

+Benchmark               (size)  Mode  Cnt  Score   Error  Units (PR Update 20 c0f42a7c)
+Integers.toStringBig       500  avgt   15  5.392 ± 0.016  us/op (+242.78%)
+Integers.toStringSmall     500  avgt   15  3.201 ± 0.024  us/op (+38.55%)
+Integers.toStringTiny      500  avgt   15  2.141 ± 0.021  us/op (+11.25%)

-Benchmark            (size)  Mode  Cnt  Score   Error  Units (baseline)
-Longs.toStringBig       500  avgt   15  8.336 ± 0.025  us/op
-Longs.toStringSmall     500  avgt   15  4.389 ± 0.018  us/op

+Benchmark            (size)  Mode  Cnt  Score   Error  Units (PR Update 20 c0f42a7c)
+Longs.toStringBig       500  avgt   15  7.706 ± 0.015  us/op (+8.17%)
+Longs.toStringSmall     500  avgt   15  3.094 ± 0.021  us/op (+41.85%)

-Benchmark                            Mode  Cnt    Score    Error  Units (baseline)
-StringBuilders.toStringCharWithInt8  avgt   15  124.316 ± 61.017  ns/op

+Benchmark                            Mode  Cnt   Score    Error  Units (PR Update 20 c0f42a7c)
+StringBuilders.toStringCharWithInt8  avgt   15  44.497 ± 29.741  ns/op (+179.38%)

@wenshao
Contributor Author

wenshao commented Sep 7, 2023

/integrate

@openjdk openjdk bot added the sponsor Pull request is ready to be sponsored label Sep 7, 2023
@openjdk

openjdk bot commented Sep 7, 2023

@wenshao
Your change (at version c0f42a7) is now ready to be sponsored by a Committer.

Member

@cl4es cl4es left a comment


Using ByteArrayLittleEndian cleaned up the code nicely. The overhead compared to Unsafe doesn't seem too concerning (maybe even some wins?)

@wenshao
Contributor Author

wenshao commented Sep 7, 2023

Can it be merged now?

Contributor

@RogerRiggs RogerRiggs left a comment


Looks good, thank you for your patience.

For future performance PRs, please put the detailed performance data in a separate comment, not the description. The description is included in every email and the perf data gets out of date quickly.

@wenshao
Contributor Author

wenshao commented Sep 8, 2023

/sponsor

@openjdk

openjdk bot commented Sep 8, 2023

@wenshao Only Committers are allowed to sponsor changes.

@y1yang0
Member

y1yang0 commented Sep 8, 2023

/sponsor

@openjdk

openjdk bot commented Sep 8, 2023

Going to push as commit 4b43c25.
Since your change was applied there have been 68 commits pushed to the master branch:

  • 111ecdb: 8268829: Provide an optimized way to walk the stack with Class object only
  • 716201c: 8314935: Shenandoah: Unable to throw OOME on back-to-back Full GCs
  • 4c6d7fc: 8315795: runtime/Safepoint/TestAbortVMOnSafepointTimeout.java fails after JDK-8305507
  • 7e7ab6e: 8315877: ProblemList vmTestbase/nsk/jvmti/InterruptThread/intrpthrd003/TestDescription.java on macosx-aarch64
  • 0c865a7: 8315637: JDK-8314249 broke libgraal
  • 683672c: 8292692: Move MethodCounters inline functions out of method.hpp
  • 9bf3dee: 8314831: NMT tests ignore vm flags
  • b74805d: 8315863: [GHA] Update checkout action to use v4
  • 1cae0f5: 8315220: Event NativeLibraryLoad breaks invariant by taking a stacktrace when thread is in state _thread_in_native
  • 8f7e29b: 8313422: test/langtools/tools/javac 144 test classes uses com.sun.tools.classfile library
  • ... and 58 more: https://git.openjdk.org/jdk/compare/2f7c65ec48dc35d75eed8af411d482ba40de70dc...master

Your commit was automatically rebased without conflicts.

@openjdk openjdk bot added the integrated Pull request has been integrated label Sep 8, 2023
@openjdk openjdk bot closed this Sep 8, 2023
@openjdk openjdk bot removed ready Pull request is ready to be integrated rfr Pull request is ready for review sponsor Pull request is ready to be sponsored labels Sep 8, 2023
@openjdk

openjdk bot commented Sep 8, 2023

@y1yang0 @wenshao Pushed as commit 4b43c25.

💡 You may see a message that your pull request was closed with unmerged commits. This can be safely ignored.

@wenshao
Contributor Author

wenshao commented Sep 8, 2023

/backport jdk21u

@openjdk

openjdk bot commented Sep 8, 2023

@wenshao To use the /backport command, you need to be in the OpenJDK census and your GitHub account needs to be linked with your OpenJDK username (how to associate your GitHub account with your OpenJDK username).

@RogerRiggs
Contributor

/backport jdk21u

FYI, the backport command must be issued on the final commit, not on the PR; that is, on 4b43c25 from the bot message above.

Labels
core-libs core-libs-dev@openjdk.org integrated Pull request has been integrated