Skip to content

HTTPS clone URL

Subversion checkout URL

You can clone with
or
.
Download ZIP

Loading…

(Premature) optimization of nqp::join() on JVM #106

Merged
merged 1 commit into from

6 participants

@gerdr
Collaborator

Instead of using a StringBuilder (which may need to repeatedly re-allocate its internal buffer), we allocate a character array of correct size. This improves the performance of micro-benchmarks, but the effect on setting compilation is minimal.

@Benabik
Collaborator

StringBuilder does have a constructor that takes a capacity. Have you compared timings with using that?

@gerdr
Collaborator

@Benabik I did not compare timings against a properly-sized StringBuilder, but if there's any performance difference at all, this will be slightly faster as I basically manually inlined AbstractStringBuilder.append(), getting rid of the overflow checks and saving the allocation of the StringBuilder object.

The JIT can potentially do exactly the same thing, but I don't think it's worth the effort to de-optimize the code now that I've already written it ;)

@Benabik
Collaborator

Agressive manual inlining leads to harder to read code. I had wondered if you could get much of the speed improvement without adding so much code.

It's also possible that the implementation of StringBuilder uses lower level loops for speed. memcpy can often be faster than manual loops, and I imagine that would be even worse when the loop is in bytecode.

What micro-benchmark improves? I could try it myself, since I suggested it.

@gerdr
Collaborator

Ahem, StringBuilder uses String.getChars() to copy the data, same as I do. The additional loops are there to precompute the length.

Anyway, if you want to do this, more idiomatic code that does the same thing (untested, beware typos) is

public static String join(String delimiter, SixModelObject arr, ThreadContext tc) {
    final int prim = arr.st.REPR.get_value_storage_spec(tc, arr.st).boxed_primitive;
    if (prim != StorageSpec.BP_NONE && prim != StorageSpec.BP_STR)
        ExceptionHandling.dieInternal(tc, "Unsupported native array type in join");

    final int numElems = (int) arr.elems(tc);
    if (numElems == 0)
        return "";

    String[] strings = new String[numElems];
    int totalLength = delimiter.length() * (numElems - 1);

    for (int i = 0; i < numElems; ++i) {
        if (prim == StorageSpec.BP_STR) {
            arr.at_pos_native(tc, i);
            strings[i] = tc.native_s;
            totalLength += tc.native_s.length();
        } else {
            String s = arr.at_pos_boxed(tc, i).get_str(tc);
            strings[i] = s;
            totalLength += s.length();
        }
    }

    StringBuilder sb = new StringBuilder(totalLength);
    sb.append(string[0]);

    for(int i = 1; i < strings.length; ++i) {
        sb.append(delimiter);
        sb.append(strings[i]);
    }

    return sb.toString();
}

As a micro-benchmark, just call nqp::join() on an array that's 'big enough'.

@Benabik
Collaborator

I appear to have mis-read your commit, my apologies. And looking at the code to still use StringBuilder, I see there's not a lot of difference in complexity anyway. When I read "premature optimization" and "manual inlining of library code", I worry a bit. Thank you for explaining things.

In the meantime, I have to figure out how I broke my nqp.jvm...

@diakopter
Owner

looks like my emailed reply didn't make it an hour ago; oh well; here it is:

It's worth testing at least, since it's also conceivable the JIT can optimize the usual factoring better.

@gerdr
Collaborator

I agree that it's worth looking into, but mostly for academic reasons. I would be very surprised if the JIT could optimize the less explicit code any better as StringBuilder is essentially a thin wrapper for exactly the same code.

@teodozjan

Hi,

IMHO it requires benchark.

Also there is temptation to make StringBuilder static. See this stackoverflow question but may become memory hungry solution if someone uses this in bad way

@moritz
Owner

I've done some rudimentary benchmarks in https://gist.github.com/moritz/ab4e7cd01c36032e4d6b

Results:

  • startup time unchanged (2.02 +- 0.06 => 2.08 +- 0.15)
  • performance improvements of 9.6%, but in my numbers that's only three standard deviations (so don't trust it too much).

So a decent improvement in a micro benchmark while only slightly increasing code size and complexity sounds like an overall win to me.

@moritz moritz merged commit 52733c5 into perl6:master
@rurban
Collaborator

See also parrot/parrot#1123 to fix this in the parrot backend.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Commits on Jul 10, 2013
  1. @gerdr

    Optimize nqp::join() on JVM

    gerdr authored
This page is out of date. Refresh to see the latest.
Showing with 30 additions and 11 deletions.
  1. +30 −11 src/vm/jvm/runtime/org/perl6/nqp/runtime/Ops.java
View
41 src/vm/jvm/runtime/org/perl6/nqp/runtime/Ops.java
@@ -2676,25 +2676,44 @@ public static String chr(long val) {
}
public static String join(String delimiter, SixModelObject arr, ThreadContext tc) {
- final StringBuilder sb = new StringBuilder();
- int prim = arr.st.REPR.get_value_storage_spec(tc, arr.st).boxed_primitive;
+ final int prim = arr.st.REPR.get_value_storage_spec(tc, arr.st).boxed_primitive;
if (prim != StorageSpec.BP_NONE && prim != StorageSpec.BP_STR)
ExceptionHandling.dieInternal(tc, "Unsupported native array type in join");
final int numElems = (int) arr.elems(tc);
- for (int i = 0; i < numElems; i++) {
- if (i > 0) {
- sb.append(delimiter);
- }
- if (prim == StorageSpec.BP_STR) {
+ if (numElems == 0)
+ return "";
+
+ String[] strings = new String[numElems];
+ int totalLength = delimiter.length() * (numElems - 1);
+
+ if (prim == StorageSpec.BP_STR) {
+ for (int i = 0; i < numElems; ++i) {
arr.at_pos_native(tc, i);
- sb.append(tc.native_s);
- } else {
- sb.append(arr.at_pos_boxed(tc, i).get_str(tc));
+ strings[i] = tc.native_s;
+ totalLength += tc.native_s.length();
+ }
+ } else {
+ for (int i = 0; i < numElems; ++i) {
+ String s = arr.at_pos_boxed(tc, i).get_str(tc);
+ strings[i] = s;
+ totalLength += s.length();
}
}
- return sb.toString();
+ char[] chars = new char[totalLength];
+ strings[0].getChars(0, strings[0].length(), chars, 0);
+
+ int pos = strings[0].length();
+ for(int i = 1; i < strings.length; ++i) {
+ delimiter.getChars(0, delimiter.length(), chars, pos);
+ pos += delimiter.length();
+
+ strings[i].getChars(0, strings[i].length(), chars, pos);
+ pos += strings[i].length();
+ }
+
+ return new String(chars);
}
public static SixModelObject split(String delimiter, String string, ThreadContext tc) {
Something went wrong with that request. Please try again.