cleanup the code an try to make it faster #373

Merged: 5 commits merged into mozilla:master on Jan 29, 2018

Conversation

@rbri (Collaborator) commented Dec 27, 2017

No description provided.

@sainaen (Contributor) commented Dec 27, 2017

Hi! Thanks for the patch!
You've said “try to make it faster” in the title, so in the end does this change make flatten() perform better? How did you measure the effect?

@rbri (Collaborator, Author) commented Dec 28, 2017

I have not done any real performance tests, so I wrote 'try'.
I played a bit with the unit tests and made one more change (in fact, I changed back to ArrayList instead of Deque).
I think the code has two improvements: first, there is no depth calculation, and second, we do not add elements to the queue only to pull them off two lines later.

@sainaen (Contributor) commented Dec 28, 2017

I have not done any real performance tests, so I wrote 'try'.

Oh, I see.
I didn't mean to imply that you should revert your change though. So, how did you test that ArrayList is faster?

there is no depth calculation

Do you think we can totally replace depth with a flag isFlat/isFlattened? I'd expect depth to be used for maximum depth checks, to prevent too deep trees of ConsStrings, but I couldn't find anything but simple depth > 0 checks.

second, we do not add elements to the queue only to pull them off two lines later.

Honestly, the code 'after' looks more complicated to me than 'before': (1) it has more conditions and nesting, (2) the identical line that sets next appears in more than one branch. I'd love to see proof that this is justified by the improved performance.

By the way, I've just checked depths of s1 and s2 in the ConsString's constructor, and it looks like most of the time, at least in our tests, s1 is 'deeper' than s2:

side    avg depth    median depth
s1      93638        3
s2      1.25         0

So a typical ConsString looks like this:

ConsString(
    s1 = ConsString(
        s1 = ConsString(
            s1 = ConsString(...),
            s2 = String("something")),
        s2 = String("something")),
    s2 = String("something"))

I suspect there's an easy perf win here: process s2 before s1, so that the stack (confusingly named buffer) doesn't get too long. With this, buffer would very rarely exceed ArrayList's default capacity, which would allow us not to care about optimizing the case when next is a ConsString with depth == 0. But that's just speculation; we'll need to verify it before implementing.
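
For illustration, here is a minimal sketch of that idea: defer s1 onto the stack, walk into s2 first, and fill the output buffer from the back. The class, field, and method names below are hypothetical stand-ins, not Rhino's actual ConsString.

import java.util.ArrayDeque;

// Sketch only: a simplified cons string, not Rhino's actual ConsString.
final class ConsStringSketch implements CharSequence {

    private CharSequence s1;      // typically the deep side of the tree
    private CharSequence s2;      // typically a short, already-flat string
    private final int length;
    private boolean isFlat;

    ConsStringSketch(CharSequence s1, CharSequence s2) {
        this.s1 = s1;
        this.s2 = s2;
        this.length = s1.length() + s2.length();
    }

    // Iterative flatten: always defer s1 onto the stack and walk into s2 first,
    // copying into the buffer from the back. For the typical left-deep tree the
    // stack never holds more than one deferred node at a time.
    private String flatten() {
        char[] chars = new char[length];
        int end = length;
        ArrayDeque<CharSequence> stack = new ArrayDeque<>();
        CharSequence next = this;
        while (next != null) {
            if (next instanceof ConsStringSketch) {
                ConsStringSketch cons = (ConsStringSketch) next;
                stack.push(cons.s1);          // defer the (usually deep) s1
                next = cons.s2;               // copy the (usually flat) s2 now
            } else {
                String str = next.toString();
                end -= str.length();
                str.getChars(0, str.length(), chars, end);
                next = stack.poll();          // null once the stack is empty
            }
        }
        String flat = new String(chars);
        s1 = flat;                            // collapse the tree in place
        s2 = "";
        isFlat = true;
        return flat;
    }

    @Override public int length() { return length; }
    @Override public char charAt(int index) { return toString().charAt(index); }
    @Override public CharSequence subSequence(int start, int end) { return toString().subSequence(start, end); }
    @Override public String toString() { return isFlat ? s1.toString() : flatten(); }
}

With this shape, a left-deep chain of N nodes is flattened in O(N) work while the stack stays at depth one, so the default capacity of an ArrayList or ArrayDeque would rarely be exceeded in the common case.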

@rbri (Collaborator, Author) commented Dec 28, 2017

Do you think we can totally replace depth with a flag isFlat/isFlattened? I'd expect depth to be used for maximum depth checks, to prevent too deep trees of ConsStrings, but I couldn't find anything but simple depth > 0 checks.

That was my original idea, but changing the type of the var would break serialization.

@sainaen (Contributor) commented Dec 28, 2017

That was my original idea, but changing the type of the var would break serialization.

Wait, I thought ConsString replaces itself with a flattened String when serialized. Isn't that what writeReplace() is supposed to do?
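
For reference, a minimal sketch of the mechanism being discussed: writeReplace() lets an object substitute something else into the serialization stream, so a cons string can always be written out as its flattened String, independent of whichever internal fields (depth, a boolean flag, ...) it happens to carry. Everything below is illustrative and not Rhino's code.

import java.io.ObjectStreamException;
import java.io.Serializable;

// Sketch only: a cons-string-like holder that serializes as a flat String.
final class FlattenOnWrite implements Serializable {

    private static final long serialVersionUID = 1L;

    // transient: these never reach the stream, so their types can change freely
    private transient CharSequence s1;
    private transient CharSequence s2;

    FlattenOnWrite(CharSequence s1, CharSequence s2) {
        this.s1 = s1;
        this.s2 = s2;
    }

    @Override
    public String toString() {
        return s1.toString() + s2.toString();   // stands in for flatten()
    }

    // Called by ObjectOutputStream instead of writing this object: the stream
    // only ever contains a plain java.lang.String.
    private Object writeReplace() throws ObjectStreamException {
        return toString();
    }
}

If serialization always goes through writeReplace() like this, the on-wire form is independent of the internal fields, which seems to be the point of the question above.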

@rbri (Collaborator, Author) commented Dec 28, 2017

Honestly, the code 'after' looks more complicated to me than 'before': (1) it has more conditions and nesting, (2) the identical line that sets next appears in more than one branch. I'd love to see proof that this is justified by the improved performance.

I simply did some timing tests with our ConsStringTest class.

By the way, I've just checked depths of s1 and s2 in the ConsString's constructor, and it looks like most of the time, at least in our tests, s1 is 'deeper' than s2:

Maybe the huge numbers are because of our unit tests (testAppendManyStrings).

@sainaen (Contributor) commented Dec 28, 2017

Maybe the huge numbers are because of our unit tests (testAppendManyStrings).

Most certainly. But even if I run only the test262 cases, which are not there specifically to test ConsString, a huge gap between the typical s1 and s2 depths is still there.

Thinking about this more: with depth we could dynamically choose which branch to handle first, so it's always the 'deeper' one. That way there's no need to guess beforehand. Though I wonder if it'd somehow make the JIT's job harder.
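
A sketch of that "defer the shallower branch" idea, building on the hypothetical ConsStringSketch shown earlier and assuming it also keeps an int depth field (set in the constructor as max(depth(s1), depth(s2)) + 1). Because the deferred subtree can now come from either side, each stack entry carries its own target offset in the output buffer:

// Assumed additions to the ConsStringSketch class sketched earlier;
// they rely on its s1, s2, and length fields plus an assumed depth field.
private static final class Pending {
    final CharSequence node;
    final int offset;               // where this subtree starts in the buffer
    Pending(CharSequence node, int offset) {
        this.node = node;
        this.offset = offset;
    }
}

private static int depthOf(CharSequence cs) {
    return (cs instanceof ConsStringSketch) ? ((ConsStringSketch) cs).depth : 0;
}

private String flattenByDepth() {
    char[] chars = new char[length];
    java.util.ArrayDeque<Pending> stack = new java.util.ArrayDeque<>();
    CharSequence node = this;
    int offset = 0;
    while (true) {
        if (node instanceof ConsStringSketch) {
            ConsStringSketch cons = (ConsStringSketch) node;
            int split = offset + cons.s1.length();
            if (depthOf(cons.s1) >= depthOf(cons.s2)) {
                stack.push(new Pending(cons.s2, split));   // defer the shallower s2
                node = cons.s1;                            // descend into the deeper s1
            } else {
                stack.push(new Pending(cons.s1, offset));  // defer the shallower s1
                node = cons.s2;
                offset = split;
            }
        } else {
            String str = node.toString();
            str.getChars(0, str.length(), chars, offset);
            Pending deferred = stack.poll();
            if (deferred == null) {
                break;                                     // everything copied
            }
            node = deferred.node;
            offset = deferred.offset;
        }
    }
    return new String(chars);
}

The per-entry offset is what keeps this correct regardless of which side is processed first; whether the extra bookkeeping is worth it compared to always deferring s1 is exactly the open question here.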

@rbri (Collaborator, Author) commented Dec 28, 2017

OK, I've made a new version available; at least here, testAppendManyStrings() is 20% faster than the original version (when running with 50000000).

@sainaen (Contributor) commented Dec 28, 2017

Awesome!

I've played around with some JMH tests (NOTE: I'm not a perf engineer, results could be wrong! I'll publish my tests later) and this implementation is noticeably faster:

  • in the most favorable setup, with s1.depth == 1 and s2.depth == 1000000, flatten() from current master did 24.340 ± 1.974 ms/op
  • in the least favorable setup, with s1.depth == 1000000 and s2.depth == 1, flatten() from current master did 28.365 ± 2.401 ms/op
  • in the same setups, the proposed flatten() did 20.502 ± 1.008 ms/op and 17.076 ± 1.680 ms/op

🎉

There could possibly be a minor improvement if we switch to ArrayDeque, but the error bounds are too big.
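
For context, a sketch of what such a JMH harness could look like. The benchmark shape and parameter values are assumptions for illustration, and ConsStringSketch is the hypothetical class from the earlier sketch, not Rhino's ConsString.

import java.util.concurrent.TimeUnit;

import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.BenchmarkMode;
import org.openjdk.jmh.annotations.Level;
import org.openjdk.jmh.annotations.Mode;
import org.openjdk.jmh.annotations.OutputTimeUnit;
import org.openjdk.jmh.annotations.Param;
import org.openjdk.jmh.annotations.Scope;
import org.openjdk.jmh.annotations.Setup;
import org.openjdk.jmh.annotations.State;

// Sketch of a flatten() benchmark; depths and parameter values are examples.
@State(Scope.Benchmark)
@BenchmarkMode(Mode.AverageTime)
@OutputTimeUnit(TimeUnit.MILLISECONDS)
public class FlattenBenchmark {

    @Param({"1", "1000000"})
    public int s1Depth;

    @Param({"1", "1000000"})
    public int s2Depth;

    private ConsStringSketch subject;

    // Rebuild the tree before every invocation, because flatten() collapses it
    // in place the first time it runs.
    @Setup(Level.Invocation)
    public void setUp() {
        CharSequence s1 = "x";
        for (int i = 0; i < s1Depth; i++) {
            s1 = new ConsStringSketch(s1, "y");
        }
        CharSequence s2 = "x";
        for (int i = 0; i < s2Depth; i++) {
            s2 = new ConsStringSketch(s2, "y");
        }
        subject = new ConsStringSketch(s1, s2);
    }

    @Benchmark
    public String flatten() {
        return subject.toString();
    }
}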

I also like the name change. What do you think about taking it slightly further: first → left, second → right? It's a tree, after all. :)

@sainaen (Contributor) left a comment

Thanks again for working on this!

@gbrail (Collaborator) commented Jan 3, 2018 via email

@sainaen (Contributor) commented Jan 3, 2018

I did 9 runs of the v8 benchmark for the code in master and with this patch. The median score I got is 43.6 (before) vs. 43.3 (after), which is comfortably inside the error bounds, except maybe for Crypto. This result makes sense, since neither of the test scripts seems to be heavy on String concatenation.

[chart: median v8 scores]

Caliper in the SunSpider benchmark fails with “could not create the JVM” — does it work for you?
Running the SunSpiderBenchmark class directly, I don't see a huge difference in the numbers, similar to the v8 benchmark. I don't know how trustworthy these numbers are.

[chart: SunSpider's String median time]

(See my Google sheet with the data.)

There's a new v8/web-tooling-benchmark, which is supposed to have more real-life workloads, but it requires (at least) template string support, so it doesn't run in Rhino right now. :(

@gbrail (Collaborator) commented Jan 16, 2018

Thanks for all the detailed work. I'd like to get 1.7.8 out, and since we did a "release candidate" I'd like to hold off on merging this until after that, so that we keep the changes limited to bug fixes. Let me know if this is blocking something more urgent. Thanks!

@rbri (Collaborator, Author) commented Jan 16, 2018 via email

@gbrail gbrail merged commit c71115f into mozilla:master Jan 29, 2018
@gbrail (Collaborator) commented Jan 29, 2018

I decided this looked good too. Thanks!

@rbri (Collaborator, Author) commented Jan 30, 2018 via email
