Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

JDK-8266503: [ul] Make Decorations safely copy-able and reduce their size (first attempt, closed) #3855

Conversation

@tstuefe
Copy link
Member

@tstuefe tstuefe commented May 4, 2021

In Universal Logging, class LogDecorations keeps resolved decorations as well as a lookup table of pointers to the start of each resolved decoration, by decorator. This is dangerous, since it makes object copy non-trivial (the pointers would need to be relocated). It is also wasteful since we spend 8 bytes per pointer per index.

Better would be to use a numerical index array of offset values, which could be safely copied.

And since the resolve buffer is 256 char, we can easily make this index an 8-bit value, which reduces the size of a LogDecorations object down to 280 bytes (from 368). Size matters especially in the context of JDK-8229517.

The patch also adds a gtest, which tests that decorations are safely copy-able and that decorator resolution works as expected.

Testing:

  • manually jtreg runtime/logging
  • manually gtests
  • Will run more tests tonight

Progress

  • Change must not contain extraneous whitespace
  • Commit message must refer to an issue
  • Change must be properly reviewed

Issue

  • JDK-8266503: [ul] Make Decorations safely copy-able and reduce their size

Reviewers

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.java.net/jdk pull/3855/head:pull/3855
$ git checkout pull/3855

Update a local copy of the PR:
$ git checkout pull/3855
$ git pull https://git.openjdk.java.net/jdk pull/3855/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 3855

View PR using the GUI difftool:
$ git pr show -t 3855

Using diff file

Download this PR as a diff file:
https://git.openjdk.java.net/jdk/pull/3855.diff

@bridgekeeper
Copy link

@bridgekeeper bridgekeeper bot commented May 4, 2021

👋 Welcome back stuefe! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

Loading

@openjdk
Copy link

@openjdk openjdk bot commented May 4, 2021

@tstuefe The following label will be automatically applied to this pull request:

  • hotspot-runtime

When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing list. If you would like to change these labels, use the /label pull request command.

Loading

@tstuefe tstuefe marked this pull request as ready for review May 4, 2021
@openjdk openjdk bot added the rfr label May 4, 2021
@mlbridge
Copy link

@mlbridge mlbridge bot commented May 4, 2021

Webrevs

Loading

Copy link
Contributor

@navyxliu navyxliu left a comment

I think the idea lookup table of offsets is great. Also compilers can make use of SIMD instructions to generate more efficient copying code.

But I think we still need a custom copy constructor. Here is why.

  1. the internal representation of decorations is compact. LogDecorations::create_decorations() puts all non-trivial decorations together.
  2. the average use of _decorations_buffer is just 1/10 of its capacity or DecorationsBufferSize.
    I dump all decorations : java -Xlog:async -Xlog:all=trace:file=hotspot.log --version
    By default, its decorators is LogDecorators::DefaultDecoratorsMask = (1 << uptime_decorator) | (1 << level_decorator) | (1 << tags_decorator);

The internal representation of _decorations_buffer most likely looks as follows.

Screen Shot 2021-05-04 at 1 11 12 PM

uptime | lvl(optimized out) | tags
2.582s '\0' '\0' codestrings '\0' [followed by garbage bytes]

I think it's a good idea to copy what we need. My copy constructor actually just copys 20bytes instead of 256 bytes.

LogDecorations::LogDecorations(const LogDecorations& o)
    : _level(o._level), _tagset(o._tagset) {
  size_t sz = 0;
  for (int i = 0; i < LogDecorators::Count; ++i) {
    if (o._decoration_offset[i] != NULL) {
      sz += strlen(o._decoration_offset[i]) + 1;
    }
  }

  // safe to call with sz = 0
  memcpy(_decorations_buffer, o._decorations_buffer, sz);
  for (int i = 0; i < LogDecorators::Count; ++i) {
    if (o._decoration_offset[i] != NULL) {
      _decoration_offset[i] = (o._decoration_offset[i] - o._decorations_buffer) + _decorations_buffer;
    } else {
      _decoration_offset[i] = NULL;
    }
  }
}

Can we that resolve copy ctor of logDecoration in this JBS? I think we can split this from that PR, which has been quite big.
I can try your approach and see how much performance hit on average.

Loading

// as "invalid offset" marker.
typedef uint8_t offset_t;
static const offset_t invalid_offset = DecorationsBufferSize - 1;
offset_t _decoration_offset[LogDecorators::Count];
Copy link
Contributor

@navyxliu navyxliu May 4, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is less flexible than before. the type uint8_t limits the size of _decorations_buffer is lesser than 256. it's impossible to stretch _decorations_buffer. Further, we should avoid implicit assumption between DecorationsBufferSize and uint8_t.

Loading

Copy link
Contributor

@navyxliu navyxliu May 5, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see that you have a statement. It does check offset_t is wide enough to represent DecorationsBufferSize.

static const offset_t invalid_offset = DecorationsBufferSize - 1;
``

Loading

@navyxliu
Copy link
Contributor

@navyxliu navyxliu commented May 4, 2021

hi, Thomas,

I compare yours and mine using java -Xlog:async -Xlog:'all=trace:file=hotspot-async.log:uptime,tid,l,tg:filecount=0' --version. here is the performance comparison.

baseline is synchronized log outputs. custom copy ctor is what I am using in PR#3135.

wall-time(ms)
1 baseline 96.942
2 custom copy ctor 84.008
3 default copy ctor 84.226

I acknowledge that copying _decorations_buffer as need is not necessary.
use default copy ctor is good enough.

thanks,
--lx

Loading

// as "invalid offset" marker.
typedef uint8_t offset_t;
static const offset_t invalid_offset = DecorationsBufferSize - 1;
offset_t _decoration_offset[LogDecorators::Count];
Copy link
Contributor

@navyxliu navyxliu May 5, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see that you have a statement. It does check offset_t is wide enough to represent DecorationsBufferSize.

static const offset_t invalid_offset = DecorationsBufferSize - 1;
``

Loading

@tstuefe tstuefe marked this pull request as draft May 5, 2021
@tstuefe
Copy link
Member Author

@tstuefe tstuefe commented May 5, 2021

Converted back to Draft because I think this can be done much better by splitting decorator resolution phase and printing phase.

Loading

@openjdk openjdk bot removed the rfr label May 5, 2021
@navyxliu
Copy link
Contributor

@navyxliu navyxliu commented May 5, 2021

I read this patch and PR #3874. IMHO, I don't think PR #3874 is over this patch in terms of "makes the code easier to understand and read." The change is obviously bigger(+238 −117) than this one. Is it worth it?

To me, the only critical drawback of storing resolved c-string is i18n. As a matter of fact, unified logging doesn't put i18n into consideration, so it is not an issue.

A universal buffer seems cleaner than scatter fields, doesn't it? Reducing size(from 368 -> 56 bytes on Linux x64) is appealing, but it actually has very limited impact on real performance.

I am not a reviewer. Those are all my personal opinions. Take it as a grain of salt. I can go either way.

Loading

@tstuefe
Copy link
Member Author

@tstuefe tstuefe commented May 6, 2021

I read this patch and PR #3874. IMHO, I don't think PR #3874 is over this patch in terms of "makes the code easier to understand and read." The change is obviously bigger(+238 −117) than this one. Is it worth it?

Be fair, a lot of those lines is added tests, which had not been there before, and added comments, which this code sorely needed. The code size of logDecorations.cpp/hpp is smaller now.

I reduced the patch size somewhat further.

To me, the only critical drawback of storing resolved c-string is i18n. As a matter of fact, unified logging doesn't put i18n into consideration, so it is not an issue.

A universal buffer seems cleaner than scatter fields, doesn't it?

In my opinion no.

Reducing size(from 368 -> 56 bytes on Linux x64) is appealing, but it actually has very limited impact on real performance.

Really? It decreases your planned message buffer size, no?

I am not a reviewer. Those are all my personal opinions. Take it as a grain of salt. I can go either way.

I close this PR now in favor of the other one.

Loading

@tstuefe
Copy link
Member Author

@tstuefe tstuefe commented May 6, 2021

Closing in favor of #3874

Loading

@tstuefe tstuefe changed the title JDK-8266503: [ul] Make Decorations safely copy-able and reduce their size JDK-8266503: [ul] Make Decorations safely copy-able and reduce their size (first attempt, closed) May 6, 2021
@tstuefe tstuefe closed this May 6, 2021
@navyxliu
Copy link
Contributor

@navyxliu navyxliu commented May 6, 2021

Really? It decreases your planned message buffer size, no?

Yes, I am happy to see it reduces the size of each payload. but sometimes smaller memory footprint doesn't change to runtime performance directly. From my measurement, LogDecoration doesn't impact that much.

Check out this performance comparison. I found that copying 20 bytes is almost identical to copying 256 bytes.
#3855 (comment)

Loading

@tstuefe
Copy link
Member Author

@tstuefe tstuefe commented May 6, 2021

Really? It decreases your planned message buffer size, no?

Yes, I am happy to see it reduces the size of each payload. but sometimes smaller memory footprint doesn't change to runtime performance directly. From my measurement, LogDecoration doesn't impact that much.

Check out this performance comparison. I found that copying 20 bytes is almost identical to copying 256 bytes.
#3855 (comment)

Oh sure. But I never claimed this to be performance relevant. Its all about memory.

Loading

@tstuefe tstuefe deleted the JDK-8266503-UL-decorations-should-use-index-array branch Aug 23, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
2 participants