
core/fault/stacktrace: Implement compression. #1172

Merged
merged 1 commit into from
Oct 9, 2017

Conversation

ben-clayton (Contributor)

The goal here is to get a stacktrace down to something under 150 bytes so that it can be reported in a Google Analytics exception message.

I've implemented a bunch of different compression methods and compared them against flate, gzip, lzw and zlib. All of the custom compressors out-compress the standard compressors.

As the payloads are small and compression time isn't that important, the algorithm tries all 8 compressors and encodes the compression type in the first 3 bits of the data.

Currently unused, but hopefully will be soon.

bestValue := v
bestIndex := 0
bestScore := size(uint64(bestIndex)) + size(bestValue)
for idx := 1; idx < p.history; idx++ {
@dsrbecky (Contributor) · Oct 9, 2017
I do not get this. Why not just:
for j := 0; j < i; j++ {
idx := i - j

@ben-clayton (Contributor, Author) · Oct 9, 2017

The index used to be fixed size - you're right in that the history length is now dynamic, so yes, this is a good suggestion.

@ben-clayton (Contributor, Author)

Done

return out
}

type compressorList []compressor
@dsrbecky (Contributor)
That seems like unnecessary complexity. Does that gain anything at all?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah - it lets you combine compressors together. Experimenting with different compressor combinations has been really useful.

/* 7 */ compressorList{baseCompressor{}, backRefCompressor{packDiff, unpackDiff, 15}},
}

const compressorIdxBits = 4 // Must be large enough to hold the index of packer.
@dsrbecky (Contributor)
Just VLE encode it?


// Tweakables
const (
bitChunkMinSize = 2
@dsrbecky (Contributor)

Notes:

  • Just increments might not be optimal. If you build a histogram of the values, you can probably calculate the optimal chunk sizes.
  • Different optimal values might apply to the back-references and to the deltas.
  • Are all of the addresses aligned by any chance? (i.e. can we divide all by common factor?)
  • Finally, I do not think it is really worth the time to investigate anything I just said :-D

@ben-clayton (Contributor, Author)

> Just increments might not be optimal. If you build a histogram of the values, you can probably calculate the optimal chunk sizes.

I actually tried this - storing the table outweighed the gains for this case. I was surprised too.

> Are all of the addresses aligned by any chance? (i.e. can we divide all by common factor?)

Nope. Again, something that surprised me. My guess is that these aren't real PC addresses, but contain other metadata somehow.

@dsrbecky (Contributor) left a review comment

Apart from sharing some thoughts, LGTM

@ben-clayton ben-clayton merged commit 0d889f0 into google:master Oct 9, 2017
purvisa-at-google-com pushed a commit that referenced this pull request Sep 29, 2022