
Make compression of peer forwarding configurable, rather than hard coded #208

Merged
MikeGoldsmith merged 1 commit into honeycombio:main on Jan 29, 2021
Conversation

@BRMatt (Contributor) commented on Jan 28, 2021

We're seeing substantial data transfer between peers in our refinery
cluster, far more than we saw going to honeycomb when we were using
hosted refinery.

We're currently running 3 c5.2xlarge instances of refinery, in separate
AZs, and each one seems to be sending ~7MB/s of data to its other 2
peers. Some back of the envelope math suggests this could generate a
bill of (7MB/s * 3600 * 24 * 30)/1000MB per GB * $0.02/GB = $362.88 for
1 node to talk to one peer. Given each node has 2 peers, and there are
three nodes, it seems plausible that the bandwidth alone could cost in
the ballpark of $2,172 - about 3 times the cost of running the cluster,
without even considering the cost of transmitting the sampled-in data to
Honeycomb.
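
For reference, here is the same back-of-the-envelope calculation as a small Go program (the 7 MB/s rate and $0.02/GB price are the figures above, and the six links come from 3 nodes each talking to 2 peers):

package main

import "fmt"

func main() {
    const (
        mbPerSecond  = 7.0            // observed transfer rate to one peer
        secondsPerMo = 3600 * 24 * 30 // seconds in a 30-day month
        dollarsPerGB = 0.02           // inter-AZ transfer price used above
        peerLinks    = 6              // 3 nodes, each sending to 2 peers
    )
    perLink := mbPerSecond * secondsPerMo / 1000 * dollarsPerGB
    fmt.Printf("per link: $%.2f/month, whole cluster: $%.2f/month\n",
        perLink, perLink*peerLinks)
    // per link: $362.88/month, whole cluster: $2177.28/month
}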

While looking through the source code I noticed that refinery explicitly
opts out of using gzip compression when communicating with peers
(https://github.com/honeycombio/refinery/blob/3d85a7bb00141f810951eee402ae859aad74e9d5/cmd/refinery/main.go#L150-L153):

// gzip compression is expensive, and peers are most likely close to each other
// so we can turn off gzip when forwarding to peers
DisableGzipCompression: true,
EnableMsgpackEncoding:  true,

It seems this change was made in February 2019 (#21). A few months later
libhoney-go was changed to use zstd for compression by default when
sending data to Honeycomb's API (honeycombio/libhoney-go#57).

The PR that introduces zstd references substantial performance
improvements, so I'm assuming Honeycomb had some performance issues with
gzip compression, disabled it on several of their services, switched to
using zstd, but never got around to enabling it in refinery...?

By the looks of it libhoney-go is currently sending zstd-compressed data
to refinery, and refinery uses zstd to compress data sent to Honeycomb,
so I assume we won't run into any substantial performance issues by
also using it for peer-to-peer communication. I've introduced a config
option to control this in case there are legitimate situations where
compression has to be disabled to prevent over-utilization of CPU.

Fixes #206
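
To sketch the idea (not the actual diff): the new option just needs to feed the transmission settings shown above. The helper and getter names here are placeholders based on the naming discussed later in this thread; the real wiring in cmd/refinery/main.go may differ.

package peer

import "github.com/honeycombio/libhoney-go/transmission"

// buildPeerTransmission is a hypothetical helper: when compressPeers is true
// (the suggested default), libhoney-go compresses forwarded batches; when it
// is false we keep the old behaviour of sending them uncompressed.
func buildPeerTransmission(compressPeers bool) *transmission.Honeycomb {
    return &transmission.Honeycomb{
        DisableGzipCompression: !compressPeers,
        EnableMsgpackEncoding:  true,
    }
}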

@BRMatt (Contributor, Author) commented on Jan 28, 2021

I installed the Debian package built by CircleCI on a machine in our cluster and the results are very promising!

Here's an iftop -B showing the data transfer between members of the cluster from the perspective of the canary running this build:

You can see that the canary is forwarding data to peers at ~1.5MB/s, and is receiving data from its peers at ~7MB/s.

Looking at the same tool on one of the peers running the latest stable release we can see that data from the canary is indeed streaming in at ~1-2MB/s.

If we adjust the back of the envelope math to account for this change, it seems the drop in data transfer costs would be quite substantial:

# pre-compression
(7MB/s * 3600 * 24 * 30)/1000MB per GB * $0.02/GB = $362.88
# after zstd
(2MB/s * 3600 * 24 * 30)/1000MB per GB * $0.02/GB = $103.68

That would likely reduce the overall data transfer costs for us from ~$2,172 to ~$622.08!

@MikeGoldsmith (Contributor) left a comment

This is great, thanks @BRMatt! 🎉

@BRMatt BRMatt marked this pull request as ready for review January 28, 2021 15:49
@BRMatt BRMatt requested a review from a team January 28, 2021 15:49
@MikeGoldsmith (Contributor) commented:

Hey @BRMatt - what do you think about updating the DisableGzipCompression property name so that it doesn't read like a double negative, e.g. "disable compression" with a default of false?

Maybe something like CompressPeerCommunication with a default of true?

@BRMatt (Contributor, Author) commented on Jan 28, 2021

@MikeGoldsmith I don't mind too much, I was just trying to match the terminology used by libhoney-go's transmission. I presume libhoney uses the double negative so that a config struct with a zero value would be a reasonable default for most people? If you want to avoid it in refinery's config I'm happy to change it.
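
To illustrate the zero-value point (illustrative only, not refinery code): because libhoney-go names the field as the negated DisableGzipCompression, a transmission struct that never touches the field keeps compression enabled by default.

package main

import (
    "fmt"

    "github.com/honeycombio/libhoney-go/transmission"
)

func main() {
    // The zero value of a bool field is false, so a transmission constructed
    // without setting DisableGzipCompression still compresses by default.
    t := transmission.Honeycomb{}
    fmt.Println(t.DisableGzipCompression) // false, i.e. compression stays on
}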

@BRMatt (Contributor, Author) commented on Jan 29, 2021

@MikeGoldsmith I've made the requested change. Do you think it'll be possible to cut a new release before the weekend? I'd quite like to have this version out ASAP to start saving on our data transfer bill 😬

@MikeGoldsmith (Contributor) left a comment

This looks great. Thanks for updating. Just one small suggestion for the GetCompressPeerCommunication description.

I understand you were trying to use the same terminology between beeline-go and refinery, but we don't have to use it, and I think this helps make the intent a little clearer.

config/config.go Outdated
@@ -25,6 +25,10 @@ type Config interface {
// peer traffic
GetPeerListenAddr() (string, error)

// GetCompressPeerCommunication will be true if refinery should _not_ compress
Suggested change
// GetCompressPeerCommunication will be true if refinery should _not_ compress
// GetCompressPeerCommunication will be true if refinery should compress
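
For context, with the suggestion applied the new getter would read roughly as below; the (bool, error) return simply mirrors GetPeerListenAddr and is an assumption rather than the merged signature.

type Config interface {
    // ...existing getters, e.g.:
    GetPeerListenAddr() (string, error)

    // GetCompressPeerCommunication will be true if refinery should compress
    // data when forwarding to peers
    GetCompressPeerCommunication() (bool, error)
}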

@MikeGoldsmith MikeGoldsmith merged commit d5df535 into honeycombio:main Jan 29, 2021
@BRMatt BRMatt deleted the feature-allow-peer-compression branch January 29, 2021 12:01