Compress large payloads in all Action Cable subscription adapters with ActiveSupport::Gzip #49903
Conversation
I think if compression is going to be added, it should be added to ActionCable on the whole, similar to how every single ActiveSupport::Cache store supports compression. Additionally, ActiveSupport::Cache has a threshold for when to use compression. It defaults to compressing anything over 1 KB. Compression has diminishing returns on content under a certain size, which is why the threshold is helpful. I think ActionCable could take some pointers from ActiveSupport::Cache on this.
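A minimal sketch of the threshold idea described above, assuming stdlib `Zlib` in place of whatever compressor Action Cable would adopt (the constant name and 1 KB default are illustrative, mirroring the cache's default):

```ruby
require "zlib"

COMPRESS_THRESHOLD = 1024 # bytes; assumption, mirroring ActiveSupport::Cache's default

# Only compress payloads at or above the threshold; tiny payloads gain
# little from gzip and can even grow. Zlib.gzip stands in for
# ActiveSupport::Gzip.compress, which produces the same gzip format.
def maybe_compress(payload)
  return payload if payload.bytesize < COMPRESS_THRESHOLD
  Zlib.gzip(payload)
end
```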
I really like this idea. Is this something the Rails Core team would accept? I can work on this; it would be great to hear thoughts from @matthewd and @palkan. It could be something to enable or disable compression, set a compressor (which by default would be Zlib, as in ActiveSupport::Cache), and set a threshold.
Good point. In the meantime, I've set a 1-kilobyte threshold by default for now. Do you think we should make that configurable? Since there is a possibility that we will move these configurations out of the adapter, I'd like to hear more opinions before proceeding.
Great work 👍
Left some comments on how we can refactor it a bit and reuse existing Rails utilities.
> I think if compression is going to be added, it should be added to ActionCable on the whole, similar to how every single ActiveSupport::Cache store supports compression.

I'm not sure we need this to be a global option. At least, at this point. I'd prefer to keep this an internal detail rather than something a user can interfere with: fewer configuration options, less mental overhead.

However, I think it's worth extracting this functionality into a utility class, so possible third-party (or future built-in) adapters would be able to benefit from it.

So, in the end, the usage would be like:
```ruby
class CustomAdapter < ActionCable::SubscriptionAdapter::Base
  def initialize(*)
    super
    @compressor = ActionCable::SubscriptionAdapter::Compressor.new(threshold: 8.kilobytes, compressor: ActiveSupport::Gzip)
    # we also would need to extend the subscriber map interface to support decompression
    subscriber_map.compress_with(@compressor)
  end

  def broadcast(channel, payload)
    payload = @compressor.compress(payload)
    # ...
  end
end
```
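A minimal sketch of what such a utility class might look like. Everything here is hypothetical: the `Compressor` class shape and the `GzipCodec` module are assumptions based on the usage snippet above, and stdlib `Zlib` stands in for ActiveSupport::Gzip (same gzip format, same `compress`/`decompress` interface):

```ruby
require "zlib"

# GzipCodec mimics ActiveSupport::Gzip's compress/decompress interface
# using only the standard library.
module GzipCodec
  def self.compress(data)
    Zlib.gzip(data)
  end

  def self.decompress(data)
    Zlib.gunzip(data)
  end
end

# Hypothetical sketch of the proposed utility; names and defaults are
# assumptions, not the PR's actual implementation.
class Compressor
  def initialize(threshold: 1024, compressor: GzipCodec)
    @threshold = threshold
    @compressor = compressor
  end

  # Compress only when the payload crosses the threshold.
  def compress(payload)
    return payload if payload.bytesize < @threshold
    @compressor.compress(payload)
  end
end
```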
@palkan thanks for your review! I've addressed all your comments. What do you think now?
Thanks! Great work.
I've just noticed that we changed the way we escape payloads, see the comment.
Thanks @brunoprietog! Looks like I'm out of questions/suggestions 😁 @rafaelfranca @matthewd Looking for someone with merge permissions for the final review 🙂

I like the idea of default compression a lot, like with caching. Let's apply it to all adapters, not just pgsql.

Great! Which threshold do you find suitable for the other adapters? Or do we go with the compressor default (1 KB)?
actioncable/CHANGELOG.md:

```
* Compress large payloads in all subscription adapters

  This allows to broadcast payloads larger than 8000 bytes by compressing them with ActiveSupport::Gzip with PostgreSQL
```
Think this needs a rewording.
Also, maybe give some indication of how big the payload can be now. Before we blew at 8k. What should you expect to be the limit with compression now?
Maybe double? I guess it depends a lot on the string. Also the limit really only gives problems in PostgreSQL.
To me it's about more than the limit. It's simply more efficient! We use compression for everything else that goes over the wire. So would make a combined point.
Good point. What do you think now?
```ruby
def compressed?(data)
  data.start_with?("1")
end
```
Are 0/1 sufficiently distinctive markers that they'll never trigger on real data?
To my mind, yes, because the prefix is always added, regardless of whether the payload was compressed or not. If the payload is "0", the payload to be broadcasted will be "00". Then the prefix is checked and removed.
It's more that if the real payload had a 1, then decompression would fail.

And actually, on second thought, that could definitely be a problem. If you have a channel that sends ids, and we do, this is going to blow. Think we need a different marker.
I don't quite understand this point. If you send an id with value "1", the actual payload to be broadcasted is "01". Then the prefix is removed. The same applies to compressed payloads: just compress the payload without the prefix, and then check in the decompress method whether it has to decompress or not.

Anyway, maybe we could also use the same prefix as ActiveSupport::Cache, `\x78`, if you think it's better.
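A sketch of the always-prefixed marker scheme described in this thread: because "0" or "1" is prepended unconditionally, a real payload that happens to start with "1" cannot be mistaken for a compressed one. The base64 step and the 1 KB threshold are my assumptions, added to keep the example text-safe:

```ruby
require "zlib"

# "0" = plain, "1" = gzipped; the marker is always present.
def encode(payload, threshold: 1024)
  if payload.bytesize >= threshold
    "1" + [Zlib.gzip(payload)].pack("m0") # base64 keeps the text channel safe
  else
    "0" + payload
  end
end

def decode(data)
  body = data[1..]
  data.start_with?("1") ? Zlib.gunzip(body.unpack1("m0")) : body
end
```

Note how `encode("1")` yields `"01"`, so a literal id of "1" round-trips safely.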
> It's more if the real payload had a 1, then decompression would fail.

That could only happen during the upgrade rollout (when some instances still publish broadcasts without the compression flag). To prevent this (and allow potential schema changes), it's worth using a custom prefix (a kind of versioning mechanism).
@palkan Right, do you have any suggestions to handle this?
I see the following options here:

1. A more sophisticated payload schema with a version bit: `| version (2) | compressed (1) | data |`. That would allow us not only to handle legacy payloads, but also to introduce new extensions in the future in a non-breaking way. What is the version then? Probably something like the already mentioned `\x78` from the cache implementation (which is just `"x"`, right? so it isn't really different from "1").

2. Implement a fallback: if we observe "1" and fail to decompress, fall back to the original value. However, that wouldn't work with payloads starting with "0". That leads us to the third option...

3. Just ignore it. Action Cable pub/sub is ephemeral; if some messages get lost (or fail to decompress), that wouldn't be the end of the world, since the communication channel is unreliable; so we do not break any guarantees.

I would go with 3 for now; maybe adding one more prefix to indicate the schema version, e.g., `"v1/<compressed>/<payload>"`.
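A hedged sketch of option 3 with a schema-version prefix of the `"v1/<compressed>/<payload>"` shape suggested in this thread. The exact layout, the base64 step, and the 1 KB threshold are illustrative assumptions:

```ruby
require "zlib"

VERSION = "v1" # illustrative schema-version prefix

# Pack using a "v1/<flag>/<payload>" layout. Unknown versions are passed
# through unchanged, which is how legacy (pre-upgrade) payloads are
# tolerated instead of breaking decompression.
def pack(payload, threshold: 1024)
  if payload.bytesize >= threshold
    "#{VERSION}/1/#{[Zlib.gzip(payload)].pack("m0")}"
  else
    "#{VERSION}/0/#{payload}"
  end
end

def unpack(data)
  version, flag, body = data.split("/", 3)
  return data unless version == VERSION && body # legacy payload: pass through
  flag == "1" ? Zlib.gunzip(body.unpack1("m0")) : body
end
```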
So in that case, should we have an Action Cable setting that allows configuring the schema?
> we should have an Action Cable setting that allows to configure the schema?

I don't think we need it; I meant an internal version number (to handle framework-level changes), not something like a namespace.
Looking at this with some time away, I'm actually not really sure what we're gaining here. This doesn't solve the issue that pgsql has an upper boundary. It just pushes it out a bit further. Which means you're less likely to hit it, but when you do, it'll be even more of a surprise.

I very much appreciate the work that's gone into this, but on further reflection, I don't actually think it's a good path. It doesn't definitively solve the underlying issue, but perhaps makes it even more surprising when it hits (kinda like int vs bigint of old) at higher scale/payload.

And the compression overhead is very likely not going to improve performance on adapters that aren't constrained by payload size, since the compress/decompress step is almost certainly more performance intensive than just sending a larger raw payload. But that last point COULD be disproven by some solid performance validation, and if it was substantially better, I'd consider this change on those grounds alone. But the value of that performance improvement has to be big enough to offset the increased complexity of the setup.

I don't think this is a good solution to the pg limit, though.
Postgres’s NOTIFY docs say:
So what if you kept some of the work done here and basically have the Postgres adapter have an 8000 byte threshold. Anything under that simply works as it always has. But anything over that writes the message to a new table that the Postgres adapter has, and calls NOTIFY with a GlobalID of this record.

The part you'd be keeping of this PR is the ability to have different types of messages. Instead of compressed and decompressed, you have raw and global id as options.

Writing to a table would obviously decrease throughput, which is why I think the table is solely for Postgres adapter messages over the 8000 byte limit. Then there's no performance loss for the adapter as is, and the only time you'd have a performance hit is when it would have previously errored (a great trade-off).

One thing to consider is when these records would be deleted. I'm not familiar enough with ActiveStorage. I don't think the web servers can basically "pop" the message, since the broadcast would go out to all of them, and you wouldn't want Server A to delete the row before Server B can read it. So I think you'd need some sort of cleanup job that bulk deletes old messages after they're obviously done. Which then leaves me wondering if this message column might need encryption.
Oh, and the table can be optional. On app boot the Postgres adapter can check for the existence of the table. If it doesn't exist, it does what it always has (errors on large payloads).
@dhh wrote:

💯. This is potentially an interesting optimisation for reducing bytes over the network. It is not a solution for the large payload limit.

@natematykiewicz wrote:

The actioncable-enhanced-postgresql-adapter gem has already implemented the separate table approach. Also #49634 is still open, which is a less refined implementation of what ended up in the gem. I should probably update the PR to match the gem. I've ended up opinionated that it's the correct approach (fully self-managing table management, no ActiveRecord dependency).
Compressing the payload was a suggestion from @palkan that I found interesting because it is relatively simple, it doesn't add as much overhead as having to query a database table, and ultimately, broadcasting payloads that are too large shouldn't be recommended anyway. But it definitely doesn't solve the underlying problem, that's a fact. Anyway, from #50480 I understand that the adapter you made for Campfire will quickly solve this problem.
Another option is (pseudocode):

```
if payload.size <= 8000 bytes
  broadcast payload
else
  compressed = compress(payload)
  if compressed.size <= 8000 bytes
    broadcast compressed
  else
    stored = ActionCablePayload.create!(payload: compressed)
    broadcast stored
  end
end
```

And you can use the payload prefixes in this PR to help manage how the other end should decode it. I believe this would give you the best throughput, falling back to a database table only when absolutely necessary.

Someone would want to benchmark broadcasting an 8000 byte payload vs compressing and broadcasting that payload. My assumption is that the time it takes to compress would be more than the network latency. But I don't actually know. Maybe you'd just always want to compress.
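A runnable sketch of this tiered idea. The `raw/`/`gz/`/`ref/` prefixes, the base64 step, and the `store` callable (standing in for something like the `ActionCablePayload` record above) are all illustrative assumptions:

```ruby
require "zlib"

LIMIT = 8000 # Postgres NOTIFY payload limit, per the discussion above

# Try raw first, then compressed, and only persist a reference when the
# payload doesn't fit even after compression.
def prepare(payload, store:)
  return "raw/#{payload}" if payload.bytesize <= LIMIT

  compressed = [Zlib.gzip(payload)].pack("m0") # base64 keeps it text-safe
  return "gz/#{compressed}" if compressed.bytesize <= LIMIT

  # Too big even compressed: persist it and broadcast only a reference.
  "ref/#{store.call(payload)}"
end
```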
At the beginning this was only for PostgreSQL, and compression was only done when the payload was larger than 8000 bytes. Then it mutated into compressing payloads in all adapters. I think that idea sounds like the most efficient in terms of avoiding table queries as much as possible.
Well, the problem being solved has always been "the Postgres adapter cannot be used with payloads over 8000 bytes". Compression was an easy way to raise that ceiling a lot, to make the problem much less likely. But to DHH's point, this PR doesn't solve the fact that the Postgres adapter cannot be used for all payloads. From what I'm seeing, gzipping can give you up to 1000x smaller files. So this PR means you might be able to send a ~7.5MB payload. Is that good enough for normal use cases? That's substantially larger than ~8KB.
Tangentially, how does the Redis adapter handle a 7.5MB payload? This amount of data seems like this would be too large to send in 1 message anyways. I imagine that much HTML would cause the browser to throw memory warnings. And I'm not sure how it'd do parsing that much JSON. |
@natematykiewicz wrote:
This claim isn't helpful. Palkan reported ~3x reduction for typical Hotwire payloads in my PR. This is in line with general wisdom that JSON/HTML payloads tend to compress between 2-4x.

My implementation treats < 8000 byte payloads normally and puts them in a table above 8000. The compression thing could be added in there as you say. But unless anyone can point to a real-world measurement of compression improving the performance of their Action Cable deployment, it's unclear whether the added complexity is warranted.

Not that the 7.5MB payload size makes sense, but just for the record: the way Redis pub/sub is limited is "bytes per client per minute". If I understand correctly, it's 8-32MB per minute per client. So it's a very different model from the "per message" limit of Postgres.
Sorry to be misleading. I was reading that a level 9 Gzip has a 1032:1 compression ratio. But I suppose that's a perfect-world scenario and not something you see realistically. I also don't know if people tend to do a level 9 Gzip at runtime. I know the default is level 6. So basically you're saying that with compression we should expect to increase the payload limit from 8000 bytes to somewhere between 16,000 and 32,000 bytes. That's better, but not amazing... Do you know what level the 2-4x compression was? Was it 1, 6, or 9? If it wasn't 9, how does 9 look? Or would 9 be too slow to want to use for this?

Agreed 100%. That's why I said "Someone would want to benchmark broadcasting a 8000 byte payload vs compressing and broadcasting that payload. My assumption is that the time it takes to compress would be more than the network latency. But I don't actually know. Maybe you'd just always want to compress."
In the examples I've seen, the difference between 6 and 9 in compression ratio is less than 1%, so it's negligible.
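A quick way to check this on a sample payload. The payload here is a made-up repetitive HTML-ish string; exact ratios depend entirely on the input, so this is illustrative only:

```ruby
require "zlib"

# Compare gzip output sizes at level 6 (the default) and level 9.
payload = "<div class=\"message\">Hello, world!</div>" * 500
level6 = Zlib.gzip(payload, level: 6).bytesize
level9 = Zlib.gzip(payload, level: 9).bytesize
puts "original: #{payload.bytesize} B, level 6: #{level6} B, level 9: #{level9} B"
```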
Motivation / Background
The PostgreSQL NOTIFY command has a limit of 8000 bytes, which means that broadcast payloads cannot be very large.
This is motivated by the simplicity of the solution and by dropping the Redis dependency, along the same lines as Solid Cache and Solid Queue.
Detail
Like `ActiveSupport::Cache`, payloads larger than 1 kilobyte are now compressed with ActiveSupport::Gzip.

Additional information
In the tests you can see a rather large string that can now be broadcasted with the PostgreSQL adapter.
Additionally, I've added the `ActionCable::SubscriptionAdapter::Compressor` class so other adapters can make use of it in the future.

There is another approach and another discussion in #49634.
Thanks to @palkan for the idea in this comment.