
Compress large payloads in all Action Cable subscription adapters with ActiveSupport::Gzip #49903

Open · wants to merge 1 commit into main from actioncable-compress-payloads
Conversation

brunoprietog
Contributor

@brunoprietog brunoprietog commented Nov 3, 2023

Motivation / Background

The PostgreSQL NOTIFY command has a limit of 8000 bytes, which means that broadcasted payloads cannot be very large.

This approach is motivated by the simplicity of the solution and by making it possible to drop the Redis dependency, along the same lines as Solid Cache and Solid Queue.

Detail

As in ActiveSupport::Cache, payloads larger than 1 kilobyte are now compressed with ActiveSupport::Gzip.

production:
  adapter: postgresql
  channel_prefix: appname_production

Additional information

In the tests you can see a rather large string that can now be broadcasted with the PostgreSQL adapter.

Additionally, I've added the ActionCable::SubscriptionAdapter::Compressor class so other adapters can make use of it in the future.
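
For reference, a minimal sketch of what such a compressor could look like, assuming the 0/1 prefix marker and 1 kilobyte default threshold discussed in this thread (the class layout and method names here are illustrative, not the PR's exact code):

require "active_support"
require "active_support/gzip"

class Compressor
  COMPRESSED   = "1" # prefix for gzipped payloads
  UNCOMPRESSED = "0" # prefix for payloads passed through as-is

  def initialize(threshold: 1024)
    @threshold = threshold
  end

  # Gzip only when the payload exceeds the threshold; always prepend a
  # marker so the receiving side can tell the two cases apart.
  def compress(payload)
    if payload.bytesize > @threshold
      COMPRESSED + ActiveSupport::Gzip.compress(payload)
    else
      UNCOMPRESSED + payload
    end
  end

  def decompress(payload)
    data = payload[1..]
    payload.start_with?(COMPRESSED) ? ActiveSupport::Gzip.decompress(data) : data
  end
end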

There is another approach and another discussion in #49634.

Thanks to @palkan for the idea in this comment.

Checklist

Before submitting the PR make sure the following are checked:

  • This Pull Request is related to one change. Changes that are unrelated should be opened in separate PRs.
  • Commit message has a detailed description of what changed and why. If this PR fixes a related issue include it in the commit message. Ex: [Fix #issue-number]
  • Tests are added or updated if you fix a bug or add a feature.
  • CHANGELOG files are updated for the changed libraries if there is a behavior change or additional feature. Minor bug fixes and documentation changes should not be included.

@natematykiewicz
Contributor

I think if compression is going to be added, it should be added to ActionCable on the whole, similar to how every single ActiveSupport::Cache store supports compression. Additionally, ActiveSupport::Cache has a threshold for when to use compression or not. It defaults to compressing anything over 1KB. With compression, there are diminishing returns when compressing content under a certain size, which is why the threshold is helpful.

I think ActionCable could take some pointers from ActiveSupport::Cache on this.

@brunoprietog
Contributor Author

I think if compression is going to be added, it should be added to ActionCable on the whole, similar to how every single ActiveSupport::Cache store supports compression.

I really like this idea, is this something the Rails Core team would accept? I can work on this, it would be great to read thoughts from @matthewd and @palkan.

It could allow enabling or disabling compression, setting a compressor (defaulting to Zlib, as in ActiveSupport::Cache), and setting a threshold.

Additionally, ActiveSupport::Cache has a threshold for when to use compression or not. It defaults to compressing anything over 1KB. With compression, there are diminishing returns when compressing content under a certain size, which is why the threshold is helpful.

Good point. In the meantime, I've set a 1 kilobyte threshold by default for now. Do you think we should make that configurable? Since there is a possibility that we will move these configurations out of the adapter, I'd like to read more opinions before proceeding.

Contributor

@palkan palkan left a comment

Great work 👍

Left some comments on how we can refactor it a bit and reuse existing Rails utilities.

I think if compression is going to be added, it should be added to ActionCable on the whole, similar to how every single ActiveSupport::Cache store supports compression.

I'm not sure we need this to be a global option, at least at this point. I'd prefer to keep this an internal detail rather than something a user can interfere with: fewer configuration options, less mental overhead.

However, I think it's worth extracting this functionality into a utility class, so that possible third-party (or future built-in) adapters can benefit from it.

So, in the end, the usage would be like:

class CustomAdapter < ActionCable::SubscriptionAdapter::Base
  def initialize(*)
    super
    @compressor = ActionCable::SubscriptionAdapter::Compressor.new(threshold: 8.kilobytes, compressor: ActiveSupport::Gzip)

    # we also would need to extend the subscriber map interface to support decompression
    subscriber_map.compress_with(@compressor)
  end

  def broadcast(channel, payload)
    payload = @compressor.compress(payload)
    # ...
  end
end

Review thread on guides/source/action_cable_overview.md (outdated, resolved)
@brunoprietog brunoprietog changed the title Add option to compress payloads in the Action Cable PostgreSQL adapter Compress large payloads in the Action Cable PostgreSQL adapter Dec 9, 2023
@brunoprietog
Contributor Author

@palkan thanks for your review! I've addressed all your comments, what do you think now?

Contributor

@palkan palkan left a comment

Thanks! Great work.

I've just noticed that we changed the way we escape payloads, see the comment.

@brunoprietog brunoprietog force-pushed the actioncable-compress-payloads branch 2 times, most recently from 83d97e1 to 275d408 Compare December 13, 2023 00:43
@palkan
Contributor

palkan commented Dec 13, 2023

Thanks @brunoprietog! Looks like I'm out of questions/suggestions 😁

@rafaelfranca @matthewd Looking for someone with merge permissions for the final review 🙂

@dhh
Member

dhh commented Jan 2, 2024

I like the idea of default compression a lot, like with caching. Let's apply it to all adapters, not just pgsql.

@brunoprietog
Contributor Author

Great! Which threshold do you find suitable for the other adapters? Or do we go with the compressor default (1 KB)?

@@ -1,2 +1,13 @@
* Compress large payloads in all subscription adapters

This allows to broadcast payloads larger than 8000 bytes by compressing them with ActiveSupport::Gzip with PostgreSQL
Member

Think this needs a rewording.

Member

Also, maybe give some indication of how big the payload can be now. Before we blew at 8k. What should you expect to be the limit with compression now?

Contributor Author

Maybe double? I guess it depends a lot on the string. Also, the limit really only causes problems in PostgreSQL.

Member

To me it's about more than the limit. It's simply more efficient! We use compression for everything else that goes over the wire. So would make a combined point.

Contributor Author

Good point. What do you think now?

Review thread on actioncable/CHANGELOG.md (outdated, resolved)
  end

  def compressed?(data)
    data.start_with?("1")
Member

Are 0/1 sufficiently distinctive markers that they'll never trigger on real data?

Contributor Author

I believe so, because the prefix is always added, regardless of whether the payload was compressed or not. If the payload is "0", the payload to be broadcasted will be "00". Then the prefix is checked and removed.

Member

It's more if the real payload had a 1, then decompression would fail.

And actually, on second thought, that could definitely be a problem. If you have a channel that sends ids, and we do, this is going to blow. Think we need a different marker.

Contributor Author

I don't quite understand this point. If you send an id with the value "1", the actual payload to be broadcasted is "01". Then the prefix is removed. The same applies to compressed payloads: just compress the payload without the prefix, and then check in the decompress method whether it has to decompress or not.

Anyway, maybe we could also use the same prefix as ActiveSupport::Cache, \x78, if you think it's better.
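
For illustration, with hypothetical compress/decompress helpers following the scheme above:

compress("1")     # => "01"   (short payload: "0" prefix, no compression)
decompress("01")  # => "1"    (prefix stripped, nothing to inflate)
compress(big)     # => "1..." (gzipped bytes behind the "1" prefix)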

Contributor

It's more if the real payload had a 1, then decompression would fail.

That could only happen during the upgrade rollout (when some instances still publish broadcasts without the compression flag). To prevent this (and handle potential schema changes), it's worth using a custom prefix (a kind of versioning mechanism).

Contributor Author

@palkan Right, do you have any suggestions to handle this?

Contributor

I see the following options here:

  1. A more sophisticated payload schema with a version bit: | version (2) | compressed (1) | data |. That would allow us not only to handle legacy payloads but also to introduce new extensions in the future in a non-breaking way. What is the version then? Probably something like the already mentioned \x78 from the cache implementation (which is just "x", right? So it isn't really different from "1").

  2. Implement a fallback: if we observe "1" and fail to decompress, fall back to the original value. However, that wouldn't work with payloads starting with "0". That leads us to the third option...

  3. Just ignore it. Action Cable pub/sub is ephemeral; if some messages get lost (or fail to decompress), that wouldn't be the end of the world, since the communication channel is unreliable anyway; we don't break any guarantees.

I would go with 3 for now; maybe adding one more prefix to indicate the schema version, e.g., "v1/<compressed>/<payload>".
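
A minimal sketch of that versioned-prefix decoding, assuming "v1/1/" and "v1/0/" markers (the exact format and method name are up for discussion):

COMPRESSED   = "v1/1/"
UNCOMPRESSED = "v1/0/"

def decode(raw)
  if raw.start_with?(COMPRESSED)
    ActiveSupport::Gzip.decompress(raw.delete_prefix(COMPRESSED))
  elsif raw.start_with?(UNCOMPRESSED)
    raw.delete_prefix(UNCOMPRESSED)
  else
    raw # legacy payload from a pre-upgrade instance; option 3: pass through
  end
end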

Contributor Author

So in that case, should we have an Action Cable setting that allows configuring the schema?

Contributor

should we have an Action Cable setting that allows configuring the schema?

I don't think we need it; I meant an internal version number (to handle framework-level changes), not something like a namespace.

@brunoprietog brunoprietog changed the title Compress large payloads in the Action Cable PostgreSQL adapter Compress large payloads in all Action Cable subscription adapters with ActiveSupport::Gzip Mar 24, 2024
@brunoprietog brunoprietog force-pushed the actioncable-compress-payloads branch 2 times, most recently from 195e0a0 to 0033200 Compare March 24, 2024 14:08
@brunoprietog
Contributor Author

I've applied all the suggestions. Now the distinction between compressed and uncompressed payloads is made with a prefix that includes the schema version. If the prefix doesn't match the format, the original payload is returned.

What do you think? @dhh @palkan

@brunoprietog brunoprietog requested review from palkan and dhh March 24, 2024 14:48
@dhh
Member

dhh commented Apr 3, 2024

Looking at this with some time away, I'm actually not really sure what we're gaining here. This doesn't solve the issue that pgsql has an upper boundary. It just pushes it out a bit further. Which means you're less likely to hit it, but when you do, it'll be even more of a surprise.

I very much appreciate the work that's gone into this, but on further reflection, I don't actually think it's a good path. It doesn't definitively solve the underlying issue, but perhaps makes it even more surprising when it hits (kinda like int vs bigint of old) at higher scale/payload. And the compression overhead is very likely not going to improve performance on adapters that aren't constrained by payload size, since the compress/decompress step is almost certainly more performance intensive than just sending a larger raw payload.

But that last point COULD be disproven by some solid performance validation, and if it was substantially better, I'd consider this change on those grounds alone. But the value of that performance improvement has to be big enough to offset the increased complexity of the setup.

I don't think this is a good solution to the pg limit, though.

@natematykiewicz
Contributor

Postgres’s NOTIFY docs say:

The “payload” string to be communicated along with the notification. This must be specified as a simple string literal. In the default configuration it must be shorter than 8000 bytes. (If binary data or large amounts of information need to be communicated, it's best to put it in a database table and send the key of the record.)

So what if you kept some of the work done here and gave the Postgres adapter an 8000-byte threshold? Anything under that simply works as it always has. But anything over that writes the message to a new table that the Postgres adapter has, and calls NOTIFY with a GlobalID of this record. The part of this PR you'd keep is the ability to have different types of messages: instead of compressed and uncompressed, you have raw and global id as options.
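
For what it's worth, the receiving side of that idea could look roughly like this (resolve, ActionCablePayload, and its payload column are hypothetical names for illustration):

def resolve(raw)
  if raw.start_with?("gid://")
    # Large message: NOTIFY carried a GlobalID, so load the stored record
    GlobalID::Locator.locate(raw).payload
  else
    raw # small message: NOTIFY carried the payload itself
  end
end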

Writing to a table would obviously decrease throughput, which is why I think the table should be solely for Postgres adapter messages over the 8000-byte limit. Then there's no performance loss for the adapter as-is, and the only time you'd take a performance hit is when it would previously have errored (a great trade-off).

One thing to consider is when these records would be deleted. I'm not familiar enough with ActiveStorage. I don't think the web servers can simply "pop" the message, since the broadcast would go out to all of them and you wouldn't want Server A to delete the row before Server B can read it. So I think you'd need some sort of cleanup job that bulk-deletes old messages after they're obviously done.

Which then leaves me wondering if this message column might need encryption.

@natematykiewicz
Contributor

Oh and the table can be optional. On app boot the Postgres adapter can check for existence of the table. If it doesn’t exist, it does what it always has (errors on large payloads).

@dbackeus

dbackeus commented Apr 3, 2024

@dhh wrote:

This doesn't solve the issue that pgsql has an upper boundary. It just pushes it out a bit further.

💯. This is potentially an interesting optimisation for reducing bytes over the network. It is not a solution for the large payload limit.

@natematykiewicz wrote:

But anything over that writes the message to a new table that the Postgres adapter has, and calls NOTIFY with a GlobalID of this record.

The actioncable-enhanced-postgresql-adapter gem has already implemented the separate-table approach. Also, #49634 is still open, which is a less refined implementation of what ended up in the gem. I should probably update the PR to match the gem. I've come to the opinion that it's the correct approach (fully self-managing table management, no ActiveRecord dependency).

@brunoprietog
Contributor Author

Compressing the payload was a suggestion from @palkan that I found interesting because it's relatively simple, it doesn't add as much overhead as querying a database table, and ultimately, broadcasting payloads that are too large shouldn't be recommended anyway. But it definitely doesn't solve the underlying problem, that's a fact.

Anyway, from #50480 I can understand that the adapter you made for Campfire will quickly solve this problem.

@natematykiewicz
Contributor

natematykiewicz commented Apr 3, 2024

Another option is (pseudocode):

if payload.bytesize <= 8000
  broadcast payload
else
  compressed = compress(payload)

  if compressed.bytesize <= 8000
    broadcast compressed
  else
    stored = ActionCablePayload.create!(payload: compressed)
    broadcast stored.to_global_id.to_s # NOTIFY with the record's GlobalID
  end
end

And you can use the payload prefixes in this PR to help manage how the other end should decode it.

I believe this would give you the best throughput, falling back to a database table only when absolutely necessary.

Someone would want to benchmark broadcasting an 8000-byte payload vs compressing and broadcasting that payload. My assumption is that the time it takes to compress would be more than the network latency. But I don't actually know. Maybe you'd just always want to compress.
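
A rough sketch of such a benchmark, measuring only the compression side (the SecureRandom hex payload is a made-up stand-in; real Hotwire payloads would compress better):

require "benchmark"
require "securerandom"
require "active_support"
require "active_support/gzip"

payload = SecureRandom.hex(4000) # 8000 bytes of hex; compresses roughly 2x
Benchmark.bm(16) do |x|
  x.report("gzip compress") { 1_000.times { ActiveSupport::Gzip.compress(payload) } }
  compressed = ActiveSupport::Gzip.compress(payload)
  x.report("gzip decompress") { 1_000.times { ActiveSupport::Gzip.decompress(compressed) } }
end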

@brunoprietog
Contributor Author

At the beginning this was only for PostgreSQL, and compression was only done when the payload was larger than 8000 bytes. Then it mutated into compressing payloads in all adapters. I think that idea sounds like the most efficient one in terms of avoiding table queries as much as possible.

@natematykiewicz
Contributor

Well, the problem being solved has always been "The postgres adapter cannot be used with payloads over 8000 bytes".

Compression was an easy way to raise that ceiling a lot, making the problem much less likely. But to DHH's point, this PR doesn't solve the fact that the Postgres adapter cannot be used for all payloads. From what I'm seeing, gzipping can give you up to 1000x smaller files. So this PR means you might be able to send a ~7.5MB payload.

Is that good enough for normal use cases? That's substantially larger than ~8KB.

@natematykiewicz
Contributor

Tangentially, how does the Redis adapter handle a 7.5MB payload?

This amount of data seems too large to send in one message anyway. I imagine that much HTML would cause the browser to throw memory warnings, and I'm not sure how it'd do parsing that much JSON.

@dbackeus

dbackeus commented Apr 4, 2024

@natematykiewicz wrote:

From what I'm seeing, gzipping can give you up to 1000x smaller files.

This claim isn't helpful. Palkan reported ~3x reduction for typical Hotwire payloads in my PR. This is in line with general wisdom that JSON/HTML payloads tend to compress between 2-4x.

Another option is (pseudocode) ...

My implementation treats payloads under 8000 bytes normally and puts anything above that in a table:
https://github.com/reclaim-the-stack/actioncable-enhanced-postgresql-adapter/blob/master/lib/action_cable/subscription_adapter/enhanced_postgresql.rb#L43-L56

The compression thing could be added in there as you say. But unless anyone can point to a real-world measurement of compression improving the performance of their ActionCable deployment, it's unclear whether the added complexity is warranted.

Tangentially, how does the Redis adapter handle a 7.5MB payload?

Not that a 7.5MB payload makes sense, but just for the record: the way Redis pub/sub is limited is "bytes per client per minute". If I understand correctly, it's 8-32MB per minute per client. So it's a very different model from the per-message limit of Postgres.

@natematykiewicz
Contributor

natematykiewicz commented Apr 4, 2024

This claim isn't helpful. Palkan reported ~3x reduction for typical Hotwire payloads in my PR. This is in line with general wisdom that JSON/HTML payloads tend to compress between 2-4x.

Sorry to be misleading. I was reading that a level 9 Gzip has a 1032:1 compression ratio. But I suppose that's like a perfect world scenario and not something you see realistically. I also don't know if people tend to do a level 9 Gzip at runtime. I know the default is level 6.

So basically you're saying with compression we should expect to increase the payload limit from 8000 bytes to somewhere between 16,000 and 32,000 bytes. That's better, but not amazing...

Do you know what level the 2-4x compression was? Was it 1, 6, or 9? If it wasn't 9, how's 9 look? Or would 9 be too slow to want to use for this?

But unless anyone can point to a real-world measurement of compression improving the performance of their ActionCable deployment, it's unclear whether the added complexity is warranted.

Agreed 100%. That's why I said "Someone would want to benchmark broadcasting an 8000-byte payload vs compressing and broadcasting that payload. My assumption is that the time it takes to compress would be more than the network latency. But I don't actually know. Maybe you'd just always want to compress."

@dbackeus

dbackeus commented Apr 4, 2024

Do you know what level the 2-4x compression was? Was it 1, 6, or 9? If it wasn't 9, how's 9 look? Or would 9 be too slow to want to use for this?

In the examples I've seen, the difference between 6 and 9 in compression ratio is less than 1%. So it's negligible.
