
Compress large payloads in all Action Cable subscription adapters with ActiveSupport::Gzip #49903

Open · wants to merge 1 commit into main from actioncable-compress-payloads
Conversation

brunoprietog
Contributor

@brunoprietog brunoprietog commented Nov 3, 2023

Motivation / Background

The PostgreSQL NOTIFY command has a limit of 8000 bytes, which means that broadcasted payloads cannot be very large.

This approach is motivated by the simplicity of the solution and by making it possible to drop the Redis dependency, along the same lines as Solid Cache and Solid Queue.

Detail

As in ActiveSupport::Cache, payloads larger than 1 kilobyte are now compressed with ActiveSupport::Gzip.

production:
  adapter: postgresql
  channel_prefix: appname_production

Additional information

In the tests you can see a rather large string that can now be broadcasted with the PostgreSQL adapter.

Additionally, I've added the ActionCable::SubscriptionAdapter::Compressor class so other adapters can make use of it in the future.
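
For reference, a minimal sketch of what such a compressor could look like, assuming the 0/1 prefix marker and 1 kilobyte default threshold discussed in this thread (the class layout and method names here are illustrative, not the PR's exact code):

require "active_support"
require "active_support/gzip"

class Compressor
  COMPRESSED   = "1" # prefix for gzipped payloads
  UNCOMPRESSED = "0" # prefix for payloads passed through as-is

  def initialize(threshold: 1024)
    @threshold = threshold
  end

  # Gzip only when the payload exceeds the threshold; always prepend a
  # marker so the receiving side can tell the two cases apart.
  def compress(payload)
    if payload.bytesize > @threshold
      COMPRESSED + ActiveSupport::Gzip.compress(payload)
    else
      UNCOMPRESSED + payload
    end
  end

  def decompress(payload)
    data = payload[1..]
    payload.start_with?(COMPRESSED) ? ActiveSupport::Gzip.decompress(data) : data
  end
end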

There is another approach and another discussion in #49634.

Thanks to @palkan for the idea in this comment.

Checklist

Before submitting the PR make sure the following are checked:

  • This Pull Request is related to one change. Changes that are unrelated should be opened in separate PRs.
  • Commit message has a detailed description of what changed and why. If this PR fixes a related issue include it in the commit message. Ex: [Fix #issue-number]
  • Tests are added or updated if you fix a bug or add a feature.
  • CHANGELOG files are updated for the changed libraries if there is a behavior change or additional feature. Minor bug fixes and documentation changes should not be included.

@natematykiewicz
Contributor

I think if compression is going to be added, it should be added to ActionCable on the whole, similar to how every single ActiveSupport::Cache store supports compression. Additionally, ActiveSupport::Cache has a threshold for when to use compression or not. It defaults to compressing anything over 1KB. With compression, there are diminishing returns when compressing content under a certain size, which is why the threshold is helpful.

I think ActionCable could take some pointers from ActiveSupport::Cache on this.

@brunoprietog
Contributor Author

I think if compression is going to be added, it should be added to ActionCable on the whole, similar to how every single ActiveSupport::Cache store supports compression.

I really like this idea, is this something the Rails Core team would accept? I can work on this, it would be great to read thoughts from @matthewd and @palkan.

It could allow enabling or disabling compression, setting a compressor (defaulting to Zlib, as in ActiveSupport::Cache), and setting a threshold.

Additionally, ActiveSupport::Cache has a threshold for when to use compression or not. It defaults to compressing anything over 1KB. With compression, there are diminishing returns when compressing content under a certain size, which is why the threshold is helpful.

Good point. In the meantime, I've set a 1 kilobyte threshold by default for now. Do you think we should make that configurable? Since there is a possibility that we will move these configurations out of the adapter, I'd like to read more opinions before proceeding.

Contributor

@palkan palkan left a comment

Great work 👍

Left some comments on how we can refactor it a bit and reuse existing Rails utilities.

I think if compression is going to be added, it should be added to ActionCable on the whole, similar to how every single ActiveSupport::Cache store supports compression.

I'm not sure we need this to be a global option, at least at this point. I'd prefer to keep this an internal detail rather than something a user can interfere with: fewer configuration options, less mental overhead.

However, I think it's worth extracting this functionality into a utility class, so that possible third-party (or future built-in) adapters can benefit from it.

So, in the end, the usage would be like:

class CustomAdapter < ActionCable::SubscriptionAdapter::Base
  def initialize(*)
    super
    @compressor = ActionCable::SubscriptionAdapter::Compressor.new(threshold: 8.kilobytes, compressor: ActiveSupport::Gzip)

    # we also would need to extend the subscriber map interface to support decompression
    subscriber_map.compress_with(@compressor)
  end

  def broadcast(channel, payload)
    payload = @compressor.compress(payload)
    # ...
  end
end

Review thread on guides/source/action_cable_overview.md (outdated, resolved)
@brunoprietog brunoprietog changed the title Add option to compress payloads in the Action Cable PostgreSQL adapter Compress large payloads in the Action Cable PostgreSQL adapter Dec 9, 2023
@brunoprietog
Contributor Author

@palkan thanks for your review! I've addressed all your comments, what do you think now?

Contributor

@palkan palkan left a comment

Thanks! Great work.

I've just noticed that we changed the way we escape payloads, see the comment.

@brunoprietog brunoprietog force-pushed the actioncable-compress-payloads branch 2 times, most recently from 83d97e1 to 275d408 Compare December 13, 2023 00:43
@palkan
Contributor

palkan commented Dec 13, 2023

Thanks @brunoprietog! Looks like I'm out of questions/suggestions 😁

@rafaelfranca @matthewd Looking for someone with merge permissions for the final review 🙂

@dhh
Member

dhh commented Jan 2, 2024

I like the idea of default compression a lot, like with caching. Let's apply it to all adapters, not just pgsql.

@brunoprietog
Contributor Author

Great! Which threshold do you find suitable for the other adapters? Or do we go with the compressor default (1 KB)?

@@ -1,2 +1,13 @@
* Compress large payloads in all subscription adapters

This allows to broadcast payloads larger than 8000 bytes by compressing them with ActiveSupport::Gzip with PostgreSQL
Member

Think this needs a rewording.

Member

Also, maybe give some indication of how big the payload can be now. Before we blew at 8k. What should you expect to be the limit with compression now?

Contributor Author

Maybe double? I guess it depends a lot on the string. Also, the limit really only causes problems in PostgreSQL.

Member

To me it's about more than the limit. It's simply more efficient! We use compression for everything else that goes over the wire. So would make a combined point.

Contributor Author

Good point. What do you think now?

Review thread on actioncable/CHANGELOG.md (outdated, resolved)
  end

  def compressed?(data)
    data.start_with?("1")
Member

Are 0/1 sufficiently distinctive markers that they'll never trigger on real data?

Contributor Author

I believe so, because the prefix is always added, regardless of whether the payload was compressed or not. If the payload is "0", the payload to be broadcasted will be "00". Then the prefix is checked and removed.

Member

It's more if the real payload had a 1, then decompression would fail.

And actually, on second thought, that could definitely be a problem. If you have a channel that sends ids, and we do, this is going to blow. Think we need a different marker.

Contributor Author

I don't quite understand this point. If you send an id with the value "1", the actual payload to be broadcasted is "01". Then the prefix is removed. The same applies to compressed payloads: just compress the payload without the prefix, and then check in the decompress method whether it has to decompress or not.

Anyway, maybe we could also use the same prefix as ActiveSupport::Cache, \x78, if you think it's better.
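
For illustration, with hypothetical compress/decompress helpers following the scheme above:

compress("1")     # => "01"   (short payload: "0" prefix, no compression)
decompress("01")  # => "1"    (prefix stripped, nothing to inflate)
compress(big)     # => "1..." (gzipped bytes behind the "1" prefix)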

Contributor

It's more if the real payload had a 1, then decompression would fail.

That could only happen during the upgrade rollout (when some instances still publish broadcasts without the compression flag). To prevent this (and handle potential schema changes), it's worth using a custom prefix (a kind of versioning mechanism).

Contributor Author

@palkan Right, do you have any suggestions to handle this?

Contributor

I see the following options here:

  1. A more sophisticated payload schema with a version bit: | version (2) | compressed (1) | data |. That would allow us not only to handle legacy payloads but also to introduce new extensions in the future in a non-breaking way. What is the version then? Probably something like the already mentioned \x78 from the cache implementation (which is just "x", right? So it isn't really different from "1").

  2. Implement a fallback: if we observe "1" and fail to decompress, fall back to the original value. However, that wouldn't work with payloads starting with "0". That leads us to the third option...

  3. Just ignore it. Action Cable pub/sub is ephemeral; if some messages get lost (or fail to decompress), that wouldn't be the end of the world, since the communication channel is unreliable anyway; we don't break any guarantees.

I would go with 3 for now; maybe adding one more prefix to indicate the schema version, e.g., "v1/<compressed>/<payload>".
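
A minimal sketch of that versioned-prefix decoding, assuming "v1/1/" and "v1/0/" markers (the exact format and method name are up for discussion):

COMPRESSED   = "v1/1/"
UNCOMPRESSED = "v1/0/"

def decode(raw)
  if raw.start_with?(COMPRESSED)
    ActiveSupport::Gzip.decompress(raw.delete_prefix(COMPRESSED))
  elsif raw.start_with?(UNCOMPRESSED)
    raw.delete_prefix(UNCOMPRESSED)
  else
    raw # legacy payload from a pre-upgrade instance; option 3: pass through
  end
end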

Contributor Author

So in that case, should we have an Action Cable setting that allows configuring the schema?

Contributor

should we have an Action Cable setting that allows configuring the schema?

I don't think we need it; I meant an internal version number (to handle framework-level changes), not something like a namespace.

@brunoprietog brunoprietog changed the title Compress large payloads in the Action Cable PostgreSQL adapter Compress large payloads in all Action Cable subscription adapters with ActiveSupport::Gzip Mar 24, 2024
@brunoprietog brunoprietog force-pushed the actioncable-compress-payloads branch 2 times, most recently from 195e0a0 to 0033200 Compare March 24, 2024 14:08
@brunoprietog
Contributor Author

I've applied all the suggestions. Now the distinction between compressed and uncompressed payloads is made with a prefix that includes the schema version. If the prefix doesn't match the format, the original payload is returned.

What do you think? @dhh @palkan

@brunoprietog brunoprietog requested review from palkan and dhh March 24, 2024 14:48
@dhh
Member

dhh commented Apr 3, 2024

Looking at this with some time away, I'm actually not really sure what we're gaining here. This doesn't solve the issue that pgsql has an upper boundary. It just pushes it out a bit further. Which means you're less likely to hit it, but when you do, it'll be even more of a surprise.

I very much appreciate the work that's gone into this, but on further reflection, I don't actually think it's a good path. It doesn't definitively solve the underlying issue, but perhaps makes it even more surprising when it hits (kinda like int vs bigint of old) at higher scale/payload. And the compression overhead is very likely not going to improve performance on adapters that aren't constrained by payload size, since the compress/decompress step is almost certainly more performance intensive than just sending a larger raw payload.

But that last point COULD be disproven by some solid performance validation, and if it was substantially better, I'd consider this change on those grounds alone. But the value of that performance improvement has to be big enough to offset the increased complexity of the setup.

I don't think this is a good solution to the pg limit, though.

@natematykiewicz
Contributor

Postgres’s NOTIFY docs say:

The “payload” string to be communicated along with the notification. This must be specified as a simple string literal. In the default configuration it must be shorter than 8000 bytes. (If binary data or large amounts of information need to be communicated, it's best to put it in a database table and send the key of the record.)

So what if you kept some of the work done here and gave the Postgres adapter an 8000-byte threshold? Anything under that simply works as it always has. But anything over that writes the message to a new table that the Postgres adapter has, and calls NOTIFY with a GlobalID of this record. The part of this PR you'd keep is the ability to have different types of messages: instead of compressed and uncompressed, you have raw and global id as options.
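
For what it's worth, the receiving side of that idea could look roughly like this (resolve, ActionCablePayload, and its payload column are hypothetical names for illustration):

def resolve(raw)
  if raw.start_with?("gid://")
    # Large message: NOTIFY carried a GlobalID, so load the stored record
    GlobalID::Locator.locate(raw).payload
  else
    raw # small message: NOTIFY carried the payload itself
  end
end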

Writing to a table would obviously decrease throughput, which is why I think the table should be solely for Postgres adapter messages over the 8000-byte limit. Then there's no performance loss for the adapter as-is, and the only time you'd take a performance hit is when it would previously have errored (a great trade-off).

One thing to consider is when these records would be deleted. I'm not familiar enough with ActiveStorage. I don't think the web servers can simply "pop" the message, since the broadcast would go out to all of them and you wouldn't want Server A to delete the row before Server B can read it. So I think you'd need some sort of cleanup job that bulk-deletes old messages after they're obviously done.

Which then leaves me wondering if this message column might need encryption.

@natematykiewicz
Contributor

Oh and the table can be optional. On app boot the Postgres adapter can check for existence of the table. If it doesn’t exist, it does what it always has (errors on large payloads).

@dbackeus

dbackeus commented Apr 3, 2024

@dhh wrote:

This doesn't solve the issue that pgsql has an upper boundary. It just pushes it out a bit further.

💯. This is potentially an interesting optimisation for reducing bytes over the network. It is not a solution for the large payload limit.

@natematykiewicz wrote:

But anything over that writes the message to a new table that the Postgres adapter has, and calls NOTIFY with a GlobalID of this record.

The actioncable-enhanced-postgresql-adapter gem has already implemented the separate-table approach. Also, #49634 is still open, which is a less refined implementation of what ended up in the gem. I should probably update the PR to match the gem. I've come to the opinion that it's the correct approach (fully self-managing table management, no ActiveRecord dependency).

@brunoprietog
Contributor Author

Compressing the payload was a suggestion from @palkan that I found interesting because it's relatively simple, it doesn't add as much overhead as querying a database table, and ultimately, broadcasting payloads that are too large shouldn't be recommended anyway. But it definitely doesn't solve the underlying problem, that's a fact.

Anyway, from #50480 I can understand that the adapter you made for Campfire will quickly solve this problem.

@natematykiewicz
Contributor

natematykiewicz commented Apr 3, 2024

Another option is (pseudocode):

if payload.bytesize <= 8000
  broadcast payload
else
  compressed = compress(payload)

  if compressed.bytesize <= 8000
    broadcast compressed
  else
    stored = ActionCablePayload.create!(payload: compressed)
    broadcast stored.to_global_id.to_s # NOTIFY with the record's GlobalID
  end
end

And you can use the payload prefixes in this PR to help manage how the other end should decode it.

I believe this would give you the best throughput, falling back to a database table only when absolutely necessary.

Someone would want to benchmark broadcasting an 8000-byte payload vs compressing and broadcasting that payload. My assumption is that the time it takes to compress would be more than the network latency. But I don't actually know. Maybe you'd just always want to compress.
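
A rough sketch of such a benchmark, measuring only the compression side (the SecureRandom hex payload is a made-up stand-in; real Hotwire payloads would compress better):

require "benchmark"
require "securerandom"
require "active_support"
require "active_support/gzip"

payload = SecureRandom.hex(4000) # 8000 bytes of hex; compresses roughly 2x
Benchmark.bm(16) do |x|
  x.report("gzip compress") { 1_000.times { ActiveSupport::Gzip.compress(payload) } }
  compressed = ActiveSupport::Gzip.compress(payload)
  x.report("gzip decompress") { 1_000.times { ActiveSupport::Gzip.decompress(compressed) } }
end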

@brunoprietog
Contributor Author

At the beginning this was only for PostgreSQL, and compression was only done when the payload was larger than 8000 bytes. Then it mutated into compressing payloads in all adapters. I think that idea sounds like the most efficient one in terms of avoiding table queries as much as possible.

@natematykiewicz
Contributor

Well, the problem being solved has always been "The postgres adapter cannot be used with payloads over 8000 bytes".

Compression was an easy way to raise that ceiling a lot, making the problem much less likely. But to DHH's point, this PR doesn't solve the fact that the Postgres adapter cannot be used for all payloads. From what I'm seeing, gzipping can give you up to 1000x smaller files. So this PR means you might be able to send a ~7.5MB payload.

Is that good enough for normal use cases? That's substantially larger than ~8KB.

@natematykiewicz
Contributor

Tangentially, how does the Redis adapter handle a 7.5MB payload?

This amount of data seems too large to send in one message anyway. I imagine that much HTML would cause the browser to throw memory warnings, and I'm not sure how it'd do parsing that much JSON.

@dbackeus

dbackeus commented Apr 4, 2024

@natematykiewicz wrote:

From what I'm seeing, gzipping can give you up to 1000x smaller files.

This claim isn't helpful. Palkan reported ~3x reduction for typical Hotwire payloads in my PR. This is in line with general wisdom that JSON/HTML payloads tend to compress between 2-4x.

Another option is (pseudocode) ...

My implementation treats payloads under 8000 bytes normally and puts anything above that in a table:
https://github.com/reclaim-the-stack/actioncable-enhanced-postgresql-adapter/blob/master/lib/action_cable/subscription_adapter/enhanced_postgresql.rb#L43-L56

The compression thing could be added in there as you say. But unless anyone can point to a real-world measurement of compression improving the performance of their ActionCable deployment, it's unclear whether the added complexity is warranted.

Tangentially, how does the Redis adapter handle a 7.5MB payload?

Not that a 7.5MB payload makes sense, but just for the record: the way Redis pub/sub is limited is "bytes per client per minute". If I understand correctly, it's 8-32MB per minute per client. So it's a very different model from the per-message limit of Postgres.

@natematykiewicz
Contributor

natematykiewicz commented Apr 4, 2024

This claim isn't helpful. Palkan reported ~3x reduction for typical Hotwire payloads in my PR. This is in line with general wisdom that JSON/HTML payloads tend to compress between 2-4x.

Sorry to be misleading. I was reading that a level 9 Gzip has a 1032:1 compression ratio. But I suppose that's like a perfect world scenario and not something you see realistically. I also don't know if people tend to do a level 9 Gzip at runtime. I know the default is level 6.

So basically you're saying with compression we should expect to increase the payload limit from 8000 bytes to somewhere between 16,000 and 32,000 bytes. That's better, but not amazing...

Do you know what level the 2-4x compression was? Was it 1, 6, or 9? If it wasn't 9, how's 9 look? Or would 9 be too slow to want to use for this?

But unless anyone can point to a real-world measurement of compression improving the performance of their ActionCable deployment, it's unclear whether the added complexity is warranted.

Agreed 100%. That's why I said "Someone would want to benchmark broadcasting an 8000-byte payload vs compressing and broadcasting that payload. My assumption is that the time it takes to compress would be more than the network latency. But I don't actually know. Maybe you'd just always want to compress."

@dbackeus

dbackeus commented Apr 4, 2024

Do you know what level the 2-4x compression was? Was it 1, 6, or 9? If it wasn't 9, how's 9 look? Or would 9 be too slow to want to use for this?

In the examples I've seen, the difference between 6 and 9 in compression ratio is less than 1%. So it's negligible.
