
Support replacing cache compressor #48451

Merged
merged 1 commit into from Jul 26, 2023

Conversation

jonathanhefner
Member

@jonathanhefner commented Jun 11, 2023

This commit adds support for replacing the compressor used for serialized cache entries. Custom compressors must respond to deflate and inflate. For example:

module MyCompressor
  def self.deflate(string)
    # compression logic...
  end

  def self.inflate(compressed)
    # decompression logic...
  end
end

config.cache_store = :redis_cache_store, { compressor: MyCompressor }

As part of this work, cache stores now also support a :serializer option. Similar to the :coder option, serializers must respond to dump and load. However, serializers are only responsible for serializing a cached value, whereas coders are responsible for serializing the entire ActiveSupport::Cache::Entry instance. Additionally, the output from serializers can be automatically compressed, whereas coders are responsible for their own compression.

Specifying a serializer instead of a coder also enables performance optimizations, including the bare string optimization introduced by cache format version 7.1.
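As a concrete illustration of the `:serializer` contract described above, a minimal serializer only has to turn a value into a String and back (`MySerializer` and the use of JSON here are hypothetical, purely for illustration):

```ruby
require "json"

# Hypothetical serializer: responsible only for the cached value,
# not for the Entry wrapper or for compression (the cache store
# can compress this output automatically).
module MySerializer
  def self.dump(value)
    JSON.dump(value)
  end

  def self.load(dumped)
    JSON.parse(dumped)
  end
end

# config.cache_store = :redis_cache_store, { serializer: MySerializer }
```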


/cc @byroot since you expressed interest in this in #48368 (comment).

@byroot
Member

byroot commented Jun 12, 2023

cc @casperisfine

Comment on lines 2 to 3
a `:compressor` option. The specified compressor must respond to `compress`
and `decompress`. For example:
Contributor

I think inflate and deflate are much more common than compress and decompress (even though I do prefer the latter, or better yet load/dump, as it's more composable).

Snappy, Zlib, and Brotli all expose inflate/deflate.

So it would make sense to standardize on that; we can make a PR to zstd to alias these methods.
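For reference, the stdlib Zlib module already exposes that pair at the module level:

```ruby
require "zlib"

# Module-level deflate/inflate, the interface this PR standardizes on.
compressed = Zlib.deflate("hello world")
Zlib.inflate(compressed) # => "hello world"
```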

Comment on lines 298 to 296
unless Cache.format_version < 7.1 && !Cache::SerializerWithFallback::SERIALIZERS.value?(@coder)
  Cache::Compressor
end
Contributor

Hum, this makes me a little bit nervous. At Shopify we already run Rails edge, so we have a custom coder and Cache.format_version == 7.1.

Additionally, unless a && !b is hard to read. It would be preferable to use if here.

Member Author

I've pushed a new commit using a different approach based on what you said in #48451 (comment). When the cache format version is >= 7.1, entry values will be lazily deserialized, so that the entry version and expiration can be checked without deserializing (or decompressing) the value.

I combined that optimization with the bare string optimization from #48122 as well, and both optimizations can now apply to custom coders.

To me, it makes sense to activate these optimizations (for the default coder and for custom coders) when the cache format version is >= 7.1. But it sounds like that might cause an issue for Shopify during the next rolling deploy. (After a full deploy, I think everything should just work, including reading old cache entries from custom coders.)

We could potentially add a temporary configuration setting that would override and disable the optimizations for custom coders. And we could remove that setting before the final Rails 7.1 release.

Contributor

But it sounds like that might cause an issue for Shopify during the next rolling deploy.

We use custom coders that handle their own compression.

That said, I'm not sure compression + custom coder makes sense, because if you replace the default coder, Rails is no longer in control of the header. (I need to look closer at the code to see.)

In a way I wonder if there aren't two "levels" of custom coder.

I think most users just want to substitute Marshal for something else (any object to String) and leave AS in charge of the header, etc.

Whereas with Paquito we serialize AS::Cache::Entry into a String, so we are in control of the header.

Contributor

So I'm having a bit of trouble wrapping my head around the code (not necessarily your fault; I'm a bit down today, I'll try to have a fresh look Monday).

One concern I have is that until now, you either used the default, or took entire control of the coder. From what I understand of the implementation here, we're a bit in the middle, as if you switch the compressor, you will have rollout problems.

I'm not too sure how we could reconcile both these concerns.

Member Author

No rush!

Because if you replace the default coder, Rails is no longer in control of the header.

When SerializerAdapter's @adapt_dump is true, the output is prefixed with a SerializerAdapter header that starts with a special signature and includes the entry's version and expiration. The header also includes a "type" byte that indicates whether the payload is compressed (and whether the payload is a bare string).

To avoid double serialization of the entry's version and expiration, the value is wrapped in a new (version-less, expiration-less) Entry instance before passing it to the coder:

79db932#diff-6919f902e961b7c68d25ba2a11b7311930f7d4c3b01fc47adc4c946027fbccb0R32

Thus, SerializerAdapter can implement lazy deserialization / decompression for every coder and compressor. It can also take responsibility for conditional compression based on the compression threshold, and it can implement the bare string optimization independently of the coder (including compressing bare strings as appropriate).

Both the coder and the compressor can still output their own headers to support fallback mechanisms. Those headers will be treated as part of the payload, and SerializerAdapter will pass them through to the coder's load method and compressor's inflate method.

Regardless of @adapt_dump, SerializerAdapter will check for its signature before loading a dump. If the signature is missing, it will pass the dump directly to the coder's load method.

If a coder has its own compression mechanism, SerializerAdapter-based compression can be disabled by specifying compress: false in the cache options. That is different than before, but it's only a concern for cache format version >= 7.1. When the cache format version is < 7.1, @adapt_dump will be false, so SerializerAdapter-based compression will be bypassed anyway.

TLDR: After this change is merged, if the cache format version is < 7.1, the output will be unchanged for all coders and compressors. If the cache format version is >= 7.1, all new cache entries will be prefixed with a SerializerAdapter header, for all coders and compressors. And regardless of the cache format version, SerializerAdapter#load will check for its signature and pass unrecognized dumps directly to the coder's load method.

After a full deploy, I think the main issue would be if the SerializerAdapter signature clashes with a coder's signature. But I think we can choose a signature that would minimize that risk.
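The signature-based fallback described above can be sketched as a toy (SketchAdapter and the signature bytes are illustrative, not the actual Rails implementation):

```ruby
require "json"

# Arbitrary illustrative signature, not the one Rails actually uses.
SIGNATURE = "\x00\x11".b

class SketchAdapter
  def initialize(coder)
    @coder = coder
  end

  def load(dumped)
    if dumped.start_with?(SIGNATURE)
      # Real implementation: parse the header here, then lazily
      # decompress / deserialize the value.
      dumped.byteslice(SIGNATURE.bytesize, dumped.bytesize)
    else
      @coder.load(dumped) # unrecognized dump: delegate entirely to the coder
    end
  end
end

adapter = SketchAdapter.new(JSON)
adapter.load(SIGNATURE + "payload") # => "payload"
adapter.load('{"old":"entry"}')     # => {"old"=>"entry"}
```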

In a way I wonder if there isn't two "levels" of custom coder.

That crossed my mind when I was writing the code to wrap the entry's value in a version-less, expiration-less Entry instance. I'm not sure what the best API would be for specifying which kind of coder you're using.

If we want to add that in the future though, we could use the "type" byte to indicate how an entry / value should be loaded (the same way it is used to differentiate bare strings).

end
end

def deserialize_entry(payload)
payload.nil? ? nil : @coder.load(payload)
@coder.load(try_decompress(payload)) unless payload.nil?
Contributor

I think it would be preferable if we only compressed the "body" of the cache entry, and not the "header" part, so that we can immediately discard stale entries without having to decompress them.

It would even allow instantiating the Entry with the compressed payload and lazily uncompressing it when accessed.
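A rough sketch of that idea (LazyCompressedEntry is illustrative, not actual Rails code):

```ruby
require "zlib"

# Illustrative only: an Entry-like object that stores the compressed
# payload and only inflates it the first time #value is called, so
# stale entries can be discarded without paying for decompression.
class LazyCompressedEntry
  attr_reader :version, :expires_at

  def initialize(compressed_payload, version: nil, expires_at: nil)
    @payload = compressed_payload
    @version = version
    @expires_at = expires_at
    @inflated = false
  end

  def value
    unless @inflated
      @payload = Zlib.inflate(@payload)
      @inflated = true
    end
    @payload
  end
end

entry = LazyCompressedEntry.new(Zlib.deflate("cached value"))
entry.value # => "cached value"
```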

@casperisfine
Contributor

FYI: @pawurb

Member

@tenderlove left a comment

This seems basically fine to me. Would you mind investigating this comment? I don't think these objects are actually serialized. If they are, then putting a proc inside them might not be a good idea. If they aren't actually serialized, would you mind removing the comment? (The comment already seems inaccurate)

@@ -1021,6 +1047,7 @@ def initialize(value, compressed: false, version: nil, expires_in: nil, expires_
end

def value
  @value = @value.call if @value.instance_of?(Proc)
Member

Why do we need to do this? It's not obvious to me from the PR (but this PR is kind of long) 😅

(nm, I found it below)

  -> { @compressor.inflate(payload).force_encoding(STRING_ENCODINGS[type.abs]) }
else
  return nil
end
Member

I don't have very strong feelings one way or another, but should we use this case statement as a type of factory? It feels like we could eliminate the above conditional in the value method. Additionally, this requires us to allocate an extra Proc object when we could just select a subclass of Cache::Entry that has a conveniently implemented value method.

Contributor

I agree here. Only the 6.1 format is stuck with the current implementation of Entry. We could use another class.

Member Author

I've added a commit (4291e60) that defines specialized Entry subclasses. But, in my opinion, the code is less clear. Deserialization logic is strongly coupled with serialization logic. Moving deserialization logic out of SerializerAdapter and away from SerializerAdapter#dump obscures this relationship.

If we were not implementing lazy deserialization, I think we would simply write the case statement as it was previously minus the -> { } wrappers. To me, using -> { } to implement lazy evaluation (basically, a thunk) makes sense.
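A minimal illustration of the thunk pattern being discussed (using Zlib purely as an example workload):

```ruby
require "zlib"

compressed = Zlib.deflate("payload")

# The Proc wraps the decompression; nothing is inflated until #call.
value = -> { Zlib.inflate(compressed) }

# The consumer forces the thunk only when the value is actually needed.
value = value.call if value.instance_of?(Proc)
value # => "payload"
```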

Member

Using a thunk is fine, but the concrete classes give us a chance to add some features specific to the strategy without leaking into the scope of the proc (for example the @processed IV you added in the subclasses). I think we could do the same tricks with this conditional, because we should know at allocation time whether or not the entry is compressed (additionally, we could add the @processed treatment in the compressed cases).

Member Author

without leaking into the scope of the proc

By this, do you mean holding on to references that were in scope when the proc was created? The Entry instance should be short-lived, just long enough to check Entry#expired? and Entry#mismatched? such as in Cache#read. But, if it is a concern, I can minimize the scope by isolating the case statement in its own method.

I think we could do the same tricks with this conditional because we should know at allocation time whether or not the entry is compressed

That conditional and the @compressed ivar are used by the 6.1 cache format, and must remain until we drop support for the 6.1 format.

The 7.0 format and the code in this PR do not make use of them. If it is clearer, an alternative way to write the method is:

def value
  if @value.instance_of?(Proc)
    @value = @value.call
  elsif compressed?
    uncompress(@value)
  else
    @value
  end
end

(additionally we could add the @processed treatment in the compressed cases)

In the previous commit that I pushed (4291e60), @processed does apply to the compressed cases (here and here).

@casperisfine
Contributor

I don't think these objects are actually serialized.

They were until Rails 6.1, and still are if you are still using that format:

module Marshal61WithFallback
  include SerializerWithFallback
  extend self

  MARSHAL_SIGNATURE = "\x04\x08".b.freeze

  def dump(entry)
    Marshal.dump(entry)
  end

  def dump_compressed(entry, threshold)
    Marshal.dump(entry.compressed(threshold))
  end

  def _load(dumped)
    Marshal.load(dumped)
  end

  def dumped?(dumped)
    dumped.start_with?(MARSHAL_SIGNATURE)
  end
end

Which brings up the question of when we can remove the 6.1 format; it might be urgent to deprecate it in 7.1, so we can remove it in 7.2 and finally be free to change Cache::Entry's internal representation.

That's my mistake for not immediately deprecating it.

@casperisfine
Contributor

I'm wondering if getting rid of the 6.1 format first wouldn't allow for a much more elegant implementation of this feature.

@jonathanhefner
Member Author

I'm wondering if getting rid of the 6.1 format first wouldn't allow for a much more elegant implementation of this feature.

If we didn't have to worry about serializing / deserializing Entry instances, I think we could be a little more efficient, but I'm not sure about much more elegant. Could you elaborate on what you have in mind?

There are two spots where we specifically handle Entry instances:

  1. Wrapping the value in a version-less, expiration-less Entry instance.

  2. Extracting the value from the version-less, expiration-less Entry instance (the .value part).

Even if we get rid of the 6.1 format and coder, the load method of the 7.0 coder would still need to return Entry instances to support existing 7.0 format entries. Therefore, we would either need to add an is_a?(Cache::Entry) to spot 2, or add another case to the case statement (and another type value).

And I believe the same concern applies to existing entries written by custom coders, which have historically always received Entry instances.

With regard to making the code more clear, I've added a few comments in SerializerAdapter.

@jonathanhefner
Member Author

the load method of the 7.0 coder would still need to return Entry instances to support existing 7.0 format entries. Therefore, we would either need to add an is_a?(Cache::Entry) to spot 2, or add another case to the case statement (and another type value).

Correction: although the load method of the 7.0 coder still needs to return Entry instances, @adapt_dump should be false for cache format 7.0, so new 7.0 format entries will not include the SerializerAdapter signature. Thus, the case statement will not be reached for any 7.0 format entries, so we don't need to add an is_a?(Cache::Entry).

However, custom coders may still be an issue. I suppose we could say that custom coders must accept arbitrary values (not just Entry instances) when using cache format 7.1. Based on #48451 (comment) though, would that break Shopify's setup?

With regard to making the code more clear, I've added a few comments in SerializerAdapter.

I've pushed a new commit that refactors SerializerAdapter using several helper methods. To avoid multiple return values, I changed how the entry version is handled. I also went back to the thunk-based approach because it is more compatible with these helper methods, and I feel the code is more clear.

@casperisfine
Contributor

Could you elaborate on what you have in mind?

Right, sorry. If we had more liberty to change the Entry internal representation (and knew it wouldn't be Marshaled), we could either use a specialized subclass, or give the compressor object as an argument, allowing Entry#value to lazily deserialize. As Aaron hinted, Proc objects aren't cheap.

I'm sorry for the delay on reviewing this. I keep trying to review, but I have a very hard time wrapping my head around this extra complexity; particularly, the Adapter throws me off somehow.

Maybe this problem is more complex than I envisioned.

I'm also a bit concerned about the usability, because we'd allow you to change the compressor, but if you just do that it will blow up in production, unless you manage to make your deployment atomic and flush the cache (or change the prefix or something, but then you have consistency issues with delete).

That's a problem Paquito sidesteps by making compression part of the serializer, which is versioned.

I do think there is value in allowing the compressor to be changed, though, because Zlib isn't really a good option; it just happens to be the only one we can default to because it's the only one in the stdlib.

@jonathanhefner
Member Author

jonathanhefner commented Jun 28, 2023

I've pushed a new commit that avoids the Proc allocations, trying from a different angle.

In the previous attempt (4291e60), I had added Cache::LazyEntry and Cache::LazyCompressedStringEntry classes that ended up obfuscating the loading logic, moving it too far from SerializerAdapter#dump.

In this attempt, I have added a single SerializerAdapter::LazyEntry class that holds a reference to the SerializerAdapter instance and delegates to it. (I had to extract Cache::Entry into its own file in order to require and subclass it.)

I also eliminated the case statement in favor of two sequential conditionals. Perhaps that feels more focused?

I also changed !@adapt_dump into @delegate_entire_dump, which might slightly clarify how SerializerAdapter#dump behaves in context.

I keep trying to review, but I have a very hard time wrapping my head around this extra complexity; particularly, the Adapter throws me off somehow.

SerializerAdapter sits as a layer between Cache::Store and the serializer + compressor. It orchestrates the serializer and compressor together, handling whether to apply compression and decompression.

It also directly handles serialization of an entry's version and expiration so that they can be deserialized without deserializing the entry's value, per your suggestion (#48451 (comment)).

Lastly, it also assumes responsibility for the bare string serialization optimization from #48122.

All of the above applies only when the cache format is 7.1 and serializer.dump returns a string (not an Entry instance). In other words, it does not apply to the 6.1 coder, nor the 7.0 coder, nor MemoryStore::DupCoder. Nor to any custom coder if the cache format is still < 7.1.

In those cases, SerializerAdapter#dump will fully delegate to serializer.dump, and SerializerAdapter#dump_compressed will fully delegate to serializer.dump_compressed.

The reason SerializerAdapter is still used in those cases is to support upgrading in a future rolling deploy. Specifically, SerializerAdapter#load will be checking for SerializerAdapter::SIGNATURE (but SerializerAdapter#dump doesn't output that signature until the cache format is 7.1).

Incidentally, I am open to names other than SerializerAdapter. Perhaps simply Cache::Coder?

I'm also a bit concern about the usability, because we'd allow you to change the compressor, but if you just do that it will blow up in production, unless you manage to make your deployment atomic and flush the cache (or change the prefix or something, but then you have consistency issues with delete).

That's a problem paquito sidestep by making compression part of the serializer which is versioned.

To be clear, you are talking about a Rails 7.1 app that has already been deployed, and then changes the compressor on a subsequent rolling deploy?

In that case, yes, the compressor would need to handle versioning, but that is no different than changing the coder. I provide an example of how to do that in the API docs.
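A sketch of what such a versioned compressor could look like under the deflate/inflate contract (VersionedCompressor and its signature byte are hypothetical, not the example from the API docs):

```ruby
require "zlib"

# Hypothetical versioned compressor: prefixes its own one-byte signature
# so it can fall back to plain Zlib for entries written before the switch.
module VersionedCompressor
  SIGNATURE = "\xFD".b # arbitrary; must not collide with Zlib's header byte

  def self.deflate(string)
    # Swap Zlib for the new algorithm here; the signature marks new entries.
    SIGNATURE + Zlib.deflate(string)
  end

  def self.inflate(compressed)
    if compressed.start_with?(SIGNATURE)
      Zlib.inflate(compressed.byteslice(1, compressed.bytesize))
    else
      Zlib.inflate(compressed) # entry written by the previous compressor
    end
  end
end
```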

@jonathanhefner
Member Author

jonathanhefner commented Jul 17, 2023

I've extracted the lazy deserialization optimization from this PR into #48754. Lazy deserialization requires some finesse to preserve the behavior of #48663, so I felt it was worth splitting into its own PR.

@jonathanhefner force-pushed the cache-compressor branch 2 times, most recently from 416dc7d to 7f44087 on July 25, 2023 21:15
@jonathanhefner
Member Author

@byroot Continuing from #48754 (comment)...

I've updated this PR to rename SerializerAdapter to Cache::Coder, and to disable Cache::Coder when a :coder option is specified.

Additionally, a :serializer option has been added to cache stores. The :serializer option is mutually exclusive with the :coder option. Serializers are responsible only for serializing cache values, whereas coders are responsible for the entire process of serializing an ActiveSupport::Cache::Entry instance, including compression and any optimizations.

If neither the :coder nor :serializer options are specified, Cache::Coder will be used with a serializer based on the cache format version.


module ActiveSupport
  module Cache
    class Coder # :nodoc:
Member Author

It might be worth making this public to provide a migration path from the :coder option to the :serializer option. But that can be done in a follow-up PR.

Member

@byroot left a comment

Ok, somehow the name change removed my mind block; this now looks very good to me.

A bunch of nitpicks, but overall I think this is almost mergeable.

legacy_serializer = Cache.format_version < 7.1 && !@options[:serializer]
serializer = @options.delete(:serializer) || default_serializer
serializer = Cache::SerializerWithFallback[serializer] if serializer.is_a?(Symbol)
compressor = @options.delete(:compressor) || Zlib
Member

Suggested change
compressor = @options.delete(:compressor) || Zlib
compressor = @options.delete(:compressor) { Zlib }

I think we want to allow compressor: nil as an option.

Member Author

I've made this change, but is there a benefit to specifying compressor: nil instead of compress: false? Are you thinking it may be easier for downstream code to specify compressor: nil?

Member

Yeah, I think compress: false becomes redundant with compressor: nil, but it's probably not worth deprecating.

# signifies a compressed payload with `type.abs` as its actual type.)
if compressed = try_compress(payload, threshold)
  payload = compressed
  type = -type
Member

So using the sign bit to mark compressed is clever, but I think I'd prefer a more explicit bitmap:

COMPRESSED_FLAG = 0x80

type = type | COMPRESSED_FLAG

This means the type template would become C (unsigned byte), not c (signed byte).
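The flag-bit arithmetic byroot suggests can be demonstrated in isolation:

```ruby
COMPRESSED_FLAG = 0x80

type = 3
flagged = type | COMPRESSED_FLAG # => 131

packed = [flagged].pack("C")     # "C" = unsigned byte ("c" would be signed)
unpacked = packed.unpack1("C")

unpacked.anybits?(COMPRESSED_FLAG) # => true: payload is compressed
unpacked & ~COMPRESSED_FLAG        # => 3: the actual type
```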

version = dump_version(entry.version)
version_length = version&.bytesize || -1

packed = [SIGNATURE, type, expires_at, version_length].pack(PACKED_TEMPLATE)
Member

Suggested change
packed = [SIGNATURE, type, expires_at, version_length].pack(PACKED_TEMPLATE)
packed = SIGNATURE.b
packed << [type, expires_at, version_length].pack(PACKED_TEMPLATE)

A bit of a nitpick, but I think it would be a bit clearer if the signature wasn't part of the template on dump.

Member Author

I've made this change, but I want to point out that it creates disparity between PACKED_TEMPLATE and the other PACKED_*_TEMPLATE strings, since the latter must still account for the signature.

Member

Right, I don't think it's a big deal. As long as it's covered by test. All these templates are quite cryptic in the first place 😄

end

def binary_compatible_string(string)
  (string.encoding == Encoding::BINARY || string.ascii_only?) ? string : string.b
Member

I'm not sure this saves much. Any reason why you don't just call String#b instead of this helper?

Or perhaps s.encoding == Encoding::BINARY ? s : s.b if we want to save an allocation, but binary strings would be rare in the first place. Also b creates a shared string, so it's only allocating an object slot, not copying the bytes.

#ascii_only? triggers a coderange scan, so it may be detrimental if the string is long.
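For context, the behaviors being weighed here can be observed directly:

```ruby
s = "hello"

s.encoding                       # UTF-8 by default
s.ascii_only?                    # => true (this check scans the bytes)
s.b.encoding == Encoding::BINARY # => true: #b re-labels the bytes as binary
s.b == s                         # => true: same bytes, compatible encodings
```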

Member Author

I mostly agree. I added this due to #48751 (comment). The benchmark in that thread favors just calling String#b, but I started to wonder if there might be some compounding effect of GC pressure in a high load environment that the benchmark doesn't capture.

Anyway, that's just idle speculation, so I've made this change.

Member

compounding effect of GC pressure in a high load environment

I think we're talking about something like 2 strings per call to .b (if the original string is mutable; if not, only one).

I don't think that's a big problem for the GC. But yeah, I wish Ruby had a better API to do buffers like this, I might open a ticket upstream.

Comment on lines 60 to 67
payload = decompress(payload) if type < 0

if string_encoding = STRING_ENCODINGS[type.abs]
  value = payload.force_encoding(string_encoding)
else
  value = deserialize(payload)
end

Cache::Entry.new(value, version: version, expires_at: expires_at)
Member

Not for this PR, but I think this is where we could either extend Cache::Entry or use a subclass to pass the raw serialized payload, the serializer and compressor, and let it be lazy.

The reasoning being that either the version or expires_at could cause the entry to be dropped. Not sure if it happens that often, though; I suppose it depends how you use it.


Comment on lines 102 to 105
def serialize(value)
  @serializer.dump(value)
end

def deserialize(serialized)
  @serializer.load(serialized)
end
Member

So, more a matter of style, but I'm not sure these small delegator methods really help readability. (Same for decompress.)

Member Author

My thinking is: the decompress method is symmetric with try_compress, and the serialize and deserialize methods are symmetric with try_compress and decompress. Also, the intent of e.g. serialize(entry.value) contrasts with @serializer.dump(entry) if @legacy_serializer.

But if you feel like they hurt readability, I will remove them.

Member

Yeah, I think it adds a bit more indirection than necessary, this class is already quite complex. So I'd remove them if you don't mind.

Member Author

Done.

version_length = dumped.unpack1(PACKED_VERSION_LENGTH_TEMPLATE)

expires_at = nil if expires_at < 0
version = load_version(dumped.byteslice(PACKED_VERSION_INDEX, version_length))
Member

Suggested change
version = load_version(dumped.byteslice(PACKED_VERSION_INDEX, version_length))
version = load_version(dumped.byteslice(PACKED_VERSION_INDEX, version_length)) if version_length >= 0

I think I'd prefer if we explicitly handled a negative length here. Lots of cache payloads don't have a version at all, so we might as well skip load_version; it also allows removing the nil check in load_version.

Comment on lines 41 to 40
version = dump_version(entry.version)
version_length = version&.bytesize || -1
Member

Suggested change
version = dump_version(entry.version)
version_length = version&.bytesize || -1
if entry.version
  version = dump_version(entry.version)
  version_length = version.bytesize
else
  version = nil
  version_length = -1
end

I think a little bit more explicitness would be preferable here; this also allows removing the nil check in dump_version.

Member Author

Since this method is a bit long, how do you feel about a middle ground?

version = dump_version(entry.version) if entry.version
version_length = version&.bytesize || -1

Member

Ok 👍

This commit adds support for replacing the compressor used for
serialized cache entries.  Custom compressors must respond to `deflate`
and `inflate`.  For example:

  ```ruby
  module MyCompressor
    def self.deflate(string)
      # compression logic...
    end

    def self.inflate(compressed)
      # decompression logic...
    end
  end

  config.cache_store = :redis_cache_store, { compressor: MyCompressor }
  ```

As part of this work, cache stores now also support a `:serializer`
option.  Similar to the `:coder` option, serializers must respond to
`dump` and `load`. However, serializers are only responsible for
serializing a cached value, whereas coders are responsible for
serializing the entire `ActiveSupport::Cache::Entry` instance.
Additionally, the output from serializers can be automatically
compressed, whereas coders are responsible for their own compression.

Specifying a serializer instead of a coder also enables performance
optimizations, including the bare string optimization introduced by cache
format version 7.1.
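For illustration, a custom serializer only has to round-trip the cached value through `dump` and `load`; this hypothetical JSON-based sketch (not part of the PR) shows the shape of the contract:

```ruby
require "json"

# A hypothetical serializer responding to dump/load. Unlike a coder,
# it only handles the cached value itself; the cache store wraps the
# result in an Entry and may compress the serialized output.
module JsonSerializer
  def self.dump(value)
    JSON.generate(value)
  end

  def self.load(payload)
    JSON.parse(payload)
  end
end
```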
Member

@byroot byroot left a comment

Looks good to me now. Apologies for the overly slow review ❤️

@jonathanhefner jonathanhefner merged commit 1334919 into rails:main Jul 26, 2023
9 checks passed
@jonathanhefner
Member Author

Thank you, @byroot (and @tenderlove)! 🙇

Cache::Coder.new(serializer, compressor, legacy_serializer: legacy_serializer)
end

@coder ||= Cache::SerializerWithFallback[:passthrough]
Contributor

@jonathanhefner Hi there! 👋🏻
Sorry to ping you on an already merged PR, but hopefully you'll see this! 🤞🏻

This change is breaking some stuff here at Shopify. We have some places that initialize a memory store this way:

ActiveSupport::Cache::MemoryStore.new(coder: :passthrough)

I understand I can fix our code by changing it to this:

ActiveSupport::Cache::MemoryStore.new(coder: ActiveSupport::Cache::SerializerWithFallback[:passthrough])

However, do you think this code could stay backwards compatible and allow setting a coder as a symbol?
One could for example add this line:

@coder = Cache::SerializerWithFallback[@coder] if @coder.is_a?(Symbol)

(Note this line existed as-is at LOC 272 before this PR.)

What do you think?

Contributor

@davidstosik we should fix our code. Passing a coder as a symbol makes no sense.

I can help you if needed.

Contributor

I kinda misread your message. I see that passing a symbol was supported for a bit; I need to check if it was supported in 7.0. If it wasn't, then we should fix our code. If it was, we should fix Rails.

Contributor

Ok, so this wasn't supported in 7.0; it was added in 7.1 alpha and removed before release, so let's fix the Shopify side.

Member Author

For reference, the way to achieve this behavior in 7.0 was to specify coder: nil, and that still works in 7.1. However, I don't think that is documented anywhere. Is that something we officially support? If so, I can add documentation and a regression test.

Member

Is that something we officially support?

I think it would make sense. It's a fairly niche use case but can be useful with MemoryStore as a perf optimization if you know your values aren't mutable, or with MemcachedStore if you wish to use a serializer at the Dalli level.
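For context, a "passthrough" serializer is simply the identity on both sides — this hypothetical module (names invented for the example) illustrates why it is only safe when cached values are immutable, since the store hands back the very same object:

```ruby
# Hypothetical illustration of a passthrough serializer: dump and
# load are identity functions, so values are stored as-is. This is
# a perf win for MemoryStore, but callers share the stored object,
# so mutating it would corrupt the cache.
module PassthroughSerializer
  def self.dump(value)
    value
  end

  def self.load(value)
    value
  end
end
```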

Contributor

@casperisfine Thanks for digging that up! I already have a fix for our code so I think ignoring my message above is okay. 👍🏻

jonathanhefner added a commit to jonathanhefner/rails that referenced this pull request Sep 10, 2023
This covers rails#47770, rails#47964, rails#48103, and rails#48104, plus the follow-up
changes to `ActiveSupport::Cache::Store` options from rails#48449 and rails#48451.