Loading all actor identifiers into memory isn't scalable #837

Open
rstawarz opened this issue Feb 7, 2024 · 8 comments
@rstawarz

rstawarz commented Feb 7, 2024

Is there a way to configure/modify Flipper so that all gates are not loaded into memory?

We have several services that were successfully using Flipper. We began to migrate our monolith away from our custom feature flag implementation / rollout combo. However we haven't found a way to configure the adapters to NOT load every gate into memory. It seems all paths lead through loading all gate_values. In our usage, we have multiple flags that are enabled per actor for 100k+ users. I don't see how this tool can scale in this regard. I'm hoping I'm just missing something and someone could help shed some light on what I'm missing.

Thanks!

@codenamev

Hey @jnunemaker 👋 Ryan's been trying to make this work for us, and I think the issue may stem from the default handling of the DSL's feature fetcher? or... are we holding it wrong? 😂

I'd be happy to submit a PR if you point me in the right direction 😄

@jnunemaker
Collaborator

@rstawarz @codenamev 👋. You're correct about how Flipper loads actors.

The goal of flipper is to store a smallish data structure of rules that then get applied in your app. We encourage fewer than 100 actors for any given feature and to use groups (and soon expressions) for when the number must be higher than that.

We have several customers/users in the 1-2k enabled actors range, but, in general, I'd think of that as an anti-pattern. The primary reason is performance. I'll try to explain in detail below but I am also happy to hop on a zoom or tuple and talk more about it.

Details

If you keep a large number of enabled actors per feature, then, like you said, you don't want to pull them all across the wire to see if one is enabled. The reasonable change would be to make it so flipper can check if a single actor is enabled at the adapter level.

But if we do that, preloading gets a lot more difficult: instead of preloading all the rules, you would need to preload only the rules that are small, or that could possibly apply to N actors (user, org, project, etc.). Doing this means you trade a large set of individual actors for more network calls or more complicated preloading.

The goal of flipper is to reduce network calls and be as fast as possible by default. The way we achieve that is by keeping the rules relatively small. Groups and expressions (in alpha) make it possible to keep the rules small and avoid network calls. They are kind of the best of both worlds.
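As a concrete sketch of what "small rules, no per-check network calls" looks like in practice: Flipper ships a Memoizer middleware that can preload every feature's gate data in one adapter call per request (the `preload: true` option follows Flipper's documented middleware API, but verify against the version you're running):

```ruby
# config/initializers/flipper.rb (Rails example)
# Memoize adapter reads for the duration of each request and
# preload all features up front in a single get_all call.
require "flipper/middleware/memoizer"

Rails.application.configure do
  config.middleware.use Flipper::Middleware::Memoizer, preload: true
end
```

This is only cheap because the rule set stays small; preloading 100k+ individually enabled actors through the same path is exactly the scaling problem described above.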

For example, when we allow for opt-in or large number of people individually enabled for a feature, we typically store that elsewhere and then use a group to determine enabled/disabled.

On one app, we have a settings jsonb column on user that we use for various things. If we want to allow opt in to a new feature like a redesign, we add the bits to let people click a toggle in the UI and have that flip the optin_redesign setting to true.
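A minimal plain-Ruby sketch of that opt-in predicate (class and method names here are hypothetical; in the real app this is a jsonb `settings` column on the users table, not an in-memory hash):

```ruby
# Hypothetical sketch: the opt-in lives on the actor itself,
# not in Flipper's rule set.
class User
  def initialize(settings = {})
    @settings = settings
  end

  # Toggled when the user clicks "Try the redesign" in the UI.
  def optin_redesign!
    @settings["optin_redesign"] = true
  end

  def optin_redesign?
    !!@settings["optin_redesign"]
  end
end
```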

Then we use a group like this:

Flipper.register(:optin_redesign) do |actor|
  # respond_to? is so you can pass in different types of actors (org, project, user, etc.)
  actor.respond_to?(:optin_redesign?) && actor.optin_redesign?
end

With that group registered, you can enable or disable it for the feature to turn it on or off for any number of users (100k+) who have opted in.

Flipper.enable_group(:redesign, :optin_redesign)
Flipper.enabled?(:redesign, user) # returns true if user.optin_redesign? returns true

This keeps the data structure of rules small, preloading and caching simple and allows for large sets of individuals to be enabled.

Does something like that seem feasible for what you are trying to accomplish?

@jnunemaker jnunemaker self-assigned this Feb 9, 2024
@rstawarz
Author

rstawarz commented Feb 9, 2024

@jnunemaker Thank you for verifying! I understand what you're saying about groups, and have proposed this internally as a way to use Flipper as our interface to our existing feature flag model, not sure how much I love that but it works.

Given that loading too many actors/gates into memory is a potential issue, do you see a good way to protect against that? You opened #79 a couple of years ago (😆). Any plans to implement it?

@jnunemaker
Collaborator

Yeah, we have a limit in cloud. I haven't enforced it in cloudless (OSS) yet.

I do think there are things we could explore on the individual actor stuff, but in the end the actors always end up in memory at some point.

I spent several days going through the Ruby SDKs of other feature flag offerings, and they all end up in memory. Some have specific and unique sync mechanisms for large sets, but they all end up in memory eventually.

The only options are network calls or memory and memory is more performant.

If there is anything we can do to help please let me know. I'd be super happy to pair on some of this and make recommendations or whatever helps you move forward.

@rstawarz
Author

@jnunemaker a couple of thoughts, happy to jump on a call to discuss if it'd be helpful:

  1. Would you be open to a patch to the ActiveRecord adapter that allows a configurable limit on the number of gates? (Or, failing that, pushing the cloud limit down to cloudless?) As it stands, the open source tool can be misused in a way that brings down a system if it's used without regard to the internal design decisions.

  2. Along those same lines, thoughts on a patch to make using a group for a large number of actors a first-class citizen of the ActiveRecord adapter (maybe other adapters too)? Something along the lines of a second table like flipper_gate_members. Effectively we're doing this internally, but it's confusing when it isn't a first-class citizen of the tool and displayed in the UX (thinking something like showing the number of members of a group).
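For the first suggestion, here is a plain-Ruby illustration of what a configurable gate limit could enforce (this is not Flipper's actual adapter API; a real patch would wrap the ActiveRecord adapter's enable path, and the class and error names here are hypothetical):

```ruby
require "set"

# Sketch: refuse to enable more than `limit` distinct actors on one gate,
# so an unbounded actor set can never be written in the first place.
class CappedActorGate
  class LimitExceeded < StandardError; end

  def initialize(limit:)
    @limit = limit
    @actors = Set.new
  end

  def enable(actor_id)
    if !@actors.include?(actor_id) && @actors.size >= @limit
      raise LimitExceeded, "actor gate limit of #{@limit} reached"
    end
    @actors << actor_id
  end

  def enabled?(actor_id)
    @actors.include?(actor_id)
  end

  def size
    @actors.size
  end
end
```

Re-enabling an already-enabled actor stays allowed at the limit; only new actors beyond it raise, which nudges callers toward a group once the set grows too large.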

@jnunemaker
Collaborator

  1. Yep.
  2. I'm not sure I follow the second table idea, but yeah, I'd love to see your approach. Perhaps it's something I haven't thought of yet. What I'm picturing is preloading based on actor and a new adapter predicate method to determine enabled or not. But a change like large actor sets ripples into everything. For example, we'd likely have to completely change the way we do sync in cloud, so I'm very unsure of this change.

@jnunemaker
Collaborator

@rstawarz happy to chat whenever you want, or you can start with code and then we can chat. Either way, you can drop something on my calendar here when ready: https://calendly.com/nunemaker/chat

@jnunemaker
Collaborator

@rstawarz thanks for the chat. I talked with @bkeepers today. Shoot me an email john@fewerandfaster.com and I'll update you on some solutions for large actors.
