Loading all actor identifiers into memory isn't scalable #837

Open
rstawarz opened this issue Feb 7, 2024 · 8 comments
@rstawarz

rstawarz commented Feb 7, 2024

Is there a way to configure/modify Flipper so that all gates are not loaded into memory?

We have several services that were successfully using Flipper. We began to migrate our monolith away from our custom feature flag implementation / rollout combo. However we haven't found a way to configure the adapters to NOT load every gate into memory. It seems all paths lead through loading all gate_values. In our usage, we have multiple flags that are enabled per actor for 100k+ users. I don't see how this tool can scale in this regard. I'm hoping I'm just missing something and someone could help shed some light on what I'm missing.

Thanks!

@codenamev

Hey @jnunemaker 👋 Ryan's been trying to make this work for us, and I think the issue may stem from the default handling of the DSL's feature fetcher? or... are we holding it wrong? 😂

I'd be happy to submit a PR if you point me in the right direction 😄

@jnunemaker
Collaborator

@rstawarz @codenamev 👋. You're correct about how Flipper loads actors.

The goal of flipper is to store a smallish data structure of rules that then get applied in your app. We encourage fewer than 100 actors for any given feature and to use groups (and soon expressions) for when the number must be higher than that.

We have several customers/users in the 1-2k enabled actors range, but, in general, I'd think of that as an anti-pattern. The primary reason is performance. I'll try to explain in detail below but I am also happy to hop on a zoom or tuple and talk more about it.

Details

If you keep a large number of enabled actors per feature, then, like you said, you don't want to pull them all across the wire to see if one is enabled. The reasonable change would be to make it so flipper can check if a single actor is enabled at the adapter level.

But if we do that, preloading gets a lot more difficult: instead of preloading all the rules, you would need to preload only the rules that are small, or that could possibly apply to N actors (user, org, project, etc.). Doing this means you trade a large set of individual actors for more network calls or more complicated preloading.

The goal of flipper is to reduce network calls and be as fast as possible by default. The way we achieve that is by keeping the rules relatively small. Groups and expressions (in alpha) make it possible to keep the rules small and avoid network calls. They are kind of the best of both worlds.
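As a concrete sketch of what "small rules, no per-check network calls" looks like in practice: Flipper ships a Memoizer middleware that can preload every feature's gate data in one adapter call per request (the `preload: true` option follows Flipper's documented middleware API, but verify against the version you're running):

```ruby
# config/initializers/flipper.rb (Rails example)
# Memoize adapter reads for the duration of each request and
# preload all features up front in a single get_all call.
require "flipper/middleware/memoizer"

Rails.application.configure do
  config.middleware.use Flipper::Middleware::Memoizer, preload: true
end
```

This is only cheap because the rule set stays small; preloading 100k+ individually enabled actors through the same path is exactly the scaling problem described above.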

For example, when we allow for opt-in or large number of people individually enabled for a feature, we typically store that elsewhere and then use a group to determine enabled/disabled.

On one app, we have a settings jsonb column on user that we use for various things. If we want to allow opt in to a new feature like a redesign, we add the bits to let people click a toggle in the UI and have that flip the optin_redesign setting to true.
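A minimal plain-Ruby sketch of that opt-in predicate (class and method names here are hypothetical; in the real app this is a jsonb `settings` column on the users table, not an in-memory hash):

```ruby
# Hypothetical sketch: the opt-in lives on the actor itself,
# not in Flipper's rule set.
class User
  def initialize(settings = {})
    @settings = settings
  end

  # Toggled when the user clicks "Try the redesign" in the UI.
  def optin_redesign!
    @settings["optin_redesign"] = true
  end

  def optin_redesign?
    !!@settings["optin_redesign"]
  end
end
```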

Then we use a group like this:

Flipper.register(:optin_redesign) do |actor|
  # respond_to? is so you can pass in different types of actors (org, project, user, etc.)
  actor.respond_to?(:optin_redesign?) && actor.optin_redesign?
end

With that group registered, you can enable or disable it for the feature to turn it on or off for any number of users (100k+) who have opted in.

Flipper.enable_group(:redesign, :optin_redesign)
Flipper.enabled?(:redesign, user) # returns true if user.optin_redesign? returns true

This keeps the data structure of rules small, preloading and caching simple and allows for large sets of individuals to be enabled.

Does something like that seem feasible for what you are trying to accomplish?

@jnunemaker jnunemaker self-assigned this Feb 9, 2024
@rstawarz
Author

rstawarz commented Feb 9, 2024

@jnunemaker Thank you for verifying! I understand what you're saying about groups, and have proposed this internally as a way to use Flipper as our interface to our existing feature flag model, not sure how much I love that but it works.

Given that loading too many actors/gates into memory is a potential issue, do you see a good way to protect against that? You opened #79 a couple of years ago (😆). Any plans to implement it?

@jnunemaker
Collaborator

Yeah, we have a limit in cloud. I haven't enforced it in cloudless (OSS) yet.

I do think there are things we could explore on the individual actor stuff, but in the end the actors always end up in memory at some point.

I spent several days going through the Ruby SDKs of other feature flag offerings, and they all end up in memory. Some have specific and unique sync mechanisms for large sets, but they all end up in memory eventually.

The only options are network calls or memory and memory is more performant.

If there is anything we can do to help please let me know. I'd be super happy to pair on some of this and make recommendations or whatever helps you move forward.

@rstawarz
Author

@jnunemaker a couple of thoughts, happy to jump on a call to discuss if it'd be helpful:

  1. Would you be open to a patch to the ActiveRecord adapter that allows a configurable limit on the number of gates? (Or, failing that, pushing the cloud limit down to cloudless?) As it stands, the open source tool can be misused in a way that brings down a system if it's used without regard to the internal design decisions.

  2. Along those same lines, thoughts on a patch to make using a group for a large number of actors a first-class citizen of the ActiveRecord adapter (maybe other adapters too)? Something along the lines of a second table like flipper_gate_members. Effectively we're doing this internally, but it's confusing when it isn't a first-class citizen of the tool and displayed in the UX (thinking something like showing the number of members of a group).
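For the first suggestion, here is a plain-Ruby illustration of what a configurable gate limit could enforce (this is not Flipper's actual adapter API; a real patch would wrap the ActiveRecord adapter's enable path, and the class and error names here are hypothetical):

```ruby
require "set"

# Sketch: refuse to enable more than `limit` distinct actors on one gate,
# so an unbounded actor set can never be written in the first place.
class CappedActorGate
  class LimitExceeded < StandardError; end

  def initialize(limit:)
    @limit = limit
    @actors = Set.new
  end

  def enable(actor_id)
    if !@actors.include?(actor_id) && @actors.size >= @limit
      raise LimitExceeded, "actor gate limit of #{@limit} reached"
    end
    @actors << actor_id
  end

  def enabled?(actor_id)
    @actors.include?(actor_id)
  end

  def size
    @actors.size
  end
end
```

Re-enabling an already-enabled actor stays allowed at the limit; only new actors beyond it raise, which nudges callers toward a group once the set grows too large.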

@jnunemaker
Collaborator

  1. Yep.
  2. I'm not sure I follow the second table idea, but yeah, I'd love to see your approach. Perhaps it's something I haven't thought of yet. What I'm picturing is preloading based on actor and a new adapter predicate method to determine enabled or not. But a change like large actor sets ripples into everything. For example, we'd likely have to completely change the way we do sync in cloud, so I'm very unsure of this change.

@jnunemaker
Collaborator

@rstawarz happy to chat whenever you want, or you can start with code and then we can chat. Either way, you can drop something on my calendar here when ready: https://calendly.com/nunemaker/chat

@jnunemaker
Collaborator

@rstawarz thanks for the chat. I talked with @bkeepers today. Shoot me an email john@fewerandfaster.com and I'll update you on some solutions for large actors.
