-
Notifications
You must be signed in to change notification settings - Fork 290
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Refactor how RBAC enabled/disabled status is determined for Grafana Cloud stacks #4279
Conversation
engine/apps/user_management/sync.py
Outdated
# NOTE: checking whether or not RBAC is enabled depends on whether we are dealing with an open-source or cloud | ||
# stack. For Cloud we should make a call to the GCOM API, using an admin API token, and get the list of | ||
# feature_toggles enabled for the stack. For open-source, simply make a HEAD request to the grafana instance's API | ||
# and consider RBAC enabled if the list RBAC permissions endpoint returns 200. We cannot simply rely on the HEAD | ||
# call in cloud because if an instance is not active, the grafana gateway will still return 200 for the | ||
# HEAD request. | ||
# stack. For open-source, simply make a HEAD request to the grafana instance's API and consider RBAC enabled if | ||
# the list RBAC permissions endpoint returns 200. | ||
# | ||
# For cloud, we need to check the stack's status first. If the stack is active, we can make the same HEAD request | ||
# to the grafana instance's API. If the stack is not active, we will simply rely on the org's previous state of | ||
# org.is_rbac_permissions_enabled | ||
if settings.LICENSE == settings.CLOUD_LICENSE_NAME: | ||
# We cannot simply rely on the HEAD call in cloud because if an instance is not active | ||
# the grafana gateway will still return 200 for the HEAD request. | ||
stack_id = organization.stack_id | ||
gcom_client = GcomAPIClient(settings.GRAFANA_COM_ADMIN_API_TOKEN) | ||
rbac_is_enabled = gcom_client.is_rbac_enabled_for_stack(organization.stack_id) | ||
|
||
# the stack MUST be active for this check.. if it is in any other state | ||
# the Grafana API risks returning an HTTP 200 but the actual permissions data that is | ||
# synced later on will be empty (and we'd erase all RBAC permissions stored in OnCall) | ||
if gcom_client.is_stack_active(stack_id): | ||
rbac_is_enabled = grafana_api_client.is_rbac_enabled_for_organization() | ||
else: | ||
rbac_is_enabled = organization.is_rbac_permissions_enabled |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
main change
@contextmanager | ||
def patched_grafana_api_client(organization, is_rbac_enabled_for_organization=False): | ||
GRAFANA_INCIDENT_PLUGIN_BACKEND_URL_KEY = "backendUrl" | ||
|
||
with patch("apps.user_management.sync.GrafanaAPIClient") as mock_grafana_api_client: | ||
mock_grafana_api_client.GRAFANA_INCIDENT_PLUGIN_BACKEND_URL_KEY = GRAFANA_INCIDENT_PLUGIN_BACKEND_URL_KEY | ||
|
||
mock_client_instance = mock_grafana_api_client.return_value | ||
|
||
mock_client_instance.get_users.return_value = [ | ||
{ | ||
"userId": 1, | ||
"email": "test@test.test", | ||
"name": "Test", | ||
"login": "test", | ||
"role": "admin", | ||
"avatarUrl": "test.test/test", | ||
"permissions": [], | ||
}, | ||
] | ||
mock_client_instance.get_teams.return_value = ( | ||
{ | ||
"totalCount": 1, | ||
"teams": ( | ||
{ | ||
"id": 1, | ||
"name": "Test", | ||
"email": "test@test.test", | ||
"avatarUrl": "test.test/test", | ||
}, | ||
), | ||
}, | ||
None, | ||
) | ||
mock_client_instance.get_team_members.return_value = ( | ||
[ | ||
{ | ||
"orgId": organization.org_id, | ||
"teamId": 1, | ||
"userId": 1, | ||
}, | ||
], | ||
None, | ||
) | ||
mock_client_instance.get_grafana_incident_plugin_settings.return_value = ( | ||
{"enabled": True, "jsonData": {GRAFANA_INCIDENT_PLUGIN_BACKEND_URL_KEY: MOCK_GRAFANA_INCIDENT_BACKEND_URL}}, | ||
None, | ||
) | ||
mock_client_instance.get_grafana_labels_plugin_settings.return_value = ( | ||
{"enabled": True, "jsonData": {}}, | ||
None, | ||
) | ||
mock_client_instance.check_token.return_value = (None, {"connected": True}) | ||
mock_client_instance.is_rbac_enabled_for_organization.return_value = is_rbac_enabled_for_organization | ||
|
||
yield mock_grafana_api_client |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
most of the changes in this file are related to this..
we had a lot of duplication around mocking out responses on the GrafanaAPIClient
(plus several instances where we had like 8 nested context managers 😆 ).. this gets rid of that
def should_be_considered_for_rbac_permissioning(self) -> bool: | ||
""" | ||
this is sort of a hacky workaround to address a cloud issue we introduced with the accessControlOncall | ||
feature flag. The flag is technically enabled for all stacks, but the way in which OnCall used to be | ||
reading it (via GCOM config.feature_flags for the stack) made it such that RBAC wasn't actually being | ||
enabled for most stacks from the oncall backend perspective. Once we change things to start HEADing | ||
the permissions search endpoint, this will effectively turn on RBAC for all orgs.. soo instead lets | ||
slowly turn it on via the logic here | ||
""" | ||
cloud_rbac_rollout_percentage = settings.CLOUD_RBAC_ROLLOUT_PERCENTAGE | ||
|
||
# if rbac permissions are already enabled for the org, they're "grandfathered" in | ||
if not cloud_rbac_rollout_percentage or self.is_rbac_permissions_enabled: | ||
return True | ||
return self.id <= math.floor(Organization.objects.count() * cloud_rbac_rollout_percentage) | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
cc: @gamab
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this will eventually be removed once we're comfortable rolling this out to all stacks in cloud (ie. we'll slowly ramp up the value of CLOUD_RBAC_ROLLOUT_PERCENTAGE
)
…/oncall into jorlando/rbac-feature-flag-fixes
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. As discussed with you, I left a couple of questions.
Side note: what happens if the search endpoint returns an error (!=200) do we remove the users permissions?
…/oncall into jorlando/rbac-feature-flag-fixes
based on PR comment
…/oncall into jorlando/rbac-feature-flag-fixes
What this PR does
In cloud we are currently (somewhat) improperly determining whether or not a Grafana stack had the
accessControlOnCall
feature flag enabled. At first things worked fine. We would enable this feature toggle via the Grafana Admin UI, and then the OnCall backend would read this value from GCOM'sGET /instance/<stack_id>
endpoint (viaconfig.feature_toggles
), and everything worked as expected.There was a recent change made in
grafana/deployment_tools
to set this feature flag to True for all stacks. However, for some reason, the GCOM endpoint above doesn't return theaccessControlOnCall
feature toggle value inconfig.feature_toggles
if it is set in this manner (it only returns the value if it is set via the Grafana Admin UI).So what we should instead be doing is such instead of asking GCOM for this feature toggle, infer whether RBAC is enabled on the stack by doing a
HEAD /api/access-control/users/permissions/search
(this endpoint is only available on a Grafana stack ifaccessControlOnCall
is enabled).Few caveats to this ☝️
active
state (ie. not paused). This is because, no matter if theaccessControlOnCall
is enabled or not, if the stack is in apaused
state it will ALWAYS returnHTTP 200
which can be misleading and lead to bugs (this feels like a bug on the Grafana API, will follow up with core grafana team)settings.CLOUD_RBAC_ROLLOUT_PERCENTAGE
(see alsoOrganization. should_be_considered_for_rbac_permissioning
)Which issue(s) this PR closes
Related to https://github.com/grafana/identity-access-team/issues/667
Checklist
pr:no public docs
PR label added if not required)release:
). These labels dictate how your PR willshow up in the autogenerated release notes.