
feat(hc): Use hybrid cloud services in AuthIdentityHandler #43035

Merged

merged 9 commits into master from auth-helper-control-silo-services on Jan 26, 2023

Conversation

RyanSkonnord
Member

These changes are targeted toward getting all AuthHelper and AuthIdentityHandler code to run with control silo limits. Most availability errors from AuthIdentityHandler are fixed, but we need more work around the Actor model and ApiInviteHelper to make it stable.

    request: HttpRequest
    identity: Mapping[str, Any]

    def __post_init__(self) -> None:
        # For debugging. TODO: Remove when tests are stable
        if not isinstance(self.organization, ApiOrganization):
Member
👍
You could also make this conditional on tests:

import sys
if "pytest" in sys.modules:
    ...

which means you could safely leave the check in.
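For illustration, a minimal self-contained sketch of that pytest-gated guard, using a hypothetical stand-in class (ExampleIdentityHandler) rather than the real handler:

import sys
from dataclasses import dataclass
from typing import Any


@dataclass
class ExampleIdentityHandler:
    # Hypothetical stand-in; the real handler also carries the request and
    # identity mapping shown in the diff above.
    organization: Any

    def __post_init__(self) -> None:
        # Only run the debug type check when pytest has been imported, so the
        # guard can stay in the code without affecting production traffic.
        if "pytest" in sys.modules:
            # Stand-in for the isinstance(..., ApiOrganization) check above.
            if type(self.organization).__name__ != "ApiOrganization":
                raise TypeError(
                    f"expected ApiOrganization, got {type(self.organization)!r}"
                )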

src/sentry/auth/helper.py (outdated thread; resolved)
    self, *, metadata: AuditLogMetadata, organization_member: ApiOrganizationMember
) -> None:
    user_id = organization_member.user_id
    assert user_id is not None
Member

Hmmm... I think this is wrong. Just because a user does not exist does not mean we cannot log something about their id. Many AuditLogEntry invocations come from async processes, where they can legitimately have a user_id in hand for a user that has already been deleted. At the moment we use cascade logic to set_null those user_ids, but in the future an async process will handle that.

I think the correct thing is to take the user_id provided "as is", without validating whether the user has been deleted, and let the eventual consistency of the tombstone / outbox pattern I'm working on handle the rest.

So basically, we still want audit log entries even when a user no longer exists, and we'll handle the set-null at a different level.

Member Author
Sounds good. I would guess that, when I dropped in that assertion, I was thinking more about getting Mypy to pass than about the correct behavior for audit logs.

It also wouldn't surprise me if, before my changes, the rare case of a null user would have simply caused an error, and it's just more obvious now.
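As a rough illustration of taking the user id "as is" (the helper and field names below are hypothetical, not the code in this PR):

from dataclasses import dataclass
from typing import Optional


@dataclass
class AuditLogDraft:
    # Hypothetical container for a pending audit log write.
    organization_id: int
    event_id: int
    actor_user_id: Optional[int]


def draft_membership_audit_log(
    organization_id: int, event_id: int, member_user_id: Optional[int]
) -> AuditLogDraft:
    # No existence check or assertion on the user id: the user may already be
    # deleted by the time an async process writes the entry, and the
    # tombstone / outbox machinery is expected to null out dangling ids later.
    return AuditLogDraft(
        organization_id=organization_id,
        event_id=event_id,
        actor_user_id=member_user_id,
    )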

from sentry.services.hybrid_cloud.user import APIUser, user_service


class DatabaseBackedAuditLogService(AuditLogService):
Member

A high-level concern: I have actually already implemented an AuditLogEntry service:

    def save_or_write_to_kafka(self):
        """
        Region Silos do not have access to the AuditLogEntry table which is specific to the control silo.
        For those silos, this method publishes the attempted audit log write to a durable kafka queue synchronously
        that will eventually be consumed by the control silo.  For the control silo, this method ultimately results
        in a save() call.
        """
        from sentry.region_to_control.producer import audit_log_entry_service

        audit_log_entry_service.produce_audit_log_entry(self)

This is hooked up to some existing call sites, but it isn't running in production. However, it is currently used in a handful of tests.

One major difference is that in my original design, AuditLogs were written to kafka and asynchronously written to the database. I do think we want to move away from kafka, but I wonder if using Outbox to synchronize these might also be a strong option. It would improve latency and availability of these code paths since writing an AuditLog is never a synchronously required action.

Strictly speaking, such work is not necessary for this PR (let's get the auth login work across the line), but when the login work is wrapping up, let's think about how we can converge on a single audit log service, and whether we think it should remain synchronous or not.

Member

> AuditLogs were written to kafka and asynchronously written to the database. I do think we want to move away from kafka, but I wonder if using Outbox to synchronize these might also be a strong option. It would improve latency and availability of these code paths since writing an AuditLog is never a synchronously required action.

Having audits be transported by outbox would be ideal imo. Not having to set up cross-region Kafka helps us improve resilience and reduces infrastructure complexity.
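As a rough, self-contained sketch of the outbox transport idea (class and field names such as AuditLogOutbox and pending are hypothetical; a real implementation would plug into Sentry's existing outbox machinery rather than an in-memory list):

from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List


@dataclass
class AuditLogOutbox:
    # Hypothetical in-memory stand-in for an outbox table on the region silo.
    pending: List[Dict[str, Any]] = field(default_factory=list)

    def enqueue(self, entry: Dict[str, Any]) -> None:
        # Recorded alongside the business change; nothing here blocks on the
        # control silo being reachable, so latency and availability improve.
        self.pending.append(entry)

    def drain(self, apply_on_control_silo: Callable[[Dict[str, Any]], None]) -> None:
        # A background job delivers pending entries to the control silo, where
        # they become AuditLogEntry rows; retries give eventual consistency
        # instead of a synchronous cross-silo (or cross-region Kafka) write.
        while self.pending:
            apply_on_control_silo(self.pending.pop(0))


if __name__ == "__main__":
    outbox = AuditLogOutbox()
    outbox.enqueue({"event": "SSO_IDENTITY_LINK", "actor_user_id": 42})
    outbox.drain(lambda entry: print("control silo persists:", entry))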

    @abc.abstractmethod
    def handle_new_membership(
        self,
        request: Request,
Member

Hmm, my worry here is the requirement of the request object. Is there any way we can further unpack this? When this becomes an RPC invocation, we don't want to be serializing an entire request object.

Member

It looks like we rely on two session values, invite_token and invite_member_id, plus the request user.
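A hypothetical sketch of what that unpacking could look like (the keyword parameters mirror the session keys named above; this is illustrative, not the final signature):

import abc
from typing import Optional


class MembershipHandler(abc.ABC):
    @abc.abstractmethod
    def handle_new_membership(
        self,
        *,
        user_id: int,
        invite_token: Optional[str] = None,
        invite_member_id: Optional[int] = None,
    ) -> None:
        # Only plain values cross the boundary, so a future RPC invocation
        # does not need to serialize an entire HttpRequest.
        ...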

Member Author

Sounds good, but it would cascade some sizeable changes through ApiInviteHelper. How about as a separate PR?

Member

Totally makes sense to do separately 👍


        flags.member_limit__restricted = True

    # Otherwise create a new membership
    om = organization_service.add_organization_member(
Member

Cool. In the future this will become simpler: we'll create an OrganizationMemberMapping ORM record locally on the control silo, and that will synchronize back. But this is sufficient for compatibility with the current world.

Member

@corps left a comment

Cool! I think, as you said at standup, starting by setting tests to stable=True and following the implications that surfaces is a great way to make progress. I think this is a very smart way to break up the complex work, and I can see where you're headed. I added a lot of feedback, but I want to emphasize that I agree with your approach and am grateful that you broke this down.
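For readers outside the hybrid cloud project, "setting tests to stable=True" refers to the silo-test decorators. A hedged sketch of the pattern, assuming the control_silo_test decorator and test utilities used elsewhere in the repo (the exact decorator name and signature may differ):

# Sketch only: the decorator and its stable flag are assumptions based on the
# hybrid-cloud test utilities this PR builds toward, not copied from the diff.
from sentry.testutils.cases import TestCase
from sentry.testutils.silo import control_silo_test


@control_silo_test(stable=True)  # enforce control-silo limits while the test runs
class AuthIdentityHandlerStableTest(TestCase):
    def test_runs_under_control_silo_limits(self) -> None:
        ...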

Comment on lines +36 to +42

def log_organization_membership(
    self,
    *,
    metadata: AuditLogMetadata,
    organization_member: ApiOrganizationMember,
) -> None:
    raise NotImplementedError
Member

Are you thinking that there will be a method for each audit log that is recorded? We have quite a few audit log event types which will make this interface wide.

We should be able to extend the existing AuditLogEvent plumbing code to be aware of required data attributes, which would let us have an interface like:

audit_log_service.log(AuditLogMetaData(
    organization=self.organization,
    actor=self.user,
    ip_address=self.request.META["REMOTE_ADDR"],
    target_object=auth_identity.id,
    event=audit_log.get_event_id("SSO_IDENTITY_LINK"),
    auth_identity=auth_identity,
))

With the required parameters embedded in the event definition, AuditLogMetaData could ensure all the required attributes are present. This approach would have a narrow interface, but it wouldn't be as type-safe as what you have presently.
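A rough sketch of how that per-event validation could work (REQUIRED_ATTRIBUTES and these AuditLogMetaData fields are made up for illustration, not the existing plumbing):

from dataclasses import dataclass, field
from typing import Any, Dict

# Hypothetical mapping from event name to the extra attributes it requires.
REQUIRED_ATTRIBUTES: Dict[str, set] = {
    "SSO_IDENTITY_LINK": {"auth_identity"},
    "MEMBER_ADD": {"organization_member"},
}


@dataclass
class AuditLogMetaData:
    organization_id: int
    actor_id: int
    event: str
    data: Dict[str, Any] = field(default_factory=dict)

    def __post_init__(self) -> None:
        # Fail fast if a call site forgot an attribute this event requires.
        missing = REQUIRED_ATTRIBUTES.get(self.event, set()) - self.data.keys()
        if missing:
            raise ValueError(f"{self.event} audit log is missing: {sorted(missing)}")


if __name__ == "__main__":
    AuditLogMetaData(
        organization_id=1,
        actor_id=2,
        event="SSO_IDENTITY_LINK",
        data={"auth_identity": object()},
    )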

src/sentry/services/hybrid_cloud/organization/impl.py (outdated thread; resolved)
@RyanSkonnord
Member Author

Regarding several comments on and near the writing of audit logs: I agree with your feedback, and everything around the audit logs was pretty much a hack job trying to get this integrated with minimal code movement outside of AuditLogService. Per discussion with @corps, I think the most realistic way to ship these auth changes would be to punt the audit log stuff until we can cover it with a good general mechanism.

@corps corps merged commit f930d42 into master Jan 26, 2023
@corps corps deleted the auth-helper-control-silo-services branch January 26, 2023 03:32
mikejihbe pushed a commit that referenced this pull request Feb 6, 2023

Co-authored-by: Zach Collins <recursive.cookie.jar@gmail.com>
The github-actions bot locked and limited the conversation to collaborators on Feb 10, 2023.
Labels
Scope: Backend Automatically applied to PRs that change backend components