Deduplicate alert manager alerts by caching their hashes #1396
Conversation
Nice work. Left a few comments.
alert_hash = Web.get_compound_hash([
    alert.fingerprint.encode('ascii'),
    alert.status.encode('utf-8'),
    str(alert.startsAt.timestamp()).encode('ascii'),
Please also add the endsAt.
I don't think you can get two alerts that are identical in every field except endsAt, but I don't see a downside in including it.
Added
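For reference, a compound hash over these fields could be sketched as below. This is a minimal stand-in for `Web.get_compound_hash` (whose actual implementation in robusta may differ); the hashed fields match the diff above plus the requested `endsAt`, and the alert values are hypothetical:

```python
import hashlib
from datetime import datetime, timezone

def get_compound_hash(parts: list[bytes]) -> str:
    # Hash all parts into one digest so a change in any field
    # produces a different hash.
    h = hashlib.sha256()
    for part in parts:
        h.update(part)
        h.update(b"\x00")  # separator avoids ambiguous concatenations
    return h.hexdigest()

# Hypothetical alert values, for illustration only.
fingerprint = "abc123"
status = "firing"
starts_at = datetime(2023, 1, 1, tzinfo=timezone.utc)
ends_at = datetime(2023, 1, 2, tzinfo=timezone.utc)

alert_hash = get_compound_hash([
    fingerprint.encode("ascii"),
    status.encode("utf-8"),
    str(starts_at.timestamp()).encode("ascii"),
    str(ends_at.timestamp()).encode("ascii"),  # endsAt, as requested above
])
```

Two alerts differing only in `endsAt` then hash differently, which is the point of including the field even if that case is unlikely in practice.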
src/robusta/runner/web.py
Outdated
        Web.event_handler.get_telemetry().last_alert_at = str(datetime.now())
        return jsonify(success=True)

    @staticmethod
    def processed_alerts_cache_flusher():
I think we don't need this; the TTLCache handles expiration.
As discussed, this guards against a worst-case scenario where TTLCache's built-in expiration runs over many entries at once and takes a very long time. Running expire periodically distributes that workload.
Okay, upon further inspection and benchmarking, it looks like the "bulk expire", when it does happen, won't in practice take long enough to warrant this mechanism. Removed.
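The behavior under discussion can be illustrated with a tiny TTL cache. This is a simplified stdlib stand-in for `cachetools.TTLCache` (not robusta's actual code): entries expire lazily, so duplicates within the TTL window are skipped without any background flusher thread, and `expire()` shows the "bulk expire" pass that turned out not to need scheduling:

```python
import time

class SimpleTTLCache:
    """Minimal TTL dedup cache, a stand-in for cachetools.TTLCache.
    Entries expire lazily on access; expire() is the optional bulk pass."""

    def __init__(self, ttl: float, timer=time.monotonic):
        self.ttl = ttl
        self.timer = timer  # injectable clock, handy for testing
        self._data = {}     # key -> insertion time

    def add(self, key) -> bool:
        """Return True if key is new (process the alert),
        False if it is a duplicate still within the TTL window."""
        now = self.timer()
        inserted = self._data.get(key)
        if inserted is not None and now - inserted < self.ttl:
            return False
        self._data[key] = now
        return True

    def expire(self):
        # Bulk expiry: drop all stale entries in one pass.
        now = self.timer()
        self._data = {k: t for k, t in self._data.items()
                      if now - t < self.ttl}
```

A duplicate alert hash arriving within the TTL is rejected by `add()` alone, which is why the separate flusher thread was unnecessary for correctness.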
src/robusta/runner/web.py
Outdated
    @staticmethod
    def init(event_handler: PlaybooksEventHandler, loader: ConfigLoader):
        Web.metrics = QueueMetrics()
        Web.api_server_queue = TaskQueue(name="api_server_queue", num_workers=NUM_EVENT_THREADS, metrics=Web.metrics)
        Web.alerts_queue = TaskQueue(name="alerts_queue", num_workers=NUM_EVENT_THREADS, metrics=Web.metrics)
        Web.event_handler = event_handler
        Web.loader = loader
        threading.Thread(
I think we don't need this.
Nice work!