Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Deduplicate alert manager alerts by caching their hashes #1396

Merged
merged 3 commits into from
May 3, 2024

Conversation

RobertSzefler
Copy link
Contributor

No description provided.

@RobertSzefler RobertSzefler force-pushed the feature/main-1609/dedup-alertmgr-alerts branch from ae16603 to fb37d09 Compare May 2, 2024 13:58
@RobertSzefler RobertSzefler marked this pull request as ready for review May 2, 2024 14:50
@RobertSzefler RobertSzefler force-pushed the feature/main-1609/dedup-alertmgr-alerts branch 4 times, most recently from 93f16f1 to 8b07759 Compare May 2, 2024 16:33
@RobertSzefler RobertSzefler force-pushed the feature/main-1609/dedup-alertmgr-alerts branch from 8b07759 to c28ddc5 Compare May 3, 2024 06:55
Copy link
Contributor

@arikalon1 arikalon1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nice work
left few comments

alert_hash = Web.get_compound_hash([
alert.fingerprint.encode('ascii'),
alert.status.encode('utf-8'),
str(alert.startsAt.timestamp()).encode('ascii'),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please also add the endsAt
I don't think you can get 2 alerts with everything the same, and only endsAt is different, but I don't see a downside in adding it

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added


Web.event_handler.get_telemetry().last_alert_at = str(datetime.now())
return jsonify(success=True)

@staticmethod
def processed_alerts_cache_flusher():
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we don't need this.
I think the TTL Cache handles expiration

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As discussed, this is implemented to handle some worst case scenarios when the built-in expiration mechanism in TTLCache would be run and take a very long time. To guard against this, we run expire periodically to distribute the workload.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, upon further inspection and bencharking it looks like the "bulk expire", if it happens, in practice won't take so long to warrant having this mechanism implemented. Removed.

@staticmethod
def init(event_handler: PlaybooksEventHandler, loader: ConfigLoader):
Web.metrics = QueueMetrics()
Web.api_server_queue = TaskQueue(name="api_server_queue", num_workers=NUM_EVENT_THREADS, metrics=Web.metrics)
Web.alerts_queue = TaskQueue(name="alerts_queue", num_workers=NUM_EVENT_THREADS, metrics=Web.metrics)
Web.event_handler = event_handler
Web.loader = loader
threading.Thread(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we don't need this

Copy link
Contributor

@arikalon1 arikalon1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nice work

@arikalon1 arikalon1 merged commit 2aa1b2a into master May 3, 2024
14 checks passed
@arikalon1 arikalon1 deleted the feature/main-1609/dedup-alertmgr-alerts branch May 3, 2024 12:39
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants