This repository has been archived by the owner on Apr 26, 2024. It is now read-only.

Add metrics to track how often events are soft_failed #10156

Merged
merged 5 commits into from
Jun 11, 2021
1 change: 1 addition & 0 deletions changelog.d/10156.misc
@@ -0,0 +1 @@
Add `synapse_federation_soft_failed_events_total` metric to track how often events are soft failed.
7 changes: 7 additions & 0 deletions synapse/handlers/federation.py
@@ -33,6 +33,7 @@
)

import attr
from prometheus_client import Counter
from signedjson.key import decode_verify_key_bytes
from signedjson.sign import verify_signed_json
from unpaddedbase64 import decode_base64
@@ -101,6 +102,11 @@

logger = logging.getLogger(__name__)

soft_failed_event_counter = Counter(
    "synapse_federation_soft_failed_events_total",
    "Events received over federation that we marked as soft_failed",
)


@attr.s(slots=True)
class _NewEventInfo:
@@ -2498,6 +2504,7 @@ async def _check_for_soft_fail(
            event_auth.check(room_version_obj, event, auth_events=current_auth_events)
        except AuthError as e:
            logger.warning("Soft-failing %r because %s", event, e)
@MadLittleMods MadLittleMods Jun 11, 2021

For the future:

If we want to dig deeper into which rooms are soft-failing messages, we could use the Elasticsearch logs with some aggregations (Kibana or raw ES queries). We could add a few fields to the Elasticsearch mapping (`room_id`, `mxid`, `event_id`), which should handle the high cardinality (lots of different values) just fine.

It looks like we use Logstash or Filebeat, and I assume there is a way to parse out another field besides `message`? Maybe https://stackoverflow.com/q/40460830/796832

-- #10156 (comment)

To document my additional findings here:

We can already use `logger.info('foo', extra={"foo": "bar"})`, as shown in tests/logging/test_terse_json.py#L64-L74, to add extra fields to the structured logging.
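
As a rough illustration of the `extra=` approach, here is a minimal standard-library sketch. The `JsonFormatter` class, the `build_logger` helper, and the logger name are hypothetical stand-ins for this example and are not Synapse's actual formatter or configuration; the field names are the ones suggested in the comment above.

```python
import io
import json
import logging

# Field names suggested in the comment above; an assumption, not an existing schema.
EXTRA_FIELDS = ("room_id", "mxid", "event_id")


class JsonFormatter(logging.Formatter):
    """Hypothetical minimal JSON formatter (not Synapse's own implementation)."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {"level": record.levelname, "log": record.getMessage()}
        # Values passed via `extra=` become attributes on the LogRecord.
        for field in EXTRA_FIELDS:
            if hasattr(record, field):
                payload[field] = getattr(record, field)
        return json.dumps(payload)


def build_logger(stream: io.StringIO) -> logging.Logger:
    """Return a logger that writes one JSON object per line to `stream`."""
    logger = logging.getLogger("soft_fail_example")
    logger.setLevel(logging.INFO)
    logger.propagate = False
    logger.handlers.clear()
    handler = logging.StreamHandler(stream)
    handler.setFormatter(JsonFormatter())
    logger.addHandler(handler)
    return logger


if __name__ == "__main__":
    out = io.StringIO()
    log = build_logger(out)
    log.warning(
        "Soft-failing %s because %s",
        "$event1",
        "sender not allowed",
        extra={"room_id": "!room:example.org", "event_id": "$event1"},
    )
    print(out.getvalue(), end="")
```

Because each field ends up as its own JSON key, a log shipper would not need to parse it back out of the message string. Note that keys in `extra` must not collide with built-in `LogRecord` attributes such as `message`, or the logging call raises `KeyError`.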

Contributor Author

Created #10168 to add these fields

            soft_failed_event_counter.inc()
            event.internal_metadata.soft_failed = True

    async def on_get_missing_events(
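
The counter pattern in this diff can be tried in isolation with a sketch like the one below. It assumes `prometheus_client` is installed. The private `CollectorRegistry`, the local `AuthError` class, the dict-shaped event, and the simplified `check_for_soft_fail` function are all hypothetical stand-ins for illustration, not Synapse code.

```python
from prometheus_client import CollectorRegistry, Counter

# A private registry (an assumption for this sketch) avoids touching the
# process-wide default registry.
registry = CollectorRegistry()

soft_failed_event_counter = Counter(
    "synapse_federation_soft_failed_events_total",
    "Events received over federation that we marked as soft_failed",
    registry=registry,
)


class AuthError(Exception):
    """Hypothetical stand-in for Synapse's AuthError."""


def check_for_soft_fail(event: dict, allowed_senders: set) -> None:
    """Hypothetical, simplified check: soft-fail events from unknown senders."""
    try:
        if event["sender"] not in allowed_senders:
            raise AuthError("sender not permitted by current state")
    except AuthError:
        # Same shape as the diff: count the failure, then flag the event.
        soft_failed_event_counter.inc()
        event["soft_failed"] = True


if __name__ == "__main__":
    event = {"sender": "@mallory:example.org"}
    check_for_soft_fail(event, {"@alice:example.org"})
    print(registry.get_sample_value("synapse_federation_soft_failed_events_total"))
```

The counter has no labels, which keeps its cardinality constant; per-room or per-event detail is left to the logs, as discussed in the review comment.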