fix: move metric increments to lowest callbacks #959
Conversation
Codecov Report
@@ Coverage Diff @@
## master #959 +/- ##
=====================================
Coverage 100% 100%
=====================================
Files 57 57
Lines 9616 9613 -3
=====================================
- Hits 9616 9613 -3
Continue to review full report at Codecov.
|
@@ -207,6 +210,13 @@ def _write_response(self, status_code, errno, message=None, error=None, | |||
if status_code == 410: | |||
self.set_header("Cache-Control", "max-age=86400") | |||
|
|||
if self._handling_message and status_code >= 300: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This will only metric messages that are potentially stored and handled by Autopush and not passed via a bridge, correct?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Only the simplepush/webpush handlers set this flag, yes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, I didn't see any metrics for autopush.notification.bridge.sent
in datadog. I presume a separate issue will be filed for that?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's probably because those metrics were sent passing the base_tags as the counter value, not the proper tags keyword. I'm guessing they were dropped as a result. I'll fix that too.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, updated to pass in the tags as the tags= value, that should fix the bridge.sent metrics.
"vapid:{}".format(vapid is not None) | ||
]) | ||
self._router_response(exc) | ||
self._router_response(exc, router_type, vapid) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we pass just the boolean state of VAPID rather than the whole structure? Are we planning on extracting more from the vapid data?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We are not, but it does kind of help for consistency.
_router_fail_err's no longer needed in registration, so we should nuke those calls. Then it and router_response are only called by web.webpush/simple and they can assume metrics should always be emitted (without the need for the _handling_message flag). The router_type and vapid args can become required. only difference of simplepush's usage is it doesn't emit a vapid tag. not a big deal or easily hardcoded around if router_type == "simplepush" ..also leftover from when this landed, I thought we fixed the vapid arg but it seems still a bit off (sometimes a bool, sometimes not) |
@pjenvey agreed, I think that should be done as a refactor to clean up all these dangling callbacks though. |
2704574
to
6796949
Compare
autopush/web/base.py
Outdated
@@ -255,27 +265,41 @@ def _boto_err(self, fail): | |||
self._write_response(503, errno=202, | |||
message="Communication error, please retry") | |||
|
|||
def _router_response(self, response): | |||
def _router_response(self, response, router_type=None, vapid=False): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
router_type and vapid should be required (more importantly, webpush and simplepush aren't calling it w/ these args and should be)
def initialize(self): | ||
"""Must run on initialization to set ahead of validation""" | ||
super(WebPushHandler, self).initialize() | ||
self._handling_message = True |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FYI this could have been a class var, but I think we'll shuffle this around later so don't care
@@ -207,6 +210,13 @@ def _write_response(self, status_code, errno, message=None, error=None, | |||
if status_code == 410: | |||
self.set_header("Cache-Control", "max-age=86400") | |||
|
|||
if self._handling_message and status_code >= 300: | |||
self.metrics.increment('notification.message.error', |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we will get the metric in all cases here, but router_type and vapid will be None in certain cases
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That is correct, sometimes it'll fail before getting that knowledge. So it will just be logged as a None value.
self.metrics.timing("notification.request_time", | ||
duration=time_diff) | ||
self.metrics.increment('notification.message.success', |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
simplepush.py needs a similar diff
What do we get out of a message size Counter? |
6796949
to
2ae9441
Compare
The metric increments were being called for registration API calls due to an error callback. They weren't called for success cases as well. Moving them to the lower callbacks with a new flag should help ensure they're incremented correctly. Also fixed message_data calls to use increment instead of a gauge and removed sending of base_tags in websocket.py to avoid too many metric values. Closes #958
2ae9441
to
637e246
Compare
The metric increments were being called for registration API
calls due to an error callback. They weren't called for success
cases as well. Moving them to the lower callbacks with a new
flag should help ensure they're incremented correctly.
Closes #958