More granular adapter error metrics #550

dbemiller · 2018-06-07T20:27:42Z

This PR captures better adapter metrics, and fixes some bugs I ran into along the way.

It deletes a few old metrics, and introduces the following:

prebidserver.adapter.{bidder}.requests.gotbids: a bid request was made and bids were returned
prebidserver.adapter.{bidder}.requests.nobid: a bid request was made, but no bids were returned
prebidserver.adapter.{bidder}.requests.badinput: a bid request was made with some invalid input
prebidserver.adapter.{bidder}.requests.badserverresponse: a bid request was made and the server responded with an error or something unexpected.
prebidserver.adapter.{bidder}.requests.timeout: a bid request was made and the bidder timed out.
prebidserver.adapter.{bidder}.requests.unknown: a bid request was made and some unknown error happened.

One risky change in here is that the "account X adapter" request counter was also split into gotbids and nobid. If you have lots of accounts, this could potentially cause problems in your metrics collector.

Some loose translation from old => new metrics are:

prebidserver.adapter.{bidder}.requests => prebidserver.adapter.{bidder}.requests.(gotbids|nobid)
prebidserver.adapter.{bidder}.error_requests => prebidserver.adapter.{bidder}.requests.(badinput|badserverresponse|unknown)
prebidserver.adapter.{bidder}.timeout_requests => prebidserver.adapter.{bidder}.requests.timeout

However, these are not exact either. Previously, a nobid-with-errors would get logged as one of those error conditions. Now we will log a nobid event and an error event.

This also fixed a bug in bid price logging. In /openrtb2/auction, bidders are allowed to return bids and errors. For example, "you gave us a banner & video Imp, and we don't bid on video... but here are some banner bids."

Previously, the code was only logging bids and bid prices if there were no errors. Now it logs all bids.

bretg · 2018-06-08T14:30:09Z

It would be good to know how often each adapter was called for which type of request. Similar to PR Bad request metrics #547 , could we add the request-type to the adapter? e.g.

requests.adapter.*.requests.type.{ortb-web,ortb-app,amp,legacy}

Note this a little different than what's in #547 as it breaks openrtb2 into ortb-web and ortb-app. Any chance of tweaking that at the request level? And maybe also adding type:

requests.type.{ortb-web,ortb-app,amp,legacy}

likewise for accounts

requests.account.*.requests.type.{ortb-web,ortb-app,amp,legacy}

hhhjort · 2018-06-08T15:30:04Z

pbsmetrics/go_metrics.go

 	}
+	for errType := range labels.AdapterErrors {


This may be an issue for any metrics backend that supports true labels. Normally you would want a label "error" with the value indicating the error type, or the value "none". To support multiple error types, you would need either multiple labels, one for each error type and treat the value for each as true/false. Or record multiple requests, one for each error type.

Considering that each individual request to a bidder should only be able to result in one error, may be better to record each success as a metrics event, and each error as a metrics event. With the current implementation we are losing info about the success rate.

Take for an example an adapter that on average generates 4 requests per auction, and generates on average 1.5 errors. If we look at the metrics, we will see an error rate of 150%, which is of course crazy. The true error rate was actually 37.5%. Tone this down to 0.5 errors on average and the metrics will report a 50% error rate when only 12.5% of requests to the bidder generated an error. Better to record on a per request basis if we care about more than "did the adapter perform flawlessly in the auctions?"

This may be an issue for any metrics backend that supports true labels.

Can't the backend choose how it logs events? This is an implementation for go-metrics, which doesn't support labels.

Considering that each individual request to a bidder should only be able to result in one error

This isn't true... officially, the docs say:

The errors should contain a list of errors which explain why this bidder's bids will be "subpar" in some way.

The goal for adapters is to return good error messages that will explain to a Publisher what went wrong. It could be one or many. A good example might be "you sent us a banner/video imp, but we don't support video" and "our server returned a 5xx when we tried to fetch the banner bids".

With the current implementation we are losing info about the success rate.

You're right... but I don't think the problem is with the AdapterErrors list. Just this implementation of it. We could easily collect this data with:

if len(labels.AdapterErrors) == 0 { // log success } else { // log error }

That said... I must admit that I don't see the value in it. If you knew the error rate was high, what would you do about it? Suggest that other Pubs avoid it? Tell the Bidder about it? What if the cause was just a single high-traffic publisher who messed up their bidder params?

We can chat if you disagree, but... I can't think of any actionable data that we're losing here.

As for returning multiple errors, let me modify that by saying multiple types of errors. If one request returns multiple reasons why the request was subpar, that is still just the one type of error. We certainly cannot get both a 5xx error and an error why the request was subpar in a single request.

As for "If you knew the error rate was high, what would you do about it?", what is the purpose of having metrics on these errors if not to track the error rate? I don't see how making the error metrics more granular is useful if we are not tracking the error rate accurately.

dbemiller · 2018-06-08T15:40:34Z

requests.adapter.*.requests.type.{ortb-web,ortb-app,amp,legacy}

Logged as #551. Since it's not a breaking change, there's no sense in smooshing it into this PR.

requests.account.*.requests.type.{ortb-web,ortb-app,amp,legacy}

Logged as #552. Same reason.

requests.type.{ortb-web,ortb-app,amp,legacy}

This sounds good & I did it now. #547 already made similar breaking changes and hasn't been released yet... so we should consolidate them.

prebidserver.requests.ok.openrtb2 is changed into prebidserver.requests.ok.openrtb2-web and prebidserver.requests.ok.openrtb2-app. Same with the requests.badinput.* and requests.err.* metrics.

* 'master' of https://github.com/prebid/prebid-server: (93 commits) Separated the bidder info from the endpoint (prebid#585) More granular adapter error metrics (prebid#550) Deleted the validate endpoint, and dead code files. (prebid#541) Deleted the /getuids endpoint. (prebid#531) Moved the cookie parsing into the actual ParseCookie function. (prebid#584) [adform adapter] Pass gdpr & digitrust to bid request (prebid#581) Adding endpoint exposing prebid-server version (prebid#577) Infer imp.secure state (prebid#572) Changing custom parameter name (prebid#576) Adds map support to config logging. (prebid#573) Moves the metrics config under pbsmetrics where it should be rather than in the general config module. (prebid#571) Reorganizes the metrics code to support metrics submodules the same way we do in other sections. (prebid#568) Consolidates 2 metrics engine functions and fixes missing metric from… (prebid#558) Changes configuration logging to be reflect-based. (prebid#563) Cookie sync URL gdpr support for Brightroll adapter (prebid#562) Improve config logging & validation (prebid#560) Use the cache type to determine if no cache was available. (prebid#557) Fixed the validation logic on unbounded caches, and added tests. (prebid#556) Include binaries in releases (prebid#554) Defines defaults for all config options and removes pointers from the… (prebid#539) ...

* Split the adapter error metrics up by purpose. Fixed some bugs. * Renamed some more metrics, and fixed some issues. * Split the openrtb2 request type into web and app.

Split the adapter error metrics up by purpose. Fixed some bugs.

80fdd22

dbemiller added the breaking change - hosts Has breaking changes for companies that host Prebid Server label Jun 7, 2018

Renamed some more metrics, and fixed some issues.

d2615ce

dbemiller requested review from hhhjort and cirla June 8, 2018 13:50

dbemiller assigned cirla and hhhjort Jun 8, 2018

cirla previously approved these changes Jun 8, 2018

View reviewed changes

Split the openrtb2 request type into web and app.

09c35b2

dbemiller dismissed cirla’s stale review via 09c35b2 June 8, 2018 15:25

dbemiller mentioned this pull request Jun 8, 2018

Capture request types per adapter #551

Closed

hhhjort reviewed Jun 8, 2018

View reviewed changes

dbemiller mentioned this pull request Jun 8, 2018

Capture request type metrics per-account #552

Closed

hhhjort previously approved these changes Jun 13, 2018

View reviewed changes

bretg mentioned this pull request Jun 13, 2018

Refactor metrics prebid/prebid-server-java#81

Closed

Merged from master. Fixed conflicts.

4a67447

dbemiller dismissed hhhjort’s stale review via 4a67447 June 18, 2018 15:39

hhhjort approved these changes Jun 25, 2018

View reviewed changes

cirla approved these changes Jun 25, 2018

View reviewed changes

dbemiller merged commit b91c7f8 into master Jun 26, 2018

dbemiller deleted the bad-server-response-metrics branch June 26, 2018 14:57

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

More granular adapter error metrics #550

More granular adapter error metrics #550

dbemiller commented Jun 7, 2018 •

edited

Loading

bretg commented Jun 8, 2018 •

edited

Loading

hhhjort Jun 8, 2018

dbemiller Jun 8, 2018

hhhjort Jun 8, 2018

dbemiller commented Jun 8, 2018

More granular adapter error metrics #550

More granular adapter error metrics #550

Conversation

dbemiller commented Jun 7, 2018 • edited Loading

bretg commented Jun 8, 2018 • edited Loading

hhhjort Jun 8, 2018

Choose a reason for hiding this comment

dbemiller Jun 8, 2018

Choose a reason for hiding this comment

hhhjort Jun 8, 2018

Choose a reason for hiding this comment

dbemiller commented Jun 8, 2018

dbemiller commented Jun 7, 2018 •

edited

Loading

bretg commented Jun 8, 2018 •

edited

Loading