Troubleshooting RTBkit Using Graphite

marksweiss edited this page Sep 4, 2014 · 1 revision

Troubleshooting Bid Requests

You may run into situations where the volume of bid requests reaching your agents is lower than you expect, or where your system is not bidding at all. When this happens, you need to troubleshoot to determine the cause.

The overall workflow is to check the following areas of your RTBkit system:

  • Bid traffic over the network into RTBkit and between RTBkit components
  • Bid Request parsing
  • Bid Request filtering
  • The status of your Bidding Agents -- are they available, reachable and representing active accounts?
  • The status of your Banker -- is it available and does it have budget for active accounts?

Troubleshooting Network Issues in Graphite

If your system isn't bidding, the first thing to confirm is that it is receiving bid requests. Check these keys:

  • <Exchange Connector Name>:auctionNewConnection will show data if the Exchange Connector socket is open and receiving traffic.
  • <Exchange Connector Name>:auctionStart will show data if RTBkit is able to start processing bid requests.
  • <Exchange Connector Name>:auctionResponseSent is written at the end of bid processing, when a bid response is sent back to the exchange. If this key is being written, your system is at least able to receive bid requests and respond to the caller. If there are other issues, all responses will be HTTP Response Code 204, indicating no bid, but you will have eliminated the external network and the inventory source as the cause of your issues.

In addition, you should check that the counts displayed for the auctionReceived and auctionResponseSent keys match.

If they do, and the responses are not all HTTP Response Code 204, your stack is processing each request without throwing exceptions. In that case you should focus on the Parsing, Filtering, Bidding Agent and Banker steps described next.

If the counts don't match, you may have exceptions. You should check the RTBkit logs for errors.
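
One way to automate this comparison is to pull both series from Graphite's render endpoint as JSON and compare their totals. The following is a minimal Python sketch, not part of RTBkit. The base URL, the example target name and the 1% tolerance are assumptions for illustration. It relies only on the render API's target, from and format=json parameters, and on datapoints being returned as [value, timestamp] pairs.

```python
from urllib.parse import urlencode


def render_url(base_url, target, window="-10min"):
    """Build a Graphite render URL that returns the series as JSON."""
    query = urlencode({"target": target, "from": window, "format": "json"})
    return f"{base_url}/render?{query}"


def series_total(datapoints):
    """Sum a Graphite series, skipping intervals with no data (None)."""
    return sum(value for value, _timestamp in datapoints if value is not None)


def counts_match(received, sent, tolerance=0.01):
    """Return True if the two totals agree within a relative tolerance.

    The tolerance is an assumption: the two keys are flushed
    independently, so small differences are not necessarily errors.
    """
    total_received = series_total(received)
    total_sent = series_total(sent)
    if total_received == 0:
        return total_sent == 0
    return abs(total_received - total_sent) / total_received <= tolerance
```

To use it, fetch render_url(...) for each key (for example with urllib.request.urlopen), parse the JSON, and pass the "datapoints" list of each returned series to counts_match. A False result is the cue to check the RTBkit logs for errors.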

Troubleshooting Network Issues Directly

You can also confirm bid traffic at the network level directly with tools like tcpdump or tcpick. Your bootstrap.json RTBkit configuration has the ports used by the various RTBkit services.

Here is the tcpick command to listen to traffic on port 9950:

sudo tcpick -i eth0 -yP -C -h "port 9950"

Here is the equivalent command in tcpdump:

tcpdump -i eth0 port 9950

Bid Request Parsing

You should next verify that RTBkit is parsing your bid requests without error. The first key to check is router:<Exchange Connector Name>:error:parsingBidRequest, which displays child keys for the various error types if there are parsing errors.

If there are no error keys, compare the values in router:<Exchange Connector Name>:auctionReceived, which shows the number of bid requests RTBkit is trying to parse, to router:<Exchange Connector Name>:auctionStart, which shows the number of bid requests that have passed through parsing. If they don't match, the difference shows how many bid requests are either failing parsing or being dropped because of load shedding. (Load shedding occurs when RTBkit can't keep up with inbound load and so responds to some requests immediately with a Response Code 204 indicating no-bid. It does this to avoid timing out on calls from the exchanges while still attempting to bid on the percentage of incoming requests that it can keep up with.)
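
To see when requests are being lost, rather than only how many, the two series can be compared interval by interval. The sketch below is illustrative Python, not RTBkit code. It assumes both series have already been fetched as Graphite-style lists of [value, timestamp] pairs, and the function name is hypothetical.

```python
def per_interval_gap(received, started):
    """Return (timestamp, received - started) for each interval with data in both series.

    A positive gap is the number of bid requests that failed parsing
    or were shed in that interval.
    """
    started_by_ts = {ts: value for value, ts in started if value is not None}
    gaps = []
    for value, ts in received:
        if value is None or ts not in started_by_ts:
            continue  # skip intervals missing from either series
        gaps.append((ts, value - started_by_ts[ts]))
    return gaps
```

Gaps that appear only at traffic peaks point towards load shedding, while a steady gap at all traffic levels is more consistent with parsing failures.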

Bid Request Filtering

The next area to investigate is the Bid Request filtering pipeline. By looking at the Graphite keys that instrument this pipeline, you can determine where bid requests are being dropped. The usual suspects are:

  • at the entrance of the exchange connectors
  • requests being dropped because the system is in slow mode
  • static filters
  • augmentors
  • dynamic filters
  • bid responses not being correctly received from the agents.

Each of these steps can be checked with the following Graphite keys:

Exchange Connectors:

<install>.router.<Exchange Connector Name>.auctionReceived
<install>.router.<Exchange Connector Name>.auctionStart

Slow Mode:

drawAsInfinite(<install>.router.monitor.systemInSlowMode)
<install>.router.monitor.ignoredAuctions

Static Filters:

sumSeries(<install>.router.accounts.*.filter.intoStaticFilters)
sumSeries(<install>.router.accounts.*.filter.passedStaticFilters)
<install>.router.auctionPassedPreprocessing

Augmentors:

<install>.augmentationLoop.augmentation.request
<install>.augmentationLoop.augmentation.response
<install>.augmentationLoop.augmentation.unknown

Dynamic Filters:

sumSeries(<install>.router.accounts.*.filter.intoDynamicFilters)
sumSeries(<install>.router.accounts.*.filter.passedDynamicFilters)

Response from the agents:

<install>.router.bid
sumSeries(<install>.router.accounts.*.bids)
sumSeries(<install>.router.accounts.*.bidErrors.*)
sumSeries(<install>.agents.bids)

All the keys that are in CAPS represent messages from the router. drawAsInfinite and sumSeries are functions you can call in the Graphite interface by clicking on the "data" button.
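
Once you have totals for each pair of keys above, a small script can point at the stage that loses the most traffic. This is a hedged Python sketch: the stage names and counts in the usage note are made up, and the helper names are hypothetical. It compares each stage's "into" count with its own "passed" count instead of chaining stages together, on the assumption that per-account sums and router-wide counts are not directly comparable.

```python
def pass_rates(stages):
    """Map each stage name to passed / entered, or None if nothing entered.

    stages is a list of (name, entered, passed) tuples.
    """
    rates = {}
    for name, entered, passed in stages:
        rates[name] = passed / entered if entered else None
    return rates


def weakest_stage(stages):
    """Return the name of the stage with the lowest pass rate, or None."""
    rates = {name: rate for name, rate in pass_rates(stages).items() if rate is not None}
    return min(rates, key=rates.get) if rates else None
```

For example, with stages of ("exchange connector", 1000, 990), ("static filters", 2970, 1200) and ("dynamic filters", 1200, 1140), weakest_stage returns "static filters", which is where to look first.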

Additional Router Configuration

If you are receiving a low amount of bid traffic, rather than none at all, you can try turning on real-time polling by setting "realTimePolling": true in router-config.json. With this setting on, RTBkit tries to process bid requests as quickly as possible, at the cost of using more CPU resources.

Bidding Agent Status

If bid requests are passing through all the filters, but you are still not bidding or are bidding at levels much lower than expected, you should verify the status of your Bidding Agents.

If any agents haven't responded to the router for more than two seconds, there will be data for the key router:accounts:<AGENT NAME>:static:agentAppearsDead.

Also, if you know traffic is passing your filters but you see the router:auctionDropped key set, this could mean the router is unable to connect to any agents, that no agents are correctly configured, or that no agents are active. So your next step is to check your Agent Configuration Service.

Under router:accounts you will have child keys for each account. Here you can check the ping0 and ping1 keys which are heartbeat keys showing the agent is available and responding. Under bidResponseTimeMs you can see a variety of keys that show the number of bids the agent is responding to and the response time. Note that you should not rely only on the mean key here because even a significant number of outliers may not change the mean enough to be noticeable. Looking at the percentile keys gives a better picture of the distribution of response times and how many are falling outside of a healthy range.
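
The point about the mean can be made concrete with a few lines of Python. The sample response times below are invented for illustration, and percentile is a simple nearest-rank helper written for this sketch, not the calculation that produces the bidResponseTimeMs keys.

```python
import math


def percentile(samples, pct):
    """Nearest-rank percentile: the smallest sample with at least pct% of values at or below it."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


# Hypothetical response times in milliseconds.
healthy = [10] * 100
with_outliers = [10] * 98 + [60] * 2  # 2% of responses are six times slower

mean_healthy = sum(healthy) / len(healthy)
mean_outliers = sum(with_outliers) / len(with_outliers)
p50_outliers = percentile(with_outliers, 50)
p99_outliers = percentile(with_outliers, 99)
```

Here the mean moves only from 10 ms to 11 ms and the median stays at 10 ms, while the 99th percentile jumps from 10 ms to 60 ms. If the router's bid timeout sat somewhere between those values, the mean alone would hide the fact that some responses are arriving too late.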

The inFlight key will show you how many current open bids the agent is waiting on. Finally, the bids key will have data if the agent is bidding.

Banker Status

The last area to check is your banker. You want to ensure that all your agents are represented by banker accounts, and that those accounts have budget available. You can use the Banker REST API to do this.

Troubleshooting Ad Server Events

Your RTBkit bidder might be handling bid requests perfectly, but still not behaving as expected because it is not processing ad server events correctly. If Wins are not being received, or not being matched correctly, RTBkit goes into slow mode, throttling bidding. This is a safety measure to prevent overspending, because the system can't correctly debit accounts if wins aren't being recorded. Problems recording or matching wins are thus another common scenario leading to unexpected, undesirable bidder behavior.

Confirm Events are Reaching the Ad Server Connector

First, you'll want to confirm that you are receiving event messages. Check the adserver:<Ad Server Name>:rqTimeMs key. This key is only written if the Ad Server Connector is receiving events. Next, you can check for errors processing ad server events under the adserver:<Ad Server Name>:error key, where child keys are written for various exceptions.

Confirm Events are Reaching the Post Auction Loop

If you know you are receiving the expected ad server events, you next want to confirm that RTBkit is able to match them correctly to previous bid requests.

The first step is to confirm that messages are reaching the Post Auction Loop. The postAuction:messages:[AUCTION|WIN|EVENT:CLICK|EVENT:CONVERSION] keys are written when messages reach the PAL.

Confirm Clicks and Conversions are Being Matched to Wins in the Post Auction Loop

Finally, you can confirm that the PAL is correctly matching the Click and Conversion events it receives by checking the postAuction:delivery:[CLICK|CONVERSION]:accounts:<ACCOUNT_NAME> keys.
