Troubleshooting RTBkit Using Graphite
You may run into situations where fewer bid requests than expected reach your agents, or where your system is not bidding at all. When this happens, you need to troubleshoot to determine the cause.
The overall workflow is to check the following areas of your RTBkit system:
- Bid traffic over the network into RTBkit and between RTBkit components
- Bid Request parsing
- Bid Request filtering
- The status of your Bidding Agents: are they available, reachable and representing active accounts?
- The status of your Banker: is it available, and does it have budget for active accounts?
If your system isn't bidding, the first thing to confirm is that it is receiving bid requests. Check these keys:
- `<Exchange Connector Name>:auctionNewConnection` will show data if the Exchange Connector socket is open and receiving traffic.
- `<Exchange Connector Name>:auctionStart` will show data if RTBkit is able to start processing the requests.
- `<Exchange Connector Name>:auctionResponseSent` is written at the end of bid processing, when a bid response is sent back to the exchange. If this key is being written, your system is at least able to receive bid requests and respond to the caller. If there are other issues, all responses will be HTTP response code 204 (no bid), but you will have ruled out the external network and the inventory source as the cause of your issues.
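If you want to script these checks, one option is to pull each key from Graphite's render API with `format=raw` and count the datapoints that are not `None`. This is a minimal sketch, not part of RTBkit: the `GRAPHITE_URL` default, the function names and the dotted form of the key names (with an `<install>` prefix) are assumptions to adjust for your deployment.

```shell
#!/usr/bin/env bash
# Sketch: check whether Graphite holds recent datapoints for a key.
# Assumption: GRAPHITE_URL points at your graphite-web instance (hypothetical default).
GRAPHITE_URL="${GRAPHITE_URL:-http://localhost:8080}"

# Fetch a target from the Graphite render API in raw format.
# Raw output has one line per series: target,start,end,step|v1,v2,...
graphite_raw() {
  local target="$1" from="${2:--10min}"
  curl -s -G "${GRAPHITE_URL}/render" \
    --data-urlencode "target=${target}" \
    --data-urlencode "from=${from}" \
    --data-urlencode "format=raw"
}

# Count datapoints that are not "None" across all series read from stdin.
count_points() {
  awk -F'|' '{ n = split($2, v, ","); for (i = 1; i <= n; i++) if (v[i] != "None" && v[i] != "") c++ }
             END { print c + 0 }'
}
```

With the placeholders filled in, `graphite_raw '<install>.router.<Exchange Connector Name>.auctionStart' | count_points` prints `0` when the key has no datapoints in the last ten minutes.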
In addition, you should check that the counts for the `auctionReceived` and `auctionResponseSent` keys match.
If they do, and the responses are not all HTTP Response Code 204, your stack is processing each request without throwing exceptions. In that case you should focus on the Parsing, Filtering, Bidding Agent and Banker steps described next.
If the counts don't match, you may have exceptions. You should check the RTBkit logs for errors.
You can also confirm bid traffic at the network level directly with tools like tcpdump or tcpick. Your `bootstrap.json` RTBkit configuration lists the ports used by the various RTBkit services.
Here is the tcpick command to listen to traffic on port 9950:

```shell
sudo tcpick -i eth0 -yP -C -h "port 9950"
```

Here is the equivalent command with tcpdump:

```shell
sudo tcpdump -i eth0 port 9950
```
You should next verify that RTBkit is parsing your bid requests without error. The first key to check is `router:<Exchange Connector Name>:error:parsingBidRequest`, which may show child keys for various types of errors if there are parsing errors.
If there are no error keys, compare the value of `router:<Exchange Connector Name>:auctionReceived`, which shows the number of bid requests RTBkit is trying to parse, with `router:<Exchange Connector Name>:auctionStart`, which shows the number of bid requests that have passed parsing. If they don't match, the difference is the number of bid requests that either failed parsing or were dropped by load shedding. (Load shedding occurs when RTBkit can't keep up with inbound load and so immediately answers some requests with HTTP response code 204, indicating no bid. This keeps it from timing out on calls from the exchanges while still letting it bid on the share of incoming requests it can keep up with.)
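To put a number on that difference, the sketch below turns the two window totals into a lost count and a percentage. It assumes both totals cover the same time window (for example, each summed from the raw render output); the function name is hypothetical, and it cannot tell parse failures apart from load shedding, so use the `parsingBidRequest` error keys for that.

```shell
#!/usr/bin/env bash
# Sketch: estimate how many bid requests were lost between auctionReceived
# and auctionStart (parse failures or load shedding) over one time window.
# $1: auctionReceived total, $2: auctionStart total (same window).
parse_loss() {
  local received="$1" started="$2"
  awk -v r="$received" -v s="$started" 'BEGIN {
    lost = r - s
    pct = (r > 0) ? 100 * lost / r : 0   # avoid dividing by zero on idle windows
    printf "lost=%d (%.1f%%)\n", lost, pct
  }'
}
```

For example, `parse_loss 1000 950` prints `lost=50 (5.0%)`.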
The next area to investigate is the bid request filtering pipeline. By looking at the Graphite keys that instrument this pipeline, you can determine where bid requests are being dropped. The usual suspects are:
- at the entrance of the exchange connectors
- requests being dropped because the system is in slow mode
- static filters
- augmentors
- dynamic filters
- bid responses not being correctly received from the agents.
Each of these steps can be checked with the following Graphite keys:
- Exchange connectors:
  - `<install>.router.<Exchange Connector Name>.auctionReceived`
  - `<install>.router.<Exchange Connector Name>.auctionStart`
- Slow mode:
  - `drawAsInfinite(<install>.router.monitor.systemInSlowMode)`
  - `<install>.router.monitor.ignoredAuctions`
- Static filters:
  - `sumSeries(<install>.router.accounts.*.filter.intoStaticFilters)`
  - `sumSeries(<install>.router.accounts.*.filter.passedStaticFilters)`
  - `<install>.router.auctionPassedPreprocessing`
- Augmentors:
  - `<install>.augmentationLoop.augmentation.request`
  - `<install>.augmentationLoop.augmentation.response`
  - `<install>.augmentationLoop.augmentation.unknown`
- Dynamic filters:
  - `sumSeries(<install>.router.accounts.*.filter.intoDynamicFilters)`
  - `sumSeries(<install>.router.accounts.*.filter.passedDynamicFilters)`
- Responses from the agents:
  - `<install>.router.bid`
  - `sumSeries(<install>.router.accounts.*.bids)`
  - `sumSeries(<install>.router.accounts.*.bidErrors.*)`
  - `sumSeries(<install>.agents.bids)`
All the keys that are in CAPS represent messages from the router. `drawAsInfinite` and `sumSeries` are Graphite functions, which you can apply in the Graphite interface by clicking on the "data" button.
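To see which stage loses the most traffic, you can feed window totals for these keys, in pipeline order, into a small funnel report. This is a sketch under assumptions: the stage names and totals are whatever you extract from Graphite, and because the per-account `sumSeries` keys may count one request once per account, only compare stages measured at the same granularity (for example, `intoStaticFilters` against `passedStaticFilters`).

```shell
#!/usr/bin/env bash
# Sketch: report the drop between consecutive pipeline stages.
# Input on stdin: one "stage total" pair per line, in pipeline order,
# where each total is the window sum of the matching Graphite key.
funnel() {
  awk 'NF < 2 { next }                       # skip blank or malformed lines
       NR > 1 && prevName != "" {
         drop = prev - $2
         pct = (prev > 0) ? 100 * drop / prev : 0
         printf "%s -> %s: dropped %d (%.1f%%)\n", prevName, $1, drop, pct
       }
       { prev = $2; prevName = $1 }'
}
```

For example, `printf 'intoStaticFilters 5000\npassedStaticFilters 1200\n' | funnel` prints `intoStaticFilters -> passedStaticFilters: dropped 3800 (76.0%)`.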
If you are receiving a low volume of bid traffic, rather than none at all, you can try turning on real-time polling by setting `"realTimePolling": true` in `router-config.json`. With this setting on, RTBkit tries to process bid requests as quickly as possible, at the cost of using more CPU resources.
If bid requests are passing through all the filters, but you are still not bidding or bidding at levels much less than expected, you should verify the status of your Bidding Agents.
If any agent hasn't responded to the router for more than two seconds, there will be data for the key `router:accounts:<AGENT NAME>:static:agentAppearsDead`.
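Because the account part of that key varies, a wildcard query returns one series per agent account. The sketch below reads Graphite `format=raw` output for such a query and prints every series that contains a datapoint. The dotted layout `<install>.router.accounts.*.static.agentAppearsDead` and the function name are assumptions based on the keys listed above.

```shell
#!/usr/bin/env bash
# Sketch: list agent series that reported agentAppearsDead in the window.
# Input on stdin: Graphite "format=raw" output for a wildcard target such as
# <install>.router.accounts.*.static.agentAppearsDead (key layout assumed).
dead_agents() {
  awk -F'|' '{
    split($1, head, ",")              # head[1] is the series name
    n = split($2, v, ",")
    for (i = 1; i <= n; i++)
      if (v[i] != "None" && v[i] != "") { print head[1]; break }
  }'
}
```

Each printed line names one account whose agent the router considers unresponsive in the queried window.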
Also, if you know traffic is passing your filters but you see data for the `router:auctionDropped` key, the router may be unable to connect to any agents, no agents may be correctly configured, or no agents may be active. In that case, your next step is to check your Agent Configuration Service.
Under `router:accounts` you will have child keys for each account. Here you can check the `ping0` and `ping1` keys, which are heartbeat keys showing that the agent is available and responding. Under `bidResponseTimeMs` you can see a variety of keys that show the number of bids the agent is responding to and its response time. Don't rely only on the mean key here: even a significant number of outliers may not move the mean enough to be noticeable. The percentile keys give a better picture of the distribution of response times and of how many fall outside a healthy range.
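A small worked example shows the effect. This sketch reads response times in milliseconds, one per line, and prints the mean next to nearest-rank percentiles; the function name and the sample numbers in the usage note are made up for illustration.

```shell
#!/usr/bin/env bash
# Sketch: compare the mean with nearest-rank percentiles of response times.
# Input on stdin: one response time (ms) per line.
latency_summary() {
  sort -n | awk '{ t[NR] = $1; sum += $1 }
    END {
      if (NR == 0) exit 1
      printf "mean=%.1f", sum / NR
      split("50 90 99", ps, " ")
      for (j = 1; j <= 3; j++) {
        rank = int((ps[j] * NR + 99) / 100)   # ceil(p/100 * N): nearest-rank method
        printf " p%s=%s", ps[j], t[rank]
      }
      printf "\n"
    }'
}
```

With 97 responses at 30 ms and 3 at 200 ms, `latency_summary` prints `mean=35.1 p50=30 p90=30 p99=200`: the mean barely moves, while p99 exposes the slow tail.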
The `inFlight` key shows how many open bids the agent is currently waiting on. Finally, the `bids` key will have data if the agent is bidding.
The last area to check is your Banker. Make sure that every agent is represented by a Banker account and that those accounts have a non-zero budget. You can use the Banker REST API to do this.
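Once you have the data, the cross-check itself is simple. This sketch assumes you have already reduced your agent configurations and the Banker REST API output to two plain-text files; the file formats and the function name are hypothetical, not part of RTBkit.

```shell
#!/usr/bin/env bash
# Sketch: cross-check agent accounts against Banker budgets.
# Hypothetical inputs (adapt to what your Banker REST API returns):
#   $1: file with one agent account name per line
#   $2: file with "account budget" pairs, one per line
check_budgets() {
  awk 'FILENAME == ARGV[1] { budget[$1] = $2; next }   # first file read: budgets
       !($1 in budget)     { print "MISSING " $1; next }
       budget[$1] + 0 <= 0 { print "NO_BUDGET " $1 }' "$2" "$1"
}
```

It prints `MISSING <account>` for agent accounts the Banker doesn't know about and `NO_BUDGET <account>` for accounts whose budget is zero or less.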
Your RTBkit bidder might be handling bid requests perfectly, but still not behaving as expected because it is not processing ad server events correctly. If Wins are not being received, or not being matched correctly, RTBkit goes into slow mode, throttling bidding. This is a safety measure to prevent overspending, because the system can't correctly debit accounts if wins aren't being recorded. Problems recording or matching wins are thus another common scenario leading to unexpected, undesirable bidder behavior.
First, confirm that you are receiving event messages by checking the `adserver:<Ad Server Name>:rqTimeMs` key, which is only written if the Ad Server Connector is receiving events. Next, check for errors processing ad server events under the `adserver:<Ad Server Name>:error` key, where child keys are written for the various exceptions.
If you know you are receiving the expected ad server events, you next want to confirm that RTBkit is able to match them correctly to previous bid requests.
The first step is to confirm that messages are reaching the Post Auction Loop (PAL). The `postAuction:messages:[AUCTION|WIN|EVENT:CLICK|EVENT:CONVERSION]` keys are written when messages reach the PAL.
Finally, you can confirm that the PAL is correctly matching the click and conversion events it receives by checking the `postAuction:delivery:[CLICK|CONVERSION]:accounts:<ACCOUNT_NAME>` keys.
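To turn this into a quick check, you can compare the number of click messages reaching the PAL with the sum of the per-account click deliveries over the same window. The function name and the 95% default threshold are illustrative assumptions, not RTBkit settings.

```shell
#!/usr/bin/env bash
# Sketch: flag when the PAL matches too few of the click events it receives.
# $1: window total of the postAuction messages key for EVENT:CLICK
# $2: window sum of the per-account delivery keys for CLICK
# $3: minimum acceptable match rate in percent (hypothetical default: 95)
check_match_rate() {
  awk -v got="$1" -v matched="$2" -v min="${3:-95}" 'BEGIN {
    rate = (got > 0) ? 100 * matched / got : 100   # nothing received: nothing to match
    printf "match_rate=%.1f%%\n", rate
    if (rate < min) exit 1                          # non-zero status signals a problem
  }'
}
```

For example, `check_match_rate 200 150` prints `match_rate=75.0%` and returns exit status 1, which makes it easy to use from a monitoring script.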