Adrian Cole edited this page Jul 27, 2019 · 3 revisions

This is a template you can use to detail your site setup. It is important to provide some summary data even if you have blogs about it. This allows others to quickly learn about your site and compare it with others. Feel free to also include links to your blogs, decks or videos.

How to use this

Copy/paste this template into a new wiki under [Sites].

Change the title (Site Template) to the name of your site, then replace variables with things relevant to your site, and replace questions with answers. Remove sections irrelevant to you.

Introduction and Scope

  • At COMPANY, we have X engineers on Y teams managing Z services. 1 engineer works on distributed tracing.
  • Generally speaking, instrumentation is covered by X and Y because most services are Z.
  • As of MM-YYYY, we are using X as a data pipeline into Y storage, using Z for analysis and search.

System Overview

instrumentation

approach, platforms supported

data ingestion

formats, data pipeline, sampling

data store and aggregation

data at rest, retention, indexing, cleansing

realtime and batch analysis

techniques, visualizations, UI, tooling

Site-specific data conventions

service name

What is the source of your Zipkin service name? Does it come from discovery? Is it used in other tools like metrics?

site-specific tags

Which tags do you rely on for search or aggregation? For example, do you add correlation or environment IDs to spans? Which have fixed cardinality?

Goals

What near-, mid- and long-term milestones exist?

What value is the business looking to receive?

What improvements are you looking to make?

What other projects relate to your tracing goals?

Current Status (MM-YYYY)

The following is just an example; tailor it to your site and feel free to add diagrams.

  • NN+ services use Zipkin in non-production and NN+ in production, mainly in XX (cloud or infra platform).
  • By the end of this year, we aim to increase this to NNN+, mostly covering our critical paths.
  • In production, we have N instances of Zipkin collectors (3 each for the HTTP and Kafka transports).
  • We reuse the same Elasticsearch cluster (10 nodes) that is set up for logging, retaining spans for 7 days. The above architecture results in NNNN spans/day, roughly N-NGB on disk.
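
If you do not have measured numbers yet, a rough back-of-the-envelope estimate can help fill in the span volume figures above. The sketch below assumes you know your average stored span size; the function name and every number in it are hypothetical, not Zipkin or Elasticsearch figures. It ignores index overhead, replicas and compression, so treat the result as a starting point only.

```python
def estimate_disk_gb(spans_per_day: int, avg_span_bytes: int, retention_days: int) -> float:
    """Rough raw span volume held on disk over the retention window, in GB.

    Assumption: every span costs about avg_span_bytes on disk. Real storage
    also pays for indexes and replicas, and may gain from compression.
    """
    if spans_per_day < 0 or avg_span_bytes < 0 or retention_days < 0:
        raise ValueError("inputs must be non-negative")
    total_bytes = spans_per_day * avg_span_bytes * retention_days
    return total_bytes / 1e9  # decimal gigabytes


# Hypothetical site: 10 million spans/day, ~500 bytes per span, 7-day retention.
print(estimate_disk_gb(10_000_000, 500, 7))
```

Prefer measured values from your storage backend when you have them, and say which of the two you are reporting.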

Note: If you are a SaaS, consider that trace count and sampling rate can be used together to reverse-engineer your traffic rate. This might be insider information, depending on your company's rules. Span rate or volume (like MB/s) is less revealing, as there is an arbitrary number of spans per trace. Sharing alternatives such as these can help you publish useful numbers without leaking!
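
To see why trace count and sampling rate are revealing when published together, here is a minimal sketch of the arithmetic a reader could do. It assumes uniform probabilistic sampling where each request starts at most one trace; the function name and the numbers are hypothetical.

```python
def estimate_request_rate(traces_per_second: float, sampling_rate: float) -> float:
    """Estimate the underlying request rate from published tracing numbers.

    Assumption: sampling is uniform, so sampled traces are roughly
    sampling_rate * requests.
    """
    if not 0 < sampling_rate <= 1:
        raise ValueError("sampling_rate must be in (0, 1]")
    return traces_per_second / sampling_rate


# Hypothetical published figures: 50 sampled traces/s at a 1% sampling rate
# imply roughly 5000 requests/s of real traffic.
print(round(estimate_request_rate(50, 0.01)))
```

Span volume does not allow the same inversion, because the number of spans per trace is not fixed.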
