diff --git a/generate_thrift_v1_docs/generate.sh b/generate_thrift_v1_docs/generate.sh
new file mode 100755
index 0000000..b34e043
--- /dev/null
+++ b/generate_thrift_v1_docs/generate.sh
@@ -0,0 +1,44 @@
+#!/bin/bash
+
+set -euo pipefail
+set -x
+
+# Where's Waldo?
+me="$(readlink -f "${BASH_SOURCE[0]}" 2>/dev/null || true)"
+[ -n "$me" ] || me="${BASH_SOURCE[0]}"
+mydir="$(cd "$(dirname "$me")" && pwd -P)"
+rootdir="$(cd "$(dirname "$mydir")" && pwd -P)"
+target_root="${rootdir}/public"
+target_dir="${target_root}/thrift/v1"
+
+# Prepare clean output space
+rm -rfv "$target_dir"
+mkdir -p "$target_dir"
+
+# Prepare clean workspace
+cd "$(mktemp -d)"
+git clone https://github.com/openzipkin/zipkin-api.git
+cd zipkin-api/thrift
+
+# Generate HTML docs with Thrift
+rm -fv wrapper.thrift
+for source in *.thrift; do
+ echo "include \"$source\"" >> wrapper.thrift
+done
+thrift -r --gen html -I . -out "$target_dir" wrapper.thrift
+
+# Turn Thrift-output index.html into valid XML
+# HTML Tidy exits with 1 on warnings, and we _will_ have warnings
+set +e
+tidy -indent -asxml -output "$target_dir/index.tidy.html" "$target_dir/index.html"
+tidy_status=$?
+[ $tidy_status -gt 1 ] && exit $tidy_status
+set -e
+
+# Apply some transforms to the generated HTML
+java -jar /usr/share/java/Saxon-HE.jar \
+ -s:"$target_dir/index.tidy.html" \
+ -xsl:"$mydir/transform.xslt" \
+ -o:"$target_dir/index.baked.html"
+mv -v "$target_dir/index.baked.html" "$target_dir/index.html"
+rm -v "$target_dir/index.tidy.html"
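The `set +e` / `set -e` toggle around the tidy call could also be written as a small wrapper, which keeps errexit active for the rest of the script. The following is only a sketch: `run_tolerating` is a hypothetical helper that does not exist in generate.sh, and `sh -c 'exit 1'` stands in for tidy, which reports warnings with exit status 1.

```shell
#!/bin/bash
set -euo pipefail

# Hypothetical helper (not in generate.sh): run a command and treat any exit
# status up to max_ok as success, without disabling errexit globally.
run_tolerating() {
  local max_ok=$1
  shift
  local status=0
  # "cmd || status=$?" keeps errexit from firing on a non-zero status.
  "$@" || status=$?
  if [ "$status" -gt "$max_ok" ]; then
    return "$status"
  fi
  return 0
}

# Stand-in for tidy: a command that exits with 1, as tidy does on warnings.
run_tolerating 1 sh -c 'exit 1'
echo "status 1 tolerated"
```

In the script this might read `run_tolerating 1 tidy -indent -asxml -output "$target_dir/index.tidy.html" "$target_dir/index.html"`, so any status above 1 still aborts the run.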
diff --git a/generate_thrift_v1_docs/transform.xslt b/generate_thrift_v1_docs/transform.xslt
new file mode 100644
index 0000000..e8c1c0a
--- /dev/null
+++ b/generate_thrift_v1_docs/transform.xslt
@@ -0,0 +1,23 @@
+Zipkin V1 Thrift models
+
Module | Services | Data types | Constants |
---|---|---|---|
zipkinCore | | Annotation, AnnotationType, BinaryAnnotation, Endpoint, Span | CLIENT_ADDR, CLIENT_RECV, CLIENT_RECV_FRAGMENT, CLIENT_SEND, CLIENT_SEND_FRAGMENT, ERROR, HTTP_HOST, HTTP_METHOD, HTTP_PATH, HTTP_REQUEST_SIZE, HTTP_RESPONSE_SIZE, HTTP_STATUS_CODE, HTTP_URL, LOCAL_COMPONENT, MESSAGE_ADDR, MESSAGE_RECV, MESSAGE_SEND, SERVER_ADDR, SERVER_RECV, SERVER_RECV_FRAGMENT, SERVER_SEND, SERVER_SEND_FRAGMENT, WIRE_RECV, WIRE_SEND |
zipkinDependencies | | Dependencies, DependencyLink | |

Module | Services | Data types | Constants |
---|---|---|---|
wrapper | | | |

Module | Services | Data types | Constants |
---|---|---|---|
zipkinCore | | Annotation, AnnotationType, BinaryAnnotation, Endpoint, Span | CLIENT_ADDR, CLIENT_RECV, CLIENT_RECV_FRAGMENT, CLIENT_SEND, CLIENT_SEND_FRAGMENT, ERROR, HTTP_HOST, HTTP_METHOD, HTTP_PATH, HTTP_REQUEST_SIZE, HTTP_RESPONSE_SIZE, HTTP_STATUS_CODE, HTTP_URL, LOCAL_COMPONENT, MESSAGE_ADDR, MESSAGE_RECV, MESSAGE_SEND, SERVER_ADDR, SERVER_RECV, SERVER_RECV_FRAGMENT, SERVER_SEND, SERVER_SEND_FRAGMENT, WIRE_RECV, WIRE_SEND |

Constant | Type | Value | Description |
---|---|---|---|
CLIENT_SEND | string | "cs" | The client sent ("cs") a request to a server. There is only one send per span. For example, if there's a transport error, each attempt can be logged as a WIRE_SEND annotation. If chunking is involved, each chunk could be logged as a separate CLIENT_SEND_FRAGMENT in the same span. Annotation.host is not the server. It is the host which logged the send event, almost always the client. When logging CLIENT_SEND, instrumentation should also log the SERVER_ADDR. |
CLIENT_RECV | string | "cr" | The client received ("cr") a response from a server. There is only one receive per span. For example, if duplicate responses were received, each can be logged as a WIRE_RECV annotation. If chunking is involved, each chunk could be logged as a separate CLIENT_RECV_FRAGMENT in the same span. Annotation.host is not the server. It is the host which logged the receive event, almost always the client. The actual endpoint of the server is recorded separately as SERVER_ADDR when CLIENT_SEND is logged. |
SERVER_SEND | string | "ss" | The server sent ("ss") a response to a client. There is only one response per span. If there's a transport error, each attempt can be logged as a WIRE_SEND annotation. Typically, a trace ends with a server send, so the last timestamp of a trace is often the timestamp of the root span's server send. If chunking is involved, each chunk could be logged as a separate SERVER_SEND_FRAGMENT in the same span. Annotation.host is not the client. It is the host which logged the send event, almost always the server. The actual endpoint of the client is recorded separately as CLIENT_ADDR when SERVER_RECV is logged. |
SERVER_RECV | string | "sr" | The server received ("sr") a request from a client. There is only one request per span. For example, if duplicate requests were received, each can be logged as a WIRE_RECV annotation. Typically, a trace starts with a server receive, so the first timestamp of a trace is often the timestamp of the root span's server receive. If chunking is involved, each chunk could be logged as a separate SERVER_RECV_FRAGMENT in the same span. Annotation.host is not the client. It is the host which logged the receive event, almost always the server. When logging SERVER_RECV, instrumentation should also log the CLIENT_ADDR. |
MESSAGE_SEND | string | "ms" | Message send ("ms") is a request to send a message to a destination, usually a broker. This may be the only annotation in a messaging span. If WIRE_SEND exists in the same span, it follows this moment and clarifies delays sending the message, such as batching. Unlike RPC annotations like CLIENT_SEND, messaging spans never share a span ID. For example, "ms" should always be the parent of "mr". Annotation.host is not the destination; it is the host which logged the send event: the producer. When annotating MESSAGE_SEND, instrumentation should also tag the MESSAGE_ADDR. |
MESSAGE_RECV | string | "mr" | A consumer received ("mr") a message from a broker. This may be the only annotation in a messaging span. If WIRE_RECV exists in the same span, it precedes this moment and clarifies any local queuing delay. Unlike RPC annotations like SERVER_RECV, messaging spans never share a span ID. For example, "mr" should always be a child of "ms" unless it is a root span. Annotation.host is not the broker; it is the host which logged the receive event: the consumer. When annotating MESSAGE_RECV, instrumentation should also tag the MESSAGE_ADDR. |
WIRE_SEND | string | "ws" | Optionally logs an attempt to send a message on the wire. Multiple wire send events could indicate network retries. A lag between client or server send and wire send might indicate queuing or processing delay. |
WIRE_RECV | string | "wr" | Optionally logs an attempt to receive a message from the wire. Multiple wire receive events could indicate network retries. A lag between wire receive and client or server receive might indicate queuing or processing delay. |
CLIENT_SEND_FRAGMENT | string | "csf" | Optionally logs progress of a (CLIENT_SEND, WIRE_SEND). For example, this could be one chunk in a chunked request. |
CLIENT_RECV_FRAGMENT | string | "crf" | Optionally logs progress of a (CLIENT_RECV, WIRE_RECV). For example, this could be one chunk in a chunked response. |
SERVER_SEND_FRAGMENT | string | "ssf" | Optionally logs progress of a (SERVER_SEND, WIRE_SEND). For example, this could be one chunk in a chunked response. |
SERVER_RECV_FRAGMENT | string | "srf" | Optionally logs progress of a (SERVER_RECV, WIRE_RECV). For example, this could be one chunk in a chunked request. |
HTTP_HOST | string | "http.host" | The domain portion of the URL or host header. Ex. "mybucket.s3.amazonaws.com". Used to filter by host as opposed to ip address. |
HTTP_METHOD | string | "http.method" | The HTTP method, or verb, such as "GET" or "POST". Used to filter against an http route. |
HTTP_PATH | string | "http.path" | The absolute http path, without any query parameters. Ex. "/objects/abcd-ff". Used to filter against an http route, portably with zipkin v1. In zipkin v1, only equals filters are supported. Dropping query parameters reduces the number of distinct URIs. For example, one can query for the same resource, regardless of signing parameters encoded in the query line. This does not reduce cardinality to a single HTTP route. For example, it is common to express a route as an http URI template like "/resource/{resource_id}". In systems where only equals queries are available, searching for http.path=/resource won't match if the actual request was /resource/abcd-ff. Historical note: This was commonly expressed as "http.uri" in zipkin, even though it was most often just a path. |
HTTP_URL | string | "http.url" | The entire URL, including the scheme, host and query parameters if available. Ex. "https://mybucket.s3.amazonaws.com/objects/abcd-ff?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Algorithm=AWS4-HMAC-SHA256...". Combined with HTTP_METHOD, you can understand the fully-qualified request line. This is optional as it may include private data or be of considerable length. |
HTTP_STATUS_CODE | string | "http.status_code" | The HTTP status code, when not in 2xx range. Ex. "503". Used to filter for error status. |
HTTP_REQUEST_SIZE | string | "http.request.size" | The size of the non-empty HTTP request body, in bytes. Ex. "16384". Large uploads can exceed limits or contribute directly to latency. |
HTTP_RESPONSE_SIZE | string | "http.response.size" | The size of the non-empty HTTP response body, in bytes. Ex. "16384". Large downloads can exceed limits or contribute directly to latency. |
LOCAL_COMPONENT | string | "lc" | The value of "lc" is the component or namespace of a local span. BinaryAnnotation.host adds service context needed to support queries. Local Component ("lc") supports three key features: flagging, query by service and filtering Span.name by namespace. While structurally the same, local spans are fundamentally different than RPC spans in how they should be interpreted. For example, zipkin v1 tools center on RPC latency and service graphs. Root local-spans are neither indicative of critical path RPC latency, nor have impact on the shape of a service graph. By flagging with "lc", tools can special-case local spans. Zipkin v1 Spans are unqueryable unless they can be indexed by service name. The only path to a service name is by (Binary)?Annotation.host.serviceName. By logging "lc", a local span can be queried even if no other annotations are logged. The value of "lc" is the namespace of Span.name. For example, it might be "finatra2", for a span named "bootstrap". "lc" allows you to resolve conflicts for the same Span.name, for example "finatra/bootstrap" vs "finch/bootstrap". Using local component, you'd search for spans named "bootstrap" where "lc=finch". |
ERROR | string | "error" | When an annotation value, this indicates when an error occurred. When a binary annotation key, the value is a human readable message associated with an error. Due to transient errors, an ERROR annotation should not be interpreted as a span failure, even though the annotation might explain additional latency. Instrumentation should add the ERROR binary annotation when the operation failed and couldn't be recovered. Here's an example: A span has an ERROR annotation, added when a WIRE_SEND failed. Another WIRE_SEND succeeded, so there's no ERROR binary annotation on the span because the overall operation succeeded. Note that RPC spans often include both client and server hosts: It is possible that only one side perceived the error. |
CLIENT_ADDR | string | "ca" | Indicates a client address ("ca") in a span. Most likely, there's only one. Multiple addresses are possible when a client changes its ip or port within a span. |
SERVER_ADDR | string | "sa" | Indicates a server address ("sa") in a span. Most likely, there's only one. Multiple addresses are possible when a client is redirected, or fails to a different server ip or port. |
MESSAGE_ADDR | string | "ma" | Indicates the remote address of a messaging span, usually the broker. |

BOOL | 0 | Set to 0x01 when key is CLIENT_ADDR or SERVER_ADDR. |
BYTES | 1 | No encoding, or type is unknown. |
I16 | 2 | |
I32 | 3 | |
I64 | 4 | |
DOUBLE | 5 | |
STRING | 6 | The only type zipkin v1 supports search against. |

Key | Field | Type | Description | Requiredness | Default value |
---|---|---|---|---|---|
1 | ipv4 | i32 | IPv4 host address packed into 4 bytes. For example, the ip 1.2.3.4 would be (1 << 24) \| (2 << 16) \| (3 << 8) \| 4 | default | |
2 | port | i16 | IPv4 port or 0, if unknown. Note: this is to be treated as an unsigned integer, so watch for negatives. | default | |
3 | service_name | string | Classifier of a source or destination in lowercase, such as "zipkin-web". This is the primary parameter for trace lookup, so it should be as intuitive as possible, for example, matching names in service discovery. Conventionally, when the service name isn't known, service_name = "unknown". However, it is also permissible to set service_name = "" (empty string). The difference in the latter usage is that the span will not be queryable by service name unless more information is added to the span with a non-empty service name, e.g. an additional annotation from the server. Clients in particular may not have a reliable service name at ingest. One approach is to set service_name to "" at ingest, and later assign a better label based on binary annotations, such as user agent. | default | |
4 | ipv6 | binary | IPv6 host address packed into 16 bytes. Ex Inet6Address.getBytes() | optional | |
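
The ipv4 packing and the unsigned reading of port described in the table above can be sketched with shell arithmetic. The function names `pack_ipv4` and `unsigned_port` are hypothetical and not part of any Zipkin tooling. Note that bash evaluates in 64 bits, so addresses from 128.0.0.0 upward print as values above the signed i32 maximum; a Thrift serializer would store them as negative i32 values.

```shell
#!/bin/bash
# Hypothetical helpers, not part of the Zipkin API.

# Pack a dotted-quad IPv4 address into the integer form described for
# Endpoint.ipv4. "10#" forces base 10 so octets such as "010" are not
# read as octal.
pack_ipv4() {
  local a b c d
  IFS=. read -r a b c d <<< "$1"
  echo $(( (10#$a << 24) | (10#$b << 16) | (10#$c << 8) | 10#$d ))
}

# Reinterpret a signed i16 port as the unsigned value it represents.
unsigned_port() {
  echo $(( $1 & 0xFFFF ))
}

pack_ipv4 1.2.3.4      # prints 16909060
unsigned_port -15536   # prints 50000
```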
Key | Field | Type | Description | Requiredness | Default value |
---|---|---|---|---|---|
1 | timestamp | i64 | Microseconds from epoch. This value should use the most precise value possible. For example, gettimeofday or multiplying currentTimeMillis by 1000. | default | |
2 | value | string | Usually a short tag indicating an event, like "sr" or "finagle.retry". | default | |
3 | host | Endpoint | The host that recorded the value, primarily for query by service name. | optional |
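
Annotation.timestamp is in epoch microseconds, and the description above mentions multiplying a millisecond clock by 1000. A minimal sketch of that conversion follows; both function names are hypothetical, and `now_micros` assumes GNU date, whose `%N` format prints nanoseconds.

```shell
#!/bin/bash
# Hypothetical helpers for Annotation.timestamp (epoch microseconds).

# Convert epoch milliseconds (for example, a currentTimeMillis reading)
# to epoch microseconds.
millis_to_micros() {
  echo $(( $1 * 1000 ))
}

# Current epoch microseconds. Assumes GNU date; other date implementations
# may not support %N. Defined here for illustration, not invoked below.
now_micros() {
  echo $(( $(date +%s%N) / 1000 ))
}

millis_to_micros 1472470996199   # prints 1472470996199000
```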
Key | Field | Type | Description | Requiredness | Default value |
---|---|---|---|---|---|
1 | key | string | Name used to lookup spans, such as "http.path" or "finagle.version". | default | |
2 | value | binary | Serialized thrift bytes, in TBinaryProtocol format. For legacy reasons, byte order is big-endian. See THRIFT-3217. | default | |
3 | annotation_type | AnnotationType | The thrift type of value, most often STRING. annotation_type shouldn't vary for the same key. | default | |
4 | host | Endpoint | The host that recorded value, allowing query by service name or address. There are two exceptions: when key is "ca" or "sa", this is the source or destination of an RPC. This exception allows zipkin to display network context of uninstrumented services, such as browsers or databases. | optional |
Key | Field | Type | Description | Requiredness | Default value |
---|---|---|---|---|---|
1 | trace_id | i64 | Unique 8-byte identifier for a trace, set on all spans within it. | default | |
3 | name | string | Span name in lowercase, rpc method for example. Conventionally, when the span name isn't known, name = "unknown". | default | |
4 | id | i64 | Unique 8-byte identifier of this span within a trace. A span is uniquely identified in storage by (trace_id, id). | default | |
5 | parent_id | i64 | The parent's Span.id; absent if this is the root span in a trace. | optional | |
6 | annotations | list<Annotation> | Associates events that explain latency with a timestamp. Unlike log statements, annotations are often codes: for example SERVER_RECV("sr"). Annotations are sorted ascending by timestamp. | default | |
8 | binary_annotations | list<BinaryAnnotation> | Tags a span with context, usually to support query or aggregation. For example, a binary annotation key could be "http.path". | default | |
9 | debug | bool | True is a request to store this span even if it overrides sampling policy. | optional | 0 |
10 | timestamp | i64 | Epoch microseconds of the start of this span, absent if this is an incomplete span. This value should be set directly by instrumentation, using the most precise value possible. For example, gettimeofday or syncing nanoTime against a tick of currentTimeMillis. For compatibility with instrumentation that precedes this field, collectors or span stores can derive this via Annotation.timestamp. For example, SERVER_RECV.timestamp or CLIENT_SEND.timestamp. Timestamp is nullable for input only. Spans without a timestamp cannot be presented in a timeline: span stores should not output spans missing a timestamp. There are two known edge cases where this could be absent; both arise when a collector receives a span in parts and a binary annotation precedes a timestamp: the span is in flight (for example, it has not yet received a timestamp), or the span's start event was lost. | optional | |
11 | duration | i64 | Measurement in microseconds of the critical path, if known. Durations of less than one microsecond must be rounded up to 1 microsecond. This value should be set directly, as opposed to implicitly via annotation timestamps. Doing so encourages precision decoupled from problems of clocks, such as skew or NTP updates causing time to move backwards. For compatibility with instrumentation that precedes this field, collectors or span stores can derive this by subtracting Annotation.timestamp. For example, SERVER_SEND.timestamp - SERVER_RECV.timestamp. If this field is persisted as unset, zipkin will continue to work, except duration query support will be implementation-specific. Similarly, setting this field non-atomically is implementation-specific. This field is i64 vs i32 to support spans longer than 35 minutes. | optional | |
12 | trace_id_high | i64 | Optional unique 8-byte additional identifier for a trace. If non zero, this means the trace uses 128 bit traceIds instead of 64 bit. | optional | |
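
The duration rules in the table above (derive by subtracting annotation timestamps, round up to 1 microsecond, and the 35 minute limit of an i32) can be worked through with shell arithmetic. This is only a sketch: `derive_duration` is a hypothetical helper, the timestamps are made-up sample values, and rejecting a negative difference is an assumption of this example rather than a documented Zipkin behaviour.

```shell
#!/bin/bash
# Hypothetical helper: derive Span.duration from two annotation timestamps
# (both in epoch microseconds), e.g. SERVER_RECV and SERVER_SEND.
derive_duration() {
  local start=$1 end=$2
  local d=$(( end - start ))
  # Assumption of this sketch: a negative difference (clock moved backwards)
  # is reported as an error rather than guessed at.
  if (( d < 0 )); then
    return 1
  fi
  # Durations under one microsecond are rounded up to 1.
  if (( d == 0 )); then
    d=1
  fi
  echo "$d"
}

derive_duration 1472470996199000 1472470996406000   # prints 207000
derive_duration 1472470996199000 1472470996199000   # prints 1

# Largest i32 value, read as microseconds and converted to whole minutes.
echo $(( 2147483647 / 1000000 / 60 ))               # prints 35
```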

Module | Services | Data types | Constants |
---|---|---|---|
zipkinDependencies | | Dependencies, DependencyLink | |

Key | Field | Type | Description | Requiredness | Default value |
---|---|---|---|---|---|
1 | parent | string | parent service name (caller) | default | |
2 | child | string | child service name (callee) | default | |
4 | callCount | i64 | total traced calls made from parent to child | default | |
5 | errorCount | i64 | how many calls are known to be errors | default |
Key | Field | Type | Description | Requiredness | Default value |
---|---|---|---|---|---|
1 | start_ts | i64 | milliseconds from epoch | default | |
2 | end_ts | i64 | milliseconds from epoch | default | |
3 | links | list<DependencyLink> | | default | |