From e815ab368e21adef8fa37e7a9ae2ad9797ad3d4e Mon Sep 17 00:00:00 2001
From: Brendan Gregg
Date: Thu, 15 May 2014 15:02:38 -0700
Subject: [PATCH] Summary from USENIX/LISA13 Metrics Workshop

---
 appservers/README.md  | 13 +++++++++++++
 config/README.md      | 24 ++++++++++++++++++++++++
 databases/README.md   | 22 ++++++++++++++++++++++
 distributed/README.md | 14 ++++++++++++++
 messaging/README.md   | 12 ++++++++++++
 network/README.md     | 26 ++++++++++++++++++++++++++
 resources/README.md   |  7 +++++++
 webservers/README.md  | 14 ++++++++++++++
 8 files changed, 132 insertions(+)
 create mode 100644 appservers/README.md
 create mode 100644 config/README.md
 create mode 100644 databases/README.md
 create mode 100644 distributed/README.md
 create mode 100644 messaging/README.md
 create mode 100644 network/README.md
 create mode 100644 resources/README.md
 create mode 100644 webservers/README.md

diff --git a/appservers/README.md b/appservers/README.md
new file mode 100644
index 0000000..31fc191
--- /dev/null
+++ b/appservers/README.md
@@ -0,0 +1,13 @@
+# Application Servers
+
+* Total requests served, rate
+* Latency:
+    * time to serve a client
+    * complete a client transaction
+    * request queue time
+* App error rate
+* Error counts on backend H/W
+* Bandwidth usage front and backend
+* System load on primary application server: CPU, memory, disk, swapping
+* Usage patterns:
+    * which user, client time, session time, active vs idle time
diff --git a/config/README.md b/config/README.md
new file mode 100644
index 0000000..52ef069
--- /dev/null
+++ b/config/README.md
@@ -0,0 +1,24 @@
+# Configuration
+
+* Apps should export flags, to check for consistency
+    * a metadata to show the target configuration
+* Versioning:
+    * ldd, libraries linked against
+    * time a config was applied
+* Platform Type:
+    * server H/W
+* Cost of Configuration
+    * cost of configuration upload/download
+    * time to deployment: security changes (high priority), vs others
+    * CPU and RAM usage during configuration
+* People
+    * deployment report
+* Hardware
+    * current hardware
+    * max expected performance
+* Process
+    * compliance measurement of configuration: percent of systems
+* Failure
+    * failure of configuration deployment
+    * rollbacks, rollforward: config metric didn't apply
+* OS flags
diff --git a/databases/README.md b/databases/README.md
new file mode 100644
index 0000000..e7f025b
--- /dev/null
+++ b/databases/README.md
@@ -0,0 +1,22 @@
+# Databases
+
+* Queries/sec
+* # of connections
+* connections/sec
+* avg time per query
+* cache hit rate
+* avg io latency
+* aggregate io
+* % of query time in io
+* # of locks
+* # of versions (for read consistency)
+* terminated connects
+* SQL statements
+* cache evictions
+* query errors by type
+* saturation: plan to execute
+    * queueing on pool
+* change in number of executed plans
+* latency of last checkpoint, and on-disk representation of write-ahead log
+    * (how much of DB to replay)
+* checkpoint times
diff --git a/distributed/README.md b/distributed/README.md
new file mode 100644
index 0000000..335c05e
--- /dev/null
+++ b/distributed/README.md
@@ -0,0 +1,14 @@
+# Distributed Systems
+
+* Perceived latency: service time and queueing
+* Request rate
+* Error rate
+* Traffic origins
+* Histogram of latencies for each server, for comparisons
+* Visualizations:
+    * heatmaps
+        * for service
+        * per server
+        * per backend
+    * system 'flame graph'
+    * visualize traffic as graph, queue time, request flow
diff --git a/messaging/README.md b/messaging/README.md
new file mode 100644
index 0000000..f91caff
--- /dev/null
+++ b/messaging/README.md
@@ -0,0 +1,12 @@
+# Message Queueing
+
+* Distribution of message latency (ns)
+* Throughput
+* Total number of ns
+* Errors, drop, retransmits, discards
+* Message fanout distribution (gain: ratio of input to output)
+* For distributed message queues: see distributed systems
+* Queue lengths
+* Saturation: run out of space
+* Resource constraints on queueing systems
+* Last time of access
diff --git a/network/README.md b/network/README.md
new file mode 100644
index 0000000..53f4505
--- /dev/null
+++ b/network/README.md
@@ -0,0 +1,26 @@
+# Network Infrastructure
+
+* Physical Infrastructure
+    * bandwidth, utilization of individual links
+    * CoS/QoS rate/drops
+    * L2/L2 protocol health
+    * churn
+    * reachability
+* Per port:
+    * packets/sec
+    * packet size
+    * buffer utilization
+    * perf flow into:
+        * app injection BW
+        * app injection rate
+        * app consumption rate
+        * app consumption BW
+* Component:
+    * links
+    * errors
+    * latency
+    * utilization
+* Topology:
+    * app to app latency
+    * app to app low
+    * symmetry
diff --git a/resources/README.md b/resources/README.md
new file mode 100644
index 0000000..f9aaeb1
--- /dev/null
+++ b/resources/README.md
@@ -0,0 +1,7 @@
+# Resources/Devices
+
+* Utilization
+    * per-device: eg, as a heat map for distribution over time
+* Saturation
+    * average queue length, or time waiting on queue
+* Errors
diff --git a/webservers/README.md b/webservers/README.md
new file mode 100644
index 0000000..5fe1a4a
--- /dev/null
+++ b/webservers/README.md
@@ -0,0 +1,14 @@
+# Web Servers
+
+* Requests: referrer, origin, UA, resp code, count
+    * origin
+    * response code
+* Req size: distribution
+* Response Size: resp code, distribution
+* Response Count: resp code, counter
+* Time To First Byte: resp code, distribution
+* Time To Last Byte: resp code, distribution
+* Active Workers: gauge
+* Worker Age: gauge
+* Connections: counter
+* Process Metrics from host
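The web server list in `webservers/README.md` tags each metric with a type (counter, gauge, or distribution) and a grouping key such as the response code. A minimal sketch of how those three types might be recorded in memory follows. The `WebServerMetrics` class, its method names, the millisecond units, and the nearest-rank `percentile` helper are all illustrative assumptions, not something the workshop summary specifies.

```python
import math
from collections import defaultdict


class WebServerMetrics:
    """Hypothetical in-memory registry; all names here are illustrative."""

    def __init__(self):
        self.response_count = defaultdict(int)  # counter, keyed by response code
        self.ttfb_ms = defaultdict(list)        # distribution, keyed by response code
        self.ttlb_ms = defaultdict(list)        # distribution, keyed by response code
        self.active_workers = 0                 # gauge: last observed value

    def record_response(self, code, ttfb_ms, ttlb_ms):
        """Record one completed response under its response code."""
        self.response_count[code] += 1
        self.ttfb_ms[code].append(ttfb_ms)
        self.ttlb_ms[code].append(ttlb_ms)

    def set_active_workers(self, count):
        """Gauges are overwritten with the latest value, not accumulated."""
        self.active_workers = count


def percentile(samples, p):
    """Nearest-rank percentile (0 < p <= 100) of a non-empty sample list."""
    ordered = sorted(samples)
    rank = math.ceil(p / 100 * len(ordered))
    return ordered[max(rank, 1) - 1]


# Made-up sample data: (response code, time to first byte, time to last byte).
metrics = WebServerMetrics()
for code, ttfb, ttlb in [(200, 12.0, 30.0), (200, 15.0, 42.0),
                         (200, 11.0, 25.0), (500, 40.0, 41.0)]:
    metrics.record_response(code, ttfb, ttlb)
metrics.set_active_workers(8)

print(metrics.response_count[200], percentile(metrics.ttfb_ms[200], 50))
```

Keeping raw samples per response code is the simplest way to get a distribution. A production collector would more likely use fixed histogram buckets to bound memory, which is a design choice outside the scope of this patch.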