Compare query profiles of 2 different servers by querying performance_schema.events_statements_summary_by_digest
The intention here is to connect to two servers and collect information from the digest table in parallel. This will do the following:
- generate n collections of query digests for each server.
- From this data we can collect n-1 samples which are based on collection x compared against collection x-1.
- Finally we find the top Z queries for server1
- With each query on server1 attempt to compare each of these queries with server2, comparing metrics using the available samples.
Note: this is still work in progress and not completed.
queryprofiler [<options>] <dsn1> [<dsn2> ...]
DSN1='user:password@tcp(server1.example.com:3306)/performance_schema' DSN2='user:password@tcp(server2.example.com:3306)/performance_schema' ./queryprofiler "$DSN1" "$DSN2"
In theory using P_S to profile the queries may seem quite simple, but I think that to get useful values it requires a little more attention. The sections below describe how queryprofiler analyses the queries on the server.
Event is the table represetation of the P_S digest table.
Collection is a slice of Events together with a timestamp of when data was collected. Collections is a slice of Collection.
Sample is a slice of rows that come from subtracting matching values by Key and recording the start time and duration of the sample. It contains several rows for different queries. Samples is a slice of Sample. Sample metrics are normalised to metrics per second for consistency.
This is a slice of float64, which is the underlying numbers used by this program. Thus a sample really contains a named set of Mmetric.
In theory the QUERY_DIGEST might be used but this digest is not stable between different MySQL versions so I collect an MD5 digest of the DIGEST_TEXT. That said the DIGEST is not a unique key, what's unique is a combination of query (digest) and SCHEMA_NAME, so the Key considered as the key of queries is based on the MD5_DIGEST and the SCHEMA_NAME, joined by a ".". if SCHEMA_NAME contains a value.
events_statements_summary_by_digest may have empty DIGEST/DIGEST_TEXT. This represents lost values because the maximum number of digest values has been exceeded. You may see this empty query having quite high values because of this.
Only completed queries are shown. Any long query that is running while queryprofiler is looking for data won't be shown.
events_statements_summary_by_digest should have only one row per DIGEST_TEXT / SCHEMA_NAME. Unfortunately I've seen that this is not the case and multiple row may be present. This has been reported. See http://bugs.mysql.com/bug.php?id=79533. In the meantime if multiple rows are found with the same DIGEST_TEXT/SCHEMA_NAME the values are merged together.
events_statements_summary_by_digest has a DIGEST column which represents a unique key (with the SCHEMA_NAME) to identify queries. However, this digest may not be the same for the same query on 2 different servers due to the way the optimiser works. Consequently queryprofiler takes an MD5 checksum of the QUERY_TEXT and uses that instead. I should really file a feature requesting that the generated query digest is calculated consistently as that would avoid this extra operation.
For sample output look at the file of the same name: sample-output.txt