Description
The trigger for this issue report is GoogleCloudPlatform/pgadapter#2760
The built-in metrics exporter checks that all metrics include an `instance_id`. If at least one metric does not include an `instance_id`, then the entire export is skipped. This also means that the metric remains in the collection of unexported metrics, so all built-in metric exports stop from that point until the client is restarted.
A client can easily collect a metric without an `instance_id`, because the `instance_id` is set in the header interceptor that is called when the headers are being sent by the client. That never happens if the client cannot establish a network connection to Spanner.
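The effect of the all-or-nothing check can be illustrated with a small, self-contained model. This is only a sketch under stated assumptions, not the client library's actual exporter: `ExportSketch`, `MetricPoint`, `exportAllOrNothing`, `exportValidOnly`, and the metric name `example_metric` are hypothetical names invented for this illustration. It contrasts skipping the whole batch with dropping only the points that lack an `instance_id`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public class ExportSketch {
    /** Hypothetical stand-in for a collected metric point: a name plus attributes. */
    static final class MetricPoint {
        final String name;
        final Map<String, String> attributes;

        MetricPoint(String name, Map<String, String> attributes) {
            this.name = name;
            this.attributes = attributes;
        }

        boolean hasInstanceId() {
            return attributes.containsKey("instance_id");
        }
    }

    /** Models the reported behavior: one point without instance_id skips the whole batch. */
    static List<MetricPoint> exportAllOrNothing(List<MetricPoint> pending) {
        for (MetricPoint point : pending) {
            if (!point.hasInstanceId()) {
                return List.of(); // nothing is exported and pending is left untouched
            }
        }
        List<MetricPoint> exported = new ArrayList<>(pending);
        pending.clear();
        return exported;
    }

    /** One possible alternative: export valid points and discard the rest. */
    static List<MetricPoint> exportValidOnly(List<MetricPoint> pending) {
        List<MetricPoint> exported = new ArrayList<>();
        for (MetricPoint point : pending) {
            if (point.hasInstanceId()) {
                exported.add(point);
            }
        }
        pending.clear(); // invalid points no longer block later exports
        return exported;
    }

    public static void main(String[] args) {
        List<MetricPoint> pending = new ArrayList<>();
        pending.add(new MetricPoint("example_metric", Map.of("instance_id", "my-instance")));
        pending.add(new MetricPoint("example_metric", Map.of())); // request never left the client

        System.out.println(exportAllOrNothing(pending).size()); // prints 0
        System.out.println(pending.size());                     // prints 2: the bad point is still pending
        System.out.println(exportValidOnly(pending).size());    // prints 1
    }
}
```

In this model the point without an `instance_id` stays in `pending` after every all-or-nothing attempt, which mirrors why exports stay blocked until the client is restarted.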
Copy-paste from the PGAdapter issue:
The problem occurs when a request is sent by PGAdapter (or more correctly, by the underlying Java client) but never actually leaves the client, for example due to a network problem. The reason is that:
- The RPC is collected as a failed attempt and included in the metrics.
- However, the `instance_id` is not set before the request is sent. Instead, that happens in this interceptor when the headers are being sent.
- If the request is never sent due to a network problem, then no `instance_id` will be set, and the metric will be added to the collection without an `instance_id`.
- Once that has happened, the warning will continue to be logged, as the entire export is being skipped (instead of only the metric without an `instance_id`).
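The ordering problem in the list above can be sketched as follows. This is a simplified simulation, not the real interceptor: `HeaderTimingSketch`, `recordAttempt`, and the `status` attribute are hypothetical names used only for illustration. The point is that the attempt is recorded in both cases, while the `instance_id` is attached only on the path where headers are actually sent.

```java
import java.util.HashMap;
import java.util.Map;

public class HeaderTimingSketch {
    /** Simulates one RPC attempt and returns the attributes recorded for its metric. */
    static Map<String, String> recordAttempt(boolean networkAvailable) {
        Map<String, String> attributes = new HashMap<>();
        attributes.put("status", "UNAVAILABLE"); // assumed default until the call succeeds

        if (networkAvailable) {
            // Stand-in for the header interceptor: it runs only when headers are sent.
            attributes.put("instance_id", "my-instance");
            attributes.put("status", "OK");
        }
        // The attempt is recorded either way, so a failed attempt carries no instance_id.
        return attributes;
    }

    public static void main(String[] args) {
        System.out.println(recordAttempt(true).containsKey("instance_id"));  // prints true
        System.out.println(recordAttempt(false).containsKey("instance_id")); // prints false
    }
}
```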
The easiest way to reproduce the problem (and verify that it indeed happens in the way described above):
- Create a client and successfully execute a simple query (this ensures that everything has been initialized and simplifies the next steps).
- Set a breakpoint at this line (this is where the client initiates the request)
- Set a breakpoint at this line (this is where the headers are being sent)
- Disable your network (disable WiFi or unplug your cable).
- Try to execute another query. You will see that the first breakpoint (where the client initiates the request) is reached and the second (where the headers are sent) is not. This also means that no `instance_id` is added to the metric attributes.