Add after nibble hook to pt-online-schema-change #645
Merged
Problem
The current progress output is text only and limited in what it reports: you can only choose between percentage, time, or iterations. In addition, the Progress class is shared by many workflows to report the current status of the OSC (and of other modules), so changing it affects more than this tool. Running in debug mode produces extra output with meaningful information, but it is not as handy as a metric.
I want to export metrics about the internals of a running OSC so that I have better visibility into its dynamic configuration, speed, and progress. This helps when troubleshooting long-running OSCs (more than a day).
Some useful metrics:
Proposed solution
This change adds a new hook to the pt-online-schema-change script. The hook sits between `before_copy_rows` and `after_copy_rows`. I named it `on_copy_rows_after_nibble`. The new hook lets users write custom code to read row_cnt, nibble_time, progress, and rate for the table being copied. Metrics can be submitted from that hook without further changes to the shared Progress class.
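As an illustration of how such a hook could be used, here is a minimal plugin sketch. It assumes the usual pt-online-schema-change plugin shape (a `pt_online_schema_change_plugin` package with a `new()` constructor). The argument keys read inside `on_copy_rows_after_nibble` (`row_cnt`, `nibble_time`, `progress`, `rate`) are assumptions based on the description above, not a confirmed signature, and `format_metrics` is a hypothetical helper.

```perl
package pt_online_schema_change_plugin;

use strict;
use warnings;

# Constructor: the tool instantiates the plugin through new().
# This sketch simply stores whatever arguments it is given.
sub new {
   my ($class, %args) = @_;
   my $self = { %args };
   return bless $self, $class;
}

# Hypothetical helper: render metrics as a sorted "key=value" line.
# Undefined values are printed as NA so the line shape stays stable.
sub format_metrics {
   my (%m) = @_;
   return join ' ',
      map { "$_=" . (defined($m{$_}) ? $m{$_} : 'NA') }
      sort keys %m;
}

# Called after each nibble, once the other checks have finished.
# ASSUMPTION: these argument names are illustrative; check the
# hook call site in the script for the keys it really passes.
sub on_copy_rows_after_nibble {
   my ($self, %args) = @_;
   my $line = format_metrics(
      row_cnt     => $args{row_cnt},
      nibble_time => $args{nibble_time},
      progress    => $args{progress},
      rate        => $args{rate},
   );
   # Replace this with a call to your metrics backend.
   print STDERR "osc_metrics $line\n";
   return;
}

1;
```

A file like this would be passed to the tool with the `--plugin` option. Writing to STDERR here is only a placeholder for a real metrics client.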
The `on_copy_rows_after_nibble` hook is currently called at the end of each nibble, so all other checks (replica catch-up, load, flow control) have finished by the time it runs. This change does not solve the problem that the current state of the script (running, replica catch-up, paused, ...) is not published. I'm open to suggestions for further changes.