fix(tenant/timeline metrics): race condition during shutdown + recreation #7064

Merged (1 commit) on Mar 11, 2024

Commits on Mar 8, 2024

  1. fix(tenant/timeline metrics): shutdown

    Tenant::shutdown or Timeline::shutdown completes and becomes externally
    observable before the corresponding Tenant/Timeline object is dropped.
    
    For example, once Tenant::shutdown has been observed to complete, the
    same tenant_id can be attached again. The shut-down Tenant object might
    still be around at the time of the attach.
    
    The race is then the following:
    - old object's metrics are still around
    - new object uses with_label_values
    - old object calls remove_label_values
    
    The outcome is that the new object will have the metric objects (they're
    an Arc internally) but the metrics won't be part of the internal registry
    and hence they'll be missing in `/metrics`.
    
    Later, when the new object gets shut down and tries to call
    remove_label_values, it will observe an error because
    the metric was already removed by the old object.
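
The interleaving above can be replayed with a small std-only model. This is a sketch, not the real metrics crate: `ToyCounterVec`, `is_exported`, and `replay_race` are hypothetical names, and only the method names `with_label_values` and `remove_label_values` are taken from the commit message.

```rust
use std::collections::HashMap;
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::{Arc, Mutex};

/// Toy stand-in for a labelled counter vec (assumption: NOT the real crate).
#[derive(Default)]
struct ToyCounterVec {
    children: Mutex<HashMap<String, Arc<AtomicU64>>>,
}

impl ToyCounterVec {
    /// Get or create the child for `label` and return a shared handle to it.
    fn with_label_values(&self, label: &str) -> Arc<AtomicU64> {
        let mut children = self.children.lock().unwrap();
        children.entry(label.to_string()).or_default().clone()
    }

    /// Remove the child for `label`; errors if it is not registered.
    fn remove_label_values(&self, label: &str) -> Result<(), String> {
        let mut children = self.children.lock().unwrap();
        children
            .remove(label)
            .map(|_| ())
            .ok_or_else(|| format!("missing label: {}", label))
    }

    /// Whether a scrape of the registry would still see `label`.
    fn is_exported(&self, label: &str) -> bool {
        self.children.lock().unwrap().contains_key(label)
    }
}

/// Replays the race. Returns (still exported?, value seen through the new
/// handle, result of the new object's own removal attempt).
fn replay_race() -> (bool, u64, Result<(), String>) {
    let vec = ToyCounterVec::default();
    let _old = vec.with_label_values("tenant-1"); // old object's metrics are still around
    let new = vec.with_label_values("tenant-1"); // new object uses with_label_values
    vec.remove_label_values("tenant-1").unwrap(); // old object calls remove_label_values
    new.fetch_add(1, Ordering::Relaxed); // the handle still works, but is orphaned
    let exported = vec.is_exported("tenant-1");
    let second_removal = vec.remove_label_values("tenant-1"); // new object's shutdown
    (exported, new.load(Ordering::Relaxed), second_removal)
}
```

In this model the new object keeps incrementing a counter that the registry no longer knows about, and its own removal attempt later returns an error, matching the two symptoms described above.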
    
    Changes
    -------
    
    This PR moves metric removal to `shutdown()`.
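
A minimal sketch of why removing metrics in `shutdown()` closes the window, assuming removal previously ran only when the old object was dropped (the commit message implies this but does not say it outright). `ToyRegistry`, `ToyTenant`, `attach`, and `replay_with_fix` are hypothetical names for illustration and do not reflect the actual code in this PR.

```rust
use std::collections::HashSet;
use std::sync::{Arc, Mutex};

/// Toy registry: the set of label values currently exported (assumption).
type ToyRegistry = Arc<Mutex<HashSet<String>>>;

/// Hypothetical stand-in for a Tenant/Timeline that owns per-label metrics.
struct ToyTenant {
    id: String,
    registry: ToyRegistry,
}

impl ToyTenant {
    /// Registers the label, playing the role of with_label_values.
    fn attach(id: &str, registry: &ToyRegistry) -> Self {
        registry.lock().unwrap().insert(id.to_string());
        ToyTenant {
            id: id.to_string(),
            registry: Arc::clone(registry),
        }
    }

    /// Removes the label before shutdown completes, playing the role of
    /// remove_label_values. Returns whether the label was registered.
    fn shutdown(&self) -> bool {
        self.registry.lock().unwrap().remove(&self.id)
    }
}
// Deliberately no Drop impl: dropping a ToyTenant never touches the registry.

/// Shut down, re-attach the same id, then drop the old object late.
/// Returns whether the new object's label is still exported afterwards.
fn replay_with_fix() -> bool {
    let registry: ToyRegistry = Arc::default();
    let old = ToyTenant::attach("tenant-1", &registry);
    let removed = old.shutdown(); // removal completes before re-attach can start
    assert!(removed);
    let _new = ToyTenant::attach("tenant-1", &registry);
    drop(old); // the late drop has nothing left to remove
    let exported = registry.lock().unwrap().contains("tenant-1");
    exported
}
```

Because a re-attach only starts after shutdown has been observed to complete, the old object's removal is ordered before the new object's registration in this model, and the later drop of the old object has no effect on the registry.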
    
    An alternative design would be to multi-version the metrics using a
    distinguishing label, or to use a better metrics crate that allows
    removing metrics from the registry through the locally held metric
    handle instead of interacting with the (globally shared) registry.
    problame committed Mar 8, 2024