Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

raft topology: check_and_repair_cdc_streams: don't create a new generation if current one is optimal #14055

Closed
kbr-scylla opened this issue May 28, 2023 · 1 comment · Fixed by #14789

Comments

@kbr-scylla
Copy link
Contributor

Follow-up to #13683.

In #13683 I left a FIXME in storage_service::raft_check_and_repair_cdc_streams:

// FIXME: check if the current generation is optimal, don't request new one if it isn't

The behavior of the original (gossiper-based) implementation of this API would do exactly that: if the current CDC generation is fine, don't create a new one (hence the "check_and" part of the name).

The new implementation always creates a new generation.

It needs to be modified to do the check first. For that we should refactor the code from generation_service::check_and_repair_cdc_streams (the gossiper-based implementation) - take out the optimality check into a separate function and use it in both places. Note that generation_service::check_and_repair_cdc_streams has other checks which are not applicable to the group0-based implementation (the situation with group 0 is simpler, we don't need to do a distributed read and so on). We basically need only this part:

            if (tmptr->sorted_tokens().size() != gen->entries().size()) {
                // We probably have garbage streams from old generations
                cdc_log.info("Generation size does not match the token ring, regenerating");
                should_regenerate = true;
            } else {
                std::unordered_set<dht::token> gen_ends;
                for (const auto& entry : gen->entries()) {
                    gen_ends.insert(entry.token_range_end);
                }
                for (const auto& metadata_token : tmptr->sorted_tokens()) {
                    if (!gen_ends.contains(metadata_token)) {
                        cdc_log.warn("CDC generation {} missing token {}. Regenerating.", latest, metadata_token);
                        should_regenerate = true;
                        break;
                    }
                }
            }

The test that was modified in #13683 should be modified further to call the API after a decommission multiple times concurrently; only one generation should be created.

@avikivity
Copy link
Member

No vulnerable branches, not backporting.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment