Prevent isolated/decommissioned nodes from handling Kafka API requests #7829

Conversation

@VadimPlh VadimPlh commented Dec 19, 2022

Ideas

The main ideas of this PR:

  • Add a new state for a Redpanda node, isolated, and define how the Redpanda process should "discover" that it is isolated
  • Create a way to tell clients "Do not communicate with me; send your requests to another node"

We need to signal to clients that the node can no longer accept requests. The simplest way to do this is through the Kafka protocol itself: clients send metadata requests to learn about the cluster, and we do not want isolated nodes to return stale information, so we can signal isolation through the metadata response.
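As a minimal sketch, the metadata-response signalling could look like this. The `broker` and `metadata_response` types here are hypothetical simplifications; the real Redpanda code under src/v/kafka differs in detail.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Hypothetical, simplified model of a Kafka metadata response; only the
// fields relevant to the isolation-signalling scheme are shown.
struct broker {
    int32_t node_id;
};

struct metadata_response {
    std::vector<broker> brokers;
    int32_t controller_id;
};

// If the serving node believes it is isolated, strip it from the broker
// list and report "no known controller" (-1, per the Kafka protocol) so
// clients retry their metadata request against another broker.
metadata_response
adjust_for_isolation(metadata_response r, int32_t self, bool isolated) {
    if (!isolated) {
        return r;
    }
    r.brokers.erase(
      std::remove_if(
        r.brokers.begin(),
        r.brokers.end(),
        [self](const broker& b) { return b.node_id == self; }),
      r.brokers.end());
    r.controller_id = -1;
    return r;
}
```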

For now we have several kinds of liveness signals between nodes:

  • Health_monitor
  • Node_status_table
  • Append_entries requests from the controller leader

To begin with, a node can conclude that it is isolated when all of these go quiet: no Health_monitor communication with other nodes, no append_entries requests from the controller leader, and no incoming Raft heartbeats. Once the node decides it is isolated, it signals clients to reconnect to another node.
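The combined check might be sketched like this. The type and field names are assumptions made for illustration, not the actual implementation:

```cpp
#include <chrono>

// Illustrative sketch of the detection rule: a node treats itself as
// isolated only when every liveness signal is stale at once.
using isolation_clock = std::chrono::steady_clock;

struct isolation_inputs {
    isolation_clock::time_point last_health_report;    // health_monitor traffic
    isolation_clock::time_point last_status_heartbeat; // node_status_table pings
    bool controller_leader_known; // append_entries seen from controller leader
};

bool is_isolated(
  const isolation_inputs& in,
  isolation_clock::time_point now,
  isolation_clock::duration timeout) {
    if (in.controller_leader_known) {
        // Hearing from the controller leader proves connectivity.
        return false;
    }
    return (now - in.last_health_report) > timeout
           && (now - in.last_status_heartbeat) > timeout;
}
```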

RFC
Ticket

Backports Required

  • none - not a bug fix
  • none - issue does not exist in previous branches
  • none - papercut/not impactful enough to backport
  • v22.3.x
  • v22.2.x
  • v22.1.x

UX Changes

  • none

Release Notes

Features

  • Redpanda nodes now automatically decline Kafka API requests when they detect that they are isolated from the cluster
  • New cluster configuration properties control node isolation detection: node_isolation_raft_timeout and node_isolation_heartbeat_timeout. Both default to 3 seconds.
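For example, assuming the standard `rpk cluster config set` workflow, the new properties could be tuned like so. The values below are hypothetical, and the expected units are an assumption; check the properties' documentation (e.g. via `rpk cluster config get`) before applying.

```shell
# Hypothetical example: raising both isolation-detection timeouts.
# Property names come from this PR's release notes; verify units first.
rpk cluster config set node_isolation_raft_timeout 5000
rpk cluster config set node_isolation_heartbeat_timeout 5000
```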

@VadimPlh VadimPlh force-pushed the issue-271-part1-change-metadata-responce branch 5 times, most recently from f6cc7c7 to 243796a on December 21, 2022 13:13
@VadimPlh VadimPlh force-pushed the issue-271-part1-change-metadata-responce branch from 243796a to 5c0ad4e on December 21, 2022 14:12
@VadimPlh VadimPlh force-pushed the issue-271-part1-change-metadata-responce branch 3 times, most recently from db3485a to 3a6d3a1 on January 3, 2023 17:56
@VadimPlh VadimPlh marked this pull request as ready for review January 3, 2023 17:56
@VadimPlh VadimPlh force-pushed the issue-271-part1-change-metadata-responce branch 2 times, most recently from 0ee43de to 6ae1bc4 on January 4, 2023 14:31
@VadimPlh VadimPlh force-pushed the issue-271-part1-change-metadata-responce branch 10 times, most recently from b498256 to 6232d0d on January 17, 2023 18:42
@VadimPlh VadimPlh requested a review from jcsp January 17, 2023 18:47
@VadimPlh VadimPlh force-pushed the issue-271-part1-change-metadata-responce branch from 6232d0d to 60e933f on January 18, 2023 18:09
@VadimPlh
Contributor Author

Please resolve conflicts

resolved

mmaslankaprv previously approved these changes Jan 23, 2023
@mmaslankaprv
Member

/ci-repeat 5

@redpanda-data redpanda-data deleted a comment from mmaslankaprv Jan 23, 2023
@redpanda-data redpanda-data deleted a comment from mmaslankaprv Jan 23, 2023
@andrewhsu
Member

@mmaslankaprv @VadimPlh i have a feeling tests/rptest/tests/tests/rptest/tests/isolated_decommissioned_node_test.py should instead be tests/rptest/tests/isolated_decommissioned_node_test.py

@mmaslankaprv
Member

mmaslankaprv commented Jan 23, 2023

@mmaslankaprv @VadimPlh i have a feeling tests/rptest/tests/tests/rptest/tests/isolated_decommissioned_node_test.py should instead be tests/rptest/tests/isolated_decommissioned_node_test.py

@andrewhsu thank you, sorry for this

@mmaslankaprv
Member

/ci-repeat 10 skip-units dt-repeat=32 tests/rptest/tests/isolated_decommissioned_node_test.py

@andrewhsu
Member

andrewhsu commented Jan 23, 2023

sorry, i noticed another typo in the repeat command (based on the syntax defined in #940), so i kicked off a draft PR #8368, but it has the same git commit, so it replaced this PR's test jobs.

i'll close the draft PR #8368 and kick off another repeat command as intended in the comment above, but with the typo fixed.

@andrewhsu
Member

/ci-repeat 10
dt-repeat=32
skip-unit
tests/isolated_decommissioned_node_test.py

@andrewhsu
Member

i've created another draft PR #8369 that runs the ci-repeat command with the full ducktape test path in case specifying the short path does not work: #8369 (comment)

@andrewhsu
Member

andrewhsu commented Jan 23, 2023

@VadimPlh @mmaslankaprv fyi the buildkite job went green (from draft PR #8369 (comment)):
https://buildkite.com/redpanda/redpanda/builds/21696

that used this PR's codebase (plus a dummy commit)

If a node knows about the controller leader,
it means the node is not isolated.

New setting: node_isolation_heartbeat_timeout.
How long after the last heartbeat request
a node will wait before considering itself isolated.

Metadata_cache now contains a bool flag signalling
whether the node is isolated. The flag is
updated by a new sharded service and cached inside
the metadata cache.

Return the list of all nodes in the cluster.

A new service communicates with health_monitor and
node_status_table to determine whether the node is isolated.
For now we have 3 different signals about node
communication:
* Health_monitor
* node status table
* raft0 has a leader

If none of them have been updated for a long time,
the node may be isolated.

If a node is isolated or decommissioned, it cannot handle Kafka requests
from clients, so in this case we need to signal clients to communicate
with another broker. For this we exclude the isolated node from the
brokers list and return -1 for controller_id; after that the client will
send a metadata request to another broker and communicate with it.

Also, we cannot report an isolated node as the leader for a partition.
To prevent clients from getting stuck, we add a fake leader to force
the client to connect to another broker.
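The fake-leader idea from the commit message above could be sketched as follows. The `partition_md` type and the policy of substituting the first available live broker are illustrative assumptions, not the actual Redpanda code:

```cpp
#include <cstdint>
#include <vector>

// Minimal stand-in for per-partition metadata: only the leader field.
struct partition_md {
    int32_t leader_id;
};

// For partitions whose reported leader is the isolated node itself,
// advertise some other live broker as leader so clients reconnect
// instead of stalling on an unreachable node.
void mask_isolated_leader(
  std::vector<partition_md>& partitions,
  int32_t self,
  const std::vector<int32_t>& other_brokers) {
    if (other_brokers.empty()) {
        return; // nobody else to point clients at
    }
    for (auto& p : partitions) {
        if (p.leader_id == self) {
            // Any live broker works: the client's request there is
            // answered with fresh metadata naming the real leader.
            p.leader_id = other_brokers.front();
        }
    }
}
```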