[IPv6 configuration] A node is stuck with "?U" status and Host ID is "null", unclear reason #16039

Closed
1 of 2 tasks
juliayakovlev opened this issue Nov 13, 2023 · 27 comments
Labels
area/ipv6 Related to IPv6 networking P1 Urgent status/regression

Comments

@juliayakovlev

Issue description

  • This issue is a regression.
  • It is unknown if this issue is a regression.

Scylla configuration: all addresses are IPv6
scylla.yaml

When the cluster was created, node longevity-10gb-3h-5-4-db-node-ba01d950-1 remained in status "?N":

Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address                                  Load       Tokens       Owns    Host ID                               Rack
UN  2a05:d018:12e3:f000:7aed:3233:7bbb:c344  1.01 MB    256          ?       5728ba00-ca9c-45db-adc4-bf08c0067df7  1a
UN  2a05:d018:12e3:f000:1005:2f59:d49d:bc98  810.74 KB  256          ?       7679d491-4572-4612-bb39-59ce8ddca1f2  1a
?N  2a05:d018:12e3:f000:69d:d442:b592:5d7a   ?          256          ?       null                                  1a
UN  2a05:d018:12e3:f000:9129:adc0:f67a:79b6  317.33 KB  256          ?       a79c8fd3-b4d7-4452-b41e-49e93a6003bf  1a
UN  2a05:d018:12e3:f000:c352:c6ab:ea67:d165  448.57 KB  256          ?       3600e6fc-a89e-490c-9746-ac26393c4b94  1a
UN  2a05:d018:12e3:f000:d16b:cbcb:111a:bc43  582.42 KB  256          ?       0c118da7-d2b8-42dc-9799-a3b5fb79fa7b  1a

The Host ID is null even though it was set:

2023-11-07T15:34:21.695+00:00 longevity-10gb-3h-5-4-db-node-ba01d950-1     !INFO | scylla[11960]:  [shard 0:main] init - Setting local host id to 37334da4-55c5-417c-ba3f-661ac2bf9e01

Other nodes saw the host ID being set for this node, along with status "NORMAL". For example:

2023-11-07T15:37:00.415+00:00 longevity-10gb-3h-5-4-db-node-ba01d950-2     !INFO | scylla[11943]:  [shard 0:goss] storage_service - Set host_id=37334da4-55c5-417c-ba3f-661ac2bf9e01 to be owned by node=2a05:d018:12e3:f000:069d:d442:b592:5d7a
2023-11-07T15:37:00.415+00:00 longevity-10gb-3h-5-4-db-node-ba01d950-2     !INFO | scylla[11943]:  [shard 0:goss] gossip - InetAddress 2a05:d018:12e3:f000:069d:d442:b592:5d7a is now UP, status = NORMAL

But nodetool status shows "?N" for this node.
As a result, the test failed with the error "Not all nodes joined the cluster".

I did not find any errors in the DB node or test logs. It is not clear why the status is UNKNOWN for this node.
When I ran the reproducers, I also hit a case where 2 nodes were in UNKNOWN status.

This issue is observed with versions 5.3.0 and 5.4.0.

Impact

Cluster is not ready

How frequently does it reproduce?

I ran a few reproducers with different versions:

5.4.0~rc1 - the problem is reproduced on almost every run.
5.3.0-rc0 - the problem is reproduced (on the first run). The official IPv6 test was not run with this version.
2023.1.2 - the problem is NOT reproduced (ran twice). The official IPv6 test was run and passed.
5.2.9 - the problem is NOT reproduced (ran once). The official IPv6 test was run and passed.

Installation details

Cluster size: 6 nodes (i4i.2xlarge)

Scylla Nodes used in this run:

  • longevity-10gb-3h-5-4-db-node-ba01d950-6 (54.247.25.73 | 2a05:d018:12e3:f000:7aed:3233:7bbb:c344) (shards: 7)
  • longevity-10gb-3h-5-4-db-node-ba01d950-5 (18.203.248.49 | 2a05:d018:12e3:f000:1005:2f59:d49d:bc98) (shards: 7)
  • longevity-10gb-3h-5-4-db-node-ba01d950-4 (3.252.162.208 | 2a05:d018:12e3:f000:d16b:cbcb:111a:bc43) (shards: 7)
  • longevity-10gb-3h-5-4-db-node-ba01d950-3 (54.75.1.208 | 2a05:d018:12e3:f000:c352:c6ab:ea67:d165) (shards: 7)
  • longevity-10gb-3h-5-4-db-node-ba01d950-2 (34.245.97.126 | 2a05:d018:12e3:f000:9129:adc0:f67a:79b6) (shards: 7)
  • longevity-10gb-3h-5-4-db-node-ba01d950-1 (63.35.226.250 | 2a05:d018:12e3:f000:69d:d442:b592:5d7a) (shards: 7)

OS / Image: ami-01715aa610de633df (aws: undefined_region)

Test: longevity-10gb-3h-ipv6-test
Test id: ba01d950-928e-49f4-a81e-1bb54e52f9ad
Test name: scylla-5.4/longevity/longevity-10gb-3h-ipv6-test
Test config file(s):

Logs and commands
  • Restore Monitor Stack command: $ hydra investigate show-monitor ba01d950-928e-49f4-a81e-1bb54e52f9ad
  • Restore monitor on AWS instance using Jenkins job
  • Show all stored logs command: $ hydra investigate show-logs ba01d950-928e-49f4-a81e-1bb54e52f9ad

Logs:

Jenkins job URL
Argus

@juliayakovlev
Author

Another reproducer:
2 nodes with null host_id: longevity-10gb-3h-5-4-db-node-726ad7ab-4 and longevity-10gb-3h-5-4-db-node-726ad7ab-6

According to the log, host_id was set successfully. Other nodes also see this host_id.
For example, node longevity-10gb-3h-5-4-db-node-726ad7ab-4 with IP 2a05:d018:12e3:f000:a90:e791:a483:e28a. Setting host_id:

Nov 12 11:34:35 longevity-10gb-3h-5-4-db-node-726ad7ab-4 scylla[12476]:  [shard 0:main] init - Setting local host id to e687fc9e-a4aa-4265-83bd-123a1b3cf47e

Its status, as seen on longevity-10gb-3h-5-4-db-node-726ad7ab-2, is UP and UNKNOWN:

Nov 12 11:34:38 longevity-10gb-3h-5-4-db-node-726ad7ab-2 scylla[12260]:  [shard 2:main] raft_group_registry - Raft server id e687fc9e-a4aa-4265-83bd-123a1b3cf47e cannot be translated to an IP address.
Nov 12 11:34:38 longevity-10gb-3h-5-4-db-node-726ad7ab-2 scylla[12260]:  [shard 0:goss] gossip - InetAddress 2a05:d018:12e3:f000:0a90:e791:a483:e28a is now UP, status = UNKNOWN

The host_id assignment is seen later on longevity-10gb-3h-5-4-db-node-726ad7ab-2:

Nov 12 11:35:29 longevity-10gb-3h-5-4-db-node-726ad7ab-2 scylla[12260]:  [shard 0:goss] storage_service - Set host_id=e687fc9e-a4aa-4265-83bd-123a1b3cf47e to be owned by node=2a05:d018:12e3:f000:0a90:e791:a483:e28a

And its status in the cluster is "?N" with Host ID = "null":

< t:2023-11-12 11:36:17,669 f:base.py         l:228  c:RemoteLibSSH2CmdRunner p:DEBUG > Status=Up/Down
< t:2023-11-12 11:36:17,669 f:base.py         l:228  c:RemoteLibSSH2CmdRunner p:DEBUG > |/ State=Normal/Leaving/Joining/Moving
< t:2023-11-12 11:36:17,671 f:base.py         l:228  c:RemoteLibSSH2CmdRunner p:DEBUG > --  Address                                  Load       Tokens       Owns    Host ID                               Rack
< t:2023-11-12 11:36:17,675 f:base.py         l:228  c:RemoteLibSSH2CmdRunner p:DEBUG > UN  2a05:d018:12e3:f000:4857:4997:eb0a:5d5d  ?          256          ?       9c0ce21e-1c56-4440-942f-790aaf55ecd4  1a
< t:2023-11-12 11:36:17,678 f:base.py         l:228  c:RemoteLibSSH2CmdRunner p:DEBUG > UN  2a05:d018:12e3:f000:d510:5e05:8c45:5c9e  466.02 KB  256          ?       fec5a101-2939-44e7-a881-6e2cde8735c8  1a
< t:2023-11-12 11:36:17,680 f:base.py         l:228  c:RemoteLibSSH2CmdRunner p:DEBUG > UN  2a05:d018:12e3:f000:d9fa:1c56:f18c:4cbb  188.43 KB  256          ?       d70afd66-2027-4191-a53b-12efe419bd39  1a
< t:2023-11-12 11:36:17,682 f:base.py         l:228  c:RemoteLibSSH2CmdRunner p:DEBUG > ?N  2a05:d018:12e3:f000:a90:e791:a483:e28a   ?          256          ?       null                                  1a

Installation details

Cluster size: 6 nodes (i4i.2xlarge)

Scylla Nodes used in this run:

  • longevity-10gb-3h-5-4-db-node-726ad7ab-6 (54.195.157.182 | 2a05:d018:12e3:f000:ea5:fa8d:4daa:a44b) (shards: 7)
  • longevity-10gb-3h-5-4-db-node-726ad7ab-5 (34.245.153.44 | 2a05:d018:12e3:f000:366a:e44f:5412:3604) (shards: 7)
  • longevity-10gb-3h-5-4-db-node-726ad7ab-4 (34.245.136.189 | 2a05:d018:12e3:f000:a90:e791:a483:e28a) (shards: 7)
  • longevity-10gb-3h-5-4-db-node-726ad7ab-3 (3.254.140.245 | 2a05:d018:12e3:f000:d510:5e05:8c45:5c9e) (shards: 7)
  • longevity-10gb-3h-5-4-db-node-726ad7ab-2 (3.253.25.236 | 2a05:d018:12e3:f000:4857:4997:eb0a:5d5d) (shards: 7)
  • longevity-10gb-3h-5-4-db-node-726ad7ab-1 (54.170.234.145 | 2a05:d018:12e3:f000:d9fa:1c56:f18c:4cbb) (shards: 7)

OS / Image: ami-01715aa610de633df (aws: undefined_region)

Test: longevity-10gb-3h-ipv6-test
Test id: 726ad7ab-ce9e-4a6c-89b3-29fe7ac645fa
Test name: scylla-staging/yulia/longevity-10gb-3h-ipv6-test
Test config file(s):

Logs and commands
  • Restore Monitor Stack command: $ hydra investigate show-monitor 726ad7ab-ce9e-4a6c-89b3-29fe7ac645fa
  • Restore monitor on AWS instance using Jenkins job
  • Show all stored logs command: $ hydra investigate show-logs 726ad7ab-ce9e-4a6c-89b3-29fe7ac645fa

Logs:

Jenkins job URL
Argus

@mykaul
Contributor

mykaul commented Nov 20, 2023

@elcallio - please have a look.

@elcallio
Contributor

So the problem is that parts of the JMX interface that nodetool uses rely on endpoints as strings, i.e. textual IPs. Scylla/seastar formats IPv6 addresses slightly differently: Java removes leading zeros in each group, e.g. 2001:db8:0:0:0:0:0:2, whereas scylla uses inet_ntop "standard" unix formatting with full zero fill, e.g. 2001:0db8:0000:0000:0000:0000:0000:0002.

A relatively easy stop-gap solution is to simply transform the strings in scylla-jmx, normalizing them to Java-style text. I am reluctant to modify the formatting in scylla.
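
For illustration only, here is a minimal Python sketch of that kind of normalization. It is not the scylla-jmx patch itself (which is Java); the helper name `to_java_style` is hypothetical, and the sketch assumes addresses without zone IDs.

```python
import ipaddress


def to_java_style(addr: str) -> str:
    """Render an address as eight hex groups with leading zeros dropped
    and no '::' compression (the Java-style text described above)."""
    ip = ipaddress.ip_address(addr)
    if ip.version == 4:
        # IPv4 text has a single common form, so nothing to normalize.
        return str(ip)
    # `exploded` always yields eight zero-padded 4-digit groups.
    groups = ip.exploded.split(":")
    return ":".join(format(int(group, 16), "x") for group in groups)


# The zero-filled example from this comment, normalized to Java style.
print(to_java_style("2001:0db8:0000:0000:0000:0000:0000:0002"))
```

Parsing first and re-rendering means the helper accepts any valid textual form as input, so it does not matter which formatting path produced the string.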

@elcallio
Contributor

The problem is of course making sure all relevant code paths are handled...

elcallio pushed a commit to elcallio/scylla-jmx that referenced this issue Nov 20, 2023
Fixes scylladb#228

Endpoint strings in StorageService does not follow Java formatting for IPv6 nodes.
This causes nodetool to break when mixing API:s returning actual IP addresses with info mapped by string.
Example is nodetool status.

Refs scylladb/scylladb#16039

While this is really broken nodetool code, the problem is probably easier to fix here, in Scylla-JMX,
as a stopgap until we can fully replace nodetool.
@mykaul
Contributor

mykaul commented Nov 20, 2023

@elcallio - any idea how it worked in the past? It doesn't sound like it ever did...?

@elcallio
Contributor

Probably not. I am however lying a bit as well: The current formatting was applied by 4ea6e06, which changed formatting to use a special, manual fmt routine (in a header - why?), instead of inet_ntop in seastar. However, using the latter would instead print with all middle zeroes truncated (i.e. 2001:db8::2 in this example), which would be just as non-matching.

I.e. we have two different formatting paths here. Not sure if there would be any bad consequences of changing the gms::inet_address formatting to drop the leading zeros.

But mainly I of course dislike the two paths...
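
As a rough illustration of the formats discussed in this thread, the sketch below renders the example address in three ways using Python's standard `ipaddress` module. This is an assumption-labeled demonstration, not Scylla, seastar, or JMX code: `exploded` stands in for the zero-filled form, `compressed` for the form with the run of zero groups collapsed, and the manual join for the Java-style form.

```python
import ipaddress

ip = ipaddress.IPv6Address("2001:db8::2")

# Full zero fill, every group padded to four hex digits.
zero_filled = ip.exploded
# Longest run of zero groups collapsed to '::'.
compressed = ip.compressed
# Leading zeros dropped per group, but no '::' compression.
java_style = ":".join(format(int(g, 16), "x") for g in ip.exploded.split(":"))

forms = [zero_filled, compressed, java_style]
for text in forms:
    print(text)

# All three strings differ as text but parse back to the same address.
all_same_address = all(ipaddress.IPv6Address(text) == ip for text in forms)
```

Any code that compares these as plain strings sees three different endpoints, which is why a mismatch appears whichever of the two non-Java forms the server emits.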

@mykaul
Contributor

mykaul commented Nov 22, 2023

Does it impact 5.4?

@elcallio
Contributor

Yes. In fact, given that Java and inet_ntop format addresses differently, and given the way the APIs work, I don't think these nodetool ops ever worked properly for IPv6 hosts. Luckily, the Java-only (JMX) fix should backport easily across all releases one might wish to fix.

@mykaul mykaul added backport/5.4 Issues that should be backported to 5.4 branch once they'll be fixed Backport candidate P1 Urgent labels Nov 22, 2023
@mykaul mykaul added this to the 5.4 milestone Nov 22, 2023
@mykaul mykaul removed the triage/oss label Nov 22, 2023
@mykaul
Contributor

mykaul commented Nov 22, 2023

scylladb/scylla-jmx#229 is the fix in JMX (it then needs a backport to 5.4).

@avikivity
Member

Why did we discover it on 5.4 and not master?

@mykaul
Contributor

mykaul commented Nov 23, 2023

5.4.0~rc1 - the problem is reproduced, reproduced almost every run
5.3.0-rc0 - the problem is reproduced (got from first run). Official IPv6 test was not run with this version. <------
2023.1.2 - the problem is NOT reproduced (ran twice). Official IPv6 test was run and passed.
5.2.9 - the problem is NOT reproduced (ran once). Official IPv6 test was run and passed.

@avikivity - I would assume some sort of 'race' in when we run this test on master (and it's obvious we don't regularly run IPv6 tests on master?)

denesb pushed a commit to scylladb/scylla-jmx that referenced this issue Nov 24, 2023
Fixes #228

Endpoint strings in StorageService does not follow Java formatting for IPv6 nodes.
This causes nodetool to break when mixing API:s returning actual IP addresses with info mapped by string.
Example is nodetool status.

Refs scylladb/scylladb#16039

While this is really broken nodetool code, the problem is probably easier to fix here, in Scylla-JMX,
as a stopgap until we can fully replace nodetool.

Closes: #229
@avikivity
Member

@denesb since you reviewed it, please complete the backport

@mykaul
Contributor

mykaul commented Nov 30, 2023

@denesb since you reviewed it, please complete the backport

I thought they were all done?

@bhalevy
Member

bhalevy commented Dec 3, 2023

@denesb since you reviewed it, please complete the backport

I thought they were all done?

@denesb I don't see updated tools/jmx in branch-5.4

@mykaul
Contributor

mykaul commented Dec 3, 2023

@denesb since you reviewed it, please complete the backport

I thought they were all done?

@denesb I don't see updated tools/jmx in branch-5.4

6f073df

@mykaul
Contributor

mykaul commented Dec 3, 2023

What's not clear to me is if it has scylladb/scylla-jmx@80ce599 in it as well.

@avikivity
Member

I think we should fix the way scylladb formats ipv6.

@mykaul
Contributor

mykaul commented Dec 3, 2023

I think we should fix the way scylladb formats ipv6.

The comment from elcallio/scylla-jmx@ccb1ccb says:

While this is really broken nodetool code, the problem is probably easier to fix here, in Scylla-JMX,
as a stopgap until we can fully replace nodetool.

@avikivity
Member

I can't evaluate it without understanding what's broken.

@mykaul
Contributor

mykaul commented Dec 3, 2023

I can't evaluate it without understanding what's broken.

See above - #16039 (comment)

@avikivity
Member

Probably not. I am however lying a bit as well: The current formatting was applied by 4ea6e06, which changed formatting to use a special, manual fmt routine (in a header - why?), instead of inet_ntop in seastar. However, using the latter would instead print with all middle zeroes truncated (i.e. 2001:db8::2 in this example), which would be just as non-matching.

I.e. we have two different formatting paths here. Not sure if there would be any bad consequences of changing the gms::inet_address formatting to not lead-zero.

But mainly I of course dislike the two paths...

4ea6e06 should be fixed to restore the previous behavior. After that, we can change jmx/nodetool/whatever to be more permissive, and after that change the formatters to conform to canonical representation (without zeroes).

@mykaul
Contributor

mykaul commented Dec 3, 2023

@tchaikov - please see above.

@mykaul
Contributor

mykaul commented Dec 3, 2023

@avikivity - but I vote to get the java fix to 5.4, to unblock its release (if it's not in already - not sure)

@avikivity
Member

Don't know, @denesb should manage the backport.

@elcallio
Contributor

elcallio commented Dec 4, 2023

4ea6e06 should be fixed to restore the previous behavior. After that, we can change jmx/nodetool/whatever to be more permissive, and after that change the formatters to conform to canonical representation (without zeroes).

The problem is that Java formatting is the issue here: Java formatting != inet_ntop. Nodetool mixes data in InetAddress form with data keyed by strings, where those strings are addresses formatted in seastar/scylla. And I don't think we should change seastar to format IPv6 Java-style.

Which is why, until such time as we make a nodetool replacement and can bypass the archaic JMX APIs it uses, it is easier to just ensure strings are Java-formatted in the JMX stack.
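
To make the failure mode concrete, here is a small Python sketch of a string-keyed lookup going wrong. It is a simplified model under stated assumptions, not actual nodetool or scylla-jmx code: the dictionary name `host_ids_by_endpoint` is hypothetical, while the address and host ID are taken from the logs in the issue description.

```python
import ipaddress

# Host IDs keyed by the endpoint string as the server formatted it
# (note the zero-padded group "069d").
host_ids_by_endpoint = {
    "2a05:d018:12e3:f000:069d:d442:b592:5d7a": "37334da4-55c5-417c-ba3f-661ac2bf9e01",
}

# The same endpoint as the Java side renders it (group "69d").
java_endpoint = "2a05:d018:12e3:f000:69d:d442:b592:5d7a"

# Plain string lookup misses, which would surface as a "null" Host ID.
raw_lookup = host_ids_by_endpoint.get(java_endpoint)

# Keying by the parsed address makes the textual form irrelevant.
by_address = {ipaddress.ip_address(k): v for k, v in host_ids_by_endpoint.items()}
fixed_lookup = by_address.get(ipaddress.ip_address(java_endpoint))

print(raw_lookup, fixed_lookup)
```

The sketch keys the map by parsed address, which is one way to sidestep the mismatch; rewriting the keys into Java-style strings, as proposed here, achieves the same effect for callers that only compare text.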

denesb pushed a commit to scylladb/scylla-jmx that referenced this issue Dec 4, 2023
Fixes #228

Endpoint strings in StorageService does not follow Java formatting for IPv6 nodes.
This causes nodetool to break when mixing API:s returning actual IP addresses with info mapped by string.
Example is nodetool status.

Refs scylladb/scylladb#16039

While this is really broken nodetool code, the problem is probably easier to fix here, in Scylla-JMX,
as a stopgap until we can fully replace nodetool.

Closes: #229
(cherry picked from commit 80ce599)
denesb added a commit that referenced this issue Dec 4, 2023
* ./tools/jmx 9a03d4fa...166599f0 (1):
  > StorageService: Normalize endpoint inetaddress strings to java form

Fixes: #16039
@denesb
Contributor

denesb commented Dec 4, 2023

Backport queued to 5.4 as 1a0424d

tchaikov added a commit to tchaikov/scylladb that referenced this issue Dec 4, 2023
in 4ea6e06, we specialized fmt::formatter<gms::inet_address> using
the formatter of bytes if the underlying address is an IPv6 address.
this breaks the tests with JMX which expected the shortened form of
the text representation of the IPv6 address.

in this change, instead of reinventing the wheel, let's reuse the
existing formatter of net::inet_address, which is able to handle
both IPv4 and IPv6 addresses, also it follows
https://datatracker.ietf.org/doc/html/rfc5952 by compressing the
consecutive zeros.

since this new formatter is a thin wrapper of seastar::net::inet_address,
the corresponding unit test will be added to Seastar.

Refs scylladb#16039
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
@tchaikov
Contributor

tchaikov commented Dec 4, 2023

Probably not. I am however lying a bit as well: The current formatting was applied by 4ea6e06, which changed formatting to use a special, manual fmt routine (in a header - why?), instead of inet_ntop in seastar. However, using the latter would instead print with all middle zeroes truncated (i.e. 2001:db8::2 in this example), which would be just as non-matching.
I.e. we have two different formatting paths here. Not sure if there would be any bad consequences of changing the gms::inet_address formatting to not lead-zero.
But mainly I of course dislike the two paths...

4ea6e06 should be fixed to restore the previous behavior. After that, we can change jmx/nodetool/whatever to be more permissive, and after that change the formatters to conform to canonical representation (without zeroes).

should be fixed by #16267

tchaikov added a commit to tchaikov/seastar that referenced this issue Dec 4, 2023
to ensure that we adhere to the related RFC.

Refs scylladb/scylladb#16039
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
tchaikov added a commit to tchaikov/seastar that referenced this issue Dec 4, 2023
to ensure that we adhere to the related RFC. because
seastar::inet_address does not have its own test suite, let's
colocate it with the tests for network_interfaces() at this moment.
we can extract them out once there are more of them.

Refs scylladb/scylladb#16039
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
avikivity pushed a commit to scylladb/seastar that referenced this issue Dec 4, 2023
to ensure that we adhere to the related RFC. because
seastar::inet_address does not have its own test suite, let's
colocate it with the tests for network_interfaces() at this moment.
we can extract them out once there are more of them.

Refs scylladb/scylladb#16039
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
avikivity pushed a commit that referenced this issue Dec 4, 2023
in 4ea6e06, we specialized fmt::formatter<gms::inet_address> using
the formatter of bytes if the underlying address is an IPv6 address.
this breaks the tests with JMX which expected the shortened form of
the text representation of the IPv6 address.

in this change, instead of reinventing the wheel, let's reuse the
existing formatter of net::inet_address, which is able to handle
both IPv4 and IPv6 addresses, also it follows
https://datatracker.ietf.org/doc/html/rfc5952 by compressing the
consecutive zeros.

since this new formatter is a thin wrapper of seastar::net::inet_address,
the corresponding unit test will be added to Seastar.

Refs #16039
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes #16267
@mykaul mykaul removed Backport candidate backport/5.4 Issues that should be backported to 5.4 branch once they'll be fixed labels Dec 13, 2023
graphcareful pushed a commit to graphcareful/seastar that referenced this issue Mar 20, 2024
to ensure that we adhere to the related RFC. because
seastar::inet_address does not have its own test suite, let's
colocate it with the tests for network_interfaces() at this moment.
we can extract them out once there are more of them.

Refs scylladb/scylladb#16039
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

7 participants