Repair of a single shard range opens rpc connections for streaming on all shards #4708

Closed
slivne opened this issue Jul 15, 2019 · 6 comments

Comments

@slivne
Contributor

commented Jul 15, 2019

Scylla Version: 3.0.X

I ran nodetool repair from the command line, using the ranges from the trace:

nodetool repair -st 8811292670950375424 -et 8812418570857218048 keyspace1

Jul 14 22:30:35 ip-10-0-0-95.ec2.internal scylla[7188]:  [shard 0] repair - starting user-requested repair for keyspace keyspace1, repair id 40428, options {{ trace -> false}, { ranges -> 8811292670950375424:881
Jul 14 22:30:35 ip-10-0-0-95.ec2.internal scylla[7188]:  [shard 0] repair - Repair 1 out of 1 ranges, id=40428, shard=0, keyspace=keyspace1, table={standard1}, range=(8811292670950375424, 8812418570857218048]
Jul 14 22:30:35 ip-10-0-0-95.ec2.internal scylla[7188]:  [shard 0] repair - repair 40428 on shard 0 completed successfully
Jul 14 22:30:35 ip-10-0-0-95.ec2.internal scylla[7188]:  [shard 2] repair - Repair 1 out of 1 ranges, id=40428, shard=2, keyspace=keyspace1, table={standard1}, range=(8811292670950375424, 8812418570857218048]
Jul 14 22:30:35 ip-10-0-0-95.ec2.internal scylla[7188]:  [shard 3] repair - Repair 1 out of 1 ranges, id=40428, shard=3, keyspace=keyspace1, table={standard1}, range=(8811292670950375424, 8812418570857218048]
Jul 14 22:30:35 ip-10-0-0-95.ec2.internal scylla[7188]:  [shard 1] repair - Repair 1 out of 1 ranges, id=40428, shard=1, keyspace=keyspace1, table={standard1}, range=(8811292670950375424, 8812418570857218048]
Jul 14 22:30:35 ip-10-0-0-95.ec2.internal scylla[7188]:  [shard 1] repair - repair 40428 on shard 1 completed successfully
Jul 14 22:30:35 ip-10-0-0-95.ec2.internal scylla[7188]:  [shard 3] repair - repair 40428 on shard 3 completed successfully
Jul 14 22:30:35 ip-10-0-0-95.ec2.internal scylla[7188]:  [shard 2] repair - Start streaming for repair id=40428, shard=2, index=0, ranges_in=1, ranges_out=2
Jul 14 22:30:35 ip-10-0-0-95.ec2.internal scylla[7188]:  [shard 2] stream_session - [Stream #ff9d2cf0-a686-11e9-93d4-000000000003] Executing streaming plan for repair-in-id-40428-shard-2-index-0 with peers={10.0
Jul 14 22:30:35 ip-10-0-0-95.ec2.internal scylla[7188]:  [shard 2] compaction - Compacting [/var/lib/scylla/data/keyspace1/standard1-20ab6610a67c11e9b2e1000000000003/la-22070-big-Data.db:level=0, /var/lib/scylla
Jul 14 22:30:35 ip-10-0-0-95.ec2.internal scylla[7188]:  [shard 1] stream_session - [Stream #ff9d2cf0-a686-11e9-93d4-000000000003] Write to sstable for ks=keyspace1, cf=standard1, estimated_partitions=1, receive
Jul 14 22:30:35 ip-10-0-0-95.ec2.internal scylla[7188]:  [shard 2] stream_session - [Stream #ff9d2cf0-a686-11e9-93d4-000000000003] Streaming plan for repair-in-id-40428-shard-2-index-0 succeeded, peers={10.0.0.2
Jul 14 22:30:35 ip-10-0-0-95.ec2.internal scylla[7188]:  [shard 2] stream_session - [Stream #ff9d2cf1-a686-11e9-93d4-000000000003] Executing streaming plan for repair-out-id-40428-shard-2-index-0 with peers={10.
Jul 14 22:30:35 ip-10-0-0-95.ec2.internal scylla[7188]:  [shard 2] stream_session - [Stream #ff9d2cf1-a686-11e9-93d4-000000000003] Start sending ks=keyspace1, cf=standard1, estimated_partitions=129, with new rpc
Jul 14 22:30:35 ip-10-0-0-95.ec2.internal scylla[7188]:  [shard 2] stream_session - [Stream #ff9d2cf1-a686-11e9-93d4-000000000003] Start sending ks=keyspace1, cf=standard1, estimated_partitions=129, with new rpc
Jul 14 22:30:35 ip-10-0-0-95.ec2.internal scylla[7188]:  [shard 0] stream_session - [Stream #ff9d2cf1-a686-11e9-93d4-000000000003] Start sending ks=keyspace1, cf=standard1, estimated_partitions=1, with new rpc s
Jul 14 22:30:35 ip-10-0-0-95.ec2.internal scylla[7188]:  [shard 0] stream_session - [Stream #ff9d2cf1-a686-11e9-93d4-000000000003] Start sending ks=keyspace1, cf=standard1, estimated_partitions=1, with new rpc s
Jul 14 22:30:35 ip-10-0-0-95.ec2.internal scylla[7188]:  [shard 3] stream_session - [Stream #ff9d2cf1-a686-11e9-93d4-000000000003] Start sending ks=keyspace1, cf=standard1, estimated_partitions=1, with new rpc s
Jul 14 22:30:35 ip-10-0-0-95.ec2.internal scylla[7188]:  [shard 3] stream_session - [Stream #ff9d2cf1-a686-11e9-93d4-000000000003] Start sending ks=keyspace1, cf=standard1, estimated_partitions=1, with new rpc s
Jul 14 22:30:35 ip-10-0-0-95.ec2.internal scylla[7188]:  [shard 1] stream_session - [Stream #ff9d2cf1-a686-11e9-93d4-000000000003] Start sending ks=keyspace1, cf=standard1, estimated_partitions=1, with new rpc s
Jul 14 22:30:35 ip-10-0-0-95.ec2.internal scylla[7188]:  [shard 1] stream_session - [Stream #ff9d2cf1-a686-11e9-93d4-000000000003] Start sending ks=keyspace1, cf=standard1, estimated_partitions=1, with new rpc s
Jul 14 22:30:35 ip-10-0-0-95.ec2.internal scylla[7188]:  [shard 2] stream_session - [Stream #ff9d2cf1-a686-11e9-93d4-000000000003] Streaming plan for repair-out-id-40428-shard-2-index-0 succeeded, peers={10.0.0.
Jul 14 22:30:35 ip-10-0-0-95.ec2.internal scylla[7188]:  [shard 2] repair - repair 40428 on shard 2 completed successfully
Jul 14 22:30:35 ip-10-0-0-95.ec2.internal scylla[7188]:  [shard 0] repair - repair 40428 completed successfully

Only shard 2 had data to send, yet all shards opened streaming sessions.
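One way to make this observation easier to check is to tally the "Start sending" lines per shard. The sketch below is only an illustration: the function name `partitions_by_shard` and the regular expression are assumptions written against the log format shown in this report, and the two sample lines are shortened copies of lines from the log above.

```python
import re
from collections import defaultdict

# Matches the "Start sending" stream_session lines in the format shown above.
# The pattern is an assumption based on this report's log, not a stable format.
START_SENDING = re.compile(
    r"\[shard (\d+)\] stream_session - .*Start sending .*estimated_partitions=(\d+)"
)


def partitions_by_shard(log_lines):
    """Map shard id -> list of estimated_partitions values from 'Start sending' lines."""
    result = defaultdict(list)
    for line in log_lines:
        match = START_SENDING.search(line)
        if match:
            result[int(match.group(1))].append(int(match.group(2)))
    return dict(result)


sample = [
    "scylla[7188]:  [shard 2] stream_session - [Stream #ff9d2cf1-a686-11e9-93d4-000000000003] "
    "Start sending ks=keyspace1, cf=standard1, estimated_partitions=129, with new rpc",
    "scylla[7188]:  [shard 0] stream_session - [Stream #ff9d2cf1-a686-11e9-93d4-000000000003] "
    "Start sending ks=keyspace1, cf=standard1, estimated_partitions=1, with new rpc s",
]
print(partitions_by_shard(sample))
```

Run over the full log, every shard that appears in the result opened a stream, and the estimates show which shard stands out (129 on shard 2 versus 1 elsewhere in this report).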

@slivne slivne added this to the 3.1 milestone Jul 15, 2019

@slivne slivne added the Backport 3.0 label Jul 15, 2019

asias added a commit to asias/scylla that referenced this issue Jul 16, 2019

streaming: Do not open rpc stream connection if ranges are not relevant to a shard

Given a list of ranges to stream, stream_transfer_task creates a
reader with the ranges and creates an rpc stream connection on all the shards.

When the user provides ranges to repair with the -st and -et options, e.g.,
using scylla-manager, such ranges can belong to only one shard, and repair
passes such ranges to streaming.

As a result, only one shard has data to send while the rpc stream
connections are created on all the shards, which can cause the kernel
to run out of ports on some systems.

To mitigate the problem, do not open the connection if the ranges do not
belong to the shard at all.

Refs: scylladb#4708
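
The idea in this commit message can be illustrated with a small, self-contained sketch. It is not the Scylla code: the sharding scheme here is a toy assumption (fixed-width token chunks dealt to shards round-robin), and `CHUNK_WIDTH`, `shards_owning`, and `shards_to_connect` are hypothetical names chosen for the example. It only shows the shape of the check: a shard opens a connection only if at least one requested range overlaps tokens it owns.

```python
from typing import Iterable, List, Set, Tuple

# Toy sharding model (an assumption for illustration, not Scylla's real algorithm):
# the token space is cut into fixed-width chunks dealt to shards round-robin.
CHUNK_WIDTH = 10

TokenRange = Tuple[int, int]  # (start, end]: start exclusive, end inclusive


def shards_owning(token_range: TokenRange, shard_count: int) -> Set[int]:
    """Return the shards that own at least one token in (start, end]."""
    start, end = token_range
    if end <= start:
        return set()  # empty range; wrap-around ranges are out of scope here
    first_chunk = (start + 1) // CHUNK_WIDTH
    last_chunk = end // CHUNK_WIDTH
    if last_chunk - first_chunk + 1 >= shard_count:
        return set(range(shard_count))  # the range spans every shard
    return {chunk % shard_count for chunk in range(first_chunk, last_chunk + 1)}


def shards_to_connect(ranges: Iterable[TokenRange], shard_count: int) -> List[int]:
    """Shards that should open a stream connection: only those owning some range."""
    relevant: Set[int] = set()
    for token_range in ranges:
        relevant |= shards_owning(token_range, shard_count)
    return sorted(relevant)


# Without the check, all four shards would connect: [0, 1, 2, 3].
# With it, a range confined to one chunk involves a single shard.
print(shards_to_connect([(20, 29)], shard_count=4))  # prints [2]
```

In this model a narrow range such as one passed with -st and -et lands on one or two shards, so the remaining shards never open a connection.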

@slivne

Contributor Author

commented Jul 16, 2019

The pull request has been reviewed by Gleb; waiting for Avi.

avikivity added a commit that referenced this issue Jul 18, 2019

streaming: Do not open rpc stream connection if ranges are not relevant to a shard

Refs: #4708

avikivity added a commit that referenced this issue Jul 21, 2019

streaming: Do not open rpc stream connection if ranges are not relevant to a shard

Refs: #4708
(cherry picked from commit 64a4c0e)

@slivne

Contributor Author

commented Jul 22, 2019

@asias why doesn't this patch fix the issue?

@asias

Contributor

commented Jul 22, 2019

> @asias why doesn't this patch fix the issue

What do you mean? How did you test?

@slivne

Contributor Author

commented Jul 22, 2019

@asias

Contributor

commented Jul 23, 2019

> Your patch referenced the issue but did not fix the issue. Why is that? Is there something else to add? The possibility to reuse rpc streams is not related to this issue.

OK. I thought you wanted to use this issue to track the out-of-ports issue. For this specific repair issue, my patch closes the issue.

@asias

Contributor

commented Jul 23, 2019

I am closing it. The patch 64a4c0e is merged.

@asias asias closed this Jul 23, 2019
