scylla start very slowly, Spend a lot of time for Loading repair history #16774

Closed
zey1996 opened this issue Jan 15, 2024 · 6 comments
@zey1996

zey1996 commented Jan 15, 2024

Installation details
Scylla version: 5.2.11-arm64
Cluster size: 4 Node
OS: CentOS
Hardware details (for performance issues)
Platform: kubernetes containerd
Hardware:
memory=320G
cpu:

Architecture:          aarch64
Byte Order:            Little Endian
CPU(s):                128
On-line CPU(s) list:   0-127
Thread(s) per core:    1
Core(s) per socket:    64
Socket(s):             2
NUMA node(s):          4
Model:                 0
CPU max MHz:           2600.0000
CPU min MHz:           200.0000
BogoMIPS:              200.00
L1d cache:             64K
L1i cache:             64K
L2 cache:              512K
L3 cache:              32768K
NUMA node0 CPU(s):     0-31
NUMA node1 CPU(s):     32-63
NUMA node2 CPU(s):     64-95
NUMA node3 CPU(s):     96-127
Flags:                 fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm jscvt fcma dcpop asimddp asimdfhm

Disks: 4 SSD, raid0


I use scylla-manager for repair, and I set tombstone_gc = {'mode':'repair'} on my table.
The cluster had been running for several days when I decided to do a rolling restart.
I found that Scylla is very slow to start: it takes more than an hour.
I checked the log and found this:
image
It looks like Scylla is loading repair_history.
I then checked system.repair_history and found that the table has millions of records.
I tried reading the Scylla source code but could not find any code that cleans up this table.
I guess the table is used by tombstone GC, but it makes Scylla start too slowly.
How can I fix this? Is there something I can do to clean up this table?

@mykaul
Contributor

mykaul commented Jan 15, 2024

@asias - thoughts?

@MyByte0
Contributor

MyByte0 commented Jan 22, 2024

> @asias - thoughts?

Here are two points I found:

  1. repair_service::load_history() iterates over tables with get_tables_metadata().for_each_table_gently. I replaced it with get_tables_metadata().parallel_for_each_table. Is that OK?
  2. There is no cleanup logic for the repair_history table. Maybe we need to add a TTL for it?
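The difference between visiting tables one at a time and visiting them concurrently can be sketched outside of Scylla. The following is a minimal Python asyncio illustration, not Scylla or Seastar code; `load_table_history`, `load_sequential`, and `load_parallel` are hypothetical names, and the sleep stands in for the per-table I/O.

```python
import asyncio


async def load_table_history(table, delay=0.01):
    # Hypothetical stand-in for reading one table's repair history.
    await asyncio.sleep(delay)
    return table, f"history-of-{table}"


async def load_sequential(tables):
    # One table at a time: total time is about the sum of per-table delays.
    loaded = {}
    for table in tables:
        name, history = await load_table_history(table)
        loaded[name] = history
    return loaded


async def load_parallel(tables):
    # All tables at once: total time is about the slowest single table.
    pairs = await asyncio.gather(*(load_table_history(t) for t in tables))
    return dict(pairs)
```

Both functions return the same mapping; only the waiting overlaps in the parallel version, which is why it helps most when there are many tables and each load is dominated by I/O.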

denesb added a commit that referenced this issue Feb 20, 2024
Using `parallel_for_each_table` instead of `for_each_table_gently` in
`repair_service::load_history`, to reduce bootstrap time.
Using uuid_xor_to_uint32 in repair load_history when dispatching to shards.

Ref: #16774

Closes #16927

* github.com:scylladb/scylladb:
  repair: resolve load_history shard load skew
  repair: accelerate repair load_history time
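To illustrate the shard-skew point, here is a rough Python sketch of folding a 128-bit UUID into 32 bits with XOR and then picking a shard. This is an assumption about what a helper named like uuid_xor_to_uint32 might do, not the actual Scylla implementation; `fold_uuid_to_uint32` and `shard_for` are hypothetical names.

```python
import uuid


def fold_uuid_to_uint32(u: uuid.UUID) -> int:
    # XOR the four 32-bit words of the 128-bit UUID together, so that
    # every part of the UUID influences the result.
    value = u.int
    folded = 0
    for _ in range(4):
        folded ^= value & 0xFFFFFFFF
        value >>= 32
    return folded


def shard_for(u: uuid.UUID, shard_count: int) -> int:
    # Map the folded value onto one of shard_count shards.
    return fold_uuid_to_uint32(u) % shard_count
```

Using all bits of the UUID, rather than only a fixed slice of it, is one way to avoid the case where many UUIDs that share a common portion all land on the same shard.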
@mykaul mykaul added this to the 6.0 milestone Feb 20, 2024
@mykaul
Contributor

mykaul commented Feb 20, 2024

Now that #16927 is in - what's left here?

@zey1996
Author

zey1996 commented Feb 21, 2024

> Now that #16927 is in - what's left here?

#16927
It makes the records load faster, but I think we should also limit the number of records.
#17103
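One possible way to bound the record count, sketched in Python: keep only the newest record per table and token range. This assumes that only the most recent repair time for a range matters to tombstone GC, which this thread does not confirm, and it is not a description of what #17103 does. `RepairRecord` and `keep_latest` are hypothetical names and the fields are illustrative.

```python
from typing import NamedTuple


class RepairRecord(NamedTuple):
    # Illustrative fields only, not the real system.repair_history schema.
    table: str
    range_start: int
    range_end: int
    repair_time: int


def keep_latest(records):
    # Keep the newest record for each (table, range) key and drop the rest.
    latest = {}
    for record in records:
        key = (record.table, record.range_start, record.range_end)
        if key not in latest or record.repair_time > latest[key].repair_time:
            latest[key] = record
    return list(latest.values())
```

With this kind of pruning the number of records is bounded by the number of distinct table and range pairs, instead of growing with every repair run.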

@mykaul
Contributor

mykaul commented Mar 10, 2024

I see there's work still on #17103 - I assume it might miss 6.0, shall I defer this to 6.1?

@mykaul mykaul modified the milestones: 6.0, 6.1 Mar 28, 2024
@mykaul mykaul removed the triage/oss label Mar 28, 2024
@asias
Contributor

asias commented Apr 8, 2024

Fixed by 99b7ccf. Closing.

@asias asias closed this as completed Apr 8, 2024
@mykaul mykaul modified the milestones: 6.1, 6.0 Apr 16, 2024