scrub: repeated snapshot of KS will be created for multiple tables in each KS #8212

Closed
amoskong opened this issue Mar 4, 2021 · 4 comments

amoskong commented Mar 4, 2021

Installation details
Scylla version (or git commit hash): 4.4.rc1-0.20210223.9fc582ee8 , 4.5.dev-0.20210118.faf71c6f7
Cluster size: 1
OS (RHEL/CentOS/Ubuntu/AWS AMI): CentOS7

Description

Currently, scrub creates a snapshot of the keyspace for each table it processes. If the keyspace has multiple tables and the snapshots are created in quick succession, the timestamp-based snapshot name is repeated and the scrub fails.

Test Scenario

  • Create keyspace ks and two test tables
  • Execute nodetool scrub --skip-corrupted ks
CREATE KEYSPACE ks WITH replication = {'class': 'SimpleStrategy', 'replication_factor': '1'}  AND durable_writes = true;

CREATE TABLE ks.cf1 (
    pk text,
    ck int,
    s int,
    v int,
    PRIMARY KEY (pk, ck)
);

CREATE TABLE ks.cf2 (
    pk text,
    ck int,
    s int,
    v int,
    PRIMARY KEY (pk, ck)
);

[amos@amos-centos7 data]$ nodetool scrub --skip-corrupted ks
Using /etc/scylla/scylla.yaml as the config file
WARN  05:59:40,114 Only 19.803GiB free across all data volumes. Consider adding more capacity to your cluster or removing obsolete snapshots
error: Scylla API server HTTP GET to URL '/storage_service/keyspace_scrub/ks' failed: std::runtime_error (Keyspace ks: snapshot pre-scrub-1614837580273 already exists.)
-- StackTrace --
java.lang.IllegalStateException: Scylla API server HTTP GET to URL '/storage_service/keyspace_scrub/ks' failed: std::runtime_error (Keyspace ks: snapshot pre-scrub-1614837580273 already exists.)
	at com.scylladb.jmx.api.APIClient.getException(APIClient.java:140)
	at com.scylladb.jmx.api.APIClient.getRawValue(APIClient.java:187)
	at com.scylladb.jmx.api.APIClient.getRawValue(APIClient.java:201)
	at com.scylladb.jmx.api.APIClient.getIntValue(APIClient.java:238)
	at org.apache.cassandra.service.StorageService.scrub(StorageService.java:1750)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at sun.reflect.misc.Trampoline.invoke(MethodUtil.java:72)
	at sun.reflect.GeneratedMethodAccessor2.invoke(Unknown Source)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at sun.reflect.misc.MethodUtil.invoke(MethodUtil.java:276)
	at com.sun.jmx.mbeanserver.StandardMBeanIntrospector.invokeM2(StandardMBeanIntrospector.java:112)
	at com.sun.jmx.mbeanserver.StandardMBeanIntrospector.invokeM2(StandardMBeanIntrospector.java:46)
	at com.sun.jmx.mbeanserver.MBeanIntrospector.invokeM(MBeanIntrospector.java:237)
	at com.sun.jmx.mbeanserver.PerInterface.invoke(PerInterface.java:138)
	at com.sun.jmx.mbeanserver.MBeanSupport.invoke(MBeanSupport.java:252)
	at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.invoke(DefaultMBeanServerInterceptor.java:819)
	at com.sun.jmx.mbeanserver.JmxMBeanServer.invoke(JmxMBeanServer.java:801)
	at com.scylladb.jmx.utils.APIMBeanServer.invoke(APIMBeanServer.java:188)
	at javax.management.remote.rmi.RMIConnectionImpl.doOperation(RMIConnectionImpl.java:1468)
	at javax.management.remote.rmi.RMIConnectionImpl.access$300(RMIConnectionImpl.java:76)
	at javax.management.remote.rmi.RMIConnectionImpl$PrivilegedOperation.run(RMIConnectionImpl.java:1309)
	at javax.management.remote.rmi.RMIConnectionImpl.doPrivilegedOperation(RMIConnectionImpl.java:1401)
	at javax.management.remote.rmi.RMIConnectionImpl.invoke(RMIConnectionImpl.java:829)
	at sun.reflect.GeneratedMethodAccessor4.invoke(Unknown Source)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at sun.rmi.server.UnicastServerRef.dispatch(UnicastServerRef.java:357)
	at sun.rmi.transport.Transport$1.run(Transport.java:200)
	at sun.rmi.transport.Transport$1.run(Transport.java:197)
	at java.security.AccessController.doPrivileged(Native Method)
	at sun.rmi.transport.Transport.serviceCall(Transport.java:196)
	at sun.rmi.transport.tcp.TCPTransport.handleMessages(TCPTransport.java:573)
	at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run0(TCPTransport.java:834)
	at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.lambda$run$0(TCPTransport.java:688)
	at java.security.AccessController.doPrivileged(Native Method)
	at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run(TCPTransport.java:687)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)

[amos@amos-centos7 data]$ echo $?
2

/Cc @roydahan @juliayakovlev

amoskong commented Mar 4, 2021

The scrub code was introduced in ef1bdeb by @elcallio and was recently modified by @xemul.

api/storage_service.cc

    ss::scrub.set(r, wrap_ks_cf(ctx, [&snap_ctl] (http_context& ctx, std::unique_ptr<request> req, sstring keyspace, std::vector<sstring> column_families) {
        const auto skip_corrupted = req_param<bool>(*req, "skip_corrupted", false);

        auto f = make_ready_future<>();
        if (!req_param<bool>(*req, "disable_snapshot", false)) {
            auto tag = format("pre-scrub-{:d}", db_clock::now().time_since_epoch().count());
            f = parallel_for_each(column_families, [&snap_ctl, keyspace, tag](sstring cf) {
                return snap_ctl.local().take_column_family_snapshot(keyspace, cf, tag);
            });
        }

We can fix the problem by adding the table name to the tag.
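To make the collision concrete, here is a minimal Python sketch of the idea. It is a toy model, not Scylla's code: `shared_tag`, `per_table_tag`, and `take_snapshots` are hypothetical names, and the keyspace-wide uniqueness check is reduced to a set lookup. It assumes the failure comes from every table in one scrub call reusing the same tag.

```python
def shared_tag(now_ms):
    # Mirrors the excerpt above: one tag is computed and reused for every table.
    return f"pre-scrub-{now_ms}"


def per_table_tag(table, now_ms):
    # Hypothetical variant of the proposed fix: the table name is part of the tag.
    return f"pre-scrub-{table}-{now_ms}"


def take_snapshots(tables, make_tag):
    """Toy model of a keyspace-wide uniqueness check on snapshot names."""
    existing = set()  # snapshot names already present in the keyspace
    for table in tables:
        tag = make_tag(table)
        if tag in existing:
            raise RuntimeError(f"Keyspace ks: snapshot {tag} already exists.")
        existing.add(tag)
    return sorted(existing)
```

With the shared tag, `take_snapshots(["cf1", "cf2"], lambda t: shared_tag(1614837580273))` raises on the second table, while passing `lambda t: per_table_tag(t, 1614837580273)` yields one distinct name per table.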

amoskong commented Mar 5, 2021

The snapshots are created inside $table-name/snapshots/ (e.g. node1/data/ks/cf-8b1b0a107dc011eb956c000000000000/snapshots/).

However, Scylla does not allow duplicate snapshot names within one keyspace, even when the snapshots are in different table directories.

Workaround

The issue can be worked around by adding '--no-snapshot' (nodetool scrub --no-snapshot ks), or by scrubbing tables one at a time (nodetool scrub ks cf).

elcallio pushed a commit to elcallio/scylla that referenced this issue Mar 9, 2021
Fixes scylladb#8212

Some snapshotting operations call in on a single table at a time.
When checking for existing snapshots in this case, we should not
bother with snapshots in other tables. Add an optional "filter"
to check routine, which if non-empty includes tables to check.

Use case is "scrub" which calls with a limited set of tables
to snapshot.
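
The optional filter described in this commit message can be sketched roughly as follows. This is an illustrative Python model only: `snapshot_exists`, `snapshots_by_table`, and `table_filter` are hypothetical names, not the identifiers used in the actual patch, and snapshot storage is reduced to a dict of sets.

```python
def snapshot_exists(snapshots_by_table, tag, table_filter=()):
    """Return True if `tag` is already used by one of the tables being checked.

    snapshots_by_table: dict mapping table name -> set of snapshot tags.
    table_filter: hypothetical optional filter; when non-empty, only the
    listed tables are checked, otherwise every table in the keyspace is.
    """
    tables = table_filter if table_filter else snapshots_by_table.keys()
    return any(tag in snapshots_by_table.get(t, set()) for t in tables)


# cf1 was already snapshotted with the shared tag; cf2 was not.
state = {"cf1": {"pre-scrub-1614837580273"}, "cf2": set()}
```

Without a filter, `snapshot_exists(state, "pre-scrub-1614837580273")` reports a conflict for the whole keyspace. With `table_filter=["cf2"]` it does not, so in this model the snapshot of cf2 could proceed under the same tag.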

slivne commented Mar 22, 2021

@avikivity this needs to be backported to 4.4.1+

avikivity pushed a commit that referenced this issue Sep 12, 2021
Fixes #8212

Some snapshotting operations call in on a single table at a time.
When checking for existing snapshots in this case, we should not
bother with snapshots in other tables. Add an optional "filter"
to check routine, which if non-empty includes tables to check.

Use case is "scrub" which calls with a limited set of tables
to snapshot.

Closes #8240

(cherry picked from commit f44420f)
@avikivity

Backported to 4.3, 4.4 (already fixed in 4.5).
