Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

simple_kill_streaming_node_while_bootstrapping_test: Startup failed: std::runtime_error ({shard 0: fmt::v6::format_error (invalid type specifier), shard 1: fmt::v6::format_error (invalid type specifier)})']) #6874

Closed
bhalevy opened this issue Jul 19, 2020 · 16 comments
Assignees

Comments

@bhalevy
Copy link
Member

bhalevy commented Jul 19, 2020

Seen several time recently in next runs:
https://jenkins.scylladb.com/view/master/job/scylla-master/job/next/2255/testReport/update_cluster_layout_tests/TestUpdateClusterLayout/dtest___simple_kill_streaming_node_while_bootstrapping_test/
https://jenkins.scylladb.com/view/master/job/scylla-master/job/next/2258/testReport/update_cluster_layout_tests/TestUpdateClusterLayout/dtest___simple_kill_streaming_node_while_bootstrapping_test/
https://jenkins.scylladb.com/view/master/job/scylla-master/job/next/2262/testReport/update_cluster_layout_tests/TestUpdateClusterLayout/dtest___simple_kill_streaming_node_while_bootstrapping_test/

Unexpected errors found: [('node4', ['ERROR 2020-07-19 10:22:46,706 [shard 0] init - Startup failed: std::runtime_error ({shard 0: fmt::v6::format_error (invalid type specifier), shard 1: fmt::v6::format_error (invalid type specifier)})'])]

https://jenkins.scylladb.com/view/master/job/scylla-master/job/next/2262/artifact/logs-release.2/1595154166737_update_cluster_layout_tests.TestUpdateClusterLayout.simple_kill_streaming_node_while_bootstrapping_test/node4.log

...
WARN  2020-07-19 10:22:44,017 [shard 0] repair - repair id [id=4, uuid=0b1092a1-061f-4691-b0ac-547b281ef09d] on shard 0 failed: fmt::v6::format_error (invalid type specifier)
...
WARN  2020-07-19 10:22:46,678 [shard 1] repair - repair id [id=4, uuid=0b1092a1-061f-4691-b0ac-547b281ef09d] on shard 1 failed - 1001 ranges failed
WARN  2020-07-19 10:22:46,678 [shard 1] repair - repair id [id=4, uuid=0b1092a1-061f-4691-b0ac-547b281ef09d] on shard 1 failed: fmt::v6::format_error (invalid type specifier)
INFO  2020-07-19 10:22:46,678 [shard 1] compaction - [Compact ks.cf ca2c5360-c9a9-11ea-91d4-000000000001] Compacting [/jenkins/workspace/scylla-master/next/scylla/.dtest/dtest-9ed_plr4/test/node4/data/ks/cf-8a0b8c10c9a911ea859a000000000001/mc-713-big-Data.db:level=0, /jenkins/workspace/scylla-master/next/scylla/.dtest/dtest-9ed_plr4/test/node4/data/ks/cf-8a0b8c10c9a911ea859a000000000001/mc-715-big-Data.db:level=0, /jenkins/workspace/scylla-master/next/scylla/.dtest/dtest-9ed_plr4/test/node4/data/ks/cf-8a0b8c10c9a911ea859a000000000001/mc-737-big-Data.db:level=0, /jenkins/workspace/scylla-master/next/scylla/.dtest/dtest-9ed_plr4/test/node4/data/ks/cf-8a0b8c10c9a911ea859a000000000001/mc-735-big-Data.db:level=0, /jenkins/workspace/scylla-master/next/scylla/.dtest/dtest-9ed_plr4/test/node4/data/ks/cf-8a0b8c10c9a911ea859a000000000001/mc-709-big-Data.db:level=0, /jenkins/workspace/scylla-master/next/scylla/.dtest/dtest-9ed_plr4/test/node4/data/ks/cf-8a0b8c10c9a911ea859a000000000001/mc-711-big-Data.db:level=0, /jenkins/workspace/scylla-master/next/scylla/.dtest/dtest-9ed_plr4/test/node4/data/ks/cf-8a0b8c10c9a911ea859a000000000001/mc-717-big-Data.db:level=0, /jenkins/workspace/scylla-master/next/scylla/.dtest/dtest-9ed_plr4/test/node4/data/ks/cf-8a0b8c10c9a911ea859a000000000001/mc-719-big-Data.db:level=0, /jenkins/workspace/scylla-master/next/scylla/.dtest/dtest-9ed_plr4/test/node4/data/ks/cf-8a0b8c10c9a911ea859a000000000001/mc-721-big-Data.db:level=0, /jenkins/workspace/scylla-master/next/scylla/.dtest/dtest-9ed_plr4/test/node4/data/ks/cf-8a0b8c10c9a911ea859a000000000001/mc-725-big-Data.db:level=0, /jenkins/workspace/scylla-master/next/scylla/.dtest/dtest-9ed_plr4/test/node4/data/ks/cf-8a0b8c10c9a911ea859a000000000001/mc-723-big-Data.db:level=0, /jenkins/workspace/scylla-master/next/scylla/.dtest/dtest-9ed_plr4/test/node4/data/ks/cf-8a0b8c10c9a911ea859a000000000001/mc-729-big-Data.db:level=0, /jenkins/workspace/scylla-master/next/scylla/.dtest/dtest-9ed_plr4/test/node4/data/ks/cf-8a0b8c10c9a911ea859a000000000001/mc-727-big-Data.db:level=0, /jenkins/workspace/scylla-master/next/scylla/.dtest/dtest-9ed_plr4/test/node4/data/ks/cf-8a0b8c10c9a911ea859a000000000001/mc-733-big-Data.db:level=0, /jenkins/workspace/scylla-master/next/scylla/.dtest/dtest-9ed_plr4/test/node4/data/ks/cf-8a0b8c10c9a911ea859a000000000001/mc-731-big-Data.db:level=0, /jenkins/workspace/scylla-master/next/scylla/.dtest/dtest-9ed_plr4/test/node4/data/ks/cf-8a0b8c10c9a911ea859a000000000001/mc-707-big-Data.db:level=0, ]
WARN  2020-07-19 10:22:46,678 [shard 0] repair - repair id [id=4, uuid=0b1092a1-061f-4691-b0ac-547b281ef09d] failed: std::runtime_error ({shard 0: fmt::v6::format_error (invalid type specifier), shard 1: fmt::v6::format_error (invalid type specifier)})
WARN  2020-07-19 10:22:46,678 [shard 0] repair - repair id [id=4, uuid=0b1092a1-061f-4691-b0ac-547b281ef09d] to sync data for keyspace=ks, status=failed: std::runtime_error ({shard 0: fmt::v6::format_error (invalid type specifier), shard 1: fmt::v6::format_error (invalid type specifier)})
INFO  2020-07-19 10:22:46,679 [shard 0] init - Shutting down gossiping
INFO  2020-07-19 10:22:46,679 [shard 0] gossip - My status = BOOT
WARN  2020-07-19 10:22:46,679 [shard 0] gossip - No local state or state is in silent shutdown, not announcing shutdown
INFO  2020-07-19 10:22:46,679 [shard 0] gossip - Disable and wait for gossip loop started
INFO  2020-07-19 10:22:46,679 [shard 0] gossip - Disable and wait for gossip loop finished
INFO  2020-07-19 10:22:46,679 [shard 0] gossip - Gossip is now stopped
INFO  2020-07-19 10:22:46,679 [shard 0] init - Shutting down gossiping was successful
INFO  2020-07-19 10:22:46,679 [shard 0] init - Shutting down storage service notifications
INFO  2020-07-19 10:22:46,679 [shard 0] init - Shutting down storage service notifications was successful
INFO  2020-07-19 10:22:46,679 [shard 0] init - Shutting down repair service
INFO  2020-07-19 10:22:46,679 [shard 0] init - Shutting down repair service was successful
INFO  2020-07-19 10:22:46,679 [shard 0] init - Shutting down cdc
INFO  2020-07-19 10:22:46,679 [shard 0] init - Shutting down cdc was successful
INFO  2020-07-19 10:22:46,679 [shard 0] init - Shutting down sstables format selector
INFO  2020-07-19 10:22:46,679 [shard 0] init - Shutting down sstables format selector was successful
INFO  2020-07-19 10:22:46,679 [shard 0] init - Shutting down migration manager
INFO  2020-07-19 10:22:46,679 [shard 0] migration_manager - stopping migration service
INFO  2020-07-19 10:22:46,679 [shard 1] migration_manager - stopping migration service
INFO  2020-07-19 10:22:46,679 [shard 0] init - Shutting down migration manager was successful
INFO  2020-07-19 10:22:46,679 [shard 0] init - Shutting down database
INFO  2020-07-19 10:22:46,679 [shard 0] compaction_manager - Asked to stop
INFO  2020-07-19 10:22:46,679 [shard 0] compaction_manager - Stopping 0 ongoing compactions
INFO  2020-07-19 10:22:46,679 [shard 0] compaction_manager - Stopped
INFO  2020-07-19 10:22:46,679 [shard 1] compaction_manager - Asked to stop
INFO  2020-07-19 10:22:46,679 [shard 1] compaction_manager - Stopping 1 ongoing compactions
INFO  2020-07-19 10:22:46,680 [shard 1] compaction_manager - compaction info: Compaction for ks/cf was stopped due to: shutdown: stopping
INFO  2020-07-19 10:22:46,680 [shard 1] compaction_manager - Stopped
INFO  2020-07-19 10:22:46,705 [shard 0] init - Shutting down database: waiting for background jobs...
INFO  2020-07-19 10:22:46,705 [shard 0] init - Shutting down database was successful
INFO  2020-07-19 10:22:46,706 [shard 0] init - Shutting down API server
INFO  2020-07-19 10:22:46,706 [shard 0] init - Shutting down API server was successful
INFO  2020-07-19 10:22:46,706 [shard 0] init - Shutting down migration manager notifier
INFO  2020-07-19 10:22:46,706 [shard 0] init - Shutting down migration manager notifier was successful
INFO  2020-07-19 10:22:46,706 [shard 0] init - Shutting down prometheus API server
INFO  2020-07-19 10:22:46,706 [shard 0] init - Shutting down prometheus API server was successful
INFO  2020-07-19 10:22:46,706 [shard 0] init - Shutting down sighup
INFO  2020-07-19 10:22:46,706 [shard 0] init - Shutting down sighup was successful
ERROR 2020-07-19 10:22:46,706 [shard 0] init - Startup failed: std::runtime_error ({shard 0: fmt::v6::format_error (invalid type specifier), shard 1: fmt::v6::format_error (invalid type specifier)})
@asias
Copy link
Contributor

asias commented Jul 20, 2020

I will take a look.

@asias
Copy link
Contributor

asias commented Jul 20, 2020

I could not reproduce with the exact scylla version. I am building with the dbuild. The fmt version it uses: fmt-devel-6.2.1-1.fc32.x86_64.

I suspect it is libfmt related.

@asias
Copy link
Contributor

asias commented Jul 20, 2020

The line in question was:

(id is of type repair_uniq_id)

 383         }).handle_exception([this, id] (std::exception_ptr ep) {
 384             rlogger.warn("repair id {} failed: {}", id, ep);

The print function for the repair_uniq_id is:

std::ostream& operator<<(std::ostream& os, const repair_uniq_id& x) {
    return os << format("[id={}, uuid={}]", x.id, x.uuid);
}

@avikivity Do you know if there is any limitation on recursive format() usage?

@bhalevy
Copy link
Member Author

bhalevy commented Jul 20, 2020

@asias we don't use format in seastar logger. See logger::really_do_log

@bhalevy
Copy link
Member Author

bhalevy commented Jul 20, 2020

Also, the following log message indicates that repair_uniq_id is printed just fine, no?

WARN  2020-07-19 10:22:44,017 [shard 0] repair - repair id [id=4, uuid=0b1092a1-061f-4691-b0ac-547b281ef09d] on shard 0 failed: fmt::v6::format_error (invalid type specifier)

@asias
Copy link
Contributor

asias commented Jul 20, 2020

Also, the following log message indicates that repair_uniq_id is printed just fine, no?

WARN  2020-07-19 10:22:44,017 [shard 0] repair - repair id [id=4, uuid=0b1092a1-061f-4691-b0ac-547b281ef09d] on shard 0 failed: fmt::v6::format_error (invalid type specifier)

You are correct. The exception comes from somewhere else.

@asias
Copy link
Contributor

asias commented Jul 20, 2020

@bhalevy Do you have a recent "good" commit?

@asias
Copy link
Contributor

asias commented Jul 20, 2020

I think I know what is problem now:

throw std::runtime_error(format("repair id {:d} on shard {:d} failed to repair {:d} sub ranges", id, shard, nr_failed_ranges));

@bhalevy
Copy link
Member Author

bhalevy commented Jul 20, 2020

I think I know what is problem now:

throw std::runtime_error(format("repair id {:d} on shard {:d} failed to repair {:d} sub ranges", id, shard, nr_failed_ranges));

yes, I just saw it too...

@bhalevy
Copy link
Member Author

bhalevy commented Jul 20, 2020

in repair_info::check_failed_ranges right?

@bhalevy
Copy link
Member Author

bhalevy commented Jul 20, 2020

I'd leave the format specifiers out when not required....

@asias
Copy link
Contributor

asias commented Jul 20, 2020

Yes, I am working on a patch.

asias added a commit to asias/scylla that referenced this issue Jul 20, 2020
We recently saw a weird log message:

   WARN  2020-07-19 10:22:46,678 [shard 0] repair - repair id [id=4,
   uuid=0b1092a1-061f-4691-b0ac-547b281ef09d] failed: std::runtime_error
   ({shard 0: fmt::v6::format_error (invalid type specifier), shard 1:
   fmt::v6::format_error (invalid type specifier)})

It turned out we have:

   throw std::runtime_error(format("repair id {:d} on shard {:d} failed to
   repair {:d} sub ranges", id, shard, nr_failed_ranges));

in the code, but we changed the id from integer to repair_uniq_id class.

We do not really need to specify the format specifiers for numbers.

Fixes scylladb#6874
@asias
Copy link
Contributor

asias commented Jul 20, 2020

Patch sent: #6880

@avikivity
Copy link
Member

No vulnerable branches.

@bhalevy
Copy link
Member Author

bhalevy commented Jul 21, 2020

Should be backported along with #6786 if the latter is ever backported.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants