scylla-jmx.service: is still failed during artifact tests after #206 fix - java-select fail to parse java version output #212

Closed
temichus opened this issue Apr 25, 2023 · 16 comments

@temichus

temichus commented Apr 25, 2023

scylla-jmx.service still fails during artifact tests after the #206 fix, with the same error message:

2023-04-25 05:36:57.472: (TestFrameworkEvent Severity.ERROR) period_type=one-time event_id=86afdf2c-607b-4467-8334-0613a0f2e28e, source=ArtifactsTest.SetUp()
exception=Encountered a bad command exit code!
Command: '/usr/bin/nodetool  status '
Exit code: 1
Stdout:
Stderr:
nodetool: Failed to connect to '127.0.0.1:7199' - ConnectException: 'Connection refused (Connection refused)'.

from events log:

2023-04-25 05:27:21.937: (InfoEvent Severity.NORMAL) period_type=not-set event_id=bf11c34e-c95e-4964-9390-a7e667e79799: message=TEST_START test_id=dc7ec884-8eef-427e-88e0-fc5d10d5798c
2023-04-25 05:32:14.021: (ScyllaYamlUpdateEvent Severity.NORMAL) period_type=one-time event_id=d48d849b-4390-44da-8d22-cdeda3b50bc7: message=ScyllaYaml has been changed on node: artifacts-rocky8-jenkins-db-node-dc7ec884-0-1. Diff: --- 
+++ 
@@ -1,28 +1,35 @@
+alternator_enforce_authorization: false
 api_address: 127.0.0.1
 api_doc_dir: /opt/scylladb/api/api-doc/
 api_port: 10000
 api_ui_dir: /opt/scylladb/swagger-ui/dist/
+auto_bootstrap: true
 batch_size_fail_threshold_in_kb: 1024
 batch_size_warn_threshold_in_kb: 128
 cas_contention_timeout_in_ms: 1000
+cluster_name: artifacts-rocky8-jenkins-db-cluster-dc7ec884
 commitlog_segment_size_in_mb: 32
 commitlog_sync: periodic
 commitlog_sync_period_in_ms: 10000
 commitlog_total_space_in_mb: -1
 consistent_cluster_management: true
+enable_ipv6_dns_lookup: false
 endpoint_snitch: org.apache.cassandra.locator.SimpleSnitch
+experimental: true
 force_schema_commit_log: true
-listen_address: localhost
+hinted_handoff_enabled: true
+listen_address: 10.142.0.121
 murmur3_partitioner_ignore_msb_bits: 12
 native_shard_aware_transport_port: 19042
 native_transport_port: 9042
 num_tokens: 256
 partitioner: org.apache.cassandra.dht.Murmur3Partitioner
+prometheus_address: 0.0.0.0
 read_request_timeout_in_ms: 5000
-rpc_address: localhost
+rpc_address: 10.142.0.121
 rpc_port: 9160
 seed_provider:
 - class_name: org.apache.cassandra.locator.SimpleSeedProvider
   parameters:
-  - seeds: 127.0.0.1
+  - seeds: 10.142.0.121
 write_request_timeout_in_ms: 2000
2023-04-25 05:34:51.797: (ScyllaServerStatusEvent Severity.NORMAL) period_type=begin event_id=2dd846e0-c1f9-4a4e-9076-03948c3a01cb node=artifacts-rocky8-jenkins-db-node-dc7ec884-0-1
2023-04-25 05:34:51.883 <2023-04-25 05:34:51.482>: (DatabaseLogEvent Severity.WARNING) period_type=one-time event_id=20614a9e-2c79-42ec-96d2-b1111fd00228: type=WARNING regex=(^WARNING|!\s*?WARNING).*\[shard.*\] line_number=31 node=artifacts-rocky8-jenkins-db-node-dc7ec884-0-1
2023-04-25T05:34:51.482 artifacts-rocky8-jenkins-db-node-dc7ec884-0-1 !WARNING | scylla[66824]:  [shard 0] seastar - Unable to set SCHED_FIFO scheduling policy for timer thread; latency impact possible. Try adding CAP_SYS_NICE
2023-04-25 05:34:52.447 <2023-04-25 05:34:52.099>: (DatabaseLogEvent Severity.WARNING) period_type=one-time event_id=20614a9e-2c79-42ec-96d2-b1111fd00228: type=WARNING regex=(^WARNING|!\s*?WARNING).*\[shard.*\] line_number=486 node=artifacts-rocky8-jenkins-db-node-dc7ec884-0-1
2023-04-25T05:34:52.099 artifacts-rocky8-jenkins-db-node-dc7ec884-0-1 !WARNING | scylla[66824]:  [shard 0] gossip - All nodes={} are down for get_endpoint_states verb. Skip ShadowRound.
2023-04-25 05:36:36.424: (ClusterHealthValidatorEvent Severity.WARNING) period_type=one-time event_id=c26613d5-1309-472d-bf91-1b8964822cd0: type=NodeStatus node=artifacts-rocky8-jenkins-db-node-dc7ec884-0-1 message=Unable to get nodetool status from `artifacts-rocky8-jenkins-db-node-dc7ec884-0-1': error=<UnexpectedExit: cmd='/usr/bin/nodetool  status ' exited=1>
2023-04-25 05:36:57.472: (TestFrameworkEvent Severity.ERROR) period_type=one-time event_id=86afdf2c-607b-4467-8334-0613a0f2e28e, source=ArtifactsTest.SetUp()
exception=Encountered a bad command exit code!

Command: '/usr/bin/nodetool  status '

Exit code: 1

Stdout:



Stderr:

nodetool: Failed to connect to '127.0.0.1:7199' - ConnectException: 'Connection refused (Connection refused)'.
2023-04-25 05:36:57.524: (InfoEvent Severity.NORMAL) period_type=not-set event_id=519ea97e-5011-444d-a87a-3ef645bceb5f: message=TEST_END

job urls:
https://jenkins.scylladb.com/job/scylla-master/job/artifacts-offline-install/job/artifacts-rocky8-test/198/
https://jenkins.scylladb.com/job/scylla-master/job/artifacts/job/artifacts-rocky8-test/213/
https://jenkins.scylladb.com/job/scylla-master/job/artifacts-offline-install/job/artifacts-oel81-test/187/
https://jenkins.scylladb.com/job/scylla-master/job/artifacts-offline-install/job/artifacts-oel76-test/186/

@temichus
Author

cc @fruch

@mykaul
Contributor

mykaul commented Apr 25, 2023

Did the JMX process crash perhaps?

@temichus
Author

Did the JMX process crash perhaps?

probably

● scylla-jmx.service - Scylla JMX
   Loaded: loaded (/usr/lib/systemd/system/scylla-jmx.service; enabled; vendor preset: disabled)
   Active: failed (Result: exit-code) since Tue 2023-04-25 05:34:52 UTC; 2min 46s ago
 Main PID: 66834 (code=exited, status=1/FAILURE)

Apr 25 05:34:52 artifacts-rocky8-jenkins-db-node-dc7ec884-0-1 systemd[1]: Started Scylla JMX.
Apr 25 05:34:52 artifacts-rocky8-jenkins-db-node-dc7ec884-0-1 systemd[1]: scylla-jmx.service: Main process exited, code=exited, status=1/FAILURE
Apr 25 05:34:52 artifacts-rocky8-jenkins-db-node-dc7ec884-0-1 systemd[1]: scylla-jmx.service: Failed with result 'exit-code'.

@temichus
Author

https://jenkins.scylladb.com/job/scylla-master/job/artifacts-offline-install/job/artifacts-rocky8-nonroot-test/182/ - has no logs, but I believe it is the same issue:

RetryError[Wait for: jmx_up: timeout - 200 seconds - expired]

@DoronArazii

@tchaikov can you please have a look

@tchaikov
Contributor

sure. Will take a look early tomorrow.

@tchaikov
Contributor

tchaikov commented Apr 28, 2023

Quoting artifacts-rocky8-test/scylla-cluster-tests/unit_tests/test_data/system.log from one of the artifact tarballs collected by Jenkins, where the build id was 0a6bcf20fedb57959f501fc3caba2c4e61eacbce:

[10.0.73.70] [stdout] Apr 02 11:24:16 notice | scylla[124]: scylla-server.service: control process exited, code=exited status=1
[10.0.73.70] [stdout] Apr 02 11:24:16 err    | scylla[124]: Failed to start Scylla Server.
[10.0.73.70] [stdout] Apr 02 11:24:16 warning| scylla[124]: Dependency failed for Scylla JMX.
[10.0.73.70] [stdout] Apr 02 11:24:16 notice | scylla[124]: Job scylla-jmx.service/start failed with result 'dependency'.
[10.0.73.70] [stdout] Apr 02 11:24:16 notice | scylla[124]: Unit scylla-server.service entered failed state.

the scylla-jmx service unit failed to start because it depends on "scylla-server.service", see

Requires=scylla-server.service
After=scylla-server.service

in the very same system.log, i have the last words from scylladb:

[10.0.73.70] [stdout] Apr 02 17:31:17 info   | scylla[124]: Segmentation fault on shard 11.
[10.0.73.70] [stdout] Apr 02 17:31:17 info   | scylla[124]: Backtrace:
[10.0.73.70] [stdout] Apr 02 17:31:17 info   | scylla[124]: 0x00000000006c5af2
[10.0.73.70] [stdout] Apr 02 17:31:17 info   | scylla[124]: 0x00000000005d41ac
[10.0.73.70] [stdout] Apr 02 17:31:17 info   | scylla[124]: 0x00000000005d4455
[10.0.73.70] [stdout] Apr 02 17:31:17 info   | scylla[124]: 0x00000000005d44a3
[10.0.73.70] [stdout] Apr 02 17:31:17 info   | scylla[124]: /lib64/libpthread.so.0+0x000000000000f5cf
[10.0.73.70] [stdout] Apr 02 17:31:17 info   | scylla[124]: 0x0000000001c98946
[10.0.73.70] [stdout] Apr 02 17:31:17 info   | scylla[124]: 0x0000000001cd0761
[10.0.73.70] [stdout] Apr 02 17:31:17 info   | scylla[124]: 0x00000000013d58d2
[10.0.73.70] [stdout] Apr 02 17:31:17 info   | scylla[124]: 0x00000000005d8a2b
[10.0.73.70] [stdout] Apr 02 17:31:17 info   | scylla[124]: 0x00000000005b318b
[10.0.73.70] [stdout] Apr 02 17:31:17 info   | scylla[124]: 0x00000000005b0314
[10.0.73.70] [stdout] Apr 02 17:31:17 info   | scylla[124]: 0x000000000068795e
[10.0.73.70] [stdout] Apr 02 17:31:17 info   | scylla[124]: 0x000000000068c1fa
[10.0.73.70] [stdout] Apr 02 17:31:17 info   | scylla[124]: 0x000000000077535d

which is:

[10.0.73.70] [stdout] Apr 02 17:31:17 info   | scylla[124]: Backtrace:[10.0.73.70] [stdout] Apr 02 17:31:17 info   | scylla[124]:
[Backtrace #0]
?? ??:0
?? ??:0
?? ??:0
?? ??:0
__pthread_cond_timedwait at :?
void seastar::fragmented_memory_input_stream<bytes_ostream::fragment_iterator>::for_each_fragment<seastar::fragmented_memory_input_stream<bytes_ostream::fragment_iterator>::skip(unsigned long)::{lambda(auto:1)#1}>(unsigned long, seastar::fragmented_memory_input_stream<bytes_ostream::fragment_iterator>::skip(unsigned long)::{lambda(auto:1)#1}&&) at ././seastar/include/seastar/core/simple-stream.hh:390
 (inlined by) seastar::fragmented_memory_input_stream<bytes_ostream::fragment_iterator>::skip(unsigned long) at ././seastar/include/seastar/core/simple-stream.hh:414
 (inlined by) operator()<seastar::fragmented_memory_input_stream<bytes_ostream::fragment_iterator> > at ./build/release/gen/idl/mutation.dist.impl.hh:386
 (inlined by) decltype(auto) seastar::with_serialized_stream<seastar::fragmented_memory_input_stream<bytes_ostream::fragment_iterator>, ser::serializer<ser::live_cell_view>::skip<seastar::fragmented_memory_input_stream<bytes_ostream::fragment_iterator> >(seastar::fragmented_memory_input_stream<bytes_ostream::fragment_iterator>&)::{lambda(auto:1&)#1}, void, void>(seastar::fragmented_memory_input_stream<bytes_ostream::fragment_iterator>&, ser::serializer<ser::live_cell_view>::skip<seastar::fragmented_memory_input_stream<bytes_ostream::fragment_iterator> >(seastar::fragmented_memory_input_stream<bytes_ostream::fragment_iterator>&)::{lambda(auto:1&)#1}&&) at ././seastar/include/seastar/core/simple-stream.hh:638
 (inlined by) void ser::serializer<ser::live_cell_view>::skip<seastar::fragmented_memory_input_stream<bytes_ostream::fragment_iterator> >(seastar::fragmented_memory_input_stream<bytes_ostream::fragment_iterator>&) at ./build/release/gen/idl/mutation.dist.impl.hh:385
 (inlined by) operator()<seastar::fragmented_memory_input_stream<bytes_ostream::fragment_iterator> > at ./build/release/gen/idl/mutation.dist.impl.hh:375
 (inlined by) decltype(auto) seastar::with_serialized_stream<seastar::fragmented_memory_input_stream<bytes_ostream::fragment_iterator>, ser::serializer<ser::live_cell_view>::read<seastar::fragmented_memory_input_stream<bytes_ostream::fragment_iterator> >(seastar::fragmented_memory_input_stream<bytes_ostream::fragment_iterator>&)::{lambda(auto:1&)#1}, void, void>(seastar::fragmented_memory_input_stream<bytes_ostream::fragment_iterator>&, ser::serializer<ser::live_cell_view>::read<seastar::fragmented_memory_input_stream<bytes_ostream::fragment_iterator> >(seastar::fragmented_memory_input_stream<bytes_ostream::fragment_iterator>&)::{lambda(auto:1&)#1}&&) at ././seastar/include/seastar/core/simple-stream.hh:638
 (inlined by) ser::live_cell_view ser::serializer<ser::live_cell_view>::read<seastar::fragmented_memory_input_stream<bytes_ostream::fragment_iterator> >(seastar::fragmented_memory_input_stream<bytes_ostream::fragment_iterator>&) at ./build/release/gen/idl/mutation.dist.impl.hh:372
 (inlined by) auto ser::deserialize<ser::live_cell_view, seastar::fragmented_memory_input_stream<bytes_ostream::fragment_iterator> >(seastar::fragmented_memory_input_stream<bytes_ostream::fragment_iterator>&, boost::type<ser::live_cell_view>) at ././serializer.hh:261
 (inlined by) operator()<const seastar::fragmented_memory_input_stream<bytes_ostream::fragment_iterator> > at ./build/release/gen/idl/mutation.dist.impl.hh:702
 (inlined by) decltype(auto) seastar::memory_input_stream<bytes_ostream::fragment_iterator>::with_stream<ser::expiring_cell_view::c() const::{lambda(auto:1&)#1}>(ser::expiring_cell_view::c() const::{lambda(auto:1&)#1}&&) const at ././seastar/include/seastar/core/simple-stream.hh:486
 (inlined by) decltype(auto) seastar::with_serialized_stream<seastar::memory_input_stream<bytes_ostream::fragment_iterator> const, ser::expiring_cell_view::c() const::{lambda(auto:1&)#1}, void>(seastar::memory_input_stream<bytes_ostream::fragment_iterator> const&, ser::expiring_cell_view::c() const::{lambda(auto:1&)#1}&&) at ././seastar/include/seastar/core/simple-stream.hh:631
 (inlined by) ser::expiring_cell_view::c() const at ./build/release/gen/idl/mutation.dist.impl.hh:696
 (inlined by) operator() at ./mutation/mutation_partition_view.cc:52
 (inlined by) _ZN5boost6detail7variant14invoke_visitorIKZN12_GLOBAL__N_116read_atomic_cellERK13abstract_typeNS_7variantIN3ser14live_cell_viewEJNS8_18expiring_cell_viewENS8_14dead_cell_viewENS8_17counter_cell_viewENS8_20unknown_variant_typeEEEEN7seastar10bool_classIN11atomic_cell21collection_member_tagEEEE19atomic_cell_visitorLb0EE14internal_visitIRSA_EENS_12disable_if_cIXaaLb0Esr7is_sameIT_SQ_EE5valueESH_E4typeEOSQ_i at /usr/include/boost/variant/variant.hpp:1028
 (inlined by) boost::detail::variant::invoke_visitor<(anonymous namespace)::read_atomic_cell(abstract_type const&, boost::variant<ser::live_cell_view, ser::expiring_cell_view, ser::dead_cell_view, ser::counter_cell_view, ser::unknown_variant_type>, seastar::bool_class<atomic_cell::collection_member_tag>)::atomic_cell_visitor const, false>::result_type boost::detail::variant::visitation_impl_invoke_impl<boost::detail::variant::invoke_visitor<(anonymous namespace)::read_atomic_cell(abstract_type const&, boost::variant<ser::live_cell_view, ser::expiring_cell_view, ser::dead_cell_view, ser::counter_cell_view, ser::unknown_variant_type>, seastar::bool_class<atomic_cell::collection_member_tag>)::atomic_cell_visitor const, false>, void*, ser::expiring_cell_view>(int, boost::detail::variant::invoke_visitor<(anonymous namespace)::read_atomic_cell(abstract_type const&, boost::variant<ser::live_cell_view, ser::expiring_cell_view, ser::dead_cell_view, ser::counter_cell_view, ser::unknown_variant_type>, seastar::bool_class<atomic_cell::collection_member_tag>)::atomic_cell_visitor const, false>&, void*, ser::expiring_cell_view*, mpl_::bool_<true>) at /usr/include/boost/variant/detail/visitation_impl.hpp:117
 (inlined by) boost::detail::variant::invoke_visitor<(anonymous namespace)::read_atomic_cell(abstract_type const&, boost::variant<ser::live_cell_view, ser::expiring_cell_view, ser::dead_cell_view, ser::counter_cell_view, ser::unknown_variant_type>, seastar::bool_class<atomic_cell::collection_member_tag>)::atomic_cell_visitor const, false>::result_type boost::detail::variant::visitation_impl_invoke<boost::detail::variant::invoke_visitor<(anonymous namespace)::read_atomic_cell(abstract_type const&, boost::variant<ser::live_cell_view, ser::expiring_cell_view, ser::dead_cell_view, ser::counter_cell_view, ser::unknown_variant_type>, seastar::bool_class<atomic_cell::collection_member_tag>)::atomic_cell_visitor const, false>, void*, ser::expiring_cell_view, boost::variant<ser::live_cell_view, ser::expiring_cell_view, ser::dead_cell_view, ser::counter_cell_view, ser::unknown_variant_type>::has_fallback_type_>(int, boost::detail::variant::invoke_visitor<(anonymous namespace)::read_atomic_cell(abstract_type const&, boost::variant<ser::live_cell_view, ser::expiring_cell_view, ser::dead_cell_view, ser::counter_cell_view, ser::unknown_variant_type>, seastar::bool_class<atomic_cell::collection_member_tag>)::atomic_cell_visitor const, false>&, void*, ser::expiring_cell_view*, boost::variant<ser::live_cell_view, ser::expiring_cell_view, ser::dead_cell_view, ser::counter_cell_view, ser::unknown_variant_type>::has_fallback_type_, int) at /usr/include/boost/variant/detail/visitation_impl.hpp:157
 (inlined by) boost::detail::variant::invoke_visitor<(anonymous namespace)::read_atomic_cell(abstract_type const&, boost::variant<ser::live_cell_view, ser::expiring_cell_view, ser::dead_cell_view, ser::counter_cell_view, ser::unknown_variant_type>, seastar::bool_class<atomic_cell::collection_member_tag>)::atomic_cell_visitor const, false>::result_type boost::detail::variant::visitation_impl<mpl_::int_<0>, boost::detail::variant::visitation_impl_step<boost::mpl::l_iter<boost::mpl::l_item<mpl_::long_<5l>, ser::live_cell_view, boost::mpl::l_item<mpl_::long_<4l>, ser::expiring_cell_view, boost::mpl::l_item<mpl_::long_<3l>, ser::dead_cell_view, boost::mpl::l_item<mpl_::long_<2l>, ser::counter_cell_view, boost::mpl::l_item<mpl_::long_<1l>, ser::unknown_variant_type, boost::mpl::l_end> > > > > >, boost::mpl::l_iter<boost::mpl::l_end> >, boost::detail::variant::invoke_visitor<(anonymous namespace)::read_atomic_cell(abstract_type const&, boost::variant<ser::live_cell_view, ser::expiring_cell_view, ser::dead_cell_view, ser::counter_cell_view, ser::unknown_variant_type>, seastar::bool_class<atomic_cell::collection_member_tag>)::atomic_cell_visitor const, false>, void*, boost::variant<ser::live_cell_view, ser::expiring_cell_view, ser::dead_cell_view, ser::counter_cell_view, ser::unknown_variant_type>::has_fallback_type_>(int, int, boost::detail::variant::invoke_visitor<(anonymous namespace)::read_atomic_cell(abstract_type const&, boost::variant<ser::live_cell_view, ser::expiring_cell_view, ser::dead_cell_view, ser::counter_cell_view, ser::unknown_variant_type>, seastar::bool_class<atomic_cell::collection_member_tag>)::atomic_cell_visitor const, false>&, void*, mpl_::bool_<false>, boost::variant<ser::live_cell_view, ser::expiring_cell_view, ser::dead_cell_view, ser::counter_cell_view, ser::unknown_variant_type>::has_fallback_type_, mpl_::int_<0>*, boost::detail::variant::visitation_impl_step<boost::mpl::l_iter<boost::mpl::l_item<mpl_::long_<5l>, ser::live_cell_view, 
boost::mpl::l_item<mpl_::long_<4l>, ser::expiring_cell_view, boost::mpl::l_item<mpl_::long_<3l>, ser::dead_cell_view, boost::mpl::l_item<mpl_::long_<2l>, ser::counter_cell_view, boost::mpl::l_item<mpl_::long_<1l>, ser::unknown_variant_type, boost::mpl::l_end> > > > > >, boost::mpl::l_iter<boost::mpl::l_end> >*) at /usr/include/boost/variant/detail/visitation_impl.hpp:238
 (inlined by) boost::detail::variant::invoke_visitor<(anonymous namespace)::read_atomic_cell(abstract_type const&, boost::variant<ser::live_cell_view, ser::expiring_cell_view, ser::dead_cell_view, ser::counter_cell_view, ser::unknown_variant_type>, seastar::bool_class<atomic_cell::collection_member_tag>)::atomic_cell_visitor const, false>::result_type boost::variant<ser::live_cell_view, ser::expiring_cell_view, ser::dead_cell_view, ser::counter_cell_view, ser::unknown_variant_type>::internal_apply_visitor_impl<boost::detail::variant::invoke_visitor<(anonymous namespace)::read_atomic_cell(abstract_type const&, boost::variant<ser::live_cell_view, ser::expiring_cell_view, ser::dead_cell_view, ser::counter_cell_view, ser::unknown_variant_type>, seastar::bool_class<atomic_cell::collection_member_tag>)::atomic_cell_visitor const, false>, void*>(int, int, boost::detail::variant::invoke_visitor<(anonymous namespace)::read_atomic_cell(abstract_type const&, boost::variant<ser::live_cell_view, ser::expiring_cell_view, ser::dead_cell_view, ser::counter_cell_view, ser::unknown_variant_type>, seastar::bool_class<atomic_cell::collection_member_tag>)::atomic_cell_visitor const, false>&, void*) at /usr/include/boost/variant/variant.hpp:2337
 (inlined by) boost::detail::variant::invoke_visitor<(anonymous namespace)::read_atomic_cell(abstract_type const&, boost::variant<ser::live_cell_view, ser::expiring_cell_view, ser::dead_cell_view, ser::counter_cell_view, ser::unknown_variant_type>, seastar::bool_class<atomic_cell::collection_member_tag>)::atomic_cell_visitor const, false>::result_type boost::variant<ser::live_cell_view, ser::expiring_cell_view, ser::dead_cell_view, ser::counter_cell_view, ser::unknown_variant_type>::internal_apply_visitor<boost::detail::variant::invoke_visitor<(anonymous namespace)::read_atomic_cell(abstract_type const&, boost::variant<ser::live_cell_view, ser::expiring_cell_view, ser::dead_cell_view, ser::counter_cell_view, ser::unknown_variant_type>, seastar::bool_class<atomic_cell::collection_member_tag>)::atomic_cell_visitor const, false> >(boost::detail::variant::invoke_visitor<(anonymous namespace)::read_atomic_cell(abstract_type const&, boost::variant<ser::live_cell_view, ser::expiring_cell_view, ser::dead_cell_view, ser::counter_cell_view, ser::unknown_variant_type>, seastar::bool_class<atomic_cell::collection_member_tag>)::atomic_cell_visitor const, false>&) at /usr/include/boost/variant/variant.hpp:2349
 (inlined by) (anonymous namespace)::read_atomic_cell(abstract_type const&, boost::variant<ser::live_cell_view, ser::expiring_cell_view, ser::dead_cell_view, ser::counter_cell_view, ser::unknown_variant_type>, seastar::bool_class<atomic_cell::collection_member_tag>)::atomic_cell_visitor const::result_type boost::variant<ser::live_cell_view, ser::expiring_cell_view, ser::dead_cell_view, ser::counter_cell_view, ser::unknown_variant_type>::apply_visitor<(anonymous namespace)::read_atomic_cell(abstract_type const&, boost::variant<ser::live_cell_view, ser::expiring_cell_view, ser::dead_cell_view, ser::counter_cell_view, ser::unknown_variant_type>, seastar::bool_class<atomic_cell::collection_member_tag>)::atomic_cell_visitor const>((anonymous namespace)::read_atomic_cell(abstract_type const&, boost::variant<ser::live_cell_view, ser::expiring_cell_view, ser::dead_cell_view, ser::counter_cell_view, ser::unknown_variant_type>, seastar::bool_class<atomic_cell::collection_member_tag>)::atomic_cell_visitor const&) & at /usr/include/boost/variant/variant.hpp:2393
void allocation_strategy::destroy<partition_version>(partition_version*) at ././utils/allocation_strategy.hh:168
 (inlined by) remove_or_mark_as_unique_owner(partition_version*, mutation_cleaner*) at ./mutation/partition_version.cc:25
 (inlined by) operator() at ./mutation/partition_version.cc:161
 (inlined by) decltype(auto) with_allocator<partition_snapshot::~partition_snapshot()::$_10>(allocation_strategy&, partition_snapshot::~partition_snapshot()::$_10&&) at ././utils/allocation_strategy.hh:313
 (inlined by) ~partition_snapshot at ./mutation/partition_version.cc:154
 (inlined by) seastar::internal::lw_shared_ptr_accessors_esft<partition_snapshot>::dispose(partition_snapshot*) at ././seastar/include/seastar/core/shared_ptr.hh:205
 (inlined by) seastar::internal::lw_shared_ptr_accessors_esft<partition_snapshot>::dispose(seastar::lw_shared_ptr_counter_base*) at ././seastar/include/seastar/core/shared_ptr.hh:202
 (inlined by) ~lw_shared_ptr at ././seastar/include/seastar/core/shared_ptr.hh:317
 (inlined by) ~partition_snapshot_ptr at ./mutation/partition_version.cc:675
seastar::internal::future_base::move_it(seastar::internal::future_base&&, seastar::future_state_base*) at ././seastar/include/seastar/core/future.hh:1090
 (inlined by) future_base at ././seastar/include/seastar/core/future.hh:1099
 (inlined by) future at ././seastar/include/seastar/core/future.hh:1305
 (inlined by) _Head_base at /usr/bin/../lib/gcc/x86_64-redhat-linux/12/../../../../include/c++/12/tuple:196
 (inlined by) _Tuple_impl at /usr/bin/../lib/gcc/x86_64-redhat-linux/12/../../../../include/c++/12/tuple:456
 (inlined by) _Tuple_impl at /usr/bin/../lib/gcc/x86_64-redhat-linux/12/../../../../include/c++/12/tuple:301
 (inlined by) tuple at /usr/bin/../lib/gcc/x86_64-redhat-linux/12/../../../../include/c++/12/tuple:1090
 (inlined by) ~when_all_state at ././seastar/include/seastar/core/when_all.hh:153
 (inlined by) ~when_all_state at ././seastar/include/seastar/core/when_all.hh:152
?? ??:0

@tchaikov
Contributor

filed scylladb/scylladb#13700

@mykaul
Contributor

mykaul commented May 11, 2023

@tchaikov - are these issues fallout from the upgrade of the Java version? We'll need to revert or fix the Java code, I'm afraid.

@tchaikov
Contributor

tchaikov commented May 11, 2023

@tchaikov - are these issues fallout from the upgrade of the Java version? We'll need to revert or fix the Java code, I'm afraid.

Hi @mykaul, I don't know, as I don't have any proof that they are. Please see the analysis at #212 (comment). scylladb crashed before the scylla-jmx exporter tried to start, so I think these two things are correlated, but I am afraid this does not imply causation.

@mykaul
Contributor

mykaul commented May 14, 2023

@avikivity - who should look at the above crash?

@fruch

fruch commented May 14, 2023

@tchaikov

I don't think this issue is related to any scylla crash; I'm not sure where you are getting this crash information from.

It seems the code in /opt/scylladb/jmx/select-java isn't picking the right java on those setups,

since the output of /usr/bin/java -version 2>&1 is:

Picked up JAVA_TOOL_OPTIONS:
openjdk version "11.0.19" 2023-04-18 LTS
OpenJDK Runtime Environment (Red_Hat-11.0.19.0.7-1.el9_1) (build 11.0.19+7-LTS)
OpenJDK 64-Bit Server VM (Red_Hat-11.0.19.0.7-1.el9_1) (build 11.0.19+7-LTS, mixed mode, sharing)

and the logic in:

function select_java_others() {
    local javaver
    javaver=$(/usr/bin/java -version 2>&1|head -n1|cut -f 3 -d " ")

    if [[ "$javaver" =~ "^\"1.8.0" ]] || [[ "$javaver" =~ "^\"11.0." ]]; then
        echo /usr/bin/java
    fi
}

kind of breaks, and no java is selected:

May 14 08:07:44 artifacts-rocky9-jenkins-db-node-81ce9ada-0-1 scylla-jmx[69027]: +++ head -n1
May 14 08:07:44 artifacts-rocky9-jenkins-db-node-81ce9ada-0-1 scylla-jmx[69028]: +++ cut -f 3 -d ' '
May 14 08:07:44 artifacts-rocky9-jenkins-db-node-81ce9ada-0-1 scylla-jmx[69024]: ++ javaver=JAVA_TOOL_OPTIONS:
May 14 08:07:44 artifacts-rocky9-jenkins-db-node-81ce9ada-0-1 scylla-jmx[69024]: ++ [[ JAVA_TOOL_OPTIONS: =~ \^"1\.8\.0 ]]
May 14 08:07:44 artifacts-rocky9-jenkins-db-node-81ce9ada-0-1 scylla-jmx[69024]: ++ [[ JAVA_TOOL_OPTIONS: =~ \^"11\.0\. ]]
May 14 08:07:44 artifacts-rocky9-jenkins-db-node-81ce9ada-0-1 scylla-jmx[69001]: + java=
May 14 08:07:44 artifacts-rocky9-jenkins-db-node-81ce9ada-0-1 scylla-jmx[69001]: + '[' -z '' ']'
May 14 08:07:44 artifacts-rocky9-jenkins-db-node-81ce9ada-0-1 scylla-jmx[69001]: + exit 1
May 14 08:07:44 artifacts-rocky9-jenkins-db-node-81ce9ada-0-1 systemd[1]: scylla-jmx.service: Main process exited, code=exited, status=1/FAILURE
May 14 08:07:44 artifacts-rocky9-jenkins-db-node-81ce9ada-0-1 systemd[1]: scylla-jmx.service: Failed with result 'exit-code'.

and select-java fails silently, i.e. nothing visible in the log explains why it is failing.
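A minimal sketch of the parsing failure described above, runnable without a JVM. The output is simulated, and it assumes the "Picked up JAVA_TOOL_OPTIONS:" notice occupies the first line. `fake_java_version`, `old_parse` and `tolerant_parse` are hypothetical names used only for illustration; `tolerant_parse` is one possible alternative, not the fix that was adopted.

```shell
#!/bin/sh
# Simulated `java -version 2>&1` output, modelled on the failing node.
fake_java_version() {
    printf '%s\n' \
        'Picked up JAVA_TOOL_OPTIONS: ' \
        'openjdk version "11.0.19" 2023-04-18 LTS'
}

# The extraction used by select_java_others: first line only,
# third space-separated field.
old_parse() {
    fake_java_version | head -n1 | cut -f 3 -d " "
}

# Hypothetical tolerant variant: look for the line that carries the
# quoted version string instead of trusting the first line.
tolerant_parse() {
    fake_java_version | sed -n 's/.* version "\([^"]*\)".*/\1/p' | head -n1
}

old_result=$(old_parse)
new_result=$(tolerant_parse)
echo "old: ${old_result}"   # prints: old: JAVA_TOOL_OPTIONS:
echo "new: ${new_result}"   # prints: new: 11.0.19
```

The old pipeline returns the third word of the notice line, which is why neither version check in the trace can match.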

@fruch

fruch commented May 14, 2023

@temichus @mykaul, can one of you change the title to "java-select fail to parse java version output"

@fruch

fruch commented May 14, 2023

@tchaikov
Also, we should consider checking ID_LIKE; I can confirm that after adding rocky to select-java it worked as expected.

See /etc/os-release from one of the failing jobs:

NAME="Rocky Linux"
VERSION="9.1 (Blue Onyx)"
ID="rocky"
ID_LIKE="rhel centos fedora"

You can see examples from all the distros we support here:
https://github.com/scylladb/scylla-cluster-tests/blob/08f927dd885ce1fc5ad7b712138146d4434d1451/unit_tests/test_utils_distro.py#L21

@mykaul changed the title from "scylla-jmx.service: is still failed during artifact tests after #206 fix" to "scylla-jmx.service: is still failed during artifact tests after #206 fix - java-select fail to parse java version output" on May 14, 2023
@DoronArazii DoronArazii modified the milestones: Backlog, 5.3 May 15, 2023
@tchaikov
Contributor

tchaikov commented May 15, 2023

@fruch hi Israel, thank you very much for pointing out the issue of select-java and for the suggestion. i am creating a pull request to address all of the issues noted here.

but the segfault in scylla is still a mystery. i captured a snapshot of the test_data.zip, and noted down the steps to get it in scylladb/scylladb#13700 .

tchaikov added a commit to tchaikov/scylla-jmx that referenced this issue May 15, 2023
there are some distros whose $ID in /etc/os-release does not match
with any of the known distro id in this script, but they do have
`$ID_LIKE` which contains one or more ids which match with the known
distro id, as they are considered the derivatives of the matched
distros. for instance RHEL's ID_LIKE is fedora, and OEL (Oracle
Enterprise Linux)'s ID_LIKE is fedora, Rocky Linux's ID_LIKE is
"rhel centos fedora".

so, in order to find java runtime for these distros, we need to
take ID_LIKE into consideration. in this change, both $ID and $ID_LIKE
are considered, the first matched id wins.

Fixes scylladb#212
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
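The "first matched id wins" rule from the commit message above could be sketched as follows. This is an illustration under stated assumptions, not the code from the pull request: `first_known_id` is a hypothetical helper, the list of known ids is made up for the example, and the os-release content is the Rocky Linux sample quoted earlier in the thread.

```shell
#!/bin/sh
# Hypothetical helper: print the first known distro id found in ID or
# ID_LIKE. $1 is an os-release style file, the remaining arguments are
# the ids the caller knows how to handle.
first_known_id() {
    os_release=$1
    shift
    # Source the file in a subshell so ID/ID_LIKE do not leak out.
    (
        ID=
        ID_LIKE=
        . "$os_release"
        # ID is checked first, then each word of ID_LIKE in order.
        for candidate in $ID $ID_LIKE; do
            for known in "$@"; do
                if [ "$candidate" = "$known" ]; then
                    echo "$known"
                    exit 0
                fi
            done
        done
        exit 1
    )
}

# Sample taken from the failing Rocky Linux 9 node.
sample=$(mktemp)
cat > "$sample" <<'EOF'
NAME="Rocky Linux"
VERSION="9.1 (Blue Onyx)"
ID="rocky"
ID_LIKE="rhel centos fedora"
EOF

matched=$(first_known_id "$sample" fedora centos rhel)
echo "matched: ${matched}"   # prints: matched: rhel
rm -f "$sample"
```

Here "rocky" is not in the known list, so the first ID_LIKE entry that is known, "rhel", is returned.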
tchaikov added a commit to tchaikov/scylla-jmx that referenced this issue May 15, 2023
instead of parsing the output of `-version`, parse output of
`-XshowSettings:properties`, whose format is more predictable.
as on EL9, we have

> Picked up JAVA_TOOL_OPTIONS: openjdk version "11.0.19" 2023-04-18 LTS OpenJDK Runtime Environment (Red_Hat-11.0.19.0.7-1.el9_1) (build 11.0.19+7-LTS) OpenJDK 64-Bit Server VM (Red_Hat-11.0.19.0.7-1.el9_1) (build 11.0.19+7-LTS, mixed mode, sharing)

while on fedora 38, jre-1.8 prints

> openjdk version "1.8.0_362"

jre-11 prints

> openjdk version "11.0.19" 2023-04-18

before this change, we just use `"` as the field separator for awk,
this works, but the parsing algorithm is fragile. so let's use the
"java.specification.version" property instead, it is always printed
like

>     java.specification.version = 1.8

so, after this change, we split this line with " = ", and pick the
last field, i.e., the value of `java.specification.version`.
also, in this change, we are using the case clause for better readability.

Fixes scylladb#212
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
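A rough sketch of the approach this commit message describes: split the `java.specification.version` line on " = " and dispatch with a case clause. The settings output is simulated (only the `java.specification.version` line format comes from the message; the surrounding lines are invented for illustration), and `fake_settings`, `spec_version` and `is_supported` are hypothetical names. The accepted versions 1.8 and 11 are assumed from the checks in the original select_java_others function.

```shell
#!/bin/sh
# Simulated excerpt of `java -XshowSettings:properties -version 2>&1`.
# Everything except the java.specification.version line is made up.
fake_settings() {
    printf '%s\n' \
        'Picked up JAVA_TOOL_OPTIONS: ' \
        'Property settings:' \
        '    java.specification.vendor = Oracle Corporation' \
        '    java.specification.version = 11'
}

# Read settings on stdin, split the matching line on " = " and print
# the last field, i.e. the property value.
spec_version() {
    awk -F ' = ' '/java\.specification\.version/ { print $NF; exit }'
}

# Case clause deciding whether a specification version is acceptable
# (1.8 and 11 are assumed from the original version checks).
is_supported() {
    case "$1" in
        1.8|11) return 0 ;;
        *) return 1 ;;
    esac
}

version=$(fake_settings | spec_version)
if is_supported "$version"; then
    echo "supported: ${version}"   # prints: supported: 11
fi
```

Because the property is looked up by name, the extra "Picked up JAVA_TOOL_OPTIONS:" line no longer affects the result.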
@avikivity
Member

@avikivity - who should look at the above crash?

The decode is bogus, so can't triage.

tchaikov added a commit to tchaikov/scylla-jmx that referenced this issue May 16, 2023
instead of parsing the output of `-version`, parse output of
`-XshowSettings:properties`, whose format is more predictable.
as on EL9, we have

> Picked up JAVA_TOOL_OPTIONS: openjdk version "11.0.19" 2023-04-18 LTS OpenJDK Runtime Environment (Red_Hat-11.0.19.0.7-1.el9_1) (build 11.0.19+7-LTS) OpenJDK 64-Bit Server VM (Red_Hat-11.0.19.0.7-1.el9_1) (build 11.0.19+7-LTS, mixed mode, sharing)

while on fedora 38, jre-1.8 prints

> openjdk version "1.8.0_362"

jre-11 prints

> openjdk version "11.0.19" 2023-04-18

before this change, we just use `"` as the field separator for awk,
this works, but the parsing algorithm is fragile. so let's use the
"java.specification.version" property instead, it is always printed
like

>     java.specification.version = 1.8

so, after this change, we split this line with " = ", and pick the
last field, i.e., the value of `java.specification.version`.
also, in this change, we are using the case clause for better readability.

also, in this change, instead of dispatching the logic by the distro
id, we just check all JVMs located under /usr/lib/jvm, and pick the
first match. this simplifies the existing complicated implementation.

also, instead of silently returning status code of 1, print error messages
for better user experience.

Fixes scylladb#212
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
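The two additions in this revision, scanning every JVM under one directory and reporting an error instead of exiting silently, might look roughly like this. It is a sketch, not the merged script: `select_java` and `make_fake_jvm` are hypothetical, the `<root>/*/bin/java` layout is an assumption about how JVMs are installed under /usr/lib/jvm, and the fake JVMs are stub scripts so the example runs without Java.

```shell
#!/bin/sh
# Hypothetical: print the first java binary under $1 whose specification
# version is 1.8 or 11; otherwise explain the failure on stderr.
select_java() {
    jvm_root=$1
    for java in "$jvm_root"/*/bin/java; do
        [ -x "$java" ] || continue
        version=$("$java" -XshowSettings:properties -version 2>&1 |
            awk -F ' = ' '/java\.specification\.version/ { print $NF; exit }')
        case "$version" in
            1.8|11)
                echo "$java"
                return 0
                ;;
        esac
    done
    echo "select_java: no supported JVM found under $jvm_root" >&2
    return 1
}

# Stub JVM that only prints the one property line the parser needs.
make_fake_jvm() {
    mkdir -p "$1/bin"
    printf '#!/bin/sh\necho "    java.specification.version = %s" >&2\n' "$2" \
        > "$1/bin/java"
    chmod +x "$1/bin/java"
}

root=$(mktemp -d)
make_fake_jvm "$root/a-jre-17" 17   # unsupported, skipped
make_fake_jvm "$root/b-jre-11" 11   # first supported match
selected=$(select_java "$root")
echo "selected: ${selected}"
rm -rf "$root"
```

With an empty directory the function returns 1 and writes a one-line explanation to stderr, which would have made the systemd failure above easier to diagnose.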
tchaikov added a commit to tchaikov/scylla-jmx that referenced this issue May 16, 2023
instead of parsing the output of `-version`, parse output of
`-XshowSettings:properties`, whose format is more predicable.
as on EL9, we have

> Picked up JAVA_TOOL_OPTIONS: openjdk version "11.0.19" 2023-04-18 LTS OpenJDK Runtime Environment (Red_Hat-11.0.19.0.7-1.el9_1) (build 11.0.19+7-LTS) OpenJDK 64-Bit Server VM (Red_Hat-11.0.19.0.7-1.el9_1) (build 11.0.19+7-LTS, mixed mode, sharing)

while on fedora 38, jre-1.8 prints

> openjdk version "1.8.0_362"

jre-11 prints

> openjdk version "11.0.19" 2023-04-18

before this change, we just use `"` as the field separator for awk,
this works, but the parsing algorithm is fragile. so let's use the
"java.specification.version" property instead, it is always printed
like

>     java.specification.version = 1.8

so, after this change, we split this line with " = ", and pick the
last field, i.e., the value of `java.specification.version`.
also, in this change, we are using the case clause for better readability.

also, in this change, instead of dispatch the logic by the distro
id, we just check all JVMs located under /usr/lib/jvm, and pick the
first match. this simplifies the existing complicated implementation.

also, instead of silently returning status code of 1, print error messages
for better user experience.

Fixes scylladb#212
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
tchaikov added a commit to tchaikov/scylla-jmx that referenced this issue May 16, 2023
instead of parsing the output of `-version`, parse output of
`-XshowSettings:properties`, whose format is more predicable.
as on EL9, we have

> Picked up JAVA_TOOL_OPTIONS: openjdk version "11.0.19" 2023-04-18 LTS OpenJDK Runtime Environment (Red_Hat-11.0.19.0.7-1.el9_1) (build 11.0.19+7-LTS) OpenJDK 64-Bit Server VM (Red_Hat-11.0.19.0.7-1.el9_1) (build 11.0.19+7-LTS, mixed mode, sharing)

while on fedora 38, jre-1.8 prints

> openjdk version "1.8.0_362"

jre-11 prints

> openjdk version "11.0.19" 2023-04-18

before this change, we just use `"` as the field separator for awk,
this works, but the parsing algorithm is fragile. so let's use the
"java.specification.version" property instead, it is always printed
like

>     java.specification.version = 1.8

so, after this change, we split this line with " = ", and pick the
last field, i.e., the value of `java.specification.version`.
also, in this change, we are using the case clause for better readability.

also, in this change, instead of dispatch the logic by the distro
id, we just check all JVMs located under /usr/lib/jvm, and pick the
first match. this simplifies the existing complicated implementation.

also, instead of silently returning status code of 1, print error messages
for better user experience.

Fixes scylladb#212
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
@denesb denesb closed this as completed in 1fd23b6 May 16, 2023