Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CI Failure (Adjacent segment & remote.write disabled) in CloudStorageScrubberTest.test_scrubber #14132

Closed
oleiman opened this issue Oct 12, 2023 · 11 comments
Assignees
Labels
area/cloud-storage Shadow indexing subsystem ci-failure kind/bug Something isn't working sev/low Bugs which are non-functional paper cuts, e.g. typos, issues in log messages

Comments

@oleiman
Copy link
Member

oleiman commented Oct 12, 2023

https://buildkite.com/redpanda/redpanda/builds/38818#018b24ac-3d01-4202-9f54-cffe8404ace6

Module: rptest.tests.cloud_storage_scrubber_test
Class:  CloudStorageScrubberTest
Method: test_scrubber
Arguments:
{
  "cloud_storage_type": 1
}
test_id:    rptest.tests.cloud_storage_scrubber_test.CloudStorageScrubberTest.test_scrubber.cloud_storage_type=CloudStorageType.S3
status:     FAIL
run time:   1 minute 58.939 seconds


    <BadLogLines nodes=docker-rp-23(1) example="ERROR 2023-10-12 16:32:48,364 [shard 0:au  ] archival - [fiber4 kafka/topic-wusbosvutb/2] - adjacent_segment_merger.cc:180 - Adjacent segment merging refusing to run on topic with remote.write disabled">
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/ducktape/tests/runner_client.py", line 184, in _do_run
    data = self.run_test()
  File "/usr/local/lib/python3.10/dist-packages/ducktape/tests/runner_client.py", line 269, in run_test
    return self.test_context.function(self.test)
  File "/usr/local/lib/python3.10/dist-packages/ducktape/mark/_mark.py", line 481, in wrapper
    return functools.partial(f, *args, **kwargs)(*w_args, **w_kwargs)
  File "/root/tests/rptest/services/cluster.py", line 142, in wrapped
    redpanda.raise_on_bad_logs(
  File "/root/tests/rptest/services/redpanda.py", line 1249, in raise_on_bad_logs
    raise BadLogLines(bad_lines)
rptest.services.utils.BadLogLines: <BadLogLines nodes=docker-rp-23(1) example="ERROR 2023-10-12 16:32:48,364 [shard 0:au  ] archival - [fiber4 kafka/topic-wusbosvutb/2] - adjacent_segment_merger.cc:180 - Adjacent segment merging refusing to run on topic with remote.write disabled">
@oleiman oleiman added kind/bug Something isn't working ci-failure area/cloud-storage Shadow indexing subsystem labels Oct 12, 2023
@abhijat abhijat self-assigned this Oct 13, 2023
@abhijat abhijat added the sev/low Bugs which are non-functional paper cuts, e.g. typos, issues in log messages label Oct 13, 2023
@andijcr
Copy link
Contributor

andijcr commented Oct 13, 2023

the code for this log line:

    if (!_archiver.ntp_config().is_archival_enabled()) {
        // This should never happen because we should not have been constructed
        // for a read replica topic: this is a double-check for safety.
        vlog(
          _ctxlog.error,
          "Adjacent segment merging refusing to run on topic with remote.write "
          "disabled");
        co_return result;
    }

the log around the ERR

DEBUG 2023-10-12 16:32:48,363 [shard 0:main] cluster - controller_backend.cc:906 - [{kafka/topic-wusbosvutb/1}] (retry 0) finished operation: {type: update_properties, revision: 27, assignment: { id: 1, group_id: 2, replicas: {{node_id: 2, shard: 1}, {node_id: 1, shard: 1}, {node_id: 3, shard: 1}} }, previous assignment: {nullopt}}
DEBUG 2023-10-12 16:32:48,363 [shard 1:main] archival - upload_housekeeping_service.cc:237 - deregistered job, current backlog 3, num completed jobs 0, num running jobs 0
INFO  2023-10-12 16:32:48,363 [shard 1:main] archival - scrubber.cc:153 - Stopping scrubber (0)...
TRACE 2023-10-12 16:32:48,363 [shard 0:main] cluster - controller_backend.cc:970 - [{kafka/topic-wusbosvutb/2}] (retry 0) executing operation: {type: update_properties, ntp: {kafka/topic-wusbosvutb/2}, offset: 27, new_assignment: { id: 2, group_id: 3, replicas: {{node_id: 3, shard: 0}, {node_id: 1, shard: 0}, {node_id: 2, shard: 0}} }, previous_replica_set: {nullopt}, replica_revisions: {nullopt}}}
TRACE 2023-10-12 16:32:48,364 [shard 1:main] archival - [fiber4 kafka/topic-wusbosvutb/0] - ntp_archiver_service.cc:326 - upload loop shutting down
TRACE 2023-10-12 16:32:48,364 [shard 0:main] cluster - controller_backend.cc:1516 - [{kafka/topic-wusbosvutb/2}] updating configuration with properties: {compression: {nullopt}, cleanup_policy_bitflags: {delete}, compaction_strategy: {nullopt}, retention_bytes: {}, retention_duration_ms: {}, segment_size: {nullopt}, timestamp_type: {nullopt}, recovery_enabled: {nullopt}, shadow_indexing: {fetch}, read_replica: {nullopt}, read_replica_bucket: {nullopt} remote_topic_properties: {nullopt}, batch_max_bytes: {nullopt}, retention_local_target_bytes: {}, retention_local_target_ms: {}, remote_delete: true, segment_ms: {}, record_key_schema_id_validation: {nullopt}, record_key_schema_id_validation_compat: {nullopt}, record_key_subject_name_strategy:  {nullopt}, record_key_subject_name_strategy_compat: {nullopt}, record_value_schema_id_validation: {nullopt},  record_value_schema_id_validation_compat: {nullopt}, record_value_subject_name_strategy: {nullopt}, record_value_subject_name_strategy_compat: {nullopt}}
TRACE 2023-10-12 16:32:48,364 [shard 1:main] cluster - controller_backend.cc:970 - [{kafka/topic-wusbosvutb/1}] (retry 0) executing operation: {type: update_properties, ntp: {kafka/topic-wusbosvutb/1}, offset: 27, new_assignment: { id: 1, group_id: 2, replicas: {{node_id: 2, shard: 1}, {node_id: 1, shard: 1}, {node_id: 3, shard: 1}} }, previous_replica_set: {nullopt}, replica_revisions: {nullopt}}}
DEBUG 2023-10-12 16:32:48,364 [shard 0:au  ] archival - upload_housekeeping_service.cc:354 - Running job adjacent_segment_merger:{kafka/topic-wusbosvutb/2} with quota 12
ERROR 2023-10-12 16:32:48,364 [shard 0:au  ] archival - [fiber4 kafka/topic-wusbosvutb/2] - adjacent_segment_merger.cc:180 - Adjacent segment merging refusing to run on topic with remote.write disabled
TRACE 2023-10-12 16:32:48,364 [shard 1:main] cluster - controller_backend.cc:1516 - [{kafka/topic-wusbosvutb/1}] updating configuration with properties: {compression: {nullopt}, cleanup_policy_bitflags: {delete}, compaction_strategy: {nullopt}, retention_bytes: {}, retention_duration_ms: {}, segment_size: {nullopt}, timestamp_type: {nullopt}, recovery_enabled: {nullopt}, shadow_indexing: {fetch}, read_replica: {nullopt}, read_replica_bucket: {nullopt} remote_topic_properties: {nullopt}, batch_max_bytes: {nullopt}, retention_local_target_bytes: {}, retention_local_target_ms: {}, remote_delete: true, segment_ms: {}, record_key_schema_id_validation: {nullopt}, record_key_schema_id_validation_compat: {nullopt}, record_key_subject_name_strategy:  {nullopt}, record_key_subject_name_strategy_compat: {nullopt}, record_value_schema_id_validation: {nullopt},  record_value_schema_id_validation_compat: {nullopt}, record_value_subject_name_strategy: {nullopt}, record_value_subject_name_strategy_compat: {nullopt}}
DEBUG 2023-10-12 16:32:48,365 [shard 0:main] cluster - partition.cc:855 - update_configuration[{compaction_strategy: {nullopt}, cleanup_policy_bitflags: {delete}, segment_size: {nullopt}, retention_bytes: {}, retention_time_ms: {}, recovery_enabled: true, retention_local_target_bytes: {}, retention_local_target_ms: {}, remote_delete: {true}, segment_ms: {}}]: updating archiver for config {kafka/topic-wusbosvutb/2}
DEBUG 2023-10-12 16:32:48,365 [shard 1:main] cluster - partition.cc:855 - update_configuration[{compaction_strategy: {nullopt}, cleanup_policy_bitflags: {delete}, segment_size: {nullopt}, retention_bytes: {}, retention_time_ms: {}, recovery_enabled: true, retention_local_target_bytes: {}, retention_local_target_ms: {}, remote_delete: {true}, segment_ms: {}}]: updating archiver for config {kafka/topic-wusbosvutb/1}
DEBUG 2023-10-12 16:32:48,365 [shard 1:main] archival - [fiber1] - upload_housekeeping_service.cc:190 - Deregistering job: adjacent_segment_merger:{kafka/topic-wusbosvutb/1}
DEBUG 2023-10-12 16:32:48,365 [shard 1:main] archival - upload_housekeeping_service.cc:225 - removing pending job
DEBUG 2023-10-12 16:32:48,365 [shard 0:main] archival - [fiber1] - upload_housekeeping_service.cc:190 - Deregistering job: adjacent_segment_merger:{kafka/topic-wusbosvutb/2}

it points to a race while de-registering the adjacent_segment_merger

@abhijat
Copy link
Contributor

abhijat commented Oct 13, 2023

it points to a race while de-registering the adjacent_segment_merger

Good point, I assumed it was just the test needing adjustment and created #14143, but perhaps the housekeeping workflow should check the config before triggering the job.

@rystsov rystsov changed the title CI Failure (BadLogLines...archival - [fiber4 kafka/topic-wusbosvutb/2] - adjacent_segment_merger.cc:180 - Adjacent segment merging refusing to run on topic with remote.write disabled") in CloudStorageScrubberTest.test_scrubber CI Failure (Adjacent segment merging refusing to run on topic with remote.write disabled) in CloudStorageScrubberTest.test_scrubber Oct 13, 2023
@rystsov rystsov changed the title CI Failure (Adjacent segment merging refusing to run on topic with remote.write disabled) in CloudStorageScrubberTest.test_scrubber CI Failure (Adjacent segment & remote.write disabled) in CloudStorageScrubberTest.test_scrubber Oct 13, 2023
@michael-redpanda
Copy link
Contributor

@vbotbuildovich
Copy link
Collaborator

@vbotbuildovich
Copy link
Collaborator

@vbotbuildovich
Copy link
Collaborator

@vbotbuildovich
Copy link
Collaborator

@abhijat
Copy link
Contributor

abhijat commented Feb 5, 2024

All three recently posted failures are for a different error: Can't apply spillover command, eg

====================================================================================================
test_id:    rptest.tests.cloud_storage_scrubber_test.CloudStorageScrubberTest.test_scrubber.cloud_storage_type=CloudStorageType.S3
status:     FAIL
run time:   2 minutes 16.656 seconds


    <BadLogLines nodes=docker-rp-7(90),docker-rp-1(90),docker-rp-2(90) example="ERROR 2024-01-12 20:24:28,432 [shard 1:main] cluster - ntp: {kafka/topic-audvtrnike/0} - archival_metadata_stm.cc:1457 - Can't apply spillover_cmd: {is_compacted: false, size_bytes: 11536094, base_offset: 2212, committed_offset: 2926, base_timestamp: {timestamp: 1705090973753}, max_timestamp: {timestamp: 1705090979955}, delta_offset: 42, ntp_revision: 24, archiver_term: 1, segment_term: 1, delta_offset_end: 57, sname_format: {v3}, metadata_size_hint: 3232}">
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/ducktape/tests/runner_client.py", line 184, in _do_run
    data = self.run_test()
  File "/usr/local/lib/python3.10/dist-packages/ducktape/tests/runner_client.py", line 269, in run_test
    return self.test_context.function(self.test)
  File "/usr/local/lib/python3.10/dist-packages/ducktape/mark/_mark.py", line 481, in wrapper
    return functools.partial(f, *args, **kwargs)(*w_args, **w_kwargs)
  File "/root/tests/rptest/services/cluster.py", line 142, in wrapped
    redpanda.raise_on_bad_logs(
  File "/root/tests/rptest/services/redpanda.py", line 1327, in raise_on_bad_logs
    raise BadLogLines(bad_lines)
rptest.services.utils.BadLogLines: <BadLogLines nodes=docker-rp-7(90),docker-rp-1(90),docker-rp-2(90) example="ERROR 2024-01-12 20:24:28,432 [shard 1:main] cluster - ntp: {kafka/topic-audvtrnike/0} - archival_metadata_stm.cc:1457 - Can't apply spillover_cmd: {is_compacted: false, size_bytes: 11536094, base_offset: 2212, committed_offset: 2926, base_timestamp: {timestamp: 1705090973753}, max_timestamp: {timestamp: 1705090979955}, delta_offset: 42, ntp_revision: 24, archiver_term: 1, segment_term: 1, delta_offset_end: 57, sname_format: {v3}, metadata_size_hint: 3232}">

https://ci-artifacts.dev.vectorized.cloud/redpanda/43732/018cff34-f55a-4472-b55e-121cc13c205c/vbuild/ducktape/results/final/CloudStorageScrubberTest/test_scrubber/cloud_storage_type=CloudStorageType.S3/100/report.txt

@abhijat
Copy link
Contributor

abhijat commented Feb 5, 2024

Created #16474 for the spillover cmd related issue (which seems to be the same as #16149)

Closing this ticket as this does not seem to have reappeared in a while.

@abhijat abhijat closed this as completed Feb 5, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/cloud-storage Shadow indexing subsystem ci-failure kind/bug Something isn't working sev/low Bugs which are non-functional paper cuts, e.g. typos, issues in log messages
Projects
None yet
Development

No branches or pull requests

5 participants