rp_storage_tool fails to decode manifest in ShadowIndexingManyPartitionsTest.test_many_partitions_shutdown #11250

Closed
dotnwat opened this issue Jun 7, 2023 · 5 comments
Labels
area/cloud-storage (Shadow indexing subsystem), area/storage, kind/bug (Something isn't working)

dotnwat (Member) commented Jun 7, 2023

https://buildkite.com/redpanda/redpanda/builds/30740#01889338-b7f1-46f5-9406-183a83b1c38e

Module: rptest.tests.e2e_shadow_indexing_test
Class:  ShadowIndexingManyPartitionsTest
Method: test_many_partitions_shutdown
[INFO  - 2023-06-07 00:33:08,859 - redpanda - _cloud_storage_diagnostics - lineno:2470]: Fetching manifest f0000000/meta/kafka/panda-topic/122_9/manifest.bin
[WARNING - 2023-06-07 00:33:08,862 - redpanda - _cloud_storage_diagnostics - lineno:2486]: Failed to decode f0000000/meta/kafka/panda-topic/122_9/manifest.bin
[INFO  - 2023-06-07 00:33:08,862 - redpanda - _cloud_storage_diagnostics - lineno:2470]: Fetching manifest f0000000/meta/kafka/panda-topic/33_9/manifest.bin
[WARNING - 2023-06-07 00:33:08,865 - redpanda - _cloud_storage_diagnostics - lineno:2486]: Failed to decode f0000000/meta/kafka/panda-topic/33_9/manifest.bin
[INFO  - 2023-06-07 00:33:08,865 - redpanda - _cloud_storage_diagnostics - lineno:2470]: Fetching manifest f0000000/meta/kafka/panda-topic/88_9/manifest.bin
[WARNING - 2023-06-07 00:33:08,870 - redpanda - _cloud_storage_diagnostics - lineno:2486]: Failed to decode f0000000/meta/kafka/panda-topic/88_9/manifest.bin

rp_storage_tool is failing to decode manifests at the end of the test.

VladLazar (Contributor) commented

The failure to decode is interesting, but it's not the cause of the test failure.
The test failed at:

[ERROR - 2023-06-07 00:32:52,898 - cluster - wrapped - lineno:90]: Test failed, doing failure checks on RedpandaService-0-139916909148720

However, the first decoding failure happens later, during tear-down:

[WARNING - 2023-06-07 00:33:08,416 - redpanda - _cloud_storage_diagnostics - lineno:2486]: Failed to decode 00000000/meta/kafka/panda-topic/10_9/manifest.bin
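
For anyone checking the same ordering in their own run, here is a minimal sketch that compares the two timestamps. It assumes the log prefix format seen in the lines quoted in this issue; `LOG_LINE`, `parse`, and `first_event` are hypothetical helper names, not part of rptest or rp_storage_tool.

```python
import re
from datetime import datetime

# Assumed format, taken from the lines quoted in this issue:
# [LEVEL - YYYY-MM-DD HH:MM:SS,mmm - logger - func - lineno:N]: message
LOG_LINE = re.compile(
    r"^\[(?P<level>\w+)\s*-\s*"
    r"(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) - [^\]]*\]: "
    r"(?P<msg>.*)$"
)


def parse(line):
    """Return (level, timestamp, message), or None if the line does not match."""
    m = LOG_LINE.match(line.strip())
    if m is None:
        return None
    ts = datetime.strptime(m.group("ts"), "%Y-%m-%d %H:%M:%S,%f")
    return m.group("level"), ts, m.group("msg")


def first_event(lines, prefix):
    """Earliest timestamp among messages starting with `prefix`, or None."""
    times = [p[1] for p in map(parse, lines) if p and p[2].startswith(prefix)]
    return min(times) if times else None


# The two lines quoted above in this comment.
lines = [
    "[ERROR - 2023-06-07 00:32:52,898 - cluster - wrapped - lineno:90]: "
    "Test failed, doing failure checks on RedpandaService-0-139916909148720",
    "[WARNING - 2023-06-07 00:33:08,416 - redpanda - _cloud_storage_diagnostics"
    " - lineno:2486]: Failed to decode 00000000/meta/kafka/panda-topic/10_9/manifest.bin",
]

test_failed_at = first_event(lines, "Test failed")
first_decode_failure = first_event(lines, "Failed to decode")
delta = (first_decode_failure - test_failed_at).total_seconds()
print(f"first decode failure came {delta:.3f}s after the test failed")
```

A positive delta means the decode warnings started after the test had already been marked as failed, so they come from tear-down diagnostics rather than from whatever failed the test.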

VladLazar (Contributor) commented

Let's keep this ticket for the decoding failures and chase the timeout in #11268.

VladLazar changed the title from "CI Failure (Failed to decode manifest / timeout) in ShadowIndexingManyPartitionsTest.test_many_partitions_shutdown" to "rp_storage_tool fails to decode manifest in ShadowIndexingManyPartitionsTest.test_many_partitions_shutdown" on Jun 9, 2023

VladLazar (Contributor) commented

I tried to decode this manifest with rp_storage_tool from the tip of dev (933ad04) and it worked fine. Maybe the tests are using an older version?

jcsp (Contributor) commented Jun 27, 2023

If it decodes with tip of dev and we haven't seen the issue again, I'm inclined to think this is fixed.

jcsp closed this as completed on Jun 27, 2023

rockwotj (Contributor) commented

While debugging another test I hit this locally. Is there anything useful in my local environment I can grab to debug?
