tests: fix a flake in test_sharding_split_compaction #8136

Merged
jcsp merged 1 commit into main from jcsp/compaction-split-flake
Jun 24, 2024


jcsp
Collaborator

@jcsp jcsp commented Jun 23, 2024

Problem

This test could occasionally trigger a "removing local file ... because it has unexpected length" log when the compact-shard-ancestors-persistent failpoint is in use. That is unexpected, because the failpoint stops the process at a point where the remote metadata is in sync with local files.

The cause was that there are two shards on the same pageserver: while the one being compacted explicitly stops at the failpoint, the other shard was compacting in the background and failing at an unclean point. The test intends to disable background compaction, but it was mistakenly reverting the value of compaction_period when it updated pitr_interval.

Example failure:
https://neon-github-public-dev.s3.amazonaws.com/reports/pr-8123/9602976462/index.html#/testresult/7dd6165da7daef40

Summary of changes

  • Update TENANT_CONF in the test to use properly typed values, so that it is usable in pageserver APIs as well as via neon_local.
  • When updating tenant config with pitr_interval, retain the overrides from the start of the test, so that there won't be any background compaction going on during the test.
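
The pitfall behind the second change can be sketched in a few lines. This is a minimal, self-contained illustration rather than the actual test code: it assumes that setting a tenant config replaces the whole set of overrides instead of merging into it, and `FakeTenantConfigApi`, its methods, the default values, and the contents of `TENANT_CONF` shown here are all hypothetical stand-ins.

```python
class FakeTenantConfigApi:
    """Hypothetical stand-in for a config endpoint with replace-on-write semantics."""

    # Illustrative defaults only; not the pageserver's real defaults.
    DEFAULTS = {"compaction_period": "20s", "pitr_interval": "7 days"}

    def __init__(self):
        self.overrides = {}

    def set_tenant_config(self, overrides):
        # Assumption: each call replaces all previously set overrides.
        self.overrides = dict(overrides)

    def effective(self, key):
        # An override wins; otherwise fall back to the default.
        return self.overrides.get(key, self.DEFAULTS.get(key))


# Typed values: integers stay integers, durations stay strings (illustrative).
TENANT_CONF = {
    "compaction_period": "0s",  # assumed to mean "no background compaction"
    "checkpoint_distance": 128 * 1024,
}

api = FakeTenantConfigApi()
api.set_tenant_config(TENANT_CONF)

# Buggy update: only pitr_interval is sent, so compaction_period is dropped
# from the overrides and the default applies again.
api.set_tenant_config({"pitr_interval": "0s"})
buggy_period = api.effective("compaction_period")

# Fixed update: start from the original overrides and add the new key.
api.set_tenant_config({**TENANT_CONF, "pitr_interval": "0s"})
fixed_period = api.effective("compaction_period")

print(buggy_period, fixed_period)
```

Under these assumptions the first update silently re-enables background compaction, while the merged update keeps it disabled for the rest of the test.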

Checklist before requesting a review

  • I have performed a self-review of my code.
  • If it is a core feature, I have added thorough tests.
  • Do we need to implement analytics? if so did you add the relevant metrics to the dashboard?
  • If this PR requires public announcement, mark it with /release-notes label and add several sentences in this section.

Checklist before merging

  • Do not forget to reformat commit message to not include the above checklist

@jcsp jcsp added labels c/storage/pageserver (Component: storage: pageserver), a/test (Area: related to testing), a/tech_debt (Area: related to tech debt) Jun 23, 2024

2910 tests run: 2793 passed, 0 failed, 117 skipped (full report)


Code coverage* (full report)

  • functions: 32.5% (6873 of 21175 functions)
  • lines: 49.9% (53437 of 107076 lines)

* collected from Rust tests only


The comment gets automatically updated with the latest test results
cb3d8b3 at 2024-06-23T20:01:09.688Z :recycle:

@jcsp jcsp marked this pull request as ready for review June 24, 2024 08:38
@jcsp jcsp requested a review from bayandin June 24, 2024 08:38
@jcsp jcsp merged commit 47fdf93 into main Jun 24, 2024
68 checks passed
@jcsp jcsp deleted the jcsp/compaction-split-flake branch June 24, 2024 13:54
conradludgate pushed a commit that referenced this pull request Jun 27, 2024